DPDK patches and discussions
* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  @ 2021-06-16 13:02  0%           ` Bruce Richardson
  2021-06-16 15:01  0%             ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-06-16 13:02 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Jerin Jacob, Thomas Monjalon, dpdk-dev, Olivier Matz,
	Andrew Rybchenko, Honnappa Nagarahalli, Ananyev, Konstantin,
	Ferruh Yigit, Jerin Jacob, Akhil Goyal

On Wed, Jun 16, 2021 at 01:27:17PM +0200, Morten Brørup wrote:
> > From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > Sent: Wednesday, 16 June 2021 11.42
> > 
> > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon <thomas@monjalon.net>
> > wrote:
> > >
> > > 14/06/2021 17:48, Morten Brørup:
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> > Monjalon
> > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > something big enough to hold a sufficiently large array. And possibly
> > add an rte_max_ethports variable to indicate the number of populated
> > entries in the array, for use when iterating over the array.
> > > >
> > > > Can we come up with another example besides RTE_MAX_ETHPORTS where
> > this library provides a greater benefit?
> > >
> > > What is big enough?
> > > Is 640KB enough for RAM? ;)
> > 
> > If I understand it correctly, the Linux process allocates 640KB because
> > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is currently
> > global and lives in the BSS.
> 
> Correct.
> 
> > If we allocate this from the heap, i.e. use malloc(), then in my
> > understanding Linux won't actually allocate the backing pages for this
> > memory until someone reads or writes it.
> 
> If the array is allocated from the heap, its members will be accessed through a pointer to the array, e.g. in rte_eth_rx/tx_burst(). This might affect performance, which is probably why the array is allocated the way it is.
>

It depends on whether the array contains pointers to malloced elements or
the array itself is just a single malloced array of all the structures.
While I think the parray proposal referred to the former - which would have
an extra level of indirection - the switch we are discussing here is the
latter, which should have no performance difference, since the method of
accessing the elements will be the same, only with the base address
pointing to a different area of memory.
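To make the two layouts concrete, here is a minimal sketch (the `struct elem` type and function names are invented for illustration): flat-array access compiles to a single load whether the base is static or a single malloc'd block, while a table of pointers adds a second load.

```c
#include <stddef.h>

/* Hypothetical element type standing in for struct rte_eth_dev. */
struct elem { int id; };

#define NUM_ELEMS 8

/* Static flat array: the base address is a link-time constant. */
static struct elem static_arr[NUM_ELEMS];

/* Flat-array access (static or single-malloc'd base): one memory load,
 * address computed as base + i * sizeof(struct elem). */
int get_flat(const struct elem *base, size_t i)
{
	return base[i].id;
}

/* Array-of-pointers access (the parray style): load tbl[i] first, then
 * dereference it -- one extra level of indirection. */
int get_indirect(struct elem *const *tbl, size_t i)
{
	return tbl[i]->id;
}
```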
 
> Although it might be worth investigating how much it actually affects the performance.
> 
> So we need to do something else if we want to conserve memory and still allow a large rte_eth_devices[] array.
> 
> Looking at struct rte_eth_dev, we could reduce its size as follows:
> 
> 1. Change the two callback arrays post_rx/pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] to pointers to callback arrays, which are allocated from the heap.
> With the default RTE_MAX_QUEUES_PER_PORT of 1024, these two arrays are the sinners that make the struct rte_eth_dev use so much memory. This modification would save 16 KB (minus 16 bytes for the pointers to the two arrays) per port.
> Furthermore, these callback arrays would only need to be allocated if the application is compiled with callbacks enabled (#define RTE_ETHDEV_RXTX_CALLBACKS). And they would only need to be sized to the actual number of queues for the port.
> 
> The disadvantage is that this would add another level of indirection, although only for applications compiled with callbacks enabled.
> 
This seems reasonable to at least investigate.
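A rough sketch of the change proposed in point 1, with invented type and field names standing in for the real ethdev definitions:

```c
#include <stdlib.h>

#define RTE_MAX_QUEUES_PER_PORT 1024

/* Stand-in for struct rte_eth_rxtx_callback. */
struct rxtx_callback { void *fn; void *arg; };

/* Today: two fixed-size arrays embedded in every device (~16 KB/port). */
struct eth_dev_before {
	struct rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
	struct rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
};

/* Proposed: two pointers; the tables are heap-allocated on demand and
 * sized to the port's actual queue count. */
struct eth_dev_after {
	struct rxtx_callback **post_rx_burst_cbs;
	struct rxtx_callback **pre_tx_burst_cbs;
};

/* Allocate the callback tables only when callbacks are actually used. */
int eth_dev_alloc_cbs(struct eth_dev_after *dev, unsigned int nb_queues)
{
	dev->post_rx_burst_cbs = calloc(nb_queues, sizeof(*dev->post_rx_burst_cbs));
	dev->pre_tx_burst_cbs = calloc(nb_queues, sizeof(*dev->pre_tx_burst_cbs));
	if (dev->post_rx_burst_cbs == NULL || dev->pre_tx_burst_cbs == NULL)
		return -1;
	return 0;
}
```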

> 2. Remove reserved_64s[4] and reserved_ptrs[4]. This would save 64 bytes per port. Not much, but worth considering if we are changing the API/ABI anyway.
> 
I strongly dislike reserved fields, so I would tend to favour removing
them. However, doing so may reduce future compatibility if we do need to
add something to ethdev.

Another option is to split ethdev into fast-path and non-fastpath parts -
similar to Konstantin's suggestion of just having an array of the ops. We
can have an array of minimal structures with fastpath ops and queue
pointers, for example, with an ethdev-private pointer to the rest of the
struct elsewhere in memory. Since that second struct would be allocated
on-demand, the size of the ethdev array can be scaled with far smaller
footprint.
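A hypothetical sketch of such a minimal fast-path struct (all names invented, not the actual ethdev layout):

```c
struct rte_mbuf;          /* opaque in this sketch */
struct eth_dev_private;   /* "the rest of the struct", allocated on demand */

typedef unsigned short (*rx_burst_fn_t)(void *rxq, struct rte_mbuf **pkts,
					unsigned short nb_pkts);
typedef unsigned short (*tx_burst_fn_t)(void *txq, struct rte_mbuf **pkts,
					unsigned short nb_pkts);

/* Minimal per-port entry: only what the rx/tx burst path touches.  The
 * public array would hold these small structs, so it scales cheaply,
 * while each priv part is allocated only for ports that exist. */
struct eth_dev_fast {
	rx_burst_fn_t rx_pkt_burst;
	tx_burst_fn_t tx_pkt_burst;
	void **rx_queues;
	void **tx_queues;
	struct eth_dev_private *priv;
};
```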

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-16 13:02  0%           ` Bruce Richardson
@ 2021-06-16 15:01  0%             ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2021-06-16 15:01 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Jerin Jacob, Thomas Monjalon, dpdk-dev, Olivier Matz,
	Andrew Rybchenko, Honnappa Nagarahalli, Ananyev, Konstantin,
	Ferruh Yigit, Jerin Jacob, Akhil Goyal

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Wednesday, 16 June 2021 15.03
> 
> On Wed, Jun 16, 2021 at 01:27:17PM +0200, Morten Brørup wrote:
> > > From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > > Sent: Wednesday, 16 June 2021 11.42
> > >
> > > On Tue, Jun 15, 2021 at 12:18 PM Thomas Monjalon
> <thomas@monjalon.net>
> > > wrote:
> > > >
> > > > 14/06/2021 17:48, Morten Brørup:
> > > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> > > Monjalon
> > > > > It would be much simpler to just increase RTE_MAX_ETHPORTS to
> > > something big enough to hold a sufficiently large array. And
> possibly
> > > add an rte_max_ethports variable to indicate the number of
> populated
> > > entries in the array, for use when iterating over the array.
> > > > >
> > > > > Can we come up with another example besides RTE_MAX_ETHPORTS where
> > > this library provides a greater benefit?
> > > >
> > > > What is big enough?
> > > > Is 640KB enough for RAM? ;)
> > >
> > > If I understand it correctly, the Linux process allocates 640KB
> > > because struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS] is
> > > currently global and lives in the BSS.
> >
> > Correct.
> >
> > > If we allocate this from the heap, i.e. use malloc(), then in my
> > > understanding Linux won't actually allocate the backing pages for
> > > this memory until someone reads or writes it.
> >
> > If the array is allocated from the heap, its members will be accessed
> through a pointer to the array, e.g. in rte_eth_rx/tx_burst(). This
> might affect performance, which is probably why the array is allocated
> the way it is.
> >
> 
> It depends on whether the array contains pointers to malloced elements
> or
> the array itself is just a single malloced array of all the structures.
> While I think the parray proposal referred to the former - which would
> have
> an extra level of indirection - the switch we are discussing here is
> the
> latter which should have no performance difference, since the method of
> accessing the elements will be the same, only with the base address
> pointing to a different area of memory.

I was not talking about an array of pointers. And it is not the same:

int arr[27];
int * parr = arr;

// direct access
int dir(int i) { return arr[i]; }

// indirect access
int indir(int i) { return parr[i]; }

The direct access knows the address of arr, so it will compile to:
        movsx   rdi, edi
        mov     eax, DWORD PTR arr[0+rdi*4]
        ret

The indirect access needs to first read the memory location holding the pointer to the array, and then it can read the array member, so it will compile to:
        mov     rax, QWORD PTR parr[rip]
        movsx   rdi, edi
        mov     eax, DWORD PTR [rax+rdi*4]
        ret

> 
> > Although it might be worth investigating how much it actually affects
> the performance.
> >
> > So we need to do something else if we want to conserve memory and
> still allow a large rte_eth_devices[] array.
> >
> > Looking at struct rte_eth_dev, we could reduce its size as follows:
> >
> > 1. Change the two callback arrays
> post_rx/pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT] to pointers to
> callback arrays, which are allocated from the heap.
> > With the default RTE_MAX_QUEUES_PER_PORT of 1024, these two arrays
> are the sinners that make the struct rte_eth_dev use so much memory.
> This modification would save 16 KB (minus 16 bytes for the pointers to
> the two arrays) per port.
> > Furthermore, these callback arrays would only need to be allocated if
> the application is compiled with callbacks enabled (#define
> RTE_ETHDEV_RXTX_CALLBACKS). And they would only need to be sized to the
> actual number of queues for the port.
> >
> > The disadvantage is that this would add another level of indirection,
> although only for applications compiled with callbacks enabled.
> >
> This seems reasonable to at least investigate.
> 
> > 2. Remove reserved_64s[4] and reserved_ptrs[4]. This would save 64
> bytes per port. Not much, but worth considering if we are changing the
> API/ABI anyway.
> >
> I strongly dislike reserved fields, so I would tend to favour removing
> them. However, doing so may reduce future compatibility if we do need
> to add something to ethdev.

There should be an official policy about adding reserved fields for future compatibility. I'm against adding them, unless it can be argued that they are likely to match what is needed in the future; in the real world there is no way to know if they match future requirements.

> 
> Another option is to split ethdev into fast-path and non-fastpath parts
> -
> similar to Konstantin's suggestion of just having an array of the ops.
> We
> can have an array of minimal structures with fastpath ops and queue
> pointers, for example, with an ethdev-private pointer to the rest of
> the
> struct elsewhere in memory. Since that second struct would be allocated
> on-demand, the size of the ethdev array can be scaled with far smaller
> footprint.
> 
> /Bruce

The rte_eth_dev structures are really well organized now. E.g. the rx/tx function pointers and the pointer to the shared memory data of the driver are in the same cache line. We must be very careful if we change them.

Also, rte_ethdev.h and rte_ethdev_core.h are easy to read and understand.

-Morten

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  @ 2021-06-17 13:08  3%       ` Ferruh Yigit
  2021-06-17 14:58  0%         ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-06-17 13:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil

On 6/14/2021 4:54 PM, Ananyev, Konstantin wrote:
> 
> 
>>>
>>> 14/06/2021 15:15, Bruce Richardson:
>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
>>>>>> Sent: Monday, 14 June 2021 12.59
>>>>>>
>>>>>> Performance of access in a fixed-size array is very good
>>>>>> because of cache locality
>>>>>> and because there is a single pointer to dereference.
>>>>>> The only drawback is the lack of flexibility:
>>>>>> the size of such an array cannot be increased at runtime.
>>>>>>
>>>>>> An approach to this problem is to allocate the array at runtime,
>>>>>> being as efficient as static arrays, but still limited to a maximum.
>>>>>>
>>>>>> That's why the API rte_parray is introduced,
>>>>>> allowing an array of pointers to be declared which can be resized
>>>>>> dynamically
>>>>>> and automatically at runtime while keeping a good read performance.
>>>>>>
>>>>>> After resize, the previous array is kept until the next resize
>>>>>> to avoid crashes during a read without any lock.
>>>>>>
>>>>>> Each element is a pointer to a memory chunk dynamically allocated.
>>>>>> This is not good for cache locality but it allows to keep the same
>>>>>> memory per element, no matter how the array is resized.
>>>>>> Cache locality could be improved with mempools.
>>>>>> The other drawback is having to dereference one more pointer
>>>>>> to read an element.
>>>>>>
>>>>>> There are not many locks, so the API is for internal use only.
>>>>>> This API may be used to completely remove some compilation-time
>>>>>> maximums.
>>>>>
>>>>> I get the purpose and overall intention of this library.
>>>>>
>>>>> I probably already mentioned that I prefer "embedded style programming" with fixed size arrays, rather than runtime configurability.
>> It's
>>> my personal opinion, and the DPDK Tech Board clearly prefers reducing the amount of compile time configurability, so there is no way for
>>> me to stop this progress, and I do not intend to oppose this library. :-)
>>>>>
>>>>> This library is likely to become a core library of DPDK, so I think it is important getting it right. Could you please mention a few
>> examples
>>> where you think this internal library should be used, and where it should not be used. Then it is easier to discuss if the border line between
>>> control path and data plane is correct. E.g. this library is not intended to be used for dynamically sized packet queues that grow and shrink
>> in
>>> the fast path.
>>>>>
>>>>> If the library becomes a core DPDK library, it should probably be public instead of internal. E.g. if the library is used to make
>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some applications might also need dynamically sized arrays for their
>>> application specific per-port runtime data, and this library could serve that purpose too.
>>>>>
>>>>
>>>> Thanks Thomas for starting this discussion and Morten for follow-up.
>>>>
>>>> My thinking is as follows, and I'm particularly keeping in mind the cases
>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
>>>>
>>>> While I dislike the hard-coded limits in DPDK, I'm also not convinced that
>>>> we should switch away from the flat arrays or that we need fully dynamic
>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest a half-way
>>>> house here, where we keep the ethdevs as an array, but one allocated/sized
>>>> at runtime rather than statically. This would allow us to have a
>>>> compile-time default value, but, for use cases that need it, allow use of a
>>>> flag e.g.  "max-ethdevs" to change the size of the parameter given to the
>>>> malloc call for the array.  This max limit could then be provided to apps
>>>> too if they want to match any array sizes. [Alternatively those apps could
>>>> check the provided size and error out if the size has been increased beyond
>>>> what the app is designed to use?]. There would be no extra dereferences per
>>>> rx/tx burst call in this scenario so performance should be the same as
>>>> before (potentially better if array is in hugepage memory, I suppose).
>>>
>>> I think we need some benchmarks to decide what is the best tradeoff.
>>> I spent time on this implementation, but sorry I won't have time for benchmarks.
>>> Volunteers?
>>
>> I had only a quick look at your approach so far.
>> But from what I can read, in MT environment your suggestion will require
>> extra synchronization for each read-write access to such parray element (lock, rcu, ...).
>> I think what Bruce suggests will be much lighter, easier to implement and less error prone.
>> At least for rte_ethdevs[] and friends.
>> Konstantin
> 
> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
> any regressions.
> That could still be flat array with max_size specified at application startup.
> 2. Hide rest of rte_ethdev struct in .c.
> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
> (flat array, vector, hash, linked list) without ABI/API breakages.
> 
> Yes, it would require all PMDs to change the prototype of the pkt_rx_burst() function
> (to accept port_id, queue_id instead of a queue pointer), but the change is a mechanical one.
> Probably some macro can be provided to simplify it.
> 
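The proposal quoted above could look roughly like this sketch (names and the fixed table size are illustrative only; the real table would be sized at application startup):

```c
#include <stdint.h>
#include <stddef.h>

struct rte_mbuf;  /* opaque for this sketch */

/* Hypothetical 'fast ops' entry; only the rx burst pointer is shown.
 * Note the new prototype takes port_id/queue_id, not a queue pointer. */
typedef uint16_t (*eth_rx_burst_t)(uint16_t port_id, uint16_t queue_id,
				   struct rte_mbuf **pkts, uint16_t n);

struct eth_fast_ops {
	eth_rx_burst_t rx_pkt_burst;
};

/* Public flat array; kept public so the inline wrappers still work. */
static struct eth_fast_ops eth_fast_ops_tbl[32];

/* The inline wrapper keeps the existing call style for applications. */
static inline uint16_t
eth_rx_burst(uint16_t port_id, uint16_t queue_id,
	     struct rte_mbuf **pkts, uint16_t n)
{
	return eth_fast_ops_tbl[port_id].rx_pkt_burst(port_id, queue_id,
						      pkts, n);
}

/* A stand-in PMD rx function using the new prototype. */
static uint16_t
dummy_rx_burst(uint16_t port_id, uint16_t queue_id,
	       struct rte_mbuf **pkts, uint16_t n)
{
	(void)port_id; (void)queue_id; (void)pkts;
	return n;  /* pretend we received a full burst */
}
```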

We are already planning some tasks for ABI stability for v21.11; I think
splitting 'struct rte_eth_dev' can be part of that task, as it enables
hiding more internal data.

> The only significant complication I can foresee with implementing that approach -
> we'll need an array of 'fast' function pointers per queue, not per device as we have now
> (to avoid extra indirection for callback implementation).
> Though as a bonus we'll have the ability to use different RX/TX functions per queue.
> 

What do you think about splitting the Rx/Tx callbacks into their own struct too?

Overall 'rte_eth_dev' can be split into three as:
1. rte_eth_dev
2. rte_eth_dev_burst
3. rte_eth_dev_cb

And we can hide 1 from applications even with the inline functions.
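A hypothetical sketch of that three-way split (field names invented; the real layout would be decided during the ABI work):

```c
/* 1: the full device struct, hidden in the ethdev .c file. */
struct rte_eth_dev;
/* Callback entry, opaque here. */
struct rxtx_cb;

/* 2: public, touched by the inline burst functions. */
struct rte_eth_dev_burst {
	void *rx_pkt_burst;
	void *tx_pkt_burst;
	void **queues;
	struct rte_eth_dev *dev;  /* back-pointer to the hidden part */
};

/* 3: allocated only when RTE_ETHDEV_RXTX_CALLBACKS is enabled. */
struct rte_eth_dev_cb {
	struct rxtx_cb **rx_cbs;
	struct rxtx_cb **tx_cbs;
};
```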



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 13:08  3%       ` Ferruh Yigit
@ 2021-06-17 14:58  0%         ` Ananyev, Konstantin
  2021-06-17 15:17  0%           ` Morten Brørup
  2021-06-17 15:44  3%           ` Ferruh Yigit
  0 siblings, 2 replies; 200+ results
From: Ananyev, Konstantin @ 2021-06-17 14:58 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil



> >>>
> >>> 14/06/2021 15:15, Bruce Richardson:
> >>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> >>>>>> Sent: Monday, 14 June 2021 12.59
> >>>>>>
> >>>>>> Performance of access in a fixed-size array is very good
> >>>>>> because of cache locality
> >>>>>> and because there is a single pointer to dereference.
> >>>>>> The only drawback is the lack of flexibility:
> >>>>>> the size of such an array cannot be increased at runtime.
> >>>>>>
> >>>>>> An approach to this problem is to allocate the array at runtime,
> >>>>>> being as efficient as static arrays, but still limited to a maximum.
> >>>>>>
> >>>>>> That's why the API rte_parray is introduced,
> >>>>>> allowing an array of pointers to be declared which can be resized
> >>>>>> dynamically
> >>>>>> and automatically at runtime while keeping a good read performance.
> >>>>>>
> >>>>>> After resize, the previous array is kept until the next resize
> >>>>>> to avoid crashes during a read without any lock.
> >>>>>>
> >>>>>> Each element is a pointer to a memory chunk dynamically allocated.
> >>>>>> This is not good for cache locality but it allows to keep the same
> >>>>>> memory per element, no matter how the array is resized.
> >>>>>> Cache locality could be improved with mempools.
> >>>>>> The other drawback is having to dereference one more pointer
> >>>>>> to read an element.
> >>>>>>
> >>>>>> There are not many locks, so the API is for internal use only.
> >>>>>> This API may be used to completely remove some compilation-time
> >>>>>> maximums.
> >>>>>
> >>>>> I get the purpose and overall intention of this library.
> >>>>>
> >>>>> I probably already mentioned that I prefer "embedded style programming" with fixed size arrays, rather than runtime configurability.
> >> It's
> >>> my personal opinion, and the DPDK Tech Board clearly prefers reducing the amount of compile time configurability, so there is no way
> for
> >>> me to stop this progress, and I do not intend to oppose this library. :-)
> >>>>>
> >>>>> This library is likely to become a core library of DPDK, so I think it is important getting it right. Could you please mention a few
> >> examples
> >>> where you think this internal library should be used, and where it should not be used. Then it is easier to discuss if the border line
> between
> >>> control path and data plane is correct. E.g. this library is not intended to be used for dynamically sized packet queues that grow and
> shrink
> >> in
> >>> the fast path.
> >>>>>
> >>>>> If the library becomes a core DPDK library, it should probably be public instead of internal. E.g. if the library is used to make
> >>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some applications might also need dynamically sized arrays for their
> >>> application specific per-port runtime data, and this library could serve that purpose too.
> >>>>>
> >>>>
> >>>> Thanks Thomas for starting this discussion and Morten for follow-up.
> >>>>
> >>>> My thinking is as follows, and I'm particularly keeping in mind the cases
> >>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> >>>>
> >>>> While I dislike the hard-coded limits in DPDK, I'm also not convinced that
> >>>> we should switch away from the flat arrays or that we need fully dynamic
> >>>> arrays that grow/shrink at runtime for ethdevs. I would suggest a half-way
> >>>> house here, where we keep the ethdevs as an array, but one allocated/sized
> >>>> at runtime rather than statically. This would allow us to have a
> >>>> compile-time default value, but, for use cases that need it, allow use of a
> >>>> flag e.g.  "max-ethdevs" to change the size of the parameter given to the
> >>>> malloc call for the array.  This max limit could then be provided to apps
> >>>> too if they want to match any array sizes. [Alternatively those apps could
> >>>> check the provided size and error out if the size has been increased beyond
> >>>> what the app is designed to use?]. There would be no extra dereferences per
> >>>> rx/tx burst call in this scenario so performance should be the same as
> >>>> before (potentially better if array is in hugepage memory, I suppose).
> >>>
> >>> I think we need some benchmarks to decide what is the best tradeoff.
> >>> I spent time on this implementation, but sorry I won't have time for benchmarks.
> >>> Volunteers?
> >>
> >> I had only a quick look at your approach so far.
> >> But from what I can read, in MT environment your suggestion will require
> >> extra synchronization for each read-write access to such parray element (lock, rcu, ...).
> >> I think what Bruce suggests will be much lighter, easier to implement and less error prone.
> >> At least for rte_ethdevs[] and friends.
> >> Konstantin
> >
> > One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
> > 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
> > We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
> > any regressions.
> > That could still be flat array with max_size specified at application startup.
> > 2. Hide rest of rte_ethdev struct in .c.
> > That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
> > (flat array, vector, hash, linked list) without ABI/API breakages.
> >
> > Yes, it would require all PMDs to change the prototype of the pkt_rx_burst() function
> > (to accept port_id, queue_id instead of a queue pointer), but the change is a mechanical one.
> > Probably some macro can be provided to simplify it.
> >
> 
> We are already planning some tasks for ABI stability for v21.11; I think
> splitting 'struct rte_eth_dev' can be part of that task, as it enables
> hiding more internal data.

Ok, sounds good.

> 
> > The only significant complication I can foresee with implementing that approach -
> > we'll need an array of 'fast' function pointers per queue, not per device as we have now
> > (to avoid extra indirection for callback implementation).
> > Though as a bonus we'll have the ability to use different RX/TX functions per queue.
> >
> 
> What do you think about splitting the Rx/Tx callbacks into their own struct too?
> 
> Overall 'rte_eth_dev' can be split into three as:
> 1. rte_eth_dev
> 2. rte_eth_dev_burst
> 3. rte_eth_dev_cb
> 
> And we can hide 1 from applications even with the inline functions.

As discussed off-line, I think it is possible.
My absolute preference would be to have just 1/2 (with CB hidden).
But even with 1/2/3 in place I think it would be a good step forward.
Probably worth starting with 1/2/3 first and then seeing how difficult it
would be to switch to 1/2.
Do you plan to start working on it?
 
Konstantin





^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce IPv4 ihl and version fields
    @ 2021-06-17 15:02  3% ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2021-06-17 15:02 UTC (permalink / raw)
  To: Gregory Etelson
  Cc: dev, matan, orika, rasland, Bernard Iremonger, Olivier Matz

On Thu, May 27, 2021 at 06:28:58PM +0300, Gregory Etelson wrote:
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index 4b728969c1..684bb028b2 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -38,7 +38,21 @@ extern "C" {
>   * IPv4 Header
>   */
>  struct rte_ipv4_hdr {
> -	uint8_t  version_ihl;		/**< version and header length */
> +	__extension__

this patch reduces compiler portability, though I'm not strictly
objecting so long as the community accepts that it may lead to
conditional compilation being introduced in a future change.

please also be mindful of the impact of __attribute__ ((__packed__)) in
the presence of bitfields on gcc when evaluating abi compatibility.

> +	union {
> +		uint8_t version_ihl;    /**< version and header length */
> +		struct {
> +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
> +			uint8_t ihl:4;
> +			uint8_t version:4;
> +#elif RTE_BYTE_ORDER == RTE_BIG_ENDIAN
> +			uint8_t version:4;
> +			uint8_t ihl:4;
> +#else
> +#error "setup endian definition"
> +#endif
> +		};
> +	};
>  	uint8_t  type_of_service;	/**< type of service */
>  	rte_be16_t total_length;	/**< length of packet */
>  	rte_be16_t packet_id;		/**< packet ID */
> -- 
> 2.31.1
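For reference, a stand-alone reduction of the proposed layout (struct name invented; only the union member is reproduced, little-endian branch shown). Bit-field order within a byte is implementation-defined, which is why the patch guards the declaration with RTE_BYTE_ORDER:

```c
#include <stdint.h>

/* Reduced sketch of the proposed rte_ipv4_hdr field (hypothetical name;
 * on big-endian targets the two bit-fields would be declared in the
 * opposite order, as in the patch above). */
struct ipv4_hdr_sketch {
	union {
		uint8_t version_ihl;    /* version and header length */
		struct {
			uint8_t ihl:4;      /* header length, in 32-bit words */
			uint8_t version:4;  /* always 4 for IPv4 */
		};
	};
};
```

Either the combined `version_ihl` byte or the individual `version`/`ihl` fields can be used; the union keeps both views over the same storage.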

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 14:58  0%         ` Ananyev, Konstantin
@ 2021-06-17 15:17  0%           ` Morten Brørup
  2021-06-17 16:12  0%             ` Ferruh Yigit
  2021-06-17 15:44  3%           ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2021-06-17 15:17 UTC (permalink / raw)
  To: Ananyev, Konstantin, Yigit, Ferruh, Thomas Monjalon, Richardson, Bruce
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	jerinj, gakhil

> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Thursday, 17 June 2021 16.59
> 
> > >>>
> > >>> 14/06/2021 15:15, Bruce Richardson:
> > >>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> > >>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> Monjalon
> > >>>>>> Sent: Monday, 14 June 2021 12.59
> > >>>>>>
> > >>>>>> Performance of access in a fixed-size array is very good
> > >>>>>> because of cache locality
> > >>>>>> and because there is a single pointer to dereference.
> > >>>>>> The only drawback is the lack of flexibility:
> > >>>>>> the size of such an array cannot be increased at runtime.
> > >>>>>>
> > >>>>>> An approach to this problem is to allocate the array at
> runtime,
> > >>>>>> being as efficient as static arrays, but still limited to a
> maximum.
> > >>>>>>
> > >>>>>> That's why the API rte_parray is introduced,
> > >>>>>> allowing an array of pointers to be declared which can be resized
> > >>>>>> dynamically
> > >>>>>> and automatically at runtime while keeping a good read
> performance.
> > >>>>>>
> > >>>>>> After resize, the previous array is kept until the next resize
> > >>>>>> to avoid crashes during a read without any lock.
> > >>>>>>
> > >>>>>> Each element is a pointer to a memory chunk dynamically
> allocated.
> > >>>>>> This is not good for cache locality but it allows to keep the
> same
> > >>>>>> memory per element, no matter how the array is resized.
> > >>>>>> Cache locality could be improved with mempools.
> > >>>>>> The other drawback is having to dereference one more pointer
> > >>>>>> to read an element.
> > >>>>>>
> > >>>>>> There are not many locks, so the API is for internal use only.
> > >>>>>> This API may be used to completely remove some compilation-
> time
> > >>>>>> maximums.
> > >>>>>
> > >>>>> I get the purpose and overall intention of this library.
> > >>>>>
> > >>>>> I probably already mentioned that I prefer "embedded style
> programming" with fixed size arrays, rather than runtime
> configurability.
> > >> It's
> > >>> my personal opinion, and the DPDK Tech Board clearly prefers
> reducing the amount of compile time configurability, so there is no way
> > for
> > >>> me to stop this progress, and I do not intend to oppose this
> library. :-)
> > >>>>>
> > >>>>> This library is likely to become a core library of DPDK, so I
> think it is important getting it right. Could you please mention a few
> > >> examples
> > >>> where you think this internal library should be used, and where
> it should not be used. Then it is easier to discuss if the border line
> > between
> > >>> control path and data plane is correct. E.g. this library is not
> intended to be used for dynamically sized packet queues that grow and
> > shrink
> > >> in
> > >>> the fast path.
> > >>>>>
> > >>>>> If the library becomes a core DPDK library, it should probably
> be public instead of internal. E.g. if the library is used to make
> > >>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some
> applications might also need dynamically sized arrays for their
> > >>> application specific per-port runtime data, and this library
> could serve that purpose too.
> > >>>>>
> > >>>>
> > >>>> Thanks Thomas for starting this discussion and Morten for
> follow-up.
> > >>>>
> > >>>> My thinking is as follows, and I'm particularly keeping in mind
> the cases
> > >>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> > >>>>
> > >>>> While I dislike the hard-coded limits in DPDK, I'm also not
> convinced that
> > >>>> we should switch away from the flat arrays or that we need fully
> dynamic
> > >>>> arrays that grow/shrink at runtime for ethdevs. I would suggest
> a half-way
> > >>>> house here, where we keep the ethdevs as an array, but one
> allocated/sized
> > >>>> at runtime rather than statically. This would allow us to have a
> > >>>> compile-time default value, but, for use cases that need it,
> allow use of a
> > >>>> flag e.g.  "max-ethdevs" to change the size of the parameter
> given to the
> > >>>> malloc call for the array.  This max limit could then be
> provided to apps
> > >>>> too if they want to match any array sizes. [Alternatively those
> apps could
> > >>>> check the provided size and error out if the size has been
> increased beyond
> > >>>> what the app is designed to use?]. There would be no extra
> dereferences per
> > >>>> rx/tx burst call in this scenario so performance should be the
> same as
> > >>>> before (potentially better if array is in hugepage memory, I
> suppose).
> > >>>
> > >>> I think we need some benchmarks to decide what is the best
> tradeoff.
> > >>> I spent time on this implementation, but sorry I won't have time
> for benchmarks.
> > >>> Volunteers?
> > >>
> > >> I had only a quick look at your approach so far.
> > >> But from what I can read, in MT environment your suggestion will
> require
> > >> extra synchronization for each read-write access to such parray
> element (lock, rcu, ...).
> > >> I think what Bruce suggests will be much lighter, easier to
> implement and less error prone.
> > >> At least for rte_ethdevs[] and friends.
> > >> Konstantin
> > >
> > > One more thought here - if we are talking about rte_ethdev[] in
> particular, I think  we can:
> > > 1. move public function pointers (rx_pkt_burst(), etc.) from
> rte_ethdev into a separate flat array.
> > > We can keep it public to still use inline functions for 'fast'
> calls rte_eth_rx_burst(), etc. to avoid
> > > any regressions.
> > > That could still be flat array with max_size specified at
> application startup.
> > > 2. Hide rest of rte_ethdev struct in .c.
> > > That will allow us to change the struct itself and the whole
> rte_ethdev[] table in a way we like
> > > (flat array, vector, hash, linked list) without ABI/API breakages.
> > >
> > > Yes, it would require all PMDs to change prototype for
> pkt_rx_burst() function
> > > (to accept port_id, queue_id instead of queue pointer), but the
> change is a mechanical one.
> > > Probably some macro can be provided to simplify it.
> > >
> >
> > We are already planning some tasks for ABI stability for v21.11, I
> think
> > splitting 'struct rte_eth_dev' can be part of that task, it enables
> hiding more
> > internal data.
> 
> Ok, sounds good.
> 
> >
> > > The only significant complication I can foresee with implementing
> that approach -
> > > we'll need an array of 'fast' function pointers per queue, not
> per device as we have now
> > > (to avoid extra indirection for callback implementation).
> > > Though as a bonus we'll have the ability to use different RX/TX
> functions per queue.
> > >
> >
> > > What do you think about splitting the Rx/Tx callbacks into their own struct too?
> >
> > Overall 'rte_eth_dev' can be split into three as:
> > 1. rte_eth_dev
> > 2. rte_eth_dev_burst
> > 3. rte_eth_dev_cb
> >
> > And we can hide 1 from applications even with the inline functions.
> 
> As discussed off-line, I think:
> it is possible.
> My absolute preference would be to have just 1/2 (with CB hidden).
> But even with 1/2/3 in place I think it would be a good step forward.
> Probably worth starting with 1/2/3 first and then see how difficult it
> would be to switch to 1/2.
> Do you plan to start working on it?
> 
> Konstantin

If you do proceed with this, be very careful. E.g. the inlined rx/tx burst functions should not touch more cache lines than they do today - especially if there are many active ports. The inlined rx/tx burst functions are very simple, so thorough code review (and possibly also of the resulting assembly) is appropriate. Simple performance testing might not detect if more cache lines are accessed than before the modifications.

Don't get me wrong... I do consider this an improvement of the ethdev library; I'm only asking you to take extra care!
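To make that concern concrete: one possible build-time guard is a static assertion that the hot fields stay within a single cache line, so the fast path cannot silently grow. The struct and macro below are purely illustrative, not actual DPDK code:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative fast-path struct only -- not the real DPDK definition. */
struct fast_path_sketch {
	uint16_t (*rx_pkt_burst)(void *queue, void **pkts, uint16_t n);
	uint16_t (*tx_pkt_burst)(void *queue, void **pkts, uint16_t n);
	void *data;
};

/* Verify at build time that two fields the inline burst functions touch
 * sit in the same 64-byte cache line; if a later change pushes them
 * apart, the build fails instead of performance silently regressing. */
#define SAME_CACHE_LINE(t, a, b) \
	_Static_assert(offsetof(t, a) / 64 == offsetof(t, b) / 64, \
		       #a " and " #b " must share a cache line")

SAME_CACHE_LINE(struct fast_path_sketch, rx_pkt_burst, data);
```

Such a check complements, rather than replaces, review of the generated assembly.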

-Morten


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 14:58  0%         ` Ananyev, Konstantin
  2021-06-17 15:17  0%           ` Morten Brørup
@ 2021-06-17 15:44  3%           ` Ferruh Yigit
  2021-06-18 10:41  0%             ` Ananyev, Konstantin
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-06-17 15:44 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil

On 6/17/2021 3:58 PM, Ananyev, Konstantin wrote:
> 
> 
>>>>>
>>>>> 14/06/2021 15:15, Bruce Richardson:
>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
>>>>>>>> Sent: Monday, 14 June 2021 12.59
>>>>>>>>
>>>>>>>> Performance of access in a fixed-size array is very good
>>>>>>>> because of cache locality
>>>>>>>> and because there is a single pointer to dereference.
>>>>>>>> The only drawback is the lack of flexibility:
> >>>>>>>> the size of such an array cannot be increased at runtime.
>>>>>>>>
>>>>>>>> An approach to this problem is to allocate the array at runtime,
>>>>>>>> being as efficient as static arrays, but still limited to a maximum.
>>>>>>>>
>>>>>>>> That's why the API rte_parray is introduced,
> >>>>>>>> allowing declaration of an array of pointers which can be resized
>>>>>>>> dynamically
>>>>>>>> and automatically at runtime while keeping a good read performance.
>>>>>>>>
>>>>>>>> After resize, the previous array is kept until the next resize
> >>>>>>>> to avoid crashes during a read without any lock.
>>>>>>>>
>>>>>>>> Each element is a pointer to a memory chunk dynamically allocated.
>>>>>>>> This is not good for cache locality but it allows to keep the same
>>>>>>>> memory per element, no matter how the array is resized.
>>>>>>>> Cache locality could be improved with mempools.
>>>>>>>> The other drawback is having to dereference one more pointer
>>>>>>>> to read an element.
>>>>>>>>
> >>>>>>>> There are few locks, so the API is for internal use only.
>>>>>>>> This API may be used to completely remove some compilation-time
>>>>>>>> maximums.
>>>>>>>
>>>>>>> I get the purpose and overall intention of this library.
>>>>>>>
>>>>>>> I probably already mentioned that I prefer "embedded style programming" with fixed size arrays, rather than runtime configurability.
>>>> It's
>>>>> my personal opinion, and the DPDK Tech Board clearly prefers reducing the amount of compile time configurability, so there is no way
>> for
>>>>> me to stop this progress, and I do not intend to oppose to this library. :-)
>>>>>>>
>>>>>>> This library is likely to become a core library of DPDK, so I think it is important getting it right. Could you please mention a few
>>>> examples
>>>>> where you think this internal library should be used, and where it should not be used. Then it is easier to discuss if the border line
>> between
>>>>> control path and data plane is correct. E.g. this library is not intended to be used for dynamically sized packet queues that grow and
>> shrink
>>>> in
>>>>> the fast path.
>>>>>>>
>>>>>>> If the library becomes a core DPDK library, it should probably be public instead of internal. E.g. if the library is used to make
>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some applications might also need dynamically sized arrays for their
>>>>> application specific per-port runtime data, and this library could serve that purpose too.
>>>>>>>
>>>>>>
>>>>>> Thanks Thomas for starting this discussion and Morten for follow-up.
>>>>>>
>>>>>> My thinking is as follows, and I'm particularly keeping in mind the cases
>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
>>>>>>
>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not convinced that
>>>>>> we should switch away from the flat arrays or that we need fully dynamic
>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest a half-way
>>>>>> house here, where we keep the ethdevs as an array, but one allocated/sized
>>>>>> at runtime rather than statically. This would allow us to have a
>>>>>> compile-time default value, but, for use cases that need it, allow use of a
>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter given to the
>>>>>> malloc call for the array.  This max limit could then be provided to apps
>>>>>> too if they want to match any array sizes. [Alternatively those apps could
>>>>>> check the provided size and error out if the size has been increased beyond
>>>>>> what the app is designed to use?]. There would be no extra dereferences per
>>>>>> rx/tx burst call in this scenario so performance should be the same as
>>>>>> before (potentially better if array is in hugepage memory, I suppose).
>>>>>
>>>>> I think we need some benchmarks to decide what is the best tradeoff.
>>>>> I spent time on this implementation, but sorry I won't have time for benchmarks.
>>>>> Volunteers?
>>>>
>>>> I had only a quick look at your approach so far.
>>>> But from what I can read, in an MT environment your suggestion will require
>>>> extra synchronization for each read-write access to such a parray element (lock, rcu, ...).
>>>> I think what Bruce suggests will be much lighter, easier to implement and less error prone.
>>>> At least for rte_ethdevs[] and friends.
>>>> Konstantin
>>>
>>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
>>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
>>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
>>> any regressions.
>>> That could still be flat array with max_size specified at application startup.
>>> 2. Hide rest of rte_ethdev struct in .c.
>>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>
>>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
>>> (to accept port_id, queue_id instead of queue pointer), but the change is a mechanical one.
>>> Probably some macro can be provided to simplify it.
>>>
>>
>> We are already planning some tasks for ABI stability for v21.11, I think
>> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
>> internal data.
> 
> Ok, sounds good.
> 
>>
>>> The only significant complication I can foresee with implementing that approach -
>>> we'll need an array of 'fast' function pointers per queue, not per device as we have now
>>> (to avoid extra indirection for callback implementation).
>>> Though as a bonus we'll have the ability to use different RX/TX functions per queue.
>>>
>>
>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
>>
>> Overall 'rte_eth_dev' can be split into three as:
>> 1. rte_eth_dev
>> 2. rte_eth_dev_burst
>> 3. rte_eth_dev_cb
>>
>> And we can hide 1 from applications even with the inline functions.
> 
> As discussed off-line, I think:
> it is possible.
> My absolute preference would be to have just 1/2 (with CB hidden).

How can we hide the callbacks, since they are used by the inline burst functions?

> But even with 1/2/3 in place I think it would be a good step forward.
> Probably worth starting with 1/2/3 first and then see how difficult it
> would be to switch to 1/2.

What do you mean by switch to 1/2?

If we keep the inline functions and split the struct into the three structs above, we
can only hide 1; 2/3 will still be visible to apps because of the inline
functions. This way we will be able to hide more while keeping the same performance.
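As a rough illustration of the 1/2/3 split under discussion (all names and signatures below are invented for the sketch, not the real DPDK definitions):

```c
#include <stdint.h>
#include <stddef.h>

typedef uint16_t (*burst_fn)(uint16_t port_id, uint16_t queue_id,
			     void **pkts, uint16_t nb_pkts);

/* 2. "rte_eth_dev_burst": stays public so the inline rte_eth_rx_burst()
 *    can call through it without an extra library call. */
struct dev_burst_sketch {
	burst_fn rx_pkt_burst;
	burst_fn tx_pkt_burst;
};

/* 3. "rte_eth_dev_cb": public only because the inline fast path walks
 *    the callback list; it could be merged/hidden in the 1/2 variant. */
struct dev_cb_sketch {
	void *rx_cbs;
	void *tx_cbs;
};

/* 1. "rte_eth_dev": opaque, defined only inside the ethdev .c file,
 *    free to change layout without ABI/API breakage. */
struct dev_private_sketch;

/* The inline fast path then touches only struct 2 (and 3). */
static inline uint16_t
rx_burst_sketch(const struct dev_burst_sketch *b, uint16_t port,
		uint16_t queue, void **pkts, uint16_t n)
{
	return b->rx_pkt_burst(port, queue, pkts, n);
}

/* Stub driver burst function, standing in for a PMD implementation. */
static uint16_t
stub_rx(uint16_t port, uint16_t queue, void **pkts, uint16_t n)
{
	(void)port; (void)queue; (void)pkts;
	return n; /* pretend we received n packets */
}
```

Note the burst prototypes here take (port_id, queue_id), matching the mechanical PMD prototype change mentioned earlier in the thread.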

> Do you plan to start working on it?
> 

We are gathering the list of tasks for ABI stability; most probably they
will be worked on during v21.11. I can take this one.

> Konstantin
> 
> 
> 
> 


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 15:17  0%           ` Morten Brørup
@ 2021-06-17 16:12  0%             ` Ferruh Yigit
  2021-06-17 16:55  0%               ` Morten Brørup
  2021-06-17 17:05  0%               ` Ananyev, Konstantin
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2021-06-17 16:12 UTC (permalink / raw)
  To: Morten Brørup, Ananyev, Konstantin, Thomas Monjalon,
	Richardson, Bruce
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	jerinj, gakhil

On 6/17/2021 4:17 PM, Morten Brørup wrote:
>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>> Sent: Thursday, 17 June 2021 16.59
>>
>>>>>>
>>>>>> 14/06/2021 15:15, Bruce Richardson:
>>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
>>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
>> Monjalon
>>>>>>>>> Sent: Monday, 14 June 2021 12.59
>>>>>>>>>
>>>>>>>>> Performance of access in a fixed-size array is very good
>>>>>>>>> because of cache locality
>>>>>>>>> and because there is a single pointer to dereference.
>>>>>>>>> The only drawback is the lack of flexibility:
>>>>>>>>> the size of such an array cannot be increased at runtime.
>>>>>>>>>
>>>>>>>>> An approach to this problem is to allocate the array at
>> runtime,
>>>>>>>>> being as efficient as static arrays, but still limited to a
>> maximum.
>>>>>>>>>
>>>>>>>>> That's why the API rte_parray is introduced,
>>>>>>>>> allowing declaration of an array of pointers which can be resized
>>>>>>>>> dynamically
>>>>>>>>> and automatically at runtime while keeping a good read
>> performance.
>>>>>>>>>
>>>>>>>>> After resize, the previous array is kept until the next resize
>>>>>>>>> to avoid crashes during a read without any lock.
>>>>>>>>>
>>>>>>>>> Each element is a pointer to a memory chunk dynamically
>> allocated.
>>>>>>>>> This is not good for cache locality but it allows to keep the
>> same
>>>>>>>>> memory per element, no matter how the array is resized.
>>>>>>>>> Cache locality could be improved with mempools.
>>>>>>>>> The other drawback is having to dereference one more pointer
>>>>>>>>> to read an element.
>>>>>>>>>
>>>>>>>>> There are few locks, so the API is for internal use only.
>>>>>>>>> This API may be used to completely remove some compilation-
>> time
>>>>>>>>> maximums.
>>>>>>>>
>>>>>>>> I get the purpose and overall intention of this library.
>>>>>>>>
>>>>>>>> I probably already mentioned that I prefer "embedded style
>> programming" with fixed size arrays, rather than runtime
>> configurability.
>>>>> It's
>>>>>> my personal opinion, and the DPDK Tech Board clearly prefers
>> reducing the amount of compile time configurability, so there is no way
>>> for
>>>>>> me to stop this progress, and I do not intend to oppose to this
>> library. :-)
>>>>>>>>
>>>>>>>> This library is likely to become a core library of DPDK, so I
>> think it is important getting it right. Could you please mention a few
>>>>> examples
>>>>>> where you think this internal library should be used, and where
>> it should not be used. Then it is easier to discuss if the border line
>>> between
>>>>>> control path and data plane is correct. E.g. this library is not
>> intended to be used for dynamically sized packet queues that grow and
>>> shrink
>>>>> in
>>>>>> the fast path.
>>>>>>>>
>>>>>>>> If the library becomes a core DPDK library, it should probably
>> be public instead of internal. E.g. if the library is used to make
>>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some
>> applications might also need dynamically sized arrays for their
>>>>>> application specific per-port runtime data, and this library
>> could serve that purpose too.
>>>>>>>>
>>>>>>>
>>>>>>> Thanks Thomas for starting this discussion and Morten for
>> follow-up.
>>>>>>>
>>>>>>> My thinking is as follows, and I'm particularly keeping in mind
>> the cases
>>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
>>>>>>>
>>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not
>> convinced that
>>>>>>> we should switch away from the flat arrays or that we need fully
>> dynamic
>>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest
>> a half-way
>>>>>>> house here, where we keep the ethdevs as an array, but one
>> allocated/sized
>>>>>>> at runtime rather than statically. This would allow us to have a
>>>>>>> compile-time default value, but, for use cases that need it,
>> allow use of a
>>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter
>> given to the
>>>>>>> malloc call for the array.  This max limit could then be
>> provided to apps
>>>>>>> too if they want to match any array sizes. [Alternatively those
>> apps could
>>>>>>> check the provided size and error out if the size has been
>> increased beyond
>>>>>>> what the app is designed to use?]. There would be no extra
>> dereferences per
>>>>>>> rx/tx burst call in this scenario so performance should be the
>> same as
>>>>>>> before (potentially better if array is in hugepage memory, I
>> suppose).
>>>>>>
>>>>>> I think we need some benchmarks to decide what is the best
>> tradeoff.
>>>>>> I spent time on this implementation, but sorry I won't have time
>> for benchmarks.
>>>>>> Volunteers?
>>>>>
>>>>> I had only a quick look at your approach so far.
>>>>> But from what I can read, in an MT environment your suggestion will
>> require
>>>>> extra synchronization for each read-write access to such a parray
>> element (lock, rcu, ...).
>>>>> I think what Bruce suggests will be much lighter, easier to
>> implement and less error prone.
>>>>> At least for rte_ethdevs[] and friends.
>>>>> Konstantin
>>>>
>>>> One more thought here - if we are talking about rte_ethdev[] in
>> particular, I think  we can:
>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from
>> rte_ethdev into a separate flat array.
>>>> We can keep it public to still use inline functions for 'fast'
>> calls rte_eth_rx_burst(), etc. to avoid
>>>> any regressions.
>>>> That could still be flat array with max_size specified at
>> application startup.
>>>> 2. Hide rest of rte_ethdev struct in .c.
>>>> That will allow us to change the struct itself and the whole
>> rte_ethdev[] table in a way we like
>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>>
>>>> Yes, it would require all PMDs to change prototype for
>> pkt_rx_burst() function
>>>> (to accept port_id, queue_id instead of queue pointer), but the
>> change is a mechanical one.
>>>> Probably some macro can be provided to simplify it.
>>>>
>>>
>>> We are already planning some tasks for ABI stability for v21.11, I
>> think
>>> splitting 'struct rte_eth_dev' can be part of that task, it enables
>> hiding more
>>> internal data.
>>
>> Ok, sounds good.
>>
>>>
>>>> The only significant complication I can foresee with implementing
>> that approach -
>>>> we'll need an array of 'fast' function pointers per queue, not
>> per device as we have now
>>>> (to avoid extra indirection for callback implementation).
>>>> Though as a bonus we'll have the ability to use different RX/TX
>> functions per queue.
>>>>
>>>
>>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
>>>
>>> Overall 'rte_eth_dev' can be split into three as:
>>> 1. rte_eth_dev
>>> 2. rte_eth_dev_burst
>>> 3. rte_eth_dev_cb
>>>
>>> And we can hide 1 from applications even with the inline functions.
>>
>> As discussed off-line, I think:
>> it is possible.
>> My absolute preference would be to have just 1/2 (with CB hidden).
>> But even with 1/2/3 in place I think it would be a good step forward.
>> Probably worth starting with 1/2/3 first and then see how difficult it
>> would be to switch to 1/2.
>> Do you plan to start working on it?
>>
>> Konstantin
> 
> If you do proceed with this, be very careful. E.g. the inlined rx/tx burst functions should not touch more cache lines than they do today - especially if there are many active ports. The inlined rx/tx burst functions are very simple, so thorough code review (and possibly also of the resulting assembly) is appropriate. Simple performance testing might not detect if more cache lines are accessed than before the modifications.
> 
> Don't get me wrong... I do consider this an improvement of the ethdev library; I'm only asking you to take extra care!
> 

ack

If we split as above, I think the device-specific data ('struct rte_eth_dev_data')
should be part of 1 (rte_eth_dev), which means the Rx/Tx inline functions would
access an additional cache line.

To prevent this, what about duplicating 'data' in 2 (rte_eth_dev_burst)? We have
enough space for it to fit into a single cache line; currently it is:
struct rte_eth_dev {
        eth_rx_burst_t             rx_pkt_burst;         /*     0     8 */
        eth_tx_burst_t             tx_pkt_burst;         /*     8     8 */
        eth_tx_prep_t              tx_pkt_prepare;       /*    16     8 */
        eth_rx_queue_count_t       rx_queue_count;       /*    24     8 */
        eth_rx_descriptor_done_t   rx_descriptor_done;   /*    32     8 */
        eth_rx_descriptor_status_t rx_descriptor_status; /*    40     8 */
        eth_tx_descriptor_status_t tx_descriptor_status; /*    48     8 */
        struct rte_eth_dev_data *  data;                 /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

'rx_descriptor_done' is deprecated and will be removed.
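Assuming 8-byte pointers (LP64), the duplicated-'data' layout could be sketched like this; the field names follow the dump above, but this is not the real struct:

```c
#include <stdint.h>

typedef void (*fn_sketch)(void); /* stand-in for the real burst prototypes */

/* Sketch of a burst struct with 'data' duplicated into it. Dropping the
 * deprecated rx_descriptor_done leaves one 8-byte slot spare for a
 * future function pointer. */
struct dev_burst_line_sketch {
	fn_sketch rx_pkt_burst;          /*  0 */
	fn_sketch tx_pkt_burst;          /*  8 */
	fn_sketch tx_pkt_prepare;        /* 16 */
	fn_sketch rx_queue_count;        /* 24 */
	fn_sketch rx_descriptor_status;  /* 32 */
	fn_sketch tx_descriptor_status;  /* 40 */
	void *data;                      /* 48: duplicated per-device data */
	void *reserved;                  /* 56: room for one future entry */
};

/* On an LP64 target this fills exactly one 64-byte cache line. */
_Static_assert(sizeof(struct dev_burst_line_sketch) == 64,
	       "burst struct should fill exactly one cache line");
```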

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] net: introduce IPv4 ihl and version fields
  @ 2021-06-17 16:29  0%                     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-06-17 16:29 UTC (permalink / raw)
  To: Andrew Rybchenko, Olivier Matz, Gregory Etelson
  Cc: Iremonger, Bernard, Morten Brørup, dev, Matan Azrad,
	Ori Kam, Raslan Darawsheh, Asaf Penso, Thomas Monjalon

On 6/14/2021 5:36 PM, Andrew Rybchenko wrote:
> On 6/10/21 12:22 PM, Olivier Matz wrote:
>> Hi Gregory,
>>
>> On Thu, Jun 10, 2021 at 04:10:25AM +0000, Gregory Etelson wrote:
>>> Hello,
>>>
>>> There was no activity that patch for a long time.
>>> The patch is marked as failed, but we verified failed tests and concluded
>>> that the failures can be ignored.
>>> https://patchwork.dpdk.org/project/dpdk/patch/20210527152858.13312-1-getelson@nvidia.com/
>>>
>>> How should I proceed with this case ?
>>> Please advise.
>>>
>>
>> I like the idea of this patch: to me it is more convenient to access
>> these fields with a bitfield. I don't see a problem with using
>> bitfields here, glibc or FreeBSD netinet/ip.h are doing the same.
>>
>> However, as stated previously, this patch breaks the initialization API.
> 
> Very good point. I guess we overlooked it in a number of patches
> with fix RTE flow API items to start from corresponding network
> headers. We used unions there to avoid ABI breakage, but it looks
> like we have broken initialization API anyway.
> 

Hi Andrew,

What is broken with the flow API item updates? Can you please give an example?

> We should decide if initialization ABI breakage is a show-stopper
> for RTE flow API items switching to use network protocol headers.
> 
>> The DPDK ABI/API policy is described here:
>> http://doc.dpdk.org/guides/contributing/abi_policy.html#the-dpdk-abi-policy
>>
>>> From this document:
>>
>>    The API should only be changed for significant reasons, such as
>>    performance enhancements. API breakages due to changes such as
>>    reorganizing public structure fields for aesthetic or readability
>>    purposes should be avoided.
>>
>> So to follow the project policy, I think we should reject this path.
>>
>> Regards,
>> Olivier
>>
> 
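For context, the bitfield-plus-union style under discussion (the same approach as glibc's netinet/ip.h) looks roughly like the sketch below; this is illustrative, not the actual rte_ipv4_hdr proposal:

```c
#include <stdint.h>

/* Illustrative only -- not the real rte_ipv4_hdr. The union keeps the
 * legacy combined 'version_ihl' byte while adding bitfield access; the
 * bitfield order shown is for a little-endian GCC/Clang target. */
struct ipv4_hdr_sketch {
	union {
		uint8_t version_ihl;    /* legacy combined field */
		struct {
			uint8_t ihl:4;      /* header length, in 32-bit words */
			uint8_t version:4;  /* always 4 for IPv4 */
		};
	};
	uint8_t type_of_service;
	/* ... rest of the header ... */
};
```

Keeping the union is what avoids ABI breakage, but, as Olivier notes, designated initializers using 'version_ihl' versus the bitfields are what make the initialization API question tricky.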


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 16:12  0%             ` Ferruh Yigit
@ 2021-06-17 16:55  0%               ` Morten Brørup
  2021-06-18 10:21  0%                 ` Ferruh Yigit
  2021-06-17 17:05  0%               ` Ananyev, Konstantin
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2021-06-17 16:55 UTC (permalink / raw)
  To: Ferruh Yigit, Ananyev, Konstantin, Thomas Monjalon, Richardson, Bruce
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	jerinj, gakhil

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
> Sent: Thursday, 17 June 2021 18.13
> 
> On 6/17/2021 4:17 PM, Morten Brørup wrote:
> >> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> >> Sent: Thursday, 17 June 2021 16.59
> >>
> >>>>>>
> >>>>>> 14/06/2021 15:15, Bruce Richardson:
> >>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> >>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> >> Monjalon
> >>>>>>>>> Sent: Monday, 14 June 2021 12.59
> >>>>>>>>>
> >>>>>>>>> Performance of access in a fixed-size array is very good
> >>>>>>>>> because of cache locality
> >>>>>>>>> and because there is a single pointer to dereference.
> >>>>>>>>> The only drawback is the lack of flexibility:
> >>>>>>>>> the size of such an array cannot be increased at runtime.
> >>>>>>>>>
> >>>>>>>>> An approach to this problem is to allocate the array at
> >> runtime,
> >>>>>>>>> being as efficient as static arrays, but still limited to a
> >> maximum.
> >>>>>>>>>
> >>>>>>>>> That's why the API rte_parray is introduced,
> >>>>>>>>> allowing declaration of an array of pointers which can be resized
> >>>>>>>>> dynamically
> >>>>>>>>> and automatically at runtime while keeping a good read
> >> performance.
> >>>>>>>>>
> >>>>>>>>> After resize, the previous array is kept until the next
> resize
> >>>>>>>>> to avoid crashes during a read without any lock.
> >>>>>>>>>
> >>>>>>>>> Each element is a pointer to a memory chunk dynamically
> >> allocated.
> >>>>>>>>> This is not good for cache locality but it allows to keep the
> >> same
> >>>>>>>>> memory per element, no matter how the array is resized.
> >>>>>>>>> Cache locality could be improved with mempools.
> >>>>>>>>> The other drawback is having to dereference one more pointer
> >>>>>>>>> to read an element.
> >>>>>>>>>
> >>>>>>>>> There are few locks, so the API is for internal use only.
> >>>>>>>>> This API may be used to completely remove some compilation-
> >> time
> >>>>>>>>> maximums.
> >>>>>>>>
> >>>>>>>> I get the purpose and overall intention of this library.
> >>>>>>>>
> >>>>>>>> I probably already mentioned that I prefer "embedded style
> >> programming" with fixed size arrays, rather than runtime
> >> configurability.
> >>>>> It's
> >>>>>> my personal opinion, and the DPDK Tech Board clearly prefers
> >> reducing the amount of compile time configurability, so there is no
> way
> >>> for
> >>>>>> me to stop this progress, and I do not intend to oppose to this
> >> library. :-)
> >>>>>>>>
> >>>>>>>> This library is likely to become a core library of DPDK, so I
> >> think it is important getting it right. Could you please mention a
> few
> >>>>> examples
> >>>>>> where you think this internal library should be used, and where
> >> it should not be used. Then it is easier to discuss if the border
> line
> >>> between
> >>>>>> control path and data plane is correct. E.g. this library is not
> >> intended to be used for dynamically sized packet queues that grow
> and
> >>> shrink
> >>>>> in
> >>>>>> the fast path.
> >>>>>>>>
> >>>>>>>> If the library becomes a core DPDK library, it should probably
> >> be public instead of internal. E.g. if the library is used to make
> >>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then
> some
> >> applications might also need dynamically sized arrays for their
> >>>>>> application specific per-port runtime data, and this library
> >> could serve that purpose too.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Thanks Thomas for starting this discussion and Morten for
> >> follow-up.
> >>>>>>>
> >>>>>>> My thinking is as follows, and I'm particularly keeping in mind
> >> the cases
> >>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> >>>>>>>
> >>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not
> >> convinced that
> >>>>>>> we should switch away from the flat arrays or that we need
> fully
> >> dynamic
> >>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest
> >> a half-way
> >>>>>>> house here, where we keep the ethdevs as an array, but one
> >> allocated/sized
> >>>>>>> at runtime rather than statically. This would allow us to have
> a
> >>>>>>> compile-time default value, but, for use cases that need it,
> >> allow use of a
> >>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter
> >> given to the
> >>>>>>> malloc call for the array.  This max limit could then be
> >> provided to apps
> >>>>>>> too if they want to match any array sizes. [Alternatively those
> >> apps could
> >>>>>>> check the provided size and error out if the size has been
> >> increased beyond
> >>>>>>> what the app is designed to use?]. There would be no extra
> >> dereferences per
> >>>>>>> rx/tx burst call in this scenario so performance should be the
> >> same as
> >>>>>>> before (potentially better if array is in hugepage memory, I
> >> suppose).
> >>>>>>
> >>>>>> I think we need some benchmarks to decide what is the best
> >> tradeoff.
> >>>>>> I spent time on this implementation, but sorry I won't have time
> >> for benchmarks.
> >>>>>> Volunteers?
> >>>>>
> >>>>> I had only a quick look at your approach so far.
> >>>>> But from what I can read, in an MT environment your suggestion will
> >> require
> >>>>> extra synchronization for each read-write access to such a parray
> >> element (lock, rcu, ...).
> >>>>> I think what Bruce suggests will be much lighter, easier to
> >> implement and less error prone.
> >>>>> At least for rte_ethdevs[] and friends.
> >>>>> Konstantin
> >>>>
> >>>> One more thought here - if we are talking about rte_ethdev[] in
> >> particular, I think  we can:
> >>>> 1. move public function pointers (rx_pkt_burst(), etc.) from
> >> rte_ethdev into a separate flat array.
> >>>> We can keep it public to still use inline functions for 'fast'
> >> calls rte_eth_rx_burst(), etc. to avoid
> >>>> any regressions.
> >>>> That could still be flat array with max_size specified at
> >> application startup.
> >>>> 2. Hide rest of rte_ethdev struct in .c.
> >>>> That will allow us to change the struct itself and the whole
> >> rte_ethdev[] table in a way we like
> >>>> (flat array, vector, hash, linked list) without ABI/API breakages.
> >>>>
> >>>> Yes, it would require all PMDs to change prototype for
> >> pkt_rx_burst() function
> >>>> (to accept port_id, queue_id instead of queue pointer), but the
> >> change is a mechanical one.
> >>>> Probably some macro can be provided to simplify it.
> >>>>
> >>>
> >>> We are already planning some tasks for ABI stability for v21.11, I
> >> think
> >>> splitting 'struct rte_eth_dev' can be part of that task, it enables
> >> hiding more
> >>> internal data.
> >>
> >> Ok, sounds good.
> >>
> >>>
> >>>> The only significant complication I can foresee with implementing
> >> that approach -
> >>>> we'll need an array of 'fast' function pointers per queue, not
> >> per device as we have now
> >>>> (to avoid extra indirection for callback implementation).
> >>>> Though as a bonus we'll have the ability to use different RX/TX
> >> functions per queue.
> >>>>
> >>>
> >>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
> >>>
> >>> Overall 'rte_eth_dev' can be split into three as:
> >>> 1. rte_eth_dev
> >>> 2. rte_eth_dev_burst
> >>> 3. rte_eth_dev_cb
> >>>
> >>> And we can hide 1 from applications even with the inline functions.
> >>
> >> As discussed off-line, I think:
> >> it is possible.
> >> My absolute preference would be to have just 1/2 (with CB hidden).
> >> But even with 1/2/3 in place I think it would be a good step
> forward.
> >> Probably worth starting with 1/2/3 first and then see how difficult
> it
> >> would be to switch to 1/2.
> >> Do you plan to start working on it?
> >>
> >> Konstantin
> >
> > If you do proceed with this, be very careful. E.g. the inlined rx/tx
> burst functions should not touch more cache lines than they do today -
> especially if there are many active ports. The inlined rx/tx burst
> functions are very simple, so thorough code review (and possibly also
> of the resulting assembly) is appropriate. Simple performance testing
> might not detect if more cache lines are accessed than before the
> modifications.
> >
> > Don't get me wrong... I do consider this an improvement of the ethdev
> library; I'm only asking you to take extra care!
> >
> 
> ack
> 
> If we split as above, I think the device-specific data 'struct
> rte_eth_dev_data' should be part of 1 (rte_eth_dev), which means the
> Rx/Tx inline functions access an additional cache line.
> 
> To prevent this, what about duplicating 'data' in 2
> (rte_eth_dev_burst)? We have enough space for it to fit into a single
> cache line; currently it is:
> struct rte_eth_dev {
>         eth_rx_burst_t             rx_pkt_burst;         /*     0     8 */
>         eth_tx_burst_t             tx_pkt_burst;         /*     8     8 */
>         eth_tx_prep_t              tx_pkt_prepare;       /*    16     8 */
>         eth_rx_queue_count_t       rx_queue_count;       /*    24     8 */
>         eth_rx_descriptor_done_t   rx_descriptor_done;   /*    32     8 */
>         eth_rx_descriptor_status_t rx_descriptor_status; /*    40     8 */
>         eth_tx_descriptor_status_t tx_descriptor_status; /*    48     8 */
>         struct rte_eth_dev_data *  data;                 /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
> 
> 'rx_descriptor_done' is deprecated and will be removed;

Makes sense.

Also consider moving 'data' to the top of the new struct, so there is room to add future functions below. (Without growing beyond one cache line, one new function can be added once 'rx_descriptor_done' has been removed.)
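For illustration, the reordered struct could look roughly like the sketch below. This is hypothetical, not the actual DPDK definition: the struct and typedef names are illustrative, and the 64-byte total assumes 8-byte pointers (LP64).

```c
/* Hypothetical sketch of the reordered fast-path struct, with 'data'
 * moved to offset 0 so new function pointers can be appended below it
 * without crossing the 64-byte cache line boundary. Names and layout
 * are illustrative only; assumes 8-byte pointers. */
#include <stddef.h>
#include <stdint.h>

struct rte_mbuf;
struct rte_eth_dev_data;

typedef uint16_t (*eth_burst_t)(void *queue, struct rte_mbuf **pkts,
                                uint16_t nb_pkts);

struct rte_eth_dev_burst {
        struct rte_eth_dev_data *data;        /*  0  8 - moved to top */
        eth_burst_t rx_pkt_burst;             /*  8  8 */
        eth_burst_t tx_pkt_burst;             /* 16  8 */
        eth_burst_t tx_pkt_prepare;           /* 24  8 */
        void *rx_queue_count;                 /* 32  8 */
        void *rx_descriptor_status;           /* 40  8 */
        void *tx_descriptor_status;           /* 48  8 */
        void *future_fn;                      /* 56  8 - slot freed once
                                               * rx_descriptor_done goes */
};

/* On a 64-bit target the whole struct still fits one cache line. */
_Static_assert(sizeof(struct rte_eth_dev_burst) == 64,
               "fast-path struct must fit one cache line");
```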


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 16:12  0%             ` Ferruh Yigit
  2021-06-17 16:55  0%               ` Morten Brørup
@ 2021-06-17 17:05  0%               ` Ananyev, Konstantin
  2021-06-18 10:28  0%                 ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-06-17 17:05 UTC (permalink / raw)
  To: Yigit, Ferruh, Morten Brørup, Thomas Monjalon, Richardson, Bruce
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	jerinj, gakhil


 
> On 6/17/2021 4:17 PM, Morten Brørup wrote:
> >> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> >> Sent: Thursday, 17 June 2021 16.59
> >>
> >>>>>>
> >>>>>> 14/06/2021 15:15, Bruce Richardson:
> >>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> >>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
> >> Monjalon
> >>>>>>>>> Sent: Monday, 14 June 2021 12.59
> >>>>>>>>>
> >>>>>>>>> Performance of access in a fixed-size array is very good
> >>>>>>>>> because of cache locality
> >>>>>>>>> and because there is a single pointer to dereference.
> >>>>>>>>> The only drawback is the lack of flexibility:
> >>>>>>>>> the size of such an array cannot be increased at runtime.
> >>>>>>>>>
> >>>>>>>>> An approach to this problem is to allocate the array at
> >> runtime,
> >>>>>>>>> being as efficient as static arrays, but still limited to a
> >> maximum.
> >>>>>>>>>
> >>>>>>>>> That's why the API rte_parray is introduced,
> >>>>>>>>> allowing declaration of an array of pointers which can be resized
> >>>>>>>>> dynamically
> >>>>>>>>> and automatically at runtime while keeping a good read
> >> performance.
> >>>>>>>>>
> >>>>>>>>> After resize, the previous array is kept until the next resize
> >>>>>>>>> to avoid crashes during a read without any lock.
> >>>>>>>>>
> >>>>>>>>> Each element is a pointer to a memory chunk dynamically
> >> allocated.
> >>>>>>>>> This is not good for cache locality but it allows keeping the
> >> same
> >>>>>>>>> memory per element, no matter how the array is resized.
> >>>>>>>>> Cache locality could be improved with mempools.
> >>>>>>>>> The other drawback is having to dereference one more pointer
> >>>>>>>>> to read an element.
> >>>>>>>>>
> >>>>>>>>> There are not many locks, so the API is for internal use only.
> >>>>>>>>> This API may be used to completely remove some compilation-
> >> time
> >>>>>>>>> maximums.
> >>>>>>>>
> >>>>>>>> I get the purpose and overall intention of this library.
> >>>>>>>>
> >>>>>>>> I probably already mentioned that I prefer "embedded style
> >> programming" with fixed size arrays, rather than runtime
> >> configurability.
> >>>>> It's
> >>>>>> my personal opinion, and the DPDK Tech Board clearly prefers
> >> reducing the amount of compile time configurability, so there is no way
> >>> for
> >>>>>> me to stop this progress, and I do not intend to oppose to this
> >> library. :-)
> >>>>>>>>
> >>>>>>>> This library is likely to become a core library of DPDK, so I
> >> think it is important getting it right. Could you please mention a few
> >>>>> examples
> >>>>>> where you think this internal library should be used, and where
> >> it should not be used. Then it is easier to discuss if the border line
> >>> between
> >>>>>> control path and data plane is correct. E.g. this library is not
> >> intended to be used for dynamically sized packet queues that grow and
> >>> shrink
> >>>>> in
> >>>>>> the fast path.
> >>>>>>>>
> >>>>>>>> If the library becomes a core DPDK library, it should probably
> >> be public instead of internal. E.g. if the library is used to make
> >>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some
> >> applications might also need dynamically sized arrays for their
> >>>>>> application specific per-port runtime data, and this library
> >> could serve that purpose too.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Thanks Thomas for starting this discussion and Morten for
> >> follow-up.
> >>>>>>>
> >>>>>>> My thinking is as follows, and I'm particularly keeping in mind
> >> the cases
> >>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> >>>>>>>
> >>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not
> >> convinced that
> >>>>>>> we should switch away from the flat arrays or that we need fully
> >> dynamic
> >>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest
> >> a half-way
> >>>>>>> house here, where we keep the ethdevs as an array, but one
> >> allocated/sized
> >>>>>>> at runtime rather than statically. This would allow us to have a
> >>>>>>> compile-time default value, but, for use cases that need it,
> >> allow use of a
> >>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter
> >> given to the
> >>>>>>> malloc call for the array.  This max limit could then be
> >> provided to apps
> >>>>>>> too if they want to match any array sizes. [Alternatively those
> >> apps could
> >>>>>>> check the provided size and error out if the size has been
> >> increased beyond
> >>>>>>> what the app is designed to use?]. There would be no extra
> >> dereferences per
> >>>>>>> rx/tx burst call in this scenario so performance should be the
> >> same as
> >>>>>>> before (potentially better if array is in hugepage memory, I
> >> suppose).
> >>>>>>
> >>>>>> I think we need some benchmarks to decide what is the best
> >> tradeoff.
> >>>>>> I spent time on this implementation, but sorry I won't have time
> >> for benchmarks.
> >>>>>> Volunteers?
> >>>>>
> >>>>> I had only a quick look at your approach so far.
> >>>>> But from what I can read, in MT environment your suggestion will
> >> require
> >>>>> extra synchronization for each read-write access to such a parray
> >> element (lock, rcu, ...).
> >>>>> I think what Bruce suggests will be much lighter, easier to
> >> implement and less error prone.
> >>>>> At least for rte_ethdevs[] and friends.
> >>>>> Konstantin
> >>>>
> >>>> One more thought here - if we are talking about rte_ethdev[] in
> >> particular, I think  we can:
> >>>> 1. move public function pointers (rx_pkt_burst(), etc.) from
> >> rte_ethdev into a separate flat array.
> >>>> We can keep it public to still use inline functions for 'fast'
> >> calls rte_eth_rx_burst(), etc. to avoid
> >>>> any regressions.
> >>>> That could still be flat array with max_size specified at
> >> application startup.
> >>>> 2. Hide rest of rte_ethdev struct in .c.
> >>>> That will allow us to change the struct itself and the whole
> >> rte_ethdev[] table in a way we like
> >>>> (flat array, vector, hash, linked list) without ABI/API breakages.
> >>>>
> >>>> Yes, it would require all PMDs to change prototype for
> >> pkt_rx_burst() function
> >>>> (to accept port_id, queue_id instead of queue pointer), but the
> >> change is mechanical one.
> >>>> Probably some macro can be provided to simplify it.
> >>>>
> >>>
> >>> We are already planning some tasks for ABI stability for v21.11, I
> >> think
> >>> splitting 'struct rte_eth_dev' can be part of that task, it enables
> >> hiding more
> >>> internal data.
> >>
> >> Ok, sounds good.
> >>
> >>>
> >>>> The only significant complication I can foresee with implementing
> >> that approach -
> >>>> we'll need an array of 'fast' function pointers per queue, not
> >> per device as we have now
> >>>> (to avoid extra indirection for the callback implementation).
> >>>> Though as a bonus we'll have the ability to use different RX/TX
> >> functions per queue.
> >>>>
> >>>
> >>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
> >>>
> >>> Overall 'rte_eth_dev' can be split into three as:
> >>> 1. rte_eth_dev
> >>> 2. rte_eth_dev_burst
> >>> 3. rte_eth_dev_cb
> >>>
> >>> And we can hide 1 from applications even with the inline functions.
> >>
> >> As discussed off-line, I think:
> >> it is possible.
> >> My absolute preference would be to have just 1/2 (with CB hidden).
> >> But even with 1/2/3 in place I think it would be a good step forward.
> >> Probably worth starting with 1/2/3 first and then seeing how difficult it
> >> would be to switch to 1/2.
> >> Do you plan to start working on it?
> >>
> >> Konstantin
> >
> > If you do proceed with this, be very careful. E.g. the inlined rx/tx burst functions should not touch more cache lines than they do today -
> especially if there are many active ports. The inlined rx/tx burst functions are very simple, so thorough code review (and possibly also of the
> resulting assembly) is appropriate. Simple performance testing might not detect if more cache lines are accessed than before the
> modifications.
> >
> > Don't get me wrong... I do consider this an improvement of the ethdev library; I'm only asking you to take extra care!
> >
> 
> ack
> 
> If we split as above, I think the device-specific data 'struct rte_eth_dev_data'
> should be part of 1 (rte_eth_dev), which means the Rx/Tx inline functions access
> an additional cache line.
> 
> To prevent this, what about duplicating 'data' in 2 (rte_eth_dev_burst)? 

I think it would be better to change rx_pkt_burst() to accept port_id and queue_id,
instead of void *.
> I.e.:
typedef uint16_t (*eth_rx_burst_t)(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts,  uint16_t nb_pkts);

And we can do actual de-referencing of private rxq data inside the actual rx function.
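A minimal sketch of what a PMD Rx routine could look like under that prototype. All names here (the queue struct, table, and function) are hypothetical, not actual DPDK or driver symbols, and the descriptor handling is elided:

```c
/* Hypothetical PMD Rx function using the proposed (port_id, queue_id)
 * prototype; the lookup table and queue struct are illustrative only. */
#include <stdint.h>

struct rte_mbuf;

struct my_rxq {
        uint16_t nb_avail;      /* descriptors ready (illustrative) */
};

#define MY_MAX_PORTS  4
#define MY_MAX_QUEUES 4

/* Filled by the driver at queue-setup time. */
static struct my_rxq *my_rxqs[MY_MAX_PORTS][MY_MAX_QUEUES];

static uint16_t
my_rx_burst(uint16_t port_id, uint16_t queue_id,
            struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
        /* De-reference the private rxq data inside the driver,
         * instead of receiving a void * queue pointer. */
        struct my_rxq *rxq = my_rxqs[port_id][queue_id];
        uint16_t n = rxq->nb_avail < nb_pkts ? rxq->nb_avail : nb_pkts;

        (void)rx_pkts;          /* descriptor/mbuf handling elided */
        rxq->nb_avail -= n;
        return n;
}
```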

> We have
> enough space for it to fit into a single cache line; currently it is:
> struct rte_eth_dev {
>         eth_rx_burst_t             rx_pkt_burst;         /*     0     8 */
>         eth_tx_burst_t             tx_pkt_burst;         /*     8     8 */
>         eth_tx_prep_t              tx_pkt_prepare;       /*    16     8 */
>         eth_rx_queue_count_t       rx_queue_count;       /*    24     8 */
>         eth_rx_descriptor_done_t   rx_descriptor_done;   /*    32     8 */
>         eth_rx_descriptor_status_t rx_descriptor_status; /*    40     8 */
>         eth_tx_descriptor_status_t tx_descriptor_status; /*    48     8 */
>         struct rte_eth_dev_data *  data;                 /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
> 
> 'rx_descriptor_done' is deprecated and will be removed;


* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 16:55  0%               ` Morten Brørup
@ 2021-06-18 10:21  0%                 ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-06-18 10:21 UTC (permalink / raw)
  To: Morten Brørup, Ananyev, Konstantin, Thomas Monjalon,
	Richardson, Bruce
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	jerinj, gakhil

On 6/17/2021 5:55 PM, Morten Brørup wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit
>> Sent: Thursday, 17 June 2021 18.13
>>
>> On 6/17/2021 4:17 PM, Morten Brørup wrote:
>>>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>>>> Sent: Thursday, 17 June 2021 16.59
>>>>
>>>>>>>>
>>>>>>>> 14/06/2021 15:15, Bruce Richardson:
>>>>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
>>>>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
>>>> Monjalon
>>>>>>>>>>> Sent: Monday, 14 June 2021 12.59
>>>>>>>>>>>
>>>>>>>>>>> Performance of access in a fixed-size array is very good
>>>>>>>>>>> because of cache locality
>>>>>>>>>>> and because there is a single pointer to dereference.
>>>>>>>>>>> The only drawback is the lack of flexibility:
>>>>>>>>>>> the size of such an array cannot be increased at runtime.
>>>>>>>>>>>
>>>>>>>>>>> An approach to this problem is to allocate the array at
>>>> runtime,
>>>>>>>>>>> being as efficient as static arrays, but still limited to a
>>>> maximum.
>>>>>>>>>>>
>>>>>>>>>>> That's why the API rte_parray is introduced,
>>>>>>>>>>> allowing declaration of an array of pointers which can be resized
>>>>>>>>>>> dynamically
>>>>>>>>>>> and automatically at runtime while keeping a good read
>>>> performance.
>>>>>>>>>>>
>>>>>>>>>>> After resize, the previous array is kept until the next
>> resize
>>>>>>>>>>> to avoid crashes during a read without any lock.
>>>>>>>>>>>
>>>>>>>>>>> Each element is a pointer to a memory chunk dynamically
>>>> allocated.
>>>>>>>>>>> This is not good for cache locality but it allows keeping the
>>>> same
>>>>>>>>>>> memory per element, no matter how the array is resized.
>>>>>>>>>>> Cache locality could be improved with mempools.
>>>>>>>>>>> The other drawback is having to dereference one more pointer
>>>>>>>>>>> to read an element.
>>>>>>>>>>>
>>>>>>>>>>> There are not many locks, so the API is for internal use only.
>>>>>>>>>>> This API may be used to completely remove some compilation-
>>>> time
>>>>>>>>>>> maximums.
>>>>>>>>>>
>>>>>>>>>> I get the purpose and overall intention of this library.
>>>>>>>>>>
>>>>>>>>>> I probably already mentioned that I prefer "embedded style
>>>> programming" with fixed size arrays, rather than runtime
>>>> configurability.
>>>>>>> It's
>>>>>>>> my personal opinion, and the DPDK Tech Board clearly prefers
>>>> reducing the amount of compile time configurability, so there is no
>> way
>>>>> for
>>>>>>>> me to stop this progress, and I do not intend to oppose to this
>>>> library. :-)
>>>>>>>>>>
>>>>>>>>>> This library is likely to become a core library of DPDK, so I
>>>> think it is important getting it right. Could you please mention a
>> few
>>>>>>> examples
>>>>>>>> where you think this internal library should be used, and where
>>>> it should not be used. Then it is easier to discuss if the border
>> line
>>>>> between
>>>>>>>> control path and data plane is correct. E.g. this library is not
>>>> intended to be used for dynamically sized packet queues that grow
>> and
>>>>> shrink
>>>>>>> in
>>>>>>>> the fast path.
>>>>>>>>>>
>>>>>>>>>> If the library becomes a core DPDK library, it should probably
>>>> be public instead of internal. E.g. if the library is used to make
>>>>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then
>> some
>>>> applications might also need dynamically sized arrays for their
>>>>>>>> application specific per-port runtime data, and this library
>>>> could serve that purpose too.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Thomas for starting this discussion and Morten for
>>>> follow-up.
>>>>>>>>>
>>>>>>>>> My thinking is as follows, and I'm particularly keeping in mind
>>>> the cases
>>>>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
>>>>>>>>>
>>>>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not
>>>> convinced that
>>>>>>>>> we should switch away from the flat arrays or that we need
>> fully
>>>> dynamic
>>>>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest
>>>> a half-way
>>>>>>>>> house here, where we keep the ethdevs as an array, but one
>>>> allocated/sized
>>>>>>>>> at runtime rather than statically. This would allow us to have
>> a
>>>>>>>>> compile-time default value, but, for use cases that need it,
>>>> allow use of a
>>>>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter
>>>> given to the
>>>>>>>>> malloc call for the array.  This max limit could then be
>>>> provided to apps
>>>>>>>>> too if they want to match any array sizes. [Alternatively those
>>>> apps could
>>>>>>>>> check the provided size and error out if the size has been
>>>> increased beyond
>>>>>>>>> what the app is designed to use?]. There would be no extra
>>>> dereferences per
>>>>>>>>> rx/tx burst call in this scenario so performance should be the
>>>> same as
>>>>>>>>> before (potentially better if array is in hugepage memory, I
>>>> suppose).
>>>>>>>>
>>>>>>>> I think we need some benchmarks to decide what is the best
>>>> tradeoff.
>>>>>>>> I spent time on this implementation, but sorry I won't have time
>>>> for benchmarks.
>>>>>>>> Volunteers?
>>>>>>>
>>>>>>> I had only a quick look at your approach so far.
>>>>>>> But from what I can read, in MT environment your suggestion will
>>>> require
>>>>>>> extra synchronization for each read-write access to such a parray
>>>> element (lock, rcu, ...).
>>>>>>> I think what Bruce suggests will be much lighter, easier to
>>>> implement and less error prone.
>>>>>>> At least for rte_ethdevs[] and friends.
>>>>>>> Konstantin
>>>>>>
>>>>>> One more thought here - if we are talking about rte_ethdev[] in
>>>> particular, I think  we can:
>>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from
>>>> rte_ethdev into a separate flat array.
>>>>>> We can keep it public to still use inline functions for 'fast'
>>>> calls rte_eth_rx_burst(), etc. to avoid
>>>>>> any regressions.
>>>>>> That could still be flat array with max_size specified at
>>>> application startup.
>>>>>> 2. Hide rest of rte_ethdev struct in .c.
>>>>>> That will allow us to change the struct itself and the whole
>>>> rte_ethdev[] table in a way we like
>>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>>>>
>>>>>> Yes, it would require all PMDs to change prototype for
>>>> pkt_rx_burst() function
>>>>>> (to accept port_id, queue_id instead of queue pointer), but the
>>>> change is mechanical one.
>>>>>> Probably some macro can be provided to simplify it.
>>>>>>
>>>>>
>>>>> We are already planning some tasks for ABI stability for v21.11, I
>>>> think
>>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables
>>>> hiding more
>>>>> internal data.
>>>>
>>>> Ok, sounds good.
>>>>
>>>>>
>>>>>> The only significant complication I can foresee with implementing
>>>> that approach -
>>>>>> we'll need an array of 'fast' function pointers per queue, not
>>>> per device as we have now
>>>>>> (to avoid extra indirection for the callback implementation).
>>>>>> Though as a bonus we'll have the ability to use different RX/TX
>>>> functions per queue.
>>>>>>
>>>>>
>>>>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
>>>>>
>>>>> Overall 'rte_eth_dev' can be split into three as:
>>>>> 1. rte_eth_dev
>>>>> 2. rte_eth_dev_burst
>>>>> 3. rte_eth_dev_cb
>>>>>
>>>>> And we can hide 1 from applications even with the inline functions.
>>>>
>>>> As discussed off-line, I think:
>>>> it is possible.
>>>> My absolute preference would be to have just 1/2 (with CB hidden).
>>>> But even with 1/2/3 in place I think it would be a good step
>> forward.
>>>> Probably worth starting with 1/2/3 first and then seeing how difficult
>> it
>>>> would be to switch to 1/2.
>>>> Do you plan to start working on it?
>>>>
>>>> Konstantin
>>>
>>> If you do proceed with this, be very careful. E.g. the inlined rx/tx
>> burst functions should not touch more cache lines than they do today -
>> especially if there are many active ports. The inlined rx/tx burst
>> functions are very simple, so thorough code review (and possibly also
>> of the resulting assembly) is appropriate. Simple performance testing
>> might not detect if more cache lines are accessed than before the
>> modifications.
>>>
>>> Don't get me wrong... I do consider this an improvement of the ethdev
>> library; I'm only asking you to take extra care!
>>>
>>
>> ack
>>
>> If we split as above, I think the device-specific data 'struct
>> rte_eth_dev_data' should be part of 1 (rte_eth_dev), which means the
>> Rx/Tx inline functions access an additional cache line.
>>
>> To prevent this, what about duplicating 'data' in 2
>> (rte_eth_dev_burst)? We have enough space for it to fit into a single
>> cache line; currently it is:
>> struct rte_eth_dev {
>>         eth_rx_burst_t             rx_pkt_burst;         /*     0     8 */
>>         eth_tx_burst_t             tx_pkt_burst;         /*     8     8 */
>>         eth_tx_prep_t              tx_pkt_prepare;       /*    16     8 */
>>         eth_rx_queue_count_t       rx_queue_count;       /*    24     8 */
>>         eth_rx_descriptor_done_t   rx_descriptor_done;   /*    32     8 */
>>         eth_rx_descriptor_status_t rx_descriptor_status; /*    40     8 */
>>         eth_tx_descriptor_status_t tx_descriptor_status; /*    48     8 */
>>         struct rte_eth_dev_data *  data;                 /*    56     8 */
>>         /* --- cacheline 1 boundary (64 bytes) --- */
>>
>> 'rx_descriptor_done' is deprecated and will be removed;
> 
> Makes sense.
> 
> Also consider moving 'data' to the top of the new struct, so there is room to add future functions below. (Without growing beyond one cache line, one new function can be added once 'rx_descriptor_done' has been removed.)
> 

+1


* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 17:05  0%               ` Ananyev, Konstantin
@ 2021-06-18 10:28  0%                 ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-06-18 10:28 UTC (permalink / raw)
  To: Ananyev, Konstantin, Morten Brørup, Thomas Monjalon,
	Richardson, Bruce
  Cc: dev, olivier.matz, andrew.rybchenko, honnappa.nagarahalli,
	jerinj, gakhil

On 6/17/2021 6:05 PM, Ananyev, Konstantin wrote:
> 
> 
>> On 6/17/2021 4:17 PM, Morten Brørup wrote:
>>>> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
>>>> Sent: Thursday, 17 June 2021 16.59
>>>>
>>>>>>>>
>>>>>>>> 14/06/2021 15:15, Bruce Richardson:
>>>>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
>>>>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas
>>>> Monjalon
>>>>>>>>>>> Sent: Monday, 14 June 2021 12.59
>>>>>>>>>>>
>>>>>>>>>>> Performance of access in a fixed-size array is very good
>>>>>>>>>>> because of cache locality
>>>>>>>>>>> and because there is a single pointer to dereference.
>>>>>>>>>>> The only drawback is the lack of flexibility:
>>>>>>>>>>> the size of such an array cannot be increased at runtime.
>>>>>>>>>>>
>>>>>>>>>>> An approach to this problem is to allocate the array at
>>>> runtime,
>>>>>>>>>>> being as efficient as static arrays, but still limited to a
>>>> maximum.
>>>>>>>>>>>
>>>>>>>>>>> That's why the API rte_parray is introduced,
>>>>>>>>>>> allowing declaration of an array of pointers which can be resized
>>>>>>>>>>> dynamically
>>>>>>>>>>> and automatically at runtime while keeping a good read
>>>> performance.
>>>>>>>>>>>
>>>>>>>>>>> After resize, the previous array is kept until the next resize
>>>>>>>>>>> to avoid crashes during a read without any lock.
>>>>>>>>>>>
>>>>>>>>>>> Each element is a pointer to a memory chunk dynamically
>>>> allocated.
>>>>>>>>>>> This is not good for cache locality but it allows keeping the
>>>> same
>>>>>>>>>>> memory per element, no matter how the array is resized.
>>>>>>>>>>> Cache locality could be improved with mempools.
>>>>>>>>>>> The other drawback is having to dereference one more pointer
>>>>>>>>>>> to read an element.
>>>>>>>>>>>
>>>>>>>>>>> There are not many locks, so the API is for internal use only.
>>>>>>>>>>> This API may be used to completely remove some compilation-
>>>> time
>>>>>>>>>>> maximums.
>>>>>>>>>>
>>>>>>>>>> I get the purpose and overall intention of this library.
>>>>>>>>>>
>>>>>>>>>> I probably already mentioned that I prefer "embedded style
>>>> programming" with fixed size arrays, rather than runtime
>>>> configurability.
>>>>>>> It's
>>>>>>>> my personal opinion, and the DPDK Tech Board clearly prefers
>>>> reducing the amount of compile time configurability, so there is no way
>>>>> for
>>>>>>>> me to stop this progress, and I do not intend to oppose to this
>>>> library. :-)
>>>>>>>>>>
>>>>>>>>>> This library is likely to become a core library of DPDK, so I
>>>> think it is important getting it right. Could you please mention a few
>>>>>>> examples
>>>>>>>> where you think this internal library should be used, and where
>>>> it should not be used. Then it is easier to discuss if the border line
>>>>> between
>>>>>>>> control path and data plane is correct. E.g. this library is not
>>>> intended to be used for dynamically sized packet queues that grow and
>>>>> shrink
>>>>>>> in
>>>>>>>> the fast path.
>>>>>>>>>>
>>>>>>>>>> If the library becomes a core DPDK library, it should probably
>>>> be public instead of internal. E.g. if the library is used to make
>>>>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some
>>>> applications might also need dynamically sized arrays for their
>>>>>>>> application specific per-port runtime data, and this library
>>>> could serve that purpose too.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks Thomas for starting this discussion and Morten for
>>>> follow-up.
>>>>>>>>>
>>>>>>>>> My thinking is as follows, and I'm particularly keeping in mind
>>>> the cases
>>>>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
>>>>>>>>>
>>>>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not
>>>> convinced that
>>>>>>>>> we should switch away from the flat arrays or that we need fully
>>>> dynamic
>>>>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest
>>>> a half-way
>>>>>>>>> house here, where we keep the ethdevs as an array, but one
>>>> allocated/sized
>>>>>>>>> at runtime rather than statically. This would allow us to have a
>>>>>>>>> compile-time default value, but, for use cases that need it,
>>>> allow use of a
>>>>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter
>>>> given to the
>>>>>>>>> malloc call for the array.  This max limit could then be
>>>> provided to apps
>>>>>>>>> too if they want to match any array sizes. [Alternatively those
>>>> apps could
>>>>>>>>> check the provided size and error out if the size has been
>>>> increased beyond
>>>>>>>>> what the app is designed to use?]. There would be no extra
>>>> dereferences per
>>>>>>>>> rx/tx burst call in this scenario so performance should be the
>>>> same as
>>>>>>>>> before (potentially better if array is in hugepage memory, I
>>>> suppose).
>>>>>>>>
>>>>>>>> I think we need some benchmarks to decide what is the best
>>>> tradeoff.
>>>>>>>> I spent time on this implementation, but sorry I won't have time
>>>> for benchmarks.
>>>>>>>> Volunteers?
>>>>>>>
>>>>>>> I had only a quick look at your approach so far.
>>>>>>> But from what I can read, in MT environment your suggestion will
>>>> require
>>>>>>> extra synchronization for each read-write access to such a parray
>>>> element (lock, rcu, ...).
>>>>>>> I think what Bruce suggests will be much lighter, easier to
>>>> implement and less error prone.
>>>>>>> At least for rte_ethdevs[] and friends.
>>>>>>> Konstantin
>>>>>>
>>>>>> One more thought here - if we are talking about rte_ethdev[] in
>>>> particular, I think  we can:
>>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from
>>>> rte_ethdev into a separate flat array.
>>>>>> We can keep it public to still use inline functions for 'fast'
>>>> calls rte_eth_rx_burst(), etc. to avoid
>>>>>> any regressions.
>>>>>> That could still be flat array with max_size specified at
>>>> application startup.
>>>>>> 2. Hide rest of rte_ethdev struct in .c.
>>>>>> That will allow us to change the struct itself and the whole
>>>> rte_ethdev[] table in a way we like
>>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>>>>
>>>>>> Yes, it would require all PMDs to change prototype for
>>>> pkt_rx_burst() function
>>>>>> (to accept port_id, queue_id instead of queue pointer), but the
>>>> change is mechanical one.
>>>>>> Probably some macro can be provided to simplify it.
>>>>>>
>>>>>
>>>>> We are already planning some tasks for ABI stability for v21.11, I
>>>> think
>>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables
>>>> hiding more
>>>>> internal data.
>>>>
>>>> Ok, sounds good.
>>>>
>>>>>
>>>>>> The only significant complication I can foresee with implementing
>>>> that approach -
>>>>>> we'll need an array of 'fast' function pointers per queue, not
>>>> per device as we have now
>>>>>> (to avoid extra indirection for the callback implementation).
>>>>>> Though as a bonus we'll have the ability to use different RX/TX
>>>> functions per queue.
>>>>>>
>>>>>
>>>>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
>>>>>
>>>>> Overall 'rte_eth_dev' can be split into three as:
>>>>> 1. rte_eth_dev
>>>>> 2. rte_eth_dev_burst
>>>>> 3. rte_eth_dev_cb
>>>>>
>>>>> And we can hide 1 from applications even with the inline functions.
>>>>
>>>> As discussed off-line, I think:
>>>> it is possible.
>>>> My absolute preference would be to have just 1/2 (with CB hidden).
>>>> But even with 1/2/3 in place I think it would be a good step forward.
>>>> Probably worth starting with 1/2/3 first and then seeing how difficult it
>>>> would be to switch to 1/2.
>>>> Do you plan to start working on it?
>>>>
>>>> Konstantin
>>>
>>> If you do proceed with this, be very careful. E.g. the inlined rx/tx burst functions should not touch more cache lines than they do today -
>> especially if there are many active ports. The inlined rx/tx burst functions are very simple, so thorough code review (and possibly also of the
>> resulting assembly) is appropriate. Simple performance testing might not detect if more cache lines are accessed than before the
>> modifications.
>>>
>>> Don't get me wrong... I do consider this an improvement of the ethdev library; I'm only asking you to take extra care!
>>>
>>
>> ack
>>
>> If we split as above, I think the device-specific data 'struct rte_eth_dev_data'
>> should be part of 1 (rte_eth_dev), which means the Rx/Tx inline functions access
>> an additional cache line.
>>
>> To prevent this, what about duplicating 'data' in 2 (rte_eth_dev_burst)?
> 
> I think it would be better to change rx_pkt_burst() to accept port_id and queue_id,
> instead of void *.
> I.e.:
> typedef uint16_t (*eth_rx_burst_t)(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts,  uint16_t nb_pkts);
> 

We may not need to add 'port_id', since in the callback you are already in the
driver scope and all required device-specific variables are already accessible
via the queue struct.

> And we can do actual de-referencing of private rxq data inside the actual rx function.
> 

Yes, we can replace the queue struct with 'queue_id' and do the dereferencing in
the Rx function instead of the burst API, but what is the benefit of it?
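For illustration, the two prototypes under discussion differ only in where the queue lookup happens. In this hypothetical sketch (none of the names are real DPDK or driver symbols), both variants perform the same single queue-pointer load, just in a different place:

```c
/* Hypothetical comparison of the two Rx prototypes under discussion.
 * Names are illustrative, not actual DPDK or driver symbols. */
#include <stdint.h>

struct rte_mbuf;
struct my_rxq { uint16_t nb_avail; };

#define MY_MAX_QUEUES 4
static struct my_rxq *my_rxqs[MY_MAX_QUEUES];

/* Today: the inline burst wrapper resolves the queue pointer and the
 * driver receives it as void *. */
static uint16_t
rx_by_ptr(void *queue, struct rte_mbuf **pkts, uint16_t n)
{
        struct my_rxq *rxq = queue;
        uint16_t got = rxq->nb_avail < n ? rxq->nb_avail : n;
        (void)pkts;             /* descriptor handling elided */
        return got;
}

/* Proposed: the driver resolves queue_id itself; the table lookup
 * simply moves from the inline wrapper into the driver. */
static uint16_t
rx_by_id(uint16_t queue_id, struct rte_mbuf **pkts, uint16_t n)
{
        struct my_rxq *rxq = my_rxqs[queue_id];
        uint16_t got = rxq->nb_avail < n ? rxq->nb_avail : n;
        (void)pkts;             /* descriptor handling elided */
        return got;
}
```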

>> We have
>> enough space for it to fit into a single cache line; currently it is:
>> struct rte_eth_dev {
>>         eth_rx_burst_t             rx_pkt_burst;         /*     0     8 */
>>         eth_tx_burst_t             tx_pkt_burst;         /*     8     8 */
>>         eth_tx_prep_t              tx_pkt_prepare;       /*    16     8 */
>>         eth_rx_queue_count_t       rx_queue_count;       /*    24     8 */
>>         eth_rx_descriptor_done_t   rx_descriptor_done;   /*    32     8 */
>>         eth_rx_descriptor_status_t rx_descriptor_status; /*    40     8 */
>>         eth_tx_descriptor_status_t tx_descriptor_status; /*    48     8 */
>>         struct rte_eth_dev_data *  data;                 /*    56     8 */
>>         /* --- cacheline 1 boundary (64 bytes) --- */
>>
>> 'rx_descriptor_done' is deprecated and will be removed;



* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-17 15:44  3%           ` Ferruh Yigit
@ 2021-06-18 10:41  0%             ` Ananyev, Konstantin
  2021-06-18 10:49  0%               ` Ferruh Yigit
  2021-06-21 11:06  0%               ` Ananyev, Konstantin
  0 siblings, 2 replies; 200+ results
From: Ananyev, Konstantin @ 2021-06-18 10:41 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil


> >>>>>
> >>>>> 14/06/2021 15:15, Bruce Richardson:
> >>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
> >>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> >>>>>>>> Sent: Monday, 14 June 2021 12.59
> >>>>>>>>
> >>>>>>>> Performance of access in a fixed-size array is very good
> >>>>>>>> because of cache locality
> >>>>>>>> and because there is a single pointer to dereference.
> >>>>>>>> The only drawback is the lack of flexibility:
> >>>>>>>> the size of such an array cannot be increased at runtime.
> >>>>>>>>
> >>>>>>>> An approach to this problem is to allocate the array at runtime,
> >>>>>>>> being as efficient as static arrays, but still limited to a maximum.
> >>>>>>>>
> >>>>>>>> That's why the API rte_parray is introduced,
> >>>>>>>> allowing to declare an array of pointer which can be resized
> >>>>>>>> dynamically
> >>>>>>>> and automatically at runtime while keeping a good read performance.
> >>>>>>>>
> >>>>>>>> After resize, the previous array is kept until the next resize
> >>>>>>>> to avoid crashes during a read without any lock.
> >>>>>>>>
> >>>>>>>> Each element is a pointer to a memory chunk dynamically allocated.
> >>>>>>>> This is not good for cache locality but it allows to keep the same
> >>>>>>>> memory per element, no matter how the array is resized.
> >>>>>>>> Cache locality could be improved with mempools.
> >>>>>>>> The other drawback is having to dereference one more pointer
> >>>>>>>> to read an element.
> >>>>>>>>
> >>>>>>>> There are not many locks, so the API is for internal use only.
> >>>>>>>> This API may be used to completely remove some compilation-time
> >>>>>>>> maximums.
> >>>>>>>
> >>>>>>> I get the purpose and overall intention of this library.
> >>>>>>>
> >>>>>>> I probably already mentioned that I prefer "embedded style programming" with fixed size arrays, rather than runtime
> configurability.
> >>>> It's
> >>>>> my personal opinion, and the DPDK Tech Board clearly prefers reducing the amount of compile time configurability, so there is no
> way
> >> for
> >>>>> me to stop this progress, and I do not intend to oppose to this library. :-)
> >>>>>>>
> >>>>>>> This library is likely to become a core library of DPDK, so I think it is important getting it right. Could you please mention a few
> >>>> examples
> >>>>> where you think this internal library should be used, and where it should not be used. Then it is easier to discuss if the border line
> >> between
> >>>>> control path and data plane is correct. E.g. this library is not intended to be used for dynamically sized packet queues that grow and
> >> shrink
> >>>> in
> >>>>> the fast path.
> >>>>>>>
> >>>>>>> If the library becomes a core DPDK library, it should probably be public instead of internal. E.g. if the library is used to make
> >>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some applications might also need dynamically sized arrays for
> their
> >>>>> application specific per-port runtime data, and this library could serve that purpose too.
> >>>>>>>
> >>>>>>
> >>>>>> Thanks Thomas for starting this discussion and Morten for follow-up.
> >>>>>>
> >>>>>> My thinking is as follows, and I'm particularly keeping in mind the cases
> >>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
> >>>>>>
> >>>>>> While I dislike the hard-coded limits in DPDK, I'm also not convinced that
> >>>>>> we should switch away from the flat arrays or that we need fully dynamic
> >>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest a half-way
> >>>>>> house here, where we keep the ethdevs as an array, but one allocated/sized
> >>>>>> at runtime rather than statically. This would allow us to have a
> >>>>>> compile-time default value, but, for use cases that need it, allow use of a
> >>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter given to the
> >>>>>> malloc call for the array.  This max limit could then be provided to apps
> >>>>>> too if they want to match any array sizes. [Alternatively those apps could
> >>>>>> check the provided size and error out if the size has been increased beyond
> >>>>>> what the app is designed to use?]. There would be no extra dereferences per
> >>>>>> rx/tx burst call in this scenario so performance should be the same as
> >>>>>> before (potentially better if array is in hugepage memory, I suppose).
> >>>>>
> >>>>> I think we need some benchmarks to decide what is the best tradeoff.
> >>>>> I spent time on this implementation, but sorry I won't have time for benchmarks.
> >>>>> Volunteers?
> >>>>
> >>>> I had only a quick look at your approach so far.
> >>>> But from what I can read, in MT environment your suggestion will require
> >>>> extra synchronization for each read-write access to such parray element (lock, rcu, ...).
> >>>> I think what Bruce suggests will be much lighter, easier to implement and less error prone.
> >>>> At least for rte_ethdevs[] and friends.
> >>>> Konstantin
> >>>
> >>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
> >>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
> >>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
> >>> any regressions.
> >>> That could still be flat array with max_size specified at application startup.
> >>> 2. Hide rest of rte_ethdev struct in .c.
> >>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
> >>> (flat array, vector, hash, linked list) without ABI/API breakages.
> >>>
> >>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
> >>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
> >>> Probably some macro can be provided to simplify it.
> >>>
> >>
> >> We are already planning some tasks for ABI stability for v21.11, I think
> >> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
> >> internal data.
> >
> > Ok, sounds good.
> >
> >>
> >>> The only significant complication I can foresee with implementing that approach -
> >>> we'll need an array of 'fast' function pointers per queue, not per device as we have now
> >>> (to avoid extra indirection for callback implementation).
> >>> Though as a bonus we'll have the ability to use different RX/TX functions per queue.
> >>>
> >>
> >> What do you think about splitting the Rx/Tx callbacks into their own struct too?
> >>
> >> Overall 'rte_eth_dev' can be split into three as:
> >> 1. rte_eth_dev
> >> 2. rte_eth_dev_burst
> >> 3. rte_eth_dev_cb
> >>
> >> And we can hide 1 from applications even with the inline functions.
> >
> > As discussed off-line, I think:
> > it is possible.
> > My absolute preference would be to have just 1/2 (with CB hidden).
> 
> How can we hide the callbacks since they are used by inline burst functions.

I probably owe a better explanation of what I meant in the first mail.
Otherwise it sounds confusing.
I'll try to write a more detailed one in the next few days.

> > But even with 1/2/3 in place I think it would be  a good step forward.
> > Probably worth to start with 1/2/3 first and then see how difficult it
> > would be to switch to 1/2.
> 
> What do you mean by switch to 1/2?

When we'll have just:
1. rte_eth_dev (hidden in .c)
2. rte_eth_dev_burst (visible)

And no specific public struct/array for callbacks - they will be hidden in rte_eth_dev.

> 
> If we keep having inline functions, and split struct as above three structs, we
> can only hide 1, and 2/3 will be still visible to apps because of inline
> functions. This way we will be able to hide more while still having the same performance.

I understand that, and as I said above - I think it is a good step forward.
Though even better would be to hide rte_eth_dev_cb too. 

> 
> > Do you plan to start working on it?
> >
> 
> We are gathering the list of the tasks for the ABI stability, most probably they
> will be worked on during v21.11. I can take this one.

Cool, please keep me in a loop.
I'll try to free some cycles for 21.11 to get involved and help (if needed, of course).
Konstantin



^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-18 10:41  0%             ` Ananyev, Konstantin
@ 2021-06-18 10:49  0%               ` Ferruh Yigit
  2021-06-21 11:06  0%               ` Ananyev, Konstantin
  1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-06-18 10:49 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil

On 6/18/2021 11:41 AM, Ananyev, Konstantin wrote:
> 
>>>>>>>
>>>>>>> 14/06/2021 15:15, Bruce Richardson:
>>>>>>>> On Mon, Jun 14, 2021 at 02:22:42PM +0200, Morten Brørup wrote:
>>>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
>>>>>>>>>> Sent: Monday, 14 June 2021 12.59
>>>>>>>>>>
>>>>>>>>>> Performance of access in a fixed-size array is very good
>>>>>>>>>> because of cache locality
>>>>>>>>>> and because there is a single pointer to dereference.
>>>>>>>>>> The only drawback is the lack of flexibility:
>>>>>>>>>> the size of such an array cannot be increased at runtime.
>>>>>>>>>>
>>>>>>>>>> An approach to this problem is to allocate the array at runtime,
>>>>>>>>>> being as efficient as static arrays, but still limited to a maximum.
>>>>>>>>>>
>>>>>>>>>> That's why the API rte_parray is introduced,
>>>>>>>>>> allowing to declare an array of pointer which can be resized
>>>>>>>>>> dynamically
>>>>>>>>>> and automatically at runtime while keeping a good read performance.
>>>>>>>>>>
>>>>>>>>>> After resize, the previous array is kept until the next resize
>>>>>>>>>> to avoid crashes during a read without any lock.
>>>>>>>>>>
>>>>>>>>>> Each element is a pointer to a memory chunk dynamically allocated.
>>>>>>>>>> This is not good for cache locality but it allows to keep the same
>>>>>>>>>> memory per element, no matter how the array is resized.
>>>>>>>>>> Cache locality could be improved with mempools.
>>>>>>>>>> The other drawback is having to dereference one more pointer
>>>>>>>>>> to read an element.
>>>>>>>>>>
>>>>>>>>>> There are not many locks, so the API is for internal use only.
>>>>>>>>>> This API may be used to completely remove some compilation-time
>>>>>>>>>> maximums.
>>>>>>>>>
>>>>>>>>> I get the purpose and overall intention of this library.
>>>>>>>>>
>>>>>>>>> I probably already mentioned that I prefer "embedded style programming" with fixed size arrays, rather than runtime
>> configurability.
>>>>>> It's
>>>>>>> my personal opinion, and the DPDK Tech Board clearly prefers reducing the amount of compile time configurability, so there is no
>> way
>>>> for
>>>>>>> me to stop this progress, and I do not intend to oppose to this library. :-)
>>>>>>>>>
>>>>>>>>> This library is likely to become a core library of DPDK, so I think it is important getting it right. Could you please mention a few
>>>>>> examples
>>>>>>> where you think this internal library should be used, and where it should not be used. Then it is easier to discuss if the border line
>>>> between
>>>>>>> control path and data plane is correct. E.g. this library is not intended to be used for dynamically sized packet queues that grow and
>>>> shrink
>>>>>> in
>>>>>>> the fast path.
>>>>>>>>>
>>>>>>>>> If the library becomes a core DPDK library, it should probably be public instead of internal. E.g. if the library is used to make
>>>>>>> RTE_MAX_ETHPORTS dynamic instead of compile time fixed, then some applications might also need dynamically sized arrays for
>> their
>>>>>>> application specific per-port runtime data, and this library could serve that purpose too.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks Thomas for starting this discussion and Morten for follow-up.
>>>>>>>>
>>>>>>>> My thinking is as follows, and I'm particularly keeping in mind the cases
>>>>>>>> of e.g. RTE_MAX_ETHPORTS, as a leading candidate here.
>>>>>>>>
>>>>>>>> While I dislike the hard-coded limits in DPDK, I'm also not convinced that
>>>>>>>> we should switch away from the flat arrays or that we need fully dynamic
>>>>>>>> arrays that grow/shrink at runtime for ethdevs. I would suggest a half-way
>>>>>>>> house here, where we keep the ethdevs as an array, but one allocated/sized
>>>>>>>> at runtime rather than statically. This would allow us to have a
>>>>>>>> compile-time default value, but, for use cases that need it, allow use of a
>>>>>>>> flag e.g.  "max-ethdevs" to change the size of the parameter given to the
>>>>>>>> malloc call for the array.  This max limit could then be provided to apps
>>>>>>>> too if they want to match any array sizes. [Alternatively those apps could
>>>>>>>> check the provided size and error out if the size has been increased beyond
>>>>>>>> what the app is designed to use?]. There would be no extra dereferences per
>>>>>>>> rx/tx burst call in this scenario so performance should be the same as
>>>>>>>> before (potentially better if array is in hugepage memory, I suppose).
>>>>>>>
>>>>>>> I think we need some benchmarks to decide what is the best tradeoff.
>>>>>>> I spent time on this implementation, but sorry I won't have time for benchmarks.
>>>>>>> Volunteers?
>>>>>>
>>>>>> I had only a quick look at your approach so far.
>>>>>> But from what I can read, in MT environment your suggestion will require
>>>>>> extra synchronization for each read-write access to such parray element (lock, rcu, ...).
>>>>>> I think what Bruce suggests will be much lighter, easier to implement and less error prone.
>>>>>> At least for rte_ethdevs[] and friends.
>>>>>> Konstantin
>>>>>
>>>>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
>>>>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
>>>>> any regressions.
>>>>> That could still be flat array with max_size specified at application startup.
>>>>> 2. Hide rest of rte_ethdev struct in .c.
>>>>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>>>
>>>>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
>>>>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
>>>>> Probably some macro can be provided to simplify it.
>>>>>
>>>>
>>>> We are already planning some tasks for ABI stability for v21.11, I think
>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
>>>> internal data.
>>>
>>> Ok, sounds good.
>>>
>>>>
>>>>> The only significant complication I can foresee with implementing that approach -
>>>>> we'll need an array of 'fast' function pointers per queue, not per device as we have now
>>>>> (to avoid extra indirection for callback implementation).
>>>>> Though as a bonus we'll have the ability to use different RX/TX functions per queue.
>>>>>
>>>>
>>>> What do you think about splitting the Rx/Tx callbacks into their own struct too?
>>>>
>>>> Overall 'rte_eth_dev' can be split into three as:
>>>> 1. rte_eth_dev
>>>> 2. rte_eth_dev_burst
>>>> 3. rte_eth_dev_cb
>>>>
>>>> And we can hide 1 from applications even with the inline functions.
>>>
>>> As discussed off-line, I think:
>>> it is possible.
>>> My absolute preference would be to have just 1/2 (with CB hidden).
>>
>> How can we hide the callbacks since they are used by inline burst functions.
> 
> I probably owe a better explanation of what I meant in the first mail.
> Otherwise it sounds confusing.
> I'll try to write a more detailed one in the next few days.
> 
>>> But even with 1/2/3 in place I think it would be  a good step forward.
>>> Probably worth to start with 1/2/3 first and then see how difficult it
>>> would be to switch to 1/2.
>>
>> What do you mean by switch to 1/2?
> 
> When we'll have just:
> 1. rte_eth_dev (hidden in .c)
> 2. rte_eth_dev_burst (visible)
> 
> And no specific public struct/array for callbacks - they will be hidden in rte_eth_dev.
> 

If we can hide them, agree this is better.

>>
>> If we keep having inline functions, and split struct as above three structs, we
>> can only hide 1, and 2/3 will be still visible to apps because of inline
>> functions. This way we will be able to hide more while still having the same performance.
> 
> I understand that, and as I said above - I think it is a good step forward.
> Though even better would be to hide rte_eth_dev_cb too.
> 
>>
>>> Do you plan to start working on it?
>>>
>>
>> We are gathering the list of the tasks for the ABI stability, most probably they
>> will be worked on during v21.11. I can take this one.
> 
> Cool, please keep me in a loop.
> I'll try to free some cycles for 21.11 to get involved and help (if needed, of course).

That would be great, thanks.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] devtools: script to track map symbols
@ 2021-06-18 16:36  5% Ray Kinsella
  2021-06-21 15:25  6% ` [dpdk-dev] [PATCH v3] " Ray Kinsella
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Ray Kinsella @ 2021-06-18 16:36 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, thomas, ktraynor, bruce.richardson, mdr

Script to track growth of stable and experimental symbols
over releases since v19.11.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/count_symbols.py | 230 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 230 insertions(+)
 create mode 100755 devtools/count_symbols.py

diff --git a/devtools/count_symbols.py b/devtools/count_symbols.py
new file mode 100755
index 0000000000..7b29651044
--- /dev/null
+++ b/devtools/count_symbols.py
@@ -0,0 +1,230 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+from pathlib import Path
+import sys, os
+import subprocess
+import argparse
+import re
+import datetime
+
+try:
+        from parsley import makeGrammar
+except ImportError:
+        print('This script uses the package Parsley to parse C Mapfiles.\n'
+              'This can be installed with \"pip install parsley".')
+        exit()
+
+symbolMapGrammar = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+#abi_ver = ['21', '20.0.1', '20.0', '20']
+
+def get_abi_versions():
+    year = datetime.date.today().year - 2000
+    s=" |".join(['\'{}\''.format(i) for i in reversed(range(21, year + 1)) ])
+    s = s + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return s
+
+def get_dpdk_releases():
+    year = datetime.date.today().year - 2000
+    s="|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile('^\"v(' + s + ')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    result = subprocess.run(cmd, \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE)
+    if result.stderr.startswith(b'fatal'):
+        return []
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+
+def get_terminal_rows():
+    rows, _ = os.popen('stty size', 'r').read().split()
+    return int(rows)
+
+def fix_directory_name(path):
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+# fix removal of the librte_ from the directory names
+def directory_renamed(path, rel):
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    result = subprocess.run(['git', 'show', tagfile], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE)
+    if result.stderr.startswith(b'fatal'):
+        result = None
+
+    return result
+
+# fix renaming of map files
+def mapfile_renamed(path, rel):
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) > 1 or len(dentries) == 0:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[len(dparts) - 1]
+
+    if(newfile is not None):
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE)
+        if result.stderr.startswith(b'fatal'):
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+# renaming of the map file & renaming of directory
+def mapfile_and_directory_renamed(path, rel):
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+fix_strategies = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+fmt = col_fmt = ""
+
+def set_terminal_output(dpdk_rel):
+    global fmt, col_fmt
+
+    fmt = '{:<50}'
+    col_fmt = fmt
+    for rel in dpdk_rel:
+        fmt += '{:<6}{:<6}'
+        col_fmt += '{:<12}'
+
+def set_csv_output(dpdk_rel):
+    global fmt, col_fmt
+
+    fmt = '{},'
+    col_fmt = fmt
+    for rel in dpdk_rel:
+        fmt += '{},{},'
+        col_fmt += '{},,'
+
+output_formats = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+directories = 'drivers, lib'
+
+def main():
+    global fmt, col_fmt, symbolMapGrammar
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=directories,
+                        default=directories)
+    args = parser.parse_args()
+
+    dpdk_rel = get_dpdk_releases()
+
+    # set the output format
+    output_formats[args.format_output](dpdk_rel)
+
+    column_titles = ['mapfile'] + dpdk_rel
+    print(col_fmt.format(*column_titles))
+
+    symbolMapGrammar = symbolMapGrammar.format(get_abi_versions())
+    MAPParser = makeGrammar(symbolMapGrammar, {})
+
+    terminal_rows = get_terminal_rows()
+    row = 0
+
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            csym = [0] * 2
+            relsym = [str(path)]
+
+            for rel in dpdk_rel:
+                i = csym[0] = csym[1] = 0
+                abi_sections = None
+
+                tagfile = '{}:{}'.format(rel,path)
+                result = subprocess.run(['git', 'show', tagfile], \
+                                        stdout=subprocess.PIPE, \
+                                        stderr=subprocess.PIPE)
+
+                if result.stderr.startswith(b'fatal'):
+                    result = None
+
+                while(result is None and i < len(fix_strategies)):
+                    result = fix_strategies[i](path, rel)
+                    i += 1
+
+                if result is not None:
+                    mapfile = result.stdout.decode('utf-8')
+                    abi_sections = MAPParser(mapfile).abi()
+
+                if abi_sections is not None:
+                    # which versions are present, and we care about
+                    ignore = ['EXPERIMENTAL','INTERNAL']
+                    found_ver = [ver \
+                                 for ver in abi_sections \
+                                 if ver not in ignore]
+
+                    for ver in found_ver:
+                        csym[0] += len(abi_sections[ver])
+
+                    # count experimental symbols
+                    if 'EXPERIMENTAL' in abi_sections:
+                        csym[1] = len(abi_sections['EXPERIMENTAL'])
+
+                relsym += csym
+
+            print(fmt.format(*relsym))
+            row += 1
+
+            if terminal_rows > 0 and (row % terminal_rows) == 0:
+                print(col_fmt.format(*column_titles))
+
+if __name__ == '__main__':
+        main()
-- 
2.26.2


^ permalink raw reply	[relevance 5%]

* [dpdk-dev] [PATCH v10 0/9] eal: Add EAL API for threading
    @ 2021-06-18 21:26  3%   ` Narcisa Ana Maria Vasile
  1 sibling, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-06-18 21:26 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

EAL thread API

**Problem Statement**
DPDK currently uses the pthread interface to create and manage threads.
Windows does not support the POSIX thread programming model,
so it currently relies on a header file that hides the Windows
calls under pthread matched interfaces.
Given that EAL should isolate the environment specifics from
the applications and libraries and mediate all the communication
with the operating systems, a new EAL interface
is needed for thread management.

**Goals**
* Introduce a generic EAL API for threading support that will remove
  the current Windows pthread.h shim.
* Replace references to pthread_* across the DPDK codebase with the new
  RTE_THREAD_* API.
* Allow users to choose between using the RTE_THREAD_* API or a
  3rd party thread library through a configuration option.

**Design plan**
New API main files:
* rte_thread.h (librte_eal/include)
* rte_thread.c (librte_eal/windows)
* rte_thread.c (librte_eal/common)

**A schematic example of the design**
--------------------------------------------------
lib/librte_eal/include/rte_thread.h
int rte_thread_create();

lib/librte_eal/common/rte_thread.c
int rte_thread_create() 
{
	return pthread_create();
}

lib/librte_eal/windows/rte_thread.c
int rte_thread_create() 
{
	return CreateThread();
}
-----------------------------------------------------

**Thread attributes**

When or after a thread is created, specific characteristics of the thread
can be adjusted. Given that the thread characteristics that are of interest
for DPDK applications are affinity and priority, the following structure
that represents thread attributes has been defined:

typedef struct
{
	enum rte_thread_priority priority;
	rte_cpuset_t cpuset;
} rte_thread_attr_t;

The *rte_thread_create()* function can optionally receive
an rte_thread_attr_t object that will cause the thread to be created
with the affinity and priority described by the attributes object.
If no rte_thread_attr_t is passed (parameter is NULL),
the default affinity and priority are used.
An rte_thread_attr_t object can also be set to the default values
by calling *rte_thread_attr_init()*.

*Priority* is represented through an enum that currently advertises
two values for priority:
	- RTE_THREAD_PRIORITY_NORMAL
	- RTE_THREAD_PRIORITY_REALTIME_CRITICAL
The enum can be extended to allow for multiple priority levels.
rte_thread_set_priority      - sets the priority of a thread
rte_thread_attr_set_priority - updates an rte_thread_attr_t object
                               with a new value for priority

The user can choose the thread priority through an EAL parameter
when starting an application. If the EAL parameter is not used,
the per-platform default value for thread priority is used.
Otherwise, the administrator can set one of the available options:
 --thread-prio normal
 --thread-prio realtime

Example:
./dpdk-l2fwd -l 0-3 -n 4 --thread-prio normal -- -q 8 -p ffff

*Affinity* is described by the already known “rte_cpuset_t” type.
rte_thread_attr_set/get_affinity - sets/gets the affinity field in a
                                   rte_thread_attr_t object
rte_thread_set/get_affinity      – sets/gets the affinity of a thread

**Errors**
A translation function that maps Windows error codes to errno-style
error codes is provided. 

**Future work**
The long term plan is for EAL to provide full threading support:
* Add support for conditional variables
* Add support for pthread_mutex_trylock
* Additional functionality offered by pthread_*
  (such as pthread_setname_np, etc.)

v10:
 - Remove patch no. 10. It will be broken down into subpatches
   and sent as a separate patchset that depends on this one.
   This is done due to the ABI breaks that would be caused by patch 10.
 - Replace unix/rte_thread.c with common/rte_thread.c
 - Remove initializations that may prevent compiler from issuing useful
   warnings.
 - Remove rte_thread_types.h and rte_windows_thread_types.h
 - Remove unneeded priority macros (EAL_THREAD_PRIORITY*)
 - Remove functions that retrieves thread handle from process handle
 - Remove rte_thread_cancel() until same behavior is obtained on
   all platforms.
 - Fix rte_thread_detach() function description,
   return value and remove empty line.
 - Reimplement mutex functions. Add compatible representation for mutex
   identifier. Add macro to replace static mutex initialization instances.
 - Fix commit messages (lines too long, remove unicode symbols)

v9:
- Sign patches

v8:
- Rebase
- Add rte_thread_detach() API
- Set default priority, when user did not specify a value

v7:
Based on DmitryK's review:
- Change thread id representation
- Change mutex id representation
- Implement static mutex initializer for Windows
- Change barrier identifier representation
- Improve commit messages
- Add missing doxygen comments
- Split error translation function
- Improve name for affinity function
- Remove cpuset_size parameter
- Fix eal_create_cpu_map function
- Map EAL priority values to OS specific values
- Add thread wrapper for start routine
- Do not export rte_thread_cancel() on Windows
- Cleanup, fix comments, fix typos.

v6:
- improve error-translation function
- call the error translation function in rte_thread_value_get()

v5:
- update cover letter with more details on the priority argument

v4:
- fix function description
- rebase

v3:
- rebase

v2:
- revert changes that break ABI 
- break up changes into smaller patches
- fix coding style issues
- fix issues with errors
- fix parameter type in examples/kni.c


Narcisa Vasile (9):
  eal: add basic threading functions
  eal: add thread attributes
  eal/windows: translate Windows errors to errno-style errors
  eal: implement functions for thread affinity management
  eal: implement thread priority management functions
  eal: add thread lifetime management
  eal: implement functions for mutex management
  eal: implement functions for thread barrier management
  eal: add EAL argument for setting thread priority

 lib/eal/common/eal_common_options.c |  28 +-
 lib/eal/common/eal_internal_cfg.h   |   2 +
 lib/eal/common/eal_options.h        |   2 +
 lib/eal/common/meson.build          |   1 +
 lib/eal/common/rte_thread.c         | 445 +++++++++++++++++++++
 lib/eal/include/rte_thread.h        | 406 ++++++++++++++++++-
 lib/eal/unix/meson.build            |   1 -
 lib/eal/unix/rte_thread.c           |  92 -----
 lib/eal/version.map                 |  20 +
 lib/eal/windows/eal_lcore.c         | 176 ++++++---
 lib/eal/windows/eal_windows.h       |  10 +
 lib/eal/windows/include/sched.h     |   2 +-
 lib/eal/windows/rte_thread.c        | 588 ++++++++++++++++++++++++++--
 13 files changed, 1599 insertions(+), 174 deletions(-)
 create mode 100644 lib/eal/common/rte_thread.c
 delete mode 100644 lib/eal/unix/rte_thread.c

-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v9 10/10] Enable the new EAL thread API
  @ 2021-06-18 21:53  0%         ` Narcisa Ana Maria Vasile
  0 siblings, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-06-18 21:53 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Thomas Monjalon, Dmitry Kozlyuk, Khoa To, navasile,
	Dmitry Malloy (MESHCHANINOV),
	roretzla, Tal Shnaiderman, Omar Cardona, Bruce Richardson,
	Pallavi Kadam

On Tue, Jun 08, 2021 at 09:45:44AM +0200, David Marchand wrote:
> On Tue, Jun 8, 2021 at 7:50 AM Narcisa Ana Maria Vasile
> <navasile@linux.microsoft.com> wrote:
> >
> > On Fri, Jun 04, 2021 at 04:44:34PM -0700, Narcisa Ana Maria Vasile wrote:
> > > From: Narcisa Vasile <navasile@microsoft.com>
> > >
> > > Rename pthread_* occurrences with the new rte_thread_* API.
> > > Enable the new API in the build system.
> > >
> > > Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
> > > ---
> >
> > I'll send v10.
> > Can someone please help with an example on how to check for ABI breaks? Thank you!
> >
> > I've run:
> > DPDK_ABI_REF_VERSION=v21.05 DPDK_ABI_REF_DIR=~/ref ./devtools/test-meson-builds.sh
> > which doesn't give any warnings about the ABI break.
> 
> This should work the way you tried if you have working toolchains and
> libabigail installed.
> Something is off in your env.
> 
> Side note: ovsrobot is out those days (we have some trouble in one of
> RH labs and it happens ovsrobot is hosted there), but you could try
> with a github repo of yours + GHA, and the ABI failure should be
> caught too.
> 
> 
> I just tried on my rhel7 (gcc 4.8.5 + libabigail 1.8.2) with your
> series applied.
> $ DPDK_ABI_REF_VERSION=v21.05
> DPDK_ABI_REF_DIR=~/git/pub/dpdk.org/reference
> ./devtools/test-meson-builds.sh
> ...
> Error: ABI issue reported for 'abidiff --suppr
> /home/dmarchan/git/pub/dpdk.org/devtools/../devtools/libabigail.abignore
> --no-added-syms --headers-dir1
> /home/dmarchan/git/pub/dpdk.org/reference/v21.05/build-gcc-shared/usr/local/include
> --headers-dir2 /home/dmarchan/git/pub/dpdk.org/build-gcc-shared/install/usr/local/include
> /home/dmarchan/git/pub/dpdk.org/reference/v21.05/build-gcc-shared/dump/librte_eal.dump
> /home/dmarchan/git/pub/dpdk.org/build-gcc-shared/install/dump/librte_eal.dump'
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
> 
> 
> $ abidiff --suppr
> /home/dmarchan/git/pub/dpdk.org/devtools/../devtools/libabigail.abignore
> --no-added-syms --headers-dir1
> /home/dmarchan/git/pub/dpdk.org/reference/v21.05/build-gcc-shared/usr/local/include
> --headers-dir2 /home/dmarchan/git/pub/dpdk.org/build-gcc-shared/install/usr/local/include
> /home/dmarchan/git/pub/dpdk.org/reference/v21.05/build-gcc-shared/dump/librte_eal.dump
> /home/dmarchan/git/pub/dpdk.org/build-gcc-shared/install/dump/librte_eal.dump
> Functions changes summary: 0 Removed, 2 Changed (1 filtered out), 0
> Added (20 filtered out) functions
> Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
> 
> 2 functions with some indirect sub-type change:
> 
>   [C] 'function int rte_ctrl_thread_create(pthread_t*, const char*,
> const pthread_attr_t*, void* (void*)*, void*)' at rte_lcore.h:443:1
> has some indirect sub-type changes:
>     parameter 1 of type 'pthread_t*' changed:
>       in pointed to type 'typedef pthread_t' at rte_thread.h:42:1:
>         typedef name changed from pthread_t to rte_thread_t at rte_thread.h:42:1
>         underlying type 'unsigned long int' changed:
>           entity changed from 'unsigned long int' to 'struct
> rte_thread_tag' at rte_thread.h:40:1
>           type size hasn't changed
>     parameter 3 of type 'const pthread_attr_t*' changed:
>       in pointed to type 'const pthread_attr_t':
>         'const pthread_attr_t' changed to 'const rte_thread_attr_t'
> 
>   [C] 'function int rte_thread_setname(pthread_t, const char*)' at
> rte_lcore.h:377:1 has some indirect sub-type changes:
>     parameter 1 of type 'typedef pthread_t' changed:
>       typedef name changed from pthread_t to rte_thread_t at rte_thread.h:42:1
>       underlying type 'unsigned long int' changed:
>         entity changed from 'unsigned long int' to 'struct
> rte_thread_tag' at rte_thread.h:40:1
>         type size hasn't changed
> 
> 
> 
> Can you check that in your env build-gcc-shared/ and the build
> directory for references are configured with debug symbols?
> You should see:
> $ meson configure build-gcc-shared | awk '$1=="buildtype" {print $2}'
> debugoptimized
> $ meson configure reference/v21.05/build | awk '$1=="buildtype" {print $2}'
> debugoptimized
> 
> 
Thank you very much David! There was something wrong with my local reference.
Using your commands, I am able to run the tools now.
> 

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 2/6] eal: add function for control thread creation
  @ 2021-06-18 21:54  4% ` Narcisa Ana Maria Vasile
  2021-06-19  1:57  4% ` [dpdk-dev] [PATCH v2 0/6] Enable the internal EAL thread API Narcisa Ana Maria Vasile
  1 sibling, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-06-18 21:54 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

The existing rte_ctrl_thread_create() function will be replaced
with rte_thread_ctrl_thread_create(), which uses the internal
EAL thread API.

This patch only introduces the new control thread creation
function. Replacing the old function needs to follow the ABI
change process, to avoid an ABI break.

Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
---
 lib/eal/common/eal_common_thread.c | 81 ++++++++++++++++++++++++++++++
 lib/eal/include/rte_thread.h       | 27 ++++++++++
 lib/eal/version.map                |  1 +
 3 files changed, 109 insertions(+)

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 1a52f42a2b..79545c67d9 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -259,6 +259,87 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	return -ret;
 }
 
+struct rte_thread_ctrl_ctx {
+	rte_thread_func start_routine;
+	void *arg;
+	const char *name;
+};
+
+static void *ctrl_thread_wrapper(void *arg)
+{
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = arg;
+	rte_thread_func start_routine = ctx->start_routine;
+	void *routine_arg = ctx->arg;
+
+	__rte_thread_init(rte_lcore_id(), cpuset);
+
+	if (ctx->name != NULL) {
+		if (rte_thread_name_set(rte_thread_self(), ctx->name) < 0)
+			RTE_LOG(DEBUG, EAL, "Cannot set name for ctrl thread\n");
+	}
+
+	free(arg);
+
+	return start_routine(routine_arg);
+}
+
+int
+rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg)
+{
+	int ret;
+	rte_thread_attr_t attr;
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = NULL;
+
+	if (start_routine == NULL) {
+		ret = EINVAL;
+		goto cleanup;
+	}
+
+	ctx = malloc(sizeof(*ctx));
+	if (ctx == NULL) {
+		ret = ENOMEM;
+		goto cleanup;
+	}
+
+	ctx->start_routine = start_routine;
+	ctx->arg = arg;
+	ctx->name = name;
+
+	ret = rte_thread_attr_init(&attr);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot init ctrl thread attributes\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_attr_set_affinity(&attr, cpuset);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set affinity attribute for ctrl thread\n");
+		goto cleanup;
+	}
+	ret = rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set priority attribute for ctrl thread\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_create(thread, &attr, ctrl_thread_wrapper, ctx);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot create ctrl thread\n");
+		goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	free(ctx);
+	return ret;
+}
+
 int
 rte_thread_register(void)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index c65cfd8c9e..4da800ae27 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -457,6 +457,33 @@ int rte_thread_barrier_destroy(rte_thread_barrier *barrier);
 __rte_experimental
 int rte_thread_name_set(rte_thread_t thread_id, const char *name);
 
+/**
+ * Create a control thread.
+ *
+ * Set affinity and thread name. The affinity of the new thread is based
+ * on the CPU affinity retrieved at the time rte_eal_init() was called,
+ * the dataplane and service lcores are then excluded.
+ *
+ * @param thread
+ *   Filled with the thread id of the newly created thread.
+ *
+ * @param name
+ *   The name of the control thread (max 16 characters including '\0').
+ *
+ * @param start_routine
+ *   Function to be executed by the new thread.
+ *
+ * @param arg
+ *   Argument passed to start_routine.
+ *
+ * @return
+ *   On success, return 0;
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg);
+
 /**
  * Create a TLS data key visible to all threads in the process.
  * the created key is later used to get/set a value.
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 2a566c04af..02455a1c8d 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -444,6 +444,7 @@ EXPERIMENTAL {
 	rte_thread_barrier_wait;
 	rte_thread_barrier_destroy;
 	rte_thread_name_set;
+	rte_thread_ctrl_thread_create;
 };
 
 INTERNAL {
-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 2/6] eal: add function for control thread creation
  2021-06-19  1:57  4% ` [dpdk-dev] [PATCH v2 0/6] Enable the internal EAL thread API Narcisa Ana Maria Vasile
@ 2021-06-19  1:57  4%   ` Narcisa Ana Maria Vasile
  0 siblings, 0 replies; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-06-19  1:57 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

The existing rte_ctrl_thread_create() function will be replaced
with rte_thread_ctrl_thread_create(), which uses the internal
EAL thread API.

This patch only introduces the new control thread creation
function. Replacing the old function needs to follow the ABI
change process, to avoid an ABI break.

Depends-on: series-17402 ("eal: Add EAL API for threading")

Signed-off-by: Narcisa Vasile <navasile@microsoft.com>
---
 lib/eal/common/eal_common_thread.c | 81 ++++++++++++++++++++++++++++++
 lib/eal/include/rte_thread.h       | 27 ++++++++++
 lib/eal/version.map                |  1 +
 3 files changed, 109 insertions(+)

diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 1a52f42a2b..79545c67d9 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -259,6 +259,87 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
 	return -ret;
 }
 
+struct rte_thread_ctrl_ctx {
+	rte_thread_func start_routine;
+	void *arg;
+	const char *name;
+};
+
+static void *ctrl_thread_wrapper(void *arg)
+{
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = arg;
+	rte_thread_func start_routine = ctx->start_routine;
+	void *routine_arg = ctx->arg;
+
+	__rte_thread_init(rte_lcore_id(), cpuset);
+
+	if (ctx->name != NULL) {
+		if (rte_thread_name_set(rte_thread_self(), ctx->name) < 0)
+			RTE_LOG(DEBUG, EAL, "Cannot set name for ctrl thread\n");
+	}
+
+	free(arg);
+
+	return start_routine(routine_arg);
+}
+
+int
+rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg)
+{
+	int ret;
+	rte_thread_attr_t attr;
+	struct internal_config *conf = eal_get_internal_configuration();
+	rte_cpuset_t *cpuset = &conf->ctrl_cpuset;
+	struct rte_thread_ctrl_ctx *ctx = NULL;
+
+	if (start_routine == NULL) {
+		ret = EINVAL;
+		goto cleanup;
+	}
+
+	ctx = malloc(sizeof(*ctx));
+	if (ctx == NULL) {
+		ret = ENOMEM;
+		goto cleanup;
+	}
+
+	ctx->start_routine = start_routine;
+	ctx->arg = arg;
+	ctx->name = name;
+
+	ret = rte_thread_attr_init(&attr);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot init ctrl thread attributes\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_attr_set_affinity(&attr, cpuset);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set affinity attribute for ctrl thread\n");
+		goto cleanup;
+	}
+	ret = rte_thread_attr_set_priority(&attr, RTE_THREAD_PRIORITY_NORMAL);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot set priority attribute for ctrl thread\n");
+		goto cleanup;
+	}
+
+	ret = rte_thread_create(thread, &attr, ctrl_thread_wrapper, ctx);
+	if (ret != 0) {
+		RTE_LOG(DEBUG, EAL, "Cannot create ctrl thread\n");
+		goto cleanup;
+	}
+
+	return 0;
+
+cleanup:
+	free(ctx);
+	return ret;
+}
+
 int
 rte_thread_register(void)
 {
diff --git a/lib/eal/include/rte_thread.h b/lib/eal/include/rte_thread.h
index c65cfd8c9e..4da800ae27 100644
--- a/lib/eal/include/rte_thread.h
+++ b/lib/eal/include/rte_thread.h
@@ -457,6 +457,33 @@ int rte_thread_barrier_destroy(rte_thread_barrier *barrier);
 __rte_experimental
 int rte_thread_name_set(rte_thread_t thread_id, const char *name);
 
+/**
+ * Create a control thread.
+ *
+ * Set affinity and thread name. The affinity of the new thread is based
+ * on the CPU affinity retrieved at the time rte_eal_init() was called,
+ * the dataplane and service lcores are then excluded.
+ *
+ * @param thread
+ *   Filled with the thread id of the newly created thread.
+ *
+ * @param name
+ *   The name of the control thread (max 16 characters including '\0').
+ *
+ * @param start_routine
+ *   Function to be executed by the new thread.
+ *
+ * @param arg
+ *   Argument passed to start_routine.
+ *
+ * @return
+ *   On success, return 0;
+ *   On failure, return a positive errno-style error number.
+ */
+__rte_experimental
+int rte_thread_ctrl_thread_create(rte_thread_t *thread, const char *name,
+		rte_thread_func start_routine, void *arg);
+
 /**
  * Create a TLS data key visible to all threads in the process.
  * the created key is later used to get/set a value.
diff --git a/lib/eal/version.map b/lib/eal/version.map
index 2a566c04af..02455a1c8d 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -444,6 +444,7 @@ EXPERIMENTAL {
 	rte_thread_barrier_wait;
 	rte_thread_barrier_destroy;
 	rte_thread_name_set;
+	rte_thread_ctrl_thread_create;
 };
 
 INTERNAL {
-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2 0/6] Enable the internal EAL thread API
    2021-06-18 21:54  4% ` [dpdk-dev] [PATCH 2/6] eal: add function for control thread creation Narcisa Ana Maria Vasile
@ 2021-06-19  1:57  4% ` Narcisa Ana Maria Vasile
  2021-06-19  1:57  4%   ` [dpdk-dev] [PATCH v2 2/6] eal: add function for control thread creation Narcisa Ana Maria Vasile
  1 sibling, 1 reply; 200+ results
From: Narcisa Ana Maria Vasile @ 2021-06-19  1:57 UTC (permalink / raw)
  To: dev, thomas, dmitry.kozliuk, khot, navasile, dmitrym, roretzla,
	talshn, ocardona
  Cc: bruce.richardson, david.marchand, pallavi.kadam

From: Narcisa Vasile <navasile@microsoft.com>

This patchset enables the new EAL thread API.
The newly defined thread attributes, priority and affinity,
are used in eal/windows when creating the threads. Similarly, 
some changes have been done in eal/linux/eal.c and eal/freebsd/eal.c
to initialize priority to a default value and set thread attributes.

The user is offered the option of either using the rte_thread_* API or
a 3rd party thread library, through a meson flag
called "use_external_thread_lib".
By default, this flag is set to FALSE, which means Windows libraries
and applications will use the EAL rte_thread_* API 
defined in windows/rte_thread.c for managing threads.
When the flag is set to TRUE, the common/rte_thread.c file is compiled
and an external thread library is used.

This patchset adds a new function for creating control threads that
uses the new thread API.
It enables the usage of the new function in Windows code and common code.
The old function is kept to avoid an ABI break; however, its definition
is commented out on Windows, since the pthread_t and pthread_attr_t
arguments that it receives have been replaced with the new API there.
This allows testing the "eal: Add EAL API for threading" series that this
patchset depends on.

The ethdev lib also contains some changes that break the ABI.
Enabling the new EAL thread API will probably require going through
the proper process of ABI changes.

Depends-on: series-17402 ("eal: Add EAL API for threading")

v2:
- fix typo in SetThreadDescription_type function pointer
- add Depends-on on all patches to fix apply errors.
- modify cover letter

Narcisa Vasile (6):
  eal: add function that sets thread name
  eal: add function for control thread creation
  Enable the new EAL thread API in app, drivers and examples
  lib: enable the new EAL thread API
  eal: set affinity and priority attributes
  Allow choice between internal EAL thread API and external lib

 app/test/process.h                            |   8 +-
 app/test/test_lcores.c                        |  18 +-
 app/test/test_link_bonding.c                  |  14 +-
 app/test/test_lpm_perf.c                      |  12 +-
 config/meson.build                            |   1 -
 drivers/bus/dpaa/base/qbman/bman_driver.c     |   5 +-
 drivers/bus/dpaa/base/qbman/dpaa_sys.c        |  14 +-
 drivers/bus/dpaa/base/qbman/process.c         |   6 +-
 drivers/bus/dpaa/dpaa_bus.c                   |  14 +-
 drivers/bus/fslmc/portal/dpaa2_hw_dpio.c      |  19 +-
 drivers/common/dpaax/compat.h                 |   2 +-
 drivers/common/mlx5/windows/mlx5_common_os.h  |   1 +
 drivers/compress/mlx5/mlx5_compress.c         |  10 +-
 drivers/event/dlb2/dlb2.c                     |   2 +-
 drivers/event/dlb2/pf/base/dlb2_osdep.h       |   7 +-
 drivers/mempool/dpaa/dpaa_mempool.c           |   2 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c           |  18 +-
 drivers/net/ark/ark_ethdev.c                  |   4 +-
 drivers/net/ark/ark_pktgen.c                  |   4 +-
 drivers/net/atlantic/atl_ethdev.c             |   4 +-
 drivers/net/atlantic/atl_types.h              |   4 +-
 .../net/atlantic/hw_atl/hw_atl_utils_fw2x.c   |  26 +--
 drivers/net/axgbe/axgbe_common.h              |   2 +-
 drivers/net/axgbe/axgbe_dev.c                 |   8 +-
 drivers/net/axgbe/axgbe_ethdev.c              |   8 +-
 drivers/net/axgbe/axgbe_ethdev.h              |   8 +-
 drivers/net/axgbe/axgbe_i2c.c                 |   4 +-
 drivers/net/axgbe/axgbe_mdio.c                |   8 +-
 drivers/net/axgbe/axgbe_phy_impl.c            |   6 +-
 drivers/net/bnxt/bnxt.h                       |  16 +-
 drivers/net/bnxt/bnxt_cpr.c                   |   4 +-
 drivers/net/bnxt/bnxt_ethdev.c                |  54 ++---
 drivers/net/bnxt/bnxt_irq.c                   |   8 +-
 drivers/net/bnxt/bnxt_reps.c                  |  10 +-
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c            |  34 ++--
 drivers/net/bnxt/tf_ulp/bnxt_ulp.h            |   4 +-
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.c          |  28 +--
 drivers/net/bnxt/tf_ulp/ulp_fc_mgr.h          |   2 +-
 drivers/net/dpaa/dpaa_ethdev.c                |   2 +-
 drivers/net/dpaa/dpaa_rxtx.c                  |   2 +-
 drivers/net/ena/base/ena_plat_dpdk.h          |  15 +-
 drivers/net/enic/enic.h                       |   2 +-
 drivers/net/ice/ice_dcf_parent.c              |   8 +-
 drivers/net/ixgbe/ixgbe_ethdev.c              |   6 +-
 drivers/net/ixgbe/ixgbe_ethdev.h              |   2 +-
 drivers/net/mlx5/linux/mlx5_os.c              |   2 +-
 drivers/net/mlx5/mlx5.c                       |  20 +-
 drivers/net/mlx5/mlx5.h                       |   2 +-
 drivers/net/mlx5/mlx5_txpp.c                  |   8 +-
 drivers/net/mlx5/windows/mlx5_flow_os.c       |  10 +-
 drivers/net/mlx5/windows/mlx5_os.c            |   2 +-
 drivers/net/qede/base/bcm_osal.h              |   8 +-
 drivers/net/vhost/rte_eth_vhost.c             |  24 +--
 .../net/virtio/virtio_user/virtio_user_dev.c  |  30 +--
 .../net/virtio/virtio_user/virtio_user_dev.h  |   2 +-
 drivers/vdpa/ifc/ifcvf_vdpa.c                 |  49 +++--
 drivers/vdpa/mlx5/mlx5_vdpa.c                 |  24 +--
 drivers/vdpa/mlx5/mlx5_vdpa.h                 |   4 +-
 drivers/vdpa/mlx5/mlx5_vdpa_event.c           |  51 ++---
 examples/kni/main.c                           |   1 +
 .../pthread_shim/pthread_shim.h               |   1 +
 lib/eal/common/eal_common_options.c           |   6 +-
 lib/eal/common/eal_common_thread.c            | 105 +++++++++-
 lib/eal/common/eal_common_trace.c             |   1 +
 lib/eal/common/eal_private.h                  |   2 +-
 lib/eal/common/eal_thread.h                   |   6 +
 lib/eal/common/malloc_mp.c                    |   2 +
 lib/eal/common/rte_thread.c                   |  17 ++
 lib/eal/freebsd/eal.c                         |  53 +++--
 lib/eal/freebsd/eal_alarm.c                   |  12 +-
 lib/eal/freebsd/eal_interrupts.c              |   6 +-
 lib/eal/freebsd/eal_thread.c                  |  10 +-
 lib/eal/include/rte_lcore.h                   |   6 +
 lib/eal/include/rte_per_lcore.h               |   2 +-
 lib/eal/include/rte_thread.h                  |  45 ++++
 lib/eal/linux/eal.c                           |  55 +++--
 lib/eal/linux/eal_alarm.c                     |  10 +-
 lib/eal/linux/eal_interrupts.c                |   8 +-
 lib/eal/linux/eal_thread.c                    |  11 +-
 lib/eal/linux/eal_timer.c                     |   6 +-
 lib/eal/version.map                           |   6 +-
 lib/eal/windows/eal.c                         |  44 +++-
 lib/eal/windows/eal_interrupts.c              |  10 +-
 lib/eal/windows/eal_thread.c                  |  35 +---
 lib/eal/windows/eal_windows.h                 |  10 -
 lib/eal/windows/include/pthread.h             | 192 ------------------
 lib/eal/windows/include/rte_windows.h         |   1 +
 lib/eal/windows/meson.build                   |   7 +-
 lib/eal/windows/rte_thread.c                  |  60 ++++++
 lib/ethdev/rte_ethdev.c                       |   4 +-
 lib/ethdev/rte_ethdev_core.h                  |   4 +-
 lib/ethdev/rte_flow.c                         |   4 +-
 lib/eventdev/rte_event_eth_rx_adapter.c       |   1 +
 lib/vhost/vhost.c                             |   1 +
 meson_options.txt                             |   2 +
 95 files changed, 764 insertions(+), 654 deletions(-)
 delete mode 100644 lib/eal/windows/include/pthread.h

-- 
2.31.0.vfs.0.1


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [RFC PATCH v3 0/3] Add PIE support for HQoS library
  @ 2021-06-21  7:35  3% ` Liguzinski, WojciechX
  2021-07-05  8:04  3%   ` [dpdk-dev] [RFC PATCH v4 " Liguzinski, WojciechX
  0 siblings, 1 reply; 200+ results
From: Liguzinski, WojciechX @ 2021-06-21  7:35 UTC (permalink / raw)
  To: dev, jasvinder.singh, cristian.dumitrescu; +Cc: savinay.dharmappa, megha.ajmera

The DPDK sched library is equipped with a mechanism that protects it from the
bufferbloat problem, a situation in which excess buffering in the network causes
high latency and latency variation. Currently, it supports RED for active queue
management, which is designed to control the queue length but does not control
latency directly and is now being obsoleted. More advanced queue management is
therefore required to address this problem and provide the desired quality of
service to users.

This solution (RFC) proposes use of a new algorithm called PIE (Proportional
Integral controller Enhanced) that can effectively and directly control queuing
latency, addressing the bufferbloat problem.

Implementing this functionality involves modifying existing data structures,
adding a new set of data structures to the library, and adding PIE-related APIs.
This affects structures in the public API/ABI, which is why a deprecation notice
is going to be prepared and sent.

Liguzinski, WojciechX (3):
  sched: add PIE based congestion management
  example/qos_sched: add PIE support
  example/ip_pipeline: add PIE support

 config/rte_config.h                      |   1 -
 drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
 examples/ip_pipeline/tmgr.c              |   6 +-
 examples/qos_sched/app_thread.c          |   1 -
 examples/qos_sched/cfg_file.c            |  82 ++++-
 examples/qos_sched/init.c                |   7 +-
 examples/qos_sched/profile.cfg           | 196 ++++++++----
 lib/sched/meson.build                    |  10 +-
 lib/sched/rte_pie.c                      |  78 +++++
 lib/sched/rte_pie.h                      | 388 +++++++++++++++++++++++
 lib/sched/rte_sched.c                    | 229 +++++++++----
 lib/sched/rte_sched.h                    |  53 +++-
 12 files changed, 876 insertions(+), 181 deletions(-)
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-18 10:41  0%             ` Ananyev, Konstantin
  2021-06-18 10:49  0%               ` Ferruh Yigit
@ 2021-06-21 11:06  0%               ` Ananyev, Konstantin
  2021-06-21 14:05  0%                 ` Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-06-21 11:06 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil


Hi everyone,
 
> > >>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
> > >>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
> > >>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
> > >>> any regressions.
> > >>> That could still be flat array with max_size specified at application startup.
> > >>> 2. Hide rest of rte_ethdev struct in .c.
> > >>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
> > >>> (flat array, vector, hash, linked list) without ABI/API breakages.
> > >>>
> > >>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
> > >>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
> > >>> Probably some macro can be provided to simplify it.
> > >>>
> > >>
> > >> We are already planning some tasks for ABI stability for v21.11, I think
> > >> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
> > >> internal data.
> > >
> > > Ok, sounds good.
> > >
> > >>
> > >>> The only significant complication I can foresee with implementing that approach -
> > >>> we'll need an array of 'fast' function pointers per queue, not per device as we have now
> > >>> (to avoid extra indirection for callback implementation).
> > >>> Though as a bonus we'll have the ability to use different RX/TX functions per queue.
> > >>>
> > >>
> > >> What do you think split Rx/Tx callback into its own struct too?
> > >>
> > >> Overall 'rte_eth_dev' can be split into three as:
> > >> 1. rte_eth_dev
> > >> 2. rte_eth_dev_burst
> > >> 3. rte_eth_dev_cb
> > >>
> > >> And we can hide 1 from applications even with the inline functions.
> > >
> > > As discussed off-line, I think:
> > > it is possible.
> > > My absolute preference would be to have just 1/2 (with CB hidden).
> >
> > How can we hide the callbacks since they are used by inline burst functions.
> 
> I probably I owe a better explanation to what I meant in first mail.
> Otherwise it sounds confusing.
> I'll try to write a more detailed one in next few days.

Actually I gave it another thought over weekend, and might be we can
hide rte_eth_dev_cb even in a simpler way. I'd use eth_rx_burst() as
an example, but the same principle applies to other 'fast' functions. 

 1. Needed changes for PMDs rx_pkt_burst():
    a) change function prototype to accept 'uint16_t port_id' and 'uint16_t queue_id',
         instead of current 'void *'.
    b) Each PMD rx_pkt_burst() will have to call rte_eth_rx_epilog() function at return.
         This  inline function will do all CB calls for that queue.

To be more specific, let say we have some PMD: xyz with RX function:

uint16_t
xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
     struct xyz_rx_queue *rxq = rx_queue;
     uint16_t nb_rx = 0;

     /* do actual stuff here */
    ....
    return nb_rx; 
}

It will be transformed to:

uint16_t
xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
{
         struct xyz_rx_queue *rxq;
         uint16_t nb_rx;

         rxq = _rte_eth_rx_prolog(port_id, queue_id);
         if (rxq == NULL)
             return 0;
         nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
         return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts, nb_rx, nb_pkts);
}

And somewhere in ethdev_private.h:

static inline void *
_rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id)
{
   struct rte_eth_dev *dev = &rte_eth_devices[port_id];

#ifdef RTE_ETHDEV_DEBUG_RX
        RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
        RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);

        if (queue_id >= dev->data->nb_rx_queues) {
                RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
                return NULL;
        }
#endif
  return dev->data->rx_queues[queue_id];   
}

static inline uint16_t
_rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, uint16_t nb_rx, const uint16_t nb_pkts)
{
    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 
#ifdef RTE_ETHDEV_RXTX_CALLBACKS
        struct rte_eth_rxtx_callback *cb;

        /* __ATOMIC_RELEASE memory order was used when the
         * call back was inserted into the list.
         * Since there is a clear dependency between loading
         * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
         * not required.
         */
        cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
                                __ATOMIC_RELAXED);

        if (unlikely(cb != NULL)) {
                do {
                        nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
                                                nb_pkts, cb->param);
                        cb = cb->next;
                } while (cb != NULL);
        }
#endif

        rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
        return nb_rx;
 }

Now, as you said above, in rte_ethdev.h we will keep only a flat array
with pointers to 'fast' functions:
struct {
      eth_rx_burst_t             rx_pkt_burst;
      eth_tx_burst_t             tx_pkt_burst;
      eth_tx_prep_t              tx_pkt_prepare;
     .....
} rte_eth_dev_burst[];

And rte_eth_rx_burst() will look like:

static inline uint16_t
rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
                 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
{
    if (port_id >= RTE_MAX_ETHPORTS)
        return 0;
   return rte_eth_dev_burst[port_id].rx_pkt_burst(port_id, queue_id, rx_pkts, nb_pkts);
}

Yes, it will require changes in *all* PMDs, but as I said before, the changes will be mechanical ones.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-21 11:06  0%               ` Ananyev, Konstantin
@ 2021-06-21 14:05  0%                 ` Ferruh Yigit
  2021-06-21 14:42  0%                   ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-06-21 14:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil

On 6/21/2021 12:06 PM, Ananyev, Konstantin wrote:
> 
> Hi everyone,
> 
>>>>>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
>>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
>>>>>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
>>>>>> any regressions.
>>>>>> That could still be flat array with max_size specified at application startup.
>>>>>> 2. Hide rest of rte_ethdev struct in .c.
>>>>>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
>>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>>>>
>>>>>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
>>>>>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
>>>>>> Probably some macro can be provided to simplify it.
>>>>>>
>>>>>
>>>>> We are already planning some tasks for ABI stability for v21.11, I think
>>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
>>>>> internal data.
>>>>
>>>> Ok, sounds good.
>>>>
>>>>>
>>>>>> The only significant complication I can foresee with implementing that approach -
>>>>>> we'll need a an array of 'fast' function pointers per queue, not per device as we have now
>>>>>> (to avoid extra indirection for callback implementation).
>>>>>> Though as a bonus we'll have ability to use different RX/TX funcions per queue.
>>>>>>
>>>>>
>>>>> What do you think split Rx/Tx callback into its own struct too?
>>>>>
>>>>> Overall 'rte_eth_dev' can be split into three as:
>>>>> 1. rte_eth_dev
>>>>> 2. rte_eth_dev_burst
>>>>> 3. rte_eth_dev_cb
>>>>>
>>>>> And we can hide 1 from applications even with the inline functions.
>>>>
>>>> As discussed off-line, I think:
>>>> it is possible.
>>>> My absolute preference would be to have just 1/2 (with CB hidden).
>>>
>>> How can we hide the callbacks since they are used by inline burst functions.
>>
>> I probably I owe a better explanation to what I meant in first mail.
>> Otherwise it sounds confusing.
>> I'll try to write a more detailed one in next few days.
> 
> Actually I gave it another thought over weekend, and might be we can
> hide rte_eth_dev_cb even in a simpler way. I'd use eth_rx_burst() as
> an example, but the same principle applies to other 'fast' functions.
> 
>  1. Needed changes for PMDs rx_pkt_burst():
>     a) change function prototype to accept 'uint16_t port_id' and 'uint16_t queue_id',
>          instead of current 'void *'.
>     b) Each PMD rx_pkt_burst() will have to call rte_eth_rx_epilog() function at return.
>          This  inline function will do all CB calls for that queue.
> 
> To be more specific, let say we have some PMD: xyz with RX function:
> 
> uint16_t
> xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> {
>      struct xyz_rx_queue *rxq = rx_queue;
>      uint16_t nb_rx = 0;
> 
>      /* do actual stuff here */
>     ....
>     return nb_rx;
> }
> 
> It will be transformed to:
> 
> uint16_t
> xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> {
>          struct xyz_rx_queue *rxq;
>          uint16_t nb_rx;
> 
>          rxq = _rte_eth_rx_prolog(port_id, queue_id);
>          if (rxq == NULL)
>              return 0;
>          nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
>          return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts, nb_pkts);
> }
> 
> And somewhere in ethdev_private.h:
> 
> static inline void *
> _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> {
>    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> 
> #ifdef RTE_ETHDEV_DEBUG_RX
>         RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
>         RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> 
>         if (queue_id >= dev->data->nb_rx_queues) {
>                 RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
>                 return NULL;
>         }
> #endif
>   return dev->data->rx_queues[queue_id];
> }
> 
> static inline uint16_t
> _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
> {
>     struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> 
> #ifdef RTE_ETHDEV_RXTX_CALLBACKS
>         struct rte_eth_rxtx_callback *cb;
> 
>         /* __ATOMIC_RELEASE memory order was used when the
>          * call back was inserted into the list.
>          * Since there is a clear dependency between loading
>          * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
>          * not required.
>          */
>         cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
>                                 __ATOMIC_RELAXED);
> 
>         if (unlikely(cb != NULL)) {
>                 do {
>                         nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
>                                                 nb_pkts, cb->param);
>                         cb = cb->next;
>                 } while (cb != NULL);
>         }
> #endif
> 
>         rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
>         return nb_rx;
>  }
> 
> Now, as you said above, in rte_ethdev.h we will keep only a flat array
> with pointers to 'fast' functions:
> struct {
>      eth_rx_burst_t             rx_pkt_burst
>       eth_tx_burst_t             tx_pkt_burst;
>       eth_tx_prep_t              tx_pkt_prepare;
>      .....
> } rte_eth_dev_burst[];
> 
> And rte_eth_rx_burst() will look like:
> 
> static inline uint16_t
> rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
>                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
> {
>     if (port_id >= RTE_MAX_ETHPORTS)
>         return 0;
>    return rte_eth_dev_burst[port_id](port_id, queue_id, rx_pkts, nb_pkts);
> }
> 
> Yes, it will require changes in *all* PMDs, but as I said before the changes will be a mechanic ones.
> 

I did not like the idea of pushing the responsibility for calling the Rx/Tx callbacks to
the drivers; I think it should stay in the ethdev layer.

What about making 'rte_eth_rx_epilog' a proper API and calling it from 'rte_eth_rx_burst()'?
That adds another function call for the Rx/Tx callbacks, but shouldn't affect the Rx/Tx
burst itself.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-21 14:05  0%                 ` Ferruh Yigit
@ 2021-06-21 14:42  0%                   ` Ananyev, Konstantin
  2021-06-21 15:32  0%                     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-06-21 14:42 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil


> >>>>>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
> >>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
> >>>>>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
> >>>>>> any regressions.
> >>>>>> That could still be flat array with max_size specified at application startup.
> >>>>>> 2. Hide rest of rte_ethdev struct in .c.
> >>>>>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
> >>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
> >>>>>>
> >>>>>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
> >>>>>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
> >>>>>> Probably some macro can be provided to simplify it.
> >>>>>>
> >>>>>
> >>>>> We are already planning some tasks for ABI stability for v21.11, I think
> >>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
> >>>>> internal data.
> >>>>
> >>>> Ok, sounds good.
> >>>>
> >>>>>
> >>>>>> The only significant complication I can foresee with implementing that approach -
> >>>>>> we'll need a an array of 'fast' function pointers per queue, not per device as we have now
> >>>>>> (to avoid extra indirection for callback implementation).
> >>>>>> Though as a bonus we'll have ability to use different RX/TX funcions per queue.
> >>>>>>
> >>>>>
> >>>>> What do you think split Rx/Tx callback into its own struct too?
> >>>>>
> >>>>> Overall 'rte_eth_dev' can be split into three as:
> >>>>> 1. rte_eth_dev
> >>>>> 2. rte_eth_dev_burst
> >>>>> 3. rte_eth_dev_cb
> >>>>>
> >>>>> And we can hide 1 from applications even with the inline functions.
> >>>>
> >>>> As discussed off-line, I think:
> >>>> it is possible.
> >>>> My absolute preference would be to have just 1/2 (with CB hidden).
> >>>
> >>> How can we hide the callbacks since they are used by inline burst functions.
> >>
> >> I probably I owe a better explanation to what I meant in first mail.
> >> Otherwise it sounds confusing.
> >> I'll try to write a more detailed one in next few days.
> >
> > Actually I gave it another thought over weekend, and might be we can
> > hide rte_eth_dev_cb even in a simpler way. I'd use eth_rx_burst() as
> > an example, but the same principle applies to other 'fast' functions.
> >
> >  1. Needed changes for PMDs rx_pkt_burst():
> >     a) change function prototype to accept 'uint16_t port_id' and 'uint16_t queue_id',
> >          instead of current 'void *'.
> >     b) Each PMD rx_pkt_burst() will have to call rte_eth_rx_epilog() function at return.
> >          This  inline function will do all CB calls for that queue.
> >
> > To be more specific, let say we have some PMD: xyz with RX function:
> >
> > uint16_t
> > xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> > {
> >      struct xyz_rx_queue *rxq = rx_queue;
> >      uint16_t nb_rx = 0;
> >
> >      /* do actual stuff here */
> >     ....
> >     return nb_rx;
> > }
> >
> > It will be transformed to:
> >
> > uint16_t
> > xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> > {
> >          struct xyz_rx_queue *rxq;
> >          uint16_t nb_rx;
> >
> >          rxq = _rte_eth_rx_prolog(port_id, queue_id);
> >          if (rxq == NULL)
> >              return 0;
> >          nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
> >          return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts, nb_pkts);
> > }
> >
> > And somewhere in ethdev_private.h:
> >
> > static inline void *
> > _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> > {
> >    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >
> > #ifdef RTE_ETHDEV_DEBUG_RX
> >         RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
> >         RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> >
> >         if (queue_id >= dev->data->nb_rx_queues) {
> >                 RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
> >                 return NULL;
> >         }
> > #endif
> >   return dev->data->rx_queues[queue_id];
> > }
> >
> > static inline uint16_t
> > _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
> > {
> >     struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >
> > #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> >         struct rte_eth_rxtx_callback *cb;
> >
> >         /* __ATOMIC_RELEASE memory order was used when the
> >          * call back was inserted into the list.
> >          * Since there is a clear dependency between loading
> >          * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> >          * not required.
> >          */
> >         cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
> >                                 __ATOMIC_RELAXED);
> >
> >         if (unlikely(cb != NULL)) {
> >                 do {
> >                         nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
> >                                                 nb_pkts, cb->param);
> >                         cb = cb->next;
> >                 } while (cb != NULL);
> >         }
> > #endif
> >
> >         rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
> >         return nb_rx;
> >  }
> >
> > Now, as you said above, in rte_ethdev.h we will keep only a flat array
> > with pointers to 'fast' functions:
> > struct {
> >      eth_rx_burst_t             rx_pkt_burst
> >       eth_tx_burst_t             tx_pkt_burst;
> >       eth_tx_prep_t              tx_pkt_prepare;
> >      .....
> > } rte_eth_dev_burst[];
> >
> > And rte_eth_rx_burst() will look like:
> >
> > static inline uint16_t
> > rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
> >                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
> > {
> >     if (port_id >= RTE_MAX_ETHPORTS)
> >         return 0;
> >    return rte_eth_dev_burst[port_id](port_id, queue_id, rx_pkts, nb_pkts);
> > }
> >
> > Yes, it will require changes in *all* PMDs, but as I said before the changes will be a mechanic ones.
> >
> 
> I did not like the idea to push to calling Rx/TX callbacks responsibility to the
> drivers, I think it should be in the ethdev layer.

Well, I'd say it is an ethdev layer function that has to be called by PMD 😊

> 
> What about making 'rte_eth_rx_epilog' an API and call from 'rte_eth_rx_burst()',
> which will add another function call for Rx/Tx callback but shouldn't affect the
> Rx/Tx burst.

But then we either need to expose the callback information to the user or pay the penalty
of an extra function call, correct?



^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] devtools: script to track map symbols
  @ 2021-06-21 15:11  5% ` Ray Kinsella
  0 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-06-21 15:11 UTC (permalink / raw)
  To: dev; +Cc: stephen, ferruh.yigit, thomas, ktraynor, bruce.richardson, mdr

Script to track growth of stable and experimental symbols
over releases since v19.11.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 devtools/count_symbols.py | 230 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 230 insertions(+)
 create mode 100755 devtools/count_symbols.py

diff --git a/devtools/count_symbols.py b/devtools/count_symbols.py
new file mode 100755
index 0000000000..7b29651044
--- /dev/null
+++ b/devtools/count_symbols.py
@@ -0,0 +1,230 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+from pathlib import Path
+import sys, os
+import subprocess
+import argparse
+import re
+import datetime
+
+try:
+        from parsley import makeGrammar
+except ImportError:
+        print('This script uses the package Parsley to parse C Mapfiles.\n'
+              'This can be installed with \"pip install parsley".')
+        exit()
+
+symbolMapGrammar = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+#abi_ver = ['21', '20.0.1', '20.0', '20']
+
+def get_abi_versions():
+    year = datetime.date.today().year - 2000
+    s=" |".join(['\'{}\''.format(i) for i in reversed(range(21, year + 1)) ])
+    s = s + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return s
+
+def get_dpdk_releases():
+    year = datetime.date.today().year - 2000
+    s="|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile('^\"v(' + s + ')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    result = subprocess.run(cmd, \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE)
+    if result.stderr.startswith(b'fatal'):
+        result = None
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+
+def get_terminal_rows():
+    rows, _ = os.popen('stty size', 'r').read().split()
+    return int(rows)
+
+def fix_directory_name(path):
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+# fix removal of the librte_ from the directory names
+def directory_renamed(path, rel):
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    result = subprocess.run(['git', 'show', tagfile], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE)
+    if result.stderr.startswith(b'fatal'):
+        result = None
+
+    return result
+
+# fix renaming of map files
+def mapfile_renamed(path, rel):
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) > 1 or len(dentries) == 0:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[len(dparts) - 1]
+
+    if(newfile is not None):
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE)
+        if result.stderr.startswith(b'fatal'):
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+# renaming of the map file & renaming of directory
+def mapfile_and_directory_renamed(path, rel):
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+fix_strategies = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+fmt = col_fmt = ""
+
+def set_terminal_output(dpdk_rel):
+    global fmt, col_fmt
+
+    fmt = '{:<50}'
+    col_fmt = fmt
+    for rel in dpdk_rel:
+        fmt += '{:<6}{:<6}'
+        col_fmt += '{:<12}'
+
+def set_csv_output(dpdk_rel):
+    global fmt, col_fmt
+
+    fmt = '{},'
+    col_fmt = fmt
+    for rel in dpdk_rel:
+        fmt += '{},{},'
+        col_fmt += '{},,'
+
+output_formats = { None: set_terminal_output, \
+                   'terminal': set_terminal_output, \
+                   'csv': set_csv_output }
+directories = 'drivers, lib'
+
+def main():
+    global fmt, col_fmt, symbolMapGrammar
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=directories,
+                        default=directories)
+    args = parser.parse_args()
+
+    dpdk_rel = get_dpdk_releases()
+
+    # set the output format
+    output_formats[args.format_output](dpdk_rel)
+
+    column_titles = ['mapfile'] + dpdk_rel
+    print(col_fmt.format(*column_titles))
+
+    symbolMapGrammar = symbolMapGrammar.format(get_abi_versions())
+    MAPParser = makeGrammar(symbolMapGrammar, {})
+
+    terminal_rows = get_terminal_rows()
+    row = 0
+
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            csym = [0] * 2
+            relsym = [str(path)]
+
+            for rel in dpdk_rel:
+                i = csym[0] = csym[1] = 0
+                abi_sections = None
+
+                tagfile = '{}:{}'.format(rel,path)
+                result = subprocess.run(['git', 'show', tagfile], \
+                                        stdout=subprocess.PIPE, \
+                                        stderr=subprocess.PIPE)
+
+                if result.stderr.startswith(b'fatal'):
+                    result = None
+
+                while(result is None and i < len(fix_strategies)):
+                    result = fix_strategies[i](path, rel)
+                    i += 1
+
+                if result is not None:
+                    mapfile = result.stdout.decode('utf-8')
+                    abi_sections = MAPParser(mapfile).abi()
+
+                if abi_sections is not None:
+                    # which versions are present, and we care about
+                    ignore = ['EXPERIMENTAL','INTERNAL']
+                    found_ver = [ver \
+                                 for ver in abi_sections \
+                                 if ver not in ignore]
+
+                    for ver in found_ver:
+                        csym[0] += len(abi_sections[ver])
+
+                    # count experimental symbols
+                    if 'EXPERIMENTAL' in abi_sections:
+                        csym[1] = len(abi_sections['EXPERIMENTAL'])
+
+                relsym += csym
+
+            print(fmt.format(*relsym))
+            row += 1
+
+            if (terminal_rows > 0) and ((row % terminal_rows) == 0):
+                print(col_fmt.format(*column_titles))
+
+if __name__ == '__main__':
+        main()
-- 
2.26.2
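(For context, the map files this script parses are GNU ld version scripts; the grammar above matches sections named 'INTERNAL', 'EXPERIMENTAL', or 'DPDK_<abi version>'. A hand-written illustration — the symbol names are made up, not taken from any specific DPDK library:)

```
DPDK_21 {
	global:

	rte_foo_create;
	rte_foo_destroy;

	local: *;
};

EXPERIMENTAL {
	global:

	rte_foo_try_feature;
};
```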


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-21 14:42  0%                   ` Ananyev, Konstantin
@ 2021-06-21 15:32  0%                     ` Ferruh Yigit
  2021-06-21 15:37  0%                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-06-21 15:32 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil

On 6/21/2021 3:42 PM, Ananyev, Konstantin wrote:
> 
>>>>>>>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
>>>>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
>>>>>>>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
>>>>>>>> any regressions.
>>>>>>>> That could still be flat array with max_size specified at application startup.
>>>>>>>> 2. Hide rest of rte_ethdev struct in .c.
>>>>>>>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
>>>>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
>>>>>>>>
>>>>>>>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
>>>>>>>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
>>>>>>>> Probably some macro can be provided to simplify it.
>>>>>>>>
>>>>>>>
>>>>>>> We are already planning some tasks for ABI stability for v21.11, I think
>>>>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
>>>>>>> internal data.
>>>>>>
>>>>>> Ok, sounds good.
>>>>>>
>>>>>>>
>>>>>>>> The only significant complication I can foresee with implementing that approach -
>>>>>>>> we'll need a an array of 'fast' function pointers per queue, not per device as we have now
>>>>>>>> (to avoid extra indirection for callback implementation).
>>>>>>>> Though as a bonus we'll have ability to use different RX/TX funcions per queue.
>>>>>>>>
>>>>>>>
>>>>>>> What do you think split Rx/Tx callback into its own struct too?
>>>>>>>
>>>>>>> Overall 'rte_eth_dev' can be split into three as:
>>>>>>> 1. rte_eth_dev
>>>>>>> 2. rte_eth_dev_burst
>>>>>>> 3. rte_eth_dev_cb
>>>>>>>
>>>>>>> And we can hide 1 from applications even with the inline functions.
>>>>>>
>>>>>> As discussed off-line, I think:
>>>>>> it is possible.
>>>>>> My absolute preference would be to have just 1/2 (with CB hidden).
>>>>>
>>>>> How can we hide the callbacks since they are used by inline burst functions.
>>>>
>>>> I probably I owe a better explanation to what I meant in first mail.
>>>> Otherwise it sounds confusing.
>>>> I'll try to write a more detailed one in next few days.
>>>
>>> Actually I gave it another thought over weekend, and might be we can
>>> hide rte_eth_dev_cb even in a simpler way. I'd use eth_rx_burst() as
>>> an example, but the same principle applies to other 'fast' functions.
>>>
>>>  1. Needed changes for PMDs rx_pkt_burst():
>>>     a) change function prototype to accept 'uint16_t port_id' and 'uint16_t queue_id',
>>>          instead of current 'void *'.
>>>     b) Each PMD rx_pkt_burst() will have to call rte_eth_rx_epilog() function at return.
>>>          This  inline function will do all CB calls for that queue.
>>>
>>> To be more specific, let say we have some PMD: xyz with RX function:
>>>
>>> uint16_t
>>> xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>>> {
>>>      struct xyz_rx_queue *rxq = rx_queue;
>>>      uint16_t nb_rx = 0;
>>>
>>>      /* do actual stuff here */
>>>     ....
>>>     return nb_rx;
>>> }
>>>
>>> It will be transformed to:
>>>
>>> uint16_t
>>> xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
>>> {
>>>          struct xyz_rx_queue *rxq;
>>>          uint16_t nb_rx;
>>>
>>>          rxq = _rte_eth_rx_prolog(port_id, queue_id);
>>>          if (rxq == NULL)
>>>              return 0;
>>>          nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
>>>          return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts, nb_pkts);
>>> }
>>>
>>> And somewhere in ethdev_private.h:
>>>
>>> static inline void *
>>> _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
>>> {
>>>    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>>>
>>> #ifdef RTE_ETHDEV_DEBUG_RX
>>>         RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
>>>         RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
>>>
>>>         if (queue_id >= dev->data->nb_rx_queues) {
>>>                 RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
>>>                 return NULL;
>>>         }
>>> #endif
>>>   return dev->data->rx_queues[queue_id];
>>> }
>>>
>>> static inline uint16_t
>>> _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
>>> {
>>>     struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>>>
>>> #ifdef RTE_ETHDEV_RXTX_CALLBACKS
>>>         struct rte_eth_rxtx_callback *cb;
>>>
>>>         /* __ATOMIC_RELEASE memory order was used when the
>>>          * call back was inserted into the list.
>>>          * Since there is a clear dependency between loading
>>>          * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
>>>          * not required.
>>>          */
>>>         cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
>>>                                 __ATOMIC_RELAXED);
>>>
>>>         if (unlikely(cb != NULL)) {
>>>                 do {
>>>                         nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
>>>                                                 nb_pkts, cb->param);
>>>                         cb = cb->next;
>>>                 } while (cb != NULL);
>>>         }
>>> #endif
>>>
>>>         rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
>>>         return nb_rx;
>>>  }
>>>
>>> Now, as you said above, in rte_ethdev.h we will keep only a flat array
>>> with pointers to 'fast' functions:
>>> struct {
>>>      eth_rx_burst_t             rx_pkt_burst
>>>       eth_tx_burst_t             tx_pkt_burst;
>>>       eth_tx_prep_t              tx_pkt_prepare;
>>>      .....
>>> } rte_eth_dev_burst[];
>>>
>>> And rte_eth_rx_burst() will look like:
>>>
>>> static inline uint16_t
>>> rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
>>>                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
>>> {
>>>     if (port_id >= RTE_MAX_ETHPORTS)
>>>         return 0;
>>>    return rte_eth_dev_burst[port_id](port_id, queue_id, rx_pkts, nb_pkts);
>>> }
>>>
>>> Yes, it will require changes in *all* PMDs, but as I said before the changes will be a mechanic ones.
>>>
>>
>> I did not like the idea to push to calling Rx/TX callbacks responsibility to the
>> drivers, I think it should be in the ethdev layer.
> 
> Well, I'd say it is an ethdev layer function that has to be called by PMD 😊
> 
>>
>> What about making 'rte_eth_rx_epilog' an API and call from 'rte_eth_rx_burst()',
>> which will add another function call for Rx/Tx callback but shouldn't affect the
>> Rx/Tx burst.
> 
> But then we either need to expose call-back information to the user or pay the penalty
> for extra function call, correct?
> 

Right. As a middle ground, we can keep the Rx/Tx burst functions inline, but move the Rx/Tx
callback handling into a real function, so the extra call is paid only when callbacks are used.


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3] devtools: script to track map symbols
  2021-06-18 16:36  5% [dpdk-dev] [PATCH] devtools: script to track map symbols Ray Kinsella
@ 2021-06-21 15:25  6% ` Ray Kinsella
  2021-06-21 15:35  6% ` [dpdk-dev] [PATCH v4] " Ray Kinsella
  2021-06-22 10:19  6% ` [dpdk-dev] [PATCH v5] " Ray Kinsella
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-06-21 15:25 UTC (permalink / raw)
  To: dev; +Cc: stephen, ferruh.yigit, thomas, ktraynor, bruce.richardson, mdr

Script to track growth of stable and experimental symbols
over releases since v19.11.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: reworked to fix pylint errors
v3: sent with the current in-reply-to

 devtools/count_symbols.py | 262 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100755 devtools/count_symbols.py

diff --git a/devtools/count_symbols.py b/devtools/count_symbols.py
new file mode 100755
index 0000000000..30be09754f
--- /dev/null
+++ b/devtools/count_symbols.py
@@ -0,0 +1,262 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count the number of symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+import re
+import datetime
+
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with \"pip install parsley".')
+    sys.exit()
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible dpdk abi versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of dpdk release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit()
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend librte to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of map files'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) > 1 or len(dentries) == 0:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[len(dparts) - 1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    rows, _ = os.popen('stty size', 'r').read().split()
+    return int(rows)
+
+class FormatOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format to Tabbed Separated Values'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self,symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*symbols))
+        self.row += 1
+
+        if((self.terminal_rows>0) and ((self.row % self.terminal_rows) == 0)):
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                       'terminal': set_terminal_output, \
+                       'csv': set_csv_output }
+
+SRC_DIRECTORIES = 'drivers, lib'
+IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def count_release_symbols(map_parser, release, mapfile_path):
+    '''Count the symbols for a given release and mapfile'''
+    csym = [0] * 2
+    abi_sections = None
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    if abi_sections is not None:
+        # which versions are present, and we care about
+        found_ver = [ver \
+                     for ver in abi_sections \
+                     if ver not in IGNORE_SECTIONS]
+
+        for ver in found_ver:
+            csym[0] += len(abi_sections[ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in abi_sections:
+            csym[1] = len(abi_sections['EXPERIMENTAL'])
+
+    return csym
+
+def main():
+    '''Main entry point'''
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES,
+                        default=SRC_DIRECTORIES)
+    args = parser.parse_args()
+
+    dpdk_releases = get_dpdk_releases()
+    format_output = FormatOutput(args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            relsym = [str(path)]
+
+            for release in dpdk_releases:
+                csym = count_release_symbols(map_parser, release, path)
+                relsym += csym
+
+            format_output.print_row(relsym)
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2


^ permalink raw reply	[relevance 6%]
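The patch above uses a Parsley PEG grammar to pull the symbol names out of each version block of a linker map file, then counts them per section. For readers who don't have Parsley installed, the counting step can be approximated with the standard library alone. This is an illustration only: the section and symbol names below are made up, and the real grammar additionally handles comments and the `DPDK_VER` inheritance suffix.

```python
import re

# Toy version script in the style of a DPDK .map file (names are invented).
TOY_MAP = """
DPDK_21 {
    global:

    rte_foo_create;
    rte_foo_destroy;

    local: *;
};

EXPERIMENTAL {
    global:

    rte_foo_probe;
};
"""

def count_symbols(mapfile_text):
    """Return {section_name: symbol_count} for a toy version-script string."""
    counts = {}
    for name, body in re.findall(r'(\w+)\s*{([^}]*)}', mapfile_text):
        # drop the 'global:' / 'local:' labels, then split on ';'
        body = re.sub(r'\b(global|local)\s*:', '', body)
        syms = [s.strip() for s in body.split(';')]
        # ignore empty chunks and the 'local: *' wildcard remnant
        counts[name] = len([s for s in syms if s and s != '*'])
    return counts
```

With the toy input, `count_symbols(TOY_MAP)` yields two stable symbols in `DPDK_21` and one experimental symbol, which is exactly the stable/experimental split the script tabulates per release.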

* Re: [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays
  2021-06-21 15:32  0%                     ` Ferruh Yigit
@ 2021-06-21 15:37  0%                       ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2021-06-21 15:37 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon, Richardson, Bruce
  Cc: Morten Brørup, dev, olivier.matz, andrew.rybchenko,
	honnappa.nagarahalli, jerinj, gakhil


> 
> On 6/21/2021 3:42 PM, Ananyev, Konstantin wrote:
> >
> >>>>>>>> One more thought here - if we are talking about rte_ethdev[] in particular, I think  we can:
> >>>>>>>> 1. move public function pointers (rx_pkt_burst(), etc.) from rte_ethdev into a separate flat array.
> >>>>>>>> We can keep it public to still use inline functions for 'fast' calls rte_eth_rx_burst(), etc. to avoid
> >>>>>>>> any regressions.
> >>>>>>>> That could still be flat array with max_size specified at application startup.
> >>>>>>>> 2. Hide rest of rte_ethdev struct in .c.
> >>>>>>>> That will allow us to change the struct itself and the whole rte_ethdev[] table in a way we like
> >>>>>>>> (flat array, vector, hash, linked list) without ABI/API breakages.
> >>>>>>>>
> >>>>>>>> Yes, it would require all PMDs to change prototype for pkt_rx_burst() function
> >>>>>>>> (to accept port_id, queue_id instead of queue pointer), but the change is mechanical one.
> >>>>>>>> Probably some macro can be provided to simplify it.
> >>>>>>>>
> >>>>>>>
> >>>>>>> We are already planning some tasks for ABI stability for v21.11, I think
> >>>>>>> splitting 'struct rte_eth_dev' can be part of that task, it enables hiding more
> >>>>>>> internal data.
> >>>>>>
> >>>>>> Ok, sounds good.
> >>>>>>
> >>>>>>>
> >>>>>>>> The only significant complication I can foresee with implementing that approach -
> >>>>>>>> we'll need a an array of 'fast' function pointers per queue, not per device as we have now
> >>>>>>>> (to avoid extra indirection for callback implementation).
> >>>>>>>> Though as a bonus we'll have ability to use different RX/TX funcions per queue.
> >>>>>>>>
> >>>>>>>
> >>>>>>> What do you think split Rx/Tx callback into its own struct too?
> >>>>>>>
> >>>>>>> Overall 'rte_eth_dev' can be split into three as:
> >>>>>>> 1. rte_eth_dev
> >>>>>>> 2. rte_eth_dev_burst
> >>>>>>> 3. rte_eth_dev_cb
> >>>>>>>
> >>>>>>> And we can hide 1 from applications even with the inline functions.
> >>>>>>
> >>>>>> As discussed off-line, I think:
> >>>>>> it is possible.
> >>>>>> My absolute preference would be to have just 1/2 (with CB hidden).
> >>>>>
> >>>>> How can we hide the callbacks since they are used by inline burst functions.
> >>>>
> >>>> I probably I owe a better explanation to what I meant in first mail.
> >>>> Otherwise it sounds confusing.
> >>>> I'll try to write a more detailed one in next few days.
> >>>
> >>> Actually I gave it another thought over weekend, and might be we can
> >>> hide rte_eth_dev_cb even in a simpler way. I'd use eth_rx_burst() as
> >>> an example, but the same principle applies to other 'fast' functions.
> >>>
> >>>  1. Needed changes for PMDs rx_pkt_burst():
> >>>     a) change function prototype to accept 'uint16_t port_id' and 'uint16_t queue_id',
> >>>          instead of current 'void *'.
> >>>     b) Each PMD rx_pkt_burst() will have to call rte_eth_rx_epilog() function at return.
> >>>          This  inline function will do all CB calls for that queue.
> >>>
> >>> To be more specific, let say we have some PMD: xyz with RX function:
> >>>
> >>> uint16_t
> >>> xyz_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> >>> {
> >>>      struct xyz_rx_queue *rxq = rx_queue;
> >>>      uint16_t nb_rx = 0;
> >>>
> >>>      /* do actual stuff here */
> >>>     ....
> >>>     return nb_rx;
> >>> }
> >>>
> >>> It will be transformed to:
> >>>
> >>> uint16_t
> >>> xyz_recv_pkts(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
> >>> {
> >>>          struct xyz_rx_queue *rxq;
> >>>          uint16_t nb_rx;
> >>>
> >>>          rxq = _rte_eth_rx_prolog(port_id, queue_id);
> >>>          if (rxq == NULL)
> >>>              return 0;
> >>>          nb_rx = _xyz_real_recv_pkts(rxq, rx_pkts, nb_pkts);
> >>>          return _rte_eth_rx_epilog(port_id, queue_id, rx_pkts, nb_pkts);
> >>> }
> >>>
> >>> And somewhere in ethdev_private.h:
> >>>
> >>> static inline void *
> >>> _rte_eth_rx_prolog(uint16_t port_id, uint16_t queue_id);
> >>> {
> >>>    struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >>>
> >>> #ifdef RTE_ETHDEV_DEBUG_RX
> >>>         RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
> >>>         RTE_FUNC_PTR_OR_ERR_RET(*dev->rx_pkt_burst, NULL);
> >>>
> >>>         if (queue_id >= dev->data->nb_rx_queues) {
> >>>                 RTE_ETHDEV_LOG(ERR, "Invalid RX queue_id=%u\n", queue_id);
> >>>                 return NULL;
> >>>         }
> >>> #endif
> >>>   return dev->data->rx_queues[queue_id];
> >>> }
> >>>
> >>> static inline uint16_t
> >>> _rte_eth_rx_epilog(uint16_t port_id, uint16_t queue_id, struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);
> >>> {
> >>>     struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> >>>
> >>> #ifdef RTE_ETHDEV_RXTX_CALLBACKS
> >>>         struct rte_eth_rxtx_callback *cb;
> >>>
> >>>         /* __ATOMIC_RELEASE memory order was used when the
> >>>          * call back was inserted into the list.
> >>>          * Since there is a clear dependency between loading
> >>>          * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> >>>          * not required.
> >>>          */
> >>>         cb = __atomic_load_n(&dev->post_rx_burst_cbs[queue_id],
> >>>                                 __ATOMIC_RELAXED);
> >>>
> >>>         if (unlikely(cb != NULL)) {
> >>>                 do {
> >>>                         nb_rx = cb->fn.rx(port_id, queue_id, rx_pkts, nb_rx,
> >>>                                                 nb_pkts, cb->param);
> >>>                         cb = cb->next;
> >>>                 } while (cb != NULL);
> >>>         }
> >>> #endif
> >>>
> >>>         rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
> >>>         return nb_rx;
> >>>  }
> >>>
> >>> Now, as you said above, in rte_ethdev.h we will keep only a flat array
> >>> with pointers to 'fast' functions:
> >>> struct {
> >>>      eth_rx_burst_t             rx_pkt_burst
> >>>       eth_tx_burst_t             tx_pkt_burst;
> >>>       eth_tx_prep_t              tx_pkt_prepare;
> >>>      .....
> >>> } rte_eth_dev_burst[];
> >>>
> >>> And rte_eth_rx_burst() will look like:
> >>>
> >>> static inline uint16_t
> >>> rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
> >>>                  struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
> >>> {
> >>>     if (port_id >= RTE_MAX_ETHPORTS)
> >>>         return 0;
> >>>    return rte_eth_dev_burst[port_id](port_id, queue_id, rx_pkts, nb_pkts);
> >>> }
> >>>
> >>> Yes, it will require changes in *all* PMDs, but as I said before the changes will be a mechanic ones.
> >>>
> >>
> >> I did not like the idea to push to calling Rx/TX callbacks responsibility to the
> >> drivers, I think it should be in the ethdev layer.
> >
> > Well, I'd say it is an ethdev layer function that has to be called by PMD 😊
> >
> >>
> >> What about making 'rte_eth_rx_epilog' an API and call from 'rte_eth_rx_burst()',
> >> which will add another function call for Rx/Tx callback but shouldn't affect the
> >> Rx/Tx burst.
> >
> > But then we either need to expose call-back information to the user or pay the penalty
> > for extra function call, correct?
> >
> 
> Right. As a middle ground, we can keep Rx/Tx burst functions as inline, but have
> the Rx/Tx callback part of it as a function, so we get the hit only for callbacks.

To avoid the hit we need to expose CB data to the user,
at least the number of callbacks currently installed for each queue.


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4] devtools: script to track map symbols
  2021-06-18 16:36  5% [dpdk-dev] [PATCH] devtools: script to track map symbols Ray Kinsella
  2021-06-21 15:25  6% ` [dpdk-dev] [PATCH v3] " Ray Kinsella
@ 2021-06-21 15:35  6% ` Ray Kinsella
  2021-06-22 10:19  6% ` [dpdk-dev] [PATCH v5] " Ray Kinsella
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-06-21 15:35 UTC (permalink / raw)
  To: dev; +Cc: stephen, ferruh.yigit, thomas, ktraynor, bruce.richardson, mdr

Script to track growth of stable and experimental symbols
over releases since v19.11.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: reworked to fix pylint errors
v3: sent with the correct in-reply-to
v4: fix typos picked up by the CI

 devtools/count_symbols.py | 262 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100755 devtools/count_symbols.py

diff --git a/devtools/count_symbols.py b/devtools/count_symbols.py
new file mode 100755
index 0000000000..6194df0318
--- /dev/null
+++ b/devtools/count_symbols.py
@@ -0,0 +1,262 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count the number of symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+import re
+import datetime
+
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with \"pip install parsley".')
+    sys.exit()
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible dpdk abi versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of dpdk release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit()
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rcs between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend librte to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of the map file'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) > 1 or len(dentries) == 0:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[len(dparts) - 1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    rows, _ = os.popen('stty size', 'r').read().split()
+    return int(rows)
+
+class FormatOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format to Tabbed Separated Values'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self,symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*symbols))
+        self.row += 1
+
+        if((self.terminal_rows>0) and ((self.row % self.terminal_rows) == 0)):
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                       'terminal': set_terminal_output, \
+                       'csv': set_csv_output }
+
+SRC_DIRECTORIES = 'drivers, lib'
+IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def count_release_symbols(map_parser, release, mapfile_path):
+    '''Count the symbols for a given release and mapfile'''
+    csym = [0] * 2
+    abi_sections = None
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    if abi_sections is not None:
+        # which versions are present, and we care about
+        found_ver = [ver \
+                     for ver in abi_sections \
+                     if ver not in IGNORE_SECTIONS]
+
+        for ver in found_ver:
+            csym[0] += len(abi_sections[ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in abi_sections:
+            csym[1] = len(abi_sections['EXPERIMENTAL'])
+
+    return csym
+
+def main():
+    '''Main entry point'''
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES,
+                        default=SRC_DIRECTORIES)
+    args = parser.parse_args()
+
+    dpdk_releases = get_dpdk_releases()
+    format_output = FormatOutput(args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            relsym = [str(path)]
+
+            for release in dpdk_releases:
+                csym = count_release_symbols(map_parser, release, path)
+                relsym += csym
+
+            format_output.print_row(relsym)
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2


^ permalink raw reply	[relevance 6%]
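A side note on how the script selects releases: `get_dpdk_releases()` builds a year-based regular expression and filters the tags reported by git against it, so release candidates ("-rc" tags) are excluded. The filtering step can be exercised on its own with a hypothetical tag list (the real script gets the list from `git for-each-ref`, iterates it in reverse, and drops the three pre-v19.11 tags):

```python
import re

# Hypothetical tag list, in the quoted form that
# `git for-each-ref --format '"%(tag)"'` prints.
tags = ['"v19.08"', '"v19.11"', '"v20.02-rc1"', '"v20.02"', '"v21.05"', '""']

year_range = "|".join(str(i) for i in range(19, 22))  # as if the year were 2021
pattern = re.compile(r'^\"v(' + year_range + r')\.\d{2}\"$')

# keep only final release tags of the form vYY.MM, stripping the quotes
releases = [tag.replace('"', '') for tag in tags if pattern.match(tag)]
```

Here `releases` ends up as `['v19.08', 'v19.11', 'v20.02', 'v21.05']`: the `-rc1` tag and the empty entry are rejected by the anchored pattern.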

* [dpdk-dev] Re: [PATCH v1 1/2] devtools: add relative path support for ABI compatibility check
  @ 2021-06-22  2:08  4%   ` Feifei Wang
  2021-06-22  9:19  4%   ` [dpdk-dev] " Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Feifei Wang @ 2021-06-22  2:08 UTC (permalink / raw)
  To: Feifei Wang, Bruce Richardson
  Cc: dev, nd, Phil Yang, Juraj Linkeš, Ruifeng Wang, nd

Hi, Bruce

Would you please help review this patch series?
Thanks.

Best Regards
Feifei

> -----Original Message-----
> From: Feifei Wang <feifei.wang2@arm.com>
> Sent: 1 June 2021 9:57
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>; Phil Yang <Phil.Yang@arm.com>;
> Feifei Wang <Feifei.Wang2@arm.com>; Juraj Linkeš
> <juraj.linkes@pantheon.tech>; Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH v1 1/2] devtools: add relative path support for ABI
> compatibility check
> 
> From: Phil Yang <phil.yang@arm.com>
> 
> Because dpdk guide does not limit the relative path for ABI compatibility
> check, users may set 'DPDK_ABI_REF_DIR' as a relative
> path:
> 
> ~/dpdk/devtools$ DPDK_ABI_REF_VERSION=v19.11
> DPDK_ABI_REF_DIR=build-gcc-shared ./test-meson-builds.sh
> 
> And if the DESTDIR is not an absolute path, ninja complains:
> + install_target build-gcc-shared/v19.11/build
> + build-gcc-shared/v19.11/build-gcc-shared
> + rm -rf build-gcc-shared/v19.11/build-gcc-shared
> + echo 'DESTDIR=build-gcc-shared/v19.11/build-gcc-shared ninja -C build-gcc-
> shared/v19.11/build install'
> + DESTDIR=build-gcc-shared/v19.11/build-gcc-shared
> + ninja -C build-gcc-shared/v19.11/build install
> ...
> ValueError: dst_dir must be absolute, got build-gcc-shared/v19.11/build-gcc-
> shared/usr/local/share/dpdk/
> examples/bbdev_app
> ...
> Error: install directory 'build-gcc-shared/v19.11/build-gcc-shared' does not
> exist.
> 
> To fix this, add relative path support using 'readlink -f'.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  devtools/test-meson-builds.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
> index daf817ac3e..43b906598d 100755
> --- a/devtools/test-meson-builds.sh
> +++ b/devtools/test-meson-builds.sh
> @@ -168,7 +168,8 @@ build () # <directory> <target cc | cross file> <ABI
> check> [meson options]
>  	config $srcdir $builds_dir/$targetdir $cross --werror $*
>  	compile $builds_dir/$targetdir
>  	if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
> -		abirefdir=${DPDK_ABI_REF_DIR:-
> reference}/$DPDK_ABI_REF_VERSION
> +		abirefdir=$(readlink -f \
> +			${DPDK_ABI_REF_DIR:-
> reference}/$DPDK_ABI_REF_VERSION)
>  		if [ ! -d $abirefdir/$targetdir ]; then
>  			# clone current sources
>  			if [ ! -d $abirefdir/src ]; then
> --
> 2.25.1


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1 1/2] devtools: add relative path support for ABI compatibility check
    2021-06-22  2:08  4%   ` [dpdk-dev] Re: " Feifei Wang
@ 2021-06-22  9:19  4%   ` Bruce Richardson
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2021-06-22  9:19 UTC (permalink / raw)
  To: Feifei Wang; +Cc: dev, nd, Phil Yang, Juraj Linkeš, Ruifeng Wang

On Tue, Jun 01, 2021 at 09:56:52AM +0800, Feifei Wang wrote:
> From: Phil Yang <phil.yang@arm.com>
> 
> Because dpdk guide does not limit the relative path for ABI
> compatibility check, users may set 'DPDK_ABI_REF_DIR' as a relative
> path:
> 
> ~/dpdk/devtools$ DPDK_ABI_REF_VERSION=v19.11 DPDK_ABI_REF_DIR=build-gcc-shared
> ./test-meson-builds.sh
> 
> And if the DESTDIR is not an absolute path, ninja complains:
> + install_target build-gcc-shared/v19.11/build build-gcc-shared/v19.11/build-gcc-shared
> + rm -rf build-gcc-shared/v19.11/build-gcc-shared
> + echo 'DESTDIR=build-gcc-shared/v19.11/build-gcc-shared ninja -C build-gcc-shared/v19.11/build install'
> + DESTDIR=build-gcc-shared/v19.11/build-gcc-shared
> + ninja -C build-gcc-shared/v19.11/build install
> ...
> ValueError: dst_dir must be absolute, got build-gcc-shared/v19.11/build-gcc-shared/usr/local/share/dpdk/
> examples/bbdev_app
> ...
> Error: install directory 'build-gcc-shared/v19.11/build-gcc-shared' does not exist.
> 
> To fix this, add relative path support using 'readlink -f'.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Signed-off-by: Feifei Wang <feifei.wang2@arm.com>
> Reviewed-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  devtools/test-meson-builds.sh | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/devtools/test-meson-builds.sh b/devtools/test-meson-builds.sh
> index daf817ac3e..43b906598d 100755
> --- a/devtools/test-meson-builds.sh
> +++ b/devtools/test-meson-builds.sh
> @@ -168,7 +168,8 @@ build () # <directory> <target cc | cross file> <ABI check> [meson options]
>  	config $srcdir $builds_dir/$targetdir $cross --werror $*
>  	compile $builds_dir/$targetdir
>  	if [ -n "$DPDK_ABI_REF_VERSION" -a "$abicheck" = ABI ] ; then
> -		abirefdir=${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION
> +		abirefdir=$(readlink -f \
> +			${DPDK_ABI_REF_DIR:-reference}/$DPDK_ABI_REF_VERSION)
>  		if [ ! -d $abirefdir/$targetdir ]; then
>  			# clone current sources
>  			if [ ! -d $abirefdir/src ]; then

This looks a simple enough change.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[relevance 4%]
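The effect of the `readlink -f` fix in the patch above can be illustrated outside the shell script: Python's `os.path.realpath` performs the same canonicalization of a relative path into an absolute one (the paths below are hypothetical and, unlike GNU `readlink -f`, `realpath` does not require them to exist).

```python
import os

ref_dir = 'build-gcc-shared'   # hypothetical relative DPDK_ABI_REF_DIR
ref_ver = 'v19.11'

# equivalent in spirit to: abirefdir=$(readlink -f "$ref_dir/$ref_ver")
abirefdir = os.path.realpath(os.path.join(ref_dir, ref_ver))

# ninja's DESTDIR requirement is satisfied because the result is absolute
print(os.path.isabs(abirefdir))  # prints True
```

This is why the one-line change is enough: once the reference directory is resolved before being handed to `DESTDIR`, the "dst_dir must be absolute" error in the report goes away.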

* [dpdk-dev] [PATCH v5] devtools: script to track map symbols
  2021-06-18 16:36  5% [dpdk-dev] [PATCH] devtools: script to track map symbols Ray Kinsella
  2021-06-21 15:25  6% ` [dpdk-dev] [PATCH v3] " Ray Kinsella
  2021-06-21 15:35  6% ` [dpdk-dev] [PATCH v4] " Ray Kinsella
@ 2021-06-22 10:19  6% ` Ray Kinsella
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-06-22 10:19 UTC (permalink / raw)
  To: dev; +Cc: stephen, ferruh.yigit, thomas, ktraynor, bruce.richardson, mdr

Script to track growth of stable and experimental symbols
over releases since v19.11.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: reworked to fix pylint errors
v3: sent with the correct in-reply-to
v4: fix typos picked up by the CI
v5: fix terminal_size & directory args

 devtools/count_symbols.py | 262 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 262 insertions(+)
 create mode 100755 devtools/count_symbols.py

diff --git a/devtools/count_symbols.py b/devtools/count_symbols.py
new file mode 100755
index 0000000000..96990f609f
--- /dev/null
+++ b/devtools/count_symbols.py
@@ -0,0 +1,262 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 Intel Corporation
+'''Tool to count the number of symbols in each DPDK release'''
+from pathlib import Path
+import sys
+import os
+import subprocess
+import argparse
+import re
+import datetime
+
+try:
+    from parsley import makeGrammar
+except ImportError:
+    print('This script uses the package Parsley to parse C Mapfiles.\n'
+          'This can be installed with "pip install parsley".')
+    sys.exit()
+
+MAP_GRAMMAR = r"""
+
+ws = (' ' | '\r' | '\n' | '\t')*
+
+ABI_VER = ({})
+DPDK_VER = ('DPDK_' ABI_VER)
+ABI_NAME = ('INTERNAL' | 'EXPERIMENTAL' | DPDK_VER)
+comment = '#' (~'\n' anything)+ '\n'
+symbol = (~(';' | '}}' | '#') anything )+:c ';' -> ''.join(c)
+global = 'global:'
+local = 'local: *;'
+symbols = comment* symbol:s ws comment* -> s
+
+abi = (abi_section+):m -> dict(m)
+abi_section = (ws ABI_NAME:e ws '{{' ws global* (~local ws symbols)*:s ws local* ws '}}' ws DPDK_VER* ';' ws) -> (e,s)
+"""
+
+def get_abi_versions():
+    '''Returns a string of possible DPDK ABI versions'''
+
+    year = datetime.date.today().year - 2000
+    tags = " |".join(['\'{}\''.format(i) \
+                     for i in reversed(range(21, year + 1)) ])
+    tags  = tags + ' | \'20.0.1\' | \'20.0\' | \'20\''
+
+    return tags
+
+def get_dpdk_releases():
+    '''Returns a list of DPDK release tag names since v19.11'''
+
+    year = datetime.date.today().year - 2000
+    year_range = "|".join("{}".format(i) for i in range(19,year + 1))
+    pattern = re.compile(r'^\"v(' +  year_range + r')\.\d{2}\"$')
+
+    cmd = ['git', 'for-each-ref', '--sort=taggerdate', '--format', '"%(tag)"']
+    try:
+        result = subprocess.run(cmd, \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        print("Failed to interrogate git for release tags")
+        sys.exit()
+
+
+    tags = result.stdout.decode('utf-8').split('\n')
+
+    # find the non-rc releases between now and v19.11
+    tags = [ tag.replace('\"','') \
+             for tag in reversed(tags) \
+             if pattern.match(tag) ][:-3]
+
+    return tags
+
+def fix_directory_name(path):
+    '''Prepend the librte_ prefix to the source directory name'''
+    mapfilepath1 = str(path.parent.name)
+    mapfilepath2 = str(path.parents[1])
+    mapfilepath = mapfilepath2 + '/librte_' + mapfilepath1
+
+    return mapfilepath
+
+def directory_renamed(path, rel):
+    '''Fix removal of the librte_ prefix from the directory names'''
+
+    mapfilepath = fix_directory_name(path)
+    tagfile = '{}:{}/{}'.format(rel, mapfilepath,  path.name)
+
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    return result
+
+def mapfile_renamed(path, rel):
+    '''Fix renaming of the map file'''
+    newfile = None
+
+    result = subprocess.run(['git', 'ls-tree', \
+                             rel, str(path.parent) + '/'], \
+                            stdout=subprocess.PIPE, \
+                            stderr=subprocess.PIPE,
+                            check=True)
+    dentries = result.stdout.decode('utf-8')
+    dentries = dentries.split('\n')
+
+    # filter entries looking for the map file
+    dentries = [dentry for dentry in dentries if dentry.endswith('.map')]
+    if len(dentries) > 1 or len(dentries) == 0:
+        return None
+
+    dparts = dentries[0].split('/')
+    newfile = dparts[len(dparts) - 1]
+
+    if newfile is not None:
+        tagfile = '{}:{}/{}'.format(rel, path.parent, newfile)
+
+        try:
+            result = subprocess.run(['git', 'show', tagfile], \
+                                    stdout=subprocess.PIPE, \
+                                    stderr=subprocess.PIPE,
+                                    check=True)
+        except subprocess.CalledProcessError:
+            result = None
+
+    else:
+        result = None
+
+    return result
+
+def mapfile_and_directory_renamed(path, rel):
+    '''Fix renaming of the map file & the source directory'''
+    mapfilepath = Path("{}/{}".format(fix_directory_name(path),path.name))
+
+    return mapfile_renamed(mapfilepath, rel)
+
+def get_terminal_rows():
+    '''Find the number of rows in the terminal'''
+
+    return os.get_terminal_size().lines
+
+class FormatOutput():
+    '''Format the output to supported formats'''
+    output_fmt = ""
+    column_fmt = ""
+
+    def __init__(self, format_output, dpdk_releases):
+        self.OUTPUT_FORMATS[format_output](self,dpdk_releases)
+        self.column_titles = ['mapfile'] +  dpdk_releases
+
+        self.terminal_rows = get_terminal_rows()
+        self.row = 0
+
+    def set_terminal_output(self,dpdk_rel):
+        '''Set the output format for terminal display'''
+
+        self.output_fmt = '{:<50}' + \
+            ''.join(['{:<6}{:<6}'] * (len(dpdk_rel)))
+        self.column_fmt = '{:50}' + \
+            ''.join(['{:<12}'] * (len(dpdk_rel)))
+
+    def set_csv_output(self,dpdk_rel):
+        '''Set the output format to Comma Separated Values'''
+
+        self.output_fmt = '{},' + \
+            ','.join(['{},{}'] * (len(dpdk_rel)))
+        self.column_fmt = '{},' + \
+            ','.join(['{},'] * (len(dpdk_rel)))
+
+    def print_columns(self):
+        '''Print column rows with release names'''
+        print(self.column_fmt.format(*self.column_titles))
+        self.row += 1
+
+    def print_row(self,symbols):
+        '''Print row of symbol values'''
+        print(self.output_fmt.format(*symbols))
+        self.row += 1
+
+        if((self.terminal_rows>0) and ((self.row % self.terminal_rows) == 0)):
+            self.print_columns()
+
+    OUTPUT_FORMATS = { None: set_terminal_output, \
+                       'terminal': set_terminal_output, \
+                       'csv': set_csv_output }
+
+SRC_DIRECTORIES = 'drivers,lib'
+IGNORE_SECTIONS = ['EXPERIMENTAL','INTERNAL']
+FIX_STRATEGIES = [directory_renamed, \
+                  mapfile_renamed, \
+                  mapfile_and_directory_renamed]
+
+def count_release_symbols(map_parser, release, mapfile_path):
+    '''Count the symbols for a given release and mapfile'''
+    csym = [0] * 2
+    abi_sections = None
+
+    tagfile = '{}:{}'.format(release,mapfile_path)
+    try:
+        result = subprocess.run(['git', 'show', tagfile], \
+                                stdout=subprocess.PIPE, \
+                                stderr=subprocess.PIPE,
+                                check=True)
+    except subprocess.CalledProcessError:
+        result = None
+
+    for fix_strategy in FIX_STRATEGIES:
+        if result is not None:
+            break
+        result = fix_strategy(mapfile_path, release)
+
+    if result is not None:
+        mapfile = result.stdout.decode('utf-8')
+        abi_sections = map_parser(mapfile).abi()
+
+    if abi_sections is not None:
+        # which versions are present and of interest (non-ignored)
+        found_ver = [ver \
+                     for ver in abi_sections \
+                     if ver not in IGNORE_SECTIONS]
+
+        for ver in found_ver:
+            csym[0] += len(abi_sections[ver])
+
+        # count experimental symbols
+        if 'EXPERIMENTAL' in abi_sections:
+            csym[1] = len(abi_sections['EXPERIMENTAL'])
+
+    return csym
+
+def main():
+    '''Main entry point'''
+
+    parser = argparse.ArgumentParser(description='Count symbols in DPDK Libs')
+    parser.add_argument('--format-output', choices=['terminal','csv'], \
+                        default='terminal')
+    parser.add_argument('--directory', choices=SRC_DIRECTORIES.split(','),
+                        default=SRC_DIRECTORIES)
+    args = parser.parse_args()
+
+    dpdk_releases = get_dpdk_releases()
+    format_output = FormatOutput(args.format_output, dpdk_releases)
+
+    map_grammar = MAP_GRAMMAR.format(get_abi_versions())
+    map_parser = makeGrammar(map_grammar, {})
+
+    format_output.print_columns()
+    for src_dir in args.directory.split(','):
+        for path in Path(src_dir).rglob('*.map'):
+            relsym = [str(path)]
+
+            for release in dpdk_releases:
+                csym = count_release_symbols(map_parser, release, path)
+                relsym += csym
+
+            format_output.print_row(relsym)
+
+if __name__ == '__main__':
+    main()
-- 
2.26.2
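
As a rough illustration of what the Parsley grammar in the script extracts, here is a simplified regex-based sketch (independent of the script and far less robust than the real grammar; symbol names are made up) of counting stable vs EXPERIMENTAL symbols per section of a version.map:

```python
import re

# Toy version.map content with one stable and one experimental section.
mapfile = """
DPDK_21 {
    global:
    rte_foo_create;
    rte_foo_free;
    local: *;
};
EXPERIMENTAL {
    global:
    rte_foo_new_thing;
};
"""

# section name -> raw section body
sections = dict(re.findall(r'(\w+)\s*{([^}]*)}', mapfile))

# count the 'symbol;' entries, ignoring the 'local: *;' catch-all
counts = {name: len(re.findall(r'(\w+);', body.replace('local: *;', '')))
          for name, body in sections.items()}
print(counts)  # {'DPDK_21': 2, 'EXPERIMENTAL': 1}
```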


^ permalink raw reply	[relevance 6%]

* [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file
@ 2021-06-22 15:50 12% Ray Kinsella
  2021-06-25  8:08  7% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Ray Kinsella @ 2021-06-22 15:50 UTC (permalink / raw)
  To: dev; +Cc: stephen, ferruh.yigit, thomas, ktraynor, bruce.richardson, mdr

Remove Neil Horman from the list of ABI Policy & Versioning maintainers.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5877a16971..dab8883a4f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -117,7 +117,6 @@ F: .ci/
 
 ABI Policy & Versioning
 M: Ray Kinsella <mdr@ashroe.eu>
-M: Neil Horman <nhorman@tuxdriver.com>
 F: lib/eal/include/rte_compat.h
 F: lib/eal/include/rte_function_versioning.h
 F: doc/guides/contributing/abi_*.rst
-- 
2.26.2


^ permalink raw reply	[relevance 12%]

* Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus
  @ 2021-06-23  8:15  4%     ` Thomas Monjalon
  2021-06-23 14:52  3%       ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-06-23  8:15 UTC (permalink / raw)
  To: Xueming(Steven) Li
  Cc: Parav Pandit, dev, Wang Haiyue, Kinsella Ray, david.marchand,
	ferruh.yigit

23/06/2021 01:50, Xueming(Steven) Li:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 13/06/2021 14:58, Xueming Li:
> > > --- /dev/null
> > > +++ b/drivers/bus/auxiliary/version.map
> > > @@ -0,0 +1,7 @@
> > > +EXPERIMENTAL {
> > > +	global:
> > > +
> > > +	# added in 21.08
> > > +	rte_auxiliary_register;
> > > +	rte_auxiliary_unregister;
> > > +};
> > 
> > After more thoughts, shouldn't it be an internal symbol?
> > It is used only by DPDK drivers.
> 
> So users will not be able to compose their own driver and register with auxiliary bus?

Yes, that's an interesting question actually.
We can continue with experimental/stable status of driver ABI,
but we should invent a new ABI flag like DRIVER,
so there is no stability policy on such symbol.



^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus
  2021-06-23  8:15  4%     ` Thomas Monjalon
@ 2021-06-23 14:52  3%       ` Xueming(Steven) Li
  2021-06-24  6:37  3%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-06-23 14:52 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon
  Cc: Parav Pandit, dev, Wang Haiyue, Kinsella Ray, david.marchand,
	ferruh.yigit



> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Wednesday, June 23, 2021 4:15 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>
> Cc: Parav Pandit <parav@nvidia.com>; dev@dpdk.org; Wang Haiyue <haiyue.wang@intel.com>; Kinsella Ray <mdr@ashroe.eu>;
> david.marchand@redhat.com; ferruh.yigit@intel.com
> Subject: Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus
> 
> 23/06/2021 01:50, Xueming(Steven) Li:
> > From: Thomas Monjalon <thomas@monjalon.net>
> > > 13/06/2021 14:58, Xueming Li:
> > > > --- /dev/null
> > > > +++ b/drivers/bus/auxiliary/version.map
> > > > @@ -0,0 +1,7 @@
> > > > +EXPERIMENTAL {
> > > > +	global:
> > > > +
> > > > +	# added in 21.08
> > > > +	rte_auxiliary_register;
> > > > +	rte_auxiliary_unregister;
> > > > +};
> > >
> > > After more thoughts, shouldn't it be an internal symbol?
> > > It is used only by DPDK drivers.
> >
> > So users will not be able to compose their own driver and register
> > with auxiliary bus?
> 
> Yes, that's an interesting question actually.
> We can continue with experimental/stable status of driver ABI, but we should invent a new ABI flag like DRIVER, so there is no stability
> policy on such symbol.

I don't quite understand: why would we want to export the function but give no ABI guarantee? The API shouldn't change frequently IMHO.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus
  2021-06-23 14:52  3%       ` Xueming(Steven) Li
@ 2021-06-24  6:37  3%         ` Thomas Monjalon
  2021-06-24  8:42  3%           ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-06-24  6:37 UTC (permalink / raw)
  To: Xueming(Steven) Li
  Cc: Parav Pandit, dev, Wang Haiyue, Kinsella Ray, david.marchand,
	ferruh.yigit

23/06/2021 16:52, Xueming(Steven) Li:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 23/06/2021 01:50, Xueming(Steven) Li:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 13/06/2021 14:58, Xueming Li:
> > > > > --- /dev/null
> > > > > +++ b/drivers/bus/auxiliary/version.map
> > > > > @@ -0,0 +1,7 @@
> > > > > +EXPERIMENTAL {
> > > > > +	global:
> > > > > +
> > > > > +	# added in 21.08
> > > > > +	rte_auxiliary_register;
> > > > > +	rte_auxiliary_unregister;
> > > > > +};
> > > >
> > > > After more thoughts, shouldn't it be an internal symbol?
> > > > It is used only by DPDK drivers.
> > >
> > > So users will not be able to compose their own driver and register
> > > with auxiliary bus?
> > 
> > Yes, that's an interesting question actually.
> > We can continue with experimental/stable status of driver ABI, but we should invent a new ABI flag like DRIVER, so there is no stability
> > policy on such symbol.
> 
> I don't quite understand: why would we want to export the function but give no ABI guarantee? The API shouldn't change frequently IMHO.

Sorry, my message was not clear.
I am OK to keep "EXPERIMENTAL" in this patch.
But in the future, we don't want to make the driver interface part
of the stable ABI, because that makes evolution harder for no good reason:
nobody is asking for a stable interface with drivers.
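
To make that idea concrete, a sketch of what such a split could look like in the bus's version.map (the DRIVER section name is hypothetical, taken from the flag suggested earlier in this thread; today only EXPERIMENTAL, INTERNAL and DPDK_xx sections exist):

```
EXPERIMENTAL {
	global:

	# application-facing symbols, covered by the experimental policy
};

DRIVER {
	global:

	# hypothetical section: driver-only interface, no stability policy
	rte_auxiliary_register;
	rte_auxiliary_unregister;
};
```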



^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus
  2021-06-24  6:37  3%         ` Thomas Monjalon
@ 2021-06-24  8:42  3%           ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-06-24  8:42 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon
  Cc: Parav Pandit, dev, Wang Haiyue, Kinsella Ray, david.marchand,
	ferruh.yigit

Thanks for clarification, will update in next version.
________________________________
From: Thomas Monjalon <thomas@monjalon.net>
Sent: Thursday, June 24, 2021 2:37:19 PM
To: Xueming(Steven) Li <xuemingl@nvidia.com>
Cc: Parav Pandit <parav@nvidia.com>; dev@dpdk.org <dev@dpdk.org>; Wang Haiyue <haiyue.wang@intel.com>; Kinsella Ray <mdr@ashroe.eu>; david.marchand@redhat.com <david.marchand@redhat.com>; ferruh.yigit@intel.com <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v4 2/2] bus/auxiliary: introduce auxiliary bus

23/06/2021 16:52, Xueming(Steven) Li:
> From: Thomas Monjalon <thomas@monjalon.net>
> > 23/06/2021 01:50, Xueming(Steven) Li:
> > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > 13/06/2021 14:58, Xueming Li:
> > > > > --- /dev/null
> > > > > +++ b/drivers/bus/auxiliary/version.map
> > > > > @@ -0,0 +1,7 @@
> > > > > +EXPERIMENTAL {
> > > > > +     global:
> > > > > +
> > > > > +     # added in 21.08
> > > > > +     rte_auxiliary_register;
> > > > > +     rte_auxiliary_unregister;
> > > > > +};
> > > >
> > > > After more thoughts, shouldn't it be an internal symbol?
> > > > It is used only by DPDK drivers.
> > >
> > > So users will not be able to compose their own driver and register
> > > with auxiliary bus?
> >
> > Yes, that's an interesting question actually.
> > We can continue with experimental/stable status of driver ABI, but we should invent a new ABI flag like DRIVER, so there is no stability
> > policy on such symbol.
>
> I don't quite understand: why would we want to export the function but give no ABI guarantee? The API shouldn't change frequently IMHO.

Sorry, my message was not clear.
I am OK to keep "EXPERIMENTAL" in this patch.
But in the future, we don't want to make the driver interface part
of the stable ABI, because that makes evolution harder for no good reason:
nobody is asking for a stable interface with drivers.



^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in security lib
@ 2021-06-24 10:28  3% Kinsella, Ray
  2021-06-24 10:49  0% ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:28 UTC (permalink / raw)
  To: Declan Doherty, Akhil Goyal, Thomas Monjalon, Stephen Hemminger,
	dpdk-dev

Hi Declan and Akhil, 

The following security experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_security_get_userdata
 * rte_security_session_stats_get
 * rte_security_session_update

Ray K
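
For reference, promoting one of these symbols typically needs two coordinated changes; a hypothetical sketch (illustrative hunks only, not an actual patch; file path and section name assumed):

```
--- a/lib/security/version.map
+++ b/lib/security/version.map
@@
 DPDK_22 {
 	global:
+	rte_security_session_update;
 };

 EXPERIMENTAL {
 	global:
-	rte_security_session_update;
 };
```

plus dropping the `__rte_experimental` tag from the matching declaration in the library's header.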


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in net lib
@ 2021-06-24 10:29  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:29 UTC (permalink / raw)
  To: Olivier Matz, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Olivier, 

The following net experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_net_make_rarp_packet
 * rte_net_skip_ip6_ext
 * rte_ether_unformat_addr 

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in mbuf lib
@ 2021-06-24 10:29  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:29 UTC (permalink / raw)
  To: Olivier Matz, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Olivier, 

The following mbuf experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_mbuf_check
 * rte_mbuf_dynfield_lookup
 * rte_mbuf_dynfield_register
 * rte_mbuf_dynfield_register_offset
 * rte_mbuf_dynflag_lookup
 * rte_mbuf_dynflag_register
 * rte_mbuf_dynflag_register_bitnum
 * rte_mbuf_dyn_dump
 * rte_pktmbuf_copy
 * rte_pktmbuf_free_bulk

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in vhost lib
@ 2021-06-24 10:30  3% Kinsella, Ray
  2021-06-24 11:04  0% ` Xia, Chenbo
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:30 UTC (permalink / raw)
  To: Maxime Coquelin, Chenbo Xia, Thomas Monjalon, Stephen Hemminger,
	dpdk-dev

Hi Maxime and Chenbo, 

The following vhost experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_vhost_driver_get_protocol_features
 * rte_vhost_driver_get_queue_num
 * rte_vhost_crypto_create
 * rte_vhost_crypto_free
 * rte_vhost_crypto_fetch_requests
 * rte_vhost_crypto_finalize_requests
 * rte_vhost_crypto_set_zero_copy
 * rte_vhost_va_from_guest_pa
 * rte_vhost_extern_callback_register
 * rte_vhost_driver_set_protocol_features
 * rte_vhost_set_inflight_desc_split
 * rte_vhost_set_inflight_desc_packed
 * rte_vhost_set_last_inflight_io_split
 * rte_vhost_set_last_inflight_io_packed
 * rte_vhost_clr_inflight_desc_split
 * rte_vhost_clr_inflight_desc_packed
 * rte_vhost_get_vhost_ring_inflight
 * rte_vhost_get_vring_base_from_inflight	

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in flow_classify lib
@ 2021-06-24 10:30  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:30 UTC (permalink / raw)
  To: Iremonger, Bernard, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Bernard, 

The following flow_classify experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_flow_classifier_create
 * rte_flow_classifier_free
 * rte_flow_classifier_query
 * rte_flow_classify_table_create
 * rte_flow_classify_table_entry_add
 * rte_flow_classify_table_entry_delete
 * rte_flow_classify_validate

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in eal lib
@ 2021-06-24 10:31  3% Kinsella, Ray
  2021-06-24 12:14  0% ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:31 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger, Burakov, Anatoly, dpdk-dev

Hi Anatoly & Thomas, 

The following eal experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_mp_action_register
 * rte_mp_action_unregister
 * rte_mp_reply
 * rte_mp_sendmsg
 * rte_dev_event_callback_register
 * rte_dev_event_callback_unregister
 * rte_dev_event_monitor_start
 * rte_dev_event_monitor_stop
 * rte_fbarray_attach
 * rte_fbarray_destroy
 * rte_fbarray_detach
 * rte_fbarray_dump_metadata
 * rte_fbarray_find_contig_free
 * rte_fbarray_find_contig_used
 * rte_fbarray_find_idx
 * rte_fbarray_find_next_free
 * rte_fbarray_find_next_n_free
 * rte_fbarray_find_next_n_used
 * rte_fbarray_find_next_used
 * rte_fbarray_get
 * rte_fbarray_init
 * rte_fbarray_is_used
 * rte_fbarray_set_free
 * rte_fbarray_set_used
 * rte_log_register_type_and_pick_level
 * rte_malloc_dump_heaps
 * rte_mem_alloc_validator_register
 * rte_mem_alloc_validator_unregister
 * rte_mem_check_dma_mask
 * rte_mem_event_callback_register
 * rte_mem_event_callback_unregister
 * rte_mem_iova2virt
 * rte_mem_virt2memseg
 * rte_mem_virt2memseg_list
 * rte_memseg_contig_walk
 * rte_memseg_list_walk
 * rte_memseg_walk
 * rte_mp_request_async
 * rte_mp_request_sync
 * rte_class_find
 * rte_class_find_by_name
 * rte_class_register
 * rte_class_unregister
 * rte_dev_iterator_init
 * rte_dev_iterator_next
 * rte_fbarray_find_prev_free
 * rte_fbarray_find_prev_n_free
 * rte_fbarray_find_prev_n_used
 * rte_fbarray_find_prev_used
 * rte_fbarray_find_rev_contig_free
 * rte_fbarray_find_rev_contig_used
 * rte_memseg_contig_walk_thread_unsafe
 * rte_memseg_list_walk_thread_unsafe
 * rte_memseg_walk_thread_unsafe
 * rte_delay_us_sleep
 * rte_dev_event_callback_process
 * rte_dev_hotplug_handle_disable
 * rte_dev_hotplug_handle_enable
 * rte_malloc_heap_create
 * rte_malloc_heap_destroy
 * rte_malloc_heap_get_socket
 * rte_malloc_heap_memory_add
 * rte_malloc_heap_memory_attach
 * rte_malloc_heap_memory_detach
 * rte_malloc_heap_memory_remove
 * rte_malloc_heap_socket_is_external
 * rte_mem_check_dma_mask_thread_unsafe
 * rte_mem_set_dma_mask
 * rte_memseg_get_fd
 * rte_memseg_get_fd_offset
 * rte_memseg_get_fd_offset_thread_unsafe
 * rte_memseg_get_fd_thread_unsafe
 * rte_extmem_attach
 * rte_extmem_detach
 * rte_extmem_register
 * rte_extmem_unregister
 * rte_dev_dma_map
 * rte_dev_dma_unmap
 * rte_fbarray_find_biggest_free
 * rte_fbarray_find_biggest_used
 * rte_fbarray_find_rev_biggest_free
 * rte_fbarray_find_rev_biggest_used
 * rte_intr_callback_unregister_pending
 * rte_realloc_socket
 * rte_intr_ack
 * rte_lcore_cpuset
 * rte_lcore_to_cpu_id
 * rte_mcfg_timer_lock
 * rte_mcfg_timer_unlock
 * rte_rand_max

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in port lib
@ 2021-06-24 10:31  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:31 UTC (permalink / raw)
  To: Cristian Dumitrescu, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Cristian,

The following port experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_port_eventdev_writer_nodrop_ops
 * rte_port_eventdev_writer_ops

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in compressdev lib
@ 2021-06-24 10:32  3% Kinsella, Ray
  2021-06-24 10:55  0% ` Trahe, Fiona
  2021-06-25  7:49  0% ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:32 UTC (permalink / raw)
  To: Fiona Trahe, Ashish Gupta, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Fiona & Ashish,

The following compressdev experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_compressdev_capability_get
 * rte_compressdev_close
 * rte_compressdev_configure
 * rte_compressdev_count
 * rte_compressdev_dequeue_burst
 * rte_compressdev_devices_get
 * rte_compressdev_enqueue_burst
 * rte_compressdev_get_dev_id
 * rte_compressdev_get_feature_name
 * rte_compressdev_info_get
 * rte_compressdev_name_get
 * rte_compressdev_pmd_allocate
 * rte_compressdev_pmd_create
 * rte_compressdev_pmd_destroy
 * rte_compressdev_pmd_get_named_dev
 * rte_compressdev_pmd_parse_input_args
 * rte_compressdev_pmd_release_device
 * rte_compressdev_private_xform_create
 * rte_compressdev_private_xform_free
 * rte_compressdev_queue_pair_count
 * rte_compressdev_queue_pair_setup
 * rte_compressdev_socket_id
 * rte_compressdev_start
 * rte_compressdev_stats_get
 * rte_compressdev_stats_reset
 * rte_compressdev_stop
 * rte_compressdev_stream_create
 * rte_compressdev_stream_free
 * rte_comp_get_feature_name
 * rte_comp_op_alloc
 * rte_comp_op_bulk_alloc
 * rte_comp_op_bulk_free
 * rte_comp_op_free
 * rte_comp_op_pool_create

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in sched lib
@ 2021-06-24 10:33  3% Kinsella, Ray
  2021-06-24 19:21  0% ` Singh, Jasvinder
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:33 UTC (permalink / raw)
  To: Cristian Dumitrescu, Thomas Monjalon, Stephen Hemminger, Singh,
	Jasvinder, dpdk-dev

Hi Cristian & Jasvinder,

The following sched experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_sched_subport_pipe_profile_add

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in cryptodev lib
@ 2021-06-24 10:33  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:33 UTC (permalink / raw)
  To: Declan Doherty, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Declan,

The following cryptodev experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

 * rte_cryptodev_asym_capability_get
 * rte_cryptodev_asym_get_header_session_size
 * rte_cryptodev_asym_get_private_session_size
 * rte_cryptodev_asym_get_xform_enum
 * rte_cryptodev_asym_session_clear
 * rte_cryptodev_asym_session_create
 * rte_cryptodev_asym_session_free
 * rte_cryptodev_asym_session_init
 * rte_cryptodev_asym_xform_capability_check_modlen
 * rte_cryptodev_asym_xform_capability_check_optype
 * rte_cryptodev_sym_get_existing_header_session_size
 * rte_cryptodev_sym_session_get_user_data
 * rte_cryptodev_sym_session_pool_create
 * rte_cryptodev_sym_session_set_user_data
 * rte_crypto_asym_op_strings
 * rte_crypto_asym_xform_strings

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in rib lib
@ 2021-06-24 10:34  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:34 UTC (permalink / raw)
  To: Medvedkin, Vladimir, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Vladimir,

The following rib experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

* rte_rib_create,
* rte_rib_find_existing,
* rte_rib_free,
* rte_rib_get_depth,
* rte_rib_get_ext,
* rte_rib_get_ip,
* rte_rib_get_nh,
* rte_rib_get_nxt,
* rte_rib_insert,
* rte_rib_lookup,
* rte_rib_lookup_parent,
* rte_rib_lookup_exact,
* rte_rib_set_nh,
* rte_rib_remove,
* rte_rib6_create,
* rte_rib6_find_existing,
* rte_rib6_free,
* rte_rib6_get_depth,
* rte_rib6_get_ext,
* rte_rib6_get_ip,
* rte_rib6_get_nh,
* rte_rib6_get_nxt,
* rte_rib6_insert,
* rte_rib6_lookup,
* rte_rib6_lookup_parent,
* rte_rib6_lookup_exact,
* rte_rib6_set_nh,
* rte_rib6_remove

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in pipeline lib
@ 2021-06-24 10:34  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:34 UTC (permalink / raw)
  To: Cristian Dumitrescu, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Cristian,

The following pipeline experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

* rte_port_in_action_create
* rte_port_in_action_free
* rte_port_in_action_params_get
* rte_port_in_action_profile_action_register
* rte_port_in_action_profile_create
* rte_port_in_action_profile_free
* rte_port_in_action_profile_freeze
* rte_table_action_apply
* rte_table_action_create
* rte_table_action_dscp_table_update
* rte_table_action_free
* rte_table_action_meter_profile_add
* rte_table_action_meter_profile_delete
* rte_table_action_meter_read
* rte_table_action_profile_action_register
* rte_table_action_profile_create
* rte_table_action_profile_free
* rte_table_action_profile_freeze
* rte_table_action_stats_read
* rte_table_action_table_params_get,
* rte_table_action_time_read
* rte_table_action_ttl_read
* rte_table_action_crypto_sym_session_get

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in ip_frag
@ 2021-06-24 10:34  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:34 UTC (permalink / raw)
  To: Ananyev, Konstantin, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Konstantin,

The following ip_frag experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 

* rte_frag_table_del_expired_entries

Ray K

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in bbdev lib
@ 2021-06-24 10:35  3% Kinsella, Ray
  2021-06-24 15:42  3% ` Chautru, Nicolas
  2021-06-25  7:48  0% ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:35 UTC (permalink / raw)
  To: Nicolas Chautru, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Nicolas

The following bbdev experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

* rte_bbdev_allocate
* rte_bbdev_callback_register
* rte_bbdev_callback_unregister
* rte_bbdev_close
* rte_bbdev_count
* rte_bbdev_dec_op_alloc_bulk
* rte_bbdev_dec_op_free_bulk
* rte_bbdev_dequeue_dec_ops
* rte_bbdev_dequeue_enc_ops
* rte_bbdev_devices
* rte_bbdev_enc_op_alloc_bulk
* rte_bbdev_enc_op_free_bulk
* rte_bbdev_enqueue_dec_ops
* rte_bbdev_enqueue_enc_ops
* rte_bbdev_find_next
* rte_bbdev_get_named_dev
* rte_bbdev_info_get
* rte_bbdev_intr_enable
* rte_bbdev_is_valid
* rte_bbdev_op_pool_create
* rte_bbdev_op_type_str
* rte_bbdev_pmd_callback_process
* rte_bbdev_queue_configure
* rte_bbdev_queue_info_get
* rte_bbdev_queue_intr_ctl
* rte_bbdev_queue_intr_disable
* rte_bbdev_queue_intr_enable
* rte_bbdev_queue_start
* rte_bbdev_queue_stop
* rte_bbdev_release
* rte_bbdev_setup_queues
* rte_bbdev_start
* rte_bbdev_stats_get
* rte_bbdev_stats_reset
* rte_bbdev_stop

Ray K

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental Symbols in ethdev lib
@ 2021-06-24 10:36  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:36 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh, Andrew Rybchenko, dpdk-dev

Hi Thomas, Ferruh and Andrew,

The following ethdev experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

 * rte_mtr_capabilities_get,
 * rte_mtr_create,
 * rte_mtr_destroy,
 * rte_mtr_meter_disable,
 * rte_mtr_meter_dscp_table_update,
 * rte_mtr_meter_enable,
 * rte_mtr_meter_profile_add,
 * rte_mtr_meter_profile_delete,
 * rte_mtr_meter_profile_update,
 * rte_mtr_stats_read,
 * rte_mtr_stats_update,
 * rte_eth_dev_is_removed,
 * rte_eth_dev_owner_delete,
 * rte_eth_dev_owner_get,
 * rte_eth_dev_owner_new,
 * rte_eth_dev_owner_set,
 * rte_eth_dev_owner_unset,
 * rte_eth_dev_get_module_eeprom,
 * rte_eth_dev_get_module_info,
 * rte_eth_dev_rx_intr_ctl_q_get_fd,
 * rte_flow_conv,
 * rte_eth_find_next_of,
 * rte_eth_find_next_sibling,
 * rte_eth_read_clock,
 * rte_eth_dev_hairpin_capability_get,
 * rte_eth_rx_burst_mode_get,
 * rte_eth_rx_hairpin_queue_setup,
 * rte_eth_tx_burst_mode_get,
 * rte_eth_tx_hairpin_queue_setup,
 * rte_flow_dynf_metadata_offs,
 * rte_flow_dynf_metadata_mask,
 * rte_flow_dynf_metadata_register,
 * rte_eth_dev_set_ptypes

Ray K

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental Symbols in kvargs
@ 2021-06-24 10:36  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:36 UTC (permalink / raw)
  To: Olivier Matz, Stephen Hemminger, Thomas Monjalon, dpdk-dev

Hi Olivier,

The following kvargs experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

* rte_kvargs_parse_delim
* rte_kvargs_strcmp

Ray K

^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in power lib
@ 2021-06-24 10:39  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:39 UTC (permalink / raw)
  To: David Hunt, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi David,

The following power experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

 * rte_empty_poll_detection
 * rte_power_empty_poll_stat_fetch
 * rte_power_empty_poll_stat_free
 * rte_power_empty_poll_stat_init
 * rte_power_empty_poll_stat_update
 * rte_power_guest_channel_receive_msg
 * rte_power_poll_stat_fetch
 * rte_power_poll_stat_update

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in kni lib
@ 2021-06-24 10:42  3% Kinsella, Ray
  2021-06-24 13:24  0% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:42 UTC (permalink / raw)
  To: Yigit, Ferruh, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Ferruh, 

The following kni experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

 * rte_kni_update_link

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in metrics lib
@ 2021-06-24 10:44  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:44 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Thomas, 

The following metrics experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

 * rte_metrics_deinit

Ray K


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] Experimental symbols in fib lib
@ 2021-06-24 10:46  3% Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:46 UTC (permalink / raw)
  To: Medvedkin, Vladimir, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Vladimir, 

The following fib experimental symbols are present in both the v21.05 and v19.11 releases. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2 years at this point.

 * rte_fib_add
 * rte_fib_create
 * rte_fib_delete
 * rte_fib_find_existing
 * rte_fib_free
 * rte_fib_lookup_bulk
 * rte_fib_get_dp
 * rte_fib_get_rib
 * rte_fib6_add
 * rte_fib6_create
 * rte_fib6_delete
 * rte_fib6_find_existing
 * rte_fib6_free
 * rte_fib6_lookup_bulk
 * rte_fib6_get_dp
 * rte_fib6_get_rib

Ray K


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] Experimental symbols in hash lib
       [not found]     <c6c3ce36-9585-6fcb-8899-719d6b8a368b@ashroe.eu>
@ 2021-06-24 10:47  0% ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:47 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel, Richardson, Bruce, Medvedkin,
	Vladimir, dpdk-dev

+ dpdk dev

(missed the dev list the first time, apologies).

On 24/06/2021 11:41, Kinsella, Ray wrote:
> Hi Yipeng, Sameh, Bruce and Vladimir, 
> 
> The following hash experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 
> 
>  * rte_hash_free_key_with_position
> 
> Ray K
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in security lib
  2021-06-24 10:28  3% [dpdk-dev] Experimental symbols in security lib Kinsella, Ray
@ 2021-06-24 10:49  0% ` Kinsella, Ray
  2021-06-24 12:22  0%   ` [dpdk-dev] [EXT] " Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 10:49 UTC (permalink / raw)
  To: Declan Doherty, Thomas Monjalon, Stephen Hemminger, dpdk-dev,
	Akhil Goyal

(correcting Goyal's address, apologies for the resend)

On 24/06/2021 11:28, Kinsella, Ray wrote:
> Hi Declan and Goyal, 
> 
> The following security experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 
> 
>  * rte_security_get_userdata
>  * rte_security_session_stats_get
>  * rte_security_session_update
> 
> Ray K
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in compressdev lib
  2021-06-24 10:32  3% [dpdk-dev] Experimental symbols in compressdev lib Kinsella, Ray
@ 2021-06-24 10:55  0% ` Trahe, Fiona
  2021-06-25  7:49  0% ` David Marchand
  1 sibling, 0 replies; 200+ results
From: Trahe, Fiona @ 2021-06-24 10:55 UTC (permalink / raw)
  To: Kinsella, Ray, Ashish Gupta, Thomas Monjalon, Stephen Hemminger,
	dpdk-dev
  Cc: Trahe, Fiona

Hi Ray,
Sounds reasonable; however, I'm not currently working on this project, so I will have to leave it to others to propose.
Fiona
 

> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Thursday, June 24, 2021 11:33 AM
> To: Trahe, Fiona <fiona.trahe@intel.com>; Ashish Gupta <ashish.gupta@marvell.com>; Thomas
> Monjalon <thomas@monjalon.net>; Stephen Hemminger <stephen@networkplumber.org>; dpdk-dev
> <dev@dpdk.org>
> Subject: Experimental symbols in compressdev lib
> 
> Hi Fiona & Ashish,
> 
> The following compressdev experimental symbols are present in both v21.05 and v19.11 release.
> These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as
> they have been experimental for >= 2yrs at this point.
> 
>  * rte_compressdev_capability_get
>  * rte_compressdev_close
>  * rte_compressdev_configure
>  * rte_compressdev_count
>  * rte_compressdev_dequeue_burst
>  * rte_compressdev_devices_get
>  * rte_compressdev_enqueue_burst
>  * rte_compressdev_get_dev_id
>  * rte_compressdev_get_feature_name
>  * rte_compressdev_info_get
>  * rte_compressdev_name_get
>  * rte_compressdev_pmd_allocate
>  * rte_compressdev_pmd_create
>  * rte_compressdev_pmd_destroy
>  * rte_compressdev_pmd_get_named_dev
>  * rte_compressdev_pmd_parse_input_args
>  * rte_compressdev_pmd_release_device
>  * rte_compressdev_private_xform_create
>  * rte_compressdev_private_xform_free
>  * rte_compressdev_queue_pair_count
>  * rte_compressdev_queue_pair_setup
>  * rte_compressdev_socket_id
>  * rte_compressdev_start
>  * rte_compressdev_stats_get
>  * rte_compressdev_stats_reset
>  * rte_compressdev_stop
>  * rte_compressdev_stream_create
>  * rte_compressdev_stream_free
>  * rte_comp_get_feature_name
>  * rte_comp_op_alloc
>  * rte_comp_op_bulk_alloc
>  * rte_comp_op_bulk_free
>  * rte_comp_op_free
>  * rte_comp_op_pool_create
> 
> Ray K


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in vhost lib
  2021-06-24 10:30  3% [dpdk-dev] Experimental symbols in vhost lib Kinsella, Ray
@ 2021-06-24 11:04  0% ` Xia, Chenbo
  0 siblings, 0 replies; 200+ results
From: Xia, Chenbo @ 2021-06-24 11:04 UTC (permalink / raw)
  To: Kinsella, Ray, Maxime Coquelin, Thomas Monjalon,
	Stephen Hemminger, dpdk-dev

Hi Ray,

> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Thursday, June 24, 2021 6:30 PM
> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Xia, Chenbo
> <chenbo.xia@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Stephen
> Hemminger <stephen@networkplumber.org>; dpdk-dev <dev@dpdk.org>
> Subject: Experimental symbols in vhost lib
> 
> Hi Maxime and Chenbo,
> 
> The following vhost experimental symbols are present in both v21.05 and v19.11
> release. These symbols should be considered for promotion to stable as part of
> the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this
> point.

[...]

Thanks for the heads up! I will discuss the experimental symbols with Maxime.

Chenbo

> Ray K


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in eal lib
  2021-06-24 10:31  3% [dpdk-dev] Experimental symbols in eal lib Kinsella, Ray
@ 2021-06-24 12:14  0% ` David Marchand
  2021-06-24 12:15  0%   ` Kinsella, Ray
  2021-06-29 16:50  0%   ` Tyler Retzlaff
  0 siblings, 2 replies; 200+ results
From: David Marchand @ 2021-06-24 12:14 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Thomas Monjalon, Stephen Hemminger, Burakov, Anatoly, dpdk-dev

On Thu, Jun 24, 2021 at 12:31 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
> Hi Anatoly & Thomas,
>
> The following eal experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point.

Just an additional comment.
Marking stable is not the only choice.
We can also consider hiding such symbols (marking them internal) if there
is no clear use case outside of DPDK.
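
For illustration, the two outcomes correspond to different sections of a library's version.map file. The section names below follow DPDK convention, but the symbol names are placeholders, not real API:

```text
DPDK_22 {
	global:

	rte_foo_promoted_symbol;  # moved here once declared stable
	local: *;
};

EXPERIMENTAL {
	global:

	rte_foo_experimental_symbol;  # no ABI stability guarantee
};

INTERNAL {
	global:

	rte_foo_driver_only_symbol;  # for drivers, hidden from applications
};
```

Promotion to stable moves a symbol from the EXPERIMENTAL section into the versioned section; marking it internal moves it into INTERNAL instead, removing it from the application-facing ABI.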


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in eal lib
  2021-06-24 12:14  0% ` David Marchand
@ 2021-06-24 12:15  0%   ` Kinsella, Ray
  2021-06-29 16:50  0%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 12:15 UTC (permalink / raw)
  To: David Marchand
  Cc: Thomas Monjalon, Stephen Hemminger, Burakov, Anatoly, dpdk-dev

Good point, that one is very much up to the lib maintainer to make that call.

Ray K

On 24/06/2021 13:14, David Marchand wrote:
> On Thu, Jun 24, 2021 at 12:31 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>> Hi Anatoly & Thomas,
>>
>> The following eal experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point.
> 
> Just an additional comment.
> Marking stable is not the only choice.
> We can also consider hiding such symbols (marking internal) if there
> is no clear usecase out of DPDK.
> 
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [EXT] Re: Experimental symbols in security lib
  2021-06-24 10:49  0% ` Kinsella, Ray
@ 2021-06-24 12:22  0%   ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-06-24 12:22 UTC (permalink / raw)
  To: Kinsella, Ray, Declan Doherty, Thomas Monjalon,
	Stephen Hemminger, dpdk-dev
  Cc: Anoob Joseph, Konstantin Ananyev, Hemant Agrawal,
	Nithin Kumar Dabilpuram, Fan Zhang, matan

Hi Ray,
> ----------------------------------------------------------------------
> (correcting Goyals address, apologies for the resend)
> 
> On 24/06/2021 11:28, Kinsella, Ray wrote:
> > Hi Declan and Goyal,
> >
> > The following security experimental symbols are present in both v21.05
> and v19.11 release. These symbols should be considered for promotion to
> stable as part of the v22 ABI in DPDK 21.11, as they have been experimental
> for >= 2yrs at this point.

Thanks for the reminder; I plan to move these to stable API in the 21.11 timeframe.
Adding more people in cc in case of any objections.

> >
> >  * rte_security_get_userdata
> >  * rte_security_session_stats_get
> >  * rte_security_session_update
> >


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in kni lib
  2021-06-24 10:42  3% [dpdk-dev] Experimental symbols in kni lib Kinsella, Ray
@ 2021-06-24 13:24  0% ` Ferruh Yigit
  2021-06-24 13:54  0%   ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-06-24 13:24 UTC (permalink / raw)
  To: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, dpdk-dev

On 6/24/2021 11:42 AM, Kinsella, Ray wrote:
> Hi Ferruh, 
> 
> The following kni experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 
> 
>  * rte_kni_update_link
> 
> Ray K
> 

Hi Ray,

Thanks for follow up.

I just checked the API and am planning a small behavior update to it.
If the update is accepted, I suggest keeping the API experimental for 21.08 too,
and maturing it in v21.11.

Thanks,
ferruh

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in kni lib
  2021-06-24 13:24  0% ` Ferruh Yigit
@ 2021-06-24 13:54  0%   ` Kinsella, Ray
  2021-06-25 13:26  0%     ` Igor Ryzhov
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-24 13:54 UTC (permalink / raw)
  To: Ferruh Yigit, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Sounds more than reasonable, +1 from me.

Ray K

On 24/06/2021 14:24, Ferruh Yigit wrote:
> On 6/24/2021 11:42 AM, Kinsella, Ray wrote:
>> Hi Ferruh, 
>>
>> The following kni experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 
>>
>>  * rte_kni_update_link
>>
>> Ray K
>>
> 
> Hi Ray,
> 
> Thanks for follow up.
> 
> I just checked the API and planning a small behavior update to it.
> If the update is accepted, I suggest keeping the API experimental for 21.08 too,
> but can mature it on v21.11.
> 
> Thanks,
> ferruh
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in bbdev lib
  2021-06-24 10:35  3% [dpdk-dev] Experimental symbols in bbdev lib Kinsella, Ray
@ 2021-06-24 15:42  3% ` Chautru, Nicolas
  2021-06-24 19:27  3%   ` Kinsella, Ray
  2021-06-25  7:48  0% ` David Marchand
  1 sibling, 1 reply; 200+ results
From: Chautru, Nicolas @ 2021-06-24 15:42 UTC (permalink / raw)
  To: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Ray, 

That request was considered for 20.11, but it was deferred by the community while waiting for other vendors who may be willing to contribute their own PMDs.
Is there any specific concern with this not being part of the tracked ABI?

Thanks
Nic


> -----Original Message-----
> From: Kinsella, Ray <mdr@ashroe.eu>
> Sent: Thursday, June 24, 2021 3:35 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>; Stephen Hemminger
> <stephen@networkplumber.org>; dpdk-dev <dev@dpdk.org>
> Subject: Experimental symbols in bbdev lib
> 
> Hi Nicolas
> 
> The following bbdev experimental symbols are present in both v21.05 and
> v19.11 release. These symbols should be considered for promotion to stable
> as part of the v22 ABI in DPDK 21.11, as they have been experimental for >=
> 2yrs at this point.
> 
> * rte_bbdev_allocate
> * rte_bbdev_callback_register
> * rte_bbdev_callback_unregister
> * rte_bbdev_close
> * rte_bbdev_count
> * rte_bbdev_dec_op_alloc_bulk
> * rte_bbdev_dec_op_free_bulk
> * rte_bbdev_dequeue_dec_ops
> * rte_bbdev_dequeue_enc_ops
> * rte_bbdev_devices
> * rte_bbdev_enc_op_alloc_bulk
> * rte_bbdev_enc_op_free_bulk
> * rte_bbdev_enqueue_dec_ops
> * rte_bbdev_enqueue_enc_ops
> * rte_bbdev_find_next
> * rte_bbdev_get_named_dev
> * rte_bbdev_info_get
> * rte_bbdev_intr_enable
> * rte_bbdev_is_valid
> * rte_bbdev_op_pool_create
> * rte_bbdev_op_type_str
> * rte_bbdev_pmd_callback_process
> * rte_bbdev_queue_configure
> * rte_bbdev_queue_info_get
> * rte_bbdev_queue_intr_ctl
> * rte_bbdev_queue_intr_disable
> * rte_bbdev_queue_intr_enable
> * rte_bbdev_queue_start
> * rte_bbdev_queue_stop
> * rte_bbdev_release
> * rte_bbdev_setup_queues
> * rte_bbdev_start
> * rte_bbdev_stats_get
> * rte_bbdev_stats_reset
> * rte_bbdev_stop
> 
> Ray K

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] Experimental symbols in sched lib
  2021-06-24 10:33  3% [dpdk-dev] Experimental symbols in sched lib Kinsella, Ray
@ 2021-06-24 19:21  0% ` Singh, Jasvinder
  0 siblings, 0 replies; 200+ results
From: Singh, Jasvinder @ 2021-06-24 19:21 UTC (permalink / raw)
  To: Kinsella, Ray
  Cc: Dumitrescu, Cristian, Thomas Monjalon, Stephen Hemminger, dpdk-dev



> On 24 Jun 2021, at 11:33, Kinsella, Ray <mdr@ashroe.eu> wrote:
> 
> Hi Cristian & Jasvinder,
> 
> The following sched experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point. 
> 
> * rte_sched_subport_pipe_profile_add
> 
> Ray K

I’ll send a patch to remove the experimental tag. Thanks for the heads up.
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in bbdev lib
  2021-06-24 15:42  3% ` Chautru, Nicolas
@ 2021-06-24 19:27  3%   ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-24 19:27 UTC (permalink / raw)
  To: Chautru, Nicolas, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Nicolas,

I could equally ask if there is any concern with this being a tracked ABI.
The API has seen zero changes in two years; IMHO we'd need a very good reason not to standardize it,
as there have been ample opportunities for others to chime in.

git log --format=oneline --follow v19.11..v21.05 -- lib/bbdev/version.map
99a2dd955fba6e4cc23b77d590a033650ced9c45 lib: remove librte_ prefix from directory names
63b3907833d87288bbc74f370e22f2929ec34594 build: remove library name from version map file name

Ray K

On 24/06/2021 16:42, Chautru, Nicolas wrote:
> Hi Ray, 
> 
> That request was considered for 20.11. But this was deferred by the community while waiting for other vendors who may be willing to contribute their own PMDs.
> Any specific concern with this not being on a tracked ABI?
> 
> Thanks
> Nic
> 
> 
>> -----Original Message-----
>> From: Kinsella, Ray <mdr@ashroe.eu>
>> Sent: Thursday, June 24, 2021 3:35 AM
>> To: Chautru, Nicolas <nicolas.chautru@intel.com>; Thomas Monjalon
>> <thomas@monjalon.net>; Stephen Hemminger
>> <stephen@networkplumber.org>; dpdk-dev <dev@dpdk.org>
>> Subject: Experimental symbols in bbdev lib
>>
>> Hi Nicolas
>>
>> The following bbdev experimental symbols are present in both v21.05 and
>> v19.11 release. These symbols should be considered for promotion to stable
>> as part of the v22 ABI in DPDK 21.11, as they have been experimental for >=
>> 2yrs at this point.
>>
>> * rte_bbdev_allocate
>> * rte_bbdev_callback_register
>> * rte_bbdev_callback_unregister
>> * rte_bbdev_close
>> * rte_bbdev_count
>> * rte_bbdev_dec_op_alloc_bulk
>> * rte_bbdev_dec_op_free_bulk
>> * rte_bbdev_dequeue_dec_ops
>> * rte_bbdev_dequeue_enc_ops
>> * rte_bbdev_devices
>> * rte_bbdev_enc_op_alloc_bulk
>> * rte_bbdev_enc_op_free_bulk
>> * rte_bbdev_enqueue_dec_ops
>> * rte_bbdev_enqueue_enc_ops
>> * rte_bbdev_find_next
>> * rte_bbdev_get_named_dev
>> * rte_bbdev_info_get
>> * rte_bbdev_intr_enable
>> * rte_bbdev_is_valid
>> * rte_bbdev_op_pool_create
>> * rte_bbdev_op_type_str
>> * rte_bbdev_pmd_callback_process
>> * rte_bbdev_queue_configure
>> * rte_bbdev_queue_info_get
>> * rte_bbdev_queue_intr_ctl
>> * rte_bbdev_queue_intr_disable
>> * rte_bbdev_queue_intr_enable
>> * rte_bbdev_queue_start
>> * rte_bbdev_queue_stop
>> * rte_bbdev_release
>> * rte_bbdev_setup_queues
>> * rte_bbdev_start
>> * rte_bbdev_stats_get
>> * rte_bbdev_stats_reset
>> * rte_bbdev_stop
>>
>> Ray K

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] Experimental symbols in bbdev lib
  2021-06-24 10:35  3% [dpdk-dev] Experimental symbols in bbdev lib Kinsella, Ray
  2021-06-24 15:42  3% ` Chautru, Nicolas
@ 2021-06-25  7:48  0% ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2021-06-25  7:48 UTC (permalink / raw)
  To: Nicolas Chautru
  Cc: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, dpdk-dev,
	Maxime Coquelin

On Thu, Jun 24, 2021 at 12:35 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
> Hi Nicolas
>
> The following bbdev experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point.
>
> * rte_bbdev_allocate
> * rte_bbdev_callback_register
> * rte_bbdev_callback_unregister
> * rte_bbdev_close
> * rte_bbdev_count
> * rte_bbdev_dec_op_alloc_bulk
> * rte_bbdev_dec_op_free_bulk
> * rte_bbdev_dequeue_dec_ops
> * rte_bbdev_dequeue_enc_ops
> * rte_bbdev_devices
> * rte_bbdev_enc_op_alloc_bulk
> * rte_bbdev_enc_op_free_bulk
> * rte_bbdev_enqueue_dec_ops
> * rte_bbdev_enqueue_enc_ops
> * rte_bbdev_find_next
> * rte_bbdev_get_named_dev
> * rte_bbdev_info_get
> * rte_bbdev_intr_enable
> * rte_bbdev_is_valid
> * rte_bbdev_op_pool_create
> * rte_bbdev_op_type_str
> * rte_bbdev_pmd_callback_process
> * rte_bbdev_queue_configure
> * rte_bbdev_queue_info_get
> * rte_bbdev_queue_intr_ctl
> * rte_bbdev_queue_intr_disable
> * rte_bbdev_queue_intr_enable
> * rte_bbdev_queue_start
> * rte_bbdev_queue_stop
> * rte_bbdev_release
> * rte_bbdev_setup_queues
> * rte_bbdev_start
> * rte_bbdev_stats_get
> * rte_bbdev_stats_reset
> * rte_bbdev_stop

Regardless of removing the experimental status on this API, some of
the symbols listed here are driver-only and should be marked internal.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in compressdev lib
  2021-06-24 10:32  3% [dpdk-dev] Experimental symbols in compressdev lib Kinsella, Ray
  2021-06-24 10:55  0% ` Trahe, Fiona
@ 2021-06-25  7:49  0% ` David Marchand
  2021-06-25  9:14  0%   ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2021-06-25  7:49 UTC (permalink / raw)
  To: Fiona Trahe, Ashish Gupta
  Cc: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, dpdk-dev

On Thu, Jun 24, 2021 at 12:33 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>
> Hi Fiona & Ashish,
>
> The following compressdev experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point.
>
>  * rte_compressdev_capability_get
>  * rte_compressdev_close
>  * rte_compressdev_configure
>  * rte_compressdev_count
>  * rte_compressdev_dequeue_burst
>  * rte_compressdev_devices_get
>  * rte_compressdev_enqueue_burst
>  * rte_compressdev_get_dev_id
>  * rte_compressdev_get_feature_name
>  * rte_compressdev_info_get
>  * rte_compressdev_name_get
>  * rte_compressdev_pmd_allocate
>  * rte_compressdev_pmd_create
>  * rte_compressdev_pmd_destroy
>  * rte_compressdev_pmd_get_named_dev
>  * rte_compressdev_pmd_parse_input_args
>  * rte_compressdev_pmd_release_device
>  * rte_compressdev_private_xform_create
>  * rte_compressdev_private_xform_free
>  * rte_compressdev_queue_pair_count
>  * rte_compressdev_queue_pair_setup
>  * rte_compressdev_socket_id
>  * rte_compressdev_start
>  * rte_compressdev_stats_get
>  * rte_compressdev_stats_reset
>  * rte_compressdev_stop
>  * rte_compressdev_stream_create
>  * rte_compressdev_stream_free
>  * rte_comp_get_feature_name
>  * rte_comp_op_alloc
>  * rte_comp_op_bulk_alloc
>  * rte_comp_op_bulk_free
>  * rte_comp_op_free
>  * rte_comp_op_pool_create
>

Some of the symbols listed here are driver-only (at least the *_pmd_*
symbols) and should be marked internal.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file
  2021-06-22 15:50 12% [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file Ray Kinsella
@ 2021-06-25  8:08  7% ` Ferruh Yigit
  2021-07-09 15:50  4%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-06-25  8:08 UTC (permalink / raw)
  To: Ray Kinsella, dev; +Cc: stephen, thomas, ktraynor, bruce.richardson

On 6/22/2021 4:50 PM, Ray Kinsella wrote:
> Update to ABI MAINTAINERS.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..dab8883a4f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -117,7 +117,6 @@ F: .ci/
>  
>  ABI Policy & Versioning
>  M: Ray Kinsella <mdr@ashroe.eu>
> -M: Neil Horman <nhorman@tuxdriver.com>
>  F: lib/eal/include/rte_compat.h
>  F: lib/eal/include/rte_function_versioning.h
>  F: doc/guides/contributing/abi_*.rst
> 

Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

Tried to reach out to Neil multiple times about ABI issues, without success.

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] Experimental symbols in compressdev lib
  2021-06-25  7:49  0% ` David Marchand
@ 2021-06-25  9:14  0%   ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-06-25  9:14 UTC (permalink / raw)
  To: David Marchand, Fiona Trahe, Ashish Gupta
  Cc: Thomas Monjalon, Stephen Hemminger, dpdk-dev



On 25/06/2021 08:49, David Marchand wrote:
> On Thu, Jun 24, 2021 at 12:33 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
>>
>> Hi Fiona & Ashish,
>>
>> The following compressdev experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point.
>>
>>  * rte_compressdev_capability_get
>>  * rte_compressdev_close
>>  * rte_compressdev_configure
>>  * rte_compressdev_count
>>  * rte_compressdev_dequeue_burst
>>  * rte_compressdev_devices_get
>>  * rte_compressdev_enqueue_burst
>>  * rte_compressdev_get_dev_id
>>  * rte_compressdev_get_feature_name
>>  * rte_compressdev_info_get
>>  * rte_compressdev_name_get
>>  * rte_compressdev_pmd_allocate
>>  * rte_compressdev_pmd_create
>>  * rte_compressdev_pmd_destroy
>>  * rte_compressdev_pmd_get_named_dev
>>  * rte_compressdev_pmd_parse_input_args
>>  * rte_compressdev_pmd_release_device
>>  * rte_compressdev_private_xform_create
>>  * rte_compressdev_private_xform_free
>>  * rte_compressdev_queue_pair_count
>>  * rte_compressdev_queue_pair_setup
>>  * rte_compressdev_socket_id
>>  * rte_compressdev_start
>>  * rte_compressdev_stats_get
>>  * rte_compressdev_stats_reset
>>  * rte_compressdev_stop
>>  * rte_compressdev_stream_create
>>  * rte_compressdev_stream_free
>>  * rte_comp_get_feature_name
>>  * rte_comp_op_alloc
>>  * rte_comp_op_bulk_alloc
>>  * rte_comp_op_bulk_free
>>  * rte_comp_op_free
>>  * rte_comp_op_pool_create
>>
> 
> Part of the symbols listed here are driver-only (at least the *_pmd_*
> symbols) and should be marked internal.
> 
+1 agreed. 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Experimental symbols in kni lib
  2021-06-24 13:54  0%   ` Kinsella, Ray
@ 2021-06-25 13:26  0%     ` Igor Ryzhov
  2021-06-28 12:23  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Igor Ryzhov @ 2021-06-25 13:26 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, dpdk-dev

Hi Ferruh, all,

Let's please discuss another approach to setting KNI link status before
making this API stable:
http://patches.dpdk.org/project/dpdk/patch/20190925093623.18419-1-iryzhov@nfware.com/

I explained the problem with the current implementation there.
More than that, using ioctl approach makes it possible to set also speed
and duplex and use them to implement get_link_ksettings callback.
I can send patches for both features.

Igor

On Thu, Jun 24, 2021 at 4:54 PM Kinsella, Ray <mdr@ashroe.eu> wrote:

> Sounds more than reasonable, +1 from me.
>
> Ray K
>
> On 24/06/2021 14:24, Ferruh Yigit wrote:
> > On 6/24/2021 11:42 AM, Kinsella, Ray wrote:
> >> Hi Ferruh,
> >>
> >> The following kni experimental symbols are present in both v21.05 and
> v19.11 release. These symbols should be considered for promotion to stable
> as part of the v22 ABI in DPDK 21.11, as they have been experimental for >=
> 2yrs at this point.
> >>
> >>  * rte_kni_update_link
> >>
> >> Ray K
> >>
> >
> > Hi Ray,
> >
> > Thanks for follow up.
> >
> > I just checked the API and planning a small behavior update to it.
> > If the update is accepted, I suggest keeping the API experimental for
> 21.08 too,
> > but can mature it on v21.11.
> >
> > Thanks,
> > ferruh
> >
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v2 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-06-25 14:00  3%   ` Anatoly Burakov
  2021-06-25 14:00  3%   ` [dpdk-dev] [PATCH v2 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-25 14:00 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking the current value against the expected value, and if they
matched, the sleep was aborted. This is somewhat inflexible, because it
only allowed us to check for one specific value.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define its
own comparison semantics and decide how to detect the need to abort
entering the power-optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  1 +
 drivers/event/dlb2/dlb2.c                     | 16 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 19 ++++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 19 ++++++++----
 drivers/net/ice/ice_rxtx.c                    | 19 ++++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 19 ++++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 16 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 29 ++++++++++++++-----
 lib/eal/x86/rte_power_intrinsics.c            |  9 ++----
 9 files changed, 106 insertions(+), 41 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..c84ac280f5 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -84,6 +84,7 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..14dfac257c 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,15 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val, const uint64_t opaque[4])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3203,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..45f3fbf4ec 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,17 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +104,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 0361af0d85..6e12ecce07 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,17 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +80,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index fc9bb5a3e7..278eb4b9a1 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,17 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +50,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..0c5045d9dc 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,17 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1392,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 6cd71a44eb..f31a1ec839 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,17 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value, const uint64_t opaque[4])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +293,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..046667ade6 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,34 @@
  * which are architecture-dependent.
  */
 
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[4]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[4]; /**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..3c5c9ce7ad 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -110,14 +110,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
+	/* if we have a callback, we might not need to sleep at all */
+	if (pmc->fn) {
 		const uint64_t cur_value = __get_umwait_val(
 				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
-
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
+		if (pmc->fn(cur_value, pmc->opaque) != 0)
 			goto end;
 	}
 
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v2 4/7] power: remove thread safety from PMD power API's
    2021-06-25 14:00  3%   ` [dpdk-dev] [PATCH v2 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-06-25 14:00  3%   ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-25 14:00 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 9d1cfac395..f015c509fc 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -88,6 +88,11 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
+
 ABI Changes
 -----------
 
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these APIs, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] 20.11.2 patches review and test
@ 2021-06-26 15:41  1% Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-06-26 15:41 UTC (permalink / raw)
  To: stable
  Cc: dev, Abhishek Marathe, Akhil Goyal, Ali Alnubani,
	benjamin.walker, David Christensen, hariprasad.govindharajan,
	Hemant Agrawal, Ian Stokes, Jerin Jacob, John McNamara,
	Ju-Hyoung Lee, Kevin Traynor, Luca Boccassi, Pei Zhang, pingx.yu,
	qian.q.xu, Raslan Darawsheh, NBU-Contact-Thomas Monjalon,
	yuan.peng, zhaoyan.chen

Hi all,

Here is a list of patches targeted for stable release 20.11.2.

The planned date for the final release is 6th July.

Please help with testing and validation of your use cases and report
any issues/results with reply-all to this mail. For the final release
the fixes and reported validations will be added to the release notes.

A release candidate tarball can be found at:

    https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2

These patches are located at branch 20.11 of dpdk-stable repo:
    https://dpdk.org/browse/dpdk-stable/

Thanks.

Xueming Li <xuemingl@nvidia.com>

---
Adam Dybkowski (3):
      common/qat: increase IM buffer size for GEN3
      compress/qat: enable compression on GEN3
      crypto/qat: fix null authentication request

Ajit Khaparde (7):
      net/bnxt: fix RSS context cleanup
      net/bnxt: check kvargs parsing
      net/bnxt: fix resource cleanup
      doc: fix formatting in testpmd guide
      net/bnxt: fix mismatched type comparison in MAC restore
      net/bnxt: check PCI config read
      net/bnxt: fix mismatched type comparison in Rx

Alvin Zhang (11):
      net/ice: fix VLAN filter with PF
      net/i40e: fix input set field mask
      net/igc: fix Rx RSS hash offload capability
      net/igc: fix Rx error counter for bad length
      net/e1000: fix Rx error counter for bad length
      net/e1000: fix max Rx packet size
      net/igc: fix Rx packet size
      net/ice: fix fast mbuf freeing
      net/iavf: fix VF to PF command failure handling
      net/i40e: fix VF RSS configuration
      net/igc: fix speed configuration

Anatoly Burakov (3):
      fbarray: fix log message on truncation error
      power: do not skip saving original P-state governor
      power: save original ACPI governor always

Andrew Boyer (1):
      net/ionic: fix completion type in lif init

Andrew Rybchenko (4):
      net/failsafe: fix RSS hash offload reporting
      net/failsafe: report minimum and maximum MTU
      common/sfc_efx: remove GENEVE from supported tunnels
      net/sfc: fix mark support in EF100 native Rx datapath

Andy Moreton (2):
      common/sfc_efx/base: limit reported MCDI response length
      common/sfc_efx/base: add missing MCDI response length checks

Ankur Dwivedi (1):
      crypto/octeontx: fix session-less mode

Apeksha Gupta (1):
      examples/l2fwd-crypto: skip masked devices

Arek Kusztal (1):
      crypto/qat: fix offset for out-of-place scatter-gather

Beilei Xing (1):
      net/i40evf: fix packet loss for X722

Bing Zhao (1):
      net/mlx5: fix loopback for Direct Verbs queue

Bruce Richardson (2):
      build: exclude meson files from examples installation
      raw/ioat: fix script for configuring small number of queues

Chaoyong He (1):
      doc: fix multiport syntax in nfp guide

Chenbo Xia (1):
      examples/vhost: check memory table query

Chengchang Tang (20):
      net/hns3: fix HW buffer size on MTU update
      net/hns3: fix processing Tx offload flags
      net/hns3: fix Tx checksum for UDP packets with special port
      net/hns3: fix long task queue pairs reset time
      ethdev: validate input in module EEPROM dump
      ethdev: validate input in register info
      ethdev: validate input in EEPROM info
      net/hns3: fix rollback after setting PVID failure
      net/hns3: fix timing in resetting queues
      net/hns3: fix queue state when concurrent with reset
      net/hns3: fix configure FEC when concurrent with reset
      net/hns3: fix use of command status enumeration
      examples: add eal cleanup to examples
      net/bonding: fix adding itself as its slave
      net/hns3: fix timing in mailbox
      app/testpmd: fix max queue number for Tx offloads
      net/tap: fix interrupt vector array size
      net/bonding: fix socket ID check
      net/tap: check ioctl on restore
      examples/timer: fix time interval

Chengwen Feng (50):
      net/hns3: fix flow counter value
      net/hns3: fix VF mailbox head field
      net/hns3: support get device version when dump register
      net/hns3: fix some packet types
      net/hns3: fix missing outer L4 UDP flag for VXLAN
      net/hns3: remove VLAN/QinQ ptypes from support list
      test: check thread creation
      common/dpaax: fix possible null pointer access
      examples/ethtool: remove unused parsing
      net/hns3: fix flow director lock
      net/e1000/base: fix timeout for shadow RAM write
      net/hns3: fix setting default MAC address in bonding of VF
      net/hns3: fix possible mismatched response of mailbox
      net/hns3: fix VF handling LSC event in secondary process
      net/hns3: fix verification of NEON support
      mbuf: check shared memory before dumping dynamic space
      eventdev: remove redundant thread name setting
      eventdev: fix memory leakage on thread creation failure
      net/kni: check init result
      net/hns3: fix mailbox error message
      net/hns3: fix processing link status message on PF
      net/hns3: remove unused mailbox macro and struct
      net/bonding: fix leak on remove
      net/hns3: fix handling link update
      net/i40e: fix negative VEB index
      net/i40e: remove redundant VSI check in Tx queue setup
      net/virtio: fix getline memory leakage
      net/hns3: log time delta in decimal format
      net/hns3: fix time delta calculation
      net/hns3: remove unused macros
      net/hns3: fix vector Rx burst limitation
      net/hns3: remove read when enabling TM QCN error event
      net/hns3: remove unused VMDq code
      net/hns3: increase readability in logs
      raw/ntb: check SPAD user index
      raw/ntb: check memory allocations
      ipc: check malloc sync reply result
      eal: fix service core list parsing
      ipc: use monotonic clock
      net/hns3: return error on PCI config write failure
      net/hns3: fix log on flow director clear
      net/hns3: clear hash map on flow director clear
      net/hns3: fix querying flow director counter for out param
      net/hns3: fix TM QCN error event report by MSI-X
      net/hns3: fix mailbox message ID in log
      net/hns3: fix secondary process request start/stop Rx/Tx
      net/hns3: fix ordering in secondary process initialization
      net/hns3: fail setting FEC if one bit mode is not supported
      net/mlx4: fix secondary process initialization ordering
      net/mlx5: fix secondary process initialization ordering

Ciara Loftus (1):
      net/af_xdp: fix error handling during Rx queue setup

Ciara Power (2):
      telemetry: fix race on callbacks list
      test/crypto: fix return value of a skipped test

Conor Walsh (1):
      examples/l3fwd: fix LPM IPv6 subnets

Cristian Dumitrescu (3):
      table: fix actions with different data size
      pipeline: fix instruction translation
      pipeline: fix endianness conversions

Dapeng Yu (3):
      net/igc: remove MTU setting limitation
      net/e1000: remove MTU setting limitation
      examples/packet_ordering: fix port configuration

David Christensen (1):
      config/ppc: reduce number of cores and NUMA nodes

David Harton (1):
      net/ena: fix releasing Tx ring mbufs

David Hunt (4):
      test/power: fix CPU frequency check
      test/power: add turbo mode to frequency check
      test/power: fix low frequency test when turbo enabled
      test/power: fix turbo test

David Marchand (18):
      doc: fix sphinx rtd theme import in GHA
      service: clean references to removed symbol
      eal: fix evaluation of log level option
      ci: hook to GitHub Actions
      ci: enable v21 ABI checks
      ci: fix package installation in GitHub Actions
      ci: ignore APT update failure in GitHub Actions
      ci: catch coredumps
      vhost: fix offload flags in Rx path
      bus/fslmc: remove unused debug macro
      eal: fix leak in shared lib mode detection
      event/dpaa2: remove unused macros
      net/ice/base: fix memory allocation wrapper
      net/ice: fix leak on thread termination
      devtools: fix orphan symbols check with busybox
      net/vhost: restore pseudo TSO support
      net/ark: fix leak on thread termination
      build: fix drivers selection without Python

Dekel Peled (1):
      common/mlx5: fix DevX read output buffer size

Dmitry Kozlyuk (4):
      net/pcap: fix format string
      eal/windows: add missing SPDX license tag
      buildtools: fix all drivers disabled on Windows
      examples/rxtx_callbacks: fix port ID format specifier

Ed Czeck (2):
      net/ark: update packet director initial state
      net/ark: refactor Rx buffer recovery

Elad Nachman (2):
      kni: support async user request
      kni: fix kernel deadlock with bifurcated device

Feifei Wang (2):
      net/i40e: fix parsing packet type for NEON
      test/trace: fix race on collected perf data

Ferruh Yigit (9):
      power: remove duplicated symbols from map file
      log/linux: make default output stderr
      license: fix typos
      drivers/net: fix FW version query
      net/bnx2x: fix build with GCC 11
      net/bnx2x: fix build with GCC 11
      net/ice/base: fix build with GCC 11
      net/tap: fix build with GCC 11
      test/table: fix build with GCC 11

Gregory Etelson (2):
      app/testpmd: fix tunnel offload flows cleanup
      net/mlx5: fix tunnel offload private items location

Guoyang Zhou (1):
      net/hinic: fix crash in secondary process

Haiyue Wang (1):
      net/ixgbe: fix Rx errors statistics for UDP checksum

Harman Kalra (1):
      event/octeontx2: fix device reconfigure for single slot

Heinrich Kuhn (1):
      net/nfp: fix reporting of RSS capabilities

Hemant Agrawal (3):
      ethdev: add missing buses in device iterator
      crypto/dpaa_sec: affine the thread portal affinity
      crypto/dpaa2_sec: fix close and uninit functions

Hongbo Zheng (9):
      app/testpmd: fix Tx/Rx descriptor query error log
      net/hns3: fix FLR miss detection
      net/hns3: delete redundant blank line
      bpf: fix JSLT validation
      common/sfc_efx/base: fix dereferencing null pointer
      power: fix sanity checks for guest channel read
      net/hns3: fix VF alive notification after config restore
      examples/l3fwd-power: fix empty poll thresholds
      net/hns3: fix concurrent interrupt handling

Huisong Li (23):
      net/hns3: fix device capabilities for copper media type
      net/hns3: remove unused parameter markers
      net/hns3: fix reporting undefined speed
      net/hns3: fix link update when failed to get link info
      net/hns3: fix flow control exception
      app/testpmd: fix bitmap of link speeds when force speed
      net/hns3: fix flow control mode
      net/hns3: remove redundant mailbox response
      net/hns3: fix DCB mode check
      net/hns3: fix VMDq mode check
      net/hns3: fix mbuf leakage
      net/hns3: fix link status when port is stopped
      net/hns3: fix link speed when port is down
      app/testpmd: fix forward lcores number for DCB
      app/testpmd: fix DCB forwarding configuration
      app/testpmd: fix DCB re-configuration
      app/testpmd: verify DCB config during forward config
      net/hns3: fix Rx/Tx queue numbers check
      net/hns3: fix requested FC mode rollback
      net/hns3: remove meaningless packet buffer rollback
      net/hns3: fix DCB configuration
      net/hns3: fix DCB reconfiguration
      net/hns3: fix link speed when VF device is down

Ibtisam Tariq (1):
      examples/vhost_crypto: remove unused short option

Igor Chauskin (2):
      net/ena: switch memcpy to optimized version
      net/ena: fix parsing of large LLQ header device argument

Igor Russkikh (2):
      net/qede: reduce log verbosity
      net/qede: accept bigger RSS table

Ilya Maximets (1):
      net/virtio: fix interrupt unregistering for listening socket

Ivan Malov (5):
      net/sfc: fix buffer size for flow parse
      net: fix comment in IPv6 header
      net/sfc: fix error path inconsistency
      common/sfc_efx/base: fix indication of MAE encap support
      net/sfc: fix outer rule rollback on error

Jerin Jacob (1):
      examples: fix pkg-config override

Jiawei Wang (4):
      app/testpmd: fix NVGRE encap configuration
      net/mlx5: fix resource release for mirror flow
      net/mlx5: fix RSS flow item expansion for GRE key
      net/mlx5: fix RSS flow item expansion for NVGRE

Jiawei Zhu (1):
      net/mlx5: fix Rx segmented packets on mbuf starvation

Jiawen Wu (4):
      net/txgbe: remove unused functions
      net/txgbe: fix Rx missed packet counter
      net/txgbe: update packet type
      net/txgbe: fix QinQ strip

Jiayu Hu (2):
      vhost: fix queue initialization
      vhost: fix redundant vring status change notification

Jie Wang (1):
      net/ice: fix VSI array out of bounds access

John Daley (2):
      net/enic: fix flow initialization error handling
      net/enic: enable GENEVE offload via VNIC configuration

Juraj Linkeš (1):
      eal/arm64: fix platform register bit

Kai Ji (2):
      test/crypto: fix auth-cipher compare length in OOP
      test/crypto: copy offset data to OOP destination buffer

Kalesh AP (23):
      net/bnxt: remove unused macro
      net/bnxt: fix VNIC configuration
      net/bnxt: fix firmware fatal error handling
      net/bnxt: fix FW readiness check during recovery
      net/bnxt: fix device readiness check
      net/bnxt: fix VF info allocation
      net/bnxt: fix HWRM and FW incompatibility handling
      net/bnxt: mute some failure logs
      app/testpmd: check MAC address query
      net/bnxt: fix PCI write check
      net/bnxt: fix link state operations
      net/bnxt: fix timesync when PTP is not supported
      net/bnxt: fix memory allocation for command response
      net/bnxt: fix double free in port start failure
      net/bnxt: fix configuring LRO
      net/bnxt: fix health check alarm cancellation
      net/bnxt: fix PTP support for Thor
      net/bnxt: fix ring count calculation for Thor
      net/bnxt: remove unnecessary forward declarations
      net/bnxt: remove unused function parameters
      net/bnxt: drop unused attribute
      net/bnxt: fix single PF per port check
      net/bnxt: prevent device access in error state

Kamil Vojanec (1):
      net/mlx5/linux: fix firmware version

Kevin Traynor (5):
      test/cmdline: fix inputs array
      test/crypto: fix build with GCC 11
      crypto/zuc: fix build with GCC 11
      test: fix build with GCC 11
      test/cmdline: silence clang 12 warning

Konstantin Ananyev (1):
      acl: fix build with GCC 11

Lance Richardson (8):
      net/bnxt: fix Rx buffer posting
      net/bnxt: fix Tx length hint threshold
      net/bnxt: fix handling of null flow mask
      test: fix TCP header initialization
      net/bnxt: fix Rx descriptor status
      net/bnxt: fix Rx queue count
      net/bnxt: fix dynamic VNIC count
      eal: fix memory mapping on 32-bit target

Leyi Rong (1):
      net/iavf: fix packet length parsing in AVX512

Li Zhang (1):
      net/mlx5: fix flow actions index in cache

Luc Pelletier (2):
      eal: fix race in control thread creation
      eal: fix hang in control thread creation

Marvin Liu (5):
      vhost: fix split ring potential buffer overflow
      vhost: fix packed ring potential buffer overflow
      vhost: fix batch dequeue potential buffer overflow
      vhost: fix initialization of temporary header
      vhost: fix initialization of async temporary header

Matan Azrad (5):
      common/mlx5/linux: add glue function to query WQ
      common/mlx5: add DevX command to query WQ
      common/mlx5: add DevX commands for queue counters
      vdpa/mlx5: fix virtq cleaning
      vdpa/mlx5: fix device unplug

Michael Baum (1):
      net/mlx5: fix flow age event triggering

Michal Krawczyk (5):
      net/ena/base: improve style and comments
      net/ena/base: fix type conversions by explicit casting
      net/ena/base: destroy multiple wait events
      net/ena: fix crash with unsupported device argument
      net/ena: indicate Rx RSS hash presence

Min Hu (Connor) (25):
      net/hns3: fix MTU config complexity
      net/hns3: update HiSilicon copyright syntax
      net/hns3: fix copyright date
      examples/ptpclient: remove wrong comment
      test/bpf: fix error message
      doc: fix HiSilicon copyright syntax
      net/hns3: remove unused macros
      net/hns3: remove unused macro
      app/eventdev: fix overflow in lcore list parsing
      test/kni: fix a comment
      test/kni: check init result
      net/hns3: fix typos on comments
      net/e1000: fix flow error message object
      app/testpmd: fix division by zero on socket memory dump
      net/kni: warn on stop failure
      app/bbdev: check memory allocation
      app/bbdev: fix HARQ error messages
      raw/skeleton: add missing check after setting attribute
      test/timer: check memzone allocation
      app/crypto-perf: check memory allocation
      examples/flow_classify: fix NUMA check of port and core
      examples/l2fwd-cat: fix NUMA check of port and core
      examples/skeleton: fix NUMA check of port and core
      test: check flow classifier creation
      test: fix division by zero

Murphy Yang (3):
      net/ixgbe: fix RSS RETA being reset after port start
      net/i40e: fix flow director config after flow validate
      net/i40e: fix flow director for common pctypes

Natanael Copa (5):
      common/dpaax/caamflib: fix build with musl
      bus/dpaa: fix 64-bit arch detection
      bus/dpaa: fix build with musl
      net/cxgbe: remove use of uint type
      app/testpmd: fix build with musl

Nipun Gupta (1):
      bus/dpaa: fix statistics reading

Nithin Dabilpuram (3):
      vfio: do not merge contiguous areas
      vfio: fix DMA mapping granularity for IOVA as VA
      test/mem: fix page size for external memory

Olivier Matz (1):
      test/mempool: fix object initializer

Pallavi Kadam (1):
      bus/pci: skip probing some Windows NDIS devices

Pavan Nikhilesh (4):
      test/event: fix timeout accuracy
      app/eventdev: fix timeout accuracy
      app/eventdev: fix lcore parsing skipping last core
      event/octeontx2: fix XAQ pool reconfigure

Pu Xu (1):
      ip_frag: fix fragmenting IPv4 packet with header option

Qi Zhang (8):
      net/ice/base: fix payload indicator on ptype
      net/ice/base: fix uninitialized struct
      net/ice/base: cleanup filter list on error
      net/ice/base: fix memory allocation for MAC addresses
      net/iavf: fix TSO max segment size
      doc: fix matching versions in ice guide
      net/iavf: fix wrong Tx context descriptor
      common/iavf: fix duplicated offload bit

Radha Mohan Chintakuntla (1):
      raw/octeontx2_dma: assign PCI device in DPI VF

Raslan Darawsheh (1):
      ethdev: update flow item GTP QFI definition

Richael Zhuang (2):
      test/power: add delay before checking CPU frequency
      test/power: round CPU frequency to check

Robin Zhang (6):
      net/i40e: announce request queue capability in PF
      doc: update recommended versions for i40e
      net/i40e: fix lack of MAC type when set MAC address
      net/iavf: fix lack of MAC type when set MAC address
      net/iavf: fix primary MAC type when starting port
      net/i40e: fix primary MAC type when starting port

Rohit Raj (3):
      net/dpaa2: fix getting link status
      net/dpaa: fix getting link status
      examples/l2fwd-crypto: fix packet length while decryption

Roy Shterman (1):
      mem: fix freeing segments in --huge-unlink mode

Satheesh Paul (1):
      net/octeontx2: fix VLAN filter

Savinay Dharmappa (1):
      sched: fix traffic class oversubscription parameter

Shijith Thotton (3):
      eventdev: fix case to initiate crypto adapter service
      event/octeontx2: fix crypto adapter queue pair operations
      event/octeontx2: configure crypto adapter xaq pool

Siwar Zitouni (1):
      net/ice: fix disabling promiscuous mode

Somnath Kotur (5):
      net/bnxt: fix xstats get
      net/bnxt: fix Rx and Tx timestamps
      net/bnxt: fix Tx timestamp init
      net/bnxt: refactor multi-queue Rx configuration
      net/bnxt: fix Rx timestamp when FIFO pending bit is set

Stanislaw Kardach (6):
      test: proceed if timer subsystem already initialized
      stack: allow lock-free only on relevant architectures
      test/distributor: fix worker notification in burst mode
      test/distributor: fix burst flush on worker quit
      net/ena: remove endian swap functions
      net/ena: report default ring size

Stephen Hemminger (2):
      kni: refactor user request processing
      net/bnxt: use prefix on global function

Suanming Mou (1):
      net/mlx5: fix counter offset detection

Tal Shnaiderman (2):
      eal/windows: fix default thread priority
      eal/windows: fix return codes of pthread shim layer

Tengfei Zhang (1):
      net/pcap: fix file descriptor leak on close

Thinh Tran (1):
      test: fix autotest handling of skipped tests

Thomas Monjalon (18):
      bus/pci: fix Windows kernel driver categories
      eal: fix comment of OS-specific header files
      buildtools: fix build with busybox
      build: detect execinfo library on Linux
      build: remove redundant _GNU_SOURCE definitions
      eal: fix build with musl
      net/igc: remove use of uint type
      event/dlb: fix header includes for musl
      examples/bbdev: fix header include for musl
      drivers: fix log level after loading
      app/regex: fix usage text
      app/testpmd: fix usage text
      doc: fix names of UIO drivers
      doc: fix build with Sphinx 4
      bus/pci: support I/O port operations with musl
      app: fix exit messages
      regex/octeontx2: remove unused include directory
      doc: remove PDF requirements

Tianyu Li (1):
      net/memif: fix Tx bps statistics for zero-copy

Timothy McDaniel (2):
      event/dlb2: remove references to deferred scheduling
      doc: fix runtime options in DLB2 guide

Tyler Retzlaff (1):
      eal: add C++ include guard for reciprocal header

Vadim Podovinnikov (1):
      net/bonding: fix LACP system address check

Venkat Duvvuru (1):
      net/bnxt: fix queues per VNIC

Viacheslav Ovsiienko (16):
      net/mlx5: fix external buffer pool registration for Rx queue
      net/mlx5: fix metadata item validation for ingress flows
      net/mlx5: fix hashed list size for tunnel flow groups
      net/mlx5: fix UAR allocation diagnostics messages
      common/mlx5: add timestamp format support to DevX
      vdpa/mlx5: support timestamp format
      net/mlx5: fix Rx metadata leftovers
      net/mlx5: fix drop action for Direct Rules/Verbs
      net/mlx4: fix RSS action with null hash key
      net/mlx5: support timestamp format
      regex/mlx5: support timestamp format
      app/testpmd: fix segment number check
      net/mlx5: remove drop queue function prototypes
      net/mlx4: fix buffer leakage on device close
      net/mlx5: fix probing device in legacy bonding mode
      net/mlx5: fix receiving queue timestamp format

Wei Huang (1):
      raw/ifpga: fix device name format

Wenjun Wu (3):
      net/ice: check some functions return
      net/ice: fix RSS hash update
      net/ice: fix RSS for L2 packet

Wenwu Ma (1):
      net/ice: fix illegal access when removing MAC filter

Wenzhuo Lu (2):
      net/iavf: fix crash in AVX512
      net/ice: fix crash in AVX512

Wisam Jaddo (1):
      app/flow-perf: fix encap/decap actions

Xiao Wang (1):
      vdpa/ifc: check PCI config read

Xiaoyu Min (4):
      net/mlx5: support RSS expansion for IPv6 GRE
      net/mlx5: fix shared inner RSS
      net/mlx5: fix missing shared RSS hash types
      net/mlx5: fix redundant flow after RSS expansion

Xiaoyun Li (2):
      app/testpmd: remove unnecessary UDP tunnel check
      net/i40e: fix IPv4 fragment offload

Xueming Li (2):
      version: 20.11.2-rc1
      net/virtio: fix vectorized Rx queue rearm

Youri Querry (1):
      bus/fslmc: fix random portal hangs with qbman 5.0

Yunjian Wang (5):
      vfio: fix API description
      net/mlx5: fix using flow tunnel before null check
      vfio: fix duplicated user mem map
      net/mlx4: fix leak when configured repeatedly
      net/mlx5: fix leak when configured repeatedly

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] 20.11.2 patches review and test
@ 2021-06-26 23:08  1% Xueming Li
  0 siblings, 0 replies; 200+ results
From: Xueming Li @ 2021-06-26 23:08 UTC (permalink / raw)
  To: stable
  Cc: dev, Abhishek Marathe, Akhil Goyal, Ali Alnubani,
	benjamin.walker, David Christensen, hariprasad.govindharajan,
	Hemant Agrawal, Ian Stokes, Jerin Jacob, John McNamara,
	Ju-Hyoung Lee, Kevin Traynor, Luca Boccassi, Pei Zhang, pingx.yu,
	qian.q.xu, Raslan Darawsheh, Thomas Monjalon, yuan.peng,
	zhaoyan.chen, xuemingl

Hi all,

Here is a list of patches targeted for stable release 20.11.2.

The planned date for the final release is July 6th.

Please help test and validate your use cases, and report any
issues/results by replying to all on this mail. For the final release,
the fixes and reported validations will be added to the release notes.

A release candidate tarball can be found at:

    https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2

These patches are located on the 20.11 branch of the dpdk-stable repo:
    https://dpdk.org/browse/dpdk-stable/

Thanks.

Xueming Li <xuemingl@nvidia.com>

---
Adam Dybkowski (3):
      common/qat: increase IM buffer size for GEN3
      compress/qat: enable compression on GEN3
      crypto/qat: fix null authentication request

Ajit Khaparde (7):
      net/bnxt: fix RSS context cleanup
      net/bnxt: check kvargs parsing
      net/bnxt: fix resource cleanup
      doc: fix formatting in testpmd guide
      net/bnxt: fix mismatched type comparison in MAC restore
      net/bnxt: check PCI config read
      net/bnxt: fix mismatched type comparison in Rx

Alvin Zhang (11):
      net/ice: fix VLAN filter with PF
      net/i40e: fix input set field mask
      net/igc: fix Rx RSS hash offload capability
      net/igc: fix Rx error counter for bad length
      net/e1000: fix Rx error counter for bad length
      net/e1000: fix max Rx packet size
      net/igc: fix Rx packet size
      net/ice: fix fast mbuf freeing
      net/iavf: fix VF to PF command failure handling
      net/i40e: fix VF RSS configuration
      net/igc: fix speed configuration

Anatoly Burakov (3):
      fbarray: fix log message on truncation error
      power: do not skip saving original P-state governor
      power: save original ACPI governor always

Andrew Boyer (1):
      net/ionic: fix completion type in lif init

Andrew Rybchenko (4):
      net/failsafe: fix RSS hash offload reporting
      net/failsafe: report minimum and maximum MTU
      common/sfc_efx: remove GENEVE from supported tunnels
      net/sfc: fix mark support in EF100 native Rx datapath

Andy Moreton (2):
      common/sfc_efx/base: limit reported MCDI response length
      common/sfc_efx/base: add missing MCDI response length checks

Ankur Dwivedi (1):
      crypto/octeontx: fix session-less mode

Apeksha Gupta (1):
      examples/l2fwd-crypto: skip masked devices

Arek Kusztal (1):
      crypto/qat: fix offset for out-of-place scatter-gather

Beilei Xing (1):
      net/i40evf: fix packet loss for X722

Bing Zhao (1):
      net/mlx5: fix loopback for Direct Verbs queue

Bruce Richardson (2):
      build: exclude meson files from examples installation
      raw/ioat: fix script for configuring small number of queues

Chaoyong He (1):
      doc: fix multiport syntax in nfp guide

Chenbo Xia (1):
      examples/vhost: check memory table query

Chengchang Tang (20):
      net/hns3: fix HW buffer size on MTU update
      net/hns3: fix processing Tx offload flags
      net/hns3: fix Tx checksum for UDP packets with special port
      net/hns3: fix long task queue pairs reset time
      ethdev: validate input in module EEPROM dump
      ethdev: validate input in register info
      ethdev: validate input in EEPROM info
      net/hns3: fix rollback after setting PVID failure
      net/hns3: fix timing in resetting queues
      net/hns3: fix queue state when concurrent with reset
      net/hns3: fix configure FEC when concurrent with reset
      net/hns3: fix use of command status enumeration
      examples: add eal cleanup to examples
      net/bonding: fix adding itself as its slave
      net/hns3: fix timing in mailbox
      app/testpmd: fix max queue number for Tx offloads
      net/tap: fix interrupt vector array size
      net/bonding: fix socket ID check
      net/tap: check ioctl on restore
      examples/timer: fix time interval

Chengwen Feng (50):
      net/hns3: fix flow counter value
      net/hns3: fix VF mailbox head field
      net/hns3: support get device version when dump register
      net/hns3: fix some packet types
      net/hns3: fix missing outer L4 UDP flag for VXLAN
      net/hns3: remove VLAN/QinQ ptypes from support list
      test: check thread creation
      common/dpaax: fix possible null pointer access
      examples/ethtool: remove unused parsing
      net/hns3: fix flow director lock
      net/e1000/base: fix timeout for shadow RAM write
      net/hns3: fix setting default MAC address in bonding of VF
      net/hns3: fix possible mismatched response of mailbox
      net/hns3: fix VF handling LSC event in secondary process
      net/hns3: fix verification of NEON support
      mbuf: check shared memory before dumping dynamic space
      eventdev: remove redundant thread name setting
      eventdev: fix memory leakage on thread creation failure
      net/kni: check init result
      net/hns3: fix mailbox error message
      net/hns3: fix processing link status message on PF
      net/hns3: remove unused mailbox macro and struct
      net/bonding: fix leak on remove
      net/hns3: fix handling link update
      net/i40e: fix negative VEB index
      net/i40e: remove redundant VSI check in Tx queue setup
      net/virtio: fix getline memory leakage
      net/hns3: log time delta in decimal format
      net/hns3: fix time delta calculation
      net/hns3: remove unused macros
      net/hns3: fix vector Rx burst limitation
      net/hns3: remove read when enabling TM QCN error event
      net/hns3: remove unused VMDq code
      net/hns3: increase readability in logs
      raw/ntb: check SPAD user index
      raw/ntb: check memory allocations
      ipc: check malloc sync reply result
      eal: fix service core list parsing
      ipc: use monotonic clock
      net/hns3: return error on PCI config write failure
      net/hns3: fix log on flow director clear
      net/hns3: clear hash map on flow director clear
      net/hns3: fix querying flow director counter for out param
      net/hns3: fix TM QCN error event report by MSI-X
      net/hns3: fix mailbox message ID in log
      net/hns3: fix secondary process request start/stop Rx/Tx
      net/hns3: fix ordering in secondary process initialization
      net/hns3: fail setting FEC if one bit mode is not supported
      net/mlx4: fix secondary process initialization ordering
      net/mlx5: fix secondary process initialization ordering

Ciara Loftus (1):
      net/af_xdp: fix error handling during Rx queue setup

Ciara Power (2):
      telemetry: fix race on callbacks list
      test/crypto: fix return value of a skipped test

Conor Walsh (1):
      examples/l3fwd: fix LPM IPv6 subnets

Cristian Dumitrescu (3):
      table: fix actions with different data size
      pipeline: fix instruction translation
      pipeline: fix endianness conversions

Dapeng Yu (3):
      net/igc: remove MTU setting limitation
      net/e1000: remove MTU setting limitation
      examples/packet_ordering: fix port configuration

David Christensen (1):
      config/ppc: reduce number of cores and NUMA nodes

David Harton (1):
      net/ena: fix releasing Tx ring mbufs

David Hunt (4):
      test/power: fix CPU frequency check
      test/power: add turbo mode to frequency check
      test/power: fix low frequency test when turbo enabled
      test/power: fix turbo test

David Marchand (18):
      doc: fix sphinx rtd theme import in GHA
      service: clean references to removed symbol
      eal: fix evaluation of log level option
      ci: hook to GitHub Actions
      ci: enable v21 ABI checks
      ci: fix package installation in GitHub Actions
      ci: ignore APT update failure in GitHub Actions
      ci: catch coredumps
      vhost: fix offload flags in Rx path
      bus/fslmc: remove unused debug macro
      eal: fix leak in shared lib mode detection
      event/dpaa2: remove unused macros
      net/ice/base: fix memory allocation wrapper
      net/ice: fix leak on thread termination
      devtools: fix orphan symbols check with busybox
      net/vhost: restore pseudo TSO support
      net/ark: fix leak on thread termination
      build: fix drivers selection without Python

Dekel Peled (1):
      common/mlx5: fix DevX read output buffer size

Dmitry Kozlyuk (4):
      net/pcap: fix format string
      eal/windows: add missing SPDX license tag
      buildtools: fix all drivers disabled on Windows
      examples/rxtx_callbacks: fix port ID format specifier

Ed Czeck (2):
      net/ark: update packet director initial state
      net/ark: refactor Rx buffer recovery

Elad Nachman (2):
      kni: support async user request
      kni: fix kernel deadlock with bifurcated device

Feifei Wang (2):
      net/i40e: fix parsing packet type for NEON
      test/trace: fix race on collected perf data

Ferruh Yigit (9):
      power: remove duplicated symbols from map file
      log/linux: make default output stderr
      license: fix typos
      drivers/net: fix FW version query
      net/bnx2x: fix build with GCC 11
      net/bnx2x: fix build with GCC 11
      net/ice/base: fix build with GCC 11
      net/tap: fix build with GCC 11
      test/table: fix build with GCC 11

Gregory Etelson (2):
      app/testpmd: fix tunnel offload flows cleanup
      net/mlx5: fix tunnel offload private items location

Guoyang Zhou (1):
      net/hinic: fix crash in secondary process

Haiyue Wang (1):
      net/ixgbe: fix Rx errors statistics for UDP checksum

Harman Kalra (1):
      event/octeontx2: fix device reconfigure for single slot

Heinrich Kuhn (1):
      net/nfp: fix reporting of RSS capabilities

Hemant Agrawal (3):
      ethdev: add missing buses in device iterator
      crypto/dpaa_sec: affine the thread portal affinity
      crypto/dpaa2_sec: fix close and uninit functions

Hongbo Zheng (9):
      app/testpmd: fix Tx/Rx descriptor query error log
      net/hns3: fix FLR miss detection
      net/hns3: delete redundant blank line
      bpf: fix JSLT validation
      common/sfc_efx/base: fix dereferencing null pointer
      power: fix sanity checks for guest channel read
      net/hns3: fix VF alive notification after config restore
      examples/l3fwd-power: fix empty poll thresholds
      net/hns3: fix concurrent interrupt handling

Huisong Li (23):
      net/hns3: fix device capabilities for copper media type
      net/hns3: remove unused parameter markers
      net/hns3: fix reporting undefined speed
      net/hns3: fix link update when failed to get link info
      net/hns3: fix flow control exception
      app/testpmd: fix bitmap of link speeds when force speed
      net/hns3: fix flow control mode
      net/hns3: remove redundant mailbox response
      net/hns3: fix DCB mode check
      net/hns3: fix VMDq mode check
      net/hns3: fix mbuf leakage
      net/hns3: fix link status when port is stopped
      net/hns3: fix link speed when port is down
      app/testpmd: fix forward lcores number for DCB
      app/testpmd: fix DCB forwarding configuration
      app/testpmd: fix DCB re-configuration
      app/testpmd: verify DCB config during forward config
      net/hns3: fix Rx/Tx queue numbers check
      net/hns3: fix requested FC mode rollback
      net/hns3: remove meaningless packet buffer rollback
      net/hns3: fix DCB configuration
      net/hns3: fix DCB reconfiguration
      net/hns3: fix link speed when VF device is down

Ibtisam Tariq (1):
      examples/vhost_crypto: remove unused short option

Igor Chauskin (2):
      net/ena: switch memcpy to optimized version
      net/ena: fix parsing of large LLQ header device argument

Igor Russkikh (2):
      net/qede: reduce log verbosity
      net/qede: accept bigger RSS table

Ilya Maximets (1):
      net/virtio: fix interrupt unregistering for listening socket

Ivan Malov (5):
      net/sfc: fix buffer size for flow parse
      net: fix comment in IPv6 header
      net/sfc: fix error path inconsistency
      common/sfc_efx/base: fix indication of MAE encap support
      net/sfc: fix outer rule rollback on error

Jerin Jacob (1):
      examples: fix pkg-config override

Jiawei Wang (4):
      app/testpmd: fix NVGRE encap configuration
      net/mlx5: fix resource release for mirror flow
      net/mlx5: fix RSS flow item expansion for GRE key
      net/mlx5: fix RSS flow item expansion for NVGRE

Jiawei Zhu (1):
      net/mlx5: fix Rx segmented packets on mbuf starvation

Jiawen Wu (4):
      net/txgbe: remove unused functions
      net/txgbe: fix Rx missed packet counter
      net/txgbe: update packet type
      net/txgbe: fix QinQ strip

Jiayu Hu (2):
      vhost: fix queue initialization
      vhost: fix redundant vring status change notification

Jie Wang (1):
      net/ice: fix VSI array out of bounds access

John Daley (2):
      net/enic: fix flow initialization error handling
      net/enic: enable GENEVE offload via VNIC configuration

Juraj Linkeš (1):
      eal/arm64: fix platform register bit

Kai Ji (2):
      test/crypto: fix auth-cipher compare length in OOP
      test/crypto: copy offset data to OOP destination buffer

Kalesh AP (23):
      net/bnxt: remove unused macro
      net/bnxt: fix VNIC configuration
      net/bnxt: fix firmware fatal error handling
      net/bnxt: fix FW readiness check during recovery
      net/bnxt: fix device readiness check
      net/bnxt: fix VF info allocation
      net/bnxt: fix HWRM and FW incompatibility handling
      net/bnxt: mute some failure logs
      app/testpmd: check MAC address query
      net/bnxt: fix PCI write check
      net/bnxt: fix link state operations
      net/bnxt: fix timesync when PTP is not supported
      net/bnxt: fix memory allocation for command response
      net/bnxt: fix double free in port start failure
      net/bnxt: fix configuring LRO
      net/bnxt: fix health check alarm cancellation
      net/bnxt: fix PTP support for Thor
      net/bnxt: fix ring count calculation for Thor
      net/bnxt: remove unnecessary forward declarations
      net/bnxt: remove unused function parameters
      net/bnxt: drop unused attribute
      net/bnxt: fix single PF per port check
      net/bnxt: prevent device access in error state

Kamil Vojanec (1):
      net/mlx5/linux: fix firmware version

Kevin Traynor (5):
      test/cmdline: fix inputs array
      test/crypto: fix build with GCC 11
      crypto/zuc: fix build with GCC 11
      test: fix build with GCC 11
      test/cmdline: silence clang 12 warning

Konstantin Ananyev (1):
      acl: fix build with GCC 11

Lance Richardson (8):
      net/bnxt: fix Rx buffer posting
      net/bnxt: fix Tx length hint threshold
      net/bnxt: fix handling of null flow mask
      test: fix TCP header initialization
      net/bnxt: fix Rx descriptor status
      net/bnxt: fix Rx queue count
      net/bnxt: fix dynamic VNIC count
      eal: fix memory mapping on 32-bit target

Leyi Rong (1):
      net/iavf: fix packet length parsing in AVX512

Li Zhang (1):
      net/mlx5: fix flow actions index in cache

Luc Pelletier (2):
      eal: fix race in control thread creation
      eal: fix hang in control thread creation

Marvin Liu (5):
      vhost: fix split ring potential buffer overflow
      vhost: fix packed ring potential buffer overflow
      vhost: fix batch dequeue potential buffer overflow
      vhost: fix initialization of temporary header
      vhost: fix initialization of async temporary header

Matan Azrad (5):
      common/mlx5/linux: add glue function to query WQ
      common/mlx5: add DevX command to query WQ
      common/mlx5: add DevX commands for queue counters
      vdpa/mlx5: fix virtq cleaning
      vdpa/mlx5: fix device unplug

Michael Baum (1):
      net/mlx5: fix flow age event triggering

Michal Krawczyk (5):
      net/ena/base: improve style and comments
      net/ena/base: fix type conversions by explicit casting
      net/ena/base: destroy multiple wait events
      net/ena: fix crash with unsupported device argument
      net/ena: indicate Rx RSS hash presence

Min Hu (Connor) (25):
      net/hns3: fix MTU config complexity
      net/hns3: update HiSilicon copyright syntax
      net/hns3: fix copyright date
      examples/ptpclient: remove wrong comment
      test/bpf: fix error message
      doc: fix HiSilicon copyright syntax
      net/hns3: remove unused macros
      net/hns3: remove unused macro
      app/eventdev: fix overflow in lcore list parsing
      test/kni: fix a comment
      test/kni: check init result
      net/hns3: fix typos on comments
      net/e1000: fix flow error message object
      app/testpmd: fix division by zero on socket memory dump
      net/kni: warn on stop failure
      app/bbdev: check memory allocation
      app/bbdev: fix HARQ error messages
      raw/skeleton: add missing check after setting attribute
      test/timer: check memzone allocation
      app/crypto-perf: check memory allocation
      examples/flow_classify: fix NUMA check of port and core
      examples/l2fwd-cat: fix NUMA check of port and core
      examples/skeleton: fix NUMA check of port and core
      test: check flow classifier creation
      test: fix division by zero

Murphy Yang (3):
      net/ixgbe: fix RSS RETA being reset after port start
      net/i40e: fix flow director config after flow validate
      net/i40e: fix flow director for common pctypes

Natanael Copa (5):
      common/dpaax/caamflib: fix build with musl
      bus/dpaa: fix 64-bit arch detection
      bus/dpaa: fix build with musl
      net/cxgbe: remove use of uint type
      app/testpmd: fix build with musl

Nipun Gupta (1):
      bus/dpaa: fix statistics reading

Nithin Dabilpuram (3):
      vfio: do not merge contiguous areas
      vfio: fix DMA mapping granularity for IOVA as VA
      test/mem: fix page size for external memory

Olivier Matz (1):
      test/mempool: fix object initializer

Pallavi Kadam (1):
      bus/pci: skip probing some Windows NDIS devices

Pavan Nikhilesh (4):
      test/event: fix timeout accuracy
      app/eventdev: fix timeout accuracy
      app/eventdev: fix lcore parsing skipping last core
      event/octeontx2: fix XAQ pool reconfigure

Pu Xu (1):
      ip_frag: fix fragmenting IPv4 packet with header option

Qi Zhang (8):
      net/ice/base: fix payload indicator on ptype
      net/ice/base: fix uninitialized struct
      net/ice/base: cleanup filter list on error
      net/ice/base: fix memory allocation for MAC addresses
      net/iavf: fix TSO max segment size
      doc: fix matching versions in ice guide
      net/iavf: fix wrong Tx context descriptor
      common/iavf: fix duplicated offload bit

Radha Mohan Chintakuntla (1):
      raw/octeontx2_dma: assign PCI device in DPI VF

Raslan Darawsheh (1):
      ethdev: update flow item GTP QFI definition

Richael Zhuang (2):
      test/power: add delay before checking CPU frequency
      test/power: round CPU frequency to check

Robin Zhang (6):
      net/i40e: announce request queue capability in PF
      doc: update recommended versions for i40e
      net/i40e: fix lack of MAC type when set MAC address
      net/iavf: fix lack of MAC type when set MAC address
      net/iavf: fix primary MAC type when starting port
      net/i40e: fix primary MAC type when starting port

Rohit Raj (3):
      net/dpaa2: fix getting link status
      net/dpaa: fix getting link status
      examples/l2fwd-crypto: fix packet length while decryption

Roy Shterman (1):
      mem: fix freeing segments in --huge-unlink mode

Satheesh Paul (1):
      net/octeontx2: fix VLAN filter

Savinay Dharmappa (1):
      sched: fix traffic class oversubscription parameter

Shijith Thotton (3):
      eventdev: fix case to initiate crypto adapter service
      event/octeontx2: fix crypto adapter queue pair operations
      event/octeontx2: configure crypto adapter xaq pool

Siwar Zitouni (1):
      net/ice: fix disabling promiscuous mode

Somnath Kotur (5):
      net/bnxt: fix xstats get
      net/bnxt: fix Rx and Tx timestamps
      net/bnxt: fix Tx timestamp init
      net/bnxt: refactor multi-queue Rx configuration
      net/bnxt: fix Rx timestamp when FIFO pending bit is set

Stanislaw Kardach (6):
      test: proceed if timer subsystem already initialized
      stack: allow lock-free only on relevant architectures
      test/distributor: fix worker notification in burst mode
      test/distributor: fix burst flush on worker quit
      net/ena: remove endian swap functions
      net/ena: report default ring size

Stephen Hemminger (2):
      kni: refactor user request processing
      net/bnxt: use prefix on global function

Suanming Mou (1):
      net/mlx5: fix counter offset detection

Tal Shnaiderman (2):
      eal/windows: fix default thread priority
      eal/windows: fix return codes of pthread shim layer

Tengfei Zhang (1):
      net/pcap: fix file descriptor leak on close

Thinh Tran (1):
      test: fix autotest handling of skipped tests

Thomas Monjalon (18):
      bus/pci: fix Windows kernel driver categories
      eal: fix comment of OS-specific header files
      buildtools: fix build with busybox
      build: detect execinfo library on Linux
      build: remove redundant _GNU_SOURCE definitions
      eal: fix build with musl
      net/igc: remove use of uint type
      event/dlb: fix header includes for musl
      examples/bbdev: fix header include for musl
      drivers: fix log level after loading
      app/regex: fix usage text
      app/testpmd: fix usage text
      doc: fix names of UIO drivers
      doc: fix build with Sphinx 4
      bus/pci: support I/O port operations with musl
      app: fix exit messages
      regex/octeontx2: remove unused include directory
      doc: remove PDF requirements

Tianyu Li (1):
      net/memif: fix Tx bps statistics for zero-copy

Timothy McDaniel (2):
      event/dlb2: remove references to deferred scheduling
      doc: fix runtime options in DLB2 guide

Tyler Retzlaff (1):
      eal: add C++ include guard for reciprocal header

Vadim Podovinnikov (1):
      net/bonding: fix LACP system address check

Venkat Duvvuru (1):
      net/bnxt: fix queues per VNIC

Viacheslav Ovsiienko (16):
      net/mlx5: fix external buffer pool registration for Rx queue
      net/mlx5: fix metadata item validation for ingress flows
      net/mlx5: fix hashed list size for tunnel flow groups
      net/mlx5: fix UAR allocation diagnostics messages
      common/mlx5: add timestamp format support to DevX
      vdpa/mlx5: support timestamp format
      net/mlx5: fix Rx metadata leftovers
      net/mlx5: fix drop action for Direct Rules/Verbs
      net/mlx4: fix RSS action with null hash key
      net/mlx5: support timestamp format
      regex/mlx5: support timestamp format
      app/testpmd: fix segment number check
      net/mlx5: remove drop queue function prototypes
      net/mlx4: fix buffer leakage on device close
      net/mlx5: fix probing device in legacy bonding mode
      net/mlx5: fix receiving queue timestamp format

Wei Huang (1):
      raw/ifpga: fix device name format

Wenjun Wu (3):
      net/ice: check some functions return
      net/ice: fix RSS hash update
      net/ice: fix RSS for L2 packet

Wenwu Ma (1):
      net/ice: fix illegal access when removing MAC filter

Wenzhuo Lu (2):
      net/iavf: fix crash in AVX512
      net/ice: fix crash in AVX512

Wisam Jaddo (1):
      app/flow-perf: fix encap/decap actions

Xiao Wang (1):
      vdpa/ifc: check PCI config read

Xiaoyu Min (4):
      net/mlx5: support RSS expansion for IPv6 GRE
      net/mlx5: fix shared inner RSS
      net/mlx5: fix missing shared RSS hash types
      net/mlx5: fix redundant flow after RSS expansion

Xiaoyun Li (2):
      app/testpmd: remove unnecessary UDP tunnel check
      net/i40e: fix IPv4 fragment offload

Xueming Li (2):
      version: 20.11.2-rc1
      net/virtio: fix vectorized Rx queue rearm

Youri Querry (1):
      bus/fslmc: fix random portal hangs with qbman 5.0

Yunjian Wang (5):
      vfio: fix API description
      net/mlx5: fix using flow tunnel before null check
      vfio: fix duplicated user mem map
      net/mlx4: fix leak when configured repeatedly
      net/mlx5: fix leak when configured repeatedly

^ permalink raw reply	[relevance 1%]

* [dpdk-dev] 20.11.2 patches review and test
@ 2021-06-26 23:28  1% Xueming Li
  2021-06-30 10:33  0% ` Jiang, YuX
  2021-07-06  3:26  0% ` [dpdk-dev] [dpdk-stable] " Kalesh Anakkur Purayil
  0 siblings, 2 replies; 200+ results
From: Xueming Li @ 2021-06-26 23:28 UTC (permalink / raw)
  To: stable
  Cc: dev, Abhishek Marathe, Akhil Goyal, Ali Alnubani,
	benjamin.walker, David Christensen, hariprasad.govindharajan,
	Hemant Agrawal, Ian Stokes, Jerin Jacob, John McNamara,
	Ju-Hyoung Lee, Kevin Traynor, Luca Boccassi, Pei Zhang, pingx.yu,
	qian.q.xu, Raslan Darawsheh, Thomas Monjalon, yuan.peng,
	zhaoyan.chen, xuemingl

Hi all,

Here is a list of patches targeted for stable release 20.11.2.

The planned date for the final release is 6th July.

Please help with testing and validation of your use cases, and report
any issues/results by replying to all on this mail. For the final release,
the fixes and reported validations will be added to the release notes.

A release candidate tarball can be found at:

    https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2

These patches are located on branch 20.11 of the dpdk-stable repo:
    https://dpdk.org/browse/dpdk-stable/

Thanks.

Xueming Li <xuemingl@nvidia.com>

---
Adam Dybkowski (3):
      common/qat: increase IM buffer size for GEN3
      compress/qat: enable compression on GEN3
      crypto/qat: fix null authentication request

Ajit Khaparde (7):
      net/bnxt: fix RSS context cleanup
      net/bnxt: check kvargs parsing
      net/bnxt: fix resource cleanup
      doc: fix formatting in testpmd guide
      net/bnxt: fix mismatched type comparison in MAC restore
      net/bnxt: check PCI config read
      net/bnxt: fix mismatched type comparison in Rx

Alvin Zhang (11):
      net/ice: fix VLAN filter with PF
      net/i40e: fix input set field mask
      net/igc: fix Rx RSS hash offload capability
      net/igc: fix Rx error counter for bad length
      net/e1000: fix Rx error counter for bad length
      net/e1000: fix max Rx packet size
      net/igc: fix Rx packet size
      net/ice: fix fast mbuf freeing
      net/iavf: fix VF to PF command failure handling
      net/i40e: fix VF RSS configuration
      net/igc: fix speed configuration

Anatoly Burakov (3):
      fbarray: fix log message on truncation error
      power: do not skip saving original P-state governor
      power: save original ACPI governor always

Andrew Boyer (1):
      net/ionic: fix completion type in lif init

Andrew Rybchenko (4):
      net/failsafe: fix RSS hash offload reporting
      net/failsafe: report minimum and maximum MTU
      common/sfc_efx: remove GENEVE from supported tunnels
      net/sfc: fix mark support in EF100 native Rx datapath

Andy Moreton (2):
      common/sfc_efx/base: limit reported MCDI response length
      common/sfc_efx/base: add missing MCDI response length checks

Ankur Dwivedi (1):
      crypto/octeontx: fix session-less mode

Apeksha Gupta (1):
      examples/l2fwd-crypto: skip masked devices

Arek Kusztal (1):
      crypto/qat: fix offset for out-of-place scatter-gather

Beilei Xing (1):
      net/i40evf: fix packet loss for X722

Bing Zhao (1):
      net/mlx5: fix loopback for Direct Verbs queue

Bruce Richardson (2):
      build: exclude meson files from examples installation
      raw/ioat: fix script for configuring small number of queues

Chaoyong He (1):
      doc: fix multiport syntax in nfp guide

Chenbo Xia (1):
      examples/vhost: check memory table query

Chengchang Tang (20):
      net/hns3: fix HW buffer size on MTU update
      net/hns3: fix processing Tx offload flags
      net/hns3: fix Tx checksum for UDP packets with special port
      net/hns3: fix long task queue pairs reset time
      ethdev: validate input in module EEPROM dump
      ethdev: validate input in register info
      ethdev: validate input in EEPROM info
      net/hns3: fix rollback after setting PVID failure
      net/hns3: fix timing in resetting queues
      net/hns3: fix queue state when concurrent with reset
      net/hns3: fix configure FEC when concurrent with reset
      net/hns3: fix use of command status enumeration
      examples: add eal cleanup to examples
      net/bonding: fix adding itself as its slave
      net/hns3: fix timing in mailbox
      app/testpmd: fix max queue number for Tx offloads
      net/tap: fix interrupt vector array size
      net/bonding: fix socket ID check
      net/tap: check ioctl on restore
      examples/timer: fix time interval

Chengwen Feng (50):
      net/hns3: fix flow counter value
      net/hns3: fix VF mailbox head field
      net/hns3: support get device version when dump register
      net/hns3: fix some packet types
      net/hns3: fix missing outer L4 UDP flag for VXLAN
      net/hns3: remove VLAN/QinQ ptypes from support list
      test: check thread creation
      common/dpaax: fix possible null pointer access
      examples/ethtool: remove unused parsing
      net/hns3: fix flow director lock
      net/e1000/base: fix timeout for shadow RAM write
      net/hns3: fix setting default MAC address in bonding of VF
      net/hns3: fix possible mismatched response of mailbox
      net/hns3: fix VF handling LSC event in secondary process
      net/hns3: fix verification of NEON support
      mbuf: check shared memory before dumping dynamic space
      eventdev: remove redundant thread name setting
      eventdev: fix memory leakage on thread creation failure
      net/kni: check init result
      net/hns3: fix mailbox error message
      net/hns3: fix processing link status message on PF
      net/hns3: remove unused mailbox macro and struct
      net/bonding: fix leak on remove
      net/hns3: fix handling link update
      net/i40e: fix negative VEB index
      net/i40e: remove redundant VSI check in Tx queue setup
      net/virtio: fix getline memory leakage
      net/hns3: log time delta in decimal format
      net/hns3: fix time delta calculation
      net/hns3: remove unused macros
      net/hns3: fix vector Rx burst limitation
      net/hns3: remove read when enabling TM QCN error event
      net/hns3: remove unused VMDq code
      net/hns3: increase readability in logs
      raw/ntb: check SPAD user index
      raw/ntb: check memory allocations
      ipc: check malloc sync reply result
      eal: fix service core list parsing
      ipc: use monotonic clock
      net/hns3: return error on PCI config write failure
      net/hns3: fix log on flow director clear
      net/hns3: clear hash map on flow director clear
      net/hns3: fix querying flow director counter for out param
      net/hns3: fix TM QCN error event report by MSI-X
      net/hns3: fix mailbox message ID in log
      net/hns3: fix secondary process request start/stop Rx/Tx
      net/hns3: fix ordering in secondary process initialization
      net/hns3: fail setting FEC if one bit mode is not supported
      net/mlx4: fix secondary process initialization ordering
      net/mlx5: fix secondary process initialization ordering

Ciara Loftus (1):
      net/af_xdp: fix error handling during Rx queue setup

Ciara Power (2):
      telemetry: fix race on callbacks list
      test/crypto: fix return value of a skipped test

Conor Walsh (1):
      examples/l3fwd: fix LPM IPv6 subnets

Cristian Dumitrescu (3):
      table: fix actions with different data size
      pipeline: fix instruction translation
      pipeline: fix endianness conversions

Dapeng Yu (3):
      net/igc: remove MTU setting limitation
      net/e1000: remove MTU setting limitation
      examples/packet_ordering: fix port configuration

David Christensen (1):
      config/ppc: reduce number of cores and NUMA nodes

David Harton (1):
      net/ena: fix releasing Tx ring mbufs

David Hunt (4):
      test/power: fix CPU frequency check
      test/power: add turbo mode to frequency check
      test/power: fix low frequency test when turbo enabled
      test/power: fix turbo test

David Marchand (18):
      doc: fix sphinx rtd theme import in GHA
      service: clean references to removed symbol
      eal: fix evaluation of log level option
      ci: hook to GitHub Actions
      ci: enable v21 ABI checks
      ci: fix package installation in GitHub Actions
      ci: ignore APT update failure in GitHub Actions
      ci: catch coredumps
      vhost: fix offload flags in Rx path
      bus/fslmc: remove unused debug macro
      eal: fix leak in shared lib mode detection
      event/dpaa2: remove unused macros
      net/ice/base: fix memory allocation wrapper
      net/ice: fix leak on thread termination
      devtools: fix orphan symbols check with busybox
      net/vhost: restore pseudo TSO support
      net/ark: fix leak on thread termination
      build: fix drivers selection without Python

Dekel Peled (1):
      common/mlx5: fix DevX read output buffer size

Dmitry Kozlyuk (4):
      net/pcap: fix format string
      eal/windows: add missing SPDX license tag
      buildtools: fix all drivers disabled on Windows
      examples/rxtx_callbacks: fix port ID format specifier

Ed Czeck (2):
      net/ark: update packet director initial state
      net/ark: refactor Rx buffer recovery

Elad Nachman (2):
      kni: support async user request
      kni: fix kernel deadlock with bifurcated device

Feifei Wang (2):
      net/i40e: fix parsing packet type for NEON
      test/trace: fix race on collected perf data

Ferruh Yigit (9):
      power: remove duplicated symbols from map file
      log/linux: make default output stderr
      license: fix typos
      drivers/net: fix FW version query
      net/bnx2x: fix build with GCC 11
      net/bnx2x: fix build with GCC 11
      net/ice/base: fix build with GCC 11
      net/tap: fix build with GCC 11
      test/table: fix build with GCC 11

Gregory Etelson (2):
      app/testpmd: fix tunnel offload flows cleanup
      net/mlx5: fix tunnel offload private items location

Guoyang Zhou (1):
      net/hinic: fix crash in secondary process

Haiyue Wang (1):
      net/ixgbe: fix Rx errors statistics for UDP checksum

Harman Kalra (1):
      event/octeontx2: fix device reconfigure for single slot

Heinrich Kuhn (1):
      net/nfp: fix reporting of RSS capabilities

Hemant Agrawal (3):
      ethdev: add missing buses in device iterator
      crypto/dpaa_sec: affine the thread portal affinity
      crypto/dpaa2_sec: fix close and uninit functions

Hongbo Zheng (9):
      app/testpmd: fix Tx/Rx descriptor query error log
      net/hns3: fix FLR miss detection
      net/hns3: delete redundant blank line
      bpf: fix JSLT validation
      common/sfc_efx/base: fix dereferencing null pointer
      power: fix sanity checks for guest channel read
      net/hns3: fix VF alive notification after config restore
      examples/l3fwd-power: fix empty poll thresholds
      net/hns3: fix concurrent interrupt handling

Huisong Li (23):
      net/hns3: fix device capabilities for copper media type
      net/hns3: remove unused parameter markers
      net/hns3: fix reporting undefined speed
      net/hns3: fix link update when failed to get link info
      net/hns3: fix flow control exception
      app/testpmd: fix bitmap of link speeds when force speed
      net/hns3: fix flow control mode
      net/hns3: remove redundant mailbox response
      net/hns3: fix DCB mode check
      net/hns3: fix VMDq mode check
      net/hns3: fix mbuf leakage
      net/hns3: fix link status when port is stopped
      net/hns3: fix link speed when port is down
      app/testpmd: fix forward lcores number for DCB
      app/testpmd: fix DCB forwarding configuration
      app/testpmd: fix DCB re-configuration
      app/testpmd: verify DCB config during forward config
      net/hns3: fix Rx/Tx queue numbers check
      net/hns3: fix requested FC mode rollback
      net/hns3: remove meaningless packet buffer rollback
      net/hns3: fix DCB configuration
      net/hns3: fix DCB reconfiguration
      net/hns3: fix link speed when VF device is down

Ibtisam Tariq (1):
      examples/vhost_crypto: remove unused short option

Igor Chauskin (2):
      net/ena: switch memcpy to optimized version
      net/ena: fix parsing of large LLQ header device argument

Igor Russkikh (2):
      net/qede: reduce log verbosity
      net/qede: accept bigger RSS table

Ilya Maximets (1):
      net/virtio: fix interrupt unregistering for listening socket

Ivan Malov (5):
      net/sfc: fix buffer size for flow parse
      net: fix comment in IPv6 header
      net/sfc: fix error path inconsistency
      common/sfc_efx/base: fix indication of MAE encap support
      net/sfc: fix outer rule rollback on error

Jerin Jacob (1):
      examples: fix pkg-config override

Jiawei Wang (4):
      app/testpmd: fix NVGRE encap configuration
      net/mlx5: fix resource release for mirror flow
      net/mlx5: fix RSS flow item expansion for GRE key
      net/mlx5: fix RSS flow item expansion for NVGRE

Jiawei Zhu (1):
      net/mlx5: fix Rx segmented packets on mbuf starvation

Jiawen Wu (4):
      net/txgbe: remove unused functions
      net/txgbe: fix Rx missed packet counter
      net/txgbe: update packet type
      net/txgbe: fix QinQ strip

Jiayu Hu (2):
      vhost: fix queue initialization
      vhost: fix redundant vring status change notification

Jie Wang (1):
      net/ice: fix VSI array out of bounds access

John Daley (2):
      net/enic: fix flow initialization error handling
      net/enic: enable GENEVE offload via VNIC configuration

Juraj Linkeš (1):
      eal/arm64: fix platform register bit

Kai Ji (2):
      test/crypto: fix auth-cipher compare length in OOP
      test/crypto: copy offset data to OOP destination buffer

Kalesh AP (23):
      net/bnxt: remove unused macro
      net/bnxt: fix VNIC configuration
      net/bnxt: fix firmware fatal error handling
      net/bnxt: fix FW readiness check during recovery
      net/bnxt: fix device readiness check
      net/bnxt: fix VF info allocation
      net/bnxt: fix HWRM and FW incompatibility handling
      net/bnxt: mute some failure logs
      app/testpmd: check MAC address query
      net/bnxt: fix PCI write check
      net/bnxt: fix link state operations
      net/bnxt: fix timesync when PTP is not supported
      net/bnxt: fix memory allocation for command response
      net/bnxt: fix double free in port start failure
      net/bnxt: fix configuring LRO
      net/bnxt: fix health check alarm cancellation
      net/bnxt: fix PTP support for Thor
      net/bnxt: fix ring count calculation for Thor
      net/bnxt: remove unnecessary forward declarations
      net/bnxt: remove unused function parameters
      net/bnxt: drop unused attribute
      net/bnxt: fix single PF per port check
      net/bnxt: prevent device access in error state

Kamil Vojanec (1):
      net/mlx5/linux: fix firmware version

Kevin Traynor (5):
      test/cmdline: fix inputs array
      test/crypto: fix build with GCC 11
      crypto/zuc: fix build with GCC 11
      test: fix build with GCC 11
      test/cmdline: silence clang 12 warning

Konstantin Ananyev (1):
      acl: fix build with GCC 11

Lance Richardson (8):
      net/bnxt: fix Rx buffer posting
      net/bnxt: fix Tx length hint threshold
      net/bnxt: fix handling of null flow mask
      test: fix TCP header initialization
      net/bnxt: fix Rx descriptor status
      net/bnxt: fix Rx queue count
      net/bnxt: fix dynamic VNIC count
      eal: fix memory mapping on 32-bit target

Leyi Rong (1):
      net/iavf: fix packet length parsing in AVX512

Li Zhang (1):
      net/mlx5: fix flow actions index in cache

Luc Pelletier (2):
      eal: fix race in control thread creation
      eal: fix hang in control thread creation

Marvin Liu (5):
      vhost: fix split ring potential buffer overflow
      vhost: fix packed ring potential buffer overflow
      vhost: fix batch dequeue potential buffer overflow
      vhost: fix initialization of temporary header
      vhost: fix initialization of async temporary header

Matan Azrad (5):
      common/mlx5/linux: add glue function to query WQ
      common/mlx5: add DevX command to query WQ
      common/mlx5: add DevX commands for queue counters
      vdpa/mlx5: fix virtq cleaning
      vdpa/mlx5: fix device unplug

Michael Baum (1):
      net/mlx5: fix flow age event triggering

Michal Krawczyk (5):
      net/ena/base: improve style and comments
      net/ena/base: fix type conversions by explicit casting
      net/ena/base: destroy multiple wait events
      net/ena: fix crash with unsupported device argument
      net/ena: indicate Rx RSS hash presence

Min Hu (Connor) (25):
      net/hns3: fix MTU config complexity
      net/hns3: update HiSilicon copyright syntax
      net/hns3: fix copyright date
      examples/ptpclient: remove wrong comment
      test/bpf: fix error message
      doc: fix HiSilicon copyright syntax
      net/hns3: remove unused macros
      net/hns3: remove unused macro
      app/eventdev: fix overflow in lcore list parsing
      test/kni: fix a comment
      test/kni: check init result
      net/hns3: fix typos on comments
      net/e1000: fix flow error message object
      app/testpmd: fix division by zero on socket memory dump
      net/kni: warn on stop failure
      app/bbdev: check memory allocation
      app/bbdev: fix HARQ error messages
      raw/skeleton: add missing check after setting attribute
      test/timer: check memzone allocation
      app/crypto-perf: check memory allocation
      examples/flow_classify: fix NUMA check of port and core
      examples/l2fwd-cat: fix NUMA check of port and core
      examples/skeleton: fix NUMA check of port and core
      test: check flow classifier creation
      test: fix division by zero

Murphy Yang (3):
      net/ixgbe: fix RSS RETA being reset after port start
      net/i40e: fix flow director config after flow validate
      net/i40e: fix flow director for common pctypes

Natanael Copa (5):
      common/dpaax/caamflib: fix build with musl
      bus/dpaa: fix 64-bit arch detection
      bus/dpaa: fix build with musl
      net/cxgbe: remove use of uint type
      app/testpmd: fix build with musl

Nipun Gupta (1):
      bus/dpaa: fix statistics reading

Nithin Dabilpuram (3):
      vfio: do not merge contiguous areas
      vfio: fix DMA mapping granularity for IOVA as VA
      test/mem: fix page size for external memory

Olivier Matz (1):
      test/mempool: fix object initializer

Pallavi Kadam (1):
      bus/pci: skip probing some Windows NDIS devices

Pavan Nikhilesh (4):
      test/event: fix timeout accuracy
      app/eventdev: fix timeout accuracy
      app/eventdev: fix lcore parsing skipping last core
      event/octeontx2: fix XAQ pool reconfigure

Pu Xu (1):
      ip_frag: fix fragmenting IPv4 packet with header option

Qi Zhang (8):
      net/ice/base: fix payload indicator on ptype
      net/ice/base: fix uninitialized struct
      net/ice/base: cleanup filter list on error
      net/ice/base: fix memory allocation for MAC addresses
      net/iavf: fix TSO max segment size
      doc: fix matching versions in ice guide
      net/iavf: fix wrong Tx context descriptor
      common/iavf: fix duplicated offload bit

Radha Mohan Chintakuntla (1):
      raw/octeontx2_dma: assign PCI device in DPI VF

Raslan Darawsheh (1):
      ethdev: update flow item GTP QFI definition

Richael Zhuang (2):
      test/power: add delay before checking CPU frequency
      test/power: round CPU frequency to check

Robin Zhang (6):
      net/i40e: announce request queue capability in PF
      doc: update recommended versions for i40e
      net/i40e: fix lack of MAC type when set MAC address
      net/iavf: fix lack of MAC type when set MAC address
      net/iavf: fix primary MAC type when starting port
      net/i40e: fix primary MAC type when starting port

Rohit Raj (3):
      net/dpaa2: fix getting link status
      net/dpaa: fix getting link status
      examples/l2fwd-crypto: fix packet length while decryption

Roy Shterman (1):
      mem: fix freeing segments in --huge-unlink mode

Satheesh Paul (1):
      net/octeontx2: fix VLAN filter

Savinay Dharmappa (1):
      sched: fix traffic class oversubscription parameter

Shijith Thotton (3):
      eventdev: fix case to initiate crypto adapter service
      event/octeontx2: fix crypto adapter queue pair operations
      event/octeontx2: configure crypto adapter xaq pool

Siwar Zitouni (1):
      net/ice: fix disabling promiscuous mode

Somnath Kotur (5):
      net/bnxt: fix xstats get
      net/bnxt: fix Rx and Tx timestamps
      net/bnxt: fix Tx timestamp init
      net/bnxt: refactor multi-queue Rx configuration
      net/bnxt: fix Rx timestamp when FIFO pending bit is set

Stanislaw Kardach (6):
      test: proceed if timer subsystem already initialized
      stack: allow lock-free only on relevant architectures
      test/distributor: fix worker notification in burst mode
      test/distributor: fix burst flush on worker quit
      net/ena: remove endian swap functions
      net/ena: report default ring size

Stephen Hemminger (2):
      kni: refactor user request processing
      net/bnxt: use prefix on global function

Suanming Mou (1):
      net/mlx5: fix counter offset detection

Tal Shnaiderman (2):
      eal/windows: fix default thread priority
      eal/windows: fix return codes of pthread shim layer

Tengfei Zhang (1):
      net/pcap: fix file descriptor leak on close

Thinh Tran (1):
      test: fix autotest handling of skipped tests

Thomas Monjalon (18):
      bus/pci: fix Windows kernel driver categories
      eal: fix comment of OS-specific header files
      buildtools: fix build with busybox
      build: detect execinfo library on Linux
      build: remove redundant _GNU_SOURCE definitions
      eal: fix build with musl
      net/igc: remove use of uint type
      event/dlb: fix header includes for musl
      examples/bbdev: fix header include for musl
      drivers: fix log level after loading
      app/regex: fix usage text
      app/testpmd: fix usage text
      doc: fix names of UIO drivers
      doc: fix build with Sphinx 4
      bus/pci: support I/O port operations with musl
      app: fix exit messages
      regex/octeontx2: remove unused include directory
      doc: remove PDF requirements

Tianyu Li (1):
      net/memif: fix Tx bps statistics for zero-copy

Timothy McDaniel (2):
      event/dlb2: remove references to deferred scheduling
      doc: fix runtime options in DLB2 guide

Tyler Retzlaff (1):
      eal: add C++ include guard for reciprocal header

Vadim Podovinnikov (1):
      net/bonding: fix LACP system address check

Venkat Duvvuru (1):
      net/bnxt: fix queues per VNIC

Viacheslav Ovsiienko (16):
      net/mlx5: fix external buffer pool registration for Rx queue
      net/mlx5: fix metadata item validation for ingress flows
      net/mlx5: fix hashed list size for tunnel flow groups
      net/mlx5: fix UAR allocation diagnostics messages
      common/mlx5: add timestamp format support to DevX
      vdpa/mlx5: support timestamp format
      net/mlx5: fix Rx metadata leftovers
      net/mlx5: fix drop action for Direct Rules/Verbs
      net/mlx4: fix RSS action with null hash key
      net/mlx5: support timestamp format
      regex/mlx5: support timestamp format
      app/testpmd: fix segment number check
      net/mlx5: remove drop queue function prototypes
      net/mlx4: fix buffer leakage on device close
      net/mlx5: fix probing device in legacy bonding mode
      net/mlx5: fix receiving queue timestamp format

Wei Huang (1):
      raw/ifpga: fix device name format

Wenjun Wu (3):
      net/ice: check some functions return
      net/ice: fix RSS hash update
      net/ice: fix RSS for L2 packet

Wenwu Ma (1):
      net/ice: fix illegal access when removing MAC filter

Wenzhuo Lu (2):
      net/iavf: fix crash in AVX512
      net/ice: fix crash in AVX512

Wisam Jaddo (1):
      app/flow-perf: fix encap/decap actions

Xiao Wang (1):
      vdpa/ifc: check PCI config read

Xiaoyu Min (4):
      net/mlx5: support RSS expansion for IPv6 GRE
      net/mlx5: fix shared inner RSS
      net/mlx5: fix missing shared RSS hash types
      net/mlx5: fix redundant flow after RSS expansion

Xiaoyun Li (2):
      app/testpmd: remove unnecessary UDP tunnel check
      net/i40e: fix IPv4 fragment offload

Xueming Li (2):
      version: 20.11.2-rc1
      net/virtio: fix vectorized Rx queue rearm

Youri Querry (1):
      bus/fslmc: fix random portal hangs with qbman 5.0

Yunjian Wang (5):
      vfio: fix API description
      net/mlx5: fix using flow tunnel before null check
      vfio: fix duplicated user mem map
      net/mlx4: fix leak when configured repeatedly
      net/mlx5: fix leak when configured repeatedly

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] Experimental symbols in kni lib
  2021-06-25 13:26  0%     ` Igor Ryzhov
@ 2021-06-28 12:23  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-06-28 12:23 UTC (permalink / raw)
  To: Igor Ryzhov; +Cc: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, dpdk-dev

On 6/25/2021 2:26 PM, Igor Ryzhov wrote:
> Hi Ferruh, all,
> 
> Let's please discuss another approach to setting KNI link status before
> making this API stable:
> http://patches.dpdk.org/project/dpdk/patch/20190925093623.18419-1-iryzhov@nfware.com/
> 
> I explained the problem with the current implementation there.
> Moreover, using the ioctl approach also makes it possible to set speed
> and duplex, and to use them to implement the get_link_ksettings callback.
> I can send patches for both features.
> 

Hi Igor, I agree we should discuss your patch before promoting the API; I will
comment on the outstanding patch.

> Igor
> 
> On Thu, Jun 24, 2021 at 4:54 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
> 
>> Sounds more than reasonable, +1 from me.
>>
>> Ray K
>>
>> On 24/06/2021 14:24, Ferruh Yigit wrote:
>>> On 6/24/2021 11:42 AM, Kinsella, Ray wrote:
>>>> Hi Ferruh,
>>>>
>>>> The following kni experimental symbols are present in both the v21.05 and
>>>> v19.11 releases. These symbols should be considered for promotion to stable
>>>> as part of the v22 ABI in DPDK 21.11, as they have been experimental for
>>>> >= 2yrs at this point.
>>>>
>>>>  * rte_kni_update_link
>>>>
>>>> Ray K
>>>>
>>>
>>> Hi Ray,
>>>
>>> Thanks for follow up.
>>>
>>> I just checked the API and am planning a small behavior update to it.
>>> If the update is accepted, I suggest keeping the API experimental for
>>> 21.08 too, but maturing it in v21.11.
>>>
>>> Thanks,
>>> ferruh
>>>
>>


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v3 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-06-28 12:41  3%     ` Anatoly Burakov
  2021-06-28 12:41  3%     ` [dpdk-dev] [PATCH v3 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-28 12:41 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of the power monitor API were to check the current
value against an expected value and, if they matched, abort the sleep. This
was somewhat inflexible, because it only allowed checking for a specific
value.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  1 +
 drivers/event/dlb2/dlb2.c                     | 16 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 19 ++++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 19 ++++++++----
 drivers/net/ice/ice_rxtx.c                    | 19 ++++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 19 ++++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 16 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 29 ++++++++++++++-----
 lib/eal/x86/rte_power_intrinsics.c            |  9 ++----
 9 files changed, 106 insertions(+), 41 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..c84ac280f5 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -84,6 +84,7 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..14dfac257c 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,15 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val, const uint64_t opaque[4])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3203,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..45f3fbf4ec 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,17 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +104,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 0361af0d85..6e12ecce07 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,17 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +80,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index fc9bb5a3e7..278eb4b9a1 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,17 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +50,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..0c5045d9dc 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,17 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value, const uint64_t arg[4] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1392,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..57f6ca1467 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,17 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value, const uint64_t opaque[4])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +293,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..046667ade6 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,34 @@
  * which are architecture-dependent.
  */
 
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[4]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[4]; /**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..3c5c9ce7ad 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -110,14 +110,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
+	/* if we have a callback, we might not need to sleep at all */
+	if (pmc->fn) {
 		const uint64_t cur_value = __get_umwait_val(
 				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
-
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
+		if (pmc->fn(cur_value, pmc->opaque) != 0)
 			goto end;
 	}
 
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v3 4/7] power: remove thread safety from PMD power API's
    2021-06-28 12:41  3%     ` [dpdk-dev] [PATCH v3 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-06-28 12:41  3%     ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-28 12:41 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
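The stopped-queue gate that replaces the thread-safety machinery can be sketched in isolation. The helpers below are simplified stand-ins (they take booleans instead of querying the ethdev queue state), but the return-code mapping mirrors the patch:

```c
#include <errno.h>
#include <assert.h>

/* mirrors queue_stopped(): 1 = stopped, 0 = still running, -1 = bad queue */
static int
queue_stopped(int queue_valid, int queue_running)
{
	if (!queue_valid)
		return -1;
	return queue_running ? 0 : 1;
}

/* mirrors the gate at the top of the enable/disable calls */
static int
pmgmt_gate(int queue_valid, int queue_running)
{
	int ret = queue_stopped(queue_valid, queue_running);

	if (ret != 1)
		/* error means invalid queue, 0 means queue wasn't stopped */
		return ret < 0 ? -EINVAL : -EBUSY;
	return 0; /* safe to install or remove the Rx callback */
}
```

Because the gate guarantees no data-plane thread is inside the callback, the disable path can free the callback immediately instead of leaking it.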

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 9d1cfac395..f015c509fc 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -88,6 +88,11 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
+
 ABI Changes
 -----------
 
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-06-28 15:54  3%       ` Anatoly Burakov
  2021-06-28 15:54  3%       ` [dpdk-dev] [PATCH v4 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-28 15:54 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of the power monitor API were to check the current
value against an expected value and, if they matched, abort the sleep. This
was somewhat inflexible, because it only allowed checking for a specific
value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.
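Drivers whose condition is not fixed at compile time pass the expected value and mask through the opaque array, as the dlb2 and mlx5 hunks below do. A standalone sketch of that pattern (the size macro is a stand-in for `RTE_POWER_MONITOR_OPAQUE_SZ`, assumed here to be 4, and `setup_condition()` is an illustrative helper):

```c
#include <stdint.h>
#include <assert.h>

#define MONITOR_OPAQUE_SZ 4  /* stand-in for RTE_POWER_MONITOR_OPAQUE_SZ */
#define CLB_VAL_IDX 0
#define CLB_MSK_IDX 1

/* abort (-1) when the masked value equals the stored expected value */
static int
masked_match_callback(const uint64_t value,
		const uint64_t opaque[MONITOR_OPAQUE_SZ])
{
	return (value & opaque[CLB_MSK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
}

/* driver-side setup: stash the condition in the opaque data */
static void
setup_condition(uint64_t opaque[MONITOR_OPAQUE_SZ],
		uint64_t expected, uint64_t mask)
{
	opaque[CLB_VAL_IDX] = expected;
	opaque[CLB_MSK_IDX] = mask;
}
```

This keeps the old mask/value comparison available without the monitor core having to know about it.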

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  1 +
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 121 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..c84ac280f5 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -84,6 +84,7 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..081682f88b 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 0361af0d85..7ed196ec22 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index fc9bb5a3e7..d12437d19d 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..17370b77dc 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 4/7] power: remove thread safety from PMD power API's
    2021-06-28 15:54  3%       ` [dpdk-dev] [PATCH v4 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-06-28 15:54  3%       ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-28 15:54 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: konstantin.ananyev, ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent a
pressing need for thread safety we'll go the easy way and just mandate
that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 9d1cfac395..f015c509fc 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -88,6 +88,11 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
+
 ABI Changes
 -----------
 
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-06-29 15:48  3%         ` Anatoly Burakov
  2021-06-29 15:48  3%         ` [dpdk-dev] [PATCH v5 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-29 15:48 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking the current value against the expected value, and if they matched,
then the sleep was aborted. This was somewhat inflexible, because it only
allowed checking for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
its own comparison semantics and decide when entering the
power-optimized state should be aborted.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  1 +
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 121 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..c84ac280f5 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -84,6 +84,7 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..081682f88b 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 0361af0d85..7ed196ec22 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index fc9bb5a3e7..d12437d19d 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..17370b77dc 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v5 4/7] power: remove thread safety from PMD power API's
    2021-06-29 15:48  3%         ` [dpdk-dev] [PATCH v5 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-06-29 15:48  3%         ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-06-29 15:48 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: konstantin.ananyev, ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent a
pressing need for thread safety we'll go the easy way and just mandate
that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 9d1cfac395..f015c509fc 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -88,6 +88,11 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
+
 ABI Changes
 -----------
 
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1
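The stopped-queue guard added to both the enable and disable paths maps the tri-state result of `queue_stopped()` onto errno values. A standalone sketch of that mapping (assuming a caller-provided query function in place of `rte_eth_rx_queue_info_get()`; the state constants are simplified stand-ins):

```c
#include <assert.h>
#include <errno.h>

#define QUEUE_STATE_STOPPED 0
#define QUEUE_STATE_STARTED 1

/* stand-in for the queue-info query: returns <0 on an invalid queue,
 * otherwise fills *state */
typedef int (*queue_info_fn)(int *state);

/* mirror of the patch's queue_stopped(): -1 invalid, 0 running, 1 stopped */
static int
queue_stopped(queue_info_fn get_info)
{
	int state;

	if (get_info(&state) < 0)
		return -1;
	return state == QUEUE_STATE_STOPPED;
}

/* the guard used by both enable and disable: error means invalid queue
 * (-EINVAL), 0 means the queue wasn't stopped (-EBUSY) */
static int
check_queue_stopped(queue_info_fn get_info)
{
	int ret = queue_stopped(get_info);

	if (ret != 1)
		return ret < 0 ? -EINVAL : -EBUSY;
	return 0;
}

/* hypothetical query functions for the three possible outcomes */
static int info_stopped(int *s) { *s = QUEUE_STATE_STOPPED; return 0; }
static int info_running(int *s) { *s = QUEUE_STATE_STARTED; return 0; }
static int info_invalid(int *s) { (void)s; return -ENODEV; }
```

The guard is what lets the rest of the code drop its fences and wakeup loop: if the queue is provably stopped before any state change, no data-plane thread can be inside the callback concurrently.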


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
@ 2021-06-29 16:00 21% Ray Kinsella
  2021-06-29 16:28  3% ` Tyler Retzlaff
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Ray Kinsella @ 2021-06-29 16:00 UTC (permalink / raw)
  To: dev; +Cc: ferruh.yigit, thomas, david.marchand, stephen, Ray Kinsella

Clarifying the ABI policy on the promotion of experimental APIs to stable.
We have a fair number of APIs that have been experimental for more than
2 years. This policy amendment indicates that these APIs should be
promoted or removed, or should at least prompt a conversation between
the maintainer and the original contributor.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
 doc/guides/contributing/abi_policy.rst | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 4ad87dbfed..58bc45b8a5 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -26,9 +26,10 @@ General Guidelines
    symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
-#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   be changed or removed without prior notice, as they are not considered part
-   of an ABI version.
+#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
+   changed or removed without prior notice, as they are not considered part of
+   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
+   is not an indefinite state.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -358,3 +359,16 @@ Libraries
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
 version.
 All functions in such libraries may be changed or removed without prior notice.
+
+Promotion to stable
+~~~~~~~~~~~~~~~~~~~
+
+Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
+once a maintainer and/or the original contributor is satisfied that the API is
+reasonably mature. In exceptional circumstances, should an API still be
+classified as ``experimental`` after two years, without any prospect of it
+becoming part of the stable API, the API will become a candidate for
+removal, to avoid the accumulation of abandoned symbols.
+
+The promotion or removal of symbols will typically form part of a conversation
+between the maintainer and the original contributor.
-- 
2.26.2


^ permalink raw reply	[relevance 21%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-06-29 16:00 21% [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs Ray Kinsella
@ 2021-06-29 16:28  3% ` Tyler Retzlaff
  2021-06-29 18:38  0%   ` Kinsella, Ray
  2021-07-01 10:31 23% ` [dpdk-dev] [PATCH v2] " Ray Kinsella
  2021-07-01 10:38 23% ` [dpdk-dev] [PATCH v3] doc: policy on the " Ray Kinsella
  2 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-06-29 16:28 UTC (permalink / raw)
  To: Ray Kinsella; +Cc: dev, ferruh.yigit, thomas, david.marchand, stephen

On Tue, Jun 29, 2021 at 05:00:31PM +0100, Ray Kinsella wrote:
> Clarifying the ABI policy on the promotion of experimental APIS to stable.
> We have a fair number of APIs that have been experimental for more than
> 2 years. This policy ammendment indicates that these APIs should be
> promoted or removed, or should at least form a conservation between the
> maintainer and original contributor.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
>  doc/guides/contributing/abi_policy.rst | 20 +++++++++++++++++---
>  1 file changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
> index 4ad87dbfed..58bc45b8a5 100644
> --- a/doc/guides/contributing/abi_policy.rst
> +++ b/doc/guides/contributing/abi_policy.rst
> @@ -26,9 +26,10 @@ General Guidelines
>     symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
>  #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
>     once approved these will form part of the next ABI version.
> -#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
> -   be changed or removed without prior notice, as they are not considered part
> -   of an ABI version.
> +#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
> +   changed or removed without prior notice, as they are not considered part of
> +   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
> +   is not an indefinite state.
>  #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
>     support for hardware which was previously supported, should be treated as an
>     ABI change.
> @@ -358,3 +359,16 @@ Libraries
>  Libraries marked as ``experimental`` are entirely not considered part of an ABI
>  version.
>  All functions in such libraries may be changed or removed without prior notice.
> +
> +Promotion to stable
> +~~~~~~~~~~~~~~~~~~~
> +
> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
> +once a maintainer and/or the original contributor is satisfied that the API is
> +reasonably mature. In exceptional circumstances, should an API still be

this seems vague and arbitrary. is there a way we can have a more
quantitative metric for what "reasonably mature" means.

> +classified as ``experimental`` after two years and is without any prospect of
> +becoming part of the stable API. The API will then become a candidate for
> +removal, to avoid the acculumation of abandoned symbols.

i think with the above comment the basis for removal then depends on
whatever metric is used to determine maturity. if it is still changing
then it seems like it is useful and still evolving so perhaps should not
be removed but hasn't changed but doesn't meet the metric for being made
stable then perhaps it becomes a candidate for removal.

> +
> +The promotion or removal of symbols will typically form part of a conversation
> +between the maintainer and the original contributor.

this should extend beyond just symbols. there are other changes that
impact the abi where exported symbols don't change. e.g. additions to
return values sets.

thanks for working on this.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] Experimental symbols in eal lib
  2021-06-24 12:14  0% ` David Marchand
  2021-06-24 12:15  0%   ` Kinsella, Ray
@ 2021-06-29 16:50  0%   ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2021-06-29 16:50 UTC (permalink / raw)
  To: David Marchand
  Cc: Kinsella, Ray, Thomas Monjalon, Stephen Hemminger, Burakov,
	Anatoly, dpdk-dev

On Thu, Jun 24, 2021 at 02:14:16PM +0200, David Marchand wrote:
> On Thu, Jun 24, 2021 at 12:31 PM Kinsella, Ray <mdr@ashroe.eu> wrote:
> >
> > Hi Anatoly & Thomas,
> >
> > The following eal experimental symbols are present in both v21.05 and v19.11 release. These symbols should be considered for promotion to stable as part of the v22 ABI in DPDK 21.11, as they have been experimental for >= 2yrs at this point.
> 
> Just an additional comment.
> Marking stable is not the only choice.
> We can also consider hiding such symbols (marking internal) if there
> is no clear usecase out of DPDK.

+1

there has to be a very strong/clear case for promotion to public.

> 
> 
> -- 
> David Marchand

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-06-29 16:28  3% ` Tyler Retzlaff
@ 2021-06-29 18:38  0%   ` Kinsella, Ray
  2021-06-30 19:56  4%     ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-06-29 18:38 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, ferruh.yigit, thomas, david.marchand, stephen



On 29/06/2021 17:28, Tyler Retzlaff wrote:
> On Tue, Jun 29, 2021 at 05:00:31PM +0100, Ray Kinsella wrote:
>> Clarifying the ABI policy on the promotion of experimental APIS to stable.
>> We have a fair number of APIs that have been experimental for more than
>> 2 years. This policy ammendment indicates that these APIs should be
>> promoted or removed, or should at least form a conservation between the
>> maintainer and original contributor.
>>
>> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
>> ---
>>  doc/guides/contributing/abi_policy.rst | 20 +++++++++++++++++---
>>  1 file changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
>> index 4ad87dbfed..58bc45b8a5 100644
>> --- a/doc/guides/contributing/abi_policy.rst
>> +++ b/doc/guides/contributing/abi_policy.rst
>> @@ -26,9 +26,10 @@ General Guidelines
>>     symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
>>  #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
>>     once approved these will form part of the next ABI version.
>> -#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
>> -   be changed or removed without prior notice, as they are not considered part
>> -   of an ABI version.
>> +#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
>> +   changed or removed without prior notice, as they are not considered part of
>> +   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
>> +   is not an indefinite state.
>>  #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
>>     support for hardware which was previously supported, should be treated as an
>>     ABI change.
>> @@ -358,3 +359,16 @@ Libraries
>>  Libraries marked as ``experimental`` are entirely not considered part of an ABI
>>  version.
>>  All functions in such libraries may be changed or removed without prior notice.
>> +
>> +Promotion to stable
>> +~~~~~~~~~~~~~~~~~~~
>> +
>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
>> +once a maintainer and/or the original contributor is satisfied that the API is
>> +reasonably mature. In exceptional circumstances, should an API still be
> 
> this seems vague and arbitrary. is there a way we can have a more
> quantitative metric for what "reasonably mature" means.
> 
>> +classified as ``experimental`` after two years and is without any prospect of
>> +becoming part of the stable API. The API will then become a candidate for
>> +removal, to avoid the acculumation of abandoned symbols.
> 
> i think with the above comment the basis for removal then depends on
> whatever metric is used to determine maturity. 
> if it is still changing
> then it seems like it is useful and still evolving so perhaps should not
> be removed but hasn't changed but doesn't meet the metric for being made
> stable then perhaps it becomes a candidate for removal.

Good idea. 

I think it is reasonable to add a clause that indicates that any change 
to the "API signature" would reset the clock.

However equally any changes to the implementation do not reset the clock.

Would that work?

> 
>> +
>> +The promotion or removal of symbols will typically form part of a conversation
>> +between the maintainer and the original contributor.
> 
> this should extend beyond just symbols. there are other changes that
> impact the abi where exported symbols don't change. e.g. additions to
> return values sets.> 
> thanks for working on this.
> 

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 20.11.2 patches review and test
  2021-06-26 23:28  1% Xueming Li
@ 2021-06-30 10:33  0% ` Jiang, YuX
  2021-07-06  2:37  0%   ` Xueming(Steven) Li
  2021-07-06  3:26  0% ` [dpdk-dev] [dpdk-stable] " Kalesh Anakkur Purayil
  1 sibling, 1 reply; 200+ results
From: Jiang, YuX @ 2021-06-30 10:33 UTC (permalink / raw)
  To: Xueming Li, stable
  Cc: dev, Abhishek Marathe, Akhil Goyal, Ali Alnubani, Walker,
	Benjamin, David Christensen, Govindharajan, Hariprasad,
	Hemant Agrawal, Stokes, Ian, Jerin Jacob, Mcnamara, John,
	Ju-Hyoung Lee, Kevin Traynor, Luca Boccassi, Pei Zhang, Yu,
	PingX, Xu, Qian Q, Raslan Darawsheh, Thomas Monjalon, Peng, Yuan,
	Chen, Zhaoyan

All,
Testing with dpdk v20.11.2-rc2 from Intel looks good; no critical issues were found, and all reported issues are known issues.
The below two issues have been fixed in 20.11.2-rc2:
  1) Fedora34 GCC11 and Clang12 build failed.
  2) dcf_lifecycle/handle_acl_filter_05: after reset port the mac changed.

# Basic Intel(R) NIC testing
*PF(i40e, ixgbe): test scenarios including rte_flow/TSO/Jumboframe/checksum offload/Tunnel, etc. Listed but not all.
- Below two known issues are found.
  1)https://bugs.dpdk.org/show_bug.cgi?id=687 : unit_tests_power/power_cpufreq: unit test failed. This issue was found in 21.05 and is not fixed yet.
  2)ddp_gtp_qregion/fd_gtpu_ipv4_dstip: flow director does not work. This issue was found in 21.05 and is fixed in 21.08.
    Fixed patch link: http://patches.dpdk.org/project/dpdk/patch/20210519032745.707639-1-stevex.yang@intel.com/                         
*VF(i40e,ixgbe): test scenarios including vf-rte_flow/TSO/Jumboframe/checksum offload/Tunnel, Listed but not all.
- No new issues are found.              
*PF/VF(ice): test scenarios including switch features/Flow Director/Advanced RSS/ACL/DCF/Flexible Descriptor and so on, Listed but not all.
- Below 3 known DPDK issues are found. 
  1)rxtx_offload/rxoffload_port: Pkt1 can't be distributed to the same queue. This issue was found in 21.05 and is fixed in 21.08.
    Fixed patch link: http://patches.dpdk.org/project/dpdk/patch/20210527064251.242076-1-dapengx.yu@intel.com/ 
  2)cvl_advanced_iavf_rss: after changing the SCTP port value, the hash value remains unchanged. This issue was found in 20.11-rc3 and fixed in 21.02, but it belongs to a 21.02 new feature, so it won't be backported to LTS 20.11.
  3)Can't create 512 ACL rules after creating a full-mask switch rule. This issue also occurs in DPDK 20.11 and is not fixed yet.                     
* Build: cover the build test combination with latest GCC/Clang/ICC version and the popular OS revision such as Ubuntu20.04, CentOS8.3 and so on. Listed but not all.
- All passed.              
* Intel NIC single core/NIC performance: test scenarios including PF/VF single core performance test(AVX2+AVX512) test and so on. Listed but not all.
- All passed. No big data drop. 

# Basic cryptodev and virtio testing
* Virtio: both function and performance test are covered. Such as PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing, etc.. Listed but not all.
- One known issue as below:
  (1) The UDP fragmentation offload feature of the virtio-net device can't be turned on in the VM; this is a kernel issue, and a kernel Bugzilla report has been submitted: https://bugzilla.kernel.org/show_bug.cgi?id=207075, not fixed yet.                     
* Cryptodev: 
- Function test: test scenarios including Cryptodev API testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc. Listed but not all.
  - All passed.
- Performance test: test scenarios including Throughput Performance /Cryptodev Latency, etc. Listed but not all.
  - No big data drop.

Best regards,
Yu Jiang

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Xueming Li
> Sent: Sunday, June 27, 2021 7:28 AM
> To: stable@dpdk.org
> Cc: dev@dpdk.org; Abhishek Marathe <Abhishek.Marathe@microsoft.com>;
> Akhil Goyal <akhil.goyal@nxp.com>; Ali Alnubani <alialnu@nvidia.com>;
> Walker, Benjamin <benjamin.walker@intel.com>; David Christensen
> <drc@linux.vnet.ibm.com>; Govindharajan, Hariprasad
> <hariprasad.govindharajan@intel.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>; Stokes, Ian <ian.stokes@intel.com>; Jerin
> Jacob <jerinj@marvell.com>; Mcnamara, John <john.mcnamara@intel.com>;
> Ju-Hyoung Lee <juhlee@microsoft.com>; Kevin Traynor
> <ktraynor@redhat.com>; Luca Boccassi <bluca@debian.org>; Pei Zhang
> <pezhang@redhat.com>; Yu, PingX <pingx.yu@intel.com>; Xu, Qian Q
> <qian.q.xu@intel.com>; Raslan Darawsheh <rasland@nvidia.com>; Thomas
> Monjalon <thomas@monjalon.net>; Peng, Yuan <yuan.peng@intel.com>;
> Chen, Zhaoyan <zhaoyan.chen@intel.com>; xuemingl@nvidia.com
> Subject: [dpdk-dev] 20.11.2 patches review and test
> 
> Hi all,
> 
> Here is a list of patches targeted for stable release 20.11.2.
> 
> The planned date for the final release is 6th July.
> 
> Please help with testing and validation of your use cases and report any
> issues/results with reply-all to this mail. For the final release the fixes and
> reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
>     https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2
> 
> These patches are located at branch 20.11 of dpdk-stable repo:
>     https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Xueming Li <xuemingl@nvidia.com>
> 
> ---
> Adam Dybkowski (3):
>       common/qat: increase IM buffer size for GEN3
>       compress/qat: enable compression on GEN3
>       crypto/qat: fix null authentication request
> 
> Ajit Khaparde (7):
>       net/bnxt: fix RSS context cleanup
>       net/bnxt: check kvargs parsing
>       net/bnxt: fix resource cleanup
>       doc: fix formatting in testpmd guide
>       net/bnxt: fix mismatched type comparison in MAC restore
>       net/bnxt: check PCI config read
>       net/bnxt: fix mismatched type comparison in Rx
> 
> Alvin Zhang (11):
>       net/ice: fix VLAN filter with PF
>       net/i40e: fix input set field mask
>       net/igc: fix Rx RSS hash offload capability
>       net/igc: fix Rx error counter for bad length
>       net/e1000: fix Rx error counter for bad length
>       net/e1000: fix max Rx packet size
>       net/igc: fix Rx packet size
>       net/ice: fix fast mbuf freeing
>       net/iavf: fix VF to PF command failure handling
>       net/i40e: fix VF RSS configuration
>       net/igc: fix speed configuration
> 
> Anatoly Burakov (3):
>       fbarray: fix log message on truncation error
>       power: do not skip saving original P-state governor
>       power: save original ACPI governor always
> 
> Andrew Boyer (1):
>       net/ionic: fix completion type in lif init
> 
> Andrew Rybchenko (4):
>       net/failsafe: fix RSS hash offload reporting
>       net/failsafe: report minimum and maximum MTU
>       common/sfc_efx: remove GENEVE from supported tunnels
>       net/sfc: fix mark support in EF100 native Rx datapath
> 
> Andy Moreton (2):
>       common/sfc_efx/base: limit reported MCDI response length
>       common/sfc_efx/base: add missing MCDI response length checks
> 
> Ankur Dwivedi (1):
>       crypto/octeontx: fix session-less mode
> 
> Apeksha Gupta (1):
>       examples/l2fwd-crypto: skip masked devices
> 
> Arek Kusztal (1):
>       crypto/qat: fix offset for out-of-place scatter-gather
> 
> Beilei Xing (1):
>       net/i40evf: fix packet loss for X722
> 
> Bing Zhao (1):
>       net/mlx5: fix loopback for Direct Verbs queue
> 
> Bruce Richardson (2):
>       build: exclude meson files from examples installation
>       raw/ioat: fix script for configuring small number of queues
> 
> Chaoyong He (1):
>       doc: fix multiport syntax in nfp guide
> 
> Chenbo Xia (1):
>       examples/vhost: check memory table query
> 
> Chengchang Tang (20):
>       net/hns3: fix HW buffer size on MTU update
>       net/hns3: fix processing Tx offload flags
>       net/hns3: fix Tx checksum for UDP packets with special port
>       net/hns3: fix long task queue pairs reset time
>       ethdev: validate input in module EEPROM dump
>       ethdev: validate input in register info
>       ethdev: validate input in EEPROM info
>       net/hns3: fix rollback after setting PVID failure
>       net/hns3: fix timing in resetting queues
>       net/hns3: fix queue state when concurrent with reset
>       net/hns3: fix configure FEC when concurrent with reset
>       net/hns3: fix use of command status enumeration
>       examples: add eal cleanup to examples
>       net/bonding: fix adding itself as its slave
>       net/hns3: fix timing in mailbox
>       app/testpmd: fix max queue number for Tx offloads
>       net/tap: fix interrupt vector array size
>       net/bonding: fix socket ID check
>       net/tap: check ioctl on restore
>       examples/timer: fix time interval
> 
> Chengwen Feng (50):
>       net/hns3: fix flow counter value
>       net/hns3: fix VF mailbox head field
>       net/hns3: support get device version when dump register
>       net/hns3: fix some packet types
>       net/hns3: fix missing outer L4 UDP flag for VXLAN
>       net/hns3: remove VLAN/QinQ ptypes from support list
>       test: check thread creation
>       common/dpaax: fix possible null pointer access
>       examples/ethtool: remove unused parsing
>       net/hns3: fix flow director lock
>       net/e1000/base: fix timeout for shadow RAM write
>       net/hns3: fix setting default MAC address in bonding of VF
>       net/hns3: fix possible mismatched response of mailbox
>       net/hns3: fix VF handling LSC event in secondary process
>       net/hns3: fix verification of NEON support
>       mbuf: check shared memory before dumping dynamic space
>       eventdev: remove redundant thread name setting
>       eventdev: fix memory leakage on thread creation failure
>       net/kni: check init result
>       net/hns3: fix mailbox error message
>       net/hns3: fix processing link status message on PF
>       net/hns3: remove unused mailbox macro and struct
>       net/bonding: fix leak on remove
>       net/hns3: fix handling link update
>       net/i40e: fix negative VEB index
>       net/i40e: remove redundant VSI check in Tx queue setup
>       net/virtio: fix getline memory leakage
>       net/hns3: log time delta in decimal format
>       net/hns3: fix time delta calculation
>       net/hns3: remove unused macros
>       net/hns3: fix vector Rx burst limitation
>       net/hns3: remove read when enabling TM QCN error event
>       net/hns3: remove unused VMDq code
>       net/hns3: increase readability in logs
>       raw/ntb: check SPAD user index
>       raw/ntb: check memory allocations
>       ipc: check malloc sync reply result
>       eal: fix service core list parsing
>       ipc: use monotonic clock
>       net/hns3: return error on PCI config write failure
>       net/hns3: fix log on flow director clear
>       net/hns3: clear hash map on flow director clear
>       net/hns3: fix querying flow director counter for out param
>       net/hns3: fix TM QCN error event report by MSI-X
>       net/hns3: fix mailbox message ID in log
>       net/hns3: fix secondary process request start/stop Rx/Tx
>       net/hns3: fix ordering in secondary process initialization
>       net/hns3: fail setting FEC if one bit mode is not supported
>       net/mlx4: fix secondary process initialization ordering
>       net/mlx5: fix secondary process initialization ordering
> 
> Ciara Loftus (1):
>       net/af_xdp: fix error handling during Rx queue setup
> 
> Ciara Power (2):
>       telemetry: fix race on callbacks list
>       test/crypto: fix return value of a skipped test
> 
> Conor Walsh (1):
>       examples/l3fwd: fix LPM IPv6 subnets
> 
> Cristian Dumitrescu (3):
>       table: fix actions with different data size
>       pipeline: fix instruction translation
>       pipeline: fix endianness conversions
> 
> Dapeng Yu (3):
>       net/igc: remove MTU setting limitation
>       net/e1000: remove MTU setting limitation
>       examples/packet_ordering: fix port configuration
> 
> David Christensen (1):
>       config/ppc: reduce number of cores and NUMA nodes
> 
> David Harton (1):
>       net/ena: fix releasing Tx ring mbufs
> 
> David Hunt (4):
>       test/power: fix CPU frequency check
>       test/power: add turbo mode to frequency check
>       test/power: fix low frequency test when turbo enabled
>       test/power: fix turbo test
> 
> David Marchand (18):
>       doc: fix sphinx rtd theme import in GHA
>       service: clean references to removed symbol
>       eal: fix evaluation of log level option
>       ci: hook to GitHub Actions
>       ci: enable v21 ABI checks
>       ci: fix package installation in GitHub Actions
>       ci: ignore APT update failure in GitHub Actions
>       ci: catch coredumps
>       vhost: fix offload flags in Rx path
>       bus/fslmc: remove unused debug macro
>       eal: fix leak in shared lib mode detection
>       event/dpaa2: remove unused macros
>       net/ice/base: fix memory allocation wrapper
>       net/ice: fix leak on thread termination
>       devtools: fix orphan symbols check with busybox
>       net/vhost: restore pseudo TSO support
>       net/ark: fix leak on thread termination
>       build: fix drivers selection without Python
> 
> Dekel Peled (1):
>       common/mlx5: fix DevX read output buffer size
> 
> Dmitry Kozlyuk (4):
>       net/pcap: fix format string
>       eal/windows: add missing SPDX license tag
>       buildtools: fix all drivers disabled on Windows
>       examples/rxtx_callbacks: fix port ID format specifier
> 
> Ed Czeck (2):
>       net/ark: update packet director initial state
>       net/ark: refactor Rx buffer recovery
> 
> Elad Nachman (2):
>       kni: support async user request
>       kni: fix kernel deadlock with bifurcated device
> 
> Feifei Wang (2):
>       net/i40e: fix parsing packet type for NEON
>       test/trace: fix race on collected perf data
> 
> Ferruh Yigit (9):
>       power: remove duplicated symbols from map file
>       log/linux: make default output stderr
>       license: fix typos
>       drivers/net: fix FW version query
>       net/bnx2x: fix build with GCC 11
>       net/bnx2x: fix build with GCC 11
>       net/ice/base: fix build with GCC 11
>       net/tap: fix build with GCC 11
>       test/table: fix build with GCC 11
> 
> Gregory Etelson (2):
>       app/testpmd: fix tunnel offload flows cleanup
>       net/mlx5: fix tunnel offload private items location
> 
> Guoyang Zhou (1):
>       net/hinic: fix crash in secondary process
> 
> Haiyue Wang (1):
>       net/ixgbe: fix Rx errors statistics for UDP checksum
> 
> Harman Kalra (1):
>       event/octeontx2: fix device reconfigure for single slot
> 
> Heinrich Kuhn (1):
>       net/nfp: fix reporting of RSS capabilities
> 
> Hemant Agrawal (3):
>       ethdev: add missing buses in device iterator
>       crypto/dpaa_sec: affine the thread portal affinity
>       crypto/dpaa2_sec: fix close and uninit functions
> 
> Hongbo Zheng (9):
>       app/testpmd: fix Tx/Rx descriptor query error log
>       net/hns3: fix FLR miss detection
>       net/hns3: delete redundant blank line
>       bpf: fix JSLT validation
>       common/sfc_efx/base: fix dereferencing null pointer
>       power: fix sanity checks for guest channel read
>       net/hns3: fix VF alive notification after config restore
>       examples/l3fwd-power: fix empty poll thresholds
>       net/hns3: fix concurrent interrupt handling
> 
> Huisong Li (23):
>       net/hns3: fix device capabilities for copper media type
>       net/hns3: remove unused parameter markers
>       net/hns3: fix reporting undefined speed
>       net/hns3: fix link update when failed to get link info
>       net/hns3: fix flow control exception
>       app/testpmd: fix bitmap of link speeds when force speed
>       net/hns3: fix flow control mode
>       net/hns3: remove redundant mailbox response
>       net/hns3: fix DCB mode check
>       net/hns3: fix VMDq mode check
>       net/hns3: fix mbuf leakage
>       net/hns3: fix link status when port is stopped
>       net/hns3: fix link speed when port is down
>       app/testpmd: fix forward lcores number for DCB
>       app/testpmd: fix DCB forwarding configuration
>       app/testpmd: fix DCB re-configuration
>       app/testpmd: verify DCB config during forward config
>       net/hns3: fix Rx/Tx queue numbers check
>       net/hns3: fix requested FC mode rollback
>       net/hns3: remove meaningless packet buffer rollback
>       net/hns3: fix DCB configuration
>       net/hns3: fix DCB reconfiguration
>       net/hns3: fix link speed when VF device is down
> 
> Ibtisam Tariq (1):
>       examples/vhost_crypto: remove unused short option
> 
> Igor Chauskin (2):
>       net/ena: switch memcpy to optimized version
>       net/ena: fix parsing of large LLQ header device argument
> 
> Igor Russkikh (2):
>       net/qede: reduce log verbosity
>       net/qede: accept bigger RSS table
> 
> Ilya Maximets (1):
>       net/virtio: fix interrupt unregistering for listening socket
> 
> Ivan Malov (5):
>       net/sfc: fix buffer size for flow parse
>       net: fix comment in IPv6 header
>       net/sfc: fix error path inconsistency
>       common/sfc_efx/base: fix indication of MAE encap support
>       net/sfc: fix outer rule rollback on error
> 
> Jerin Jacob (1):
>       examples: fix pkg-config override
> 
> Jiawei Wang (4):
>       app/testpmd: fix NVGRE encap configuration
>       net/mlx5: fix resource release for mirror flow
>       net/mlx5: fix RSS flow item expansion for GRE key
>       net/mlx5: fix RSS flow item expansion for NVGRE
> 
> Jiawei Zhu (1):
>       net/mlx5: fix Rx segmented packets on mbuf starvation
> 
> Jiawen Wu (4):
>       net/txgbe: remove unused functions
>       net/txgbe: fix Rx missed packet counter
>       net/txgbe: update packet type
>       net/txgbe: fix QinQ strip
> 
> Jiayu Hu (2):
>       vhost: fix queue initialization
>       vhost: fix redundant vring status change notification
> 
> Jie Wang (1):
>       net/ice: fix VSI array out of bounds access
> 
> John Daley (2):
>       net/enic: fix flow initialization error handling
>       net/enic: enable GENEVE offload via VNIC configuration
> 
> Juraj Linkeš (1):
>       eal/arm64: fix platform register bit
> 
> Kai Ji (2):
>       test/crypto: fix auth-cipher compare length in OOP
>       test/crypto: copy offset data to OOP destination buffer
> 
> Kalesh AP (23):
>       net/bnxt: remove unused macro
>       net/bnxt: fix VNIC configuration
>       net/bnxt: fix firmware fatal error handling
>       net/bnxt: fix FW readiness check during recovery
>       net/bnxt: fix device readiness check
>       net/bnxt: fix VF info allocation
>       net/bnxt: fix HWRM and FW incompatibility handling
>       net/bnxt: mute some failure logs
>       app/testpmd: check MAC address query
>       net/bnxt: fix PCI write check
>       net/bnxt: fix link state operations
>       net/bnxt: fix timesync when PTP is not supported
>       net/bnxt: fix memory allocation for command response
>       net/bnxt: fix double free in port start failure
>       net/bnxt: fix configuring LRO
>       net/bnxt: fix health check alarm cancellation
>       net/bnxt: fix PTP support for Thor
>       net/bnxt: fix ring count calculation for Thor
>       net/bnxt: remove unnecessary forward declarations
>       net/bnxt: remove unused function parameters
>       net/bnxt: drop unused attribute
>       net/bnxt: fix single PF per port check
>       net/bnxt: prevent device access in error state
> 
> Kamil Vojanec (1):
>       net/mlx5/linux: fix firmware version
> 
> Kevin Traynor (5):
>       test/cmdline: fix inputs array
>       test/crypto: fix build with GCC 11
>       crypto/zuc: fix build with GCC 11
>       test: fix build with GCC 11
>       test/cmdline: silence clang 12 warning
> 
> Konstantin Ananyev (1):
>       acl: fix build with GCC 11
> 
> Lance Richardson (8):
>       net/bnxt: fix Rx buffer posting
>       net/bnxt: fix Tx length hint threshold
>       net/bnxt: fix handling of null flow mask
>       test: fix TCP header initialization
>       net/bnxt: fix Rx descriptor status
>       net/bnxt: fix Rx queue count
>       net/bnxt: fix dynamic VNIC count
>       eal: fix memory mapping on 32-bit target
> 
> Leyi Rong (1):
>       net/iavf: fix packet length parsing in AVX512
> 
> Li Zhang (1):
>       net/mlx5: fix flow actions index in cache
> 
> Luc Pelletier (2):
>       eal: fix race in control thread creation
>       eal: fix hang in control thread creation
> 
> Marvin Liu (5):
>       vhost: fix split ring potential buffer overflow
>       vhost: fix packed ring potential buffer overflow
>       vhost: fix batch dequeue potential buffer overflow
>       vhost: fix initialization of temporary header
>       vhost: fix initialization of async temporary header
> 
> Matan Azrad (5):
>       common/mlx5/linux: add glue function to query WQ
>       common/mlx5: add DevX command to query WQ
>       common/mlx5: add DevX commands for queue counters
>       vdpa/mlx5: fix virtq cleaning
>       vdpa/mlx5: fix device unplug
> 
> Michael Baum (1):
>       net/mlx5: fix flow age event triggering
> 
> Michal Krawczyk (5):
>       net/ena/base: improve style and comments
>       net/ena/base: fix type conversions by explicit casting
>       net/ena/base: destroy multiple wait events
>       net/ena: fix crash with unsupported device argument
>       net/ena: indicate Rx RSS hash presence
> 
> Min Hu (Connor) (25):
>       net/hns3: fix MTU config complexity
>       net/hns3: update HiSilicon copyright syntax
>       net/hns3: fix copyright date
>       examples/ptpclient: remove wrong comment
>       test/bpf: fix error message
>       doc: fix HiSilicon copyright syntax
>       net/hns3: remove unused macros
>       net/hns3: remove unused macro
>       app/eventdev: fix overflow in lcore list parsing
>       test/kni: fix a comment
>       test/kni: check init result
>       net/hns3: fix typos on comments
>       net/e1000: fix flow error message object
>       app/testpmd: fix division by zero on socket memory dump
>       net/kni: warn on stop failure
>       app/bbdev: check memory allocation
>       app/bbdev: fix HARQ error messages
>       raw/skeleton: add missing check after setting attribute
>       test/timer: check memzone allocation
>       app/crypto-perf: check memory allocation
>       examples/flow_classify: fix NUMA check of port and core
>       examples/l2fwd-cat: fix NUMA check of port and core
>       examples/skeleton: fix NUMA check of port and core
>       test: check flow classifier creation
>       test: fix division by zero
> 
> Murphy Yang (3):
>       net/ixgbe: fix RSS RETA being reset after port start
>       net/i40e: fix flow director config after flow validate
>       net/i40e: fix flow director for common pctypes
> 
> Natanael Copa (5):
>       common/dpaax/caamflib: fix build with musl
>       bus/dpaa: fix 64-bit arch detection
>       bus/dpaa: fix build with musl
>       net/cxgbe: remove use of uint type
>       app/testpmd: fix build with musl
> 
> Nipun Gupta (1):
>       bus/dpaa: fix statistics reading
> 
> Nithin Dabilpuram (3):
>       vfio: do not merge contiguous areas
>       vfio: fix DMA mapping granularity for IOVA as VA
>       test/mem: fix page size for external memory
> 
> Olivier Matz (1):
>       test/mempool: fix object initializer
> 
> Pallavi Kadam (1):
>       bus/pci: skip probing some Windows NDIS devices
> 
> Pavan Nikhilesh (4):
>       test/event: fix timeout accuracy
>       app/eventdev: fix timeout accuracy
>       app/eventdev: fix lcore parsing skipping last core
>       event/octeontx2: fix XAQ pool reconfigure
> 
> Pu Xu (1):
>       ip_frag: fix fragmenting IPv4 packet with header option
> 
> Qi Zhang (8):
>       net/ice/base: fix payload indicator on ptype
>       net/ice/base: fix uninitialized struct
>       net/ice/base: cleanup filter list on error
>       net/ice/base: fix memory allocation for MAC addresses
>       net/iavf: fix TSO max segment size
>       doc: fix matching versions in ice guide
>       net/iavf: fix wrong Tx context descriptor
>       common/iavf: fix duplicated offload bit
> 
> Radha Mohan Chintakuntla (1):
>       raw/octeontx2_dma: assign PCI device in DPI VF
> 
> Raslan Darawsheh (1):
>       ethdev: update flow item GTP QFI definition
> 
> Richael Zhuang (2):
>       test/power: add delay before checking CPU frequency
>       test/power: round CPU frequency to check
> 
> Robin Zhang (6):
>       net/i40e: announce request queue capability in PF
>       doc: update recommended versions for i40e
>       net/i40e: fix lack of MAC type when set MAC address
>       net/iavf: fix lack of MAC type when set MAC address
>       net/iavf: fix primary MAC type when starting port
>       net/i40e: fix primary MAC type when starting port
> 
> Rohit Raj (3):
>       net/dpaa2: fix getting link status
>       net/dpaa: fix getting link status
>       examples/l2fwd-crypto: fix packet length while decryption
> 
> Roy Shterman (1):
>       mem: fix freeing segments in --huge-unlink mode
> 
> Satheesh Paul (1):
>       net/octeontx2: fix VLAN filter
> 
> Savinay Dharmappa (1):
>       sched: fix traffic class oversubscription parameter
> 
> Shijith Thotton (3):
>       eventdev: fix case to initiate crypto adapter service
>       event/octeontx2: fix crypto adapter queue pair operations
>       event/octeontx2: configure crypto adapter xaq pool
> 
> Siwar Zitouni (1):
>       net/ice: fix disabling promiscuous mode
> 
> Somnath Kotur (5):
>       net/bnxt: fix xstats get
>       net/bnxt: fix Rx and Tx timestamps
>       net/bnxt: fix Tx timestamp init
>       net/bnxt: refactor multi-queue Rx configuration
>       net/bnxt: fix Rx timestamp when FIFO pending bit is set
> 
> Stanislaw Kardach (6):
>       test: proceed if timer subsystem already initialized
>       stack: allow lock-free only on relevant architectures
>       test/distributor: fix worker notification in burst mode
>       test/distributor: fix burst flush on worker quit
>       net/ena: remove endian swap functions
>       net/ena: report default ring size
> 
> Stephen Hemminger (2):
>       kni: refactor user request processing
>       net/bnxt: use prefix on global function
> 
> Suanming Mou (1):
>       net/mlx5: fix counter offset detection
> 
> Tal Shnaiderman (2):
>       eal/windows: fix default thread priority
>       eal/windows: fix return codes of pthread shim layer
> 
> Tengfei Zhang (1):
>       net/pcap: fix file descriptor leak on close
> 
> Thinh Tran (1):
>       test: fix autotest handling of skipped tests
> 
> Thomas Monjalon (18):
>       bus/pci: fix Windows kernel driver categories
>       eal: fix comment of OS-specific header files
>       buildtools: fix build with busybox
>       build: detect execinfo library on Linux
>       build: remove redundant _GNU_SOURCE definitions
>       eal: fix build with musl
>       net/igc: remove use of uint type
>       event/dlb: fix header includes for musl
>       examples/bbdev: fix header include for musl
>       drivers: fix log level after loading
>       app/regex: fix usage text
>       app/testpmd: fix usage text
>       doc: fix names of UIO drivers
>       doc: fix build with Sphinx 4
>       bus/pci: support I/O port operations with musl
>       app: fix exit messages
>       regex/octeontx2: remove unused include directory
>       doc: remove PDF requirements
> 
> Tianyu Li (1):
>       net/memif: fix Tx bps statistics for zero-copy
> 
> Timothy McDaniel (2):
>       event/dlb2: remove references to deferred scheduling
>       doc: fix runtime options in DLB2 guide
> 
> Tyler Retzlaff (1):
>       eal: add C++ include guard for reciprocal header
> 
> Vadim Podovinnikov (1):
>       net/bonding: fix LACP system address check
> 
> Venkat Duvvuru (1):
>       net/bnxt: fix queues per VNIC
> 
> Viacheslav Ovsiienko (16):
>       net/mlx5: fix external buffer pool registration for Rx queue
>       net/mlx5: fix metadata item validation for ingress flows
>       net/mlx5: fix hashed list size for tunnel flow groups
>       net/mlx5: fix UAR allocation diagnostics messages
>       common/mlx5: add timestamp format support to DevX
>       vdpa/mlx5: support timestamp format
>       net/mlx5: fix Rx metadata leftovers
>       net/mlx5: fix drop action for Direct Rules/Verbs
>       net/mlx4: fix RSS action with null hash key
>       net/mlx5: support timestamp format
>       regex/mlx5: support timestamp format
>       app/testpmd: fix segment number check
>       net/mlx5: remove drop queue function prototypes
>       net/mlx4: fix buffer leakage on device close
>       net/mlx5: fix probing device in legacy bonding mode
>       net/mlx5: fix receiving queue timestamp format
> 
> Wei Huang (1):
>       raw/ifpga: fix device name format
> 
> Wenjun Wu (3):
>       net/ice: check some functions return
>       net/ice: fix RSS hash update
>       net/ice: fix RSS for L2 packet
> 
> Wenwu Ma (1):
>       net/ice: fix illegal access when removing MAC filter
> 
> Wenzhuo Lu (2):
>       net/iavf: fix crash in AVX512
>       net/ice: fix crash in AVX512
> 
> Wisam Jaddo (1):
>       app/flow-perf: fix encap/decap actions
> 
> Xiao Wang (1):
>       vdpa/ifc: check PCI config read
> 
> Xiaoyu Min (4):
>       net/mlx5: support RSS expansion for IPv6 GRE
>       net/mlx5: fix shared inner RSS
>       net/mlx5: fix missing shared RSS hash types
>       net/mlx5: fix redundant flow after RSS expansion
> 
> Xiaoyun Li (2):
>       app/testpmd: remove unnecessary UDP tunnel check
>       net/i40e: fix IPv4 fragment offload
> 
> Xueming Li (2):
>       version: 20.11.2-rc1
>       net/virtio: fix vectorized Rx queue rearm
> 
> Youri Querry (1):
>       bus/fslmc: fix random portal hangs with qbman 5.0
> 
> Yunjian Wang (5):
>       vfio: fix API description
>       net/mlx5: fix using flow tunnel before null check
>       vfio: fix duplicated user mem map
>       net/mlx4: fix leak when configured repeatedly
>       net/mlx5: fix leak when configured repeatedly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-ci] [PATCH v2 2/2] drivers: add octeontx crypto adapter data path
  @ 2021-06-30 16:23  4%       ` Brandon Lo
  0 siblings, 0 replies; 200+ results
From: Brandon Lo @ 2021-06-30 16:23 UTC (permalink / raw)
  To: Akhil Goyal
  Cc: Shijith Thotton, dev, ci, Pavan Nikhilesh Bhagavatula,
	Anoob Joseph, Jerin Jacob Kollanukkaran, abhinandan.gujjar,
	Ankur Dwivedi

Hi Akhil,

I believe the FreeBSD 13 failure appeared because new requirements
were added for drivers/event/octeontx.
The ABI reference was taken at the v21.05 release, which was able to
build this driver at the time.
I will try to look for a way to produce a real ABI test.

Thanks,
Brandon

On Wed, Jun 30, 2021 at 4:54 AM Akhil Goyal <gakhil@marvell.com> wrote:
>
> > Added support for crypto adapter OP_FORWARD mode.
> >
> > As OcteonTx CPT crypto completions could be out of order, each crypto op
> > is enqueued to CPT, dequeued from CPT and enqueued to SSO one-by-one.
> >
> > Signed-off-by: Shijith Thotton <sthotton@marvell.com>
> > ---
> This patch shows a CI warning for FreeBSD, but was not able to locate the error/warning in the logs.
> Can anybody confirm what is the issue?
>
> http://mails.dpdk.org/archives/test-report/2021-June/200637.html
>
> Regards,
> Akhil



-- 

Brandon Lo

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

blo@iol.unh.edu

www.iol.unh.edu

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-06-29 18:38  0%   ` Kinsella, Ray
@ 2021-06-30 19:56  4%     ` Tyler Retzlaff
  2021-07-01  7:56  0%       ` Ferruh Yigit
  2021-07-01 10:19  4%       ` Kinsella, Ray
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2021-06-30 19:56 UTC (permalink / raw)
  To: Kinsella, Ray; +Cc: dev, ferruh.yigit, thomas, david.marchand, stephen

On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
> 
> 
> >> +Promotion to stable
> >> +~~~~~~~~~~~~~~~~~~~
> >> +
> >> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
> >> +once a maintainer and/or the original contributor is satisfied that the API is
> >> +reasonably mature. In exceptional circumstances, should an API still be
> > 
> > this seems vague and arbitrary. is there a way we can have a more
> > quantitative metric for what "reasonably mature" means.
> > 
> >> +classified as ``experimental`` after two years and is without any prospect of
> >> +becoming part of the stable API. The API will then become a candidate for
> >> +removal, to avoid the acculumation of abandoned symbols.
> > 
> > i think with the above comment the basis for removal then depends on
> > whatever metric is used to determine maturity. 
> > if it is still changing
> > then it seems like it is useful and still evolving so perhaps should not
> > be removed but hasn't changed but doesn't meet the metric for being made
> > stable then perhaps it becomes a candidate for removal.
> 
> Good idea. 
> 
> I think it is reasonable to add a clause that indicates that any change 
> to the "API signature" would reset the clock.

a time based strategy works but i guess the follow-on to that is how is
the clock tracked and how does it get updated? i don't think trying to
troll through git history will be effective.

one nit, i think "api signature" doesn't cover all cases of what i would
regard as change. i would prefer to define it as "no change where api/abi
compatibility or semantic change occurred"? which is a lot more strict
but in practice is necessary to support binaries when abi/api is stable.

i.e. if a recompile is necessary with or without code change then it's a
change.

> 
> However equally any changes to the implementation do not reset the clock.
> 
> Would that work?

that works for me.

> 
> > 
> >> +
> >> +The promotion or removal of symbols will typically form part of a conversation
> >> +between the maintainer and the original contributor.
> > 
> > this should extend beyond just symbols. there are other changes that
> > impact the abi where exported symbols don't change. e.g. additions to
> > return values sets.> 
> > thanks for working on this.
> > 

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-06-30 19:56  4%     ` Tyler Retzlaff
@ 2021-07-01  7:56  0%       ` Ferruh Yigit
  2021-07-01 14:45  4%         ` Tyler Retzlaff
  2021-07-01 10:19  4%       ` Kinsella, Ray
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2021-07-01  7:56 UTC (permalink / raw)
  To: Tyler Retzlaff, Kinsella, Ray; +Cc: dev, thomas, david.marchand, stephen

On 6/30/2021 8:56 PM, Tyler Retzlaff wrote:
> On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
>>
>>
>>>> +Promotion to stable
>>>> +~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
>>>> +once a maintainer and/or the original contributor is satisfied that the API is
>>>> +reasonably mature. In exceptional circumstances, should an API still be
>>>
>>> this seems vague and arbitrary. is there a way we can have a more
>>> quantitative metric for what "reasonably mature" means.
>>>
>>>> +classified as ``experimental`` after two years and is without any prospect of
>>>> +becoming part of the stable API. The API will then become a candidate for
>>>> +removal, to avoid the acculumation of abandoned symbols.
>>>
>>> i think with the above comment the basis for removal then depends on
>>> whatever metric is used to determine maturity. 
>>> if it is still changing
>>> then it seems like it is useful and still evolving so perhaps should not
>>> be removed but hasn't changed but doesn't meet the metric for being made
>>> stable then perhaps it becomes a candidate for removal.
>>
>> Good idea. 
>>
>> I think it is reasonable to add a clause that indicates that any change 
>> to the "API signature" would reset the clock.
> 
> a time based strategy works but i guess the follow-on to that is how is
> the clock tracked and how does it get updated? i don't think trying to
> troll through git history will be effective.
> 

We are grouping the new experimental APIs in the version file based on the
release they were added in, with a comment, thanks to Dave. Like:
        # added in 19.02
        rte_extmem_attach;
        rte_extmem_detach;
        rte_extmem_register;
        rte_extmem_unregister;

        # added in 19.05
        rte_dev_dma_map;
        rte_dev_dma_unmap;
        ....

Please check 'lib/eal/version.map' as sample.

This enables us to easily see the release in which experimental APIs were added.
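As a sketch of what promotion then looks like in practice (the symbol names and the `DPDK_22` node below are illustrative, not copied from lib/eal/version.map), moving a symbol to stable is a small, reviewable edit to the version file: the symbol leaves the EXPERIMENTAL section and joins the current stable ABI node:

```text
DPDK_22 {
	global:

	rte_extmem_attach;     # promoted, experimental since 19.02
	rte_extmem_detach;     # promoted, experimental since 19.02
};

EXPERIMENTAL {
	global:

	# added in 19.05
	rte_dev_dma_map;
	rte_dev_dma_unmap;
};
```

The "# added in" comments make the two-year expiry trivial to audit at release time, without trawling through git history.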

> one nit, i think "api signature" doesn't cover all cases of what i would
> regard as change. i would prefer to define it as "no change where api/abi
> compatibility or semantic change occurred"? which is a lot more strict
> but in practice is necessary to support binaries when abi/api is stable.
> 
> i.e. if a recompile is necessary with or without code change then it's a
> change.
> 
>>
>> However equally any changes to the implementation do not reset the clock.
>>
>> Would that work?
> 
> that works for me.
> 
>>
>>>
>>>> +
>>>> +The promotion or removal of symbols will typically form part of a conversation
>>>> +between the maintainer and the original contributor.
>>>
>>> this should extend beyond just symbols. there are other changes that
>>> impact the abi where exported symbols don't change. e.g. additions to
>>> return values sets.> 
>>> thanks for working on this.
>>>


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-06-30 19:56  4%     ` Tyler Retzlaff
  2021-07-01  7:56  0%       ` Ferruh Yigit
@ 2021-07-01 10:19  4%       ` Kinsella, Ray
  2021-07-01 15:09  4%         ` Tyler Retzlaff
  1 sibling, 1 reply; 200+ results
From: Kinsella, Ray @ 2021-07-01 10:19 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, ferruh.yigit, thomas, david.marchand, stephen



On 30/06/2021 20:56, Tyler Retzlaff wrote:
> On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
>>
>>
>>>> +Promotion to stable
>>>> +~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
>>>> +once a maintainer and/or the original contributor is satisfied that the API is
>>>> +reasonably mature. In exceptional circumstances, should an API still be
>>>
>>> this seems vague and arbitrary. is there a way we can have a more
>>> quantitative metric for what "reasonably mature" means.
>>>
>>>> +classified as ``experimental`` after two years and is without any prospect of
>>>> +becoming part of the stable API. The API will then become a candidate for
>>>> +removal, to avoid the acculumation of abandoned symbols.
>>>
>>> i think with the above comment the basis for removal then depends on
>>> whatever metric is used to determine maturity. 
>>> if it is still changing
>>> then it seems like it is useful and still evolving so perhaps should not
>>> be removed but hasn't changed but doesn't meet the metric for being made
>>> stable then perhaps it becomes a candidate for removal.
>>
>> Good idea. 
>>
>> I think it is reasonable to add a clause that indicates that any change 
>> to the "API signature" would reset the clock.
> 
> a time based strategy works but i guess the follow-on to that is how is
> the clock tracked and how does it get updated? i don't think trying to
> troll through git history will be effective.
> 
> one nit, i think "api signature" doesn't cover all cases of what i would
> regard as change. i would prefer to define it as "no change where api/abi
> compatibility or semantic change occurred"? which is a lot more strict
> but in practice is necessary to support binaries when abi/api is stable.
> 
> i.e. if a recompile is necessary with or without code change then it's a
> change.

Having thought a bit ... this becomes a bit problematic.

Many data structures in DPDK are nested;
these can have a ripple effect when changed - a change to mbuf is a good example.

What I'm saying is ...
I don't think changes in ABI due to in-direct reasons should count.
If there is a change due to a deliberate change in the ABI signature 
that is fine, reset the clock.

If there is a change due to some nested data structure 
changing 3 levels down, in my book that doesn't count, 
as that may or may not have been deliberate, and is almost impossible to police. 

Checking anything but a deliberate change to the ABI signature,
would be practically impossible IMHO. 
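To make the concern concrete, here is a minimal, hypothetical sketch (none of these types exist in DPDK) of how a field added to a struct nested two levels down silently changes the layout - and therefore the binary interface - of an outer structure whose own declaration never changed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nested structures, before ("v1") and after ("v2")
 * an innocuous-looking change deep in the nesting. */
struct stats_v1 { uint64_t packets; };
struct stats_v2 { uint64_t packets; uint64_t bytes; /* new field */ };

struct queue_v1 { struct stats_v1 stats; uint16_t id; };
struct queue_v2 { struct stats_v2 stats; uint16_t id; };

/* The outer, "public" struct's declaration is textually identical in
 * both versions, yet its layout differs: every member after the nested
 * one shifts, so binaries built against v1 misread v2 objects. */
struct dev_v1 { struct queue_v1 rxq; uint32_t flags; };
struct dev_v2 { struct queue_v2 rxq; uint32_t flags; };

size_t dev_v1_flags_offset(void) { return offsetof(struct dev_v1, flags); }
size_t dev_v2_flags_offset(void) { return offsetof(struct dev_v2, flags); }
```

This is why an "any ABI change resets the clock" rule is hard to police by eye: nothing in the outer struct's source diff signals the break, and only layout-aware tooling (e.g. the libabigail-based checks) catches it.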

> 
>>
>> However equally any changes to the implementation do not reset the clock.
>>
>> Would that work?
> 
> that works for me.

v2 on the way.

> 
>>
>>>
>>>> +
>>>> +The promotion or removal of symbols will typically form part of a conversation
>>>> +between the maintainer and the original contributor.
>>>
>>> this should extend beyond just symbols. there are other changes that
>>> impact the abi where exported symbols don't change. e.g. additions to
>>> return values sets.> 
>>> thanks for working on this.
>>>

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v2] doc: policy on promotion of experimental APIs
  2021-06-29 16:00 21% [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs Ray Kinsella
  2021-06-29 16:28  3% ` Tyler Retzlaff
@ 2021-07-01 10:31 23% ` Ray Kinsella
  2021-07-01 10:38 23% ` [dpdk-dev] [PATCH v3] doc: policy on the " Ray Kinsella
  2 siblings, 0 replies; 200+ results
From: Ray Kinsella @ 2021-07-01 10:31 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, roretzla, ferruh.yigit, thomas,
	david.marchand, stephen, Ray Kinsella

Clarifying the ABI policy on the promotion of experimental APIs to stable.
We have a fair number of APIs that have been experimental for more than
2 years. This policy amendment indicates that these APIs should be
promoted or removed, or should at least form part of a conversation between
the maintainer and original contributor.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: addressing comments on abi expiry from Tyler Retzlaff.

 doc/guides/contributing/abi_policy.rst | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 4ad87dbfed..840c295e5d 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -26,9 +26,10 @@ General Guidelines
    symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
-#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   be changed or removed without prior notice, as they are not considered part
-   of an ABI version.
+#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
+   changed or removed without prior notice, as they are not considered part of
+   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
+   is not an indefinite state.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -358,3 +359,18 @@ Libraries
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
 version.
 All functions in such libraries may be changed or removed without prior notice.
+
+Promotion to stable
+~~~~~~~~~~~~~~~~~~~
+
+Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
+once a maintainer and/or the original contributor is satisfied that the API is
+reasonably mature. In exceptional circumstances, should an API still be
+classified as ``experimental`` after two years, without any prospect of
+becoming part of the stable API, it will then become a candidate for
+removal, to avoid the accumulation of abandoned symbols.
+
+Should an API's binary interface change during the two-year period, usually due
+to a direct change to the API's signature, it is reasonable for the expiry
+clock to reset. The promotion or removal of symbols will typically form part of
+a conversation between the maintainer and the original contributor.
-- 
2.26.2


^ permalink raw reply	[relevance 23%]

* [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-06-29 16:00 21% [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs Ray Kinsella
  2021-06-29 16:28  3% ` Tyler Retzlaff
  2021-07-01 10:31 23% ` [dpdk-dev] [PATCH v2] " Ray Kinsella
@ 2021-07-01 10:38 23% ` Ray Kinsella
  2021-07-07 18:32  0%   ` Tyler Retzlaff
  2021-07-09  6:16  0%   ` Jerin Jacob
  2 siblings, 2 replies; 200+ results
From: Ray Kinsella @ 2021-07-01 10:38 UTC (permalink / raw)
  To: dev
  Cc: bruce.richardson, john.mcnamara, roretzla, ferruh.yigit, thomas,
	david.marchand, stephen, Ray Kinsella

Clarifying the ABI policy on the promotion of experimental APIs to stable.
We have a fair number of APIs that have been experimental for more than
2 years. This policy amendment indicates that these APIs should be
promoted or removed, or should at least prompt a conversation between the
maintainer and original contributor.

Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
---
v2: addressing comments on abi expiry from Tyler Retzlaff.
v3: addressing typos in the git commit message

 doc/guides/contributing/abi_policy.rst | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 4ad87dbfed..840c295e5d 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -26,9 +26,10 @@ General Guidelines
    symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
 #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
    once approved these will form part of the next ABI version.
-#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
-   be changed or removed without prior notice, as they are not considered part
-   of an ABI version.
+#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
+   changed or removed without prior notice, as they are not considered part of
+   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
+   is not an indefinite state.
 #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
    support for hardware which was previously supported, should be treated as an
    ABI change.
@@ -358,3 +359,18 @@ Libraries
 Libraries marked as ``experimental`` are entirely not considered part of an ABI
 version.
 All functions in such libraries may be changed or removed without prior notice.
+
+Promotion to stable
+~~~~~~~~~~~~~~~~~~~
+
+Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
+once a maintainer and/or the original contributor is satisfied that the API is
+reasonably mature. In exceptional circumstances, should an API still be
+classified as ``experimental`` after two years, without any prospect of
+becoming part of the stable API, it will then become a candidate for
+removal, to avoid the accumulation of abandoned symbols.
+
+Should an API's binary interface change during the two-year period, usually due
+to a direct change to the API's signature, it is reasonable for the expiry
+clock to reset. The promotion or removal of symbols will typically form part of
+a conversation between the maintainer and the original contributor.
-- 
2.26.2


^ permalink raw reply	[relevance 23%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-07-01  7:56  0%       ` Ferruh Yigit
@ 2021-07-01 14:45  4%         ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2021-07-01 14:45 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: Kinsella, Ray, dev, thomas, david.marchand, stephen

On Thu, Jul 01, 2021 at 08:56:22AM +0100, Ferruh Yigit wrote:
> On 6/30/2021 8:56 PM, Tyler Retzlaff wrote:
> > On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
> >>
> >>
> >>>> +Promotion to stable
> >>>> +~~~~~~~~~~~~~~~~~~~
> >>>> +
> >>>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
> >>>> +once a maintainer and/or the original contributor is satisfied that the API is
> >>>> +reasonably mature. In exceptional circumstances, should an API still be
> >>>
> >>> this seems vague and arbitrary. is there a way we can have a more
> >>> quantitative metric for what "reasonably mature" means.
> >>>
> >>>> +classified as ``experimental`` after two years and is without any prospect of
> >>>> +becoming part of the stable API. The API will then become a candidate for
> >>>> +removal, to avoid the acculumation of abandoned symbols.
> >>>
> >>> i think with the above comment the basis for removal then depends on
> >>> whatever metric is used to determine maturity. 
> >>> if it is still changing
> >>> then it seems like it is useful and still evolving so perhaps should not
> >>> be removed but hasn't changed but doesn't meet the metric for being made
> >>> stable then perhaps it becomes a candidate for removal.
> >>
> >> Good idea. 
> >>
> >> I think it is reasonable to add a clause that indicates that any change 
> >> to the "API signature" would reset the clock.
> > 
> > a time based strategy works but i guess the follow-on to that is how is
> > the clock tracked and how does it get updated? i don't think trying to
> > troll through git history will be effective.
> > 
> 
> We are grouping the new experimental APIs in the version file based on the
> release they are added with a comment, thanks to Dave. Like:
> 
>         # added in 19.02
>         rte_extmem_attach;
>         rte_extmem_detach;
>         rte_extmem_register;
>         rte_extmem_unregister;
> 
>         # added in 19.05
>         rte_dev_dma_map;
>         rte_dev_dma_unmap;
>         ....
> 
> Please check 'lib/eal/version.map' as sample.
> 
> This enables us easily see the release experimental APIs are added.

this is fine but the subject being discussed is oriented around how long
an api/abi has been unchanged to identify it as a candidate for qualifying
it as stable (not experimental). are you suggesting that if api/abi changes
then it is moved to the -current version to "restart the clock" as it were?


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-07-01 10:19  4%       ` Kinsella, Ray
@ 2021-07-01 15:09  4%         ` Tyler Retzlaff
  2021-07-02  6:30  4%           ` Kinsella, Ray
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-01 15:09 UTC (permalink / raw)
  To: Kinsella, Ray; +Cc: dev, ferruh.yigit, thomas, david.marchand, stephen

On Thu, Jul 01, 2021 at 11:19:27AM +0100, Kinsella, Ray wrote:
> 
> 
> On 30/06/2021 20:56, Tyler Retzlaff wrote:
> > On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
> >>
> >>
> >>>> +Promotion to stable
> >>>> +~~~~~~~~~~~~~~~~~~~
> >>>> +
> >>>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
> >>>> +once a maintainer and/or the original contributor is satisfied that the API is
> >>>> +reasonably mature. In exceptional circumstances, should an API still be
> >>>
> >>> this seems vague and arbitrary. is there a way we can have a more
> >>> quantitative metric for what "reasonably mature" means.
> >>>
> >>>> +classified as ``experimental`` after two years and is without any prospect of
> >>>> +becoming part of the stable API. The API will then become a candidate for
> >>>> +removal, to avoid the acculumation of abandoned symbols.
> >>>
> >>> i think with the above comment the basis for removal then depends on
> >>> whatever metric is used to determine maturity. 
> >>> if it is still changing
> >>> then it seems like it is useful and still evolving so perhaps should not
> >>> be removed but hasn't changed but doesn't meet the metric for being made
> >>> stable then perhaps it becomes a candidate for removal.
> >>
> >> Good idea. 
> >>
> >> I think it is reasonable to add a clause that indicates that any change 
> >> to the "API signature" would reset the clock.
> > 
> > a time based strategy works but i guess the follow-on to that is how is
> > the clock tracked and how does it get updated? i don't think trying to
> > troll through git history will be effective.
> > 
> > one nit, i think "api signature" doesn't cover all cases of what i would
> > regard as change. i would prefer to define it as "no change where api/abi
> > compatibility or semantic change occurred"? which is a lot more strict
> > but in practice is necessary to support binaries when abi/api is stable.
> > 
> > i.e. if a recompile is necessary with or without code change then it's a
> > change.
> 
> Having thought a bit ... this becomes a bit problematic.
> 
> Many data-structures in DPDK are nested, 
> these can have a ripple effect when changed - a change to mbuf is a good example.
> 
> What I saying is ...
> I don't think changes in ABI due to in-direct reasons should count.
> If there is a change due to a deliberate change in the ABI signature 
> that is fine, reset the clock.
>
> 
> If there is a change due to some nested data-structure, 
> 3-levels down changing in my book that doesn't count. 

it has to count otherwise dpdk's abi stability promise for major version
releases is meaningless. or are you suggesting it doesn't count for the
purpose of determining whether or not an experimental api/abi has
changed?

> As that may or may not have been deliberate, and is almost impossible to police. 
> 
> Checking anything but a deliberate change to the ABI signature,
> would be practically impossible IMHO. 

well, it isn't impossible, but it does take knowledge, mechanism and
process to maintain the abi for a major version.

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] DPDK Release Status Meeting 01/07/2021
@ 2021-07-01 16:30  4% Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2021-07-01 16:30 UTC (permalink / raw)
  To: dev; +Cc: thomas, Yigit, Ferruh

Release status meeting minutes {Date}
=====================================
:Date: 1 July 2021
:toc:

.Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens

.Participants:
* Broadcom
* Canonical
* Debian/Microsoft
* Intel
* Marvell
* Nvidia
* Red Hat


Release Dates
-------------

* `v21.08` dates
  - Proposal/V1:    Wednesday, 2 June (completed)
  - -rc1:           Wednesday, 7 July
  - Release:        Tuesday,   3 August

* Note: We need to hold to the early August release date since
  several of the maintainers will be on holidays after that.

* `v21.11` dates (proposed and subject to discussion)
  - Proposal/V1:    Friday, 10 September
  - -rc1:           Friday, 15 October
  - Release:        Friday, 19 November

Subtrees
--------

* main
  - Backlog is a little big at the moment. RC1 will probably slip to Wednesday 7th July.
  - Most subtrees PRs are ready or close to ready.
  - Still waiting update on Solarflare patches.
  - New auxiliary bus patch series should go into this release.

* next-net
  - Testpmd patchset for Windows.
  - Looking at net/sfc patches.

* next-crypto
  - 4 new PMDs in this release:
    ** CNXK - reviewed - awaiting final version for RC1.
    ** MLX - still in progress. New version will be sent today.
    ** Intel QAT - under review.
    ** NXP baseband - requires new version.

* next-eventdev
  - PR for RC1 will be completed today.

* next-virtio
  - PR posted yesterday.

* next-net-brcm
 - All patches in sub-tree waiting to be pulled.

* next-net-intel
  - Proceeding okay. No issues

* next-net-mlx
  - PR not pulled due to comments that need to be addressed.
  - New version sent today.

* next-net-mrvl
  - Pull request for RC1 sent.


LTS
---

* `v19.11` (next version is `v19.11.9`)
  - RC3 tagged.
  - Target release date July 2, however there are some late reported
    MLX regressions that are under investigation.
  - There are 2 other known issues:
    ** Plenty of GCC11 and Clang build issues were fixed, but 19.11.9
       is not yet compatible with clang 12.0.0. Fixes are discussed
       and a potential 3 backports identified for 19.11.10:
       https://bugs.dpdk.org/show_bug.cgi?id=733
    ** Due to a kernel patch backport in SUSE Linux Enterprise Server 15
       SP3 6, compilation of kni fails there:
       https://bugs.dpdk.org/show_bug.cgi?id=728

* `v20.11` (next version is `v20.11.2`)
  - RC2 released
  - Some test reports coming in (Intel, MLX)
  - 6 July is proposed release date.

* Distros
  - v20.11 in Debian 11
  - v20.11 in Ubuntu 21.04


Defects
-------

* Bugzilla links, 'Bugs',  added for hosted projects
  - https://www.dpdk.org/hosted-projects/


Opens
-----

* There is an ongoing initiative around ABI stability which was
  discussed in the Tech Board call. A workgroup has come up
  with a list of critical and major changes required to let us
  extend the ABI without as much disruption. For example:

  ** export driver interfaces as internal
  ** hide more structs (may require uninlining)
  ** split big structs + new feature-specific functions (major)
  ** remove enum maximums
  ** reserved space initialized to 0
  ** reserved flags cleared

* We need to fill details and volunteers in this table:
  https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit?usp=sharing

* The DPDK North America Summit will be on July 12-13. Registration is free.
  https://events.linuxfoundation.org/dpdk-summit-north-america/



.DPDK Release Status Meetings
*****
The DPDK Release Status Meeting is intended for DPDK Committers to discuss the status of the master tree and sub-trees, and for project managers to track progress or milestone dates.

The meeting occurs every Thursday at 8:30 UTC on https://meet.jit.si/DPDK

If you wish to attend just send an email to "John McNamara <john.mcnamara@intel.com>" for the invite.
*****

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs
  2021-07-01 15:09  4%         ` Tyler Retzlaff
@ 2021-07-02  6:30  4%           ` Kinsella, Ray
  0 siblings, 0 replies; 200+ results
From: Kinsella, Ray @ 2021-07-02  6:30 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, ferruh.yigit, thomas, david.marchand, stephen



On 01/07/2021 16:09, Tyler Retzlaff wrote:
> On Thu, Jul 01, 2021 at 11:19:27AM +0100, Kinsella, Ray wrote:
>>
>>
>> On 30/06/2021 20:56, Tyler Retzlaff wrote:
>>> On Tue, Jun 29, 2021 at 07:38:05PM +0100, Kinsella, Ray wrote:
>>>>
>>>>
>>>>>> +Promotion to stable
>>>>>> +~~~~~~~~~~~~~~~~~~~
>>>>>> +
>>>>>> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable API
>>>>>> +once a maintainer and/or the original contributor is satisfied that the API is
>>>>>> +reasonably mature. In exceptional circumstances, should an API still be
>>>>>
>>>>> this seems vague and arbitrary. is there a way we can have a more
>>>>> quantitative metric for what "reasonably mature" means.
>>>>>
>>>>>> +classified as ``experimental`` after two years and is without any prospect of
>>>>>> +becoming part of the stable API. The API will then become a candidate for
>>>>>> +removal, to avoid the acculumation of abandoned symbols.
>>>>>
>>>>> i think with the above comment the basis for removal then depends on
>>>>> whatever metric is used to determine maturity. 
>>>>> if it is still changing
>>>>> then it seems like it is useful and still evolving so perhaps should not
>>>>> be removed but hasn't changed but doesn't meet the metric for being made
>>>>> stable then perhaps it becomes a candidate for removal.
>>>>
>>>> Good idea. 
>>>>
>>>> I think it is reasonable to add a clause that indicates that any change 
>>>> to the "API signature" would reset the clock.
>>>
>>> a time based strategy works but i guess the follow-on to that is how is
>>> the clock tracked and how does it get updated? i don't think trying to
>>> troll through git history will be effective.
>>>
>>> one nit, i think "api signature" doesn't cover all cases of what i would
>>> regard as change. i would prefer to define it as "no change where api/abi
>>> compatibility or semantic change occurred"? which is a lot more strict
>>> but in practice is necessary to support binaries when abi/api is stable.
>>>
>>> i.e. if a recompile is necessary with or without code change then it's a
>>> change.
>>
>> Having thought a bit ... this becomes a bit problematic.
>>
>> Many data-structures in DPDK are nested, 
>> these can have a ripple effect when changed - a change to mbuf is a good example.
>>
>> What I saying is ...
>> I don't think changes in ABI due to in-direct reasons should count.
>> If there is a change due to a deliberate change in the ABI signature 
>> that is fine, reset the clock.
>>
>>
>> If there is a change due to some nested data-structure, 
>> 3-levels down changing in my book that doesn't count. 
> 
> it has to count otherwise dpdk's abi stability promise for major version
> releases is meaningless. or are you suggesting it doesn't count for the
> purpose of determining whether or not an experimental api/abi has
> changed?
"it doesn't count for the purpose of determining whether or not an experimental api/abi has changed?".

Exactly - that is what I meant - apologies if I was unclear. 
In this case the change is not a deliberate act, 
in that it is not really happening because of any maturing of the ABI.

> 
>> As that may or may not have been deliberate, and is almost impossible to police. 
>>
>> Checking anything but a deliberate change to the ABI signature,
>> would be practically impossible IMHO. 
> 
> well, it isn't impossible but it does take knowledge, mechanism and
> process maintain the abi for a major version.

100% agree with this statement.

What do you think of the v3?


 

^ permalink raw reply	[relevance 4%]

* [dpdk-dev] ABI/API stability towards drivers
@ 2021-07-02  8:00  8% Morten Brørup
  2021-07-02  9:45  7% ` [dpdk-dev] [dpdk-techboard] " Ferruh Yigit
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Morten Brørup @ 2021-07-02  8:00 UTC (permalink / raw)
  To: dpdk-techboard; +Cc: dpdk-dev

Regarding the ongoing ABI stability project, it is suggested to export driver interfaces as internal.

What are we targeting regarding ABI and API stability towards drivers?

-Morten


^ permalink raw reply	[relevance 8%]

* Re: [dpdk-dev] [dpdk-techboard] ABI/API stability towards drivers
  2021-07-02  8:00  8% [dpdk-dev] ABI/API stability towards drivers Morten Brørup
@ 2021-07-02  9:45  7% ` Ferruh Yigit
  2021-07-02 12:26  4% ` Thomas Monjalon
  2021-07-07 18:46  8% ` [dpdk-dev] " Tyler Retzlaff
  2 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-07-02  9:45 UTC (permalink / raw)
  To: Morten Brørup, dpdk-techboard; +Cc: dpdk-dev

On 7/2/2021 10:00 AM, Morten Brørup wrote:
> Regarding the ongoing ABI stability project, it is suggested to export driver interfaces as internal.
> 
> What are we targeting regarding ABI and API stability towards drivers?
> 

Hi Morten,

It is about some device abstraction libraries, like cryptodev, exposing the
internal driver-to-library interface to the application, and any change to that
interface causing an unnecessary ABI break.

So the target is not drivers, but hiding from the application everything that
only needs to exist between the library and the driver.

^ permalink raw reply	[relevance 7%]

* Re: [dpdk-dev] [dpdk-techboard] ABI/API stability towards drivers
  2021-07-02  8:00  8% [dpdk-dev] ABI/API stability towards drivers Morten Brørup
  2021-07-02  9:45  7% ` [dpdk-dev] [dpdk-techboard] " Ferruh Yigit
@ 2021-07-02 12:26  4% ` Thomas Monjalon
  2021-07-07 18:46  8% ` [dpdk-dev] " Tyler Retzlaff
  2 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-02 12:26 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dpdk-techboard, dpdk-dev

02/07/2021 10:00, Morten Brørup:
> Regarding the ongoing ABI stability project, it is suggested to export driver interfaces as internal.
> 
> What are we targeting regarding ABI and API stability towards drivers?

No stability for driver interface.
It is recommended to make drivers internal.
If a driver is kept external to DPDK, there is a maintenance cost.



^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH v3 19/20] net/sfc: support flow action COUNT in transfer rules
  @ 2021-07-02 13:37  3%               ` David Marchand
  2021-07-02 13:39  0%                 ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2021-07-02 13:37 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Bruce Richardson, Thomas Monjalon, dev, Igor Romanov,
	Andy Moreton, Ivan Malov

On Fri, Jul 2, 2021 at 10:43 AM Andrew Rybchenko
<andrew.rybchenko@oktetlabs.ru> wrote:
> I've send v4 with the problem fixed. However, I'm afraid
> build test systems should be updated to have libatomic
> correctly installed. Otherwise, they do not really check
> net/sfc build.

CI systems must be updated if they check ABI.
And in general, we want them to continue testing net/sfc.
I sent a mail to ask for this.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 19/20] net/sfc: support flow action COUNT in transfer rules
  2021-07-02 13:37  3%               ` David Marchand
@ 2021-07-02 13:39  0%                 ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-02 13:39 UTC (permalink / raw)
  To: David Marchand
  Cc: Bruce Richardson, Thomas Monjalon, dev, Igor Romanov,
	Andy Moreton, Ivan Malov

On 7/2/21 4:37 PM, David Marchand wrote:
> On Fri, Jul 2, 2021 at 10:43 AM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
>> I've send v4 with the problem fixed. However, I'm afraid
>> build test systems should be updated to have libatomic
>> correctly installed. Otherwise, they do not really check
>> net/sfc build.
> 
> CI systems must be updated if they check ABI.
> And in general, we want them to continue testing net/sfc.
> I sent a mail to ask for this.

Many thanks, David


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
  @ 2021-07-04  9:30  3% ` Jerin Jacob
  2021-07-05 10:52  0%   ` Bruce Richardson
      2 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2021-07-04  9:30 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: Thomas Monjalon, Ferruh Yigit, Richardson, Bruce, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma,
	Radha Mohan Chintakuntla

On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
>
> This patch introduces 'dmadevice' which is a generic type of DMA
> device.
>
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

Thanks for v1.

I would suggest finalizing lib/dmadev/rte_dmadev.h before doing the
implementation so that you don't need to waste time reworking the
implementation.

Comments inline.

> ---
>  MAINTAINERS                  |   4 +
>  config/rte_config.h          |   3 +
>  lib/dmadev/meson.build       |   6 +
>  lib/dmadev/rte_dmadev.c      | 438 +++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 919 +++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h |  98 +++++
>  lib/dmadev/rte_dmadev_pmd.h  | 210 ++++++++++
>  lib/dmadev/version.map       |  32 ++

Missed updating doxygen. See doc/api/doxy-api.conf.in
Use meson -Denable_docs=true to verify the generated doxygen doc.

>  lib/meson.build              |   1 +
>  9 files changed, 1711 insertions(+)
>  create mode 100644 lib/dmadev/meson.build
>  create mode 100644 lib/dmadev/rte_dmadev.c
>  create mode 100644 lib/dmadev/rte_dmadev.h
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
>  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
>  create mode 100644 lib/dmadev/version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..2019783 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>

Add EXPERIMENTAL

> +Dma device API
> +M: Chengwen Feng <fengchengwen@huawei.com>
> +F: lib/dmadev/
> +
>

> new file mode 100644
> index 0000000..a94e839
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,438 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#include <ctype.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <stdint.h>
> +
> +#include <rte_log.h>
> +#include <rte_debug.h>
> +#include <rte_dev.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_errno.h>
> +#include <rte_string_fns.h>

Sort in alphabetical order.

> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];

# Please check whether you have missed any multiprocess angle.
lib/regexdev/rte_regexdev.c is the latest device class implemented in dpdk, so
please check the *rte_regexdev_shared_data scheme.


# Missing dynamic log for this library.


> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 0000000..f74fc6a
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,919 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.

It would be nice to add other companies' names who have contributed to
the specification.

> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The generic DMA device diagram:
> + *
> + *            ------------     ------------
> + *            | HW-queue |     | HW-queue |
> + *            ------------     ------------
> + *                   \            /
> + *                    \          /
> + *                     \        /
> + *                  ----------------
> + *                  |dma-controller|
> + *                  ----------------
> + *
> + *   The DMA could have multiple HW-queues, each HW-queue could have multiple
> + *   capabilities, e.g. whether to support fill operation, supported DMA
> + *   transfter direction and etc.

typo

> + *
> + * The DMA framework is built on the following abstraction model:
> + *
> + *     ------------    ------------
> + *     |virt-queue|    |virt-queue|
> + *     ------------    ------------
> + *            \           /
> + *             \         /
> + *              \       /
> + *            ------------     ------------
> + *            | HW-queue |     | HW-queue |
> + *            ------------     ------------
> + *                   \            /
> + *                    \          /
> + *                     \        /
> + *                     ----------
> + *                     | dmadev |
> + *                     ----------

Continuing the discussion with @Morten Brørup, I think we need to
finalize the model.

> + *   a) The DMA operation request must be submitted to the virt queue, virt
> + *      queues must be created based on HW queues, the DMA device could have
> + *      multiple HW queues.
> + *   b) The virt queues on the same HW-queue could represent different contexts,
> + *      e.g. user could create virt-queue-0 on HW-queue-0 for mem-to-mem
> + *      transfer scenario, and create virt-queue-1 on the same HW-queue for
> + *      mem-to-dev transfer scenario.
> + *   NOTE: user could also create multiple virt queues for mem-to-mem transfer
> + *         scenario as long as the corresponding driver supports.
> + *
> + * The control plane APIs include configure/queue_setup/queue_release/start/
> + * stop/reset/close, in order to start device work, the call sequence must be
> + * as follows:
> + *     - rte_dmadev_configure()
> + *     - rte_dmadev_queue_setup()
> + *     - rte_dmadev_start()

Please add reconfigure behaviour etc. Please check the
lib/regexdev/rte_regexdev.h introduction. I have added similar ones so you
could reuse as much as possible.


> + * The dataplane APIs include two parts:
> + *   a) The first part is the submission of operation requests:
> + *        - rte_dmadev_copy()
> + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> + *        - rte_dmadev_fill()
> + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> + *        - rte_dmadev_fence()   - add a fence force ordering between operations
> + *        - rte_dmadev_perform() - issue doorbell to hardware
> + *      These APIs could work with different virt queues which have different
> + *      contexts.
> + *      The first four APIs are used to submit the operation request to the virt
> + *      queue, if the submission is successful, a cookie (as type
> + *      'dma_cookie_t') is returned, otherwise a negative number is returned.
> + *   b) The second part is to obtain the result of requests:
> + *        - rte_dmadev_completed()
> + *            - return the number of operation requests completed successfully.
> + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.
> + *
> + * The misc APIs include info_get/queue_info_get/stats/xstats/selftest, provide
> + * information query and self-test capabilities.
> + *
> + * About the dataplane APIs MT-safe, there are two dimensions:
> + *   a) For one virt queue, the submit/completion API could be MT-safe,
> + *      e.g. one thread do submit operation, another thread do completion
> + *      operation.
> + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VQ.
> + *      If driver don't support it, it's up to the application to guarantee
> + *      MT-safe.
> + *   b) For multiple virt queues on the same HW queue, e.g. one thread do
> + *      operation on virt-queue-0, another thread do operation on virt-queue-1.
> + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MVQ.
> + *      If driver don't support it, it's up to the application to guarantee
> + *      MT-safe.

From an application PoV this may make it hard to write portable
applications. Please check the latest thread with @Morten Brørup.

> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <rte_common.h>
> +#include <rte_memory.h>
> +#include <rte_errno.h>
> +#include <rte_compat.h>

Sort in alphabetical order.

> +
> +/**
> + * dma_cookie_t - an opaque DMA cookie

Since we are defining the behaviour, it is not opaque any more.
I think it is better to call it ring_idx or so.


> +#define RTE_DMA_DEV_CAPA_MT_MVQ (1ull << 11) /**< Support MT-safe of multiple virt queues */

Please add lots of @see for all symbols where they are used, so that one
can understand the full scope of the
symbols. See the example below.

#define RTE_REGEXDEV_CAPA_RUNTIME_COMPILATION_F (1ULL << 0)
/**< RegEx device does support compiling the rules at runtime unlike
 * loading only the pre-built rule database using
 * struct rte_regexdev_config::rule_db in rte_regexdev_configure()
 *
 * @see struct rte_regexdev_config::rule_db, rte_regexdev_configure()
 * @see struct rte_regexdev_info::regexdev_capa
 */

> + *
> + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's a error
> + * code.
> + * When using cookies, comply with the following rules:
> + * a) Cookies for each virtual queue are independent.
> + * b) For a virt queue, the cookie are monotonically incremented, when it reach
> + *    the INT_MAX, it wraps back to zero.
> + * c) The initial cookie of a virt queue is zero, after the device is stopped or
> + *    reset, the virt queue's cookie needs to be reset to zero.
> + * Example:
> + *    step-1: start one dmadev
> + *    step-2: enqueue a copy operation, the cookie return is 0
> + *    step-3: enqueue a copy operation again, the cookie return is 1
> + *    ...
> + *    step-101: stop the dmadev
> + *    step-102: start the dmadev
> + *    step-103: enqueue a copy operation, the cookie return is 0
> + *    ...
> + */

Good explanation.
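To illustrate the wrap-around rule above, a minimal sketch (the helper name
next_cookie is hypothetical, not part of the patch):

```c
#include <limits.h>
#include <stdint.h>

typedef int32_t dma_cookie_t;

/* Rule (b) above: the per-virt-queue cookie increases monotonically
 * and wraps back to zero after reaching INT_MAX. */
static dma_cookie_t
next_cookie(dma_cookie_t cur)
{
	return (cur == INT_MAX) ? 0 : cur + 1;
}
```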

> +typedef int32_t dma_cookie_t;


> +
> +/**
> + * dma_scatterlist - can hold scatter DMA operation request
> + */
> +struct dma_scatterlist {

I prefer to change scatterlist -> sg
i.e rte_dma_sg

> +       void *src;
> +       void *dst;
> +       uint32_t length;
> +};
> +

> +
> +/**
> + * A structure used to retrieve the contextual information of
> + * an DMA device
> + */
> +struct rte_dmadev_info {
> +       /**
> +        * Fields filled by framewok

typo.

> +        */
> +       struct rte_device *device; /**< Generic Device information */
> +       const char *driver_name; /**< Device driver name */
> +       int socket_id; /**< Socket ID where memory is allocated */
> +
> +       /**
> +        * Specification fields filled by driver
> +        */
> +       uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> +       uint16_t max_hw_queues; /**< Maximum number of HW queues. */
> +       uint16_t max_vqs_per_hw_queue;
> +       /**< Maximum number of virt queues to allocate per HW queue */
> +       uint16_t max_desc;
> +       /**< Maximum allowed number of virt queue descriptors */
> +       uint16_t min_desc;
> +       /**< Minimum allowed number of virt queue descriptors */

Please add max_nb_segs. i.e maximum number of segments supported.

> +
> +       /**
> +        * Status fields filled by driver
> +        */
> +       uint16_t nb_hw_queues; /**< Number of HW queues configured */
> +       uint16_t nb_vqs; /**< Number of virt queues configured */
> +};
> +
> +/**
> + * dma_address_type
> + */
> +enum dma_address_type {
> +       DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
> +       DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
> +};
> +
> +/**
> + * A structure used to configure a DMA device.
> + */
> +struct rte_dmadev_conf {
> +       enum dma_address_type addr_type; /**< Address type to used */

I think, there are 3 kinds of limitations/capabilities.

When the system is configured as IOVA as VA
1) Device supports any VA address like memory from rte_malloc(),
rte_memzone(), malloc, stack memory
2) Device support only VA address from rte_malloc(), rte_memzone() i.e
memory backed by hugepage and added to DMA map.

When the system is configured as IOVA as PA
1) Devices support only PA addresses .

IMO, the above needs to be advertised as capabilities, and the
application needs to align with them; I don't think the application
should request the driver to work in any particular mode.
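As a sketch of the capability-driven alternative (all flag and function
names below are hypothetical, not from the posted patch):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits for the three addressing cases
 * described above. */
#define DMA_DEV_CAPA_IOVA_AS_VA_ANY   (1ull << 0) /* any VA, incl. stack/malloc */
#define DMA_DEV_CAPA_IOVA_AS_VA_HUGE  (1ull << 1) /* hugepage-backed, DMA-mapped VA */
#define DMA_DEV_CAPA_IOVA_AS_PA       (1ull << 2) /* PA addresses only */

/* Application-side check: can this device copy from arbitrary VA memory? */
static bool
dev_supports_any_va(uint64_t dev_capa)
{
	return (dev_capa & DMA_DEV_CAPA_IOVA_AS_VA_ANY) != 0;
}
```

The application would read dev_capa from info_get() and choose its buffers
accordingly, rather than asking the driver for a mode.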



> +       uint16_t nb_hw_queues; /**< Number of HW-queues enable to use */
> +       uint16_t max_vqs; /**< Maximum number of virt queues to use */

You need to document the maximum value allowed, i.e. that it is based
on info_get(), and mention the corresponding field in the info
structure.


> +
> +/**
> + * dma_transfer_direction
> + */
> +enum dma_transfer_direction {

rte_dma_transfer_direction

> +       DMA_MEM_TO_MEM,
> +       DMA_MEM_TO_DEV,
> +       DMA_DEV_TO_MEM,
> +       DMA_DEV_TO_DEV,
> +};
> +
> +/**
> + * A structure used to configure a DMA virt queue.
> + */
> +struct rte_dmadev_queue_conf {
> +       enum dma_transfer_direction direction;


> +       /**< Associated transfer direction */
> +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> +       uint64_t dev_flags; /**< Device specific flags */

What is the use of this? It needs more comments.
Since it is in the slow path, we can have non-opaque names here based
on each driver's capabilities.


> +       void *dev_ctx; /**< Device specific context */

What is the use of this? It needs more comments.


Please add some good amount of reserved bits and have API to init this
structure for future ABI stability, say rte_dmadev_queue_config_init()
or so.


> +
> +/**
> + * A structure used to retrieve information of a DMA virt queue.
> + */
> +struct rte_dmadev_queue_info {
> +       enum dma_transfer_direction direction;

A queue may support all directions so I think it should be a bitfield.

> +       /**< Associated transfer direction */
> +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> +       uint64_t dev_flags; /**< Device specific flags */
> +};
> +

> +__rte_experimental
> +static inline dma_cookie_t
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
> +                  const struct dma_scatterlist *sg,
> +                  uint32_t sg_len, uint64_t flags)

I would like to change this to:
rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
rte_dma_sg *src, uint32_t nb_src,
const struct rte_dma_sg *dst, uint32_t nb_dst) or so, to allow use
cases like a 30 MB src copy being split and written as 30 x 1 MB dst
segments.
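A minimal sketch of the split-destination use case (struct rte_dma_sg
follows the suggestion above; split_dst is a hypothetical helper):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter entry per the suggested API shape. */
struct rte_dma_sg {
	void *addr;
	uint32_t length;
};

/* Split one large contiguous destination region into nb_dst equally
 * sized segments, filling the caller-provided dst array. Returns the
 * number of segments written. Illustrative only. */
static uint32_t
split_dst(uintptr_t dst_base, uint32_t total_len, uint32_t nb_dst,
	  struct rte_dma_sg *dst)
{
	uint32_t seg_len = total_len / nb_dst;
	uint32_t i;

	for (i = 0; i < nb_dst; i++) {
		dst[i].addr = (void *)(dst_base + (uintptr_t)i * seg_len);
		dst[i].length = seg_len;
	}
	return nb_dst;
}
```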



> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       return (*dev->copy_sg)(dev, vq_id, sg, sg_len, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a fill operation onto the DMA virt queue
> + *
> + * This queues up a fill operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vq_id
> + *   The identifier of virt queue.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the destination buffer.
> + * @param flags
> + *   An opaque flags for this operation.

PLEASE REMOVE opaque stuff from the fast path; it will be a pain for
application writers, as they would need to write multiple combinations
of fast-path code. Flags are OK if we have valid generic flags now to
control the transfer behavior.


> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Add a fence to force ordering between operations
> + *
> + * This adds a fence to a sequence of operations to enforce ordering, such that
> + * all operations enqueued before the fence must be completed before operations
> + * after the fence.
> + * NOTE: Since this fence may be added as a flag to the last operation enqueued,
> + * this API may not function correctly when called immediately after an
> + * "rte_dmadev_perform" call i.e. before any new operations are enqueued.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vq_id
> + *   The identifier of virt queue.
> + *
> + * @return
> + *   - =0: Successful add fence.
> + *   - <0: Failure to add fence.
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + *       corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       return (*dev->fence)(dev, vq_id);
> +}

Since HW submission is in a queue (FIFO), the ordering is always
maintained. Right?
Could you share more details and use case of fence() from
driver/application PoV?


> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger hardware to begin performing enqueued operations
> + *
> + * This API is used to write the "doorbell" to the hardware to trigger it
> + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vq_id
> + *   The identifier of virt queue.
> + *
> + * @return
> + *   - =0: Successful trigger hardware.
> + *   - <0: Failure to trigger hardware.
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + *       corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_perform(uint16_t dev_id, uint16_t vq_id)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       return (*dev->perform)(dev, vq_id);
> +}

Since this scheme adds function call overhead in all applications, I
would like to understand, from a driver/application PoV, the benefit of
doing it this way versus having the enqueue ring the doorbell
implicitly.
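For reference, the usual argument for the explicit doorbell is amortizing
MMIO writes over a batch; a toy model (function names are illustrative):

```c
#include <stdint.h>

/* Explicit perform(): the doorbell is written once per full or
 * partial batch of enqueued operations. */
static uint32_t
doorbells_explicit(uint32_t nb_ops, uint32_t batch)
{
	return (nb_ops + batch - 1) / batch;
}

/* Implicit doorbell: every enqueue writes the doorbell register. */
static uint32_t
doorbells_implicit(uint32_t nb_ops)
{
	return nb_ops;
}
```

With a batch of 8, 32 enqueues cost 4 doorbell writes instead of 32;
whether that outweighs the extra per-call overhead is the open question.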


> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been successful completed.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vq_id
> + *   The identifier of virt queue.
> + * @param nb_cpls
> + *   The maximum number of completed operations that can be processed.
> + * @param[out] cookie
> + *   The last completed operation's cookie.
> + * @param[out] has_error
> + *   Indicates if there are transfer error.
> + *
> + * @return
> + *   The number of operations that successful completed.

successfully

> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + *       corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id, const uint16_t nb_cpls,
> +                    dma_cookie_t *cookie, bool *has_error)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       has_error = false;
> +       return (*dev->completed)(dev, vq_id, nb_cpls, cookie, has_error);

It may be better to have cookie/ring_idx as third argument.

> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that failed to complete.
> + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vq_id
> + *   The identifier of virt queue.
> + * @param nb_status
> + *   Indicates the size  of status array.
> + * @param[out] status
> + *   The error code of operations that failed to complete.
> + * @param[out] cookie
> + *   The last failed completed operation's cookie.
> + *
> + * @return
> + *   The number of operations that failed to complete.
> + *
> + * NOTE: The caller must ensure that the input parameter is valid and the
> + *       corresponding device supports the operation.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> +                          const uint16_t nb_status, uint32_t *status,
> +                          dma_cookie_t *cookie)

IMO, it is better to move cookie/ring_idx to the third argument.
Why would this return an array of errors, since it is called after
rte_dmadev_completed() has set has_error? Is it better to change it to

rte_dmadev_error_status(uint16_t dev_id, uint16_t vq_id, dma_cookie_t
*cookie, uint32_t *status)

I also think we may need to define status as a bitmask, enumerate all
the combinations of error codes across drivers, and return a string
from the driver like the existing rte_flow_error.

See
struct rte_flow_error {
        enum rte_flow_error_type type; /**< Cause field and error types. */
        const void *cause; /**< Object responsible for the error. */
        const char *message; /**< Human-readable error message. */
};
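A sketch of the bitmask-status idea suggested above (bit names and the
lookup helper are hypothetical, not from the patch):

```c
#include <stdint.h>

/* Hypothetical bitmask-style error status bits. */
#define DMA_STATUS_BUS_ERROR   (1u << 0)
#define DMA_STATUS_DATA_POISON (1u << 1)

/* Map a status bitmask to a human-readable message, in the spirit of
 * rte_flow_error's message field. Illustrative only. */
static const char *
dma_status_str(uint32_t status)
{
	if (status & DMA_STATUS_BUS_ERROR)
		return "bus error";
	if (status & DMA_STATUS_DATA_POISON)
		return "data poison";
	return "ok";
}
```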

> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       return (*dev->completed_fails)(dev, vq_id, nb_status, status, cookie);
> +}
> +
> +struct rte_dmadev_stats {
> +       uint64_t enqueue_fail_count;
> +       /**< Conut of all operations which failed enqueued */
> +       uint64_t enqueued_count;
> +       /**< Count of all operations which successful enqueued */
> +       uint64_t completed_fail_count;
> +       /**< Count of all operations which failed to complete */
> +       uint64_t completed_count;
> +       /**< Count of all operations which successful complete */
> +};

We need a capability API to tell which items are updated/supported by
the driver.


> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> new file mode 100644
> index 0000000..a3afea2
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_core.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2021 HiSilicon Limited.
> + */
> +
> +#ifndef _RTE_DMADEV_CORE_H_
> +#define _RTE_DMADEV_CORE_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device internal header.
> + *
> + * This header contains internal data types. But they are still part of the
> + * public API because they are used by inline public functions.
> + */
> +
> +struct rte_dmadev;
> +
> +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vq_id,
> +                                     void *src, void *dst,
> +                                     uint32_t length, uint64_t flags);
> +/**< @internal Function used to enqueue a copy operation. */

To avoid namespace conflicts (as it is a public API), use the rte_ prefix.


> +
> +/**
> + * The data structure associated with each DMA device.
> + */
> +struct rte_dmadev {
> +       /**< Enqueue a copy operation onto the DMA device. */
> +       dmadev_copy_t copy;
> +       /**< Enqueue a scatter list copy operation onto the DMA device. */
> +       dmadev_copy_sg_t copy_sg;
> +       /**< Enqueue a fill operation onto the DMA device. */
> +       dmadev_fill_t fill;
> +       /**< Enqueue a scatter list fill operation onto the DMA device. */
> +       dmadev_fill_sg_t fill_sg;
> +       /**< Add a fence to force ordering between operations. */
> +       dmadev_fence_t fence;
> +       /**< Trigger hardware to begin performing enqueued operations. */
> +       dmadev_perform_t perform;
> +       /**< Returns the number of operations that successful completed. */
> +       dmadev_completed_t completed;
> +       /**< Returns the number of operations that failed to complete. */
> +       dmadev_completed_fails_t completed_fails;

We need to limit the fast-path items to one cache line (CL).
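The one-cache-line constraint can be enforced at build time; a sketch
(the struct layout here is illustrative, not the patch's actual layout):

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64

/* Hypothetical fast-path section of struct rte_dmadev: on 64-bit
 * targets, eight function pointers fill exactly one 64-byte line. */
struct dmadev_fastpath {
	void (*fp[8])(void);
};

/* Build-time check that the fast-path pointers fit in one cache line. */
_Static_assert(sizeof(struct dmadev_fastpath) <= CACHE_LINE_SIZE,
	       "fast-path pointers must fit in one cache line");
```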

> +
> +       void *dev_private; /**< PMD-specific private data */
> +       const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD */
> +
> +       uint16_t dev_id; /**< Device ID for this instance */
> +       int socket_id; /**< Socket ID where memory is allocated */
> +       struct rte_device *device;
> +       /**< Device info. supplied during device initialization */
> +       const char *driver_name; /**< Driver info. supplied by probing */
> +       char name[RTE_DMADEV_NAME_MAX_LEN]; /**< Device name */
> +
> +       RTE_STD_C11
> +       uint8_t attached : 1; /**< Flag indicating the device is attached */
> +       uint8_t started : 1; /**< Device state: STARTED(1)/STOPPED(0) */

Add a couple of reserved fields for future ABI stability.

> +
> +} __rte_cache_aligned;
> +
> +extern struct rte_dmadev rte_dmadevices[];
> +


* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
  @ 2021-07-04 16:13  3%   ` Andrew Rybchenko
  2021-07-05  5:47  0%     ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-04 16:13 UTC (permalink / raw)
  To: Xueming Li; +Cc: dev, Wang Haiyue, Thomas Monjalon, Kinsella Ray, Neil Horman

On 6/25/21 2:47 PM, Xueming Li wrote:
> Auxiliary bus [1] provides a way to split function into child-devices
> representing sub-domains of functionality. Each auxiliary device
> represents a part of its parent functionality.
> 
> Auxiliary device is identified by unique device name, sysfs path:
>   /sys/bus/auxiliary/devices/<name>
> 
> Devargs legacy syntax ofauxiliary device:

Missing space after 'of'

>   -a auxiliary:<name>[,args...]
> Devargs generic syntax of auxiliary device:
>   -a bus=auxiliary,name=<name>,,/class=<classs>,,/driver=<driver>,,

Are the two commas above intentional? What for?

> 
> [1] kernel auxiliary bus document:
> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>

With my below notes fixed:

Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

> Cc: Wang Haiyue <haiyue.wang@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Kinsella Ray <mdr@ashroe.eu>
> ---
>  MAINTAINERS                               |   5 +
>  doc/guides/rel_notes/release_21_08.rst    |   6 +
>  drivers/bus/auxiliary/auxiliary_common.c  | 411 ++++++++++++++++++++++
>  drivers/bus/auxiliary/auxiliary_params.c  |  59 ++++
>  drivers/bus/auxiliary/linux/auxiliary.c   | 141 ++++++++
>  drivers/bus/auxiliary/meson.build         |  16 +
>  drivers/bus/auxiliary/private.h           |  74 ++++
>  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++++++++++
>  drivers/bus/auxiliary/version.map         |   7 +
>  drivers/bus/meson.build                   |   1 +
>  10 files changed, 921 insertions(+)
>  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
>  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
>  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
>  create mode 100644 drivers/bus/auxiliary/meson.build
>  create mode 100644 drivers/bus/auxiliary/private.h
>  create mode 100644 drivers/bus/auxiliary/rte_bus_auxiliary.h
>  create mode 100644 drivers/bus/auxiliary/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5877a16971..eaf691ca6a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst
>  Bus Drivers
>  -----------
>  
> +Auxiliary bus driver

Shouldn't it be EXPERIMENTAL?

> +M: Parav Pandit <parav@nvidia.com>
> +M: Xueming Li <xuemingl@nvidia.com>
> +F: drivers/bus/auxiliary/
> +
>  Intel FPGA bus
>  M: Rosen Xu <rosen.xu@intel.com>
>  F: drivers/bus/ifpga/
> diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
> index a6ecfdf3ce..e7ef4c8a05 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -55,6 +55,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>  
> +* **Added auxiliary bus support.**
> +
> +  Auxiliary bus provides a way to split function into child-devices
> +  representing sub-domains of functionality. Each auxiliary device
> +  represents a part of its parent functionality.
> +
>  
>  Removed Items
>  -------------
> diff --git a/drivers/bus/auxiliary/auxiliary_common.c b/drivers/bus/auxiliary/auxiliary_common.c
> new file mode 100644
> index 0000000000..8a75306da5
> --- /dev/null
> +++ b/drivers/bus/auxiliary/auxiliary_common.c
> @@ -0,0 +1,411 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#include <string.h>
> +#include <inttypes.h>
> +#include <stdint.h>
> +#include <stdbool.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <sys/queue.h>
> +#include <rte_errno.h>
> +#include <rte_interrupts.h>
> +#include <rte_log.h>
> +#include <rte_bus.h>
> +#include <rte_per_lcore.h>
> +#include <rte_memory.h>
> +#include <rte_eal.h>
> +#include <rte_eal_paging.h>
> +#include <rte_string_fns.h>
> +#include <rte_common.h>
> +#include <rte_devargs.h>
> +
> +#include "private.h"
> +#include "rte_bus_auxiliary.h"
> +
> +static struct rte_devargs *
> +auxiliary_devargs_lookup(const char *name)
> +{
> +	struct rte_devargs *devargs;
> +
> +	RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
> +		if (strcmp(devargs->name, name) == 0)
> +			return devargs;
> +	}
> +	return NULL;
> +}
> +
> +/*
> + * Test whether the auxiliary device exist

Missing full stop above.

> + *
> + * Stub for OS not supporting auxiliary bus.
> + */
> +__rte_weak bool
> +auxiliary_dev_exists(const char *name)
> +{
> +	RTE_SET_USED(name);
> +	return false;
> +}
> +
> +/*
> + * Scan the devices in the auxiliary bus.
> + *
> + * Stub for OS not supporting auxiliary bus.
> + */
> +__rte_weak int
> +auxiliary_scan(void)
> +{
> +	return 0;
> +}
> +
> +/*
> + * Update a device's devargs being scanned.
> + *
> + * @param aux_dev
> + *	AUXILIARY device.
> + */
> +void
> +auxiliary_on_scan(struct rte_auxiliary_device *aux_dev)
> +{
> +	aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
> +}
> +
> +/*
> + * Match the auxiliary driver and device using driver function.
> + */
> +bool
> +auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
> +		const struct rte_auxiliary_device *aux_dev)
> +{
> +	if (aux_drv->match == NULL)
> +		return false;
> +	return aux_drv->match(aux_dev->name);
> +}
> +
> +/*
> + * Call the probe() function of the driver.
> + */
> +static int
> +rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *drv,
> +			       struct rte_auxiliary_device *dev)
> +{
> +	enum rte_iova_mode iova_mode;
> +	int ret;
> +
> +	if ((drv == NULL) || (dev == NULL))

Unnecessary internal parenthesis.

> +		return -EINVAL;
> +
> +	/* Check if driver supports it. */
> +	if (!auxiliary_match(drv, dev))
> +		/* Match of device and driver failed */
> +		return 1;
> +
> +	/* No initialization when marked as blocked, return without error. */
> +	if (dev->device.devargs != NULL &&
> +	    dev->device.devargs->policy == RTE_DEV_BLOCKED) {
> +		AUXILIARY_LOG(INFO, "Device is blocked, not initializing");
> +		return -1;
> +	}
> +
> +	if (dev->device.numa_node < 0) {
> +		AUXILIARY_LOG(INFO, "Device is not NUMA-aware, defaulting socket to 0");

socket -> NUMA node

> +		dev->device.numa_node = 0;
> +	}
> +
> +	iova_mode = rte_eal_iova_mode();
> +	if ((drv->drv_flags & RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA) > 0 &&
> +	    iova_mode != RTE_IOVA_VA) {
> +		AUXILIARY_LOG(ERR, "Driver %s expecting VA IOVA mode but current mode is PA, not initializing",
> +			      drv->driver.name);
> +		return -EINVAL;
> +	}
> +
> +	dev->driver = drv;
> +
> +	AUXILIARY_LOG(INFO, "Probe auxiliary driver: %s device: %s (socket %i)",

socket -> NUMA node

> +		      drv->driver.name, dev->name, dev->device.numa_node);
> +	ret = drv->probe(drv, dev);
> +	if (ret != 0)
> +		dev->driver = NULL;
> +	else
> +		dev->device.driver = &drv->driver;
> +
> +	return ret;
> +}
> +
> +/*
> + * Call the remove() function of the driver.
> + */
> +static int
> +rte_auxiliary_driver_remove_dev(struct rte_auxiliary_device *dev)
> +{
> +	struct rte_auxiliary_driver *drv;
> +	int ret = 0;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	drv = dev->driver;
> +
> +	AUXILIARY_LOG(DEBUG, "Driver %s remove auxiliary device %s on NUMA socket %i",

socket -> node

> +		      drv->driver.name, dev->name, dev->device.numa_node);
> +
> +	if (drv->remove != NULL) {
> +		ret = drv->remove(dev);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	/* clear driver structure */
> +	dev->driver = NULL;
> +	dev->device.driver = NULL;
> +
> +	return 0;
> +}
> +
> +/*
> + * Call the probe() function of all registered driver for the given device.
> + * Return < 0 if initialization failed.
> + * Return 1 if no driver is found for this device.
> + */
> +static int
> +auxiliary_probe_all_drivers(struct rte_auxiliary_device *dev)
> +{
> +	struct rte_auxiliary_driver *drv;
> +	int rc;
> +
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> +		if (!drv->match(dev->name))
> +			continue;
> +
> +		rc = rte_auxiliary_probe_one_driver(drv, dev);
> +		if (rc < 0)
> +			/* negative value is an error */
> +			return rc;
> +		if (rc > 0)
> +			/* positive value means driver doesn't support it */
> +			continue;
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +/*
> + * Scan the content of the auxiliary bus, and call the probe function for
> + * all registered drivers to try to probe discovered devices.
> + */
> +static int
> +auxiliary_probe(void)
> +{
> +	struct rte_auxiliary_device *dev = NULL;
> +	size_t probed = 0, failed = 0;
> +	int ret = 0;
> +
> +	FOREACH_DEVICE_ON_AUXILIARY_BUS(dev) {
> +		probed++;
> +
> +		ret = auxiliary_probe_all_drivers(dev);
> +		if (ret < 0) {
> +			if (ret != -EEXIST) {
> +				AUXILIARY_LOG(ERR, "Requested device %s cannot be used",
> +					      dev->name);
> +				rte_errno = errno;
> +				failed++;
> +			}
> +			ret = 0;
> +		}
> +	}
> +
> +	return (probed && probed == failed) ? -1 : 0;
> +}
> +
> +static int
> +auxiliary_parse(const char *name, void *addr)
> +{
> +	struct rte_auxiliary_driver *drv = NULL;
> +	const char **out = addr;
> +
> +	/* Allow empty device name "auxiliary:" to bypass entire bus scan. */
> +	if (strlen(name) == 0)
> +		return 0;
> +
> +	FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> +		if (drv->match(name))
> +			break;
> +	}
> +	if (drv != NULL && addr != NULL)
> +		*out = name;
> +	return drv != NULL ? 0 : -1;
> +}
> +
> +/* Register a driver */
> +void
> +rte_auxiliary_register(struct rte_auxiliary_driver *driver)
> +{
> +	TAILQ_INSERT_TAIL(&auxiliary_bus.driver_list, driver, next);
> +	driver->bus = &auxiliary_bus;
> +}
> +
> +/* Unregister a driver */
> +void
> +rte_auxiliary_unregister(struct rte_auxiliary_driver *driver)
> +{
> +	TAILQ_REMOVE(&auxiliary_bus.driver_list, driver, next);
> +	driver->bus = NULL;
> +}
> +
> +/* Add a device to auxiliary bus */
> +void
> +auxiliary_add_device(struct rte_auxiliary_device *aux_dev)
> +{
> +	TAILQ_INSERT_TAIL(&auxiliary_bus.device_list, aux_dev, next);
> +}
> +
> +/* Insert a device into a predefined position in auxiliary bus */
> +void
> +auxiliary_insert_device(struct rte_auxiliary_device *exist_aux_dev,
> +			struct rte_auxiliary_device *new_aux_dev)
> +{
> +	TAILQ_INSERT_BEFORE(exist_aux_dev, new_aux_dev, next);
> +}
> +
> +/* Remove a device from auxiliary bus */
> +static void
> +rte_auxiliary_remove_device(struct rte_auxiliary_device *auxiliary_dev)
> +{
> +	TAILQ_REMOVE(&auxiliary_bus.device_list, auxiliary_dev, next);
> +}
> +
> +static struct rte_device *
> +auxiliary_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
> +		      const void *data)
> +{
> +	const struct rte_auxiliary_device *pstart;
> +	struct rte_auxiliary_device *adev;
> +
> +	if (start != NULL) {
> +		pstart = RTE_DEV_TO_AUXILIARY_CONST(start);
> +		adev = TAILQ_NEXT(pstart, next);
> +	} else {
> +		adev = TAILQ_FIRST(&auxiliary_bus.device_list);
> +	}
> +	while (adev != NULL) {
> +		if (cmp(&adev->device, data) == 0)
> +			return &adev->device;
> +		adev = TAILQ_NEXT(adev, next);
> +	}
> +	return NULL;
> +}
> +
> +static int
> +auxiliary_plug(struct rte_device *dev)
> +{
> +	if (!auxiliary_dev_exists(dev->name))
> +		return -ENOENT;
> +	return auxiliary_probe_all_drivers(RTE_DEV_TO_AUXILIARY(dev));
> +}
> +
> +static int
> +auxiliary_unplug(struct rte_device *dev)
> +{
> +	struct rte_auxiliary_device *adev;
> +	int ret;
> +
> +	adev = RTE_DEV_TO_AUXILIARY(dev);
> +	ret = rte_auxiliary_driver_remove_dev(adev);
> +	if (ret == 0) {
> +		rte_auxiliary_remove_device(adev);
> +		rte_devargs_remove(dev->devargs);
> +		free(adev);
> +	}
> +	return ret;
> +}
> +
> +static int
> +auxiliary_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
> +{
> +	struct rte_auxiliary_device *aux_dev = RTE_DEV_TO_AUXILIARY(dev);
> +
> +	if (dev == NULL || aux_dev->driver == NULL) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	if (aux_dev->driver->dma_map == NULL) {
> +		rte_errno = ENOTSUP;
> +		return -1;
> +	}
> +	return aux_dev->driver->dma_map(aux_dev, addr, iova, len);
> +}
> +
> +static int
> +auxiliary_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> +		    size_t len)
> +{
> +	struct rte_auxiliary_device *aux_dev = RTE_DEV_TO_AUXILIARY(dev);
> +
> +	if (dev == NULL || aux_dev->driver == NULL) {
> +		rte_errno = EINVAL;
> +		return -1;
> +	}
> +	if (aux_dev->driver->dma_unmap == NULL) {
> +		rte_errno = ENOTSUP;
> +		return -1;
> +	}
> +	return aux_dev->driver->dma_unmap(aux_dev, addr, iova, len);
> +}
> +
> +bool
> +auxiliary_is_ignored_device(const char *name)
> +{
> +	struct rte_devargs *devargs = auxiliary_devargs_lookup(name);
> +
> +	switch (auxiliary_bus.bus.conf.scan_mode) {
> +	case RTE_BUS_SCAN_ALLOWLIST:
> +		if (devargs && devargs->policy == RTE_DEV_ALLOWED)
> +			return false;
> +		break;
> +	case RTE_BUS_SCAN_UNDEFINED:
> +	case RTE_BUS_SCAN_BLOCKLIST:
> +		if (devargs == NULL || devargs->policy != RTE_DEV_BLOCKED)
> +			return false;
> +		break;
> +	}
> +	return true;
> +}
> +
> +static enum rte_iova_mode
> +auxiliary_get_iommu_class(void)
> +{
> +	const struct rte_auxiliary_driver *drv;
> +
> +	FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> +		if ((drv->drv_flags & RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA) > 0)
> +			return RTE_IOVA_VA;
> +	}
> +
> +	return RTE_IOVA_DC;
> +}
> +
> +struct rte_auxiliary_bus auxiliary_bus = {
> +	.bus = {
> +		.scan = auxiliary_scan,
> +		.probe = auxiliary_probe,
> +		.find_device = auxiliary_find_device,
> +		.plug = auxiliary_plug,
> +		.unplug = auxiliary_unplug,
> +		.parse = auxiliary_parse,
> +		.dma_map = auxiliary_dma_map,
> +		.dma_unmap = auxiliary_dma_unmap,
> +		.get_iommu_class = auxiliary_get_iommu_class,
> +		.dev_iterate = auxiliary_dev_iterate,
> +	},
> +	.device_list = TAILQ_HEAD_INITIALIZER(auxiliary_bus.device_list),
> +	.driver_list = TAILQ_HEAD_INITIALIZER(auxiliary_bus.driver_list),
> +};
> +
> +RTE_REGISTER_BUS(auxiliary, auxiliary_bus.bus);
> +RTE_LOG_REGISTER_DEFAULT(auxiliary_bus_logtype, NOTICE);
> diff --git a/drivers/bus/auxiliary/auxiliary_params.c b/drivers/bus/auxiliary/auxiliary_params.c
> new file mode 100644
> index 0000000000..cd3fa56cb4
> --- /dev/null
> +++ b/drivers/bus/auxiliary/auxiliary_params.c
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#include <string.h>
> +
> +#include <rte_bus.h>
> +#include <rte_dev.h>
> +#include <rte_errno.h>
> +#include <rte_kvargs.h>
> +
> +#include "private.h"
> +#include "rte_bus_auxiliary.h"
> +
> +enum auxiliary_params {
> +	RTE_AUXILIARY_PARAM_NAME,
> +};
> +
> +static const char * const auxiliary_params_keys[] = {
> +	[RTE_AUXILIARY_PARAM_NAME] = "name",
> +};
> +
> +static int
> +auxiliary_dev_match(const struct rte_device *dev,
> +	      const void *_kvlist)
> +{
> +	const struct rte_kvargs *kvlist = _kvlist;
> +	int ret;
> +
> +	ret = rte_kvargs_process(kvlist,
> +			auxiliary_params_keys[RTE_AUXILIARY_PARAM_NAME],
> +			rte_kvargs_strcmp, (void *)(uintptr_t)dev->name);
> +
> +	return ret != 0 ? -1 : 0;
> +}
> +
> +void *
> +auxiliary_dev_iterate(const void *start,
> +		    const char *str,
> +		    const struct rte_dev_iterator *it __rte_unused)
> +{
> +	rte_bus_find_device_t find_device;
> +	struct rte_kvargs *kvargs = NULL;
> +	struct rte_device *dev;
> +
> +	if (str != NULL) {
> +		kvargs = rte_kvargs_parse(str, auxiliary_params_keys);
> +		if (kvargs == NULL) {
> +			AUXILIARY_LOG(ERR, "cannot parse argument list %s",
> +				      str);
> +			rte_errno = EINVAL;
> +			return NULL;
> +		}
> +	}
> +	find_device = auxiliary_bus.bus.find_device;
> +	dev = find_device(start, auxiliary_dev_match, kvargs);
> +	rte_kvargs_free(kvargs);
> +	return dev;
> +}
> diff --git a/drivers/bus/auxiliary/linux/auxiliary.c b/drivers/bus/auxiliary/linux/auxiliary.c
> new file mode 100644
> index 0000000000..8464487971
> --- /dev/null
> +++ b/drivers/bus/auxiliary/linux/auxiliary.c
> @@ -0,0 +1,141 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#include <string.h>
> +#include <dirent.h>
> +
> +#include <rte_log.h>
> +#include <rte_bus.h>
> +#include <rte_malloc.h>
> +#include <rte_devargs.h>
> +#include <rte_memcpy.h>
> +#include <eal_filesystem.h>
> +
> +#include "../rte_bus_auxiliary.h"
> +#include "../private.h"
> +
> +#define AUXILIARY_SYSFS_PATH "/sys/bus/auxiliary/devices"
> +
> +/* Scan one auxiliary sysfs entry, and fill the devices list from it. */
> +static int
> +auxiliary_scan_one(const char *dirname, const char *name)
> +{
> +	struct rte_auxiliary_device *dev;
> +	struct rte_auxiliary_device *dev2;
> +	char filename[PATH_MAX];
> +	unsigned long tmp;
> +	int ret;
> +
> +	dev = malloc(sizeof(*dev));
> +	if (dev == NULL)
> +		return -1;
> +
> +	memset(dev, 0, sizeof(*dev));
> +	if (rte_strscpy(dev->name, name, sizeof(dev->name)) < 0) {
> +		free(dev);
> +		return -1;
> +	}
> +	dev->device.name = dev->name;
> +	dev->device.bus = &auxiliary_bus.bus;
> +
> +	/* Get NUMA node, default to 0 if not present */
> +	snprintf(filename, sizeof(filename), "%s/%s/numa_node",
> +		 dirname, name);
> +	if (access(filename, F_OK) != -1) {
> +		if (eal_parse_sysfs_value(filename, &tmp) == 0)
> +			dev->device.numa_node = tmp;
> +		else
> +			dev->device.numa_node = -1;
> +	} else {
> +		dev->device.numa_node = 0;
> +	}
> +
> +	auxiliary_on_scan(dev);
> +
> +	/* Device is valid, add in list (sorted) */
> +	TAILQ_FOREACH(dev2, &auxiliary_bus.device_list, next) {
> +		ret = strcmp(dev->name, dev2->name);
> +		if (ret > 0)
> +			continue;
> +		if (ret < 0) {
> +			auxiliary_insert_device(dev2, dev);
> +		} else { /* already registered */
> +			if (rte_dev_is_probed(&dev2->device) &&
> +			    dev2->device.devargs != dev->device.devargs) {
> +				/* To probe device with new devargs. */
> +				rte_devargs_remove(dev2->device.devargs);
> +				auxiliary_on_scan(dev2);
> +			}
> +			free(dev);
> +		}
> +		return 0;
> +	}
> +	auxiliary_add_device(dev);
> +	return 0;
> +}
> +
> +/*
> + * Test whether the auxiliary device exist

Missing full stop above.

> + */
> +bool
> +auxiliary_dev_exists(const char *name)
> +{
> +	DIR *dir;
> +	char dirname[PATH_MAX];
> +
> +	snprintf(dirname, sizeof(dirname), "%s/%s",
> +		 AUXILIARY_SYSFS_PATH, name);
> +	dir = opendir(dirname);
> +	if (dir == NULL)
> +		return false;
> +	closedir(dir);
> +	return true;
> +}
> +
> +/*
> + * Scan the devices in the auxiliary bus

Missing full stop above.

> + */
> +int
> +auxiliary_scan(void)
> +{
> +	struct dirent *e;
> +	DIR *dir;
> +	char dirname[PATH_MAX];
> +	struct rte_auxiliary_driver *drv;
> +
> +	dir = opendir(AUXILIARY_SYSFS_PATH);
> +	if (dir == NULL) {
> +		AUXILIARY_LOG(INFO, "%s not found, is auxiliary module loaded?",
> +			      AUXILIARY_SYSFS_PATH);
> +		return 0;
> +	}
> +
> +	while ((e = readdir(dir)) != NULL) {
> +		if (e->d_name[0] == '.')
> +			continue;
> +
> +		if (auxiliary_is_ignored_device(e->d_name))
> +			continue;
> +
> +		snprintf(dirname, sizeof(dirname), "%s/%s",
> +			 AUXILIARY_SYSFS_PATH, e->d_name);
> +
> +		/* Ignore if no driver can handle. */
> +		FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> +			if (drv->match(e->d_name))
> +				break;
> +		}
> +		if (drv == NULL)
> +			continue;
> +
> +		if (auxiliary_scan_one(dirname, e->d_name) < 0)
> +			goto error;
> +	}
> +	closedir(dir);
> +	return 0;
> +
> +error:
> +	closedir(dir);
> +	return -1;
> +}
> diff --git a/drivers/bus/auxiliary/meson.build b/drivers/bus/auxiliary/meson.build
> new file mode 100644
> index 0000000000..357550eff7
> --- /dev/null
> +++ b/drivers/bus/auxiliary/meson.build
> @@ -0,0 +1,16 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright (c) 2021 NVIDIA Corporation & Affiliates
> +
> +headers = files(
> +        'rte_bus_auxiliary.h',
> +)
> +sources = files(
> +        'auxiliary_common.c',
> +        'auxiliary_params.c',
> +)
> +if is_linux
> +    sources += files(
> +        'linux/auxiliary.c',
> +    )
> +endif
> +deps += ['kvargs']
> diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
> new file mode 100644
> index 0000000000..cb3e849993
> --- /dev/null
> +++ b/drivers/bus/auxiliary/private.h
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef AUXILIARY_PRIVATE_H

Maybe add a BUS_ prefix at least?

> +#define AUXILIARY_PRIVATE_H
> +
> +#include <stdbool.h>
> +#include <stdio.h>
> +
> +#include "rte_bus_auxiliary.h"
> +
> +extern struct rte_auxiliary_bus auxiliary_bus;
> +extern int auxiliary_bus_logtype;
> +
> +#define AUXILIARY_LOG(level, ...) \
> +	rte_log(RTE_LOG_ ## level, auxiliary_bus_logtype, \
> +		RTE_FMT("auxiliary bus: " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
> +			RTE_FMT_TAIL(__VA_ARGS__,)))
> +
> +/* Auxiliary bus iterators */
> +#define FOREACH_DEVICE_ON_AUXILIARY_BUS(p) \
> +		TAILQ_FOREACH(p, &(auxiliary_bus.device_list), next)
> +
> +#define FOREACH_DRIVER_ON_AUXILIARY_BUS(p) \
> +		TAILQ_FOREACH(p, &(auxiliary_bus.driver_list), next)
> +
> +bool auxiliary_dev_exists(const char *name);
> +
> +/*
> + * Scan the content of the auxiliary bus, and the devices in the devices
> + * list.
> + */
> +int auxiliary_scan(void);
> +
> +/*
> + * Update a device being scanned.
> + */
> +void auxiliary_on_scan(struct rte_auxiliary_device *aux_dev);
> +
> +/*
> + * Validate whether a device with given auxiliary device should be ignored
> + * or not.
> + */
> +bool auxiliary_is_ignored_device(const char *name);
> +
> +/*
> + * Add an auxiliary device to the auxiliary bus (append to auxiliary device
> + * list). This function also updates the bus references of the auxiliary
> + * device and the generic device object embedded within.
> + */
> +void auxiliary_add_device(struct rte_auxiliary_device *aux_dev);
> +
> +/*
> + * Insert an auxiliary device in the auxiliary bus at a particular location
> + * in the device list. It also updates the auxiliary bus reference of the
> + * new devices to be inserted.
> + */
> +void auxiliary_insert_device(struct rte_auxiliary_device *exist_aux_dev,
> +			     struct rte_auxiliary_device *new_aux_dev);
> +
> +/*
> + * Match the auxiliary driver and device by driver function

Missing full stop.

> + */
> +bool auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
> +		     const struct rte_auxiliary_device *aux_dev);
> +
> +/*
> + * Iterate over devices, matching any device against the provided string

Missing full stop.

> + */
> +void *auxiliary_dev_iterate(const void *start, const char *str,
> +			    const struct rte_dev_iterator *it);
> +
> +#endif /* AUXILIARY_PRIVATE_H */
> diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
> new file mode 100644
> index 0000000000..16b147e387
> --- /dev/null
> +++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
> @@ -0,0 +1,201 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> + */
> +
> +#ifndef RTE_BUS_AUXILIARY_H
> +#define RTE_BUS_AUXILIARY_H
> +
> +/**
> + * @file
> + *
> + * Auxiliary Bus Interface.
> + */
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <limits.h>
> +#include <errno.h>
> +#include <sys/queue.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +
> +#include <rte_debug.h>
> +#include <rte_interrupts.h>
> +#include <rte_dev.h>
> +#include <rte_bus.h>
> +#include <rte_kvargs.h>
> +
> +#define RTE_BUS_AUXILIARY_NAME "auxiliary"
> +
> +/* Forward declarations */
> +struct rte_auxiliary_driver;
> +struct rte_auxiliary_bus;
> +struct rte_auxiliary_device;
> +
> +/**
> + * Match function for the driver to decide if device can be handled.
> + *
> + * @param name
> + *   Pointer to the auxiliary device name.
> + * @return
> + *   Whether the driver can handle the auxiliary device.
> + */
> +typedef bool(rte_auxiliary_match_t)(const char *name);
> +
> +/**
> + * Initialization function for the driver called during auxiliary probing.
> + *
> + * @param drv
> + *   Pointer to the auxiliary driver.
> + * @param dev
> + *   Pointer to the auxiliary device.
> + * @return
> + *   - 0 On success.
> + *   - Negative value and rte_errno is set otherwise.
> + */
> +typedef int(rte_auxiliary_probe_t)(struct rte_auxiliary_driver *drv,
> +				    struct rte_auxiliary_device *dev);
> +
> +/**
> + * Uninitialization function for the driver called during hotplugging.
> + *
> + * @param dev
> + *   Pointer to the auxiliary device.
> + * @return
> + *   - 0 On success.
> + *   - Negative value and rte_errno is set otherwise.
> + */
> +typedef int (rte_auxiliary_remove_t)(struct rte_auxiliary_device *dev);
> +
> +/**
> + * Driver-specific DMA mapping. After a successful call the device
> + * will be able to read/write from/to this segment.
> + *
> + * @param dev
> + *   Pointer to the auxiliary device.
> + * @param addr
> + *   Starting virtual address of memory to be mapped.
> + * @param iova
> + *   Starting IOVA address of memory to be mapped.
> + * @param len
> + *   Length of memory segment being mapped.
> + * @return
> + *   - 0 On success.
> + *   - Negative value and rte_errno is set otherwise.
> + */
> +typedef int (rte_auxiliary_dma_map_t)(struct rte_auxiliary_device *dev,
> +				       void *addr, uint64_t iova, size_t len);
> +
> +/**
> + * Driver-specific DMA un-mapping. After a successful call the device
> + * will not be able to read/write from/to this segment.
> + *
> + * @param dev
> + *   Pointer to the auxiliary device.
> + * @param addr
> + *   Starting virtual address of memory to be unmapped.
> + * @param iova
> + *   Starting IOVA address of memory to be unmapped.
> + * @param len
> + *   Length of memory segment being unmapped.
> + * @return
> + *   - 0 On success.
> + *   - Negative value and rte_errno is set otherwise.
> + */
> +typedef int (rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
> +					 void *addr, uint64_t iova, size_t len);
> +
> +/**
> + * A structure describing an auxiliary device.
> + */
> +struct rte_auxiliary_device {
> +	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
> +	struct rte_device device;                 /**< Inherit core device */
> +	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
> +	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
> +	struct rte_auxiliary_driver *driver;      /**< Device driver */
> +};
> +
> +/** List of auxiliary devices */
> +TAILQ_HEAD(rte_auxiliary_device_list, rte_auxiliary_device);
> +/** List of auxiliary drivers */
> +TAILQ_HEAD(rte_auxiliary_driver_list, rte_auxiliary_driver);

Shouldn't we hide rte_auxiliary_device inside the library, taking
API/ABI stability into account? Or will it be DPDK internal anyway? If
so, it should be marked INTERNAL from the very
beginning.
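If it is indeed meant to stay DPDK-internal, the usual pattern (a sketch only; exact placement and symbol set are illustrative) would be `__rte_internal` tags on the prototypes plus an INTERNAL section in version.map:

```c
/* rte_bus_auxiliary.h -- hypothetical internal variant */
#include <rte_compat.h>

__rte_internal
void rte_auxiliary_register(struct rte_auxiliary_driver *driver);

/*
 * And in version.map:
 *
 * INTERNAL {
 *	global:
 *
 *	rte_auxiliary_register;
 *	rte_auxiliary_unregister;
 * };
 */
```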

> +
> +/**
> + * Structure describing the auxiliary bus
> + */
> +struct rte_auxiliary_bus {
> +	struct rte_bus bus;                  /**< Inherit the generic class */
> +	struct rte_auxiliary_device_list device_list;  /**< List of devices */
> +	struct rte_auxiliary_driver_list driver_list;  /**< List of drivers */
> +};

It looks internal. The following forward declaration should be
sufficient to build.

struct rte_auxiliary_bus;


> +
> +/**
> + * A structure describing an auxiliary driver.
> + */
> +struct rte_auxiliary_driver {
> +	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
> +	struct rte_driver driver;             /**< Inherit core driver. */
> +	struct rte_auxiliary_bus *bus;        /**< Auxiliary bus reference. */
> +	rte_auxiliary_match_t *match;         /**< Device match function. */
> +	rte_auxiliary_probe_t *probe;         /**< Device probe function. */
> +	rte_auxiliary_remove_t *remove;       /**< Device remove function. */
> +	rte_auxiliary_dma_map_t *dma_map;     /**< Device DMA map function. */
> +	rte_auxiliary_dma_unmap_t *dma_unmap; /**< Device DMA unmap function. */
> +	uint32_t drv_flags;                   /**< Flags RTE_AUXILIARY_DRV_*. */
> +};
> +
> +/**
> + * @internal
> + * Helper macro for drivers that need to convert to struct rte_auxiliary_device.
> + */
> +#define RTE_DEV_TO_AUXILIARY(ptr) \
> +	container_of(ptr, struct rte_auxiliary_device, device)
> +
> +#define RTE_DEV_TO_AUXILIARY_CONST(ptr) \
> +	container_of(ptr, const struct rte_auxiliary_device, device)
> +
> +#define RTE_ETH_DEV_TO_AUXILIARY(eth_dev) \
> +	RTE_DEV_TO_AUXILIARY((eth_dev)->device)
> +
> +/** Device driver needs IOVA as VA and cannot work with IOVA as PA */
> +#define RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA 0x002
> +
> +/**

Don't we need EXPERIMENTAL notice here?
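For reference, the notice added to experimental APIs elsewhere in DPDK looks roughly like this (sketch):

```c
/**
 * @warning
 * @b EXPERIMENTAL: this API may change without prior notice.
 *
 * Register an auxiliary driver.
 * ...
 */
```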

> + * Register an auxiliary driver.
> + *
> + * @param driver
> + *   A pointer to a rte_auxiliary_driver structure describing the driver
> + *   to be registered.
> + */
> +__rte_experimental
> +void rte_auxiliary_register(struct rte_auxiliary_driver *driver);
> +
> +/** Helper for auxiliary device registration from driver instance */
> +#define RTE_PMD_REGISTER_AUXILIARY(nm, auxiliary_drv) \
> +	RTE_INIT(auxiliaryinitfn_ ##nm) \
> +	{ \
> +		(auxiliary_drv).driver.name = RTE_STR(nm); \
> +		rte_auxiliary_register(&(auxiliary_drv)); \
> +	} \
> +	RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
> +
> +/**

Don't we need EXPERIMENTAL notice here?

> + * Unregister an auxiliary driver.
> + *
> + * @param driver
> + *   A pointer to a rte_auxiliary_driver structure describing the driver
> + *   to be unregistered.
> + */
> +__rte_experimental
> +void rte_auxiliary_unregister(struct rte_auxiliary_driver *driver);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* RTE_BUS_AUXILIARY_H */
> diff --git a/drivers/bus/auxiliary/version.map b/drivers/bus/auxiliary/version.map
> new file mode 100644
> index 0000000000..a52260657c
> --- /dev/null
> +++ b/drivers/bus/auxiliary/version.map
> @@ -0,0 +1,7 @@
> +EXPERIMENTAL {
> +	global:
> +
> +	# added in 21.08
> +	rte_auxiliary_register;
> +	rte_auxiliary_unregister;
> +};
> diff --git a/drivers/bus/meson.build b/drivers/bus/meson.build
> index 410058de3a..45eab5233d 100644
> --- a/drivers/bus/meson.build
> +++ b/drivers/bus/meson.build
> @@ -2,6 +2,7 @@
>  # Copyright(c) 2017 Intel Corporation
>  
>  drivers = [
> +        'auxiliary',
>          'dpaa',
>          'fslmc',
>          'ifpga',
> 


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 19/20] net/sfc: support flow action COUNT in transfer rules
  @ 2021-07-04 19:45  3%     ` Thomas Monjalon
  2021-07-05  8:41  0%       ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-04 19:45 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: David Marchand, Bruce Richardson, dev, Igor Romanov,
	Andy Moreton, Ivan Malov

02/07/2021 14:53, Andrew Rybchenko:
> On 7/2/21 3:30 PM, Thomas Monjalon wrote:
> > 02/07/2021 10:43, Andrew Rybchenko:
> >> On 7/1/21 4:05 PM, Andrew Rybchenko wrote:
> >>> On 7/1/21 3:34 PM, David Marchand wrote:
> >>>> On Thu, Jul 1, 2021 at 11:22 AM Andrew Rybchenko
> >>>> <andrew.rybchenko@oktetlabs.ru> wrote:
> >>>>> The build works fine for me on FC34, but it has
> >>>>> libatomic-11.1.1-3.fc34.x86_64 installed.
> >>>>
> >>>> I first produced the issue on my "old" FC32.
> >>>> Afaics, for FC33 and later, gcc now depends on libatomic and the
> >>>> problem won't be noticed.
> >>>> FC32 and before are EOL, but I then reproduced the issue on RHEL 8
> >>>> (and Intel CI reported it on Centos 8 too).
> >>>
> >>> I see. Thanks for the clarification.
> >>>
> >>>>>
> >>>>> I'd like to understand what we're trying to solve here.
> >>>>> Are we trying to make meson to report the missing library
> >>>>> correctly?
> >>>>>
> >>>>> If so, I think I can do simple check using cc.links()
> >>>>> which will fail if the library is not found. I'll
> >>>>> test that it works as expected if the library is not
> >>>>> completely installed.
> >>>>>
> >>>>
> >>>> I tried below diff, and it works for me.
> >>>> "works" as in net/sfc gets disabled without libatomic installed:
> > [...]
> >>>>  # for gcc compiles we need -latomic for 128-bit atomic ops
> >>>>  if cc.get_id() == 'gcc'
> >>>> +    code = '''#include <stdio.h>
> >>>> +    void main() { printf("Atomilink me.\n"); }
> >>>> +    '''
> >>>> +    if not cc.links(code, args: '-latomic', name: 'libatomic link check')
> >>>> +        build = false
> >>>> +        reason = 'missing dependency, "libatomic"'
> >>>> +        subdir_done()
> >>>> +    endif
> >>>>      ext_deps += cc.find_library('atomic')
> >>>>  endif
> >>>
> >>> Many thanks, LGTM. I'll pick it up and add comments why
> >>> it is checked this way.
> >>>
> >>
> >> I've send v4 with the problem fixed. However, I'm afraid
> >> build test systems should be updated to have libatomic
> >> correctly installed. Otherwise, they do not really check
> >> net/sfc build.
> > 
> > When testing on old systems, sfc won't be tested anymore after this patchset.
> > On recent systems, sfc should be enabled I guess.
> > I don't see how to manage better, sorry.
> > 
> 
> I see. I thought that it is possible to install missing
> package on corresponding systems to make build coverage
> better.
> 
> Now I automatically test build on problematic distros
> with previously missing packages installed. So I have
> internal build coverage anyway.

David asked for installing libatomic:
https://inbox.dpdk.org/ci/CAJFAV8xCNBL4yEZU0c=dJGYS+13QM7Uz7e2qnUkMuM7eaKKw+Q@mail.gmail.com/

We should wait for it to be installed, otherwise the ABI check will fail.




^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
  2021-07-04 16:13  3%   ` Andrew Rybchenko
@ 2021-07-05  5:47  0%     ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-07-05  5:47 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, Wang Haiyue, NBU-Contact-Thomas Monjalon, Kinsella Ray, Neil Horman

Hi Andrew,

Thanks very much for all the good suggestions, v7 posted.

Best Regards,
Xueming

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, July 5, 2021 12:13 AM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>
> Cc: dev@dpdk.org; Wang Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Kinsella Ray
> <mdr@ashroe.eu>; Neil Horman <nhorman@tuxdriver.com>
> Subject: Re: [dpdk-dev] [PATCH v6 2/2] bus/auxiliary: introduce auxiliary bus
> 
> On 6/25/21 2:47 PM, Xueming Li wrote:
> > Auxiliary bus [1] provides a way to split function into child-devices
> > representing sub-domains of functionality. Each auxiliary device
> > represents a part of its parent functionality.
> >
> > Auxiliary device is identified by unique device name, sysfs path:
> >   /sys/bus/auxiliary/devices/<name>
> >
> > Devargs legacy syntax ofauxiliary device:
> 
> Missing space after 'of'
> 
> >   -a auxiliary:<name>[,args...]
> > Devargs generic syntax of auxiliary device:
> >   -a bus=auxiliary,name=<name>,,/class=<classs>,,/driver=<driver>,,
> 
> Are the two commas above intentional? What for?
> 
> >
> > [1] kernel auxiliary bus document:
> > https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> 
> With my below notes fixed:
> 
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> > Cc: Wang Haiyue <haiyue.wang@intel.com>
> > Cc: Thomas Monjalon <thomas@monjalon.net>
> > Cc: Kinsella Ray <mdr@ashroe.eu>
> > ---
> >  MAINTAINERS                               |   5 +
> >  doc/guides/rel_notes/release_21_08.rst    |   6 +
> >  drivers/bus/auxiliary/auxiliary_common.c  | 411
> > ++++++++++++++++++++++  drivers/bus/auxiliary/auxiliary_params.c  |  59 ++++
> >  drivers/bus/auxiliary/linux/auxiliary.c   | 141 ++++++++
> >  drivers/bus/auxiliary/meson.build         |  16 +
> >  drivers/bus/auxiliary/private.h           |  74 ++++
> >  drivers/bus/auxiliary/rte_bus_auxiliary.h | 201 +++++++++++
> >  drivers/bus/auxiliary/version.map         |   7 +
> >  drivers/bus/meson.build                   |   1 +
> >  10 files changed, 921 insertions(+)
> >  create mode 100644 drivers/bus/auxiliary/auxiliary_common.c
> >  create mode 100644 drivers/bus/auxiliary/auxiliary_params.c
> >  create mode 100644 drivers/bus/auxiliary/linux/auxiliary.c
> >  create mode 100644 drivers/bus/auxiliary/meson.build  create mode
> > 100644 drivers/bus/auxiliary/private.h  create mode 100644
> > drivers/bus/auxiliary/rte_bus_auxiliary.h
> >  create mode 100644 drivers/bus/auxiliary/version.map
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS index 5877a16971..eaf691ca6a
> > 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -525,6 +525,11 @@ F: doc/guides/mempool/octeontx2.rst  Bus Drivers
> >  -----------
> >
> > +Auxiliary bus driver
> 
> Shouldn't it be EXPERIMENTAL?
> 
> > +M: Parav Pandit <parav@nvidia.com>
> > +M: Xueming Li <xuemingl@nvidia.com>
> > +F: drivers/bus/auxiliary/
> > +
> >  Intel FPGA bus
> >  M: Rosen Xu <rosen.xu@intel.com>
> >  F: drivers/bus/ifpga/
> > diff --git a/doc/guides/rel_notes/release_21_08.rst
> > b/doc/guides/rel_notes/release_21_08.rst
> > index a6ecfdf3ce..e7ef4c8a05 100644
> > --- a/doc/guides/rel_notes/release_21_08.rst
> > +++ b/doc/guides/rel_notes/release_21_08.rst
> > @@ -55,6 +55,12 @@ New Features
> >       Also, make sure to start the actual text at the margin.
> >       =======================================================
> >
> > +* **Added auxiliary bus support.**
> > +
> > +  Auxiliary bus provides a way to split function into child-devices
> > + representing sub-domains of functionality. Each auxiliary device
> > + represents a part of its parent functionality.
> > +
> >
> >  Removed Items
> >  -------------
> > diff --git a/drivers/bus/auxiliary/auxiliary_common.c
> > b/drivers/bus/auxiliary/auxiliary_common.c
> > new file mode 100644
> > index 0000000000..8a75306da5
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/auxiliary_common.c
> > @@ -0,0 +1,411 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2021 NVIDIA Corporation & Affiliates  */
> > +
> > +#include <string.h>
> > +#include <inttypes.h>
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +#include <stdlib.h>
> > +#include <stdio.h>
> > +#include <sys/queue.h>
> > +#include <rte_errno.h>
> > +#include <rte_interrupts.h>
> > +#include <rte_log.h>
> > +#include <rte_bus.h>
> > +#include <rte_per_lcore.h>
> > +#include <rte_memory.h>
> > +#include <rte_eal.h>
> > +#include <rte_eal_paging.h>
> > +#include <rte_string_fns.h>
> > +#include <rte_common.h>
> > +#include <rte_devargs.h>
> > +
> > +#include "private.h"
> > +#include "rte_bus_auxiliary.h"
> > +
> > +static struct rte_devargs *
> > +auxiliary_devargs_lookup(const char *name) {
> > +	struct rte_devargs *devargs;
> > +
> > +	RTE_EAL_DEVARGS_FOREACH(RTE_BUS_AUXILIARY_NAME, devargs) {
> > +		if (strcmp(devargs->name, name) == 0)
> > +			return devargs;
> > +	}
> > +	return NULL;
> > +}
> > +
> > +/*
> > + * Test whether the auxiliary device exist
> 
> Missing full stop above.
> 
> > + *
> > + * Stub for OS not supporting auxiliary bus.
> > + */
> > +__rte_weak bool
> > +auxiliary_dev_exists(const char *name) {
> > +	RTE_SET_USED(name);
> > +	return false;
> > +}
> > +
> > +/*
> > + * Scan the devices in the auxiliary bus.
> > + *
> > + * Stub for OS not supporting auxiliary bus.
> > + */
> > +__rte_weak int
> > +auxiliary_scan(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Update a device's devargs being scanned.
> > + *
> > + * @param aux_dev
> > + *	AUXILIARY device.
> > + */
> > +void
> > +auxiliary_on_scan(struct rte_auxiliary_device *aux_dev) {
> > +	aux_dev->device.devargs = auxiliary_devargs_lookup(aux_dev->name);
> > +}
> > +
> > +/*
> > + * Match the auxiliary driver and device using driver function.
> > + */
> > +bool
> > +auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
> > +		const struct rte_auxiliary_device *aux_dev) {
> > +	if (aux_drv->match == NULL)
> > +		return false;
> > +	return aux_drv->match(aux_dev->name); }
> > +
> > +/*
> > + * Call the probe() function of the driver.
> > + */
> > +static int
> > +rte_auxiliary_probe_one_driver(struct rte_auxiliary_driver *drv,
> > +			       struct rte_auxiliary_device *dev)
> > +{
> > +	enum rte_iova_mode iova_mode;
> > +	int ret;
> > +
> > +	if ((drv == NULL) || (dev == NULL))
> 
> Unnecessary internal parenthesis.
> 
> > +		return -EINVAL;
> > +
> > +	/* Check if driver supports it. */
> > +	if (!auxiliary_match(drv, dev))
> > +		/* Match of device and driver failed */
> > +		return 1;
> > +
> > +	/* No initialization when marked as blocked, return without error. */
> > +	if (dev->device.devargs != NULL &&
> > +	    dev->device.devargs->policy == RTE_DEV_BLOCKED) {
> > +		AUXILIARY_LOG(INFO, "Device is blocked, not initializing");
> > +		return -1;
> > +	}
> > +
> > +	if (dev->device.numa_node < 0) {
> > +		AUXILIARY_LOG(INFO, "Device is not NUMA-aware, defaulting socket to 0");
> 
> socket -> NUMA node
> 
> > +		dev->device.numa_node = 0;
> > +	}
> > +
> > +	iova_mode = rte_eal_iova_mode();
> > +	if ((drv->drv_flags & RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA) > 0 &&
> > +	    iova_mode != RTE_IOVA_VA) {
> > +		AUXILIARY_LOG(ERR, "Driver %s expecting VA IOVA mode but current mode is PA, not initializing",
> > +			      drv->driver.name);
> > +		return -EINVAL;
> > +	}
> > +
> > +	dev->driver = drv;
> > +
> > +	AUXILIARY_LOG(INFO, "Probe auxiliary driver: %s device: %s (socket %i)",
> 
> socket -> NUMA node
> 
> > +		      drv->driver.name, dev->name, dev->device.numa_node);
> > +	ret = drv->probe(drv, dev);
> > +	if (ret != 0)
> > +		dev->driver = NULL;
> > +	else
> > +		dev->device.driver = &drv->driver;
> > +
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Call the remove() function of the driver.
> > + */
> > +static int
> > +rte_auxiliary_driver_remove_dev(struct rte_auxiliary_device *dev)
> > +{
> > +	struct rte_auxiliary_driver *drv;
> > +	int ret = 0;
> > +
> > +	if (dev == NULL)
> > +		return -EINVAL;
> > +
> > +	drv = dev->driver;
> > +
> > +	AUXILIARY_LOG(DEBUG, "Driver %s remove auxiliary device %s on NUMA socket %i",
> 
> socket -> node
> 
> > +		      drv->driver.name, dev->name, dev->device.numa_node);
> > +
> > +	if (drv->remove != NULL) {
> > +		ret = drv->remove(dev);
> > +		if (ret < 0)
> > +			return ret;
> > +	}
> > +
> > +	/* clear driver structure */
> > +	dev->driver = NULL;
> > +	dev->device.driver = NULL;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Call the probe() function of all registered driver for the given device.
> > + * Return < 0 if initialization failed.
> > + * Return 1 if no driver is found for this device.
> > + */
> > +static int
> > +auxiliary_probe_all_drivers(struct rte_auxiliary_device *dev)
> > +{
> > +	struct rte_auxiliary_driver *drv;
> > +	int rc;
> > +
> > +	if (dev == NULL)
> > +		return -EINVAL;
> > +
> > +	FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> > +		if (!drv->match(dev->name))
> > +			continue;
> > +
> > +		rc = rte_auxiliary_probe_one_driver(drv, dev);
> > +		if (rc < 0)
> > +			/* negative value is an error */
> > +			return rc;
> > +		if (rc > 0)
> > +			/* positive value means driver doesn't support it */
> > +			continue;
> > +		return 0;
> > +	}
> > +	return 1;
> > +}
> > +
> > +/*
> > + * Scan the content of the auxiliary bus, and call the probe function for
> > + * all registered drivers to try to probe discovered devices.
> > + */
> > +static int
> > +auxiliary_probe(void)
> > +{
> > +	struct rte_auxiliary_device *dev = NULL;
> > +	size_t probed = 0, failed = 0;
> > +	int ret = 0;
> > +
> > +	FOREACH_DEVICE_ON_AUXILIARY_BUS(dev) {
> > +		probed++;
> > +
> > +		ret = auxiliary_probe_all_drivers(dev);
> > +		if (ret < 0) {
> > +			if (ret != -EEXIST) {
> > +				AUXILIARY_LOG(ERR, "Requested device %s cannot be used",
> > +					      dev->name);
> > +				rte_errno = errno;
> > +				failed++;
> > +			}
> > +			ret = 0;
> > +		}
> > +	}
> > +
> > +	return (probed && probed == failed) ? -1 : 0;
> > +}
> > +
> > +static int
> > +auxiliary_parse(const char *name, void *addr)
> > +{
> > +	struct rte_auxiliary_driver *drv = NULL;
> > +	const char **out = addr;
> > +
> > +	/* Allow empty device name "auxiliary:" to bypass entire bus scan. */
> > +	if (strlen(name) == 0)
> > +		return 0;
> > +
> > +	FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> > +		if (drv->match(name))
> > +			break;
> > +	}
> > +	if (drv != NULL && addr != NULL)
> > +		*out = name;
> > +	return drv != NULL ? 0 : -1;
> > +}
> > +
> > +/* Register a driver */
> > +void
> > +rte_auxiliary_register(struct rte_auxiliary_driver *driver)
> > +{
> > +	TAILQ_INSERT_TAIL(&auxiliary_bus.driver_list, driver, next);
> > +	driver->bus = &auxiliary_bus;
> > +}
> > +
> > +/* Unregister a driver */
> > +void
> > +rte_auxiliary_unregister(struct rte_auxiliary_driver *driver)
> > +{
> > +	TAILQ_REMOVE(&auxiliary_bus.driver_list, driver, next);
> > +	driver->bus = NULL;
> > +}
> > +
> > +/* Add a device to auxiliary bus */
> > +void
> > +auxiliary_add_device(struct rte_auxiliary_device *aux_dev)
> > +{
> > +	TAILQ_INSERT_TAIL(&auxiliary_bus.device_list, aux_dev, next);
> > +}
> > +
> > +/* Insert a device into a predefined position in auxiliary bus */
> > +void
> > +auxiliary_insert_device(struct rte_auxiliary_device *exist_aux_dev,
> > +			struct rte_auxiliary_device *new_aux_dev)
> > +{
> > +	TAILQ_INSERT_BEFORE(exist_aux_dev, new_aux_dev, next);
> > +}
> > +
> > +/* Remove a device from auxiliary bus */
> > +static void
> > +rte_auxiliary_remove_device(struct rte_auxiliary_device *auxiliary_dev)
> > +{
> > +	TAILQ_REMOVE(&auxiliary_bus.device_list, auxiliary_dev, next);
> > +}
> > +
> > +static struct rte_device *
> > +auxiliary_find_device(const struct rte_device *start, rte_dev_cmp_t cmp,
> > +		      const void *data)
> > +{
> > +	const struct rte_auxiliary_device *pstart;
> > +	struct rte_auxiliary_device *adev;
> > +
> > +	if (start != NULL) {
> > +		pstart = RTE_DEV_TO_AUXILIARY_CONST(start);
> > +		adev = TAILQ_NEXT(pstart, next);
> > +	} else {
> > +		adev = TAILQ_FIRST(&auxiliary_bus.device_list);
> > +	}
> > +	while (adev != NULL) {
> > +		if (cmp(&adev->device, data) == 0)
> > +			return &adev->device;
> > +		adev = TAILQ_NEXT(adev, next);
> > +	}
> > +	return NULL;
> > +}
> > +
> > +static int
> > +auxiliary_plug(struct rte_device *dev)
> > +{
> > +	if (!auxiliary_dev_exists(dev->name))
> > +		return -ENOENT;
> > +	return auxiliary_probe_all_drivers(RTE_DEV_TO_AUXILIARY(dev));
> > +}
> > +
> > +static int
> > +auxiliary_unplug(struct rte_device *dev)
> > +{
> > +	struct rte_auxiliary_device *adev;
> > +	int ret;
> > +
> > +	adev = RTE_DEV_TO_AUXILIARY(dev);
> > +	ret = rte_auxiliary_driver_remove_dev(adev);
> > +	if (ret == 0) {
> > +		rte_auxiliary_remove_device(adev);
> > +		rte_devargs_remove(dev->devargs);
> > +		free(adev);
> > +	}
> > +	return ret;
> > +}
> > +
> > +static int
> > +auxiliary_dma_map(struct rte_device *dev, void *addr, uint64_t iova, size_t len)
> > +{
> > +	struct rte_auxiliary_device *aux_dev = RTE_DEV_TO_AUXILIARY(dev);
> > +
> > +	if (dev == NULL || aux_dev->driver == NULL) {
> > +		rte_errno = EINVAL;
> > +		return -1;
> > +	}
> > +	if (aux_dev->driver->dma_map == NULL) {
> > +		rte_errno = ENOTSUP;
> > +		return -1;
> > +	}
> > +	return aux_dev->driver->dma_map(aux_dev, addr, iova, len);
> > +}
> > +
> > +static int
> > +auxiliary_dma_unmap(struct rte_device *dev, void *addr, uint64_t iova,
> > +		    size_t len)
> > +{
> > +	struct rte_auxiliary_device *aux_dev = RTE_DEV_TO_AUXILIARY(dev);
> > +
> > +	if (dev == NULL || aux_dev->driver == NULL) {
> > +		rte_errno = EINVAL;
> > +		return -1;
> > +	}
> > +	if (aux_dev->driver->dma_unmap == NULL) {
> > +		rte_errno = ENOTSUP;
> > +		return -1;
> > +	}
> > +	return aux_dev->driver->dma_unmap(aux_dev, addr, iova, len);
> > +}
> > +
> > +bool
> > +auxiliary_is_ignored_device(const char *name)
> > +{
> > +	struct rte_devargs *devargs = auxiliary_devargs_lookup(name);
> > +
> > +	switch (auxiliary_bus.bus.conf.scan_mode) {
> > +	case RTE_BUS_SCAN_ALLOWLIST:
> > +		if (devargs && devargs->policy == RTE_DEV_ALLOWED)
> > +			return false;
> > +		break;
> > +	case RTE_BUS_SCAN_UNDEFINED:
> > +	case RTE_BUS_SCAN_BLOCKLIST:
> > +		if (devargs == NULL || devargs->policy != RTE_DEV_BLOCKED)
> > +			return false;
> > +		break;
> > +	}
> > +	return true;
> > +}
> > +
> > +static enum rte_iova_mode
> > +auxiliary_get_iommu_class(void)
> > +{
> > +	const struct rte_auxiliary_driver *drv;
> > +
> > +	FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> > +		if ((drv->drv_flags & RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA) > 0)
> > +			return RTE_IOVA_VA;
> > +	}
> > +
> > +	return RTE_IOVA_DC;
> > +}
> > +
> > +struct rte_auxiliary_bus auxiliary_bus = {
> > +	.bus = {
> > +		.scan = auxiliary_scan,
> > +		.probe = auxiliary_probe,
> > +		.find_device = auxiliary_find_device,
> > +		.plug = auxiliary_plug,
> > +		.unplug = auxiliary_unplug,
> > +		.parse = auxiliary_parse,
> > +		.dma_map = auxiliary_dma_map,
> > +		.dma_unmap = auxiliary_dma_unmap,
> > +		.get_iommu_class = auxiliary_get_iommu_class,
> > +		.dev_iterate = auxiliary_dev_iterate,
> > +	},
> > +	.device_list = TAILQ_HEAD_INITIALIZER(auxiliary_bus.device_list),
> > +	.driver_list = TAILQ_HEAD_INITIALIZER(auxiliary_bus.driver_list),
> > +};
> > +
> > +RTE_REGISTER_BUS(auxiliary, auxiliary_bus.bus);
> > +RTE_LOG_REGISTER_DEFAULT(auxiliary_bus_logtype, NOTICE);
> > diff --git a/drivers/bus/auxiliary/auxiliary_params.c b/drivers/bus/auxiliary/auxiliary_params.c
> > new file mode 100644
> > index 0000000000..cd3fa56cb4
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/auxiliary_params.c
> > @@ -0,0 +1,59 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> > + */
> > +
> > +#include <string.h>
> > +
> > +#include <rte_bus.h>
> > +#include <rte_dev.h>
> > +#include <rte_errno.h>
> > +#include <rte_kvargs.h>
> > +
> > +#include "private.h"
> > +#include "rte_bus_auxiliary.h"
> > +
> > +enum auxiliary_params {
> > +	RTE_AUXILIARY_PARAM_NAME,
> > +};
> > +
> > +static const char * const auxiliary_params_keys[] = {
> > +	[RTE_AUXILIARY_PARAM_NAME] = "name",
> > +};
> > +
> > +static int
> > +auxiliary_dev_match(const struct rte_device *dev,
> > +	      const void *_kvlist)
> > +{
> > +	const struct rte_kvargs *kvlist = _kvlist;
> > +	int ret;
> > +
> > +	ret = rte_kvargs_process(kvlist,
> > +			auxiliary_params_keys[RTE_AUXILIARY_PARAM_NAME],
> > +			rte_kvargs_strcmp, (void *)(uintptr_t)dev->name);
> > +
> > +	return ret != 0 ? -1 : 0;
> > +}
> > +
> > +void *
> > +auxiliary_dev_iterate(const void *start,
> > +		    const char *str,
> > +		    const struct rte_dev_iterator *it __rte_unused)
> > +{
> > +	rte_bus_find_device_t find_device;
> > +	struct rte_kvargs *kvargs = NULL;
> > +	struct rte_device *dev;
> > +
> > +	if (str != NULL) {
> > +		kvargs = rte_kvargs_parse(str, auxiliary_params_keys);
> > +		if (kvargs == NULL) {
> > +			AUXILIARY_LOG(ERR, "cannot parse argument list %s",
> > +				      str);
> > +			rte_errno = EINVAL;
> > +			return NULL;
> > +		}
> > +	}
> > +	find_device = auxiliary_bus.bus.find_device;
> > +	dev = find_device(start, auxiliary_dev_match, kvargs);
> > +	rte_kvargs_free(kvargs);
> > +	return dev;
> > +}
> > diff --git a/drivers/bus/auxiliary/linux/auxiliary.c b/drivers/bus/auxiliary/linux/auxiliary.c
> > new file mode 100644
> > index 0000000000..8464487971
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/linux/auxiliary.c
> > @@ -0,0 +1,141 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> > + */
> > +
> > +#include <string.h>
> > +#include <dirent.h>
> > +
> > +#include <rte_log.h>
> > +#include <rte_bus.h>
> > +#include <rte_malloc.h>
> > +#include <rte_devargs.h>
> > +#include <rte_memcpy.h>
> > +#include <eal_filesystem.h>
> > +
> > +#include "../rte_bus_auxiliary.h"
> > +#include "../private.h"
> > +
> > +#define AUXILIARY_SYSFS_PATH "/sys/bus/auxiliary/devices"
> > +
> > +/* Scan one auxiliary sysfs entry, and fill the devices list from it. */
> > +static int
> > +auxiliary_scan_one(const char *dirname, const char *name)
> > +{
> > +	struct rte_auxiliary_device *dev;
> > +	struct rte_auxiliary_device *dev2;
> > +	char filename[PATH_MAX];
> > +	unsigned long tmp;
> > +	int ret;
> > +
> > +	dev = malloc(sizeof(*dev));
> > +	if (dev == NULL)
> > +		return -1;
> > +
> > +	memset(dev, 0, sizeof(*dev));
> > +	if (rte_strscpy(dev->name, name, sizeof(dev->name)) < 0) {
> > +		free(dev);
> > +		return -1;
> > +	}
> > +	dev->device.name = dev->name;
> > +	dev->device.bus = &auxiliary_bus.bus;
> > +
> > +	/* Get NUMA node, default to 0 if not present */
> > +	snprintf(filename, sizeof(filename), "%s/%s/numa_node",
> > +		 dirname, name);
> > +	if (access(filename, F_OK) != -1) {
> > +		if (eal_parse_sysfs_value(filename, &tmp) == 0)
> > +			dev->device.numa_node = tmp;
> > +		else
> > +			dev->device.numa_node = -1;
> > +	} else {
> > +		dev->device.numa_node = 0;
> > +	}
> > +
> > +	auxiliary_on_scan(dev);
> > +
> > +	/* Device is valid, add in list (sorted) */
> > +	TAILQ_FOREACH(dev2, &auxiliary_bus.device_list, next) {
> > +		ret = strcmp(dev->name, dev2->name);
> > +		if (ret > 0)
> > +			continue;
> > +		if (ret < 0) {
> > +			auxiliary_insert_device(dev2, dev);
> > +		} else { /* already registered */
> > +			if (rte_dev_is_probed(&dev2->device) &&
> > +			    dev2->device.devargs != dev->device.devargs) {
> > +				/* To probe device with new devargs. */
> > +				rte_devargs_remove(dev2->device.devargs);
> > +				auxiliary_on_scan(dev2);
> > +			}
> > +			free(dev);
> > +		}
> > +		return 0;
> > +	}
> > +	auxiliary_add_device(dev);
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Test whether the auxiliary device exist
> 
> Missing full stop above.
> 
> > + */
> > +bool
> > +auxiliary_dev_exists(const char *name)
> > +{
> > +	DIR *dir;
> > +	char dirname[PATH_MAX];
> > +
> > +	snprintf(dirname, sizeof(dirname), "%s/%s",
> > +		 AUXILIARY_SYSFS_PATH, name);
> > +	dir = opendir(dirname);
> > +	if (dir == NULL)
> > +		return false;
> > +	closedir(dir);
> > +	return true;
> > +}
> > +
> > +/*
> > + * Scan the devices in the auxiliary bus
> 
> Missing full stop above.
> 
> > + */
> > +int
> > +auxiliary_scan(void)
> > +{
> > +	struct dirent *e;
> > +	DIR *dir;
> > +	char dirname[PATH_MAX];
> > +	struct rte_auxiliary_driver *drv;
> > +
> > +	dir = opendir(AUXILIARY_SYSFS_PATH);
> > +	if (dir == NULL) {
> > +		AUXILIARY_LOG(INFO, "%s not found, is auxiliary module loaded?",
> > +			      AUXILIARY_SYSFS_PATH);
> > +		return 0;
> > +	}
> > +
> > +	while ((e = readdir(dir)) != NULL) {
> > +		if (e->d_name[0] == '.')
> > +			continue;
> > +
> > +		if (auxiliary_is_ignored_device(e->d_name))
> > +			continue;
> > +
> > +		snprintf(dirname, sizeof(dirname), "%s/%s",
> > +			 AUXILIARY_SYSFS_PATH, e->d_name);
> > +
> > +		/* Ignore if no driver can handle. */
> > +		FOREACH_DRIVER_ON_AUXILIARY_BUS(drv) {
> > +			if (drv->match(e->d_name))
> > +				break;
> > +		}
> > +		if (drv == NULL)
> > +			continue;
> > +
> > +		if (auxiliary_scan_one(dirname, e->d_name) < 0)
> > +			goto error;
> > +	}
> > +	closedir(dir);
> > +	return 0;
> > +
> > +error:
> > +	closedir(dir);
> > +	return -1;
> > +}
> > diff --git a/drivers/bus/auxiliary/meson.build b/drivers/bus/auxiliary/meson.build
> > new file mode 100644
> > index 0000000000..357550eff7
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/meson.build
> > @@ -0,0 +1,16 @@
> > +# SPDX-License-Identifier: BSD-3-Clause
> > +# Copyright (c) 2021 NVIDIA Corporation & Affiliates
> > +
> > +headers = files(
> > +        'rte_bus_auxiliary.h',
> > +)
> > +sources = files(
> > +        'auxiliary_common.c',
> > +        'auxiliary_params.c',
> > +)
> > +if is_linux
> > +    sources += files(
> > +        'linux/auxiliary.c',
> > +    )
> > +endif
> > +deps += ['kvargs']
> > diff --git a/drivers/bus/auxiliary/private.h b/drivers/bus/auxiliary/private.h
> > new file mode 100644
> > index 0000000000..cb3e849993
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/private.h
> > @@ -0,0 +1,74 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> > + */
> > +
> > +#ifndef AUXILIARY_PRIVATE_H
> 
> Maybe add a BUS_ prefix at least?
> 
> > +#define AUXILIARY_PRIVATE_H
> > +
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +
> > +#include "rte_bus_auxiliary.h"
> > +
> > +extern struct rte_auxiliary_bus auxiliary_bus;
> > +extern int auxiliary_bus_logtype;
> > +
> > +#define AUXILIARY_LOG(level, ...) \
> > +	rte_log(RTE_LOG_ ## level, auxiliary_bus_logtype, \
> > +		RTE_FMT("auxiliary bus: " RTE_FMT_HEAD(__VA_ARGS__,) "\n", \
> > +			RTE_FMT_TAIL(__VA_ARGS__,)))
> > +
> > +/* Auxiliary bus iterators */
> > +#define FOREACH_DEVICE_ON_AUXILIARY_BUS(p) \
> > +		TAILQ_FOREACH(p, &(auxiliary_bus.device_list), next)
> > +
> > +#define FOREACH_DRIVER_ON_AUXILIARY_BUS(p) \
> > +		TAILQ_FOREACH(p, &(auxiliary_bus.driver_list), next)
> > +
> > +bool auxiliary_dev_exists(const char *name);
> > +
> > +/*
> > + * Scan the content of the auxiliary bus, and the devices in the devices
> > + * list.
> > + */
> > +int auxiliary_scan(void);
> > +
> > +/*
> > + * Update a device being scanned.
> > + */
> > +void auxiliary_on_scan(struct rte_auxiliary_device *aux_dev);
> > +
> > +/*
> > + * Validate whether a device with given auxiliary device should be ignored
> > + * or not.
> > + */
> > +bool auxiliary_is_ignored_device(const char *name);
> > +
> > +/*
> > + * Add an auxiliary device to the auxiliary bus (append to auxiliary device
> > + * list). This function also updates the bus references of the auxiliary
> > + * device and the generic device object embedded within.
> > + */
> > +void auxiliary_add_device(struct rte_auxiliary_device *aux_dev);
> > +
> > +/*
> > + * Insert an auxiliary device in the auxiliary bus at a particular location
> > + * in the device list. It also updates the auxiliary bus reference of the
> > + * new devices to be inserted.
> > + */
> > +void auxiliary_insert_device(struct rte_auxiliary_device *exist_aux_dev,
> > +			     struct rte_auxiliary_device *new_aux_dev);
> > +
> > +/*
> > + * Match the auxiliary driver and device by driver function
> 
> Missing full stop.
> 
> > + */
> > +bool auxiliary_match(const struct rte_auxiliary_driver *aux_drv,
> > +		     const struct rte_auxiliary_device *aux_dev);
> > +
> > +/*
> > + * Iterate over devices, matching any device against the provided string
> 
> Missing full stop.
> 
> > + */
> > +void *auxiliary_dev_iterate(const void *start, const char *str,
> > +			    const struct rte_dev_iterator *it);
> > +
> > +#endif /* AUXILIARY_PRIVATE_H */
> > diff --git a/drivers/bus/auxiliary/rte_bus_auxiliary.h b/drivers/bus/auxiliary/rte_bus_auxiliary.h
> > new file mode 100644
> > index 0000000000..16b147e387
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/rte_bus_auxiliary.h
> > @@ -0,0 +1,201 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright (c) 2021 NVIDIA Corporation & Affiliates
> > + */
> > +
> > +#ifndef RTE_BUS_AUXILIARY_H
> > +#define RTE_BUS_AUXILIARY_H
> > +
> > +/**
> > + * @file
> > + *
> > + * Auxiliary Bus Interface.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <limits.h>
> > +#include <errno.h>
> > +#include <sys/queue.h>
> > +#include <stdint.h>
> > +#include <inttypes.h>
> > +
> > +#include <rte_debug.h>
> > +#include <rte_interrupts.h>
> > +#include <rte_dev.h>
> > +#include <rte_bus.h>
> > +#include <rte_kvargs.h>
> > +
> > +#define RTE_BUS_AUXILIARY_NAME "auxiliary"
> > +
> > +/* Forward declarations */
> > +struct rte_auxiliary_driver;
> > +struct rte_auxiliary_bus;
> > +struct rte_auxiliary_device;
> > +
> > +/**
> > + * Match function for the driver to decide if device can be handled.
> > + *
> > + * @param name
> > + *   Pointer to the auxiliary device name.
> > + * @return
> > + *   Whether the driver can handle the auxiliary device.
> > + */
> > +typedef bool(rte_auxiliary_match_t)(const char *name);
> > +
> > +/**
> > + * Initialization function for the driver called during auxiliary probing.
> > + *
> > + * @param drv
> > + *   Pointer to the auxiliary driver.
> > + * @param dev
> > + *   Pointer to the auxiliary device.
> > + * @return
> > + *   - 0 On success.
> > + *   - Negative value and rte_errno is set otherwise.
> > + */
> > +typedef int(rte_auxiliary_probe_t)(struct rte_auxiliary_driver *drv,
> > +				    struct rte_auxiliary_device *dev);
> > +
> > +/**
> > + * Uninitialization function for the driver called during hotplugging.
> > + *
> > + * @param dev
> > + *   Pointer to the auxiliary device.
> > + * @return
> > + *   - 0 On success.
> > + *   - Negative value and rte_errno is set otherwise.
> > + */
> > +typedef int (rte_auxiliary_remove_t)(struct rte_auxiliary_device *dev);
> > +
> > +/**
> > + * Driver-specific DMA mapping. After a successful call the device
> > + * will be able to read/write from/to this segment.
> > + *
> > + * @param dev
> > + *   Pointer to the auxiliary device.
> > + * @param addr
> > + *   Starting virtual address of memory to be mapped.
> > + * @param iova
> > + *   Starting IOVA address of memory to be mapped.
> > + * @param len
> > + *   Length of memory segment being mapped.
> > + * @return
> > + *   - 0 On success.
> > + *   - Negative value and rte_errno is set otherwise.
> > + */
> > +typedef int (rte_auxiliary_dma_map_t)(struct rte_auxiliary_device *dev,
> > +				       void *addr, uint64_t iova, size_t len);
> > +
> > +/**
> > + * Driver-specific DMA un-mapping. After a successful call the device
> > + * will not be able to read/write from/to this segment.
> > + *
> > + * @param dev
> > + *   Pointer to the auxiliary device.
> > + * @param addr
> > + *   Starting virtual address of memory to be unmapped.
> > + * @param iova
> > + *   Starting IOVA address of memory to be unmapped.
> > + * @param len
> > + *   Length of memory segment being unmapped.
> > + * @return
> > + *   - 0 On success.
> > + *   - Negative value and rte_errno is set otherwise.
> > + */
> > +typedef int (rte_auxiliary_dma_unmap_t)(struct rte_auxiliary_device *dev,
> > +					 void *addr, uint64_t iova, size_t len);
> > +
> > +/**
> > + * A structure describing an auxiliary device.
> > + */
> > +struct rte_auxiliary_device {
> > +	TAILQ_ENTRY(rte_auxiliary_device) next;   /**< Next probed device. */
> > +	struct rte_device device;                 /**< Inherit core device */
> > +	char name[RTE_DEV_NAME_MAX_LEN + 1];      /**< ASCII device name */
> > +	struct rte_intr_handle intr_handle;       /**< Interrupt handle */
> > +	struct rte_auxiliary_driver *driver;      /**< Device driver */
> > +};
> > +
> > +/** List of auxiliary devices */
> > +TAILQ_HEAD(rte_auxiliary_device_list, rte_auxiliary_device);
> > +/** List of auxiliary drivers */
> > +TAILQ_HEAD(rte_auxiliary_driver_list, rte_auxiliary_driver);
> 
> Shouldn't we hide rte_auxiliary_device inside the library, taking
> API/ABI stability into account? Or will it be DPDK internal anyway? If
> so, it should be done INTERNAL from the very
> beginning.
> 
> > +
> > +/**
> > + * Structure describing the auxiliary bus
> > + */
> > +struct rte_auxiliary_bus {
> > +	struct rte_bus bus;                  /**< Inherit the generic class */
> > +	struct rte_auxiliary_device_list device_list;  /**< List of devices */
> > +	struct rte_auxiliary_driver_list driver_list;  /**< List of drivers */
> > +};
> 
> It looks internal. The following forward declaration should be
> sufficient to build.
> 
> struct rte_auxiliary_bus;
> 
> 
> > +
> > +/**
> > + * A structure describing an auxiliary driver.
> > + */
> > +struct rte_auxiliary_driver {
> > +	TAILQ_ENTRY(rte_auxiliary_driver) next; /**< Next in list. */
> > +	struct rte_driver driver;             /**< Inherit core driver. */
> > +	struct rte_auxiliary_bus *bus;        /**< Auxiliary bus reference. */
> > +	rte_auxiliary_match_t *match;         /**< Device match function. */
> > +	rte_auxiliary_probe_t *probe;         /**< Device probe function. */
> > +	rte_auxiliary_remove_t *remove;       /**< Device remove function. */
> > +	rte_auxiliary_dma_map_t *dma_map;     /**< Device DMA map function. */
> > +	rte_auxiliary_dma_unmap_t *dma_unmap; /**< Device DMA unmap function. */
> > +	uint32_t drv_flags;                   /**< Flags RTE_AUXILIARY_DRV_*. */
> > +};
> > +
> > +/**
> > + * @internal
> > + * Helper macro for drivers that need to convert to struct rte_auxiliary_device.
> > + */
> > +#define RTE_DEV_TO_AUXILIARY(ptr) \
> > +	container_of(ptr, struct rte_auxiliary_device, device)
> > +
> > +#define RTE_DEV_TO_AUXILIARY_CONST(ptr) \
> > +	container_of(ptr, const struct rte_auxiliary_device, device)
> > +
> > +#define RTE_ETH_DEV_TO_AUXILIARY(eth_dev) \
> > +	RTE_DEV_TO_AUXILIARY((eth_dev)->device)
> > +
> > +/** Device driver needs IOVA as VA and cannot work with IOVA as PA */
> > +#define RTE_AUXILIARY_DRV_NEED_IOVA_AS_VA 0x002
> > +
> > +/**
> 
> Don't we need an EXPERIMENTAL notice here?
> 
> > + * Register an auxiliary driver.
> > + *
> > + * @param driver
> > + *   A pointer to a rte_auxiliary_driver structure describing the driver
> > + *   to be registered.
> > + */
> > +__rte_experimental
> > +void rte_auxiliary_register(struct rte_auxiliary_driver *driver);
> > +
> > +/** Helper for auxiliary device registration from driver instance */
> > +#define RTE_PMD_REGISTER_AUXILIARY(nm, auxiliary_drv) \
> > +	RTE_INIT(auxiliaryinitfn_ ##nm) \
> > +	{ \
> > +		(auxiliary_drv).driver.name = RTE_STR(nm); \
> > +		rte_auxiliary_register(&(auxiliary_drv)); \
> > +	} \
> > +	RTE_PMD_EXPORT_NAME(nm, __COUNTER__)
> > +
> > +/**
> 
> Don't we need an EXPERIMENTAL notice here?
> 
> > + * Unregister an auxiliary driver.
> > + *
> > + * @param driver
> > + *   A pointer to a rte_auxiliary_driver structure describing the driver
> > + *   to be unregistered.
> > + */
> > +__rte_experimental
> > +void rte_auxiliary_unregister(struct rte_auxiliary_driver *driver);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* RTE_BUS_AUXILIARY_H */
> > diff --git a/drivers/bus/auxiliary/version.map b/drivers/bus/auxiliary/version.map
> > new file mode 100644
> > index 0000000000..a52260657c
> > --- /dev/null
> > +++ b/drivers/bus/auxiliary/version.map
> > @@ -0,0 +1,7 @@
> > +EXPERIMENTAL {
> > +	global:
> > +
> > +	# added in 21.08
> > +	rte_auxiliary_register;
> > +	rte_auxiliary_unregister;
> > +};
> > diff --git a/drivers/bus/meson.build b/drivers/bus/meson.build
> > index 410058de3a..45eab5233d 100644
> > --- a/drivers/bus/meson.build
> > +++ b/drivers/bus/meson.build
> > @@ -2,6 +2,7 @@
> >  # Copyright(c) 2017 Intel Corporation
> >
> >  drivers = [
> > +        'auxiliary',
> >          'dpaa',
> >          'fslmc',
> >          'ifpga',
> >


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [RFC PATCH v4 0/3] Add PIE support for HQoS library
  2021-06-21  7:35  3% ` [dpdk-dev] [RFC PATCH v3 " Liguzinski, WojciechX
@ 2021-07-05  8:04  3%   ` Liguzinski, WojciechX
  0 siblings, 0 replies; 200+ results
From: Liguzinski, WojciechX @ 2021-07-05  8:04 UTC (permalink / raw)
  To: dev, jasvinder.singh, cristian.dumitrescu; +Cc: savinay.dharmappa, megha.ajmera

The DPDK sched library is equipped with a mechanism that protects it from the bufferbloat
problem, a situation in which excess buffers in the network cause high latency and latency
variation. Currently, it supports RED for active queue management (which is designed
to control the queue length but does not control latency directly and is now being
obsoleted). However, more advanced queue management is required to address this problem
and provide the desired quality of service to users.

This solution (RFC) proposes the use of a new algorithm called "PIE" (Proportional Integral
controller Enhanced) that can effectively and directly control queuing latency to address
the bufferbloat problem.

Implementing this functionality involves modifying existing data structures, adding
a new set of data structures to the library, and adding PIE-related APIs.
This affects structures in the public API/ABI, which is why a deprecation notice is going
to be prepared and sent.

Liguzinski, WojciechX (3):
  sched: add PIE based congestion management
  example/qos_sched: add PIE support
  example/ip_pipeline: add PIE support

 config/rte_config.h                      |   1 -
 drivers/net/softnic/rte_eth_softnic_tm.c |   6 +-
 examples/ip_pipeline/tmgr.c              |   6 +-
 examples/qos_sched/app_thread.c          |   1 -
 examples/qos_sched/cfg_file.c            |  82 ++++-
 examples/qos_sched/init.c                |   7 +-
 examples/qos_sched/profile.cfg           | 196 +++++++----
 lib/sched/meson.build                    |  10 +-
 lib/sched/rte_pie.c                      |  82 +++++
 lib/sched/rte_pie.h                      | 393 +++++++++++++++++++++++
 lib/sched/rte_sched.c                    | 229 +++++++++----
 lib/sched/rte_sched.h                    |  53 ++-
 lib/sched/version.map                    |   3 +
 13 files changed, 888 insertions(+), 181 deletions(-)
 create mode 100644 lib/sched/rte_pie.c
 create mode 100644 lib/sched/rte_pie.h

-- 
2.17.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3 19/20] net/sfc: support flow action COUNT in transfer rules
  2021-07-04 19:45  3%     ` Thomas Monjalon
@ 2021-07-05  8:41  0%       ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-05  8:41 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: David Marchand, Bruce Richardson, dev, Igor Romanov,
	Andy Moreton, Ivan Malov

On 7/4/21 10:45 PM, Thomas Monjalon wrote:
> 02/07/2021 14:53, Andrew Rybchenko:
>> On 7/2/21 3:30 PM, Thomas Monjalon wrote:
>>> 02/07/2021 10:43, Andrew Rybchenko:
>>>> On 7/1/21 4:05 PM, Andrew Rybchenko wrote:
>>>>> On 7/1/21 3:34 PM, David Marchand wrote:
>>>>>> On Thu, Jul 1, 2021 at 11:22 AM Andrew Rybchenko
>>>>>> <andrew.rybchenko@oktetlabs.ru> wrote:
>>>>>>> The build works fine for me on FC34, but it has
>>>>>>> libatomic-11.1.1-3.fc34.x86_64 installed.
>>>>>> I first produced the issue on my "old" FC32.
>>>>>> Afaics, for FC33 and later, gcc now depends on libatomic and the
>>>>>> problem won't be noticed.
>>>>>> FC32 and before are EOL, but I then reproduced the issue on RHEL 8
>>>>>> (and Intel CI reported it on Centos 8 too).
>>>>> I see. Thanks for the clarification.
>>>>>
>>>>>>> I'd like to understand what we're trying to solve here.
>>>>>>> Are we trying to make meson report the missing library
>>>>>>> correctly?
>>>>>>>
>>>>>>> If so, I think I can do a simple check using cc.links()
>>>>>>> which will fail if the library is not found. I'll
>>>>>>> test that it works as expected if the library is not
>>>>>>> completely installed.
>>>>>>>
>>>>>> I tried below diff, and it works for me.
>>>>>> "works" as in net/sfc gets disabled without libatomic installed:
>>> [...]
>>>>>>  # for gcc compiles we need -latomic for 128-bit atomic ops
>>>>>>  if cc.get_id() == 'gcc'
>>>>>> +    code = '''#include <stdio.h>
>>>>>> +    void main() { printf("Atomilink me.\n"); }
>>>>>> +    '''
>>>>>> +    if not cc.links(code, args: '-latomic', name: 'libatomic link check')
>>>>>> +        build = false
>>>>>> +        reason = 'missing dependency, "libatomic"'
>>>>>> +        subdir_done()
>>>>>> +    endif
>>>>>>      ext_deps += cc.find_library('atomic')
>>>>>>  endif
>>>>> Many thanks, LGTM. I'll pick it up and add comments why
>>>>> it is checked this way.
>>>>>
>>>> I've sent v4 with the problem fixed. However, I'm afraid
>>>> build test systems should be updated to have libatomic
>>>> correctly installed. Otherwise, they do not really check
>>>> net/sfc build.
>>> When testing on old systems, sfc won't be tested anymore after this patchset.
>>> On recent systems, sfc should be enabled I guess.
>>> I don't see how to manage better, sorry.
>>>
>> I see. I thought it would be possible to install the missing
>> package on the corresponding systems to make build coverage
>> better.
>>
>> Now I automatically test the build on the problematic distros
>> with the previously missing packages installed, so I have
>> internal build coverage anyway.
> David asked for installing libatomic:
> https://inbox.dpdk.org/ci/CAJFAV8xCNBL4yEZU0c=dJGYS+13QM7Uz7e2qnUkMuM7eaKKw+Q@mail.gmail.com/
>
> We should wait for it to be installed, otherwise the ABI check will fail.

Yes, I see. Thanks.


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
  @ 2021-07-05  9:19  3%   ` Andrew Rybchenko
  2021-07-05  9:30  0%     ` Xueming(Steven) Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-05  9:19 UTC (permalink / raw)
  To: Xueming Li
  Cc: dev, Wang Haiyue, Thomas Monjalon, Kinsella Ray, Parav Pandit,
	Neil Horman

On 7/5/21 9:45 AM, Xueming Li wrote:
> Auxiliary bus [1] provides a way to split function into child-devices
> representing sub-domains of functionality. Each auxiliary device
> represents a part of its parent functionality.
> 
> Auxiliary device is identified by unique device name, sysfs path:
>   /sys/bus/auxiliary/devices/<name>
> 
> Devargs legacy syntax of auxiliary device:
>   -a auxiliary:<name>[,args...]
> Devargs generic syntax of auxiliary device:
>   -a bus=auxiliary,name=<name>/class=<class>/driver=<driver>[,args...]
> 
> [1] kernel auxiliary bus document:
> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Cc: Wang Haiyue <haiyue.wang@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>
> Cc: Kinsella Ray <mdr@ashroe.eu>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

I still don't understand if we really need to make the API
a part of stable API/ABI in the future. Can it be internal?

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
  2021-07-05  9:19  3%   ` Andrew Rybchenko
@ 2021-07-05  9:30  0%     ` Xueming(Steven) Li
  2021-07-05  9:35  0%       ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Xueming(Steven) Li @ 2021-07-05  9:30 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: dev, Wang Haiyue, NBU-Contact-Thomas Monjalon, Kinsella Ray,
	Parav Pandit, Neil Horman

Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Monday, July 5, 2021 5:19 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>
> Cc: dev@dpdk.org; Wang Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Kinsella Ray
> <mdr@ashroe.eu>; Parav Pandit <parav@nvidia.com>; Neil Horman <nhorman@tuxdriver.com>
> Subject: Re: [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
> 
> On 7/5/21 9:45 AM, Xueming Li wrote:
> > Auxiliary bus [1] provides a way to split function into child-devices
> > representing sub-domains of functionality. Each auxiliary device
> > represents a part of its parent functionality.
> >
> > Auxiliary device is identified by unique device name, sysfs path:
> >   /sys/bus/auxiliary/devices/<name>
> >
> > Devargs legacy syntax of auxiliary device:
> >   -a auxiliary:<name>[,args...]
> > Devargs generic syntax of auxiliary device:
> >   -a bus=auxiliary,name=<name>/class=<class>/driver=<driver>[,args...]
> >
> > [1] kernel auxiliary bus document:
> > https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > Cc: Wang Haiyue <haiyue.wang@intel.com>
> > Cc: Thomas Monjalon <thomas@monjalon.net>
> > Cc: Kinsella Ray <mdr@ashroe.eu>
> > Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> 
> I still don't understand if we really need to make the API a part of stable API/ABI in the future. Can it be internal?

There was some discussion on this with Thomas in an earlier version. Users might want to register/unregister their own PMD driver.
Is this a valid scenario?

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
  2021-07-05  9:30  0%     ` Xueming(Steven) Li
@ 2021-07-05  9:35  0%       ` Andrew Rybchenko
  2021-07-05 14:57  0%         ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-05  9:35 UTC (permalink / raw)
  To: Xueming(Steven) Li
  Cc: dev, Wang Haiyue, NBU-Contact-Thomas Monjalon, Kinsella Ray,
	Parav Pandit, Neil Horman

On 7/5/21 12:30 PM, Xueming(Steven) Li wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Sent: Monday, July 5, 2021 5:19 PM
>> To: Xueming(Steven) Li <xuemingl@nvidia.com>
>> Cc: dev@dpdk.org; Wang Haiyue <haiyue.wang@intel.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Kinsella Ray
>> <mdr@ashroe.eu>; Parav Pandit <parav@nvidia.com>; Neil Horman <nhorman@tuxdriver.com>
>> Subject: Re: [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
>>
>> On 7/5/21 9:45 AM, Xueming Li wrote:
>>> Auxiliary bus [1] provides a way to split function into child-devices
>>> representing sub-domains of functionality. Each auxiliary device
>>> represents a part of its parent functionality.
>>>
>>> Auxiliary device is identified by unique device name, sysfs path:
>>>   /sys/bus/auxiliary/devices/<name>
>>>
>>> Devargs legacy syntax of auxiliary device:
>>>   -a auxiliary:<name>[,args...]
>>> Devargs generic syntax of auxiliary device:
>>>   -a bus=auxiliary,name=<name>/class=<class>/driver=<driver>[,args...]
>>>
>>> [1] kernel auxiliary bus document:
>>> https://www.kernel.org/doc/html/latest/driver-api/auxiliary_bus.html
>>>
>>> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
>>> Cc: Wang Haiyue <haiyue.wang@intel.com>
>>> Cc: Thomas Monjalon <thomas@monjalon.net>
>>> Cc: Kinsella Ray <mdr@ashroe.eu>
>>> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>
>> I still don't understand if we really need to make the API a part of stable API/ABI in the future. Can it be internal?
> 
> There was some discussion on this with Thomas in earlier version.
> Users might want to register/unregister their own PMD driver,
> Is this a valid scenario?

Yes, it is true, but should DPDK care that much about
out-of-tree drivers? I'm just asking since I don't know
the techboard's position on it.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
  2021-07-04  9:30  3% ` Jerin Jacob
@ 2021-07-05 10:52  0%   ` Bruce Richardson
  2021-07-05 15:55  0%     ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-07-05 10:52 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Chengwen Feng, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma,
	Radha Mohan Chintakuntla

On Sun, Jul 04, 2021 at 03:00:30PM +0530, Jerin Jacob wrote:
> On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
> >
> > This patch introduces 'dmadevice' which is a generic type of DMA
> > device.
> >
> > The APIs of dmadev library exposes some generic operations which can
> > enable configuration and I/O with the DMA devices.
> >
> > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> 
> Thanks for v1.
> 
> I would suggest finalizing lib/dmadev/rte_dmadev.h before doing the
> implementation so that you don't need
> to waste time on reworking the implementation.
> 

I actually like having the .c file available too. Before we lock down the
.h file and the API, I want to verify the performance of our drivers with
the implementation, and having a working .c file is obviously necessary for
that. So I appreciate having it as part of the RFC.

> Comments inline.
> 
> > ---
<snip>
> > + *
> > + * The DMA framework is built on the following abstraction model:
> > + *
> > + *     ------------    ------------
> > + *     |virt-queue|    |virt-queue|
> > + *     ------------    ------------
> > + *            \           /
> > + *             \         /
> > + *              \       /
> > + *            ------------     ------------
> > + *            | HW-queue |     | HW-queue |
> > + *            ------------     ------------
> > + *                   \            /
> > + *                    \          /
> > + *                     \        /
> > + *                     ----------
> > + *                     | dmadev |
> > + *                     ----------
> 
> Continuing the discussion with @Morten Brørup , I think, we need to
> finalize the model.
> 

+1 and the terminology with regards to queues and channels. With our ioat
hardware, each HW queue was called a channel for instance.

> > + *   a) The DMA operation request must be submitted to the virt queue, virt
> > + *      queues must be created based on HW queues, the DMA device could have
> > + *      multiple HW queues.
> > + *   b) The virt queues on the same HW-queue could represent different contexts,
> > + *      e.g. user could create virt-queue-0 on HW-queue-0 for mem-to-mem
> > + *      transfer scenario, and create virt-queue-1 on the same HW-queue for
> > + *      mem-to-dev transfer scenario.
> > + *   NOTE: user could also create multiple virt queues for mem-to-mem transfer
> > + *         scenario as long as the corresponding driver supports.
> > + *
> > + * The control plane APIs include configure/queue_setup/queue_release/start/
> > + * stop/reset/close, in order to start device work, the call sequence must be
> > + * as follows:
> > + *     - rte_dmadev_configure()
> > + *     - rte_dmadev_queue_setup()
> > + *     - rte_dmadev_start()
> 
> Please add reconfigure behaviour etc, Please check the
> lib/regexdev/rte_regexdev.h
> introduction. I have added similar ones so you could reuse as much as possible.
> 
> 
> > + * The dataplane APIs include two parts:
> > + *   a) The first part is the submission of operation requests:
> > + *        - rte_dmadev_copy()
> > + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> > + *        - rte_dmadev_fill()
> > + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> > + *        - rte_dmadev_fence()   - add a fence force ordering between operations
> > + *        - rte_dmadev_perform() - issue doorbell to hardware
> > + *      These APIs could work with different virt queues which have different
> > + *      contexts.
> > + *      The first four APIs are used to submit the operation request to the virt
> > + *      queue, if the submission is successful, a cookie (as type
> > + *      'dma_cookie_t') is returned, otherwise a negative number is returned.
> > + *   b) The second part is to obtain the result of requests:
> > + *        - rte_dmadev_completed()
> > + *            - return the number of operation requests completed successfully.
> > + *        - rte_dmadev_completed_fails()
> > + *            - return the number of operation requests failed to complete.
> > + *
> > + * The misc APIs include info_get/queue_info_get/stats/xstats/selftest, provide
> > + * information query and self-test capabilities.
> > + *
> > + * About the dataplane APIs MT-safe, there are two dimensions:
> > + *   a) For one virt queue, the submit/completion API could be MT-safe,
> > + *      e.g. one thread do submit operation, another thread do completion
> > + *      operation.
> > + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VQ.
> > + *      If driver don't support it, it's up to the application to guarantee
> > + *      MT-safe.
> > + *   b) For multiple virt queues on the same HW queue, e.g. one thread do
> > + *      operation on virt-queue-0, another thread do operation on virt-queue-1.
> > + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MVQ.
> > + *      If driver don't support it, it's up to the application to guarantee
> > + *      MT-safe.
> 
> From an application PoV it may not be good to write portable
> applications. Please check
> latest thread with @Morten Brørup
> 
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#include <rte_common.h>
> > +#include <rte_memory.h>
> > +#include <rte_errno.h>
> > +#include <rte_compat.h>
> 
> Sort in alphabetical order.
> 
> > +
> > +/**
> > + * dma_cookie_t - an opaque DMA cookie
> 
> Since we are defining the behaviour is not opaque any more.
> I think, it is better to call ring_idx or so.
> 

+1 for ring index. We don't need a separate type for it though, just
document the index as an unsigned return value.

> 
> > +#define RTE_DMA_DEV_CAPA_MT_MVQ (1ull << 11) /**< Support MT-safe of multiple virt queues */
> 
> Please lot of @see for all symbols where it is being used. So that one
> can understand the full scope of
> symbols. See below example.
> 
> #define RTE_REGEXDEV_CAPA_RUNTIME_COMPILATION_F (1ULL << 0)
> /**< RegEx device does support compiling the rules at runtime unlike
>  * loading only the pre-built rule database using
>  * struct rte_regexdev_config::rule_db in rte_regexdev_configure()
>  *
>  * @see struct rte_regexdev_config::rule_db, rte_regexdev_configure()
>  * @see struct rte_regexdev_info::regexdev_capa
>  */
> 
> > + *
> > + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's a error
> > + * code.
> > + * When using cookies, comply with the following rules:
> > + * a) Cookies for each virtual queue are independent.
> > + * b) For a virt queue, the cookie are monotonically incremented, when it reach
> > + *    the INT_MAX, it wraps back to zero.

I disagree with the INT_MAX (or INT32_MAX) value here. If we use that
value, it means that we cannot use implicit wrap-around inside the CPU and
have to check for the INT_MAX value. Better to:
1. Specify that it wraps at UINT16_MAX which allows us to just use a
uint16_t internally and wrap-around automatically, or:
2. Specify that it wraps at a power-of-2 value >= UINT16_MAX, giving
drivers the flexibility at what value to wrap around.

> > + * c) The initial cookie of a virt queue is zero, after the device is stopped or
> > + *    reset, the virt queue's cookie needs to be reset to zero.
> > + * Example:
> > + *    step-1: start one dmadev
> > + *    step-2: enqueue a copy operation, the cookie return is 0
> > + *    step-3: enqueue a copy operation again, the cookie return is 1
> > + *    ...
> > + *    step-101: stop the dmadev
> > + *    step-102: start the dmadev
> > + *    step-103: enqueue a copy operation, the cookie return is 0
> > + *    ...
> > + */
> 
> Good explanation.
> 
> > +typedef int32_t dma_cookie_t;
> 

As I mentioned before, I'd just remove this, and use regular int types,
with "ring_idx" as the name.

> 
> > +
> > +/**
> > + * dma_scatterlist - can hold scatter DMA operation request
> > + */
> > +struct dma_scatterlist {
> 
> I prefer to change scatterlist -> sg
> i.e rte_dma_sg
> 
> > +       void *src;
> > +       void *dst;
> > +       uint32_t length;
> > +};
> > +
> 
> > +
> > +/**
> > + * A structure used to retrieve the contextual information of
> > + * an DMA device
> > + */
> > +struct rte_dmadev_info {
> > +       /**
> > +        * Fields filled by framewok
> 
> typo.
> 
> > +        */
> > +       struct rte_device *device; /**< Generic Device information */
> > +       const char *driver_name; /**< Device driver name */
> > +       int socket_id; /**< Socket ID where memory is allocated */
> > +
> > +       /**
> > +        * Specification fields filled by driver
> > +        */
> > +       uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> > +       uint16_t max_hw_queues; /**< Maximum number of HW queues. */
> > +       uint16_t max_vqs_per_hw_queue;
> > +       /**< Maximum number of virt queues to allocate per HW queue */
> > +       uint16_t max_desc;
> > +       /**< Maximum allowed number of virt queue descriptors */
> > +       uint16_t min_desc;
> > +       /**< Minimum allowed number of virt queue descriptors */
> 
> Please add max_nb_segs. i.e maximum number of segments supported.
> 
> > +
> > +       /**
> > +        * Status fields filled by driver
> > +        */
> > +       uint16_t nb_hw_queues; /**< Number of HW queues configured */
> > +       uint16_t nb_vqs; /**< Number of virt queues configured */
> > +};
> > + i
> > +
> > +/**
> > + * dma_address_type
> > + */
> > +enum dma_address_type {
> > +       DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
> > +       DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
> > +};
> > +
> > +/**
> > + * A structure used to configure a DMA device.
> > + */
> > +struct rte_dmadev_conf {
> > +       enum dma_address_type addr_type; /**< Address type to used */
> 
> I think, there are 3 kinds of limitations/capabilities.
> 
> When the system is configured as IOVA as VA
> 1) Device supports any VA address like memory from rte_malloc(),
> rte_memzone(), malloc, stack memory
> 2) Device support only VA address from rte_malloc(), rte_memzone() i.e
> memory backed by hugepage and added to DMA map.
> 
> When the system is configured as IOVA as PA
> 1) Devices support only PA addresses .
> 
> IMO, Above needs to be  advertised as capability and application needs
> to align with that
> and I dont think application requests the driver to work in any of the modes.
> 
> 

I don't think we need this level of detail for addressing capabilities.
Unless I'm missing something, the hardware should behave exactly as other
hardware does, taking in IOVAs.  If the user wants to check whether virtual
addresses to pinned memory can be used directly, the user can call
"rte_eal_iova_mode". We can't have a situation where some hardware uses one
type of addresses and another hardware the other.

Therefore, the only additional addressing capability we should need to
report is that the hardware can use SVM/SVA and use virtual addresses not
in hugepage memory.

> 
> > +       uint16_t nb_hw_queues; /**< Number of HW-queues enable to use */
> > +       uint16_t max_vqs; /**< Maximum number of virt queues to use */
> 
> You need to what is max value allowed etc i.e it is based on
> info_get() and mention the field
> in info structure
> 
> 
> > +
> > +/**
> > + * dma_transfer_direction
> > + */
> > +enum dma_transfer_direction {
> 
> rte_dma_transter_direction
> 
> > +       DMA_MEM_TO_MEM,
> > +       DMA_MEM_TO_DEV,
> > +       DMA_DEV_TO_MEM,
> > +       DMA_DEV_TO_DEV,
> > +};
> > +
> > +/**
> > + * A structure used to configure a DMA virt queue.
> > + */
> > +struct rte_dmadev_queue_conf {
> > +       enum dma_transfer_direction direction;
> 
> 
> > +       /**< Associated transfer direction */
> > +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> > +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> > +       uint64_t dev_flags; /**< Device specific flags */
> 
> Use of this? Need more comments on this.
> Since it is in slowpath, We can have non opaque names here based on
> each driver capability.
> 
> 
> > +       void *dev_ctx; /**< Device specific context */
> 
> Use of this ? Need more comment ont this.
> 

I think this should be dropped. We should not have any opaque
device-specific info in these structs, rather if a particular device needs
parameters we should call them out. Drivers for which it's not relevant can
ignore them (and report same in capability if necessary). Since this is not
a dataplane API, we aren't concerned too much about perf and can size the
struct appropriately.

> 
> Please add some good amount of reserved bits and have API to init this
> structure for future ABI stability, say rte_dmadev_queue_config_init()
> or so.
> 

I don't think that is necessary. Since the config struct is used only as
parameter to the config function, any changes to it can be managed by
versioning that single function. Padding would only be necessary if we had
an array of these config structs somewhere.

> 
> > +
> > +/**
> > + * A structure used to retrieve information of a DMA virt queue.
> > + */
> > +struct rte_dmadev_queue_info {
> > +       enum dma_transfer_direction direction;
> 
> A queue may support all directions so I think it should be a bitfield.
> 
> > +       /**< Associated transfer direction */
> > +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> > +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> > +       uint64_t dev_flags; /**< Device specific flags */
> > +};
> > +
> 
> > +__rte_experimental
> > +static inline dma_cookie_t
> > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
> > +                  const struct dma_scatterlist *sg,
> > +                  uint32_t sg_len, uint64_t flags)
> 
> I would like to change this as:
> rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
> rte_dma_sg *src, uint32_t nb_src,
> const struct rte_dma_sg *dst, uint32_t nb_dst) or so allow the use case like
> src 30 MB copy can be splitted as written as 1 MB x 30 dst.
> 
> 
> 
> > +{
> > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +       return (*dev->copy_sg)(dev, vq_id, sg, sg_len, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a fill operation onto the DMA virt queue
> > + *
> > + * This queues up a fill operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vq_id
> > + *   The identifier of virt queue.
> > + * @param pattern
> > + *   The pattern to populate the destination buffer with.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the destination buffer.
> > + * @param flags
> > + *   An opaque flags for this operation.
> 
> PLEASE REMOVE opaque stuff from fastpath it will be a pain for
> application writers as
> they need to write multiple combinations of fastpath. flags are OK, if
> we have a valid
> generic flag now to control the transfer behavior.
> 

+1. Flags need to be explicitly listed. If we don't have any flags for now,
we can specify that the value must be given as zero and it's for future
use.

> 
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Add a fence to force ordering between operations
> > + *
> > + * This adds a fence to a sequence of operations to enforce ordering, such that
> > + * all operations enqueued before the fence must be completed before operations
> > + * after the fence.
> > + * NOTE: Since this fence may be added as a flag to the last operation enqueued,
> > + * this API may not function correctly when called immediately after an
> > + * "rte_dmadev_perform" call i.e. before any new operations are enqueued.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vq_id
> > + *   The identifier of virt queue.
> > + *
> > + * @return
> > + *   - =0: Successful add fence.
> > + *   - <0: Failure to add fence.
> > + *
> > + * NOTE: The caller must ensure that the input parameter is valid and the
> > + *       corresponding device supports the operation.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id)
> > +{
> > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +       return (*dev->fence)(dev, vq_id);
> > +}
> 
> Since HW submission is in a queue(FIFO) the ordering is always
> maintained. Right?
> Could you share more details and use case of fence() from
> driver/application PoV?
> 

There are different kinds of ordering to consider, ordering of completions
and the ordering of operations. While jobs are reported as completed to the
user in order, for performance, hardware may overlap individual jobs within
a burst (or even across bursts). Therefore, we need a fence operation to
inform hardware that one job should not be started until the other has
fully completed.

> 
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Trigger hardware to begin performing enqueued operations
> > + *
> > + * This API is used to write the "doorbell" to the hardware to trigger it
> > + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vq_id
> > + *   The identifier of virt queue.
> > + *
> > + * @return
> > + *   - =0: Successful trigger hardware.
> > + *   - <0: Failure to trigger hardware.
> > + *
> > + * NOTE: The caller must ensure that the input parameter is valid and the
> > + *       corresponding device supports the operation.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_perform(uint16_t dev_id, uint16_t vq_id)
> > +{
> > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +       return (*dev->perform)(dev, vq_id);
> > +}
> 
> Since we have additional function call overhead in all the
> applications for this scheme, I would like to understand
> the use of doing this way vs enq does the doorbell implicitly from
> driver/application PoV?
> 

In our benchmarks it's just faster. When we tested it, the overhead of the
function calls was noticeably less than the cost of building up the
parameter array(s) for passing the jobs in as a burst. [We don't see this
cost with things like NIC I/O since DPDK tends to already have the mbuf
fully populated before the TX call anyway.]

> 
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Returns the number of operations that have been successful completed.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vq_id
> > + *   The identifier of virt queue.
> > + * @param nb_cpls
> > + *   The maximum number of completed operations that can be processed.
> > + * @param[out] cookie
> > + *   The last completed operation's cookie.
> > + * @param[out] has_error
> > + *   Indicates if there are transfer error.
> > + *
> > + * @return
> > + *   The number of operations that successful completed.
> 
> successfully
> 
> > + *
> > + * NOTE: The caller must ensure that the input parameter is valid and the
> > + *       corresponding device supports the operation.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id, const uint16_t nb_cpls,
> > +                    dma_cookie_t *cookie, bool *has_error)
> > +{
> > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +       has_error = false;
> > +       return (*dev->completed)(dev, vq_id, nb_cpls, cookie, has_error);
> 
> It may be better to have cookie/ring_idx as third argument.
> 

No strong opinions here, but having it as in the code above means all
input parameters come before all output, which makes sense to me.

> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Returns the number of operations that failed to complete.
> > + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vq_id
> > + *   The identifier of virt queue.
> > + * @param nb_status
> > + *   Indicates the size  of status array.
> > + * @param[out] status
> > + *   The error code of operations that failed to complete.
> > + * @param[out] cookie
> > + *   The last failed completed operation's cookie.
> > + *
> > + * @return
> > + *   The number of operations that failed to complete.
> > + *
> > + * NOTE: The caller must ensure that the input parameter is valid and the
> > + *       corresponding device supports the operation.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> > +                          const uint16_t nb_status, uint32_t *status,
> > +                          dma_cookie_t *cookie)
> 
> IMO, it is better to move cookie/rind_idx at 3.
> Why it would return any array of errors? since it called after
> rte_dmadev_completed() has
> has_error. Is it better to change
> 
> rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, dma_cookie_t
> *cookie,  uint32_t *status)
> 
> I also think, we may need to set status as bitmask and enumerate all
> the combination of error codes
> of all the driver and return string from driver existing rte_flow_error
> 
> See
> struct rte_flow_error {
>         enum rte_flow_error_type type; /**< Cause field and error types. */
>         const void *cause; /**< Object responsible for the error. */
>         const char *message; /**< Human-readable error message. */
> };
> 

I think we need a multi-return value API here, as we may add operations in
future which have non-error status values to return. The obvious case is
DMA engines which support "compare" operations. In that case a successful
compare (as in there were no DMA or HW errors) can return "equal" or
"not-equal" as statuses. For general "copy" operations, the faster
completion op can be used to just return successful values (and only call
this status version on error), while apps using those compare ops, or a
mixture of copy and compare ops, would always use the slower one that
returns status values for each and every op.

The ioat APIs used 32-bit integer values for this status array so as to
allow e.g. 16 bits for an error code and 16 bits for future status values. For
most operations there should be a fairly small set of things that can go
wrong, i.e. bad source address, bad destination address or invalid length.
Within that we may have a couple of specifics for why an address is bad,
but even so I don't think we need to start having multiple bit
combinations.

> > +{
> > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +       return (*dev->completed_fails)(dev, vq_id, nb_status, status, cookie);
> > +}
> > +
> > +struct rte_dmadev_stats {
> > +       uint64_t enqueue_fail_count;
> > +       /**< Conut of all operations which failed enqueued */
> > +       uint64_t enqueued_count;
> > +       /**< Count of all operations which successful enqueued */
> > +       uint64_t completed_fail_count;
> > +       /**< Count of all operations which failed to complete */
> > +       uint64_t completed_count;
> > +       /**< Count of all operations which successful complete */
> > +};
> 
> We need to have capability API to tell which items are
> updated/supported by the driver.
> 

I also would remove the enqueue fail counts, since they are better counted
by the app. If a driver reports 20,000 failures we have no way of knowing
if that is 20,000 unique operations which failed to enqueue or a single
operation which failed to enqueue 20,000 times but succeeded on attempt
20,001.

> 
> > diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> > new file mode 100644
> > index 0000000..a3afea2
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev_core.h
> > @@ -0,0 +1,98 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright 2021 HiSilicon Limited.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_CORE_H_
> > +#define _RTE_DMADEV_CORE_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE DMA Device internal header.
> > + *
> > + * This header contains internal data types. But they are still part of the
> > + * public API because they are used by inline public functions.
> > + */
> > +
> > +struct rte_dmadev;
> > +
> > +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vq_id,
> > +                                     void *src, void *dst,
> > +                                     uint32_t length, uint64_t flags);
> > +/**< @internal Function used to enqueue a copy operation. */
> 
> To avoid namespace conflict(as it is public API) use rte_
> 
> 
> > +
> > +/**
> > + * The data structure associated with each DMA device.
> > + */
> > +struct rte_dmadev {
> > +       /**< Enqueue a copy operation onto the DMA device. */
> > +       dmadev_copy_t copy;
> > +       /**< Enqueue a scatter list copy operation onto the DMA device. */
> > +       dmadev_copy_sg_t copy_sg;
> > +       /**< Enqueue a fill operation onto the DMA device. */
> > +       dmadev_fill_t fill;
> > +       /**< Enqueue a scatter list fill operation onto the DMA device. */
> > +       dmadev_fill_sg_t fill_sg;
> > +       /**< Add a fence to force ordering between operations. */
> > +       dmadev_fence_t fence;
> > +       /**< Trigger hardware to begin performing enqueued operations. */
> > +       dmadev_perform_t perform;
> > +       /**< Returns the number of operations that successful completed. */
> > +       dmadev_completed_t completed;
> > +       /**< Returns the number of operations that failed to complete. */
> > +       dmadev_completed_fails_t completed_fails;
> 
> We need to limit fastpath items in 1 CL
> 

I don't think that is going to be possible. I also would like to see
numbers to check if we benefit much from having these fastpath ops separate
from the regular ops.

> > +
> > +       void *dev_private; /**< PMD-specific private data */
> > +       const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD */
> > +
> > +       uint16_t dev_id; /**< Device ID for this instance */
> > +       int socket_id; /**< Socket ID where memory is allocated */
> > +       struct rte_device *device;
> > +       /**< Device info. supplied during device initialization */
> > +       const char *driver_name; /**< Driver info. supplied by probing */
> > +       char name[RTE_DMADEV_NAME_MAX_LEN]; /**< Device name */
> > +
> > +       RTE_STD_C11
> > +       uint8_t attached : 1; /**< Flag indicating the device is attached */
> > +       uint8_t started : 1; /**< Device state: STARTED(1)/STOPPED(0) */
> 
> Add a couple of reserved fields for future ABI stability.
> 
> > +
> > +} __rte_cache_aligned;
> > +
> > +extern struct rte_dmadev rte_dmadevices[];
> > +

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 21.11] telemetry: remove experimental tags from APIs
  @ 2021-07-05 10:58  3%   ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-05 10:58 UTC (permalink / raw)
  To: Power, Ciara; +Cc: dev, Ray Kinsella

On Mon, Jul 05, 2021 at 11:09:38AM +0100, Power, Ciara wrote:
> 
> 
> >-----Original Message-----
> >From: Richardson, Bruce <bruce.richardson@intel.com>
> >Sent: Friday 2 July 2021 16:23
> >To: dev@dpdk.org
> >Cc: Ray Kinsella <mdr@ashroe.eu>; Power, Ciara <ciara.power@intel.com>;
> >Richardson, Bruce <bruce.richardson@intel.com>
> >Subject: [PATCH 21.11] telemetry: remove experimental tags from APIs
> >
> >The telemetry APIs have been present and unchanged for >1 year now, so
> >remove experimental tag from them.
> >
> >Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> >---
> > lib/telemetry/rte_telemetry.h | 18 ------------------
> > lib/telemetry/version.map     |  2 +-
> > 2 files changed, 1 insertion(+), 19 deletions(-)
> >
> <snip>
> 
> Hi Bruce,
> 
> +1 for this change.
> 
> I think there are some experimental tags missing from this patch - the legacy telemetry functions that are in "metrics/rte_metrics_telemetry.h" currently have the tags too.

I'm not sure about making those part of the stable ABI.

> Also, there is a reference to the library being experimental in the Telemetry User Guide doc.
> 
I missed checking the "howto" doc on telemetry, yes. I'll include that in a
v2.


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
  2021-07-05  9:35  0%       ` Andrew Rybchenko
@ 2021-07-05 14:57  0%         ` Thomas Monjalon
  2021-07-05 15:06  0%           ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2021-07-05 14:57 UTC (permalink / raw)
  To: Xueming(Steven) Li, Andrew Rybchenko, techboard
  Cc: dev, Wang Haiyue, Kinsella Ray, Parav Pandit, david.marchand

05/07/2021 11:35, Andrew Rybchenko:
> On 7/5/21 12:30 PM, Xueming(Steven) Li wrote:
> > From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> >> I still don't understand if we really need to make the API a part of stable API/ABI in the future. Can it be internal?
> > 
> > There was some discussion on this with Thomas in earlier version.
> > Users might want to register/unregister their own PMD driver,
> > Is this a valid scenario?
> 
> Yes, it is true, but should DPDK care that much about
> out-of-tree drivers? I'm just asking since I don't know
> the techboard's position on it.

I think there is a consensus to allow out-of-tree drivers
without any compatibility commitment.

Some other bus drivers are exporting some API like in this patch.
We could discuss again in techboard what to make internal.
If it is decided to hide buses API, we could change all bus drivers
later in DPDK 21.11.




^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus
  2021-07-05 14:57  0%         ` Thomas Monjalon
@ 2021-07-05 15:06  0%           ` Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-05 15:06 UTC (permalink / raw)
  To: Thomas Monjalon, Xueming(Steven) Li, techboard
  Cc: dev, Wang Haiyue, Kinsella Ray, Parav Pandit, david.marchand

On 7/5/21 5:57 PM, Thomas Monjalon wrote:
> 05/07/2021 11:35, Andrew Rybchenko:
>> On 7/5/21 12:30 PM, Xueming(Steven) Li wrote:
>>> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>> I still don't understand if we really need to make the API a part of stable API/ABI in the future. Can it be internal?
>>>
>>> There was some discussion on this with Thomas in earlier version.
>>> Users might want to register/unregister their own PMD driver,
>>> Is this a valid scenario?
>>
>> Yes, it is true, but should DPDK care that much about
>> out-of-tree drivers? I'm just asking since I don't know
>> the techboard's position on it.
> 
> I think there is a consensus to allow out-of-tree drivers
> without any compatibility commitment.
> 
> Some other bus drivers are exporting some API like in this patch.
> We could discuss again in techboard what to make internal.
> If it is decided to hide buses API, we could change all bus drivers
> later in DPDK 21.11.

OK, thanks.

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v6 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-07-05 15:21  3%           ` Anatoly Burakov
  2021-07-05 15:21  3%           ` [dpdk-dev] [PATCH v6 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-05 15:21 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  1 +
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 121 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index a6ecfdf3ce..c84ac280f5 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -84,6 +84,7 @@ API Changes
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..081682f88b 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index 0361af0d85..7ed196ec22 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index fc9bb5a3e7..d12437d19d 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..17370b77dc 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v6 4/7] power: remove thread safety from PMD power API's
    2021-07-05 15:21  3%           ` [dpdk-dev] [PATCH v6 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-07-05 15:21  3%           ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-05 15:21 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus, konstantin.ananyev

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
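The stopped-queue guard this patch adds to both enable() and disable() can be sketched as follows (self-contained; `mock_queue_state` and `check_queue` are stand-ins for `rte_eth_rx_queue_info_get()` and the in-function check, used here only to show the error mapping):

```c
#include <errno.h>
#include <stdint.h>

/* stand-in for ethdev queue state: -1 = invalid queue, 0 = running, 1 = stopped */
static int mock_queue_state;

/* mirrors the queue_stopped() helper added by this patch:
 * <0 on invalid queue, 1 if stopped, 0 if still running */
static int
queue_stopped(uint16_t port_id, uint16_t queue_id)
{
	(void)port_id;
	(void)queue_id;
	if (mock_queue_state < 0)
		return -1;
	return mock_queue_state;
}

/* the guard used before changing the power management scheme */
static int
check_queue(uint16_t port_id, uint16_t queue_id)
{
	int ret = queue_stopped(port_id, queue_id);

	if (ret != 1)
		/* error means invalid queue, 0 means queue wasn't stopped */
		return ret < 0 ? -EINVAL : -EBUSY;
	return 0;
}
```

So callers get -EINVAL for a bad port/queue and -EBUSY if they forgot to stop the queue first, matching the documented contract.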

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   5 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 9d1cfac395..f015c509fc 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -88,6 +88,11 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
+
 ABI Changes
 -----------
 
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
  2021-07-05 10:52  0%   ` Bruce Richardson
@ 2021-07-05 15:55  0%     ` Jerin Jacob
  2021-07-05 17:16  0%       ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Jerin Jacob @ 2021-07-05 15:55 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chengwen Feng, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma,
	Radha Mohan Chintakuntla


On Mon, Jul 5, 2021 at 4:22 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Sun, Jul 04, 2021 at 03:00:30PM +0530, Jerin Jacob wrote:
> > On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
> > >
> > > This patch introduces 'dmadevice' which is a generic type of DMA
> > > device.
> > >
> > > The APIs of dmadev library exposes some generic operations which can
> > > enable configuration and I/O with the DMA devices.
> > >
> > > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> >
> > Thanks for v1.
> >
> > I would suggest finalizing lib/dmadev/rte_dmadev.h before doing the
> > implementation so that you don't need
> > to waste time on reworking the implementation.
> >
>
> I actually like having the .c file available too. Before we lock down the
> .h file and the API, I want to verify the performance of our drivers with
> the implementation, and having a working .c file is obviously necessary for
> that. So I appreciate having it as part of the RFC.

Ack.

>
> > Comments inline.
> >
> > > ---
> <snip>
> > > + *
> > > + * The DMA framework is built on the following abstraction model:
> > > + *
> > > + *     ------------    ------------
> > > + *     |virt-queue|    |virt-queue|
> > > + *     ------------    ------------
> > > + *            \           /
> > > + *             \         /
> > > + *              \       /
> > > + *            ------------     ------------
> > > + *            | HW-queue |     | HW-queue |
> > > + *            ------------     ------------
> > > + *                   \            /
> > > + *                    \          /
> > > + *                     \        /
> > > + *                     ----------
> > > + *                     | dmadev |
> > > + *                     ----------
> >
> > Continuing the discussion with @Morten Brørup , I think, we need to
> > finalize the model.
> >
>
> +1 and the terminology with regards to queues and channels. With our ioat
> hardware, each HW queue was called a channel for instance.

Looks like <dmadev> <> <channel> can cover all the use cases; if the
HW has more than one queue, each can be exposed as a separate dmadev
device.


>
> > > + *   a) The DMA operation request must be submitted to the virt queue, virt
> > > + *      queues must be created based on HW queues, the DMA device could have
> > > + *      multiple HW queues.
> > > + *   b) The virt queues on the same HW-queue could represent different contexts,
> > > + *      e.g. user could create virt-queue-0 on HW-queue-0 for mem-to-mem
> > > + *      transfer scenario, and create virt-queue-1 on the same HW-queue for
> > > + *      mem-to-dev transfer scenario.
> > > + *   NOTE: user could also create multiple virt queues for mem-to-mem transfer
> > > + *         scenario as long as the corresponding driver supports.
> > > + *
> > > + * The control plane APIs include configure/queue_setup/queue_release/start/
> > > + * stop/reset/close, in order to start device work, the call sequence must be
> > > + * as follows:
> > > + *     - rte_dmadev_configure()
> > > + *     - rte_dmadev_queue_setup()
> > > + *     - rte_dmadev_start()
> >
> > Please add reconfigure behaviour etc, Please check the
> > lib/regexdev/rte_regexdev.h
> > introduction. I have added similar ones so you could reuse as much as possible.
> >
> >
> > > + * The dataplane APIs include two parts:
> > > + *   a) The first part is the submission of operation requests:
> > > + *        - rte_dmadev_copy()
> > > + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> > > + *        - rte_dmadev_fill()
> > > + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> > > + *        - rte_dmadev_fence()   - add a fence force ordering between operations
> > > + *        - rte_dmadev_perform() - issue doorbell to hardware
> > > + *      These APIs could work with different virt queues which have different
> > > + *      contexts.
> > > + *      The first four APIs are used to submit the operation request to the virt
> > > + *      queue, if the submission is successful, a cookie (as type
> > > + *      'dma_cookie_t') is returned, otherwise a negative number is returned.
> > > + *   b) The second part is to obtain the result of requests:
> > > + *        - rte_dmadev_completed()
> > > + *            - return the number of operation requests completed successfully.
> > > + *        - rte_dmadev_completed_fails()
> > > + *            - return the number of operation requests failed to complete.
> > > + *
> > > + * The misc APIs include info_get/queue_info_get/stats/xstats/selftest, provide
> > > + * information query and self-test capabilities.
> > > + *
> > > + * About the dataplane APIs MT-safe, there are two dimensions:
> > > + *   a) For one virt queue, the submit/completion API could be MT-safe,
> > > + *      e.g. one thread do submit operation, another thread do completion
> > > + *      operation.
> > > + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VQ.
> > > + *      If driver don't support it, it's up to the application to guarantee
> > > + *      MT-safe.
> > > + *   b) For multiple virt queues on the same HW queue, e.g. one thread do
> > > + *      operation on virt-queue-0, another thread do operation on virt-queue-1.
> > > + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MVQ.
> > > + *      If driver don't support it, it's up to the application to guarantee
> > > + *      MT-safe.
> >
> > From an application PoV it may not be good to write portable
> > applications. Please check
> > latest thread with @Morten Brørup
> >
> > > + */
> > > +
> > > +#ifdef __cplusplus
> > > +extern "C" {
> > > +#endif
> > > +
> > > +#include <rte_common.h>
> > > +#include <rte_memory.h>
> > > +#include <rte_errno.h>
> > > +#include <rte_compat.h>
> >
> > Sort in alphabetical order.
> >
> > > +
> > > +/**
> > > + * dma_cookie_t - an opaque DMA cookie
> >
> > Since we are defining the behaviour is not opaque any more.
> > I think, it is better to call ring_idx or so.
> >
>
> +1 for ring index. We don't need a separate type for it though, just
> document the index as an unsigned return value.
>
> >
> > > +#define RTE_DMA_DEV_CAPA_MT_MVQ (1ull << 11) /**< Support MT-safe of multiple virt queues */
> >
> > Please lot of @see for all symbols where it is being used. So that one
> > can understand the full scope of
> > symbols. See below example.
> >
> > #define RTE_REGEXDEV_CAPA_RUNTIME_COMPILATION_F (1ULL << 0)
> > /**< RegEx device does support compiling the rules at runtime unlike
> >  * loading only the pre-built rule database using
> >  * struct rte_regexdev_config::rule_db in rte_regexdev_configure()
> >  *
> >  * @see struct rte_regexdev_config::rule_db, rte_regexdev_configure()
> >  * @see struct rte_regexdev_info::regexdev_capa
> >  */
> >
> > > + *
> > > + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's a error
> > > + * code.
> > > + * When using cookies, comply with the following rules:
> > > + * a) Cookies for each virtual queue are independent.
> > > + * b) For a virt queue, the cookie are monotonically incremented, when it reach
> > > + *    the INT_MAX, it wraps back to zero.
>
> I disagree with the INT_MAX (or INT32_MAX) value here. If we use that
> value, it means that we cannot use implicit wrap-around inside the CPU and
> have to check for the INT_MAX value. Better to:
> 1. Specify that it wraps at UINT16_MAX which allows us to just use a
> uint16_t internally and wrap-around automatically, or:
> 2. Specify that it wraps at a power-of-2 value >= UINT16_MAX, giving
> drivers the flexibility at what value to wrap around.

I think (2) is better than (1). Even better would be to wrap around the number of
descriptors configured in dev_configure() (we can make this a power of 2).
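A quick self-contained sketch of why a power-of-2 wrap point lets the ring index wrap implicitly, with no compare against a maximum (the `NB_DESC` value and `next_idx` helper are illustrative assumptions, not proposed API):

```c
#include <assert.h>
#include <stdint.h>

/* assumed descriptor count from dev_configure(); must be a power of 2 */
#define NB_DESC 1024u
#define RING_MASK (NB_DESC - 1u) /* valid only because NB_DESC is a power of 2 */

/* advance a ring index: wrap-around falls out of the mask, no branch needed */
static uint16_t
next_idx(uint16_t idx)
{
	return (uint16_t)((idx + 1u) & RING_MASK);
}
```

With an INT_MAX wrap point, by contrast, every increment would need an explicit `if (idx == INT_MAX)` check, which is what Bruce's objection above is about.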


>
> > > + * c) The initial cookie of a virt queue is zero, after the device is stopped or
> > > + *    reset, the virt queue's cookie needs to be reset to zero.
> > > + * Example:
> > > + *    step-1: start one dmadev
> > > + *    step-2: enqueue a copy operation, the cookie return is 0
> > > + *    step-3: enqueue a copy operation again, the cookie return is 1
> > > + *    ...
> > > + *    step-101: stop the dmadev
> > > + *    step-102: start the dmadev
> > > + *    step-103: enqueue a copy operation, the cookie return is 0
> > > + *    ...
> > > + */
> >
> > Good explanation.
> >
> > > +typedef int32_t dma_cookie_t;
> >
>
> As I mentioned before, I'd just remove this, and use regular int types,
> with "ring_idx" as the name.

+1

>
> >
> > > +
> > > +/**
> > > + * dma_scatterlist - can hold scatter DMA operation request
> > > + */
> > > +struct dma_scatterlist {
> >
> > I prefer to change scatterlist -> sg
> > i.e rte_dma_sg
> >
> > > +       void *src;
> > > +       void *dst;
> > > +       uint32_t length;
> > > +};
> > > +
> >
> > > +
> > > +/**
> > > + * A structure used to retrieve the contextual information of
> > > + * an DMA device
> > > + */
> > > +struct rte_dmadev_info {
> > > +       /**
> > > +        * Fields filled by framewok
> >
> > typo.
> >
> > > +        */
> > > +       struct rte_device *device; /**< Generic Device information */
> > > +       const char *driver_name; /**< Device driver name */
> > > +       int socket_id; /**< Socket ID where memory is allocated */
> > > +
> > > +       /**
> > > +        * Specification fields filled by driver
> > > +        */
> > > +       uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> > > +       uint16_t max_hw_queues; /**< Maximum number of HW queues. */
> > > +       uint16_t max_vqs_per_hw_queue;
> > > +       /**< Maximum number of virt queues to allocate per HW queue */
> > > +       uint16_t max_desc;
> > > +       /**< Maximum allowed number of virt queue descriptors */
> > > +       uint16_t min_desc;
> > > +       /**< Minimum allowed number of virt queue descriptors */
> >
> > Please add max_nb_segs. i.e maximum number of segments supported.
> >
> > > +
> > > +       /**
> > > +        * Status fields filled by driver
> > > +        */
> > > +       uint16_t nb_hw_queues; /**< Number of HW queues configured */
> > > +       uint16_t nb_vqs; /**< Number of virt queues configured */
> > > +};
> > > +
> > > +
> > > +/**
> > > + * dma_address_type
> > > + */
> > > +enum dma_address_type {
> > > +       DMA_ADDRESS_TYPE_IOVA, /**< Use IOVA as dma address */
> > > +       DMA_ADDRESS_TYPE_VA, /**< Use VA as dma address */
> > > +};
> > > +
> > > +/**
> > > + * A structure used to configure a DMA device.
> > > + */
> > > +struct rte_dmadev_conf {
> > > +       enum dma_address_type addr_type; /**< Address type to used */
> >
> > I think, there are 3 kinds of limitations/capabilities.
> >
> > When the system is configured as IOVA as VA
> > 1) Device supports any VA address like memory from rte_malloc(),
> > rte_memzone(), malloc, stack memory
> > 2) Device support only VA address from rte_malloc(), rte_memzone() i.e
> > memory backed by hugepage and added to DMA map.
> >
> > When the system is configured as IOVA as PA
> > 1) Devices support only PA addresses .
> >
> > IMO, Above needs to be  advertised as capability and application needs
> > to align with that
> > and I dont think application requests the driver to work in any of the modes.
> >
> >
>
> I don't think we need this level of detail for addressing capabilities.
> Unless I'm missing something, the hardware should behave exactly as other
> hardware does taking in iova's.  If the user wants to check whether virtual
> addresses to pinned memory can be used directly, the user can call
> "rte_eal_iova_mode". We can't have a situation where some hardware uses one
> type of addresses and another hardware the other.
>
> Therefore, the only additional addressing capability we should need to
> report is that the hardware can use SVM/SVA and use virtual addresses not
> in hugepage memory.

+1.


>
> >
> > > +       uint16_t nb_hw_queues; /**< Number of HW-queues enable to use */
> > > +       uint16_t max_vqs; /**< Maximum number of virt queues to use */
> >
> > You need to what is max value allowed etc i.e it is based on
> > info_get() and mention the field
> > in info structure
> >
> >
> > > +
> > > +/**
> > > + * dma_transfer_direction
> > > + */
> > > +enum dma_transfer_direction {
> >
> > rte_dma_transter_direction
> >
> > > +       DMA_MEM_TO_MEM,
> > > +       DMA_MEM_TO_DEV,
> > > +       DMA_DEV_TO_MEM,
> > > +       DMA_DEV_TO_DEV,
> > > +};
> > > +
> > > +/**
> > > + * A structure used to configure a DMA virt queue.
> > > + */
> > > +struct rte_dmadev_queue_conf {
> > > +       enum dma_transfer_direction direction;
> >
> >
> > > +       /**< Associated transfer direction */
> > > +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> > > +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> > > +       uint64_t dev_flags; /**< Device specific flags */
> >
> > Use of this? Need more comments on this.
> > Since it is in slowpath, We can have non opaque names here based on
> > each driver capability.
> >
> >
> > > +       void *dev_ctx; /**< Device specific context */
> >
> > Use of this ? Need more comment ont this.
> >
>
> I think this should be dropped. We should not have any opaque
> device-specific info in these structs, rather if a particular device needs
> parameters we should call them out. Drivers for which it's not relevant can
> ignore them (and report same in capability if necessary). Since this is not
> a dataplane API, we aren't concerned too much about perf and can size the
> struct appropriately.
>
> >
> > Please add some good amount of reserved bits and have API to init this
> > structure for future ABI stability, say rte_dmadev_queue_config_init()
> > or so.
> >
>
> I don't think that is necessary. Since the config struct is used only as
> parameter to the config function, any changes to it can be managed by
> versioning that single function. Padding would only be necessary if we had
> an array of these config structs somewhere.

OK.

For some reason, the versioning API looks ugly to me in code; keeping
some rsvd fields with an init function looks cleaner to me.

But I agree, function versioning works in this case. No need to invent
another API if that is not general DPDK practice.

In other libraries, I have seen such an _init function used for this, as
well as for filling in default values (in some cases the implementation
default is not zero), so that the application can avoid a memset of the
param structure. Added rte_event_queue_default_conf_get() in the
eventdev spec for this.

No strong opinion on this.



>
> >
> > > +
> > > +/**
> > > + * A structure used to retrieve information of a DMA virt queue.
> > > + */
> > > +struct rte_dmadev_queue_info {
> > > +       enum dma_transfer_direction direction;
> >
> > A queue may support all directions so I think it should be a bitfield.
> >
> > > +       /**< Associated transfer direction */
> > > +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> > > +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> > > +       uint64_t dev_flags; /**< Device specific flags */
> > > +};
> > > +
> >
> > > +__rte_experimental
> > > +static inline dma_cookie_t
> > > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
> > > +                  const struct dma_scatterlist *sg,
> > > +                  uint32_t sg_len, uint64_t flags)
> >
> > I would like to change this as:
> > rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
> > rte_dma_sg *src, uint32_t nb_src,
> > const struct rte_dma_sg *dst, uint32_t nb_dst) or so allow the use case like
> > src 30 MB copy can be splitted as written as 1 MB x 30 dst.
> >
> >
> >
> > > +{
> > > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > > +       return (*dev->copy_sg)(dev, vq_id, sg, sg_len, flags);
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Enqueue a fill operation onto the DMA virt queue
> > > + *
> > > + * This queues up a fill operation to be performed by hardware, but does not
> > > + * trigger hardware to begin that operation.
> > > + *
> > > + * @param dev_id
> > > + *   The identifier of the device.
> > > + * @param vq_id
> > > + *   The identifier of virt queue.
> > > + * @param pattern
> > > + *   The pattern to populate the destination buffer with.
> > > + * @param dst
> > > + *   The address of the destination buffer.
> > > + * @param length
> > > + *   The length of the destination buffer.
> > > + * @param flags
> > > + *   An opaque flags for this operation.
> >
> > PLEASE REMOVE opaque stuff from fastpath it will be a pain for
> > application writers as
> > they need to write multiple combinations of fastpath. flags are OK, if
> > we have a valid
> > generic flag now to control the transfer behavior.
> >
>
> +1. Flags need to be explicitly listed. If we don't have any flags for now,
> we can specify that the value must be given as zero and it's for future
> use.

OK.

>
> >
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Add a fence to force ordering between operations
> > > + *
> > > + * This adds a fence to a sequence of operations to enforce ordering, such that
> > > + * all operations enqueued before the fence must be completed before operations
> > > + * after the fence.
> > > + * NOTE: Since this fence may be added as a flag to the last operation enqueued,
> > > + * this API may not function correctly when called immediately after an
> > > + * "rte_dmadev_perform" call i.e. before any new operations are enqueued.
> > > + *
> > > + * @param dev_id
> > > + *   The identifier of the device.
> > > + * @param vq_id
> > > + *   The identifier of virt queue.
> > > + *
> > > + * @return
> > > + *   - =0: Successful add fence.
> > > + *   - <0: Failure to add fence.
> > > + *
> > > + * NOTE: The caller must ensure that the input parameter is valid and the
> > > + *       corresponding device supports the operation.
> > > + */
> > > +__rte_experimental
> > > +static inline int
> > > +rte_dmadev_fence(uint16_t dev_id, uint16_t vq_id)
> > > +{
> > > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > > +       return (*dev->fence)(dev, vq_id);
> > > +}
> >
> > Since HW submission is in a queue(FIFO) the ordering is always
> > maintained. Right?
> > Could you share more details and use case of fence() from
> > driver/application PoV?
> >
>
> There are different kinds of ordering to consider, ordering of completions
> and the ordering of operations. While jobs are reported as completed to the
> user in order, for performance hardware, may overlap individual jobs within
> a burst (or even across bursts). Therefore, we need a fence operation to
> inform hardware that one job should not be started until the other has
> fully completed.

Got it. In order to keep the fastpath within the first cacheline (saving
8B for the pointer) and to avoid the function-call overhead, can we use
one bit of the op function's flags to enable the fence?

>
> >
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Trigger hardware to begin performing enqueued operations
> > > + *
> > > + * This API is used to write the "doorbell" to the hardware to trigger it
> > > + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> > > + *
> > > + * @param dev_id
> > > + *   The identifier of the device.
> > > + * @param vq_id
> > > + *   The identifier of virt queue.
> > > + *
> > > + * @return
> > > + *   - =0: Successful trigger hardware.
> > > + *   - <0: Failure to trigger hardware.
> > > + *
> > > + * NOTE: The caller must ensure that the input parameter is valid and the
> > > + *       corresponding device supports the operation.
> > > + */
> > > +__rte_experimental
> > > +static inline int
> > > +rte_dmadev_perform(uint16_t dev_id, uint16_t vq_id)
> > > +{
> > > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > > +       return (*dev->perform)(dev, vq_id);
> > > +}
> >
> > Since we have additional function call overhead in all the
> > applications for this scheme, I would like to understand
> > the use of doing this way vs enq does the doorbell implicitly from
> > driver/application PoV?
> >
>
> In our benchmarks it's just faster. When we tested it, the overhead of the
> function calls was noticably less than the cost of building up the
> parameter array(s) for passing the jobs in as a burst. [We don't see this
> cost with things like NIC I/O since DPDK tends to already have the mbuf
> fully populated before the TX call anyway.]

OK. I agree on the stack-population cost.

My question was more about doing the doorbell update implicitly in enq. Is
the doorbell write costly on other HW compared to a function call? On our
HW, it is just a write of the number of instructions submitted to a
register.

Also, with a separate function we need to access the internal PMD memory
structure again to find where to write, etc.


>
> >
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Returns the number of operations that have been successful completed.
> > > + *
> > > + * @param dev_id
> > > + *   The identifier of the device.
> > > + * @param vq_id
> > > + *   The identifier of virt queue.
> > > + * @param nb_cpls
> > > + *   The maximum number of completed operations that can be processed.
> > > + * @param[out] cookie
> > > + *   The last completed operation's cookie.
> > > + * @param[out] has_error
> > > + *   Indicates if there are transfer error.
> > > + *
> > > + * @return
> > > + *   The number of operations that successful completed.
> >
> > successfully
> >
> > > + *
> > > + * NOTE: The caller must ensure that the input parameter is valid and the
> > > + *       corresponding device supports the operation.
> > > + */
> > > +__rte_experimental
> > > +static inline uint16_t
> > > +rte_dmadev_completed(uint16_t dev_id, uint16_t vq_id, const uint16_t nb_cpls,
> > > +                    dma_cookie_t *cookie, bool *has_error)
> > > +{
> > > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > > +       has_error = false;
> > > +       return (*dev->completed)(dev, vq_id, nb_cpls, cookie, has_error);
> >
> > It may be better to have cookie/ring_idx as third argument.
> >
>
> No strong opinions here, but having it as in the code above means all
> input parameters come before all output, which makes sense to me.

+1

>
> > > +}
> > > +
> > > +/**
> > > + * @warning
> > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > + *
> > > + * Returns the number of operations that failed to complete.
> > > + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> > > + *
> > > + * @param dev_id
> > > + *   The identifier of the device.
> > > + * @param vq_id
> > > + *   The identifier of virt queue.
> > > + * @param nb_status
> > > + *   Indicates the size  of status array.
> > > + * @param[out] status
> > > + *   The error code of operations that failed to complete.
> > > + * @param[out] cookie
> > > + *   The last failed completed operation's cookie.
> > > + *
> > > + * @return
> > > + *   The number of operations that failed to complete.
> > > + *
> > > + * NOTE: The caller must ensure that the input parameter is valid and the
> > > + *       corresponding device supports the operation.
> > > + */
> > > +__rte_experimental
> > > +static inline uint16_t
> > > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> > > +                          const uint16_t nb_status, uint32_t *status,
> > > +                          dma_cookie_t *cookie)
> >
> > IMO, it is better to move cookie/rind_idx at 3.
> > Why it would return any array of errors? since it called after
> > rte_dmadev_completed() has
> > has_error. Is it better to change
> >
> > rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, dma_cookie_t
> > *cookie,  uint32_t *status)
> >
> > I also think, we may need to set status as bitmask and enumerate all
> > the combination of error codes
> > of all the driver and return string from driver existing rte_flow_error
> >
> > See
> > struct rte_flow_error {
> >         enum rte_flow_error_type type; /**< Cause field and error types. */
> >         const void *cause; /**< Object responsible for the error. */
> >         const char *message; /**< Human-readable error message. */
> > };
> >
>
> I think we need a multi-return value API here, as we may add operations in
> future which have non-error status values to return. The obvious case is
> DMA engines which support "compare" operations. In that case a successful
> compare (as in there were no DMA or HW errors) can return "equal" or
> "not-equal" as statuses. For general "copy" operations, the faster
> completion op can be used to just return successful values (and only call
> this status version on error), while apps using those compare ops or a
> mixture of copy and compare ops, would always use the slower one that
> returns status values for each and every op..
>
> The ioat APIs used 32-bit integer values for this status array so as to
> allow e.g. 16-bits for error code and 16-bits for future status values. For
> most operations there should be a fairly small set of things that can go
> wrong, i.e. bad source address, bad destination address or invalid length.
> Within that we may have a couple of specifics for why an address is bad,
> but even so I don't think we need to start having multiple bit
> combinations.

OK. What is the purpose of the error status? Is it for the application to
print, or does the application need to take an action based on a specific
error?

If the former is the scope, then we need to define standard enum values
for the errors, right?
i.e. uint32_t *status needs to change to enum rte_dma_error or so.
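If both uses matter, the 16/16 split mentioned above could be sketched as
below (macro names hypothetical):

```c
#include <stdint.h>

/* Hypothetical layout of the 32-bit status word: low 16 bits carry a
 * standard error enum value, high 16 bits carry operation-specific
 * status (e.g. equal/not-equal for a compare op). */
#define DMA_STATUS(err, op)   (((uint32_t)(op) << 16) | (uint16_t)(err))
#define DMA_STATUS_ERR(s)     ((uint16_t)(s))
#define DMA_STATUS_OP(s)      ((uint16_t)((s) >> 16))
```

This keeps the status array as plain uint32_t while still allowing a
standard error enum in the low half.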



>
> > > +{
> > > +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > > +       return (*dev->completed_fails)(dev, vq_id, nb_status, status, cookie);
> > > +}
> > > +
> > > +struct rte_dmadev_stats {
> > > +       uint64_t enqueue_fail_count;
> > > +       /**< Conut of all operations which failed enqueued */
> > > +       uint64_t enqueued_count;
> > > +       /**< Count of all operations which successful enqueued */
> > > +       uint64_t completed_fail_count;
> > > +       /**< Count of all operations which failed to complete */
> > > +       uint64_t completed_count;
> > > +       /**< Count of all operations which successful complete */
> > > +};
> >
> > We need to have capability API to tell which items are
> > updated/supported by the driver.
> >
>
> I also would remove the enqueue fail counts, since they are better counted
> by the app. If a driver reports 20,000 failures we have no way of knowing
> if that is 20,000 unique operations which failed to enqueue or a single
> operation which failed to enqueue 20,000 times but succeeded on attempt
> 20,001.
>
> >
> > > diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> > > new file mode 100644
> > > index 0000000..a3afea2
> > > --- /dev/null
> > > +++ b/lib/dmadev/rte_dmadev_core.h
> > > @@ -0,0 +1,98 @@
> > > +/* SPDX-License-Identifier: BSD-3-Clause
> > > + * Copyright 2021 HiSilicon Limited.
> > > + */
> > > +
> > > +#ifndef _RTE_DMADEV_CORE_H_
> > > +#define _RTE_DMADEV_CORE_H_
> > > +
> > > +/**
> > > + * @file
> > > + *
> > > + * RTE DMA Device internal header.
> > > + *
> > > + * This header contains internal data types. But they are still part of the
> > > + * public API because they are used by inline public functions.
> > > + */
> > > +
> > > +struct rte_dmadev;
> > > +
> > > +typedef dma_cookie_t (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vq_id,
> > > +                                     void *src, void *dst,
> > > +                                     uint32_t length, uint64_t flags);
> > > +/**< @internal Function used to enqueue a copy operation. */
> >
> > To avoid namespace conflict(as it is public API) use rte_
> >
> >
> > > +
> > > +/**
> > > + * The data structure associated with each DMA device.
> > > + */
> > > +struct rte_dmadev {
> > > +       /**< Enqueue a copy operation onto the DMA device. */
> > > +       dmadev_copy_t copy;
> > > +       /**< Enqueue a scatter list copy operation onto the DMA device. */
> > > +       dmadev_copy_sg_t copy_sg;
> > > +       /**< Enqueue a fill operation onto the DMA device. */
> > > +       dmadev_fill_t fill;
> > > +       /**< Enqueue a scatter list fill operation onto the DMA device. */
> > > +       dmadev_fill_sg_t fill_sg;
> > > +       /**< Add a fence to force ordering between operations. */
> > > +       dmadev_fence_t fence;
> > > +       /**< Trigger hardware to begin performing enqueued operations. */
> > > +       dmadev_perform_t perform;
> > > +       /**< Returns the number of operations that successful completed. */
> > > +       dmadev_completed_t completed;
> > > +       /**< Returns the number of operations that failed to complete. */
> > > +       dmadev_completed_fails_t completed_fails;
> >
> > We need to limit fastpath items in 1 CL
> >
>
> I don't think that is going to be possible. I also would like to see
> numbers to check if we benefit much from having these fastpath ops separate
> from the regular ops.
>
> > > +
> > > +       void *dev_private; /**< PMD-specific private data */
> > > +       const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD */
> > > +
> > > +       uint16_t dev_id; /**< Device ID for this instance */
> > > +       int socket_id; /**< Socket ID where memory is allocated */
> > > +       struct rte_device *device;
> > > +       /**< Device info. supplied during device initialization */
> > > +       const char *driver_name; /**< Driver info. supplied by probing */
> > > +       char name[RTE_DMADEV_NAME_MAX_LEN]; /**< Device name */
> > > +
> > > +       RTE_STD_C11
> > > +       uint8_t attached : 1; /**< Flag indicating the device is attached */
> > > +       uint8_t started : 1; /**< Device state: STARTED(1)/STOPPED(0) */
> >
> > Add a couple of reserved fields for future ABI stability.
> >
> > > +
> > > +} __rte_cache_aligned;
> > > +
> > > +extern struct rte_dmadev rte_dmadevices[];
> > > +


* Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
  2021-07-05 15:55  0%     ` Jerin Jacob
@ 2021-07-05 17:16  0%       ` Bruce Richardson
  2021-07-07  8:08  0%         ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-07-05 17:16 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Chengwen Feng, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma,
	Radha Mohan Chintakuntla

On Mon, Jul 05, 2021 at 09:25:34PM +0530, Jerin Jacob wrote:
> 
> On Mon, Jul 5, 2021 at 4:22 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Sun, Jul 04, 2021 at 03:00:30PM +0530, Jerin Jacob wrote:
> > > On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
> > > >
> > > > This patch introduces 'dmadevice' which is a generic type of DMA
> > > > device.
<snip>
> >
> > +1 and the terminology with regards to queues and channels. With our ioat
> > hardware, each HW queue was called a channel for instance.
> 
> Looks like <dmadev> <> <channel> can cover all the use cases, if the
> HW has more than
> 1 queues it can be exposed as separate dmadev dev.
> 

Fine for me.

However, just to confirm: Morten's suggestion of using a (device-specific
void *) channel pointer rather than a dev_id + channel_id pair of
parameters won't work for you? You can't store a pointer or dev index in
the channel struct in the driver?

> 
<snip>
> > > > + *
> > > > + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's a error
> > > > + * code.
> > > > + * When using cookies, comply with the following rules:
> > > > + * a) Cookies for each virtual queue are independent.
> > > > + * b) For a virt queue, the cookie are monotonically incremented, when it reach
> > > > + *    the INT_MAX, it wraps back to zero.
> >
> > I disagree with the INT_MAX (or INT32_MAX) value here. If we use that
> > value, it means that we cannot use implicit wrap-around inside the CPU and
> > have to check for the INT_MAX value. Better to:
> > 1. Specify that it wraps at UINT16_MAX which allows us to just use a
> > uint16_t internally and wrap-around automatically, or:
> > 2. Specify that it wraps at a power-of-2 value >= UINT16_MAX, giving
> > drivers the flexibility at what value to wrap around.
> 
> I think (2) is better than (1). Even better would be to wrap around at
> the number of descriptors configured in dev_configure() (we can make this
> a power of 2).
> 

Interesting, I hadn't really considered that before. My only concern
would be if an app wants to keep values in the app ring for a while after
they have been returned from dmadev. I thought it easier to have the full
16-bit counter value returned to the user to give the most flexibility,
given that going from that to any power-of-2 ring size smaller is a trivial
operation.

Overall, while my ideal situation is to always have a 0..UINT16_MAX return
value from the function, I can live with your suggestion of wrapping at
ring_size, since drivers will likely do that internally anyway.
I think wrapping at INT32_MAX is too awkward and will be error prone since
we can't rely on hardware automatically wrapping to zero, nor on the driver
having pre-masked the value.

> >
> > > > + * c) The initial cookie of a virt queue is zero, after the device is stopped or
> > > > + *    reset, the virt queue's cookie needs to be reset to zero.
<snip>
> > >
> > > Please add some good amount of reserved bits and have API to init this
> > > structure for future ABI stability, say rte_dmadev_queue_config_init()
> > > or so.
> > >
> >
> > I don't think that is necessary. Since the config struct is used only as
> > parameter to the config function, any changes to it can be managed by
> > versioning that single function. Padding would only be necessary if we had
> > an array of these config structs somewhere.
> 
> OK.
> 
> For some reason, the versioning API looks ugly to me in code; keeping
> some rsvd fields with an init function looks cleaner to me.
> 
> But I agree, function versioning works in this case. No need to invent
> another API if that is not general DPDK practice.
> 

The one thing I would suggest instead of the padding is, for the internal
APIs, to pass the struct size through, since we can't version those - and
with padding we can't know whether any repurposed padding should be used
or not. Specifically:

	typedef int (*rte_dmadev_configure_t)(struct rte_dmadev *dev, struct
			rte_dmadev_conf *cfg, size_t cfg_size);

but for the public function:

	int
	rte_dmadev_configure(struct rte_dmadev *dev, struct
			rte_dmadev_conf *cfg)
	{
		...
		ret = dev->ops.configure(dev, cfg, sizeof(*cfg));
		...
	}

Then if we change the structure and version the config API, the driver can
tell from the size what struct version it is and act accordingly. Without
that, each time the struct changed, we'd have to add a new function pointer
to the device ops.
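On the driver side, the size parameter would let a single op handle both
struct versions; a sketch (struct and field names hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v1 of the config struct, and a v2 that appends a field. */
struct dmadev_conf_v1 { uint16_t nb_hw_queues; uint16_t max_vqs; };
struct dmadev_conf_v2 { uint16_t nb_hw_queues; uint16_t max_vqs;
			uint32_t flags; };

/* A caller built against v1 passes sizeof(struct dmadev_conf_v1), so the
 * driver applies a default for the field it cannot safely read. */
static uint32_t
conf_get_flags(const void *cfg, size_t cfg_size)
{
	if (cfg_size >= sizeof(struct dmadev_conf_v2))
		return ((const struct dmadev_conf_v2 *)cfg)->flags;
	return 0; /* default for v1 callers */
}
```

No new function pointer is needed when the struct grows; only the public
wrapper gets versioned.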

> In other libraries, I have seen such an _init function used for this, as
> well as for filling in default values (in some cases the implementation
> default is not zero), so that the application can avoid a memset of the
> param structure. Added rte_event_queue_default_conf_get() in the
> eventdev spec for this.
> 

I think that would largely have the same issues, unless it returned a
pointer to data inside the driver - which therefore could not be
modified. Alternatively it would mean that the memory would have been
allocated in the driver and we would need to ensure proper cleanup
functions were called to free the memory afterwards. Supporting having the
config parameter as a local variable, I think, makes things a lot easier.

> No strong opinion on this.
> 
> 
> 
> >
> > >
> > > > +
> > > > +/**
> > > > + * A structure used to retrieve information of a DMA virt queue.
> > > > + */
> > > > +struct rte_dmadev_queue_info {
> > > > +       enum dma_transfer_direction direction;
> > >
> > > A queue may support all directions so I think it should be a bitfield.
> > >
> > > > +       /**< Associated transfer direction */
> > > > +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> > > > +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> > > > +       uint64_t dev_flags; /**< Device specific flags */
> > > > +};
> > > > +
> > >
> > > > +__rte_experimental
> > > > +static inline dma_cookie_t
> > > > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
> > > > +                  const struct dma_scatterlist *sg,
> > > > +                  uint32_t sg_len, uint64_t flags)
> > >
> > > I would like to change this as:
> > > rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
> > > rte_dma_sg *src, uint32_t nb_src,
> > > const struct rte_dma_sg *dst, uint32_t nb_dst) or so allow the use case like
> > > src 30 MB copy can be splitted as written as 1 MB x 30 dst.
> > >

Out of interest, do you see much benefit (and in what way) from having the
scatter-gather support? Unlike sending 5 buffers in one packet rather than
5 buffers in 5 packets to a NIC, copying an array of memory in one op vs
multiple is functionally identical.

> > >
> > >
<snip>
> Got it. In order to keep the fastpath within the first cacheline (saving
> 8B for the pointer) and to avoid the function-call overhead, can we use
> one bit of the op function's flags to enable the fence?
> 

The original ioat implementation did exactly that. However, I then
discovered that because a fence logically belongs between two operations,
does the fence flag on an operation mean "don't do any jobs after this
until this job has completed" or does it mean "don't start this job until
all previous jobs have completed". [Or theoretically does it mean both :-)]
Naturally, some hardware does it the former way (i.e. fence flag goes on
last op before fence), while other hardware the latter way (i.e. fence flag
goes on first op after the fence). Therefore, since fencing is about
ordering *between* two (sets of) jobs, I decided that it should do exactly
that and go between two jobs, so there is no ambiguity!

However, I'm happy enough to switch to having a fence flag, but I think if
we do that, it should be put in the "first job after fence" case, because
it is always easier to modify a previously written job if we need to, than
to save the flag for a future one.

Alternatively, if we keep the fence as a separate function, I'm happy
enough for it not to be on the same cacheline as the "hot" operations,
since fencing will always introduce a small penalty anyway.
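If we do go the flag route with "first job after fence" semantics, a small
sketch (the flag name and helper are hypothetical):

```c
#include <stdint.h>

/* Hypothetical flag: do not start this job until all previously
 * enqueued jobs on the same virt queue have fully completed. */
#define RTE_DMA_OP_FLAG_FENCE (1ULL << 0)

static inline int
op_has_fence(uint64_t flags)
{
	return (flags & RTE_DMA_OP_FLAG_FENCE) != 0;
}

/* Usage sketch: the fence logically sits between job A and job B, so
 * the flag goes on B, the first job after the fence:
 *
 *   rte_dmadev_copy(dev_id, vq_id, srcA, dstA, lenA, 0);
 *   rte_dmadev_copy(dev_id, vq_id, srcB, dstB, lenB,
 *                   RTE_DMA_OP_FLAG_FENCE);
 */
```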

> >
> > >
<snip>
> > > Since we have additional function call overhead in all the
> > > applications for this scheme, I would like to understand
> > > the use of doing this way vs enq does the doorbell implicitly from
> > > driver/application PoV?
> > >
> >
> > In our benchmarks it's just faster. When we tested it, the overhead of the
> > function calls was noticably less than the cost of building up the
> > parameter array(s) for passing the jobs in as a burst. [We don't see this
> > cost with things like NIC I/O since DPDK tends to already have the mbuf
> > fully populated before the TX call anyway.]
> 
> OK. I agree on the stack-population cost.
> 
> My question was more about doing the doorbell update implicitly in enq. Is
> the doorbell write costly on other HW compared to a function call? On our
> HW, it is just a write of the number of instructions submitted to a
> register.
> 
> Also, with a separate function we need to access the internal PMD memory
> structure again to find where to write, etc.
> 

The cost varies depending on a number of factors - even writing to a single
HW register can be very slow if that register is mapped as device
(uncacheable) memory, since (AFAIK) it will act as a full fence and wait
for the write to go all the way to hardware. For more modern HW, the cost
can be lighter. However, any cost of the HW write is going to be the same
whether it's in a separate function call or not.

However, the main thing about the doorbell update is that it's a
once-per-burst operation, rather than once-per-job. Therefore, even if you
have to re-read the struct memory (which is likely still somewhere in your
core's cache), any extra small cost of doing so is amortized over the
cost of a whole burst of copies.

> 
> >
> > >
<snip>
> > > > +
> > > > +/**
> > > > + * @warning
> > > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > > + *
> > > > + * Returns the number of operations that failed to complete.
> > > > + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> > > > + *
> > > > + * @param dev_id
> > > > + *   The identifier of the device.
> > > > + * @param vq_id
> > > > + *   The identifier of virt queue.
> > > > + * @param nb_status
> > > > + *   Indicates the size  of status array.
> > > > + * @param[out] status
> > > > + *   The error code of operations that failed to complete.
> > > > + * @param[out] cookie
> > > > + *   The last failed completed operation's cookie.
> > > > + *
> > > > + * @return
> > > > + *   The number of operations that failed to complete.
> > > > + *
> > > > + * NOTE: The caller must ensure that the input parameter is valid and the
> > > > + *       corresponding device supports the operation.
> > > > + */
> > > > +__rte_experimental
> > > > +static inline uint16_t
> > > > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> > > > +                          const uint16_t nb_status, uint32_t *status,
> > > > +                          dma_cookie_t *cookie)
> > >
> > > IMO, it is better to move cookie/rind_idx at 3.
> > > Why it would return any array of errors? since it called after
> > > rte_dmadev_completed() has
> > > has_error. Is it better to change
> > >
> > > rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, dma_cookie_t
> > > *cookie,  uint32_t *status)
> > >
> > > I also think, we may need to set status as bitmask and enumerate all
> > > the combination of error codes
> > > of all the driver and return string from driver existing rte_flow_error
> > >
> > > See
> > > struct rte_flow_error {
> > >         enum rte_flow_error_type type; /**< Cause field and error types. */
> > >         const void *cause; /**< Object responsible for the error. */
> > >         const char *message; /**< Human-readable error message. */
> > > };
> > >
> >
> > I think we need a multi-return value API here, as we may add operations in
> > future which have non-error status values to return. The obvious case is
> > DMA engines which support "compare" operations. In that case a successful
> > compare (as in there were no DMA or HW errors) can return "equal" or
> > "not-equal" as statuses. For general "copy" operations, the faster
> > completion op can be used to just return successful values (and only call
> > this status version on error), while apps using those compare ops or a
> > mixture of copy and compare ops, would always use the slower one that
> > returns status values for each and every op.
> >
> > The ioat APIs used 32-bit integer values for this status array so as to
> > allow e.g. 16-bits for error code and 16-bits for future status values. For
> > most operations there should be a fairly small set of things that can go
> > wrong, i.e. bad source address, bad destination address or invalid length.
> > Within that we may have a couple of specifics for why an address is bad,
> > but even so I don't think we need to start having multiple bit
> > combinations.
> 
> OK. What is the purpose of errors status? Is it for application printing it or
> Does the application need to take any action based on specific error requests?

It's largely for information purposes, but in the case of SVA/SVM errors
could occur due to the memory not being pinned, i.e. a page fault, in some
cases. If that happens, then it's up to the app to either touch the memory
retry the copy, or to do a SW memcpy as a fallback.

In other error cases, I think it's good to tell the application if it's
passing around bad data, or data that is beyond the scope of hardware, e.g.
a copy that is beyond what can be done in a single transaction for a HW
instance. Given that there are always things that can go wrong, I think we
need some error reporting mechanism.

> If the former is scope, then we need to define the standard enum value
> for the error right?
> ie. uint32_t *status needs to change to enum rte_dma_error or so.
> 
Sure. Perhaps an error/status structure is an option, where we
explicitly separate error info from status info.

> 
> 
<snip to end>

/Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] 20.11.2 patches review and test
  2021-06-30 10:33  0% ` Jiang, YuX
@ 2021-07-06  2:37  0%   ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-07-06  2:37 UTC (permalink / raw)
  To: Jiang, YuX, stable
  Cc: dev, Abhishek Marathe, Akhil Goyal, Ali Alnubani, Walker,
	Benjamin, David Christensen, Govindharajan, Hariprasad,
	Hemant Agrawal, Stokes, Ian, Jerin Jacob, Mcnamara, John,
	Ju-Hyoung Lee, Kevin Traynor, Luca Boccassi, Pei Zhang, Yu,
	PingX, Xu, Qian Q, Raslan Darawsheh, NBU-Contact-Thomas Monjalon,
	Peng, Yuan, Chen, Zhaoyan



> -----Original Message-----
> From: Jiang, YuX <yux.jiang@intel.com>
> Sent: Wednesday, June 30, 2021 6:33 PM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>; stable@dpdk.org
> Cc: dev@dpdk.org; Abhishek Marathe <Abhishek.Marathe@microsoft.com>; Akhil Goyal <akhil.goyal@nxp.com>; Ali Alnubani
> <alialnu@nvidia.com>; Walker, Benjamin <benjamin.walker@intel.com>; David Christensen <drc@linux.vnet.ibm.com>;
> Govindharajan, Hariprasad <hariprasad.govindharajan@intel.com>; Hemant Agrawal <hemant.agrawal@nxp.com>; Stokes, Ian
> <ian.stokes@intel.com>; Jerin Jacob <jerinj@marvell.com>; Mcnamara, John <john.mcnamara@intel.com>; Ju-Hyoung Lee
> <juhlee@microsoft.com>; Kevin Traynor <ktraynor@redhat.com>; Luca Boccassi <bluca@debian.org>; Pei Zhang
> <pezhang@redhat.com>; Yu, PingX <pingx.yu@intel.com>; Xu, Qian Q <qian.q.xu@intel.com>; Raslan Darawsheh
> <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Peng, Yuan <yuan.peng@intel.com>; Chen,
> Zhaoyan <zhaoyan.chen@intel.com>
> Subject: RE: [dpdk-dev] 20.11.2 patches review and test
> 
> All,
> Testing with dpdk v20.11.2-rc2 from Intel looks good; no critical issue is found, and all issues found are known issues.
> The below two issues have been fixed in 20.11.2-rc2:
>   1) Fedora34 GCC11 and Clang12 build failed.
>   2) dcf_lifecycle/handle_acl_filter_05: after reset port the mac changed.
> 
> # Basic Intel(R) NIC testing
> *PF(i40e, ixgbe): test scenarios including rte_flow/TSO/Jumboframe/checksum offload/Tunnel, etc. Listed but not all.
> - Below two known issues are found.
>   1)https://bugs.dpdk.org/show_bug.cgi?id=687 : unit_tests_power/power_cpufreq: unit test failed. This issue is found in 21.05 and
> not fixed yet.
>   2)ddp_gtp_qregion/fd_gtpu_ipv4_dstip: flow director does not work. This issue is found in 21.05, fixed in 21.08.
>     Fixed patch link: http://patches.dpdk.org/project/dpdk/patch/20210519032745.707639-1-stevex.yang@intel.com/
> *VF(i40e,ixgbe): test scenarios including vf-rte_flow/TSO/Jumboframe/checksum offload/Tunnel, Listed but not all.
> - No new issues are found.
> *PF/VF(ice): test scenarios including switch features/Flow Director/Advanced RSS/ACL/DCF/Flexible Descriptor and so on, Listed but
> not all.
> - Below 3 known DPDK issues are found.
>   1)rxtx_offload/rxoffload_port: Pkt1 can't be distributed to the same queue. This issue is found in 21.05, fixed in 21.08
>     Fixed patch link: http://patches.dpdk.org/project/dpdk/patch/20210527064251.242076-1-dapengx.yu@intel.com/
>   2)cvl_advanced_iavf_rss: change the SCTP port value, the hash value remains unchanged. This issue is found in 20.11-rc3, fixed in
> 21.02, but it belongs to a 21.02 new feature and won't be backported to LTS 20.11.
>   3)Can't create 512 acl rules after creating a full mask switch rule. This issue also occurs in dpdk 20.11 and is not fixed yet.
> * Build: cover the build test combination with latest GCC/Clang/ICC version and the popular OS revision such as Ubuntu20.04,
> CentOS8.3 and so on. Listed but not all.
> - All passed.
> * Intel NIC single core/NIC performance: test scenarios including PF/VF single core performance test(AVX2+AVX512) test and so on.
> Listed but not all.
> - All passed. No big data drop.
> 
> # Basic cryptodev and virtio testing
> * Virtio: both function and performance test are covered. Such as PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf
> testing, etc.. Listed but not all.
> - One known issue as below:
>   (1)The UDP fragmentation offload feature of the Virtio-net device can't be turned on in the VM; kernel issue, bugzilla has been submitted:
> https://bugzilla.kernel.org/show_bug.cgi?id=207075, not fixed yet.
> * Cryptodev:
> - Function test: test scenarios including Cryptodev API testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc. Listed but not all.
>   - All passed.
> - Performance test: test scenarios including Throughput Performance/Cryptodev Latency, etc. Listed but not all.
>   - No big data drop.
> 
> Best regards,
> Yu Jiang

Thank you!

> 
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Xueming Li
> > Sent: Sunday, June 27, 2021 7:28 AM
> > To: stable@dpdk.org
> > Cc: dev@dpdk.org; Abhishek Marathe <Abhishek.Marathe@microsoft.com>;
> > Akhil Goyal <akhil.goyal@nxp.com>; Ali Alnubani <alialnu@nvidia.com>;
> > Walker, Benjamin <benjamin.walker@intel.com>; David Christensen
> > <drc@linux.vnet.ibm.com>; Govindharajan, Hariprasad
> > <hariprasad.govindharajan@intel.com>; Hemant Agrawal
> > <hemant.agrawal@nxp.com>; Stokes, Ian <ian.stokes@intel.com>; Jerin
> > Jacob <jerinj@marvell.com>; Mcnamara, John <john.mcnamara@intel.com>;
> > Ju-Hyoung Lee <juhlee@microsoft.com>; Kevin Traynor
> > <ktraynor@redhat.com>; Luca Boccassi <bluca@debian.org>; Pei Zhang
> > <pezhang@redhat.com>; Yu, PingX <pingx.yu@intel.com>; Xu, Qian Q
> > <qian.q.xu@intel.com>; Raslan Darawsheh <rasland@nvidia.com>; Thomas
> > Monjalon <thomas@monjalon.net>; Peng, Yuan <yuan.peng@intel.com>;
> > Chen, Zhaoyan <zhaoyan.chen@intel.com>; xuemingl@nvidia.com
> > Subject: [dpdk-dev] 20.11.2 patches review and test
> >
> > Hi all,
> >
> > Here is a list of patches targeted for stable release 20.11.2.
> >
> > The planned date for the final release is 6th July.
> >
> > Please help with testing and validation of your use cases and report
> > any issues/results with reply-all to this mail. For the final release
> > the fixes and reported validations will be added to the release notes.
> >
> > A release candidate tarball can be found at:
> >
> >     https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2
> >
> > These patches are located at branch 20.11 of dpdk-stable repo:
> >     https://dpdk.org/browse/dpdk-stable/
> >
> > Thanks.
> >
> > Xueming Li <xuemingl@nvidia.com>
> >
> > ---
> > Adam Dybkowski (3):
> >       common/qat: increase IM buffer size for GEN3
> >       compress/qat: enable compression on GEN3
> >       crypto/qat: fix null authentication request
> >
> > Ajit Khaparde (7):
> >       net/bnxt: fix RSS context cleanup
> >       net/bnxt: check kvargs parsing
> >       net/bnxt: fix resource cleanup
> >       doc: fix formatting in testpmd guide
> >       net/bnxt: fix mismatched type comparison in MAC restore
> >       net/bnxt: check PCI config read
> >       net/bnxt: fix mismatched type comparison in Rx
> >
> > Alvin Zhang (11):
> >       net/ice: fix VLAN filter with PF
> >       net/i40e: fix input set field mask
> >       net/igc: fix Rx RSS hash offload capability
> >       net/igc: fix Rx error counter for bad length
> >       net/e1000: fix Rx error counter for bad length
> >       net/e1000: fix max Rx packet size
> >       net/igc: fix Rx packet size
> >       net/ice: fix fast mbuf freeing
> >       net/iavf: fix VF to PF command failure handling
> >       net/i40e: fix VF RSS configuration
> >       net/igc: fix speed configuration
> >
> > Anatoly Burakov (3):
> >       fbarray: fix log message on truncation error
> >       power: do not skip saving original P-state governor
> >       power: save original ACPI governor always
> >
> > Andrew Boyer (1):
> >       net/ionic: fix completion type in lif init
> >
> > Andrew Rybchenko (4):
> >       net/failsafe: fix RSS hash offload reporting
> >       net/failsafe: report minimum and maximum MTU
> >       common/sfc_efx: remove GENEVE from supported tunnels
> >       net/sfc: fix mark support in EF100 native Rx datapath
> >
> > Andy Moreton (2):
> >       common/sfc_efx/base: limit reported MCDI response length
> >       common/sfc_efx/base: add missing MCDI response length checks
> >
> > Ankur Dwivedi (1):
> >       crypto/octeontx: fix session-less mode
> >
> > Apeksha Gupta (1):
> >       examples/l2fwd-crypto: skip masked devices
> >
> > Arek Kusztal (1):
> >       crypto/qat: fix offset for out-of-place scatter-gather
> >
> > Beilei Xing (1):
> >       net/i40evf: fix packet loss for X722
> >
> > Bing Zhao (1):
> >       net/mlx5: fix loopback for Direct Verbs queue
> >
> > Bruce Richardson (2):
> >       build: exclude meson files from examples installation
> >       raw/ioat: fix script for configuring small number of queues
> >
> > Chaoyong He (1):
> >       doc: fix multiport syntax in nfp guide
> >
> > Chenbo Xia (1):
> >       examples/vhost: check memory table query
> >
> > Chengchang Tang (20):
> >       net/hns3: fix HW buffer size on MTU update
> >       net/hns3: fix processing Tx offload flags
> >       net/hns3: fix Tx checksum for UDP packets with special port
> >       net/hns3: fix long task queue pairs reset time
> >       ethdev: validate input in module EEPROM dump
> >       ethdev: validate input in register info
> >       ethdev: validate input in EEPROM info
> >       net/hns3: fix rollback after setting PVID failure
> >       net/hns3: fix timing in resetting queues
> >       net/hns3: fix queue state when concurrent with reset
> >       net/hns3: fix configure FEC when concurrent with reset
> >       net/hns3: fix use of command status enumeration
> >       examples: add eal cleanup to examples
> >       net/bonding: fix adding itself as its slave
> >       net/hns3: fix timing in mailbox
> >       app/testpmd: fix max queue number for Tx offloads
> >       net/tap: fix interrupt vector array size
> >       net/bonding: fix socket ID check
> >       net/tap: check ioctl on restore
> >       examples/timer: fix time interval
> >
> > Chengwen Feng (50):
> >       net/hns3: fix flow counter value
> >       net/hns3: fix VF mailbox head field
> >       net/hns3: support get device version when dump register
> >       net/hns3: fix some packet types
> >       net/hns3: fix missing outer L4 UDP flag for VXLAN
> >       net/hns3: remove VLAN/QinQ ptypes from support list
> >       test: check thread creation
> >       common/dpaax: fix possible null pointer access
> >       examples/ethtool: remove unused parsing
> >       net/hns3: fix flow director lock
> >       net/e1000/base: fix timeout for shadow RAM write
> >       net/hns3: fix setting default MAC address in bonding of VF
> >       net/hns3: fix possible mismatched response of mailbox
> >       net/hns3: fix VF handling LSC event in secondary process
> >       net/hns3: fix verification of NEON support
> >       mbuf: check shared memory before dumping dynamic space
> >       eventdev: remove redundant thread name setting
> >       eventdev: fix memory leakage on thread creation failure
> >       net/kni: check init result
> >       net/hns3: fix mailbox error message
> >       net/hns3: fix processing link status message on PF
> >       net/hns3: remove unused mailbox macro and struct
> >       net/bonding: fix leak on remove
> >       net/hns3: fix handling link update
> >       net/i40e: fix negative VEB index
> >       net/i40e: remove redundant VSI check in Tx queue setup
> >       net/virtio: fix getline memory leakage
> >       net/hns3: log time delta in decimal format
> >       net/hns3: fix time delta calculation
> >       net/hns3: remove unused macros
> >       net/hns3: fix vector Rx burst limitation
> >       net/hns3: remove read when enabling TM QCN error event
> >       net/hns3: remove unused VMDq code
> >       net/hns3: increase readability in logs
> >       raw/ntb: check SPAD user index
> >       raw/ntb: check memory allocations
> >       ipc: check malloc sync reply result
> >       eal: fix service core list parsing
> >       ipc: use monotonic clock
> >       net/hns3: return error on PCI config write failure
> >       net/hns3: fix log on flow director clear
> >       net/hns3: clear hash map on flow director clear
> >       net/hns3: fix querying flow director counter for out param
> >       net/hns3: fix TM QCN error event report by MSI-X
> >       net/hns3: fix mailbox message ID in log
> >       net/hns3: fix secondary process request start/stop Rx/Tx
> >       net/hns3: fix ordering in secondary process initialization
> >       net/hns3: fail setting FEC if one bit mode is not supported
> >       net/mlx4: fix secondary process initialization ordering
> >       net/mlx5: fix secondary process initialization ordering
> >
> > Ciara Loftus (1):
> >       net/af_xdp: fix error handling during Rx queue setup
> >
> > Ciara Power (2):
> >       telemetry: fix race on callbacks list
> >       test/crypto: fix return value of a skipped test
> >
> > Conor Walsh (1):
> >       examples/l3fwd: fix LPM IPv6 subnets
> >
> > Cristian Dumitrescu (3):
> >       table: fix actions with different data size
> >       pipeline: fix instruction translation
> >       pipeline: fix endianness conversions
> >
> > Dapeng Yu (3):
> >       net/igc: remove MTU setting limitation
> >       net/e1000: remove MTU setting limitation
> >       examples/packet_ordering: fix port configuration
> >
> > David Christensen (1):
> >       config/ppc: reduce number of cores and NUMA nodes
> >
> > David Harton (1):
> >       net/ena: fix releasing Tx ring mbufs
> >
> > David Hunt (4):
> >       test/power: fix CPU frequency check
> >       test/power: add turbo mode to frequency check
> >       test/power: fix low frequency test when turbo enabled
> >       test/power: fix turbo test
> >
> > David Marchand (18):
> >       doc: fix sphinx rtd theme import in GHA
> >       service: clean references to removed symbol
> >       eal: fix evaluation of log level option
> >       ci: hook to GitHub Actions
> >       ci: enable v21 ABI checks
> >       ci: fix package installation in GitHub Actions
> >       ci: ignore APT update failure in GitHub Actions
> >       ci: catch coredumps
> >       vhost: fix offload flags in Rx path
> >       bus/fslmc: remove unused debug macro
> >       eal: fix leak in shared lib mode detection
> >       event/dpaa2: remove unused macros
> >       net/ice/base: fix memory allocation wrapper
> >       net/ice: fix leak on thread termination
> >       devtools: fix orphan symbols check with busybox
> >       net/vhost: restore pseudo TSO support
> >       net/ark: fix leak on thread termination
> >       build: fix drivers selection without Python
> >
> > Dekel Peled (1):
> >       common/mlx5: fix DevX read output buffer size
> >
> > Dmitry Kozlyuk (4):
> >       net/pcap: fix format string
> >       eal/windows: add missing SPDX license tag
> >       buildtools: fix all drivers disabled on Windows
> >       examples/rxtx_callbacks: fix port ID format specifier
> >
> > Ed Czeck (2):
> >       net/ark: update packet director initial state
> >       net/ark: refactor Rx buffer recovery
> >
> > Elad Nachman (2):
> >       kni: support async user request
> >       kni: fix kernel deadlock with bifurcated device
> >
> > Feifei Wang (2):
> >       net/i40e: fix parsing packet type for NEON
> >       test/trace: fix race on collected perf data
> >
> > Ferruh Yigit (9):
> >       power: remove duplicated symbols from map file
> >       log/linux: make default output stderr
> >       license: fix typos
> >       drivers/net: fix FW version query
> >       net/bnx2x: fix build with GCC 11
> >       net/bnx2x: fix build with GCC 11
> >       net/ice/base: fix build with GCC 11
> >       net/tap: fix build with GCC 11
> >       test/table: fix build with GCC 11
> >
> > Gregory Etelson (2):
> >       app/testpmd: fix tunnel offload flows cleanup
> >       net/mlx5: fix tunnel offload private items location
> >
> > Guoyang Zhou (1):
> >       net/hinic: fix crash in secondary process
> >
> > Haiyue Wang (1):
> >       net/ixgbe: fix Rx errors statistics for UDP checksum
> >
> > Harman Kalra (1):
> >       event/octeontx2: fix device reconfigure for single slot
> >
> > Heinrich Kuhn (1):
> >       net/nfp: fix reporting of RSS capabilities
> >
> > Hemant Agrawal (3):
> >       ethdev: add missing buses in device iterator
> >       crypto/dpaa_sec: affine the thread portal affinity
> >       crypto/dpaa2_sec: fix close and uninit functions
> >
> > Hongbo Zheng (9):
> >       app/testpmd: fix Tx/Rx descriptor query error log
> >       net/hns3: fix FLR miss detection
> >       net/hns3: delete redundant blank line
> >       bpf: fix JSLT validation
> >       common/sfc_efx/base: fix dereferencing null pointer
> >       power: fix sanity checks for guest channel read
> >       net/hns3: fix VF alive notification after config restore
> >       examples/l3fwd-power: fix empty poll thresholds
> >       net/hns3: fix concurrent interrupt handling
> >
> > Huisong Li (23):
> >       net/hns3: fix device capabilities for copper media type
> >       net/hns3: remove unused parameter markers
> >       net/hns3: fix reporting undefined speed
> >       net/hns3: fix link update when failed to get link info
> >       net/hns3: fix flow control exception
> >       app/testpmd: fix bitmap of link speeds when force speed
> >       net/hns3: fix flow control mode
> >       net/hns3: remove redundant mailbox response
> >       net/hns3: fix DCB mode check
> >       net/hns3: fix VMDq mode check
> >       net/hns3: fix mbuf leakage
> >       net/hns3: fix link status when port is stopped
> >       net/hns3: fix link speed when port is down
> >       app/testpmd: fix forward lcores number for DCB
> >       app/testpmd: fix DCB forwarding configuration
> >       app/testpmd: fix DCB re-configuration
> >       app/testpmd: verify DCB config during forward config
> >       net/hns3: fix Rx/Tx queue numbers check
> >       net/hns3: fix requested FC mode rollback
> >       net/hns3: remove meaningless packet buffer rollback
> >       net/hns3: fix DCB configuration
> >       net/hns3: fix DCB reconfiguration
> >       net/hns3: fix link speed when VF device is down
> >
> > Ibtisam Tariq (1):
> >       examples/vhost_crypto: remove unused short option
> >
> > Igor Chauskin (2):
> >       net/ena: switch memcpy to optimized version
> >       net/ena: fix parsing of large LLQ header device argument
> >
> > Igor Russkikh (2):
> >       net/qede: reduce log verbosity
> >       net/qede: accept bigger RSS table
> >
> > Ilya Maximets (1):
> >       net/virtio: fix interrupt unregistering for listening socket
> >
> > Ivan Malov (5):
> >       net/sfc: fix buffer size for flow parse
> >       net: fix comment in IPv6 header
> >       net/sfc: fix error path inconsistency
> >       common/sfc_efx/base: fix indication of MAE encap support
> >       net/sfc: fix outer rule rollback on error
> >
> > Jerin Jacob (1):
> >       examples: fix pkg-config override
> >
> > Jiawei Wang (4):
> >       app/testpmd: fix NVGRE encap configuration
> >       net/mlx5: fix resource release for mirror flow
> >       net/mlx5: fix RSS flow item expansion for GRE key
> >       net/mlx5: fix RSS flow item expansion for NVGRE
> >
> > Jiawei Zhu (1):
> >       net/mlx5: fix Rx segmented packets on mbuf starvation
> >
> > Jiawen Wu (4):
> >       net/txgbe: remove unused functions
> >       net/txgbe: fix Rx missed packet counter
> >       net/txgbe: update packet type
> >       net/txgbe: fix QinQ strip
> >
> > Jiayu Hu (2):
> >       vhost: fix queue initialization
> >       vhost: fix redundant vring status change notification
> >
> > Jie Wang (1):
> >       net/ice: fix VSI array out of bounds access
> >
> > John Daley (2):
> >       net/enic: fix flow initialization error handling
> >       net/enic: enable GENEVE offload via VNIC configuration
> >
> > Juraj Linkeš (1):
> >       eal/arm64: fix platform register bit
> >
> > Kai Ji (2):
> >       test/crypto: fix auth-cipher compare length in OOP
> >       test/crypto: copy offset data to OOP destination buffer
> >
> > Kalesh AP (23):
> >       net/bnxt: remove unused macro
> >       net/bnxt: fix VNIC configuration
> >       net/bnxt: fix firmware fatal error handling
> >       net/bnxt: fix FW readiness check during recovery
> >       net/bnxt: fix device readiness check
> >       net/bnxt: fix VF info allocation
> >       net/bnxt: fix HWRM and FW incompatibility handling
> >       net/bnxt: mute some failure logs
> >       app/testpmd: check MAC address query
> >       net/bnxt: fix PCI write check
> >       net/bnxt: fix link state operations
> >       net/bnxt: fix timesync when PTP is not supported
> >       net/bnxt: fix memory allocation for command response
> >       net/bnxt: fix double free in port start failure
> >       net/bnxt: fix configuring LRO
> >       net/bnxt: fix health check alarm cancellation
> >       net/bnxt: fix PTP support for Thor
> >       net/bnxt: fix ring count calculation for Thor
> >       net/bnxt: remove unnecessary forward declarations
> >       net/bnxt: remove unused function parameters
> >       net/bnxt: drop unused attribute
> >       net/bnxt: fix single PF per port check
> >       net/bnxt: prevent device access in error state
> >
> > Kamil Vojanec (1):
> >       net/mlx5/linux: fix firmware version
> >
> > Kevin Traynor (5):
> >       test/cmdline: fix inputs array
> >       test/crypto: fix build with GCC 11
> >       crypto/zuc: fix build with GCC 11
> >       test: fix build with GCC 11
> >       test/cmdline: silence clang 12 warning
> >
> > Konstantin Ananyev (1):
> >       acl: fix build with GCC 11
> >
> > Lance Richardson (8):
> >       net/bnxt: fix Rx buffer posting
> >       net/bnxt: fix Tx length hint threshold
> >       net/bnxt: fix handling of null flow mask
> >       test: fix TCP header initialization
> >       net/bnxt: fix Rx descriptor status
> >       net/bnxt: fix Rx queue count
> >       net/bnxt: fix dynamic VNIC count
> >       eal: fix memory mapping on 32-bit target
> >
> > Leyi Rong (1):
> >       net/iavf: fix packet length parsing in AVX512
> >
> > Li Zhang (1):
> >       net/mlx5: fix flow actions index in cache
> >
> > Luc Pelletier (2):
> >       eal: fix race in control thread creation
> >       eal: fix hang in control thread creation
> >
> > Marvin Liu (5):
> >       vhost: fix split ring potential buffer overflow
> >       vhost: fix packed ring potential buffer overflow
> >       vhost: fix batch dequeue potential buffer overflow
> >       vhost: fix initialization of temporary header
> >       vhost: fix initialization of async temporary header
> >
> > Matan Azrad (5):
> >       common/mlx5/linux: add glue function to query WQ
> >       common/mlx5: add DevX command to query WQ
> >       common/mlx5: add DevX commands for queue counters
> >       vdpa/mlx5: fix virtq cleaning
> >       vdpa/mlx5: fix device unplug
> >
> > Michael Baum (1):
> >       net/mlx5: fix flow age event triggering
> >
> > Michal Krawczyk (5):
> >       net/ena/base: improve style and comments
> >       net/ena/base: fix type conversions by explicit casting
> >       net/ena/base: destroy multiple wait events
> >       net/ena: fix crash with unsupported device argument
> >       net/ena: indicate Rx RSS hash presence
> >
> > Min Hu (Connor) (25):
> >       net/hns3: fix MTU config complexity
> >       net/hns3: update HiSilicon copyright syntax
> >       net/hns3: fix copyright date
> >       examples/ptpclient: remove wrong comment
> >       test/bpf: fix error message
> >       doc: fix HiSilicon copyright syntax
> >       net/hns3: remove unused macros
> >       net/hns3: remove unused macro
> >       app/eventdev: fix overflow in lcore list parsing
> >       test/kni: fix a comment
> >       test/kni: check init result
> >       net/hns3: fix typos on comments
> >       net/e1000: fix flow error message object
> >       app/testpmd: fix division by zero on socket memory dump
> >       net/kni: warn on stop failure
> >       app/bbdev: check memory allocation
> >       app/bbdev: fix HARQ error messages
> >       raw/skeleton: add missing check after setting attribute
> >       test/timer: check memzone allocation
> >       app/crypto-perf: check memory allocation
> >       examples/flow_classify: fix NUMA check of port and core
> >       examples/l2fwd-cat: fix NUMA check of port and core
> >       examples/skeleton: fix NUMA check of port and core
> >       test: check flow classifier creation
> >       test: fix division by zero
> >
> > Murphy Yang (3):
> >       net/ixgbe: fix RSS RETA being reset after port start
> >       net/i40e: fix flow director config after flow validate
> >       net/i40e: fix flow director for common pctypes
> >
> > Natanael Copa (5):
> >       common/dpaax/caamflib: fix build with musl
> >       bus/dpaa: fix 64-bit arch detection
> >       bus/dpaa: fix build with musl
> >       net/cxgbe: remove use of uint type
> >       app/testpmd: fix build with musl
> >
> > Nipun Gupta (1):
> >       bus/dpaa: fix statistics reading
> >
> > Nithin Dabilpuram (3):
> >       vfio: do not merge contiguous areas
> >       vfio: fix DMA mapping granularity for IOVA as VA
> >       test/mem: fix page size for external memory
> >
> > Olivier Matz (1):
> >       test/mempool: fix object initializer
> >
> > Pallavi Kadam (1):
> >       bus/pci: skip probing some Windows NDIS devices
> >
> > Pavan Nikhilesh (4):
> >       test/event: fix timeout accuracy
> >       app/eventdev: fix timeout accuracy
> >       app/eventdev: fix lcore parsing skipping last core
> >       event/octeontx2: fix XAQ pool reconfigure
> >
> > Pu Xu (1):
> >       ip_frag: fix fragmenting IPv4 packet with header option
> >
> > Qi Zhang (8):
> >       net/ice/base: fix payload indicator on ptype
> >       net/ice/base: fix uninitialized struct
> >       net/ice/base: cleanup filter list on error
> >       net/ice/base: fix memory allocation for MAC addresses
> >       net/iavf: fix TSO max segment size
> >       doc: fix matching versions in ice guide
> >       net/iavf: fix wrong Tx context descriptor
> >       common/iavf: fix duplicated offload bit
> >
> > Radha Mohan Chintakuntla (1):
> >       raw/octeontx2_dma: assign PCI device in DPI VF
> >
> > Raslan Darawsheh (1):
> >       ethdev: update flow item GTP QFI definition
> >
> > Richael Zhuang (2):
> >       test/power: add delay before checking CPU frequency
> >       test/power: round CPU frequency to check
> >
> > Robin Zhang (6):
> >       net/i40e: announce request queue capability in PF
> >       doc: update recommended versions for i40e
> >       net/i40e: fix lack of MAC type when set MAC address
> >       net/iavf: fix lack of MAC type when set MAC address
> >       net/iavf: fix primary MAC type when starting port
> >       net/i40e: fix primary MAC type when starting port
> >
> > Rohit Raj (3):
> >       net/dpaa2: fix getting link status
> >       net/dpaa: fix getting link status
> >       examples/l2fwd-crypto: fix packet length while decryption
> >
> > Roy Shterman (1):
> >       mem: fix freeing segments in --huge-unlink mode
> >
> > Satheesh Paul (1):
> >       net/octeontx2: fix VLAN filter
> >
> > Savinay Dharmappa (1):
> >       sched: fix traffic class oversubscription parameter
> >
> > Shijith Thotton (3):
> >       eventdev: fix case to initiate crypto adapter service
> >       event/octeontx2: fix crypto adapter queue pair operations
> >       event/octeontx2: configure crypto adapter xaq pool
> >
> > Siwar Zitouni (1):
> >       net/ice: fix disabling promiscuous mode
> >
> > Somnath Kotur (5):
> >       net/bnxt: fix xstats get
> >       net/bnxt: fix Rx and Tx timestamps
> >       net/bnxt: fix Tx timestamp init
> >       net/bnxt: refactor multi-queue Rx configuration
> >       net/bnxt: fix Rx timestamp when FIFO pending bit is set
> >
> > Stanislaw Kardach (6):
> >       test: proceed if timer subsystem already initialized
> >       stack: allow lock-free only on relevant architectures
> >       test/distributor: fix worker notification in burst mode
> >       test/distributor: fix burst flush on worker quit
> >       net/ena: remove endian swap functions
> >       net/ena: report default ring size
> >
> > Stephen Hemminger (2):
> >       kni: refactor user request processing
> >       net/bnxt: use prefix on global function
> >
> > Suanming Mou (1):
> >       net/mlx5: fix counter offset detection
> >
> > Tal Shnaiderman (2):
> >       eal/windows: fix default thread priority
> >       eal/windows: fix return codes of pthread shim layer
> >
> > Tengfei Zhang (1):
> >       net/pcap: fix file descriptor leak on close
> >
> > Thinh Tran (1):
> >       test: fix autotest handling of skipped tests
> >
> > Thomas Monjalon (18):
> >       bus/pci: fix Windows kernel driver categories
> >       eal: fix comment of OS-specific header files
> >       buildtools: fix build with busybox
> >       build: detect execinfo library on Linux
> >       build: remove redundant _GNU_SOURCE definitions
> >       eal: fix build with musl
> >       net/igc: remove use of uint type
> >       event/dlb: fix header includes for musl
> >       examples/bbdev: fix header include for musl
> >       drivers: fix log level after loading
> >       app/regex: fix usage text
> >       app/testpmd: fix usage text
> >       doc: fix names of UIO drivers
> >       doc: fix build with Sphinx 4
> >       bus/pci: support I/O port operations with musl
> >       app: fix exit messages
> >       regex/octeontx2: remove unused include directory
> >       doc: remove PDF requirements
> >
> > Tianyu Li (1):
> >       net/memif: fix Tx bps statistics for zero-copy
> >
> > Timothy McDaniel (2):
> >       event/dlb2: remove references to deferred scheduling
> >       doc: fix runtime options in DLB2 guide
> >
> > Tyler Retzlaff (1):
> >       eal: add C++ include guard for reciprocal header
> >
> > Vadim Podovinnikov (1):
> >       net/bonding: fix LACP system address check
> >
> > Venkat Duvvuru (1):
> >       net/bnxt: fix queues per VNIC
> >
> > Viacheslav Ovsiienko (16):
> >       net/mlx5: fix external buffer pool registration for Rx queue
> >       net/mlx5: fix metadata item validation for ingress flows
> >       net/mlx5: fix hashed list size for tunnel flow groups
> >       net/mlx5: fix UAR allocation diagnostics messages
> >       common/mlx5: add timestamp format support to DevX
> >       vdpa/mlx5: support timestamp format
> >       net/mlx5: fix Rx metadata leftovers
> >       net/mlx5: fix drop action for Direct Rules/Verbs
> >       net/mlx4: fix RSS action with null hash key
> >       net/mlx5: support timestamp format
> >       regex/mlx5: support timestamp format
> >       app/testpmd: fix segment number check
> >       net/mlx5: remove drop queue function prototypes
> >       net/mlx4: fix buffer leakage on device close
> >       net/mlx5: fix probing device in legacy bonding mode
> >       net/mlx5: fix receiving queue timestamp format
> >
> > Wei Huang (1):
> >       raw/ifpga: fix device name format
> >
> > Wenjun Wu (3):
> >       net/ice: check some functions return
> >       net/ice: fix RSS hash update
> >       net/ice: fix RSS for L2 packet
> >
> > Wenwu Ma (1):
> >       net/ice: fix illegal access when removing MAC filter
> >
> > Wenzhuo Lu (2):
> >       net/iavf: fix crash in AVX512
> >       net/ice: fix crash in AVX512
> >
> > Wisam Jaddo (1):
> >       app/flow-perf: fix encap/decap actions
> >
> > Xiao Wang (1):
> >       vdpa/ifc: check PCI config read
> >
> > Xiaoyu Min (4):
> >       net/mlx5: support RSS expansion for IPv6 GRE
> >       net/mlx5: fix shared inner RSS
> >       net/mlx5: fix missing shared RSS hash types
> >       net/mlx5: fix redundant flow after RSS expansion
> >
> > Xiaoyun Li (2):
> >       app/testpmd: remove unnecessary UDP tunnel check
> >       net/i40e: fix IPv4 fragment offload
> >
> > Xueming Li (2):
> >       version: 20.11.2-rc1
> >       net/virtio: fix vectorized Rx queue rearm
> >
> > Youri Querry (1):
> >       bus/fslmc: fix random portal hangs with qbman 5.0
> >
> > Yunjian Wang (5):
> >       vfio: fix API description
> >       net/mlx5: fix using flow tunnel before null check
> >       vfio: fix duplicated user mem map
> >       net/mlx4: fix leak when configured repeatedly
> >       net/mlx5: fix leak when configured repeatedly

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-stable] 20.11.2 patches review and test
  2021-06-26 23:28  1% Xueming Li
  2021-06-30 10:33  0% ` Jiang, YuX
@ 2021-07-06  3:26  0% ` Kalesh Anakkur Purayil
  2021-07-06  6:47  0%   ` Xueming(Steven) Li
  1 sibling, 1 reply; 200+ results
From: Kalesh Anakkur Purayil @ 2021-07-06  3:26 UTC (permalink / raw)
  To: Xueming Li
  Cc: dpdk stable, dpdk-dev, Abhishek Marathe, Akhil Goyal,
	Ali Alnubani, benjamin.walker, David Christensen,
	Hariprasad Govindharajan, Hemant Agrawal, Ian Stokes,
	Jerin Jacob, John McNamara, Ju-Hyoung Lee, Kevin Traynor,
	Luca Boccassi, Pei Zhang, pingx.yu, qian.q.xu, Raslan Darawsheh,
	Thomas Monjalon, yuan.peng, zhaoyan.chen

[-- Attachment #1: Type: text/plain, Size: 26728 bytes --]

Hi Xueming,

Testing with dpdk v20.11.2 from Broadcom looks good.

- Basic functionality:
  Send and receive multiple types of traffic
- Changing/checking link status through testpmd
- RSS tests
- TSO tests
- VLAN filtering tests
- MAC filtering tests
- Statistics tests
- Checksum offload tests
- MTU tests
- Promiscuous tests
- Allmulti tests

NIC: BCM57414 NetXtreme-E 10Gb/25Gb Ethernet Controller, Firmware:
219.0.88.0
NIC: BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet,
Firmware : 220.0.0.100

Regards,
Kalesh

On Sun, Jun 27, 2021 at 4:59 AM Xueming Li <xuemingl@nvidia.com> wrote:

> Hi all,
>
> Here is a list of patches targeted for stable release 20.11.2.
>
> The planned date for the final release is 6th July.
>
> Please help with testing and validation of your use cases and report
> any issues/results with reply-all to this mail. For the final release
> the fixes and reported validations will be added to the release notes.
>
> A release candidate tarball can be found at:
>
>     https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2
>
> These patches are located at branch 20.11 of dpdk-stable repo:
>     https://dpdk.org/browse/dpdk-stable/
>
> Thanks.
>
> Xueming Li <xuemingl@nvidia.com>
>
> ---
> Adam Dybkowski (3):
>       common/qat: increase IM buffer size for GEN3
>       compress/qat: enable compression on GEN3
>       crypto/qat: fix null authentication request
>
> Ajit Khaparde (7):
>       net/bnxt: fix RSS context cleanup
>       net/bnxt: check kvargs parsing
>       net/bnxt: fix resource cleanup
>       doc: fix formatting in testpmd guide
>       net/bnxt: fix mismatched type comparison in MAC restore
>       net/bnxt: check PCI config read
>       net/bnxt: fix mismatched type comparison in Rx
>
> Alvin Zhang (11):
>       net/ice: fix VLAN filter with PF
>       net/i40e: fix input set field mask
>       net/igc: fix Rx RSS hash offload capability
>       net/igc: fix Rx error counter for bad length
>       net/e1000: fix Rx error counter for bad length
>       net/e1000: fix max Rx packet size
>       net/igc: fix Rx packet size
>       net/ice: fix fast mbuf freeing
>       net/iavf: fix VF to PF command failure handling
>       net/i40e: fix VF RSS configuration
>       net/igc: fix speed configuration
>
> Anatoly Burakov (3):
>       fbarray: fix log message on truncation error
>       power: do not skip saving original P-state governor
>       power: save original ACPI governor always
>
> Andrew Boyer (1):
>       net/ionic: fix completion type in lif init
>
> Andrew Rybchenko (4):
>       net/failsafe: fix RSS hash offload reporting
>       net/failsafe: report minimum and maximum MTU
>       common/sfc_efx: remove GENEVE from supported tunnels
>       net/sfc: fix mark support in EF100 native Rx datapath
>
> Andy Moreton (2):
>       common/sfc_efx/base: limit reported MCDI response length
>       common/sfc_efx/base: add missing MCDI response length checks
>
> Ankur Dwivedi (1):
>       crypto/octeontx: fix session-less mode
>
> Apeksha Gupta (1):
>       examples/l2fwd-crypto: skip masked devices
>
> Arek Kusztal (1):
>       crypto/qat: fix offset for out-of-place scatter-gather
>
> Beilei Xing (1):
>       net/i40evf: fix packet loss for X722
>
> Bing Zhao (1):
>       net/mlx5: fix loopback for Direct Verbs queue
>
> Bruce Richardson (2):
>       build: exclude meson files from examples installation
>       raw/ioat: fix script for configuring small number of queues
>
> Chaoyong He (1):
>       doc: fix multiport syntax in nfp guide
>
> Chenbo Xia (1):
>       examples/vhost: check memory table query
>
> Chengchang Tang (20):
>       net/hns3: fix HW buffer size on MTU update
>       net/hns3: fix processing Tx offload flags
>       net/hns3: fix Tx checksum for UDP packets with special port
>       net/hns3: fix long task queue pairs reset time
>       ethdev: validate input in module EEPROM dump
>       ethdev: validate input in register info
>       ethdev: validate input in EEPROM info
>       net/hns3: fix rollback after setting PVID failure
>       net/hns3: fix timing in resetting queues
>       net/hns3: fix queue state when concurrent with reset
>       net/hns3: fix configure FEC when concurrent with reset
>       net/hns3: fix use of command status enumeration
>       examples: add eal cleanup to examples
>       net/bonding: fix adding itself as its slave
>       net/hns3: fix timing in mailbox
>       app/testpmd: fix max queue number for Tx offloads
>       net/tap: fix interrupt vector array size
>       net/bonding: fix socket ID check
>       net/tap: check ioctl on restore
>       examples/timer: fix time interval
>
> Chengwen Feng (50):
>       net/hns3: fix flow counter value
>       net/hns3: fix VF mailbox head field
>       net/hns3: support get device version when dump register
>       net/hns3: fix some packet types
>       net/hns3: fix missing outer L4 UDP flag for VXLAN
>       net/hns3: remove VLAN/QinQ ptypes from support list
>       test: check thread creation
>       common/dpaax: fix possible null pointer access
>       examples/ethtool: remove unused parsing
>       net/hns3: fix flow director lock
>       net/e1000/base: fix timeout for shadow RAM write
>       net/hns3: fix setting default MAC address in bonding of VF
>       net/hns3: fix possible mismatched response of mailbox
>       net/hns3: fix VF handling LSC event in secondary process
>       net/hns3: fix verification of NEON support
>       mbuf: check shared memory before dumping dynamic space
>       eventdev: remove redundant thread name setting
>       eventdev: fix memory leakage on thread creation failure
>       net/kni: check init result
>       net/hns3: fix mailbox error message
>       net/hns3: fix processing link status message on PF
>       net/hns3: remove unused mailbox macro and struct
>       net/bonding: fix leak on remove
>       net/hns3: fix handling link update
>       net/i40e: fix negative VEB index
>       net/i40e: remove redundant VSI check in Tx queue setup
>       net/virtio: fix getline memory leakage
>       net/hns3: log time delta in decimal format
>       net/hns3: fix time delta calculation
>       net/hns3: remove unused macros
>       net/hns3: fix vector Rx burst limitation
>       net/hns3: remove read when enabling TM QCN error event
>       net/hns3: remove unused VMDq code
>       net/hns3: increase readability in logs
>       raw/ntb: check SPAD user index
>       raw/ntb: check memory allocations
>       ipc: check malloc sync reply result
>       eal: fix service core list parsing
>       ipc: use monotonic clock
>       net/hns3: return error on PCI config write failure
>       net/hns3: fix log on flow director clear
>       net/hns3: clear hash map on flow director clear
>       net/hns3: fix querying flow director counter for out param
>       net/hns3: fix TM QCN error event report by MSI-X
>       net/hns3: fix mailbox message ID in log
>       net/hns3: fix secondary process request start/stop Rx/Tx
>       net/hns3: fix ordering in secondary process initialization
>       net/hns3: fail setting FEC if one bit mode is not supported
>       net/mlx4: fix secondary process initialization ordering
>       net/mlx5: fix secondary process initialization ordering
>
> Ciara Loftus (1):
>       net/af_xdp: fix error handling during Rx queue setup
>
> Ciara Power (2):
>       telemetry: fix race on callbacks list
>       test/crypto: fix return value of a skipped test
>
> Conor Walsh (1):
>       examples/l3fwd: fix LPM IPv6 subnets
>
> Cristian Dumitrescu (3):
>       table: fix actions with different data size
>       pipeline: fix instruction translation
>       pipeline: fix endianness conversions
>
> Dapeng Yu (3):
>       net/igc: remove MTU setting limitation
>       net/e1000: remove MTU setting limitation
>       examples/packet_ordering: fix port configuration
>
> David Christensen (1):
>       config/ppc: reduce number of cores and NUMA nodes
>
> David Harton (1):
>       net/ena: fix releasing Tx ring mbufs
>
> David Hunt (4):
>       test/power: fix CPU frequency check
>       test/power: add turbo mode to frequency check
>       test/power: fix low frequency test when turbo enabled
>       test/power: fix turbo test
>
> David Marchand (18):
>       doc: fix sphinx rtd theme import in GHA
>       service: clean references to removed symbol
>       eal: fix evaluation of log level option
>       ci: hook to GitHub Actions
>       ci: enable v21 ABI checks
>       ci: fix package installation in GitHub Actions
>       ci: ignore APT update failure in GitHub Actions
>       ci: catch coredumps
>       vhost: fix offload flags in Rx path
>       bus/fslmc: remove unused debug macro
>       eal: fix leak in shared lib mode detection
>       event/dpaa2: remove unused macros
>       net/ice/base: fix memory allocation wrapper
>       net/ice: fix leak on thread termination
>       devtools: fix orphan symbols check with busybox
>       net/vhost: restore pseudo TSO support
>       net/ark: fix leak on thread termination
>       build: fix drivers selection without Python
>
> Dekel Peled (1):
>       common/mlx5: fix DevX read output buffer size
>
> Dmitry Kozlyuk (4):
>       net/pcap: fix format string
>       eal/windows: add missing SPDX license tag
>       buildtools: fix all drivers disabled on Windows
>       examples/rxtx_callbacks: fix port ID format specifier
>
> Ed Czeck (2):
>       net/ark: update packet director initial state
>       net/ark: refactor Rx buffer recovery
>
> Elad Nachman (2):
>       kni: support async user request
>       kni: fix kernel deadlock with bifurcated device
>
> Feifei Wang (2):
>       net/i40e: fix parsing packet type for NEON
>       test/trace: fix race on collected perf data
>
> Ferruh Yigit (9):
>       power: remove duplicated symbols from map file
>       log/linux: make default output stderr
>       license: fix typos
>       drivers/net: fix FW version query
>       net/bnx2x: fix build with GCC 11
>       net/bnx2x: fix build with GCC 11
>       net/ice/base: fix build with GCC 11
>       net/tap: fix build with GCC 11
>       test/table: fix build with GCC 11
>
> Gregory Etelson (2):
>       app/testpmd: fix tunnel offload flows cleanup
>       net/mlx5: fix tunnel offload private items location
>
> Guoyang Zhou (1):
>       net/hinic: fix crash in secondary process
>
> Haiyue Wang (1):
>       net/ixgbe: fix Rx errors statistics for UDP checksum
>
> Harman Kalra (1):
>       event/octeontx2: fix device reconfigure for single slot
>
> Heinrich Kuhn (1):
>       net/nfp: fix reporting of RSS capabilities
>
> Hemant Agrawal (3):
>       ethdev: add missing buses in device iterator
>       crypto/dpaa_sec: affine the thread portal affinity
>       crypto/dpaa2_sec: fix close and uninit functions
>
> Hongbo Zheng (9):
>       app/testpmd: fix Tx/Rx descriptor query error log
>       net/hns3: fix FLR miss detection
>       net/hns3: delete redundant blank line
>       bpf: fix JSLT validation
>       common/sfc_efx/base: fix dereferencing null pointer
>       power: fix sanity checks for guest channel read
>       net/hns3: fix VF alive notification after config restore
>       examples/l3fwd-power: fix empty poll thresholds
>       net/hns3: fix concurrent interrupt handling
>
> Huisong Li (23):
>       net/hns3: fix device capabilities for copper media type
>       net/hns3: remove unused parameter markers
>       net/hns3: fix reporting undefined speed
>       net/hns3: fix link update when failed to get link info
>       net/hns3: fix flow control exception
>       app/testpmd: fix bitmap of link speeds when force speed
>       net/hns3: fix flow control mode
>       net/hns3: remove redundant mailbox response
>       net/hns3: fix DCB mode check
>       net/hns3: fix VMDq mode check
>       net/hns3: fix mbuf leakage
>       net/hns3: fix link status when port is stopped
>       net/hns3: fix link speed when port is down
>       app/testpmd: fix forward lcores number for DCB
>       app/testpmd: fix DCB forwarding configuration
>       app/testpmd: fix DCB re-configuration
>       app/testpmd: verify DCB config during forward config
>       net/hns3: fix Rx/Tx queue numbers check
>       net/hns3: fix requested FC mode rollback
>       net/hns3: remove meaningless packet buffer rollback
>       net/hns3: fix DCB configuration
>       net/hns3: fix DCB reconfiguration
>       net/hns3: fix link speed when VF device is down
>
> Ibtisam Tariq (1):
>       examples/vhost_crypto: remove unused short option
>
> Igor Chauskin (2):
>       net/ena: switch memcpy to optimized version
>       net/ena: fix parsing of large LLQ header device argument
>
> Igor Russkikh (2):
>       net/qede: reduce log verbosity
>       net/qede: accept bigger RSS table
>
> Ilya Maximets (1):
>       net/virtio: fix interrupt unregistering for listening socket
>
> Ivan Malov (5):
>       net/sfc: fix buffer size for flow parse
>       net: fix comment in IPv6 header
>       net/sfc: fix error path inconsistency
>       common/sfc_efx/base: fix indication of MAE encap support
>       net/sfc: fix outer rule rollback on error
>
> Jerin Jacob (1):
>       examples: fix pkg-config override
>
> Jiawei Wang (4):
>       app/testpmd: fix NVGRE encap configuration
>       net/mlx5: fix resource release for mirror flow
>       net/mlx5: fix RSS flow item expansion for GRE key
>       net/mlx5: fix RSS flow item expansion for NVGRE
>
> Jiawei Zhu (1):
>       net/mlx5: fix Rx segmented packets on mbuf starvation
>
> Jiawen Wu (4):
>       net/txgbe: remove unused functions
>       net/txgbe: fix Rx missed packet counter
>       net/txgbe: update packet type
>       net/txgbe: fix QinQ strip
>
> Jiayu Hu (2):
>       vhost: fix queue initialization
>       vhost: fix redundant vring status change notification
>
> Jie Wang (1):
>       net/ice: fix VSI array out of bounds access
>
> John Daley (2):
>       net/enic: fix flow initialization error handling
>       net/enic: enable GENEVE offload via VNIC configuration
>
> Juraj Linkeš (1):
>       eal/arm64: fix platform register bit
>
> Kai Ji (2):
>       test/crypto: fix auth-cipher compare length in OOP
>       test/crypto: copy offset data to OOP destination buffer
>
> Kalesh AP (23):
>       net/bnxt: remove unused macro
>       net/bnxt: fix VNIC configuration
>       net/bnxt: fix firmware fatal error handling
>       net/bnxt: fix FW readiness check during recovery
>       net/bnxt: fix device readiness check
>       net/bnxt: fix VF info allocation
>       net/bnxt: fix HWRM and FW incompatibility handling
>       net/bnxt: mute some failure logs
>       app/testpmd: check MAC address query
>       net/bnxt: fix PCI write check
>       net/bnxt: fix link state operations
>       net/bnxt: fix timesync when PTP is not supported
>       net/bnxt: fix memory allocation for command response
>       net/bnxt: fix double free in port start failure
>       net/bnxt: fix configuring LRO
>       net/bnxt: fix health check alarm cancellation
>       net/bnxt: fix PTP support for Thor
>       net/bnxt: fix ring count calculation for Thor
>       net/bnxt: remove unnecessary forward declarations
>       net/bnxt: remove unused function parameters
>       net/bnxt: drop unused attribute
>       net/bnxt: fix single PF per port check
>       net/bnxt: prevent device access in error state
>
> Kamil Vojanec (1):
>       net/mlx5/linux: fix firmware version
>
> Kevin Traynor (5):
>       test/cmdline: fix inputs array
>       test/crypto: fix build with GCC 11
>       crypto/zuc: fix build with GCC 11
>       test: fix build with GCC 11
>       test/cmdline: silence clang 12 warning
>
> Konstantin Ananyev (1):
>       acl: fix build with GCC 11
>
> Lance Richardson (8):
>       net/bnxt: fix Rx buffer posting
>       net/bnxt: fix Tx length hint threshold
>       net/bnxt: fix handling of null flow mask
>       test: fix TCP header initialization
>       net/bnxt: fix Rx descriptor status
>       net/bnxt: fix Rx queue count
>       net/bnxt: fix dynamic VNIC count
>       eal: fix memory mapping on 32-bit target
>
> Leyi Rong (1):
>       net/iavf: fix packet length parsing in AVX512
>
> Li Zhang (1):
>       net/mlx5: fix flow actions index in cache
>
> Luc Pelletier (2):
>       eal: fix race in control thread creation
>       eal: fix hang in control thread creation
>
> Marvin Liu (5):
>       vhost: fix split ring potential buffer overflow
>       vhost: fix packed ring potential buffer overflow
>       vhost: fix batch dequeue potential buffer overflow
>       vhost: fix initialization of temporary header
>       vhost: fix initialization of async temporary header
>
> Matan Azrad (5):
>       common/mlx5/linux: add glue function to query WQ
>       common/mlx5: add DevX command to query WQ
>       common/mlx5: add DevX commands for queue counters
>       vdpa/mlx5: fix virtq cleaning
>       vdpa/mlx5: fix device unplug
>
> Michael Baum (1):
>       net/mlx5: fix flow age event triggering
>
> Michal Krawczyk (5):
>       net/ena/base: improve style and comments
>       net/ena/base: fix type conversions by explicit casting
>       net/ena/base: destroy multiple wait events
>       net/ena: fix crash with unsupported device argument
>       net/ena: indicate Rx RSS hash presence
>
> Min Hu (Connor) (25):
>       net/hns3: fix MTU config complexity
>       net/hns3: update HiSilicon copyright syntax
>       net/hns3: fix copyright date
>       examples/ptpclient: remove wrong comment
>       test/bpf: fix error message
>       doc: fix HiSilicon copyright syntax
>       net/hns3: remove unused macros
>       net/hns3: remove unused macro
>       app/eventdev: fix overflow in lcore list parsing
>       test/kni: fix a comment
>       test/kni: check init result
>       net/hns3: fix typos on comments
>       net/e1000: fix flow error message object
>       app/testpmd: fix division by zero on socket memory dump
>       net/kni: warn on stop failure
>       app/bbdev: check memory allocation
>       app/bbdev: fix HARQ error messages
>       raw/skeleton: add missing check after setting attribute
>       test/timer: check memzone allocation
>       app/crypto-perf: check memory allocation
>       examples/flow_classify: fix NUMA check of port and core
>       examples/l2fwd-cat: fix NUMA check of port and core
>       examples/skeleton: fix NUMA check of port and core
>       test: check flow classifier creation
>       test: fix division by zero
>
> Murphy Yang (3):
>       net/ixgbe: fix RSS RETA being reset after port start
>       net/i40e: fix flow director config after flow validate
>       net/i40e: fix flow director for common pctypes
>
> Natanael Copa (5):
>       common/dpaax/caamflib: fix build with musl
>       bus/dpaa: fix 64-bit arch detection
>       bus/dpaa: fix build with musl
>       net/cxgbe: remove use of uint type
>       app/testpmd: fix build with musl
>
> Nipun Gupta (1):
>       bus/dpaa: fix statistics reading
>
> Nithin Dabilpuram (3):
>       vfio: do not merge contiguous areas
>       vfio: fix DMA mapping granularity for IOVA as VA
>       test/mem: fix page size for external memory
>
> Olivier Matz (1):
>       test/mempool: fix object initializer
>
> Pallavi Kadam (1):
>       bus/pci: skip probing some Windows NDIS devices
>
> Pavan Nikhilesh (4):
>       test/event: fix timeout accuracy
>       app/eventdev: fix timeout accuracy
>       app/eventdev: fix lcore parsing skipping last core
>       event/octeontx2: fix XAQ pool reconfigure
>
> Pu Xu (1):
>       ip_frag: fix fragmenting IPv4 packet with header option
>
> Qi Zhang (8):
>       net/ice/base: fix payload indicator on ptype
>       net/ice/base: fix uninitialized struct
>       net/ice/base: cleanup filter list on error
>       net/ice/base: fix memory allocation for MAC addresses
>       net/iavf: fix TSO max segment size
>       doc: fix matching versions in ice guide
>       net/iavf: fix wrong Tx context descriptor
>       common/iavf: fix duplicated offload bit
>
> Radha Mohan Chintakuntla (1):
>       raw/octeontx2_dma: assign PCI device in DPI VF
>
> Raslan Darawsheh (1):
>       ethdev: update flow item GTP QFI definition
>
> Richael Zhuang (2):
>       test/power: add delay before checking CPU frequency
>       test/power: round CPU frequency to check
>
> Robin Zhang (6):
>       net/i40e: announce request queue capability in PF
>       doc: update recommended versions for i40e
>       net/i40e: fix lack of MAC type when set MAC address
>       net/iavf: fix lack of MAC type when set MAC address
>       net/iavf: fix primary MAC type when starting port
>       net/i40e: fix primary MAC type when starting port
>
> Rohit Raj (3):
>       net/dpaa2: fix getting link status
>       net/dpaa: fix getting link status
>       examples/l2fwd-crypto: fix packet length while decryption
>
> Roy Shterman (1):
>       mem: fix freeing segments in --huge-unlink mode
>
> Satheesh Paul (1):
>       net/octeontx2: fix VLAN filter
>
> Savinay Dharmappa (1):
>       sched: fix traffic class oversubscription parameter
>
> Shijith Thotton (3):
>       eventdev: fix case to initiate crypto adapter service
>       event/octeontx2: fix crypto adapter queue pair operations
>       event/octeontx2: configure crypto adapter xaq pool
>
> Siwar Zitouni (1):
>       net/ice: fix disabling promiscuous mode
>
> Somnath Kotur (5):
>       net/bnxt: fix xstats get
>       net/bnxt: fix Rx and Tx timestamps
>       net/bnxt: fix Tx timestamp init
>       net/bnxt: refactor multi-queue Rx configuration
>       net/bnxt: fix Rx timestamp when FIFO pending bit is set
>
> Stanislaw Kardach (6):
>       test: proceed if timer subsystem already initialized
>       stack: allow lock-free only on relevant architectures
>       test/distributor: fix worker notification in burst mode
>       test/distributor: fix burst flush on worker quit
>       net/ena: remove endian swap functions
>       net/ena: report default ring size
>
> Stephen Hemminger (2):
>       kni: refactor user request processing
>       net/bnxt: use prefix on global function
>
> Suanming Mou (1):
>       net/mlx5: fix counter offset detection
>
> Tal Shnaiderman (2):
>       eal/windows: fix default thread priority
>       eal/windows: fix return codes of pthread shim layer
>
> Tengfei Zhang (1):
>       net/pcap: fix file descriptor leak on close
>
> Thinh Tran (1):
>       test: fix autotest handling of skipped tests
>
> Thomas Monjalon (18):
>       bus/pci: fix Windows kernel driver categories
>       eal: fix comment of OS-specific header files
>       buildtools: fix build with busybox
>       build: detect execinfo library on Linux
>       build: remove redundant _GNU_SOURCE definitions
>       eal: fix build with musl
>       net/igc: remove use of uint type
>       event/dlb: fix header includes for musl
>       examples/bbdev: fix header include for musl
>       drivers: fix log level after loading
>       app/regex: fix usage text
>       app/testpmd: fix usage text
>       doc: fix names of UIO drivers
>       doc: fix build with Sphinx 4
>       bus/pci: support I/O port operations with musl
>       app: fix exit messages
>       regex/octeontx2: remove unused include directory
>       doc: remove PDF requirements
>
> Tianyu Li (1):
>       net/memif: fix Tx bps statistics for zero-copy
>
> Timothy McDaniel (2):
>       event/dlb2: remove references to deferred scheduling
>       doc: fix runtime options in DLB2 guide
>
> Tyler Retzlaff (1):
>       eal: add C++ include guard for reciprocal header
>
> Vadim Podovinnikov (1):
>       net/bonding: fix LACP system address check
>
> Venkat Duvvuru (1):
>       net/bnxt: fix queues per VNIC
>
> Viacheslav Ovsiienko (16):
>       net/mlx5: fix external buffer pool registration for Rx queue
>       net/mlx5: fix metadata item validation for ingress flows
>       net/mlx5: fix hashed list size for tunnel flow groups
>       net/mlx5: fix UAR allocation diagnostics messages
>       common/mlx5: add timestamp format support to DevX
>       vdpa/mlx5: support timestamp format
>       net/mlx5: fix Rx metadata leftovers
>       net/mlx5: fix drop action for Direct Rules/Verbs
>       net/mlx4: fix RSS action with null hash key
>       net/mlx5: support timestamp format
>       regex/mlx5: support timestamp format
>       app/testpmd: fix segment number check
>       net/mlx5: remove drop queue function prototypes
>       net/mlx4: fix buffer leakage on device close
>       net/mlx5: fix probing device in legacy bonding mode
>       net/mlx5: fix receiving queue timestamp format
>
> Wei Huang (1):
>       raw/ifpga: fix device name format
>
> Wenjun Wu (3):
>       net/ice: check some functions return
>       net/ice: fix RSS hash update
>       net/ice: fix RSS for L2 packet
>
> Wenwu Ma (1):
>       net/ice: fix illegal access when removing MAC filter
>
> Wenzhuo Lu (2):
>       net/iavf: fix crash in AVX512
>       net/ice: fix crash in AVX512
>
> Wisam Jaddo (1):
>       app/flow-perf: fix encap/decap actions
>
> Xiao Wang (1):
>       vdpa/ifc: check PCI config read
>
> Xiaoyu Min (4):
>       net/mlx5: support RSS expansion for IPv6 GRE
>       net/mlx5: fix shared inner RSS
>       net/mlx5: fix missing shared RSS hash types
>       net/mlx5: fix redundant flow after RSS expansion
>
> Xiaoyun Li (2):
>       app/testpmd: remove unnecessary UDP tunnel check
>       net/i40e: fix IPv4 fragment offload
>
> Xueming Li (2):
>       version: 20.11.2-rc1
>       net/virtio: fix vectorized Rx queue rearm
>
> Youri Querry (1):
>       bus/fslmc: fix random portal hangs with qbman 5.0
>
> Yunjian Wang (5):
>       vfio: fix API description
>       net/mlx5: fix using flow tunnel before null check
>       vfio: fix duplicated user mem map
>       net/mlx4: fix leak when configured repeatedly
>       net/mlx5: fix leak when configured repeatedly
>


-- 
Regards,
Kalesh A P

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [dpdk-stable] 20.11.2 patches review and test
  2021-07-06  3:26  0% ` [dpdk-dev] [dpdk-stable] " Kalesh Anakkur Purayil
@ 2021-07-06  6:47  0%   ` Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-07-06  6:47 UTC (permalink / raw)
  To: Kalesh Anakkur Purayil
  Cc: dpdk stable, dpdk-dev, Abhishek Marathe, Akhil Goyal,
	Ali Alnubani, benjamin.walker, David Christensen,
	Hariprasad Govindharajan, Hemant Agrawal, Ian Stokes,
	Jerin Jacob, John McNamara, Ju-Hyoung Lee, Kevin Traynor,
	Luca Boccassi, Pei Zhang, pingx.yu, qian.q.xu, Raslan Darawsheh,
	NBU-Contact-Thomas Monjalon, yuan.peng, zhaoyan.chen

> 
> From: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com> 
> Sent: Tuesday, July 6, 2021 11:27 AM
> To: Xueming(Steven) Li <xuemingl@nvidia.com>
> Cc: dpdk stable <stable@dpdk.org>; dpdk-dev <dev@dpdk.org>; Abhishek Marathe <Abhishek.Marathe@microsoft.com>; Akhil Goyal <akhil.goyal@nxp.com>; Ali Alnubani <alialnu@nvidi
> Subject: Re: [dpdk-stable] 20.11.2 patches review and test
> 
> Hi Xueming,
> 
> Testing with dpdk v20.11.2 from Broadcom looks good.
> 
> - Basic functionality:
>   Send and receive multiple types of traffic.
> - Changing/checking link status through testpmd.
> - RSS tests.
> - TSO tests
> - VLAN filtering tests.
> - MAC filtering test
> - statistics tests
> - Checksum offload tests
> - MTU tests
> - Promiscuous tests
> - Allmulti test
> 
> NIC: BCM57414 NetXtreme-E 10Gb/25Gb Ethernet Controller, Firmware: 219.0.88.0
> NIC: BCM57508 NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb/200Gb Ethernet, Firmware : 220.0.0.100

Thanks very much!

> 
> Regards,
> Kalesh
> 
> On Sun, Jun 27, 2021 at 4:59 AM Xueming Li <mailto:xuemingl@nvidia.com> wrote:
> Hi all,
> 
> Here is a list of patches targeted for stable release 20.11.2.
> 
> The planned date for the final release is 6th July.
> 
> Please help with testing and validation of your use cases and report
> any issues/results with reply-all to this mail. For the final release
> the fixes and reported validations will be added to the release notes.
> 
> A release candidate tarball can be found at:
> 
>     https://dpdk.org/browse/dpdk-stable/tag/?id=v20.11.2-rc2
> 
> These patches are located at branch 20.11 of dpdk-stable repo:
>     https://dpdk.org/browse/dpdk-stable/
> 
> Thanks.
> 
> Xueming Li <mailto:xuemingl@nvidia.com>
> 
> ---
> Adam Dybkowski (3):
>       common/qat: increase IM buffer size for GEN3
>       compress/qat: enable compression on GEN3
>       crypto/qat: fix null authentication request
> 
> Ajit Khaparde (7):
>       net/bnxt: fix RSS context cleanup
>       net/bnxt: check kvargs parsing
>       net/bnxt: fix resource cleanup
>       doc: fix formatting in testpmd guide
>       net/bnxt: fix mismatched type comparison in MAC restore
>       net/bnxt: check PCI config read
>       net/bnxt: fix mismatched type comparison in Rx
> 
> Alvin Zhang (11):
>       net/ice: fix VLAN filter with PF
>       net/i40e: fix input set field mask
>       net/igc: fix Rx RSS hash offload capability
>       net/igc: fix Rx error counter for bad length
>       net/e1000: fix Rx error counter for bad length
>       net/e1000: fix max Rx packet size
>       net/igc: fix Rx packet size
>       net/ice: fix fast mbuf freeing
>       net/iavf: fix VF to PF command failure handling
>       net/i40e: fix VF RSS configuration
>       net/igc: fix speed configuration
> 
> Anatoly Burakov (3):
>       fbarray: fix log message on truncation error
>       power: do not skip saving original P-state governor
>       power: save original ACPI governor always
> 
> Andrew Boyer (1):
>       net/ionic: fix completion type in lif init
> 
> Andrew Rybchenko (4):
>       net/failsafe: fix RSS hash offload reporting
>       net/failsafe: report minimum and maximum MTU
>       common/sfc_efx: remove GENEVE from supported tunnels
>       net/sfc: fix mark support in EF100 native Rx datapath
> 
> Andy Moreton (2):
>       common/sfc_efx/base: limit reported MCDI response length
>       common/sfc_efx/base: add missing MCDI response length checks
> 
> Ankur Dwivedi (1):
>       crypto/octeontx: fix session-less mode
> 
> Apeksha Gupta (1):
>       examples/l2fwd-crypto: skip masked devices
> 
> Arek Kusztal (1):
>       crypto/qat: fix offset for out-of-place scatter-gather
> 
> Beilei Xing (1):
>       net/i40evf: fix packet loss for X722
> 
> Bing Zhao (1):
>       net/mlx5: fix loopback for Direct Verbs queue
> 
> Bruce Richardson (2):
>       build: exclude meson files from examples installation
>       raw/ioat: fix script for configuring small number of queues
> 
> Chaoyong He (1):
>       doc: fix multiport syntax in nfp guide
> 
> Chenbo Xia (1):
>       examples/vhost: check memory table query
> 
> Chengchang Tang (20):
>       net/hns3: fix HW buffer size on MTU update
>       net/hns3: fix processing Tx offload flags
>       net/hns3: fix Tx checksum for UDP packets with special port
>       net/hns3: fix long task queue pairs reset time
>       ethdev: validate input in module EEPROM dump
>       ethdev: validate input in register info
>       ethdev: validate input in EEPROM info
>       net/hns3: fix rollback after setting PVID failure
>       net/hns3: fix timing in resetting queues
>       net/hns3: fix queue state when concurrent with reset
>       net/hns3: fix configure FEC when concurrent with reset
>       net/hns3: fix use of command status enumeration
>       examples: add eal cleanup to examples
>       net/bonding: fix adding itself as its slave
>       net/hns3: fix timing in mailbox
>       app/testpmd: fix max queue number for Tx offloads
>       net/tap: fix interrupt vector array size
>       net/bonding: fix socket ID check
>       net/tap: check ioctl on restore
>       examples/timer: fix time interval
> 
> Chengwen Feng (50):
>       net/hns3: fix flow counter value
>       net/hns3: fix VF mailbox head field
>       net/hns3: support get device version when dump register
>       net/hns3: fix some packet types
>       net/hns3: fix missing outer L4 UDP flag for VXLAN
>       net/hns3: remove VLAN/QinQ ptypes from support list
>       test: check thread creation
>       common/dpaax: fix possible null pointer access
>       examples/ethtool: remove unused parsing
>       net/hns3: fix flow director lock
>       net/e1000/base: fix timeout for shadow RAM write
>       net/hns3: fix setting default MAC address in bonding of VF
>       net/hns3: fix possible mismatched response of mailbox
>       net/hns3: fix VF handling LSC event in secondary process
>       net/hns3: fix verification of NEON support
>       mbuf: check shared memory before dumping dynamic space
>       eventdev: remove redundant thread name setting
>       eventdev: fix memory leakage on thread creation failure
>       net/kni: check init result
>       net/hns3: fix mailbox error message
>       net/hns3: fix processing link status message on PF
>       net/hns3: remove unused mailbox macro and struct
>       net/bonding: fix leak on remove
>       net/hns3: fix handling link update
>       net/i40e: fix negative VEB index
>       net/i40e: remove redundant VSI check in Tx queue setup
>       net/virtio: fix getline memory leakage
>       net/hns3: log time delta in decimal format
>       net/hns3: fix time delta calculation
>       net/hns3: remove unused macros
>       net/hns3: fix vector Rx burst limitation
>       net/hns3: remove read when enabling TM QCN error event
>       net/hns3: remove unused VMDq code
>       net/hns3: increase readability in logs
>       raw/ntb: check SPAD user index
>       raw/ntb: check memory allocations
>       ipc: check malloc sync reply result
>       eal: fix service core list parsing
>       ipc: use monotonic clock
>       net/hns3: return error on PCI config write failure
>       net/hns3: fix log on flow director clear
>       net/hns3: clear hash map on flow director clear
>       net/hns3: fix querying flow director counter for out param
>       net/hns3: fix TM QCN error event report by MSI-X
>       net/hns3: fix mailbox message ID in log
>       net/hns3: fix secondary process request start/stop Rx/Tx
>       net/hns3: fix ordering in secondary process initialization
>       net/hns3: fail setting FEC if one bit mode is not supported
>       net/mlx4: fix secondary process initialization ordering
>       net/mlx5: fix secondary process initialization ordering
> 
> Ciara Loftus (1):
>       net/af_xdp: fix error handling during Rx queue setup
> 
> Ciara Power (2):
>       telemetry: fix race on callbacks list
>       test/crypto: fix return value of a skipped test
> 
> Conor Walsh (1):
>       examples/l3fwd: fix LPM IPv6 subnets
> 
> Cristian Dumitrescu (3):
>       table: fix actions with different data size
>       pipeline: fix instruction translation
>       pipeline: fix endianness conversions
> 
> Dapeng Yu (3):
>       net/igc: remove MTU setting limitation
>       net/e1000: remove MTU setting limitation
>       examples/packet_ordering: fix port configuration
> 
> David Christensen (1):
>       config/ppc: reduce number of cores and NUMA nodes
> 
> David Harton (1):
>       net/ena: fix releasing Tx ring mbufs
> 
> David Hunt (4):
>       test/power: fix CPU frequency check
>       test/power: add turbo mode to frequency check
>       test/power: fix low frequency test when turbo enabled
>       test/power: fix turbo test
> 
> David Marchand (18):
>       doc: fix sphinx rtd theme import in GHA
>       service: clean references to removed symbol
>       eal: fix evaluation of log level option
>       ci: hook to GitHub Actions
>       ci: enable v21 ABI checks
>       ci: fix package installation in GitHub Actions
>       ci: ignore APT update failure in GitHub Actions
>       ci: catch coredumps
>       vhost: fix offload flags in Rx path
>       bus/fslmc: remove unused debug macro
>       eal: fix leak in shared lib mode detection
>       event/dpaa2: remove unused macros
>       net/ice/base: fix memory allocation wrapper
>       net/ice: fix leak on thread termination
>       devtools: fix orphan symbols check with busybox
>       net/vhost: restore pseudo TSO support
>       net/ark: fix leak on thread termination
>       build: fix drivers selection without Python
> 
> Dekel Peled (1):
>       common/mlx5: fix DevX read output buffer size
> 
> Dmitry Kozlyuk (4):
>       net/pcap: fix format string
>       eal/windows: add missing SPDX license tag
>       buildtools: fix all drivers disabled on Windows
>       examples/rxtx_callbacks: fix port ID format specifier
> 
> Ed Czeck (2):
>       net/ark: update packet director initial state
>       net/ark: refactor Rx buffer recovery
> 
> Elad Nachman (2):
>       kni: support async user request
>       kni: fix kernel deadlock with bifurcated device
> 
> Feifei Wang (2):
>       net/i40e: fix parsing packet type for NEON
>       test/trace: fix race on collected perf data
> 
> Ferruh Yigit (9):
>       power: remove duplicated symbols from map file
>       log/linux: make default output stderr
>       license: fix typos
>       drivers/net: fix FW version query
>       net/bnx2x: fix build with GCC 11
>       net/bnx2x: fix build with GCC 11
>       net/ice/base: fix build with GCC 11
>       net/tap: fix build with GCC 11
>       test/table: fix build with GCC 11
> 
> Gregory Etelson (2):
>       app/testpmd: fix tunnel offload flows cleanup
>       net/mlx5: fix tunnel offload private items location
> 
> Guoyang Zhou (1):
>       net/hinic: fix crash in secondary process
> 
> Haiyue Wang (1):
>       net/ixgbe: fix Rx errors statistics for UDP checksum
> 
> Harman Kalra (1):
>       event/octeontx2: fix device reconfigure for single slot
> 
> Heinrich Kuhn (1):
>       net/nfp: fix reporting of RSS capabilities
> 
> Hemant Agrawal (3):
>       ethdev: add missing buses in device iterator
>       crypto/dpaa_sec: affine the thread portal affinity
>       crypto/dpaa2_sec: fix close and uninit functions
> 
> Hongbo Zheng (9):
>       app/testpmd: fix Tx/Rx descriptor query error log
>       net/hns3: fix FLR miss detection
>       net/hns3: delete redundant blank line
>       bpf: fix JSLT validation
>       common/sfc_efx/base: fix dereferencing null pointer
>       power: fix sanity checks for guest channel read
>       net/hns3: fix VF alive notification after config restore
>       examples/l3fwd-power: fix empty poll thresholds
>       net/hns3: fix concurrent interrupt handling
> 
> Huisong Li (23):
>       net/hns3: fix device capabilities for copper media type
>       net/hns3: remove unused parameter markers
>       net/hns3: fix reporting undefined speed
>       net/hns3: fix link update when failed to get link info
>       net/hns3: fix flow control exception
>       app/testpmd: fix bitmap of link speeds when force speed
>       net/hns3: fix flow control mode
>       net/hns3: remove redundant mailbox response
>       net/hns3: fix DCB mode check
>       net/hns3: fix VMDq mode check
>       net/hns3: fix mbuf leakage
>       net/hns3: fix link status when port is stopped
>       net/hns3: fix link speed when port is down
>       app/testpmd: fix forward lcores number for DCB
>       app/testpmd: fix DCB forwarding configuration
>       app/testpmd: fix DCB re-configuration
>       app/testpmd: verify DCB config during forward config
>       net/hns3: fix Rx/Tx queue numbers check
>       net/hns3: fix requested FC mode rollback
>       net/hns3: remove meaningless packet buffer rollback
>       net/hns3: fix DCB configuration
>       net/hns3: fix DCB reconfiguration
>       net/hns3: fix link speed when VF device is down
> 
> Ibtisam Tariq (1):
>       examples/vhost_crypto: remove unused short option
> 
> Igor Chauskin (2):
>       net/ena: switch memcpy to optimized version
>       net/ena: fix parsing of large LLQ header device argument
> 
> Igor Russkikh (2):
>       net/qede: reduce log verbosity
>       net/qede: accept bigger RSS table
> 
> Ilya Maximets (1):
>       net/virtio: fix interrupt unregistering for listening socket
> 
> Ivan Malov (5):
>       net/sfc: fix buffer size for flow parse
>       net: fix comment in IPv6 header
>       net/sfc: fix error path inconsistency
>       common/sfc_efx/base: fix indication of MAE encap support
>       net/sfc: fix outer rule rollback on error
> 
> Jerin Jacob (1):
>       examples: fix pkg-config override
> 
> Jiawei Wang (4):
>       app/testpmd: fix NVGRE encap configuration
>       net/mlx5: fix resource release for mirror flow
>       net/mlx5: fix RSS flow item expansion for GRE key
>       net/mlx5: fix RSS flow item expansion for NVGRE
> 
> Jiawei Zhu (1):
>       net/mlx5: fix Rx segmented packets on mbuf starvation
> 
> Jiawen Wu (4):
>       net/txgbe: remove unused functions
>       net/txgbe: fix Rx missed packet counter
>       net/txgbe: update packet type
>       net/txgbe: fix QinQ strip
> 
> Jiayu Hu (2):
>       vhost: fix queue initialization
>       vhost: fix redundant vring status change notification
> 
> Jie Wang (1):
>       net/ice: fix VSI array out of bounds access
> 
> John Daley (2):
>       net/enic: fix flow initialization error handling
>       net/enic: enable GENEVE offload via VNIC configuration
> 
> Juraj Linkeš (1):
>       eal/arm64: fix platform register bit
> 
> Kai Ji (2):
>       test/crypto: fix auth-cipher compare length in OOP
>       test/crypto: copy offset data to OOP destination buffer
> 
> Kalesh AP (23):
>       net/bnxt: remove unused macro
>       net/bnxt: fix VNIC configuration
>       net/bnxt: fix firmware fatal error handling
>       net/bnxt: fix FW readiness check during recovery
>       net/bnxt: fix device readiness check
>       net/bnxt: fix VF info allocation
>       net/bnxt: fix HWRM and FW incompatibility handling
>       net/bnxt: mute some failure logs
>       app/testpmd: check MAC address query
>       net/bnxt: fix PCI write check
>       net/bnxt: fix link state operations
>       net/bnxt: fix timesync when PTP is not supported
>       net/bnxt: fix memory allocation for command response
>       net/bnxt: fix double free in port start failure
>       net/bnxt: fix configuring LRO
>       net/bnxt: fix health check alarm cancellation
>       net/bnxt: fix PTP support for Thor
>       net/bnxt: fix ring count calculation for Thor
>       net/bnxt: remove unnecessary forward declarations
>       net/bnxt: remove unused function parameters
>       net/bnxt: drop unused attribute
>       net/bnxt: fix single PF per port check
>       net/bnxt: prevent device access in error state
> 
> Kamil Vojanec (1):
>       net/mlx5/linux: fix firmware version
> 
> Kevin Traynor (5):
>       test/cmdline: fix inputs array
>       test/crypto: fix build with GCC 11
>       crypto/zuc: fix build with GCC 11
>       test: fix build with GCC 11
>       test/cmdline: silence clang 12 warning
> 
> Konstantin Ananyev (1):
>       acl: fix build with GCC 11
> 
> Lance Richardson (8):
>       net/bnxt: fix Rx buffer posting
>       net/bnxt: fix Tx length hint threshold
>       net/bnxt: fix handling of null flow mask
>       test: fix TCP header initialization
>       net/bnxt: fix Rx descriptor status
>       net/bnxt: fix Rx queue count
>       net/bnxt: fix dynamic VNIC count
>       eal: fix memory mapping on 32-bit target
> 
> Leyi Rong (1):
>       net/iavf: fix packet length parsing in AVX512
> 
> Li Zhang (1):
>       net/mlx5: fix flow actions index in cache
> 
> Luc Pelletier (2):
>       eal: fix race in control thread creation
>       eal: fix hang in control thread creation
> 
> Marvin Liu (5):
>       vhost: fix split ring potential buffer overflow
>       vhost: fix packed ring potential buffer overflow
>       vhost: fix batch dequeue potential buffer overflow
>       vhost: fix initialization of temporary header
>       vhost: fix initialization of async temporary header
> 
> Matan Azrad (5):
>       common/mlx5/linux: add glue function to query WQ
>       common/mlx5: add DevX command to query WQ
>       common/mlx5: add DevX commands for queue counters
>       vdpa/mlx5: fix virtq cleaning
>       vdpa/mlx5: fix device unplug
> 
> Michael Baum (1):
>       net/mlx5: fix flow age event triggering
> 
> Michal Krawczyk (5):
>       net/ena/base: improve style and comments
>       net/ena/base: fix type conversions by explicit casting
>       net/ena/base: destroy multiple wait events
>       net/ena: fix crash with unsupported device argument
>       net/ena: indicate Rx RSS hash presence
> 
> Min Hu (Connor) (25):
>       net/hns3: fix MTU config complexity
>       net/hns3: update HiSilicon copyright syntax
>       net/hns3: fix copyright date
>       examples/ptpclient: remove wrong comment
>       test/bpf: fix error message
>       doc: fix HiSilicon copyright syntax
>       net/hns3: remove unused macros
>       net/hns3: remove unused macro
>       app/eventdev: fix overflow in lcore list parsing
>       test/kni: fix a comment
>       test/kni: check init result
>       net/hns3: fix typos on comments
>       net/e1000: fix flow error message object
>       app/testpmd: fix division by zero on socket memory dump
>       net/kni: warn on stop failure
>       app/bbdev: check memory allocation
>       app/bbdev: fix HARQ error messages
>       raw/skeleton: add missing check after setting attribute
>       test/timer: check memzone allocation
>       app/crypto-perf: check memory allocation
>       examples/flow_classify: fix NUMA check of port and core
>       examples/l2fwd-cat: fix NUMA check of port and core
>       examples/skeleton: fix NUMA check of port and core
>       test: check flow classifier creation
>       test: fix division by zero
> 
> Murphy Yang (3):
>       net/ixgbe: fix RSS RETA being reset after port start
>       net/i40e: fix flow director config after flow validate
>       net/i40e: fix flow director for common pctypes
> 
> Natanael Copa (5):
>       common/dpaax/caamflib: fix build with musl
>       bus/dpaa: fix 64-bit arch detection
>       bus/dpaa: fix build with musl
>       net/cxgbe: remove use of uint type
>       app/testpmd: fix build with musl
> 
> Nipun Gupta (1):
>       bus/dpaa: fix statistics reading
> 
> Nithin Dabilpuram (3):
>       vfio: do not merge contiguous areas
>       vfio: fix DMA mapping granularity for IOVA as VA
>       test/mem: fix page size for external memory
> 
> Olivier Matz (1):
>       test/mempool: fix object initializer
> 
> Pallavi Kadam (1):
>       bus/pci: skip probing some Windows NDIS devices
> 
> Pavan Nikhilesh (4):
>       test/event: fix timeout accuracy
>       app/eventdev: fix timeout accuracy
>       app/eventdev: fix lcore parsing skipping last core
>       event/octeontx2: fix XAQ pool reconfigure
> 
> Pu Xu (1):
>       ip_frag: fix fragmenting IPv4 packet with header option
> 
> Qi Zhang (8):
>       net/ice/base: fix payload indicator on ptype
>       net/ice/base: fix uninitialized struct
>       net/ice/base: cleanup filter list on error
>       net/ice/base: fix memory allocation for MAC addresses
>       net/iavf: fix TSO max segment size
>       doc: fix matching versions in ice guide
>       net/iavf: fix wrong Tx context descriptor
>       common/iavf: fix duplicated offload bit
> 
> Radha Mohan Chintakuntla (1):
>       raw/octeontx2_dma: assign PCI device in DPI VF
> 
> Raslan Darawsheh (1):
>       ethdev: update flow item GTP QFI definition
> 
> Richael Zhuang (2):
>       test/power: add delay before checking CPU frequency
>       test/power: round CPU frequency to check
> 
> Robin Zhang (6):
>       net/i40e: announce request queue capability in PF
>       doc: update recommended versions for i40e
>       net/i40e: fix lack of MAC type when set MAC address
>       net/iavf: fix lack of MAC type when set MAC address
>       net/iavf: fix primary MAC type when starting port
>       net/i40e: fix primary MAC type when starting port
> 
> Rohit Raj (3):
>       net/dpaa2: fix getting link status
>       net/dpaa: fix getting link status
>       examples/l2fwd-crypto: fix packet length while decryption
> 
> Roy Shterman (1):
>       mem: fix freeing segments in --huge-unlink mode
> 
> Satheesh Paul (1):
>       net/octeontx2: fix VLAN filter
> 
> Savinay Dharmappa (1):
>       sched: fix traffic class oversubscription parameter
> 
> Shijith Thotton (3):
>       eventdev: fix case to initiate crypto adapter service
>       event/octeontx2: fix crypto adapter queue pair operations
>       event/octeontx2: configure crypto adapter xaq pool
> 
> Siwar Zitouni (1):
>       net/ice: fix disabling promiscuous mode
> 
> Somnath Kotur (5):
>       net/bnxt: fix xstats get
>       net/bnxt: fix Rx and Tx timestamps
>       net/bnxt: fix Tx timestamp init
>       net/bnxt: refactor multi-queue Rx configuration
>       net/bnxt: fix Rx timestamp when FIFO pending bit is set
> 
> Stanislaw Kardach (6):
>       test: proceed if timer subsystem already initialized
>       stack: allow lock-free only on relevant architectures
>       test/distributor: fix worker notification in burst mode
>       test/distributor: fix burst flush on worker quit
>       net/ena: remove endian swap functions
>       net/ena: report default ring size
> 
> Stephen Hemminger (2):
>       kni: refactor user request processing
>       net/bnxt: use prefix on global function
> 
> Suanming Mou (1):
>       net/mlx5: fix counter offset detection
> 
> Tal Shnaiderman (2):
>       eal/windows: fix default thread priority
>       eal/windows: fix return codes of pthread shim layer
> 
> Tengfei Zhang (1):
>       net/pcap: fix file descriptor leak on close
> 
> Thinh Tran (1):
>       test: fix autotest handling of skipped tests
> 
> Thomas Monjalon (18):
>       bus/pci: fix Windows kernel driver categories
>       eal: fix comment of OS-specific header files
>       buildtools: fix build with busybox
>       build: detect execinfo library on Linux
>       build: remove redundant _GNU_SOURCE definitions
>       eal: fix build with musl
>       net/igc: remove use of uint type
>       event/dlb: fix header includes for musl
>       examples/bbdev: fix header include for musl
>       drivers: fix log level after loading
>       app/regex: fix usage text
>       app/testpmd: fix usage text
>       doc: fix names of UIO drivers
>       doc: fix build with Sphinx 4
>       bus/pci: support I/O port operations with musl
>       app: fix exit messages
>       regex/octeontx2: remove unused include directory
>       doc: remove PDF requirements
> 
> Tianyu Li (1):
>       net/memif: fix Tx bps statistics for zero-copy
> 
> Timothy McDaniel (2):
>       event/dlb2: remove references to deferred scheduling
>       doc: fix runtime options in DLB2 guide
> 
> Tyler Retzlaff (1):
>       eal: add C++ include guard for reciprocal header
> 
> Vadim Podovinnikov (1):
>       net/bonding: fix LACP system address check
> 
> Venkat Duvvuru (1):
>       net/bnxt: fix queues per VNIC
> 
> Viacheslav Ovsiienko (16):
>       net/mlx5: fix external buffer pool registration for Rx queue
>       net/mlx5: fix metadata item validation for ingress flows
>       net/mlx5: fix hashed list size for tunnel flow groups
>       net/mlx5: fix UAR allocation diagnostics messages
>       common/mlx5: add timestamp format support to DevX
>       vdpa/mlx5: support timestamp format
>       net/mlx5: fix Rx metadata leftovers
>       net/mlx5: fix drop action for Direct Rules/Verbs
>       net/mlx4: fix RSS action with null hash key
>       net/mlx5: support timestamp format
>       regex/mlx5: support timestamp format
>       app/testpmd: fix segment number check
>       net/mlx5: remove drop queue function prototypes
>       net/mlx4: fix buffer leakage on device close
>       net/mlx5: fix probing device in legacy bonding mode
>       net/mlx5: fix receiving queue timestamp format
> 
> Wei Huang (1):
>       raw/ifpga: fix device name format
> 
> Wenjun Wu (3):
>       net/ice: check some functions return
>       net/ice: fix RSS hash update
>       net/ice: fix RSS for L2 packet
> 
> Wenwu Ma (1):
>       net/ice: fix illegal access when removing MAC filter
> 
> Wenzhuo Lu (2):
>       net/iavf: fix crash in AVX512
>       net/ice: fix crash in AVX512
> 
> Wisam Jaddo (1):
>       app/flow-perf: fix encap/decap actions
> 
> Xiao Wang (1):
>       vdpa/ifc: check PCI config read
> 
> Xiaoyu Min (4):
>       net/mlx5: support RSS expansion for IPv6 GRE
>       net/mlx5: fix shared inner RSS
>       net/mlx5: fix missing shared RSS hash types
>       net/mlx5: fix redundant flow after RSS expansion
> 
> Xiaoyun Li (2):
>       app/testpmd: remove unnecessary UDP tunnel check
>       net/i40e: fix IPv4 fragment offload
> 
> Xueming Li (2):
>       version: 20.11.2-rc1
>       net/virtio: fix vectorized Rx queue rearm
> 
> Youri Querry (1):
>       bus/fslmc: fix random portal hangs with qbman 5.0
> 
> Yunjian Wang (5):
>       vfio: fix API description
>       net/mlx5: fix using flow tunnel before null check
>       vfio: fix duplicated user mem map
>       net/mlx4: fix leak when configured repeatedly
>       net/mlx5: fix leak when configured repeatedly
> 
> 
> 
> -- 
> Regards,
> Kalesh A P
>

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  @ 2021-07-06  8:36  4%   ` Andrew Rybchenko
  2021-07-07  2:55  0%     ` Huisong Li
  2021-07-07  7:39  3%     ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-06  8:36 UTC (permalink / raw)
  To: Huisong Li, dev
  Cc: thomas, ferruh.yigit, konstantin.ananyev, david.marchand, Ray Kinsella

@David, could you take a look at the ABI breakage warnings for
the patch? May we ignore them since the ABI looks backward
compatible? Or should it be marked as a minor ABI change
which is backward compatible with DPDK_21?

On 7/6/21 7:10 AM, Huisong Li wrote:
> Currently, if dev_configure() is not called or fails, users can still
> call dev_start() successfully. So it is necessary to have a flag which
> indicates whether the device is configured, to control whether dev_start()
> can be called and eliminate the dependency on user invocation order.
> 
> The flag stored in "struct rte_eth_dev_data" is more reasonable than
>  "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
> primary and secondary processes, and can be independently controlled.
> However, the secondary process does not make resource allocations and
> does not call dev_configure(). These are done by the primary process
> and can be obtained or used by the secondary process. So this patch
> adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>

Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

> ---
> v1 -> v2:
>   - adjusting the description of patch.
> 
> ---
>  lib/ethdev/rte_ethdev.c      | 16 ++++++++++++++++
>  lib/ethdev/rte_ethdev_core.h |  6 +++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index c607eab..6540432 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -1356,6 +1356,13 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>  		return -EBUSY;
>  	}
>  
> +	/*
> +	 * Ensure that "dev_configured" is always 0 each time prepare to do
> +	 * dev_configure() to avoid any non-anticipated behaviour.
> +	 * And set to 1 when dev_configure() is executed successfully.
> +	 */
> +	dev->data->dev_configured = 0;
> +
>  	 /* Store original config, as rollback required on failure */
>  	memcpy(&orig_conf, &dev->data->dev_conf, sizeof(dev->data->dev_conf));
>  
> @@ -1606,6 +1613,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>  	}
>  
>  	rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, 0);
> +	dev->data->dev_configured = 1;
> +

I think it should be inserted before the trace call, since tracing
is intentionally placed right before the return, without any empty
lines in between.

>  	return 0;
>  reset_queues:
>  	eth_dev_rx_queue_config(dev, 0);
> @@ -1751,6 +1760,13 @@ rte_eth_dev_start(uint16_t port_id)
>  
>  	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_start, -ENOTSUP);
>  
> +	if (dev->data->dev_configured == 0) {
> +		RTE_ETHDEV_LOG(INFO,
> +			"Device with port_id=%"PRIu16" is not configured.\n",
> +			port_id);
> +		return -EINVAL;
> +	}
> +
>  	if (dev->data->dev_started != 0) {
>  		RTE_ETHDEV_LOG(INFO,
>  			"Device with port_id=%"PRIu16" already started\n",
> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
> index 4679d94..edf96de 100644
> --- a/lib/ethdev/rte_ethdev_core.h
> +++ b/lib/ethdev/rte_ethdev_core.h
> @@ -167,7 +167,11 @@ struct rte_eth_dev_data {
>  		scattered_rx : 1,  /**< RX of scattered packets is ON(1) / OFF(0) */
>  		all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
>  		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
> -		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
> +		lro         : 1,   /**< RX LRO is ON(1) / OFF(0) */
> +		dev_configured : 1;
> +		/**< Indicates whether the device is configured.
> +		 *   CONFIGURED(1) / NOT CONFIGURED(0).
> +		 */
>  	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>  		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
>  	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
> 


^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH] test: fix crypto_op length for sessionless case
  @ 2021-07-06 16:09  3%       ` Brandon Lo
  0 siblings, 0 replies; 200+ results
From: Brandon Lo @ 2021-07-06 16:09 UTC (permalink / raw)
  To: Gujjar, Abhinandan S
  Cc: Yigit, Ferruh, dev, jerinj, dpdklab, aconole, gakhil, Power,
	Ciara, Ali Alnubani

Hi all,

I have rerun the failing unit test. It also recreated the report, so
that category should be passing now.
Currently, I am looking more into the ABI test that is failing on
Arch, as well as the failures with DTS tests.
I will keep this thread updated.

Thanks,
Brandon

On Mon, Jul 5, 2021 at 2:30 AM Gujjar, Abhinandan S
<abhinandan.gujjar@intel.com> wrote:
>
> Hi Jerin/Akhil,
>
> Could you please review the patch?
>
> Regards
> Abhinandan
>
> > -----Original Message-----
> > From: Yigit, Ferruh <ferruh.yigit@intel.com>
> > Sent: Saturday, July 3, 2021 4:56 AM
> > To: Gujjar, Abhinandan S <abhinandan.gujjar@intel.com>; dev@dpdk.org;
> > jerinj@marvell.com; dpdklab@iol.unh.edu; aconole@redhat.com
> > Cc: gakhil@marvell.com; Power, Ciara <ciara.power@intel.com>; Ali Alnubani
> > <alialnu@nvidia.com>
> > Subject: Re: [PATCH] test: fix crypto_op length for sessionless case
> >
> > On 7/2/2021 7:08 PM, Gujjar, Abhinandan S wrote:
> > > Hi Aaron/dpdklab,
> > >
> > > This patch's CI seems to have a lot of false positives!
> > > Ferruh triggered the re-test sometime back. Now, it is reporting less.
> > > Could you please check from your end? Thanks!
> > >
> >
> > Only a malloc-related unit test is still failing, which seems unrelated to the
> > patch. I am triggering it one more time; third time lucky.
> >
> > Also, after the re-run, some tests that now pass are still shown as failing in the patchwork
> > checks table. Isn't the re-run sending the patchwork test status again?
> >
> > > Regards
> > > Abhinandan
> > >
> > >
> > >> -----Original Message-----
> > >> From: Gujjar, Abhinandan S <abhinandan.gujjar@intel.com>
> > >> Sent: Wednesday, June 30, 2021 6:17 PM
> > >> To: dev@dpdk.org; jerinj@marvell.com
> > >> Cc: gakhil@marvell.com; Gujjar, Abhinandan S
> > >> <abhinandan.gujjar@intel.com>; Power, Ciara <ciara.power@intel.com>
> > >> Subject: [PATCH] test: fix crypto_op length for sessionless case
> > >>
> > >> Currently, private_data_offset for the sessionless case is computed
> > >> wrongly: it includes extra bytes because (sizeof(struct
> > >> rte_crypto_sym_xform) * 2) is used instead of (sizeof(union
> > >> rte_event_crypto_metadata)). The resulting buffer overflow caused
> > >> corruption that led the test application to crash while freeing the ops
> > >> mempool.
> > >>
> > >> Fixes: 3c2c535ecfc0 ("test: add event crypto adapter auto-test")
> > >> Reported-by: ciara.power@intel.com
> > >>
> > >> Signed-off-by: Abhinandan Gujjar <abhinandan.gujjar@intel.com>
> > >> ---
> > >>  app/test/test_event_crypto_adapter.c | 4 ++--
> > >>  1 file changed, 2 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/app/test/test_event_crypto_adapter.c
> > >> b/app/test/test_event_crypto_adapter.c
> > >> index f689bc1f2..688ac0b2f 100644
> > >> --- a/app/test/test_event_crypto_adapter.c
> > >> +++ b/app/test/test_event_crypto_adapter.c
> > >> @@ -229,7 +229,7 @@ test_op_forward_mode(uint8_t session_less)
> > >>               first_xform = &cipher_xform;
> > >>               sym_op->xform = first_xform;
> > >>               uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
> > >> -                             (sizeof(struct rte_crypto_sym_xform) * 2);
> > >> +                             (sizeof(union
> > >> + rte_event_crypto_metadata));
> > >>               op->private_data_offset = len;
> > >>               /* Fill in private data information */
> > >>               rte_memcpy(&m_data.response_info, &response_info, @@ -
> > >> 424,7 +424,7 @@ test_op_new_mode(uint8_t session_less)
> > >>               first_xform = &cipher_xform;
> > >>               sym_op->xform = first_xform;
> > >>               uint32_t len = IV_OFFSET + MAXIMUM_IV_LENGTH +
> > >> -                             (sizeof(struct rte_crypto_sym_xform) * 2);
> > >> +                             (sizeof(union
> > >> + rte_event_crypto_metadata));
> > >>               op->private_data_offset = len;
> > >>               /* Fill in private data information */
> > >>               rte_memcpy(&m_data.response_info, &response_info,
> > >> --
> > >> 2.25.1
> > >
>


-- 

Brandon Lo

UNH InterOperability Laboratory

21 Madbury Rd, Suite 100, Durham, NH 03824

blo@iol.unh.edu

www.iol.unh.edu

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-06  8:36  4%   ` Andrew Rybchenko
@ 2021-07-07  2:55  0%     ` Huisong Li
  2021-07-07  8:25  3%       ` Andrew Rybchenko
  2021-07-07  7:39  3%     ` David Marchand
  1 sibling, 1 reply; 200+ results
From: Huisong Li @ 2021-07-07  2:55 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: thomas, ferruh.yigit, konstantin.ananyev, david.marchand, Ray Kinsella


在 2021/7/6 16:36, Andrew Rybchenko 写道:
> @David, could you take a look at the ABI breakage warnings for
> the patch. May we ignore it since ABI looks backward
> compatible? Or should be marked as a minor change ABI
> which is backward compatible with DPDK_21?
>
> On 7/6/21 7:10 AM, Huisong Li wrote:
>> Currently, if dev_configure is not called, or fails, users
>> can still call dev_start successfully. So it is necessary to have a flag
>> which indicates whether the device is configured, to control whether
>> dev_start can be called and eliminate the dependency on user invocation order.
>>
>> The flag stored in "struct rte_eth_dev_data" is more reasonable than
>>   "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
>> primary and secondary processes, and can be independently controlled.
>> However, the secondary process does not make resource allocations and
>> does not call dev_configure(). These are done by the primary process
>> and can be obtained or used by the secondary process. So this patch
>> adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
>>
>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>
>> ---
>> v1 -> v2:
>>    - adjusting the description of patch.
>>
>> ---
>>   lib/ethdev/rte_ethdev.c      | 16 ++++++++++++++++
>>   lib/ethdev/rte_ethdev_core.h |  6 +++++-
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index c607eab..6540432 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -1356,6 +1356,13 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>   		return -EBUSY;
>>   	}
>>   
>> +	/*
>> +	 * Ensure that "dev_configured" is always 0 each time prepare to do
>> +	 * dev_configure() to avoid any non-anticipated behaviour.
>> +	 * And set to 1 when dev_configure() is executed successfully.
>> +	 */
>> +	dev->data->dev_configured = 0;
>> +
>>   	 /* Store original config, as rollback required on failure */
>>   	memcpy(&orig_conf, &dev->data->dev_conf, sizeof(dev->data->dev_conf));
>>   
>> @@ -1606,6 +1613,8 @@ rte_eth_dev_configure(uint16_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
>>   	}
>>   
>>   	rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q, dev_conf, 0);
>> +	dev->data->dev_configured = 1;
>> +
> I think it should be inserted before the trace, since tracing
> is intentionally put close to return without any empty lines
> in between.
All right. Do I need to send a patch V3?
>>   	return 0;
>>   reset_queues:
>>   	eth_dev_rx_queue_config(dev, 0);
>> @@ -1751,6 +1760,13 @@ rte_eth_dev_start(uint16_t port_id)
>>   
>>   	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_start, -ENOTSUP);
>>   
>> +	if (dev->data->dev_configured == 0) {
>> +		RTE_ETHDEV_LOG(INFO,
>> +			"Device with port_id=%"PRIu16" is not configured.\n",
>> +			port_id);
>> +		return -EINVAL;
>> +	}
>> +
>>   	if (dev->data->dev_started != 0) {
>>   		RTE_ETHDEV_LOG(INFO,
>>   			"Device with port_id=%"PRIu16" already started\n",
>> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
>> index 4679d94..edf96de 100644
>> --- a/lib/ethdev/rte_ethdev_core.h
>> +++ b/lib/ethdev/rte_ethdev_core.h
>> @@ -167,7 +167,11 @@ struct rte_eth_dev_data {
>>   		scattered_rx : 1,  /**< RX of scattered packets is ON(1) / OFF(0) */
>>   		all_multicast : 1, /**< RX all multicast mode ON(1) / OFF(0). */
>>   		dev_started : 1,   /**< Device state: STARTED(1) / STOPPED(0). */
>> -		lro         : 1;   /**< RX LRO is ON(1) / OFF(0) */
>> +		lro         : 1,   /**< RX LRO is ON(1) / OFF(0) */
>> +		dev_configured : 1;
>> +		/**< Indicates whether the device is configured.
>> +		 *   CONFIGURED(1) / NOT CONFIGURED(0).
>> +		 */
>>   	uint8_t rx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>>   		/**< Queues state: HAIRPIN(2) / STARTED(1) / STOPPED(0). */
>>   	uint8_t tx_queue_state[RTE_MAX_QUEUES_PER_PORT];
>>
> .

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v4 0/3] Use WFE for spinlock and ring
    @ 2021-07-07  5:43  3% ` Ruifeng Wang
  2021-07-07  5:48  3% ` Ruifeng Wang
  2 siblings, 0 replies; 200+ results
From: Ruifeng Wang @ 2021-07-07  5:43 UTC (permalink / raw)
  Cc: dev, david.marchand, thomas, bruce.richardson, jerinj, nd,
	honnappa.nagarahalli, ruifeng.wang

The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
for a memory location to become equal to a given value'[1].

Use the API for the rte spinlock and ring implementations.
With the wait until equal APIs being stable, changes will not impact ABI.

[1] http://patches.dpdk.org/cover/62703/

v4:
Added meson option to expose WFE. (David, Bruce)

v3:
Series rebased. (David)

Gavin Hu (1):
  spinlock: use wfe to reduce contention on aarch64

Ruifeng Wang (2):
  ring: use wfe to wait for ring tail update on aarch64
  build: add option to enable wait until equal

 config/arm/meson.build                 | 2 +-
 lib/eal/include/generic/rte_spinlock.h | 4 ++--
 lib/ring/rte_ring_c11_pvt.h            | 4 ++--
 lib/ring/rte_ring_generic_pvt.h        | 3 +--
 meson_options.txt                      | 2 ++
 5 files changed, 8 insertions(+), 7 deletions(-)

-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v4 0/3] Use WFE for spinlock and ring
      2021-07-07  5:43  3% ` [dpdk-dev] [PATCH v4 0/3] " Ruifeng Wang
@ 2021-07-07  5:48  3% ` Ruifeng Wang
  2021-07-09 18:39  0%   ` Thomas Monjalon
  2 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2021-07-07  5:48 UTC (permalink / raw)
  Cc: dev, david.marchand, thomas, bruce.richardson, jerinj, nd,
	honnappa.nagarahalli, ruifeng.wang

The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
for a memory location to become equal to a given value'[1].

Use the API for the rte spinlock and ring implementations.
With the wait until equal APIs being stable, changes will not impact ABI.

[1] http://patches.dpdk.org/cover/62703/

v4:
Added meson option to expose WFE. (David, Bruce)

v3:
Series rebased. (David)

Gavin Hu (1):
  spinlock: use wfe to reduce contention on aarch64

Ruifeng Wang (2):
  ring: use wfe to wait for ring tail update on aarch64
  build: add option to enable wait until equal

 config/arm/meson.build                 | 2 +-
 lib/eal/include/generic/rte_spinlock.h | 4 ++--
 lib/ring/rte_ring_c11_pvt.h            | 4 ++--
 lib/ring/rte_ring_generic_pvt.h        | 3 +--
 meson_options.txt                      | 2 ++
 5 files changed, 8 insertions(+), 7 deletions(-)

-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-06  8:36  4%   ` Andrew Rybchenko
  2021-07-07  2:55  0%     ` Huisong Li
@ 2021-07-07  7:39  3%     ` David Marchand
  2021-07-07  8:23  0%       ` Andrew Rybchenko
  1 sibling, 1 reply; 200+ results
From: David Marchand @ 2021-07-07  7:39 UTC (permalink / raw)
  To: Andrew Rybchenko, Dodji Seketeli
  Cc: Huisong Li, dev, Thomas Monjalon, Yigit, Ferruh, Ananyev,
	Konstantin, Ray Kinsella

On Tue, Jul 6, 2021 at 10:36 AM Andrew Rybchenko
<andrew.rybchenko@oktetlabs.ru> wrote:
>
> @David, could you take a look at the ABI breakage warnings for
> the patch. May we ignore it since ABI looks backward
> compatible? Or should be marked as a minor change ABI
> which is backward compatible with DPDK_21?

The whole eth_dev_shared_data area has always been reset to 0 at the
first port allocation in a DPDK application's lifetime.
Subsequent calls to rte_eth_dev_release_port() reset every port
eth_dev->data to 0.

This bit flag is added in a hole of the structure, and it is
set/manipulated internally of ethdev.

So unless the application was doing something nasty like hijacking
this empty hole in the structure, I see no problem with the change wrt
ABI.


I wonder if libabigail is too strict on this report.
Or maybe there is some extreme consideration on what a compiler could
do about this hole...
Dodji?


For now, we can waive the warning.
I'll look into the exception rule to add.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library
  2021-07-05 17:16  0%       ` Bruce Richardson
@ 2021-07-07  8:08  0%         ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-07  8:08 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chengwen Feng, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma,
	Radha Mohan Chintakuntla

On Mon, Jul 5, 2021 at 10:46 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Mon, Jul 05, 2021 at 09:25:34PM +0530, Jerin Jacob wrote:
> >
> > On Mon, Jul 5, 2021 at 4:22 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > On Sun, Jul 04, 2021 at 03:00:30PM +0530, Jerin Jacob wrote:
> > > > On Fri, Jul 2, 2021 at 6:51 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
> > > > >
> > > > > This patch introduces 'dmadevice' which is a generic type of DMA
> > > > > device.
> <snip>
> > >
> > > +1 and the terminology with regards to queues and channels. With our ioat
> > > hardware, each HW queue was called a channel for instance.
> >
> > Looks like <dmadev> <> <channel> can cover all the use cases; if the
> > HW has more than
> > one queue, each can be exposed as a separate dmadev.
> >
>
> Fine for me.
>
> However, just to confirm that Morten's suggestion of using a
> (device-specific void *) channel pointer rather than dev_id + channel_id
> pair of parameters won't work for you? You can't store a pointer or dev
> index in the channel struct in the driver?

Yes, that will work. To confirm, the suggestion is to use a void *
object instead of channel_id.
That will avoid one more indirection (index -> pointer).


>
> >
> <snip>
> > > > > + *
> > > > > + * If dma_cookie_t is >=0 it's a DMA operation request cookie, <0 it's a error
> > > > > + * code.
> > > > > + * When using cookies, comply with the following rules:
> > > > > + * a) Cookies for each virtual queue are independent.
> > > > > + * b) For a virt queue, the cookie are monotonically incremented, when it reach
> > > > > + *    the INT_MAX, it wraps back to zero.
> > >
> > > I disagree with the INT_MAX (or INT32_MAX) value here. If we use that
> > > value, it means that we cannot use implicit wrap-around inside the CPU and
> > > have to check for the INT_MAX value. Better to:
> > > 1. Specify that it wraps at UINT16_MAX which allows us to just use a
> > > uint16_t internally and wrap-around automatically, or:
> > > 2. Specify that it wraps at a power-of-2 value >= UINT16_MAX, giving
> > > drivers the flexibility at what value to wrap around.
> >
> > I think (2) is better than (1). I think it is even better to wrap around the number of
> > descriptors configured in dev_configure() (we can make this a power of 2),
> >
>
> Interesting, I hadn't really considered that before. My only concern
> would be if an app wants to keep values in the app ring for a while after
> they have been returned from dmadev. I thought it easier to have the full
> 16-bit counter value returned to the user to give the most flexibility,
> given that going from that to any power-of-2 ring size smaller is a trivial
> operation.
>
> Overall, while my ideal situation is to always have a 0..UINT16_MAX return
> value from the function, I can live with your suggestion of wrapping at
> ring_size, since drivers will likely do that internally anyway.
> I think wrapping at INT32_MAX is too awkward and will be error prone since
> we can't rely on hardware automatically wrapping to zero, nor on the driver
> having pre-masked the value.

OK. +1 for UINT16_MAX

>
> > >
> > > > > + * c) The initial cookie of a virt queue is zero, after the device is stopped or
> > > > > + *    reset, the virt queue's cookie needs to be reset to zero.
> <snip>
> > > >
> > > > Please add some good amount of reserved bits and have API to init this
> > > > structure for future ABI stability, say rte_dmadev_queue_config_init()
> > > > or so.
> > > >
> > >
> > > I don't think that is necessary. Since the config struct is used only as
> > > parameter to the config function, any changes to it can be managed by
> > > versioning that single function. Padding would only be necessary if we had
> > > an array of these config structs somewhere.
> >
> > OK.
> >
> > For some reason, the versioning API looks ugly to me in code instead of keeping
> > some rsvd fields look cool to me with init function.
> >
> > But I agree. function versioning works in this case. No need to find other API
> > if tt is not general DPDK API practice.
> >
>
> The one thing I would suggest instead of the padding is for the internal
> APIS, to pass the struct size through, since we can't version those - and
> for padding we can't know whether any replaced padding should be used or
> not. Specifically:
>
>         typedef int (*rte_dmadev_configure_t)(struct rte_dmadev *dev, struct
>                         rte_dmadev_conf *cfg, size_t cfg_size);
>
> but for the public function:
>
>         int
>         rte_dmadev_configure(struct rte_dmadev *dev, struct
>                         rte_dmadev_conf *cfg)
>         {
>                 ...
>                 ret = dev->ops.configure(dev, cfg, sizeof(*cfg));
>                 ...
>         }

Makes sense.

>
> Then if we change the structure and version the config API, the driver can
> tell from the size what struct version it is and act accordingly. Without
> that, each time the struct changed, we'd have to add a new function pointer
> to the device ops.
>
> > In other libraries, I have seen such _init or function that can use
> > for this as well as filling default value
> > in some cases implementation values is not zero).
> > So that application can avoid memset for param structure.
> > Added rte_event_queue_default_conf_get() in eventdev spec for this.
> >
>
> I think that would largely have the same issues, unless it returned a
> pointer to data inside the driver - and which therefore could not be
> modified. Alternatively it would mean that the memory would have been
> allocated in the driver and we would need to ensure proper cleanup
> functions were called to free memory afterwards. Supporting having the
> config parameter as a local variable I think makes things a lot easier.
>
> > No strong opinion on this.
> >
> >
> >
> > >
> > > >
> > > > > +
> > > > > +/**
> > > > > + * A structure used to retrieve information of a DMA virt queue.
> > > > > + */
> > > > > +struct rte_dmadev_queue_info {
> > > > > +       enum dma_transfer_direction direction;
> > > >
> > > > A queue may support all directions so I think it should be a bitfield.
> > > >
> > > > > +       /**< Associated transfer direction */
> > > > > +       uint16_t hw_queue_id; /**< The HW queue on which to create virt queue */
> > > > > +       uint16_t nb_desc; /**< Number of descriptor for this virt queue */
> > > > > +       uint64_t dev_flags; /**< Device specific flags */
> > > > > +};
> > > > > +
> > > >
> > > > > +__rte_experimental
> > > > > +static inline dma_cookie_t
> > > > > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id,
> > > > > +                  const struct dma_scatterlist *sg,
> > > > > +                  uint32_t sg_len, uint64_t flags)
> > > >
> > > > I would like to change this as:
> > > > rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vq_id, const struct
> > > > rte_dma_sg *src, uint32_t nb_src,
> > > > const struct rte_dma_sg *dst, uint32_t nb_dst) or so allow the use case like

In the above syntax, @Chengchang Tang,
rte_dma_sg needs to contain only ptr and size.
> > > > src 30 MB copy can be splitted as written as 1 MB x 30 dst.
> > > >
>
> Out of interest, do you see much benefit (and in what way) from having the
> scatter-gather support? Unlike sending 5 buffers in one packet rather than
> 5 buffers in 5 packets to a NIC, copying an array of memory in one op vs
> multiple is functionally identical.

Knowing upfront, in one shot, that the segments are expressed this way
allows better optimization
in drivers, e.g.:
1) In one DMA job request the HW can fill multiple segments, vs multiple
DMA job requests with one segment each.
2) A single completion, i.e. less system overhead.
3) Less latency for the job requests.


>
> > > >
> > > >
> <snip>
> > Got it. In order to save space if first CL size for fastpath(Saving 8B
> > for the pointer) and to avoid
> > function overhead, Can we use one bit of flags of op function to
> > enable the fence?
> >
>
> The original ioat implementation did exactly that. However, I then
> discovered that because a fence logically belongs between two operations,
> does the fence flag on an operation mean "don't do any jobs after this
> until this job has completed" or does it mean "don't start this job until
> all previous jobs have completed". [Or theoretically does it mean both :-)]
> Naturally, some hardware does it the former way (i.e. fence flag goes on
> last op before fence), while other hardware the latter way (i.e. fence flag
> goes on first op after the fence). Therefore, since fencing is about
> ordering *between* two (sets of) jobs, I decided that it should do exactly
> that and go between two jobs, so there is no ambiguity!
>
> However, I'm happy enough to switch to having a fence flag, but I think if
> we do that, it should be put in the "first job after fence" case, because
> it is always easier to modify a previously written job if we need to, than
> to save the flag for a future one.
>
> Alternatively, if we keep the fence as a separate function, I'm happy
> enough for it not to be on the same cacheline as the "hot" operations,
> since fencing will always introduce a small penalty anyway.

Ack.
You may consider two flags, FENCE_THEN_JOB and JOB_THEN_FENCE (if
there is any use case for this, or it makes sense for your HW).


For us, fence is a NOP, as we have an implicit fence between each
HW job descriptor.


>
> > >
> > > >
> <snip>
> > > > Since we have additional function call overhead in all the
> > > > applications for this scheme, I would like to understand
> > > > the use of doing this way vs enq does the doorbell implicitly from
> > > > driver/application PoV?
> > > >
> > >
> > > In our benchmarks it's just faster. When we tested it, the overhead of the
> > > function calls was noticably less than the cost of building up the
> > > parameter array(s) for passing the jobs in as a burst. [We don't see this
> > > cost with things like NIC I/O since DPDK tends to already have the mbuf
> > > fully populated before the TX call anyway.]
> >
> > OK. I agree with stack population.
> >
> > My question was more on doing implicit doorbell update enq. Is doorbell write
> > costly in other HW compare to a function call? In our HW, it is just write of
> > the number of instructions written in a register.
> >
> > Also, we need to again access the internal PMD memory structure to find
> > where to write etc if it is a separate function.
> >
>
> The cost varies depending on a number of factors - even writing to a single
> HW register can be very slow if that register is mapped as device
> (uncacheable) memory, since (AFAIK) it will act as a full fence and wait

I don't know. At least in our case, writes are write-back, so the core does not need
to wait (if there is no read operation).

> for the write to go all the way to hardware. For more modern HW, the cost
> can be lighter. However, any cost of HW writes is going to be the same
> whether its a separate function call or not.
>
> However, the main thing about the doorbell update is that it's a
> once-per-burst thing, rather than a once-per-job. Therefore, even if you
> have to re-read the struct memory (which is likely still somewhere in your
> cores' cache), any extra small cost of doing so is to be amortized over the
> cost of a whole burst of copies.

The Linux kernel has an xmit_more flag in the skb to address a similar thing,
i.e. the enqueue job flags could have one more bit to say whether to ring the doorbell or not,
rather than having yet another function call overhead. IMO, it is the best of both worlds.


>
> >
> > >
> > > >
> <snip>
> > > > > +
> > > > > +/**
> > > > > + * @warning
> > > > > + * @b EXPERIMENTAL: this API may change without prior notice.
> > > > > + *
> > > > > + * Returns the number of operations that failed to complete.
> > > > > + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> > > > > + *
> > > > > + * @param dev_id
> > > > > + *   The identifier of the device.
> > > > > + * @param vq_id
> > > > > + *   The identifier of virt queue.
> > > > (> + * @param nb_status
> > > > > + *   Indicates the size  of status array.
> > > > > + * @param[out] status
> > > > > + *   The error code of operations that failed to complete.
> > > > > + * @param[out] cookie
> > > > > + *   The last failed completed operation's cookie.
> > > > > + *
> > > > > + * @return
> > > > > + *   The number of operations that failed to complete.
> > > > > + *
> > > > > + * NOTE: The caller must ensure that the input parameter is valid and the
> > > > > + *       corresponding device supports the operation.
> > > > > + */
> > > > > +__rte_experimental
> > > > > +static inline uint16_t
> > > > > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id,
> > > > > +                          const uint16_t nb_status, uint32_t *status,
> > > > > +                          dma_cookie_t *cookie)
> > > >
> > > > IMO, it is better to move cookie/rind_idx at 3.
> > > > Why it would return any array of errors? since it called after
> > > > rte_dmadev_completed() has
> > > > has_error. Is it better to change
> > > >
> > > > rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, dma_cookie_t
> > > > *cookie,  uint32_t *status)
> > > >
> > > > I also think, we may need to set status as bitmask and enumerate all
> > > > the combination of error codes
> > > > of all the driver and return string from driver existing rte_flow_error
> > > >
> > > > See
> > > > struct rte_flow_error {
> > > >         enum rte_flow_error_type type; /**< Cause field and error types. */
> > > >         const void *cause; /**< Object responsible for the error. */
> > > >         const char *message; /**< Human-readable error message. */
> > > > };
> > > >
> > >
> > > I think we need a multi-return value API here, as we may add operations in
> > > future which have non-error status values to return. The obvious case is
> > > DMA engines which support "compare" operations. In that case a successful
> > > compare (as in there were no DMA or HW errors) can return "equal" or
> > > "not-equal" as statuses. For general "copy" operations, the faster
> > > completion op can be used to just return successful values (and only call
> > > this status version on error), while apps using those compare ops or a
> > > mixture of copy and compare ops, would always use the slower one that
> > > returns status values for each and every op..
> > >
> > > The ioat APIs used 32-bit integer values for this status array so as to
> > > allow e.g. 16-bits for error code and 16-bits for future status values. For
> > > most operations there should be a fairly small set of things that can go
> > > wrong, i.e. bad source address, bad destination address or invalid length.
> > > Within that we may have a couple of specifics for why an address is bad,
> > > but even so I don't think we need to start having multiple bit
> > > combinations.
> >
> > OK. What is the purpose of the error status? Is it for the application to print it, or
> > does the application need to take any action based on specific error codes?
>
> It's largely for information purposes, but in the case of SVA/SVM errors
> could occur due to the memory not being pinned, i.e. a page fault, in some
> cases. If that happens, then it's up the app to either touch the memory and
> retry the copy, or to do a SW memcpy as a fallback.
>
> In other error cases, I think it's good to tell the application if it's
> passing around bad data, or data that is beyond the scope of hardware, e.g.
> a copy that is beyond what can be done in a single transaction for a HW
> instance. Given that there are always things that can go wrong, I think we
> need some error reporting mechanism.
>
> > If the former is scope, then we need to define the standard enum value
> > for the error right?
> > ie. uint32_t *status needs to change to enum rte_dma_error or so.
> >
> Sure. Perhaps an error/status structure either is an option, where we
> explicitly call out error info from status info.

Agree. Better to have a structure with fields like:

1)  enum rte_dma_error_type
2)  memory to store an informative message on the finer aspects of the
error, like which address caused the issue etc. (which will be
driver-specific information).


>
> >
> >
> <snip to end>
>
> /Bruce

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-07  7:39  3%     ` David Marchand
@ 2021-07-07  8:23  0%       ` Andrew Rybchenko
  2021-07-07  9:36  0%         ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-07  8:23 UTC (permalink / raw)
  To: David Marchand, Dodji Seketeli
  Cc: Huisong Li, dev, Thomas Monjalon, Yigit, Ferruh, Ananyev,
	Konstantin, Ray Kinsella

On 7/7/21 10:39 AM, David Marchand wrote:
> On Tue, Jul 6, 2021 at 10:36 AM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
>>
>> @David, could you take a look at the ABI breakage warnings for
>> the patch. May we ignore it since ABI looks backward
>> compatible? Or should be marked as a minor change ABI
>> which is backward compatible with DPDK_21?
> 
> The whole eth_dev_shared_data area has always been reset to 0 at the
> first port allocation in a dpdk application life.
> Subsequent calls to rte_eth_dev_release_port() reset every port
> eth_dev->data to 0.
> 
> This bit flag is added in a hole of the structure, and it is
> set/manipulated internally of ethdev.
> 
> So unless the application was doing something nasty like highjacking
> this empty hole in the structure, I see no problem with the change wrt
> ABI.
> 
> 
> I wonder if libabigail is too strict on this report.
> Or maybe there is some extreme consideration on what a compiler could
> do about this hole...

I was wondering if it could be any specifics related to big-
little endian vs bit fields placement, but throw the idea
away...

> Dodji?
> 
> 
> For now, we can waive the warning.
> I'll look into the exception rule to add.

Thanks a lot. I'll hold on the patch for now.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-07  2:55  0%     ` Huisong Li
@ 2021-07-07  8:25  3%       ` Andrew Rybchenko
  2021-07-07  9:26  0%         ` Huisong Li
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-07  8:25 UTC (permalink / raw)
  To: Huisong Li, dev
  Cc: thomas, ferruh.yigit, konstantin.ananyev, david.marchand, Ray Kinsella

On 7/7/21 5:55 AM, Huisong Li wrote:
> 
> 在 2021/7/6 16:36, Andrew Rybchenko 写道:
>> @David, could you take a look at the ABI breakage warnings for
>> the patch. May we ignore it since ABI looks backward
>> compatible? Or should be marked as a minor change ABI
>> which is backward compatible with DPDK_21?
>>
>> On 7/6/21 7:10 AM, Huisong Li wrote:
>>> Currently, if dev_configure is not called, or fails, users
>>> can still call dev_start successfully. So it is necessary to have a flag
>>> which indicates whether the device is configured, to control whether
>>> dev_start can be called and eliminate the dependency on user invocation
>>> order.
>>>
>>> The flag stored in "struct rte_eth_dev_data" is more reasonable than
>>>   "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
>>> primary and secondary processes, and can be independently controlled.
>>> However, the secondary process does not make resource allocations and
>>> does not call dev_configure(). These are done by the primary process
>>> and can be obtained or used by the secondary process. So this patch
>>> adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
>>>
>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>
>>> ---
>>> v1 -> v2:
>>>    - adjusting the description of patch.
>>>
>>> ---
>>>   lib/ethdev/rte_ethdev.c      | 16 ++++++++++++++++
>>>   lib/ethdev/rte_ethdev_core.h |  6 +++++-
>>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>> index c607eab..6540432 100644
>>> --- a/lib/ethdev/rte_ethdev.c
>>> +++ b/lib/ethdev/rte_ethdev.c
>>> @@ -1356,6 +1356,13 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>           return -EBUSY;
>>>       }
>>>   +    /*
>>> +     * Ensure that "dev_configured" is always 0 each time we prepare to
>>> +     * do dev_configure(), to avoid any unanticipated behaviour.
>>> +     * It is set to 1 when dev_configure() executes successfully.
>>> +     */
>>> +    dev->data->dev_configured = 0;
>>> +
>>>        /* Store original config, as rollback required on failure */
>>>       memcpy(&orig_conf, &dev->data->dev_conf,
>>> sizeof(dev->data->dev_conf));
>>>   @@ -1606,6 +1613,8 @@ rte_eth_dev_configure(uint16_t port_id,
>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>       }
>>>         rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q,
>>> dev_conf, 0);
>>> +    dev->data->dev_configured = 1;
>>> +
>> I think it should be inserted before the trace, since tracing
>> is intentionally put close to return without any empty lines
>> in between.
> All right. Do I need to send a patch V3?

Since the patch is waiting for resolution of the ABI warning,
please send v3 with my Reviewed-by and ack from Konstantin.
It will be a bit easier to apply when it is OK to do it.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-07  8:25  3%       ` Andrew Rybchenko
@ 2021-07-07  9:26  0%         ` Huisong Li
  0 siblings, 0 replies; 200+ results
From: Huisong Li @ 2021-07-07  9:26 UTC (permalink / raw)
  To: Andrew Rybchenko, dev
  Cc: thomas, ferruh.yigit, konstantin.ananyev, david.marchand, Ray Kinsella


在 2021/7/7 16:25, Andrew Rybchenko 写道:
> On 7/7/21 5:55 AM, Huisong Li wrote:
>> 在 2021/7/6 16:36, Andrew Rybchenko 写道:
>>> @David, could you take a look at the ABI breakage warnings for
>>> the patch. May we ignore it since ABI looks backward
>>> compatible? Or should be marked as a minor change ABI
>>> which is backward compatible with DPDK_21?
>>>
>>> On 7/6/21 7:10 AM, Huisong Li wrote:
>>>> Currently, if dev_configure is not called, or fails, users can still
>>>> call dev_start successfully. So it is necessary to have a flag which
>>>> indicates whether the device is configured, to control whether
>>>> dev_start can be called and to eliminate the dependency on user
>>>> invocation order.
>>>>
>>>> The flag stored in "struct rte_eth_dev_data" is more reasonable than
>>>>    "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
>>>> primary and secondary processes, and can be independently controlled.
>>>> However, the secondary process does not make resource allocations and
>>>> does not call dev_configure(). These are done by the primary process
>>>> and can be obtained or used by the secondary process. So this patch
>>>> adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
>>>>
>>>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>>> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>>>
>>>> ---
>>>> v1 -> v2:
>>>>     - adjusting the description of patch.
>>>>
>>>> ---
>>>>    lib/ethdev/rte_ethdev.c      | 16 ++++++++++++++++
>>>>    lib/ethdev/rte_ethdev_core.h |  6 +++++-
>>>>    2 files changed, 21 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>>>> index c607eab..6540432 100644
>>>> --- a/lib/ethdev/rte_ethdev.c
>>>> +++ b/lib/ethdev/rte_ethdev.c
>>>> @@ -1356,6 +1356,13 @@ rte_eth_dev_configure(uint16_t port_id,
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>            return -EBUSY;
>>>>        }
>>>>    +    /*
>>>> +     * Ensure that "dev_configured" is always 0 each time we prepare to
>>>> +     * do dev_configure(), to avoid any unanticipated behaviour.
>>>> +     * It is set to 1 when dev_configure() executes successfully.
>>>> +     */
>>>> +    dev->data->dev_configured = 0;
>>>> +
>>>>         /* Store original config, as rollback required on failure */
>>>>        memcpy(&orig_conf, &dev->data->dev_conf,
>>>> sizeof(dev->data->dev_conf));
>>>>    @@ -1606,6 +1613,8 @@ rte_eth_dev_configure(uint16_t port_id,
>>>> uint16_t nb_rx_q, uint16_t nb_tx_q,
>>>>        }
>>>>          rte_ethdev_trace_configure(port_id, nb_rx_q, nb_tx_q,
>>>> dev_conf, 0);
>>>> +    dev->data->dev_configured = 1;
>>>> +
>>> I think it should be inserted before the trace, since tracing
>>> is intentionally put close to return without any empty lines
>>> in between.
>> All right. Do I need to send a patch V3?
> Since the patch is waiting for resolution of the ABI warning,
> please send v3 with my Reviewed-by and ack from Konstantin.
> It will be a bit easier to apply when it is OK to do it.
> .
ok. I will send patch V3.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-07  8:23  0%       ` Andrew Rybchenko
@ 2021-07-07  9:36  0%         ` David Marchand
  2021-07-07  9:59  0%           ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2021-07-07  9:36 UTC (permalink / raw)
  To: Andrew Rybchenko
  Cc: Dodji Seketeli, Huisong Li, dev, Thomas Monjalon, Yigit, Ferruh,
	Ananyev, Konstantin, Ray Kinsella

On Wed, Jul 7, 2021 at 10:23 AM Andrew Rybchenko
<andrew.rybchenko@oktetlabs.ru> wrote:
>
> On 7/7/21 10:39 AM, David Marchand wrote:
> > On Tue, Jul 6, 2021 at 10:36 AM Andrew Rybchenko
> > <andrew.rybchenko@oktetlabs.ru> wrote:
> >>
> >> @David, could you take a look at the ABI breakage warnings for
> >> the patch. May we ignore it since ABI looks backward
> >> compatible? Or should be marked as a minor change ABI
> >> which is backward compatible with DPDK_21?
> >
> > The whole eth_dev_shared_data area has always been reset to 0 at the
> > first port allocation in a dpdk application life.
> > Subsequent calls to rte_eth_dev_release_port() reset every port
> > eth_dev->data to 0.
> >
> > This bit flag is added in a hole of the structure, and it is
> > set/manipulated internally of ethdev.
> >
> > So unless the application was doing something nasty like highjacking
> > this empty hole in the structure, I see no problem with the change wrt
> > ABI.
> >
> >
> > I wonder if libabigail is too strict on this report.
> > Or maybe there is some extreme consideration on what a compiler could
> > do about this hole...
>
> I was wondering if it could be some specifics of big- vs
> little-endian bit-field placement, but I threw the idea
> away...

After some discussion offlist with (fairly busy ;-)) Dodji, the report
here is a good warning.

But it looks like we have an issue with libabigail not properly
computing bitfield offsets.
I just opened a bz for tracking
https://sourceware.org/bugzilla/show_bug.cgi?id=28060

This is problematic, as the following rule does not work:

+; Ignore bitfields added in rte_eth_dev_data hole
+[suppress_type]
+        name = rte_eth_dev_data
+        has_data_member_inserted_between = {offset_after(lro),
offset_of(rx_queue_state)}

On the other hand, a (wrong) rule with "has_data_member_inserted_at =
2" (2 being the wrong offset you can read in abidiff output) works.

This might force us to waive all changes to rte_eth_dev_data... not
that I am happy about it.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH V2] ethdev: add dev configured flag
  2021-07-07  9:36  0%         ` David Marchand
@ 2021-07-07  9:59  0%           ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-07  9:59 UTC (permalink / raw)
  To: Andrew Rybchenko, David Marchand
  Cc: Dodji Seketeli, Huisong Li, dev, Yigit, Ferruh, Ananyev,
	Konstantin, Ray Kinsella

07/07/2021 11:36, David Marchand:
> On Wed, Jul 7, 2021 at 10:23 AM Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru> wrote:
> >
> > On 7/7/21 10:39 AM, David Marchand wrote:
> > > On Tue, Jul 6, 2021 at 10:36 AM Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru> wrote:
> > >>
> > >> @David, could you take a look at the ABI breakage warnings for
> > >> the patch. May we ignore it since ABI looks backward
> > >> compatible? Or should be marked as a minor change ABI
> > >> which is backward compatible with DPDK_21?
> > >
> > > The whole eth_dev_shared_data area has always been reset to 0 at the
> > > first port allocation in a dpdk application life.
> > > Subsequent calls to rte_eth_dev_release_port() reset every port
> > > eth_dev->data to 0.
> > >
> > > This bit flag is added in a hole of the structure, and it is
> > > set/manipulated internally of ethdev.
> > >
> > > So unless the application was doing something nasty like highjacking
> > > this empty hole in the structure, I see no problem with the change wrt
> > > ABI.
> > >
> > >
> > > I wonder if libabigail is too strict on this report.
> > > Or maybe there is some extreme consideration on what a compiler could
> > > do about this hole...
> >
> > I was wondering if it could be some specifics of big- vs
> > little-endian bit-field placement, but I threw the idea
> > away...
> 
> After some discussion offlist with (fairly busy ;-)) Dodji, the report
> here is a good warning.
> 
> But it looks like we have an issue with libabigail not properly
> computing bitfield offsets.
> I just opened a bz for tracking
> https://sourceware.org/bugzilla/show_bug.cgi?id=28060
> 
> This is problematic, as the following rule does not work:
> 
> +; Ignore bitfields added in rte_eth_dev_data hole
> +[suppress_type]
> +        name = rte_eth_dev_data
> +        has_data_member_inserted_between = {offset_after(lro),
> offset_of(rx_queue_state)}
> 
> On the other hand, a (wrong) rule with "has_data_member_inserted_at =
> 2" (2 being the wrong offset you can read in abidiff output) works.
> 
> This might force us to waive all changes to rte_eth_dev_data... not
> that I am happy about it.

We are not going to do other changes until 21.11, so it could be fine.



^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH v7 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-07-07 10:48  3%             ` Anatoly Burakov
  2021-07-07 10:48  3%             ` [dpdk-dev] [PATCH v7 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-07 10:48 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: ciara.loftus, david.hunt

Previously, the semantics of power monitor were such that we checked
the current value against an expected value, and if they matched, the
sleep was aborted. This was somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
its own comparison semantics and decide how to detect the need to
abort entering the power-optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index cd02820e68..c1d063bb11 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -117,6 +117,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 8d65f287f4..65f325ede1 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..17370b77dc 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v7 4/7] power: remove thread safety from PMD power API's
    2021-07-07 10:48  3%             ` [dpdk-dev] [PATCH v7 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-07-07 10:48  3%             ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-07 10:48 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: konstantin.ananyev, ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like RCU for this use case, but absent a
pressing need for thread safety we go the easy way and just mandate
that the APIs are to be called when all affected ports are stopped,
and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index c1d063bb11..4b84c89c0b 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -119,6 +119,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these APIs, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [dpdk-announce] DPDK 20.11.2 released
@ 2021-07-07 12:37  1% Xueming(Steven) Li
  0 siblings, 0 replies; 200+ results
From: Xueming(Steven) Li @ 2021-07-07 12:37 UTC (permalink / raw)
  To: announce

Hi all,

Here is a new stable release:
	https://fast.dpdk.org/rel/dpdk-20.11.2.tar.xz

The git tree is at:
	https://git.dpdk.org/dpdk-stable/log/?h=20.11

Special thanks to Luca for his great help on this version!

Xueming Li <xuemingl@nvidia.com>

---
 .ci/linux-build.sh                                 |  59 ++-
 .github/workflows/build.yml                        | 130 +++++
 .travis.yml                                        |  51 +-
 MAINTAINERS                                        |   1 +
 VERSION                                            |   2 +-
 app/meson.build                                    |   4 -
 app/test-bbdev/test_bbdev_perf.c                   |   7 +-
 app/test-compress-perf/comp_perf_options_parse.c   |   2 +-
 app/test-crypto-perf/cperf_options_parsing.c       |   8 +-
 app/test-eventdev/evt_options.c                    |   4 +-
 app/test-eventdev/parser.c                         |   4 +-
 app/test-eventdev/parser.h                         |   2 +-
 app/test-eventdev/test_perf_common.c               |  22 +-
 app/test-flow-perf/main.c                          |  47 +-
 app/test-pmd/bpf_cmd.c                             |   2 +-
 app/test-pmd/cmdline.c                             |  29 +-
 app/test-pmd/cmdline_flow.c                        |   2 +
 app/test-pmd/config.c                              | 108 +++-
 app/test-pmd/parameters.c                          |  39 +-
 app/test-pmd/testpmd.c                             |  35 +-
 app/test-pmd/testpmd.h                             |   3 +-
 app/test-regex/main.c                              |   7 +-
 app/test/autotest_test_funcs.py                    |   5 +-
 app/test/meson.build                               |   3 -
 app/test/packet_burst_generator.c                  |   1 +
 app/test/process.h                                 |  10 +-
 app/test/test.c                                    |  11 +-
 app/test/test_bpf.c                                |   2 +-
 app/test/test_cmdline_ipaddr.c                     |   2 +-
 app/test/test_cmdline_num.c                        |   4 +-
 app/test/test_cryptodev.c                          |  44 +-
 app/test/test_cryptodev_blockcipher.c              |   2 +-
 app/test/test_debug.c                              |  11 +-
 app/test/test_distributor_perf.c                   |   6 +-
 app/test/test_event_timer_adapter.c                |   4 +-
 app/test/test_external_mem.c                       |   3 +-
 app/test/test_flow_classify.c                      |   6 +
 app/test/test_kni.c                                |   8 +-
 app/test/test_mbuf.c                               |   9 +-
 app/test/test_mempool.c                            |   2 +-
 app/test/test_power_cpufreq.c                      |  97 +++-
 app/test/test_prefetch.c                           |   2 +-
 app/test/test_reciprocal_division_perf.c           |  41 +-
 app/test/test_stack.c                              |   4 +
 app/test/test_stack_perf.c                         |   4 +
 app/test/test_table_tables.c                       |   3 +-
 app/test/test_timer_secondary.c                    |   8 +-
 app/test/test_trace_perf.c                         |   5 +-
 buildtools/binutils-avx512-check.sh                |   2 +-
 buildtools/check-symbols.sh                        |   2 +-
 buildtools/list-dir-globs.py                       |   2 +-
 buildtools/map-list-symbol.sh                      |   2 +-
 buildtools/meson.build                             |   2 +-
 config/meson.build                                 |   9 +-
 config/ppc/meson.build                             |  17 +-
 devtools/check-symbol-maps.sh                      |   3 +-
 devtools/checkpatches.sh                           |   3 +-
 doc/api/doxy-api.conf.in                           |   3 +-
 doc/guides/conf.py                                 |  49 +-
 doc/guides/contributing/documentation.rst          |  74 +--
 doc/guides/cryptodevs/caam_jr.rst                  |   2 +-
 doc/guides/cryptodevs/qat.rst                      |   2 +-
 doc/guides/cryptodevs/virtio.rst                   |   2 +-
 doc/guides/eventdevs/dlb2.rst                      |  41 +-
 doc/guides/linux_gsg/linux_drivers.rst             |  10 +
 doc/guides/nics/enic.rst                           |  32 +-
 doc/guides/nics/hns3.rst                           |   6 +-
 doc/guides/nics/i40e.rst                           |   2 +-
 doc/guides/nics/ice.rst                            |   2 +-
 doc/guides/nics/netvsc.rst                         |   2 +-
 doc/guides/nics/nfp.rst                            |  10 +-
 doc/guides/nics/virtio.rst                         |   5 +-
 doc/guides/nics/vmxnet3.rst                        |   3 +-
 doc/guides/prog_guide/vhost_lib.rst                |  12 +
 doc/guides/rel_notes/known_issues.rst              |  10 +-
 doc/guides/rel_notes/release_20_05.rst             |   7 +
 doc/guides/rel_notes/release_20_11.rst             | 556 +++++++++++++++++++++
 doc/guides/sample_app_ug/vhost.rst                 |   2 +-
 doc/guides/testpmd_app_ug/run_app.rst              |  10 +-
 doc/guides/testpmd_app_ug/testpmd_funcs.rst        |   3 +-
 drivers/bus/dpaa/base/fman/fman_hw.c               |  33 +-
 drivers/bus/dpaa/base/fman/netcfg_layer.c          |   4 +-
 drivers/bus/dpaa/base/qbman/bman_driver.c          |  13 +-
 drivers/bus/dpaa/base/qbman/qman_driver.c          |  17 +-
 drivers/bus/dpaa/include/fsl_qman.h                |   2 +-
 drivers/bus/dpaa/include/netcfg.h                  |   1 -
 drivers/bus/fslmc/fslmc_logs.h                     |   2 -
 drivers/bus/fslmc/qbman/include/compat.h           |   3 -
 drivers/bus/fslmc/qbman/qbman_portal.c             |  14 +-
 drivers/bus/pci/linux/pci_uio.c                    |  12 +
 drivers/bus/pci/rte_bus_pci.h                      |  13 +-
 drivers/bus/pci/windows/pci.c                      |  28 +-
 drivers/common/dpaax/caamflib/compat.h             |  12 +-
 drivers/common/dpaax/compat.h                      |   5 -
 drivers/common/dpaax/dpaax_iova_table.c            |   4 +-
 drivers/common/dpaax/meson.build                   |   1 -
 drivers/common/iavf/virtchnl.h                     |   6 +-
 drivers/common/mlx5/linux/mlx5_glue.c              |  18 +
 drivers/common/mlx5/linux/mlx5_glue.h              |   2 +
 drivers/common/mlx5/mlx5_common.c                  |   9 +-
 drivers/common/mlx5/mlx5_devx_cmds.c               | 140 +++++-
 drivers/common/mlx5/mlx5_devx_cmds.h               |  16 +
 drivers/common/mlx5/mlx5_prm.h                     | 155 +++++-
 drivers/common/mlx5/version.map                    |   5 +-
 drivers/common/octeontx2/otx2_mbox.h               |   7 +
 drivers/common/qat/qat_device.h                    |   2 +-
 drivers/common/sfc_efx/base/ef10_filter.c          |  11 +-
 drivers/common/sfc_efx/base/ef10_nic.c             |  10 +-
 drivers/common/sfc_efx/base/efx_mae.c              |  61 ++-
 drivers/common/sfc_efx/base/efx_mcdi.c             |  10 +
 drivers/common/sfc_efx/base/efx_pci.c              |   3 +-
 drivers/common/sfc_efx/base/rhead_nic.c            |   1 -
 drivers/compress/qat/qat_comp.c                    |   7 +-
 drivers/compress/qat/qat_comp_pmd.c                | 111 ++--
 drivers/crypto/bcmfs/bcmfs_logs.c                  |  17 +-
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c        |  50 +-
 drivers/crypto/dpaa_sec/dpaa_sec.c                 |  14 +
 drivers/crypto/octeontx/otx_cryptodev_ops.c        |   4 +-
 drivers/crypto/octeontx2/otx2_cryptodev_qp.h       |   4 +-
 drivers/crypto/qat/qat_sym.c                       |  10 +-
 drivers/crypto/zuc/rte_zuc_pmd.c                   |   8 +-
 drivers/event/dlb/dlb.c                            |   2 +-
 drivers/event/dlb/pf/dlb_pf.c                      |   3 +-
 drivers/event/dlb2/dlb2.c                          |   2 +-
 drivers/event/dlb2/dlb2_priv.h                     |   3 -
 drivers/event/dlb2/pf/dlb2_pf.c                    |   3 +-
 drivers/event/dpaa2/dpaa2_eventdev_logs.h          |   2 -
 drivers/event/octeontx2/otx2_evdev.c               |  65 ++-
 drivers/event/octeontx2/otx2_evdev_adptr.c         |   2 +-
 drivers/event/octeontx2/otx2_evdev_crypto_adptr.c  | 110 ++--
 drivers/meson.build                                |   2 +-
 drivers/net/af_xdp/rte_eth_af_xdp.c                |   3 +-
 drivers/net/ark/ark_ethdev.c                       |   3 +
 drivers/net/ark/ark_ethdev_rx.c                    |  49 +-
 drivers/net/ark/ark_pktdir.c                       |   2 +-
 drivers/net/ark/ark_pktdir.h                       |   2 +-
 drivers/net/atlantic/atl_ethdev.c                  |   7 +-
 drivers/net/bnx2x/bnx2x.h                          |  13 +-
 drivers/net/bnx2x/bnx2x_rxtx.c                     |  13 +-
 drivers/net/bnxt/bnxt.h                            |  23 +-
 drivers/net/bnxt/bnxt_cpr.h                        |   4 +
 drivers/net/bnxt/bnxt_ethdev.c                     | 514 +++++++++++++++----
 drivers/net/bnxt/bnxt_flow.c                       |  56 ++-
 drivers/net/bnxt/bnxt_hwrm.c                       | 185 ++++---
 drivers/net/bnxt/bnxt_hwrm.h                       |   9 +-
 drivers/net/bnxt/bnxt_reps.c                       |   4 +-
 drivers/net/bnxt/bnxt_rxq.c                        |  33 +-
 drivers/net/bnxt/bnxt_rxr.c                        |  25 +-
 drivers/net/bnxt/bnxt_rxr.h                        |   4 +-
 drivers/net/bnxt/bnxt_stats.c                      |  23 +-
 drivers/net/bnxt/bnxt_stats.h                      |   7 +-
 drivers/net/bnxt/bnxt_txr.c                        |   2 +-
 drivers/net/bnxt/bnxt_util.h                       |   2 +
 drivers/net/bnxt/bnxt_vnic.c                       |   4 +-
 drivers/net/bnxt/bnxt_vnic.h                       |   4 +-
 drivers/net/bonding/eth_bond_private.h             |   2 +-
 drivers/net/bonding/rte_eth_bond_8023ad.c          |  17 +-
 drivers/net/bonding/rte_eth_bond_api.c             |  26 +-
 drivers/net/bonding/rte_eth_bond_args.c            |   8 +-
 drivers/net/bonding/rte_eth_bond_pmd.c             |   7 +-
 drivers/net/cxgbe/base/common.h                    |  18 +-
 drivers/net/dpaa/dpaa_ethdev.c                     |  26 +-
 drivers/net/dpaa2/dpaa2_ethdev.c                   |  25 +-
 drivers/net/e1000/base/e1000_i210.c                |   2 +
 drivers/net/e1000/e1000_logs.c                     |  49 +-
 drivers/net/e1000/em_ethdev.c                      |  21 +-
 drivers/net/e1000/igb_ethdev.c                     |  33 +-
 drivers/net/e1000/igb_flow.c                       |   2 +-
 drivers/net/e1000/igb_rxtx.c                       |   9 +-
 drivers/net/ena/base/ena_com.c                     |  60 ++-
 drivers/net/ena/base/ena_defs/ena_admin_defs.h     |  85 ++--
 drivers/net/ena/base/ena_eth_com.c                 |  16 +-
 drivers/net/ena/base/ena_plat_dpdk.h               |   9 +-
 drivers/net/ena/ena_ethdev.c                       |  38 +-
 drivers/net/ena/ena_platform.h                     |  12 -
 drivers/net/enic/base/vnic_dev.c                   |   2 +-
 drivers/net/enic/base/vnic_enet.h                  |   1 +
 drivers/net/enic/enic.h                            |   4 +-
 drivers/net/enic/enic_ethdev.c                     |  85 ++--
 drivers/net/enic/enic_fm_flow.c                    |   6 +-
 drivers/net/enic/enic_main.c                       | 161 +++---
 drivers/net/enic/enic_res.c                        |   7 +-
 drivers/net/failsafe/failsafe_ops.c                |  10 +-
 drivers/net/hinic/base/hinic_compat.h              |  25 +-
 drivers/net/hinic/hinic_pmd_ethdev.c               |   5 +
 drivers/net/hns3/hns3_cmd.c                        |  24 +-
 drivers/net/hns3/hns3_cmd.h                        |  21 +-
 drivers/net/hns3/hns3_dcb.c                        | 109 ++--
 drivers/net/hns3/hns3_dcb.h                        |   4 +-
 drivers/net/hns3/hns3_ethdev.c                     | 443 +++++++++-------
 drivers/net/hns3/hns3_ethdev.h                     |  46 +-
 drivers/net/hns3/hns3_ethdev_vf.c                  | 121 ++---
 drivers/net/hns3/hns3_fdir.c                       |  52 +-
 drivers/net/hns3/hns3_fdir.h                       |   5 +-
 drivers/net/hns3/hns3_flow.c                       | 112 ++++-
 drivers/net/hns3/hns3_intr.c                       |  73 ++-
 drivers/net/hns3/hns3_intr.h                       |   4 +-
 drivers/net/hns3/hns3_logs.h                       |   2 +-
 drivers/net/hns3/hns3_mbx.c                        | 256 +++++++---
 drivers/net/hns3/hns3_mbx.h                        |  32 +-
 drivers/net/hns3/hns3_mp.c                         |   6 +-
 drivers/net/hns3/hns3_mp.h                         |   2 +-
 drivers/net/hns3/hns3_regs.c                       |   9 +-
 drivers/net/hns3/hns3_regs.h                       |   2 +-
 drivers/net/hns3/hns3_rss.c                        |   2 +-
 drivers/net/hns3/hns3_rss.h                        |   2 +-
 drivers/net/hns3/hns3_rxtx.c                       | 307 +++++++++---
 drivers/net/hns3/hns3_rxtx.h                       |  37 +-
 drivers/net/hns3/hns3_rxtx_vec.c                   |  38 +-
 drivers/net/hns3/hns3_rxtx_vec.h                   |   5 +-
 drivers/net/hns3/hns3_rxtx_vec_neon.h              |   2 +-
 drivers/net/hns3/hns3_rxtx_vec_sve.c               |  34 +-
 drivers/net/hns3/hns3_stats.c                      |  10 +-
 drivers/net/hns3/hns3_stats.h                      |   6 +-
 drivers/net/hns3/meson.build                       |   2 +-
 drivers/net/i40e/base/virtchnl.h                   |  29 +-
 drivers/net/i40e/i40e_ethdev.c                     | 167 +++++--
 drivers/net/i40e/i40e_ethdev.h                     |   5 +-
 drivers/net/i40e/i40e_ethdev_vf.c                  |  95 ++--
 drivers/net/i40e/i40e_fdir.c                       |  89 ++++
 drivers/net/i40e/i40e_flow.c                       | 181 ++++---
 drivers/net/i40e/i40e_pf.c                         |  65 +++
 drivers/net/i40e/i40e_rxtx.c                       |   2 -
 drivers/net/i40e/i40e_rxtx_vec_neon.c              |  20 +-
 drivers/net/iavf/iavf.h                            |   6 +-
 drivers/net/iavf/iavf_ethdev.c                     |  16 +-
 drivers/net/iavf/iavf_rxtx.c                       |   5 +
 drivers/net/iavf/iavf_rxtx.h                       |   2 +-
 drivers/net/iavf/iavf_rxtx_vec_avx2.c              | 120 +----
 drivers/net/iavf/iavf_rxtx_vec_avx512.c            |  13 +-
 drivers/net/iavf/iavf_rxtx_vec_common.h            | 203 ++++++++
 drivers/net/iavf/iavf_vchnl.c                      |  25 +-
 drivers/net/ice/base/ice_flow.c                    |  11 +-
 drivers/net/ice/base/ice_lan_tx_rx.h               |   2 +-
 drivers/net/ice/base/ice_osdep.h                   |   2 +-
 drivers/net/ice/base/ice_switch.c                  |   3 +-
 drivers/net/ice/base/meson.build                   |   5 +
 drivers/net/ice/ice_dcf_parent.c                   |   2 +
 drivers/net/ice/ice_ethdev.c                       |  56 ++-
 drivers/net/ice/ice_hash.c                         |  14 +
 drivers/net/ice/ice_rxtx_vec_avx2.c                | 120 +----
 drivers/net/ice/ice_rxtx_vec_avx512.c              |   5 +-
 drivers/net/ice/ice_rxtx_vec_common.h              | 203 ++++++++
 drivers/net/ice/meson.build                        |   2 +
 drivers/net/igc/igc_ethdev.c                       |  46 +-
 drivers/net/igc/igc_ethdev.h                       |   3 +-
 drivers/net/igc/igc_flow.c                         |   2 +-
 drivers/net/igc/igc_txrx.c                         |  30 +-
 drivers/net/ionic/ionic_ethdev.c                   |  15 +-
 drivers/net/ionic/ionic_lif.c                      |   5 +-
 drivers/net/ixgbe/ixgbe_ethdev.c                   |  22 +-
 drivers/net/kni/rte_eth_kni.c                      |  12 +-
 drivers/net/memif/rte_eth_memif.c                  |   1 +
 drivers/net/memif/rte_eth_memif.h                  |   4 -
 drivers/net/mlx4/mlx4.c                            |   1 +
 drivers/net/mlx4/mlx4_flow.c                       |   3 +-
 drivers/net/mlx4/mlx4_mp.c                         |   2 +-
 drivers/net/mlx4/mlx4_rxtx.c                       |   4 -
 drivers/net/mlx4/mlx4_txq.c                        |  19 +-
 drivers/net/mlx5/linux/mlx5_ethdev_os.c            |   4 +-
 drivers/net/mlx5/linux/mlx5_mp_os.c                |   2 +-
 drivers/net/mlx5/linux/mlx5_os.c                   |  37 +-
 drivers/net/mlx5/linux/mlx5_socket.c               |   4 -
 drivers/net/mlx5/linux/mlx5_verbs.c                | 121 +++++
 drivers/net/mlx5/linux/mlx5_verbs.h                |   2 +
 drivers/net/mlx5/mlx5.c                            |  19 +-
 drivers/net/mlx5/mlx5.h                            |  20 +-
 drivers/net/mlx5/mlx5_devx.c                       |   4 +
 drivers/net/mlx5/mlx5_flow.c                       | 103 ++--
 drivers/net/mlx5/mlx5_flow.h                       |  68 ++-
 drivers/net/mlx5/mlx5_flow_age.c                   |   5 +-
 drivers/net/mlx5/mlx5_flow_dv.c                    | 311 ++++++++----
 drivers/net/mlx5/mlx5_mr.c                         |  11 +
 drivers/net/mlx5/mlx5_rxtx.c                       |  16 +-
 drivers/net/mlx5/mlx5_rxtx.h                       |   1 +
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h           |  11 +-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h              |  13 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h               |   9 +-
 drivers/net/mlx5/mlx5_trigger.c                    |  10 +
 drivers/net/mlx5/mlx5_txpp.c                       |   2 +
 drivers/net/nfp/nfp_net.c                          |  26 +-
 drivers/net/octeontx2/otx2_ethdev_ops.c            |   5 +-
 drivers/net/octeontx2/otx2_vlan.c                  |   8 +-
 drivers/net/pcap/rte_eth_pcap.c                    |  12 +-
 drivers/net/qede/base/ecore_int.c                  |   2 +-
 drivers/net/qede/qede_ethdev.c                     |   9 +-
 drivers/net/sfc/sfc_ef100_rx.c                     |  21 +-
 drivers/net/sfc/sfc_ethdev.c                       |   8 -
 drivers/net/sfc/sfc_mae.c                          |  25 +-
 drivers/net/sfc/sfc_mae.h                          |   3 +-
 drivers/net/tap/rte_eth_tap.c                      |   5 +-
 drivers/net/tap/tap_flow.c                         |   8 +-
 drivers/net/tap/tap_intr.c                         |   2 +-
 drivers/net/txgbe/base/txgbe_eeprom.c              |  76 +--
 drivers/net/txgbe/base/txgbe_eeprom.h              |   2 -
 drivers/net/txgbe/base/txgbe_type.h                |   1 +
 drivers/net/txgbe/txgbe_ethdev.c                   |  47 +-
 drivers/net/txgbe/txgbe_ptypes.c                   |   4 +-
 drivers/net/virtio/virtio_rxtx_simple_altivec.c    |  12 +-
 drivers/net/virtio/virtio_rxtx_simple_neon.c       |  12 +-
 drivers/net/virtio/virtio_rxtx_simple_sse.c        |  12 +-
 drivers/net/virtio/virtio_user_ethdev.c            |  75 ++-
 drivers/raw/ifpga/ifpga_rawdev.c                   |   4 +-
 drivers/raw/ifpga/ifpga_rawdev.h                   |   2 +
 drivers/raw/ioat/dpdk_idxd_cfg.py                  |  10 +-
 drivers/raw/ntb/ntb.c                              |  13 +
 drivers/raw/ntb/ntb_hw_intel.c                     |   5 +
 drivers/raw/octeontx2_dma/otx2_dpi_rawdev.c        |   1 +
 drivers/raw/skeleton/skeleton_rawdev_test.c        |   1 +
 drivers/regex/mlx5/mlx5_regex.c                    |   1 +
 drivers/regex/mlx5/mlx5_regex.h                    |   1 +
 drivers/regex/mlx5/mlx5_regex_control.c            |   1 +
 drivers/regex/octeontx2/meson.build                |   1 -
 drivers/vdpa/ifc/base/ifcvf.c                      |   7 +-
 drivers/vdpa/mlx5/mlx5_vdpa.c                      |   3 +
 drivers/vdpa/mlx5/mlx5_vdpa.h                      |   1 +
 drivers/vdpa/mlx5/mlx5_vdpa_event.c                |   2 +
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c                |   8 +-
 examples/bbdev_app/Makefile                        |   6 +-
 examples/bbdev_app/main.c                          |   5 +-
 examples/bond/Makefile                             |   6 +-
 examples/bond/main.c                               |   4 +
 examples/cmdline/Makefile                          |   6 +-
 examples/cmdline/main.c                            |   3 +
 examples/distributor/Makefile                      |   6 +-
 examples/distributor/main.c                        |   3 +
 examples/ethtool/ethtool-app/Makefile              |   6 +-
 examples/ethtool/ethtool-app/ethapp.c              |   1 -
 examples/ethtool/ethtool-app/main.c                |   3 +
 examples/ethtool/lib/Makefile                      |   6 +-
 examples/eventdev_pipeline/Makefile                |   6 +-
 examples/fips_validation/Makefile                  |   6 +-
 examples/fips_validation/main.c                    |   3 +
 examples/flow_classify/Makefile                    |   6 +-
 examples/flow_classify/flow_classify.c             |   5 +-
 examples/flow_filtering/Makefile                   |   6 +-
 examples/flow_filtering/main.c                     |   7 +-
 examples/helloworld/Makefile                       |   6 +-
 examples/helloworld/main.c                         |   4 +
 examples/ioat/Makefile                             |   6 +-
 examples/ioat/ioatfwd.c                            |   3 +
 examples/ip_fragmentation/Makefile                 |   6 +-
 examples/ip_fragmentation/main.c                   |   3 +
 examples/ip_pipeline/Makefile                      |   6 +-
 examples/ip_reassembly/Makefile                    |   6 +-
 examples/ip_reassembly/main.c                      |   3 +
 examples/ipsec-secgw/Makefile                      |   6 +-
 examples/ipsec-secgw/ipsec-secgw.c                 |   3 +
 examples/ipv4_multicast/Makefile                   |   6 +-
 examples/ipv4_multicast/main.c                     |   3 +
 examples/kni/Makefile                              |   6 +-
 examples/kni/main.c                                |   3 +
 examples/l2fwd-cat/Makefile                        |   6 +-
 examples/l2fwd-cat/l2fwd-cat.c                     |   5 +-
 examples/l2fwd-crypto/Makefile                     |   6 +-
 examples/l2fwd-crypto/main.c                       |  23 +
 examples/l2fwd-event/Makefile                      |   6 +-
 examples/l2fwd-event/main.c                        |   3 +
 examples/l2fwd-jobstats/Makefile                   |   6 +-
 examples/l2fwd-jobstats/main.c                     |   3 +
 examples/l2fwd-keepalive/Makefile                  |   6 +-
 examples/l2fwd-keepalive/ka-agent/Makefile         |   6 +-
 examples/l2fwd-keepalive/main.c                    |   4 +
 examples/l2fwd/Makefile                            |   6 +-
 examples/l2fwd/main.c                              |   3 +
 examples/l3fwd-acl/Makefile                        |   6 +-
 examples/l3fwd-acl/main.c                          |   3 +
 examples/l3fwd-graph/Makefile                      |   6 +-
 examples/l3fwd-graph/main.c                        |   3 +
 examples/l3fwd-power/Makefile                      |   6 +-
 examples/l3fwd-power/main.c                        |   4 +-
 examples/l3fwd/Makefile                            |   6 +-
 examples/l3fwd/l3fwd_lpm.c                         |  26 +-
 examples/l3fwd/main.c                              |   4 +
 examples/link_status_interrupt/Makefile            |   6 +-
 examples/link_status_interrupt/main.c              |   3 +
 examples/meson.build                               |  10 +-
 .../client_server_mp/mp_client/Makefile            |   6 +-
 .../client_server_mp/mp_client/client.c            |   3 +
 .../client_server_mp/mp_server/Makefile            |   6 +-
 .../client_server_mp/mp_server/main.c              |   4 +
 examples/multi_process/hotplug_mp/Makefile         |   6 +-
 examples/multi_process/simple_mp/Makefile          |   6 +-
 examples/multi_process/simple_mp/main.c            |   4 +
 examples/multi_process/symmetric_mp/Makefile       |   6 +-
 examples/multi_process/symmetric_mp/main.c         |   3 +
 examples/ntb/Makefile                              |   6 +-
 examples/ntb/ntb_fwd.c                             |   3 +
 examples/packet_ordering/Makefile                  |   6 +-
 examples/packet_ordering/main.c                    |   6 +-
 examples/performance-thread/l3fwd-thread/Makefile  |   5 +-
 examples/performance-thread/l3fwd-thread/main.c    |   3 +
 examples/performance-thread/pthread_shim/Makefile  |   6 +-
 examples/performance-thread/pthread_shim/main.c    |   4 +
 examples/pipeline/Makefile                         |   6 +-
 examples/pipeline/main.c                           |   3 +
 examples/ptpclient/Makefile                        |   6 +-
 examples/ptpclient/ptpclient.c                     |   7 +-
 examples/qos_meter/Makefile                        |   6 +-
 examples/qos_meter/main.c                          |   3 +
 examples/qos_sched/Makefile                        |   6 +-
 examples/qos_sched/main.c                          |   3 +
 examples/rxtx_callbacks/Makefile                   |   6 +-
 examples/rxtx_callbacks/main.c                     |   6 +-
 examples/server_node_efd/node/Makefile             |   6 +-
 examples/server_node_efd/node/node.c               |   3 +
 examples/server_node_efd/server/Makefile           |   6 +-
 examples/server_node_efd/server/main.c             |   4 +
 examples/service_cores/Makefile                    |   6 +-
 examples/service_cores/main.c                      |   3 +
 examples/skeleton/Makefile                         |   6 +-
 examples/skeleton/basicfwd.c                       |   5 +-
 examples/timer/Makefile                            |   6 +-
 examples/timer/main.c                              |  23 +-
 examples/vdpa/Makefile                             |   6 +-
 examples/vdpa/main.c                               |   3 +
 examples/vhost/Makefile                            |   6 +-
 examples/vhost/main.c                              |  48 +-
 examples/vhost/virtio_net.c                        |   8 +-
 examples/vhost_blk/Makefile                        |   6 +-
 examples/vhost_blk/vhost_blk.c                     |   3 +
 examples/vhost_crypto/Makefile                     |   6 +-
 examples/vhost_crypto/main.c                       |   5 +-
 examples/vm_power_manager/Makefile                 |   6 +-
 examples/vm_power_manager/guest_cli/Makefile       |   6 +-
 examples/vm_power_manager/guest_cli/main.c         |   3 +
 examples/vm_power_manager/main.c                   |   3 +
 examples/vmdq/Makefile                             |   6 +-
 examples/vmdq/main.c                               |   3 +
 examples/vmdq_dcb/Makefile                         |   6 +-
 examples/vmdq_dcb/main.c                           |   3 +
 kernel/linux/kni/kni_net.c                         |  48 +-
 lib/librte_acl/acl_run_avx512_common.h             |  24 +
 lib/librte_bpf/bpf_validate.c                      |   2 +-
 lib/librte_eal/arm/rte_cpuflags.c                  |   2 +-
 lib/librte_eal/common/eal_common_fbarray.c         |   7 +-
 lib/librte_eal/common/eal_common_options.c         |  12 +-
 lib/librte_eal/common/eal_common_proc.c            |  27 +-
 lib/librte_eal/common/eal_common_thread.c          |  66 +--
 lib/librte_eal/common/malloc_mp.c                  |   4 +-
 lib/librte_eal/freebsd/eal.c                       |   4 +
 lib/librte_eal/freebsd/include/rte_os.h            |   6 +-
 lib/librte_eal/include/rte_eal_paging.h            |   2 +-
 lib/librte_eal/include/rte_lcore.h                 |   8 +
 lib/librte_eal/include/rte_reciprocal.h            |   8 +
 lib/librte_eal/include/rte_service.h               |   5 +-
 lib/librte_eal/include/rte_vfio.h                  |   7 +-
 lib/librte_eal/linux/eal.c                         |   4 +
 lib/librte_eal/linux/eal_log.c                     |   6 +-
 lib/librte_eal/linux/eal_memalloc.c                |  14 +-
 lib/librte_eal/linux/eal_vfio.c                    |  98 ++--
 lib/librte_eal/linux/eal_vfio.h                    |   1 +
 lib/librte_eal/linux/include/rte_os.h              |   8 +-
 lib/librte_eal/unix/eal_file.c                     |   1 +
 lib/librte_eal/unix/eal_unix_memory.c              |  11 +-
 lib/librte_eal/version.map                         |   1 -
 lib/librte_eal/windows/eal.c                       |   4 +
 lib/librte_eal/windows/eal_hugepages.c             |   4 +
 lib/librte_eal/windows/eal_memory.c                |   2 +-
 lib/librte_eal/windows/eal_thread.c                |   4 +-
 lib/librte_eal/windows/include/pthread.h           |  16 +-
 lib/librte_eal/windows/include/rte_os.h            |   5 +-
 lib/librte_eal/windows/include/sched.h             |   1 +
 lib/librte_ethdev/rte_ethdev.c                     |  14 +-
 lib/librte_ethdev/rte_ethdev.h                     |   5 +
 lib/librte_ethdev/rte_flow.h                       |   4 +-
 lib/librte_eventdev/rte_event_crypto_adapter.c     |   1 +
 lib/librte_eventdev/rte_event_eth_rx_adapter.c     |   5 +-
 lib/librte_ip_frag/rte_ipv4_fragmentation.c        |  34 +-
 lib/librte_kni/rte_kni.c                           |   7 +-
 lib/librte_kni/rte_kni_common.h                    |   1 +
 lib/librte_mbuf/rte_mbuf_dyn.c                     |  10 +-
 lib/librte_net/rte_ip.h                            |   2 +-
 lib/librte_pipeline/rte_swx_pipeline.c             | 494 ++++++++++++++----
 lib/librte_power/guest_channel.c                   |  22 +-
 lib/librte_power/power_acpi_cpufreq.c              |   5 +-
 lib/librte_power/power_pstate_cpufreq.c            |   5 +-
 lib/librte_power/rte_power_guest_channel.h         |   8 -
 lib/librte_power/version.map                       |   2 -
 lib/librte_sched/rte_sched.c                       |   2 +-
 lib/librte_stack/rte_stack.c                       |   4 +-
 lib/librte_stack/rte_stack.h                       |   3 +-
 lib/librte_stack/rte_stack_lf.h                    |   5 +
 lib/librte_table/rte_swx_table_em.c                |   6 +-
 lib/librte_telemetry/rte_telemetry.h               |   4 +
 lib/librte_telemetry/telemetry.c                   |   2 +
 lib/librte_vhost/rte_vhost.h                       |   1 +
 lib/librte_vhost/socket.c                          |   5 +-
 lib/librte_vhost/vhost.c                           |   8 +-
 lib/librte_vhost/vhost.h                           |  14 +-
 lib/librte_vhost/vhost_user.c                      |   3 -
 lib/librte_vhost/virtio_net.c                      | 213 ++++++--
 license/README                                     |   4 +-
 meson.build                                        |   2 +-
 494 files changed, 7573 insertions(+), 3561 deletions(-)
Adam Dybkowski (3):
      common/qat: increase IM buffer size for GEN3
      compress/qat: enable compression on GEN3
      crypto/qat: fix null authentication request

Ajit Khaparde (7):
      net/bnxt: fix RSS context cleanup
      net/bnxt: check kvargs parsing
      net/bnxt: fix resource cleanup
      doc: fix formatting in testpmd guide
      net/bnxt: fix mismatched type comparison in MAC restore
      net/bnxt: check PCI config read
      net/bnxt: fix mismatched type comparison in Rx

Alvin Zhang (11):
      net/ice: fix VLAN filter with PF
      net/i40e: fix input set field mask
      net/igc: fix Rx RSS hash offload capability
      net/igc: fix Rx error counter for bad length
      net/e1000: fix Rx error counter for bad length
      net/e1000: fix max Rx packet size
      net/igc: fix Rx packet size
      net/ice: fix fast mbuf freeing
      net/iavf: fix VF to PF command failure handling
      net/i40e: fix VF RSS configuration
      net/igc: fix speed configuration

Anatoly Burakov (3):
      fbarray: fix log message on truncation error
      power: do not skip saving original P-state governor
      power: save original ACPI governor always

Andrew Boyer (1):
      net/ionic: fix completion type in lif init

Andrew Rybchenko (4):
      net/failsafe: fix RSS hash offload reporting
      net/failsafe: report minimum and maximum MTU
      common/sfc_efx: remove GENEVE from supported tunnels
      net/sfc: fix mark support in EF100 native Rx datapath

Andy Moreton (2):
      common/sfc_efx/base: limit reported MCDI response length
      common/sfc_efx/base: add missing MCDI response length checks

Ankur Dwivedi (1):
      crypto/octeontx: fix session-less mode

Apeksha Gupta (1):
      examples/l2fwd-crypto: skip masked devices

Arek Kusztal (1):
      crypto/qat: fix offset for out-of-place scatter-gather

Beilei Xing (1):
      net/i40evf: fix packet loss for X722

Bing Zhao (1):
      net/mlx5: fix loopback for Direct Verbs queue

Bruce Richardson (2):
      build: exclude meson files from examples installation
      raw/ioat: fix script for configuring small number of queues

Chaoyong He (1):
      doc: fix multiport syntax in nfp guide

Chenbo Xia (1):
      examples/vhost: check memory table query

Chengchang Tang (20):
      net/hns3: fix HW buffer size on MTU update
      net/hns3: fix processing Tx offload flags
      net/hns3: fix Tx checksum for UDP packets with special port
      net/hns3: fix long task queue pairs reset time
      ethdev: validate input in module EEPROM dump
      ethdev: validate input in register info
      ethdev: validate input in EEPROM info
      net/hns3: fix rollback after setting PVID failure
      net/hns3: fix timing in resetting queues
      net/hns3: fix queue state when concurrent with reset
      net/hns3: fix configure FEC when concurrent with reset
      net/hns3: fix use of command status enumeration
      examples: add eal cleanup to examples
      net/bonding: fix adding itself as its slave
      net/hns3: fix timing in mailbox
      app/testpmd: fix max queue number for Tx offloads
      net/tap: fix interrupt vector array size
      net/bonding: fix socket ID check
      net/tap: check ioctl on restore
      examples/timer: fix time interval

Chengwen Feng (50):
      net/hns3: fix flow counter value
      net/hns3: fix VF mailbox head field
      net/hns3: support get device version when dump register
      net/hns3: fix some packet types
      net/hns3: fix missing outer L4 UDP flag for VXLAN
      net/hns3: remove VLAN/QinQ ptypes from support list
      test: check thread creation
      common/dpaax: fix possible null pointer access
      examples/ethtool: remove unused parsing
      net/hns3: fix flow director lock
      net/e1000/base: fix timeout for shadow RAM write
      net/hns3: fix setting default MAC address in bonding of VF
      net/hns3: fix possible mismatched response of mailbox
      net/hns3: fix VF handling LSC event in secondary process
      net/hns3: fix verification of NEON support
      mbuf: check shared memory before dumping dynamic space
      eventdev: remove redundant thread name setting
      eventdev: fix memory leakage on thread creation failure
      net/kni: check init result
      net/hns3: fix mailbox error message
      net/hns3: fix processing link status message on PF
      net/hns3: remove unused mailbox macro and struct
      net/bonding: fix leak on remove
      net/hns3: fix handling link update
      net/i40e: fix negative VEB index
      net/i40e: remove redundant VSI check in Tx queue setup
      net/virtio: fix getline memory leakage
      net/hns3: log time delta in decimal format
      net/hns3: fix time delta calculation
      net/hns3: remove unused macros
      net/hns3: fix vector Rx burst limitation
      net/hns3: remove read when enabling TM QCN error event
      net/hns3: remove unused VMDq code
      net/hns3: increase readability in logs
      raw/ntb: check SPAD user index
      raw/ntb: check memory allocations
      ipc: check malloc sync reply result
      eal: fix service core list parsing
      ipc: use monotonic clock
      net/hns3: return error on PCI config write failure
      net/hns3: fix log on flow director clear
      net/hns3: clear hash map on flow director clear
      net/hns3: fix querying flow director counter for out param
      net/hns3: fix TM QCN error event report by MSI-X
      net/hns3: fix mailbox message ID in log
      net/hns3: fix secondary process request start/stop Rx/Tx
      net/hns3: fix ordering in secondary process initialization
      net/hns3: fail setting FEC if one bit mode is not supported
      net/mlx4: fix secondary process initialization ordering
      net/mlx5: fix secondary process initialization ordering

Ciara Loftus (1):
      net/af_xdp: fix error handling during Rx queue setup

Ciara Power (2):
      telemetry: fix race on callbacks list
      test/crypto: fix return value of a skipped test

Conor Walsh (1):
      examples/l3fwd: fix LPM IPv6 subnets

Cristian Dumitrescu (3):
      table: fix actions with different data size
      pipeline: fix instruction translation
      pipeline: fix endianness conversions

Dapeng Yu (3):
      net/igc: remove MTU setting limitation
      net/e1000: remove MTU setting limitation
      examples/packet_ordering: fix port configuration

David Christensen (1):
      config/ppc: reduce number of cores and NUMA nodes

David Harton (1):
      net/ena: fix releasing Tx ring mbufs

David Hunt (4):
      test/power: fix CPU frequency check
      test/power: add turbo mode to frequency check
      test/power: fix low frequency test when turbo enabled
      test/power: fix turbo test

David Marchand (18):
      doc: fix sphinx rtd theme import in GHA
      service: clean references to removed symbol
      eal: fix evaluation of log level option
      ci: hook to GitHub Actions
      ci: enable v21 ABI checks
      ci: fix package installation in GitHub Actions
      ci: ignore APT update failure in GitHub Actions
      ci: catch coredumps
      vhost: fix offload flags in Rx path
      bus/fslmc: remove unused debug macro
      eal: fix leak in shared lib mode detection
      event/dpaa2: remove unused macros
      net/ice/base: fix memory allocation wrapper
      net/ice: fix leak on thread termination
      devtools: fix orphan symbols check with busybox
      net/vhost: restore pseudo TSO support
      net/ark: fix leak on thread termination
      build: fix drivers selection without Python

Dekel Peled (1):
      common/mlx5: fix DevX read output buffer size

Dmitry Kozlyuk (4):
      net/pcap: fix format string
      eal/windows: add missing SPDX license tag
      buildtools: fix all drivers disabled on Windows
      examples/rxtx_callbacks: fix port ID format specifier

Ed Czeck (2):
      net/ark: update packet director initial state
      net/ark: refactor Rx buffer recovery

Elad Nachman (2):
      kni: support async user request
      kni: fix kernel deadlock with bifurcated device

Feifei Wang (2):
      net/i40e: fix parsing packet type for NEON
      test/trace: fix race on collected perf data

Ferruh Yigit (9):
      power: remove duplicated symbols from map file
      log/linux: make default output stderr
      license: fix typos
      drivers/net: fix FW version query
      net/bnx2x: fix build with GCC 11
      net/bnx2x: fix build with GCC 11
      net/ice/base: fix build with GCC 11
      net/tap: fix build with GCC 11
      test/table: fix build with GCC 11

Gregory Etelson (2):
      app/testpmd: fix tunnel offload flows cleanup
      net/mlx5: fix tunnel offload private items location

Guoyang Zhou (1):
      net/hinic: fix crash in secondary process

Haiyue Wang (1):
      net/ixgbe: fix Rx errors statistics for UDP checksum

Harman Kalra (1):
      event/octeontx2: fix device reconfigure for single slot

Heinrich Kuhn (1):
      net/nfp: fix reporting of RSS capabilities

Hemant Agrawal (3):
      ethdev: add missing buses in device iterator
      crypto/dpaa_sec: affine the thread portal affinity
      crypto/dpaa2_sec: fix close and uninit functions

Hongbo Zheng (9):
      app/testpmd: fix Tx/Rx descriptor query error log
      net/hns3: fix FLR miss detection
      net/hns3: delete redundant blank line
      bpf: fix JSLT validation
      common/sfc_efx/base: fix dereferencing null pointer
      power: fix sanity checks for guest channel read
      net/hns3: fix VF alive notification after config restore
      examples/l3fwd-power: fix empty poll thresholds
      net/hns3: fix concurrent interrupt handling

Huisong Li (23):
      net/hns3: fix device capabilities for copper media type
      net/hns3: remove unused parameter markers
      net/hns3: fix reporting undefined speed
      net/hns3: fix link update when failed to get link info
      net/hns3: fix flow control exception
      app/testpmd: fix bitmap of link speeds when force speed
      net/hns3: fix flow control mode
      net/hns3: remove redundant mailbox response
      net/hns3: fix DCB mode check
      net/hns3: fix VMDq mode check
      net/hns3: fix mbuf leakage
      net/hns3: fix link status when port is stopped
      net/hns3: fix link speed when port is down
      app/testpmd: fix forward lcores number for DCB
      app/testpmd: fix DCB forwarding configuration
      app/testpmd: fix DCB re-configuration
      app/testpmd: verify DCB config during forward config
      net/hns3: fix Rx/Tx queue numbers check
      net/hns3: fix requested FC mode rollback
      net/hns3: remove meaningless packet buffer rollback
      net/hns3: fix DCB configuration
      net/hns3: fix DCB reconfiguration
      net/hns3: fix link speed when VF device is down

Ibtisam Tariq (1):
      examples/vhost_crypto: remove unused short option

Igor Chauskin (2):
      net/ena: switch memcpy to optimized version
      net/ena: fix parsing of large LLQ header device argument

Igor Russkikh (2):
      net/qede: reduce log verbosity
      net/qede: accept bigger RSS table

Ilya Maximets (1):
      net/virtio: fix interrupt unregistering for listening socket

Ivan Malov (5):
      net/sfc: fix buffer size for flow parse
      net: fix comment in IPv6 header
      net/sfc: fix error path inconsistency
      common/sfc_efx/base: fix indication of MAE encap support
      net/sfc: fix outer rule rollback on error

Jerin Jacob (1):
      examples: fix pkg-config override

Jiawei Wang (4):
      app/testpmd: fix NVGRE encap configuration
      net/mlx5: fix resource release for mirror flow
      net/mlx5: fix RSS flow item expansion for GRE key
      net/mlx5: fix RSS flow item expansion for NVGRE

Jiawei Zhu (1):
      net/mlx5: fix Rx segmented packets on mbuf starvation

Jiawen Wu (4):
      net/txgbe: remove unused functions
      net/txgbe: fix Rx missed packet counter
      net/txgbe: update packet type
      net/txgbe: fix QinQ strip

Jiayu Hu (2):
      vhost: fix queue initialization
      vhost: fix redundant vring status change notification

Jie Wang (1):
      net/ice: fix VSI array out of bounds access

John Daley (2):
      net/enic: fix flow initialization error handling
      net/enic: enable GENEVE offload via VNIC configuration

Juraj Linkeš (1):
      eal/arm64: fix platform register bit

Kai Ji (2):
      test/crypto: fix auth-cipher compare length in OOP
      test/crypto: copy offset data to OOP destination buffer

Kalesh AP (23):
      net/bnxt: remove unused macro
      net/bnxt: fix VNIC configuration
      net/bnxt: fix firmware fatal error handling
      net/bnxt: fix FW readiness check during recovery
      net/bnxt: fix device readiness check
      net/bnxt: fix VF info allocation
      net/bnxt: fix HWRM and FW incompatibility handling
      net/bnxt: mute some failure logs
      app/testpmd: check MAC address query
      net/bnxt: fix PCI write check
      net/bnxt: fix link state operations
      net/bnxt: fix timesync when PTP is not supported
      net/bnxt: fix memory allocation for command response
      net/bnxt: fix double free in port start failure
      net/bnxt: fix configuring LRO
      net/bnxt: fix health check alarm cancellation
      net/bnxt: fix PTP support for Thor
      net/bnxt: fix ring count calculation for Thor
      net/bnxt: remove unnecessary forward declarations
      net/bnxt: remove unused function parameters
      net/bnxt: drop unused attribute
      net/bnxt: fix single PF per port check
      net/bnxt: prevent device access in error state

Kamil Vojanec (1):
      net/mlx5/linux: fix firmware version

Kevin Traynor (5):
      test/cmdline: fix inputs array
      test/crypto: fix build with GCC 11
      crypto/zuc: fix build with GCC 11
      test: fix build with GCC 11
      test/cmdline: silence clang 12 warning

Konstantin Ananyev (1):
      acl: fix build with GCC 11

Lance Richardson (8):
      net/bnxt: fix Rx buffer posting
      net/bnxt: fix Tx length hint threshold
      net/bnxt: fix handling of null flow mask
      test: fix TCP header initialization
      net/bnxt: fix Rx descriptor status
      net/bnxt: fix Rx queue count
      net/bnxt: fix dynamic VNIC count
      eal: fix memory mapping on 32-bit target

Leyi Rong (1):
      net/iavf: fix packet length parsing in AVX512

Li Zhang (1):
      net/mlx5: fix flow actions index in cache

Luc Pelletier (2):
      eal: fix race in control thread creation
      eal: fix hang in control thread creation

Marvin Liu (5):
      vhost: fix split ring potential buffer overflow
      vhost: fix packed ring potential buffer overflow
      vhost: fix batch dequeue potential buffer overflow
      vhost: fix initialization of temporary header
      vhost: fix initialization of async temporary header

Matan Azrad (5):
      common/mlx5/linux: add glue function to query WQ
      common/mlx5: add DevX command to query WQ
      common/mlx5: add DevX commands for queue counters
      vdpa/mlx5: fix virtq cleaning
      vdpa/mlx5: fix device unplug

Michael Baum (1):
      net/mlx5: fix flow age event triggering

Michal Krawczyk (5):
      net/ena/base: improve style and comments
      net/ena/base: fix type conversions by explicit casting
      net/ena/base: destroy multiple wait events
      net/ena: fix crash with unsupported device argument
      net/ena: indicate Rx RSS hash presence

Min Hu (Connor) (25):
      net/hns3: fix MTU config complexity
      net/hns3: update HiSilicon copyright syntax
      net/hns3: fix copyright date
      examples/ptpclient: remove wrong comment
      test/bpf: fix error message
      doc: fix HiSilicon copyright syntax
      net/hns3: remove unused macros
      net/hns3: remove unused macro
      app/eventdev: fix overflow in lcore list parsing
      test/kni: fix a comment
      test/kni: check init result
      net/hns3: fix typos on comments
      net/e1000: fix flow error message object
      app/testpmd: fix division by zero on socket memory dump
      net/kni: warn on stop failure
      app/bbdev: check memory allocation
      app/bbdev: fix HARQ error messages
      raw/skeleton: add missing check after setting attribute
      test/timer: check memzone allocation
      app/crypto-perf: check memory allocation
      examples/flow_classify: fix NUMA check of port and core
      examples/l2fwd-cat: fix NUMA check of port and core
      examples/skeleton: fix NUMA check of port and core
      test: check flow classifier creation
      test: fix division by zero

Murphy Yang (3):
      net/ixgbe: fix RSS RETA being reset after port start
      net/i40e: fix flow director config after flow validate
      net/i40e: fix flow director for common pctypes

Natanael Copa (5):
      common/dpaax/caamflib: fix build with musl
      bus/dpaa: fix 64-bit arch detection
      bus/dpaa: fix build with musl
      net/cxgbe: remove use of uint type
      app/testpmd: fix build with musl

Nipun Gupta (1):
      bus/dpaa: fix statistics reading

Nithin Dabilpuram (3):
      vfio: do not merge contiguous areas
      vfio: fix DMA mapping granularity for IOVA as VA
      test/mem: fix page size for external memory

Olivier Matz (1):
      test/mempool: fix object initializer

Pallavi Kadam (1):
      bus/pci: skip probing some Windows NDIS devices

Pavan Nikhilesh (4):
      test/event: fix timeout accuracy
      app/eventdev: fix timeout accuracy
      app/eventdev: fix lcore parsing skipping last core
      event/octeontx2: fix XAQ pool reconfigure

Pu Xu (1):
      ip_frag: fix fragmenting IPv4 packet with header option

Qi Zhang (8):
      net/ice/base: fix payload indicator on ptype
      net/ice/base: fix uninitialized struct
      net/ice/base: cleanup filter list on error
      net/ice/base: fix memory allocation for MAC addresses
      net/iavf: fix TSO max segment size
      doc: fix matching versions in ice guide
      net/iavf: fix wrong Tx context descriptor
      common/iavf: fix duplicated offload bit

Radha Mohan Chintakuntla (1):
      raw/octeontx2_dma: assign PCI device in DPI VF

Raslan Darawsheh (1):
      ethdev: update flow item GTP QFI definition

Richael Zhuang (2):
      test/power: add delay before checking CPU frequency
      test/power: round CPU frequency to check

Robin Zhang (6):
      net/i40e: announce request queue capability in PF
      doc: update recommended versions for i40e
      net/i40e: fix lack of MAC type when set MAC address
      net/iavf: fix lack of MAC type when set MAC address
      net/iavf: fix primary MAC type when starting port
      net/i40e: fix primary MAC type when starting port

Rohit Raj (3):
      net/dpaa2: fix getting link status
      net/dpaa: fix getting link status
      examples/l2fwd-crypto: fix packet length while decryption

Roy Shterman (1):
      mem: fix freeing segments in --huge-unlink mode

Satheesh Paul (1):
      net/octeontx2: fix VLAN filter

Savinay Dharmappa (1):
      sched: fix traffic class oversubscription parameter

Shijith Thotton (3):
      eventdev: fix case to initiate crypto adapter service
      event/octeontx2: fix crypto adapter queue pair operations
      event/octeontx2: configure crypto adapter xaq pool

Siwar Zitouni (1):
      net/ice: fix disabling promiscuous mode

Somnath Kotur (5):
      net/bnxt: fix xstats get
      net/bnxt: fix Rx and Tx timestamps
      net/bnxt: fix Tx timestamp init
      net/bnxt: refactor multi-queue Rx configuration
      net/bnxt: fix Rx timestamp when FIFO pending bit is set

Stanislaw Kardach (6):
      test: proceed if timer subsystem already initialized
      stack: allow lock-free only on relevant architectures
      test/distributor: fix worker notification in burst mode
      test/distributor: fix burst flush on worker quit
      net/ena: remove endian swap functions
      net/ena: report default ring size

Stephen Hemminger (2):
      kni: refactor user request processing
      net/bnxt: use prefix on global function

Suanming Mou (1):
      net/mlx5: fix counter offset detection

Tal Shnaiderman (2):
      eal/windows: fix default thread priority
      eal/windows: fix return codes of pthread shim layer

Tengfei Zhang (1):
      net/pcap: fix file descriptor leak on close

Thinh Tran (1):
      test: fix autotest handling of skipped tests

Thomas Monjalon (18):
      bus/pci: fix Windows kernel driver categories
      eal: fix comment of OS-specific header files
      buildtools: fix build with busybox
      build: detect execinfo library on Linux
      build: remove redundant _GNU_SOURCE definitions
      eal: fix build with musl
      net/igc: remove use of uint type
      event/dlb: fix header includes for musl
      examples/bbdev: fix header include for musl
      drivers: fix log level after loading
      app/regex: fix usage text
      app/testpmd: fix usage text
      doc: fix names of UIO drivers
      doc: fix build with Sphinx 4
      bus/pci: support I/O port operations with musl
      app: fix exit messages
      regex/octeontx2: remove unused include directory
      doc: remove PDF requirements

Tianyu Li (1):
      net/memif: fix Tx bps statistics for zero-copy

Timothy McDaniel (2):
      event/dlb2: remove references to deferred scheduling
      doc: fix runtime options in DLB2 guide

Tyler Retzlaff (1):
      eal: add C++ include guard for reciprocal header

Vadim Podovinnikov (1):
      net/bonding: fix LACP system address check

Venkat Duvvuru (1):
      net/bnxt: fix queues per VNIC

Viacheslav Ovsiienko (16):
      net/mlx5: fix external buffer pool registration for Rx queue
      net/mlx5: fix metadata item validation for ingress flows
      net/mlx5: fix hashed list size for tunnel flow groups
      net/mlx5: fix UAR allocation diagnostics messages
      common/mlx5: add timestamp format support to DevX
      vdpa/mlx5: support timestamp format
      net/mlx5: fix Rx metadata leftovers
      net/mlx5: fix drop action for Direct Rules/Verbs
      net/mlx4: fix RSS action with null hash key
      net/mlx5: support timestamp format
      regex/mlx5: support timestamp format
      app/testpmd: fix segment number check
      net/mlx5: remove drop queue function prototypes
      net/mlx4: fix buffer leakage on device close
      net/mlx5: fix probing device in legacy bonding mode
      net/mlx5: fix receiving queue timestamp format

Wei Huang (1):
      raw/ifpga: fix device name format

Wenjun Wu (3):
      net/ice: check some functions return
      net/ice: fix RSS hash update
      net/ice: fix RSS for L2 packet

Wenwu Ma (1):
      net/ice: fix illegal access when removing MAC filter

Wenzhuo Lu (2):
      net/iavf: fix crash in AVX512
      net/ice: fix crash in AVX512

Wisam Jaddo (1):
      app/flow-perf: fix encap/decap actions

Xiao Wang (1):
      vdpa/ifc: check PCI config read

Xiaoyu Min (4):
      net/mlx5: support RSS expansion for IPv6 GRE
      net/mlx5: fix shared inner RSS
      net/mlx5: fix missing shared RSS hash types
      net/mlx5: fix redundant flow after RSS expansion

Xiaoyun Li (2):
      app/testpmd: remove unnecessary UDP tunnel check
      net/i40e: fix IPv4 fragment offload

Xueming Li (4):
      version: 20.11.2-rc1
      net/virtio: fix vectorized Rx queue rearm
      version: 20.11.2-rc2
      version: 20.11.2

Youri Querry (1):
      bus/fslmc: fix random portal hangs with qbman 5.0

Yunjian Wang (5):
      vfio: fix API description
      net/mlx5: fix using flow tunnel before null check
      vfio: fix duplicated user mem map
      net/mlx4: fix leak when configured repeatedly
      net/mlx5: fix leak when configured repeatedly

^ permalink raw reply	[relevance 1%]

* Re: [dpdk-dev] Use WFE for spinlock and ring
  @ 2021-07-07 14:47  0%   ` Stephen Hemminger
  2021-07-08  9:41  0%     ` Ruifeng Wang
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2021-07-07 14:47 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: dev, david.marchand, thomas, jerinj, nd, honnappa.nagarahalli

On Sun, 25 Apr 2021 05:56:51 +0000
Ruifeng Wang <ruifeng.wang@arm.com> wrote:

> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
> 
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
> 
> [1] http://patches.dpdk.org/cover/62703/
> 
> v3:
> Series rebased. (David)
> 
> Gavin Hu (1):
>   spinlock: use wfe to reduce contention on aarch64
> 
> Ruifeng Wang (1):
>   ring: use wfe to wait for ring tail update on aarch64
> 
>  lib/eal/include/generic/rte_spinlock.h | 4 ++--
>  lib/ring/rte_ring_c11_pvt.h            | 4 ++--
>  lib/ring/rte_ring_generic_pvt.h        | 3 +--
>  3 files changed, 5 insertions(+), 6 deletions(-)
> 

Other places that should use WFE:

rte_mcslock.h:rte_mcslock_lock()
rte_mcslock_unlock:rte_mcslock_unlock()

rte_pflock.h:rte_pflock_lock()
rte_rwlock.h:rte_rwlock_read_lock()
rte_rwlock.h:rte_rwlock_write_lock()


You should also introduce rte_wait_while_XXX variants to handle some
of these cases.





* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-01 10:38 23% ` [dpdk-dev] [PATCH v3] doc: policy on the " Ray Kinsella
@ 2021-07-07 18:32  0%   ` Tyler Retzlaff
  2021-07-09  6:16  0%   ` Jerin Jacob
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2021-07-07 18:32 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, bruce.richardson, john.mcnamara, ferruh.yigit, thomas,
	david.marchand, stephen

On Thu, Jul 01, 2021 at 11:38:42AM +0100, Ray Kinsella wrote:
> Clarifying the ABI policy on the promotion of experimental APIS to stable.
> We have a fair number of APIs that have been experimental for more than
> 2 years. This policy amendment indicates that these APIs should be
> promoted or removed, or should at least form a conversation between the
> maintainer and original contributor.
> 
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
Acked-By: Tyler Retzlaff <roretzla@microsoft.com>



* Re: [dpdk-dev] ABI/API stability towards drivers
  2021-07-02  8:00  8% [dpdk-dev] ABI/API stability towards drivers Morten Brørup
  2021-07-02  9:45  7% ` [dpdk-dev] [dpdk-techboard] " Ferruh Yigit
  2021-07-02 12:26  4% ` Thomas Monjalon
@ 2021-07-07 18:46  8% ` Tyler Retzlaff
  2 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2021-07-07 18:46 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dpdk-techboard, dpdk-dev

On Fri, Jul 02, 2021 at 10:00:11AM +0200, Morten Brørup wrote:
> Regarding the ongoing ABI stability project, it is suggested to export driver interfaces as internal.
> 
> What are we targeting regarding ABI and API stability towards drivers?

when this was last discussed, the outcome was that there was no promise of
api/abi stability for drivers at all, only for applications. tech-board may
have discussed it further, i don't know.

we (Microsoft) would like to see them evolve to stable abi/api but we
understand the challenges and effort involved. so driver stability is
pretty much the interface consumer's problem right now for drivers built
in-tree and out of tree.


* Re: [dpdk-dev] [pull-request] next-crypto 21.08 rc1
  @ 2021-07-07 21:57  5% ` Thomas Monjalon
  2021-07-08  7:39  0%   ` [dpdk-dev] [EXT] " Akhil Goyal
  2021-07-08  7:41  0%   ` [dpdk-dev] " Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2021-07-07 21:57 UTC (permalink / raw)
  To: Akhil Goyal, Shijith Thotton; +Cc: dev, jerinj, david.marchand

07/07/2021 21:30, Akhil Goyal:
> Shijith Thotton (2):
>       drivers: add octeontx crypto adapter framework
>       drivers: add octeontx crypto adapter data path

It seems there is an ABI breakage:

devtools/check-abi.sh: line 38: 958581 Segmentation fault
(core dumped) abidiff $ABIDIFF_OPTIONS $dump $dump2
Error: ABI issue reported for 'abidiff --suppr devtools/libabigail.abignore --no-added-syms --headers-dir1 v21.05/build-gcc-shared/usr/local/include --headers-dir2 build-gcc-shared/install/usr/local/include v21.05/build-gcc-shared/dump/librte_crypto_octeontx.dump build-gcc-shared/install/dump/librte_crypto_octeontx.dump'

Without this series, the ABI check is passing.




* Re: [dpdk-dev] [EXT] Re:  [pull-request] next-crypto 21.08 rc1
  2021-07-07 21:57  5% ` Thomas Monjalon
@ 2021-07-08  7:39  0%   ` Akhil Goyal
  2021-07-08  7:41  0%   ` [dpdk-dev] " Thomas Monjalon
  1 sibling, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-08  7:39 UTC (permalink / raw)
  To: Thomas Monjalon, Shijith Thotton
  Cc: dev, Jerin Jacob Kollanukkaran, david.marchand

> 07/07/2021 21:30, Akhil Goyal:
> > Shijith Thotton (2):
> >       drivers: add octeontx crypto adapter framework
> >       drivers: add octeontx crypto adapter data path
> 
> It seems there is an ABI breakage:
> 
> devtools/check-abi.sh: line 38: 958581 Segmentation fault
> (core dumped) abidiff $ABIDIFF_OPTIONS $dump $dump2
> Error: ABI issue reported for 'abidiff --suppr devtools/libabigail.abignore --
> no-added-syms --headers-dir1 v21.05/build-gcc-shared/usr/local/include --
> headers-dir2 build-gcc-shared/install/usr/local/include v21.05/build-gcc-
> shared/dump/librte_crypto_octeontx.dump build-gcc-
> shared/install/dump/librte_crypto_octeontx.dump'
> 
> Without this series, the ABI check is passing.
> 

I do not see this error at my end + there is no such issue reported on CI.
On CI it failed only on FreeBSD, and that too was a false report.

Can you paste the output of
'abidiff --suppr devtools/libabigail.abignore --no-added-syms --headers-dir1 v21.05/build-gcc-shared/usr/local/include --headers-dir2 build-gcc-shared/install/usr/local/include v21.05/build-gcc-shared/dump/librte_crypto_octeontx.dump build-gcc-shared/install/dump/librte_crypto_octeontx.dump'

Regards,
Akhil


* Re: [dpdk-dev] [pull-request] next-crypto 21.08 rc1
  2021-07-07 21:57  5% ` Thomas Monjalon
  2021-07-08  7:39  0%   ` [dpdk-dev] [EXT] " Akhil Goyal
@ 2021-07-08  7:41  0%   ` Thomas Monjalon
  2021-07-08  7:47  3%     ` David Marchand
  2021-07-08  7:48  0%     ` [dpdk-dev] [EXT] " Akhil Goyal
  1 sibling, 2 replies; 200+ results
From: Thomas Monjalon @ 2021-07-08  7:41 UTC (permalink / raw)
  To: Akhil Goyal; +Cc: Shijith Thotton, dev, jerinj, david.marchand

07/07/2021 23:57, Thomas Monjalon:
> 07/07/2021 21:30, Akhil Goyal:
> > Shijith Thotton (2):
> >       drivers: add octeontx crypto adapter framework
> >       drivers: add octeontx crypto adapter data path
> 
> It seems there is an ABI breakage:
> 
> devtools/check-abi.sh: line 38: 958581 Segmentation fault
> (core dumped) abidiff $ABIDIFF_OPTIONS $dump $dump2
> Error: ABI issue reported for 'abidiff --suppr devtools/libabigail.abignore --no-added-syms --headers-dir1 v21.05/build-gcc-shared/usr/local/include --headers-dir2 build-gcc-shared/install/usr/local/include v21.05/build-gcc-shared/dump/librte_crypto_octeontx.dump build-gcc-shared/install/dump/librte_crypto_octeontx.dump'
> 
> Without this series, the ABI check is passing.

After updating libabigail, it passes OK.

Note there was another bug, in PPC toolchain this time.
After upgrading to recent PPC toolchain it is OK.

What a difficult pull request for the tools!




* Re: [dpdk-dev] [EXT] Re:  [pull-request] next-crypto 21.08 rc1
  2021-07-08  7:41  0%   ` [dpdk-dev] " Thomas Monjalon
  2021-07-08  7:47  3%     ` David Marchand
@ 2021-07-08  7:48  0%     ` Akhil Goyal
  1 sibling, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-08  7:48 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Shijith Thotton, dev, Jerin Jacob Kollanukkaran, david.marchand

> 07/07/2021 23:57, Thomas Monjalon:
> > 07/07/2021 21:30, Akhil Goyal:
> > > Shijith Thotton (2):
> > >       drivers: add octeontx crypto adapter framework
> > >       drivers: add octeontx crypto adapter data path
> >
> > It seems there is an ABI breakage:
> >
> > devtools/check-abi.sh: line 38: 958581 Segmentation fault
> > (core dumped) abidiff $ABIDIFF_OPTIONS $dump $dump2
> > Error: ABI issue reported for 'abidiff --suppr devtools/libabigail.abignore --
> no-added-syms --headers-dir1 v21.05/build-gcc-shared/usr/local/include --
> headers-dir2 build-gcc-shared/install/usr/local/include v21.05/build-gcc-
> shared/dump/librte_crypto_octeontx.dump build-gcc-
> shared/install/dump/librte_crypto_octeontx.dump'
> >
> > Without this series, the ABI check is passing.
> 
> After updating libabigail, it passes OK.
> 
> Note there was another bug, in PPC toolchain this time.
> After upgrading to recent PPC toolchain it is OK.
> 
> What a difficult pull request for the tools!
> 
OK, thanks for the update. Is there anything else in the pull request that I need to look into?

Regards,
Akhil


* Re: [dpdk-dev] [pull-request] next-crypto 21.08 rc1
  2021-07-08  7:41  0%   ` [dpdk-dev] " Thomas Monjalon
@ 2021-07-08  7:47  3%     ` David Marchand
  2021-07-08  7:48  0%     ` [dpdk-dev] [EXT] " Akhil Goyal
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2021-07-08  7:47 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Akhil Goyal, Shijith Thotton, dev, Jerin Jacob Kollanukkaran,
	Andrew Rybchenko, Yigit, Ferruh

On Thu, Jul 8, 2021 at 9:41 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 07/07/2021 23:57, Thomas Monjalon:
> > 07/07/2021 21:30, Akhil Goyal:
> > > Shijith Thotton (2):
> > >       drivers: add octeontx crypto adapter framework
> > >       drivers: add octeontx crypto adapter data path
> >
> > It seems there is an ABI breakage:
> >
> > devtools/check-abi.sh: line 38: 958581 Segmentation fault
> > (core dumped) abidiff $ABIDIFF_OPTIONS $dump $dump2
> > Error: ABI issue reported for 'abidiff --suppr devtools/libabigail.abignore --no-added-syms --headers-dir1 v21.05/build-gcc-shared/usr/local/include --headers-dir2 build-gcc-shared/install/usr/local/include v21.05/build-gcc-shared/dump/librte_crypto_octeontx.dump build-gcc-shared/install/dump/librte_crypto_octeontx.dump'
> >
> > Without this series, the ABI check is passing.
>
> After updating libabigail, it passes OK.

And for the record...

- libabigail-1.8.1-1.fc32.x86_64 is fine,
- libabigail freshly compiled from current master is fine too


>
> Note there was another bug, in PPC toolchain this time.
> After upgrading to recent PPC toolchain it is OK.

- bootlin toolchain powerpc64le-power8--glibc--stable-2018.11-1 stalls
when compiling drivers/crypto/cnxk/cn9k_cryptodev_ops.c
- bootlin toolchain powerpc64le-power8--glibc--stable-2020.08-1 is fine

Plus, if you want to upgrade your PPC toolchain, don't forget to
regenerate your ABI reference with the new toolchain.


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
  @ 2021-07-08  9:27  4%         ` Raslan Darawsheh
  2021-07-08  9:39  0%           ` Andrew Rybchenko
  0 siblings, 1 reply; 200+ results
From: Raslan Darawsheh @ 2021-07-08  9:27 UTC (permalink / raw)
  To: Andrew Rybchenko, Singh, Aman Deep, dev

Thank you for the review,

Basically it's not used yet, since using it would break the ABI.
The main usage is in the rte_flow item of gtp_psc, to replace the current structure with the header definition. Since that change breaks the ABI, I'm adding the header definition now; it will be used later in rte_flow.

Kindest regards,
Raslan Darawsheh

________________________________
From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Sent: Thursday, July 8, 2021, 12:23 PM
To: Raslan Darawsheh; Singh, Aman Deep; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc

Hi Raslan,

On 7/6/21 5:24 PM, Raslan Darawsheh wrote:
> Hi Guys,
>
> Sorry for missing this mail, for some reason it was missed in my inbox,
> This is the link to this rfc:
> https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip

Thanks for the link. The patch LGTM, but I have only one question left.
Where is it used? Are you going to upstream corresponding code in
the release cycle?

Andrew.

> Kindest regards,
> Raslan Darawsheh
>
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Andrew Rybchenko
>> Sent: Thursday, July 1, 2021 5:06 PM
>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
>>
>> Hi Raslan,
>>
>> could you reply, please.
>>
>> Andrew.
>>
>> On 6/22/21 10:27 AM, Singh, Aman Deep wrote:
>>> Hi Raslan,
>>>
>>> Can you please provide link to this RFC 38415-g30 I just had some
>>> doubt on byte-order conversion as per RFC 1700
>>> <https://tools.ietf.org/html/rfc1700>
>>>
>>> Regards
>>> Aman




* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
  2021-07-08  9:27  4%         ` Raslan Darawsheh
@ 2021-07-08  9:39  0%           ` Andrew Rybchenko
  2021-07-08 10:29  0%             ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Andrew Rybchenko @ 2021-07-08  9:39 UTC (permalink / raw)
  To: Raslan Darawsheh, Thomas Monjalon
  Cc: Singh, Aman Deep, dev, david.marchand, Olivier Matz

On 7/8/21 12:27 PM, Raslan Darawsheh wrote:
> Thank you for the review,
> 
> Basically it's not used yet since it will break the abi
> The main usage was in rte_flow item of gtp_psc
> To replace the current structure with the header definition. And since
> this will break the abi I'm adding the header definition now but will be
> used later in rte_flow.

@Thomas If so, should we accept it in the current release cycle
or should it simply wait for the code which uses it?

> Kindest regards,
> Raslan Darawsheh
> 
> ------------------------------------------------------------------------
> *From:* Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> *Sent:* Thursday, July 8, 2021, 12:23 PM
> *To:* Raslan Darawsheh; Singh, Aman Deep; dev@dpdk.org
> *Subject:* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
> 
> Hi Raslan,
> 
> On 7/6/21 5:24 PM, Raslan Darawsheh wrote:
>> Hi Guys,
>>
>> Sorry for missing this mail, for some reason it was missed in my inbox, 
>> This is the link to this rfc:
>> https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip
> <https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip>
> 
> Thanks for the link. The patch LGTM, but I have only one question left.
> Where is it used? Are you going to upstream corresponding code in
> the release cycle?
> 
> Andrew.
> 
>> Kindest regards,
>> Raslan Darawsheh
>>
>>> -----Original Message-----
>>> From: dev <dev-bounces@dpdk.org> On Behalf Of Andrew Rybchenko
>>> Sent: Thursday, July 1, 2021 5:06 PM
>>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; dev@dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
>>>
>>> Hi Raslan,
>>>
>>> could you reply, please.
>>>
>>> Andrew.
>>>
>>> On 6/22/21 10:27 AM, Singh, Aman Deep wrote:
>>>> Hi Raslan,
>>>>
>>>> Can you please provide link to this RFC 38415-g30 I just had some
>>>> doubt on byte-order conversion as per RFC 1700
>>>> <https://tools.ietf.org/html/rfc1700 <https://tools.ietf.org/html/rfc1700>>
>>>>
>>>> Regards
>>>> Aman
> 
> 



* Re: [dpdk-dev] Use WFE for spinlock and ring
  2021-07-07 14:47  0%   ` Stephen Hemminger
@ 2021-07-08  9:41  0%     ` Ruifeng Wang
  2021-07-08 16:58  0%       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 200+ results
From: Ruifeng Wang @ 2021-07-08  9:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, david.marchand, thomas, jerinj, nd, Honnappa Nagarahalli, nd

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Wednesday, July 7, 2021 10:48 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: dev@dpdk.org; david.marchand@redhat.com; thomas@monjalon.net;
> jerinj@marvell.com; nd <nd@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>
> Subject: Re: [dpdk-dev] Use WFE for spinlock and ring
> 
> On Sun, 25 Apr 2021 05:56:51 +0000
> Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> 
> > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > 'polling for a memory location to become equal to a given value'[1].
> >
> > Use the API for the rte spinlock and ring implementations.
> > With the wait until equal APIs being stable, changes will not impact ABI.
> >
> > [1] http://patches.dpdk.org/cover/62703/
> >
> > v3:
> > Series rebased. (David)
> >
> > Gavin Hu (1):
> >   spinlock: use wfe to reduce contention on aarch64
> >
> > Ruifeng Wang (1):
> >   ring: use wfe to wait for ring tail update on aarch64
> >
> >  lib/eal/include/generic/rte_spinlock.h | 4 ++--
> >  lib/ring/rte_ring_c11_pvt.h            | 4 ++--
> >  lib/ring/rte_ring_generic_pvt.h        | 3 +--
> >  3 files changed, 5 insertions(+), 6 deletions(-)
> >
> 
> Other places that should use WFE:
Thank you Stephen for looking into this.

> 
> rte_mcslock.h:rte_mcslock_lock()
The existing API can be used for this one.

> rte_mcslock_unlock:rte_mcslock_unlock()
This one needs rte_wait_while_xxx variant.

> 
> rte_pflock.h:rte_pflock_lock()
> rte_rwlock.h:rte_rwlock_read_lock()
> rte_rwlock.h:rte_rwlock_write_lock()
These occurrences have extra logic (AND, conditional branch, CAS) in the loop.
I'm not sure a generic API can be abstracted from these use cases.
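For reference, the pattern the existing API can absorb (as in the mcslock_lock case above) is the plain equality poll. A self-contained sketch — the helper and lock below are toy models, not the real rte_ APIs:

```c
#include <stdint.h>

/* Simplified stand-in for rte_wait_until_equal_32(): spin until *addr reads
 * as `expected`. On aarch64 the real API maps this to SEVL/WFE so the core
 * sleeps instead of hammering the cache line. */
static inline void
wait_until_equal_32(volatile uint32_t *addr, uint32_t expected)
{
	while (__atomic_load_n(addr, __ATOMIC_ACQUIRE) != expected)
		; /* rte_pause() in the generic fallback */
}

/* Toy spinlock built on the helper, mirroring the rte_spinlock change. */
static volatile uint32_t lock_word; /* 0 = free, 1 = taken */

static void
toy_lock(void)
{
	while (__atomic_exchange_n(&lock_word, 1, __ATOMIC_ACQUIRE) != 0)
		wait_until_equal_32(&lock_word, 0);
}

static void
toy_unlock(void)
{
	__atomic_store_n(&lock_word, 0, __ATOMIC_RELEASE);
}
```

The cases with extra AND/CAS logic in the loop do not fit this shape, which is why a wait_while (or masked) variant would be needed for them.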

> 
> 
> You should also introduce rte_wait_while_XXX variants to handle some of
> these cases.
> 




* Re: [dpdk-dev] [PATCH V3] ethdev: add dev configured flag
  @ 2021-07-08  9:56  3%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2021-07-08  9:56 UTC (permalink / raw)
  To: Huisong Li, Thomas Monjalon, Andrew Rybchenko, Yigit, Ferruh
  Cc: dev, Ananyev, Konstantin, Mcnamara, John, Ray Kinsella, Dodji Seketeli

On Wed, Jul 7, 2021 at 11:54 AM Huisong Li <lihuisong@huawei.com> wrote:
>
> Currently, if dev_configure is not called or fails to be called, users
> can still call dev_start successfully. So it is necessary to have a flag
> which indicates whether the device is configured, to control whether
> dev_start can be called and eliminate dependency on user invocation order.
>
> The flag stored in "struct rte_eth_dev_data" is more reasonable than
>  "enum rte_eth_dev_state". "enum rte_eth_dev_state" is private to the
> primary and secondary processes, and can be independently controlled.
> However, the secondary process does not make resource allocations and
> does not call dev_configure(). These are done by the primary process
> and can be obtained or used by the secondary process. So this patch
> adds a "dev_configured" flag in "rte_eth_dev_data", like "dev_started".
>
> Signed-off-by: Huisong Li <lihuisong@huawei.com>
> Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
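As a self-contained illustration of the gating described in the quoted message — all names below are models chosen for the sketch, not the real ethdev structures or return codes:

```c
#include <stdint.h>

/* Minimal model of "dev_start is only legal after dev_configure":
 * a flag in the shared data, checked before starting. */
struct eth_dev_data_model {
	uint16_t dev_configured : 1; /* set by dev_configure */
	uint16_t dev_started : 1;    /* set by dev_start */
};

static int
model_dev_start(struct eth_dev_data_model *data)
{
	if (!data->dev_configured)
		return -22; /* -EINVAL: device was never configured */
	data->dev_started = 1;
	return 0;
}
```

Because the flag lives in the shared device data rather than in the per-process state, a secondary process observes the configuration done by the primary, which is the point of the patch.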

As explained in the thread, I added a rather "large" ABI exception
rule so that we can merge this patch.

+; Ignore all changes to rte_eth_dev_data
+; Note: we only cared about dev_configured bit addition, but libabigail
+; seems to wrongly compute bitfields offset.
+; https://sourceware.org/bugzilla/show_bug.cgi?id=28060
+[suppress_type]
+        name = rte_eth_dev_data


*Reminder to ethdev maintainers*: with this exception, we have no
check on rte_eth_dev_data struct changes until 21.11.


Applied, thanks.

-- 
David Marchand



* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
  2021-07-08  9:39  0%           ` Andrew Rybchenko
@ 2021-07-08 10:29  0%             ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-08 10:29 UTC (permalink / raw)
  To: Raslan Darawsheh, Andrew Rybchenko
  Cc: Singh, Aman Deep, dev, david.marchand, Olivier Matz

08/07/2021 11:39, Andrew Rybchenko:
> On 7/8/21 12:27 PM, Raslan Darawsheh wrote:
> > Thank you for the review,
> > 
> > Basically it's not used yet since it will break the abi
> > The main usage was in rte_flow item of gtp_psc
> > To replace the current structure with the header definition. And since
> > this will break the abi I'm adding the header definition now but will be
> > used later in rte_flow.
> 
> @Thomas If so, should we accept it in the current release cycle
> or should it simply wait for the code which uses it?

If there is no need, we can wait for the next release.


> > ------------------------------------------------------------------------
> > *From:* Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> > *Sent:* Thursday, July 8, 2021, 12:23 PM
> > *To:* Raslan Darawsheh; Singh, Aman Deep; dev@dpdk.org
> > *Subject:* Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
> > 
> > Hi Raslan,
> > 
> > On 7/6/21 5:24 PM, Raslan Darawsheh wrote:
> >> Hi Guys,
> >>
> >> Sorry for missing this mail, for some reason it was missed in my inbox, 
> >> This is the link to this rfc:
> >> https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip
> > <https://www.3gpp.org/ftp/Specs/archive/38_series/38.415/38415-g30.zip>
> > 
> > Thanks for the link. The patch LGTM, but I have only one question left.
> > Where is it used? Are you going to upstream corresponding code in
> > the release cycle?
> > 
> > Andrew.
> > 
> >> Kindest regards,
> >> Raslan Darawsheh
> >>
> >>> -----Original Message-----
> >>> From: dev <dev-bounces@dpdk.org> On Behalf Of Andrew Rybchenko
> >>> Sent: Thursday, July 1, 2021 5:06 PM
> >>> To: Singh, Aman Deep <aman.deep.singh@intel.com>; dev@dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc
> >>>
> >>> Hi Raslan,
> >>>
> >>> could you reply, please.
> >>>
> >>> Andrew.
> >>>
> >>> On 6/22/21 10:27 AM, Singh, Aman Deep wrote:
> >>>> Hi Raslan,
> >>>>
> >>>> Can you please provide link to this RFC 38415-g30 I just had some
> >>>> doubt on byte-order conversion as per RFC 1700
> >>>> <https://tools.ietf.org/html/rfc1700 <https://tools.ietf.org/html/rfc1700>>
> >>>>
> >>>> Regards
> >>>> Aman





* [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
  @ 2021-07-08 14:13  3%               ` Anatoly Burakov
  2021-07-08 16:56  0%                 ` McDaniel, Timothy
  2021-07-08 14:13  3%               ` [dpdk-dev] [PATCH v8 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2021-07-08 14:13 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: ciara.loftus, david.hunt

Previously, the semantics of power monitor were such that we were
checking the current value against the expected value, and if they matched,
the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.
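For illustration, a minimal caller-side callback under the new scheme might look like this; the opaque-slot layout is the caller's choice, and the mask/value split below mirrors the dlb2 and mlx5 adjustments in this patch:

```c
#include <stdint.h>

#define OPAQUE_SZ    4	/* mirrors RTE_POWER_MONITOR_OPAQUE_SZ */
#define CLB_MASK_IDX 0	/* caller-chosen slot layout */
#define CLB_VAL_IDX  1

/* return -1 to abort entering the power-optimized state, 0 to proceed */
static int
example_monitor_callback(const uint64_t val,
		const uint64_t opaque[OPAQUE_SZ])
{
	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
}
```

The callback and its opaque data are stored in `struct rte_power_monitor_cond`, so `rte_power_monitor()` no longer needs to know how the comparison is made.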

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index c92e016783..65910de348 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -135,6 +135,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 8d65f287f4..65f325ede1 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..17370b77dc 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1



* [dpdk-dev] [PATCH v8 4/7] power: remove thread safety from PMD power API's
    2021-07-08 14:13  3%               ` [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-07-08 14:13  3%               ` Anatoly Burakov
    2 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-08 14:13 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus, konstantin.ananyev

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety, we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
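The resulting contract can be modeled in a few lines — names and return values below are illustrative, not the real DPDK symbols:

```c
/* Model of the new contract: reconfiguring PMD power management is
 * rejected unless the affected Rx queue is stopped, mirroring the
 * queue_stopped() guard added in this patch. */
enum model_queue_state { MODEL_QUEUE_STOPPED, MODEL_QUEUE_STARTED };

static enum model_queue_state queue_state = MODEL_QUEUE_STARTED;

static int
model_pmgmt_queue_enable(void)
{
	if (queue_state != MODEL_QUEUE_STOPPED)
		return -22; /* -EINVAL */
	/* ... install the Rx callback without any locking ... */
	return 0;
}
```

An application would thus stop the queue, enable (or disable) power management on it, and start the queue again, instead of relying on the removed internal synchronization.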

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 65910de348..33e66d746b 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -137,6 +137,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index c1097d32f1..4f6a242364 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -21,4 +21,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these APIs, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1
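
As a quick illustration of the stop-before-reconfigure contract this patch
introduces, here is a minimal standalone sketch of the `queue_stopped()` check
and its `-EINVAL`/`-EBUSY` error mapping. The constants and helpers are mocked
locally (not taken from ethdev headers), so the mapping can be exercised
without a device:

```c
#include <errno.h>
#include <assert.h>

/* Locally mocked queue-state constants; values mirror rte_ethdev.h. */
#define RTE_ETH_QUEUE_STATE_STOPPED 0
#define RTE_ETH_QUEUE_STATE_STARTED 1

/* Mock of the patch's queue_stopped() helper: 1 if stopped, 0 if
 * running, -1 if the queue is invalid (info lookup failed). */
static int
queue_stopped(int queue_state, int valid)
{
	if (!valid)
		return -1;
	return queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
}

/* The error mapping used by both enable and disable in this patch:
 * error (-1) means invalid queue, 0 means the queue wasn't stopped. */
static int
check_queue(int queue_state, int valid)
{
	int ret = queue_stopped(queue_state, valid);

	if (ret != 1)
		return ret < 0 ? -EINVAL : -EBUSY;
	return 0;
}
```

A caller would therefore see `-EBUSY` until it stops the affected Rx queue,
matching the documented limitation.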


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
  2021-07-08 14:13  3%               ` [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
@ 2021-07-08 16:56  0%                 ` McDaniel, Timothy
  0 siblings, 0 replies; 200+ results
From: McDaniel, Timothy @ 2021-07-08 16:56 UTC (permalink / raw)
  To: Burakov, Anatoly, dev, Xing, Beilei, Wu, Jingjing, Yang, Qiming,
	Zhang, Qi Z, Wang, Haiyue, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Richardson, Bruce, Ananyev, Konstantin
  Cc: Loftus, Ciara, Hunt, David



> -----Original Message-----
> From: Burakov, Anatoly <anatoly.burakov@intel.com>
> Sent: Thursday, July 8, 2021 9:14 AM
> To: dev@dpdk.org; McDaniel, Timothy <timothy.mcdaniel@intel.com>; Xing,
> Beilei <beilei.xing@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Yang,
> Qiming <qiming.yang@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>; Wang,
> Haiyue <haiyue.wang@intel.com>; Matan Azrad <matan@nvidia.com>; Shahaf
> Shuler <shahafs@nvidia.com>; Viacheslav Ovsiienko <viacheslavo@nvidia.com>;
> Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>; Hunt, David <david.hunt@intel.com>
> Subject: [PATCH v8 1/7] power_intrinsics: use callbacks for comparison
> 
> Previously, the semantics of power monitor were such that we were
> checking current value against the expected value, and if they matched,
> then the sleep was aborted. This is somewhat inflexible, because it only
> allowed us to check for a specific value in a specific way.
> 
> This commit replaces the comparison with a user callback mechanism, so
> that any PMD (or other code) using `rte_power_monitor()` can define
> their own comparison semantics and decision making on how to detect the
> need to abort the entering of power optimized state.
> 
> Existing implementations are adjusted to follow the new semantics.
> 
> Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
> 
> Notes:
>     v4:
>     - Return error if callback is set to NULL
>     - Replace raw number with a macro in monitor condition opaque data
> 
>     v2:
>     - Use callback mechanism for more flexibility
>     - Address feedback from Konstantin
> 
>  doc/guides/rel_notes/release_21_08.rst        |  2 ++
>  drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
>  drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
>  drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
>  drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
>  drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
>  drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
>  .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
>  lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
>  9 files changed, 122 insertions(+), 44 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_21_08.rst
> b/doc/guides/rel_notes/release_21_08.rst
> index c92e016783..65910de348 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -135,6 +135,8 @@ API Changes
>  * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
>    truncation.
> 
> +* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
> +
> 
>  ABI Changes
>  -----------
> diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> index eca183753f..252bbd8d5e 100644
> --- a/drivers/event/dlb2/dlb2.c
> +++ b/drivers/event/dlb2/dlb2.c
> @@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port,
> int num)
>  	}
>  }
> 
> +#define CLB_MASK_IDX 0
> +#define CLB_VAL_IDX 1
> +static int
> +dlb2_monitor_callback(const uint64_t val,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	/* abort if the value matches */
> +	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 :
> 0;
> +}
> +
>  static inline int
>  dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>  		  struct dlb2_eventdev_port *ev_port,
> @@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
>  			expected_value = 0;
> 
>  		pmc.addr = monitor_addr;
> -		pmc.val = expected_value;
> -		pmc.mask = qe_mask.raw_qe[1];
> +		/* store expected value and comparison mask in opaque data */
> +		pmc.opaque[CLB_VAL_IDX] = expected_value;
> +		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
> +		/* set up callback */
> +		pmc.fn = dlb2_monitor_callback;
>  		pmc.size = sizeof(uint64_t);
> 
>  		rte_power_monitor(&pmc, timeout + start_ticks);
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 8d65f287f4..65f325ede1 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -81,6 +81,18 @@
>  #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
>  		(PKT_TX_OFFLOAD_MASK ^
> I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
> 
> +static int
> +i40e_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ]
> __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_64(1 <<
> I40E_RX_DESC_STATUS_DD_SHIFT);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond
> *pmc)
>  {
> @@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.qword1.status_error_len;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> -	pmc->mask = rte_cpu_to_le_64(1 <<
> I40E_RX_DESC_STATUS_DD_SHIFT);
> +	/* comparison callback */
> +	pmc->fn = i40e_monitor_callback;
> 
>  	/* registers are 64-bit */
>  	pmc->size = sizeof(uint64_t);
> diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
> index f817fbc49b..d61b32fcee 100644
> --- a/drivers/net/iavf/iavf_rxtx.c
> +++ b/drivers/net/iavf/iavf_rxtx.c
> @@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
>  				rxdid_map[flex_type] :
> IAVF_RXDID_COMMS_OVS_1;
>  }
> 
> +static int
> +iavf_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ]
> __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
> @@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.qword1.status_error_len;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
> -	pmc->mask = rte_cpu_to_le_64(1 <<
> IAVF_RX_DESC_STATUS_DD_SHIFT);
> +	/* comparison callback */
> +	pmc->fn = iavf_monitor_callback;
> 
>  	/* registers are 64-bit */
>  	pmc->size = sizeof(uint64_t);
> diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
> index 3f6e735984..5d7ab4f047 100644
> --- a/drivers/net/ice/ice_rxtx.c
> +++ b/drivers/net/ice/ice_rxtx.c
> @@ -27,6 +27,18 @@ uint64_t
> rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
>  uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
>  uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
> 
> +static int
> +ice_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ]
> __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
> @@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.status_error0;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
> -	pmc->mask = rte_cpu_to_le_16(1 <<
> ICE_RX_FLEX_DESC_STATUS0_DD_S);
> +	/* comparison callback */
> +	pmc->fn = ice_monitor_callback;
> 
>  	/* register is 16-bit */
>  	pmc->size = sizeof(uint16_t);
> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> index d69f36e977..c814a28cb4 100644
> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> @@ -1369,6 +1369,18 @@ const uint32_t
>  		RTE_PTYPE_INNER_L3_IPV4_EXT |
> RTE_PTYPE_INNER_L4_UDP,
>  };
> 
> +static int
> +ixgbe_monitor_callback(const uint64_t value,
> +		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ]
> __rte_unused)
> +{
> +	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +	/*
> +	 * we expect the DD bit to be set to 1 if this descriptor was already
> +	 * written to.
> +	 */
> +	return (value & m) == m ? -1 : 0;
> +}
> +
>  int
>  ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond
> *pmc)
>  {
> @@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  	/* watch for changes in status bit */
>  	pmc->addr = &rxdp->wb.upper.status_error;
> 
> -	/*
> -	 * we expect the DD bit to be set to 1 if this descriptor was already
> -	 * written to.
> -	 */
> -	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> -	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
> +	/* comparison callback */
> +	pmc->fn = ixgbe_monitor_callback;
> 
>  	/* the registers are 32-bit */
>  	pmc->size = sizeof(uint32_t);
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 777a1d6e45..17370b77dc 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev,
> uint16_t rx_queue_id)
>  	return rx_queue_count(rxq);
>  }
> 
> +#define CLB_VAL_IDX 0
> +#define CLB_MSK_IDX 1
> +static int
> +mlx_monitor_callback(const uint64_t value,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	const uint64_t m = opaque[CLB_MSK_IDX];
> +	const uint64_t v = opaque[CLB_VAL_IDX];
> +
> +	return (value & m) == v ? -1 : 0;
> +}
> +
>  int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond
> *pmc)
>  {
>  	struct mlx5_rxq_data *rxq = rx_queue;
> @@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct
> rte_power_monitor_cond *pmc)
>  		return -rte_errno;
>  	}
>  	pmc->addr = &cqe->op_own;
> -	pmc->val =  !!idx;
> -	pmc->mask = MLX5_CQE_OWNER_MASK;
> +	pmc->opaque[CLB_VAL_IDX] = !!idx;
> +	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
> +	pmc->fn = mlx_monitor_callback;
>  	pmc->size = sizeof(uint8_t);
>  	return 0;
>  }
> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h
> b/lib/eal/include/generic/rte_power_intrinsics.h
> index dddca3d41c..c9aa52a86d 100644
> --- a/lib/eal/include/generic/rte_power_intrinsics.h
> +++ b/lib/eal/include/generic/rte_power_intrinsics.h
> @@ -18,19 +18,38 @@
>   * which are architecture-dependent.
>   */
> 
> +/** Size of the opaque data in monitor condition */
> +#define RTE_POWER_MONITOR_OPAQUE_SZ 4
> +
> +/**
> + * Callback definition for monitoring conditions. Callbacks with this signature
> + * will be used by `rte_power_monitor()` to check if the entering of power
> + * optimized state should be aborted.
> + *
> + * @param val
> + *   The value read from memory.
> + * @param opaque
> + *   Callback-specific data.
> + *
> + * @return
> + *   0 if entering of power optimized state should proceed
> + *   -1 if entering of power optimized state should be aborted
> + */
> +typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
>  struct rte_power_monitor_cond {
>  	volatile void *addr;  /**< Address to monitor for changes */
> -	uint64_t val;         /**< If the `mask` is non-zero, location pointed
> -	                       *   to by `addr` will be read and compared
> -	                       *   against this value.
> -	                       */
> -	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
> -	uint8_t size;    /**< Data size (in bytes) that will be used to compare
> -	                  *   expected value (`val`) with data read from the
> +	uint8_t size;    /**< Data size (in bytes) that will be read from the
>  	                  *   monitored memory location (`addr`). Can be 1, 2,
>  	                  *   4, or 8. Supplying any other value will result in
>  	                  *   an error.
>  	                  */
> +	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
> +	                             *   entering power optimized state should
> +	                             *   be aborted.
> +	                             */
> +	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
> +	/**< Callback-specific data */
>  };
> 
>  /**
> diff --git a/lib/eal/x86/rte_power_intrinsics.c
> b/lib/eal/x86/rte_power_intrinsics.c
> index 39ea9fdecd..66fea28897 100644
> --- a/lib/eal/x86/rte_power_intrinsics.c
> +++ b/lib/eal/x86/rte_power_intrinsics.c
> @@ -76,6 +76,7 @@ rte_power_monitor(const struct
> rte_power_monitor_cond *pmc,
>  	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
>  	const unsigned int lcore_id = rte_lcore_id();
>  	struct power_wait_status *s;
> +	uint64_t cur_value;
> 
>  	/* prevent user from running this instruction if it's not supported */
>  	if (!wait_supported)
> @@ -91,6 +92,9 @@ rte_power_monitor(const struct
> rte_power_monitor_cond *pmc,
>  	if (__check_val_size(pmc->size) < 0)
>  		return -EINVAL;
> 
> +	if (pmc->fn == NULL)
> +		return -EINVAL;
> +
>  	s = &wait_status[lcore_id];
> 
>  	/* update sleep address */
> @@ -110,16 +114,11 @@ rte_power_monitor(const struct
> rte_power_monitor_cond *pmc,
>  	/* now that we've put this address into monitor, we can unlock */
>  	rte_spinlock_unlock(&s->lock);
> 
> -	/* if we have a comparison mask, we might not need to sleep at all */
> -	if (pmc->mask) {
> -		const uint64_t cur_value = __get_umwait_val(
> -				pmc->addr, pmc->size);
> -		const uint64_t masked = cur_value & pmc->mask;
> +	cur_value = __get_umwait_val(pmc->addr, pmc->size);
> 
> -		/* if the masked value is already matching, abort */
> -		if (masked == pmc->val)
> -			goto end;
> -	}
> +	/* check if callback indicates we should abort */
> +	if (pmc->fn(cur_value, pmc->opaque) != 0)
> +		goto end;
> 
>  	/* execute UMWAIT */
>  	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
> --
> 2.25.1

DLB changes look good to me

Acked-by: timothy.mcdaniel@intel.com
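
For readers following the new semantics, here is a standalone sketch of the
callback contract, in the style of `dlb2_monitor_callback()` quoted above. The
typedef and opaque-array size are mirrored locally for illustration; in DPDK
they come from `rte_power_intrinsics.h`:

```c
#include <stdint.h>
#include <assert.h>

/* Local mirror of the monitor-condition pieces from the patch. */
#define RTE_POWER_MONITOR_OPAQUE_SZ 4

typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);

#define CLB_MASK_IDX 0
#define CLB_VAL_IDX 1

/* Same semantics as the dlb2 callback: abort the sleep (return -1)
 * when the masked value matches the expected value. */
static int
masked_match_callback(const uint64_t val,
		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
{
	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
}

/* Model of the check rte_power_monitor() performs before sleeping:
 * a non-zero callback return aborts entering the optimized state. */
static int
should_abort(rte_power_monitor_clb_t fn, uint64_t cur_value,
		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
{
	return fn(cur_value, opaque) != 0;
}
```

The callback thus replaces the fixed val/mask comparison with arbitrary
per-driver decision logic, which is the point of the patch.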

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] Use WFE for spinlock and ring
  2021-07-08  9:41  0%     ` Ruifeng Wang
@ 2021-07-08 16:58  0%       ` Honnappa Nagarahalli
  0 siblings, 0 replies; 200+ results
From: Honnappa Nagarahalli @ 2021-07-08 16:58 UTC (permalink / raw)
  To: Ruifeng Wang, Stephen Hemminger
  Cc: dev, david.marchand, thomas, jerinj, nd, Honnappa Nagarahalli, nd

<snip>

> >
> > On Sun, 25 Apr 2021 05:56:51 +0000
> > Ruifeng Wang <ruifeng.wang@arm.com> wrote:
> >
> > > The rte_wait_until_equal_xxx APIs abstract the functionality of
> > > 'polling for a memory location to become equal to a given value'[1].
> > >
> > > Use the API for the rte spinlock and ring implementations.
> > > With the wait until equal APIs being stable, changes will not impact ABI.
> > >
> > > [1] http://patches.dpdk.org/cover/62703/
> > >
> > > v3:
> > > Series rebased. (David)
> > >
> > > Gavin Hu (1):
> > >   spinlock: use wfe to reduce contention on aarch64
> > >
> > > Ruifeng Wang (1):
> > >   ring: use wfe to wait for ring tail update on aarch64
> > >
> > >  lib/eal/include/generic/rte_spinlock.h | 4 ++--
> > >  lib/ring/rte_ring_c11_pvt.h            | 4 ++--
> > >  lib/ring/rte_ring_generic_pvt.h        | 3 +--
> > >  3 files changed, 5 insertions(+), 6 deletions(-)
> > >
> >
> > Other places that should use WFE:
> Thank you Stephen for looking into this.
> 
> >
> > rte_mcslock.h:rte_mcslock_lock()
> Existing API can be used in this one.
> 
> > rte_mcslock_unlock:rte_mcslock_unlock()
> This one needs rte_wait_while_xxx variant.
> 
> >
> > rte_pflock.h:rte_pflock_lock()
> > rte_rwlock.h:rte_rwlock_read_lock()
> > rte_rwlock.h:rte_rwlock_write_lock()
> These occurrences have extra logic (AND, conditional branch, CAS) in the loop.
> I'm not sure generic API can be abstracted from these use cases.
I think it is possible to create additional abstractions to address these cases.

> 
> >
> >
> > You should also introduce rte_wait_while_XXX variants to handle some
> > of these cases.
> >
> 
> 


^ permalink raw reply	[relevance 0%]

* [dpdk-dev] RFC enabling dll/dso for dpdk on windows
@ 2021-07-08 19:21  3% Tyler Retzlaff
  2021-07-08 20:49  3% ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-08 19:21 UTC (permalink / raw)
  To: dev, dmitry.kozliuk, thomas

hi folks,

we would like to submit a patch series that makes dll/dso for dpdk
work on windows. there are two differences in the windows platform that
would need to be addressed through enhancements to dpdk.

(1) windows dynamic objects don't export sufficient information for
    tls variables and the windows loader and runtime would need to be
    enhanced in order to perform runtime linking. [1][2]

(2) importing exported data symbols from a dll/dso on windows requires
    that the symbol be decorated with dllimport. optionally loading
    performance of dll/dso is also further improved by decorating
    exported function symbols. [3]

for (1) a novel approach is proposed where a new set of per_lcore
macros is introduced and used to replace the existing macros, with some
adjustment made to declaration/definition usage. of note

    * on linux the new macros would expand compatibly to maintain abi
      of existing exported tls variables. since windows dynamic
      linking has never worked there is no compatibility concern for
      existing windows binaries.

    * the existing macros would be retained for api compatibility
      potentially with the intent of deprecating them at a later time.

    * new macros would be "internal" to dpdk they should not be
      available to applications as a part of the stable api.

for (2) we would propose the introduction and use of two macros to
allow decoration of exported data symbols. these macros would be named
(or similarly named) __rte_import and __rte_export. of note

    * on linux the macros would expand empty but optionally
      in the future __rte_export could be enhanced to expand to
      __attribute__((visibility("default"))) enabling the use of gcc
      -fvisibility=hidden in dpdk to improve dso load times. [4][5]

    * on windows the macros would trivially expand to
      __declspec(dllimport) and __declspec(dllexport)

    * library meson.build files would need to define a preprocessor
      knob to control decoration internal/external to libraries
      exporting data symbols to ensure optimal code generation for
      accesses.
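
a sketch of the conventional export/import knob pattern this bullet
implies; the knob name RTE_BUILD_SHARED_LIB_EXPORTS and the
RTE_DATA_SYMBOL helper are illustrative assumptions, not the proposed
final spelling:

```c
#include <assert.h>

/* rte_tls.h (sketch) -- RTE_BUILD_SHARED_LIB_EXPORTS is a hypothetical
 * per-library knob that the library's own meson.build would define while
 * compiling its sources; consumers of the header leave it unset. */
#if defined(_WIN32)
#  define __rte_export __declspec(dllexport)
#  define __rte_import __declspec(dllimport)
#else
/* empty today; could become __attribute__((visibility("default"))) */
#  define __rte_export
#  define __rte_import
#endif

#ifdef RTE_BUILD_SHARED_LIB_EXPORTS
#  define RTE_DATA_SYMBOL __rte_export
#else
#  define RTE_DATA_SYMBOL __rte_import
#endif

/* example exported data symbol; on non-windows the decoration is empty,
 * so this compiles and links unchanged. */
RTE_DATA_SYMBOL int rte_example_counter = 0;
```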

the following is the base list of proposed macro additions for the new
per_lcore macros; a new header named `rte_tls.h' is proposed to hold them.

__rte_export
__rte_import

  have already been explained in (2)

__rte_thread_local

  is trivially expanded to __thread or _Thread_local or
  __declspec(thread) as appropriate.

RTE_DEFINE_TLS(vartype, varname, value)
RTE_DEFINE_TLS_EXPORT(vartype, varname, value)
RTE_DECLARE_TLS(vartype, varname)
RTE_DECLARE_TLS_IMPORT(vartype, varname)

  are roughly equivalent to RTE_DEFINE_PER_LCORE and
  RTE_DECLARE_PER_LCORE, where the differences in the new macros are:

    * separate macros for exported vs non-exported variables.

    * DEFINE macros require the initialization value as a parameter
      instead of the assignment usage `RTE_DEFINE_PER_LCORE(t, n) = x;'.
      there was no reasonable way to expand the windows variant of the
      macro while maintaining assignment, so it was parameterized to
      allow flexibility.

RTE_TLS(varname)

  is the equivalent of RTE_PER_LCORE to allow r/w access to the variable.
  on linux the expansion is the same; for windows it is non-trivial.
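
to make the proposal concrete, here is a sketch of plausible linux-side
expansions for a subset of the macros; the expansions are illustrative
assumptions (the windows side is omitted, since it depends on the loader
work described in (1)):

```c
#include <assert.h>

/* illustrative linux-side expansions of the proposed macros; the real
 * proposal would select these per platform inside rte_tls.h. */
#define __rte_thread_local _Thread_local

/* initialization value is a parameter, unlike the assignment usage of
 * RTE_DEFINE_PER_LCORE(t, n) = x; */
#define RTE_DEFINE_TLS(vartype, varname, value) \
	__rte_thread_local vartype varname = value

#define RTE_DECLARE_TLS(vartype, varname) \
	extern __rte_thread_local vartype varname

/* on linux, r/w access is just the variable itself */
#define RTE_TLS(varname) (varname)

/* usage: would replace RTE_DEFINE_PER_LCORE(int, x) = -1; */
RTE_DEFINE_TLS(int, example_lcore_state, -1);
```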

we look forward to feedback on this proposal. once we have initial
feedback, the series will be submitted so that further review can take
place.

thanks

1. https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2492?view=msvc-160
2. https://docs.microsoft.com/en-us/windows/win32/debug/pe-format
3. https://docs.microsoft.com/en-us/cpp/build/importing-into-an-application-using-declspec-dllimport?view=msvc-160
4. https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Function-Attributes.html
5. https://gcc.gnu.org/wiki/Visibility


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
  2021-07-08 19:21  3% [dpdk-dev] RFC enabling dll/dso for dpdk on windows Tyler Retzlaff
@ 2021-07-08 20:49  3% ` Dmitry Kozlyuk
  2021-07-09  1:03  2%   ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Dmitry Kozlyuk @ 2021-07-08 20:49 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, thomas

Hi Tyler,

2021-07-08 12:21 (UTC-0700), Tyler Retzlaff:
> hi folks,
> 
> we would like to submit a a patch series that makes dll/dso for dpdk
> work on windows. there are two differences in the windows platform that
> would need to be address through enhancements to dpdk.
> 
> (1) windows dynamic objects don't export sufficient information for
>     tls variables and the windows loader and runtime would need to be
>     enhanced in order to perform runtime linking. [1][2]

When will the new loader be available?
Will it be ported to Server 2019?
Will this functionality require compiler support
(you mention that accessing such variables will be "non-trivial")?
 
> (2) importing exported data symbols from a dll/dso on windows requires
>     that the symbol be decorated with dllimport. optionally loading
>     performance of dll/dso is also further improved by decorating
>     exported function symbols. [3]

Does it affect ABI?

It is also a huge code change, although a mechanical one.
Is it required? All exported symbols are listed in .map/def, after all.

> for (1) a novel approach is proposed where a new set of per_lcore
> macros are introduced and used to replace existing macros with some
> adjustment to declaration/definition usage is made. of note
> 
>     * on linux the new macros would expand compatibly to maintain abi
>       of existing exported tls variables. since windows dynamic
>       linking has never worked there is no compatibility concern for
>       existing windows binaries.
> 
>     * the existing macros would be retained for api compatibility
>       potentially with the intent of deprecating them at a later time.
> 
>     * new macros would be "internal" to dpdk they should not be
>       available to applications as a part of the stable api.
> 
> for (2) we would propose the introduction and use of two macros to
> allow decoration of exported data symbols. these macro would be or
> similarly named __rte_import and __rte_export. of note
> 
>     * on linux the macros would expand empty but optionally
>       in the future__rte_export could be enhanced to expand to
>       __attribute__((visibility("default"))) enabling the use of gcc
>       -fvisibility=hidden in dpdk to improve dso load times. [4][5]
> 
>     * on windows the macros would trivially expand to
>       __declspec(dllimport) and __declspec(dllexport)
> 
>     * library meson.build files would need to define a preprocessor
>       knob to control decoration internal/external to libraries
>       exporting data symbols to ensure optimal code generation for
>       accesses.

Either you mean a macro to switch __rte_export between dllimport/dllexport
or I don't understand this point. BTW, what will __rte_export be for static
build?

> 
> the following is the base list of proposed macro additions for the new
> per_lcore macros a new header is proposed named `rte_tls.h'

When rte_thread_key*() family of functions was introduced as rte_tls_*(),
Jerin objected that there's a confusion with Transport Layer Security.
How about RTE_THREAD_VAR, etc?

> __rte_export
> __rte_import
> 
>   have already been explained in (2)
> 
> __rte_thread_local
> 
>   is trivially expanded to __thread or _Thread_local or
>   __declspec(thread) as appropriate.
> 
> RTE_DEFINE_TLS(vartype, varname, value)
> RTE_DEFINE_TLS_EXPORT(vartype, varname, value)
> RTE_DECLARE_TLS(vartype, varname)
> RTE_DECLARE_TLS_IMPORT(vartype, varname)
> 
>   are roughly equivalent to RTE_DEFINE_PER_LCORE and
>   RTE_DECLARE_PER_LCORE; the differences in the new macros are:
> 
>     * separate macros for exported vs non-exported variables.

Is it necessary, i.e. can't RTE_DECLARE/DEFINE_TLS compose with other
attributes, like __rte_experimental and __rte_deprecated?

>     * DEFINE macros require the initialization value as a parameter
>       instead of the assignment usage `RTE_DEFINE_PER_LCORE(t, n) = x;'.
>       there was no reasonable way to expand the windows variant of the
>       macro to maintain assignment, so it was parameterized to allow
>       flexibility.
> 
> RTE_TLS(varname)
> 
>   is the equivalent of RTE_PER_LCORE, allowing r/w access to the
>   variable. on linux the expansion is the same; for windows it is
>   non-trivial.
> [...]


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
  2021-07-08 20:49  3% ` Dmitry Kozlyuk
@ 2021-07-09  1:03  2%   ` Tyler Retzlaff
  2021-07-16  9:40  4%     ` Dmitry Kozlyuk
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-09  1:03 UTC (permalink / raw)
  To: Dmitry Kozlyuk; +Cc: dev, thomas

On Thu, Jul 08, 2021 at 11:49:53PM +0300, Dmitry Kozlyuk wrote:
> Hi Tyler,
> 
> 2021-07-08 12:21 (UTC-0700), Tyler Retzlaff:
> > hi folks,
> > 
> > we would like to submit a patch series that makes dll/dso for dpdk
> > work on windows. there are two differences in the windows platform that
> > would need to be addressed through enhancements to dpdk.
> > 
> > (1) windows dynamic objects don't export sufficient information for
> >     tls variables and the windows loader and runtime would need to be
> >     enhanced in order to perform runtime linking. [1][2]
> 
> When will the new loader be available?

the solution i have prototyped does not directly export the tls variables
and instead relies on exports of tls offsets within a module.  no loader
change or new os is required.

> Will it be ported to Server 2019?

not necessary (as per above)

> Will this functionality require compiler support

the prototype was developed using windows clang. mingw code compiles but
i did not try to run it. i suspect it is okay, though i haven't examined
any side-effects of the emulated tls that mingw uses. anyway, mingw dlls
don't work now and that probably shouldn't block dlls being available
with clang.

> (you mention that accessing such variables will be "non-trivial")?

the solution involves exporting offsets that then allow explicit tls
accesses relative to the gs segment. it's non-trivial in the sense that
none of the normal explicit tls functions in windows are used and the
compiler doesn't generate the code for implicit tls access. the overhead
is relatively tolerable (one or two additional dereferences).

>  
> > (2) importing exported data symbols from a dll/dso on windows requires
> >     that the symbol be decorated with dllimport. optionally loading
> >     performance of dll/dso is also further improved by decorating
> >     exported function symbols. [3]
> 
> Does it affect ABI?

the data symbols are already part of the abi for linux. this just allows
them to be properly accessed when exported from a dll on windows.
surprisingly, lld-link doesn't currently fail when building dlls, though
it should; in the absence of __declspec(dllimport), ms link would fail.

on windows the tls variables are currently exported but not useful. with
this change we would choose not to export them at all, and each exported
tls variable would be replaced with a new variable.

one nit (which we will get separate feedback on) is how to export
symbols only on windows (and not export them on linux) because, similar
to the tls variables, linux has no use for the new variables.

> 
> It is also a huge code change, although a mechanical one.
> Is it required? All exported symbols are listed in .map/def, after all.

if a broad, sweeping mechanical change is a sensitive issue, we can
limit the change to just the data symbols, which are required. but keep
in mind there is a penalty on load time when the function symbols are
not decorated. ultimately we would like them all properly decorated, but
we don't need to push it now since we're just trying to enable the
functionality.

> 
> > for (1) a novel approach is proposed where a new set of per_lcore
> > macros are introduced and used to replace existing macros, with some
> > adjustment to declaration/definition usage. of note
> > 
> >     * on linux the new macros would expand compatibly to maintain abi
> >       of existing exported tls variables. since windows dynamic
> >       linking has never worked there is no compatibility concern for
> >       existing windows binaries.
> > 
> >     * the existing macros would be retained for api compatibility
> >       potentially with the intent of deprecating them at a later time.
> > 
> >     * new macros would be "internal" to dpdk; they should not be
> >       available to applications as part of the stable api.
> > 
> > for (2) we would propose the introduction and use of two macros to
> > allow decoration of exported data symbols. these macros would be
> > named (or similarly) __rte_import and __rte_export. of note
> > 
> >     * on linux the macros would expand empty but optionally
> >       in the future __rte_export could be enhanced to expand to
> >       __attribute__((visibility("default"))) enabling the use of gcc
> >       -fvisibility=hidden in dpdk to improve dso load times. [4][5]
> > 
> >     * on windows the macros would trivially expand to
> >       __declspec(dllimport) and __declspec(dllexport)
> > 
> >     * library meson.build files would need to define a preprocessor
> >       knob to control decoration internal/external to libraries
> >       exporting data symbols to ensure optimal code generation for
> >       accesses.
> 
> Either you mean a macro to switch __rte_export between dllimport/dllexport
> or I don't understand this point. BTW, what will __rte_export be for static
> build?

there are two import cases that a library like eal has when it exports a
data symbol.

e.g. if eal exports a variable that is used both within eal and outside
of eal, you want different expansions of __rte_import.

when consuming the variable within eal, if __declspec(dllimport) is used
you will get less-optimal codegen (because the code is generated for
imported access). however, outside of eal you need __declspec(dllimport)
to generate the correct code to access the exported data.

i haven't looked into how gcc/ld deals with this. maybe ld is just
smarter and figures out when to generate the optimal code.

static build doesn't really get negatively impacted by __rte_export but
when statically linking the ms linker will complain with warnings that
can be suppressed without harm.

it's still something that is on my mind (and i don't want to make it an
issue that blocks this proposal) but i'm starting to lean toward a build
time option where either static or dynamic build is requested instead of
cobbling both together out of the same build product.  but that is
really off topic for this change.
> 
> > 
> > the following is the base list of proposed macro additions for the new
> > per_lcore macros a new header is proposed named `rte_tls.h'
> 
> When rte_thread_key*() family of functions was introduced as rte_tls_*(),
> Jerin objected that there's a confusion with Transport Layer Security.
> How about RTE_THREAD_VAR, etc?

no objection. one of the reasons i posted the set of macros from the
prototype was so people could offer up suggestions on a better namespace.

> 
> > __rte_export
> > __rte_import
> > 
> >   have already been explained in (2)
> > 
> > __rte_thread_local
> > 
> >   is trivially expanded to __thread or _Thread_local or
> >   __declspec(thread) as appropriate.
> > 
> > RTE_DEFINE_TLS(vartype, varname, value)
> > RTE_DEFINE_TLS_EXPORT(vartype, varname, value)
> > RTE_DECLARE_TLS(vartype, varname)
> > RTE_DECLARE_TLS_IMPORT(vartype, varname)
> > 
> >   are roughly equivalent to RTE_DEFINE_PER_LCORE and
> >   RTE_DECLARE_PER_LCORE; the differences in the new macros are:
> > 
> >     * separate macros for exported vs non-exported variables.
> 
> Is it necessary, i.e. can't RTE_DECLARE/DEFINE_TLS compose with other
> attributes, like __rte_experimental and __rte_deprecated?

it's necessary in so far as the existing per-lcore variables that are
not imported/exported can still have a storage class specifier like
static applied without jumping through hoops.

i tried for some time to have a single declare/define macro but dealing
with 'static' being used made the problem hard. i can't just "shut off"
the nested __rte_export/__rte_import expansion when static is put in
front of the macro, nor when it is parameterized.

> 
> >     * DEFINE macros require the initialization value as a parameter
> >       instead of the assignment usage `RTE_DEFINE_PER_LCORE(t, n) = x;'.
> >       there was no reasonable way to expand the windows variant of the
> >       macro to maintain assignment, so it was parameterized to allow
> >       flexibility.
> > 
> > RTE_TLS(varname)
> > 
> >   is the equivalent of RTE_PER_LCORE, allowing r/w access to the
> >   variable. on linux the expansion is the same; for windows it is
> >   non-trivial.
> > [...]


^ permalink raw reply	[relevance 2%]

* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-01 10:38 23% ` [dpdk-dev] [PATCH v3] doc: policy on the " Ray Kinsella
  2021-07-07 18:32  0%   ` Tyler Retzlaff
@ 2021-07-09  6:16  0%   ` Jerin Jacob
  2021-07-09 19:15  3%     ` Tyler Retzlaff
  1 sibling, 1 reply; 200+ results
From: Jerin Jacob @ 2021-07-09  6:16 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dpdk-dev, Richardson, Bruce, John McNamara, roretzla,
	Ferruh Yigit, Thomas Monjalon, David Marchand, Stephen Hemminger

On Thu, Jul 1, 2021 at 4:08 PM Ray Kinsella <mdr@ashroe.eu> wrote:
>
> Clarifying the ABI policy on the promotion of experimental APIs to stable.
> We have a fair number of APIs that have been experimental for more than
> 2 years. This policy amendment indicates that these APIs should be
> promoted or removed, or should at least form a conversation between the
> maintainer and original contributor.
>
> Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> ---
> v2: addressing comments on abi expiry from Tyler Retzlaff.
> v3: addressing typos in the git commit message
>
>  doc/guides/contributing/abi_policy.rst | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
> index 4ad87dbfed..840c295e5d 100644
> --- a/doc/guides/contributing/abi_policy.rst
> +++ b/doc/guides/contributing/abi_policy.rst
> @@ -26,9 +26,10 @@ General Guidelines
>     symbols is managed with :ref:`ABI Versioning <abi_versioning>`.
>  #. The removal of symbols is considered an :ref:`ABI breakage <abi_breakages>`,
>     once approved these will form part of the next ABI version.
> -#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may
> -   be changed or removed without prior notice, as they are not considered part
> -   of an ABI version.
> +#. Libraries or APIs marked as :ref:`experimental <experimental_apis>` may be
> +   changed or removed without prior notice, as they are not considered part of
> +   an ABI version. The :ref:`experimental <experimental_apis>` status of an API
> +   is not an indefinite state.
>  #. Updates to the :ref:`minimum hardware requirements <hw_rqmts>`, which drop
>     support for hardware which was previously supported, should be treated as an
>     ABI change.
> @@ -358,3 +359,18 @@ Libraries
>  Libraries marked as ``experimental`` are entirely not considered part of an ABI
>  version.
>  All functions in such libraries may be changed or removed without prior notice.
> +
> +Promotion to stable
> +~~~~~~~~~~~~~~~~~~~
> +
> +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
> +once a maintainer and/or the original contributor is satisfied that the API is
> +reasonably mature. In exceptional circumstances, should an API still be

Is this in line with the git commit message?
Why make an exceptional case? Why not make it stable after two years
or remove it?
My worry is that if we make an exception case, it will be difficult to
enumerate the exception cases.

> +classified as ``experimental`` after two years and is without any prospect of
> +becoming part of the stable API. The API will then become a candidate for
> +removal, to avoid the acculumation of abandoned symbols.
> +
> +Should an API's Binary Interface change during the two year period, usually due
> +to a direct change in the to API's signature. It is reasonable for the expiry
> +clock to reset. The promotion or removal of symbols will typically form part of
> +a conversation between the maintainer and the original contributor.
> --
> 2.26.2
>

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 3/3] bitrate: promote rte_stats_bitrate_free() to stable
  @ 2021-07-09 15:19  3% ` Kevin Traynor
  0 siblings, 0 replies; 200+ results
From: Kevin Traynor @ 2021-07-09 15:19 UTC (permalink / raw)
  To: dev; +Cc: mdr, Kevin Traynor, Hemant Agrawal

rte_stats_bitrate_free() has been in DPDK since 20.11.

Its signature is very basic as it just frees an opaque
data struct allocated in rte_stats_bitrate_create()
and returns void.

It's unlikely that such a basic signature would need to change
so might as well promote it to stable for the next major ABI.

Cc: Hemant Agrawal <hemant.agrawal@nxp.com>

Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/bitratestats/rte_bitrate.h | 3 ---
 lib/bitratestats/version.map   | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/bitratestats/rte_bitrate.h b/lib/bitratestats/rte_bitrate.h
index fcd1564ddc..e494389b95 100644
--- a/lib/bitratestats/rte_bitrate.h
+++ b/lib/bitratestats/rte_bitrate.h
@@ -8,6 +8,4 @@
 #include <stdint.h>
 
-#include <rte_compat.h>
-
 #ifdef __cplusplus
 extern "C" {
@@ -36,5 +34,4 @@ struct rte_stats_bitrates *rte_stats_bitrate_create(void);
  *   Pointer allocated by rte_stats_bitrate_create()
  */
-__rte_experimental
 void rte_stats_bitrate_free(struct rte_stats_bitrates *bitrate_data);
 
diff --git a/lib/bitratestats/version.map b/lib/bitratestats/version.map
index 152730bb4e..a14d21ebba 100644
--- a/lib/bitratestats/version.map
+++ b/lib/bitratestats/version.map
@@ -9,7 +9,7 @@ DPDK_21 {
 };
 
-EXPERIMENTAL {
+DPDK_22 {
 	global:
 
 	rte_stats_bitrate_free;
-};
+} DPDK_21;
-- 
2.31.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file
  2021-06-25  8:08  7% ` Ferruh Yigit
@ 2021-07-09 15:50  4%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-09 15:50 UTC (permalink / raw)
  To: Ray Kinsella
  Cc: dev, stephen, ktraynor, bruce.richardson, Ferruh Yigit, Neil Horman

25/06/2021 10:08, Ferruh Yigit:
> On 6/22/2021 4:50 PM, Ray Kinsella wrote:
> > Update to ABI MAINTAINERS.
> > 
> > Signed-off-by: Ray Kinsella <mdr@ashroe.eu>
> > ---
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> >  ABI Policy & Versioning
> >  M: Ray Kinsella <mdr@ashroe.eu>
> > -M: Neil Horman <nhorman@tuxdriver.com>
> >  F: lib/eal/include/rte_compat.h
> >  F: lib/eal/include/rte_function_versioning.h
> >  F: doc/guides/contributing/abi_*.rst
> 
> Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 
> Tried to reach out Neil multiple times for ABI issues without success.

Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied



^ permalink raw reply	[relevance 4%]

* [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison
  @ 2021-07-09 15:53  3%                 ` Anatoly Burakov
  2021-07-09 16:00  3%                   ` Anatoly Burakov
  2021-07-09 15:53  3%                 ` [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
    2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2021-07-09 15:53 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking current value against the expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..8d47637892 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx5_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's
    2021-07-09 15:53  3%                 ` [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
@ 2021-07-09 15:53  3%                 ` Anatoly Burakov
  2021-07-09 16:00  3%                   ` Anatoly Burakov
    2 siblings, 1 reply; 200+ results
From: Anatoly Burakov @ 2021-07-09 15:53 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: ciara.loftus, konstantin.ananyev

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like an RCU for this use case, but absent
a pressing need for thread safety, we'll go the easy way and just
mandate that the API's are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison
  2021-07-09 15:53  3%                 ` [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
@ 2021-07-09 16:00  3%                   ` Anatoly Burakov
  0 siblings, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:00 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we were
checking the current value against an expected value, and if they matched,
then the sleep was aborted. This is somewhat inflexible, because it only
allowed us to check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..8d47637892 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx5_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison
  @ 2021-07-09 16:08  3%                   ` Anatoly Burakov
  2021-07-09 16:08  3%                   ` [dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:08 UTC (permalink / raw)
  To: dev, Timothy McDaniel, Beilei Xing, Jingjing Wu, Qiming Yang,
	Qi Zhang, Haiyue Wang, Matan Azrad, Shahaf Shuler,
	Viacheslav Ovsiienko, Bruce Richardson, Konstantin Ananyev
  Cc: david.hunt, ciara.loftus

Previously, the semantics of power monitor were such that we checked the
current value against the expected value, and if they matched, the sleep
was aborted. This was somewhat inflexible, because it only allowed us to
check for a specific value in a specific way.

This commit replaces the comparison with a user callback mechanism, so
that any PMD (or other code) using `rte_power_monitor()` can define
their own comparison semantics and decision making on how to detect the
need to abort the entering of power optimized state.

Existing implementations are adjusted to follow the new semantics.
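The new contract can be sketched in plain C. The sketch below is illustrative only: `monitor_cond`, `match_callback` and `would_abort` are hypothetical stand-ins for `struct rte_power_monitor_cond`, a PMD callback such as `dlb2_monitor_callback`, and the pre-sleep check inside `rte_power_monitor()`.

```c
#include <stdint.h>
#include <stddef.h>

/* Mirrors RTE_POWER_MONITOR_OPAQUE_SZ from the patch. */
#define MONITOR_OPAQUE_SZ 4

typedef int (*monitor_clb_t)(const uint64_t val,
		const uint64_t opaque[MONITOR_OPAQUE_SZ]);

struct monitor_cond {
	volatile void *addr;                /* address to monitor */
	uint8_t size;                       /* read size: 1, 2, 4 or 8 */
	monitor_clb_t fn;                   /* abort-decision callback */
	uint64_t opaque[MONITOR_OPAQUE_SZ]; /* callback-private data */
};

#define CLB_MASK_IDX 0
#define CLB_VAL_IDX  1

/* Same decision logic as the dlb2 callback in the patch: abort (-1)
 * when the masked value matches the expected value, else proceed (0). */
int
match_callback(const uint64_t val, const uint64_t opaque[MONITOR_OPAQUE_SZ])
{
	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
}

/* What rte_power_monitor() now does before sleeping: let the callback
 * decide. Returns 1 if the sleep would be aborted, 0 if it would
 * proceed, and -1 for a NULL callback (rejected since v4). */
int
would_abort(const struct monitor_cond *pmc, uint64_t cur_value)
{
	if (pmc->fn == NULL)
		return -1;
	return pmc->fn(cur_value, pmc->opaque) != 0;
}
```

A PMD fills in `fn` and `opaque` in its `get_monitor_addr()` implementation, much as the i40e/ice/mlx5 hunks below do.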

Suggested-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
Acked-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---

Notes:
    v4:
    - Return error if callback is set to NULL
    - Replace raw number with a macro in monitor condition opaque data
    
    v2:
    - Use callback mechanism for more flexibility
    - Address feedback from Konstantin

 doc/guides/rel_notes/release_21_08.rst        |  2 ++
 drivers/event/dlb2/dlb2.c                     | 17 ++++++++--
 drivers/net/i40e/i40e_rxtx.c                  | 20 +++++++----
 drivers/net/iavf/iavf_rxtx.c                  | 20 +++++++----
 drivers/net/ice/ice_rxtx.c                    | 20 +++++++----
 drivers/net/ixgbe/ixgbe_rxtx.c                | 20 +++++++----
 drivers/net/mlx5/mlx5_rx.c                    | 17 ++++++++--
 .../include/generic/rte_power_intrinsics.h    | 33 +++++++++++++++----
 lib/eal/x86/rte_power_intrinsics.c            | 17 +++++-----
 9 files changed, 122 insertions(+), 44 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 476822b47f..912fb13b84 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -144,6 +144,8 @@ API Changes
 * eal: ``rte_strscpy`` sets ``rte_errno`` to ``E2BIG`` in case of string
   truncation.
 
+* eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
+
 
 ABI Changes
 -----------
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index eca183753f..252bbd8d5e 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -3154,6 +3154,16 @@ dlb2_port_credits_inc(struct dlb2_port *qm_port, int num)
 	}
 }
 
+#define CLB_MASK_IDX 0
+#define CLB_VAL_IDX 1
+static int
+dlb2_monitor_callback(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	/* abort if the value matches */
+	return (val & opaque[CLB_MASK_IDX]) == opaque[CLB_VAL_IDX] ? -1 : 0;
+}
+
 static inline int
 dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 		  struct dlb2_eventdev_port *ev_port,
@@ -3194,8 +3204,11 @@ dlb2_dequeue_wait(struct dlb2_eventdev *dlb2,
 			expected_value = 0;
 
 		pmc.addr = monitor_addr;
-		pmc.val = expected_value;
-		pmc.mask = qe_mask.raw_qe[1];
+		/* store expected value and comparison mask in opaque data */
+		pmc.opaque[CLB_VAL_IDX] = expected_value;
+		pmc.opaque[CLB_MASK_IDX] = qe_mask.raw_qe[1];
+		/* set up callback */
+		pmc.fn = dlb2_monitor_callback;
 		pmc.size = sizeof(uint64_t);
 
 		rte_power_monitor(&pmc, timeout + start_ticks);
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index e518409fe5..8489f91f1d 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -81,6 +81,18 @@
 #define I40E_TX_OFFLOAD_SIMPLE_NOTSUP_MASK \
 		(PKT_TX_OFFLOAD_MASK ^ I40E_TX_OFFLOAD_SIMPLE_SUP_MASK)
 
+static int
+i40e_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -93,12 +105,8 @@ i40e_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << I40E_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = i40e_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/iavf/iavf_rxtx.c b/drivers/net/iavf/iavf_rxtx.c
index f817fbc49b..d61b32fcee 100644
--- a/drivers/net/iavf/iavf_rxtx.c
+++ b/drivers/net/iavf/iavf_rxtx.c
@@ -57,6 +57,18 @@ iavf_proto_xtr_type_to_rxdid(uint8_t flex_type)
 				rxdid_map[flex_type] : IAVF_RXDID_COMMS_OVS_1;
 }
 
+static int
+iavf_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -69,12 +81,8 @@ iavf_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.qword1.status_error_len;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
-	pmc->mask = rte_cpu_to_le_64(1 << IAVF_RX_DESC_STATUS_DD_SHIFT);
+	/* comparison callback */
+	pmc->fn = iavf_monitor_callback;
 
 	/* registers are 64-bit */
 	pmc->size = sizeof(uint64_t);
diff --git a/drivers/net/ice/ice_rxtx.c b/drivers/net/ice/ice_rxtx.c
index 3f6e735984..5d7ab4f047 100644
--- a/drivers/net/ice/ice_rxtx.c
+++ b/drivers/net/ice/ice_rxtx.c
@@ -27,6 +27,18 @@ uint64_t rte_net_ice_dynflag_proto_xtr_ipv6_flow_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_tcp_mask;
 uint64_t rte_net_ice_dynflag_proto_xtr_ip_offset_mask;
 
+static int
+ice_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -39,12 +51,8 @@ ice_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.status_error0;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
-	pmc->mask = rte_cpu_to_le_16(1 << ICE_RX_FLEX_DESC_STATUS0_DD_S);
+	/* comparison callback */
+	pmc->fn = ice_monitor_callback;
 
 	/* register is 16-bit */
 	pmc->size = sizeof(uint16_t);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index d69f36e977..c814a28cb4 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -1369,6 +1369,18 @@ const uint32_t
 		RTE_PTYPE_INNER_L3_IPV4_EXT | RTE_PTYPE_INNER_L4_UDP,
 };
 
+static int
+ixgbe_monitor_callback(const uint64_t value,
+		const uint64_t arg[RTE_POWER_MONITOR_OPAQUE_SZ] __rte_unused)
+{
+	const uint64_t m = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/*
+	 * we expect the DD bit to be set to 1 if this descriptor was already
+	 * written to.
+	 */
+	return (value & m) == m ? -1 : 0;
+}
+
 int
 ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
@@ -1381,12 +1393,8 @@ ixgbe_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 	/* watch for changes in status bit */
 	pmc->addr = &rxdp->wb.upper.status_error;
 
-	/*
-	 * we expect the DD bit to be set to 1 if this descriptor was already
-	 * written to.
-	 */
-	pmc->val = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
-	pmc->mask = rte_cpu_to_le_32(IXGBE_RXDADV_STAT_DD);
+	/* comparison callback */
+	pmc->fn = ixgbe_monitor_callback;
 
 	/* the registers are 32-bit */
 	pmc->size = sizeof(uint32_t);
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 777a1d6e45..8d47637892 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -269,6 +269,18 @@ mlx5_rx_queue_count(struct rte_eth_dev *dev, uint16_t rx_queue_id)
 	return rx_queue_count(rxq);
 }
 
+#define CLB_VAL_IDX 0
+#define CLB_MSK_IDX 1
+static int
+mlx5_monitor_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+
+	return (value & m) == v ? -1 : 0;
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -282,8 +294,9 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		return -rte_errno;
 	}
 	pmc->addr = &cqe->op_own;
-	pmc->val =  !!idx;
-	pmc->mask = MLX5_CQE_OWNER_MASK;
+	pmc->opaque[CLB_VAL_IDX] = !!idx;
+	pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index dddca3d41c..c9aa52a86d 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -18,19 +18,38 @@
  * which are architecture-dependent.
  */
 
+/** Size of the opaque data in monitor condition */
+#define RTE_POWER_MONITOR_OPAQUE_SZ 4
+
+/**
+ * Callback definition for monitoring conditions. Callbacks with this signature
+ * will be used by `rte_power_monitor()` to check if the entering of power
+ * optimized state should be aborted.
+ *
+ * @param val
+ *   The value read from memory.
+ * @param opaque
+ *   Callback-specific data.
+ *
+ * @return
+ *   0 if entering of power optimized state should proceed
+ *   -1 if entering of power optimized state should be aborted
+ */
+typedef int (*rte_power_monitor_clb_t)(const uint64_t val,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ]);
 struct rte_power_monitor_cond {
 	volatile void *addr;  /**< Address to monitor for changes */
-	uint64_t val;         /**< If the `mask` is non-zero, location pointed
-	                       *   to by `addr` will be read and compared
-	                       *   against this value.
-	                       */
-	uint64_t mask;   /**< 64-bit mask to extract value read from `addr` */
-	uint8_t size;    /**< Data size (in bytes) that will be used to compare
-	                  *   expected value (`val`) with data read from the
+	uint8_t size;    /**< Data size (in bytes) that will be read from the
 	                  *   monitored memory location (`addr`). Can be 1, 2,
 	                  *   4, or 8. Supplying any other value will result in
 	                  *   an error.
 	                  */
+	rte_power_monitor_clb_t fn; /**< Callback to be used to check if
+	                             *   entering power optimized state should
+	                             *   be aborted.
+	                             */
+	uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ];
+	/**< Callback-specific data */
 };
 
 /**
diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c
index 39ea9fdecd..66fea28897 100644
--- a/lib/eal/x86/rte_power_intrinsics.c
+++ b/lib/eal/x86/rte_power_intrinsics.c
@@ -76,6 +76,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
 	const unsigned int lcore_id = rte_lcore_id();
 	struct power_wait_status *s;
+	uint64_t cur_value;
 
 	/* prevent user from running this instruction if it's not supported */
 	if (!wait_supported)
@@ -91,6 +92,9 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (__check_val_size(pmc->size) < 0)
 		return -EINVAL;
 
+	if (pmc->fn == NULL)
+		return -EINVAL;
+
 	s = &wait_status[lcore_id];
 
 	/* update sleep address */
@@ -110,16 +114,11 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	/* now that we've put this address into monitor, we can unlock */
 	rte_spinlock_unlock(&s->lock);
 
-	/* if we have a comparison mask, we might not need to sleep at all */
-	if (pmc->mask) {
-		const uint64_t cur_value = __get_umwait_val(
-				pmc->addr, pmc->size);
-		const uint64_t masked = cur_value & pmc->mask;
+	cur_value = __get_umwait_val(pmc->addr, pmc->size);
 
-		/* if the masked value is already matching, abort */
-		if (masked == pmc->val)
-			goto end;
-	}
+	/* check if callback indicates we should abort */
+	if (pmc->fn(cur_value, pmc->opaque) != 0)
+		goto end;
 
 	/* execute UMWAIT */
 	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;"
-- 
2.25.1



* [dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's
    2021-07-09 16:08  3%                   ` [dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
@ 2021-07-09 16:08  3%                   ` Anatoly Burakov
  1 sibling, 0 replies; 200+ results
From: Anatoly Burakov @ 2021-07-09 16:08 UTC (permalink / raw)
  To: dev, David Hunt; +Cc: konstantin.ananyev, ciara.loftus

Currently, we expect that only one callback can be active at any given
moment, for a particular queue configuration, which is relatively easy
to implement in a thread-safe way. However, we're about to add support
for multiple queues per lcore, which will greatly increase the
possibility of various race conditions.

We could have used something like RCU for this use case, but absent a
pressing need for thread safety we'll go the easy way and just
mandate that the APIs are to be called when all affected ports are
stopped, and document this limitation. This greatly simplifies the
`rte_power_monitor`-related code.
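The new stopped-queue precondition can be illustrated with a small self-contained sketch (the function names here are hypothetical, not DPDK API); it mirrors how `queue_stopped()` and its callers in this patch map the three possible states to return values.

```c
#include <errno.h>

enum { QSTATE_STOPPED = 0, QSTATE_STARTED = 1 };

/* Mirrors queue_stopped(): 1 = stopped, 0 = running, -1 = invalid
 * queue. In the real patch the state comes from
 * rte_eth_rx_queue_info_get() and RTE_ETH_QUEUE_STATE_STOPPED. */
int
queue_stopped_sketch(int queue_valid, int queue_state)
{
	if (!queue_valid)
		return -1;
	return queue_state == QSTATE_STOPPED;
}

/* Mirrors the guard at the top of the enable/disable paths. */
int
check_queue_sketch(int queue_valid, int queue_state)
{
	int ret = queue_stopped_sketch(queue_valid, queue_state);
	if (ret != 1)
		/* error means invalid queue, 0 means queue wasn't stopped */
		return ret < 0 ? -EINVAL : -EBUSY;
	return 0; /* safe to change the power management scheme */
}
```

With this guard in place, the callback can be freed on disable without any of the wakeup/spin dance that the old thread-safe code needed.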

Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Tested-by: David Hunt <david.hunt@intel.com>
---

Notes:
    v2:
    - Add check for stopped queue
    - Clarified doc message
    - Added release notes

 doc/guides/rel_notes/release_21_08.rst |   4 +
 lib/power/meson.build                  |   3 +
 lib/power/rte_power_pmd_mgmt.c         | 133 ++++++++++---------------
 lib/power/rte_power_pmd_mgmt.h         |   6 ++
 4 files changed, 66 insertions(+), 80 deletions(-)

diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
index 912fb13b84..b9a3caabf0 100644
--- a/doc/guides/rel_notes/release_21_08.rst
+++ b/doc/guides/rel_notes/release_21_08.rst
@@ -146,6 +146,10 @@ API Changes
 
 * eal: the ``rte_power_intrinsics`` API changed to use a callback mechanism.
 
+* rte_power: The experimental PMD power management API is no longer considered
+  to be thread safe; all Rx queues affected by the API will now need to be
+  stopped before making any changes to the power management scheme.
+
 
 ABI Changes
 -----------
diff --git a/lib/power/meson.build b/lib/power/meson.build
index 36e5a65874..bf937acde4 100644
--- a/lib/power/meson.build
+++ b/lib/power/meson.build
@@ -22,4 +22,7 @@ headers = files(
         'rte_power_pmd_mgmt.h',
         'rte_power_guest_channel.h',
 )
+if cc.has_argument('-Wno-cast-qual')
+    cflags += '-Wno-cast-qual'
+endif
 deps += ['timer', 'ethdev']
diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
index db03cbf420..9b95cf1794 100644
--- a/lib/power/rte_power_pmd_mgmt.c
+++ b/lib/power/rte_power_pmd_mgmt.c
@@ -40,8 +40,6 @@ struct pmd_queue_cfg {
 	/**< Callback mode for this queue */
 	const struct rte_eth_rxtx_callback *cur_cb;
 	/**< Callback instance */
-	volatile bool umwait_in_progress;
-	/**< are we currently sleeping? */
 	uint64_t empty_poll_stats;
 	/**< Number of empty polls */
 } __rte_cache_aligned;
@@ -92,30 +90,11 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
 			struct rte_power_monitor_cond pmc;
 			uint16_t ret;
 
-			/*
-			 * we might get a cancellation request while being
-			 * inside the callback, in which case the wakeup
-			 * wouldn't work because it would've arrived too early.
-			 *
-			 * to get around this, we notify the other thread that
-			 * we're sleeping, so that it can spin until we're done.
-			 * unsolicited wakeups are perfectly safe.
-			 */
-			q_conf->umwait_in_progress = true;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-			/* check if we need to cancel sleep */
-			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
-				/* use monitoring condition to sleep */
-				ret = rte_eth_get_monitor_addr(port_id, qidx,
-						&pmc);
-				if (ret == 0)
-					rte_power_monitor(&pmc, UINT64_MAX);
-			}
-			q_conf->umwait_in_progress = false;
-
-			rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+			/* use monitoring condition to sleep */
+			ret = rte_eth_get_monitor_addr(port_id, qidx,
+					&pmc);
+			if (ret == 0)
+				rte_power_monitor(&pmc, UINT64_MAX);
 		}
 	} else
 		q_conf->empty_poll_stats = 0;
@@ -177,12 +156,24 @@ clb_scale_freq(uint16_t port_id, uint16_t qidx,
 	return nb_rx;
 }
 
+static int
+queue_stopped(const uint16_t port_id, const uint16_t queue_id)
+{
+	struct rte_eth_rxq_info qinfo;
+
+	if (rte_eth_rx_queue_info_get(port_id, queue_id, &qinfo) < 0)
+		return -1;
+
+	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
+}
+
 int
 rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
 {
 	struct pmd_queue_cfg *queue_cfg;
 	struct rte_eth_dev_info info;
+	rte_rx_callback_fn clb;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
@@ -203,6 +194,14 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		goto end;
 	}
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		ret = ret < 0 ? -EINVAL : -EBUSY;
+		goto end;
+	}
+
 	queue_cfg = &port_cfg[port_id][queue_id];
 
 	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
@@ -232,17 +231,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->umwait_in_progress = false;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* ensure we update our state before callback starts */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_umwait, NULL);
+		clb = clb_umwait;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_SCALE:
@@ -269,16 +258,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 			ret = -ENOTSUP;
 			goto end;
 		}
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
-				queue_id, clb_scale_freq, NULL);
+		clb = clb_scale_freq;
 		break;
 	}
 	case RTE_POWER_MGMT_TYPE_PAUSE:
@@ -286,18 +266,21 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
 		if (global_data.tsc_per_us == 0)
 			calc_tsc();
 
-		/* initialize data before enabling the callback */
-		queue_cfg->empty_poll_stats = 0;
-		queue_cfg->cb_mode = mode;
-		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
-
-		/* this is not necessary here, but do it anyway */
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
-		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
-				clb_pause, NULL);
+		clb = clb_pause;
 		break;
+	default:
+		RTE_LOG(DEBUG, POWER, "Invalid power management type\n");
+		ret = -EINVAL;
+		goto end;
 	}
+
+	/* initialize data before enabling the callback */
+	queue_cfg->empty_poll_stats = 0;
+	queue_cfg->cb_mode = mode;
+	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
+	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
+			clb, NULL);
+
 	ret = 0;
 end:
 	return ret;
@@ -308,12 +291,20 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		uint16_t port_id, uint16_t queue_id)
 {
 	struct pmd_queue_cfg *queue_cfg;
+	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
 
 	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
 		return -EINVAL;
 
+	/* check if the queue is stopped */
+	ret = queue_stopped(port_id, queue_id);
+	if (ret != 1) {
+		/* error means invalid queue, 0 means queue wasn't stopped */
+		return ret < 0 ? -EINVAL : -EBUSY;
+	}
+
 	/* no need to check queue id as wrong queue id would not be enabled */
 	queue_cfg = &port_cfg[port_id][queue_id];
 
@@ -323,27 +314,8 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 	/* stop any callbacks from progressing */
 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
 
-	/* ensure we update our state before continuing */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
-
 	switch (queue_cfg->cb_mode) {
-	case RTE_POWER_MGMT_TYPE_MONITOR:
-	{
-		bool exit = false;
-		do {
-			/*
-			 * we may request cancellation while the other thread
-			 * has just entered the callback but hasn't started
-			 * sleeping yet, so keep waking it up until we know it's
-			 * done sleeping.
-			 */
-			if (queue_cfg->umwait_in_progress)
-				rte_power_monitor_wakeup(lcore_id);
-			else
-				exit = true;
-		} while (!exit);
-	}
-	/* fall-through */
+	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
 	case RTE_POWER_MGMT_TYPE_PAUSE:
 		rte_eth_remove_rx_callback(port_id, queue_id,
 				queue_cfg->cur_cb);
@@ -356,10 +328,11 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
 		break;
 	}
 	/*
-	 * we don't free the RX callback here because it is unsafe to do so
-	 * unless we know for a fact that all data plane threads have stopped.
+	 * the API doc mandates that the user stops all processing on affected
+	 * ports before calling any of these API's, so we can assume that the
+	 * callbacks can be freed. we're intentionally casting away const-ness.
 	 */
-	queue_cfg->cur_cb = NULL;
+	rte_free((void *)queue_cfg->cur_cb);
 
 	return 0;
 }
diff --git a/lib/power/rte_power_pmd_mgmt.h b/lib/power/rte_power_pmd_mgmt.h
index 7a0ac24625..444e7b8a66 100644
--- a/lib/power/rte_power_pmd_mgmt.h
+++ b/lib/power/rte_power_pmd_mgmt.h
@@ -43,6 +43,9 @@ enum rte_power_pmd_mgmt_type {
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue will be polled from.
  * @param port_id
@@ -69,6 +72,9 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id,
  *
  * @note This function is not thread-safe.
  *
+ * @warning This function must be called when all affected Ethernet queues are
+ *   stopped and no Rx/Tx is in progress!
+ *
  * @param lcore_id
  *   The lcore the Rx queue is polled from.
  * @param port_id
-- 
2.25.1



* Re: [dpdk-dev] [PATCH v4 0/3] Use WFE for spinlock and ring
  2021-07-07  5:48  3% ` Ruifeng Wang
@ 2021-07-09 18:39  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-09 18:39 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: dev, david.marchand, bruce.richardson, jerinj, nd,
	honnappa.nagarahalli, ruifeng.wang

07/07/2021 07:48, Ruifeng Wang:
> The rte_wait_until_equal_xxx APIs abstract the functionality of 'polling
> for a memory location to become equal to a given value'[1].
> 
> Use the API for the rte spinlock and ring implementations.
> With the wait until equal APIs being stable, changes will not impact ABI.
> 
> Gavin Hu (1):
>   spinlock: use wfe to reduce contention on aarch64
> 
> Ruifeng Wang (2):
>   ring: use wfe to wait for ring tail update on aarch64
>   build: add option to enable wait until equal

As discussed in the thread, patches 1 & 2 are applied.
The patch 3 (meson option) is rejected.




* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-09  6:16  0%   ` Jerin Jacob
@ 2021-07-09 19:15  3%     ` Tyler Retzlaff
  2021-07-11  7:22  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2021-07-09 19:15 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: Ray Kinsella, dpdk-dev, Richardson, Bruce, John McNamara,
	Ferruh Yigit, Thomas Monjalon, David Marchand, Stephen Hemminger

On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote:
> > +
> > +Promotion to stable
> > +~~~~~~~~~~~~~~~~~~~
> > +
> > +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
> > +once a maintainer and/or the original contributor is satisfied that the API is
> > +reasonably mature. In exceptional circumstances, should an API still be
> 
> Is this in line with the git commit message?
> Why make an exceptional case? Why not make it stable after two years,
> or remove it?
> My worry is that if we make an exceptional case, it will be difficult to
> enumerate the exception cases.

I think the intent here is to indicate that an API/ABI doesn't just
automatically become stable after a period of time. There also has to
be an evaluation by the maintainer / community before making it stable.

So I guess the timer is something that should force that evaluation. As
part of that evaluation, one would imagine there is justification for
keeping the API experimental for longer and, if so, a rationale as to
why.


* Re: [dpdk-dev] [PATCH v3] doc: policy on the promotion of experimental APIs
  2021-07-09 19:15  3%     ` Tyler Retzlaff
@ 2021-07-11  7:22  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-11  7:22 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Ray Kinsella, dpdk-dev, Richardson, Bruce, John McNamara,
	Ferruh Yigit, Thomas Monjalon, David Marchand, Stephen Hemminger

On Sat, Jul 10, 2021 at 12:46 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Fri, Jul 09, 2021 at 11:46:54AM +0530, Jerin Jacob wrote:
> > > +
> > > +Promotion to stable
> > > +~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Ordinarily APIs marked as ``experimental`` will be promoted to the stable ABI
> > > +once a maintainer and/or the original contributor is satisfied that the API is
> > > +reasonably mature. In exceptional circumstances, should an API still be
> >
> > Is this in line with the git commit message?
> > Why make an exceptional case? Why not make it stable after two years,
> > or remove it?
> > My worry is that if we make an exceptional case, it will be difficult to
> > enumerate the exception cases.
>
> I think the intent here is to indicate that an API/ABI doesn't just
> automatically become stable after a period of time. There also has to
> be an evaluation by the maintainer / community before making it stable.
>
> So I guess the timer is something that should force that evaluation. As
> part of that evaluation, one would imagine there is justification for
> keeping the API experimental for longer and, if so, a rationale as to
> why.

I think we need to have a deadline: probably a one-year timer for the
evaluation, and two years at most for the decision to make it stable or
remove it.


* [dpdk-dev] [PATCH v1] doc: update atomic operation deprecation
@ 2021-07-12  8:02  4% Joyce Kong
  0 siblings, 0 replies; 200+ results
From: Joyce Kong @ 2021-07-12  8:02 UTC (permalink / raw)
  To: thomas, stephen, honnappa.nagarahalli, ruifeng.wang, mdr; +Cc: dev, nd, stable

Update the incorrect description of atomic operations and the provided
wrappers in the deprecation doc[1].

[1]https://mails.dpdk.org/archives/dev/2021-July/213333.html
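As a hedged sketch of what the corrected notices recommend (the helper names below are made up for illustration): replace the order-less `rte_atomicNN_xxx` and `rte_smp_*mb` calls with GCC atomic built-ins carrying an explicit memory order.

```c
#include <stdint.h>

/* Instead of rte_atomic32_add() with implicit full-barrier semantics,
 * use a built-in with the weakest order correct for the use case. */
uint32_t
relaxed_increment(uint32_t *cnt)
{
	return __atomic_add_fetch(cnt, 1, __ATOMIC_RELAXED);
}

/* Instead of rte_smp_wmb()/rte_smp_rmb() full barriers, pair a
 * release store with an acquire load on the shared flag. */
void
publish(uint64_t *data, uint32_t *flag, uint64_t value)
{
	*data = value; /* plain store, ordered by the release below */
	__atomic_store_n(flag, 1, __ATOMIC_RELEASE);
}

uint64_t
consume(const uint64_t *data, uint32_t *flag)
{
	while (__atomic_load_n(flag, __ATOMIC_ACQUIRE) == 0)
		; /* spin until published */
	return *data;
}
```

Where a standalone full fence is genuinely needed, DPDK code uses the `rte_atomic_thread_fence` wrapper named in the notice rather than calling `__atomic_thread_fence` directly.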

Fixes: 7518c5c4ae6a ("doc: announce adoption of C11 atomic operations semantics")
Cc: stable@dpdk.org

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 doc/guides/rel_notes/deprecation.rst | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 9584d6bfd7..4142315842 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -19,16 +19,16 @@ Deprecation Notices
 
 * rte_atomicNN_xxx: These APIs do not take memory order parameter. This does
   not allow for writing optimized code for all the CPU architectures supported
-  in DPDK. DPDK will adopt C11 atomic operations semantics and provide wrappers
-  using C11 atomic built-ins. These wrappers must be used for patches that
-  need to be merged in 20.08 onwards. This change will not introduce any
-  performance degradation.
+  in DPDK. DPDK has adopted atomic operations semantics. GCC atomic built-ins
+  must be used for patches that need to be merged in 20.08 onwards. This change
+  will not introduce any performance degradation.
 
 * rte_smp_*mb: These APIs provide full barrier functionality. However, many
-  use cases do not require full barriers. To support such use cases, DPDK will
-  adopt C11 barrier semantics and provide wrappers using C11 atomic built-ins.
-  These wrappers must be used for patches that need to be merged in 20.08
-  onwards. This change will not introduce any performance degradation.
+  use cases do not require full barriers. To support such use cases, DPDK has
+  adopted atomic barrier semantics. GCC atomic built-ins and a new wrapper
+  ``rte_atomic_thread_fence`` instead of ``__atomic_thread_fence`` must be
+  used for patches that need to be merged in 20.08 onwards. This change will
+  not introduce any performance degradation.
 
 * lib: will fix extending some enum/define breaking the ABI. There are multiple
   samples in DPDK that enum/define terminated with a ``.*MAX.*`` value which is
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
  @ 2021-07-12 12:05  3%   ` Bruce Richardson
  2021-07-12 15:50  3%   ` Bruce Richardson
  2021-07-13 14:19  3%   ` Ananyev, Konstantin
  2 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-12 12:05 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: thomas, ferruh.yigit, jerinj, jerinjacobk, dev, mb, nipun.gupta,
	hemant.agrawal, maxime.coquelin, honnappa.nagarahalli,
	david.marchand, sburla, pkapoor, konstantin.ananyev, liangma

On Sun, Jul 11, 2021 at 05:25:56PM +0800, Chengwen Feng wrote:
> This patch introduce 'dmadevice' which is a generic type of DMA
> device.
> 
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

Thanks for this V2.
Some initial (mostly minor) comments on the meson.build and dmadev .c file
below. I'll review the headers in a separate email.

/Bruce

> ---
>  MAINTAINERS                  |    4 +
>  config/rte_config.h          |    3 +
>  lib/dmadev/meson.build       |    6 +
>  lib/dmadev/rte_dmadev.c      |  560 +++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 1030 ++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h |  159 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |   72 +++
>  lib/dmadev/version.map       |   40 ++
>  lib/meson.build              |    1 +
>  9 files changed, 1875 insertions(+)
>  create mode 100644 lib/dmadev/meson.build
>  create mode 100644 lib/dmadev/rte_dmadev.c
>  create mode 100644 lib/dmadev/rte_dmadev.h
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
>  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
>  create mode 100644 lib/dmadev/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4347555..0595239 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -496,6 +496,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>  
> +DMA device API - EXPERIMENTAL
> +M: Chengwen Feng <fengchengwen@huawei.com>
> +F: lib/dmadev/
> +
>  
>  Memory Pool Drivers
>  -------------------
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c..331a431 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -81,6 +81,9 @@
>  /* rawdev defines */
>  #define RTE_RAWDEV_MAX_DEVS 64
>  
> +/* dmadev defines */
> +#define RTE_DMADEV_MAX_DEVS 64
> +
>  /* ip_fragmentation defines */
>  #define RTE_LIBRTE_IP_FRAG_MAX_FRAG 4
>  #undef RTE_LIBRTE_IP_FRAG_TBL_STAT
> diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
> new file mode 100644
> index 0000000..c918dae
> --- /dev/null
> +++ b/lib/dmadev/meson.build
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 HiSilicon Limited.
> +
> +sources = files('rte_dmadev.c')
> +headers = files('rte_dmadev.h', 'rte_dmadev_pmd.h')

If rte_dmadev_pmd.h is only for PMD use, then it should be in
"driver_sdk_headers".

> +indirect_headers += files('rte_dmadev_core.h')
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> new file mode 100644
> index 0000000..8a29abb
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,560 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#include <ctype.h>
> +#include <inttypes.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <rte_debug.h>
> +#include <rte_dev.h>
> +#include <rte_eal.h>
> +#include <rte_errno.h>
> +#include <rte_lcore.h>
> +#include <rte_log.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_string_fns.h>
> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +RTE_LOG_REGISTER(rte_dmadev_logtype, lib.dmadev, INFO);
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
> +
> +static const char *MZ_RTE_DMADEV_DATA = "rte_dmadev_data";
> +/* Shared memory between primary and secondary processes. */
> +static struct {
> +	struct rte_dmadev_data data[RTE_DMADEV_MAX_DEVS];
> +} *dmadev_shared_data;
> +
> +static int
> +dmadev_check_name(const char *name)
> +{
> +	size_t name_len;
> +
> +	if (name == NULL) {
> +		RTE_DMADEV_LOG(ERR, "Name can't be NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	name_len = strnlen(name, RTE_DMADEV_NAME_MAX_LEN);
> +	if (name_len == 0) {
> +		RTE_DMADEV_LOG(ERR, "Zero length DMA device name\n");
> +		return -EINVAL;
> +	}
> +	if (name_len >= RTE_DMADEV_NAME_MAX_LEN) {
> +		RTE_DMADEV_LOG(ERR, "DMA device name is too long\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static uint16_t
> +dmadev_find_free_dev(void)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (dmadev_shared_data->data[i].dev_name[0] == '\0') {
> +			RTE_ASSERT(rte_dmadevices[i].attached == 0);
> +			return i;
> +		}
> +	}
> +
> +	return RTE_DMADEV_MAX_DEVS;
> +}
> +
> +static struct rte_dmadev*
> +dmadev_allocated(const char *name)

The name implies a boolean lookup for whether a particular dmadev has been
allocated or not. Since this returns a pointer, I think a name like
"dmadev_find" or "dmadev_get" would be more appropriate.

> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if ((rte_dmadevices[i].attached == 1) &&
> +		    (!strcmp(name, rte_dmadevices[i].data->dev_name)))
> +			return &rte_dmadevices[i];
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +dmadev_shared_data_prepare(void)
> +{
> +	const struct rte_memzone *mz;
> +
> +	if (dmadev_shared_data == NULL) {
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +			/* Allocate port data and ownership shared memory. */
> +			mz = rte_memzone_reserve(MZ_RTE_DMADEV_DATA,
> +					 sizeof(*dmadev_shared_data),
> +					 rte_socket_id(), 0);
> +		} else {
> +			mz = rte_memzone_lookup(MZ_RTE_DMADEV_DATA);
> +		}
> +		if (mz == NULL)
> +			return -ENOMEM;
> +
> +		dmadev_shared_data = mz->addr;
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +			memset(dmadev_shared_data->data, 0,
> +			       sizeof(dmadev_shared_data->data));
> +	}
> +
> +	return 0;
> +}
> +
> +static struct rte_dmadev *
> +dmadev_allocate(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +	uint16_t dev_id;
> +
> +	dev = dmadev_allocated(name);
> +	if (dev != NULL) {
> +		RTE_DMADEV_LOG(ERR, "DMA device already allocated\n");
> +		return NULL;
> +	}
> +
> +	dev_id = dmadev_find_free_dev();
> +	if (dev_id == RTE_DMADEV_MAX_DEVS) {
> +		RTE_DMADEV_LOG(ERR, "Reached maximum number of DMA devices\n");
> +		return NULL;
> +	}
> +
> +	if (dmadev_shared_data_prepare() != 0) {
> +		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
> +		return NULL;
> +	}
> +
> +	dev = &rte_dmadevices[dev_id];
> +	dev->data = &dmadev_shared_data->data[dev_id];
> +	dev->data->dev_id = dev_id;
> +	strlcpy(dev->data->dev_name, name, sizeof(dev->data->dev_name));
> +
> +	return dev;
> +}
> +
> +static struct rte_dmadev *
> +dmadev_attach_secondary(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +	uint16_t i;
> +
> +	if (dmadev_shared_data_prepare() != 0) {
> +		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
> +		return NULL;
> +	}
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (!strcmp(dmadev_shared_data->data[i].dev_name, name))
> +			break;
> +	}
> +	if (i == RTE_DMADEV_MAX_DEVS) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %s is not driven by the primary process\n",
> +			name);
> +		return NULL;
> +	}
> +
> +	dev = &rte_dmadevices[i];
> +	dev->data = &dmadev_shared_data->data[i];
> +	RTE_ASSERT(dev->data->dev_id == i);
> +
> +	return dev;
> +}
> +
> +struct rte_dmadev *
> +rte_dmadev_pmd_allocate(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +
> +	if (dmadev_check_name(name) != 0)
> +		return NULL;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		dev = dmadev_allocate(name);
> +	else
> +		dev = dmadev_attach_secondary(name);
> +
> +	if (dev == NULL)
> +		return NULL;
> +	dev->attached = 1;
> +
> +	return dev;
> +}
> +
> +int
> +rte_dmadev_pmd_release(struct rte_dmadev *dev)
> +{
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	if (dev->attached == 0)
> +		return 0;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		rte_free(dev->data->dev_private);
> +		memset(dev->data, 0, sizeof(struct rte_dmadev_data));
> +	}
> +
> +	memset(dev, 0, sizeof(struct rte_dmadev));
> +	dev->attached = 0;
> +
> +	return 0;
> +}
> +
> +struct rte_dmadev *
> +rte_dmadev_get_device_by_name(const char *name)
> +{
> +	if (dmadev_check_name(name) != 0)
> +		return NULL;
> +	return dmadev_allocated(name);
> +}
> +
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id)
> +{
> +	if (dev_id >= RTE_DMADEV_MAX_DEVS ||
> +	    rte_dmadevices[dev_id].attached == 0)
> +		return false;
> +	return true;
> +}
> +
> +uint16_t
> +rte_dmadev_count(void)
> +{
> +	uint16_t count = 0;
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (rte_dmadevices[i].attached == 1)
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(dev_info, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_info_get, -ENOTSUP);
> +	memset(dev_info, 0, sizeof(struct rte_dmadev_info));
> +	ret = (*dev->dev_ops->dev_info_get)(dev, dev_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	dev_info->device = dev->device;
> +
> +	return 0;
> +}

Should the info_get function (and the related info structure) not also
include the parameters passed into the configure function? That way, the
user can query a previously set-up configuration. This should be done at
the dmadev level, rather than the driver level, since I see the parameters
are already being saved in configure below.

Also, for ABI purposes, I would strongly suggest passing "sizeof(dev_info)"
to the driver in the "dev_info_get" call. When dev_info changes, we can
version rte_dmadev_info_get, but can't version the functions that it calls
in turn. When we add a new field to the struct, the driver functions that
choose to use that new field can check the size of the struct passed to
determine if it's safe to write that new field or not. [So long as field is
added at the end, driver functions not updated for the new field, need no
changes]

> +
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(dev_conf, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (dev_conf->max_vchans > info.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u configure too many vchans\n", dev_id);

We allow up to 100 characters per line for DPDK code, so these don't need
to be wrapped so aggressively.

> +		return -EINVAL;
> +	}
> +	if (dev_conf->enable_mt_vchan &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MT_VCHAN)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support MT-safe vchan\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (dev_conf->enable_mt_multi_vchan &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support MT-safe multiple vchan\n",
> +			dev_id);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started != 0) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u must be stopped to allow configuration\n",
> +			dev_id);
> +		return -EBUSY;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -ENOTSUP);
> +	ret = (*dev->dev_ops->dev_configure)(dev, dev_conf);
> +	if (ret == 0)
> +		memcpy(&dev->data->dev_conf, dev_conf, sizeof(*dev_conf));
> +
> +	return ret;
> +}
> +
> +int
> +rte_dmadev_start(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (dev->data->dev_started != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u already started\n", dev_id);

Maybe make this a warning rather than an error.

> +		return 0;
> +	}
> +
> +	if (dev->dev_ops->dev_start == NULL)
> +		goto mark_started;
> +
> +	ret = (*dev->dev_ops->dev_start)(dev);
> +	if (ret != 0)
> +		return ret;
> +
> +mark_started:
> +	dev->data->dev_started = 1;
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_stop(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (dev->data->dev_started == 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u already stopped\n", dev_id);

As above, suggest just warning rather than error.

> +		return 0;
> +	}
> +
> +	if (dev->dev_ops->dev_stop == NULL)
> +		goto mark_stopped;
> +
> +	ret = (*dev->dev_ops->dev_stop)(dev);
> +	if (ret != 0)
> +		return ret;
> +
> +mark_stopped:
> +	dev->data->dev_started = 0;
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_close(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	/* Device must be stopped before it can be closed */
> +	if (dev->data->dev_started == 1) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u must be stopped before closing\n", dev_id);
> +		return -EBUSY;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_close, -ENOTSUP);
> +	return (*dev->dev_ops->dev_close)(dev);
> +}
> +
> +int
> +rte_dmadev_reset(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_reset, -ENOTSUP);
> +	/* Reset is not dependent on state of the device */
> +	return (*dev->dev_ops->dev_reset)(dev);
> +}

I would tend to agree with the query as to whether this is needed or not.
Can we perhaps remove for now, and add it back later if it does prove to be
needed. The less code to review and work with for the first version, the
better IMHO. :-)

> +
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +		       const struct rte_dmadev_vchan_conf *conf)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(conf, -EINVAL);

This is confusing, because you are actually doing a parameter check using a
macro named for checking a function. Better to explicitly just check conf
for null.

> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction == 0 ||
> +	    conf->direction & ~RTE_DMA_TRANSFER_DIR_ALL) {
> +		RTE_DMADEV_LOG(ERR, "Device %u direction invalid!\n", dev_id);
> +		return -EINVAL;
> +	}

I wonder whether we should allow direction == 0 to mean either all bits
set, or all supported bits set?

> +	if (conf->direction & RTE_DMA_MEM_TO_MEM &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_MEM)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support mem2mem transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_MEM_TO_DEV &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_DEV)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support mem2dev transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_DEV_TO_MEM &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_MEM)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support dev2mem transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_DEV_TO_DEV &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_DEV)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support dev2dev transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->nb_desc < info.min_desc || conf->nb_desc > info.max_desc) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u number of descriptors invalid\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_setup, -ENOTSUP);
> +	return (*dev->dev_ops->vchan_setup)(dev, conf);
> +}
> +
> +int
> +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_release, -ENOTSUP);
> +	return (*dev->dev_ops->vchan_release)(dev, vchan);
> +}
> +
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, int vchan, struct rte_dmadev_stats *stats)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(stats, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan < -1 || vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
> +	return (*dev->dev_ops->stats_get)(dev, vchan, stats);
> +}
> +
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, int vchan)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan < -1 || vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_reset, -ENOTSUP);
> +	return (*dev->dev_ops->stats_reset)(dev, vchan);
> +}
> +
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(f, -EINVAL);
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	fprintf(f, "DMA Dev %u, '%s' [%s]\n",
> +		dev->data->dev_id,
> +		dev->data->dev_name,
> +		dev->data->dev_started ? "started" : "stopped");
> +	fprintf(f, "  dev_capa: 0x%" PRIx64 "\n", info.dev_capa);
> +	fprintf(f, "  max_vchans_supported: %u\n", info.max_vchans);
> +	fprintf(f, "  max_vchans_configured: %u\n", info.nb_vchans);
> +	fprintf(f, "  MT-safe-configured: vchans: %u multi-vchans: %u\n",
> +		dev->data->dev_conf.enable_mt_vchan,
> +		dev->data->dev_conf.enable_mt_multi_vchan);
> +
> +	if (dev->dev_ops->dev_dump != NULL)
> +		return (*dev->dev_ops->dev_dump)(dev, f);
> +
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_selftest(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_selftest, -ENOTSUP);
> +	return (*dev->dev_ops->dev_selftest)(dev_id);
> +}

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
    2021-07-12 12:05  3%   ` Bruce Richardson
@ 2021-07-12 15:50  3%   ` Bruce Richardson
  2021-07-13  9:07  0%     ` Jerin Jacob
  2021-07-13 14:19  3%   ` Ananyev, Konstantin
  2 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-07-12 15:50 UTC (permalink / raw)
  To: Chengwen Feng
  Cc: thomas, ferruh.yigit, jerinj, jerinjacobk, dev, mb, nipun.gupta,
	hemant.agrawal, maxime.coquelin, honnappa.nagarahalli,
	david.marchand, sburla, pkapoor, konstantin.ananyev, liangma

On Sun, Jul 11, 2021 at 05:25:56PM +0800, Chengwen Feng wrote:
> This patch introduce 'dmadevice' which is a generic type of DMA
> device.
> 
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

Hi again,

some further review comments inline.

/Bruce

> ---
>  MAINTAINERS                  |    4 +
>  config/rte_config.h          |    3 +
>  lib/dmadev/meson.build       |    6 +
>  lib/dmadev/rte_dmadev.c      |  560 +++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 1030 ++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h |  159 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |   72 +++
>  lib/dmadev/version.map       |   40 ++
>  lib/meson.build              |    1 +

<snip>

> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 0000000..8779512
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,1030 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + * Copyright(c) 2021 Marvell International Ltd.
> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The DMA framework is built on the following model:
> + *
> + *     ---------------   ---------------       ---------------
> + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> + *     | channel     |   | channel     |       | channel     |
> + *     ---------------   ---------------       ---------------
> + *            |                |                      |
> + *            ------------------                      |
> + *                     |                              |
> + *               ------------                    ------------
> + *               |  dmadev  |                    |  dmadev  |
> + *               ------------                    ------------
> + *                     |                              |
> + *            ------------------               ------------------
> + *            | HW-DMA-channel |               | HW-DMA-channel |
> + *            ------------------               ------------------
> + *                     |                              |
> + *                     --------------------------------
> + *                                     |
> + *                           ---------------------
> + *                           | HW-DMA-Controller |
> + *                           ---------------------
> + *
> + * The DMA controller could have multilpe HW-DMA-channels (aka. HW-DMA-queues),
> + * each HW-DMA-channel should be represented by a dmadev.
> + *
> + * The dmadev could create multiple virtual DMA channel, each virtual DMA
> + * channel represents a different transfer context. The DMA operation request
> + * must be submitted to the virtual DMA channel.
> + * E.G. Application could create virtual DMA channel 0 for mem-to-mem transfer
> + *      scenario, and create virtual DMA channel 1 for mem-to-dev transfer
> + *      scenario.
> + *
> + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> + * PCI/SoC device probing phase performed at EAL initialization time. And could
> + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> + * phase.
> + *
> + * We use 'uint16_t dev_id' as the device identifier of a dmadev, and
> + * 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> + *
> + * The functions exported by the dmadev API to setup a device designated by its
> + * device identifier must be invoked in the following order:
> + *     - rte_dmadev_configure()
> + *     - rte_dmadev_vchan_setup()
> + *     - rte_dmadev_start()
> + *
> + * Then, the application can invoke dataplane APIs to process jobs.
> + *
> + * If the application wants to change the configuration (i.e. call
> + * rte_dmadev_configure()), it must call rte_dmadev_stop() first to stop the
> + * device and then do the reconfiguration before calling rte_dmadev_start()
> + * again. The dataplane APIs should not be invoked when the device is stopped.
> + *
> + * Finally, an application can close a dmadev by invoking the
> + * rte_dmadev_close() function.
> + *
> + * The dataplane APIs include two parts:
> + *   a) The first part is the submission of operation requests:
> + *        - rte_dmadev_copy()
> + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> + *        - rte_dmadev_fill()
> + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> + *        - rte_dmadev_perform() - issue doorbell to hardware
> + *      These APIs could work with different virtual DMA channels which have
> + *      different contexts.
> + *      The first four APIs are used to submit the operation request to the
> + *      virtual DMA channel, if the submission is successful, a uint16_t
> + *      ring_idx is returned, otherwise a negative number is returned.
> + *   b) The second part is to obtain the result of requests:
> + *        - rte_dmadev_completed()
> + *            - return the number of operation requests completed successfully.
> + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.

Please rename this to "completed_status" to allow the return of information
other than just errors. As I suggested before, I think this should also be
usable as a slower version of "completed" even in the case where there are
no errors, in that it returns status information for each and every job
rather than just returning as soon as it hits a failure.

> + *
> + * About the ring_idx which rte_dmadev_copy/copy_sg/fill/fill_sg()
> + * returned, the rules are as follows:
> + *   a) ring_idx for each virtual DMA channel are independent.
> + *   b) For a virtual DMA channel, the ring_idx is monotonically incremented,
> + *      when it reach UINT16_MAX, it wraps back to zero.

Based on other feedback, I suggest we put in the detail here that: "This
index can be used by applications to track per-job metadata in an
application-defined circular ring, where the ring is a power-of-2 size, and
the indexes are masked appropriately."

> + *   c) The initial ring_idx of a virtual DMA channel is zero, after the device
> + *      is stopped or reset, the ring_idx needs to be reset to zero.
> + *   Example:
> + *      step-1: start one dmadev
> + *      step-2: enqueue a copy operation, the ring_idx return is 0
> + *      step-3: enqueue a copy operation again, the ring_idx return is 1
> + *      ...
> + *      step-101: stop the dmadev
> + *      step-102: start the dmadev
> + *      step-103: enqueue a copy operation, the cookie return is 0
> + *      ...
> + *      step-x+0: enqueue a fill operation, the ring_idx return is 65535
> + *      step-x+1: enqueue a copy operation, the ring_idx return is 0
> + *      ...
> + *
> + * By default, all the non-dataplane functions of the dmadev API exported by a
> + * PMD are lock-free functions which assume to not be invoked in parallel on
> + * different logical cores to work on the same target object.
> + *
> + * The dataplane functions of the dmadev API exported by a PMD can be MT-safe
> + * only when supported by the driver, generally, the driver will reports two
> + * capabilities:
> + *   a) Whether to support MT-safe for the submit/completion API of the same
> + *      virtual DMA channel.
> + *      E.G. one thread do submit operation, another thread do completion
> + *           operation.
> + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VCHAN.
> + *      If driver don't support it, it's up to the application to guarantee
> + *      MT-safe.
> + *   b) Whether to support MT-safe for different virtual DMA channels.
> + *      E.G. one thread do operation on virtual DMA channel 0, another thread
> + *           do operation on virtual DMA channel 1.
> + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> + *      If driver don't support it, it's up to the application to guarantee
> + *      MT-safe.
> + *
> + */

Just to check - do we have hardware that currently supports these
capabilities? For Intel HW, we will only support one virtual channel per
device without any MT-safety guarantees, so won't be setting either of
these flags. If any of these flags are unused in all planned drivers, we
should drop them from the spec until they prove necessary. Ideally,
everything in the dmadev definition should be testable, and features unused
by anyone obviously will be untested.

> +
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +#include <rte_errno.h>
> +#include <rte_memory.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
> +
> +extern int rte_dmadev_logtype;
> +
> +#define RTE_DMADEV_LOG(level, ...) \
> +	rte_log(RTE_LOG_ ## level, rte_dmadev_logtype, "" __VA_ARGS__)
> +
> +/* Macros to check for valid port */
> +#define RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, retval) do { \
> +	if (!rte_dmadev_is_valid_dev(dev_id)) { \
> +		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> +		return retval; \
> +	} \
> +} while (0)
> +
> +#define RTE_DMADEV_VALID_DEV_ID_OR_RET(dev_id) do { \
> +	if (!rte_dmadev_is_valid_dev(dev_id)) { \
> +		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> +		return; \
> +	} \
> +} while (0)
> +

Can we avoid using these in the inline functions in this file, and move
them to the _pmd.h which is for internal PMD use only? It would mean we
don't get logging from the key dataplane functions, but I would hope the
return values would provide enough info.

Alternatively, can we keep the logtype definition and first macro and move
the other two to the _pmd.h file?

> +/**
> + * @internal
> + * Validate if the DMA device index is a valid attached DMA device.
> + *
> + * @param dev_id
> + *   DMA device index.
> + *
> + * @return
> + *   - If the device index is valid (true) or not (false).
> + */
> +__rte_internal
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> +
> +/**
> + * rte_dma_sg - can hold scatter DMA operation request
> + */
> +struct rte_dma_sg {
> +	rte_iova_t src;
> +	rte_iova_t dst;
> +	uint32_t length;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the total number of DMA devices that have been successfully
> + * initialised.
> + *
> + * @return
> + *   The total number of usable DMA devices.
> + */
> +__rte_experimental
> +uint16_t
> +rte_dmadev_count(void);
> +
> +/**
> + * The capabilities of a DMA device
> + */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA device support mem-to-mem transfer.

Do we need this? Can we assume that any device appearing as a dmadev can
do mem-to-mem copies, and drop the capability for mem-to-mem and the
capability for copying?

> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV	(1ull << 1)
> +/**< DMA device support slave mode & mem-to-dev transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM	(1ull << 2)
> +/**< DMA device support slave mode & dev-to-mem transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV	(1ull << 3)
> +/**< DMA device support slave mode & dev-to-dev transfer.
> + *

Just to confirm, are there devices currently planned for dmadev that
support only a subset of these flags? Thinking particularly of the
dev-2-mem and mem-2-dev ones here - do any of the devices we are
considering not support using device memory?
[Again, just want to ensure we aren't adding too much stuff that we don't
need yet]

> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_COPY	(1ull << 4)
> +/**< DMA device supports copy ops.
> + *

Suggest dropping this and making copy support a minimum requirement for
dmadev.

> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_FILL	(1ull << 5)
> +/**< DMA device supports fill ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_SG		(1ull << 6)
> +/**< DMA device supports scatter-list ops.
> + * If the device supports both ops_copy and ops_sg, it supports copy_sg ops.
> + * If the device supports both ops_fill and ops_sg, it supports fill_sg ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_FENCE		(1ull << 7)
> +/**< DMA device supports fence.
> + * If the device supports fence, the application can set a fence flag when
> + * enqueueing an operation with rte_dma_copy/copy_sg/fill/fill_sg.
> + * If an operation carries a fence flag, it must be processed only after
> + * all previous operations have completed.
> + *

Is this needed? As I understand it, the Marvell driver doesn't require
fences so providing one is a no-op. Therefore, this flag is probably
unnecessary.

> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_SVA		(1ull << 8)
> +/**< DMA device supports SVA, i.e. it can use a VA as the DMA address.
> + * If the device supports SVA, the application can pass any VA, e.g. memory
> + * from rte_malloc(), rte_memzone(), malloc() or the stack.
> + * If the device does not support SVA, the application must pass an IOVA
> + * obtained from rte_malloc() or rte_memzone().
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MT_VCHAN	(1ull << 9)
> +/**< DMA device supports MT-safe access to a virtual DMA channel.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN	(1ull << 10)
> +/**< DMA device supports MT-safe access to different virtual DMA channels.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */

As with comments above - let's check that these will actually be used
before we add them.

> +
> +/**
> + * A structure used to retrieve the contextual information of
> + * a DMA device
> + */
> +struct rte_dmadev_info {
> +	struct rte_device *device; /**< Generic Device information */
> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> +	/** Maximum number of virtual DMA channels supported */
> +	uint16_t max_vchans;
> +	/** Maximum allowed number of virtual DMA channel descriptors */
> +	uint16_t max_desc;
> +	/** Minimum allowed number of virtual DMA channel descriptors */
> +	uint16_t min_desc;
> +	uint16_t nb_vchans; /**< Number of virtual DMA channels configured */
> +};

Let's add rte_dmadev_conf struct into this to return the configuration
settings.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve the contextual information of a DMA device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param[out] dev_info
> + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> + *   contextual information of the device.
> + *
> + * @return
> + *   - =0: Success, driver updates the contextual information of the DMA device
> + *   - <0: Error code returned by the driver info get function.
> + *
> + */
> +__rte_experimental
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> +

Should have "const" on second param.

> +/**
> + * A structure used to configure a DMA device.
> + */
> +struct rte_dmadev_conf {
> +	/** Maximum number of virtual DMA channels to use.
> +	 * This value cannot be greater than the field 'max_vchans' of struct
> +	 * rte_dmadev_info obtained from rte_dmadev_info_get().
> +	 */
> +	uint16_t max_vchans;
> +	/** Enable bit for MT-safe of a virtual DMA channel.
> +	 * This bit can be enabled only when the device supports
> +	 * RTE_DMA_DEV_CAPA_MT_VCHAN.
> +	 * @see RTE_DMA_DEV_CAPA_MT_VCHAN
> +	 */
> +	uint8_t enable_mt_vchan : 1;
> +	/** Enable bit for MT-safe of different virtual DMA channels.
> +	 * This bit can be enabled only when the device supports
> +	 * RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> +	 * @see RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN
> +	 */
> +	uint8_t enable_mt_multi_vchan : 1;
> +	uint64_t reserved[2]; /**< Reserved for future fields */
> +};

Drop the reserved fields. ABI versioning is a better way to deal with
adding new fields.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure a DMA device.
> + *
> + * This function must be invoked first before any other function in the
> + * API. This function can also be re-invoked when a device is in the
> + * stopped state.
> + *
> + * @param dev_id
> + *   The identifier of the device to configure.
> + * @param dev_conf
> + *   The DMA device configuration structure encapsulated into rte_dmadev_conf
> + *   object.
> + *
> + * @return
> + *   - =0: Success, device configured.
> + *   - <0: Error code returned by the driver configuration function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Start a DMA device.
> + *
> + * The device start step is the last one and consists of setting the DMA
> + * to start accepting jobs.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device started.
> + *   - <0: Error code returned by the driver start function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_start(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Stop a DMA device.
> + *
> + * The device can be restarted with a call to rte_dmadev_start()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device stopped.
> + *   - <0: Error code returned by the driver stop function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stop(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Close a DMA device.
> + *
> + * The device cannot be restarted after this call.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *  - =0: Successfully close device
> + *  - <0: Failure to close device
> + */
> +__rte_experimental
> +int
> +rte_dmadev_close(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Reset a DMA device.
> + *
> + * This is different from the rte_dmadev_start->rte_dmadev_stop cycle; it
> + * performs something akin to a hard or soft reset.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Successfully reset device.
> + *   - <0: Failure to reset device.
> + *   - (-ENOTSUP): If the device doesn't support this function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_reset(uint16_t dev_id);
> +
> +/**
> + * DMA transfer direction defines.
> + */
> +#define RTE_DMA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA transfer direction - from memory to memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_MEM_TO_DEV	(1ull << 1)
> +/**< DMA transfer direction - slave mode & from memory to device.
> + * In a typical scenario, an ARM SoC is installed on an x86 server as an iNIC.
> + * In this case, the ARM SoC works in slave mode and can initiate a DMA move
> + * request from ARM memory to x86 host memory.

For clarity, it would be good to specify in the scenario described which
memory is the "mem" and which is the "dev" (I assume SoC memory is "mem"
and x86 host memory is "dev"??)

> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_DEV_TO_MEM	(1ull << 2)
> +/**< DMA transfer direction - slave mode & from device to memory.
> + * In a typical scenario, an ARM SoC is installed on an x86 server as an iNIC.
> + * In this case, the ARM SoC works in slave mode and can initiate a DMA move
> + * request from x86 host memory to ARM memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_DEV_TO_DEV	(1ull << 3)
> +/**< DMA transfer direction - slave mode & from device to device.
> + * In a typical scenario, an ARM SoC is installed on an x86 server as an iNIC.
> + * In this case, the ARM SoC works in slave mode and can initiate a DMA move
> + * request from x86 host memory to another x86 host memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_TRANSFER_DIR_ALL	(RTE_DMA_MEM_TO_MEM | \
> +					 RTE_DMA_MEM_TO_DEV | \
> +					 RTE_DMA_DEV_TO_MEM | \
> +					 RTE_DMA_DEV_TO_DEV)
> +
> +/**
> + * enum rte_dma_slave_port_type - slave mode type defines
> + */
> +enum rte_dma_slave_port_type {
> +	/** The slave port is PCIE. */
> +	RTE_DMA_SLAVE_PORT_PCIE = 1,
> +};
> +

As previously mentioned, this needs to be updated to use other terms.
For some suggested alternatives see:
https://doc.dpdk.org/guides-21.05/contributing/coding_style.html#naming

> +/**
> + * A structure used to describe slave port parameters.
> + */
> +struct rte_dma_slave_port_parameters {
> +	enum rte_dma_slave_port_type port_type;
> +	union {
> +		/** For PCIE port */
> +		struct {
> +			/** The physical function number which to use */
> +			uint64_t pf_number : 6;
> +			/** Virtual function enable bit */
> +			uint64_t vf_enable : 1;
> +			/** The virtual function number which to use */
> +			uint64_t vf_number : 8;
> +			uint64_t pasid : 20;
> +			/** The attributes field in the TLP packet */
> +			uint64_t tlp_attr : 3;
> +		};
> +	};
> +};
> +
> +/**
> + * A structure used to configure a virtual DMA channel.
> + */
> +struct rte_dmadev_vchan_conf {
> +	uint8_t direction; /**< Set of supported transfer directions */
> +	/** Number of descriptor for the virtual DMA channel */
> +	uint16_t nb_desc;
> +	/** 1) Describes the dev parameter in the mem-to-dev/dev-to-mem
> +	 * transfer scenario.
> +	 * 2) Describes the src dev parameter in the dev-to-dev
> +	 * transfer scenario.
> +	 */
> +	struct rte_dma_slave_port_parameters port;
> +	/** Describes the dst dev parameters in the dev-to-dev
> +	 * transfer scenario.
> +	 */
> +	struct rte_dma_slave_port_parameters peer_port;
> +	uint64_t reserved[2]; /**< Reserved for future fields */
> +};

Let's drop the reserved fields and use ABI versioning if necessary in
future.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate and set up a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param conf
> + *   The virtual DMA channel configuration structure encapsulated into
> + *   rte_dmadev_vchan_conf object.
> + *
> + * @return
> + *   - >=0: Allocate success, it is the virtual DMA channel id. This value must
> + *          be less than the field 'max_vchans' of struct rte_dmadev_conf
> +	    which configured by rte_dmadev_configure().

nit: whitespace error here.

> + *   - <0: Error code returned by the driver virtual channel setup function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +		       const struct rte_dmadev_vchan_conf *conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Release a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of the virtual DMA channel returned by vchan setup.
> + *
> + * @return
> + *   - =0: Successfully release the virtual DMA channel.
> + *   - <0: Error code returned by the driver virtual channel release function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> +
> +/**
> + * rte_dmadev_stats - running statistics.
> + */
> +struct rte_dmadev_stats {
> +	/** Count of operations which were successfully enqueued */
> +	uint64_t enqueued_count;
> +	/** Count of operations which were submitted to hardware */
> +	uint64_t submitted_count;
> +	/** Count of operations which failed to complete */
> +	uint64_t completed_fail_count;
> +	/** Count of operations which successfully complete */
> +	uint64_t completed_count;
> +	uint64_t reserved[4]; /**< Reserved for future fields */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve basic statistics of one or all virtual DMA channels.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel, -1 means all channels.
> + * @param[out] stats
> + *   The basic statistics structure encapsulated into rte_dmadev_stats
> + *   object.
> + *
> + * @return
> + *   - =0: Successfully retrieve stats.
> + *   - <0: Failure to retrieve stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, int vchan,

vchan as uint16_t rather than int, I think. This would apply to all
dataplane functions. There is no need for a signed vchan value.

> +		     struct rte_dmadev_stats *stats);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Reset basic statistics of one or all virtual DMA channels.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel, -1 means all channels.
> + *
> + * @return
> + *   - =0: Successfully reset stats.
> + *   - <0: Failure to reset stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, int vchan);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Dump DMA device info.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param f
> + *   The file to write the output to.
> + *
> + * @return
> + *   0 on success. Non-zero otherwise.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger the dmadev self test.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - 0: Selftest successful.
> + *   - -ENOTSUP if the device doesn't support selftest
> + *   - other values < 0 on failure.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_selftest(uint16_t dev_id);

I don't think this needs to be in the public API, since it should only be
for the autotest app to use. Maybe move the prototype to the _pmd.h (since
we don't have a separate internal header), and then the autotest app can
pick it up from there.

> +
> +#include "rte_dmadev_core.h"
> +
> +/**
> + *  DMA flags to augment operation preparation.
> + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> + */
> +#define RTE_DMA_FLAG_FENCE	(1ull << 0)
> +/**< DMA fence flag
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + * @see rte_dmadev_fill_sg()
> + */

As a general comment, I think all these multi-line comments should go
before the item they describe. Comments after should only be used in the
case where the comment fits on the rest of the line after a value.

We also should define the SUBMIT flag as suggested by Jerin, to allow apps
to automatically submit jobs after enqueue.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a copy operation onto the virtual DMA channel.
> + *
> + * This queues up a copy operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the data to be copied.
> + * @param flags
> + *   Flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> +		uint32_t length, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->copy)(dev, vchan, src, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list copy operation to be performed by hardware,
> + * but does not trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param sg
> + *   The pointer of scatterlist.
> + * @param sg_len
> + *   The number of scatterlist elements.
> + * @param flags
> + *   Flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg *sg,
> +		   uint32_t sg_len, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(sg, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->copy_sg)(dev, vchan, sg, sg_len, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a fill operation onto the virtual DMA channel.
> + *
> + * This queues up a fill operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the destination buffer.
> + * @param flags
> + *   Flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> +		rte_iova_t dst, uint32_t length, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list fill operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list fill operation to be performed by hardware,
> + * but does not trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param sg
> + *   The pointer of scatterlist.
> + * @param sg_len
> + *   The number of scatterlist elements.
> + * @param flags
> + *   Flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fill_sg(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> +		   const struct rte_dma_sg *sg, uint32_t sg_len,
> +		   uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(sg, -ENOTSUP);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->fill_sg)(dev, vchan, pattern, sg, sg_len, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger hardware to begin performing enqueued operations.
> + *
> + * This API is used to write the "doorbell" to the hardware to trigger it
> + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *
> + * @return
> + *   - =0: Successfully trigger hardware.
> + *   - <0: Failure to trigger hardware.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->submit)(dev, vchan);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been successfully completed.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_cpls
> + *   The maximum number of completed operations that can be processed.
> + * @param[out] last_idx
> + *   The last completed operation's index.
> + *   If not required, NULL can be passed in.
> + * @param[out] has_error
> + *   Indicates if there was a transfer error.
> + *   If not required, NULL can be passed in.
> + *
> + * @return
> + *   The number of operations that successfully completed.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
> +		     uint16_t *last_idx, bool *has_error)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +	uint16_t idx;
> +	bool err;
> +
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +	if (nb_cpls == 0) {
> +		RTE_DMADEV_LOG(ERR, "Invalid nb_cpls\n");
> +		return -EINVAL;
> +	}
> +#endif
> +
> +	/* Ensure the pointer values are non-null to simplify drivers.
> +	 * In most cases these should be compile time evaluated, since this is
> +	 * an inline function.
> +	 * - If NULL is explicitly passed as parameter, then compiler knows the
> +	 *   value is NULL
> +	 * - If address of local variable is passed as parameter, then compiler
> +	 *   can know it's non-NULL.
> +	 */
> +	if (last_idx == NULL)
> +		last_idx = &idx;
> +	if (has_error == NULL)
> +		has_error = &err;
> +
> +	*has_error = false;
> +	return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
> +}
> +
> +/**
> + * DMA transfer status code defines
> + */
> +enum rte_dma_status_code {
> +	/** The operation completed successfully */
> +	RTE_DMA_STATUS_SUCCESSFUL = 0,
> +	/** The operation failed to complete due to an active drop.
> +	 * This is mainly used when processing dev_stop, allowing outstanding
> +	 * requests to be completed as much as possible.
> +	 */
> +	RTE_DMA_STATUS_ACTIVE_DROP,
> +	/** The operation failed to complete due to an invalid source address */
> +	RTE_DMA_STATUS_INVALID_SRC_ADDR,
> +	/** The operation failed to complete due to an invalid destination address */
> +	RTE_DMA_STATUS_INVALID_DST_ADDR,
> +	/** The operation failed to complete due to an invalid length */
> +	RTE_DMA_STATUS_INVALID_LENGTH,
> +	/** The operation failed to complete due to an invalid opcode.
> +	 * The DMA descriptor may have multiple formats, which are
> +	 * distinguished by the opcode field.
> +	 */
> +	RTE_DMA_STATUS_INVALID_OPCODE,
> +	/** The operation failed to complete due to a bus error */
> +	RTE_DMA_STATUS_BUS_ERROR,
> +	/** The operation failed to complete due to data poisoning */
> +	RTE_DMA_STATUS_DATA_POISION,
> +	/** The operation failed to complete due to a descriptor read error */
> +	RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> +	/** The operation failed to complete due to a device link error.
> +	 * Indicates a link error in the mem-to-dev/dev-to-mem/
> +	 * dev-to-dev transfer scenario.
> +	 */
> +	RTE_DMA_STATUS_DEV_LINK_ERROR,
> +	/** Driver specific status code offset
> +	 * Start status code for the driver to define its own error code.
> +	 */
> +	RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that failed to complete.
> + * NOTE: This API should be used when rte_dmadev_completed() sets has_error.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_status
> + *   Indicates the size of status array.
> + * @param[out] status
> + *   The error code of operations that failed to complete.
> + *   Some standard error code are described in 'enum rte_dma_status_code'
> + *   @see rte_dma_status_code
> + * @param[out] last_idx
> + *   The last failed completed operation's index.
> + *
> + * @return
> + *   The number of operations that failed to complete.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +			   const uint16_t nb_status, uint32_t *status,
> +			   uint16_t *last_idx)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(status, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(last_idx, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +	if (nb_status == 0) {
> +		RTE_DMADEV_LOG(ERR, "Invalid nb_status\n");
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_H_ */
> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> new file mode 100644
> index 0000000..410faf0
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_core.h
> @@ -0,0 +1,159 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#ifndef _RTE_DMADEV_CORE_H_
> +#define _RTE_DMADEV_CORE_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device internal header.
> + *
> + * This header contains internal data types, that are used by the DMA devices
> + * in order to expose their ops to the class.
> + *
> + * Applications should not use these APIs directly.
> + *
> + */
> +
> +struct rte_dmadev;
> +
> +/** @internal Used to get device information of a device. */
> +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> +				 struct rte_dmadev_info *dev_info);

First parameter can be "const"

> +/** @internal Used to configure a device. */
> +typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
> +				  const struct rte_dmadev_conf *dev_conf);
> +
> +/** @internal Used to start a configured device. */
> +typedef int (*dmadev_start_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to stop a configured device. */
> +typedef int (*dmadev_stop_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to close a configured device. */
> +typedef int (*dmadev_close_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to reset a configured device. */
> +typedef int (*dmadev_reset_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to allocate and set up a virtual DMA channel. */
> +typedef int (*dmadev_vchan_setup_t)(struct rte_dmadev *dev,
> +				    const struct rte_dmadev_vchan_conf *conf);
> +
> +/** @internal Used to release a virtual DMA channel. */
> +typedef int (*dmadev_vchan_release_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to retrieve basic statistics. */
> +typedef int (*dmadev_stats_get_t)(struct rte_dmadev *dev, int vchan,
> +				  struct rte_dmadev_stats *stats);

First parameter can be "const"

> +
> +/** @internal Used to reset basic statistics. */
> +typedef int (*dmadev_stats_reset_t)(struct rte_dmadev *dev, int vchan);
> +
> +/** @internal Used to dump internal information. */
> +typedef int (*dmadev_dump_t)(struct rte_dmadev *dev, FILE *f);
> +

First param "const"

> +/** @internal Used to start dmadev selftest. */
> +typedef int (*dmadev_selftest_t)(uint16_t dev_id);
> +

This looks like an outlier in taking a dev_id. It should take a dmadev
parameter. Most drivers should not need to implement this anyway, as the
main unit tests should be in "test_dmadev.c" in the autotest app.

> +/** @internal Used to enqueue a copy operation. */
> +typedef int (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vchan,
> +			     rte_iova_t src, rte_iova_t dst,
> +			     uint32_t length, uint64_t flags);
> +
> +/** @internal Used to enqueue a scatter list copy operation. */
> +typedef int (*dmadev_copy_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> +				const struct rte_dma_sg *sg,
> +				uint32_t sg_len, uint64_t flags);
> +
> +/** @internal Used to enqueue a fill operation. */
> +typedef int (*dmadev_fill_t)(struct rte_dmadev *dev, uint16_t vchan,
> +			     uint64_t pattern, rte_iova_t dst,
> +			     uint32_t length, uint64_t flags);
> +
> +/** @internal Used to enqueue a scatter list fill operation. */
> +typedef int (*dmadev_fill_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> +			uint64_t pattern, const struct rte_dma_sg *sg,
> +			uint32_t sg_len, uint64_t flags);
> +
> +/** @internal Used to trigger hardware to begin working. */
> +typedef int (*dmadev_submit_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to return number of successful completed operations. */
> +typedef uint16_t (*dmadev_completed_t)(struct rte_dmadev *dev, uint16_t vchan,
> +				       const uint16_t nb_cpls,
> +				       uint16_t *last_idx, bool *has_error);
> +
> +/** @internal Used to return number of failed completed operations. */
> +typedef uint16_t (*dmadev_completed_fails_t)(struct rte_dmadev *dev,
> +			uint16_t vchan, const uint16_t nb_status,
> +			uint32_t *status, uint16_t *last_idx);
> +
> +/**
> + * DMA device operations function pointer table
> + */
> +struct rte_dmadev_ops {
> +	dmadev_info_get_t dev_info_get;
> +	dmadev_configure_t dev_configure;
> +	dmadev_start_t dev_start;
> +	dmadev_stop_t dev_stop;
> +	dmadev_close_t dev_close;
> +	dmadev_reset_t dev_reset;
> +	dmadev_vchan_setup_t vchan_setup;
> +	dmadev_vchan_release_t vchan_release;
> +	dmadev_stats_get_t stats_get;
> +	dmadev_stats_reset_t stats_reset;
> +	dmadev_dump_t dev_dump;
> +	dmadev_selftest_t dev_selftest;
> +};
> +
> +/**
> + * @internal
> + * The data part, with no function pointers, associated with each DMA device.
> + *
> + * This structure is safe to place in shared memory to be common among different
> + * processes in a multi-process configuration.
> + */
> +struct rte_dmadev_data {
> +	uint16_t dev_id; /**< Device [external] identifier. */
> +	char dev_name[RTE_DMADEV_NAME_MAX_LEN]; /**< Unique identifier name */
> +	void *dev_private; /**< PMD-specific private data. */
> +	struct rte_dmadev_conf dev_conf; /**< DMA device configuration. */
> +	uint8_t dev_started : 1; /**< Device state: STARTED(1)/STOPPED(0). */
> +	uint64_t reserved[4]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +

While I generally don't like having reserved space, this is one place where
it makes sense, so +1 for it here.

> +/**
> + * @internal
> + * The generic data structure associated with each DMA device.
> + *
> + * The dataplane APIs are located at the beginning of the structure, along
> + * with the pointer to where all the data elements for the particular device
> + * are stored in shared memory. This split scheme allows the function pointer
> + * and driver data to be per-process, while the actual configuration data for
> + * the device is shared.
> + */
> +struct rte_dmadev {
> +	dmadev_copy_t copy;
> +	dmadev_copy_sg_t copy_sg;
> +	dmadev_fill_t fill;
> +	dmadev_fill_sg_t fill_sg;
> +	dmadev_submit_t submit;
> +	dmadev_completed_t completed;
> +	dmadev_completed_fails_t completed_fails;
> +	const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
> +	/** Flag indicating the device is attached: ATTACHED(1)/DETACHED(0). */
> +	uint8_t attached : 1;

Since it's in the midst of a series of pointers, this 1-bit flag is
actually using 8 bytes of space. Is it needed? Can we use dev_ops == NULL
or data == NULL instead to indicate whether this is a valid entry?

> +	/** Device info which supplied during device initialization. */
> +	struct rte_device *device;
> +	struct rte_dmadev_data *data; /**< Pointer to device data. */

If we are to try and minimise cacheline accesses, we should put this data
pointer - or, even better, a copy of the data->private pointer - at the top
of the structure, on the same cacheline as the datapath operations. For the
dataplane, I can't see any elements of data, except the private pointer,
being accessed, so we would probably get the most benefit from having a
copy placed there on init of the dmadev struct.

> +	uint64_t reserved[4]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +
> +extern struct rte_dmadev rte_dmadevices[];
> +
> +#endif /* _RTE_DMADEV_CORE_H_ */
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> new file mode 100644
> index 0000000..45141f9
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_pmd.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + */
> +
> +#ifndef _RTE_DMADEV_PMD_H_
> +#define _RTE_DMADEV_PMD_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device PMD APIs
> + *
> + * Driver facing APIs for a DMA device. These are not to be called directly by
> + * any application.
> + */
> +
> +#include "rte_dmadev.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @internal
> + * Allocates a new dmadev slot for an DMA device and returns the pointer
> + * to that slot for the driver to use.
> + *
> + * @param name
> + *   DMA device name.
> + *
> + * @return
> + *   A pointer to the DMA device slot case of success,
> + *   NULL otherwise.
> + */
> +__rte_internal
> +struct rte_dmadev *
> +rte_dmadev_pmd_allocate(const char *name);
> +
> +/**
> + * @internal
> + * Release the specified dmadev.
> + *
> + * @param dev
> + *   Device to be released.
> + *
> + * @return
> + *   - 0 on success, negative on error
> + */
> +__rte_internal
> +int
> +rte_dmadev_pmd_release(struct rte_dmadev *dev);
> +
> +/**
> + * @internal
> + * Return the DMA device based on the device name.
> + *
> + * @param name
> + *   DMA device name.
> + *
> + * @return
> + *   A pointer to the DMA device slot case of success,
> + *   NULL otherwise.
> + */
> +__rte_internal
> +struct rte_dmadev *
> +rte_dmadev_get_device_by_name(const char *name);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_PMD_H_ */
> diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
> new file mode 100644
> index 0000000..0f099e7
> --- /dev/null
> +++ b/lib/dmadev/version.map
> @@ -0,0 +1,40 @@
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_dmadev_count;
> +	rte_dmadev_info_get;
> +	rte_dmadev_configure;
> +	rte_dmadev_start;
> +	rte_dmadev_stop;
> +	rte_dmadev_close;
> +	rte_dmadev_reset;
> +	rte_dmadev_vchan_setup;
> +	rte_dmadev_vchan_release;
> +	rte_dmadev_stats_get;
> +	rte_dmadev_stats_reset;
> +	rte_dmadev_dump;
> +	rte_dmadev_selftest;
> +	rte_dmadev_copy;
> +	rte_dmadev_copy_sg;
> +	rte_dmadev_fill;
> +	rte_dmadev_fill_sg;
> +	rte_dmadev_submit;
> +	rte_dmadev_completed;
> +	rte_dmadev_completed_fails;
> +
> +	local: *;
> +};

The elements in the version.map file blocks should be sorted alphabetically.
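One quick way to check the ordering (an ad-hoc check, not part of the DPDK build) is `sort -c`, which exits non-zero on the first out-of-order line:

```shell
# Demonstration on a small symbol list: LC_ALL=C gives byte-wise ordering,
# matching what the version.map convention expects.
printf 'rte_dmadev_close;\nrte_dmadev_configure;\nrte_dmadev_count;\n' \
	| LC_ALL=C sort -c && echo sorted
```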

> +
> +INTERNAL {
> +        global:
> +
> +	rte_dmadevices;
> +	rte_dmadev_pmd_allocate;
> +	rte_dmadev_pmd_release;
> +	rte_dmadev_get_device_by_name;
> +
> +	local:
> +
> +	rte_dmadev_is_valid_dev;
> +};
> +
> diff --git a/lib/meson.build b/lib/meson.build
> index 1673ca4..68d239f 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -60,6 +60,7 @@ libraries = [
>          'bpf',
>          'graph',
>          'node',
> +        'dmadev',
>  ]
>  
>  if is_windows
> -- 
> 2.8.1
> 


* [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name
@ 2021-07-12 16:17  3% Andrew Rybchenko
  0 siblings, 0 replies; 200+ results
From: Andrew Rybchenko @ 2021-07-12 16:17 UTC (permalink / raw)
  To: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim,
	Beilei Xing, Qiming Yang, Qi Zhang, Haiyue Wang, Matan Azrad,
	Shahaf Shuler, Viacheslav Ovsiienko, Thomas Monjalon,
	Ferruh Yigit, Xueming Li
  Cc: dev, Viacheslav Galaktionov, stable

From: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>

Fix representor port ID search by name if the representor itself
does not provide representors info. Getting a list of representors
from a representor does not make sense. Instead, a parent device
should be used.

To this end, extend the rte_eth_dev_data structure to include the port ID
of the parent device for representors.

Fixes: df7547a6a2cc ("ethdev: add helper function to get representor ID")
Cc: stable@dpdk.org

Signed-off-by: Viacheslav Galaktionov <viacheslav.galaktionov@oktetlabs.ru>
Signed-off-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
The new field is added into the hole in rte_eth_dev_data structure.
The patch does not change ABI, but extra care is required since ABI
check is disabled for the structure because of the libabigail bug [1].

Potentially it is bad for out-of-tree drivers which implement
representors but do not fill in the new parent_port_id field in the
rte_eth_dev_data structure. Do we care?

Maybe the patch should add lines to the release notes, but I'd like
to get initial feedback first.

The mlx5 changes should be reviewed by the maintainers very carefully, since
we are not sure whether we have patched them correctly.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28060

 drivers/net/bnxt/bnxt_reps.c             |  1 +
 drivers/net/enic/enic_vf_representor.c   |  1 +
 drivers/net/i40e/i40e_vf_representor.c   |  1 +
 drivers/net/ice/ice_dcf_vf_representor.c |  1 +
 drivers/net/ixgbe/ixgbe_vf_representor.c |  1 +
 drivers/net/mlx5/linux/mlx5_os.c         | 11 +++++++++++
 drivers/net/mlx5/windows/mlx5_os.c       | 11 +++++++++++
 lib/ethdev/ethdev_driver.h               |  6 +++---
 lib/ethdev/rte_class_eth.c               |  2 +-
 lib/ethdev/rte_ethdev.c                  |  8 ++++----
 lib/ethdev/rte_ethdev_core.h             |  4 ++++
 11 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_reps.c b/drivers/net/bnxt/bnxt_reps.c
index bdbad53b7d..902591cd39 100644
--- a/drivers/net/bnxt/bnxt_reps.c
+++ b/drivers/net/bnxt/bnxt_reps.c
@@ -187,6 +187,7 @@ int bnxt_representor_init(struct rte_eth_dev *eth_dev, void *params)
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	eth_dev->data->representor_id = rep_params->vf_id;
+	eth_dev->data->parent_port_id = rep_params->parent_dev->data->port_id;
 
 	rte_eth_random_addr(vf_rep_bp->dflt_mac_addr);
 	memcpy(vf_rep_bp->mac_addr, vf_rep_bp->dflt_mac_addr,
diff --git a/drivers/net/enic/enic_vf_representor.c b/drivers/net/enic/enic_vf_representor.c
index 79dd6e5640..6ee7967ce9 100644
--- a/drivers/net/enic/enic_vf_representor.c
+++ b/drivers/net/enic/enic_vf_representor.c
@@ -662,6 +662,7 @@ int enic_vf_representor_init(struct rte_eth_dev *eth_dev, void *init_params)
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	eth_dev->data->representor_id = vf->vf_id;
+	eth_dev->data->parent_port_id = pf->port_id;
 	eth_dev->data->mac_addrs = rte_zmalloc("enic_mac_addr_vf",
 		sizeof(struct rte_ether_addr) *
 		ENIC_UNICAST_PERFECT_FILTERS, 0);
diff --git a/drivers/net/i40e/i40e_vf_representor.c b/drivers/net/i40e/i40e_vf_representor.c
index 0481b55381..865b637585 100644
--- a/drivers/net/i40e/i40e_vf_representor.c
+++ b/drivers/net/i40e/i40e_vf_representor.c
@@ -514,6 +514,7 @@ i40e_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
 	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR |
 					RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
 	ethdev->data->representor_id = representor->vf_id;
+	ethdev->data->parent_port_id = pf->dev_data->parent_port_id;
 
 	/* Setting the number queues allocated to the VF */
 	ethdev->data->nb_rx_queues = vf->vsi->nb_qps;
diff --git a/drivers/net/ice/ice_dcf_vf_representor.c b/drivers/net/ice/ice_dcf_vf_representor.c
index 970461f3e9..c7cd3fd290 100644
--- a/drivers/net/ice/ice_dcf_vf_representor.c
+++ b/drivers/net/ice/ice_dcf_vf_representor.c
@@ -418,6 +418,7 @@ ice_dcf_vf_repr_init(struct rte_eth_dev *vf_rep_eth_dev, void *init_param)
 
 	vf_rep_eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 	vf_rep_eth_dev->data->representor_id = repr->vf_id;
+	vf_rep_eth_dev->data->parent_port_id = repr->dcf_eth_dev->data->port_id;
 
 	vf_rep_eth_dev->data->mac_addrs = &repr->mac_addr;
 
diff --git a/drivers/net/ixgbe/ixgbe_vf_representor.c b/drivers/net/ixgbe/ixgbe_vf_representor.c
index d5b636a194..7a2063849e 100644
--- a/drivers/net/ixgbe/ixgbe_vf_representor.c
+++ b/drivers/net/ixgbe/ixgbe_vf_representor.c
@@ -197,6 +197,7 @@ ixgbe_vf_representor_init(struct rte_eth_dev *ethdev, void *init_params)
 
 	ethdev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 	ethdev->data->representor_id = representor->vf_id;
+	ethdev->data->parent_port_id = representor->pf_ethdev->data->port_id;
 
 	/* Set representor device ops */
 	ethdev->dev_ops = &ixgbe_vf_representor_dev_ops;
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index be22d9cbd2..5550d30628 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -1511,6 +1511,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->representor) {
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 		eth_dev->data->representor_id = priv->representor_id;
+		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+			const struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->representor)
+				continue;
+			eth_dev->data->parent_port_id = port_id;
+			break;
+		}
 	}
 	priv->mp_id.port_id = eth_dev->data->port_id;
 	strlcpy(priv->mp_id.name, MLX5_MP_NAME, RTE_MP_MAX_NAME_LEN);
diff --git a/drivers/net/mlx5/windows/mlx5_os.c b/drivers/net/mlx5/windows/mlx5_os.c
index e30b682822..037c928dc1 100644
--- a/drivers/net/mlx5/windows/mlx5_os.c
+++ b/drivers/net/mlx5/windows/mlx5_os.c
@@ -506,6 +506,17 @@ mlx5_dev_spawn(struct rte_device *dpdk_dev,
 	if (priv->representor) {
 		eth_dev->data->dev_flags |= RTE_ETH_DEV_REPRESENTOR;
 		eth_dev->data->representor_id = priv->representor_id;
+		MLX5_ETH_FOREACH_DEV(port_id, priv->pci_dev) {
+			const struct mlx5_priv *opriv =
+				rte_eth_devices[port_id].data->dev_private;
+
+			if (!opriv ||
+			    opriv->sh != priv->sh ||
+			    opriv->representor)
+				continue;
+			eth_dev->data->parent_port_id = port_id;
+			break;
+		}
 	}
 	/*
 	 * Store associated network device interface index. This index
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 40e474aa7e..07f6d1f9a4 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -1248,8 +1248,8 @@ struct rte_eth_devargs {
  * For backward compatibility, if no representor info, direct
  * map legacy VF (no controller and pf).
  *
- * @param ethdev
- *  Handle of ethdev port.
+ * @param parent_port_id
+ *  Port ID of the backing device.
  * @param type
  *  Representor type.
  * @param controller
@@ -1266,7 +1266,7 @@ struct rte_eth_devargs {
  */
 __rte_internal
 int
-rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
+rte_eth_representor_id_get(uint16_t parent_port_id,
 			   enum rte_eth_representor_type type,
 			   int controller, int pf, int representor_port,
 			   uint16_t *repr_id);
diff --git a/lib/ethdev/rte_class_eth.c b/lib/ethdev/rte_class_eth.c
index 1fe5fa1f36..e3b7ab9728 100644
--- a/lib/ethdev/rte_class_eth.c
+++ b/lib/ethdev/rte_class_eth.c
@@ -95,7 +95,7 @@ eth_representor_cmp(const char *key __rte_unused,
 		c = i / (np * nf);
 		p = (i / nf) % np;
 		f = i % nf;
-		if (rte_eth_representor_id_get(edev,
+		if (rte_eth_representor_id_get(edev->data->parent_port_id,
 			eth_da.type,
 			eth_da.nb_mh_controllers == 0 ? -1 :
 					eth_da.mh_controllers[c],
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 6ebf52b641..acda1d43fb 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5997,7 +5997,7 @@ rte_eth_devargs_parse(const char *dargs, struct rte_eth_devargs *eth_da)
 }
 
 int
-rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
+rte_eth_representor_id_get(uint16_t parent_port_id,
 			   enum rte_eth_representor_type type,
 			   int controller, int pf, int representor_port,
 			   uint16_t *repr_id)
@@ -6012,7 +6012,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 		return -EINVAL;
 
 	/* Get PMD representor range info. */
-	ret = rte_eth_representor_info_get(ethdev->data->port_id, NULL);
+	ret = rte_eth_representor_info_get(parent_port_id, NULL);
 	if (ret == -ENOTSUP && type == RTE_ETH_REPRESENTOR_VF &&
 	    controller == -1 && pf == -1) {
 		/* Direct mapping for legacy VF representor. */
@@ -6026,7 +6026,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 	info = calloc(1, size);
 	if (info == NULL)
 		return -ENOMEM;
-	ret = rte_eth_representor_info_get(ethdev->data->port_id, info);
+	ret = rte_eth_representor_info_get(parent_port_id, info);
 	if (ret < 0)
 		goto out;
 
@@ -6045,7 +6045,7 @@ rte_eth_representor_id_get(const struct rte_eth_dev *ethdev,
 			continue;
 		if (info->ranges[i].id_end < info->ranges[i].id_base) {
 			RTE_LOG(WARNING, EAL, "Port %hu invalid representor ID Range %u - %u, entry %d\n",
-				ethdev->data->port_id, info->ranges[i].id_base,
+				parent_port_id, info->ranges[i].id_base,
 				info->ranges[i].id_end, i);
 			continue;
 
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index edf96de2dc..13cb84b52f 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -185,6 +185,10 @@ struct rte_eth_dev_data {
 			/**< Switch-specific identifier.
 			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
 			 */
+	uint16_t parent_port_id;
+			/**< Port ID of the backing device.
+			 *   Valid if RTE_ETH_DEV_REPRESENTOR in dev_flags.
+			 */
 
 	pthread_mutex_t flow_ops_mutex; /**< rte_flow ops mutex. */
 	uint64_t reserved_64s[4]; /**< Reserved for future fields */
-- 
2.30.2



* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
  2021-07-12 15:50  3%   ` Bruce Richardson
@ 2021-07-13  9:07  0%     ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-13  9:07 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Chengwen Feng, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	dpdk-dev, Morten Brørup, Nipun Gupta, Hemant Agrawal,
	Maxime Coquelin, Honnappa Nagarahalli, David Marchand,
	Satananda Burla, Prasun Kapoor, Ananyev, Konstantin, liangma

On Mon, Jul 12, 2021 at 9:21 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Sun, Jul 11, 2021 at 05:25:56PM +0800, Chengwen Feng wrote:
> > This patch introduce 'dmadevice' which is a generic type of DMA
> > device.
> >
> > The APIs of dmadev library exposes some generic operations which can
> > enable configuration and I/O with the DMA devices.
> >
> > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>
> Hi again,
>
> some further review comments inline.
>
> /Bruce
>
> > ---
> >  MAINTAINERS                  |    4 +
> >  config/rte_config.h          |    3 +
> >  lib/dmadev/meson.build       |    6 +
> >  lib/dmadev/rte_dmadev.c      |  560 +++++++++++++++++++++++
> >  lib/dmadev/rte_dmadev.h      | 1030 ++++++++++++++++++++++++++++++++++++++++++
> >  lib/dmadev/rte_dmadev_core.h |  159 +++++++
> >  lib/dmadev/rte_dmadev_pmd.h  |   72 +++
> >  lib/dmadev/version.map       |   40 ++
> >  lib/meson.build              |    1 +
>
> <snip>
>
> > diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> > new file mode 100644
> > index 0000000..8779512
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev.h
> > @@ -0,0 +1,1030 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + * Copyright(c) 2021 Intel Corporation.
> > + * Copyright(c) 2021 Marvell International Ltd.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_H_
> > +#define _RTE_DMADEV_H_
> > +
> > +/**
> > + * @file rte_dmadev.h
> > + *
> > + * RTE DMA (Direct Memory Access) device APIs.
> > + *
> > + * The DMA framework is built on the following model:
> > + *
> > + *     ---------------   ---------------       ---------------
> > + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> > + *     | channel     |   | channel     |       | channel     |
> > + *     ---------------   ---------------       ---------------
> > + *            |                |                      |
> > + *            ------------------                      |
> > + *                     |                              |
> > + *               ------------                    ------------
> > + *               |  dmadev  |                    |  dmadev  |
> > + *               ------------                    ------------
> > + *                     |                              |
> > + *            ------------------               ------------------
> > + *            | HW-DMA-channel |               | HW-DMA-channel |
> > + *            ------------------               ------------------
> > + *                     |                              |
> > + *                     --------------------------------
> > + *                                     |
> > + *                           ---------------------
> > + *                           | HW-DMA-Controller |
> > + *                           ---------------------
> > + *
> > + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> > + * each HW-DMA-channel should be represented by a dmadev.
> > + *
> > + * The dmadev could create multiple virtual DMA channel, each virtual DMA
> > + * channel represents a different transfer context. The DMA operation request
> > + * must be submitted to the virtual DMA channel.
> > + * E.G. Application could create virtual DMA channel 0 for mem-to-mem transfer
> > + *      scenario, and create virtual DMA channel 1 for mem-to-dev transfer
> > + *      scenario.
> > + *
> > + * The dmadev are dynamically allocated by rte_dmadev_pmd_allocate() during the
> > + * PCI/SoC device probing phase performed at EAL initialization time. And could
> > + * be released by rte_dmadev_pmd_release() during the PCI/SoC device removing
> > + * phase.
> > + *
> > + * We use 'uint16_t dev_id' as the device identifier of a dmadev, and
> > + * 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> > + *
> > + * The functions exported by the dmadev API to setup a device designated by its
> > + * device identifier must be invoked in the following order:
> > + *     - rte_dmadev_configure()
> > + *     - rte_dmadev_vchan_setup()
> > + *     - rte_dmadev_start()
> > + *
> > + * Then, the application can invoke dataplane APIs to process jobs.
> > + *
> > + * If the application wants to change the configuration (i.e. call
> > + * rte_dmadev_configure()), it must call rte_dmadev_stop() first to stop the
> > + * device and then do the reconfiguration before calling rte_dmadev_start()
> > + * again. The dataplane APIs should not be invoked when the device is stopped.
> > + *
> > + * Finally, an application can close a dmadev by invoking the
> > + * rte_dmadev_close() function.
> > + *
> > + * The dataplane APIs include two parts:
> > + *   a) The first part is the submission of operation requests:
> > + *        - rte_dmadev_copy()
> > + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> > + *        - rte_dmadev_fill()
> > + *        - rte_dmadev_fill_sg() - scatter-gather form of fill
> > + *        - rte_dmadev_perform() - issue doorbell to hardware
> > + *      These APIs could work with different virtual DMA channels which have
> > + *      different contexts.
> > + *      The first four APIs are used to submit the operation request to the
> > + *      virtual DMA channel, if the submission is successful, a uint16_t
> > + *      ring_idx is returned, otherwise a negative number is returned.
> > + *   b) The second part is to obtain the result of requests:
> > + *        - rte_dmadev_completed()
> > + *            - return the number of operation requests completed successfully.
> > + *        - rte_dmadev_completed_fails()
> > + *            - return the number of operation requests failed to complete.
>
> Please rename this to "completed_status" to allow the return of information
> other than just errors. As I suggested before, I think this should also be
> usable as a slower version of "completed" even in the case where there are
> no errors, in that it returns status information for each and every job
> rather than just returning as soon as it hits a failure.
>
> > + *
> > + * About the ring_idx which rte_dmadev_copy/copy_sg/fill/fill_sg() returned,
> > + * the rules are as follows:
> > + *   a) ring_idx for each virtual DMA channel are independent.
> > + *   b) For a virtual DMA channel, the ring_idx is monotonically incremented,
> > + *      when it reach UINT16_MAX, it wraps back to zero.
>
> Based on other feedback, I suggest we put in the detail here that: "This
> index can be used by applications to track per-job metadata in an
> application-defined circular ring, where the ring is a power-of-2 size, and
> the indexes are masked appropriately."
>
> > + *   c) The initial ring_idx of a virtual DMA channel is zero, after the device
> > + *      is stopped or reset, the ring_idx needs to be reset to zero.
> > + *   Example:
> > + *      step-1: start one dmadev
> > + *      step-2: enqueue a copy operation, the ring_idx return is 0
> > + *      step-3: enqueue a copy operation again, the ring_idx return is 1
> > + *      ...
> > + *      step-101: stop the dmadev
> > + *      step-102: start the dmadev
> > + *      step-103: enqueue a copy operation, the cookie return is 0
> > + *      ...
> > + *      step-x+0: enqueue a fill operation, the ring_idx return is 65535
> > + *      step-x+1: enqueue a copy operation, the ring_idx return is 0
> > + *      ...
> > + *
> > + * By default, all the non-dataplane functions of the dmadev API exported by a
> > + * PMD are lock-free functions which assume to not be invoked in parallel on
> > + * different logical cores to work on the same target object.
> > + *
> > + * The dataplane functions of the dmadev API exported by a PMD can be MT-safe
> > + * only when supported by the driver, generally, the driver will reports two
> > + * capabilities:
> > + *   a) Whether to support MT-safe for the submit/completion API of the same
> > + *      virtual DMA channel.
> > + *      E.G. one thread do submit operation, another thread do completion
> > + *           operation.
> > + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_VCHAN.
> > + *      If driver don't support it, it's up to the application to guarantee
> > + *      MT-safe.
> > + *   b) Whether to support MT-safe for different virtual DMA channels.
> > + *      E.G. one thread do operation on virtual DMA channel 0, another thread
> > + *           do operation on virtual DMA channel 1.
> > + *      If driver support it, then declare RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> > + *      If driver don't support it, it's up to the application to guarantee
> > + *      MT-safe.
> > + *
> > + */
>
> Just to check - do we have hardware that currently supports these
> capabilities? For Intel HW, we will only support one virtual channel per
> device without any MT-safety guarantees, so won't be setting either of
> these flags. If any of these flags are unused in all planned drivers, we
> should drop them from the spec until they prove necessary. Ideally,
> everything in the dmadev definition should be testable, and features unused
> by anyone obviously will be untested.
>
> > +
> > +#include <rte_common.h>
> > +#include <rte_compat.h>
> > +#include <rte_errno.h>
> > +#include <rte_memory.h>
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +#define RTE_DMADEV_NAME_MAX_LEN      RTE_DEV_NAME_MAX_LEN
> > +
> > +extern int rte_dmadev_logtype;
> > +
> > +#define RTE_DMADEV_LOG(level, ...) \
> > +     rte_log(RTE_LOG_ ## level, rte_dmadev_logtype, "" __VA_ARGS__)
> > +
> > +/* Macros to check for valid port */
> > +#define RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, retval) do { \
> > +     if (!rte_dmadev_is_valid_dev(dev_id)) { \
> > +             RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> > +             return retval; \
> > +     } \
> > +} while (0)
> > +
> > +#define RTE_DMADEV_VALID_DEV_ID_OR_RET(dev_id) do { \
> > +     if (!rte_dmadev_is_valid_dev(dev_id)) { \
> > +             RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> > +             return; \
> > +     } \
> > +} while (0)
> > +
>
> Can we avoid using these in the inline functions in this file, and move
> them to the _pmd.h which is for internal PMD use only? It would mean we
> don't get logging from the key dataplane functions, but I would hope the
> return values would provide enough info.
>
> Alternatively, can we keep the logtype definition and first macro and move
> the other two to the _pmd.h file.
>
> > +/**
> > + * @internal
> > + * Validate if the DMA device index is a valid attached DMA device.
> > + *
> > + * @param dev_id
> > + *   DMA device index.
> > + *
> > + * @return
> > + *   - If the device index is valid (true) or not (false).
> > + */
> > +__rte_internal
> > +bool
> > +rte_dmadev_is_valid_dev(uint16_t dev_id);
> > +
> > +/**
> > + * rte_dma_sg - can hold scatter DMA operation request
> > + */
> > +struct rte_dma_sg {
> > +     rte_iova_t src;
> > +     rte_iova_t dst;
> > +     uint32_t length;
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Get the total number of DMA devices that have been successfully
> > + * initialised.
> > + *
> > + * @return
> > + *   The total number of usable DMA devices.
> > + */
> > +__rte_experimental
> > +uint16_t
> > +rte_dmadev_count(void);
> > +
> > +/**
> > + * The capabilities of a DMA device
> > + */
> > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM  (1ull << 0)
> > +/**< DMA device support mem-to-mem transfer.
>
> Do we need this? Can we assume that any device appearing as a dmadev can
> do mem-to-mem copies, and drop the capability for mem-to-mem and the
> capability for copying?
>
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV  (1ull << 1)
> > +/**< DMA device support slave mode & mem-to-dev transfer.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM  (1ull << 2)
> > +/**< DMA device support slave mode & dev-to-mem transfer.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV  (1ull << 3)
> > +/**< DMA device support slave mode & dev-to-dev transfer.
> > + *
>
> Just to confirm, are there devices currently planned for dmadev that

We are planning to use this support, as our existing raw driver has it.

> supports only a subset of these flags? Thinking particularly of the
> dev-2-mem and mem-2-dev ones here - do any of the devices we are
> considering not support using device memory?
> [Again, just want to ensure we aren't adding too much stuff that we don't
> need yet]



>
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_OPS_COPY    (1ull << 4)
> > +/**< DMA device support copy ops.
> > + *
>
> Suggest dropping this and making it min for dmadev.
>
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_OPS_FILL    (1ull << 5)
> > +/**< DMA device support fill ops.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_OPS_SG              (1ull << 6)
> > +/**< DMA device support scatter-list ops.
> > + * If device support ops_copy and ops_sg, it means supporting copy_sg ops.
> > + * If device support ops_fill and ops_sg, it means supporting fill_sg ops.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_FENCE               (1ull << 7)
> > +/**< DMA device support fence.
> > + * If device support fence, then application could set a fence flags when
> > + * enqueue operation by rte_dma_copy/copy_sg/fill/fill_sg.
> > + * If a operation has a fence flags, it means the operation must be processed
> > + * only after all previous operations are completed.
> > + *
>
> Is this needed? As I understand it, the Marvell driver doesn't require
> fences so providing one is a no-op. Therefore, this flag is probably
> unnecessary.

+1

>
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_SVA         (1ull << 8)
> > +/**< DMA device support SVA which could use VA as DMA address.
> > + * If device support SVA then application could pass any VA address like memory
> > + * from rte_malloc(), rte_memzone(), malloc, stack memory.
> > + * If device don't support SVA, then application should pass IOVA address which
> > + * from rte_malloc(), rte_memzone().
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_MT_VCHAN    (1ull << 9)
> > +/**< DMA device support MT-safe of a virtual DMA channel.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
> > +#define RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN      (1ull << 10)
> > +/**< DMA device support MT-safe of different virtual DMA channels.
> > + *
> > + * @see struct rte_dmadev_info::dev_capa
> > + */
>
> As with comments above - let's check that these will actually be used
> before we add them.
>
> > +
> > +/**
> > + * A structure used to retrieve the contextual information of
> > + * a DMA device
> > + */
> > +struct rte_dmadev_info {
> > +     struct rte_device *device; /**< Generic Device information */
> > +     uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_) */
> > +     /** Maximum number of virtual DMA channels supported */
> > +     uint16_t max_vchans;
> > +     /** Maximum allowed number of virtual DMA channel descriptors */
> > +     uint16_t max_desc;
> > +     /** Minimum allowed number of virtual DMA channel descriptors */
> > +     uint16_t min_desc;
> > +     uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > +};
>
> Let's add rte_dmadev_conf struct into this to return the configuration
> settings.
>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Retrieve the contextual information of a DMA device.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param[out] dev_info
> > + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> > + *   contextual information of the device.
> > + *
> > + * @return
> > + *   - =0: Success, driver updates the contextual information of the DMA device
> > + *   - <0: Error code returned by the driver info get function.
> > + *
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> > +
>
> Should have "const" on second param.
>
> > +/**
> > + * A structure used to configure a DMA device.
> > + */
> > +struct rte_dmadev_conf {
> > +     /** Maximum number of virtual DMA channels to use.
> > +      * This value cannot be greater than the 'max_vchans' field of struct
> > +      * rte_dmadev_info obtained from rte_dmadev_info_get().
> > +      */
> > +     uint16_t max_vchans;
> > +     /** Enable bit for MT-safe access to a virtual DMA channel.
> > +      * This bit can be enabled only when the device supports
> > +      * RTE_DMA_DEV_CAPA_MT_VCHAN.
> > +      * @see RTE_DMA_DEV_CAPA_MT_VCHAN
> > +      */
> > +     uint8_t enable_mt_vchan : 1;
> > +     /** Enable bit for MT-safe access across different virtual DMA channels.
> > +      * This bit can be enabled only when the device supports
> > +      * RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN.
> > +      * @see RTE_DMA_DEV_CAPA_MT_MULTI_VCHAN
> > +      */
> > +     uint8_t enable_mt_multi_vchan : 1;
> > +     uint64_t reserved[2]; /**< Reserved for future fields */
> > +};
>
> Drop the reserved fields. ABI versioning is a better way to deal with
> adding new fields.

+1

>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Configure a DMA device.
> > + *
> > + * This function must be invoked before any other function in the
> > + * API. It can also be re-invoked when a device is in the
> > + * stopped state.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device to configure.
> > + * @param dev_conf
> > + *   The DMA device configuration structure encapsulated into rte_dmadev_conf
> > + *   object.
> > + *
> > + * @return
> > + *   - =0: Success, device configured.
> > + *   - <0: Error code returned by the driver configuration function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Start a DMA device.
> > + *
> > + * The device start step is the last one and consists of setting the DMA
> > + * to start accepting jobs.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - =0: Success, device started.
> > + *   - <0: Error code returned by the driver start function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_start(uint16_t dev_id);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Stop a DMA device.
> > + *
> > + * The device can be restarted with a call to rte_dmadev_start()
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - =0: Success, device stopped.
> > + *   - <0: Error code returned by the driver stop function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_stop(uint16_t dev_id);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Close a DMA device.
> > + *
> > + * The device cannot be restarted after this call.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *  - =0: Successfully close device
> > + *  - <0: Failure to close device
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_close(uint16_t dev_id);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Reset a DMA device.
> > + *
> > + * This differs from the rte_dmadev_start->rte_dmadev_stop cycle in that it
> > + * performs something akin to a hard or soft reset of the device.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - =0: Successfully reset device.
> > + *   - <0: Failure to reset device.
> > + *   - (-ENOTSUP): If the device doesn't support this function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_reset(uint16_t dev_id);
> > +
> > +/**
> > + * DMA transfer direction defines.
> > + */
> > +#define RTE_DMA_MEM_TO_MEM   (1ull << 0)
> > +/**< DMA transfer direction - from memory to memory.
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_MEM_TO_DEV   (1ull << 1)
> > +/**< DMA transfer direction - slave mode & from memory to device.
> > + * In a typical scenario, an ARM SoC is installed in an x86 server as an iNIC.
> > + * In this case, the ARM SoC works in slave mode and could initiate a DMA move
> > + * request from ARM memory to x86 host memory.
>
> For clarity, it would be good to specify in the scenario described which
> memory is the "mem" and which is the "dev" (I assume SoC memory is "mem"
> and x86 host memory is "dev"??)
>
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_DEV_TO_MEM   (1ull << 2)
> > +/**< DMA transfer direction - slave mode & from device to memory.
> > + * In a typical scenario, an ARM SoC is installed in an x86 server as an iNIC.
> > + * In this case, the ARM SoC works in slave mode and could initiate a DMA move
> > + * request from x86 host memory to ARM memory.
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_DEV_TO_DEV   (1ull << 3)
> > +/**< DMA transfer direction - slave mode & from device to device.
> > + * In a typical scenario, an ARM SoC is installed in an x86 server as an iNIC.
> > + * In this case, the ARM SoC works in slave mode and could initiate a DMA move
> > + * request from one x86 host memory region to another.
> > + *
> > + * @see struct rte_dmadev_vchan_conf::direction
> > + */
> > +#define RTE_DMA_TRANSFER_DIR_ALL     (RTE_DMA_MEM_TO_MEM | \
> > +                                      RTE_DMA_MEM_TO_DEV | \
> > +                                      RTE_DMA_DEV_TO_MEM | \
> > +                                      RTE_DMA_DEV_TO_DEV)
> > +
> > +/**
> > + * enum rte_dma_slave_port_type - slave mode type defines
> > + */
> > +enum rte_dma_slave_port_type {
> > +     /** The slave port is PCIE. */
> > +     RTE_DMA_SLAVE_PORT_PCIE = 1,
> > +};
> > +
>
> As previously mentioned, this needs to be updated to use other terms.
> For some suggested alternatives see:
> https://doc.dpdk.org/guides-21.05/contributing/coding_style.html#naming
>
> > +/**
> > + * A structure used to describe slave port parameters.
> > + */
> > +struct rte_dma_slave_port_parameters {
> > +     enum rte_dma_slave_port_type port_type;
> > +     union {
> > +             /** For PCIE port */
> > +             struct {
> > +                     /** The physical function number which to use */
> > +                     uint64_t pf_number : 6;
> > +                     /** Virtual function enable bit */
> > +                     uint64_t vf_enable : 1;
> > +                     /** The virtual function number which to use */
> > +                     uint64_t vf_number : 8;
> > +                     uint64_t pasid : 20;
> > +                     /** The attributes filed in TLP packet */
> > +                     uint64_t tlp_attr : 3;
> > +             };
> > +     };
> > +};
> > +
> > +/**
> > + * A structure used to configure a virtual DMA channel.
> > + */
> > +struct rte_dmadev_vchan_conf {
> > +     uint8_t direction; /**< Set of supported transfer directions */
> > +     /** Number of descriptor for the virtual DMA channel */
> > +     uint16_t nb_desc;
> > +     /** 1) Used to describe the dev parameter in the mem-to-dev/dev-to-mem
> > +      * transfer scenario.
> > +      * 2) Used to describe the src dev parameter in the dev-to-dev
> > +      * transfer scenario.
> > +      */
> > +     struct rte_dma_slave_port_parameters port;
> > +     /** Used to describe the dst dev parameters in the dev-to-dev
> > +      * transfer scenario.
> > +      */
> > +     struct rte_dma_slave_port_parameters peer_port;
> > +     uint64_t reserved[2]; /**< Reserved for future fields */
> > +};
>
> Let's drop the reserved fields and use ABI versioning if necesssary in
> future.
>
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Allocate and set up a virtual DMA channel.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param conf
> > + *   The virtual DMA channel configuration structure encapsulated into
> > + *   rte_dmadev_vchan_conf object.
> > + *
> > + * @return
> > + *   - >=0: Allocate success, it is the virtual DMA channel id. This value must
> > + *          be less than the field 'max_vchans' of struct rte_dmadev_conf
> > +         which configured by rte_dmadev_configure().
>
> nit: whitespace error here.
>
> > + *   - <0: Error code returned by the driver virtual channel setup function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_vchan_setup(uint16_t dev_id,
> > +                    const struct rte_dmadev_vchan_conf *conf);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Release a virtual DMA channel.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of the virtual DMA channel returned by vchan setup.
> > + *
> > + * @return
> > + *   - =0: Successfully release the virtual DMA channel.
> > + *   - <0: Error code returned by the driver virtual channel release function.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> > +
> > +/**
> > + * rte_dmadev_stats - running statistics.
> > + */
> > +struct rte_dmadev_stats {
> > +     /** Count of operations which were successfully enqueued */
> > +     uint64_t enqueued_count;
> > +     /** Count of operations which were submitted to hardware */
> > +     uint64_t submitted_count;
> > +     /** Count of operations which failed to complete */
> > +     uint64_t completed_fail_count;
> > +     /** Count of operations which successfully complete */
> > +     uint64_t completed_count;
> > +     uint64_t reserved[4]; /**< Reserved for future fields */
> > +};
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Retrieve basic statistics of one or all virtual DMA channel(s).
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel, -1 means all channels.
> > + * @param[out] stats
> > + *   The basic statistics structure encapsulated into rte_dmadev_stats
> > + *   object.
> > + *
> > + * @return
> > + *   - =0: Successfully retrieve stats.
> > + *   - <0: Failure to retrieve stats.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_stats_get(uint16_t dev_id, int vchan,
>
> vchan as uint16_t rather than int, I think. This would apply to all
> dataplane functions. There is no need for a signed vchan value.
>
> > +                  struct rte_dmadev_stats *stats);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Reset basic statistics of one or all virtual DMA channel(s).
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel, -1 means all channels.
> > + *
> > + * @return
> > + *   - =0: Successfully reset stats.
> > + *   - <0: Failure to reset stats.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_stats_reset(uint16_t dev_id, int vchan);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Dump DMA device info.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param f
> > + *   The file to write the output to.
> > + *
> > + * @return
> > + *   0 on success. Non-zero otherwise.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_dump(uint16_t dev_id, FILE *f);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Trigger the dmadev self test.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + *
> > + * @return
> > + *   - 0: Selftest successful.
> > + *   - -ENOTSUP if the device doesn't support selftest
> > + *   - other values < 0 on failure.
> > + */
> > +__rte_experimental
> > +int
> > +rte_dmadev_selftest(uint16_t dev_id);
>
> I don't think this needs to be in the public API, since it should only be
> for the autotest app to use. Maybe move the prototype to the _pmd.h (since
> we don't have a separate internal header), and then the autotest app can
> pick it up from there.
>
> > +
> > +#include "rte_dmadev_core.h"
> > +
> > +/**
> > + *  DMA flags to augment operation preparation.
> > + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> > + */
> > +#define RTE_DMA_FLAG_FENCE   (1ull << 0)
> > +/**< DMA fence flag
> > + * It means the operation with this flag must be processed only after all
> > + * previous operations are completed.
> > + *
> > + * @see rte_dmadev_copy()
> > + * @see rte_dmadev_copy_sg()
> > + * @see rte_dmadev_fill()
> > + * @see rte_dmadev_fill_sg()
> > + */
>
> As a general comment, I think all these multi-line comments should go
> before the item they describe. Comments after should only be used in the
> case where the comment fits on the rest of the line after a value.
>
> We also should define the SUBMIT flag as suggested by Jerin, to allow apps
> to automatically submit jobs after enqueue.
>
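
If we do add a SUBMIT flag, the last job of a burst could then request
ordering and ring the doorbell in a single enqueue call. Rough sketch below;
the SUBMIT bit value and helper name are purely assumptions for illustration,
since that flag is not defined in this version of the patch:

```c
#include <stdint.h>

#define RTE_DMA_FLAG_FENCE  (1ull << 0) /* as defined in the patch */
#define RTE_DMA_FLAG_SUBMIT (1ull << 1) /* hypothetical, not yet defined */

/* Flags for the final job of a batch: process it only after all earlier
 * jobs, and trigger hardware without a separate rte_dmadev_submit() call. */
static inline uint64_t
dma_last_job_flags(void)
{
	return RTE_DMA_FLAG_FENCE | RTE_DMA_FLAG_SUBMIT;
}
```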
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a copy operation onto the virtual DMA channel.
> > + *
> > + * This queues up a copy operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param src
> > + *   The address of the source buffer.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the data to be copied.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> > +             uint32_t length, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->copy)(dev, vchan, src, dst, length, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> > + *
> > + * This queues up a scatter list copy operation to be performed by hardware,
> > + * but does not trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param sg
> > + *   The pointer of scatterlist.
> > + * @param sg_len
> > + *   The number of scatterlist elements.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg *sg,
> > +                uint32_t sg_len, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(sg, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->copy_sg)(dev, vchan, sg, sg_len, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a fill operation onto the virtual DMA channel.
> > + *
> > + * This queues up a fill operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param pattern
> > + *   The pattern to populate the destination buffer with.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the destination buffer.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> > +             rte_iova_t dst, uint32_t length, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a scatter list fill operation onto the virtual DMA channel.
> > + *
> > + * This queues up a scatter list fill operation to be performed by hardware,
> > + * but does not trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param pattern
> > + *   The pattern to populate the destination buffer with.
> > + * @param sg
> > + *   The pointer of scatterlist.
> > + * @param sg_len
> > + *   The number of scatterlist elements.
> > + * @param flags
> > + *   Flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_fill_sg(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> > +                const struct rte_dma_sg *sg, uint32_t sg_len,
> > +                uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(sg, -ENOTSUP);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->fill_sg)(dev, vchan, pattern, sg, sg_len, flags);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Trigger hardware to begin performing enqueued operations.
> > + *
> > + * This API is used to write the "doorbell" to the hardware to trigger it
> > + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + *
> > + * @return
> > + *   - =0: Successfully trigger hardware.
> > + *   - <0: Failure to trigger hardware.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->submit)(dev, vchan);
> > +}
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Returns the number of operations that have been successfully completed.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param nb_cpls
> > + *   The maximum number of completed operations that can be processed.
> > + * @param[out] last_idx
> > + *   The last completed operation's index.
> > + *   If not required, NULL can be passed in.
> > + * @param[out] has_error
> > + *   Indicates if there are transfer errors.
> > + *   If not required, NULL can be passed in.
> > + *
> > + * @return
> > + *   The number of operations that successfully completed.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
> > +                  uint16_t *last_idx, bool *has_error)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +     uint16_t idx;
> > +     bool err;
> > +
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +     if (nb_cpls == 0) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid nb_cpls\n");
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +
> > +     /* Ensure the pointer values are non-null to simplify drivers.
> > +      * In most cases these should be compile time evaluated, since this is
> > +      * an inline function.
> > +      * - If NULL is explicitly passed as parameter, then compiler knows the
> > +      *   value is NULL
> > +      * - If address of local variable is passed as parameter, then compiler
> > +      *   can know it's non-NULL.
> > +      */
> > +     if (last_idx == NULL)
> > +             last_idx = &idx;
> > +     if (has_error == NULL)
> > +             has_error = &err;
> > +
> > +     *has_error = false;
> > +     return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
> > +}
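
One thing worth spelling out in the docs here: the job indices are uint16_t
values that wrap at UINT16_MAX, so an application pairing the enqueue return
values with last_idx can count completions between two polls using plain
modular arithmetic. Sketch (the helper name is mine, not part of the patch):

```c
#include <stdint.h>

/* Number of jobs completed between a previously observed index and the
 * last_idx just returned, assuming indices advance by one per job modulo
 * 2^16 and fewer than 65536 jobs completed between the two observations. */
static inline uint16_t
dma_jobs_between(uint16_t prev_idx, uint16_t last_idx)
{
	return (uint16_t)(last_idx - prev_idx);
}
```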
> > +
> > +/**
> > + * DMA transfer status code defines
> > + */
> > +enum rte_dma_status_code {
> > +     /** The operation completed successfully */
> > +     RTE_DMA_STATUS_SUCCESSFUL = 0,
> > +     /** The operation failed to complete due to an active drop.
> > +      * This is mainly used when processing dev_stop, allowing outstanding
> > +      * requests to be completed as much as possible.
> > +      */
> > +     RTE_DMA_STATUS_ACTIVE_DROP,
> > +     /** The operation failed to complete due to an invalid source address */
> > +     RTE_DMA_STATUS_INVALID_SRC_ADDR,
> > +     /** The operation failed to complete due to an invalid destination address */
> > +     RTE_DMA_STATUS_INVALID_DST_ADDR,
> > +     /** The operation failed to complete due to an invalid length */
> > +     RTE_DMA_STATUS_INVALID_LENGTH,
> > +     /** The operation failed to complete due to an invalid opcode.
> > +      * The DMA descriptor may have multiple formats, which are
> > +      * distinguished by the opcode field.
> > +      */
> > +     RTE_DMA_STATUS_INVALID_OPCODE,
> > +     /** The operation failed to complete due to a bus error */
> > +     RTE_DMA_STATUS_BUS_ERROR,
> > +     /** The operation failed to complete due to data poisoning */
> > +     RTE_DMA_STATUS_DATA_POISION,
> > +     /** The operation failed to complete due to a descriptor read error */
> > +     RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> > +     /** The operation failed to complete due to a device link error.
> > +      * Used to indicate a link error in the mem-to-dev/dev-to-mem/
> > +      * dev-to-dev transfer scenario.
> > +      */
> > +     RTE_DMA_STATUS_DEV_LINK_ERROR,
> > +     /** Driver specific status code offset
> > +      * Start status code for the driver to define its own error code.
> > +      */
> > +     RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
> > +};
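
With the driver-specific range starting at a fixed offset like this, an
application can cheaply tell generic codes from driver-defined ones. Small
sketch (the helper name is mine; the offset value is from the patch):

```c
#include <stdint.h>

#define RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET 0x10000 /* from the patch */

/* Return non-zero when a completion status code is driver defined rather
 * than one of the generic rte_dma_status_code values. */
static inline int
dma_status_is_drv_specific(uint32_t status)
{
	return status >= RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET;
}
```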
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Returns the number of operations that failed to complete.
> > + * NOTE: This API is used after rte_dmadev_completed() has set has_error.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param nb_status
> > + *   Indicates the size of status array.
> > + * @param[out] status
> > + *   The error code of operations that failed to complete.
> > + *   Some standard error code are described in 'enum rte_dma_status_code'
> > + *   @see rte_dma_status_code
> > + * @param[out] last_idx
> > + *   The last failed completed operation's index.
> > + *
> > + * @return
> > + *   The number of operations that failed to complete.
> > + */
> > +__rte_experimental
> > +static inline uint16_t
> > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > +                        const uint16_t nb_status, uint32_t *status,
> > +                        uint16_t *last_idx)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> > +#ifdef RTE_DMADEV_DEBUG
> > +     RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(status, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(last_idx, -EINVAL);
> > +     RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
> > +     if (vchan >= dev->data->dev_conf.max_vchans) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> > +             return -EINVAL;
> > +     }
> > +     if (nb_status == 0) {
> > +             RTE_DMADEV_LOG(ERR, "Invalid nb_status\n");
> > +             return -EINVAL;
> > +     }
> > +#endif
> > +     return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_DMADEV_H_ */
> > diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> > new file mode 100644
> > index 0000000..410faf0
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev_core.h
> > @@ -0,0 +1,159 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + * Copyright(c) 2021 Intel Corporation.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_CORE_H_
> > +#define _RTE_DMADEV_CORE_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE DMA Device internal header.
> > + *
> > + * This header contains internal data types that are used by the DMA devices
> > + * in order to expose their ops to the class.
> > + *
> > + * Applications should not use these APIs directly.
> > + *
> > + */
> > +
> > +struct rte_dmadev;
> > +
> > +/** @internal Used to get device information of a device. */
> > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > +                              struct rte_dmadev_info *dev_info);
>
> First parameter can be "const"
>
> > +/** @internal Used to configure a device. */
> > +typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
> > +                               const struct rte_dmadev_conf *dev_conf);
> > +
> > +/** @internal Used to start a configured device. */
> > +typedef int (*dmadev_start_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to stop a configured device. */
> > +typedef int (*dmadev_stop_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to close a configured device. */
> > +typedef int (*dmadev_close_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to reset a configured device. */
> > +typedef int (*dmadev_reset_t)(struct rte_dmadev *dev);
> > +
> > +/** @internal Used to allocate and set up a virtual DMA channel. */
> > +typedef int (*dmadev_vchan_setup_t)(struct rte_dmadev *dev,
> > +                                 const struct rte_dmadev_vchan_conf *conf);
> > +
> > +/** @internal Used to release a virtual DMA channel. */
> > +typedef int (*dmadev_vchan_release_t)(struct rte_dmadev *dev, uint16_t vchan);
> > +
> > +/** @internal Used to retrieve basic statistics. */
> > +typedef int (*dmadev_stats_get_t)(struct rte_dmadev *dev, int vchan,
> > +                               struct rte_dmadev_stats *stats);
>
> First parameter can be "const"
>
> > +
> > +/** @internal Used to reset basic statistics. */
> > +typedef int (*dmadev_stats_reset_t)(struct rte_dmadev *dev, int vchan);
> > +
> > +/** @internal Used to dump internal information. */
> > +typedef int (*dmadev_dump_t)(struct rte_dmadev *dev, FILE *f);
> > +
>
> First param "const"
>
> > +/** @internal Used to start dmadev selftest. */
> > +typedef int (*dmadev_selftest_t)(uint16_t dev_id);
> > +
>
> This looks like an outlier in taking a dev_id. It should take an rte_dmadev parameter.
> Most drivers should not need to implement this anyway, as the main unit
> tests should be in "test_dmadev.c" in the autotest app.
>
> > +/** @internal Used to enqueue a copy operation. */
> > +typedef int (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                          rte_iova_t src, rte_iova_t dst,
> > +                          uint32_t length, uint64_t flags);
> > +
> > +/** @internal Used to enqueue a scatter list copy operation. */
> > +typedef int (*dmadev_copy_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                             const struct rte_dma_sg *sg,
> > +                             uint32_t sg_len, uint64_t flags);
> > +
> > +/** @internal Used to enqueue a fill operation. */
> > +typedef int (*dmadev_fill_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                          uint64_t pattern, rte_iova_t dst,
> > +                          uint32_t length, uint64_t flags);
> > +
> > +/** @internal Used to enqueue a scatter list fill operation. */
> > +typedef int (*dmadev_fill_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                     uint64_t pattern, const struct rte_dma_sg *sg,
> > +                     uint32_t sg_len, uint64_t flags);
> > +
> > +/** @internal Used to trigger hardware to begin working. */
> > +typedef int (*dmadev_submit_t)(struct rte_dmadev *dev, uint16_t vchan);
> > +
> > +/** @internal Used to return the number of successfully completed operations. */
> > +typedef uint16_t (*dmadev_completed_t)(struct rte_dmadev *dev, uint16_t vchan,
> > +                                    const uint16_t nb_cpls,
> > +                                    uint16_t *last_idx, bool *has_error);
> > +
> > +/** @internal Used to return the number of completed operations that failed. */
> > +typedef uint16_t (*dmadev_completed_fails_t)(struct rte_dmadev *dev,
> > +                     uint16_t vchan, const uint16_t nb_status,
> > +                     uint32_t *status, uint16_t *last_idx);
> > +
> > +/**
> > + * DMA device operations function pointer table
> > + */
> > +struct rte_dmadev_ops {
> > +     dmadev_info_get_t dev_info_get;
> > +     dmadev_configure_t dev_configure;
> > +     dmadev_start_t dev_start;
> > +     dmadev_stop_t dev_stop;
> > +     dmadev_close_t dev_close;
> > +     dmadev_reset_t dev_reset;
> > +     dmadev_vchan_setup_t vchan_setup;
> > +     dmadev_vchan_release_t vchan_release;
> > +     dmadev_stats_get_t stats_get;
> > +     dmadev_stats_reset_t stats_reset;
> > +     dmadev_dump_t dev_dump;
> > +     dmadev_selftest_t dev_selftest;
> > +};
> > +
> > +/**
> > + * @internal
> > + * The data part, with no function pointers, associated with each DMA device.
> > + *
> > + * This structure is safe to place in shared memory to be common among different
> > + * processes in a multi-process configuration.
> > + */
> > +struct rte_dmadev_data {
> > +     uint16_t dev_id; /**< Device [external] identifier. */
> > +     char dev_name[RTE_DMADEV_NAME_MAX_LEN]; /**< Unique identifier name */
> > +     void *dev_private; /**< PMD-specific private data. */
> > +     struct rte_dmadev_conf dev_conf; /**< DMA device configuration. */
> > +     uint8_t dev_started : 1; /**< Device state: STARTED(1)/STOPPED(0). */
> > +     uint64_t reserved[4]; /**< Reserved for future fields */
> > +} __rte_cache_aligned;
> > +
>
> While I generally don't like having reserved space, this is one place where
> it makes sense, so +1 for it here.
>
> > +/**
> > + * @internal
> > + * The generic data structure associated with each DMA device.
> > + *
> > + * The dataplane APIs are located at the beginning of the structure, along
> > + * with the pointer to where all the data elements for the particular device
> > + * are stored in shared memory. This split scheme allows the function pointer
> > + * and driver data to be per-process, while the actual configuration data for
> > + * the device is shared.
> > + */
> > +struct rte_dmadev {
> > +     dmadev_copy_t copy;
> > +     dmadev_copy_sg_t copy_sg;
> > +     dmadev_fill_t fill;
> > +     dmadev_fill_sg_t fill_sg;
> > +     dmadev_submit_t submit;
> > +     dmadev_completed_t completed;
> > +     dmadev_completed_fails_t completed_fails;
> > +     const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
> > +     /** Flag indicating the device is attached: ATTACHED(1)/DETACHED(0). */
> > +     uint8_t attached : 1;
>
> Since it's in the midst of a series of pointers, this 1-bit flag is
> actually using 8 bytes of space. Is it needed? Can we use dev_ops == NULL
> or data == NULL instead to indicate this is a valid entry?
>
> > +     /** Device info supplied during device initialization. */
> > +     struct rte_device *device;
> > +     struct rte_dmadev_data *data; /**< Pointer to device data. */
>
> If we are to try and minimise cacheline access, we should put this data
> pointer - or even better a copy of data->private pointer - at the top of
> the structure on the same cacheline as datapath operations. For dataplane,
> I can't see any elements of data, except the private pointer, being
> accessed, so we would probably get the most benefit from having a copy put there
> on init of the dmadev struct.
>
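[Editor's note: as a rough, compilable sketch of the layout suggested above — field and type names are simplified stand-ins, not the actual dmadev definitions — a copy of data->dev_private placed at the top shares the first 64-byte cache line with the datapath function pointers on a 64-bit build:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64

/* Simplified stand-in for the datapath function pointer types. */
typedef int (*dmadev_op_t)(void *dev, uint16_t vchan);

struct dmadev_sketch {
	void *dev_private;          /* copy of data->dev_private, set at init */
	dmadev_op_t copy;
	dmadev_op_t copy_sg;
	dmadev_op_t fill;
	dmadev_op_t submit;
	dmadev_op_t completed;
	dmadev_op_t completed_fails;
	/* control-path fields below; not dereferenced in the datapath */
	const void *dev_ops;
	void *data;
} __attribute__((aligned(CACHE_LINE_SIZE)));
```

With 8-byte pointers, the private-data copy and all six datapath pointers end at offset 56, inside the first cache line.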
> > +     uint64_t reserved[4]; /**< Reserved for future fields */
> > +} __rte_cache_aligned;
> > +
> > +extern struct rte_dmadev rte_dmadevices[];
> > +
> > +#endif /* _RTE_DMADEV_CORE_H_ */
> > diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> > new file mode 100644
> > index 0000000..45141f9
> > --- /dev/null
> > +++ b/lib/dmadev/rte_dmadev_pmd.h
> > @@ -0,0 +1,72 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2021 HiSilicon Limited.
> > + */
> > +
> > +#ifndef _RTE_DMADEV_PMD_H_
> > +#define _RTE_DMADEV_PMD_H_
> > +
> > +/**
> > + * @file
> > + *
> > + * RTE DMA Device PMD APIs
> > + *
> > + * Driver facing APIs for a DMA device. These are not to be called directly by
> > + * any application.
> > + */
> > +
> > +#include "rte_dmadev.h"
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * @internal
> > + * Allocates a new dmadev slot for a DMA device and returns the pointer
> > + * to that slot for the driver to use.
> > + *
> > + * @param name
> > + *   DMA device name.
> > + *
> > + * @return
> > + *   A pointer to the DMA device slot in case of success,
> > + *   NULL otherwise.
> > + */
> > +__rte_internal
> > +struct rte_dmadev *
> > +rte_dmadev_pmd_allocate(const char *name);
> > +
> > +/**
> > + * @internal
> > + * Release the specified dmadev.
> > + *
> > + * @param dev
> > + *   Device to be released.
> > + *
> > + * @return
> > + *   - 0 on success, negative on error
> > + */
> > +__rte_internal
> > +int
> > +rte_dmadev_pmd_release(struct rte_dmadev *dev);
> > +
> > +/**
> > + * @internal
> > + * Return the DMA device based on the device name.
> > + *
> > + * @param name
> > + *   DMA device name.
> > + *
> > + * @return
> > + *   A pointer to the DMA device slot in case of success,
> > + *   NULL otherwise.
> > + */
> > +__rte_internal
> > +struct rte_dmadev *
> > +rte_dmadev_get_device_by_name(const char *name);
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_DMADEV_PMD_H_ */
> > diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
> > new file mode 100644
> > index 0000000..0f099e7
> > --- /dev/null
> > +++ b/lib/dmadev/version.map
> > @@ -0,0 +1,40 @@
> > +EXPERIMENTAL {
> > +     global:
> > +
> > +     rte_dmadev_count;
> > +     rte_dmadev_info_get;
> > +     rte_dmadev_configure;
> > +     rte_dmadev_start;
> > +     rte_dmadev_stop;
> > +     rte_dmadev_close;
> > +     rte_dmadev_reset;
> > +     rte_dmadev_vchan_setup;
> > +     rte_dmadev_vchan_release;
> > +     rte_dmadev_stats_get;
> > +     rte_dmadev_stats_reset;
> > +     rte_dmadev_dump;
> > +     rte_dmadev_selftest;
> > +     rte_dmadev_copy;
> > +     rte_dmadev_copy_sg;
> > +     rte_dmadev_fill;
> > +     rte_dmadev_fill_sg;
> > +     rte_dmadev_submit;
> > +     rte_dmadev_completed;
> > +     rte_dmadev_completed_fails;
> > +
> > +     local: *;
> > +};
>
> The elements in the version.map file blocks should be sorted alphabetically.
>
> > +
> > +INTERNAL {
> > +        global:
> > +
> > +     rte_dmadevices;
> > +     rte_dmadev_pmd_allocate;
> > +     rte_dmadev_pmd_release;
> > +     rte_dmadev_get_device_by_name;
> > +
> > +     local:
> > +
> > +     rte_dmadev_is_valid_dev;
> > +};
> > +
> > diff --git a/lib/meson.build b/lib/meson.build
> > index 1673ca4..68d239f 100644
> > --- a/lib/meson.build
> > +++ b/lib/meson.build
> > @@ -60,6 +60,7 @@ libraries = [
> >          'bpf',
> >          'graph',
> >          'node',
> > +        'dmadev',
> >  ]
> >
> >  if is_windows
> > --
> > 2.8.1
> >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  @ 2021-07-13 12:33  3%                     ` Ananyev, Konstantin
  2021-07-13 14:08  0%                       ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-13 12:33 UTC (permalink / raw)
  To: Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang


Adding more rte_security and PMD maintainers into the loop.

> > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > >
> > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > work on Tx.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > >
> > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > >
> > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > >
> > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > >
> > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > >
> > > > > > > > > Those are the semantics I was trying to define. I think below is the sequence of
> > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > >
> > > > > > > > > 1. receive_pkt()
> > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > 5. Do route_lookup()
> > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > 7. Send packet out.
> > > > > > > > >
> > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > ipsec-secgw is following.
> > > > > > > >
> > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > >
> > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > >
> > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > If we cannot know for sure the pkt content at the time of the rte_security_set_pkt_metadata() call, then I think
> > > > > > > having a PMD-specific callback is not of much use except for saving SA priv data to rte_mbuf.
> > > > > > >
> > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > >
> > > > > > > This is also fine with us.
> > > > > >
> > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > Yes.
> > > > >
> > > > > >
> > > > > > Is that correct understanding?
> > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > >
> > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > are callbacks from same PMD, do you see any issue ?
> > > > >
> > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > rte_security_set_pkt_metadata() is called again.
> > > >
> > > > Yep, I do have a concern here.
> > > > Right now it is perfectly valid to do something like that:
> > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > /* can modify contents of the packet */
> > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > rte_eth_tx_burst(..., &mb, 1);
> > > >
> > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > >
> > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > PMDs, which are the only ones implementing the callback, are already expecting the
> > > same?
> >
> > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > All that the ixgbe callback does is store session-related data inside the mbuf.
> > Its only expectation is to have the ESP trailer at the proper place (after the ICV):
> 
> This implies rte_security_set_pkt_metadata() cannot be called when the mbuf doesn't
> have the ESP trailer updated or when mbuf->pkt_len == 0
> 
> >
> > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> >                                 rte_security_dynfield(m);
> >   mdata->enc = 1;
> >   mdata->sa_idx = ic_session->sa_index;
> >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> >
> > Then this data will be used by tx_burst() function.
> So it implies that after the above rte_security_set_pkt_metadata() call, and before tx_burst(),
> mbuf data / packet len cannot be modified, right? Because if modified, then tx_burst()
> will be using an incorrect pad len?

No, pkt_len can be modified.
Though ESP trailer pad_len can't.

> 
> This patch is also trying to add similar restriction on when
> rte_security_set_pkt_metadata() should be called and what cannot be done after
> calling rte_security_set_pkt_metadata().

No, I don't think it is really the same.
> Also, IMO, inside the ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
that ESP packet is already formed and trailer contains valid data.
In fact, I think this pad_len calculation can be moved to actual TX function.

> 
> >
> > >
> > > >
> > > > >
> > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > security,
> > > >
> > > > Yes, that was my thought.
> > > >
> > > > > my only argument was that since there is already a hit in
> > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > do security related pre-processing there.
> > > >
> > > > Yes, it would be an extra callback call that way.
> > > > Though tx_prepare() accepts a burst of packets, so the overhead
> > > > of the function call will be spread across the whole burst, and I presume
> > > > shouldn't be too high.
> > > >
> > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > non-security pkts.
> > > >
> > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > modifications are required for the packet.
> > >
> > > But the major issues I see are
> > >
> > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > >    In our case, we need to know the security session details to do things.
> >
> > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> 
> We can do that. But having to make a PMD-specific function call via rte_security_set_pkt_metadata()
> just for storing the session pointer in rte_security_dynfield consumes unnecessary
> cycles per pkt.

> In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> a second for the instance->ops->set_pkt_metadata() callback.
> Which is of course way too expensive for such a simple operation.
Actually same thought for rte_security_get_userdata().
Both of these functions belong to data-path and ideally have to be as fast as possible.
Probably 21.11 is a right timeframe for that.
 
> >
> > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> >
> > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > specifically for that - to store security-related data inside the mbuf.
> > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> >
> > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > processing, it can be done via security specific set_pkt_metadata().
> >
> > But what you are proposing introduces new limitations and might break existing functionality.
> > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > itself is not an option?
> 
> We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> rte_security_set_pkt_metadata().
> 
> Are you fine if we update the spec to say "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> set, the user needs to store struct rte_security_session's sess_private_data in
> rte_security_dynfield" as below?
> 
> <snip>
> 
> static inline void
> inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
>         struct rte_mbuf *mb[], uint16_t num)
> {
>         uint32_t i, ol_flags;
> 
>         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
>         for (i = 0; i != num; i++) {
> 
>                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> 
>                 if (ol_flags != 0)
>                         rte_security_set_pkt_metadata(ss->security.ctx,
>                                 ss->security.ses, mb[i], NULL);
> 		else
>                 	*rte_security_dynfield(mb[i]) =
>                                 (uint64_t)ss->security.ses->sess_private_data;
> 
> 
> If the above can be done, then in our PMD, we will not have a callback for
> set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> in capabilities.

That's an interesting idea, but what you propose is a change in the current rte_security API behaviour.
So all existing apps that use this API will have to be changed.
We'd better avoid such changes unless there is really good reason for that.
So, I'd suggest to tweak your idea a bit:

1) change rte_security_set_pkt_metadata():
if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
otherwise just: rte_security_dynfield(m) = sess->session_private_data;
(fast-path)

2) consider making rte_security_set_pkt_metadata() an inline function.
We probably can have some special flag inside struct rte_security_ctx,
or even store inside ctx a pointer to set_pkt_metadata() itself.

As a brief code snippet:

struct rte_security_ctx {
        void *device;
        /**< Crypto/ethernet device attached */
        const struct rte_security_ops *ops;
        /**< Pointer to security ops for the device */
        uint16_t sess_cnt;
        /**< Number of sessions attached to this context */
+     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);   
}; 

static inline int
rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                              struct rte_security_session *sess,
                              struct rte_mbuf *m, void *params)
{
     /* fast-path */
      if (instance->set_pkt_mdata == NULL) {
              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
             return 0; 
       /* slow path */ 
       } else
           return instance->set_pkt_mdata(instance->device, sess, m, params);
}

That probably would be an ABI breakage (new field in rte_security_ctx) and would require
some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
(ctx_create()), but hopefully will benefit everyone.
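[Editor's note: a minimal, self-contained mock of this proposal — names such as sec_dynfield are placeholders, since the real dynfield is registered at a runtime offset rather than being a fixed struct member — exercises both the fast and slow paths:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t rte_security_dynfield_t;

/* Stand-in mbuf: the real dynfield lives at a runtime-registered offset. */
struct rte_mbuf { rte_security_dynfield_t sec_dynfield; };

static rte_security_dynfield_t *
rte_security_dynfield(struct rte_mbuf *m)
{
	return &m->sec_dynfield;
}

struct rte_security_session { void *sess_private_data; };

struct rte_security_ctx {
	void *device;
	int (*set_pkt_mdata)(void *, struct rte_security_session *,
			     struct rte_mbuf *, void *);
};

static inline int
rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
			      struct rte_security_session *sess,
			      struct rte_mbuf *m, void *params)
{
	if (instance->set_pkt_mdata == NULL) {
		/* fast path: no PMD callback, store session private data */
		*rte_security_dynfield(m) =
			(rte_security_dynfield_t)(uintptr_t)sess->sess_private_data;
		return 0;
	}
	/* slow path: PMD-specific callback */
	return instance->set_pkt_mdata(instance->device, sess, m, params);
}
```

A ctx created without a callback then takes the branch-only fast path, avoiding both indirect calls.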

> 
> >
> > > I'm fine to
> > > introduce a burst call for the same (I was thinking of proposing it in future) to
> > > compensate for the overhead.
> > >
> > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > rte_mbuf had space for struct rte_security_session pointer,
> >
> > But it does, see above.
> > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > If your PMD requires more data to be associated with mbuf
> > - you can request it via mbuf_dynfield and store there whatever is needed.
> >
> > > then then I guess it would have been better to do the way you proposed.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > called.
> > > > > > > > >
> > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > Does it makes sense ?
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Once called,
> > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > >
> > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > >
> > > > > > > > > > >  /**
> > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > >   */
> > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > 2.25.1
> > > > > > > > > >

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] dmadev: introduce DMA device library
  @ 2021-07-13 13:06  3%   ` fengchengwen
  2021-07-13 13:37  0%     ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2021-07-13 13:06 UTC (permalink / raw)
  To: thomas, ferruh.yigit, bruce.richardson, jerinj, jerinjacobk,
	andrew.rybchenko
  Cc: dev, mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor,
	konstantin.ananyev

Thank you for your valuable comments, and I think we've taken a big step forward.

@andrew Could you provide the copyright line so that I can add it to the relevant files.

@bruce, @jerin  Some review comments that I have not applied are answered here:

1.
COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
to be wrapped so aggressively.

REPLY: Our CI still has an 80-character limit, and from my review most frameworks still comply.

2.
COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
RTE_DMA_DIRECTION_...

REPLY: adding 'DIRECTION' may make the macro too long; I prefer to keep it simple.

3.
COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
We are not making release as pubic API in other device class. See ethdev spec.
bbdev/eventdev/rawdev

REPLY: because ethdev's queues are hardware queues, while these are software-defined channels,
I think release is OK. BTW: bbdev/eventdev also have release ops.

4.
COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future fields */
> +};
Please add the capability for each counter in info structure as one
device may support all
the counters.

REPLY: This is a statistics function. If it is not supported, there is no need to
implement the stats ops function. Alternatively, the unimplemented counters could be set to zero.

5.
COMMENT: > +#endif
> +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
Instead of every driver set the NOP function, In the common code, If
the CAPA is not set,
common code can set NOP function for this with <0 return value.

REPLY: I don't think it's a good idea to branch in the I/O path; it's the application's duty
to ensure it doesn't call an API the driver does not support (which it can learn from the capabilities).
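[Editor's note: the application-side guard described here can be sketched as below — the DMA_CAPA_* flag values and struct are hypothetical placeholders for the proposed RTE_DMA_DEV_CAPA_* capabilities, checked once at setup rather than per datapath call:]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits mirroring the proposed RTE_DMA_DEV_CAPA_* flags. */
#define DMA_CAPA_OPS_COPY (1ull << 2)
#define DMA_CAPA_OPS_FILL (1ull << 3)

struct dmadev_info {
	uint64_t dev_capa; /* bitmask of supported operations */
};

/* Application-side guard: consult the advertised capabilities once at setup
 * time, instead of the framework branching on every datapath call. */
static int
can_use_fill(const struct dmadev_info *info)
{
	return (info->dev_capa & DMA_CAPA_OPS_FILL) != 0;
}
```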

6.
COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +                          const uint16_t nb_status, uint32_t *status,
uint32_t -> enum rte_dma_status_code

REPLY: I'm still evaluating this. It takes a long time for the driver to perform error code
conversion in this API. Do we need to provide a standalone error code conversion function?

7.
COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> +                                struct rte_dmadev_info *dev_info);
Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
as this header is exported.

REPLY: I prefer not to add the 'rte_' prefix; it makes the definition too long.

8.
COMMENT: > + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.
Please rename this to "completed_status" to allow the return of information
other than just errors. As I suggested before, I think this should also be
usable as a slower version of "completed" even in the case where there are
no errors, in that it returns status information for each and every job
rather than just returning as soon as it hits a failure.

REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
and we can build the slow-path function on top of the two APIs.
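[Editor's note: as a rough illustration of layering a "completed_status"-style helper on the proposed OK/FAIL pair — the stub_* functions below stand in for a real driver and model four successful jobs followed by one failure:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum dma_status { DMA_OK = 0, DMA_ERR = 1 };

/* Stub "driver": jobs 0..3 completed OK, then one job failed. */
static uint16_t
stub_completed(uint16_t nb, uint16_t *last_idx, bool *has_error)
{
	*last_idx = 3;
	*has_error = true; /* a failure follows the successful jobs */
	return nb < 4 ? nb : 4;
}

static uint16_t
stub_completed_fails(uint16_t nb, uint32_t *status, uint16_t *last_idx)
{
	(void)nb;
	status[0] = DMA_ERR;
	*last_idx = 4;
	return 1;
}

/* Slow-path helper built on top of the two fast-path APIs: returns a
 * per-job status for up to nb jobs. */
static uint16_t
completed_status(uint16_t nb, uint32_t *status)
{
	uint16_t last, i, n;
	bool err = false;

	n = stub_completed(nb, &last, &err);
	for (i = 0; i < n; i++)
		status[i] = DMA_OK;
	if (err && n < nb)
		n += stub_completed_fails(nb - n, &status[n], &last);
	return n;
}
```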

9.
COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA device support mem-to-mem transfer.
Do we need this? Can we assume that any device appearing as a dmadev can
do mem-to-mem copies, and drop the capability for mem-to-mem and the
capability for copying?
also for RTE_DMA_DEV_CAPA_OPS_COPY

REPLY: yes, I insist on adding this for the sake of conceptual integrity.
For the ioat driver, it can simply declare the capability.

10.
COMMENT: > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> +};
Let's add rte_dmadev_conf struct into this to return the configuration
settings.

REPLY: If we add rte_dmadev_conf in, it may break the ABI when rte_dmadev_conf adds fields.


[snip]

On 2021/7/13 20:27, Chengwen Feng wrote:
> This patch introduce 'dmadevice' which is a generic type of DMA
> device.
> 
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> v3:
> * rm reset and fill_sg ops.
> * rm MT-safe capabilities.
> * add submit flag.
> * redefine rte_dma_sg to implement asymmetric copy.
> * delete some reserved field for future use.
> * rearrangement rte_dmadev/rte_dmadev_data struct.
> * refresh rte_dmadev.h copyright.
> * update vchan setup parameter.
> * modified some inappropriate descriptions.
> * arrange version.map alphabetically.
> * other minor modifications from review comment.
> ---
>  MAINTAINERS                  |   4 +
>  config/rte_config.h          |   3 +
>  lib/dmadev/meson.build       |   7 +
>  lib/dmadev/rte_dmadev.c      | 561 +++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 968 +++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h | 161 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |  72 ++++
>  lib/dmadev/version.map       |  37 ++
>  lib/meson.build              |   1 +


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v3] dmadev: introduce DMA device library
  2021-07-13 13:06  3%   ` fengchengwen
@ 2021-07-13 13:37  0%     ` Bruce Richardson
  2021-07-15  6:44  0%       ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2021-07-13 13:37 UTC (permalink / raw)
  To: fengchengwen
  Cc: thomas, ferruh.yigit, jerinj, jerinjacobk, andrew.rybchenko, dev,
	mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor,
	konstantin.ananyev

On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> Thank you for your valuable comments, and I think we've taken a big step forward.
> 
> @andrew Could you provide the copyright line so that I can add it to the relevant file.
> 
> @bruce, jerin  Some unmodified review comments are returned here:

Thanks. Some further comments inline below. Most points you make I'm ok
with, but I do disagree on a number of others.

/Bruce

> 
> 1.
> COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
> to be wrapped so aggressively.
> 
> REPLY: Our CI still has an 80-character limit, and I see most frameworks still comply.
> 
Ok.

> 2.
> COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
> RTE_DMA_DIRECTION_...
> 
> REPLY: adding 'DIRECTION' may make the macro too long; I prefer to keep it simple.
> 
DIRECTION could be shortened to DIR, but I think this is probably ok as is
too.

> 3.
> COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> We are not making release as pubic API in other device class. See ethdev spec.
> bbdev/eventdev/rawdev
> 
> REPLY: because ethdev's queues are hardware queues, while these are software-defined channels,
> I think release is OK. BTW: bbdev/eventdev also have release ops.
> 
Ok

> 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> fields */
> > +};
> Please add the capability for each counter in info structure as one
> device may support all the counters.
> 
> REPLY: This is a statistics function. If this function is not supported,
> then there is no need to implement the stats ops function. The
> unimplemented counters could also be set to zero.
> 
+1
The stats functions should be a minimum set that is supported by all
drivers. Each of these stats can be easily tracked by software if HW
support for it is not available, so I agree that we should not have each
stat as a capability.
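
The point about tracking stats in software can be sketched as follows. The struct and function names here are hypothetical stand-ins, not the proposed dmadev API: a driver without HW counters just bumps counters in its enqueue/completion paths, so stats_get becomes a plain copy-out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-vchan stats discussed above;
 * field names are illustrative, not the dmadev ABI under review. */
struct vchan_stats {
	uint64_t submitted;
	uint64_t completed;
	uint64_t errors;
};

/* Driver-side software tracking: bump counters in the datapath... */
static void sw_track_submit(struct vchan_stats *s, uint16_t n)
{
	s->submitted += n;
}

static void sw_track_complete(struct vchan_stats *s, uint16_t ok, uint16_t fail)
{
	s->completed += ok + fail;
	s->errors += fail;
}

/* ...so the stats_get op is a plain copy, with no HW support needed. */
static void sw_stats_get(const struct vchan_stats *s, struct vchan_stats *out)
{
	memcpy(out, s, sizeof(*out));
}
```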

> 5.
> COMMENT: > +#endif
> > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> Instead of every driver set the NOP function, In the common code, If
> the CAPA is not set,
> common code can set NOP function for this with <0 return value.
> 
> REPLY: I don't think it's a good idea to make that judgement in the I/O path; it's the application's
> duty to ensure it doesn't call an API the driver does not support (which it can learn from the capabilities).
> 
For datapath functions, +1.

> 6.
> COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > +                          const uint16_t nb_status, uint32_t *status,
> uint32_t -> enum rte_dma_status_code
> 
> REPLY: I'm still evaluating this. It takes a long time for the driver to perform error-code
> conversion in this API. Do we need to provide a separate error-code conversion function?
> 
It's not that difficult a conversion to do, and so long as we have the
regular "completed" function which doesn't do all the error manipulation we
should be fine. Performance in the case of errors is not expected to be as
good, since errors should be very rare.

> 7.
> COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > +                                struct rte_dmadev_info *dev_info);
> Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
> as this header is exported.
> 
> REPLY: I prefer not to add the 'rte_' prefix; it makes the define too long.
> 
I disagree on this, they need the rte_ prefix, despite the fact it makes
them longer. If length is a concern, these can be changed from "dmadev_" to
"rte_dma_", which is only one character longer.
In fact, I believe Morten already suggested we use "rte_dma" rather than
"rte_dmadev" as a function prefix across the library.

> 8.
> COMMENT: > + *        - rte_dmadev_completed_fails()
> > + *            - return the number of operation requests failed to complete.
> Please rename this to "completed_status" to allow the return of information
> other than just errors. As I suggested before, I think this should also be
> usable as a slower version of "completed" even in the case where there are
> no errors, in that it returns status information for each and every job
> rather than just returning as soon as it hits a failure.
> 
> REPLY: well, I think that may be confusing (the current OK/FAIL API is easy to understand),
> and we can build the slow-path function on top of the two APIs.
> 
I still disagree on this too. We have a "completed" op where we get
informed of what has completed and minimal error indication, and a
"completed_status" operation which provides status information for each
operation completed, at the cost of speed.
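
As a rough illustration of the split being argued for here, the toy model below stands in for the real driver (the names and the per-job status representation are invented for the sketch): a fast "completed" that stops at the first failure, and a "completed_status" that reports status for every job at extra cost.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in: 'ring' holds one status code per finished job (0 == OK).
 * The real dmadev calls and status enum are still under discussion. */
enum { JOB_OK = 0, JOB_BUS_ERR = 1 };

/* Fast path: report the count of contiguous successes, stop at the
 * first failure and flag that an error was hit. */
static uint16_t toy_completed(const uint8_t *ring, uint16_t n, int *has_error)
{
	uint16_t i;
	for (i = 0; i < n; i++) {
		if (ring[i] != JOB_OK) {
			*has_error = 1;
			return i;
		}
	}
	*has_error = 0;
	return n;
}

/* Slow path: per-job status for everything, errors or not. */
static uint16_t toy_completed_status(const uint8_t *ring, uint16_t n,
		uint8_t *status)
{
	uint16_t i;
	for (i = 0; i < n; i++)
		status[i] = ring[i];
	return n;
}
```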

> 9.
> COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> > +/**< DMA device support mem-to-mem transfer.
> Do we need this? Can we assume that any device appearing as a dmadev can
> do mem-to-mem copies, and drop the capability for mem-to-mem and the
> capability for copying?
> also for RTE_DMA_DEV_CAPA_OPS_COPY
> 
> REPLY: yes, I insist on adding this for the sake of conceptual integrity.
> For the ioat driver, just make a statement.
> 

Ok. It seems a wasted bit to me, but I don't see us running out of them
soon.

> 10.
> COMMENT: > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > +};
> Let's add rte_dmadev_conf struct into this to return the configuration
> settings.
> 
> REPLY: If we add rte_dmadev_conf in, it may break ABI whenever rte_dmadev_conf adds fields.
> 
Yes, that is true, but I fail to see why that is a major problem. It just
means that if the conf structure changes we have two functions to version
instead of one. The information is still useful.

If you don't want the actual conf structure explicitly put into the info
struct, we can instead put the fields in directly. I really think that the
info_get function should provide back to the user the details of what way
the device was configured previously.
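
A minimal sketch of what reporting the configuration back from info_get could look like (illustrative names only, not the actual dmadev structs under review), either by embedding the conf struct or by copying its fields in directly:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shapes for the two structs being discussed. */
struct toy_dma_conf {
	uint16_t max_vchans;
	uint16_t nb_vchans;	/* number of virtual DMA channels configured */
};

struct toy_dma_info {
	uint64_t dev_capa;
	struct toy_dma_conf conf;	/* conf applied at configure time */
};

/* info_get reports back exactly what the device was configured with. */
static void toy_info_get(const struct toy_dma_conf *applied,
		struct toy_dma_info *info)
{
	info->conf = *applied;
}
```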

regards,
/Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH 00/10] new features for ipsec and security libraries
@ 2021-07-13 13:35  3% Radu Nicolau
  0 siblings, 0 replies; 200+ results
From: Radu Nicolau @ 2021-07-13 13:35 UTC (permalink / raw)
  Cc: dev, Radu Nicolau, Declan Doherty, Abhijit Sinha, Daniel Martin Buckley

Add support for:
TSO, NAT-T/UDP encapsulation, ESN
AES_CCM, CHACHA20_POLY1305 and AES_GMAC
SA telemetry
mbuf offload flags
Initial SQN value

This patchset introduces ABI breakages and it is intended for 21.11 release

Signed-off-by: Declan Doherty <declan.doherty@intel.com>
Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
Signed-off-by: Abhijit Sinha <abhijit.sinha@intel.com>
Signed-off-by: Daniel Martin Buckley <daniel.m.buckley@intel.com>

Radu Nicolau (10):
  security: add support for TSO on IPsec session
  security: add UDP params for IPsec NAT-T
  security: add ESN field to ipsec_xform
  mbuf: add IPsec ESP tunnel type
  ipsec: add support for AEAD algorithms
  ipsec: add transmit segmentation offload support
  ipsec: add support for NAT-T
  ipsec: add support for SA telemetry
  ipsec: add support for initial SQN value
  ipsec: add ol_flags support

 lib/ipsec/crypto.h          | 137 ++++++++++++
 lib/ipsec/esp_inb.c         |  88 +++++++-
 lib/ipsec/esp_outb.c        | 262 +++++++++++++++++++----
 lib/ipsec/iph.h             |  23 +-
 lib/ipsec/meson.build       |   2 +-
 lib/ipsec/rte_ipsec.h       |  11 +
 lib/ipsec/rte_ipsec_sa.h    |  11 +-
 lib/ipsec/sa.c              | 406 ++++++++++++++++++++++++++++++++++--
 lib/ipsec/sa.h              |  43 ++++
 lib/ipsec/version.map       |   8 +
 lib/mbuf/rte_mbuf_core.h    |   1 +
 lib/security/rte_security.h |  31 +++
 12 files changed, 950 insertions(+), 73 deletions(-)

-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-13 12:33  3%                     ` Ananyev, Konstantin
@ 2021-07-13 14:08  0%                       ` Ananyev, Konstantin
  2021-07-13 15:58  0%                         ` Nithin Dabilpuram
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-13 14:08 UTC (permalink / raw)
  To: Ananyev, Konstantin, Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang


> 
> Adding more rte_security and PMD maintainers into the loop.
> 
> > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > work on Tx.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > >
> > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > >
> > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > >
> > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > >
> > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > >
> > > > > > > > > > That is the semantics I was trying to define. I think below is the sequence of
> > > > > > > > > > operations to be done for IPsec processing,
> > > > > > > > > >
> > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > 7. Send packet out.
> > > > > > > > > >
> > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > ipsec-secgw is following.
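
For illustration only, the seven steps above can be modeled on a flat buffer instead of an rte_mbuf (all names here are invented; the real datapath uses rte_mbuf, rte_security_set_pkt_metadata() and PKT_TX_SEC_OFFLOAD):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define L2_LEN 14	/* assumed Ethernet header size */

/* Toy packet: 'off' plays the role of data_off, 'sec_offload' stands
 * in for the PKT_TX_SEC_OFFLOAD flag. */
struct toy_pkt {
	uint8_t data[64];
	uint16_t off;		/* current start of valid data */
	uint16_t len;
	int sec_offload;
};

/* step 2: strip the received L2 header, leaving only L3+ data */
static void strip_l2(struct toy_pkt *p) { p->off += L2_LEN; p->len -= L2_LEN; }

/* step 4: attach security metadata while only L3+ data is present */
static void set_sec_metadata(struct toy_pkt *p) { p->sec_offload = 1; }

/* step 6: prepend the (possibly different) L2 header after route lookup */
static void add_l2(struct toy_pkt *p, const uint8_t *hdr)
{
	p->off -= L2_LEN; p->len += L2_LEN;
	memcpy(p->data + p->off, hdr, L2_LEN);
}
```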
> > > > > > > > >
> > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > >
> > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > >
> > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > Since rte_security_set_pkt_metadata() is a PMD-specific function pointer call, we are currently doing some pre-processing
> > > > > > > > here before submitting the packet to inline IPsec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > If we cannot know for sure the pkt content at the time of the rte_security_set_pkt_metadata() call, then I think
> > > > > > > > having a PMD-specific callback is of not much use except for saving SA priv data to the rte_mbuf.
> > > > > > > >
> > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > > >
> > > > > > > > This is also fine with us.
> > > > > > >
> > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > Yes.
> > > > > >
> > > > > > >
> > > > > > > Is that correct understanding?
> > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > >
> > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > >
> > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > rte_security_set_pkt_metadata() is called again.
> > > > >
> > > > > Yep, I do have a concern here.
> > > > > Right now it is perfectly valid to do something like that:
> > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > /* can modify contents of the packet */
> > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > >
> > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > You can still modify the L2 header; IPsec is only concerned with L3 and above.
> > > >
> > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > > PMDs, which are the only ones implementing the callback, are already expecting the
> > > > same?
> > >
> > > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > > All that ixgbe callback does - store session related data inside mbuf.
> > > It's only expectation to have ESP trailer at the proper place (after ICV):
> >
> > This implies rte_security_set_pkt_metadata() cannot be called when the mbuf doesn't
> > have the ESP trailer updated or when mbuf->pkt_len = 0
> >
> > >
> > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > >                                 rte_security_dynfield(m);
> > >   mdata->enc = 1;
> > >   mdata->sa_idx = ic_session->sa_index;
> > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > >
> > > Then this data will be used by tx_burst() function.
> > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > will be using incorrect pad len ?
> 
> No, pkt_len can be modified.
> Though ESP trailer pad_len can't.
> 
> >
> > This patch is also trying to add similar restriction on when
> > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > calling rte_security_set_pkt_metadata().
> 
> No, I don't think it is really the same.
> Also, IMO, inside the ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
> that the ESP packet is already formed and the trailer contains valid data.
> In fact, I think this pad_len calculation can be moved to actual TX function.
> 
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > security,
> > > > >
> > > > > Yes, that was my thought.
> > > > >
> > > > > > my only argument was that since there is already a hit in
> > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > struct rte_security_session is passed as an argument to it, it is more benefitial to
> > > > > > do security related pre-processing there.
> > > > >
> > > > > Yes, it would be extra callback call that way.
> > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > of function call will be spread around the whole burst, and I presume
> > > > > shouldn't be too high.
> > > > >
> > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > non-security pkts.
> > > > >
> > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > modifications are required for the packet.
> > > >
> > > > But the major issues I see are
> > > >
> > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > >    In our case, we need to know the security session details to do things.
> > >
> > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> >
> > We can do that. But having to make a PMD-specific function call via rte_security_set_pkt_metadata()
> > just for storing a session pointer in rte_security_dynfield consumes unnecessary
> > cycles per pkt.
> 
> In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> a second for the instance->ops->set_pkt_metadata() callback.
> That is of course way too expensive for such a simple operation.
> Actually same thought for rte_security_get_userdata().
> Both of these functions belong to data-path and ideally have to be as fast as possible.
> Probably 21.11 is a right timeframe for that.
> 
> > >
> > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > >
> > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > specially for that - to store secuiryt related data inside the mbuf.
> > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > >
> > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > processing, it can be done via security specific set_pkt_metadata().
> > >
> > > But what you are proposing introduces new limitations and might break existing functionality.
> > > BTW, if you don't like to use tx_prepare() - why is doing these calculations inside tx_burst()
> > > itself not an option?
> >
> > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > rte_security_set_pkt_metadata().
> >
> > Are you fine if we update the spec to say "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > set, then the user needs to store struct rte_security_session's sess_private_data in
> > rte_security_dynfield like below"?
> >
> > <snip>
> >
> > static inline void
> > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> >         struct rte_mbuf *mb[], uint16_t num)
> > {
> >         uint32_t i, ol_flags;
> >
> >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> >         for (i = 0; i != num; i++) {
> >
> >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> >
> >                 if (ol_flags != 0)
> >                         rte_security_set_pkt_metadata(ss->security.ctx,
> >                                 ss->security.ses, mb[i], NULL);
> > 		else
> >                 	*rte_security_dynfield(mb[i]) =
> >                                 (uint64_t)ss->security.ses->sess_private_data;
> >
> >
> > If the above can be done, then in our PMD, we will not have a callback for
> > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > in capabilities.
> 
> That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> So all existing apps that use this API will have to be changed.
> We'd better avoid such changes unless there is really good reason for that.
> So, I'd suggest to tweak your idea a bit:
> 
> 1) change rte_security_set_pkt_metadata():
> if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> otherwise just: rte_security_dynfield(m) = sess->session_private_data;
> (fast-path)
> 
> 2) consider to make rte_security_set_pkt_metadata() inline function.
> We probably can have some special flag inside struct rte_security_ctx,
> or even store inside ctx a pointer to set_pkt_metadata() itself.

After further thought, some new flags might be better.
Then later, if we realize that set_pkt_metadata() and get_userdata()
are not really used by PMDs, it might be easier to deprecate these callbacks.

> 
> As a brief code snippet:
> 
> struct rte_security_ctx {
>         void *device;
>         /**< Crypto/ethernet device attached */
>         const struct rte_security_ops *ops;
>         /**< Pointer to security ops for the device */
>         uint16_t sess_cnt;
>         /**< Number of sessions attached to this context */
> +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> };
> 
> static inline int
> rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                               struct rte_security_session *sess,
>                               struct rte_mbuf *m, void *params)
> {
>      /* fast-path */
>       if (instance->set_pkt_mdata == NULL) {
>              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
>              return 0;
>        /* slow path */
>        } else
>            return instance->set_pkt_mdata(instance->device, sess, m, params);
> }
> 
> That probably would be an ABI breakage (new fileld in rte_security_ctx) and would require
> some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> (ctx_create()), but hopefully will benefit everyone.
> 
> >
> > >
> > > > I'm fine to
> > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > compensate for the overhead.
> > > >
> > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > rte_mbuf had space for struct rte_security_session pointer,
> > >
> > > But it does, see above.
> > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > If your PMD requires more data to be associated with mbuf
> > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > >
> > > > then then I guess it would have been better to do the way you proposed.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > called.
> > > > > > > > > >
> > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > Does it makes sense ?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Once called,
> > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > >
> > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > >
> > > > > > > > > > > >  /**
> > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > >   */
> > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.25.1
> > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
    2021-07-12 12:05  3%   ` Bruce Richardson
  2021-07-12 15:50  3%   ` Bruce Richardson
@ 2021-07-13 14:19  3%   ` Ananyev, Konstantin
  2021-07-13 14:28  0%     ` Bruce Richardson
  2 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-13 14:19 UTC (permalink / raw)
  To: Chengwen Feng, thomas, Yigit, Ferruh, Richardson,  Bruce, jerinj,
	jerinjacobk
  Cc: dev, mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor, liangma


> +#include "rte_dmadev_core.h"
> +
> +/**
> + *  DMA flags to augment operation preparation.
> + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> + */
> +#define RTE_DMA_FLAG_FENCE	(1ull << 0)
> +/**< DMA fence flag
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + * @see rte_dmadev_fill_sg()
> + */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a copy operation onto the virtual DMA channel.
> + *
> + * This queues up a copy operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the data to be copied.
> + * @param flags
> + *   An flags for this operation.
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> +		uint32_t length, uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];

One question I have - did you guys consider hiding definitions of struct rte_dmadev 
and  rte_dmadevices[] into .c straight from the start?
Probably no point to repeat our famous ABI ethdev/cryptodev/... pitfalls here.  
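
A sketch of the hidden-definition alternative being asked about, using invented "toy" names throughout: the device struct and array stay private to the .c file, and the public entry point is a plain (non-inline) function, at the price of one extra call per operation.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical driver enqueue callback type. */
typedef int (*copy_fn)(void *dev_priv, uint32_t len);

/* In a real library this definition would live only in rte_dmadev.c;
 * the public header would expose just the prototype below. */
struct toy_dmadev { copy_fn copy; void *priv; };
static struct toy_dmadev toy_dmadevices[8];

/* Example driver callback, purely for the demo: echoes a job index. */
static int demo_copy(void *dev_priv, uint32_t len)
{
	(void)dev_priv;
	return (int)len;
}

/* Public, non-inline entry point: callers never see struct toy_dmadev,
 * so its layout can change without an ABI break. */
int toy_dmadev_copy(uint16_t dev_id, uint32_t len)
{
	struct toy_dmadev *dev = &toy_dmadevices[dev_id];
	if (dev->copy == NULL)
		return -1;	/* would be -ENOTSUP in real code */
	return dev->copy(dev->priv, len);
}
```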

> +#ifdef RTE_DMADEV_DEBUG
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
> +		return -EINVAL;
> +	}
> +#endif
> +	return (*dev->copy)(dev, vchan, src, dst, length, flags);
> +}
> +
> +/**

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library
  2021-07-13 14:19  3%   ` Ananyev, Konstantin
@ 2021-07-13 14:28  0%     ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-13 14:28 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Chengwen Feng, thomas, Yigit, Ferruh, jerinj, jerinjacobk, dev,
	mb, nipun.gupta, hemant.agrawal, maxime.coquelin,
	honnappa.nagarahalli, david.marchand, sburla, pkapoor, liangma

On Tue, Jul 13, 2021 at 03:19:39PM +0100, Ananyev, Konstantin wrote:
> 
> > +#include "rte_dmadev_core.h"
> > +
> > +/**
> > + *  DMA flags to augment operation preparation.
> > + *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill/fill_sg.
> > + */
> > +#define RTE_DMA_FLAG_FENCE   (1ull << 0)
> > +/**< DMA fence flag
> > + * It means the operation with this flag must be processed only after all
> > + * previous operations are completed.
> > + *
> > + * @see rte_dmadev_copy()
> > + * @see rte_dmadev_copy_sg()
> > + * @see rte_dmadev_fill()
> > + * @see rte_dmadev_fill_sg()
> > + */
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice.
> > + *
> > + * Enqueue a copy operation onto the virtual DMA channel.
> > + *
> > + * This queues up a copy operation to be performed by hardware, but does not
> > + * trigger hardware to begin that operation.
> > + *
> > + * @param dev_id
> > + *   The identifier of the device.
> > + * @param vchan
> > + *   The identifier of virtual DMA channel.
> > + * @param src
> > + *   The address of the source buffer.
> > + * @param dst
> > + *   The address of the destination buffer.
> > + * @param length
> > + *   The length of the data to be copied.
> > + * @param flags
> > + *   The flags for this operation.
> > + *
> > + * @return
> > + *   - 0..UINT16_MAX: index of enqueued copy job.
> > + *   - <0: Error code returned by the driver copy function.
> > + */
> > +__rte_experimental
> > +static inline int
> > +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> > +             uint32_t length, uint64_t flags)
> > +{
> > +     struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> 
> One question I have - did you guys consider hiding the definitions of struct rte_dmadev
> and rte_dmadevices[] in the .c file straight from the start?
> Probably no point in repeating our famous ethdev/cryptodev/... ABI pitfalls here.
> 
I considered it, but I found even moving one operation (the doorbell one)
to be non-inline made a small but noticeable perf drop. Until we get all the
drivers done and more testing in various scenarios, I'd rather err on the
side of getting the best performance.

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-13 14:08  0%                       ` Ananyev, Konstantin
@ 2021-07-13 15:58  0%                         ` Nithin Dabilpuram
  2021-07-14 11:09  0%                           ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Nithin Dabilpuram @ 2021-07-13 15:58 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	Radu, jiawenwu, jianwang

On Tue, Jul 13, 2021 at 02:08:18PM +0000, Ananyev, Konstantin wrote:
> 
> > 
> > Adding more rte_security and PMD maintainers into the loop.
> > 
> > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > >
> > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > >
> > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > >
> > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > >
> > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > >
> > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > >
> > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > >
> > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > >
> > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > >
> > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > >
> > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > >
> > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling rte_security_set_pkt_metadata().
> > > > > > > > >
> > > > > > > > > This is also fine with us.
> > > > > > > >
> > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > Yes.
> > > > > > >
> > > > > > > >
> > > > > > > > Is that correct understanding?
> > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > >
> > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > >
> > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > >
> > > > > > Yep, I do have a concern here.
> > > > > > Right now it is perfectly valid to do something like that:
> > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > /* can modify contents of the packet */
> > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > >
> > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > >
> > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > and above header modifications is no a problem. I guess existing ixgbe/txgbe
> > > > > PMD which are the ones only implementing the call back are already expecting the
> > > > > same ?
> > > >
> > > > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > > > All that ixgbe callback does - store session related data inside mbuf.
> > > > It's only expectation to have ESP trailer at the proper place (after ICV):
> > >
> > > This implies rte_security_set_pkt_metadata() cannot be called when mbuf doesn't
> > > have ESP trailer updated or when mbuf->pkt_len = 0
> > >
> > > >
> > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > >                                 rte_security_dynfield(m);
> > > >   mdata->enc = 1;
> > > >   mdata->sa_idx = ic_session->sa_index;
> > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > >
> > > > Then this data will be used by tx_burst() function.
> > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > will be using incorrect pad len ?
> > 
> > No, pkt_len can be modified.
> > Though ESP trailer pad_len can't.
> > 
> > >
> > > This patch is also trying to add similar restriction on when
> > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > calling rte_security_set_pkt_metadata().
> > 
> > No, I don't think it is really the same.
> > Also, IMO, inside ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
> > that ESP packet is already formed and trailer contains valid data.
> > In fact, I think this pad_len calculation can be moved to actual TX function.
> > 
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > security,
> > > > > >
> > > > > > Yes, that was my thought.
> > > > > >
> > > > > > > my only argument was that since there is already a hit in
> > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > struct rte_security_session is passed as an argument to it, it is more benefitial to
> > > > > > > do security related pre-processing there.
> > > > > >
> > > > > > Yes, it would be extra callback call that way.
> > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > shouldn't be too high.
> > > > > >
> > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > non-security pkts.
> > > > > >
> > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > modifications are required for the packet.
> > > > >
> > > > > But the major issues I see are
> > > > >
> > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > >    In our case, we need to know the security session details to do things.
> > > >
> > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > >
> > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > cycles per pkt.
> > 
> > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > second for  instance->ops->set_pkt_metadata() callback.
> > Which is of course way too expensive for such a simple operation.
> > Actually same thought for rte_security_get_userdata().
> > Both of these functions belong to data-path and ideally have to be as fast as possible.
> > Probably 21.11 is a right timeframe for that.
> > 
> > > >
> > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > > >
> > > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > > specially for that - to store secuiryt related data inside the mbuf.
> > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > >
> > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > processing, it can be done via security specific set_pkt_metadata().
> > > >
> > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > > itself is not an option?
> > >
> > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > rte_security_set_pkt_metadata().
> > >
> > > Are you fine if we can update the spec that "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > set, then the user needs to update struct rte_security_session's sess_private_data in
> > > rte_security_dynfield like below ?
> > >
> > > <snip>
> > >
> > > static inline void
> > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > >         struct rte_mbuf *mb[], uint16_t num)
> > > {
> > >         uint32_t i, ol_flags;
> > >
> > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> > >         for (i = 0; i != num; i++) {
> > >
> > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > >
> > >                 if (ol_flags != 0)
> > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > >                                 ss->security.ses, mb[i], NULL);
> > > 		else
> > >                 	*rte_security_dynfield(mb[i]) =
> > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > >
> > >
> > > If the above can be done, then in our PMD, we will not have a callback for
> > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > in capabilities.
> > 
> > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > So all existing apps that use this API will have to be changed.
> > We'd better avoid such changes unless there is really good reason for that.
> > So, I'd suggest to tweak your idea a bit:
> > 
> > 1) change rte_security_set_pkt_metadata():
> > if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> > otherwise just: rte_security_dynfield(m) = sess->session_private_data;
> > (fast-path)
> > 
> > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > We probably can have some special flag inside struct rte_security_ctx,
> > or even store inside ctx a pointer to set_pkt_metadata() itself.
> 
> After another thoughts some new flags might be better.
> Then later, if we'll realize that set_pkt_metadata() and get_useradata()
> are not really used by PMDs, it might be easier to deprecate these callbacks.

Thanks, I agree with your thoughts. I'll submit a V2 with the above change (new flags,
and the set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx)
for review so that it can be targeted for 21.11.

Even with the flags, moving the set_pkt_metadata() and get_userdata() function pointers
is still needed, as we need to make the rte_security_set_pkt_metadata() API inline while
struct rte_security_ops is not exposed to the user. I think this is fine, as it is in line
with how the fast-path function pointers of rte_ethdev and rte_cryptodev are currently placed.

> 
> > 
> > As a brief code snippet:
> > 
> > struct rte_security_ctx {
> >         void *device;
> >         /**< Crypto/ethernet device attached */
> >         const struct rte_security_ops *ops;
> >         /**< Pointer to security ops for the device */
> >         uint16_t sess_cnt;
> >         /**< Number of sessions attached to this context */
> > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > };
> > 
> > static inline int
> > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                               struct rte_security_session *sess,
> >                               struct rte_mbuf *m, void *params)
> > {
> >      /* fast-path */
> >       if (instance->set_pkt_mdata == NULL) {
> >              *rte_security_dynfield(m) = (rte_security_dynfield_t)(session->sess_priv_data);
> >              return 0;
> >        /* slow path */
> >        } else
> >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > }
> > 
> > That probably would be an ABI breakage (new fileld in rte_security_ctx) and would require
> > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> > (ctx_create()), but hopefully will benefit everyone.
> > 
> > >
> > > >
> > > > > I'm fine to
> > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > compensate for the overhead.
> > > > >
> > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > >
> > > > But it does, see above.
> > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > If your PMD requires more data to be associated with mbuf
> > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > >
> > > > > then then I guess it would have been better to do the way you proposed.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > > called.
> > > > > > > > > > >
> > > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > Does it makes sense ?
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > >
> > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > >
> > > > > > > > > > > > >  /**
> > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > >   */
> > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe
@ 2021-07-13 20:12  3% Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2021-07-13 20:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, anatoly.burakov

The first argument to rte_bsf32_safe was incorrectly declared as
a 64 bit value. This function only correctly handles 32 bit values,
and the underlying function rte_bsf32 only accepts 32 bit values.
This was introduced when the safe version was added and was probably
caused by copy/paste from the 64 bit version.

The bug passed silently under the radar until some other code was
built with -Wall and -Wextra in C++, where C++ complains about the
missing cast.

Yes, this is an API signature change, but the original code was wrong.
It is an inline function, so not an ABI change.

Fixes: 4e261f551986 ("eal: add 64-bit bsf and 32-bit safe bsf functions")
Cc: anatoly.burakov@intel.com
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/include/rte_common.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index d5a32c66a5fe..99eb5f1820ae 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -623,7 +623,7 @@ rte_bsf32(uint32_t v)
  *     Returns 0 if ``v`` was 0, otherwise returns 1.
  */
 static inline int
-rte_bsf32_safe(uint64_t v, uint32_t *pos)
+rte_bsf32_safe(uint32_t v, uint32_t *pos)
 {
 	if (v == 0)
 		return 0;
-- 
2.30.2


^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-13 15:58  0%                         ` Nithin Dabilpuram
@ 2021-07-14 11:09  0%                           ` Ananyev, Konstantin
  2021-07-14 13:29  0%                             ` Nithin Dabilpuram
  0 siblings, 1 reply; 200+ results
From: Ananyev, Konstantin @ 2021-07-14 11:09 UTC (permalink / raw)
  To: Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang

> > >
> > > Adding more rte_security and PMD maintainers into the loop.
> > >
> > > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > > >
> > > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > > >
> > > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > > >
> > > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > > >
> > > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > > >
> > > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > > >
> > > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > > >
> > > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling
> rte_security_set_pkt_metadata().
> > > > > > > > > >
> > > > > > > > > > This is also fine with us.
> > > > > > > > >
> > > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Is that correct understanding?
> > > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > > >
> > > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > > >
> > > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > > >
> > > > > > > Yep, I do have a concern here.
> > > > > > > Right now it is perfectly valid to do something like that:
> > > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > > /* can modify contents of the packet */
> > > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > > >
> > > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > > >
> > > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > > and above header modifications is not a problem. I guess existing ixgbe/txgbe
> > > > > > PMD which are the ones only implementing the call back are already expecting the
> > > > > > same ?
> > > > >
> > > > > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > > > > All that ixgbe callback does - store session related data inside mbuf.
> > > > > It's only expectation to have ESP trailer at the proper place (after ICV):
> > > >
> > > > This implies rte_security_set_pkt_metadata() cannot be called when mbuf doesn't
> > > > have ESP trailer updated or when mbuf->pkt_len = 0
> > > >
> > > > >
> > > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > > >                                 rte_security_dynfield(m);
> > > > >   mdata->enc = 1;
> > > > >   mdata->sa_idx = ic_session->sa_index;
> > > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > > >
> > > > > Then this data will be used by tx_burst() function.
> > > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > > will be using incorrect pad len ?
> > >
> > > No, pkt_len can be modified.
> > > Though ESP trailer pad_len can't.
> > >
> > > >
> > > > This patch is also trying to add similar restriction on when
> > > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > > calling rte_security_set_pkt_metadata().
> > >
> > > No, I don't think it is really the same.
> > > Also, IMO, inside ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
> > > that ESP packet is already formed and trailer contains valid data.
> > > In fact, I think this pad_len calculation can be moved to actual TX function.
> > >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > > security,
> > > > > > >
> > > > > > > Yes, that was my thought.
> > > > > > >
> > > > > > > > my only argument was that since there is already a hit in
> > > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > > > do security related pre-processing there.
> > > > > > >
> > > > > > > Yes, it would be extra callback call that way.
> > > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > > shouldn't be too high.
> > > > > > >
> > > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > > non-security pkts.
> > > > > > >
> > > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > > modifications are required for the packet.
> > > > > >
> > > > > > But the major issues I see are
> > > > > >
> > > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > > >    In our case, we need to know the security session details to do things.
> > > > >
> > > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > > >
> > > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > > cycles per pkt.
> > >
> > > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > > second for  instance->ops->set_pkt_metadata() callback.
> > > Which is of course way too expensive for such a simple operation.
> > > Actually same thought for rte_security_get_userdata().
> > > Both of these functions belong to data-path and ideally have to be as fast as possible.
> > > Probably 21.11 is a right timeframe for that.
> > >
> > > > >
> > > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > > > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > > > >
> > > > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > > > specifically for that - to store security related data inside the mbuf.
> > > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > > >
> > > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > > processing, it can be done via security specific set_pkt_metadata().
> > > > >
> > > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > > > itself is not an option?
> > > >
> > > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > > rte_security_set_pkt_metadata().
> > > >
> > > > Are you fine if we can update the spec that "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > > set, then, user needs to update struct rte_security_session's sess_private_data in
> > > > rte_security_dynfield like below ?
> > > >
> > > > <snip>
> > > >
> > > > static inline void
> > > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > > >         struct rte_mbuf *mb[], uint16_t num)
> > > > {
> > > >         uint32_t i, ol_flags;
> > > >
> > > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OLOAD_NEED_MDATA;
> > > >         for (i = 0; i != num; i++) {
> > > >
> > > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > > >
> > > >                 if (ol_flags != 0)
> > > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > > >                                 ss->security.ses, mb[i], NULL);
> > > > 		else
> > > >                 	*rte_security_dynfield(mb[i]) =
> > > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > > >
> > > >
> > > > If the above can be done, then in our PMD, we will not have a callback for
> > > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > > in capabilities.
> > >
> > > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > > So all existing apps that use this API will have to be changed.
> > > We'd better avoid such changes unless there is really good reason for that.
> > > So, I'd suggest to tweak your idea a bit:
> > >
> > > 1) change rte_security_set_pkt_metadata():
> > > if ops->set_pkt_metadata != NULL, then call it (existing behaviour)
> > > otherwise just: rte_security_dynfield(m) = sess->session_private_data;
> > > (fast-path)
> > >
> > > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > > We probably can have some special flag inside struct rte_security_ctx,
> > > or even store inside ctx a pointer to set_pkt_metadata() itself.
> >
> > After some more thought, new flags might be better.
> > Then later, if we'll realize that set_pkt_metadata() and get_userdata()
> > are not really used by PMDs, it might be easier to deprecate these callbacks.
> 
> Thanks, I agree with your thoughts. I'll submit a V2 with above change, new flags and
> set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx for
> review so that it can be targeted for 21.11.
> 
> Even with flags moving set_pkt_metadata() and get_userdata() function pointers is still needed
> as we need to make rte_security_set_pkt_metadata() API inline while struct rte_security_ops is not
> exposed to user. I think this is fine as it is inline with how fast path function pointers
> of rte_ethdev and rte_cryptodev are currently placed.

My thought was we can get away with just flags only.
Something like that:
rte_security.h:

...

enum {
	RTE_SEC_CTX_F_FAST_SET_MDATA = 0x1,
	RTE_SEC_CTX_F_FAST_GET_UDATA = 0x2,
};

struct rte_security_ctx {
        void *device;
        /**< Crypto/ethernet device attached */
        const struct rte_security_ops *ops;
        /**< Pointer to security ops for the device */
        uint16_t sess_cnt;
        /**< Number of sessions attached to this context */
        uint32_t flags;
        /**< Flags for fast-path dispatch, e.g. RTE_SEC_CTX_F_FAST_SET_MDATA */
};

extern int
__rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                                struct rte_security_session *sess,
                                struct rte_mbuf *m, void *params);

static inline int
rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                              struct rte_security_session *sess,
                              struct rte_mbuf *m, void *params)
{
        /* fast path */
        if (instance->flags & RTE_SEC_CTX_F_FAST_SET_MDATA) {
                *rte_security_dynfield(m) =
                        (rte_security_dynfield_t)sess->sess_private_data;
                return 0;
        /* slow path */
        } else
                return __rte_security_set_pkt_metadata(instance, sess, m, params);
}

rte_security.c: 

...
/* existing one, just renamed */
int
__rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
                              struct rte_security_session *sess,
                              struct rte_mbuf *m, void *params)
{
#ifdef RTE_DEBUG
        RTE_PTR_OR_ERR_RET(sess, -EINVAL);
        RTE_PTR_OR_ERR_RET(instance, -EINVAL);
        RTE_PTR_OR_ERR_RET(instance->ops, -EINVAL);
#endif
        RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->set_pkt_metadata, -ENOTSUP);
        return instance->ops->set_pkt_metadata(instance->device,
                                               sess, m, params);
}


I think both ways are possible (flags vs actual func pointers) and both have
some pluses and minuses.
I suppose the main choice here is what we think the future of
set_pkt_metadata() and rte_security_get_userdata() should be.
If we think that they will be useful for some future PMDs and we want to keep them,
then probably storing actual func pointers inside ctx is a better approach.
If not, then flags seems like a better one, as in that case we can eventually
deprecate and remove these callbacks.
From what I see right now, custom callbacks seem excessive,
and rte_security_dynfield is enough.
But maybe there are some future plans that would require them?
 
> 
> >
> > >
> > > As a brief code snippet:
> > >
> > > struct rte_security_ctx {
> > >         void *device;
> > >         /**< Crypto/ethernet device attached */
> > >         const struct rte_security_ops *ops;
> > >         /**< Pointer to security ops for the device */
> > >         uint16_t sess_cnt;
> > >         /**< Number of sessions attached to this context */
> > > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > > };
> > >
> > > static inline int
> > > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> > >                               struct rte_security_session *sess,
> > >                               struct rte_mbuf *m, void *params)
> > > {
> > >      /* fast-path */
> > >       if (instance->set_pkt_mdata == NULL) {
> > >              *rte_security_dynfield(m) = (rte_security_dynfield_t)sess->sess_private_data;
> > >              return 0;
> > >        /* slow path */
> > >        } else
> > >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > > }
> > >
> > > That probably would be an ABI breakage (new field in rte_security_ctx) and would require
> > > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> > > (ctx_create()), but hopefully will benefit everyone.
> > >
> > > >
> > > > >
> > > > > > I'm fine to
> > > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > > compensate for the overhead.
> > > > > >
> > > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > > >
> > > > > But it does, see above.
> > > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > > If your PMD requires more data to be associated with mbuf
> > > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > > >
> > > > > > then then I guess it would have been better to do the way you proposed.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > > > called.
> > > > > > > > > > > >
> > > > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > > Does it makes sense ?
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  /**
> > > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > > >   */
> > > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-14 11:09  0%                           ` Ananyev, Konstantin
@ 2021-07-14 13:29  0%                             ` Nithin Dabilpuram
  2021-07-14 17:28  0%                               ` Ananyev, Konstantin
  0 siblings, 1 reply; 200+ results
From: Nithin Dabilpuram @ 2021-07-14 13:29 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	Radu, jiawenwu, jianwang

On Wed, Jul 14, 2021 at 11:09:08AM +0000, Ananyev, Konstantin wrote:
> > > >
> > > > Adding more rte_security and PMD maintainers into the loop.
> > > >
> > > > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > > > >
> > > > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > > > >
> > > > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > > > >
> > > > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > > > >
> > > > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > > > >
> > > > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > > > Since rte_security_set_pkt_metadata() is PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > > > here before submitting packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > > > If we cannot know for sure, the pkt content at the time of rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > > > having a PMD specific callback is not much of use except for saving SA priv data to rte_mbuf.
> > > > > > > > > > >
> > > > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling
> > rte_security_set_pkt_metadata().
> > > > > > > > > > >
> > > > > > > > > > > This is also fine with us.
> > > > > > > > > >
> > > > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Is that correct understanding?
> > > > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > > > >
> > > > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > > > >
> > > > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > > > >
> > > > > > > > Yep, I do have a concern here.
> > > > > > > > Right now it is perfectly valid to do something like that:
> > > > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > > > /* can modify contents of the packet */
> > > > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > > > >
> > > > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > > > >
> > > > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > > > and above header modifications is not a problem. I guess existing ixgbe/txgbe
> > > > > > > PMD which are the ones only implementing the call back are already expecting the
> > > > > > > same ?
> > > > > >
> > > > > > AFAIK, no there are no such requirements for ixgbe or txgbe.
> > > > > > All that ixgbe callback does - store session related data inside mbuf.
> > > > > > It's only expectation to have ESP trailer at the proper place (after ICV):
> > > > >
> > > > > This implies rte_security_set_pkt_metadata() cannot be called when mbuf doesn't
> > > > > have ESP trailer updated or when mbuf->pkt_len = 0
> > > > >
> > > > > >
> > > > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > > > >                                 rte_security_dynfield(m);
> > > > > >   mdata->enc = 1;
> > > > > >   mdata->sa_idx = ic_session->sa_index;
> > > > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > > > >
> > > > > > Then this data will be used by tx_burst() function.
> > > > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > > > will be using incorrect pad len ?
> > > >
> > > > No, pkt_len can be modified.
> > > > Though ESP trailer pad_len can't.
> > > >
> > > > >
> > > > > This patch is also trying to add similar restriction on when
> > > > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > > > calling rte_security_set_pkt_metadata().
> > > >
> > > > No, I don't think it is really the same.
> > > > Also, IMO, inside ixgbe set_pkt_metadata() implementation we probably shouldn't silently imply
> > > > that ESP packet is already formed and trailer contains valid data.
> > > > In fact, I think this pad_len calculation can be moved to actual TX function.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > > > security,
> > > > > > > >
> > > > > > > > Yes, that was my thought.
> > > > > > > >
> > > > > > > > > my only argument was that since there is already a hit in
> > > > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > > > > do security related pre-processing there.
> > > > > > > >
> > > > > > > > Yes, it would be extra callback call that way.
> > > > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > > > shouldn't be too high.
> > > > > > > >
> > > > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > > > non-security pkts.
> > > > > > > >
> > > > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > > > modifications are required for the packet.
> > > > > > >
> > > > > > > But the major issues I see are
> > > > > > >
> > > > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > > > >    In our case, we need to know the security session details to do things.
> > > > > >
> > > > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > > > >
> > > > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > > > cycles per pkt.
> > > >
> > > > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > > > second for  instance->ops->set_pkt_metadata() callback.
> > > > Which is of course way too expensive for such a simple operation.
> > > > Actually same thought for rte_security_get_userdata().
> > > > Both of these functions belong to data-path and ideally have to be as fast as possible.
> > > > Probably 21.11 is a right timeframe for that.
> > > >
> > > > > >
> > > > > > > 2. AFAIU tx_prepare() is not mandatory as per spec and even by default disabled under compile time
> > > > > > >    macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_mdata() is mandatory to associate
> > > > > > >    struct rte_security_session to a pkt as unlike ol_flags, there is no direct space to do the same.
> > > > > >
> > > > > > Didn't get you here, obviously we do have rte_security_dynfield inside mbuf,
> > > > > > specially for that - to store security related data inside the mbuf.
> > > > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > > > >
> > > > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > > > processing, it can be done via security specific set_pkt_metadata().
> > > > > >
> > > > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > > > BTW, if you don't like to use tx_prepare() - why doing these calculations inside tx_burst()
> > > > > > itself is not an option?
> > > > >
> > > > > We can do things in tx_burst() but if we are doing it there, then we want to avoid having callback for
> > > > > rte_security_set_pkt_metadata().
> > > > >
> > > > > Are you fine if we can update the spec that "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > > > set, then the user needs to store struct rte_security_session's sess_private_data in
> > > > > rte_security_dynfield" like below?
> > > > >
> > > > > <snip>
> > > > >
> > > > > static inline void
> > > > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > > > >         struct rte_mbuf *mb[], uint16_t num)
> > > > > {
> > > > >         uint32_t i, ol_flags;
> > > > >
> > > > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OFLOAD_NEED_MDATA;
> > > > >         for (i = 0; i != num; i++) {
> > > > >
> > > > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > > > >
> > > > >                 if (ol_flags != 0)
> > > > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > > > >                                 ss->security.ses, mb[i], NULL);
> > > > >                 else
> > > > >                         *rte_security_dynfield(mb[i]) =
> > > > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > > > >         }
> > > > > }
> > > > >
> > > > > If the above can be done, then in our PMD, we will not have a callback for
> > > > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > > > in capabilities.
> > > >
> > > > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > > > So all existing apps that use this API will have to be changed.
> > > > We'd better avoid such changes unless there is really good reason for that.
> > > > So, I'd suggest to tweak your idea a bit:
> > > >
> > > > 1) change rte_security_set_pkt_metadata():
> > > > if ops->set_pkt_metadata != NULL, then call it (existing behaviour),
> > > > otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
> > > > (fast-path)
> > > >
> > > > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > > > We probably can have some special flag inside struct rte_security_ctx,
> > > > or even store inside ctx a pointer to set_pkt_metadata() itself.
> > >
> > > After another thoughts some new flags might be better.
> > > Then later, if we'll realize that set_pkt_metadata() and get_useradata()
> > > are not really used by PMDs, it might be easier to deprecate these callbacks.
> > 
> > Thanks, I agree with your thoughts. I'll submit a V2 with above change, new flags and
> > set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx for
> > review so that it can be targeted for 21.11.
> > 
> > Even with flags, moving the set_pkt_metadata() and get_userdata() function pointers is still needed,
> > as we need to make the rte_security_set_pkt_metadata() API inline while struct rte_security_ops is not
> > exposed to the user. I think this is fine as it is in line with how the fast path function pointers
> > of rte_ethdev and rte_cryptodev are currently placed.
> 
> My thought was we can get away with just flags only.
> Something like that:
> rte_security.h:
> 
> ...
> 
> enum {
> 	RTE_SEC_CTX_F_FAST_SET_MDATA = 0x1,
> 	RTE_SEC_CTX_F_FAST_GET_UDATA = 0x2,
> };
> 
> struct rte_security_ctx {
>         void *device;
>         /**< Crypto/ethernet device attached */
>         const struct rte_security_ops *ops;
>         /**< Pointer to security ops for the device */
>         uint16_t sess_cnt;
>         /**< Number of sessions attached to this context */
>         uint32_t flags;
> };
> 
> extern int
> __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                                struct rte_security_session *sess,
>                                struct rte_mbuf *m, void *params); 
> 
> static inline int
> rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                               struct rte_security_session *sess,
>                               struct rte_mbuf *m, void *params)
> {
>         /* fast-path */
>         if (instance->flags & RTE_SEC_CTX_F_FAST_SET_MDATA) {
>                 *rte_security_dynfield(m) =
>                         (rte_security_dynfield_t)sess->sess_private_data;
>                 return 0;
>         /* slow path */
>         } else
>                 return __rte_security_set_pkt_metadata(instance, sess, m, params);
> }
> 
> rte_security.c: 
> 
> ...
> /* existing one, just renamed */
> int
> __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
>                               struct rte_security_session *sess,
>                               struct rte_mbuf *m, void *params)
> {
> #ifdef RTE_DEBUG
>         RTE_PTR_OR_ERR_RET(sess, -EINVAL);
>         RTE_PTR_OR_ERR_RET(instance, -EINVAL);
>         RTE_PTR_OR_ERR_RET(instance->ops, -EINVAL);
> #endif
>         RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->set_pkt_metadata, -ENOTSUP);
>         return instance->ops->set_pkt_metadata(instance->device,
>                                                sess, m, params);
> }
> 
> 
> I think both ways are possible (flags vs actual func pointers) and both have
> some pluses and minuses.
> I suppose the main choice here is what we think the future of
> set_pkt_metadata() and rte_security_get_userdata() should be.
> If we think that they will be useful for some future PMDs and we want to keep them,
> then storing actual func pointers inside ctx is probably the better approach.
> If not, then flags seem like the better option, as in that case we can eventually
> deprecate and remove these callbacks.
> From what I see right now, custom callbacks seem excessive,
> and rte_security_dynfield is enough.
> But perhaps there are future plans that would require them?

Above method is also fine. Moving the fn pointers to rte_security_ctx can be
done later if other PMDs need it.

At least our HW PMDs don't plan to use the set_pkt_metadata()/get_userdata()
fn pointers in future if the above is implemented.

>  
> > 
> > >
> > > >
> > > > As a brief code snippet:
> > > >
> > > > struct rte_security_ctx {
> > > >         void *device;
> > > >         /**< Crypto/ethernet device attached */
> > > >         const struct rte_security_ops *ops;
> > > >         /**< Pointer to security ops for the device */
> > > >         uint16_t sess_cnt;
> > > >         /**< Number of sessions attached to this context */
> > > > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > > > };
> > > >
> > > > static inline int
> > > > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> > > >                               struct rte_security_session *sess,
> > > >                               struct rte_mbuf *m, void *params)
> > > > {
> > > >      /* fast-path */
> > > >       if (instance->set_pkt_mdata == NULL) {
> > > >              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
> > > >              return 0;
> > > >        /* slow path */
> > > >        } else
> > > >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > > > }
> > > >
> > > > That probably would be an ABI breakage (new field in rte_security_ctx) and would require
> > > > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> > > > (ctx_create()), but hopefully will benefit everyone.
> > > >
> > > > >
> > > > > >
> > > > > > > I'm fine to
> > > > > > > introduce a burst call for the same(I was thinking to propose it in future) to
> > > > > > > compensate for the overhead.
> > > > > > >
> > > > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > > > >
> > > > > > But it does, see above.
> > > > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > > > If your PMD requires more data to be associated with mbuf
> > > > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > > > >
> > > > > > > then then I guess it would have been better to do the way you proposed.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when he is
> > > > > > > > > > > > > called.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I also think above sequence is what Linux kernel stack or other stacks follow.
> > > > > > > > > > > > > Does it makes sense ?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  /**
> > > > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > > > >   */
> > > > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Minutes of Technical Board Meeting, 2021-06-30
@ 2021-07-14 15:11  4% Aaron Conole
  0 siblings, 0 replies; 200+ results
From: Aaron Conole @ 2021-07-14 15:11 UTC (permalink / raw)
  To: techboard, dev

Attendees
---------
* Aaron
* Bruce
* Ferruh
* Hemant
* Honnappa
* Jerin
* Kevin
* Konstantin
* Lincoln Lavoie (UNH representative)
* Maxime
* Olivier
* Stephen
* Thomas

NOTE: The technical board meets every second Wednesday at
https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.

NOTE: Additional follow up for ABI scheduled 2021-07-02

* intro
** updated agenda
*** added temp. gov board rep - aaron chosen (Thomas)
*** added next-net survey (Ferruh)
*** added security process (Maxime)

* Temp governing board membership
** Honnappa will be on PTO for the next Gov. Board meeting
** Decision to have Aaron present at the governing board

* Discussion about API for clang / gcc builtins (Honnappa)
** shemminger: MSFT doesn't have support for atomic builtins
** thomas: need to know what the effort looks like for compatibility
** thomas: 3 options to vote for atomics
*** OPT1 - continue using gcc builtins
*** OPT2 - a wrapper at compile time that can be done in future
*** OPT3 - do mass renames / wrapping now with internal implementations
*** OPT4 - create a wrapper that clones gcc built-ins instead
**** Not a good option because it could clash with an external project that links in
     stdatomic vs the builtins
*** shemminger: only available in c++
** Tabled for more discussion - Honnappa to follow up via tech board mailing list

* IOL
** Ask governing board about coverity license for coverity desktop
** First cut tools, cppcheck, scan-build, flawfinder
** Daily sub-tree reporting for merging, for now dashboard
** Single release report as well - Lincoln to capture as a story
** DTS workgroup for virtio
** question for techboard - is compression testing a priority?

* API / ABI discussion
*** Meeting setup for Fri, Jul 02, 2021


^ permalink raw reply	[relevance 4%]

* [dpdk-dev] Minutes of Technical Board Meeting, 2021-06-16
@ 2021-07-14 15:15  4% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2021-07-14 15:15 UTC (permalink / raw)
  To: dev; +Cc: techboard

Members Attending: 12/12
	- Aaron Conole
	- Bruce Richardson
	- Ferruh Yigit
	- Hemant Agrawal
	- Honnappa Nagarahalli
	- Jerin Jacob
	- Kevin Traynor
	- Konstantin Ananyev
	- Maxime Coquelin
	- Olivier Matz
	- Stephen Hemminger
	- Thomas Monjalon (Chair)

NOTE: The Technical Board meetings take place every second Wednesday
on https://meet.jit.si/DPDK at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes

NOTE: Next meeting will be on Wednesday 2021-06-30 @3pm UTC,
and will be chaired by Aaron.


1/ DTS workgroup

A group is working on DTS (DPDK Test Suite) feedbacks
with the target of making DTS test mandatory for new features,
starting with 22.05.

The tests are being listed in 2 categories: reviewed / non-reviewed
so it does not block DTS development while introducing some new policies.

There are many questions like how to manage DPDK code and DTS tests
in separate repositories? What is the scope of DTS?
How to manage limited HW availability?

Working document:
https://docs.google.com/document/d/1c5S0_mZzFvzZfYkqyORLT2-qNvUb-fBdjA6DGusy4yM
Emails:
https://inbox.dpdk.org/dev/?q=DTS+Workgroup


2/ UNH report

There is a document of Community Lab updates to read carefully:
https://docs.google.com/document/d/1v0VKtZdsMXg35WNDawdsnqj5J4Xl9Egu_4180ukKD2o

The report will be discussed during the next techboard meeting.


3/ IRC network

It seems freenode is not a trusted/working IRC network anymore.
We need to choose a new place for quick discussions.
OFTC is an old trusted network, Libera.Chat is in continuation of freenode.
Libera.Chat is chosen to be the network used by the DPDK community.
Our default channel is #DPDK.


4/ CVE

The vulnerabilities are better managed since Cheng Jiang joined the effort.
Thanks to him.


5/ techboard policies

There is document in progress to better define the techboard policies:
https://docs.google.com/document/d/1Al9-DPJSn7kXgEF3nhbp-srb_IMUU_T_wEWqugw4vtA

We will try to get an agreement in mid-July meeting.


6/ ABI

Ray, Bruce, Ferruh and Thomas worked on a plan to improve the ABI stability
with the objective of extending the compatibility period to 2 years:
https://docs.google.com/document/d/1Kju9FxBj3zR_hezErzitaatUrtdBsgL0iAlE05QNpck

After discussing the status and the focus of next improvements,
it has been decided to share a spreadsheet for volunteering:
https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE

The objective should be discussed in details during the next meeting.



^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
  2021-07-14 13:29  0%                             ` Nithin Dabilpuram
@ 2021-07-14 17:28  0%                               ` Ananyev, Konstantin
  0 siblings, 0 replies; 200+ results
From: Ananyev, Konstantin @ 2021-07-14 17:28 UTC (permalink / raw)
  To: Nithin Dabilpuram
  Cc: Akhil Goyal, dev, hemant.agrawal, thomas, g.singh, Yigit, Ferruh,
	Zhang, Roy Fan, olivier.matz, jerinj, Doherty, Declan, Nicolau,
	 Radu, jiawenwu, jianwang



> -----Original Message-----
> From: Nithin Dabilpuram <nithind1988@gmail.com>
> Sent: Wednesday, July 14, 2021 2:30 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Akhil Goyal <gakhil@marvell.com>; dev@dpdk.org; hemant.agrawal@nxp.com; thomas@monjalon.net; g.singh@nxp.com; Yigit, Ferruh
> <ferruh.yigit@intel.com>; Zhang, Roy Fan <roy.fan.zhang@intel.com>; olivier.matz@6wind.com; jerinj@marvell.com; Doherty, Declan
> <declan.doherty@intel.com>; Nicolau, Radu <radu.nicolau@intel.com>; jiawenwu@trustnetic.com; jianwang@trustnetic.com
> Subject: Re: [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing
> 
> On Wed, Jul 14, 2021 at 11:09:08AM +0000, Ananyev, Konstantin wrote:
> > > > >
> > > > > Adding more rte_security and PMD maintainers into the loop.
> > > > >
> > > > > > > > > > > > > > > > For Tx inline processing, when RTE_SECURITY_TX_OLOAD_NEED_MDATA is
> > > > > > > > > > > > > > > > set, rte_security_set_pkt_metadata() needs to be called for pkts
> > > > > > > > > > > > > > > > to associate a Security session with a mbuf before submitting
> > > > > > > > > > > > > > > > to Ethdev Tx. This is apart from setting PKT_TX_SEC_OFFLOAD in
> > > > > > > > > > > > > > > > mbuf.ol_flags. rte_security_set_pkt_metadata() is also used to
> > > > > > > > > > > > > > > > set some opaque metadata in mbuf for PMD's use.
> > > > > > > > > > > > > > > > This patch updates documentation that rte_security_set_pkt_metadata()
> > > > > > > > > > > > > > > > should be called only with mbuf containing Layer 3 and above data.
> > > > > > > > > > > > > > > > This behaviour is consistent with existing PMD's such as ixgbe.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tx, not all net PMD's/HW can parse packet and identify
> > > > > > > > > > > > > > > > L2 header and L3 header locations on Tx. This is inline with other
> > > > > > > > > > > > > > > > Tx offloads requirements such as L3 checksum, L4 checksum offload,
> > > > > > > > > > > > > > > > etc, where mbuf.l2_len, mbuf.l3_len etc, needs to be set for
> > > > > > > > > > > > > > > > HW to be able to generate checksum. Since Inline IPSec is also
> > > > > > > > > > > > > > > > such a Tx offload, some PMD's at least need mbuf.l2_len to be
> > > > > > > > > > > > > > > > valid to find L3 header and perform Outbound IPSec processing.
> > > > > > > > > > > > > > > > Hence, this patch updates documentation to enforce setting
> > > > > > > > > > > > > > > > mbuf.l2_len while setting PKT_TX_SEC_OFFLOAD in mbuf.ol_flags
> > > > > > > > > > > > > > > > for Inline IPSec Crypto / Protocol offload processing to
> > > > > > > > > > > > > > > > work on Tx.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
> > > > > > > > > > > > > > > > Reviewed-by: Akhil Goyal <gakhil@marvell.com>
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  doc/guides/nics/features.rst           | 2 ++
> > > > > > > > > > > > > > > >  doc/guides/prog_guide/rte_security.rst | 6 +++++-
> > > > > > > > > > > > > > > >  lib/mbuf/rte_mbuf_core.h               | 2 ++
> > > > > > > > > > > > > > > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > > index 403c2b03a..414baf14f 100644
> > > > > > > > > > > > > > > > --- a/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > > +++ b/doc/guides/nics/features.rst
> > > > > > > > > > > > > > > > @@ -430,6 +430,7 @@ of protocol operations. See Security library and PMD documentation for more deta
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``capabilities_get``.
> > > > > > > > > > > > > > > >  * **[provides] rte_eth_dev_info**: ``rx_offload_capa,rx_queue_offload_capa:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > > @@ -451,6 +452,7 @@ protocol operations. See security library and PMD documentation for more details
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_rxconf,rte_eth_rxmode**: ``offloads:DEV_RX_OFFLOAD_SECURITY``,
> > > > > > > > > > > > > > > >  * **[uses]       rte_eth_txconf,rte_eth_txmode**: ``offloads:DEV_TX_OFFLOAD_SECURITY``.
> > > > > > > > > > > > > > > > +* **[uses]       mbuf**: ``mbuf.l2_len``.
> > > > > > > > > > > > > > > >  * **[implements] rte_security_ops**: ``session_create``, ``session_update``,
> > > > > > > > > > > > > > > >    ``session_stats_get``, ``session_destroy``, ``set_pkt_metadata``, ``get_userdata``,
> > > > > > > > > > > > > > > >    ``capabilities_get``.
> > > > > > > > > > > > > > > > diff --git a/doc/guides/prog_guide/rte_security.rst b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > > index f72bc8a78..7b68c698d 100644
> > > > > > > > > > > > > > > > --- a/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > > +++ b/doc/guides/prog_guide/rte_security.rst
> > > > > > > > > > > > > > > > @@ -560,7 +560,11 @@ created by the application is attached to the security session by the API
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  For Inline Crypto and Inline protocol offload, device specific defined metadata is
> > > > > > > > > > > > > > > >  updated in the mbuf using ``rte_security_set_pkt_metadata()`` if
> > > > > > > > > > > > > > > > -``DEV_TX_OFFLOAD_SEC_NEED_MDATA`` is set.
> > > > > > > > > > > > > > > > +``RTE_SECURITY_TX_OLOAD_NEED_MDATA`` is set. ``rte_security_set_pkt_metadata()``
> > > > > > > > > > > > > > > > +should be called on mbuf only with Layer 3 and above data present and
> > > > > > > > > > > > > > > > +``mbuf.data_off`` should be pointing to Layer 3 Header.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hmm... not sure why mbuf.data_off should point to L3 hdr.
> > > > > > > > > > > > > > > Who will add L2 hdr to the packet in that case?
> > > > > > > > > > > > > > > Or did you mean ``mbuf.data_off + mbuf.l2_len`` here?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > That is the semantics I was trying to define. I think below are the sequence of
> > > > > > > > > > > > > > operations to be done for ipsec processing,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 1. receive_pkt()
> > > > > > > > > > > > > > 2. strip_l2_hdr()
> > > > > > > > > > > > > > 3. Do policy lookup ()
> > > > > > > > > > > > > > 4. Call rte_security_set_pkt_metadata() if pkt needs to be encrypted with a
> > > > > > > > > > > > > > particular SA. Now pkt only has L3 and above data.
> > > > > > > > > > > > > > 5. Do route_lookup()
> > > > > > > > > > > > > > 6. add_l2hdr() which might be different from stripped l2hdr.
> > > > > > > > > > > > > > 7. Send packet out.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The above sequence is what I believe the current poll mode worker thread in
> > > > > > > > > > > > > > ipsec-secgw is following.
> > > > > > > > > > > > >
> > > > > > > > > > > > > That's just a sample app, it doesn't mean it has to be the only possible way.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > While in event mode, step 2 and step 6 are missing.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think this L2 hdr manipulation is totally optional.
> > > > > > > > > > > > > If your rte_security_set_pkt_metadata() implementation really needs to know L3 hdr offset (not sure why?),
> > > > > > > > > > > > Since rte_security_set_pkt_metadata() is a PMD specific function ptr call, we are currently doing some pre-processing
> > > > > > > > > > > > here before submitting the packet to inline IPSec via rte_eth_tx_burst(). This saves us cycles later in rte_eth_tx_burst().
> > > > > > > > > > > > If we cannot know for sure the pkt content at the time of the rte_security_set_pkt_metadata() call, then I think
> > > > > > > > > > > > having a PMD specific callback is not of much use except for saving SA priv data to rte_mbuf.
> > > > > > > > > > > >
> > > > > > > > > > > > > then I suppose we can add a requirement that l2_len has to be set properly before calling
> > > rte_security_set_pkt_metadata().
> > > > > > > > > > > >
> > > > > > > > > > > > This is also fine with us.
> > > > > > > > > > >
> > > > > > > > > > > Ok, so to make sure we are on the same page, you propose:
> > > > > > > > > > > 1. before calling rte_security_set_pkt_metadata() mbuf.l2_len should be properly set.
> > > > > > > > > > > 2. after rte_security_set_pkt_metadata() and before rte_eth_tx_burst() packet contents
> > > > > > > > > > >     at [mbuf.l2_len, mbuf.pkt_len) can't be modified?
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Is that correct understanding?
> > > > > > > > > > > If yes, I wonder how 2) will correlate with rte_eth_tx_prepare() concept?
> > > > > > > > > >
> > > > > > > > > > Since our PMD doesn't have a prepare function, I missed that but, since
> > > > > > > > > > rte_security_set_pkt_metadata() is only used for Inline Crypto/Protocol via
> > > > > > > > > > a rte_eth_dev, and both rte_security_set_pkt_metadata() and rte_eth_tx_prepare()
> > > > > > > > > > are callbacks from same PMD, do you see any issue ?
> > > > > > > > > >
> > > > > > > > > > The restriction is from user side, data is not supposed to be modified unless
> > > > > > > > > > rte_security_set_pkt_metadata() is called again.
> > > > > > > > >
> > > > > > > > > Yep, I do have a concern here.
> > > > > > > > > Right now it is perfectly valid to do something like that:
> > > > > > > > > rte_security_set_pkt_metadata(..., mb, ...);
> > > > > > > > > /* can modify contents of the packet */
> > > > > > > > > rte_eth_tx_prepare(..., &mb, 1);
> > > > > > > > > rte_eth_tx_burst(..., &mb, 1);
> > > > > > > > >
> > > > > > > > > With the new restrictions you are proposing it wouldn't be allowed any more.
> > > > > > > > You can still modify L2 header and IPSEC is only concerned about L3 and above.
> > > > > > > >
> > > > > > > > I think insisting that rte_security_set_pkt_metadata() be called after all L3
> > > > > > > > and above header modifications is not a problem. I guess the existing ixgbe/txgbe
> > > > > > > > PMDs, which are the only ones implementing the callback, are already expecting the
> > > > > > > > same?
> > > > > > >
> > > > > > > AFAIK, no, there are no such requirements for ixgbe or txgbe.
> > > > > > > All that ixgbe callback does - store session related data inside mbuf.
> > > > > > > It's only expectation to have ESP trailer at the proper place (after ICV):
> > > > > >
> > > > > > This implies rte_security_set_pkt_metadata() cannot be called when the mbuf doesn't
> > > > > > have the ESP trailer updated or when mbuf->pkt_len == 0
> > > > > >
> > > > > > >
> > > > > > > union ixgbe_crypto_tx_desc_md *mdata = (union ixgbe_crypto_tx_desc_md *)
> > > > > > >                                 rte_security_dynfield(m);
> > > > > > >   mdata->enc = 1;
> > > > > > >   mdata->sa_idx = ic_session->sa_index;
> > > > > > >   mdata->pad_len = ixgbe_crypto_compute_pad_len(m);
> > > > > > >
> > > > > > > Then this data will be used by tx_burst() function.
> > > > > > So it implies that after above rte_security_set_pkt_metadata() call, and before tx_burst(),
> > > > > > mbuf data / packet len cannot be modified right as if modified, then tx_burst()
> > > > > > will be using incorrect pad len ?
> > > > >
> > > > > No, pkt_len can be modified.
> > > > > Though ESP trailer pad_len can't.
> > > > >
> > > > > >
> > > > > > This patch is also trying to add similar restriction on when
> > > > > > rte_security_set_pkt_metadata() should be called and what cannot be done after
> > > > > > calling rte_security_set_pkt_metadata().
> > > > >
> > > > > No, I don't think it is really the same.
> > > > > Also, IMO, inside the ixgbe set_pkt_metadata() implementation we probably shouldn't silently
> > > > > assume that the ESP packet is already formed and the trailer contains valid data.
> > > > > In fact, I think this pad_len calculation can be moved to the actual TX function.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > If your question is can't we do the preprocessing in rte_eth_tx_prepare() for
> > > > > > > > > > security,
> > > > > > > > >
> > > > > > > > > Yes, that was my thought.
> > > > > > > > >
> > > > > > > > > > my only argument was that since there is already a hit in
> > > > > > > > > > rte_security_set_pkt_metadata() to PMD specific callback and
> > > > > > > > > > struct rte_security_session is passed as an argument to it, it is more beneficial to
> > > > > > > > > > do security related pre-processing there.
> > > > > > > > >
> > > > > > > > > Yes, it would be extra callback call that way.
> > > > > > > > > Though tx_prepare() accepts burst of packets, so the overhead
> > > > > > > > > of function call will be spread around the whole burst, and I presume
> > > > > > > > > shouldn't be too high.
> > > > > > > > >
> > > > > > > > > > Also rte_eth_tx_prepare() if implemented will be called for both security and
> > > > > > > > > > non-security pkts.
> > > > > > > > >
> > > > > > > > > Yes, but tx_prepare() can distinguish (by ol_flags and/or other field contents) which
> > > > > > > > > modifications are required for the packet.
> > > > > > > >
> > > > > > > > But the major issues I see are
> > > > > > > >
> > > > > > > > 1. tx_prepare() doesn't take rte_security_session as argument though ol_flags has security flag.
> > > > > > > >    In our case, we need to know the security session details to do things.
> > > > > > >
> > > > > > > I suppose you can store pointer to session (or so) inside mbuf in rte_security_dynfield, no?
> > > > > >
> > > > > > We can do. But having to call PMD specific function call via rte_security_set_pkt_metadata()
> > > > > > just for storing session pointer in rte_security_dynfield consumes unnecessary
> > > > > > cycles per pkt.
> > > > >
> > > > > In fact there are two function calls: one for rte_security_set_pkt_metadata(),
> > > > > second for  instance->ops->set_pkt_metadata() callback.
> > > > > Which is of course way too expensive for such a simple operation.
> > > > > Actually same thought for rte_security_get_userdata().
> > > > > Both of these functions belong to data-path and ideally have to be as fast as possible.
> > > > > Probably 21.11 is a right timeframe for that.
> > > > >
> > > > > > >
> > > > > > > > 2. AFAIU tx_prepare() is not mandatory as per the spec and is even disabled by default under the
> > > > > > > >    compile-time macro RTE_ETHDEV_TX_PREPARE_NOOP.
> > > > > > > > 3. Even if we do tx_prepare(), rte_security_set_pkt_metadata() is mandatory to associate
> > > > > > > >    struct rte_security_session to a pkt as, unlike ol_flags, there is no direct space to do the same.
> > > > > > >
> > > > > > > Didn't get you here; obviously we do have rte_security_dynfield inside the mbuf,
> > > > > > > specifically for that - to store security-related data inside the mbuf.
> > > > > > > Yes your PMD has to request it at initialization time, but I suppose it is not a big deal.
> > > > > > >
> > > > > > > > So I think instead of enforcing yet another callback tx_prepare() for inline security
> > > > > > > > processing, it can be done via security specific set_pkt_metadata().
> > > > > > >
> > > > > > > But what you are proposing introduces new limitations and might break existing functionality.
> > > > > > > BTW, if you don't like to use tx_prepare() - why is doing these calculations inside tx_burst()
> > > > > > > itself not an option?
> > > > > >
> > > > > > We can do things in tx_burst(), but if we are doing it there, then we want to avoid having a callback for
> > > > > > rte_security_set_pkt_metadata().
> > > > > >
> > > > > > Are you fine if we can update the spec that "When DEV_TX_OFFLOAD_SEC_NEED_MDATA is not
> > > > > > set, then the user needs to store struct rte_security_session's sess_private_data in
> > > > > > rte_security_dynfield" like below?
> > > > > >
> > > > > > <snip>
> > > > > >
> > > > > > static inline void
> > > > > > inline_outb_mbuf_prepare(const struct rte_ipsec_session *ss,
> > > > > >         struct rte_mbuf *mb[], uint16_t num)
> > > > > > {
> > > > > >         uint32_t i, ol_flags;
> > > > > >
> > > > > >         ol_flags = ss->security.ol_flags & RTE_SECURITY_TX_OFLOAD_NEED_MDATA;
> > > > > >         for (i = 0; i != num; i++) {
> > > > > >
> > > > > >                 mb[i]->ol_flags |= PKT_TX_SEC_OFFLOAD;
> > > > > >
> > > > > >                 if (ol_flags != 0)
> > > > > >                         rte_security_set_pkt_metadata(ss->security.ctx,
> > > > > >                                 ss->security.ses, mb[i], NULL);
> > > > > >                 else
> > > > > >                         *rte_security_dynfield(mb[i]) =
> > > > > >                                 (uint64_t)ss->security.ses->sess_private_data;
> > > > > >         }
> > > > > > }
> > > > > >
> > > > > > If the above can be done, then in our PMD, we will not have a callback for
> > > > > > set_pkt_metadata() and DEV_TX_OFFLOAD_SEC_NEED_MDATA will also be not set
> > > > > > in capabilities.
> > > > >
> > > > > That's an interesting idea, but what you propose is the change in current rte_security API behaviour.
> > > > > So all existing apps that use this API will have to be changed.
> > > > > We'd better avoid such changes unless there is really good reason for that.
> > > > > So, I'd suggest to tweak your idea a bit:
> > > > >
> > > > > 1) change rte_security_set_pkt_metadata():
> > > > > if ops->set_pkt_metadata != NULL, then call it (existing behaviour),
> > > > > otherwise just: *rte_security_dynfield(m) = sess->sess_private_data;
> > > > > (fast-path)
> > > > >
> > > > > 2) consider to make rte_security_set_pkt_metadata() inline function.
> > > > > We probably can have some special flag inside struct rte_security_ctx,
> > > > > or even store inside ctx a pointer to set_pkt_metadata() itself.
> > > >
> > > > After further thought, some new flags might be better.
> > > > Then later, if we realize that set_pkt_metadata() and get_userdata()
> > > > are not really used by PMDs, it might be easier to deprecate these callbacks.
> > >
> > > Thanks, I agree with your thoughts. I'll submit a v2 with the above change: new flags, and the
> > > set_pkt_metadata() and get_userdata() function pointers moved to rte_security_ctx, for
> > > review so that it can be targeted for 21.11.
> > >
> > > Even with flags, moving the set_pkt_metadata() and get_userdata() function pointers is still needed,
> > > as we need to make rte_security_set_pkt_metadata() an inline API while struct rte_security_ops is not
> > > exposed to the user. I think this is fine, as it is in line with how the fast path function pointers
> > > of rte_ethdev and rte_cryptodev are currently placed.
> >
> > My thought was we can get away with just flags only.
> > Something like that:
> > rte_security.h:
> >
> > ...
> >
> > enum {
> > 	RTE_SEC_CTX_F_FAST_SET_MDATA = 0x1,
> > 	RTE_SEC_CTX_F_FAST_GET_UDATA = 0x2,
> > };
> >
> > struct rte_security_ctx {
> >         void *device;
> >         /**< Crypto/ethernet device attached */
> >         const struct rte_security_ops *ops;
> >         /**< Pointer to security ops for the device */
> >         uint16_t sess_cnt;
> >         /**< Number of sessions attached to this context */
> >        uint32_t flags;
> > };
> >
> > extern int
> > __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                                struct rte_security_session *sess,
> >                                struct rte_mbuf *m, void *params);
> >
> > static inline int
> > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                                struct rte_security_session *sess,
> >                                struct rte_mbuf *m, void *params)
> > {
> >       /* fast-path */
> >        if (instance->flags & RTE_SEC_CTX_F_FAST_SET_MDATA) {
> >               *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
> >               return 0;
> >         /* slow path */
> >         } else
> >             return __rte_security_set_pkt_metadata(instance, sess, m, params);
> > }
> >
> > rte_security.c:
> >
> > ...
> > /* existing one, just renamed */
> > int
> > __rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> >                               struct rte_security_session *sess,
> >                               struct rte_mbuf *m, void *params)
> > {
> > #ifdef RTE_DEBUG
> >         RTE_PTR_OR_ERR_RET(sess, -EINVAL);
> >         RTE_PTR_OR_ERR_RET(instance, -EINVAL);
> >         RTE_PTR_OR_ERR_RET(instance->ops, -EINVAL);
> > #endif
> >         RTE_FUNC_PTR_OR_ERR_RET(*instance->ops->set_pkt_metadata, -ENOTSUP);
> >         return instance->ops->set_pkt_metadata(instance->device,
> >                                                sess, m, params);
> > }
> >
> >
> > I think both ways are possible (flags vs actual func pointers) and both have
> > some pluses and minuses.
> > I suppose the main choice here is what we think the future of
> > set_pkt_metadata() and rte_security_get_userdata() should be.
> > If we think that they will be useful for some future PMDs and we want to keep them,
> > then probably storing actual func pointers inside ctx is a better approach.
> > If not, then flags seems like a better one, as in that case we can eventually
> > deprecate and remove these callbacks.
> > From what I see right now, custom callbacks seem excessive,
> > and rte_security_dynfield is enough.
> > But might be there are some future plans that would require them?
> 
> Above method is also fine. Moving fn pointers to rte_security_ctx can be
> done later if other PMDs need it.

Yes, agree.

> 
> At least our HW PMDs don't plan to use set_pkt_metadata()/get_userdata()
> fn pointers in the future if the above is implemented.
> 
> >
> > >
> > > >
> > > > >
> > > > > As a brief code snippet:
> > > > >
> > > > > struct rte_security_ctx {
> > > > >         void *device;
> > > > >         /**< Crypto/ethernet device attached */
> > > > >         const struct rte_security_ops *ops;
> > > > >         /**< Pointer to security ops for the device */
> > > > >         uint16_t sess_cnt;
> > > > >         /**< Number of sessions attached to this context */
> > > > > +     int (*set_pkt_mdata)(void *, struct rte_security_session *, struct rte_mbuf *,  void *);
> > > > > };
> > > > >
> > > > > static inline int
> > > > > rte_security_set_pkt_metadata(struct rte_security_ctx *instance,
> > > > >                               struct rte_security_session *sess,
> > > > >                               struct rte_mbuf *m, void *params)
> > > > > {
> > > > >      /* fast-path */
> > > > >       if (instance->set_pkt_mdata == NULL) {
> > > > >              *rte_security_dynfield(m) = (rte_security_dynfield_t)(sess->sess_private_data);
> > > > >              return 0;
> > > > >        /* slow path */
> > > > >        } else
> > > > >            return instance->set_pkt_mdata(instance->device, sess, m, params);
> > > > > }
> > > > >
> > > > > That probably would be an ABI breakage (new field in rte_security_ctx) and would require
> > > > > some trivial changes for all existing PMDs that use RTE_SECURITY_TX_OFLOAD_NEED_MDATA
> > > > > (ctx_create()), but hopefully will benefit everyone.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > I'm fine to
> > > > > > > > introduce a burst call for the same (I was thinking of proposing it in the future) to
> > > > > > > > compensate for the overhead.
> > > > > > > >
> > > > > > > > If rte_security_set_pkt_metadata() was not a PMD specific function ptr call and
> > > > > > > > rte_mbuf had space for struct rte_security_session pointer,
> > > > > > >
> > > > > > > But it does, see above.
> > > > > > > In fact it even more flexible - because it is driver specific, you are not limited to one 64-bit field.
> > > > > > > If your PMD requires more data to be associated with mbuf
> > > > > > > - you can request it via mbuf_dynfield and store there whatever is needed.
> > > > > > >
> > > > > > > > then I guess it would have been better to do it the way you proposed.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This patch is trying to enforce semantics as above so that
> > > > > > > > > > > > > > rte_security_set_pkt_metadata() can predict what comes in the pkt when it is
> > > > > > > > > > > > > > called.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I also think the above sequence is what the Linux kernel stack or other stacks follow.
> > > > > > > > > > > > > > Does it make sense?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Once called,
> > > > > > > > > > > > > > > > +Layer 3 and above data cannot be modified or moved around unless
> > > > > > > > > > > > > > > > +``rte_security_set_pkt_metadata()`` is called again.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  For inline protocol offloaded ingress traffic, the application can register a
> > > > > > > > > > > > > > > >  pointer, ``userdata`` , in the security session. When the packet is received,
> > > > > > > > > > > > > > > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > > index bb38d7f58..9d8e3ddc8 100644
> > > > > > > > > > > > > > > > --- a/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > > > > > > > > > > > > > > @@ -228,6 +228,8 @@ extern "C" {
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  /**
> > > > > > > > > > > > > > > >   * Request security offload processing on the TX packet.
> > > > > > > > > > > > > > > > + * To use Tx security offload, the user needs to fill l2_len in mbuf
> > > > > > > > > > > > > > > > + * indicating L2 header size and where L3 header starts.
> > > > > > > > > > > > > > > >   */
> > > > > > > > > > > > > > > >  #define PKT_TX_SEC_OFFLOAD	(1ULL << 43)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.25.1
> > > > > > > > > > > > > > >

^ permalink raw reply	[relevance 0%]

* Re: [dpdk-dev] [PATCH v3] dmadev: introduce DMA device library
  2021-07-13 13:37  0%     ` Bruce Richardson
@ 2021-07-15  6:44  0%       ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2021-07-15  6:44 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: fengchengwen, Thomas Monjalon, Ferruh Yigit, Jerin Jacob,
	Andrew Rybchenko, dpdk-dev, Morten Brørup, Nipun Gupta,
	Hemant Agrawal, Maxime Coquelin, Honnappa Nagarahalli,
	David Marchand, Satananda Burla, Prasun Kapoor, Ananyev,
	Konstantin

On Tue, Jul 13, 2021 at 7:08 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> > Thank you for your valuable comments, and I think we've taken a big step forward.
> >
> > @andrew Could you provide the copyright line so that I can add it to the relevant file.
> >
> > @bruce, @jerin  Some unmodified review comments are returned here:
>
> Thanks. Some further comments inline below. Most points you make I'm ok
> with, but I do disagree on a number of others.
>
> /Bruce
>
> >
> > 1.
> > COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
> > to be wrapped so aggressively.
> >
> > REPLY: Our CI still has an 80-character limit, and from my review most frameworks still comply.
> >
> Ok.
>
> > 2.
> > COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
> > RTE_DMA_DIRECTION_...
> >
> > REPLY: add the 'DIRECTION' may the macro too long, I prefer keep it simple.
> >
> DIRECTION could be shortened to DIR, but I think this is probably ok as is
> too.
>

I prefer to keep DIR so that it is easy to reference in documentation, like
@see RTE_DMA_DIR_*


> > 3.
> > COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> > We are not making release as pubic API in other device class. See ethdev spec.
> > bbdev/eventdev/rawdev
> >
> > REPLY: because ethdev's queue is a hardware queue, and here we have software-defined channels,
> > I think release is OK. BTW: bbdev/eventdev also have release ops.

I don't see any API like rte_event_queue_release() in eventdev. It
only has setup.

Typical flow is
1) configure() the N vchans
2) for i..N: setup() each vchan
3) start()
4) stop()
5) configure() again with M vchans
6) for i..M: setup() each vchan
7) start()

And the above is documented at the beginning of the rte_dmadev.h header file.
I think the above sequence makes it easy for drivers. Just like in other
device classes, _release can be a PMD hook which will be handled in the
common configure() code.



> >
> Ok


> > 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> > fields */
> > > +};
> > Please add the capability for each counter in info structure as one
> > device may support all the counters.
> >
> > REPLY: This is a statistics function. If this function is not supported,
> > then do not need to implement the stats ops function. Also could to set
> > the unimplemented ones to zero.
> >
> +1
> The stats functions should be a minimum set that is supported by all
> drivers. Each of these stats can be easily tracked by software if HW
> support for it is not available, so I agree that we should not have each
> stat as a capability.

In our current HW, submitted_count and completed_count are offloaded to HW.
In addition to that, we have a provision for getting stats for bytes
copied. (We can make it an xstat if other drivers won't support it.)

Our plan is to use enqueued_count and completed_fail_count in SW under
conditional compilation flags or another scheme, as it is in the fast path.

If we are not planning to add a capability, IMO, we need to update the
documentation to state that unimplemented counters will return zero. But
then there is the question of how to differentiate an unimplemented
counter from a genuine zero value. IMO, we can either update the doc for
this case as well or add a capability.


>
> > 5.
> > COMMENT: > +#endif
> > > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> > Instead of every driver set the NOP function, In the common code, If
> > the CAPA is not set,
> > common code can set NOP function for this with <0 return value.
> >
> > REPLY: I don't think it's a good idea to judge in the IO path; it's the application's duty to ensure it
> > doesn't call an API which the driver doesn't support (which can be learned from the capabilities).
> >
> For datapath functions, +1.

OK. Probably add some NOP function (which returns an error) in pmd.h so
that all drivers can reuse it.
No strong opinion.

>
> > 6.
> > COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > > +                          const uint16_t nb_status, uint32_t *status,
> > uint32_t -> enum rte_dma_status_code
> >
> > REPLY:I'm still evaluating this. It takes a long time for the driver to perform error code
> > conversion in this API. Do we need to provide an error code conversion function alone ?
> >
> It's not that difficult a conversion to do, and so long as we have the
> regular "completed" function which doesn't do all the error manipulation we
> should be fine. Performance in the case of errors is not expected to be as
> good, since errors should be very rare.

+1

>
> > 7.
> > COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > > +                                struct rte_dmadev_info *dev_info);
> > Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
> > as this header is exported.
> >
> > REPLY: I prefer not to add the 'rte_' prefix; it makes the define too long.
> >
> I disagree on this, they need the rte_ prefix, despite the fact it makes
> them longer. If length is a concern, these can be changed from "dmadev_" to
> "rte_dma_", which is only one character longer.
> In fact, I believe Morten already suggested we use "rte_dma" rather than
> "rte_dmadev" as a function prefix across the library.

+1

>
> > 8.
> > COMMENT: > + *        - rte_dmadev_completed_fails()
> > > + *            - return the number of operation requests failed to complete.
> > Please rename this to "completed_status" to allow the return of information
> > other than just errors. As I suggested before, I think this should also be
> > usable as a slower version of "completed" even in the case where there are
> > no errors, in that it returns status information for each and every job
> > rather than just returning as soon as it hits a failure.
> >
> > REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
> > and we can build the slow-path function on top of the two APIs.
> >
> I still disagree on this too. We have a "completed" op where we get
> informed of what has completed and minimal error indication, and a
> "completed_status" operation which provides status information for each
> operation completed, at the cost of speed.

+1

>
> > 9.
> > COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM       (1ull << 0)
> > > +/**< DMA device support mem-to-mem transfer.
> > Do we need this? Can we assume that any device appearing as a dmadev can
> > do mem-to-mem copies, and drop the capability for mem-to-mem and the
> > capability for copying?
> > also for RTE_DMA_DEV_CAPA_OPS_COPY
> >
> > REPLY: yes, I insist on adding this for the sake of conceptual integrity.
> > For ioat driver just make a statement.
> >
>
> Ok. It seems a wasted bit to me, but I don't see us running out of them
> soon.
>
> > 10.
> > COMMENT: > +  uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > > +};
> > Let's add rte_dmadev_conf struct into this to return the configuration
> > settings.
> >
> > REPLY: If we add rte_dmadev_conf in, it may break ABI when rte_dmadev_conf adds fields.
> >
> Yes, that is true, but I fail to see why that is a major problem. It just
> means that if the conf structure changes we have two functions to version
> instead of one. The information is still useful.
>
> If you don't want the actual conf structure explicitly put into the info
> struct, we can instead put the fields in directly. I really think that the
> info_get function should provide back to the user the details of what way
> the device was configured previously.
>
> regards,
> /Bruce

^ permalink raw reply	[relevance 0%]

* [dpdk-dev] Techboard - minutes of meeting 2021-07-14
@ 2021-07-15  9:29  5% Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2021-07-15  9:29 UTC (permalink / raw)
  To: techboard; +Cc: dev

Attendees
---------

* Aaron Conole
* Bruce Richardson
* Ferruh Yigit
* Hemant Agrawal
* Honnappa Nagarahalli
* Jerin Jacob
* Kevin Traynor
* Konstantin Ananyev
* Stephen Hemminger
* Thomas Monjalon

Minutes
-------

1. Readout on survey on next-net maintainership

* Ferruh provided a summary of results of a survey he carried out with
  driver maintainers and techboard for feedback on the next-net tree he
  maintains.
* Unfortunately response rate was very low
* Key feedback received in survey:
  * working with patches on mailing list is found to be difficult, with
    large volumes of mails
  * submitters found it awkward to have to do patchwork updates manually on
    sending new patch revisions
  * request for a more user-friendly tooling workflow

* Techboard held a discussion on a possible trial of using other tools for
  development workflow in the future. Largely requires a tree maintainer to
  volunteer to run such a trial for a release period to investigate how it
  works and what issues are discovered.

2. ABI/API compatibility and expanded ABI-stability window

* Proposal has been sent out to TB and maintainers on increasing the ABI
  compatibility period to 2 years from 1 year.
  * Lack of general feedback on this
* Work is ongoing to identify and address ABI concerns within DPDK project.
  Maintainers are asked to help with identifying issues in their own areas
  of expertise.
* Discussion was held on changing to the 2-year ABI period immediately for
  21.11 or to do so after a review next year. No clear consensus emerged.

* ACTION: Ferruh/Thomas to send out patch to DPDK mailing list on ABI:
  * To clarify 2-year proposal specifically
  * To expand discussion wider to the whole development community.

3. US DPDK Event

* the lower-than-expected attendance numbers were noted
* some discussion on selection criteria and avoidance of very
  vendor-specific content for future events
* general hope within TB for in-person events rather than virtual next
  year!

4. Techboard Membership

* Hemant led some initial discussion on latest document draft

* ACTION: All-TB-Members, (re)review latest document on Techboard
  governance.


^ permalink raw reply	[relevance 5%]

* Re: [dpdk-dev] [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  @ 2021-07-15 14:21  5%     ` David Marchand
  2021-07-16  8:39  3%       ` [dpdk-dev] [EXT] " Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2021-07-15 14:21 UTC (permalink / raw)
  To: Shijith Thotton, Akhil Goyal
  Cc: dev, Pavan Nikhilesh, Anoob Joseph, Jerin Jacob Kollanukkaran,
	Abhinandan Gujjar, Ankur Dwivedi, Ray Kinsella, Aaron Conole,
	dpdklab, Lincoln Lavoie

Hello,

On Wed, Jun 23, 2021 at 10:54 PM Shijith Thotton <sthotton@marvell.com> wrote:
> diff --git a/drivers/event/octeontx/meson.build b/drivers/event/octeontx/meson.build
> index 3cb140b4de..0d9eec3f2e 100644
> --- a/drivers/event/octeontx/meson.build
> +++ b/drivers/event/octeontx/meson.build
> @@ -12,3 +12,4 @@ sources = files(
>  )
>
>  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev', 'net_octeontx']
> +deps += ['crypto_octeontx']

This extra dependency resulted in disabling the event/octeontx driver
on FreeBSD, since crypto/octeontx only builds on Linux.
Removing hw support triggers an ABI failure for FreeBSD.


- This had been reported by UNH CI:
http://mails.dpdk.org/archives/test-report/2021-June/200637.html
It seems the result has been ignored but it should have at least
raised some discussion.


- I asked UNH to stop testing FreeBSD abi for now, waiting to get the
main branch fixed.

I don't have the time to look at this, please can you work on it?

Several options:
* crypto/octeontx is made so that it compiles on FreeBSD,
* the abi check is extended to have exceptions per OS,
* the FreeBSD abi reference is regenerated at UNH not to have those
drivers in it (not sure it is doable),


Thanks.

-- 
David Marchand


^ permalink raw reply	[relevance 5%]
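[Editor's note] One concrete shape for the "exceptions per OS" option above is a libabigail suppression fragment consumed by the ABI check (DPDK's check runs abidiff, which honours suppression files). The entry below is a hypothetical sketch only; the regexp and the idea of a FreeBSD-specific suppression file are assumptions, not an actual DPDK change:

```ini
; Hypothetical per-OS suppression: ignore the octeontx crypto/event
; drivers when diffing the FreeBSD ABI reference, since these drivers
; only build on Linux.
[suppress_file]
        soname_regexp = ^librte_(crypto|event)_octeontx\.
```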

* [dpdk-dev] DPDK Release Status Meeting 15/07/2021
@ 2021-07-15 22:28  4% Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2021-07-15 22:28 UTC (permalink / raw)
  To: dev; +Cc: thomas, Yigit, Ferruh

Release status meeting minutes {Date}
=====================================
:Date: 15 July 2021
:toc:

.Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens

.Participants:
* ARM
* Debian/Microsoft
* Intel
* Marvell
* Nvidia
* Red Hat


Release Dates
-------------

* `v21.08` dates
  - Proposal/V1:    Wednesday, 2 June  (completed)
  - rc1:            Saturday,  10 July (completed)
  - rc2:            Thursday,  22 July
  - rc3:            Thursday,  29 July
  - Release:        Tuesday,   3 August

* Note: We need to hold to the early August release date since
  several of the maintainers will be on holidays after that.

* `v21.11` dates (proposed and subject to discussion)
  - Proposal/V1:    Friday, 10 September
  - -rc1:           Friday, 15 October
  - Release:        Friday, 19 November

Subtrees
--------

* main
  - RC1 released.
  - RC2 targeted for Thursday 22 July.
  - Still waiting for an update on the Solarflare patches.


* next-net
  - No update.

* next-crypto
  - 4 new PMDs in this release:
    ** CNXK - merged.
    ** MLX - on last review. Should be merged for RC2.
    ** Intel QAT - Should be merged for RC2.
    ** NXP baseband - will be deferred to next release.

* next-eventdev
  - PR for RC2 today or tomorrow.

* next-virtio
  - Some patches in PR today.
  - New DMA dev work complicates merging of some of the vhost patches.
    An offline meeting will be held to review options and decide on the
    best technical approach. Maxime to set up.

* next-net-brcm
  - No update.

* next-net-intel
  - No update.

* next-net-mlx
  - PR not pulled due to comments that need to be addressed.
  - New version sent today.

* next-net-mrvl
  - Almost ready for RC2.
  - 1 or 2 small series and bug fixes.


LTS
---

* `v19.11` (next version is `v19.11.9`)
  - RC4 tagged.
  - Target release date July 19.

* `v20.11` (next version is `v20.11.3`)
  - 20.11.2 released by Xueming Li on July 7.
  - https://git.dpdk.org/dpdk-stable/commit/?h=20.11&id=a86024748385423306aac45524d6fc8d22ea6703

* Distros
  - v20.11 in Debian 11
  - v20.11 in Ubuntu 21.04


Defects
-------

* Bugzilla links, 'Bugs',  added for hosted projects
  - https://www.dpdk.org/hosted-projects/


Opens
-----

* There is an ongoing initiative around ABI stability which was
  discussed in the Tech Board call. A workgroup has come up
  with a list of critical and major changes required to let us
  extend the ABI without as much disruption. For example:

  ** export driver interfaces as internal
  ** hide more structs (may require uninlining)
  ** split big structs + new feature-specific functions (major)
  ** remove enum maximums
  ** reserved space initialized to 0
  ** reserved flags cleared

* We need to fill details and volunteers in this table:
  https://docs.google.com/spreadsheets/d/1betlC000ua5SsSiJIcC54mCCCJnW6voH5Dqv9UxeyfE/edit?usp=sharing




.DPDK Release Status Meetings
*****
The DPDK Release Status Meeting is intended for DPDK Committers to discuss the status of the master tree and sub-trees, and for project managers to track progress or milestone dates.

The meeting occurs on every Thursdays at 8:30 UTC. on https://meet.jit.si/DPDK

If you wish to attend just send an email to "John McNamara <john.mcnamara@intel.com>" for the invite.
*****

^ permalink raw reply	[relevance 4%]

* Re: [dpdk-dev] [EXT] Re: [PATCH v2 1/2] drivers: add octeontx crypto adapter framework
  2021-07-15 14:21  5%     ` David Marchand
@ 2021-07-16  8:39  3%       ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2021-07-16  8:39 UTC (permalink / raw)
  To: David Marchand, Shijith Thotton, Thomas Monjalon,
	Jerin Jacob Kollanukkaran
  Cc: dev, Pavan Nikhilesh Bhagavatula, Anoob Joseph,
	Abhinandan Gujjar, Ankur Dwivedi, Ray Kinsella, Aaron Conole,
	dpdklab, Lincoln Lavoie

Hi David,

> >  deps += ['common_octeontx', 'mempool_octeontx', 'bus_vdev',
> 'net_octeontx']
> > +deps += ['crypto_octeontx']
> 
> This extra dependency resulted in disabling the event/octeontx driver
> in FreeBSD, since crypto/octeontx only builds on Linux.
> Removing hw support triggers a ABI failure for FreeBSD.
> 
> 
> - This had been reported by UNH CI:
> http://mails.dpdk.org/archives/test-report/2021-June/200637.html 
> It seems the result has been ignored but it should have at least
> raised some discussion.
> 
This was highlighted to CI ML
http://patches.dpdk.org/project/dpdk/patch/0686a7c3fb3a22e37378a8545bc37bce04f4c391.1624481225.git.sthotton@marvell.com/

but I think I missed following up with Brandon and applied the patch,
as it did not look like an issue to me since the octeon drivers are not currently built on FreeBSD.
Not sure why the event driver is getting built there.

> 
> - I asked UNH to stop testing FreeBSD abi for now, waiting to get the
> main branch fixed.
> 
> I don't have the time to look at this, please can you work on it?
> 
> Several options:
> * crypto/octeontx is made so that it compiles on FreeBSD,
> * the abi check is extended to have exceptions per OS,
> * the FreeBSD abi reference is regenerated at UNH not to have those
> drivers in it (not sure it is doable),

Thanks for the suggestions, we are working on it to resolve this as soon as possible.
We may need to add an exception in the ABI checking so that it does not complain if a PMD
is not compiled.

Regards,
Akhil

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash offload
  @ 2021-07-16  8:52  3%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2021-07-16  8:52 UTC (permalink / raw)
  To: Li, Xiaoyun, Wang, Jie1X, dev; +Cc: andrew.rybchenko, stable

On 7/16/2021 9:30 AM, Li, Xiaoyun wrote:
>> -----Original Message-----
>> From: stable <stable-bounces@dpdk.org> On Behalf Of Li, Xiaoyun
>> Sent: Thursday, July 15, 2021 12:54
>> To: Wang, Jie1X <jie1x.wang@intel.com>; dev@dpdk.org
>> Cc: andrew.rybchenko@oktetlabs.ru; stable@dpdk.org
>> Subject: Re: [dpdk-stable] [PATCH v4] app/testpmd: fix testpmd doesn't show
>> RSS hash offload
>>
>>> -----Original Message-----
>>> From: Wang, Jie1X <jie1x.wang@intel.com>
>>> Sent: Thursday, July 15, 2021 19:57
>>> To: dev@dpdk.org
>>> Cc: Li, Xiaoyun <xiaoyun.li@intel.com>; andrew.rybchenko@oktetlabs.ru;
>>> Wang, Jie1X <jie1x.wang@intel.com>; stable@dpdk.org
>>> Subject: [PATCH v4] app/testpmd: fix testpmd doesn't show RSS hash
>>> offload
>>>
>>> The driver may change the offloads info in dev->data->dev_conf in
>>> dev_configure, which may cause port->dev_conf and port->rx_conf to
>> contain outdated values.
>>>
>>> This patch updates the offloads info if it changes to fix this issue.
>>>
>>> Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Jie Wang <jie1x.wang@intel.com>
>>> ---
>>> v4: delete the whitespace at the end of the line.
>>> v3:
>>>  - check and update the "offloads" of "port->dev_conf.rx/txmode".
>>>  - update the commit log.
>>> v2: copy "rx/txmode.offloads", instead of copying the entire struct
>>> "dev->data-
>>>> dev_conf.rx/txmode".
>>> ---
>>>  app/test-pmd/testpmd.c | 27 +++++++++++++++++++++++++++
>>>  1 file changed, 27 insertions(+)
>>
>> Acked-by: Xiaoyun Li <xiaoyun.li@intel.com>
> 
> Although I gave my ack, the app shouldn't touch rte_eth_devices, which this patch does. Usually, testpmd should only call functions like eth_dev_info_get_print_err().
> But dev_info doesn't contain the info dev->data->dev_conf which the driver modifies.
> 
> Probably we need a better fix.
> 

Agree, an application accessing 'rte_eth_devices' directly is a sign of
something missing/wrong.

In this case there is no way for the application to know the configured
offload settings per port and queue, which is the missing part, I think.

As you said, normally we get data from the PMD mainly via 'rte_eth_dev_info_get()',
which is an overloaded function: it provides many different things, like driver
default values, limitations, current config/status, capabilities, etc.

So I think we can do a few things:
1) Add the current offload configuration to 'rte_eth_dev_info_get()', so the
application can get it and use it.
The advantage is that this API is already called in many places, many times, so
there is a big chance the application already has this information when it
needs it.
The disadvantage is that, as mentioned above, the API is already big and messy;
making it bigger makes it more error prone and easier to break the ABI.

2) Add a new API to get the configured offload information, i.e. a specific API
for it.

3) Add a more generic API to get the configured config (dev_conf), which will
cover offloads too.
The disadvantage is that it can unintentionally leak too much internal config
to the user.

^ permalink raw reply	[relevance 3%]

* Re: [dpdk-dev] RFC enabling dll/dso for dpdk on windows
  2021-07-09  1:03  2%   ` Tyler Retzlaff
@ 2021-07-16  9:40  4%     ` Dmitry Kozlyuk
  0 siblings, 0 replies; 200+ results
From: Dmitry Kozlyuk @ 2021-07-16  9:40 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, thomas

2021-07-08 18:03 (UTC-0700), Tyler Retzlaff:
> On Thu, Jul 08, 2021 at 11:49:53PM +0300, Dmitry Kozlyuk wrote:
> > Hi Tyler,
> > 
> > 2021-07-08 12:21 (UTC-0700), Tyler Retzlaff:  
> > > hi folks,
> > > 
> > > we would like to submit a a patch series that makes dll/dso for dpdk
> > > work on windows. there are two differences in the windows platform that
> > > would need to be address through enhancements to dpdk.
> > > 
> > > (1) windows dynamic objects don't export sufficient information for
> > >     tls variables and the windows loader and runtime would need to be
> > >     enhanced in order to perform runtime linking. [1][2]  
> > 
> > When will the new loader be available?  
> 
> the solution i have prototyped does not directly export the tls variables
> and instead relies on exports of tls offsets within a module.  no loader
> change or new os is required.
> 
> > Will it be ported to Server 2019?  
> 
> not necessary (as per above)
> 
> > Will this functionality require compiler support  
> 
> the prototype was developed using windows clang, mingw code compiles but
> i did not try to run it. i suspect it is okay though i haven't examined any
> side-effects when using emul tls like mingw does. anyway mingw dll's
> don't work now and it probably shouldn't block them being available with
> clang.

AFAIK it's the opposite. MinGW can handle TLS variable exports from a DLL,
although with an "__emutls." prefix and some performance penalty.
Clang can't at all, despite compiling and linking without an issue.

No, it is not acceptable to add a generic feature supported by only one
compiler. (FWIW, I'm displeased even by mlx5 being tied to clang.)
In particular, I don't understand how MinGW and clang could coexist
if they export different sets of symbols. Would apps need to know whether
DPDK was MinGW- or clang-compiled? Sounds messy.

> > (you mention that accessing such variables will be "non-trivial")?  
> 
> the solution involves exporting offsets that then allow explicit tls
> accesses relative to the gs segment. it's non-trivial in the sense that
> none of the normal explicit tls functions in windows are used and the
> compiler doesn't generate the code for implicit tls access. the overhead
> is relatively tolerable (one or two additional dereferences).

A thorough benchmark will be required. I'm afraid that inline assembly
(which %gs mention suggests) can impact optimization of the code nearby.
Ideally it should be a DPDK performance autotest.
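For illustration only, a minimal sketch of the offset-based access described above (this is a guess at the shape of the idea, not the actual prototype; on Windows the per-thread base would come from the TEB via %gs rather than a __thread array):

```c
#include <stddef.h>
#include <stdint.h>

/* Hedged sketch: instead of exporting the TLS variable itself, the module
 * exports the variable's offset inside its per-thread block. A consumer
 * adds that offset to the thread's block base, i.e. one or two extra
 * dereferences per access compared to implicit TLS. All names are made up. */

/* Stands in for the module's per-thread TLS segment. */
static __thread unsigned char tls_block[64];

/* The module would export this offset instead of the TLS variable. */
static const size_t lcore_id_offset = 16;

/* Explicit access: block base plus exported offset. */
static uint32_t *tls_var_addr(void)
{
	return (uint32_t *)(tls_block + lcore_id_offset);
}
```

A benchmark comparing this against plain __thread access would quantify the penalty mentioned above.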

> 
> >    
> > > (2) importing exported data symbols from a dll/dso on windows requires
> > >     that the symbol be decorated with dllimport. optionally loading
> > >     performance of dll/dso is also further improved by decorating
> > >     exported function symbols. [3]  
> > 
> > Does it affect ABI?  
> 
> the data symbols are already part of the abi for linux. this just allows
> them to be properly accessed when exported from dll on windows.
> surprisingly lld-link doesn't fail when building dll's now, although it
> should in the absence of __declspec(dllimport), as ms link would.
> 
> on windows now the tls variables are exported but not useful with this
> change we would choose not to export them at all and each exported tls
> variable would be replaced with a new variable.
> 
> one nit (which we will get separate feedback on) is how to export
> symbols only on windows (and don't export them on linux) because similar
> to the tls variables linux has no use for my new variables.

There's already WINDOWS_NO_EXPORT mark in .map to generate .def,
likewise, .map for Linux/FreeBSD could be generated from a basic .map
with similar marks.
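The decoration being discussed usually follows a pattern like this sketch (RTE_DECL_EXPORT and DPDK_BUILD_DLL are made-up names for illustration; on ELF platforms the macro expands to nothing and exports come from the version map):

```c
/* Hedged sketch of dllimport/dllexport decoration, not the actual patch:
 * the same header must expand to dllexport when building the DLL and to
 * dllimport when consuming it. */
#if defined(_WIN32) && defined(DPDK_BUILD_DLL)
#define RTE_DECL_EXPORT __declspec(dllexport)	/* building the DLL itself */
#elif defined(_WIN32)
#define RTE_DECL_EXPORT __declspec(dllimport)	/* consuming the DLL */
#else
#define RTE_DECL_EXPORT	/* ELF: exports come from the version map instead */
#endif

/* Data symbols must be decorated or MS link fails to import them properly;
 * decorating function symbols only improves DLL load time. */
RTE_DECL_EXPORT extern int example_counter;
int example_counter = 7;

RTE_DECL_EXPORT int example_next(void);
int example_next(void)
{
	return ++example_counter;
}
```

On Linux/FreeBSD this compiles unchanged, which is why the change can be mechanical.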

> > 
> > It is also a huge code change, although a mechanical one.
> > Is it required? All exported symbols are listed in .map/def, after all.  
> 
> if broad sweeping mechanical change is a sensitive issue we can limit
> the change to just the data symbols which are required. but keeping in
> mind there is a penalty on load time when the function symbols are not
> decorated. ultimately we would like them all properly decorated but we
> don't need to push it now since we're just trying to enable the
> functionality.

I was asking in connection with the previous question about ABI,
because the 21.11 ABI freeze may be a two-year one. Since the ABI is not
affected for Unix, and for Windows we don't currently maintain it, there is
at least no rush for the change.


^ permalink raw reply	[relevance 4%]

-- links below jump to the message on this page --
2020-04-24  7:07     [dpdk-dev] [PATCH v1 0/2] Use WFE for spinlock and ring Gavin Hu
2021-04-25  5:56     ` [dpdk-dev] " Ruifeng Wang
2021-07-07 14:47  0%   ` Stephen Hemminger
2021-07-08  9:41  0%     ` Ruifeng Wang
2021-07-08 16:58  0%       ` Honnappa Nagarahalli
2021-07-07  5:43  3% ` [dpdk-dev] [PATCH v4 0/3] " Ruifeng Wang
2021-07-07  5:48  3% ` Ruifeng Wang
2021-07-09 18:39  0%   ` Thomas Monjalon
2021-05-08  8:00     [dpdk-dev] [RFC] lib/ethdev: add dev configured flag Huisong Li
2021-07-06  4:10     ` [dpdk-dev] [PATCH V2] ethdev: " Huisong Li
2021-07-06  8:36  4%   ` Andrew Rybchenko
2021-07-07  2:55  0%     ` Huisong Li
2021-07-07  8:25  3%       ` Andrew Rybchenko
2021-07-07  9:26  0%         ` Huisong Li
2021-07-07  7:39  3%     ` David Marchand
2021-07-07  8:23  0%       ` Andrew Rybchenko
2021-07-07  9:36  0%         ` David Marchand
2021-07-07  9:59  0%           ` Thomas Monjalon
2021-07-07  9:53     ` [dpdk-dev] [PATCH V3] " Huisong Li
2021-07-08  9:56  3%   ` David Marchand
2021-05-10 13:47     [dpdk-dev] [RFC v2] bus/auxiliary: introduce auxiliary bus Xueming Li
2021-06-21 16:11     ` [dpdk-dev] [PATCH v4 2/2] " Thomas Monjalon
2021-06-22 23:50       ` Xueming(Steven) Li
2021-06-23  8:15  4%     ` Thomas Monjalon
2021-06-23 14:52  3%       ` Xueming(Steven) Li
2021-06-24  6:37  3%         ` Thomas Monjalon
2021-06-24  8:42  3%           ` Xueming(Steven) Li
2021-05-27 15:24     [dpdk-dev] [PATCH 00/20] net/sfc: support flow API COUNT action Andrew Rybchenko
2021-06-18 13:40     ` [dpdk-dev] [PATCH v3 19/20] net/sfc: support flow action COUNT in transfer rules Andrew Rybchenko
2021-06-21  8:28       ` David Marchand
2021-06-21  9:30         ` Thomas Monjalon
2021-07-01  9:22           ` Andrew Rybchenko
2021-07-01 12:34             ` David Marchand
2021-07-01 13:05               ` Andrew Rybchenko
2021-07-02  8:43                 ` Andrew Rybchenko
2021-07-02 13:37  3%               ` David Marchand
2021-07-02 13:39  0%                 ` Andrew Rybchenko
2021-07-02 12:30     ` Thomas Monjalon
2021-07-02 12:53       ` Andrew Rybchenko
2021-07-04 19:45  3%     ` Thomas Monjalon
2021-07-05  8:41  0%       ` Andrew Rybchenko
2021-05-27 15:28     [dpdk-dev] [PATCH] net: introduce IPv4 ihl and version fields Gregory Etelson
2021-05-27 15:56     ` Morten Brørup
2021-05-28 10:20       ` Ananyev, Konstantin
2021-05-28 10:52         ` Morten Brørup
2021-05-28 14:18           ` Gregory Etelson
2021-05-31  9:58             ` Ananyev, Konstantin
2021-05-31 11:10               ` Gregory Etelson
2021-06-02  9:51                 ` Gregory Etelson
2021-06-10  4:10                   ` Gregory Etelson
2021-06-10  9:22                     ` Olivier Matz
2021-06-14 16:36                       ` Andrew Rybchenko
2021-06-17 16:29  0%                     ` Ferruh Yigit
2021-06-17 15:02  3% ` Tyler Retzlaff
2021-06-01  1:56     [dpdk-dev] [PATCH v1 0/2] relative path support for ABI compatibility check Feifei Wang
2021-06-01  1:56     ` [dpdk-dev] [PATCH v1 1/2] devtools: add " Feifei Wang
2021-06-22  2:08  4%   ` [dpdk-dev] Re: " Feifei Wang
2021-06-22  9:19  4%   ` [dpdk-dev] " Bruce Richardson
2021-06-01 12:00     [dpdk-dev] [PATCH v1 0/7] Enhancements for PMD power management Anatoly Burakov
2021-06-25 14:00     ` [dpdk-dev] [PATCH v2 " Anatoly Burakov
2021-06-25 14:00  3%   ` [dpdk-dev] [PATCH v2 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-06-25 14:00  3%   ` [dpdk-dev] [PATCH v2 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-06-28 12:41       ` [dpdk-dev] [PATCH v3 0/7] Enhancements for PMD power management Anatoly Burakov
2021-06-28 12:41  3%     ` [dpdk-dev] [PATCH v3 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-06-28 12:41  3%     ` [dpdk-dev] [PATCH v3 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-06-28 15:54         ` [dpdk-dev] [PATCH v4 0/7] Enhancements for PMD power management Anatoly Burakov
2021-06-28 15:54  3%       ` [dpdk-dev] [PATCH v4 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-06-28 15:54  3%       ` [dpdk-dev] [PATCH v4 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-06-29 15:48           ` [dpdk-dev] [PATCH v5 0/7] Enhancements for PMD power management Anatoly Burakov
2021-06-29 15:48  3%         ` [dpdk-dev] [PATCH v5 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-06-29 15:48  3%         ` [dpdk-dev] [PATCH v5 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-07-05 15:21             ` [dpdk-dev] [PATCH v6 0/7] Enhancements for PMD power management Anatoly Burakov
2021-07-05 15:21  3%           ` [dpdk-dev] [PATCH v6 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-07-05 15:21  3%           ` [dpdk-dev] [PATCH v6 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-07-07 10:48               ` [dpdk-dev] [PATCH v7 0/7] Enhancements for PMD power management Anatoly Burakov
2021-07-07 10:48  3%             ` [dpdk-dev] [PATCH v7 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-07-07 10:48  3%             ` [dpdk-dev] [PATCH v7 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-07-08 14:13                 ` [dpdk-dev] [PATCH v8 0/7] Enhancements for PMD power management Anatoly Burakov
2021-07-08 14:13  3%               ` [dpdk-dev] [PATCH v8 1/7] power_intrinsics: use callbacks for comparison Anatoly Burakov
2021-07-08 16:56  0%                 ` McDaniel, Timothy
2021-07-08 14:13  3%               ` [dpdk-dev] [PATCH v8 4/7] power: remove thread safety from PMD power API's Anatoly Burakov
2021-07-09 15:53                   ` [dpdk-dev] [PATCH v9 0/8] Enhancements for PMD power management Anatoly Burakov
2021-07-09 15:53  3%                 ` [dpdk-dev] [PATCH v9 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
2021-07-09 16:00  3%                   ` Anatoly Burakov
2021-07-09 15:53  3%                 ` [dpdk-dev] [PATCH v9 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
2021-07-09 16:00  3%                   ` Anatoly Burakov
2021-07-09 16:08                     ` [dpdk-dev] [PATCH v10 0/8] Enhancements for PMD power management Anatoly Burakov
2021-07-09 16:08  3%                   ` [dpdk-dev] [PATCH v10 1/8] eal: use callbacks for power monitoring comparison Anatoly Burakov
2021-07-09 16:08  3%                   ` [dpdk-dev] [PATCH v10 5/8] power: remove thread safety from PMD power API's Anatoly Burakov
2021-06-04 23:38     [dpdk-dev] [PATCH v8 00/10] eal: Add EAL API for threading Narcisa Ana Maria Vasile
2021-06-04 23:44     ` [dpdk-dev] [PATCH v9 " Narcisa Ana Maria Vasile
2021-06-04 23:44       ` [dpdk-dev] [PATCH v9 10/10] Enable the new EAL thread API Narcisa Ana Maria Vasile
2021-06-08  5:50         ` Narcisa Ana Maria Vasile
2021-06-08  7:45           ` David Marchand
2021-06-18 21:53  0%         ` Narcisa Ana Maria Vasile
2021-06-18 21:26  3%   ` [dpdk-dev] [PATCH v10 0/9] eal: Add EAL API for threading Narcisa Ana Maria Vasile
2021-06-14 10:58     [dpdk-dev] [PATCH] parray: introduce internal API for dynamic arrays Thomas Monjalon
2021-06-14 12:22     ` Morten Brørup
2021-06-14 13:15       ` Bruce Richardson
2021-06-14 13:32     ` Thomas Monjalon
2021-06-14 14:59       ` Ananyev, Konstantin
2021-06-14 15:54         ` Ananyev, Konstantin
2021-06-17 13:08  3%       ` Ferruh Yigit
2021-06-17 14:58  0%         ` Ananyev, Konstantin
2021-06-17 15:17  0%           ` Morten Brørup
2021-06-17 16:12  0%             ` Ferruh Yigit
2021-06-17 16:55  0%               ` Morten Brørup
2021-06-18 10:21  0%                 ` Ferruh Yigit
2021-06-17 17:05  0%               ` Ananyev, Konstantin
2021-06-18 10:28  0%                 ` Ferruh Yigit
2021-06-17 15:44  3%           ` Ferruh Yigit
2021-06-18 10:41  0%             ` Ananyev, Konstantin
2021-06-18 10:49  0%               ` Ferruh Yigit
2021-06-21 11:06  0%               ` Ananyev, Konstantin
2021-06-21 14:05  0%                 ` Ferruh Yigit
2021-06-21 14:42  0%                   ` Ananyev, Konstantin
2021-06-21 15:32  0%                     ` Ferruh Yigit
2021-06-21 15:37  0%                       ` Ananyev, Konstantin
2021-06-14 15:48       ` Morten Brørup
2021-06-15  6:48         ` Thomas Monjalon
2021-06-16  9:42           ` Jerin Jacob
2021-06-16 11:27             ` Morten Brørup
2021-06-16 13:02  0%           ` Bruce Richardson
2021-06-16 15:01  0%             ` Morten Brørup
2021-06-15  9:01     [dpdk-dev] [RFC PATCH v2 0/3] Add PIE support for HQoS library Liguzinski, WojciechX
2021-06-21  7:35  3% ` [dpdk-dev] [RFC PATCH v3 " Liguzinski, WojciechX
2021-07-05  8:04  3%   ` [dpdk-dev] [RFC PATCH v4 " Liguzinski, WojciechX
2021-06-17  9:17     [dpdk-dev] [PATCH v6] ethdev: add new ext hdr for gtp psc Raslan Darawsheh
2021-06-22  7:27     ` Singh, Aman Deep
2021-07-01 14:06       ` Andrew Rybchenko
2021-07-06 14:24         ` Raslan Darawsheh
2021-07-08  9:23           ` Andrew Rybchenko
2021-07-08  9:27  4%         ` Raslan Darawsheh
2021-07-08  9:39  0%           ` Andrew Rybchenko
2021-07-08 10:29  0%             ` Thomas Monjalon
2021-06-18 16:36  5% [dpdk-dev] [PATCH] devtools: script to track map symbols Ray Kinsella
2021-06-21 15:25  6% ` [dpdk-dev] [PATCH v3] " Ray Kinsella
2021-06-21 15:35  6% ` [dpdk-dev] [PATCH v4] " Ray Kinsella
2021-06-22 10:19  6% ` [dpdk-dev] [PATCH v5] " Ray Kinsella
2021-06-18 21:54     [dpdk-dev] [PATCH 0/6] Enable the internal EAL thread API Narcisa Ana Maria Vasile
2021-06-18 21:54  4% ` [dpdk-dev] [PATCH 2/6] eal: add function for control thread creation Narcisa Ana Maria Vasile
2021-06-19  1:57  4% ` [dpdk-dev] [PATCH v2 0/6] Enable the internal EAL thread API Narcisa Ana Maria Vasile
2021-06-19  1:57  4%   ` [dpdk-dev] [PATCH v2 2/6] eal: add function for control thread creation Narcisa Ana Maria Vasile
2021-06-21  9:18     [dpdk-dev] [PATCH] devtools: script to track map symbols Kinsella, Ray
2021-06-21 15:11  5% ` Ray Kinsella
2021-06-22 15:50 12% [dpdk-dev] [PATCH v1] doc: update ABI in MAINTAINERS file Ray Kinsella
2021-06-25  8:08  7% ` Ferruh Yigit
2021-07-09 15:50  4%   ` Thomas Monjalon
2021-06-22 16:48     [dpdk-dev] [PATCH 0/2] OCTEONTX crypto adapter support Shijith Thotton
2021-06-23 20:53     ` [dpdk-dev] [PATCH v2 " Shijith Thotton
2021-06-23 20:53       ` [dpdk-dev] [PATCH v2 1/2] drivers: add octeontx crypto adapter framework Shijith Thotton
2021-07-15 14:21  5%     ` David Marchand
2021-07-16  8:39  3%       ` [dpdk-dev] [EXT] " Akhil Goyal
2021-06-23 20:53       ` [dpdk-dev] [PATCH v2 2/2] drivers: add octeontx crypto adapter data path Shijith Thotton
2021-06-30  8:54         ` Akhil Goyal
2021-06-30 16:23  4%       ` [dpdk-dev] [dpdk-ci] " Brandon Lo
2021-06-23  0:03     [dpdk-dev] [PATCH v5 2/2] bus/auxiliary: introduce auxiliary bus Xueming Li
2021-06-25 11:47     ` [dpdk-dev] [PATCH v6 " Xueming Li
2021-07-04 16:13  3%   ` Andrew Rybchenko
2021-07-05  5:47  0%     ` Xueming(Steven) Li
2021-06-24 10:28  3% [dpdk-dev] Experimental symbols in security lib Kinsella, Ray
2021-06-24 10:49  0% ` Kinsella, Ray
2021-06-24 12:22  0%   ` [dpdk-dev] [EXT] " Akhil Goyal
2021-06-24 10:28     [dpdk-dev] [PATCH 1/2] security: enforce semantics for Tx inline processing Akhil Goyal
2021-07-06 10:56     ` Ananyev, Konstantin
2021-07-06 12:27       ` Nithin Dabilpuram
2021-07-06 12:42         ` Ananyev, Konstantin
2021-07-06 12:58           ` Nithin Dabilpuram
2021-07-06 14:07             ` Ananyev, Konstantin
2021-07-07  9:07               ` Nithin Dabilpuram
2021-07-07  9:59                 ` Ananyev, Konstantin
2021-07-07 11:22                   ` Nithin Dabilpuram
2021-07-10 12:57                     ` Ananyev, Konstantin
2021-07-12 17:01                       ` Nithin Dabilpuram
2021-07-13 12:33  3%                     ` Ananyev, Konstantin
2021-07-13 14:08  0%                       ` Ananyev, Konstantin
2021-07-13 15:58  0%                         ` Nithin Dabilpuram
2021-07-14 11:09  0%                           ` Ananyev, Konstantin
2021-07-14 13:29  0%                             ` Nithin Dabilpuram
2021-07-14 17:28  0%                               ` Ananyev, Konstantin
2021-06-24 10:29  3% [dpdk-dev] Experimental symbols in net lib Kinsella, Ray
2021-06-24 10:29  3% [dpdk-dev] Experimental symbols in mbuf lib Kinsella, Ray
2021-06-24 10:30  3% [dpdk-dev] Experimental symbols in vhost lib Kinsella, Ray
2021-06-24 11:04  0% ` Xia, Chenbo
2021-06-24 10:30  3% [dpdk-dev] Experimental symbols in flow_classify lib Kinsella, Ray
2021-06-24 10:31  3% [dpdk-dev] Experimental symbols in eal lib Kinsella, Ray
2021-06-24 12:14  0% ` David Marchand
2021-06-24 12:15  0%   ` Kinsella, Ray
2021-06-29 16:50  0%   ` Tyler Retzlaff
2021-06-24 10:31  3% [dpdk-dev] Experimental symbols in port lib Kinsella, Ray
2021-06-24 10:32  3% [dpdk-dev] Experimental symbols in compressdev lib Kinsella, Ray
2021-06-24 10:55  0% ` Trahe, Fiona
2021-06-25  7:49  0% ` David Marchand
2021-06-25  9:14  0%   ` Kinsella, Ray
2021-06-24 10:33  3% [dpdk-dev] Experimental symbols in sched lib Kinsella, Ray
2021-06-24 19:21  0% ` Singh, Jasvinder
2021-06-24 10:33  3% [dpdk-dev] Experimental symbols in cryptodev lib Kinsella, Ray
2021-06-24 10:34  3% [dpdk-dev] Experimental symbols in rib lib Kinsella, Ray
2021-06-24 10:34  3% [dpdk-dev] Experimental symbols in pipeline lib Kinsella, Ray
2021-06-24 10:34  3% [dpdk-dev] Experimental symbols in ip_frag Kinsella, Ray
2021-06-24 10:35  3% [dpdk-dev] Experimental symbols in bbdev lib Kinsella, Ray
2021-06-24 15:42  3% ` Chautru, Nicolas
2021-06-24 19:27  3%   ` Kinsella, Ray
2021-06-25  7:48  0% ` David Marchand
2021-06-24 10:36  3% [dpdk-dev] Experimental Symbols in ethdev lib Kinsella, Ray
2021-06-24 10:36  3% [dpdk-dev] Experimental Symbols in kvargs Kinsella, Ray
2021-06-24 10:39  3% [dpdk-dev] Experimental symbols in power lib Kinsella, Ray
2021-06-24 10:42  3% [dpdk-dev] Experimental symbols in kni lib Kinsella, Ray
2021-06-24 13:24  0% ` Ferruh Yigit
2021-06-24 13:54  0%   ` Kinsella, Ray
2021-06-25 13:26  0%     ` Igor Ryzhov
2021-06-28 12:23  0%       ` Ferruh Yigit
2021-06-24 10:44  3% [dpdk-dev] Experimental symbols in metrics lib Kinsella, Ray
2021-06-24 10:46  3% [dpdk-dev] Experimental symbols in fib lib Kinsella, Ray
     [not found]     <c6c3ce36-9585-6fcb-8899-719d6b8a368b@ashroe.eu>
2021-06-24 10:47  0% ` [dpdk-dev] Experimental symbols in hash lib Kinsella, Ray
2021-06-25 11:47     [dpdk-dev] [PATCH v6 1/2] devargs: add common key definition Xueming Li
2021-07-05  6:45     ` [dpdk-dev] [PATCH v8 2/2] bus/auxiliary: introduce auxiliary bus Xueming Li
2021-07-05  9:19  3%   ` Andrew Rybchenko
2021-07-05  9:30  0%     ` Xueming(Steven) Li
2021-07-05  9:35  0%       ` Andrew Rybchenko
2021-07-05 14:57  0%         ` Thomas Monjalon
2021-07-05 15:06  0%           ` Andrew Rybchenko
2021-06-26 15:41  1% [dpdk-dev] 20.11.2 patches review and test Xueming(Steven) Li
2021-06-26 23:08  1% Xueming Li
2021-06-26 23:28  1% Xueming Li
2021-06-30 10:33  0% ` Jiang, YuX
2021-07-06  2:37  0%   ` Xueming(Steven) Li
2021-07-06  3:26  0% ` [dpdk-dev] [dpdk-stable] " Kalesh Anakkur Purayil
2021-07-06  6:47  0%   ` Xueming(Steven) Li
2021-06-29 16:00 21% [dpdk-dev] [PATCH v1] doc: policy on promotion of experimental APIs Ray Kinsella
2021-06-29 16:28  3% ` Tyler Retzlaff
2021-06-29 18:38  0%   ` Kinsella, Ray
2021-06-30 19:56  4%     ` Tyler Retzlaff
2021-07-01  7:56  0%       ` Ferruh Yigit
2021-07-01 14:45  4%         ` Tyler Retzlaff
2021-07-01 10:19  4%       ` Kinsella, Ray
2021-07-01 15:09  4%         ` Tyler Retzlaff
2021-07-02  6:30  4%           ` Kinsella, Ray
2021-07-01 10:31 23% ` [dpdk-dev] [PATCH v2] " Ray Kinsella
2021-07-01 10:38 23% ` [dpdk-dev] [PATCH v3] doc: policy on the " Ray Kinsella
2021-07-07 18:32  0%   ` Tyler Retzlaff
2021-07-09  6:16  0%   ` Jerin Jacob
2021-07-09 19:15  3%     ` Tyler Retzlaff
2021-07-11  7:22  0%       ` Jerin Jacob
2021-06-30 12:46     [dpdk-dev] [PATCH] test: fix crypto_op length for sessionless case Abhinandan Gujjar
2021-07-02 17:08     ` Gujjar, Abhinandan S
2021-07-02 23:26       ` Ferruh Yigit
2021-07-05  6:30         ` Gujjar, Abhinandan S
2021-07-06 16:09  3%       ` Brandon Lo
2021-07-01 16:30  4% [dpdk-dev] DPDK Release Status Meeting 01/07/2021 Mcnamara, John
2021-07-02  8:00  8% [dpdk-dev] ABI/API stability towards drivers Morten Brørup
2021-07-02  9:45  7% ` [dpdk-dev] [dpdk-techboard] " Ferruh Yigit
2021-07-02 12:26  4% ` Thomas Monjalon
2021-07-07 18:46  8% ` [dpdk-dev] " Tyler Retzlaff
2021-07-02 13:18     [dpdk-dev] [PATCH] dmadev: introduce DMA device library Chengwen Feng
2021-07-04  9:30  3% ` Jerin Jacob
2021-07-05 10:52  0%   ` Bruce Richardson
2021-07-05 15:55  0%     ` Jerin Jacob
2021-07-05 17:16  0%       ` Bruce Richardson
2021-07-07  8:08  0%         ` Jerin Jacob
2021-07-11  9:25     ` [dpdk-dev] [PATCH v2] " Chengwen Feng
2021-07-12 12:05  3%   ` Bruce Richardson
2021-07-12 15:50  3%   ` Bruce Richardson
2021-07-13  9:07  0%     ` Jerin Jacob
2021-07-13 14:19  3%   ` Ananyev, Konstantin
2021-07-13 14:28  0%     ` Bruce Richardson
2021-07-13 12:27     ` [dpdk-dev] [PATCH v3] " Chengwen Feng
2021-07-13 13:06  3%   ` fengchengwen
2021-07-13 13:37  0%     ` Bruce Richardson
2021-07-15  6:44  0%       ` Jerin Jacob
2021-07-02 15:23     [dpdk-dev] [PATCH 21.11] telemetry: remove experimental tags from APIs Bruce Richardson
2021-07-05 10:09     ` Power, Ciara
2021-07-05 10:58  3%   ` Bruce Richardson
2021-07-07 12:37  1% [dpdk-dev] [dpdk-announce] DPDK 20.11.2 released Xueming(Steven) Li
2021-07-07 19:30     [dpdk-dev] [pull-request] next-crypto 21.08 rc1 Akhil Goyal
2021-07-07 21:57  5% ` Thomas Monjalon
2021-07-08  7:39  0%   ` [dpdk-dev] [EXT] " Akhil Goyal
2021-07-08  7:41  0%   ` [dpdk-dev] " Thomas Monjalon
2021-07-08  7:47  3%     ` David Marchand
2021-07-08  7:48  0%     ` [dpdk-dev] [EXT] " Akhil Goyal
2021-07-08 19:21  3% [dpdk-dev] RFC enabling dll/dso for dpdk on windows Tyler Retzlaff
2021-07-08 20:49  3% ` Dmitry Kozlyuk
2021-07-09  1:03  2%   ` Tyler Retzlaff
2021-07-16  9:40  4%     ` Dmitry Kozlyuk
2021-07-09 15:19     [dpdk-dev] [PATCH 1/3] bitrate: change reg implementation to match API description Kevin Traynor
2021-07-09 15:19  3% ` [dpdk-dev] [PATCH 3/3] bitrate: promote rte_stats_bitrate_free() to stable Kevin Traynor
2021-07-12  8:02  4% [dpdk-dev] [PATCH v1] doc: update atomic operation deprecation Joyce Kong
2021-07-12 16:17  3% [dpdk-dev] [PATCH] ethdev: fix representor port ID search by name Andrew Rybchenko
2021-07-13 13:35  3% [dpdk-dev] [PATCH 00/10] new features for ipsec and security libraries Radu Nicolau
2021-07-13 20:12  3% [dpdk-dev] [PATCH] eal: fix argument to rte_bsf32_safe Stephen Hemminger
2021-07-14 15:11  4% [dpdk-dev] Minutes of Technical Board Meeting, 2021-06-30 Aaron Conole
2021-07-14 15:15  4% [dpdk-dev] Minutes of Technical Board Meeting, 2021-06-16 Thomas Monjalon
2021-07-15  9:29  5% [dpdk-dev] Techboard - minutes of meeting 2021-07-14 Bruce Richardson
2021-07-15 11:33     [dpdk-dev] [PATCH v3] app/testpmd: fix testpmd doesn't show RSS hash offload Jie Wang
2021-07-15 11:57     ` [dpdk-dev] [PATCH v4] " Jie Wang
2021-07-15  4:53       ` Li, Xiaoyun
2021-07-16  8:30         ` Li, Xiaoyun
2021-07-16  8:52  3%       ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
2021-07-15 22:28  4% [dpdk-dev] DPDK Release Status Meeting 15/07/2021 Mcnamara, John
