* [dpdk-dev] Arm roadmap for 20.05 @ 2020-03-10 16:42 Honnappa Nagarahalli 2020-03-11 8:25 ` Mattias Rönnblom 0 siblings, 1 reply; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-03-10 16:42 UTC (permalink / raw) To: dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, Honnappa Nagarahalli, nd, nd Hello, Following are the work items planned for 20.05: 1) Use C11 atomic APIs in timer library 2) Use C11 atomic APIs in service cores 3) Use C11 atomics in VirtIO split ring 4) Performance optimizations in i40e and MLX drivers for Arm platforms 5) RCU defer API 6) Enable Travis CI with no huge-page tests - ~25 test cases Thank you, Honnappa ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-10 16:42 [dpdk-dev] Arm roadmap for 20.05 Honnappa Nagarahalli @ 2020-03-11 8:25 ` Mattias Rönnblom 2020-03-20 20:45 ` Honnappa Nagarahalli 0 siblings, 1 reply; 14+ messages in thread From: Mattias Rönnblom @ 2020-03-11 8:25 UTC (permalink / raw) To: Honnappa Nagarahalli, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > Hello, > Following are the work items planned for 20.05: > > 1) Use C11 atomic APIs in timer library > 2) Use C11 atomic APIs in service cores > 3) Use C11 atomics in VirtIO split ring > 4) Performance optimizations in i40e and MLX drivers for Arm platforms > 5) RCU defer API > 6) Enable Travis CI with no huge-page tests - ~25 test cases > > Thank you, > Honnappa Maybe you should have a look at legacy DPDK atomics as well? Avoiding a full barrier for the add operation, for example. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-11 8:25 ` Mattias Rönnblom @ 2020-03-20 20:45 ` Honnappa Nagarahalli 2020-03-21 8:17 ` Mattias Rönnblom 0 siblings, 1 reply; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-03-20 20:45 UTC (permalink / raw) To: Mattias Rönnblom, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, Honnappa Nagarahalli, nd <snip> > Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > > On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > > Hello, > > Following are the work items planned for 20.05: > > > > 1) Use C11 atomic APIs in timer library > > 2) Use C11 atomic APIs in service cores > > 3) Use C11 atomics in VirtIO split ring > > 4) Performance optimizations in i40e and MLX drivers for Arm platforms > > 5) RCU defer API > > 6) Enable Travis CI with no huge-page tests - ~25 test cases > > > > Thank you, > > Honnappa > > Maybe you should have a look at legacy DPDK atomics as well? Avoiding a full > barrier for the add operation, for example. By legacy, I believe you meant rte_atomic APIs. Those APIs do not take memory order as a parameter. So, it is difficult to change the implementation for those APIs. For ex: the add operation could take a RELEASE or RELAXED order depending on the use case. So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs directly. The proposal is here: https://patches.dpdk.org/cover/66745/ > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-20 20:45 ` Honnappa Nagarahalli @ 2020-03-21 8:17 ` Mattias Rönnblom 2020-03-21 8:23 ` Mattias Rönnblom 2020-03-23 17:12 ` Honnappa Nagarahalli 0 siblings, 2 replies; 14+ messages in thread From: Mattias Rönnblom @ 2020-03-21 8:17 UTC (permalink / raw) To: Honnappa Nagarahalli, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd On 2020-03-20 21:45, Honnappa Nagarahalli wrote: > <snip> > >> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 >> >> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: >>> Hello, >>> Following are the work items planned for 20.05: >>> >>> 1) Use C11 atomic APIs in timer library >>> 2) Use C11 atomic APIs in service cores >>> 3) Use C11 atomics in VirtIO split ring >>> 4) Performance optimizations in i40e and MLX drivers for Arm platforms >>> 5) RCU defer API >>> 6) Enable Travis CI with no huge-page tests - ~25 test cases >>> >>> Thank you, >>> Honnappa >> Maybe you should have a look at legacy DPDK atomics as well? Avoiding a full >> barrier for the add operation, for example. > By legacy, I believe you meant rte_atomic APIs. Those APIs do not take memory order as a parameter. So, it is difficult to change the implementation for those APIs. For ex: the add operation could take a RELEASE or RELAXED order depending on the use case. > So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs directly. The proposal is here: https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185-865b3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2-aa5e88331555&u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F Even though rte_atomic lacks the flexibility of C11 atomics, there might still be areas of improvement. Such improvements will have an instant effect, as opposed to waiting for all the rte_atomic users to change. The rte_atomic API leaves ordering unspecified, unfortunately. In the Linux kernel, from which DPDK seems to borrow much of the atomics and memory order related semantics, an atomic add doesn't imply any memory barriers. The current __sync_fetch_and_add()-based implementation implies a full barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If you would use C11 atomics to implement rte_atomic in ARM, you could use a relaxed memory order on rte_atomic*_add() (assuming you agree those are the implicit semantics of the legacy API) and just get an ldadd instruction. An alternative would be to implement the same thing in assembler, of course. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-21 8:17 ` Mattias Rönnblom @ 2020-03-21 8:23 ` Mattias Rönnblom 2020-03-23 17:14 ` Honnappa Nagarahalli 2020-03-23 17:12 ` Honnappa Nagarahalli 1 sibling, 1 reply; 14+ messages in thread From: Mattias Rönnblom @ 2020-03-21 8:23 UTC (permalink / raw) To: Honnappa Nagarahalli, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd On 2020-03-21 09:17, Mattias Rönnblom wrote: > On 2020-03-20 21:45, Honnappa Nagarahalli wrote: >> <snip> >> >>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 >>> >>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: >>>> Hello, >>>> Following are the work items planned for 20.05: >>>> >>>> 1) Use C11 atomic APIs in timer library >>>> 2) Use C11 atomic APIs in service cores >>>> 3) Use C11 atomics in VirtIO split ring >>>> 4) Performance optimizations in i40e and MLX drivers for Arm platforms >>>> 5) RCU defer API >>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases >>>> >>>> Thank you, >>>> Honnappa >>> Maybe you should have a look at legacy DPDK atomics as well? Avoiding a full >>> barrier for the add operation, for example. >> By legacy, I believe you meant rte_atomic APIs. Those APIs do not take memory order as a parameter. So, it is difficult to change the implementation for those APIs. For ex: the add operation could take a RELEASE or RELAXED order depending on the use case. >> So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs directly. The proposal is here: https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185-865b3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2-aa5e88331555&u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > Even though rte_atomic lacks the flexibility of C11 atomics, there might > still be areas of improvement. Such improvements will have an instant > effect, as opposed to waiting for all the rte_atomic users to change. > > > The rte_atomic API leaves ordering unspecified, unfortunately. In the > Linux kernel, from which DPDK seems to borrow much of the atomics and > memory order related semantics, an atomic add doesn't imply any memory > barriers. The current __sync_fetch_and_add()-based implementation > implies a full barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If > you would use C11 atomics to implement rte_atomic in ARM, you could use > a relaxed memory order on rte_atomic*_add() (assuming you agree those > are the implicit semantics of the legacy API) and just get an ldadd > instruction. An alternative would be to implement the same thing in > assembler, of course. > > Another approach might be to just scrap all of the intrinsics and inline assembler used for all the functions in rte_atomic, on all architectures, and use C11 atomics instead. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-21 8:23 ` Mattias Rönnblom @ 2020-03-23 17:14 ` Honnappa Nagarahalli 2020-03-23 17:34 ` Mattias Rönnblom 0 siblings, 1 reply; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-03-23 17:14 UTC (permalink / raw) To: Mattias Rönnblom, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, Honnappa Nagarahalli, nd <snip> > >> > >>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > >>> > >>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > >>>> Hello, > >>>> Following are the work items planned for 20.05: > >>>> > >>>> 1) Use C11 atomic APIs in timer library > >>>> 2) Use C11 atomic APIs in service cores > >>>> 3) Use C11 atomics in VirtIO split ring > >>>> 4) Performance optimizations in i40e and MLX drivers for Arm > >>>> platforms > >>>> 5) RCU defer API > >>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases > >>>> > >>>> Thank you, > >>>> Honnappa > >>> Maybe you should have a look at legacy DPDK atomics as well? > >>> Avoiding a full barrier for the add operation, for example. > >> By legacy, I believe you meant rte_atomic APIs. Those APIs do not take > memory order as a parameter. So, it is difficult to change the implementation > for those APIs. For ex: the add operation could take a RELEASE or RELAXED > order depending on the use case. > >> So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs > >> directly. The proposal is here: > >> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- > 865b > >> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > aa5e88331555& > >> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > > Even though rte_atomic lacks the flexibility of C11 atomics, there > > might still be areas of improvement. Such improvements will have an > > instant effect, as opposed to waiting for all the rte_atomic users to change. > > > > > > The rte_atomic API leaves ordering unspecified, unfortunately. In the > > Linux kernel, from which DPDK seems to borrow much of the atomics and > > memory order related semantics, an atomic add doesn't imply any memory > > barriers. The current __sync_fetch_and_add()-based implementation > > implies a full barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If > > you would use C11 atomics to implement rte_atomic in ARM, you could > > use a relaxed memory order on rte_atomic*_add() (assuming you agree > > those are the implicit semantics of the legacy API) and just get an > > ldadd instruction. An alternative would be to implement the same thing > > in assembler, of course. > > > > > Another approach might be to just scrap all of the intrinsics and inline > assembler used for all the functions in rte_atomic, on all architectures, and > use C11 atomics instead. Yes, this is the approach we are taking. But, it does not solve the use of rte_atomic APIs in the applications. > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-23 17:14 ` Honnappa Nagarahalli @ 2020-03-23 17:34 ` Mattias Rönnblom 2020-03-24 8:01 ` Morten Brørup 2020-03-24 18:53 ` Honnappa Nagarahalli 0 siblings, 2 replies; 14+ messages in thread From: Mattias Rönnblom @ 2020-03-23 17:34 UTC (permalink / raw) To: Honnappa Nagarahalli, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd On 2020-03-23 18:14, Honnappa Nagarahalli wrote: > <snip> > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 >>>>> >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: >>>>>> Hello, >>>>>> Following are the work items planned for 20.05: >>>>>> >>>>>> 1) Use C11 atomic APIs in timer library >>>>>> 2) Use C11 atomic APIs in service cores >>>>>> 3) Use C11 atomics in VirtIO split ring >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm >>>>>> platforms >>>>>> 5) RCU defer API >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases >>>>>> >>>>>> Thank you, >>>>>> Honnappa >>>>> Maybe you should have a look at legacy DPDK atomics as well? >>>>> Avoiding a full barrier for the add operation, for example. >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not take >> memory order as a parameter. So, it is difficult to change the implementation >> for those APIs. For ex: the add operation could take a RELEASE or RELAXED >> order depending on the use case. >>>> So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs >>>> directly. The proposal is here: >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- >> 865b >>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- >> aa5e88331555& >>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F >>> Even though rte_atomic lacks the flexibility of C11 atomics, there >>> might still be areas of improvement. Such improvements will have an >>> instant effect, as opposed to waiting for all the rte_atomic users to change. >>> >>> >>> The rte_atomic API leaves ordering unspecified, unfortunately. In the >>> Linux kernel, from which DPDK seems to borrow much of the atomics and >>> memory order related semantics, an atomic add doesn't imply any memory >>> barriers. The current __sync_fetch_and_add()-based implementation >>> implies a full barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If >>> you would use C11 atomics to implement rte_atomic in ARM, you could >>> use a relaxed memory order on rte_atomic*_add() (assuming you agree >>> those are the implicit semantics of the legacy API) and just get an >>> ldadd instruction. An alternative would be to implement the same thing >>> in assembler, of course. >>> >>> >> Another approach might be to just scrap all of the intrinsics and inline >> assembler used for all the functions in rte_atomic, on all architectures, and >> use C11 atomics instead. > Yes, this is the approach we are taking. But, it does not solve the use of rte_atomic APIs in the applications. Agreed. Another question. "C11 atomics" here seems to mean using GCC instrinsics, normally used to implement C11 atomics, not C11 atomics (i.e. <stdatomic.h>). What is the reason directly calling the intrinsics, rather than using the standard API? With this in mind, wouldn't be better to extend <rte_atomic.h> with functions that take a memory ordering parameter? And properly document the memory ordering for the functions already in this API, and maybe deprecate some functions in favor of others, more C11-like, functions? If not, assuming <stdatomic.h> can't be used, wouldn't it be better if we added a <rte_stdatomic.h>, which mimics the standard API, maybe with some DPDK tweaks, plus potentially with DPDK-specific extensions as well? Directly accessing instrinsics will lead to things like __atomic_add_ifless() (already in DPDK code base), when people need to extend the API. This very much look like GCC built-in function, but is not. Sorry for hijacking the ARM roadmap thread. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-23 17:34 ` Mattias Rönnblom @ 2020-03-24 8:01 ` Morten Brørup 2020-03-24 18:53 ` Honnappa Nagarahalli 1 sibling, 0 replies; 14+ messages in thread From: Morten Brørup @ 2020-03-24 8:01 UTC (permalink / raw) To: Mattias Rönnblom, Honnappa Nagarahalli, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Mattias Rönnblom > Sent: Monday, March 23, 2020 6:35 PM > > On 2020-03-23 18:14, Honnappa Nagarahalli wrote: > > <snip> > > > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > >>>>> > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > >>>>>> Hello, > >>>>>> Following are the work items planned for 20.05: > >>>>>> > >>>>>> 1) Use C11 atomic APIs in timer library > >>>>>> 2) Use C11 atomic APIs in service cores > >>>>>> 3) Use C11 atomics in VirtIO split ring > >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm > >>>>>> platforms > >>>>>> 5) RCU defer API > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases > >>>>>> > >>>>>> Thank you, > >>>>>> Honnappa > >>>>> Maybe you should have a look at legacy DPDK atomics as well? > >>>>> Avoiding a full barrier for the add operation, for example. > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not > take > >> memory order as a parameter. So, it is difficult to change the > implementation > >> for those APIs. For ex: the add operation could take a RELEASE or > RELAXED > >> order depending on the use case. > >>>> So, the proposal is to deprecate the rte_atomic APIs and use C11 > APIs > >>>> directly. The proposal is here: > >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- > >> 865b > >>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > >> aa5e88331555& > >>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > >>> Even though rte_atomic lacks the flexibility of C11 atomics, there > >>> might still be areas of improvement. Such improvements will have an > >>> instant effect, as opposed to waiting for all the rte_atomic users > to change. > >>> > >>> > >>> The rte_atomic API leaves ordering unspecified, unfortunately. In > the > >>> Linux kernel, from which DPDK seems to borrow much of the atomics > and > >>> memory order related semantics, an atomic add doesn't imply any > memory > >>> barriers. The current __sync_fetch_and_add()-based implementation > >>> implies a full barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). > If > >>> you would use C11 atomics to implement rte_atomic in ARM, you could > >>> use a relaxed memory order on rte_atomic*_add() (assuming you agree > >>> those are the implicit semantics of the legacy API) and just get an > >>> ldadd instruction. An alternative would be to implement the same > thing > >>> in assembler, of course. > >>> > >>> > >> Another approach might be to just scrap all of the intrinsics and > inline > >> assembler used for all the functions in rte_atomic, on all > architectures, and > >> use C11 atomics instead. > > Yes, this is the approach we are taking. But, it does not solve the > use of rte_atomic APIs in the applications. > > Agreed. > > > Another question. "C11 atomics" here seems to mean using GCC > instrinsics, normally used to implement C11 atomics, not C11 atomics > (i.e. <stdatomic.h>). What is the reason directly calling the > intrinsics, rather than using the standard API? I'm surprised if this was the intention. It doesn't make sense replacing obsolete implementations with other soon-to-be-obsolete implementations! Use <stdatomic.h>, not GCC extensions. Consider the boolean type... back in the days everyone had to implement their own boolean types, now everyone uses the "bool" type from <stdbool.h>, not the "_Bool" type from C99. Aren't we looking at exactly the same scenario regarding atomics here? > > With this in mind, wouldn't be better to extend <rte_atomic.h> with > functions that take a memory ordering parameter? And properly document > the memory ordering for the functions already in this API, and maybe > deprecate some functions in favor of others, more C11-like, functions? Atomics have been standardized by C11 and <stdatomic.h>. Sooner or later they will be well documented and well understood. We might as well embrace them now. With this in mind, the existing rte_atomic functions should be documented in a way that describes how they behave in standard atomic terms, i.e. which standard atomics they mimic. And as Honnappa already mentioned, we don't know how applications use them, so we cannot delete them, and it is risky to modify their behavior; the best we can do is document their current behavior properly and mark them as deprecated (superseded by C11 atomics and <stdatomic.h>). > If not, assuming <stdatomic.h> can't be used, wouldn't it be better if > we added a <rte_stdatomic.h>, which mimics the standard API, maybe with > some DPDK tweaks, plus potentially with DPDK-specific extensions as > well? I agree. The future is <stdatomic.h>, and if that header is not available, we can add it ourselves. > > Directly accessing instrinsics will lead to things like > __atomic_add_ifless() (already in DPDK code base), when people need to > extend the API. This very much look like GCC built-in function, but is > not. In other words.... we should stop using GCC atomics and stop defining functions that look like GCC atomics, but use C11 atomics (preferably <stdatomic.h>) and make our own atomic functions look like those instead. > > Sorry for hijacking the ARM roadmap thread. > PS: Thanks to ARM for taking this initiative to modernize critical elements at the very core of DPDK. As we said back in the days: Keep up the good work! ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-23 17:34 ` Mattias Rönnblom 2020-03-24 8:01 ` Morten Brørup @ 2020-03-24 18:53 ` Honnappa Nagarahalli 2020-03-24 21:41 ` Honnappa Nagarahalli 2020-04-07 19:10 ` Mattias Rönnblom 1 sibling, 2 replies; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-03-24 18:53 UTC (permalink / raw) To: Mattias Rönnblom, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, Honnappa Nagarahalli, nd <snip> > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > >>>>> > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > >>>>>> Hello, > >>>>>> Following are the work items planned for 20.05: > >>>>>> > >>>>>> 1) Use C11 atomic APIs in timer library > >>>>>> 2) Use C11 atomic APIs in service cores > >>>>>> 3) Use C11 atomics in VirtIO split ring > >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm > >>>>>> platforms > >>>>>> 5) RCU defer API > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases > >>>>>> > >>>>>> Thank you, > >>>>>> Honnappa > >>>>> Maybe you should have a look at legacy DPDK atomics as well? > >>>>> Avoiding a full barrier for the add operation, for example. > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not > >>>> take > >> memory order as a parameter. So, it is difficult to change the > >> implementation for those APIs. For ex: the add operation could take a > >> RELEASE or RELAXED order depending on the use case. > >>>> So, the proposal is to deprecate the rte_atomic APIs and use C11 > >>>> APIs directly. The proposal is here: > >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- > >> 865b > >>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > >> aa5e88331555& > >>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > >>> Even though rte_atomic lacks the flexibility of C11 atomics, there > >>> might still be areas of improvement. Such improvements will have an > >>> instant effect, as opposed to waiting for all the rte_atomic users to change. > >>> > >>> > >>> The rte_atomic API leaves ordering unspecified, unfortunately. In > >>> the Linux kernel, from which DPDK seems to borrow much of the > >>> atomics and memory order related semantics, an atomic add doesn't > >>> imply any memory barriers. The current __sync_fetch_and_add()-based > >>> implementation implies a full barrier (ldadd+dmb) or release > >>> (ldaddal, on v8.1-a). If you would use C11 atomics to implement > >>> rte_atomic in ARM, you could use a relaxed memory order on > >>> rte_atomic*_add() (assuming you agree those are the implicit > >>> semantics of the legacy API) and just get an ldadd instruction. An > >>> alternative would be to implement the same thing in assembler, of course. > >>> > >>> > >> Another approach might be to just scrap all of the intrinsics and > >> inline assembler used for all the functions in rte_atomic, on all > >> architectures, and use C11 atomics instead. > > Yes, this is the approach we are taking. But, it does not solve the use of > rte_atomic APIs in the applications. > > Agreed. > > > Another question. "C11 atomics" here seems to mean using GCC instrinsics, > normally used to implement C11 atomics, not C11 atomics (i.e. <stdatomic.h>). > What is the reason directly calling the intrinsics, rather than using the standard > API? I did not know they existed for C. Looking at them, they looks like just wrappers around the intrinsics. The advantage seems to be the type check enforced by the compiler. i.e. if a variable is defined of type '_Atomic', the compiler should not allow any non-atomic operations on them. Anything else? I will explore this further. > > > With this in mind, wouldn't be better to extend <rte_atomic.h> with functions > that take a memory ordering parameter? And properly document the memory > ordering for the functions already in this API, and maybe deprecate some > functions in favor of others, more C11-like, functions? I would prefer to use what the language provides rather than creating DPDK's own, which will be just wrappers on top of what C provides. If we follow the existing model of rte_atomic APIs, we will be creating these for every size of the parameter (rte_atomic8/16/32/64_xxx). This results in more core to maintain. > If not, assuming <stdatomic.h> can't be used, wouldn't it be better if we added > a <rte_stdatomic.h>, which mimics the standard API, maybe with some DPDK > tweaks, plus potentially with DPDK-specific extensions as well? What kind of extensions are you thinking about? > > > Directly accessing instrinsics will lead to things like > __atomic_add_ifless() (already in DPDK code base), when people need to > extend the API. This very much look like GCC built-in function, but is not. I think the DPDK code should not be using symbols that will potentially collide with language/library symbols. Luckily, in this case, it is internal to a PMD which can be changed. It also contains more symbols which are on the border to collide with 'stdatomic.h'. > > > Sorry for hijacking the ARM roadmap thread. No problem. I am glad we are having these important discussions. > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-24 18:53 ` Honnappa Nagarahalli @ 2020-03-24 21:41 ` Honnappa Nagarahalli 2020-04-07 5:15 ` Honnappa Nagarahalli 2020-04-07 19:10 ` Mattias Rönnblom 1 sibling, 1 reply; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-03-24 21:41 UTC (permalink / raw) To: Mattias Rönnblom, dev, thomas, david.marchand, Morten Brørup, Ananyev, Konstantin, Richardson, Bruce, Van Haaren, Harry, David Christensen Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, Honnappa Nagarahalli, nd <snip> (apologies Morten - I missed your response, consolidating the discussion in this thread) + Intel x86 and IBM POWER maintainers > > > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > > >>>>> > > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > > >>>>>> Hello, > > >>>>>> Following are the work items planned for 20.05: > > >>>>>> > > >>>>>> 1) Use C11 atomic APIs in timer library > > >>>>>> 2) Use C11 atomic APIs in service cores > > >>>>>> 3) Use C11 atomics in VirtIO split ring > > >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm > > >>>>>> platforms > > >>>>>> 5) RCU defer API > > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases > > >>>>>> > > >>>>>> Thank you, > > >>>>>> Honnappa > > >>>>> Maybe you should have a look at legacy DPDK atomics as well? > > >>>>> Avoiding a full barrier for the add operation, for example. > > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not > > >>>> take > > >> memory order as a parameter. So, it is difficult to change the > > >> implementation for those APIs. For ex: the add operation could take > > >> a RELEASE or RELAXED order depending on the use case. > > >>>> So, the proposal is to deprecate the rte_atomic APIs and use C11 > > >>>> APIs directly. The proposal is here: > > >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- > > >> 865b > > >>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > > >> aa5e88331555& > > >>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > > >>> Even though rte_atomic lacks the flexibility of C11 atomics, there > > >>> might still be areas of improvement. Such improvements will have > > >>> an instant effect, as opposed to waiting for all the rte_atomic users to > change. > > >>> > > >>> > > >>> The rte_atomic API leaves ordering unspecified, unfortunately. In > > >>> the Linux kernel, from which DPDK seems to borrow much of the > > >>> atomics and memory order related semantics, an atomic add doesn't > > >>> imply any memory barriers. The current > > >>> __sync_fetch_and_add()-based implementation implies a full barrier > > >>> (ldadd+dmb) or release (ldaddal, on v8.1-a). If you would use C11 > > >>> atomics to implement rte_atomic in ARM, you could use a relaxed > > >>> memory order on > > >>> rte_atomic*_add() (assuming you agree those are the implicit > > >>> semantics of the legacy API) and just get an ldadd instruction. An > > >>> alternative would be to implement the same thing in assembler, of > course. > > >>> > > >>> > > >> Another approach might be to just scrap all of the intrinsics and > > >> inline assembler used for all the functions in rte_atomic, on all > > >> architectures, and use C11 atomics instead. > > > Yes, this is the approach we are taking. But, it does not solve the > > > use of > > rte_atomic APIs in the applications. > > > > Agreed. > > > > > > Another question. "C11 atomics" here seems to mean using GCC > > instrinsics, normally used to implement C11 atomics, not C11 atomics (i.e. > <stdatomic.h>). > > What is the reason directly calling the intrinsics, rather than using > > the standard API? > I did not know they existed for C. Looking at them, they looks like just > wrappers around the intrinsics. The advantage seems to be the type check > enforced by the compiler. i.e. if a variable is defined of type '_Atomic', the > compiler should not allow any non-atomic operations on them. Anything else? > I will explore this further. I see some issues expressed for Intel ICC compiler [1], but they seem to have been fixed in the latest versions [2]. Please check. [1] https://software.intel.com/en-us/forums/intel-c-compiler/topic/681815 [2] https://software.intel.com/en-us/articles/c11-support-in-intel-c-compiler > > > > > > > With this in mind, wouldn't be better to extend <rte_atomic.h> with > > functions that take a memory ordering parameter? And properly document > > the memory ordering for the functions already in this API, and maybe > > deprecate some functions in favor of others, more C11-like, functions? > I would prefer to use what the language provides rather than creating DPDK's > own, which will be just wrappers on top of what C provides. If we follow the > existing model of rte_atomic APIs, we will be creating these for every size of > the parameter (rte_atomic8/16/32/64_xxx). This results in more core to > maintain. > > > If not, assuming <stdatomic.h> can't be used, wouldn't it be better if > > we added a <rte_stdatomic.h>, which mimics the standard API, maybe > > with some DPDK tweaks, plus potentially with DPDK-specific extensions as > well? > What kind of extensions are you thinking about? > > > > > > > Directly accessing instrinsics will lead to things like > > __atomic_add_ifless() (already in DPDK code base), when people need to > > extend the API. This very much look like GCC built-in function, but is not. > I think the DPDK code should not be using symbols that will potentially collide > with language/library symbols. > Luckily, in this case, it is internal to a PMD which can be changed. > It also contains more symbols which are on the border to collide with > 'stdatomic.h'. > > > > > > > Sorry for hijacking the ARM roadmap thread. > No problem. I am glad we are having these important discussions. > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-24 21:41 ` Honnappa Nagarahalli @ 2020-04-07 5:15 ` Honnappa Nagarahalli 2020-04-09 1:25 ` Chen, Zhaoyan 0 siblings, 1 reply; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-04-07 5:15 UTC (permalink / raw) To: Mattias Rönnblom, dev, thomas, david.marchand, Morten Brørup, Ananyev, Konstantin, Richardson, Bruce, Van Haaren, Harry, David Christensen, Phil Yang Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, Honnappa Nagarahalli, nd <snip> > Subject: RE: [dpdk-dev] Arm roadmap for 20.05 > > <snip> > (apologies Morten - I missed your response, consolidating the discussion in > this thread) > > + Intel x86 and IBM POWER maintainers > > > > > > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > > > >>>>> > > > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > > > >>>>>> Hello, > > > >>>>>> Following are the work items planned for 20.05: > > > >>>>>> > > > >>>>>> 1) Use C11 atomic APIs in timer library > > > >>>>>> 2) Use C11 atomic APIs in service cores > > > >>>>>> 3) Use C11 atomics in VirtIO split ring > > > >>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm > > > >>>>>> platforms > > > >>>>>> 5) RCU defer API > > > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases > > > >>>>>> > > > >>>>>> Thank you, > > > >>>>>> Honnappa > > > >>>>> Maybe you should have a look at legacy DPDK atomics as well? > > > >>>>> Avoiding a full barrier for the add operation, for example. > > > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do > > > >>>> not take > > > >> memory order as a parameter. So, it is difficult to change the > > > >> implementation for those APIs. For ex: the add operation could > > > >> take a RELEASE or RELAXED order depending on the use case. > > > >>>> So, the proposal is to deprecate the rte_atomic APIs and use > > > >>>> C11 APIs directly. The proposal is here: > > > >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7- > 2e04718 > > > >>>> 5- > > > >> 865b > > > >>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > > > >> aa5e88331555& > > > >>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > > > >>> Even though rte_atomic lacks the flexibility of C11 atomics, > > > >>> there might still be areas of improvement. Such improvements > > > >>> will have an instant effect, as opposed to waiting for all the > > > >>> rte_atomic users to > > change. > > > >>> > > > >>> > > > >>> The rte_atomic API leaves ordering unspecified, unfortunately. > > > >>> In the Linux kernel, from which DPDK seems to borrow much of the > > > >>> atomics and memory order related semantics, an atomic add > > > >>> doesn't imply any memory barriers. The current > > > >>> __sync_fetch_and_add()-based implementation implies a full > > > >>> barrier > > > >>> (ldadd+dmb) or release (ldaddal, on v8.1-a). If you would use > > > >>> C11 atomics to implement rte_atomic in ARM, you could use a > > > >>> relaxed memory order on > > > >>> rte_atomic*_add() (assuming you agree those are the implicit > > > >>> semantics of the legacy API) and just get an ldadd instruction. > > > >>> An alternative would be to implement the same thing in > > > >>> assembler, of > > course. > > > >>> > > > >>> > > > >> Another approach might be to just scrap all of the intrinsics and > > > >> inline assembler used for all the functions in rte_atomic, on all > > > >> architectures, and use C11 atomics instead. > > > > Yes, this is the approach we are taking. But, it does not solve > > > > the use of > > > rte_atomic APIs in the applications. > > > > > > Agreed. > > > > > > > > > Another question. "C11 atomics" here seems to mean using GCC > > > instrinsics, normally used to implement C11 atomics, not C11 atomics (i.e. > > <stdatomic.h>). > > > What is the reason directly calling the intrinsics, rather than > > > using the standard API? > > I did not know they existed for C. Looking at them, they looks like > > just wrappers around the intrinsics. The advantage seems to be the > > type check enforced by the compiler. i.e. if a variable is defined of > > type '_Atomic', the compiler should not allow any non-atomic operations on > them. Anything else? > > I will explore this further. > I see some issues expressed for Intel ICC compiler [1], but they seem to have > been fixed in the latest versions [2]. Please check. > > [1] https://software.intel.com/en-us/forums/intel-c-compiler/topic/681815 > [2] https://software.intel.com/en-us/articles/c11-support-in-intel-c-compiler > I looked into this some more. The built-ins are supported in GCC from 4.7 and in clang from 3.1. The stdatomic.h is supported in GCC from 4.9 and in clang from 3.6. I see that Intel Compilation CI has 3 configurations that use GCC 4.8.5 and Clang 3.4.2. Any reasoning for using these? Can these be upgraded? > > > > > > > > > > > With this in mind, wouldn't be better to extend <rte_atomic.h> with > > > functions that take a memory ordering parameter? And properly > > > document the memory ordering for the functions already in this API, > > > and maybe deprecate some functions in favor of others, more C11-like, > functions? > > I would prefer to use what the language provides rather than creating > > DPDK's own, which will be just wrappers on top of what C provides. If > > we follow the existing model of rte_atomic APIs, we will be creating > > these for every size of the parameter (rte_atomic8/16/32/64_xxx). This > > results in more core to maintain. > > > > > If not, assuming <stdatomic.h> can't be used, wouldn't it be better > > > if we added a <rte_stdatomic.h>, which mimics the standard API, > > > maybe with some DPDK tweaks, plus potentially with DPDK-specific > > > extensions as > > well? > > What kind of extensions are you thinking about? > > > > > > > > > > > Directly accessing instrinsics will lead to things like > > > __atomic_add_ifless() (already in DPDK code base), when people need > > > to extend the API. This very much look like GCC built-in function, but is not. > > I think the DPDK code should not be using symbols that will > > potentially collide with language/library symbols. > > Luckily, in this case, it is internal to a PMD which can be changed. > > It also contains more symbols which are on the border to collide with > > 'stdatomic.h'. > > > > > > > > > > > Sorry for hijacking the ARM roadmap thread. > > No problem. I am glad we are having these important discussions. > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-04-07 5:15 ` Honnappa Nagarahalli @ 2020-04-09 1:25 ` Chen, Zhaoyan 0 siblings, 0 replies; 14+ messages in thread From: Chen, Zhaoyan @ 2020-04-09 1:25 UTC (permalink / raw) To: Honnappa Nagarahalli, Mattias Rönnblom, dev, thomas, david.marchand, Morten Brørup, Ananyev, Konstantin, Richardson, Bruce, Van Haaren, Harry, David Christensen, Phil Yang Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, nd, Chen, Zhaoyan > -----Original Message----- > From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli > Sent: Tuesday, April 7, 2020 1:15 PM > To: Mattias Rönnblom <mattias.ronnblom@ericsson.com>; dev@dpdk.org; > thomas@monjalon.net; david.marchand@redhat.com; Morten Brørup > <mb@smartsharesystems.com>; Ananyev, Konstantin > <konstantin.ananyev@intel.com>; Richardson, Bruce > <bruce.richardson@intel.com>; Van Haaren, Harry > <harry.van.haaren@intel.com>; David Christensen > <drc@linux.vnet.ibm.com>; Phil Yang <Phil.Yang@arm.com> > Cc: Song Zhu <Song.Zhu@arm.com>; Gavin Hu <Gavin.Hu@arm.com>; Jeff > Brownlee <Jeff.Brownlee@arm.com>; Philippe Robin > <Philippe.Robin@arm.com>; Pravin Kantak <Pravin.Kantak@arm.com>; nd > <nd@arm.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com> > Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > > <snip> > > > Subject: RE: [dpdk-dev] Arm roadmap for 20.05 > > > > <snip> > > (apologies Morten - I missed your response, consolidating the > > discussion in this thread) > > > > + Intel x86 and IBM POWER maintainers > > > > > > > > > >>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > > > > >>>>> > > > > >>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > > > > >>>>>> Hello, > > > > >>>>>> Following are the work items planned for 20.05: > > > > >>>>>> > > > > >>>>>> 1) Use C11 atomic APIs in timer library > > > > >>>>>> 2) Use C11 atomic APIs in service cores > > > > >>>>>> 3) Use C11 atomics in VirtIO split ring > > > > >>>>>> 4) Performance optimizations in i40e and MLX drivers for > > > > >>>>>> Arm platforms > > > > >>>>>> 5) RCU defer API > > > > >>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test > > > > >>>>>> cases > > > > >>>>>> > > > > >>>>>> Thank you, > > > > >>>>>> Honnappa > > > > >>>>> Maybe you should have a look at legacy DPDK atomics as well? > > > > >>>>> Avoiding a full barrier for the add operation, for example. > > > > >>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do > > > > >>>> not take > > > > >> memory order as a parameter. So, it is difficult to change the > > > > >> implementation for those APIs. For ex: the add operation could > > > > >> take a RELEASE or RELAXED order depending on the use case. > > > > >>>> So, the proposal is to deprecate the rte_atomic APIs and use > > > > >>>> C11 APIs directly. The proposal is here: > > > > >>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7- > > 2e04718 > > > > >>>> 5- > > > > >> 865b > > > > >>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > > > > >> aa5e88331555& > > > > >>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > > > > >>> Even though rte_atomic lacks the flexibility of C11 atomics, > > > > >>> there might still be areas of improvement. Such improvements > > > > >>> will have an instant effect, as opposed to waiting for all the > > > > >>> rte_atomic users to > > > change. > > > > >>> > > > > >>> > > > > >>> The rte_atomic API leaves ordering unspecified, unfortunately. > > > > >>> In the Linux kernel, from which DPDK seems to borrow much of > > > > >>> the atomics and memory order related semantics, an atomic add > > > > >>> doesn't imply any memory barriers. The current > > > > >>> __sync_fetch_and_add()-based implementation implies a full > > > > >>> barrier > > > > >>> (ldadd+dmb) or release (ldaddal, on v8.1-a). If you would use > > > > >>> C11 atomics to implement rte_atomic in ARM, you could use a > > > > >>> relaxed memory order on > > > > >>> rte_atomic*_add() (assuming you agree those are the implicit > > > > >>> semantics of the legacy API) and just get an ldadd instruction. > > > > >>> An alternative would be to implement the same thing in > > > > >>> assembler, of > > > course. > > > > >>> > > > > >>> > > > > >> Another approach might be to just scrap all of the intrinsics > > > > >> and inline assembler used for all the functions in rte_atomic, > > > > >> on all architectures, and use C11 atomics instead. > > > > > Yes, this is the approach we are taking. But, it does not solve > > > > > the use of > > > > rte_atomic APIs in the applications. > > > > > > > > Agreed. > > > > > > > > > > > > Another question. "C11 atomics" here seems to mean using GCC > > > > instrinsics, normally used to implement C11 atomics, not C11 > atomics (i.e. > > > <stdatomic.h>). > > > > What is the reason directly calling the intrinsics, rather than > > > > using the standard API? > > > I did not know they existed for C. Looking at them, they looks like > > > just wrappers around the intrinsics. The advantage seems to be the > > > type check enforced by the compiler. i.e. if a variable is defined > > > of type '_Atomic', the compiler should not allow any non-atomic > > > operations on > > them. Anything else? > > > I will explore this further. > > I see some issues expressed for Intel ICC compiler [1], but they seem > > to have been fixed in the latest versions [2]. Please check. > > > > [1] > > https://software.intel.com/en-us/forums/intel-c-compiler/topic/681815 > > [2] > > https://software.intel.com/en-us/articles/c11-support-in-intel-c-compi > > ler > > > I looked into this some more. The built-ins are supported in GCC from 4.7 > and in clang from 3.1. The stdatomic.h is supported in GCC from 4.9 and in > clang from 3.6. > > I see that Intel Compilation CI has 3 configurations that use GCC 4.8.5 and > Clang 3.4.2. Any reasoning for using these? Can these be upgraded? [Chen, Zhaoyan] It's associated with CENTOS7.7/RHEL7.7 distro which is still in lifecycle. Even CENTOS7.7/RHEL7.7 was released at 19.08. > > > > > > > > > > > > > > > > With this in mind, wouldn't be better to extend <rte_atomic.h> > > > > with functions that take a memory ordering parameter? And > properly > > > > document the memory ordering for the functions already in this > > > > API, and maybe deprecate some functions in favor of others, more > > > > C11-like, > > functions? > > > I would prefer to use what the language provides rather than > > > creating DPDK's own, which will be just wrappers on top of what C > > > provides. If we follow the existing model of rte_atomic APIs, we > > > will be creating these for every size of the parameter > > > (rte_atomic8/16/32/64_xxx). This results in more core to maintain. > > > > > > > If not, assuming <stdatomic.h> can't be used, wouldn't it be > > > > better if we added a <rte_stdatomic.h>, which mimics the standard > > > > API, maybe with some DPDK tweaks, plus potentially with > > > > DPDK-specific extensions as > > > well? > > > What kind of extensions are you thinking about? > > > > > > > > > > > > > > > Directly accessing instrinsics will lead to things like > > > > __atomic_add_ifless() (already in DPDK code base), when people > > > > need to extend the API. This very much look like GCC built-in > function, but is not. > > > I think the DPDK code should not be using symbols that will > > > potentially collide with language/library symbols. > > > Luckily, in this case, it is internal to a PMD which can be changed. > > > It also contains more symbols which are on the border to collide > > > with 'stdatomic.h'. > > > > > > > > > > > > > > > Sorry for hijacking the ARM roadmap thread. > > > No problem. I am glad we are having these important discussions. > > > > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-24 18:53 ` Honnappa Nagarahalli 2020-03-24 21:41 ` Honnappa Nagarahalli @ 2020-04-07 19:10 ` Mattias Rönnblom 1 sibling, 0 replies; 14+ messages in thread From: Mattias Rönnblom @ 2020-04-07 19:10 UTC (permalink / raw) To: Honnappa Nagarahalli, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd On 2020-03-24 19:53, Honnappa Nagarahalli wrote: > <snip> > >>>>>>> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 >>>>>>> >>>>>>> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: >>>>>>>> Hello, >>>>>>>> Following are the work items planned for 20.05: >>>>>>>> >>>>>>>> 1) Use C11 atomic APIs in timer library >>>>>>>> 2) Use C11 atomic APIs in service cores >>>>>>>> 3) Use C11 atomics in VirtIO split ring >>>>>>>> 4) Performance optimizations in i40e and MLX drivers for Arm >>>>>>>> platforms >>>>>>>> 5) RCU defer API >>>>>>>> 6) Enable Travis CI with no huge-page tests - ~25 test cases >>>>>>>> >>>>>>>> Thank you, >>>>>>>> Honnappa >>>>>>> Maybe you should have a look at legacy DPDK atomics as well? >>>>>>> Avoiding a full barrier for the add operation, for example. >>>>>> By legacy, I believe you meant rte_atomic APIs. Those APIs do not >>>>>> take >>>> memory order as a parameter. So, it is difficult to change the >>>> implementation for those APIs. For ex: the add operation could take a >>>> RELEASE or RELAXED order depending on the use case. >>>>>> So, the proposal is to deprecate the rte_atomic APIs and use C11 >>>>>> APIs directly. The proposal is here: >>>>>> https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- >>>> 865b >>>>>> 3b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- >>>> aa5e88331555& >>>>>> u=https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F >>>>> Even though rte_atomic lacks the flexibility of C11 atomics, there >>>>> might still be areas of improvement. Such improvements will have an >>>>> instant effect, as opposed to waiting for all the rte_atomic users to change. >>>>> >>>>> >>>>> The rte_atomic API leaves ordering unspecified, unfortunately. In >>>>> the Linux kernel, from which DPDK seems to borrow much of the >>>>> atomics and memory order related semantics, an atomic add doesn't >>>>> imply any memory barriers. The current __sync_fetch_and_add()-based >>>>> implementation implies a full barrier (ldadd+dmb) or release >>>>> (ldaddal, on v8.1-a). If you would use C11 atomics to implement >>>>> rte_atomic in ARM, you could use a relaxed memory order on >>>>> rte_atomic*_add() (assuming you agree those are the implicit >>>>> semantics of the legacy API) and just get an ldadd instruction. An >>>>> alternative would be to implement the same thing in assembler, of course. >>>>> >>>>> >>>> Another approach might be to just scrap all of the intrinsics and >>>> inline assembler used for all the functions in rte_atomic, on all >>>> architectures, and use C11 atomics instead. >>> Yes, this is the approach we are taking. But, it does not solve the use of >> rte_atomic APIs in the applications. >> >> Agreed. >> >> >> Another question. "C11 atomics" here seems to mean using GCC instrinsics, >> normally used to implement C11 atomics, not C11 atomics (i.e. <stdatomic.h>). >> What is the reason directly calling the intrinsics, rather than using the standard >> API? > I did not know they existed for C. Looking at them, they looks like just wrappers around the intrinsics. The advantage seems to be the type check enforced by the compiler. i.e. if a variable is defined of type '_Atomic', the compiler should not allow any non-atomic operations on them. Anything else? > I will explore this further. > That's the only difference I know of. My initial impression was that the type checking was more of a bug than a feature in this case, but I might well be wrong. I have very little practical experience with the <stdatomic.h> constructs. >> >> >> With this in mind, wouldn't be better to extend <rte_atomic.h> with functions >> that take a memory ordering parameter? And properly document the memory >> ordering for the functions already in this API, and maybe deprecate some >> functions in favor of others, more C11-like, functions? > I would prefer to use what the language provides rather than creating DPDK's own, which will be just wrappers on top of what C provides. If we follow the existing model of rte_atomic APIs, we will be creating these for every size of the parameter (rte_atomic8/16/32/64_xxx). This results in more core to maintain. > >> If not, assuming <stdatomic.h> can't be used, wouldn't it be better if we added >> a <rte_stdatomic.h>, which mimics the standard API, maybe with some DPDK >> tweaks, plus potentially with DPDK-specific extensions as well? > What kind of extensions are you thinking about? > It's difficult to make predictions, especially about the future. __atomic_add_ifless() is an example already in the code base. From what I understand, there was something missing in C11 for an efficient RCU implementation in the Linux kernel. "Consume load", if I recall correctly. A middle ground could be to make <rte_atomic.h> obsolete, but then introduce a <rte_atomic11.h>, which would be a thin wrapper, in the same manner as <stdatomic.h>. >> >> >> Directly accessing instrinsics will lead to things like >> __atomic_add_ifless() (already in DPDK code base), when people need to >> extend the API. This very much look like GCC built-in function, but is not. > I think the DPDK code should not be using symbols that will potentially collide with language/library symbols. > Luckily, in this case, it is internal to a PMD which can be changed. > It also contains more symbols which are on the border to collide with 'stdatomic.h'. > >> >> >> Sorry for hijacking the ARM roadmap thread. > No problem. I am glad we are having these important discussions. > >> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [dpdk-dev] Arm roadmap for 20.05 2020-03-21 8:17 ` Mattias Rönnblom 2020-03-21 8:23 ` Mattias Rönnblom @ 2020-03-23 17:12 ` Honnappa Nagarahalli 1 sibling, 0 replies; 14+ messages in thread From: Honnappa Nagarahalli @ 2020-03-23 17:12 UTC (permalink / raw) To: Mattias Rönnblom, dev, thomas, david.marchand Cc: Song Zhu, Gavin Hu, Jeff Brownlee, Philippe Robin, Pravin Kantak, nd, Honnappa Nagarahalli, nd <snip> > > > >> Subject: Re: [dpdk-dev] Arm roadmap for 20.05 > >> > >> On 2020-03-10 17:42, Honnappa Nagarahalli wrote: > >>> Hello, > >>> Following are the work items planned for 20.05: > >>> > >>> 1) Use C11 atomic APIs in timer library > >>> 2) Use C11 atomic APIs in service cores > >>> 3) Use C11 atomics in VirtIO split ring > >>> 4) Performance optimizations in i40e and MLX drivers for Arm > >>> platforms > >>> 5) RCU defer API > >>> 6) Enable Travis CI with no huge-page tests - ~25 test cases > >>> > >>> Thank you, > >>> Honnappa > >> Maybe you should have a look at legacy DPDK atomics as well? Avoiding > >> a full barrier for the add operation, for example. > > By legacy, I believe you meant rte_atomic APIs. Those APIs do not take > memory order as a parameter. So, it is difficult to change the implementation > for those APIs. For ex: the add operation could take a RELEASE or RELAXED > order depending on the use case. > > So, the proposal is to deprecate the rte_atomic APIs and use C11 APIs > > directly. The proposal is here: > > https://protect2.fireeye.com/v1/url?k=2e04311e-72d039b7-2e047185- > 865b3 > > b1e120b-91a0698f69ff0d1f&q=1&e=976056f3-f089-4fa8-86b2- > aa5e88331555&u= > > https%3A%2F%2Fpatches.dpdk.org%2Fcover%2F66745%2F > > Even though rte_atomic lacks the flexibility of C11 atomics, there might still be > areas of improvement. Such improvements will have an instant effect, as > opposed to waiting for all the rte_atomic users to change. > > > The rte_atomic API leaves ordering unspecified, unfortunately. In the Linux > kernel, from which DPDK seems to borrow much of the atomics and memory > order related semantics, an atomic add doesn't imply any memory barriers. > The current __sync_fetch_and_add()-based implementation implies a full > barrier (ldadd+dmb) or release (ldaddal, on v8.1-a). If you would use C11 > atomics to implement rte_atomic in ARM, you could use a relaxed memory > order on rte_atomic*_add() (assuming you agree those are the implicit > semantics of the legacy API) and just get an ldadd instruction. An alternative The problem is agreeing on the implicit semantics of the legacy APIs. We can find this out in the existing DPDK code. However, given that these APIs are public APIs, they might be used in the applications and we do not know how they are used in the applications. > would be to implement the same thing in assembler, of course. > ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-04-09 1:25 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-03-10 16:42 [dpdk-dev] Arm roadmap for 20.05 Honnappa Nagarahalli 2020-03-11 8:25 ` Mattias Rönnblom 2020-03-20 20:45 ` Honnappa Nagarahalli 2020-03-21 8:17 ` Mattias Rönnblom 2020-03-21 8:23 ` Mattias Rönnblom 2020-03-23 17:14 ` Honnappa Nagarahalli 2020-03-23 17:34 ` Mattias Rönnblom 2020-03-24 8:01 ` Morten Brørup 2020-03-24 18:53 ` Honnappa Nagarahalli 2020-03-24 21:41 ` Honnappa Nagarahalli 2020-04-07 5:15 ` Honnappa Nagarahalli 2020-04-09 1:25 ` Chen, Zhaoyan 2020-04-07 19:10 ` Mattias Rönnblom 2020-03-23 17:12 ` Honnappa Nagarahalli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).