A quick hack might just to increase cache line size as experiment On Mon, Aug 28, 2023, 11:54 AM Morten Brørup wrote: > > From: Mattias Rönnblom [mailto:hofors@lysator.liu.se] > > Sent: Monday, 28 August 2023 10.46 > > > > On 2023-08-28 08:32, Morten Brørup wrote: > > >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se] > > >> Sent: Monday, 28 August 2023 00.31 > > >> > > >> On 2023-08-27 17:40, Morten Brørup wrote: > > >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se] > > >>>> Sent: Sunday, 27 August 2023 15.55 > > [...] > > > >>> So, this gets added to rte_common.h: > > >>> > > >>> /** > > >>> * Empty cache lines, to guard against false sharing-like effects > > >>> * on systems with a next-N-lines hardware prefetcher. > > >>> * > > >>> * Use as spacing between data accessed by different lcores, > > >>> * to prevent cache thrashing on hardware with speculative > > prefetching. > > >>> */ > > >>> #if RTE_CACHE_GUARD_LINES > 0 > > >>> #define _RTE_CACHE_GUARD_HELPER2(unique) \ > > >>> char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE * > > >> RTE_CACHE_GUARD_LINES] \ > > >>> __rte_cache_aligned; > > >>> #define _RTE_CACHE_GUARD_HELPER1(unique) > > _RTE_CACHE_GUARD_HELPER2(unique) > > >>> #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__) > > >>> #else > > >>> #define RTE_CACHE_GUARD > > >>> #endif > > >>> > > >> > > >> Seems like a good solution. I thought as far as using __LINE__ to > > build > > >> a unique name, but __COUNTER__ is much cleaner, provided it's > > available > > >> in relevant compilers. (It's not in C11.) > > > > > > I considered __LINE__ too, but came to the same conclusion... > > __COUNTER__ is cleaner for this purpose. > > > > > > And since __COUNTER__ is being used elsewhere in DPDK, I assume it is > > available for use here too. > > > > > > If it turns out causing problems, we can easily switch to __LINE__ > > instead. > > > > > >> > > >> Should the semicolon be included or not in HELPER2? If left out, a > > >> lonely ";" will be left for RTE_CACHE_GUARD_LINES == 0, but I don't > > >> think that is a problem. > > > > > > I tested it on Godbolt, and the lonely ";" in a struct didn't seem to > > be a problem. > > > > > > With the semicolon in HELPER2, there will be a lonely ";" in the > > struct in both cases, i.e. with and without cache guards enabled. > > > > > >> > > >> I don't see why __rte_cache_aligned is needed here. The adjacent > > struct > > >> must be cache-line aligned. Maybe it makes it more readable, having > > the > > >> explicit guard padding starting at the start of the actual guard > > cache > > >> lines, rather than potentially at some earlier point before, and > > having > > >> non-guard padding at the end of the struct (from __rte_cache_aligned > > on > > >> the struct level). > > > > > > Having both __rte_cache_aligned and the char array with full cache > > lines ensures that the guard field itself is on its own separate cache > > line, regardless of the organization of adjacent fields in the struct. > > E.g. this will also work: > > > > > > struct test { > > > char x; > > > RTE_CACHE_GUARD; > > > char y; > > > }; > > > > > > > That struct declaration is broken, since it will create false sharing > > between x and y, in case RTE_CACHE_GUARD_LINES is defined to 0. > > > > Maybe the most intuitive function (semantics) of the RTE_CACHE_GUARD > > macro would be have it deal exclusively with the issue resulting from > > next-N-line (and similar) hardware prefetching, and leave > > __rte_cache_aligned to deal with "classic" (same-cache line) false > > sharing. > > Excellent review feedback! > > I only thought of the cache guard as a means to provide spacing between > elements where the developer already prevented (same-cache line) false > sharing by some other means. I didn't even consider the alternative > interpretation of its purpose. > > Your feedback leaves no doubt that we should extend the cache guard's > purpose to also enforce cache alignment (under all circumstances, also when > RTE_CACHE_GUARD_LINES is 0). > > > > > Otherwise you would have to have something like > > > > struct test > > { > > char x; > > RTE_CACHE_GUARD(char, y); > > }; > > > > ...so that 'y' can be made __rte_cache_aligned by the macro. > > There's an easier solution... > > We can copy the concept from the RTE_MARKER type, which uses a zero-length > array. By simply omitting the #if RTE_CACHE_GUARD_LINES > 0, the macro will > serve both purposes: > > #define _RTE_CACHE_GUARD_HELPER2(unique) \ > char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE * > RTE_CACHE_GUARD_LINES] \ > __rte_cache_aligned; > #define _RTE_CACHE_GUARD_HELPER1(unique) _RTE_CACHE_GUARD_HELPER2(unique) > #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__) > > I have verified on Godbolt that this works. The memif driver also uses > RTE_MARKER this way [1]. > > [1]: > https://elixir.bootlin.com/dpdk/latest/source/drivers/net/memif/memif.h#L173 > > > > > RTE_HW_PREFETCH_GUARD could be an alternative name, but I think I like > > RTE_CACHE_GUARD better. > > > > When the macro serves both purposes (regardless of the value of > RTE_CACHE_GUARD_LINES), I think we can stick with the RTE_CACHE_GUARD name. > > >