* memif insufficient padding
@ 2024-08-28 21:04 Morten Brørup
2024-08-29 7:54 ` Mattias Rönnblom
0 siblings, 1 reply; 3+ messages in thread
From: Morten Brørup @ 2024-08-28 21:04 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
Jakub,
While browsing virtual interfaces in DPDK, I noticed a possible performance issue in the memif driver:
If "head" and "tail" are accessed by different lcores, they are not sufficiently far away from each other (and other hot fields) to prevent false sharing-like effects on systems with a next-N-lines hardware prefetcher, which will prefetch "tail" when fetching "head", and prefetch "head" when fetching "flags".
I suggest updating the structure somewhat like this:
-#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
- alignas(RTE_CACHE_LINE_SIZE) RTE_MARKER mark;
-
-typedef struct {
- MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+typedef struct __rte_cache_aligned {
uint32_t cookie; /**< MEMIF_COOKIE */
uint16_t flags; /**< flags */
#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ RTE_CACHE_GUARD; /* isolate head from flags */
RTE_ATOMIC(uint16_t) head; /**< pointer to ring buffer head */
- MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ RTE_CACHE_GUARD; /* isolate tail from head */
RTE_ATOMIC(uint16_t) tail; /**< pointer to ring buffer tail */
- MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ RTE_CACHE_GUARD; /* isolate descriptors from tail */
- memif_desc_t desc[0]; /**< buffer descriptors */
+ memif_desc_t desc[]; /**< buffer descriptors */
} memif_ring_t;
Med venlig hilsen / Kind regards,
-Morten Brørup
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: memif insufficient padding
2024-08-28 21:04 memif insufficient padding Morten Brørup
@ 2024-08-29 7:54 ` Mattias Rönnblom
2024-08-29 8:55 ` Morten Brørup
0 siblings, 1 reply; 3+ messages in thread
From: Mattias Rönnblom @ 2024-08-29 7:54 UTC (permalink / raw)
To: Morten Brørup, Jakub Grajciar; +Cc: dev
On 2024-08-28 23:04, Morten Brørup wrote:
> Jakub,
>
> While browsing virtual interfaces in DPDK, I noticed a possible performance issue in the memif driver:
>
> If "head" and "tail" are accessed by different lcores, they are not sufficiently far away from each other (and other hot fields) to prevent false sharing-like effects on systems with a next-N-lines hardware prefetcher, which will prefetch "tail" when fetching "head", and prefetch "head" when fetching "flags".
>
> I suggest updating the structure somewhat like this:
>
> -#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
> - alignas(RTE_CACHE_LINE_SIZE) RTE_MARKER mark;
> -
> -typedef struct {
> - MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
> +typedef struct __rte_cache_aligned {
> uint32_t cookie; /**< MEMIF_COOKIE */
> uint16_t flags; /**< flags */
> #define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
> + RTE_CACHE_GUARD; /* isolate head from flags */
Wouldn't it be better to cache align the 'head' (or cache-aligned 'head'
*and* add a RTE_CACHE_GUARD)? In other words, isn't the purpose of
RTE_CACHE_GUARD to provide zero or more cache line of extra padding,
rather than a mechanism to avoid same-cache line false sharing?
> RTE_ATOMIC(uint16_t) head; /**< pointer to ring buffer head */
> - MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
> + RTE_CACHE_GUARD; /* isolate tail from head */
> RTE_ATOMIC(uint16_t) tail; /**< pointer to ring buffer tail */
> - MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
> + RTE_CACHE_GUARD; /* isolate descriptors from tail */
> - memif_desc_t desc[0]; /**< buffer descriptors */
> + memif_desc_t desc[]; /**< buffer descriptors */
> } memif_ring_t;
>
>
> Med venlig hilsen / Kind regards,
> -Morten Brørup
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* RE: memif insufficient padding
2024-08-29 7:54 ` Mattias Rönnblom
@ 2024-08-29 8:55 ` Morten Brørup
0 siblings, 0 replies; 3+ messages in thread
From: Morten Brørup @ 2024-08-29 8:55 UTC (permalink / raw)
To: Mattias Rönnblom, Jakub Grajciar; +Cc: dev
> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>
> On 2024-08-28 23:04, Morten Brørup wrote:
> > Jakub,
> >
> > While browsing virtual interfaces in DPDK, I noticed a possible performance
> issue in the memif driver:
> >
> > If "head" and "tail" are accessed by different lcores, they are not
> sufficiently far away from each other (and other hot fields) to prevent false
> sharing-like effects on systems with a next-N-lines hardware prefetcher, which
> will prefetch "tail" when fetching "head", and prefetch "head" when fetching
> "flags".
> >
> > I suggest updating the structure somewhat like this:
> >
> > -#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
> > - alignas(RTE_CACHE_LINE_SIZE) RTE_MARKER mark;
> > -
> > -typedef struct {
> > - MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
> > +typedef struct __rte_cache_aligned {
> > uint32_t cookie; /**< MEMIF_COOKIE */
> > uint16_t flags; /**< flags */
> > #define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
> > + RTE_CACHE_GUARD; /* isolate head from flags */
>
> Wouldn't it be better to cache align the 'head' (or cache-aligned 'head'
> *and* add a RTE_CACHE_GUARD)? In other words, isn't the purpose of
> RTE_CACHE_GUARD to provide zero or more cache line of extra padding,
> rather than a mechanism to avoid same-cache line false sharing?
IMO the general purpose of RTE_CACHE_GUARD is to prevent false cache line sharing; both sharing of the same cache line (on systems with or without speculative prefetching) and sharing of the next cache lines (on systems with speculative prefetching).
RTE_CACHE_GUARD provides two things:
1. Zero or more bytes of padding up to cache alignment, which prevents same-cache line sharing. This effectively cache aligns the field that follows the RTE_CACHE_GUARD, here the "head".
2. Zero or more cache lines of extra padding (configured by RTE_CACHE_GUARD_LINES in rte_config.h), which prevents sharing of the next cache lines on systems with speculative prefetching.
My description failed to mention the reason for the RTE_CACHE_GUARD between "flags" and "head":
The lcore updating "tail" also reads "flags", and if reading "flags" causes that lcore to prefetch the next cache line, it will thereby read the cache line holding "head", causing false cache line sharing with the other lcore updating "head".
>
> > RTE_ATOMIC(uint16_t) head; /**< pointer to ring buffer head
> */
> > - MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
> > + RTE_CACHE_GUARD; /* isolate tail from head */
> > RTE_ATOMIC(uint16_t) tail; /**< pointer to ring buffer tail
> */
> > - MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
> > + RTE_CACHE_GUARD; /* isolate descriptors from tail */
> > - memif_desc_t desc[0]; /**< buffer descriptors */
> > + memif_desc_t desc[]; /**< buffer descriptors */
> > } memif_ring_t;
> >
> >
> > Med venlig hilsen / Kind regards,
> > -Morten Brørup
> >
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2024-08-29 8:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-28 21:04 memif insufficient padding Morten Brørup
2024-08-29 7:54 ` Mattias Rönnblom
2024-08-29 8:55 ` Morten Brørup
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).