From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 959F141B29; Mon, 28 Aug 2023 12:40:30 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8DC5D40278; Mon, 28 Aug 2023 12:40:30 +0200 (CEST) Received: from mail-vk1-f179.google.com (mail-vk1-f179.google.com [209.85.221.179]) by mails.dpdk.org (Postfix) with ESMTP id 451EA4021E for ; Mon, 28 Aug 2023 12:40:28 +0200 (CEST) Received: by mail-vk1-f179.google.com with SMTP id 71dfb90a1353d-48d0ae408b8so876628e0c.1 for ; Mon, 28 Aug 2023 03:40:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=networkplumber-org.20221208.gappssmtp.com; s=20221208; t=1693219227; x=1693824027; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=MfYssAK3nau7UxWdudbjhETxR2t2w0dFTymbUsgLSZ4=; b=kaVzm0ukTdrhiTSZpZvdPZZmNAWygtaU5NiCdFlyqMTy+hfEn3xwpNu63nqH7U1KPV OjKSf6mh9UnkW83lz9RCzyMEdViaPu+c0tvRmgPvbPCqEPAnhepR4LXA6AfryoRQnGaM diCMV3kiTKzBvKlKkk01UNMdSIQlVVoHcHaVV3I1MmkKIfcbKx8o54wRECVvBOavqeMN CU0LdzFKw3v4/T+3dvM5nH7xO88ZaE3/kr4uSqZR1D3Q7JTTcTOlIzHMxiWlO/1SubhL SmgQADxMZeE+2DoqOau20tsDelZtH3+eF51Vv8GHKey2TD8/N94e1hWU8kg3Zzv/6InY KkCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693219227; x=1693824027; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=MfYssAK3nau7UxWdudbjhETxR2t2w0dFTymbUsgLSZ4=; b=Gu5Fn4nSXE11aEcaIEykQ21SXrErIABGGA0D9jZd4ZWc+4SaTvRlfSjlTbXyrxl67n dfBwadJVUqqy66D5kQCGn9EW5oLmNyGoX3nucCTvh5Z8cIElEPIrTqODiAQ/Bf16ZdDY ga3NixbhuYIIouIZ0MsvKfZoAl0ObNvrjfANe0LI5TTTk0kYNRpmIoEOmxQnDav7J6pu bIePQeIYeFeFPTvaWh313MPqdvztuqLg9o0Q+V9lzHYiQmoqilXq75TtdY4RyoYCt9RK wTbw8dwUQk5Ph7vBZXGAkkrF0w3RB2F05+dbQun3zQAvT7+bQyb1eJ+IYHf5d8vgpZ9j UwtQ== X-Gm-Message-State: AOJu0YwT5HgQE5dlLyr3ZpzZ2Nh2SwPzs+Qj0x2Z0Opc2UimaXEvft9O HKvH2dr2kcWB4cJJJzhJvxNm+DO4o+7+egsF1mEQmQ== X-Google-Smtp-Source: AGHT+IGEDTWRZII8ILC+u4JQTy1qxo1Zz68dh9orhRBGj38WOIrLI5CpuSCEw1gpUZne0Uijc0c+lv2GlKagCfGjbi0= X-Received: by 2002:a1f:df04:0:b0:48f:8708:7e20 with SMTP id w4-20020a1fdf04000000b0048f87087e20mr15101913vkg.15.1693219227532; Mon, 28 Aug 2023 03:40:27 -0700 (PDT) MIME-Version: 1.0 References: <98CBD80474FA8B44BF855DF32C47DC35D87B39@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D87B3A@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D87B47@smartserver.smartshare.dk> <5a56c531-5d5c-777b-c1ea-dbcc25009220@lysator.liu.se> <98CBD80474FA8B44BF855DF32C47DC35D87B49@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D87B4A@smartserver.smartshare.dk> <952263a3-0264-2d7a-a45d-b3445da1e053@lysator.liu.se> <98CBD80474FA8B44BF855DF32C47DC35D87B4D@smartserver.smartshare.dk> In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87B4D@smartserver.smartshare.dk> From: Stephen Hemminger Date: Mon, 28 Aug 2023 12:40:16 +0200 Message-ID: Subject: Re: [RFC] cache guard To: =?UTF-8?Q?Morten_Br=C3=B8rup?= Cc: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , Bruce Richardson , dev , Olivier Matz , Andrew Rybchenko , Honnappa Nagarahalli , Konstantin Ananyev , =?UTF-8?Q?Mattias_R=C3=B6nnblom?= Content-Type: multipart/alternative; boundary="000000000000959a200603f95246" X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org --000000000000959a200603f95246 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable A quick hack might just to increase cache line size as experiment On Mon, Aug 28, 2023, 11:54 AM Morten Br=C3=B8rup wrote: > > From: Mattias R=C3=B6nnblom [mailto:hofors@lysator.liu.se] > > Sent: Monday, 28 August 2023 10.46 > > > > On 2023-08-28 08:32, Morten Br=C3=B8rup wrote: > > >> From: Mattias R=C3=B6nnblom [mailto:hofors@lysator.liu.se] > > >> Sent: Monday, 28 August 2023 00.31 > > >> > > >> On 2023-08-27 17:40, Morten Br=C3=B8rup wrote: > > >>>> From: Mattias R=C3=B6nnblom [mailto:hofors@lysator.liu.se] > > >>>> Sent: Sunday, 27 August 2023 15.55 > > [...] > > > >>> So, this gets added to rte_common.h: > > >>> > > >>> /** > > >>> * Empty cache lines, to guard against false sharing-like effects > > >>> * on systems with a next-N-lines hardware prefetcher. > > >>> * > > >>> * Use as spacing between data accessed by different lcores, > > >>> * to prevent cache thrashing on hardware with speculative > > prefetching. > > >>> */ > > >>> #if RTE_CACHE_GUARD_LINES > 0 > > >>> #define _RTE_CACHE_GUARD_HELPER2(unique) \ > > >>> char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE * > > >> RTE_CACHE_GUARD_LINES] \ > > >>> __rte_cache_aligned; > > >>> #define _RTE_CACHE_GUARD_HELPER1(unique) > > _RTE_CACHE_GUARD_HELPER2(unique) > > >>> #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__) > > >>> #else > > >>> #define RTE_CACHE_GUARD > > >>> #endif > > >>> > > >> > > >> Seems like a good solution. I thought as far as using __LINE__ to > > build > > >> a unique name, but __COUNTER__ is much cleaner, provided it's > > available > > >> in relevant compilers. (It's not in C11.) > > > > > > I considered __LINE__ too, but came to the same conclusion... > > __COUNTER__ is cleaner for this purpose. > > > > > > And since __COUNTER__ is being used elsewhere in DPDK, I assume it is > > available for use here too. > > > > > > If it turns out causing problems, we can easily switch to __LINE__ > > instead. > > > > > >> > > >> Should the semicolon be included or not in HELPER2? If left out, a > > >> lonely ";" will be left for RTE_CACHE_GUARD_LINES =3D=3D 0, but I do= n't > > >> think that is a problem. > > > > > > I tested it on Godbolt, and the lonely ";" in a struct didn't seem to > > be a problem. > > > > > > With the semicolon in HELPER2, there will be a lonely ";" in the > > struct in both cases, i.e. with and without cache guards enabled. > > > > > >> > > >> I don't see why __rte_cache_aligned is needed here. The adjacent > > struct > > >> must be cache-line aligned. Maybe it makes it more readable, having > > the > > >> explicit guard padding starting at the start of the actual guard > > cache > > >> lines, rather than potentially at some earlier point before, and > > having > > >> non-guard padding at the end of the struct (from __rte_cache_aligned > > on > > >> the struct level). > > > > > > Having both __rte_cache_aligned and the char array with full cache > > lines ensures that the guard field itself is on its own separate cache > > line, regardless of the organization of adjacent fields in the struct. > > E.g. this will also work: > > > > > > struct test { > > > char x; > > > RTE_CACHE_GUARD; > > > char y; > > > }; > > > > > > > That struct declaration is broken, since it will create false sharing > > between x and y, in case RTE_CACHE_GUARD_LINES is defined to 0. > > > > Maybe the most intuitive function (semantics) of the RTE_CACHE_GUARD > > macro would be have it deal exclusively with the issue resulting from > > next-N-line (and similar) hardware prefetching, and leave > > __rte_cache_aligned to deal with "classic" (same-cache line) false > > sharing. > > Excellent review feedback! > > I only thought of the cache guard as a means to provide spacing between > elements where the developer already prevented (same-cache line) false > sharing by some other means. I didn't even consider the alternative > interpretation of its purpose. > > Your feedback leaves no doubt that we should extend the cache guard's > purpose to also enforce cache alignment (under all circumstances, also wh= en > RTE_CACHE_GUARD_LINES is 0). > > > > > Otherwise you would have to have something like > > > > struct test > > { > > char x; > > RTE_CACHE_GUARD(char, y); > > }; > > > > ...so that 'y' can be made __rte_cache_aligned by the macro. > > There's an easier solution... > > We can copy the concept from the RTE_MARKER type, which uses a zero-lengt= h > array. By simply omitting the #if RTE_CACHE_GUARD_LINES > 0, the macro wi= ll > serve both purposes: > > #define _RTE_CACHE_GUARD_HELPER2(unique) \ > char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE * > RTE_CACHE_GUARD_LINES] \ > __rte_cache_aligned; > #define _RTE_CACHE_GUARD_HELPER1(unique) _RTE_CACHE_GUARD_HELPER2(unique) > #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__) > > I have verified on Godbolt that this works. The memif driver also uses > RTE_MARKER this way [1]. > > [1]: > https://elixir.bootlin.com/dpdk/latest/source/drivers/net/memif/memif.h#L= 173 > > > > > RTE_HW_PREFETCH_GUARD could be an alternative name, but I think I like > > RTE_CACHE_GUARD better. > > > > When the macro serves both purposes (regardless of the value of > RTE_CACHE_GUARD_LINES), I think we can stick with the RTE_CACHE_GUARD nam= e. > > > --000000000000959a200603f95246 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
A quick hack might just to increase cache line size as ex= periment=C2=A0

On Mon, Aug 28, 2023, 11:54 AM Morten Br=C3=B8rup <mb@smartsharesystems.com> wro= te:
> From: Mattias R=C3=B6nnblo= m [mailto:hofors@lysator.liu.se]
> Sent: Monday, 28 August 2023 10.46
>
> On 2023-08-28 08:32, Morten Br=C3=B8rup wrote:
> >> From: Mattias R=C3=B6nnblom [mailto:hofors@lysator.liu.se<= /a>]
> >> Sent: Monday, 28 August 2023 00.31
> >>
> >> On 2023-08-27 17:40, Morten Br=C3=B8rup wrote:
> >>>> From: Mattias R=C3=B6nnblom [mailto:
hofors@lysator= .liu.se]
> >>>> Sent: Sunday, 27 August 2023 15.55

[...]

> >>> So, this gets added to rte_common.h:
> >>>
> >>> /**
> >>>=C2=A0 =C2=A0 * Empty cache lines, to guard against false = sharing-like effects
> >>>=C2=A0 =C2=A0 * on systems with a next-N-lines hardware pr= efetcher.
> >>>=C2=A0 =C2=A0 *
> >>>=C2=A0 =C2=A0 * Use as spacing between data accessed by di= fferent lcores,
> >>>=C2=A0 =C2=A0 * to prevent cache thrashing on hardware wit= h speculative
> prefetching.
> >>>=C2=A0 =C2=A0 */
> >>> #if RTE_CACHE_GUARD_LINES > 0
> >>> #define _RTE_CACHE_GUARD_HELPER2(unique) \
> >>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0char cache_guard_= ## unique[RTE_CACHE_LINE_SIZE *
> >> RTE_CACHE_GUARD_LINES] \
> >>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0__rte_cache_align= ed;
> >>> #define _RTE_CACHE_GUARD_HELPER1(unique)
> _RTE_CACHE_GUARD_HELPER2(unique)
> >>> #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTE= R__)
> >>> #else
> >>> #define RTE_CACHE_GUARD
> >>> #endif
> >>>
> >>
> >> Seems like a good solution. I thought as far as using __LINE_= _ to
> build
> >> a unique name, but __COUNTER__ is much cleaner, provided it&#= 39;s
> available
> >> in relevant compilers. (It's not in C11.)
> >
> > I considered __LINE__ too, but came to the same conclusion...
> __COUNTER__ is cleaner for this purpose.
> >
> > And since __COUNTER__ is being used elsewhere in DPDK, I assume i= t is
> available for use here too.
> >
> > If it turns out causing problems, we can easily switch to __LINE_= _
> instead.
> >
> >>
> >> Should the semicolon be included or not in HELPER2? If left o= ut, a
> >> lonely ";" will be left for RTE_CACHE_GUARD_LINES = =3D=3D 0, but I don't
> >> think that is a problem.
> >
> > I tested it on Godbolt, and the lonely ";" in a struct = didn't seem to
> be a problem.
> >
> > With the semicolon in HELPER2, there will be a lonely ";&quo= t; in the
> struct in both cases, i.e. with and without cache guards enabled.
> >
> >>
> >> I don't see why __rte_cache_aligned is needed here. The a= djacent
> struct
> >> must be cache-line aligned. Maybe it makes it more readable, = having
> the
> >> explicit guard padding starting at the start of the actual gu= ard
> cache
> >> lines, rather than potentially at some earlier point before, = and
> having
> >> non-guard padding at the end of the struct (from __rte_cache_= aligned
> on
> >> the struct level).
> >
> > Having both __rte_cache_aligned and the char array with full cach= e
> lines ensures that the guard field itself is on its own separate cache=
> line, regardless of the organization of adjacent fields in the struct.=
> E.g. this will also work:
> >
> > struct test {
> >=C2=A0 =C2=A0 =C2=A0 char x;
> >=C2=A0 =C2=A0 =C2=A0 RTE_CACHE_GUARD;
> >=C2=A0 =C2=A0 =C2=A0 char y;
> > };
> >
>
> That struct declaration is broken, since it will create false sharing<= br> > between x and y, in case RTE_CACHE_GUARD_LINES is defined to 0.
>
> Maybe the most intuitive function (semantics) of the RTE_CACHE_GUARD > macro would be have it deal exclusively with the issue resulting from<= br> > next-N-line (and similar) hardware prefetching, and leave
> __rte_cache_aligned to deal with "classic" (same-cache line)= false
> sharing.

Excellent review feedback!

I only thought of the cache guard as a means to provide spacing between ele= ments where the developer already prevented (same-cache line) false sharing= by some other means. I didn't even consider the alternative interpreta= tion of its purpose.

Your feedback leaves no doubt that we should extend the cache guard's p= urpose to also enforce cache alignment (under all circumstances, also when = RTE_CACHE_GUARD_LINES is 0).

>
> Otherwise you would have to have something like
>
> struct test
> {
>=C2=A0 =C2=A0 =C2=A0 =C2=A0char x;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0RTE_CACHE_GUARD(char, y);
> };
>
> ...so that 'y' can be made __rte_cache_aligned by the macro.
There's an easier solution...

We can copy the concept from the RTE_MARKER type, which uses a zero-length = array. By simply omitting the #if RTE_CACHE_GUARD_LINES > 0, the macro w= ill serve both purposes:

#define _RTE_CACHE_GUARD_HELPER2(unique) \
=C2=A0 =C2=A0 =C2=A0 =C2=A0 char cache_guard_ ## unique[RTE_CACHE_LINE_SIZE= * RTE_CACHE_GUARD_LINES] \
=C2=A0 =C2=A0 =C2=A0 =C2=A0 __rte_cache_aligned;
#define _RTE_CACHE_GUARD_HELPER1(unique) _RTE_CACHE_GUARD_HELPER2(unique) #define RTE_CACHE_GUARD _RTE_CACHE_GUARD_HELPER1(__COUNTER__)

I have verified on Godbolt that this works. The memif driver also uses RTE_= MARKER this way [1].

[1]: https://= elixir.bootlin.com/dpdk/latest/source/drivers/net/memif/memif.h#L173
>
> RTE_HW_PREFETCH_GUARD could be an alternative name, but I think I like=
> RTE_CACHE_GUARD better.
>

When the macro serves both purposes (regardless of the value of RTE_CACHE_G= UARD_LINES), I think we can stick with the RTE_CACHE_GUARD name.


--000000000000959a200603f95246--