From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 7E87F428D0;
	Thu,  6 Apr 2023 02:07:42 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 57D2D40F18;
	Thu,  6 Apr 2023 02:07:42 +0200 (CEST)
Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182])
 by mails.dpdk.org (Postfix) with ESMTP id 7DDE240DF6
 for <dev@dpdk.org>; Thu,  6 Apr 2023 02:07:41 +0200 (CEST)
Received: by linux.microsoft.com (Postfix, from userid 1086)
 id A5393210DEF7; Wed,  5 Apr 2023 17:07:40 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com A5393210DEF7
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com;
 s=default; t=1680739660;
 bh=dWGnjsABnWvlInkDgxNKl+9NMn6lca+sEp4JcTO9MWQ=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
 b=N2v0B2IH3vMA0ztBwNnlwEP8TgINllihHVRgwlS01gAovBLB7Y3SVLZj6f1MgsPBF
 hzd4CPEd5BTs0bOOjctG1DDM8b1bVLI/CIyR9m4CNA99P9pFhDnIfbRfL8DMF6sBP6
 T8/j+m3Xs8rnzdohhur1jIbHDaqWsp912Q/Jlcy8=
Date: Wed, 5 Apr 2023 17:07:40 -0700
From: Tyler Retzlaff <roretzla@linux.microsoft.com>
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
 "dev@dpdk.org" <dev@dpdk.org>,
 "bruce.richardson@intel.com" <bruce.richardson@intel.com>,
 "david.marchand@redhat.com" <david.marchand@redhat.com>,
 "thomas@monjalon.net" <thomas@monjalon.net>,
 "mb@smartsharesystems.com" <mb@smartsharesystems.com>
Subject: Re: [PATCH 3/9] eal: use barrier intrinsics when compiling with msvc
Message-ID: <20230406000740.GB2287@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <1680558751-17931-1-git-send-email-roretzla@linux.microsoft.com>
 <1680558751-17931-4-git-send-email-roretzla@linux.microsoft.com>
 <754a877d020a4a199a5310b469e876a7@huawei.com>
 <20230404154906.GC23247@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
 <faad6a2b-ff4c-5c52-1d6d-79e355e6249a@yandex.ru>
 <20230405000441.GA24769@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
 <df70cd0153354344acb70029b7eb5af8@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <df70cd0153354344acb70029b7eb5af8@huawei.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

On Wed, Apr 05, 2023 at 10:57:02AM +0000, Konstantin Ananyev wrote:
> 
> > > >>>Inline assembly is not supported for msvc x64 instead use
> > > >>>_{Read,Write,ReadWrite}Barrier() intrinsics.
> > > >>>
> > > >>>Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > >>>---
> > > >>>  lib/eal/include/generic/rte_atomic.h |  4 ++++
> > > >>>  lib/eal/x86/include/rte_atomic.h     | 10 +++++++++-
> > > >>>  2 files changed, 13 insertions(+), 1 deletion(-)
> > > >>>
> > > >>>diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
> > > >>>index 234b268..e973184 100644
> > > >>>--- a/lib/eal/include/generic/rte_atomic.h
> > > >>>+++ b/lib/eal/include/generic/rte_atomic.h
> > > >>>@@ -116,9 +116,13 @@
> > > >>>   * Guarantees that operation reordering does not occur at compile time
> > > >>>   * for operations directly before and after the barrier.
> > > >>>   */
> > > >>>+#ifndef RTE_TOOLCHAIN_MSVC
> > > >>>  #define	rte_compiler_barrier() do {		\
> > > >>>  	asm volatile ("" : : : "memory");	\
> > > >>>  } while(0)
> > > >>>+#else
> > > >>>+#define rte_compiler_barrier() _ReadWriteBarrier()
> > > >>>+#endif
> > > >>>
> > > >>>  /**
> > > >>>   * Synchronization fence between threads based on the specified memory order.
> > > >>>diff --git a/lib/eal/x86/include/rte_atomic.h b/lib/eal/x86/include/rte_atomic.h
> > > >>>index f2ee1a9..5cce9ba 100644
> > > >>>--- a/lib/eal/x86/include/rte_atomic.h
> > > >>>+++ b/lib/eal/x86/include/rte_atomic.h
> > > >>>@@ -27,9 +27,13 @@
> > > >>>
> > > >>>  #define	rte_rmb() _mm_lfence()
> > > >>>
> > > >>>+#ifndef RTE_TOOLCHAIN_MSVC
> > > >>>  #define rte_smp_wmb() rte_compiler_barrier()
> > > >>>-
> > > >>>  #define rte_smp_rmb() rte_compiler_barrier()
> > > >>>+#else
> > > >>>+#define rte_smp_wmb() _WriteBarrier()
> > > >>>+#define rte_smp_rmb() _ReadBarrier()
> > > >>>+#endif
> > > >>>
> > > >>>  /*
> > > >>>   * From Intel Software Development Manual; Vol 3;
> > > >>>@@ -66,11 +70,15 @@
> > > >>>  static __rte_always_inline void
> > > >>>  rte_smp_mb(void)
> > > >>>  {
> > > >>>+#ifndef RTE_TOOLCHAIN_MSVC
> > > >>>  #ifdef RTE_ARCH_I686
> > > >>>  	asm volatile("lock addl $0, -128(%%esp); " ::: "memory");
> > > >>>  #else
> > > >>>  	asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
> > > >>>  #endif
> > > >>>+#else
> > > >>>+	rte_compiler_barrier();
> > > >>>+#endif
> > > >>
> > > >>It doesn't look right to me: compiler_barrier is not identical to LOCK-ed operation,
> > > >>and is not enough to serve as a proper memory barrier for SMP.
> > > >
> > > >i think i'm confused by the macro naming here.  i'll take another look
> > > >thank you for raising it.
> > > >
> > > >>
> > > >>Another ore generic comment - do we really need to pollute all that code with RTE_TOOLCHAIN_MSVC ifdefs?
> > > >>Right now we have ability to have subdir per arch (x86/arm/etc.).
> > > >>Can we treat x86+windows+msvc as a special arch?
> > > >
> > > >i asked this question previously and confirmed in the technical board
> > > >meeting. the answer i received was that the community did not want new
> > > >directory/headers introduced for compiler support matrix and i should
> > > >use #ifdef in the existing headers.
> > >
> > > Ok, can I then ask at least to minimize number of ifdefs to absolute
> > > minimum?
> > 
> > in principal no objection at all, one question though is what to do with
> > comment based documentation attached to macros? e.g.
> > 
> > #ifdef SOME_FOO
> > /* some documentation */
> > #define some_macro
> > #else
> > #define some_macro
> > #endif
> > 
> > #ifdef SOME_FOO
> > /* some documentation 2 */
> > #define some_macro2
> > #else
> > #define some_macro2
> > #endif
> > 
> > i can either duplicate the documentation for every define so it stays
> > "attached" or i can only document the first expansion. let me know what
> > you expect.
> > 
> > so something like this?
> > 
> > #ifdef SOME_FOO
> > /* some documentation */
> > #define some_macro
> > /* some documentation 2 */
> > #define some_macro2
> > #else
> > #define some_macro
> > #define some_macro2
> > #endif
> > 
> > or should all documentation be duplicated? which can become a teadious
> > redundancy for anyone maintaining it. keep in mind we might have to make
> > an exception for rte_common.h because it seems doing this would be
> > really ugly there. take a look let me know.
> 
> My personal preference would be to keep one documentation block for both cases
> (yes, I suppose it needs to be updated if required):
> 
> /* some documentation, probably for both SOME_FOO on/off */
> #ifdef SOME_FOO
> #define some_macro
> #else
> #define some_macro
> #endif 
> 
> 
> > 
> > > It is really hard to read an follow acode that is heavily ifdefed.
> > > Let say above we probably don't need to re-define
> > > rte_smp_rmb/rte_smp_wmb, as both are boiled down to
> > > compiler_barrier(), which is already redefined.
> > 
> > can you take a look at v2 of the patch and re-prescribe your advise
> > here? in v2 only the intel macros expand to the compiler barrier. though
> > i find this vexing since as you pointed out it seems they aren't
> > supposed to be compiler only barriers according to the documentation in
> > generic/rte_atomic.h they are intended to be memory barriers.
> 
> Commented, pls check if I explained my thoughts clear enough there.
>  
> > 
> > please help me if i've goofed up in this regard.
> > 
> > > Another question - could it be visa-versa approach:
> > > can we replace some inline assembly with common instincts whenever possible?
> > 
> > msvc has only intrinsics and the conditional expansion for msvc is to
> > use those intrinsics, gcc doesn't generally define intrinsics for processor
> > specific code does it?
> 
> AFAIK latest gcc (and clang) versions do support majority of these instincts: __rdtsc, _xbegin/_xend, etc.
> So my thought was - might be we can use same instincts for all compilers...
> One implication I can think about - older versions of gcc.
> But might be we can re-order things and have inlines only for these oldere gcc versions?

i'm going to propose if we do this we do it as a separate change later.

i fear it could turn into the following dance which seems not a lot
better given i'm sure some people will argue there is no benefit to
removing inline assembly for gcc/clang. my preference is not to get
side-tracked on refactoring with the short merge window.

#if (defined(__clang__) && clang version < x) ||
    (defined(__GNUC__) && gcc version < x)
__asm(whatever...
#else
__rdtsc()
#endif