DPDK patches and discussions
 help / color / mirror / Atom feed
From: Ola Liljedahl <Ola.Liljedahl@arm.com>
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	Steve Capper <Steve.Capper@arm.com>, nd <nd@arm.com>,
	"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load
Date: Mon, 8 Oct 2018 10:25:45 +0000	[thread overview]
Message-ID: <B00E8CE9-7D26-46FC-A98B-06BA6DEEB813@arm.com> (raw)
In-Reply-To: <20181008100004.GB11081@jerin>



On 08/10/2018, 12:00, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:

    -----Original Message-----
    > Date: Mon, 8 Oct 2018 09:22:05 +0000
    > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
    > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    > CC: "dev@dpdk.org" <dev@dpdk.org>, Honnappa Nagarahalli
    >  <Honnappa.Nagarahalli@arm.com>, "Ananyev, Konstantin"
    >  <konstantin.ananyev@intel.com>, "Gavin Hu (Arm Technology China)"
    >  <Gavin.Hu@arm.com>, Steve Capper <Steve.Capper@arm.com>, nd <nd@arm.com>,
    >  "stable@dpdk.org" <stable@dpdk.org>
    > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
    > user-agent: Microsoft-MacOutlook/10.11.0.180909
    > 
    > External Email
    > 
    > On 08/10/2018, 08:06, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
    > 
    >     -----Original Message-----
    >     > Date: Sun, 7 Oct 2018 20:44:54 +0000
    >     > From: Ola Liljedahl <Ola.Liljedahl@arm.com>
    >     > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    >     > CC: "dev@dpdk.org" <dev@dpdk.org>, Honnappa Nagarahalli
    >     >  <Honnappa.Nagarahalli@arm.com>, "Ananyev, Konstantin"
    >     >  <konstantin.ananyev@intel.com>, "Gavin Hu (Arm Technology China)"
    >     >  <Gavin.Hu@arm.com>, Steve Capper <Steve.Capper@arm.com>, nd <nd@arm.com>,
    >     >  "stable@dpdk.org" <stable@dpdk.org>
    >     > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
    >     > user-agent: Microsoft-MacOutlook/10.11.0.180909
    >     >
    > 
    > 
    >     Could you please fix the email client for inline reply.
    > Sorry that doesn't seem to be possible with Outlook for Mac 16 or Office365. The official Office365/Outlook
    > documentation doesn't match the actual user interface...
    > 
    > 
    > 
    >     https://www.kernel.org/doc/html/v4.19-rc7/process/email-clients.html
    > 
    > 
    >     >
    >     > On 07/10/2018, 06:03, "Jerin Jacob" <jerin.jacob@caviumnetworks.com> wrote:
    >     >
    >     >     In arm64 case, it will have ATOMIC_RELAXED followed by asm volatile ("":::"memory") of rte_pause().
    >     >     I would n't have any issue, if the generated code code is same or better than the exiting case. but it not the case, Right?
    >     > The existing case is actually not interesting (IMO) as it exposes undefined behaviour which allows the compiler to do anything. But you seem to be satisfied with "works for me, right here right now". I think the cost of avoiding undefined behaviour is acceptable (actually I don't think it even will be noticeable).
    > 
    >     I am not convinced because of use of volatile in head and tail indexes.
    >     For me that brings the defined behavior.
    > As long as you don't mix in C11 atomic accesses (just use "plain" accesses to volatile objects),
    > it is AFAIK defined behaviour (but not necessarily using atomic loads and stores). But I quoted
    > the C11 spec where it explicitly mentions that mixing atomic and non-atomic accesses to the same
    > object is undefined behaviour. Don't argue with me, argue with the C11 spec.
    > If you want to disobey the spec, this should at least be called out for in the code with a comment.
    
    That's boils down only one question, should we follow C11 spec? Why not only take load
    acquire and store release semantics only just like Linux kernel and FreeBSD.
And introduce even more undefined behaviour?

    Does not look like C11 memory model is super efficient in term of gcc
    implementation.
You are making a chicken out of a feather.

I think this "problem" with one additional ADD instruction will only concern __atomic_load_n(__ATOMIC_RELAXED) and __atomic_store_n(__ATOMIC_RELAXED) because the compiler separates the address generation (add offset of struct member) from the load or store itself. For other atomic operations and memory orderings (e.g. __atomic_load_n(__ATOMIC_ACQUIRE), the extra ADD instruction will be included anyway (as long as we access a non-first struct member) because e.g. LDAR only accepts a base register with no offset.

I suggest minimising the imposed memory orderings can have a much larger (positive) effect on performance compared to avoiding one ADD instruction (memory accesses are much slower than CPU ALU instructions).
Using C11 memory model and identifying exactly which objects are used for synchronisation and whether (any) updates to shared memory are acquired or released (no updates to shared memory means relaxed order can be used) will provide maximum freedom to the compiler and hardware to get the best result.

The FreeBSD and DPDK ring buffers show some fundamental misunderstandings here. Instead excessive orderings and explicit barriers have been used as band-aids, with unknown effects on performance.

    
    > 
    > 
    >     That the reason why I shared
    >     the generated assembly code. If you think other way, Pick any compiler
    >     and see generated output.
    > This is what one compiler for one architecture generates today. These things change. Other things
    > that used to work or worked for some specific architecture has stopped working in newer versions of
    > the compiler.
    > 
    > 
    >     And
    > 
    >     Freebsd implementation of ring buffer(Which DPDK derived from), Don't have
    >     such logic, See https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L108
    > It looks like FreeBSD uses some kind of C11 atomic memory model-inspired API although I don't see
    > exactly how e.g. atomic_store_rel_int() is implemented. The code also mixes in explicit barriers
    > so definitively not pure C11 memory model usage. And finally, it doesn't establish the proper
    > load-acquire/store-release relationships (e.g. store-release cons_tail requires a load-acquire cons_tail,
    > same for prod_tail).
    > 
    > "* multi-producer safe lock-free ring buffer enqueue"
    > The comment is also wrong. This design is not lock-free, how could it be when there is spinning
    > (waiting) for other threads in the code? If a thread must wait for other threads, then by definition
    > the design is blocking.
    > 
    > So you are saying that because FreeBSD is doing it wrong, DPDK can also do it wrong?
    > 
    > 
    >     See below too.
    > 
    >     >
    >     > Skipping the compiler memory barrier in rte_pause() potentially allows for optimisations that provide much more benefit, e.g. hiding some cache miss latency for later loads. The DPDK ring buffer implementation is defined so to enable inlining of enqueue/dequeue functions into the caller, any code could immediately follow these calls.
    >     >
    >     > From INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
    >     > Programming languages — C
    >     >
    >     > 5.1.2.4
    >     > 4 Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.
    >     >
    >     > 25 The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
    > 
    >     IMO, Both condition will satisfy if the variable is volatile and 32bit read will atomic
    >     for 32b and 64b machines. If not, the problem persist for generic case
    >     as well(lib/librte_ring/rte_ring_generic.h)
    > The read from a volatile object is not an atomic access per the C11 spec. It just happens to
    > be translated to an instruction (on x86-64 and AArch64/A64) that implements an atomic load.
    > I don't think any compiler would change this code generation and suddenly generate some
    > non-atomic load instruction for a program that *only* uses volatile to do "atomic" accesses.
    > But a future compiler could detect the mix of atomic and non-atomic accesses and mark this
    > expression as causing undefined behaviour and that would have consequences for code generation.
    > 
    > 
    >     I agree with you on C11 memory model semantics usage. The reason why I
    >     propose name for the file as rte_ring_c11_mem.h as DPDK it self did not
    >     had definitions for load acquire and store release semantics.
    >     I was looking for taking load acquire and store release semantics
    >     from C11 instead of creating new API like Linux kernel for FreeBSD(APIs
    >     like  atomic_load_acq_32(), atomic_store_rel_32()). If the file name is your
    >     concern then we could create new abstractions as well. That would help
    >     exiting KNI problem as well.
    > I appreciate your embrace of the C11 memory model. I think it is better for describing
    > (both to the compiler and to humans) which and how objects are used for synchronisation.
    > 
    > However, I don't think an API as you suggest (and others have suggested before, e.g. as
    > done in ODP) is a good idea. There is an infinite amount of possible base types, an
    > increasing number of operations and a bunch of different memory orderings, a "complete"
    > API would be very large and difficult to test, and most members of the API would never be used.
    > GCC and Clang both support the __atomic intrinsics. This API avoids the problems I
    > described above. Or we could use the official C11 syntax (stdatomic.h). But then we
    > have the problem with using pre-C11 compilers...
    
    I have no objection, if everyone agrees to move C11 memory model
    with __atomic intrinsics. But if we need to keep both have then
    atomic_load_acq_32() kind of API make sense.
    
    
    > 
    > 
    > 
    > 
    >     I think, currently it mixed usage because, the same variable declaration
    >     used for C11 vs non C11 usage.Ideally we wont need "volatile" for C11
    >     case. Either we need to change only to C11 mode OR have APIs for
    >     atomic_load_acq_() and atomic_store_rel_() to allow both models like
    >     Linux kernel and FreeBSD.
    > 
    >     >
    >     > -- Ola
    >     >
    >     >
    >     >
    > 
    > 
    


  reply	other threads:[~2018-10-08 10:25 UTC|newest]

Thread overview: 131+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-06  1:18 [dpdk-dev] [PATCH] ring: fix c11 memory ordering issue Gavin Hu
2018-08-06  9:19 ` Thomas Monjalon
2018-08-08  1:39   ` Gavin Hu
2018-08-07  3:19 ` [dpdk-dev] [PATCH v2] " Gavin Hu
2018-08-07  5:56   ` He, Jia
2018-08-07  7:56     ` Gavin Hu
2018-08-08  3:07       ` Jerin Jacob
2018-08-08  7:23         ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2018-09-17  7:47   ` [dpdk-dev] [PATCH v3 1/3] app/testpmd: show errno along with flow API errors Gavin Hu
2018-09-17  7:47     ` [dpdk-dev] [PATCH v3 2/3] net/i40e: remove invalid comment Gavin Hu
2018-09-17  8:25       ` Gavin Hu (Arm Technology China)
2018-09-17  7:47     ` [dpdk-dev] [PATCH v3 3/3] doc: add cross compile part for sample applications Gavin Hu
2018-09-17  9:48       ` Jerin Jacob
2018-09-17 10:28         ` Gavin Hu (Arm Technology China)
2018-09-17 10:34           ` Jerin Jacob
2018-09-17 10:55             ` Gavin Hu (Arm Technology China)
2018-09-17 10:49       ` [dpdk-dev] [PATCH v4] " Gavin Hu
2018-09-17 10:53         ` [dpdk-dev] [PATCH v5] " Gavin Hu
2018-09-18 11:00           ` Jerin Jacob
2018-09-19  0:33           ` [dpdk-dev] [PATCH v6] " Gavin Hu
2018-09-17  8:11     ` [dpdk-dev] [PATCH v4 1/4] bus/fslmc: fix undefined reference of memsegs Gavin Hu
2018-09-17  8:11       ` [dpdk-dev] [PATCH v4 2/4] ring: read tail using atomic load Gavin Hu
2018-09-20  6:41         ` Jerin Jacob
2018-09-25  9:26           ` Gavin Hu (Arm Technology China)
2018-09-17  8:11       ` [dpdk-dev] [PATCH v4 3/4] ring: synchronize the load and store of the tail Gavin Hu
2018-09-17  8:11       ` [dpdk-dev] [PATCH v4 4/4] ring: move the atomic load of head above the loop Gavin Hu
2018-10-27 14:21         ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2018-09-17  8:17   ` [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load Gavin Hu
2018-09-17  8:17     ` [dpdk-dev] [PATCH v3 2/3] ring: synchronize the load and store of the tail Gavin Hu
2018-09-26  9:29       ` Gavin Hu (Arm Technology China)
2018-09-26  9:59         ` Justin He
2018-09-29 10:57       ` Jerin Jacob
2018-10-17  6:29       ` [dpdk-dev] [PATCH 1/2] " Gavin Hu
2018-10-17  6:29         ` [dpdk-dev] [PATCH 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-10-17  6:35         ` [dpdk-dev] [PATCH 1/2] ring: synchronize the load and store of the tail Gavin Hu (Arm Technology China)
2018-10-27 14:39           ` Thomas Monjalon
2018-10-27 15:00             ` Jerin Jacob
2018-10-27 15:13               ` Thomas Monjalon
2018-10-27 15:34                 ` Jerin Jacob
2018-10-27 15:48                   ` Thomas Monjalon
2018-10-29  2:51                   ` Gavin Hu (Arm Technology China)
2018-10-29  2:57                   ` Gavin Hu (Arm Technology China)
2018-10-29 10:16                     ` Jerin Jacob
2018-10-29 10:47                       ` Thomas Monjalon
2018-10-29 11:10                         ` Jerin Jacob
2018-11-03 20:12                 ` Mattias Rönnblom
2018-11-05 21:51                   ` Honnappa Nagarahalli
2018-11-06 11:03                     ` Mattias Rönnblom
2018-10-31  3:35         ` [dpdk-dev] [PATCH v2 0/2] rte ring c11 bug fix and optimization Gavin Hu
2018-10-31 10:26           ` [dpdk-dev] [PATCH v3 0/2] ring library with c11 memory model " Gavin Hu
2018-10-31 16:58             ` Thomas Monjalon
2018-11-01  9:53             ` [dpdk-dev] [PATCH v4 1/2] ring: synchronize the load and store of the tail Gavin Hu
2018-11-01  9:53             ` [dpdk-dev] [PATCH v4 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-11-01 17:26               ` Stephen Hemminger
2018-11-02  0:53                 ` Gavin Hu (Arm Technology China)
2018-11-02  4:30                   ` Honnappa Nagarahalli
2018-11-02  7:15                     ` Gavin Hu (Arm Technology China)
2018-11-02  9:36                       ` Thomas Monjalon
2018-11-02 11:23                         ` Gavin Hu (Arm Technology China)
2018-10-31 10:26           ` [dpdk-dev] [PATCH v3 1/2] ring: synchronize the load and store of the tail Gavin Hu
2018-10-31 22:07             ` Stephen Hemminger
2018-11-01  9:56               ` Gavin Hu (Arm Technology China)
2018-10-31 10:26           ` [dpdk-dev] [PATCH v3 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-10-31  3:35         ` [dpdk-dev] [PATCH v2 1/2] ring: synchronize the load and store of the tail Gavin Hu
2018-10-31  3:35         ` [dpdk-dev] [PATCH v2 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-10-31  9:36           ` Thomas Monjalon
2018-10-31 10:27             ` Gavin Hu (Arm Technology China)
2018-11-01  9:53         ` [dpdk-dev] [PATCH v4 0/2] ring library with c11 memory model bug fix and optimization Gavin Hu
2018-11-02 11:21           ` [dpdk-dev] [PATCH v5 " Gavin Hu
2018-11-02 11:21           ` [dpdk-dev] [PATCH v5 1/2] ring: synchronize the load and store of the tail Gavin Hu
2018-11-05  9:30             ` Olivier Matz
2018-11-02 11:21           ` [dpdk-dev] [PATCH v5 2/2] ring: move the atomic load of head above the loop Gavin Hu
2018-11-02 11:43             ` Bruce Richardson
2018-11-03  1:19               ` Gavin Hu (Arm Technology China)
2018-11-03  9:34                 ` Honnappa Nagarahalli
2018-11-05 13:17                   ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2018-11-05 13:41                     ` Jerin Jacob
2018-11-05  9:44                 ` [dpdk-dev] " Olivier Matz
2018-11-05 13:36                   ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2018-09-17  8:17     ` [dpdk-dev] [PATCH v3 3/3] " Gavin Hu
2018-09-26  9:29       ` Gavin Hu (Arm Technology China)
2018-09-26 10:06         ` Justin He
2018-09-29  7:19           ` Stephen Hemminger
2018-09-29 10:59       ` Jerin Jacob
2018-09-26  9:29     ` [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load Gavin Hu (Arm Technology China)
2018-09-26 10:09       ` Justin He
2018-09-29 10:48     ` Jerin Jacob
2018-10-05  0:47       ` Gavin Hu (Arm Technology China)
2018-10-05  8:21         ` Ananyev, Konstantin
2018-10-05 11:15           ` Ola Liljedahl
2018-10-05 11:36             ` Ola Liljedahl
2018-10-05 13:44               ` Ananyev, Konstantin
2018-10-05 14:21                 ` Ola Liljedahl
2018-10-05 15:11                 ` Honnappa Nagarahalli
2018-10-05 17:07                   ` Jerin Jacob
2018-10-05 18:05                     ` Ola Liljedahl
2018-10-05 20:06                       ` Honnappa Nagarahalli
2018-10-05 20:17                         ` Ola Liljedahl
2018-10-05 20:29                           ` Honnappa Nagarahalli
2018-10-05 20:34                             ` Ola Liljedahl
2018-10-06  7:41                               ` Jerin Jacob
2018-10-06 19:44                                 ` Ola Liljedahl
2018-10-06 19:59                                   ` Ola Liljedahl
2018-10-07  4:02                                   ` Jerin Jacob
2018-10-07 20:11                                     ` Ola Liljedahl
2018-10-07 20:44                                     ` Ola Liljedahl
2018-10-08  6:06                                       ` Jerin Jacob
2018-10-08  9:22                                         ` Ola Liljedahl
2018-10-08 10:00                                           ` Jerin Jacob
2018-10-08 10:25                                             ` Ola Liljedahl [this message]
2018-10-08 10:33                                               ` Gavin Hu (Arm Technology China)
2018-10-08 10:39                                                 ` Ola Liljedahl
2018-10-08 10:41                                                   ` Gavin Hu (Arm Technology China)
2018-10-08 10:49                                                 ` Jerin Jacob
2018-10-10  6:28                                                   ` Gavin Hu (Arm Technology China)
2018-10-10 19:26                                                     ` Honnappa Nagarahalli
2018-10-08 10:46                                               ` Jerin Jacob
2018-10-08 11:21                                                 ` Ola Liljedahl
2018-10-08 11:50                                                   ` Jerin Jacob
2018-10-08 11:59                                                     ` Ola Liljedahl
2018-10-08 12:05                                                       ` Jerin Jacob
2018-10-08 12:20                                                         ` Jerin Jacob
2018-10-08 12:30                                                           ` Ola Liljedahl
2018-10-09  8:53                                                             ` Olivier Matz
2018-10-09  3:16                                             ` Honnappa Nagarahalli
2018-10-08 14:43                                           ` Bruce Richardson
2018-10-08 14:46                                             ` Ola Liljedahl
2018-10-08 15:45                                               ` Ola Liljedahl
2018-10-08  5:27                               ` Honnappa Nagarahalli
2018-10-08 10:01                                 ` Ola Liljedahl
2018-10-27 14:17     ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B00E8CE9-7D26-46FC-A98B-06BA6DEEB813@arm.com \
    --to=ola.liljedahl@arm.com \
    --cc=Gavin.Hu@arm.com \
    --cc=Honnappa.Nagarahalli@arm.com \
    --cc=Steve.Capper@arm.com \
    --cc=dev@dpdk.org \
    --cc=jerin.jacob@caviumnetworks.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=nd@arm.com \
    --cc=stable@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).