DPDK patches and discussions
 help / color / mirror / Atom feed
From: "François-Frédéric Ozog" <ff@ozog.com>
To: "'Thomas Monjalon'" <thomas.monjalon@6wind.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] spinlock: fix atomic and out of order execution
Date: Fri, 10 Jan 2014 19:02:18 +0100	[thread overview]
Message-ID: <019301cf0e2e$1b0af0a0$5120d1e0$@com> (raw)
In-Reply-To: <1387582656-1892-1-git-send-email-thomas.monjalon@6wind.com>

Hi Thomas,

I am afraid I introduced unnecessary complexity in the discussion as the
spinlock issues I mentioned are connected to a work in progress on my side
(implement a Chelsio cxgb5 PMD) but *not* to the general DPDK. 

I'll explain some aspects of the context and how critical sections has to be
handled in the case of the Chelsio cxgb5 PMD.

1) WC memory
Most memory is cached, some is not cached at all, and some is tagged (via
paging attributes) to be "Write Combining". This means that writes to WC
memory locations do *NOT* go through the cache but rather to a memory "write
buffer" internal to the processor. The processor delays write to memory as
it please, trying to minimize the actual number of writes to DRAM. This type
of memory is used by some devices as a fast interface, for instance Chelsio
cxgb5 write queue can filled using this method
(http://lwn.net/Articles/542643/). Public DPDK cannot declare such memory
type as it requires kernel support.

2) fencing and out of order execution
Out of order execution is related to the processor that may actually change
the order of the instructions of a program as it has been produced by the
compiler. In general, this relates only to performance but for critical
sections, it is absolutely necessary to ensure that the processor does NOT
postpone critical section instructions AFTER the unlock.
So there is a need of a "fence" to tell the processor that some instruction
blocks out of order execution: no previous instruction can be postponed
after the "fence".

3) Public DPDK critical sections

The DPDK implements lock/unlock with xchg instruction which is atomic and
serializing for "normal" cached memory. It includes an implicit lock prefix,
so no need to add such prefix (your patch proposal). This (implicit) lock
prefix is actually acting as an out of order execution fence, thus ensuring
that critical section is working as expected. Nothing has to be added or
changed for public DPDK to have correct critical sections.

4) "Extended" DPDK critical sections

Because of Chelsio cxgb5 PMD driver I am developing, I added kernel support
to DPDK to declare some memory regions as Write Combining, hence to be able
to leverage the output queue fast path.
Because of this, I also had to add proper fencing for critical sections.
The relation is the fact that writes to WC memory does NOT go through the
cache AND that the lock prefix is implemented as a cache protocol
transaction. In other words, a lock prefix is NOT an out of order
instruction fence for instructions that deal with WC memory. So, the
processor may decide to postpone the WC related instruction ATFER the
unlock.
To create a proper out of order execution fence, the processor has a set of
explicit memory fences that do the job.
So before a rte_unlock, I have to add a rte_mb().


So:
- for people willing to develop applications on top of DPDK, my comments can
be disregarded, they are not relevant, there is no need to understand them. 
- for people willing to develop PMD drivers, the fencing comments should be
clear to use the proper fencing.


I am sorry for any confusion I may have introduced in the community.


François-Frédéric


> -----Message d'origine-----
> De : dev [mailto:dev-bounces@dpdk.org] De la part de Thomas Monjalon
> Envoyé : samedi 21 décembre 2013 00:38
> À : dev@dpdk.org
> Objet : [dpdk-dev] [PATCH] spinlock: fix atomic and out of order execution
> 
> From: Damien Millescamps <damien.millescamps@6wind.com>
> 
> Add lock prefix before xchg instructions in order to be atomic and flush
> speculative values to ensure effective execution order (as an acquire
> barrier).
> 
> MPLOCKED is a "lock" in multicore case.
> 
> Signed-off-by: Damien Millescamps <damien.millescamps@6wind.com>
> Signed-off-by: Thomas Monjalon <thomas.monjalon@6wind.com>
> ---
>  lib/librte_eal/common/include/rte_spinlock.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_eal/common/include/rte_spinlock.h
> b/lib/librte_eal/common/include/rte_spinlock.h
> index f7a245a..8edb971 100644
> --- a/lib/librte_eal/common/include/rte_spinlock.h
> +++ b/lib/librte_eal/common/include/rte_spinlock.h
> @@ -51,6 +51,7 @@
>  extern "C" {
>  #endif
> 
> +#include <rte_atomic.h>
>  #include <rte_lcore.h>
>  #ifdef RTE_FORCE_INTRINSICS
>  #include <rte_common.h>
> @@ -93,7 +94,7 @@ rte_spinlock_lock(rte_spinlock_t *sl)
>  	int lock_val = 1;
>  	asm volatile (
>  			"1:\n"
> -			"xchg %[locked], %[lv]\n"
> +			MPLOCKED "xchg %[locked], %[lv]\n"
>  			"test %[lv], %[lv]\n"
>  			"jz 3f\n"
>  			"2:\n"
> @@ -124,7 +125,7 @@ rte_spinlock_unlock (rte_spinlock_t *sl)  #ifndef
> RTE_FORCE_INTRINSICS
>  	int unlock_val = 0;
>  	asm volatile (
> -			"xchg %[locked], %[ulv]\n"
> +			MPLOCKED "xchg %[locked], %[ulv]\n"
>  			: [locked] "=m" (sl->locked), [ulv] "=q"
(unlock_val)
>  			: "[ulv]" (unlock_val)
>  			: "memory");
> @@ -148,7 +149,7 @@ rte_spinlock_trylock (rte_spinlock_t *sl)
>  	int lockval = 1;
> 
>  	asm volatile (
> -			"xchg %[locked], %[lockval]"
> +			MPLOCKED "xchg %[locked], %[lockval]"
>  			: [locked] "=m" (sl->locked), [lockval] "=q"
(lockval)
>  			: "[lockval]" (lockval)
>  			: "memory");
> --
> 1.7.10.4

      parent reply	other threads:[~2014-01-10 18:03 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-20 23:37 Thomas Monjalon
2014-01-02 16:32 ` Stephen Hemminger
2014-01-10 18:02 ` François-Frédéric Ozog [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='019301cf0e2e$1b0af0a0$5120d1e0$@com' \
    --to=ff@ozog.com \
    --cc=dev@dpdk.org \
    --cc=thomas.monjalon@6wind.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).