DPDK patches and discussions
* [dpdk-dev] memory barriers in rte_ring
@ 2014-03-27 16:48 Olivier MATZ
  2014-03-27 19:06 ` Stephen Hemminger
From: Olivier MATZ @ 2014-03-27 16:48 UTC
  To: dev

Hi,

The commit 286bd05bf7 [1] removed the memory barriers in the ring
functions. This patch has been in DPDK since version 1.4.0r0, so I
guess it does not cause any issues.

But after checking the excellent Linux kernel documentation about memory
barriers [2], I'm wondering why memory barriers would not be required in
that case.

To illustrate the previous behavior (before DPDK 1.4):

   ring_enqueue()
     - move producer_head to reserve space in ring (atomically if
       multi producers)
     - write objects between producer_head and producer_tail
     - wmb() to ensure that STORE operations are issued
     - write producer_tail

   ring_dequeue()
     - move consumer_head (atomically if multi consumers)
     - rmb() to ensure that LOAD operations are issued: the read of
       consumer_head must occur before the reading of the object pointers.
       In fact, rmb() is probably not needed here because the value of
       consumer_head must be known before the objects table can be read.
     - read objects between consumer_head and consumer_tail
     - write consumer_tail
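The pre-1.4 sequence above can be sketched in portable C11 as follows. This is a minimal single-producer/single-consumer sketch under stated assumptions, not the rte_ring code: struct sp_ring, ring_enqueue() and ring_dequeue() are hypothetical names, each side keeps a single index because with one producer and one consumer the reserved head and the published tail are equal between calls, and C11 release/acquire fences stand in for wmb() and rmb() (a release fence also orders earlier loads, so it is slightly stronger than a store-only wmb()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SP/SC ring, loosely modeled on the pre-1.4 description. */
#define RING_SIZE 8u                  /* capacity, must be a power of two */
#define RING_MASK (RING_SIZE - 1u)
static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct sp_ring {
    _Atomic uint32_t prod_tail;       /* objects published by the producer */
    _Atomic uint32_t cons_tail;       /* objects released by the consumer */
    void *objs[RING_SIZE];
};

/* Producer: write the object slot, then publish it by moving prod_tail. */
static bool ring_enqueue(struct sp_ring *r, void *obj)
{
    uint32_t head = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);

    if (head - cons == RING_SIZE)     /* unsigned wrap-around is intended */
        return false;                 /* ring is full */
    r->objs[head & RING_MASK] = obj;  /* write the object */
    atomic_thread_fence(memory_order_release); /* role of wmb() */
    atomic_store_explicit(&r->prod_tail, head + 1, memory_order_relaxed);
    return true;
}

/* Consumer: observe prod_tail, then read the object it covers. */
static bool ring_dequeue(struct sp_ring *r, void **obj)
{
    uint32_t head = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);

    if (prod == head)
        return false;                 /* ring is empty */
    /* Role of rmb(): the prod_tail load is ordered before the slot load. */
    atomic_thread_fence(memory_order_acquire);
    *obj = r->objs[head & RING_MASK];
    /* Release: the slot read completes before the producer may reuse it. */
    atomic_store_explicit(&r->cons_tail, head + 1, memory_order_release);
    return true;
}
```

In a real ring the two functions run on different lcores. Under C11, dropping the fences would make the slot accesses a data race; on x86, GCC and Clang emit no instruction for acquire or release fences, so they only restrain the compiler.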

The memory barriers have been removed, but in my understanding at least
the wmb() would be needed according to the generic memory barrier
documentation. Maybe this is not needed on the newest Intel processors?
Could anyone from Intel enlighten me on this?

Thanks & regards,
Olivier


[1] http://dpdk.org/browse/dpdk/commit/lib/librte_ring/rte_ring.h?id=286bd05bf70d1da1b6017007276c267a1e012c1d

[2] http://lxr.free-electrons.com/source/Documentation/memory-barriers.txt


* Re: [dpdk-dev] memory barriers in rte_ring
  2014-03-27 16:48 [dpdk-dev] memory barriers in rte_ring Olivier MATZ
@ 2014-03-27 19:06 ` Stephen Hemminger
  2014-03-27 19:47   ` Olivier MATZ
From: Stephen Hemminger @ 2014-03-27 19:06 UTC
  To: Olivier MATZ; +Cc: dev

Short answer: only a compiler barrier is necessary.

Long answer: for the multiple-CPU access ring, it is equivalent to smp_wmb
and smp_rmb in the Linux kernel. For x86, where DPDK is used, this can
normally be replaced by a simpler compiler barrier. In the kernel there is
a special flag, X86_OOSTORE, which is only enabled for a few special cases;
in most cases it is not. When the CPU doesn't do out-of-order stores, there
are no cases where another CPU will see the wrong state.
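The point can be illustrated with the classic message-passing pattern. This is a sketch under stated assumptions: SMP_WMB(), SMP_RMB(), mp_data, mp_ready and the mp_* helpers are hypothetical names, not DPDK or kernel APIs. On x86 the macros are compiler-only barriers (atomic_signal_fence() emits no instruction), which relies on x86 not reordering stores with other stores or loads with other loads; that guarantee comes from the hardware, not from the C11 memory model. Other targets fall back to C11 release/acquire fences. Relaxed atomics are used for the shared variables so the example has no data race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical smp_wmb()/smp_rmb() equivalents. */
#if defined(__x86_64__) || defined(__i386__)
/* x86: only stop the compiler; the CPU keeps store/store and load/load order. */
#define SMP_WMB() atomic_signal_fence(memory_order_seq_cst)
#define SMP_RMB() atomic_signal_fence(memory_order_seq_cst)
#else
/* Weaker memory models need real fences. */
#define SMP_WMB() atomic_thread_fence(memory_order_release)
#define SMP_RMB() atomic_thread_fence(memory_order_acquire)
#endif

static _Atomic int mp_data;   /* plays the role of the ring objects */
static _Atomic int mp_ready;  /* plays the role of producer_tail */

static void *mp_writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&mp_data, 42, memory_order_relaxed);
    SMP_WMB();                /* data store before flag store */
    atomic_store_explicit(&mp_ready, 1, memory_order_relaxed);
    return NULL;
}

/* Spins until the flag is set, then returns the data it observes. */
static int mp_reader(void)
{
    while (atomic_load_explicit(&mp_ready, memory_order_relaxed) == 0)
        ;                     /* busy-wait */
    SMP_RMB();                /* flag load before data load */
    return atomic_load_explicit(&mp_data, memory_order_relaxed);
}

/* One round with a fresh writer thread; returns the value seen, or -1. */
static int mp_round(void)
{
    pthread_t t;

    atomic_store_explicit(&mp_data, 0, memory_order_relaxed);
    atomic_store_explicit(&mp_ready, 0, memory_order_relaxed);
    if (pthread_create(&t, NULL, mp_writer, NULL) != 0)
        return -1;
    int seen = mp_reader();
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

A passing run only shows the pattern in use; it does not prove the ordering, since a single writer rarely races tightly with the reader.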


* Re: [dpdk-dev] memory barriers in rte_ring
  2014-03-27 19:06 ` Stephen Hemminger
@ 2014-03-27 19:47   ` Olivier MATZ
  2014-03-27 20:20     ` Stephen Hemminger
From: Olivier MATZ @ 2014-03-27 19:47 UTC
  To: Stephen Hemminger; +Cc: dev

Hi Stephen,

On 03/27/2014 08:06 PM, Stephen Hemminger wrote:
> Long answer: for the multiple-CPU access ring, it is equivalent to smp_wmb
> and smp_rmb in the Linux kernel. For x86, where DPDK is used, this can
> normally be replaced by a simpler compiler barrier. In the kernel there is
> a special flag, X86_OOSTORE, which is only enabled for a few special cases;
> in most cases it is not. When the CPU doesn't do out-of-order stores, there
> are no cases where another CPU will see the wrong state.

Thank you for this clarification.

So, if I understand properly, all usages of rte_*mb() sequencing memory
operations between CPUs could be replaced by a compiler barrier. On the
other hand, if the memory is also accessed by a device, a memory
barrier has to be used.

Olivier


* Re: [dpdk-dev] memory barriers in rte_ring
  2014-03-27 19:47   ` Olivier MATZ
@ 2014-03-27 20:20     ` Stephen Hemminger
  2014-03-27 23:53       ` Venkatesan, Venky
From: Stephen Hemminger @ 2014-03-27 20:20 UTC
  To: Olivier MATZ; +Cc: dev

On Thu, 27 Mar 2014 20:47:37 +0100
Olivier MATZ <olivier.matz@6wind.com> wrote:

> Hi Stephen,
> 
> So, if I understand properly, all usages of rte_*mb() sequencing memory
> operations between CPUs could be replaced by a compiler barrier. On the
> other hand, if the memory is also accessed by a device, a memory
> barrier has to be used.
> 
> Olivier
> 

I think so for the current architecture that DPDK runs on. It might be good
to abstract this in some way for eventual users in other environments.


* Re: [dpdk-dev] memory barriers in rte_ring
  2014-03-27 20:20     ` Stephen Hemminger
@ 2014-03-27 23:53       ` Venkatesan, Venky
From: Venkatesan, Venky @ 2014-03-27 23:53 UTC
  To: Stephen Hemminger, Olivier MATZ; +Cc: dev

One caveat: a compiler barrier should be enough when both sides are using strongly-ordered memory operations (as in the case of the rings). Weakly ordered operations (for example, non-temporal stores) will still need fencing.

-Venky
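The caveat can be made concrete with non-temporal stores, which x86 treats as weakly ordered. In this hypothetical sketch (struct nt_msg, NT_WORDS, nt_publish() and nt_try_consume() are illustrative names), the payload is written with the SSE2 intrinsic _mm_stream_si32(). A release store compiles to a plain store on x86 and would not keep those streaming stores ahead of the flag, so _mm_sfence() is issued first. On targets without SSE2 the sketch falls back to ordinary stores, where the release store alone suffices.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>            /* _mm_stream_si32(), _mm_sfence() */
#endif

#define NT_WORDS 4

struct nt_msg {
    int payload[NT_WORDS];
    _Atomic int ready;
};

/* Producer: fill the payload, then publish it through the ready flag. */
static void nt_publish(struct nt_msg *m, const int src[NT_WORDS])
{
    assert(m != NULL && src != NULL);
    for (int i = 0; i < NT_WORDS; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&m->payload[i], src[i]);  /* weakly ordered store */
#else
        m->payload[i] = src[i];
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();   /* a compiler barrier alone would not order these stores */
#endif
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload once it has been published. */
static int nt_try_consume(struct nt_msg *m, int dst[NT_WORDS])
{
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        return 0;
    for (int i = 0; i < NT_WORDS; i++)
        dst[i] = m->payload[i];
    return 1;
}
```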

