patches for DPDK stable branches
From: Thomas Monjalon <thomas@monjalon.net>
To: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
Cc: stable@dpdk.org, dev@dpdk.org, olivier.matz@6wind.com,
	chaozhu@linux.vnet.ibm.com, konstantin.ananyev@intel.com,
	Jerin Jacob <jerin.jacob@caviumnetworks.com>,
	honnappa.nagarahalli@arm.com
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
Date: Wed, 24 Mar 2021 22:45:16 +0100	[thread overview]
Message-ID: <7532216.RsKJNViV3k@thomas>
In-Reply-To: <20180717033410.GA3344@jerin>

No reply after more than 2 years.
Unfortunately it is probably outdated now.
Classified as "Changes Requested".


17/07/2018 05:34, Jerin Jacob:
> From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> 
> Cc: olivier.matz@6wind.com
> Cc: chaozhu@linux.vnet.ibm.com
> Cc: konstantin.ananyev@intel.com
> 
> > 
> > > Adding rte_smp_rmb() causes a performance regression on non-x86
> > > platforms. Having said that, a load-load barrier can be expressed very
> > > well with the C11 memory model. I guess ppc64 supports the C11 memory
> > > model. If so, could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for
> > > ppc64 and check whether the original issue persists?
> > 
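For reference, that option lived in the static build config of the legacy
make system; a minimal sketch of enabling it for ppc64 (file names are from
DPDK 18.x and worth double-checking against the tree):

    # config/defconfig_ppc_64-power8-linuxapp-gcc (or config/common_base)
    # Switch rte_ring from the barrier-based implementation
    # (rte_ring_generic.h) to the C11 one (rte_ring_c11_mem.h).
    CONFIG_RTE_RING_USE_C11_MEM_MODEL=y
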
> > Yes, the performance regression happens on non-x86 with a single
> > producer/consumer.
> > The average latency of an enqueue increased from 21 ns to 24 ns in my
> > simple experiment, but I think it is worth it.
> 
> That varies from machine to machine. What is the burst size, etc.?
> 
> > 
> > 
> > I also tested the C11 rte_ring; however, it caused the same race
> > condition on ppc64. I tried to fix the C11 problem as well, but I also
> > found that the C11 rte_ring made other potentially incorrect choices of
> > memory orders, which caused another race condition on ppc64.
> 
> Does it happen on all ppc64 machines, or on a specific machine?
> Do the following tests pass on your system without the patch?
> test/test/test_ring_perf.c
> test/test/test_ring.c
> 
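For anyone reproducing this: those files build into the DPDK test binary,
and the ring tests can be run from its interactive prompt, roughly as
follows (the binary path varies by build setup):

    $ ./build/app/test
    RTE>> ring_autotest
    RTE>> ring_perf_autotest
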
> > 
> > For example,
> > __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(), but
> > I am not sure why load-acquire is used for the compare-exchange.
> 
> It is correct as per C11 acquire and release semantics.
> 
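For readers following along, the exchange in question looks roughly like
the sketch below (condensed from the rte_ring_c11_mem.h of that era; exact
arguments may differ by version):

    /* claim [old_head, new_head) for this consumer; the acquire on
     * success keeps the later ring[] loads from moving before the CAS */
    success = __atomic_compare_exchange_n(&r->cons.head,
                    old_head, new_head,
                    0,                   /* strong CAS */
                    __ATOMIC_ACQUIRE,    /* memory order on success */
                    __ATOMIC_RELAXED);   /* memory order on failure */
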
> > Also, in update_tail(), the pause can be called before the data copy
> > because ht->tail is loaded without __atomic_load_n().
> > 
> > Memory orders are simply difficult, so it might take a bit longer to
> > check whether the code is correct. I think I can fix the C11 rte_ring
> > in a separate patch.
> > 
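In code, the concern is the plain read in the C11 update_tail() wait loop,
roughly (simplified sketch, not the verbatim source):

    /* wait for earlier enqueues/dequeues to publish their tail update */
    while (unlikely(ht->tail != old_val))  /* plain load, no ordering */
            rte_pause();
    __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);

Reading ht->tail through __atomic_load_n() with an explicit memory order
would make the intended ordering visible to both the compiler and the CPU.
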
> > >>
> > >> SPDK blobfs encountered a crash around rte_ring dequeues on ppc64.
> > >> It uses a single consumer and multiple producers for an rte_ring.
> > >> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().
> > >
> > > Adding rte_smp_rmb() causes a performance regression on non-x86
> > > platforms. Having said that, a load-load barrier can be expressed very
> > > well with the C11 memory model. I guess ppc64 supports the C11 memory
> > > model. If so, could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for
> > > ppc64 and check whether the original issue persists?
> > >
> > >>
> > >> The reordered loads happened on r->prod.tail in
> 
> There is rte_smp_rmb() just before reading r->prod.tail in 
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> _rte_ring_move_cons_head(). Would that not satisfy the requirement?
> 
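The code being referred to looks roughly like this (lightly condensed from
the rte_ring_generic.h of that era):

    *old_head = r->cons.head;
    /* barrier between the cons.head load and the prod.tail load;
     * it is a no-op on x86 */
    rte_smp_rmb();
    const uint32_t prod_tail = r->prod.tail;
    *entries = prod_tail - *old_head;

Note this barrier sits before the r->prod.tail load, whereas the reorder
reported above is between r->prod.tail and the later ring[] loads in
DEQUEUE_PTRS().
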
> Can you check by adding a compiler barrier and see whether the compiler
> is reordering things?
> 
> DPDK's ring implementation is based on FreeBSD's ring implementation; I
> don't see the need for such a barrier:
> 
> https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h
> 
> If it is something specific to ppc64, or to a specific ppc64 machine, we
> could add a compile option, since it is arch-specific, to avoid a
> performance impact on other architectures.
> 
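Such a guard could look like the following sketch (illustrative only;
RTE_ARCH_PPC_64 is the existing arch macro, and whether to key this off a
dedicated config knob instead is exactly the open question):

    if (is_sc) {
    #ifdef RTE_ARCH_PPC_64
            rte_smp_rmb();  /* pay for the barrier only where needed */
    #endif
            r->cons.head = *new_head, success = 1;
    } else
            success = rte_atomic32_cmpset(&r->cons.head,
                            *old_head, *new_head);
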
> > >> __rte_ring_move_cons_head() (rte_ring_generic.h) and ring[idx] in
> > >> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
> > >> dependency, but the code does not enforce it. Note that they are
> > >> not reordered when __rte_ring_move_cons_head() is called with
> > >> is_sc != 1, because cmpset invokes a read barrier.
> > >>
> > >> The paired stores on these loads are in ENQUEUE_PTRS() and
> > >> update_tail(). Simplified code around the reorder is the following.
> > >>
> > >> Consumer             Producer
> > >> load idx[ring]
> > >>                      store idx[ring]
> > >>                      store r->prod.tail
> > >> load r->prod.tail
> > >>
> > >> In this case, the consumer loads the old idx[ring] but sees the new
> > >> r->prod.tail, which wrongly confirms the stale load as valid.
> > >>
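Spelled out, the single-consumer dequeue needs the following load order
(simplified sketch, not the actual DPDK code; names are illustrative):

    uint32_t prod_tail = r->prod.tail;   /* load 1: the published index */
    if (prod_tail == cons_head)
            return 0;                    /* ring empty, nothing to read */
    rte_smp_rmb();                       /* keep load 1 before load 2 */
    void *obj = ring[cons_head & mask];  /* load 2: the object itself */

Without the barrier, ppc64 may satisfy load 2 with a stale value even
though load 1 already observed the new tail.
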
> > >> I added a read barrier in the case where __IS_SC is passed to
> > >> __rte_ring_move_cons_head(). I also fixed __rte_ring_move_prod_head()
> > >> to avoid similar problems with a single producer.
> > >>
> > >> Cc: stable@dpdk.org
> > >>
> > >> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
> > >> ---
> > >>  lib/librte_ring/rte_ring_generic.h | 10 ++++++----
> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> > >> index ea7dbe5b9..477326180 100644
> > >> --- a/lib/librte_ring/rte_ring_generic.h
> > >> +++ b/lib/librte_ring/rte_ring_generic.h
> > >> @@ -90,9 +90,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> > >>                         return 0;
> > >>
> > >>                 *new_head = *old_head + n;
> > >> -               if (is_sp)
> > >> +               if (is_sp) {
> > >> +                       rte_smp_rmb();
> > >>                         r->prod.head = *new_head, success = 1;
> > >> -               else
> > >> +               } else
> > >>                         success = rte_atomic32_cmpset(&r->prod.head,
> > >>                                         *old_head, *new_head);
> > >>         } while (unlikely(success == 0));
> > >> @@ -158,9 +159,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
> > >>                         return 0;
> > >>
> > >>                 *new_head = *old_head + n;
> > >> -               if (is_sc)
> > >> +               if (is_sc) {
> > >> +                       rte_smp_rmb();
> > >>                         r->cons.head = *new_head, success = 1;
> > >> -               else
> > >> +               } else
> > >>                         success = rte_atomic32_cmpset(&r->cons.head, *old_head,
> > >>                                         *new_head);
> > >>         } while (unlikely(success == 0));
> > >> --
> > >> 2.17.1




Thread overview: 8+ messages
2018-07-12  2:44 [dpdk-stable] " Takeshi Yoshimura
2018-07-12 17:08 ` [dpdk-stable] [dpdk-dev] " Jerin Jacob
2018-07-17  2:54   ` Takeshi Yoshimura
2018-07-17  3:34     ` Jerin Jacob
2021-03-24 21:45       ` Thomas Monjalon [this message]
2021-03-28  1:00         ` Honnappa Nagarahalli
2021-06-16  7:14           ` [dpdk-stable] Re: " Feifei Wang
2021-06-16 16:37             ` [dpdk-stable] " Honnappa Nagarahalli
