From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Cc: dev@dpdk.org, stable@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
Date: Tue, 17 Jul 2018 11:54:18 +0900 [thread overview]
Message-ID: <CALBPOTWYxCN3+DL_5Ozx3parn+hyo582BJgrjRzicPOEwR1CcA@mail.gmail.com> (raw)
In-Reply-To: <20180712170839.GA11626@jerin>
> Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> Having said that, load-load barrier can be expressed very well with C11 memory
> model. I guess ppc64 supports C11 memory model. If so,
> Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> original issue?
Yes, the performance regression happens on non-x86 with single
producer/consumer.
The average latency of an enqueue was increased from 21 nsec to 24 nsec in my
simple experiment. But, I think it is worth it.
I also tested C11 rte_ring, however, it caused the same race condition in ppc64.
I tried to fix the C11 problem as well, but I also found the C11
rte_ring had other potential
incorrect choices of memory orders, which caused another race
condition in ppc64.
For example,
__ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(), but
I am not sure why the load-acquire is used for the compare exchange.
Also in update_tail, the pause can be called before the data copy because
of ht->tail load without atomic_load_n.
The memory order is simply difficult, so it might take a bit longer
time to check
if the code is correct. I think I can fix the C11 rte_ring as another patch.
2018-07-13 2:08 GMT+09:00 Jerin Jacob <jerin.jacob@caviumnetworks.com>:
> -----Original Message-----
>> Date: Thu, 12 Jul 2018 11:44:14 +0900
>> From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
>> To: dev@dpdk.org
>> Cc: Takeshi Yoshimura <t.yoshimura8869@gmail.com>, stable@dpdk.org, Takeshi
>> Yoshimura <tyos@jp.ibm.com>
>> Subject: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
>> X-Mailer: git-send-email 2.15.1
>>
>> External Email
>>
>> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
>> It uses a single consumer and multiple producers for a rte_ring.
>> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().
>
> Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> Having said that, load-load barrier can be expressed very well with C11 memory
> model. I guess ppc64 supports C11 memory model. If so,
> Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> original issue?
>
>>
>> The reordered loads happened on r->prod.tail in
>> __rte_ring_move_cons_head() (rte_ring_generic.h) and ring[idx] in
>> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
>> dependency, but the code does not satisfy it. Note that they are
>> not reordered if __rte_ring_move_cons_head() with is_sc != 1 because
>> cmpset invokes a read barrier.
>>
>> The paired stores on these loads are in ENQUEUE_PTRS() and
>> update_tail(). Simplified code around the reorder is the following.
>>
>> Consumer Producer
>> load idx[ring]
>> store idx[ring]
>> store r->prod.tail
>> load r->prod.tail
>>
>> In this case, the consumer loads old idx[ring] and confirms the load
>> is valid with the new r->prod.tail.
>>
>> I added a read barrier in the case where __IS_SC is passed to
>> __rte_ring_move_cons_head(). I also fixed __rte_ring_move_prod_head()
>> to avoid similar problems with a single producer.
>>
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
>> ---
>> lib/librte_ring/rte_ring_generic.h | 10 ++++++----
>> 1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
>> index ea7dbe5b9..477326180 100644
>> --- a/lib/librte_ring/rte_ring_generic.h
>> +++ b/lib/librte_ring/rte_ring_generic.h
>> @@ -90,9 +90,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
>> return 0;
>>
>> *new_head = *old_head + n;
>> - if (is_sp)
>> + if (is_sp) {
>> + rte_smp_rmb();
>> r->prod.head = *new_head, success = 1;
>> - else
>> + } else
>> success = rte_atomic32_cmpset(&r->prod.head,
>> *old_head, *new_head);
>> } while (unlikely(success == 0));
>> @@ -158,9 +159,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
>> return 0;
>>
>> *new_head = *old_head + n;
>> - if (is_sc)
>> + if (is_sc) {
>> + rte_smp_rmb();
>> r->cons.head = *new_head, success = 1;
>> - else
>> + } else
>> success = rte_atomic32_cmpset(&r->cons.head, *old_head,
>> *new_head);
>> } while (unlikely(success == 0));
>> --
>> 2.17.1
>>
next prev parent reply other threads:[~2018-07-17 2:54 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-12 2:44 Takeshi Yoshimura
2018-07-12 17:08 ` Jerin Jacob
2018-07-17 2:54 ` Takeshi Yoshimura [this message]
2018-07-17 3:34 ` Jerin Jacob
2021-03-24 21:45 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2021-03-28 1:00 ` Honnappa Nagarahalli
2021-06-16 7:14 ` [dpdk-dev] 回复: " Feifei Wang
2021-06-16 16:37 ` [dpdk-dev] " Honnappa Nagarahalli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALBPOTWYxCN3+DL_5Ozx3parn+hyo582BJgrjRzicPOEwR1CcA@mail.gmail.com \
--to=t.yoshimura8869@gmail.com \
--cc=dev@dpdk.org \
--cc=jerin.jacob@caviumnetworks.com \
--cc=stable@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).