* [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
From: Takeshi Yoshimura @ 2018-07-12 2:44 UTC
To: dev; +Cc: Takeshi Yoshimura, stable, Takeshi Yoshimura
0 siblings, 1 reply; 8+ messages in thread

SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
It uses a single consumer and multiple producers for a rte_ring.
The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().

The reordered loads are the load of r->prod.tail in
__rte_ring_move_cons_head() (rte_ring_generic.h) and the load of
ring[idx] in DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
dependency, but the code does not enforce it. Note that they are
not reordered if __rte_ring_move_cons_head() is called with is_sc != 1,
because cmpset invokes a read barrier.

The paired stores for these loads are in ENQUEUE_PTRS() and
update_tail(). Simplified code around the reorder is the following.

    Consumer                Producer
    load idx[ring]
                            store idx[ring]
                            store r->prod.tail
    load r->prod.tail

In this case, the consumer loads the old idx[ring] and confirms that the
load is valid against the new r->prod.tail.

I added a read barrier in the case where __IS_SC is passed to
__rte_ring_move_cons_head(). I also fixed __rte_ring_move_prod_head()
to avoid similar problems with a single producer.

Cc: stable@dpdk.org

Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
---
 lib/librte_ring/rte_ring_generic.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
index ea7dbe5b9..477326180 100644
--- a/lib/librte_ring/rte_ring_generic.h
+++ b/lib/librte_ring/rte_ring_generic.h
@@ -90,9 +90,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sp)
+		if (is_sp) {
+			rte_smp_rmb();
 			r->prod.head = *new_head, success = 1;
-		else
+		} else
 			success = rte_atomic32_cmpset(&r->prod.head,
 					*old_head, *new_head);
 	} while (unlikely(success == 0));
@@ -158,9 +159,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sc)
+		if (is_sc) {
+			rte_smp_rmb();
 			r->cons.head = *new_head, success = 1;
-		else
+		} else
 			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
 					*new_head);
 	} while (unlikely(success == 0));
-- 
2.17.1
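The consumer/producer ordering described in the patch message can be
sketched outside DPDK with C11 fences. This is only an illustrative
sketch, not the rte_ring code: slot[], prod_tail, producer() and
run_demo() are hypothetical names. The release fence stands in for the
write barrier on the producer side, and the acquire fence stands in for
the rte_smp_rmb() that the patch places between the tail load and the
slot loads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOTS 8u                 /* hypothetical ring size */

static uint64_t slot[SLOTS];     /* stands in for ring[idx] */
static atomic_uint prod_tail;    /* stands in for r->prod.tail */

/* Producer: store the slots first, then publish the new tail. */
static void *producer(void *arg)
{
	(void)arg;
	for (unsigned int i = 0; i < SLOTS; i++)
		slot[i] = 1000u + i;
	/* Write barrier: slot stores become visible before the tail store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&prod_tail, SLOTS, memory_order_relaxed);
	return NULL;
}

/* Consumer: load the tail, then the slots; returns the stale slot count. */
static unsigned int consume_and_count_stale(void)
{
	unsigned int tail, stale = 0;

	/* Spin until the producer has published a non-zero tail. */
	while ((tail = atomic_load_explicit(&prod_tail,
					    memory_order_relaxed)) == 0)
		;
	assert(tail <= SLOTS);
	/* Read barrier: slot loads may not be hoisted above the tail load. */
	atomic_thread_fence(memory_order_acquire);
	for (unsigned int i = 0; i < tail; i++)
		if (slot[i] != 1000u + i)
			stale++;
	return stale;
}

/* Runs one producer thread against the calling thread as consumer. */
int run_demo(void)
{
	pthread_t t;
	unsigned int stale;

	if (pthread_create(&t, NULL, producer, NULL) != 0)
		return -1;
	stale = consume_and_count_stale();
	pthread_join(t, NULL);
	return (int)stale;
}
```

Built with a C11 compiler and -pthread, run_demo() returns 0, because the
fence pair orders the slot loads after the tail load. Without the acquire
fence the slot loads would be free to move ahead of the tail load, which
is the window the patch message describes for the single-consumer path.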
* Re: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
From: Jerin Jacob @ 2018-07-12 17:08 UTC
To: Takeshi Yoshimura; +Cc: dev, stable, Takeshi Yoshimura

-----Original Message-----
> Date: Thu, 12 Jul 2018 11:44:14 +0900
> From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> To: dev@dpdk.org
> Cc: Takeshi Yoshimura <t.yoshimura8869@gmail.com>, stable@dpdk.org,
>  Takeshi Yoshimura <tyos@jp.ibm.com>
> Subject: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
>
> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
> It uses a single consumer and multiple producers for a rte_ring.
> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().

Adding rte_smp_rmb() causes a performance regression on non-x86
platforms. Having said that, a load-load barrier can be expressed very
well with the C11 memory model. I guess ppc64 supports the C11 memory
model. If so, could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y on ppc64
and check whether the original issue still occurs?

> The reordered loads happened on r->prod.tail in
> __rte_ring_move_cons_head() (rte_ring_generic.h) and ring[idx] in
> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
> dependency, but the code does not satisfy it. Note that they are
> not reordered if __rte_ring_move_cons_head() with is_sc != 1 because
> cmpset invokes a read barrier.
>
> The paired stores on these loads are in ENQUEUE_PTRS() and
> update_tail(). Simplified code around the reorder is the following.
>
>     Consumer                Producer
>     load idx[ring]
>                             store idx[ring]
>                             store r->prod.tail
>     load r->prod.tail
>
> In this case, the consumer loads old idx[ring] and confirms the load
> is valid with the new r->prod.tail.
>
> I added a read barrier in the case where __IS_SC is passed to
> __rte_ring_move_cons_head(). I also fixed __rte_ring_move_prod_head()
> to avoid similar problems with a single producer.
* Re: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
From: Takeshi Yoshimura @ 2018-07-17 2:54 UTC
To: Jerin Jacob; +Cc: dev, stable

> Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> Having said that, load-load barrier can be expressed very well with C11 memory
> model. I guess ppc64 supports C11 memory model. If so,
> Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> original issue?

Yes, the performance regression happens on non-x86 with a single
producer/consumer. The average latency of an enqueue increased from
21 nsec to 24 nsec in my simple experiment. But I think it is worth it.

I also tested the C11 rte_ring; however, it hit the same race condition
on ppc64. I tried to fix the C11 problem as well, but I also found that
the C11 rte_ring had other potentially incorrect choices of memory
orders, which caused another race condition on ppc64.

For example, __ATOMIC_ACQUIRE is passed to
__atomic_compare_exchange_n(), but I am not sure why load-acquire is
used for the compare exchange. Also, in update_tail, the pause can be
called before the data copy because ht->tail is loaded without
atomic_load_n.

Memory ordering is simply difficult, so it might take a bit longer to
check whether the code is correct. I think I can fix the C11 rte_ring in
a separate patch.
* Re: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
From: Jerin Jacob @ 2018-07-17 3:34 UTC
To: Takeshi Yoshimura; +Cc: dev, stable, olivier.matz, chaozhu, konstantin.ananyev

-----Original Message-----
> Date: Tue, 17 Jul 2018 11:54:18 +0900
> From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Cc: dev@dpdk.org, stable@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64

Cc: olivier.matz@6wind.com
Cc: chaozhu@linux.vnet.ibm.com
Cc: konstantin.ananyev@intel.com

> > Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> > Having said that, load-load barrier can be expressed very well with C11 memory
> > model. I guess ppc64 supports C11 memory model. If so,
> > Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> > original issue?
>
> Yes, the performance regression happens on non-x86 with single
> producer/consumer.
> The average latency of an enqueue was increased from 21 nsec to 24 nsec in my
> simple experiment. But, I think it is worth it.

That varies from machine to machine. What is the burst size, etc.?

> I also tested C11 rte_ring, however, it caused the same race condition in ppc64.
> I tried to fix the C11 problem as well, but I also found the C11
> rte_ring had other potential
> incorrect choices of memory orders, which caused another race
> condition in ppc64.

Does it happen on all ppc64 machines, or on a specific machine?
Do the following tests pass on your system without the patch?
test/test/test_ring_perf.c
test/test/test_ring.c

> For example,
> __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(), but
> I am not sure why the load-acquire is used for the compare exchange.

It is correct as per C11 acquire and release semantics.

> Also in update_tail, the pause can be called before the data copy because
> of ht->tail load without atomic_load_n.
>
> The memory order is simply difficult, so it might take a bit longer
> time to check
> if the code is correct. I think I can fix the C11 rte_ring as another patch.
>
> >> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
> >> It uses a single consumer and multiple producers for a rte_ring.
> >> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().
> >
> > Adding rte_smp_rmb() cause performance regression on non x86 platforms.
> > Having said that, load-load barrier can be expressed very well with C11 memory
> > model. I guess ppc64 supports C11 memory model. If so,
> > Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for ppc64 and check
> > original issue?
> >
> >> The reordered loads happened on r->prod.tail in

There is rte_smp_rmb() just before reading r->prod.tail in
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
__rte_ring_move_cons_head(). Would that not satisfy the requirement?

Can you try adding a compiler barrier and see whether the compiler is
reordering the accesses?

DPDK's ring implementation is based on FreeBSD's ring implementation,
and I don't see the need for such a barrier:

https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h

If it is something specific to ppc64 or to a specific ppc64 machine, we
could add a compile-time option, since it is arch-specific, to avoid a
performance impact on other architectures.

> >> __rte_ring_move_cons_head() (rte_ring_generic.h) and ring[idx] in
> >> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
> >> dependency, but the code does not satisfy it. Note that they are
> >> not reordered if __rte_ring_move_cons_head() with is_sc != 1 because
> >> cmpset invokes a read barrier.
* Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
From: Thomas Monjalon @ 2021-03-24 21:45 UTC
To: Takeshi Yoshimura
Cc: stable, dev, olivier.matz, chaozhu, konstantin.ananyev, Jerin Jacob, honnappa.nagarahalli

No reply after more than 2 years.
Unfortunately it is probably outdated now.
Classified as "Changes Requested".
* Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
From: Honnappa Nagarahalli @ 2021-03-28 1:00 UTC
To: thomas, Takeshi Yoshimura
Cc: stable, dev, olivier.matz, chaozhu, konstantin.ananyev, Jerin Jacob, nd, nd

<snip>

> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] rte_ring: fix racy
> dequeue/enqueue in ppc64
>
> No reply after more than 2 years.
> Unfortunately it is probably outdated now.
> Classified as "Changes Requested".

Looking at the code, I think this patch does in fact fix a bug. I would
appreciate it if this patch were rebased.

The problem is already fixed in '__rte_ring_move_cons_head' but still
needs to be fixed in '__rte_ring_move_prod_head'.
The problem is fixed in the C11 version by the acquire loads of
cons.tail and prod.tail.
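The remark about acquire loads of the opposite tail can be illustrated
with a reduced single-producer/single-consumer ring in C11. This is a
sketch under simplifying assumptions, not the DPDK C11 implementation:
it keeps one index per side instead of separate head and tail, it lets
all RING_SIZE slots be used, and sketch_ring, sketch_enqueue() and
sketch_dequeue() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 4u                  /* hypothetical; power of two */
#define RING_MASK (RING_SIZE - 1u)

struct sketch_ring {
	atomic_uint prod_tail;        /* written by the producer only */
	atomic_uint cons_tail;        /* written by the consumer only */
	uintptr_t slots[RING_SIZE];
};

void sketch_ring_init(struct sketch_ring *r)
{
	atomic_init(&r->prod_tail, 0u);
	atomic_init(&r->cons_tail, 0u);
}

/* Returns 1 on success, 0 when the ring is full. */
int sketch_enqueue(struct sketch_ring *r, uintptr_t v)
{
	unsigned int head = atomic_load_explicit(&r->prod_tail,
						 memory_order_relaxed);
	/* Acquire: the slot store below cannot pass this load. */
	unsigned int cons = atomic_load_explicit(&r->cons_tail,
						 memory_order_acquire);

	assert(head - cons <= RING_SIZE);
	if (head - cons == RING_SIZE)
		return 0;
	r->slots[head & RING_MASK] = v;
	/* Release: the slot store is visible before the new tail. */
	atomic_store_explicit(&r->prod_tail, head + 1u, memory_order_release);
	return 1;
}

/* Returns 1 on success, 0 when the ring is empty. */
int sketch_dequeue(struct sketch_ring *r, uintptr_t *v)
{
	unsigned int head = atomic_load_explicit(&r->cons_tail,
						 memory_order_relaxed);
	/* Acquire: the slot load below cannot be hoisted above this load. */
	unsigned int prod = atomic_load_explicit(&r->prod_tail,
						 memory_order_acquire);

	if (prod == head)
		return 0;
	*v = r->slots[head & RING_MASK];
	/* Release: the slot load completes before the slot is handed back. */
	atomic_store_explicit(&r->cons_tail, head + 1u, memory_order_release);
	return 1;
}
```

In this form the ordering requirement is attached to the tail load
itself, so neither side needs a separate read barrier between reading
the other side's tail and touching the slots.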
* [dpdk-dev] 回复: [dpdk-stable] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64 2021-03-28 1:00 ` Honnappa Nagarahalli @ 2021-06-16 7:14 ` Feifei Wang 2021-06-16 16:37 ` [dpdk-dev] " Honnappa Nagarahalli 0 siblings, 1 reply; 8+ messages in thread From: Feifei Wang @ 2021-06-16 7:14 UTC (permalink / raw) To: Honnappa Nagarahalli, thomas, Takeshi Yoshimura Cc: stable, dev, olivier.matz, chaozhu, konstantin.ananyev, jerinj, nd, nd Hi, everyone This patch can be closed with the following reasons. > -----邮件原件----- > 发件人: dev <dev-bounces@dpdk.org> 代表 Honnappa Nagarahalli > 发送时间: 2021年3月28日 9:00 > 收件人: thomas@monjalon.net; Takeshi Yoshimura > <t.yoshimura8869@gmail.com> > 抄送: stable@dpdk.org; dev@dpdk.org; olivier.matz@6wind.com; > chaozhu@linux.vnet.ibm.com; konstantin.ananyev@intel.com; Jerin Jacob > <jerin.jacob@caviumnetworks.com>; nd <nd@arm.com>; nd <nd@arm.com> > 主题: Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy > dequeue/enqueue in ppc64 > > <snip> > > > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] rte_ring: fix racy > > dequeue/enqueue in ppc64 > > > > No reply after more than 2 years. > > Unfortunately it is probably outdated now. > > Classified as "Changes Requested". > Looking at the code, I think this patch in fact fixes a bug. Appreciate rebasing > this patch. > > The problem is already fixed in '__rte_ring_move_cons_head' but needs to > be fixed in '__rte_ring_move_prod_head'. > This problem is fixed for C11 version due to acquire load of cons.tail and > prod.tail. First, for consumer in dequeue: the reason for that adding a rmb in move_cons_head of “generic” is based on this patch: http://patches.dpdk.org/project/dpdk/patch/1552409933-45684-2-git-send-email-gavin.hu@arm.com/ Slot Consumer Producer 1 dequeue elements 2 update prod_tail 3 load new prod_tail 4 check room is enough(n < entries) Dequeue elements maybe before load updated prod_tail, so consumer can load incorrect elements value. 
For dequeue multiple consumers case, ‘rte_atomic32_cmpset’ with acquire and release order can prevent dequeue before load prod_tail, no extra rmb is needed. Second, for single producer in enqueue: Slot Producer Consumer 1 enqueue elements(not commited) 2 update consumer_tail 3 load new consumer_tail 4 check room is enough(n < entries) 5 enqueued elements is committed Though enqueue elements maybe reorder before load consumer_tail, these elements will not be committed until ‘check’ has finished. So from load to write control dependency is reliable and rmb is not needed here. [1] https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf (page:15) As a result, it is unnecessary to add a rmb for enqueue single producer due to control dependency. And this patch can be closed. Best Regards Feifei > > > > > > > 17/07/2018 05:34, Jerin Jacob: > > > From: Takeshi Yoshimura <t.yoshimura8869@gmail.com> > > > > > > Cc: olivier.matz@6wind.com > > > Cc: chaozhu@linux.vnet.ibm.com > > > Cc: konstantin.ananyev@intel.com > > > > > > > > > > > > Adding rte_smp_rmb() cause performance regression on non x86 > > platforms. > > > > > Having said that, load-load barrier can be expressed very well > > > > > with C11 memory model. I guess ppc64 supports C11 memory model. > > > > > If so, Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y > for > > > > > ppc64 and check original issue? > > > > > > > > Yes, the performance regression happens on non-x86 with single > > > > producer/consumer. > > > > The average latency of an enqueue was increased from 21 nsec to 24 > > > > nsec in my simple experiment. But, I think it is worth it. > > > > > > That varies to machine to machine. What is the burst size etc. > > > > > > > > > > > > > > > I also tested C11 rte_ring, however, it caused the same race > > > > condition in > > ppc64. 
> > > > I tried to fix the C11 problem as well, but I also found the C11
> > > > rte_ring had other potential incorrect choices of memory orders,
> > > > which caused another race condition in ppc64.
> > >
> > > Does it happens on all ppc64 machines? Or on a specific machine?
> > > Is following tests are passing on your system without the patch?
> > > test/test/test_ring_perf.c
> > > test/test/test_ring.c
> > >
> > > >
> > > > For example,
> > > > __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(), but I
> > > > am not sure why the load-acquire is used for the compare exchange.
> > >
> > > It correct as per C11 acquire and release semantics.
> > >
> > > > Also in update_tail, the pause can be called before the data copy
> > > > because of ht->tail load without atomic_load_n.
> > > >
> > > > The memory order is simply difficult, so it might take a bit
> > > > longer time to check if the code is correct. I think I can fix the
> > > > C11 rte_ring as another patch.
> > > >
> > > > >>
> > > > >> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
> > > > >> It uses a single consumer and multiple producers for a rte_ring.
> > > > >> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().
> > > > >
> > > > > Adding rte_smp_rmb() cause performance regression on non x86
> > > > > platforms.
> > > > > Having said that, load-load barrier can be expressed very well
> > > > > with C11 memory model. I guess ppc64 supports C11 memory model.
> > > > > If so, Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for
> > > > > ppc64 and check original issue?
> > > > >
> > > > >>
> > > > >> The reordered loads happened on r->prod.tail in
> > >
> > > There is rte_smp_rmb() just before reading r->prod.tail in
> > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > _rte_ring_move_cons_head(). Would that not suffice the requirement?
> > >
> > > Can you check adding compiler barrier and see is compiler is
> > > reordering the stuff?
> > >
> > > DPDK's ring implementation is based freebsd's ring implementation, I
> > > don't see need for such barrier
> > >
> > > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h
> > >
> > > If it is something specific to ppc64 or a specific ppc64 machine, we
> > > could add a compile option as it is arch specific to avoid
> > > performance impact on other architectures.
> > >
> > > > >> __rte_ring_move_cons_head() (rte_ring_generic.h) and ring[idx] in
> > > > >> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
> > > > >> dependency, but the code does not satisfy it. Note that they
> > > > >> are not reordered if __rte_ring_move_cons_head() with is_sc != 1
> > > > >> because cmpset invokes a read barrier.
> > > > >>
> > > > >> The paired stores on these loads are in ENQUEUE_PTRS() and
> > > > >> update_tail(). Simplified code around the reorder is the following.
> > > > >>
> > > > >> Consumer             Producer
> > > > >> load idx[ring]
> > > > >>                      store idx[ring]
> > > > >>                      store r->prod.tail
> > > > >> load r->prod.tail
> > > > >>
> > > > >> In this case, the consumer loads old idx[ring] and confirms the
> > > > >> load is valid with the new r->prod.tail.
> > > > >>
> > > > >> I added a read barrier in the case where __IS_SC is passed to
> > > > >> __rte_ring_move_cons_head(). I also fixed
> > > > >> __rte_ring_move_prod_head() to avoid similar problems with a
> > > > >> single producer.
> > > > >>
> > > > >> Cc: stable@dpdk.org
> > > > >>
> > > > >> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
> > > > >> ---
> > > > >> lib/librte_ring/rte_ring_generic.h | 10 ++++++----
> > > > >> 1 file changed, 6 insertions(+), 4 deletions(-)
> > > > >>
> > > > >> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> > > > >> index ea7dbe5b9..477326180 100644
> > > > >> --- a/lib/librte_ring/rte_ring_generic.h
> > > > >> +++ b/lib/librte_ring/rte_ring_generic.h
> > > > >> @@ -90,9 +90,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> > > > >> return 0;
> > > > >>
> > > > >> *new_head = *old_head + n;
> > > > >> - if (is_sp)
> > > > >> + if (is_sp) {
> > > > >> + rte_smp_rmb();
> > > > >> r->prod.head = *new_head, success = 1;
> > > > >> - else
> > > > >> + } else
> > > > >> success = rte_atomic32_cmpset(&r->prod.head,
> > > > >> *old_head, *new_head);
> > > > >> } while (unlikely(success == 0));
> > > > >> @@ -158,9 +159,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
> > > > >> return 0;
> > > > >>
> > > > >> *new_head = *old_head + n;
> > > > >> - if (is_sc)
> > > > >> + if (is_sc) {
> > > > >> + rte_smp_rmb();
> > > > >> r->cons.head = *new_head, success = 1;
> > > > >> - else
> > > > >> + } else
> > > > >> success = rte_atomic32_cmpset(&r->cons.head, *old_head,
> > > > >> *new_head);
> > > > >> } while (unlikely(success == 0));
> > > > >> --
> > > > >> 2.17.1
> >
> >

^ permalink raw reply [flat|nested] 8+ messages in thread
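The consumer-side ordering argued in the message above can be sketched in portable C11. This is only an illustration under stated assumptions, not the DPDK code: `demo_ring`, `demo_publish`, `demo_consume` and `DEMO_SLOTS` are hypothetical names, and `atomic_thread_fence(memory_order_acquire)` is used here as a stand-in for the read barrier discussed in the thread. The point it shows is that the slot read must not be performed before the producer tail has been observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_SLOTS 4u /* hypothetical ring size, must be a power of two */
static_assert((DEMO_SLOTS & (DEMO_SLOTS - 1u)) == 0, "size must be a power of two");

/* Hypothetical single-producer/single-consumer ring, not the rte_ring layout. */
struct demo_ring {
    _Atomic uint32_t prod_tail; /* published by the producer */
    uint32_t cons_head;         /* private to the single consumer */
    int slots[DEMO_SLOTS];
};

/* Producer: write the slot, then publish it with a release store.
 * No full check here: the caller must not publish more than DEMO_SLOTS
 * values that have not been consumed yet. */
static void demo_publish(struct demo_ring *r, int value)
{
    uint32_t tail = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);

    r->slots[tail & (DEMO_SLOTS - 1u)] = value;
    atomic_store_explicit(&r->prod_tail, tail + 1, memory_order_release);
}

/* Consumer: returns 1 and fills *value if an element was available, else 0. */
static int demo_consume(struct demo_ring *r, int *value)
{
    uint32_t tail = atomic_load_explicit(&r->prod_tail, memory_order_relaxed);

    if (tail == r->cons_head)
        return 0; /* nothing published yet */

    /* Without this fence the slot load below could be satisfied before
     * the prod_tail load on a weakly ordered CPU (load-load reorder). */
    atomic_thread_fence(memory_order_acquire);

    *value = r->slots[r->cons_head & (DEMO_SLOTS - 1u)];
    r->cons_head++;
    return 1;
}
```

Replacing the relaxed load plus fence with a single `memory_order_acquire` load of `prod_tail` gives the same guarantee in the C11 model.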
* Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64
  2021-06-16 7:14 ` [dpdk-dev] Re: " Feifei Wang
@ 2021-06-16 16:37 ` Honnappa Nagarahalli
  0 siblings, 0 replies; 8+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-16 16:37 UTC (permalink / raw)
  To: Feifei Wang, thomas, Takeshi Yoshimura
  Cc: stable, dev, olivier.matz, chaozhu, konstantin.ananyev, jerinj, nd, Honnappa Nagarahalli, nd

<snip>

>
> Hi, everyone
>
> This patch can be closed for the following reasons.
>
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> > Sent: March 28, 2021 9:00
> > To: thomas@monjalon.net; Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> > Cc: stable@dpdk.org; dev@dpdk.org; olivier.matz@6wind.com;
> > chaozhu@linux.vnet.ibm.com; konstantin.ananyev@intel.com; Jerin Jacob
> > <jerin.jacob@caviumnetworks.com>; nd <nd@arm.com>; nd <nd@arm.com>
> > Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] rte_ring: fix racy
> > dequeue/enqueue in ppc64
> >
> > <snip>
> >
> > > Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] rte_ring: fix racy
> > > dequeue/enqueue in ppc64
> > >
> > > No reply after more than 2 years.
> > > Unfortunately it is probably outdated now.
> > > Classified as "Changes Requested".
> > Looking at the code, I think this patch in fact fixes a bug.
> > Appreciate rebasing this patch.
> >
> > The problem is already fixed in '__rte_ring_move_cons_head' but needs
> > to be fixed in '__rte_ring_move_prod_head'.
> > This problem is fixed for C11 version due to acquire load of cons.tail
> > and prod.tail.
>
> First, for the consumer in dequeue: the rmb in move_cons_head of the
> "generic" implementation was added by this patch:
> http://patches.dpdk.org/project/dpdk/patch/1552409933-45684-2-git-send-email-gavin.hu@arm.com/
>
> Slot  Consumer                              Producer
> 1     dequeue elements
> 2                                           update prod_tail
> 3     load new prod_tail
> 4     check room is enough (n < entries)
>
> The element loads of the dequeue may be performed before the load of the
> updated prod_tail, so the consumer can read incorrect element values.
> For the multiple-consumer dequeue case, 'rte_atomic32_cmpset' with acquire
> and release ordering prevents the dequeue from being reordered before the
> load of prod_tail, so no extra rmb is needed.
>
> Second, for the single producer in enqueue:
>
> Slot  Producer                              Consumer
> 1     enqueue elements (not committed)
> 2                                           update consumer_tail
> 3     load new consumer_tail
> 4     check room is enough (n < entries)
> 5     enqueued elements are committed
>
> Though the enqueue of elements may be reordered before the load of
> consumer_tail, these elements will not be committed until the 'check' has
> finished. So the control dependency from load to store is reliable and an
> rmb is not needed here [1].
>
> [1] https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf (page 15)
>
> As a result, it is unnecessary to add an rmb for the single-producer
> enqueue, because of the control dependency, and this patch can be closed.

Thanks Feifei. In my comments below I did not consider the control dependency from load to store, which is reliable. Agreed, we can reject this patch.

>
> Best Regards
> Feifei
>
> >
> > >
> > >
> > > 17/07/2018 05:34, Jerin Jacob:
> > > > From: Takeshi Yoshimura <t.yoshimura8869@gmail.com>
> > > >
> > > > Cc: olivier.matz@6wind.com
> > > > Cc: chaozhu@linux.vnet.ibm.com
> > > > Cc: konstantin.ananyev@intel.com
> > > >
> > > > >
> > > > > > Adding rte_smp_rmb() cause performance regression on non x86
> > > > > > platforms.
> > > > > > Having said that, load-load barrier can be expressed very
> > > > > > well with C11 memory model. I guess ppc64 supports C11 memory model.
> > > > > > If so, Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for
> > > > > > ppc64 and check original issue?
> > > > >
> > > > > Yes, the performance regression happens on non-x86 with single
> > > > > producer/consumer.
> > > > > The average latency of an enqueue was increased from 21 nsec to
> > > > > 24 nsec in my simple experiment. But, I think it is worth it.
> > > >
> > > > That varies to machine to machine. What is the burst size etc.
> > > >
> > > > >
> > > > > I also tested C11 rte_ring, however, it caused the same race
> > > > > condition in ppc64.
> > > > > I tried to fix the C11 problem as well, but I also found the C11
> > > > > rte_ring had other potential incorrect choices of memory orders,
> > > > > which caused another race condition in ppc64.
> > > >
> > > > Does it happens on all ppc64 machines? Or on a specific machine?
> > > > Is following tests are passing on your system without the patch?
> > > > test/test/test_ring_perf.c
> > > > test/test/test_ring.c
> > > >
> > > > >
> > > > > For example,
> > > > > __ATOMIC_ACQUIRE is passed to __atomic_compare_exchange_n(), but I
> > > > > am not sure why the load-acquire is used for the compare exchange.
> > > >
> > > > It correct as per C11 acquire and release semantics.
> > > >
> > > > > Also in update_tail, the pause can be called before the data
> > > > > copy because of ht->tail load without atomic_load_n.
> > > > >
> > > > > The memory order is simply difficult, so it might take a bit
> > > > > longer time to check if the code is correct. I think I can fix
> > > > > the C11 rte_ring as another patch.
> > > > >
> > > > > >>
> > > > > >> SPDK blobfs encountered a crash around rte_ring dequeues in ppc64.
> > > > > >> It uses a single consumer and multiple producers for a rte_ring.
> > > > > >> The problem was a load-load reorder in rte_ring_sc_dequeue_bulk().
> > > > > >
> > > > > > Adding rte_smp_rmb() cause performance regression on non x86
> > > > > > platforms.
> > > > > > Having said that, load-load barrier can be expressed very
> > > > > > well with C11 memory model. I guess ppc64 supports C11 memory model.
> > > > > > If so, Could you try CONFIG_RTE_RING_USE_C11_MEM_MODEL=y for
> > > > > > ppc64 and check original issue?
> > > > > >
> > > > > >>
> > > > > >> The reordered loads happened on r->prod.tail in
> > > >
> > > > There is rte_smp_rmb() just before reading r->prod.tail in
> > > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > > _rte_ring_move_cons_head(). Would that not suffice the requirement?
> > > >
> > > > Can you check adding compiler barrier and see is compiler is
> > > > reordering the stuff?
> > > >
> > > > DPDK's ring implementation is based freebsd's ring implementation,
> > > > I don't see need for such barrier
> > > >
> > > > https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h
> > > >
> > > > If it is something specific to ppc64 or a specific ppc64 machine,
> > > > we could add a compile option as it is arch specific to avoid
> > > > performance impact on other architectures.
> > > >
> > > > > >> __rte_ring_move_cons_head() (rte_ring_generic.h) and
> > > > > >> ring[idx] in
> > > > > >> DEQUEUE_PTRS() (rte_ring.h). They have a load-load control
> > > > > >> dependency, but the code does not satisfy it. Note that they
> > > > > >> are not reordered if __rte_ring_move_cons_head() with is_sc != 1
> > > > > >> because cmpset invokes a read barrier.
> > > > > >>
> > > > > >> The paired stores on these loads are in ENQUEUE_PTRS() and
> > > > > >> update_tail(). Simplified code around the reorder is the following.
> > > > > >>
> > > > > >> Consumer             Producer
> > > > > >> load idx[ring]
> > > > > >>                      store idx[ring]
> > > > > >>                      store r->prod.tail
> > > > > >> load r->prod.tail
> > > > > >>
> > > > > >> In this case, the consumer loads old idx[ring] and confirms
> > > > > >> the load is valid with the new r->prod.tail.
> > > > > >>
> > > > > >> I added a read barrier in the case where __IS_SC is passed to
> > > > > >> __rte_ring_move_cons_head(). I also fixed
> > > > > >> __rte_ring_move_prod_head() to avoid similar problems with a
> > > > > >> single producer.
> > > > > >>
> > > > > >> Cc: stable@dpdk.org
> > > > > >>
> > > > > >> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
> > > > > >> ---
> > > > > >> lib/librte_ring/rte_ring_generic.h | 10 ++++++----
> > > > > >> 1 file changed, 6 insertions(+), 4 deletions(-)
> > > > > >>
> > > > > >> diff --git a/lib/librte_ring/rte_ring_generic.h b/lib/librte_ring/rte_ring_generic.h
> > > > > >> index ea7dbe5b9..477326180 100644
> > > > > >> --- a/lib/librte_ring/rte_ring_generic.h
> > > > > >> +++ b/lib/librte_ring/rte_ring_generic.h
> > > > > >> @@ -90,9 +90,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> > > > > >> return 0;
> > > > > >>
> > > > > >> *new_head = *old_head + n;
> > > > > >> - if (is_sp)
> > > > > >> + if (is_sp) {
> > > > > >> + rte_smp_rmb();
> > > > > >> r->prod.head = *new_head, success = 1;
> > > > > >> - else
> > > > > >> + } else
> > > > > >> success = rte_atomic32_cmpset(&r->prod.head,
> > > > > >> *old_head, *new_head);
> > > > > >> } while (unlikely(success == 0));
> > > > > >> @@ -158,9 +159,10 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
> > > > > >> return 0;
> > > > > >>
> > > > > >> *new_head = *old_head + n;
> > > > > >> - if (is_sc)
> > > > > >> + if (is_sc) {
> > > > > >> + rte_smp_rmb();
> > > > > >> r->cons.head = *new_head, success = 1;
> > > > > >> - else
> > > > > >> + } else
> > > > > >> success = rte_atomic32_cmpset(&r->cons.head, *old_head,
> > > > > >> *new_head);
> > > > > >> } while (unlikely(success == 0));
> > > > > >> --
> > > > > >> 2.17.1
> > >
> > >

^ permalink raw reply [flat|nested] 8+ messages in thread
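The producer-side sequence from the thread (load the consumer tail, check for room, write the elements, then commit them) can be sketched as follows. This is a hedged illustration in C11, not the rte_ring implementation: `bulk_ring`, `bulk_enqueue`, `bulk_release` and `BULK_SLOTS` are hypothetical names. The thread concludes that the generic code can rely on the hardware's load-to-store control dependency; the portable C11 model has no such guarantee, so this sketch assumes an acquire load of the consumer tail instead, in line with the remark that the C11 version uses acquire loads of cons.tail and prod.tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define BULK_SLOTS 8u /* hypothetical ring size, must be a power of two */
static_assert((BULK_SLOTS & (BULK_SLOTS - 1u)) == 0, "size must be a power of two");

/* Hypothetical single-producer ring, not the rte_ring layout. */
struct bulk_ring {
    uint32_t prod_head;         /* private to the single producer */
    _Atomic uint32_t prod_tail; /* commit point seen by the consumer */
    _Atomic uint32_t cons_tail; /* slots below this index are free again */
    int slots[BULK_SLOTS];
};

/* All-or-nothing enqueue: returns n on success, 0 if there is no room. */
static unsigned bulk_enqueue(struct bulk_ring *r, const int *vals, unsigned n)
{
    uint32_t head = r->prod_head;
    /* Slot 3 in the thread's table: load the consumer tail. */
    uint32_t cons = atomic_load_explicit(&r->cons_tail, memory_order_acquire);
    uint32_t free_entries = BULK_SLOTS - (head - cons);

    /* Slot 4: the check gates every store that follows. */
    if (n > free_entries)
        return 0;

    for (unsigned i = 0; i < n; i++)
        r->slots[(head + i) & (BULK_SLOTS - 1u)] = vals[i];

    r->prod_head = head + n;
    /* Slot 5: commit, making the new elements visible to the consumer. */
    atomic_store_explicit(&r->prod_tail, head + n, memory_order_release);
    return n;
}

/* Stand-in for the consumer handing n slots back to the producer. */
static void bulk_release(struct bulk_ring *r, unsigned n)
{
    uint32_t tail = atomic_load_explicit(&r->cons_tail, memory_order_relaxed);

    atomic_store_explicit(&r->cons_tail, tail + n, memory_order_release);
}
```

A second enqueue that does not fit is rejected before any slot is written, which is the property the "check" step in the table provides.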
end of thread, other threads:[~2021-06-16 16:38 UTC | newest]

Thread overview: 8+ messages
2018-07-12 2:44 [dpdk-dev] [PATCH] rte_ring: fix racy dequeue/enqueue in ppc64 Takeshi Yoshimura
2018-07-12 17:08 ` Jerin Jacob
2018-07-17 2:54 ` Takeshi Yoshimura
2018-07-17 3:34 ` Jerin Jacob
2021-03-24 21:45 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2021-03-28 1:00 ` Honnappa Nagarahalli
2021-06-16 7:14 ` [dpdk-dev] Re: " Feifei Wang
2021-06-16 16:37 ` [dpdk-dev] " Honnappa Nagarahalli