patches for DPDK stable branches
  • * [dpdk-stable] [PATCH v4 1/2] ring: synchronize the load and store of the tail
           [not found]   ` <1540981587-88590-1-git-send-email-gavin.hu@arm.com>
           [not found]     ` <1540956945-211373-1-git-send-email-gavin.hu@arm.com>
    @ 2018-11-01  9:53     ` Gavin Hu
      2018-11-01  9:53     ` [dpdk-stable] [PATCH v4 2/2] ring: move the atomic load of head above the loop Gavin Hu
      2 siblings, 0 replies; 125+ messages in thread
    From: Gavin Hu @ 2018-11-01  9:53 UTC (permalink / raw)
      To: dev
      Cc: thomas, stephen, olivier.matz, chaozhu, bruce.richardson,
    	konstantin.ananyev, jerin.jacob, Honnappa.Nagarahalli, gavin.hu,
    	stable
    
    Synchronize the load-acquire of the tail with the store-release in
    update_tail. The store-release ensures that all ring operations,
    enqueue or dequeue, are visible to observers on the other side as soon
    as they see the updated tail. The load-acquire is needed because a
    data dependency is not a reliable way to enforce ordering: the compiler
    might break it by saving values to temporaries to boost performance.
    When computing free_entries and avail_entries, load the heads and
    tails with atomic semantics instead.
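
The acquire/release pairing described above can be sketched in isolation. This is a minimal illustration, not the rte_ring code: the `mailbox` struct and both function names are hypothetical, and it assumes the GCC/Clang `__atomic` builtins that the patch itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical one-slot mailbox, not the rte_ring layout. */
struct mailbox {
	uint32_t tail; /* 0 = empty, 1 = one item published */
	uint32_t slot; /* payload, written before tail is released */
};

/* Producer side: write the payload, then publish it with a store-release.
 * The release keeps the payload write ordered before the tail update.
 */
static void mailbox_publish(struct mailbox *m, uint32_t value)
{
	m->slot = value;
	__atomic_store_n(&m->tail, 1, __ATOMIC_RELEASE);
}

/* Consumer side: the load-acquire pairs with the store-release above, so a
 * reader that observes tail == 1 also observes the payload.
 * Returns 1 and fills *out if an item is available, 0 otherwise.
 */
static int mailbox_try_consume(struct mailbox *m, uint32_t *out)
{
	assert(out != NULL);
	if (__atomic_load_n(&m->tail, __ATOMIC_ACQUIRE) == 0)
		return 0;
	*out = m->slot;
	return 1;
}
```

With a plain (non-atomic) read of `tail`, the compiler would be free to reorder or cache the accesses, which is the hazard the commit message points at.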
    
    The patch was benchmarked with test/ring_perf_autotest. It decreases
    the enqueue/dequeue latency by 5% to 27.6% with two lcores; the actual
    gain depends on the number of lcores, the depth of the ring, and whether
    the ring is SPSC or MPMC. With one lcore it also improves slightly, by
    about 3% to 4%. The largest improvement is for MPMC with two lcores and
    a ring size of 32, where latency drops by (3.26-2.36)/3.26 = 27.6%.
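
The percentage quoted above is a relative difference between two measured latencies. As a small sketch, with a hypothetical helper name that is not part of the test suite:

```c
#include <assert.h>

/* Hypothetical helper: percentage of latency saved when going from
 * `before` to `after` (both in the same unit).
 */
static double latency_saved_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Applied to the other results listed further down in this message, the same formula gives roughly 12.1% for SP/SC with size 8, 22.6% for MP/MC with size 8, and 4.9% for SP/SC with size 32.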
    
    This patch is a bug fix; the performance improvement is a bonus. In our
    analysis, the improvement comes from cache line pre-filling after
    hoisting the load-acquire up from __atomic_compare_exchange_n.
    
    The test command:
    $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\
    1024 -- -i
    
    Test results with this patch (two cores):
     SP/SC bulk enq/dequeue (size: 8): 5.86
     MP/MC bulk enq/dequeue (size: 8): 10.15
     SP/SC bulk enq/dequeue (size: 32): 1.94
     MP/MC bulk enq/dequeue (size: 32): 2.36
    
    For comparison, the test results without this patch:
     SP/SC bulk enq/dequeue (size: 8): 6.67
     MP/MC bulk enq/dequeue (size: 8): 13.12
     SP/SC bulk enq/dequeue (size: 32): 2.04
     MP/MC bulk enq/dequeue (size: 32): 3.26
    
    Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
    Cc: stable@dpdk.org
    
    Signed-off-by: Gavin Hu <gavin.hu@arm.com>
    Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    Reviewed-by: Steve Capper <steve.capper@arm.com>
    Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
    Reviewed-by: Jia He <justin.he@arm.com>
    Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    ---
     lib/librte_ring/rte_ring_c11_mem.h | 22 ++++++++++++++++++----
     1 file changed, 18 insertions(+), 4 deletions(-)
    
    diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
    index 94df3c4..52da95a 100644
    --- a/lib/librte_ring/rte_ring_c11_mem.h
    +++ b/lib/librte_ring/rte_ring_c11_mem.h
    @@ -57,6 +57,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		uint32_t *free_entries)
     {
     	const uint32_t capacity = r->capacity;
    +	uint32_t cons_tail;
     	unsigned int max = n;
     	int success;
     
    @@ -67,13 +68,18 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		*old_head = __atomic_load_n(&r->prod.head,
     					__ATOMIC_ACQUIRE);
     
    -		/*
    -		 *  The subtraction is done between two unsigned 32bits value
    +		/* load-acquire synchronize with store-release of ht->tail
    +		 * in update_tail.
    +		 */
    +		cons_tail = __atomic_load_n(&r->cons.tail,
    +					__ATOMIC_ACQUIRE);
    +
    +		/* The subtraction is done between two unsigned 32bits value
     		 * (the result is always modulo 32 bits even if we have
     		 * *old_head > cons_tail). So 'free_entries' is always between 0
     		 * and capacity (which is < size).
     		 */
    -		*free_entries = (capacity + r->cons.tail - *old_head);
    +		*free_entries = (capacity + cons_tail - *old_head);
     
     		/* check that we have enough room in ring */
     		if (unlikely(n > *free_entries))
    @@ -125,21 +131,29 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     		uint32_t *entries)
     {
     	unsigned int max = n;
    +	uint32_t prod_tail;
     	int success;
     
     	/* move cons.head atomically */
     	do {
     		/* Restore n as it may change every loop */
     		n = max;
    +
     		*old_head = __atomic_load_n(&r->cons.head,
     					__ATOMIC_ACQUIRE);
     
    +		/* this load-acquire synchronize with store-release of ht->tail
    +		 * in update_tail.
    +		 */
    +		prod_tail = __atomic_load_n(&r->prod.tail,
    +					__ATOMIC_ACQUIRE);
    +
     		/* The subtraction is done between two unsigned 32bits value
     		 * (the result is always modulo 32 bits even if we have
     		 * cons_head > prod_tail). So 'entries' is always between 0
     		 * and size(ring)-1.
     		 */
    -		*entries = (r->prod.tail - *old_head);
    +		*entries = (prod_tail - *old_head);
     
     		/* Set the actual entries for dequeue */
     		if (n > *entries)
    -- 
    2.7.4
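
The comment in the diff above about subtracting two unsigned 32-bit values can be checked in isolation. The sketch below uses hypothetical function names and plain integers rather than a real `struct rte_ring`; it only shows that the arithmetic stays correct after the indices wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Free slots as seen by a producer: capacity + cons_tail - prod_head.
 * All operands are uint32_t, so the result is taken modulo 2^32 and
 * remains valid even when prod_head has wrapped past cons_tail.
 */
static uint32_t ring_free_entries(uint32_t capacity, uint32_t cons_tail,
				  uint32_t prod_head)
{
	assert(capacity > 0);
	return capacity + cons_tail - prod_head;
}

/* Entries available to a consumer: prod_tail - cons_head, modulo 2^32. */
static uint32_t ring_avail_entries(uint32_t prod_tail, uint32_t cons_head)
{
	return prod_tail - cons_head;
}
```

For example, with a capacity of 31, a consumer tail of 0xFFFFFFF0 and a producer head that has wrapped to 5, there are 21 entries in use and 10 free.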
    
  • * [dpdk-stable] [PATCH v4 2/2] ring: move the atomic load of head above the loop
           [not found]   ` <1540981587-88590-1-git-send-email-gavin.hu@arm.com>
           [not found]     ` <1540956945-211373-1-git-send-email-gavin.hu@arm.com>
      2018-11-01  9:53     ` [dpdk-stable] [PATCH v4 1/2] ring: synchronize the load and store of the tail Gavin Hu
    @ 2018-11-01  9:53     ` Gavin Hu
      2018-11-01 17:26       ` Stephen Hemminger
      2 siblings, 1 reply; 125+ messages in thread
    From: Gavin Hu @ 2018-11-01  9:53 UTC (permalink / raw)
      To: dev
      Cc: thomas, stephen, olivier.matz, chaozhu, bruce.richardson,
    	konstantin.ananyev, jerin.jacob, Honnappa.Nagarahalli, gavin.hu,
    	stable
    
    In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
    the do {} while loop. On failure, the compare-exchange already updates
    old_head, so another load is costly and unnecessary.
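
The behaviour this change relies on is that a failed `__atomic_compare_exchange_n` writes the current value back into its `expected` argument. A minimal sketch with hypothetical helper names, assuming the GCC/Clang builtins and leaving out the ring-specific free-space checks:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: advance *head by n and return the previous value.
 * The head is loaded once before the loop; every failed CAS refreshes
 * `old`, so no explicit reload is needed inside the loop.
 */
static uint32_t claim_slots(uint32_t *head, uint32_t n)
{
	uint32_t old = __atomic_load_n(head, __ATOMIC_ACQUIRE);
	uint32_t new_head;

	do {
		new_head = old + n;
	} while (!__atomic_compare_exchange_n(head, &old, new_head,
			0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
	return old;
}

/* Shows the failure path with a deliberately stale expected value:
 * returns the CAS result and stores the refreshed expected value in *seen.
 */
static int cas_with_stale_value(uint32_t *head, uint32_t stale, uint32_t *seen)
{
	assert(seen != 0);
	int ok = __atomic_compare_exchange_n(head, &stale, stale + 1,
			0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
	*seen = stale;
	return ok;
}
```

When the stale value differs from the stored head, the CAS fails, the head is left unchanged, and `*seen` holds the current head.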
    
    This reduces latency slightly, by about 1% to 5%.
    
    Test results with the patch (two cores):
     SP/SC bulk enq/dequeue (size: 8): 5.64
     MP/MC bulk enq/dequeue (size: 8): 9.58
     SP/SC bulk enq/dequeue (size: 32): 1.98
     MP/MC bulk enq/dequeue (size: 32): 2.30
    
    Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
    Cc: stable@dpdk.org
    
    Signed-off-by: Gavin Hu <gavin.hu@arm.com>
    Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    Reviewed-by: Steve Capper <steve.capper@arm.com>
    Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
    Reviewed-by: Jia He <justin.he@arm.com>
    Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    ---
     doc/guides/rel_notes/release_18_11.rst |  7 +++++++
     lib/librte_ring/rte_ring_c11_mem.h     | 10 ++++------
     2 files changed, 11 insertions(+), 6 deletions(-)
    
    diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
    index 376128f..c9c2b86 100644
    --- a/doc/guides/rel_notes/release_18_11.rst
    +++ b/doc/guides/rel_notes/release_18_11.rst
    @@ -69,6 +69,13 @@ New Features
       checked out against that dma mask and rejected if out of range. If more than
       one device has addressing limitations, the dma mask is the more restricted one.
     
    +* **Updated the ring library with C11 memory model.**
    +
    +  Updated the ring library with C11 memory model including the following changes:
    +
    +  * Synchronize the load and store of the tail
    +  * Move the atomic load of head above the loop
    +
     * **Added hot-unplug handle mechanism.**
     
       ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
    diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
    index 52da95a..7bc74a4 100644
    --- a/lib/librte_ring/rte_ring_c11_mem.h
    +++ b/lib/librte_ring/rte_ring_c11_mem.h
    @@ -61,13 +61,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     	unsigned int max = n;
     	int success;
     
    +	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
     	do {
     		/* Reset n to the initial burst count */
     		n = max;
     
    -		*old_head = __atomic_load_n(&r->prod.head,
    -					__ATOMIC_ACQUIRE);
    -
     		/* load-acquire synchronize with store-release of ht->tail
     		 * in update_tail.
     		 */
    @@ -93,6 +91,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		if (is_sp)
     			r->prod.head = *new_head, success = 1;
     		else
    +			/* on failure, *old_head is updated */
     			success = __atomic_compare_exchange_n(&r->prod.head,
     					old_head, *new_head,
     					0, __ATOMIC_ACQUIRE,
    @@ -135,13 +134,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     	int success;
     
     	/* move cons.head atomically */
    +	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
     	do {
     		/* Restore n as it may change every loop */
     		n = max;
     
    -		*old_head = __atomic_load_n(&r->cons.head,
    -					__ATOMIC_ACQUIRE);
    -
     		/* this load-acquire synchronize with store-release of ht->tail
     		 * in update_tail.
     		 */
    @@ -166,6 +163,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     		if (is_sc)
     			r->cons.head = *new_head, success = 1;
     		else
    +			/* on failure, *old_head will be updated */
     			success = __atomic_compare_exchange_n(&r->cons.head,
     							old_head, *new_head,
     							0, __ATOMIC_ACQUIRE,
    -- 
    2.7.4
    
  • * [dpdk-stable] [PATCH v5 1/2] ring: synchronize the load and store of the tail
           [not found] ` <1541066031-29125-1-git-send-email-gavin.hu@arm.com>
           [not found]   ` <1540981587-88590-1-git-send-email-gavin.hu@arm.com>
    @ 2018-11-02 11:21   ` Gavin Hu
      2018-11-05  9:30     ` Olivier Matz
      2018-11-02 11:21   ` [dpdk-stable] [PATCH v5 2/2] ring: move the atomic load of head above the loop Gavin Hu
      2 siblings, 1 reply; 125+ messages in thread
    From: Gavin Hu @ 2018-11-02 11:21 UTC (permalink / raw)
      To: dev
      Cc: thomas, stephen, olivier.matz, chaozhu, bruce.richardson,
    	konstantin.ananyev, jerin.jacob, Honnappa.Nagarahalli, gavin.hu,
    	stable
    
    Synchronize the load-acquire of the tail with the store-release in
    update_tail. The store-release ensures that all ring operations,
    enqueue or dequeue, are visible to observers on the other side as soon
    as they see the updated tail. The load-acquire is needed because a
    data dependency is not a reliable way to enforce ordering: the compiler
    might break it by saving values to temporaries to boost performance.
    When computing free_entries and avail_entries, load the heads and
    tails with atomic semantics instead.
    
    The patch was benchmarked with test/ring_perf_autotest. It decreases
    the enqueue/dequeue latency by 5% to 27.6% with two lcores; the actual
    gain depends on the number of lcores, the depth of the ring, and whether
    the ring is SPSC or MPMC. With one lcore it also improves slightly, by
    about 3% to 4%. The largest improvement is for MPMC with two lcores and
    a ring size of 32, where latency drops by (3.26-2.36)/3.26 = 27.6%.
    
    This patch is a bug fix; the performance improvement is a bonus. In our
    analysis, the improvement comes from cache line pre-filling after
    hoisting the load-acquire up from __atomic_compare_exchange_n.
    
    The test command:
    $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 --socket-mem=\
    1024 -- -i
    
    Test results with this patch (two cores):
     SP/SC bulk enq/dequeue (size: 8): 5.86
     MP/MC bulk enq/dequeue (size: 8): 10.15
     SP/SC bulk enq/dequeue (size: 32): 1.94
     MP/MC bulk enq/dequeue (size: 32): 2.36
    
    For comparison, the test results without this patch:
     SP/SC bulk enq/dequeue (size: 8): 6.67
     MP/MC bulk enq/dequeue (size: 8): 13.12
     SP/SC bulk enq/dequeue (size: 32): 2.04
     MP/MC bulk enq/dequeue (size: 32): 3.26
    
    Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
    Cc: stable@dpdk.org
    
    Signed-off-by: Gavin Hu <gavin.hu@arm.com>
    Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    Reviewed-by: Steve Capper <steve.capper@arm.com>
    Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
    Reviewed-by: Jia He <justin.he@arm.com>
    Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    ---
     lib/librte_ring/rte_ring_c11_mem.h | 22 ++++++++++++++++++----
     1 file changed, 18 insertions(+), 4 deletions(-)
    
    diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
    index 94df3c4..52da95a 100644
    --- a/lib/librte_ring/rte_ring_c11_mem.h
    +++ b/lib/librte_ring/rte_ring_c11_mem.h
    @@ -57,6 +57,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		uint32_t *free_entries)
     {
     	const uint32_t capacity = r->capacity;
    +	uint32_t cons_tail;
     	unsigned int max = n;
     	int success;
     
    @@ -67,13 +68,18 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		*old_head = __atomic_load_n(&r->prod.head,
     					__ATOMIC_ACQUIRE);
     
    -		/*
    -		 *  The subtraction is done between two unsigned 32bits value
    +		/* load-acquire synchronize with store-release of ht->tail
    +		 * in update_tail.
    +		 */
    +		cons_tail = __atomic_load_n(&r->cons.tail,
    +					__ATOMIC_ACQUIRE);
    +
    +		/* The subtraction is done between two unsigned 32bits value
     		 * (the result is always modulo 32 bits even if we have
     		 * *old_head > cons_tail). So 'free_entries' is always between 0
     		 * and capacity (which is < size).
     		 */
    -		*free_entries = (capacity + r->cons.tail - *old_head);
    +		*free_entries = (capacity + cons_tail - *old_head);
     
     		/* check that we have enough room in ring */
     		if (unlikely(n > *free_entries))
    @@ -125,21 +131,29 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     		uint32_t *entries)
     {
     	unsigned int max = n;
    +	uint32_t prod_tail;
     	int success;
     
     	/* move cons.head atomically */
     	do {
     		/* Restore n as it may change every loop */
     		n = max;
    +
     		*old_head = __atomic_load_n(&r->cons.head,
     					__ATOMIC_ACQUIRE);
     
    +		/* this load-acquire synchronize with store-release of ht->tail
    +		 * in update_tail.
    +		 */
    +		prod_tail = __atomic_load_n(&r->prod.tail,
    +					__ATOMIC_ACQUIRE);
    +
     		/* The subtraction is done between two unsigned 32bits value
     		 * (the result is always modulo 32 bits even if we have
     		 * cons_head > prod_tail). So 'entries' is always between 0
     		 * and size(ring)-1.
     		 */
    -		*entries = (r->prod.tail - *old_head);
    +		*entries = (prod_tail - *old_head);
     
     		/* Set the actual entries for dequeue */
     		if (n > *entries)
    -- 
    2.7.4
    
  • * [dpdk-stable] [PATCH v5 2/2] ring: move the atomic load of head above the loop
           [not found] ` <1541066031-29125-1-git-send-email-gavin.hu@arm.com>
           [not found]   ` <1540981587-88590-1-git-send-email-gavin.hu@arm.com>
      2018-11-02 11:21   ` [dpdk-stable] [PATCH v5 1/2] ring: synchronize the load and store of the tail Gavin Hu
    @ 2018-11-02 11:21   ` Gavin Hu
      2018-11-02 11:43     ` Bruce Richardson
      2 siblings, 1 reply; 125+ messages in thread
    From: Gavin Hu @ 2018-11-02 11:21 UTC (permalink / raw)
      To: dev
      Cc: thomas, stephen, olivier.matz, chaozhu, bruce.richardson,
    	konstantin.ananyev, jerin.jacob, Honnappa.Nagarahalli, gavin.hu,
    	stable
    
    In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
    the do {} while loop. On failure, the compare-exchange already updates
    old_head, so another load is costly and unnecessary.
    
    This reduces latency slightly, by about 1% to 5%.
    
    Test results with the patch (two cores):
     SP/SC bulk enq/dequeue (size: 8): 5.64
     MP/MC bulk enq/dequeue (size: 8): 9.58
     SP/SC bulk enq/dequeue (size: 32): 1.98
     MP/MC bulk enq/dequeue (size: 32): 2.30
    
    Fixes: 39368ebfc606 ("ring: introduce C11 memory model barrier option")
    Cc: stable@dpdk.org
    
    Signed-off-by: Gavin Hu <gavin.hu@arm.com>
    Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    Reviewed-by: Steve Capper <steve.capper@arm.com>
    Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
    Reviewed-by: Jia He <justin.he@arm.com>
    Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    Tested-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
    ---
     doc/guides/rel_notes/release_18_11.rst |  7 +++++++
     lib/librte_ring/rte_ring_c11_mem.h     | 10 ++++------
     2 files changed, 11 insertions(+), 6 deletions(-)
    
    diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
    index 376128f..b68afab 100644
    --- a/doc/guides/rel_notes/release_18_11.rst
    +++ b/doc/guides/rel_notes/release_18_11.rst
    @@ -69,6 +69,13 @@ New Features
       checked out against that dma mask and rejected if out of range. If more than
       one device has addressing limitations, the dma mask is the more restricted one.
     
    +* **Updated the ring library with C11 memory model.**
    +
    +  Updated the ring library with C11 memory model, in our tests the changes
    +  decreased latency by 27~29% and 3~15% for MPMC and SPSC cases respectively.
    +  The real improvements may vary with the number of contending lcores and the
    +  size of ring.
    +
     * **Added hot-unplug handle mechanism.**
     
       ``rte_dev_hotplug_handle_enable`` and ``rte_dev_hotplug_handle_disable`` are
    diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
    index 52da95a..7bc74a4 100644
    --- a/lib/librte_ring/rte_ring_c11_mem.h
    +++ b/lib/librte_ring/rte_ring_c11_mem.h
    @@ -61,13 +61,11 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     	unsigned int max = n;
     	int success;
     
    +	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_ACQUIRE);
     	do {
     		/* Reset n to the initial burst count */
     		n = max;
     
    -		*old_head = __atomic_load_n(&r->prod.head,
    -					__ATOMIC_ACQUIRE);
    -
     		/* load-acquire synchronize with store-release of ht->tail
     		 * in update_tail.
     		 */
    @@ -93,6 +91,7 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		if (is_sp)
     			r->prod.head = *new_head, success = 1;
     		else
    +			/* on failure, *old_head is updated */
     			success = __atomic_compare_exchange_n(&r->prod.head,
     					old_head, *new_head,
     					0, __ATOMIC_ACQUIRE,
    @@ -135,13 +134,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     	int success;
     
     	/* move cons.head atomically */
    +	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_ACQUIRE);
     	do {
     		/* Restore n as it may change every loop */
     		n = max;
     
    -		*old_head = __atomic_load_n(&r->cons.head,
    -					__ATOMIC_ACQUIRE);
    -
     		/* this load-acquire synchronize with store-release of ht->tail
     		 * in update_tail.
     		 */
    @@ -166,6 +163,7 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     		if (is_sc)
     			r->cons.head = *new_head, success = 1;
     		else
    +			/* on failure, *old_head will be updated */
     			success = __atomic_compare_exchange_n(&r->cons.head,
     							old_head, *new_head,
     							0, __ATOMIC_ACQUIRE,
    -- 
    2.7.4
    
  • * [dpdk-stable] [PATCH] ring: fix c11 memory ordering issue
    @ 2018-08-06  1:18 Gavin Hu
      2018-08-06  9:19 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
      2018-08-07  3:19 ` [dpdk-stable] [PATCH v2] " Gavin Hu
      0 siblings, 2 replies; 125+ messages in thread
    From: Gavin Hu @ 2018-08-06  1:18 UTC (permalink / raw)
      To: dev
      Cc: gavin.hu, Honnappa.Nagarahalli, steve.capper, Ola.Liljedahl,
    	jerin.jacob, hemant.agrawal, jia.he, stable
    
    1) In update_tail, read ht->tail using __atomic_load_n.
    2) In __rte_ring_move_prod_head, move the __atomic_load_n up and out of
       the do {} while loop; on failure the compare-exchange updates
       old_head, so another load is not necessary.
    3) Synchronize the load-acquires of prod.tail and cons.tail with the
       store-release in update_tail, which releases all ring updates up to
       the value of ht->tail.
    4) When calling __atomic_compare_exchange_n, use relaxed ordering for
       both success and failure: multiple threads can work independently on
       the same end of the ring (either enqueue or dequeue) without
       synchronization, unlike updates of the tail, which have to complete
       in sequence.
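
Points 1) and 3) can be sketched together as a standalone tail update. This is an approximation of the pattern in the diff below, not a copy of it: the function name is hypothetical, it operates on a bare `uint32_t` instead of `struct rte_ring_headtail`, and the empty loop body stands in for the `rte_pause()` call, which is DPDK-specific.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for update_tail.
 * With multiple producers (or consumers), wait until the preceding
 * operations have moved the tail to old_val; a relaxed load is enough for
 * this polling. The final store-release then publishes all ring updates
 * made before it to any thread that load-acquires the tail.
 */
static void tail_update_sketch(uint32_t *tail, uint32_t old_val,
			       uint32_t new_val, int single)
{
	assert(tail != 0);
	if (!single)
		while (__atomic_load_n(tail, __ATOMIC_RELAXED) != old_val)
			; /* the patch calls rte_pause() here */

	__atomic_store_n(tail, new_val, __ATOMIC_RELEASE);
}
```

In a single-threaded call where the tail already equals `old_val`, the wait loop exits immediately and the tail simply moves to `new_val`.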
    
    Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
    Cc: stable@dpdk.org
    
    Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
    Reviewed-by: Steve Capper <steve.capper@arm.com>
    Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
    Signed-off-by: Gavin Hu <gavin.hu@arm.com>
    ---
     lib/librte_ring/rte_ring_c11_mem.h | 38 +++++++++++++++++++++++++-------------
     1 file changed, 25 insertions(+), 13 deletions(-)
    
    diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
    index 94df3c4a6..cfa3be4a7 100644
    --- a/lib/librte_ring/rte_ring_c11_mem.h
    +++ b/lib/librte_ring/rte_ring_c11_mem.h
    @@ -21,7 +21,8 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
     	 * we need to wait for them to complete
     	 */
     	if (!single)
    -		while (unlikely(ht->tail != old_val))
    +		while (unlikely(old_val != __atomic_load_n(&ht->tail,
    +						__ATOMIC_RELAXED)))
     			rte_pause();
     
     	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
    @@ -60,20 +61,24 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     	unsigned int max = n;
     	int success;
     
    +	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
     	do {
     		/* Reset n to the initial burst count */
     		n = max;
     
    -		*old_head = __atomic_load_n(&r->prod.head,
    -					__ATOMIC_ACQUIRE);
     
    -		/*
    -		 *  The subtraction is done between two unsigned 32bits value
    +		/* load-acquire synchronize with store-release of ht->tail
    +		 * in update_tail.
    +		 */
    +		const uint32_t cons_tail = __atomic_load_n(&r->cons.tail,
    +							__ATOMIC_ACQUIRE);
    +
    +		/* The subtraction is done between two unsigned 32bits value
     		 * (the result is always modulo 32 bits even if we have
     		 * *old_head > cons_tail). So 'free_entries' is always between 0
     		 * and capacity (which is < size).
     		 */
    -		*free_entries = (capacity + r->cons.tail - *old_head);
    +		*free_entries = (capacity + cons_tail - *old_head);
     
     		/* check that we have enough room in ring */
     		if (unlikely(n > *free_entries))
    @@ -87,9 +92,10 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
     		if (is_sp)
     			r->prod.head = *new_head, success = 1;
     		else
    +			/* on failure, *old_head is updated */
     			success = __atomic_compare_exchange_n(&r->prod.head,
     					old_head, *new_head,
    -					0, __ATOMIC_ACQUIRE,
    +					/*weak=*/0, __ATOMIC_RELAXED,
     					__ATOMIC_RELAXED);
     	} while (unlikely(success == 0));
     	return n;
    @@ -128,18 +134,23 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     	int success;
     
     	/* move cons.head atomically */
    +	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
     	do {
     		/* Restore n as it may change every loop */
     		n = max;
    -		*old_head = __atomic_load_n(&r->cons.head,
    -					__ATOMIC_ACQUIRE);
    +
    +		/* this load-acquire synchronize with store-release of ht->tail
    +		 * in update_tail.
    +		 */
    +		const uint32_t prod_tail = __atomic_load_n(&r->prod.tail,
    +							__ATOMIC_ACQUIRE);
     
     		/* The subtraction is done between two unsigned 32bits value
     		 * (the result is always modulo 32 bits even if we have
     		 * cons_head > prod_tail). So 'entries' is always between 0
     		 * and size(ring)-1.
     		 */
    -		*entries = (r->prod.tail - *old_head);
    +		*entries = (prod_tail - *old_head);
     
     		/* Set the actual entries for dequeue */
     		if (n > *entries)
    @@ -152,10 +163,11 @@ __rte_ring_move_cons_head(struct rte_ring *r, int is_sc,
     		if (is_sc)
     			r->cons.head = *new_head, success = 1;
     		else
    +			/* on failure, *old_head will be updated */
     			success = __atomic_compare_exchange_n(&r->cons.head,
    -							old_head, *new_head,
    -							0, __ATOMIC_ACQUIRE,
    -							__ATOMIC_RELAXED);
    +						old_head, *new_head,
    +						/*weak=*/0, __ATOMIC_RELAXED,
    +						__ATOMIC_RELAXED);
     	} while (unlikely(success == 0));
     	return n;
     }
    -- 
    2.11.0
    

    end of thread, other threads:[~2018-11-06 11:03 UTC | newest]
    
    Thread overview: 125+ messages
         [not found] <1541157688-40012-1-git-send-email-gavin.hu@arm.com>
         [not found] ` <1541066031-29125-1-git-send-email-gavin.hu@arm.com>
         [not found]   ` <1540981587-88590-1-git-send-email-gavin.hu@arm.com>
         [not found]     ` <1540956945-211373-1-git-send-email-gavin.hu@arm.com>
    2018-10-31 10:26       ` [dpdk-stable] [PATCH v3 1/2] ring: synchronize the load and store of the tail Gavin Hu
    2018-10-31 22:07         ` [dpdk-stable] [dpdk-dev] " Stephen Hemminger
    2018-11-01  9:56           ` Gavin Hu (Arm Technology China)
    2018-10-31 10:26       ` [dpdk-stable] [PATCH v3 2/2] ring: move the atomic load of head above the loop Gavin Hu
    2018-11-01  9:53     ` [dpdk-stable] [PATCH v4 1/2] ring: synchronize the load and store of the tail Gavin Hu
    2018-11-01  9:53     ` [dpdk-stable] [PATCH v4 2/2] ring: move the atomic load of head above the loop Gavin Hu
    2018-11-01 17:26       ` Stephen Hemminger
    2018-11-02  0:53         ` Gavin Hu (Arm Technology China)
    2018-11-02  4:30           ` Honnappa Nagarahalli
    2018-11-02  7:15             ` Gavin Hu (Arm Technology China)
    2018-11-02  9:36               ` Thomas Monjalon
    2018-11-02 11:23                 ` Gavin Hu (Arm Technology China)
    2018-11-02 11:21   ` [dpdk-stable] [PATCH v5 1/2] ring: synchronize the load and store of the tail Gavin Hu
    2018-11-05  9:30     ` Olivier Matz
    2018-11-02 11:21   ` [dpdk-stable] [PATCH v5 2/2] ring: move the atomic load of head above the loop Gavin Hu
    2018-11-02 11:43     ` Bruce Richardson
    2018-11-03  1:19       ` Gavin Hu (Arm Technology China)
    2018-11-03  9:34         ` Honnappa Nagarahalli
    2018-11-05 13:17           ` Thomas Monjalon
    2018-11-05 13:41             ` Jerin Jacob
    2018-11-05  9:44         ` Olivier Matz
    2018-11-05 13:36           ` Thomas Monjalon
    2018-08-06  1:18 [dpdk-stable] [PATCH] ring: fix c11 memory ordering issue Gavin Hu
    2018-08-06  9:19 ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
    2018-08-08  1:39   ` Gavin Hu
    2018-08-07  3:19 ` [dpdk-stable] [PATCH v2] " Gavin Hu
    2018-08-07  5:56   ` He, Jia
    2018-08-07  7:56     ` Gavin Hu
    2018-08-08  3:07       ` Jerin Jacob
    2018-08-08  7:23         ` Thomas Monjalon
         [not found]   ` <20180917074735.28161-1-gavin.hu@arm.com>
    2018-09-17  7:47     ` [dpdk-stable] [PATCH v3 3/3] doc: add cross compile part for sample applications Gavin Hu
    2018-09-17  9:48       ` Jerin Jacob
    2018-09-17 10:28         ` Gavin Hu (Arm Technology China)
    2018-09-17 10:34           ` Jerin Jacob
    2018-09-17 10:55             ` Gavin Hu (Arm Technology China)
    2018-09-17 10:49       ` [dpdk-stable] [PATCH v4] " Gavin Hu
    2018-09-17 10:53         ` [dpdk-stable] [PATCH v5] " Gavin Hu
    2018-09-18 11:00           ` Jerin Jacob
    2018-09-19  0:33           ` [dpdk-stable] [PATCH v6] " Gavin Hu
    2018-09-17  8:11     ` [dpdk-stable] [PATCH v4 1/4] bus/fslmc: fix undefined reference of memsegs Gavin Hu
    2018-09-17  8:11       ` [dpdk-stable] [PATCH v4 2/4] ring: read tail using atomic load Gavin Hu
    2018-09-20  6:41         ` Jerin Jacob
    2018-09-25  9:26           ` Gavin Hu (Arm Technology China)
    2018-09-17  8:11       ` [dpdk-stable] [PATCH v4 3/4] ring: synchronize the load and store of the tail Gavin Hu
    2018-09-17  8:11       ` [dpdk-stable] [PATCH v4 4/4] ring: move the atomic load of head above the loop Gavin Hu
    2018-10-27 14:21         ` Thomas Monjalon
    2018-09-17  8:17   ` [dpdk-stable] [PATCH v3 1/3] ring: read tail using atomic load Gavin Hu
    2018-09-17  8:17     ` [dpdk-stable] [PATCH v3 2/3] ring: synchronize the load and store of the tail Gavin Hu
    2018-09-26  9:29       ` Gavin Hu (Arm Technology China)
    2018-09-26  9:59         ` Justin He
    2018-09-29 10:57       ` Jerin Jacob
    2018-10-17  6:29       ` [dpdk-stable] [PATCH 1/2] " Gavin Hu
    2018-10-17  6:29         ` [dpdk-stable] [PATCH 2/2] ring: move the atomic load of head above the loop Gavin Hu
    2018-10-17  6:35         ` [dpdk-stable] [PATCH 1/2] ring: synchronize the load and store of the tail Gavin Hu (Arm Technology China)
    2018-10-27 14:39           ` [dpdk-stable] [dpdk-dev] " Thomas Monjalon
    2018-10-27 15:00             ` Jerin Jacob
    2018-10-27 15:13               ` Thomas Monjalon
    2018-10-27 15:34                 ` Jerin Jacob
    2018-10-27 15:48                   ` Thomas Monjalon
    2018-10-29  2:51                   ` Gavin Hu (Arm Technology China)
    2018-10-29  2:57                   ` Gavin Hu (Arm Technology China)
    2018-10-29 10:16                     ` Jerin Jacob
    2018-10-29 10:47                       ` Thomas Monjalon
    2018-10-29 11:10                         ` Jerin Jacob
    2018-11-03 20:12                 ` Mattias Rönnblom
    2018-11-05 21:51                   ` Honnappa Nagarahalli
    2018-11-06 11:03                     ` Mattias Rönnblom
         [not found]         ` <1540955698-209159-1-git-send-email-gavin.hu@arm.com>
    2018-10-31  3:14           ` [dpdk-stable] [PATCH v2 " Gavin Hu
    2018-10-31  3:14           ` [dpdk-stable] [PATCH v2 2/2] ring: move the atomic load of head above the loop Gavin Hu
    2018-10-31  3:35         ` [dpdk-stable] [PATCH v2 1/2] ring: synchronize the load and store of the tail Gavin Hu
    2018-10-31  3:35         ` [dpdk-stable] [PATCH v2 2/2] ring: move the atomic load of head above the loop Gavin Hu
    2018-10-31  9:36           ` Thomas Monjalon
    2018-10-31 10:27             ` Gavin Hu (Arm Technology China)
    2018-09-17  8:17     ` [dpdk-stable] [PATCH v3 3/3] " Gavin Hu
    2018-09-26  9:29       ` Gavin Hu (Arm Technology China)
    2018-09-26 10:06         ` Justin He
    2018-09-29  7:19           ` [dpdk-stable] [dpdk-dev] " Stephen Hemminger
    2018-09-29 10:59       ` [dpdk-stable] " Jerin Jacob
    2018-09-26  9:29     ` [dpdk-stable] [PATCH v3 1/3] ring: read tail using atomic load Gavin Hu (Arm Technology China)
    2018-09-26 10:09       ` Justin He
    2018-09-29 10:48     ` Jerin Jacob
    2018-10-05  0:47       ` Gavin Hu (Arm Technology China)
    2018-10-05  8:21         ` Ananyev, Konstantin
    2018-10-05 11:15           ` Ola Liljedahl
    2018-10-05 11:36             ` Ola Liljedahl
    2018-10-05 13:44               ` Ananyev, Konstantin
    2018-10-05 14:21                 ` Ola Liljedahl
    2018-10-05 15:11                 ` Honnappa Nagarahalli
    2018-10-05 17:07                   ` Jerin Jacob
    2018-10-05 18:05                     ` Ola Liljedahl
    2018-10-05 20:06                       ` Honnappa Nagarahalli
    2018-10-05 20:17                         ` Ola Liljedahl
    2018-10-05 20:29                           ` Honnappa Nagarahalli
    2018-10-05 20:34                             ` Ola Liljedahl
    2018-10-06  7:41                               ` Jerin Jacob
    2018-10-06 19:44                                 ` Ola Liljedahl
    2018-10-06 19:59                                   ` Ola Liljedahl
    2018-10-07  4:02                                   ` Jerin Jacob
    2018-10-07 20:11                                     ` Ola Liljedahl
    2018-10-07 20:44                                     ` Ola Liljedahl
    2018-10-08  6:06                                       ` Jerin Jacob
    2018-10-08  9:22                                         ` Ola Liljedahl
    2018-10-08 10:00                                           ` Jerin Jacob
    2018-10-08 10:25                                             ` Ola Liljedahl
    2018-10-08 10:33                                               ` Gavin Hu (Arm Technology China)
    2018-10-08 10:39                                                 ` Ola Liljedahl
    2018-10-08 10:41                                                   ` Gavin Hu (Arm Technology China)
    2018-10-08 10:49                                                 ` Jerin Jacob
    2018-10-10  6:28                                                   ` Gavin Hu (Arm Technology China)
    2018-10-10 19:26                                                     ` Honnappa Nagarahalli
    2018-10-08 10:46                                               ` Jerin Jacob
    2018-10-08 11:21                                                 ` Ola Liljedahl
    2018-10-08 11:50                                                   ` Jerin Jacob
    2018-10-08 11:59                                                     ` Ola Liljedahl
    2018-10-08 12:05                                                       ` Jerin Jacob
    2018-10-08 12:20                                                         ` [dpdk-stable] [dpdk-dev] " Jerin Jacob
    2018-10-08 12:30                                                           ` Ola Liljedahl
    2018-10-09  8:53                                                             ` Olivier Matz
    2018-10-09  3:16                                             ` [dpdk-stable] " Honnappa Nagarahalli
    2018-10-08 14:43                                           ` [dpdk-stable] [dpdk-dev] " Bruce Richardson
    2018-10-08 14:46                                             ` Ola Liljedahl
    2018-10-08 15:45                                               ` Ola Liljedahl
    2018-10-08  5:27                               ` [dpdk-stable] " Honnappa Nagarahalli
    2018-10-08 10:01                                 ` Ola Liljedahl
    2018-10-27 14:17     ` Thomas Monjalon
    
