DPDK patches and discussions
 help / color / Atom feed
* [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
@ 2020-09-11  5:38 Phil Yang
  2020-09-18 11:58 ` Ferruh Yigit
  2020-09-18 22:49 ` Honnappa Nagarahalli
  0 siblings, 2 replies; 8+ messages in thread
From: Phil Yang @ 2020-09-11  5:38 UTC (permalink / raw)
  To: jgrajcia, dev; +Cc: Honnappa.Nagarahalli, Ruifeng.Wang, nd

Using 'rte_mb' to synchronize the shared ring head/tail between producer
and consumer will stall the pipeline and damage performance on the weak
memory model platforms, such like aarch64.

Relax the expensive barrier with c11 atomic with explicit memory
ordering can improve 3.6% performance on throughput.

Signed-off-by: Phil Yang <phil.yang@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/memif/rte_eth_memif.c | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
index c1c7e9f..a19c0f3 100644
--- a/drivers/net/memif/rte_eth_memif.c
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -253,7 +253,12 @@ memif_free_stored_mbufs(struct pmd_process_private *proc_private, struct memif_q
 	memif_ring_t *ring = memif_get_ring_from_queue(proc_private, mq);
 
 	/* FIXME: improve performance */
-	while (mq->last_tail != ring->tail) {
+	/* The ring->tail acts as a guard variable between Tx and Rx
+	 * threads, so using load-acquire pairs with store-release
+	 * to synchronize it between threads.
+	 */
+	while (mq->last_tail != __atomic_load_n(&ring->tail,
+						__ATOMIC_ACQUIRE)) {
 		RTE_MBUF_PREFETCH_TO_FREE(mq->buffers[(mq->last_tail + 1) & mask]);
 		/* Decrement refcnt and free mbuf. (current segment) */
 		rte_mbuf_refcnt_update(mq->buffers[mq->last_tail & mask], -1);
@@ -455,7 +460,11 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	mask = ring_size - 1;
 
 	cur_slot = mq->last_tail;
-	last_slot = ring->tail;
+	/* The ring->tail acts as a guard variable between Tx and Rx
+	 * threads, so using load-acquire pairs with store-release
+	 * to synchronize it between threads.
+	 */
+	last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
 	if (cur_slot == last_slot)
 		goto refill;
 	n_slots = last_slot - cur_slot;
@@ -501,7 +510,11 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 
 /* Supply master with new buffers */
 refill:
-	head = ring->head;
+	/* The ring->head acts as a guard variable between Tx and Rx
+	 * threads, so using load-acquire pairs with store-release
+	 * to synchronize it between threads.
+	 */
+	head = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
 	n_slots = ring_size - head + mq->last_tail;
 
 	if (n_slots < 32)
@@ -526,8 +539,7 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 			(uint8_t *)proc_private->regions[d0->region]->addr;
 	}
 no_free_mbufs:
-	rte_mb();
-	ring->head = head;
+	__atomic_store_n(&ring->head, head, __ATOMIC_RELEASE);
 
 	mq->n_pkts += n_rx_pkts;
 
@@ -723,8 +735,12 @@ eth_memif_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	memif_free_stored_mbufs(proc_private, mq);
 
 	/* ring type always MEMIF_RING_S2M */
-	slot = ring->head;
-	n_free = ring_size - ring->head + mq->last_tail;
+	/* The ring->head acts as a guard variable between Tx and Rx
+	 * threads, so using load-acquire pairs with store-release
+	 * to synchronize it between threads.
+	 */
+	slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
+	n_free = ring_size - slot + mq->last_tail;
 
 	int used_slots;
 
@@ -778,12 +794,11 @@ eth_memif_tx_zc(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 	}
 
 no_free_slots:
-	rte_mb();
 	/* update ring pointers */
 	if (type == MEMIF_RING_S2M)
-		ring->head = slot;
+		__atomic_store_n(&ring->head, slot, __ATOMIC_RELEASE);
 	else
-		ring->tail = slot;
+		__atomic_store_n(&ring->tail, slot, __ATOMIC_RELEASE);
 
 	/* Send interrupt, if enabled. */
 	if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
-- 
2.7.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-11  5:38 [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path Phil Yang
@ 2020-09-18 11:58 ` Ferruh Yigit
  2020-09-18 22:49 ` Honnappa Nagarahalli
  1 sibling, 0 replies; 8+ messages in thread
From: Ferruh Yigit @ 2020-09-18 11:58 UTC (permalink / raw)
  To: jgrajcia; +Cc: Honnappa.Nagarahalli, Ruifeng.Wang, nd, Phil Yang, dev

On 9/11/2020 6:38 AM, Phil Yang wrote:
> Using 'rte_mb' to synchronize the shared ring head/tail between producer
> and consumer will stall the pipeline and damage performance on the weak
> memory model platforms, such like aarch64.
> 
> Relax the expensive barrier with c11 atomic with explicit memory
> ordering can improve 3.6% performance on throughput.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

Hi Jakub,

Can you please check/test this patch?

Thanks,
ferruh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-11  5:38 [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path Phil Yang
  2020-09-18 11:58 ` Ferruh Yigit
@ 2020-09-18 22:49 ` Honnappa Nagarahalli
  2020-09-21  9:03   ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
  1 sibling, 1 reply; 8+ messages in thread
From: Honnappa Nagarahalli @ 2020-09-18 22:49 UTC (permalink / raw)
  To: Phil Yang, jgrajcia, dev; +Cc: Ruifeng Wang, nd, Honnappa Nagarahalli, nd

Hi Jakub,
	I am trying to review this patch. I am having difficulty in understanding the implementation for the queue/ring, appreciate if you could help me understand the logic.

1) The S2M queues - are used to send packets from slave to master. My understanding is that, the slave thread would call 'eth_memif_tx_zc' and the master thread would call 'eth_memif_rx_zc'. Is this correct?
2) The M2S queues - are used to send packets from master to slave. Here the slave thread would call 'eth_memif_rx_zc' and the master thread would call 'eth_memif_tx_zc'. Is this correct?

Thank you,
Honnappa

> -----Original Message-----
> From: Phil Yang <phil.yang@arm.com>
> Sent: Friday, September 11, 2020 12:38 AM
> To: jgrajcia@cisco.com; dev@dpdk.org
> Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: [PATCH] net/memif: relax barrier for zero copy path
> 
> Using 'rte_mb' to synchronize the shared ring head/tail between producer
> and consumer will stall the pipeline and damage performance on the weak
> memory model platforms, such like aarch64.
> 
> Relax the expensive barrier with c11 atomic with explicit memory ordering
> can improve 3.6% performance on throughput.
> 
> Signed-off-by: Phil Yang <phil.yang@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/memif/rte_eth_memif.c | 35 +++++++++++++++++++++++++------
> ----
>  1 file changed, 25 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/memif/rte_eth_memif.c
> b/drivers/net/memif/rte_eth_memif.c
> index c1c7e9f..a19c0f3 100644
> --- a/drivers/net/memif/rte_eth_memif.c
> +++ b/drivers/net/memif/rte_eth_memif.c
> @@ -253,7 +253,12 @@ memif_free_stored_mbufs(struct
> pmd_process_private *proc_private, struct memif_q
>  	memif_ring_t *ring = memif_get_ring_from_queue(proc_private,
> mq);
> 
>  	/* FIXME: improve performance */
> -	while (mq->last_tail != ring->tail) {
> +	/* The ring->tail acts as a guard variable between Tx and Rx
> +	 * threads, so using load-acquire pairs with store-release
> +	 * to synchronize it between threads.
> +	 */
> +	while (mq->last_tail != __atomic_load_n(&ring->tail,
> +						__ATOMIC_ACQUIRE)) {
>  		RTE_MBUF_PREFETCH_TO_FREE(mq->buffers[(mq->last_tail
> + 1) & mask]);
>  		/* Decrement refcnt and free mbuf. (current segment) */
>  		rte_mbuf_refcnt_update(mq->buffers[mq->last_tail & mask],
> -1); @@ -455,7 +460,11 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>  	mask = ring_size - 1;
> 
>  	cur_slot = mq->last_tail;
> -	last_slot = ring->tail;
> +	/* The ring->tail acts as a guard variable between Tx and Rx
> +	 * threads, so using load-acquire pairs with store-release
> +	 * to synchronize it between threads.
> +	 */
> +	last_slot = __atomic_load_n(&ring->tail, __ATOMIC_ACQUIRE);
>  	if (cur_slot == last_slot)
>  		goto refill;
>  	n_slots = last_slot - cur_slot;
> @@ -501,7 +510,11 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
> 
>  /* Supply master with new buffers */
>  refill:
> -	head = ring->head;
> +	/* The ring->head acts as a guard variable between Tx and Rx
> +	 * threads, so using load-acquire pairs with store-release
> +	 * to synchronize it between threads.
> +	 */
> +	head = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
>  	n_slots = ring_size - head + mq->last_tail;
> 
>  	if (n_slots < 32)
> @@ -526,8 +539,7 @@ eth_memif_rx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>  			(uint8_t *)proc_private->regions[d0->region]->addr;
>  	}
>  no_free_mbufs:
> -	rte_mb();
> -	ring->head = head;
> +	__atomic_store_n(&ring->head, head, __ATOMIC_RELEASE);
> 
>  	mq->n_pkts += n_rx_pkts;
> 
> @@ -723,8 +735,12 @@ eth_memif_tx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>  	memif_free_stored_mbufs(proc_private, mq);
> 
>  	/* ring type always MEMIF_RING_S2M */
> -	slot = ring->head;
> -	n_free = ring_size - ring->head + mq->last_tail;
> +	/* The ring->head acts as a guard variable between Tx and Rx
> +	 * threads, so using load-acquire pairs with store-release
> +	 * to synchronize it between threads.
> +	 */
> +	slot = __atomic_load_n(&ring->head, __ATOMIC_ACQUIRE);
> +	n_free = ring_size - slot + mq->last_tail;
> 
>  	int used_slots;
> 
> @@ -778,12 +794,11 @@ eth_memif_tx_zc(void *queue, struct rte_mbuf
> **bufs, uint16_t nb_pkts)
>  	}
> 
>  no_free_slots:
> -	rte_mb();
>  	/* update ring pointers */
>  	if (type == MEMIF_RING_S2M)
> -		ring->head = slot;
> +		__atomic_store_n(&ring->head, slot, __ATOMIC_RELEASE);
>  	else
> -		ring->tail = slot;
> +		__atomic_store_n(&ring->tail, slot, __ATOMIC_RELEASE);
> 
>  	/* Send interrupt, if enabled. */
>  	if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
> --
> 2.7.4


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-18 22:49 ` Honnappa Nagarahalli
@ 2020-09-21  9:03   ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
  2020-09-21 10:22     ` Phil Yang
  2020-09-21 19:28     ` Honnappa Nagarahalli
  0 siblings, 2 replies; 8+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco) @ 2020-09-21  9:03 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Phil Yang, dev; +Cc: Ruifeng Wang, nd, nd

Hi Honnappa,

Inline comments...

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Saturday, September 19, 2020 12:49 AM
> To: Phil Yang <Phil.Yang@arm.com>; Jakub Grajciar -X (jgrajcia - PANTHEON
> TECH SRO at Cisco) <jgrajcia@cisco.com>; dev@dpdk.org
> Cc: Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH] net/memif: relax barrier for zero copy path
> 
> Hi Jakub,
> 	I am trying to review this patch. I am having difficulty in understanding
> the implementation for the queue/ring, appreciate if you could help me
> understand the logic.

'ring' refers to a ring buffer holding packet descriptors. These descriptors hold metadata about the packet (packet buffer address, length, etc..).
'queues' are a representation of rings and buffers  (+ some metadata). In more detail, one ring (S2M) and packet buffers allocated for this ring would be represented as 'tx queue' for the slave and 'rx queue' for the master.

> 
> 1) The S2M queues - are used to send packets from slave to master. My
> understanding is that, the slave thread would call 'eth_memif_tx_zc' and the
> master thread would call 'eth_memif_rx_zc'. Is this correct?
> 2) The M2S queues - are used to send packets from master to slave. Here the
> slave thread would call 'eth_memif_rx_zc' and the master thread would call
> 'eth_memif_tx_zc'. Is this correct?

This is inded correct.

> 
> Thank you,
> Honnappa
> 
> > -----Original Message-----
> > From: Phil Yang <phil.yang@arm.com>
> > Sent: Friday, September 11, 2020 12:38 AM
> > To: jgrajcia@cisco.com; dev@dpdk.org
> > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > Subject: [PATCH] net/memif: relax barrier for zero copy path
> >
> > Using 'rte_mb' to synchronize the shared ring head/tail between
> > producer and consumer will stall the pipeline and damage performance
> > on the weak memory model platforms, such like aarch64.
> >
> > Relax the expensive barrier with c11 atomic with explicit memory
> > ordering can improve 3.6% performance on throughput.

My question here is: `rte_mb` is supposed to make sure that head/tail pointer are not updated before the packets are written into shared memory. Does the atomic ensures that the packets are written into shared memory before head/tail pointers are updated?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-21  9:03   ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
@ 2020-09-21 10:22     ` Phil Yang
  2020-09-21 12:21       ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
  2020-09-21 19:28     ` Honnappa Nagarahalli
  1 sibling, 1 reply; 8+ messages in thread
From: Phil Yang @ 2020-09-21 10:22 UTC (permalink / raw)
  To: Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco),
	Honnappa Nagarahalli, dev
  Cc: Ruifeng Wang, nd, nd, nd

Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco) writes:

<snip>

> > Subject: RE: [PATCH] net/memif: relax barrier for zero copy path
> >
> > Hi Jakub,
> > 	I am trying to review this patch. I am having difficulty in
> understanding
> > the implementation for the queue/ring, appreciate if you could help me
> > understand the logic.
> 
> 'ring' refers to a ring buffer holding packet descriptors. These descriptors
> hold metadata about the packet (packet buffer address, length, etc..).
> 'queues' are a representation of rings and buffers  (+ some metadata). In
> more detail, one ring (S2M) and packet buffers allocated for this ring would
> be represented as 'tx queue' for the slave and 'rx queue' for the master.
> 
> >
> > 1) The S2M queues - are used to send packets from slave to master. My
> > understanding is that, the slave thread would call 'eth_memif_tx_zc' and
> the
> > master thread would call 'eth_memif_rx_zc'. Is this correct?
> > 2) The M2S queues - are used to send packets from master to slave. Here
> the
> > slave thread would call 'eth_memif_rx_zc' and the master thread would call
> > 'eth_memif_tx_zc'. Is this correct?
> 
> This is inded correct.
 
<snip>

> > >
> > > Using 'rte_mb' to synchronize the shared ring head/tail between
> > > producer and consumer will stall the pipeline and damage performance
> > > on the weak memory model platforms, such like aarch64.
> > >
> > > Relax the expensive barrier with c11 atomic with explicit memory
> > > ordering can improve 3.6% performance on throughput.
> 
> My question here is: `rte_mb` is supposed to make sure that head/tail
> pointer are not updated before the packets are written into shared memory.
> Does the atomic ensures that the packets are written into shared memory
> before head/tail pointers are updated?

Yes, it does. 
The atomic store-release acts as a one-way barrier here to make sure all the memory accesses before the store-release are observed before it.


Thanks,
Phil

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-21 10:22     ` Phil Yang
@ 2020-09-21 12:21       ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
  2020-09-21 13:27         ` Ferruh Yigit
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco) @ 2020-09-21 12:21 UTC (permalink / raw)
  To: Phil Yang, Honnappa Nagarahalli, dev; +Cc: Ruifeng Wang, nd, nd, nd

> > > >
> > > > Using 'rte_mb' to synchronize the shared ring head/tail between
> > > > producer and consumer will stall the pipeline and damage
> > > > performance on the weak memory model platforms, such like aarch64.
> > > >
> > > > Relax the expensive barrier with c11 atomic with explicit memory
> > > > ordering can improve 3.6% performance on throughput.
> >
> > My question here is: `rte_mb` is supposed to make sure that head/tail
> > pointer are not updated before the packets are written into shared
> memory.
> > Does the atomic ensures that the packets are written into shared
> > memory before head/tail pointers are updated?
> 
> Yes, it does.
> The atomic store-release acts as a one-way barrier here to make sure all the
> memory accesses before the store-release are observed before it.

Ok then, since the sync is there and the rest seems fine to me.

Thanks,
Jakub

Reviewed-by: Jakub Grajciar <jgrajcia@cisco.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-21 12:21       ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
@ 2020-09-21 13:27         ` Ferruh Yigit
  0 siblings, 0 replies; 8+ messages in thread
From: Ferruh Yigit @ 2020-09-21 13:27 UTC (permalink / raw)
  To: Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco),
	Phil Yang, Honnappa Nagarahalli, dev
  Cc: Ruifeng Wang, nd

On 9/21/2020 1:21 PM, Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at 
Cisco) wrote:
>>>>>
>>>>> Using 'rte_mb' to synchronize the shared ring head/tail between
>>>>> producer and consumer will stall the pipeline and damage
>>>>> performance on the weak memory model platforms, such like aarch64.
>>>>>
>>>>> Relax the expensive barrier with c11 atomic with explicit memory
>>>>> ordering can improve 3.6% performance on throughput.
>>>
>>> My question here is: `rte_mb` is supposed to make sure that head/tail
>>> pointer are not updated before the packets are written into shared
>> memory.
>>> Does the atomic ensures that the packets are written into shared
>>> memory before head/tail pointers are updated?
>>
>> Yes, it does.
>> The atomic store-release acts as a one-way barrier here to make sure all the
>> memory accesses before the store-release are observed before it.
> 
> Ok then, since the sync is there and the rest seems fine to me.
> 
> Thanks,
> Jakub
> 
> Reviewed-by: Jakub Grajciar <jgrajcia@cisco.com>
> 

Applied to dpdk-next-net/main, thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path
  2020-09-21  9:03   ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
  2020-09-21 10:22     ` Phil Yang
@ 2020-09-21 19:28     ` Honnappa Nagarahalli
  1 sibling, 0 replies; 8+ messages in thread
From: Honnappa Nagarahalli @ 2020-09-21 19:28 UTC (permalink / raw)
  To: Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco),
	Phil Yang, dev
  Cc: Ruifeng Wang, nd, Honnappa Nagarahalli, nd

<snip>

> >
> > Hi Jakub,
> > 	I am trying to review this patch. I am having difficulty in
> > understanding the implementation for the queue/ring, appreciate if you
> > could help me understand the logic.
> 
> 'ring' refers to a ring buffer holding packet descriptors. These descriptors
> hold metadata about the packet (packet buffer address, length, etc..).
> 'queues' are a representation of rings and buffers  (+ some metadata). In
> more detail, one ring (S2M) and packet buffers allocated for this ring would
> be represented as 'tx queue' for the slave and 'rx queue' for the master.
Thanks Jakub. I could run the testpmd app and was able to understand the implementation. I found some issues and I have sent out a patch series [1].

[1] https://patches.dpdk.org/patch/78212/

> 
> >
> > 1) The S2M queues - are used to send packets from slave to master. My
> > understanding is that, the slave thread would call 'eth_memif_tx_zc'
> > and the master thread would call 'eth_memif_rx_zc'. Is this correct?
> > 2) The M2S queues - are used to send packets from master to slave.
> > Here the slave thread would call 'eth_memif_rx_zc' and the master
> > thread would call 'eth_memif_tx_zc'. Is this correct?
> 
> This is inded correct.
> 
> >
> > Thank you,
> > Honnappa
> >
> > > -----Original Message-----
> > > From: Phil Yang <phil.yang@arm.com>
> > > Sent: Friday, September 11, 2020 12:38 AM
> > > To: jgrajcia@cisco.com; dev@dpdk.org
> > > Cc: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng
> > > Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> > > Subject: [PATCH] net/memif: relax barrier for zero copy path
> > >
> > > Using 'rte_mb' to synchronize the shared ring head/tail between
> > > producer and consumer will stall the pipeline and damage performance
> > > on the weak memory model platforms, such like aarch64.
> > >
> > > Relax the expensive barrier with c11 atomic with explicit memory
> > > ordering can improve 3.6% performance on throughput.
> 
> My question here is: `rte_mb` is supposed to make sure that head/tail
> pointer are not updated before the packets are written into shared memory.
> Does the atomic ensures that the packets are written into shared memory
> before head/tail pointers are updated?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-11  5:38 [dpdk-dev] [PATCH] net/memif: relax barrier for zero copy path Phil Yang
2020-09-18 11:58 ` Ferruh Yigit
2020-09-18 22:49 ` Honnappa Nagarahalli
2020-09-21  9:03   ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
2020-09-21 10:22     ` Phil Yang
2020-09-21 12:21       ` Jakub Grajciar -X (jgrajcia - PANTHEON TECH SRO at Cisco)
2020-09-21 13:27         ` Ferruh Yigit
2020-09-21 19:28     ` Honnappa Nagarahalli

DPDK patches and discussions

Archives are clonable:
	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev


Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox