DPDK patches and discussions
* [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
@ 2014-09-11  7:48 Hiroshi Shimamoto
  2014-09-24 15:18 ` Thomas Monjalon
  0 siblings, 1 reply; 3+ messages in thread
From: Hiroshi Shimamoto @ 2014-09-11  7:48 UTC (permalink / raw)
  To: dev; +Cc: Hayato Momma

From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>

x86 can keep store ordering with standard operations.

Using a full memory barrier is very expensive in the main packet processing loop.
Replacing it with a compiler barrier improves xmit/recv packet performance.

We can see performance improvements with memnic-tester.
Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
 size |  before  |  after
   64 | 4.18Mpps | 4.59Mpps
  128 | 3.85Mpps | 4.87Mpps
  256 | 4.01Mpps | 4.72Mpps
  512 | 3.52Mpps | 4.41Mpps
 1024 | 3.18Mpps | 3.64Mpps
 1280 | 2.86Mpps | 3.15Mpps
 1518 | 2.59Mpps | 2.87Mpps

Note: we have to take care if we use temporal cache.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: Hayato Momma <h-momma@ce.jp.nec.com>
---
 pmd/pmd_memnic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pmd/pmd_memnic.c b/pmd/pmd_memnic.c
index 8341da7..c22a14d 100644
--- a/pmd/pmd_memnic.c
+++ b/pmd/pmd_memnic.c
@@ -316,7 +316,7 @@ static uint16_t memnic_recv_pkts(void *rx_queue,
 		bytes += p->len;
 
 drop:
-		rte_mb();
+		rte_compiler_barrier();
 		p->status = MEMNIC_PKT_ST_FREE;
 
 		if (++idx >= MEMNIC_NR_PACKET)
@@ -403,7 +403,7 @@ retry:
 		pkts++;
 		bytes += pkt_len;
 
-		rte_mb();
+		rte_compiler_barrier();
 		p->status = MEMNIC_PKT_ST_FILLED;
 
 		rte_pktmbuf_free(tx_pkts[nr]);
-- 
1.8.3.1


* Re: [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
  2014-09-11  7:48 [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier Hiroshi Shimamoto
@ 2014-09-24 15:18 ` Thomas Monjalon
  2014-09-25  0:35   ` Hiroshi Shimamoto
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Monjalon @ 2014-09-24 15:18 UTC (permalink / raw)
  To: dev, Hiroshi Shimamoto; +Cc: Hayato Momma

2014-09-11 07:48, Hiroshi Shimamoto:
> x86 can keep store ordering with standard operations.

Are we sure it's always the case (including old 32-bit CPUs)?
I would prefer to have a reference here. I know we have already discussed
this kind of thing, but having a reference in the commit log could help
in future discussions.

> Using a full memory barrier is very expensive in the main packet processing loop.
> Replacing it with a compiler barrier improves xmit/recv packet performance.
> 
> We can see performance improvements with memnic-tester.
> Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
>  size |  before  |  after
>    64 | 4.18Mpps | 4.59Mpps
>   128 | 3.85Mpps | 4.87Mpps
>   256 | 4.01Mpps | 4.72Mpps
>   512 | 3.52Mpps | 4.41Mpps
>  1024 | 3.18Mpps | 3.64Mpps
>  1280 | 2.86Mpps | 3.15Mpps
>  1518 | 2.59Mpps | 2.87Mpps
> 
> Note: we have to take care if we use temporal cache.

Please, could you explain this last sentence?

Thanks
-- 
Thomas


* Re: [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
  2014-09-24 15:18 ` Thomas Monjalon
@ 2014-09-25  0:35   ` Hiroshi Shimamoto
  0 siblings, 0 replies; 3+ messages in thread
From: Hiroshi Shimamoto @ 2014-09-25  0:35 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: Hayato Momma

> Subject: Re: [dpdk-dev] [memnic PATCH 4/7] pmd: use compiler barrier
> 
> 2014-09-11 07:48, Hiroshi Shimamoto:
> > x86 can keep store ordering with standard operations.
> 
> Are we sure it's always the case (including old 32-bit CPUs)?
> I would prefer to have a reference here. I know we have already discussed
> this kind of thing, but having a reference in the commit log could help
> in future discussions.
> 
> > Using a full memory barrier is very expensive in the main packet processing loop.
> > Replacing it with a compiler barrier improves xmit/recv packet performance.
> >
> > We can see performance improvements with memnic-tester.
> > Using Xeon E5-2697 v2 @ 2.70GHz, 4 vCPU.
> >  size |  before  |  after
> >    64 | 4.18Mpps | 4.59Mpps
> >   128 | 3.85Mpps | 4.87Mpps
> >   256 | 4.01Mpps | 4.72Mpps
> >   512 | 3.52Mpps | 4.41Mpps
> >  1024 | 3.18Mpps | 3.64Mpps
> >  1280 | 2.86Mpps | 3.15Mpps
> >  1518 | 2.59Mpps | 2.87Mpps
> >
> > Note: we have to take care if we use temporal cache.
> 
> Please, could you explain this last sentence?

Oops, I used the wrong word: "temporal" should be "non-temporal".

By the way, there are some instructions, like the MOVNTx series, which
use non-temporal hints.
The store ordering of these instructions is not preserved.

Ref. Intel Software Developer Manual
 Vol.1 10.4.6.2 Caching of Temporal vs. Non-Temporal Data
 Vol.3 8.2 Memory Ordering

thanks,
Hiroshi

> 
> Thanks
> --
> Thomas

