patches for DPDK stable branches
* [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance
@ 2018-11-20  4:45 Qi Zhang
  2018-11-20  9:16 ` [dpdk-stable] [dpdk-dev] " Ananyev, Konstantin
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Qi Zhang @ 2018-11-20  4:45 UTC (permalink / raw)
  To: bruce.richardson, keith.wiles
  Cc: dev, wenzhuo.lu, bernard.iremonger, Qi Zhang, stable

The patch optimizes the mac swap operation by taking advantage
of SSE instructions, it only impacts x86 platform.

Cc: stable@dpdk.org

Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
---
 app/test-pmd/macswap.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index a8384d5b8..0722782b0 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 	struct rte_port  *txp;
 	struct rte_mbuf  *mb;
 	struct ether_hdr *eth_hdr;
-	struct ether_addr addr;
 	uint16_t nb_rx;
 	uint16_t nb_tx;
 	uint16_t i;
@@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 	start_tsc = rte_rdtsc();
 #endif
 
+#ifdef RTE_ARCH_X86
+	__m128i addr;
+	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
+					5, 4, 3, 2,
+					1, 0, 11, 10,
+					9, 8, 7, 6);
+#else
+	struct ether_addr addr;
+#endif
 	/*
 	 * Receive a burst of packets and forward them.
 	 */
@@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
 
 		/* Swap dest and src mac addresses. */
+#ifdef RTE_ARCH_X86
+		addr = _mm_loadu_si128((__m128i *)eth_hdr);
+		addr = _mm_shuffle_epi8(addr, shfl_msk);
+		_mm_storeu_si128((__m128i *)eth_hdr, addr);
+#else
 		ether_addr_copy(&eth_hdr->d_addr, &addr);
 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
 		ether_addr_copy(&addr, &eth_hdr->s_addr);
+#endif
 
 		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
 		mb->ol_flags |= ol_flags;
-- 
2.13.6


* Re: [dpdk-stable] [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20  4:45 [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance Qi Zhang
@ 2018-11-20  9:16 ` Ananyev, Konstantin
  2018-11-20 14:48   ` Wiles, Keith
  2018-11-20 16:58   ` Zhang, Qi Z
  2018-11-23 22:43 ` [dpdk-stable] " Wiles, Keith
  2018-11-23 22:43 ` Wiles, Keith
  2 siblings, 2 replies; 11+ messages in thread
From: Ananyev, Konstantin @ 2018-11-20  9:16 UTC (permalink / raw)
  To: Zhang, Qi Z, Richardson, Bruce, Wiles, Keith
  Cc: dev, Lu, Wenzhuo, Iremonger, Bernard, Zhang, Qi Z, stable

Hi Qi,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qi Zhang
> Sent: Tuesday, November 20, 2018 4:46 AM
> To: Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
> <qi.z.zhang@intel.com>; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> 
> The patch optimizes the mac swap operation by taking advantage
> of SSE instructions, it only impacts x86 platform.
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
>  app/test-pmd/macswap.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index a8384d5b8..0722782b0 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  	struct rte_port  *txp;
>  	struct rte_mbuf  *mb;
>  	struct ether_hdr *eth_hdr;
> -	struct ether_addr addr;
>  	uint16_t nb_rx;
>  	uint16_t nb_tx;
>  	uint16_t i;
> @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  	start_tsc = rte_rdtsc();
>  #endif
> 
> +#ifdef RTE_ARCH_X86
> +	__m128i addr;
> +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> +					5, 4, 3, 2,
> +					1, 0, 11, 10,
> +					9, 8, 7, 6);
> +#else
> +	struct ether_addr addr;
> +#endif

I think it would be better to place IA-specific code into a separate function
(and probably into a separate .h file).
BTW, just curious what % of improvement it gives?
Konstantin


>  	/*
>  	 * Receive a burst of packets and forward them.
>  	 */
> @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> 
>  		/* Swap dest and src mac addresses. */
> +#ifdef RTE_ARCH_X86
> +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
> +		addr = _mm_shuffle_epi8(addr, shfl_msk);
> +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
> +#else
>  		ether_addr_copy(&eth_hdr->d_addr, &addr);
>  		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
>  		ether_addr_copy(&addr, &eth_hdr->s_addr);
> +#endif
> 
>  		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
>  		mb->ol_flags |= ol_flags;
> --
> 2.13.6


* Re: [dpdk-stable] [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20  9:16 ` [dpdk-stable] [dpdk-dev] " Ananyev, Konstantin
@ 2018-11-20 14:48   ` Wiles, Keith
  2018-11-20 16:58   ` Zhang, Qi Z
  1 sibling, 0 replies; 11+ messages in thread
From: Wiles, Keith @ 2018-11-20 14:48 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Zhang, Qi Z, Richardson, Bruce, dev, Lu, Wenzhuo, Iremonger,
	Bernard, stable



> On Nov 20, 2018, at 3:16 AM, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
> 
> Hi Qi,
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qi Zhang
>> Sent: Tuesday, November 20, 2018 4:46 AM
>> To: Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
>> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
>> <qi.z.zhang@intel.com>; stable@dpdk.org
>> Subject: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
>> 
>> The patch optimizes the mac swap operation by taking advantage
>> of SSE instructions, it only impacts x86 platform.
>> 
>> Cc: stable@dpdk.org
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> ---
>> app/test-pmd/macswap.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>> 
>> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
>> index a8384d5b8..0722782b0 100644
>> --- a/app/test-pmd/macswap.c
>> +++ b/app/test-pmd/macswap.c
>> @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>> 	struct rte_port  *txp;
>> 	struct rte_mbuf  *mb;
>> 	struct ether_hdr *eth_hdr;
>> -	struct ether_addr addr;
>> 	uint16_t nb_rx;
>> 	uint16_t nb_tx;
>> 	uint16_t i;
>> @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>> 	start_tsc = rte_rdtsc();
>> #endif
>> 
>> +#ifdef RTE_ARCH_X86
>> +	__m128i addr;
>> +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
>> +					5, 4, 3, 2,
>> +					1, 0, 11, 10,
>> +					9, 8, 7, 6);
>> +#else
>> +	struct ether_addr addr;
>> +#endif
> 
> I think it would be better to place IA-specific code into a separate function
> (and probably into a separate .h file).
> BTW, just curious what % of improvement it gives?
> Konstantin
> 

If we are going to the trouble of moving the code out of the .c into a .h then I suggest it be placed into the rte_ethdev headers for everyone to use.
> 
>> 	/*
>> 	 * Receive a burst of packets and forward them.
>> 	 */
>> @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>> 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
>> 
>> 		/* Swap dest and src mac addresses. */
>> +#ifdef RTE_ARCH_X86
>> +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
>> +		addr = _mm_shuffle_epi8(addr, shfl_msk);
>> +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
>> +#else
>> 		ether_addr_copy(&eth_hdr->d_addr, &addr);
>> 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
>> 		ether_addr_copy(&addr, &eth_hdr->s_addr);
>> +#endif
>> 
>> 		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
>> 		mb->ol_flags |= ol_flags;
>> --
>> 2.13.6

Regards,
Keith


* Re: [dpdk-stable] [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20  9:16 ` [dpdk-stable] [dpdk-dev] " Ananyev, Konstantin
  2018-11-20 14:48   ` Wiles, Keith
@ 2018-11-20 16:58   ` Zhang, Qi Z
  2018-11-20 17:26     ` Ananyev, Konstantin
  2018-11-20 22:53     ` Ananyev, Konstantin
  1 sibling, 2 replies; 11+ messages in thread
From: Zhang, Qi Z @ 2018-11-20 16:58 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Wiles, Keith
  Cc: dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, November 20, 2018 1:17 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> 
> Hi Qi,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qi Zhang
> > Sent: Tuesday, November 20, 2018 4:46 AM
> > To: Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith
> > <keith.wiles@intel.com>
> > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
> > <qi.z.zhang@intel.com>; stable@dpdk.org
> > Subject: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> >
> > The patch optimizes the mac swap operation by taking advantage of SSE
> > instructions, it only impacts x86 platform.
> >
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > ---
> >  app/test-pmd/macswap.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > index a8384d5b8..0722782b0 100644
> > --- a/app/test-pmd/macswap.c
> > +++ b/app/test-pmd/macswap.c
> > @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >  	struct rte_port  *txp;
> >  	struct rte_mbuf  *mb;
> >  	struct ether_hdr *eth_hdr;
> > -	struct ether_addr addr;
> >  	uint16_t nb_rx;
> >  	uint16_t nb_tx;
> >  	uint16_t i;
> > @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >  	start_tsc = rte_rdtsc();
> >  #endif
> >
> > +#ifdef RTE_ARCH_X86
> > +	__m128i addr;
> > +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> > +					5, 4, 3, 2,
> > +					1, 0, 11, 10,
> > +					9, 8, 7, 6);
> > +#else
> > +	struct ether_addr addr;
> > +#endif
> 
> I think it would be better to place IA-specific code into a separate function (and
> probably into a separate .h file).

OK, I will think about how to rework this.

> BTW, just curious what % of improvement it gives?

So far, the only server I can test on is a 1.6GHz Broadwell server with 2 ports on one i40e 25G NIC.
The macswap performance increases from 16.8 Mpps to 20 Mpps (about a 19% improvement).
	
> Konstantin
> 
> 
> >  	/*
> >  	 * Receive a burst of packets and forward them.
> >  	 */
> > @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> >
> >  		/* Swap dest and src mac addresses. */
> > +#ifdef RTE_ARCH_X86
> > +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
> > +		addr = _mm_shuffle_epi8(addr, shfl_msk);
> > +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
> > +#else
> >  		ether_addr_copy(&eth_hdr->d_addr, &addr);
> >  		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
> >  		ether_addr_copy(&addr, &eth_hdr->s_addr);
> > +#endif
> >
> >  		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
> >  		mb->ol_flags |= ol_flags;
> > --
> > 2.13.6


* Re: [dpdk-stable] [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20 16:58   ` Zhang, Qi Z
@ 2018-11-20 17:26     ` Ananyev, Konstantin
  2018-11-20 22:53     ` Ananyev, Konstantin
  1 sibling, 0 replies; 11+ messages in thread
From: Ananyev, Konstantin @ 2018-11-20 17:26 UTC (permalink / raw)
  To: Zhang, Qi Z, Richardson, Bruce, Wiles, Keith
  Cc: dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Tuesday, November 20, 2018 4:58 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith
> <keith.wiles@intel.com>
> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, November 20, 2018 1:17 AM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard
> > <bernard.iremonger@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > stable@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> >
> > Hi Qi,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qi Zhang
> > > Sent: Tuesday, November 20, 2018 4:46 AM
> > > To: Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith
> > > <keith.wiles@intel.com>
> > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > > Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
> > > <qi.z.zhang@intel.com>; stable@dpdk.org
> > > Subject: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> > >
> > > The patch optimizes the mac swap operation by taking advantage of SSE
> > > instructions, it only impacts x86 platform.
> > >
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > ---
> > >  app/test-pmd/macswap.c | 16 +++++++++++++++-
> > >  1 file changed, 15 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > > index a8384d5b8..0722782b0 100644
> > > --- a/app/test-pmd/macswap.c
> > > +++ b/app/test-pmd/macswap.c
> > > @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > >  	struct rte_port  *txp;
> > >  	struct rte_mbuf  *mb;
> > >  	struct ether_hdr *eth_hdr;
> > > -	struct ether_addr addr;
> > >  	uint16_t nb_rx;
> > >  	uint16_t nb_tx;
> > >  	uint16_t i;
> > > @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > >  	start_tsc = rte_rdtsc();
> > >  #endif
> > >
> > > +#ifdef RTE_ARCH_X86
> > > +	__m128i addr;
> > > +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> > > +					5, 4, 3, 2,
> > > +					1, 0, 11, 10,
> > > +					9, 8, 7, 6);
> > > +#else
> > > +	struct ether_addr addr;
> > > +#endif
> >
> > I think it would be better to place IA-specific code into a separate function (and
> > probably into a separate .h file).
> 
> OK, I will think about how to rework this.

Ideally, it would be good to have a generic version and an IA-optimized version.
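
One way the "generic plus IA-optimized" split could look is sketched below. This is only an illustration under stated assumptions, not the rework that was later posted: the helper name mac_swap_sketch is hypothetical, the header is treated as a raw byte buffer instead of struct ether_hdr, and the x86 path is selected with the compiler's __SSSE3__ macro rather than DPDK's RTE_ARCH_X86 so that the snippet builds outside a DPDK tree. The shuffle mask is the one from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSSE3__)
#include <tmmintrin.h>	/* _mm_shuffle_epi8 (SSSE3) */
#endif

/*
 * Hypothetical helper (not part of the patch): swap the destination and
 * source MAC addresses at the start of an Ethernet header, in place.
 *
 * buf must point to at least 16 accessible bytes: the SSSE3 path loads
 * and stores a full 128-bit register, so bytes 12..15 are read and
 * written back unchanged.
 */
static inline void
mac_swap_sketch(void *buf)
{
	assert(buf != NULL);
#if defined(__SSSE3__)
	/* Same mask as in the patch: bytes 0..5 and 6..11 trade places. */
	const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
					      5, 4, 3, 2,
					      1, 0, 11, 10,
					      9, 8, 7, 6);
	__m128i v = _mm_loadu_si128((const __m128i *)buf);

	v = _mm_shuffle_epi8(v, shfl_msk);
	_mm_storeu_si128((__m128i *)buf, v);
#else
	/* Generic path: three 6-byte copies through a temporary. */
	uint8_t *p = buf;
	uint8_t tmp[6];

	memcpy(tmp, p, 6);
	memcpy(p, p + 6, 6);
	memcpy(p + 6, tmp, 6);
#endif
}
```

Keeping both paths behind one function means the forwarding loop has no #ifdef of its own, which appears to be the point of the review comment. Both paths produce the same bytes; only the SSSE3 path touches the two bytes that follow the 14-byte Ethernet header.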

> 
> > BTW, just curious what % of improvement it gives?
> 
> So far, the only server I can test on is a 1.6GHz Broadwell server with 2 ports on one i40e 25G NIC.
> The macswap performance increases from 16.8 Mpps to 20 Mpps (about a 19% improvement).

Quite a lot; it definitely looks worth it.

> 
> > Konstantin
> >
> >
> > >  	/*
> > >  	 * Receive a burst of packets and forward them.
> > >  	 */
> > > @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > >  		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> > >
> > >  		/* Swap dest and src mac addresses. */
> > > +#ifdef RTE_ARCH_X86
> > > +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
> > > +		addr = _mm_shuffle_epi8(addr, shfl_msk);
> > > +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
> > > +#else
> > >  		ether_addr_copy(&eth_hdr->d_addr, &addr);
> > >  		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
> > >  		ether_addr_copy(&addr, &eth_hdr->s_addr);
> > > +#endif
> > >
> > >  		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
> > >  		mb->ol_flags |= ol_flags;
> > > --
> > > 2.13.6


* Re: [dpdk-stable] [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20 16:58   ` Zhang, Qi Z
  2018-11-20 17:26     ` Ananyev, Konstantin
@ 2018-11-20 22:53     ` Ananyev, Konstantin
  2018-11-21 21:24       ` Zhang, Qi Z
  1 sibling, 1 reply; 11+ messages in thread
From: Ananyev, Konstantin @ 2018-11-20 22:53 UTC (permalink / raw)
  To: Zhang, Qi Z, Richardson, Bruce, Wiles, Keith
  Cc: dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, November 20, 2018 5:26 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> 
> 
> 
> > -----Original Message-----
> > From: Zhang, Qi Z
> > Sent: Tuesday, November 20, 2018 4:58 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith
> > <keith.wiles@intel.com>
> > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard <bernard.iremonger@intel.com>; stable@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> >
> >
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, November 20, 2018 1:17 AM
> > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> > > <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard
> > > <bernard.iremonger@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>;
> > > stable@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> > >
> > > Hi Qi,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qi Zhang
> > > > Sent: Tuesday, November 20, 2018 4:46 AM
> > > > To: Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith
> > > > <keith.wiles@intel.com>
> > > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > > > Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
> > > > <qi.z.zhang@intel.com>; stable@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> > > >
> > > > The patch optimizes the mac swap operation by taking advantage of SSE
> > > > instructions, it only impacts x86 platform.
> > > >
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > ---
> > > >  app/test-pmd/macswap.c | 16 +++++++++++++++-
> > > >  1 file changed, 15 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > > > index a8384d5b8..0722782b0 100644
> > > > --- a/app/test-pmd/macswap.c
> > > > +++ b/app/test-pmd/macswap.c
> > > > @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > > >  	struct rte_port  *txp;
> > > >  	struct rte_mbuf  *mb;
> > > >  	struct ether_hdr *eth_hdr;
> > > > -	struct ether_addr addr;
> > > >  	uint16_t nb_rx;
> > > >  	uint16_t nb_tx;
> > > >  	uint16_t i;
> > > > @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > > >  	start_tsc = rte_rdtsc();
> > > >  #endif
> > > >
> > > > +#ifdef RTE_ARCH_X86
> > > > +	__m128i addr;
> > > > +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> > > > +					5, 4, 3, 2,
> > > > +					1, 0, 11, 10,
> > > > +					9, 8, 7, 6);
> > > > +#else
> > > > +	struct ether_addr addr;
> > > > +#endif
> > >
> > > I think it would be better to place IA-specific code into a separate function (and
> > > probably into a separate .h file).
> >
> > OK, I will think about how to rework this.
> 
> Ideally, it would be good to have a generic version and an IA-optimized version.
> 
> >
> > > BTW, just curious what % of improvement it gives?
> >
> > So far, the only server I can test on is a 1.6GHz Broadwell server with 2 ports on one i40e 25G NIC.
> > The macswap performance increases from 16.8 Mpps to 20 Mpps (about a 19% improvement).
> 
> Quite a lot; it definitely looks worth it.

You can probably squeeze out a few more cycles by doing it in batches of 4 or so.
Konstantin
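
The batching suggestion can be pictured as a loop that handles four headers per iteration and finishes with a scalar tail. The sketch below is a structural illustration only, not the follow-up patch: mac_swap_burst_sketch and swap_one_sketch are hypothetical names, headers are raw byte pointers rather than mbufs, and the per-packet swap is the portable three-copy version. In an SSE variant each of the four swaps would be an independent load, shuffle and store, which is what gives the CPU room to overlap them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable single-header swap; stands in for the SSE load/shuffle/store. */
static inline void
swap_one_sketch(uint8_t *hdr)
{
	uint8_t tmp[6];

	memcpy(tmp, hdr, 6);		/* save destination MAC */
	memcpy(hdr, hdr + 6, 6);	/* source MAC -> destination slot */
	memcpy(hdr + 6, tmp, 6);	/* saved destination -> source slot */
}

/*
 * Hypothetical burst helper: swap MACs for nb_rx headers, four at a time.
 * hdrs[i] must point to the start of the i-th Ethernet header.
 */
static void
mac_swap_burst_sketch(uint8_t *hdrs[], uint16_t nb_rx)
{
	uint16_t i = 0;

	assert(hdrs != NULL || nb_rx == 0);

	/* Main loop: four independent swaps per iteration. */
	for (; i + 4 <= nb_rx; i += 4) {
		swap_one_sketch(hdrs[i]);
		swap_one_sketch(hdrs[i + 1]);
		swap_one_sketch(hdrs[i + 2]);
		swap_one_sketch(hdrs[i + 3]);
	}

	/* Tail: the remaining zero to three packets. */
	for (; i < nb_rx; i++)
		swap_one_sketch(hdrs[i]);
}
```

The tail loop matters because a received burst is not guaranteed to be a multiple of four packets.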


* Re: [dpdk-stable] [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20 22:53     ` Ananyev, Konstantin
@ 2018-11-21 21:24       ` Zhang, Qi Z
  0 siblings, 0 replies; 11+ messages in thread
From: Zhang, Qi Z @ 2018-11-21 21:24 UTC (permalink / raw)
  To: Ananyev, Konstantin, Richardson, Bruce, Wiles, Keith
  Cc: dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, November 20, 2018 2:54 PM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>; stable@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap performance
> 
> 
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Tuesday, November 20, 2018 5:26 PM
> > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > Bernard <bernard.iremonger@intel.com>; stable@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap
> > performance
> >
> >
> >
> > > -----Original Message-----
> > > From: Zhang, Qi Z
> > > Sent: Tuesday, November 20, 2018 4:58 PM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Richardson,
> > > Bruce <bruce.richardson@intel.com>; Wiles, Keith
> > > <keith.wiles@intel.com>
> > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > > Bernard <bernard.iremonger@intel.com>; stable@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap
> > > performance
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Ananyev, Konstantin
> > > > Sent: Tuesday, November 20, 2018 1:17 AM
> > > > To: Zhang, Qi Z <qi.z.zhang@intel.com>; Richardson, Bruce
> > > > <bruce.richardson@intel.com>; Wiles, Keith <keith.wiles@intel.com>
> > > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > > > Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
> > > > <qi.z.zhang@intel.com>; stable@dpdk.org
> > > > Subject: RE: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap
> > > > performance
> > > >
> > > > Hi Qi,
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Qi Zhang
> > > > > Sent: Tuesday, November 20, 2018 4:46 AM
> > > > > To: Richardson, Bruce <bruce.richardson@intel.com>; Wiles, Keith
> > > > > <keith.wiles@intel.com>
> > > > > Cc: dev@dpdk.org; Lu, Wenzhuo <wenzhuo.lu@intel.com>; Iremonger,
> > > > > Bernard <bernard.iremonger@intel.com>; Zhang, Qi Z
> > > > > <qi.z.zhang@intel.com>; stable@dpdk.org
> > > > > Subject: [dpdk-dev] [PATCH] app/testpmd: improve MAC swap
> > > > > performance
> > > > >
> > > > > The patch optimizes the mac swap operation by taking advantage
> > > > > of SSE instructions, it only impacts x86 platform.
> > > > >
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> > > > > ---
> > > > >  app/test-pmd/macswap.c | 16 +++++++++++++++-
> > > > >  1 file changed, 15 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> > > > > index a8384d5b8..0722782b0 100644
> > > > > --- a/app/test-pmd/macswap.c
> > > > > +++ b/app/test-pmd/macswap.c
> > > > > @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > > > >  	struct rte_port  *txp;
> > > > >  	struct rte_mbuf  *mb;
> > > > >  	struct ether_hdr *eth_hdr;
> > > > > -	struct ether_addr addr;
> > > > >  	uint16_t nb_rx;
> > > > >  	uint16_t nb_tx;
> > > > >  	uint16_t i;
> > > > > @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> > > > >  	start_tsc = rte_rdtsc();
> > > > >  #endif
> > > > >
> > > > > +#ifdef RTE_ARCH_X86
> > > > > +	__m128i addr;
> > > > > +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> > > > > +					5, 4, 3, 2,
> > > > > +					1, 0, 11, 10,
> > > > > +					9, 8, 7, 6);
> > > > > +#else
> > > > > +	struct ether_addr addr;
> > > > > +#endif
> > > >
> > > > I think it would be better to place IA-specific code into a separate
> > > > function (and probably into a separate .h file).
> > >
> > > OK, I will think about how to rework this.
> >
> > Ideally, it would be good to have a generic version and an IA-optimized version.
> >
> > >
> > > > BTW, just curious what % of improvement it gives?
> > >
> > > > So far, the only server I can test on is a 1.6GHz Broadwell server with
> > > > 2 ports on one i40e 25G NIC.
> > > > The macswap performance increases from 16.8 Mpps to 20 Mpps (about a
> > > > 19% improvement).

One correction: I found that the previous test was running on a CPU on the remote socket.
With a CPU on the local socket of the same server, the MAC swap performance improves from 23.34 Mpps to 26.36 Mpps, about a 12.9% increase, which is smaller but still considerable.

> >
> > Quite a lot; it definitely looks worth it.
> 
> You can probably squeeze out a few more cycles by doing it in batches of 4 or so.

That's a good idea. In my test, batching by 4 gives more than a 4% further increase and
reaches 27.46 Mpps, so the total is now a 17.7% increase. I will send a patch later; please help to polish it :)

Thanks
Qi

> Konstantin


* Re: [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-20  4:45 [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance Qi Zhang
  2018-11-20  9:16 ` [dpdk-stable] [dpdk-dev] " Ananyev, Konstantin
@ 2018-11-23 22:43 ` Wiles, Keith
  2018-11-23 22:43 ` Wiles, Keith
  2 siblings, 0 replies; 11+ messages in thread
From: Wiles, Keith @ 2018-11-23 22:43 UTC (permalink / raw)
  To: Zhang, Qi Z
  Cc: Richardson, Bruce, dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> On Nov 19, 2018, at 10:45 PM, Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
> 
> The patch optimizes the mac swap operation by taking advantage
> of SSE instructions, it only impacts x86 platform.
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> ---
> app/test-pmd/macswap.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> index a8384d5b8..0722782b0 100644
> --- a/app/test-pmd/macswap.c
> +++ b/app/test-pmd/macswap.c
> @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> 	struct rte_port  *txp;
> 	struct rte_mbuf  *mb;
> 	struct ether_hdr *eth_hdr;
> -	struct ether_addr addr;
> 	uint16_t nb_rx;
> 	uint16_t nb_tx;
> 	uint16_t i;
> @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> 	start_tsc = rte_rdtsc();
> #endif
> 
> +#ifdef RTE_ARCH_X86
> +	__m128i addr;
> +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> +					5, 4, 3, 2,
> +					1, 0, 11, 10,
> +					9, 8, 7, 6);

I was playing around with these mask values and I was not able to make it work as I expected.
I ended up with different values in the mask.

_mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);

After dumping memory across a large number of tests, this one seems correct. Can you verify that your mask is correct?

> +#else
> +	struct ether_addr addr;
> +#endif
> 	/*
> 	 * Receive a burst of packets and forward them.
> 	 */
> @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> 
> 		/* Swap dest and src mac addresses. */
> +#ifdef RTE_ARCH_X86
> +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
> +		addr = _mm_shuffle_epi8(addr, shfl_msk);
> +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
> +#else
> 		ether_addr_copy(&eth_hdr->d_addr, &addr);
> 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
> 		ether_addr_copy(&addr, &eth_hdr->s_addr);
> +#endif
> 
> 		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
> 		mb->ol_flags |= ol_flags;
> -- 
> 2.13.6
> 

Regards,
Keith



* Re: [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-23 22:43 ` Wiles, Keith
@ 2018-11-24 16:24   ` Wiles, Keith
  2018-11-27  1:06     ` Zhang, Qi Z
  0 siblings, 1 reply; 11+ messages in thread
From: Wiles, Keith @ 2018-11-24 16:24 UTC (permalink / raw)
  To: Zhang, Qi Z
  Cc: Richardson, Bruce, dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> On Nov 23, 2018, at 4:43 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> 
> 
> 
>> On Nov 19, 2018, at 10:45 PM, Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
>> 
>> The patch optimizes the mac swap operation by taking advantage
>> of SSE instructions, it only impacts x86 platform.
>> 
>> Cc: stable@dpdk.org
>> 
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> ---
>> app/test-pmd/macswap.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>> 
>> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
>> index a8384d5b8..0722782b0 100644
>> --- a/app/test-pmd/macswap.c
>> +++ b/app/test-pmd/macswap.c
>> @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>> 	struct rte_port  *txp;
>> 	struct rte_mbuf  *mb;
>> 	struct ether_hdr *eth_hdr;
>> -	struct ether_addr addr;
>> 	uint16_t nb_rx;
>> 	uint16_t nb_tx;
>> 	uint16_t i;
>> @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>> 	start_tsc = rte_rdtsc();
>> #endif
>> 
>> +#ifdef RTE_ARCH_X86
>> +	__m128i addr;
>> +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
>> +					5, 4, 3, 2,
>> +					1, 0, 11, 10,
>> +					9, 8, 7, 6);
> 
> I was playing around with these mask values and I was not able to make it work as I expected.
> I ended up with different values in the mask.
> 
> _mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
> 
> After dumping the memory for a large number of tests, this one seems correct. Can you verify that your mask is correct?

Sorry, I do not know why I thought the code was not the same, but your example is correct; my mistake.
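
For anyone who wants to check the mask by hand, here is a minimal scalar sketch of what the shuffle does to the first 16 bytes of the frame. It assumes only the documented behaviour of _mm_set_epi8 (arguments run from byte 15 down to byte 0) and of _mm_shuffle_epi8 (each result byte is selected by the low four bits of the matching mask byte when the high bit is clear). The names shfl_msk_bytes and shuffle16_model are hypothetical and are not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Memory-order view of
 * _mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6).
 * _mm_set_epi8 lists its arguments from byte 15 down to byte 0, so the
 * last argument (6) is mask byte 0.
 */
static const uint8_t shfl_msk_bytes[16] = {
	6, 7, 8, 9, 10, 11,	/* bytes 0-5  (dst MAC) <- old src MAC */
	0, 1, 2, 3, 4, 5,	/* bytes 6-11 (src MAC) <- old dst MAC */
	12, 13, 14, 15		/* bytes 12-15 are left where they are */
};

/*
 * Scalar model of _mm_shuffle_epi8 for mask bytes whose high bit is
 * clear: result byte i is source byte (mask[i] & 0x0f).
 */
static void
shuffle16_model(uint8_t hdr[16])
{
	uint8_t out[16];
	int i;

	for (i = 0; i < 16; i++)
		out[i] = hdr[shfl_msk_bytes[i] & 0x0f];
	memcpy(hdr, out, sizeof(out));
}
```

Calling shuffle16_model() on a buffer that holds the destination MAC, the source MAC and the next four bytes should exchange the two 6-byte addresses and leave the trailing four bytes unchanged, which is what the load/shuffle/store sequence in the patch relies on.
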
> 
>> +#else
>> +	struct ether_addr addr;
>> +#endif
>> 	/*
>> 	 * Receive a burst of packets and forward them.
>> 	 */
>> @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
>> 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
>> 
>> 		/* Swap dest and src mac addresses. */
>> +#ifdef RTE_ARCH_X86
>> +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
>> +		addr = _mm_shuffle_epi8(addr, shfl_msk);
>> +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
>> +#else
>> 		ether_addr_copy(&eth_hdr->d_addr, &addr);
>> 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
>> 		ether_addr_copy(&addr, &eth_hdr->s_addr);
>> +#endif
>> 
>> 		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
>> 		mb->ol_flags |= ol_flags;
>> -- 
>> 2.13.6
>> 
> 
> Regards,
> Keith

Regards,
Keith


* Re: [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance
  2018-11-24 16:24   ` Wiles, Keith
@ 2018-11-27  1:06     ` Zhang, Qi Z
  0 siblings, 0 replies; 11+ messages in thread
From: Zhang, Qi Z @ 2018-11-27  1:06 UTC (permalink / raw)
  To: Wiles, Keith
  Cc: Richardson, Bruce, dev, Lu, Wenzhuo, Iremonger, Bernard, stable



> -----Original Message-----
> From: Wiles, Keith
> Sent: Saturday, November 24, 2018 8:24 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; dev <dev@dpdk.org>; Lu,
> Wenzhuo <wenzhuo.lu@intel.com>; Iremonger, Bernard
> <bernard.iremonger@intel.com>; stable@dpdk.org
> Subject: Re: [PATCH] app/testpmd: improve MAC swap performance
> 
> 
> 
> > On Nov 23, 2018, at 4:43 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >
> >
> >
> >> On Nov 19, 2018, at 10:45 PM, Zhang, Qi Z <qi.z.zhang@intel.com> wrote:
> >>
> >> The patch optimizes the mac swap operation by taking advantage of SSE
> >> instructions, it only impacts x86 platform.
> >>
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
> >> ---
> >> app/test-pmd/macswap.c | 16 +++++++++++++++-
> >> 1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
> >> index a8384d5b8..0722782b0 100644
> >> --- a/app/test-pmd/macswap.c
> >> +++ b/app/test-pmd/macswap.c
> >> @@ -78,7 +78,6 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >> 	struct rte_port  *txp;
> >> 	struct rte_mbuf  *mb;
> >> 	struct ether_hdr *eth_hdr;
> >> -	struct ether_addr addr;
> >> 	uint16_t nb_rx;
> >> 	uint16_t nb_tx;
> >> 	uint16_t i;
> >> @@ -95,6 +94,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >> 	start_tsc = rte_rdtsc();
> >> #endif
> >>
> >> +#ifdef RTE_ARCH_X86
> >> +	__m128i addr;
> >> +	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> >> +					5, 4, 3, 2,
> >> +					1, 0, 11, 10,
> >> +					9, 8, 7, 6);
> >
> > I was playing around with these mask values and I was not able to make it work as I expected.
> > I ended up with different values in the mask.
> >
> > _mm_set_epi8(15, 14, 13, 12, 5, 4, 3, 2, 1, 0, 11, 10, 9, 8, 7, 6);
> >
> > After dumping the memory for a large number of tests, this one seems correct. Can you verify that your mask is correct?
> 
> Sorry, I do not know why I thought the code was not the same, but your example is correct; my mistake.

Thanks for reviewing and verifying this!

Regards
Qi

> >
> >> +#else
> >> +	struct ether_addr addr;
> >> +#endif
> >> 	/*
> >> 	 * Receive a burst of packets and forward them.
> >> 	 */
> >> @@ -123,9 +131,15 @@ pkt_burst_mac_swap(struct fwd_stream *fs)
> >> 		eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
> >>
> >> 		/* Swap dest and src mac addresses. */
> >> +#ifdef RTE_ARCH_X86
> >> +		addr = _mm_loadu_si128((__m128i *)eth_hdr);
> >> +		addr = _mm_shuffle_epi8(addr, shfl_msk);
> >> +		_mm_storeu_si128((__m128i *)eth_hdr, addr);
> >> +#else
> >> 		ether_addr_copy(&eth_hdr->d_addr, &addr);
> >> 		ether_addr_copy(&eth_hdr->s_addr, &eth_hdr->d_addr);
> >> 		ether_addr_copy(&addr, &eth_hdr->s_addr);
> >> +#endif
> >>
> >> 		mb->ol_flags &= IND_ATTACHED_MBUF | EXT_ATTACHED_MBUF;
> >> 		mb->ol_flags |= ol_flags;
> >> --
> >> 2.13.6
> >>
> >
> > Regards,
> > Keith
> 
> Regards,
> Keith


end of thread, other threads:[~2018-11-27  1:06 UTC | newest]

Thread overview: 11+ messages
2018-11-20  4:45 [dpdk-stable] [PATCH] app/testpmd: improve MAC swap performance Qi Zhang
2018-11-20  9:16 ` [dpdk-stable] [dpdk-dev] " Ananyev, Konstantin
2018-11-20 14:48   ` Wiles, Keith
2018-11-20 16:58   ` Zhang, Qi Z
2018-11-20 17:26     ` Ananyev, Konstantin
2018-11-20 22:53     ` Ananyev, Konstantin
2018-11-21 21:24       ` Zhang, Qi Z
2018-11-23 22:43 ` [dpdk-stable] " Wiles, Keith
2018-11-23 22:43 ` Wiles, Keith
2018-11-24 16:24   ` Wiles, Keith
2018-11-27  1:06     ` Zhang, Qi Z
