DPDK patches and discussions
* [dpdk-dev]  [PATCH] examples/l3fwd: fix NEON instructions
@ 2017-10-29  7:48 Guduri Prathyusha
  2017-10-29  8:24 ` Guduri Prathyusha
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Guduri Prathyusha @ 2017-10-29  7:48 UTC (permalink / raw)
  To: tomasz.kantecki; +Cc: jianbo.liu, guduriprathyusha, dev, Guduri Prathyusha

To group consecutive packets with the same destination port in bursts
of 4, the NEON vectors dp1 and dp2 are computed such that, for
dst_port[]={a,b,c,d,e,f,g,h,i...}, dp1 contains <a,b,c,d> and dp2
contains <b,c,d,e> in the first iteration. In the next iteration dp1
should be <e,f,g,h> and dp2 should be <f,g,h,i>. In the last iteration
dp2 should be <w,x,y,y>.

The existing code, however, computes dp1 as <d,e,f,g> from the second
iteration onward, which in turn yields an incorrect dp2 of <d,e,f,f>
in the last iteration.

This patch fixes the incorrect ARM NEON instructions that compute dp1
and dp2.

Fixes: 569b290cdb36 ("examples/l3fwd: add NEON implementation")

Signed-off-by: Guduri Prathyusha <gprathyusha@caviumnetworks.com>
---
 examples/l3fwd/l3fwd_neon.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
index 42d50d3c2..1eace4e03 100644
--- a/examples/l3fwd/l3fwd_neon.h
+++ b/examples/l3fwd/l3fwd_neon.h
@@ -192,13 +192,13 @@ send_packets_multi(struct lcore_conf *qconf, struct rte_mbuf **pkts_burst,
 			 * dp1:
 			 * <d[j], d[j+1], d[j+2], d[j+3], ... >
 			 */
-			dp1 = vextq_u16(dp1, dp1, FWDSTEP - 1);
+			dp1 = vextq_u16(dp2, vdupq_n_u16(0), FWDSTEP - 1);
 		}

 		/*
 		 * dp2: <d[j-3], d[j-2], d[j-1], d[j-1], ... >
 		 */
-		dp2 = vextq_u16(dp1, dp1, 1);
+		dp2 = vextq_u16(dp1, vdupq_n_u16(0), 1);
 		dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
 		lp  = port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2);

--
2.14.1
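
As an illustration of the lane arithmetic described in the patch, here is a minimal scalar sketch in portable C. It models vextq_u16(a, b, n) as "lanes a[n..7] followed by b[0..n-1]" and compares the pre-patch dp1 update with the patched one. The helper names (ext_u16x8, next_dp1_old, next_dp1_patched) are hypothetical and not part of l3fwd. The sketch also assumes that dp2 is loaded starting at dst_port[j - 3], as the <b,c,d,e> example in the description implies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES   8  /* a uint16x8_t holds eight 16-bit lanes */
#define FWDSTEP 4  /* packets handled per iteration, as in l3fwd */

/* Scalar model of vextq_u16(a, b, n): lanes a[n..7] followed by b[0..n-1]. */
static void ext_u16x8(const uint16_t a[LANES], const uint16_t b[LANES],
		      int n, uint16_t out[LANES])
{
	uint16_t tmp[LANES];
	int i;

	assert(n >= 0 && n < LANES);
	for (i = 0; i < LANES; i++)
		tmp[i] = (i + n < LANES) ? a[i + n] : b[i + n - LANES];
	memcpy(out, tmp, sizeof(tmp)); /* out may alias a or b */
}

/* Next dp1 as the pre-patch code computed it: rotate dp1 by itself. */
static void next_dp1_old(const uint16_t dp1[LANES], uint16_t out[LANES])
{
	ext_u16x8(dp1, dp1, FWDSTEP - 1, out);
}

/* Next dp1 as this patch computes it: shift dp2, filling with zeros. */
static void next_dp1_patched(const uint16_t dp2[LANES], uint16_t out[LANES])
{
	static const uint16_t zero[LANES] = {0};

	ext_u16x8(dp2, zero, FWDSTEP - 1, out);
}
```

With dst_port = {a,b,c,...}, next_dp1_old applied to dp1 = <a,...,h> gives <d,e,f,g> in its first four lanes, whereas next_dp1_patched applied to dp2 = <b,...,i> gives <e,f,g,h>.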

* Re: [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions
  2017-10-29  7:48 [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions Guduri Prathyusha
@ 2017-10-29  8:24 ` Guduri Prathyusha
  2017-10-30  5:59 ` Jianbo Liu
  2017-10-30  6:27 ` Jianbo Liu
  2 siblings, 0 replies; 5+ messages in thread
From: Guduri Prathyusha @ 2017-10-29  8:24 UTC (permalink / raw)
  To: tomasz.kantecki; +Cc: dev, jianbo.liu

+ jianbo.liu@arm.com

* Re: [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions
  2017-10-29  7:48 [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions Guduri Prathyusha
  2017-10-29  8:24 ` Guduri Prathyusha
@ 2017-10-30  5:59 ` Jianbo Liu
  2017-10-30  6:27 ` Jianbo Liu
  2 siblings, 0 replies; 5+ messages in thread
From: Jianbo Liu @ 2017-10-30  5:59 UTC (permalink / raw)
  To: Guduri Prathyusha; +Cc: tomasz.kantecki, jianbo.liu, guduriprathyusha, dev

The 10/29/2017 13:18, Guduri Prathyusha wrote:
> diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> index 42d50d3c2..1eace4e03 100644
> --- a/examples/l3fwd/l3fwd_neon.h
> +++ b/examples/l3fwd/l3fwd_neon.h
> @@ -192,13 +192,13 @@ send_packets_multi(struct lcore_conf *qconf, struct rte_mbuf **pkts_burst,
>                        * dp1:
>                        * <d[j], d[j+1], d[j+2], d[j+3], ... >
>                        */
> -                     dp1 = vextq_u16(dp1, dp1, FWDSTEP - 1);
> +                     dp1 = vextq_u16(dp2, vdupq_n_u16(0), FWDSTEP - 1);

Can you write it as "dp1 = vextq_u16(dp2, dp1, FWDSTEP - 1)"? It's my typo.

>               }
>
>               /*
>                * dp2: <d[j-3], d[j-2], d[j-1], d[j-1], ... >
>                */
> -             dp2 = vextq_u16(dp1, dp1, 1);
> +             dp2 = vextq_u16(dp1, vdupq_n_u16(0), 1);

Same as above.

>               dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
>               lp  = port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2);
>
> --
> 2.14.1
>
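
The suggested form, dp1 = vextq_u16(dp2, dp1, FWDSTEP - 1), can be checked with a small scalar replay of the loop. This is only a sketch under assumptions: ext_u16x8 and dp1_tracks_ports are hypothetical helpers, dp2 is assumed to be loaded from dst_port[j - FWDSTEP + 1] on every iteration (the load is not visible in the quoted hunk), and only the first FWDSTEP lanes are assumed to matter to the grouping step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES   8
#define FWDSTEP 4

/* Scalar model of vextq_u16(a, b, n): lanes a[n..7] followed by b[0..n-1]. */
static void ext_u16x8(const uint16_t a[LANES], const uint16_t b[LANES],
		      int n, uint16_t out[LANES])
{
	uint16_t tmp[LANES];
	int i;

	assert(n >= 0 && n < LANES);
	for (i = 0; i < LANES; i++)
		tmp[i] = (i + n < LANES) ? a[i + n] : b[i + n - LANES];
	memcpy(out, tmp, sizeof(tmp)); /* out may alias a or b */
}

/*
 * Replay the dp1 update for a burst of k ports (k a positive multiple of
 * FWDSTEP). Returns 1 if the first FWDSTEP lanes of dp1 equal d[j..j+3]
 * after every update, 0 otherwise or if k is not a valid burst length.
 * d must have at least k + LANES readable entries.
 */
static int dp1_tracks_ports(const uint16_t *d, int k)
{
	uint16_t dp1[LANES], dp2[LANES];
	int i, j;

	if (k < FWDSTEP || k % FWDSTEP != 0)
		return 0;

	memcpy(dp1, d, sizeof(dp1));            /* initial load of dp1 */
	for (j = FWDSTEP; j != k; j += FWDSTEP) {
		/* assumed load of dp2: <d[j-3], d[j-2], d[j-1], d[j], ...> */
		memcpy(dp2, &d[j - FWDSTEP + 1], sizeof(dp2));
		ext_u16x8(dp2, dp1, FWDSTEP - 1, dp1);
		for (i = 0; i < FWDSTEP; i++)
			if (dp1[i] != d[j + i])
				return 0;
	}
	return 1;
}
```

In this model the upper lanes of the result come from the old dp1 instead of zeros, but the first four lanes are the same as with the zero-filled variant, so both forms track d[j..j+3].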

--
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

* Re: [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions
  2017-10-29  7:48 [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions Guduri Prathyusha
  2017-10-29  8:24 ` Guduri Prathyusha
  2017-10-30  5:59 ` Jianbo Liu
@ 2017-10-30  6:27 ` Jianbo Liu
  2017-10-30  7:14   ` Guduri Prathyusha
  2 siblings, 1 reply; 5+ messages in thread
From: Jianbo Liu @ 2017-10-30  6:27 UTC (permalink / raw)
  To: Guduri Prathyusha; +Cc: tomasz.kantecki, jianbo.liu, guduriprathyusha, dev

The 10/29/2017 13:18, Guduri Prathyusha wrote:
> diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> index 42d50d3c2..1eace4e03 100644
> --- a/examples/l3fwd/l3fwd_neon.h
> +++ b/examples/l3fwd/l3fwd_neon.h
> @@ -192,13 +192,13 @@ send_packets_multi(struct lcore_conf *qconf, struct rte_mbuf **pkts_burst,
>                        * dp1:
>                        * <d[j], d[j+1], d[j+2], d[j+3], ... >
>                        */
> -                     dp1 = vextq_u16(dp1, dp1, FWDSTEP - 1);
> +                     dp1 = vextq_u16(dp2, vdupq_n_u16(0), FWDSTEP - 1);
>               }
>
>               /*
>                * dp2: <d[j-3], d[j-2], d[j-1], d[j-1], ... >
>                */
> -             dp2 = vextq_u16(dp1, dp1, 1);
> +             dp2 = vextq_u16(dp1, vdupq_n_u16(0), 1);

Sorry, I don't think you need to change this line. Please ignore my
comment about it in the last email.

>               dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
>               lp  = port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2);
>
> --
> 2.14.1
>


* Re: [dpdk-dev] [PATCH] examples/l3fwd: fix NEON instructions
  2017-10-30  6:27 ` Jianbo Liu
@ 2017-10-30  7:14   ` Guduri Prathyusha
  0 siblings, 0 replies; 5+ messages in thread
From: Guduri Prathyusha @ 2017-10-30  7:14 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev, tomasz.kantecki, guduriprathyusha

On Mon, Oct 30, 2017 at 02:27:09PM +0800, Jianbo Liu wrote:
> The 10/29/2017 13:18, Guduri Prathyusha wrote:
> > diff --git a/examples/l3fwd/l3fwd_neon.h b/examples/l3fwd/l3fwd_neon.h
> > index 42d50d3c2..1eace4e03 100644
> > --- a/examples/l3fwd/l3fwd_neon.h
> > +++ b/examples/l3fwd/l3fwd_neon.h
> > @@ -192,13 +192,13 @@ send_packets_multi(struct lcore_conf *qconf, struct rte_mbuf **pkts_burst,
> >                        * dp1:
> >                        * <d[j], d[j+1], d[j+2], d[j+3], ... >
> >                        */
> > -                     dp1 = vextq_u16(dp1, dp1, FWDSTEP - 1);
> > +                     dp1 = vextq_u16(dp2, vdupq_n_u16(0), FWDSTEP - 1);
> >               }
> >
> >               /*
> >                * dp2: <d[j-3], d[j-2], d[j-1], d[j-1], ... >
> >                */
> > -             dp2 = vextq_u16(dp1, dp1, 1);
> > +             dp2 = vextq_u16(dp1, vdupq_n_u16(0), 1);
>
> Sorry, I don't think you need to change this line. Please ignore my
> comment about it in the last email.
>
Thanks for the quick review. Changing dp1 as you suggested and leaving
dp2 as is solves the issue. I will send a v2 that changes only dp1.

Prathyusha
> >               dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
> >               lp  = port_groupx4(&pnum[j - FWDSTEP], lp, dp1, dp2);
> >
> > --
> > 2.14.1
> >
>
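
For reference, the tail computation of dp2 that stays unchanged can be modeled in scalar C as follows. This is a sketch only: tail_dp2 is a hypothetical name, and it assumes dp1 already holds the last four ports in its first four lanes when the loop exits.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/*
 * Scalar model of the tail step:
 *   dp2 = vextq_u16(dp1, dp1, 1);
 *   dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
 * dp1 and dp2 must not overlap.
 */
static void tail_dp2(const uint16_t dp1[LANES], uint16_t dp2[LANES])
{
	int i;

	assert(dp1 != dp2);

	/* vextq_u16(dp1, dp1, 1): rotate the eight lanes left by one. */
	for (i = 0; i < LANES; i++)
		dp2[i] = dp1[(i + 1) % LANES];

	/* Duplicate lane 2 into lane 3 so the last port is compared to itself. */
	dp2[3] = dp2[2];
}
```

If dp1 starts with <v,w,x,y>, the first four lanes of dp2 become <w,x,y,y>, which is the last-iteration value the patch description asks for.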

