* [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
@ 2022-01-04 3:00 Ruifeng Wang
2022-02-10 6:24 ` Ruifeng Wang
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Ruifeng Wang @ 2022-01-04 3:00 UTC (permalink / raw)
To: matan, viacheslavo; +Cc: dev, honnappa.nagarahalli, stable, nd, Ruifeng Wang
In the NEON vector PMD, a vector load reads two contiguous 8B words of
descriptor data into a vector register. Since the vector load does not
guarantee 16B atomicity, the read of the word that includes the op_own
field could be reordered after the reads of the other words. In that
case, some words could contain invalid data.
Reloaded qword0 after the read barrier to update the vector register. This
ensures that the fetched data is correct.
Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.
Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index b1d16baa61..b1ec615b51 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
c0 = vld1q_u64((uint64_t *)(p0 + 48));
/* Synchronize for loading the rest of blocks. */
rte_io_rmb();
+ /* B.0 (CQE 3) reload lower half of the block. */
+ c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
+ /* B.0 (CQE 2) reload lower half of the block. */
+ c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
+ /* B.0 (CQE 1) reload lower half of the block. */
+ c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
+ /* B.0 (CQE 0) reload lower half of the block. */
+ c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
/* Prefetch next 4 CQEs. */
if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;
--
2.25.1
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-01-04 3:00 [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
@ 2022-02-10 6:24 ` Ruifeng Wang
2022-02-10 8:16 ` Slava Ovsiienko
2022-05-19 14:56 ` Ali Alnubani
2023-05-30 5:48 ` [PATCH v2] " Ruifeng Wang
2 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2022-02-10 6:24 UTC (permalink / raw)
To: matan, viacheslavo
Cc: dev, Honnappa Nagarahalli, stable, nd, Ruifeng Wang, nd
Ping.
Could you please help review this patch?
Thanks.
Ruifeng
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, January 4, 2022 11:01 AM
> To: matan@nvidia.com; viacheslavo@nvidia.com
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> Ruifeng Wang <Ruifeng.Wang@arm.com>
> Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
>
> In NEON vector PMD, vector load loads two contiguous 8B of descriptor data
> into vector register. Given vector load ensures no 16B atomicity, read of the
> word that includes op_own field could be reordered after read of other
> words. In this case, some words could contain invalid data.
>
> Reloaded qword0 after read barrier to update vector register. This ensures
> that the fetched data is correct.
>
> Testpmd single core test on N1SDP/ThunderX2 showed no performance
> drop.
>
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> index b1d16baa61..b1ec615b51 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> volatile struct mlx5_cqe *cq,
> c0 = vld1q_u64((uint64_t *)(p0 + 48));
> /* Synchronize for loading the rest of blocks. */
> rte_io_rmb();
> + /* B.0 (CQE 3) reload lower half of the block. */
> + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> + /* B.0 (CQE 2) reload lower half of the block. */
> + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> + /* B.0 (CQE 1) reload lower half of the block. */
> + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> + /* B.0 (CQE 0) reload lower half of the block. */
> + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
> /* Prefetch next 4 CQEs. */
> if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
> unsigned int next = pos +
> MLX5_VPMD_DESCS_PER_LOOP;
> --
> 2.25.1
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-02-10 6:24 ` Ruifeng Wang
@ 2022-02-10 8:16 ` Slava Ovsiienko
2022-02-10 8:29 ` Ruifeng Wang
0 siblings, 1 reply; 13+ messages in thread
From: Slava Ovsiienko @ 2022-02-10 8:16 UTC (permalink / raw)
To: Ruifeng Wang, Matan Azrad; +Cc: dev, Honnappa Nagarahalli, stable, nd, nd
Hi Ruifeng,
The patch looks reasonable, thank you.
Just curious - did you see a real issue with re-ordering in this code fragment?
And please let us do a performance check.
With best regards,
Slava
> -----Original Message-----
> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Sent: Thursday, February 10, 2022 8:25
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> Ruifeng Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> Ping.
> Please could you help to review this patch?
>
> Thanks.
> Ruifeng
>
> > -----Original Message-----
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > Sent: Tuesday, January 4, 2022 11:01 AM
> > To: matan@nvidia.com; viacheslavo@nvidia.com
> > Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > In NEON vector PMD, vector load loads two contiguous 8B of descriptor
> > data into vector register. Given vector load ensures no 16B atomicity,
> > read of the word that includes op_own field could be reordered after
> > read of other words. In this case, some words could contain invalid data.
> >
> > Reloaded qword0 after read barrier to update vector register. This
> > ensures that the fetched data is correct.
> >
> > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > drop.
> >
> > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > completions")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > index b1d16baa61..b1ec615b51 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> > volatile struct mlx5_cqe *cq,
> > c0 = vld1q_u64((uint64_t *)(p0 + 48));
> > /* Synchronize for loading the rest of blocks. */
> > rte_io_rmb();
> > + /* B.0 (CQE 3) reload lower half of the block. */
> > + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> > + /* B.0 (CQE 2) reload lower half of the block. */
> > + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> > + /* B.0 (CQE 1) reload lower half of the block. */
> > + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> > + /* B.0 (CQE 0) reload lower half of the block. */
> > + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
> > /* Prefetch next 4 CQEs. */
> > if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
> > unsigned int next = pos +
> > MLX5_VPMD_DESCS_PER_LOOP;
> > --
> > 2.25.1
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-02-10 8:16 ` Slava Ovsiienko
@ 2022-02-10 8:29 ` Ruifeng Wang
0 siblings, 0 replies; 13+ messages in thread
From: Ruifeng Wang @ 2022-02-10 8:29 UTC (permalink / raw)
To: Slava Ovsiienko, Matan Azrad
Cc: dev, Honnappa Nagarahalli, stable, nd, nd, nd
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Thursday, February 10, 2022 4:17 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Matan Azrad
> <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> Hi Ruifeng,
Hi Slava,
>
> Patch looks reasonable, thank you.
> Just curious - did you see the real issue with re-ordering in this code
> fragment?
No real issue was seen. This comes from analysis at the architecture level.
> And, please, let us do performance check.
Sure. Thank you.
>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Sent: Thursday, February 10, 2022 8:25
> > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>;
> > nd <nd@arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Ping.
> > Please could you help to review this patch?
> >
> > Thanks.
> > Ruifeng
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Sent: Tuesday, January 4, 2022 11:01 AM
> > > To: matan@nvidia.com; viacheslavo@nvidia.com
> > > Cc: dev@dpdk.org; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>;
> > > stable@dpdk.org; nd <nd@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>
> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > > vector path
> > >
> > > In NEON vector PMD, vector load loads two contiguous 8B of
> > > descriptor data into vector register. Given vector load ensures no
> > > 16B atomicity, read of the word that includes op_own field could be
> > > reordered after read of other words. In this case, some words could
> contain invalid data.
> > >
> > > Reloaded qword0 after read barrier to update vector register. This
> > > ensures that the fetched data is correct.
> > >
> > > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > > drop.
> > >
> > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > > completions")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> > > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > index b1d16baa61..b1ec615b51 100644
> > > --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
> > > @@ -647,6 +647,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
> > > volatile struct mlx5_cqe *cq,
> > > c0 = vld1q_u64((uint64_t *)(p0 + 48));
> > > /* Synchronize for loading the rest of blocks. */
> > > rte_io_rmb();
> > > + /* B.0 (CQE 3) reload lower half of the block. */
> > > + c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
> > > + /* B.0 (CQE 2) reload lower half of the block. */
> > > + c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
> > > + /* B.0 (CQE 1) reload lower half of the block. */
> > > + c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
> > > + /* B.0 (CQE 0) reload lower half of the block. */
> > > + c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
> > > /* Prefetch next 4 CQEs. */
> > > if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
> > > unsigned int next = pos +
> > > MLX5_VPMD_DESCS_PER_LOOP;
> > > --
> > > 2.25.1
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-01-04 3:00 [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
2022-02-10 6:24 ` Ruifeng Wang
@ 2022-05-19 14:56 ` Ali Alnubani
2022-06-20 5:37 ` Slava Ovsiienko
2023-05-30 5:48 ` [PATCH v2] " Ruifeng Wang
2 siblings, 1 reply; 13+ messages in thread
From: Ali Alnubani @ 2022-05-19 14:56 UTC (permalink / raw)
To: Ruifeng Wang, Matan Azrad, Slava Ovsiienko
Cc: dev, honnappa.nagarahalli, stable, nd
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, January 4, 2022 5:01 AM
> To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>
> Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
>
> In NEON vector PMD, vector load loads two contiguous 8B of
> descriptor data into vector register. Given vector load ensures no
> 16B atomicity, read of the word that includes op_own field could be
> reordered after read of other words. In this case, some words could
> contain invalid data.
>
> Reloaded qword0 after read barrier to update vector register. This
> ensures that the fetched data is correct.
>
> Testpmd single core test on N1SDP/ThunderX2 showed no performance
> drop.
>
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
Tested with BlueField-2 and didn't see a performance impact.
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Thanks,
Ali
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-05-19 14:56 ` Ali Alnubani
@ 2022-06-20 5:37 ` Slava Ovsiienko
2022-06-27 11:08 ` Ruifeng Wang
0 siblings, 1 reply; 13+ messages in thread
From: Slava Ovsiienko @ 2022-06-20 5:37 UTC (permalink / raw)
To: Ali Alnubani, Ruifeng Wang, Matan Azrad
Cc: dev, honnappa.nagarahalli, stable, nd
Hi, Ruifeng
My apologies for the review delay.
As far as I understand, the hypothetical problem scenario is:
- CPU core reorders reading of qwords of 16B vector
- core reads the second 8B of CQE (old CQE values)
- CQE update
- core reads the first 8B of CQE (new CQE values)
How can the re-reading of CQEs resolve the issue?
The same wrong scenario might happen on the second read,
and we would run into the same issue.
In my opinion, the right solution to cover potential reordering should be (a rough sketch follows after the list):
- read CQE
- check CQE status (first 8B)
- read memory barrier
- read the rest of CQE
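A rough, self-contained sketch of that ordering (the ownership check and the
barrier macro below are simplified placeholders for illustration, not the real
mlx5 definitions; the driver itself would use rte_io_rmb()):
#include <arm_neon.h>
#include <stdbool.h>
#include <stdint.h>
/* Stand-in read barrier for illustration only. */
#define io_rmb() __asm__ volatile("dmb oshld" ::: "memory")
/* Placeholder ownership check; the real driver decodes the op_own byte. */
static bool cqe_ready(uint64_t qword_with_op_own)
{
	return (qword_with_op_own >> 56) & 1;
}
/* blk points to the 16B block whose upper 8B (lane 1) carries op_own.
 * Returns true and fills *out only when the CQE is ready.
 */
static bool poll_cqe_block(const uint64_t *blk, uint64x2_t *out)
{
	uint64_t hi = blk[1];		/* 1. read the qword carrying op_own */
	if (!cqe_ready(hi))		/* 2. check CQE status first         */
		return false;
	io_rmb();			/* 3. read barrier                   */
	uint64_t lo = blk[0];		/* 4. read the rest of the CQE       */
	*out = vcombine_u64(vdup_n_u64(lo), vdup_n_u64(hi));
	return true;
}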
With best regards,
Slava
> -----Original Message-----
> From: Ali Alnubani <alialnu@nvidia.com>
> Sent: Thursday, May 19, 2022 17:56
> To: Ruifeng Wang <ruifeng.wang@arm.com>; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> nd@arm.com
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> > -----Original Message-----
> > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > Sent: Tuesday, January 4, 2022 5:01 AM
> > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> > nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>
> > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > In NEON vector PMD, vector load loads two contiguous 8B of descriptor
> > data into vector register. Given vector load ensures no 16B atomicity,
> > read of the word that includes op_own field could be reordered after
> > read of other words. In this case, some words could contain invalid
> > data.
> >
> > Reloaded qword0 after read barrier to update vector register. This
> > ensures that the fetched data is correct.
> >
> > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > drop.
> >
> > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > completions")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
>
> Tested with BlueField-2 and didn't see a performance impact.
>
> Tested-by: Ali Alnubani <alialnu@nvidia.com>
>
> Thanks,
> Ali
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-06-20 5:37 ` Slava Ovsiienko
@ 2022-06-27 11:08 ` Ruifeng Wang
2022-06-29 7:55 ` Slava Ovsiienko
0 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2022-06-27 11:08 UTC (permalink / raw)
To: Slava Ovsiienko, Ali Alnubani, Matan Azrad
Cc: dev, Honnappa Nagarahalli, stable, nd, nd
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Monday, June 20, 2022 1:38 PM
> To: Ali Alnubani <alialnu@nvidia.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> Hi, Ruifeng
Hi Slava,
Thanks for your review.
>
> My apologies for review delay.
Apologies too. I was occupied with something else.
> As far I understand the hypothetical problem scenario is:
> - CPU core reorders reading of qwords of 16B vector
> - core reads the second 8B of CQE (old CQE values)
> - CQE update
> - core reads the first 8B of CQE (new CQE values)
Yes, this is the problem.
>
> How the re-reading of CQEs can resolve the issue?
> This wrong scenario might happen on the second read and we would run into
> the same issue.
Here we are trying to order the reads of a 16B vector (the high 8B with op_own and the low 8B without op_own).
The first read loads the full 16B. The second read loads and updates only the low 8B (without op_own).
There are two possible statuses indicated by op_own: valid and invalid.
If the CQE status is invalid, there is no problem; the CQE will be ignored this time.
If the CQE status is valid, the second read ensures the rest of the CQE is no older than the high 8B (with op_own).
Assuming the NIC updates op_own no earlier than the rest of the CQE, I think the second read ensures the retrieved CQE content is correct.
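To make the ordering concrete, a minimal self-contained sketch of the pattern
(with a stand-in barrier for illustration; the driver itself uses rte_io_rmb()
inside the four-CQE unrolled loop):
#include <arm_neon.h>
#include <stdint.h>
/* Stand-in read barrier for illustration only. */
#define io_rmb() __asm__ volatile("dmb oshld" ::: "memory")
/* blk points to the 16B block whose upper 8B (lane 1) carries op_own. */
static uint64x2_t load_cqe_block(const uint64_t *blk)
{
	/* First read: the two 8B halves may be observed in either order. */
	uint64x2_t c = vld1q_u64(blk);
	/* Order the reload below after the load above. */
	io_rmb();
	/* Reload lane 0 (the half without op_own) so it is no older than
	 * the half that carried op_own in the first read.
	 */
	c = vld1q_lane_u64(blk, c, 0);
	return c;
}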
>
> In my opinion, the right solution to cover potential reordering should be:
> - read CQE
> - check CQE status (first 8B)
We don't need to check the CQE status at this point. See the explanation above.
> - read memory barrier
> - read the rest of CQE
>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Ali Alnubani <alialnu@nvidia.com>
> > Sent: Thursday, May 19, 2022 17:56
> > To: Ruifeng Wang <ruifeng.wang@arm.com>; Matan Azrad
> > <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> > nd@arm.com
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Sent: Tuesday, January 4, 2022 5:01 AM
> > > To: Matan Azrad <matan@nvidia.com>; Slava Ovsiienko
> > > <viacheslavo@nvidia.com>
> > > Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> > > nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>
> > > Subject: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > > vector path
> > >
> > > In NEON vector PMD, vector load loads two contiguous 8B of
> > > descriptor data into vector register. Given vector load ensures no
> > > 16B atomicity, read of the word that includes op_own field could be
> > > reordered after read of other words. In this case, some words could
> > > contain invalid data.
> > >
> > > Reloaded qword0 after read barrier to update vector register. This
> > > ensures that the fetched data is correct.
> > >
> > > Testpmd single core test on N1SDP/ThunderX2 showed no performance
> > > drop.
> > >
> > > Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> > > completions")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > ---
> >
> > Tested with BlueField-2 and didn't see a performance impact.
> >
> > Tested-by: Ali Alnubani <alialnu@nvidia.com>
> >
> > Thanks,
> > Ali
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-06-27 11:08 ` Ruifeng Wang
@ 2022-06-29 7:55 ` Slava Ovsiienko
2022-06-29 11:41 ` Ruifeng Wang
0 siblings, 1 reply; 13+ messages in thread
From: Slava Ovsiienko @ 2022-06-29 7:55 UTC (permalink / raw)
To: Ruifeng Wang, Ali Alnubani, Matan Azrad
Cc: dev, Honnappa Nagarahalli, stable, nd, nd
Hi, Ruifeng
> -----Original Message-----
> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Sent: Monday, June 27, 2022 14:08
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ali Alnubani
> <alialnu@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> stable@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> vector path
>
> > -----Original Message-----
> > From: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Sent: Monday, June 20, 2022 1:38 PM
> > To: Ali Alnubani <alialnu@nvidia.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>; Matan Azrad <matan@nvidia.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Hi, Ruifeng
>
> Hi Slava,
>
> Thanks for your review.
> >
> > My apologies for review delay.
>
> Apologies too. I was on something else.
>
> > As far I understand the hypothetical problem scenario is:
> > - CPU core reorders reading of qwords of 16B vector
> > - core reads the second 8B of CQE (old CQE values)
> > - CQE update
> > - core reads the first 8B of CQE (new CQE values)
>
> Yes, This is the problem.
> >
> > How the re-reading of CQEs can resolve the issue?
> > This wrong scenario might happen on the second read and we would run
> > into the same issue.
>
> Here we are trying to ordering reading of a 16B vector (8B with op_own -
> high, and 8B without op_own - low).
> The first read will load 16B. The second read will load and update low
> 8B (no op_own).
OK, I got the point, thank you for the explanation.
Can we avoid the first read of the low 8B (not containing the CQE owning field)?
I mean updating this part to read only the upper 8B:
/* B.0 (CQE 3) load a block having op_own. */
c3 = vld1q_u64((uint64_t *)(p3 + 48));
/* B.0 (CQE 2) load a block having op_own. */
c2 = vld1q_u64((uint64_t *)(p2 + 48));
/* B.0 (CQE 1) load a block having op_own. */
c1 = vld1q_u64((uint64_t *)(p1 + 48));
/* B.0 (CQE 0) load a block having op_own. */
c0 = vld1q_u64((uint64_t *)(p0 + 48));
/* Synchronize for loading the rest of blocks. */
rte_io_rmb();
Because the lower 8B will be overwritten by the second read (in your patch)
and the barrier ensures the correct order. A rough sketch of this variant is below.
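One possible shape of that variant, sketched for illustration only (hypothetical
helper name and stand-in barrier, not the driver code):
#include <arm_neon.h>
#include <stdint.h>
/* Stand-in read barrier for illustration only. */
#define io_rmb() __asm__ volatile("dmb oshld" ::: "memory")
/* blk points to the 16B block whose upper 8B (lane 1) carries op_own. */
static uint64x2_t load_cqe_block_upper_first(const uint64_t *blk)
{
	uint64x2_t c = vdupq_n_u64(0);
	/* Load only lane 1, the half carrying op_own. */
	c = vld1q_lane_u64(blk + 1, c, 1);
	io_rmb();
	/* Load lane 0, the half without op_own, after the barrier. */
	c = vld1q_lane_u64(blk, c, 0);
	return c;
}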
With best regards,
Slava
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-06-29 7:55 ` Slava Ovsiienko
@ 2022-06-29 11:41 ` Ruifeng Wang
2022-09-29 6:51 ` Ruifeng Wang
0 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2022-06-29 11:41 UTC (permalink / raw)
To: Slava Ovsiienko, Ali Alnubani, Matan Azrad
Cc: dev, Honnappa Nagarahalli, stable, nd, nd, nd
> -----Original Message-----
> From: Slava Ovsiienko <viacheslavo@nvidia.com>
> Sent: Wednesday, June 29, 2022 3:55 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Ali Alnubani
> <alialnu@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>;
> nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> Hi, Ruifeng
>
> > -----Original Message-----
> > From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Sent: Monday, June 27, 2022 14:08
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ali Alnubani
> > <alialnu@nvidia.com>; Matan Azrad <matan@nvidia.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > > -----Original Message-----
> > > From: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > Sent: Monday, June 20, 2022 1:38 PM
> > > To: Ali Alnubani <alialnu@nvidia.com>; Ruifeng Wang
> > > <Ruifeng.Wang@arm.com>; Matan Azrad <matan@nvidia.com>
> > > Cc: dev@dpdk.org; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>
> > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in
> > > NEON vector path
> > >
> > > Hi, Ruifeng
> >
> > Hi Slava,
> >
> > Thanks for your review.
> > >
> > > My apologies for review delay.
> >
> > Apologies too. I was on something else.
> >
> > > As far I understand the hypothetical problem scenario is:
> > > - CPU core reorders reading of qwords of 16B vector
> > > - core reads the second 8B of CQE (old CQE values)
> > > - CQE update
> > > - core reads the first 8B of CQE (new CQE values)
> >
> > Yes, This is the problem.
> > >
> > > How the re-reading of CQEs can resolve the issue?
> > > This wrong scenario might happen on the second read and we would run
> > > into the same issue.
> >
> > Here we are trying to ordering reading of a 16B vector (8B with op_own
> > - high, and 8B without op_own - low).
> > The first read will load 16B. The second read will load and update low
> > 8B (no op_own).
> OK, I got the point, thank you for the explanations.
> Can we avoid the first reading of low 8B (no containing CQE owning field)?
>
> I mean to update this part to read only upper 8Bs:
> /* B.0 (CQE 3) load a block having op_own. */
> c3 = vld1q_u64((uint64_t *)(p3 + 48));
> /* B.0 (CQE 2) load a block having op_own. */
> c2 = vld1q_u64((uint64_t *)(p2 + 48));
> /* B.0 (CQE 1) load a block having op_own. */
> c1 = vld1q_u64((uint64_t *)(p1 + 48));
> /* B.0 (CQE 0) load a block having op_own. */
> c0 = vld1q_u64((uint64_t *)(p0 + 48));
> /* Synchronize for loading the rest of blocks. */
> rte_io_rmb();
>
> Because lower 8Bs will be overlapped with the second read (in your patch)
> and barrier ensures the correct order.
Hi Slava,
Yes, your suggestion is valid.
Actually, I tried that approach: load the higher 8B + barrier + load the lower 8B + combine the two 8B halves into a vector (a sketch follows below).
It also has no observable performance impact, but it generates more instructions than the current patch (the extra 'combine' operation).
So I kept the current approach.
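A sketch of that tried-and-dropped variant (illustrative name and stand-in
barrier, not the driver code):
#include <arm_neon.h>
#include <stdint.h>
/* Stand-in read barrier for illustration only. */
#define io_rmb() __asm__ volatile("dmb oshld" ::: "memory")
/* blk points to the 16B block whose upper 8B carries op_own. */
static uint64x2_t load_cqe_block_combine(const uint64_t *blk)
{
	/* Load the higher 8B (with op_own) first. */
	uint64x1_t hi = vld1_u64(blk + 1);
	io_rmb();
	/* Load the lower 8B, then combine; the extra vcombine is the cost. */
	uint64x1_t lo = vld1_u64(blk);
	return vcombine_u64(lo, hi);
}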
Thanks.
>
>
> With best regards,
> Slava
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-06-29 11:41 ` Ruifeng Wang
@ 2022-09-29 6:51 ` Ruifeng Wang
2023-03-07 16:59 ` Slava Ovsiienko
0 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2022-09-29 6:51 UTC (permalink / raw)
To: Slava Ovsiienko, Ali Alnubani, Matan Azrad
Cc: dev, Honnappa Nagarahalli, stable, nd, nd
> -----Original Message-----
> From: Ruifeng Wang
> Sent: Wednesday, June 29, 2022 7:41 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ali Alnubani <alialnu@nvidia.com>; Matan
> Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd
> <nd@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
>
> > -----Original Message-----
> > From: Slava Ovsiienko <viacheslavo@nvidia.com>
> > Sent: Wednesday, June 29, 2022 3:55 PM
> > To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Ali Alnubani
> > <alialnu@nvidia.com>; Matan Azrad <matan@nvidia.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > stable@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Hi, Ruifeng
> >
> > > -----Original Message-----
> > > From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > > Sent: Monday, June 27, 2022 14:08
> > > To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ali Alnubani
> > > <alialnu@nvidia.com>; Matan Azrad <matan@nvidia.com>
> > > Cc: dev@dpdk.org; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>;
> > > stable@dpdk.org; nd <nd@arm.com>; nd <nd@arm.com>
> > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in
> > > NEON vector path
> > >
> > > > -----Original Message-----
> > > > From: Slava Ovsiienko <viacheslavo@nvidia.com>
> > > > Sent: Monday, June 20, 2022 1:38 PM
> > > > To: Ali Alnubani <alialnu@nvidia.com>; Ruifeng Wang
> > > > <Ruifeng.Wang@arm.com>; Matan Azrad <matan@nvidia.com>
> > > > Cc: dev@dpdk.org; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>
> > > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in
> > > > NEON vector path
> > > >
> > > > Hi, Ruifeng
> > >
> > > Hi Slava,
> > >
> > > Thanks for your review.
> > > >
> > > > My apologies for review delay.
> > >
> > > Apologies too. I was on something else.
> > >
> > > > As far I understand the hypothetical problem scenario is:
> > > > - CPU core reorders reading of qwords of 16B vector
> > > > - core reads the second 8B of CQE (old CQE values)
> > > > - CQE update
> > > > - core reads the first 8B of CQE (new CQE values)
> > >
> > > Yes, This is the problem.
> > > >
> > > > How the re-reading of CQEs can resolve the issue?
> > > > This wrong scenario might happen on the second read and we would
> > > > run into the same issue.
> > >
> > > Here we are trying to ordering reading of a 16B vector (8B with
> > > op_own
> > > - high, and 8B without op_own - low).
> > > The first read will load 16B. The second read will load and update
> > > low 8B (no op_own).
> > OK, I got the point, thank you for the explanations.
> > Can we avoid the first reading of low 8B (no containing CQE owning field)?
> >
> > I mean to update this part to read only upper 8Bs:
> > /* B.0 (CQE 3) load a block having op_own. */
> > c3 = vld1q_u64((uint64_t *)(p3 + 48));
> > /* B.0 (CQE 2) load a block having op_own. */
> > c2 = vld1q_u64((uint64_t *)(p2 + 48));
> > /* B.0 (CQE 1) load a block having op_own. */
> > c1 = vld1q_u64((uint64_t *)(p1 + 48));
> > /* B.0 (CQE 0) load a block having op_own. */
> > c0 = vld1q_u64((uint64_t *)(p0 + 48));
> > /* Synchronize for loading the rest of blocks. */
> > rte_io_rmb();
> >
> > Because lower 8Bs will be overlapped with the second read (in your
> > patch) and barrier ensures the correct order.
>
> Hi Slava,
>
> Yes, your suggestion is valid.
> Actually, I tried that approach: load higher 8B + barrier + load lower 8B + combine the
> two 8Bs into a vector.
> It also has no observable performance impact but generates more instructions compared to
> the current patch (the 'combine' operation).
> So I followed current approach.
>
> Thanks.
> >
Hi Slava,
Are there any further comments?
Thanks,
Ruifeng
* RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-09-29 6:51 ` Ruifeng Wang
@ 2023-03-07 16:59 ` Slava Ovsiienko
0 siblings, 0 replies; 13+ messages in thread
From: Slava Ovsiienko @ 2023-03-07 16:59 UTC (permalink / raw)
To: Ruifeng Wang, Ali Alnubani, Matan Azrad
Cc: dev, Honnappa Nagarahalli, stable, nd, nd
> -----Original Message-----
> From: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Sent: Thursday, September 29, 2022 09:51
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; Ali Alnubani
> <alialnu@nvidia.com>; Matan Azrad <matan@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>; nd
> <nd@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
> Hi Slava,
>
> Are there any further comments?
>
Hi, Ruifeng
I've recalled the context and re-reviewed the patch.
There is no performance impact and I have run out of objections 😊
My sincere apologies for the long delay ☹
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
* [PATCH v2] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2022-01-04 3:00 [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
2022-02-10 6:24 ` Ruifeng Wang
2022-05-19 14:56 ` Ali Alnubani
@ 2023-05-30 5:48 ` Ruifeng Wang
2023-06-19 12:13 ` Raslan Darawsheh
2 siblings, 1 reply; 13+ messages in thread
From: Ruifeng Wang @ 2023-05-30 5:48 UTC (permalink / raw)
To: rasland, matan, viacheslavo
Cc: dev, honnappa.nagarahalli, stable, nd, Ruifeng Wang, Ali Alnubani
In the NEON vector PMD, a vector load reads two contiguous 8B words of
descriptor data into a vector register. Since the vector load does not
guarantee 16B atomicity, the read of the word that includes the op_own
field could be reordered after the reads of the other words. In that
case, some words could contain invalid data.
Reloaded qword0 after the read barrier to update the vector register. This
ensures that the fetched data is correct.
Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.
Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx completions")
Cc: stable@dpdk.org
Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
Tested-by: Ali Alnubani <alialnu@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
v2: Rebased and added the tags that were received.
drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 75e8ed7e5a..9079da65de 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -675,6 +675,14 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
c0 = vld1q_u64((uint64_t *)(p0 + 48));
/* Synchronize for loading the rest of blocks. */
rte_io_rmb();
+ /* B.0 (CQE 3) reload lower half of the block. */
+ c3 = vld1q_lane_u64((uint64_t *)(p3 + 48), c3, 0);
+ /* B.0 (CQE 2) reload lower half of the block. */
+ c2 = vld1q_lane_u64((uint64_t *)(p2 + 48), c2, 0);
+ /* B.0 (CQE 1) reload lower half of the block. */
+ c1 = vld1q_lane_u64((uint64_t *)(p1 + 48), c1, 0);
+ /* B.0 (CQE 0) reload lower half of the block. */
+ c0 = vld1q_lane_u64((uint64_t *)(p0 + 48), c0, 0);
/* Prefetch next 4 CQEs. */
if (pkts_n - pos >= 2 * MLX5_VPMD_DESCS_PER_LOOP) {
unsigned int next = pos + MLX5_VPMD_DESCS_PER_LOOP;
--
2.25.1
* RE: [PATCH v2] net/mlx5: fix risk in Rx descriptor read in NEON vector path
2023-05-30 5:48 ` [PATCH v2] " Ruifeng Wang
@ 2023-06-19 12:13 ` Raslan Darawsheh
0 siblings, 0 replies; 13+ messages in thread
From: Raslan Darawsheh @ 2023-06-19 12:13 UTC (permalink / raw)
To: Ruifeng Wang, Matan Azrad, Slava Ovsiienko
Cc: dev, honnappa.nagarahalli, stable, nd, Ali Alnubani
Hi,
> -----Original Message-----
> From: Ruifeng Wang <ruifeng.wang@arm.com>
> Sent: Tuesday, May 30, 2023 8:48 AM
> To: Raslan Darawsheh <rasland@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Slava Ovsiienko <viacheslavo@nvidia.com>
> Cc: dev@dpdk.org; honnappa.nagarahalli@arm.com; stable@dpdk.org;
> nd@arm.com; Ruifeng Wang <ruifeng.wang@arm.com>; Ali Alnubani
> <alialnu@nvidia.com>
> Subject: [PATCH v2] net/mlx5: fix risk in Rx descriptor read in NEON vector
> path
>
> In NEON vector PMD, vector load loads two contiguous 8B of
> descriptor data into vector register. Given vector load ensures no
> 16B atomicity, read of the word that includes op_own field could be
> reordered after read of other words. In this case, some words could
> contain invalid data.
>
> Reloaded qword0 after read barrier to update vector register. This
> ensures that the fetched data is correct.
>
> Testpmd single core test on N1SDP/ThunderX2 showed no performance drop.
>
> Fixes: 1742c2d9fab0 ("net/mlx5: fix synchronization on polling Rx
> completions")
> Cc: stable@dpdk.org
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Tested-by: Ali Alnubani <alialnu@nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> ---
> v2: Rebased and added tags that received.
>
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
end of thread
Thread overview: 13+ messages
2022-01-04 3:00 [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector path Ruifeng Wang
2022-02-10 6:24 ` Ruifeng Wang
2022-02-10 8:16 ` Slava Ovsiienko
2022-02-10 8:29 ` Ruifeng Wang
2022-05-19 14:56 ` Ali Alnubani
2022-06-20 5:37 ` Slava Ovsiienko
2022-06-27 11:08 ` Ruifeng Wang
2022-06-29 7:55 ` Slava Ovsiienko
2022-06-29 11:41 ` Ruifeng Wang
2022-09-29 6:51 ` Ruifeng Wang
2023-03-07 16:59 ` Slava Ovsiienko
2023-05-30 5:48 ` [PATCH v2] " Ruifeng Wang
2023-06-19 12:13 ` Raslan Darawsheh