* [PATCH] net/mlx5: fix unbind of incorrect hairpin queue
From: Dariusz Sosnowski @ 2023-11-09 18:01 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam, Suanming Mou, Bing Zhao
  Cc: dev, Raslan Darawsheh, stable

Let's take an application with the following configuration:

- It uses 2 ports.
- Each port has 3 Rx queues and 3 Tx queues.
- On each port, Rx queues have the following purposes:
  - Rx queue 0 - SW queue,
  - Rx queue 1 - hairpin queue, bound to Tx queue on the same port,
  - Rx queue 2 - hairpin queue, bound to Tx queue on another port.
- On each port, Tx queues have the following purposes:
  - Tx queue 0 - SW queue,
  - Tx queue 1 - hairpin queue, bound to Rx queue on the same port,
  - Tx queue 2 - hairpin queue, bound to Rx queue on another port.
- The application configured all of the hairpin queues for manual
  binding (a minimal setup sketch is shown below).
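
For illustration only, here is a rough sketch of how the Rx side of such
a configuration could be set up on one port. The function name, port ids,
queue indices and descriptor count are hypothetical and not taken from
the application described above; Tx hairpin queues would be set up
analogously with rte_eth_tx_hairpin_queue_setup().

  #include <rte_ethdev.h>

  /* Hypothetical sketch: hairpin Rx queue setup on one port. */
  static int
  setup_hairpin_rx_queues(uint16_t port_id, uint16_t peer_port_id)
  {
          struct rte_eth_hairpin_conf conf = {
                  .peer_count = 1,
                  /* All hairpin queues use manual binding. */
                  .manual_bind = 1,
                  /* Explicit Tx flow mode, commonly paired with manual bind. */
                  .tx_explicit = 1,
          };
          int ret;

          /* Rx queue 1: hairpin queue, peer is Tx queue 1 on the same port. */
          conf.peers[0].port = port_id;
          conf.peers[0].queue = 1;
          ret = rte_eth_rx_hairpin_queue_setup(port_id, 1, 64, &conf);
          if (ret != 0)
                  return ret;

          /* Rx queue 2: hairpin queue, peer is Tx queue 2 on the other port. */
          conf.peers[0].port = peer_port_id;
          conf.peers[0].queue = 2;
          return rte_eth_rx_hairpin_queue_setup(port_id, 2, 64, &conf);
  }

Rx queue 0 (the SW queue) would be set up with the regular
rte_eth_rx_queue_setup() call instead.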

After ports are configured and queues are set up,
if the application does the following API call sequence:

1. rte_eth_dev_start(port_id=0)
2. rte_eth_hairpin_bind(tx_port=0, rx_port=0)
3. rte_eth_hairpin_bind(tx_port=0, rx_port=1)

mlx5 PMD fails to modify SQ and logs this error:

  mlx5_common: mlx5_devx_cmds.c:2079: mlx5_devx_cmd_modify_sq():
    Failed to modify SQ using DevX
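
For reference, that call sequence could look roughly like this in
application code (a sketch under the assumptions above; the wrapper
function name is hypothetical, and port 1 is deliberately never
started, which is what makes step (3) fail):

  #include <rte_ethdev.h>

  /* Hypothetical sketch of the failing call sequence described above. */
  static int
  reproduce_failure(void)
  {
          int ret;

          ret = rte_eth_dev_start(0);        /* (1) start port 0 only */
          if (ret != 0)
                  return ret;
          ret = rte_eth_hairpin_bind(0, 0);  /* (2) port 0 Tx -> port 0 Rx: succeeds */
          if (ret != 0)
                  return ret;
          /* (3) port 0 Tx -> port 1 Rx: fails, since port 1 was not started. */
          return rte_eth_hairpin_bind(0, 1);
  }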

This error was caused by an incorrect unbind operation performed during
error handling inside call (3).

(3) fails, because port 1 (Rx side of the hairpin) was not started.
As a result of this failure, the PMD goes into error handling, where all
previously bound hairpin queues are unbound.
This is incorrect, since this error handling procedure in the
rte_eth_hairpin_bind() implementation assumes that all hairpin queues
are bound to the same rx_port, which is not the case here.
This results in the following sequence of function calls:

- rte_eth_hairpin_queue_peer_unbind(rx_port=**1**, rx_queue=1, 0),
- mlx5_hairpin_queue_peer_unbind(dev=**port 0**, tx_queue=1, 1).

This violates the hairpin queue destroy flow by unbinding Tx queue 1
on port 0 before unbinding Rx queue 1 on port 1.

This patch fixes that behavior by filtering the Tx queues on which error
handling is performed, so that it affects only:

- hairpin queues (this also reduces unnecessary debug log messages),
- hairpin queues connected to the rx_port which is currently being processed.

Fixes: 37cd4501e873 ("net/mlx5: support two ports hairpin mode")
Cc: bingz@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/net/mlx5/mlx5_trigger.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 88dc271a21..329fa7da3e 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -845,6 +845,11 @@ mlx5_hairpin_bind_single_port(struct rte_eth_dev *dev, uint16_t rx_port)
 		txq_ctrl = mlx5_txq_get(dev, i);
 		if (txq_ctrl == NULL)
 			continue;
+		if (!txq_ctrl->is_hairpin ||
+		    txq_ctrl->hairpin_conf.peers[0].port != rx_port) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
 		rx_queue = txq_ctrl->hairpin_conf.peers[0].queue;
 		rte_eth_hairpin_queue_peer_unbind(rx_port, rx_queue, 0);
 		mlx5_hairpin_queue_peer_unbind(dev, i, 1);
-- 
2.25.1

