DPDK patches and discussions
* [PATCH] net/netvsc: fix number Tx queues > Rx queues
@ 2024-02-29 19:29 Alan Elder
  2024-02-29 21:53 ` Stephen Hemminger
  2024-03-08 18:09 ` [PATCH v2] " Alan Elder
  0 siblings, 2 replies; 19+ messages in thread
From: Alan Elder @ 2024-02-29 19:29 UTC (permalink / raw)
  To: Long Li, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

The previous code allowed the number of Tx queues to be set higher than
the number of Rx queues.  If a packet was sent on a Tx queue with index
>= number Rx queues there was a segfault.

This commit fixes the issue by creating an Rx queue for every Tx queue
meaning that an event buffer is allocated to handle receiving Tx
completion messages.

mbuf pool and Rx ring are not allocated for these additional Rx queues
and RSS configuration ensures that no packets are received on them.

Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: sthemmin@microsoft.com
Cc: stable@dpdk.org

Signed-off-by: Alan Elder <alan.elder@microsoft.com>
---
 drivers/net/netvsc/hn_ethdev.c |  9 ++++++++
 drivers/net/netvsc/hn_rxtx.c   | 40 ++++++++++++++++++++++++++++++++++
 drivers/net/netvsc/hn_var.h    |  1 +
 3 files changed, 50 insertions(+)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8a32832d7..4111f26a37 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -313,6 +313,15 @@ static int hn_rss_reta_update(struct rte_eth_dev *dev,
 
 		if (reta_conf[idx].mask & mask)
 			hv->rss_ind[i] = reta_conf[idx].reta[shift];
+
+		/*
+		 * Ensure we don't allow config that directs traffic to an Rx
+		 * queue that we aren't going to poll
+		 */
+		if (hv->rss_ind[i] >=  dev->data->nb_rx_queues) {
+			PMD_DRV_LOG(ERR, "RSS distributing traffic to invalid Rx queue");
+			return -EINVAL;
+		}
 	}
 
 	err = hn_rndis_conf_rss(hv, NDIS_RSS_FLAG_DISABLE);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 9bf1ec5509..e4f6b25cee 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -243,6 +243,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct hn_tx_queue *txq;
+	struct hn_rx_queue *rxq;
 	char name[RTE_MEMPOOL_NAMESIZE];
 	uint32_t tx_free_thresh;
 	int err = -ENOMEM;
@@ -301,6 +302,22 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		goto error;
 	}
 
+	/*
+	 * If there are more Tx queues than Rx queues, allocate rx_queues
+	 * with event buffer so that Tx completion messages can still be
+	 * received
+	 */
+	if (queue_idx >= dev->data->nb_rx_queues) {
+		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		/*
+		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
+		 * to ensure packets aren't received by this Rx queue.
+		 */
+		rxq->mb_pool = NULL;
+		rxq->rx_ring = NULL;
+		dev->data->rx_queues[queue_idx] = rxq;
+	}
+
 	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
 	txq->agg_pktmax = hv->rndis_agg_pkts;
 	txq->agg_align  = hv->rndis_agg_align;
@@ -364,6 +381,13 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 	if (!txq)
 		return;
 
+	/*
+	 * Free any Rx queues allocated for a Tx queue without a corresponding
+	 * Rx queue
+	 */
+	if (qid >= dev->data->nb_rx_queues)
+		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
+
 	rte_mempool_free(txq->txdesc_pool);
 
 	rte_memzone_free(txq->tx_rndis_mz);
@@ -942,6 +966,13 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	if (queue_idx == 0) {
 		rxq = hv->primary;
 	} else {
+		/*
+		 * If the number of Tx queues was previously greater than
+		 * the number of Rx queues, we may already have allocated
+		 * an rxq. If so, free it now before allocating a new one.
+		 */
+		hn_rx_queue_free_common(dev->data->rx_queues[queue_idx]);
+
 		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
 		if (!rxq)
 			return -ENOMEM;
@@ -998,6 +1029,15 @@ hn_rx_queue_free(struct hn_rx_queue *rxq, bool keep_primary)
 	if (keep_primary && rxq == rxq->hv->primary)
 		return;
 
+	hn_rx_queue_free_common(rxq);
+}
+
+static void
+hn_rx_queue_free_common(struct hn_rx_queue *rxq)
+{
+	if (!rxq)
+		return;
+
 	rte_free(rxq->rxbuf_info);
 	rte_free(rxq->event_buf);
 	rte_free(rxq);
diff --git a/drivers/net/netvsc/hn_var.h b/drivers/net/netvsc/hn_var.h
index ba86129ff6..ca599957c0 100644
--- a/drivers/net/netvsc/hn_var.h
+++ b/drivers/net/netvsc/hn_var.h
@@ -216,6 +216,7 @@ int	hn_dev_tx_descriptor_status(void *arg, uint16_t offset);
 struct hn_rx_queue *hn_rx_queue_alloc(struct hn_data *hv,
 				      uint16_t queue_id,
 				      unsigned int socket_id);
+static void hn_rx_queue_free_common(struct hn_rx_queue *rxq);
 int	hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 			      uint16_t queue_idx, uint16_t nb_desc,
 			      unsigned int socket_id,
-- 
2.25.1



* Re: [PATCH] net/netvsc: fix number Tx queues > Rx queues
  2024-02-29 19:29 [PATCH] net/netvsc: fix number Tx queues > Rx queues Alan Elder
@ 2024-02-29 21:53 ` Stephen Hemminger
  2024-03-01  2:03   ` Long Li
  2024-03-08 18:09 ` [PATCH v2] " Alan Elder
  1 sibling, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2024-02-29 21:53 UTC (permalink / raw)
  To: Alan Elder; +Cc: Long Li, Ferruh Yigit, Andrew Rybchenko, dev

On Thu, 29 Feb 2024 19:29:11 +0000
Alan Elder <alan.elder@microsoft.com> wrote:

> The previous code allowed the number of Tx queues to be set higher than
> the number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault.  
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue
> meaning that an event buffer is allocated to handle receiving Tx
> completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues
> and RSS configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>

Don't have Azure account to test, but looks good to me.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>


* RE: [PATCH] net/netvsc: fix number Tx queues > Rx queues
  2024-02-29 21:53 ` Stephen Hemminger
@ 2024-03-01  2:03   ` Long Li
  2024-03-08 18:21     ` Alan Elder
  0 siblings, 1 reply; 19+ messages in thread
From: Long Li @ 2024-03-01  2:03 UTC (permalink / raw)
  To: stephen, Alan Elder; +Cc: Ferruh Yigit, Andrew Rybchenko, dev

> Subject: Re: [PATCH] net/netvsc: fix number Tx queues > Rx queues
> 
> On Thu, 29 Feb 2024 19:29:11 +0000
> Alan Elder <alan.elder@microsoft.com> wrote:
> 
> > The previous code allowed the number of Tx queues to be set higher
> > than the number of Rx queues.  If a packet was sent on a Tx queue with
> > index
> > >= number Rx queues there was a segfault.
> >
> > This commit fixes the issue by creating an Rx queue for every Tx queue
> > meaning that an event buffer is allocated to handle receiving Tx
> > completion messages.
> >
> > mbuf pool and Rx ring are not allocated for these additional Rx queues
> > and RSS configuration ensures that no packets are received on them.
> >
> > Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> > Cc: sthemmin@microsoft.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> 
> Don't have Azure account to test, but looks good to me.
> 
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>

Please hold off on this patch while we discuss its interaction with MANA internally.

Long


* [PATCH v2] net/netvsc: fix number Tx queues > Rx queues
  2024-02-29 19:29 [PATCH] net/netvsc: fix number Tx queues > Rx queues Alan Elder
  2024-02-29 21:53 ` Stephen Hemminger
@ 2024-03-08 18:09 ` Alan Elder
  2024-03-11 22:31   ` Ferruh Yigit
  2024-03-12 19:08   ` Long Li
  1 sibling, 2 replies; 19+ messages in thread
From: Alan Elder @ 2024-03-08 18:09 UTC (permalink / raw)
  To: Alan Elder, Long Li, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

The previous code allowed the number of Tx queues to be set higher than
the number of Rx queues.  If a packet was sent on a Tx queue with index
>= number Rx queues there was a segfault.

This commit fixes the issue by creating an Rx queue for every Tx queue
meaning that an event buffer is allocated to handle receiving Tx
completion messages.

mbuf pool and Rx ring are not allocated for these additional Rx queues
and RSS configuration ensures that no packets are received on them.

Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: sthemmin@microsoft.com
Cc: stable@dpdk.org

Signed-off-by: Alan Elder <alan.elder@microsoft.com>
---
v2:
* Remove function declaration for static non-member function

---
 drivers/net/netvsc/hn_ethdev.c |  9 +++++++
 drivers/net/netvsc/hn_rxtx.c   | 46 +++++++++++++++++++++++++++++++---
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8a32832d7..d7e3f12346 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -313,6 +313,15 @@ static int hn_rss_reta_update(struct rte_eth_dev *dev,
 
 		if (reta_conf[idx].mask & mask)
 			hv->rss_ind[i] = reta_conf[idx].reta[shift];
+
+		/*
+		 * Ensure we don't allow config that directs traffic to an Rx
+		 * queue that we aren't going to poll
+		 */
+		if (hv->rss_ind[i] >=  dev->data->nb_rx_queues) {
+			PMD_DRV_LOG(ERR, "RSS distributing traffic to invalid Rx queue");
+			return -EINVAL;
+		}
 	}
 
 	err = hn_rndis_conf_rss(hv, NDIS_RSS_FLAG_DISABLE);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 9bf1ec5509..c0aaeaa972 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -243,6 +243,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct hn_tx_queue *txq;
+	struct hn_rx_queue *rxq;
 	char name[RTE_MEMPOOL_NAMESIZE];
 	uint32_t tx_free_thresh;
 	int err = -ENOMEM;
@@ -301,6 +302,22 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		goto error;
 	}
 
+	/*
+	 * If there are more Tx queues than Rx queues, allocate rx_queues
+	 * with event buffer so that Tx completion messages can still be
+	 * received
+	 */
+	if (queue_idx >= dev->data->nb_rx_queues) {
+		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		/*
+		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
+		 * to ensure packets aren't received by this Rx queue.
+		 */
+		rxq->mb_pool = NULL;
+		rxq->rx_ring = NULL;
+		dev->data->rx_queues[queue_idx] = rxq;
+	}
+
 	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
 	txq->agg_pktmax = hv->rndis_agg_pkts;
 	txq->agg_align  = hv->rndis_agg_align;
@@ -354,6 +371,17 @@ static void hn_txd_put(struct hn_tx_queue *txq, struct hn_txdesc *txd)
 	rte_mempool_put(txq->txdesc_pool, txd);
 }
 
+static void
+hn_rx_queue_free_common(struct hn_rx_queue *rxq)
+{
+	if (!rxq)
+		return;
+
+	rte_free(rxq->rxbuf_info);
+	rte_free(rxq->event_buf);
+	rte_free(rxq);
+}
+
 void
 hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 {
@@ -364,6 +392,13 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 	if (!txq)
 		return;
 
+	/*
+	 * Free any Rx queues allocated for a Tx queue without a corresponding
+	 * Rx queue
+	 */
+	if (qid >= dev->data->nb_rx_queues)
+		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
+
 	rte_mempool_free(txq->txdesc_pool);
 
 	rte_memzone_free(txq->tx_rndis_mz);
@@ -942,6 +977,13 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	if (queue_idx == 0) {
 		rxq = hv->primary;
 	} else {
+		/*
+		 * If the number of Tx queues was previously greater than
+		 * the number of Rx queues, we may already have allocated
+		 * an rxq. If so, free it now before allocating a new one.
+		 */
+		hn_rx_queue_free_common(dev->data->rx_queues[queue_idx]);
+
 		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
 		if (!rxq)
 			return -ENOMEM;
@@ -998,9 +1040,7 @@ hn_rx_queue_free(struct hn_rx_queue *rxq, bool keep_primary)
 	if (keep_primary && rxq == rxq->hv->primary)
 		return;
 
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	hn_rx_queue_free_common(rxq);
 }
 
 void
-- 
2.25.1



* RE: [PATCH] net/netvsc: fix number Tx queues > Rx queues
  2024-03-01  2:03   ` Long Li
@ 2024-03-08 18:21     ` Alan Elder
  0 siblings, 0 replies; 19+ messages in thread
From: Alan Elder @ 2024-03-08 18:21 UTC (permalink / raw)
  To: Long Li, stephen; +Cc: Ferruh Yigit, Andrew Rybchenko, dev


> Subject: RE: [PATCH] net/netvsc: fix number Tx queues > Rx queues
> 
> > Subject: Re: [PATCH] net/netvsc: fix number Tx queues > Rx queues
> >
> > On Thu, 29 Feb 2024 19:29:11 +0000
> > Alan Elder <alan.elder@microsoft.com> wrote:
> >
> > > The previous code allowed the number of Tx queues to be set higher
> > > than the number of Rx queues.  If a packet was sent on a Tx queue
> > > with index
> > > >= number Rx queues there was a segfault.
> > >
> > > This commit fixes the issue by creating an Rx queue for every Tx
> > > queue meaning that an event buffer is allocated to handle receiving
> > > Tx completion messages.
> > >
> > > mbuf pool and Rx ring are not allocated for these additional Rx
> > > queues and RSS configuration ensures that no packets are received on
> > > them.
> > >
> > > Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> > > Cc: sthemmin@microsoft.com
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> >
> > Don't have Azure account to test, but looks good to me.
> >
> > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> Please hold on while we are discussing this patch internally with its interaction
> with MANA.
> 
> Long

Thanks for the feedback Long and Stephen.  We've discussed the interaction with MANA and I think we're good to go ahead with this now.  I've submitted v2 of the patch with a minor fix.

Cheers,
Alan


* Re: [PATCH v2] net/netvsc: fix number Tx queues > Rx queues
  2024-03-08 18:09 ` [PATCH v2] " Alan Elder
@ 2024-03-11 22:31   ` Ferruh Yigit
  2024-03-12 19:08   ` Long Li
  1 sibling, 0 replies; 19+ messages in thread
From: Ferruh Yigit @ 2024-03-11 22:31 UTC (permalink / raw)
  To: Alan Elder, Long Li, Andrew Rybchenko; +Cc: dev, stephen

On 3/8/2024 6:09 PM, Alan Elder wrote:
> The previous code allowed the number of Tx queues to be set higher than
> the number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault.
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue
> meaning that an event buffer is allocated to handle receiving Tx
> completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues
> and RSS configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> ---
> v2:
> * Remove function declaration for static non-member function
> 

Hi Long,

There was a request to hold the previous version; what is the latest status?
If you agree with this patch, can you please ack/review it?



* RE: [PATCH v2] net/netvsc: fix number Tx queues > Rx queues
  2024-03-08 18:09 ` [PATCH v2] " Alan Elder
  2024-03-11 22:31   ` Ferruh Yigit
@ 2024-03-12 19:08   ` Long Li
  2024-03-19 14:16     ` [PATCH v3] " Alan Elder
  2024-03-19 14:19     ` [PATCH v2] " Alan Elder
  1 sibling, 2 replies; 19+ messages in thread
From: Long Li @ 2024-03-12 19:08 UTC (permalink / raw)
  To: Alan Elder, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

> diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
> index 9bf1ec5509..c0aaeaa972 100644
> --- a/drivers/net/netvsc/hn_rxtx.c
> +++ b/drivers/net/netvsc/hn_rxtx.c
> @@ -243,6 +243,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  {
>  	struct hn_data *hv = dev->data->dev_private;
>  	struct hn_tx_queue *txq;
> +	struct hn_rx_queue *rxq;
>  	char name[RTE_MEMPOOL_NAMESIZE];
>  	uint32_t tx_free_thresh;
>  	int err = -ENOMEM;
> @@ -301,6 +302,22 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
>  		goto error;
>  	}
> 
> +	/*
> +	 * If there are more Tx queues than Rx queues, allocate rx_queues
> +	 * with event buffer so that Tx completion messages can still be
> +	 * received
> +	 */
> +	if (queue_idx >= dev->data->nb_rx_queues) {
> +		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);

Need to check if rxq is NULL.

> +		/*
> +		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
> +		 * to ensure packets aren't received by this Rx queue.
> +		 */
> +		rxq->mb_pool = NULL;
> +		rxq->rx_ring = NULL;
> +		dev->data->rx_queues[queue_idx] = rxq;
> +	}
> +
>  	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
>  	txq->agg_pktmax = hv->rndis_agg_pkts;
>  	txq->agg_align  = hv->rndis_agg_align;
> @@ -354,6 +371,17 @@ static void hn_txd_put(struct hn_tx_queue *txq, struct hn_txdesc *txd)
>  	rte_mempool_put(txq->txdesc_pool, txd);
>  }
> 
> +static void
> +hn_rx_queue_free_common(struct hn_rx_queue *rxq)
> +{
> +	if (!rxq)
> +		return;
> +
> +	rte_free(rxq->rxbuf_info);
> +	rte_free(rxq->event_buf);
> +	rte_free(rxq);
> +}
> +
>  void
>  hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
>  {
> @@ -364,6 +392,13 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
>  	if (!txq)
>  		return;
> 
> +	/*
> +	 * Free any Rx queues allocated for a Tx queue without a corresponding
> +	 * Rx queue
> +	 */
> +	if (qid >= dev->data->nb_rx_queues)
> +		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
> +
>  	rte_mempool_free(txq->txdesc_pool);
> 
>  	rte_memzone_free(txq->tx_rndis_mz);
> @@ -942,6 +977,13 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
>  	if (queue_idx == 0) {
>  		rxq = hv->primary;
>  	} else {
> +		/*
> +		 * If the number of Tx queues was previously greater than
> +		 * the number of Rx queues, we may already have allocated
> +		 * an rxq. If so, free it now before allocating a new one.
> +		 */
> +		hn_rx_queue_free_common(dev->data->rx_queues[queue_idx]);

This logic seems strange. How about checking whether the rxq is already allocated and, if not, allocating it?

Something like:

if (!dev->data->rx_queues[queue_idx])
	rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
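
The check-then-allocate pattern above can be sketched in isolation as follows. This is a simplified model rather than driver code: `struct rxq_sketch`, `rxq_alloc()` and `rxq_get_or_alloc()` are hypothetical stand-ins for `struct hn_rx_queue`, `hn_rx_queue_alloc()` and the logic in `hn_dev_rx_queue_setup()`, and the fixed-size table stands in for `dev->data->rx_queues`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_QUEUES 8	/* hypothetical table size for this sketch */

/* Hypothetical stand-in for struct hn_rx_queue; not the driver's layout. */
struct rxq_sketch {
	uint16_t queue_id;
	void *event_buf;	/* needed even by a Tx-only queue */
	void *mb_pool;		/* stays NULL until Rx setup supplies a pool */
};

/* Stand-in for dev->data->rx_queues; zero-initialised, so all slots are NULL. */
static struct rxq_sketch *rx_queues[MAX_QUEUES];

/* Allocate a queue with only its event buffer; returns NULL on failure. */
static struct rxq_sketch *
rxq_alloc(uint16_t queue_id)
{
	struct rxq_sketch *rxq = calloc(1, sizeof(*rxq));

	if (rxq == NULL)
		return NULL;

	rxq->event_buf = malloc(64);
	if (rxq->event_buf == NULL) {
		free(rxq);
		return NULL;
	}
	rxq->queue_id = queue_id;
	return rxq;
}

/* Reuse the queue in the slot if one exists, otherwise allocate it. */
static struct rxq_sketch *
rxq_get_or_alloc(uint16_t queue_id)
{
	assert(queue_id < MAX_QUEUES);

	if (rx_queues[queue_id] == NULL)
		rx_queues[queue_id] = rxq_alloc(queue_id);

	return rx_queues[queue_id];
}
```

With this shape, a second call for the same index returns the queue that an earlier Tx queue setup may have created, so nothing is freed and reallocated. Unlike this sketch, a real setup path would probably want to publish the pointer only once the rest of the setup has succeeded.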



Thanks,

Long


* [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-03-12 19:08   ` Long Li
@ 2024-03-19 14:16     ` Alan Elder
  2024-03-19 18:40       ` Long Li
                         ` (2 more replies)
  2024-03-19 14:19     ` [PATCH v2] " Alan Elder
  1 sibling, 3 replies; 19+ messages in thread
From: Alan Elder @ 2024-03-19 14:16 UTC (permalink / raw)
  To: Long Li, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

The previous code allowed the number of Tx queues to be set higher than
the number of Rx queues.  If a packet was sent on a Tx queue with index
>= number Rx queues there was a segfault.

This commit fixes the issue by creating an Rx queue for every Tx queue
meaning that an event buffer is allocated to handle receiving Tx
completion messages.

mbuf pool and Rx ring are not allocated for these additional Rx queues
and RSS configuration ensures that no packets are received on them.

Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: sthemmin@microsoft.com
Cc: stable@dpdk.org

Signed-off-by: Alan Elder <alan.elder@microsoft.com>
---
v3:
* Handle case of Rx queue creation failure in hn_dev_tx_queue_setup.
* Re-use rx queue if it has already been allocated.
* Don't allocate an mbuf if pool is NULL.  This avoids segfault if RSS
  configuration is incorrect.

v2:
* Remove function declaration for static non-member function

---
 drivers/net/netvsc/hn_ethdev.c |  9 +++++
 drivers/net/netvsc/hn_rxtx.c   | 70 +++++++++++++++++++++++++++++-----
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8a32832d7..d7e3f12346 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -313,6 +313,15 @@ static int hn_rss_reta_update(struct rte_eth_dev *dev,
 
 		if (reta_conf[idx].mask & mask)
 			hv->rss_ind[i] = reta_conf[idx].reta[shift];
+
+		/*
+		 * Ensure we don't allow config that directs traffic to an Rx
+		 * queue that we aren't going to poll
+		 */
+		if (hv->rss_ind[i] >=  dev->data->nb_rx_queues) {
+			PMD_DRV_LOG(ERR, "RSS distributing traffic to invalid Rx queue");
+			return -EINVAL;
+		}
 	}
 
 	err = hn_rndis_conf_rss(hv, NDIS_RSS_FLAG_DISABLE);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 9bf1ec5509..e23880c176 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -234,6 +234,17 @@ static void hn_reset_txagg(struct hn_tx_queue *txq)
 	txq->agg_prevpkt = NULL;
 }
 
+static void
+hn_rx_queue_free_common(struct hn_rx_queue *rxq)
+{
+	if (!rxq)
+		return;
+
+	rte_free(rxq->rxbuf_info);
+	rte_free(rxq->event_buf);
+	rte_free(rxq);
+}
+
 int
 hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		      uint16_t queue_idx, uint16_t nb_desc,
@@ -243,6 +254,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct hn_tx_queue *txq;
+	struct hn_rx_queue *rxq = NULL;
 	char name[RTE_MEMPOOL_NAMESIZE];
 	uint32_t tx_free_thresh;
 	int err = -ENOMEM;
@@ -301,6 +313,27 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		goto error;
 	}
 
+	/*
+	 * If there are more Tx queues than Rx queues, allocate rx_queues
+	 * with event buffer so that Tx completion messages can still be
+	 * received
+	 */
+	if (queue_idx >= dev->data->nb_rx_queues) {
+		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+
+		if (!rxq) {
+			err = -ENOMEM;
+			goto error;
+		}
+
+		/*
+		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
+		 * to ensure packets aren't received by this Rx queue.
+		 */
+		rxq->mb_pool = NULL;
+		rxq->rx_ring = NULL;
+	}
+
 	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
 	txq->agg_pktmax = hv->rndis_agg_pkts;
 	txq->agg_align  = hv->rndis_agg_align;
@@ -311,12 +344,15 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 				     socket_id, tx_conf);
 	if (err == 0) {
 		dev->data->tx_queues[queue_idx] = txq;
+		if (rxq != NULL)
+			dev->data->rx_queues[queue_idx] = rxq;
 		return 0;
 	}
 
 error:
 	rte_mempool_free(txq->txdesc_pool);
 	rte_memzone_free(txq->tx_rndis_mz);
+	hn_rx_queue_free_common(rxq);
 	rte_free(txq);
 	return err;
 }
@@ -364,6 +400,13 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 	if (!txq)
 		return;
 
+	/*
+	 * Free any Rx queues allocated for a Tx queue without a corresponding
+	 * Rx queue
+	 */
+	if (qid >= dev->data->nb_rx_queues)
+		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
+
 	rte_mempool_free(txq->txdesc_pool);
 
 	rte_memzone_free(txq->tx_rndis_mz);
@@ -552,10 +595,12 @@ static void hn_rxpkt(struct hn_rx_queue *rxq, struct hn_rx_bufinfo *rxb,
 		     const struct hn_rxinfo *info)
 {
 	struct hn_data *hv = rxq->hv;
-	struct rte_mbuf *m;
+	struct rte_mbuf *m = NULL;
 	bool use_extbuf = false;
 
-	m = rte_pktmbuf_alloc(rxq->mb_pool);
+	if (likely(rxq->mb_pool != NULL))
+		m = rte_pktmbuf_alloc(rxq->mb_pool);
+
 	if (unlikely(!m)) {
 		struct rte_eth_dev *dev =
 			&rte_eth_devices[rxq->port_id];
@@ -942,7 +987,15 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	if (queue_idx == 0) {
 		rxq = hv->primary;
 	} else {
-		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		/*
+		 * If the number of Tx queues was previously greater than the
+		 * number of Rx queues, we may already have allocated an rxq.
+		 */
+		if (!dev->data->rx_queues[queue_idx])
+			rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		else
+			rxq = dev->data->rx_queues[queue_idx];
+
 		if (!rxq)
 			return -ENOMEM;
 	}
@@ -975,9 +1028,10 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 
 fail:
 	rte_ring_free(rxq->rx_ring);
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	/* Only free rxq if it was created in this function. */
+	if (!dev->data->rx_queues[queue_idx])
+		hn_rx_queue_free_common(rxq);
+
 	return error;
 }
 
@@ -998,9 +1052,7 @@ hn_rx_queue_free(struct hn_rx_queue *rxq, bool keep_primary)
 	if (keep_primary && rxq == rxq->hv->primary)
 		return;
 
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	hn_rx_queue_free_common(rxq);
 }
 
 void
-- 
2.25.1



* RE: [PATCH v2] net/netvsc: fix number Tx queues > Rx queues
  2024-03-12 19:08   ` Long Li
  2024-03-19 14:16     ` [PATCH v3] " Alan Elder
@ 2024-03-19 14:19     ` Alan Elder
  1 sibling, 0 replies; 19+ messages in thread
From: Alan Elder @ 2024-03-19 14:19 UTC (permalink / raw)
  To: Long Li, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

Thanks for the feedback Long.

I've made both changes you suggested, plus one additional change: the driver no longer tries to allocate an mbuf if the pool is NULL.

This means that if a packet is received on an Rx queue that isn't being polled, it shows up as "mbuf allocation failed" rather than causing a segfault.
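
A minimal sketch of that guard, assuming a toy buffer pool in place of a real mempool: `struct pool_sketch`, `struct rxq_nopool`, `pool_alloc()`, `rx_buf_get()` and the `alloc_failed` counter are all hypothetical names for illustration and are not the driver's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy pool standing in for an mbuf pool; it only tracks a free count. */
struct pool_sketch {
	unsigned int free_bufs;
};

/* Hypothetical stand-in for the Rx queue fields involved in the guard. */
struct rxq_nopool {
	struct pool_sketch *mb_pool;	/* NULL for a queue created only for Tx */
	uint64_t alloc_failed;		/* counts "allocation failed" events */
};

/* Take one buffer from the pool, or return NULL if it is exhausted. */
static void *
pool_alloc(struct pool_sketch *pool)
{
	if (pool->free_bufs == 0)
		return NULL;

	pool->free_bufs--;
	return malloc(64);
}

/*
 * Get a receive buffer for rxq. A queue without a pool takes the same
 * path as an exhausted pool: the failure is counted and NULL is returned,
 * so the NULL pool pointer is never dereferenced.
 */
static void *
rx_buf_get(struct rxq_nopool *rxq)
{
	void *m = NULL;

	assert(rxq != NULL);

	if (rxq->mb_pool != NULL)
		m = pool_alloc(rxq->mb_pool);

	if (m == NULL)
		rxq->alloc_failed++;

	return m;
}
```

Folding the missing-pool case into the existing allocation-failure path keeps the receive code to a single error branch, at the cost of not distinguishing a misconfigured RSS table from genuine pool exhaustion in the counter.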

Cheers,
Alan

> -----Original Message-----
> From: Long Li <longli@microsoft.com>
> Sent: Tuesday, March 12, 2024 7:09 PM
> To: Alan Elder <alan.elder@microsoft.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Andrew Rybchenko
> <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> Subject: RE: [PATCH v2] net/netvsc: fix number Tx queues > Rx queues
> 
> > diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
> > index 9bf1ec5509..c0aaeaa972 100644
> > --- a/drivers/net/netvsc/hn_rxtx.c
> > +++ b/drivers/net/netvsc/hn_rxtx.c
> > @@ -243,6 +243,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
> >  {
> >  	struct hn_data *hv = dev->data->dev_private;
> >  	struct hn_tx_queue *txq;
> > +	struct hn_rx_queue *rxq;
> >  	char name[RTE_MEMPOOL_NAMESIZE];
> >  	uint32_t tx_free_thresh;
> >  	int err = -ENOMEM;
> > @@ -301,6 +302,22 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
> >  		goto error;
> >  	}
> >
> > +	/*
> > +	 * If there are more Tx queues than Rx queues, allocate rx_queues
> > +	 * with event buffer so that Tx completion messages can still be
> > +	 * received
> > +	 */
> > +	if (queue_idx >= dev->data->nb_rx_queues) {
> > +		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
> 
> Need to check if rxq is NULL.
> 
> > +		/*
> > +		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
> > +		 * to ensure packets aren't received by this Rx queue.
> > +		 */
> > +		rxq->mb_pool = NULL;
> > +		rxq->rx_ring = NULL;
> > +		dev->data->rx_queues[queue_idx] = rxq;
> > +	}
> > +
> >  	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
> >  	txq->agg_pktmax = hv->rndis_agg_pkts;
> >  	txq->agg_align  = hv->rndis_agg_align;
> > @@ -354,6 +371,17 @@ static void hn_txd_put(struct hn_tx_queue *txq, struct hn_txdesc *txd)
> >  	rte_mempool_put(txq->txdesc_pool, txd);
> >  }
> >
> > +static void
> > +hn_rx_queue_free_common(struct hn_rx_queue *rxq)
> > +{
> > +	if (!rxq)
> > +		return;
> > +
> > +	rte_free(rxq->rxbuf_info);
> > +	rte_free(rxq->event_buf);
> > +	rte_free(rxq);
> > +}
> > +
> >  void
> >  hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
> >  {
> > @@ -364,6 +392,13 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
> >  	if (!txq)
> >  		return;
> >
> > +	/*
> > +	 * Free any Rx queues allocated for a Tx queue without a corresponding
> > +	 * Rx queue
> > +	 */
> > +	if (qid >= dev->data->nb_rx_queues)
> > +		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
> > +
> >  	rte_mempool_free(txq->txdesc_pool);
> >
> >  	rte_memzone_free(txq->tx_rndis_mz);
> > @@ -942,6 +977,13 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
> >  	if (queue_idx == 0) {
> >  		rxq = hv->primary;
> >  	} else {
> > +		/*
> > +		 * If the number of Tx queues was previously greater than
> > +		 * the number of Rx queues, we may already have allocated
> > +		 * an rxq. If so, free it now before allocating a new one.
> > +		 */
> > +		hn_rx_queue_free_common(dev->data->rx_queues[queue_idx]);
> 
> This logic seems strange. How about check if rxq is already allocated. If not,
> allocate it.
> 
> Something like:
> 
> if (!dev->data->rx_queues[queue_idx])
> 	rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
> 
> 
> 
> Thanks,
> 
> Long


* RE: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-03-19 14:16     ` [PATCH v3] " Alan Elder
@ 2024-03-19 18:40       ` Long Li
  2024-04-11 11:38       ` Ferruh Yigit
  2024-04-15 14:40       ` [PATCH v4] " Alan Elder
  2 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2024-03-19 18:40 UTC (permalink / raw)
  To: Alan Elder, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

> Subject: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
> 
> The previous code allowed the number of Tx queues to be set higher than the
> number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault.
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue meaning
> that an event buffer is allocated to handle receiving Tx completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues and RSS
> configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>

Reviewed-by: Long Li <longli@microsoft.com>




* Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-03-19 14:16     ` [PATCH v3] " Alan Elder
  2024-03-19 18:40       ` Long Li
@ 2024-04-11 11:38       ` Ferruh Yigit
  2024-04-11 20:45         ` [EXTERNAL] " Alan Elder
  2024-04-15 14:40       ` [PATCH v4] " Alan Elder
  2 siblings, 1 reply; 19+ messages in thread
From: Ferruh Yigit @ 2024-04-11 11:38 UTC (permalink / raw)
  To: Alan Elder, Long Li, Andrew Rybchenko; +Cc: dev, stephen

On 3/19/2024 2:16 PM, Alan Elder wrote:
> The previous code allowed the number of Tx queues to be set higher than
> the number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault.
>
> This commit fixes the issue by creating an Rx queue for every Tx queue
> meaning that an event buffer is allocated to handle receiving Tx
> completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues
> and RSS configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
>

Hi Alan,

What is the root cause of the crash? Is it in the driver or in the application?



* RE: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-04-11 11:38       ` Ferruh Yigit
@ 2024-04-11 20:45         ` Alan Elder
  2024-04-12 10:23           ` Ferruh Yigit
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Elder @ 2024-04-11 20:45 UTC (permalink / raw)
  To: Ferruh Yigit, Long Li, Andrew Rybchenko; +Cc: dev, stephen

> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Thursday, April 11, 2024 7:38 AM
> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
> queues
> 
> On 3/19/2024 2:16 PM, Alan Elder wrote:
> > The previous code allowed the number of Tx queues to be set higher
> > than the number of Rx queues.  If a packet was sent on a Tx queue with
> > index
> > >= number Rx queues there was a segfault.
> > This commit fixes the issue by creating an Rx queue for every Tx queue
> > meaning that an event buffer is allocated to handle receiving Tx
> > completion messages.
> >
> > mbuf pool and Rx ring are not allocated for these additional Rx queues
> > and RSS configuration ensures that no packets are received on them.
> >
> > Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> > Cc: sthemmin@microsoft.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> >
> 
> Hi Alan,
> 
> What is the root cause of the crash, is it in driver scope or application?

Hi Ferruh,

The root cause of the crash was in the driver - a packet received on a Tx queue that had no corresponding Rx queue would cause the dev->data->rx_queues[] array to be accessed past the length of the array.

https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071

Thanks,
Alan 


* Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-04-11 20:45         ` [EXTERNAL] " Alan Elder
@ 2024-04-12 10:23           ` Ferruh Yigit
  2024-04-12 16:50             ` Alan Elder
  0 siblings, 1 reply; 19+ messages in thread
From: Ferruh Yigit @ 2024-04-12 10:23 UTC (permalink / raw)
  To: Alan Elder, Long Li, Andrew Rybchenko; +Cc: dev, stephen

On 4/11/2024 9:45 PM, Alan Elder wrote:
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Thursday, April 11, 2024 7:38 AM
>> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
>> queues
>>
>> On 3/19/2024 2:16 PM, Alan Elder wrote:
>>> The previous code allowed the number of Tx queues to be set higher
>>> than the number of Rx queues.  If a packet was sent on a Tx queue with
>>> index
>>> >= number Rx queues there was a segfault.
>>> This commit fixes the issue by creating an Rx queue for every Tx queue
>>> meaning that an event buffer is allocated to handle receiving Tx
>>> completion messages.
>>>
>>> mbuf pool and Rx ring are not allocated for these additional Rx queues
>>> and RSS configuration ensures that no packets are received on them.
>>>
>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
>>> Cc: sthemmin@microsoft.com
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
>>>
>>
>> Hi Alan,
>>
>> What is the root cause of the crash, is it in driver scope or application?
> 
> Hi Ferruh,
> 
> The root cause of the crash was in the driver - a packet received on a Tx queue that had no corresponding Rx queue would cause the dev->data->rx_queues[] array to be accessed past the length of the array.
> 
> https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071
> 
> 

Why is there an access to the Rx queue when processing a Tx queue?

A backtrace of the crash would help to understand the issue. Can you
please include it in the commit log, plus some explanation of why the
crash happens?

Thanks,
ferruh



* RE: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-04-12 10:23           ` Ferruh Yigit
@ 2024-04-12 16:50             ` Alan Elder
  2024-04-15 17:54               ` Ferruh Yigit
  0 siblings, 1 reply; 19+ messages in thread
From: Alan Elder @ 2024-04-12 16:50 UTC (permalink / raw)
  To: Ferruh Yigit, Long Li, Andrew Rybchenko; +Cc: dev, stephen



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, April 12, 2024 6:23 AM
> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
> queues
> 
> On 4/11/2024 9:45 PM, Alan Elder wrote:
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Thursday, April 11, 2024 7:38 AM
> >> To: Alan Elder <alan.elder@microsoft.com>; Long Li
> >> <longli@microsoft.com>; Andrew Rybchenko
> >> <andrew.rybchenko@oktetlabs.ru>
> >> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
> >> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
> >> Rx queues
> >>
> >> On 3/19/2024 2:16 PM, Alan Elder wrote:
> >>> The previous code allowed the number of Tx queues to be set higher
> >>> than the number of Rx queues.  If a packet was sent on a Tx queue
> >>> with index
> >>> >= number Rx queues there was a segfault.
> >>> This commit fixes the issue by creating an Rx queue for every Tx
> >>> queue meaning that an event buffer is allocated to handle receiving
> >>> Tx completion messages.
> >>>
> >>> mbuf pool and Rx ring are not allocated for these additional Rx
> >>> queues and RSS configuration ensures that no packets are received on
> >>> them.
> >>>
> >>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> >>> Cc: sthemmin@microsoft.com
> >>> Cc: stable@dpdk.org
> >>>
> >>> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> >>>
> >>
> >> Hi Alan,
> >>
> >> What is the root cause of the crash, is it in driver scope or application?
> >
> > Hi Ferruh,
> >
> > The root cause of the crash was in the driver - a packet received on a Tx
> > queue that had no corresponding Rx queue would cause the
> > dev->data->rx_queues[] array to be accessed past the length of the array.
> >
> > https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071
> >
> >
> 
> Why there is an access to Rx queue when processing Tx queue?
> 
> A backtrace of the crash can help to understand the issue, can you please
> include this in commit log, plus some explanation why crash happens?
> 
> Thanks,
> Ferruh

Hi Ferruh,

The netvsc slow path needs to handle Tx completion messages (to know when it can reclaim Tx buffers).  Tx completion messages are received on the Rx queue, which is why the Rx queue is accessed as part of transmit processing.

An example call stack is:

#6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
#7  hn_process_events (hv=, queue_id=2, tx_limit=) at /drivers/net/netvsc/hn_rxtx.c
#8  hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c

Which leads to the SEGV as 0x20 is not a valid address.
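
One way to read the small faulting address, as a sketch only: if the rx_queues[] slot for the extra Tx queue happened to read as NULL, the lock pointer passed to rte_spinlock_trylock() would equal the offset of the lock field within the queue structure. The struct below is a hypothetical stand-in with an invented layout (it is not the driver's struct hn_rx_queue), and the NULL slot is an assumption that the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an Rx queue; the field layout is invented. */
struct demo_rxq {
	void *hv;
	void *chan;
	void *mb_pool;
	void *rx_ring;
	int32_t ring_lock;	/* lock field that the Tx path tries to take */
};

/*
 * Compute the address the lock would have for a given queue pointer.
 * Nothing is dereferenced, so passing NULL here is safe.
 */
static uintptr_t
demo_lock_addr(const struct demo_rxq *rxq)
{
	return (uintptr_t)rxq + offsetof(struct demo_rxq, ring_lock);
}
```

On a typical 64-bit build this made-up layout places ring_lock at offset 0x20, so demo_lock_addr(NULL) yields a small, unmapped address of the kind seen in the backtrace.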

I'll update the commit messages and resubmit the patch.

Thanks,
Alan 


* [PATCH v4] net/netvsc: fix number Tx queues > Rx queues
  2024-03-19 14:16     ` [PATCH v3] " Alan Elder
  2024-03-19 18:40       ` Long Li
  2024-04-11 11:38       ` Ferruh Yigit
@ 2024-04-15 14:40       ` Alan Elder
  2024-04-15 18:11         ` Ferruh Yigit
  2024-05-01  7:43         ` Morten Brørup
  2 siblings, 2 replies; 19+ messages in thread
From: Alan Elder @ 2024-04-15 14:40 UTC (permalink / raw)
  To: Long Li, Ferruh Yigit, Andrew Rybchenko; +Cc: dev, stephen

The previous code allowed the number of Tx queues to be set higher than the number of Rx queues.  If a packet was sent on a Tx queue with index
>= number Rx queues there was a segfault due to accessing beyond the end of the dev->data->rx_queues[] array.

#0 rte_spinlock_trylock (sl = invalid address) at /include/rte_spinlock.h L63
#1  hn_process_events at /drivers/net/netvsc/hn_rxtx.c L 1129
#2  hn_xmit_pkts at /drivers/net/netvsc/hn_rxtx.c L1553

This commit fixes the issue by creating an Rx queue for every Tx queue meaning that an event buffer is allocated to handle receiving Tx completion messages.

mbuf pool and Rx ring are not allocated for these additional Rx queues and RSS configuration ensures that no packets are received on them.

Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
Cc: sthemmin@microsoft.com
Cc: stable@dpdk.org

Signed-off-by: Alan Elder <alan.elder@microsoft.com>
---
v4:
* Include segfault core stack in commit message

v3:
* Handle case of Rx queue creation failure in hn_dev_tx_queue_setup.
* Re-use rx queue if it has already been allocated.
* Don't allocate an mbuf if pool is NULL.  This avoids segfault if RSS
  configuration is incorrect.

v2:
* Remove function declaration for static non-member function

---
 drivers/net/netvsc/hn_ethdev.c |  9 +++++
 drivers/net/netvsc/hn_rxtx.c   | 70 +++++++++++++++++++++++++++++-----
 2 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netvsc/hn_ethdev.c b/drivers/net/netvsc/hn_ethdev.c
index b8a32832d7..d7e3f12346 100644
--- a/drivers/net/netvsc/hn_ethdev.c
+++ b/drivers/net/netvsc/hn_ethdev.c
@@ -313,6 +313,15 @@ static int hn_rss_reta_update(struct rte_eth_dev *dev,
 
 		if (reta_conf[idx].mask & mask)
 			hv->rss_ind[i] = reta_conf[idx].reta[shift];
+
+		/*
+		 * Ensure we don't allow config that directs traffic to an Rx
+		 * queue that we aren't going to poll
+		 */
+		if (hv->rss_ind[i] >=  dev->data->nb_rx_queues) {
+			PMD_DRV_LOG(ERR, "RSS distributing traffic to invalid Rx queue");
+			return -EINVAL;
+		}
 	}
 
 	err = hn_rndis_conf_rss(hv, NDIS_RSS_FLAG_DISABLE);
diff --git a/drivers/net/netvsc/hn_rxtx.c b/drivers/net/netvsc/hn_rxtx.c
index 9bf1ec5509..e23880c176 100644
--- a/drivers/net/netvsc/hn_rxtx.c
+++ b/drivers/net/netvsc/hn_rxtx.c
@@ -234,6 +234,17 @@ static void hn_reset_txagg(struct hn_tx_queue *txq)
 	txq->agg_prevpkt = NULL;
 }
 
+static void
+hn_rx_queue_free_common(struct hn_rx_queue *rxq)
+{
+	if (!rxq)
+		return;
+
+	rte_free(rxq->rxbuf_info);
+	rte_free(rxq->event_buf);
+	rte_free(rxq);
+}
+
 int
 hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		      uint16_t queue_idx, uint16_t nb_desc,
@@ -243,6 +254,7 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 {
 	struct hn_data *hv = dev->data->dev_private;
 	struct hn_tx_queue *txq;
+	struct hn_rx_queue *rxq = NULL;
 	char name[RTE_MEMPOOL_NAMESIZE];
 	uint32_t tx_free_thresh;
 	int err = -ENOMEM;
@@ -301,6 +313,27 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 		goto error;
 	}
 
+	/*
+	 * If there are more Tx queues than Rx queues, allocate rx_queues
+	 * with event buffer so that Tx completion messages can still be
+	 * received
+	 */
+	if (queue_idx >= dev->data->nb_rx_queues) {
+		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+
+		if (!rxq) {
+			err = -ENOMEM;
+			goto error;
+		}
+
+		/*
+		 * Don't allocate mbuf pool or rx ring.  RSS is always configured
+		 * to ensure packets aren't received by this Rx queue.
+		 */
+		rxq->mb_pool = NULL;
+		rxq->rx_ring = NULL;
+	}
+
 	txq->agg_szmax  = RTE_MIN(hv->chim_szmax, hv->rndis_agg_size);
 	txq->agg_pktmax = hv->rndis_agg_pkts;
 	txq->agg_align  = hv->rndis_agg_align;
@@ -311,12 +344,15 @@ hn_dev_tx_queue_setup(struct rte_eth_dev *dev,
 				     socket_id, tx_conf);
 	if (err == 0) {
 		dev->data->tx_queues[queue_idx] = txq;
+		if (rxq != NULL)
+			dev->data->rx_queues[queue_idx] = rxq;
 		return 0;
 	}
 
 error:
 	rte_mempool_free(txq->txdesc_pool);
 	rte_memzone_free(txq->tx_rndis_mz);
+	hn_rx_queue_free_common(rxq);
 	rte_free(txq);
 	return err;
 }
@@ -364,6 +400,13 @@ hn_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 	if (!txq)
 		return;
 
+	/*
+	 * Free any Rx queues allocated for a Tx queue without a corresponding
+	 * Rx queue
+	 */
+	if (qid >= dev->data->nb_rx_queues)
+		hn_rx_queue_free_common(dev->data->rx_queues[qid]);
+
 	rte_mempool_free(txq->txdesc_pool);
 
 	rte_memzone_free(txq->tx_rndis_mz);
@@ -552,10 +595,12 @@ static void hn_rxpkt(struct hn_rx_queue *rxq, struct hn_rx_bufinfo *rxb,
 		     const struct hn_rxinfo *info)
 {
 	struct hn_data *hv = rxq->hv;
-	struct rte_mbuf *m;
+	struct rte_mbuf *m = NULL;
 	bool use_extbuf = false;
 
-	m = rte_pktmbuf_alloc(rxq->mb_pool);
+	if (likely(rxq->mb_pool != NULL))
+		m = rte_pktmbuf_alloc(rxq->mb_pool);
+
 	if (unlikely(!m)) {
 		struct rte_eth_dev *dev =
 			&rte_eth_devices[rxq->port_id];
@@ -942,7 +987,15 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	if (queue_idx == 0) {
 		rxq = hv->primary;
 	} else {
-		rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		/*
+		 * If the number of Tx queues was previously greater than the
+		 * number of Rx queues, we may already have allocated an rxq.
+		 */
+		if (!dev->data->rx_queues[queue_idx])
+			rxq = hn_rx_queue_alloc(hv, queue_idx, socket_id);
+		else
+			rxq = dev->data->rx_queues[queue_idx];
+
 		if (!rxq)
 			return -ENOMEM;
 	}
@@ -975,9 +1028,10 @@ hn_dev_rx_queue_setup(struct rte_eth_dev *dev,
 
 fail:
 	rte_ring_free(rxq->rx_ring);
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	/* Only free rxq if it was created in this function. */
+	if (!dev->data->rx_queues[queue_idx])
+		hn_rx_queue_free_common(rxq);
+
 	return error;
 }
 
@@ -998,9 +1052,7 @@ hn_rx_queue_free(struct hn_rx_queue *rxq, bool keep_primary)
 	if (keep_primary && rxq == rxq->hv->primary)
 		return;
 
-	rte_free(rxq->rxbuf_info);
-	rte_free(rxq->event_buf);
-	rte_free(rxq);
+	hn_rx_queue_free_common(rxq);
 }
 
 void
--
2.25.1



* Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx queues
  2024-04-12 16:50             ` Alan Elder
@ 2024-04-15 17:54               ` Ferruh Yigit
  0 siblings, 0 replies; 19+ messages in thread
From: Ferruh Yigit @ 2024-04-15 17:54 UTC (permalink / raw)
  To: Alan Elder, Long Li, Andrew Rybchenko; +Cc: dev, stephen

On 4/12/2024 5:50 PM, Alan Elder wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, April 12, 2024 6:23 AM
>> To: Alan Elder <alan.elder@microsoft.com>; Long Li <longli@microsoft.com>;
>> Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
>> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
>> Subject: Re: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues > Rx
>> queues
>>
>> On 4/11/2024 9:45 PM, Alan Elder wrote:
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Thursday, April 11, 2024 7:38 AM
>>>> To: Alan Elder <alan.elder@microsoft.com>; Long Li
>>>> <longli@microsoft.com>; Andrew Rybchenko
>>>> <andrew.rybchenko@oktetlabs.ru>
>>>> Cc: dev@dpdk.org; stephen <stephen@networkplumber.org>
>>>> Subject: [EXTERNAL] Re: [PATCH v3] net/netvsc: fix number Tx queues >
>>>> Rx queues
>>>>
>>>> On 3/19/2024 2:16 PM, Alan Elder wrote:
>>>>> The previous code allowed the number of Tx queues to be set higher
>>>>> than the number of Rx queues.  If a packet was sent on a Tx queue
>>>>> with index
>>>>> >= number Rx queues there was a segfault.
>>>>> This commit fixes the issue by creating an Rx queue for every Tx
>>>>> queue meaning that an event buffer is allocated to handle receiving
>>>>> Tx completion messages.
>>>>>
>>>>> mbuf pool and Rx ring are not allocated for these additional Rx
>>>>> queues and RSS configuration ensures that no packets are received on
>>>>> them.
>>>>>
>>>>> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
>>>>> Cc: sthemmin@microsoft.com
>>>>> Cc: stable@dpdk.org
>>>>>
>>>>> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
>>>>>
>>>>
>>>> Hi Alan,
>>>>
>>>> What is the root cause of the crash, is it in driver scope or application?
>>>
>>> Hi Ferruh,
>>>
>>> The root cause of the crash was in the driver - a packet received on a Tx
>>> queue that had no corresponding Rx queue would cause the
>>> dev->data->rx_queues[] array to be accessed past the length of the array.
>>>
>>> https://github.com/DPDK/dpdk/blob/main/drivers/net/netvsc/hn_rxtx.c#L1071
>>>
>>>
>>
>> Why there is an access to Rx queue when processing Tx queue?
>>
>> A backtrace of the crash can help to understand the issue, can you please
>> include this in commit log, plus some explanation why crash happens?
>>
>> Thanks,
>> Ferruh
> 
> Hi Ferruh,
> 
> Netvsc slow path needs to handle Tx completion messages (to know when it can reclaim Tx buffers).  Tx completion messages are received on Rx queue, which is why the Rx queue is accessed as part of transmit processing.
> 
> An example call stack is:
> 
> #6 rte_spinlock_trylock (sl=0x20) at /include/rte_spinlock.h
> #7  hn_process_events (hv=, queue_id=2, tx_limit=) at /drivers/net/netvsc/hn_rxtx.c
> #8  hn_xmit_pkts (ptxq=, tx_pkts=, nb_pkts=1) at /drivers/net/netvsc/hn_rxtx.c
> 
> Which leads to the SEGV as 0x20 is not a valid address.
> 
> I'll update the commit messages and resubmit the patch.
> 
> 

Hi Alan,

Thanks for the detail.

'hn_xmit_pkts()' calls 'hn_process_events()' with the Tx queue_id, but
'hn_process_events()' seems designed for Rx event processing since it
uses 'queue_id' to get rxq.
Does it help to pass queue type to 'hn_process_events()'?


And the patch creates Rx queues for the excess Tx queues. Do the Tx
completion packets need to be delivered, by design, to the Rx queue with
exactly the same queue_id as the Tx queue?
Or are the new Rx queues created just to prevent the crash, by providing
'rxq->ring_lock' etc?

Please also check comments on v4, thanks.



* Re: [PATCH v4] net/netvsc: fix number Tx queues > Rx queues
  2024-04-15 14:40       ` [PATCH v4] " Alan Elder
@ 2024-04-15 18:11         ` Ferruh Yigit
  2024-04-17 23:45           ` Long Li
  2024-05-01  7:43         ` Morten Brørup
  1 sibling, 1 reply; 19+ messages in thread
From: Ferruh Yigit @ 2024-04-15 18:11 UTC (permalink / raw)
  To: Alan Elder, Long Li, Andrew Rybchenko; +Cc: dev, stephen

On 4/15/2024 3:40 PM, Alan Elder wrote:
> The previous code allowed the number of Tx queues to be set higher than the number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault due to accessing beyond the end of the dev->data->rx_queues[] array.
> 
> #0 rte_spinlock_trylock (sl = invalid address) at /include/rte_spinlock.h L63
> #1  hn_process_events at /drivers/net/netvsc/hn_rxtx.c L 1129
> #2  hn_xmit_pkts at /drivers/net/netvsc/hn_rxtx.c L1553
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue meaning that an event buffer is allocated to handle receiving Tx completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues and RSS configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>

<...>

> @@ -552,10 +595,12 @@ static void hn_rxpkt(struct hn_rx_queue *rxq, struct hn_rx_bufinfo *rxb,
>  		     const struct hn_rxinfo *info)
>  {
>  	struct hn_data *hv = rxq->hv;
> -	struct rte_mbuf *m;
> +	struct rte_mbuf *m = NULL;
>  	bool use_extbuf = false;
>  
> -	m = rte_pktmbuf_alloc(rxq->mb_pool);
> +	if (likely(rxq->mb_pool != NULL))
> +		m = rte_pktmbuf_alloc(rxq->mb_pool);
> +
>

This introduces an additional check in the Rx path; I am not sure what
the performance impact is.

I can see Long already acked the v3, I just want to double check.
If Tx queue number > Rx queue number is not a common use case, perhaps
forbidding it is an option, instead of taking a performance hit.
Or it may be possible to have a dedicated Rx queue, like queue_id 0, for
Tx completion events for Tx queue_id > Rx queue number, etc.
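
The two alternatives above can be sketched roughly as follows. Both helpers are hypothetical and exist only for illustration; neither is part of the netvsc driver or the ethdev API, and the thread does not establish whether the host would deliver completions in a way that makes the second option workable.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Option A (hypothetical): reject a configuration that asks for more
 * Tx queues than Rx queues, so the Rx hot path needs no extra check.
 */
static int
demo_check_queue_counts(uint16_t nb_rx, uint16_t nb_tx)
{
	return nb_tx > nb_rx ? -EINVAL : 0;
}

/*
 * Option B (hypothetical): a Tx queue with no Rx queue of the same
 * index uses Rx queue 0 for its completion events.
 */
static uint16_t
demo_event_queue_for_tx(uint16_t tx_qid, uint16_t nb_rx)
{
	assert(nb_rx > 0);
	return tx_qid < nb_rx ? tx_qid : 0;
}
```

With 2 Rx queues, Tx queue 1 maps to Rx queue 1 while Tx queue 3 falls back to Rx queue 0; sharing one queue this way would also mean its lock is contended by several Tx paths.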

But Long if you prefer to continue with this patch, please ack it and I
can continue with it.



* RE: [PATCH v4] net/netvsc: fix number Tx queues > Rx queues
  2024-04-15 18:11         ` Ferruh Yigit
@ 2024-04-17 23:45           ` Long Li
  0 siblings, 0 replies; 19+ messages in thread
From: Long Li @ 2024-04-17 23:45 UTC (permalink / raw)
  To: Ferruh Yigit, Alan Elder, Andrew Rybchenko; +Cc: dev, stephen

 
> This introduced additional check in Rx path, not sure what is the performance
> impact.
> 
> I can see Long already acked the v3, I just want to double check.
> If Tx queue number > Rx queue number is not a common usecase, perhaps it can
> be an option to forbid it instead of getting performance hit.
> Or it can be possible to have a dedicated Rx queue, like queue_id 0, for Tx
> completion events for Tx queue_id > Rx queue number, etc..
> 
> But Long if you prefer to continue with this patch, please ack it and I can continue
> with it.

Ferruh, thank you for raising this concern. We will run some tests to evaluate performance impact of this patch.

Will update soon.

Long


* RE: [PATCH v4] net/netvsc: fix number Tx queues > Rx queues
  2024-04-15 14:40       ` [PATCH v4] " Alan Elder
  2024-04-15 18:11         ` Ferruh Yigit
@ 2024-05-01  7:43         ` Morten Brørup
  1 sibling, 0 replies; 19+ messages in thread
From: Morten Brørup @ 2024-05-01  7:43 UTC (permalink / raw)
  To: Alan Elder, Long Li, Ferruh Yigit, Andrew Rybchenko
  Cc: dev, Stephen Hemminger

> From: Alan Elder [mailto:alan.elder@microsoft.com]
> Sent: Monday, 15 April 2024 16.41
> 
> The previous code allowed the number of Tx queues to be set higher than the
> number of Rx queues.  If a packet was sent on a Tx queue with index
> >= number Rx queues there was a segfault due to accessing beyond the end of
> the dev->data->rx_queues[] array.
> 
> #0 rte_spinlock_trylock (sl = invalid address) at /include/rte_spinlock.h L63
> #1  hn_process_events at /drivers/net/netvsc/hn_rxtx.c L 1129
> #2  hn_xmit_pkts at /drivers/net/netvsc/hn_rxtx.c L1553
> 
> This commit fixes the issue by creating an Rx queue for every Tx queue meaning
> that an event buffer is allocated to handle receiving Tx completion messages.
> 
> mbuf pool and Rx ring are not allocated for these additional Rx queues and RSS
> configuration ensures that no packets are received on them.
> 
> Fixes: 4e9c73e96e83 ("net/netvsc: add Hyper-V network device")
> Cc: sthemmin@microsoft.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Alan Elder <alan.elder@microsoft.com>
> ---

Are there any requirements on the order in which the application must call rte_eth_rx_queue_setup() and rte_eth_tx_queue_setup()?

I.e. does it work if rte_eth_tx_queue_setup() is called before rte_eth_rx_queue_setup(), and in the opposite order?


Although the ethdev documentation says:

"The functions exported by the application Ethernet API to setup a device designated by its port identifier must be invoked in the following order:

rte_eth_dev_configure()
rte_eth_tx_queue_setup()
rte_eth_rx_queue_setup()
rte_eth_dev_start()",

I would assume the order of calling rte_eth_tx_queue_setup() and rte_eth_rx_queue_setup() doesn't matter.


And the rte_eth_dev_reset() function documentation has rx/tx queue setup in the opposite order:

"After calling rte_eth_dev_reset(), the application should use rte_eth_dev_configure(), rte_eth_rx_queue_setup(), rte_eth_tx_queue_setup(), and rte_eth_dev_start() to reconfigure the device as appropriate."
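
The ordering question can be explored with a simplified model of the logic in the v4 diff. Everything below is a hypothetical stand-in (demo_dev, demo_rxq and the demo_* functions are invented for illustration and are not DPDK or netvsc APIs); it only mirrors two decisions visible in the patch: Tx setup adds an event-only Rx queue when queue_idx >= nb_rx_queues, and Rx setup reuses an entry that already exists. It says nothing definitive about how the real driver behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_QUEUES 4

/* Hypothetical, simplified stand-ins; not the real DPDK or netvsc types. */
struct demo_rxq {
	bool in_use;
	bool has_mbuf_pool;	/* false models the event-only queue */
};

struct demo_dev {
	unsigned int nb_rx_queues;	/* fixed at configure time */
	struct demo_rxq *rx_queues[DEMO_MAX_QUEUES];
	bool tx_ready[DEMO_MAX_QUEUES];
	struct demo_rxq storage[DEMO_MAX_QUEUES];	/* avoids malloc here */
};

static void
demo_configure(struct demo_dev *dev, unsigned int nb_rx)
{
	assert(nb_rx <= DEMO_MAX_QUEUES);
	*dev = (struct demo_dev){ .nb_rx_queues = nb_rx };
}

/* Models the Tx setup hunk: add an event-only Rx queue when idx >= nb_rx. */
static void
demo_tx_queue_setup(struct demo_dev *dev, unsigned int idx)
{
	assert(idx < DEMO_MAX_QUEUES);
	if (idx >= dev->nb_rx_queues) {
		dev->storage[idx] = (struct demo_rxq){ .in_use = true };
		dev->rx_queues[idx] = &dev->storage[idx];
	}
	dev->tx_ready[idx] = true;
}

/* Models the Rx setup hunk: reuse a queue left by an earlier Tx setup. */
static void
demo_rx_queue_setup(struct demo_dev *dev, unsigned int idx)
{
	assert(idx < dev->nb_rx_queues);
	if (dev->rx_queues[idx] == NULL) {
		dev->storage[idx] = (struct demo_rxq){ .in_use = true };
		dev->rx_queues[idx] = &dev->storage[idx];
	}
	dev->rx_queues[idx]->has_mbuf_pool = true;
}
```

In this model, with one Rx queue and two Tx queues configured, calling the Tx setups first or the Rx setup first ends in the same state, because the Tx-side check depends on the configured Rx queue count rather than on which queues have already been set up. Whether that holds for the real driver would need testing.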



Thread overview: 19+ messages
2024-02-29 19:29 [PATCH] net/netvsc: fix number Tx queues > Rx queues Alan Elder
2024-02-29 21:53 ` Stephen Hemminger
2024-03-01  2:03   ` Long Li
2024-03-08 18:21     ` Alan Elder
2024-03-08 18:09 ` [PATCH v2] " Alan Elder
2024-03-11 22:31   ` Ferruh Yigit
2024-03-12 19:08   ` Long Li
2024-03-19 14:16     ` [PATCH v3] " Alan Elder
2024-03-19 18:40       ` Long Li
2024-04-11 11:38       ` Ferruh Yigit
2024-04-11 20:45         ` [EXTERNAL] " Alan Elder
2024-04-12 10:23           ` Ferruh Yigit
2024-04-12 16:50             ` Alan Elder
2024-04-15 17:54               ` Ferruh Yigit
2024-04-15 14:40       ` [PATCH v4] " Alan Elder
2024-04-15 18:11         ` Ferruh Yigit
2024-04-17 23:45           ` Long Li
2024-05-01  7:43         ` Morten Brørup
2024-03-19 14:19     ` [PATCH v2] " Alan Elder
