* [dpdk-dev] [PATCH 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-09-23 8:17 [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start Xueming Li
@ 2021-09-23 8:17 ` Xueming Li
2021-10-13 10:06 ` Maxime Coquelin
2021-10-13 9:55 ` [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
` (2 subsequent siblings)
3 siblings, 1 reply; 16+ messages in thread
From: Xueming Li @ 2021-09-23 8:17 UTC (permalink / raw)
To: dev; +Cc: Matan Azrad, Viacheslav Ovsiienko
VAR is the device memory space for the virtio queue doorbells; Qemu
can mmap it directly to speed up doorbell pushes.
On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vDPA is restarted quickly, the VAR allocation fails with
error 28 (ENOSPC) since the VAR is a singleton resource per device.
This patch adds a retry mechanism for VAR allocation.
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 6d17d7a6f3..991739e984 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
if (attr.num_lag_ports == 0)
priv->num_lag_ports = 1;
priv->ctx = ctx;
- priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+ for (retry = 0; retry < 7; retry++) {
+ priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+ if (priv->var != NULL)
+ break;
+ DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
+ /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+ usleep(100000U << retry);
+ }
if (!priv->var) {
DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
goto error;
--
2.33.0
^ permalink raw reply [flat|nested] 16+ messages in thread
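The retry loop above backs off exponentially: it sleeps 100 ms << retry after each
failure (100 ms, 200 ms, ..., 6.4 s), so seven failed attempts wait about
100 * (1 + 2 + 4 + ... + 64) ms = 12.7 s in total before probe gives up. The following
is a minimal, self-contained sketch of that pattern only, not the driver code;
alloc_resource() is a hypothetical stand-in for mlx5_glue->dv_alloc_var().

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical allocator standing in for mlx5_glue->dv_alloc_var():
 * it fails the first two calls just to exercise the retry path. */
static void *
alloc_resource(void)
{
	static int calls;

	return ++calls < 3 ? NULL : malloc(64);
}

/* Same exponential-backoff shape as the patch: up to 7 attempts,
 * sleeping 100000 us << retry after each failure, for a worst-case
 * total wait of roughly 12.7 seconds. */
static void *
alloc_with_backoff(void)
{
	void *res = NULL;
	unsigned int retry;

	for (retry = 0; retry < 7; retry++) {
		res = alloc_resource();
		if (res != NULL)
			break;
		fprintf(stderr, "allocation failed, retry %u\n", retry);
		usleep(100000U << retry);
	}
	return res; /* NULL if every attempt failed */
}

int
main(void)
{
	void *res = alloc_with_backoff();

	printf("%s\n", res != NULL ? "allocated" : "gave up");
	free(res);
	return res != NULL ? 0 : 1;
}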
* Re: [dpdk-dev] [PATCH 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-09-23 8:17 ` [dpdk-dev] [PATCH 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
@ 2021-10-13 10:06 ` Maxime Coquelin
2021-10-13 10:14 ` Xueming(Steven) Li
0 siblings, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-13 10:06 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko
On 9/23/21 10:17, Xueming Li wrote:
> VAR is the device memory space for the virtio queue doorbells; Qemu
> can mmap it directly to speed up doorbell pushes.
>
> On a busy system, Qemu takes time to release VAR resources during driver
> shutdown. If vDPA is restarted quickly, the VAR allocation fails with
> error 28 (ENOSPC) since the VAR is a singleton resource per device.
>
> This patch adds a retry mechanism for VAR allocation.
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 6d17d7a6f3..991739e984 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
> if (attr.num_lag_ports == 0)
> priv->num_lag_ports = 1;
> priv->ctx = ctx;
> - priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> + for (retry = 0; retry < 7; retry++) {
> + priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> + if (priv->var != NULL)
> + break;
> + DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
> + /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
> + usleep(100000U << retry);
> + }
> if (!priv->var) {
> DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
> goto error;
>
That looks fragile, but at least we have a warning we can rely on.
Shouldn't we have a way to wait for Qemu to release the resources at
vdpa driver shutdown time?
Also, as on patch 1, please add a Fixes tag if you want it to be
backported.
Regards,
Maxime
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-10-13 10:06 ` Maxime Coquelin
@ 2021-10-13 10:14 ` Xueming(Steven) Li
0 siblings, 0 replies; 16+ messages in thread
From: Xueming(Steven) Li @ 2021-10-13 10:14 UTC (permalink / raw)
To: maxime.coquelin, dev; +Cc: Matan Azrad, Slava Ovsiienko
On Wed, 2021-10-13 at 12:06 +0200, Maxime Coquelin wrote:
>
> On 9/23/21 10:17, Xueming Li wrote:
> > VAR is the device memory space for the virtio queue doorbells; Qemu
> > can mmap it directly to speed up doorbell pushes.
> >
> > On a busy system, Qemu takes time to release VAR resources during driver
> > shutdown. If vDPA is restarted quickly, the VAR allocation fails with
> > error 28 (ENOSPC) since the VAR is a singleton resource per device.
> >
> > This patch adds a retry mechanism for VAR allocation.
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > Reviewed-by: Matan Azrad <matan@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > index 6d17d7a6f3..991739e984 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> > @@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
> > if (attr.num_lag_ports == 0)
> > priv->num_lag_ports = 1;
> > priv->ctx = ctx;
> > - priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> > + for (retry = 0; retry < 7; retry++) {
> > + priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> > + if (priv->var != NULL)
> > + break;
> > + DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
> > + /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
> > + usleep(100000U << retry);
> > + }
> > if (!priv->var) {
> > DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
> > goto error;
> >
>
> That looks fragile, but at least we have a warning we can rely on.
> Shouldn't we have a way to wait for Qemu to release the resources at
> vdpa driver shutdown time?
If the DPDK vDPA application gets killed and restarted, Qemu shuts down
the device and unmaps the resources independently.
>
> Also, as on patch 1, please add a Fixes tag if you want it to be
> backported.
Agreed on backporting, but this is not a fix; I'll add Cc: stable@dpdk.org
so the patch gets noticed by the stable maintainers, thanks for the
suggestion!
>
> Regards,
> Maxime
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start
2021-09-23 8:17 [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start Xueming Li
2021-09-23 8:17 ` [dpdk-dev] [PATCH 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
@ 2021-10-13 9:55 ` Maxime Coquelin
2021-10-15 13:43 ` [dpdk-dev] [PATCH v1 " Xueming Li
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 " Xueming Li
3 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-13 9:55 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: Matan Azrad, Viacheslav Ovsiienko
Hi Xueming,
On 9/23/21 10:17, Xueming Li wrote:
> After a vDPA application restart, Qemu restores the VQ with its used and
> available indexes, and a new incoming packet triggers the virtio driver
> to handle buffers. Under heavy traffic there may be no available buffer
> for the firmware to receive new packets, so no Rx interrupt is generated
> and the driver is stuck waiting endlessly for an interrupt.
>
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask the driver to handle completed buffers and refill new ones.
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> index f530646058..71470d23d9 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> @@ -4,6 +4,7 @@
> #include <string.h>
> #include <unistd.h>
> #include <sys/mman.h>
> +#include <sys/eventfd.h>
>
> #include <rte_malloc.h>
> #include <rte_errno.h>
> @@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
> goto error;
> }
> virtq->stopped = false;
> + /* Initial notification to ask qemu handling completed buffers. */
> + if (virtq->eqp.cq.callfd != -1)
> + eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
> DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
> index);
> return 0;
>
Maybe this patch should be backported to the stable branch?
If so, could you reply with the Fixes tag so that I can add it while
applying?
Thanks,
Maxime
^ permalink raw reply [flat|nested] 16+ messages in thread
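The workaround leans on ordinary eventfd semantics: writing to the virtqueue's
callfd increments an eventfd counter, and the consumer of that fd on the
vhost/Qemu side turns it into a guest interrupt, prompting the virtio driver to
process used buffers and refill the ring. The sketch below illustrates those
eventfd semantics only; it creates its own eventfd instead of using a real
virtqueue callfd, and one process plays both the producer and consumer roles.

#include <stdio.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <unistd.h>

int
main(void)
{
	/* In the vDPA driver the fd comes from the vhost layer
	 * (virtq->eqp.cq.callfd); here we create our own for illustration. */
	int callfd = eventfd(0, 0);
	eventfd_t value;

	if (callfd < 0) {
		perror("eventfd");
		return EXIT_FAILURE;
	}
	/* The "kick": adds 1 to the eventfd counter. In the real setup the
	 * consumer of this fd injects an interrupt into the guest. */
	if (eventfd_write(callfd, (eventfd_t)1) < 0)
		perror("eventfd_write");

	/* A consumer blocks here until the counter is non-zero, then reads
	 * (and resets) it. */
	if (eventfd_read(callfd, &value) == 0)
		printf("counter read back: %llu\n", (unsigned long long)value);

	close(callfd);
	return EXIT_SUCCESS;
}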
* [dpdk-dev] [PATCH v1 1/2] vdpa/mlx5: workaround FW first completion in start
2021-09-23 8:17 [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start Xueming Li
2021-09-23 8:17 ` [dpdk-dev] [PATCH 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
2021-10-13 9:55 ` [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
@ 2021-10-15 13:43 ` Xueming Li
2021-10-15 13:43 ` [dpdk-dev] [PATCH v1 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
2021-10-15 13:57 ` [dpdk-dev] [PATCH v1 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 " Xueming Li
3 siblings, 2 replies; 16+ messages in thread
From: Xueming Li @ 2021-10-15 13:43 UTC (permalink / raw)
To: Maxime Coquelin, dev; +Cc: xuemingl, stable, Matan Azrad, Viacheslav Ovsiienko
After a vDPA application restart, Qemu restores the VQ with its used and
available indexes, and a new incoming packet triggers the virtio driver
to handle buffers. Under heavy traffic there may be no available buffer
for the firmware to receive new packets, so no Rx interrupt is generated
and the driver is stuck waiting endlessly for an interrupt.
As a firmware workaround, this patch sends a notification after
VQ setup to ask the driver to handle completed buffers and refill new ones.
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index f530646058f..71470d23d9e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -4,6 +4,7 @@
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
+#include <sys/eventfd.h>
#include <rte_malloc.h>
#include <rte_errno.h>
@@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
goto error;
}
virtq->stopped = false;
+ /* Initial notification to ask qemu handling completed buffers. */
+ if (virtq->eqp.cq.callfd != -1)
+ eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
index);
return 0;
--
2.33.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [dpdk-dev] [PATCH v1 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-10-15 13:43 ` [dpdk-dev] [PATCH v1 " Xueming Li
@ 2021-10-15 13:43 ` Xueming Li
2021-10-15 13:57 ` [dpdk-dev] [PATCH v1 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
1 sibling, 0 replies; 16+ messages in thread
From: Xueming Li @ 2021-10-15 13:43 UTC (permalink / raw)
To: Maxime Coquelin, dev; +Cc: xuemingl, stable, Matan Azrad, Viacheslav Ovsiienko
VAR is the device memory space for the virtio queue doorbells; Qemu
can mmap it directly to speed up doorbell pushes.
On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vDPA is restarted quickly, the VAR allocation fails with
error 28 (ENOSPC) since the VAR is a singleton resource per device.
This patch adds a retry mechanism for VAR allocation.
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 6d17d7a6f3e..991739e9840 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
if (attr.num_lag_ports == 0)
priv->num_lag_ports = 1;
priv->ctx = ctx;
- priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+ for (retry = 0; retry < 7; retry++) {
+ priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+ if (priv->var != NULL)
+ break;
+ DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
+ /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+ usleep(100000U << retry);
+ }
if (!priv->var) {
DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
goto error;
--
2.33.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v1 1/2] vdpa/mlx5: workaround FW first completion in start
2021-10-15 13:43 ` [dpdk-dev] [PATCH v1 " Xueming Li
2021-10-15 13:43 ` [dpdk-dev] [PATCH v1 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
@ 2021-10-15 13:57 ` Maxime Coquelin
2021-10-15 14:51 ` Xueming(Steven) Li
1 sibling, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-15 13:57 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
On 10/15/21 15:43, Xueming Li wrote:
> After a vDPA application restart, Qemu restores the VQ with its used and
> available indexes, and a new incoming packet triggers the virtio driver
> to handle buffers. Under heavy traffic there may be no available buffer
> for the firmware to receive new packets, so no Rx interrupt is generated
> and the driver is stuck waiting endlessly for an interrupt.
>
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask the driver to handle completed buffers and refill new ones.
>
As I mentioned in my reply to the v1, I would expect a Fixes tag;
it would make downstream maintainers' lives easier.
Maybe pointing to the commit introducing the function would help.
This is not ideal, but otherwise the risk is that your patch gets
missed by the stable maintainers.
Thanks!
Maxime
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> index f530646058f..71470d23d9e 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> @@ -4,6 +4,7 @@
> #include <string.h>
> #include <unistd.h>
> #include <sys/mman.h>
> +#include <sys/eventfd.h>
>
> #include <rte_malloc.h>
> #include <rte_errno.h>
> @@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
> goto error;
> }
> virtq->stopped = false;
> + /* Initial notification to ask qemu handling completed buffers. */
> + if (virtq->eqp.cq.callfd != -1)
> + eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
> DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
> index);
> return 0;
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v1 1/2] vdpa/mlx5: workaround FW first completion in start
2021-10-15 13:57 ` [dpdk-dev] [PATCH v1 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
@ 2021-10-15 14:51 ` Xueming(Steven) Li
0 siblings, 0 replies; 16+ messages in thread
From: Xueming(Steven) Li @ 2021-10-15 14:51 UTC (permalink / raw)
To: maxime.coquelin, dev; +Cc: Matan Azrad, Slava Ovsiienko, stable
On Fri, 2021-10-15 at 15:57 +0200, Maxime Coquelin wrote:
>
> On 10/15/21 15:43, Xueming Li wrote:
> > After a vDPA application restart, Qemu restores the VQ with its used and
> > available indexes, and a new incoming packet triggers the virtio driver
> > to handle buffers. Under heavy traffic there may be no available buffer
> > for the firmware to receive new packets, so no Rx interrupt is generated
> > and the driver is stuck waiting endlessly for an interrupt.
> >
> > As a firmware workaround, this patch sends a notification after
> > VQ setup to ask the driver to handle completed buffers and refill new ones.
> >
>
> As I mentioned in my reply to the v1, I would expect a Fixes tag;
> it would make downstream maintainers' lives easier.
>
> Maybe pointing to the commit introducing the function would help.
> This is not ideal, but otherwise the risk is that your patch gets
> missed by the stable maintainers.
Yes, my bad; a Fixes tag should help identify which LTS releases need
it, thanks!
>
> Thanks!
> Maxime
>
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > Reviewed-by: Matan Azrad <matan@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > index f530646058f..71470d23d9e 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > @@ -4,6 +4,7 @@
> > #include <string.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> > +#include <sys/eventfd.h>
> >
> > #include <rte_malloc.h>
> > #include <rte_errno.h>
> > @@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
> > goto error;
> > }
> > virtq->stopped = false;
> > + /* Initial notification to ask qemu handling completed buffers. */
> > + if (virtq->eqp.cq.callfd != -1)
> > + eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
> > DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
> > index);
> > return 0;
> >
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [dpdk-dev] [PATCH v2 1/2] vdpa/mlx5: workaround FW first completion in start
2021-09-23 8:17 [dpdk-dev] [PATCH 1/2] vdpa/mlx5: workaround FW first completion in start Xueming Li
` (2 preceding siblings ...)
2021-10-15 13:43 ` [dpdk-dev] [PATCH v1 " Xueming Li
@ 2021-10-15 15:05 ` Xueming Li
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
` (2 more replies)
3 siblings, 3 replies; 16+ messages in thread
From: Xueming Li @ 2021-10-15 15:05 UTC (permalink / raw)
To: dev; +Cc: xuemingl, Maxime Coquelin, stable, Matan Azrad, Viacheslav Ovsiienko
After a vDPA application restart, Qemu restores the VQ with its used and
available indexes, and a new incoming packet triggers the virtio driver
to handle buffers. Under heavy traffic there may be no available buffer
for the firmware to receive new packets, so no Rx interrupt is generated
and the driver is stuck waiting endlessly for an interrupt.
As a firmware workaround, this patch sends a notification after
VQ setup to ask the driver to handle completed buffers and refill new ones.
Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index f530646058f..71470d23d9e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -4,6 +4,7 @@
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
+#include <sys/eventfd.h>
#include <rte_malloc.h>
#include <rte_errno.h>
@@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
goto error;
}
virtq->stopped = false;
+ /* Initial notification to ask qemu handling completed buffers. */
+ if (virtq->eqp.cq.callfd != -1)
+ eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
index);
return 0;
--
2.33.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 " Xueming Li
@ 2021-10-15 15:05 ` Xueming Li
2021-10-21 9:40 ` Maxime Coquelin
2021-10-21 12:27 ` Maxime Coquelin
2021-10-21 9:40 ` [dpdk-dev] [PATCH v2 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
2021-10-21 12:27 ` Maxime Coquelin
2 siblings, 2 replies; 16+ messages in thread
From: Xueming Li @ 2021-10-15 15:05 UTC (permalink / raw)
To: dev; +Cc: xuemingl, Maxime Coquelin, stable, Matan Azrad, Viacheslav Ovsiienko
VAR is the device memory space for the virtio queue doorbells; Qemu
can mmap it directly to speed up doorbell pushes.
On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vDPA is restarted quickly, the VAR allocation fails with
error 28 (ENOSPC) since the VAR is a singleton resource per device.
This patch adds a retry mechanism for VAR allocation.
Fixes: 4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
Cc: stable@dpdk.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 6d17d7a6f3e..991739e9840 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
if (attr.num_lag_ports == 0)
priv->num_lag_ports = 1;
priv->ctx = ctx;
- priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+ for (retry = 0; retry < 7; retry++) {
+ priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+ if (priv->var != NULL)
+ break;
+ DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
+ /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+ usleep(100000U << retry);
+ }
if (!priv->var) {
DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
goto error;
--
2.33.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
@ 2021-10-21 9:40 ` Maxime Coquelin
2021-10-21 12:27 ` Maxime Coquelin
1 sibling, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-21 9:40 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
On 10/15/21 17:05, Xueming Li wrote:
> VAR is the device memory space for the virtio queue doorbells; Qemu
> can mmap it directly to speed up doorbell pushes.
>
> On a busy system, Qemu takes time to release VAR resources during driver
> shutdown. If vDPA is restarted quickly, the VAR allocation fails with
> error 28 (ENOSPC) since the VAR is a singleton resource per device.
>
> This patch adds a retry mechanism for VAR allocation.
>
> Fixes: 4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 6d17d7a6f3e..991739e9840 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
> if (attr.num_lag_ports == 0)
> priv->num_lag_ports = 1;
> priv->ctx = ctx;
> - priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> + for (retry = 0; retry < 7; retry++) {
> + priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> + if (priv->var != NULL)
> + break;
> + DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
> + /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
> + usleep(100000U << retry);
> + }
> if (!priv->var) {
> DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
> goto error;
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
2021-10-21 9:40 ` Maxime Coquelin
@ 2021-10-21 12:27 ` Maxime Coquelin
1 sibling, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-21 12:27 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
On 10/15/21 17:05, Xueming Li wrote:
> VAR is the device memory space for the virtio queue doorbells; Qemu
> can mmap it directly to speed up doorbell pushes.
>
> On a busy system, Qemu takes time to release VAR resources during driver
> shutdown. If vDPA is restarted quickly, the VAR allocation fails with
> error 28 (ENOSPC) since the VAR is a singleton resource per device.
>
> This patch adds a retry mechanism for VAR allocation.
>
> Fixes: 4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
Applied to dpdk-next-virtio/main.
Thanks,
Maxime
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/2] vdpa/mlx5: workaround FW first completion in start
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 " Xueming Li
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
@ 2021-10-21 9:40 ` Maxime Coquelin
2021-10-21 12:27 ` Maxime Coquelin
2 siblings, 0 replies; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-21 9:40 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
On 10/15/21 17:05, Xueming Li wrote:
> After a vDPA application restart, Qemu restores the VQ with its used and
> available indexes, and a new incoming packet triggers the virtio driver
> to handle buffers. Under heavy traffic there may be no available buffer
> for the firmware to receive new packets, so no Rx interrupt is generated
> and the driver is stuck waiting endlessly for an interrupt.
>
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask the driver to handle completed buffers and refill new ones.
>
> Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/2] vdpa/mlx5: workaround FW first completion in start
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 " Xueming Li
2021-10-15 15:05 ` [dpdk-dev] [PATCH v2 2/2] vdpa/mlx5: retry VAR allocation during vDPA restart Xueming Li
2021-10-21 9:40 ` [dpdk-dev] [PATCH v2 1/2] vdpa/mlx5: workaround FW first completion in start Maxime Coquelin
@ 2021-10-21 12:27 ` Maxime Coquelin
2021-10-21 12:36 ` Xueming(Steven) Li
2 siblings, 1 reply; 16+ messages in thread
From: Maxime Coquelin @ 2021-10-21 12:27 UTC (permalink / raw)
To: Xueming Li, dev; +Cc: stable, Matan Azrad, Viacheslav Ovsiienko
On 10/15/21 17:05, Xueming Li wrote:
> After a vDPA application restart, Qemu restores the VQ with its used and
> available indexes, and a new incoming packet triggers the virtio driver
> to handle buffers. Under heavy traffic there may be no available buffer
> for the firmware to receive new packets, so no Rx interrupt is generated
> and the driver is stuck waiting endlessly for an interrupt.
>
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask the driver to handle completed buffers and refill new ones.
>
> Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
Applied to dpdk-next-virtio/main.
Thanks,
Maxime
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/2] vdpa/mlx5: workaround FW first completion in start
2021-10-21 12:27 ` Maxime Coquelin
@ 2021-10-21 12:36 ` Xueming(Steven) Li
0 siblings, 0 replies; 16+ messages in thread
From: Xueming(Steven) Li @ 2021-10-21 12:36 UTC (permalink / raw)
To: maxime.coquelin, dev; +Cc: Matan Azrad, Slava Ovsiienko, stable
On Thu, 2021-10-21 at 14:27 +0200, Maxime Coquelin wrote:
>
> On 10/15/21 17:05, Xueming Li wrote:
> > After a vDPA application restart, qemu restores VQ with used and
> > available index, new incoming packet triggers virtio driver to
> > handle buffers. Under heavy traffic, no available buffer for
> > firmware to receive new packets, no Rx interrupts generated,
> > driver is stuck on endless interrupt waiting.
> >
> > As a firmware workaround, this patch sends a notification after
> > VQ setup to ask driver handling buffers and filling new buffers.
> >
> > Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > Reviewed-by: Matan Azrad <matan@nvidia.com>
> > ---
> > drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
>
>
> Applied to dpdk-next-virtio/main.
Thanks Maxime!
>
> Thanks,
> Maxime
>
^ permalink raw reply [flat|nested] 16+ messages in thread