After a vDPA application restart, qemu restores VQ with used and
available index, new incoming packet triggers virtio driver to
handle buffers. Under heavy traffic, no available buffer for
firmware to receive new packets, no Rx interrupts generated,
driver is stuck on endless interrupt waiting.

As a firmware workaround, this patch sends a notification after
VQ setup to ask driver handling buffers and filling new buffers.

Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index f530646058f..71470d23d9e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -4,6 +4,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <sys/mman.h>
+#include <sys/eventfd.h>
 
 #include <rte_malloc.h>
 #include <rte_errno.h>
@@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		goto error;
 	}
 	virtq->stopped = false;
+	/* Initial notification to ask qemu handling completed buffers. */
+	if (virtq->eqp.cq.callfd != -1)
+		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
 	DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
 		index);
 	return 0;
-- 
2.33.0
VAR is the device memory space for the virtio queues doorbells, qemu
could mmap it directly to speed up doorbell push.

On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vdpa restarted quickly, the VAR allocation failed with
error 28 since the VAR is singleton resource per device.

This patch adds retry mechanism for VAR allocation.

Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 6d17d7a6f3e..991739e9840 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
 	if (attr.num_lag_ports == 0)
 		priv->num_lag_ports = 1;
 	priv->ctx = ctx;
-	priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+	for (retry = 0; retry < 7; retry++) {
+		priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+		if (priv->var != NULL)
+			break;
+		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
+		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+		usleep(100000U << retry);
+	}
 	if (!priv->var) {
 		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
 		goto error;
-- 
2.33.0
On 10/15/21 15:43, Xueming Li wrote:
> After a vDPA application restart, qemu restores VQ with used and
> available index, new incoming packet triggers virtio driver to
> handle buffers. Under heavy traffic, no available buffer for
> firmware to receive new packets, no Rx interrupts generated,
> driver is stuck on endless interrupt waiting.
> 
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask driver handling buffers and filling new buffers.
> 

As I mentioned on my reply to the v1, I would expect a Fixes tag;
it would make downstream maintainers' lives easier.

Maybe pointing to the commit introducing the function would help.
This is not ideal, but otherwise the risk is that your patch gets
missed by the stable maintainers.

Thanks!
Maxime

> Cc: stable@dpdk.org
> 
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
>  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> index f530646058f..71470d23d9e 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> @@ -4,6 +4,7 @@
>  #include <string.h>
>  #include <unistd.h>
>  #include <sys/mman.h>
> +#include <sys/eventfd.h>
>  
>  #include <rte_malloc.h>
>  #include <rte_errno.h>
> @@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
>  		goto error;
>  	}
>  	virtq->stopped = false;
> +	/* Initial notification to ask qemu handling completed buffers. */
> +	if (virtq->eqp.cq.callfd != -1)
> +		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
>  	DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
>  		index);
>  	return 0;
> 
On Fri, 2021-10-15 at 15:57 +0200, Maxime Coquelin wrote:
> 
> On 10/15/21 15:43, Xueming Li wrote:
> > After a vDPA application restart, qemu restores VQ with used and
> > available index, new incoming packet triggers virtio driver to
> > handle buffers. Under heavy traffic, no available buffer for
> > firmware to receive new packets, no Rx interrupts generated,
> > driver is stuck on endless interrupt waiting.
> > 
> > As a firmware workaround, this patch sends a notification after
> > VQ setup to ask driver handling buffers and filling new buffers.
> > 
> 
> As I mentioned on my reply to the v1, I would expect a Fixes tag;
> it would make downstream maintainers' lives easier.
> 
> Maybe pointing to the commit introducing the function would help.
> This is not ideal, but otherwise the risk is that your patch gets
> missed by the stable maintainers.

Yes, my bad, a Fixes tag should be helpful to identify which LTS
releases need it, thanks!

> 
> Thanks!
> Maxime
> 
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > Reviewed-by: Matan Azrad <matan@nvidia.com>
> > ---
> >  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > index f530646058f..71470d23d9e 100644
> > --- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > +++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
> > @@ -4,6 +4,7 @@
> >  #include <string.h>
> >  #include <unistd.h>
> >  #include <sys/mman.h>
> > +#include <sys/eventfd.h>
> >  
> >  #include <rte_malloc.h>
> >  #include <rte_errno.h>
> > @@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
> >  		goto error;
> >  	}
> >  	virtq->stopped = false;
> > +	/* Initial notification to ask qemu handling completed buffers. */
> > +	if (virtq->eqp.cq.callfd != -1)
> > +		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
> >  	DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
> >  		index);
> >  	return 0;
> > 
After a vDPA application restart, qemu restores VQ with used and
available index, new incoming packet triggers virtio driver to
handle buffers. Under heavy traffic, no available buffer for
firmware to receive new packets, no Rx interrupts generated,
driver is stuck on endless interrupt waiting.

As a firmware workaround, this patch sends a notification after
VQ setup to ask driver handling buffers and filling new buffers.

Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
index f530646058f..71470d23d9e 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa_virtq.c
@@ -4,6 +4,7 @@
 #include <string.h>
 #include <unistd.h>
 #include <sys/mman.h>
+#include <sys/eventfd.h>
 
 #include <rte_malloc.h>
 #include <rte_errno.h>
@@ -367,6 +368,9 @@ mlx5_vdpa_virtq_setup(struct mlx5_vdpa_priv *priv, int index)
 		goto error;
 	}
 	virtq->stopped = false;
+	/* Initial notification to ask qemu handling completed buffers. */
+	if (virtq->eqp.cq.callfd != -1)
+		eventfd_write(virtq->eqp.cq.callfd, (eventfd_t)1);
 	DRV_LOG(DEBUG, "vid %u virtq %u was created successfully.", priv->vid,
 		index);
 	return 0;
-- 
2.33.0
VAR is the device memory space for the virtio queues doorbells, qemu
could mmap it directly to speed up doorbell push.

On a busy system, Qemu takes time to release VAR resources during driver
shutdown. If vdpa restarted quickly, the VAR allocation failed with
error 28 since the VAR is singleton resource per device.

This patch adds retry mechanism for VAR allocation.

Fixes: 4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
Cc: stable@dpdk.org

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Reviewed-by: Matan Azrad <matan@nvidia.com>
---
 drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
index 6d17d7a6f3e..991739e9840 100644
--- a/drivers/vdpa/mlx5/mlx5_vdpa.c
+++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
@@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
 	if (attr.num_lag_ports == 0)
 		priv->num_lag_ports = 1;
 	priv->ctx = ctx;
-	priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+	for (retry = 0; retry < 7; retry++) {
+		priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
+		if (priv->var != NULL)
+			break;
+		DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
+		/* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
+		usleep(100000U << retry);
+	}
 	if (!priv->var) {
 		DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
 		goto error;
-- 
2.33.0
On 10/15/21 17:05, Xueming Li wrote:
> After a vDPA application restart, qemu restores VQ with used and
> available index, new incoming packet triggers virtio driver to
> handle buffers. Under heavy traffic, no available buffer for
> firmware to receive new packets, no Rx interrupts generated,
> driver is stuck on endless interrupt waiting.
>
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask driver handling buffers and filling new buffers.
>
> Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 10/15/21 17:05, Xueming Li wrote:
> VAR is the device memory space for the virtio queues doorbells, qemu
> could mmap it directly to speed up doorbell push.
>
> On a busy system, Qemu takes time to release VAR resources during driver
> shutdown. If vdpa restarted quickly, the VAR allocation failed with
> error 28 since the VAR is singleton resource per device.
>
> This patch adds retry mechanism for VAR allocation.
>
> Fixes: 4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/mlx5_vdpa.c b/drivers/vdpa/mlx5/mlx5_vdpa.c
> index 6d17d7a6f3e..991739e9840 100644
> --- a/drivers/vdpa/mlx5/mlx5_vdpa.c
> +++ b/drivers/vdpa/mlx5/mlx5_vdpa.c
> @@ -693,7 +693,14 @@ mlx5_vdpa_dev_probe(struct rte_device *dev)
> if (attr.num_lag_ports == 0)
> priv->num_lag_ports = 1;
> priv->ctx = ctx;
> - priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> + for (retry = 0; retry < 7; retry++) {
> + priv->var = mlx5_glue->dv_alloc_var(ctx, 0);
> + if (priv->var != NULL)
> + break;
> + DRV_LOG(WARNING, "Failed to allocate VAR, retry %d.\n", retry);
> + /* Wait Qemu release VAR during vdpa restart, 0.1 sec based. */
> + usleep(100000U << retry);
> + }
> if (!priv->var) {
> DRV_LOG(ERR, "Failed to allocate VAR %u.", errno);
> goto error;
>
Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Thanks,
Maxime
On 10/15/21 17:05, Xueming Li wrote:
> After a vDPA application restart, qemu restores VQ with used and
> available index, new incoming packet triggers virtio driver to
> handle buffers. Under heavy traffic, no available buffer for
> firmware to receive new packets, no Rx interrupts generated,
> driver is stuck on endless interrupt waiting.
>
> As a firmware workaround, this patch sends a notification after
> VQ setup to ask driver handling buffers and filling new buffers.
>
> Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
Applied to dpdk-next-virtio/main.
Thanks,
Maxime
On 10/15/21 17:05, Xueming Li wrote:
> VAR is the device memory space for the virtio queues doorbells, qemu
> could mmap it directly to speed up doorbell push.
>
> On a busy system, Qemu takes time to release VAR resources during driver
> shutdown. If vdpa restarted quickly, the VAR allocation failed with
> error 28 since the VAR is singleton resource per device.
>
> This patch adds retry mechanism for VAR allocation.
>
> Fixes: 4cae722c1b06 ("vdpa/mlx5: move virtual doorbell alloc to probe")
> Cc: stable@dpdk.org
>
> Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> Reviewed-by: Matan Azrad <matan@nvidia.com>
> ---
> drivers/vdpa/mlx5/mlx5_vdpa.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
Applied to dpdk-next-virtio/main.
Thanks,
Maxime
On Thu, 2021-10-21 at 14:27 +0200, Maxime Coquelin wrote:
> 
> On 10/15/21 17:05, Xueming Li wrote:
> > After a vDPA application restart, qemu restores VQ with used and
> > available index, new incoming packet triggers virtio driver to
> > handle buffers. Under heavy traffic, no available buffer for
> > firmware to receive new packets, no Rx interrupts generated,
> > driver is stuck on endless interrupt waiting.
> > 
> > As a firmware workaround, this patch sends a notification after
> > VQ setup to ask driver handling buffers and filling new buffers.
> > 
> > Fixes: bff735011078 ("vdpa/mlx5: prepare virtio queues")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Xueming Li <xuemingl@nvidia.com>
> > Reviewed-by: Matan Azrad <matan@nvidia.com>
> > ---
> >  drivers/vdpa/mlx5/mlx5_vdpa_virtq.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> 
> Applied to dpdk-next-virtio/main.

Thanks Maxime!

> 
> Thanks,
> Maxime
> 