From: David Marchand <david.marchand@redhat.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: dev@dpdk.org, chenbox@nvidia.com, bnemeth@redhat.com,
	echaudro@redhat.com,  stable@dpdk.org
Subject: Re: [PATCH 1/2] vhost: fix memory leak in Virtio Tx split path
Date: Wed, 31 Jan 2024 14:19:55 +0100
Message-ID: <CAJFAV8xFAHy622ZsQqi=yrcmaz-jMRx7a0NwjVfCe56F5wx2LA@mail.gmail.com>
In-Reply-To: <20240131093113.2208894-1-maxime.coquelin@redhat.com>

On Wed, Jan 31, 2024 at 10:31 AM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> When vIOMMU is enabled and Virtio device is bound to kernel
> driver in guest, rte_vhost_dequeue_burst() will often return
> early because of IOTLB misses.
>
> This patch fixes a mbuf leak occurring in this case.
>
> Fixes: 242695f6122a ("vhost: allocate and free packets in bulk in Tx split")
> Cc: stable@dpdk.org
>
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/vhost/virtio_net.c | 20 ++++++++------------
>  1 file changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 280d4845f8..db9985c9b9 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -3120,11 +3120,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>                                                 VHOST_ACCESS_RO) < 0))
>                         break;
>
> -               update_shadow_used_ring_split(vq, head_idx, 0);
> -
>                 if (unlikely(buf_len <= dev->vhost_hlen)) {
> -                       dropped += 1;
> -                       i++;
> +                       dropped = 1;
>                         break;
>                 }

``i`` was used both for filling the returned mbuf array and for
updating the shadow_used array / shadow_used_idx.
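
For context, the helper in question just appends an entry to the shadow
ring and bumps the shadow index. Quoting it from memory, so please
double-check against lib/vhost/virtio_net.c (comments are mine):

static __rte_always_inline void
update_shadow_used_ring_split(struct vhost_virtqueue *vq,
		uint16_t desc_idx, uint32_t len)
{
	uint16_t i = vq->shadow_used_idx++;

	/* Stage this descriptor; it only reaches the used ring when
	 * flush_shadow_used_ring_split() is called. */
	vq->shadow_used_split[i].id  = desc_idx;
	vq->shadow_used_split[i].len = len;
}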

So this change also affects whether the currently considered
descriptor is returned through the used ring.

See below...

>
> @@ -3143,8 +3140,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>                                         buf_len, mbuf_pool->name);
>                                 allocerr_warned = true;
>                         }
> -                       dropped += 1;
> -                       i++;
> +                       dropped = 1;
>                         break;
>                 }
>
> @@ -3155,17 +3151,17 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>                                 VHOST_DATA_LOG(dev->ifname, ERR, "failed to copy desc to mbuf.");
>                                 allocerr_warned = true;
>                         }
> -                       dropped += 1;
> -                       i++;
> +                       dropped = 1;
>                         break;
>                 }
>
> +               update_shadow_used_ring_split(vq, head_idx, 0);
>         }
>
> -       if (dropped)
> -               rte_pktmbuf_free_bulk(&pkts[i - 1], count - i + 1);
> +       if (unlikely(count != i))
> +               rte_pktmbuf_free_bulk(&pkts[i], count - i);
>
> -       vq->last_avail_idx += i;
> +       vq->last_avail_idx += i + dropped;
>
>         do_data_copy_dequeue(vq);
>         if (unlikely(i < count))

... I am copying the rest of the context:
       if (unlikely(i < count))
               vq->shadow_used_idx = i;

Before the patch, when breaking out of the loop after the i++,
vq->shadow_used_idx was probably already equal to i, because
update_shadow_used_ring_split had been called earlier in the iteration.
Because of this, the "dropped" descriptor was part of the shadow_used
array and was still returned through the used ring.

With the patch, since we break without touching i, the
"dropped" descriptor is not returned anymore.

Fixing this issue could take the form of restoring the call to
update_shadow_used_ring_split in the loop and adjusting
vq->shadow_used_idx to i + dropped.

But I think we can go one step further, by noting that
vq->last_avail_idx is incremented by the same "i + dropped" value.
This means we could simply rely on the calls to
update_shadow_used_ring_split and reuse vq->shadow_used_idx to adjust
vq->last_avail_idx.
By doing this, there is no need for a "dropped" variable and no need
to touch vq->shadow_used_idx manually, which is probably better for
robustness and readability.


Note: we could also move the call to do_data_copy_dequeue() under the
check on vq->shadow_used_idx != 0, though it probably won't change
anything.
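
That is, something like this (untested sketch; it assumes no batched
copies can be pending when vq->shadow_used_idx is 0, which should hold
since in that case the loop broke before ever calling desc_to_mbuf()):

	if (likely(vq->shadow_used_idx)) {
		do_data_copy_dequeue(vq);
		vq->last_avail_idx += vq->shadow_used_idx;
		flush_shadow_used_ring_split(dev, vq);
		vhost_vring_call_split(dev, vq);
	}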


This gives the following (untested) diff, on top of your fix:
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 211d24b36a..4e1d61bd54 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -3104,7 +3104,6 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 {
        uint16_t i;
        uint16_t avail_entries;
-       uint16_t dropped = 0;
        static bool allocerr_warned;

        /*
@@ -3141,10 +3140,10 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
                                                VHOST_ACCESS_RO) < 0))
                        break;

-               if (unlikely(buf_len <= dev->vhost_hlen)) {
-                       dropped = 1;
+               update_shadow_used_ring_split(vq, head_idx, 0);
+
+               if (unlikely(buf_len <= dev->vhost_hlen))
                        break;
-               }

                buf_len -= dev->vhost_hlen;

@@ -3161,7 +3160,6 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
                                        buf_len, mbuf_pool->name);
                                allocerr_warned = true;
                        }
-                       dropped = 1;
                        break;
                }

@@ -3172,22 +3170,16 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
                                VHOST_DATA_LOG(dev->ifname, ERR, "failed to copy desc to mbuf.");
                                allocerr_warned = true;
                        }
-                       dropped = 1;
                        break;
                }
-
-               update_shadow_used_ring_split(vq, head_idx, 0);
        }

        if (unlikely(count != i))
                rte_pktmbuf_free_bulk(&pkts[i], count - i);

-       vq->last_avail_idx += i + dropped;
-
        do_data_copy_dequeue(vq);
-       if (unlikely(i < count))
-               vq->shadow_used_idx = i;
        if (likely(vq->shadow_used_idx)) {
+               vq->last_avail_idx += vq->shadow_used_idx;
                flush_shadow_used_ring_split(dev, vq);
                vhost_vring_call_split(dev, vq);
        }


-- 
David Marchand


