From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: David Marchand <david.marchand@redhat.com>
Cc: dev@dpdk.org, chenbox@nvidia.com, bnemeth@redhat.com,
echaudro@redhat.com, stable@dpdk.org
Subject: Re: [PATCH 1/2] vhost: fix memory leak in Virtio Tx split path
Date: Wed, 31 Jan 2024 14:32:28 +0100 [thread overview]
Message-ID: <9efc5d7e-1e44-4338-8264-6d544a895d2d@redhat.com> (raw)
In-Reply-To: <CAJFAV8xFAHy622ZsQqi=yrcmaz-jMRx7a0NwjVfCe56F5wx2LA@mail.gmail.com>
Hi David,
On 1/31/24 14:19, David Marchand wrote:
> On Wed, Jan 31, 2024 at 10:31 AM Maxime Coquelin
> <maxime.coquelin@redhat.com> wrote:
>>
>> When vIOMMU is enabled and Virtio device is bound to kernel
>> driver in guest, rte_vhost_dequeue_burst() will often return
>> early because of IOTLB misses.
>>
>> This patch fixes a mbuf leak occurring in this case.
>>
>> Fixes: 242695f6122a ("vhost: allocate and free packets in bulk in Tx split")
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>> lib/vhost/virtio_net.c | 20 ++++++++------------
>> 1 file changed, 8 insertions(+), 12 deletions(-)
>>
>> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
>> index 280d4845f8..db9985c9b9 100644
>> --- a/lib/vhost/virtio_net.c
>> +++ b/lib/vhost/virtio_net.c
>> @@ -3120,11 +3120,8 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>> VHOST_ACCESS_RO) < 0))
>> break;
>>
>> - c(vq, head_idx, 0);
>> -
>> if (unlikely(buf_len <= dev->vhost_hlen)) {
>> - dropped += 1;
>> - i++;
>> + dropped = 1;
>> break;
>> }
>
> ``i`` was used for both filling the returned mbuf array, but also to
> update the shadow_used_idx / array.
>
> So this change here also affects how the currently considered
> descriptor is returned through the used ring.
>
> See below...
>
>>
>> @@ -3143,8 +3140,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>> buf_len, mbuf_pool->name);
>> allocerr_warned = true;
>> }
>> - dropped += 1;
>> - i++;
>> + dropped = 1;
>> break;
>> }
>>
>> @@ -3155,17 +3151,17 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>> VHOST_DATA_LOG(dev->ifname, ERR, "failed to copy desc to mbuf.");
>> allocerr_warned = true;
>> }
>> - dropped += 1;
>> - i++;
>> + dropped = 1;
>> break;
>> }
>>
>> + update_shadow_used_ring_split(vq, head_idx, 0);
>> }
>>
>> - if (dropped)
>> - rte_pktmbuf_free_bulk(&pkts[i - 1], count - i + 1);
>> + if (unlikely(count != i))
>> + rte_pktmbuf_free_bulk(&pkts[i], count - i);
>>
>> - vq->last_avail_idx += i;
>> + vq->last_avail_idx += i + dropped;
>>
>> do_data_copy_dequeue(vq);
>> if (unlikely(i < count))
>
> ... I am copying the rest of the context:
> if (unlikely(i < count))
> vq->shadow_used_idx = i;
>
> Before the patch, when breaking and doing the i++ stuff,
> vq->shadow_used_idx was probably already equal to i because
> update_shadow_used_ring_split had been called earlier.
> Because of this, the "dropped" descriptor was part of the shadow_used
> array for returning it through the used ring.
>
> With the patch, since we break without touching i, it means that the
> "dropped" descriptor is not returned anymore.
>
> Fixing this issue could take the form of restoring the call to
> update_shadow_used_ring_split in the loop and adjust
> vq->shadow_used_idx to i + dropped.
>
> But I think we can go one step further, by noting that
> vq->last_avail_idx is being incremented by the same "i + dropped"
> value.
> Meaning that we could simply rely on calls to
> update_shadow_used_ring_split and reuse vq->shadow_used_idx to adjust
> vq->last_avail_idx.
> By doing this, there is no need for a "dropped" variable, and no need
> for touching of vq->shadow_used_idx manually, which is probably better
> for robustness / easing readability.
I fully agree with your suggestion as discussed off-list.
On top of that, it also makes the split code closer to the packed one.
>
> Note: we could also move the call do_data_copy_dequeue() under the
> check on vq->shadow_used_idx != 0, though it won't probably change
> anything.
I will move the call to do_data_copy_dequeue() under vq->shadow_used_idx
!= 0 check.
Thanks,
Maxime
>
> This gives the following (untested) diff, on top of your fix:
> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
> index 211d24b36a..4e1d61bd54 100644
> --- a/lib/vhost/virtio_net.c
> +++ b/lib/vhost/virtio_net.c
> @@ -3104,7 +3104,6 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> {
> uint16_t i;
> uint16_t avail_entries;
> - uint16_t dropped = 0;
> static bool allocerr_warned;
>
> /*
> @@ -3141,10 +3140,10 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> VHOST_ACCESS_RO) < 0))
> break;
>
> - if (unlikely(buf_len <= dev->vhost_hlen)) {
> - dropped = 1;
> + update_shadow_used_ring_split(vq, head_idx, 0);
> +
> + if (unlikely(buf_len <= dev->vhost_hlen))
> break;
> - }
>
> buf_len -= dev->vhost_hlen;
>
> @@ -3161,7 +3160,6 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> buf_len, mbuf_pool->name);
> allocerr_warned = true;
> }
> - dropped = 1;
> break;
> }
>
> @@ -3172,22 +3170,16 @@ virtio_dev_tx_split(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> VHOST_DATA_LOG(dev->ifname, ERR,
> "failed to copy desc to mbuf.");
> allocerr_warned = true;
> }
> - dropped = 1;
> break;
> }
> -
> - update_shadow_used_ring_split(vq, head_idx, 0);
> }
>
> if (unlikely(count != i))
> rte_pktmbuf_free_bulk(&pkts[i], count - i);
>
> - vq->last_avail_idx += i + dropped;
> -
> do_data_copy_dequeue(vq);
> - if (unlikely(i < count))
> - vq->shadow_used_idx = i;
> if (likely(vq->shadow_used_idx)) {
> + vq->last_avail_idx += vq->shadow_used_idx;
> flush_shadow_used_ring_split(dev, vq);
> vhost_vring_call_split(dev, vq);
> }
>
>
prev parent reply other threads:[~2024-01-31 13:32 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 9:31 Maxime Coquelin
2024-01-31 9:31 ` [PATCH 2/2] vhost: add new mbuf allocation failure statistic Maxime Coquelin
2024-01-31 13:19 ` [PATCH 1/2] vhost: fix memory leak in Virtio Tx split path David Marchand
2024-01-31 13:32 ` Maxime Coquelin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9efc5d7e-1e44-4338-8264-6d544a895d2d@redhat.com \
--to=maxime.coquelin@redhat.com \
--cc=bnemeth@redhat.com \
--cc=chenbox@nvidia.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=echaudro@redhat.com \
--cc=stable@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).