DPDK patches and discussions
 help / color / mirror / Atom feed
From: Rich Lane <rich.lane@bigswitch.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
Date: Tue, 17 Nov 2015 08:39:30 -0800	[thread overview]
Message-ID: <CAGSMBPPrustS5-2BdGmkKFUTQBnHcb6NRJ43TYuL2zRVAWkWqw@mail.gmail.com> (raw)
In-Reply-To: <20151117132349.GT2326@yliu-dev.sh.intel.com>

On Tue, Nov 17, 2015 at 5:23 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> > You can reproduce this with l2fwd and the vhost PMD.
> >
> > You'll need this patch on top of the vhost PMD patches:
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
> >                 return -1;
> >
> >         if (dev->flags & VIRTIO_DEV_RUNNING)
> > -               notify_ops->destroy_device(dev);
> > +               notify_destroy_device(dev);
> >
> >         cleanup_device(dev);
> >         reset_device(dev);
> >
> > 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> > eth_vhost0,iface=/run/vhost0.sock -- -p3
> > 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> > 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> > 4. Kill the l2fwd inside the VM with SIGINT.
> > 5. Start l2fwd inside the VM.
> > 6. l2fwd on the host crashes.
> >
> > I found the source of the memory corruption by setting a watchpoint in
> > gdb: watch -l rte_eth_devices[1].data->rx_queues
>
> Rich,
>
> Thanks for the detailed steps for reproducing this issue, and sorry for
> being a bit late: I finally got the time to dig this issue today.
>
> Put simply, buffer overflow is not the root cause, but the fact "we do
> not release resource on stop/exit" is.
>
> And here is how the issue comes.  After step 4 (terminating l2fwd), neither
> the l2fwd nor the virtio pmd driver does some resource release. Hence,
> l2fwd at HOST will not notice such chage, still trying to receive and
> queue packets to the vhost dev. It's not an issue as far as we don't
> start l2fwd again, for there is actaully no packets to forward, and
> rte_vhost_dequeue_burst returns from:
>
>     596         avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
>     597
>     598         /* If there are no available buffers then return. */
>     599         if (vq->last_used_idx == avail_idx)
>     600                 return 0;
>
> But just at the init stage while starting l2fwd (step 5),
> rte_eal_memory_init()
> resets all huge pages memory to zero, resulting all vq->desc[] items
> being reset to zero, which in turn ends up with secure_len being set
> with 0 at return.
>
> (BTW, I'm not quite sure why the inside VM huge pages memory reset
> would results to vq->desc reset).
>
> The vq desc reset reuslts to a dead loop at virtio_dev_merge_rx(),
> as update_secure_len() keeps setting secure_len with 0:
>
>     511                    do {
>     512                            avail_idx = *((volatile uint16_t
> *)&vq->avail->idx);
>     513                            if (unlikely(res_cur_idx == avail_idx))
> {
>     514                                    LOG_DEBUG(VHOST_DATA,
>     515                                            "(%"PRIu64") Failed "
>     516                                            "to get enough desc
> from "
>     517                                            "vring\n",
>     518                                            dev->device_fh);
>     519                                    goto merge_rx_exit;
>     520                            } else {
>     521                                    update_secure_len(vq,
> res_cur_idx, &secure_len, &vec_idx);
>     522                                    res_cur_idx++;
>     523                            }
>     524                    } while (pkt_len > secure_len);
>
> The dead loop causes vec_idx keep increasing then, and overflows
> quickly, leading to the crash in the end as you saw.
>
> So, the following would resolve this issue, in a right way (I
> guess), and it's for virtio-pmd and l2fwd only so far.
>
> ---
> diff --git a/drivers/net/virtio/virtio_ethdev.c
> b/drivers/net/virtio/virtio_ethdev.c
> index 12fcc23..8d6bf56 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1507,9 +1507,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
>         struct rte_eth_link link;
> +       struct virtio_hw *hw = dev->data->dev_private;
>
>         PMD_INIT_LOG(DEBUG, "stop");
>
> +       vtpci_reset(hw);
> +
>         if (dev->data->dev_conf.intr_conf.lsc)
>                 rte_intr_disable(&dev->pci_dev->intr_handle);
>
> diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
> index 720fd5a..565f648 100644
> --- a/examples/l2fwd/main.c
> +++ b/examples/l2fwd/main.c
> @@ -44,6 +44,7 @@
>  #include <ctype.h>
>  #include <errno.h>
>  #include <getopt.h>
> +#include <signal.h>
>
>  #include <rte_common.h>
>  #include <rte_log.h>
> @@ -534,14 +535,40 @@ check_all_ports_link_status(uint8_t port_num,
> uint32_t port_mask)
>         }
>  }
>
> +static uint8_t nb_ports;
> +static uint8_t nb_ports_available;
> +
> +/* When we receive a INT signal, unregister vhost driver */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> +       uint8_t portid;
> +
> +       for (portid = 0; portid < nb_ports; portid++) {
> +               /* skip ports that are not enabled */
> +               if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
> +                       printf("Skipping disabled port %u\n", (unsigned)
> portid);
> +                       nb_ports_available--;
> +                       continue;
> +               }
> +
> +               /* stopping port */
> +               printf("Stopping port %u... ", (unsigned) portid);
> +               fflush(stdout);
> +               rte_eth_dev_stop(portid);
> +
> +               printf("done: \n");
> +       }
> +
> +        exit(0);
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
>         struct lcore_queue_conf *qconf;
>         struct rte_eth_dev_info dev_info;
>         int ret;
> -       uint8_t nb_ports;
> -       uint8_t nb_ports_available;
>         uint8_t portid, last_port;
>         unsigned lcore_id, rx_lcore_id;
>         unsigned nb_ports_in_mask = 0;
> @@ -688,6 +715,8 @@ main(int argc, char **argv)
>                 /* initialize port stats */
>                 memset(&port_statistics, 0, sizeof(port_statistics));
>         }
> +       signal(SIGINT, sigint_handler);
> +
>
>         if (!nb_ports_available) {
>                 rte_exit(EXIT_FAILURE,
>
>
> ----
>
> And if you rethink this issue twice, you will find it's neither a
> vhost-pmd nor l2fwd specific issue. I could easy reproduce it with
> vhost-switch and virtio testpmd combo. The reason behind that would
> be same: we don't release/stop the resources at stop.
>
> It's kind of a known issue so far, and it's on Zhihong (cc'ed) TODO
> list to handle them correctly in next release.
>
>         --yliu


Thanks for looking into this. I agree with your description of the root
cause, it's what I was referring to when I mentioned that the virtqueue
memory is zeroed when the guest app is restarted. Agreed that it's not
specific to l2fwd/vhost PMD.

When the guest zeroes the avail virtqueue idx it goes backwards from the
perspective of the host. The host then loops up to 2^16 times until
res_cur_idx == avail_idx, overflowing the buf_vec array after the first 256
iterations. No real packet TX is needed.

I don't think that adding a SIGINT handler is the right solution, though.
The guest app could be killed with another signal (SIGKILL). Worse, a
malicious or buggy guest could write to just that field. vhost should not
crash no matter what the guest writes into the virtqueues.

  reply	other threads:[~2015-11-17 16:39 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-12  8:02 Rich Lane
2015-11-12  9:23 ` Yuanhan Liu
2015-11-12 21:46   ` Rich Lane
2015-11-17 13:23     ` Yuanhan Liu
2015-11-17 16:39       ` Rich Lane [this message]
2015-11-18  2:56         ` Yuanhan Liu
2015-11-18  5:23           ` Wang, Zhihong
2015-11-18  5:26           ` Rich Lane
2015-11-18  5:32             ` Yuanhan Liu
2015-11-18  6:13           ` Xie, Huawei
2015-11-18  6:25             ` Yuanhan Liu
2015-11-18  8:13               ` Xie, Huawei
2015-11-18 15:53             ` Stephen Hemminger
2015-11-18 16:00               ` Xie, Huawei
2015-11-18  7:53           ` Xie, Huawei
2015-11-18  8:48             ` Yuanhan Liu
2015-11-18 11:15               ` Xie, Huawei
2015-11-19  5:51                 ` Yuanhan Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGSMBPPrustS5-2BdGmkKFUTQBnHcb6NRJ43TYuL2zRVAWkWqw@mail.gmail.com \
    --to=rich.lane@bigswitch.com \
    --cc=dev@dpdk.org \
    --cc=yuanhan.liu@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).