From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rich.lane@bigswitch.com>
Received: from mail-vk0-f41.google.com (mail-vk0-f41.google.com
 [209.85.213.41]) by dpdk.org (Postfix) with ESMTP id D5B885A71
 for <dev@dpdk.org>; Tue, 17 Nov 2015 17:39:30 +0100 (CET)
Received: by vkas68 with SMTP id s68so9335745vka.2
 for <dev@dpdk.org>; Tue, 17 Nov 2015 08:39:30 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=bigswitch_com.20150623.gappssmtp.com; s=20150623;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=NpGYqOsKBT1ZQk1Ef0b462kpr2jMQwFWDejtJGdWMO0=;
 b=gEuFnuj26gtmkVJ+/NlOxq4Ey25Wew+baP+gO4QRYEsgSR8dtK9iYM/vkq6nFNazaY
 SsnX8eL+id8sNanD+/OjfKf5er56v3DjBLf3W8vL//mfOZIfQq7keX/3Bhhfg3SuCDaK
 SKLfkYCxWOzimlM0hfLm9vqeSj5tAP3GOFg/o/ZE0IgSuXaG3IoIh4mXywZx6i0fkszv
 Nc78GCzOjhDyFHzn4TTwAd31ooL7gyPYonF0q8jpsbt8Z3oIDl6yy8rAAZFJLoVgvXmO
 IXzLOIR7sKL/csHMug98JREE8yBb51LjBGX18XkhPuwEEXH2RnB92EcAtzAFXxM6e4y9
 acNw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=NpGYqOsKBT1ZQk1Ef0b462kpr2jMQwFWDejtJGdWMO0=;
 b=UcKfW2kOOz0ocNAu0FJpRK1RbD5u84oqkkdfqmzdGTfMJEPtgNewizD38pGHbTw9oe
 9Rn1hqdJeFgknYcD8mLlgUrRSaS2BvUurmbtCuei0ADhrb44czdikJIIzUZyCNsnD21M
 qjX3eRlqy724JB4ABd5g06cq5Kgh1JwzvtlSX6l3iC51XgPzY/HX74fSmXCq+2uUhk66
 36btCNdAXALxSv+iZbEeqjXF4kkGw0l3oL4h8y39Lsg7MQlxPROLMJlmG9Io4pkpwCyc
 2exq/NRLyjFyoNGy5ewm1H++YH9HQdo4qsyDDiPUbRJ9SF7r8YvoAP5LmIH3UL3nZ8M2
 yxJA==
X-Gm-Message-State: ALoCoQkk4iyp6+pahlSrFu+QDsUN5DzIX0kjGyK5XdaE2l9fLuZUw06Y2XyDSNC50ciB4ZrDGkgM
MIME-Version: 1.0
X-Received: by 10.31.13.1 with SMTP id 1mr4967425vkn.100.1447778370220; Tue,
 17 Nov 2015 08:39:30 -0800 (PST)
Received: by 10.31.3.170 with HTTP; Tue, 17 Nov 2015 08:39:30 -0800 (PST)
In-Reply-To: <20151117132349.GT2326@yliu-dev.sh.intel.com>
References: <1447315353-42152-1-git-send-email-rlane@bigswitch.com>
 <20151112092305.GI2326@yliu-dev.sh.intel.com>
 <CAGSMBPOLNsc-+_Zj7FgBhmD0kpUAoy3fu5urxN74YTfmE20Qzw@mail.gmail.com>
 <20151117132349.GT2326@yliu-dev.sh.intel.com>
Date: Tue, 17 Nov 2015 08:39:30 -0800
Message-ID: <CAGSMBPPrustS5-2BdGmkKFUTQBnHcb6NRJ43TYuL2zRVAWkWqw@mail.gmail.com>
From: Rich Lane <rich.lane@bigswitch.com>
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in
	update_secure_len
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Nov 2015 16:39:31 -0000

On Tue, Nov 17, 2015 at 5:23 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> > You can reproduce this with l2fwd and the vhost PMD.
> >
> > You'll need this patch on top of the vhost PMD patches:
> > --- a/lib/librte_vhost/virtio-net.c
> > +++ b/lib/librte_vhost/virtio-net.c
> > @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
> >                 return -1;
> >
> >         if (dev->flags & VIRTIO_DEV_RUNNING)
> > -               notify_ops->destroy_device(dev);
> > +               notify_destroy_device(dev);
> >
> >         cleanup_device(dev);
> >         reset_device(dev);
> >
> > 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> > eth_vhost0,iface=/run/vhost0.sock -- -p3
> > 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> > 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> > 4. Kill the l2fwd inside the VM with SIGINT.
> > 5. Start l2fwd inside the VM.
> > 6. l2fwd on the host crashes.
> >
> > I found the source of the memory corruption by setting a watchpoint in
> > gdb: watch -l rte_eth_devices[1].data->rx_queues
>
> Rich,
>
> Thanks for the detailed steps for reproducing this issue, and sorry for
> being a bit late: I finally got the time to dig into this issue today.
>
> Put simply, buffer overflow is not the root cause, but the fact "we do
> not release resource on stop/exit" is.
>
> And here is how the issue arises.  After step 4 (terminating l2fwd), neither
> l2fwd nor the virtio pmd driver releases any resources. Hence, l2fwd on
> the HOST does not notice the change and keeps trying to receive and
> queue packets to the vhost dev. That is not an issue as long as we don't
> start l2fwd again, for there are actually no packets to forward, and
> rte_vhost_dequeue_burst returns from:
>
>         avail_idx =  *((volatile uint16_t *)&vq->avail->idx);
>
>         /* If there are no available buffers then return. */
>         if (vq->last_used_idx == avail_idx)
>                 return 0;
>
> But at the init stage while starting l2fwd (step 5),
> rte_eal_memory_init() resets all huge page memory to zero, resulting
> in all vq->desc[] items being reset to zero, which in turn ends up with
> secure_len being set to 0 on return.
>
> (BTW, I'm not quite sure why resetting the huge page memory inside the
> VM would result in vq->desc being reset).
>
> The vq desc reset results in a dead loop in virtio_dev_merge_rx(),
> as update_secure_len() keeps setting secure_len to 0:
>
>         do {
>                 avail_idx = *((volatile uint16_t *)&vq->avail->idx);
>                 if (unlikely(res_cur_idx == avail_idx)) {
>                         LOG_DEBUG(VHOST_DATA,
>                                 "(%"PRIu64") Failed "
>                                 "to get enough desc from "
>                                 "vring\n",
>                                 dev->device_fh);
>                         goto merge_rx_exit;
>                 } else {
>                         update_secure_len(vq, res_cur_idx,
>                                 &secure_len, &vec_idx);
>                         res_cur_idx++;
>                 }
>         } while (pkt_len > secure_len);
>
> The dead loop causes vec_idx to keep increasing until it quickly
> overflows, leading to the crash you saw.
>
> So, the following would resolve this issue the right way (I
> guess), and it's for virtio-pmd and l2fwd only so far.
>
> ---
> diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
> index 12fcc23..8d6bf56 100644
> --- a/drivers/net/virtio/virtio_ethdev.c
> +++ b/drivers/net/virtio/virtio_ethdev.c
> @@ -1507,9 +1507,12 @@ static void
>  virtio_dev_stop(struct rte_eth_dev *dev)
>  {
>         struct rte_eth_link link;
> +       struct virtio_hw *hw = dev->data->dev_private;
>
>         PMD_INIT_LOG(DEBUG, "stop");
>
> +       vtpci_reset(hw);
> +
>         if (dev->data->dev_conf.intr_conf.lsc)
>                 rte_intr_disable(&dev->pci_dev->intr_handle);
>
> diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
> index 720fd5a..565f648 100644
> --- a/examples/l2fwd/main.c
> +++ b/examples/l2fwd/main.c
> @@ -44,6 +44,7 @@
>  #include <ctype.h>
>  #include <errno.h>
>  #include <getopt.h>
> +#include <signal.h>
>
>  #include <rte_common.h>
>  #include <rte_log.h>
> @@ -534,14 +535,40 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
>         }
>  }
>
> +static uint8_t nb_ports;
> +static uint8_t nb_ports_available;
> +
> +/* When we receive a INT signal, unregister vhost driver */
> +static void
> +sigint_handler(__rte_unused int signum)
> +{
> +       uint8_t portid;
> +
> +       for (portid = 0; portid < nb_ports; portid++) {
> +               /* skip ports that are not enabled */
> +               if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
> +                       printf("Skipping disabled port %u\n", (unsigned) portid);
> +                       nb_ports_available--;
> +                       continue;
> +               }
> +
> +               /* stopping port */
> +               printf("Stopping port %u... ", (unsigned) portid);
> +               fflush(stdout);
> +               rte_eth_dev_stop(portid);
> +
> +               printf("done: \n");
> +       }
> +
> +        exit(0);
> +}
> +
>  int
>  main(int argc, char **argv)
>  {
>         struct lcore_queue_conf *qconf;
>         struct rte_eth_dev_info dev_info;
>         int ret;
> -       uint8_t nb_ports;
> -       uint8_t nb_ports_available;
>         uint8_t portid, last_port;
>         unsigned lcore_id, rx_lcore_id;
>         unsigned nb_ports_in_mask = 0;
> @@ -688,6 +715,8 @@ main(int argc, char **argv)
>                 /* initialize port stats */
>                 memset(&port_statistics, 0, sizeof(port_statistics));
>         }
> +       signal(SIGINT, sigint_handler);
> +
>
>         if (!nb_ports_available) {
>                 rte_exit(EXIT_FAILURE,
>
>
> ----
>
> And if you think this issue over, you will find it's neither a
> vhost-pmd nor an l2fwd specific issue. I could easily reproduce it with
> the vhost-switch and virtio testpmd combo. The reason behind that would
> be the same: we don't release/stop the resources at stop.
>
> It's kind of a known issue so far, and it's on Zhihong's (cc'ed) TODO
> list to handle them correctly in the next release.
>
>         --yliu


Thanks for looking into this. I agree with your description of the root
cause; it's what I was referring to when I mentioned that the virtqueue
memory is zeroed when the guest app is restarted. I also agree that it's
not specific to l2fwd or the vhost PMD.

When the guest zeroes the avail ring's idx field, the index appears to go
backwards from the perspective of the host. The host then loops up to 2^16
times until res_cur_idx == avail_idx, overflowing the buf_vec array after
the first 256 iterations. No real packet TX is needed.
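
To make the arithmetic concrete, here is a small standalone sketch (not
DPDK code). The names steps_until_equal, overflows_buf_vec and
BUF_VEC_ENTRIES are made up for illustration, and it assumes each loop
iteration consumes one buf_vec entry, which is what I'd expect when the
descriptors are zeroed and have no NEXT flag:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant: the 256-entry buf_vec mentioned above. */
#define BUF_VEC_ENTRIES 256u

/*
 * Count how many times a 16-bit ring index must be incremented before it
 * equals avail_idx. If avail_idx has jumped "backwards" (e.g. to 0), the
 * index only reaches it by wrapping around modulo 2^16.
 */
static inline uint32_t
steps_until_equal(uint16_t res_cur_idx, uint16_t avail_idx)
{
	uint32_t steps = 0;

	while (res_cur_idx != avail_idx) {
		res_cur_idx++;	/* wraps from 65535 to 0 */
		steps++;
	}
	/* A 16-bit index always meets its target within 65535 increments. */
	assert(steps <= UINT16_MAX);
	return steps;
}

/*
 * Assumption: one buf_vec entry is written per iteration, so any run longer
 * than BUF_VEC_ENTRIES writes past the end of the array.
 */
static inline int
overflows_buf_vec(uint32_t steps)
{
	return steps > BUF_VEC_ENTRIES;
}
```

For example, with res_cur_idx at 1000 when the guest zeroes the ring,
steps_until_equal(1000, 0) is 65536 - 1000 = 64536, far more than the 256
entries available.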

I don't think that adding a SIGINT handler is the right solution, though.
The guest app could be killed with another signal (such as SIGKILL, which
cannot be caught). Worse, a malicious or buggy guest could write to just
that field. vhost should not crash no matter what the guest writes into
the virtqueues.
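
As a rough illustration of what I mean by not trusting the guest, here is a
standalone sketch of a bounded version of that loop. It is not the actual
vhost code or the patch from this thread: struct vec_state,
reserve_vec_entry and collect_descs are hypothetical names, and desc_len
stands in for whatever length the guest-written descriptor claims.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constant: the 256-entry buf_vec mentioned above. */
#define BUF_VEC_ENTRIES 256u

/* Hypothetical stand-in for the vec_idx/secure_len pair in the real loop. */
struct vec_state {
	uint32_t vec_idx;	/* next free buf_vec slot */
	uint32_t secure_len;	/* bytes of guest buffer reserved so far */
};

/* Reserve one buf_vec slot; refuse instead of writing past the array. */
static inline int
reserve_vec_entry(struct vec_state *st, uint32_t desc_len)
{
	assert(st != NULL);
	if (st->vec_idx >= BUF_VEC_ENTRIES)
		return -1;
	st->vec_idx++;
	st->secure_len += desc_len;
	return 0;
}

/*
 * Gather descriptors until pkt_len bytes are reserved. Returns 0 on
 * success, -1 if the ring runs out or the buf_vec limit is hit, so a
 * zeroed or hostile ring makes the caller drop the packet, not crash.
 */
static inline int
collect_descs(struct vec_state *st, uint32_t pkt_len, uint16_t res_cur_idx,
	      uint16_t avail_idx, uint32_t desc_len)
{
	while (pkt_len > st->secure_len) {
		if (res_cur_idx == avail_idx)
			return -1;	/* no more available descriptors */
		if (reserve_vec_entry(st, desc_len) < 0)
			return -1;	/* would overflow buf_vec */
		res_cur_idx++;
	}
	return 0;
}
```

With zero-length descriptors (desc_len == 0) and an avail idx that went
backwards, this gives up after 256 entries instead of walking the index
through most of the 16-bit range.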