* [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
@ 2015-11-12 8:02 Rich Lane
2015-11-12 9:23 ` Yuanhan Liu
0 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2015-11-12 8:02 UTC (permalink / raw)
To: dev
The guest could trigger this buffer overflow by creating a cycle of descriptors
(which would also cause an infinite loop). The more common case is that
vq->avail->idx jumps out of the range [last_used_idx, last_used_idx+256). This
happens nearly every time when restarting a DPDK app inside a VM connected to a
vhost-user vswitch because the virtqueue memory allocated by the previous run
is zeroed.
Signed-off-by: Rich Lane <rlane@bigswitch.com>
---
lib/librte_vhost/vhost_rxtx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..d95b478 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -453,7 +453,7 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
vq->buf_vec[vec_id].desc_idx = idx;
vec_id++;
- if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
+ if (vq->desc[idx].flags & VRING_DESC_F_NEXT && vec_id < BUF_VECTOR_MAX) {
idx = vq->desc[idx].next;
next_desc = 1;
}
--
1.9.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-12 8:02 [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len Rich Lane
@ 2015-11-12 9:23 ` Yuanhan Liu
2015-11-12 21:46 ` Rich Lane
0 siblings, 1 reply; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-12 9:23 UTC (permalink / raw)
To: Rich Lane; +Cc: dev
On Thu, Nov 12, 2015 at 12:02:33AM -0800, Rich Lane wrote:
> The guest could trigger this buffer overflow by creating a cycle of descriptors
> (which would also cause an infinite loop). The more common case is that
> vq->avail->idx jumps out of the range [last_used_idx, last_used_idx+256). This
> happens nearly every time when restarting a DPDK app inside a VM connected to a
> vhost-user vswitch because the virtqueue memory allocated by the previous run
> is zeroed.
Hi,
I was already aware of this issue from reading the code. Since we had
never hit it in practice, I delayed the fix (it is still on my TODO
list).
Would you please tell me the steps (commands would be better) to
reproduce your issue? I'd like to know more about the issue: I'm
guessing we may need to fix it with a bit more care.
--yliu
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-12 9:23 ` Yuanhan Liu
@ 2015-11-12 21:46 ` Rich Lane
2015-11-17 13:23 ` Yuanhan Liu
0 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2015-11-12 21:46 UTC (permalink / raw)
To: Yuanhan Liu; +Cc: dev
You can reproduce this with l2fwd and the vhost PMD.
You'll need this patch on top of the vhost PMD patches:
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
return -1;
if (dev->flags & VIRTIO_DEV_RUNNING)
- notify_ops->destroy_device(dev);
+ notify_destroy_device(dev);
cleanup_device(dev);
reset_device(dev);
1. Start l2fwd on the host:
   l2fwd -l 0,1 --vdev eth_null --vdev eth_vhost0,iface=/run/vhost0.sock -- -p3
2. Start a VM using vhost-user and set up uio, hugepages, etc.
3. Start l2fwd inside the VM:
   l2fwd -l 0,1 --vdev eth_null -- -p3
4. Kill the l2fwd inside the VM with SIGINT.
5. Start l2fwd inside the VM.
6. l2fwd on the host crashes.
I found the source of the memory corruption by setting a watchpoint in
gdb: watch -l rte_eth_devices[1].data->rx_queues
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-12 21:46 ` Rich Lane
@ 2015-11-17 13:23 ` Yuanhan Liu
2015-11-17 16:39 ` Rich Lane
0 siblings, 1 reply; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-17 13:23 UTC (permalink / raw)
To: Rich Lane; +Cc: dev
On Thu, Nov 12, 2015 at 01:46:03PM -0800, Rich Lane wrote:
> You can reproduce this with l2fwd and the vhost PMD.
>
> You'll need this patch on top of the vhost PMD patches:
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -471,7 +471,7 @@ reset_owner(struct vhost_device_ctx ctx)
> return -1;
>
> if (dev->flags & VIRTIO_DEV_RUNNING)
> - notify_ops->destroy_device(dev);
> + notify_destroy_device(dev);
>
> cleanup_device(dev);
> reset_device(dev);
>
> 1. Start l2fwd on the host: l2fwd -l 0,1 --vdev eth_null --vdev
> eth_vhost0,iface=/run/vhost0.sock -- -p3
> 2. Start a VM using vhost-user and set up uio, hugepages, etc.
> 3. Start l2fwd inside the VM: l2fwd -l 0,1 --vdev eth_null -- -p3
> 4. Kill the l2fwd inside the VM with SIGINT.
> 5. Start l2fwd inside the VM.
> 6. l2fwd on the host crashes.
>
> I found the source of the memory corruption by setting a watchpoint in
> gdb: watch -l rte_eth_devices[1].data->rx_queues
Rich,
Thanks for the detailed steps for reproducing this issue, and sorry for
being a bit late: I finally got the time to dig into this issue today.
Put simply, the buffer overflow is not the root cause; the fact that "we
do not release resources on stop/exit" is.
Here is how the issue arises. After step 4 (terminating l2fwd), neither
l2fwd nor the virtio PMD releases any resources. Hence, l2fwd on the
host does not notice the change and keeps trying to receive and queue
packets to the vhost dev. This is not a problem as long as we don't
start l2fwd again, because there are actually no packets to forward, and
rte_vhost_dequeue_burst returns from:
    avail_idx = *((volatile uint16_t *)&vq->avail->idx);

    /* If there are no available buffers then return. */
    if (vq->last_used_idx == avail_idx)
        return 0;
But at the init stage while starting l2fwd again (step 5),
rte_eal_memory_init() resets all huge page memory to zero, so all
vq->desc[] items are reset to zero, which in turn leaves secure_len set
to 0 on return.
(BTW, I'm not quite sure why resetting the huge page memory inside the
VM would result in vq->desc being reset.)
The vq desc reset results in a dead loop in virtio_dev_merge_rx(),
as update_secure_len() keeps setting secure_len to 0:
    do {
        avail_idx = *((volatile uint16_t *)&vq->avail->idx);
        if (unlikely(res_cur_idx == avail_idx)) {
            LOG_DEBUG(VHOST_DATA,
                "(%"PRIu64") Failed "
                "to get enough desc from "
                "vring\n",
                dev->device_fh);
            goto merge_rx_exit;
        } else {
            update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
            res_cur_idx++;
        }
    } while (pkt_len > secure_len);
The dead loop makes vec_idx keep increasing until it overflows buf_vec,
which quickly leads to the crash you saw.
So, the following would resolve this issue in the right way (I
guess), though it covers only virtio-pmd and l2fwd so far.
---
diff --git a/drivers/net/virtio/virtio_ethdev.c b/drivers/net/virtio/virtio_ethdev.c
index 12fcc23..8d6bf56 100644
--- a/drivers/net/virtio/virtio_ethdev.c
+++ b/drivers/net/virtio/virtio_ethdev.c
@@ -1507,9 +1507,12 @@ static void
virtio_dev_stop(struct rte_eth_dev *dev)
{
struct rte_eth_link link;
+ struct virtio_hw *hw = dev->data->dev_private;
PMD_INIT_LOG(DEBUG, "stop");
+ vtpci_reset(hw);
+
if (dev->data->dev_conf.intr_conf.lsc)
rte_intr_disable(&dev->pci_dev->intr_handle);
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..565f648 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -44,6 +44,7 @@
#include <ctype.h>
#include <errno.h>
#include <getopt.h>
+#include <signal.h>
#include <rte_common.h>
#include <rte_log.h>
@@ -534,14 +535,40 @@ check_all_ports_link_status(uint8_t port_num, uint32_t port_mask)
}
}
+static uint8_t nb_ports;
+static uint8_t nb_ports_available;
+
+/* When we receive a INT signal, unregister vhost driver */
+static void
+sigint_handler(__rte_unused int signum)
+{
+ uint8_t portid;
+
+ for (portid = 0; portid < nb_ports; portid++) {
+ /* skip ports that are not enabled */
+ if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
+ printf("Skipping disabled port %u\n", (unsigned) portid);
+ nb_ports_available--;
+ continue;
+ }
+
+ /* stopping port */
+ printf("Stopping port %u... ", (unsigned) portid);
+ fflush(stdout);
+ rte_eth_dev_stop(portid);
+
+ printf("done: \n");
+ }
+
+ exit(0);
+}
+
int
main(int argc, char **argv)
{
struct lcore_queue_conf *qconf;
struct rte_eth_dev_info dev_info;
int ret;
- uint8_t nb_ports;
- uint8_t nb_ports_available;
uint8_t portid, last_port;
unsigned lcore_id, rx_lcore_id;
unsigned nb_ports_in_mask = 0;
@@ -688,6 +715,8 @@ main(int argc, char **argv)
/* initialize port stats */
memset(&port_statistics, 0, sizeof(port_statistics));
}
+ signal(SIGINT, sigint_handler);
+
if (!nb_ports_available) {
rte_exit(EXIT_FAILURE,
----
And if you think about this issue again, you will find it's specific to
neither vhost-pmd nor l2fwd. I could easily reproduce it with the
vhost-switch and virtio testpmd combo. The reason is the same: we don't
release/stop the resources at stop.
It's a known issue so far, and it's on the TODO list of Zhihong (cc'ed)
to handle it correctly in the next release.
--yliu
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-17 13:23 ` Yuanhan Liu
@ 2015-11-17 16:39 ` Rich Lane
2015-11-18 2:56 ` Yuanhan Liu
0 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2015-11-17 16:39 UTC (permalink / raw)
To: Yuanhan Liu; +Cc: dev
Thanks for looking into this. I agree with your description of the root
cause; it's what I was referring to when I mentioned that the virtqueue
memory is zeroed when the guest app is restarted. Agreed that it's not
specific to l2fwd/vhost PMD.
When the guest zeroes the avail virtqueue idx it goes backwards from the
perspective of the host. The host then loops up to 2^16 times until
res_cur_idx == avail_idx, overflowing the buf_vec array after the first 256
iterations. No real packet TX is needed.
I don't think that adding a SIGINT handler is the right solution, though.
The guest app could be killed with another signal (SIGKILL). Worse, a
malicious or buggy guest could write to just that field. vhost should not
crash no matter what the guest writes into the virtqueues.
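To make the wraparound arithmetic above concrete, here is a minimal
sketch. The helper names are hypothetical and do not exist in
librte_vhost; the 256-entry window is taken from the range mentioned in
the original patch description.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many times a loop that increments a 16-bit
 * res_cur_idx runs before it equals avail_idx. The subtraction is done
 * modulo 2^16, which is why an avail index that moves "backwards"
 * produces a very large count instead of a negative one.
 */
static uint16_t
sketch_iterations_until_equal(uint16_t res_cur_idx, uint16_t avail_idx)
{
    return (uint16_t)(avail_idx - res_cur_idx);
}

/*
 * Hypothetical sanity check: is avail_idx inside the half-open window
 * [last_used_idx, last_used_idx + window), taking wraparound into account?
 */
static int
sketch_avail_idx_in_window(uint16_t last_used_idx, uint16_t avail_idx,
                           uint16_t window)
{
    return (uint16_t)(avail_idx - last_used_idx) < window;
}
```

For example, with res_cur_idx at 1000 and a zeroed avail idx, the first
helper reports 64536 iterations, far more than a 256-entry buf_vec can
hold, while an index that legitimately wraps from 65530 to 4 is still
inside the window.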
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-17 16:39 ` Rich Lane
@ 2015-11-18 2:56 ` Yuanhan Liu
2015-11-18 5:23 ` Wang, Zhihong
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-18 2:56 UTC (permalink / raw)
To: Rich Lane; +Cc: dev
On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
>
> I don't think that adding a SIGINT handler is the right solution, though. The
> guest app could be killed with another signal (SIGKILL).
Good point.
> Worse, a malicious or
> buggy guest could write to just that field. vhost should not crash no matter
> what the guest writes into the virtqueues.
Yeah, I agree with you: though we could fix this issue on the source
side, we should also add some defenses here.
How about the following patch then?
Note that the vec_id overflow check should be done before vec_id is
used as an index, not after. Hence I moved it ahead.
--yliu
---
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 9322ce6..08f5942 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -132,6 +132,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
/* Get descriptor from available ring */
desc = &vq->desc[head[packet_success]];
+ if (desc->len == 0)
+ break;
buff = pkts[packet_success];
@@ -153,6 +155,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
/* Buffer address translation. */
buff_addr = gpa_to_vva(dev, desc->addr);
} else {
+ if (desc->len < vq->vhost_hlen)
+ break;
vb_offset += vq->vhost_hlen;
hdr = 1;
}
@@ -446,6 +450,9 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
uint32_t vec_id = *vec_idx;
do {
+ if (vec_id >= BUF_VECTOR_MAX)
+ break;
+
next_desc = 0;
len += vq->desc[idx].len;
vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
@@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
goto merge_rx_exit;
} else {
update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
+ if (secure_len == 0)
+ goto merge_rx_exit;
res_cur_idx++;
}
} while (pkt_len > secure_len);
@@ -631,6 +640,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
uint8_t alloc_err = 0;
desc = &vq->desc[head[entry_success]];
+ if (desc->len == 0)
+ break;
/* Discard first buffer as it is the virtio header */
if (desc->flags & VRING_DESC_F_NEXT) {
@@ -638,6 +649,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
vb_offset = 0;
vb_avail = desc->len;
} else {
+ if (desc->len < vq->vhost_hlen)
+ break;
vb_offset = vq->vhost_hlen;
vb_avail = desc->len - vb_offset;
}
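The point about checking vec_id before it is used as an index can be
illustrated with a small standalone model. This is only a sketch under
stated assumptions: the struct, the SKETCH_* constants and the function
name are hypothetical stand-ins, not the librte_vhost definitions, and
the extra ring-size check on the descriptor index is an addition that is
not part of the patch above.

```c
#include <stdint.h>

#define SKETCH_VEC_MAX 4   /* stand-in for BUF_VECTOR_MAX, kept small */
#define SKETCH_F_NEXT  1   /* stand-in for VRING_DESC_F_NEXT */

/* Simplified descriptor: only the fields the walk needs. */
struct sketch_desc {
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/*
 * Walk a descriptor chain starting at head and record each index in vec.
 * The bound on vec_id is tested before the store, so a chain that is too
 * long (or cyclic) stops the walk instead of writing past the array.
 * Returns the number of entries written.
 */
static uint32_t
sketch_fill_vec(const struct sketch_desc *desc, uint16_t ring_size,
                uint16_t head, uint16_t *vec, uint32_t vec_id)
{
    uint32_t written = 0;
    uint16_t idx = head;

    for (;;) {
        /* Check first, then write. */
        if (vec_id >= SKETCH_VEC_MAX || idx >= ring_size)
            break;
        vec[vec_id++] = idx;
        written++;

        if (!(desc[idx].flags & SKETCH_F_NEXT))
            break;
        idx = desc[idx].next;
    }
    return written;
}
```

With a two-entry cycle (0 -> 1 -> 0), the walk stops after filling the
four available slots, and a call that starts with vec_id already at the
limit writes nothing.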
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 2:56 ` Yuanhan Liu
@ 2015-11-18 5:23 ` Wang, Zhihong
2015-11-18 5:26 ` Rich Lane
` (2 subsequent siblings)
3 siblings, 0 replies; 18+ messages in thread
From: Wang, Zhihong @ 2015-11-18 5:23 UTC (permalink / raw)
To: Yuanhan Liu, Rich Lane; +Cc: dev
> -----Original Message-----
> From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com]
> Sent: Wednesday, November 18, 2015 10:57 AM
> To: Rich Lane <rich.lane@bigswitch.com>
> Cc: dev@dpdk.org; Xie, Huawei <huawei.xie@intel.com>; Wang, Zhihong
> <zhihong.wang@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH] vhost: avoid buffer overflow in update_secure_len
>
> On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
> >
> > I don't think that adding a SIGINT handler is the right solution,
> > though. The guest app could be killed with another signal (SIGKILL).
>
> Good point.
>
> > Worse, a malicious or
> > buggy guest could write to just that field. vhost should not crash no
> > matter what the guest writes into the virtqueues.
>
> Yeah, I agree with you: though we could fix this issue in the source side, we also
> should do some defend here.
>
Exactly, DPDK should be able to take care of both ends:
# Provide an interface for resource cleanup
# Be prepared if the app doesn't shut down properly
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 2:56 ` Yuanhan Liu
2015-11-18 5:23 ` Wang, Zhihong
@ 2015-11-18 5:26 ` Rich Lane
2015-11-18 5:32 ` Yuanhan Liu
2015-11-18 6:13 ` Xie, Huawei
2015-11-18 7:53 ` Xie, Huawei
3 siblings, 1 reply; 18+ messages in thread
From: Rich Lane @ 2015-11-18 5:26 UTC (permalink / raw)
To: Yuanhan Liu; +Cc: dev
On Tue, Nov 17, 2015 at 6:56 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:
> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t
> queue_id,
> goto merge_rx_exit;
> } else {
> update_secure_len(vq, res_cur_idx,
> &secure_len, &vec_idx);
> + if (secure_len == 0)
> + goto merge_rx_exit;
> res_cur_idx++;
> }
> } while (pkt_len > secure_len);
>
I think this needs to check whether secure_len was modified. secure_len is
read-write and could have a nonzero value going into the call. It could be
cleaner to give update_secure_len a return value saying whether it was able
to reserve any buffers.
Otherwise looks good, thanks!
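A rough sketch of the return-value idea, using a simplified model rather
than the real vhost structures. All names here are hypothetical; the
array of lengths stands in for the descriptors that successive ring
entries point at.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for update_secure_len(): adds the buffer length
 * to *secure_len and reports whether anything was reserved.
 */
static int
sketch_reserve(uint32_t desc_len, uint32_t *secure_len)
{
    if (desc_len == 0)
        return 0;           /* nothing reserved, *secure_len untouched */
    *secure_len += desc_len;
    return 1;
}

/*
 * Caller-side loop modelled on the do/while in virtio_dev_merge_rx().
 * Returns the number of ring entries reserved, or -1 if the ring ran out
 * or an entry could not be reserved.
 */
static int
sketch_reserve_for_pkt(const uint32_t *desc_len, uint16_t n_avail,
                       uint32_t pkt_len, uint32_t *secure_len)
{
    uint16_t i = 0;

    do {
        if (i == n_avail)
            return -1;      /* not enough descriptors */
        if (!sketch_reserve(desc_len[i], secure_len))
            return -1;      /* test the result, not the accumulator */
        i++;
    } while (pkt_len > *secure_len);

    return i;
}
```

If the second entry is zeroed after a first entry of length 100 was
already reserved, secure_len is 100 at the point of failure, so a
"secure_len == 0" test would not fire, while the return value still
signals the problem.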
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 5:26 ` Rich Lane
@ 2015-11-18 5:32 ` Yuanhan Liu
0 siblings, 0 replies; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-18 5:32 UTC (permalink / raw)
To: Rich Lane; +Cc: dev
On Tue, Nov 17, 2015 at 09:26:57PM -0800, Rich Lane wrote:
> On Tue, Nov 17, 2015 at 6:56 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
>
> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t
> queue_id,
> goto merge_rx_exit;
> } else {
> update_secure_len(vq, res_cur_idx,
> &secure_len, &vec_idx);
> + if (secure_len == 0)
> + goto merge_rx_exit;
> res_cur_idx++;
> }
> } while (pkt_len > secure_len);
>
>
> I think this needs to check whether secure_len was modified. secure_len is
> read-write and could have a nonzero value going into the call. It could be
> cleaner to give update_secure_len a return value saying whether it was able to
> reserve any buffers.
Good suggestion.
--yliu
>
> Otherwise looks good, thanks!
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 2:56 ` Yuanhan Liu
2015-11-18 5:23 ` Wang, Zhihong
2015-11-18 5:26 ` Rich Lane
@ 2015-11-18 6:13 ` Xie, Huawei
2015-11-18 6:25 ` Yuanhan Liu
2015-11-18 15:53 ` Stephen Hemminger
2015-11-18 7:53 ` Xie, Huawei
3 siblings, 2 replies; 18+ messages in thread
From: Xie, Huawei @ 2015-11-18 6:13 UTC (permalink / raw)
To: Yuanhan Liu, Rich Lane; +Cc: dev
On 11/18/2015 10:56 AM, Yuanhan Liu wrote:
> On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
>> I don't think that adding a SIGINT handler is the right solution, though. The
>> guest app could be killed with another signal (SIGKILL).
> Good point.
>
>> Worse, a malicious or
>> buggy guest could write to just that field. vhost should not crash no matter
>> what the guest writes into the virtqueues.
Rich, exactly, that has been on our list for a long time. We should
ensure that "no malicious guest can crash the host through vrings";
otherwise this vhost implementation cannot be deployed in a production
environment.
There are many other known security holes in the current DPDK vhost that
I have in mind. A very simple example is that we don't check the
gpa_to_vva return value, so you could easily put an invalid GPA into a
vring entry to crash vhost.
My plan is to review the vhost implementation, fix all the possible
issues in one single patch set, and make the fix performance
optimization friendly rather than fix them here and there.
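As an illustration of the GPA example, the following is a standalone
sketch of a translation helper whose result the caller can test. It is
not the real gpa_to_vva: the region struct and the function are
hypothetical, and it assumes that a failed translation is reported as 0
(whether the real function behaves that way should be checked against
the source). The length argument is also an assumption added here so
that a buffer straddling the end of a region is rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory region descriptor. */
struct sketch_region {
    uint64_t gpa_start;   /* first guest physical address covered */
    uint64_t gpa_end;     /* one past the last covered address */
    uint64_t offset;      /* added to a GPA to obtain the host VA */
};

/*
 * Translate [gpa, gpa + len) to a host virtual address.
 * Returns 0 if the range is not fully contained in one region.
 */
static uint64_t
sketch_gpa_to_vva(const struct sketch_region *regs, size_t n,
                  uint64_t gpa, uint32_t len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const struct sketch_region *r = &regs[i];

        /* Comparing len against the remaining space avoids overflow. */
        if (gpa >= r->gpa_start && gpa < r->gpa_end &&
            len <= r->gpa_end - gpa)
            return gpa + r->offset;
    }
    return 0;
}
```

A caller would then skip or drop the descriptor when the result is 0
instead of dereferencing it.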
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 6:13 ` Xie, Huawei
@ 2015-11-18 6:25 ` Yuanhan Liu
2015-11-18 8:13 ` Xie, Huawei
2015-11-18 15:53 ` Stephen Hemminger
1 sibling, 1 reply; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-18 6:25 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Wed, Nov 18, 2015 at 06:13:08AM +0000, Xie, Huawei wrote:
> On 11/18/2015 10:56 AM, Yuanhan Liu wrote:
> > On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
> >> I don't think that adding a SIGINT handler is the right solution, though. The
> >> guest app could be killed with another signal (SIGKILL).
> > Good point.
> >
> >> Worse, a malicious or
> >> buggy guest could write to just that field. vhost should not crash no matter
> >> what the guest writes into the virtqueues.
> Rich, exactly, that has been in our list for a long time. We should
> ensure that "Any malicious guest couldn't crash host through vrings"
> otherwise this vhost implementation couldn't be deployed into production
> environment.
> There are many other known security holes in current dpdk vhost in my mind.
> A very simple example is we don't check the gpa_to_vva return value, so
> you could easily put an invalid GPA into a vring entry to crash vhost.
> My plan is to review the vhost implementation, fix all the possible
> issues in one single patch set, and make the fix performance
First of all, there is no way you could find all of them at
once: we simply make mistakes, and may miss something here
and there.
And fixing them in one single patch is not good practice; fixing
them with one issue per patch is. That makes each patch easier to
review and easier to revert if it turns out to be a wrong fix. It is
also friendly to bisect if it breaks something.
--yliu
> optimization friendly rather than fix them here and there.
>
> > Yeah, I agree with you: though we could fix this issue in the source
> > side, we also should do some defend here.
> >
> > How about following patch then?
> >
> > Note that the vec_id overflow check should be done before referencing
> > it, but not after. Hence I moved it ahead.
> >
> > --yliu
> >
> > ---
> > diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> > index 9322ce6..08f5942 100644
> > --- a/lib/librte_vhost/vhost_rxtx.c
> > +++ b/lib/librte_vhost/vhost_rxtx.c
> > @@ -132,6 +132,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> >
> > /* Get descriptor from available ring */
> > desc = &vq->desc[head[packet_success]];
> > + if (desc->len == 0)
> > + break;
> >
> > buff = pkts[packet_success];
> >
> > @@ -153,6 +155,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> > /* Buffer address translation. */
> > buff_addr = gpa_to_vva(dev, desc->addr);
> > } else {
> > + if (desc->len < vq->vhost_hlen)
> > + break;
> > vb_offset += vq->vhost_hlen;
> > hdr = 1;
> > }
> > @@ -446,6 +450,9 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
> > uint32_t vec_id = *vec_idx;
> >
> > do {
> > + if (vec_id >= BUF_VECTOR_MAX)
> > + break;
> > +
> > next_desc = 0;
> > len += vq->desc[idx].len;
> > vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
> > @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
> > goto merge_rx_exit;
> > } else {
> > update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
> > + if (secure_len == 0)
> > + goto merge_rx_exit;
> > res_cur_idx++;
> > }
> > } while (pkt_len > secure_len);
> > @@ -631,6 +640,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> > uint8_t alloc_err = 0;
> >
> > desc = &vq->desc[head[entry_success]];
> > + if (desc->len == 0)
> > + break;
> >
> > /* Discard first buffer as it is the virtio header */
> > if (desc->flags & VRING_DESC_F_NEXT) {
> > @@ -638,6 +649,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> > vb_offset = 0;
> > vb_avail = desc->len;
> > } else {
> > + if (desc->len < vq->vhost_hlen)
> > + break;
> > vb_offset = vq->vhost_hlen;
> > vb_avail = desc->len - vb_offset;
> > }
> >
>
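For illustration, the unchecked gpa_to_vva return value mentioned in the quoted text could be guarded roughly as follows. This is only a sketch under assumptions: `struct mem_region` and `translate_checked` are hypothetical names, not the librte_vhost API, and returning 0 for an unmapped address is a convention chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest memory region; not the real librte_vhost layout. */
struct mem_region {
	uint64_t guest_phys_start; /* first guest-physical address covered */
	uint64_t size;             /* length of the region in bytes */
	uint64_t host_virt_start;  /* where the region is mapped in the host */
};

/*
 * Translate the guest-physical range [gpa, gpa + len) to a host address.
 * Returns 0 when the range is not fully contained in a single region,
 * so the caller can drop the descriptor instead of dereferencing it.
 */
static uint64_t
translate_checked(const struct mem_region *regions, size_t nr_regions,
		  uint64_t gpa, uint64_t len)
{
	for (size_t i = 0; i < nr_regions; i++) {
		const struct mem_region *r = &regions[i];
		uint64_t off;

		if (gpa < r->guest_phys_start)
			continue;
		off = gpa - r->guest_phys_start;
		/* Compare against the remaining space so gpa + len cannot wrap. */
		if (off < r->size && len <= r->size - off)
			return r->host_virt_start + off;
	}
	return 0;
}
```

The point is only that both the start address and the length come from the guest, so both have to be validated before the host touches the buffer.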
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 2:56 ` Yuanhan Liu
` (2 preceding siblings ...)
2015-11-18 6:13 ` Xie, Huawei
@ 2015-11-18 7:53 ` Xie, Huawei
2015-11-18 8:48 ` Yuanhan Liu
3 siblings, 1 reply; 18+ messages in thread
From: Xie, Huawei @ 2015-11-18 7:53 UTC (permalink / raw)
To: Yuanhan Liu, Rich Lane; +Cc: dev
On 11/18/2015 10:56 AM, Yuanhan Liu wrote:
> On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
>> I don't think that adding a SIGINT handler is the right solution, though. The
>> guest app could be killed with another signal (SIGKILL).
> Good point.
>
>> Worse, a malicious or
>> buggy guest could write to just that field. vhost should not crash no matter
>> what the guest writes into the virtqueues.
> Yeah, I agree with you: though we could fix this issue in the source
> side, we also should do some defend here.
>
> How about following patch then?
>
> Note that the vec_id overflow check should be done before referencing
> it, but not after. Hence I moved it ahead.
>
> --yliu
>
> ---
> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
> index 9322ce6..08f5942 100644
> --- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -132,6 +132,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>
> /* Get descriptor from available ring */
> desc = &vq->desc[head[packet_success]];
> + if (desc->len == 0)
> + break;
>
> buff = pkts[packet_success];
>
> @@ -153,6 +155,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
> /* Buffer address translation. */
> buff_addr = gpa_to_vva(dev, desc->addr);
> } else {
> + if (desc->len < vq->vhost_hlen)
> + break;
> vb_offset += vq->vhost_hlen;
> hdr = 1;
> }
> @@ -446,6 +450,9 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
> uint32_t vec_id = *vec_idx;
>
> do {
> + if (vec_id >= BUF_VECTOR_MAX)
> + break;
> +
> next_desc = 0;
> len += vq->desc[idx].len;
> vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
> goto merge_rx_exit;
> } else {
> update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
> + if (secure_len == 0)
> + goto merge_rx_exit;
Why do we exit when secure_len is 0 rather than 1? :) A malicious guest
could easily forge the desc len so that secure_len never reaches pkt_len
even though it is not zero, so the host enters a dead loop here.
Generally speaking, we shouldn't fix only a specific issue, and the
security checks should be as few as possible. We need to consider
refactoring the code here for a generic fix.
> res_cur_idx++;
> }
> } while (pkt_len > secure_len);
> @@ -631,6 +640,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> uint8_t alloc_err = 0;
>
> desc = &vq->desc[head[entry_success]];
> + if (desc->len == 0)
> + break;
>
> /* Discard first buffer as it is the virtio header */
> if (desc->flags & VRING_DESC_F_NEXT) {
> @@ -638,6 +649,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> vb_offset = 0;
> vb_avail = desc->len;
> } else {
> + if (desc->len < vq->vhost_hlen)
> + break;
> vb_offset = vq->vhost_hlen;
> vb_avail = desc->len - vb_offset;
> }
>
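To make the objection above concrete, here is a small standalone sketch (not the vhost code) of why a `secure_len == 0` test is not enough: descriptors of length 1 pass that test but never add up to the packet length. `RING_SIZE` and `entries_needed` are hypothetical names, and bounding the loop by the ring size is one possible termination rule, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u /* hypothetical ring size */

/*
 * Accumulate guest-supplied buffer lengths until pkt_len bytes are covered.
 * `avail` is how many ring entries the guest claims to offer; it is
 * guest-controlled, so the loop is also bounded by RING_SIZE.
 * Returns the number of entries consumed, or -1 if the packet cannot fit.
 */
static int
entries_needed(const uint32_t *desc_len, uint16_t avail, uint32_t pkt_len)
{
	uint64_t secure_len = 0; /* 64-bit so forged lengths cannot wrap it */
	uint16_t tries = 0;

	while (secure_len < pkt_len) {
		if (tries >= avail || tries >= RING_SIZE)
			return -1;
		secure_len += desc_len[tries];
		tries++;
	}
	return tries;
}
```

With every length forged to 1 and a bogus `avail` of 65535, the function gives up after RING_SIZE entries instead of running past the ring.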
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 6:25 ` Yuanhan Liu
@ 2015-11-18 8:13 ` Xie, Huawei
0 siblings, 0 replies; 18+ messages in thread
From: Xie, Huawei @ 2015-11-18 8:13 UTC (permalink / raw)
To: Yuanhan Liu; +Cc: dev
On 11/18/2015 2:25 PM, Yuanhan Liu wrote:
> On Wed, Nov 18, 2015 at 06:13:08AM +0000, Xie, Huawei wrote:
>> On 11/18/2015 10:56 AM, Yuanhan Liu wrote:
>>> On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
>>>> I don't think that adding a SIGINT handler is the right solution, though. The
>>>> guest app could be killed with another signal (SIGKILL).
>>> Good point.
>>>
>>>> Worse, a malicious or
>>>> buggy guest could write to just that field. vhost should not crash no matter
>>>> what the guest writes into the virtqueues.
>> Rich, exactly, that has been in our list for a long time. We should
>> ensure that "Any malicious guest couldn't crash host through vrings"
>> otherwise this vhost implementation couldn't be deployed into production
>> environment.
>> There are many other known security holes in current dpdk vhost in my mind.
>> A very simple example is we don't check the gpa_to_vva return value, so
>> you could easily put an invalid GPA into a vring entry to crash vhost.
>> My plan is to review the vhost implementation, fix all the possible
>> issues in one single patch set, and make the fix performance
> First of all, there is no way you could find all of them out at
> once, for we simply make mistakes, and may miss something here
> and there.
Agree.
>
> And, fixing them in one single patch is not a good practice; fixing
> them with one issue per patch is. That will make patch easier to
> review, yet easier to revert if it's a wrong fix. And it's friendly
> to bisect as well, if it breaks something.
One patch set, not one big patch. Anyway, that isn't the key point.
The key point I want to make is that we should re-review the dpdk vhost
implementation from a security point of view, at a high level.
Otherwise, as I commented in another mail, we add checks here and there,
but the fix isn't a generic one, and some checks could be merged.
>
> --yliu
>
>> optimization friendly rather than fix them here and there.
>>
>>> Yeah, I agree with you: though we could fix this issue in the source
>>> side, we also should do some defend here.
>>>
>>> How about following patch then?
>>>
>>> Note that the vec_id overflow check should be done before referencing
>>> it, but not after. Hence I moved it ahead.
>>>
>>> --yliu
>>>
>>> ---
>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>> index 9322ce6..08f5942 100644
>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>> @@ -132,6 +132,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>>>
>>> /* Get descriptor from available ring */
>>> desc = &vq->desc[head[packet_success]];
>>> + if (desc->len == 0)
>>> + break;
>>>
>>> buff = pkts[packet_success];
>>>
>>> @@ -153,6 +155,8 @@ virtio_dev_rx(struct virtio_net *dev, uint16_t queue_id,
>>> /* Buffer address translation. */
>>> buff_addr = gpa_to_vva(dev, desc->addr);
>>> } else {
>>> + if (desc->len < vq->vhost_hlen)
>>> + break;
>>> vb_offset += vq->vhost_hlen;
>>> hdr = 1;
>>> }
>>> @@ -446,6 +450,9 @@ update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
>>> uint32_t vec_id = *vec_idx;
>>>
>>> do {
>>> + if (vec_id >= BUF_VECTOR_MAX)
>>> + break;
>>> +
>>> next_desc = 0;
>>> len += vq->desc[idx].len;
>>> vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
>>> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
>>> goto merge_rx_exit;
>>> } else {
>>> update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
>>> + if (secure_len == 0)
>>> + goto merge_rx_exit;
>>> res_cur_idx++;
>>> }
>>> } while (pkt_len > secure_len);
>>> @@ -631,6 +640,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>>> uint8_t alloc_err = 0;
>>>
>>> desc = &vq->desc[head[entry_success]];
>>> + if (desc->len == 0)
>>> + break;
>>>
>>> /* Discard first buffer as it is the virtio header */
>>> if (desc->flags & VRING_DESC_F_NEXT) {
>>> @@ -638,6 +649,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>>> vb_offset = 0;
>>> vb_avail = desc->len;
>>> } else {
>>> + if (desc->len < vq->vhost_hlen)
>>> + break;
>>> vb_offset = vq->vhost_hlen;
>>> vb_avail = desc->len - vb_offset;
>>> }
>>>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 7:53 ` Xie, Huawei
@ 2015-11-18 8:48 ` Yuanhan Liu
2015-11-18 11:15 ` Xie, Huawei
0 siblings, 1 reply; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-18 8:48 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Wed, Nov 18, 2015 at 07:53:24AM +0000, Xie, Huawei wrote:
...
> > do {
> > + if (vec_id >= BUF_VECTOR_MAX)
> > + break;
> > +
> > next_desc = 0;
> > len += vq->desc[idx].len;
> > vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
> > @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
> > goto merge_rx_exit;
> > } else {
> > update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
> > + if (secure_len == 0)
> > + goto merge_rx_exit;
> Why do we exit when secure_len is 0 rather than 1? :). Malicious guest
I confess it's not a proper fix. Making it return an error code, as Rich
suggested in an earlier email, is better. It's generic enough, as we have to
check the vec_buf overflow here.
BTW, can we move the vec_buf outside `struct vhost_virtqueue'? It makes
the structure huge.
> could easily forge the desc len so that secure_len never reach pkt_len
> even it is not zero so that host enters into dead loop here.
> Generally speaking, we shouldn't fix for a specific issue,
Agreed.
> and the
> security checks should be as few as possible.
Ideally, yes.
> We need to consider
> refactor the code here for the generic fix.
What are your thoughts?
--yliu
>
> > res_cur_idx++;
> > }
> > } while (pkt_len > secure_len);
> > @@ -631,6 +640,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> > uint8_t alloc_err = 0;
> >
> > desc = &vq->desc[head[entry_success]];
> > + if (desc->len == 0)
> > + break;
> >
> > /* Discard first buffer as it is the virtio header */
> > if (desc->flags & VRING_DESC_F_NEXT) {
> > @@ -638,6 +649,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
> > vb_offset = 0;
> > vb_avail = desc->len;
> > } else {
> > + if (desc->len < vq->vhost_hlen)
> > + break;
> > vb_offset = vq->vhost_hlen;
> > vb_avail = desc->len - vb_offset;
> > }
> >
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 8:48 ` Yuanhan Liu
@ 2015-11-18 11:15 ` Xie, Huawei
2015-11-19 5:51 ` Yuanhan Liu
0 siblings, 1 reply; 18+ messages in thread
From: Xie, Huawei @ 2015-11-18 11:15 UTC (permalink / raw)
To: Yuanhan Liu; +Cc: dev
On 11/18/2015 4:47 PM, Yuanhan Liu wrote:
> On Wed, Nov 18, 2015 at 07:53:24AM +0000, Xie, Huawei wrote:
> ...
>>> do {
>>> + if (vec_id >= BUF_VECTOR_MAX)
>>> + break;
>>> +
>>> next_desc = 0;
>>> len += vq->desc[idx].len;
>>> vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
>>> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
>>> goto merge_rx_exit;
>>> } else {
>>> update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
>>> + if (secure_len == 0)
>>> + goto merge_rx_exit;
>> Why do we exit when secure_len is 0 rather than 1? :). Malicious guest
> I confess it's not a proper fix. Making it return an error code, as Rich
> suggested in early email, is better. It's generic enough, as we have to
> check the vec_buf overflow here.
>
> BTW, can we move the vec_buf outside `struct vhost_virtqueue'? It makes
> the structure huge.
>
>> could easily forge the desc len so that secure_len never reach pkt_len
>> even it is not zero so that host enters into dead loop here.
>> Generally speaking, we shouldn't fix for a specific issue,
> Agreed.
>
>> and the
>> security checks should be as few as possible.
> Ideally, yes.
>
>> We need to consider
>> refactor the code here for the generic fix.
> What are your thoughts?
Maybe we merge update_secure_len with the outer loop into a simple
inline function, in which we consider both the max vector number and the
desc count to avoid being trapped in a dead loop. This function returns a buf
vec with which we could copy securely afterwards.
>
> --yliu
>>> res_cur_idx++;
>>> }
>>> } while (pkt_len > secure_len);
>>> @@ -631,6 +640,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>>> uint8_t alloc_err = 0;
>>>
>>> desc = &vq->desc[head[entry_success]];
>>> + if (desc->len == 0)
>>> + break;
>>>
>>> /* Discard first buffer as it is the virtio header */
>>> if (desc->flags & VRING_DESC_F_NEXT) {
>>> @@ -638,6 +649,8 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>>> vb_offset = 0;
>>> vb_avail = desc->len;
>>> } else {
>>> + if (desc->len < vq->vhost_hlen)
>>> + break;
>>> vb_offset = vq->vhost_hlen;
>>> vb_avail = desc->len - vb_offset;
>>> }
>>>
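The idea proposed in this reply, a single helper bounded by both the max vector number and the desc count, could look roughly like the sketch below. It is an assumption-laden illustration rather than the eventual patch: `struct desc`, `struct vec`, `walk_chain` and `VEC_MAX` are stand-ins for the real vring descriptor, buf_vec entry and BUF_VECTOR_MAX.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_F_NEXT 1u   /* same value as VRING_DESC_F_NEXT */
#define VEC_MAX     256u /* stand-in for BUF_VECTOR_MAX */

struct desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vec  { uint64_t addr; uint32_t len; uint16_t desc_idx; };

/*
 * Walk one descriptor chain starting at `head`, appending to `vec`.
 * Returns 0 on success. Returns -1, leaving *vec_idx and *total_len
 * untouched, if an index is out of range, the chain is longer than the
 * descriptor table (which implies a cycle), or the vector is full.
 */
static int
walk_chain(const struct desc *table, uint16_t nr_desc, uint16_t head,
	   struct vec *vec, uint32_t *vec_idx, uint64_t *total_len)
{
	uint32_t id = *vec_idx;
	uint64_t len = *total_len;
	uint16_t idx = head;
	uint16_t seen = 0;

	for (;;) {
		/* All three bounds are checked before anything is written. */
		if (idx >= nr_desc || seen >= nr_desc || id >= VEC_MAX)
			return -1;
		vec[id].addr = table[idx].addr;
		vec[id].len = table[idx].len;
		vec[id].desc_idx = idx;
		len += table[idx].len;
		id++;
		seen++;
		if (!(table[idx].flags & DESC_F_NEXT))
			break;
		idx = table[idx].next;
	}
	*vec_idx = id;
	*total_len = len;
	return 0;
}
```

Counting visited descriptors against the table size is a cheap way to catch a cycle without extra state, since no valid chain can be longer than the table.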
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 6:13 ` Xie, Huawei
2015-11-18 6:25 ` Yuanhan Liu
@ 2015-11-18 15:53 ` Stephen Hemminger
2015-11-18 16:00 ` Xie, Huawei
1 sibling, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2015-11-18 15:53 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Wed, 18 Nov 2015 06:13:08 +0000
"Xie, Huawei" <huawei.xie@intel.com> wrote:
> On 11/18/2015 10:56 AM, Yuanhan Liu wrote:
> > On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
> >> I don't think that adding a SIGINT handler is the right solution, though. The
> >> guest app could be killed with another signal (SIGKILL).
> > Good point.
> >
> >> Worse, a malicious or
> >> buggy guest could write to just that field. vhost should not crash no matter
> >> what the guest writes into the virtqueues.
> Rich, exactly, that has been in our list for a long time. We should
> ensure that "Any malicious guest couldn't crash host through vrings"
> otherwise this vhost implementation couldn't be deployed into production
> environment.
> There are many other known security holes in current dpdk vhost in my mind.
> A very simple example is we don't check the gpa_to_vva return value, so
> you could easily put an invalid GPA into a vring entry to crash vhost.
> My plan is to review the vhost implementation, fix all the possible
> issues in one single patch set, and make the fix performance
> optimization friendly rather than fix them here and there.
>
Both virtio and vhost need to adopt the "other side is broken" flag
model that is in Linux drivers. What this means is that the virtio
and vhost driver would check parameters for consistency, and if out
of bounds set a broken flag and refuse to do anything more with the
device until reset.
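A minimal sketch of that "broken" flag model, applied to the avail index range problem from the original patch description, might look like this. The names `struct queue`, `avail_entries` and `queue_reset` are hypothetical and chosen for this example; they are not taken from the Linux drivers or from DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue state; field names are illustrative only. */
struct queue {
	uint16_t size;          /* number of ring entries */
	uint16_t last_used_idx; /* next entry this side will consume */
	bool broken;            /* set once the peer is seen misbehaving */
};

/*
 * Return how many entries the peer has made available, given the
 * peer-written avail index. If the index is outside
 * [last_used_idx, last_used_idx + size], mark the queue broken and
 * report nothing; a broken queue keeps reporting nothing until reset.
 */
static uint16_t
avail_entries(struct queue *q, uint16_t avail_idx)
{
	uint16_t n;

	if (q->broken)
		return 0;
	/* Indices are free-running 16-bit counters, so subtract modulo 2^16. */
	n = (uint16_t)(avail_idx - q->last_used_idx);
	if (n > q->size) {
		q->broken = true;
		return 0;
	}
	return n;
}

/* Device reset clears the flag and restarts the indices. */
static void
queue_reset(struct queue *q)
{
	q->last_used_idx = 0;
	q->broken = false;
}
```

The design choice is that one consistency check at the entry point replaces many scattered ones: once the flag is set, the data path does nothing until the device is reset.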
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 15:53 ` Stephen Hemminger
@ 2015-11-18 16:00 ` Xie, Huawei
0 siblings, 0 replies; 18+ messages in thread
From: Xie, Huawei @ 2015-11-18 16:00 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
On 11/18/2015 11:53 PM, Stephen Hemminger wrote:
> On Wed, 18 Nov 2015 06:13:08 +0000
> "Xie, Huawei" <huawei.xie@intel.com> wrote:
>
>> On 11/18/2015 10:56 AM, Yuanhan Liu wrote:
>>> On Tue, Nov 17, 2015 at 08:39:30AM -0800, Rich Lane wrote:
>>>> I don't think that adding a SIGINT handler is the right solution, though. The
>>>> guest app could be killed with another signal (SIGKILL).
>>> Good point.
>>>
>>>> Worse, a malicious or
>>>> buggy guest could write to just that field. vhost should not crash no matter
>>>> what the guest writes into the virtqueues.
>> Rich, exactly, that has been in our list for a long time. We should
>> ensure that "Any malicious guest couldn't crash host through vrings"
>> otherwise this vhost implementation couldn't be deployed into production
>> environment.
>> There are many other known security holes in current dpdk vhost in my mind.
>> A very simple example is we don't check the gpa_to_vva return value, so
>> you could easily put an invalid GPA into a vring entry to crash vhost.
>> My plan is to review the vhost implementation, fix all the possible
>> issues in one single patch set, and make the fix performance
>> optimization friendly rather than fix them here and there.
>>
> Both virtio and vhost need to adopt the "other side is broken" flag
> model that is in Linux drivers. What this means is that the virtio
> and vhost driver would check parameters for consistency, and if out
> of bounds set a broken flag and refuse to do anything more with the
> device until reset.
Stephen:
You raise an important point.
The current DPDK virtio driver implementation chooses to trust vhost, so it
doesn't do any consistency checks.
What is the reason that the virtio driver also needs consistency checks? Is
it that vhost might be buggy, or that vhost might also not be trusted in
some use cases?
/huawei
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len
2015-11-18 11:15 ` Xie, Huawei
@ 2015-11-19 5:51 ` Yuanhan Liu
0 siblings, 0 replies; 18+ messages in thread
From: Yuanhan Liu @ 2015-11-19 5:51 UTC (permalink / raw)
To: Xie, Huawei; +Cc: dev
On Wed, Nov 18, 2015 at 11:15:25AM +0000, Xie, Huawei wrote:
> On 11/18/2015 4:47 PM, Yuanhan Liu wrote:
> > On Wed, Nov 18, 2015 at 07:53:24AM +0000, Xie, Huawei wrote:
> > ...
> >>> do {
> >>> + if (vec_id >= BUF_VECTOR_MAX)
> >>> + break;
> >>> +
> >>> next_desc = 0;
> >>> len += vq->desc[idx].len;
> >>> vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
> >>> @@ -519,6 +526,8 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
> >>> goto merge_rx_exit;
> >>> } else {
> >>> update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
> >>> + if (secure_len == 0)
> >>> + goto merge_rx_exit;
> >> Why do we exit when secure_len is 0 rather than 1? :). Malicious guest
> > I confess it's not a proper fix. Making it return an error code, as Rich
> > suggested in early email, is better. It's generic enough, as we have to
> > check the vec_buf overflow here.
> >
> > BTW, can we move the vec_buf outside `struct vhost_virtqueue'? It makes
> > the structure huge.
> >
> >> could easily forge the desc len so that secure_len never reach pkt_len
> >> even it is not zero so that host enters into dead loop here.
> >> Generally speaking, we shouldn't fix for a specific issue,
> > Agreed.
> >
> >> and the
> >> security checks should be as few as possible.
> > Ideally, yes.
> >
> >> We need to consider
> >> refactor the code here for the generic fix.
> > What are your thoughts?
> Maybe we merge update_secure_len with the outer loop into a simple
> inline function, in which we consider both the max vector number and the
> desc count to avoid being trapped in a dead loop. This function returns a buf
> vec with which we could copy securely afterwards.
I agree that grouping them into a function makes the logic clearer, and
hence less error-prone.
I made a quick try. Comments?
--yliu
---
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 4fc35d1..e270fb1 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -439,32 +439,98 @@ copy_from_mbuf_to_vring(struct virtio_net *dev, uint32_t queue_id,
return entry_success;
}
-static inline void __attribute__((always_inline))
-update_secure_len(struct vhost_virtqueue *vq, uint32_t id,
- uint32_t *secure_len, uint32_t *vec_idx)
+static inline int
+fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
+ uint32_t *allocated, uint32_t *vec_idx)
{
- uint16_t wrapped_idx = id & (vq->size - 1);
- uint32_t idx = vq->avail->ring[wrapped_idx];
- uint8_t next_desc;
- uint32_t len = *secure_len;
+ uint16_t idx = vq->avail->ring[avail_idx & (vq->size - 1)];
uint32_t vec_id = *vec_idx;
+ uint32_t len = *allocated;
+
+ while (1) {
+ if (vec_id >= BUF_VECTOR_MAX)
+ return -1;
- do {
- next_desc = 0;
len += vq->desc[idx].len;
vq->buf_vec[vec_id].buf_addr = vq->desc[idx].addr;
vq->buf_vec[vec_id].buf_len = vq->desc[idx].len;
vq->buf_vec[vec_id].desc_idx = idx;
vec_id++;
- if (vq->desc[idx].flags & VRING_DESC_F_NEXT) {
- idx = vq->desc[idx].next;
- next_desc = 1;
+ if ((vq->desc[idx].flags & VRING_DESC_F_NEXT) == 0)
+ break;
+
+ idx = vq->desc[idx].next;
+ }
+
+ *allocated = len;
+ *vec_idx = vec_id;
+
+ return 0;
+}
+
+/*
+ * As many data cores may want to access available buffers concurrently,
+ * they need to be reserved.
+ *
+ * Returns -1 on fail, 0 on success
+ */
+static inline int
+reserve_avail_buf(struct vhost_virtqueue *vq, uint32_t size,
+ uint16_t *start, uint16_t *end)
+{
+ uint16_t res_base_idx;
+ uint16_t res_cur_idx;
+ uint16_t avail_idx;
+ uint32_t allocated;
+ uint32_t vec_idx;
+ uint16_t tries;
+
+again:
+ res_base_idx = vq->last_used_idx_res;
+ res_cur_idx = res_base_idx;
+
+ allocated = 0;
+ vec_idx = 0;
+ tries = 0;
+ while (1) {
+ avail_idx = *((volatile uint16_t *)&vq->avail->idx);
+ if (unlikely(res_cur_idx == avail_idx)) {
+ LOG_DEBUG(VHOST_DATA, "(%"PRIu64") Failed "
+ "to get enough desc from vring\n",
+ dev->device_fh);
+ return -1;
}
- } while (next_desc);
- *secure_len = len;
- *vec_idx = vec_id;
+ if (fill_vec_buf(vq, res_cur_idx, &allocated, &vec_idx) < 0)
+ return -1;
+
+ res_cur_idx++;
+ tries++;
+
+ if (allocated >= size)
+ break;
+
+ /*
+ * if we tried all available ring items, and still
+ * can't get enough buf, it means something abnormal
+ * happened.
+ */
+ if (tries >= vq->size)
+ return -1;
+ }
+
+ /*
+ * update vq->last_used_idx_res atomically.
+ * retry again if failed.
+ */
+ if (rte_atomic16_cmpset(&vq->last_used_idx_res,
+ res_base_idx, res_cur_idx) == 0)
+ goto again;
+
+ *start = res_base_idx;
+ *end = res_cur_idx;
+ return 0;
}
/*
@@ -476,9 +542,7 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
{
struct vhost_virtqueue *vq;
uint32_t pkt_idx = 0, entry_success = 0;
- uint16_t avail_idx;
- uint16_t res_base_idx, res_cur_idx;
- uint8_t success = 0;
+ uint16_t start, end;
LOG_DEBUG(VHOST_DATA, "(%"PRIu64") virtio_dev_merge_rx()\n",
dev->device_fh);
@@ -501,40 +565,11 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
for (pkt_idx = 0; pkt_idx < count; pkt_idx++) {
uint32_t pkt_len = pkts[pkt_idx]->pkt_len + vq->vhost_hlen;
- do {
- /*
- * As many data cores may want access to available
- * buffers, they need to be reserved.
- */
- uint32_t secure_len = 0;
- uint32_t vec_idx = 0;
-
- res_base_idx = vq->last_used_idx_res;
- res_cur_idx = res_base_idx;
-
- do {
- avail_idx = *((volatile uint16_t *)&vq->avail->idx);
- if (unlikely(res_cur_idx == avail_idx)) {
- LOG_DEBUG(VHOST_DATA,
- "(%"PRIu64") Failed "
- "to get enough desc from "
- "vring\n",
- dev->device_fh);
- goto merge_rx_exit;
- } else {
- update_secure_len(vq, res_cur_idx, &secure_len, &vec_idx);
- res_cur_idx++;
- }
- } while (pkt_len > secure_len);
-
- /* vq->last_used_idx_res is atomically updated. */
- success = rte_atomic16_cmpset(&vq->last_used_idx_res,
- res_base_idx,
- res_cur_idx);
- } while (success == 0);
+ if (reserve_avail_buf(vq, pkt_len, &start, &end) < 0)
+ break;
entry_success = copy_from_mbuf_to_vring(dev, queue_id,
- res_base_idx, res_cur_idx, pkts[pkt_idx]);
+ start, end, pkts[pkt_idx]);
rte_compiler_barrier();
@@ -542,14 +577,13 @@ virtio_dev_merge_rx(struct virtio_net *dev, uint16_t queue_id,
* Wait until it's our turn to add our buffer
* to the used ring.
*/
- while (unlikely(vq->last_used_idx != res_base_idx))
+ while (unlikely(vq->last_used_idx != start))
rte_pause();
*(volatile uint16_t *)&vq->used->idx += entry_success;
- vq->last_used_idx = res_cur_idx;
+ vq->last_used_idx = end;
}
-merge_rx_exit:
if (likely(pkt_idx)) {
/* flush used->idx update before we read avail->flags. */
rte_mb();
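For readers unfamiliar with the reserve-then-compare-and-set pattern that reserve_avail_buf keeps from the old loop, here is a simplified standalone sketch using C11 atomics instead of rte_atomic16_cmpset. It assumes a fixed `count` per reservation, whereas the patch above recomputes the range by walking the descriptors again on every retry; `reserve_slots` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Reserve `count` consecutive slots from a shared 16-bit cursor.
 * Returns the first reserved index. The cursor wraps modulo 2^16,
 * like the free-running vring indices.
 */
static uint16_t
reserve_slots(_Atomic uint16_t *cursor, uint16_t count)
{
	uint16_t base = atomic_load(cursor);

	/*
	 * If another core moved the cursor first, the exchange fails,
	 * `base` is refreshed with the current value, and we retry.
	 */
	while (!atomic_compare_exchange_weak(cursor, &base,
					     (uint16_t)(base + count)))
		;
	return base;
}
```

Each caller ends up owning the half-open range [base, base + count), which corresponds to the `start` and `end` values the patch hands to copy_from_mbuf_to_vring.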
^ permalink raw reply [flat|nested] 18+ messages in thread
Thread overview: 18+ messages
2015-11-12 8:02 [dpdk-dev] [PATCH] vhost: avoid buffer overflow in update_secure_len Rich Lane
2015-11-12 9:23 ` Yuanhan Liu
2015-11-12 21:46 ` Rich Lane
2015-11-17 13:23 ` Yuanhan Liu
2015-11-17 16:39 ` Rich Lane
2015-11-18 2:56 ` Yuanhan Liu
2015-11-18 5:23 ` Wang, Zhihong
2015-11-18 5:26 ` Rich Lane
2015-11-18 5:32 ` Yuanhan Liu
2015-11-18 6:13 ` Xie, Huawei
2015-11-18 6:25 ` Yuanhan Liu
2015-11-18 8:13 ` Xie, Huawei
2015-11-18 15:53 ` Stephen Hemminger
2015-11-18 16:00 ` Xie, Huawei
2015-11-18 7:53 ` Xie, Huawei
2015-11-18 8:48 ` Yuanhan Liu
2015-11-18 11:15 ` Xie, Huawei
2015-11-19 5:51 ` Yuanhan Liu