From: "Xie, Huawei"
To: Patrik Andersson R, "dev@dpdk.org"
Date: Thu, 17 Mar 2016 01:35:09 +0000
References: <56E956F5.6080606@ericsson.com>
Subject: Re: [dpdk-dev] vhost: no protection against malformed queue descriptors in rte_vhost_dequeue_burst()

On 3/16/2016 8:53 PM,
Patrik Andersson R wrote:
> Hello,
>
> When taking a snapshot of a running VM instance, using OpenStack
> "nova image-create", I noticed that one OVS pmd-thread eventually
> failed in DPDK rte_vhost_dequeue_burst() with repeating log entries:
>
>    compute-0-6 ovs-vswitchd[38172]: VHOST_DATA: Failed to allocate memory for mbuf.
>
>
> Debugging this issue (data included further down) led to the
> observation that there is no protection against malformed vhost
> queue descriptors. Tenant separation might therefore be violated, as a
> single faulty VM might bring down the connectivity of all VMs
> connected to the same virtual switch.
>
> To avoid this, validation would be needed at some points in the
> rte_vhost_dequeue_burst() code:
>
> 1) when the queue descriptor is picked up for processing,
>    desc->flags and desc->len might both be 0
>
>     ...
>     desc = &vq->desc[head[entry_success]];
>     ...
>     /* Discard first buffer as it is the virtio header */
>     if (desc->flags & VRING_DESC_F_NEXT) {
>         desc = &vq->desc[desc->next];
>         vb_offset = 0;
>         vb_avail = desc->len;
>     } else {
>         vb_offset = vq->vhost_hlen;
>         vb_avail = desc->len - vb_offset;
>     }
>     ....
>
> 2) at buffer address translation, gpa_to_vva() might fail,
>    returning NULL as indication
>
>     vb_addr = gpa_to_vva(dev, desc->addr);
>     ...
>     while (cpy_len != 0) {
>         rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *, seg_offset),
>             (void *)((uintptr_t)(vb_addr + vb_offset)),
>             cpy_len);
>         ...
>     }
>     ...
>
>
> Are there any plans to add some kind of validation in DPDK, or
> would it be useful to suggest a specific implementation of
> such validations in the DPDK code?
>
> Or is there some mechanism that gives us the confidence to trust
> the vhost queue content absolutely?
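
The two validation points listed above could be sketched roughly as
follows. This is a standalone illustration, not the DPDK code: the
names vring_desc_sketch, guest_region_sketch, gpa_range_ok() and
desc_is_sane() are hypothetical, a single guest memory region stands in
for the real region table, and the range check only plays the role that
a NULL return from gpa_to_vva() would play. The descriptor field layout
and the F_NEXT flag value follow the virtio specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout of a split-ring descriptor (hypothetical type name). */
struct vring_desc_sketch {
    uint64_t addr;   /* guest-physical buffer address */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* descriptor flags */
    uint16_t next;   /* next descriptor index, valid when F_NEXT is set */
};

#define SKETCH_DESC_F_NEXT 1u  /* same value as VRING_DESC_F_NEXT */

/* Hypothetical stand-in for one guest memory region mapped to the device. */
struct guest_region_sketch {
    uint64_t gpa_start;
    uint64_t size;
};

/* True when [gpa, gpa + len) lies entirely inside the region. */
static bool gpa_range_ok(const struct guest_region_sketch *r,
                         uint64_t gpa, uint32_t len)
{
    if (gpa < r->gpa_start)
        return false;
    uint64_t off = gpa - r->gpa_start;
    /* Written so that neither comparison can overflow. */
    return off <= r->size && len <= r->size - off;
}

/*
 * Assumed policy: reject a descriptor before any of its fields are used
 * for arithmetic or copying.
 *  - a chained descriptor must point inside the ring;
 *  - an unchained descriptor must at least hold the virtio header,
 *    otherwise "len - hdr_len" would wrap;
 *  - the buffer must fall inside known guest memory.
 */
static bool desc_is_sane(const struct vring_desc_sketch *desc,
                         uint16_t vq_size, uint32_t hdr_len,
                         const struct guest_region_sketch *r)
{
    assert(desc != NULL && r != NULL);

    if (desc->flags & SKETCH_DESC_F_NEXT) {
        if (desc->next >= vq_size)
            return false;
    } else if (desc->len < hdr_len) {
        return false;
    }
    return gpa_range_ok(r, desc->addr, desc->len);
}
```

With a 12-byte header, the all-zero descriptor from the report would be
rejected by the length check even though address 0 is inside the mapped
region, so the packet could be dropped instead of being segmented
without bound.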
>
>
>
> Debugging data:
>
> For my scenario the problem occurs in DPDK rte_vhost_dequeue_burst()
> due to use of a vhost queue descriptor that has all fields 0:
>
>     (gdb) print *desc
>     {addr = 0, len = 0, flags = 0, next = 0}
>
>
> Subsequent use of desc->len to compute vb_avail = desc->len - vb_offset
> leads to the problem observed. What happens is that the packet needs to
> be segmented -- on my system it fails roughly at segment 122000, when
> the memory available for mbufs runs out.
>
> The relevant local variables for rte_vhost_dequeue_burst() when breaking
> on the condition desc->len == 0:
>
>     vb_avail = 4294967284 (0xfffffff4)
>     seg_avail = 2608
>     vb_offset = 12
>     cpy_len = 2608
>     seg_num = 1
>     desc = 0x2aadb6e5c000
>     vb_addr = 46928960159744
>     entry_success = 0
>
> Note also that there is no crash despite desc->addr being zero;
> it is a valid address in the regions mapped to the device. However, the
> 3 mapped regions do not seem to be correct either at this stage.
>
>
> The versions that I'm running are OVS 2.4.0, with corrections from the
> 2.4 branch, and DPDK 2.1.0. QEMU emulator version 2.2.0 and
> libvirt version 1.2.12.
>
>
> Regards,
>
> Patrik

Thanks Patrik. You are right. We had planned to enhance the robustness
of vhost so that neither a malicious nor a buggy guest virtio driver could
corrupt vhost. Actually, 16.04 RC1 has already fixed some issues (for
example, the return value of gpa_to_vva was not being checked).
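
For reference, the vb_avail value in the debugging data is exactly what
unsigned 32-bit subtraction gives for 0 - 12. The sketch below
reproduces that and shows one guarded alternative. It assumes the
variables are 32-bit unsigned (which the printed value suggests); the
function names are hypothetical and not part of DPDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked form, mirroring "vb_avail = desc->len - vb_offset".
 * Unsigned arithmetic wraps modulo 2^32 when desc_len < vb_offset. */
static uint32_t vb_avail_unchecked(uint32_t desc_len, uint32_t vb_offset)
{
    return desc_len - vb_offset;
}

/* Guarded form: reports failure instead of wrapping, so the caller
 * can drop the descriptor rather than copy a bogus ~4 GiB buffer. */
static bool vb_avail_checked(uint32_t desc_len, uint32_t vb_offset,
                             uint32_t *out)
{
    assert(out != NULL);
    if (desc_len < vb_offset)
        return false;
    *out = desc_len - vb_offset;
    return true;
}
```

With desc_len = 0 and vb_offset = 12, the unchecked form yields
4294967284 (0xfffffff4), matching the gdb output above, while the
guarded form returns false and leaves the output untouched.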