From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <77f63478-88d1-01b2-0cdd-c93289bd42a6@suse.de>
Date: Wed, 28 Sep 2022 17:21:33 +0200
Subject: Re: [PATCH v3 1/2] vhost: check for nr_vec == 0 in desc_to_mbuf, mbuf_to_desc
From: Claudio Fontana
To: Maxime Coquelin, Chenbo Xia
Cc: dev@dpdk.org
References: <20220802004938.23670-1-cfontana@suse.de> <20220802004938.23670-2-cfontana@suse.de> <228bf109-57c2-91f0-ab2c-6ac92d688781@redhat.com>
In-Reply-To: <228bf109-57c2-91f0-ab2c-6ac92d688781@redhat.com>

On 9/28/22 16:37, Maxime Coquelin wrote:
> Hi Claudio,
>
> The title should be reworded, maybe something like below?
> "vhost: fix possible out of bound access in buffer vectors"

Possible; I leave it to you and the other maintainers to figure out.

>
> On 8/2/22 02:49, Claudio Fontana wrote:
>> In virtio_dev_tx_split we cannot currently call desc_to_mbuf with
>> nr_vec == 0, or we end up trying to rte_memcpy from a source address
>> buf_vec[0] that is an uninitialized stack variable.
>>
>> Improve this in general by having desc_to_mbuf and mbuf_to_desc
>> return -1 when called with an invalid nr_vec == 0, which should
>> fix any other instance of this problem.
>>
>> This should fix errors that have been reported on multiple occasions
>> from telcos to the DPDK, OVS and QEMU projects, as this affects in
>> particular the openvswitch/DPDK, QEMU vhost-user setup when the
>> guest DPDK application abruptly goes away via SIGKILL and then
>> reconnects.
>
> Looking at fill_vec_buf_split() and map_one_desc(), I suspect that on
> guest DPDK application kill/crash, the shared virtqueue memory gets
> unmapped on the guest side, and the backend reads zeros in the
> descriptors, and thus zero lengths.
>
> If that's the case, then your fix might not be enough: if for example
> the backend reads the .next field of the desc as 0, it may cause
> undefined behaviour. I'm not sure how we could address this, though.
>
> Maybe we miss something on the Vhost side to detect that the guest
> application has crashed, because we should not try to access the
> vrings until the new guest app instance is done with its
> initialization.
>
> I'm going to try to reproduce the issue, but I'm not sure I will
> succeed. Could you please share the Vhost logs when the issue is
> reproduced and you face the crash?

With vacations and lab work, it's unlikely anything can be done from my side for the next 2-3 weeks.

This issue has been reported multiple times by multiple telco customers for about a year; it's all over the mailing lists between OVS, DPDK and QEMU, with all the details.

Something in the governance of these Open Source projects is clearly not working right, probably too much inward focus between a small number of companies, but I digress.

I think Chenbo Xia already knows the context, and I suspect this is now considered a "security issue". The problem is, the information about all of this has already been public for a year.

I will again repost how to reproduce it here:

In general, the easiest way to reproduce this is to use an older DPDK in the guest (16.11.x, as per the Ubuntu package) and run a testpmd application with the simplest possible configuration.

I could reduce the problem to just a single data interface, without any need to pin anything or do any complex flow configuration.

The key is getting the guest application, QEMU and DPDK into the right state.
The data plane traffic is flowing from a traffic generator on another host.

1. Interfaces and bridges

# ovs-vsctl show
9edee1e2-c16b-41cf-bf9d-ce1ffbc096d2
    Bridge ovs-br0
        fail_mode: standalone
        datapath_type: netdev
        Port eth0
            Interface eth0
        Port dpdkvhostuser0
            Interface dpdkvhostuser0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/dpdkvhostuser0"}
        Port ovs-br0
            Interface ovs-br0
                type: internal
        Port data0
            Interface data0
                type: dpdk
                options: {dpdk-devargs="0000:17:00.1", dpdk-lsc-interrupt="true", n_rxq="2", n_txq_desc="2048"}
    ovs_version: "2.14.2"

2. QEMU VM XML configuration snippet:
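
The exact XML snippet is not reproduced here; purely as an illustration, a libvirt domain wired to the dpdkvhostuserclient port above (with QEMU acting as the vhost-user server on /tmp/dpdkvhostuser0) would contain something along these lines, where the MAC address, queue count, CPU list and memory sizes are placeholders rather than the values from the original setup:

    <!-- Illustrative sketch only, not the original snippet from this report.
         Guest memory must be shared (hugepages + memAccess='shared') for vhost-user. -->
    <memoryBacking>
      <hugepages>
        <page size='1048576' unit='KiB'/>
      </hugepages>
    </memoryBacking>
    <cpu mode='host-passthrough'>
      <numa>
        <cell id='0' cpus='0-3' memory='4194304' unit='KiB' memAccess='shared'/>
      </numa>
    </cpu>
    <interface type='vhostuser'>
      <mac address='52:54:00:00:00:01'/>  <!-- placeholder MAC -->
      <source type='unix' path='/tmp/dpdkvhostuser0' mode='server'/>
      <model type='virtio'/>
      <driver queues='4'/>
    </interface>

The important parts are the shared guest memory backing and the unix socket path matching the OVS vhost-server-path.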

3. Inside the VM, bind the device:

$ cat devbind.sh
modprobe uio_pci_generic
dpdk-stable-16.11.11/tools/dpdk-devbind.py -b uio_pci_generic 0000:07:00.0

4. Prepare the test scripts inside the VM. Here are the scripts to use:

$ cat start_testpmd.sh
#! /bin/bash
while true ; do
    testpmd --log-level=8 -c 0x1e -n 4 --socket-mem 512 -- -i --nb-cores=3 --port-topology=chained --disable-hw-vlan --forward-mode=macswap --auto-start --rxq=4 --txq=4 --rxd=512 --txd=512 --burst=32
done

$ cat kill_testpmd.sh
#! /bin/bash
while true ; do
    kill -9 `pgrep -x testpmd`
    sleep 2
done

5. Run the test scripts in separate ssh sessions or consoles:

   session 1: ./start_testpmd.sh
   session 2: ./kill_testpmd.sh

This ensures that the test is continuously started and then killed after running for a short time.
If "sleep 2" is too short, you can increase it to "sleep 10" (necessary if you are using an older QEMU or DPDK on the host). But with upstream DPDK and decent hardware, "sleep 2" will make you hit the issue quicker.

The issue will appear as a segfault of the OVS daemon. It can be captured via gdb, or monitored via the OVS monitor, with the backtrace that follows:

>
>>
>> The back trace looks roughly like this, depending on the specific
>> rte_memcpy selected, etc; in any case the "src" parameter is garbage
>> (in this example containing 0 + dev->host_hlen (12 = 0xc)).
>>
>> Thread 153 "pmd-c88/id:150" received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7f64e5e6b700 (LWP 141373)]
>> rte_mov128blocks (n=2048, src=0xc, dst=0x150da4480)
>>     at ../lib/eal/x86/include/rte_memcpy.h:384
>> (gdb) bt
>> #0  rte_mov128blocks (n=2048, src=0xc, dst=0x150da4480)
>> #1  rte_memcpy_generic (n=2048, src=0xc, dst=0x150da4480)
>> #2  rte_memcpy (n=2048, src=0xc, dst=)
>> #3  sync_fill_seg
>> #4  desc_to_mbuf
>> #5  virtio_dev_tx_split
>> #6  virtio_dev_tx_split_legacy
>> #7  0x00007f676fea0fef in rte_vhost_dequeue_burst
>> #8  0x00007f6772005a62 in netdev_dpdk_vhost_rxq_recv
>> #9  0x00007f6771f38116 in netdev_rxq_recv
>> #10 0x00007f6771f03d96 in dp_netdev_process_rxq_port
>> #11 0x00007f6771f04239 in pmd_thread_main
>> #12 0x00007f6771f92aff in ovsthread_wrapper
>> #13 0x00007f6771c1b6ea in start_thread
>> #14 0x00007f6771933a8f in clone
>
>
> It is a fix, so it should contain the Fixes tag, and also cc
> stable@dpdk.org.

After one year, it is time for Red Hat and Intel, or whoever governs this project, to say whether there is any intention to fix this or not, before I or anyone else at SUSE invests any more time and effort here.

Any tags you need you can add as required.

Thanks,

Claudio

>
>>
>> Tested-by: Claudio Fontana
>
> Tested-by can be removed as you're already the author.
>
>> Signed-off-by: Claudio Fontana
>> ---
>>  lib/vhost/virtio_net.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
>> index 35fa4670fd..eb19e54c2b 100644
>> --- a/lib/vhost/virtio_net.c
>> +++ b/lib/vhost/virtio_net.c
>> @@ -1153,7 +1153,7 @@ mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  	struct virtio_net_hdr_mrg_rxbuf tmp_hdr, *hdr = NULL;
>>  	struct vhost_async *async = vq->async;
>>  
>> -	if (unlikely(m == NULL))
>> +	if (unlikely(m == NULL) || nr_vec == 0)
>
> I think nr_vec == 0 is also unlikely.
>
>>  		return -1;
>>  
>>  	buf_addr = buf_vec[vec_idx].buf_addr;
>> @@ -2673,6 +2673,9 @@ desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  	struct vhost_async *async = vq->async;
>>  	struct async_inflight_info *pkts_info;
>>  
>> +	if (unlikely(nr_vec == 0))
>> +		return -1;
>> +
>>  	buf_addr = buf_vec[vec_idx].buf_addr;
>>  	buf_iova = buf_vec[vec_idx].buf_iova;
>>  	buf_len = buf_vec[vec_idx].buf_len;
>> @@ -2917,9 +2920,11 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>>  						vq->last_avail_idx + i,
>>  						&nr_vec, buf_vec,
>>  						&head_idx, &buf_len,
>> -						VHOST_ACCESS_RO) < 0))
>> +						VHOST_ACCESS_RO) < 0)) {
>> +			dropped += 1;
>> +			i++;
>>  			break;
>> -
>> +		}
>>  		update_shadow_used_ring_split(vq, head_idx, 0);
>>  
>>  		err = virtio_dev_pktmbuf_prep(dev, pkts[i], buf_len);
>
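
As a follow-up on the unlikely() remark above: folding the new condition into the existing branch hint in mbuf_to_desc would look roughly like this (just a sketch of the suggested change, not a tested hunk):

	/* Treat both a missing mbuf and an empty buffer vector as
	 * unlikely error paths and bail out before touching buf_vec[0]. */
	if (unlikely(m == NULL || nr_vec == 0))
		return -1;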