From: Kevin Traynor <ktraynor@redhat.com>
To: "Alex Yeh (ayeh)", "Stokes, Ian", dev@dpdk.org
Cc: "Yegappan Lakshmanan (yega)", Maxime Coquelin, Chenbo Xia
Date: Wed, 13 Jan 2021 14:14:25 +0000
Message-ID: <459bd74c-cafa-7f0c-b488-ce72c3a0d07f@redhat.com>
Subject: Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service

On 12/01/2021 18:20, Alex Yeh (ayeh) wrote:
> Hi Kevin, Stokes,
> Resending just to make sure the email is not lost.
> Thanks and looking forward to your suggestion,
> Alex
>

+Cc vhost/virtio maintainers

Thanks for the report and checking the newer versions. I think at this
stage you should log a report in https://bugs.dpdk.org and provide steps
for the vhost/virtio maintainers so they can reproduce this issue.
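For reference, a minimal sketch of how the vhost fixes already on the DPDK
18.11 stable branch could be listed before filing the report. The
dpdk-stable clone URL and the grep pattern below are assumptions for
illustration, not commands taken from this thread:

  # Clone the DPDK stable tree (assumed URL) and switch to the 18.11 branch
  $ git clone https://dpdk.org/git/dpdk-stable
  $ cd dpdk-stable
  $ git checkout 18.11

  # List vhost commits added after the release in use (18.11.10) that
  # mention a crash, to see whether a relevant fix has already landed
  $ git log --oneline v18.11.10..HEAD -- lib/librte_vhost | grep -i crash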
> -----Original Message-----
> From: Alex Yeh (ayeh)
> Sent: Friday, January 08, 2021 11:36 AM
> To: Kevin Traynor; Stokes, Ian; dev@dpdk.org
> Cc: Yegappan Lakshmanan (yega)
> Subject: RE: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
>
> Hi Kevin, Stokes,
> Thanks for the suggestion.
> We have upgraded to OVS 2.11.4 and DPDK 18.11.10. OVS still crashes with
> the same segfault error when the application within the guest VM
> restarts. Any suggestion on how to proceed?
>
> Thanks
> Alex
>
> [root@nfvis ~]# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.11.4
> DPDK 18.11.10
>
> -----Original Message-----
> From: Kevin Traynor
> Sent: Thursday, November 19, 2020 4:09 AM
> To: Stokes, Ian; Alex Yeh (ayeh); dev@dpdk.org
> Cc: Yegappan Lakshmanan (yega)
> Subject: Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
>
> On 19/11/2020 11:21, Stokes, Ian wrote:
>>> Hi,
>>> We are seeing an ovs-vswitchd service crash with a segfault in the
>>> librte_vhost library when a DPDK application within a guest VM is
>>> stopped.
>>>
>>> We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062 Linux kernel) with
>>> DPDK 18.11.2.
>>
>> Hi,
>>
>> Is there a reason you are using OVS 2.11.1 and DPDK 18.11.2? These are
>> quite old.
>>
>> As a first step I would recommend using the latest releases of these
>> branches that have been validated by the OVS community.
>>
>> As of now this would be OVS 2.11.4 and DPDK 18.11.9. Please check if
>> the issue is still present there; my suspicion is that this could be an
>> issue resolved in the DPDK library since 18.11.2.
>>
>
> +1, there are 58 commits in the vhost library on the 18.11 branch since
> 18.11.2, so it might already be fixed. 18.11.10 is the latest release,
> and the fix below has been included since 18.11.7.
>
> $ git log --oneline v18.11.2..HEAD . | grep crash
> 90b5ba739f vhost: fix crash on port deletion
>
> If you are planning to continue to use 18.11 for a while, I think you
> will want to test the 18.11.11 Release Candidate that will be available
> in a few weeks. It is the last planned 18.11 release, so any issues you
> find *after* it is released won't be fixed.
>
> Kevin.
>
>> Regards
>> Ian
>>
>>>
>>> We are using OVS-DPDK on the host and the guest VM is running a DPDK
>>> application. With some traffic, if the application service within the
>>> VM is restarted, then OVS crashes.
>>>
>>> This crash is not seen if the guest VM is restarted (instead of
>>> stopping the application within the VM).
>>>
>>> The crash traceback (attached below) points to the
>>> rte_memcpy_generic() function in rte_memcpy.h. It looks like the crash
>>> occurs when vhost is trying to dequeue packets from the guest VM (as
>>> the application in the guest VM has stopped and the huge pages have
>>> been returned to the guest kernel).
>>>
>>> We have tried enabling iommu in OVS by setting
>>> "other_config:vhost-iommu-support=true" and enabling iommu in qemu
>>> using the following configuration in the guest domain XML:
>>>
>>> [guest XML snippet not preserved by the list archive; see the
>>> illustrative sketch after the quoted thread below]
>>>
>>> With iommu enabled, ovs-vswitchd still crashes when the guest VM
>>> restarts the network service.
>>>
>>> Is this a known problem? Has anyone else seen a crash like this? How
>>> can we protect ovs-vswitchd from crashing when a guest VM restarts the
>>> network application or service?
>>>
>>> Thanks
>>> Alex
>>> ----------------------------------------------------------------------
>>>
>>> Log:
>>> Oct  7 19:54:16 Branch81-Bravo kernel: [2245909.596635] pmd16[25721]:
>>> segfault at 7f4d1d733000 ip 00007f4d2ae5d066 sp 00007f4d1ce65618
>>> error 4 in librte_vhost.so.4[7f4d2ae52000+1a000]
>>> Oct  7 19:54:19 Branch81-Bravo systemd[1]: ovs-vswitchd.service: main
>>> process exited, code=killed, status=11/SEGV
>>>
>>> Environment:
>>> CentOS 7.6.1810
>>> openvswitch-2.11.1-1.el7.centos.x86_64
>>> openvswitch-kmod-2.11.1-1.el7.centos.x86_64
>>> dpdk-18.11-2.el7.centos.x86_64
>>> 3.10.0-1062.4.1.el7.x86_64
>>> qemu-kvm-ev-2.12.0-18.el7.centos_6.1.1
>>>
>>> Core dump trace:
>>> (gdb) bt
>>> #-1 0x00007ffff205602e in rte_memcpy_generic (dst=<optimized out>,
>>> src=0x7fffcef3607c, n=<optimized out>) at
>>> /usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793
>>> Backtrace stopped: Cannot access memory at address 0x7ffff20558f0
>>>
>>> (gdb) list *0x00007ffff205602e
>>> 0x7ffff205602e is in rte_memcpy_generic
>>> (/usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793).
>>> 788     }
>>> 789
>>> 790     /**
>>> 791      * For copy with unaligned load
>>> 792      */
>>> 793     MOVEUNALIGNED_LEFT47(dst, src, n, srcofs);
>>> 794
>>> 795     /**
>>> 796      * Copy whatever left
>>> 797      */
>>>
>>> (gdb) list *0x00007ffff205c192
>>> 0x7ffff205c192 is in rte_vhost_dequeue_burst
>>> (/usr/src/debug/dpdk-18.11/lib/librte_vhost/virtio_net.c:1192).
>>> 1187                     * In zero copy mode, one mbuf can only reference data
>>> 1188                     * for one or partial of one desc buff.
>>> 1189                     */
>>> 1190                    mbuf_avail = cpy_len;
>>> 1191            } else {
>>> 1192                    if (likely(cpy_len > MAX_BATCH_LEN ||
>>> 1193                               vq->batch_copy_nb_elems >= vq->size ||
>>> 1194                               (hdr && cur == m))) {
>>> 1195                            rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
>>> 1196                                       mbuf_offset),
>>> (gdb)
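The guest domain XML referenced above was stripped by the list archive.
For illustration only, below is a minimal sketch of the kind of libvirt
configuration that enables a virtual IOMMU for a vhost-user interface; the
socket path, mode and attribute values are assumptions and are not taken
from the original report. On the OVS side this pairs with the quoted
"ovs-vsctl set Open_vSwitch . other_config:vhost-iommu-support=true".

  <!-- Illustrative guest domain XML; paths and modes are assumed -->
  <features>
    <!-- split irqchip is needed for interrupt remapping with intel-iommu -->
    <ioapic driver='qemu'/>
  </features>
  <devices>
    <!-- virtual IOMMU with device IOTLB so vhost translations are tracked -->
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
    <!-- vhost-user interface backed by an OVS dpdkvhostuserclient port -->
    <interface type='vhostuser'>
      <source type='unix' path='/var/run/openvswitch/vhost-user1' mode='server'/>
      <model type='virtio'/>
      <driver iommu='on' ats='on'/>
    </interface>
  </devices>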