From: Christian Ehrhardt
In-Reply-To: <570379F9.6020306@ericsson.com>
References: <1458292380-9258-1-git-send-email-patrik.r.andersson@ericsson.com> <570379F9.6020306@ericsson.com>
Date:
Thu, 7 Apr 2016 16:49:27 +0200
To: Patrik Andersson R, Daniele Di Proietto
Cc: "Xie, Huawei", "dev@dpdk.org", Thomas Monjalon, "Ananyev, Konstantin", Yuanhan Liu
Subject: Re: [dpdk-dev] [RFC] vhost user: add error handling for fd > 1023

Hi Patrik,

On Tue, Apr 5, 2016 at 10:40 AM, Patrik Andersson R <patrik.r.andersson@ericsson.com> wrote:

> The described fault situation arises due to the fact that there is a bug
> in an OpenStack component, Neutron or Nova, that fails to release ports
> on VM deletion. This typically leads to an accumulation of 1-2 file
> descriptors per unreleased port. It could also arise when allocating a
> large number (~500?) of vhost user ports and connecting them all to VMs.

I can confirm that I'm able to trigger this without OpenStack, using DPDK 2.2 and Open vSwitch 2.5.
Initially I had at least 2 guests attached to the first two ports, but that turns out not to be necessary, which makes it as easy as:

    ovs-vsctl add-br ovsdpdkbr0 -- set bridge ovsdpdkbr0 datapath_type=netdev
    ovs-vsctl add-port ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk
    for idx in {1..1023}; do ovs-vsctl add-port ovsdpdkbr0 vhost-user-${idx} -- set Interface vhost-user-${idx} type=dpdkvhostuser; done

=> As soon as the associated fd is > 1023, the vhost_user socket gets created, but right afterwards I see the crash mentioned by Patrik:

#0  0x00007f51cb187518 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f51cb1890ea in __GI_abort () at abort.c:89
#2  0x00007f51cb1c98c4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f51cb2e1584 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f51cb26af94 in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x7f51cb2e1515 "buffer overflow detected") at fortify_fail.c:37
#4  0x00007f51cb268fa0 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f51cb26aee7 in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
#6  0x00007f51cbd6d665 in fdset_fill (pfdset=0x7f51cc03dfa0, wfset=0x7f51c78e4a30, rfset=0x7f51c78e49b0) at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/fd_man.c:110
#7  fdset_event_dispatch (pfdset=pfdset@entry=0x7f51cc03dfa0) at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/fd_man.c:243
#8  0x00007f51cbdc1b00 in rte_vhost_driver_session_start () at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/vhost-net-user.c:525
#9  0x00000000005061ab in start_vhost_loop (dummy=<optimized out>) at ../lib/netdev-dpdk.c:2047
#10 0x00000000004c2c64 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:340
#11 0x00007f51cba346fa in start_thread (arg=0x7f51c78e5700) at pthread_create.c:333
#12 0x00007f51cb2592dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Like Patrik, I don't have a "pure" DPDK test yet, but at least OpenStack is out of scope now, which should help.

[...]
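For reference, frame #5 (__fdelt_chk) is the glibc fortify check that aborts when FD_SET() is handed a descriptor outside [0, FD_SETSIZE), and FD_SETSIZE is 1024 on Linux/glibc. Below is a minimal sketch of the kind of guard the subject line asks for. The helper names fd_fits_fdset() and fdset_add_checked() are hypothetical. This is neither the DPDK fd_man.c code nor the RFC patch, just an illustration of rejecting an oversized fd instead of aborting:

```c
#include <stdbool.h>
#include <sys/select.h>

/*
 * Hypothetical helper (not part of DPDK): true if fd can safely be
 * passed to FD_SET()/FD_ISSET(), i.e. it lies in [0, FD_SETSIZE).
 */
static bool fd_fits_fdset(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Hypothetical helper: add fd to *set and track the highest fd seen
 * in *maxfd (as needed for the first argument of select()).
 * Returns 0 on success, -1 if fd does not fit into an fd_set; in that
 * case neither *set nor *maxfd is modified.
 */
static int fdset_add_checked(fd_set *set, int fd, int *maxfd)
{
    if (!fd_fits_fdset(fd))
        return -1;

    FD_SET(fd, set);
    if (fd > *maxfd)
        *maxfd = fd;
    return 0;
}
```

A caller would FD_ZERO() the set, start with maxfd = -1, and treat a -1 return as an error for that one descriptor (log it, refuse the new port) rather than letting the whole process abort. Whether to refuse the port or to move away from select() altogether is a design question for the patch discussion.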
> The key point, I think, is that more than one file descriptor is used per
> vhost user device. This means that there is no real relation between the
> number of devices and the number of file descriptors in use.

Well, it is "one per vhost_user device" as far as I've seen, but those are not the only fds used overall.

[...]

> In my opinion the problem is that the assumption: number of vhost
> user device == number of file descriptors does not hold. What the actual
> relation might be hard to determine with any certainty.

I totally agree that there is no deterministic rule for what to expect. The only rule is that the number of fds is always greater than the number of vhost_user devices. In various setup variants I've crossed fd 1024 anywhere between 475 and 970 vhost_user ports.

Once the discussion continues and we have an updated version of the patch with some more agreement, I hope I can help to test it.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd