Date: Fri, 19 Aug 2016 14:03:50 -0700
From: Stephen Hemminger
To: Zhongming Qu
Cc: users@dpdk.org
Message-ID: <20160819140350.70fbc49e@xeon-e3>
Subject: Re: [dpdk-users] running multiple independent dpdk applications randomly locks up machines

On Fri, 19 Aug 2016 13:32:06 -0700
Zhongming Qu wrote:

> Hi,
>
>
> As stated in the subject, running multiple dpdk applications (only one
> process per application) randomly locks up machines. Thanks in advance for
> any help.
>
> It is difficult to provide the exact set of information useful for
> debugging. Just listing as much info as possible in the hope of ringing
> a bell somewhere.
>
> System Configuration:
> - Motherboard: Supermicro X10SRi-F (BIOS upgraded to the latest version as
> of July 2016)
> - Intel Xeon E5-2667 v3 (Haswell), no NUMA
> - 64GB DRAM
> - Ubuntu 14.04 kernel 3.13.0-49-generic
> - DPDK 16.04
> - 1024 x 2M hugepages are reserved
> - 82599ES NIC (2 x 10G) at pci_addr 02:00.0 and 02:00.1. Both ports use the
> ixgbe_uio kernel driver and the ixgbe PMD.
>
>
> Use Scenario of DPDK Application:
> - Two single-process dpdk applications, A and B, need to run simultaneously.
> - It is made sure that A and B do not have any race conditions or memory
> issues, that is, apart from dpdk.
> - Each application uses 512 x 2M hugepages (half of the total reserved
> amount).
> - Each application binds to one port via `--pci-whitelist `.
> - Use `-m 1024` and `--file-prefix `, as
> instructed by 19.2.3 in the Programmer's Guide (
> http://dpdk.org/doc/guides/prog_guide/multi_proc_support.html).
>
>
> Description of Problem:
> - Starting and killing A and B repeatedly every 30 seconds has a
> chance of locking up the machine.
> - No kernel log in /var/log/syslog, no dmesg, nothing persistent, is
> available for debugging after a reboot of the frozen machine.
> - Looks like a kernel panic as it dumps some panic info to the serial
> console (not useful...) and the CapsLock and NumLock keys on a physically
> connected keyboard do not respond.
> - No particular sequence of operations of starting and killing A and B, so
> far, has been found to reliably lead to a lockup. The best effort of
> reproducing the lockup is a keep-trying-until-lockup approach.
>
>
> A Few Things Tried:
> - Via dumping logging to stderr and files, it is found that the lockup can
> happen during rte_eal_hugepage_init(), or after it, after the program is
> killed.
> - It is made sure that rte_config.mem_config->memseg is properly
> initialized. That is, the total amount of memory reserved in the memseg is
> 512 x 2M hugepages.
> - Zeroing all hugepages when the hugefile is created and mapped, or
> immediately after memsegs are initialized (as the second call of
> map_all_hugepages() in rte_eal_hugepage_init()) does not fix the problem.
> - By default, hugefiles in /mnt/huge are not cleaned up when the
> applications are killed. Though, cleaning them up did not solve the problem
> either.
>
>
>
> Thanks very much for any input!
>
>
> Zhongming

Obviously, two applications can't share the same queue. Also, you need to
give each application a different core mask, at least if you are using poll
mode like the DPDK examples.

You might be better off having one primary DPDK process and two secondary
processes.
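
To make the core mask point concrete, here is a minimal sketch in Python (not
part of DPDK) that builds the EAL arguments for the two applications and
flags overlaps before anything is launched. The core masks 0x3 and 0xc and
the prefixes "appA" and "appB" are assumptions chosen for illustration, and
the helper names are hypothetical. The PCI addresses, the `-m 1024` value and
the 1024 x 2M hugepage pool come from the report above.

```python
from itertools import combinations

HUGEPAGE_MB = 2  # 2M hugepages, as in the report

# Core masks and file prefixes below are illustrative assumptions.
APP_A = {"name": "A", "core_mask": 0x3, "mem_mb": 1024,
         "file_prefix": "appA", "pci_addr": "02:00.0"}
APP_B = {"name": "B", "core_mask": 0xc, "mem_mb": 1024,
         "file_prefix": "appB", "pci_addr": "02:00.1"}


def eal_args(app):
    """Return the EAL command-line arguments for one application."""
    return ["-c", hex(app["core_mask"]),
            "-m", str(app["mem_mb"]),
            "--file-prefix", app["file_prefix"],
            "--pci-whitelist", app["pci_addr"]]


def find_conflicts(apps, reserved_hugepages):
    """List resources that two applications would end up sharing."""
    problems = []
    for a, b in combinations(apps, 2):
        pair = "%s and %s" % (a["name"], b["name"])
        shared = a["core_mask"] & b["core_mask"]
        if shared:
            problems.append("%s share cores (mask %s)" % (pair, hex(shared)))
        if a["file_prefix"] == b["file_prefix"]:
            problems.append("%s share a file prefix" % pair)
        if a["pci_addr"] == b["pci_addr"]:
            problems.append("%s share a PCI device" % pair)
    # The -m values together must fit in the reserved hugepage pool.
    requested = sum(app["mem_mb"] for app in apps)
    available = reserved_hugepages * HUGEPAGE_MB
    if requested > available:
        problems.append("requested %d MB, only %d MB reserved"
                        % (requested, available))
    return problems


if __name__ == "__main__":
    for app in (APP_A, APP_B):
        print(app["name"], " ".join(eal_args(app)))
    print(find_conflicts([APP_A, APP_B], reserved_hugepages=1024))
```

With the values above the two masks do not intersect and the two `-m 1024`
requests add up to exactly the 2048 MB that 1024 x 2M hugepages provide. This
only checks the command lines; it says nothing about whether the lockup
itself is caused by shared cores.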