From: Zhongming Qu
Date: Fri, 26 Aug 2016 10:55:30 -0700
To: Stephen Hemminger
Cc: users@dpdk.org
Subject: Re: [dpdk-users] running multiple independent dpdk applications randomly locks up machines
In-Reply-To: <20160819183029.6deb8ebd@xeon-e3>
References: <20160819140350.70fbc49e@xeon-e3> <20160819183029.6deb8ebd@xeon-e3>

Hi,

Just an update. Thanks for all the input. I feel obliged to post the latest
findings here so that this thread may be useful to other people.

As it turned out, the rx/tx queue problem is not really the problem. Here is
why:

Our use model is to run two different *primary* dpdk processes, each of which
binds to a different port. Both ports are on the same 82599ES NIC. They are
separate ports that have independent rx/tx queues (in the sense of BARs and
the BAR0-based registers).

The actual problem was that our application never called rte_eth_dev_stop()
to properly shut down the device. Simply making sure that rte_eth_dev_stop()
is called solved our problem.

From the standpoint of a user of the dpdk library, the problem is solved. BUT
it is not yet understood how exactly failing to call rte_eth_dev_stop() could
have caused machine lockups.

Could someone shed light on this question by
a) simply confirming that I am not the only person seeing this problem,
b) explaining how, at a very low level, race conditions, memory corruption,
   or anything else could happen that causes a kernel panic, or
c) providing pointers to potentially relevant information?

Thanks a lot!
Zhongming

On Fri, Aug 19, 2016 at 6:30 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:

> On Fri, 19 Aug 2016 18:19:21 -0700
> Zhongming Qu wrote:
>
> > Thanks!
> >
> > I did use a hard coded queue_id of 0 when initializing the rx/tx queues,
> > i.e., rte_eth_rx/tx_queue_setup(). So that is a problem to solve. Will fix
> > that and try again.
> >
> > When A and B run at the same time, this lockup problem can be explained by
> > the conflicting queue usage. But the lockup happens even in the use case
> > where only one dpdk process is running. That is, A and B take turns to run
> > but do not run at the same time.
> >
> > Thanks for pointing out an alternative approach. That sounds really
> > promising. A concern came up when that idea was talked over: What would
> > happen if the primary process dies? Would all the secondary processes
> > eventually go awry at some point? Would `--proc-type auto` solve this
> > problem?
> >
>
> I haven't actually used primary/secondary model, but the recommendation
> is that the primary process does nothing (or is a watchdog) so it would
> be pretty much impossible to crash unless killed by malicious entity.
>
> All the packet logic would be in the secondary.
>
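
For readers who land on this thread: the fix reported at the top of this message (making sure rte_eth_dev_stop() is called before the process exits) could be wired up roughly as follows. This is only a sketch of the shutdown pattern, not the code from our application. To keep it compilable without DPDK installed, `stub_port_stop()` is a hypothetical stand-in for rte_eth_dev_stop(), and `MAX_PORTS`, `install_shutdown_handlers()` and `run_until_quit()` are names made up for this illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

#define MAX_PORTS 4u /* hypothetical limit, for this sketch only */

/* Set by the signal handler, polled by the main loop. */
static volatile sig_atomic_t quit_requested = 0;

/* Records which ports the stub has "stopped". */
static int port_stopped[MAX_PORTS];

/* HYPOTHETICAL stand-in for rte_eth_dev_stop(); a real application would
 * include <rte_ethdev.h> and call the DPDK function here instead. */
static void stub_port_stop(unsigned port_id)
{
    assert(port_id < MAX_PORTS);
    port_stopped[port_id] = 1;
    printf("port %u stopped\n", port_id);
}

/* The handler only sets a flag; the real work happens in the main loop. */
static void on_signal(int signo)
{
    (void)signo;
    quit_requested = 1;
}

/* Route SIGINT and SIGTERM to the flag-setting handler. */
static void install_shutdown_handlers(void)
{
    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);
}

/* Poll until a shutdown signal arrives, then stop the port before returning. */
static void run_until_quit(unsigned port_id)
{
    while (!quit_requested) {
        /* rx/tx burst processing would go here */
    }
    stub_port_stop(port_id);
}
```

A main() would call install_shutdown_handlers() once after port setup and then run_until_quit(port_id). Note that this path only covers signals that can be caught; a process ended by SIGKILL or by a crash never reaches the stop call.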