Date: Mon, 16 Nov 2015 16:12:01 -0800
From: Stephen Hemminger
To: Matt Laswell
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] How to approach packet TX lockups
Message-ID: <20151116161201.7e951097@samsung9>

On Mon, 16 Nov 2015 17:48:35 -0600
Matt Laswell wrote:

> Hey Folks,
>
> I sent this to the users email list, but I'm not sure how many people are
> actively reading that list at this point. I'm dealing with a situation in
> which my application loses the ability to transmit packets out of a port
> during times of moderate stress. I'd love to hear suggestions for how to
> approach this problem, as I'm a bit at a loss at the moment.
>
> Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
> processors. I'm using the 82599 controller, configured to spread packets
> across multiple queues. Each queue is accessed by a different lcore in my
> application; there is therefore concurrent access to the controller, but
> not to any of the queues. We're binding the ports to the igb_uio driver.
> The symptoms I see are these:
>
>
>    - All transmit out of a particular port stops
>    - rte_eth_tx_burst() indicates that it is sending all of the packets
>    that I give to it
>    - rte_eth_stats_get() gives me stats indicating that no packets are
>    being sent on the affected port. Also, no tx errors, and no pause frames
>    sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
>    - All other ports continue to work normally
>    - The affected port continues to receive packets without problems; only
>    TX is affected
>    - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
>    restores things and packets can flow again
>    - The problem is replicable on multiple devices, and doesn't follow one
>    particular port
>
> I've tried calling rte_mbuf_sanity_check() on all packets before sending
> them. I've also instrumented my code to look for packets that have already
> been sent or freed, as well as cycles in chained packets being sent. I
> also put a lock around all accesses to rte_eth* calls to synchronize access
> to the NIC. Given some recent discussion here, I also tried changing the
> TX RS threshold from 0 to 32, 16, and 1. None of these strategies proved
> effective.
>
> Like I said at the top, I'm a little at a loss at this point. If you were
> dealing with this set of symptoms, how would you proceed?
>

I remember some issues with old DPDK 1.6 with some of the prefetch
thresholds on 82599. You would be better off going to a later DPDK version.