Date: Mon, 16 Nov 2015 17:31:29 -0800
From: Stephen Hemminger
To: Matt Laswell
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] How to approach packet TX lockups
Message-ID: <20151116173129.2a429930@samsung9>
References: <20151116161201.7e951097@samsung9>

On Mon, 16 Nov 2015 18:49:15 -0600
Matt Laswell wrote:

> Hey Stephen,
>
> Thanks a lot; that's really useful information. Unfortunately, I'm at a
> stage in our release cycle where upgrading to a new version of DPDK isn't
> feasible. Any chance you (or others reading this) have a pointer to the
> relevant changes? While I can't afford to upgrade DPDK entirely,
> backporting targeted fixes is more doable.
>
> Again, thanks.
>
> - Matt
>
>
> On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> stephen@networkplumber.org> wrote:
>
> > On Mon, 16 Nov 2015 17:48:35 -0600
> > Matt Laswell wrote:
> >
> > > Hey Folks,
> > >
> > > I sent this to the users email list, but I'm not sure how many people are
> > > actively reading that list at this point. I'm dealing with a situation in
> > > which my application loses the ability to transmit packets out of a port
> > > during times of moderate stress. I'd love to hear suggestions for how to
> > > approach this problem, as I'm a bit at a loss at the moment.
> > >
> > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04 LTS on Haswell
> > > processors.
> > > I'm using the 82599 controller, configured to spread packets
> > > across multiple queues. Each queue is accessed by a different lcore in my
> > > application; there is therefore concurrent access to the controller, but
> > > not to any of the queues. We're binding the ports to the igb_uio driver.
> > > The symptoms I see are these:
> > >
> > >   - All transmit out of a particular port stops
> > >   - rte_eth_tx_burst() indicates that it is sending all of the packets
> > >     that I give to it
> > >   - rte_eth_stats_get() gives me stats indicating that no packets are
> > >     being sent on the affected port. Also, no tx errors, and no pause
> > >     frames sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> > >   - All other ports continue to work normally
> > >   - The affected port continues to receive packets without problems;
> > >     only TX is affected
> > >   - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> > >     restores things and packets can flow again
> > >   - The problem is replicable on multiple devices, and doesn't follow
> > >     one particular port
> > >
> > > I've tried calling rte_mbuf_sanity_check() on all packets before sending
> > > them. I've also instrumented my code to look for packets that have already
> > > been sent or freed, as well as cycles in chained packets being sent. I
> > > also put a lock around all accesses to rte_eth* calls to synchronize access
> > > to the NIC. Given some recent discussion here, I also tried changing the
> > > TX RS threshold from 0 to 32, 16, and 1. None of these strategies proved
> > > effective.
> > >
> > > Like I said at the top, I'm a little at a loss at this point. If you were
> > > dealing with this set of symptoms, how would you proceed?
> > >
> >
> > I remember some issues with old DPDK 1.6 with some of the prefetch
> > thresholds on 82599. You would be better off going to a later DPDK
> > version.
> > I hope you are on 1.6.0r2 at least??

With older DPDK there was no way to get the driver to tell you what the
preferred settings were for pthresh/hthresh/wthresh, and the values in the
Intel sample applications were broken on some hardware. I remember
reverse-engineering the safe values by reading the Linux driver, which is
much better tested than the DPDK one.

In the Linux driver, the Transmit Descriptor Control register (TXDCTL) is
fixed at (for transmit):

    wthresh = 1
    hthresh = 1
    pthresh = 32

The DPDK 2.2 driver uses:

    wthresh = 0
    hthresh = 0
    pthresh = 32