Date: Tue, 17 Nov 2015 08:23:45 -0600
In-Reply-To: <20151116173129.2a429930@samsung9>
References: <20151116161201.7e951097@samsung9> <20151116173129.2a429930@samsung9>
From: Matt Laswell
To: Stephen Hemminger
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] How to approach packet TX lockups

Yes, we're on 1.6r2. That said, I've tried a number of different values for
the thresholds without a lot of luck. Setting wthresh/hthresh/pthresh to
0/0/32 or 0/0/0 doesn't appear to fix things. And, as Matthew suggested, I'm
pretty sure using 0 for the thresholds leads to auto-config by the driver. I
also tried 1/1/32, which required that I also change the rs_thresh value from
0 to 1 to work around a panic in PMD initialization ("TX WTHRESH must be set
to 0 if tx_rs_thresh is greater than 1").

Any other suggestions?

On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Mon, 16 Nov 2015 18:49:15 -0600
> Matt Laswell wrote:
>
> > Hey Stephen,
> >
> > Thanks a lot; that's really useful information. Unfortunately, I'm at a
> > stage in our release cycle where upgrading to a new version of DPDK isn't
> > feasible. Any chance you (or others reading this) has a pointer to the
> > relevant changes? While I can't afford to upgrade DPDK entirely,
> > backporting targeted fixes is more doable.
> >
> > Again, thanks.
> >
> > - Matt
> >
> >
> > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > Matt Laswell wrote:
> > >
> > > > Hey Folks,
> > > >
> > > > I sent this to the users email list, but I'm not sure how many people are
> > > > actively reading that list at this point.
> > > > I'm dealing with a situation in
> > > > which my application loses the ability to transmit packets out of a port
> > > > during times of moderate stress. I'd love to hear suggestions for how to
> > > > approach this problem, as I'm a bit at a loss at the moment.
> > > >
> > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
> > > > processors. I'm using the 82599 controller, configured to spread packets
> > > > across multiple queues. Each queue is accessed by a different lcore in my
> > > > application; there is therefore concurrent access to the controller, but
> > > > not to any of the queues. We're binding the ports to the igb_uio driver.
> > > > The symptoms I see are these:
> > > >
> > > > - All transmit out of a particular port stops
> > > > - rte_eth_tx_burst() indicates that it is sending all of the packets
> > > >   that I give to it
> > > > - rte_eth_stats_get() gives me stats indicating that no packets are
> > > >   being sent on the affected port. Also, no tx errors, and no pause frames
> > > >   sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> > > > - All other ports continue to work normally
> > > > - The affected port continues to receive packets without problems; only
> > > >   TX is affected
> > > > - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> > > >   restores things and packets can flow again
> > > > - The problem is replicable on multiple devices, and doesn't follow one
> > > >   particular port
> > > >
> > > > I've tried calling rte_mbuf_sanity_check() on all packets before sending
> > > > them. I've also instrumented my code to look for packets that have already
> > > > been sent or freed, as well as cycles in chained packets being sent. I
> > > > also put a lock around all accesses to rte_eth* calls to synchronize access
> > > > to the NIC.
> > > > Given some recent discussion here, I also tried changing the
> > > > TX RS threshold from 0 to 32, 16, and 1. None of these strategies proved
> > > > effective.
> > > >
> > > > Like I said at the top, I'm a little at a loss at this point. If you were
> > > > dealing with this set of symptoms, how would you proceed?
> > > >
> > >
> > > I remember some issues with old DPDK 1.6 with some of the prefetch
> > > thresholds on 82599. You would be better off going to a later DPDK
> > > version.
> > >
>
> I hope you are on 1.6.0r2 at least??
>
> With older DPDK there was no way to get driver to tell you what the
> preferred settings were for pthresh/hthresh/wthresh. And the values
> in Intel sample applications were broken on some hardware.
>
> I remember reverse engineering the safe values from reading the Linux
> driver.
>
> The Linux driver is much better tested than the DPDK one...
> In the Linux driver, the Transmit Descriptor Controller (txdctl)
> is fixed at (for transmit)
>    wthresh = 1
>    hthresh = 1
>    pthresh = 32
>
> The DPDK 2.2 driver uses:
>    wthresh = 0
>    hthresh = 0
>    pthresh = 32
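
The panic message quoted at the top of this message ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1") suggests a simple rule that could be checked before a TX queue is configured. What follows is only a minimal sketch in plain C, not DPDK code: the struct, the function names, and the assumed value of 32 that stands in for whatever the driver picks when rs_thresh is 0 are all hypothetical and chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the TX threshold settings discussed in the
 * thread. This is NOT the DPDK rte_eth_txconf structure. */
struct tx_thresholds {
    uint8_t  pthresh;   /* prefetch threshold */
    uint8_t  hthresh;   /* host threshold */
    uint8_t  wthresh;   /* write-back threshold */
    uint16_t rs_thresh; /* TX RS threshold; 0 is assumed to mean "driver default" */
};

/* Assumption: placeholder for the value the driver substitutes when
 * rs_thresh is 0. The real default depends on the driver version. */
#define ASSUMED_DEFAULT_RS_THRESH 32

/* Resolve the RS threshold that would actually be in effect. */
static uint16_t effective_rs_thresh(const struct tx_thresholds *t)
{
    return t->rs_thresh != 0 ? t->rs_thresh : ASSUMED_DEFAULT_RS_THRESH;
}

/* Mirror the rule in the panic message:
 * WTHRESH must be 0 whenever the RS threshold is greater than 1. */
static bool tx_thresholds_ok(const struct tx_thresholds *t)
{
    return !(effective_rs_thresh(t) > 1 && t->wthresh != 0);
}
```

Under that assumed default, 1/1/32 with rs_thresh left at 0 is rejected, while the same thresholds with rs_thresh set to 1 pass, which is consistent with the workaround described at the top of this message.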