From: Matt Laswell
To: "Ananyev, Konstantin"
Cc: "dev@dpdk.org"
Date: Tue, 17 Nov 2015 10:25:56 -0600
Subject: Re: [dpdk-dev] How to approach packet TX lockups

Thanks, I'll give that a try. In my environment, I'm pretty sure we're
using the fully-featured ixgbe_xmit_pkts() and not _simple(). If setting
rs_thresh=1 is safer, I'll stick with that.

Again, thanks to all for the assistance.

- Matt

On Tue, Nov 17, 2015 at 10:20 AM, Ananyev, Konstantin <
konstantin.ananyev@intel.com> wrote:

> Hi Matt,
>
> As I said, at least try to upgrade contents of shared code to the latest
> one.
> In previous releases: lib/librte_pmd_ixgbe/ixgbe, now located at:
> drivers/net/ixgbe/.
>
> > For reference, my transmit function is rte_eth_tx_burst().
>
> I meant what ixgbe TX function it points to: ixgbe_xmit_pkts or
> ixgbe_xmit_pkts_simple()?
> For ixgbe_xmit_pkts_simple() don’t set tx_rs_thresh > 32,
> for ixgbe_xmit_pkts() the safest way is to set tx_rs_thresh=1.
> Though as I understand from your previous mails, you already did that,
> and it didn’t help.
>
> Konstantin
>
> From: Matt Laswell [mailto:laswell@infiniteio.com]
> Sent: Tuesday, November 17, 2015 3:05 PM
> To: Ananyev, Konstantin
> Cc: Stephen Hemminger; dev@dpdk.org
> Subject: Re: [dpdk-dev] How to approach packet TX lockups
>
> Hey Konstantin,
>
> Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to
> things like changes in the MBuf format, API differences, etc. Even as an
> experiment, that's an awfully large change to absorb. Is there a subset
> that you're referring to that could be more readily included without
> modifying so many touch points into DPDK?
>
> For reference, my transmit function is rte_eth_tx_burst(). It seems to
> reliably tell me that it has enqueued all of the packets that I gave it,
> however the stats from rte_eth_stats_get() indicate that no packets are
> actually being sent.
>
> Thanks,
>
> - Matt
>
> On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <
> konstantin.ananyev@intel.com> wrote:
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matt Laswell
> > Sent: Tuesday, November 17, 2015 2:24 PM
> > To: Stephen Hemminger
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] How to approach packet TX lockups
> >
> > Yes, we're on 1.6r2. That said, I've tried a number of different values
> > for the thresholds without a lot of luck. Setting wthresh/hthresh/pthresh
> > to 0/0/32 or 0/0/0 doesn't appear to fix things. And, as Matthew
> > suggested, I'm pretty sure using 0 for the thresholds leads to auto-config
> > by the driver. I also tried 1/1/32, which required that I also change the
> > rs_thresh value from 0 to 1 to work around a panic in PMD initialization
> > ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1").
> >
> > Any other suggestions?
>
> That's not only DPDK code changed since 1.6.
> I am pretty sure that we also have a new update of shared code since then
> (and as I remember probably more than one).
> One suggestion would be to at least try to upgrade the shared code up to
> the latest.
> Another one - even if you can't upgrade to 2.2 in your production
> environment, it is probably worth doing that in some test environment and
> then checking whether the problem persists.
> If yes, then we'll need some guidance on how to reproduce it.
>
> Another question: it is not clear which TX function you use.
> Konstantin
>
> >
> > On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> > stephen@networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 18:49:15 -0600
> > > Matt Laswell wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > Thanks a lot; that's really useful information. Unfortunately, I'm at a
> > > > stage in our release cycle where upgrading to a new version of DPDK isn't
> > > > feasible. Any chance you (or others reading this) have a pointer to the
> > > > relevant changes? While I can't afford to upgrade DPDK entirely,
> > > > backporting targeted fixes is more doable.
> > > >
> > > > Again, thanks.
> > > >
> > > > - Matt
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > > stephen@networkplumber.org> wrote:
> > > >
> > > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > > Matt Laswell wrote:
> > > > >
> > > > > > Hey Folks,
> > > > > >
> > > > > > I sent this to the users email list, but I'm not sure how many people are
> > > > > > actively reading that list at this point. I'm dealing with a situation in
> > > > > > which my application loses the ability to transmit packets out of a port
> > > > > > during times of moderate stress. I'd love to hear suggestions for how to
> > > > > > approach this problem, as I'm a bit at a loss at the moment.
> > > > > >
> > > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > > > > Haswell processors.
> > > > > > I'm using the 82599 controller, configured to spread packets
> > > > > > across multiple queues. Each queue is accessed by a different lcore in my
> > > > > > application; there is therefore concurrent access to the controller, but
> > > > > > not to any of the queues. We're binding the ports to the igb_uio driver.
> > > > > > The symptoms I see are these:
> > > > > >
> > > > > >    - All transmit out of a particular port stops
> > > > > >    - rte_eth_tx_burst() indicates that it is sending all of the packets
> > > > > >    that I give to it
> > > > > >    - rte_eth_stats_get() gives me stats indicating that no packets are
> > > > > >    being sent on the affected port. Also, no tx errors, and no pause frames
> > > > > >    sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> > > > > >    - All other ports continue to work normally
> > > > > >    - The affected port continues to receive packets without problems; only
> > > > > >    TX is affected
> > > > > >    - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> > > > > >    restores things and packets can flow again
> > > > > >    - The problem is replicable on multiple devices, and doesn't follow one
> > > > > >    particular port
> > > > > >
> > > > > > I've tried calling rte_mbuf_sanity_check() on all packets before sending
> > > > > > them. I've also instrumented my code to look for packets that have already
> > > > > > been sent or freed, as well as cycles in chained packets being sent. I
> > > > > > also put a lock around all accesses to rte_eth* calls to synchronize access
> > > > > > to the NIC. Given some recent discussion here, I also tried changing the
> > > > > > TX RS threshold from 0 to 32, 16, and 1. None of these strategies proved
> > > > > > effective.
> > > > > >
> > > > > > Like I said at the top, I'm a little at a loss at this point. If you were
> > > > > > dealing with this set of symptoms, how would you proceed?
> > > > > >
> > > > >
> > > > > I remember some issues with old DPDK 1.6 with some of the prefetch
> > > > > thresholds on 82599. You would be better off going to a later DPDK
> > > > > version.
> > > > >
> > >
> > > I hope you are on 1.6.0r2 at least??
> > >
> > > With older DPDK there was no way to get driver to tell you what the
> > > preferred settings were for pthresh/hthresh/wthresh. And the values
> > > in Intel sample applications were broken on some hardware.
> > >
> > > I remember reverse engineering the safe values from reading the Linux
> > > driver.
> > >
> > > The Linux driver is much better tested than the DPDK one...
> > > In the Linux driver, the Transmit Descriptor Controller (txdctl)
> > > is fixed at (for transmit)
> > >    wthresh = 1
> > >    hthresh = 1
> > >    pthresh = 32
> > >
> > > The DPDK 2.2 driver uses:
> > >    wthresh = 0
> > >    hthresh = 0
> > >    pthresh = 32
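
The threshold values quoted at the end of the thread, together with the
PMD initialization rule Matt ran into ("TX WTHRESH must be set to 0 if
tx_rs_thresh is greater than 1"), can be written down as a small
self-contained check. This is only a sketch under stated assumptions:
struct tx_thresh and check_tx_conf() are hypothetical names made up for
illustration and are not DPDK API. In DPDK itself the corresponding
settings are the tx_thresh and tx_rs_thresh members of
struct rte_eth_txconf.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three TXDCTL thresholds from the thread. */
struct tx_thresh {
    uint8_t pthresh; /* prefetch threshold */
    uint8_t hthresh; /* host threshold */
    uint8_t wthresh; /* write-back threshold */
};

/* Values Stephen reports for the Linux driver (transmit side). */
static const struct tx_thresh linux_ixgbe_tx = {
    .pthresh = 32, .hthresh = 1, .wthresh = 1
};

/* Values Stephen reports for the DPDK 2.2 driver. */
static const struct tx_thresh dpdk_2_2_tx = {
    .pthresh = 32, .hthresh = 0, .wthresh = 0
};

/*
 * Mirrors the rule in the panic message quoted in the thread:
 * wthresh must be 0 whenever tx_rs_thresh is greater than 1.
 * tx_rs_thresh is the effective value (assumption: any driver-side
 * default substitution has already been applied by the caller).
 * Returns 0 if the combination is acceptable, -1 otherwise.
 */
static int check_tx_conf(const struct tx_thresh *t, uint16_t tx_rs_thresh)
{
    if (tx_rs_thresh > 1 && t->wthresh != 0)
        return -1;
    return 0;
}
```

Under this check the Linux-style 1/1/32 setting is accepted only with an
effective tx_rs_thresh of 1, while the DPDK 2.2 values (wthresh = 0) are
accepted with any tx_rs_thresh. Matt's report that 1/1/32 panicked until
he changed rs_thresh from 0 to 1 suggests the driver treats a configured
0 as a default greater than 1, so the effective value, not the raw one,
is what should be passed in.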