From mboxrd@z Thu Jan 1 00:00:00 1970
In-Reply-To: <20151116161201.7e951097@samsung9>
References: <20151116161201.7e951097@samsung9>
Date: Mon, 16 Nov 2015 18:49:15 -0600
From: Matt Laswell
To: Stephen Hemminger
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] How to approach packet TX lockups
List-Id: patches and discussions about DPDK

Hey Stephen,

Thanks a lot; that's really useful information. Unfortunately, I'm at a
stage in our release cycle where upgrading to a new version of DPDK isn't
feasible. Any chance you (or others reading this) have a pointer to the
relevant changes? While I can't afford to upgrade DPDK entirely,
backporting targeted fixes is more doable.

Again, thanks.

- Matt

On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:

> On Mon, 16 Nov 2015 17:48:35 -0600
> Matt Laswell wrote:
>
> > Hey Folks,
> >
> > I sent this to the users email list, but I'm not sure how many people
> > are actively reading that list at this point. I'm dealing with a
> > situation in which my application loses the ability to transmit packets
> > out of a port during times of moderate stress. I'd love to hear
> > suggestions for how to approach this problem, as I'm a bit at a loss at
> > the moment.
> >
> > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
> > processors. I'm using the 82599 controller, configured to spread packets
> > across multiple queues. Each queue is accessed by a different lcore in
> > my application; there is therefore concurrent access to the controller,
> > but not to any of the queues. We're binding the ports to the igb_uio
> > driver.
> >
> > The symptoms I see are these:
> >
> >    - All transmit out of a particular port stops
> >    - rte_eth_tx_burst() indicates that it is sending all of the packets
> >      that I give to it
> >    - rte_eth_stats_get() gives me stats indicating that no packets are
> >      being sent on the affected port. Also, no tx errors, and no pause
> >      frames sent or received (opackets = 0, obytes = 0, oerrors = 0,
> >      etc.)
> >    - All other ports continue to work normally
> >    - The affected port continues to receive packets without problems;
> >      only TX is affected
> >    - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> >      restores things and packets can flow again
> >    - The problem is replicable on multiple devices, and doesn't follow
> >      one particular port
> >
> > I've tried calling rte_mbuf_sanity_check() on all packets before sending
> > them. I've also instrumented my code to look for packets that have
> > already been sent or freed, as well as cycles in chained packets being
> > sent. I also put a lock around all accesses to rte_eth* calls to
> > synchronize access to the NIC. Given some recent discussion here, I also
> > tried changing the TX RS threshold from 0 to 32, 16, and 1. None of
> > these strategies proved effective.
> >
> > Like I said at the top, I'm a little at a loss at this point. If you
> > were dealing with this set of symptoms, how would you proceed?
> >
>
> I remember some issues with old DPDK 1.6 with some of the prefetch
> thresholds on 82599. You would be better off going to a later DPDK
> version.
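
Until a targeted fix turns up, the symptoms quoted above (rte_eth_tx_burst() accepting packets while opackets stays flat) could be caught by a small watchdog. What follows is only a sketch of that idea, not something from the thread or from DPDK: struct tx_watchdog and both helper functions are hypothetical names, and the DPDK calls are deliberately left out so that the logic compiles on its own. It assumes the application feeds in the return value of rte_eth_tx_burst() and the opackets counter it reads via rte_eth_stats_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port bookkeeping; this is not a DPDK structure. */
struct tx_watchdog {
    uint64_t accepted;        /* packets the TX burst call reported as sent */
    uint64_t last_accepted;   /* value of 'accepted' at the previous check */
    uint64_t last_opackets;   /* opackets counter seen at the previous check */
    unsigned stalled_checks;  /* consecutive checks without hardware progress */
};

/* Call after every burst with the count returned by rte_eth_tx_burst(). */
static void tx_watchdog_note_burst(struct tx_watchdog *w, uint16_t nb_sent)
{
    assert(w != NULL);
    w->accepted += nb_sent;
}

/*
 * Call periodically with the port's current opackets value.
 * Returns true once packets have been handed to the NIC for 'threshold'
 * checks without the opackets counter moving.
 */
static bool tx_watchdog_check(struct tx_watchdog *w, uint64_t opackets,
                              unsigned threshold)
{
    assert(w != NULL);

    bool handed_off = (w->accepted != w->last_accepted);
    bool hw_progress = (opackets != w->last_opackets);

    if (hw_progress)
        w->stalled_checks = 0;   /* hardware is transmitting again */
    else if (handed_off)
        w->stalled_checks++;     /* accepted packets, nothing on the wire */
    /* idle interval (nothing handed off): leave the count unchanged */

    w->last_accepted = w->accepted;
    w->last_opackets = opackets;
    return w->stalled_checks >= threshold;
}
```

When tx_watchdog_check() returns true, the application could log the event and apply the rte_eth_dev_stop() / rte_eth_dev_start() workaround described above. A threshold greater than 1 is probably wise, since a burst handed off just before a check may not have reached the statistics counters yet.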