From: Gopakumar Choorakkot Edakkunni
To: Thomas Monjalon, bruce.richardson@intel.com
Cc: dev@dpdk.org
Date: Tue, 30 Jun 2015 17:50:14 -0700
Subject: Re: [dpdk-dev] dpdk-2.0.0: crash in ixgbe_recv_scattered_pkts_vec->_recv_raw_pkts_vec->desc_to_olflags_v

An update on this. In short, it was purely my fault; apologies for
prematurely suspecting the wrong areas. Details below.

1. My AWS box had an eth0 interface without DPDK. I enabled DPDK, also
   created a KNI interface, and named the KNI interface eth0.

2. Ubuntu therefore started its DHCP client on that interface, but my
   app does nothing to read the DHCP renews from the KNI and send them
   out the physical port, or vice versa. The KNI was just sitting there,
   not doing much Rx/Tx.

3. My l2fwd-equivalent code started working fine. After a few minutes,
   the DHCP client on Ubuntu gave up attempting the DHCP renew (eth0
   already had an IP) and tried to remove the IP from eth0.

4. At this point the callbacks registered by the standard KNI examples
   in DPDK were invoked. The examples have a port_stop() and a
   port_start() in them, and exactly at this point my app crashed.

So my bad! I have no-oped the callbacks for now and changed the AWS eth0
from DHCP to a static IP, and things are fine now. My system has been up
for a long time with no issues.

Thanks again, Thomas and Bruce, for the quick response and suggestions.

Rgds,
Gopa.

On Tue, Jun 30, 2015 at 11:28 AM, Gopakumar Choorakkot Edakkunni wrote:
> Hi Thomas, Bruce,
>
> Thanks for the responses. Please find my answers as below.
>
> Thomas>> "You mean you are using SR-IOV from Amazon, right? Do you
> have more hardware details?"
>
> That is correct. I am attaching three files cpuinfo.txt lcpci.txt and
> portconf.txt (just the port config that I am using, nothing special,
> yanked off of l2fwd example). The two 82599 VF interfaces seen in
> lspci output are the ones of interest - I use one of them in dpdk
> mode.
>
> Thomas>> Did you try to disable CONFIG_RTE_IXGBE_INC_VECTOR?
>
> Thanks for the suggestion, I made that change and was giving it some
> time. Now the result of that is not entirely black and white:
> previously (in vector mode) my app used to Rx/Tx packets nicely
> without any hiccups, but would crash in 10 minutes :).
> Now with this suggested change, it's been running for a while and
> doesn't crash, but the Tx latency and Tx loss are so high (around 10%
> Tx loss) that the app is not doing a great job - but that might just
> be something that I need to adapt to when using non-vector mode? I
> will experiment on that a bit more. So I "think" it's fair to say that
> with the vector disabled, there's no crash, but I need to chase this
> latency/loss now.
>
> Thomas>> Not needed. A DPDK application is fast enough to do the job
> in 10 minutes ;)
>
> Haha, good one :). That's where I want to get to eventually, but right
> now some distance from it.
>
> Bruce>> Can you perhaps isolate any further the root cause of the
> issue. For example, does it only occur when you get three packets at
> the receive ring wraps back around to zero?
>
> I will try some more experiments, will read and understand this Rx
> code a bit more to be able to answer the question about whether the
> ring wraps around when the problem happens etc.
>
> Rgds,
> Gopa.
>
>
> On Tue, Jun 30, 2015 at 9:08 AM, Thomas Monjalon wrote:
>> 2015-06-30 08:49, Gopakumar Choorakkot Edakkunni:
>>> I am starting to try out dpdk-2.0.0 with a simple Rx routine very
>>> similar to the l2fwd example - I am running this on a c3.8xlarge AWS
>>> SR-IOV enabled VPC instance (inside the VM it uses the ixgbevf driver).
>>
>> You mean you are using SR-IOV from Amazon, right?
>> Do you have more hardware details?
>>
>>> Once in every 10 minutes my application crashes in the receive path.
>>> And whenever I check the crash reason it's because it always has three
>>> packets in the burst array (I have provided array size of 32) instead
>>> of the four that it tries to collect in one bunch. And inside
>>> desc_to_olflags_v(), there's the assumption that there are four
>>> packets, and obviously it crashes trying to access the fourth buffer.
>>
>> Did you try to disable CONFIG_RTE_IXGBE_INC_VECTOR?
>>
>>> With a brief look at the code, I really can't make out how it's
>>> guaranteed that we will always have four descriptors fully populated?
>>> After the first iteration, the loop does break out if (likely(var !=
>>> RTE_IXGBE_DESCS_PER_LOOP)), but how about the very first iteration
>>> where we might not have four?
>>>
>>> Any thoughts will be helpful here, trying to get my app working for
>>> more than 10 minutes :)
>>
>> Not needed. A DPDK application is fast enough to do the job in 10 minutes ;)
>>
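
For reference, the "no-oped the callbacks" workaround from the update at
the top of this thread could look roughly like the sketch below. It is
only an illustration under assumptions: struct kni_ops_sketch is a
hypothetical stand-in modeled on the KNI callback table declared in
rte_kni.h (the exact fields depend on the DPDK version), and
make_noop_ops() is a hypothetical helper, not a DPDK function. The point
is simply that the callbacks return without stopping or restarting the
port.

```c
#include <stdint.h>

/* Hypothetical stand-in for the KNI callback table. The real type is
 * declared in rte_kni.h; check that header for the exact fields in
 * your DPDK version. */
struct kni_ops_sketch {
    uint8_t port_id;
    int (*change_mtu)(uint8_t port_id, unsigned new_mtu);
    int (*config_network_if)(uint8_t port_id, uint8_t if_up);
};

/* No-op: accept the MTU request without touching the port. */
static int noop_change_mtu(uint8_t port_id, unsigned new_mtu)
{
    (void)port_id;
    (void)new_mtu;
    return 0;
}

/* No-op: accept the interface up/down request without stopping or
 * starting the port, so the Rx/Tx path is left undisturbed. */
static int noop_config_network_if(uint8_t port_id, uint8_t if_up)
{
    (void)port_id;
    (void)if_up;
    return 0;
}

/* Hypothetical helper: build a callback table whose handlers do nothing. */
static struct kni_ops_sketch make_noop_ops(uint8_t port_id)
{
    struct kni_ops_sketch ops;

    ops.port_id = port_id;
    ops.change_mtu = noop_change_mtu;
    ops.config_network_if = noop_config_network_if;
    return ops;
}
```

In a real application the table would be the structure from rte_kni.h
handed over when the KNI device is created. Returning 0 here is assumed
to report success to the requester; it hides the request rather than
handling it, so it is a stopgap and not a proper fix.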