From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <keith.wiles@intel.com>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by dpdk.org (Postfix) with ESMTP id 9B9B57D0D
 for <dev@dpdk.org>; Thu,  4 May 2017 18:35:01 +0200 (CEST)
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 04 May 2017 09:35:00 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.38,287,1491289200"; d="scan'208";a="83672916"
Received: from fmsmsx108.amr.corp.intel.com ([10.18.124.206])
 by orsmga004.jf.intel.com with ESMTP; 04 May 2017 09:34:59 -0700
Received: from fmsmsx114.amr.corp.intel.com (10.18.116.8) by
 FMSMSX108.amr.corp.intel.com (10.18.124.206) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Thu, 4 May 2017 09:34:59 -0700
Received: from fmsmsx113.amr.corp.intel.com ([169.254.13.235]) by
 FMSMSX114.amr.corp.intel.com ([169.254.6.206]) with mapi id 14.03.0319.002;
 Thu, 4 May 2017 09:34:59 -0700
From: "Wiles, Keith" <keith.wiles@intel.com>
To: Matt Laswell <laswell@infinite.io>
CC: "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] Occasional instability in RSS Hashes/Queues from
 X540 NIC
Thread-Index: AQHSxNb7ouTsxLjiXEKddMLpif2a+qHk1GcA
Date: Thu, 4 May 2017 16:34:58 +0000
Message-ID: <7B7539B0-8CB0-4DDB-B329-D11F295A2604@intel.com>
References: <CA+GnqArrtwzGpYd7GgZwV1LWB6X4-4oiOBDmFc-J4AfkT3OHbg@mail.gmail.com>
In-Reply-To: <CA+GnqArrtwzGpYd7GgZwV1LWB6X4-4oiOBDmFc-J4AfkT3OHbg@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.254.17.150]
Content-Type: text/plain; charset="us-ascii"
Content-ID: <EEDA0F54DFA8F84E9D89EF322692D750@intel.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] Occasional instability in RSS Hashes/Queues from
 X540 NIC
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 04 May 2017 16:35:03 -0000


> On May 4, 2017, at 8:04 AM, Matt Laswell <laswell@infinite.io> wrote:
>
> Hey Folks,
>
> I'm seeing some strange behavior with regard to the RSS hash values in my
> applications and was hoping somebody might have some pointers on where to
> look.  In my application, I'm using RSS to divide work among multiple
> cores, each of which services a single RX queue.  When dealing with a
> single long-lived TCP connection, I occasionally see packets going to the
> wrong core.   That is, almost all of the packets in the connection go to
> core 5 in this case, but every once in a while, one goes to core 0 instead.
>
> Upon further investigation, I find two problems are occurring.  The first
> is that problem packets have the RSS hash value in the mbuf incorrectly set
> to zero.  They are therefore put in queue zero, where they are read by core
> zero.  Other packets from the same connection that occur immediately before
> and after the packet in question have the correct hash value and therefore
> go to a different core.   The second problem is that we sometimes see
> packets in which the RSS hash in the mbuf appears correct, but the packets
> are incorrectly put into queue zero.  As with the first, this results in
> the wrong core getting the packet.  Either one of these confuses the state
> tracking we're doing per-core.
>
> A few details:
>=20
>   - Using an Intel X540-AT2 NIC and the igb_uio driver
>   - DPDK 16.04
>   - A particular packet in our workflow always encounters this problem.
>   - Retransmissions of the packet in question also encounter the problem
>   - The packet is IPv4, with header length of 20 (so no options), no
>   fragmentation.
>   - The only differences I can see in the IP header between packets that
>   get the right hash value and those that get the wrong one are in the IP ID,
>   total length, and checksum fields.
>   - Using ETH_RSS_IPV4
>   - The packet is TCP with about 100 bytes of payload - it's not a jumbo
>   or a runt
>   - We fill the key in with 0x6d5a to get symmetric hashing of both sides
>   of the connection
>   - We only configure RSS information at boot; things like the key or
>   header fields are not being changed dynamically
>   - Traffic load is light when the problem occurs
>
> Is anybody aware of an errata, either in the NIC or the PMD's configuration
> of it that might explain something like this?   Failing that, if you ran
> into this sort of behavior, how would you approach finding the reason for
> the error?  Every failure mode I can think of would tend to affect all of
> the packets in the connection consistently, even if incorrectly.

To add more information to this thread, can you provide hexdumps of the packets so that someone can maybe spot the problem?

We need the OK packet immediately before the failure, the OK packet immediately after it, and the failing packets you are seeing.
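
If it helps with the comparison, here is a minimal sketch (not part of DPDK) that parses the IPv4 header out of a hexdump and lists which fields differ between two packets. It assumes untagged Ethernet frames (a 14-byte L2 header) and a 20-byte IPv4 header with no options, as described above. The function names and the sample frames are made up for illustration only.

```python
import struct

# Hypothetical field names for the fixed 20-byte IPv4 header.
IPV4_FIELDS = ("ver_ihl", "tos", "total_len", "id", "flags_frag",
               "ttl", "proto", "checksum", "src", "dst")


def parse_ipv4_header(frame_hex, l2_len=14):
    """Parse the IPv4 header from a hex dump of an Ethernet frame.

    Assumes an untagged frame (l2_len=14) and no IP options.
    """
    raw = bytes.fromhex(frame_hex)
    ip = raw[l2_len:l2_len + 20]
    if len(ip) < 20:
        raise ValueError("frame too short for an IPv4 header")
    values = struct.unpack("!BBHHHBBH4s4s", ip)
    return dict(zip(IPV4_FIELDS, values))


def diff_ipv4_headers(frame_a_hex, frame_b_hex):
    """Return the sorted names of IPv4 header fields that differ."""
    a = parse_ipv4_header(frame_a_hex)
    b = parse_ipv4_header(frame_b_hex)
    return sorted(name for name in IPV4_FIELDS if a[name] != b[name])


if __name__ == "__main__":
    # Made-up frames: zeroed MAC addresses, EtherType 0x0800, then IPv4.
    eth = "00" * 12 + "0800"
    good = eth + "4500008c1234400040060000c0a80001c0a80002"
    bad = eth + "450000901235400040061111c0a80001c0a80002"
    print(diff_ipv4_headers(good, bad))
```

Running it on the OK packet and the failing packet should quickly confirm whether anything besides IP ID, total length, and checksum differs.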

I do not know why this is happening; I am not aware of any errata that would explain this issue.
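
One way to narrow this down offline is to compute the expected RSS hash in software and compare it with the value reported in the mbuf. The following is a rough sketch under stated assumptions, not the PMD's code: it implements the standard Toeplitz hash over the IPv4 source and destination addresses (the ETH_RSS_IPV4 case), uses a 40-byte key made of 0x6d5a repeated as you describe, and assumes a 128-entry redirection table filled round-robin across the RX queues. All function names are hypothetical, and the real table contents should be read back from the device rather than assumed.

```python
import socket

# Assumption: 40-byte RSS key built by repeating 0x6d5a, per the description.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20


def toeplitz_hash(key, data):
    """Toeplitz hash of `data` under `key`, returning a 32-bit integer."""
    if len(key) < len(data) + 4:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        # Walk the input MSB first; for each set bit, XOR in the
        # 32-bit window of the key that starts at bit offset i.
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_ipv4_input(src_ip, dst_ip):
    """Hash input for IPv4-only RSS: source address, then destination."""
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)


def reta_queue(rss_hash, nb_queues, reta_size=128):
    """Map a hash to a queue, ASSUMING a round-robin redirection table."""
    reta = [i % nb_queues for i in range(reta_size)]
    return reta[rss_hash & (reta_size - 1)]


if __name__ == "__main__":
    # Made-up addresses; substitute the ones from the failing connection.
    fwd = toeplitz_hash(SYMMETRIC_KEY, rss_ipv4_input("10.0.0.1", "10.0.0.2"))
    rev = toeplitz_hash(SYMMETRIC_KEY, rss_ipv4_input("10.0.0.2", "10.0.0.1"))
    print(hex(fwd), hex(rev), reta_queue(fwd, nb_queues=8))
```

If the software hash matches the mbuf value for the good packets but the failing packets report zero, that would suggest the NIC did not apply the RSS hash to those packets at all, rather than that it computed a different hash.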

>
> Thanks in advance for any ideas.
>=20
> --
> Matt Laswell
> laswell@infinite.io

Regards,
Keith