From: Matt Laswell
Date: Tue, 2 May 2017 17:36:43 -0500
To: users
Subject: [dpdk-users] Occasional instability in RSS Hashes from X540
 NIC

Hey Folks,

I'm seeing some strange behavior with regard to the RSS hash values in my applications and was hoping somebody might have some pointers on where to look. In my application, I'm using RSS to divide work among multiple cores, each of which services a single RX queue. When dealing with a single long-lived TCP connection, I occasionally see packets going to the wrong core. That is, almost all of the packets in the connection go to core 5 in this case, but every once in a while, one goes to core 0 instead.

Upon further investigation, I find that the problem packets always have the RSS hash value in the mbuf set to zero. They are therefore put in queue zero, where they are read by core zero. Other packets from the same connection that occur immediately before and after the packet in question have the correct hash value and therefore go to a different core. This plays havoc with my tracking of the TCP stream.

A few details:

- Using an Intel X540-AT2 NIC and the igb_uio driver
- DPDK 16.04
- A particular packet in our workflow always encounters this problem.
- Retransmissions of the packet in question also encounter the problem.
- The packet is IPv4, with header length of 20 (so no options), no fragmentation.
- The only differences I can see in the IP header between packets that get the right hash value and those that get the wrong one are in the IP ID, total length, and checksum fields.
- Using ETH_RSS_IPV4
- We fill the key with the repeated 16-bit value 0x6d5a to get symmetric hashing of both sides of the connection
- We only configure RSS information at boot; things like the key or header fields are not being changed dynamically
- Traffic load is light when the problem occurs

Is anybody aware of an erratum, either in the NIC or in the PMD's configuration of it, that might explain something like this? Failing that, if you ran into this sort of behavior, how would you approach finding the reason for the error? Every failure mode I can think of would tend to affect all of the packets in the connection consistently, even if incorrectly.

Thanks in advance for any ideas.

--
Matt Laswell
laswell@infinite.io
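
For anyone wanting to cross-check the hash values described above in software, here is a minimal Python sketch of a Toeplitz hash over the IPv4-only input (source address followed by destination address). It is not taken from the application or the PMD. The 40-byte key length, the 128-entry redirection table, and its round-robin fill are assumptions, and the helper names `toeplitz_hash`, `ipv4_rss_input`, and `queue_for_hash` are hypothetical.

```python
import socket

# 40-byte RSS key built from the repeating 16-bit value 0x6d5a (as in the post).
# The 40-byte length is an assumption about the key size in use.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20

# Assumed redirection-table size; check the NIC datasheet for the real value.
RETA_SIZE = 128


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of data under key."""
    if len(data) > len(key) - 4:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                # XOR in the 32-bit key window that starts at this input bit.
                shift = key_bits - 32 - (i * 8 + bit)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def ipv4_rss_input(src: str, dst: str) -> bytes:
    """Hash input for IPv4-only RSS: source address, then destination address."""
    return socket.inet_aton(src) + socket.inet_aton(dst)


def queue_for_hash(rss_hash: int, nb_queues: int) -> int:
    """Hypothetical lookup: low bits of the hash index a round-robin table."""
    reta = [i % nb_queues for i in range(RETA_SIZE)]
    return reta[rss_hash & (RETA_SIZE - 1)]


if __name__ == "__main__":
    # Example addresses are made up for illustration.
    fwd = toeplitz_hash(SYMMETRIC_KEY, ipv4_rss_input("10.0.0.1", "192.168.1.20"))
    rev = toeplitz_hash(SYMMETRIC_KEY, ipv4_rss_input("192.168.1.20", "10.0.0.1"))
    print(f"forward=0x{fwd:08x} reverse=0x{rev:08x} symmetric={fwd == rev}")
    print("queue for hash 0:", queue_for_hash(0, nb_queues=8))
```

Because the key repeats every 16 bits and the two addresses sit 32 bits apart in the input, swapping source and destination leaves the result unchanged, which is the symmetric behavior the key is chosen for. With an IPv4-only input the value depends on the two addresses alone, so every packet of one connection should produce the same hash. Under the assumed table fill, a hash of zero maps to queue 0, which matches where the problem packets end up.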