DPDK patches and discussions
From: "Wiles, Keith" <keith.wiles@intel.com>
To: Take Ceara <dumitru.ceara@gmail.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Performance hit - NICs on different CPU sockets
Date: Thu, 16 Jun 2016 16:56:41 +0000
Message-ID: <9FD66398-C7F0-43AF-89F6-79BB95A16A37@intel.com>
In-Reply-To: <CAKKV4w9-mNTOwiuNd1enS4ixoQrK4nOZ2Va6-ZvdZKzLtVS_Gg@mail.gmail.com>


On 6/16/16, 11:20 AM, "Take Ceara" <dumitru.ceara@gmail.com> wrote:

>On Thu, Jun 16, 2016 at 5:29 PM, Wiles, Keith <keith.wiles@intel.com> wrote:
>
>>
>> Right now I do not know what the issue is with the system. It could be too many Rx/Tx ring pairs per port limiting the memory in the NICs, which would explain why you get better performance with 8 cores per port. I am not really seeing the whole picture or how DPDK is configured, so I cannot help more. Sorry.
>
>I doubt that there is a limitation wrt running 16 cores per port vs 8
>cores per port as I've tried with two different machines connected
>back to back each with one X710 port and 16 cores on each of them
>running on that port. In that case our performance doubled as
>expected.
>
>>
>> Maybe seeing the DPDK command line would help.
>
>The command line I use with ports 01:00.3 and 81:00.3 is:
>./warp17 -c 0xFFFFFFFFF3   -m 32768 -w 0000:81:00.3 -w 0000:01:00.3 --
>--qmap 0.0x003FF003F0 --qmap 1.0x0FC00FFC00
>
>Our own qmap args allow the user to control exactly how cores are
>split between ports. In this case we end up with:
>
>warp17> show port map
>Port 0[socket: 0]:
>   Core 4[socket:0] (Tx: 0, Rx: 0)
>   Core 5[socket:0] (Tx: 1, Rx: 1)
>   Core 6[socket:0] (Tx: 2, Rx: 2)
>   Core 7[socket:0] (Tx: 3, Rx: 3)
>   Core 8[socket:0] (Tx: 4, Rx: 4)
>   Core 9[socket:0] (Tx: 5, Rx: 5)
>   Core 20[socket:0] (Tx: 6, Rx: 6)
>   Core 21[socket:0] (Tx: 7, Rx: 7)
>   Core 22[socket:0] (Tx: 8, Rx: 8)
>   Core 23[socket:0] (Tx: 9, Rx: 9)
>   Core 24[socket:0] (Tx: 10, Rx: 10)
>   Core 25[socket:0] (Tx: 11, Rx: 11)
>   Core 26[socket:0] (Tx: 12, Rx: 12)
>   Core 27[socket:0] (Tx: 13, Rx: 13)
>   Core 28[socket:0] (Tx: 14, Rx: 14)
>   Core 29[socket:0] (Tx: 15, Rx: 15)
>
>Port 1[socket: 1]:
>   Core 10[socket:1] (Tx: 0, Rx: 0)
>   Core 11[socket:1] (Tx: 1, Rx: 1)
>   Core 12[socket:1] (Tx: 2, Rx: 2)
>   Core 13[socket:1] (Tx: 3, Rx: 3)
>   Core 14[socket:1] (Tx: 4, Rx: 4)
>   Core 15[socket:1] (Tx: 5, Rx: 5)
>   Core 16[socket:1] (Tx: 6, Rx: 6)
>   Core 17[socket:1] (Tx: 7, Rx: 7)
>   Core 18[socket:1] (Tx: 8, Rx: 8)
>   Core 19[socket:1] (Tx: 9, Rx: 9)
>   Core 30[socket:1] (Tx: 10, Rx: 10)
>   Core 31[socket:1] (Tx: 11, Rx: 11)
>   Core 32[socket:1] (Tx: 12, Rx: 12)
>   Core 33[socket:1] (Tx: 13, Rx: 13)
>   Core 34[socket:1] (Tx: 14, Rx: 14)
>   Core 35[socket:1] (Tx: 15, Rx: 15)

Each socket has 10 physical cores, which is 20 lcores per socket and 40 lcores total.

The listing above shows lcores (hyper-threads), not physical cores. I understand some like to treat the two as interchangeable, and logically they are, but not performance-wise. If you run two run-to-completion threads on a single physical core, each on a different hyper-thread of that core [0,1], then the second lcore or thread (1) on that physical core will get at most about 20-30% of the CPU cycles. Normally it is much less, unless you tune the code so that the threads do not compete for the same internal execution units, and some internal execution units are always shared.

To get the best performance when hyper-threading is enabled, do not run both threads on a single physical core; run only one, on hyper-thread 0.

The cpu_layout table below lists each physical core id and its lcore ids per socket. Use the first lcore of each physical core for the best performance:
Core 1 [1, 21]    [11, 31]
Use lcore 1 or 11, depending on the socket you are on.
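
As a rough illustration of that rule, here is a minimal Python sketch that picks one lcore per physical core and flags lcore sets that double up on a physical core. The CPU_LAYOUT dict is copied by hand from the cpu_layout.py output quoted in this thread; the helper names first_lcores and shared_physical_cores are made up for this example and are not part of DPDK or warp17.

```python
# Layout copied from the cpu_layout.py output quoted in this thread:
# physical core id -> (lcores on socket 0, lcores on socket 1)
CPU_LAYOUT = {
    0: ([0, 20], [10, 30]),
    1: ([1, 21], [11, 31]),
    2: ([2, 22], [12, 32]),
    3: ([3, 23], [13, 33]),
    4: ([4, 24], [14, 34]),
    8: ([5, 25], [15, 35]),
    9: ([6, 26], [16, 36]),
    10: ([7, 27], [17, 37]),
    11: ([8, 28], [18, 38]),
    12: ([9, 29], [19, 39]),
}


def first_lcores(layout, socket):
    """Return one lcore (the lowest-numbered sibling) per physical core on a socket."""
    return sorted(min(per_socket[socket]) for per_socket in layout.values())


def shared_physical_cores(layout, lcores):
    """Return (socket, core id) pairs where several of `lcores` share a physical core."""
    used = set(lcores)
    clashes = []
    for core_id, per_socket in layout.items():
        for socket, siblings in enumerate(per_socket):
            if len(used.intersection(siblings)) > 1:
                clashes.append((socket, core_id))
    return clashes


if __name__ == "__main__":
    print("socket 0:", first_lcores(CPU_LAYOUT, 0))
    print("socket 1:", first_lcores(CPU_LAYOUT, 1))
    # Port 0 lcores from the original "show port map" output
    port0 = list(range(4, 10)) + list(range(20, 30))
    print("shared physical cores:", shared_physical_cores(CPU_LAYOUT, port0))
```

With this layout, the original port 0 assignment (lcores 4-9 and 20-29) puts both hyper-threads of six physical cores to work, which is the situation described above.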

The configuration below most likely gives the best performance and utilization of your system, if I got the values right ☺

./warp17 -c 0x00000FFFe0   -m 32768 -w 0000:81:00.3 -w 0000:01:00.3 --
--qmap 0.0x00000003FE --qmap 1.0x00000FFE00

Port 0[socket: 0]:
   Core 2[socket:0] (Tx: 0, Rx: 0)
   Core 3[socket:0] (Tx: 1, Rx: 1)
   Core 4[socket:0] (Tx: 2, Rx: 2)
   Core 5[socket:0] (Tx: 3, Rx: 3)
   Core 6[socket:0] (Tx: 4, Rx: 4)
   Core 7[socket:0] (Tx: 5, Rx: 5)
   Core 8[socket:0] (Tx: 6, Rx: 6)
   Core 9[socket:0] (Tx: 7, Rx: 7)

That is 8 cores on the first socket, leaving lcores 0-1 for Linux.

Port 1[socket: 1]:
   Core 10[socket:1] (Tx: 0, Rx: 0)
   Core 11[socket:1] (Tx: 1, Rx: 1)
   Core 12[socket:1] (Tx: 2, Rx: 2)
   Core 13[socket:1] (Tx: 3, Rx: 3)
   Core 14[socket:1] (Tx: 4, Rx: 4)
   Core 15[socket:1] (Tx: 5, Rx: 5)
   Core 16[socket:1] (Tx: 6, Rx: 6)
   Core 17[socket:1] (Tx: 7, Rx: 7)
   Core 18[socket:1] (Tx: 8, Rx: 8)
   Core 19[socket:1] (Tx: 9, Rx: 9)

All 10 cores on the second socket.
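
Since hex masks are easy to get wrong by hand, a small Python sketch for converting between lcore lists and masks may help. It assumes the -c and --qmap values are plain bitmasks in which bit N stands for lcore N, which is consistent with the original command line and its "show port map" output. The helper names lcore_mask and mask_to_lcores are made up for this example.

```python
def lcore_mask(lcores):
    """Build a hex bitmask with one bit set per lcore id (bit N = lcore N)."""
    mask = 0
    for lcore in lcores:
        mask |= 1 << lcore
    return hex(mask)


def mask_to_lcores(mask):
    """List the lcore ids whose bits are set in an integer mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


if __name__ == "__main__":
    # Masks for the two port listings above
    print("lcores 2-9:  ", lcore_mask(range(2, 10)))
    print("lcores 10-19:", lcore_mask(range(10, 20)))
    # Decode the masks from the suggested command line to cross-check them
    for value in (0x00000FFFE0, 0x00000003FE, 0x00000FFE00):
        print(hex(value), "->", mask_to_lcores(value))
```

Under that assumption, lcores 2-9 encode to 0x3fc and lcores 10-19 to 0xffc00, while 0x3FE and 0xFFE00 decode to lcores 1-9 and 9-19, so the masks in the command line are worth comparing against the port listings before running.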

++Keith

>
>Just for reference, the cpu_layout script shows:
>$ $RTE_SDK/tools/cpu_layout.py
>============================================================
>Core and Socket Information (as reported by '/proc/cpuinfo')
>============================================================
>
>cores =  [0, 1, 2, 3, 4, 8, 9, 10, 11, 12]
>sockets =  [0, 1]
>
>        Socket 0        Socket 1
>        --------        --------
>Core 0  [0, 20]         [10, 30]
>Core 1  [1, 21]         [11, 31]
>Core 2  [2, 22]         [12, 32]
>Core 3  [3, 23]         [13, 33]
>Core 4  [4, 24]         [14, 34]
>Core 8  [5, 25]         [15, 35]
>Core 9  [6, 26]         [16, 36]
>Core 10 [7, 27]         [17, 37]
>Core 11 [8, 28]         [18, 38]
>Core 12 [9, 29]         [19, 39]
>
>I know it might be complicated to figure out exactly what's happening
>in our setup with our own code so please let me know if you need
>additional information.
>
>I appreciate the help!
>
>Thanks,
>Dumitru
>



