* [dpdk-dev] NUMA CPU Sockets and DPDK
@ 2014-02-12 11:28 Prashant Upadhyaya
2014-02-12 11:47 ` Etai Lev Ran
2014-02-12 11:49 ` Richardson, Bruce
0 siblings, 2 replies; 5+ messages in thread
From: Prashant Upadhyaya @ 2014-02-12 11:28 UTC (permalink / raw)
To: dev
Hi guys,
What has been your experience of using DPDK-based apps in NUMA mode with multiple sockets, where some cores are present on one socket and other cores on another socket?
I am migrating my application from an Intel machine with 8 cores, all on one socket, to a 32-core machine where 16 cores are on one socket and the other 16 cores are on the second socket.
My core 0 does all initialization for mbufs, NIC ports, queues, etc., and uses SOCKET_ID_ANY for socket-related parameters.
The use case works, but I think I am running into performance issues on the 32-core machine.
The lscpu output on my 32-core machine shows the following:
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
I am using core 1 to lift all the data from a single queue of an 82599EB port, and I see that the CPU utilization for core 1 is way too high even for lifting 1 Gbps of traffic with a packet size of 650 bytes.
In general, does one need to be careful when working with multiple sockets and so forth? Any comments would be helpful.
Regards
-Prashant
* Re: [dpdk-dev] NUMA CPU Sockets and DPDK
2014-02-12 11:28 [dpdk-dev] NUMA CPU Sockets and DPDK Prashant Upadhyaya
@ 2014-02-12 11:47 ` Etai Lev Ran
2014-02-12 12:03 ` Prashant Upadhyaya
2014-02-12 11:49 ` Richardson, Bruce
1 sibling, 1 reply; 5+ messages in thread
From: Etai Lev Ran @ 2014-02-12 11:47 UTC (permalink / raw)
To: 'Prashant Upadhyaya'; +Cc: dev
Hi Prashant,
Based on our experience, using DPDK across CPU sockets may indeed result in
some performance degradation (~10% for our application vs. staying
in-socket; YMMV based on HW, application structure, etc.).
Regarding CPU utilization on core 1, the one picking up traffic: perhaps I
have misunderstood your comment, but I would expect it to always be close
to 100% since it is polling the device via the PMD rather than being
driven by interrupts.
Regards,
Etai
* Re: [dpdk-dev] NUMA CPU Sockets and DPDK
2014-02-12 11:28 [dpdk-dev] NUMA CPU Sockets and DPDK Prashant Upadhyaya
2014-02-12 11:47 ` Etai Lev Ran
@ 2014-02-12 11:49 ` Richardson, Bruce
1 sibling, 0 replies; 5+ messages in thread
From: Richardson, Bruce @ 2014-02-12 11:49 UTC (permalink / raw)
To: Prashant Upadhyaya, dev
>
> What has been your experience of using DPDK based app's in NUMA mode
> with multiple sockets where some cores are present on one socket and
> other cores on some other socket.
>
> I am migrating my application from one intel machine with 8 cores, all in
> one socket to a 32 core machine where 16 cores are in one socket and 16
> other cores in the second socket.
> My core 0 does all initialization for mbuf's, nic ports, queues etc. and uses
> SOCKET_ID_ANY for socket related parameters.
It is recommended that you decide ahead of time which cores on which NUMA socket the different parts of your application are going to run on, and then set up your objects in memory appropriately. SOCKET_ID_ANY should only be used to allocate items that are not used in the data path and for which you therefore don't care about access time. Any objects for rings or mempools should be created by specifying the correct socket to allocate the memory on. If you are working with two sockets, in some cases you may want to duplicate your data structures, for example, use two memory pools - one on each socket - instead of one, so that all data access is local.
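As a rough illustration only (the pool names and sizes below are made up, and a two-socket box is assumed), per-socket mbuf pools could be created along these lines:

#include <stdio.h>
#include <rte_mempool.h>
#include <rte_mbuf.h>
#include <rte_debug.h>

#define NB_MBUF   8192   /* per-socket pool size, purely illustrative */
#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM)

/* One mbuf pool per NUMA socket so that data-path accesses stay local. */
static struct rte_mempool *pktmbuf_pool[2];

static void
create_per_socket_pools(void)
{
        char name[32];
        int socket;

        for (socket = 0; socket < 2; socket++) {
                snprintf(name, sizeof(name), "mbuf_pool_s%d", socket);
                pktmbuf_pool[socket] = rte_mempool_create(
                        name, NB_MBUF, MBUF_SIZE,
                        32,                          /* per-lcore cache size */
                        sizeof(struct rte_pktmbuf_pool_private),
                        rte_pktmbuf_pool_init, NULL, /* pool constructor */
                        rte_pktmbuf_init, NULL,      /* per-mbuf constructor */
                        socket,                      /* allocate on this NUMA node */
                        0);
                if (pktmbuf_pool[socket] == NULL)
                        rte_panic("Cannot create mbuf pool on socket %d\n",
                                  socket);
        }
}

Each polling lcore would then draw its mbufs from, and have its RX queues set up with, the pool that matches its own socket.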
>
> The usecase works, but I think I am running into performance issues on the
> 32 core machine.
> The lscpu output on my 32 core machine shows the following - NUMA
> node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
> I am using core 1 to lift all the data from a single queue of an 82599EB port
> and I see that the cpu utilization for this core 1 is way too high even for
> lifting traffic of 1 Gbps with packet size of 650 bytes.
How are you measuring the CPU utilization? When using the Intel DPDK, in most cases your CPU utilization will always be 100% because you are constantly polling, so actual CPU headroom can be hard to judge at times.
Another thing to consider is the NUMA node to which your NIC is connected. You can check which NUMA socket your NIC is connected to with rte_eth_dev_socket_id() - assuming a modern platform where the PCIe lanes connect straight to the CPUs. Whichever NUMA node the NIC is connected to, you want to run the code that polls the NIC RX queues on cores of that node, and do all packet transmission using cores on that NUMA node.
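As a sketch (using the standard ethdev/lcore helpers; the function name and warning text are just illustrative), the pairing can be sanity-checked at init time like this:

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Warn if the lcore that will poll 'port_id' sits on a different NUMA
 * node than the NIC itself. */
static void
check_port_locality(uint8_t port_id, unsigned rx_lcore_id)
{
        int port_socket = rte_eth_dev_socket_id(port_id);
        unsigned lcore_socket = rte_lcore_to_socket_id(rx_lcore_id);

        if (port_socket >= 0 && (unsigned)port_socket != lcore_socket)
                printf("WARNING: port %u is on NUMA node %d but RX lcore %u "
                       "is on node %u; expect remote-memory overhead\n",
                       (unsigned)port_id, port_socket, rx_lcore_id,
                       lcore_socket);
}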
>
> In general, does one need to be careful in working with multiple sockets and
> so forth, any comments would be helpful.
In general, yes, you need to be a bit more careful, but the basic rules as outlined above should give you a good start.
* Re: [dpdk-dev] NUMA CPU Sockets and DPDK
2014-02-12 11:47 ` Etai Lev Ran
@ 2014-02-12 12:03 ` Prashant Upadhyaya
2014-02-12 14:11 ` François-Frédéric Ozog
0 siblings, 1 reply; 5+ messages in thread
From: Prashant Upadhyaya @ 2014-02-12 12:03 UTC (permalink / raw)
To: Etai Lev Ran; +Cc: dev
Hi Etai,
Of course all DPDK threads consume 100% (unless some waits are introduced for power saving, etc.; all typical DPDK threads are while(1) loops).
When I said core 1 is unusually busy, I meant that it is not able to read beyond 2 Gbps or so and packets are dropping at the NIC.
(I have my own custom way of calculating the CPU utilization of core 1, based on how many polls came back empty and how many polls returned data that I then process.)
On the 8-core machine with a single socket, core 1 was able to lift much higher data rates successfully, hence the question.
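For reference, the accounting is roughly of the following shape (purely illustrative; the counters and the reporting interval are made up):

#include <stdio.h>
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Count polls that returned packets vs. polls that came back empty,
 * as a rough estimate of how busy the polling core really is. */
static void
rx_loop(uint8_t port_id, uint16_t queue_id)
{
        struct rte_mbuf *bufs[BURST_SIZE];
        uint64_t empty_polls = 0, busy_polls = 0;

        for (;;) {
                uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id,
                                                  bufs, BURST_SIZE);

                if (nb_rx == 0) {
                        empty_polls++;
                } else {
                        busy_polls++;
                        /* ... process / transmit / free the nb_rx mbufs ... */
                }

                if (((empty_polls + busy_polls) & 0xFFFFF) == 0)
                        printf("core %u: %.1f%% of polls returned packets\n",
                               rte_lcore_id(),
                               100.0 * busy_polls /
                               (busy_polls + empty_polls));
        }
}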
Regards
-Prashant
* Re: [dpdk-dev] NUMA CPU Sockets and DPDK
2014-02-12 12:03 ` Prashant Upadhyaya
@ 2014-02-12 14:11 ` François-Frédéric Ozog
0 siblings, 0 replies; 5+ messages in thread
From: François-Frédéric Ozog @ 2014-02-12 14:11 UTC (permalink / raw)
To: 'Prashant Upadhyaya'; +Cc: dev
Hi Prashant,
Maybe you could monitor RAM, QPI and PCIe activity with
http://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization
It may make investigating the issue easier.
François-Frédéric