* [dpdk-dev] Polling too often at lower packet rates?
@ 2015-04-08 16:35 Aaron Campbell
2015-04-08 18:00 ` Stephen Hemminger
2015-04-08 19:11 ` Ananyev, Konstantin
0 siblings, 2 replies; 7+ messages in thread
From: Aaron Campbell @ 2015-04-08 16:35 UTC (permalink / raw)
To: dev
Hi,
I have a machine with 6 DPDK ports (4 igb, 2 ixgbe), with 1.23Mpps traffic offered to only one of the 10G ports (the other 5 are unused). I also have a program with a pretty standard looking DPDK receive loop, where it calls rte_eth_rx_burst() for each configured port. If I configure the loop to read from all 6 ports, it can read the 1.23Mpps rate with no drops. If I configure the loop to poll only 1 port (the ixgbe receiving the traffic), I lose about 1/3rd of the packets (i.e., the NIC drops ~400Kpps).
Another data point is that if I configure the loop to read from 3 out of the 6 ports, the drop rate is reduced to less than half (i.e., the NIC is only dropping ~190Kpps now). So it seems that in this test, throughput improves by adding NICs, not removing them, which is counter-intuitive. Again, I get no drops when polling all 6 ports. Note, the burst size is 32.
I did find a reference to a similar issue in a recent paper (http://www.net.in.tum.de/fileadmin/bibtex/publications/papers/ICN2015.pdf), Section III, which states:
"The DPDK L2FWD application initially only managed to forward 13.8 Mpps in the single direction test at the maximum CPU frequency, a similar result can be found in [11]. Reducing the CPU frequency increased the throughput to the expected value of 14.88 Mpps. Our investigation of this anomaly revealed that the lack of any processing combined with the fast CPU caused DPDK to poll the NIC too often. DPDK does not use interrupts, it utilizes a busy wait loop that polls the NIC until at least one packet is returned. This resulted in a high poll rate which affected the throughput. We limited the poll rate to 500,000 poll operations per second (i.e., a batch size of about 30 packets) and achieved line rate in the unidirectional test with all frequencies. This effect was only observed with the X520 NIC, tests with X540 NICs did not show this anomaly.”
Another reference, from this mailing list last year (http://wiki.dpdk.org/ml/archives/dev/2014-January/001169.html):
"I suggest you to check average burst sizes on receive queues. Looks like I stumbled upon a similar issue several times. If you are calling rte_eth_rx_burst too frequently, NIC begins losing packets no matter how many CPU horse power you have (more you have, more it loses, actually). In my case this situation occured when average burst size is less than 20 packets or so. I'm not sure what's the reason for this behavior, but I observed it on several applications on Intel 82599 10Gb cards.”
So I’m wondering if anyone can explain at a lower level what happens when you poll “too often”, and if there are any practical workarounds. We’re using this same program and DPDK version to process 10G line-rate in other scenarios, so I’m confident that the overall packet capture architecture is sound.
-Aaron
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [dpdk-dev] Polling too often at lower packet rates?
2015-04-08 16:35 [dpdk-dev] Polling too often at lower packet rates? Aaron Campbell
@ 2015-04-08 18:00 ` Stephen Hemminger
2015-04-09 18:26 ` Aaron Campbell
2015-04-08 19:11 ` Ananyev, Konstantin
1 sibling, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2015-04-08 18:00 UTC (permalink / raw)
To: Aaron Campbell; +Cc: dev
We use an adaptive polling loop similar to the l3fwd-power example.
See:
http://video.fosdem.org/2015/devroom-network_management_and_sdn/
On Wed, Apr 8, 2015 at 9:35 AM, Aaron Campbell <aaron@arbor.net> wrote:
> Hi,
>
> I have a machine with 6 DPDK ports (4 igb, 2 ixgbe), with 1.23Mpps traffic
> offered to only one of the 10G ports (the other 5 are unused). I also have
> a program with a pretty standard looking DPDK receive loop, where it calls
> rte_eth_rx_burst() for each configured port. If I configure the loop to
> read from all 6 ports, it can read the 1.23Mpps rate with no drops. If I
> configure the loop to poll only 1 port (the ixgbe receiving the traffic), I
> lose about 1/3rd of the packets (i.e., the NIC drops ~400Kpps).
>
> Another data point is that if I configure the loop to read from 3 out of
> the 6 ports, the drop rate is reduced to less than half (i.e., the NIC is
> only dropping ~190Kpps now). So it seems that in this test, throughput
> improves by adding NICs, not removing them, which is counter-intuitive.
> Again, I get no drops when polling all 6 ports. Note, the burst size is 32.
>
> I did find a reference to a similar issue in a recent paper (
> http://www.net.in.tum.de/fileadmin/bibtex/publications/papers/ICN2015.pdf),
> Section III, which states:
>
> "The DPDK L2FWD application initially only managed to forward 13.8 Mpps in
> the single direction test at the maximum CPU frequency, a similar result
> can be found in [11]. Reducing the CPU frequency increased the throughput
> to the expected value of 14.88 Mpps. Our investigation of this anomaly
> revealed that the lack of any processing combined with the fast CPU caused
> DPDK to poll the NIC too often. DPDK does not use interrupts, it utilizes a
> busy wait loop that polls the NIC until at least one packet is returned.
> This resulted in a high poll rate which affected the throughput. We limited
> the poll rate to 500,000 poll operations per second (i.e., a batch size of
> about 30 packets) and achieved line rate in the unidirectional test with
> all frequencies. This effect was only observed with the X520 NIC, tests
> with X540 NICs did not show this anomaly.”
>
> Another reference, from this mailing list last year (
> http://wiki.dpdk.org/ml/archives/dev/2014-January/001169.html):
>
> "I suggest you to check average burst sizes on receive queues. Looks like
> I stumbled upon a similar issue several times. If you are calling
> rte_eth_rx_burst too frequently, NIC begins losing packets no matter how
> many CPU horse power you have (more you have, more it loses, actually). In
> my case this situation occured when average burst size is less than 20
> packets or so. I'm not sure what's the reason for this behavior, but I
> observed it on several applications on Intel 82599 10Gb cards.”
>
> So I’m wondering if anyone can explain at a lower level what happens when
> you poll “too often”, and if there are any practical workarounds. We’re
> using this same program and DPDK version to process 10G line-rate in other
> scenarios, so I’m confident that the overall packet capture architecture is
> sound.
>
> -Aaron
* Re: [dpdk-dev] Polling too often at lower packet rates?
2015-04-08 18:00 ` Stephen Hemminger
@ 2015-04-09 18:26 ` Aaron Campbell
2015-04-09 21:24 ` Stephen Hemminger
0 siblings, 1 reply; 7+ messages in thread
From: Aaron Campbell @ 2015-04-09 18:26 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Hi Stephen,
Thanks, that was an informative talk. In this case, are you referring to your comments about the thermal budget?
That’s definitely interesting, but there must be more to it than that. Again, if I loop over all 6 ports (i.e., continue to keep the CPU busy), it works around the problem.
I agree that adaptive polling makes sense and will look into it, but I'll still take any further ideas on what is going on here.
-Aaron
> On Apr 8, 2015, at 3:00 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> We use an adaptive polling loop similar to the l3fwd-power example.
> See:
>
>
> http://video.fosdem.org/2015/devroom-network_management_and_sdn/
>
> On Wed, Apr 8, 2015 at 9:35 AM, Aaron Campbell <aaron@arbor.net> wrote:
> Hi,
>
> I have a machine with 6 DPDK ports (4 igb, 2 ixgbe), with 1.23Mpps traffic offered to only one of the 10G ports (the other 5 are unused). I also have a program with a pretty standard looking DPDK receive loop, where it calls rte_eth_rx_burst() for each configured port. If I configure the loop to read from all 6 ports, it can read the 1.23Mpps rate with no drops. If I configure the loop to poll only 1 port (the ixgbe receiving the traffic), I lose about 1/3rd of the packets (i.e., the NIC drops ~400Kpps).
>
> Another data point is that if I configure the loop to read from 3 out of the 6 ports, the drop rate is reduced to less than half (i.e., the NIC is only dropping ~190Kpps now). So it seems that in this test, throughput improves by adding NICs, not removing them, which is counter-intuitive. Again, I get no drops when polling all 6 ports. Note, the burst size is 32.
>
> I did find a reference to a similar issue in a recent paper (http://www.net.in.tum.de/fileadmin/bibtex/publications/papers/ICN2015.pdf), Section III, which states:
>
> "The DPDK L2FWD application initially only managed to forward 13.8 Mpps in the single direction test at the maximum CPU frequency, a similar result can be found in [11]. Reducing the CPU frequency increased the throughput to the expected value of 14.88 Mpps. Our investigation of this anomaly revealed that the lack of any processing combined with the fast CPU caused DPDK to poll the NIC too often. DPDK does not use interrupts, it utilizes a busy wait loop that polls the NIC until at least one packet is returned. This resulted in a high poll rate which affected the throughput. We limited the poll rate to 500,000 poll operations per second (i.e., a batch size of about 30 packets) and achieved line rate in the unidirectional test with all frequencies. This effect was only observed with the X520 NIC, tests with X540 NICs did not show this anomaly.”
>
> Another reference, from this mailing list last year (http://wiki.dpdk.org/ml/archives/dev/2014-January/001169.html):
>
> "I suggest you to check average burst sizes on receive queues. Looks like I stumbled upon a similar issue several times. If you are calling rte_eth_rx_burst too frequently, NIC begins losing packets no matter how many CPU horse power you have (more you have, more it loses, actually). In my case this situation occured when average burst size is less than 20 packets or so. I'm not sure what's the reason for this behavior, but I observed it on several applications on Intel 82599 10Gb cards.”
>
> So I’m wondering if anyone can explain at a lower level what happens when you poll “too often”, and if there are any practical workarounds. We’re using this same program and DPDK version to process 10G line-rate in other scenarios, so I’m confident that the overall packet capture architecture is sound.
>
> -Aaron
>
* Re: [dpdk-dev] Polling too often at lower packet rates?
2015-04-09 18:26 ` Aaron Campbell
@ 2015-04-09 21:24 ` Stephen Hemminger
2015-04-10 0:42 ` Paul Emmerich
0 siblings, 1 reply; 7+ messages in thread
From: Stephen Hemminger @ 2015-04-09 21:24 UTC (permalink / raw)
To: Aaron Campbell; +Cc: dev
On Thu, 9 Apr 2015 15:26:23 -0300
Aaron Campbell <aaron@arbor.net> wrote:
> Hi Stephen,
>
> Thanks, that was an informative talk. In this case, are you referring to your comments about the thermal budget?
>
> That’s definitely interesting, but there must be more to it than that. Again, if I loop over all 6 ports (i.e., continue to keep the CPU busy), it works around the problem.
>
> I agree that adaptive polling makes sense and will look into it. But will still take any further ideas on what is going on here.
>
> -Aaron
Your excess polling consumes PCI bandwidth, which is a fixed resource.
* Re: [dpdk-dev] Polling too often at lower packet rates?
2015-04-09 21:24 ` Stephen Hemminger
@ 2015-04-10 0:42 ` Paul Emmerich
2015-04-10 1:05 ` Paul Emmerich
0 siblings, 1 reply; 7+ messages in thread
From: Paul Emmerich @ 2015-04-10 0:42 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Stephen Hemminger <stephen@networkplumber.org> wrote:
> Your excess polling consumes PCI bandwidth which is a fixed resource.
I doubt that this is the problem for three reasons:
* The poll rate would regulate itself if the PCIe bus was the bottleneck
* This problem only occurs with 82599 chips, not with X540 chips
(which are virtually identical except for 10GBase-T vs. fiber)
* This effect already appears with relatively low poll rates (batch size 1).
The overhead would have to be > 128 bytes per poll to saturate the PCIe bus
at this rate. (Even if this was the case, my first point still applies)
I unfortunately don't have a test system that pairs a CPU supporting PCIe
performance counters with an 82599 card. Otherwise I'd measure this, since I
think this effect is really interesting (I'm a co-author of the paper
linked above).
Paul
* Re: [dpdk-dev] Polling too often at lower packet rates?
2015-04-10 0:42 ` Paul Emmerich
@ 2015-04-10 1:05 ` Paul Emmerich
0 siblings, 0 replies; 7+ messages in thread
From: Paul Emmerich @ 2015-04-10 1:05 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
Paul Emmerich <emmericp@net.in.tum.de> wrote:
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> Your excess polling consumes PCI bandwidth which is a fixed resource.
>
> I doubt that this is the problem for three reasons:
>
A 4th reason: polling should not cause any PCIe access, as all the required information is written
to host memory by the NIC [1].
I only noticed that after I tried to measure the PCIe bandwidth caused by polling an X540
NIC... the result was 0 MBit/s ;)
Paul
[1] 82599 and X540 data sheets, Table 1-9
* Re: [dpdk-dev] Polling too often at lower packet rates?
2015-04-08 16:35 [dpdk-dev] Polling too often at lower packet rates? Aaron Campbell
2015-04-08 18:00 ` Stephen Hemminger
@ 2015-04-08 19:11 ` Ananyev, Konstantin
1 sibling, 0 replies; 7+ messages in thread
From: Ananyev, Konstantin @ 2015-04-08 19:11 UTC (permalink / raw)
To: Aaron Campbell, dev
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Aaron Campbell
> Sent: Wednesday, April 08, 2015 5:36 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] Polling too often at lower packet rates?
>
> Hi,
>
> I have a machine with 6 DPDK ports (4 igb, 2 ixgbe), with 1.23Mpps traffic offered to only one of the 10G ports (the other 5 are
> unused). I also have a program with a pretty standard looking DPDK receive loop, where it calls rte_eth_rx_burst() for each
> configured port. If I configure the loop to read from all 6 ports, it can read the 1.23Mpps rate with no drops. If I configure the loop to
> poll only 1 port (the ixgbe receiving the traffic), I lose about 1/3rd of the packets (i.e., the NIC drops ~400Kpps).
It seems a bit strange to see packet drops at such a low rate.
I tried it with the latest DPDK over 1 ixgbe port (X520-4), on a 2.8 GHz IVB CPU:
./dpdk.org/x86_64-native-linuxapp-gcc/app/testpmd -c ff -n 4 --socket-mem=1024,0 -w 0000:04:00.1 -- -i --burst=32
testpmd> start
Tried with 1/1.5/2/3 Mpps - no packet loss.
Which port do you forward your packets to?
What do the stats tell?
Is the number of RX packets equal to the number of TX packets?
Konstantin
>
> Another data point is that if I configure the loop to read from 3 out of the 6 ports, the drop rate is reduced to less than half (i.e., the
> NIC is only dropping ~190Kpps now). So it seems that in this test, throughput improves by adding NICs, not removing them, which is
> counter-intuitive. Again, I get no drops when polling all 6 ports. Note, the burst size is 32.
>
> I did find a reference to a similar issue in a recent paper
> (http://www.net.in.tum.de/fileadmin/bibtex/publications/papers/ICN2015.pdf), Section III, which states:
>
> "The DPDK L2FWD application initially only managed to forward 13.8 Mpps in the single direction test at the maximum CPU frequency,
> a similar result can be found in [11]. Reducing the CPU frequency increased the throughput to the expected value of 14.88 Mpps. Our
> investigation of this anomaly revealed that the lack of any processing combined with the fast CPU caused DPDK to poll the NIC too
> often. DPDK does not use interrupts, it utilizes a busy wait loop that polls the NIC until at least one packet is returned. This resulted in
> a high poll rate which affected the throughput. We limited the poll rate to 500,000 poll operations per second (i.e., a batch size of
> about 30 packets) and achieved line rate in the unidirectional test with all frequencies. This effect was only observed with the X520
> NIC, tests with X540 NICs did not show this anomaly.”
>
> Another reference, from this mailing list last year (http://wiki.dpdk.org/ml/archives/dev/2014-January/001169.html):
>
> "I suggest you to check average burst sizes on receive queues. Looks like I stumbled upon a similar issue several times. If you are
> calling rte_eth_rx_burst too frequently, NIC begins losing packets no matter how many CPU horse power you have (more you have,
> more it loses, actually). In my case this situation occured when average burst size is less than 20 packets or so. I'm not sure what's the
> reason for this behavior, but I observed it on several applications on Intel 82599 10Gb cards.”
>
> So I’m wondering if anyone can explain at a lower level what happens when you poll “too often”, and if there are any practical
> workarounds. We’re using this same program and DPDK version to process 10G line-rate in other scenarios, so I’m confident that the
> overall packet capture architecture is sound.
>
> -Aaron
Thread overview: 7+ messages
2015-04-08 16:35 [dpdk-dev] Polling too often at lower packet rates? Aaron Campbell
2015-04-08 18:00 ` Stephen Hemminger
2015-04-09 18:26 ` Aaron Campbell
2015-04-09 21:24 ` Stephen Hemminger
2015-04-10 0:42 ` Paul Emmerich
2015-04-10 1:05 ` Paul Emmerich
2015-04-08 19:11 ` Ananyev, Konstantin