DPDK usage discussions
* [dpdk-users] Performance of rte_eth_stats_get
@ 2021-05-19 13:10 Filip Janiszewski
  2021-05-19 15:14 ` Van Haaren, Harry
  0 siblings, 1 reply; 4+ messages in thread
From: Filip Janiszewski @ 2021-05-19 13:10 UTC (permalink / raw)
  To: users

Hi,

Is it safe to call rte_eth_stats_get while capturing from the port?

I'm mostly concerned about performance: will rte_eth_stats_get impact the
port's performance in any way? In the application I plan to call the function
from a thread that is not directly involved in the capture (another worker is
responsible for rx bursting), but I wonder whether the NIC might get upset if
I call it too frequently (say 10 times per second) and cause performance
issues.

The question is really NIC-agnostic, but if the vendor is relevant, I'm
running Intel 700 series and Mellanox ConnectX-4/5 NICs.

Thanks

-- 
BR, Filip
+48 666 369 823

* Re: [dpdk-users] Performance of rte_eth_stats_get
  2021-05-19 13:10 [dpdk-users] Performance of rte_eth_stats_get Filip Janiszewski
@ 2021-05-19 15:14 ` Van Haaren, Harry
  2021-05-19 16:06   ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Van Haaren, Harry @ 2021-05-19 15:14 UTC (permalink / raw)
  To: Filip Janiszewski, users

To understand what really goes on when getting stats, it might help to list the
steps involved in getting statistics from the NIC HW.

1) The CPU issues an MMIO (Memory Mapped I/O) read, sometimes referred to
as a "PCI read", to the NIC.
2) The PCI bus has to handle extra TLPs (PCI transactions) to satisfy the read.
3) The NIC has to send a reply based on accessing its internal counters.
4) The CPU gets the result of the PCI read.

Notice how elegantly this whole process is abstracted from SW? In code, reading
a stat value is just dereferencing a pointer that is mapped to the NIC HW address.
In practice, from a CPU performance point of view, an MMIO read is one of the
slowest things you can do, because the core waits for the reply to come back
over PCI. You say the stats reads occur on a thread that is not handling the
rx/datapath, so perhaps the CPU cycle cost itself isn't a concern.
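
To make the "just dereferencing a pointer" point concrete, here is a minimal
sketch in C. The helper name `reg_read32` and the offset
`HYPOTHETICAL_RX_PKTS_OFFSET` are made up for illustration and do not describe
any real NIC register layout. In a real PMD the base address would come from
the mapped PCI BAR; the sketch works on ordinary memory so it can run anywhere.

```c
#include <stdint.h>

/* Hypothetical register offset, for illustration only (not a real NIC layout). */
#define HYPOTHETICAL_RX_PKTS_OFFSET 0x10u

/*
 * Read a 32-bit device register. The volatile access makes the compiler emit
 * exactly one load. On real hardware that load turns into a PCI read and the
 * core waits until the reply arrives.
 */
static inline uint32_t reg_read32(const volatile void *bar_base, uint32_t offset)
{
    const volatile uint8_t *p = (const volatile uint8_t *)bar_base + offset;

    return *(const volatile uint32_t *)p;
}
```

The source line looks like a normal memory load, which is why the cost is easy
to overlook when reading driver code.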

Do note, however, that reading a full set of extended stats from the NIC could
take tens to hundreds of MMIO reads (depending on the statistics requested
and on how the PMD itself implements stats updates).
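
As a rough way to reason about that cost, the sketch below multiplies the
number of reads per stats call by the call rate and by a per-read latency.
The function name is hypothetical, and any latency you plug in is an
assumption to be replaced by a measurement on your own system. It also only
models the direct register-read case.

```c
#include <stdint.h>

/*
 * Estimated nanoseconds per second that the polling core spends waiting on
 * MMIO reads. All three inputs are guesses until measured.
 */
static uint64_t stats_read_ns_per_sec(uint64_t reads_per_call,
                                      uint64_t calls_per_sec,
                                      uint64_t ns_per_read)
{
    return reads_per_call * calls_per_sec * ns_per_read;
}
```

For example, 100 reads per call at 10 calls per second with an assumed
1000 ns per read gives 1,000,000 ns, i.e. about 1 ms of every second on the
polling core. This says nothing about contention on the PCI bus, which still
has to be measured.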

The PCI bus does become busier with reads to the NIC HW when doing lots of
statistics updates, so some extra contention/activity is to be expected there.
The PCM tool can be very useful for seeing MMIO traffic: you could measure how
many extra PCI transactions occur due to reading stats every X ms.
https://github.com/opcm/pcm

I recommend measuring packet latency/jitter as a histogram, since outliers in
performance can then be identified. If you specifically want to know whether
they are due to stats reads, compare against a "no stats reads" latency/jitter
histogram and see the impact graphically.
In the end, if it doesn't affect packet latency/jitter, then it has no impact, right?
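
One possible shape for such a histogram is sketched below: a power-of-two
bucket histogram that is cheap to update per packet. All names
(`latency_hist`, `hist_add`, and so on) are hypothetical, and how you obtain
the per-packet latency in nanoseconds is left to your application. Fill one
histogram with stats reads enabled and one without, then compare the tails.

```c
#include <stdint.h>
#include <string.h>

#define HIST_BUCKETS 32

/* Bucket i counts samples in [2^i, 2^(i+1)) ns; bucket 0 also holds 0 ns.
 * The last bucket collects everything at or above 2^31 ns. */
struct latency_hist {
    uint64_t count[HIST_BUCKETS];
    uint64_t total;
};

static void hist_init(struct latency_hist *h)
{
    memset(h, 0, sizeof(*h));
}

/* Index of the highest set bit of ns, clamped to the last bucket. */
static unsigned hist_bucket(uint64_t ns)
{
    unsigned b = 0;

    while (ns > 1 && b < HIST_BUCKETS - 1) {
        ns >>= 1;
        b++;
    }
    return b;
}

static void hist_add(struct latency_hist *h, uint64_t ns)
{
    h->count[hist_bucket(ns)]++;
    h->total++;
}

/* Exclusive upper bound (ns) of the bucket containing the given percentile,
 * with 0 < pct <= 100. Returns 0 for an empty histogram. */
static uint64_t hist_percentile_bound(const struct latency_hist *h, unsigned pct)
{
    uint64_t target = (h->total * pct + 99) / 100; /* round up */
    uint64_t seen = 0;

    for (unsigned i = 0; i < HIST_BUCKETS; i++) {
        seen += h->count[i];
        if (target > 0 && seen >= target)
            return (uint64_t)1 << (i + 1);
    }
    return 0;
}
```

Comparing a high percentile (say the 99.9th) between the two runs shows
outliers that an average would hide.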

Ultimately, I can't give a generic answer - best steps are to measure carefully and find out!

Hope the above helps and doesn't add confusion :)  Regards, -Harry

* Re: [dpdk-users] Performance of rte_eth_stats_get
  2021-05-19 15:14 ` Van Haaren, Harry
@ 2021-05-19 16:06   ` Stephen Hemminger
  2021-07-14 10:25     ` Alireza Sanaee
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2021-05-19 16:06 UTC (permalink / raw)
  To: Van Haaren, Harry; +Cc: Filip Janiszewski, users

Many drivers require a transaction with the firmware via a mailbox to read
stats, and that transaction needs a spin wait on the shared area.
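
To show what such a spin wait can look like, here is a generic sketch using
C11 atomics. The `mailbox` struct and `mailbox_wait` are hypothetical; real
mailbox layouts and protocols are specific to each driver and firmware, and
this is not taken from any particular PMD.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared mailbox area between driver and firmware. */
struct mailbox {
    atomic_bool done;  /* set by the other side when the reply is ready */
    uint64_t reply;    /* reply payload, valid once done is true */
};

/*
 * Poll the mailbox until the reply is ready or max_spins polls have passed.
 * Returns true and stores the reply on success, false on timeout.
 * The calling core does no other work while it spins.
 */
static bool mailbox_wait(struct mailbox *mb, uint64_t max_spins, uint64_t *out)
{
    for (uint64_t i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&mb->done, memory_order_acquire)) {
            *out = mb->reply;
            return true;
        }
    }
    return false;
}
```

The point is that the caller is blocked for as long as the firmware takes to
answer, so the cost of a stats call depends on the firmware as well as on the
bus.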

* Re: [dpdk-users] Performance of rte_eth_stats_get
  2021-05-19 16:06   ` Stephen Hemminger
@ 2021-07-14 10:25     ` Alireza Sanaee
  0 siblings, 0 replies; 4+ messages in thread
From: Alireza Sanaee @ 2021-07-14 10:25 UTC (permalink / raw)
  To: Stephen Hemminger, Van Haaren, Harry; +Cc: Filip Janiszewski, users

On 19/05/2021 17:06, Stephen Hemminger wrote:
> Many drivers require transactions with the firmware via mailbox.
> And that transaction needs a spin wait for the shared area.
> 

Thank you for explaining the steps so clearly. I noticed this problem too.
IIRC, calling `rte_eth_stats_get` in the PMDport once per batch almost halves
the throughput in a 10G setup; the cost is prohibitively high. This, however,
doesn't show up with a vhost-pmdport, probably because all of the port
statistics live somewhere in shared memory.
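
One way to avoid paying that cost on every batch is to gate the stats read
behind a time check, as in the sketch below. The `stats_throttle` names are
hypothetical. In a DPDK application `now` could come from
`rte_get_timer_cycles()` and the interval from `rte_get_timer_hz() / 10` for
roughly 10 reads per second, with `rte_eth_stats_get()` called only when the
check returns true; that wiring is an assumption and is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decides when a stats read is due, so it is not issued on every rx burst. */
struct stats_throttle {
    uint64_t interval;  /* minimum ticks between stats reads */
    uint64_t next_due;  /* tick at which the next read is allowed */
};

static void throttle_init(struct stats_throttle *t, uint64_t interval, uint64_t now)
{
    t->interval = interval;
    t->next_due = now + interval;
}

/* Returns true at most once per interval; cheap enough to call per batch. */
static bool throttle_due(struct stats_throttle *t, uint64_t now)
{
    if (now < t->next_due)
        return false;
    t->next_due = now + t->interval;
    return true;
}
```

The per-batch cost then shrinks to one compare, and the expensive read happens
only a few times per second.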
