* [dpdk-users] Performance of rte_eth_stats_get
From: Filip Janiszewski @ 2021-05-19 13:10 UTC
To: users

Hi,

Is it safe to call rte_eth_stats_get while capturing from the port?

I'm mostly concerned about performance: whether rte_eth_stats_get will in any way impact the port's performance. In my application I plan to call the function from a thread that is not directly involved in the capture (a separate worker is responsible for rx bursting), but I wonder whether the NIC might get upset if I call it too frequently (say 10 times per second) and cause performance issues.

The question is really NIC-agnostic, but if the NIC vendor is relevant, I'm running Intel 700 series NICs and Mellanox ConnectX-4/5.

Thanks

--
BR, Filip
+48 666 369 823

* Re: [dpdk-users] Performance of rte_eth_stats_get
From: Van Haaren, Harry @ 2021-05-19 15:14 UTC
To: Filip Janiszewski, users

Filip Janiszewski wrote:
> Is it safe to call rte_eth_stats_get while capturing from the port?
> [...]

To understand what really goes on when getting stats, it helps to list the steps involved in reading statistics from the NIC HW:

1) The CPU sends an MMIO (Memory Mapped I/O) read, sometimes referred to as a "PCI read", to the NIC.
2) The PCI bus has to handle extra TLPs (Transaction Layer Packets, i.e. PCI transactions) to satisfy the read.
3) The NIC has to send a reply based on accessing its internal counters.
4) The CPU gets the result of the PCI read.

Notice how elegantly this whole process is abstracted from SW? In code, reading a stat value is just dereferencing a pointer that is mapped to the NIC HW address. In practice, from a CPU performance point of view, an MMIO read is one of the slowest things you can do.

You say the stats reads occur on a thread that is not handling rx/datapath, so perhaps the CPU cycle cost itself isn't a concern.

Do note, however, that reading a full set of extended stats from the NIC can take tens to hundreds of MMIO reads (depending on the statistics requested and on how the PMD itself implements stats updates).

The PCI bus does become busier with reads to the NIC HW when doing lots of statistics updates, so some extra contention/activity is to be expected there. The PCM tool can be very useful for seeing MMIO traffic; you could measure how many extra PCI transactions occur due to reading stats every X ms:
https://github.com/opcm/pcm

I recommend measuring packet latency/jitter as a histogram, since that makes performance outliers visible. To find out whether the outliers are due to stats reads, compare against a "no stats reads" latency/jitter histogram and see the impact graphically. In the end, if it doesn't affect packet latency/jitter, then it has no impact, right?

Ultimately, I can't give a generic answer. The best approach is to measure carefully and find out!

> Thanks

Hope the above helps and doesn't add confusion :)

Regards, -Harry

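The histogram comparison suggested above could be sketched roughly as follows. This is only an illustration in plain C, not DPDK code: the names `lat_hist`, `lat_hist_add` and `lat_hist_percentile_bound` are hypothetical, the power-of-two bucketing is an assumption chosen for simplicity, and how you obtain per-packet latency samples (in nanoseconds) is left open. The idea is to fill one histogram with stats polling enabled and one with it disabled, then compare the tails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bucket i counts samples in [2^i, 2^(i+1)) ns; bucket 0 also takes 0 ns.
 * Samples too large for the table are clamped into the last bucket. */
#define LAT_BUCKETS 32

struct lat_hist {
    uint64_t bucket[LAT_BUCKETS];
    uint64_t count;
    uint64_t max_ns;
};

/* Position of the highest set bit of ns, clamped to the last bucket. */
static unsigned lat_bucket_index(uint64_t ns)
{
    unsigned idx = 0;
    while (ns > 1 && idx < LAT_BUCKETS - 1) {
        ns >>= 1;
        idx++;
    }
    return idx;
}

/* Record one latency sample. */
static void lat_hist_add(struct lat_hist *h, uint64_t ns)
{
    assert(h != NULL);
    h->bucket[lat_bucket_index(ns)]++;
    h->count++;
    if (ns > h->max_ns)
        h->max_ns = ns;
}

/* Upper bound (ns) of the first bucket at which at least pct percent of
 * all samples have been seen. Coarse by design: resolution is one bucket. */
static uint64_t lat_hist_percentile_bound(const struct lat_hist *h, unsigned pct)
{
    assert(h != NULL && pct <= 100);
    if (h->count == 0)
        return 0;
    uint64_t target = (h->count * pct + 99) / 100; /* round up */
    uint64_t seen = 0;
    for (unsigned i = 0; i < LAT_BUCKETS; i++) {
        seen += h->bucket[i];
        if (seen >= target)
            return (uint64_t)1 << (i + 1);
    }
    return (uint64_t)1 << LAT_BUCKETS;
}

/* Print the non-empty buckets so two runs can be compared side by side. */
static void lat_hist_print(const char *label, const struct lat_hist *h)
{
    printf("%s: %llu samples, max %llu ns\n", label,
           (unsigned long long)h->count, (unsigned long long)h->max_ns);
    for (unsigned i = 0; i < LAT_BUCKETS; i++)
        if (h->bucket[i] != 0)
            printf("  < %llu ns: %llu\n",
                   (unsigned long long)((uint64_t)1 << (i + 1)),
                   (unsigned long long)h->bucket[i]);
}
```

Comparing a high percentile bound and `max_ns` between the "stats on" and "stats off" runs is one way to spot outliers that an average would hide.
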
* Re: [dpdk-users] Performance of rte_eth_stats_get
From: Stephen Hemminger @ 2021-05-19 16:06 UTC
To: Van Haaren, Harry; +Cc: Filip Janiszewski, users

On Wed, 19 May 2021 15:14:38 +0000
"Van Haaren, Harry" <harry.van.haaren@intel.com> wrote:

> [...]
> Do note however, that when reading a full set of extended stats from the NIC, there
> could be many 10's to 100's of MMIO reads (depending on the statistics requested,
> and how the PMD itself is implemented to handle stats updates).
> [...]

Many drivers have to go through a firmware mailbox transaction to read stats, and that transaction needs a spin wait on the shared area.

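To make the mailbox point concrete, here is a purely illustrative sketch of a caller spinning for ownership of a shared mailbox and then spinning for the reply. Nothing here is taken from a real driver: `fw_mailbox`, `mbox_acquire`, `mbox_wait_reply` and the spin budget are all hypothetical, and real PMDs define their own layouts and timeouts. It only shows why a stats call can stall its caller for an unpredictable time.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox shared between driver threads and "firmware". */
struct fw_mailbox {
    atomic_flag busy;   /* set while one caller owns the mailbox */
    atomic_bool done;   /* set by the firmware side when reply is valid */
    uint64_t reply;     /* reply payload, valid only while done is set */
};

/* Spin until the mailbox is free, giving up after max_spins attempts. */
static bool mbox_acquire(struct fw_mailbox *m, uint32_t max_spins)
{
    assert(m != NULL);
    for (uint32_t i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&m->busy, memory_order_acquire))
            return true;
    }
    return false;
}

/* Hand the mailbox back so other callers can use it. */
static void mbox_release(struct fw_mailbox *m)
{
    assert(m != NULL);
    atomic_flag_clear_explicit(&m->busy, memory_order_release);
}

/* Spin until the firmware side flags completion, then copy the reply out.
 * Returns false if no reply arrived within max_spins polls. */
static bool mbox_wait_reply(struct fw_mailbox *m, uint32_t max_spins,
                            uint64_t *out)
{
    assert(m != NULL && out != NULL);
    for (uint32_t i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&m->done, memory_order_acquire)) {
            *out = m->reply;
            atomic_store_explicit(&m->done, false, memory_order_release);
            return true;
        }
    }
    return false;
}
```

Both loops burn CPU cycles on the calling core, which is one reason to keep such calls off the datapath thread where the driver works this way.
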
* Re: [dpdk-users] Performance of rte_eth_stats_get
From: Alireza Sanaee @ 2021-07-14 10:25 UTC
To: Stephen Hemminger, Van Haaren, Harry; +Cc: Filip Janiszewski, users

On 19/05/2021 17:06, Stephen Hemminger wrote:
> [...]
> Many drivers require transactions with the firmware via mailbox.
> And that transaction needs a spin wait for the shared area.

Thank you for explaining the steps so clearly.

I noticed this problem too. Calling `rte_eth_stats_get` in the PMDport once per batch almost halves the throughput in a 10G setup, IIRC; the cost is prohibitively high. This, however, doesn't show up when DPDK connects to a vhost-pmdport, probably because all of the port statistics live somewhere in shared memory.

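One way to avoid paying the stats cost on every batch is to gate the call behind a cheap time check so it runs only a few times per second. The sketch below is an assumption about how that could be structured, not something taken from DPDK or BESS: `stats_poller` and its functions are hypothetical names, and the timestamps are plain integers so the logic can be shown without DPDK headers. In a DPDK application the `now` and `hz` values would presumably come from `rte_get_timer_cycles()` and `rte_get_timer_hz()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper that tracks when the next stats poll is due. */
struct stats_poller {
    uint64_t interval_cycles; /* timer cycles between two polls */
    uint64_t next_due;        /* timer value at which to poll next */
};

/* Set up polling at polls_per_sec, given the timer frequency hz. */
static void stats_poller_init(struct stats_poller *p, uint64_t now,
                              uint64_t hz, unsigned polls_per_sec)
{
    assert(p != NULL && hz > 0 && polls_per_sec > 0);
    p->interval_cycles = hz / polls_per_sec;
    p->next_due = now + p->interval_cycles;
}

/* Returns true at most once per interval. The check is a subtract and a
 * compare, so it is cheap enough to run on every loop iteration. */
static bool stats_poller_due(struct stats_poller *p, uint64_t now)
{
    assert(p != NULL);
    if ((int64_t)(now - p->next_due) < 0)
        return false;
    p->next_due = now + p->interval_cycles;
    return true;
}

/*
 * Intended use inside a polling loop (not compiled here, shown as an
 * assumption about the surrounding application):
 *
 *     if (stats_poller_due(&poller, rte_get_timer_cycles()))
 *         rte_eth_stats_get(port_id, &stats);
 */
```

With `polls_per_sec` set to 10 this matches the rate mentioned at the start of the thread; whether that rate is harmless on a given NIC still has to be measured.
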