* [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
From: Mohammad El-Shabani @ 2016-03-29 1:45 UTC
To: dev

Hi,
While looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count is implemented as a scan of the rx descriptors, which is very expensive. I am wondering why it's implemented the way it is. Could it not just read the head location from the driver?

Thanks!
Mohammad El-Shabani

* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
From: Lu, Wenzhuo @ 2016-03-29 2:22 UTC
To: Mohammad El-Shabani, dev

Hi Mohammad,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Mohammad El-Shabani
> Sent: Tuesday, March 29, 2016 9:45 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] librte_pmd_ixgbe implementation of
> ixgbe_dev_rx_queue_count
>
> Hi,
> While looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count is
> implemented as a scan of the rx descriptors, which is very expensive. I am
> wondering why it's implemented the way it is. Could it not just read the head
> location from the driver?

I'm not sure about the history, but in my opinion it's a control plane op, not a data plane op. Maybe the reason is that the author didn't care too much about performance. As you have a good idea, would you like to share it with us? Thanks in advance :)

>
> Thanks!
> Mohammad El-Shabani

* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
From: Bruce Richardson @ 2016-03-29 9:31 UTC
To: Mohammad El-Shabani
Cc: dev

On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> Hi,
> While looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> is implemented as a scan of the rx descriptors, which is very
> expensive. I am wondering why it's implemented the way it is. Could it not
> just read the head location from the driver?
>
> Thanks!
> Mohammad El-Shabani

It's likely that reading the head location from the driver will be even slower than scanning the descriptor rings in memory. Access to PCI is very much slower than accessing memory, especially since on platforms with DDIO, many memory accesses will actually be cache reads.

That being said, I haven't actually written a test to prove this out, so feel free to try out the head pointer read method instead and see if it improves things. The results may vary depending on how far ahead the scan needs to go, but certainly for the empty ring case, the descriptor scan method will be far faster than a head read.

Regards,
/Bruce

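For readers following the thread, here is a minimal sketch of the descriptor-scan idea discussed above. It is a user-space model only, not the ixgbe driver code: `rx_desc`, `rx_queue`, `RING_SIZE`, `DESC_STAT_DD`, `rxq_count_by_scan` and `sim_nic_receive` are all hypothetical names chosen for illustration. The point it tries to show is that the scan touches only host memory and stops at the first descriptor that is not marked done, so an empty ring costs a single memory read. The head-register alternative would instead need a read across PCI, which cannot be modelled meaningfully here.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE    16u   /* hypothetical ring size, must be a power of two */
#define DESC_STAT_DD 0x1u  /* hypothetical "descriptor done" status bit */

/* Simplified receive descriptor: only the status word the NIC writes back. */
struct rx_desc {
    uint32_t status;
};

/* Simplified receive queue: the ring plus the software tail index. */
struct rx_queue {
    struct rx_desc ring[RING_SIZE];
    uint32_t tail;  /* next descriptor software will process */
};

/*
 * Count completed descriptors by walking the ring in host memory,
 * starting at the software tail and stopping at the first descriptor
 * whose done bit is clear. No device register is read.
 */
static uint32_t rxq_count_by_scan(const struct rx_queue *q)
{
    uint32_t count = 0;

    assert(q->tail < RING_SIZE);
    while (count < RING_SIZE &&
           (q->ring[(q->tail + count) & (RING_SIZE - 1)].status & DESC_STAT_DD))
        count++;
    return count;
}

/* Illustration helper: mimic the NIC completing n descriptors from the tail. */
static void sim_nic_receive(struct rx_queue *q, uint32_t n)
{
    for (uint32_t i = 0; i < n && i < RING_SIZE; i++)
        q->ring[(q->tail + i) & (RING_SIZE - 1)].status |= DESC_STAT_DD;
}
```

With a zeroed queue the scan returns 0 after one read. After `sim_nic_receive(&q, 5)` it returns 5, including when the completed descriptors wrap around the end of the ring. The cost grows with the number of completed descriptors, which matches the remark that results may vary with how far ahead the scan has to go.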
* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
From: Stephen Hemminger @ 2016-03-29 16:54 UTC
To: Bruce Richardson
Cc: Mohammad El-Shabani, dev

On Tue, 29 Mar 2016 10:31:19 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> > Hi,
> > While looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> > is implemented as a scan of the rx descriptors, which is very
> > expensive. I am wondering why it's implemented the way it is. Could it not
> > just read the head location from the driver?
> >
> > Thanks!
> > Mohammad El-Shabani
>
> It's likely that reading the head location from the driver will be even slower
> than scanning the descriptor rings in memory. Access to PCI is very much slower
> than accessing memory, especially since on platforms with DDIO, many memory
> accesses will actually be cache reads.
>
> That being said, I haven't actually written a test to prove this out, so feel
> free to try out the head pointer read method instead and see if it improves
> things. The results may vary depending on how far ahead the scan needs to go,
> but certainly for the empty ring case, the descriptor scan method will be far
> faster than a head read.
>
> Regards,
> /Bruce

Also, the most common use case is "are there any more packets ready before I go to sleep on epoll", and the descriptor done API tells more than is needed.

* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
From: Bruce Richardson @ 2016-03-30 14:23 UTC
To: Stephen Hemminger
Cc: Mohammad El-Shabani, dev

On Tue, Mar 29, 2016 at 09:54:18AM -0700, Stephen Hemminger wrote:
> On Tue, 29 Mar 2016 10:31:19 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
>
> > On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> > > Hi,
> > > While looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> > > is implemented as a scan of the rx descriptors, which is very
> > > expensive. I am wondering why it's implemented the way it is. Could it not
> > > just read the head location from the driver?
> > >
> > > Thanks!
> > > Mohammad El-Shabani
> >
> > It's likely that reading the head location from the driver will be even slower
> > than scanning the descriptor rings in memory. Access to PCI is very much slower
> > than accessing memory, especially since on platforms with DDIO, many memory
> > accesses will actually be cache reads.
> >
> > That being said, I haven't actually written a test to prove this out, so feel
> > free to try out the head pointer read method instead and see if it improves
> > things. The results may vary depending on how far ahead the scan needs to go,
> > but certainly for the empty ring case, the descriptor scan method will be far
> > faster than a head read.
> >
> > Regards,
> > /Bruce
>
> Also, the most common use case is "are there any more packets ready before
> I go to sleep on epoll", and the descriptor done API tells more than
> is needed.

Yes, it's not designed for that case. For the are-there-any-more-packets query, the rx_burst API is the one to call. :-) The rx_queue_count API is for the case where you are under load and need to see beyond the max count returned by rx_burst before you process the burst of packets.

/Bruce

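As a rough illustration of the division of labour described in the last message, the sketch below models the two calls with stand-ins. Nothing here is the DPDK API: `sim_rxq`, `sim_rx_burst`, `sim_rx_queue_count`, `sim_poll_once`, `BURST_MAX` and the backlog limit are hypothetical, and the queue is reduced to a counter of waiting packets. It assumes an application that wants to know, before processing a burst, whether more is queued than one burst can return.

```c
#include <assert.h>
#include <stdint.h>

#define BURST_MAX 32u  /* hypothetical per-call burst limit */

/* Hypothetical stand-in for a receive queue: just a count of waiting packets. */
struct sim_rxq {
    uint32_t pending;
};

/* Stand-in for an rx_burst-style call: dequeue at most nb_pkts packets. */
static uint16_t sim_rx_burst(struct sim_rxq *q, uint16_t nb_pkts)
{
    uint16_t n = (q->pending < nb_pkts) ? (uint16_t)q->pending : nb_pkts;

    q->pending -= n;
    return n;
}

/* Stand-in for an rx_queue_count-style call: report backlog, dequeue nothing. */
static uint32_t sim_rx_queue_count(const struct sim_rxq *q)
{
    return q->pending;
}

/*
 * One polling step. The count is consulted first to see beyond the burst
 * limit; the burst itself answers "is anything ready": a return value of 0
 * means nothing was waiting and the caller may go to sleep.
 */
static uint16_t sim_poll_once(struct sim_rxq *q, uint32_t backlog_limit,
                              int *overloaded)
{
    assert(q != 0 && overloaded != 0);
    *overloaded = sim_rx_queue_count(q) > backlog_limit;
    return sim_rx_burst(q, BURST_MAX);
}
```

In this model a caller that only wants to know whether it can sleep never needs the count: the burst return value is enough. The count is useful only for the load decision, for example to shed work or switch strategy when the backlog exceeds the chosen limit.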