DPDK patches and discussions
* [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
@ 2016-03-29  1:45 Mohammad El-Shabani
  2016-03-29  2:22 ` Lu, Wenzhuo
  2016-03-29  9:31 ` Bruce Richardson
  0 siblings, 2 replies; 5+ messages in thread
From: Mohammad El-Shabani @ 2016-03-29  1:45 UTC
  To: dev

Hi,
While looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
is implemented as a scan of the rx descriptors, which is very expensive. I am
wondering why it's implemented the way it is. Could it not just read the head
location from the driver?

Thanks!
Mohammad El-Shabani


* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
  2016-03-29  1:45 [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count Mohammad El-Shabani
@ 2016-03-29  2:22 ` Lu, Wenzhuo
  2016-03-29  9:31 ` Bruce Richardson
  1 sibling, 0 replies; 5+ messages in thread
From: Lu, Wenzhuo @ 2016-03-29  2:22 UTC
  To: Mohammad El-Shabani, dev

Hi Mohammad,


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Mohammad El-Shabani
> Sent: Tuesday, March 29, 2016 9:45 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] librte_pmd_ixgbe implementation of
> ixgbe_dev_rx_queue_count
> 
> Hi,
> Looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count is
> implemented a scan of elements of rx descriptors, which is very expensive. I am
> wondering why its implemented the way it is. Could it not just read the head
> location from the driver?
I'm not sure about the history, but in my opinion it's a control-plane op, not a data-plane op. Maybe the reason is that the author didn't care too much about its performance.
As you have a good idea, would you like to share it with us? Thanks in advance :)

> 
> Thanks!
> Mohammad El-Shabani


* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
  2016-03-29  1:45 [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count Mohammad El-Shabani
  2016-03-29  2:22 ` Lu, Wenzhuo
@ 2016-03-29  9:31 ` Bruce Richardson
  2016-03-29 16:54   ` Stephen Hemminger
  1 sibling, 1 reply; 5+ messages in thread
From: Bruce Richardson @ 2016-03-29  9:31 UTC
  To: Mohammad El-Shabani; +Cc: dev

On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> Hi,
> Looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> is implemented a scan of elements of rx descriptors, which is very
> expensive. I am wondering why its implemented the way it is. Could it not
> just read the head location from the driver?
> 
> Thanks!
> Mohammad El-Shabani

It's likely that reading the head location from the driver will be even slower
than scanning the descriptor rings in memory. Access to PCI is very much slower
than accessing memory - especially since on platforms with DDIO, many memory
accesses will actually be cache reads.

That being said, I haven't actually written a test to prove this out, so feel
free to try out the head pointer read method instead and see if it improves
things. The results may vary depending on how far ahead needs to be scanned,
but certainly for the empty ring case, the descriptor scan method will be far
faster than a head read.
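
To make the comparison concrete, here is a minimal, self-contained model of the
descriptor-scan approach. It is only a sketch under stated assumptions: the
struct layout, the names (model_rx_desc, model_rxq, MODEL_STAT_DD,
MODEL_SCAN_STEP) and the stride of 4 are illustrative and are not the actual
ixgbe definitions. The point it shows is that an empty ring costs a single read
of host memory, with no PCI register access at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_STAT_DD   0x01u /* assumed "descriptor done" bit, set on write-back */
#define MODEL_SCAN_STEP 4u    /* assumed stride: only every 4th descriptor is checked */

/* Hypothetical, stripped-down RX descriptor: only the status word matters here. */
struct model_rx_desc {
    uint32_t status;
};

/* Hypothetical RX queue state kept by the driver in host memory. */
struct model_rxq {
    struct model_rx_desc *ring; /* descriptor ring the NIC writes back into */
    uint16_t nb_desc;           /* ring size */
    uint16_t tail;              /* next descriptor software will process */
};

/*
 * Estimate how many descriptors are ready by starting at the tail and
 * stepping MODEL_SCAN_STEP entries at a time until one without the DD
 * bit is found. The result is a multiple of MODEL_SCAN_STEP, so it can
 * overestimate the true count by up to MODEL_SCAN_STEP - 1.
 */
static uint32_t model_rx_queue_count(const struct model_rxq *q)
{
    uint32_t count = 0;

    assert(q != NULL && q->nb_desc > 0); /* guards the modulo below */

    while (count < q->nb_desc) {
        /* Wrap around the end of the ring. */
        uint32_t idx = ((uint32_t)q->tail + count) % q->nb_desc;

        if (!(q->ring[idx].status & MODEL_STAT_DD))
            break; /* first not-done descriptor ends the scan */
        count += MODEL_SCAN_STEP;
    }
    return count;
}
```

In this model the cost of the scan grows with the number of ready descriptors,
which is why the results may differ between a nearly empty ring and a heavily
backlogged one.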

Regards,
/Bruce


* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
  2016-03-29  9:31 ` Bruce Richardson
@ 2016-03-29 16:54   ` Stephen Hemminger
  2016-03-30 14:23     ` Bruce Richardson
  0 siblings, 1 reply; 5+ messages in thread
From: Stephen Hemminger @ 2016-03-29 16:54 UTC
  To: Bruce Richardson; +Cc: Mohammad El-Shabani, dev

On Tue, 29 Mar 2016 10:31:19 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> > Hi,
> > Looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> > is implemented a scan of elements of rx descriptors, which is very
> > expensive. I am wondering why its implemented the way it is. Could it not
> > just read the head location from the driver?
> > 
> > Thanks!
> > Mohammad El-Shabani
> 
> It's likely that reading the head location from the driver will be even slower
> than scanning the descriptor rings in memory. Access to PCI is very much slower
> than accessing memory - especially since on platforms with DDIO, many memory
> accesses will actually be cache reads.
> 
> That being said, I haven't actually written a test to prove this out, so feel
> free to try out the head pointer read method instead and see if it improves
> things. The results may vary depending on how far ahead needs to be scanned,
> but certainly for the empty ring case, the descriptor scan method will be far
> faster than a head read.
> 
> Regards,
> /Bruce

Also, the most common use case is "are there any more packets ready before
I go to sleep on epoll", and the descriptor done API tells more than
is needed.


* Re: [dpdk-dev] librte_pmd_ixgbe implementation of ixgbe_dev_rx_queue_count
  2016-03-29 16:54   ` Stephen Hemminger
@ 2016-03-30 14:23     ` Bruce Richardson
  0 siblings, 0 replies; 5+ messages in thread
From: Bruce Richardson @ 2016-03-30 14:23 UTC
  To: Stephen Hemminger; +Cc: Mohammad El-Shabani, dev

On Tue, Mar 29, 2016 at 09:54:18AM -0700, Stephen Hemminger wrote:
> On Tue, 29 Mar 2016 10:31:19 +0100
> Bruce Richardson <bruce.richardson@intel.com> wrote:
> 
> > On Mon, Mar 28, 2016 at 06:45:26PM -0700, Mohammad El-Shabani wrote:
> > > Hi,
> > > Looking into why it hurts performance, I see that ixgbe_dev_rx_queue_count
> > > is implemented a scan of elements of rx descriptors, which is very
> > > expensive. I am wondering why its implemented the way it is. Could it not
> > > just read the head location from the driver?
> > > 
> > > Thanks!
> > > Mohammad El-Shabani
> > 
> > It's likely that reading the head location from the driver will be even slower
> > than scanning the descriptor rings in memory. Access to PCI is very much slower
> > than accessing memory - especially since on platforms with DDIO, many memory
> > accesses will actually be cache reads.
> > 
> > That being said, I haven't actually written a test to prove this out, so feel
> > free to try out the head pointer read method instead and see if it improves
> > things. The results may vary depending on how far ahead needs to be scanned,
> > but certainly for the empty ring case, the descriptor scan method will be far
> > faster than a head read.
> > 
> > Regards,
> > /Bruce
> 
> Also the most common use case is "is there any more packets ready before
> I go to sleep on epoll", and the descriptor done API tells more than
> is needed.

Yes, it's not designed for that case. For the are-there-any-more-packets query,
the rx_burst API is the one to call. :-)
The rx_queue_count API is for the case where you are under load and need to see
beyond the max count returned by rx_burst before you process the burst of packets.
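
The usage pattern described above could be sketched roughly as follows. This is
a self-contained model, not DPDK code: sim_rx_burst and sim_rx_queue_count are
hypothetical stand-ins for the rx_burst and rx_queue_count APIs, and the burst
size of 32 is an assumption. It shows the queue-count query being made only
when a burst comes back full, i.e. when more packets may be waiting than one
burst can return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_MAX_BURST 32u /* assumed maximum packets returned by one burst */

/* Hypothetical stand-in for an RX queue: just a count of waiting packets. */
struct sim_queue {
    uint32_t ready;
};

/* Stand-in for rx_burst: take up to max packets and return how many. */
static uint32_t sim_rx_burst(struct sim_queue *q, uint32_t max)
{
    uint32_t n = q->ready < max ? q->ready : max;

    q->ready -= n;
    return n;
}

/* Stand-in for rx_queue_count: packets still waiting on the ring. */
static uint32_t sim_rx_queue_count(const struct sim_queue *q)
{
    return q->ready;
}

struct sim_poll_result {
    uint32_t received; /* packets obtained from this burst */
    uint32_t backlog;  /* packets still queued behind a full burst */
};

/*
 * Poll once. The burst return value alone answers "are there any packets?".
 * Only a full burst suggests more are waiting, so only then is the more
 * expensive queue-count query made, before the burst is processed.
 */
static struct sim_poll_result sim_poll(struct sim_queue *q)
{
    struct sim_poll_result r = { 0, 0 };

    assert(q != NULL);

    r.received = sim_rx_burst(q, SIM_MAX_BURST);
    if (r.received == SIM_MAX_BURST)
        r.backlog = sim_rx_queue_count(q);
    return r;
}
```

A caller under load could use the backlog value to decide, for example, whether
to shed work before processing the packets it has just received.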

/Bruce

