DPDK usage discussions
* [dpdk-users] Interrupt mode, queues, own event loop
@ 2020-09-04 10:24 Budiský Jakub
  2020-09-04 16:18 ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Budiský Jakub @ 2020-09-04 10:24 UTC (permalink / raw)
  To: users

Hi,

I'm working on a project that involves receiving packet bursts; other
than that, it's mostly idle. DPDK was incorporated later on (after I
found out that Linux AF_XDP wouldn't do the job), and I use my own C++
implementation of an epoll-based event loop, along with eventfd and
timerfd for communication and timeouts.

So I'm trying to use per-queue interrupts in my own event loop with
DPDK. Per-queue operation is quite important since I'm using the Flow
Director for load balancing and I'm relying on it. In DPDK 18.11 (I
believe) a new function `rte_eth_dev_rx_intr_ctl_q_get_fd` was
introduced just for this purpose.

I'm currently using the `uio_pci_generic` driver with an Intel 82599ES
NIC for debugging. For production I will switch to `vfio`, since the
application runs in userspace.

I've encountered two problems. The first is that I expected DPDK to
hand me eventfd file descriptors. While debugging I found out that
these are, in fact, `/dev/uio0` files (I guess these are special files
created by the driver). I don't mind them "being different", but this
raises a few other issues: Is it safe to read them, i.e. does the
`ixgbe` PMD rely on them in any way? Is there a way of discriminating
between the different types of file descriptors I may obtain, other
than looking at `/proc/self/fd/<fd_number>`? From the implementation
of `eal_intr_proc_rxtx_intr` it looks like the file descriptors will
differ for the `vfio` driver, and I need to read a different amount of
data from them (4 bytes for UIO vs. 8 bytes for VFIO; other sizes may
raise EINVAL).

The second problem is that I got the same file descriptor for all the
queues, which means it may not be captured by epoll in all relevant
threads. Is this behaviour intended? I recall seeing some limits on
the number of interrupt file descriptors, but I believe it was 15 for
my NIC. I don't mind, but I need to change the program's logic to
account for this. Can I read the file descriptor and find out which
queues need to process incoming packets, or do I just wake them all
up? Does this differ (and if so, how) between the `vfio` and
`uio_pci_generic` drivers?

I feel like I may have missed something; reading
`linux/eal_interrupts.c`, it indeed looks like some eventfd
descriptors are set up, but maybe this matters only if you use the
DPDK-encapsulated event loop. Please let me know if I should call
anything besides `rte_eth_dev_rx_intr_ctl_q_get_fd` and the usual
device configuration functions.

Thanks for any help!

Best regards,
Jakub Budisky


* Re: [dpdk-users] Interrupt mode, queues, own event loop
  2020-09-04 10:24 [dpdk-users] Interrupt mode, queues, own event loop Budiský Jakub
@ 2020-09-04 16:18 ` Stephen Hemminger
  2020-09-05 21:21   ` Budiský Jakub
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2020-09-04 16:18 UTC (permalink / raw)
  To: Budiský Jakub; +Cc: users

On Fri, 04 Sep 2020 12:24:06 +0200
Budiský Jakub <ibudisky@fit.vutbr.cz> wrote:

> Hi,
> 
> I'm working on a project that involves packet bursts receiving; other 
> than that it's mostly idle. The DPDK was incorporated later on (when I 
> found out that Linux AF_XDP won't do the job) and I use my own C++ 
> implementation of an epoll-based event loop along with eventfd and 
> timerfd for communication / timeouts.
> 
> So I'm trying to use per-queue interrupts in my own event loop with 
> DPDK. Per-queue is quite important since I'm using the flow director for 
> load balancing and I'm relying on it. In the DPDK 18.11 (I believe) a 
> new function `rte_eth_dev_rx_intr_ctl_q_get_fd` was introduced just for 
> this purpose.
> 
> I'm currently using `uio_pci_generic` driver with Intel's 82599ES NIC 
> for debugging. For production I will switch to `vfio` due to the 
> application running in the userspace.
> 
> I've encountered two problems; the first being that I've expected the 
> DPDK to pass me eventfd file descriptors. While debugging I found out 
> that these are, in fact, /dev/uio0 files (I guess these are special 
> files created by the driver). I don't mind them "being different", but 
> this raises a few other issues: Is it safe to read them, i.e. does the 
> `ixgbe_pmd` driver rely on them in any way? Is there a way of 
> discriminating between a different types of file descriptor I may obtain 
> except looking at `/proc/self/fd/<fd_number>`? From the implementation 
> of `eal_intr_proc_rxtx_intr` it looks like the file descriptors will 
> differ for the `vfio` driver and I need to read a different amount of 
> data from them (4 Bytes for UIO vs. 8 Bytes for VFIO respectively, other 
> sizes may rise EINVAL).
> 
> The second problem is that I've got the same file descriptor for all the 
> queues, which means it may not be captured by the epoll in all relevant 
> threads. Is this behaviour intended? I recall seeing some limits 
> regarding the number of interrupt file descriptors but I believe it was 
> 15 for my NIC. I don't mind but I need to change the program's logic to 
> account for this. Can I read the file descriptor and find out which 
> queues do need to process incoming packets, or do I just wake them all 
> up? Does this differ (and if, how) between the `vfio` and 
> `uio_pci_generic` drivers?
> 
> I feel like I may have missed something, reading the 
> `linux/eal_interrupts.c` it indeed looks like some eventfd descriptors 
> are set up, but maybe this matters only if you use DPDK-encapsulated 
> event loop. Please let me know if I should call anything besides 
> `rte_eth_dev_rx_intr_ctl_q_get_fd` and the usual device configuration 
> functions.


The per-queue interrupt functionality for PCI devices is built
on top of MSI-X interrupts. The uio_pci_generic driver you are using
does not support MSI-X.

The UIO driver uses the legacy INTx functionality: when an interrupt
occurs, the device driver in the kernel is called. For the
uio_pci_generic driver, this is mapped to the device file descriptor.

For VFIO, you can have one interrupt per queue, and it uses eventfds
to create a per-queue channel.


* Re: [dpdk-users] Interrupt mode, queues, own event loop
  2020-09-04 16:18 ` Stephen Hemminger
@ 2020-09-05 21:21   ` Budiský Jakub
  2020-09-08 15:09     ` Budiský Jakub
  0 siblings, 1 reply; 4+ messages in thread
From: Budiský Jakub @ 2020-09-05 21:21 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: users

On 2020-09-04 18:18, Stephen Hemminger wrote:
> The per-queue interrupt functionality for PCI devices is built
> on top of MSI-X interrupts. The uio_pci_generic driver you are using
> does not support MSI-X.
> 
> The way UIO driver works is to use the legacy INTx functionality,
> and when an interrupt occurs the device driver in the kernel is called.
> For the uio_pci_generic driver this is mapped to the device file 
> descriptor.
> 
> For VFIO, you can have one interrupt per queue and it uses eventfd's
> to create a per-queue channel.

Hi,

thanks for the valuable info!

I've now switched to the `vfio` module even for testing, and I can
confirm I get a set of separate eventfd file descriptors. I've
encountered a new issue, though, that looks like a bug to me.

Either one of the file descriptors (`--vfio-intr msix`, always the one
associated with the last queue, regardless of the initialization
order) or all of them (`--vfio-intr msi`) are available for reading
just once per application run. I cannot get any follow-up interrupts,
even though I can confirm by polling that new packets have arrived.

With `--vfio-intr legacy` I get the same file descriptor for all my
workers, but it is also only triggered once.

As far as I understand, there is no flag to clear for the MSI(-X)
interrupts, so I'm not sure what else to try. There is nothing of
interest in the application output (not even with `--log-level
lib.eal:debug`).

Thanks again.

Best regards,
Jakub Budisky


* Re: [dpdk-users] Interrupt mode, queues, own event loop
  2020-09-05 21:21   ` Budiský Jakub
@ 2020-09-08 15:09     ` Budiský Jakub
  0 siblings, 0 replies; 4+ messages in thread
From: Budiský Jakub @ 2020-09-08 15:09 UTC (permalink / raw)
  To: users

On 2020-09-05 23:21, Budiský Jakub wrote:
> Hi,
> 
> thanks for the valuable info!
> 
> I've now switched to the `vfio` module even for testing and I can
> confirm I get a set of separate eventfd file descriptors. I've
> encountered a new issue though that appears like a bug to me.
> 
> Either one of the file descriptors (`--vfio-intr msix`, always the one
> associated with the last queue, regardless of the initialization
> order) or all of them (`--vfio-intr msi`) are available for reading
> just once per application run. I cannot get any followup interrupts
> and I can confirm by polling that there are new packets that have
> arrived.
> 
> With `--vfio-intr legacy` I get a same file descriptor for all my
> workers but it is also only triggered once.
> 
> As far as I understand there is no clear flag for the MSI(-X)
> interrupts and so I'm not sure what else to try. There is nothing of
> interest in the application output (not even with `--log-level
> lib.eal:debug`).
> 
> Thanks again.
> 
> Best regards,
> Jakub Budisky

Hi,

Just letting you know that I've resolved the aforementioned issues,
and it seems to be working with the Intel NIC. In the process of
debugging I also tried a 100 Gb NIC from Mellanox, with no luck so far
(I can't get an epoll event from their "infinibandevent" (?) file
descriptor, but I didn't invest too much time into investigating it
further). On the bright side, trying a different NIC pointed me
towards some of the issues, so I'm glad for that.

For the record, here are the resolutions (they may help somebody in
the future):
– Only the last queue was interrupted because my Flow Director setup
was flawed and didn't match the incoming packets properly, so that one
was unrelated. Needless to say, the (un)reported errors from the
`rte_flow` API were not too helpful.
– The MSI interrupts cannot distinguish between queues, similarly to
the legacy interrupt mechanism; I confirmed this in the DPDK source
code. That explains the difference in behaviour. So if you want
per-queue interrupts, you are pretty much stuck with `vfio` + MSI-X.
– And most importantly: apparently the DPDK RX interrupts were
designed in such a way that an explicit switch back to polling mode is
necessary (at least this seems to be the case for the Intel NIC /
ixgbe PMD I'm using). You won't get another interrupt unless you
re-enable them. More specifically, my code now calls
`rte_eth_dev_rx_intr_disable` immediately after receiving the event
from `epoll`, followed by a read from the file descriptor, and
`rte_eth_dev_rx_intr_enable` after I'm done with the packet burst
retrieval. I hope I won't miss an interrupt this way due to a race
condition (a packet arriving between the processing and the
re-enabling of the interrupt).
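For anyone landing here later, the disable/drain/enable cycle, plus
one extra poll to narrow the race window mentioned above, can be
sketched roughly like this (untested; it assumes a port already set up
with RX interrupts via `rte_eth_dev_rx_intr_ctl_q`, and the actual
packet processing is elided):

```c
/* Sketch only: the DPDK calls are real, but error handling and
 * processing are left out; "port", "queue" and "fd" come from the
 * usual setup and rte_eth_dev_rx_intr_ctl_q_get_fd(). */
#include <stdint.h>
#include <unistd.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

static void on_rx_event(uint16_t port, uint16_t queue, int fd)
{
	struct rte_mbuf *pkts[BURST];
	uint64_t event;
	uint16_t n, i;

	/* Switch back to polling first, then drain the descriptor
	 * (8 bytes for a VFIO eventfd; a UIO fd would take 4). */
	rte_eth_dev_rx_intr_disable(port, queue);
	(void)read(fd, &event, sizeof(event));

	/* Poll until the queue runs dry. */
	while ((n = rte_eth_rx_burst(port, queue, pkts, BURST)) > 0) {
		for (i = 0; i < n; i++) {
			/* ... process ... */
			rte_pktmbuf_free(pkts[i]);
		}
	}

	/* Re-arm the interrupt, then poll one last time: a packet that
	 * arrived between the final empty burst and the re-enable would
	 * otherwise sit in the ring without raising a new interrupt. */
	rte_eth_dev_rx_intr_enable(port, queue);
	if ((n = rte_eth_rx_burst(port, queue, pkts, BURST)) > 0) {
		for (i = 0; i < n; i++)
			rte_pktmbuf_free(pkts[i]);  /* ... process ... */
	}
}
```

Whether that final burst fully closes the window likely depends on the
PMD; looping back to the disable step whenever it returns packets is
the conservative choice.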

Thanks again for the help I've got.

Best regards,
Jakub Budisky

