DPDK usage discussions
* [dpdk-users] vmxnet3 tx queue fills then never empties
From: Paul Atkins @ 2016-10-07 14:29 UTC
  To: users

Hi,

I am having an issue with the vmxnet3 driver that I have investigated in
some depth (details follow), and I have reached the stage where I need
help from people with access to the vmxnet3 virtual NIC code.

The issue is with the DPDK vmxnet3 driver: a tx queue fills up and then
never empties. The trigger is sending 1500-byte packets out of the
interface at ~60 kpps and then marking the tx interface as 'not
connected' in VMware. At that point the tx queues fill up, and when the
interface is marked as 'connected' again in VMware, some of the tx
queues are left in a state where they never send any further packets.
In my setup, which has 4 tx queues with the traffic shared equally over
them, I typically see this on 1 of the 4 queues when the interface comes
back up.

If we don't use DPDK and send the traffic via the Linux kernel instead,
the problem is not seen, which would suggest a bug in the DPDK driver.
However, if I modify the Linux kernel driver to call
vmxnet3_tq_tx_complete() inline in the tx path (in the same way that it
is done for DPDK), then we start to see the bug with the kernel too.
This suggests a timing issue around when the tx_complete function is
called: when it was called from the interrupt handler, the issue was not
seen. Further, adding debug output at the start of the tx_complete
function (before any work is done) made the issue disappear, again
pointing to a timing race.
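To make the inline-completion path concrete, here is a minimal C sketch
of a tx queue whose completions are reaped at the start of every
transmit call, as the DPDK driver is described as doing above. It is a
hypothetical, heavily simplified model, not the vmxnet3 driver or
device: RING_SIZE, struct txq, xmit(), tx_complete(), cmd_avail() and
device_run() are illustrative names, one descriptor per packet is
assumed, and the "device" runs in the same thread, so the model cannot
reproduce the timing race itself.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u /* assumption: tiny ring so the example is easy to trace */

/*
 * Hypothetical, heavily simplified model of a vmxnet3-style tx queue:
 * one command descriptor per packet, names borrowed from the terms in
 * this mail (next2fill, next2comp, gen bits). Not the real driver code.
 */
struct txq {
    /* command ring: written by the driver, read by the device */
    uint32_t next2fill, next2comp;
    uint8_t cmd_gen;                  /* gen bit the driver writes */
    uint8_t cmd_desc_gen[RING_SIZE];
    /* completion ring: written by the device, read by the driver */
    uint32_t comp_next;
    uint8_t comp_gen;                 /* gen bit the driver expects */
    uint8_t comp_desc_gen[RING_SIZE];
    uint32_t comp_eop[RING_SIZE];     /* cmd index each completion refers to */
    /* simulated device state */
    uint32_t dev_next, dev_comp_next;
    uint8_t dev_cmd_gen, dev_comp_gen;
};

static void txq_init(struct txq *q)
{
    *q = (struct txq){0};
    /* all descriptors start with gen 0, so both sides start expecting gen 1 */
    q->cmd_gen = q->comp_gen = q->dev_cmd_gen = q->dev_comp_gen = 1;
}

/* Free command descriptors; one slot stays empty to tell full from empty. */
static uint32_t cmd_avail(const struct txq *q)
{
    return (q->next2comp > q->next2fill ? 0 : RING_SIZE) +
           q->next2comp - q->next2fill - 1;
}

/* Reap every completion whose gen bit matches the one the driver expects. */
static void tx_complete(struct txq *q)
{
    while (q->comp_desc_gen[q->comp_next] == q->comp_gen) {
        q->next2comp = (q->comp_eop[q->comp_next] + 1) % RING_SIZE;
        if (++q->comp_next == RING_SIZE) {
            q->comp_next = 0;
            q->comp_gen ^= 1; /* wrapped: the valid gen flips */
        }
    }
}

/* Queue one packet, reaping completions inline first (no interrupt handler).
 * Returns 1 if the packet was queued, 0 if the ring is full. */
static int xmit(struct txq *q)
{
    tx_complete(q);
    if (cmd_avail(q) == 0)
        return 0;
    q->cmd_desc_gen[q->next2fill] = q->cmd_gen;
    if (++q->next2fill == RING_SIZE) {
        q->next2fill = 0;
        q->cmd_gen ^= 1;
    }
    return 1;
}

/* Simulated device: consume up to n filled descriptors and post completions. */
static void device_run(struct txq *q, unsigned n)
{
    while (n-- > 0 && q->cmd_desc_gen[q->dev_next] == q->dev_cmd_gen) {
        q->comp_eop[q->dev_comp_next] = q->dev_next;
        q->comp_desc_gen[q->dev_comp_next] = q->dev_comp_gen;
        if (++q->dev_next == RING_SIZE) {
            q->dev_next = 0;
            q->dev_cmd_gen ^= 1;
        }
        if (++q->dev_comp_next == RING_SIZE) {
            q->dev_comp_next = 0;
            q->dev_comp_gen ^= 1;
        }
    }
}
```

With the simulated device stalled (link 'not connected'), xmit() accepts
RING_SIZE - 1 packets and then returns 0; once device_run() posts
completions, the next xmit() reaps them inline and queues again. Because
the model is single-threaded it always recovers; it only illustrates the
ring bookkeeping, not the race.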

I then added debug code that records the indices of next2fill and
next2comp in the cmd ring, their gen bits, and the gen bits of the 3
indices above and below each. For the data ring, I recorded the
comp_ring index, its gen bit, and the gen bits of the 3 indices above
and below it. These values were stored in binary form (no string
conversion until later) each time we entered and exited the tx_complete
function, and each time we exited the xmit function. Once the issue was
seen, the records were formatted, and the values all looked correct in
both the working and the non-working case.
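The tracing approach described above (raw snapshots on entry to and exit
from tx_complete and on exit from xmit, formatted only after the hang is
seen) could look roughly like the following C sketch. Everything here is
assumed for illustration: struct ring_view, struct trace_rec,
trace_record(), trace_dump(), the buffer size and the 16-bit index
fields are hypothetical and not taken from either driver.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TRACE_LEN 4096u /* assumption: power of two, so head % TRACE_LEN survives wrap */
#define NEIGH 3u        /* gen bits captured either side of an index */
#define WIN (2u * NEIGH + 1u)

enum trace_point { TP_COMPLETE_ENTER, TP_COMPLETE_EXIT, TP_XMIT_EXIT };

/* Hypothetical read-only view of one tx queue's ring state. */
struct ring_view {
    const uint8_t *cmd_gen_bits;  /* per-descriptor gen bits, cmd ring */
    const uint8_t *comp_gen_bits; /* per-descriptor gen bits, completion ring */
    uint32_t size;                /* descriptors per ring (assumed equal, >= WIN) */
    uint32_t next2fill, next2comp, comp_next;
    uint8_t cmd_ring_gen, comp_ring_gen;
};

/* One raw snapshot: fixed-size integers only, no string formatting.
 * Assumes ring indices fit in 16 bits. */
struct trace_rec {
    uint8_t point;
    uint8_t cmd_ring_gen, comp_ring_gen;
    uint16_t next2fill, next2comp, comp_next;
    uint8_t fill_win[WIN];  /* cmd gen bits at next2fill-3 .. next2fill+3 */
    uint8_t comp_win[WIN];  /* cmd gen bits at next2comp-3 .. next2comp+3 */
    uint8_t cring_win[WIN]; /* comp gen bits at comp_next-3 .. comp_next+3 */
};

static struct trace_rec trace_buf[TRACE_LEN];
static uint32_t trace_head; /* total records written so far */

/* Copy the gen bits around idx, wrapping at the ring boundary. */
static void gen_window(const uint8_t *gen, uint32_t size, uint32_t idx,
                       uint8_t out[WIN])
{
    for (uint32_t k = 0; k < WIN; k++)
        out[k] = gen[(idx + size - NEIGH + k) % size];
}

/* A handful of stores: cheap enough for the tx hot path. */
static void trace_record(enum trace_point p, const struct ring_view *v)
{
    struct trace_rec *r = &trace_buf[trace_head++ % TRACE_LEN];

    r->point = (uint8_t)p;
    r->cmd_ring_gen = v->cmd_ring_gen;
    r->comp_ring_gen = v->comp_ring_gen;
    r->next2fill = (uint16_t)v->next2fill;
    r->next2comp = (uint16_t)v->next2comp;
    r->comp_next = (uint16_t)v->comp_next;
    gen_window(v->cmd_gen_bits, v->size, v->next2fill, r->fill_win);
    gen_window(v->cmd_gen_bits, v->size, v->next2comp, r->comp_win);
    gen_window(v->comp_gen_bits, v->size, v->comp_next, r->cring_win);
}

static void put_win(FILE *out, const char *tag, const uint8_t win[WIN])
{
    fprintf(out, " %s=", tag);
    for (uint32_t k = 0; k < WIN; k++)
        fputc('0' + win[k], out); /* gen bits are 0 or 1 */
}

/* Format the retained records, oldest first, only after the hang is seen. */
static void trace_dump(FILE *out)
{
    uint32_t n = trace_head < TRACE_LEN ? trace_head : TRACE_LEN;

    for (uint32_t i = trace_head - n; i != trace_head; i++) {
        const struct trace_rec *r = &trace_buf[i % TRACE_LEN];
        fprintf(out, "%u pt=%u fill=%u comp=%u gen=%u cnext=%u cgen=%u",
                (unsigned)i, (unsigned)r->point, (unsigned)r->next2fill,
                (unsigned)r->next2comp, (unsigned)r->cmd_ring_gen,
                (unsigned)r->comp_next, (unsigned)r->comp_ring_gen);
        put_win(out, "fw", r->fill_win);
        put_win(out, "cw", r->comp_win);
        put_win(out, "rw", r->cring_win);
        fputc('\n', out);
    }
}
```

Keeping the recording path to a few stores matters here, since even a
debug print at the start of the tx_complete function was enough to make
the problem disappear.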

I suspect that the cause is some quirk in the way the driver code
interacts with the virtual NIC, but I have no access to the code for the
virtual NIC, so I am struggling to make any progress identifying the
root cause.

Is this a known issue, and do you have any suggestions as to how best to 
proceed with this?

I have seen this with the following versions:

ESXi 5.5 and later (VM version 10)
ESXi 5.0 and later (VM version 8)
Linux 4.4 kernel
DPDK 2.2

thanks,
Paul
