* [dpdk-dev] KNI performance
@ 2015-06-05 15:06 Jay Rolette
2015-06-05 15:13 ` Marc Sune
From: Jay Rolette @ 2015-06-05 15:06 UTC
To: DPDK
The past few days I've been trying to chase down why operations over KNI
are so bloody slow. To give you an idea how bad it is, we did a simple test
over an NFS mount:
# Mount over a non-KNI interface (eth0 on vanilla Ubuntu 14.04 LTS)
$ time $(ls -last -R /mnt/sfs2008 > /dev/null)
real 11m58.224s
user 0m10.758s
sys 0m25.050s
# Reboot to make sure NFS cache is cleared and mount over a KNI interface
$ time $(ls -last -R /mnt/sfs2008 > /dev/null)
real 87m36.295s
user 0m14.552s
sys 0m25.949s
Packet captures showed a pretty consistent ~4ms delay. Get a READDIRPLUS
reply from NFS server and the TCP stack on the DPDK/KNI system took about
4ms to ACK the reply. It isn't just on ACK packets either. If there was no
ACK required, there would be a 4ms delay before the next call was sent
(ACCESS, LOOKUP, another READDIR, etc.).
This is running on top of a real DPDK app, so there are various queues and
ring-buffers in the path between KNI and the wire, so I started there. Long
story short, worst case, those could only inject ~120us of latency into the
path.
Next stop was KNI itself. Ignoring a few minor optimizations I found,
nothing in the code looked like it could account for 4ms of latency. That
wasn't quite right though...
Here's the code for the processing loop in kni_thread_single():
    while (!kthread_should_stop()) {
        down_read(&kni_list_lock);
        for (j = 0; j < KNI_RX_LOOP_NUM; j++) {
            list_for_each_entry(dev, &kni_list_head, list) {
#ifdef RTE_KNI_VHOST
                kni_chk_vhost_rx(dev);
#else
                kni_net_rx(dev);
#endif
                kni_net_poll_resp(dev);
            }
        }
        up_read(&kni_list_lock);
        /* reschedule out for a while */
        schedule_timeout_interruptible(usecs_to_jiffies( \
                KNI_KTHREAD_RESCHEDULE_INTERVAL));
    }
Turns out the 4ms delay is due to the schedule_timeout_interruptible()
call. The code specifies a 5us sleep, but usecs_to_jiffies() rounds that up
to one jiffy, and the call only guarantees a sleep of *at least* the time
specified.
The resolution of the sleep is controlled by the timer interrupt rate. If
you are using a kernel from one of the usual Linux distros, HZ = 250 on
x86. That works out nicely to a 4ms period. The KNI kernel thread was going
to sleep and frequently not getting woken up for nearly 4ms.
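The arithmetic behind that 4ms figure can be sketched in plain userspace C. This is a simplified model, not the kernel's implementation: model_usecs_to_jiffies(), tick_period_us() and max_sleep_us() are hypothetical helper names, and the model assumes only that the microsecond request is rounded up to whole timer ticks and that the thread is woken no later than the end of the last requested tick.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical model: round a microsecond request up to whole ticks,
 * so any non-zero request costs at least one tick. */
static uint64_t model_usecs_to_jiffies(uint64_t usecs, uint64_t hz)
{
    assert(hz > 0 && hz <= USEC_PER_SEC);
    return (usecs * hz + USEC_PER_SEC - 1) / USEC_PER_SEC;
}

/* Length of one timer tick in microseconds for a given HZ. */
static uint64_t tick_period_us(uint64_t hz)
{
    assert(hz > 0 && hz <= USEC_PER_SEC);
    return USEC_PER_SEC / hz;
}

/* Upper bound on the sleep in this model: requested ticks times tick length. */
static uint64_t max_sleep_us(uint64_t usecs, uint64_t hz)
{
    return model_usecs_to_jiffies(usecs, hz) * tick_period_us(hz);
}
```

With the numbers from this thread, max_sleep_us(5, 250) gives 4000 and max_sleep_us(5, 1000) gives 1000, so under this model the 5us request can cost up to 4ms at HZ=250 and up to 1ms at HZ=1000.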
We rebuilt the kernel with HZ = 1000 and things improved considerably:
# Mount over a KNI interface, HZ=1000
$ time $(ls -last -R /mnt/sfs2008 > /dev/null)
real 21m8.478s
user 0m13.824s
sys 0m18.113s
Still not where I'd like to get it, but much, much better.
Hopefully my pain is your gain and this helps other KNI users.
Jay
* Re: [dpdk-dev] KNI performance
2015-06-05 15:06 [dpdk-dev] KNI performance Jay Rolette
@ 2015-06-05 15:13 ` Marc Sune
2015-06-05 15:24 ` Jay Rolette
From: Marc Sune @ 2015-06-05 15:13 UTC
To: dev
On 05/06/15 17:06, Jay Rolette wrote:
> Hopefully my pain is your gain and this helps other KNI users.
Jay,
If you set CONFIG_RTE_KNI_PREEMPT_DEFAULT to 'n' you should see reduced
latency and delay, since there is no preemption (though it sacrifices
1 CPU for the KNI kmod):
http://patchwork.dpdk.org/dev/patchwork/patch/3304/
However, KNI is still pretty slow. Even considering that there will
always be at least 1 copy involved, I still think it is too slow. I haven't
had time to look closer yet.
Marc
* Re: [dpdk-dev] KNI performance
2015-06-05 15:13 ` Marc Sune
@ 2015-06-05 15:24 ` Jay Rolette
From: Jay Rolette @ 2015-06-05 15:24 UTC
To: Marc Sune; +Cc: DPDK
On Fri, Jun 5, 2015 at 10:13 AM, Marc Sune <marc.sune@bisdn.de> wrote:
> Jay,
>
> If you set CONFIG_RTE_KNI_PREEMPT_DEFAULT to 'n' you should see reduced
> latency and delay, since there is no preemption (though it sacrifices 1 CPU
> for the KNI kmod):
>
> http://patchwork.dpdk.org/dev/patchwork/patch/3304/
>
> However, KNI is still pretty slow. Even considering that there will always
> be at least 1 copy involved, I still think it is too slow. I haven't had
> time to look closer yet.
>
> Marc
>
Hi Marc,
Thanks for the pointer to the patch. I did something similar as a test
before we started mucking with rebuilding the kernel. Skipping the call to
put the KNI kernel thread to sleep improved performance and reduced
latency, but oddly enough, it wasn't as fast for the end-app as the HZ=1000
change.
Here's what I got on that test:
# Mount over "no-sleep" KNI
$ time $(ls -last -R /mnt/sfs2008 > /dev/null)
real 37m49.004s
user 0m23.274s
sys 0m9.010s
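To put the four runs in this thread side by side, here is a small sketch in plain C that converts the "real" values reported by time into seconds and expresses each run as a multiple of the non-KNI baseline. The helper names real_to_seconds() and slowdown() are made up for illustration; only the timings themselves come from the tests above.

```c
#include <assert.h>

/* Convert a "real" value such as 11m58.224s into seconds. */
static double real_to_seconds(unsigned minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* How many times longer a run took than the baseline run. */
static double slowdown(double run_s, double baseline_s)
{
    assert(baseline_s > 0.0);
    return run_s / baseline_s;
}

/* "real" timings reported in this thread, in seconds. */
static double baseline_s(void)  { return real_to_seconds(11, 58.224); } /* eth0, non-KNI */
static double kni_hz250_s(void) { return real_to_seconds(87, 36.295); } /* KNI, stock kernel */
static double kni_hz1000_s(void){ return real_to_seconds(21, 8.478); }  /* KNI, HZ=1000 */
static double kni_nosleep_s(void){ return real_to_seconds(37, 49.004); } /* KNI, "no-sleep" */
```

Dividing each KNI run by baseline_s() with slowdown() gives roughly 7.3x for the stock kernel, 3.2x for the "no-sleep" test and 1.8x for HZ=1000.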
Jay