From: Ferruh Yigit <ferruh.yigit@intel.com>
To: Tudor Cornea <tudor.cornea@gmail.com>, <thomas@monjalon.net>
Cc: <stephen@networkplumber.org>, <dev@dpdk.org>
Subject: Re: [PATCH v3] kni: allow configuring the kni thread granularity
Date: Tue, 23 Nov 2021 17:08:28 +0000
Message-ID: <943ec9ee-0b91-f112-59c9-76267bf021cb@intel.com>
In-Reply-To: <d6cbcb6f-dca3-4379-4650-2d350f3bd43b@intel.com>

On 11/22/2021 5:31 PM, Ferruh Yigit wrote:
> On 11/8/2021 10:13 AM, Tudor Cornea wrote:
>> The Kni kthreads seem to be re-scheduled at a granularity of roughly
>> 1 millisecond right now, which seems to be insufficient for performing
>> tests involving a lot of control plane traffic.
>>
>> Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
>> seems that the existing code cannot reschedule at the desired granularity,
>> due to precision constraints of schedule_timeout_interruptible().
>>
>
> ack
>
>> In our use case, we leverage the Linux Kernel for control plane, and
>> it is not uncommon to have 60K - 100K pps for some signaling protocols.
>>
>> Since we are not in atomic context, the usleep_range() function seems to be
>> more appropriate for being able to introduce smaller controlled delays,
>> in the range of 5-10 microseconds. Upon reading the existing code, it would
>> seem that this was the original intent. Adding sub-millisecond delays,
>> seems unfeasible with a call to schedule_timeout_interruptible().
>>
>> KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
>> schedule_timeout_interruptible(
>>         usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
>>
>
> Agree. Although the comment indicates that the intention is a microsecond
> interval, the current code doesn't provide it.
>
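For illustration, the rounding that makes the 5 us interval ineffective can be
modelled in userspace. This is a minimal sketch, not the kernel implementation:
it assumes a 1000 Hz tick (the thread does not state HZ), and the sketch_ names
are hypothetical. It mirrors the round-up behaviour of usecs_to_jiffies(), so
any non-zero request shorter than one tick becomes a full jiffy.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: 1000 Hz tick (CONFIG_HZ=1000); the thread does not state HZ. */
#define SKETCH_HZ 1000UL
#define SKETCH_USEC_PER_SEC 1000000UL
#define SKETCH_USEC_PER_TICK (SKETCH_USEC_PER_SEC / SKETCH_HZ)

/* Userspace model of usecs_to_jiffies(): round up to whole ticks. */
unsigned long sketch_usecs_to_jiffies(unsigned long usecs)
{
	/* Guard the round-up addition against overflow. */
	assert(usecs <= ULONG_MAX - (SKETCH_USEC_PER_TICK - 1));
	return (usecs + SKETCH_USEC_PER_TICK - 1) / SKETCH_USEC_PER_TICK;
}

/* Length, in microseconds, of the timeout a jiffies-based sleep requests. */
unsigned long sketch_effective_sleep_us(unsigned long usecs)
{
	return sketch_usecs_to_jiffies(usecs) * SKETCH_USEC_PER_TICK;
}
```

Under these assumptions, sketch_effective_sleep_us(5) yields 1000, i.e. a
request of about 1 ms, which is in line with the roughly 1 ms ping times
quoted for the existing code.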
>> Below, we attempted a brief comparison between the existing implementation,
>> which uses schedule_timeout_interruptible() and usleep_range().
>>
>
> +1 to use 'usleep_range()'.
>
> Overall +1 to the change, I think it fixes the kernel thread delay and
> makes it configurable. As you clarified below, polling too frequently
> causes too much CPU consumption, so it is a good idea to make it configurable.
>
> Let me test the code first. I think it is too late for this release, but
> we can get it in for the next release if the testing goes well.
>

I tested with both the KNI sample app and the KNI PMD, and the change looks good.
In practice the scheduler delay can't be changed with the existing code, but
this patch enables that and lets users configure the performance/CPU usage
trade-off.
The only possible change is to remove the 'RTE_KNI_PREEMPT_DEFAULT' macro, as
mentioned below.

>> We attempt to measure the CPU usage, and RTT between two Kni interfaces,
>> which are created on top of vmxnet3 adapters, connected by a vSwitch.
>>
>> insmod rte_kni.ko kthread_mode=single carrier=on
>>
>> schedule_timeout_interruptible(usecs_to_jiffies(5))
>> kni_single CPU Usage: 2-4 %
>> [root@localhost ~]# ping 1.1.1.2 -I eth1
>> PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
>> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
>> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
>> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
>> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
>> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms
>>
>> usleep_range(5, 10)
>> kni_single CPU usage: 50%
>> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
>> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
>> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
>> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
>> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms
>>
>> usleep_range(20, 50)
>> kni_single CPU usage: 24%
>> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
>> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
>> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
>> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
>> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms
>>
>> usleep_range(50, 100)
>> kni_single CPU usage: 13%
>> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
>> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
>> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
>> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
>> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms
>>
>> usleep_range(100, 200)
>> kni_single CPU usage: 7%
>> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
>> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
>> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
>> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
>> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms
>>
>> usleep_range(1000, 1100)
>> kni_single CPU usage: 2%
>> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
>> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
>> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
>> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
>> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms
>>
>> Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
>> latency and CPU usage to the variant with schedule_timeout_interruptible(),
>> while usleep_range(100, 200) seems to give a decent tradeoff between
>> latency and CPU usage and still allows users to tweak the limits for
>> improved precision if they have such use cases.
>>
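The trade-off described above can be made concrete with a small sketch that
averages the quoted ping samples and picks the lowest-CPU setting that meets a
latency budget. The data is copied from the measurements in this message; the
struct and function names are hypothetical, and five samples per setting are
far too few for firm conclusions. For the first entry, the upper bound of the
reported 2-4 % CPU range is used.

```c
#include <assert.h>
#include <stddef.h>

#define N_SAMPLES 5

struct kni_measurement {
	const char *delay;        /* delay call used in the kthread loop */
	int cpu_pct;              /* reported kni_single CPU usage (upper bound) */
	double rtt_ms[N_SAMPLES]; /* the five ping RTTs quoted in this message */
};

static const struct kni_measurement measurements[] = {
	{ "schedule_timeout_interruptible(usecs_to_jiffies(5))", 4,
	  { 2.70, 1.00, 1.99, 0.985, 1.00 } },
	{ "usleep_range(5, 10)", 50, { 0.338, 0.150, 0.123, 0.139, 0.159 } },
	{ "usleep_range(20, 50)", 24, { 0.202, 0.170, 0.171, 0.248, 0.185 } },
	{ "usleep_range(50, 100)", 13, { 0.537, 0.257, 0.231, 0.143, 0.200 } },
	{ "usleep_range(100, 200)", 7, { 0.716, 0.167, 0.459, 0.455, 0.252 } },
	{ "usleep_range(1000, 1100)", 2, { 2.22, 1.17, 1.17, 1.17, 1.15 } },
};

#define N_MEASUREMENTS (sizeof(measurements) / sizeof(measurements[0]))

/* Arithmetic mean of the RTT samples for one setting, in milliseconds. */
double mean_rtt_ms(const struct kni_measurement *m)
{
	double sum = 0.0;
	size_t i;

	assert(m != NULL);
	for (i = 0; i < N_SAMPLES; i++)
		sum += m->rtt_ms[i];
	return sum / N_SAMPLES;
}

/*
 * Index of the setting with the lowest CPU usage whose mean RTT is within
 * budget_ms, or -1 if no setting meets the budget.
 */
int cheapest_within_budget(double budget_ms)
{
	int best = -1;
	size_t i;

	for (i = 0; i < N_MEASUREMENTS; i++) {
		if (mean_rtt_ms(&measurements[i]) > budget_ms)
			continue;
		if (best < 0 || measurements[i].cpu_pct < measurements[best].cpu_pct)
			best = (int)i;
	}
	return best;
}
```

With these numbers, a 0.5 ms budget selects usleep_range(100, 200) (index 4),
which matches the setting suggested above as a decent default.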
>> Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a
>> softlockup on my kernel.
>>
>
> Same here. That is why I wonder if there is any point in keeping the
> compile-time flag.
> Since we can't practically unset it, and the delay is now configurable
> through module parameters, what do you think about removing the compile-time
> flag completely?
>
>> Kernel panic - not syncing: softlockup: hung tasks
>> CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
>> <IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
>> [<ffffffff814f7891>] panic+0xcd/0x1e0
>> [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
>> [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
>> [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
>> [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
>> [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80
>>
>> References:
>> [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
>>
>> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>>
>
> <...>