DPDK patches and discussions
 help / color / mirror / Atom feed
From: Tudor Cornea <tudor.cornea@gmail.com>
To: ferruh.yigit@intel.com
Cc: dev@dpdk.org, Tudor Cornea <tudor.cornea@gmail.com>
Subject: [dpdk-dev] [PATCH v2] kni: allow configuring the kni thread granularity
Date: Tue,  2 Nov 2021 17:51:13 +0200
Message-ID: <1635868273-69843-1-git-send-email-tudor.cornea@gmail.com> (raw)
In-Reply-To: <1635849481-47147-1-git-send-email-tudor.cornea@gmail.com>

The Kni kthreads seem to be re-scheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing tests
involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularily,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are in non-atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays,
seems unfeasible with a call to schedule_timeout_interruptible().

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
        usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and cpu usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and cpu usage, while allowing users to tweak the limits for
improved precision if they have such use cases.

Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
softlockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G        W  O 3.10 #1
 <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b
 [<ffffffff814f7891>] panic+0xcd/0x1e0
 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>

---
v2:
* Fixed some spelling errors
---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 36 +++++++++++++++++++++++---
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 1ce03ec..2dd3481 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -56,6 +56,10 @@ can be specified when the module is loaded to control its behavior:
                     off   Interfaces will be created with carrier state set to off.
                     on    Interfaces will be created with carrier state set to on.
                      (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
 the typical way a DPDK application gets packets into and out of the kernel
@@ -174,6 +178,35 @@ To set the default carrier state to *off*:
 If the ``carrier`` parameter is not specified, the default carrier state
 of KNI interfaces will be set to *off*.
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux hrtimers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as Linux hrtimers.
+
+To see more about the Linux hrtimers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311..bb4d891 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include <linux/list.h>
 
 #include <rte_kni_common.h>
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 2b464c4..e23cfd9 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -41,6 +41,12 @@ static uint32_t multiple_kthread_on;
 static char *carrier;
 uint32_t kni_dflt_carrier;
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+#endif
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -130,8 +136,7 @@ kni_thread_single(void *data)
 		up_read(&knet->kni_list_lock);
 #ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -150,8 +155,7 @@ kni_thread_multiple(void *param)
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -593,6 +597,16 @@ kni_init(void)
 	else
 		pr_debug("Default carrier state set to on.\n");
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+#endif
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -659,3 +673,17 @@ MODULE_PARM_DESC(carrier,
 "\t\ton    Interfaces will be created with carrier state set to on.\n"
 "\t\t"
 );
+
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds):\n"
+"\t\t"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds):\n"
+"\t\t"
+);
+#endif
-- 
2.7.4


  reply	other threads:[~2021-11-02 15:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-02 10:38 [dpdk-dev] [PATCH] " Tudor Cornea
2021-11-02 15:51 ` Tudor Cornea [this message]
2021-11-02 15:53   ` [dpdk-dev] [PATCH v2] " Stephen Hemminger
2021-11-03 20:40     ` Tudor Cornea
2021-11-03 22:18       ` Stephen Hemminger
2021-11-08 10:13   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
2021-11-22 17:31     ` Ferruh Yigit
2021-11-23 17:08       ` Ferruh Yigit
2021-11-24 17:10         ` Tudor Cornea
2021-11-24 19:24     ` [PATCH v4] " Tudor Cornea
2022-01-14 13:53       ` Connolly, Padraig J
2022-01-14 14:13       ` Ferruh Yigit
2022-01-14 15:18       ` [PATCH v5] " Tudor Cornea
2022-01-14 16:24         ` Stephen Hemminger
2022-01-14 16:43           ` Ferruh Yigit
2022-01-17 16:24             ` Tudor Cornea

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1635868273-69843-1-git-send-email-tudor.cornea@gmail.com \
    --to=tudor.cornea@gmail.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git