* [dpdk-dev] [PATCH] kni: allow configuring the kni thread granularity
@ 2021-11-02 10:38 Tudor Cornea
  2021-11-02 15:51 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
  0 siblings, 1 reply; 18+ messages in thread
From: Tudor Cornea @ 2021-11-02 10:38 UTC
To: ferruh.yigit; +Cc: dev, Tudor Cornea

The KNI kthreads seem to be rescheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we rely on the Linux kernel for the control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, usleep_range() seems more appropriate
for introducing smaller, controlled delays in the range of 5-10
microseconds. Upon reading the existing code, it would seem that this was
the original intent. Adding sub-millisecond delays seems unfeasible with a
call to schedule_timeout_interruptible():

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below is a brief comparison between the existing implementation, which
uses schedule_timeout_interruptible(), and usleep_range().

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU usage: 2-4%

[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency
and CPU usage to the variant with schedule_timeout_interruptible(), while
usleep_range(100, 200) seems to give a decent tradeoff between latency and
CPU usage. Users can still tweak the limits for improved precision if their
use case requires it.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a soft
lockup on my kernel:

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
 <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b
 [<ffffffff814f7891>] panic+0xcd/0x1e0
 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 36 +++++++++++++++++++++++---
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 1ce03ec..5d6d535 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -56,6 +56,10 @@ can be specified when the module is loaded to control its behavior:
                     off   Interfaces will be created with carrier state set to off.
                     on    Interfaces will be created with carrier state set to on.
                      (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
 the typical way a DPDK application gets packets into and out of the kernel
@@ -174,6 +178,35 @@ To set the default carrier state to *off*:
 If the ``carrier`` parameter is not specified, the default carrier state
 of KNI interfaces will be set to *off*.
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux hrtimers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as Linux hrtimers.
+
+To see more about the Linux hrimers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
+
+To set the ``min_scheduling_interval`` to a value of 100 microsecnds:
+
+.. code-block:: console
+
+   # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+   # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311..bb4d891 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include <linux/list.h>
 
 #include <rte_kni_common.h>
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 2b464c4..e23cfd9 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -41,6 +41,12 @@ static uint32_t multiple_kthread_on;
 static char *carrier;
 uint32_t kni_dflt_carrier;
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+#endif
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -130,8 +136,7 @@ kni_thread_single(void *data)
 		up_read(&knet->kni_list_lock);
 #ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -150,8 +155,7 @@ kni_thread_multiple(void *param)
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -593,6 +597,16 @@ kni_init(void)
 	else
 		pr_debug("Default carrier state set to on.\n");
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+#endif
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -659,3 +673,17 @@ MODULE_PARM_DESC(carrier,
 "\t\ton Interfaces will be created with carrier state set to on.\n"
 "\t\t"
 );
+
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds):\n"
+"\t\t"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds):\n"
+"\t\t"
+);
+#endif
-- 
2.7.4
* [dpdk-dev] [PATCH v2] kni: allow configuring the kni thread granularity
  2021-11-02 10:38 [dpdk-dev] [PATCH] kni: allow configuring the kni thread granularity Tudor Cornea
@ 2021-11-02 15:51 ` Tudor Cornea
  2021-11-02 15:53   ` Stephen Hemminger
  2021-11-08 10:13   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
  0 siblings, 2 replies; 18+ messages in thread
From: Tudor Cornea @ 2021-11-02 15:51 UTC
To: ferruh.yigit; +Cc: dev, Tudor Cornea

The KNI kthreads seem to be rescheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we rely on the Linux kernel for the control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, usleep_range() seems more appropriate
for introducing smaller, controlled delays in the range of 5-10
microseconds. Upon reading the existing code, it would seem that this was
the original intent. Adding sub-millisecond delays seems unfeasible with a
call to schedule_timeout_interruptible():

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below is a brief comparison between the existing implementation, which
uses schedule_timeout_interruptible(), and usleep_range().

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU usage: 2-4%

[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%

64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency
and CPU usage to the variant with schedule_timeout_interruptible(), while
usleep_range(100, 200) seems to give a decent tradeoff between latency and
CPU usage. Users can still tweak the limits for improved precision if their
use case requires it.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a soft
lockup on my kernel:

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
 <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b
 [<ffffffff814f7891>] panic+0xcd/0x1e0
 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
---
v2:
* Fixed some spelling errors
---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 36 +++++++++++++++++++++++---
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 1ce03ec..2dd3481 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -56,6 +56,10 @@ can be specified when the module is loaded to control its behavior:
                     off   Interfaces will be created with carrier state set to off.
                     on    Interfaces will be created with carrier state set to on.
                      (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
 the typical way a DPDK application gets packets into and out of the kernel
@@ -174,6 +178,35 @@ To set the default carrier state to *off*:
 If the ``carrier`` parameter is not specified, the default carrier state
 of KNI interfaces will be set to *off*.
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux hrtimers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as Linux hrtimers.
+
+To see more about the Linux hrtimers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+   # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+   # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311..bb4d891 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include <linux/list.h>
 
 #include <rte_kni_common.h>
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 2b464c4..e23cfd9 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -41,6 +41,12 @@ static uint32_t multiple_kthread_on;
 static char *carrier;
 uint32_t kni_dflt_carrier;
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+#endif
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -130,8 +136,7 @@ kni_thread_single(void *data)
 		up_read(&knet->kni_list_lock);
 #ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -150,8 +155,7 @@ kni_thread_multiple(void *param)
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -593,6 +597,16 @@ kni_init(void)
 	else
 		pr_debug("Default carrier state set to on.\n");
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+#endif
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -659,3 +673,17 @@ MODULE_PARM_DESC(carrier,
 "\t\ton Interfaces will be created with carrier state set to on.\n"
 "\t\t"
 );
+
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds):\n"
+"\t\t"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds):\n"
+"\t\t"
+);
+#endif
-- 
2.7.4
* Re: [dpdk-dev] [PATCH v2] kni: allow configuring the kni thread granularity
  2021-11-02 15:51 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
@ 2021-11-02 15:53 ` Stephen Hemminger
  2021-11-03 20:40   ` Tudor Cornea
  2021-11-08 10:13 ` [dpdk-dev] [PATCH v3] " Tudor Cornea
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2021-11-02 15:53 UTC
To: Tudor Cornea; +Cc: ferruh.yigit, dev

On Tue,  2 Nov 2021 17:51:13 +0200
Tudor Cornea <tudor.cornea@gmail.com> wrote:

> +#ifdef RTE_KNI_PREEMPT_DEFAULT
> +module_param(min_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(min_scheduling_interval,
> +"Kni thread min scheduling interval (default=100 microseconds):\n"
> +"\t\t"
> +);

Why the non-standard newlines and tabs?
Please try to make KNI look like other kernel code.
* Re: [dpdk-dev] [PATCH v2] kni: allow configuring the kni thread granularity
  2021-11-02 15:53 ` Stephen Hemminger
@ 2021-11-03 20:40   ` Tudor Cornea
  2021-11-03 22:18     ` Stephen Hemminger
  0 siblings, 1 reply; 18+ messages in thread
From: Tudor Cornea @ 2021-11-03 20:40 UTC
To: Stephen Hemminger; +Cc: Ferruh Yigit, dev

On Tue, 2 Nov 2021 at 17:53, Stephen Hemminger <stephen@networkplumber.org>
wrote:

> On Tue,  2 Nov 2021 17:51:13 +0200
> Tudor Cornea <tudor.cornea@gmail.com> wrote:
>
> > +#ifdef RTE_KNI_PREEMPT_DEFAULT
> > +module_param(min_scheduling_interval, long, 0644);
> > +MODULE_PARM_DESC(min_scheduling_interval,
> > +"Kni thread min scheduling interval (default=100 microseconds):\n"
> > +"\t\t"
> > +);
>
> Why the non-standard newlines and tabs?
> Please try to make KNI look like other kernel code.
>

Hi Stephen,

I tried to base the description of the new parameters on an existing
parameter implemented for the rte_kni module - carrier.

module_param(carrier, charp, 0644);
MODULE_PARM_DESC(carrier,
"Default carrier state for KNI interface (default=off):\n"
"\t\toff Interfaces will be created with carrier state set to off.\n"
"\t\ton Interfaces will be created with carrier state set to on.\n"
"\t\t"
);

I thought about keeping the compatibility in terms of coding style with the
existing KNI module parameters.
Upon browsing the Linux tree, I realise it might not be standard
(checkpatch.pl, interestingly, didn't seem to complain about the patch).

I also realise now that I missed two tabs at the beginning of the params
description.
Should I add the missing tabs, so that the new parameters that I intend to
add through this patch are similar in style to the existing ones, or should
I remove the newlines and tabs altogether when specifying the description
for min_scheduling_interval and max_scheduling_interval?

Thanks,
Tudor
* Re: [dpdk-dev] [PATCH v2] kni: allow configuring the kni thread granularity
  2021-11-03 20:40   ` Tudor Cornea
@ 2021-11-03 22:18     ` Stephen Hemminger
  0 siblings, 0 replies; 18+ messages in thread
From: Stephen Hemminger @ 2021-11-03 22:18 UTC
To: Tudor Cornea; +Cc: Ferruh Yigit, dev

On Wed, 3 Nov 2021 22:40:51 +0200
Tudor Cornea <tudor.cornea@gmail.com> wrote:

> On Tue, 2 Nov 2021 at 17:53, Stephen Hemminger <stephen@networkplumber.org>
> wrote:
>
> > On Tue,  2 Nov 2021 17:51:13 +0200
> > Tudor Cornea <tudor.cornea@gmail.com> wrote:
> >
> > > +#ifdef RTE_KNI_PREEMPT_DEFAULT
> > > +module_param(min_scheduling_interval, long, 0644);
> > > +MODULE_PARM_DESC(min_scheduling_interval,
> > > +"Kni thread min scheduling interval (default=100 microseconds):\n"
> > > +"\t\t"
> > > +);
> >
> > Why the non-standard newlines and tabs?
> > Please try to make KNI look like other kernel code.
> >
>
> Hi Stephen,
>
> I tried to base the description of the new parameters on an existing
> parameter implemented for the rte_kni module - carrier.
>
> module_param(carrier, charp, 0644);
> MODULE_PARM_DESC(carrier,
> "Default carrier state for KNI interface (default=off):\n"
> "\t\toff Interfaces will be created with carrier state set to off.\n"
> "\t\ton Interfaces will be created with carrier state set to on.\n"
> "\t\t"
> );
>
> I thought about keeping the compatibility in terms of coding style with the
> existing KNI module parameters.
> Upon browsing the Linux tree, I realise it might not be standard
> (checkpatch.pl, interestingly, didn't seem to complain about the patch).
>
> I also realise now that I missed two tabs at the beginning of the params
> description.
> Should I add the missing tabs, so that the new parameters that I intend to
> add through this patch are similar in style to the existing ones, or should
> I remove the newlines and tabs altogether when specifying the description
> for min_scheduling_interval and max_scheduling_interval?
>
> Thanks,
> Tudor

Although KNI is unlikely to ever get upstream code review, there is no
reason to deviate from common practice in kernel drivers.

The original module parameter was doing something unconventional.
Not wrong, just different.
* [dpdk-dev] [PATCH v3] kni: allow configuring the kni thread granularity 2021-11-02 15:51 ` [dpdk-dev] [PATCH v2] " Tudor Cornea 2021-11-02 15:53 ` Stephen Hemminger @ 2021-11-08 10:13 ` Tudor Cornea 2021-11-22 17:31 ` Ferruh Yigit 2021-11-24 19:24 ` [PATCH v4] " Tudor Cornea 1 sibling, 2 replies; 18+ messages in thread From: Tudor Cornea @ 2021-11-08 10:13 UTC (permalink / raw) To: thomas; +Cc: stephen, ferruh.yigit, dev, Tudor Cornea The Kni kthreads seem to be re-scheduled at a granularity of roughly 1 millisecond right now, which seems to be insufficient for performing tests involving a lot of control plane traffic. Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it seems that the existing code cannot reschedule at the desired granularily, due to precision constraints of schedule_timeout_interruptible(). In our use case, we leverage the Linux Kernel for control plane, and it is not uncommon to have 60K - 100K pps for some signaling protocols. Since we are not in atomic context, the usleep_range() function seems to be more appropriate for being able to introduce smaller controlled delays, in the range of 5-10 microseconds. Upon reading the existing code, it would seem that this was the original intent. Adding sub-millisecond delays, seems unfeasible with a call to schedule_timeout_interruptible(). KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ schedule_timeout_interruptible( usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); Below, we attempted a brief comparison between the existing implementation, which uses schedule_timeout_interruptible() and usleep_range(). We attempt to measure the CPU usage, and RTT between two Kni interfaces, which are created on top of vmxnet3 adapters, connected by a vSwitch. insmod rte_kni.ko kthread_mode=single carrier=on schedule_timeout_interruptible(usecs_to_jiffies(5)) kni_single CPU Usage: 2-4 % [root@localhost ~]# ping 1.1.1.2 -I eth1 PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data. 
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms usleep_range(5, 10) kni_single CPU usage: 50% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms usleep_range(20, 50) kni_single CPU usage: 24% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms usleep_range(50, 100) kni_single CPU usage: 13% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms usleep_range(100, 200) kni_single CPU usage: 7% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms usleep_range(1000, 1100) kni_single CPU usage: 2% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency and cpu usage to the variant with 
schedule_timeout_interruptible(), while usleep_range(100, 200) seems to give a decent tradeoff between latency and cpu usage, while allowing users to tweak the limits for improved precision if they have such use cases. Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a softlockup on my kernel. Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1 <IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b [<ffffffff814f7891>] panic+0xcd/0x1e0 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80 References: [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> --- v3: * Fixed unwrapped commit description warning * Changed from hrtimers to Linux High Precision Timers in docs * Added two tabs at the beginning of the new params description. Stephen correctly pointed out that the descriptions of the parameters for the Kni module are nonstandard w.r.t existing kernel code. I was thinking to preserve compatibility with the existing parameters of the Kni module for the moment, while an additional clean-up patch could format the descriptions to be closer to the kernel standard. 
v2: * Fixed some spelling errors --- doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++ kernel/linux/kni/kni_dev.h | 2 +- kernel/linux/kni/kni_misc.c | 36 +++++++++++++++++++++++--- 3 files changed, 66 insertions(+), 5 deletions(-) diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst index 1ce03ec..fce3667 100644 --- a/doc/guides/prog_guide/kernel_nic_interface.rst +++ b/doc/guides/prog_guide/kernel_nic_interface.rst @@ -56,6 +56,10 @@ can be specified when the module is loaded to control its behavior: off Interfaces will be created with carrier state set to off. on Interfaces will be created with carrier state set to on. (charp) + parm: min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds): + (long) + parm: max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds): + (long) Loading the ``rte_kni`` kernel module without any optional parameters is the typical way a DPDK application gets packets into and out of the kernel @@ -174,6 +178,35 @@ To set the default carrier state to *off*: If the ``carrier`` parameter is not specified, the default carrier state of KNI interfaces will be set to *off*. +KNI Kthread Scheduling +~~~~~~~~~~~~~~~~~~~~~~ + +The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters +control the rescheduling interval of the KNI kthreads. + +This might be useful if we have use cases in which we require improved +latency or performance for control plane traffic. + +The implementation is backed by Linux High Precision Timers, and uses ``usleep_range``. +Hence, it will have the same granularity constraints as this Linux subsystem. + +For Linux High Precision Timers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_ + +To set the ``min_scheduling_interval`` to a value of 100 microseconds: + +.. 
code-block:: console + + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100 + +To set the ``max_scheduling_interval`` to a value of 200 microseconds: + +.. code-block:: console + + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200 + +If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are +not specified, the default interval limits will be set to *100* and *200* respectively. + KNI Creation and Deletion ------------------------- diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h index c15da311..bb4d891 100644 --- a/kernel/linux/kni/kni_dev.h +++ b/kernel/linux/kni/kni_dev.h @@ -27,7 +27,7 @@ #include <linux/list.h> #include <rte_kni_common.h> -#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ +#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */ #define MBUF_BURST_SZ 32 diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index 2b464c4..1bfa33f 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -41,6 +41,12 @@ static uint32_t multiple_kthread_on; static char *carrier; uint32_t kni_dflt_carrier; +#ifdef RTE_KNI_PREEMPT_DEFAULT +/* Kni thread scheduling interval */ +static long min_scheduling_interval = 100; /* us */ +static long max_scheduling_interval = 200; /* us */ +#endif + #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */ static int kni_net_id; @@ -130,8 +136,7 @@ kni_thread_single(void *data) up_read(&knet->kni_list_lock); #ifdef RTE_KNI_PREEMPT_DEFAULT /* reschedule out for a while */ - schedule_timeout_interruptible( - usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); + usleep_range(min_scheduling_interval, max_scheduling_interval); #endif } @@ -150,8 +155,7 @@ kni_thread_multiple(void *param) kni_net_poll_resp(dev); } #ifdef RTE_KNI_PREEMPT_DEFAULT - schedule_timeout_interruptible( - usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); + usleep_range(min_scheduling_interval, 
max_scheduling_interval); #endif } @@ -593,6 +597,16 @@ kni_init(void) else pr_debug("Default carrier state set to on.\n"); +#ifdef RTE_KNI_PREEMPT_DEFAULT + if (min_scheduling_interval < 0 || max_scheduling_interval < 0 || + min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL || + max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL || + min_scheduling_interval >= max_scheduling_interval) { + pr_err("Invalid parameters for scheduling interval\n"); + return -EINVAL; + } +#endif + #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS rc = register_pernet_subsys(&kni_net_ops); #else @@ -659,3 +673,17 @@ MODULE_PARM_DESC(carrier, "\t\ton Interfaces will be created with carrier state set to on.\n" "\t\t" ); + +#ifdef RTE_KNI_PREEMPT_DEFAULT +module_param(min_scheduling_interval, long, 0644); +MODULE_PARM_DESC(min_scheduling_interval, +"\t\tKni thread min scheduling interval (default=100 microseconds):\n" +"\t\t" +); + +module_param(max_scheduling_interval, long, 0644); +MODULE_PARM_DESC(max_scheduling_interval, +"\t\tKni thread max scheduling interval (default=200 microseconds):\n" +"\t\t" +); +#endif -- 2.7.4
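The range check that this version adds to kni_init() can be exercised outside the kernel. The sketch below is a hypothetical user-space mirror of that condition; the helper names `kni_sched_interval_valid()` and `kni_sched_interval_selftest()` are illustrative and not part of the patch. It copies the 1000000 us upper bound from `kni_dev.h` and shows which (min, max) pairs the module would accept at load time.

```c
#include <assert.h>
#include <stdbool.h>

/* Upper bound copied from the patch (kni_dev.h), in microseconds. */
#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000L

/*
 * Hypothetical user-space mirror of the check added to kni_init():
 * returns true when the (min, max) pair would be accepted at load time.
 */
static bool kni_sched_interval_valid(long min_us, long max_us)
{
	/* Negative values are refused (usleep_range() takes unsigned long). */
	if (min_us < 0 || max_us < 0)
		return false;
	/* Neither bound may exceed one second. */
	if (min_us > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
	    max_us > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL)
		return false;
	/* The patch requires min to be strictly below max. */
	return min_us < max_us;
}

/* Illustrative checks of a few parameter combinations. */
static void kni_sched_interval_selftest(void)
{
	assert(kni_sched_interval_valid(100, 200));      /* module defaults */
	assert(kni_sched_interval_valid(5, 10));         /* most aggressive range measured */
	assert(!kni_sched_interval_valid(200, 200));     /* min == max is rejected */
	assert(!kni_sched_interval_valid(200, 100));     /* inverted range */
	assert(!kni_sched_interval_valid(-1, 100));      /* negative value */
	assert(!kni_sched_interval_valid(100, 2000000)); /* above one second */
}
```

One design point that may be worth keeping in mind: the parameters are registered with 0644 permissions, so root can rewrite them through sysfs after the module is loaded, while this check only runs once in kni_init().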
* Re: [PATCH v3] kni: allow configuring the kni thread granularity From: Ferruh Yigit @ 2021-11-22 17:31 UTC To: Tudor Cornea, thomas; +Cc: stephen, dev On 11/8/2021 10:13 AM, Tudor Cornea wrote: > The Kni kthreads seem to be re-scheduled at a granularity of roughly > 1 millisecond right now, which seems to be insufficient for performing > tests involving a lot of control plane traffic. > > Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it > seems that the existing code cannot reschedule at the desired granularity, > due to precision constraints of schedule_timeout_interruptible(). > ack > In our use case, we leverage the Linux Kernel for control plane, and > it is not uncommon to have 60K - 100K pps for some signaling protocols. > > Since we are not in atomic context, the usleep_range() function seems to be > more appropriate for being able to introduce smaller controlled delays, > in the range of 5-10 microseconds. Upon reading the existing code, it would > seem that this was the original intent. Adding sub-millisecond delays, > seems unfeasible with a call to schedule_timeout_interruptible(). > > KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ > schedule_timeout_interruptible( > usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); > Agreed. Although the comment shows that the intention is a delay in microseconds, the current code doesn't provide it. > Below, we attempted a brief comparison between the existing implementation, > which uses schedule_timeout_interruptible() and usleep_range(). > +1 to use 'usleep_range()'. Overall +1 to the change; I think it fixes the kernel thread delay and makes it configurable. As you clarified below, polling too frequently causes too much CPU consumption, so it is a good idea to make it configurable. Let me test the code first. I think it is too late for this release, but we can get it into the next release if the testing goes well. > We attempt to measure the CPU usage, and RTT between two Kni interfaces, > which are created on top of vmxnet3 adapters, connected by a vSwitch. > > <...> > > Upon testing, usleep_range(1000, 1100) seems roughly equivalent in > latency and cpu usage to the variant with schedule_timeout_interruptible(), > while usleep_range(100, 200) seems to give a decent tradeoff between > latency and cpu usage, while allowing users to tweak the limits for > improved precision if they have such use cases. > > Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a > softlockup on my kernel. > Same here. That is why I wonder whether there is any point in keeping the compile-time flag. Since we can't practically unset it, and the delay is now configurable through module parameters, what do you think about removing the compile-time flag completely? > Kernel panic - not syncing: softlockup: hung tasks > <...> > > Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> > <...>
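The observation that the existing code cannot deliver a delay in microseconds follows from how the timeout is converted before it reaches the timer. The sketch below is a simplified user-space model of the round-up performed by the kernel's `usecs_to_jiffies()`, assuming HZ divides 1000000 (true for common values such as 100, 250 and 1000); `model_usecs_to_jiffies()` and `model_timeout_us()` are hypothetical names. Under this model a 5 us request becomes one whole jiffy, so the wakeup is tied to the scheduler tick (1 ms at HZ=1000, 4 ms at HZ=250), which would be consistent with the roughly 1 ms RTTs measured with schedule_timeout_interruptible().

```c
#include <assert.h>

#define MODEL_USEC_PER_SEC 1000000UL

/*
 * Simplified model of usecs_to_jiffies() for the case where HZ divides
 * one second exactly: the microsecond value is rounded UP to whole jiffies.
 */
static unsigned long model_usecs_to_jiffies(unsigned long us, unsigned long hz)
{
	unsigned long us_per_jiffy;

	assert(hz > 0 && MODEL_USEC_PER_SEC % hz == 0);
	us_per_jiffy = MODEL_USEC_PER_SEC / hz;
	return (us + us_per_jiffy - 1) / us_per_jiffy;
}

/* Timeout actually handed to the timer, converted back to microseconds. */
static unsigned long model_timeout_us(unsigned long us, unsigned long hz)
{
	return model_usecs_to_jiffies(us, hz) * (MODEL_USEC_PER_SEC / hz);
}
```

For example, `model_timeout_us(5, 1000)` yields 1000 and `model_timeout_us(5, 250)` yields 4000: the requested 5 us is effectively replaced by a full tick.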
* Re: [PATCH v3] kni: allow configuring the kni thread granularity From: Ferruh Yigit @ 2021-11-23 17:08 UTC To: Tudor Cornea, thomas; +Cc: stephen, dev On 11/22/2021 5:31 PM, Ferruh Yigit wrote: > On 11/8/2021 10:13 AM, Tudor Cornea wrote: >> The Kni kthreads seem to be re-scheduled at a granularity of roughly >> 1 millisecond right now, which seems to be insufficient for performing >> tests involving a lot of control plane traffic. >> > <...> > Overall +1 to the change, I think it fixes the kernel thread delay, and > makes it configurable. As you clarified below, making the polls too frequent > cause too much CPU consumption, so it is good idea to make it configurable. > > Let me test the code first, I think it is too late for this release, but > we can get it for next release if the testing goes well. > As I tested with both the KNI sample app and the KNI PMD, the change looks good. With the existing code we practically can't change the scheduler delay, but this patch enables it and lets users configure the performance/CPU usage trade-off. The only possible change is to remove the 'RTE_KNI_PREEMPT_DEFAULT' macro, as mentioned below. >> <...> >> Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a >> softlockup on my kernel. >> > > Same here. That is why I wonder if there is a point to keep the compile > time flag? > Since we can't unset it practically, and now the delay can be configurable > by module parameters, what do you think to remove the compile time flag > completely? > >> <...>
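The performance/CPU usage trade-off can be estimated with simple arithmetic. Assuming each poll iteration sleeps at least `min_scheduling_interval` microseconds, and ignoring the time spent processing packets, a kthread can wake up at most 1000000 / min times per second. The hypothetical helper below (not part of the patch) computes that bound.

```c
#include <assert.h>

/*
 * Hypothetical helper: upper bound on kthread wakeups per second when every
 * poll iteration sleeps for at least min_us microseconds. Time spent in
 * kni_net_rx()/kni_net_poll_resp() is ignored, so the real rate is lower.
 */
static unsigned long kni_max_wakeups_per_sec(unsigned long min_us)
{
	/* With min_us == 0 (accepted by the patch) this model gives no bound. */
	assert(min_us > 0);
	return 1000000UL / min_us;
}
```

For example, the default minimum of 100 us caps the poll rate at 10000 wakeups per second, versus 200000 for usleep_range(5, 10). That ordering matches the CPU usage figures reported above (7% versus 50%), although this model says nothing about the absolute percentages.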
* Re: [PATCH v3] kni: allow configuring the kni thread granularity From: Tudor Cornea @ 2021-11-24 17:10 UTC To: Ferruh Yigit; +Cc: Thomas Monjalon, Stephen Hemminger, dev, helin.zhang Hi Ferruh, > As I tested both with KNI sample app and KNI PMD, change looks good, > practically we can't change the scheduler delay with existing code but > this patch enables it and lets configure performance/CPU usage trade of. > > Only possible change is to remove 'RTE_KNI_PREEMPT_DEFAULT' macro as > mentioned below. > Thanks for the suggestion. I will send an updated version of the patch, which will remove the RTE_KNI_PREEMPT_DEFAULT macro
* [PATCH v4] kni: allow configuring the kni thread granularity From: Tudor Cornea @ 2021-11-24 19:24 UTC To: ferruh.yigit; +Cc: thomas, stephen, helin.zhang, dev, Tudor Cornea The KNI kthreads seem to be rescheduled at a granularity of roughly 1 millisecond right now, which is insufficient for tests involving a lot of control plane traffic. Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, the existing code cannot reschedule at the desired granularity, due to the precision constraints of schedule_timeout_interruptible(). In our use case, we leverage the Linux kernel for the control plane, and it is not uncommon to have 60K - 100K pps for some signaling protocols. Since we are not in atomic context, usleep_range() seems more appropriate for introducing smaller, controlled delays in the range of 5-10 microseconds. Judging by the existing code, this seems to have been the original intent; sub-millisecond delays are not feasible with a call to schedule_timeout_interruptible(): KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ schedule_timeout_interruptible( usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); Below is a brief comparison between the existing implementation, which uses schedule_timeout_interruptible(), and usleep_range(). We measure the CPU usage and the RTT between two KNI interfaces, which are created on top of vmxnet3 adapters connected by a vSwitch. insmod rte_kni.ko kthread_mode=single carrier=on schedule_timeout_interruptible(usecs_to_jiffies(5)) kni_single CPU Usage: 2-4 % [root@localhost ~]# ping 1.1.1.2 -I eth1 PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms usleep_range(5, 10) kni_single CPU usage: 50% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms usleep_range(20, 50) kni_single CPU usage: 24% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms usleep_range(50, 100) kni_single CPU usage: 13% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms usleep_range(100, 200) kni_single CPU usage: 7% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms usleep_range(1000, 1100) kni_single CPU usage: 2% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency and cpu usage to the variant with 
schedule_timeout_interruptible(), while usleep_range(100, 200) seems to give a decent trade-off between latency and CPU usage and still lets users tweak the limits for improved precision if their use case requires it. Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a soft lockup on my kernel: Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1 <IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b [<ffffffff814f7891>] panic+0xcd/0x1e0 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80 Since the option cannot practically be disabled, this patch also removes it. References: [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> --- v4: * Removed RTE_KNI_PREEMPT_DEFAULT configuration option v3: * Fixed unwrapped commit description warning * Changed "hrtimers" to "Linux High Precision Timers" in the docs * Added two tabs at the beginning of the new parameter descriptions. Stephen correctly pointed out that the descriptions of the KNI module parameters are nonstandard w.r.t. existing kernel code. For the moment, I would prefer to keep them consistent with the existing parameters of the KNI module; a separate clean-up patch could later bring the descriptions closer to the kernel standard.
v2: * Fixed some spelling errors --- config/rte_config.h | 3 --- doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++++ kernel/linux/kni/kni_dev.h | 2 +- kernel/linux/kni/kni_misc.c | 34 ++++++++++++++++++++------ 4 files changed, 60 insertions(+), 12 deletions(-) diff --git a/config/rte_config.h b/config/rte_config.h index cab4390..91d96ee 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -95,9 +95,6 @@ #define RTE_SCHED_PORT_N_GRINDERS 8 #undef RTE_SCHED_VECTOR -/* KNI defines */ -#define RTE_KNI_PREEMPT_DEFAULT 1 - /* rte_graph defines */ #define RTE_GRAPH_BURST_SIZE 256 #define RTE_LIBRTE_GRAPH_STATS 1 diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst index 1ce03ec..fce3667 100644 --- a/doc/guides/prog_guide/kernel_nic_interface.rst +++ b/doc/guides/prog_guide/kernel_nic_interface.rst @@ -56,6 +56,10 @@ can be specified when the module is loaded to control its behavior: off Interfaces will be created with carrier state set to off. on Interfaces will be created with carrier state set to on. (charp) + parm: min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds): + (long) + parm: max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds): + (long) Loading the ``rte_kni`` kernel module without any optional parameters is the typical way a DPDK application gets packets into and out of the kernel @@ -174,6 +178,35 @@ To set the default carrier state to *off*: If the ``carrier`` parameter is not specified, the default carrier state of KNI interfaces will be set to *off*. +KNI Kthread Scheduling +~~~~~~~~~~~~~~~~~~~~~~ + +The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters +control the rescheduling interval of the KNI kthreads. + +This might be useful if we have use cases in which we require improved +latency or performance for control plane traffic. 
+ +The implementation is backed by Linux High Precision Timers, and uses ``usleep_range``. +Hence, it will have the same granularity constraints as this Linux subsystem. + +For Linux High Precision Timers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_ + +To set the ``min_scheduling_interval`` to a value of 100 microseconds: + +.. code-block:: console + + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100 + +To set the ``max_scheduling_interval`` to a value of 200 microseconds: + +.. code-block:: console + + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200 + +If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are +not specified, the default interval limits will be set to *100* and *200* respectively. + KNI Creation and Deletion ------------------------- diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h index c15da311..bb4d891 100644 --- a/kernel/linux/kni/kni_dev.h +++ b/kernel/linux/kni/kni_dev.h @@ -27,7 +27,7 @@ #include <linux/list.h> #include <rte_kni_common.h> -#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ +#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */ #define MBUF_BURST_SZ 32 diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index f4944e1..23132bb 100644 --- a/kernel/linux/kni/kni_misc.c +++ b/kernel/linux/kni/kni_misc.c @@ -41,6 +41,10 @@ static uint32_t multiple_kthread_on; static char *carrier; uint32_t kni_dflt_carrier; +/* Kni thread scheduling interval */ +static long min_scheduling_interval = 100; /* us */ +static long max_scheduling_interval = 200; /* us */ + #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */ static int kni_net_id; @@ -128,11 +132,8 @@ kni_thread_single(void *data) } } up_read(&knet->kni_list_lock); -#ifdef RTE_KNI_PREEMPT_DEFAULT /* reschedule out for a while */ - schedule_timeout_interruptible( - 
usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); -#endif + usleep_range(min_scheduling_interval, max_scheduling_interval); } return 0; @@ -149,10 +150,7 @@ kni_thread_multiple(void *param) kni_net_rx(dev); kni_net_poll_resp(dev); } -#ifdef RTE_KNI_PREEMPT_DEFAULT - schedule_timeout_interruptible( - usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); -#endif + usleep_range(min_scheduling_interval, max_scheduling_interval); } return 0; @@ -590,6 +588,14 @@ kni_init(void) else pr_debug("Default carrier state set to on.\n"); + if (min_scheduling_interval < 0 || max_scheduling_interval < 0 || + min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL || + max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL || + min_scheduling_interval >= max_scheduling_interval) { + pr_err("Invalid parameters for scheduling interval\n"); + return -EINVAL; + } + #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS rc = register_pernet_subsys(&kni_net_ops); #else @@ -656,3 +662,15 @@ MODULE_PARM_DESC(carrier, "\t\ton Interfaces will be created with carrier state set to on.\n" "\t\t" ); + +module_param(min_scheduling_interval, long, 0644); +MODULE_PARM_DESC(min_scheduling_interval, +"\t\tKni thread min scheduling interval (default=100 microseconds):\n" +"\t\t" +); + +module_param(max_scheduling_interval, long, 0644); +MODULE_PARM_DESC(max_scheduling_interval, +"\t\tKni thread max scheduling interval (default=200 microseconds):\n" +"\t\t" +); -- 2.7.4
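usleep_range() is only available inside the kernel, but the effect that motivated the patch (a requested delay is only a lower bound, and what you actually get depends on the timer backing the sleep) can be observed from user space with the POSIX clock_nanosleep(). The sketch below is a hypothetical user-space analogue, not part of the KNI module; `measure_sleep_ns()` is an illustrative name. It sleeps for the requested number of microseconds on CLOCK_MONOTONIC and returns the measured duration in nanoseconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static long long elapsed_ns(const struct timespec *a, const struct timespec *b)
{
	return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
	       (b->tv_nsec - a->tv_nsec);
}

/*
 * Hypothetical helper: sleep for 'us' microseconds and return how long the
 * sleep really took. clock_nanosleep() returns an error number (not -1);
 * on EINTR the remaining time is slept, so the total never falls short.
 */
static long long measure_sleep_ns(long us)
{
	struct timespec req = { us / 1000000, (us % 1000000) * 1000L };
	struct timespec rem, start, end;
	int rc;

	assert(us >= 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	while ((rc = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem)) == EINTR)
		req = rem;
	assert(rc == 0);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return elapsed_ns(&start, &end);
}
```

Calling it with 5, 100 and 1000 gives a rough idea of the timer overshoot on a given machine; the exact numbers depend on kernel configuration and load, so none are quoted here.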
* RE: [PATCH v4] kni: allow configuring the kni thread granularity 2021-11-24 19:24 ` [PATCH v4] " Tudor Cornea @ 2022-01-14 13:53 ` Connolly, Padraig J 2022-01-14 14:13 ` Ferruh Yigit 2022-01-14 15:18 ` [PATCH v5] " Tudor Cornea 2 siblings, 0 replies; 18+ messages in thread From: Connolly, Padraig J @ 2022-01-14 13:53 UTC (permalink / raw) To: Tudor Cornea, Yigit, Ferruh, Connolly, Padraig J Cc: thomas, stephen, Zhang, Helin, dev > -----Original Message----- > From: Tudor Cornea <tudor.cornea@gmail.com> > Sent: Wednesday, November 24, 2021 7:24 PM > To: Yigit, Ferruh <ferruh.yigit@intel.com> > Cc: thomas@monjalon.net; stephen@networkplumber.org; Zhang, Helin > <helin.zhang@intel.com>; dev@dpdk.org; Tudor Cornea > <tudor.cornea@gmail.com> > Subject: [PATCH v4] kni: allow configuring the kni thread granularity > > The Kni kthreads seem to be re-scheduled at a granularity of roughly > 1 millisecond right now, which seems to be insufficient for performing tests > involving a lot of control plane traffic. > > Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it > seems that the existing code cannot reschedule at the desired granularily, > due to precision constraints of schedule_timeout_interruptible(). > > In our use case, we leverage the Linux Kernel for control plane, and it is not > uncommon to have 60K - 100K pps for some signaling protocols. > > Since we are not in atomic context, the usleep_range() function seems to be > more appropriate for being able to introduce smaller controlled delays, in the > range of 5-10 microseconds. Upon reading the existing code, it would seem > that this was the original intent. Adding sub-millisecond delays, seems > unfeasible with a call to schedule_timeout_interruptible(). 
> > KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ > schedule_timeout_interruptible( > usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); > > Below, we attempted a brief comparison between the existing > implementation, which uses schedule_timeout_interruptible() and > usleep_range(). > > We attempt to measure the CPU usage, and RTT between two Kni interfaces, > which are created on top of vmxnet3 adapters, connected by a vSwitch. > > insmod rte_kni.ko kthread_mode=single carrier=on > > schedule_timeout_interruptible(usecs_to_jiffies(5)) > kni_single CPU Usage: 2-4 % > [root@localhost ~]# ping 1.1.1.2 -I eth1 PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 > eth1: 56(84) bytes of data. > 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms > 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms > 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms > 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms > 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms > > usleep_range(5, 10) > kni_single CPU usage: 50% > 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms > 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms > 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms > 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms > 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms > > usleep_range(20, 50) > kni_single CPU usage: 24% > 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms > 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms > 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms > 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms > 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms > > usleep_range(50, 100) > kni_single CPU usage: 13% > 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms > 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms > 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms > 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms > 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms > > usleep_range(100, 
200) > kni_single CPU usage: 7% > 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms > 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms > 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms > 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms > 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms > > usleep_range(1000, 1100) > kni_single CPU usage: 2% > 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms > 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms > 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms > 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms > 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms > > Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency > and cpu usage to the variant with schedule_timeout_interruptible(), while > usleep_range(100, 200) seems to give a decent tradeoff between latency > and cpu usage, while allowing users to tweak the limits for improved > precision if they have such use cases. > > Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a > softlockup on my kernel. > > Kernel panic - not syncing: softlockup: hung tasks > CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1 > <IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b [<ffffffff814f7891>] > panic+0xcd/0x1e0 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160 > [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0 [<ffffffff81064b57>] > hrtimer_interrupt+0xe7/0x1f0 [<ffffffff8102cd57>] > smp_apic_timer_interrupt+0x67/0xa0 > [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80 > > This patch also attempts to remove this option. 
>
> References:
> [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>
> ---
> v4:
> * Removed RTE_KNI_PREEMPT_DEFAULT configuration option
> v3:
> * Fixed unwrapped commit description warning
> * Changed from hrtimers to Linux High Precision Timers in docs
> * Added two tabs at the beginning of the new params description.
> Stephen correctly pointed out that the descriptions of the parameters
> for the Kni module are nonstandard w.r.t existing kernel code.
> I was thinking to preserve compatibility with the existing parameters
> of the Kni module for the moment, while an additional clean-up patch
> could format the descriptions to be closer to the kernel standard.
> v2:
> * Fixed some spelling errors
> ---
> config/rte_config.h | 3 ---
> doc/guides/prog_guide/kernel_nic_interface.rst | 33
> +++++++++++++++++++++++++
> kernel/linux/kni/kni_dev.h | 2 +-
> kernel/linux/kni/kni_misc.c | 34 ++++++++++++++++++++------
> 4 files changed, 60 insertions(+), 12 deletions(-)
>
> diff --git a/config/rte_config.h b/config/rte_config.h index cab4390..91d96ee
> 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -95,9 +95,6 @@
> #define RTE_SCHED_PORT_N_GRINDERS 8
> #undef RTE_SCHED_VECTOR
>
> -/* KNI defines */
> -#define RTE_KNI_PREEMPT_DEFAULT 1
> -
> /* rte_graph defines */
> #define RTE_GRAPH_BURST_SIZE 256
> #define RTE_LIBRTE_GRAPH_STATS 1
> diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst
> b/doc/guides/prog_guide/kernel_nic_interface.rst
> index 1ce03ec..fce3667 100644
> --- a/doc/guides/prog_guide/kernel_nic_interface.rst
> +++ b/doc/guides/prog_guide/kernel_nic_interface.rst
> @@ -56,6 +56,10 @@ can be specified when the module is loaded to control
> its behavior:
> off Interfaces will be created with carrier state set to off.
> on Interfaces will be created with carrier state set to on.
> (charp)
> + parm: min_scheduling_interval: "Kni thread min scheduling interval
> (default=100 microseconds):
> + (long)
> + parm: max_scheduling_interval: "Kni thread max scheduling interval
> (default=200 microseconds):
> + (long)
>
> Loading the ``rte_kni`` kernel module without any optional parameters is
> the typical way a DPDK application gets packets into and out of the kernel
> @@ -174,6 +178,35 @@ To set the default carrier state to *off*:
> If the ``carrier`` parameter is not specified, the default carrier state of KNI
> interfaces will be set to *off*.
>
> +KNI Kthread Scheduling
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``min_scheduling_interval`` and ``max_scheduling_interval``
> +parameters control the rescheduling interval of the KNI kthreads.
> +
> +This might be useful if we have use cases in which we require improved
> +latency or performance for control plane traffic.
> +
> +The implementation is backed by Linux High Precision Timers, and uses
> ``usleep_range``.
> +Hence, it will have the same granularity constraints as this Linux subsystem.
> +
> +For Linux High Precision Timers, you can check the following resource:
> +`Kernel Timers
> +<http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
> +
> +To set the ``min_scheduling_interval`` to a value of 100 microseconds:
> +
> +.. code-block:: console
> +
> + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko
> + min_scheduling_interval=100
> +
> +To set the ``max_scheduling_interval`` to a value of 200 microseconds:
> +
> +.. code-block:: console
> +
> + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko
> + max_scheduling_interval=200
> +
> +If the ``min_scheduling_interval`` and ``max_scheduling_interval``
> +parameters are not specified, the default interval limits will be set to *100*
> and *200* respectively.
> +
> KNI Creation and Deletion
> -------------------------
>
> diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h index
> c15da311..bb4d891 100644
> --- a/kernel/linux/kni/kni_dev.h
> +++ b/kernel/linux/kni/kni_dev.h
> @@ -27,7 +27,7 @@
> #include <linux/list.h>
>
> #include <rte_kni_common.h>
> -#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
> +#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
>
> #define MBUF_BURST_SZ 32
>
> diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index
> f4944e1..23132bb 100644
> --- a/kernel/linux/kni/kni_misc.c
> +++ b/kernel/linux/kni/kni_misc.c
> @@ -41,6 +41,10 @@ static uint32_t multiple_kthread_on; static char
> *carrier; uint32_t kni_dflt_carrier;
>
> +/* Kni thread scheduling interval */
> +static long min_scheduling_interval = 100; /* us */ static long
> +max_scheduling_interval = 200; /* us */
> +
> #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
>
> static int kni_net_id;
> @@ -128,11 +132,8 @@ kni_thread_single(void *data)
> }
> }
> up_read(&knet->kni_list_lock);
> -#ifdef RTE_KNI_PREEMPT_DEFAULT
> /* reschedule out for a while */
> - schedule_timeout_interruptible(
> -
> usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
> -#endif
> + usleep_range(min_scheduling_interval,
> max_scheduling_interval);
> }
>
> return 0;
> @@ -149,10 +150,7 @@ kni_thread_multiple(void *param)
> kni_net_rx(dev);
> kni_net_poll_resp(dev);
> }
> -#ifdef RTE_KNI_PREEMPT_DEFAULT
> - schedule_timeout_interruptible(
> -
> usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
> -#endif
> + usleep_range(min_scheduling_interval,
> max_scheduling_interval);
> }
>
> return 0;
> @@ -590,6 +588,14 @@ kni_init(void)
> else
> pr_debug("Default carrier state set to on.\n");
>
> + if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
> + min_scheduling_interval >
> KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
> + max_scheduling_interval >
> KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
> + min_scheduling_interval >= max_scheduling_interval) {
> + pr_err("Invalid parameters for scheduling interval\n");
> + return -EINVAL;
> + }
> +
> #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
> rc = register_pernet_subsys(&kni_net_ops);
> #else
> @@ -656,3 +662,15 @@ MODULE_PARM_DESC(carrier,
> "\t\ton Interfaces will be created with carrier state set to on.\n"
> "\t\t"
> );
> +
> +module_param(min_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(min_scheduling_interval,
> +"\t\tKni thread min scheduling interval (default=100 microseconds):\n"
> +"\t\t"
> +);
> +
> +module_param(max_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(max_scheduling_interval,
> +"\t\tKni thread max scheduling interval (default=200 microseconds):\n"
> +"\t\t"
> +);

This patch looks good to me.

Tested with <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=20 max_scheduling_interval=50

Results:

KNI Perf before patch was added:
64 bytes from 5.5.5.6: icmp_seq=1 ttl=64 time=4.79 ms
64 bytes from 5.5.5.6: icmp_seq=2 ttl=64 time=2.97 ms
64 bytes from 5.5.5.6: icmp_seq=3 ttl=64 time=1.90 ms
64 bytes from 5.5.5.6: icmp_seq=4 ttl=64 time=7.94 ms
64 bytes from 5.5.5.6: icmp_seq=5 ttl=64 time=6.85 ms

KNI Perf after patch was added (min_scheduling_interval=20 max_scheduling_interval=50):
64 bytes from 5.5.5.6: icmp_seq=1 ttl=64 time=0.106 ms
64 bytes from 5.5.5.6: icmp_seq=2 ttl=64 time=0.055 ms
64 bytes from 5.5.5.6: icmp_seq=3 ttl=64 time=0.059 ms
64 bytes from 5.5.5.6: icmp_seq=4 ttl=64 time=0.056 ms
64 bytes from 5.5.5.6: icmp_seq=5 ttl=64 time=0.061 ms

Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>

> --
> 2.7.4
* Re: [PATCH v4] kni: allow configuring the kni thread granularity
  2021-11-24 19:24 ` [PATCH v4] " Tudor Cornea
  2022-01-14 13:53 ` Connolly, Padraig J
@ 2022-01-14 14:13 ` Ferruh Yigit
  2022-01-14 15:18 ` [PATCH v5] " Tudor Cornea
  2 siblings, 0 replies; 18+ messages in thread
From: Ferruh Yigit @ 2022-01-14 14:13 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: thomas, stephen, helin.zhang, dev

On 11/24/2021 7:24 PM, Tudor Cornea wrote:
> The Kni kthreads seem to be re-scheduled at a granularity of roughly
> 1 millisecond right now, which seems to be insufficient for performing
> tests involving a lot of control plane traffic.
>
> Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
> seems that the existing code cannot reschedule at the desired granularity,
> due to precision constraints of schedule_timeout_interruptible().
>
> In our use case, we leverage the Linux Kernel for control plane, and
> it is not uncommon to have 60K - 100K pps for some signaling protocols.
>
> Since we are not in atomic context, the usleep_range() function seems to be
> more appropriate for being able to introduce smaller controlled delays,
> in the range of 5-10 microseconds. Upon reading the existing code, it would
> seem that this was the original intent. Adding sub-millisecond delays,
> seems unfeasible with a call to schedule_timeout_interruptible().
>
> KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
> schedule_timeout_interruptible(
> 	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
>
> Below, we attempted a brief comparison between the existing implementation,
> which uses schedule_timeout_interruptible() and usleep_range().
>
> We attempt to measure the CPU usage, and RTT between two Kni interfaces,
> which are created on top of vmxnet3 adapters, connected by a vSwitch.
>
> insmod rte_kni.ko kthread_mode=single carrier=on
>
> schedule_timeout_interruptible(usecs_to_jiffies(5))
> kni_single CPU Usage: 2-4 %
> [root@localhost ~]# ping 1.1.1.2 -I eth1
> PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms
>
> usleep_range(5, 10)
> kni_single CPU usage: 50%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms
>
> usleep_range(20, 50)
> kni_single CPU usage: 24%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms
>
> usleep_range(50, 100)
> kni_single CPU usage: 13%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms
>
> usleep_range(100, 200)
> kni_single CPU usage: 7%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms
>
> usleep_range(1000, 1100)
> kni_single CPU usage: 2%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms
>
> Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
> latency and cpu usage to the variant with schedule_timeout_interruptible(),
> while usleep_range(100, 200) seems to give a decent tradeoff between
> latency and cpu usage, while allowing users to tweak the limits for
> improved precision if they have such use cases.
>
> Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
> softlockup on my kernel.
>
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
> <IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
> [<ffffffff814f7891>] panic+0xcd/0x1e0
> [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
> [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
> [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
> [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
> [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80
>
> This patch also attempts to remove this option.
>
> References:
> [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
>
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
>
> ---
> v4:
> * Removed RTE_KNI_PREEMPT_DEFAULT configuration option
> v3:
> * Fixed unwrapped commit description warning
> * Changed from hrtimers to Linux High Precision Timers in docs
> * Added two tabs at the beginning of the new params description.
> Stephen correctly pointed out that the descriptions of the parameters
> for the Kni module are nonstandard w.r.t existing kernel code.
> I was thinking to preserve compatibility with the existing parameters
> of the Kni module for the moment, while an additional clean-up patch
> could format the descriptions to be closer to the kernel standard.
> v2:
> * Fixed some spelling errors

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Only it doesn't apply cleanly because of the latest changes in the kni,
can you please rebase it?
Please keep the review tag in the new version after rebase.
* [PATCH v5] kni: allow configuring the kni thread granularity
  2021-11-24 19:24 ` [PATCH v4] " Tudor Cornea
  2022-01-14 13:53 ` Connolly, Padraig J
  2022-01-14 14:13 ` Ferruh Yigit
@ 2022-01-14 15:18 ` Tudor Cornea
  2022-01-14 16:24 ` Stephen Hemminger
  2022-01-20 12:41 ` [PATCH v6] " Tudor Cornea
  2 siblings, 2 replies; 18+ messages in thread
From: Tudor Cornea @ 2022-01-14 15:18 UTC (permalink / raw)
  To: ferruh.yigit
  Cc: padraig.j.connolly, thomas, stephen, helin.zhang, dev, Tudor Cornea

The Kni kthreads seem to be re-scheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays,
seems unfeasible with a call to schedule_timeout_interruptible().

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().

We attempt to measure the CPU usage, and RTT between two Kni interfaces,
which are created on top of vmxnet3 adapters, connected by a vSwitch.
insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and cpu usage to the variant with schedule_timeout_interruptible(),
while usleep_range(100, 200) seems to give a decent tradeoff between
latency and cpu usage, while allowing users to tweak the limits for
improved precision if they have such use cases.

Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
softlockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
<IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
[<ffffffff814f7891>] panic+0xcd/0x1e0
[<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
[<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
[<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
[<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
[<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

This patch also attempts to remove this option.

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
v5:
* Rebased the patch on top of the dpdk-next-net branch
v4:
* Removed RTE_KNI_PREEMPT_DEFAULT configuration option
v3:
* Fixed unwrapped commit description warning
* Changed from hrtimers to Linux High Precision Timers in docs
* Added two tabs at the beginning of the new params description.
Stephen correctly pointed out that the descriptions of the parameters
for the Kni module are nonstandard w.r.t existing kernel code.
I was thinking to preserve compatibility with the existing parameters
of the Kni module for the moment, while an additional clean-up patch
could format the descriptions to be closer to the kernel standard.
v2:
* Fixed some spelling errors
---
---
 config/rte_config.h | 3 ---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h | 2 +-
 kernel/linux/kni/kni_misc.c | 34 ++++++++++++++++++++------
 4 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/config/rte_config.h b/config/rte_config.h
index cab4390..91d96ee 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -95,9 +95,6 @@
 #define RTE_SCHED_PORT_N_GRINDERS 8
 #undef RTE_SCHED_VECTOR
 
-/* KNI defines */
-#define RTE_KNI_PREEMPT_DEFAULT 1
-
 /* rte_graph defines */
 #define RTE_GRAPH_BURST_SIZE 256
 #define RTE_LIBRTE_GRAPH_STATS 1
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 771c7d7..a0763c5 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -61,6 +61,10 @@ can be specified when the module is loaded to control its behavior:
                     userspace callback and supporting async requests (default=off):
                     on    Enable request processing support for bifurcated drivers.
                      (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
@@ -202,6 +206,35 @@ Enabling bifurcated device support releases ``rtnl`` lock before calling
 callback and locks it back after callback. Also enables asynchronous request to
 support callbacks that requires rtnl lock to work (interface down).
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux High Precision Timers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as this Linux subsystem.
+
+For Linux High Precision Timers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index e863348..a2c6d9f 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include <linux/list.h>
 
 #include <rte_kni_common.h>
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index f10dcd0..c5218d5 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -45,6 +45,10 @@ uint32_t kni_dflt_carrier;
 static char *enable_bifurcated;
 uint32_t bifurcated_support;
 
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -132,11 +136,8 @@ kni_thread_single(void *data)
 			}
 		}
 		up_read(&knet->kni_list_lock);
-#ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
-#endif
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 	}
 
 	return 0;
@@ -153,10 +154,7 @@ kni_thread_multiple(void *param)
 			kni_net_rx(dev);
 			kni_net_poll_resp(dev);
 		}
-#ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
-#endif
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 	}
 
 	return 0;
@@ -617,6 +615,14 @@ kni_init(void)
 	if (bifurcated_support == 1)
 		pr_debug("bifurcated support is enabled.\n");
 
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+	    min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+	    max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+	    min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -692,3 +698,15 @@ MODULE_PARM_DESC(enable_bifurcated,
 "\t\ton Enable request processing support for bifurcated drivers.\n"
 "\t\t"
 );
+
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"\t\tKni thread min scheduling interval (default=100 microseconds):\n"
+"\t\t"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"\t\tKni thread max scheduling interval (default=200 microseconds):\n"
+"\t\t"
+);
-- 
2.7.4
* Re: [PATCH v5] kni: allow configuring the kni thread granularity
  2022-01-14 15:18 ` [PATCH v5] " Tudor Cornea
@ 2022-01-14 16:24 ` Stephen Hemminger
  2022-01-14 16:43 ` Ferruh Yigit
  2022-01-20 12:41 ` [PATCH v6] " Tudor Cornea
  1 sibling, 1 reply; 18+ messages in thread
From: Stephen Hemminger @ 2022-01-14 16:24 UTC (permalink / raw)
  To: Tudor Cornea; +Cc: ferruh.yigit, padraig.j.connolly, thomas, helin.zhang, dev

On Fri, 14 Jan 2022 17:18:19 +0200
Tudor Cornea <tudor.cornea@gmail.com> wrote:

> +module_param(min_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(min_scheduling_interval,
> +"\t\tKni thread min scheduling interval (default=100 microseconds):\n"
> +"\t\t"
> +);
> +
> +module_param(max_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(max_scheduling_interval,
> +"\t\tKni thread max scheduling interval (default=200 microseconds):\n"
> +"\t\t"
> +);

Please don't add more bad module parameter strings.
The KNI author did something no other kernel modules do with tabs
and double spacing, stop this bogus stuff.

Is there any reason you have to use KNI at all.
KNI is broken on many levels and is not fixable.
What about virtio or tap?
* Re: [PATCH v5] kni: allow configuring the kni thread granularity
  2022-01-14 16:24 ` Stephen Hemminger
@ 2022-01-14 16:43 ` Ferruh Yigit
  2022-01-17 16:24 ` Tudor Cornea
  0 siblings, 1 reply; 18+ messages in thread
From: Ferruh Yigit @ 2022-01-14 16:43 UTC (permalink / raw)
  To: Stephen Hemminger, Tudor Cornea
  Cc: padraig.j.connolly, thomas, helin.zhang, dev

On 1/14/2022 4:24 PM, Stephen Hemminger wrote:
> On Fri, 14 Jan 2022 17:18:19 +0200
> Tudor Cornea <tudor.cornea@gmail.com> wrote:
>
>> +module_param(min_scheduling_interval, long, 0644);
>> +MODULE_PARM_DESC(min_scheduling_interval,
>> +"\t\tKni thread min scheduling interval (default=100 microseconds):\n"
>> +"\t\t"
>> +);
>> +
>> +module_param(max_scheduling_interval, long, 0644);
>> +MODULE_PARM_DESC(max_scheduling_interval,
>> +"\t\tKni thread max scheduling interval (default=200 microseconds):\n"
>> +"\t\t"
>> +);
>
> Please don't add more bad module parameter strings.
> The KNI author did something no other kernel modules do with tabs
> and double spacing, stop this bogus stuff.
>

The patch is good, let's not block it for the module parameter string,
all can be fixed with another patch.

Can you please give a sample what is a common way of it, me or Tudor can
do the patch?

> Is there any reason you have to use KNI at all.
> KNI is broken on many levels and is not fixable.
> What about virtio or tap?
* Re: [PATCH v5] kni: allow configuring the kni thread granularity
  2022-01-14 16:43 ` Ferruh Yigit
@ 2022-01-17 16:24 ` Tudor Cornea
  0 siblings, 0 replies; 18+ messages in thread
From: Tudor Cornea @ 2022-01-17 16:24 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Stephen Hemminger, padraig.j.connolly, Thomas Monjalon, helin.zhang, dev

On Fri, 14 Jan 2022 at 18:44, Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 1/14/2022 4:24 PM, Stephen Hemminger wrote:
> > On Fri, 14 Jan 2022 17:18:19 +0200
> > Tudor Cornea <tudor.cornea@gmail.com> wrote:
> >
> >> +module_param(min_scheduling_interval, long, 0644);
> >> +MODULE_PARM_DESC(min_scheduling_interval,
> >> +"\t\tKni thread min scheduling interval (default=100 microseconds):\n"
> >> +"\t\t"
> >> +);
> >> +
> >> +module_param(max_scheduling_interval, long, 0644);
> >> +MODULE_PARM_DESC(max_scheduling_interval,
> >> +"\t\tKni thread max scheduling interval (default=200 microseconds):\n"
> >> +"\t\t"
> >> +);
> >
> > Please don't add more bad module parameter strings.
> > The KNI author did something no other kernel modules do with tabs
> > and double spacing, stop this bogus stuff.
> >
>
> The patch is good, let's not block it for the module parameter string,
> all can be fixed with another patch.
>
> Can you please give a sample what is a common way of it, me or Tudor can
> do the patch?
>

I agree that the module parameter string is in non-standard format.
I was planning to send a follow-up patch, which would correct the description
for all of the KNI parameters (including the two new parameters that the
current patch would add) in one shot.

> > Is there any reason you have to use KNI at all.
> > KNI is broken on many levels and is not fixable.
> > What about virtio or tap?
>

We've run some tests with tap interfaces and found the performance to not
be good enough for our use.
We're going to experiment with virtio_user in the future.
I'm aware that there is a long term plan to deprecate the KNI.
* [PATCH v6] kni: allow configuring the kni thread granularity
  2022-01-14 15:18 ` [PATCH v5] " Tudor Cornea
  2022-01-14 16:24 ` Stephen Hemminger
@ 2022-01-20 12:41 ` Tudor Cornea
  2022-02-02 19:30 ` Thomas Monjalon
  1 sibling, 1 reply; 18+ messages in thread
From: Tudor Cornea @ 2022-01-20 12:41 UTC (permalink / raw)
  To: ferruh.yigit
  Cc: padraig.j.connolly, thomas, stephen, helin.zhang, dev,
	Tudor Cornea, Padraig Connolly

The Kni kthreads seem to be re-scheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux Kernel for control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for being able to introduce smaller controlled delays,
in the range of 5-10 microseconds. Upon reading the existing code, it would
seem that this was the original intent. Adding sub-millisecond delays,
seems unfeasible with a call to schedule_timeout_interruptible().

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below, we attempted a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible() and usleep_range().

We attempt to measure the CPU usage, and RTT between two Kni interfaces,
which are created on top of vmxnet3 adapters, connected by a vSwitch.

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms usleep_range(5, 10) kni_single CPU usage: 50% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms usleep_range(20, 50) kni_single CPU usage: 24% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms usleep_range(50, 100) kni_single CPU usage: 13% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms usleep_range(100, 200) kni_single CPU usage: 7% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms usleep_range(1000, 1100) kni_single CPU usage: 2% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency and cpu usage to the variant with 
schedule_timeout_interruptible(), while usleep_range(100, 200) seems to
give a decent tradeoff between latency and cpu usage and still allows
users to tweak the limits for improved precision if they have such use
cases.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a soft
lockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
<IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b
[<ffffffff814f7891>] panic+0xcd/0x1e0
[<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
[<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0
[<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0
[<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0
[<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80

This patch also removes this option.

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
v6:
* Removed tabs and newline in the description of the
  min_scheduling_interval and max_scheduling_interval
  parameters. They seem to be non-standard.
  In a quick glance over the Linux tree, I saw some (rare)
  usages of newlines and tabs:
  drivers/scsi/bnx2fc/bnx2fc_fcoe.c (debug_logging)
  Fixing the other parameters might mean that we have to chop
  some text, otherwise the lines could get too long.

v5:
* Rebased the patch on top of the dpdk-next-net branch

v4:
* Removed RTE_KNI_PREEMPT_DEFAULT configuration option

v3:
* Fixed unwrapped commit description warning
* Changed from hrtimers to Linux High Precision Timers in docs
* Added two tabs at the beginning of the new params description.
  Stephen correctly pointed out that the descriptions of the
  parameters for the Kni module are nonstandard w.r.t. existing
  kernel code.
  I intended to preserve compatibility with the existing parameters
  of the Kni module for the moment, while an additional clean-up
  patch could format the descriptions to be closer to the kernel
  standard.

v2:
* Fixed some spelling errors
---
---
 config/rte_config.h                            |  3 ---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 ++++++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 32 ++++++++++++++++++-------
 4 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/config/rte_config.h b/config/rte_config.h
index cab4390..91d96ee 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -95,9 +95,6 @@
 #define RTE_SCHED_PORT_N_GRINDERS 8
 #undef RTE_SCHED_VECTOR
 
-/* KNI defines */
-#define RTE_KNI_PREEMPT_DEFAULT 1
-
 /* rte_graph defines */
 #define RTE_GRAPH_BURST_SIZE 256
 #define RTE_LIBRTE_GRAPH_STATS 1
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 771c7d7..a0763c5 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -61,6 +61,10 @@ can be specified when the module is loaded to control its behavior:
                     userspace callback and supporting async requests (default=off):
                         on    Enable request processing support for bifurcated drivers.
                      (charp)
+    parm:           min_scheduling_interval: Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
@@ -202,6 +206,35 @@ Enabling bifurcated device support releases ``rtnl`` lock before calling
 callback and locks it back after callback. Also enables asynchronous request to
 support callbacks that requires rtnl lock to work (interface down).
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux High Precision Timers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as this Linux subsystem.
+
+For Linux High Precision Timers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index e863348..a2c6d9f 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include <linux/list.h>
 
 #include <rte_kni_common.h>
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index f10dcd0..45ef4c5 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -45,6 +45,10 @@ uint32_t kni_dflt_carrier;
 static char *enable_bifurcated;
 uint32_t bifurcated_support;
 
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -132,11 +136,8 @@ kni_thread_single(void *data)
 			}
 		}
 		up_read(&knet->kni_list_lock);
-#ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
-#endif
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 	}
 
 	return 0;
@@ -153,10 +154,7 @@ kni_thread_multiple(void *param)
 			kni_net_rx(dev);
 			kni_net_poll_resp(dev);
 		}
-#ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
-#endif
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 	}
 
 	return 0;
@@ -617,6 +615,14 @@ kni_init(void)
 	if (bifurcated_support == 1)
 		pr_debug("bifurcated support is enabled.\n");
 
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -692,3 +698,13 @@ MODULE_PARM_DESC(enable_bifurcated,
 "\t\ton Enable request processing support for bifurcated drivers.\n"
 "\t\t"
 );
+
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds)"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds)"
+);
-- 
2.7.4
* Re: [PATCH v6] kni: allow configuring the kni thread granularity

From: Thomas Monjalon @ 2022-02-02 19:30 UTC
To: Tudor Cornea
Cc: ferruh.yigit, dev, padraig.j.connolly, stephen, helin.zhang, Padraig Connolly

20/01/2022 13:41, Tudor Cornea:
> The Kni kthreads seem to be re-scheduled at a granularity of roughly
> 1 millisecond right now, which seems to be insufficient for performing
> tests involving a lot of control plane traffic.
[...]
> v6:
> * Removed tabs and newline in the description of the
>   min_scheduling_interval and max_scheduling_interval
>   parameters. They seem to be non-standard.

The doc had to be updated a bit as well.
Fixed Kni -> KNI and applied, thanks.
Thread overview: 18+ messages
2021-11-02 10:38 [dpdk-dev] [PATCH] kni: allow configuring the kni thread granularity Tudor Cornea
2021-11-02 15:51 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
2021-11-02 15:53 ` Stephen Hemminger
2021-11-03 20:40 ` Tudor Cornea
2021-11-03 22:18 ` Stephen Hemminger
2021-11-08 10:13 ` [dpdk-dev] [PATCH v3] " Tudor Cornea
2021-11-22 17:31 ` Ferruh Yigit
2021-11-23 17:08 ` Ferruh Yigit
2021-11-24 17:10 ` Tudor Cornea
2021-11-24 19:24 ` [PATCH v4] " Tudor Cornea
2022-01-14 13:53 ` Connolly, Padraig J
2022-01-14 14:13 ` Ferruh Yigit
2022-01-14 15:18 ` [PATCH v5] " Tudor Cornea
2022-01-14 16:24 ` Stephen Hemminger
2022-01-14 16:43 ` Ferruh Yigit
2022-01-17 16:24 ` Tudor Cornea
2022-01-20 12:41 ` [PATCH v6] " Tudor Cornea
2022-02-02 19:30 ` Thomas Monjalon