From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tudor Cornea
To: ferruh.yigit@intel.com
Cc: padraig.j.connolly@intel.com, thomas@monjalon.net,
 stephen@networkplumber.org, helin.zhang@intel.com, dev@dpdk.org,
 Tudor Cornea, Padraig Connolly
Subject: [PATCH v6] kni: allow configuring the kni thread granularity
Date: Thu, 20 Jan 2022 14:41:34 +0200
Message-Id: <20220120124134.4123542-1-tudor.cornea@gmail.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <1642173499-59396-1-git-send-email-tudor.cornea@gmail.com>
References: <1642173499-59396-1-git-send-email-tudor.cornea@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: DPDK patches and discussions

The KNI kthreads seem to be rescheduled at a granularity of roughly
1 millisecond right now, which seems to be insufficient for performing
tests involving a lot of control plane traffic.

Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
seems that the existing code cannot reschedule at the desired granularity,
due to precision constraints of schedule_timeout_interruptible().

In our use case, we leverage the Linux kernel for the control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are not in atomic context, the usleep_range() function seems to be
more appropriate for introducing smaller controlled delays, in the range
of 5-10 microseconds.
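To illustrate the precision constraint mentioned above: a jiffies-based
timeout cannot express anything shorter than one tick. The following is a
minimal userspace sketch, not kernel code. It assumes the conversion rounds
up to whole ticks, as the kernel's usecs_to_jiffies() does for tick rates
that divide one second evenly. The function names and the hz parameter
(standing in for CONFIG_HZ) are hypothetical and only serve the example.

```c
#include <stdint.h>

/*
 * Userspace model of a microsecond-to-jiffies conversion that rounds up
 * to whole ticks. 'hz' stands in for the kernel tick rate (CONFIG_HZ);
 * it is an assumption of this sketch and must divide 1000000 evenly.
 */
static uint64_t usecs_to_jiffies_model(uint64_t usecs, uint64_t hz)
{
	uint64_t usecs_per_jiffy = 1000000ULL / hz;

	return (usecs + usecs_per_jiffy - 1) / usecs_per_jiffy;
}

/*
 * Duration, in microseconds, that the converted timeout represents
 * once the request has been rounded up to whole ticks.
 */
static uint64_t effective_timeout_usecs(uint64_t usecs, uint64_t hz)
{
	return usecs_to_jiffies_model(usecs, hz) * (1000000ULL / hz);
}
```

With hz = 1000, a 5 us request becomes one jiffy, i.e. a 1000 us timeout,
which would be consistent with the roughly 1 ms granularity described above.
The tick rate of the test system is not stated here, so 1000 is only an
assumption.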
Upon reading the existing code, it would seem that delays of this order
were the original intent. Adding sub-millisecond delays seems unfeasible
with a call to schedule_timeout_interruptible().

KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
schedule_timeout_interruptible(
        usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below, we attempt a brief comparison between the existing implementation,
which uses schedule_timeout_interruptible(), and usleep_range().

We measure the CPU usage and the RTT between two KNI interfaces, which are
created on top of vmxnet3 adapters connected by a vSwitch.

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU usage: 2-4%
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

Upon testing, usleep_range(1000, 1100) seems roughly equivalent in
latency and CPU usage to the variant with schedule_timeout_interruptible().
usleep_range(100, 200) seems to give a decent tradeoff between latency and
CPU usage, so it is used as the default, while users can still tweak the
limits for improved precision if they have such use cases.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a
soft lockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
 [] dump_stack+0x19/0x1b
 [] panic+0xcd/0x1e0
 [] watchdog_timer_fn+0x160/0x160
 [] __run_hrtimer.isra.4+0x42/0xd0
 [] hrtimer_interrupt+0xe7/0x1f0
 [] smp_apic_timer_interrupt+0x67/0xa0
 [] apic_timer_interrupt+0x6d/0x80

This patch also attempts to remove this option.

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea
Acked-by: Padraig Connolly
Reviewed-by: Ferruh Yigit
---
v6:
* Removed tabs and newline in the description of the min_scheduling_interval
  and max_scheduling_interval parameters. They seem to be non-standard.
  In a quick glance over the Linux tree, I saw some (rare) usages of newlines
  and tabs: drivers/scsi/bnx2fc/bnx2fc_fcoe.c (debug_logging)
  Fixing the other parameters might mean that we have to chop some text,
  otherwise the line could probably get too big.

v5:
* Rebased the patch on top of the dpdk-next-net branch

v4:
* Removed RTE_KNI_PREEMPT_DEFAULT configuration option

v3:
* Fixed unwrapped commit description warning
* Changed from hrtimers to Linux High Precision Timers in docs
* Added two tabs at the beginning of the new params description.
  Stephen correctly pointed out that the descriptions of the parameters
  for the Kni module are nonstandard w.r.t existing kernel code.
  I was thinking to preserve compatibility with the existing parameters
  of the Kni module for the moment, while an additional clean-up patch
  could format the descriptions to be closer to the kernel standard.

v2:
* Fixed some spelling errors
---
---
 config/rte_config.h                            |  3 ---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 ++++++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 32 ++++++++++++++++++-------
 4 files changed, 58 insertions(+), 12 deletions(-)

diff --git a/config/rte_config.h b/config/rte_config.h
index cab4390..91d96ee 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -95,9 +95,6 @@
 #define RTE_SCHED_PORT_N_GRINDERS 8
 #undef RTE_SCHED_VECTOR
 
-/* KNI defines */
-#define RTE_KNI_PREEMPT_DEFAULT 1
-
 /* rte_graph defines */
 #define RTE_GRAPH_BURST_SIZE 256
 #define RTE_LIBRTE_GRAPH_STATS 1
diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 771c7d7..a0763c5 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -61,6 +61,10 @@ can be specified when the module is loaded to control its behavior:
                     userspace callback and supporting async requests (default=off):
                             on    Enable request processing support for bifurcated drivers.
                              (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
@@ -202,6 +206,35 @@ Enabling bifurcated device support releases ``rtnl`` lock before calling
 callback and locks it back after callback. Also enables asynchronous request to
 support callbacks that requires rtnl lock to work (interface down).
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux High Precision Timers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as this Linux subsystem.
+
+For Linux High Precision Timers, you can check the following resource: `Kernel Timers `_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+    # insmod /kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+    # insmod /kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index e863348..a2c6d9f 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include
 #include
 
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index f10dcd0..45ef4c5 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -45,6 +45,10 @@ uint32_t kni_dflt_carrier;
 static char *enable_bifurcated;
 uint32_t bifurcated_support;
 
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -132,11 +136,8 @@ kni_thread_single(void *data)
 			}
 		}
 		up_read(&knet->kni_list_lock);
-#ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
-#endif
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 	}
 
 	return 0;
@@ -153,10 +154,7 @@ kni_thread_multiple(void *param)
 			kni_net_rx(dev);
 			kni_net_poll_resp(dev);
 		}
-#ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
-#endif
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 	}
 
 	return 0;
@@ -617,6 +615,14 @@ kni_init(void)
 	if (bifurcated_support == 1)
 		pr_debug("bifurcated support is enabled.\n");
 
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -692,3 +698,13 @@ MODULE_PARM_DESC(enable_bifurcated,
 "\t\ton Enable request processing support for bifurcated drivers.\n"
 "\t\t"
 );
+
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds)"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds)"
+);
-- 
2.7.4
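
For readers who want to see which parameter combinations the new check in
kni_init() accepts, the condition can be mirrored in userspace. This is a
minimal sketch and not part of the patch: the helper name
kni_sched_interval_valid() is hypothetical, and only the constant and the
comparison logic are taken from the diff above.

```c
#include <stdbool.h>

/* Upper bound taken from kni_dev.h in the patch above. */
#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */

/*
 * Hypothetical helper mirroring the condition added to kni_init().
 * Returns true when the (min, max) pair would be accepted, false when
 * the module load would fail with -EINVAL.
 */
static bool kni_sched_interval_valid(long min_us, long max_us)
{
	/* Negative intervals are rejected. */
	if (min_us < 0 || max_us < 0)
		return false;

	/* Neither bound may exceed the 1 second cap. */
	if (min_us > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
	    max_us > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL)
		return false;

	/* The range must be non-empty: min strictly below max. */
	if (min_us >= max_us)
		return false;

	return true;
}
```

One consequence of the strict min < max comparison is that overriding a
single parameter is checked against the default of the other. For example,
passing only min_scheduling_interval=300 leaves max_scheduling_interval at
200, and that pair is rejected.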