From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tudor Cornea
To: ferruh.yigit@intel.com
Cc: dev@dpdk.org, Tudor Cornea
Date: Tue, 2 Nov 2021 17:51:13 +0200
Message-Id: <1635868273-69843-1-git-send-email-tudor.cornea@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1635849481-47147-1-git-send-email-tudor.cornea@gmail.com>
References: <1635849481-47147-1-git-send-email-tudor.cornea@gmail.com>
Subject: [dpdk-dev] [PATCH v2] kni: allow configuring the kni thread granularity
List-Id: DPDK patches and discussions

The KNI kthreads are currently rescheduled at a granularity of roughly
1 millisecond, which is insufficient for tests involving a lot of
control plane traffic.

Even though KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds,
the existing code does not seem able to reschedule at the desired
granularity, due to the precision constraints of
schedule_timeout_interruptible(), which takes its timeout in jiffies.

In our use case, we leverage the Linux kernel for the control plane, and
it is not uncommon to have 60K - 100K pps for some signaling protocols.

Since we are in non-atomic context, usleep_range() seems more
appropriate for introducing smaller controlled delays, in the range of
5-10 microseconds [1]. From reading the existing code, this appears to
have been the original intent. Adding sub-millisecond delays seems
unfeasible with a call to schedule_timeout_interruptible().

    #define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
    schedule_timeout_interruptible(
            usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));

Below is a brief comparison between the existing implementation, which
uses schedule_timeout_interruptible(), and usleep_range() with several
ranges.

insmod rte_kni.ko kthread_mode=single carrier=on

schedule_timeout_interruptible(usecs_to_jiffies(5))
kni_single CPU Usage: 2-4 %
[root@localhost ~]# ping 1.1.1.2 -I eth1
PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms

usleep_range(5, 10)
kni_single CPU usage: 50%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms

usleep_range(20, 50)
kni_single CPU usage: 24%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms

usleep_range(50, 100)
kni_single CPU usage: 13%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms

usleep_range(100, 200)
kni_single CPU usage: 7%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms

usleep_range(1000, 1100)
kni_single CPU usage: 2%
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms

In these tests, usleep_range(1000, 1100) is roughly equivalent in
latency and CPU usage to the variant with
schedule_timeout_interruptible(). usleep_range(100, 200) seems to give a
decent tradeoff between latency and CPU usage, and users can still tweak
the limits for improved precision if their use case requires it.

Interestingly, disabling RTE_KNI_PREEMPT_DEFAULT seems to lead to a
softlockup on my kernel.

Kernel panic - not syncing: softlockup: hung tasks
CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1
 [] dump_stack+0x19/0x1b
 [] panic+0xcd/0x1e0
 [] watchdog_timer_fn+0x160/0x160
 [] __run_hrtimer.isra.4+0x42/0xd0
 [] hrtimer_interrupt+0xe7/0x1f0
 [] smp_apic_timer_interrupt+0x67/0xa0
 [] apic_timer_interrupt+0x6d/0x80

References:
[1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt

Signed-off-by: Tudor Cornea
---
v2:
* Fixed some spelling errors
---
 doc/guides/prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++++++
 kernel/linux/kni/kni_dev.h                     |  2 +-
 kernel/linux/kni/kni_misc.c                    | 36 +++++++++++++++++++++++---
 3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst
index 1ce03ec..2dd3481 100644
--- a/doc/guides/prog_guide/kernel_nic_interface.rst
+++ b/doc/guides/prog_guide/kernel_nic_interface.rst
@@ -56,6 +56,10 @@ can be specified when the module is loaded to control its behavior:
                     off   Interfaces will be created with carrier state set to off.
                     on    Interfaces will be created with carrier state set to on.
                      (charp)
+    parm:           min_scheduling_interval: "Kni thread min scheduling interval (default=100 microseconds):
+                     (long)
+    parm:           max_scheduling_interval: "Kni thread max scheduling interval (default=200 microseconds):
+                     (long)
 
 Loading the ``rte_kni`` kernel module without any optional parameters is
 the typical way a DPDK application gets packets into and out of the kernel
@@ -174,6 +178,35 @@ To set the default carrier state to *off*:
 If the ``carrier`` parameter is not specified, the default carrier state
 of KNI interfaces will be set to *off*.
 
+KNI Kthread Scheduling
+~~~~~~~~~~~~~~~~~~~~~~
+
+The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters
+control the rescheduling interval of the KNI kthreads.
+
+This might be useful if we have use cases in which we require improved
+latency or performance for control plane traffic.
+
+The implementation is backed by Linux hrtimers, and uses ``usleep_range``.
+Hence, it will have the same granularity constraints as Linux hrtimers.
+
+To see more about the Linux hrtimers, you can check the following resource: `Kernel Timers `_
+
+To set the ``min_scheduling_interval`` to a value of 100 microseconds:
+
+.. code-block:: console
+
+    # insmod /kernel/linux/kni/rte_kni.ko min_scheduling_interval=100
+
+To set the ``max_scheduling_interval`` to a value of 200 microseconds:
+
+.. code-block:: console
+
+    # insmod /kernel/linux/kni/rte_kni.ko max_scheduling_interval=200
+
+If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are
+not specified, the default interval limits will be set to *100* and *200* respectively.
+
 KNI Creation and Deletion
 -------------------------
 
diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index c15da311..bb4d891 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -27,7 +27,7 @@
 #include
 #include
 
-#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
+#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
 
 #define MBUF_BURST_SZ 32
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 2b464c4..e23cfd9 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -41,6 +41,12 @@ static uint32_t multiple_kthread_on;
 static char *carrier;
 uint32_t kni_dflt_carrier;
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+/* Kni thread scheduling interval */
+static long min_scheduling_interval = 100; /* us */
+static long max_scheduling_interval = 200; /* us */
+#endif
+
 #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
 
 static int kni_net_id;
@@ -130,8 +136,7 @@ kni_thread_single(void *data)
 		up_read(&knet->kni_list_lock);
 #ifdef RTE_KNI_PREEMPT_DEFAULT
 		/* reschedule out for a while */
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -150,8 +155,7 @@ kni_thread_multiple(void *param)
 			kni_net_poll_resp(dev);
 		}
 #ifdef RTE_KNI_PREEMPT_DEFAULT
-		schedule_timeout_interruptible(
-			usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
+		usleep_range(min_scheduling_interval, max_scheduling_interval);
 #endif
 	}
 
@@ -593,6 +597,16 @@ kni_init(void)
 	else
 		pr_debug("Default carrier state set to on.\n");
 
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
+		min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
+		min_scheduling_interval >= max_scheduling_interval) {
+		pr_err("Invalid parameters for scheduling interval\n");
+		return -EINVAL;
+	}
+#endif
+
 #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
 	rc = register_pernet_subsys(&kni_net_ops);
 #else
@@ -659,3 +673,17 @@ MODULE_PARM_DESC(carrier,
 "\t\ton Interfaces will be created with carrier state set to on.\n"
 "\t\t"
 );
+
+#ifdef RTE_KNI_PREEMPT_DEFAULT
+module_param(min_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(min_scheduling_interval,
+"Kni thread min scheduling interval (default=100 microseconds):\n"
+"\t\t"
+);
+
+module_param(max_scheduling_interval, long, 0644);
+MODULE_PARM_DESC(max_scheduling_interval,
+"Kni thread max scheduling interval (default=200 microseconds):\n"
+"\t\t"
+);
+#endif
-- 
2.7.4