DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Connolly, Padraig J" <padraig.j.connolly@intel.com>
To: Tudor Cornea <tudor.cornea@gmail.com>,
	"Yigit, Ferruh" <ferruh.yigit@intel.com>,
	"Connolly, Padraig J" <padraig.j.connolly@intel.com>
Cc: "thomas@monjalon.net" <thomas@monjalon.net>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	"Zhang, Helin" <helin.zhang@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [PATCH v4] kni: allow configuring the kni thread granularity
Date: Fri, 14 Jan 2022 13:53:51 +0000
Message-ID: <DM6PR11MB37380A4EB21112CBF7E87026D7549@DM6PR11MB3738.namprd11.prod.outlook.com> (raw)
In-Reply-To: <1637781854-74761-1-git-send-email-tudor.cornea@gmail.com>

> -----Original Message-----
> From: Tudor Cornea <tudor.cornea@gmail.com>
> Sent: Wednesday, November 24, 2021 7:24 PM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: thomas@monjalon.net; stephen@networkplumber.org; Zhang, Helin
> <helin.zhang@intel.com>; dev@dpdk.org; Tudor Cornea
> <tudor.cornea@gmail.com>
> Subject: [PATCH v4] kni: allow configuring the kni thread granularity
> 
> The Kni kthreads seem to be re-scheduled at a granularity of roughly
> 1 millisecond right now, which seems to be insufficient for performing tests
> involving a lot of control plane traffic.
> 
> Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it
> seems that the existing code cannot reschedule at the desired granularily,
> due to precision constraints of schedule_timeout_interruptible().
> 
> In our use case, we leverage the Linux Kernel for control plane, and it is not
> uncommon to have 60K - 100K pps for some signaling protocols.
> 
> Since we are not in atomic context, the usleep_range() function seems to be
> more appropriate for being able to introduce smaller controlled delays, in the
> range of 5-10 microseconds. Upon reading the existing code, it would seem
> that this was the original intent. Adding sub-millisecond delays, seems
> unfeasible with a call to schedule_timeout_interruptible().
> 
> KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
> schedule_timeout_interruptible(
>         usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
> 
> Below, we attempted a brief comparison between the existing
> implementation, which uses schedule_timeout_interruptible() and
> usleep_range().
> 
> We attempt to measure the CPU usage, and RTT between two Kni interfaces,
> which are created on top of vmxnet3 adapters, connected by a vSwitch.
> 
> insmod rte_kni.ko kthread_mode=single carrier=on
> 
> schedule_timeout_interruptible(usecs_to_jiffies(5))
> kni_single CPU Usage: 2-4 %
> [root@localhost ~]# ping 1.1.1.2 -I eth1 PING 1.1.1.2 (1.1.1.2) from 1.1.1.1
> eth1: 56(84) bytes of data.
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms
> 
> usleep_range(5, 10)
> kni_single CPU usage: 50%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms
> 
> usleep_range(20, 50)
> kni_single CPU usage: 24%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms
> 
> usleep_range(50, 100)
> kni_single CPU usage: 13%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms
> 
> usleep_range(100, 200)
> kni_single CPU usage: 7%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms
> 
> usleep_range(1000, 1100)
> kni_single CPU usage: 2%
> 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms
> 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms
> 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms
> 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms
> 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms
> 
> Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency
> and cpu usage to the variant with schedule_timeout_interruptible(), while
> usleep_range(100, 200) seems to give a decent tradeoff between latency
> and cpu usage, while allowing users to tweak the limits for improved
> precision if they have such use cases.
> 
> Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a
> softlockup on my kernel.
> 
> Kernel panic - not syncing: softlockup: hung tasks
> CPU: 0 PID: 1226 Comm: kni_single Tainted: G        W  O 3.10 #1
>  <IRQ>  [<ffffffff814f84de>] dump_stack+0x19/0x1b  [<ffffffff814f7891>]
> panic+0xcd/0x1e0  [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160
> [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0  [<ffffffff81064b57>]
> hrtimer_interrupt+0xe7/0x1f0  [<ffffffff8102cd57>]
> smp_apic_timer_interrupt+0x67/0xa0
>  [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80
> 
> This patch also attempts to remove this option.
> 
> References:
> [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt
> 
> Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com>
> 
> ---
> v4:
> * Removed RTE_KNI_PREEMPT_DEFAULT configuration option
> v3:
> * Fixed unwrapped commit description warning
> * Changed from hrtimers to Linux High Precision Timers in docs
> * Added two tabs at the beginning of the new params description.
>   Stephen correctly pointed out that the descriptions of the parameters
>   for the Kni module are nonstandard w.r.t existing kernel code.
>   I was thinking to preserve compatibility with the existing parameters
>   of the Kni module for the moment, while an additional clean-up patch
>   could format the descriptions to be closer to the kernel standard.
> v2:
> * Fixed some spelling errors
> ---
>  config/rte_config.h                            |  3 ---
>  doc/guides/prog_guide/kernel_nic_interface.rst | 33
> +++++++++++++++++++++++++
>  kernel/linux/kni/kni_dev.h                     |  2 +-
>  kernel/linux/kni/kni_misc.c                    | 34 ++++++++++++++++++++------
>  4 files changed, 60 insertions(+), 12 deletions(-)
> 
> diff --git a/config/rte_config.h b/config/rte_config.h index cab4390..91d96ee
> 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -95,9 +95,6 @@
>  #define RTE_SCHED_PORT_N_GRINDERS 8
>  #undef RTE_SCHED_VECTOR
> 
> -/* KNI defines */
> -#define RTE_KNI_PREEMPT_DEFAULT 1
> -
>  /* rte_graph defines */
>  #define RTE_GRAPH_BURST_SIZE 256
>  #define RTE_LIBRTE_GRAPH_STATS 1
> diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst
> b/doc/guides/prog_guide/kernel_nic_interface.rst
> index 1ce03ec..fce3667 100644
> --- a/doc/guides/prog_guide/kernel_nic_interface.rst
> +++ b/doc/guides/prog_guide/kernel_nic_interface.rst
> @@ -56,6 +56,10 @@ can be specified when the module is loaded to control
> its behavior:
>                      off   Interfaces will be created with carrier state set to off.
>                      on    Interfaces will be created with carrier state set to on.
>                       (charp)
> +    parm:           min_scheduling_interval: "Kni thread min scheduling interval
> (default=100 microseconds):
> +                     (long)
> +    parm:           max_scheduling_interval: "Kni thread max scheduling interval
> (default=200 microseconds):
> +                     (long)
> 
>  Loading the ``rte_kni`` kernel module without any optional parameters is
> the typical way a DPDK application gets packets into and out of the kernel
> @@ -174,6 +178,35 @@ To set the default carrier state to *off*:
>  If the ``carrier`` parameter is not specified, the default carrier state  of KNI
> interfaces will be set to *off*.
> 
> +KNI Kthread Scheduling
> +~~~~~~~~~~~~~~~~~~~~~~
> +
> +The ``min_scheduling_interval`` and ``max_scheduling_interval``
> +parameters control the rescheduling interval of the KNI kthreads.
> +
> +This might be useful if we have use cases in which we require improved
> +latency or performance for control plane traffic.
> +
> +The implementation is backed by Linux High Precision Timers, and uses
> ``usleep_range``.
> +Hence, it will have the same granularity constraints as this Linux subsystem.
> +
> +For Linux High Precision Timers, you can check the following resource:
> +`Kernel Timers
> +<http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_
> +
> +To set the ``min_scheduling_interval`` to a value of 100 microseconds:
> +
> +.. code-block:: console
> +
> +    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko
> + min_scheduling_interval=100
> +
> +To set the ``max_scheduling_interval`` to a value of 200 microseconds:
> +
> +.. code-block:: console
> +
> +    # insmod <build_dir>/kernel/linux/kni/rte_kni.ko
> + max_scheduling_interval=200
> +
> +If the ``min_scheduling_interval`` and ``max_scheduling_interval``
> +parameters are not specified, the default interval limits will be set to *100*
> and *200* respectively.
> +
>  KNI Creation and Deletion
>  -------------------------
> 
> diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h index
> c15da311..bb4d891 100644
> --- a/kernel/linux/kni/kni_dev.h
> +++ b/kernel/linux/kni/kni_dev.h
> @@ -27,7 +27,7 @@
>  #include <linux/list.h>
> 
>  #include <rte_kni_common.h>
> -#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */
> +#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */
> 
>  #define MBUF_BURST_SZ 32
> 
> diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c index
> f4944e1..23132bb 100644
> --- a/kernel/linux/kni/kni_misc.c
> +++ b/kernel/linux/kni/kni_misc.c
> @@ -41,6 +41,10 @@ static uint32_t multiple_kthread_on;  static char
> *carrier;  uint32_t kni_dflt_carrier;
> 
> +/* Kni thread scheduling interval */
> +static long min_scheduling_interval = 100; /* us */ static long
> +max_scheduling_interval = 200; /* us */
> +
>  #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */
> 
>  static int kni_net_id;
> @@ -128,11 +132,8 @@ kni_thread_single(void *data)
>  			}
>  		}
>  		up_read(&knet->kni_list_lock);
> -#ifdef RTE_KNI_PREEMPT_DEFAULT
>  		/* reschedule out for a while */
> -		schedule_timeout_interruptible(
> -
> 	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
> -#endif
> +		usleep_range(min_scheduling_interval,
> max_scheduling_interval);
>  	}
> 
>  	return 0;
> @@ -149,10 +150,7 @@ kni_thread_multiple(void *param)
>  			kni_net_rx(dev);
>  			kni_net_poll_resp(dev);
>  		}
> -#ifdef RTE_KNI_PREEMPT_DEFAULT
> -		schedule_timeout_interruptible(
> -
> 	usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL));
> -#endif
> +		usleep_range(min_scheduling_interval,
> max_scheduling_interval);
>  	}
> 
>  	return 0;
> @@ -590,6 +588,14 @@ kni_init(void)
>  	else
>  		pr_debug("Default carrier state set to on.\n");
> 
> +	if (min_scheduling_interval < 0 || max_scheduling_interval < 0 ||
> +		min_scheduling_interval >
> KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
> +		max_scheduling_interval >
> KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL ||
> +		min_scheduling_interval >= max_scheduling_interval) {
> +		pr_err("Invalid parameters for scheduling interval\n");
> +		return -EINVAL;
> +	}
> +
>  #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS
>  	rc = register_pernet_subsys(&kni_net_ops);
>  #else
> @@ -656,3 +662,15 @@ MODULE_PARM_DESC(carrier,
>  "\t\ton    Interfaces will be created with carrier state set to on.\n"
>  "\t\t"
>  );
> +
> +module_param(min_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(min_scheduling_interval,
> +"\t\tKni thread min scheduling interval (default=100 microseconds):\n"
> +"\t\t"
> +);
> +
> +module_param(max_scheduling_interval, long, 0644);
> +MODULE_PARM_DESC(max_scheduling_interval,
> +"\t\tKni thread max scheduling interval (default=200 microseconds):\n"
> +"\t\t"
> +);
This patch looks good to me.

Tested with <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=20 max_scheduling_interval=50

Results:
KNI Perf before patch was added:
64 bytes from 5.5.5.6: icmp_seq=1 ttl=64 time=4.79 ms
64 bytes from 5.5.5.6: icmp_seq=2 ttl=64 time=2.97 ms
64 bytes from 5.5.5.6: icmp_seq=3 ttl=64 time=1.90 ms
64 bytes from 5.5.5.6: icmp_seq=4 ttl=64 time=7.94 ms
64 bytes from 5.5.5.6: icmp_seq=5 ttl=64 time=6.85 ms
KNI Perf after patch was added (min_scheduling_interval=20 max_scheduling_interval=50):
64 bytes from 5.5.5.6: icmp_seq=1 ttl=64 time=0.106 ms
64 bytes from 5.5.5.6: icmp_seq=2 ttl=64 time=0.055 ms
64 bytes from 5.5.5.6: icmp_seq=3 ttl=64 time=0.059 ms
64 bytes from 5.5.5.6: icmp_seq=4 ttl=64 time=0.056 ms
64 bytes from 5.5.5.6: icmp_seq=5 ttl=64 time=0.061 ms

Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com>
> --
> 2.7.4

--------------------------------------------------------------
Intel Research and Development Ireland Limited
Registered in Ireland
Registered Office: Collinstown Industrial Park, Leixlip, County Kildare
Registered Number: 308263


This e-mail and any attachments may contain confidential material for the sole
use of the intended recipient(s). Any review or distribution by others is
strictly prohibited. If you are not the intended recipient, please contact the
sender and delete all copies.


  reply	other threads:[~2022-01-14 13:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-02 10:38 [dpdk-dev] [PATCH] " Tudor Cornea
2021-11-02 15:51 ` [dpdk-dev] [PATCH v2] " Tudor Cornea
2021-11-02 15:53   ` Stephen Hemminger
2021-11-03 20:40     ` Tudor Cornea
2021-11-03 22:18       ` Stephen Hemminger
2021-11-08 10:13   ` [dpdk-dev] [PATCH v3] " Tudor Cornea
2021-11-22 17:31     ` Ferruh Yigit
2021-11-23 17:08       ` Ferruh Yigit
2021-11-24 17:10         ` Tudor Cornea
2021-11-24 19:24     ` [PATCH v4] " Tudor Cornea
2022-01-14 13:53       ` Connolly, Padraig J [this message]
2022-01-14 14:13       ` Ferruh Yigit
2022-01-14 15:18       ` [PATCH v5] " Tudor Cornea
2022-01-14 16:24         ` Stephen Hemminger
2022-01-14 16:43           ` Ferruh Yigit
2022-01-17 16:24             ` Tudor Cornea
2022-01-20 12:41         ` [PATCH v6] " Tudor Cornea
2022-02-02 19:30           ` Thomas Monjalon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR11MB37380A4EB21112CBF7E87026D7549@DM6PR11MB3738.namprd11.prod.outlook.com \
    --to=padraig.j.connolly@intel.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@intel.com \
    --cc=helin.zhang@intel.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    --cc=tudor.cornea@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git