From: David Marchand
Date: Mon, 20 Jun 2022 10:35:37 +0200
Subject: Re: [PATCH v6] eal: allow worker lcore stacks to be allocated from hugepage memory
To: Don Wallwork, Thomas Monjalon, "Burakov, Anatoly", Dmitry Kozlyuk
Cc: dev, Stephen Hemminger, Chengwen Feng, Morten Brørup, Bruce Richardson, Honnappa Nagarahalli, nd, "Wang, Haiyue", Kathleen.Capella@arm.com
References: <20220502141058.12707-1-donw@xsightlabs.com> <20220524195138.4963-1-donw@xsightlabs.com>
In-Reply-To: <20220524195138.4963-1-donw@xsightlabs.com>

On Tue, May 24, 2022 at 9:52 PM Don Wallwork wrote:
>
> Add support for using hugepages for worker lcore stack
> memory.  The
> intent is to improve performance by reducing stack memory related TLB
> misses and also by using memory local to the NUMA node of each lcore.
> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
> the feature to be enabled at runtime.  If the size is not specified,
> the system pthread stack size will be used.

- About the name of the option... I don't have a better name.
I just want to highlight that what this patch does is use the DPDK
memory allocator for the stack memory.
It happens that the DPDK memory allocator is primarily used with
hugepages, but this is not systematic, for example with the "no-huge"
mode of the DPDK memory allocator.

IOW, in this patch's current form, you can still run as:
# dpdk-testpmd -c 3 --no-huge --huge-worker-stack=16 -m 40 -- etc...

Opinions?

- This patch adds one more EAL flag, so we need some unit test (even if basic).

Comments below:

>
> Signed-off-by: Don Wallwork
> Acked-by: Morten Brørup
> Acked-by: Chengwen Feng
> ---
>  doc/guides/linux_gsg/eal_args.include.rst     |  6 ++
>  .../prog_guide/env_abstraction_layer.rst      | 21 +++++++
>  lib/eal/common/eal_common_options.c           | 35 +++++++++++
>  lib/eal/common/eal_internal_cfg.h             |  4 ++
>  lib/eal/common/eal_options.h                  |  2 +
>  lib/eal/freebsd/eal.c                         |  6 ++
>  lib/eal/linux/eal.c                           | 61 ++++++++++++++++++-
>  lib/eal/windows/eal.c                         |  6 ++
>  8 files changed, 139 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
> index 3549a0cf56..9cfbf7de84 100644
> --- a/doc/guides/linux_gsg/eal_args.include.rst
> +++ b/doc/guides/linux_gsg/eal_args.include.rst
> @@ -116,6 +116,12 @@ Memory-related options
>
>      Force IOVA mode to a specific value.
>
> +*   ``--huge-worker-stack[=size]``
> +
> +    Allocate worker stack memory from hugepage memory.  Stack size defaults
> +    to system pthread stack size unless the optional size (in kbytes) is
> +    specified.
> +
>  Debugging options
>  ~~~~~~~~~~~~~~~~~
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 5f0748fba1..e74516f0cf 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -329,6 +329,27 @@ Another option is to use bigger page sizes. Since fewer pages are required to
>  cover the same memory area, fewer file descriptors will be stored internally
>  by EAL.
>
> +.. _huge-worker-stack:

There is nothing pointing to this reference.
It can be removed.

> +
> +Hugepage Worker Stacks
> +^^^^^^^^^^^^^^^^^^^^^^
> +
> +When the ``--huge-worker-stack[=size]`` EAL option is specified, worker
> +thread stacks are allocated from hugepage memory local to the NUMA node
> +of the thread.  Worker stack size defaults to system pthread stack size
> +if the optional size parameter is not specified.
> +
> +.. warning::
> +    Stacks allocated from hugepage memory are not protected by guard
> +    pages.  Worker stacks must be sufficiently sized to prevent stack
> +    overflow when this option is used.
> +
> +    As with normal thread stacks, hugepage worker thread stack size is
> +    fixed and is not dynamically resized.  Therefore, an application that
> +    is free of stack page faults under a given load should be safe with
> +    hugepage worker thread stacks given the same thread stack size and
> +    loading conditions.
> +
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
> index f247a42455..02e59051e8 100644
> --- a/lib/eal/common/eal_common_options.c
> +++ b/lib/eal/common/eal_common_options.c
> @@ -103,6 +103,7 @@ eal_long_options[] = {
>         {OPT_TELEMETRY, 0, NULL, OPT_TELEMETRY_NUM },
>         {OPT_NO_TELEMETRY, 0, NULL, OPT_NO_TELEMETRY_NUM },
>         {OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
> +       {OPT_HUGE_WORKER_STACK, 2, NULL, OPT_HUGE_WORKER_STACK_NUM },
>
>         {0, 0, NULL, 0 }
>  };
> @@ -1618,6 +1619,26 @@ eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
>         return -1;
>  }
>
> +static int
> +eal_parse_huge_worker_stack(const char *arg, size_t *huge_worker_stack_size)
> +{
> +       size_t worker_stack_size;

Nit: strtoul returns an unsigned long int.
POSIX defines size_t as an unsigned integer.
It does not specify, though, that size_t can hold an unsigned long int.

> +       char *end;
> +
> +       if (arg == NULL || arg[0] == '\0') {
> +               *huge_worker_stack_size = WORKER_STACK_SIZE_FROM_OS;

We should resolve the default stack size here, once and for all (in
theory via an OS-specific helper, but this is overkill if we move this
to the Linux EAL).
That simplifies the EAL worker thread create helper.
WORKER_STACK_SIZE_FROM_OS is then unneeded.

And this parser needs some debug level EAL log, so that users know the
feature is engaged.
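
For illustration only, the two remarks above can be sketched outside of DPDK. This is a standalone sketch, not the EAL code: resolve_stack_size() is a hypothetical name, and the rejection of trailing characters and of a leading sign is an extra check that is not part of the patch. The value is parsed into an unsigned long first and only converted to size_t once it is known to fit; an empty or missing argument resolves to the default pthread stack size immediately.

```c
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: turn an optional size argument (in kB) into a
 * stack size in bytes. Returns 0 on success, -1 on invalid input.
 */
static int
resolve_stack_size(const char *arg, size_t *bytes)
{
	unsigned long kb;
	char *end;

	if (arg == NULL || arg[0] == '\0') {
		/* No size given: resolve the pthread default right away. */
		pthread_attr_t attr;
		int ret;

		if (pthread_attr_init(&attr) != 0)
			return -1;
		ret = pthread_attr_getstacksize(&attr, bytes);
		pthread_attr_destroy(&attr);
		return ret == 0 ? 0 : -1;
	}

	/* strtoul() accepts leading blanks and signs; require a digit. */
	if (!isdigit((unsigned char)arg[0]))
		return -1;

	errno = 0;
	kb = strtoul(arg, &end, 10);
	/* Reject range errors, trailing characters, zero and overflow. */
	if (errno != 0 || *end != '\0' || kb == 0 || kb > SIZE_MAX / 1024)
		return -1;

	*bytes = (size_t)kb * 1024;
	return 0;
}
```

With such a helper, the stored size is never a sentinel: it is either 0 (feature disabled) or a real size in bytes.
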
> +               return 0;
> +       }
> +       errno = 0;
> +       worker_stack_size = strtoul(arg, &end, 10);
> +       if (errno || end == NULL || worker_stack_size == 0 ||
> +               worker_stack_size >= (size_t)-1 / 1024)
> +               return -1;
> +
> +       *huge_worker_stack_size = worker_stack_size * 1024;

With previous comments applied, this could look like:

+static int
+eal_parse_huge_worker_stack(const char *arg)
+{
+       struct internal_config *cfg = eal_get_internal_configuration();
+
+       if (arg == NULL || arg[0] == '\0') {
+               pthread_attr_t attr;
+               int ret;
+
+               if (pthread_attr_init(&attr) != 0) {
+                       RTE_LOG(ERR, EAL, "Could not retrieve default stack size\n");
+                       return -1;
+               }
+               ret = pthread_attr_getstacksize(&attr, &cfg->huge_worker_stack_size);
+               pthread_attr_destroy(&attr);
+               if (ret != 0) {
+                       RTE_LOG(ERR, EAL, "Could not retrieve default stack size\n");
+                       return -1;
+               }
+       } else {
+               unsigned long int stack_size;
+               char *end;
+
+               errno = 0;
+               stack_size = strtoul(arg, &end, 10);
+               if (errno || end == NULL || stack_size == 0 ||
+                               stack_size >= (size_t)-1 / 1024)
+                       return -1;
+
+               cfg->huge_worker_stack_size = stack_size * 1024;
+       }
+
+       RTE_LOG(DEBUG, EAL, "EAL worker threads will use %zu kB of DPDK memory as stack.\n",
+               cfg->huge_worker_stack_size / 1024);
+       return 0;
+}

> +       return 0;
> +}
> +
>  int
>  eal_parse_common_option(int opt, const char *optarg,
>                         struct internal_config *conf)
> @@ -1921,6 +1942,15 @@ eal_parse_common_option(int opt, const char *optarg,
>                 }
>                 break;
>
> +       case OPT_HUGE_WORKER_STACK_NUM:
> +               if (eal_parse_huge_worker_stack(optarg,
> +                                               &conf->huge_worker_stack_size) < 0) {
> +                       RTE_LOG(ERR, EAL, "invalid parameter for --"
> +                               OPT_HUGE_WORKER_STACK"\n");
> +                       return -1;
> +               }
> +               break;
> +

This parser and calling it should be moved out of the common options,
and moved to the Linux implementation (around
https://git.dpdk.org/dpdk/tree/lib/eal/linux/eal.c#n715) as it is a
Linux-only option at the moment.
Doing so, there is nothing to add to FreeBSD and Windows EAL.

>         /* don't know what to do, leave this to caller */
>         default:
>                 return 1;
> @@ -2235,5 +2265,10 @@ eal_common_usage(void)
>                " --"OPT_NO_PCI" Disable PCI\n"
>                " --"OPT_NO_HPET" Disable HPET\n"
>                " --"OPT_NO_SHCONF" No shared config (mmap'd files)\n"
> +              " --"OPT_HUGE_WORKER_STACK"[=size]\n"
> +              " Allocate worker thread stacks from\n"
> +              " hugepage memory. Size is in units of\n"
> +              " kbytes and defaults to system thread\n"
> +              " stack size if not specified.\n"
>                "\n", RTE_MAX_LCORE);
>  }
> diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
> index b71faadd18..5e154967e4 100644
> --- a/lib/eal/common/eal_internal_cfg.h
> +++ b/lib/eal/common/eal_internal_cfg.h
> @@ -48,6 +48,9 @@ struct hugepage_file_discipline {
>         bool unlink_existing;
>  };
>
> +/** Worker hugepage stack size should default to OS value. */
> +#define WORKER_STACK_SIZE_FROM_OS ((size_t)~0)

No need for this limit, as per previous comment.
> +
>  /**
>   * internal configuration
>   */
> @@ -102,6 +105,7 @@ struct internal_config {
>         unsigned int no_telemetry; /**< true to disable Telemetry */
>         struct simd_bitwidth max_simd_bitwidth;
>         /**< max simd bitwidth path to use */
> +       size_t huge_worker_stack_size; /**< worker thread stack size */
>  };
>
>  void eal_reset_internal_config(struct internal_config *internal_cfg);
> diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
> index 8e4f7202a2..3cc9cb6412 100644
> --- a/lib/eal/common/eal_options.h
> +++ b/lib/eal/common/eal_options.h
> @@ -87,6 +87,8 @@ enum {
>         OPT_NO_TELEMETRY_NUM,
>  #define OPT_FORCE_MAX_SIMD_BITWIDTH "force-max-simd-bitwidth"
>         OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
> +#define OPT_HUGE_WORKER_STACK "huge-worker-stack"
> +       OPT_HUGE_WORKER_STACK_NUM,
>
>         OPT_LONG_MAX_NUM
>  };
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index a6b20960f2..7368956649 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -795,6 +795,12 @@ rte_eal_init(int argc, char **argv)
>                 config->main_lcore, (uintptr_t)pthread_self(), cpuset,
>                 ret == 0 ? "" : "...");
>
> +       if (internal_conf->huge_worker_stack_size != 0) {
> +               rte_eal_init_alert("Hugepage worker stacks not supported");
> +               rte_errno = ENOTSUP;
> +               return -1;
> +       }
> +

As previously mentioned, this is unneeded if the option is parsed only in
lib/eal/linux/eal.c.
>         RTE_LCORE_FOREACH_WORKER(i) {
>
>                 /*
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 1ef263434a..d28a0fdb78 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -857,6 +857,64 @@ is_iommu_enabled(void)
>         return n > 2;
>  }
>
> +static int
> +eal_worker_thread_create(struct internal_config *internal_conf,
> +                        int lcore_id)
> +{
> +       pthread_attr_t attr;
> +       size_t stack_size;
> +       void *stack_ptr;
> +       int ret;
> +
> +       if (internal_conf->huge_worker_stack_size == 0)
> +               return pthread_create(&lcore_config[lcore_id].thread_id,
> +                                     NULL,
> +                                     eal_thread_loop,
> +                                     (void *)(uintptr_t)lcore_id);

If you invert the branch here (checking that stack_size != 0), all the
stack setup can then be put under a branch and we have a more readable
unified call to pthread_create.
If more pthread attributes were to be added in the future, the code
would be ready too.

> +
> +       /* Allocate NUMA aware stack memory and set pthread attributes */
> +       if (pthread_attr_init(&attr) != 0) {
> +               rte_eal_init_alert("Cannot init pthread attributes");
> +               rte_errno = EFAULT;
> +               return -1;
> +       }
> +       if (internal_conf->huge_worker_stack_size == WORKER_STACK_SIZE_FROM_OS) {
> +               if (pthread_attr_getstacksize(&attr, &stack_size) != 0) {
> +                       rte_errno = EFAULT;
> +                       return -1;
> +               }
> +       } else {
> +               stack_size = internal_conf->huge_worker_stack_size;
> +       }
> +       stack_ptr = rte_zmalloc_socket("lcore_stack",
> +                                      stack_size,
> +                                      RTE_CACHE_LINE_SIZE,
> +                                      rte_lcore_to_socket_id(lcore_id));

This stack_ptr is "leaked" if any later branch in this function ends
up in failure.
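
A leak of this kind is usually avoided by funnelling every failure through a single cleanup path. The following is a minimal standalone sketch of that pattern, outside of DPDK: posix_memalign() only stands in for rte_zmalloc_socket(), and spawn_on_own_stack() and set_flag() are hypothetical names used for illustration.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Trivial thread body used for the demonstration. */
static void *
set_flag(void *arg)
{
	*(int *)arg = 1;
	return NULL;
}

/*
 * Start a thread on a caller-provided stack.
 * On success, the stack is handed back through *stack_out and must stay
 * allocated until the thread has been joined. On failure, nothing leaks.
 */
static int
spawn_on_own_stack(pthread_t *tid, size_t stack_size,
		void *(*fn)(void *), void *arg, void **stack_out)
{
	pthread_attr_t *attrp = NULL;
	pthread_attr_t attr;
	void *stack = NULL;
	int ret = -1;

	*stack_out = NULL;
	if (posix_memalign(&stack, (size_t)sysconf(_SC_PAGESIZE),
			stack_size) != 0) {
		stack = NULL;
		goto out;
	}
	if (pthread_attr_init(&attr) != 0)
		goto out;
	attrp = &attr;
	/* Fails with EINVAL for sizes below PTHREAD_STACK_MIN. */
	if (pthread_attr_setstack(attrp, stack, stack_size) != 0)
		goto out;
	if (pthread_create(tid, attrp, fn, arg) == 0)
		ret = 0;
out:
	if (ret != 0)
		free(stack); /* free(NULL) is a no-op */
	else
		*stack_out = stack;
	if (attrp != NULL)
		pthread_attr_destroy(attrp);
	return ret;
}
```

The attribute object is destroyed on every path once it has been initialised, and the stack is released only when no thread ended up running on it.
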
> +
> +       if (stack_ptr == NULL) {
> +               rte_eal_init_alert("Cannot allocate worker lcore stack memory");
> +               rte_errno = ENOMEM;
> +               return -1;
> +       }
> +
> +       if (pthread_attr_setstack(&attr, stack_ptr, stack_size) != 0) {
> +               rte_eal_init_alert("Cannot set pthread stack attributes");
> +               rte_errno = EFAULT;
> +               return -1;
> +       }
> +
> +       ret = pthread_create(&lcore_config[lcore_id].thread_id, &attr,
> +                            eal_thread_loop,
> +                            (void *)(uintptr_t)lcore_id);
> +
> +       if (pthread_attr_destroy(&attr) != 0) {
> +               rte_eal_init_alert("Cannot destroy pthread attributes");
> +               rte_errno = EFAULT;
> +               return -1;
> +       }
> +       return ret;
> +}

With previous comments applied, this could look like:

+static int
+eal_worker_thread_create(unsigned int lcore_id)
+{
+       pthread_attr_t *attrp = NULL;
+       void *stack_ptr = NULL;
+       pthread_attr_t attr;
+       size_t stack_size;
+       int ret = -1;
+
+       stack_size = eal_get_internal_configuration()->huge_worker_stack_size;
+       if (stack_size != 0) {
+
+               /* Allocate NUMA aware stack memory and set pthread attributes */
+               stack_ptr = rte_zmalloc_socket("lcore_stack", stack_size,
+                       RTE_CACHE_LINE_SIZE, rte_lcore_to_socket_id(lcore_id));
+               if (stack_ptr == NULL) {
+                       rte_eal_init_alert("Cannot allocate worker lcore stack memory");
+                       rte_errno = ENOMEM;
+                       goto out;
+               }
+
+               if (pthread_attr_init(&attr) != 0) {
+                       rte_eal_init_alert("Cannot init pthread attributes");
+                       rte_errno = EFAULT;
+                       goto out;
+               }
+               attrp = &attr;
+
+               if (pthread_attr_setstack(attrp, stack_ptr, stack_size) != 0) {
+                       rte_eal_init_alert("Cannot set pthread stack attributes");
+                       rte_errno = EFAULT;
+                       goto out;
+               }
+       }
+
+       if (pthread_create(&lcore_config[lcore_id].thread_id, attrp,
+                       eal_thread_loop, (void *)(uintptr_t)lcore_id) == 0)
+               ret = 0;
+
+out:
+       if (ret != 0)
+               rte_free(stack_ptr);
+       if (attrp != NULL)
+               pthread_attr_destroy(attrp);
+       return ret;
+}

> +
>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
> @@ -1144,8 +1202,7 @@ rte_eal_init(int argc, char **argv)
>                 lcore_config[i].state = WAIT;
>
>                 /* create a thread for each lcore */
> -               ret = pthread_create(&lcore_config[i].thread_id, NULL,
> -                                    eal_thread_loop, (void *)(uintptr_t)i);
> +               ret = eal_worker_thread_create(internal_conf, i);
>                 if (ret != 0)
>                         rte_panic("Cannot create thread\n");
>
> diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
> index 122de2a319..5cd4a45872 100644
> --- a/lib/eal/windows/eal.c
> +++ b/lib/eal/windows/eal.c
> @@ -416,6 +416,12 @@ rte_eal_init(int argc, char **argv)
>                 config->main_lcore, (uintptr_t)pthread_self(), cpuset,
>                 ret == 0 ? "" : "...");
>
> +       if (internal_conf->huge_worker_stack_size != 0) {
> +               rte_eal_init_alert("Hugepage worker stacks not supported");
> +               rte_errno = ENOTSUP;
> +               return -1;
> +       }
> +

Same as FreeBSD, this is unneeded if option is parsed only in
lib/eal/linux/eal.c.

>         RTE_LCORE_FOREACH_WORKER(i) {
>
>                 /*
> --
> 2.17.1
>

-- 
David Marchand