DPDK patches and discussions
From: David Marchand <david.marchand@redhat.com>
To: Don Wallwork <donw@xsightlabs.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	 "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: dev <dev@dpdk.org>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Chengwen Feng" <fengchengwen@huawei.com>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>,
	"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>, "Wang, Haiyue" <haiyue.wang@intel.com>,
	Kathleen.Capella@arm.com
Subject: Re: [PATCH v6] eal: allow worker lcore stacks to be allocated from hugepage memory
Date: Mon, 20 Jun 2022 10:35:37 +0200
Message-ID: <CAJFAV8xBKESHrPZmbTnwHTy7hrUyFArQJG6yVaAkcNha0LO+Bw@mail.gmail.com>
In-Reply-To: <20220524195138.4963-1-donw@xsightlabs.com>

On Tue, May 24, 2022 at 9:52 PM Don Wallwork <donw@xsightlabs.com> wrote:
>
> Add support for using hugepages for worker lcore stack memory.  The
> intent is to improve performance by reducing stack memory related TLB
> misses and also by using memory local to the NUMA node of each lcore.
> EAL option '--huge-worker-stack [stack-size-in-kbytes]' is added to allow
> the feature to be enabled at runtime.  If the size is not specified,
> the system pthread stack size will be used.

- About the name of the option... I don't have a better name.

I just want to highlight that what this patch does is use the DPDK
memory allocator for the stack memory.
The DPDK memory allocator is primarily backed by hugepages, but not
always: in "no-huge" mode, for example, it is not.

IOW, with this patch in its current form, you can still run:

# dpdk-testpmd -c 3 --no-huge --huge-worker-stack=16 -m 40 -- etc...

Opinions?


- This patch adds one more EAL flag, so we need a unit test (even a basic one).


Comments below:

>
> Signed-off-by: Don Wallwork <donw@xsightlabs.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
>  doc/guides/linux_gsg/eal_args.include.rst     |  6 ++
>  .../prog_guide/env_abstraction_layer.rst      | 21 +++++++
>  lib/eal/common/eal_common_options.c           | 35 +++++++++++
>  lib/eal/common/eal_internal_cfg.h             |  4 ++
>  lib/eal/common/eal_options.h                  |  2 +
>  lib/eal/freebsd/eal.c                         |  6 ++
>  lib/eal/linux/eal.c                           | 61 ++++++++++++++++++-
>  lib/eal/windows/eal.c                         |  6 ++
>  8 files changed, 139 insertions(+), 2 deletions(-)
>
> diff --git a/doc/guides/linux_gsg/eal_args.include.rst b/doc/guides/linux_gsg/eal_args.include.rst
> index 3549a0cf56..9cfbf7de84 100644
> --- a/doc/guides/linux_gsg/eal_args.include.rst
> +++ b/doc/guides/linux_gsg/eal_args.include.rst
> @@ -116,6 +116,12 @@ Memory-related options
>
>      Force IOVA mode to a specific value.
>
> +*   ``--huge-worker-stack[=size]``
> +
> +    Allocate worker stack memory from hugepage memory. Stack size defaults
> +    to system pthread stack size unless the optional size (in kbytes) is
> +    specified.
> +
>  Debugging options
>  ~~~~~~~~~~~~~~~~~
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 5f0748fba1..e74516f0cf 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -329,6 +329,27 @@ Another option is to use bigger page sizes. Since fewer pages are required to
>  cover the same memory area, fewer file descriptors will be stored internally
>  by EAL.
>
> +.. _huge-worker-stack:

There is nothing pointing to this reference.
It can be removed.


> +
> +Hugepage Worker Stacks
> +^^^^^^^^^^^^^^^^^^^^^^
> +
> +When the ``--huge-worker-stack[=size]`` EAL option is specified, worker
> +thread stacks are allocated from hugepage memory local to the NUMA node
> +of the thread. Worker stack size defaults to system pthread stack size
> +if the optional size parameter is not specified.
> +
> +.. warning::
> +    Stacks allocated from hugepage memory are not protected by guard
> +    pages. Worker stacks must be sufficiently sized to prevent stack
> +    overflow when this option is used.
> +
> +    As with normal thread stacks, hugepage worker thread stack size is
> +    fixed and is not dynamically resized. Therefore, an application that
> +    is free of stack page faults under a given load should be safe with
> +    hugepage worker thread stacks given the same thread stack size and
> +    loading conditions.
> +
>  Support for Externally Allocated Memory
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
> index f247a42455..02e59051e8 100644
> --- a/lib/eal/common/eal_common_options.c
> +++ b/lib/eal/common/eal_common_options.c
> @@ -103,6 +103,7 @@ eal_long_options[] = {
>         {OPT_TELEMETRY,         0, NULL, OPT_TELEMETRY_NUM        },
>         {OPT_NO_TELEMETRY,      0, NULL, OPT_NO_TELEMETRY_NUM     },
>         {OPT_FORCE_MAX_SIMD_BITWIDTH, 1, NULL, OPT_FORCE_MAX_SIMD_BITWIDTH_NUM},
> +       {OPT_HUGE_WORKER_STACK, 2, NULL, OPT_HUGE_WORKER_STACK_NUM     },
>
>         {0,                     0, NULL, 0                        }
>  };
> @@ -1618,6 +1619,26 @@ eal_parse_huge_unlink(const char *arg, struct hugepage_file_discipline *out)
>         return -1;
>  }
>
> +static int
> +eal_parse_huge_worker_stack(const char *arg, size_t *huge_worker_stack_size)
> +{
> +       size_t worker_stack_size;

Nit: strtoul returns an unsigned long int.
POSIX defines size_t as an unsigned integer type.
It does not guarantee, though, that size_t can hold every unsigned
long value.
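
To illustrate the nit, here is a minimal standalone sketch (not part of
the patch) that keeps the parsed value in an unsigned long and does the
range check before narrowing to size_t. The helper name
parse_stack_size_kb is hypothetical. The check is also stricter than
the quoted code: it rejects trailing characters, which the
"end == NULL" test cannot detect since strtoul always sets end.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): parse a stack size given in
 * kbytes and store it in bytes in *out.
 * Returns 0 on success, -1 on invalid input.
 */
int
parse_stack_size_kb(const char *arg, size_t *out)
{
	unsigned long kb; /* same type as the return value of strtoul() */
	char *end;

	assert(out != NULL);
	if (arg == NULL || arg[0] == '\0')
		return -1;

	errno = 0;
	kb = strtoul(arg, &end, 10);
	/* Reject range errors, missing digits, trailing characters and zero. */
	if (errno != 0 || end == arg || *end != '\0' || kb == 0)
		return -1;
	/* Check the range before converting to size_t and multiplying. */
	if (kb > SIZE_MAX / 1024)
		return -1;

	*out = (size_t)kb * 1024;
	return 0;
}
```

With this sketch, "16" yields 16384 bytes, while "", "0" and "16k" are
rejected.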


> +       char *end;
> +
> +       if (arg == NULL || arg[0] == '\0') {
> +               *huge_worker_stack_size = WORKER_STACK_SIZE_FROM_OS;

We should resolve the default stack size here, once and for all (in
theory via an OS-specific helper, but that is overkill if we move this
to the Linux EAL).
That simplifies the EAL worker thread create helper.
WORKER_STACK_SIZE_FROM_OS is then unneeded.

And this parser needs a debug-level EAL log, so that users know the
feature is engaged.
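
Taken in isolation, resolving the default once could be sketched as
below. This is a minimal standalone example, not DPDK code; the helper
name get_default_stack_size is hypothetical. It relies only on the
POSIX calls pthread_attr_init, pthread_attr_getstacksize and
pthread_attr_destroy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): report the stack size that
 * a thread created with default attributes would get.
 * Returns 0 on success, -1 on failure.
 */
int
get_default_stack_size(size_t *size)
{
	pthread_attr_t attr;
	int ret;

	assert(size != NULL);
	if (pthread_attr_init(&attr) != 0)
		return -1;
	/* A freshly initialized attribute object carries the defaults. */
	ret = pthread_attr_getstacksize(&attr, size);
	pthread_attr_destroy(&attr);
	return ret == 0 ? 0 : -1;
}
```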


> +               return 0;
> +       }
> +       errno = 0;
> +       worker_stack_size = strtoul(arg, &end, 10);
> +       if (errno || end == NULL || worker_stack_size == 0 ||
> +           worker_stack_size >= (size_t)-1 / 1024)
> +               return -1;
> +
> +       *huge_worker_stack_size = worker_stack_size * 1024;

With previous comments applied, this could look like:

+static int
+eal_parse_huge_worker_stack(const char *arg)
+{
+       struct internal_config *cfg = eal_get_internal_configuration();
+
+       if (arg == NULL || arg[0] == '\0') {
+               pthread_attr_t attr;
+               int ret;
+
+               if (pthread_attr_init(&attr) != 0) {
+                       RTE_LOG(ERR, EAL, "Could not retrieve default stack size\n");
+                       return -1;
+               }
+               ret = pthread_attr_getstacksize(&attr, &cfg->huge_worker_stack_size);
+               pthread_attr_destroy(&attr);
+               if (ret != 0) {
+                       RTE_LOG(ERR, EAL, "Could not retrieve default stack size\n");
+                       return -1;
+               }
+       } else {
+               unsigned long int stack_size;
+               char *end;
+
+               errno = 0;
+               stack_size = strtoul(arg, &end, 10);
+               if (errno || end == NULL || stack_size == 0 ||
+                               stack_size >= (size_t)-1 / 1024)
+                       return -1;
+
+               cfg->huge_worker_stack_size = stack_size * 1024;
+       }
+
+       RTE_LOG(DEBUG, EAL, "EAL worker threads will use %zu kB of DPDK memory as stack.\n",
+               cfg->huge_worker_stack_size / 1024);
+       return 0;
+}



> +       return 0;
> +}
> +
>  int
>  eal_parse_common_option(int opt, const char *optarg,
>                         struct internal_config *conf)
> @@ -1921,6 +1942,15 @@ eal_parse_common_option(int opt, const char *optarg,
>                 }
>                 break;
>
> +       case OPT_HUGE_WORKER_STACK_NUM:
> +               if (eal_parse_huge_worker_stack(optarg,
> +                                               &conf->huge_worker_stack_size) < 0) {
> +                       RTE_LOG(ERR, EAL, "invalid parameter for --"
> +                               OPT_HUGE_WORKER_STACK"\n");
> +                       return -1;
> +               }
> +               break;
> +

This parser and its call site should be moved out of the common options
and into the Linux implementation (around
https://git.dpdk.org/dpdk/tree/lib/eal/linux/eal.c#n715), as it is a
Linux-only option at the moment.
With that, there is nothing to add to the FreeBSD and Windows EAL.
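
For reference, the "2" in the eal_long_options entry is getopt_long's
optional_argument. A minimal standalone sketch of how such an option
reaches a parser is shown below. The function name
find_huge_worker_stack and the numeric value 256 are assumptions made
for this example only; this is not the EAL option loop.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

#define OPT_HUGE_WORKER_STACK "huge-worker-stack"
enum { OPT_HUGE_WORKER_STACK_NUM = 256 }; /* arbitrary value for this sketch */

/*
 * Hypothetical helper: scan argv for --huge-worker-stack[=size].
 * Returns 1 if the option is present (*value is NULL when no size was
 * given), 0 if it is absent, -1 on an unknown option.
 */
int
find_huge_worker_stack(int argc, char **argv, const char **value)
{
	static const struct option long_opts[] = {
		{OPT_HUGE_WORKER_STACK, optional_argument, NULL,
			OPT_HUGE_WORKER_STACK_NUM},
		{0, 0, NULL, 0}
	};
	int found = 0;
	int opt;

	assert(value != NULL);
	*value = NULL;
	optind = 1; /* restart scanning from the first argument */
	opterr = 0; /* keep getopt_long quiet on unknown options */
	while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
		if (opt != OPT_HUGE_WORKER_STACK_NUM)
			return -1;
		found = 1;
		*value = optarg; /* NULL unless given as --opt=value */
	}
	return found;
}
```

Note that with an optional argument, getopt_long only picks up a value
attached with "=", as in --huge-worker-stack=16. A separate word after
the option is treated as a non-option argument.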


>         /* don't know what to do, leave this to caller */
>         default:
>                 return 1;
> @@ -2235,5 +2265,10 @@ eal_common_usage(void)
>                "  --"OPT_NO_PCI"            Disable PCI\n"
>                "  --"OPT_NO_HPET"           Disable HPET\n"
>                "  --"OPT_NO_SHCONF"         No shared config (mmap'd files)\n"
> +              "  --"OPT_HUGE_WORKER_STACK"[=size]\n"
> +              "                      Allocate worker thread stacks from\n"
> +              "                      hugepage memory. Size is in units of\n"
> +              "                      kbytes and defaults to system thread\n"
> +              "                      stack size if not specified.\n"
>                "\n", RTE_MAX_LCORE);
>  }
> diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
> index b71faadd18..5e154967e4 100644
> --- a/lib/eal/common/eal_internal_cfg.h
> +++ b/lib/eal/common/eal_internal_cfg.h
> @@ -48,6 +48,9 @@ struct hugepage_file_discipline {
>         bool unlink_existing;
>  };
>
> +/** Worker hugepage stack size should default to OS value. */
> +#define WORKER_STACK_SIZE_FROM_OS ((size_t)~0)

No need for this special value, as per the previous comment.


> +
>  /**
>   * internal configuration
>   */
> @@ -102,6 +105,7 @@ struct internal_config {
>         unsigned int no_telemetry; /**< true to disable Telemetry */
>         struct simd_bitwidth max_simd_bitwidth;
>         /**< max simd bitwidth path to use */
> +       size_t huge_worker_stack_size; /**< worker thread stack size */
>  };
>
>  void eal_reset_internal_config(struct internal_config *internal_cfg);
> diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h
> index 8e4f7202a2..3cc9cb6412 100644
> --- a/lib/eal/common/eal_options.h
> +++ b/lib/eal/common/eal_options.h
> @@ -87,6 +87,8 @@ enum {
>         OPT_NO_TELEMETRY_NUM,
>  #define OPT_FORCE_MAX_SIMD_BITWIDTH  "force-max-simd-bitwidth"
>         OPT_FORCE_MAX_SIMD_BITWIDTH_NUM,
> +#define OPT_HUGE_WORKER_STACK  "huge-worker-stack"
> +       OPT_HUGE_WORKER_STACK_NUM,
>
>         OPT_LONG_MAX_NUM
>  };
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index a6b20960f2..7368956649 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -795,6 +795,12 @@ rte_eal_init(int argc, char **argv)
>                 config->main_lcore, (uintptr_t)pthread_self(), cpuset,
>                 ret == 0 ? "" : "...");
>
> +       if (internal_conf->huge_worker_stack_size != 0) {
> +               rte_eal_init_alert("Hugepage worker stacks not supported");
> +               rte_errno = ENOTSUP;
> +               return -1;
> +       }
> +

As previously mentioned, this is unneeded if the option is parsed only in
lib/eal/linux/eal.c.

>         RTE_LCORE_FOREACH_WORKER(i) {
>
>                 /*
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 1ef263434a..d28a0fdb78 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -857,6 +857,64 @@ is_iommu_enabled(void)
>         return n > 2;
>  }
>
> +static int
> +eal_worker_thread_create(struct internal_config *internal_conf,
> +                        int lcore_id)
> +{
> +       pthread_attr_t attr;
> +       size_t stack_size;
> +       void *stack_ptr;
> +       int ret;
> +
> +       if (internal_conf->huge_worker_stack_size == 0)
> +               return pthread_create(&lcore_config[lcore_id].thread_id,
> +                                     NULL,
> +                                     eal_thread_loop,
> +                                     (void *)(uintptr_t)lcore_id);

If you invert the branch here (checking that stack_size != 0), all the
stack setup can be put under that branch and we get a single, more
readable call to pthread_create.
If more pthread attributes are added in the future, the code will be
ready for them too.


> +
> +       /* Allocate NUMA aware stack memory and set pthread attributes */
> +       if (pthread_attr_init(&attr) != 0) {
> +               rte_eal_init_alert("Cannot init pthread attributes");
> +               rte_errno = EFAULT;
> +               return -1;
> +       }
> +       if (internal_conf->huge_worker_stack_size == WORKER_STACK_SIZE_FROM_OS) {
> +               if (pthread_attr_getstacksize(&attr, &stack_size) != 0) {
> +                       rte_errno = EFAULT;
> +                       return -1;
> +               }
> +       } else {
> +               stack_size = internal_conf->huge_worker_stack_size;
> +       }
> +       stack_ptr = rte_zmalloc_socket("lcore_stack",
> +                                      stack_size,
> +                                      RTE_CACHE_LINE_SIZE,
> +                                      rte_lcore_to_socket_id(lcore_id));

This stack_ptr is "leaked" if any later branch in this function ends
up in failure.


> +
> +       if (stack_ptr == NULL) {
> +               rte_eal_init_alert("Cannot allocate worker lcore stack memory");
> +               rte_errno = ENOMEM;
> +               return -1;
> +       }
> +
> +       if (pthread_attr_setstack(&attr, stack_ptr, stack_size) != 0) {
> +               rte_eal_init_alert("Cannot set pthread stack attributes");
> +               rte_errno = EFAULT;
> +               return -1;
> +       }
> +
> +       ret = pthread_create(&lcore_config[lcore_id].thread_id, &attr,
> +                            eal_thread_loop,
> +                            (void *)(uintptr_t)lcore_id);
> +
> +       if (pthread_attr_destroy(&attr) != 0) {
> +               rte_eal_init_alert("Cannot destroy pthread attributes");
> +               rte_errno = EFAULT;
> +               return -1;
> +       }
> +       return ret;
> +}

With previous comments applied, this could look like:

+static int
+eal_worker_thread_create(unsigned int lcore_id)
+{
+       pthread_attr_t *attrp = NULL;
+       void *stack_ptr = NULL;
+       pthread_attr_t attr;
+       size_t stack_size;
+       int ret = -1;
+
+       stack_size = eal_get_internal_configuration()->huge_worker_stack_size;
+       if (stack_size != 0) {
+
+               /* Allocate NUMA aware stack memory and set pthread attributes */
+               stack_ptr = rte_zmalloc_socket("lcore_stack", stack_size,
+                       RTE_CACHE_LINE_SIZE, rte_lcore_to_socket_id(lcore_id));
+               if (stack_ptr == NULL) {
+                       rte_eal_init_alert("Cannot allocate worker lcore stack memory");
+                       rte_errno = ENOMEM;
+                       goto out;
+               }
+
+               if (pthread_attr_init(&attr) != 0) {
+                       rte_eal_init_alert("Cannot init pthread attributes");
+                       rte_errno = EFAULT;
+                       goto out;
+               }
+               attrp = &attr;
+
+               if (pthread_attr_setstack(attrp, stack_ptr, stack_size) != 0) {
+                       rte_eal_init_alert("Cannot set pthread stack attributes");
+                       rte_errno = EFAULT;
+                       goto out;
+               }
+       }
+
+       if (pthread_create(&lcore_config[lcore_id].thread_id, attrp,
+                       eal_thread_loop, (void *)(uintptr_t)lcore_id) == 0)
+               ret = 0;
+
+out:
+       if (ret != 0)
+               rte_free(stack_ptr);
+       if (attrp != NULL)
+               pthread_attr_destroy(attrp);
+       return ret;
+}
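
The same single-exit cleanup pattern can be tried outside DPDK. The
sketch below is a standalone approximation under stated assumptions:
posix_memalign stands in for rte_zmalloc_socket (so there is no NUMA
awareness), and the names thread_create_with_stack and worker_main are
hypothetical. Unlike the EAL helper, it hands the stack back to the
caller so it can be freed once the thread has been joined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Trivial thread body used only by this sketch. */
static void *
worker_main(void *arg)
{
	return arg;
}

/*
 * Hypothetical stand-in for eal_worker_thread_create().
 * stack_size == 0 means "use default thread attributes".
 * On success, *stack_out holds the caller-owned stack (or NULL) and
 * must only be freed after the thread has been joined.
 */
int
thread_create_with_stack(pthread_t *tid, size_t stack_size, void **stack_out)
{
	pthread_attr_t *attrp = NULL;
	void *stack_ptr = NULL;
	pthread_attr_t attr;
	int ret = -1;

	assert(tid != NULL && stack_out != NULL);
	*stack_out = NULL;

	if (stack_size != 0) {
		/* Page-aligned allocation replaces rte_zmalloc_socket() here. */
		if (posix_memalign(&stack_ptr, (size_t)sysconf(_SC_PAGESIZE),
				stack_size) != 0) {
			stack_ptr = NULL;
			goto out;
		}
		if (pthread_attr_init(&attr) != 0)
			goto out;
		attrp = &attr;
		if (pthread_attr_setstack(attrp, stack_ptr, stack_size) != 0)
			goto out;
	}

	if (pthread_create(tid, attrp, worker_main, NULL) == 0)
		ret = 0;

out:
	if (ret != 0)
		free(stack_ptr); /* free(NULL) is a no-op */
	else
		*stack_out = stack_ptr;
	if (attrp != NULL)
		pthread_attr_destroy(attrp);
	return ret;
}
```

Every failure after the allocation goes through the same label, so the
stack is released on each error path and the attribute object is
destroyed only if it was initialized.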


> +
>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
> @@ -1144,8 +1202,7 @@ rte_eal_init(int argc, char **argv)
>                 lcore_config[i].state = WAIT;
>
>                 /* create a thread for each lcore */
> -               ret = pthread_create(&lcore_config[i].thread_id, NULL,
> -                                    eal_thread_loop, (void *)(uintptr_t)i);
> +               ret = eal_worker_thread_create(internal_conf, i);
>                 if (ret != 0)
>                         rte_panic("Cannot create thread\n");
>
> diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
> index 122de2a319..5cd4a45872 100644
> --- a/lib/eal/windows/eal.c
> +++ b/lib/eal/windows/eal.c
> @@ -416,6 +416,12 @@ rte_eal_init(int argc, char **argv)
>                 config->main_lcore, (uintptr_t)pthread_self(), cpuset,
>                 ret == 0 ? "" : "...");
>
> +       if (internal_conf->huge_worker_stack_size != 0) {
> +               rte_eal_init_alert("Hugepage worker stacks not supported");
> +               rte_errno = ENOTSUP;
> +               return -1;
> +       }
> +

Same as FreeBSD, this is unneeded if the option is parsed only in
lib/eal/linux/eal.c.

>         RTE_LCORE_FOREACH_WORKER(i) {
>
>                 /*
> --
> 2.17.1
>


-- 
David Marchand


