DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Don Wallwork" <donw@xsightlabs.com>,
	"Stephen Hemminger" <stephen@networkplumber.org>,
	"Anatoly Burakov" <anatoly.burakov@intel.com>,
	"Dmitry Kozlyuk" <dmitry.kozliuk@gmail.com>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Cc: <dev@dpdk.org>
Subject: RE: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
Date: Wed, 27 Apr 2022 10:17:40 +0200	[thread overview]
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D87006@smartserver.smartshare.dk> (raw)
In-Reply-To: <aa79c2bf-adb1-a489-6162-f2021fd4c144@xsightlabs.com>

+CC: EAL and Memory maintainers.

> From: Don Wallwork [mailto:donw@xsightlabs.com]
> Sent: Tuesday, 26 April 2022 23.26
> 
> On 4/26/2022 5:21 PM, Stephen Hemminger wrote:
> > On Tue, 26 Apr 2022 17:01:18 -0400
> > Don Wallwork <donw@xsightlabs.com> wrote:
> >
> >> On 4/26/2022 10:58 AM, Stephen Hemminger wrote:
> >>> On Tue, 26 Apr 2022 08:19:59 -0400
> >>> Don Wallwork <donw@xsightlabs.com> wrote:
> >>>
> >>>> Add support for using hugepages for worker lcore stack memory.
> >>>> The intent is to improve performance by reducing stack memory
> >>>> related TLB misses and also by using memory local to the NUMA
> >>>> node of each lcore.

This certainly seems like a good idea!

However, I wonder: Does the O/S assign memory local to the lcore's NUMA node for an lcore-pinned thread's stack when instantiating the thread? And does the DPDK EAL ensure that the preconditions for the O/S to do that are present?
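
(Untested sketch, only to illustrate how this could be verified from within a pinned worker thread, assuming libnuma is available; not something I am proposing for the patch itself:)

#include <stdio.h>
#include <numaif.h>	/* get_mempolicy(); link with -lnuma */

/* Call from inside the pinned worker thread. A local variable is
 * guaranteed to live on that thread's stack, so asking the kernel
 * which node backs its page shows where the stack memory ended up. */
static void check_stack_numa_node(unsigned int lcore_id)
{
	int probe;	/* lives on this thread's stack */
	int node = -1;

	if (get_mempolicy(&node, NULL, 0, &probe,
			MPOL_F_NODE | MPOL_F_ADDR) == 0)
		printf("lcore %u: stack page is on NUMA node %d\n",
				lcore_id, node);
}

If the node reported there does not match the lcore's socket, that would answer the question above.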

(Not relevant for this patch, but the same locality questions come to mind regarding Thread Local Storage.)

> >>>>
> >>>> Platforms desiring to make use of this capability must enable the
> >>>> associated option flag and stack size settings in platform config
> >>>> files.
> >>>> ---
> >>>>    lib/eal/linux/eal.c | 39 +++++++++++++++++++++++++++++++++++++++
> >>>>    1 file changed, 39 insertions(+)
> >>>>
> >>> Good idea but having a fixed size stack makes writing complex
> >>> application more difficult. Plus you lose the safety of guard pages.

Would it be possible to add a guard page or guard region by using the O/S memory allocator instead of rte_zmalloc_socket()? Since the stack is considered private to the process, i.e. not accessible from other processes, this patch does not need to provide remote access to stack memory from secondary processes - and thus it is not a requirement for this feature to use DPDK managed memory.
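
Something along these lines (untested sketch with a hypothetical helper name; plain 4K pages shown - combining this with hugepage-backed stacks would require mapping the guard separately or sacrificing a whole hugepage):

#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve a worker stack directly from the O/S
 * with an inaccessible page below it, so an overflow faults instead
 * of silently corrupting adjacent memory. Returns the usable stack
 * base, or NULL on failure. */
static void *stack_alloc_with_guard(size_t stack_size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	void *base = mmap(NULL, stack_size + page,
			PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return NULL;

	/* Stacks grow downwards: make the lowest page inaccessible. */
	if (mprotect(base, page, PROT_NONE) != 0) {
		munmap(base, stack_size + page);
		return NULL;
	}

	return (char *)base + page;	/* usable stack starts above the guard */
}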

> >> Thanks for the quick reply.
> >>
> >> The expectation is that use of this optional feature would be
> >> limited to cases where the performance gains justify the
> >> implications of these tradeoffs. For example, a specific data plane
> >> application may be okay with limited stack size and could be tested
> >> to ensure stack usage remains within limits.

How does one identify the required stack size and verify that it is not exceeded? If aiming for small stacks, some instrumentation would be nice, along the lines of rte_mempool_audit() and rte_mempool_list_dump().
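
(Illustrative sketch only, hypothetical function names; the pattern would have to be written by the thread that allocates the stack, before the worker starts running on it:)

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define STACK_PAINT_BYTE 0xA5

/* Fill the stack with a known pattern before launching the worker. */
static void stack_paint(void *stack_base, size_t stack_size)
{
	memset(stack_base, STACK_PAINT_BYTE, stack_size);
}

/* Stacks grow downwards, so scanning upwards from the base counts the
 * bytes that were never touched; stack_size minus the return value is
 * the high-water mark of stack usage. */
static size_t stack_unused_bytes(const void *stack_base, size_t stack_size)
{
	const uint8_t *p = stack_base;
	size_t i;

	for (i = 0; i < stack_size; i++)
		if (p[i] != STACK_PAINT_BYTE)
			break;

	return i;
}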

Alternatively, just assume that the stack is "always big enough", and don't worry about it - like the default O/S stack size. And as Stephen already mentioned: Regardless of stack size, overflowing the stack will cause memory corruption instead of a segmentation fault.

Keep in mind that the required stack size not only depends on the application, but also on DPDK and other libraries being used by the application.

> >>
> >> Also, since this applies only to worker threads, the main thread
> >> would not be impacted by this change.
> >>
> >>
> > I would prefer it as a runtime, not compile time option.
> > That way distributions could ship DPDK and application could opt in
> > if it wanted.
> Good point..  I'll work on a v2 and will post that when it's ready.

May I suggest using the stack size configured in the O/S, from pthread_attr_getstacksize() or similar, instead of choosing the stack size manually? If you want it to be configurable, use the default size unless explicitly specified otherwise.
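
E.g. something like this (glibc-specific sketch; pthread_getattr_default_np() is a GNU extension, so treat this as one possible way, not the only one):

#define _GNU_SOURCE
#include <pthread.h>

/* Return the stack size the C library would give a new thread by
 * default; fall back to 0 if it cannot be determined, in which case
 * the caller could keep a fixed value. */
static size_t os_default_stack_size(void)
{
	pthread_attr_t attr;
	size_t size = 0;

	if (pthread_getattr_default_np(&attr) == 0) {
		pthread_attr_getstacksize(&attr, &size);
		pthread_attr_destroy(&attr);
	}

	return size;	/* typically 8 MB on Linux/glibc */
}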

Do the worker threads need a different stack size than the main thread? In my opinion: "Nice to have", not "must have".

Do the worker threads need different stack sizes individually? In my opinion: Perhaps "nice to have", certainly not "must have".


Thread overview: 15+ messages
2022-04-26 12:19 Don Wallwork
2022-04-26 14:58 ` Stephen Hemminger
2022-04-26 21:01   ` Don Wallwork
2022-04-26 21:21     ` Stephen Hemminger
2022-04-26 21:25       ` Don Wallwork
2022-04-27  8:17         ` Morten Brørup [this message]
2022-04-29 18:52           ` Don Wallwork
2022-04-29 19:03             ` Stephen Hemminger
2022-05-02 13:15               ` Don Wallwork
2022-04-30  7:55             ` Morten Brørup
2022-04-27  0:42 ` Honnappa Nagarahalli
2022-04-27 17:50   ` Don Wallwork
2022-04-27 19:09     ` Honnappa Nagarahalli
2022-04-29 20:00 ` [RFC v2] " Don Wallwork
2022-04-30  7:20   ` Morten Brørup
