From mboxrd@z Thu Jan 1 00:00:00 1970
From: Don Wallwork <donw@xsightlabs.com>
Date: Fri, 29 Apr 2022 14:52:03 -0400
Subject: Re: [RFC] eal: allow worker lcore stacks to be allocated from hugepage memory
To: Morten Brørup, Stephen Hemminger, Anatoly Burakov, Dmitry Kozlyuk, Bruce Richardson
Cc: dev@dpdk.org
References: <20220426122000.24743-1-donw@xsightlabs.com> <20220426075858.2c28f427@hermes.local> <53a03de6-fb78-986e-64f6-890b08321343@xsightlabs.com> <20220426142124.524069c5@hermes.local> <98CBD80474FA8B44BF855DF32C47DC35D87006@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87006@smartserver.smartshare.dk>
Content-Type: text/plain; charset=UTF-8; format=flowed
MIME-Version: 1.0
On 4/27/2022 4:17 AM, Morten Brørup wrote:
> +CC: EAL and Memory maintainers.
>
>> From: Don Wallwork [mailto:donw@xsightlabs.com]
>> Sent: Tuesday, 26 April 2022 23.26
>>
>> On 4/26/2022 5:21 PM, Stephen Hemminger wrote:
>>> On Tue, 26 Apr 2022 17:01:18 -0400
>>> Don Wallwork wrote:
>>>
>>>> On 4/26/2022 10:58 AM, Stephen Hemminger wrote:
>>>>> On Tue, 26 Apr 2022 08:19:59 -0400
>>>>> Don Wallwork wrote:
>>>>>
>>>>>> Add support for using hugepages for worker lcore stack memory.
>>>>>> The intent is to improve performance by reducing stack memory
>>>>>> related TLB misses and also by using memory local to the NUMA
>>>>>> node of each lcore.
>
> This certainly seems like a good idea!
>
> However, I wonder: Does the O/S assign memory local to the NUMA node
> to an lcore-pinned thread's stack when instantiating the thread? And
> does the DPDK EAL ensure that the preconditions for the O/S to do
> that are present?
>
> (Not relevant for this patch, but the same locality questions come to
> mind regarding Thread Local Storage.)

Currently, DPDK does not set pthread affinity until after the pthread 
is created and the stack has been allocated. If the affinity attribute 
were set before the pthread_create call, it seems possible that 
pthread_create could be NUMA aware when allocating the stack. However, 
it looks like at least the glibc v2.35 implementation of pthread_create 
does not consider this at stack allocation time.

>>>>>> Platforms desiring to make use of this capability must enable the
>>>>>> associated option flag and stack size settings in platform config
>>>>>> files.
>>>>>> ---
>>>>>>  lib/eal/linux/eal.c | 39 +++++++++++++++++++++++++++++++++++++++
>>>>>>  1 file changed, 39 insertions(+)
>>>>>
>>>>> Good idea but having a fixed size stack makes writing complex
>>>>> applications more difficult. Plus you lose the safety of guard pages.
>
> Would it be possible to add a guard page or guard region by using the
> O/S memory allocator instead of rte_zmalloc_socket()?

Since the stack is considered private to the process, i.e. not 
accessible from other processes, this patch does not need to provide 
remote access to stack memory from secondary processes - and thus it is 
not a requirement for this feature to use DPDK managed memory. In order 
for each stack to have guard page protection, this would likely require 
reserving an entire hugepage per stack.
Although guard pages do not require physical memory allocation, it 
would not be possible for multiple stacks to share a hugepage and also 
have per-stack guard page protection.

>>>> Thanks for the quick reply.
>>>>
>>>> The expectation is that use of this optional feature would be
>>>> limited to cases where the performance gains justify the
>>>> implications of these tradeoffs. For example, a specific data plane
>>>> application may be okay with limited stack size and could be tested
>>>> to ensure stack usage remains within limits.
>
> How to identify the required stack size and verify it... If aiming
> for small stacks, some instrumentation would be nice, like
> rte_mempool_audit() and rte_mempool_list_dump().

Theoretically, a region of memory following the stack could be 
populated with a poison pattern that could be audited. Not as robust 
as hw mprotect/MMU, but it could provide some protection.

> Alternatively, just assume that the stack is "always big enough", and
> don't worry about it - like the default O/S stack size. And as
> Stephen already mentioned: Regardless of stack size, overflowing the
> stack will cause memory corruption instead of a segmentation fault.
>
> Keep in mind that the required stack size not only depends on the
> application, but also on DPDK and other libraries being used by the
> application.
>
>>>> Also, since this applies only to worker threads, the main thread
>>>> would not be impacted by this change.
>>>
>>> I would prefer it as a runtime, not compile time option.
>>> That way distributions could ship DPDK and application could opt in
>>> if it wanted.
>>
>> Good point.. I'll work on a v2 and will post that when it's ready.
>
> May I suggest using the stack size configured in the O/S, from
> pthread_attr_getstacksize() or similar, instead of choosing the stack
> size manually? If you want it to be configurable, use the default
> size unless explicitly specified otherwise.

Yes, that can be handled in EAL args. I'll include that in the next 
version.
> Do the worker threads need a different stack size than the main
> thread? In my opinion: "Nice to have", not "must have".

The main thread stack behaves differently anyway; it can grow 
dynamically, but regardless of this patch, pthread stack sizes are 
always fixed. This change only relates to worker threads.

> Do the worker threads need different stack sizes individually? In my
> opinion: Perhaps "nice to have", certainly not "must have".

Currently, worker thread stack sizes are uniformly sized and not 
dynamically resized. This patch does not change that aspect. Given 
that, it seems unnecessary to add that complexity here.