Date: Sat, 20 May 2023 16:03:13 +0100
Subject: Re: [PATCH] eal: fix eal init may failed when too much continuous memsegs under legacy mode
To: Fengnan Chang
CC: Lin Li
References: <20230516122108.38617-1-changfengnan@bytedance.com>
From: "Burakov, Anatoly"
In-Reply-To: <20230516122108.38617-1-changfengnan@bytedance.com>
List-Id: DPDK patches and discussions

Hi,

On 5/16/2023 1:21 PM, Fengnan Chang wrote:
> Under legacy mode, if the number of continuous memsegs greater
> than RTE_MAX_MEMSEG_PER_LIST, eal init will failed even though
> another memseg list is empty, because only one memseg list used
> to check in remap_needed_hugepages.
>
> For example:
> hugepage configure:
> 20480
> 13370
> 7110
>
> startup log:
> EAL: Detected memory type: socket_id:0 hugepage_sz:2097152
> EAL: Detected memory type: socket_id:1 hugepage_sz:2097152
> EAL: Creating 4 segment lists: n_segs:8192 socket_id:0 hugepage_sz:2097152
> EAL: Creating 4 segment lists: n_segs:8192 socket_id:1 hugepage_sz:2097152
> EAL: Requesting 13370 pages of size 2MB from socket 0
> EAL: Requesting 7110 pages of size 2MB from socket 1
> EAL: Attempting to map 14220M on socket 1
> EAL: Allocated 14220M on socket 1
> EAL: Attempting to map 26740M on socket 0
> EAL: Could not find space for memseg. Please increase 32768 and/or 65536 in
> configuration.

Unrelated, but this message looks wrong: it should have named the config
options to change, not printed their values.
Sounds like a log message needs fixing somewhere...

> EAL: Couldn't remap hugepage files into memseg lists
> EAL: FATAL: Cannot init memory
> EAL: Cannot init memory
>
> Signed-off-by: Fengnan Chang
> Signed-off-by: Lin Li
> ---
>  lib/eal/linux/eal_memory.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c
> index 60fc8cc6ca..36b9e78f5f 100644
> --- a/lib/eal/linux/eal_memory.c
> +++ b/lib/eal/linux/eal_memory.c
> @@ -1001,6 +1001,8 @@ remap_needed_hugepages(struct hugepage_file *hugepages, int n_pages)
>  		if (cur->size == 0)
>  			break;
>
> +		if (cur_page - seg_start_page >= RTE_MAX_MEMSEG_PER_LIST)
> +			new_memseg = 1;

I don't think this is quite right, because technically,
`RTE_MAX_MEMSEG_PER_LIST` only applies to segment lists for smaller page
sizes - segment lists for larger page sizes will hit their limits earlier.
So, while this will work for 2MB pages, it won't work for page sizes whose
segment list length is smaller than the maximum (such as 1GB pages).

I think this solution could be improved upon by trying to break up the
contiguous area instead. I suspect the core of the issue is not even the
fact that we're exceeding the limits of one memseg list, but that we're
always attempting to map exactly N pages in `remap_hugepages`, which
results in us leaving large contiguous zones inside memseg lists unused,
because we couldn't satisfy the current allocation request and skipped to
a new memseg list.

For example, let's suppose we found a large contiguous area that would've
exceeded the limits of the current memseg list. Sooner or later, this
contiguous area will end, and we'll attempt to remap this virtual area into
a memseg list. Whenever that happens, we call into the remap code, which
will start with the first segment list, attempt to find exactly N free
spots, fail to do so, and skip to the next segment list.
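
To make the difference concrete, here is a toy model of the two placement
strategies. It is only a sketch and not the EAL code: `toy_lists`,
`place_exact` and `place_split` are hypothetical names, each list is reduced
to a single "used" counter, and the holes the real code leaves between
non-contiguous segments are ignored. The list count and length are taken from
the startup log quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define N_LISTS  4     /* toy value: "Creating 4 segment lists" in the log */
#define LIST_LEN 8192  /* toy value: n_segs:8192 in the log */

/* Hypothetical stand-in for one socket's memseg lists. Slots are handed
 * out front to back, so a single counter per list is enough here. */
struct toy_lists {
	size_t used[N_LISTS];
};

/* Exact-N strategy: a run of n pages only goes into a list that can
 * hold all n of them. Returns 0 on success, -1 if no single list fits. */
static int
place_exact(struct toy_lists *t, size_t n)
{
	assert(t != NULL);
	for (size_t i = 0; i < N_LISTS; i++) {
		if (LIST_LEN - t->used[i] >= n) {
			t->used[i] += n;
			return 0;
		}
	}
	return -1;
}

/* Split strategy: repeatedly take the list with the biggest free area,
 * place as many pages as fit, and retry with the remainder.
 * Returns the number of pages actually placed. */
static size_t
place_split(struct toy_lists *t, size_t n)
{
	size_t placed = 0;

	assert(t != NULL);
	while (placed < n) {
		size_t best = 0, best_free = 0;

		for (size_t i = 0; i < N_LISTS; i++) {
			size_t free_slots = LIST_LEN - t->used[i];
			if (free_slots > best_free) {
				best_free = free_slots;
				best = i;
			}
		}
		if (best_free == 0)
			break; /* every list is full */

		size_t take = n - placed < best_free ? n - placed : best_free;
		t->used[best] += take;
		placed += take;
	}
	return placed;
}
```

With the 13370 pages requested on socket 0 in the log, `place_exact` fails on
empty lists because no single list has 13370 slots, while `place_split` fills
the first list and puts the remaining 5178 pages into the second.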

Thus, sooner or later, if we get contiguous areas that are large enough, we
will not populate our memseg lists but instead skip through them, and start
with a new memseg list every time we need a large contiguous area. We
prioritize having a large contiguous area over using up all of our memory
map.

If, instead, we could break up the allocation - that is, use
`rte_fbarray_find_biggest_free()` instead of
`rte_fbarray_find_next_n_free()`, and keep doing it until we run out of
segment lists - we would achieve the same result your patch does, but have
it work for all page sizes, because now we would be targeting the actual
issue (under-utilization of memseg lists), not its symptoms (exceeding
segment list limits for large allocations).

This logic could either be inside `remap_hugepages`, or we could just
return the number of pages mapped from `remap_hugepages`, and have the
calling code (`remap_needed_hugepages`) try again, this time with a
different start segment, reflecting how many pages we actually mapped. IMO
the latter would be easier to implement, as `remap_hugepages` is overly
complex as it is!

>  		if (cur_page == 0)
>  			new_memseg = 1;
>  		else if (cur->socket_id != prev->socket_id)

--
Thanks,
Anatoly