From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Burakov, Anatoly"
To: Kevin Traynor
Cc: dpdk stable
Thread-Topic: patch 'mem: improve segment list preallocation' has been queued to stable release 18.08.1
Date: Mon, 26 Nov 2018 11:16:11 +0000
References: <20181122164957.13003-1-ktraynor@redhat.com> <20181122164957.13003-18-ktraynor@redhat.com>
In-Reply-To: <20181122164957.13003-18-ktraynor@redhat.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-stable] patch 'mem: improve segment list preallocation' has been queued to stable release 18.08.1
List-Id: patches for DPDK stable branches

Hi Kevin,

FYI

http://patches.dpdk.org/patch/48338/

Thanks,
Anatoly

> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor@redhat.com]
> Sent: Thursday, November 22, 2018 4:49 PM
> To: Burakov, Anatoly
> Cc: dpdk stable
> Subject: patch 'mem: improve segment list preallocation' has been queued
> to stable release 18.08.1
>
> Hi,
>
> FYI, your patch has been queued to stable release 18.08.1
>
> Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
> It will be pushed if I get no objections before 11/28/18. So please shout
> if anyone has objections.
>
> Also note that after the patch there's a diff of the upstream commit vs
> the patch applied to the branch. If the code is different (ie: not only
> metadata diffs), due for example to a change in context or macro names,
> please double check it.
>
> Thanks.
>
> Kevin Traynor
>
> ---
> From 6d552c83eacba7be4b4f2efbafd58724e07e2330 Mon Sep 17 00:00:00 2001
> From: Anatoly Burakov
> Date: Fri, 5 Oct 2018 09:29:44 +0100
> Subject: [PATCH] mem: improve segment list preallocation
>
> [ upstream commit 1dd342d0fdc4f72102f0b48c89b6a39f029004fe ]
>
> Current code to preallocate segment lists is trying to do everything in
> one go, and thus ends up being convoluted, hard to understand, and, most
> importantly, does not scale beyond initial assumptions about number of
> NUMA nodes and number of page sizes, and therefore has issues on some
> configurations.
>
> Instead of fixing these issues in the existing code, simply rewrite it to
> be slightly less clever but much more logical, and provide ample comments
> to explain exactly what is going on.
>
> We cannot use the same approach for 32-bit code because the limitations
> of the target dictate current socket-centric approach rather than
> type-centric approach we use on 64-bit target, so 32-bit code is left
> unmodified. FreeBSD doesn't support NUMA so there's no complexity
> involved there, and thus its code is much more readable and not worth
> changing.
>
> Fixes: 1d406458db47 ("mem: make segment preallocation OS-specific")
>
> Signed-off-by: Anatoly Burakov
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 179 +++++++++++++++++------
>  1 file changed, 137 insertions(+), 42 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 6131bfde2..cc2d3fb69 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -2096,7 +2096,13 @@ memseg_primary_init(void)
>  {
>  	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> -	int i, socket_id, hpi_idx, msl_idx = 0;
> +	struct memtype {
> +		uint64_t page_sz;
> +		int socket_id;
> +	} *memtypes = NULL;
> +	int i, hpi_idx, msl_idx;
>  	struct rte_memseg_list *msl;
> -	uint64_t max_mem, total_mem;
> +	uint64_t max_mem, max_mem_per_type;
> +	unsigned int max_seglists_per_type;
> +	unsigned int n_memtypes, cur_type;
>
>  	/* no-huge does not need this at all */
> @@ -2104,8 +2110,49 @@ memseg_primary_init(void)
>  		return 0;
>
> -	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
> -	total_mem = 0;
> +	/*
> +	 * figuring out amount of memory we're going to have is a long and very
> +	 * involved process. the basic element we're operating with is a memory
> +	 * type, defined as a combination of NUMA node ID and page size (so that
> +	 * e.g. 2 sockets with 2 page sizes yield 4 memory types in total).
> +	 *
> +	 * deciding amount of memory going towards each memory type is a
> +	 * balancing act between maximum segments per type, maximum memory per
> +	 * type, and number of detected NUMA nodes. the goal is to make sure
> +	 * each memory type gets at least one memseg list.
> +	 *
> +	 * the total amount of memory is limited by RTE_MAX_MEM_MB value.
> +	 *
> +	 * the total amount of memory per type is limited by either
> +	 * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number
> +	 * of detected NUMA nodes.
> +	 * additionally, maximum number of segments per
> +	 * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for
> +	 * smaller page sizes, it can take hundreds of thousands of segments to
> +	 * reach the above specified per-type memory limits.
> +	 *
> +	 * additionally, each type may have multiple memseg lists associated
> +	 * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger
> +	 * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones.
> +	 *
> +	 * the number of memseg lists per type is decided based on the above
> +	 * limits, and also taking number of detected NUMA nodes, to make sure
> +	 * that we don't run out of memseg lists before we populate all NUMA
> +	 * nodes with memory.
> +	 *
> +	 * we do this in three stages. first, we collect the number of types.
> +	 * then, we figure out memory constraints and populate the list of
> +	 * would-be memseg lists. then, we go ahead and allocate the memseg
> +	 * lists.
> +	 */
>
> -	/* create memseg lists */
> +	/* create space for mem types */
> +	n_memtypes = internal_config.num_hugepage_sizes * rte_socket_count();
> +	memtypes = calloc(n_memtypes, sizeof(*memtypes));
> +	if (memtypes == NULL) {
> +		RTE_LOG(ERR, EAL, "Cannot allocate space for memory types\n");
> +		return -1;
> +	}
> +
> +	/* populate mem types */
> +	cur_type = 0;
>  	for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
>  			hpi_idx++) {
> @@ -2116,9 +2163,6 @@ memseg_primary_init(void)
>  		hugepage_sz = hpi->hugepage_sz;
>
> -		for (i = 0; i < (int) rte_socket_count(); i++) {
> -			uint64_t max_type_mem, total_type_mem = 0;
> -			int type_msl_idx, max_segs, total_segs = 0;
> -
> -			socket_id = rte_socket_id_by_idx(i);
> +		for (i = 0; i < (int) rte_socket_count(); i++, cur_type++) {
> +			int socket_id = rte_socket_id_by_idx(i);
>
>  #ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
> @@ -2126,47 +2170,98 @@ memseg_primary_init(void)
>  			break;
>  #endif
> +
> +			memtypes[cur_type].page_sz = hugepage_sz;
> +			memtypes[cur_type].socket_id = socket_id;
>
> -			if (total_mem >= max_mem)
> -				break;
> +			RTE_LOG(DEBUG, EAL, "Detected memory type: "
> +				"socket_id:%u hugepage_sz:%" PRIu64 "\n",
> +				socket_id, hugepage_sz);
> +		}
> +	}
>
> -			max_type_mem = RTE_MIN(max_mem - total_mem,
> -					(uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20);
> -			max_segs = RTE_MAX_MEMSEG_PER_TYPE;
> +	/* set up limits for types */
> +	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
> +	max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20,
> +			max_mem / n_memtypes);
> +	/*
> +	 * limit maximum number of segment lists per type to ensure there's
> +	 * space for memseg lists for all NUMA nodes with all page sizes
> +	 */
> +	max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes;
>
> -			type_msl_idx = 0;
> -			while (total_type_mem < max_type_mem &&
> -					total_segs < max_segs) {
> -				uint64_t cur_max_mem, cur_mem;
> -				unsigned int n_segs;
> +	if (max_seglists_per_type == 0) {
> +		RTE_LOG(ERR, EAL, "Cannot accommodate all memory types, please increase %s\n",
> +			RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
> +		return -1;
> +	}
>
> -				if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
> -					RTE_LOG(ERR, EAL,
> -						"No more space in memseg lists, please increase %s\n",
> -						RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
> -					return -1;
> -				}
> +	/* go through all mem types and create segment lists */
> +	msl_idx = 0;
> +	for (cur_type = 0; cur_type < n_memtypes; cur_type++) {
> +		unsigned int cur_seglist, n_seglists, n_segs;
> +		unsigned int max_segs_per_type, max_segs_per_list;
> +		struct memtype *type = &memtypes[cur_type];
> +		uint64_t max_mem_per_list, pagesz;
> +		int socket_id;
>
> -				msl = &mcfg->memsegs[msl_idx++];
> +		pagesz = type->page_sz;
> +		socket_id = type->socket_id;
>
> -				cur_max_mem = max_type_mem - total_type_mem;
> +		/*
> +		 * we need to create segment lists for this type.
> +		 * we must take into account the following things:
> +		 *
> +		 * 1. total amount of memory we can use for this memory type
> +		 * 2. total amount of memory per memseg list allowed
> +		 * 3. number of segments needed to fit the amount of memory
> +		 * 4. number of segments allowed per type
> +		 * 5. number of segments allowed per memseg list
> +		 * 6. number of memseg lists we are allowed to take up
> +		 */
>
> -				cur_mem = get_mem_amount(hugepage_sz,
> -						cur_max_mem);
> -				n_segs = cur_mem / hugepage_sz;
> +		/* calculate how much segments we will need in total */
> +		max_segs_per_type = max_mem_per_type / pagesz;
> +		/* limit number of segments to maximum allowed per type */
> +		max_segs_per_type = RTE_MIN(max_segs_per_type,
> +				(unsigned int)RTE_MAX_MEMSEG_PER_TYPE);
> +		/* limit number of segments to maximum allowed per list */
> +		max_segs_per_list = RTE_MIN(max_segs_per_type,
> +				(unsigned int)RTE_MAX_MEMSEG_PER_LIST);
>
> -				if (alloc_memseg_list(msl, hugepage_sz, n_segs,
> -						socket_id, type_msl_idx))
> -					return -1;
> +		/* calculate how much memory we can have per segment list */
> +		max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz,
> +				(uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20);
>
> -				total_segs += msl->memseg_arr.len;
> -				total_type_mem = total_segs * hugepage_sz;
> -				type_msl_idx++;
> +		/* calculate how many segments each segment list will have */
> +		n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz);
>
> -				if (alloc_va_space(msl)) {
> -					RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
> -					return -1;
> -				}
> +		/* calculate how many segment lists we can have */
> +		n_seglists = RTE_MIN(max_segs_per_type / n_segs,
> +				max_mem_per_type / max_mem_per_list);
> +
> +		/* limit number of segment lists according to our maximum */
> +		n_seglists = RTE_MIN(n_seglists, max_seglists_per_type);
> +
> +		RTE_LOG(DEBUG, EAL, "Creating %i segment lists: "
> +				"n_segs:%i socket_id:%i "
> +				"hugepage_sz:%" PRIu64 "\n",
> +			n_seglists, n_segs, socket_id, pagesz);
> +
> +		/* create all segment lists */
> +		for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) {
> +			if (msl_idx >= RTE_MAX_MEMSEG_LISTS) {
> +				RTE_LOG(ERR, EAL,
> +					"No more space in memseg lists, please increase %s\n",
> +					RTE_STR(CONFIG_RTE_MAX_MEMSEG_LISTS));
> +				return -1;
> +			}
> +			msl = &mcfg->memsegs[msl_idx++];
> +
> +			if (alloc_memseg_list(msl, pagesz, n_segs,
> +					socket_id, cur_seglist))
> +				return -1;
> +
> +			if (alloc_va_space(msl)) {
> +				RTE_LOG(ERR, EAL, "Cannot allocate VA space for memseg list\n");
> +				return -1;
> +			}
>  		}
> -		total_mem += total_type_mem;
>  	}
>  }
> --
> 2.19.0
>
> ---
>   Diff of the applied patch vs upstream commit (please double-check if
>   non-empty:
> ---
> --- -	2018-11-22 16:47:32.741839268 +0000
> +++ 0018-mem-improve-segment-list-preallocation.patch	2018-11-22 16:47:32.000000000 +0000
> @@ -1,8 +1,10 @@
> -From 1dd342d0fdc4f72102f0b48c89b6a39f029004fe Mon Sep 17 00:00:00 2001
> +From 6d552c83eacba7be4b4f2efbafd58724e07e2330 Mon Sep 17 00:00:00 2001
>  From: Anatoly Burakov
>  Date: Fri, 5 Oct 2018 09:29:44 +0100
>  Subject: [PATCH] mem: improve segment list preallocation
>
> +[ upstream commit 1dd342d0fdc4f72102f0b48c89b6a39f029004fe ]
> +
>  Current code to preallocate segment lists is trying to do everything in one
>  go, and thus ends up being convoluted, hard to understand, and, most
>  importantly, does not scale beyond
> @@ -21,7 +23,6 @@ its code is much more readable and not worth changing.
>
>  Fixes: 1d406458db47 ("mem: make segment preallocation OS-specific")
> -Cc: stable@dpdk.org
>
>  Signed-off-by: Anatoly Burakov
>  ---
> @@ -29,10 +30,10 @@
>   1 file changed, 137 insertions(+), 42 deletions(-)
>
>   diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> -index 04f264818..19e686eb6 100644
> +index 6131bfde2..cc2d3fb69 100644
>   --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
>   +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> -@@ -2132,7 +2132,13 @@ memseg_primary_init(void)
> +@@ -2096,7 +2096,13 @@ memseg_primary_init(void)
>   {
>   	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
>  -	int i, socket_id, hpi_idx, msl_idx = 0;
> @@ -48,7 +49,7 @@
>  +	unsigned int n_memtypes, cur_type;
>
>   	/* no-huge does not need this at all */
> -@@ -2140,8 +2146,49 @@ memseg_primary_init(void)
> +@@ -2104,8 +2110,49 @@ memseg_primary_init(void)
>   		return 0;
>
>  -	max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
> @@ -101,7 +102,7 @@
>  +	cur_type = 0;
>   	for (hpi_idx = 0; hpi_idx < (int) internal_config.num_hugepage_sizes;
>   			hpi_idx++) {
> -@@ -2152,9 +2199,6 @@ memseg_primary_init(void)
> +@@ -2116,9 +2163,6 @@ memseg_primary_init(void)
>   		hugepage_sz = hpi->hugepage_sz;
>
>  -		for (i = 0; i < (int) rte_socket_count(); i++) {
> @@ -113,7 +114,7 @@
>  +			int socket_id = rte_socket_id_by_idx(i);
>
>   #ifndef RTE_EAL_NUMA_AWARE_HUGEPAGES
> -@@ -2162,47 +2206,98 @@ memseg_primary_init(void)
> +@@ -2126,47 +2170,98 @@ memseg_primary_init(void)
>   			break;
>   #endif
>  +			memtypes[cur_type].page_sz = hugepage_sz;