From: "Kinsella, Ray"
To: Dmitry Kozlyuk, dev@dpdk.org
Cc: Dmitry Malloy, Narcisa Ana Maria Vasile, Fady Bader, Tal Shnaiderman, Anatoly Burakov, Bruce Richardson, Neil Horman
Date: Mon, 15 Jun 2020 07:03:23 +0100
Subject: Re: [dpdk-dev] [PATCH v9 03/12] eal: introduce memory management wrappers
In-Reply-To: <20200615004354.14380-4-dmitry.kozliuk@gmail.com>

On 15/06/2020 01:43, Dmitry Kozlyuk
wrote:
> Introduce OS-independent wrappers for memory management operations used
> across DPDK and specifically in common code of EAL:
>
> * rte_mem_map()
> * rte_mem_unmap()
> * rte_mem_page_size()
> * rte_mem_lock()
>
> Windows uses different APIs for memory mapping and reservation, while
> Unices reserve memory by mapping it. Introduce EAL private functions to
> support memory reservation in common code:
>
> * eal_mem_reserve()
> * eal_mem_free()
> * eal_mem_set_dump()
>
> Wrappers follow POSIX semantics limited to DPDK tasks, but their
> signatures deliberately differ from POSIX ones to be more safe and
> expressive. New symbols are internal. Being thin wrappers, they require
> no special maintenance.
>
> Signed-off-by: Dmitry Kozlyuk
> ---
>
> Not adding rte_eal_paging.h to Doxygen index because, to my
> understanding, it only contains public API, and it was decided to keep
> rte_eal_paging.h functions private.
>
> lib/librte_eal/common/eal_common_fbarray.c | 40 +++---
> lib/librte_eal/common/eal_common_memory.c | 61 ++++-----
> lib/librte_eal/common/eal_private.h | 78 ++++++++++-
> lib/librte_eal/freebsd/Makefile | 1 +
> lib/librte_eal/include/rte_eal_paging.h | 98 +++++++++++++
> lib/librte_eal/linux/Makefile | 1 +
> lib/librte_eal/linux/eal_memalloc.c | 5 +-
> lib/librte_eal/rte_eal_version.map | 9 ++
> lib/librte_eal/unix/eal_unix_memory.c | 152 +++++++++++++++++++++
> lib/librte_eal/unix/meson.build | 1 +
> 10 files changed, 381 insertions(+), 65 deletions(-)
> create mode 100644 lib/librte_eal/include/rte_eal_paging.h
> create mode 100644 lib/librte_eal/unix/eal_unix_memory.c
>
> diff --git a/lib/librte_eal/common/eal_common_fbarray.c b/lib/librte_eal/common/eal_common_fbarray.c
> index c52ddb967..fd0292a64 100644
> --- a/lib/librte_eal/common/eal_common_fbarray.c
> +++ b/lib/librte_eal/common/eal_common_fbarray.c
> @@ -5,15 +5,16 @@
> #include
> #include
> #include
> -#include
> #include
> #include
> #include
> #include
>
> #include
> -#include
> +#include
> #include
> +#include
> +#include
> #include
> #include
>
> @@ -90,12 +91,9 @@ resize_and_map(int fd, void *addr, size_t len)
> return -1;
> }
>
> - map_addr = mmap(addr, len, PROT_READ | PROT_WRITE,
> - MAP_SHARED | MAP_FIXED, fd, 0);
> + map_addr = rte_mem_map(addr, len, RTE_PROT_READ | RTE_PROT_WRITE,
> + RTE_MAP_SHARED | RTE_MAP_FORCE_ADDRESS, fd, 0);
> if (map_addr != addr) {
> - RTE_LOG(ERR, EAL, "mmap() failed: %s\n", strerror(errno));
> - /* pass errno up the chain */
> - rte_errno = errno;
> return -1;
> }
> return 0;
> @@ -733,7 +731,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
> return -1;
> }
>
> - page_sz = sysconf(_SC_PAGESIZE);
> + page_sz = rte_mem_page_size();
> if (page_sz == (size_t)-1) {
> free(ma);
> return -1;
> @@ -754,11 +752,13 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
>
> if (internal_config.no_shconf) {
> /* remap virtual area as writable */
> - void *new_data = mmap(data, mmap_len, PROT_READ | PROT_WRITE,
> - MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, fd, 0);
> - if (new_data == MAP_FAILED) {
> + static const int flags = RTE_MAP_FORCE_ADDRESS |
> + RTE_MAP_PRIVATE | RTE_MAP_ANONYMOUS;
> + void *new_data = rte_mem_map(data, mmap_len,
> + RTE_PROT_READ | RTE_PROT_WRITE, flags, fd, 0);
> + if (new_data == NULL) {
> RTE_LOG(DEBUG, EAL, "%s(): couldn't remap anonymous memory: %s\n",
> - __func__, strerror(errno));
> + __func__, rte_strerror(rte_errno));
> goto fail;
> }
> } else {
> @@ -820,7 +820,7 @@ rte_fbarray_init(struct rte_fbarray *arr, const char *name, unsigned int len,
> return 0;
> fail:
> if (data)
> - munmap(data, mmap_len);
> + rte_mem_unmap(data, mmap_len);
> if (fd >= 0)
> close(fd);
> free(ma);
> @@ -858,7 +858,7 @@ rte_fbarray_attach(struct rte_fbarray *arr)
> return -1;
> }
>
> - page_sz = sysconf(_SC_PAGESIZE);
> + page_sz = rte_mem_page_size();
> if (page_sz == (size_t)-1) {
> free(ma);
> return -1;
> @@ -909,7 +909,7 @@ rte_fbarray_attach(struct rte_fbarray *arr)
> return 0;
> fail:
> if (data)
> - munmap(data, mmap_len);
> + rte_mem_unmap(data, mmap_len);
> if (fd >= 0)
> close(fd);
> free(ma);
> @@ -937,8 +937,7 @@ rte_fbarray_detach(struct rte_fbarray *arr)
> * really do anything about it, things will blow up either way.
> */
>
> - size_t page_sz = sysconf(_SC_PAGESIZE);
> -
> + size_t page_sz = rte_mem_page_size();
> if (page_sz == (size_t)-1)
> return -1;
>
> @@ -957,7 +956,7 @@ rte_fbarray_detach(struct rte_fbarray *arr)
> goto out;
> }
>
> - munmap(arr->data, mmap_len);
> + rte_mem_unmap(arr->data, mmap_len);
>
> /* area is unmapped, close fd and remove the tailq entry */
> if (tmp->fd >= 0)
> @@ -992,8 +991,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr)
> * really do anything about it, things will blow up either way.
> */
>
> - size_t page_sz = sysconf(_SC_PAGESIZE);
> -
> + size_t page_sz = rte_mem_page_size();
> if (page_sz == (size_t)-1)
> return -1;
>
> @@ -1042,7 +1040,7 @@ rte_fbarray_destroy(struct rte_fbarray *arr)
> }
> close(fd);
> }
> - munmap(arr->data, mmap_len);
> + rte_mem_unmap(arr->data, mmap_len);
>
> /* area is unmapped, remove the tailq entry */
> TAILQ_REMOVE(&mem_area_tailq, tmp, next);
> diff --git a/lib/librte_eal/common/eal_common_memory.c b/lib/librte_eal/common/eal_common_memory.c
> index 4c897a13f..aa377990f 100644
> --- a/lib/librte_eal/common/eal_common_memory.c
> +++ b/lib/librte_eal/common/eal_common_memory.c
> @@ -11,13 +11,13 @@
> #include
> #include
> #include
> -#include
> #include
>
> #include
> #include
> #include
> #include
> +#include
> #include
> #include
>
> @@ -40,18 +40,10 @@
> static void *next_baseaddr;
> static uint64_t system_page_sz;
>
> -#ifdef RTE_EXEC_ENV_LINUX
> -#define RTE_DONTDUMP MADV_DONTDUMP
> -#elif defined RTE_EXEC_ENV_FREEBSD
> -#define RTE_DONTDUMP MADV_NOCORE
> -#else
> -#error "madvise doesn't support this OS"
> -#endif
> -
> #define MAX_MMAP_WITH_DEFINED_ADDR_TRIES 5
> void *
> eal_get_virtual_area(void *requested_addr, size_t *size,
> - size_t page_sz, int flags, int mmap_flags)
> + size_t page_sz, int flags, int reserve_flags)
> {
> bool addr_is_hint, allow_shrink, unmap, no_align;
> uint64_t map_sz;
> @@ -59,9 +51,7 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> uint8_t try = 0;
>
> if (system_page_sz == 0)
> - system_page_sz = sysconf(_SC_PAGESIZE);
> -
> - mmap_flags |= MAP_PRIVATE | MAP_ANONYMOUS;
> + system_page_sz = rte_mem_page_size();
>
> RTE_LOG(DEBUG, EAL, "Ask a virtual area of 0x%zx bytes\n", *size);
>
> @@ -105,24 +95,24 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> return NULL;
> }
>
> - mapped_addr = mmap(requested_addr, (size_t)map_sz, PROT_NONE,
> - mmap_flags, -1, 0);
> - if (mapped_addr == MAP_FAILED && allow_shrink)
> + mapped_addr = eal_mem_reserve(
> + requested_addr, (size_t)map_sz, reserve_flags);
> + if ((mapped_addr == NULL) && allow_shrink)
> *size -= page_sz;
>
> - if (mapped_addr != MAP_FAILED && addr_is_hint &&
> - mapped_addr != requested_addr) {
> + if ((mapped_addr != NULL) && addr_is_hint &&
> + (mapped_addr != requested_addr)) {
> try++;
> next_baseaddr = RTE_PTR_ADD(next_baseaddr, page_sz);
> if (try <= MAX_MMAP_WITH_DEFINED_ADDR_TRIES) {
> /* hint was not used. Try with another offset */
> - munmap(mapped_addr, map_sz);
> - mapped_addr = MAP_FAILED;
> + eal_mem_free(mapped_addr, map_sz);
> + mapped_addr = NULL;
> requested_addr = next_baseaddr;
> }
> }
> } while ((allow_shrink || addr_is_hint) &&
> - mapped_addr == MAP_FAILED && *size > 0);
> + (mapped_addr == NULL) && (*size > 0));
>
> /* align resulting address - if map failed, we will ignore the value
> * anyway, so no need to add additional checks.
> @@ -132,20 +122,17 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
>
> if (*size == 0) {
> RTE_LOG(ERR, EAL, "Cannot get a virtual area of any size: %s\n",
> - strerror(errno));
> - rte_errno = errno;
> + rte_strerror(rte_errno));
> return NULL;
> - } else if (mapped_addr == MAP_FAILED) {
> + } else if (mapped_addr == NULL) {
> RTE_LOG(ERR, EAL, "Cannot get a virtual area: %s\n",
> - strerror(errno));
> - /* pass errno up the call chain */
> - rte_errno = errno;
> + rte_strerror(rte_errno));
> return NULL;
> } else if (requested_addr != NULL && !addr_is_hint &&
> aligned_addr != requested_addr) {
> RTE_LOG(ERR, EAL, "Cannot get a virtual area at requested address: %p (got %p)\n",
> requested_addr, aligned_addr);
> - munmap(mapped_addr, map_sz);
> + eal_mem_free(mapped_addr, map_sz);
> rte_errno = EADDRNOTAVAIL;
> return NULL;
> } else if (requested_addr != NULL && addr_is_hint &&
> @@ -161,7 +148,7 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> aligned_addr, *size);
>
> if (unmap) {
> - munmap(mapped_addr, map_sz);
> + eal_mem_free(mapped_addr, map_sz);
> } else if (!no_align) {
> void *map_end, *aligned_end;
> size_t before_len, after_len;
> @@ -179,19 +166,17 @@ eal_get_virtual_area(void *requested_addr, size_t *size,
> /* unmap space before aligned mmap address */
> before_len = RTE_PTR_DIFF(aligned_addr, mapped_addr);
> if (before_len > 0)
> - munmap(mapped_addr, before_len);
> + eal_mem_free(mapped_addr, before_len);
>
> /* unmap space after aligned end mmap address */
> after_len = RTE_PTR_DIFF(map_end, aligned_end);
> if (after_len > 0)
> - munmap(aligned_end, after_len);
> + eal_mem_free(aligned_end, after_len);
> }
>
> if (!unmap) {
> /* Exclude these pages from a core dump. */
> - if (madvise(aligned_addr, *size, RTE_DONTDUMP) != 0)
> - RTE_LOG(DEBUG, EAL, "madvise failed: %s\n",
> - strerror(errno));
> + eal_mem_set_dump(aligned_addr, *size, false);
> }
>
> return aligned_addr;
> @@ -547,10 +532,10 @@ rte_eal_memdevice_init(void)
> int
> rte_mem_lock_page(const void *virt)
> {
> - unsigned long virtual = (unsigned long)virt;
> - int page_size = getpagesize();
> - unsigned long aligned = (virtual & ~(page_size - 1));
> - return mlock((void *)aligned, page_size);
> + uintptr_t virtual = (uintptr_t)virt;
> + size_t page_size = rte_mem_page_size();
> + uintptr_t aligned = RTE_PTR_ALIGN_FLOOR(virtual, page_size);
> + return rte_mem_lock((void *)aligned, page_size);
> }
>
> int
> diff --git a/lib/librte_eal/common/eal_private.h b/lib/librte_eal/common/eal_private.h
> index 6733a2321..1696345c2 100644
> --- a/lib/librte_eal/common/eal_private.h
> +++ b/lib/librte_eal/common/eal_private.h
> @@ -11,6 +11,7 @@
>
> #include
> #include
> +#include
>
> /**
> * Structure storing internal configuration (per-lcore)
> @@ -202,6 +203,24 @@ int rte_eal_alarm_init(void);
> */
> int rte_eal_check_module(const char *module_name);
>
> +/**
> + * Memory reservation flags.
> + */
> +enum eal_mem_reserve_flags {
> + /**
> + * Reserve hugepages. May be unsupported by some platforms.
> + */
> + EAL_RESERVE_HUGEPAGES = 1 << 0,
> + /**
> + * Force reserving memory at the requested address.
> + * This can be a destructive action depending on the implementation.
> + *
> + * @see RTE_MAP_FORCE_ADDRESS for description of possible consequences
> + * (although implementations are not required to use it).
> + */
> + EAL_RESERVE_FORCE_ADDRESS = 1 << 1
> +};
> +
> /**
> * Get virtual area of specified size from the OS.
> *
> @@ -215,8 +234,8 @@ int rte_eal_check_module(const char *module_name);
> * Page size on which to align requested virtual area.
> * @param flags
> * EAL_VIRTUAL_AREA_* flags.
> - * @param mmap_flags
> - * Extra flags passed directly to mmap().
> + * @param reserve_flags
> + * Extra flags passed directly to eal_mem_reserve().
> *
> * @return
> * Virtual area address if successful.
> @@ -233,7 +252,7 @@ int rte_eal_check_module(const char *module_name);
> /**< immediately unmap reserved virtual area. */
> void *
> eal_get_virtual_area(void *requested_addr, size_t *size,
> - size_t page_sz, int flags, int mmap_flags);
> + size_t page_sz, int flags, int reserve_flags);
>
> /**
> * Get cpu core_id.
> @@ -493,4 +512,57 @@ eal_file_lock(int fd, enum eal_flock_op op, enum eal_flock_mode mode);
> int
> eal_file_truncate(int fd, ssize_t size);
>
> +/**
> + * Reserve a region of virtual memory.
> + *
> + * Use eal_mem_free() to free reserved memory.
> + *
> + * @param requested_addr
> + * A desired reservation address which must be page-aligned.
> + * The system might not respect it.
> + * NULL means the address will be chosen by the system.
> + * @param size
> + * Reservation size. Must be a multiple of system page size.
> + * @param flags
> + * Reservation options, a combination of eal_mem_reserve_flags.
> + * @returns
> + * Starting address of the reserved area on success, NULL on failure.
> + * Callers must not access this memory until remapping it.
> + */
> +void *
> +eal_mem_reserve(void *requested_addr, size_t size, int flags);
> +
> +/**
> + * Free memory obtained by eal_mem_reserve() or eal_mem_alloc().
> + *
> + * If *virt* and *size* describe a part of the reserved region,
> + * only this part of the region is freed (accurately up to the system
> + * page size). If *virt* points to allocated memory, *size* must match
> + * the one specified on allocation. The behavior is undefined
> + * if the memory pointed by *virt* is obtained from another source
> + * than listed above.
> + *
> + * @param virt
> + * A virtual address in a region previously reserved.
> + * @param size
> + * Number of bytes to unreserve.
> + */
> +void
> +eal_mem_free(void *virt, size_t size);
> +
> +/**
> + * Configure memory region inclusion into dumps.
> + *
> + * @param virt
> + * Starting address of the region.
> + * @param size
> + * Size of the region.
> + * @param dump
> + * True to include memory into dumps, false to exclude.
> + * @return
> + * 0 on success, (-1) on failure and rte_errno is set.
> + */
> +int
> +eal_mem_set_dump(void *virt, size_t size, bool dump);
> +
> #endif /* _EAL_PRIVATE_H_ */
> diff --git a/lib/librte_eal/freebsd/Makefile b/lib/librte_eal/freebsd/Makefile
> index 0f8741d96..2374ba0b7 100644
> --- a/lib/librte_eal/freebsd/Makefile
> +++ b/lib/librte_eal/freebsd/Makefile
> @@ -77,6 +77,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_reciprocal.c
>
> # from unix dir
> SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += eal_file.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += eal_unix_memory.c
>
> # from arch dir
> SRCS-$(CONFIG_RTE_EXEC_ENV_FREEBSD) += rte_cpuflags.c
> diff --git a/lib/librte_eal/include/rte_eal_paging.h b/lib/librte_eal/include/rte_eal_paging.h
> new file mode 100644
> index 000000000..ed98e70e9
> --- /dev/null
> +++ b/lib/librte_eal/include/rte_eal_paging.h
> @@ -0,0 +1,98 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Dmitry Kozlyuk
> + */
> +
> +#include
> +
> +#include
> +
> +/**
> + * @file
> + * @internal
> + *
> + * Wrappers for OS facilities related to memory paging, used across DPDK.
> + */
> +
> +/** Memory protection flags. */
> +enum rte_mem_prot {
> + RTE_PROT_READ = 1 << 0, /**< Read access. */
> + RTE_PROT_WRITE = 1 << 1, /**< Write access. */
> + RTE_PROT_EXECUTE = 1 << 2 /**< Code execution. */
> +};
> +
> +/** Additional flags for memory mapping. */
> +enum rte_map_flags {
> + /** Changes to the mapped memory are visible to other processes. */
> + RTE_MAP_SHARED = 1 << 0,
> + /** Mapping is not backed by a regular file. */
> + RTE_MAP_ANONYMOUS = 1 << 1,
> + /** Copy-on-write mapping, changes are invisible to other processes. */
> + RTE_MAP_PRIVATE = 1 << 2,
> + /**
> + * Force mapping to the requested address. This flag should be used
> + * with caution, because to fulfill the request implementation
> + * may remove all other mappings in the requested region. However,
> + * it is not required to do so, thus mapping with this flag may fail.
> + */
> + RTE_MAP_FORCE_ADDRESS = 1 << 3
> +};
> +
> +/**
> + * Map a portion of an opened file or the page file into memory.
> + *
> + * This function is similar to POSIX mmap(3) with common MAP_ANONYMOUS
> + * extension, except for the return value.
> + *
> + * @param requested_addr
> + * Desired virtual address for mapping. Can be NULL to let OS choose.
> + * @param size
> + * Size of the mapping in bytes.
> + * @param prot
> + * Protection flags, a combination of rte_mem_prot values.
> + * @param flags
> + * Additional mapping flags, a combination of rte_map_flags.
> + * @param fd
> + * Mapped file descriptor. Can be negative for anonymous mapping.
> + * @param offset
> + * Offset of the mapped region in fd. Must be 0 for anonymous mappings.
> + * @return
> + * Mapped address or NULL on failure and rte_errno is set to OS error.
> + */
> +__rte_internal
> +void *
> +rte_mem_map(void *requested_addr, size_t size, int prot, int flags,
> + int fd, size_t offset);
> +
> +/**
> + * OS-independent implementation of POSIX munmap(3).
> + */
> +__rte_internal
> +int
> +rte_mem_unmap(void *virt, size_t size);
> +
> +/**
> + * Get system page size. This function never fails.
> + *
> + * @return
> + * Page size in bytes.
> + */
> +__rte_internal
> +size_t
> +rte_mem_page_size(void);
> +
> +/**
> + * Lock in physical memory all pages crossed by the address region.
> + *
> + * @param virt
> + * Base virtual address of the region.
> + * @param size
> + * Size of the region.
> + * @return
> + * 0 on success, negative on error.
> + *
> + * @see rte_mem_page_size() to retrieve the page size.
> + * @see rte_mem_lock_page() to lock an entire single page.
> + */
> +__rte_internal
> +int
> +rte_mem_lock(const void *virt, size_t size);
> diff --git a/lib/librte_eal/linux/Makefile b/lib/librte_eal/linux/Makefile
> index 331489f99..8febf2212 100644
> --- a/lib/librte_eal/linux/Makefile
> +++ b/lib/librte_eal/linux/Makefile
> @@ -84,6 +84,7 @@ SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_reciprocal.c
>
> # from unix dir
> SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_file.c
> +SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += eal_unix_memory.c
>
> # from arch dir
> SRCS-$(CONFIG_RTE_EXEC_ENV_LINUX) += rte_cpuflags.c
> diff --git a/lib/librte_eal/linux/eal_memalloc.c b/lib/librte_eal/linux/eal_memalloc.c
> index 2c717f8bd..bf29b83c6 100644
> --- a/lib/librte_eal/linux/eal_memalloc.c
> +++ b/lib/librte_eal/linux/eal_memalloc.c
> @@ -630,7 +630,7 @@ alloc_seg(struct rte_memseg *ms, void *addr, int socket_id,
> mapped:
> munmap(addr, alloc_sz);
> unmapped:
> - flags = MAP_FIXED;
> + flags = EAL_RESERVE_FORCE_ADDRESS;
> new_addr = eal_get_virtual_area(addr, &alloc_sz, alloc_sz, 0, flags);
> if (new_addr != addr) {
> if (new_addr != NULL)
> @@ -687,8 +687,7 @@ free_seg(struct rte_memseg *ms, struct hugepage_info *hi,
> return -1;
> }
>
> - if (madvise(ms->addr, ms->len, MADV_DONTDUMP) != 0)
> - RTE_LOG(DEBUG, EAL, "madvise failed: %s\n", strerror(errno));
> + eal_mem_set_dump(ms->addr, ms->len, false);
>
> exit_early = false;
>
> diff --git a/lib/librte_eal/rte_eal_version.map b/lib/librte_eal/rte_eal_version.map
> index d8038749a..196eef5af 100644
> --- a/lib/librte_eal/rte_eal_version.map
> +++ b/lib/librte_eal/rte_eal_version.map
> @@ -387,3 +387,12 @@ EXPERIMENTAL {
> rte_trace_regexp;
> rte_trace_save;
> };
> +
> +INTERNAL {
> + global:
> +
> + rte_mem_lock;
> + rte_mem_map;
> + rte_mem_page_size;
> + rte_mem_unmap;
> +};

Don't

* eal_mem_reserve()
* eal_mem_free()
* eal_mem_set_dump()

also belong in the map file?

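For reference, the reserve-then-map sequence wrapped on Unix by eal_mem_reserve(), rte_mem_map() with RTE_MAP_FORCE_ADDRESS, and eal_mem_free() can be illustrated with plain POSIX calls. This is a minimal sketch, not DPDK code. reserve_area(), commit_area() and release_area() are hypothetical names, and the sketch assumes a Linux/glibc build where defining _DEFAULT_SOURCE exposes MAP_ANONYMOUS.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve address space with no access rights
 * (PROT_NONE) and no backing store. This is roughly what the Unix
 * eal_mem_reserve() does for a NULL address and no flags.
 * size must be a multiple of the system page size. */
static void *
reserve_area(size_t size)
{
	size_t page_sz = (size_t)sysconf(_SC_PAGESIZE);
	void *addr;

	assert(size % page_sz == 0);
	addr = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return addr == MAP_FAILED ? NULL : addr;
}

/* Hypothetical helper: replace part of a reservation with readable and
 * writable anonymous memory. MAP_FIXED is what RTE_MAP_FORCE_ADDRESS
 * becomes on Unix. It discards whatever was mapped at addr, which is
 * safe here only because addr lies inside our own reservation. */
static void *
commit_area(void *addr, size_t size)
{
	void *virt = mmap(addr, size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	return virt == MAP_FAILED ? NULL : virt;
}

/* Hypothetical helper: release committed and reserved pages alike.
 * On Unix, eal_mem_free() and rte_mem_unmap() both end in munmap(). */
static int
release_area(void *addr, size_t size)
{
	return munmap(addr, size);
}
```

Committing only the first page of a multi-page reservation leaves the remaining pages inaccessible until they are mapped. munmap() also accepts a sub-range, which eal_get_virtual_area() relies on when it trims the unaligned head and tail of a reservation with eal_mem_free().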
> diff --git a/lib/librte_eal/unix/eal_unix_memory.c b/lib/librte_eal/unix/eal_unix_memory.c
> new file mode 100644
> index 000000000..ec7156df9
> --- /dev/null
> +++ b/lib/librte_eal/unix/eal_unix_memory.c
> @@ -0,0 +1,152 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Dmitry Kozlyuk
> + */
> +
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +
> +#include "eal_private.h"
> +
> +#ifdef RTE_EXEC_ENV_LINUX
> +#define EAL_DONTDUMP MADV_DONTDUMP
> +#define EAL_DODUMP MADV_DODUMP
> +#elif defined RTE_EXEC_ENV_FREEBSD
> +#define EAL_DONTDUMP MADV_NOCORE
> +#define EAL_DODUMP MADV_CORE
> +#else
> +#error "madvise doesn't support this OS"
> +#endif
> +
> +static void *
> +mem_map(void *requested_addr, size_t size, int prot, int flags,
> + int fd, size_t offset)
> +{
> + void *virt = mmap(requested_addr, size, prot, flags, fd, offset);
> + if (virt == MAP_FAILED) {
> + RTE_LOG(DEBUG, EAL,
> + "Cannot mmap(%p, 0x%zx, 0x%x, 0x%x, %d, 0x%zx): %s\n",
> + requested_addr, size, prot, flags, fd, offset,
> + strerror(errno));
> + rte_errno = errno;
> + return NULL;
> + }
> + return virt;
> +}
> +
> +static int
> +mem_unmap(void *virt, size_t size)
> +{
> + int ret = munmap(virt, size);
> + if (ret < 0) {
> + RTE_LOG(DEBUG, EAL, "Cannot munmap(%p, 0x%zx): %s\n",
> + virt, size, strerror(errno));
> + rte_errno = errno;
> + }
> + return ret;
> +}
> +
> +void *
> +eal_mem_reserve(void *requested_addr, size_t size, int flags)
> +{
> + int sys_flags = MAP_PRIVATE | MAP_ANONYMOUS;
> +
> + if (flags & EAL_RESERVE_HUGEPAGES) {
> +#ifdef MAP_HUGETLB
> + sys_flags |= MAP_HUGETLB;
> +#else
> + rte_errno = ENOTSUP;
> + return NULL;
> +#endif
> + }
> +
> + if (flags & EAL_RESERVE_FORCE_ADDRESS)
> + sys_flags |= MAP_FIXED;
> +
> + return mem_map(requested_addr, size, PROT_NONE, sys_flags, -1, 0);
> +}
> +
> +void
> +eal_mem_free(void *virt, size_t size)
> +{
> + mem_unmap(virt, size);
> +}
> +
> +int
> +eal_mem_set_dump(void *virt, size_t size, bool dump)
> +{
> + int flags = dump ? EAL_DODUMP : EAL_DONTDUMP;
> + int ret = madvise(virt, size, flags);
> + if (ret) {
> + RTE_LOG(DEBUG, EAL, "madvise(%p, %#zx, %d) failed: %s\n",
> + virt, size, flags, strerror(rte_errno));
> + rte_errno = errno;
> + }
> + return ret;
> +}
> +
> +static int
> +mem_rte_to_sys_prot(int prot)
> +{
> + int sys_prot = PROT_NONE;
> +
> + if (prot & RTE_PROT_READ)
> + sys_prot |= PROT_READ;
> + if (prot & RTE_PROT_WRITE)
> + sys_prot |= PROT_WRITE;
> + if (prot & RTE_PROT_EXECUTE)
> + sys_prot |= PROT_EXEC;
> +
> + return sys_prot;
> +}
> +
> +void *
> +rte_mem_map(void *requested_addr, size_t size, int prot, int flags,
> + int fd, size_t offset)
> +{
> + int sys_flags = 0;
> + int sys_prot;
> +
> + sys_prot = mem_rte_to_sys_prot(prot);
> +
> + if (flags & RTE_MAP_SHARED)
> + sys_flags |= MAP_SHARED;
> + if (flags & RTE_MAP_ANONYMOUS)
> + sys_flags |= MAP_ANONYMOUS;
> + if (flags & RTE_MAP_PRIVATE)
> + sys_flags |= MAP_PRIVATE;
> + if (flags & RTE_MAP_FORCE_ADDRESS)
> + sys_flags |= MAP_FIXED;
> +
> + return mem_map(requested_addr, size, sys_prot, sys_flags, fd, offset);
> +}
> +
> +int
> +rte_mem_unmap(void *virt, size_t size)
> +{
> + return mem_unmap(virt, size);
> +}
> +
> +size_t
> +rte_mem_page_size(void)
> +{
> + static size_t page_size;
> +
> + if (!page_size)
> + page_size = sysconf(_SC_PAGESIZE);
> +
> + return page_size;
> +}
> +
> +int
> +rte_mem_lock(const void *virt, size_t size)
> +{
> + int ret = mlock(virt, size);
> + if (ret)
> + rte_errno = errno;
> + return ret;
> +}
> diff --git a/lib/librte_eal/unix/meson.build b/lib/librte_eal/unix/meson.build
> index 21029ba1a..e733910a1 100644
> --- a/lib/librte_eal/unix/meson.build
> +++ b/lib/librte_eal/unix/meson.build
> @@ -3,4 +3,5 @@
>
> sources += files(
> 'eal_file.c',
> + 'eal_unix_memory.c',
> )
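The page alignment in rte_mem_lock_page() and the dump exclusion in eal_mem_set_dump() can also be sketched outside DPDK. This is a minimal, Linux-only sketch. It uses MADV_DONTDUMP/MADV_DODUMP, whereas the patch maps FreeBSD to MADV_NOCORE/MADV_CORE. The names page_floor(), lock_page() and set_dump() are hypothetical. The sketch saves errno before logging, so the message describes the madvise() failure itself. In the quoted eal_mem_set_dump(), by contrast, strerror() receives rte_errno before rte_errno is assigned from errno.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS, MADV_DONTDUMP/MADV_DODUMP on glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round an address down to the start of its page,
 * as RTE_PTR_ALIGN_FLOOR does in rte_mem_lock_page().
 * page_sz must be a power of two. */
static uintptr_t
page_floor(uintptr_t addr, size_t page_sz)
{
	assert(page_sz != 0 && (page_sz & (page_sz - 1)) == 0);
	return addr & ~((uintptr_t)page_sz - 1);
}

/* Hypothetical helper: lock the single page that contains virt.
 * Returns 0 on success, or -1 with errno set by mlock() on failure. */
static int
lock_page(const void *virt)
{
	size_t page_sz = (size_t)sysconf(_SC_PAGESIZE);
	uintptr_t aligned = page_floor((uintptr_t)virt, page_sz);

	return mlock((const void *)aligned, page_sz);
}

/* Hypothetical helper: include (dump == true) or exclude a page-aligned
 * region from core dumps using the Linux advice values. errno is saved
 * before logging, so both the message and the caller see the madvise()
 * error rather than a stale value. */
static int
set_dump(void *virt, size_t size, bool dump)
{
	int advice = dump ? MADV_DODUMP : MADV_DONTDUMP;

	if (madvise(virt, size, advice) != 0) {
		int err = errno;

		fprintf(stderr, "madvise(%p, %#zx, %d) failed: %s\n",
			virt, size, advice, strerror(err));
		errno = err;
		return -1;
	}
	return 0;
}
```

mlock() can fail with ENOMEM or EPERM when RLIMIT_MEMLOCK is low, so callers of a page-locking helper should check its return value rather than assume success.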