From: Kamraan Nasim <knasim@sidebandnetworks.com>
To: dev@dpdk.org, newman555p@gmail.com, liran@weka.io
Subject: Re: [dpdk-dev] dev Digest, Vol 22, Issue 37
Date: Wed, 15 Apr 2015 17:43:02 -0700 [thread overview]
Message-ID: <CAPrTskgnK3XNZNkeTe_qC=yfy9O7gKpD1bY9=ckwOSx83xUyMQ@mail.gmail.com> (raw)
In-Reply-To: <mailman.9407.1420917966.2352.dev@dpdk.org>
This had me stumped for a while as well. In my case, PostgreSQL 9.4 was also
running on my system; it also used huge pages and came up before my DPDK
application, causing rte_mempool_create() to fail with ENOMEM.

Check which other applications are using huge pages:

    lsof | grep huge

Then see if you can disable huge pages for them, or increase the total number
of huge pages allocated in the kernel.
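For reference, here is a minimal set of commands to inspect and grow the huge
page pool (standard procfs/sysfs paths; the page counts are just examples):

    # how many huge pages exist, are free, or are reserved right now
    grep -i huge /proc/meminfo

    # grow the global pool (2MB pages on most x86 setups)
    echo 2048 > /proc/sys/vm/nr_hugepages

    # or grow it per NUMA node
    echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages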
--Kam
>
> Date: Sat, 10 Jan 2015 21:26:03 +0200
> From: Liran Zvibel <liran@weka.io>
> To: Newman Poborsky <newman555p@gmail.com>, "dev@dpdk.org"
> <dev@dpdk.org>
> Subject: Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
> Message-ID: <CAF28U9ORGNY7=QUKrd-ZCGn6HqBw7h6NE7wxUszf6WxOY18geg@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> Hi Newman,
>
> There are two options: either one of your pools is very large and simply
> does not fit in half of the memory (so if the physical memory must be split
> across the NUMA nodes it can just never work), or what you're seeing is
> specific to your environment, and when allocating from both NUMA nodes the
> huge pages just happen to be too scattered for your pools to be allocated.
>
> In any case, we also have to deal with large pools that don't always fit
> into consecutive huge pages as allocated by the kernel. I have created a
> small patch to DPDK itself, plus some more code that can live as part of
> the DPDK application and does the scattered allocation.
>
> I'm going to send both parts here (the change to DPDK and the user part).
> I don't know the rules for pushing to the repository, so I won't try to do
> so.
>
> First, the DPDK patch, which just makes sure that the huge pages are mapped
> into contiguous virtual memory, and that the memory segments are then
> allocated contiguously in virtual memory. I'm attaching the full mbox
> content to make it easier for you to use if you'd like. I created it
> against 1.7.1, since that is the version we're using. If you'd like, I can
> also create it against 1.8.0.
>
> ====================================================
>
> >From 10ebc74eda2c3fe9e5a34815e0f7ee1f44d99aa3 Mon Sep 17 00:00:00 2001
> From: Liran Zvibel <liran@weka.io>
> Date: Sat, 10 Jan 2015 12:46:54 +0200
> Subject: [PATCH] Add an option to allocate huge pages at contiguous virtual
> addresses
> To: dev@dpdk.org
>
> Add a configuration option: CONFIG_RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
> that advises the memory segment allocation code to allocate as many
> hugepages as possible at contiguous virtual addresses.
>
> This way, a mempool may be created out of dispersed memzones allocated
> from these new contiguous memory segments.
> ---
> lib/librte_eal/linuxapp/eal/eal_memory.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index f2454f4..b8d68b0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -329,6 +329,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>
> #ifndef RTE_EAL_SINGLE_FILE_SEGMENTS
> else if (vma_len == 0) {
> +#ifndef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
> unsigned j, num_pages;
>
> /* reserve a virtual area for next contiguous
> @@ -340,6 +341,14 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
> break;
> }
> num_pages = j - i;
> +#else // hugepages will be allocated at contiguous virtual addresses
> +			unsigned num_pages;
> +			/* We will reserve a virtual area large enough
> +			 * to fit ALL physical blocks.
> +			 * This way we can have bigger mempools even
> +			 * if there is no contiguous physical region.
> +			 */
> +			num_pages = hpi->num_pages[0] - i;
> +#endif
> vma_len = num_pages * hugepage_sz;
>
> /* get the biggest virtual memory area up to
> @@ -1268,6 +1277,16 @@ rte_eal_hugepage_init(void)
> new_memseg = 1;
>
> if (new_memseg) {
> +#ifdef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
> +			if (0 <= j) {
> +				RTE_LOG(DEBUG, EAL, "Closing memory segment #%d(%p) vaddr is %p "
> +					"phys is 0x%lx size is 0x%lx "
> +					"which is #%ld pages next vaddr will be at 0x%lx\n",
> +					j, &mcfg->memseg[j], mcfg->memseg[j].addr,
> +					mcfg->memseg[j].phys_addr, mcfg->memseg[j].len,
> +					mcfg->memseg[j].len / mcfg->memseg[j].hugepage_sz,
> +					mcfg->memseg[j].addr_64 + mcfg->memseg[j].len);
> +			}
> +#endif
> j += 1;
> if (j == RTE_MAX_MEMSEG)
> break;
> --
> 1.9.3 (Apple Git-50)
>
> ================================================================
>
> Then there is the dpdk-application library part that implements the
> struct rte_mempool *scattered_mempool_create(uint32_t elt_size,
>                 uint32_t elt_num, int32_t socket_id,
>                 rte_mempool_ctor_t *mp_init, void *mp_init_arg,
>                 rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg)
>
> interface. If you would like, I can easily break the different
> functions into their right places in the rte_memseg and rte_mempool
> DPDK modules and have it included as another interface of the DPDK
> library (as suggested by Konstantin below).
>
> =====================================================
> static inline int is_memseg_valid(struct rte_memseg *free_memseg,
>                                   size_t requested_page_size,
>                                   int socket_id)
> {
>         if (free_memseg->len == 0) {
>                 return 0;
>         }
>
>         if (socket_id != SOCKET_ID_ANY &&
>             free_memseg->socket_id != SOCKET_ID_ANY &&
>             free_memseg->socket_id != socket_id) {
>                 RTE_LOG(DEBUG, USER1, "memseg does not qualify for socket_id, requested %d got %d",
>                         socket_id, free_memseg->socket_id);
>                 return 0;
>         }
>
>         if (free_memseg->len < requested_page_size) {
>                 RTE_LOG(DEBUG, USER1, "memseg too small. len %lu < requested_page_size %lu",
>                         free_memseg->len, requested_page_size);
>                 return 0;
>         }
>
>         if (free_memseg->hugepage_sz != requested_page_size) {
>                 RTE_LOG(DEBUG, USER1, "memseg hugepage size != requested page size %lu != %lu",
>                         free_memseg->hugepage_sz, requested_page_size);
>                 return 0;
>         }
>
>         return 1;
> }
>
> static int try_allocating_memseg_range(struct rte_memseg *free_memseg, int start,
>                                        int requested_page_size, size_t len, int socket_id)
> {
>         int i;
>         for (i = start; i < RTE_MAX_MEMSEG; i++) {
>                 if (free_memseg[i].addr == NULL) {
>                         return -1;
>                 }
>
>                 if (!is_memseg_valid(free_memseg + i, requested_page_size, socket_id)) {
>                         return -1;
>                 }
>
>                 if ((start != i) &&
>                     ((char *)free_memseg[i].addr !=
>                      (char *)free_memseg[i-1].addr + free_memseg[i-1].len)) {
>                         RTE_LOG(DEBUG, USER1, "Looking for cont memseg range. "
>                                 "[%d].vaddr %p != [%d].vaddr %p + [i-1].len %lu == %p",
>                                 i, free_memseg[i].addr, i-1, free_memseg[i-1].addr,
>                                 free_memseg[i-1].len,
>                                 (char *)(free_memseg[i-1].addr) + free_memseg[i-1].len);
>                         return -1;
>                 }
>
>                 if ((free_memseg[i].len < len) &&
>                     ((free_memseg[i].len % requested_page_size) != 0)) {
>                         RTE_LOG(DEBUG, USER1, "#%d memseg length not a multiple of page size, or last."
>                                 " len %lu len %% requested_pg_size %lu, requested_pg_sz %d",
>                                 i, free_memseg[i].len,
>                                 free_memseg[i].len % requested_page_size, requested_page_size);
>                         return -1;
>                 }
>
>                 if (len <= free_memseg[i].len) {
>                         RTE_LOG(DEBUG, USER1, "Successfully finished looking for memsegs. remaining req. "
>                                 "len %lu seg_len %lu, start %d i %d",
>                                 len, free_memseg[i].len, start, i);
>                         return i - start + 1;
>                 }
>
>                 if (i == start) {
>                         // We may not start at the beginning; have to move to the next page-size alignment...
>                         char *aligned_vaddr = RTE_PTR_ALIGN_CEIL(free_memseg[i].addr, requested_page_size);
>                         size_t diff = (size_t)(aligned_vaddr - (char *)free_memseg[i].addr);
>                         if ((free_memseg[i].len - diff) % requested_page_size != 0) {
>                                 RTE_LOG(ERR, USER1, "BUG! First segment is not page aligned! vaddr %p aligned "
>                                         "vaddr %p diff %lu len %lu, len - diff %lu, "
>                                         "(len%%diff)/%d == %lu",
>                                         free_memseg[i].addr, aligned_vaddr, diff, free_memseg[i].len,
>                                         free_memseg[i].len - diff, requested_page_size,
>                                         (free_memseg[i].len - diff) % requested_page_size);
>                                 return -1;
>                         } else if (0 == free_memseg[i].len - diff) {
>                                 RTE_LOG(DEBUG, USER1, "After alignment, first memseg is empty!");
>                                 return -1;
>                         }
>
>                         RTE_LOG(DEBUG, USER1, "First memseg gives (after alignment) len %lu out of potential %lu",
>                                 (free_memseg[i].len - diff), free_memseg[i].len);
>                         len -= (free_memseg[i].len - diff);
>                 }
>                 len -= free_memseg[i].len;
>         }
>
>         return -1;
> }
>
>
> /**
>  * Will register several memory zones, at contiguous virtual addresses, of large size.
>  * All memzones but the last will use full pages; only the last memzone may
>  * request less than a full hugepage.
>  *
>  * It will go through all the free memory segments; once it finds a memory
>  * segment with full hugepages, it will check whether it can start allocating
>  * from that memory segment on.
>  */
> static const struct rte_memzone *
> memzone_reserve_multiple_cont_mz(const char *basename, size_t *zones_len,
>                                  size_t len, int socket_id,
>                                  unsigned flags, unsigned align)
> {
>         struct rte_mem_config *mcfg;
>         const struct rte_memzone *ret = NULL;
>         size_t requested_page_size;
>         int i;
>         struct rte_memseg *free_memseg = NULL;
>         int first_memseg = -1;
>         int memseg_count = -1;
>
>         mcfg = rte_eal_get_configuration()->mem_config;
>         free_memseg = mcfg->free_memseg;
>
>         RTE_LOG(DEBUG, USER1, "mcfg is at %p free_memseg at %p memseg at %p",
>                 mcfg, mcfg->free_memseg, mcfg->memseg);
>
>         for (i = 0; i < 10 && (free_memseg[i].addr != NULL); i++) {
>                 RTE_LOG(DEBUG, USER1, "free_memseg[%d] : vaddr 0x%p phys_addr 0x%p len %lu pages: %lu [0x%lu]",
>                         i, free_memseg[i].addr, (void *)free_memseg[i].phys_addr,
>                         free_memseg[i].len, free_memseg[i].len / free_memseg[i].hugepage_sz,
>                         free_memseg[i].hugepage_sz);
>         }
>
>         for (i = 0; i < 10 && (mcfg->memseg[i].addr != NULL); i++) {
>                 RTE_LOG(DEBUG, USER1, "memseg[%d] : vaddr 0x%p phys_addr 0x%p len %lu pages: %lu [0x%lu]",
>                         i, mcfg->memseg[i].addr, (void *)mcfg->memseg[i].phys_addr,
>                         mcfg->memseg[i].len, mcfg->memseg[i].len / mcfg->memseg[i].hugepage_sz,
>                         mcfg->memseg[i].hugepage_sz);
>         }
>
>         *zones_len = 0;
>
>         if (mcfg->memzone_idx >= RTE_MAX_MEMZONE) {
>                 RTE_LOG(DEBUG, USER1, "No more room for new memzones");
>                 return NULL;
>         }
>
>         if ((flags & (RTE_MEMZONE_2MB | RTE_MEMZONE_1GB)) == 0) {
>                 RTE_LOG(DEBUG, USER1, "Must request either 2MB or 1GB pages");
>                 return NULL;
>         }
>
>         if ((flags & RTE_MEMZONE_1GB) && (flags & RTE_MEMZONE_2MB)) {
>                 RTE_LOG(DEBUG, USER1, "Cannot request both 1GB and 2MB pages");
>                 return NULL;
>         }
>
>         if (flags & RTE_MEMZONE_2MB) {
>                 requested_page_size = RTE_PGSIZE_2M;
>         } else {
>                 requested_page_size = RTE_PGSIZE_1G;
>         }
>
>         if (len < requested_page_size) {
>                 RTE_LOG(DEBUG, USER1, "Requested length %lu is smaller than requested page size %lu",
>                         len, requested_page_size);
>                 return NULL;
>         }
>
>         ret = rte_memzone_reserve_aligned(basename, len, socket_id, flags, align);
>         if (ret != NULL) {
>                 RTE_LOG(DEBUG, USER1, "Normal rte_memzone_reserve_aligned worked!");
>                 *zones_len = 1;
>                 return ret;
>         }
>
>         RTE_LOG(DEBUG, USER1, "rte_memzone_reserve_aligned failed. Will have to allocate on our own");
>         rte_rwlock_write_lock(&mcfg->mlock);
>
>         for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>                 if (free_memseg[i].addr == NULL) {
>                         break;
>                 }
>
>                 if (!is_memseg_valid(free_memseg + i, requested_page_size, socket_id)) {
>                         continue;
>                 }
>
>                 memseg_count = try_allocating_memseg_range(free_memseg, i,
>                                                            requested_page_size, len, socket_id);
>                 if (0 < memseg_count) {
>                         RTE_LOG(DEBUG, USER1, "Was able to find memsegments for zone! "
>                                 "first segment: %d segment_count %d len %lu",
>                                 i, memseg_count, len);
>                         first_memseg = i;
>
>                         // Fix first memseg -- make sure it's page aligned!
>                         char *aligned_vaddr = RTE_PTR_ALIGN_CEIL(free_memseg[i].addr,
>                                                                  requested_page_size);
>                         size_t diff = (size_t)(aligned_vaddr - (char *)free_memseg[i].addr);
>                         RTE_LOG(DEBUG, USER1, "Decreasing first segment by %lu", diff);
>                         free_memseg[i].addr = aligned_vaddr;
>                         free_memseg[i].phys_addr += diff;
>                         free_memseg[i].len -= diff;
>                         if ((free_memseg[i].phys_addr % requested_page_size != 0)) {
>                                 RTE_LOG(ERR, USER1, "After aligning first free memseg, "
>                                         "physical address NOT page aligned! %p",
>                                         (void *)free_memseg[i].phys_addr);
>                                 abort();
>                         }
>
>                         break;
>                 }
>         }
>
>         if (first_memseg < 0) {
>                 RTE_LOG(DEBUG, USER1, "Could not find memsegs to allocate enough memory");
>                 goto out;
>         }
>
>         // now perform the actual allocation.
>         if (mcfg->memzone_idx + memseg_count >= RTE_MAX_MEMZONE) {
>                 RTE_LOG(DEBUG, USER1, "There are not enough memzones to allocate. "
>                         "memzone_idx %d memseg_count %d max %s=%d",
>                         mcfg->memzone_idx, memseg_count,
>                         RTE_STR(RTE_MAX_MEMZONE), RTE_MAX_MEMZONE);
>                 goto out;
>         }
>
>         ret = &mcfg->memzone[mcfg->memzone_idx];
>         *zones_len = memseg_count;
>         for (i = first_memseg; i < first_memseg + memseg_count; i++) {
>                 size_t allocated_length;
>                 if (free_memseg[i].len <= len) {
>                         allocated_length = free_memseg[i].len;
>                 } else {
>                         allocated_length = len;
>                 }
>
>                 struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
>                 snprintf(mz->name, sizeof(mz->name), "%s%d", basename, i - first_memseg);
>                 mz->phys_addr = free_memseg[i].phys_addr;
>                 mz->addr = free_memseg[i].addr;
>                 mz->len = allocated_length;
>                 mz->hugepage_sz = free_memseg[i].hugepage_sz;
>                 mz->socket_id = free_memseg[i].socket_id;
>                 mz->flags = 0;
>                 mz->memseg_id = i;
>
>                 free_memseg[i].len -= allocated_length;
>                 free_memseg[i].phys_addr += allocated_length;
>                 free_memseg[i].addr_64 += allocated_length;
>                 len -= allocated_length;
>         }
>
>         if (len != 0) {
>                 RTE_LOG(DEBUG, USER1, "After registering all the memzones, len is too small! Len is %lu", len);
>                 ret = NULL;
>                 goto out;
>         }
> out:
>         rte_rwlock_write_unlock(&mcfg->mlock);
>         return ret;
> }
>
>
> static inline void build_physical_pages(phys_addr_t *phys_pages, int num_phys_pages,
>                                         size_t sz, const struct rte_memzone *mz,
>                                         int num_zones)
> {
>         size_t accounted_for_size = 0;
>         int curr_page = 0;
>         int i;
>         unsigned j;
>
>         RTE_LOG(DEBUG, USER1, "Phys pages are at %p 2M is %d mz pagesize is %lu trailing zeros: %d",
>                 phys_pages, RTE_PGSIZE_2M, mz->hugepage_sz, __builtin_ctz(mz->hugepage_sz));
>
>         for (i = 0; i < num_zones; i++) {
>                 size_t mz_remaining_len = mz[i].len;
>                 for (j = 0; (j <= mz[i].len / RTE_PGSIZE_2M) && (0 < mz_remaining_len); j++) {
>                         phys_pages[curr_page++] = mz[i].phys_addr + j * RTE_PGSIZE_2M;
>
>                         size_t added_len = RTE_MIN((size_t)RTE_PGSIZE_2M, mz_remaining_len);
>                         accounted_for_size += added_len;
>                         mz_remaining_len -= added_len;
>
>                         if (sz <= accounted_for_size) {
>                                 RTE_LOG(DEBUG, USER1, "Filled in %d pages of the physical pages array",
>                                         curr_page);
>                                 return;
>                         }
>                         if (num_phys_pages < curr_page) {
>                                 RTE_LOG(ERR, USER1, "When building physical pages array, "
>                                         "used pages (%d) is more than allocated pages %d. "
>                                         "accounted size %lu size %lu",
>                                         curr_page, num_phys_pages, accounted_for_size, sz);
>                                 abort();
>                         }
>                 }
>         }
>
>         if (accounted_for_size < sz) {
>                 RTE_LOG(ERR, USER1, "Finished going over %d memory zones, and still accounted size is %lu "
>                         "and requested size is %lu",
>                         num_zones, accounted_for_size, sz);
>                 abort();
>         }
> }
>
> struct rte_mempool *scattered_mempool_create(uint32_t elt_size,
>                                              uint32_t elt_num, int32_t socket_id,
>                                              rte_mempool_ctor_t *mp_init, void *mp_init_arg,
>                                              rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg)
> {
>         struct rte_mempool *mp;
>         const struct rte_memzone *mz;
>         size_t num_zones;
>         struct rte_mempool_objsz obj_sz;
>         uint32_t flags, total_size;
>         size_t sz;
>
>         flags = (MEMPOOL_F_NO_SPREAD | MEMPOOL_F_SC_GET | MEMPOOL_F_SP_PUT);
>
>         total_size = rte_mempool_calc_obj_size(elt_size, flags, &obj_sz);
>
>         sz = elt_num * total_size;
>         /* We now have to account for the "gaps" at the end of each page. Worst case is
>          * that we get all distinct pages, so we have to add the gap for each possible page */
>         int pages_num = (sz + RTE_PGSIZE_2M - 1) / RTE_PGSIZE_2M;
>         int page_gap = RTE_PGSIZE_2M % elt_size;
>         sz += pages_num * page_gap;
>
>         RTE_LOG(DEBUG, USER1, "Will have to allocate %d 2M pages for the page table.", pages_num);
>
>         if ((mz = memzone_reserve_multiple_cont_mz("data_obj", &num_zones, sz, socket_id,
>                                                    RTE_MEMZONE_2MB, RTE_PGSIZE_2M)) == NULL) {
>                 RTE_LOG(WARNING, USER1, "memzone reserve multi mz returned NULL for socket id %d, will try ANY",
>                         socket_id);
>                 if ((mz = memzone_reserve_multiple_cont_mz("data_obj", &num_zones, sz, SOCKET_ID_ANY,
>                                                            RTE_MEMZONE_2MB, RTE_PGSIZE_2M)) == NULL) {
>                         RTE_LOG(ERR, USER1, "memzone reserve multi mz returned NULL even for any socket");
>                         return NULL;
>                 } else {
>                         RTE_LOG(DEBUG, USER1, "memzone reserve multi mz returned %p with %lu zones for SOCKET_ID_ANY",
>                                 mz, num_zones);
>                 }
>         } else {
>                 RTE_LOG(DEBUG, USER1, "memzone reserve multi mz returned %p with %lu zones for size %lu socket %d",
>                         mz, num_zones, sz, socket_id);
>         }
>
>         // Now we will "break" the pages into smaller ones
>         phys_addr_t *phys_pages = malloc(sizeof(phys_addr_t) * pages_num);
>         if (phys_pages == NULL) {
>                 RTE_LOG(DEBUG, USER1, "phys_pages is null. aborting");
>                 abort();
>         }
>
>         build_physical_pages(phys_pages, pages_num, sz, mz, num_zones);
>         RTE_LOG(DEBUG, USER1, "Beginning of vaddr is %p beginning of physical addr is 0x%lx",
>                 mz->addr, mz->phys_addr);
>         mp = rte_mempool_xmem_create("data_pool", elt_num, elt_size,
>                                      257, sizeof(struct rte_pktmbuf_pool_private),
>                                      mp_init, mp_init_arg, obj_init, obj_init_arg,
>                                      socket_id, flags, (char *)mz[0].addr,
>                                      phys_pages, pages_num, rte_bsf32(RTE_PGSIZE_2M));
>
>         RTE_LOG(DEBUG, USER1, "rte_mempool_xmem_create returned %p", mp);
>         return mp;
> }
>
> =================================================================
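>
> For illustration, a minimal usage sketch of the above (untested; it assumes
> the standard rte_pktmbuf_pool_init()/rte_pktmbuf_init() constructors, and
> the element size and count below are just example values):
>
> #include <rte_mbuf.h>
>
> static struct rte_mempool *pktmbuf_pool;
>
> static int setup_scattered_pool(int socket_id)
> {
>         /* 2048-byte elements, 65536 of them -- example numbers only */
>         pktmbuf_pool = scattered_mempool_create(2048, 65536, socket_id,
>                                                 rte_pktmbuf_pool_init, NULL,
>                                                 rte_pktmbuf_init, NULL);
>         return (pktmbuf_pool == NULL) ? -1 : 0;
> }
>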
>
> Please let me know if you have any questions/comments about this code.
>
> Best Regards,
>
> Liran.
>
> On Jan 8, 2015, at 10:19, Newman Poborsky <newman555p@gmail.com> wrote:
>
> I finally found the time to try this, and I noticed that on a server
> with 1 NUMA node this works, but if the server has 2 NUMA nodes then, by
> the default memory policy, the reserved hugepages are divided between the
> nodes and the DPDK test app again fails for the reason already mentioned.
> I found out that the 'solution' for this is to deallocate the hugepages on
> node1 (after boot) and leave them only on node0:
> echo 0 >
> /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
>
> Could someone please explain what changes when there are hugepages on
> both nodes? Does this cause some memory fragmentation so that there
> aren't enough contiguous segments? If so, how?
>
> Thanks!
>
> Newman
>
> On Mon, Dec 22, 2014 at 11:48 AM, Newman Poborsky <newman555p@gmail.com>
> wrote:
>
> On Sat, Dec 20, 2014 at 2:34 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>
> You can reserve hugepages on the kernel cmdline (GRUB).
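>
> For example, something like the following on the kernel command line (the
> page counts are illustrative; 1GB pages also require CPU support, the
> pdpe1gb flag):
>
>   default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=512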
>
>
> Great, thanks, I'll try that!
>
> Newman
>
>
> On Fri, Dec 19, 2014 at 12:13 PM, Newman Poborsky <newman555p@gmail.com>
> wrote:
>
>
> On Thu, Dec 18, 2014 at 9:03 PM, Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
>
>
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Thursday, December 18, 2014 5:43 PM
> To: Newman Poborsky; dev@dpdk.org
> Subject: Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
>
> Hi
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Newman Poborsky
> Sent: Thursday, December 18, 2014 1:26 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] rte_mempool_create fails with ENOMEM
>
> Hi,
>
> could someone please provide any explanation why sometimes mempool creation
> fails with ENOMEM?
>
> I run my test app several times without any problems and then I start
> getting ENOMEM errors when creating the mempools that are used for packets.
> I try to delete everything from /mnt/huge, I increase the number of huge
> pages, remount /mnt/huge, but nothing helps.
>
> There is more than enough memory on the server. I tried to debug the
> rte_mempool_create() call and it seems that after the server is restarted
> the free mem segments are bigger than 2MB, but after running the test app
> several times, it seems that all free mem segments have a size of 2MB, and
> since I am requesting 8MB for my packet mempool, this fails. I'm not really
> sure that this conclusion is correct.
>
>
> Yes, rte_mempool_create() uses rte_memzone_reserve() to allocate a
> single physically contiguous chunk of memory.
> If no such chunk exists, then it will fail.
> Why physically contiguous?
> The main reason is to make things easier for us, as in that case we don't
> have to worry about the situation when an mbuf crosses a page boundary.
> So you can overcome that problem like this:
> Allocate the max amount of memory you would need to hold all mbufs in the
> worst case (all pages physically disjoint) using rte_malloc().
>
>
> Actually, my mistake: rte_malloc() wouldn't help you here.
> You probably need to allocate some external (not managed by the EAL)
> memory in that case, maybe mmap() with MAP_HUGETLB, or something similar.
>
> Figure out its physical mappings.
> Call rte_mempool_xmem_create().
> You can look at app/test-pmd/mempool_anon.c as a reference.
> It uses the same approach to create a mempool over 4K pages.
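>
> For what it's worth, here is a rough, untested sketch of that approach (the
> sizes and the helper name are made up, and error handling is minimal). It
> maps huge pages outside the EAL, resolves their physical addresses with
> rte_mem_virt2phy(), and hands both to rte_mempool_xmem_create():
>
> #include <sys/mman.h>
> #include <rte_common.h>
> #include <rte_memory.h>
> #include <rte_mempool.h>
>
> #define EXT_MEM_SZ (64 * 1024 * 1024)        /* example: 64MB of 2MB pages */
> #define PG_NUM     (EXT_MEM_SZ / RTE_PGSIZE_2M)
>
> static struct rte_mempool *
> ext_hugepage_mempool(unsigned elt_num, unsigned elt_size, int socket_id)
> {
>         /* 1. allocate memory the EAL does not manage; MAP_POPULATE faults
>          *    the pages in so they have physical frames behind them */
>         void *va = mmap(NULL, EXT_MEM_SZ, PROT_READ | PROT_WRITE,
>                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
>                         -1, 0);
>         if (va == MAP_FAILED)
>                 return NULL;
>
>         /* 2. figure out the physical address of every 2MB page
>          *    (rte_mem_virt2phy() reads /proc/self/pagemap) */
>         phys_addr_t pa[PG_NUM];
>         unsigned i;
>         for (i = 0; i < PG_NUM; i++)
>                 pa[i] = rte_mem_virt2phy((char *)va + i * RTE_PGSIZE_2M);
>
>         /* 3. build the mempool on top of that memory; the caller must ensure
>          *    EXT_MEM_SZ >= rte_mempool_xmem_size(elt_num, elt_size,
>          *    rte_bsf32(RTE_PGSIZE_2M)) */
>         return rte_mempool_xmem_create("ext_pool", elt_num, elt_size,
>                         256, 0, NULL, NULL, NULL, NULL,
>                         socket_id, 0, va, pa, PG_NUM,
>                         rte_bsf32(RTE_PGSIZE_2M));
> }
>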
>
> We will probably add a similar function into the mempool API
> (create_scatter_mempool or something),
> or just add a new flag (USE_SCATTER_MEM) to rte_mempool_create().
> Though right now it is not there.
>
> Another quick alternative - use 1G pages.
>
> Konstantin
>
>
>
>
> Ok, thanks for the explanation. I understand that this is probably more of
> an OS question than a DPDK one, but is there a way to again allocate
> contiguous memory for the n-th run of my test app? It seems that the
> hugepages get divided/separated into individual 2MB hugepages. Shouldn't
> the OS's memory management try to group those hugepages back into one
> contiguous chunk once my app/process is done? Again, I know very little
> about Linux memory management and hugepages, so forgive me if this is a
> stupid question.
> Is rebooting the OS the only way to deal with this problem? Or should I
> just try to use 1GB hugepages?
>
> p.s. Konstantin, sorry for the double reply, I accidentally forgot to
> include dev list in my first reply :)
>
> Newman
>
>
>
> Does anybody have any idea what to check and how running my test app
> several times affects hugepages?
>
> For me, this doesn't make any sense because after the test app exits,
> resources should be freed, right?
>
> This has been driving me crazy for days now. I tried reading a bit more
> theory about hugepages, but didn't find out anything that could help me.
>
> Maybe it's something else and completely trivial, but I can't figure it
> out, so any help is appreciated.
>
> Thank you!
>
> BR,
> Newman P.
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> dev mailing list
> dev@dpdk.org
> http://dpdk.org/ml/listinfo/dev
>
>
> ------------------------------
>
> End of dev Digest, Vol 22, Issue 37
> ***********************************
>