From: Kamraan Nasim
To: dev@dpdk.org, newman555p@gmail.com, liran@weka.io
Date: Wed, 15 Apr 2015 17:43:02 -0700
Subject: Re: [dpdk-dev] dev Digest, Vol 22, Issue 37

This had me stumped for a while as well. In my case, PostgreSQL 9.4 was
also running on my system; it also used huge pages, and it came up before
my DPDK application, causing rte_mempool_create() to fail with ENOMEM.

Check which other applications are using huge pages:

    lsof | grep huge

and see whether you can disable huge pages for them, or increase the
total number of pages you're reserving in the kernel.
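For example, you can also sanity-check what the kernel thinks is left
before creating the pool. A minimal sketch (plain /proc/meminfo parsing,
which covers the default huge page size only; this is not a DPDK API):

#include <stdio.h>

/* Return the number of free huge pages reported by the kernel,
 * or -1 if /proc/meminfo could not be read. */
static long free_hugepages(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];
    long n = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (sscanf(line, "HugePages_Free: %ld", &n) == 1)
            break;
    }
    fclose(f);
    return n;
}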
--Kam

> Date: Sat, 10 Jan 2015 21:26:03 +0200
> From: Liran Zvibel
> To: Newman Poborsky, "dev@dpdk.org"
> Subject: Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
>
> Hi Newman,
>
> There are two options: either one of your pools is very large and simply
> does not fit in half of the memory, so if the physical memory must be
> split it can never work; or what you're seeing is localized to your
> environment, and when allocating from both NUMA nodes the huge pages
> just happen to be too scattered for your pools to be allocated.
>
> In any case, we also have to deal with large pools that don't always fit
> into consecutive huge pages as allocated by the kernel. I have created a
> small patch to DPDK itself, plus some more code that can live as part of
> the DPDK application, that together do the scattered allocation.
>
> I'm going to send both parts here (the change to DPDK and the user
> part). I don't know what the rules for pushing to the repository are,
> so I won't try to do so.
>
> First, the DPDK patch, which just makes sure that the huge pages are
> mapped into contiguous virtual memory, and that the memory segments are
> then allocated contiguously in virtual memory. I'm attaching the full
> mbox content to make it easier for you to use if you'd like. I created
> it against 1.7.1, since that is the version we're using. If you'd like,
> I can also create it against 1.8.0.
>
> ====================================================
>
> From 10ebc74eda2c3fe9e5a34815e0f7ee1f44d99aa3 Mon Sep 17 00:00:00 2001
> From: Liran Zvibel
> Date: Sat, 10 Jan 2015 12:46:54 +0200
> Subject: [PATCH] Add an option to allocate huge pages at contiguous
>  virtual addresses
> To: dev@dpdk.org
>
> Add a configuration option, CONFIG_RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR,
> that advises the memory segment allocation code to allocate as many
> huge pages as possible contiguously in virtual addresses.
>
> This way, a mempool may be created out of dispersed memzones allocated
> from these new contiguous memory segments.
> ---
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index f2454f4..b8d68b0 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -329,6 +329,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>
>  #ifndef RTE_EAL_SINGLE_FILE_SEGMENTS
>  	else if (vma_len == 0) {
> +#ifndef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
>  		unsigned j, num_pages;
>
>  		/* reserve a virtual area for next contiguous
> @@ -340,6 +341,14 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,
>  			break;
>  		}
>  		num_pages = j - i;
> +#else /* huge pages will be allocated contiguously in virtual addresses */
> +		unsigned num_pages;
> +
> +		/* We will reserve a virtual area large enough to fit ALL
> +		 * physical blocks. This way we can have bigger mempools
> +		 * even if there is no contiguous physical region. */
> +		num_pages = hpi->num_pages[0] - i;
> +#endif
>  		vma_len = num_pages * hugepage_sz;
>
>  		/* get the biggest virtual memory area up to
> @@ -1268,6 +1277,16 @@ rte_eal_hugepage_init(void)
>  			new_memseg = 1;
>
>  		if (new_memseg) {
> +#ifdef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR
> +			if (0 <= j) {
> +				RTE_LOG(DEBUG, EAL, "Closing memory segment #%d(%p) vaddr is %p phys is 0x%lx size is 0x%lx "
> +					"which is #%ld pages; next vaddr will be at 0x%lx\n",
> +					j, &mcfg->memseg[j],
> +					mcfg->memseg[j].addr, mcfg->memseg[j].phys_addr, mcfg->memseg[j].len,
> +					mcfg->memseg[j].len / mcfg->memseg[j].hugepage_sz,
> +					mcfg->memseg[j].addr_64 + mcfg->memseg[j].len);
> +			}
> +#endif
>  			j += 1;
>  			if (j == RTE_MAX_MEMSEG)
>  				break;
> --
> 1.9.3 (Apple Git-50)
>
> ================================================================
>
> Then there is the dpdk-application library part, which implements this
> interface:
>
> struct rte_mempool *scattered_mempool_create(uint32_t elt_size,
>                         uint32_t elt_num, int32_t socket_id,
>                         rte_mempool_ctor_t *mp_init, void *mp_init_arg,
>                         rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg);
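> As a usage sketch (not part of the code above; the element size, the
> count, and the pktmbuf constructors are just an example), an application
> could create a packet pool over scattered segments like this:
>
> #include <rte_mbuf.h>
> #include <rte_mempool.h>
>
> /* Illustrative: a 16K x 2KB packet pool on socket 0, built through the
>  * scattered allocator with the standard pktmbuf constructors. */
> static struct rte_mempool *make_pkt_pool(void)
> {
>     return scattered_mempool_create(2048, 16384, 0,
>                                     rte_pktmbuf_pool_init, NULL,
>                                     rte_pktmbuf_init, NULL);
> }
>
> The private-data size is already handled inside (it passes
> sizeof(struct rte_pktmbuf_pool_private) to rte_mempool_xmem_create()),
> so the usual pktmbuf constructors fit naturally.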
> If you would like, I can easily break the different functions into
> their right places in the rte_memseg and rte_mempool DPDK modules, and
> have this included as another interface of the DPDK library (as
> suggested by Konstantin below).
>
> =====================================================
>
> static inline int is_memseg_valid(struct rte_memseg *free_memseg,
>                                   size_t requested_page_size,
>                                   int socket_id)
> {
>     if (free_memseg->len == 0)
>         return 0;
>
>     if (socket_id != SOCKET_ID_ANY &&
>         free_memseg->socket_id != SOCKET_ID_ANY &&
>         free_memseg->socket_id != socket_id) {
>         RTE_LOG(DEBUG, USER1, "memseg does not qualify for socket_id, requested %d got %d",
>             socket_id, free_memseg->socket_id);
>         return 0;
>     }
>
>     if (free_memseg->len < requested_page_size) {
>         RTE_LOG(DEBUG, USER1, "memseg too small: len %lu < requested_page_size %lu",
>             free_memseg->len, requested_page_size);
>         return 0;
>     }
>
>     if (free_memseg->hugepage_sz != requested_page_size) {
>         RTE_LOG(DEBUG, USER1, "memseg hugepage size != requested page size: %lu != %lu",
>             free_memseg->hugepage_sz, requested_page_size);
>         return 0;
>     }
>
>     return 1;
> }
>
> static int try_allocating_memseg_range(struct rte_memseg *free_memseg, int start,
>                                        size_t requested_page_size, size_t len,
>                                        int socket_id)
> {
>     int i;
>
>     for (i = start; i < RTE_MAX_MEMSEG; i++) {
>         if (free_memseg[i].addr == NULL)
>             return -1;
>
>         if (!is_memseg_valid(free_memseg + i, requested_page_size, socket_id))
>             return -1;
>
>         /* Segments must be adjacent in virtual memory. */
>         if ((start != i) &&
>             ((char *)free_memseg[i].addr !=
>              (char *)free_memseg[i-1].addr + free_memseg[i-1].len)) {
>             RTE_LOG(DEBUG, USER1, "Looking for cont memseg range. "
>                 "[%d].vaddr %p != [%d].vaddr %p + [i-1].len %lu == %p",
>                 i, free_memseg[i].addr, i-1, free_memseg[i-1].addr,
>                 free_memseg[i-1].len,
>                 (char *)(free_memseg[i-1].addr) + free_memseg[i-1].len);
>             return -1;
>         }
>
>         if ((free_memseg[i].len < len) &&
>             ((free_memseg[i].len % requested_page_size) != 0)) {
>             RTE_LOG(DEBUG, USER1, "#%d memseg length not a multiple of page size, or last. "
>                 "len %lu len %% requested_pg_size %lu, requested_pg_sz %lu",
>                 i, free_memseg[i].len,
>                 free_memseg[i].len % requested_page_size, requested_page_size);
>             return -1;
>         }
>
>         if (len <= free_memseg[i].len) {
>             RTE_LOG(DEBUG, USER1, "Successfully finished looking for memsegs. remaining req. "
>                 "len %lu seg_len %lu, start %d i %d",
>                 len, free_memseg[i].len, start, i);
>             return i - start + 1;
>         }
>
>         if (i == start) {
>             /* We may not start at the beginning of the segment; move to
>              * the next page-size alignment first. */
>             char *aligned_vaddr = RTE_PTR_ALIGN_CEIL(free_memseg[i].addr,
>                                       requested_page_size);
>             size_t diff = (size_t)(aligned_vaddr - (char *)free_memseg[i].addr);
>
>             if ((free_memseg[i].len - diff) % requested_page_size != 0) {
>                 RTE_LOG(ERR, USER1, "BUG! First segment is not page aligned! vaddr %p aligned "
>                     "vaddr %p diff %lu len %lu, len - diff %lu, "
>                     "(len%%diff)/%lu == %lu",
>                     free_memseg[i].addr, aligned_vaddr, diff,
>                     free_memseg[i].len, free_memseg[i].len - diff,
>                     requested_page_size,
>                     (free_memseg[i].len - diff) % requested_page_size);
>                 return -1;
>             } else if (0 == free_memseg[i].len - diff) {
>                 RTE_LOG(DEBUG, USER1, "After alignment, first memseg is empty!");
>                 return -1;
>             }
>
>             RTE_LOG(DEBUG, USER1, "First memseg gives (after alignment) len %lu out of potential %lu",
>                 (free_memseg[i].len - diff), free_memseg[i].len);
>             /* Only the part past the alignment point counts. */
>             len -= (free_memseg[i].len - diff);
>         } else {
>             len -= free_memseg[i].len;
>         }
>     }
>
>     return -1;
> }
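> (Concretely, with invented addresses: if free_memseg[4].addr is
> 0x7f0140000000 and its len is 0x400000, the scan can only continue into
> free_memseg[5] if free_memseg[5].addr is exactly 0x7f0140400000; the
> candidate segments must tile a single virtual address range with no
> holes, which is what the patch above arranges.)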
> /**
>  * Registers several memory zones at contiguous virtual addresses, of
>  * large total size. All memzones but the last use full huge pages; only
>  * the last memzone may take less than a full huge page.
>  *
>  * It walks all the free memory segments; once it finds a memory segment
>  * with full huge pages, it checks whether it can start allocating from
>  * that memory segment onwards.
>  */
> static const struct rte_memzone *
> memzone_reserve_multiple_cont_mz(const char *basename, size_t *zones_len,
>                                  size_t len, int socket_id,
>                                  unsigned flags, unsigned align)
> {
>     struct rte_mem_config *mcfg;
>     const struct rte_memzone *ret = NULL;
>     size_t requested_page_size;
>     int i;
>     struct rte_memseg *free_memseg = NULL;
>     int first_memseg = -1;
>     int memseg_count = -1;
>
>     mcfg = rte_eal_get_configuration()->mem_config;
>     free_memseg = mcfg->free_memseg;
>
>     RTE_LOG(DEBUG, USER1, "mcfg is at %p free_memseg at %p memseg at %p",
>         mcfg, mcfg->free_memseg, mcfg->memseg);
>
>     for (i = 0; i < 10 && (free_memseg[i].addr != NULL); i++) {
>         RTE_LOG(DEBUG, USER1, "free_memseg[%d]: vaddr %p phys_addr %p len %lu pages: %lu [0x%lx]",
>             i, free_memseg[i].addr, (void *)free_memseg[i].phys_addr,
>             free_memseg[i].len,
>             free_memseg[i].len / free_memseg[i].hugepage_sz,
>             free_memseg[i].hugepage_sz);
>     }
>
>     for (i = 0; i < 10 && (mcfg->memseg[i].addr != NULL); i++) {
>         RTE_LOG(DEBUG, USER1, "memseg[%d]: vaddr %p phys_addr %p len %lu pages: %lu [0x%lx]",
>             i, mcfg->memseg[i].addr, (void *)mcfg->memseg[i].phys_addr,
>             mcfg->memseg[i].len,
>             mcfg->memseg[i].len / mcfg->memseg[i].hugepage_sz,
>             mcfg->memseg[i].hugepage_sz);
>     }
>
>     *zones_len = 0;
>
>     if (mcfg->memzone_idx >= RTE_MAX_MEMZONE) {
>         RTE_LOG(DEBUG, USER1, "No more room for new memzones");
>         return NULL;
>     }
>
>     if ((flags & (RTE_MEMZONE_2MB | RTE_MEMZONE_1GB)) == 0) {
>         RTE_LOG(DEBUG, USER1, "Must request either 2MB or 1GB pages");
>         return NULL;
>     }
>
>     if ((flags & RTE_MEMZONE_1GB) && (flags & RTE_MEMZONE_2MB)) {
>         RTE_LOG(DEBUG, USER1, "Cannot request both 1GB and 2MB pages");
>         return NULL;
>     }
>
>     if (flags & RTE_MEMZONE_2MB)
>         requested_page_size = RTE_PGSIZE_2M;
>     else
>         requested_page_size = RTE_PGSIZE_1G;
>
>     if (len < requested_page_size) {
>         RTE_LOG(DEBUG, USER1, "Requested length %lu is smaller than requested page size %lu",
>             len, requested_page_size);
>         return NULL;
>     }
>
>     /* First see whether a single physically contiguous zone still works. */
>     ret = rte_memzone_reserve_aligned(basename, len, socket_id, flags, align);
>     if (ret != NULL) {
>         RTE_LOG(DEBUG, USER1, "Normal rte_memzone_reserve_aligned worked!");
>         *zones_len = 1;
>         return ret;
>     }
>
>     RTE_LOG(DEBUG, USER1, "rte_memzone_reserve_aligned failed. Will have to allocate on our own");
>     rte_rwlock_write_lock(&mcfg->mlock);
>
>     for (i = 0; i < RTE_MAX_MEMSEG; i++) {
>         if (free_memseg[i].addr == NULL)
>             break;
>
>         if (!is_memseg_valid(free_memseg + i, requested_page_size, socket_id))
>             continue;
>
>         memseg_count = try_allocating_memseg_range(free_memseg, i,
>                            requested_page_size, len, socket_id);
>         if (0 < memseg_count) {
>             RTE_LOG(DEBUG, USER1, "Was able to find memsegments for zone! "
>                 "first segment: %d segment_count %d len %lu",
>                 i, memseg_count, len);
>             first_memseg = i;
>
>             /* Fix the first memseg: make sure it is page aligned. */
>             char *aligned_vaddr = RTE_PTR_ALIGN_CEIL(free_memseg[i].addr,
>                                       requested_page_size);
>             size_t diff = (size_t)(aligned_vaddr - (char *)free_memseg[i].addr);
>
>             RTE_LOG(DEBUG, USER1, "Decreasing first segment by %lu", diff);
>             free_memseg[i].addr = aligned_vaddr;
>             free_memseg[i].phys_addr += diff;
>             free_memseg[i].len -= diff;
>             if ((free_memseg[i].phys_addr % requested_page_size) != 0) {
>                 RTE_LOG(ERR, USER1, "After aligning first free memseg, "
>                     "physical address NOT page aligned! %p",
>                     (void *)free_memseg[i].phys_addr);
>                 abort();
>             }
>
>             break;
>         }
>     }
>
>     if (first_memseg < 0) {
>         RTE_LOG(DEBUG, USER1, "Could not find memsegs to allocate enough memory");
>         goto out;
>     }
>
>     /* Now perform the actual allocation. */
>     if (mcfg->memzone_idx + memseg_count >= RTE_MAX_MEMZONE) {
>         RTE_LOG(DEBUG, USER1, "There are not enough memzones to allocate. "
>             "memzone_idx %d memseg_count %d max %s=%d",
>             mcfg->memzone_idx, memseg_count,
>             RTE_STR(RTE_MAX_MEMZONE), RTE_MAX_MEMZONE);
>         goto out;
>     }
>
>     ret = &mcfg->memzone[mcfg->memzone_idx];
>     *zones_len = memseg_count;
>     for (i = first_memseg; i < first_memseg + memseg_count; i++) {
>         size_t allocated_length;
>
>         if (free_memseg[i].len <= len)
>             allocated_length = free_memseg[i].len;
>         else
>             allocated_length = len;
>
>         struct rte_memzone *mz = &mcfg->memzone[mcfg->memzone_idx++];
>         snprintf(mz->name, sizeof(mz->name), "%s%d", basename, i - first_memseg);
>         mz->phys_addr = free_memseg[i].phys_addr;
>         mz->addr = free_memseg[i].addr;
>         mz->len = allocated_length;
>         mz->hugepage_sz = free_memseg[i].hugepage_sz;
>         mz->socket_id = free_memseg[i].socket_id;
>         mz->flags = 0;
>         mz->memseg_id = i;
>
>         free_memseg[i].len -= allocated_length;
>         free_memseg[i].phys_addr += allocated_length;
>         free_memseg[i].addr_64 += allocated_length;
>         len -= allocated_length;
>     }
>
>     if (len != 0) {
>         RTE_LOG(DEBUG, USER1, "After registering all the memzones, requested len %lu is still left",
>             len);
>         ret = NULL;
>         goto out;
>     }
>
> out:
>     rte_rwlock_write_unlock(&mcfg->mlock);
>     return ret;
> }
> static inline void build_physical_pages(phys_addr_t *phys_pages, int num_phys_pages,
>                                         size_t sz, const struct rte_memzone *mz,
>                                         int num_zones)
> {
>     size_t accounted_for_size = 0;
>     int curr_page = 0;
>     int i;
>     unsigned j;
>
>     RTE_LOG(DEBUG, USER1, "Phys pages are at %p 2M is %d mz pagesize is %lu trailing zeros: %d",
>         phys_pages, RTE_PGSIZE_2M, mz->hugepage_sz,
>         __builtin_ctz(mz->hugepage_sz));
>
>     for (i = 0; i < num_zones; i++) {
>         size_t mz_remaining_len = mz[i].len;
>         for (j = 0; (j <= mz[i].len / RTE_PGSIZE_2M) && (0 < mz_remaining_len); j++) {
>             phys_pages[curr_page++] = mz[i].phys_addr + j * RTE_PGSIZE_2M;
>
>             size_t added_len = RTE_MIN((size_t)RTE_PGSIZE_2M, mz_remaining_len);
>             accounted_for_size += added_len;
>             mz_remaining_len -= added_len;
>
>             if (sz <= accounted_for_size) {
>                 RTE_LOG(DEBUG, USER1, "Filled in %d pages of the physical pages array",
>                     curr_page);
>                 return;
>             }
>             if (num_phys_pages < curr_page) {
>                 RTE_LOG(ERR, USER1, "When building physical pages array, "
>                     "used pages (%d) is more than allocated pages %d. "
>                     "accounted size %lu size %lu",
>                     curr_page, num_phys_pages, accounted_for_size, sz);
>                 abort();
>             }
>         }
>     }
>
>     if (accounted_for_size < sz) {
>         RTE_LOG(ERR, USER1, "Finished going over %d memory zones, and still accounted size is %lu "
>             "and requested size is %lu",
>             num_zones, accounted_for_size, sz);
>         abort();
>     }
> }
> struct rte_mempool *scattered_mempool_create(uint32_t elt_size,
>                         uint32_t elt_num, int32_t socket_id,
>                         rte_mempool_ctor_t *mp_init, void *mp_init_arg,
>                         rte_mempool_obj_ctor_t *obj_init, void *obj_init_arg)
> {
>     struct rte_mempool *mp;
>     const struct rte_memzone *mz;
>     size_t num_zones;
>     struct rte_mempool_objsz obj_sz;
>     uint32_t flags, total_size;
>     size_t sz;
>
>     flags = (MEMPOOL_F_NO_SPREAD | MEMPOOL_F_SC_GET | MEMPOOL_F_SP_PUT);
>
>     total_size = rte_mempool_calc_obj_size(elt_size, flags, &obj_sz);
>
>     sz = elt_num * total_size;
>     /* We now have to account for the "gap" at the end of each page:
>      * worst case is that we get all distinct pages, so we add the gap
>      * once for each possible page. */
>     int pages_num = (sz + RTE_PGSIZE_2M - 1) / RTE_PGSIZE_2M;
>     int page_gap = RTE_PGSIZE_2M % total_size;
>     sz += (size_t)pages_num * page_gap;
>     /* The gaps themselves may spill over into extra pages. */
>     pages_num = (sz + RTE_PGSIZE_2M - 1) / RTE_PGSIZE_2M;
>
>     RTE_LOG(DEBUG, USER1, "Will have to allocate %d 2M pages for the page table.",
>         pages_num);
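> (To make the sizing concrete, with invented numbers: say elt_size is
> 2048 and rte_mempool_calc_obj_size() returns total_size = 2112, i.e.
> the object plus its header and trailer. Each 2MB page then holds
> 2097152 / 2112 = 992 whole objects, leaving a trailing gap of
> 2097152 % 2112 = 2048 bytes. For elt_num = 16384, sz starts at
> 16384 * 2112 = 34603008 bytes, which spans 17 pages, so the worst-case
> correction adds 17 * 2048 = 34816 bytes.)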
>     if ((mz = memzone_reserve_multiple_cont_mz("data_obj", &num_zones, sz,
>              socket_id, RTE_MEMZONE_2MB, RTE_PGSIZE_2M)) == NULL) {
>         RTE_LOG(WARNING, USER1, "memzone reserve multi mz returned NULL for socket id %d, will try ANY",
>             socket_id);
>         if ((mz = memzone_reserve_multiple_cont_mz("data_obj", &num_zones, sz,
>                  SOCKET_ID_ANY, RTE_MEMZONE_2MB, RTE_PGSIZE_2M)) == NULL) {
>             RTE_LOG(ERR, USER1, "memzone reserve multi mz returned NULL even for any socket");
>             return NULL;
>         } else {
>             RTE_LOG(DEBUG, USER1, "memzone reserve multi mz returned %p with %lu zones for SOCKET_ID_ANY",
>                 mz, num_zones);
>         }
>     } else {
>         RTE_LOG(DEBUG, USER1, "memzone reserve multi mz returned %p with %lu zones for size %lu socket %d",
>             mz, num_zones, sz, socket_id);
>     }
>
>     /* Now "break" the zones into 2M physical pages. */
>     phys_addr_t *phys_pages = malloc(sizeof(phys_addr_t) * pages_num);
>     if (phys_pages == NULL) {
>         RTE_LOG(DEBUG, USER1, "phys_pages is null. aborting");
>         abort();
>     }
>
>     build_physical_pages(phys_pages, pages_num, sz, mz, num_zones);
>     RTE_LOG(DEBUG, USER1, "Beginning of vaddr is %p beginning of physical addr is 0x%lx",
>         mz->addr, mz->phys_addr);
>     mp = rte_mempool_xmem_create("data_pool", elt_num, elt_size,
>              257, sizeof(struct rte_pktmbuf_pool_private),
>              mp_init, mp_init_arg, obj_init, obj_init_arg,
>              socket_id, flags, (char *)mz[0].addr,
>              phys_pages, pages_num, rte_bsf32(RTE_PGSIZE_2M));
>
>     RTE_LOG(DEBUG, USER1, "rte_mempool_xmem_create returned %p", mp);
>     return mp;
> }
>
> =================================================================
>
> Please let me know if you have any questions/comments about this code.
>
> Best Regards,
>
> Liran.
>
> On Jan 8, 2015, at 10:19, Newman Poborsky wrote:
>
> I finally found the time to try this, and I noticed that on a server
> with 1 NUMA node this works, but if the server has 2 NUMA nodes then,
> by the default memory policy, the reserved hugepages are divided
> between the nodes, and the DPDK test app again fails for the reason
> already mentioned. I found out that a 'solution' for this is to
> deallocate the hugepages on node1 (after boot) and leave them only on
> node0:
>
> echo 0 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
>
> Could someone please explain what changes when there are hugepages on
> both nodes? Does this cause some memory fragmentation so that there
> aren't enough contiguous segments? If so, how?
>
> Thanks!
>
> Newman
>
> On Mon, Dec 22, 2014 at 11:48 AM, Newman Poborsky wrote:
>
> On Sat, Dec 20, 2014 at 2:34 AM, Stephen Hemminger wrote:
>
> You can reserve hugepages on the kernel cmdline (GRUB).
>
> Great, thanks, I'll try that!
>
> Newman
>
> On Fri, Dec 19, 2014 at 12:13 PM, Newman Poborsky wrote:
>
> On Thu, Dec 18, 2014 at 9:03 PM, Ananyev, Konstantin
> <konstantin.ananyev@intel.com> wrote:
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Thursday, December 18, 2014 5:43 PM
> To: Newman Poborsky; dev@dpdk.org
> Subject: Re: [dpdk-dev] rte_mempool_create fails with ENOMEM
>
> Hi
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Newman Poborsky
> Sent: Thursday, December 18, 2014 1:26 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] rte_mempool_create fails with ENOMEM
>
> Hi,
>
> could someone please provide any explanation why mempool creation
> sometimes fails with ENOMEM?
>
> I run my test app several times without any problems, and then I start
> getting an ENOMEM error when creating the mempools that are used for
> packets. I try to delete everything from /mnt/huge, I increase the
> number of huge pages and remount /mnt/huge, but nothing helps.
>
> There is more than enough memory on the server. I tried to debug the
> rte_mempool_create() call, and it seems that after the server is
> restarted the free mem segments are bigger than 2MB, but after running
> the test app several times all free mem segments have a size of 2MB,
> and since I am requesting 8MB for my packet mempool, this fails. I'm
> not really sure that this conclusion is correct.
>
> Yes, rte_mempool_create uses rte_memzone_reserve() to allocate a
> single physically contiguous chunk of memory. If no such chunk exists,
> it fails.
>
> Why physically contiguous? The main reason: to make things easier for
> us, as in that case we don't have to worry about the situation where
> an mbuf crosses a page boundary.
>
> So you can overcome that problem like this:
> Allocate the max amount of memory you would need to hold all mbufs in
> the worst case (all pages physically disjoint) using rte_malloc().
>
> Actually, my mistake: rte_malloc() wouldn't help you here. You probably
> need to allocate some external (not managed by EAL) memory in that
> case, maybe mmap() with MAP_HUGETLB or something similar.
>
> Figure out its physical mappings.
> Call rte_mempool_xmem_create().
> You can look at app/test-pmd/mempool_anon.c as a reference. It uses the
> same approach to create a mempool over 4K pages.
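> As a rough sketch of those two steps (illustrative only, not the
> mempool_anon.c code; error handling is minimal, and the pagemap format
> is the one described in the kernel's Documentation/vm/pagemap.txt):
>
> #define _GNU_SOURCE
> #include <fcntl.h>
> #include <stdint.h>
> #include <sys/mman.h>
> #include <unistd.h>
>
> /* Map one huge page outside of EAL and resolve its physical address
>  * through /proc/self/pagemap (needs root). Returns 0 on success. */
> static int map_and_resolve(size_t hugepage_sz, void **va, uint64_t *pa)
> {
>     uint64_t entry;
>     int fd;
>
>     *va = mmap(NULL, hugepage_sz, PROT_READ | PROT_WRITE,
>                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
>     if (*va == MAP_FAILED)
>         return -1;
>     *(volatile char *)*va = 0; /* touch the page so it is backed */
>
>     fd = open("/proc/self/pagemap", O_RDONLY);
>     if (fd < 0)
>         return -1;
>     /* One 64-bit entry per 4K virtual page; bits 0-54 hold the PFN. */
>     if (pread(fd, &entry, sizeof(entry),
>               ((uintptr_t)*va / 4096) * sizeof(entry)) != sizeof(entry)) {
>         close(fd);
>         return -1;
>     }
>     close(fd);
>     *pa = (entry & ((1ULL << 55) - 1)) * 4096;
>     return 0;
> }
>
> The resulting physical addresses then go into the page array passed to
> rte_mempool_xmem_create(), just like phys_pages in Liran's code above.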
> We could probably add a similar function to the mempool API
> (create_scatter_mempool or something), or just add a new flag
> (USE_SCATTER_MEM) to rte_mempool_create(), though right now it is not
> there.
>
> Another quick alternative: use 1G pages.
>
> Konstantin
>
> Ok, thanks for the explanation. I understand that this is probably an
> OS question more than a DPDK one, but is there a way to again allocate
> contiguous memory for the n-th run of my test app? It seems that the
> hugepages get divided into individual 2MB hugepages. Shouldn't the OS's
> memory management try to group those hugepages back into one contiguous
> chunk once my app/process is done? Again, I know very little about
> Linux memory management and hugepages, so forgive me if this is a
> stupid question. Is rebooting the OS the only way to deal with this
> problem? Or should I just try to use 1GB hugepages?
>
> p.s. Konstantin, sorry for the double reply, I accidentally forgot to
> include the dev list in my first reply :)
>
> Newman
>
> Does anybody have any idea what to check, and how running my test app
> several times affects hugepages? To me this doesn't make any sense,
> because after the test app exits, its resources should be freed, right?
>
> This has been driving me crazy for days now. I tried reading a bit more
> theory about hugepages, but didn't find anything that could help me.
> Maybe it's something else and completely trivial, but I can't figure it
> out, so any help is appreciated.
>
> Thank you!
>
> BR,
> Newman P.
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> dev mailing list
> dev@dpdk.org
> http://dpdk.org/ml/listinfo/dev
>
> ------------------------------
>
> End of dev Digest, Vol 22, Issue 37
> ***********************************