Subject: Re: [dpdk-dev] [PATCH v2 23/41] mempool: add support for the new
 allocation methods
From: "Burakov, Anatoly"
To: Shreyansh Jain
Cc: dev@dpdk.org, Olivier Matz, keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, Bruce Richardson, Thomas Monjalon,
 "Ananyev, Konstantin", kuralamudhan.ramakrishnan@intel.com,
 louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com,
 pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, Hemant Agrawal
Date: Fri, 23 Mar 2018 11:25:20 +0000

On 20-Mar-18 11:35 AM, Shreyansh Jain wrote:
> Hello Anatoly,
>
> On Wed, Mar 7, 2018 at 10:26 PM, Anatoly Burakov wrote:
>> If a user has specified that the zone should have contiguous memory,
>> use the new _contig allocation APIs instead of the normal ones.
>> Otherwise, account for the fact that unless we're in IOVA_AS_VA
>> mode, we cannot guarantee that the pages would be physically
>> contiguous, so we calculate the memzone size and alignment as if
>> we were getting the smallest page size available.
>>
>> Signed-off-by: Anatoly Burakov
>> ---
>
> [...]
>
>>  static void
>>  mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
>>  {
>> @@ -549,6 +570,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>  	unsigned mz_id, n;
>>  	unsigned int mp_flags;
>>  	int ret;
>> +	bool force_contig, no_contig;
>>
>>  	/* mempool must not be populated */
>>  	if (mp->nb_mem_chunks != 0)
>> @@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>  	/* update mempool capabilities */
>>  	mp->flags |= mp_flags;
>>
>> -	if (rte_eal_has_hugepages()) {
>> -		pg_shift = 0; /* not needed, zone is physically contiguous */
>> +	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
>> +	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
>> +
>> +	/*
>> +	 * there are several considerations for page size and page shift here.
>> +	 *
>> +	 * if we don't need our mempools to have physically contiguous objects,
>> +	 * then just set page shift and page size to 0, because the user has
>> +	 * indicated that there's no need to care about anything.
>
> I think the above case is not handled properly here.
> Reason below...
>
>> +	 *
>> +	 * if we do need contiguous objects, there is also an option to reserve
>> +	 * the entire mempool memory as one contiguous block of memory, in
>> +	 * which case the page shift and alignment wouldn't matter either.
>> +	 *
>> +	 * if we require contiguous objects, but not necessarily the entire
>> +	 * mempool reserved space to be contiguous, then there are two options.
>> +	 *
>> +	 * if our IO addresses are virtual, not actual physical (IOVA as VA
>> +	 * case), then no page shift is needed - our memory allocation will give
>> +	 * us contiguous physical memory as far as the hardware is concerned,
>> +	 * so act as if we're getting contiguous memory.
>> +	 *
>> +	 * if our IO addresses are physical, we may get memory from bigger
>> +	 * pages, or we might get memory from smaller pages, and how much of it
>> +	 * we require depends on whether we want bigger or smaller pages.
>> +	 * However, requesting each and every memory size is too much work, so
>> +	 * what we'll do instead is walk through the available page sizes, pick
>> +	 * the smallest one, and set the page shift to match it. We will be
>> +	 * wasting some space this way, but it's much nicer than looping around
>> +	 * trying to reserve each and every page size.
>> +	 */
>> +
>> +	if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
>>  		pg_sz = 0;
>> +		pg_shift = 0;
>>  		align = RTE_CACHE_LINE_SIZE;
>
> So, taking dpaa2 as an example, I ran testpmd. IOVA=VA is the mode,
> so pg_sz = 0 is set.
> This is the same as before applying the hotplug patchset, except that
> earlier this decision was based purely on the availability of hugepages
> (rte_eal_has_hugepages()).
> Moving on...
>
>> +	} else if (rte_eal_has_hugepages()) {
>> +		pg_sz = get_min_page_size();
>> +		pg_shift = rte_bsf32(pg_sz);
>> +		align = pg_sz;
>>  	} else {
>>  		pg_sz = getpagesize();
>>  		pg_shift = rte_bsf32(pg_sz);
>> @@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>  			goto fail;
>>  		}
>>
>> -	mz = rte_memzone_reserve_aligned(mz_name, size,
>> -		mp->socket_id, mz_flags, align);
>> -	/* not enough memory, retry with the biggest zone we have */
>> -	if (mz == NULL)
>> -		mz = rte_memzone_reserve_aligned(mz_name, 0,
>> +	if (force_contig) {
>> +		/*
>> +		 * if contiguous memory for the entire mempool memory was
>> +		 * requested, don't try reserving again if we fail.
>> +		 */
>> +		mz = rte_memzone_reserve_aligned_contig(mz_name, size,
>> +			mp->socket_id, mz_flags, align);
>> +	} else {
>> +		mz = rte_memzone_reserve_aligned(mz_name, size,
>>  			mp->socket_id, mz_flags, align);
>> +		/* not enough memory, retry with the biggest zone we
>> +		 * have
>> +		 */
>> +		if (mz == NULL)
>> +			mz = rte_memzone_reserve_aligned(mz_name, 0,
>> +				mp->socket_id, mz_flags, align);
>> +	}
>>  	if (mz == NULL) {
>>  		ret = -rte_errno;
>>  		goto fail;
>>  	}
>>
>> -	if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
>> +	if (no_contig)
>>  		iova = RTE_BAD_IOVA;
>>  	else
>>  		iova = mz->iova;
>>
>> -	if (rte_eal_has_hugepages())
>> +	if (rte_eal_has_hugepages() && force_contig)
>
> So, pre-hotplug patches, this call used to enter rte_mempool_populate_iova.
> But with 'force_contig' not set (in app/test-pmd/testpmd.c:521, when
> calling rte_pktmbuf_pool_create), rte_mempool_populate_virt is called
> instead.
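Condensing the two quoted hunks, the dispatch being discussed is roughly
the following sketch (not the verbatim patch code; variable names are as
in the patch):

	/* sketch of the populate dispatch in rte_mempool_populate_default()
	 * after this patch, condensed from the quoted hunks */
	if (rte_eal_has_hugepages() && force_contig) {
		/* whole-mempool contiguous reservation: populate by IOVA */
		ret = rte_mempool_populate_iova(mp, mz->addr, iova, mz->len,
				rte_mempool_memchunk_mz_free,
				(void *)(uintptr_t)mz);
	} else {
		/* everything else - including the IOVA-as-VA branch, which
		 * set pg_sz = 0 above - populates by virtual address */
		ret = rte_mempool_populate_virt(mp, mz->addr, mz->len, pg_sz,
				rte_mempool_memchunk_mz_free,
				(void *)(uintptr_t)mz);
	}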
>
>>  		ret = rte_mempool_populate_iova(mp, mz->addr,
>>  			iova, mz->len,
>>  			rte_mempool_memchunk_mz_free,
>> --
>> 2.7.4
>
> This is called with pg_sz = 0:
>
> 678         else
>>> # 679             ret = rte_mempool_populate_virt(mp, mz->addr,
> 680                     mz->len, pg_sz,
> 681                     rte_mempool_memchunk_mz_free,
> 682                     (void *)(uintptr_t)mz);
>
> In this function:
>
> 512         /* address and len must be page-aligned */
> 513         if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
> 514                 return -EINVAL;
>
> This is where the error is returned.
>
> I don't think RTE_PTR_ALIGN_CEIL is designed to handle pg_sz = 0.
>
> It is roughly equivalent to
> RTE_PTR_ALIGN_FLOOR(((uintptr_t)addr - 1), pg_sz), which returns NULL
> (0 ~ pg_sz).
>
> Basically, this ends up failing rte_mempool_populate_default.
>
> I think the assumption that rte_mempool_populate_virt can handle a page
> size of 0 when called is wrong (there would be issues besides the above
> RTE_PTR_ALIGN_CEIL as well, like the for loop stepping by off += pg_sz).
> It needs a valid (non-zero) page-size value to work with.
>
> So, basically, DPAA2 is stuck with this patch because of the above
> issue, if I am comprehending it correctly.
>
> Regards,
> Shreyansh

Thanks for finding this issue. Fix is now pushed to GitHub for testing.

-- 
Thanks,
Anatoly
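To illustrate the RTE_PTR_ALIGN_CEIL behaviour Shreyansh describes, here
is a minimal standalone sketch. The macros are simplified from
rte_common.h (the typeof casts are dropped, but the arithmetic is the
same), and the main() harness is illustrative only:

	#include <stdint.h>
	#include <stdio.h>

	/* simplified versions of the rte_common.h alignment macros */
	#define ALIGN_FLOOR(val, align) \
		((uintptr_t)(val) & ~((uintptr_t)(align) - 1))
	#define ALIGN_CEIL(ptr, align) \
		ALIGN_FLOOR((uintptr_t)(ptr) + (align) - 1, align)

	int main(void)
	{
		char buf[64];
		uintptr_t pg_sz = 0; /* the value reaching populate_virt */

		/* with align == 0, (align - 1) is all-ones, so
		 * ~(align - 1) is 0 and the mask wipes out the whole
		 * address: the result is always NULL */
		printf("addr=%p aligned=%p\n",
				(void *)buf, (void *)ALIGN_CEIL(buf, pg_sz));

		/* hence the check at rte_mempool.c:513,
		 * RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr, is always true
		 * for pg_sz == 0, and -EINVAL is returned */
		return 0;
	}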