From: Olivier Matz
To: Anatoly Burakov
Cc: dev@dpdk.org, keith.wiles@intel.com, jianfeng.tan@intel.com,
 andras.kovacs@ericsson.com, laszlo.vadkeri@ericsson.com,
 benjamin.walker@intel.com, bruce.richardson@intel.com, thomas@monjalon.net,
 konstantin.ananyev@intel.com, kuralamudhan.ramakrishnan@intel.com,
 louise.m.daly@intel.com, nelio.laranjeiro@6wind.com, yskoh@mellanox.com,
 pepperjo@japf.ch, jerin.jacob@caviumnetworks.com, hemant.agrawal@nxp.com,
 arybchenko@solarflare.com
Date: Mon, 19 Mar 2018 18:11:31 +0100
Subject: Re: [dpdk-dev] [PATCH v2 23/41] mempool: add support for the new
 allocation methods

Hi Anatoly,

Please find some comments below.

On Wed, Mar 07, 2018 at 04:56:51PM +0000, Anatoly Burakov wrote:
> If a user has specified that the zone should have contiguous memory,
> use the new _contig allocation APIs instead of the normal ones.
> Otherwise, account for the fact that, unless we're in IOVA_AS_VA
> mode, we cannot guarantee that the pages will be physically
> contiguous, so we calculate the memzone size and alignment as if
> we were getting the smallest page size available.
>
> Signed-off-by: Anatoly Burakov

[...]

> @@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
> 	/* update mempool capabilities */
> 	mp->flags |= mp_flags;
>
> -	if (rte_eal_has_hugepages()) {
> -		pg_shift = 0; /* not needed, zone is physically contiguous */
> +	no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
> +	force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
> +
> +	/*
> +	 * there are several considerations for page size and page shift here.

I would add a few words here describing what the page size and page
shift are used for: these values impact the result of
rte_mempool_xmem_size() (*), which returns the amount of memory that
should be allocated to store the desired number of objects. When the
page shift is not zero, it allocates additional memory for padding
between objects, to ensure that no object crosses a page boundary.

(*) it is renamed in Andrew's patchset about the mempool_ops API, but
it seems the memory rework may be pushed in before it.
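To make this concrete, here is a rough sketch of the kind of
computation this implies (illustrative only: it is not the actual
rte_mempool_xmem_size() code, and the names are mine):

	#include <stddef.h>
	#include <stdint.h>

	/*
	 * Memory needed to store elt_num objects of total_elt_sz bytes
	 * each. With a non-zero pg_shift, objects are packed per page
	 * so that none crosses a page boundary, and the unused tail of
	 * each page is lost to padding.
	 */
	static size_t
	xmem_size_sketch(uint32_t elt_num, size_t total_elt_sz,
		uint32_t pg_shift)
	{
		size_t pg_sz, obj_per_page, pg_num;

		if (total_elt_sz == 0)
			return 0;

		if (pg_shift == 0)
			/* no paging constraint: pack objects tightly */
			return total_elt_sz * elt_num;

		pg_sz = (size_t)1 << pg_shift;
		obj_per_page = pg_sz / total_elt_sz;
		if (obj_per_page == 0)
			/* object bigger than a page: round each object
			 * up to a whole number of pages
			 */
			return ((total_elt_sz + pg_sz - 1) / pg_sz) *
				pg_sz * elt_num;

		/* whole pages needed to hold all objects */
		pg_num = (elt_num + obj_per_page - 1) / obj_per_page;
		return pg_num * pg_sz;
	}

For example, with pg_shift = 12 (4KB pages) and 2176-byte objects, only
one object fits per page, so almost half of each page goes to padding.
This is why the page size chosen below directly impacts the amount of
memory that gets reserved.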
> +	 *
> +	 * if we don't need our mempools to have physically contiguous objects,
> +	 * then just set page shift and page size to 0, because the user has
> +	 * indicated that there's no need to care about anything.
> +	 *
> +	 * if we do need contiguous objects, there is also an option to reserve
> +	 * the entire mempool memory as one contiguous block of memory, in
> +	 * which case the page shift and alignment wouldn't matter as well.
> +	 *
> +	 * if we require contiguous objects, but not necessarily the entire
> +	 * mempool reserved space to be contiguous, then there are two options.
> +	 *
> +	 * if our IO addresses are virtual, not actual physical (IOVA as VA
> +	 * case), then no page shift needed - our memory allocation will give us
> +	 * contiguous physical memory as far as the hardware is concerned, so
> +	 * act as if we're getting contiguous memory.
> +	 *
> +	 * if our IO addresses are physical, we may get memory from bigger
> +	 * pages, or we might get memory from smaller pages, and how much of it
> +	 * we require depends on whether we want bigger or smaller pages.
> +	 * However, requesting each and every memory size is too much work, so
> +	 * what we'll do instead is walk through the page sizes available, pick
> +	 * the smallest one and set up page shift to match that one. We will be
> +	 * wasting some space this way, but it's much nicer than looping around
> +	 * trying to reserve each and every page size.
> +	 */

This comment is helpful for understanding, thanks.

(By the way, reading it makes me think we should rename
MEMPOOL_F_*_PHYS_CONTIG to MEMPOOL_F_*_IOVA_CONTIG.)

> +
> +	if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
> 		pg_sz = 0;
> +		pg_shift = 0;
> 		align = RTE_CACHE_LINE_SIZE;
> +	} else if (rte_eal_has_hugepages()) {
> +		pg_sz = get_min_page_size();
> +		pg_shift = rte_bsf32(pg_sz);
> +		align = pg_sz;
> 	} else {
> 		pg_sz = getpagesize();
> 		pg_shift = rte_bsf32(pg_sz);
> @@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
> 			goto fail;
> 		}
>
> -	mz = rte_memzone_reserve_aligned(mz_name, size,
> -		mp->socket_id, mz_flags, align);
> -	/* not enough memory, retry with the biggest zone we have */
> -	if (mz == NULL)
> -		mz = rte_memzone_reserve_aligned(mz_name, 0,
> +	if (force_contig) {
> +		/*
> +		 * if contiguous memory for entire mempool memory was
> +		 * requested, don't try reserving again if we fail.
> +		 */
> +		mz = rte_memzone_reserve_aligned_contig(mz_name, size,
> +			mp->socket_id, mz_flags, align);
> +	} else {
> +		mz = rte_memzone_reserve_aligned(mz_name, size,
> 			mp->socket_id, mz_flags, align);
> +		/* not enough memory, retry with the biggest zone we
> +		 * have
> +		 */
> +		if (mz == NULL)
> +			mz = rte_memzone_reserve_aligned(mz_name, 0,
> +				mp->socket_id, mz_flags, align);
> +	}

This is not wrong, but at first glance I don't think it is required,
because we already have this check in populate_iova():

	/* Detect pool area has sufficient space for elements */
	if (mp_capa_flags & MEMPOOL_F_CAPA_PHYS_CONTIG) {
		if (len < total_elt_sz * mp->size) {
			RTE_LOG(ERR, MEMPOOL,
				"pool area %" PRIx64 " not enough\n",
				(uint64_t)len);
			return -ENOSPC;
		}
	}

Thanks,
Olivier