To: Olivier Matz <olivier.matz@6wind.com>, Vamsi Krishna Attunuru
 <vattunuru@marvell.com>, <dev@dpdk.org>
CC: Thomas Monjalon <thomas@monjalon.net>, Anatoly Burakov
 <anatoly.burakov@intel.com>, Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
 Kokkilagadda <kirankumark@marvell.com>, Ferruh Yigit <ferruh.yigit@intel.com>
References: <CH2PR18MB338160CD8EF16EEB45EED387A6C80@CH2PR18MB3381.namprd18.prod.outlook.com>
 <20190719133845.32432-1-olivier.matz@6wind.com>
 <20190719133845.32432-3-olivier.matz@6wind.com>
From: Andrew Rybchenko <arybchenko@solarflare.com>
Message-ID: <98e55711-c232-1184-5293-9f8ab1454a44@solarflare.com>
Date: Wed, 7 Aug 2019 18:21:20 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To: <20190719133845.32432-3-olivier.matz@6wind.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Subject: Re: [dpdk-dev] [RFC 2/4] mempool: unalign size when
 calculating required mem amount

On 7/19/19 4:38 PM, Olivier Matz wrote:
> The size returned by rte_mempool_op_calc_mem_size_default() is aligned
> to the specified page size. This means that with big pages, the returned
> amount is more than what we really need to populate the mempool.
>
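To make that rounding concrete, here is a toy calculation (the object
count, object size, and the ALIGN_CEIL macro are illustrative, not
taken from the patch; DPDK's RTE_ALIGN_CEIL performs the same rounding):

#include <stdio.h>
#include <stddef.h>

/* Round v up to the next multiple of the power-of-two alignment a. */
#define ALIGN_CEIL(v, a) (((v) + (a) - 1) & ~((size_t)(a) - 1))

int main(void)
{
	size_t need  = 8192 * 2176;     /* ~17 MB of objects (example) */
	size_t pg_sz = (size_t)1 << 30; /* 1 GB hugepage */

	printf("needed:  %zu\n", need);                    /* 17825792   */
	printf("aligned: %zu\n", ALIGN_CEIL(need, pg_sz)); /* 1073741824 */
	return 0;
}

So a ~17 MB pool is reported as needing a full 1 GB when the page size
is 1 GB.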
> This problem is tempered by the allocation method of
> rte_mempool_populate_default(): in some conditions (when
> try_iova_contig_mempool=true), it first tries to allocate all object
> memory in one IOVA-contiguous area, without the alignment constraint. If
> that fails, it falls back to the big aligned allocation, which can in
> turn fall back to several smaller allocations.
>
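A compact model of that pre-patch fallback chain (helper names are made
up and reserve() is forced to fail on the optimistic attempt; this is
not the real rte_mempool_populate_default() code):

#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for rte_memzone_reserve_aligned(); refuses the
 * IOVA-contiguous request so that the fallback path runs. */
static void *reserve(size_t len, bool iova_contig)
{
	if (iova_contig)
		return NULL;
	printf("reserved %zu bytes\n", len);
	return (void *)1;
}

int main(void)
{
	size_t unaligned = 17825792;   /* size computed with pg_shift == 0 */
	size_t aligned = 1073741824;   /* size rounded up to the 1 GB page */

	/* 1. optimistic: the whole pool in one IOVA-contiguous chunk */
	void *mz = reserve(unaligned, true);
	/* 2. fallback: one big page-aligned reservation */
	if (mz == NULL)
		mz = reserve(aligned, false);
	/* 3. (not modelled) further fallback to several smaller chunks */
	return mz == NULL ? 1 : 0;
}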
> This commit changes rte_mempool_op_calc_mem_size_default() to return the
> unaligned amount of memory (the alignment constraint is still returned
> via the *align argument),

It does not, as far as I can see. There are no changes in
lib/librte_mempool/rte_mempool_ops_default.c.

> and removes the optimistic contiguous
> allocation done when try_iova_contig_mempool=true.

Unfortunately, I don't understand why these two changes are combined
into one patch. The first looks good to me, but the second is
questionable: it can impact performance.

> This will make the amount of allocated memory more predictable: it will
> be more than the optimistic contiguous allocation, but less than the big
> aligned allocation.
>
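Plugging in the toy numbers from above: the optimistic contiguous
reservation was 17,825,792 bytes and the big aligned one 1,073,741,824
bytes (one full 1 GB page). With 2 MB pages instead, roughly 963
2176-byte objects fit per page, so a page-by-page layout of 8192
objects occupies about 17.8 MB, while rounding up to a page multiple
would reserve 9 pages = 18,874,368 bytes; the unaligned result sits
between the two extremes.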
> This opens the door for the next commits that will try to prevent objects
> from being located across pages.
>
> Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> ---
>   lib/librte_mempool/rte_mempool.c     | 44 ++++--------------------------------
>   lib/librte_mempool/rte_mempool.h     |  2 +-
>   lib/librte_mempool/rte_mempool_ops.c |  4 +++-
>   3 files changed, 9 insertions(+), 41 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 0f29e8712..335032dc8 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -430,7 +430,6 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   	unsigned mz_id, n;
>   	int ret;
>   	bool need_iova_contig_obj;
> -	bool try_iova_contig_mempool;
>   	bool alloc_in_ext_mem;
>   
>   	ret = mempool_ops_alloc_once(mp);
> @@ -477,18 +476,10 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   	 * wasting some space this way, but it's much nicer than looping around
>   	 * trying to reserve each and every page size.
>   	 *
> -	 * However, since size calculation will produce page-aligned sizes, it
> -	 * makes sense to first try and see if we can reserve the entire memzone
> -	 * in one contiguous chunk as well (otherwise we might end up wasting a
> -	 * 1G page on a 10MB memzone). If we fail to get enough contiguous
> -	 * memory, then we'll go and reserve space page-by-page.
> -	 *
>   	 * We also have to take into account the fact that memory that we're
>   	 * going to allocate from can belong to an externally allocated memory
>   	 * area, in which case the assumption of IOVA as VA mode being
> -	 * synonymous with IOVA contiguousness will not hold. We should also try
> -	 * to go for contiguous memory even if we're in no-huge mode, because
> -	 * external memory may in fact be IOVA-contiguous.
> +	 * synonymous with IOVA contiguousness will not hold.
>   	 */
>   
>   	/* check if we can retrieve a valid socket ID */
> @@ -497,7 +488,6 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		return -EINVAL;
>   	alloc_in_ext_mem = (ret == 1);
>   	need_iova_contig_obj = !(mp->flags & MEMPOOL_F_NO_IOVA_CONTIG);
> -	try_iova_contig_mempool = false;
>   
>   	if (!need_iova_contig_obj) {
>   		pg_sz = 0;
> @@ -506,7 +496,6 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		pg_sz = 0;
>   		pg_shift = 0;
>   	} else if (rte_eal_has_hugepages() || alloc_in_ext_mem) {
> -		try_iova_contig_mempool = true;
>   		pg_sz = get_min_page_size(mp->socket_id);
>   		pg_shift = rte_bsf32(pg_sz);
>   	} else {
> @@ -518,12 +507,8 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		size_t min_chunk_size;
>   		unsigned int flags;
>   
> -		if (try_iova_contig_mempool || pg_sz == 0)
> -			mem_size = rte_mempool_ops_calc_mem_size(mp, n,
> -					0, &min_chunk_size, &align);
> -		else
> -			mem_size = rte_mempool_ops_calc_mem_size(mp, n,
> -					pg_shift, &min_chunk_size, &align);
> +		mem_size = rte_mempool_ops_calc_mem_size(
> +			mp, n, pg_shift, &min_chunk_size, &align);
>   
>   		if (mem_size < 0) {
>   			ret = mem_size;
> @@ -542,31 +527,12 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		/* if we're trying to reserve contiguous memory, add appropriate
>   		 * memzone flag.
>   		 */
> -		if (try_iova_contig_mempool)
> +		if (min_chunk_size == (size_t)mem_size)
>   			flags |= RTE_MEMZONE_IOVA_CONTIG;
>   
>   		mz = rte_memzone_reserve_aligned(mz_name, mem_size,
>   				mp->socket_id, flags, align);
>   
> -		/* if we were trying to allocate contiguous memory, failed and
> -		 * minimum required contiguous chunk fits minimum page, adjust
> -		 * memzone size to the page size, and try again.
> -		 */
> -		if (mz == NULL && try_iova_contig_mempool &&
> -				min_chunk_size <= pg_sz) {
> -			try_iova_contig_mempool = false;
> -			flags &= ~RTE_MEMZONE_IOVA_CONTIG;
> -
> -			mem_size = rte_mempool_ops_calc_mem_size(mp, n,
> -					pg_shift, &min_chunk_size, &align);
> -			if (mem_size < 0) {
> -				ret = mem_size;
> -				goto fail;
> -			}
> -
> -			mz = rte_memzone_reserve_aligned(mz_name, mem_size,
> -				mp->socket_id, flags, align);
> -		}
>   		/* don't try reserving with 0 size if we were asked to reserve
>   		 * IOVA-contiguous memory.
>   		 */
> @@ -594,7 +560,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>   		else
>   			iova = RTE_BAD_IOVA;
>   
> -		if (try_iova_contig_mempool || pg_sz == 0)
> +		if (pg_sz == 0)
>   			ret = rte_mempool_populate_iova(mp, mz->addr,
>   				iova, mz->len,
>   				rte_mempool_memchunk_mz_free,
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 8053f7a04..7bc10e699 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -458,7 +458,7 @@ typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
>    * @param[out] align
>    *   Location for required memory chunk alignment.
>    * @return
> - *   Required memory size aligned at page boundary.
> + *   Required memory size.
>    */
>   typedef ssize_t (*rte_mempool_calc_mem_size_t)(const struct rte_mempool *mp,
>   		uint32_t obj_num,  uint32_t pg_shift,
> diff --git a/lib/librte_mempool/rte_mempool_ops.c b/lib/librte_mempool/rte_mempool_ops.c
> index e02eb702c..22c5251eb 100644
> --- a/lib/librte_mempool/rte_mempool_ops.c
> +++ b/lib/librte_mempool/rte_mempool_ops.c
> @@ -100,7 +100,9 @@ rte_mempool_ops_get_count(const struct rte_mempool *mp)
>   	return ops->get_count(mp);
>   }
>   
> -/* wrapper to notify new memory area to external mempool */
> +/* wrapper to calculate the memory size required to store given number
> + * of objects
> + */
>   ssize_t
>   rte_mempool_ops_calc_mem_size(const struct rte_mempool *mp,
>   				uint32_t obj_num, uint32_t pg_shift,