Date: Mon, 6 Feb 2017 18:01:15 +0100
From: Olivier Matz
To: Santosh Shukla
Subject: Re: [dpdk-dev] [PATCH v2] mempool: Introduce _populate_mz_range api
Message-ID: <20170206180115.480e07af@platinum>
In-Reply-To: <20170131143211.GA31464@santosh-Latitude-E5530-non-vPro>
References: <1484922017-26030-1-git-send-email-santosh.shukla@caviumnetworks.com>
 <1484925221-18431-1-git-send-email-santosh.shukla@caviumnetworks.com>
 <20170131113151.3f8e07a0@platinum>
 <20170131143211.GA31464@santosh-Latitude-E5530-non-vPro>
List-Id: DPDK patches and discussions

On Tue, 31 Jan 2017 20:02:26 +0530, Santosh Shukla wrote:
> Hi Olivier,
>
> Reply inline.
>
> On Tue, Jan 31, 2017 at 11:31:51AM +0100, Olivier Matz wrote:
> > Hi Santosh,
> >
> > I guess this patch is targeted for 17.05, right?
> >
> Yes.
>
> > Please see some other comments below.
> >
> > On Fri, 20 Jan 2017 20:43:41 +0530, Santosh Shukla wrote:
> > > From: Santosh Shukla
> > >
> > > HW pool managers, e.g. the Cavium SoC, need software to program
> > > the start and end address of the pool. Currently there is no such
> > > API in ext-mempool.
> >
> > Today, the mempool objects are not necessarily contiguous in
> > virtual or physical memory. The only assumption that can be made is
> > that each object is contiguous (virtually and physically). If the
> > flag MEMPOOL_F_NO_PHYS_CONTIG is passed, each object is only
> > assured to be virtually contiguous.
> >
> Some clarification: IIUC, the pa/va addresses within each hugepage
> (aka mz) are contiguous, but there is no guarantee of contiguity
> across hugepages (aka multiple mzs). Right?
>
> If the above is correct, then a case where the pool start address is
> in one mz and the end address is in another mz is a problem scenario.
> Correct? That said, ext-mempool drivers will be affected in such
> cases.

Yes, that's globally correct, except that it is possible for several
hugepages to be physically contiguous (EAL will try, but it's not
guaranteed).

> > > So introducing the _populate_mz_range API, which will let the HW
> > > (pool manager) know the hugepage-mapped virtual start and end
> > > address.
> >
> > rte_mempool_ops_populate_mz_range() looks a bit long. What about
> > rte_mempool_ops_populate() instead?
>
> Ok.
>
> > > diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> > > index 1c2aed8..9a39f5c 100644
> > > --- a/lib/librte_mempool/rte_mempool.c
> > > +++ b/lib/librte_mempool/rte_mempool.c
> > > @@ -568,6 +568,10 @@ static unsigned optimize_object_size(unsigned obj_size)
> > >  	else
> > >  		paddr = mz->phys_addr;
> > >
> > > +	/* Populate mz range */
> > > +	if ((mp->flags & MEMPOOL_F_POOL_CREATED) == 0)
> > > +		rte_mempool_ops_populate_mz_range(mp, mz);
> > > +
> > >  	if (rte_eal_has_hugepages()
> >
> > Given what I've said above, I think the populate() callback should
> > be in rte_mempool_populate_phys() instead of
> > rte_mempool_populate_default(). It would be called for each
> > contiguous zone.
>
> Ok.
>
> > > --- a/lib/librte_mempool/rte_mempool.h
> > > +++ b/lib/librte_mempool/rte_mempool.h
> > > @@ -387,6 +387,12 @@ typedef int (*rte_mempool_dequeue_t)(struct rte_mempool *mp,
> > >   */
> > >  typedef unsigned (*rte_mempool_get_count)(const struct rte_mempool *mp);
> > > +/**
> > > + * Set the memzone va/pa addr range and len in the external pool.
> > > + */
> > > +typedef void (*rte_mempool_populate_mz_range_t)(struct rte_mempool *mp,
> > > +		const struct rte_memzone *mz);
> > > +
> >
> > And this API would be:
> > typedef void (*rte_mempool_populate_t)(struct rte_mempool *mp,
> > 	char *vaddr, phys_addr_t paddr, size_t len)
> >
> > If your hw absolutely needs contiguous memory, a solution could be:
> >
> > - add a new flag MEMPOOL_F_CONTIG (maybe a better name could be
> >   found) saying that all the mempool objects must be contiguous.
> > - add the ops_populate() callback in rte_mempool_populate_phys(),
> >   as suggested above.
> >
> > Then:
> >
> > /* create an empty mempool */
> > rte_mempool_create_empty(...);
> >
> > /* set the handler:
> >  * in the ext handler, the mempool flags are updated with
> >  * MEMPOOL_F_CONTIG
> >  */
> > rte_mempool_set_ops_byname(..., "my_hardware");
>
> You mean, the ext handler will set the mempool flag using the
> 'pool_config' param; something like
> rte_mempool_set_ops_byname(..., "my_hardware", MEMPOOL_F_CONTIG)?

I don't mean changing the API of rte_mempool_set_ops_byname(). I was
suggesting something like this:

static int your_handler_alloc(struct rte_mempool *mp)
{
	/* handler init */
	...
	mp->flags |= MEMPOOL_F_CONTIG;
	return 0;
}

And update the rte_mempool_populate_*() functions to handle this flag:
instead of segmenting, just fail if the objects cannot fit in one
segment.

It won't really solve the issue, but at least it will be properly
detected, and the user could take steps to get more contiguous memory
(ex: use larger hugepages, allocate hugepages at boot time).

> > /* if MEMPOOL_F_CONTIG is set, all populate() functions should
> >  * ensure that there is only one contiguous zone
> >  */
> > rte_mempool_populate_default(...);
>
> I am not too sure about the scope of MEMPOOL_F_CONTIG. How can the
> MEMPOOL_F_CONTIG flag {set by the application/driver when calling
> rte_mempool_create(..., flag)} make sure there is only one contiguous
> zone? I'd like to understand this from you.
As described above, there would be no change from the application's
point of view. The handler would set the mempool flag by itself to
change the behavior of the populate functions.

> In my understanding:
> rte_mempool_populate_default() will request a memzone based on the
> mp->size value; and if mp->size is more than one hugepage_sz {i.e.
> one mz request is not enough}, then it will request more hugepages
> {i.e. more mz requests}. So IIUC, contiguity is not guaranteed.

Yes, that's how it works today. As the EAL cannot guarantee that the
hugepages are physically contiguous, it tries to segment the mempool
objects into several memory zones.

Regards,
Olivier