From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 2D774A32A2
	for <public@inbox.dpdk.org>; Thu, 24 Oct 2019 21:31:04 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 0AADF1EBF5;
	Thu, 24 Oct 2019 21:31:03 +0200 (CEST)
Received: from mail-io1-f67.google.com (mail-io1-f67.google.com
 [209.85.166.67]) by dpdk.org (Postfix) with ESMTP id 118E41EBEE
 for <dev@dpdk.org>; Thu, 24 Oct 2019 21:31:02 +0200 (CEST)
Received: by mail-io1-f67.google.com with SMTP id z19so31006122ior.0
 for <dev@dpdk.org>; Thu, 24 Oct 2019 12:31:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=h8pSmHDeTazj5SuJiGNZZSm1lCJUjmA9UU4ycJQ55tc=;
 b=lWN+Sy1YTciJjY0pq3vokf98i1+xRrGBEkDZHqdQtlr3OHhRSVoSfP1KfJLbhkfmzr
 A8udsgOx5ofhNjroib82U+PBJWJqzOdH7WKf+fP14lh0Svn/A4ft0xxcdQtekRoRCZq/
 kmZvHmOOJGDhC2sOe+hasYEfaEFTRE2RALms6SxVdkHKueo0x9I4VWR4DQd3YYNzZy04
 gY0MNMXK6eqYblAkwbGc6ijhfRacB4CMLNWBPwACqKFhcztxcyZBYMGYhsSRaD4rPBE5
 CnlxCrmEz0sXxUR7iJMFTj7H+7gcmZwgX2VPcGIVAqGHVPFkqjOEhumkz1Fbj8e/EwYJ
 27hw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=h8pSmHDeTazj5SuJiGNZZSm1lCJUjmA9UU4ycJQ55tc=;
 b=dTorjgq4XkH1P3XcnqmMm3T35NTv5kHpSOxBFIOw/K+2AjohRH2J9ry5V41PLw0baF
 4vZezrbU0pjvD3M73RBo0qD4+w1hzinYKlbHW53/9EjiW22dHpqAk3YYrviug7M3YBqB
 h4ztKduMAXGw04BfR0EPUFfAXmLbYBp2szaGq1iX7D5wNb3x9VKVlaKPgfOtmo75RF3l
 U6vaejqAwQ7tcc6K4MD65hCI8kTctU9PQWUosgVZTCRdQzVOE4U5dj5TelJZ3WckO2Kh
 T5T3kQHIXUEHpzXhk1L9sb/x0ggUOJqbDCVFshXAzsqJ6sVofa15/AI7NaEQuBZtbMAD
 IJ6A==
X-Gm-Message-State: APjAAAX+ZOi0f/gpmm5L/qQQXf0ddq1nmMpWf/Rpc7HzlHNmGe3XKmzx
 nwZpL7rqBhSW3k7wAkq076K396Bd/HQ6cq1KaKA=
X-Google-Smtp-Source: APXvYqztWLdh6WKlaA3TqvRxFk6DDFr17qSjl2ZRZht574So2D9YjluwKupfR07PdTR8mChOcZFb8jTaoxt76rCLozQ=
X-Received: by 2002:a5d:8598:: with SMTP id f24mr4619693ioj.60.1571945461072; 
 Thu, 24 Oct 2019 12:31:01 -0700 (PDT)
MIME-Version: 1.0
References: <20191021080324.10659-3-vattunuru@marvell.com>
 <4bd1acf5-2da2-b2da-2b0c-7ee243d5aeb9@intel.com>
 <MWHPR18MB16454BCA52557B9A6FAA2215A6690@MWHPR18MB1645.namprd18.prod.outlook.com>
 <77f8eaf0-52ca-1295-973d-c8085f7b7736@intel.com>
 <MWHPR18MB1645FF9B1A7E429E23F1F881A6690@MWHPR18MB1645.namprd18.prod.outlook.com>
 <08c426d1-6fc9-1c3f-02d4-8632a8e3c337@solarflare.com>
 <MWHPR18MB1645B82FE6E75C4B8E6FE84BA6680@MWHPR18MB1645.namprd18.prod.outlook.com>
 <CALBAE1O4biRt5vcirv5febUz+7QDMynTaBAxOovgTXaiTjaVqQ@mail.gmail.com>
 <20191023144724.GO25286@glumotte.dev.6wind.com>
 <CALBAE1Og1saVEdZOvW=7PeLK6V9N_gmVnNUj2zLVby88JsfmNA@mail.gmail.com>
 <20191024173506.GU25286@glumotte.dev.6wind.com>
In-Reply-To: <20191024173506.GU25286@glumotte.dev.6wind.com>
From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Fri, 25 Oct 2019 01:00:44 +0530
Message-ID: <CALBAE1PN5_Qt=pP0cRZ9Bi4UKUCcvuhsMQSOQV7t2P8R35XWAQ@mail.gmail.com>
To: Olivier Matz <olivier.matz@6wind.com>
Cc: Vamsi Krishna Attunuru <vattunuru@marvell.com>,
 Andrew Rybchenko <arybchenko@solarflare.com>, 
 Ferruh Yigit <ferruh.yigit@intel.com>,
 "thomas@monjalon.net" <thomas@monjalon.net>, 
 Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
 Kiran Kumar Kokkilagadda <kirankumark@marvell.com>, 
 "anatoly.burakov@intel.com" <anatoly.burakov@intel.com>, 
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "dev@dpdk.org" <dev@dpdk.org>
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz <olivier.matz@6wind.com> wrote:
>
> Hi,
>
> On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz <olivier.matz@6wind.com> wrote:
> > >
> > > Hi,
> > >
> > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru
> > > > <vattunuru@marvell.com> wrote:
> > > > >
> > > > > Hi Ferruh,
> > > > >
> > > > > Can you please explain the problems with using KNI-dedicated mbuf alloc routines while enabling KNI iova=va mode? Please see the discussion below with Andrew; he wanted to know the problems with having newer APIs.
> > > >
> > > >
> > > > While waiting for Ferruh's reply, I would like to summarise the
> > > > current status
> > > >
> > > > # In order to make KNI work with IOVA as VA, we need to make sure
> > > > a mempool _object_ does not span across two huge pages
> > > >
> > > > # This problem can be fixed by either of:
> > > >
> > > > a) Introduce a flag in mempool to define this constraint, so that
> > > > the constraint is enforced only when needed; this is in line with
> > > > the existing mempool semantics for addressing such problems
> > > >
> > > > b) Instead of creating a flag, make this the default behavior in
> > > > mempool for the IOVA as VA case
> > > >
> > > > Upside:
> > > > b1) There is no need for a KNI-specific mempool_create.
> > > >
> > > > Downside:
> > > > b2) Not aligned with existing mempool API semantics
> > > > b3) There will be a trivial amount of memory waste as we cannot
> > > > allocate from the edge. Considering that the normal huge page size
> > > > is 1G or 512MB, this is not a real issue.
> > > >
> > > > c) Force IOVA as PA when the KNI kernel module is loaded
> > > >
> > > > Upside:
> > > > c1) Option (a) would call for a new KNI-specific mempool create
> > > > API, i.e. existing KNI applications would need a one-line change
> > > > to work with release 19.11 or later; option (c) avoids that change.
> > > >
> > > > Downside:
> > > > c2) Drivers which need RTE_PCI_DRV_NEED_IOVA_AS_VA cannot work with KNI
> > > > c3) Root privilege is needed to run KNI, as IOVA as PA needs root privilege
> > > >
> > > > For the next year, we expect applications to work with 19.11
> > > > without any code change. My personal opinion is to go with option
> > > > (a) and update the release notes to document the change, as it is
> > > > a simple one-line change.
> > > >
> > > > The selection of (a) vs (b) is between the KNI and mempool
> > > > maintainers. Could we please reach a consensus? Or can we discuss
> > > > this at the TB meeting?
> > > >
> > > > We have been going back and forth on this feature for the last 3
> > > > releases. Now that we have solved all the technical problems,
> > > > please help us decide between (a) and (b) to make forward progress.
> > >
> > > Thank you for the summary.
> > > What is not clear to me is if (a) or (b) may break an existing
> > > application, and if yes, in which case.
> >
> > Thanks for the reply.
> >
> > To be clear, we are talking about out-of-tree KNI applications that
> > don't want to change rte_pktmbuf_pool_create() to
> > rte_kni_pktmbuf_pool_create() and rebuild for v19.11.
> >
> > So in case (b) there is no issue, as they will keep using
> > rte_pktmbuf_pool_create(). But in case (a) there will be an issue if
> > an out-of-tree KNI application uses rte_pktmbuf_pool_create(), which
> > does not set the new flag.
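
(For reference, the one-line change being discussed would look roughly
like the following. I am assuming here that the proposed
rte_kni_pktmbuf_pool_create() keeps the same parameters as
rte_pktmbuf_pool_create(); NB_MBUF and MEMPOOL_CACHE_SIZE stand for the
application's own constants.)

    /* before: may break KNI in IOVA as VA mode if an mbuf crosses a page */
    mp = rte_pktmbuf_pool_create("mbuf_pool", NB_MBUF, MEMPOOL_CACHE_SIZE,
            0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    /* after: KNI-aware pool where no mbuf crosses a page boundary */
    mp = rte_kni_pktmbuf_pool_create("mbuf_pool", NB_MBUF, MEMPOOL_CACHE_SIZE,
            0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());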
>
> Following yesterday's discussion at techboard, I looked at the mempool
> code and at my previous RFC patch. It took some time to remember what
> my worries were.

Thanks for the review Olivier.

Just to make sure the correct one is being reviewed:

1) v7 had a similar issue to the one mentioned:
http://patches.dpdk.org/patch/56585/

2) v11 addressed the review comments, and you gave your Acked-by for
the mempool change:
http://patches.dpdk.org/patch/61559/

My thought process in the TB meeting was: since
rte_mempool_populate_from_pg_sz_chunks() has already been reviewed,
replace rte_pktmbuf_pool_create()'s call to
rte_mempool_populate_default() with
rte_mempool_populate_from_pg_sz_chunks() in the IOVA == VA case, to
avoid a new KNI-specific mempool_create API.
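
A minimal sketch of that idea (assuming
rte_mempool_populate_from_pg_sz_chunks() from the v11 patch keeps the
same prototype as rte_mempool_populate_default(), i.e. it takes the
mempool and returns the number of objects populated or a negative
errno; the helper name below is only for illustration):

    #include <rte_eal.h>
    #include <rte_mempool.h>

    static int
    pktmbuf_pool_populate(struct rte_mempool *mp)
    {
            /* In IOVA as VA mode, populate in page-sized chunks so
             * that no object crosses a huge page boundary; otherwise
             * keep the existing default behaviour.
             */
            if (rte_eal_iova_mode() == RTE_IOVA_VA)
                    return rte_mempool_populate_from_pg_sz_chunks(mp);
            return rte_mempool_populate_default(mp);
    }

rte_pktmbuf_pool_create() would then call this helper where it calls
rte_mempool_populate_default() today.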

>
> Currently, in rte_mempool_populate_default(), when the mempool is
> populated, we first try to allocate one iova-contiguous block of (n *
> elt_size). On success, we use this memory to fully populate the mempool
> without taking care of crossing page boundaries.
>
> If we change the behavior to prevent objects from crossing pages, the
> assumption that allocating (n * elt_size) is always enough becomes
> wrong.  By luck, there is no real impact, because if the mempool is not
> fully populated after this first iteration, it will allocate a new
> chunk.
>
> To be rigorous, we need to better calculate the amount of memory to
> allocate, according to page size.
>
> Looking at the code, I found another problem in the same area: let's say
> we populate a mempool that requires 1.1GB (and we use 1G huge pages):
>
> 1/ mempool code will first try to allocate an iova-contiguous zone
>    of 1.1G -> fail
> 2/ it then tries to allocate a page-aligned, non iova-contiguous
>    zone of 1.1G, which takes 2G of huge pages. On success, a lot of
>    memory is wasted.
> 3/ on error, we try to allocate the biggest zone; it can still return
>    a zone between 1.1G and 2G, which can also waste memory.
>
> I will rework my mempool patchset to properly address these issues,
> hopefully tomorrow.

OK.
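
For what it's worth, the page-aware sizing described above boils down
to something like this (a hypothetical helper, only to illustrate the
calculation; it assumes elt_size <= pg_sz):

    #include <stddef.h>

    /* Memory needed for n objects of elt_size when an object is not
     * allowed to cross a pg_sz page boundary: round up to whole pages
     * instead of assuming n * elt_size is always enough.
     */
    static size_t
    mem_needed(size_t n, size_t elt_size, size_t pg_sz)
    {
            size_t objs_per_page = pg_sz / elt_size;
            size_t pages = (n + objs_per_page - 1) / objs_per_page;

            return pages * pg_sz;
    }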


>
> Also, I thought about another idea to solve your issue; I am not sure
> it is better, but it would not imply changing the mempool behavior. If
> I understood the problem, when an mbuf is across 2 pages, the copy of
> the data can fail in kni because the mbuf is not virtually contiguous in the

For the KNI use case, we would need the object to be _physically_
contiguous, to make sure that with get_user_pages_remote() we get a
physically contiguous memory map, so that the KNI kernel thread, the
KNI kernel context, and DPDK userspace can all use the same memory in
their different contexts.
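
To put the constraint in code (a hypothetical helper, not part of any
DPDK API): an object is usable from all of these contexts only if its
buffer stays within a single page, i.e. the following check must never
be true for any object:

    #include <stdint.h>
    #include <stddef.h>

    /* Return non-zero if [iova, iova + elt_size) crosses a boundary of
     * pg_sz-sized pages (pg_sz must be a power of two).
     */
    static int
    obj_crosses_page(uint64_t iova, size_t elt_size, uint64_t pg_sz)
    {
            return (iova & ~(pg_sz - 1)) !=
                   ((iova + elt_size - 1) & ~(pg_sz - 1));
    }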



> kernel. So why not, in this case, split the memcpy() into several
> copies, each of them staying within a single page (and calling
> phys2virt() for each page)? The same would have to be done when
> accessing the fields of the mbuf structure if it crosses a page
> boundary. Would that work? This

If the above is the requirement, does this logic need to be in the
slow path or the fast path?

> could be a B plan.

OK.
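
To illustrate the B plan, here is a rough sketch of the per-page copy
(pseudo-C only; phys2virt() is the placeholder name used above for
whatever per-page address translation the kni kernel side would use,
it is not an existing function):

    #include <stdint.h>
    #include <string.h>

    void *phys2virt(uint64_t pa); /* placeholder translation */

    static void
    copy_split_on_pages(void *dst, uint64_t src_pa, size_t len,
                        uint64_t pg_sz)
    {
            while (len > 0) {
                    /* bytes left before src hits the next page boundary */
                    uint64_t in_page = pg_sz - (src_pa & (pg_sz - 1));
                    size_t n = len < in_page ? len : (size_t)in_page;

                    memcpy(dst, phys2virt(src_pa), n);
                    dst = (char *)dst + n;
                    src_pa += n;
                    len -= n;
            }
    }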

>
> Olivier