Date: Mon, 28 Oct 2019 15:05:32 +0100
From: Olivier Matz
To: Vamsi Krishna Attunuru
Cc: Jerin Jacob, Andrew Rybchenko, Ferruh Yigit, thomas@monjalon.net,
 Jerin Jacob Kollanukkaran, Kiran Kumar Kokkilagadda,
 anatoly.burakov@intel.com, stephen@networkplumber.org, dev@dpdk.org
Message-ID: <20191028140532.wb4imunpxnvdbk5g@platinum>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni option

On Sat, Oct 26, 2019 at 02:09:51PM +0000, Vamsi Krishna Attunuru wrote:
> Hi Olivier,
>
> > -----Original Message-----
> > From: Olivier Matz
> > Sent: Saturday, October 26, 2019 5:55 PM
> > To: Vamsi Krishna Attunuru
> > Cc: Jerin Jacob; Andrew Rybchenko; Ferruh Yigit; thomas@monjalon.net;
> >  Jerin Jacob Kollanukkaran; Kiran Kumar Kokkilagadda;
> >  anatoly.burakov@intel.com; stephen@networkplumber.org; dev@dpdk.org
> > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy kni
> >  option
> >
> > Hi Jerin, Hi Vamsi,
> >
> > On Fri, Oct 25, 2019 at 09:20:20AM +0000, Vamsi Krishna Attunuru wrote:
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob
> > > > Sent: Friday, October 25, 2019 1:01 AM
> > > > To: Olivier Matz
> > > > Cc: Vamsi Krishna Attunuru; Andrew Rybchenko; Ferruh Yigit;
> > > >  thomas@monjalon.net; Jerin Jacob Kollanukkaran;
> > > >  Kiran Kumar Kokkilagadda; anatoly.burakov@intel.com;
> > > >  stephen@networkplumber.org; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [EXT] Re: [PATCH v11 2/4] eal: add legacy
> > > >  kni option
> > > >
> > > > On Thu, Oct 24, 2019 at 11:05 PM Olivier Matz wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Oct 23, 2019 at 08:32:08PM +0530, Jerin Jacob wrote:
> > > > > > On Wed, Oct 23, 2019 at 8:17 PM Olivier Matz wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Wed, Oct 23, 2019 at 03:42:39PM +0530, Jerin Jacob wrote:
> > > > > > > > On Tue, Oct 22, 2019 at 7:01 PM Vamsi Krishna Attunuru wrote:
> > > > > > > > >
> > > > > > > > > Hi Ferruh,
> > > > > > > > >
> > > > > > > > > Can you please explain the problems in using KNI-dedicated
> > > > > > > > > mbuf alloc routines while enabling KNI iova=va mode? Please
> > > > > > > > > see the below discussion with Andrew. He wanted to know the
> > > > > > > > > problems in having newer APIs.
> > > > > > > >
> > > > > > > > While waiting for Ferruh's reply, I would like to summarise
> > > > > > > > the current status.
> > > > > > > >
> > > > > > > > # In order to make KNI work with IOVA as VA, we need to make
> > > > > > > > sure a mempool _object_ does not span across two huge pages.
> > > > > > > >
> > > > > > > > # This problem can be fixed by either of:
> > > > > > > >
> > > > > > > > a) Introduce a flag in mempool to define this constraint, so
> > > > > > > > that the constraint is enforced only when needed; this is in
> > > > > > > > line with the existing semantics of addressing such problems
> > > > > > > > in mempool.
> > > > > > > >
> > > > > > > > b) Instead of creating a flag, make this the default behavior
> > > > > > > > in mempool for the IOVA as VA case.
> > > > > > > >
> > > > > > > > Upside:
> > > > > > > > b1) There is no need for a KNI-specific mempool_create.
> > > > > > > >
> > > > > > > > Downside:
> > > > > > > > b2) Not aligned with existing mempool API semantics.
> > > > > > > > b3) There will be a trivial amount of memory waste as we
> > > > > > > > cannot allocate from the edge. Considering the normal huge
> > > > > > > > page size is 1G or 512MB, this is not a real issue.
> > > > > > > >
> > > > > > > > c) Make IOVA as PA when the KNI kernel module is loaded.
> > > > > > > >
> > > > > > > > Upside:
> > > > > > > > c1) Option (a) would call for a new KNI-specific mempool
> > > > > > > > create API, i.e. existing KNI applications would need a
> > > > > > > > one-line change to work with release 19.11 or later; option
> > > > > > > > (c) avoids that.
> > > > > > > >
> > > > > > > > Downside:
> > > > > > > > c2) Drivers which need RTE_PCI_DRV_NEED_IOVA_AS_VA cannot
> > > > > > > > work with KNI.
> > > > > > > > c3) Need root privilege to run KNI, as IOVA as PA needs root
> > > > > > > > privilege.
> > > > > > > >
> > > > > > > > For the next year, we expect applications to work with 19.11
> > > > > > > > without any code change. My personal opinion is to go with
> > > > > > > > option (a) and update the release notes to document the
> > > > > > > > change, as it is a simple one-line change.
> > > > > > > >
> > > > > > > > The selection of (a) vs (b) is between the KNI and mempool
> > > > > > > > maintainers. Could we please reach a consensus? Or can we
> > > > > > > > discuss this at the TB meeting?
> > > > > > > >
> > > > > > > > We have been going back and forth on this feature for the
> > > > > > > > last 3 releases. Now that we have solved all the technical
> > > > > > > > problems, please help us to decide (a) vs (b) to make forward
> > > > > > > > progress.
> > > > > > >
> > > > > > > Thank you for the summary.
> > > > > > > What is not clear to me is if (a) or (b) may break an existing
> > > > > > > application, and if yes, in which case.
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > To be clear, we are talking about out-of-tree KNI applications,
> > > > > > which do not want to change rte_pktmbuf_pool_create() to
> > > > > > rte_kni_pktmbuf_pool_create() and rebuild for v19.11.
> > > > > >
> > > > > > So in case (b) there is no issue, as it will be using
> > > > > > rte_pktmbuf_pool_create(). But in case (a) it will create an
> > > > > > issue if an out-of-tree KNI application is using
> > > > > > rte_pktmbuf_pool_create(), which does not use the new flag.
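
Just to make the "one-line change" of option (a) concrete: assuming the
rte_kni_pktmbuf_pool_create() helper added by this series keeps the same
prototype as rte_pktmbuf_pool_create() (I did not re-check the exact
prototype, so treat this as a rough sketch only), the only change in an
application would be the pool creation call:

    #include <rte_mbuf.h>
    #include <rte_lcore.h>
    #include <rte_kni.h>  /* assuming the proposed helper is exposed here */

    /* before: standard pool creation, objects may cross huge pages */
    struct rte_mempool *mp = rte_pktmbuf_pool_create("mbuf_pool",
            8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    /* after: KNI-aware pool creation for IOVA as VA, option (a) */
    struct rte_mempool *kni_mp = rte_kni_pktmbuf_pool_create("kni_mbuf_pool",
            8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

That is the change out-of-tree applications would have to make under (a),
and would avoid under (b).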
> > > > > Following yesterday's discussion at the techboard, I looked at the
> > > > > mempool code and at my previous RFC patch. It took some time to
> > > > > remind me what my worries were.
> > > >
> > > > Thanks for the review Olivier.
> > > >
> > > > Just to make sure the correct one is reviewed:
> > > >
> > > > 1) v7 had the similar issue mentioned in
> > > > http://patches.dpdk.org/patch/56585/
> >
> > The v7 has the problem I described below: the iova-contiguous allocation
> > may fail because the calculated size is too small, and the remaining
> > objects will be added in another chunk. This can happen if a fully
> > iova-contiguous mempool is requested (it happens for octeontx).
> >
> > Also, the way the page size is retrieved in
> > rte_mempool_op_populate_default() assumes that memzones are used, which
> > is not correct.
> >
> > > > 2) v11 addressed the review comments and you have given the Acked-by
> > > > for the mempool change in http://patches.dpdk.org/patch/61559/
> >
> > The v11 looked fine to me, because it does not impact any default
> > behavior, and the plan was to remove this new function when my mempool
> > patchset was in.
> >
> > > > My thought process in the TB meeting was: since
> > > > rte_mempool_populate_from_pg_sz_chunks() is reviewed, replace
> > > > rte_pktmbuf_pool_create's rte_mempool_populate_default() with
> > > > rte_mempool_populate_from_pg_sz_chunks() in the IOVA == VA case, to
> > > > avoid a new KNI mempool_create API.
> > > >
> > > > > Currently, in rte_mempool_populate_default(), when the mempool is
> > > > > populated, we first try to allocate one iova-contiguous block of
> > > > > (n * elt_size). On success, we use this memory to fully populate
> > > > > the mempool without taking care of crossing page boundaries.
> > > > >
> > > > > If we change the behavior to prevent objects from crossing pages,
> > > > > the assumption that allocating (n * elt_size) is always enough
> > > > > becomes wrong. By luck, there is no real impact, because if the
> > > > > mempool is not fully populated after this first iteration, it will
> > > > > allocate a new chunk.
> > > > >
> > > > > To be rigorous, we need to better calculate the amount of memory
> > > > > to allocate, according to page size.
> > >
> > > Hi Olivier,
> > >
> > > Thanks for the review. I think the below mentioned problems exist with
> > > the current mempool_populate_default() API. Will there be a high
> > > chance of hitting those problems when we precalculate the memory size
> > > (after preventing objects from crossing the page boundary and fitting
> > > the complete mempool memory in a single mem chunk) and the mempool
> > > size goes beyond the page size, as in the example below?
> >
> > Yes, the problem described below (allocating a mempool of 1.1GB
> > resulting in 2GB reserved) exists in the current version. It will be
> > fixed in the new version of my "mempool: avoid objects allocations
> > across pages" patchset.
> >
> > FYI, a reworked patchset is almost ready, I'll send it Monday.
>
> Thanks a lot Olivier.
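
A quick note on the "better calculate the amount of memory to allocate,
according to page size" point above. The kind of computation I have in
mind is roughly the following hypothetical helper (not the actual mempool
code; it ignores object alignment and the per-object header/trailer):

    #include <stddef.h>

    /* Sketch: memory needed for n_objs objects of elt_size bytes when no
     * object may cross a page boundary. */
    static size_t
    mempool_mem_needed(size_t n_objs, size_t elt_size, size_t pg_sz)
    {
            size_t objs_per_page, n_pages;

            if (elt_size == 0 || elt_size > pg_sz)
                    return 0; /* constraint cannot be satisfied */

            objs_per_page = pg_sz / elt_size;
            /* round up: a partially filled last page is still a full page */
            n_pages = (n_objs + objs_per_page - 1) / objs_per_page;

            /* e.g. pg_sz=2MB, elt_size=2176B -> 963 objs/page, so 8192
             * objects need 9 pages (18MB), more than 8192 * elt_size. */
            return n_pages * pg_sz;
    }

This is also why allocating exactly (n * elt_size) stops being enough once
objects are not allowed to cross page boundaries.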
> > > Regards,
> > > Vamsi
> > >
> > > > > Looking at the code, I found another problem in the same area:
> > > > > let's say we populate a mempool that requires 1.1GB (and we use 1G
> > > > > huge pages):
> > > > >
> > > > > 1/ the mempool code first tries to allocate an iova-contiguous
> > > > >    zone of 1.1G -> fail
> > > > > 2/ it then tries to allocate a page-aligned, non iova-contiguous
> > > > >    zone of 1.1G, which is rounded up to 2G. On success, a lot of
> > > > >    memory is wasted.
> > > > > 3/ on error, we try to allocate the biggest zone; it can still
> > > > >    return a zone between 1.1G and 2G, which can also waste memory.
> > > > >
> > > > > I will rework my mempool patchset to properly address these
> > > > > issues, hopefully tomorrow.
> > > >
> > > > OK.
> > > >
> > > > > Also, I thought about another idea to solve your issue, not sure
> > > > > it is better, but it would not imply changing the mempool
> > > > > behavior. If I understood the problem, when an mbuf is across 2
> > > > > pages, the copy of the data can fail in kni because the mbuf is
> > > > > not virtually contiguous in the
> > > >
> > > > For the KNI use case, we would need it to be _physically_
> > > > contiguous, to make sure that, using get_user_pages_remote(), we get
> > > > a physically contiguous memory map, so that both the KNI kernel
> > > > thread and KNI kernel context and the DPDK userspace can use the
> > > > same memory in different contexts.
> > > >
> > > > > kernel. So why not, in this case, split the memcpy() into several
> > > > > ones, each of them staying within a single page (and calling
> > > > > phys2virt() for each page)? The same would have to be done when
> > > > > accessing the fields of the mbuf structure if it crosses a page
> > > > > boundary. Would that work? This
> > > >
> > > > If the above is the requirement, does this logic need to be in the
> > > > slow path or the fast path?
> >
> > In the fast path. But I don't think the performance impact would be
> > significant.
> >
> > Vamsi, do you confirm this approach could also solve the issue without
> > changing the mempool?
>
> So far there is no performance impact observed when this feature (kni
> iova as va) is enabled with this patch set. Not sure of the impact when
> the memcpy split and the extra address translations are introduced. This
> approach might solve the issue, but I feel it's not the cleanest way of
> solving it.

Agreed, it's not the cleanest way, but maybe less risky. As you can see,
the reworked patchset is not that short:
http://inbox.dpdk.org/dev/20190719133845.32432-1-olivier.matz@6wind.com/T/#m6249311baae8469f5727f07c32e4e6e6844eae6a

> > > > > could be a B plan.
> > > >
> > > > OK.
> > > >
> > > > > Olivier
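
PS: to make the "plan B" memcpy() split a bit more concrete, the
kernel-side copy would roughly become something like the sketch below. It
is only an illustration: the real KNI code would use its own PA-to-VA
translation rather than a bare phys_to_virt(), and error handling is
omitted.

    #include <linux/types.h>
    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/string.h>
    #include <linux/io.h>

    /* Copy len bytes from a physically addressed buffer, splitting the
     * copy at every page boundary so each chunk is virtually contiguous. */
    static void
    copy_from_phys_split(void *dst, phys_addr_t src_pa, size_t len)
    {
            while (len > 0) {
                    size_t chunk = min_t(size_t, len,
                                         PAGE_SIZE - offset_in_page(src_pa));

                    memcpy(dst, phys_to_virt(src_pa), chunk);
                    dst += chunk;
                    src_pa += chunk;
                    len -= chunk;
            }
    }

The same per-page splitting would be needed when the mbuf metadata itself
crosses a page boundary.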