From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id D42B21B4D3 for ; Thu, 11 Oct 2018 12:46:55 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 11 Oct 2018 03:46:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,368,1534834800"; d="scan'208";a="270424560" Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.237.220.79]) ([10.237.220.79]) by fmsmga005.fm.intel.com with ESMTP; 11 Oct 2018 03:46:50 -0700 To: Shreyansh Jain Cc: ferruh.yigit@intel.com, dev@dpdk.org References: <20180925125423.7505-1-shreyansh.jain@nxp.com> <20180925125423.7505-4-shreyansh.jain@nxp.com> <894130a9-017c-348d-31f8-c4c23f517f25@nxp.com> <2d2e9008-fb5b-3ecd-2d2c-e86250f5d363@nxp.com> <5e153c76-6eaa-e6a5-28ed-7cec191d4581@intel.com> <2c377b57-c418-5c03-a23d-5da91ef898d0@nxp.com> <16940ff0-278c-2376-1a86-6beaf0770a09@nxp.com> <4f10749d-b6e8-e257-0606-2fbb3c1e3d76@intel.com> <2fdcf7ac-6bed-00b6-1601-b816cffc613a@nxp.com> From: "Burakov, Anatoly" Message-ID: <02288d2c-56ad-e814-0487-79295da2aac7@intel.com> Date: Thu, 11 Oct 2018 11:46:49 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <2fdcf7ac-6bed-00b6-1601-b816cffc613a@nxp.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [PATCH 3/5] common/dpaax: add library for PA VA translation table X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 11 Oct 2018 10:46:56 -0000 On 11-Oct-18 11:39 AM, Shreyansh Jain wrote: > On Thursday 11 October 2018 03:43 PM, Burakov, Anatoly wrote: >> On 11-Oct-18 11:07 AM, Shreyansh Jain wrote: >>> On Thursday 11 October 2018 03:32 PM, Shreyansh Jain wrote: >>>> On Thursday 11 October 2018 02:33 PM, Burakov, Anatoly wrote: >>>>> On 09-Oct-18 11:45 AM, Shreyansh Jain wrote: >>>>>> On Tuesday 25 September 2018 07:09 PM, Shreyansh Jain wrote: >>>>>>> Hello Anatoly, >>>>>>> >>>>>>> On Tuesday 25 September 2018 06:58 PM, Burakov, Anatoly wrote: >>>>>>>> On 25-Sep-18 1:54 PM, Shreyansh Jain wrote: >>>>>>>>> A common library, valid for dpaaX drivers, which is used to >>>>>>>>> maintain >>>>>>>>> a local copy of PA->VA translations. >>>>>>>>> >>>>>>>>> In case of physical addressing mode (one of the option for >>>>>>>>> FSLMC, and >>>>>>>>> only option for DPAA bus), the addresses of descriptors Rx'd are >>>>>>>>> physical. These need to be converted into equivalent VA for >>>>>>>>> rte_mbuf >>>>>>>>> and other similar calls. >>>>>>>>> >>>>>>>>> Using the rte_mem_virt2iova or rte_mem_virt2phy is expensive. This >>>>>>>>> library is an attempt to reduce the overall cost associated with >>>>>>>>> this translation. >>>>>>>>> >>>>>>>>> A small table is maintained, containing continuous entries >>>>>>>>> representing a continguous physical range. Each of these entries >>>>>>>>> stores the equivalent VA, which is fed during mempool creation, or >>>>>>>>> memory allocation/deallocation callbacks. >>>>>>>>> >>>>>> >>>>>> [...] >>>>>> >>>>>>> >>>>>>>> >>>>>>>> Also, a couple of nitpicks below. >>>>>>>> >>>>>>>>>   cosnfig/common_base                            |   5 + >>>>>>>>>   config/common_linuxapp                        |   5 + >>>>>>>>>   drivers/common/Makefile                       |   4 + >>>>>>>>>   drivers/common/dpaax/Makefile                 |  31 ++ >>>>>>>>>   drivers/common/dpaax/dpaax_iova_table.c       | 509 >>>>>>>>> ++++++++++++++++++ >>>>>>>>>   drivers/common/dpaax/dpaax_iova_table.h       | 104 ++++ >>>>>>>>>   drivers/common/dpaax/dpaax_logs.h             |  39 ++ >>>>>>>>>   drivers/common/dpaax/meson.build              |  12 + >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> +    DPAAX_DEBUG("Add: Found slot at (%"PRIu64")[(%zu)] for >>>>>>>>> vaddr:(%p)," >>>>>>>>> +            " phy(%"PRIu64"), len(%zu)", entry[i].start, >>>>>>>>> e_offset, >>>>>>>>> +            vaddr, paddr, length); >>>>>>>>> +    return 0; >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> +int >>>>>>>>> +dpaax_iova_table_del(phys_addr_t paddr, size_t len __rte_unused) >>>>>>>> >>>>>>>> len is not unused. >>>>>>> >>>>>>> I will fix this. >>>>>>> Actually, this function itself is useless - more for symmetry >>>>>>> reason. >>>>>>> Callers would be either simply updating the table, or ignoring it >>>>>>> completely. But, yes, this is indeed wrong that I set that unused. >>>>>>> >>>>>> >>>>>> Actually, I was wrong in my first reply. In case of >>>>>> dpaax_iova_table_del(), len is indeed redundant. This is because >>>>>> the mapping is for a complete page (min of 2MB size), even if the >>>>>> request is for lesser length. So, removal of a single entry (of >>>>>> fixed size) would be done. >>>>>> >>>>>> In fact, while on this, I think deleting a PA->VA entry itself is >>>>>> incorrect (not just useless). A single entry (~2MB equivalent) can >>>>>> represent multiple users (working on a rte_malloc'd area, for >>>>>> example). So, effectively, its always an update - not an add or del. >>>>> >>>>> I'm not sure what you mean here. If you got a mem event about >>>>> memory area being freed, it's guaranteed to *not* have any users - >>>>> neither malloc, nor any other memory. And len is always page-aligned. >>>> >>>> ok. Maybe I am getting this wrong, but consider this: >>>> >>>> 1) hugepage size=2MB >>>> 2) a = malloc(1M) >>>>    this will pin an entry in table for a block starting at VA=(a) >>>> and PA=(a'). Each entry is of 2MB length - that means, even if >>>> someone were to access a+1048577 for an equivalent PA, they would >>>> get it (though, that is a incorrect access). >>>> 3) b = malloc(1M) >>>>    this *might* lead to a case where same 2MB page is used and >>>> VA=(b==(a+1MB)). Being hugepage backed, PA=(b=PA(a)+1M). >>>> = After b, the PA-VA table has a single entry of 2MB, representing >>>> two mallocs. It can be used for translation for any thread >>>> requesting PAs of a or b. >>>> 4) Free(a) >>>>   - this would attempt to remove one 2MB entry from PA-VA table. >>>> But, 'b' is already valid. Access to get_pa(VA(b)) should return me >>>> the PA(b). >>>>   - 'len' is not even used as the entry in PA-VA table is of a fixed >>>> size. >>> >>> Just to add to this: >>> - if talking about the mem_event callback, it definitely won't be a >>> case where same page is still being served under another rte_malloc >>> - But, calls can come to delete from users of PA-VA table based on >>> their own rte_free(). >>> >>> And, your comment makes me think - I should probably del entry from >>> the table only when mem_event callback is received. >> >> Mem events are not triggered on rte_free(), they're triggered on page >> deallocation. A call to rte_free/rte_memzone_free/rte_mempool_free >> etc. *might* trigger a page deallocation, but *only* if the memory >> area being freed encompasses an entire page. If you rte_malloc() 64 >> bytes and then rte_free() those 64 bytes, you won't get a mem event >> *unless* these were the only 64 bytes allocated on a particular page, >> and the entire page is no longer used by anything else. > > My understanding is same. > But, it seems my explanation wasn't well written: > > For a rte_free(), I am not expecting that mem_event is raised - but, the > caller of rte_free() (the eth or crypto drivers, or applications) may > call the PA-VA table del function to remove the entries. > > This voluntary delete of table entry from the drivers or applications > using PA-VA calling del of PA-VA table - is not correct. Yes, so it appears :) -- Thanks, Anatoly