From: "Burakov, Anatoly"
To: Shahaf Shuler, "dev@dpdk.org"
Cc: Olga Shern, Yongseok Koh, "pawelx.wodkowski@intel.com", "gowrishankar.m@linux.vnet.ibm.com", "ferruh.yigit@intel.com", Thomas Monjalon, "arybchenko@solarflare.com", "shreyansh.jain@nxp.com"
Date: Tue, 20 Nov 2018 10:55:47 +0000
Message-ID: <1c00477d-266a-400e-44b4-5bcb1f41e4b7@intel.com>
References: <6c7243cd-5370-846b-2999-5ae34722f640@intel.com>
Subject: Re: [dpdk-dev] [RFC] ethdev: introduce DMA memory mapping for external memory

On 20-Nov-18 8:08 AM, Shahaf Shuler wrote:
> Monday, November 19, 2018 7:18 PM, Burakov, Anatoly:
>> Subject: Re: [RFC] ethdev: introduce DMA memory mapping for external
>> memory
>>
>> On 19-Nov-18 11:20 AM, Shahaf Shuler wrote:
>>> Thursday, November 15, 2018 1:00 PM, Burakov, Anatoly:
>>>> Subject: Re: [RFC] ethdev: introduce DMA memory mapping for external
>>>> memory
>>>>
>>>> On 15-Nov-18 9:46 AM, Shahaf Shuler wrote:
>>>>> Wednesday, November 14, 2018 7:06 PM, Burakov, Anatoly:
>>>>>> Subject: Re: [RFC] ethdev: introduce DMA memory mapping for
>>>>>> external memory
>>>>>>
>>>>>> On 14-Nov-18 2:53 PM, Shahaf Shuler wrote:
>>>>>>> Hi Anatoly,
>>>>>>>
>>>>>>> Wednesday, November 14, 2018 1:19 PM, Burakov, Anatoly:
>>>>>>>> Subject: Re: [RFC] ethdev: introduce DMA memory mapping for
>>>>>>>> external memory
>>>>>>>>
>
> [...]
>
>>>> I have to be honest, I didn't consider this question before :D I
>>>> guess there could be cases where using rte_malloc might not be
>>>> suitable because it wastes some memory on malloc elements, i.e. if
>>>> you want to use N pages as memory, you'd have to allocate N+1 pages.
>>>> If memory is at a premium, maybe manual management of it would be
>>>> better in some cases.
>>>
>>> I had similar thoughts, more related to usability from the user side.
>>> When the application allocates external memory, it just wants to use
>>> it for DMA, i.e. put it as the mbuf buf_addr or populate a mempool
>>> with it. It is an "overhead" to create a socket for this external
>>> memory, to populate it with the memory, and later on to malloc from
>>> this socket (or use the socket id for the mempool creation). Not to
>>> mention the fact that maybe the application wants to manage this
>>> memory differently than rte_malloc does.
>>>
>>> On the other hand, mapping memory to the device before using it for
>>> DMA is far more intuitive.
>>
>> It is far more intuitive *if* you're doing all of the memory
>> management yourself or "just" using this memory for a mempool. This
>> was already working before, and if you had that as your use case,
>> there is no need for the external memory feature.
>
> I think it was broken in some way when you did the first work on the
> new memory management.
> Because after this work, PMDs were adjusted to look for the memory
> only in the fbarrays, and such external memory doesn't exist there.
> Otherwise, if it was working, why did you add support for external
> memory? Which other use case were you trying to cover?
>
> It is fixed in some way with the new work to support external memory,
> but it requires changes in the application (the "overhead" I referred
> to before).

I feel we have a difference of terminology here. What I refer to as
"external memory support" is the feature that was added in 18.11, which
has to do with creating heaps etc. - this memory is now (hopefully)
properly supported across the entire DPDK. What you're probably
referring to as "external memory support" is not that - it's the feature
of mempools, rings etc. that allows you to use memory that's not
registered with, and not managed by, DPDK. This is the way "external
memory" was done prior to 18.11.

The above scenario (working with anonymous memory in mempools) is still
"broken" in this sense - and I see no general way to fix it. It could
work with mempools, in that we could automatically create new heaps
behind the scenes and populate them with new memory, but for other data
structures (e.g. rte_ring_init() over anonymous memory), it's just not
possible to do that.

I personally would prefer these APIs to just go away and people to start
using the "official" way to manage external memory. Removing the APIs
may even be feasible for everything bar mempool, and mempool support we
can manage more or less automatically in the above way (by automatically
creating an external heap behind the scenes). But then, by the time
we're creating/adding external memory heaps inside the mempool APIs, you
don't really need the external memory DMA mapping API calls, because the
mempool would do that automatically by way of using the external heaps
API.
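The "create an external heap behind the scenes" idea could be sketched as a get-or-create lookup. The toy model below is purely illustrative: every name and the socket-id base are invented, and in real DPDK 18.11 this would go through rte_malloc_heap_create() and rte_malloc_heap_memory_add() rather than a static table:

```c
#include <string.h>

/*
 * Toy model of "create an external heap behind the scenes": look up a
 * heap by name, creating it on first use, so that e.g. a mempool
 * populated with unregistered memory could transparently be given a
 * heap (and thus a socket id to allocate from). All names and the
 * socket-id base here are invented for illustration.
 */
#define MAX_EXT_HEAPS    8
#define FIRST_EXT_SOCKET 256 /* assumed base for external socket ids */

static const char *ext_heap_names[MAX_EXT_HEAPS];
static int ext_heap_cnt;

/* Return the socket id for a named heap, creating the heap if needed. */
static int heap_get_or_create(const char *name)
{
    for (int i = 0; i < ext_heap_cnt; i++)
        if (strcmp(ext_heap_names[i], name) == 0)
            return FIRST_EXT_SOCKET + i; /* already exists: reuse it */

    if (ext_heap_cnt == MAX_EXT_HEAPS)
        return -1; /* out of heap slots */

    /* a real implementation would create the heap and add memory here */
    ext_heap_names[ext_heap_cnt] = name;
    return FIRST_EXT_SOCKET + ext_heap_cnt++;
}
```

The point of the sketch is only that the second request for the same heap must come back with the same socket id, so the application never has to know a heap was created on its behalf.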
In other words, the problem here isn't so much usability as it is the
fact that, for some stuff, a lot of internal code now relies on DPDK's
internal page tables, and we have to either make allowances for that on
a case-by-case basis (by the way, no one came to me with these
issues!), or admit that using unregistered memory is a bad idea and not
do it in the future.

>> On the other hand, if you were to use it in a different way - for
>> example, allocating hash tables or other DPDK data structures - then
>> such a feature is essential.
>
> Hash tables that the NIC needs to read/write to/from? I didn't get the
> point here.

The point wasn't hash tables, but support for use cases we might not
have thought of. That includes doing strange things like allocating
DPDK data structures in non-standard memory locations.

>> The entire point was to allow using external memory with semantics
>> identical to how you use the rest of DPDK.
>
> I understand this design choice, and it has the benefit of consistency
> vs. more API calls on the application side.
>
>> Also, whether it's "intuitive" depends on perspective - you say "I
>> expect to allocate memory and map it for DMA myself", I say "why do I
>> have to care about DMA mapping, DPDK should do this for me" :)
>
> Right 😊. We should be driven by customers and not vendors; too bad
> there is not much feedback from them in the community.

That implies the current design *wasn't* driven by customers, which is
not true. They may not be the same ones as you are aware of - but then
this isn't really my fault :)

> I know from the VPP side that they prefer not to use the rte_socket
> but rather to manage the memory themselves, and they are perfectly
> fine with mapping it explicitly. This is true for another company that
> has a vswitch solution (which I cannot mention by name).

Well, it's not like I can do anything if no one from either community
comes here and engages in the discussion.
I wish I could read people's minds, but alas :)

>> If you are using your own memory management and doing everything
>> manually - you get to map everything for DMA manually. If you are
>> using DPDK facilities - it is intuitive that for any DPDK-managed
>> memory, internal or external, the same rules apply across the board -
>> DMA mapping included.
>
> So if I understand your point here, you claim "we need both": one for
> users who prefer DPDK to do that for them, and the other for users who
> want more control. However, if we keep the API of explicit mapping, we
> can no longer say that external memory behaves exactly the same as
> internal.

Well, more or less that, yes. It would be good to have full support for
"external" memory (in the "unregistered with DPDK" sense, as outlined
above).

> So going back to my RFC 😊, if we decide to keep both approaches, what
> I suggest is to change the VFIO mapping one to be more generic and per
> device (can be rte_device). I think the biggest questions we need to
> answer are:
> 1. Are we OK with a generic, vendor-agnostic API?

Yes, generic and vendor-agnostic FTW!

> 2. Are we OK that the explicit mapping is done per device?

This question is slightly misleading, because that's not *all* there is
to it. In general, yes, I'm OK with the idea. In practice, VFIO may get
in the way.

VFIO works by mapping memory for a VFIO container. The container can
have one or more devices in it, and how many devices it can have
depends on the IOMMU setup (i.e. it may not be possible to have one
device per container on some setups, which is why currently everything
is done with one container).
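One conceivable way around the shared-container constraint would be to track the container's mappings in user space, so a second device requesting an already-present mapping is refcounted instead of hitting VFIO again. The toy model below only illustrates the idea - nothing in it is a real VFIO or DPDK interface:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Toy model of user-space bookkeeping for per-container DMA mappings:
 * an exact duplicate request is shared via a refcount, a partial
 * overlap is rejected, and only genuinely new regions would reach the
 * kernel. Nothing here is a real VFIO or DPDK interface.
 */
#define MAX_MAPS 16

struct toy_map { uint64_t iova; uint64_t len; int refcnt; };

static struct toy_map toy_maps[MAX_MAPS];
static int toy_n_maps;

static int container_dma_map(uint64_t iova, uint64_t len)
{
    for (int i = 0; i < toy_n_maps; i++) {
        struct toy_map *m = &toy_maps[i];

        if (m->iova == iova && m->len == len) {
            m->refcnt++; /* duplicate of an existing mapping: share it */
            return 0;
        }
        /* a partial overlap cannot be refcounted sanely: reject it */
        if (iova < m->iova + m->len && m->iova < iova + len)
            return -EEXIST;
    }
    if (toy_n_maps == MAX_MAPS)
        return -ENOSPC;
    /* a real implementation would issue VFIO_IOMMU_MAP_DMA here */
    toy_maps[toy_n_maps++] = (struct toy_map){ iova, len, 1 };
    return 0;
}
```

With bookkeeping like this, the question of whether VFIO itself returns distinguishable error codes for duplicates becomes less pressing, since duplicates never reach the kernel in the first place.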
So, "per-device" means we either keep each device in a separate
container (which will not work in some cases), or we somehow work
around the fact that mappings can be performed only once (I'm not sure
if VFIO will return appropriate error codes that would allow us to
distinguish mappings that failed due to duplication from mappings that
failed for other reasons - I'll have to research this), and keep all
devices in the same container.

In other words, it may be possible to solve this issue for VFIO, but I
would not be comfortable proceeding without knowing that this can be
done in a sane way.

There's also the question of automatic vs. manual mapping. The way I
see it, the choices are:

1) drop automatic registration for DMA altogether, for both internal
and external memory
2) drop automatic registration for DMA for external memory only, but
keep automatic DMA registration for internal memory
3) do automatic DMA registration for both

I don't like the second choice for API consistency reasons, so I would
rather go either for 1) (and cause massive breakage of all existing
applications along the way...) or 3) (the way it's working right now).
The pragmatic choice *may* be 2), which is what I feel you are
advocating for, but it has its own problems, mainly in the mechanics of
it. If we can get them worked out, I may just bite the bullet and ack
the patches :D

Specifically, let's suppose we want automatic memory registration for
DMA on internal but not external memory. How does it happen? Currently,
it happens via mem event callbacks. If we were to drop the callback
approach for external memory, we would lose out on the additional
functionality provided by said callbacks, and I would really like to
avoid that. If we were to switch to manual mapping, that means EAL
would depend on ethdev (you have suggested rte_device as an alternative
- would that fix the issue?), and we would have to rewrite the current
DMA mapping code to do mappings explicitly, rather than relying on the
callback mechanism.
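A generic, vendor-agnostic, per-device mapping hook of the kind discussed here could be sketched roughly as below. This is only a toy sketch: every name is invented, and in DPDK terms the hook would presumably hang off rte_device and be implemented per bus/driver:

```c
#include <stddef.h>
#include <stdint.h>

/* Toy sketch of a per-device DMA mapping hook; all names invented. */
struct toy_device {
    const char *name;
    /* driver-provided hook: map [addr, addr + len) at iova for DMA */
    int (*dma_map)(struct toy_device *dev, void *addr,
                   uint64_t iova, size_t len);
    int nb_mapped; /* bookkeeping for the example */
};

#define MAX_DEVS 8
static struct toy_device *toy_devs[MAX_DEVS];
static int toy_nb_devs;

static int toy_register_device(struct toy_device *dev)
{
    if (toy_nb_devs == MAX_DEVS)
        return -1;
    toy_devs[toy_nb_devs++] = dev;
    return 0;
}

/* Explicit mapping: the caller decides which memory gets mapped,
 * instead of a mem event callback firing for all memory. */
static int toy_dma_map_all(void *addr, uint64_t iova, size_t len)
{
    for (int i = 0; i < toy_nb_devs; i++)
        if (toy_devs[i]->dma_map(toy_devs[i], addr, iova, len) != 0)
            return -1;
    return 0;
}

/* trivial "driver" implementation that just counts mapping calls */
static int count_dma_map(struct toy_device *dev, void *addr,
                         uint64_t iova, size_t len)
{
    (void)addr; (void)iova; (void)len;
    dev->nb_mapped++;
    return 0;
}

/* wire up two devices and map one region; returns total map calls */
static int toy_demo(void)
{
    static struct toy_device d0 = { "toy_net_a", count_dma_map, 0 };
    static struct toy_device d1 = { "toy_net_b", count_dma_map, 0 };
    char buf[64];

    toy_register_device(&d0);
    toy_register_device(&d1);
    if (toy_dma_map_all(buf, 0x100000, sizeof(buf)) != 0)
        return -1;
    return d0.nb_mapped + d1.nb_mapped;
}
```

The design question the sketch makes visible is exactly the one above: with explicit iteration, whoever calls toy_dma_map_all() decides which memory is mapped, whereas the callback approach maps everything the allocator hands out.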
So, instead of e.g. VFIO mapping being performed via a callback, we
would instead iterate over each rte_device and call its respective DMA
mapping function explicitly - for internal memory, but not for
external. This *may* work, but I need to think it through a bit more :)

This also still leaves the question of external unregistered memory -
that is a wholly separate issue.

TL;DR: I'm tentatively OK with the proposal, but we need to work
certain things out before we go too far along this path and find out
that we've reached a dead end.

--
Thanks,
Anatoly