DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: "Nélio Laranjeiro" <nelio.laranjeiro@6wind.com>
Cc: Yongseok Koh <yskoh@mellanox.com>,
	"Walker, Benjamin" <benjamin.walker@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"andras.kovacs@ericsson.com" <andras.kovacs@ericsson.com>,
	"Wiles, Keith" <keith.wiles@intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Mon, 5 Feb 2018 10:36:58 +0000
Message-ID: <bea8e86d-5e30-88b5-5db4-624002d255d2@intel.com>
In-Reply-To: <20180205101852.owogsnbcach32z2k@laranjeiro-vm.dev.6wind.com>

On 05-Feb-18 10:18 AM, Nélio Laranjeiro wrote:
> On Mon, Feb 05, 2018 at 10:03:35AM +0000, Burakov, Anatoly wrote:
>> On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
>>> On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
>>>> On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
>>>>> On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
>>>>>> SPDK will need some way to register for a notification when pages are
>>>>>> allocated or freed. For storage, the number of requests per second is
>>>>>> (relative to networking) fairly small (hundreds of thousands per second
>>>>>> in a traditional block storage stack, or a few million per second with
>>>>>> SPDK). Given that, we can afford to do a dynamic lookup from va to
>>>>>> pa/iova on each request in order to greatly simplify our APIs (users can
>>>>>> just pass pointers around instead of mbufs). DPDK has a way to lookup
>>>>>> the pa from a given va, but it does so by scanning /proc/self/pagemap
>>>>>> and is very slow. SPDK instead handles this by implementing a lookup
>>>>>> table of va to pa/iova which we populate by scanning through the DPDK
>>>>>> memory segments at start up, so the lookup in our table is sufficiently
>>>>>> fast for storage use cases. If the list of memory segments changes, we
>>>>>> need to know about it in order to update our map.
>>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> So, in other words, we need callbacks on alloc/free. What information
>>>>> would SPDK need when receiving this notification? Since we can't really
>>>>> know in advance how many pages we allocate (it may be one, it may be a
>>>>> thousand) and they no longer are guaranteed to be contiguous, would a
>>>>> per-page callback be OK? Alternatively, we could have one callback per
>>>>> operation, but only provide VA and size of allocated memory, while
>>>>> leaving everything else to the user. I do add a virt2memseg() function
>>>>> which would allow you to look up segment physical addresses easier, so
>>>>> you won't have to manually scan memseg lists to get IOVA for a given VA.
>>>>>
>>>>> Thanks for your feedback and suggestions!
>>>>
>>>> Yes - callbacks on alloc/free would be perfect. Ideally for us we want one
>>>> callback per virtual memory region allocated, plus a function we can call to
>>>> find the physical addresses/page break points on that virtual region. The
>>>> function that finds the physical addresses does not have to be efficient - we'll
>>>> just call that once when the new region is allocated and store the results in a
>>>> fast lookup table. One call per virtual region is better for us than one call
>>>> per physical page because we're actually keeping multiple different types of
>>>> memory address translation tables in SPDK. One translates from va to pa/iova, so
>>>> for this one we need to break this up into physical pages and it doesn't matter
>>>> if you do one call per virtual region or one per physical page. However another
>>>> one translates from va to RDMA lkey, so it is much more efficient if we can
>>>> register large virtual regions in a single call.
>>>
>>> Another yes to callbacks. Like Benjamin mentioned about RDMA, MLX PMD has to
>>> look up LKEY per each packet DMA. Let me briefly explain about this for your
>>> understanding. For security reason, we don't allow application initiates a DMA
>>> transaction with unknown random physical addresses. Instead, va-to-pa mapping
>>> (we call it Memory Region) should be pre-registered and LKEY is the index of the
>>> translation entry registered in device. With the current static memory model, it
>>> is easy to manage because v-p mapping is unchanged over time. But if it becomes
>>> dynamic, MLX PMD should get notified with the event to register/unregister
>>> Memory Region.
>>>
>>> For MLX PMD, it is also enough to get one notification per allocation/free of a
>>> virtual memory region. It shouldn't necessarily be a per-page call like Benjamin
>>> mentioned because PA of region doesn't need to be contiguous for registration.
>>> But it doesn't need to know about physical address of the region (I'm not saying
>>> it is unnecessary, but just FYI :-).
>>>
>>> Thanks,
>>> Yongseok
>>>
>>
>> Thanks for your feedback, good to hear we're on the right track. I already
>> have a prototype implementation of this working, due for v1 submission :)
> 
> Hi Anatoly,
> 
> Good to know.
> Do you see any performance impact with this series?
> 
> Thanks,
> 

In the general case, there is no noticeable impact, since e.g. the
underlying ring implementation does not depend on IO space layout at
all. In certain specific cases, optimizations that relied on physical
space being contiguous (e.g. calculating an offset spanning several
pages) would no longer be possible unless VFIO is in use: because the
IO space layout is unpredictable, each page has to be checked
individually rather than sharing a common base offset.
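
To illustrate what "checked individually" means, here is a minimal
sketch. It is not code from the patchset: the page_map table, the
page_iova() lookup, the fixed 4K page size and the two example tables
are all hypothetical stand-ins for whatever VA-to-IOVA lookup is
actually available. The point is only that, without a contiguity
guarantee, every page boundary inside a buffer has to be translated and
compared against the expected offset from the start.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ  4096u       /* illustrative; hugepages would be 2M or 1G */
#define IOVA_BAD UINT64_MAX  /* marker for "no translation found" */

/* Hypothetical translation entry: one page-aligned VA -> one IOVA. */
struct page_map {
	uintptr_t va;
	uint64_t iova;
};

/* Hypothetical lookup: linear scan of a small table. */
static uint64_t
page_iova(const struct page_map *tbl, size_t n, uintptr_t va)
{
	uintptr_t page = va & ~(uintptr_t)(PAGE_SZ - 1);

	for (size_t i = 0; i < n; i++)
		if (tbl[i].va == page)
			return tbl[i].iova + (va - page);
	return IOVA_BAD;
}

/*
 * Return true if [va, va + len) maps to a single contiguous IOVA range.
 * Each page boundary inside the range is translated separately and
 * compared with the address a common base offset would have predicted.
 */
static bool
range_is_iova_contig(const struct page_map *tbl, size_t n,
		uintptr_t va, size_t len)
{
	assert(len > 0);

	uint64_t start = page_iova(tbl, n, va);
	if (start == IOVA_BAD)
		return false;

	uintptr_t end = va + len;
	uintptr_t next = (va & ~(uintptr_t)(PAGE_SZ - 1)) + PAGE_SZ;

	for (; next < end; next += PAGE_SZ) {
		uint64_t iova = page_iova(tbl, n, next);

		if (iova == IOVA_BAD || iova != start + (next - va))
			return false;
	}
	return true;
}

/* Two adjacent VA pages backed by adjacent IOVA pages. */
static const struct page_map contig_tbl[] = {
	{ 0x10000, 0x80000 }, { 0x11000, 0x81000 },
};

/* Same VA pages, but the second one lives elsewhere in IO space. */
static const struct page_map scattered_tbl[] = {
	{ 0x10000, 0x80000 }, { 0x11000, 0x20000 },
};
```

With these tables, a buffer that crosses from the first page into the
second passes the check only for contig_tbl. A buffer that stays within
a single page passes for both, which is why code that never spans pages
sees no difference.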

-- 
Thanks,
Anatoly