DPDK patches and discussions
From: "Nélio Laranjeiro" <nelio.laranjeiro@6wind.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: Yongseok Koh <yskoh@mellanox.com>,
	"Walker, Benjamin" <benjamin.walker@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"thomas@monjalon.net" <thomas@monjalon.net>,
	"andras.kovacs@ericsson.com" <andras.kovacs@ericsson.com>,
	"Wiles, Keith" <keith.wiles@intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Subject: Re: [dpdk-dev] [RFC v2 00/23] Dynamic memory allocation for DPDK
Date: Mon, 5 Feb 2018 11:18:52 +0100
Message-ID: <20180205101852.owogsnbcach32z2k@laranjeiro-vm.dev.6wind.com>
In-Reply-To: <afbeaf2a-3d5b-544e-d6ae-0d6b6d2f6023@intel.com>

On Mon, Feb 05, 2018 at 10:03:35AM +0000, Burakov, Anatoly wrote:
> On 02-Feb-18 7:28 PM, Yongseok Koh wrote:
> > On Tue, Dec 26, 2017 at 05:19:25PM +0000, Walker, Benjamin wrote:
> > > On Fri, 2017-12-22 at 09:13 +0000, Burakov, Anatoly wrote:
> > > > On 21-Dec-17 9:38 PM, Walker, Benjamin wrote:
> > > > > > SPDK will need some way to register for a notification when pages are
> > > > > > allocated or freed. For storage, the number of requests per second is
> > > > > > (relative to networking) fairly small (hundreds of thousands per second
> > > > > > in a traditional block storage stack, or a few million per second with
> > > > > > SPDK). Given that, we can afford to do a dynamic lookup from va to
> > > > > > pa/iova on each request in order to greatly simplify our APIs (users can
> > > > > > just pass pointers around instead of mbufs). DPDK has a way to look up
> > > > > > the pa from a given va, but it does so by scanning /proc/self/pagemap
> > > > > > and is very slow. SPDK instead handles this by implementing a lookup
> > > > > > table of va to pa/iova which we populate by scanning through the DPDK
> > > > > > memory segments at start up, so the lookup in our table is sufficiently
> > > > > > fast for storage use cases. If the list of memory segments changes, we
> > > > > > need to know about it in order to update our map.
> > > > 
> > > > Hi Benjamin,
> > > > 
> > > > > So, in other words, we need callbacks on alloc/free. What information
> > > > would SPDK need when receiving this notification? Since we can't really
> > > > know in advance how many pages we allocate (it may be one, it may be a
> > > > thousand) and they no longer are guaranteed to be contiguous, would a
> > > > per-page callback be OK? Alternatively, we could have one callback per
> > > > operation, but only provide VA and size of allocated memory, while
> > > > > leaving everything else to the user. I do add a virt2memseg() function
> > > > > which would let you look up segment physical addresses more easily, so
> > > > > you won't have to manually scan memseg lists to get IOVA for a given VA.
> > > > 
> > > > Thanks for your feedback and suggestions!
> > > 
> > > Yes - callbacks on alloc/free would be perfect. Ideally for us we want one
> > > callback per virtual memory region allocated, plus a function we can call to
> > > find the physical addresses/page break points on that virtual region. The
> > > function that finds the physical addresses does not have to be efficient - we'll
> > > just call that once when the new region is allocated and store the results in a
> > > fast lookup table. One call per virtual region is better for us than one call
> > > per physical page because we're actually keeping multiple different types of
> > > memory address translation tables in SPDK. One translates from va to pa/iova, so
> > > for this one we need to break this up into physical pages and it doesn't matter
> > > if you do one call per virtual region or one per physical page. However another
> > > one translates from va to RDMA lkey, so it is much more efficient if we can
> > > register large virtual regions in a single call.
> > 
> > > Another yes to callbacks. As Benjamin mentioned about RDMA, the MLX PMD has to
> > > look up the LKEY for each packet DMA. Let me briefly explain this for your
> > > understanding. For security reasons, we don't allow an application to initiate
> > > a DMA transaction with unknown random physical addresses. Instead, the va-to-pa
> > > mapping (we call it a Memory Region) should be pre-registered, and the LKEY is
> > > the index of the translation entry registered in the device. With the current
> > > static memory model, it is easy to manage because the v-p mapping is unchanged
> > > over time. But if it becomes dynamic, the MLX PMD should get notified of the
> > > event so it can register/unregister the Memory Region.
> > 
> > > For the MLX PMD, it is also enough to get one notification per allocation/free
> > > of a virtual memory region. It doesn't necessarily have to be a per-page call,
> > > as Benjamin mentioned, because the PA of the region doesn't need to be
> > > contiguous for registration. Also, the PMD doesn't need to know the physical
> > > address of the region (I'm not saying it is unnecessary, but just FYI :-).
> > 
> > Thanks,
> > Yongseok
> > 
> 
> Thanks for your feedback, good to hear we're on the right track. I already
> have a prototype implementation of this working, due for v1 submission :)

Hi Anatoly,

Good to know.
Do you see any performance impact with this series?

Thanks,

-- 
Nélio Laranjeiro
6WIND
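
For readers following the thread, here is a minimal sketch of the scheme
discussed above: one notification per virtual region on alloc/free, a slow
per-page translation done only at notification time, and a flat table used
for the per-request lookup. This is an illustration only, not the API from
the series: every name below (mem_event_cb, notify_mem_event,
slow_translate, fast_lookup, the fixed arena, and the 4 KiB page size) is
hypothetical, and the "IOVA" values are made up so that consecutive virtual
pages are not physically contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ   4096u        /* assumed page size; real hugepages are larger */
#define NUM_PAGES 8u           /* size of the toy arena, in pages */
#define IOVA_NONE UINT64_MAX   /* marks a page with no known translation */

enum mem_event { MEM_ALLOC, MEM_FREE };

/* Hypothetical callback type: one call per virtual region, VA and length only. */
typedef void (*mem_event_cb)(enum mem_event ev, void *va, size_t len);

static unsigned char arena[NUM_PAGES * PAGE_SZ]; /* stands in for mapped memory */
static uint64_t va2iova[NUM_PAGES];              /* flat per-page lookup table */
static mem_event_cb registered_cb;

/* Stand-in for a slow translation (e.g. a pagemap scan). It maps pages in
 * reverse order so the fake IOVAs of adjacent pages are non-contiguous. */
static uint64_t slow_translate(const void *va)
{
    uintptr_t off = (uintptr_t)va - (uintptr_t)arena;
    return 0x100000000ULL + (uint64_t)(NUM_PAGES - 1 - off / PAGE_SZ) * PAGE_SZ;
}

static void table_init(void)
{
    for (size_t i = 0; i < NUM_PAGES; i++)
        va2iova[i] = IOVA_NONE;
}

/* Subscriber side: walk the region page by page once, when it appears or
 * disappears, and cache (or drop) the translation of each page. */
static void on_mem_event(enum mem_event ev, void *va, size_t len)
{
    uintptr_t off = (uintptr_t)va - (uintptr_t)arena;
    size_t first = off / PAGE_SZ;
    size_t count = (len + PAGE_SZ - 1) / PAGE_SZ;

    assert(off < sizeof(arena) && first + count <= NUM_PAGES);
    for (size_t i = first; i < first + count; i++)
        va2iova[i] = (ev == MEM_ALLOC)
            ? slow_translate(arena + i * PAGE_SZ)
            : IOVA_NONE;
}

/* Per-request path: a table index plus the offset within the page. */
static uint64_t fast_lookup(const void *va)
{
    uintptr_t off = (uintptr_t)va - (uintptr_t)arena;

    if (off >= sizeof(arena))
        return IOVA_NONE;
    uint64_t base = va2iova[off / PAGE_SZ];
    return base == IOVA_NONE ? IOVA_NONE : base + off % PAGE_SZ;
}

/* Allocator side: register one subscriber, notify it once per region. */
static void register_mem_event_cb(mem_event_cb cb)
{
    registered_cb = cb;
}

static void notify_mem_event(enum mem_event ev, void *va, size_t len)
{
    if (registered_cb)
        registered_cb(ev, va, len);
}
```

A caller would run table_init() and register_mem_event_cb(on_mem_event)
once, then call fast_lookup() on each request. A subscriber that registers
whole regions with a device (the LKEY case) could use the same callback and
skip the per-page walk, since it only needs the VA and length.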
