DPDK patches and discussions
 help / color / mirror / Atom feed
From: Rajesh Ravi <rajesh.ravi@broadcom.com>
To: David Marchand <david.marchand@redhat.com>
Cc: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	dev <dev@dpdk.org>,
	 Bruce Richardson <bruce.richardson@intel.com>,
	Ajit Khaparde <ajit.khaparde@broadcom.com>,
	 Jonathan Richardson <jonathan.richardson@broadcom.com>,
	 Scott Branden <scott.branden@broadcom.com>,
	 Vikram Mysore Prakash <vikram.prakash@broadcom.com>,
	Srinath Mannam <srinath.mannam@broadcom.com>,
	 Thomas Monjalon <thomas@monjalon.net>
Subject: Re: [dpdk-dev] [PATCH 19.11] vfio: fix DMA mapping of externally allocated heaps
Date: Wed, 6 Nov 2019 20:57:59 +0530	[thread overview]
Message-ID: <CAAr5zNebTYCVgHp_406ES8b--qqYg5SvTQ5DxW1sJfXQB8mVpA@mail.gmail.com> (raw)
In-Reply-To: <CAJFAV8zTsxtc-qmtArP3XguoHF67HVq=XZnhcU5gM1jA9RVAMg@mail.gmail.com>

Thanks Anatoly & David.
The main patch provided  by Anatoly is not compatible with DPDK 19.02.
So I had to make following patch and verified it to be working with DPDK
19.02 & SPDK 19.07 combination.

------------------------------------------------------------------------------------------------------------------------------------------------------------------

diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h
b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 84aabe3..49934f5 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -35,6 +35,7 @@ struct rte_memseg_list {
  volatile uint32_t version; /**< version number for multiprocess sync. */
  size_t len; /**< Length of memory area covered by this memseg list. */
  unsigned int external; /**< 1 if this list points to external memory */
+ unsigned int heap; /**< 1 if this list points to a heap */
  struct rte_fbarray memseg_arr;
 };

diff --git a/lib/librte_eal/common/rte_malloc.c
b/lib/librte_eal/common/rte_malloc.c
index b39de3c..4d31bf7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -368,6 +368,7 @@ void rte_free(void *addr)

  rte_spinlock_lock(&heap->lock);
  ret = malloc_heap_add_external_memory(heap, msl);
+ msl->heap = 1; /* mark it as heap segment */
  rte_spinlock_unlock(&heap->lock);

 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c
b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 1b96b57..33dd923 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -836,6 +836,7 @@ void numa_error(char *where)
  msl->page_sz = page_sz;
  msl->socket_id = socket_id;
  msl->base_va = NULL;
+ msl->heap = 1; /* mark it as a heap segment */

  RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
  (size_t)page_sz >> 10, socket_id);
@@ -1410,6 +1411,7 @@ void numa_error(char *where)
  msl->page_sz = page_sz;
  msl->socket_id = 0;
  msl->len = internal_config.memory;
+ msl->heap = 1;

  /* we're in single-file segments mode, so only the segment list
  * fd needs to be set up.
@@ -1682,6 +1684,7 @@ void numa_error(char *where)
  mem_sz = msl->len;
  munmap(msl->base_va, mem_sz);
  msl->base_va = NULL;
+ msl->heap = 0;

  /* destroy backing fbarray */
  rte_fbarray_destroy(&msl->memseg_arr);
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c
b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 61b54f9..3bb3e6b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1238,7 +1238,16 @@ static int vfio_dma_mem_map(struct vfio_config
*vfio_cfg, uint64_t vaddr,
 {
  int *vfio_container_fd = arg;

- if (msl->external & !internal_config.iso_cmem)
+ /* skip external memory that isn't a heap */
+ if (msl->external && !msl->heap)
+ return 0;
+
+ /* skip any segments with invalid IOVA addresses */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
+ /* if IOVA mode is VA, we've already mapped the internal segments */
+ if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
  return 0;

  return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
@@ -1362,9 +1371,19 @@ static int vfio_dma_mem_map(struct vfio_config
*vfio_cfg, uint64_t vaddr,
 {
  int *vfio_container_fd = arg;

- if (msl->external)
+ /* skip external memory that isn't a heap */
+ if (msl->external && !msl->heap)
  return 0;

+ /* skip any segments with invalid IOVA addresses */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
+#if 0
+ if (ms->addr_64 == param->addr_64)
+ return 0;
+#endif
+
  return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
  ms->len, 1);
 }
@@ -1381,6 +1400,12 @@ struct spapr_walk_param {
  uint64_t max = ms->iova + ms->len;

  if (msl->external)
+ /* skip external memory that isn't a heap */
+ if (msl->external && !msl->heap)
+ return 0;
+
+ /* skip any segments with invalid IOVA addresses */
+ if (ms->iova == RTE_BAD_IOVA)
  return 0;

  if (max > param->window_size) {

------------------------------------------------------------------------------------------------------------------------------------------------------------------

Regards,
Rajesh

On Wed, Nov 6, 2019 at 7:28 PM David Marchand <david.marchand@redhat.com>
wrote:

> On Tue, Nov 5, 2019 at 6:15 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> >
> > On 05-Nov-19 3:15 PM, Anatoly Burakov wrote:
> > > Currently, externally created heaps are supposed to be automatically
> > > mapped for VFIO DMA by EAL, however they only do so if, at the time of
> > > heap creation, VFIO is initialized and has at least one device
> > > available. If no devices are available at the time of heap creation (or
> > > if devices were available, but were since hot-unplugged, thus dropping
> > > all VFIO container mappings), then VFIO mapping code would have skipped
> > > over externally allocated heaps.
> > >
> > > The fix is two-fold. First, we allow externally allocated memory
> > > segments to be marked as "heap" segments. This allows us to distinguish
> > > between external memory segments that were created via heap API, from
> > > those that were created via rte_extmem_register() API.
> > >
> > > Then, we fix the VFIO code to only skip non-heap external segments.
> > > Also, since external heaps are not guaranteed to have valid IOVA
> > > addresses, we will skip those which have invalid IOVA addresses as
> well.
> > >
> > > Fixes: 0f526d674f8e ("malloc: separate creating memseg list and malloc
> heap")
> > >
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > >
> > > Notes:
> > >      This cannot be backported to older releases as it breaks the
> > >      API and ABI. A separate fix is in the works for stable.
> > >
> >
> > Alternative, non-breaking implementation available (which will be slower
> > due to O(N) memseg list heaps lookups):
> >
> > http://patches.dpdk.org/patch/62486/
> >
> > I'm fine with either option being merged.
> >
> > The more perfect solution would've been to rename "msl->external" into
> > "msl->flags" and have various flags for memseg lists, but it's too late
> > to break the API now.
>
> Either way is fine for me (between the 18.11 and the master patches you
> sent).
> Breaking the ABI is acceptable, but I agree the API is another story.
>
> If the code seems better to you with the additional heap flag, let's go
> for it.
>
> I would still like to hear from Rajesh though, since he seems to be
> the first to hit this issue.
>
>
> --
> David Marchand
>
>

-- 
Regards,
Rajesh

  reply	other threads:[~2019-11-06 18:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-05 15:15 Anatoly Burakov
2019-11-05 17:15 ` Burakov, Anatoly
2019-11-06 13:58   ` David Marchand
2019-11-06 15:27     ` Rajesh Ravi [this message]
2019-11-06 16:03       ` Burakov, Anatoly
2019-11-06 21:53 ` David Marchand
2019-11-06 22:12   ` Thomas Monjalon
2019-11-07  6:35 ` Rajesh Ravi
2019-11-07 15:35 ` David Marchand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAr5zNebTYCVgHp_406ES8b--qqYg5SvTQ5DxW1sJfXQB8mVpA@mail.gmail.com \
    --to=rajesh.ravi@broadcom.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=anatoly.burakov@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=jonathan.richardson@broadcom.com \
    --cc=scott.branden@broadcom.com \
    --cc=srinath.mannam@broadcom.com \
    --cc=thomas@monjalon.net \
    --cc=vikram.prakash@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).