From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 70DD8A04AD;
	Wed,  6 Nov 2019 19:17:37 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 63CC11DFE7;
	Wed,  6 Nov 2019 19:17:31 +0100 (CET)
Received: from mail-il1-f193.google.com (mail-il1-f193.google.com
 [209.85.166.193]) by dpdk.org (Postfix) with ESMTP id 4BC701C1D3
 for <dev@dpdk.org>; Wed,  6 Nov 2019 16:28:37 +0100 (CET)
Received: by mail-il1-f193.google.com with SMTP id n18so15144882ilt.9
 for <dev@dpdk.org>; Wed, 06 Nov 2019 07:28:37 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=2qmMlpE0nQ+qNAL6NspS287/QjFsGy9MTDhADTEnCG4=;
 b=ckOxYYjjzgu1fC8EttUhlwMOX2ZG0n4ptOoh0ryYI5QLRb63bpuVq+dqzjxGt75EHn
 1Xz9gmoZHCK7ctxd3x+FiMwySLJmqrPTJzwDOCrCs0CdSi6aW3iFs7JYXNLmyRXwWnwd
 xNlPM3v99oaRso7TcVzq8LyAcLAg+SXcKOojs=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=2qmMlpE0nQ+qNAL6NspS287/QjFsGy9MTDhADTEnCG4=;
 b=cxoKy+WTk4d68/otmz9f2A6mPU5yosy7Feq1uyBXD+dumQgLg7ck3YjopD8X8FvKda
 RE/px0VB8iQlIoLfEsNi8dCPlzaGrbuFldClDfkmFArJxFWw1uC1fYN81xWLVs2V+hMT
 y4ad2SlVNuk6kCdMcJt696otUjtkgByc6FJXghSUQDxILVSso6nZizQdSZsAr+RVsxus
 u/eYiO1TkoRHVzpsHVNu5SX8v5P6Znrxai9HLOj2X3RXaG0LBbGO1gEQ9R8ef3oLMjaA
 2sSHtaEHAtoK+zFdEW+yzbh1LLeHa+zxHX7Uxc1fkf/Ty+rA6j+05kOd2xVf5KUw/0Hp
 9g6Q==
X-Gm-Message-State: APjAAAV6G3Z6VDHPWLKOtA7Ww4FXuQ2HSUJEL9QNa/EW04c7Kktz3uT9
 xhj34xNrGEWUCMJqHUoklFIbWTbiU9o69x3pDOpaGQ==
X-Google-Smtp-Source: APXvYqz7LAdCBJTDwpxDlrCXf8yJWqIXYc/C7me752HA1Jeg2iEcTYkF0pK/5pwEKch1CTCMLRtFFxT8K2io6mG13nc=
X-Received: by 2002:a92:b109:: with SMTP id t9mr1480804ilh.144.1573054116474; 
 Wed, 06 Nov 2019 07:28:36 -0800 (PST)
MIME-Version: 1.0
References: <5dd669557fc499df5e345a14c9252c095eff6c07.1572966906.git.anatoly.burakov@intel.com>
 <cdfb9a63-3899-8b2e-402a-689e7fd612bb@intel.com>
 <CAJFAV8zTsxtc-qmtArP3XguoHF67HVq=XZnhcU5gM1jA9RVAMg@mail.gmail.com>
In-Reply-To: <CAJFAV8zTsxtc-qmtArP3XguoHF67HVq=XZnhcU5gM1jA9RVAMg@mail.gmail.com>
From: Rajesh Ravi <rajesh.ravi@broadcom.com>
Date: Wed, 6 Nov 2019 20:57:59 +0530
Message-ID: <CAAr5zNebTYCVgHp_406ES8b--qqYg5SvTQ5DxW1sJfXQB8mVpA@mail.gmail.com>
To: David Marchand <david.marchand@redhat.com>
Cc: "Burakov, Anatoly" <anatoly.burakov@intel.com>, dev <dev@dpdk.org>, 
 Bruce Richardson <bruce.richardson@intel.com>,
 Ajit Khaparde <ajit.khaparde@broadcom.com>, 
 Jonathan Richardson <jonathan.richardson@broadcom.com>, 
 Scott Branden <scott.branden@broadcom.com>, 
 Vikram Mysore Prakash <vikram.prakash@broadcom.com>,
 Srinath Mannam <srinath.mannam@broadcom.com>, 
 Thomas Monjalon <thomas@monjalon.net>
X-Mailman-Approved-At: Wed, 06 Nov 2019 19:17:27 +0100
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-dev] [PATCH 19.11] vfio: fix DMA mapping of externally
 allocated heaps
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Thanks Anatoly & David.
The main patch provided by Anatoly is not compatible with DPDK 19.02,
so I made the following patch and verified it to work with the DPDK
19.02 & SPDK 19.07 combination.

------------------------------------------------------------------------------------------------------------------------------------------------------------------

diff --git a/lib/librte_eal/common/include/rte_eal_memconfig.h b/lib/librte_eal/common/include/rte_eal_memconfig.h
index 84aabe3..49934f5 100644
--- a/lib/librte_eal/common/include/rte_eal_memconfig.h
+++ b/lib/librte_eal/common/include/rte_eal_memconfig.h
@@ -35,6 +35,7 @@ struct rte_memseg_list {
  volatile uint32_t version; /**< version number for multiprocess sync. */
  size_t len; /**< Length of memory area covered by this memseg list. */
  unsigned int external; /**< 1 if this list points to external memory */
+ unsigned int heap; /**< 1 if this list points to a heap */
  struct rte_fbarray memseg_arr;
 };

diff --git a/lib/librte_eal/common/rte_malloc.c b/lib/librte_eal/common/rte_malloc.c
index b39de3c..4d31bf7 100644
--- a/lib/librte_eal/common/rte_malloc.c
+++ b/lib/librte_eal/common/rte_malloc.c
@@ -368,6 +368,7 @@ void rte_free(void *addr)

  rte_spinlock_lock(&heap->lock);
  ret = malloc_heap_add_external_memory(heap, msl);
+ msl->heap = 1; /* mark it as heap segment */
  rte_spinlock_unlock(&heap->lock);

 unlock:
diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index 1b96b57..33dd923 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -836,6 +836,7 @@ void numa_error(char *where)
  msl->page_sz = page_sz;
  msl->socket_id = socket_id;
  msl->base_va = NULL;
+ msl->heap = 1; /* mark it as a heap segment */

  RTE_LOG(DEBUG, EAL, "Memseg list allocated: 0x%zxkB at socket %i\n",
  (size_t)page_sz >> 10, socket_id);
@@ -1410,6 +1411,7 @@ void numa_error(char *where)
  msl->page_sz = page_sz;
  msl->socket_id = 0;
  msl->len = internal_config.memory;
+ msl->heap = 1;

  /* we're in single-file segments mode, so only the segment list
  * fd needs to be set up.
@@ -1682,6 +1684,7 @@ void numa_error(char *where)
  mem_sz = msl->len;
  munmap(msl->base_va, mem_sz);
  msl->base_va = NULL;
+ msl->heap = 0;

  /* destroy backing fbarray */
  rte_fbarray_destroy(&msl->memseg_arr);
diff --git a/lib/librte_eal/linuxapp/eal/eal_vfio.c b/lib/librte_eal/linuxapp/eal/eal_vfio.c
index 61b54f9..3bb3e6b 100644
--- a/lib/librte_eal/linuxapp/eal/eal_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_vfio.c
@@ -1238,7 +1238,16 @@ static int vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr,
 {
  int *vfio_container_fd = arg;

- if (msl->external & !internal_config.iso_cmem)
+ /* skip external memory that isn't a heap */
+ if (msl->external && !msl->heap)
+ return 0;
+
+ /* skip any segments with invalid IOVA addresses */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
+ /* if IOVA mode is VA, we've already mapped the internal segments */
+ if (!msl->external && rte_eal_iova_mode() == RTE_IOVA_VA)
  return 0;

  return vfio_type1_dma_mem_map(*vfio_container_fd, ms->addr_64, ms->iova,
@@ -1362,9 +1371,19 @@ static int vfio_dma_mem_map(struct vfio_config *vfio_cfg, uint64_t vaddr,
 {
  int *vfio_container_fd = arg;

- if (msl->external)
+ /* skip external memory that isn't a heap */
+ if (msl->external && !msl->heap)
  return 0;

+ /* skip any segments with invalid IOVA addresses */
+ if (ms->iova == RTE_BAD_IOVA)
+ return 0;
+
+#if 0
+ if (ms->addr_64 == param->addr_64)
+ return 0;
+#endif
+
  return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
  ms->len, 1);
 }
@@ -1381,6 +1400,12 @@ struct spapr_walk_param {
  uint64_t max = ms->iova + ms->len;

- if (msl->external)
+ /* skip external memory that isn't a heap */
+ if (msl->external && !msl->heap)
+ return 0;
+
+ /* skip any segments with invalid IOVA addresses */
+ if (ms->iova == RTE_BAD_IOVA)
  return 0;

  if (max > param->window_size) {

------------------------------------------------------------------------------------------------------------------------------------------------------------------

Regards,
Rajesh

On Wed, Nov 6, 2019 at 7:28 PM David Marchand <david.marchand@redhat.com>
wrote:

> On Tue, Nov 5, 2019 at 6:15 PM Burakov, Anatoly
> <anatoly.burakov@intel.com> wrote:
> >
> > On 05-Nov-19 3:15 PM, Anatoly Burakov wrote:
> > > Currently, externally created heaps are supposed to be automatically
> > > mapped for VFIO DMA by EAL, however they only do so if, at the time of
> > > heap creation, VFIO is initialized and has at least one device
> > > available. If no devices are available at the time of heap creation (or
> > > if devices were available, but were since hot-unplugged, thus dropping
> > > all VFIO container mappings), then VFIO mapping code would have skipped
> > > over externally allocated heaps.
> > >
> > > The fix is two-fold. First, we allow externally allocated memory
> > > segments to be marked as "heap" segments. This allows us to distinguish
> > > between external memory segments that were created via heap API, from
> > > those that were created via rte_extmem_register() API.
> > >
> > > Then, we fix the VFIO code to only skip non-heap external segments.
> > > Also, since external heaps are not guaranteed to have valid IOVA
> > > addresses, we will skip those which have invalid IOVA addresses as
> well.
> > >
> > > Fixes: 0f526d674f8e ("malloc: separate creating memseg list and malloc
> heap")
> > >
> > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
> > > ---
> > >
> > > Notes:
> > >      This cannot be backported to older releases as it breaks the
> > >      API and ABI. A separate fix is in the works for stable.
> > >
> >
> > Alternative, non-breaking implementation available (which will be slower
> > due to O(N) memseg list heaps lookups):
> >
> > http://patches.dpdk.org/patch/62486/
> >
> > I'm fine with either option being merged.
> >
> > The more perfect solution would've been to rename "msl->external" into
> > "msl->flags" and have various flags for memseg lists, but it's too late
> > to break the API now.
>
> Either way is fine for me (between the 18.11 and the master patches you
> sent).
> Breaking the ABI is acceptable, but I agree the API is another story.
>
> If the code seems better to you with the additional heap flag, let's go
> for it.
>
> I would still like to hear from Rajesh though, since he seems to be
> the first to hit this issue.
>
>
> --
> David Marchand
>
>

-- 
Regards,
Rajesh