DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Rajesh Ravi <rajesh.ravi@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>,
	dev@dpdk.org,
	Jonathan Richardson <jonathan.richardson@broadcom.com>,
	Scott Branden <scott.branden@broadcom.com>,
	Vikram Mysore Prakash <vikram.prakash@broadcom.com>,
	Srinath Mannam <srinath.mannam@broadcom.com>
Subject: Re: [dpdk-dev] [PATCH] eal: add option --iso-cmem for external custom memory
Date: Fri, 25 Oct 2019 16:02:27 +0100	[thread overview]
Message-ID: <31bff5b7-169c-e158-3a87-6448272c571c@intel.com> (raw)
In-Reply-To: <CAAr5zNfUgD69gRFTuvQkaO6z68tqyENgt76giHpKticPqJFong@mail.gmail.com>

On 25-Oct-19 1:53 PM, Rajesh Ravi wrote:
> Thanks Anatoly.
> I think now we 're close to sync. You understood it right: the problem 
> is dma/iommu mappings are not getting created.
> 
> 1)vfio_mem_event_callback() is getting called later when memory is 
> allocated from heap but not when external memory is added to heap

Please correct me if i'm wrong, but i believe that's not true. 
vfio_mem_event_callback() is called every time memory is added to a 
heap. That includes internal and external memory [1] [2].

[1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/rte_malloc.c#n397
[2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/malloc_heap.c#n1231

The only way the callback doesn't get called is if VFIO doesn't get 
initialize and/or fails to register said callback.

> 
> 2)vfio_dma_mem_map() is called inside vfio_mem_event_callback() but it's 
> skipped in case of external memory because it's invoked inside following 
> if condition:
>      if (rte_eal_iova_mode() == RTE_IOVA_VA && msl->external == 0)
>      [ I verified this also with gdb by stepping over line by line 
> inside vfio_mem_event_callback() ]
>    We 're using DPDK v19.02 but I see almost same code and flow in DPDK 
> 19.08-rc4 as well.

I believe that is not true either.

The line you're referring to [1] does *not* skip external memory 
segments (or rather, that's not its intention). Instead, it is checking 
if we can assume that the region is IOVA-contiguous. For IOVA as VA 
mode, we can assume that, *but not for external memory*, which is why it 
is skipped.

After this check, there is a loop that goes memseg by memseg [2], and it 
does *not* skip external memory. The memseg will only get skipped if it 
doesn't have a valid IOVA address [3], which will only happen if you 
don't provide an IOVA table at heap creation time (we can't make 
assumptions about desired IOVA addresses, so we just skip them). Perhaps 
it should output a debug message when it does that, but the important 
thing is, the code does nothing to skip external memory.

[1] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n521
[2] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n533
[3] http://git.dpdk.org/dpdk/tree/lib/librte_eal/linux/eal/eal_vfio.c#n536

> ---------------------------------------------------------------------------------------------------------------------------------
> [ _Further explanation of point 1) :_
>   I used gdb and set break point and checked back trace to verify the 
> call stack trace: first I set break point at rte_malloc_heap_memory_add.
>    Then I hit break point and then set break point at 
> vfio_mem_event_callback. I see from back trace that it's from 
> malloc_heap_alloc () with event = RTE_MEM_EVENT_ALLOC as can be seen 
> from log at the end of this mail.
> 
>   And also I don't see any event for memory ADD as below:
> enum rte_mem_event {
>           RTE_MEM_EVENT_ALLOC = 0, /**< Allocation event. */
>           RTE_MEM_EVENT_FREE,      /**< Deallocation event. */
>   }; ]
> 
> *_Log_*
> *_
> _*
> (gdb) bt
> #0  vfio_mem_event_callback (type=RTE_MEM_EVENT_ALLOC, 
> addr=0x200016c00000, len=4194304, arg=0x0) at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/linuxapp/eal/eal_vfio.c:512
> #1  0x00000000004b78f0 in eal_memalloc_mem_event_notify 
> (event=event@entry=RTE_MEM_EVENT_ALLOC, 
> start=start@entry=0x200016c00000, len=len@entry=4194304)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_memalloc.c:248
> #2  0x00000000004c7a90 in try_expand_heap_primary (contig=false, 
> bound=0, align=35184753770496, flags=0, socket=0, 
> elt_size=35184753770496, pg_sz=2097152, heap=0x569b3c 
> <early_mem_config+10364>)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:412
> #3  try_expand_heap (heap=heap@entry=0x569b3c <early_mem_config+10364>, 
> pg_sz=2097152, elt_size=elt_size@entry=8192, socket=socket@entry=0, 
> flags=flags@entry=0, align=align@entry=2097152, bound=bound@entry=0, 
> contig=contig@entry=false)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:494
> #4  0x00000000004c7f28 in alloc_more_mem_on_socket 
> (heap=heap@entry=0x569b3c <early_mem_config+10364>, 
> size=size@entry=8192, socket=0, flags=flags@entry=0, 
> align=align@entry=2097152, bound=bound@entry=0, contig=contig@entry=false)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:622
> #5  0x00000000004c822c in malloc_heap_alloc_on_heap_id 
> (size=size@entry=8192, heap_id=heap_id@entry=0, flags=flags@entry=0, 
> align=<optimized out>, align@entry=2097152, bound=bound@entry=0, 
> contig=contig@entry=false, type=0x0)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:676
> #6  0x00000000004c8464 in malloc_heap_alloc (type=type@entry=0x0, 
> size=size@entry=8192, socket_arg=<optimized out>, flags=flags@entry=0, 
> align=2097152, bound=bound@entry=0, contig=contig@entry=false)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/malloc_heap.c:714
> #7  0x00000000004c498c in rte_malloc_socket (type=type@entry=0x0, 
> size=size@entry=8192, align=align@entry=2097152, socket_arg=<optimized 
> out>, socket_arg@entry=-1)
>      at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/rte_malloc.c:58
> #8  0x000000000049fe24 in spdk_malloc (size=size@entry=8192, 
> align=align@entry=2097152, phys_addr=phys_addr@entry=0x0, 
> socket_id=socket_id@entry=-1, flags=flags@entry=3)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:95
> #9  0x000000000049fe94 in spdk_zmalloc (size=8192, 
> align=align@entry=2097152, phys_addr=phys_addr@entry=0x0, 
> socket_id=socket_id@entry=-1, flags=flags@entry=3) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/env.c:108
> #10 0x0000000000431f64 in nvme_pcie_qpair_construct 
> (qpair=0x200016aed1d8, opts=opts@entry=0x0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:1054
> #11 0x000000000043233c in nvme_pcie_ctrlr_construct_admin_qpair 
> (ctrlr=0x200016aed340) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:719
> #12 nvme_pcie_ctrlr_construct (trid=<optimized out>, opts=<optimized 
> out>, devhandle=0x70f4b0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:881
> #13 0x0000000000437674 in nvme_transport_ctrlr_construct 
> (trid=trid@entry=0xffffffffec98, opts=opts@entry=0xffffffffea00, 
> devhandle=devhandle@entry=0x70f4b0)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:99
> #14 0x0000000000436b08 in nvme_ctrlr_probe 
> (trid=trid@entry=0xffffffffec98, probe_ctx=0x70eae0, 
> devhandle=devhandle@entry=0x70f4b0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:426
> #15 0x0000000000430d40 in pcie_nvme_enum_cb (ctx=0xffffffffefe8, 
> pci_dev=0x70f4b0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:752
> #16 0x00000000004a2e20 in spdk_pci_device_init (_drv=0x51b660 
> <g_nvme_pci_drv>, _dev=0x61b5f0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:347
> #17 0x00000000004d5c68 in rte_pci_probe_one_driver (dev=0x61b5f0, 
> dr=0x51b660 <g_nvme_pci_drv>) at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:185
> #18 pci_probe_all_drivers (dev=0x61b5f0) at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:259
> #19 rte_pci_probe () at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/drivers/bus/pci/pci_common.c:294
> #20 0x00000000004bb490 in rte_bus_probe () at 
> /usr/src/debug/brcm-dpdk/gitAUTOINC+ca5edf7f64-r0/lib/librte_eal/common/eal_common_bus.c:100
> #21 0x00000000004a3030 in spdk_pci_enumerate (driver=0x51b660 
> <g_nvme_pci_drv>, enum_cb=enum_cb@entry=0x430c50 <pcie_nvme_enum_cb>, 
> enum_ctx=enum_ctx@entry=0xffffffffefe8)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/env_dpdk/pci.c:502
> #22 0x00000000004319c4 in nvme_pcie_ctrlr_scan (probe_ctx=0x70eae0, 
> direct_connect=<optimized out>) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_pcie.c:780
> #23 0x00000000004376ac in nvme_transport_ctrlr_scan 
> (probe_ctx=probe_ctx@entry=0x70eae0, 
> direct_connect=direct_connect@entry=false) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme_transport.c:106
> #24 0x0000000000436be4 in spdk_nvme_probe_internal 
> (probe_ctx=probe_ctx@entry=0x70eae0, 
> direct_connect=direct_connect@entry=false) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:553
> #25 0x0000000000436fc4 in spdk_nvme_probe_async (trid=0xfffffffff0d8, 
> cb_ctx=cb_ctx@entry=0x6402c0, probe_cb=probe_cb@entry=0x420b70 
> <probe_cb>, attach_cb=0x421b50 <attach_cb>, remove_cb=0x0)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:1078
> #26 0x00000000004372d0 in spdk_nvme_probe (trid=<optimized out>, 
> trid@entry=0x0, cb_ctx=cb_ctx@entry=0x6402c0, 
> probe_cb=probe_cb@entry=0x420b70 <probe_cb>, 
> attach_cb=attach_cb@entry=0x421b50 <attach_cb>, 
> remove_cb=remove_cb@entry=0x0)
>      at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/nvme/nvme.c:625
> #27 0x00000000004223b0 in bdev_nvme_library_init () at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/nvme/bdev_nvme.c:1539
> #28 0x000000000048cf80 in spdk_bdev_modules_init () at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:963
> #29 spdk_bdev_initialize (cb_fn=<optimized out>, cb_arg=<optimized out>) 
> at /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/bdev/bdev.c:1084
> #30 0x00000000004811b8 in spdk_subsystem_init (cb_fn=<optimized out>, 
> cb_arg=<optimized out>) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/subsystem.c:178
> #31 0x0000000000484720 in _spdk_msg_queue_run_batch (max_msgs=<optimized 
> out>, thread=0x60c600) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:406
> #32 spdk_thread_poll (thread=thread@entry=0x60c600, 
> max_msgs=max_msgs@entry=0, now=now@entry=2386134829398) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/thread/thread.c:462
> #33 0x000000000048040c in _spdk_reactor_run (arg=0x60bcc0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:272
> #34 0x0000000000480990 in spdk_reactors_start () at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/reactor.c:381
> #35 0x000000000047f7d4 in spdk_app_start 
> (opts=opts@entry=0xfffffffffa78, start_fn=start_fn@entry=0x407350 
> <nvmf_tgt_started>, arg1=arg1@entry=0x0) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/lib/event/app.c:692
> #36 0x0000000000405b2c in main (argc=7, argv=0xfffffffffc78) at 
> /usr/src/debug/brcm-spdk/gitAUTOINC+6ae1d4b779-r0/app/nvmf_tgt/nvmf_main.c:75
> 
> Regards,
> Rajesh
> 
> On Thu, Oct 24, 2019 at 5:13 PM Burakov, Anatoly 
> <anatoly.burakov@intel.com <mailto:anatoly.burakov@intel.com>> wrote:
> 
>     On 22-Oct-19 8:56 AM, Rajesh Ravi wrote:
>      > Hi Anatoly,
>      > I tried calling rte_vfio_dma_map() but, it  failed for me because
>      >
>      > vfio_dma_mem_map() failed with error:VFIO support not initialized
>      > because: default_vfio_cfg->vfio_iommu_type=NULL ( container_dma_map
>      > calls vfio_dma_mem_map  with vfio_cfg = default_vfio_cfg )
>      >
>      > call sequence: rte_vfio_dma_map() -->container_dma_map()
>      > --> vfio_dma_mem_map
>      >
>      > Regards,
>      > Rajesh
>      >
>      > On Mon, Oct 21, 2019 at 9:16 PM Rajesh Ravi
>     <rajesh.ravi@broadcom.com <mailto:rajesh.ravi@broadcom.com>
>      > <mailto:rajesh.ravi@broadcom.com
>     <mailto:rajesh.ravi@broadcom.com>>> wrote:
>      >
>      >     Thanks Anatoly for prompt response. Sorry for the delayed
>     response,
>      >     I took some time to reverify with SPDK.
>      >
>      >     Infact, I do want the iommu mapping to be performed. I don't
>     want it
>      >     to be bypassed by type1_map()
>      >     [lib/librte_eal/linuxapp/eal/eal_vfio.c] for external memory.
>      >
>      >     Then, if I understood correctly, you suggested to call
>      >     rte_vfio_dma_map() as an alternative.
>      >
>      >     _Context & clarification_
>      >
>      >     1) We 're using DPDK to manage/allocate memory for SPDK
>     through heap
>      >     API.
>      >
>      >        The only DPDK APIs we 're calling are:
>      >        A)rte_malloc_heap_memory_add() to add external memory to heap.
>      >        B)rte_malloc_heap_get_socket() & rte_malloc_socket() to
>     allocate
>      >     memory
>      >
>      >     /Are you suggesting to make a call to rte_vfio_dma_map() from
>     spdk,
>      >     in addition to the APIs listed under 1)A & 1)B instead of
>     modifying
>      >     DPDK vfio functions?/
>      >         Please confirm, Probably I missed the context and might
>     not have
>      >     understood fully.
>      >
>      >
>      >     2) .dma_user_map_func=vfio_type1_dma_mem_map() is called from 2
>      >     paths in dpdk. In either case call to dma_user_map_func() is
>     skipped.
>      >         A) _During the startup, as you said before:_
>      >           rte_vfio_setup_device() --> type1_map()
>      >
>      >         B)During allocation event:
>      >              vfio_mem_event_callback()  (type=RTE_MEM_EVENT_ALLOC,..)
>      >     -->vfio_dma_mem_map() -->dma_user_map_func()
>      >   
>       ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>      >     _Conclusion_
>      >
>      >     So we have 2 alternatives:
>      >
>      >     A) make additional call to rte_vfio_dma_map() API after adding
>      >     external memory using rte_malloc_heap_memory_add()  API.
>      >
>      >     B) remove msl->external check which bypasses call to
>      >     dma_user_map_func() in DPDK.
>      >
>      >     I modified DPDK functions [Option B) ]. I guess you 're
>      >     suggesting option A)
>      >
>      >     Please confirm.
>      >
>      >     Regards,
>      >     Rajesh
> 
>     Hi Rajesh,
> 
>     Apologies for delayed response.
> 
>     I'm still not sure i understand your problem.
> 
>     When adding memory with rte_malloc_heap_memory_add(), this memory
>     is, as
>     far as i can tell, automatically mapped for DMA mapping with VFIO. This
>     happens because:
> 
>     rte_malloc_heap_memory_add() -> malloc_heap_add_external_memory() ->
>     eal_memalloc_mem_event_notify() -> vfio_mem_event_callback() ->
>     vfio_dma_mem_map()
> 
>     VFIO registers itself for mem event callbacks in
>     eal_vfio.c:rte_vfio_setup_device():791.
> 
>     So, there is no need to map anything for DMA when you're creating
>     external heaps - it is being mapped automatically by EAL.
> 
>     For an example implementation have a look at testpmd, specifically
>     testpmd.c:mbuf_pool_create() for MP_ALLOC_XMEM type. This calls
>     testpmd.c:setup_extmem(), which creates an external heap. Notice how it
>     doesn't do anything to map the memory for DMA, because that is already
>     done by EAL. You can verify this working by adding --mp-alloc=xmem
>     argument to testpmd to force it to use external memory for its mempools.
> 
>     So, what is the actual issue here, given that the memory is mapped for
>     DMA by EAL automatically when it is added?
> 
>     -- 
>     Thanks,
>     Anatoly
> 
> 
> 
> -- 
> Regards,
> Rajesh


-- 
Thanks,
Anatoly

  reply	other threads:[~2019-10-25 15:02 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-15  5:30 Ajit Khaparde
2019-10-17 15:45 ` Burakov, Anatoly
2019-10-18 10:54   ` Rajesh Ravi
2019-10-18 16:53     ` Burakov, Anatoly
2019-10-21 15:46       ` Rajesh Ravi
2019-10-22  7:56         ` Rajesh Ravi
2019-10-24 11:43           ` Burakov, Anatoly
2019-10-25 12:53             ` Rajesh Ravi
2019-10-25 15:02               ` Burakov, Anatoly [this message]
2019-10-30 19:50                 ` Rajesh Ravi
2019-11-04 10:25                   ` Burakov, Anatoly
2019-11-05 11:41                     ` Burakov, Anatoly
2019-11-05 14:10                       ` Rajesh Ravi
2019-11-05 15:18                         ` Burakov, Anatoly
2019-11-05 17:13                           ` Burakov, Anatoly
2019-11-06 13:55                             ` David Marchand
2019-11-07 15:51                               ` David Marchand
2019-11-07 16:16                                 ` Rajesh Ravi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=31bff5b7-169c-e158-3a87-6448272c571c@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=ajit.khaparde@broadcom.com \
    --cc=dev@dpdk.org \
    --cc=jonathan.richardson@broadcom.com \
    --cc=rajesh.ravi@broadcom.com \
    --cc=scott.branden@broadcom.com \
    --cc=srinath.mannam@broadcom.com \
    --cc=vikram.prakash@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).