DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Xia, Chenbo" <chenbo.xia@intel.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>,
	Thomas Monjalon <thomas@monjalon.net>,
	David Marchand <david.marchand@redhat.com>,
	"Wu, Jingjing" <jingjing.wu@intel.com>
Cc: dev <dev@dpdk.org>,
	Stephen Hemminger <stephen@networkplumber.org>,
	"Liang, Cunming" <cunming.liang@intel.com>,
	"Lu, Xiuchun" <xiuchun.lu@intel.com>,
	"Li, Miao" <miao.li@intel.com>
Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev driver
Date: Wed, 23 Dec 2020 05:28:17 +0000	[thread overview]
Message-ID: <MN2PR11MB406358231FAD0C664D54B3399CDE0@MN2PR11MB4063.namprd11.prod.outlook.com> (raw)
In-Reply-To: <33a01a88-327b-844b-9f3c-03660bc2ceb3@redhat.com>

Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> Sent: Tuesday, December 22, 2020 4:49 PM
> To: Xia, Chenbo <chenbo.xia@intel.com>; Thomas Monjalon <thomas@monjalon.net>;
> David Marchand <david.marchand@redhat.com>
> Cc: dev <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>; Liang,
> Cunming <cunming.liang@intel.com>; Lu, Xiuchun <xiuchun.lu@intel.com>; Li,
> Miao <miao.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf emudev
> driver
> 
> Hi Chenbo,
> 
> Thanks for the detailed reply.
> 
> On 12/22/20 4:09 AM, Xia, Chenbo wrote:
> > Hi Maxime,
> >
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin@redhat.com>
> >> Sent: Monday, December 21, 2020 8:02 PM
> >> To: Xia, Chenbo <chenbo.xia@intel.com>; Thomas Monjalon
> <thomas@monjalon.net>;
> >> David Marchand <david.marchand@redhat.com>
> >> Cc: dev <dev@dpdk.org>; Stephen Hemminger <stephen@networkplumber.org>;
> Liang,
> >> Cunming <cunming.liang@intel.com>; Lu, Xiuchun <xiuchun.lu@intel.com>; Li,
> >> Miao <miao.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> >> Subject: Re: [dpdk-dev] [PATCH 0/8] Introduce emudev library and iavf
> emudev
> >> driver
> >>
> >>
> >>
> >> On 12/21/20 10:52 AM, Maxime Coquelin wrote:
> >>> Hi Chenbo,
> >>>
> >>> On 12/19/20 7:11 AM, Xia, Chenbo wrote:
> >>>> Hi David,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: David Marchand <david.marchand@redhat.com>
> >>>>> Sent: Friday, December 18, 2020 5:54 PM
> >>>>> To: Xia, Chenbo <chenbo.xia@intel.com>
> >>>>> Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Stephen
> >>>>> Hemminger <stephen@networkplumber.org>; Liang, Cunming
> >>>>> <cunming.liang@intel.com>; Lu, Xiuchun <xiuchun.lu@intel.com>; Li, Miao
> >>>>> <miao.li@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> >>>>> Subject: Re: [PATCH 0/8] Introduce emudev library and iavf emudev driver
> >>>>>
> >>>>> On Fri, Dec 18, 2020 at 9:02 AM Chenbo Xia <chenbo.xia@intel.com> wrote:
> >>>>>>
> >>>>>> This series introduces a new device abstraction called emudev for
> >>>>> emulated
> >>>>>> devices. A new library (librte_emudev) is implemented. The first emudev
> >>>>>> driver is also introduced, which emulates Intel Adaptive Virtual
> >>>>> Function
> >>>>>> (iavf) as a software network device.
> >>>>>>
> >>>>>> This series has a dependency on librte_vfio_user patch series:
> >>>>>> http://patchwork.dpdk.org/cover/85389/
> >>>>>>
> >>>>>> Background & Motivation
> >>>>>> -----------------------
> >>>>>> The disaggregated/multi-process QEMU is using VFIO-over-socket/vfio-
> user
> >>>>>> as the main transport mechanism to disaggregate IO services from QEMU.
> >>>>>> Therefore, librte_vfio_user is introduced in DPDK to accommodate
> >>>>>> emulated devices for high performance I/O. Although vfio-user library
> >>>>>> provides possibility of emulating devices in DPDK, DPDK does not have
> >>>>>> a device abstraction for emulated devices. A good device abstraction
> >>>>> will
> >>>>>> be useful for applications or high performance data path driver. With
> >>>>>> this consideration, emudev library is designed and implemented. It also
> >>>>>> make it possbile to keep modular design on emulated devices by
> >>>>> implementing
> >>>>>> data path related logic in a standalone driver (e.g., an ethdev driver)
> >>>>>> and keeps the unrelated logic in the emudev driver.
> >>>>>
> >>>>> Since you mention performance, how does it compare to vhost-user/virtio?
> >>>>
> >>>> I think it depends on the device specification (i.e., how complex its
> data
> >> path
> >>>> handling is). A first try on iavf spec shows better performance than
> virtio
> >>>> in our simple tests.
> >>>
> >>> That's interesting! How big is the performance difference? And how do
> >>> we explain it?
> >>>
> >>> If there are improvements that could be done in the Virtio
> >>> specification, it would be great to know and work on their
> >>> implementations. It worries me a bit that every one is coming with
> >>> his new device emulation every release, making things like live-
> >>> migration difficult to achieve in the future.
> >>
> >> I did a quick review of the IAVF emudev driver to understand what other
> >> factors than ring layout could explain a performance gain.
> >>
> >> My understanding is that part of the performance gain may come from
> >> following things that are supported/implemented in Vhost-user backend
> >> and not in IAVF driver:
> >> 1. Memory hotplug. It seems the datapath is not safe against memory
> >> hotplug in the VM, which causes the memory tables to be updated
> >> asynchronously from the datapath. In order to support it in Vhost-user
> >> library, we had to introduce locks to ensure the datapath isn't
> >> accessing the shared memory while it is being remapped.
> >
> > I think now it uses the similar way that vhost-user does.
> >
> > First, in the vfio-user patch series, we introduce a callback lock_dp to
> lock
> > the data path when messages like DMA MAP/UNMAP come. It will lock datapath
> > in our data path driver.
> >
> > Note that the data path handling is in our data path driver:
> > http://patchwork.dpdk.org/cover/85500/
> >
> > For modular design, iavf_emu driver emulates the device but the iavf back-
> end
> > driver does data path handling.
> 
> My analysis was actually based on the data path driver series.
> 
> My point was that iavfbe_recv_pkts() and iavfbe_xmit_pkts() are not safe
> against asynchronous changes like memory table updates.
> 
> As far as I can see, the access_lock aren't taken by these functions, so
> if for example a memory table update happen during the these functions
> execution, it could lead to undefined behaviour. Only things checked
> there is whether the queue is enabled when entering the function, but
> this value can be changed right after having being checked.
> 
> For example, in Vhost-user lib, we protected rte_vhost_dequeue_burst()
> and rte_vhost_enqueue_burst() with a spinlock. Note that this spinlock
> is per-virtqueue in order to avoid contention between the different
> queues.
> 

Oops, I didn't realize the data path driver missed that. And yes, the data path
driver should do that like you said.

> >>
> >> 2. Live-migration. This feature does not seem supported in the driver,
> >> as I could not find dirty pages tracking mechanism. On Vhost-user side,
> >> supporting implies adding more branch conditions in the hot path, even
> >> when it is not used.
> >
> > Yes, we don't support this now in this version. And yes, when we support
> this
> > feature, it will introduce complexity in data path.
> >
> >>
> >> 3. No IOMMU support. Same here, this is supported in Vhost-user library,
> >> and adding its support in IAVF driver would introduce some more branches
> >> in the hot path even when not used.
> >
> > Yes, vIOMMU is not fully supported in vfio-user spec for now and I also
> agree
> > when we have to support it, it will slow down the data path.
> >
> >>
> >> 4. Out of bound memory accesses checks. While
> >> rte_iavf_emu_get_dma_vaddr() provides a way to ensure the full requested
> >> length is mapped, the data path does not use it. It does not even ensure
> >
> > In fact, it uses it 😊. I think you may miss our data path driver. Here's a
> > use: http://patchwork.dpdk.org/patch/85504/
> 
> Sorry, I was not clear. What I meant is that
> rte_iavf_emu_get_dma_vaddr() is indeed used, but its outputs aren't
> checked properly.
> 
> >> the translated address is non-NULL. It makes it trivial for a malicious
> >> guest to make the hypervisor's vSwitch to crash by simply passing random
> >> values as buffer's address and length. Fixing it is trivial, but it will
> >> add several more checks and loops (if a buffer is spanned across two
> >> pages) in the hot path.
> >
> > I don't quite understand this one. First, rte_iavf_emu_get_dma_vaddr() is
> the
> > only way to translate address. And I think this function will ensure the
> input
> > address is valid. Looking at the vhost side, vhost_iova_to_vva() does
> similar
> > things when vIOMMU is not used. Do I miss something? Just correct me if I am
> > wrong.
> 
> I think rte_iavf_emu_get_dma_vaddr() is good, the issue is the way it is
> used in iavfbe_recv_pkts() and iavfbe_xmit_pkts().
> 
> First the returned address (desc_addr) is not checked. So if the start
> address of the buffer is out of guest memory ranges, the host app will
> crash with a segmentation fault. It can happen in case of bug in the
> guest driver, or with a maliciously crafted descriptor.
> 
> Then, rte_iavf_emu_get_dma_vaddr() len parameter is a pointer. The
> caller has to pass the length of the buffer it wants to map. Once
> rte_iavf_emu_get_dma_vaddr() is updated with the contiguous length that
> is mapped.
> 
> The caller needs to check whether all requested length is mapped, and
> loop until it is all mapped:
> http://code.dpdk.org/dpdk/latest/source/lib/librte_vhost/virtio_net.c#L484
> 
> This is required because a buffer contiguous in IOVA (guest physical
> address space in this case) is not necessarily contiguous in Host
> virtual address space. Also, this is required to be safe against
> untrusted guests, as a malicious guest could pass ~0ULL in the
> descriptor buffer len of the Tx path to crash the host application or
> overwrite its memory.
> 
> Currently, in the driver, descriptor buf len is passed to the DMA map
> function, but its value is not checked afterward, so it only ensure
> first byte is mapped. For Tx, 1 bytes length is requested at DMA map, so
> also only first byte of the buffer is ensured to be mapped (if desc_addr
> is not NULL, which is not checked).
>

Correct! I also missed that rte_iavf_emu_get_dma_vaddr() is not properly used
in data path driver. For the lock issue and this, @Wu, Jingjing could we fix
in V2?

Thanks for the good catches!
Chenbo
 
> >>
> >> Other than that, there is for sure a performance gain due to all the
> >> features Virtio-net supports that we have to check and handle in the
> >> hotpath, like indirect descriptors or mergeable buffers for example.
> >
> > I think iavf has similar features like indirect and mergeable to recv/xmit
> > large pkts. But I believe there will be some feature difference between
> > iavf and virtio/vhost.
> 
> Sure, but I meant it might not be necessary to implement all those
> features (if they are optional in the spec) in your driver, just that in
> the case of Vhost we have a legacy to support.
> 
> > I think you are correct that for this version, it's not fair to compare
> virtio/vhost
> > with iavf back-end because iavf back-end has not supported some features,
> and besides,
> > we have not optimized the data path of iavf back-end. We expect the
> performance of
> > iavf back-end in the same level of virtio 1.1 and hopefully better because
> of the ring
> > layout. But let's see when we can do complete performance analysis 😊.
> 
> Thanks!
> Maxime
> 
> > Thanks!
> > Chenbo
> >
> >>
> >> Best regards,
> >> Maxime
> >>
> >>> Regards,
> >>> Maxime
> >>>
> >>>> Thanks!
> >>>> Chenbo
> >>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> David Marchand
> >>>>
> >


  reply	other threads:[~2020-12-23  5:28 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18  7:47 Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 1/8] lib: introduce emudev library Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 2/8] doc: add emudev library guide Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 3/8] emu: introduce emulated iavf driver Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 4/8] emu/iavf: add vfio-user device register and unregister Chenbo Xia
2021-01-07  7:18   ` Xing, Beilei
2021-01-07  8:41     ` Xia, Chenbo
2020-12-18  7:47 ` [dpdk-dev] [PATCH 5/8] emu/iavf: add resource management and internal logic of iavf Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 6/8] emu/iavf: add emudev operations to fit in emudev framework Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 7/8] test/emudev: introduce functional test Chenbo Xia
2020-12-18  7:47 ` [dpdk-dev] [PATCH 8/8] doc: update release notes for iavf emudev driver Chenbo Xia
2020-12-18  9:53 ` [dpdk-dev] [PATCH 0/8] Introduce emudev library and " David Marchand
2020-12-19  6:11   ` Xia, Chenbo
2020-12-21  9:52     ` Maxime Coquelin
2020-12-21 12:01       ` Maxime Coquelin
2020-12-22  3:09         ` Xia, Chenbo
2020-12-22  8:48           ` Maxime Coquelin
2020-12-23  5:28             ` Xia, Chenbo [this message]
2020-12-19  6:27 ` [dpdk-dev] [PATCH v2 " Chenbo Xia
2020-12-19  6:27   ` [dpdk-dev] [PATCH v2 1/8] lib: introduce emudev library Chenbo Xia
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 2/8] doc: add emudev library guide Chenbo Xia
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 3/8] emu: introduce emulated iavf driver Chenbo Xia
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 4/8] emu/iavf: add vfio-user device register and unregister Chenbo Xia
2021-01-04  6:45     ` Wu, Jingjing
2021-01-05  1:26       ` Xia, Chenbo
2021-01-05 13:41     ` Wu, Jingjing
2021-01-06  7:41       ` Xia, Chenbo
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 5/8] emu/iavf: add resource management and internal logic of iavf Chenbo Xia
2020-12-29  6:05     ` Wu, Jingjing
2020-12-30  1:59       ` Xia, Chenbo
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 6/8] emu/iavf: add emudev operations to fit in emudev framework Chenbo Xia
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 7/8] test/emudev: introduce functional test Chenbo Xia
2020-12-19  6:28   ` [dpdk-dev] [PATCH v2 8/8] doc: update release notes for iavf emudev driver Chenbo Xia
2021-01-13 16:52   ` [dpdk-dev] [PATCH v2 0/8] Introduce emudev library and " Thomas Monjalon
2021-01-14  1:35     ` Xia, Chenbo
2021-01-14  6:25   ` [dpdk-dev] [PATCH v3 " Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 1/8] lib: introduce emudev library Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 2/8] doc: add emudev library guide Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 3/8] emu: introduce emulated iavf driver Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 4/8] emu/iavf: add vfio-user device register and unregister Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 5/8] emu/iavf: add resource management and internal logic of iavf Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 6/8] emu/iavf: add emudev operations to fit in emudev framework Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 7/8] test/emudev: introduce functional test Chenbo Xia
2021-01-14  6:25     ` [dpdk-dev] [PATCH v3 8/8] doc: update release notes for iavf emudev driver Chenbo Xia
2024-02-12 22:49     ` [dpdk-dev] [PATCH v3 0/8] Introduce emudev library and " Stephen Hemminger
2023-06-14 19:47 ` [dpdk-dev] [PATCH " Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN2PR11MB406358231FAD0C664D54B3399CDE0@MN2PR11MB4063.namprd11.prod.outlook.com \
    --to=chenbo.xia@intel.com \
    --cc=cunming.liang@intel.com \
    --cc=david.marchand@redhat.com \
    --cc=dev@dpdk.org \
    --cc=jingjing.wu@intel.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=miao.li@intel.com \
    --cc=stephen@networkplumber.org \
    --cc=thomas@monjalon.net \
    --cc=xiuchun.lu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).