DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Ilya Maximets" <i.maximets@ovn.org>,
	"Bruce Richardson" <bruce.richardson@intel.com>
Cc: "Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"Pai G, Sunil" <sunil.pai.g@intel.com>,
	"Stokes, Ian" <ian.stokes@intel.com>,
	"Hu, Jiayu" <jiayu.hu@intel.com>,
	"Ferriter, Cian" <cian.ferriter@intel.com>,
	<ovs-dev@openvswitch.org>, <dev@dpdk.org>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	"O'Driscoll, Tim" <tim.odriscoll@intel.com>,
	"Finn, Emma" <emma.finn@intel.com>
Subject: RE: OVS DPDK DMA-Dev library/Design Discussion
Date: Tue, 29 Mar 2022 20:46:58 +0200
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86F81@smartserver.smartshare.dk>
In-Reply-To: <fc844aec-65e6-adcc-cf03-c363026c4d27@ovn.org>

> From: Ilya Maximets [mailto:i.maximets@ovn.org]
> Sent: Tuesday, 29 March 2022 19.45
> 
> On 3/29/22 19:13, Morten Brørup wrote:
> >> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> >> Sent: Tuesday, 29 March 2022 19.03
> >>
> >> On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup wrote:
> >>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >>>> Sent: Tuesday, 29 March 2022 18.24
> >>>>
> >>>> Hi Morten,
> >>>>
> >>>> On 3/29/22 16:44, Morten Brørup wrote:
> >>>>>> From: Van Haaren, Harry [mailto:harry.van.haaren@intel.com]
> >>>>>> Sent: Tuesday, 29 March 2022 15.02
> >>>>>>
> >>>>>>> From: Morten Brørup <mb@smartsharesystems.com>
> >>>>>>> Sent: Tuesday, March 29, 2022 1:51 PM
> >>>>>>>
> >>>>>>> Having thought more about it, I think that a completely different
> >>>>>>> architectural approach is required:
> >>>>>>>
> >>>>>>> Many of the DPDK Ethernet PMDs implement a variety of RX and TX
> >>>>>>> packet burst functions, each optimized for different CPU vector
> >>>>>>> instruction sets. The availability of a DMA engine should be
> >>>>>>> treated the same way. So I suggest that PMDs copying packet
> >>>>>>> contents, e.g. memif, pcap, vmxnet3, should implement DMA
> >>>>>>> optimized RX and TX packet burst functions.
> >>>>>>>
> >>>>>>> Similarly for the DPDK vhost library.
> >>>>>>>
> >>>>>>> In such an architecture, it would be the application's job to
> >>>>>>> allocate DMA channels and assign them to the specific PMDs that
> >>>>>>> should use them. But the actual use of the DMA channels would move
> >>>>>>> down below the application and into the DPDK PMDs and libraries.
> >>>>>>>
> >>>>>>>
> >>>>>>> Med venlig hilsen / Kind regards,
> >>>>>>> -Morten Brørup
> >>>>>>
> >>>>>> Hi Morten,
> >>>>>>
> >>>>>> That's *exactly* how this architecture is designed & implemented.
> >>>>>> 1. The DMA configuration and initialization is up to the
> >>>>>>    application (OVS).
> >>>>>> 2. The VHost library is passed the DMA-dev ID, and its new async
> >>>>>>    rx/tx APIs, and uses the DMA device to accelerate the copy.
> >>>>>>
> >>>>>> Looking forward to talking on the call that just started.
> >>>>>> Regards, -Harry
> >>>>>>
> >>>>>
> >>>>> OK, thanks - as I said on the call, I haven't looked at the patches.
> >>>>>
> >>>>> Then, I suppose that the TX completions can be handled in the TX
> >>>>> function, and the RX completions can be handled in the RX function,
> >>>>> just like the Ethdev PMDs handle packet descriptors:
> >>>>>
> >>>>> TX_Burst(tx_packet_array):
> >>>>> 1. Clean up descriptors processed by the NIC chip. --> Process TX
> >>>>>    DMA channel completions. (Effectively, the 2nd pipeline stage.)
> >>>>> 2. Pass on the tx_packet_array to the NIC chip descriptors. --> Pass
> >>>>>    on the tx_packet_array to the TX DMA channel. (Effectively, the
> >>>>>    1st pipeline stage.)
> >>>>
> >>>> The problem is Tx function might not be called again, so enqueued
> >>>> packets in 2. may never be completed from a Virtio point of view.
> >>>> IOW, the packets will be copied to the Virtio descriptors buffers,
> >>>> but the descriptors will not be made available to the Virtio driver.
> >>>
> >>> In that case, the application needs to call TX_Burst() periodically
> >>> with an empty array, for completion purposes.
> >>>
> >>> Or some sort of TX_Keepalive() function can be added to the DPDK
> >>> library, to handle DMA completion. It might even handle multiple DMA
> >>> channels, if convenient - and if possible without locking or other
> >>> weird complexity.
> >>>
> >>> Here is another idea, inspired by a presentation at one of the DPDK
> >>> Userspace conferences. It may be wishful thinking, though:
> >>>
> >>> Add an additional transaction to each DMA burst; a special
> >>> transaction containing the memory write operation that makes the
> >>> descriptors available to the Virtio driver.
> 
> I was talking with Maxime after the call today about the same idea.
> And it looks fairly doable, I would say.
> 
> >>>
> >>
> >> That is something that can work, so long as the receiver is operating
> >> in polling mode. For cases where virtio interrupts are enabled, you
> >> still need to do a write to the eventfd in the kernel in vhost to
> >> signal the virtio side. That's not something that can be offloaded to
> >> a DMA engine, sadly, so we still need some form of completion call.
> >
> > I guess that virtio interrupts is the most widely deployed scenario,
> > so let's ignore the DMA TX completion transaction for now - and call
> > it a possible future optimization for specific use cases. So it seems
> > that some form of completion call is unavoidable.
> >
> 
> We could separate the actual kick of the guest with the data transfer.
> If interrupts are enabled, this means that the guest is not actively
> polling, i.e. we can allow some extra latency by performing the actual
> kick from the rx context, or, as Maxime said, if DMA engine can generate
> interrupts when the DMA queue is empty, vhost thread may listen to them
> and kick the guest if needed.  This will additionally remove the extra
> system call from the fast path.

Excellent point about latency sensitivity!

It reminds me that we should consider what to optimize for...

Adding that extra "burst complete signal" DMA transaction at the end of a DMA burst has a cost. So we need to ask ourselves:
1. Is this the cheapest method to signal and handle completion, or would some other method (e.g. polling) be cheaper?
2. In the most important (i.e. most frequent or most latency sensitive) traffic scenarios, is this the cheapest method?
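
For reference when weighing these costs, here is a minimal sketch of the polling alternative: the TX_Burst() pattern quoted above, where each call first processes completions and then enqueues new copies, and a call with zero packets serves as the completion-only call. It is a toy model, not DPDK code: the toy_* names, the counters standing in for the DMA engine, and RING_SIZE are all assumptions made for illustration, and no real dmadev or vhost API is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical number of in-flight copies per channel */

/* Toy model of one TX DMA channel, tracked with free-running counters. */
struct toy_dma_chan {
    uint32_t head;    /* total copies enqueued so far */
    uint32_t hw_done; /* total copies the (simulated) engine has finished */
    uint32_t tail;    /* total copies whose completion has been processed */
};

/* Stand-in for the hardware finishing up to n outstanding copies. */
static void toy_hw_progress(struct toy_dma_chan *c, uint32_t n)
{
    uint32_t outstanding = c->head - c->hw_done;

    c->hw_done += (n < outstanding) ? n : outstanding;
}

/*
 * 2nd pipeline stage: process completions. In a real vhost path this is
 * where descriptors would be made available to the guest (and a kick
 * issued if needed). Returns the number of copies completed.
 */
static uint32_t toy_tx_reap(struct toy_dma_chan *c)
{
    uint32_t done = c->hw_done - c->tail;

    c->tail = c->hw_done;
    return done;
}

/*
 * TX burst: reap completions first, then enqueue as many of nb_pkts
 * copies as fit in the ring (1st pipeline stage). Calling it with
 * nb_pkts == 0 acts as the completion-only call. Returns the number
 * of packets accepted.
 */
static uint32_t toy_tx_burst(struct toy_dma_chan *c, uint32_t nb_pkts,
                             uint32_t *nb_completed)
{
    uint32_t free_slots, n;

    assert(c != NULL && nb_completed != NULL);

    *nb_completed = toy_tx_reap(c);

    free_slots = RING_SIZE - (c->head - c->tail);
    n = (nb_pkts < free_slots) ? nb_pkts : free_slots;
    c->head += n;
    return n;
}
```

In this model, the cost of the polling method is one extra toy_tx_burst() call with nb_pkts == 0 whenever no further traffic arrives, whereas the "burst complete signal" method would pay for one extra DMA transaction on every burst.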

