DPDK patches and discussions
From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Hu, Jiayu" <jiayu.hu@intel.com>,
	"Ilya Maximets" <i.maximets@ovn.org>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: "Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"Pai G, Sunil" <sunil.pai.g@intel.com>,
	"Stokes, Ian" <ian.stokes@intel.com>,
	"Ferriter, Cian" <cian.ferriter@intel.com>,
	"ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	"O'Driscoll, Tim" <tim.odriscoll@intel.com>,
	"Finn, Emma" <emma.finn@intel.com>
Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
Date: Wed, 30 Mar 2022 11:25:05 +0200	[thread overview]
Message-ID: <4a66558c-d0ad-f41a-fef6-db670a330f21@redhat.com> (raw)
In-Reply-To: <431821f0b06d45958f67cb157029d306@intel.com>



On 3/30/22 04:02, Hu, Jiayu wrote:
> 
> 
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@ovn.org>
>> Sent: Wednesday, March 30, 2022 1:45 AM
>> To: Morten Brørup <mb@smartsharesystems.com>; Richardson, Bruce
>> <bruce.richardson@intel.com>
>> Cc: i.maximets@ovn.org; Maxime Coquelin <maxime.coquelin@redhat.com>;
>> Van Haaren, Harry <harry.van.haaren@intel.com>; Pai G, Sunil
>> <sunil.pai.g@intel.com>; Stokes, Ian <ian.stokes@intel.com>; Hu, Jiayu
>> <jiayu.hu@intel.com>; Ferriter, Cian <cian.ferriter@intel.com>; ovs-
>> dev@openvswitch.org; dev@dpdk.org; Mcnamara, John
>> <john.mcnamara@intel.com>; O'Driscoll, Tim <tim.odriscoll@intel.com>;
>> Finn, Emma <emma.finn@intel.com>
>> Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
>>
>> On 3/29/22 19:13, Morten Brørup wrote:
>>>> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
>>>> Sent: Tuesday, 29 March 2022 19.03
>>>>
>>>> On Tue, Mar 29, 2022 at 06:45:19PM +0200, Morten Brørup wrote:
>>>>>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>>>>>> Sent: Tuesday, 29 March 2022 18.24
>>>>>>
>>>>>> Hi Morten,
>>>>>>
>>>>>> On 3/29/22 16:44, Morten Brørup wrote:
>>>>>>>> From: Van Haaren, Harry [mailto:harry.van.haaren@intel.com]
>>>>>>>> Sent: Tuesday, 29 March 2022 15.02
>>>>>>>>
>>>>>>>>> From: Morten Brørup <mb@smartsharesystems.com>
>>>>>>>>> Sent: Tuesday, March 29, 2022 1:51 PM
>>>>>>>>>
>>>>>>>>> Having thought more about it, I think that a completely
>>>>>>>>> different architectural approach is required:
>>>>>>>>>
>>>>>>>>> Many of the DPDK Ethernet PMDs implement a variety of RX and TX
>>>>>>>>> packet burst functions, each optimized for different CPU vector
>>>>>>>>> instruction sets. The availability of a DMA engine should be
>>>>>>>>> treated the same way. So I suggest that PMDs copying packet
>>>>>>>>> contents, e.g. memif, pcap, vmxnet3, should implement DMA
>>>>>>>>> optimized RX and TX packet burst functions.
>>>>>>>>>
>>>>>>>>> Similarly for the DPDK vhost library.
>>>>>>>>>
>>>>>>>>> In such an architecture, it would be the application's job to
>>>>>>>>> allocate DMA channels and assign them to the specific PMDs that
>>>>>>>>> should use them. But the actual use of the DMA channels would
>>>>>>>>> move down below the application and into the DPDK PMDs and
>>>>>>>>> libraries.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Med venlig hilsen / Kind regards, -Morten Brørup
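
As an illustration of that split (the application owns channel allocation,
PMDs and libraries own the data path), a bare-bones sketch of the
application-side setup with the dmadev API from DPDK 21.11+ could look as
follows; the helper name, the single-vchan assumption and the descriptor
count are illustrative only, and error handling is stripped down:

    #include <rte_dmadev.h>

    /*
     * Illustrative sketch only: the application configures one virtual
     * channel on an already-probed dmadev and keeps the device ID so it
     * can later be assigned to a PMD or library instance.
     */
    static int
    app_setup_dma_channel(int16_t dev_id)
    {
        struct rte_dma_conf dev_conf = { .nb_vchans = 1 };
        struct rte_dma_vchan_conf vchan_conf = {
            .direction = RTE_DMA_DIR_MEM_TO_MEM,
            .nb_desc = 1024,
        };

        if (rte_dma_configure(dev_id, &dev_conf) != 0)
            return -1;
        if (rte_dma_vchan_setup(dev_id, 0, &vchan_conf) != 0)
            return -1;
        return rte_dma_start(dev_id);
    }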
>>>>>>>>
>>>>>>>> Hi Morten,
>>>>>>>>
>>>>>>>> That's *exactly* how this architecture is designed & implemented.
>>>>>>>> 1.	The DMA configuration and initialization is up to the
>>>>>>>> 	application (OVS).
>>>>>>>> 2.	The VHost library is passed the DMA-dev ID through its new
>>>>>>>> 	async rx/tx APIs, and uses the DMA device to accelerate the copy.
>>>>>>>>
>>>>>>>> Looking forward to talking on the call that just started.
>>>>>>>> Regards, -Harry
>>>>>>>>
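
To make point 2 concrete: with the dmadev-based vhost async API of that
period (DPDK 22.03; names and signatures have moved around between
releases, so treat this as a sketch rather than a reference), the per-burst
usage on the enqueue side is roughly:

    #include <rte_mbuf.h>
    #include <rte_vhost_async.h>

    /*
     * Rough sketch, assuming the 22.03-era async enqueue API: reap copies
     * the DMA engine already finished, then hand the new burst over.
     * dma_id/vchan were configured and assigned by the application.
     */
    static uint16_t
    vhost_async_tx(int vid, uint16_t qid, struct rte_mbuf **pkts,
                   uint16_t count, int16_t dma_id, uint16_t vchan)
    {
        struct rte_mbuf *done[64];
        uint16_t n_done;

        n_done = rte_vhost_poll_enqueue_completed(vid, qid, done,
                                                  RTE_DIM(done),
                                                  dma_id, vchan);
        rte_pktmbuf_free_bulk(done, n_done);

        /* Copies are started asynchronously; completion happens later. */
        return rte_vhost_submit_enqueue_burst(vid, qid, pkts, count,
                                              dma_id, vchan);
    }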
>>>>>>>
>>>>>>> OK, thanks - as I said on the call, I haven't looked at the
>>>>>>> patches.
>>>>>>>
>>>>>>> Then, I suppose that the TX completions can be handled in the TX
>>>>>>> function, and the RX completions can be handled in the RX function,
>>>>>>> just like the Ethdev PMDs handle packet descriptors:
>>>>>>>
>>>>>>> TX_Burst(tx_packet_array):
>>>>>>> 1.	Clean up descriptors processed by the NIC chip. --> Process
>>>>>>> 	TX DMA channel completions. (Effectively, the 2nd pipeline stage.)
>>>>>>> 2.	Pass on the tx_packet_array to the NIC chip descriptors. -->
>>>>>>> 	Pass on the tx_packet_array to the TX DMA channel. (Effectively,
>>>>>>> 	the 1st pipeline stage.)
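
In raw dmadev terms (not the vhost async API, just to show the shape of the
two stages), such a burst function could look roughly like the snippet
below; the descriptor bookkeeping, i.e. mapping completion indexes back to
ring slots and mbufs, is deliberately omitted, and the destination IOVAs
are assumed to be supplied by the caller:

    #include <rte_dmadev.h>
    #include <rte_mbuf.h>

    /*
     * Sketch of a DMA-assisted tx_burst: stage 2 first (reap finished
     * copies), then stage 1 (enqueue the new burst).  'dst' holds the
     * destination IOVAs; how they are obtained is PMD-specific.
     */
    static uint16_t
    dma_tx_burst(int16_t dma_id, uint16_t vchan, struct rte_mbuf **pkts,
                 const rte_iova_t *dst, uint16_t nb_pkts)
    {
        uint16_t last_idx, nb_done, i;
        bool error = false;

        /* Stage 2: process TX DMA channel completions. */
        nb_done = rte_dma_completed(dma_id, vchan, 64, &last_idx, &error);
        /* ...release the ring slots/mbufs covered by nb_done completions... */

        /* Stage 1: pass the burst on to the TX DMA channel. */
        for (i = 0; i < nb_pkts; i++)
            if (rte_dma_copy(dma_id, vchan, rte_pktmbuf_iova(pkts[i]),
                             dst[i], rte_pktmbuf_data_len(pkts[i]), 0) < 0)
                break;
        rte_dma_submit(dma_id, vchan);

        return i;
    }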
>>>>>>
>>>>>> The problem is that the Tx function might not be called again, so
>>>>>> enqueued packets in 2. may never be completed from a Virtio point of
>>>>>> view. IOW, the packets will be copied to the Virtio descriptors'
>>>>>> buffers, but the descriptors will not be made available to the
>>>>>> Virtio driver.
>>>>>
>>>>> In that case, the application needs to call TX_Burst() periodically
>>>>> with an empty array, for completion purposes.
>>>>>
>>>>> Or some sort of TX_Keepalive() function can be added to the DPDK
>>>>> library, to handle DMA completion. It might even handle multiple DMA
>>>>> channels, if convenient - and if possible without locking or other
>>>>> weird complexity.
>>>>>
>>>>> Here is another idea, inspired by a presentation at one of the DPDK
>>>>> Userspace conferences. It may be wishful thinking, though:
>>>>>
>>>>> Add an additional transaction to each DMA burst; a special
>>>>> transaction containing the memory write operation that makes the
>>>>> descriptors available to the Virtio driver.
>>
>> I was talking with Maxime after the call today about the same idea.
>> And it looks fairly doable, I would say.
> 
> If the idea is to have the DMA engine update the used ring's index (2B) and the packed
> ring descriptor's flags (2B), yes, it will work functionally. But considering the
> offloading cost of DMA, it would hurt performance. In addition, the latency of a small
> DMA copy is much higher than that of a CPU copy, so it will also increase latency.

I agree that writing back descriptors using DMA can be sub-optimal,
especially for the packed ring, where the head descriptor's flags have to
be written last.

Are you sure about latency? With the current solution, the descriptor
write-backs can happen quite some time after the DMA transfers are done,
can't they?
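
For completeness, the "extra transaction" being discussed would map onto
dmadev roughly as below: after the payload copies, one fenced 2-byte copy
updates the head descriptor's flags so the flag store cannot be reordered
before the payload writes. Whether the added small copy pays off is exactly
the offloading-cost and latency question raised above; addresses and sizes
here are placeholders:

    #include <rte_dmadev.h>

    /*
     * Sketch: enqueue the descriptor write-back as the last, fenced DMA
     * operation of the burst.  FENCE orders it after all previously
     * enqueued operations on this vchan; SUBMIT rings the doorbell.
     */
    static int
    enqueue_flags_writeback(int16_t dma_id, uint16_t vchan,
                            rte_iova_t flags_src, rte_iova_t flags_dst)
    {
        return rte_dma_copy(dma_id, vchan, flags_src, flags_dst,
                            sizeof(uint16_t),
                            RTE_DMA_OP_FLAG_FENCE | RTE_DMA_OP_FLAG_SUBMIT);
    }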

>>
>>>>>
>>>>
>>>> That is something that can work, so long as the receiver is operating
>>>> in polling mode. For cases where virtio interrupts are enabled, you
>>>> still need to do a write to the eventfd in the kernel in vhost to
>>>> signal the virtio side. That's not something that can be offloaded to
>>>> a DMA engine, sadly, so we still need some form of completion call.
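
For reference, the completion call that cannot be offloaded ultimately
boils down to the small helper below (in the DPDK vhost library it is
wrapped by rte_vhost_vring_call(); the raw form is shown because it is the
system call referred to above):

    #include <stdint.h>
    #include <unistd.h>

    /*
     * The kick in its rawest form: write an 8-byte counter increment to
     * the vring's call eventfd so the kernel injects an interrupt into
     * the guest.  A DMA engine cannot issue this system call for us.
     */
    static void
    kick_guest(int callfd)
    {
        uint64_t v = 1;

        (void)write(callfd, &v, sizeof(v));
    }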
>>>
>>> I guess that the case with virtio interrupts enabled is the most widely
>>> deployed scenario, so let's ignore the DMA TX completion transaction for
>>> now - and call it a possible future optimization for specific use cases.
>>> So it seems that some form of completion call is unavoidable.
>>>
>>
>> We could separate the actual kick of the guest from the data transfer.
>> If interrupts are enabled, this means that the guest is not actively polling,
>> i.e. we can allow some extra latency by performing the actual kick from the rx
>> context; or, as Maxime said, if the DMA engine can generate interrupts when the
>> DMA queue is empty, the vhost thread may listen to them and kick the guest if
>> needed.  This will additionally remove the extra system call from the fast
>> path.
> 
> Separating the kick from the data transfer is a very good idea. But it requires a
> dedicated control plane thread to kick the guest after the DMA interrupt. Anyway,
> we can try this optimization in the future.

Yes, it requires a dedicated thread, but I don't think this is really an
issue. Interrupt mode can be considered the slow path.
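
A sketch of what that dedicated slow-path thread could look like is below.
The generic dmadev API has no portable completion-interrupt hook, so this
version simply polls at a low rate; with a driver that can raise an
interrupt on an empty DMA queue, the sleep/poll pair would be replaced by
waiting on that event. Coordination with the datapath's own completion
handling (who consumes which completions) is deliberately left out:

    #include <stdbool.h>
    #include <unistd.h>
    #include <rte_dmadev.h>
    #include <rte_vhost.h>

    struct kick_ctx {
        int vid;
        uint16_t qid;
        int16_t dma_id;
        uint16_t vchan;
    };

    /* Slow-path thread: kick the guest once outstanding copies complete. */
    static void *
    kick_thread(void *arg)
    {
        struct kick_ctx *ctx = arg;
        uint16_t last_idx;
        bool error;

        for (;;) {
            if (rte_dma_completed(ctx->dma_id, ctx->vchan, 64,
                                  &last_idx, &error) > 0)
                rte_vhost_vring_call(ctx->vid, ctx->qid);
            usleep(100);    /* interrupt mode: extra latency is acceptable */
        }
        return NULL;
    }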

> 
> Thanks,
> Jiayu


Thread overview: 59+ messages
2022-03-24 15:36 Stokes, Ian
2022-03-28 18:19 ` Pai G, Sunil
2022-03-29 12:51   ` Morten Brørup
2022-03-29 13:01     ` Van Haaren, Harry
2022-03-29 14:44       ` Morten Brørup
2022-03-29 16:24         ` Maxime Coquelin
2022-03-29 16:45           ` Morten Brørup
2022-03-29 17:03             ` Bruce Richardson
2022-03-29 17:13               ` Morten Brørup
2022-03-29 17:45                 ` Ilya Maximets
2022-03-29 18:46                   ` Morten Brørup
2022-03-30  2:02                   ` Hu, Jiayu
2022-03-30  9:25                     ` Maxime Coquelin [this message]
2022-03-30 10:20                       ` Bruce Richardson
2022-03-30 14:27                       ` Hu, Jiayu
2022-03-29 17:46                 ` Van Haaren, Harry
2022-03-29 19:59                   ` Morten Brørup
2022-03-30  9:01                     ` Van Haaren, Harry
2022-04-07 14:04                       ` Van Haaren, Harry
2022-04-07 14:25                         ` Maxime Coquelin
2022-04-07 14:39                           ` Ilya Maximets
2022-04-07 14:42                             ` Van Haaren, Harry
2022-04-07 15:01                               ` Ilya Maximets
2022-04-07 15:46                                 ` Maxime Coquelin
2022-04-07 16:04                                   ` Bruce Richardson
2022-04-08  7:13                             ` Hu, Jiayu
2022-04-08  8:21                               ` Morten Brørup
2022-04-08  9:57                               ` Ilya Maximets
2022-04-20 15:39                                 ` Mcnamara, John
2022-04-20 16:41                                 ` Mcnamara, John
2022-04-25 21:46                                   ` Ilya Maximets
2022-04-27 14:55                                     ` Mcnamara, John
2022-04-27 20:34                                     ` Bruce Richardson
2022-04-28 12:59                                       ` Ilya Maximets
2022-04-28 13:55                                         ` Bruce Richardson
2022-05-03 19:38                                         ` Van Haaren, Harry
2022-05-10 14:39                                           ` Van Haaren, Harry
2022-05-24 12:12                                           ` Ilya Maximets
2022-03-30 10:41   ` Ilya Maximets
2022-03-30 10:52     ` Ilya Maximets
2022-03-30 11:12       ` Bruce Richardson
2022-03-30 11:41         ` Ilya Maximets
2022-03-30 14:09           ` Bruce Richardson
2022-04-05 11:29             ` Ilya Maximets
2022-04-05 12:07               ` Bruce Richardson
2022-04-08  6:29                 ` Pai G, Sunil
2022-05-13  8:52                   ` fengchengwen
2022-05-13  9:10                     ` Bruce Richardson
2022-05-13  9:48                       ` fengchengwen
2022-05-13 10:34                         ` Bruce Richardson
2022-05-16  9:04                           ` Morten Brørup
2022-05-16 22:31                           ` [EXT] " Radha Chintakuntla
  -- strict thread matches above, loose matches on Subject: below --
2022-04-25 15:19 Mcnamara, John
2022-04-21 14:57 Mcnamara, John
     [not found] <DM6PR11MB3227AC0014F321EB901BE385FC199@DM6PR11MB3227.namprd11.prod.outlook.com>
2022-04-21 11:51 ` Mcnamara, John
     [not found] <DM8PR11MB5605B4A5DBD79FFDB4B1C3B2BD0A9@DM8PR11MB5605.namprd11.prod.outlook.com>
2022-03-21 18:23 ` Pai G, Sunil
2022-03-15 15:48 Stokes, Ian
2022-03-15 13:17 Stokes, Ian
2022-03-15 11:15 Stokes, Ian
