DPDK patches and discussions
From: Ilya Maximets <i.maximets@ovn.org>
To: "Mcnamara, John" <john.mcnamara@intel.com>,
	"Hu, Jiayu" <jiayu.hu@intel.com>,
	"Maxime Coquelin" <maxime.coquelin@redhat.com>,
	"Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"Morten Brørup" <mb@smartsharesystems.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: i.maximets@ovn.org, "Pai G, Sunil" <sunil.pai.g@intel.com>,
	"Stokes, Ian" <ian.stokes@intel.com>,
	"Ferriter, Cian" <cian.ferriter@intel.com>,
	"ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"O'Driscoll, Tim" <tim.odriscoll@intel.com>,
	"Finn, Emma" <emma.finn@intel.com>
Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
Date: Mon, 25 Apr 2022 23:46:01 +0200	[thread overview]
Message-ID: <fb0597ae-8943-22c4-b4f4-15f34e179825@ovn.org> (raw)
In-Reply-To: <DM6PR11MB3227BEE72B9D240404902463FCF59@DM6PR11MB3227.namprd11.prod.outlook.com>

On 4/20/22 18:41, Mcnamara, John wrote:
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@ovn.org>
>> Sent: Friday, April 8, 2022 10:58 AM
>> To: Hu, Jiayu <jiayu.hu@intel.com>; Maxime Coquelin
>> <maxime.coquelin@redhat.com>; Van Haaren, Harry
>> <harry.van.haaren@intel.com>; Morten Brørup <mb@smartsharesystems.com>;
>> Richardson, Bruce <bruce.richardson@intel.com>
>> Cc: i.maximets@ovn.org; Pai G, Sunil <sunil.pai.g@intel.com>; Stokes, Ian
>> <ian.stokes@intel.com>; Ferriter, Cian <cian.ferriter@intel.com>; ovs-
>> dev@openvswitch.org; dev@dpdk.org; Mcnamara, John
>> <john.mcnamara@intel.com>; O'Driscoll, Tim <tim.odriscoll@intel.com>;
>> Finn, Emma <emma.finn@intel.com>
>> Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
>>
>> On 4/8/22 09:13, Hu, Jiayu wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets <i.maximets@ovn.org>
>>>> Sent: Thursday, April 7, 2022 10:40 PM
>>>> To: Maxime Coquelin <maxime.coquelin@redhat.com>; Van Haaren, Harry
>>>> <harry.van.haaren@intel.com>; Morten Brørup
>>>> <mb@smartsharesystems.com>; Richardson, Bruce
>>>> <bruce.richardson@intel.com>
>>>> Cc: i.maximets@ovn.org; Pai G, Sunil <sunil.pai.g@intel.com>; Stokes,
>>>> Ian <ian.stokes@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Ferriter,
>>>> Cian <cian.ferriter@intel.com>; ovs-dev@openvswitch.org;
>>>> dev@dpdk.org; Mcnamara, John <john.mcnamara@intel.com>; O'Driscoll,
>>>> Tim <tim.odriscoll@intel.com>; Finn, Emma <emma.finn@intel.com>
>>>> Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
>>>>
>>>> On 4/7/22 16:25, Maxime Coquelin wrote:
>>>>> Hi Harry,
>>>>>
>>>>> On 4/7/22 16:04, Van Haaren, Harry wrote:
>>>>>> Hi OVS & DPDK, Maintainers & Community,
>>>>>>
>>>>>> Top-posting an overview of the discussion as replies to the
>>>>>> thread become slower: perhaps it is a good time to review and
>>>>>> plan next steps?
>>>>>>
>>>>>> From my perspective, those most vocal in the thread seem to be
>>>>>> in favour of the clean rx/tx split ("defer work"), with the
>>>>>> tradeoff that the application must be aware of handling the async
>>>>>> DMA completions. If there are any concerns opposing upstreaming
>>>>>> of this method, please indicate them promptly, and we can
>>>>>> continue technical discussions here now.
>>>>>
>>>>> Wasn't there some discussion about handling the Virtio completions
>>>>> with the DMA engine? With that, we wouldn't need the deferral of work.
>>>>
>>>> +1
>>>>
>>>> With the virtio completions handled by DMA itself, the vhost port
>>>> turns almost into a real HW NIC.  With that we will not need any
>>>> extra manipulations from the OVS side, i.e. no need to defer any work
>>>> while maintaining clear split between rx and tx operations.
>>>
>>> First, making DMA do the 2B copy would sacrifice performance, and I
>>> think we all agree on that.
>>
>> I do not agree with that.  Yes, a 2B copy by DMA will likely be slower
>> than one done by the CPU; however, the CPU goes away for dozens or even
>> hundreds of thousands of cycles to process a new packet batch or to
>> service other ports, hence DMA will likely complete the transmission
>> faster than waiting for the CPU thread to come back to that task.  In
>> any case, this has to be tested.
>>
>>> Second, this method comes with an issue of ordering.
>>> For example, PMD thread0 enqueues 10 packets to vring0 first, then PMD
>>> thread1 enqueues 20 packets to vring0. If PMD thread0 and thread1 have
>>> their own dedicated DMA devices, dma0 and dma1, the flag/index update
>>> for the first 10 packets is done by dma0, and the flag/index update
>>> for the remaining 20 packets is done by dma1. But there is no ordering
>>> guarantee among different DMA devices, so the flag/index update may go
>>> wrong. If PMD threads don't have dedicated DMA devices, meaning DMA
>>> devices are shared among threads, we need a lock and pay for lock
>>> contention in the data path. Or we can allocate DMA devices to vrings
>>> dynamically to avoid DMA sharing among threads. But what's the
>>> overhead of the allocation mechanism?  Who does it?  Any thoughts?
>>
>> 1. DMA completion was discussed in the context of per-queue allocation,
>>    so there is no re-ordering in this case.
>>
>> 2. Overhead can be minimal if the allocated device can stick to the
>>    queue for a reasonable amount of time without re-allocation on every
>>    send.  You may look at the XPS implementation in lib/dpif-netdev.c
>>    in OVS for an example of such a mechanism.  For sure it can not be
>>    the same, but ideas can be re-used.
>>
>> 3. Locking doesn't mean contention if resources are
>>    allocated/distributed thoughtfully.
>>
>> 4. Allocation can be done by either OVS or the vhost library itself.
>>    I'd vote for doing that inside the vhost library, so any DPDK
>>    application and vhost ethdev can use it without re-inventing it from
>>    scratch.  It should also be simpler from the API point of view if
>>    allocation and usage are in the same place.  But I don't have a
>>    strong opinion here for now, since no real code examples exist, so
>>    it's hard to evaluate how they could look.
>>
>> But I feel like we're starting to run in circles here as I did already say
>> most of that before.
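[For illustration only: a minimal C sketch of the "sticky" per-queue
allocation from point 2 above, loosely modeled on the XPS idea in OVS's
lib/dpif-netdev.c.  All names, structures, and the timeout policy here
are hypothetical and are not part of any real DPDK or OVS API.]

```c
#include <stdbool.h>
#include <stdint.h>

#define N_DMA_DEVS   4
#define N_VRINGS     8
#define IDLE_TIMEOUT 1000000ULL  /* cycles of idleness before re-allocation */

struct vring_dma_slot {
    bool     assigned;   /* has a DMA device been picked for this vring? */
    int      dma_id;     /* currently assigned DMA device */
    uint64_t last_used;  /* timestamp of the last enqueue on this vring */
};

static struct vring_dma_slot slots[N_VRINGS];
static unsigned dma_refs[N_DMA_DEVS];  /* vrings currently on each device */

/* Pick the DMA device shared by the fewest vrings. */
static int least_loaded_dma(void)
{
    int best = 0;
    for (int i = 1; i < N_DMA_DEVS; i++) {
        if (dma_refs[i] < dma_refs[best]) {
            best = i;
        }
    }
    return best;
}

/* Return the DMA device to use for 'vring' at time 'now'.  The previous
 * assignment is kept unless the vring has been idle for longer than
 * IDLE_TIMEOUT, so there is no per-send allocation cost.  Because the
 * mapping is per vring, all copies for one vring flow through a single
 * DMA device and complete in order, avoiding the cross-device
 * re-ordering problem discussed above. */
int vring_get_dma(int vring, uint64_t now)
{
    struct vring_dma_slot *s = &slots[vring];

    if (!s->assigned || now - s->last_used > IDLE_TIMEOUT) {
        if (s->assigned) {
            dma_refs[s->dma_id]--;
        }
        s->dma_id = least_loaded_dma();
        dma_refs[s->dma_id]++;
        s->assigned = true;
    }
    s->last_used = now;
    return s->dma_id;
}
```

[A real implementation would of course also need thread safety and
integration with the vhost library's device model, per point 4 above.]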
> 
> 

Hi, John.

Just reading this email now, as I was on PTO for the last 1.5 weeks
and haven't gotten through all the emails yet.

> This does seem to be going in circles, especially since there seemed to be technical alignment on the last public call on March 29th.

I guess there is a typo in the date here.
It seems to be the 26th, not the 29th.

> It is not feasible to do a real world implementation/POC of every design proposal.

FWIW, I think it makes sense to PoC and test options that are
going to be simply unavailable going forward if not explored now,
especially because we don't have any good solutions anyway
("Deferral of Work" is an architecturally wrong solution for OVS).

> Let's have another call so that we can move towards a single solution that the DPDK and OVS communities agree on. I'll set up a call for next week in a similar time slot to the previous one.

Is there any particular reason we can't use a mailing list to
discuss that topic further?  Live discussions tend to cause
information loss as people start to forget what was already
discussed fairly quickly, and there is no reliable source to
refresh the memory (recordings are not really suitable for this
purpose, and we didn't have them before).

Anyway, I have a conflict tomorrow with other meetings, so I
will not be able to attend.

Best regards, Ilya Maximets.

Thread overview: 59+ messages
2022-03-24 15:36 Stokes, Ian
2022-03-28 18:19 ` Pai G, Sunil
2022-03-29 12:51   ` Morten Brørup
2022-03-29 13:01     ` Van Haaren, Harry
2022-03-29 14:44       ` Morten Brørup
2022-03-29 16:24         ` Maxime Coquelin
2022-03-29 16:45           ` Morten Brørup
2022-03-29 17:03             ` Bruce Richardson
2022-03-29 17:13               ` Morten Brørup
2022-03-29 17:45                 ` Ilya Maximets
2022-03-29 18:46                   ` Morten Brørup
2022-03-30  2:02                   ` Hu, Jiayu
2022-03-30  9:25                     ` Maxime Coquelin
2022-03-30 10:20                       ` Bruce Richardson
2022-03-30 14:27                       ` Hu, Jiayu
2022-03-29 17:46                 ` Van Haaren, Harry
2022-03-29 19:59                   ` Morten Brørup
2022-03-30  9:01                     ` Van Haaren, Harry
2022-04-07 14:04                       ` Van Haaren, Harry
2022-04-07 14:25                         ` Maxime Coquelin
2022-04-07 14:39                           ` Ilya Maximets
2022-04-07 14:42                             ` Van Haaren, Harry
2022-04-07 15:01                               ` Ilya Maximets
2022-04-07 15:46                                 ` Maxime Coquelin
2022-04-07 16:04                                   ` Bruce Richardson
2022-04-08  7:13                             ` Hu, Jiayu
2022-04-08  8:21                               ` Morten Brørup
2022-04-08  9:57                               ` Ilya Maximets
2022-04-20 15:39                                 ` Mcnamara, John
2022-04-20 16:41                                 ` Mcnamara, John
2022-04-25 21:46                                   ` Ilya Maximets [this message]
2022-04-27 14:55                                     ` Mcnamara, John
2022-04-27 20:34                                     ` Bruce Richardson
2022-04-28 12:59                                       ` Ilya Maximets
2022-04-28 13:55                                         ` Bruce Richardson
2022-05-03 19:38                                         ` Van Haaren, Harry
2022-05-10 14:39                                           ` Van Haaren, Harry
2022-05-24 12:12                                           ` Ilya Maximets
2022-03-30 10:41   ` Ilya Maximets
2022-03-30 10:52     ` Ilya Maximets
2022-03-30 11:12       ` Bruce Richardson
2022-03-30 11:41         ` Ilya Maximets
2022-03-30 14:09           ` Bruce Richardson
2022-04-05 11:29             ` Ilya Maximets
2022-04-05 12:07               ` Bruce Richardson
2022-04-08  6:29                 ` Pai G, Sunil
2022-05-13  8:52                   ` fengchengwen
2022-05-13  9:10                     ` Bruce Richardson
2022-05-13  9:48                       ` fengchengwen
2022-05-13 10:34                         ` Bruce Richardson
2022-05-16  9:04                           ` Morten Brørup
2022-05-16 22:31                           ` [EXT] " Radha Chintakuntla
  -- strict thread matches above, loose matches on Subject: below --
2022-04-25 15:19 Mcnamara, John
2022-04-21 14:57 Mcnamara, John
     [not found] <DM6PR11MB3227AC0014F321EB901BE385FC199@DM6PR11MB3227.namprd11.prod.outlook.com>
2022-04-21 11:51 ` Mcnamara, John
     [not found] <DM8PR11MB5605B4A5DBD79FFDB4B1C3B2BD0A9@DM8PR11MB5605.namprd11.prod.outlook.com>
2022-03-21 18:23 ` Pai G, Sunil
2022-03-15 15:48 Stokes, Ian
2022-03-15 13:17 Stokes, Ian
2022-03-15 11:15 Stokes, Ian
