DPDK patches and discussions
From: Ilya Maximets <i.maximets@ovn.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: i.maximets@ovn.org, "Pai G, Sunil" <sunil.pai.g@intel.com>,
	"Stokes, Ian" <ian.stokes@intel.com>,
	"Hu, Jiayu" <jiayu.hu@intel.com>,
	"Ferriter, Cian" <cian.ferriter@intel.com>,
	"Van Haaren, Harry" <harry.van.haaren@intel.com>,
	"Maxime Coquelin (maxime.coquelin@redhat.com)"
	<maxime.coquelin@redhat.com>,
	"ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Mcnamara, John" <john.mcnamara@intel.com>,
	"O'Driscoll, Tim" <tim.odriscoll@intel.com>,
	"Finn, Emma" <emma.finn@intel.com>
Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
Date: Tue, 5 Apr 2022 13:29:25 +0200	[thread overview]
Message-ID: <0633e31c-68fc-618c-e4f8-78a74662078c@ovn.org> (raw)
In-Reply-To: <YkRkmuou9773bMf/@bricha3-MOBL.ger.corp.intel.com>

On 3/30/22 16:09, Bruce Richardson wrote:
> On Wed, Mar 30, 2022 at 01:41:34PM +0200, Ilya Maximets wrote:
>> On 3/30/22 13:12, Bruce Richardson wrote:
>>> On Wed, Mar 30, 2022 at 12:52:15PM +0200, Ilya Maximets wrote:
>>>> On 3/30/22 12:41, Ilya Maximets wrote:
>>>>> Forking the thread to discuss a memory consistency/ordering model.
>>>>>
>>>>> AFAICT, dmadev can be anything from part of a CPU to a completely
>>>>> separate PCI device.  However, I don't see any memory ordering being
>>>>> enforced or even described in the dmadev API or documentation.
>>>>> Please, point me to the correct documentation, if I somehow missed it.
>>>>>
>>>>> We have a DMA device (A) and a CPU core (B) writing respectively
>>>>> the data and the descriptor info.  CPU core (C) is reading the
>>>>> descriptor and the data it points to.
>>>>>
>>>>> A few things about that process:
>>>>>
>>>>> 1. There is no memory barrier between writes A and B (Did I miss
>>>>>    them?).  Meaning that those operations can be seen by C in a
>>>>>    different order regardless of barriers issued by C and regardless
>>>>>    of the nature of devices A and B.
>>>>>
>>>>> 2. Even if there is a write barrier between A and B, there is
>>>>>    no guarantee that C will see these writes in the same order,
>>>>>    as C doesn't use real memory barriers because vhost does not
>>>>>    advertise VIRTIO_F_ORDER_PLATFORM.
>>>>>
>>>>> So, I'm getting to the conclusion that there is a missing write
>>>>> barrier on the vhost side and vhost itself must advertise
>>>>> VIRTIO_F_ORDER_PLATFORM, so the virtio driver can use actual
>>>>> memory barriers.
>>>>>
>>>>> Would like to hear some thoughts on that topic.  Is it a real issue?
>>>>>
>>>>> Would like to hear some thoughts on that topic.  Is it a real issue?
>>>>> Is it an issue considering all possible CPU architectures and DMA
>>>>> HW variants?
>>>>>
>>>
>>> In terms of ordering of operations using dmadev:
>>>
>>> * Some DMA HW will perform all operations strictly in order e.g. Intel
>>>   IOAT, while other hardware may not guarantee order of operations/do
>>>   things in parallel e.g. Intel DSA. Therefore the dmadev API provides the
>>>   fence operation which allows the order to be enforced. The fence can be
>>>   thought of as a full memory barrier, meaning no jobs after the barrier can
>>>   be started until all those before it have completed. Obviously, for HW
>>>   where order is always enforced, this will be a no-op, but for hardware that
>>>   parallelizes, we want to reduce the fences to get best performance.
>>>
>>> * For synchronization between DMA devices and CPUs, where a CPU can only
>>>   write after a DMA copy has been done, the CPU must wait for the dma
>>>   completion to guarantee ordering. Once the completion has been returned
>>>   the completed operation is globally visible to all cores.
>>
>> Thanks for explanation!  Some questions though:
>>
>> In our case one CPU waits for completion and another CPU is actually using
>> the data.  IOW, "CPU must wait" is a bit ambiguous.  Which CPU must wait?
>>
>> Or should it be "Once the completion is visible on any core, the completed
>> operation is globally visible to all cores." ?
>>
> 
> The latter.
> Once the change to memory/cache is visible to any core, it is visible to
> all ones. This applies to regular CPU memory writes too - at least on IA,
> and I expect on many other architectures - once the write is visible
> outside the current core it is visible to every other core. Once the data
> hits the l1 or l2 cache of any core, any subsequent requests for that data
> from any other core will "snoop" the latest data from that core's cache,
> even
> if it has not made its way down to a shared cache, e.g. l3 on most IA
> systems.

It sounds like you're referring to the "multicopy atomicity" of the
architecture.  However, that is not a universally supported property.
AFAICT, POWER and older ARM systems don't support it, so writes
performed by one core are not necessarily visible to all other
cores at the same time.  That means that if CPU0 writes the data
and the completion flag, and CPU1 reads the completion flag and writes
the ring, CPU2 may see the ring write but still not see the
write of the data, even though there was a control dependency on CPU1.
There should be a full memory barrier on CPU1 in order to fulfill
the memory ordering requirements for CPU2, IIUC.

In our scenario, CPU0 is a DMA device, which may or may not be
part of a CPU and may have different memory consistency/ordering
requirements.  So, the question is: does the DPDK DMA API guarantee
multicopy atomicity between the DMA device and all CPU cores regardless
of the CPU architecture and the nature of the DMA device?

> 
>> And the main question:
>>   Are these synchronization claims documented somewhere?
>>
> 
> Not explicitly, no. However, the way DMA devices work with regard to
> global observability is absolutely no different from how crypto,
> compression, or any other hardware devices work. Doing a memory copy using
> a DMA device is exactly the same as doing a no-op crypto, or compression
> job with the output going to a separate output buffer. In all cases, a job
> cannot be considered completed until you get a hardware completion
> notification for it, and once you get that notification, it is globally
> observable by all entities.
> 
> The only difference for the dmadev APIs is that we do have the capability to
> specify that jobs must be done in a specific order, using a fence flag,
> which is documented in the API documentation.
> 
> /Bruce
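
The fence flag mentioned above can be illustrated with a short dmadev
sketch.  This is illustrative only, not code from the thread: it assumes an
already-configured `dev_id`/`vchan`, trims error handling, and only runs in
a DPDK environment with a DMA device bound.  Per the dmadev API, the
`RTE_DMA_OP_FLAG_FENCE` flag goes on the operation that must wait for all
previously enqueued operations:

```c
/* Illustrative sketch: enqueue two dependent copies, fencing the second so
 * it is processed only after the first has completed, then poll for both
 * completions.  Once rte_dma_completed() reports them, the copied data is
 * globally observable. */
#include <stdbool.h>
#include <rte_dmadev.h>

static int
copy_with_fence(int16_t dev_id, uint16_t vchan,
                rte_iova_t src1, rte_iova_t dst1,
                rte_iova_t src2, rte_iova_t dst2, uint32_t len)
{
    uint16_t done = 0, last_idx;
    bool error = false;

    if (rte_dma_copy(dev_id, vchan, src1, dst1, len, 0) < 0)
        return -1;
    /* Fence: this copy starts only after the previous one completes. */
    if (rte_dma_copy(dev_id, vchan, src2, dst2, len,
                     RTE_DMA_OP_FLAG_FENCE) < 0)
        return -1;
    rte_dma_submit(dev_id, vchan);

    /* Busy-poll until both jobs are reported complete (or an error). */
    while (done < 2 && !error)
        done += rte_dma_completed(dev_id, vchan, 2 - done,
                                  &last_idx, &error);
    return error ? -1 : 0;
}
```

On hardware that already executes jobs strictly in order (e.g. IOAT) the
fence is a no-op; on parallelizing hardware (e.g. DSA) it is what enforces
the dependency.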


