From: fengchengwen <fengchengwen@huawei.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: "Pai G, Sunil" <sunil.pai.g@intel.com>,
Ilya Maximets <i.maximets@ovn.org>,
Radha Mohan Chintakuntla <radhac@marvell.com>,
Veerasenareddy Burru <vburru@marvell.com>,
Gagandeep Singh <g.singh@nxp.com>,
Nipun Gupta <nipun.gupta@nxp.com>,
"Stokes, Ian" <ian.stokes@intel.com>,
"Hu, Jiayu" <jiayu.hu@intel.com>,
"Ferriter, Cian" <cian.ferriter@intel.com>,
"Van Haaren, Harry" <harry.van.haaren@intel.com>,
"Maxime Coquelin (maxime.coquelin@redhat.com)"
<maxime.coquelin@redhat.com>,
"ovs-dev@openvswitch.org" <ovs-dev@openvswitch.org>,
"dev@dpdk.org" <dev@dpdk.org>,
"Mcnamara, John" <john.mcnamara@intel.com>,
"O'Driscoll, Tim" <tim.odriscoll@intel.com>,
"Finn, Emma" <emma.finn@intel.com>
Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
Date: Fri, 13 May 2022 17:48:35 +0800 [thread overview]
Message-ID: <a993ec37-323e-2e00-a423-3ecfbc3e7b35@huawei.com> (raw)
In-Reply-To: <Yn4gaY2IPmDuCq5V@bricha3-MOBL.ger.corp.intel.com>
On 2022/5/13 17:10, Bruce Richardson wrote:
> On Fri, May 13, 2022 at 04:52:10PM +0800, fengchengwen wrote:
>> On 2022/4/8 14:29, Pai G, Sunil wrote:
>>>> -----Original Message-----
>>>> From: Richardson, Bruce <bruce.richardson@intel.com>
>>>> Sent: Tuesday, April 5, 2022 5:38 PM
>>>> To: Ilya Maximets <i.maximets@ovn.org>; Chengwen Feng
>>>> <fengchengwen@huawei.com>; Radha Mohan Chintakuntla <radhac@marvell.com>;
>>>> Veerasenareddy Burru <vburru@marvell.com>; Gagandeep Singh
>>>> <g.singh@nxp.com>; Nipun Gupta <nipun.gupta@nxp.com>
>>>> Cc: Pai G, Sunil <sunil.pai.g@intel.com>; Stokes, Ian
>>>> <ian.stokes@intel.com>; Hu, Jiayu <jiayu.hu@intel.com>; Ferriter, Cian
>>>> <cian.ferriter@intel.com>; Van Haaren, Harry <harry.van.haaren@intel.com>;
>>>> Maxime Coquelin (maxime.coquelin@redhat.com) <maxime.coquelin@redhat.com>;
>>>> ovs-dev@openvswitch.org; dev@dpdk.org; Mcnamara, John
>>>> <john.mcnamara@intel.com>; O'Driscoll, Tim <tim.odriscoll@intel.com>;
>>>> Finn, Emma <emma.finn@intel.com>
>>>> Subject: Re: OVS DPDK DMA-Dev library/Design Discussion
>>>>
>>>> On Tue, Apr 05, 2022 at 01:29:25PM +0200, Ilya Maximets wrote:
>>>>> On 3/30/22 16:09, Bruce Richardson wrote:
>>>>>> On Wed, Mar 30, 2022 at 01:41:34PM +0200, Ilya Maximets wrote:
>>>>>>> On 3/30/22 13:12, Bruce Richardson wrote:
>>>>>>>> On Wed, Mar 30, 2022 at 12:52:15PM +0200, Ilya Maximets wrote:
>>>>>>>>> On 3/30/22 12:41, Ilya Maximets wrote:
>>>>>>>>>> Forking the thread to discuss a memory consistency/ordering model.
>>>>>>>>>>
>>>>>>>>>> AFAICT, dmadev can be anything from part of a CPU to a
>>>>>>>>>> completely separate PCI device. However, I don't see any memory
>>>>>>>>>> ordering being enforced or even described in the dmadev API or
>>>> documentation.
>>>>>>>>>> Please, point me to the correct documentation, if I somehow missed
>>>> it.
>>>>>>>>>>
>>>>>>>>>> We have a DMA device (A) and a CPU core (B) writing respectively
>>>>>>>>>> the data and the descriptor info. CPU core (C) is reading the
>>>>>>>>>> descriptor and the data it points to.
>>>>>>>>>>
>>>>>>>>>> A few things about that process:
>>>>>>>>>>
>>>>>>>>>> 1. There is no memory barrier between writes A and B (Did I miss
>>>>>>>>>> them?). Meaning that those operations can be seen by C in a
>>>>>>>>>> different order regardless of barriers issued by C and
>>>> regardless
>>>>>>>>>> of the nature of devices A and B.
>>>>>>>>>>
>>>>>>>>>> 2. Even if there is a write barrier between A and B, there is
>>>>>>>>>> no guarantee that C will see these writes in the same order
>>>>>>>>>> as C doesn't use real memory barriers because vhost
>>>>>>>>>> advertises
>>>>>>>>>
>>>>>>>>> s/advertises/does not advertise/
>>>>>>>>>
>>>>>>>>>> VIRTIO_F_ORDER_PLATFORM.
>>>>>>>>>>
>>>>>>>>>> So, I'm getting to conclusion that there is a missing write
>>>>>>>>>> barrier on the vhost side and vhost itself must not advertise
>>>>>>>>>> the
>>>>>>>>>
>>>>>>>>> s/must not/must/
>>>>>>>>>
>>>>>>>>> Sorry, I wrote things backwards. :)
>>>>>>>>>
>>>>>>>>>> VIRTIO_F_ORDER_PLATFORM, so the virtio driver can use actual
>>>>>>>>>> memory barriers.
>>>>>>>>>>
>>>>>>>>>> Would like to hear some thoughts on that topic. Is it a real
>>>> issue?
>>>>>>>>>> Is it an issue considering all possible CPU architectures and
>>>>>>>>>> DMA HW variants?
>>>>>>>>>>
>>>>>>>>
>>>>>>>> In terms of ordering of operations using dmadev:
>>>>>>>>
>>>>>>>> * Some DMA HW will perform all operations strictly in order e.g.
>>>> Intel
>>>>>>>> IOAT, while other hardware may not guarantee order of
>>>> operations/do
>>>>>>>> things in parallel e.g. Intel DSA. Therefore the dmadev API
>>>> provides the
>>>>>>>> fence operation which allows the order to be enforced. The fence
>>>> can be
>>>>>>>> thought of as a full memory barrier, meaning no jobs after the
>>>> barrier can
>>>>>>>> be started until all those before it have completed. Obviously,
>>>> for HW
>>>>>>>> where order is always enforced, this will be a no-op, but for
>>>> hardware that
>>>>>>>> parallelizes, we want to reduce the fences to get best
>>>> performance.
>>>>>>>>
>>>>>>>> * For synchronization between DMA devices and CPUs, where a CPU can
>>>> only
>>>>>>>> write after a DMA copy has been done, the CPU must wait for the
>>>> dma
>>>>>>>> completion to guarantee ordering. Once the completion has been
>>>> returned
>>>>>>>> the completed operation is globally visible to all cores.
>>>>>>>
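Bruce's two rules above can be sketched against the dmadev API (a sketch only, assuming the upstream rte_dma_copy()/rte_dma_submit()/rte_dma_completed() calls and the RTE_DMA_OP_FLAG_FENCE flag; device and vchan setup are omitted, so this fragment is illustrative rather than buildable on its own):

```c
#include <rte_dmadev.h>

/* Enqueue two dependent copies: on HW that runs jobs in parallel (e.g. DSA)
 * the second copy must not start before the first completes. */
static void
copy_with_fence(int16_t dev_id, uint16_t vchan,
                rte_iova_t src1, rte_iova_t dst1,
                rte_iova_t src2, rte_iova_t dst2, uint32_t len)
{
    uint16_t done = 0, last_idx;
    bool error = false;

    rte_dma_copy(dev_id, vchan, src1, dst1, len, 0);
    /* FENCE: all earlier jobs on this vchan finish before this one starts. */
    rte_dma_copy(dev_id, vchan, src2, dst2, len, RTE_DMA_OP_FLAG_FENCE);
    rte_dma_submit(dev_id, vchan);

    /* Second rule: only after a completion is returned may a CPU rely on
     * the copied data; whether that visibility is global to all cores is
     * the open question of this thread. */
    while (done < 2)
        done += rte_dma_completed(dev_id, vchan, 2 - done, &last_idx, &error);
}
```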
>>>>>>> Thanks for explanation! Some questions though:
>>>>>>>
>>>>>>> In our case one CPU waits for completion and another CPU is
>>>>>>> actually using the data. IOW, "CPU must wait" is a bit ambiguous.
>>>> Which CPU must wait?
>>>>>>>
>>>>>>> Or should it be "Once the completion is visible on any core, the
>>>>>>> completed operation is globally visible to all cores." ?
>>>>>>>
>>>>>>
>>>>>> The latter.
>>>>>> Once the change to memory/cache is visible to any core, it is
>>>>>> visible to all ones. This applies to regular CPU memory writes too -
>>>>>> at least on IA, and I expect on many other architectures - once the
>>>>>> write is visible outside the current core it is visible to every
>>>>>> other core. Once the data hits the l1 or l2 cache of any core, any
>>>>>> subsequent requests for that data from any other core will "snoop"
>>>>>> the latest data from the cores cache, even if it has not made its
>>>>>> way down to a shared cache, e.g. l3 on most IA systems.
>>>>>
>>>>> It sounds like you're referring to the "multicopy atomicity" of the
>>>>> architecture. However, that is not a universally supported thing.
>>>>> AFAICT, POWER and older ARM systems don't support it, so writes
>>>>> performed by one core are not necessarily available to all other cores
>>>>> at the same time. That means that if the CPU0 writes the data and the
>>>>> completion flag, CPU1 reads the completion flag and writes the ring,
>>>>> CPU2 may see the ring write, but may still not see the write of the
>>>>> data, even though there was a control dependency on CPU1.
>>>>> There should be a full memory barrier on CPU1 in order to fulfill the
>>>>> memory ordering requirements for CPU2, IIUC.
>>>>>
>>>>> In our scenario the CPU0 is a DMA device, which may or may not be part
>>>>> of a CPU and may have different memory consistency/ordering
>>>>> requirements. So, the question is: does DPDK DMA API guarantee
>>>>> multicopy atomicity between DMA device and all CPU cores regardless of
>>>>> CPU architecture and a nature of the DMA device?
>>>>>
>>>>
>>>> Right now, it doesn't because this never came up in discussion. In order
>>>> to be useful, it sounds like it explicitly should do so. At least for the
>>>> Intel ioat and idxd driver cases, this will be supported, so we just need
>>>> to ensure all other drivers currently upstreamed can offer this too. If
>>>> they cannot, we cannot offer it as a global guarantee, and we should see
>>>> about adding a capability flag for this to indicate when the guarantee is
>>>> there or not.
>>>>
>>>> Maintainers of dma/cnxk, dma/dpaa and dma/hisilicon - are we ok to
>>>> document for dmadev that once a DMA operation is completed, the op is
>>>> guaranteed visible to all cores/threads? If not, any thoughts on what
>>>> guarantees we can provide in this regard, or what capabilities should be
>>>> exposed?
>>>
>>>
>>>
>>> Hi @Chengwen Feng, @Radha Mohan Chintakuntla, @Veerasenareddy Burru, @Gagandeep Singh, @Nipun Gupta,
>>> Requesting your valuable opinions for the queries on this thread.
>>
>> Sorry for the late reply; I wasn't following this thread.
>>
>> I don't think the DMA API should provide such a guarantee, because:
>> 1. DMA is an acceleration device, just like an encryption/decryption or network device.
>> 2. For the Hisilicon Kunpeng platform:
>> The DMA device supports:
>> a) IO coherency: it can read the latest data, which may still sit in a CPU cache, and on writes
>> it invalidates the cached data and writes the data through to DDR.
>> b) Ordering within one request: it writes the completion descriptor only after the copy is done.
>> Note: ordering between multiple requests can be enforced through the fence mechanism.
>> The DMA driver then only needs to:
>> a) Issue one write memory barrier (a lightweight mb) when ringing the doorbell.
>> So once a DMA operation is completed it is guaranteed to be visible to all cores,
>> and a third core will observe the right order: core B prepares the data and issues the request
>> to the DMA, the DMA does the copy, core B gets the completion status.
>> 3. I worked on a TI multi-core SoC many years ago; that SoC did not support cache coherence or
>> consistency between cores. It also had a DMA device with many channels. As a hypothetical design
>> of its DMA driver within the DPDK DMA framework, the driver would have to:
>> a) write back the DMA's src buffer, so no dirty cache lines remain while the DMA runs;
>> b) invalidate the DMA's dst buffer;
>> c) do a full mb;
>> d) update the DMA's registers.
>> The DMA then executes the copy, reading from and writing to DDR, and after the copy it updates
>> its status register to completed.
>> In this case a third core will also observe the right order.
>> One particular point: if a buffer is shared across multiple cores, the application must
>> maintain the caches explicitly.
>>
>> Based on the above, I don't think the DMA API should state this guarantee explicitly; it is the
>> driver's, and sometimes even the application's (e.g. on the TI SoC above), duty to ensure it.
>>
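The quoted steps a)–d) for a non-coherent SoC could be sketched as below. Every name here — cache_wb_range(), cache_inv_range(), full_memory_barrier(), the register layout, DMA_START — is hypothetical, standing in for the SoC's real cache-maintenance and MMIO primitives, so this is a shape of the submit path rather than runnable code:

```c
/* Hypothetical submit path for a DMA channel on a non-coherent SoC. */
static void
dma_submit_noncoherent(struct dma_regs *regs,
                       void *src, void *dst, uint32_t len)
{
    cache_wb_range(src, len);    /* a) flush dirty src lines to DDR        */
    cache_inv_range(dst, len);   /* b) drop stale dst lines from the cache */
    full_memory_barrier();       /* c) order the above before the doorbell */
    regs->src  = phys_addr(src); /* d) program the channel and kick it off */
    regs->dst  = phys_addr(dst);
    regs->len  = len;
    regs->ctrl = DMA_START;
}
```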
> Hi,
>
> thanks for that. So if I understand correctly, your current HW does provide
> this guarantee, but you don't think it should be always the case for
> dmadev, correct?
Yes, our HW provides that guarantee.
If some HW cannot, then it is the driver's, and maybe the application's, duty to provide it.
>
> Based on that, what do you think should be the guarantee on completion?
> Once a job is completed, the completion is visible to the submitting core,
> or the core reading the completion? Do you think it's acceptable to add a
The completion will be visible to both cores.
> capability flag for drivers to indicate that they do support a "globally
> visible" guarantee?
I think the driver (together with the HW) should support the "globally visible" guarantee.
And for some HW, even the application (or middleware) has to take care of it.
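If the capability flag Bruce proposes were added, an application could gate its fast path on it roughly as follows. RTE_DMA_CAPA_MEM_ORDER_GLOBAL is a hypothetical name — no such flag exists in dmadev at the time of this thread; only rte_dma_info_get() and the dev_capa field are real:

```c
#include <rte_dmadev.h>

/* Hypothetical capability bit, for illustration only. */
#define RTE_DMA_CAPA_MEM_ORDER_GLOBAL (1ULL << 63)

/* Return true if completions from this device are visible to all cores. */
static bool
dma_completion_globally_visible(int16_t dev_id)
{
    struct rte_dma_info info;

    if (rte_dma_info_get(dev_id, &info) != 0)
        return false;
    return (info.dev_capa & RTE_DMA_CAPA_MEM_ORDER_GLOBAL) != 0;
}
```

An application without the flag would have to fall back to extra barriers (or software copies) before handing buffers to another core.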
>
> Thanks,
> /Bruce
>
> .
>