Date: Wed, 30 Mar 2022 13:41:34 +0200
From: Ilya Maximets
To: Bruce Richardson
Cc: i.maximets@ovn.org, "Pai G, Sunil", "Stokes, Ian", "Hu, Jiayu",
 "Ferriter, Cian", "Van Haaren, Harry",
 "Maxime Coquelin (maxime.coquelin@redhat.com)", ovs-dev@openvswitch.org,
 dev@dpdk.org, "Mcnamara, John", "O'Driscoll, Tim", "Finn, Emma"
Subject: Re: OVS DPDK DMA-Dev library/Design Discussion

On 3/30/22 13:12, Bruce Richardson wrote:
> On Wed, Mar 30, 2022 at 12:52:15PM +0200, Ilya Maximets wrote:
>> On 3/30/22 12:41, Ilya Maximets wrote:
>>> Forking the thread to discuss a memory consistency/ordering model.
>>>
>>> AFAICT, dmadev can be anything from part of a CPU to a completely
>>> separate PCI device.  However, I don't see any memory ordering being
>>> enforced or even described in the dmadev API or documentation.
>>> Please, point me to the correct documentation, if I somehow missed it.
>>>
>>> We have a DMA device (A) and a CPU core (B) writing respectively
>>> the data and the descriptor info.  CPU core (C) is reading the
>>> descriptor and the data it points to.
>>>
>>> A few things about that process:
>>>
>>> 1. There is no memory barrier between writes A and B (Did I miss
>>>    them?).  Meaning that those operations can be seen by C in a
>>>    different order regardless of barriers issued by C and regardless
>>>    of the nature of devices A and B.
>>>
>>> 2. Even if there is a write barrier between A and B, there is
>>>    no guarantee that C will see these writes in the same order
>>>    as C doesn't use real memory barriers because vhost advertises
>>
>> s/advertises/does not advertise/
>>
>>> VIRTIO_F_ORDER_PLATFORM.
>>>
>>> So, I'm getting to the conclusion that there is a missing write barrier
>>> on the vhost side and vhost itself must not advertise the
>>
>> s/must not/must/
>>
>> Sorry, I wrote things backwards. :)
>>
>>> VIRTIO_F_ORDER_PLATFORM, so the virtio driver can use actual memory
>>> barriers.
>>>
>>> Would like to hear some thoughts on that topic.  Is it a real issue?
>>> Is it an issue considering all possible CPU architectures and DMA
>>> HW variants?
>>>
>
> In terms of ordering of operations using dmadev:
>
> * Some DMA HW will perform all operations strictly in order, e.g. Intel
>   IOAT, while other hardware may not guarantee the order of operations
>   and may do things in parallel, e.g. Intel DSA.
>   Therefore, the dmadev API provides the
>   fence operation, which allows the order to be enforced.  The fence can be
>   thought of as a full memory barrier, meaning that no jobs after the barrier
>   can be started until all those before it have completed.  Obviously, for HW
>   where order is always enforced, this will be a no-op, but for hardware that
>   parallelizes, we want to reduce the fences to get the best performance.
>
> * For synchronization between DMA devices and CPUs, where a CPU can only
>   write after a DMA copy has been done, the CPU must wait for the DMA
>   completion to guarantee ordering.  Once the completion has been returned,
>   the completed operation is globally visible to all cores.

Thanks for the explanation!  Some questions, though:

In our case, one CPU waits for the completion and another CPU is actually
using the data.  IOW, "CPU must wait" is a bit ambiguous.  Which CPU must
wait?

Or should it be "Once the completion is visible on any core, the
completed operation is globally visible to all cores."?

And the main question:
  Are these synchronization claims documented somewhere?

Best regards, Ilya Maximets.
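
P.S. For reference, below is a minimal, untested sketch of the sequence under
discussion, using the public dmadev calls rte_dma_copy() and
rte_dma_completed().  The device/vchan ids, the buffer addresses and the
desc_flags publication step are hypothetical placeholders rather than actual
vhost code, and the rte_smp_wmb() before publishing the descriptor is the
write barrier whose necessity is being questioned above.

/*
 * Minimal sketch (not tested): core B offloads a copy to a dmadev and
 * only then publishes the descriptor that core C polls.  The device id,
 * vchan, addresses and the desc_flags publication are hypothetical
 * placeholders for whatever vhost actually does.
 */
#include <stdbool.h>
#include <rte_dmadev.h>
#include <rte_atomic.h>

static void
copy_and_publish(int16_t dma_dev_id, uint16_t vchan,
                 rte_iova_t src, rte_iova_t dst, uint32_t len,
                 volatile uint16_t *desc_flags)
{
    uint16_t last_idx;
    bool has_error = false;

    /* Enqueue and submit the copy.  The fence flag makes this job start
     * only after all previously enqueued jobs have completed, which
     * matters on HW that otherwise processes jobs in parallel. */
    if (rte_dma_copy(dma_dev_id, vchan, src, dst, len,
                     RTE_DMA_OP_FLAG_FENCE | RTE_DMA_OP_FLAG_SUBMIT) < 0)
        return;  /* Real code would fall back to a CPU copy here. */

    /* Core B (the core driving the dmadev) waits for the completion
     * before touching the descriptor. */
    while (rte_dma_completed(dma_dev_id, vchan, 1, &last_idx,
                             &has_error) == 0)
        ;

    /* Write barrier so that core C cannot observe the descriptor update
     * before the copied data.  This is the barrier discussed above;
     * whether the virtio side pairs it with a real read barrier depends
     * on VIRTIO_F_ORDER_PLATFORM. */
    rte_smp_wmb();
    *desc_flags = 1;  /* e.g. mark the vring descriptor as used. */
}

The open question from the thread then remains: is the completion returned on
core B enough for core C, which only reads the descriptor and the data, or
does the virtio driver on C need actual read barriers as well?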