From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 1E47FA0C41;
	Wed, 23 Jun 2021 05:30:31 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 8DEBC4003F;
	Wed, 23 Jun 2021 05:30:30 +0200 (CEST)
Received: from szxga03-in.huawei.com (szxga03-in.huawei.com [45.249.212.189])
	by mails.dpdk.org (Postfix) with ESMTP id 5DA334003E
	for ; Wed, 23 Jun 2021 05:30:28 +0200 (CEST)
Received: from dggemv711-chm.china.huawei.com (unknown [172.30.72.55])
	by szxga03-in.huawei.com (SkyGuard) with ESMTP id 4G8pZg2YV1z71dm;
	Wed, 23 Jun 2021 11:26:19 +0800 (CST)
Received: from dggpeml500024.china.huawei.com (7.185.36.10)
	by dggemv711-chm.china.huawei.com (10.1.198.66) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2176.2; Wed, 23 Jun 2021 11:30:25 +0800
Received: from [127.0.0.1] (10.40.190.165)
	by dggpeml500024.china.huawei.com (7.185.36.10) with Microsoft SMTP Server
	(version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256)
	id 15.1.2176.2; Wed, 23 Jun 2021 11:30:24 +0800
To: Jerin Jacob 
CC: Bruce Richardson , =?UTF-8?Q?Morten_Br=c3=b8rup?= ,
	Thomas Monjalon , Ferruh Yigit , dpdk-dev ,
	Nipun Gupta , Hemant Agrawal , Maxime Coquelin ,
	Honnappa Nagarahalli , Jerin Jacob , David Marchand ,
	"Satananda Burla" , Prasun Kapoor 
References: <1623763327-30987-1-git-send-email-fengchengwen@huawei.com>
	<98CBD80474FA8B44BF855DF32C47DC35C61860@smartserver.smartshare.dk>
	<3cb0bd01-2b0d-cf96-d173-920947466041@huawei.com>
From: fengchengwen 
Message-ID: 
Date: Wed, 23 Jun 2021 11:30:24 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
	Thunderbird/68.11.0
MIME-Version: 1.0
In-Reply-To: 
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.40.190.165]
X-ClientProxiedBy: 
 dggems706-chm.china.huawei.com (10.3.19.183) To
 dggpeml500024.china.huawei.com (7.185.36.10)
X-CFilter-Loop: Reflected
Subject: Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions 
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev" 

On 2021/6/23 1:25, Jerin Jacob wrote:
> On Fri, Jun 18, 2021 at 3:11 PM fengchengwen  wrote:
>>
>> On 2021/6/18 13:52, Jerin Jacob wrote:
>>> On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson  wrote:
>>>>
>>>> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
>>>>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen  wrote:
>>>>>>
>>>>>> On 2021/6/16 15:09, Morten Brørup wrote:
>>>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
>>>>>>>> Sent: Tuesday, 15 June 2021 18.39
>>>>>>>>
>>>>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
>>>>>>>>> This patch introduces 'dmadevice' which is a generic type of DMA
>>>>>>>>> device.
>>>>>>>>>
>>>>>>>>> The APIs of the dmadev library expose some generic operations which
>>>>>>>>> can enable configuration and I/O with the DMA devices.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chengwen Feng
>>>>>>>>> ---
>>>>>>>> Thanks for sending this.
>>>>>>>>
>>>>>>>> Of most interest to me right now are the key data-plane APIs. While we
>>>>>>>> are still in the prototyping phase, below is a draft of what we are
>>>>>>>> thinking for the key enqueue/perform_ops/completed_ops APIs.
>>>>>>>>
>>>>>>>> Some key differences I note in below vs your original RFC:
>>>>>>>> * Use of void pointers rather than iova addresses. While using iova's
>>>>>>>>   makes sense in the general case when using hardware, in that it can
>>>>>>>>   work with both physical addresses and virtual addresses, if we change
>>>>>>>>   the APIs to use void pointers instead it will still work for DPDK in
>>>>>>>>   VA mode, while at the same time allowing use of software fallbacks in
>>>>>>>>   error cases, and also a stub driver that uses memcpy in the
>>>>>>>>   background. Finally, using iova's makes the APIs a lot more awkward
>>>>>>>>   to use with anything but mbufs or similar buffers where we already
>>>>>>>>   have a pre-computed physical address.
>>>>>>>> * Use of id values rather than user-provided handles. Allowing the
>>>>>>>>   user/app to manage the amount of data stored per operation is a
>>>>>>>>   better solution, I feel, than prescribing a certain amount of
>>>>>>>>   in-driver tracking. Some apps may not care about anything other than
>>>>>>>>   a job being completed, while other apps may have significant metadata
>>>>>>>>   to be tracked. Taking the user-context handles out of the API also
>>>>>>>>   makes the driver code simpler.
>>>>>>>> * I've kept a single combined API for completions, which differs from
>>>>>>>>   the separate error-handling completion API you propose. I need to
>>>>>>>>   give the two-function approach a bit of thought, but likely both
>>>>>>>>   could work. If we (likely) never expect failed ops, then the
>>>>>>>>   specifics of error handling should not matter that much.
>>>>>>>>
>>>>>>>> For the rest, the control / setup APIs are likely to be rather
>>>>>>>> uncontroversial, I suspect. However, I think that rather than xstats
>>>>>>>> APIs, the library should first provide a set of standardized stats like
>>>>>>>> ethdev does. If driver-specific stats are needed, we can add xstats
>>>>>>>> later to the API.
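[Editor's note: to make the two design points above concrete (void pointers for src/dst, plus returned job ids instead of user-provided handles), here is a minimal toy model in C. All names are hypothetical, none of this is the real DPDK dmadev API, and the "driver" is just the memcpy stub Bruce mentions.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy device state: the only bookkeeping is two counters of job ids. */
struct toy_dmadev {
	uint16_t next_id;      /* id handed out by the next enqueue        */
	uint16_t completed_id; /* count of ops the "hardware" has finished */
};

/* Enqueue a copy described by plain void pointers (works in VA mode).
 * The returned id is the only per-op handle the application tracks;
 * any richer metadata stays on the application side, keyed by this id. */
static int
toy_enqueue_copy(struct toy_dmadev *dev, void *dst, const void *src,
		 size_t len)
{
	memcpy(dst, src, len);   /* stub driver: memcpy in the background */
	return dev->next_id++;
}

/* Single combined completion poll: returns the number of newly completed
 * ops and reports the id of the last one; this toy backend completes
 * everything instantly. */
static int
toy_completed_ops(struct toy_dmadev *dev, uint16_t *last_id)
{
	int n = dev->next_id - dev->completed_id;

	dev->completed_id = dev->next_id;
	*last_id = dev->completed_id ? (uint16_t)(dev->completed_id - 1) : 0;
	return n;
}
```

An application would batch several toy_enqueue_copy() calls, then reap them all with one toy_completed_ops() call, mapping the returned last id back to its own per-job metadata.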
>>>>>>>>
>>>>>>>> Appreciate your further thoughts on this, thanks.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> /Bruce
>>>>>>>
>>>>>>> I generally agree with Bruce's points above.
>>>>>>>
>>>>>>> I would like to share a couple of ideas for further discussion:
>>>>>
>>>>> I believe some of the other requirements and comments for generic DMA will be:
>>>>>
>>>>> 1) Support for the _channel_. Each channel may have different
>>>>> capabilities and functionalities.
>>>>> Typical cases are: each channel has separate source and destination
>>>>> devices, like DMA between PCIe EP and host memory, host memory and
>>>>> host memory, or PCIe EP and PCIe EP.
>>>>> So we need some notion of the channel in the specification.
>>>>
>>>> Can you share a bit more detail on what constitutes a channel in this case?
>>>> Is it equivalent to a device queue (which we are flattening to individual
>>>> devices in this API), or to a specific configuration on a queue?
>>>
>>> It is not a queue. It is one of the attributes of a transfer.
>>> I.e. in the same queue, a given transfer can specify different
>>> "source" and "destination" devices, like CPU to sound card,
>>> CPU to network card, etc.
>>>
>>>>> 2) I assume the current data-plane APIs are not thread-safe. Is that right?
>>>>
>>>> Yes.
>>>>
>>>>> 3) The cookie scheme outlined earlier looks good to me, instead of having a
>>>>> generic dequeue() API.
>>>>>
>>>>> 4) We can split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src,
>>>>> void *dst, unsigned int length);
>>>>> into a two-stage API, where one part is used in the fast path and the
>>>>> other in the slow path.
>>>>>
>>>>> - The slow-path API will take the channel and the other fixed attributes
>>>>> of the transfer.
>>>>>
>>>>> Example syntax would be:
>>>>>
>>>>> struct rte_dmadev_desc {
>>>>>     channel id;
>>>>>     ops; // copy, xor, fill etc
>>>>>     other arguments specific to dma transfer // can be set based on capability
>>>>> };
>>>>>
>>>>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id, struct
>>>>> rte_dmadev_desc *desc);
>>>>>
>>>>> - The fast path takes the arguments that need to change per transfer,
>>>>> along with the slow-path handle:
>>>>>
>>>>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst, unsigned
>>>>> int length, rte_dmadev_desc_t desc)
>>>>>
>>>>> This will help the driver:
>>>>> - the former API forms the device-specific descriptors in the slow path
>>>>> for a given channel and the fixed per-transfer attributes
>>>>> - the latter API blends the "variable" arguments such as src and dest
>>>>> addresses with the slow-path-created descriptors
>>>>
>>>> This seems like an API for a context-aware device, where the channel is the
>>>> config data/context that is preserved across operations - is that correct?
>>>> At least from the Intel DMA accelerators side, we have no concept of this
>>>> context, and each operation is completely self-described. The location or
>>>> type of memory for copies is irrelevant, you just pass the src/dst
>>>> addresses to reference.
>>>
>>> It is not a context-aware device. Each HW job is self-described.
>>> You can view it as different attributes of a transfer.
>>>
>>>>> The above will give better performance and is the best trade-off
>>>>> between performance and per-transfer variables.
>>>>
>>>> We may need to have different APIs for context-aware and context-unaware
>>>> processing, with which to use determined by the capabilities discovery.
>>>> Given that for these DMA devices the offload cost is critical, more so than
>>>> any other dev class I've looked at before, I'd like to avoid having APIs
>>>> with extra parameters that need to be passed about, since that just adds
>>>> extra CPU cycles to the offload.
>>>
>>> If the driver does not support additional attributes and/or the
>>> application does not need them, rte_dmadev_desc_t can be NULL,
>>> so that it won't have any cost in the datapath. I think we can go to
>>> different API cases if we cannot abstract the problems without a
>>> performance impact. Otherwise, it will be too much pain for applications.
>>
>> Yes, currently we plan to use different APIs for the different cases, e.g.
>> rte_dmadev_memcpy()   -- deal with local-to-local memory copy
>> rte_dmadev_memset()   -- deal with filling local memory with a pattern
>> maybe:
>> rte_dmadev_imm_data() -- deal with copying very small amounts of data
>> rte_dmadev_p2pcopy()  -- deal with peer-to-peer copy between different PCIe addresses
>>
>> These API capabilities will be reflected in the device capability set, so
>> that the application can discover them through the standard API.
>
> There will be a lot of combinations of that; it will be like an M x N
> cross-product of base cases. It won't scale.

Currently it is hard to define a generic DMA descriptor; I think well-defined APIs are feasible.

>
>>> Just to understand - I think we need to look at the HW capabilities and how
>>> to have a common API.
>>> I assume the HW will have some HW job descriptors which will be filled in
>>> SW and submitted to HW.
>>> In our HW, the job descriptor has the following main elements:
>>>
>>> - Channel // We don't expect the application to change this per transfer
>>> - Source address - it can be scatter-gather too - will be changed per transfer
>>> - Destination address - it can be scatter-gather too - will be changed
>>> per transfer
>>> - Transfer length - it can be scatter-gather too - will be changed
>>> per transfer
>>> - IOVA address where HW posts the job completion status, per job descriptor
>>> - will be changed per transfer
>>> - Other sideband information related to the channel // We don't expect
>>> the application to change this per transfer
>>> - As an option, job completion can be posted as an event to an
>>> rte_event_queue too // We don't expect the application to change this
>>> per transfer
>>
>> The 'option' field looks like a software interface field, not a HW
>> descriptor field.
>
> It is in the HW descriptor.
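[Editor's note: for reference, the per-job fields enumerated above could be modelled roughly as the following C struct. This is purely an illustrative sketch - the names, field widths, and the flat (non scatter-gather) layout are assumptions, not any vendor's actual descriptor format.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical self-describing HW job descriptor, one per transfer. */
struct toy_dma_job_desc {
	uint16_t channel;       /* channel attribute, fixed across transfers   */
	uint64_t src_iova;      /* per transfer; real HW may take an SG list   */
	uint64_t dst_iova;      /* per transfer; real HW may take an SG list   */
	uint32_t length;        /* per transfer                                */
	uint64_t compl_iova;    /* per transfer: where HW posts job completion */
	uint32_t sideband;      /* channel-related sideband, fixed per channel */
	uint8_t  post_as_event; /* option: post completion to an event queue   */
};
```

Under the prep/enqueue split discussed above, the driver would fill channel, sideband, and post_as_event once in the slow path, and only rewrite src_iova, dst_iova, length, and compl_iova per transfer.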
The HW is interesting: something like, the DMA can send the completion
directly to an event HW queue, with the DMA and the event HW queue linked
in hardware rather than by software.

Could you point to a public driver for this HW? Then we could learn more
about its working mechanism and the software-hardware collaboration.

>
>>>
>>> @Richardson, Bruce @fengchengwen @Hemant Agrawal
>>>
>>> Could you share the options for your HW descriptors which you are
>>> planning to expose through the API, like above, so that we can easily
>>> converge on a fastpath API?
>>
>> The Kunpeng HW descriptor is self-describing and doesn't need to refer
>> to context info.
>>
>> Maybe the fields which are fixed for a given transfer type could be set
>> up by the driver and not exposed to the application.
>
> Yes, I agree. I think that is the reason why I thought to have a
> rte_dmadev_prep() call to convert the DPDK DMA transfer attributes to
> HW-specific descriptors, and a single enq() operation with the variable
> arguments (through enq parameters) and the fixed arguments through the
> rte_dmadev_prep() call object.
>
>>
>> So that we could use a more generic way to define the API.
>>
>>>>
>>>> /Bruce
>>>
>>> .
>>
>
> .
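[Editor's note: the prep()/enq() split re-stated in the exchange above can be sketched as toy C code. All names here are hypothetical (not the eventual rte_dmadev API), with a memcpy stub standing in for the hardware; it only shows how fixed attributes move into a slow-path handle while src/dst/length stay in the fast path.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_dma_op { TOY_DMA_COPY, TOY_DMA_FILL };

/* Fixed attributes, set once in the slow path per channel. */
struct toy_dma_desc {
	uint16_t channel;
	enum toy_dma_op op;   /* copy, xor, fill etc. */
};

typedef struct toy_dma_desc *toy_dma_desc_t;

/* Slow path: a real driver would pre-build the fixed words of the HW
 * descriptor here; this stub just hands the attributes back as a handle. */
static toy_dma_desc_t
toy_dmadev_prepare(struct toy_dma_desc *desc)
{
	return desc;
}

/* Fast path: blend the per-transfer arguments with the prepared handle.
 * Only the variable fields are touched per call, keeping offload cost low. */
static int
toy_dmadev_enqueue(toy_dma_desc_t desc, void *dst, const void *src,
		   size_t len)
{
	if (desc == NULL || desc->op != TOY_DMA_COPY)
		return -1;         /* only the copy op is modelled here */
	memcpy(dst, src, len);     /* stub backend standing in for the HW */
	return 0;
}
```

A NULL handle (or a driver that ignores it) falls straight through to the error return here; in the real API the NULL-desc case discussed above would instead mean "no extra attributes, zero extra datapath cost".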