DPDK patches and discussions
From: "Medvedkin, Vladimir" <vladimir.medvedkin@intel.com>
To: Jerin Jacob <jerinjacobk@gmail.com>,
	fengchengwen <fengchengwen@huawei.com>, <sburla@marvell.com>,
	<anoobj@marvell.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>, <dev@dpdk.org>,
	Kevin Laatz <kevin.laatz@intel.com>,
	Bruce Richardson <bruce.richardson@intel.com>
Subject: Re: [PATCH v1 1/3] dmadev: add inter-domain operations
Date: Fri, 27 Oct 2023 14:46:28 +0100	[thread overview]
Message-ID: <3504aaf1-b4e0-44f5-883a-5e4bc0197283@intel.com> (raw)
In-Reply-To: <CALBAE1PiCdA3nLUBo-XwRVGNY7AykxyjJwznmoNdpuDXh37=yg@mail.gmail.com>

Hi Satananda, Anoob, Chengwen, Jerin, all,

After a number of internal discussions, we have decided to postpone
this feature/patchset until the next release.

 >[Satananda] Have you considered extending  rte_dma_port_param and 
rte_dma_vchan_conf to represent interdomain memory transfer setup as a 
separate port type like RTE_DMA_PORT_INTER_DOMAIN ?

 >[Anoob] Can we move this 'src_handle' and 'dst_handle' registration to 
rte_dma_vchan_setup so that the 'src_handle' and 'dst_handle' can be 
configured in control path and the existing datapath APIs can work as is.

 >[Jerin] Or move src_handle/dst_handle to vchan config

We've listened to feedback on implementation, and have prototyped a 
vchan-based interface. This has a number of advantages and 
disadvantages, both in terms of API usage and in terms of our specific 
driver.

Setting up inter-domain operations as separate vchans allows us to
store the inter-domain state inside the PMD without duplicating any
API paths, so having multiple vchans solves that problem. However, it
also means that any new vchans added while the PMD is active (such as
when attaching to a new process) have to be gated by device
start/stop. This is probably fine from an API point of view, but it
is a hassle for the user (previously, we could have just started
using the new inter-domain handle right away).
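
To make the start/stop gating concrete, here is a rough sketch of what
attaching to a new peer process could look like with the vchan-based
prototype. Only the stop/configure/vchan_setup/start sequence below is
existing dmadev API; how the peer's handle gets into the vchan config
is a hypothetical placeholder (there is no such field today):

#include <rte_common.h>
#include <rte_dmadev.h>

/* Minimal sketch, not tested. The device must be stopped before
 * (re)configuring and setting up the new vchan; existing vchans may
 * need to be set up again after rte_dma_configure(). */
static int
attach_peer_vchan(int16_t dev_id, uint16_t new_vchan, uint16_t peer_handle)
{
        struct rte_dma_conf dev_conf = { .nb_vchans = new_vchan + 1 };
        struct rte_dma_vchan_conf vconf = {
                /* placeholder; a real inter-domain direction or flag
                 * would go here */
                .direction = RTE_DMA_DIR_MEM_TO_MEM,
                .nb_desc = 1024,
        };
        int ret;

        /* hypothetical: the peer handle would be carried in the vchan
         * config, e.g. vconf.peer_handle = peer_handle */
        RTE_SET_USED(peer_handle);

        ret = rte_dma_stop(dev_id);
        if (ret < 0)
                return ret;
        ret = rte_dma_configure(dev_id, &dev_conf);
        if (ret < 0)
                return ret;
        ret = rte_dma_vchan_setup(dev_id, new_vchan, &vconf);
        if (ret < 0)
                return ret;
        return rte_dma_start(dev_id);
}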

Another usability issue with the multiple-vchan approach is that each
vchan now has its own enqueue/submit/completion cycle, so any use
case where one thread communicates with many processes has to service
each vchan separately instead of funnelling everything through a
single vchan. Again, this looks fine API-wise, but it is a hassle for
the user, since it requires calling submit and completion for each
vchan, and in some cases it requires maintaining some kind of
reordering queue. (On the other hand, this approach makes it much
easier to separate operations intended for different processes, so
perhaps this is not such a big issue.)
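
For illustration, the per-vchan datapath cycle would look roughly like
this (a sketch only; the actual job enqueue and error handling are
omitted, and the batch size of 64 is arbitrary):

#include <stdbool.h>
#include <stdint.h>
#include <rte_dmadev.h>

/* one thread serving N peer processes, one vchan per peer */
static void
service_peers(int16_t dev_id, uint16_t nb_peers)
{
        uint16_t vchan;

        for (vchan = 0; vchan < nb_peers; vchan++) {
                /* ... enqueue this peer's pending jobs with
                 * rte_dma_copy() on its vchan ... */
                rte_dma_submit(dev_id, vchan);
        }

        for (vchan = 0; vchan < nb_peers; vchan++) {
                uint16_t last_idx;
                bool has_error = false;
                uint16_t nb_done = rte_dma_completed(dev_id, vchan, 64,
                                &last_idx, &has_error);
                /* completions come back per vchan; if results must be
                 * presented in a global order, the application has to
                 * reorder them here */
                (void)nb_done;
        }
}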

Finally, there is also an IDXD-specific issue. Currently, IDXD
hardware acceleration is implemented in such a way that each work
queue gets a unique DMA device ID (rather than a unique vchan), and
each device can technically process requests for both local and
remote memory (local to remote, remote to local, remote to remote),
all in one queue - as it was in our original implementation.

By changing the implementation to use vchans, we are essentially
bifurcating this single queue - every vchan gets its own ring etc.,
but the enqueue-to-hardware operation remains common to all vchans,
because as far as the hardware is concerned there is still a single
underlying queue. The queue is atomic in hardware, and the ENQCMD
instruction returns a status on enqueue failure (such as when too
many requests are in flight), so technically we could ignore the
number of in-flight operations and rely on ENQCMD returning failures
to drive error handling/retry. The problem is that this failure only
surfaces on submit, not on enqueue.

So, in essence, with the IDXD driver we have two choices: either we
implement some kind of in-flight counter to prevent the driver from
submitting too many requests (that is, the vchans have to cooperate,
using atomics or similar), or every user has to handle errors not
just on enqueue but also on submit (which I don't believe many people
do currently, even though submit can technically return failure - all
non-test usage in DPDK seems to assume that submit realistically
won't fail, and I'd like to keep it that way).
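
To illustrate the first option, the cooperation amounts to roughly the
following shared counter, used by all vchans that map to the same
hardware work queue (an illustrative sketch with C11 atomics, not
actual idxd driver code):

#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

struct shared_wq_state {
        atomic_uint_least32_t in_flight;
        uint32_t max_in_flight; /* hardware work queue depth */
};

/* called by a vchan before handing descriptors to the hardware */
static inline int
shared_wq_reserve(struct shared_wq_state *wq, uint32_t n)
{
        uint_least32_t cur = atomic_load_explicit(&wq->in_flight,
                        memory_order_relaxed);
        do {
                if (cur + n > wq->max_in_flight)
                        return -ENOSPC; /* retry after completions */
        } while (!atomic_compare_exchange_weak_explicit(&wq->in_flight,
                        &cur, cur + n, memory_order_acquire,
                        memory_order_relaxed));
        return 0;
}

/* called when completions for this vchan are harvested */
static inline void
shared_wq_release(struct shared_wq_state *wq, uint32_t n)
{
        atomic_fetch_sub_explicit(&wq->in_flight, n, memory_order_release);
}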

We are in the process of measuring the performance impact of the
different implementations. I should note that while atomic operations
on the data path are unfortunate, these atomics are realistically
touched only at the beginning/end of each 'enqueue-submit-complete'
cycle, not on every operation. At first glance there is no observable
performance penalty in the regular use case (assuming we are not
calling submit for every enqueued job).

 >[Satananda]Do you have usecases where a process from 3rd domain sets 
up transfer between memories from 2 domains? i.e process 1 is src, 
process 2 is dest and process 3 executes transfer.

This use case works with the proposed API on our hardware.
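
For reference, with the API as proposed in this patchset the
third-party case is just a matter of which handles are passed. A
sketch (the handles are assumed to have been obtained earlier through
the driver-private attach API from patch 3/3; all names below are
illustrative):

#include <rte_dmadev.h>

/* process 3 owns neither buffer: both addresses are interpreted in
 * the peers' address spaces, selected by the two handles */
static int
third_party_copy(int16_t dev_id, uint16_t vchan,
                rte_iova_t src_in_proc1, rte_iova_t dst_in_proc2,
                uint32_t len, uint16_t handle_proc1, uint16_t handle_proc2)
{
        int idx = rte_dma_copy_inter_dom(dev_id, vchan, src_in_proc1,
                        dst_in_proc2, len, handle_proc1, handle_proc2, 0);

        if (idx < 0)
                return idx; /* enqueue failed, e.g. descriptor ring full */
        return rte_dma_submit(dev_id, vchan);
}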

 >[Chengwen]And last, Could you introduce the application scenarios of 
this feature?

We have used this feature to improve the performance of the memif driver.


On 09/10/2023 06:05, Jerin Jacob wrote:
> On Sun, Oct 8, 2023 at 8:03 AM fengchengwen <fengchengwen@huawei.com> wrote:
>> Hi Anatoly,
>>
>> On 2023/8/12 0:14, Anatoly Burakov wrote:
>>> Add a flag to indicate that a specific device supports inter-domain
>>> operations, and add an API for inter-domain copy and fill.
>>>
>>> Inter-domain operation is an operation that is very similar to regular
>>> DMA operation, except either source or destination addresses can be in a
>>> different process's address space, indicated by source and destination
>>> handle values. These values are currently meant to be provided by
>>> private drivers' API's.
>>>
>>> This commit also adds a controller ID field into the DMA device API.
>>> This is an arbitrary value that may not be implemented by hardware, but
>>> it is meant to represent some kind of device hierarchy.
>>>
>>> Signed-off-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
>>> Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
>>> ---
>> ...
>>
>>> +__rte_experimental
>>> +static inline int
>>> +rte_dma_copy_inter_dom(int16_t dev_id, uint16_t vchan, rte_iova_t src,
>>> +             rte_iova_t dst, uint32_t length, uint16_t src_handle,
>>> +             uint16_t dst_handle, uint64_t flags)
>> I would suggest add more general extension:
>> rte_dma_copy*(int16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
>>                uint32_t length, uint64_t flags, void *param)
>> The param only valid under some flags bits.
>> As for this inter-domain extension: we could define inter-domain param struct.
>>
>>
>> Whether add in current rte_dma_copy() API or add one new API, I think it mainly
>> depend on performance impact of parameter transfer. Suggest more discuss for
>> different platform and call specification.
> Or move src_handle/dst_handle to vchan config to enable better performance.
> Application create N number of vchan based on the requirements.
>
>>
>> And last, Could you introduce the application scenarios of this feature?
> Looks like VM to VM or container to container copy.
>
>>
>> Thanks.
>>
-- 
Regards,
Vladimir


Thread overview: 12+ messages
2023-08-11 16:14 [PATCH v1 0/3] Add support for inter-domain DMA operations Anatoly Burakov
2023-08-11 16:14 ` [PATCH v1 1/3] dmadev: add inter-domain operations Anatoly Burakov
2023-08-18  8:08   ` [EXT] " Anoob Joseph
2023-10-08  2:33   ` fengchengwen
2023-10-09  5:05     ` Jerin Jacob
2023-10-27 13:46       ` Medvedkin, Vladimir [this message]
2023-11-23  5:24         ` Jerin Jacob
2023-08-11 16:14 ` [PATCH v1 2/3] dma/idxd: implement " Anatoly Burakov
2023-08-11 16:14 ` [PATCH v1 3/3] dma/idxd: add API to create and attach to window Anatoly Burakov
2023-08-14  4:39   ` Jerin Jacob
2023-08-14  9:55     ` Burakov, Anatoly
2023-08-15 19:20 ` [EXT] [PATCH v1 0/3] Add support for inter-domain DMA operations Satananda Burla
