DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Takeshi T Yoshimura <TYOS@jp.ibm.com>
Cc: "Mo, YufengX" <yufengx.mo@intel.com>,
	dev@dpdk.org, David Christensen <drc@ibm.com>,
	Pradeep Satyanarayana <pradeep@us.ibm.com>
Subject: Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le
Date: Fri, 28 Jun 2019 14:47:01 +0100	[thread overview]
Message-ID: <23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com> (raw)
In-Reply-To: <OFD546F28C.B5422A21-ON00258427.003A2A38-00258427.003FFD5F@notes.na.collabserv.com>

On 28-Jun-19 12:38 PM, Takeshi T Yoshimura wrote:
>> To: "Mo, YufengX" <yufengx.mo@intel.com>, dev@dpdk.org
>> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
>> Date: 06/26/2019 06:43PM
>> Cc: drc@ibm.com, pradeep@us.ibm.com, Takeshi Yoshimura
>> <tyos@jp.ibm.com>
>> Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA
>> area in ppc64le
>>
>> On 18-Jun-19 3:37 AM, Mo, YufengX wrote:
>>> From: Takeshi Yoshimura <tyos@jp.ibm.com>
>>>
>>> In ppc64le, expanding DMA areas always fail because we cannot
>> remove
>>> a DMA window. As a result, we cannot allocate more than one memseg
>> in
>>> ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
>>> the mapped DMA before removing the window. This patch fixes this
>>> incorrect behavior.
>>>
>>> I added a global variable to track current window size since we do
>>> not have better ways to get exact size of it than doing so. sPAPR
>>> IOMMU seems not to provide any ways to get window size with ioctl
>>> interfaces. rte_memseg_walk*() is currently used to calculate
>> window
>>> size, but it walks memsegs that are marked as used, not mapped. So,
>>> we need to determine if a given memseg is mapped or not, otherwise
>>> the ioctl reports errors due to attempting to unregister memory
>>> addresses that are not registered. The global variable is excluded
>>> in non-ppc64le binaries.
>>>
>>> Similar problems happen in user maps. We need to avoid attempting
>> to
>>> unmap the address that is given as the function's parameter. The
>>> compaction of user maps prevents us from passing correct length for
>>> unmapping DMA at the window recreation. So, I removed it in
>> ppc64le.
>>>
>>> I also fixed the order of ioctl for unregister and unmap. The ioctl
>>> for unregister sometimes report device busy errors due to the
>>> existence of mapped area.
>>>
>>> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
>>> ---
>>
>> OK there are three patches, and two v1's with two different authors
>> in
>> reply to the same original patch. There's too much going on here, i
>> can't review this. Needs splitting.
>>
>> Also, #ifdef-ing out the map merging seems highly suspect.
>>
>> With regards to "walking used memsegs, not mapped", unless i'm
>> misunderstanding something, these are the same - whenever a segment
>> is
>> mapped, it is marked as used, and whenever it is unmapped, it is
>> marked
>> as free. Could you please explain what is the difference and why is
>> this
>> needed?
>>
>> Is the point of contention here being the fact that whenever the
>> unmap
>> callback arrives, the segments still appear used when iterating over
>> the
>> map? If that's the case, then i think it would be OK to mark them as
>> unused *before* triggering callbacks, and chances are some of this
>> code
>> wouldn't be needed. That would require a deprecation notice though,
>> because the API behavior will change (even if this fact is not
>> documented properly).
>>
>> -- 
>> Thanks,
>> Anatoly
>>
>>
> 
> I am the author of this patch. We should ignore a patch from YufengX Mo.
> 
>>From my code reading, a memsg is at first marked as used when it is allocated. Then, the memseg is passed to vfio_spapr_dma_mem_map(). The callback iterates all the used (i.e., allocated) memsegs and call ioctl for mapping VA to IOVA. So, when vfio_spapr_dma_mem_map() is called, passed memsegs can be non-mapped but marked as used. As a result, an attempt to unmap non-mapped area happens during DMA window expansion. This is the difference and why this fix was needed.
> 
>> i think it would be OK to mark them as unused *before* triggering callbacks
> 
> Yes, my first idea was the same as yours, but I was also worried that it might cause inconsistent API behavior as you also pointed out. If you think so, I think I can rewrite the patch without ugly #ifdef.
> 
> Unfortunately, I don't have enough time for writing code next week and next next week. So, I will resubmit the revised patch weeks later.

I think the approach with fixing the mem callbacks to report the 
unmapped segments as no longer used would be better.

As far as i can remember at the point where callbacks are triggered, the 
memory is already removed from malloc heap and from all processes. Each 
secondary also stores their own shadow copy of the memory map, so 
removing the "used" flags from the main map will not have any 
consequences as far as correctness is concerned. Each callback is also 
getting the memory area being removed as parameters, so if there is code 
that needs to be run taking into account that memory area, it can be done.

Existing code may rely on this behavior (even though it doesn't make 
much sense now that i think of it), so going with this approach *will* 
require a deprecation notice and can only be done in the next release.

> 
> Regards,
> Takeshi
> 
> 


-- 
Thanks,
Anatoly

  reply	other threads:[~2019-06-28 13:47 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-12  6:33 Takeshi Yoshimura
2019-06-12 14:06 ` Aaron Conole
2019-06-13  2:22 ` Takeshi Yoshimura
2019-06-13 17:37   ` David Christensen
2019-06-14  7:34   ` David Marchand
2019-06-14  7:49   ` [dpdk-dev] [PATCH v2] " Takeshi Yoshimura
2019-07-13  1:15     ` [dpdk-dev] [PATCH v3] " Takeshi Yoshimura
2019-07-16  0:20       ` David Christensen
2019-07-16 10:56         ` Thomas Monjalon
2019-06-18  2:37   ` [dpdk-dev] [PATCH] " Mo, YufengX
2019-06-18  2:39   ` Mo, YufengX
2019-06-26  9:43   ` Burakov, Anatoly
2019-06-28 11:38   ` Takeshi T Yoshimura
2019-06-28 13:47     ` Burakov, Anatoly [this message]
2019-06-28 14:04       ` Burakov, Anatoly
2019-06-13  2:30 ` Takeshi T Yoshimura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=TYOS@jp.ibm.com \
    --cc=dev@dpdk.org \
    --cc=drc@ibm.com \
    --cc=pradeep@us.ibm.com \
    --cc=yufengx.mo@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).