DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Takeshi T Yoshimura <TYOS@jp.ibm.com>
Cc: "Mo, YufengX" <yufengx.mo@intel.com>,
	dev@dpdk.org, David Christensen <drc@ibm.com>,
	Pradeep Satyanarayana <pradeep@us.ibm.com>
Subject: Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le
Date: Fri, 28 Jun 2019 15:04:50 +0100	[thread overview]
Message-ID: <3536d9a6-76d4-0ecd-1b53-1d1c81f36dff@intel.com> (raw)
In-Reply-To: <23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com>

On 28-Jun-19 2:47 PM, Burakov, Anatoly wrote:
> On 28-Jun-19 12:38 PM, Takeshi T Yoshimura wrote:
>>> To: "Mo, YufengX" <yufengx.mo@intel.com>, dev@dpdk.org
>>> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
>>> Date: 06/26/2019 06:43PM
>>> Cc: drc@ibm.com, pradeep@us.ibm.com, Takeshi Yoshimura
>>> <tyos@jp.ibm.com>
>>> Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA
>>> area in ppc64le
>>>
>>> On 18-Jun-19 3:37 AM, Mo, YufengX wrote:
>>>> From: Takeshi Yoshimura <tyos@jp.ibm.com>
>>>>
>>>> In ppc64le, expanding DMA areas always fail because we cannot
>>> remove
>>>> a DMA window. As a result, we cannot allocate more than one memseg
>>> in
>>>> ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
>>>> the mapped DMA before removing the window. This patch fixes this
>>>> incorrect behavior.
>>>>
>>>> I added a global variable to track current window size since we do
>>>> not have better ways to get exact size of it than doing so. sPAPR
>>>> IOMMU seems not to provide any ways to get window size with ioctl
>>>> interfaces. rte_memseg_walk*() is currently used to calculate
>>> window
>>>> size, but it walks memsegs that are marked as used, not mapped. So,
>>>> we need to determine if a given memseg is mapped or not, otherwise
>>>> the ioctl reports errors due to attempting to unregister memory
>>>> addresses that are not registered. The global variable is excluded
>>>> in non-ppc64le binaries.
>>>>
>>>> Similar problems happen in user maps. We need to avoid attempting
>>> to
>>>> unmap the address that is given as the function's parameter. The
>>>> compaction of user maps prevents us from passing correct length for
>>>> unmapping DMA at the window recreation. So, I removed it in
>>> ppc64le.
>>>>
>>>> I also fixed the order of ioctl for unregister and unmap. The ioctl
>>>> for unregister sometimes report device busy errors due to the
>>>> existence of mapped area.
>>>>
>>>> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
>>>> ---
>>>
>>> OK there are three patches, and two v1's with two different authors
>>> in
>>> reply to the same original patch. There's too much going on here, i
>>> can't review this. Needs splitting.
>>>
>>> Also, #ifdef-ing out the map merging seems highly suspect.
>>>
>>> With regards to "walking used memsegs, not mapped", unless i'm
>>> misunderstanding something, these are the same - whenever a segment
>>> is
>>> mapped, it is marked as used, and whenever it is unmapped, it is
>>> marked
>>> as free. Could you please explain what is the difference and why is
>>> this
>>> needed?
>>>
>>> Is the point of contention here being the fact that whenever the
>>> unmap
>>> callback arrives, the segments still appear used when iterating over
>>> the
>>> map? If that's the case, then i think it would be OK to mark them as
>>> unused *before* triggering callbacks, and chances are some of this
>>> code
>>> wouldn't be needed. That would require a deprecation notice though,
>>> because the API behavior will change (even if this fact is not
>>> documented properly).
>>>
>>> -- 
>>> Thanks,
>>> Anatoly
>>>
>>>
>>
>> I am the author of this patch. We should ignore a patch from YufengX Mo.
>>
>>> From my code reading, a memsg is at first marked as used when it is 
>>> allocated. Then, the memseg is passed to vfio_spapr_dma_mem_map(). 
>>> The callback iterates all the used (i.e., allocated) memsegs and call 
>>> ioctl for mapping VA to IOVA. So, when vfio_spapr_dma_mem_map() is 
>>> called, passed memsegs can be non-mapped but marked as used. As a 
>>> result, an attempt to unmap non-mapped area happens during DMA window 
>>> expansion. This is the difference and why this fix was needed.
>>
>>> i think it would be OK to mark them as unused *before* triggering 
>>> callbacks
>>
>> Yes, my first idea was the same as yours, but I was also worried that 
>> it might cause inconsistent API behavior as you also pointed out. If 
>> you think so, I think I can rewrite the patch without ugly #ifdef.
>>
>> Unfortunately, I don't have enough time for writing code next week and 
>> next next week. So, I will resubmit the revised patch weeks later.
> 
> I think the approach with fixing the mem callbacks to report the 
> unmapped segments as no longer used would be better.
> 
> As far as i can remember at the point where callbacks are triggered, the 
> memory is already removed from malloc heap and from all processes. Each 
> secondary also stores their own shadow copy of the memory map, so 
> removing the "used" flags from the main map will not have any 
> consequences as far as correctness is concerned. Each callback is also 
> getting the memory area being removed as parameters, so if there is code 
> that needs to be run taking into account that memory area, it can be done.
> 
> Existing code may rely on this behavior (even though it doesn't make 
> much sense now that i think of it), so going with this approach *will* 
> require a deprecation notice and can only be done in the next release.


One *other* way to fix this would be to stored the pages being removed 
in a struct, and pass it as parameters to the _walk() window size 
calculation function. This would avoid needless API changes and handle 
this case correctly.


> 
>>
>> Regards,
>> Takeshi
>>
>>
> 
> 


-- 
Thanks,
Anatoly

  reply	other threads:[~2019-06-28 14:05 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-12  6:33 Takeshi Yoshimura
2019-06-12 14:06 ` Aaron Conole
2019-06-13  2:22 ` Takeshi Yoshimura
2019-06-13 17:37   ` David Christensen
2019-06-14  7:34   ` David Marchand
2019-06-14  7:49   ` [dpdk-dev] [PATCH v2] " Takeshi Yoshimura
2019-07-13  1:15     ` [dpdk-dev] [PATCH v3] " Takeshi Yoshimura
2019-07-16  0:20       ` David Christensen
2019-07-16 10:56         ` Thomas Monjalon
2019-06-18  2:37   ` [dpdk-dev] [PATCH] " Mo, YufengX
2019-06-18  2:39   ` Mo, YufengX
2019-06-26  9:43   ` Burakov, Anatoly
2019-06-28 11:38   ` Takeshi T Yoshimura
2019-06-28 13:47     ` Burakov, Anatoly
2019-06-28 14:04       ` Burakov, Anatoly [this message]
2019-06-13  2:30 ` Takeshi T Yoshimura

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3536d9a6-76d4-0ecd-1b53-1d1c81f36dff@intel.com \
    --to=anatoly.burakov@intel.com \
    --cc=TYOS@jp.ibm.com \
    --cc=dev@dpdk.org \
    --cc=drc@ibm.com \
    --cc=pradeep@us.ibm.com \
    --cc=yufengx.mo@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).