From: "Takeshi T Yoshimura" <TYOS@jp.ibm.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: "Mo, YufengX" <yufengx.mo@intel.com>,
dev@dpdk.org, "David Christensen" <drc@ibm.com>,
"Pradeep Satyanarayana" <pradeep@us.ibm.com>
Subject: Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le
Date: Fri, 28 Jun 2019 11:38:56 +0000 [thread overview]
Message-ID: <OFD546F28C.B5422A21-ON00258427.003A2A38-00258427.003FFD5F@notes.na.collabserv.com> (raw)
In-Reply-To: <4b10a275-10b4-9806-997f-7241a9e5cfed@intel.com>
>To: "Mo, YufengX" <yufengx.mo@intel.com>, dev@dpdk.org
>From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
>Date: 06/26/2019 06:43PM
>Cc: drc@ibm.com, pradeep@us.ibm.com, Takeshi Yoshimura
><tyos@jp.ibm.com>
>Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA
>area in ppc64le
>
>On 18-Jun-19 3:37 AM, Mo, YufengX wrote:
>> From: Takeshi Yoshimura <tyos@jp.ibm.com>
>>
>> In ppc64le, expanding DMA areas always fail because we cannot
>remove
>> a DMA window. As a result, we cannot allocate more than one memseg
>in
>> ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
>> the mapped DMA before removing the window. This patch fixes this
>> incorrect behavior.
>>
>> I added a global variable to track current window size since we do
>> not have better ways to get exact size of it than doing so. sPAPR
>> IOMMU seems not to provide any ways to get window size with ioctl
>> interfaces. rte_memseg_walk*() is currently used to calculate
>window
>> size, but it walks memsegs that are marked as used, not mapped. So,
>> we need to determine if a given memseg is mapped or not, otherwise
>> the ioctl reports errors due to attempting to unregister memory
>> addresses that are not registered. The global variable is excluded
>> in non-ppc64le binaries.
>>
>> Similar problems happen in user maps. We need to avoid attempting
>to
>> unmap the address that is given as the function's parameter. The
>> compaction of user maps prevents us from passing correct length for
>> unmapping DMA at the window recreation. So, I removed it in
>ppc64le.
>>
>> I also fixed the order of ioctl for unregister and unmap. The ioctl
>> for unregister sometimes report device busy errors due to the
>> existence of mapped area.
>>
>> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
>> ---
>
>OK there are three patches, and two v1's with two different authors
>in
>reply to the same original patch. There's too much going on here, i
>can't review this. Needs splitting.
>
>Also, #ifdef-ing out the map merging seems highly suspect.
>
>With regards to "walking used memsegs, not mapped", unless i'm
>misunderstanding something, these are the same - whenever a segment
>is
>mapped, it is marked as used, and whenever it is unmapped, it is
>marked
>as free. Could you please explain what is the difference and why is
>this
>needed?
>
>Is the point of contention here being the fact that whenever the
>unmap
>callback arrives, the segments still appear used when iterating over
>the
>map? If that's the case, then i think it would be OK to mark them as
>unused *before* triggering callbacks, and chances are some of this
>code
>wouldn't be needed. That would require a deprecation notice though,
>because the API behavior will change (even if this fact is not
>documented properly).
>
>--
>Thanks,
>Anatoly
>
>
I am the author of this patch. We should ignore a patch from YufengX Mo.
From my code reading, a memsg is at first marked as used when it is allocated. Then, the memseg is passed to vfio_spapr_dma_mem_map(). The callback iterates all the used (i.e., allocated) memsegs and call ioctl for mapping VA to IOVA. So, when vfio_spapr_dma_mem_map() is called, passed memsegs can be non-mapped but marked as used. As a result, an attempt to unmap non-mapped area happens during DMA window expansion. This is the difference and why this fix was needed.
> i think it would be OK to mark them as unused *before* triggering callbacks
Yes, my first idea was the same as yours, but I was also worried that it might cause inconsistent API behavior as you also pointed out. If you think so, I think I can rewrite the patch without ugly #ifdef.
Unfortunately, I don't have enough time for writing code next week and next next week. So, I will resubmit the revised patch weeks later.
Regards,
Takeshi
next prev parent reply other threads:[~2019-06-28 11:39 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-12 6:33 Takeshi Yoshimura
2019-06-12 14:06 ` Aaron Conole
2019-06-13 2:22 ` Takeshi Yoshimura
2019-06-13 17:37 ` David Christensen
2019-06-14 7:34 ` David Marchand
2019-06-14 7:49 ` [dpdk-dev] [PATCH v2] " Takeshi Yoshimura
2019-07-13 1:15 ` [dpdk-dev] [PATCH v3] " Takeshi Yoshimura
2019-07-16 0:20 ` David Christensen
2019-07-16 10:56 ` Thomas Monjalon
2019-06-18 2:37 ` [dpdk-dev] [PATCH] " Mo, YufengX
2019-06-18 2:39 ` Mo, YufengX
2019-06-26 9:43 ` Burakov, Anatoly
2019-06-28 11:38 ` Takeshi T Yoshimura [this message]
2019-06-28 13:47 ` Burakov, Anatoly
2019-06-28 14:04 ` Burakov, Anatoly
2019-06-13 2:30 ` Takeshi T Yoshimura
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=OFD546F28C.B5422A21-ON00258427.003A2A38-00258427.003FFD5F@notes.na.collabserv.com \
--to=tyos@jp.ibm.com \
--cc=anatoly.burakov@intel.com \
--cc=dev@dpdk.org \
--cc=drc@ibm.com \
--cc=pradeep@us.ibm.com \
--cc=yufengx.mo@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).