From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 94910A046B
	for <public@inbox.dpdk.org>; Fri, 28 Jun 2019 16:05:01 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 3262F37A2;
	Fri, 28 Jun 2019 16:05:00 +0200 (CEST)
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by dpdk.org (Postfix) with ESMTP id 80CEB3798
 for <dev@dpdk.org>; Fri, 28 Jun 2019 16:04:58 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga007.fm.intel.com ([10.253.24.52])
 by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 28 Jun 2019 07:04:57 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.63,427,1557212400"; d="scan'208";a="164666737"
Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.252.3.102])
 ([10.252.3.102])
 by fmsmga007.fm.intel.com with ESMTP; 28 Jun 2019 07:04:51 -0700
To: Takeshi T Yoshimura <TYOS@jp.ibm.com>
Cc: "Mo, YufengX" <yufengx.mo@intel.com>, dev@dpdk.org,
 David Christensen <drc@ibm.com>, Pradeep Satyanarayana <pradeep@us.ibm.com>
References: <4b10a275-10b4-9806-997f-7241a9e5cfed@intel.com>
 <20190613022239.6946-1-tyos@jp.ibm.com>
 <OFD546F28C.B5422A21-ON00258427.003A2A38-00258427.003FFD5F@notes.na.collabserv.com>
 <23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com>
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Message-ID: <3536d9a6-76d4-0ecd-1b53-1d1c81f36dff@intel.com>
Date: Fri, 28 Jun 2019 15:04:50 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101
 Thunderbird/60.7.2
MIME-Version: 1.0
In-Reply-To: <23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 28-Jun-19 2:47 PM, Burakov, Anatoly wrote:
> On 28-Jun-19 12:38 PM, Takeshi T Yoshimura wrote:
>>> To: "Mo, YufengX" <yufengx.mo@intel.com>, dev@dpdk.org
>>> From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
>>> Date: 06/26/2019 06:43PM
>>> Cc: drc@ibm.com, pradeep@us.ibm.com, Takeshi Yoshimura
>>> <tyos@jp.ibm.com>
>>> Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA
>>> area in ppc64le
>>>
>>> On 18-Jun-19 3:37 AM, Mo, YufengX wrote:
>>>> From: Takeshi Yoshimura <tyos@jp.ibm.com>
>>>>
>>>> In ppc64le, expanding DMA areas always fail because we cannot
>>> remove
>>>> a DMA window. As a result, we cannot allocate more than one memseg
>>> in
>>>> ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all
>>>> the mapped DMA before removing the window. This patch fixes this
>>>> incorrect behavior.
>>>>
>>>> I added a global variable to track current window size since we do
>>>> not have better ways to get exact size of it than doing so. sPAPR
>>>> IOMMU seems not to provide any ways to get window size with ioctl
>>>> interfaces. rte_memseg_walk*() is currently used to calculate
>>> window
>>>> size, but it walks memsegs that are marked as used, not mapped. So,
>>>> we need to determine if a given memseg is mapped or not, otherwise
>>>> the ioctl reports errors due to attempting to unregister memory
>>>> addresses that are not registered. The global variable is excluded
>>>> in non-ppc64le binaries.
>>>>
>>>> Similar problems happen in user maps. We need to avoid attempting
>>> to
>>>> unmap the address that is given as the function's parameter. The
>>>> compaction of user maps prevents us from passing correct length for
>>>> unmapping DMA at the window recreation. So, I removed it in
>>> ppc64le.
>>>>
>>>> I also fixed the order of ioctl for unregister and unmap. The ioctl
>>>> for unregister sometimes report device busy errors due to the
>>>> existence of mapped area.
>>>>
>>>> Signed-off-by: Takeshi Yoshimura <tyos@jp.ibm.com>
>>>> ---
>>>
>>> OK there are three patches, and two v1's with two different authors
>>> in
>>> reply to the same original patch. There's too much going on here, i
>>> can't review this. Needs splitting.
>>>
>>> Also, #ifdef-ing out the map merging seems highly suspect.
>>>
>>> With regards to "walking used memsegs, not mapped", unless i'm
>>> misunderstanding something, these are the same - whenever a segment
>>> is
>>> mapped, it is marked as used, and whenever it is unmapped, it is
>>> marked
>>> as free. Could you please explain what is the difference and why is
>>> this
>>> needed?
>>>
>>> Is the point of contention here being the fact that whenever the
>>> unmap
>>> callback arrives, the segments still appear used when iterating over
>>> the
>>> map? If that's the case, then i think it would be OK to mark them as
>>> unused *before* triggering callbacks, and chances are some of this
>>> code
>>> wouldn't be needed. That would require a deprecation notice though,
>>> because the API behavior will change (even if this fact is not
>>> documented properly).
>>>
>>> -- 
>>> Thanks,
>>> Anatoly
>>>
>>>
>>
>> I am the author of this patch. We should ignore a patch from YufengX Mo.
>>
>>> From my code reading, a memsg is at first marked as used when it is 
>>> allocated. Then, the memseg is passed to vfio_spapr_dma_mem_map(). 
>>> The callback iterates all the used (i.e., allocated) memsegs and call 
>>> ioctl for mapping VA to IOVA. So, when vfio_spapr_dma_mem_map() is 
>>> called, passed memsegs can be non-mapped but marked as used. As a 
>>> result, an attempt to unmap non-mapped area happens during DMA window 
>>> expansion. This is the difference and why this fix was needed.
>>
>>> i think it would be OK to mark them as unused *before* triggering 
>>> callbacks
>>
>> Yes, my first idea was the same as yours, but I was also worried that 
>> it might cause inconsistent API behavior as you also pointed out. If 
>> you think so, I think I can rewrite the patch without ugly #ifdef.
>>
>> Unfortunately, I don't have enough time for writing code next week and 
>> next next week. So, I will resubmit the revised patch weeks later.
> 
> I think the approach with fixing the mem callbacks to report the 
> unmapped segments as no longer used would be better.
> 
> As far as i can remember at the point where callbacks are triggered, the 
> memory is already removed from malloc heap and from all processes. Each 
> secondary also stores their own shadow copy of the memory map, so 
> removing the "used" flags from the main map will not have any 
> consequences as far as correctness is concerned. Each callback is also 
> getting the memory area being removed as parameters, so if there is code 
> that needs to be run taking into account that memory area, it can be done.
> 
> Existing code may rely on this behavior (even though it doesn't make 
> much sense now that i think of it), so going with this approach *will* 
> require a deprecation notice and can only be done in the next release.


One *other* way to fix this would be to stored the pages being removed 
in a struct, and pass it as parameters to the _walk() window size 
calculation function. This would avoid needless API changes and handle 
this case correctly.


> 
>>
>> Regards,
>> Takeshi
>>
>>
> 
> 


-- 
Thanks,
Anatoly