From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 94910A046B for ; Fri, 28 Jun 2019 16:05:01 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 3262F37A2; Fri, 28 Jun 2019 16:05:00 +0200 (CEST) Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by dpdk.org (Postfix) with ESMTP id 80CEB3798 for ; Fri, 28 Jun 2019 16:04:58 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Jun 2019 07:04:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.63,427,1557212400"; d="scan'208";a="164666737" Received: from aburakov-mobl1.ger.corp.intel.com (HELO [10.252.3.102]) ([10.252.3.102]) by fmsmga007.fm.intel.com with ESMTP; 28 Jun 2019 07:04:51 -0700 To: Takeshi T Yoshimura Cc: "Mo, YufengX" , dev@dpdk.org, David Christensen , Pradeep Satyanarayana References: <4b10a275-10b4-9806-997f-7241a9e5cfed@intel.com> <20190613022239.6946-1-tyos@jp.ibm.com> <23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com> From: "Burakov, Anatoly" Message-ID: <3536d9a6-76d4-0ecd-1b53-1d1c81f36dff@intel.com> Date: Fri, 28 Jun 2019 15:04:50 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <23b7cf60-e99a-51b1-0cf8-201f31b648ee@intel.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA area in ppc64le X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 28-Jun-19 2:47 PM, Burakov, Anatoly wrote: > On 28-Jun-19 12:38 PM, Takeshi T Yoshimura wrote: >>> To: "Mo, YufengX" , dev@dpdk.org >>> From: "Burakov, Anatoly" >>> Date: 06/26/2019 06:43PM >>> Cc: drc@ibm.com, pradeep@us.ibm.com, Takeshi Yoshimura >>> >>> Subject: [EXTERNAL] Re: [dpdk-dev] [PATCH] vfio: fix expanding DMA >>> area in ppc64le >>> >>> On 18-Jun-19 3:37 AM, Mo, YufengX wrote: >>>> From: Takeshi Yoshimura >>>> >>>> In ppc64le, expanding DMA areas always fail because we cannot >>> remove >>>> a DMA window. As a result, we cannot allocate more than one memseg >>> in >>>> ppc64le. This is because vfio_spapr_dma_mem_map() doesn't unmap all >>>> the mapped DMA before removing the window. This patch fixes this >>>> incorrect behavior. >>>> >>>> I added a global variable to track current window size since we do >>>> not have better ways to get exact size of it than doing so. sPAPR >>>> IOMMU seems not to provide any ways to get window size with ioctl >>>> interfaces. rte_memseg_walk*() is currently used to calculate >>> window >>>> size, but it walks memsegs that are marked as used, not mapped. So, >>>> we need to determine if a given memseg is mapped or not, otherwise >>>> the ioctl reports errors due to attempting to unregister memory >>>> addresses that are not registered. The global variable is excluded >>>> in non-ppc64le binaries. >>>> >>>> Similar problems happen in user maps. We need to avoid attempting >>> to >>>> unmap the address that is given as the function's parameter. The >>>> compaction of user maps prevents us from passing correct length for >>>> unmapping DMA at the window recreation. So, I removed it in >>> ppc64le. >>>> >>>> I also fixed the order of ioctl for unregister and unmap. The ioctl >>>> for unregister sometimes report device busy errors due to the >>>> existence of mapped area. >>>> >>>> Signed-off-by: Takeshi Yoshimura >>>> --- >>> >>> OK there are three patches, and two v1's with two different authors >>> in >>> reply to the same original patch. There's too much going on here, i >>> can't review this. Needs splitting. >>> >>> Also, #ifdef-ing out the map merging seems highly suspect. >>> >>> With regards to "walking used memsegs, not mapped", unless i'm >>> misunderstanding something, these are the same - whenever a segment >>> is >>> mapped, it is marked as used, and whenever it is unmapped, it is >>> marked >>> as free. Could you please explain what is the difference and why is >>> this >>> needed? >>> >>> Is the point of contention here being the fact that whenever the >>> unmap >>> callback arrives, the segments still appear used when iterating over >>> the >>> map? If that's the case, then i think it would be OK to mark them as >>> unused *before* triggering callbacks, and chances are some of this >>> code >>> wouldn't be needed. That would require a deprecation notice though, >>> because the API behavior will change (even if this fact is not >>> documented properly). >>> >>> -- >>> Thanks, >>> Anatoly >>> >>> >> >> I am the author of this patch. We should ignore a patch from YufengX Mo. >> >>> From my code reading, a memsg is at first marked as used when it is >>> allocated. Then, the memseg is passed to vfio_spapr_dma_mem_map(). >>> The callback iterates all the used (i.e., allocated) memsegs and call >>> ioctl for mapping VA to IOVA. So, when vfio_spapr_dma_mem_map() is >>> called, passed memsegs can be non-mapped but marked as used. As a >>> result, an attempt to unmap non-mapped area happens during DMA window >>> expansion. This is the difference and why this fix was needed. >> >>> i think it would be OK to mark them as unused *before* triggering >>> callbacks >> >> Yes, my first idea was the same as yours, but I was also worried that >> it might cause inconsistent API behavior as you also pointed out. If >> you think so, I think I can rewrite the patch without ugly #ifdef. >> >> Unfortunately, I don't have enough time for writing code next week and >> next next week. So, I will resubmit the revised patch weeks later. > > I think the approach with fixing the mem callbacks to report the > unmapped segments as no longer used would be better. > > As far as i can remember at the point where callbacks are triggered, the > memory is already removed from malloc heap and from all processes. Each > secondary also stores their own shadow copy of the memory map, so > removing the "used" flags from the main map will not have any > consequences as far as correctness is concerned. Each callback is also > getting the memory area being removed as parameters, so if there is code > that needs to be run taking into account that memory area, it can be done. > > Existing code may rely on this behavior (even though it doesn't make > much sense now that i think of it), so going with this approach *will* > require a deprecation notice and can only be done in the next release. One *other* way to fix this would be to stored the pages being removed in a struct, and pass it as parameters to the _walk() window size calculation function. This would avoid needless API changes and handle this case correctly. > >> >> Regards, >> Takeshi >> >> > > -- Thanks, Anatoly