Date: Mon, 19 Oct 2020 15:13:15 +0530
From: Nithin Dabilpuram
To: "Burakov, Anatoly"
Cc: Jerin Jacob, dev@dpdk.org, stable@dpdk.org
References: <20201012081106.10610-1-ndabilpuram@marvell.com>
 <20201012081106.10610-3-ndabilpuram@marvell.com>
 <05afb7f5-96bf-dffd-15dd-2024586f7290@intel.com>
 <20201015060914.GA32207@outlook.office365.com>
 <66b61bda-03a8-d4c4-af9f-0f90a6ef956d@intel.com>
 <20201016071015.GA22749@gmail.com>
 <4deaf00f-02d3-15b3-2ebe-4a2becc89251@intel.com>
In-Reply-To: <4deaf00f-02d3-15b3-2ebe-4a2becc89251@intel.com>
Subject: Re: [dpdk-stable] [dpdk-dev] [EXT] Re: [PATCH 2/2] vfio: fix partial DMA unmapping for VFIO type1

On Sat, Oct 17, 2020 at 05:14:55PM +0100, Burakov, Anatoly wrote:
> On 16-Oct-20 8:10 AM, Nithin Dabilpuram wrote:
> > On Thu, Oct 15, 2020 at 04:10:31PM +0100, Burakov, Anatoly wrote:
> > > On 15-Oct-20 12:57 PM, Nithin Dabilpuram wrote:
> > > > On Thu, Oct 15, 2020 at 3:31 PM Burakov, Anatoly wrote:
> > > > >
> > > > > On 15-Oct-20 7:09 AM, Nithin Dabilpuram wrote:
> > > > > > On Wed, Oct 14, 2020 at 04:07:10PM +0100, Burakov, Anatoly wrote:
> > > > > > > External Email
> > > > > > >
> > > > > > > ----------------------------------------------------------------------
> > > > > > > On 12-Oct-20 9:11 AM, Nithin Dabilpuram wrote:
> > > > > > > > Partial unmapping is not supported for VFIO IOMMU type1
> > > > > > > > by the kernel. Though the kernel returns zero, the unmapped size
> > > > > > > > returned will not be the same as expected. So check the
> > > > > > > > returned unmap size and return an error.
> > > > > > > >
> > > > > > > > For the case of DMA map/unmap triggered by heap allocations,
> > > > > > > > maintain the granularity of the memseg page size so that heap
> > > > > > > > expansion and contraction do not have this issue.
> > > > > > >
> > > > > > > This is quite unfortunate, because there was a different bug that had to do
> > > > > > > with the kernel having a very limited number of mappings available [1], as a
> > > > > > > result of which the page concatenation code was added.
> > > > > > >
> > > > > > > It should therefore be documented that the dma_entry_limit parameter should
> > > > > > > be adjusted should the user run out of DMA entries.
> > > > > > >
> > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__lore.kernel.org_lkml_155414977872.12780.13728555131525362206.stgit-40gimli.home_T_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=FZ_tPCbgFOh18zwRPO9H0yDx8VW38vuapifdDfc8SFQ&m=3GMg-634_cdUCY4WpQPwjzZ_S4ckuMHOnt2FxyyjXMk&s=TJLzppkaDS95VGyRHX2hzflQfb9XLK0OiOszSXoeXKk&e=
> > > > > > >
> > > > > > > >  			RTE_LOG(ERR, EAL, " cannot clear DMA remapping, error %i (%s)\n",
> > > > > > > >  					errno, strerror(errno));
> > > > > > > >  			return -1;
> > > > > > > > +		} else if (dma_unmap.size != len) {
> > > > > > > > +			RTE_LOG(ERR, EAL, " unexpected size %"PRIu64" of DMA "
> > > > > > > > +				"remapping cleared instead of %"PRIu64"\n",
> > > > > > > > +				(uint64_t)dma_unmap.size, len);
> > > > > > > > +			rte_errno = EIO;
> > > > > > > > +			return -1;
> > > > > > > >  		}
> > > > > > > >  	}
> > > > > > > > @@ -1853,6 +1869,12 @@ container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
> > > > > > > >  		/* we're partially unmapping a previously mapped region, so we
> > > > > > > >  		 * need to split entry into two.
> > > > > > > >  		 */
> > > > > > > > +		if (!vfio_cfg->vfio_iommu_type->partial_unmap) {
> > > > > > > > +			RTE_LOG(DEBUG, EAL, "DMA partial unmap unsupported\n");
> > > > > > > > +			rte_errno = ENOTSUP;
> > > > > > > > +			ret = -1;
> > > > > > > > +			goto out;
> > > > > > > > +		}
> > > > > > >
> > > > > > > How would we ever arrive here if we never do more than 1 page worth of
> > > > > > > memory anyway? I don't think this is needed.
> > > > > >
> > > > > > container_dma_unmap() is called by the user via rte_vfio_container_dma_unmap(),
> > > > > > and when he maps we don't split it as we don't know about his memory.
> > > > > > So if he maps multiple pages and tries to unmap partially, then we should fail.
> > > > >
> > > > > Should we map it in page granularity then, instead of adding this
> > > > > discrepancy between EAL and user mapping? I.e. instead of adding a
> > > > > workaround, how about we just do the same thing for user mem mappings?
> > > >
> > > > In heap mappings we map and unmap at huge page granularity, as we will
> > > > always maintain that.
> > > >
> > > > But here I think we don't know if the user's allocation is a huge page or a
> > > > collection of system pages. The only thing we can do here is map it at
> > > > system page granularity, which could waste entries if he really is working
> > > > with hugepages. Isn't it?
> > >
> > > Yeah we do. The API mandates page granularity, and it will check
> > > against page size and number of IOVA entries, so yes, we do enforce the fact
> > > that the IOVA addresses supplied by the user have to be page addresses.
> >
> > If I look at rte_vfio_container_dma_map(), there is no mention of the huge
> > page size the user is providing, nor do we compute it. He can call
> > rte_vfio_container_dma_map() with a 1GB huge page or a 4K system page.
> >
> > Am I missing something?
>
> Are you suggesting that a DMA mapping for hugepage-backed memory will be
> made at system page size granularity? E.g. will a 1GB page-backed segment be
> mapped for DMA as a contiguous 4K-based block?

I'm not suggesting anything. My only thought is how to solve the problem below.
Say the application does the following:

#1 Allocate 1GB of memory from a huge page or some external mem.

#2 Do rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD, mem, mem, 1GB).
   In linux/eal_vfio.c, we map it as a single VFIO DMA entry of 1GB, as we don't
   know where this memory is coming from or what it is backed by.

#3 After a while, call
   rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD, mem+4KB, mem+4KB, 4KB).

Though rte_vfio_container_dma_unmap() supports #3 by splitting the entry as shown
below, with the VFIO type1 IOMMU, #3 cannot be supported by the current kernel
interface. So how can we allow #3?

static int
container_dma_unmap(struct vfio_config *vfio_cfg, uint64_t vaddr, uint64_t iova,
		uint64_t len)
{
	struct user_mem_map *map, *new_map = NULL;
	struct user_mem_maps *user_mem_maps;
	int ret = 0;

	user_mem_maps = &vfio_cfg->mem_maps;
	rte_spinlock_recursive_lock(&user_mem_maps->lock);

	/* find our mapping */
	map = find_user_mem_map(user_mem_maps, vaddr, iova, len);
	if (!map) {
		RTE_LOG(ERR, EAL, "Couldn't find previously mapped region\n");
		rte_errno = EINVAL;
		ret = -1;
		goto out;
	}

	if (map->addr != vaddr || map->iova != iova || map->len != len) {
		/* we're partially unmapping a previously mapped region, so we
		 * need to split entry into two.
		 */
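
To make the #1-#3 sequence concrete, here is a minimal sketch using only the
public API. The mmap()-based 1GB allocation and the demo_partial_unmap()
wrapper are just for illustration (any hugepage-backed or external memory
would do), and error/cleanup handling is trimmed; only the two
rte_vfio_container_dma_* calls are the point.

#include <sys/mman.h>
#include <stdint.h>
#include <stdio.h>

#include <rte_vfio.h>
#include <rte_errno.h>

#define LEN_1G (1ULL << 30)
#define LEN_4K 4096ULL

static void
demo_partial_unmap(void)
{
	/* #1: 1GB region; MAP_HUGETLB assumes a suitable hugepage is free */
	void *mem = mmap(NULL, LEN_1G, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (mem == MAP_FAILED)
		return;

	uint64_t va = (uint64_t)(uintptr_t)mem;

	/* #2: mapped as one VFIO DMA entry of 1GB, since EAL does not know
	 * what backs this memory.
	 */
	if (rte_vfio_container_dma_map(RTE_VFIO_DEFAULT_CONTAINER_FD,
				       va, va, LEN_1G) < 0) {
		printf("dma map failed: %s\n", rte_strerror(rte_errno));
		return;
	}

	/* #3: partial unmap of 4KB inside that entry. Type1 IOMMU cannot
	 * split the original kernel mapping, so with this patch the call
	 * fails and rte_errno is set to ENOTSUP.
	 */
	if (rte_vfio_container_dma_unmap(RTE_VFIO_DEFAULT_CONTAINER_FD,
					 va + LEN_4K, va + LEN_4K, LEN_4K) < 0)
		printf("partial unmap rejected: %s\n", rte_strerror(rte_errno));
}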