From: Ray Kinsella <mdr@ashroe.eu>
To: Anatoly Burakov, dev@dpdk.org
Date: Tue, 25 Feb 2020 13:49:38 +0000
Subject: Re: [dpdk-dev] [PATCH] vfio: map contiguous areas in one go

Hi Anatoly,

On 25/02/2020 13:24, Anatoly Burakov wrote:
> Currently, when we are creating DMA mappings for memory that's
> either external or is backed by hugepages in IOVA as PA mode, we
> assume that each page is necessarily discontiguous. This may not
> actually be the case, especially for external memory, where the
> user is able to create their own IOVA table and make it
> contiguous. This is a problem because VFIO has a limited number
> of DMA mappings, and it does not appear to concatenate them and
> treats each mapping as separate, even when they cover adjacent
> areas.
>
> Fix this so that we always map contiguous memory in a single
> chunk, as opposed to mapping each segment separately.

Can I confirm my understanding: we are essentially correcting errant
user behavior, trading off startup/mapping time to save IOMMU
resources?
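For anyone following along, the external-memory case from the commit
message looks roughly like the sketch below. This is a minimal sketch,
not code from this patch: the heap name "ext_heap", the helper name,
the 2 MB page size and the origin of 'buf' are all assumptions, and
error handling is omitted. The point is that the IOVA table makes the
whole region IOVA-contiguous, which the patched callback (quoted
below) can then cover with a single VFIO DMA mapping instead of one
mapping per page.

#include <rte_malloc.h>
#include <rte_memory.h>

static int
add_contig_extmem(void *buf, size_t len, rte_iova_t iova_base)
{
	const size_t pgsz = RTE_PGSIZE_2M;	/* assumed page size */
	unsigned int n_pages = len / pgsz;
	rte_iova_t iovas[n_pages];
	unsigned int i;

	/* page i maps to iova_base + i * pgsz: fully IOVA-contiguous */
	for (i = 0; i < n_pages; i++)
		iovas[i] = iova_base + (rte_iova_t)i * pgsz;

	if (rte_malloc_heap_create("ext_heap") != 0)
		return -1;

	/* adding the memory fires the mem event callback patched below */
	return rte_malloc_heap_memory_add("ext_heap", buf, len,
			iovas, n_pages, pgsz);
}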
> Signed-off-by: Anatoly Burakov
> ---
>  lib/librte_eal/linux/eal/eal_vfio.c | 59 +++++++++++++++++++++++++----
>  1 file changed, 51 insertions(+), 8 deletions(-)
>
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
> index 01b5ef3f42..4502aefed3 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
> @@ -514,9 +514,11 @@ static void
>  vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>  		void *arg __rte_unused)
>  {
> +	rte_iova_t iova_start, iova_expected;
>  	struct rte_memseg_list *msl;
>  	struct rte_memseg *ms;
>  	size_t cur_len = 0;
> +	uint64_t va_start;
>
>  	msl = rte_mem_virt2memseg_list(addr);
>
> @@ -545,22 +547,63 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
>  #endif
>  	/* memsegs are contiguous in memory */
>  	ms = rte_mem_virt2memseg(addr, msl);
> +
> +	/*
> +	 * This memory is not guaranteed to be contiguous, but it still could
> +	 * be, or it could have some small contiguous chunks. Since the number
> +	 * of VFIO mappings is limited, and VFIO appears to not concatenate
> +	 * adjacent mappings, we have to do this ourselves.
> +	 *
> +	 * So, find contiguous chunks, then map them.
> +	 */
> +	va_start = ms->addr_64;
> +	iova_start = iova_expected = ms->iova;
>  	while (cur_len < len) {
> +		bool new_contig_area = ms->iova != iova_expected;
> +		bool last_seg = (len - cur_len) == ms->len;
> +		bool skip_last = false;
> +
> +		/* only do mappings when current contiguous area ends */
> +		if (new_contig_area) {
> +			if (type == RTE_MEM_EVENT_ALLOC)
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 1);
> +			else
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 0);
> +			va_start = ms->addr_64;
> +			iova_start = ms->iova;
> +		}
>  		/* some memory segments may have invalid IOVA */
>  		if (ms->iova == RTE_BAD_IOVA) {
>  			RTE_LOG(DEBUG, EAL, "Memory segment at %p has bad IOVA, skipping\n",
>  					ms->addr);
> -			goto next;
> +			skip_last = true;
>  		}
> -		if (type == RTE_MEM_EVENT_ALLOC)
> -			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
> -					ms->iova, ms->len, 1);
> -		else
> -			vfio_dma_mem_map(default_vfio_cfg, ms->addr_64,
> -					ms->iova, ms->len, 0);
> -next:
> +		iova_expected = ms->iova + ms->len;
>  		cur_len += ms->len;
>  		++ms;
> +
> +		/*
> +		 * don't count previous segment, and don't attempt to
> +		 * dereference a potentially invalid pointer.
> +		 */
> +		if (skip_last && !last_seg) {
> +			iova_expected = iova_start = ms->iova;
> +			va_start = ms->addr_64;
> +		} else if (!skip_last && last_seg) {
> +			/* this is the last segment and we're not skipping */
> +			if (type == RTE_MEM_EVENT_ALLOC)
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 1);
> +			else
> +				vfio_dma_mem_map(default_vfio_cfg, va_start,
> +						iova_start,
> +						iova_expected - iova_start, 0);
> +		}
>  	}
>  #ifdef RTE_ARCH_PPC_64
>  	cur_len = 0;
>
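To check my reading of the loop: modulo the RTE_BAD_IOVA handling, the
chunk-finding is equivalent to this standalone sketch. The 'struct seg'
type and the map_fn callback are simplified stand-ins I made up for
illustration, not the EAL types, and it assumes at least one segment.

#include <stddef.h>
#include <stdint.h>

struct seg { uint64_t va; uint64_t iova; size_t len; };

typedef void (*map_fn)(uint64_t va, uint64_t iova, uint64_t len);

/*
 * Walk 'n' VA-contiguous segments (n >= 1) and emit one mapping per
 * run of IOVA-contiguous segments, rather than one per segment.
 */
static void
map_contig_runs(const struct seg *s, size_t n, map_fn map)
{
	uint64_t va_start = s[0].va;
	uint64_t iova_start = s[0].iova;
	uint64_t iova_expected = s[0].iova;
	size_t i;

	for (i = 0; i < n; i++) {
		if (s[i].iova != iova_expected) {
			/* current run ended; map it, start a new one */
			map(va_start, iova_start,
					iova_expected - iova_start);
			va_start = s[i].va;
			iova_start = s[i].iova;
		}
		iova_expected = s[i].iova + s[i].len;
	}
	/* map the final run */
	map(va_start, iova_start, iova_expected - iova_start);
}

If that matches your intent, then for fully contiguous external memory
this collapses N per-page mappings into one VFIO DMA mapping, which is
where the IOMMU resource saving comes from.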