From: Kiran Kumar Kokkilagadda
To: "ferruh.yigit@intel.com"
Cc: "dev@dpdk.org", Kiran Kumar Kokkilagadda
Date: Mon, 1 Apr 2019 09:51:39 +0000
Message-ID: <20190401095118.4176-1-kirankumark@marvell.com>
References: <20180927104846.16356-1-kkokkilagadda@caviumnetworks.com>
In-Reply-To: <20180927104846.16356-1-kkokkilagadda@caviumnetworks.com>
Subject: [dpdk-dev] [PATCH v2] kni: add IOVA va support for kni

From: Kiran Kumar K

With the current KNI implementation the kernel module works only in
IOVA=PA mode. This patch adds support for the kernel module to work
in IOVA=VA mode as well.

The idea is to maintain a mapping in the KNI module between user pages
and kernel pages, and in the fast path perform a lookup in this table
to get the kernel virtual address for the corresponding user virtual
address.

In IOVA=VA mode, the memory allocated to the pool is physically and
virtually contiguous. We take advantage of this and create a mapping in
the kernel. In the kernel we need a mapping for the queues (tx_q, rx_q,
... slow path) and for the mbuf memory (fast path). At KNI init time,
in the slow path, we create a mapping for the queues and the mbufs
using get_user_pages, similar to af_xdp.
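As a rough illustration (not part of the patch; the helper name and
variables below are made up for this example), the fast-path lookup
described here boils down to an array index plus a page offset:

	/* Minimal sketch, assuming kva_map[] holds the kernel virtual
	 * address of each pinned user page (from get_user_pages) and
	 * start_page is pool_base_user_va >> PAGE_SHIFT.
	 */
	static inline void *user_va_to_kva(uint64_t user_va, void **kva_map,
					   uint64_t start_page)
	{
		uint64_t index  = (user_va >> PAGE_SHIFT) - start_page;
		uint64_t offset = user_va & (PAGE_SIZE - 1);

		return (uint8_t *)kva_map[index] + offset;
	}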
Using the pool memory base address, we build a page map table for the
mbufs, which is used in the fast path for kernel page translation.

At KNI init time, we pass the base address of the pool and the size of
the pool to the kernel. In the kernel, using the get_user_pages API, we
get the pages with size PAGE_SIZE and store the mapping and the start
address of the user space in a table.

In the fast path, for any user address, shift it right by PAGE_SHIFT
(user_addr >> PAGE_SHIFT) and subtract the start page from the result;
this gives the index of the kernel page within the page map table.
Adding the page offset to that kernel page address gives the kernel
address for the user virtual address.

For example, say the user pool base address is X and the size is S,
both passed to the kernel. In the kernel we create a mapping for this
range using get_user_pages. The page map table then looks like
[Y, Y+PAGE_SIZE, Y+(PAGE_SIZE*2), ...] and the user start page is U
(obtained from X >> PAGE_SHIFT). For any user address Z, the index into
the page map table is ((Z >> PAGE_SHIFT) - U). Adding the offset
(Z & (PAGE_SIZE - 1)) to the entry at that index gives the kernel
virtual address.

Signed-off-by: Kiran Kumar K
---
V2 changes:
* Fixed build issue with older kernel

 kernel/linux/kni/kni_dev.h                    |  37 +++
 kernel/linux/kni/kni_misc.c                   | 215 +++++++++++++++++-
 kernel/linux/kni/kni_net.c                    | 114 ++++++++--
 .../eal/include/exec-env/rte_kni_common.h     |   8 +
 lib/librte_kni/rte_kni.c                      |  21 ++
 5 files changed, 369 insertions(+), 26 deletions(-)

diff --git a/kernel/linux/kni/kni_dev.h b/kernel/linux/kni/kni_dev.h
index 688f574a4..055b8d59e 100644
--- a/kernel/linux/kni/kni_dev.h
+++ b/kernel/linux/kni/kni_dev.h
@@ -32,10 +32,47 @@
 /* Default carrier state for created KNI network interfaces */
 extern uint32_t dflt_carrier;
 
+struct iova_page_info {
+	/* User to kernel page table map, used for
+	 * fast path lookup
+	 */
+	struct mbuf_page {
+		void *addr;
+	} *page_map;
+
+	/* Page mask */
+	u64 page_mask;
+
+	/* Start page for user address */
+	u64 start_page;
+
+	struct page_info {
+		/* Physical pages returned by get_user_pages */
+		struct page **pgs;
+
+		/* Number of pages returned by get_user_pages */
+		u32 npgs;
+	} page_info;
+
+	/* Queue info */
+	struct page_info tx_q;
+	struct page_info rx_q;
+	struct page_info alloc_q;
+	struct page_info free_q;
+	struct page_info req_q;
+	struct page_info resp_q;
+	struct page_info sync_va;
+};
+
 /**
  * A structure describing the private information for a kni device.
  */
 struct kni_dev {
+	/* Page info for IOVA=VA mode */
+	struct iova_page_info va_info;
+	/* IOVA mode 0 = PA, 1 = VA */
+	uint8_t iova_mode;
+
 	/* kni list */
 	struct list_head list;
 
diff --git a/kernel/linux/kni/kni_misc.c b/kernel/linux/kni/kni_misc.c
index 04c78eb87..0be0e1dfd 100644
--- a/kernel/linux/kni/kni_misc.c
+++ b/kernel/linux/kni/kni_misc.c
@@ -201,6 +201,122 @@ kni_dev_remove(struct kni_dev *dev)
 	return 0;
 }
 
+static void
+kni_unpin_pages(struct page_info *mem)
+{
+	u32 i;
+
+	/* Set the user pages as dirty, so that these pages will not be
+	 * allocated to other applications until we release them.
+	 */
+	for (i = 0; i < mem->npgs; i++) {
+		struct page *page = mem->pgs[i];
+
+		set_page_dirty_lock(page);
+		put_page(page);
+	}
+
+	kfree(mem->pgs);
+	mem->pgs = NULL;
+}
+
+static void
+kni_clean_queue(struct page_info *mem)
+{
+	if (mem->pgs) {
+		set_page_dirty_lock(mem->pgs[0]);
+		put_page(mem->pgs[0]);
+		kfree(mem->pgs);
+		mem->pgs = NULL;
+	}
+}
+
+static void
+kni_cleanup_iova(struct iova_page_info *mem)
+{
+	kni_unpin_pages(&mem->page_info);
+	kfree(mem->page_map);
+	mem->page_map = NULL;
+
+	kni_clean_queue(&mem->tx_q);
+	kni_clean_queue(&mem->rx_q);
+	kni_clean_queue(&mem->alloc_q);
+	kni_clean_queue(&mem->free_q);
+	kni_clean_queue(&mem->req_q);
+	kni_clean_queue(&mem->resp_q);
+	kni_clean_queue(&mem->sync_va);
+}
+
+int
+kni_pin_pages(void *address, size_t size, struct page_info *mem)
+{
+	unsigned int gup_flags = FOLL_WRITE;
+	long npgs;
+	int err;
+
+	/* Get at least one page */
+	if (size < PAGE_SIZE)
+		size = PAGE_SIZE;
+
+	/* Compute number of user pages based on page size */
+	mem->npgs = (size + PAGE_SIZE - 1) / PAGE_SIZE;
+
+	/* Allocate memory for the pages */
+	mem->pgs = kcalloc(mem->npgs, sizeof(*mem->pgs),
+			   GFP_KERNEL | __GFP_NOWARN);
+	if (!mem->pgs) {
+		pr_err("%s: -ENOMEM\n", __func__);
+		return -ENOMEM;
+	}
+
+	down_write(&current->mm->mmap_sem);
+
+	/* Get the user pages from the user address */
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,9,0)
+	npgs = get_user_pages((u64)address, mem->npgs,
+			      gup_flags, &mem->pgs[0], NULL);
+#else
+	npgs = get_user_pages(current, current->mm, (u64)address, mem->npgs,
+			      gup_flags, 0, &mem->pgs[0], NULL);
+#endif
+	up_write(&current->mm->mmap_sem);
+
+	/* We didn't get all the requested pages, throw error */
+	if (npgs != mem->npgs) {
+		if (npgs >= 0) {
+			mem->npgs = npgs;
+			err = -ENOMEM;
+			pr_err("%s: -ENOMEM\n", __func__);
+			goto out_pin;
+		}
+		err = npgs;
+		goto out_pgs;
+	}
+	return 0;
+
+out_pin:
+	kni_unpin_pages(mem);
+out_pgs:
+	kfree(mem->pgs);
+	mem->pgs = NULL;
+	return err;
+}
+
+static void*
+kni_map_queue(struct kni_dev *kni, u64 addr,
+	      struct page_info *mm)
+{
+	/* Map at least 1 page */
+	if (kni_pin_pages((void *)addr, PAGE_SIZE,
+			  mm) != 0) {
+		pr_err("Unable to pin pages\n");
+		return NULL;
+	}
+
+	return (page_address(mm->pgs[0]) +
+		(addr & kni->va_info.page_mask));
+}
+
 static int
 kni_release(struct inode *inode, struct file *file)
 {
@@ -228,6 +344,11 @@ kni_release(struct inode *inode, struct file *file)
 		}
 		kni_dev_remove(dev);
+
+		/* IOVA=VA mode, unpin pages */
+		if (likely(dev->iova_mode == 1))
+			kni_cleanup_iova(&dev->va_info);
+
 		list_del(&dev->list);
 	}
 	up_write(&knet->kni_list_lock);
@@ -368,16 +489,91 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	strncpy(kni->name, dev_info.name, RTE_KNI_NAMESIZE);
 
 	/* Translate user space info into kernel space info */
-	kni->tx_q = phys_to_virt(dev_info.tx_phys);
-	kni->rx_q = phys_to_virt(dev_info.rx_phys);
-	kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
-	kni->free_q = phys_to_virt(dev_info.free_phys);
+	kni->iova_mode = dev_info.iova_mode;
 
-	kni->req_q = phys_to_virt(dev_info.req_phys);
-	kni->resp_q = phys_to_virt(dev_info.resp_phys);
-	kni->sync_va = dev_info.sync_va;
-	kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+	if (kni->iova_mode) {
+		u64 mbuf_addr;
+		int i;
+
+		/* map userspace memory info */
+		mbuf_addr = (u64)dev_info.mbuf_va;
+		/* Pre-compute page mask, used in fast path */
+		kni->va_info.page_mask = (u64)(PAGE_SIZE - 1);
+
+		/* Store start page address. This is the reference
+		 * for all the user virtual addresses
+		 */
+		kni->va_info.start_page = (mbuf_addr >> PAGE_SHIFT);
+
+		/* Get and pin the user pages */
+		if (kni_pin_pages(dev_info.mbuf_va, dev_info.mbuf_pool_size,
+				  &kni->va_info.page_info) != 0) {
+			pr_err("Unable to pin pages\n");
+			return -1;
+		}
+
+		/* Page map table between user and kernel pages */
+		kni->va_info.page_map = kcalloc(kni->va_info.page_info.npgs,
+						sizeof(struct mbuf_page),
+						GFP_KERNEL);
+		if (kni->va_info.page_map == NULL) {
+			pr_err("Out of memory\n");
+			return -ENOMEM;
+		}
+
+		/* Convert the user pages to kernel pages */
+		for (i = 0; i < kni->va_info.page_info.npgs; i++) {
+			kni->va_info.page_map[i].addr =
+				page_address(kni->va_info.page_info.pgs[i]);
+		}
+
+		/* map queues */
+		kni->tx_q = kni_map_queue(kni, dev_info.tx_phys,
+					  &kni->va_info.tx_q);
+		if (kni->tx_q == NULL)
+			goto iova_err;
+
+		kni->rx_q = kni_map_queue(kni, dev_info.rx_phys,
+					  &kni->va_info.rx_q);
+		if (kni->rx_q == NULL)
+			goto iova_err;
+
+		kni->alloc_q = kni_map_queue(kni, dev_info.alloc_phys,
+					     &kni->va_info.alloc_q);
+		if (kni->alloc_q == NULL)
+			goto iova_err;
+
+		kni->free_q = kni_map_queue(kni, dev_info.free_phys,
+					    &kni->va_info.free_q);
+		if (kni->free_q == NULL)
+			goto iova_err;
+
+		kni->req_q = kni_map_queue(kni, dev_info.req_phys,
+					   &kni->va_info.req_q);
+		if (kni->req_q == NULL)
+			goto iova_err;
+
+		kni->resp_q = kni_map_queue(kni, dev_info.resp_phys,
+					    &kni->va_info.resp_q);
+		if (kni->resp_q == NULL)
+			goto iova_err;
+
+		kni->sync_kva = kni_map_queue(kni, dev_info.sync_phys,
+					      &kni->va_info.sync_va);
+		if (kni->sync_kva == NULL)
+			goto iova_err;
+	} else {
+		/* Address translation for IOVA=PA mode */
+		kni->tx_q = phys_to_virt(dev_info.tx_phys);
+		kni->rx_q = phys_to_virt(dev_info.rx_phys);
+		kni->alloc_q = phys_to_virt(dev_info.alloc_phys);
+		kni->free_q = phys_to_virt(dev_info.free_phys);
+		kni->req_q = phys_to_virt(dev_info.req_phys);
+		kni->resp_q = phys_to_virt(dev_info.resp_phys);
+		kni->sync_kva = phys_to_virt(dev_info.sync_phys);
+	}
+	kni->sync_va = dev_info.sync_va;
 	kni->mbuf_size = dev_info.mbuf_size;
 
 	pr_debug("tx_phys: 0x%016llx, tx_q addr: 0x%p\n",
@@ -484,6 +680,9 @@ kni_ioctl_create(struct net *net, uint32_t ioctl_num,
 	up_write(&knet->kni_list_lock);
 
 	return 0;
+iova_err:
+	kni_cleanup_iova(&kni->va_info);
+	return -1;
 }
 
 static int
diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c
index 7371b6d58..83fbcf6f1 100644
--- a/kernel/linux/kni/kni_net.c
+++ b/kernel/linux/kni/kni_net.c
@@ -35,6 +35,25 @@ static void kni_net_rx_normal(struct kni_dev *kni);
 /* kni rx function pointer, with default to normal rx */
 static kni_net_rx_t kni_net_rx_func = kni_net_rx_normal;
 
+
+/* Get the kernel address from the user address using
+ * the page map table. Will be used only in IOVA=VA mode.
+ */
+static inline void*
+get_kva(uint64_t usr_addr, struct kni_dev *kni)
+{
+	uint32_t index;
+	/* User page - start user page will give the index
+	 * within the page map table
+	 */
+	index = (usr_addr >> PAGE_SHIFT) - kni->va_info.start_page;
+
+	/* Add the offset to the page address */
+	return (kni->va_info.page_map[index].addr +
+		(usr_addr & kni->va_info.page_mask));
+
+}
+
 /* physical address to kernel virtual address */
 static void *
 pa2kva(void *pa)
@@ -186,7 +205,10 @@ kni_fifo_trans_pa2va(struct kni_dev *kni,
 			return;
 
 		for (i = 0; i < num_rx; i++) {
-			kva = pa2kva(kni->pa[i]);
+			if (likely(kni->iova_mode == 1))
+				kva = get_kva((u64)(kni->pa[i]), kni);
+			else
+				kva = pa2kva(kni->pa[i]);
 			kni->va[i] = pa2va(kni->pa[i], kva);
 		}
 
@@ -263,8 +285,16 @@ kni_net_tx(struct sk_buff *skb, struct net_device *dev)
 	if (likely(ret == 1)) {
 		void *data_kva;
 
-		pkt_kva = pa2kva(pkt_pa);
-		data_kva = kva2data_kva(pkt_kva);
+
+		if (likely(kni->iova_mode == 1)) {
+			pkt_kva = get_kva((u64)pkt_pa, kni);
+			data_kva = (uint8_t *)pkt_kva +
+				(sizeof(struct rte_kni_mbuf) +
+				 pkt_kva->data_off);
+		} else {
+			pkt_kva = pa2kva(pkt_pa);
+			data_kva = kva2data_kva(pkt_kva);
+		}
 		pkt_va = pa2va(pkt_pa, pkt_kva);
 
 		len = skb->len;
@@ -333,11 +363,18 @@ kni_net_rx_normal(struct kni_dev *kni)
 	if (num_rx == 0)
 		return;
 
+	/* Transfer received packets to netif */
 	for (i = 0; i < num_rx; i++) {
-		kva = pa2kva(kni->pa[i]);
+		if (likely(kni->iova_mode == 1)) {
+			kva = get_kva((u64)kni->pa[i], kni);
+			data_kva = (uint8_t *)kva +
+				(sizeof(struct rte_kni_mbuf) + kva->data_off);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			data_kva = kva2data_kva(kva);
+		}
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);
 
 		skb = dev_alloc_skb(len + 2);
@@ -363,8 +400,17 @@ kni_net_rx_normal(struct kni_dev *kni)
 			if (!kva->next)
 				break;
 
-			kva = pa2kva(va2pa(kva->next, kva));
-			data_kva = kva2data_kva(kva);
+			if (likely(kni->iova_mode == 1)) {
+				kva = get_kva(
+					(u64)va2pa(kva->next, kva),
+					kni);
+				data_kva = (uint8_t *)kva +
+					(sizeof(struct rte_kni_mbuf) +
+					 kva->data_off);
+			} else {
+				kva = pa2kva(va2pa(kva->next, kva));
+				data_kva = kva2data_kva(kva);
+			}
 		}
 	}
 
@@ -434,14 +480,31 @@ kni_net_rx_lo_fifo(struct kni_dev *kni)
 		num = ret;
 
 	/* Copy mbufs */
 	for (i = 0; i < num; i++) {
-		kva = pa2kva(kni->pa[i]);
-		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
-		kni->va[i] = pa2va(kni->pa[i], kva);
+		if (likely(kni->iova_mode == 1)) {
+			kva = get_kva((u64)(kni->pa[i]), kni);
+			len = kva->pkt_len;
+			data_kva = (uint8_t *)kva +
+				(sizeof(struct rte_kni_mbuf) +
+				 kva->data_off);
+			kni->va[i] = pa2va(kni->pa[i], kva);
+			alloc_kva = get_kva((u64)(kni->alloc_pa[i]),
+					    kni);
+			alloc_data_kva = (uint8_t *)alloc_kva +
+				(sizeof(struct rte_kni_mbuf) +
+				 alloc_kva->data_off);
+			kni->alloc_va[i] = pa2va(kni->alloc_pa[i],
+						 alloc_kva);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			len = kva->pkt_len;
+			data_kva = kva2data_kva(kva);
+			kni->va[i] = pa2va(kni->pa[i], kva);
 
-		alloc_kva = pa2kva(kni->alloc_pa[i]);
-		alloc_data_kva = kva2data_kva(alloc_kva);
-		kni->alloc_va[i] = pa2va(kni->alloc_pa[i], alloc_kva);
+			alloc_kva = pa2kva(kni->alloc_pa[i]);
+			alloc_data_kva = kva2data_kva(alloc_kva);
+			kni->alloc_va[i] = pa2va(kni->alloc_pa[i],
+						 alloc_kva);
+		}
 		memcpy(alloc_data_kva, data_kva, len);
 
 		alloc_kva->pkt_len = len;
@@ -507,9 +570,15 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 
 	/* Copy mbufs to sk buffer and then call tx interface */
 	for (i = 0; i < num; i++) {
-		kva = pa2kva(kni->pa[i]);
+		if (likely(kni->iova_mode == 1)) {
+			kva = get_kva((u64)(kni->pa[i]), kni);
+			data_kva = (uint8_t *)kva +
+				(sizeof(struct rte_kni_mbuf) + kva->data_off);
+		} else {
+			kva = pa2kva(kni->pa[i]);
+			data_kva = kva2data_kva(kva);
+		}
 		len = kva->pkt_len;
-		data_kva = kva2data_kva(kva);
 		kni->va[i] = pa2va(kni->pa[i], kva);
 
 		skb = dev_alloc_skb(len + 2);
@@ -545,8 +614,17 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni)
 			if (!kva->next)
 				break;
 
-			kva = pa2kva(va2pa(kva->next, kva));
-			data_kva = kva2data_kva(kva);
+			if (likely(kni->iova_mode == 1)) {
+				kva = get_kva(
+					(u64)(va2pa(kva->next, kva)),
+					kni);
+				data_kva = (uint8_t *)kva +
+					(sizeof(struct rte_kni_mbuf) +
+					 kva->data_off);
+			} else {
+				kva = pa2kva(va2pa(kva->next, kva));
+				data_kva = kva2data_kva(kva);
+			}
 		}
 	}
 
diff --git a/lib/librte_eal/linux/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linux/eal/include/exec-env/rte_kni_common.h
index 5afa08713..897dd956f 100644
--- a/lib/librte_eal/linux/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linux/eal/include/exec-env/rte_kni_common.h
@@ -128,6 +128,14 @@ struct rte_kni_device_info {
 	unsigned mbuf_size;
 	unsigned int mtu;
 	char mac_addr[6];
+
+	/* IOVA mode. 1 = VA, 0 = PA */
+	uint8_t iova_mode;
+
+	/* Pool size, will be used in kernel to map the
+	 * user pages
+	 */
+	uint64_t mbuf_pool_size;
 };
 
 #define KNI_DEVICE "kni"
diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c
index 492e207a3..3bf19faa0 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -304,6 +304,27 @@ rte_kni_alloc(struct rte_mempool *pktmbuf_pool,
 	kni->group_id = conf->group_id;
 	kni->mbuf_size = conf->mbuf_size;
 
+	dev_info.iova_mode = (rte_eal_iova_mode() == RTE_IOVA_VA) ? 1 : 0;
+	if (dev_info.iova_mode) {
+		struct rte_mempool_memhdr *hdr;
+		uint64_t pool_size = 0;
+
+		/* In each pool header chunk, we will maintain the
+		 * base address of the pool. This chunk is physically and
+		 * virtually contiguous.
+		 * This approach will work only if the allocated pool
+		 * memory is contiguous, else it won't work.
+		 */
+		hdr = STAILQ_FIRST(&pktmbuf_pool->mem_list);
+		dev_info.mbuf_va = (void *)(hdr->addr);
+
+		/* Traverse the list and get the total size of the pool */
+		STAILQ_FOREACH(hdr, &pktmbuf_pool->mem_list, next) {
+			pool_size += hdr->len;
+		}
+		dev_info.mbuf_pool_size = pool_size +
+			pktmbuf_pool->mz->len;
+	}
 	ret = ioctl(kni_fd, RTE_KNI_IOCTL_CREATE, &dev_info);
 	if (ret < 0)
 		goto ioctl_fail;
-- 
2.17.1