From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 88ECEA0534; Tue, 4 Feb 2020 14:02:33 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id E22F41C12F; Tue, 4 Feb 2020 14:02:32 +0100 (CET) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by dpdk.org (Postfix) with ESMTP id D25351C12D for ; Tue, 4 Feb 2020 14:02:30 +0100 (CET) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 04 Feb 2020 05:00:56 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.70,401,1574150400"; d="scan'208";a="224287597" Received: from fyigit-mobl.ger.corp.intel.com (HELO [10.252.22.38]) ([10.252.22.38]) by orsmga008.jf.intel.com with ESMTP; 04 Feb 2020 05:00:54 -0800 To: Scott Wasson , "dev@dpdk.org" References: Cc: Anatoly Burakov From: Ferruh Yigit Autocrypt: addr=ferruh.yigit@intel.com; prefer-encrypt=mutual; keydata= mQINBFXZCFABEADCujshBOAaqPZpwShdkzkyGpJ15lmxiSr3jVMqOtQS/sB3FYLT0/d3+bvy qbL9YnlbPyRvZfnP3pXiKwkRoR1RJwEo2BOf6hxdzTmLRtGtwWzI9MwrUPj6n/ldiD58VAGQ +iR1I/z9UBUN/ZMksElA2D7Jgg7vZ78iKwNnd+vLBD6I61kVrZ45Vjo3r+pPOByUBXOUlxp9 GWEKKIrJ4eogqkVNSixN16VYK7xR+5OUkBYUO+sE6etSxCr7BahMPKxH+XPlZZjKrxciaWQb +dElz3Ab4Opl+ZT/bK2huX+W+NJBEBVzjTkhjSTjcyRdxvS1gwWRuXqAml/sh+KQjPV1PPHF YK5LcqLkle+OKTCa82OvUb7cr+ALxATIZXQkgmn+zFT8UzSS3aiBBohg3BtbTIWy51jNlYdy ezUZ4UxKSsFuUTPt+JjHQBvF7WKbmNGS3fCid5Iag4tWOfZoqiCNzxApkVugltxoc6rG2TyX CmI2rP0mQ0GOsGXA3+3c1MCdQFzdIn/5tLBZyKy4F54UFo35eOX8/g7OaE+xrgY/4bZjpxC1 1pd66AAtKb3aNXpHvIfkVV6NYloo52H+FUE5ZDPNCGD0/btFGPWmWRmkPybzColTy7fmPaGz cBcEEqHK4T0aY4UJmE7Ylvg255Kz7s6wGZe6IR3N0cKNv++O7QARAQABtCVGZXJydWggWWln aXQgPGZlcnJ1aC55aWdpdEBpbnRlbC5jb20+iQJUBBMBCgA+AhsDAh4BAheABQsJCAcDBRUK CQgLBRYCAwEAFiEE0jZTh0IuwoTjmYHH+TPrQ98TYR8FAl1meboFCQlupOoACgkQ+TPrQ98T YR9ACBAAv2tomhyxY0Tp9Up7mNGLfEdBu/7joB/vIdqMRv63ojkwr9orQq5V16V/25+JEAD0 60cKodBDM6HdUvqLHatS8fooWRueSXHKYwJ3vxyB2tWDyZrLzLI1jxEvunGodoIzUOtum0Ce gPynnfQCelXBja0BwLXJMplM6TY1wXX22ap0ZViC0m714U5U4LQpzjabtFtjT8qOUR6L7hfy YQ72PBuktGb00UR/N5UrR6GqB0x4W41aZBHXfUQnvWIMmmCrRUJX36hOTYBzh+x86ULgg7H2 1499tA4o6rvE13FiGccplBNWCAIroAe/G11rdoN5NBgYVXu++38gTa/MBmIt6zRi6ch15oLA Ln2vHOdqhrgDuxjhMpG2bpNE36DG/V9WWyWdIRlz3NYPCDM/S3anbHlhjStXHOz1uHOnerXM 1jEjcsvmj1vSyYoQMyRcRJmBZLrekvgZeh7nJzbPHxtth8M7AoqiZ/o/BpYU+0xZ+J5/szWZ aYxxmIRu5ejFf+Wn9s5eXNHmyqxBidpCWvcbKYDBnkw2+Y9E5YTpL0mS0dCCOlrO7gca27ux ybtbj84aaW1g0CfIlUnOtHgMCmz6zPXThb+A8H8j3O6qmPoVqT3qnq3Uhy6GOoH8Fdu2Vchh TWiF5yo+pvUagQP6LpslffufSnu+RKAagkj7/RSuZV25Ag0EV9ZMvgEQAKc0Db17xNqtSwEv mfp4tkddwW9XA0tWWKtY4KUdd/jijYqc3fDD54ESYpV8QWj0xK4YM0dLxnDU2IYxjEshSB1T qAatVWz9WtBYvzalsyTqMKP3w34FciuL7orXP4AibPtrHuIXWQOBECcVZTTOdZYGAzaYzxiA ONzF9eTiwIqe9/oaOjTwTLnOarHt16QApTYQSnxDUQljeNvKYt1lZE/gAUUxNLWsYyTT+22/ vU0GDUahsJxs1+f1yEr+OGrFiEAmqrzpF0lCS3f/3HVTU6rS9cK3glVUeaTF4+1SK5ZNO35p iVQCwphmxa+dwTG/DvvHYCtgOZorTJ+OHfvCnSVjsM4kcXGjJPy3JZmUtyL9UxEbYlrffGPQ I3gLXIGD5AN5XdAXFCjjaID/KR1c9RHd7Oaw0Pdcq9UtMLgM1vdX8RlDuMGPrj5sQrRVbgYH fVU/TQCk1C9KhzOwg4Ap2T3tE1umY/DqrXQgsgH71PXFucVjOyHMYXXugLT8YQ0gcBPHy9mZ qw5mgOI5lCl6d4uCcUT0l/OEtPG/rA1lxz8ctdFBVOQOxCvwRG2QCgcJ/UTn5vlivul+cThi 6ERPvjqjblLncQtRg8izj2qgmwQkvfj+h7Ex88bI8iWtu5+I3K3LmNz/UxHBSWEmUnkg4fJl Rr7oItHsZ0ia6wWQ8lQnABEBAAGJAjwEGAEKACYCGwwWIQTSNlOHQi7ChOOZgcf5M+tD3xNh HwUCXWZ5wAUJB3FgggAKCRD5M+tD3xNhH2O+D/9OEz62YuJQLuIuOfL67eFTIB5/1+0j8Tsu o2psca1PUQ61SZJZOMl6VwNxpdvEaolVdrpnSxUF31kPEvR0Igy8HysQ11pj8AcgH0a9FrvU /8k2Roccd2ZIdpNLkirGFZR7LtRw41Kt1Jg+lafI0efkiHKMT/6D/P1EUp1RxOBNtWGV2hrd 0Yg9ds+VMphHHU69fDH02SwgpvXwG8Qm14Zi5WQ66R4CtTkHuYtA63sS17vMl8fDuTCtvfPF HzvdJLIhDYN3Mm1oMjKLlq4PUdYh68Fiwm+boJoBUFGuregJFlO3hM7uHBDhSEnXQr5mqpPM 6R/7Q5BjAxrwVBisH0yQGjsWlnysRWNfExAE2sRePSl0or9q19ddkRYltl6X4FDUXy2DTXa9 a+Fw4e1EvmcF3PjmTYs9IE3Vc64CRQXkhujcN4ZZh5lvOpU8WgyDxFq7bavFnSS6kx7Tk29/ wNJBp+cf9qsQxLbqhW5kfORuZGecus0TLcmpZEFKKjTJBK9gELRBB/zoN3j41hlEl7uTUXTI JQFLhpsFlEdKLujyvT/aCwP3XWT+B2uZDKrMAElF6ltpTxI53JYi22WO7NH7MR16Fhi4R6vh FHNBOkiAhUpoXRZXaCR6+X4qwA8CwHGqHRBfYFSU/Ulq1ZLR+S3hNj2mbnSx0lBs1eEqe2vh cA== Message-ID: Date: Tue, 4 Feb 2020 13:00:53 +0000 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] IOVA_CONTIG flag needed in kni initialization X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 1/30/2020 8:19 PM, Scott Wasson wrote: > Hi, >   > We’re seeing an issue since upgrading to 19.08, the kni FIFO’s apparently aren’t contiguous.  From user-space’s perspective, the kni’s tx_q straddles the 2MB pageboundary at 0x17a600000.  The mbuf pointers in the ring prior to this address are valid.  The tx_q’s write pointer is indicating there are mbufs at 0x17a600000 and beyond, but the pointers are all NULL. >   > Because the rte_kni kernel module is loaded: >   > In eal.c: >                                 /* Workaround for KNI which requires physical address to work */ >                                 if (iova_mode == RTE_IOVA_VA && >                                                                 rte_eal_check_module("rte_kni") == 1) { >                                                 if (phys_addrs) { >                                                                 iova_mode = RTE_IOVA_PA; >   > Iova_mode is forced to PA. >   > Through brute-force and experimentation, we determined that enabling --legacy-mem caused the problem to go away.  But this caused the locations of the kni’s data structures to move, so they no longer straddled a hugepages boundary.  Our concern is that the furniture may move around again and bring us back to where we were.  Being tied to using --legacy-mem is undesirable in the long-term, anyway. >   > Through further brute-force and experimentation, we found that the following code patch helps (even without --legacy-mem): >   > index 3d2ffb2..5cc9d69 100644 > --- a/lib/librte_kni/rte_kni.c > +++ b/lib/librte_kni/rte_kni.c > @@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni) >         char mz_name[RTE_MEMZONE_NAMESIZE]; >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, kni->name); > -       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail); >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, kni->name); > -       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail); >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, kni->name); > -       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail); >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, kni->name); > -       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail); >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, kni->name); > -       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail); >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, kni->name); > -       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail); >         snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, kni->name); > -       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0); > +       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG); >         KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail); >         return 0; >   > I removed --legacy-mem, the tx_q still straddles the same 2MB page boundary, yet now it’s been running for a few hours and everything seems OK. >   > This would seem to follow precedent in rte_mempool.c: >   >                                 /* if we're trying to reserve contiguous memory, add appropriate >                                 * memzone flag. >                                 */ >                                 if (try_contig) >                                                 flags |= RTE_MEMZONE_IOVA_CONTIG; >   > which I think explains why our mbufs haven’t seen data truncation issues. >   > Could you please why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode?  Isn’t contiguousness a fundamental property of physical addressing? >   > Are we still potentially vulnerable with --legacy-mem and without the above code change?  Did we just get lucky because the furniture moved and doesn’t straddle a page boundary at the moment? >   > We also tested with stock 19.11 and did not see the crash.  However the FIFO’s were not straddling a page boundary, and so we believe it is also vulnerable. >   cc'ed Anatoly. Hi Scott, Yes, KNI FIFO's needs to be physically contiguous, after memory rework 'rte_memzone_reserve()' doesn't guarantee the physically contiguous memory and requires 'RTE_MEMZONE_IOVA_CONTIG' as you did above. With '--legacy-mem' you are getting old behavior which provides physically contiguous and it should be safe already. Can you please send a patch for above change? Thanks, ferruh