DPDK patches and discussions
From: Scott Wasson <scott_wasson@affirmednetworks.com>
To: "dev@dpdk.org" <dev@dpdk.org>
Subject: [dpdk-dev] IOVA_CONTIG flag needed in kni initialization
Date: Thu, 30 Jan 2020 20:19:00 +0000
Message-ID: <FCAEED08-5F1C-42EE-AE60-56B502654AA9@affirmednetworks.com>

Hi,
 
We’ve been seeing an issue since upgrading to 19.08: the kni FIFOs apparently aren’t contiguous.  From user space’s perspective, the kni’s tx_q straddles the 2MB page boundary at 0x17a600000.  The mbuf pointers in the ring prior to this address are valid.  The tx_q’s write pointer indicates that there are mbufs at 0x17a600000 and beyond, but the pointers there are all NULL.
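 
For readers who want to check their own FIFO placement, the straddle condition described above can be sketched as a small helper. This is only an illustration: range_straddles_page() and HUGEPAGE_SIZE_2MB are hypothetical names, not DPDK APIs, and the 2MB page size is taken from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed hugepage size (2 MiB), matching the boundary in the report. */
#define HUGEPAGE_SIZE_2MB (UINT64_C(2) << 20)

/*
 * Hypothetical helper (not a DPDK API): return true if the byte range
 * [start, start + len) touches more than one page of the given size.
 * page_size must be a non-zero power of two and len must be non-zero.
 */
static bool
range_straddles_page(uint64_t start, size_t len, uint64_t page_size)
{
	assert(len > 0);
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	uint64_t first_page = start / page_size;
	uint64_t last_page = (start + len - 1) / page_size;

	return first_page != last_page;
}
```

For example, an 8192-byte range starting 4096 bytes below 0x17a600000 straddles the boundary, while the same range starting exactly at 0x17a600000 does not. The start address and length here are made up for illustration; the real values would come from the memzone's addr and len fields.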
 
Because the rte_kni kernel module is loaded, iova_mode is forced to PA by this code in eal.c:
 
        /* Workaround for KNI which requires physical address to work */
        if (iova_mode == RTE_IOVA_VA &&
                        rte_eal_check_module("rte_kni") == 1) {
                if (phys_addrs) {
                        iova_mode = RTE_IOVA_PA;
 
Through brute force and experimentation, we determined that enabling --legacy-mem made the problem go away.  However, it also moved the kni’s data structures, so they no longer straddled a hugepage boundary.  Our concern is that the furniture may move around again and bring us back to where we were.  Being tied to --legacy-mem is undesirable in the long term anyway.
 
Through further brute force and experimentation, we found that the following patch helps (even without --legacy-mem):
 
index 3d2ffb2..5cc9d69 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni)
        char mz_name[RTE_MEMZONE_NAMESIZE];
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, kni->name);
-       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, kni->name);
-       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, kni->name);
-       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, kni->name);
-       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, kni->name);
-       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, kni->name);
-       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail);
        snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, kni->name);
-       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+       kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
        KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail);
        return 0;
 
With this patch applied and --legacy-mem removed, the tx_q still straddles the same 2MB page boundary, yet it has now been running for a few hours and everything seems OK.
 
This would seem to follow precedent in rte_mempool.c:
 
        /* if we're trying to reserve contiguous memory, add appropriate
         * memzone flag.
         */
        if (try_contig)
                flags |= RTE_MEMZONE_IOVA_CONTIG;
 
which I think explains why our mbufs haven’t seen data truncation issues.
 
Could you please explain why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode?  Isn’t contiguity a fundamental property of physical addressing?
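 
To make the question concrete, here is a toy model (not DPDK code) of one way the symptoms could arise. It assumes that a virtually contiguous buffer can be backed by hugepages that are not physically adjacent, and that the consumer is handed only the physical address of the start of the buffer and walks it linearly. All names (toy_map, toy_virt2phys, toy_linear_phys, toy_is_phys_contig) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE (UINT64_C(2) << 20) /* assumed 2 MiB hugepages */

/* Toy mapping: virtual page i (relative to va_base) is backed by phys_page[i]. */
struct toy_map {
	uint64_t va_base;          /* page-aligned start of the virtual range */
	const uint64_t *phys_page; /* physical base address of each page */
	size_t n_pages;
};

/* Per-page translation, as user space effectively sees it. */
static uint64_t
toy_virt2phys(const struct toy_map *m, uint64_t va)
{
	assert(va >= m->va_base);

	uint64_t off = va - m->va_base;
	size_t idx = (size_t)(off / PAGE_SIZE);

	assert(idx < m->n_pages);
	return m->phys_page[idx] + off % PAGE_SIZE;
}

/*
 * Address reached by a consumer that knows only the physical address of
 * the start of the buffer and adds an offset to it (assumed behaviour).
 */
static uint64_t
toy_linear_phys(const struct toy_map *m, uint64_t buf_va, uint64_t offset)
{
	return toy_virt2phys(m, buf_va) + offset;
}

/* Return 1 if each page is physically adjacent to the previous one. */
static int
toy_is_phys_contig(const struct toy_map *m)
{
	for (size_t i = 1; i < m->n_pages; i++)
		if (m->phys_page[i] != m->phys_page[i - 1] + PAGE_SIZE)
			return 0;
	return 1;
}
```

In this model, if the two pages backing a buffer sit at physical 0x40000000 and 0x80000000, the linear walk and the per-page translation agree up to the page boundary and diverge after it, so the consumer reads memory the producer never wrote. If the pages are adjacent (0x40000000 and 0x40200000), both views agree everywhere. The physical addresses are invented for illustration.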
 
Are we still potentially vulnerable with --legacy-mem and without the above code change?  Did we just get lucky because the furniture moved and doesn’t straddle a page boundary at the moment?
 
We also tested with stock 19.11 and did not see the crash.  However, the FIFOs were not straddling a page boundary in that run, so we believe 19.11 is also vulnerable.
 
Thanks!
 
-Scott
 
 


Thread overview: 2+ messages
2020-01-30 20:19 Scott Wasson
2020-02-04 13:00 ` Ferruh Yigit
