From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id CE1D7A046B for ; Tue, 25 Jun 2019 17:04:55 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 06AE21B9C9; Tue, 25 Jun 2019 17:04:55 +0200 (CEST) Received: from mail-pf1-f195.google.com (mail-pf1-f195.google.com [209.85.210.195]) by dpdk.org (Postfix) with ESMTP id 07B3623D for ; Tue, 25 Jun 2019 17:04:54 +0200 (CEST) Received: by mail-pf1-f195.google.com with SMTP id 81so9610859pfy.13 for ; Tue, 25 Jun 2019 08:04:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wF9jD3K/4qr+XnMdKhju1+asohbtiTCZYesjd/0hoMo=; b=GgzFaHhlIVAEQqzSbHEQPiNF9vmTT8AFyal6JiOnhZwruow/00mQDraFnKYqTrXay1 hikLjFZWgh5UHt1zb1XwMaNpnHcBkUB2T8NwJwvYWjDdjgp2XsV4h1qttv9x+zxJbteo xOIiZHVXJZUy6oM15ANUkq83yzqtCsTC/0vlHyyQZ8YqT9Bt1/WLPp0akYR9/ZtmrlSE YhGZmigfYUWzllYbQI9Fi7caBuZfYUMfBzb3hnKi9ncpuxfzWXd5ySyePAV04KqTd6E0 dJQ6LGRRGq3zMcxnCQL3TyiP+4yP2uOs0DtQKjTrCzbkkL5jBlcMiozmhwjGEqAt+6Pc VHEw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=wF9jD3K/4qr+XnMdKhju1+asohbtiTCZYesjd/0hoMo=; b=bRgL6UhSr6i9SFxNMKB4ogAjQ/IUiMBdBma0JkwC2lb2M+ylXNkW4gGkjRQFrpuJUI KAyYronS6YbiMrMQ2CtOalLZABKzfT05FiCx9sKlyhFYatXrUSMV5fMJrXebj0YFoBDk lDuJGdYiOUWDPyCjzPDX58Vxafww+8kfbTRQK1v1PkvPnxHitMMnzY7sGqOnYyFr/o6E lQ1L0snUP5iKpnCQBqgbe5UGx7Qm/oaDtuZ+N1LwOwOR6wsSczWPNtYjMnDEoxaaJ8wS /S3d8hsthLQTOkmUYRNiV/OPnBpTNQM2vx/CTdf3SSIo6trVG1pZbVWpaU4WJ2mVZ650 3kKQ== X-Gm-Message-State: APjAAAV1LtgUq0mGWZzG7jpPeYv1EreSNhXPQ0OLEQMZqTGvKJ6yBuxB gOnyCMZ23i205HxJHfafI2mMKQJrAGeRBg== X-Google-Smtp-Source: 
APXvYqwJO7yxeCti5/s8mxMonCiYIa/nXSlbV/3ykK+bV7nGRqqs5dYdpJiuCfQzu0E2HZVDSbViCQ==
X-Received: by 2002:a17:90a:cf8f:: with SMTP id i15mr31216565pju.110.1561475092899; Tue, 25 Jun 2019 08:04:52 -0700 (PDT)
Received: from yateszhou-PC0.tencent.com ([203.205.141.44]) by smtp.gmail.com with ESMTPSA id 23sm16503849pfn.176.2019.06.25.08.04.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 25 Jun 2019 08:04:51 -0700 (PDT)
From: Yangchao Zhou
To: dev@dpdk.org
Cc: stephen@networkplumber.org, ferruh.yigit@intel.com, sodey@rbbn.com
Date: Tue, 25 Jun 2019 23:04:14 +0800
Message-Id: <20190625150414.11332-1-zhouyates@gmail.com>
In-Reply-To: <20190312092232.93640-1-zhouyates@gmail.com>
References: <20190312092232.93640-1-zhouyates@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [dpdk-dev] [PATCH v3] kni: fix possible kernel crash with va2pa
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

va2pa depends on the offset between the physical and virtual addresses
of the current mbuf. It may compute the wrong physical address for the
next mbuf if that mbuf was allocated in another hugepage segment.

In rte_mempool_populate_default(), the attempt to allocate one whole
block of contiguous memory can fail. In that case, memory is reserved
in several memzones that have different physical-to-virtual address
offsets, so chained mbufs from the same pool may not share one offset.
rte_mempool_populate_default() is used by rte_pktmbuf_pool_create().
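The failure mode described above can be illustrated with a small sketch. It is not the KNI code: struct demo_mbuf, demo_va2pa_by_offset() and the addresses are hypothetical and exist only to show why one segment's offset cannot be applied to a pointer into another memzone.

```c
#include <stdint.h>

/* Hypothetical stand-in for an mbuf header; not the real struct rte_kni_mbuf. */
struct demo_mbuf {
	uint64_t va; /* user-space virtual address of this segment */
	uint64_t pa; /* physical address of this segment */
};

/* Offset-based translation: reuse m's own VA-to-PA offset for any pointer. */
static uint64_t
demo_va2pa_by_offset(uint64_t va, const struct demo_mbuf *m)
{
	return va - (m->va - m->pa);
}

/* Made-up addresses: two segments whose memzones have different offsets. */
static const struct demo_mbuf demo_seg0 = { 0x7f0000001000ULL, 0x100001000ULL };
static const struct demo_mbuf demo_seg1 = { 0x7f0040002000ULL, 0x240002000ULL };

/* Returns 1 if translating seg1's address with seg0's offset gives a wrong PA. */
static int
demo_offset_mismatch(void)
{
	uint64_t guessed = demo_va2pa_by_offset(demo_seg1.va, &demo_seg0);

	return guessed != demo_seg1.pa;
}
```

With these assumed addresses, the offset taken from demo_seg0 maps demo_seg1 to an unrelated physical address, while demo_seg1's own offset gives the right one. This is why the patch translates each next pointer using the mbuf it points to.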
Fixes: 8451269e6d7b ("kni: remove continuous memory restriction") Signed-off-by: Yangchao Zhou --- .../prog_guide/kernel_nic_interface.rst | 6 ++- kernel/linux/kni/kni_net.c | 51 ++++++++++++------- .../linux/eal/include/rte_kni_common.h | 2 +- lib/librte_kni/rte_kni.c | 16 +++++- lib/librte_kni/rte_kni_fifo.h | 12 +++++ 5 files changed, 65 insertions(+), 22 deletions(-) diff --git a/doc/guides/prog_guide/kernel_nic_interface.rst b/doc/guides/prog_guide/kernel_nic_interface.rst index daf87f4a8..2fd3b7983 100644 --- a/doc/guides/prog_guide/kernel_nic_interface.rst +++ b/doc/guides/prog_guide/kernel_nic_interface.rst @@ -268,12 +268,14 @@ Use Case: Ingress ----------------- On the DPDK RX side, the mbuf is allocated by the PMD in the RX thread context. -This thread will enqueue the mbuf in the rx_q FIFO. +This thread will enqueue the mbuf in the rx_q FIFO, and the next pointers in the mbuf chain will be converted to physical addresses. The KNI thread will poll all KNI active devices for the rx_q. If an mbuf is dequeued, it will be converted to a sk_buff and sent to the net stack via netif_rx(). -The dequeued mbuf must be freed, so the same pointer is sent back in the free_q FIFO. +The dequeued mbuf must be freed, so the same pointer is sent back in the free_q FIFO, +and any next pointers must be converted back to virtual addresses before the mbuf is put in the free_q FIFO. The RX thread, in the same main loop, polls this FIFO and frees the mbuf after dequeuing it. +The next pointers are converted so that a chained mbuf spanning different hugepage segments cannot cause a kernel crash. 
Use Case: Egress ---------------- diff --git a/kernel/linux/kni/kni_net.c b/kernel/linux/kni/kni_net.c index ad8365877..f65233aaa 100644 --- a/kernel/linux/kni/kni_net.c +++ b/kernel/linux/kni/kni_net.c @@ -61,18 +61,6 @@ kva2data_kva(struct rte_kni_mbuf *m) return phys_to_virt(m->buf_physaddr + m->data_off); } -/* virtual address to physical address */ -static void * -va2pa(void *va, struct rte_kni_mbuf *m) -{ - void *pa; - - pa = (void *)((unsigned long)va - - ((unsigned long)m->buf_addr - - (unsigned long)m->buf_physaddr)); - return pa; -} - /* * It can be called to process the request. */ @@ -173,7 +161,10 @@ kni_fifo_trans_pa2va(struct kni_dev *kni, struct rte_kni_fifo *src_pa, struct rte_kni_fifo *dst_va) { uint32_t ret, i, num_dst, num_rx; - void *kva; + struct rte_kni_mbuf *kva, *prev_kva; + int nb_segs; + int kva_nb_segs; + do { num_dst = kni_fifo_free_count(dst_va); if (num_dst == 0) @@ -188,6 +179,17 @@ kni_fifo_trans_pa2va(struct kni_dev *kni, for (i = 0; i < num_rx; i++) { kva = pa2kva(kni->pa[i]); kni->va[i] = pa2va(kni->pa[i], kva); + + kva_nb_segs = kva->nb_segs; + for (nb_segs = 0; nb_segs < kva_nb_segs; nb_segs++) { + if (!kva->next) + break; + + prev_kva = kva; + kva = pa2kva(kva->next); + /* Convert physical address to virtual address */ + prev_kva->next = pa2va(prev_kva->next, kva); + } } ret = kni_fifo_put(dst_va, kni->va, num_rx); @@ -313,7 +315,7 @@ kni_net_rx_normal(struct kni_dev *kni) uint32_t ret; uint32_t len; uint32_t i, num_rx, num_fq; - struct rte_kni_mbuf *kva; + struct rte_kni_mbuf *kva, *prev_kva; void *data_kva; struct sk_buff *skb; struct net_device *dev = kni->net_dev; @@ -363,8 +365,11 @@ kni_net_rx_normal(struct kni_dev *kni) if (!kva->next) break; - kva = pa2kva(va2pa(kva->next, kva)); + prev_kva = kva; + kva = pa2kva(kva->next); data_kva = kva2data_kva(kva); + /* Convert physical address to virtual address */ + prev_kva->next = pa2va(prev_kva->next, kva); } } @@ -396,7 +401,7 @@ kni_net_rx_lo_fifo(struct kni_dev *kni) 
uint32_t ret; uint32_t len; uint32_t i, num, num_rq, num_tq, num_aq, num_fq; - struct rte_kni_mbuf *kva; + struct rte_kni_mbuf *kva, *next_kva; void *data_kva; struct rte_kni_mbuf *alloc_kva; void *alloc_data_kva; @@ -439,6 +444,13 @@ kni_net_rx_lo_fifo(struct kni_dev *kni) data_kva = kva2data_kva(kva); kni->va[i] = pa2va(kni->pa[i], kva); + while (kva->next) { + next_kva = pa2kva(kva->next); + /* Convert physical address to virtual address */ + kva->next = pa2va(kva->next, next_kva); + kva = next_kva; + } + alloc_kva = pa2kva(kni->alloc_pa[i]); alloc_data_kva = kva2data_kva(alloc_kva); kni->alloc_va[i] = pa2va(kni->alloc_pa[i], alloc_kva); @@ -481,7 +493,7 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni) uint32_t ret; uint32_t len; uint32_t i, num_rq, num_fq, num; - struct rte_kni_mbuf *kva; + struct rte_kni_mbuf *kva, *prev_kva; void *data_kva; struct sk_buff *skb; struct net_device *dev = kni->net_dev; @@ -545,8 +557,11 @@ kni_net_rx_lo_fifo_skb(struct kni_dev *kni) if (!kva->next) break; - kva = pa2kva(va2pa(kva->next, kva)); + prev_kva = kva; + kva = pa2kva(kva->next); data_kva = kva2data_kva(kva); + /* Convert physical address to virtual address */ + prev_kva->next = pa2va(prev_kva->next, kva); } } diff --git a/lib/librte_eal/linux/eal/include/rte_kni_common.h b/lib/librte_eal/linux/eal/include/rte_kni_common.h index 91a1c1408..37d9ee8f0 100644 --- a/lib/librte_eal/linux/eal/include/rte_kni_common.h +++ b/lib/librte_eal/linux/eal/include/rte_kni_common.h @@ -86,7 +86,7 @@ struct rte_kni_mbuf { /* fields on second cache line */ char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_MIN_SIZE))); void *pool; - void *next; + void *next; /**< Physical address of next mbuf in kernel. 
*/ }; /* diff --git a/lib/librte_kni/rte_kni.c b/lib/librte_kni/rte_kni.c index e29d0cc7d..6fc36f744 100644 --- a/lib/librte_kni/rte_kni.c +++ b/lib/librte_kni/rte_kni.c @@ -344,6 +344,19 @@ va2pa(struct rte_mbuf *m) (unsigned long)m->buf_iova)); } +static void * +va2pa_all(struct rte_mbuf *mbuf) +{ + void *phy_mbuf = va2pa(mbuf); + struct rte_mbuf *next = mbuf->next; + while (next) { + mbuf->next = va2pa(next); + mbuf = next; + next = mbuf->next; + } + return phy_mbuf; +} + static void obj_free(struct rte_mempool *mp __rte_unused, void *opaque, void *obj, unsigned obj_idx __rte_unused) @@ -536,12 +549,13 @@ rte_kni_handle_request(struct rte_kni *kni) unsigned rte_kni_tx_burst(struct rte_kni *kni, struct rte_mbuf **mbufs, unsigned num) { + num = RTE_MIN(kni_fifo_free_count(kni->rx_q), num); void *phy_mbufs[num]; unsigned int ret; unsigned int i; for (i = 0; i < num; i++) - phy_mbufs[i] = va2pa(mbufs[i]); + phy_mbufs[i] = va2pa_all(mbufs[i]); ret = kni_fifo_put(kni->rx_q, phy_mbufs, num); diff --git a/lib/librte_kni/rte_kni_fifo.h b/lib/librte_kni/rte_kni_fifo.h index 287d7deb2..11d7fd6bd 100644 --- a/lib/librte_kni/rte_kni_fifo.h +++ b/lib/librte_kni/rte_kni_fifo.h @@ -104,3 +104,15 @@ kni_fifo_count(struct rte_kni_fifo *fifo) unsigned fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read); return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1); } + +/** + * Get the number of free slots in the fifo + */ +static inline uint32_t +kni_fifo_free_count(struct rte_kni_fifo *fifo) +{ + uint32_t fifo_write = __KNI_LOAD_ACQUIRE(&fifo->write); + uint32_t fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read); + return (fifo_read - fifo_write - 1) & (fifo->len - 1); +} + -- 2.17.1
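
The index arithmetic behind kni_fifo_count() and the new kni_fifo_free_count() can be checked with a simplified model. The sketch below is an illustration only: struct demo_fifo and the demo_ helpers are hypothetical, they drop the __KNI_LOAD_ACQUIRE loads, and they assume len is a power of two, which the masking in the patch relies on.

```c
#include <stdint.h>

/* Hypothetical, simplified FIFO header: indices only, no atomics. */
struct demo_fifo {
	uint32_t write; /* next slot to write */
	uint32_t read;  /* next slot to read */
	uint32_t len;   /* ring size; assumed to be a power of two */
};

/* Number of elements currently stored, same formula as kni_fifo_count(). */
static inline uint32_t
demo_fifo_count(const struct demo_fifo *f)
{
	return (f->len + f->write - f->read) & (f->len - 1);
}

/* Number of slots that can still be filled, same formula as the patch. */
static inline uint32_t
demo_fifo_free_count(const struct demo_fifo *f)
{
	/* Unsigned wrap-around is intended; the mask folds it into the ring. */
	return (f->read - f->write - 1) & (f->len - 1);
}
```

One slot is always left unused so that a full ring can be told apart from an empty one, so in this model the used and free counts always sum to len - 1. Clamping num to the free count before translating addresses, as rte_kni_tx_burst() does in the patch, means only mbufs that will actually be enqueued have their next pointers rewritten.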