* [dpdk-dev] IOVA_CONTIG flag needed in kni initialization
@ 2020-01-30 20:19 Scott Wasson
2020-02-04 13:00 ` Ferruh Yigit
0 siblings, 1 reply; 2+ messages in thread
From: Scott Wasson @ 2020-01-30 20:19 UTC (permalink / raw)
To: dev
Hi,
We’re seeing an issue since upgrading to 19.08, the kni FIFO’s apparently aren’t contiguous. From user-space’s perspective, the kni’s tx_q straddles the 2MB pageboundary at 0x17a600000. The mbuf pointers in the ring prior to this address are valid. The tx_q’s write pointer is indicating there are mbufs at 0x17a600000 and beyond, but the pointers are all NULL.
Because the rte_kni kernel module is loaded:
In eal.c:
/* Workaround for KNI which requires physical address to work */
if (iova_mode == RTE_IOVA_VA &&
rte_eal_check_module("rte_kni") == 1) {
if (phys_addrs) {
iova_mode = RTE_IOVA_PA;
Iova_mode is forced to PA.
Through brute-force and experimentation, we determined that enabling --legacy-mem caused the problem to go away. But this caused the locations of the kni’s data structures to move, so they no longer straddled a hugepages boundary. Our concern is that the furniture may move around again and bring us back to where we were. Being tied to using --legacy-mem is undesirable in the long-term, anyway.
Through further brute-force and experimentation, we found that the following code patch helps (even without --legacy-mem):
index 3d2ffb2..5cc9d69 100644
--- a/lib/librte_kni/rte_kni.c
+++ b/lib/librte_kni/rte_kni.c
@@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni)
char mz_name[RTE_MEMZONE_NAMESIZE];
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, kni->name);
- kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail);
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, kni->name);
- kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail);
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, kni->name);
- kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail);
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, kni->name);
- kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail);
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, kni->name);
- kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail);
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, kni->name);
- kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail);
snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, kni->name);
- kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
+ kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail);
return 0;
I removed --legacy-mem, the tx_q still straddles the same 2MB page boundary, yet now it’s been running for a few hours and everything seems OK.
This would seem to follow precedent in rte_mempool.c:
/* if we're trying to reserve contiguous memory, add appropriate
* memzone flag.
*/
if (try_contig)
flags |= RTE_MEMZONE_IOVA_CONTIG;
which I think explains why our mbufs haven’t seen data truncation issues.
Could you please why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode? Isn’t contiguousness a fundamental property of physical addressing?
Are we still potentially vulnerable with --legacy-mem and without the above code change? Did we just get lucky because the furniture moved and doesn’t straddle a page boundary at the moment?
We also tested with stock 19.11 and did not see the crash. However the FIFO’s were not straddling a page boundary, and so we believe it is also vulnerable.
Thanks!
-Scott
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [dpdk-dev] IOVA_CONTIG flag needed in kni initialization
2020-01-30 20:19 [dpdk-dev] IOVA_CONTIG flag needed in kni initialization Scott Wasson
@ 2020-02-04 13:00 ` Ferruh Yigit
0 siblings, 0 replies; 2+ messages in thread
From: Ferruh Yigit @ 2020-02-04 13:00 UTC (permalink / raw)
To: Scott Wasson, dev; +Cc: Anatoly Burakov
On 1/30/2020 8:19 PM, Scott Wasson wrote:
> Hi,
>
> We’re seeing an issue since upgrading to 19.08, the kni FIFO’s apparently aren’t contiguous. From user-space’s perspective, the kni’s tx_q straddles the 2MB pageboundary at 0x17a600000. The mbuf pointers in the ring prior to this address are valid. The tx_q’s write pointer is indicating there are mbufs at 0x17a600000 and beyond, but the pointers are all NULL.
>
> Because the rte_kni kernel module is loaded:
>
> In eal.c:
> /* Workaround for KNI which requires physical address to work */
> if (iova_mode == RTE_IOVA_VA &&
> rte_eal_check_module("rte_kni") == 1) {
> if (phys_addrs) {
> iova_mode = RTE_IOVA_PA;
>
> Iova_mode is forced to PA.
>
> Through brute-force and experimentation, we determined that enabling --legacy-mem caused the problem to go away. But this caused the locations of the kni’s data structures to move, so they no longer straddled a hugepages boundary. Our concern is that the furniture may move around again and bring us back to where we were. Being tied to using --legacy-mem is undesirable in the long-term, anyway.
>
> Through further brute-force and experimentation, we found that the following code patch helps (even without --legacy-mem):
>
> index 3d2ffb2..5cc9d69 100644
> --- a/lib/librte_kni/rte_kni.c
> +++ b/lib/librte_kni/rte_kni.c
> @@ -143,31 +143,31 @@ kni_reserve_mz(struct rte_kni *kni)
> char mz_name[RTE_MEMZONE_NAMESIZE];
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_TX_Q_MZ_NAME_FMT, kni->name);
> - kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_tx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_tx_q == NULL, tx_q_fail);
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RX_Q_MZ_NAME_FMT, kni->name);
> - kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_rx_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_rx_q == NULL, rx_q_fail);
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_ALLOC_Q_MZ_NAME_FMT, kni->name);
> - kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_alloc_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_alloc_q == NULL, alloc_q_fail);
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_FREE_Q_MZ_NAME_FMT, kni->name);
> - kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_free_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_free_q == NULL, free_q_fail);
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_REQ_Q_MZ_NAME_FMT, kni->name);
> - kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_req_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_req_q == NULL, req_q_fail);
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_RESP_Q_MZ_NAME_FMT, kni->name);
> - kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_resp_q = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_resp_q == NULL, resp_q_fail);
> snprintf(mz_name, RTE_MEMZONE_NAMESIZE, KNI_SYNC_ADDR_MZ_NAME_FMT, kni->name);
> - kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, 0);
> + kni->m_sync_addr = rte_memzone_reserve(mz_name, KNI_FIFO_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_IOVA_CONTIG);
> KNI_MEM_CHECK(kni->m_sync_addr == NULL, sync_addr_fail);
> return 0;
>
> I removed --legacy-mem, the tx_q still straddles the same 2MB page boundary, yet now it’s been running for a few hours and everything seems OK.
>
> This would seem to follow precedent in rte_mempool.c:
>
> /* if we're trying to reserve contiguous memory, add appropriate
> * memzone flag.
> */
> if (try_contig)
> flags |= RTE_MEMZONE_IOVA_CONTIG;
>
> which I think explains why our mbufs haven’t seen data truncation issues.
>
> Could you please why RTE_MEMZONE_IOVA_CONTIG is necessary in PA mode? Isn’t contiguousness a fundamental property of physical addressing?
>
> Are we still potentially vulnerable with --legacy-mem and without the above code change? Did we just get lucky because the furniture moved and doesn’t straddle a page boundary at the moment?
>
> We also tested with stock 19.11 and did not see the crash. However the FIFO’s were not straddling a page boundary, and so we believe it is also vulnerable.
>
cc'ed Anatoly.
Hi Scott,
Yes, KNI FIFO's needs to be physically contiguous, after memory rework
'rte_memzone_reserve()' doesn't guarantee the physically contiguous memory and
requires 'RTE_MEMZONE_IOVA_CONTIG' as you did above.
With '--legacy-mem' you are getting old behavior which provides physically
contiguous and it should be safe already.
Can you please send a patch for above change?
Thanks,
ferruh
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2020-02-04 13:02 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-30 20:19 [dpdk-dev] IOVA_CONTIG flag needed in kni initialization Scott Wasson
2020-02-04 13:00 ` Ferruh Yigit
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).