https://bugs.dpdk.org/show_bug.cgi?id=1776

Bug ID: 1776
Summary: Segmentation fault encountered in MPRQ vectorized mode
Product: DPDK
Version: 22.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: canary.overflow@gmail.com
Target Milestone: ---

I have been encountering a segmentation fault when running DPDK in MPRQ vectorized mode. To reproduce the issue on testpmd, run with the following parameters:

dpdk-testpmd -l 1-5 -n 4 -a 0000:1f:00.0,rxq_comp_en=1,rxq_pkt_pad_en=1,rxqs_min_mprq=1,mprq_en=1,mprq_log_stride_num=6,mprq_log_stride_size=9,mprq_max_memcpy_len=64,rx_vec_en=1 -- -i --rxd=8192 --max-pkt-len=9000 --rxq=1 --total-num-mbufs=16384 --mbuf-size=3000 --enable-drop-en --enable-scatter

The segmentation fault goes away when I disable vectorization (rx_vec_en=0). (Note that it does not occur in forward-mode=rxonly.) It also seems more likely to happen when rx_nombuf is incrementing.

The backtrace of the segmentation fault was:

#0  0x0000000001c34912 in __rte_pktmbuf_free_extbuf ()
#1  0x0000000001c36a10 in rte_pktmbuf_detach ()
#2  0x0000000001c4a9ec in rxq_copy_mprq_mbuf_v ()
#3  0x0000000001c4d63b in rxq_burst_mprq_v ()
#4  0x0000000001c4d7a7 in mlx5_rx_burst_mprq_vec ()
#5  0x000000000050be66 in rte_eth_rx_burst ()
#6  0x000000000050c53d in pkt_burst_io_forward ()
#7  0x00000000005427b4 in run_pkt_fwd_on_lcore ()
#8  0x000000000054289b in start_pkt_forward_on_core ()
#9  0x0000000000a473c9 in eal_thread_loop ()
#10 0x00007ffff60061ca in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff5c72e73 in clone () from /lib64/libc.so.6

*Note that the addresses may not be exact, as I had previously added some log statements and attempted fixes (they were commented out when I obtained this backtrace).

Upon some investigation, I noticed that in DPDK's source code, in drivers/net/mlx5/mlx5_rxtx_vec.c (function rxq_copy_mprq_mbuf_v()), the consumed stride count (rxq->consumed_strd) can exceed the stride number (strd_n, 64 in this case), which should not happen. I suspect there is some CQE misalignment here when rx_nombuf is encountered.

rxq_copy_mprq_mbuf_v(...)
{
	...
	if (rxq->consumed_strd == strd_n) {
		/* Replenish WQEs. */
	}
	...
	strd_cnt = (elts[i]->pkt_len / strd_sz) +
		   ((elts[i]->pkt_len % strd_sz) ? 1 : 0);
	rxq_code = mprq_buf_to_pkt(rxq, elts[i], elts[i]->pkt_len,
				   buf, rxq->consumed_strd, strd_cnt);
	/* Encountering cases where rxq->consumed_strd > strd_n. */
	rxq->consumed_strd += strd_cnt;
	...
}

In addition, there were also cases in mprq_buf_to_pkt() where the allocated seg address is exactly the same as the pkt (elts[i]) address passed in, which should not happen.

mprq_buf_to_pkt(...)
{
	...
	if (hdrm_overlap > 0) {
		MLX5_ASSERT(rxq->strd_scatter_en);
		struct rte_mbuf *seg = rte_pktmbuf_alloc(rxq->mp);

		if (unlikely(seg == NULL))
			return MLX5_RXQ_CODE_NOMBUF;
		SET_DATA_OFF(seg, 0);
		/* Added debug statement: saw instances where pkt == seg. */
		DRV_LOG(DEBUG, "pkt %p seg %p", (void *)pkt, (void *)seg);
		rte_memcpy(rte_pktmbuf_mtod(seg, void *),
			   RTE_PTR_ADD(addr, len - hdrm_overlap),
			   hdrm_overlap);
		...
	}
}

I have tried upgrading my DPDK version to 24.11, but the segmentation fault still persists.
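For concreteness, plugging the devargs above into the stride math from the first snippet: mprq_log_stride_num=6 gives strd_n = 2^6 = 64 strides per MPRQ buffer, and mprq_log_stride_size=9 gives strd_sz = 2^9 = 512 bytes, so a full 9000-byte packet consumes strd_cnt = ceil(9000/512) = 18 strides. Since a packet should not span MPRQ buffers, consumed_strd + strd_cnt should never exceed 64, yet it does. A minimal debug guard along these lines (a sketch only, assuming strd_n and strd_cnt are in scope exactly as in the snippet above; not a proposed fix) catches the overrun before the call into mprq_buf_to_pkt():

	/* Sketch: detect the stride accounting drifting past the buffer
	 * size before mprq_buf_to_pkt() is called with a bad index. */
	if (unlikely(rxq->consumed_strd + strd_cnt > strd_n)) {
		DRV_LOG(ERR, "stride overrun: consumed=%u cnt=%u strd_n=%u",
			rxq->consumed_strd, strd_cnt, strd_n);
		MLX5_ASSERT(rxq->consumed_strd + strd_cnt <= strd_n);
	}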
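Similarly, the pkt == seg observation suggests the mbuf still being assembled was returned to the mempool while referenced, which could be consistent with the crash under __rte_pktmbuf_free_extbuf(). A guard of this shape (again just a sketch, placed right after the rte_pktmbuf_alloc() call in the second snippet above) would make that case fail loudly instead of corrupting memory later:

	/* Sketch: the pool should never hand back the mbuf that is still
	 * being assembled; if it does, pkt was freed while in use. */
	if (unlikely(seg == pkt)) {
		DRV_LOG(ERR, "alloc returned in-use mbuf: pkt %p seg %p",
			(void *)pkt, (void *)seg);
		MLX5_ASSERT(seg != pkt);
	}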