From: bugzilla@dpdk.org
To: dev@dpdk.org
Date: Wed, 30 May 2018 13:39:45 +0000
Subject: [dpdk-dev] [Bug 56] crash when freeing memory with no mlx5 device attached

https://dpdk.org/tracker/show_bug.cgi?id=56

            Bug ID: 56
           Summary: crash when freeing memory with no mlx5 device attached
           Product: DPDK
           Version: 18.05
          Hardware: All
                OS: All
            Status: CONFIRMED
          Severity: critical
          Priority: Normal
         Component: other
          Assignee: dev@dpdk.org
          Reporter: david.marchand@6wind.com
  Target Milestone: ---

This problem occurs when a memory free event reaches the mlx5 callback
but no mlx5 device has been initialised (yet).

Looking at the code, the mlx5 driver always registers a memory callback:

RTE_INIT(rte_mlx5_pmd_init);
static void
rte_mlx5_pmd_init(void)
{
	...
	rte_mem_event_callback_register("MLX5_MEM_EVENT_CB",
					mlx5_mr_mem_event_cb, NULL);
}

When invoked, this callback tries to take a lock:

void
mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
		     size_t len, void *arg __rte_unused)
{
	struct priv *priv;
	struct mlx5_dev_list *dev_list = &mlx5_shared_data->mem_event_cb_list;

	switch (event_type) {
	case RTE_MEM_EVENT_FREE:
		rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
		/* Iterate all the existing mlx5 devices. */

But this lock is not initialised unless an mlx5 device has been probed,
since it is initialised in mlx5_prepare_shared_data(), which is called
from mlx5_pci_probe().
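
To illustrate the ordering problem, here is a minimal standalone sketch
(not a tested patch, and not the actual mlx5 code). It assumes the shared
data pointer is still NULL until a device is probed, which the report does
not confirm, and shows one possible guard: the callback ignores events
until the shared data exists. All names below (shared_data, probe_device,
mem_event_cb, run_sketch) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the real DPDK/mlx5 types. */
enum mem_event { MEM_EVENT_ALLOC, MEM_EVENT_FREE };

struct shared_data {
	int lock_ready;  /* stands in for an initialised rwlock */
	int free_events; /* number of free events actually handled */
};

/* Assumption: this stays NULL until a device has been probed. */
static struct shared_data *shared_data;
static struct shared_data storage;

/* Stand-in for the probe path that prepares the shared data. */
static void
probe_device(void)
{
	storage.lock_ready = 1;
	storage.free_events = 0;
	shared_data = &storage;
}

/*
 * Guarded callback: returns 0 when the event is ignored because nothing
 * has been probed yet, 1 when the event has been processed.
 */
static int
mem_event_cb(enum mem_event event_type)
{
	if (shared_data == NULL)
		return 0; /* no device yet: the lock must not be touched */
	if (event_type == MEM_EVENT_FREE) {
		assert(shared_data->lock_ready);
		shared_data->free_events++;
	}
	return 1;
}

/* Exercises both orderings; returns 0 when the guard behaves as intended. */
static int
run_sketch(void)
{
	shared_data = NULL;
	if (mem_event_cb(MEM_EVENT_FREE) != 0)
		return 1; /* free event before probe must be ignored */
	probe_device();
	if (mem_event_cb(MEM_EVENT_FREE) != 1)
		return 2; /* free event after probe must be handled */
	if (shared_data->free_events != 1)
		return 3;
	return 0;
}
```

Another option might be to register the callback only once a device has
been probed; which approach suits the driver is left to the maintainers.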

Reproducing the issue is not straightforward, so I forced an allocation
followed by a free in the testpmd code to make sure a free event would be
triggered:

root@ubuntu1604:~/dpdk# git diff
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 35cf266..79c9531 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2772,6 +2772,8 @@ main(int argc, char** argv)
        }
 #endif
 
+       rte_free(rte_malloc(NULL, 10000000, 0));
+
 #ifdef RTE_LIBRTE_CMDLINE
        if (strlen(cmdline_filename) != 0)
                cmdline_read_from_file(cmdline_filename);

Then:

root@ubuntu1604:~/dpdk# LD_LIBRARY_PATH=/root/rdma-core/build/lib ./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
...
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 90MB
Interactive-mode selected
testpmd: create a new mbuf pool : n=2048, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
Done
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'Segmentation fault (core dumped)

root@ubuntu1604:~/dpdk# gdb ./build/app/testpmd core
...
Core was generated by `./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_rwlock_write_lock (rwl=) at /root/dpdk/build/include/generic/rte_rwlock.h:103
103             x = rwl->cnt;
[Current thread is 1 (Thread 0x7f1871022c00 (LWP 5732))]
(gdb) bt
#0  rte_rwlock_write_lock (rwl=) at /root/dpdk/build/include/generic/rte_rwlock.h:103
#1  mlx5_mr_mem_event_cb (event_type=RTE_MEM_EVENT_FREE, addr=0x7f1474a00000, len=10485760, arg=) at /root/dpdk/drivers/net/mlx5/mlx5_mr.c:884
#2  0x000000000054ae86 in eal_memalloc_mem_event_notify ()
#3  0x0000000000558994 in malloc_heap_free ()
#4  0x000000000055445f in rte_free ()
#5  0x0000000000477231 in main ()

-- 
You are receiving this mail because:
You are the assignee for the bug.