Date: Thu, 27 Jan 2022 11:06:56 +0100
From: Olivier Matz
To: Tianli Lai
Cc: dev@dpdk.org
Subject: Re: [PATCH] mempool: fix rte primary program coredump
In-Reply-To: <1636559839-6553-1-git-send-email-laitianli@tom.com>

Hi Tianli,

On Wed, Nov 10, 2021 at 11:57:19PM +0800, Tianli Lai wrote:
> The primary program (such as the ofp app) runs first, then the secondary
> program (such as dpdk-pdump) is started, and the primary program receives
> SIGSEGV. The call stack is as follows:
>
> received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffee60e700 (LWP 112613)]
> 0x00007ffff5f2cc0b in bucket_stack_pop (stack=0xffff00010000) at
> /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
> 95              if (stack->top == 0)
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.17-196.el7.x86_64 libatomic-4.8.5-16.el7.x86_64
> libconfig-1.4.9-5.el7.x86_64 libgcc-4.8.5-16.el7.x86_64
> libpcap-1.5.3-12.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64
> openssl-libs-1.0.2k-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
> (gdb) bt
> #0  0x00007ffff5f2cc0b in bucket_stack_pop (stack=0xffff00010000) at /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:95
> #1  0x00007ffff5f2e5dc in bucket_dequeue_orphans (bd=0x2209e5fac0, obj_table=0x220b083710, n_orphans=251) at /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:190
> #2  0x00007ffff5f30192 in bucket_dequeue (mp=0x220b07d5c0, obj_table=0x220b083710, n=251) at /ofp/dpdk/drivers/mempool/bucket/rte_mempool_bucket.c:288
> #3  0x00007ffff5f47e18 in rte_mempool_ops_dequeue_bulk (mp=0x220b07d5c0, obj_table=0x220b083710, n=251) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:739
> #4  0x00007ffff5f4819d in __mempool_generic_get (cache=0x220b083700, n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1443
> #5  rte_mempool_generic_get (cache=0x220b083700, n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1506
> #6  rte_mempool_get_bulk (n=1, obj_table=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1539
> #7  rte_mempool_get (obj_p=0x7fffee5deb18, mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mempool.h:1565
> #8  rte_mbuf_raw_alloc (mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:551
> #9  0x00007ffff5f483a4 in rte_pktmbuf_alloc (mp=0x220b07d5c0) at /ofp/dpdk/x86_64-native-linuxapp-gcc/include/rte_mbuf.h:804
> #10 0x00007ffff5f4c9d9 in pdump_pktmbuf_copy (m=0x220746ad80, mp=0x220b07d5c0) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:99
> #11 0x00007ffff5f4e42e in pdump_copy (pkts=0x7fffee5dfdf0, nb_pkts=1, user_params=0x7ffff76d7cc0 ) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:151
> #12 0x00007ffff5f4eadd in pdump_rx (port=0, qidx=0, pkts=0x7fffee5dfdf0, nb_pkts=1, max_pkts=16, user_params=0x7ffff76d7cc0 ) at /ofp/dpdk/lib/librte_pdump/rte_pdump.c:172
> #13 0x00007ffff5d0e9e8 in rte_eth_rx_burst (port_id=0, queue_id=0, rx_pkts=0x7fffee5dfdf0, nb_pkts=16) at /ofp/dpdk/x86_64-native-linuxapp-gcc/usr/local/include/dpdk/rte_ethdev.h:4396
> #14 0x00007ffff5d114c3 in recv_pkt_dpdk (pktio_entry=0x22005436c0, index=0, pkt_table=0x7fffee5dfdf0, num=16) at odp_packet_dpdk.c:1081
> #15 0x00007ffff5d2f931 in odp_pktin_recv (queue=..., packets=0x7fffee5dfdf0, num=16) at ../linux-generic/odp_packet_io.c:1896
> #16 0x000000000040a344 in rx_burst (pktin=...) at app_main.c:223
> #17 0x000000000040aca4 in run_server_single (arg=0x7fffffffe2b0) at app_main.c:417
> #18 0x00007ffff7bd6883 in run_thread (arg=0x7fffffffe3b8) at threads.c:67
> #19 0x00007ffff53c8e25 in start_thread () from /lib64/libpthread.so.0
> #20 0x00007ffff433e34d in clone () from /lib64/libc.so.6
>
> The program crashes for the following reason.
>
> The global array rte_mempool_ops_table.ops[] has a different content in
> the primary and in the secondary program:
>
>       primary name      secondary name
> [0]:  "bucket"          "ring_mp_mc"
> [1]:  "dpaa"            "ring_sp_sc"
> [2]:  "dpaa2"           "ring_mp_sc"
> [3]:  "octeontx_fpavf"  "ring_sp_mc"
> [4]:  "octeontx2_npa"   "octeontx2_npa"
> [5]:  "ring_mp_mc"      "bucket"
> [6]:  "ring_sp_sc"      "stack"
> [7]:  "ring_mp_sc"      "if_stack"
> [8]:  "ring_sp_mc"      "dpaa"
> [9]:  "stack"           "dpaa2"
> [10]: "if_stack"        "octeontx_fpavf"
> [11]: NULL              NULL
>
> Since the array differs between the two programs, when the secondary
> program calls rte_pktmbuf_pool_create_by_ops() with the mempool ops name
> "ring_mp_mc", the primary program uses the "bucket" ops to allocate the
> rte_mbuf.
>
> So sort this array in both the primary and the secondary program when
> initializing the memzone.
>
> Signed-off-by: Tianli Lai

I think it is the same problem as the one described here:
http://inbox.dpdk.org/dev/1583114253-15345-1-git-send-email-xiangxia.m.yue@gmail.com/#r

To summarize what is said in that thread, sorting the ops looks dangerous
because it changes their indexes during the lifetime of the application.
A new proposal was made to use shared memory to ensure the indexes are the
same in the primary and the secondaries, but it requires some changes in
EAL to have init callbacks at a specific place.

I have a draft patchset that may fix this issue by using the vdev
infrastructure instead of a specific init, but it is not heavily tested.
I can send it here as an RFC if you want to try it.

One thing that is not clear to me is how you trigger this issue. Why are
the mempool ops not loaded in the same order in the primary and the
secondary?

Thanks,
Olivier