From: "Tan, Jianfeng"
To: Stephen Hemminger , Thomas Monjalon
Cc: Shreyansh Jain , dev@dpdk.org, Olivier Matz , Anatoly Burakov
Date: Sat, 28 Apr 2018 12:22:28 +0800
Subject: Re: [dpdk-dev] [PATCH] eal: fix threads block on barrier
Message-ID: <7ded8d32-6731-6e13-c6ce-50fd89448132@intel.com>
In-Reply-To: <20180427182442.1384459d@xeon-e3>

On 4/28/2018 9:24 AM, Stephen Hemminger wrote:
> On Fri, 27 Apr 2018 21:52:26 +0200
> Thomas Monjalon wrote:
>
>> 27/04/2018 19:45, Shreyansh Jain:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Shreyansh Jain wrote:
>>>>> From: Jianfeng Tan
>>>>>> Below commit introduced pthread barrier for synchronization.
>>>>>> But two IPC threads block on the barrier, and never wake up.
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>> at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>> #1 futex_wait_simple (private=0, expected=0,
>>>>>> futex_word=0x7fffffffcff4)
>>>>>> at ../sysdeps/nptl/futex-internal.h:135
>>>>>> #2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at
>>>>>> pthread_barrier_wait.c:184
>>>>>> #3 rte_thread_init (arg=0x7fffffffcfe0)
>>>>>> at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>>>>>> #4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>>>>>> #5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>>
>>>>>> Through analysis, we find the barrier defined on the stack
>>>>>> could be the root cause. This patch will change to use heap
>>>>>> memory as the barrier.
>>>>>>
>>>>>> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>>>>>>
>>>>>> Cc: Olivier Matz
>>>>>> Cc: Anatoly Burakov
>>>>>>
>>>>>> Signed-off-by: Jianfeng Tan
>>>>> Though I have seen Stephen's comment on this (possibly a library
>>>> bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
>>>> generating bus errors and futex errors with variation in core masks
>>>> provided to applications.
>>>>> Thanks a lot for this.
>>>>>
>>>>> Acked-by: Shreyansh Jain
>> Applied, thanks Jianfeng.
>>
>>>> Could you verify there is not a use after free by using valgrind or
>>>> some library that poisons memory on free.
>>> I will probably do that soon - but for the time being I don't want
>>> this issue to block the dpaa/dpaa2 for RC1 - these drivers were
>>> completely unusable without this patch.
>> Please Shreyansh, continue the analysis of this bug.
>> Thanks
>>
>>
> The pthread_barrier should also be destroyed when it is no longer needed.

I tried this, and it could also kick the sleeping thread; but because "The effect of subsequent use of the barrier is undefined" once the barrier has been destroyed, I did not go that way.
Anyway, I agree that destroy() shall be called for completeness.

Thanks,
Jianfeng