From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id 1959E49DF
 for ; Sat, 28 Apr 2018 06:15:22 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 27 Apr 2018 21:15:21 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,337,1520924400"; d="scan'208";a="41287322"
Received: from tanjianf-mobl.ccr.corp.intel.com (HELO [10.255.28.178]) ([10.255.28.178])
 by fmsmga002.fm.intel.com with ESMTP; 27 Apr 2018 21:15:20 -0700
To: Stephen Hemminger , Thomas Monjalon
References: <1524847302-88110-1-git-send-email-jianfeng.tan@intel.com>
 <20180427103945.511a118e@xeon-e3> <13763738.ezdo4hZiut@xps>
 <20180427182141.227af689@xeon-e3>
Cc: Shreyansh Jain , dev@dpdk.org, Olivier Matz , Anatoly Burakov
From: "Tan, Jianfeng"
Message-ID: <5bddd33d-32ab-aae7-c97f-5df7ac09b328@intel.com>
Date: Sat, 28 Apr 2018 12:15:20 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <20180427182141.227af689@xeon-e3>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH] eal: fix threads block on barrier
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 28 Apr 2018 04:15:23 -0000

On 4/28/2018 9:21 AM, Stephen Hemminger wrote:
> On Fri, 27 Apr 2018 21:52:26 +0200
> Thomas Monjalon wrote:
>
>> 27/04/2018 19:45, Shreyansh Jain:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Shreyansh Jain wrote:
>>>>> From: Jianfeng Tan
>>>>>> Below commit introduced pthread barrier for synchronization.
>>>>>> But two IPC threads block on the barrier, and never wake up.
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0 futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>> at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>> #1 futex_wait_simple (private=0, expected=0,
>>>>>> futex_word=0x7fffffffcff4)
>>>>>> at ../sysdeps/nptl/futex-internal.h:135
>>>>>> #2 __pthread_barrier_wait (barrier=0x7fffffffcff0) at
>>>>>> pthread_barrier_wait.c:184
>>>>>> #3 rte_thread_init (arg=0x7fffffffcfe0)
>>>>>> at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>>>>>> #4 start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>>>>>> #5 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>>
>>>>>> Through analysis, we find the barrier defined on the stack
>>>>>> could be the root cause. This patch will change to use heap
>>>>>> memory as the barrier.
>>>>>>
>>>>>> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>>>>>>
>>>>>> Cc: Olivier Matz
>>>>>> Cc: Anatoly Burakov
>>>>>>
>>>>>> Signed-off-by: Jianfeng Tan
>>>>> Though I have seen Stephen's comment on this (possibly a library
>>>> bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
>>>> generating bus errors and futex errors with variation in core masks
>>>> provided to applications.
>>>>> Thanks a lot for this.
>>>>>
>>>>> Acked-by: Shreyansh Jain
>> Applied, thanks Jianfeng.
>>
>>>> Could you verify there is not a use after free by using valgrind or
>>>> some library that poisons memory on free.
>>> I will probably do that soon - but for the time being I don't want
>>> this issue to block the dpaa/dpaa2 for RC1 - these drivers were
>>> completely unusable without this patch.
>> Please Shreyansh, continue the analysis of this bug.
>> Thanks
>>
>>
> I think the patch needs to change.
> The attributes need be either global (or leak and never free).
>
> The glibc source for init keeps the pointer to the attributes.

Did not follow why we need to add attr here.
Besides, init only uses attr to decide the futex type (private or shared);
it does not seem to keep the pointer. So I cannot understand why we need to
pass a non-null attr parameter.

Thanks,
Jianfeng

>
>
> static const struct pthread_barrierattr default_barrierattr =
> {
>   .pshared = PTHREAD_PROCESS_PRIVATE
> };
>
>
> int
> __pthread_barrier_init (pthread_barrier_t *barrier,
>                         const pthread_barrierattr_t *attr, unsigned int count)
> {
>   struct pthread_barrier *ibarrier;
>
>   /* XXX EINVAL is not specified by POSIX as a possible error code for COUNT
>      being too large. See pthread_barrier_wait for the reason for the
>      comparison with BARRIER_IN_THRESHOLD. */
>   if (__glibc_unlikely (count == 0 || count >= BARRIER_IN_THRESHOLD))
>     return EINVAL;
>
>   const struct pthread_barrierattr *iattr
>     = (attr != NULL
>        ? (struct pthread_barrierattr *) attr
>        : &default_barrierattr);
>
>   ibarrier = (struct pthread_barrier *) barrier;
>
>   /* Initialize the individual fields. */
>   ibarrier->in = 0;
>   ibarrier->out = 0;
>   ibarrier->count = count;
>   ibarrier->current_round = 0;
>   ibarrier->shared = (iattr->pshared == PTHREAD_PROCESS_PRIVATE
>                       ? FUTEX_PRIVATE : FUTEX_SHARED);
>
>   return 0;
> }
> weak_alias (__pthread_barrier_init, pthread_barrier_init)