From: "Tan, Jianfeng"
To: Olivier Matz, Maxime Coquelin
Cc: dev@dpdk.org, Anatoly Burakov, Thomas Monjalon
Date: Wed, 2 May 2018 17:32:30 +0800
Subject: Re: [dpdk-dev] pthread_barrier_deadlock in -rc1
Message-ID: <7afa9235-cc14-a05f-7f85-87d8a40d447e@intel.com>
In-Reply-To: <20180502092011.5nxl5nbka6zfi4hb@neon>
References: <20180403130439.11151-1-olivier.matz@6wind.com>
 <20180424144651.13145-1-olivier.matz@6wind.com>
 <4256B2F0-EF9D-4B22-AC1A-D440C002360A@6wind.com>
 <39d5baf8-2bad-6df8-0419-a06c65d41475@redhat.com>
 <2d828aa1-482f-7f19-1909-c3ca4599c9b2@intel.com>
 <393a2f7e-ed20-fa28-0b07-aa3374593d5a@redhat.com>
 <20180502092011.5nxl5nbka6zfi4hb@neon>

Hi Maxime and Olivier,

[...]
>>> Below patch can fix another strange sigsegv issue in my VM. Please check
>>> if it works for you. I doubt it's use-after-free problem which could
>>> lead to different issues in different env.
>>> Please have a try.
>>>
>>>
>>> diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
>>> index de69452..d91b67d 100644
>>> --- a/lib/librte_eal/common/eal_common_thread.c
>>> +++ b/lib/librte_eal/common/eal_common_thread.c
>>> @@ -205,6 +205,7 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>>>  		goto fail;
>>>
>>>  	pthread_barrier_wait(&params->configured);
>>> +	pthread_barrier_destroy(&params->configured);
>> Thanks Jianfeng, that fixes my issue.
>> For correctness, I wonder whether we should check pthread_barrier_wait
>> return, and only call destroy() if PTHREAD_BARRIER_SERIAL_THREAD?
>> And so also do same the same thing in rte_thread_init().
>>
>> What do you think?
>> Thanks,
>> Maxime
>
> Thanks for the update. I also have a patch that replaces the barrier by
> a lock which could also work, but if Jianfeng's one fixes the issue, I
> think it is better.
>
> About the PTHREAD_BARRIER_SERIAL_THREAD, not sure it will change
> something:
>
>   Upon successful completion, the pthread_barrier_wait() function
>   shall return PTHREAD_BARRIER_SERIAL_THREAD for a single
>   (arbitrary) thread synchronized at the barrier and zero for each
>   of the other threads. Otherwise, an error number shall be
>   returned to indicate the error.
>
> I understand that it will ensure that only one barrier will return
> PTHREAD_BARRIER_SERIAL_THREAD, but not necessarily the last one. So
> if destroy() is called in the parent thread, it should be the same, no?
>
> By the way, there is also a small memory leak that was introduced by
> the previous patch, maybe you can add the fix too:
>
> -	if (ret != 0)
> +	if (ret != 0) {
> +		free(params);
>  		return ret;
> +	}

How about this: the thread that gets PTHREAD_BARRIER_SERIAL_THREAD
returned is responsible for the destroy() and free(params)?

Thanks,
Jianfeng