From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jianfeng.tan@intel.com>
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
 by dpdk.org (Postfix) with ESMTP id EE53937AC
 for <dev@dpdk.org>; Sat, 28 Apr 2018 06:22:30 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 27 Apr 2018 21:22:30 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,337,1520924400"; d="scan'208";a="41288301"
Received: from tanjianf-mobl.ccr.corp.intel.com (HELO [10.255.28.178])
 ([10.255.28.178])
 by fmsmga002.fm.intel.com with ESMTP; 27 Apr 2018 21:22:28 -0700
To: Stephen Hemminger <stephen@networkplumber.org>,
 Thomas Monjalon <thomas@monjalon.net>
References: <1524847302-88110-1-git-send-email-jianfeng.tan@intel.com>
 <20180427103945.511a118e@xeon-e3>
 <HE1PR0402MB27804445AAA3B7116898D4C9908D0@HE1PR0402MB2780.eurprd04.prod.outlook.com>
 <13763738.ezdo4hZiut@xps> <20180427182442.1384459d@xeon-e3>
Cc: Shreyansh Jain <shreyansh.jain@nxp.com>, dev@dpdk.org,
 Olivier Matz <olivier.matz@6wind.com>,
 Anatoly Burakov <anatoly.burakov@intel.com>
From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
Message-ID: <7ded8d32-6731-6e13-c6ce-50fd89448132@intel.com>
Date: Sat, 28 Apr 2018 12:22:28 +0800
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <20180427182442.1384459d@xeon-e3>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH] eal: fix threads block on barrier
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Sat, 28 Apr 2018 04:22:32 -0000



On 4/28/2018 9:24 AM, Stephen Hemminger wrote:
> On Fri, 27 Apr 2018 21:52:26 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
>> 27/04/2018 19:45, Shreyansh Jain:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>>> Shreyansh Jain <shreyansh.jain@nxp.com> wrote:
>>>>> From: Jianfeng Tan
>>>>>> Below commit introduced pthread barrier for synchronization.
>>>>>> But two IPC threads block on the barrier, and never wake up.
>>>>>>
>>>>>>    (gdb) bt
>>>>>>    #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>>>>>>        at ../sysdeps/unix/sysv/linux/futex-internal.h:61
>>>>>>    #1  futex_wait_simple (private=0, expected=0,
>>>>>> futex_word=0x7fffffffcff4)
>>>>>>        at ../sysdeps/nptl/futex-internal.h:135
>>>>>>    #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at
>>>>>> pthread_barrier_wait.c:184
>>>>>>    #3  rte_thread_init (arg=0x7fffffffcfe0)
>>>>>>        at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
>>>>>>    #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
>>>>>>    #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>>
>>>>>> Through analysis, we find the barrier defined on the stack
>>>>>> could be the root cause. This patch will change to use heap
>>>>>> memory as the barrier.
>>>>>>
>>>>>> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>>>>>>
>>>>>> Cc: Olivier Matz <olivier.matz@6wind.com>
>>>>>> Cc: Anatoly Burakov <anatoly.burakov@intel.com>
>>>>>>
>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>>> Though I have seen Stephen's comment on this (possibly a library
>>>>> bug), this at least fixes an issue which was dogging dpaa and dpaa2 -
>>>>> generating bus errors and futex errors with variation in core masks
>>>>> provided to applications.
>>>>> Thanks a lot for this.
>>>>>
>>>>> Acked-by: Shreyansh Jain <shreyansh.jain@nxp.com>
>> Applied, thanks Jianfeng.
>>
>>>> Could you verify there is not a use after free by using valgrind or
>>>> some library that poisons memory on free.
>>> I will probably do that soon - but for the time being I don't want
>>> this issue to block the dpaa/dpaa2 for RC1 - these drivers were
>>> completely unusable without this patch.
>> Please Shreyansh, continue the analysis of this bug.
>> Thanks
>>
>>
> The pthread_barrier should also be destroyed when it is no longer needed.

I tried that, and calling destroy() does also wake up the sleeping thread.
But because "The effect of subsequent use of the barrier is undefined"
once it has been destroyed, I did not use it as the fix.

Anyway, I agree that destroy() should be called for completeness.

Thanks,
Jianfeng