Date: Fri, 27 Apr 2018 10:03:37 -0700
From: Stephen Hemminger
To: Jianfeng Tan
Cc: dev@dpdk.org, thomas@monjalon.net, Olivier Matz, Anatoly Burakov
Message-ID: <20180427100337.3fca7ca7@xeon-e3>
In-Reply-To: <1524847302-88110-1-git-send-email-jianfeng.tan@intel.com>
References: <1524847302-88110-1-git-send-email-jianfeng.tan@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH] eal: fix threads block on barrier
List-Id: DPDK patches and discussions

On Fri, 27 Apr 2018 16:41:42 +0000
Jianfeng Tan wrote:

> Below commit introduced pthread barrier for synchronization.
> But two IPC threads block on the barrier, and never wake up.
>
> (gdb) bt
> #0  futex_wait (private=0, expected=0, futex_word=0x7fffffffcff4)
>     at ../sysdeps/unix/sysv/linux/futex-internal.h:61
> #1  futex_wait_simple (private=0, expected=0, futex_word=0x7fffffffcff4)
>     at ../sysdeps/nptl/futex-internal.h:135
> #2  __pthread_barrier_wait (barrier=0x7fffffffcff0) at pthread_barrier_wait.c:184
> #3  rte_thread_init (arg=0x7fffffffcfe0)
>     at ../dpdk/lib/librte_eal/common/eal_common_thread.c:160
> #4  start_thread (arg=0x7ffff6ecf700) at pthread_create.c:333
> #5  clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Through analysis, we find the barrier defined on the stack could be the
> root cause. This patch will change to use heap memory as the barrier.
>
> Fixes: d651ee4919cd ("eal: set affinity for control threads")
>
> Cc: Olivier Matz
> Cc: Anatoly Burakov
>
> Signed-off-by: Jianfeng Tan
> ---
>  lib/librte_eal/common/eal_common_thread.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/lib/librte_eal/common/eal_common_thread.c b/lib/librte_eal/common/eal_common_thread.c
> index 4e75cb8..da2b84f 100644
> --- a/lib/librte_eal/common/eal_common_thread.c
> +++ b/lib/librte_eal/common/eal_common_thread.c
> @@ -166,17 +166,21 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>  		const pthread_attr_t *attr,
>  		void *(*start_routine)(void *), void *arg)
>  {
> -	struct rte_thread_ctrl_params params = {
> -		.start_routine = start_routine,
> -		.arg = arg,
> -	};
> +	struct rte_thread_ctrl_params *params;
>  	unsigned int lcore_id;
>  	rte_cpuset_t cpuset;
>  	int cpu_found, ret;
>
> -	pthread_barrier_init(&params.configured, NULL, 2);
> +	params = malloc(sizeof(*params));
> +	if (!params)
> +		return -1;
> +
> +	params->start_routine = start_routine;
> +	params->arg = arg;
>
> -	ret = pthread_create(thread, attr, rte_thread_init, (void *)&params);
> +	pthread_barrier_init(&params->configured, NULL, 2);
> +
> +	ret = pthread_create(thread, attr, rte_thread_init, (void *)params);
>  	if (ret != 0)
>  		return ret;
>
> @@ -203,12 +207,14 @@ rte_ctrl_thread_create(pthread_t *thread, const char *name,
>  	if (ret < 0)
>  		goto fail;
>
> -	pthread_barrier_wait(&params.configured);
> +	pthread_barrier_wait(&params->configured);
> +	free(params);
>
>  	return 0;
>
>  fail:
>  	pthread_cancel(*thread);
>  	pthread_join(*thread, NULL);
> +	free(params);
>  	return ret;
>  }

This looks like a library bug. If there is a race on the configured barrier,
then putting it on the heap just moves the problem. It still has a bug where
the other thread refers to freed memory.