From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 68141A04AF; Thu, 17 Sep 2020 13:21:10 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 53FA51D60D; Thu, 17 Sep 2020 13:21:10 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id 61ACE1D5ED; Thu, 17 Sep 2020 13:21:09 +0200 (CEST) IronPort-SDR: dg1S0tOincJHMHP6Sbp8ss9h2H2oPrc/r4+0DaW4lPQ1jUbME3LUHEo6j2YmLx1jcLdCSAVuU0 MW+sQ/y80czw== X-IronPort-AV: E=McAfee;i="6000,8403,9746"; a="139679839" X-IronPort-AV: E=Sophos;i="5.76,436,1592895600"; d="scan'208";a="139679839" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Sep 2020 04:21:05 -0700 IronPort-SDR: hTZ4fXJxfOrkPepYov6ANixrXO+VCu1FJst92pPjY6exSju9wCKda4HoOwjTfsrIFBN0BKSnuL 696E1gNiCV3A== X-IronPort-AV: E=Sophos;i="5.76,436,1592895600"; d="scan'208";a="483708543" Received: from dhunt5-mobl5.ger.corp.intel.com (HELO [10.213.228.49]) ([10.213.228.49]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Sep 2020 04:21:04 -0700 To: Lukasz Wojciechowski , Bruce Richardson Cc: dev@dpdk.org, stable@dpdk.org References: <20200915193449.13310-1-l.wojciechow@partner.samsung.com> <20200915193449.13310-2-l.wojciechow@partner.samsung.com> From: David Hunt Message-ID: Date: Thu, 17 Sep 2020 12:21:02 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <20200915193449.13310-2-l.wojciechow@partner.samsung.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-GB Subject: Re: [dpdk-dev] [PATCH v1 1/6] app/test: fix deadlock in distributor test X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Lukasz, On 15/9/2020 8:34 PM, Lukasz Wojciechowski wrote: > The sanity test with worker shutdown delegates all bufs > to be processed by a single lcore worker, then it freezes > one of the lcore workers and continues to send more bufs. > > Problem occurred if freezed lcore is the same as the one > that is processing the mbufs. The lcore processing mbufs > might be different every time test is launched. > This is caused by keeping the value of wkr static variable > in rte_distributor_process function between running test cases. > > Test freezed always lcore with 0 id. The patch changes avoids > possible collision by freezing lcore with zero_idx. The lcore > that receives the data updates the zero_idx, so it is not freezed > itself. > > To reproduce the issue fixed by this patch, please run > distributor_autotest command in test app several times in a row. > > Fixes: c3eabff124e6 ("distributor: add unit tests") > Cc: bruce.richardson@intel.com > Cc: stable@dpdk.org > > Signed-off-by: Lukasz Wojciechowski > --- > app/test/test_distributor.c | 22 ++++++++++++++++++++-- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c > index ba1f81cf8..35b25463a 100644 > --- a/app/test/test_distributor.c > +++ b/app/test/test_distributor.c > @@ -28,6 +28,7 @@ struct worker_params worker_params; > static volatile int quit; /**< general quit variable for all threads */ > static volatile int zero_quit; /**< var for when we just want thr0 to quit*/ > static volatile unsigned worker_idx; > +static volatile unsigned zero_idx; > > struct worker_stats { > volatile unsigned handled_packets; > @@ -346,27 +347,43 @@ handle_work_for_shutdown_test(void *arg) > unsigned int total = 0; > unsigned int i; > unsigned int returned = 0; > + unsigned int zero_id = 0; > const unsigned int id = __atomic_fetch_add(&worker_idx, 1, > __ATOMIC_RELAXED); > > num = rte_distributor_get_pkt(d, id, buf, buf, num); > > + zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE); > + if (id == zero_id && num > 0) { > + zero_id = (zero_id + 1) % __atomic_load_n(&worker_idx, > + __ATOMIC_ACQUIRE); > + __atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE); > + } > + > /* wait for quit single globally, or for worker zero, wait > * for zero_quit */ > - while (!quit && !(id == 0 && zero_quit)) { > + while (!quit && !(id == zero_id && zero_quit)) { > worker_stats[id].handled_packets += num; > count += num; > for (i = 0; i < num; i++) > rte_pktmbuf_free(buf[i]); > num = rte_distributor_get_pkt(d, > id, buf, buf, num); > + > + zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE); > + if (id == zero_id && num > 0) { > + zero_id = (zero_id + 1) % __atomic_load_n(&worker_idx, > + __ATOMIC_ACQUIRE); > + __atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE); > + } > + > total += num; > } > worker_stats[id].handled_packets += num; > count += num; > returned = rte_distributor_return_pkt(d, id, buf, num); > > - if (id == 0) { > + if (id == zero_id) { > /* for worker zero, allow it to restart to pick up last packet > * when all workers are shutting down. > */ > @@ -586,6 +603,7 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p) > rte_eal_mp_wait_lcore(); > quit = 0; > worker_idx = 0; > + zero_idx = 0; > } > > static int Lockup reproducable if you run the distributor_autotest 19 times in succesion. I was able to run the test many times more than that with the patch applied. Thanks. Tested-by: David Hunt