From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <stable-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 2C24CA04AF
	for <public@inbox.dpdk.org>; Thu, 17 Sep 2020 13:21:13 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 8107B1C1FD;
	Thu, 17 Sep 2020 13:21:11 +0200 (CEST)
Received: from mga17.intel.com (mga17.intel.com [192.55.52.151])
 by dpdk.org (Postfix) with ESMTP id 61ACE1D5ED;
 Thu, 17 Sep 2020 13:21:09 +0200 (CEST)
IronPort-SDR: dg1S0tOincJHMHP6Sbp8ss9h2H2oPrc/r4+0DaW4lPQ1jUbME3LUHEo6j2YmLx1jcLdCSAVuU0
 MW+sQ/y80czw==
X-IronPort-AV: E=McAfee;i="6000,8403,9746"; a="139679839"
X-IronPort-AV: E=Sophos;i="5.76,436,1592895600"; d="scan'208";a="139679839"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
 by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 17 Sep 2020 04:21:05 -0700
IronPort-SDR: hTZ4fXJxfOrkPepYov6ANixrXO+VCu1FJst92pPjY6exSju9wCKda4HoOwjTfsrIFBN0BKSnuL
 696E1gNiCV3A==
X-IronPort-AV: E=Sophos;i="5.76,436,1592895600"; d="scan'208";a="483708543"
Received: from dhunt5-mobl5.ger.corp.intel.com (HELO [10.213.228.49])
 ([10.213.228.49])
 by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 17 Sep 2020 04:21:04 -0700
To: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>,
 Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org, stable@dpdk.org
References: <20200915193449.13310-1-l.wojciechow@partner.samsung.com>
 <CGME20200915193457eucas1p2adbe25c41a0e4ef16c029e7bff104503@eucas1p2.samsung.com>
 <20200915193449.13310-2-l.wojciechow@partner.samsung.com>
From: David Hunt <david.hunt@intel.com>
Message-ID: <fcdc0ab2-3829-d1c9-f9fa-31c5a0c6ecf0@intel.com>
Date: Thu, 17 Sep 2020 12:21:02 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To: <20200915193449.13310-2-l.wojciechow@partner.samsung.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-Language: en-GB
Subject: Re: [dpdk-stable] [PATCH v1 1/6] app/test: fix deadlock in
	distributor test
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
Errors-To: stable-bounces@dpdk.org
Sender: "stable" <stable-bounces@dpdk.org>

Hi Lukasz,

On 15/9/2020 8:34 PM, Lukasz Wojciechowski wrote:
> The sanity test with worker shutdown delegates all bufs
> to be processed by a single lcore worker, then it freezes
> one of the lcore workers and continues to send more bufs.
>
> Problem occurred if freezed lcore is the same as the one
> that is processing the mbufs. The lcore processing mbufs
> might be different every time test is launched.
> This is caused by keeping the value of wkr static variable
> in rte_distributor_process function between running test cases.
>
> Test freezed always lcore with 0 id. The patch changes avoids
> possible collision by freezing lcore with zero_idx. The lcore
> that receives the data updates the zero_idx, so it is not freezed
> itself.
>
> To reproduce the issue fixed by this patch, please run
> distributor_autotest command in test app several times in a row.
>
> Fixes: c3eabff124e6 ("distributor: add unit tests")
> Cc: bruce.richardson@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> ---
>   app/test/test_distributor.c | 22 ++++++++++++++++++++--
>   1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> index ba1f81cf8..35b25463a 100644
> --- a/app/test/test_distributor.c
> +++ b/app/test/test_distributor.c
> @@ -28,6 +28,7 @@ struct worker_params worker_params;
>   static volatile int quit;      /**< general quit variable for all threads */
>   static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
>   static volatile unsigned worker_idx;
> +static volatile unsigned zero_idx;
>   
>   struct worker_stats {
>   	volatile unsigned handled_packets;
> @@ -346,27 +347,43 @@ handle_work_for_shutdown_test(void *arg)
>   	unsigned int total = 0;
>   	unsigned int i;
>   	unsigned int returned = 0;
> +	unsigned int zero_id = 0;
>   	const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
>   			__ATOMIC_RELAXED);
>   
>   	num = rte_distributor_get_pkt(d, id, buf, buf, num);
>   
> +	zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
> +	if (id == zero_id && num > 0) {
> +		zero_id = (zero_id + 1) %  __atomic_load_n(&worker_idx,
> +			__ATOMIC_ACQUIRE);
> +		__atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE);
> +	}
> +
>   	/* wait for quit single globally, or for worker zero, wait
>   	 * for zero_quit */
> -	while (!quit && !(id == 0 && zero_quit)) {
> +	while (!quit && !(id == zero_id && zero_quit)) {
>   		worker_stats[id].handled_packets += num;
>   		count += num;
>   		for (i = 0; i < num; i++)
>   			rte_pktmbuf_free(buf[i]);
>   		num = rte_distributor_get_pkt(d,
>   				id, buf, buf, num);
> +
> +		zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
> +		if (id == zero_id && num > 0) {
> +			zero_id = (zero_id + 1) %  __atomic_load_n(&worker_idx,
> +				__ATOMIC_ACQUIRE);
> +			__atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE);
> +		}
> +
>   		total += num;
>   	}
>   	worker_stats[id].handled_packets += num;
>   	count += num;
>   	returned = rte_distributor_return_pkt(d, id, buf, num);
>   
> -	if (id == 0) {
> +	if (id == zero_id) {
>   		/* for worker zero, allow it to restart to pick up last packet
>   		 * when all workers are shutting down.
>   		 */
> @@ -586,6 +603,7 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
>   	rte_eal_mp_wait_lcore();
>   	quit = 0;
>   	worker_idx = 0;
> +	zero_idx = 0;
>   }
>   
>   static int


Lockup reproducable if you run the distributor_autotest 19 times in 
succesion. I was able to run the test many times more than that with the 
patch applied. Thanks.

Tested-by: David Hunt <david.hunt@intel.com>