From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 87AE7A04B5; Wed, 30 Sep 2020 22:23:07 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 96CC81D556; Wed, 30 Sep 2020 22:23:05 +0200 (CEST) Received: from mailout2.w1.samsung.com (mailout2.w1.samsung.com [210.118.77.12]) by dpdk.org (Postfix) with ESMTP id 563921C23D for ; Wed, 30 Sep 2020 22:23:04 +0200 (CEST) Received: from eucas1p1.samsung.com (unknown [182.198.249.206]) by mailout2.w1.samsung.com (KnoxPortal) with ESMTP id 20200930202302euoutp020bee09340fee169b5b31ef719ec3c55c~5qWJXOiWV1739617396euoutp02g for ; Wed, 30 Sep 2020 20:23:02 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 mailout2.w1.samsung.com 20200930202302euoutp020bee09340fee169b5b31ef719ec3c55c~5qWJXOiWV1739617396euoutp02g DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=samsung.com; s=mail20170921; t=1601497382; bh=p9tG5sl1WT5Rus1BNUpMVPilkfobweeW10NM8vIGvJc=; h=Subject:To:Cc:From:Date:In-Reply-To:References:From; b=hyqvNnj4WO2UoQ34TALUtANrAg/PjBMqK9+SVswfQV45ZcFrKIvJKgmoVHHQSi1B3 SjbL0kv2r3zw0+KO6CUS21ryPVNe5zKJizNHwDE8COSruF+CtKPw5XrsShvhMDtXEO dQkBMFm1ZGVt+NeOG74OOpz0mMNu9xJctEkV/+nc= Received: from eusmges3new.samsung.com (unknown [203.254.199.245]) by eucas1p2.samsung.com (KnoxPortal) with ESMTP id 20200930202301eucas1p275e4438f7aa6e0c7fb9c44a05210d8de~5qWIilgLN2068920689eucas1p2V; Wed, 30 Sep 2020 20:23:01 +0000 (GMT) Received: from eucas1p2.samsung.com ( [182.198.249.207]) by eusmges3new.samsung.com (EUCPMTA) with SMTP id 7B.82.06318.529E47F5; Wed, 30 Sep 2020 21:23:01 +0100 (BST) Received: from eusmtrp2.samsung.com (unknown [182.198.249.139]) by eucas1p2.samsung.com (KnoxPortal) with ESMTPA id 20200930202300eucas1p2990f41e23b6c4520d62f4bd825e73c2f~5qWHkP4ev3200432004eucas1p2c; Wed, 30 Sep 2020 20:23:00 +0000 (GMT) Received: from eusmgms1.samsung.com (unknown [182.198.249.179]) by eusmtrp2.samsung.com (KnoxPortal) with ESMTP id 20200930202300eusmtrp27fa7765471321ca49a32927a6cb6b6c1~5qWHjmtTx2336323363eusmtrp2R; Wed, 30 Sep 2020 20:23:00 +0000 (GMT) X-AuditID: cbfec7f5-38bff700000018ae-58-5f74e9259bd7 Received: from eusmtip2.samsung.com ( [203.254.199.222]) by eusmgms1.samsung.com (EUCPMTA) with SMTP id 03.CA.06314.429E47F5; Wed, 30 Sep 2020 21:23:00 +0100 (BST) Received: from [106.210.88.70] (unknown [106.210.88.70]) by eusmtip2.samsung.com (KnoxPortal) with ESMTPA id 20200930202258eusmtip25ddbba340a413b96fdd17ae90413829f~5qWGABYdG0106801068eusmtip26; Wed, 30 Sep 2020 20:22:55 +0000 (GMT) To: Honnappa Nagarahalli , David Hunt , Bruce Richardson Cc: "dev@dpdk.org" , "stable@dpdk.org" , nd From: Lukasz Wojciechowski Message-ID: <1df7468a-dd9a-a998-8af7-febb09952f7e@partner.samsung.com> Date: Wed, 30 Sep 2020 22:22:41 +0200 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Transfer-Encoding: 8bit Content-Language: en-US X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFprNKsWRmVeSWpSXmKPExsWy7djP87qqL0viDVZuE7G4screom/SRyaL d5+2M1nMfNrCbnFmeQ+zxb+OP+wObB5r5q1h9Pi1YCmrx+I9L5kCmKO4bFJSczLLUov07RK4 Mrb9msRYsEuz4ukvqwbG5wpdjBwcEgImEg07WbsYuTiEBFYwSrRte8UC4XxhlDj+YSczhPOZ UaL58VX2LkZOsI6Thy+wQSSWM0r8u9PKCpIQEnjLKHHgexqILSwQLfH38BOwIhGBbkaJSWfP MIIkmAUiJU6eeQU2iU3AVuLIzK9gzbwCbhIvbnUxg9gsAqoST87tAKsRFYiTOHbqEQtEjaDE yZlPwGxOgViJj61v2CFmyks0b53NDGGLS9x6Mp8JZLGEwDx2iTs3tzFBnO0isXvDURYIW1ji 1fEtUO/ISPzfCdOwjVHi6u+fjBDOfkaJ670roKqsJQ7/+80GCjJmAU2J9bv0IcKOEitOfGGB hCSfxI23ghBH8ElM2jadGSLMK9HRJgRRrSfxtGcqI8zaP2ufsExgVJqF5LVZSN6ZheSdWQh7 FzCyrGIUTy0tzk1PLTbOSy3XK07MLS7NS9dLzs/dxAhMMqf/Hf+6g3Hfn6RDjAIcjEo8vBPy SuKFWBPLiitzDzFKcDArifA6nT0dJ8SbklhZlVqUH19UmpNafIhRmoNFSZzXeNHLWCGB9MSS 1OzU1ILUIpgsEwenVANjiJST+roFhQpHTSZtvXJuodjh5eovXe2Y9+Yb77n+572Ipe2+mQ0X /pVNXtV/xN7y5iGRo07XNN9NKj9z7uSsK84qHkxuu5RCq224LZuYJnyumXTuyMLMNMWtk76/ PxpqkHNrxy5/ttWbZm2N3ukvfebtniOrfEo3/vkwx6VqmaNM7lk9tuKXLEosxRmJhlrMRcWJ AJuNqWQuAwAA X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFlrHIsWRmVeSWpSXmKPExsVy+t/xe7oqL0viDabtFbS4screom/SRyaL d5+2M1nMfNrCbnFmeQ+zxb+OP+wObB5r5q1h9Pi1YCmrx+I9L5kCmKP0bIryS0tSFTLyi0ts laINLYz0DC0t9IxMLPUMjc1jrYxMlfTtbFJSczLLUov07RL0Mrb9msRYsEuz4ukvqwbG5wpd jJwcEgImEicPX2DrYuTiEBJYyihxtXUVcxcjB1BCRuLDJQGIGmGJP9e6oGpeM0ocWbyfHSQh LBAt8ffwE7CEiEAvo8T2rrlMIAlmgUiJKQtmM0N0PGGSWLF8AQtIgk3AVuLIzK+sIDavgJvE i1tdzCA2i4CqxJNzO8CmigrESZzpecEGUSMocXLmE7BeToFYiY+tb9ghFphJzNv8kBnClpdo 3jobyhaXuPVkPtMERqFZSNpnIWmZhaRlFpKWBYwsqxhFUkuLc9Nziw31ihNzi0vz0vWS83M3 MQLjatuxn5t3MF7aGHyIUYCDUYmHd0JeSbwQa2JZcWXuIUYJDmYlEV6ns6fjhHhTEiurUovy 44tKc1KLDzGaAj03kVlKNDkfGPNAPYamhuYWlobmxubGZhZK4rwdAgdjhATSE0tSs1NTC1KL YPqYODilGhjLreu+6PSsvr6l453tppUyBjs7wpleWfscC0xselUgfkGBL+5a9PoDZe8DBFzm Kc3+XMN5+ZEMw+27K90nnDCSUV51etX/P6dkjoeUzr+tEv3k4/IX+45IT+G12XS/8+DZEHfJ bztfnM744/ro0CLv2xfSOXwsFM507srV5ua2EgienfGhWiZAiaU4I9FQi7moOBEA9krBcsEC AAA= X-CMS-MailID: 20200930202300eucas1p2990f41e23b6c4520d62f4bd825e73c2f X-Msg-Generator: CA Content-Type: text/plain; charset="utf-8" X-RootMTR: 20200925224216eucas1p1e8e1d0ecab4bbbf6e43b117c1d210649 X-EPHeader: CA CMS-TYPE: 201P X-CMS-RootMailID: 20200925224216eucas1p1e8e1d0ecab4bbbf6e43b117c1d210649 References: <20200923132541.21417-1-l.wojciechow@partner.samsung.com> <20200925224209.12173-1-l.wojciechow@partner.samsung.com> <20200925224209.12173-2-l.wojciechow@partner.samsung.com> Subject: Re: [dpdk-dev] [PATCH v4 1/8] test/distributor: fix deadlock with freezed worker X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Honnappa, Thank you very much for your review Reply inline below W dniu 28.09.2020 o 01:34, Honnappa Nagarahalli pisze: > Hi Lukasz, > Few comments inline > > > >> The sanity test with worker shutdown delegates all bufs to be processed by a >> single lcore worker, then it freezes one of the lcore workers and continues to >> send more bufs. > The designated core to freeze (core with id == 0 in the existing code) gets out of the first while loop and gets into the 2nd while loop in the function ' handle_work_for_shutdown_test'. > In between these 2 while loops, it informs the distributor that it will not accept any more packets by calling ' rte_distributor_return_pkt' (at least this API is supposed to do that). But, the distributor hangs waiting for the frozen core to start accepting packets. I think this is a problem with the distributor and not the test case. I agree. I did some further investigation and you are correct. This is the distributor issue. The new burst model doesn't care at all if the worker has called rte_distributor_return_pkt(). It it doesn't find a worker with matching tag, it will process packets to the worker without checking if it requested for packets. The legacy single model used a different handshake value to indicate it does not want any more packets. The flag is reused for other purposes in burst model (marking valid return packets) and that's obviously wrong. The tests however also need to be adjusted as they don't verify the request/return status of worker properly. I hope I will be able to update the patches this or next week to fix it. > >> Problem occurred if freezed lcore is the same as the one that is processing >> the mbufs. The lcore processing mbufs might be different every time test is >> launched. >> This is caused by keeping the value of wkr static variable in >> rte_distributor_process function between running test cases. >> >> Test freezed always lcore with 0 id. The patch changes avoids possible >> collision by freezing lcore with zero_idx. The lcore that receives the data >> updates the zero_idx, so it is not freezed itself. >> >> To reproduce the issue fixed by this patch, please run distributor_autotest >> command in test app several times in a row. >> >> Fixes: c3eabff124e6 ("distributor: add unit tests") >> Cc: bruce.richardson@intel.com >> Cc: stable@dpdk.org >> >> Signed-off-by: Lukasz Wojciechowski >> Tested-by: David Hunt >> --- >> app/test/test_distributor.c | 22 ++++++++++++++++++++-- >> 1 file changed, 20 insertions(+), 2 deletions(-) >> >> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c index >> ba1f81cf8..35b25463a 100644 >> --- a/app/test/test_distributor.c >> +++ b/app/test/test_distributor.c >> @@ -28,6 +28,7 @@ struct worker_params worker_params; >> static volatile int quit; /**< general quit variable for all threads */ >> static volatile int zero_quit; /**< var for when we just want thr0 to quit*/ >> static volatile unsigned worker_idx; >> +static volatile unsigned zero_idx; >> >> struct worker_stats { >> volatile unsigned handled_packets; >> @@ -346,27 +347,43 @@ handle_work_for_shutdown_test(void *arg) >> unsigned int total = 0; >> unsigned int i; >> unsigned int returned = 0; >> + unsigned int zero_id = 0; >> const unsigned int id = __atomic_fetch_add(&worker_idx, 1, >> __ATOMIC_RELAXED); >> >> num = rte_distributor_get_pkt(d, id, buf, buf, num); >> >> + zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE); >> + if (id == zero_id && num > 0) { >> + zero_id = (zero_id + 1) % __atomic_load_n(&worker_idx, >> + __ATOMIC_ACQUIRE); >> + __atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE); >> + } >> + >> /* wait for quit single globally, or for worker zero, wait >> * for zero_quit */ >> - while (!quit && !(id == 0 && zero_quit)) { >> + while (!quit && !(id == zero_id && zero_quit)) { >> worker_stats[id].handled_packets += num; >> count += num; >> for (i = 0; i < num; i++) >> rte_pktmbuf_free(buf[i]); >> num = rte_distributor_get_pkt(d, >> id, buf, buf, num); >> + >> + zero_id = __atomic_load_n(&zero_idx, >> __ATOMIC_ACQUIRE); >> + if (id == zero_id && num > 0) { >> + zero_id = (zero_id + 1) % >> __atomic_load_n(&worker_idx, >> + __ATOMIC_ACQUIRE); >> + __atomic_store_n(&zero_idx, zero_id, >> __ATOMIC_RELEASE); >> + } >> + >> total += num; >> } >> worker_stats[id].handled_packets += num; >> count += num; >> returned = rte_distributor_return_pkt(d, id, buf, num); >> >> - if (id == 0) { >> + if (id == zero_id) { >> /* for worker zero, allow it to restart to pick up last packet >> * when all workers are shutting down. >> */ >> @@ -586,6 +603,7 @@ quit_workers(struct worker_params *wp, struct >> rte_mempool *p) >> rte_eal_mp_wait_lcore(); >> quit = 0; >> worker_idx = 0; >> + zero_idx = 0; >> } >> >> static int >> -- >> 2.17.1 -- Lukasz Wojciechowski Principal Software Engineer Samsung R&D Institute Poland Samsung Electronics Office +48 22 377 88 25 l.wojciechow@partner.samsung.com