Date: Tue, 22 Nov 2022 09:28:57 -0800
From: Stephen Hemminger
To: "Zhou, YidingX"
Cc: "ferruh.yigit@amd.com", "dev@dpdk.org", "Burakov, Anatoly", "He, Xingguang", "stable@dpdk.org"
Subject: Re: [PATCH v2] net/pcap: fix timeout of stopping device
Message-ID: <20221122092857.53a50b2e@hermes.local>
References: <20220825072041.10768-1-yidingx.zhou@intel.com> <20220906080511.46088-1-yidingx.zhou@intel.com> <20220906075737.2fb429a5@hermes.local>
List-Id: patches for DPDK stable branches

On Tue, 22 Nov 2022 09:25:33 +0000
"Zhou, YidingX" wrote:

> > -----Original Message-----
> > From: Zhou, YidingX
> > Sent: Wednesday, September 21, 2022 3:15 PM
> > To: Stephen Hemminger ; Zhang, Qi Z
> > Cc: dev@dpdk.org; Burakov, Anatoly ; He,
> > Xingguang ; stable@dpdk.org
> > Subject: RE: [PATCH v2] net/pcap: fix timeout of stopping device
> >
> >
> >
> > > -----Original Message-----
> > > From: Stephen Hemminger
> > > Sent: Tuesday, September 6, 2022 10:58 PM
> > > To: Zhou, YidingX
> > > Cc: mailto:dev@dpdk.org; Zhang, Qi Z ; Burakov, Anatoly
> > > ; He, Xingguang ;
> > > mailto:stable@dpdk.org
> > > Subject: Re: [PATCH v2] net/pcap: fix timeout of stopping device
> > >
> > > On Tue, 6 Sep 2022 16:05:11 +0800
> > > Yiding Zhou wrote:
> > >
> > > > The pcap file will be synchronized to the disk when stopping the device.
> > > > It takes a long time if the file is large that would cause the
> > > > 'detach sync request' timeout when the device is closed under
> > > > multi-process scenario.
> > > >
> > > > This commit fixes the issue by using alarm handler to release dumper.
> > > >
> > > > Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
> > > > Cc: mailto:stable@dpdk.org
> > > >
> > > > Signed-off-by: Yiding Zhou
> > >
> > >
> > > I think you need to redesign the handshake if this the case.
> > > Forcing 30 second delay at the end of all uses of pcap is not acceptable.
> >
> > @Zhang, Qi Z Do we need to redesign the handshake to fix this?
>
> Hi, Ferruh
> Sorry for the late reply.
> I did not receive your email on Oct 6, I got your comments from patchwork.
>
> "Can you please provide more details on multi-process communication and
> call trace, to help us think about a solution to address this issue in a
> more generic way (not just for pcap but for any case device close takes
> more than multi-process timeout)?"
>
> I try to explain this issue with a sequence diagram, hope it can be displayed correctly in the mail.
>
>       thread               intr thread              intr thread          thread
>    of secondary           of secondary              of primary         of primary
>          |                      |                        |                  |
>          |                      |                        |                  |
> rte_eal_hotplug_remove
> rte_dev_remove
> eal_dev_hotplug_request_to_primary
> rte_mp_request_sync ------------------------------------------------------->|
>                                                                             |
>                                                                 handle_secondary_request
>                                                          |<-----------------|
>                                                          |
>                                             __handle_secondary_request
>                                             eal_dev_hotplug_request_to_secondary
>          |<------------------------------------- rte_mp_request_sync
>          |
> handle_primary_request--------->|
>                                 |
>                     __handle_primary_request
>                     local_dev_remove(this will take long time)
>                     rte_mp_reply ----------------------->|
>                                                          |
>                                                  local_dev_remove
>          |<--------------------------------------- rte_mp_reply
>
> The marked 'local_dev_remove()' in the secondary process will perform a pcap file synchronization operation.
> When the pcap file is too large, it will take a lot of time (according to my test 100G takes 20+ seconds).
> This caused the processing of hot_plug message to time out.

Part of the problem may be a hidden file sync in some library.
Normally, closing a file should be fast even with lots of outstanding data.
The actual write done by the OS will continue from the file cache.

I wonder if doing some kind of fadvise call might help; see
POSIX_FADV_SEQUENTIAL or POSIX_FADV_DONTNEED.
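
For illustration only, here is a minimal sketch of what the fadvise idea
could look like on a plain file descriptor. It is untested against the
pcap PMD. The helper name write_capture_with_fadvise() and the CHUNK_SIZE
and ADVISE_EVERY values are hypothetical, not anything from DPDK or
libpcap. In the real driver the descriptor would presumably have to be
obtained from the dumper (for example via fileno() on the FILE returned
by pcap_dump_file()), which is an assumption and is not shown here.
posix_fadvise() is advisory only: per the Linux man page,
POSIX_FADV_DONTNEED does not drop pages that have not yet been written
out, so this may or may not shorten the close.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_fadvise() */
#endif
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical tuning values, chosen only for this sketch. */
#define CHUNK_SIZE   (64 * 1024)
#define ADVISE_EVERY (1024 * 1024)

/*
 * Hypothetical helper: write `total` bytes of filler data to `path`,
 * giving the kernel fadvise hints along the way.
 * Returns 0 on success, -1 on failure.
 */
int write_capture_with_fadvise(const char *path, size_t total)
{
    static char buf[CHUNK_SIZE];
    size_t written = 0;
    size_t since_advice = 0;
    int fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    memset(buf, 0xab, sizeof(buf));

    /* Hint that the file is accessed sequentially. The call is
     * advisory, so its return value is deliberately ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while (written < total) {
        size_t want = total - written;
        ssize_t n;

        if (want > sizeof(buf))
            want = sizeof(buf);

        n = write(fd, buf, want);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        written += (size_t)n;
        since_advice += (size_t)n;

        if (since_advice >= ADVISE_EVERY) {
            /* Tell the kernel the cached pages of this file are not
             * needed again (offset 0, len 0 means the whole file).
             * Pages still dirty are not dropped by this call. */
            (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
            since_advice = 0;
        }
    }

    return close(fd);
}
```

A caller would use it as write_capture_with_fadvise("/tmp/test.pcap", size)
and time the final close() with and without the fadvise calls to see
whether the hints make any difference on a given kernel and filesystem.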