From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 35B12A054F; Tue, 2 Mar 2021 17:44:48 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 15C0840693; Tue, 2 Mar 2021 17:44:45 +0100 (CET) Received: from mail-il1-f182.google.com (mail-il1-f182.google.com [209.85.166.182]) by mails.dpdk.org (Postfix) with ESMTP id 8552A40142 for ; Tue, 2 Mar 2021 17:44:43 +0100 (CET) Received: by mail-il1-f182.google.com with SMTP id p10so6452683ils.9 for ; Tue, 02 Mar 2021 08:44:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Ow5GhVvd0DzVBnW92CR8SM9pLK6zohUJvkDGCgZSH6Y=; b=kIAMbkoAeRmlKnElJGrCk5J9uCycY2XNbrQjpHugoo7Oj6ZlZk5F4EmsPeqX25Gp3t aI013Xng758vDcEqFGsLE7pOPv89u/R4i0SdViOjqcyq/5zk8jq8ytX+BldGuQPvNlxI YS7ZGps1NIM5dfrPVyMYqZq/1oKt0YAfzfi++aF9ZveFdwMh5iCsht+jEs3B61f34Iae hk6ZdFCxAqf6JV6CC+ejPpDa3I99ibYFhz6dz1Fm8IXyBlhXGxS6yzp6nKO+jfXwahlJ QJYHlw5V5BQayGu0Oj5sfzQvm6snv0FRAiHj9Ygprzu77GnVnsPuyAMTSEn3gs3DxjHm NQyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Ow5GhVvd0DzVBnW92CR8SM9pLK6zohUJvkDGCgZSH6Y=; b=hBuHl2s37Rq3OfxD95M07RIYmHjEdUbdjoLMDeBPOX6SA5I9JKOkmoV36Czc7D+3TQ w0MMYMGpK3VNWAutT0m7cWI1BF0IJDIqLU9Uj90Vvq/TlzJIIHHfBCIBeO6xH6kIPnOQ 5kb7cggLp0qzyymYFZiM2pOESres64RvvzOj1ZzK9oOxY0ln5cSL3K7oLgIk16nJ+Tw6 AGt8vFnXvasQtXyY+jr4FHwwIXKVYLwUK8q7KbKgXUlp67F4khcZ5sLyUfxho8Je95Lf 1P6MEW6VF78g2jyhZyRhflX1cG0A6WkG/CIOobzA75IvoYmh1gnG5aJ24wRghzaRnyEg n60g== X-Gm-Message-State: AOAM530PVdBS+q029IrBZAx3La6XQBKjjTqnLBNLu52hZu5Gsi1iaXdr hCO6Tt5FfiI2O4xv+rhrSjxphSMovNxu6vRE+mk= X-Google-Smtp-Source: ABdhPJwg78xb8a6Eo005Yu0Q4U5pTMOL5JE9DkdcbpYfcx/LJ2MIkF309xZxzwyNTc+1Vii3XM6/4VRO6AMfuCeet8I= X-Received: by 2002:a05:6e02:14c2:: with SMTP id o2mr6929064ilk.91.1614703482768; Tue, 02 Mar 2021 08:44:42 -0800 (PST) MIME-Version: 1.0 References: <20201126144613.4986-1-eladv6@gmail.com> <20210225143239.14220-1-eladv6@gmail.com> <20210225143239.14220-2-eladv6@gmail.com> <20210226074817.7e9e0a71@hermes.local> In-Reply-To: From: Elad Nachman Date: Tue, 2 Mar 2021 18:44:30 +0200 Message-ID: To: Dan Gora Cc: Igor Ryzhov , Stephen Hemminger , Ferruh Yigit , dev Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 Subject: Re: [dpdk-dev] [PATCH 2/2] kni: fix rtnl deadlocks and race conditions v4 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Dan, Thanks for the information but you are addressing a different problem. The problem discussed here is making ifconfig up or down while the DPDK application is running. Elad. =D7=91=D7=AA=D7=90=D7=A8=D7=99=D7=9A =D7=99=D7=95=D7=9D =D7=91=D7=B3, 1 =D7= =91=D7=9E=D7=A8=D7=A5 2021, 23:26, =D7=9E=D7=90=D7=AA Dan Gora =E2=80=8F: > This is from my git commit fixing this: > > kni: separate releasing netdev from freeing KNI interface > > Currently the rte_kni kernel driver suffers from a problem where > when the interface is released, it generates a callback to the DPDK > application to change the interface state to Down. However, after th= e > DPDK application handles the callback and generates a response back t= o > the kernel, the rte_kni driver cannot wake the thread which is asleep > waiting for the response, because it is holding the kni_link_lock > semaphore and it has already removed the 'struct kni_dev' from the > list of interfaces to poll for responses. > > This means that if the KNI interface is in the Up state when > rte_kni_release() is called, it will always sleep for three seconds > until kni_net_release gives up waiting for a response from the DPDK > application. > > To fix this, we must separate the step to release the kernel network > interface from the steps to remove the KNI interface from the list > of interfaces to poll. > > When the kernel network interface is removed with unregister_netdev()= , > if the interface is up, it will generate a callback to mark the > interface down, which calls kni_net_release(). kni_net_release() wil= l > block waiting for the DPDK application to call rte_kni_handle_request= () > to handle the callback, but it also needs the thread in the KNI drive= r > (either the per-dev thread for multi-thread or the per-driver thread) > to call kni_net_poll_resp() in order to wake the thread sleeping in > kni_net_release (actually kni_net_process_request()). > > So now, KNI interfaces should be removed as such: > > 1) The user calls rte_kni_release(). This only unregisters the > netdev in the kernel, but touches nothing else. This allows all the > threads to run which are necessary to handle the callback into the > DPDK application to mark the interface down. > > 2) The user stops the thread running rte_kni_handle_request(). > After rte_kni_release() has been called, there will be no more > callbacks for that interface so it is not necessary. It cannot be > running at the same time that rte_kni_free() frees all of the FIFOs > and DPDK memory for that KNI interface. > > 3) The user calls rte_kni_free(). This performs the RTE_KNI_IOCTL_FR= EE > ioctl which calls kni_ioctl_free(). This function removes the struct > kni_dev from the list of interfaces to poll (and kills the per-dev > kthread, if configured for multi-thread), then frees the memory in > the FIFOs. > > Signed-off-by: Dan Gora > > I'm not sure that this is exactly the problem that you're seeing, but > it sounds like it to me. > > thanks > dan > > On Mon, Mar 1, 2021 at 5:27 PM Dan Gora wrote: > > > > Hi All, > > > > Sorry to butt in on this, but I fixed this same issue about 3 years > > ago in my application, but I was never able to get the changes > > integrated and eventually just gave up trying. > > > > The rule with KNI is: > > 1) The app should have a separate control thread per rte_kni which > > just spins calling rte_kni_handle_request(). This ensures that other > > threads calling rte_kni_XXX functions will always get a response. > > > > 2) In order to deal with lockups and timeouts when closing the device, = I > sent > > patches which separated the closing process into two steps: > > rte_kni_release() which would unregister the underlying netdev, then > > rte_kni_free() which would free the KNI portions of the KNI device. > > When rte_kni_release() is called the kernel netdev is unregistered and > > a response is sent back to the application, the control thread calling > > rte_kni_handle_request() is still running, so the application will > > still get a response back from the kernel and not lock up, the > > application then kills the control thread so that > > rte_kni_handle_request() is not called again, then the application > > calls rte_kni_free() which frees all of the FIFOs and closes the > > device. > > > > If anyone is interested the patches are probably still floating around > > patchwork. If not you can check them out here: > > > > https://github.com/danielgora/dpdk.git > > > > thanks- > > dan > > > > On Mon, Mar 1, 2021 at 5:10 AM Igor Ryzhov wrote: > > > > > > Stephen, > > > > > > No, I don't have a better proposal, but I think it is not correct to > change > > > the behavior of KNI (making link down without a real response). > > > Even though we know that communicating with userspace under rtnl_lock > is a > > > bad idea, it works as it is for many years already. > > > > > > Elad, > > > > > > I agree with you that KNI should be removed from the main tree if it > is not > > > possible to fix this __dev_close_many issue. > > > There were discussions about this multiple times already, but no one = is > > > working on this AFAIK. > > > Last time the discussion was a month ago: > > > https://www.mail-archive.com/dev@dpdk.org/msg196033.html > > > > > > Igor > > > > > > On Fri, Feb 26, 2021 at 8:43 PM Elad Nachman wrote= : > > > > > > > The way the kernel handles its locks and lists for the dev close ma= ny > > > > path, there is no way you can go around this with rtnl unlocked : > > > > " > > > > > > > > There is a race condition in __dev_close_many() processing the > > > > close_list while the application terminates. > > > > It looks like if two vEth devices are terminating, > > > > and one releases the rtnl lock, the other takes it, > > > > updating the close_list in an unstable state, > > > > causing the close_list to become a circular linked list, > > > > hence list_for_each_entry() will endlessly loop inside > > > > __dev_close_many() . > > > > > > > > " > > > > And I don't expect David Miller will bend the kernel networking for > DPDK > > > > or KNI. > > > > > > > > But - Stephen - if you can personally convince David to accept a > > > > kernel patch which will separate the close_list locking mechanism t= o > a > > > > separate (RCU?) lock, then I can introduce first a patch to the > kernel > > > > which will add a lock for the close_list, this way rtnl can be > > > > unlocked for the if down case. > > > > > > > > After that kernel patch, your original patch + relocation of the sy= nc > > > > mutex locking will do the job . > > > >lockups > > > > Otherwise, rtnl has to be kept locked all of the way for the if dow= n > > > > case in order to prevent corruption causing a circular linked list > out > > > > of the close_list, causing a hang in the kernel. > > > >lockups > > > > Currently, the rtnl lock is the only thing keeping the close_list > from > > > > corruption. > > > > > > > > If you doubt rtnl cannot be unlocked for dev close path, you can > > > > consult David for his opinion, as I think it is critical to > understand > > > > what the kernel can or cannot do, or expects to be done before we c= an > > > > unlock its locks as we wish inside rte_kni.ko . > > > > > > > > Otherwise, if we are still in disagreement on how to patch this set > of > > > > problems, I think the responsible way around it is to completely > > > > remove kni from the main dpdk tree and move it to dpdk-kmods > > > > repository. > > > > > > > > I know BSD style open-source does not carry legal responsibility fr= om > > > > the developers, but I think when a bunch of developers know a piece > of > > > > code is highly buggy, they should not leave it for countless new > users > > > > to bounce their head desperately against, if they cannot agree on a > > > > correct way to solve the bunch of problems, of which I think we all > > > > agree exist (we just do not agree on the proper solution or patch).= .. > > > > > > > > That's my two cents, > > > > > > > > Elad. > > > > > > > > On Fri, Feb 26, 2021 at 5:49 PM Stephen Hemminger > > > > wrote: > > > > > > > > > > On Fri, 26 Feb 2021 00:01:01 +0300 > > > > > Igor Ryzhov wrote: > > > > > > > > > > > Hi Elad, > > > > > > > > > > > > Thanks for the patch, but this is still NACK from me. > > > > > > > > > > > > The only real advantage of KNI over other exceptional-path > techniques > > > > > > like virtio-user is the ability to configure DPDK-managed > interfaces > > > > > > directly > > > > > > from the kernel using well-known utils like iproute2. A very > important > > > > part > > > > > > of this is getting responses from the DPDK app and knowing the > actual > > > > > > result of command execution. > > > > > > If you're making async requests to the application and you don'= t > know > > > > > > the result, then what's the point of using KNI at all? > > > > > > > > > > > > Igor > > > > > > > > > > Do you have a better proposal that keeps the request result but > does not > > > > > call userspace with lock held. > > > > > > > > > > PS: I also have strong dislike of KNI, as designed it would have > been > > > > rejected > > > > > by Linux kernel developers. A better solution would be userspace > > > > version of > > > > > something like devlink devices. But doing control operations by > proxy is > > > > > a locking nightmare. > > > > > > > > -- > Dan Gora > Software Engineer > > Adax, Inc. > Rua Dona Maria Alves, 1070 Casa 5 > Centro > Ubatuba, SP > CEP 11680-000 > Brasil > > Tel: +55 (12) 3833-1021 (Brazil and outside of US) > : +1 (510) 859-4801 (Inside of US) > : dan_gora (Skype) > > email: dg@adax.com >