From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 081372B92 for ; Wed, 29 Aug 2018 12:59:17 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 29 Aug 2018 03:59:17 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.53,303,1531810800"; d="scan'208";a="79256217" Received: from fyigit-mobl.ger.corp.intel.com (HELO [10.237.221.56]) ([10.237.221.56]) by orsmga003.jf.intel.com with ESMTP; 29 Aug 2018 03:59:16 -0700 To: Dan Gora Cc: dev@dpdk.org References: <20180628224513.18391-1-dg@adax.com> <20180629015508.26599-1-dg@adax.com> <20180629015508.26599-3-dg@adax.com> From: Ferruh Yigit Openpgp: preference=signencrypt Message-ID: <611163de-bef7-488b-a77b-0e1ff190f1fb@intel.com> Date: Wed, 29 Aug 2018 11:59:15 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <20180629015508.26599-3-dg@adax.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v2 02/10] kni: separate releasing netdev from freeing KNI interface X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 29 Aug 2018 10:59:18 -0000 On 6/29/2018 2:55 AM, Dan Gora wrote: > Currently the rte_kni kernel driver suffers from a problem where > when the interface is released, it generates a callback to the DPDK > application to change the interface state to Down. However, after the > DPDK application handles the callback and generates a response back to > the kernel, the rte_kni driver cannot wake the thread which is asleep > waiting for the response, because it is holding the kni_link_lock > semaphore and it has already removed the 'struct kni_dev' from the > list of interfaces to poll for responses. > > This means that if the KNI interface is in the Up state when > rte_kni_release() is called, it will always sleep for three seconds > until kni_net_release gives up waiting for a response from the DPDK > application. > > To fix this, we must separate the step to release the kernel network > interface from the steps to remove the KNI interface from the list > of interfaces to poll. > > When the kernel network interface is removed with unregister_netdev(), > if the interface is up, it will generate a callback to mark the > interface down, which calls kni_net_release(). kni_net_release() will > block waiting for the DPDK application to call rte_kni_handle_request() > to handle the callback, but it also needs the thread in the KNI driver > (either the per-dev thread for multi-thread or the per-driver thread) > to call kni_net_poll_resp() in order to wake the thread sleeping in > kni_net_release (actually kni_net_process_request()). > > So now, KNI interfaces should be removed as such: > > 1) The user calls rte_kni_release(). This only unregisters the > netdev in the kernel, but touches nothing else. This allows all the > threads to run which are necessary to handle the callback into the > DPDK application to mark the interface down. > > 2) The user stops the thread running rte_kni_handle_request(). > After rte_kni_release() has been called, there will be no more > callbacks for that interface so it is not necessary. It cannot be > running at the same time that rte_kni_free() frees all of the FIFOs > and DPDK memory for that KNI interface. > > 3) The user calls rte_kni_free(). This performs the RTE_KNI_IOCTL_FREE > ioctl which calls kni_ioctl_free(). This function removes the struct > kni_dev from the list of interfaces to poll (and kills the per-dev > kthread, if configured for multi-thread), then frees the memory in > the FIFOs. > > Signed-off-by: Dan Gora You are right, that problem exits. Although I don't see problem related to holding the kni_list_lock, polling thread terminated before unregister interface cause the problem. And it has a reason to terminate polling thread first, because it uses device resources. Separating unregister and free steps looks good, but I am not sure if this should be reflected to the user, with a new ioctl and API. When user done with interface it calls rte_kni_release() to release them, does user really need a rte_kni_free() API or need to know the difference of two, is there any action to take in userspace between these two APIs? I think no. What about keeping single rte_kni_release() API and solve the issue internally in KNI? Previously it was doing: - Stop threads (also there is another single/multi thread error [1]) - kni_dev_remove() - unregister and free netdev() [2] - kni_net_release_fifo_phy() [3] Instead internally can we do: a- Unregister kernel interfaces, rte_kni_unregister()? b- stop threads c- kni_net_release_fifo_phy d- free netdev The challenge I can see is some time required between a) and b) to let userspace app to response, we need a way to know response received before stopping the thread. Another thing is there are two release path, kni_release() and kni_ioctl_release() both should be fixed. [1] If multi thread enabled they have been stopped, but if single thread used it has not been stopped that is why you don't see the 3 seconds delay for default single thread case, but not stopping the polling thread but removing the interface is wrong. [2] unregistering netdev will trigger a userspace request but response won't be read because polling thread also polls the response queue, and that thread is already stopped at this stage. [3] This is also wrong as you have pointed in later patch in your series, kni_net_release_fifo_phy() moves packets from rxq/alloq queue to free queue, queues are still allocated but the references kept in kernel may be invalid at this stage because of free netdev()