From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <arybchenko@solarflare.com>
Received: from dispatch1-us1.ppe-hosted.com (dispatch1-us1.ppe-hosted.com
 [148.163.129.52]) by dpdk.org (Postfix) with ESMTP id 6C5F21B45D
 for <dev@dpdk.org>; Fri,  5 Apr 2019 10:16:08 +0200 (CEST)
X-Virus-Scanned: Proofpoint Essentials engine
Received: from webmail.solarflare.com (uk.solarflare.com [193.34.186.16])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1-us4.ppe-hosted.com (Proofpoint Essentials ESMTP Server) with ESMTPS id
 E95D8BC0074; Fri,  5 Apr 2019 08:16:05 +0000 (UTC)
Received: from [192.168.38.17] (91.220.146.112) by ukex01.SolarFlarecom.com
 (10.17.10.4) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Fri, 5 Apr 2019
 09:15:57 +0100
To: Thomas Monjalon <thomas@monjalon.net>, Qi Zhang <qi.z.zhang@intel.com>
CC: <dev@dpdk.org>, <declan.doherty@intel.com>, <ferruh.yigit@intel.com>,
 <ktraynor@redhat.com>, <sthemmin@microsoft.com>,
 <benjamin.h.shelton@intel.com>, <narender.vangati@intel.com>,
 <david.marchand@redhat.com>
References: <20190320045403.14594-1-qi.z.zhang@intel.com>
 <777209833.n87rErGVB9@xps>
From: Andrew Rybchenko <arybchenko@solarflare.com>
Message-ID: <223ffe37-30ea-43b6-4d01-6d6e9cb6e75c@solarflare.com>
Date: Fri, 5 Apr 2019 11:15:53 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.5.1
MIME-Version: 1.0
In-Reply-To: <777209833.n87rErGVB9@xps>
Content-Language: en-GB
X-Originating-IP: [91.220.146.112]
X-ClientProxiedBy: ocex03.SolarFlarecom.com (10.20.40.36) To
 ukex01.SolarFlarecom.com (10.17.10.4)
X-TM-AS-Product-Ver: SMEX-12.5.0.1300-8.5.1010-24532.003
X-TM-AS-Result: No-17.837600-8.000000-10
X-TMASE-MatchedRID: gjZGo2H/wj8OwH4pD14DsPHkpkyUphL9a7GKgkgPr1F+YesuCgkiXEok
 D8d4JXHvaebNnHxj6GSYV+zO936gFw0OY2pvo6ZKVU3yVpaj3QzVy4hHC3/gyGecrqZc3vabBAd
 PriCD699kiC5EJPsotv99OHJG0AE0kDDfPE15Im1mvZ4BRIIeO0qAhuLHn5fEInzOyTDR1usc5Z
 b9Gn+KyqaRSFufqvFfMiQ6ATUElfaoiMFFv/Tn+Mf52VD5rJyZ4SkIdSwphgYAIXlMppp3X6Dxd
 GOQK0e0aOuXAv7E+ZcAIc/mL/Pz9u9StYP0tme3bc297PAGtWbZph2fCfuod2bs6x3sbgEmKNBK
 gNd80HOPvD59Vg586xTzlSen6y+OGTvDgN8UHcMD2WXLXdz+AdFqG4/BpDVaxiq9BX7QgLDzTu3
 5URal5iBkhyrvdMSlklPOPDP4bOjHuFqMm/pFRcg6fo0rxLVrJd2n2XoSRFnnZVNiuSZvWyzJWG
 rzf/lSNTcc+FGtHL0Ojb0PL1TpnN9faxl/I4mhngIgpj8eDcC063Wh9WVqgtQdB5NUNSsi1GcRA
 JRT6PNBEigZYnGpDFMqzgd1a0ic92H0PYV9AQcjMYmmlZAYwlDv8BEghmSNKHiBSUTsJHmPV3kw
 T3+zHtJjC6SZqqxdywrvmvnfC/wHAH3HntIhvw==
X-TM-AS-User-Approved-Sender: Yes
X-TM-AS-User-Blocked-Sender: No
X-TMASE-Result: 10--17.837600-8.000000
X-TMASE-Version: SMEX-12.5.0.1300-8.5.1010-24532.003
X-MDID: 1554452167-O2JmTTF4vWLs
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-dev] [PATCH] ethdev: claim device reset as async
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 05 Apr 2019 08:16:08 -0000

On 4/5/19 1:01 AM, Thomas Monjalon wrote:
> Hi,
>
> You forgot to Cc Andrew, co-maintainer of ethdev.
>
> 20/03/2019 05:54, Qi Zhang:
>> Device reset should be implemented in an async way since it is
>> possible to be invoked in interrupt thread and sometimes to reset a
>> device need to wait for some dependency, for example, a VF expects for
>> PF ready or a NIC function as part of a SOC wait for the whole system
>> reset complete, and all these time-consuming tasks will block the
>> interrupt thread.
>> The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
>> rework the implementation. It will spawn a new thread which will call
>> ops->dev_reset, and when finished it will raise the event
>> RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
>> this event before it continues to configure and restart the device.
>>
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> ---
>> - * When this function is called, it first stops the port and then calls the
>> - * PMD specific dev_uninit( ) and dev_init( ) to return the port to initial
>> - * state, in which no Tx and Rx queues are setup, as if the port has been
>> - * reset and not started. The port keeps the port id it had before the
>> - * function call.
>> - *
>> - * After calling rte_eth_dev_reset( ), the application should use
>> - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
>> - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
>> - * to reconfigure the device as appropriate.
>> - *
>> - * Note: To avoid unexpected behavior, the application should stop calling
>> - * Tx and Rx functions before calling rte_eth_dev_reset( ). For thread
>> - * safety, all these controlling functions should be called from the same
>> - * thread.
>> + * @note
>> + * Device reset may have the dependency, for example, a VF reset expects
>> + * PF ready, or a NIC function as a part of a SOC need to wait for other
>> + * parts of the system be ready, these are time-consuming tasks and will
>> + * block current thread.
>> + *
>> + * As the name, rte_eth_dev_reset_async is an async API, it will spwan a
>> + * new thread to call ops->dev_reset, once it is finished, it will raise
>> + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.  That makes
>> + * things easy for an application that want to reset the device from the
>> + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET handler is
>> + * invoked in interrupt thread. The typical implementation of ops->dev_reset
>> + * will do some hardware reset operations through calling dev_uninit() and
>> + * dev_init().
>> + *
>> + * Application should not assume device reset is finished after
>> + * rte_eth_dev_reset_async return, it should always wait for a
>> + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
>> + * If reset success, application should call rte_eth_dev_configure( ),
>> + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
>> + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
>> + *
>> + * @Note
>> + * To avoid unexpected behavior, the application should stop calling
>> + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
>>    *
>>    * @param port_id
>>    *   The port identifier of the Ethernet device.
>> @@ -1880,12 +1892,10 @@ void rte_eth_dev_close(uint16_t port_id);
>>    *   - (0) if successful.
>>    *   - (-EINVAL) if port identifier is invalid.
>>    *   - (-ENOTSUP) if hardware doesn't support this function.
>> - *   - (-EPERM) if not ran from the primary process.
>> - *   - (-EIO) if re-initialisation failed or device is removed.
>>    *   - (-ENOMEM) if the reset failed due to OOM.
>> - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
>> + *   - (<0) other errors from low level driver.
>>    */
>> -int rte_eth_dev_reset(uint16_t port_id);
>> +int rte_eth_dev_reset_async(uint16_t port_id);
> Sorry I didn't check whether this API is better or not,
> but I know it cannot be accepted before proposing a deprecation notice.
> Perhaps you may keep the old API and just add the new one.
>
> Honestly, I never really agreed with the purpose of the reset API.
> So making it async or not, I have no real opinion...
> ... but spawning a new thread at each function call, I feel it is bad.

I agree with Thomas.

It looks very dangerous from locking point of view when a PMD
callback is executed from dynamically created thread.

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by dpdk.space (Postfix) with ESMTP id BCCBAA0679
	for <public@inbox.dpdk.org>; Fri,  5 Apr 2019 10:16:11 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id E05EF1B474;
	Fri,  5 Apr 2019 10:16:09 +0200 (CEST)
Received: from dispatch1-us1.ppe-hosted.com (dispatch1-us1.ppe-hosted.com
 [148.163.129.52]) by dpdk.org (Postfix) with ESMTP id 6C5F21B45D
 for <dev@dpdk.org>; Fri,  5 Apr 2019 10:16:08 +0200 (CEST)
X-Virus-Scanned: Proofpoint Essentials engine
Received: from webmail.solarflare.com (uk.solarflare.com [193.34.186.16])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1-us4.ppe-hosted.com (Proofpoint Essentials ESMTP Server) with ESMTPS id
 E95D8BC0074; Fri,  5 Apr 2019 08:16:05 +0000 (UTC)
Received: from [192.168.38.17] (91.220.146.112) by ukex01.SolarFlarecom.com
 (10.17.10.4) with Microsoft SMTP Server (TLS) id 15.0.1395.4; Fri, 5 Apr 2019
 09:15:57 +0100
To: Thomas Monjalon <thomas@monjalon.net>, Qi Zhang <qi.z.zhang@intel.com>
CC: <dev@dpdk.org>, <declan.doherty@intel.com>, <ferruh.yigit@intel.com>,
 <ktraynor@redhat.com>, <sthemmin@microsoft.com>,
 <benjamin.h.shelton@intel.com>, <narender.vangati@intel.com>,
 <david.marchand@redhat.com>
References: <20190320045403.14594-1-qi.z.zhang@intel.com>
 <777209833.n87rErGVB9@xps>
From: Andrew Rybchenko <arybchenko@solarflare.com>
Message-ID: <223ffe37-30ea-43b6-4d01-6d6e9cb6e75c@solarflare.com>
Date: Fri, 5 Apr 2019 11:15:53 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.5.1
MIME-Version: 1.0
In-Reply-To: <777209833.n87rErGVB9@xps>
Content-Language: en-GB
X-Originating-IP: [91.220.146.112]
X-ClientProxiedBy: ocex03.SolarFlarecom.com (10.20.40.36) To
 ukex01.SolarFlarecom.com (10.17.10.4)
X-TM-AS-Product-Ver: SMEX-12.5.0.1300-8.5.1010-24532.003
X-TM-AS-Result: No-17.837600-8.000000-10
X-TMASE-MatchedRID: gjZGo2H/wj8OwH4pD14DsPHkpkyUphL9a7GKgkgPr1F+YesuCgkiXEok
 D8d4JXHvaebNnHxj6GSYV+zO936gFw0OY2pvo6ZKVU3yVpaj3QzVy4hHC3/gyGecrqZc3vabBAd
 PriCD699kiC5EJPsotv99OHJG0AE0kDDfPE15Im1mvZ4BRIIeO0qAhuLHn5fEInzOyTDR1usc5Z
 b9Gn+KyqaRSFufqvFfMiQ6ATUElfaoiMFFv/Tn+Mf52VD5rJyZ4SkIdSwphgYAIXlMppp3X6Dxd
 GOQK0e0aOuXAv7E+ZcAIc/mL/Pz9u9StYP0tme3bc297PAGtWbZph2fCfuod2bs6x3sbgEmKNBK
 gNd80HOPvD59Vg586xTzlSen6y+OGTvDgN8UHcMD2WXLXdz+AdFqG4/BpDVaxiq9BX7QgLDzTu3
 5URal5iBkhyrvdMSlklPOPDP4bOjHuFqMm/pFRcg6fo0rxLVrJd2n2XoSRFnnZVNiuSZvWyzJWG
 rzf/lSNTcc+FGtHL0Ojb0PL1TpnN9faxl/I4mhngIgpj8eDcC063Wh9WVqgtQdB5NUNSsi1GcRA
 JRT6PNBEigZYnGpDFMqzgd1a0ic92H0PYV9AQcjMYmmlZAYwlDv8BEghmSNKHiBSUTsJHmPV3kw
 T3+zHtJjC6SZqqxdywrvmvnfC/wHAH3HntIhvw==
X-TM-AS-User-Approved-Sender: Yes
X-TM-AS-User-Blocked-Sender: No
X-TMASE-Result: 10--17.837600-8.000000
X-TMASE-Version: SMEX-12.5.0.1300-8.5.1010-24532.003
X-MDID: 1554452167-O2JmTTF4vWLs
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-dev] [PATCH] ethdev: claim device reset as async
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>
Message-ID: <20190405081553.haaFg8ampDavJgO1oizfvgH0YxTSueGuetHLvWKNbcQ@z>

On 4/5/19 1:01 AM, Thomas Monjalon wrote:
> Hi,
>
> You forgot to Cc Andrew, co-maintainer of ethdev.
>
> 20/03/2019 05:54, Qi Zhang:
>> Device reset should be implemented in an async way since it is
>> possible to be invoked in interrupt thread and sometimes to reset a
>> device need to wait for some dependency, for example, a VF expects for
>> PF ready or a NIC function as part of a SOC wait for the whole system
>> reset complete, and all these time-consuming tasks will block the
>> interrupt thread.
>> The patch rename rte_eth_dev_reset to rte_eth_dev_reset_async and
>> rework the implementation. It will spawn a new thread which will call
>> ops->dev_reset, and when finished it will raise the event
>> RTE_ETH_EVENT_RESET_COMPLETE. The application should always wait for
>> this event before it continues to configure and restart the device.
>>
>> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
>> ---
>> - * When this function is called, it first stops the port and then calls the
>> - * PMD specific dev_uninit( ) and dev_init( ) to return the port to initial
>> - * state, in which no Tx and Rx queues are setup, as if the port has been
>> - * reset and not started. The port keeps the port id it had before the
>> - * function call.
>> - *
>> - * After calling rte_eth_dev_reset( ), the application should use
>> - * rte_eth_dev_configure( ), rte_eth_rx_queue_setup( ),
>> - * rte_eth_tx_queue_setup( ), and rte_eth_dev_start( )
>> - * to reconfigure the device as appropriate.
>> - *
>> - * Note: To avoid unexpected behavior, the application should stop calling
>> - * Tx and Rx functions before calling rte_eth_dev_reset( ). For thread
>> - * safety, all these controlling functions should be called from the same
>> - * thread.
>> + * @note
>> + * Device reset may have the dependency, for example, a VF reset expects
>> + * PF ready, or a NIC function as a part of a SOC need to wait for other
>> + * parts of the system be ready, these are time-consuming tasks and will
>> + * block current thread.
>> + *
>> + * As the name, rte_eth_dev_reset_async is an async API, it will spwan a
>> + * new thread to call ops->dev_reset, once it is finished, it will raise
>> + * the RTE_ETH_EVENT_RESET_COMPLETE event to notify application.  That makes
>> + * things easy for an application that want to reset the device from the
>> + * interrupt thread since typically a RTE_ETH_EVENT_INTR_RESET handler is
>> + * invoked in interrupt thread. The typical implementation of ops->dev_reset
>> + * will do some hardware reset operations through calling dev_uninit() and
>> + * dev_init().
>> + *
>> + * Application should not assume device reset is finished after
>> + * rte_eth_dev_reset_async return, it should always wait for a
>> + * RTE_ETH_EVENT_RESET_COMPLETE event and check the reset result.
>> + * If reset success, application should call rte_eth_dev_configure( ),
>> + * rte_eth_rx_queue_setup( ), rte_eth_tx_queue_setup( ),
>> + * and rte_eth_dev_start( ) to reconfigure the device as appropriate.
>> + *
>> + * @Note
>> + * To avoid unexpected behavior, the application should stop calling
>> + * Tx and Rx functions before calling rte_eth_dev_reset_async( ).
>>    *
>>    * @param port_id
>>    *   The port identifier of the Ethernet device.
>> @@ -1880,12 +1892,10 @@ void rte_eth_dev_close(uint16_t port_id);
>>    *   - (0) if successful.
>>    *   - (-EINVAL) if port identifier is invalid.
>>    *   - (-ENOTSUP) if hardware doesn't support this function.
>> - *   - (-EPERM) if not ran from the primary process.
>> - *   - (-EIO) if re-initialisation failed or device is removed.
>>    *   - (-ENOMEM) if the reset failed due to OOM.
>> - *   - (-EAGAIN) if the reset temporarily failed and should be retried later.
>> + *   - (<0) other errors from low level driver.
>>    */
>> -int rte_eth_dev_reset(uint16_t port_id);
>> +int rte_eth_dev_reset_async(uint16_t port_id);
> Sorry I didn't check whether this API is better or not,
> but I know it cannot be accepted before proposing a deprecation notice.
> Perhaps you may keep the old API and just add the new one.
>
> Honestly, I never really agreed with the purpose of the reset API.
> So making it async or not, I have no real opinion...
> ... but spawning a new thread at each function call, I feel it is bad.

I agree with Thomas.

It looks very dangerous from locking point of view when a PMD
callback is executed from dynamically created thread.