From: "Guo, Jia"
To: "dev@dpdk.org"
CC: "Ananyev, Konstantin", "stephen@networkplumber.org", "Richardson, Bruce", "Yigit, Ferruh", "gaetan.rivet@6wind.com", "Wu, Jingjing", "thomas@monjalon.net", "motih@mellanox.com", "matan@mellanox.com", "Van Haaren, Harry", "Zhang, Qi Z", "Zhang, Helin", "jblunck@infradead.org", "shreyansh.jain@nxp.com", "Guo, Jia"
Date: Thu, 24 May 2018 06:55:43 +0000
Message-ID: <01BA8470C017D6468C8290E4B9C5E1E83B379B43@shsmsx102.ccr.corp.intel.com>
Subject: [dpdk-dev] [RFC] hot plug failure handle mechanism
List-Id: DPDK patches and discussions

As we know, hot plug is an important feature, whether it is used for fail-safe and consumption management of datacenter devices, or for dynamic deployment and SR-IOV live migration in SDN/NFV. It brings greater flexibility and continuity to networking services across many industry use cases.

DPDK is an important networking framework that combines packet control path/fast path libraries with a wide range of PMD drivers. So what can it do to help applications that want to build a hot plug solution while they are processing packets with DPDK?

We already have a general device event mechanism, the failsafe driver, the bonding driver and the hot plug/unplug API in the framework, and applications can use these to develop their functionality. What is missing is hot plug failure handling: removing a device at run time causes the application to trigger an MMIO error and crash, and there is no mechanism to handle that failure when a device is hot unplugged.

At present, the kernel only guarantees safe hot plug handling on the kernel side. On the user mode side, no third-party tools such as udev or driverctl specifically cover this part of the mechanism. Considering implementation feasibility, runtime performance and generality across almost all user mode PMD drivers, I propose a general hot plug failure handling mechanism in the DPDK framework.

The hot plug failure handling mechanism would work as follows:

1. Add a new bus op "handle_hot_unplug" to the bus to handle bus read/write errors. It is bus-specific, and each kind of bus can implement its own logic.
2. Implement the PCI bus specific op "pci_handle_hot_unplug". Based on the failure address, this function remaps the memory that belongs to the corresponding unplugged device.
3. Implement a new SIGBUS handler and register it when device event monitoring starts. Once an MMIO SIGBUS error occurs, the handler triggers the hot plug failure handling mechanism above, so that an application in the middle of packet processing does not break and crash, and can go on to clean up, fail over, or do other work.
4. Also introduce the solution in testpmd to show an example of the whole procedure:
   device unplug -> failure handle -> stop forwarding -> stop port -> close port -> detach port

Best regards,
Jeff Guo
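
To illustrate steps 2 and 3, here is a minimal Linux sketch of the idea, not the proposed DPDK implementation. All names (struct dev_region, sketch_handle_hot_unplug, sketch_sigbus_handler, simulate_hot_unplug) are hypothetical and are not DPDK APIs. As an assumption for the demo, the "device" is a file-backed mapping and the "unplug" is a truncation of that file, which makes the next access raise SIGBUS much like a read from a removed device's MMIO region would. The sketch also assumes that calling mmap() from a signal handler is acceptable on Linux, although POSIX does not list it as async-signal-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for one mapped device resource (e.g. a PCI BAR). */
struct dev_region {
    void *addr;
    size_t len;
    volatile sig_atomic_t unplugged;
};

/* A real bus would walk its device list; the sketch tracks one region. */
static struct dev_region g_region;

/* Hypothetical bus op: if fault_addr lies inside a known device region,
 * replace that mapping in place with anonymous zero-filled memory. */
static int sketch_handle_hot_unplug(void *fault_addr)
{
    uintptr_t a = (uintptr_t)fault_addr;
    uintptr_t base = (uintptr_t)g_region.addr;

    if (g_region.addr == NULL || a < base || a >= base + g_region.len)
        return -1; /* fault does not belong to our device */
    if (mmap(g_region.addr, g_region.len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        return -1;
    g_region.unplugged = 1;
    return 0;
}

/* SIGBUS handler: hand the faulting address to the bus op. On success,
 * returning retries the faulting access against the remapped memory. */
static void sketch_sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    if (sketch_handle_hot_unplug(info->si_addr) != 0) {
        signal(sig, SIG_DFL); /* not ours: fall back to default action */
        raise(sig);
    }
}

static int sketch_register_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sketch_sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Demo: map a one-page file, "unplug" it by truncating the file, then
 * read from the mapping. Returns 0 if the process survived the fault. */
int simulate_hot_unplug(void)
{
    char path[] = "/tmp/hotunplug_sketch_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    volatile uint32_t *reg;
    uint32_t v;

    if (fd < 0 || page <= 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, page) != 0)
        return -1;
    reg = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (reg == MAP_FAILED)
        return -1;
    g_region.addr = (void *)reg;
    g_region.len = (size_t)page;
    g_region.unplugged = 0;
    if (sketch_register_sigbus_handler() != 0)
        return -1;

    reg[0] = 0xabcd;            /* "device" is present and accessible */
    if (ftruncate(fd, 0) != 0)  /* backing goes away: the "unplug" */
        return -1;
    v = reg[0];                 /* faults; handler remaps; read is retried */

    munmap((void *)reg, (size_t)page);
    close(fd);
    return (g_region.unplugged && v == 0) ? 0 : -1;
}
```

Calling simulate_hot_unplug() from a main() should return 0 on Linux: the read after the truncation is served from the zero-filled replacement mapping instead of killing the process. In the procedure from step 4, the application would then see the unplugged flag (or a device event) and go on to stop forwarding, stop, close and detach the port.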