From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jia.guo@intel.com>
Received: from mga07.intel.com (mga07.intel.com [134.134.136.100])
 by dpdk.org (Postfix) with ESMTP id 364C9DE3
 for <dev@dpdk.org>; Thu, 24 May 2018 08:55:50 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by orsmga105.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 23 May 2018 23:55:46 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.49,436,1520924400"; d="scan'208,217";a="51764091"
Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205])
 by FMSMGA003.fm.intel.com with ESMTP; 23 May 2018 23:55:45 -0700
Received: from shsmsx101.ccr.corp.intel.com (10.239.4.153) by
 fmsmsx107.amr.corp.intel.com (10.18.124.205) with Microsoft SMTP Server (TLS)
 id 14.3.319.2; Wed, 23 May 2018 23:55:45 -0700
Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.79]) by
 SHSMSX101.ccr.corp.intel.com ([169.254.1.40]) with mapi id 14.03.0319.002;
 Thu, 24 May 2018 14:55:43 +0800
From: "Guo, Jia" <jia.guo@intel.com>
To: "dev@dpdk.org" <dev@dpdk.org>
CC: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>, "Richardson,
 Bruce" <bruce.richardson@intel.com>, "Yigit, Ferruh"
 <ferruh.yigit@intel.com>, "gaetan.rivet@6wind.com" <gaetan.rivet@6wind.com>,
 "Wu, Jingjing" <jingjing.wu@intel.com>, "thomas@monjalon.net"
 <thomas@monjalon.net>, "motih@mellanox.com" <motih@mellanox.com>,
 "matan@mellanox.com" <matan@mellanox.com>, "Van Haaren, Harry"
 <harry.van.haaren@intel.com>, "Zhang, Qi Z" <qi.z.zhang@intel.com>, "Zhang,
 Helin" <helin.zhang@intel.com>, "jblunck@infradead.org"
 <jblunck@infradead.org>, "shreyansh.jain@nxp.com" <shreyansh.jain@nxp.com>,
 "Guo, Jia" <jia.guo@intel.com>
Thread-Topic: [dpdk-dev] [RFC] hot plug failure handle mechanism
Thread-Index: AdPzKv18jRKvx3SLT1aSbnU0t6tpMQ==
Date: Thu, 24 May 2018 06:55:43 +0000
Message-ID: <01BA8470C017D6468C8290E4B9C5E1E83B379B43@shsmsx102.ccr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ctpclassification: CTP_NT
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNTQwNzQ3ZGYtNGM0Zi00ZTJmLWE3NGUtNGNmMWFjNmFiMTMzIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoiV0kwY0tYdDJYUG1qaGdCZmZ4N212bnRET2k0empOUlNIR3NcLzBzMzZLRTZyT05QOUw0MGxJVjFmOUQ3NkpJaG4ifQ==
dlp-product: dlpe-windows
dlp-version: 11.0.200.100
dlp-reaction: no-action
x-originating-ip: [10.239.127.40]
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: [dpdk-dev]  [RFC] hot plug failure handle mechanism
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 24 May 2018 06:55:51 -0000

As we know, hot plug is an important feature, whether it is used for
fail-safe and consumption management of datacenter devices, or for
dynamic deployment and SR-IOV live migration in SDN/NFV. It can bring
higher flexibility and continuity to networking services across
multiple industry use cases.

So let us see: DPDK is an important networking framework that combines
packet control path/fast path libraries with a diverse set of PMD
drivers. What can it do to help applications that process packets with
DPDK achieve their hot plug solution?

We already have a general device event mechanism, the failsafe driver,
the bonding driver and the hot plug/unplug APIs in the framework, and
applications can use these APIs to build their functionality. What is
lacking is a mechanism to handle failures on hot unplug: removing a
device at run-time causes the application to trigger an MMIO error and
crash. At present, the kernel only guarantees that hot plug handling is
safe on the kernel side. On the user mode side, no third-party tools
such as udev or driverctl specifically cover this part of the
mechanism. Considering the feasibility of the implementation, runtime
performance and generality across almost all user mode PMD drivers, a
general hot plug failure handle mechanism in the DPDK framework is
proposed here.

The hot plug failure handle mechanism would work as below:

1.      Add a new bus op "handle_hot_unplug" to the bus to handle bus
        read/write errors. It is bus-specific, and each kind of bus
        can implement its own logic.

2.      Implement the PCI bus specific op "pci_handle_hot_unplug".
        Based on the failure address, this function remaps the memory
        that belongs to the corresponding unplugged device.

3.      Implement a new SIGBUS handler and register it when device
        event monitoring starts. Once an MMIO SIGBUS error occurs, it
        triggers the hot plug failure handle mechanism above, so that
        an application that is processing packets does not break and
        crash, and can go on with cleanup, fail-safe or other work.

4.      Also, the solution will be introduced using testpmd, to show
        an example of the whole procedure, like this:

        device unplug -> failure handle -> stop forwarding ->
        stop port -> close port -> detach port.
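
To make steps 1 to 3 more concrete, here is a rough sketch of how the
pieces could fit together. It is only an illustration, not the proposed
patch: the struct dev_region table, the handle_hot_unplug_t typedef, the
function signatures and demo_unplug() are all assumptions made for this
example. A file-backed mapping stands in for a device BAR, and
truncating the file stands in for the unplug, since both make the next
access raise SIGBUS on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical record of one mapped device region, e.g. a PCI BAR. */
struct dev_region {
	void *addr;
	size_t len;
};

#define MAX_REGIONS 8
static struct dev_region regions[MAX_REGIONS];
static int nb_regions;

/* Step 1 (assumed shape): per-bus op, returns 0 if the fault is handled. */
typedef int (*handle_hot_unplug_t)(void *failure_addr);

/*
 * Step 2: find the region owning the failure address and replace it
 * with anonymous memory, so later accesses read zeros and do not fault.
 */
static int
pci_handle_hot_unplug(void *failure_addr)
{
	uintptr_t fa = (uintptr_t)failure_addr;

	for (int i = 0; i < nb_regions; i++) {
		uintptr_t base = (uintptr_t)regions[i].addr;

		if (fa < base || fa >= base + regions[i].len)
			continue;
		void *p = mmap(regions[i].addr, regions[i].len,
			       PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
			       -1, 0);
		return p == MAP_FAILED ? -1 : 0;
	}
	return -1; /* address does not belong to any known device */
}

static handle_hot_unplug_t bus_handler = pci_handle_hot_unplug;

/* Step 3: SIGBUS handler that forwards the failure address to the bus. */
static void
sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	/* mmap() is a plain syscall on Linux but is not on the POSIX
	 * async-signal-safe list; acceptable only for a sketch. */
	if (bus_handler(info->si_addr) != 0)
		_exit(1); /* not a device fault: give up */
}

static int
register_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/*
 * Demo: map a temporary file as a fake BAR, "unplug" it by truncating
 * the file, then read from it. Returns the byte read after the unplug.
 */
static int
demo_unplug(void)
{
	char path[] = "/tmp/hotplugXXXXXX";
	long pg = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);

	assert(fd >= 0);
	unlink(path);
	if (ftruncate(fd, pg) != 0)
		return -1;
	volatile char *bar = mmap(NULL, pg, PROT_READ | PROT_WRITE,
				  MAP_SHARED, fd, 0);
	assert(bar != MAP_FAILED);
	bar[0] = 42;
	regions[0].addr = (void *)bar;
	regions[0].len = (size_t)pg;
	nb_regions = 1;
	if (register_sigbus_handler() != 0 || ftruncate(fd, 0) != 0)
		return -1;
	int v = bar[0]; /* faults once, handler remaps, read retries */
	close(fd);
	return v;
}
```

In this sketch the faulting read is simply retried after the handler
returns, and it then sees the zero-filled replacement page, so
demo_unplug() is expected to return 0 on Linux. A real implementation
would also need to notify the application so that it can go through
the stop/close/detach sequence from step 4.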

Best regards,

Jeff Guo