From: Matan Azrad
To: "Guo, Jia", "dev@dpdk.org"
CC: "Ananyev, Konstantin", "stephen@networkplumber.org", "Richardson, Bruce", "Yigit, Ferruh", "gaetan.rivet@6wind.com", "Wu, Jingjing", Thomas Monjalon, Mordechay Haimovsky, "Van Haaren, Harry", "Zhang, Qi Z", "Zhang, Helin", "jblunck@infradead.org", "shreyansh.jain@nxp.com"
Date: Thu, 24 May 2018 14:57:48 +0000
In-Reply-To: <01BA8470C017D6468C8290E4B9C5E1E83B379B43@shsmsx102.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC] hot plug failure handle mechanism

Hi Guo,

Some questions.
From: Guo Jia
> As we know, hot plug is an important feature, whether it is used for
> fail-safe operation and consumption management of datacenter devices, or
> for dynamic deployment and SR-IOV live migration in SDN/NFV. It can bring
> higher flexibility and continuity to networking services in many
> industry use cases.
>
> So let us consider: DPDK is an important networking framework that
> combines packet control-path/fast-path libraries with many diverse PMD
> drivers. What can it do to help an application that wants to implement
> its own hot-plug solution while processing packets with DPDK?
>
> We already have a general device event mechanism, the failsafe driver,
> the bonding driver, and hot plug/unplug APIs in the framework, and
> applications can use these APIs to build this functionality. What is
> missing is hot-plug failure handling: removing a device at run time
> causes the application to trigger an MMIO error and crash, and there is
> no mechanism to handle this failure when a device is hot-unplugged. At
> present, the kernel only guarantees safe hot-plug handling on the kernel
> side; on the user-mode side, no third-party tools such as udev/driverctl
> specifically cover this part of the mechanism. Considering implementation
> feasibility, runtime performance, and generality across almost all
> user-mode PMD drivers, a general hot-plug failure handling mechanism in
> the DPDK framework is proposed here.
>
> The hot-plug failure handling mechanism would work as follows:
> 1. Add a new bus op, "handle_hot_unplug", to handle bus read/write
>    errors. It is bus-specific, so each kind of bus can implement its own
>    logic.
> 2. Implement the PCI-bus-specific op "pci_handle_hot_unplug". Based on
>    the failing address, this function remaps the memory that belongs to
>    the corresponding unplugged device.
> 3. Implement a new SIGBUS handler and register it when device event
>    monitoring starts. Once an MMIO SIGBUS error occurs, it triggers the
>    hot-plug failure handling above, so the application doing packet
>    processing is not broken and does not crash, and can go on with
>    cleanup, fail-safe handling, or other work.

Can you explain in more detail what happens with each of the threads
(master thread, host thread, data-path threads)?
Can the signal happen only in a data-path thread, or also in a control
thread?

What about resource leaks (mainly relevant for control threads)?
If you jump from the signal address to the restart address, how can you
clean up the operation that was started and got the signal?

Matan.

> 4. We will also demonstrate the solution with testpmd, showing the
>    whole procedure:
>    device unplug -> failure handling -> stop forwarding -> stop port ->
>    close port -> detach port.
>
> Best regards,
>
> Jeff Guo