From: "Zhang, Qi Z"
To: "Stojaczyk, Dariusz", "dev@dpdk.org"
CC: "thomas@monjalon.net"
Date: Fri, 23 Nov 2018 19:10:45 +0000
Message-ID: <039ED4275CED7440929022BC67E70611532EA1D0@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <20181123144506.95367-1-dariusz.stojaczyk@intel.com>
Subject: Re: [dpdk-dev] [PATCH] dev: fix attach rollback of a device that was already attached

> -----Original Message-----
> From: Stojaczyk, Dariusz
> Sent: Friday, November 23, 2018 6:45 AM
> To: dev@dpdk.org
> Cc: thomas@monjalon.net; Stojaczyk, Dariusz;
> Zhang, Qi Z
> Subject: [PATCH] dev: fix attach rollback of a device that was already attached
>
> When primary process receives an IPC attach request of a device that's already
> locally-attached, it doesn't setup its variables properly and is prone to segfaulting
> on a subsequent rollback.
>
> `ret = local_dev_probe(req->devargs, &dev)`
>
> The above function will set `dev` pointer to the proper device *unless* it returns
> with error. One of those errors is -EEXIST, which the hotplug function explicitly
> ignores. For -EEXIST, it proceeds with attaching the device and expects the dev
> pointer to be valid.

Good catch.
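To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. `probe_device()`, `remove_device()`, `struct device` and the registry are hypothetical stand-ins, not DPDK's `local_dev_probe()`/`local_dev_remove()` API. The probe fills its output pointer only on success, so a caller that tolerates -EEXIST must initialize the pointer (as the patch does with `dev = NULL`) and check it before the rollback uses it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_device; not the DPDK type. */
struct device {
	const char *name;
	int attached;
};

/* Hypothetical registry: the first device is already attached locally. */
static struct device registry[] = {
	{ "0000:01:00.0", 1 },
	{ "0000:02:00.0", 0 },
};

/* Mimics the described behaviour: *out is written only on success. */
static int probe_device(const char *name, struct device **out)
{
	assert(out != NULL);
	for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
		if (strcmp(registry[i].name, name) != 0)
			continue;
		if (registry[i].attached)
			return -EEXIST; /* *out is NOT set here */
		registry[i].attached = 1;
		*out = &registry[i];
		return 0;
	}
	return -ENODEV;
}

/* Hypothetical detach helper used by the rollback path. */
static void remove_device(struct device *dev)
{
	dev->attached = 0;
}

/*
 * Attach followed by a forced rollback (as if a secondary failed).
 * Returns 1 if the rollback detached the device, 0 if it kept it,
 * -1 on a real probe failure.
 */
static int attach_then_rollback(const char *name)
{
	struct device *dev = NULL; /* the fix: never read an unset pointer */
	int ret = probe_device(name, &dev);

	if (ret != 0 && ret != -EEXIST)
		return -1; /* real failure, nothing to roll back */

	/* ...secondary processes fail to attach; roll back locally... */
	if (dev == NULL)
		return 0; /* already attached before the request: keep it */
	remove_device(dev);
	return 1;
}
```

Without the `= NULL` initializer, the -EEXIST path would hand an indeterminate pointer to `remove_device()`, which is the segfault the patch describes.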
>
> Despite this patch being a fix, it also introduces a design decision - when any
> secondary process fails to attach a device, the primary process that already had
> the device attached won't attempt to detach that device locally as a part of the
> rollback routine.
> Primary process would have already printed a message "Failed to [...] on
> secondary" and now it will also print a warning "Devices may not be in sync [...]".

I have a slight concern about this. We should try to avoid leaving a device attached in some processes but not in others. The scenario you describe already starts from an abnormal state caused by some earlier error, so I think it is better to take every chance to get back to a consistent state.
It would look better to me if we fixed local_dev_probe() to return a valid device together with -EEXIST.

>
> Fixes: ac9e4a17370f ("eal: support attach/detach shared device from
> secondary")
> Cc: qi.z.zhang@intel.com
>
> Signed-off-by: Darek Stojaczyk
> ---
> lib/librte_eal/common/hotplug_mp.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_eal/common/hotplug_mp.c
> b/lib/librte_eal/common/hotplug_mp.c
> index 7c9fcc46c..7ee074a31 100644
> --- a/lib/librte_eal/common/hotplug_mp.c
> +++ b/lib/librte_eal/common/hotplug_mp.c
> @@ -88,7 +88,7 @@ __handle_secondary_request(void *param)
> (const struct eal_dev_mp_req *)msg->param;
> struct eal_dev_mp_req tmp_req;
> struct rte_devargs *da;
> - struct rte_device *dev;
> + struct rte_device *dev = NULL;
> struct rte_bus *bus;
> int ret = 0;
>
> @@ -168,7 +168,15 @@ __handle_secondary_request(void *param)
> if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
> tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
> eal_dev_hotplug_request_to_secondary(&tmp_req);
> - local_dev_remove(dev);
> + if (dev == NULL) {
> + /* device was already attached at the time we got the
> + * request, don't detach it now.
> + */
> + RTE_LOG(WARNING, EAL,
> + "Devices in secondary may not sync with primary\n");
> + } else {
> + local_dev_remove(dev);
> + }
> } else {
> tmp_req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
> eal_dev_hotplug_request_to_secondary(&tmp_req);
> --
> 2.17.1
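One reading of the alternative suggested in the reply (having local_dev_probe() hand back a valid device together with -EEXIST) is sketched below. As before, `probe_device()`, `struct device` and the registry are hypothetical stand-ins rather than DPDK's API, and the single assignment in the rollback only stands in for `local_dev_remove(dev)`. The point is that with `*out` always set on 0 or -EEXIST, the rollback can detach the device in the primary too, so no process is left holding it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_device; not the DPDK type. */
struct device {
	const char *name;
	int attached;
};

/* Hypothetical per-process registry: one device already attached. */
static struct device registry[] = {
	{ "0000:01:00.0", 1 },
};

/*
 * Variant of the probe helper: on -EEXIST it still reports the existing
 * device through *out, so callers get a usable pointer on 0 or -EEXIST.
 */
static int probe_device(const char *name, struct device **out)
{
	assert(out != NULL);
	*out = NULL;
	for (size_t i = 0; i < sizeof(registry) / sizeof(registry[0]); i++) {
		if (strcmp(registry[i].name, name) != 0)
			continue;
		*out = &registry[i];
		if (registry[i].attached)
			return -EEXIST;
		registry[i].attached = 1;
		return 0;
	}
	return -ENODEV;
}

/*
 * Rollback after a secondary failed to attach: because dev is valid even
 * for -EEXIST, the primary detaches too and every process ends up without
 * the device, i.e. back in sync.
 */
static int attach_then_rollback(const char *name)
{
	struct device *dev;
	int ret = probe_device(name, &dev);

	if (ret != 0 && ret != -EEXIST)
		return ret; /* real failure, nothing to roll back */
	dev->attached = 0; /* stands in for local_dev_remove(dev) */
	return 0;
}
```

The trade-off versus the patch as posted: processes stay consistent, at the cost of detaching a device the primary had already attached before the request arrived.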