From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dariusz.stojaczyk@intel.com>
Received: from mga12.intel.com (mga12.intel.com [192.55.52.136])
 by dpdk.org (Postfix) with ESMTP id 19C831B5E2
 for <dev@dpdk.org>; Fri, 23 Nov 2018 21:29:09 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga008.fm.intel.com ([10.253.24.58])
 by fmsmga106.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 23 Nov 2018 12:29:09 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,271,1539673200"; d="scan'208";a="88693163"
Received: from fmsmsx107.amr.corp.intel.com ([10.18.124.205])
 by fmsmga008.fm.intel.com with ESMTP; 23 Nov 2018 12:29:08 -0800
Received: from lcsmsx154.ger.corp.intel.com (10.186.165.229) by
 fmsmsx107.amr.corp.intel.com (10.18.124.205) with Microsoft SMTP Server (TLS)
 id 14.3.408.0; Fri, 23 Nov 2018 12:29:08 -0800
Received: from hasmsx105.ger.corp.intel.com ([169.254.1.193]) by
 LCSMSX154.ger.corp.intel.com ([169.254.7.79]) with mapi id 14.03.0415.000;
 Fri, 23 Nov 2018 22:29:05 +0200
From: "Stojaczyk, Dariusz" <dariusz.stojaczyk@intel.com>
To: "Zhang, Qi Z" <qi.z.zhang@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
CC: "thomas@monjalon.net" <thomas@monjalon.net>
Thread-Topic: [PATCH] dev: fix attach rollback of a device that was already
 attached
Thread-Index: AQHUgzvsYqpLh2GJeUWuRc5wZiMduaVdmHmAgAA0hzA=
Date: Fri, 23 Nov 2018 20:29:04 +0000
Message-ID: <FBE7E039FA50BF47A673AD0BD3CD56A846234D6F@HASMSX105.ger.corp.intel.com>
References: <20181123144506.95367-1-dariusz.stojaczyk@intel.com>
 <039ED4275CED7440929022BC67E70611532EA1D0@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <039ED4275CED7440929022BC67E70611532EA1D0@SHSMSX103.ccr.corp.intel.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-version: 11.0.400.15
dlp-reaction: no-action
x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiOTRiODA0NDUtMWU4ZC00MjliLTg3YjItNGNkMGE5YjRkOTk3IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX05UIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE3LjEwLjE4MDQuNDkiLCJUcnVzdGVkTGFiZWxIYXNoIjoieDloOEV1MERnNndRdlF5Q2czVFg0ajlNcVhkcElcL1dEZlwvc2xcL2VxREVEXC9VMDY0NzlXQ3c5bWZ6U09OWkdoblwvIn0=
x-ctpclassification: CTP_NT
x-originating-ip: [10.252.41.58]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH] dev: fix attach rollback of a device that
 was already attached
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 23 Nov 2018 20:29:10 -0000



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Friday, November 23, 2018 8:11 PM
> To: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; dev@dpdk.org
> Cc: thomas@monjalon.net
> Subject: RE: [PATCH] dev: fix attach rollback of a device that was alread=
y
> attached
>=20
>=20
>=20
> > -----Original Message-----
> > From: Stojaczyk, Dariusz
> > Sent: Friday, November 23, 2018 6:45 AM
> > To: dev@dpdk.org
> > Cc: thomas@monjalon.net; Stojaczyk, Dariusz
> <dariusz.stojaczyk@intel.com>;
> > Zhang, Qi Z <qi.z.zhang@intel.com>
> > Subject: [PATCH] dev: fix attach rollback of a device that was already
> attached
> >
> > When primary process receives an IPC attach request of a device that's
> already
> > locally-attached, it doesn't setup its variables properly and is prone =
to
> segfaulting
> > on a subsequent rollback.
> >
> > `ret =3D local_dev_probe(req->devargs, &dev)`
> >
> > The above function will set `dev` pointer to the proper device *unless*=
 it
> returns
> > with error. One of those errors is -EEXIST, which the hotplug function
> explicitly
> > ignores. For -EEXIST, it proceeds with attaching the device and expects=
 the
> dev
> > pointer to be valid.
>=20
> Good capture.
> >
> > Despite this patch being a fix, it also introduces a design decision - =
when
> any
> > secondary process fails to attach a device, the primary process that al=
ready
> had
> > the device attached won't attempt to detach that device locally as a pa=
rt of
> the
> > rollback routine.
> > Primary process would have already printed a message "Failed to [...] o=
n
> > secondary" and now it will also print a warning "Devices may not be in =
sync
> [...]".
>=20
> A little bit concern for this.
> we may try to avoid the abnormal situation that device is not synced.
> The scenario you describe actually is start from an abnormal situation du=
e to
> some previous error.
> so is it better to always take chance to end up with a normal situation.
>=20
> It looks better for me if we can fixed it in local_dev_probe to return a =
valid
> device with -EEXIST.

Actually that was my original idea, but I gave it up in the end.
Ok, I'll do that in V2.

Thanks,
D.

>=20
> >
> > Fixes: ac9e4a17370f ("eal: support attach/detach shared device from
> > secondary")
> > Cc: qi.z.zhang@intel.com
> >
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > ---
> >  lib/librte_eal/common/hotplug_mp.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/hotplug_mp.c
> > b/lib/librte_eal/common/hotplug_mp.c
> > index 7c9fcc46c..7ee074a31 100644
> > --- a/lib/librte_eal/common/hotplug_mp.c
> > +++ b/lib/librte_eal/common/hotplug_mp.c
> > @@ -88,7 +88,7 @@ __handle_secondary_request(void *param)
> >  		(const struct eal_dev_mp_req *)msg->param;
> >  	struct eal_dev_mp_req tmp_req;
> >  	struct rte_devargs *da;
> > -	struct rte_device *dev;
> > +	struct rte_device *dev =3D NULL;
> >  	struct rte_bus *bus;
> >  	int ret =3D 0;
> >
> > @@ -168,7 +168,15 @@ __handle_secondary_request(void *param)
> >  	if (req->t =3D=3D EAL_DEV_REQ_TYPE_ATTACH) {
> >  		tmp_req.t =3D EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
> >  		eal_dev_hotplug_request_to_secondary(&tmp_req);
> > -		local_dev_remove(dev);
> > +		if (dev =3D=3D NULL) {
> > +			/* device was already attached at the time we got
> the
> > +			 * request, don't detach it now.
> > +			 */
> > +			RTE_LOG(WARNING, EAL,
> > +				"Devices in secondary may not sync with
> primary\n");
> > +		} else {
> > +			local_dev_remove(dev);
> > +		}
> >  	} else {
> >  		tmp_req.t =3D EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
> >  		eal_dev_hotplug_request_to_secondary(&tmp_req);
> > --
> > 2.17.1