From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <matan@mellanox.com>
Received: from EUR01-HE1-obe.outbound.protection.outlook.com
 (mail-he1eur01on0041.outbound.protection.outlook.com [104.47.0.41])
 by dpdk.org (Postfix) with ESMTP id DD6001322C
 for <dev@dpdk.org>; Mon,  8 Jan 2018 09:14:23 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
 bh=vHlHA0hrQRVwD8XCFgZOnS7fYsQsclrr+n/pwdofbgs=;
 b=xk/0UEK0vgF9SnooSBx+ObkhYRV3YUj55bH+2HSlSotWhLXIibWKCEG4DP6tNdT8CXUWH797tmpCU6BzQLy6+gCUnf6Ik2vvJTSH5nvAKSPUAdDuzw/+DBxWdWH/7kp7h82tC7FL+qAdKgmUXEe3PWPGAoGnUQQlknzFCq6Dao8=
Received: from AM6PR0502MB3797.eurprd05.prod.outlook.com (52.133.21.26) by
 VI1PR05MB3215.eurprd05.prod.outlook.com (10.170.237.160) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id
 15.20.386.5; Mon, 8 Jan 2018 08:14:21 +0000
Received: from AM6PR0502MB3797.eurprd05.prod.outlook.com
 ([fe80::b4b4:7de8:cf70:aa3a]) by AM6PR0502MB3797.eurprd05.prod.outlook.com
 ([fe80::b4b4:7de8:cf70:aa3a%13]) with mapi id 15.20.0386.006; Mon, 8 Jan 2018
 08:14:21 +0000
From: Matan Azrad <matan@mellanox.com>
To: "Guo, Jia" <jia.guo@intel.com>, "stephen@networkplumber.org"
 <stephen@networkplumber.org>, "bruce.richardson@intel.com"
 <bruce.richardson@intel.com>, "ferruh.yigit@intel.com"
 <ferruh.yigit@intel.com>, "gaetan.rivet@6wind.com" <gaetan.rivet@6wind.com>
CC: "konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
 "jblunck@infradead.org" <jblunck@infradead.org>, "shreyansh.jain@nxp.com"
 <shreyansh.jain@nxp.com>, "jingjing.wu@intel.com" <jingjing.wu@intel.com>,
 "dev@dpdk.org" <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
 "helin.zhang@intel.com" <helin.zhang@intel.com>, Mordechay Haimovsky
 <motih@mellanox.com>
Thread-Topic: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug
Thread-Index: AQHTg64SECBQ/TN3RUeAgZfnB8cQC6NgekaQgAkA4ICAABnxsA==
Date: Mon, 8 Jan 2018 08:14:20 +0000
Message-ID: <AM6PR0502MB379734190769B07E0DA0EBECD2130@AM6PR0502MB3797.eurprd05.prod.outlook.com>
References: <1509567405-27439-3-git-send-email-jia.guo@intel.com>
 <1514943736-5931-1-git-send-email-jia.guo@intel.com>
 <1514943736-5931-2-git-send-email-jia.guo@intel.com>
 <AM6PR0502MB379764A90FED042C26FB975BD2190@AM6PR0502MB3797.eurprd05.prod.outlook.com>
 <d3bb6fd9-2d56-341f-5e4b-9d05542c588c@intel.com>
In-Reply-To: <d3bb6fd9-2d56-341f-5e4b-9d05542c588c@intel.com>
Accept-Language: en-US, he-IL
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=matan@mellanox.com; 
x-originating-ip: [193.47.165.251]
x-ms-publictraffictype: Email
x-microsoft-exchange-diagnostics: 1; VI1PR05MB3215;
 6:uDH4U1OoF3UbUqwaGyP8Rn7jfh4IV+s6muPD8Y8G9Hlql9lD297czEu942i0wBfKORusuJKk5xEEvlbUh/SreoP3HCMVIdMdepdC3yPpuPfCYuHU33Ik5bJ2LXWYcxiBr3onhXfbXSHaxYlU077lBt9nrq7iCcTDoT3IAjNvcTOpY+HZ0oowoK5m+l41wI8PQ7/J6yQUnlByg8hjXPzDuBcdGxrZkWV8b0/Jxzr7dj640hg6lFlFf+KfzX/5PoifAliP7Zvd8vGiENaVFLLSADwJpa76wGCBRDzoIIcJ9TNocyhNpc9vyAvBvjF5CTtNVDU8BkEQq2OIciPlUiD/OGyiDbSG4eLm95lZSAdY6G4VPeK0W8aauIawTnuu9By0;
 5:UwkPrR39DE5q9Wyw2ljZx36V0Wmx+ARUdYDLsvFG8JlEbSug1/+KuAVU7u+kyQYs/sEG5H4iNm+uyY1L5wFV6nN8k6vl1SnVzmtfNZsLcwKLgrppCHD3wNKIsTqkTHTIadx4u6neZlIYA/tr9EbcpwbPw4yhwW9Udo7KDMqHNx8=;
 24:cptoRwZXGu+QQHpg91XV9VDjTt7hZMdLSV+OCw+fDm08NLRMR8kJK0V1YxNW2qI26bAYPbU2ZZ3Egvl5NXMh2veRCVa5aRdCI2nztm/Xm8I=;
 7:e1T6n2do7ErbZWse167gIABWE7jAR2fH+CMl6VVBeq3fOrFg+wEacc2fgQg7p3zpRuXklt0AO0HNPQnhbgzsImdHbC3bZ6V7G5y+Qb3mMLPndP2enOMnYUmaWt3UMTM3zgpyTPZjxG1q0Wez6y7PtNT/K8PjixIwn6UJpMUnsfdusW9iQdeIFXErR5DAneRYxek03uUTl08fZV8izRTMsgkgdAoCEtzMfWdvaOUpTnwAm1C5VyA2BRtXIxgULntq
x-ms-exchange-antispam-srfa-diagnostics: SSOS;
x-ms-office365-filtering-ht: Tenant
x-ms-office365-filtering-correlation-id: d5f1dfad-7995-4517-ca4b-08d5566fd237
x-microsoft-antispam: UriScan:; BCL:0; PCL:0;
 RULEID:(48565401081)(4534020)(4602075)(4627115)(201703031133081)(201702281549075)(5600026)(4604075)(3008032)(2017052603307)(7153060)(7193020);
 SRVR:VI1PR05MB3215; 
x-ms-traffictypediagnostic: VI1PR05MB3215:
x-ld-processed: a652971c-7d2e-4d9b-a6a4-d149256f461b,ExtAddr
x-microsoft-antispam-prvs: <VI1PR05MB3215D72DFC989F64B0C7A79CD2130@VI1PR05MB3215.eurprd05.prod.outlook.com>
x-exchange-antispam-report-test: UriScan:(72170088055959)(228905959029699);
x-exchange-antispam-report-cfa-test: BCL:0; PCL:0;
 RULEID:(6040470)(2401047)(8121501046)(5005006)(3231023)(944501075)(3002001)(10201501046)(93006095)(93001095)(6055026)(6041268)(20161123558120)(20161123560045)(20161123562045)(20161123564045)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(6072148)(201708071742011);
 SRVR:VI1PR05MB3215; BCL:0; PCL:0; RULEID:(100000803101)(100110400095);
 SRVR:VI1PR05MB3215; 
x-forefront-prvs: 054642504A
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(376002)(396003)(39860400002)(346002)(39380400002)(366004)(24454002)(189003)(199004)(7696005)(7416002)(316002)(25786009)(66066001)(105586002)(68736007)(2900100001)(5890100001)(5250100002)(2501003)(6116002)(55016002)(106356001)(81156014)(3660700001)(81166006)(3846002)(8676002)(4326008)(305945005)(107886003)(7736002)(6246003)(5660300001)(99286004)(9686003)(2950100002)(54906003)(53936002)(3280700002)(74316002)(59450400001)(8656006)(33656002)(8936002)(229853002)(110136005)(2906002)(93886005)(86362001)(6436002)(102836004)(53546011)(14454004)(76176011)(97736004)(2201001)(478600001)(6506007);
 DIR:OUT; SFP:1101; SCL:1; SRVR:VI1PR05MB3215;
 H:AM6PR0502MB3797.eurprd05.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords;
 MX:1; A:1; LANG:en; 
received-spf: None (protection.outlook.com: mellanox.com does not designate
 permitted sender hosts)
x-microsoft-antispam-message-info: l8/cn950GcGCgYgNbA2j9+rAxvtGAnqr52EOicNXyzONw3I+P4OHqOx9bQJzpciRkEFgugs64YyDwlmKSqufkw==
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-Network-Message-Id: d5f1dfad-7995-4517-ca4b-08d5566fd237
X-MS-Exchange-CrossTenant-originalarrivaltime: 08 Jan 2018 08:14:20.9460 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-Transport-CrossTenantHeadersStamped: VI1PR05MB3215
Subject: Re: [dpdk-dev] [PATCH v7 1/2] eal: add uevent monitor for hot plug
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Mon, 08 Jan 2018 08:14:24 -0000

Hi Guo
I answered for the 2 threads here.=20

 From: Guo, Jia, Monday, January 8, 2018 7:27 AM
> On 1/3/2018 1:02 AM, Matan Azrad wrote:
> > Hi Jeff
> >
> > Maybe I'm touching in previous discussions but please see some
> comments\questions.
> >
> > From: Jeff Guo:
> >> This patch aim to add a general uevent mechanism in eal device layer,
> >> to enable all linux kernel object hot plug monitoring, so user could u=
se
> these
> >> APIs to monitor and read out the device status info that sent from the
> kernel
> >> side, then corresponding to handle it, such as detach or attach the
> >> device, and even benefit to use it to do smoothly fail safe work.
> >>
> >> 1) About uevent monitoring:
> >> a: add one epolling to poll the netlink socket, to monitor the uevent =
of
> >>     the device, add device_state in struct of rte_device, to identify =
the
> >>     device state machine.
> >> b: add enum of rte_eal_dev_event_type and struct of rte_eal_uevent.
> >> c: add below API in rte eal device common layer.
> >>     rte_eal_dev_monitor_enable
> >>     rte_dev_callback_register
> >>     rte_dev_callback_unregister
> >>     _rte_dev_callback_process
> >>     rte_dev_monitor_start
> >>     rte_dev_monitor_stop
> >>
> >> 2) About failure handler, use pci uio for example,
> >>     add pci_remap_device in bus layer and below function to process it=
:
> >>     rte_pci_remap_device
> >>     pci_uio_remap_resource
> >>     pci_map_private_resource
> >>     add rte_pci_dev_bind_driver to bind pci device with explicit drive=
r.
> >>
> >> Signed-off-by: Jeff Guo <jia.guo@intel.com>
<snip>
> >> +		user_cb->cb_arg =3D cb_arg;
> >> +		user_cb->event =3D event;
> >> +		if (event =3D=3D RTE_EAL_DEV_EVENT_ADD)
> >> +			dev_add_cb =3D user_cb;
> > Only one dpdk entity can register to ADD callback?
> >
> > I suggest to add option to register all devices maybe by using dummy
> device which will include all the "ALL_DEVICES"  callbacks per event.
> > All means past, present and future devices, by this way 1 callback can =
be
> called for all the devices and more than one dpdk entity could register t=
o  an
> ADD\NEW event.
> > What's about NEW instead of ADD?
> >
> > I also suggest to add the device pointer as a parameter to the
> callback(which will be managed by EAL).
> if you talk about dev_add_cb, the add means device add not cb add, if
> you talk about dev event type, the ADD type is consistent with the type
> form kernel side, anyway could be find a better.

I'm talking about next:
1. dev_add_cb can hold only 1 callback, why? Can't 2 callbacks to be regist=
ered to RTE_EAL_DEV_EVENT_ADD event? (actually there is memory leak in this=
 case)
2. Suggestion to register same callback to "all" devices by 1 call.
3. Suggestion to add parameter for the callback functions - the device poin=
ter.=20
4. Suggestion to change name from RTE_EAL_DEV_EVENT_ADD to RTE_EAL_DEV_EVEN=
T_NEW.
5. Clue how to implement 1,2 by dummy device.


> but for 1 callback for all device, it is make sense , i will think about =
that.
> >> +		else
> >> +			TAILQ_INSERT_TAIL(&(device->uev_cbs), user_cb,
> >> next);
> >> +	}
> >> +
> >> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> >> +	return 0;
> >> +}
<snip>
> >> +int
> >> +_rte_dev_callback_process(struct rte_device *device,
> >> +			enum rte_eal_dev_event_type event,
> >> +			void *cb_arg, void *ret_param)
> >> +{
> >> +	struct rte_eal_dev_callback dev_cb;
> >> +	struct rte_eal_dev_callback *cb_lst;
> >> +	int rc =3D 0;
> >> +
> >> +	rte_spinlock_lock(&rte_dev_cb_lock);
> >> +	if (event =3D=3D RTE_EAL_DEV_EVENT_ADD) {
> >> +		if (cb_arg !=3D NULL)
> >> +			dev_add_cb->cb_arg =3D cb_arg;
> >> +
> >> +		if (ret_param !=3D NULL)
> >> +			dev_add_cb->ret_param =3D ret_param;
> >> +
> >> +		rte_spinlock_unlock(&rte_dev_cb_lock);
> > Can't someone free it when it running?
> > I suggest to  keep the lock locked.
> > Callbacks are not allowed to use this mechanism to prevent deadlock.
> seems it would bring some deadlock here, let's check it more.

A deadlock should occur only when a callback tries to use this mechanism - =
I think it is OK, you just need to document it for the user.=20

> >> +		rc =3D dev_add_cb->cb_fn(dev_add_cb->event,
> >> +				dev_add_cb->cb_arg, dev_add_cb-
> >>> ret_param);
> >> +		rte_spinlock_lock(&rte_dev_cb_lock);
> >> +	} else {
> >> +		TAILQ_FOREACH(cb_lst, &(device->uev_cbs), next) {
> >> +			if (cb_lst->cb_fn =3D=3D NULL || cb_lst->event !=3D event)
> >> +				continue;
> >> +			dev_cb =3D *cb_lst;
> >> +			cb_lst->active =3D 1;
> >> +			if (cb_arg !=3D NULL)
> >> +				dev_cb.cb_arg =3D cb_arg;
> >> +			if (ret_param !=3D NULL)
> >> +				dev_cb.ret_param =3D ret_param;
> >> +
> >> +			rte_spinlock_unlock(&rte_dev_cb_lock);
> > The current active flag doesn't do it  thread safe here, I suggest to k=
eep the
> lock locked.
> > Scenario:
> > 	1. Thread A see active =3D 0 in unregister function.
> > 	2. Context switch.
> > 	3. Thread B start the callback.
> > 	4. Context switch.
> > 	5. Thread A free it.
> > 	6. Context switch.
> > 	7. Seg fault in Thread B.
> the same as above.
The same as above, and I think the active flag doesn't solve the race and y=
ou must solve it for the both cases.
I suggest just to keep the lock locked and document the optional deadlock b=
y the callback code.

<snip>=20
> >> +			rc =3D dev_cb.cb_fn(dev_cb.event,
> >> +					dev_cb.cb_arg, dev_cb.ret_param);
> >> +			rte_spinlock_lock(&rte_dev_cb_lock);
> >> +			cb_lst->active =3D 0;
> >> +		}
> >> +	}
> >> +	rte_spinlock_unlock(&rte_dev_cb_lock);
> >> +	return rc;
> >> +}
> >> 	return(_rte_dev_callback_process(dev,
> >> +					  RTE_EAL_DEV_EVENT_REMOVE,
> >> +					  NULL, NULL));
> > What is the reason to keep this device in EAL device list after the rem=
oval?
> > I suggest to remove it (driver remove, bus remove and EAL remove) after
> the callbacks running.
> > By this way EAL can initiate all device removals.
> agree, device should be remove from the eal device list after the removal=
.

I suggest using rte_eal_hotplug_remove().

<Bring from the second thread>
> it will do device  removal from the device list by the eal device detach=
=20
>function in the call backs running. does it fulfill your concerns.

I mean the removal\probe should be initiated by the EAL and not by the user=
s callbacks.

> >> +			} else if (uevent.type =3D=3D RTE_EAL_DEV_EVENT_ADD)
> >> {
> >> +				if (dev =3D=3D NULL) {
> >> +					/**
> >> +					 * bind the driver to the device
> >> +					 * before user's add processing
> >> +					 */
> >> +					bus->bind_driver(
> >> +						uevent.devname,
> >> +						"igb_uio");
> >> +
> > Similar comments here:
> > EAL can initiate all device probe operations by adding the device and
> probing it here before the callback running.
> > Then, also the device pointer can be passed to the callbacks.
> pass a device pointer could be bring some more change, let's think about
> more.

Yes, I know, it will help to the user especially in ADD(NEW) and REMOVE eve=
nts.

Here you can use rte_eal_hotplug_add().

> >> 	return(_rte_dev_callback_process(NULL,
> >> +					  RTE_EAL_DEV_EVENT_ADD,
> >> +					  uevent.devname, NULL));
> >> +				}
> >> +			}
> >> +		}
> >> +	}
> >> +	return 0;
> >> +}
<snip>
> >> +int
> >> +rte_dev_monitor_start(void)
> >> +{
> > Maybe add option to run it also by new EAL command line parameter?
> good idea.
> >> +	int ret;
> >> +
> >> +	if (!no_request_thread)
> >> +		return 0;

Look, also here there is race, no_request_thread doesn't solve it.
Maybe the EAL parameter should be the only way to run it(just don't expose =
this API), I think the default should be TRUE.

> >> +	no_request_thread =3D false;
> >> +
> >> +	/* create the host thread to wait/handle the uevent from kernel */
> >> +	ret =3D pthread_create(&uev_monitor_thread, NULL,
> >> +		dev_uev_monitoring, NULL);
> > What is the reason to open new thread for hotplug?
> > Why not to use the current dpdk host thread by the alarm mechanism?
> appropriate if you could talk about what you mean the disadvantage of
> new thread here and the advantage of alarm mechanism at the case?

One more thread can complicate things - the user will need to synchronize h=
is alarm\interrupt callbacks code(host thread) with his hotplug callbacks c=
ode(hotplug thread). =20

> >> +	return ret;
> >> +}
> >> +
> >> +int
> >> +rte_dev_monitor_stop(void)
> >> +{
> >> +	udev_exit =3D true;
> >> +	no_request_thread =3D true;
> >> +	return 0;
> >> +}
<snip>