From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR02-VE1-obe.outbound.protection.outlook.com (mail-eopbgr20042.outbound.protection.outlook.com [40.107.2.42]) by dpdk.org (Postfix) with ESMTP id 402B81B760; Wed, 31 Jan 2018 18:07:58 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version; bh=4l3Z90RnFxTR8Wox8XFKw4ehU27UfIvStXDMkB2B7/4=; b=KY5dd5TyfOz0UoYUwFrQ786URMsus7ffTvMrYwT4J0ay5ctpDpAobOftkS8ggBsecUW06zofmRzguiV5Eroh45a4joBi32Q4F+NvNv139l3/rlCz14T0i21oOiadyERRhUvUy1yLtxp2g5IzpSOUhbf6NwXuvcnoQN3dDk5ANK8= Received: from AM4PR0501MB2657.eurprd05.prod.outlook.com (10.172.215.19) by AM4PR0501MB2337.eurprd05.prod.outlook.com (10.165.82.154) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id 15.20.444.14; Wed, 31 Jan 2018 17:07:55 +0000 Received: from AM4PR0501MB2657.eurprd05.prod.outlook.com ([fe80::50a5:cd88:b3d8:763e]) by AM4PR0501MB2657.eurprd05.prod.outlook.com ([fe80::50a5:cd88:b3d8:763e%17]) with mapi id 15.20.0444.016; Wed, 31 Jan 2018 17:07:56 +0000 From: Matan Azrad To: Adrien Mazarguil CC: Shahaf Shuler , Mordechay Haimovsky , "dev@dpdk.org" , "stable@dpdk.org" Thread-Topic: [dpdk-dev] [PATCH v3] net/mlx4: fix dev rmv not detected after port stop Thread-Index: AQHTmoBrLBnxYvtDvEq0X4rYUAyrUKON6+DwgAAfnYCAABjRgA== Date: Wed, 31 Jan 2018 17:07:56 +0000 Message-ID: References: <1516357009-15463-1-git-send-email-motih@mellanox.com> <1517214877-126768-1-git-send-email-motih@mellanox.com> <20180130093958.GE4256@6wind.com> <20180131091513.GS4256@6wind.com> <20180131104352.GT4256@6wind.com> <20180131143157.GV4256@6wind.com> In-Reply-To: <20180131143157.GV4256@6wind.com> Accept-Language: en-US, he-IL Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=matan@mellanox.com; x-originating-ip: [193.47.165.251] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; AM4PR0501MB2337; 7:ifm5FpuiTkx5s5We4CE925Jjgjsv8cxQof4M+mNNHgnfZ5T/BPbwpLQlIg3WOojtzmQtwDsNBHPGHvUfodMyveTab5GGinPU3AP4aM2zwTVXLycaetgXFQqDnE8Ctf5pepOxrq3KiJyWCWVUOTIm0pJdn7IgcsqkeY1VgYoH1AZy09XVPd9spNvXDjW1w5thU5X7yMOuAL7ACSLKy/rbDga26H9qxZrMO00w8xPL4nAmuMAT2xHXbgsSRknQy8Py x-ms-exchange-antispam-srfa-diagnostics: SSOS; x-ms-office365-filtering-ht: Tenant x-ms-office365-filtering-correlation-id: ccbcb1b6-ce8b-494b-4028-08d568cd2c8a x-microsoft-antispam: UriScan:; BCL:0; PCL:0; RULEID:(7020095)(4652020)(4534165)(4627221)(201703031133081)(201702281549075)(48565401081)(5600026)(4604075)(3008032)(2017052603307)(7153060)(7193020); SRVR:AM4PR0501MB2337; x-ms-traffictypediagnostic: AM4PR0501MB2337: x-ld-processed: a652971c-7d2e-4d9b-a6a4-d149256f461b,ExtAddr x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(278428928389397); x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(6040501)(2401047)(8121501046)(5005006)(93006095)(93001095)(3231101)(2400082)(944501161)(3002001)(10201501046)(6055026)(6041288)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123560045)(20161123564045)(20161123562045)(20161123558120)(6072148)(201708071742011); SRVR:AM4PR0501MB2337; BCL:0; PCL:0; RULEID:; SRVR:AM4PR0501MB2337; x-forefront-prvs: 056929CBB8 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(396003)(376002)(39860400002)(346002)(39380400002)(366004)(51444003)(189003)(199004)(53754006)(105586002)(102836004)(5890100001)(76176011)(25786009)(5250100002)(14454004)(2900100001)(305945005)(6506007)(478600001)(66066001)(74316002)(59450400001)(7736002)(2906002)(6916009)(3846002)(4326008)(316002)(6116002)(93886005)(54906003)(186003)(99286004)(8936002)(86362001)(5660300001)(7696005)(2950100002)(81166006)(6246003)(229853002)(97736004)(53936002)(106356001)(33656002)(3660700001)(6436002)(68736007)(8676002)(3280700002)(55016002)(81156014)(26005)(9686003); DIR:OUT; SFP:1101; SCL:1; SRVR:AM4PR0501MB2337; H:AM4PR0501MB2657.eurprd05.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords; MX:1; A:1; LANG:en; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: vhv5tTLd0z5WD2b7l42q74TPkwYquzi0HzTgYqSJ05aClVjJis3mhkwp+/dzWOAmKFMYEWLOft87Qw2rlAnKjw== spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: ccbcb1b6-ce8b-494b-4028-08d568cd2c8a X-MS-Exchange-CrossTenant-originalarrivaltime: 31 Jan 2018 17:07:56.6697 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR0501MB2337 Subject: Re: [dpdk-dev] [PATCH v3] net/mlx4: fix dev rmv not detected after port stop X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 31 Jan 2018 17:07:58 -0000 Hi Adrien From: Adrien Mazarguil , Sent: Wednesday, January 31, 2018 4:32 PM > Hi Matan, >=20 > On Wed, Jan 31, 2018 at 01:44:41PM +0000, Matan Azrad wrote: > > Hi Adrien > > > > From: Adrien Mazarguil, Sent: Wednesday, January 31, 2018 12:44 PM > > > On Wed, Jan 31, 2018 at 10:08:06AM +0000, Matan Azrad wrote: > > > > Hi all > > > > > > > > From: Adrien Mazarguil > > > > > On Tue, Jan 30, 2018 at 08:37:06PM +0000, Shahaf Shuler wrote: > > > > > > Tuesday, January 30, 2018 11:40 AM, Adrien Mazarguil: > > > > > > > Unfortunately I didn't get a chance to review this patch > > > > > > > before it was > > > > > applied. > > > > > > > I'm not sure a stopped port is supposed to report events > > > > > > > (interrupts). Will applications expect them to occur at this = point? > > > > > > > > > > > > Why not? > > > > > > > > > > > > Stopped port is still counted as attached. The fact the > > > > > > application stopped > > > > > the packet receive on it doesn't mean it should not receive a > > > > > sync events (such as the remove event). > > > > > > async events, by definition, are not related to traffic being > > > > > > flows through > > > > > the port. > > > > > > > > > > My comment is based on my understanding of rte_eth_dev_stop(), > > > which > > > > > is a device (or port) is completely stopped, in a suspended > > > > > state and no interrupts shall occur, as a means for applications > > > > > to temporarily not be bothered by them until restarted. > > > > > > > > Stopping traffic is not saying that the application is not > > > > interesting in the device, I think that you mean to dev_close(). > > > > > > No, dev_close() releases resources and destroys configuration. Good > > > luck using dev_start() or any other devop after dev_close(). > > > > I'm just saying here that when the user call dev_close() it means he > > is not interesting in the device any more, This is not the case in dev_= stop(). > > > > > > Any event may still be usable for application between dev_stop() > > > > to dev_start(), especially RMV or LCS can still be interested. > > > > > > Possibly, but then, how come no PMD implements it that way? > > > > Again, maybe others PMDs make mistakes. >=20 > It's basically the same as stating that for all these years, neither PMD = nor > application maintainers understood the true meaning of this API. >=20 > Maybe you're right, maybe not. What's for sure is this patch unilaterally > decided to modify an accepted behavior. Perhaps it's not a big deal but w= e > must be cautious. Agree. I'm just saying my opinion here. > > > Neither did mlx4 before this patch got applied. At the very least it > > > cannot be considered a "fix". > > > > It fixes something. >=20 > Technically every time a feature is added, its absence gets "fixed" :) :) =20 > > > > > Think about it that way: applications do not want to get > > > > > interrupts immediately after the device is initialized, because > > > > > they might not be ready to process them at this point. An > > > > > explicit call to > > > > > rte_eth_dev_start() tells the PMD when it's OK to do so. The > > > > > converse is > > > rte_eth_dev_stop(). > > > > > > > > So, they can delay the event registration to the time they > > > > interesting in the > > > events. > > > > And use event unregister when they are not interesting in it anymor= e. > > > > > > Of course you can ask application maintainers to adapt to the new > > > behavior, or you know, leave things as they used to be. > > > > > > > I don't know what any application does but for me it is a mistake to > > stop all event processes in dev_stop(), Maybe for other application > maintainers too. >=20 > Just like you, I don't know either what all the applications ever written= for > DPDK expect out of dev_stop(). What's for sure is that currently, LSC/RMV > don't occcur afterward, the same way these events do not occur before > dev_start(). Why not? RMV event can occur any time after probe. > Any application possibly relying on this fact will break. In such a > situation, a conservative approach is better. If an application should fail to get event in stopped state it may fail in = the previous code too: The interrupt run from host thread so the next race may occur: dev_start() : master thread. Context switch. RMV interrupt started to run callbacks: host thread. Context switch. dev_stop(): master thread. Start reconfiguration: master thread.=20 Context switch. Callback running. So, the only thing which can disable callback running after dev_stop() is c= allback unregistration before it. > > > Setting up RMV/LSC callbacks is not the only configuration an > > > application usually performs before calling dev_start(). Think about > > > setting up flow rules, MAC addresses, VLANs, and so on, this on > > > multiple ports before starting them up all at once. Previously it > > > could be done in an unspecified order, now they have to take special = care > for RMV/LSC. > > > > Or maybe there callbacks code are already safe for it. > > Or they manages the unregister\register calls in the right places. >=20 > That's my point, these "maybes" don't argue in favor of changing things. What I'm saying is that callbacks should be safe or not registered in the r= ight time. > > > Many devops are only safe when called while a device is stopped. > > > It's even documented in rte_ethdev.h. > > > > > > > And? >=20 > ...And applications therefore often do all their configuration in an unsp= ecified > order while a port is stopped as a measure of safety. No extra care is ta= ken > for RMV/LSC. This uncertainty can be addressed by not modifying the curre= nt > behavior. Or they expect to get interrupt and the corner case will come later if we w= ill not change it. > > > > > Stopping traffic can already be achieved by not polling from the > > > > > application side, calling rte_eth_dev_[rt]x_queue_stop() and/or > > > > > toggling RX/TX interrupts through rte_eth_dev_[rt]x_intr_enable()= . > > > > > rte_eth_dev_stop() provides lower-level device control. > > > > > > > > I think it makes sense only for Rx interrupt which is traffic > > > > oriented(like stop > > > and start). > > > > > > OK, looks like we disagree :) > > > > > > > > Perhaps documentation is not clear, however that's how LSC seems > > > > > implemented in all PMDs; it gets disabled after > > > > > rte_eth_dev_stop() and one should explicitly use > > > > > rte_eth_link_get() to retrieve link status afterward. I think > > > > > RMV should behave similarly with > > > rte_eth_dev_is_removed(). > > > > > Adapting fail-safe should be easier than modifying all the > > > > > remaining > > > PMDs. > > > > > > > > Or maybe PMDs which do it make mistakes. > > > > > > I'm not convinced mlx4 is the only PMD doing the right thing, we > > > should ask other maintainers as well as application writers' opinion > > > on the matter. If it's not a problem for RMV/LSC to occur while a > > > device is stopped, then I'll agree with the approach. It still won't = make it a > fix regardless. > > > > Let's think about RMV event, How can the application\other dpdk entitie= s > to know if the device was removed when it was in stopped state? > > Checking it synchronically (by rte_eth_dev_is_removed()) can miss the > removal just a moment after the device came back again, errors will occur > and no one will know why. > > > > So, at least for RMV event, we need the notification also in stopped st= ate. >=20 > You sent the rte_eth_dev_is_removed() series. You're aware that PMDs > implementing this call benefit from an automatic is_removed() check on al= l > remaining devops whenever some error occur. > In short, an application will get notified simply by getting dev_start() = (or any > other callback) return -EIO and not being able to use the device. =20 Yes, but between dev_stop to dev_start may not be any ethdev API calling. > PMDs that do not implement is_removed() (or in addition to it) could also > artificially trigger a RMV event after dev_start() is called. As long as = the PMD > remains quiet while the device is stopped, it's fine. How can the PMD do it after dev_start()? Initiate alarm in dev start functi= on to do it later And entering into race again? I think it is not worth the effort and this patch is doing the right thing(= also some other PMDs) . Thanks. > > > > > > > In my opinion it's not a fix, as in, it doesn't address an > > > > > > > issue introduced by the mentioned patch whose behavior was > correct. > > > > > > > > > > > > > > It's probably too late to change it now and it does address > > > > > > > an issue seen with a use case involving this PMD, however I > > > > > > > think the fail-safe PMD could as well poll using the > > > > > > > recently-added > > > > > > > rte_eth_dev_is_removed() when it's aware the underlying port > > > > > > > is > > > > > stopped instead of expecting interrupts. >=20 > -- > Adrien Mazarguil > 6WIND