From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <matan@mellanox.com>
Received: from EUR02-VE1-obe.outbound.protection.outlook.com
 (mail-eopbgr20042.outbound.protection.outlook.com [40.107.2.42])
 by dpdk.org (Postfix) with ESMTP id 402B81B760;
 Wed, 31 Jan 2018 18:07:58 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com;
 s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version;
 bh=4l3Z90RnFxTR8Wox8XFKw4ehU27UfIvStXDMkB2B7/4=;
 b=KY5dd5TyfOz0UoYUwFrQ786URMsus7ffTvMrYwT4J0ay5ctpDpAobOftkS8ggBsecUW06zofmRzguiV5Eroh45a4joBi32Q4F+NvNv139l3/rlCz14T0i21oOiadyERRhUvUy1yLtxp2g5IzpSOUhbf6NwXuvcnoQN3dDk5ANK8=
Received: from AM4PR0501MB2657.eurprd05.prod.outlook.com (10.172.215.19) by
 AM4PR0501MB2337.eurprd05.prod.outlook.com (10.165.82.154) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id
 15.20.444.14; Wed, 31 Jan 2018 17:07:55 +0000
Received: from AM4PR0501MB2657.eurprd05.prod.outlook.com
 ([fe80::50a5:cd88:b3d8:763e]) by AM4PR0501MB2657.eurprd05.prod.outlook.com
 ([fe80::50a5:cd88:b3d8:763e%17]) with mapi id 15.20.0444.016; Wed, 31 Jan
 2018 17:07:56 +0000
From: Matan Azrad <matan@mellanox.com>
To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
CC: Shahaf Shuler <shahafs@mellanox.com>, Mordechay Haimovsky
 <motih@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>, "stable@dpdk.org"
 <stable@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v3] net/mlx4: fix dev rmv not detected after
 port stop
Thread-Index: AQHTmoBrLBnxYvtDvEq0X4rYUAyrUKON6+DwgAAfnYCAABjRgA==
Date: Wed, 31 Jan 2018 17:07:56 +0000
Message-ID: <AM4PR0501MB2657E999EF07D6A9C8E4E730D2FB0@AM4PR0501MB2657.eurprd05.prod.outlook.com>
References: <1516357009-15463-1-git-send-email-motih@mellanox.com>
 <1517214877-126768-1-git-send-email-motih@mellanox.com>
 <VI1PR05MB314912AF24111F75B1EC7BCBC3E50@VI1PR05MB3149.eurprd05.prod.outlook.com>
 <20180130093958.GE4256@6wind.com>
 <VI1PR05MB3149E091672CE4DB20071B9FC3E40@VI1PR05MB3149.eurprd05.prod.outlook.com>
 <20180131091513.GS4256@6wind.com>
 <AM4PR0501MB2657356B2CA46A039B96F7F6D2FB0@AM4PR0501MB2657.eurprd05.prod.outlook.com>
 <20180131104352.GT4256@6wind.com>
 <AM4PR0501MB2657F359FA9A25BEF2CBA14AD2FB0@AM4PR0501MB2657.eurprd05.prod.outlook.com>
 <20180131143157.GV4256@6wind.com>
In-Reply-To: <20180131143157.GV4256@6wind.com>
Accept-Language: en-US, he-IL
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
authentication-results: spf=none (sender IP is )
 smtp.mailfrom=matan@mellanox.com; 
x-originating-ip: [193.47.165.251]
x-ms-publictraffictype: Email
x-microsoft-exchange-diagnostics: 1; AM4PR0501MB2337;
 7:ifm5FpuiTkx5s5We4CE925Jjgjsv8cxQof4M+mNNHgnfZ5T/BPbwpLQlIg3WOojtzmQtwDsNBHPGHvUfodMyveTab5GGinPU3AP4aM2zwTVXLycaetgXFQqDnE8Ctf5pepOxrq3KiJyWCWVUOTIm0pJdn7IgcsqkeY1VgYoH1AZy09XVPd9spNvXDjW1w5thU5X7yMOuAL7ACSLKy/rbDga26H9qxZrMO00w8xPL4nAmuMAT2xHXbgsSRknQy8Py
x-ms-exchange-antispam-srfa-diagnostics: SSOS;
x-ms-office365-filtering-ht: Tenant
x-ms-office365-filtering-correlation-id: ccbcb1b6-ce8b-494b-4028-08d568cd2c8a
x-microsoft-antispam: UriScan:; BCL:0; PCL:0;
 RULEID:(7020095)(4652020)(4534165)(4627221)(201703031133081)(201702281549075)(48565401081)(5600026)(4604075)(3008032)(2017052603307)(7153060)(7193020);
 SRVR:AM4PR0501MB2337; 
x-ms-traffictypediagnostic: AM4PR0501MB2337:
x-ld-processed: a652971c-7d2e-4d9b-a6a4-d149256f461b,ExtAddr
x-microsoft-antispam-prvs: <AM4PR0501MB2337B26BC2D30D3CB68CCD46D2FB0@AM4PR0501MB2337.eurprd05.prod.outlook.com>
x-exchange-antispam-report-test: UriScan:(278428928389397);
x-exchange-antispam-report-cfa-test: BCL:0; PCL:0;
 RULEID:(6040501)(2401047)(8121501046)(5005006)(93006095)(93001095)(3231101)(2400082)(944501161)(3002001)(10201501046)(6055026)(6041288)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123560045)(20161123564045)(20161123562045)(20161123558120)(6072148)(201708071742011);
 SRVR:AM4PR0501MB2337; BCL:0; PCL:0; RULEID:; SRVR:AM4PR0501MB2337; 
x-forefront-prvs: 056929CBB8
x-forefront-antispam-report: SFV:NSPM;
 SFS:(10009020)(396003)(376002)(39860400002)(346002)(39380400002)(366004)(51444003)(189003)(199004)(53754006)(105586002)(102836004)(5890100001)(76176011)(25786009)(5250100002)(14454004)(2900100001)(305945005)(6506007)(478600001)(66066001)(74316002)(59450400001)(7736002)(2906002)(6916009)(3846002)(4326008)(316002)(6116002)(93886005)(54906003)(186003)(99286004)(8936002)(86362001)(5660300001)(7696005)(2950100002)(81166006)(6246003)(229853002)(97736004)(53936002)(106356001)(33656002)(3660700001)(6436002)(68736007)(8676002)(3280700002)(55016002)(81156014)(26005)(9686003);
 DIR:OUT; SFP:1101; SCL:1; SRVR:AM4PR0501MB2337;
 H:AM4PR0501MB2657.eurprd05.prod.outlook.com; FPR:; SPF:None; PTR:InfoNoRecords;
 MX:1; A:1; LANG:en; 
received-spf: None (protection.outlook.com: mellanox.com does not designate
 permitted sender hosts)
x-microsoft-antispam-message-info: vhv5tTLd0z5WD2b7l42q74TPkwYquzi0HzTgYqSJ05aClVjJis3mhkwp+/dzWOAmKFMYEWLOft87Qw2rlAnKjw==
spamdiagnosticoutput: 1:99
spamdiagnosticmetadata: NSPM
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-OriginatorOrg: Mellanox.com
X-MS-Exchange-CrossTenant-Network-Message-Id: ccbcb1b6-ce8b-494b-4028-08d568cd2c8a
X-MS-Exchange-CrossTenant-originalarrivaltime: 31 Jan 2018 17:07:56.6697 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b
X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR0501MB2337
Subject: Re: [dpdk-dev] [PATCH v3] net/mlx4: fix dev rmv not detected after
 port stop
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 31 Jan 2018 17:07:58 -0000

Hi Adrien

From: Adrien Mazarguil , Sent: Wednesday, January 31, 2018 4:32 PM
> Hi Matan,
>=20
> On Wed, Jan 31, 2018 at 01:44:41PM +0000, Matan Azrad wrote:
> > Hi Adrien
> >
> >  From: Adrien Mazarguil, Sent: Wednesday, January 31, 2018 12:44 PM
> > > On Wed, Jan 31, 2018 at 10:08:06AM +0000, Matan Azrad wrote:
> > > > Hi all
> > > >
> > > > From: Adrien Mazarguil
> > > > > On Tue, Jan 30, 2018 at 08:37:06PM +0000, Shahaf Shuler wrote:
> > > > > > Tuesday, January 30, 2018 11:40 AM, Adrien Mazarguil:
> > > > > > > Unfortunately I didn't get a chance to review this patch
> > > > > > > before it was
> > > > > applied.
> > > > > > > I'm not sure a stopped port is supposed to report events
> > > > > > > (interrupts). Will applications expect them to occur at this =
point?
> > > > > >
> > > > > > Why not?
> > > > > >
> > > > > > Stopped port is still counted as attached. The fact the
> > > > > > application stopped
> > > > > the packet receive on it doesn't mean it should not receive a
> > > > > sync events (such as the remove event).
> > > > > > async events, by definition, are not related to traffic being
> > > > > > flows through
> > > > > the port.
> > > > >
> > > > > My comment is based on my understanding of rte_eth_dev_stop(),
> > > which
> > > > > is a device (or port) is completely stopped, in a suspended
> > > > > state and no interrupts shall occur, as a means for applications
> > > > > to temporarily not be bothered by them until restarted.
> > > >
> > > > Stopping traffic is not saying that the application is not
> > > > interesting in the device, I think that you mean to dev_close().
> > >
> > > No, dev_close() releases resources and destroys configuration. Good
> > > luck using dev_start() or any other devop after dev_close().
> >
> > I'm just saying here that when the user call dev_close() it means he
> > is not interesting in the device any more, This is not the case in dev_=
stop().
> >
> > > > Any event may still be usable for application between dev_stop()
> > > > to dev_start(), especially RMV or LCS can still be interested.
> > >
> > > Possibly, but then, how come no PMD implements it that way?
> >
> > Again, maybe others PMDs make mistakes.
>=20
> It's basically the same as stating that for all these years, neither PMD =
nor
> application maintainers understood the true meaning of this API.
>=20
> Maybe you're right, maybe not. What's for sure is this patch unilaterally
> decided to modify an accepted behavior. Perhaps it's not a big deal but w=
e
> must be cautious.

Agree.

I'm just saying my opinion here.

> > > Neither did mlx4 before this patch got applied. At the very least it
> > > cannot be considered a "fix".
> >
> > It fixes something.
>=20
> Technically every time a feature is added, its absence gets "fixed" :)

:)
=20
> > > > > Think about it that way: applications do not want to get
> > > > > interrupts immediately after the device is initialized, because
> > > > > they might not be ready to process them at this point. An
> > > > > explicit call to
> > > > > rte_eth_dev_start() tells the PMD when it's OK to do so. The
> > > > > converse is
> > > rte_eth_dev_stop().
> > > >
> > > > So, they can delay the event registration to the time they
> > > > interesting in the
> > > events.
> > > > And use event unregister when they are not interesting in it anymor=
e.
> > >
> > > Of course you can ask application maintainers to adapt to the new
> > > behavior, or you know, leave things as they used to be.
> > >
> >
> > I don't know what any application does but for me it is a mistake to
> > stop all event processes in dev_stop(), Maybe for other application
> maintainers too.
>=20
> Just like you, I don't know either what all the applications ever written=
 for
> DPDK expect out of dev_stop(). What's for sure is that currently, LSC/RMV
> don't occcur afterward, the same way these events do not occur before
> dev_start().

Why not? RMV event can occur any time after probe.

> Any application possibly relying on this fact will break. In such a
> situation, a conservative approach is better.

If an application should fail to get event in stopped state it may fail in =
the previous code too:
The interrupt run from host thread so the next race may occur:
dev_start() : master thread.
Context switch.
RMV interrupt started to run callbacks: host thread.
Context switch.
dev_stop(): master thread.
Start reconfiguration: master thread.=20
Context switch.
Callback running.

So, the only thing which can disable callback running after dev_stop() is c=
allback unregistration before it.

> > > Setting up RMV/LSC callbacks is not the only configuration an
> > > application usually performs before calling dev_start(). Think about
> > > setting up flow rules, MAC addresses, VLANs, and so on, this on
> > > multiple ports before starting them up all at once. Previously it
> > > could be done in an unspecified order, now they have to take special =
care
> for RMV/LSC.
> >
> > Or maybe there callbacks code are already safe for it.
> > Or they manages the unregister\register calls in the right places.
>=20
> That's my point, these "maybes" don't argue in favor of changing things.

What I'm saying is that callbacks should be safe or not registered in the r=
ight time.

> > > Many devops are only safe when called while a device is stopped.
> > > It's even documented in rte_ethdev.h.
> > >
> >
> > And?
>=20
> ...And applications therefore often do all their configuration in an unsp=
ecified
> order while a port is stopped as a measure of safety. No extra care is ta=
ken
> for RMV/LSC. This uncertainty can be addressed by not modifying the curre=
nt
> behavior.

Or they expect to get interrupt and the corner case will come later if we w=
ill not change it.

> > > > > Stopping traffic can already be achieved by not polling from the
> > > > > application side, calling rte_eth_dev_[rt]x_queue_stop() and/or
> > > > > toggling RX/TX interrupts through rte_eth_dev_[rt]x_intr_enable()=
.
> > > > > rte_eth_dev_stop() provides lower-level device control.
> > > >
> > > > I think it makes sense only for Rx interrupt which is traffic
> > > > oriented(like stop
> > > and start).
> > >
> > > OK, looks like we disagree :)
> > >
> > > > > Perhaps documentation is not clear, however that's how LSC seems
> > > > > implemented in all PMDs; it gets disabled after
> > > > > rte_eth_dev_stop() and one should explicitly use
> > > > > rte_eth_link_get() to retrieve link status afterward. I think
> > > > > RMV should behave similarly with
> > > rte_eth_dev_is_removed().
> > > > > Adapting fail-safe should be easier than modifying all the
> > > > > remaining
> > > PMDs.
> > > >
> > > > Or maybe PMDs which do it make mistakes.
> > >
> > > I'm not convinced mlx4 is the only PMD doing the right thing, we
> > > should ask other maintainers as well as application writers' opinion
> > > on the matter. If it's not a problem for RMV/LSC to occur while a
> > > device is stopped, then I'll agree with the approach. It still won't =
make it a
> fix regardless.
> >
> > Let's think about RMV event, How can the application\other dpdk entitie=
s
> to know if the device was removed when it was in stopped state?
> > Checking it synchronically (by rte_eth_dev_is_removed()) can miss the
> removal just a moment after the device came back again, errors will occur
> and no one will know why.
> >
> > So, at least for RMV event, we need the notification also in stopped st=
ate.
>=20
> You sent the rte_eth_dev_is_removed() series. You're aware that PMDs
> implementing this call benefit from an automatic is_removed() check on al=
l
> remaining devops whenever some error occur.
> In short, an application will get notified simply by getting dev_start() =
(or any
> other callback) return -EIO and not being able to use the device.
=20
Yes, but between dev_stop to dev_start may not be any ethdev API calling.

> PMDs that do not implement is_removed() (or in addition to it) could also
> artificially trigger a RMV event after dev_start() is called. As long as =
the PMD
> remains quiet while the device is stopped, it's fine.

How can the PMD do it after dev_start()? Initiate alarm in dev start functi=
on to do it later And entering into race again?

I think it is not worth the effort and this patch is doing the right thing(=
also some other PMDs) .

Thanks.
> > > > > > > In my opinion it's not a fix, as in, it doesn't address an
> > > > > > > issue introduced by the mentioned patch whose behavior was
> correct.
> > > > > > >
> > > > > > > It's probably too late to change it now and it does address
> > > > > > > an issue seen with a use case involving this PMD, however I
> > > > > > > think the fail-safe PMD could as well poll using the
> > > > > > > recently-added
> > > > > > > rte_eth_dev_is_removed() when it's aware the underlying port
> > > > > > > is
> > > > > stopped instead of expecting interrupts.
>=20
> --
> Adrien Mazarguil
> 6WIND