From: Shahaf Shuler
To: Stephen Hemminger, Yongseok Koh
CC: "dev@dpdk.org", Stephen Hemminger
Date: Mon, 13 Aug 2018 07:52:47 +0000
In-Reply-To: <20180801215952.25326-1-stephen@networkplumber.org>
Subject: Re: [dpdk-dev] [RFC] mlx5: fix error unwind in device start
Hi Stephen,

Thursday, August 2, 2018 1:00 AM, Stephen Hemminger:
> Subject: [RFC] mlx5: fix error unwind in device start
>
> The error handling in start of the mlx5 driver is buggy.
> For example, if setting up the flows fails the device driver will then get stuck
> in mlx5_flow_rxq_flags_clear waiting for something that will never happen.

Looking at the code, I cannot understand why mlx5_flow_rxq_flags_clear would get stuck, nor what it would be waiting for.
The function has a few finite loops that do not depend on anything that happened earlier during device start.

Moreover, I tried forcing either mlx5_traffic_enable or mlx5_flow_start to fail; the port failed to start, but nothing got stuck.
Can you provide more details about the issue you saw there?

>
> The problem is that the code jumps to a common error label and does
> unwind for portions of the driver which have not been set up.
>
> This suggested patch breaks it into different labels with each failure path only
> unwinding what was done.
>
> Also, the ethdev driver should not be manipulating the dev_started flag
> directly. That is handled by the common ethdev layer.
>

I agree that this part of the code could be written better, but my question above still stands: is there an actual bug that this change solves?

> The patch works for the success case, but further testing is needed to
> actually exercise all the error paths.
> This is left as an exercise for the maintainers.
>
> Signed-off-by: Stephen Hemminger
> ---
>  drivers/net/mlx5/mlx5_trigger.c | 26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
> index e2a9bb703261..79a7b233986a 100644
> --- a/drivers/net/mlx5/mlx5_trigger.c
> +++ b/drivers/net/mlx5/mlx5_trigger.c
> @@ -171,42 +171,42 @@ mlx5_dev_start(struct rte_eth_dev *dev)
>  	if (ret) {
>  		DRV_LOG(ERR, "port %u Rx queue allocation failed: %s",
>  			dev->data->port_id, strerror(rte_errno));
> -		mlx5_txq_stop(dev);
> -		return -rte_errno;
> +		goto error_txq_stop;
>  	}
> -	dev->data->dev_started = 1;
> +
>  	ret = mlx5_rx_intr_vec_enable(dev);
>  	if (ret) {
>  		DRV_LOG(ERR, "port %u Rx interrupt vector creation failed",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_rxq_stop;
>  	}
>  	mlx5_xstats_init(dev);
>  	ret = mlx5_traffic_enable(dev);
>  	if (ret) {
>  		DRV_LOG(DEBUG, "port %u failed to set defaults flows",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_intr_vec_disable;
>  	}
>  	ret = mlx5_flow_start(dev, &priv->flows);
>  	if (ret) {
>  		DRV_LOG(DEBUG, "port %u failed to set flows",
>  			dev->data->port_id);
> -		goto error;
> +		goto error_traffic_disable;
>  	}
> +
>  	dev->tx_pkt_burst = mlx5_select_tx_function(dev);
>  	dev->rx_pkt_burst = mlx5_select_rx_function(dev);
>  	mlx5_dev_interrupt_handler_install(dev);
>  	return 0;
> -error:
> -	ret = rte_errno; /* Save rte_errno before cleanup. */
> -	/* Rollback. */
> -	dev->data->dev_started = 0;
> -	mlx5_flow_stop(dev, &priv->flows);
> +
> +error_traffic_disable:
>  	mlx5_traffic_disable(dev);
> -	mlx5_txq_stop(dev);
> +error_intr_vec_disable:
> +	mlx5_rx_intr_vec_disable(dev);
> +error_rxq_stop:
>  	mlx5_rxq_stop(dev);
> -	rte_errno = ret; /* Restore rte_errno. */
> +error_txq_stop:
> +	mlx5_txq_stop(dev);
>  	return -rte_errno;
>  }
>
> --
> 2.18.0