From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 3CC5BA00C5;
	Mon, 14 Feb 2022 14:09:27 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 0EE7440DDA;
	Mon, 14 Feb 2022 14:09:27 +0100 (CET)
Received: from mx0a-00069f02.pphosted.com (mx0a-00069f02.pphosted.com
 [205.220.165.32])
 by mails.dpdk.org (Postfix) with ESMTP id 8AACB4067E;
 Mon, 14 Feb 2022 14:09:24 +0100 (CET)
Received: from pps.filterd (m0246627.ppops.net [127.0.0.1])
 by mx0b-00069f02.pphosted.com (8.16.1.2/8.16.1.2) with SMTP id 21EB2NLN028540; 
 Mon, 14 Feb 2022 13:09:23 GMT
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com;
 h=from : to : cc :
 subject : date : message-id : content-type : mime-version;
 s=corp-2021-07-09; bh=IO4rFFOkue8rJOSMRTuZug0u5BVGGRFdnrhpz2TUDxk=;
 b=zKZXrm0KuAyRt6Q/hYTd/QHqTZG3ofG2mweWoekZSxO3dfVy5CclZxVzXSYPEw/6WX9m
 pRUU2oh1d7qGgQy26WOWm9KNc5KSQvyBwYMJ9fM8zDJSFXmFf2SUVYwL6fe0RciDT3hE
 gGi8hepyRYWUZnyo5QQGRKy/EwUMlWWSd5ZbNOP9ciP/RQJFaKqrW6QQRgWCvBDIeCLW
 xPE+Z4McKoqqP4WSZi9NjrA4qN2kFgdn53wYATv+Ic1yEjyM3AMO4fdv4bFDnvxPblcX
 GwMbJql69lgRsfBuG6Xounv84FazgF+cZDQ7SoZWCep3A5wKiahvOx7IVvxrWt2VFhUP 4w== 
Received: from aserp3020.oracle.com (aserp3020.oracle.com [141.146.126.70])
 by mx0b-00069f02.pphosted.com with ESMTP id 3e63g14dd2-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Mon, 14 Feb 2022 13:09:23 +0000
Received: from pps.filterd (aserp3020.oracle.com [127.0.0.1])
 by aserp3020.oracle.com (8.16.1.2/8.16.1.2) with SMTP id 21ED612j146208;
 Mon, 14 Feb 2022 13:09:22 GMT
Received: from nam12-mw2-obe.outbound.protection.outlook.com
 (mail-mw2nam12lp2040.outbound.protection.outlook.com [104.47.66.40])
 by aserp3020.oracle.com with ESMTP id 3e6qkwn930-1
 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK);
 Mon, 14 Feb 2022 13:09:22 +0000
ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none;
 b=lwM5rxPCdN1d8+7z9hnhkpK7zb4/tS8q4fE8/kWs/T2oBRUYikuZnuu5N27sHG23/+JDq156h0vlmh0auxCojEWgPMxg7hV5rkfH0aYpJ2yxG7Jo//GZjK6gqA+QEcMKTKT1U6+kENy6mmWXpd4EnZ/W7AM9W6++lFI0sqBo/LaiDTRpA8JCmDlXSllQho6swyWJCBsMq+7bDYPiNigmLW0QJ/rQ01dtImvDyq1+ijnHYVKG7c/Cw/YtL6vV1Fx/CqcOFRPgbqkPXrDPlm5wOJub8QQXkplWjXfv63mBUf2jjsIbWwhXiEy/QvtwoUgX1PhwaPZ+RlkL5NieRIiYWA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; 
 s=arcselector9901;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=IO4rFFOkue8rJOSMRTuZug0u5BVGGRFdnrhpz2TUDxk=;
 b=mmZv3AePgyW5iq5V7P4pp4l/bzINc36facNFr7uht6uwO4skUJMVSatoZqXC8bBN3Mys0FZ5nzr0gjKk9LbvdBKX+h8aBpV7ngg8Li+wd5lYoE6IEYUh6WQP+DYv7hcXDBILo3p6tZ2F0Y1qS/PApkCPKASh57lTuNJrAnqF9rcOOKR1nayAdlUcFC+xAI57LDmMAkPhoHJ+n7IrCF7pRxYFLIkfOVJ+lyL820HbBFxXa2p955CzjH8DY3u2y6isvfe0BooAb3acNWoEn95Qq0SKrG0gRuUSMgOwrXmIwpbWO5vDmgQfnFgPT05uxpVZpgvgo54qWCiDlv6Kb7nJOQ==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none;
 dkim=none; arc=none
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=oracle.onmicrosoft.com; s=selector2-oracle-onmicrosoft-com;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=IO4rFFOkue8rJOSMRTuZug0u5BVGGRFdnrhpz2TUDxk=;
 b=UMkGQj/tpL+4VWNdmPtd2XyRTxKJEeyglryQDkPNFnwirerJbOmxUGNDbRsswN8rMedDNp8tfVYCcozXguJK+qdqnwXQ/DqYHE66mV1Dl6uqdteumlCRZi2ejK8jswBkMmp9FSDDadfZQ76HtC+DY3lde59aS/MKRVmrWtXgDIE=
Received: from PH0PR10MB5514.namprd10.prod.outlook.com (2603:10b6:510:106::17)
 by PH0PR10MB4616.namprd10.prod.outlook.com (2603:10b6:510:34::10)
 with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4975.15; Mon, 14 Feb
 2022 13:09:19 +0000
Received: from PH0PR10MB5514.namprd10.prod.outlook.com
 ([fe80::7080:9532:83ec:6f68]) by PH0PR10MB5514.namprd10.prod.outlook.com
 ([fe80::7080:9532:83ec:6f68%4]) with mapi id 15.20.4975.019; Mon, 14 Feb 2022
 13:09:19 +0000
From: Vipul Ashri <vipul.ashri@oracle.com>
To: =?iso-8859-1?Q?Ga=EBtan_Rivet?= <grive@u256.net>, "dev@dpdk.org"
 <dev@dpdk.org>
CC: "stable@dpdk.org" <stable@dpdk.org>, Madhuker Mythri
 <madhuker.mythri@oracle.com>
Subject: Re: [PATCH v2] net/failsafe: link_update request crashing at boot
Thread-Topic: [PATCH v2] net/failsafe: link_update request crashing at boot
Thread-Index: Adgho9y4mc8M8powT/yG9xybD6XL/Q==
Date: Mon, 14 Feb 2022 13:09:19 +0000
Message-ID: <PH0PR10MB5514F1BB9221AAD404DE194D9A339@PH0PR10MB5514.namprd10.prod.outlook.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-ms-publictraffictype: Email
x-ms-office365-filtering-correlation-id: 521774d1-e978-49c2-77ae-08d9efbb3653
x-ms-traffictypediagnostic: PH0PR10MB4616:EE_
x-microsoft-antispam-prvs: <PH0PR10MB46169BBA20D6E5B262C2FC4F9A339@PH0PR10MB4616.namprd10.prod.outlook.com>
x-ms-oob-tlc-oobclassifiers: OLM:480;
x-ms-exchange-senderadcheck: 1
x-ms-exchange-antispam-relay: 0
x-microsoft-antispam: BCL:0;
x-microsoft-antispam-message-info: 9wWRb/pRahT6b4KLbwFQ6uT61+d0QRuZ3AC0b6xIG4ZPRCusrvQyQAHBXN1mxS6w3ycJPpqazsyWFvAF3KjdXelVLmbwSp6sWB1aTKomMJIXRsQF/hzuAYdXCbYg9pVAr0TZSWXBe1B7Rw9ZB2qWMaHX3WD2y4WWwZUylkcM5n7ezfEb8yDI24BSmHOZIcfPLFd/ff27Xx99f3tiqjfT94Auek5gFulXbWZsVqZ2dHAJpyB6z/zCkHf9pJWL8Ls2C3fDbK3iW0lI6WKy26ENIxtzDwdHjQ7vls9cw73NMDqJ51ozjQ6BuKc0N+NpteTbJnzqJpgS945ipfJRzSuXmCzaE4AtR6aiZxr3MzbmRxjQKqqy28QjJMI99T0in4a+SO7vDQi3E/qfL5SCxHsFiC9YTILY5EIvf4RA/PMUySir8m+WxRW/2AeKUhwA3cxQPO6vw5J/GFwWS6bCiF5hBg6VbI/XskCi+VXOgdj7js3EWpvARV995PO8UxKvSvNtczyQrjnky9hmVeM9VMgaKs1FxqZXLP+FxPAybgW8mv1q/bUg6rWDDraj9DEPDgQKGfHEQR/0MtA4tkiF8KBSnWAhvnKRySQyJfHP3kRMZxBjX3gJVKqD8kO+NnvuVx4IefXYi2OPSufyx+YgJSTWXUPsSB+1v7fAxYWuywqzSmTTcKsPR+zHlIhzrF+1G+THmxT03H21E6bM0l6qm6FG1w==
x-forefront-antispam-report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:PH0PR10MB5514.namprd10.prod.outlook.com; PTR:; CAT:NONE;
 SFS:(13230001)(366004)(38070700005)(107886003)(5660300002)(83380400001)(52536014)(8936002)(508600001)(66574015)(9686003)(33656002)(53546011)(7696005)(71200400001)(26005)(186003)(6506007)(66476007)(19627235002)(86362001)(64756008)(44832011)(38100700002)(54906003)(110136005)(316002)(55016003)(66446008)(4326008)(66946007)(76116006)(8676002)(2906002)(66556008)(122000001);
 DIR:OUT; SFP:1101; 
x-ms-exchange-antispam-messagedata-chunkcount: 1
x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?Q?S7havX6Kqqzj6HTiB5UBu0zGnsdys80MTtzLkf8RBdore3dhVnosLqWZ0/?=
 =?iso-8859-1?Q?VF3f1wMZX0miX+DEExmdIC/IvHub3lxfTRnKRRiwcw1y9s1HHWCGR/omkU?=
 =?iso-8859-1?Q?ZG3RkhYP32FLOxoZfXvnpdTGfEY9GOJ8Dwrk3cc6fXvQY6pXE3cLpuXZFY?=
 =?iso-8859-1?Q?MXIQZBdbpmT5KEMBo018TvHKk45W8S2VyoDnB9VCcHfKzxwAh9fzLxQJFY?=
 =?iso-8859-1?Q?SJ7sVdAYddqXpkVUeeuWBuxFPXFFVBTXvMgj3aJoCvHA2UuxC6A1bYqM4/?=
 =?iso-8859-1?Q?vo+6hJlGni2Z6T9XNJT1T94lUG15Tn1tUqhySWl4cQVKlSjnBlLe0LTLHN?=
 =?iso-8859-1?Q?pgaE6hD4Hu46o2kcOSz6LYrNxfEmOguj8g0HMXd4oPwEPQ8k6tTO0V8asr?=
 =?iso-8859-1?Q?jlRarAhCYnY7Z6+7RZ1U+4QWZ5Cm3+NJQaa8tgRNNrb7b+prw+LMbPL5ts?=
 =?iso-8859-1?Q?zRtZyOTaFdpLdWNGZ42PnRgL8rtyI5k1DHNDv9vVar5V/rtjVNpQey1Cjx?=
 =?iso-8859-1?Q?DipiRVtIsnvAKKOhVD32pYS+OLRSf04N+bAdR8Mr9rcgtSDBgQ+tthN2Fa?=
 =?iso-8859-1?Q?/36nkWHoY+tOwCJco6uR4E4x735PEGrIdQUE7TpKE1xYLZTh2F+WLh2qb7?=
 =?iso-8859-1?Q?2+eCrTxgKYnBLDdYzLMh+YfuKxHoxBFZw4nE1v3ejuj8deDz3sipBosNAl?=
 =?iso-8859-1?Q?U3x9ccbqWHvCwshYuUb3cVufXs3fhVu/rN6PfVSjqDq7HOAKJjHVXNAPuI?=
 =?iso-8859-1?Q?t7/rybMB6Da1BHWNKcKeRBgQNi0xvRCPy9dOXqUppxkb42WW5sbeAVsbYm?=
 =?iso-8859-1?Q?36wdrKJMvie011nI9zarRGFYg9ku6gmXfNQpTFku9xSW7FiH/oRaT/F6OM?=
 =?iso-8859-1?Q?oeBA9hhqJ4twOQIjbEO5ChD7cLPdeqfVTu+apxqghSJ/W0gx8vv7TQHyxW?=
 =?iso-8859-1?Q?zV5gpTwD3XwL8mWv0Vi1rKLMa0YCxvhnuHM9eZnYnD71qAZ1x6a4erVkZU?=
 =?iso-8859-1?Q?vtqA4ycf1rO7E+b3Fvm9oLQv9p9l4AIwM/PO71NraZ8Rhzy5IP/mkU6tJ7?=
 =?iso-8859-1?Q?uTkLdU7JOJaJURgz0B9MF1M+Mrs8prl/qIFC2ai/pQ41AbSp4HQF57kf/0?=
 =?iso-8859-1?Q?T7Amc7hRjChGCE5UOJ74W7PTy9dkgglU56D0YWdttzy22p1b5L9DHnW97X?=
 =?iso-8859-1?Q?RQFU2xygW4/9tx1rUjlHksYSUk3hBXnpKCbE7X5yOXMkNA9mmr1f9LXpvb?=
 =?iso-8859-1?Q?/8Zp0V1Fa8eYUyhww0ihmzmi59yecdkgZO0hvMDUFaObNEHPlPZf8Ad+CX?=
 =?iso-8859-1?Q?pAfPJEr7RtPgwqaOW4JlaPE1y9hYNBaBJiht4rV02SfAT4JYRV7VPhrfwO?=
 =?iso-8859-1?Q?6LfFMvW3VTfsniheanJZMMKLQ+Vkqzjm7k3M81E7hdg3pyHoan6Mq6NZnB?=
 =?iso-8859-1?Q?JxJYHc3xCzbyEFUI3Iw6Qrv4ENTC6xzBKQ35ULWBCzEkAjb1dT0LvQqtbH?=
 =?iso-8859-1?Q?kFFgF8dmpS8+fjZvjTFWh4KUGKinsvX3ml4gx9sA2sIf0nTrvQjzPulTS/?=
 =?iso-8859-1?Q?GdT1rhN3BaaExFrEj9KzRy6dhw4f/OnFgsrIISkxLMf5V/lthqhhSQsEDD?=
 =?iso-8859-1?Q?5zW23wlzITOnAnP8XA2xBOFYvvAxm3piEj8ypYNnjNoOQAmTfFOvvs3Q?=
 =?iso-8859-1?Q?=3D=3D?=
Content-Type: multipart/alternative;
 boundary="_000_PH0PR10MB5514F1BB9221AAD404DE194D9A339PH0PR10MB5514namp_"
MIME-Version: 1.0
X-OriginatorOrg: oracle.com
X-MS-Exchange-CrossTenant-AuthAs: Internal
X-MS-Exchange-CrossTenant-AuthSource: PH0PR10MB5514.namprd10.prod.outlook.com
X-MS-Exchange-CrossTenant-Network-Message-Id: 521774d1-e978-49c2-77ae-08d9efbb3653
X-MS-Exchange-CrossTenant-originalarrivaltime: 14 Feb 2022 13:09:19.8217 (UTC)
X-MS-Exchange-CrossTenant-fromentityheader: Hosted
X-MS-Exchange-CrossTenant-id: 4e2c6054-71cb-48f1-bd6c-3a9705aca71b
X-MS-Exchange-CrossTenant-mailboxtype: HOSTED
X-MS-Exchange-CrossTenant-userprincipalname: mNUgOSZP1VwW2bEEd/8sEmL2M1uMmib7WZ/YJFfHOBROWCK17aeLtgr6HIIULrgsVDlDFAA8TGNGNUlnMLWM4Q==
X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH0PR10MB4616
X-Proofpoint-Virus-Version: vendor=nai engine=6300 definitions=10257
 signatures=673431
X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0
 bulkscore=0 suspectscore=0
 phishscore=0 malwarescore=0 mlxscore=0 mlxlogscore=999 adultscore=0
 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2201110000
 definitions=main-2202140080
X-Proofpoint-GUID: BATnvCznzXgRMpKq0pJcq-5R68AfVCo7
X-Proofpoint-ORIG-GUID: BATnvCznzXgRMpKq0pJcq-5R68AfVCo7
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

--_000_PH0PR10MB5514F1BB9221AAD404DE194D9A339PH0PR10MB5514namp_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi Gaetan



Sorry for the very late reply; we were busy working on the 21.11 =
integration.

Although we have already adopted this code internally, I am sharing the =
patch with the open-source community so that others can benefit.



This is a specific case of an Azure setup with our heavily customized, =
complex environment.

Let me share the logs with the traceback first.

SECONDARY PROCESS

timestamp=3D1633598184

TCZ0.0.0 Cycle 152 (Build 1832)

signal 11 (Segmentation fault), address is 0x31117bbce6c8 from 0x47d08b1



[bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_tPv      (+   0xf4) - sp =
=3D 0x7fffef3fd110, ip =3D 0x3acdc54

[bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv               (+  0x159) - sp =
=3D 0x7fffef3fdc20, ip =3D 0x3acdf29

[bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9siginfo_tPv (+  0x104) - sp =
=3D 0x7fffef3fdf00, ip =3D 0x274d4c4

[bt]: ( 4) _L_unlock_18                                  (+   0x2c) - sp =
=3D 0x7fffef3fdf80, ip =3D 0x7ffff7bce630

[bt]: ( 5) rte_eth_dev_attach_secondary                  (+   0x21) - sp =
=3D 0x7fffef3fec50, ip =3D 0x47d08b1

[bt]: ( 6) rte_eth_from_ring                             (+ 0x3438) - sp =
=3D 0x7fffef3fec80, ip =3D 0x4e49da8

[bt]: ( 7) _init                                         (+ 0xa1b8) - sp =
=3D 0x7fffef3feec0, ip =3D 0x12e0368

[bt]: ( 8) local_dev_probe                               (+   0xac) - sp =
=3D 0x7fffef3feef0, ip =3D 0x478fd2c

[bt]: ( 9) rte_uuid_unparse                              (+  0x274) - sp =
=3D 0x7fffef3fef30, ip =3D 0x47a3e94

[bt]: (10) rte_eal_vfio_get_vf_token                     (+   0xd7) - sp =
=3D 0x7fffef3ff110, ip =3D 0x47b04b7

[bt]: (11) eal_hugepage_info_read                        (+  0x602) - sp =
=3D 0x7fffef3ff170, ip =3D 0x47b2cd2

[bt]: (12) start_thread                                  (+   0xc5) - sp =
=3D 0x7fffef3ff220, ip =3D 0x7ffff7bc6ea5

[bt]: (13) clone                                         (+   0x6d) - sp =
=3D 0x7fffef3ff2c0, ip =3D 0x7ffff004096d

EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket:eal_de=
v_mp_request

EAL: Cannot send request to primary

EAL: Failed to send hotplug request to primary

net_failsafe: Failed to probe devargs net_tap_vsc0

EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket:eal_de=
v_mp_request

EAL: Cannot send request to primary

EAL: Failed to send hotplug request to primary

net_failsafe: Failed to probe devargs net_tap_vsc1

EAL: No legacy callbacks, legacy socket not created

EAL: Drop mp reply: eal_dev_mp_request



PRIMARY PROCESS

timestamp=3D1633598196

TCZ0.0.0 Cycle 152 (Build 1832)

signal 11 (Segmentation fault), address is 0x38 from 0x9d8fbe



[bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_tPv      (+   0xf4) - sp =
=3D 0x7fffecf41150, ip =3D 0x100dd44

[bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv               (+  0x159) - sp =
=3D 0x7fffecf41c60, ip =3D 0x100e019

[bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9siginfo_tPv (+  0x104) - sp =
=3D 0x7fffecf41f40, ip =3D 0xff4894

[bt]: ( 4) _L_unlock_18                                  (+   0x2c) - sp =
=3D 0x7fffecf41fc0, ip =3D 0x7ffff61d9630

[bt]: ( 5) failsafe_eth_dev_close                        (+  0x65e) - sp =
=3D 0x7fffecf42c90, ip =3D 0x9d8fbe

[bt]: ( 6) rte_eth_link_get_nowait                       (+   0x6a) - sp =
=3D 0x7fffecf42cf0, ip =3D 0x62fa0a

[bt]: ( 7) _ZN11StatsThread9statsLoopEP10CustomObject      (+  0x33e) - sp =
=3D 0x7fffecf42d20, ip =3D 0xedea2e

[bt]: ( 8) _ZN11StatsThread9statsLoopEP10CustomObject      (+  0x8dc) - sp =
=3D 0x7fffecf42d90, ip =3D 0xedefcc

[bt]: ( 9) ThreadFunction                                (+   0xe6) - sp =
=3D 0x7fffecf42db0, ip =3D 0x7ffff6b477e6

[bt]: (10) start_thread                                  (+   0xc5) - sp =
=3D 0x7fffecf42de0, ip =3D 0x7ffff61d1ea5

[bt]: (11) clone                                         (+   0x6d) - sp =
=3D 0x7fffecf42e80, ip =3D 0x7ffff0a6b96d







DPDK 20.11.2

core mask is 00000000000000000000000000004000

DPDK Custom Process initialized with 2 ports

the min max TxQ is maxTxQueues 16

Using 1 RxQs for port 0 (# F-core=3D1)

Using 1 RxQs for port 3 (# F-core=3D1)

Core 14 (port=3D0, rxQ=3D0) kni_ring=3D(nil)

Core 14 (port=3D3, rxQ=3D0) kni_ring=3D(nil)

Core 14 txN =3D 0

Thread for core 14 using ring from usbc of 0x31117b29bb00

Ring size must be powers of 2, adjusting from 8196 to 16384

Thread for core 14 using ring from MEDIA of 0x31117b27b840

Encaps Memory Zone=3D 48044 sizeof encaps =3D 60

Trace Memory Zone=3D 272

Policy Memory Zone=3D 8196 sizeof policy =3D 240

link status for port 0 is 1

link status for port 3 is 1

PORT 0 supports 16 rx queues and 16 tx queues (driver_name =3D net_failsafe=
, driver_type =3D 16)

PORT 0 is polling for link-change, interrupts disabled

[DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File=
 exists

[DPDK] net_failsafe: Failed to create flow on sub_device 1

add_flow(): create() fails for port 0; Reason: overlapping rules or Kernel =
too old for flower support

Error adding broadcast flow

PORT 3 supports 16 rx queues and 16 tx queues (driver_name =3D net_failsafe=
, driver_type =3D 16)

PORT 3 is polling for link-change, interrupts disabled

[DPDK] EAL: Failed to hotplug add device on primary

[DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File=
 exists

[DPDK] net_failsafe: Failed to create flow on sub_device 1

add_flow(): create() fails for port 3; Reason: overlapping rules or Kernel =
too old for flower support

Error adding broadcast flow

Cmd Thread is available

Capture object initialized

init :Stats Thread is available

ifLinkUpdate: Sending OperStatus for port=3D0 stat=3D1

ifLinkUpdate: Port 0 Link Change - speed 40000 Mbps - full-duplex

[DPDK] EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket=
_2934_298e9db8d1:eal_dev_mp_request

[DPDK] EAL: rte_mp_request_sync failed

[DPDK] EAL: Failed to send hotplug request to secondary

[DPDK] EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket=
_2934_298e9db8d1:eal_dev_mp_request

[DPDK] EAL: rte_mp_request_sync failed

[DPDK] EAL: Failed to hotplug add device on primary

[DPDK] Invalid port_id=3D2

[DPDK] net_failsafe: Operation rte_eth_stats_get failed for sub_device 1 wi=
th error -19



There appears to be a race in the secondary process, and the primary =
crashed because its data structures were only partially filled.

Let me know if you need the GDB analysis; I can share it in my next reply =
if the logs above are not sufficient. The GDB analysis will be longer.
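
To make the kind of check the patch adds more concrete, here is a
minimal, standalone sketch. It is not the driver code: struct demo_dev,
struct demo_dev_ops and demo_sub_link_update() are hypothetical stand-ins
for the ethdev structures and the failsafe sub-device dispatch. It only
assumes that a partially initialized sub-device can be recognized by a
NULL dev_ops pointer, and it returns -ENODEV instead of dereferencing it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ethdev structures. */
struct demo_dev;

struct demo_dev_ops {
	int (*link_update)(struct demo_dev *dev, int wait_to_complete);
};

struct demo_dev {
	const struct demo_dev_ops *dev_ops; /* NULL until init completes */
	int link_status;
};

/* Dispatch link_update only when the sub-device is fully set up. */
int
demo_sub_link_update(struct demo_dev *sub, int wait_to_complete)
{
	if (sub == NULL || sub->dev_ops == NULL ||
	    sub->dev_ops->link_update == NULL)
		return -ENODEV; /* partially initialized: skip, do not crash */
	return sub->dev_ops->link_update(sub, wait_to_complete);
}

/* Example callback of a fully initialized sub-device: reports link up. */
int
demo_link_up(struct demo_dev *dev, int wait_to_complete)
{
	(void)wait_to_complete;
	dev->link_status = 1;
	return 0;
}
```

With this guard, a caller polling the link early (as our stats thread
does) gets -ENODEV for a sub-device whose dev_ops is still NULL, and the
normal callback result once the sub-device is ready.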

Thanks!



Regards

Vipul



-----Original Message-----
From: Ga=EBtan Rivet <grive@u256.net>
Sent: Monday, November 22, 2021 3:53 PM
To: Vipul Ashri <vipul.ashri@oracle.com>; dev@dpdk.org
Cc: stable@dpdk.org
Subject: [External] : Re: [PATCH v2] net/failsafe: link_update request cras=
hing at boot



On Thu, Oct 21, 2021, at 23:42, vipul.ashri@oracle.com<mailto:vipul.ashri@o=
racle.com> wrote:

> From: Vipul Ashri <vipul.ashri@oracle.com<mailto:vipul.ashri@oracle.com>>

>

> failsafe crashed while sending early link_update request during boot

> time initialization.

> Debugging showed that the failsafe device itself was fine, but its

> sub-devices were still being initialized. The SUBOPS macro expands to

> [partial_dev]->dev_ops->link_update(), and executing it triggered the

> crash because dev_ops=3D=3D0. A similar crash was seen in

> failsafe_eth_dev_close().

>

> The failsafe driver needs a separate check for sub-devices, similar to

> "RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);", which is called

> in almost every eth_dev function.

>

> Fixes: a46f8d5 ("net/failsafe: add fail-safe PMD")

> Cc: stable@dpdk.org<mailto:stable@dpdk.org>

> Signed-off-by: Vipul Ashri <vipul.ashri@oracle.com<mailto:vipul.ashri@ora=
cle.com>>



Hello Vipul,



I'm sorry for the delay, I missed your fix on the mailing list.



IIUC, the issue is that failsafe finished init and received an ethdev =
operation call, but one of its sub-devices, although marked DEV_ACTIVE, =
has its eth_dev->dev_ops field NULL.



It is really surprising to me, because there aren't many ways for a sub-dev=
ice to become DEV_ACTIVE.



The only two ways are



  * by executing 'fs_dev_configure()', which will first execute

    rte_eth_dev_configure() on the sub-device, and on error would

    stop *without* setting DEV_ACTIVE.

    rte_eth_dev_configure() will itself execute

    RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV), so it would

    return negative errno and fs_dev_configure() would abort.



  * by executing 'fs_dev_remove()' when the sub-device was 'DEV_STARTED'

    to begin with; it is then demoted to DEV_ACTIVE once stopped.



So I don't understand yet how it is possible for a sub-device to become DEV=
_ACTIVE while its eth_dev->dev_ops are NULL. It seems more like a bug, memo=
ry corruption or just an unexpected execution pattern.



Could you describe the execution in more detail?

In particular, setting the EAL log-level to debug with the option:

' --log-level pmd.net.failsafe:debug '

for example while using testpmd or your DPDK app.

It should show ethdev level accesses to the sub-devices, and error values.



Best regards,

--

Gaetan Rivet

--_000_PH0PR10MB5514F1BB9221AAD404DE194D9A339PH0PR10MB5514namp_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-micr=
osoft-com:office:office" xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" xmlns=3D"http:=
//www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-=
1">
<meta name=3D"Generator" content=3D"Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
	{font-family:"Cambria Math";
	panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
	{mso-style-priority:99;
	mso-style-link:"Plain Text Char";
	margin:0in;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
span.PlainTextChar
	{mso-style-name:"Plain Text Char";
	mso-style-priority:99;
	mso-style-link:"Plain Text";
	font-family:"Calibri",sans-serif;}
.MsoChpDefault
	{mso-style-type:export-only;
	font-family:"Calibri",sans-serif;}
@page WordSection1
	{size:8.5in 11.0in;
	margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
	{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D"edit">
<o:idmap v:ext=3D"edit" data=3D"1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=3D"EN-US" link=3D"#0563C1" vlink=3D"#954F72" style=3D"word-wrap:=
break-word">
<div class=3D"WordSection1">
<p class=3D"MsoPlainText">Hi Gaetan<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">Sorry for the very late reply; we were busy =
working on the 21.11 integration.<o:p></o:p></p>
<p class=3D"MsoPlainText">Although we have already adopted this code =
internally, I am sharing the patch with the open-source community so =
that others can benefit.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">This is a specific case of an Azure setup with =
our heavily customized, complex environment.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p></o:p></p>
<p class=3D"MsoPlainText">Let me share the logs with the traceback first.=
<o:p></o:p></p>
<table class=3D"MsoTableGrid" border=3D"1" cellspacing=3D"0" cellpadding=3D=
"0" style=3D"border-collapse:collapse;border:none">
<tbody>
<tr>
<td width=3D"875" valign=3D"top" style=3D"width:656.6pt;border:solid window=
text 1.0pt;padding:0in 5.4pt 0in 5.4pt">
<p class=3D"MsoPlainText">SECONDARY PROCESS<o:p></o:p></p>
<p class=3D"MsoPlainText">timestamp=3D1633598184<o:p></o:p></p>
<p class=3D"MsoPlainText">TCZ0.0.0 Cycle 152 (Build 1832)<o:p></o:p></p>
<p class=3D"MsoPlainText">signal 11 (Segmentation fault), address is 0x3111=
7bbce6c8 from 0x47d08b1<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_t=
Pv&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0xf4) - sp =3D 0x7fffef3fd1=
10, ip =3D 0x3acdc54<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
p; (+&nbsp; 0x159) - sp =3D 0x7fffef3fdc20, ip =3D 0x3acdf29<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9si=
ginfo_tPv (+&nbsp; 0x104) - sp =3D 0x7fffef3fdf00, ip =3D 0x274d4c4<o:p></o=
:p></p>
<p class=3D"MsoPlainText">[bt]: ( 4) _L_unlock_18&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp=
;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0x2c) - sp =3D 0x7fffef3fdf80, ip =3D=
 0x7ffff7bce630<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 5) rte_eth_dev_attach_secondary&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0x21) - sp =3D 0x7fffef3fec50, ip =3D 0x47=
d08b1<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 6) rte_eth_from_ring&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+=
 0x3438) - sp =3D 0x7fffef3fec80, ip =3D 0x4e49da8<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 7) _init&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+ 0xa1b8) - sp =
=3D 0x7fffef3feec0, ip =3D 0x12e0368<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 8) local_dev_probe&nbsp;&nbsp;&nbsp;&nbsp=
;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp=
;&nbsp; (+&nbsp;&nbsp; 0xac) - sp =3D 0x7fffef3feef0, ip =3D 0x478fd2c<o:p>=
</o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 9) rte_uuid_unparse&nbsp;&nbsp;&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
p; (+&nbsp; 0x274) - sp =3D 0x7fffef3fef30, ip =3D 0x47a3e94<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: (10) rte_eal_vfio_get_vf_token&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0xd7) - sp =3D 0x7fffef3ff1=
10, ip =3D 0x47b04b7<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: (11) eal_hugepage_info_read&nbsp;&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp; 0x602) - sp =3D 0x=
7fffef3ff170, ip =3D 0x47b2cd2<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: (12) start_thread&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp=
;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0xc5) - sp =3D 0x7fffef3ff220, ip =3D=
 0x7ffff7bc6ea5<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: (13) clone&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0x=
6d) - sp =3D 0x7fffef3ff2c0, ip =3D 0x7ffff004096d<o:p></o:p></p>
<p class=3D"MsoPlainText">EAL: Fail to recv reply for request /var/run/dpdk=
/oracusbc/mp_socket:eal_dev_mp_request<o:p></o:p></p>
<p class=3D"MsoPlainText">EAL: Cannot send request to primary<o:p></o:p></p=
>
<p class=3D"MsoPlainText">EAL: Failed to send hotplug request to primary<o:=
p></o:p></p>
<p class=3D"MsoPlainText">net_failsafe: Failed to probe devargs net_tap_vsc=
0<o:p></o:p></p>
<p class=3D"MsoPlainText">EAL: Fail to recv reply for request /var/run/dpdk=
/oracusbc/mp_socket:eal_dev_mp_request<o:p></o:p></p>
<p class=3D"MsoPlainText">EAL: Cannot send request to primary<o:p></o:p></p=
>
<p class=3D"MsoPlainText">EAL: Failed to send hotplug request to primary<o:=
p></o:p></p>
<p class=3D"MsoPlainText">net_failsafe: Failed to probe devargs net_tap_vsc=
1<o:p></o:p></p>
<p class=3D"MsoPlainText">EAL: No legacy callbacks, legacy socket not creat=
ed<o:p></o:p></p>
<p class=3D"MsoPlainText">EAL: Drop mp reply: eal_dev_mp_request<o:p></o:p>=
</p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">PRIMARY PROCESS<o:p></o:p></p>
<p class=3D"MsoPlainText">timestamp=3D1633598196<o:p></o:p></p>
<p class=3D"MsoPlainText">TCZ0.0.0 Cycle 152 (Build 1832)<o:p></o:p></p>
<p class=3D"MsoPlainText">signal 11 (Segmentation fault), address is 0x38 f=
rom 0x9d8fbe<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_t=
Pv&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0xf4) - sp =3D 0x7fffecf411=
50, ip =3D 0x100dd44<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs=
p; (+&nbsp; 0x159) - sp =3D 0x7fffecf41c60, ip =3D 0x100e019<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9si=
ginfo_tPv (+&nbsp; 0x104) - sp =3D 0x7fffecf41f40, ip =3D 0xff4894<o:p></o:=
p></p>
<p class=3D"MsoPlainText">[bt]: ( 4) _L_unlock_18&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp=
;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0x2c) - sp =3D 0x7fffecf41fc0, ip =3D=
 0x7ffff61d9630<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 5) failsafe_eth_dev_close&nbsp;&nbsp;&nbs=
p;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&=
nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp; 0x65e) - sp =3D 0x=
7fffecf42c90, ip =3D 0x9d8fbe<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 6) rte_eth_link_get_nowait&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0x6a) - sp =3D 0x=
7fffecf42cf0, ip =3D 0x62fa0a<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 7) _ZN11StatsThread9statsLoopEP10CustomOb=
ject&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp; 0x33e) - sp =3D 0x7fffecf42d20,=
 ip =3D 0xedea2e<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 8) _ZN11StatsThread9statsLoopEP10CustomOb=
ject&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp; 0x8dc) - sp =3D 0x7fffecf42d90,=
 ip =3D 0xedefcc<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: ( 9) ThreadFunction&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp; (+&nbsp;&nbsp; 0xe6) - sp =3D 0x7fffecf42db0, ip =3D 0x7ffff6b=
477e6<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: (10) start_thread&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp=
;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&n=
bsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0xc5) - sp =3D 0x7fffecf42de0, ip =3D=
 0x7ffff61d1ea5<o:p></o:p></p>
<p class=3D"MsoPlainText">[bt]: (11) clone&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (+&nbsp;&nbsp; 0x=
6d) - sp =3D 0x7fffecf42e80, ip =3D 0x7ffff0a6b96d<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">DPDK 20.11.2<o:p></o:p></p>
<p class=3D"MsoPlainText">core mask is 00000000000000000000000000004000<o:p=
></o:p></p>
<p class=3D"MsoPlainText">DPDK Custom Process initialized with 2 ports<o:p>=
</o:p></p>
<p class=3D"MsoPlainText">the min max TxQ is maxTxQueues 16<o:p></o:p></p>
<p class=3D"MsoPlainText">Using 1 RxQs for port 0 (# F-core=3D1)<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">Using 1 RxQs for port 3 (# F-core=3D1)<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">Core 14 (port=3D0, rxQ=3D0) kni_ring=3D(nil)<o:p>=
</o:p></p>
<p class=3D"MsoPlainText">Core 14 (port=3D3, rxQ=3D0) kni_ring=3D(nil)<o:p>=
</o:p></p>
<p class=3D"MsoPlainText">Core 14 txN =3D 0<o:p></o:p></p>
<p class=3D"MsoPlainText">Thread for core 14 using ring from usbc of 0x3111=
7b29bb00<o:p></o:p></p>
<p class=3D"MsoPlainText">Ring size must be powers of 2, adjusting from 819=
6 to 16384<o:p></o:p></p>
<p class=3D"MsoPlainText">Thread for core 14 using ring from MEDIA of 0x311=
17b27b840<o:p></o:p></p>
<p class=3D"MsoPlainText">Encaps Memory Zone=3D 48044 sizeof encaps =3D 60<=
o:p></o:p></p>
<p class=3D"MsoPlainText">Trace Memory Zone=3D 272<o:p></o:p></p>
<p class=3D"MsoPlainText">Policy Memory Zone=3D 8196 sizeof policy =3D 240<=
o:p></o:p></p>
<p class=3D"MsoPlainText">link status for port 0 is 1<o:p></o:p></p>
<p class=3D"MsoPlainText">link status for port 3 is 1<o:p></o:p></p>
<p class=3D"MsoPlainText">PORT 0 supports 16 rx queues and 16 tx queues (dr=
iver_name =3D net_failsafe, driver_type =3D 16)<o:p></o:p></p>
<p class=3D"MsoPlainText">PORT 0 is polling for link-change, interrupts dis=
abled<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] tap_flow_create(): Kernel refused TC filte=
r rule creation (17): File exists<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] net_failsafe: Failed to create flow on sub=
_device 1<o:p></o:p></p>
<p class=3D"MsoPlainText">add_flow(): create() fails for port 0; Reason: ov=
erlapping rules or Kernel too old for flower support<o:p></o:p></p>
<p class=3D"MsoPlainText">Error adding broadcast flow<o:p></o:p></p>
<p class=3D"MsoPlainText">PORT 3 supports 16 rx queues and 16 tx queues (dr=
iver_name =3D net_failsafe, driver_type =3D 16)<o:p></o:p></p>
<p class=3D"MsoPlainText">PORT 3 is polling for link-change, interrupts dis=
abled<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] EAL: Failed to hotplug add device on prima=
ry<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] tap_flow_create(): Kernel refused TC filte=
r rule creation (17): File exists<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] net_failsafe: Failed to create flow on sub=
_device 1<o:p></o:p></p>
<p class=3D"MsoPlainText">add_flow(): create() fails for port 3; Reason: ov=
erlapping rules or Kernel too old for flower support<o:p></o:p></p>
<p class=3D"MsoPlainText">Error adding broadcast flow<o:p></o:p></p>
<p class=3D"MsoPlainText">Cmd Thread is available<o:p></o:p></p>
<p class=3D"MsoPlainText">Capture object initialized<o:p></o:p></p>
<p class=3D"MsoPlainText">init :Stats Thread is available<o:p></o:p></p>
<p class=3D"MsoPlainText">ifLinkUpdate: Sending OperStatus for port=3D0 sta=
t=3D1<o:p></o:p></p>
<p class=3D"MsoPlainText">ifLinkUpdate: Port 0 Link Change - speed 40000 Mb=
ps - full-duplex<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] EAL: Fail to recv reply for request /var/r=
un/dpdk/oracusbc/mp_socket_2934_298e9db8d1:eal_dev_mp_request<o:p></o:p></p=
>
<p class=3D"MsoPlainText">[DPDK] EAL: rte_mp_request_sync failed<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">[DPDK] EAL: Failed to send hotplug request to sec=
ondary<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] EAL: Fail to recv reply for request /var/r=
un/dpdk/oracusbc/mp_socket_2934_298e9db8d1:eal_dev_mp_request<o:p></o:p></p=
>
<p class=3D"MsoPlainText">[DPDK] EAL: rte_mp_request_sync failed<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">[DPDK] EAL: Failed to hotplug add device on prima=
ry<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] Invalid port_id=3D2<o:p></o:p></p>
<p class=3D"MsoPlainText">[DPDK] net_failsafe: Operation rte_eth_stats_get =
failed for sub_device 1 with error -19<o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">There is a race in the secondary process, and =
the primary crashed because its data structures were only partially =
filled.<o:p></o:p></p>
<p class=3D"MsoPlainText">Let me know if you need the GDB analysis; I =
can share it in my next reply if this log is not sufficient. It is =
considerably longer.<o:p></o:p></p>
<p class=3D"MsoPlainText">Thanks!<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">Regards<o:p></o:p></p>
<p class=3D"MsoPlainText">Vipul<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">-----Original Message-----<br>
From: Ga=EBtan Rivet &lt;grive@u256.net&gt; <br>
Sent: Monday, November 22, 2021 3:53 PM<br>
To: Vipul Ashri &lt;vipul.ashri@oracle.com&gt;; dev@dpdk.org<br>
Cc: stable@dpdk.org<br>
Subject: [External] : Re: [PATCH v2] net/failsafe: link_update request cras=
hing at boot</p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">On Thu, Oct 21, 2021, at 23:42, <a href=3D"mailto=
:vipul.ashri@oracle.com">
<span style=3D"color:windowtext;text-decoration:none">vipul.ashri@oracle.co=
m</span></a> wrote:<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; From: Vipul Ashri &lt;<a href=3D"mailto:vipu=
l.ashri@oracle.com"><span style=3D"color:windowtext;text-decoration:none">v=
ipul.ashri@oracle.com</span></a>&gt;<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt;<o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">&gt; failsafe crashed on an early link_update =
request during boot-time<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; initialization.<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; Debugging showed that the failsafe device =
was ready but its sub-devices<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; were still initializing, so expanding the =
SUBOPS macro gave<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; [partial_dev]-&gt;dev_ops-&gt;=
link_update(), whose execution<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; crashed because dev_ops=3D=3D0. A similar =
crash was seen<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; in failsafe_eth_dev_close().<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">&gt;<o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">&gt; The failsafe driver needs a separate =
check for sub-devices, similar to<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; &quot;RTE_ETH_VALID_PORTID_OR_ERR_RET(port_i=
d, -ENODEV);&quot;, which is called<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; in almost every eth_dev function.<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">&gt;<o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">&gt; Fixes: a46f8d5 (&quot;net/failsafe: add fail=
-safe PMD&quot;)<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; Cc: <a href=3D"mailto:stable@dpdk.org"><span=
 style=3D"color:windowtext;text-decoration:none">stable@dpdk.org</span></a>=
<o:p></o:p></p>
<p class=3D"MsoPlainText">&gt; Signed-off-by: Vipul Ashri &lt;<a href=3D"ma=
ilto:vipul.ashri@oracle.com"><span style=3D"color:windowtext;text-decoratio=
n:none">vipul.ashri@oracle.com</span></a>&gt;<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">Hello Vipul,<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">I'm sorry for the delay, I missed your fix on the=
 mailing list.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">IIUC, the issue is that failsafe finished init =
and received an ethdev operation call, but one of its sub-devices, =
although marked DEV_ACTIVE, has its eth_dev-&gt;dev_ops field set to NULL.=
<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">It is really surprising to me, because there aren=
't many ways for a sub-device to become DEV_ACTIVE.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">The only two ways are<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">&nbsp; * by executing 'fs_dev_configure()', which=
 will first execute<o:p></o:p></p>
<p class=3D"MsoPlainText">&nbsp;&nbsp;&nbsp; rte_eth_dev_configure() on the=
 sub-device, and on error would<o:p></o:p></p>
<p class=3D"MsoPlainText">&nbsp;&nbsp;&nbsp; stop *without* setting DEV_ACT=
IVE.<o:p></o:p></p>
<p class=3D"MsoPlainText">&nbsp;&nbsp;&nbsp; rte_eth_dev_configure() will i=
tself execute<o:p></o:p></p>
<p class=3D"MsoPlainText">&nbsp;&nbsp;&nbsp; RTE_ETH_VALID_PORTID_OR_ERR_RE=
T(port_id, -ENODEV), so it would<o:p></o:p></p>
<p class=3D"MsoPlainText">&nbsp;&nbsp;&nbsp; return negative errno and fs_d=
ev_configure() would abort.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">&nbsp; * by executing 'fs_dev_remove()' when =
the sub-device was 'DEV_STARTED'<o:p></o:p></p>
<p class=3D"MsoPlainText">&nbsp;&nbsp;&nbsp; to begin with; it is then =
demoted to DEV_ACTIVE once stopped.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">So I don't understand yet how it is possible for =
a sub-device to become DEV_ACTIVE while its eth_dev-&gt;dev_ops are NULL. I=
t seems more like a bug, memory corruption or just an unexpected execution =
pattern.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">Could you describe the execution in more =
detail?<o:p></o:p></p>
<p class=3D"MsoPlainText">In particular, try setting the EAL log level =
to debug with the option:<o:p></o:p></p>
<p class=3D"MsoPlainText">' --log-level pmd.net.failsafe:debug '<o:p></o:p>=
</p>
<p class=3D"MsoPlainText">for example while using testpmd or your DPDK app.=
<o:p></o:p></p>
<p class=3D"MsoPlainText">It should show ethdev level accesses to the sub-d=
evices, and error values.<o:p></o:p></p>
<p class=3D"MsoPlainText"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoPlainText">Best regards,<o:p></o:p></p>
<p class=3D"MsoPlainText">--<o:p></o:p></p>
<p class=3D"MsoPlainText">Gaetan Rivet<o:p></o:p></p>
</div>
</body>
</html>

--_000_PH0PR10MB5514F1BB9221AAD404DE194D9A339PH0PR10MB5514namp_--
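
For readers following the thread, the kind of guard discussed above (do not
call link_update on a sub-device whose dev_ops pointer is still NULL) can be
sketched roughly as below. This is only a minimal, self-contained
illustration under stated assumptions, not the failsafe PMD code and not the
submitted patch: every name prefixed with sketch_ is hypothetical, the state
enum is a simplified stand-in for the driver's DEV_ACTIVE / DEV_STARTED
states, and the return-value convention is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real DPDK definitions. */
struct sketch_eth_dev;

struct sketch_dev_ops {
    int (*link_update)(struct sketch_eth_dev *dev, int wait_to_complete);
};

struct sketch_eth_dev {
    /* NULL while the port is only partially initialized. */
    const struct sketch_dev_ops *dev_ops;
};

/* Simplified state ladder; ordering matters for the >= comparison below. */
enum sketch_state {
    SKETCH_DEV_UNDEFINED = 0,
    SKETCH_DEV_ACTIVE,
    SKETCH_DEV_STARTED
};

struct sketch_sub_device {
    enum sketch_state state;
    struct sketch_eth_dev *edev;
};

/* Returns 1 only when dereferencing dev_ops->link_update is safe. */
static int
sketch_sub_usable(const struct sketch_sub_device *sdev)
{
    return sdev != NULL &&
           sdev->state >= SKETCH_DEV_ACTIVE &&
           sdev->edev != NULL &&
           sdev->edev->dev_ops != NULL &&
           sdev->edev->dev_ops->link_update != NULL;
}

/* Trivial callback used to exercise the sketch. */
static int
sketch_stub_link_update(struct sketch_eth_dev *dev, int wait_to_complete)
{
    (void)dev;
    (void)wait_to_complete;
    return 0;
}

/*
 * Calls link_update on every usable sub-device and skips the others
 * instead of dereferencing a NULL dev_ops. Returns the number of
 * sub-devices whose callback was actually invoked.
 */
static int
sketch_link_update_all(struct sketch_sub_device *subs, size_t n, int wait)
{
    int called = 0;
    size_t i;

    assert(subs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!sketch_sub_usable(&subs[i]))
            continue; /* partially initialized: skip rather than crash */
        (void)subs[i].edev->dev_ops->link_update(subs[i].edev, wait);
        called++;
    }
    return called;
}
```

With one fully initialized sub-device and one whose dev_ops is NULL, the
helper invokes the callback once and skips the partial device. Whether such
a guard is the right fix, or only hides the race described in the thread, is
exactly the open question in this discussion.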