From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 15 Feb 2022 21:54:55 +0530
From: Vipul Ashri
Organization: Oracle Corporation
To: Gaëtan Rivet
Cc: dev@dpdk.org, stable@dpdk.org
Subject: Re: [PATCH v2] net/failsafe: link_update request crashing at boot
References: <20211021115139.2634-1-vipul.ashri@oracle.com> <20211021214215.1633-1-vipul.ashri@oracle.com> <87c84612-4116-4fe7-a711-f5f364513c3d@www.fastmail.com>
In-Reply-To: <87c84612-4116-4fe7-a711-f5f364513c3d@www.fastmail.com>
List-Id: patches for DPDK stable branches
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="------------X0iRtJhSPoUg4B36CGJWqdhF"

--------------X0iRtJhSPoUg4B36CGJWqdhF
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 11/22/2021 3:53 PM, Gaëtan Rivet wrote:
> Could describe in more detail the execution?
> In particular, setting the EAL log-level to debug with the option:
> ' --log-level pmd.net.failsafe:debug'
> for example while using testpmd or your DPDK app.
> It should show ethdev level accesses to the sub-devices, and error values.
>
> Best regards,

Hi Gaëtan,

Sorry for the very late reply; we were busy working on the 21.11 integration.
Although we have already adopted this fix internally, I am sharing the patch upstream for the community's benefit.

This is a specific case of an Azure setup with our heavily customized, complex environment.

Let me share the logs with the backtraces first.
==================================================================================================================
SECONDARY PROCESS
timestamp=1633598184
TCZ0.0.0 Cycle 152 (Build 1832)
signal 11 (Segmentation fault), address is 0x31117bbce6c8 from 0x47d08b1

[bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_tPv      (+   0xf4) - sp = 0x7fffef3fd110, ip = 0x3acdc54
[bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv               (+  0x159) - sp = 0x7fffef3fdc20, ip = 0x3acdf29
[bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9siginfo_tPv (+  0x104) - sp = 0x7fffef3fdf00, ip = 0x274d4c4
[bt]: ( 4) _L_unlock_18                                  (+   0x2c) - sp = 0x7fffef3fdf80, ip = 0x7ffff7bce630
[bt]: ( 5) rte_eth_dev_attach_secondary                  (+   0x21) - sp = 0x7fffef3fec50, ip = 0x47d08b1
[bt]: ( 6) rte_eth_from_ring                             (+ 0x3438) - sp = 0x7fffef3fec80, ip = 0x4e49da8
[bt]: ( 7) _init                                         (+ 0xa1b8) - sp = 0x7fffef3feec0, ip = 0x12e0368
[bt]: ( 8) local_dev_probe                               (+   0xac) - sp = 0x7fffef3feef0, ip = 0x478fd2c
[bt]: ( 9) rte_uuid_unparse                              (+  0x274) - sp = 0x7fffef3fef30, ip = 0x47a3e94
[bt]: (10) rte_eal_vfio_get_vf_token                     (+   0xd7) - sp = 0x7fffef3ff110, ip = 0x47b04b7
[bt]: (11) eal_hugepage_info_read                        (+  0x602) - sp = 0x7fffef3ff170, ip = 0x47b2cd2
[bt]: (12) start_thread                                  (+   0xc5) - sp = 0x7fffef3ff220, ip = 0x7ffff7bc6ea5
[bt]: (13) clone                                         (+   0x6d) - sp = 0x7fffef3ff2c0, ip = 0x7ffff004096d
EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket:eal_dev_mp_request
EAL: Cannot send request to primary
EAL: Failed to send hotplug request to primary
net_failsafe: Failed to probe devargs net_tap_vsc0
EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket:eal_dev_mp_request
EAL: Cannot send request to primary
EAL: Failed to send hotplug request to primary
net_failsafe: Failed to probe devargs net_tap_vsc1
EAL: No legacy callbacks, legacy socket not created
EAL: Drop mp reply: eal_dev_mp_request
==================================================================================================================
PRIMARY PROCESS
timestamp=1633598196
TCZ0.0.0 Cycle 152 (Build 1832)
signal 11 (Segmentation fault), address is 0x38 from 0x9d8fbe

[bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_tPv      (+   0xf4) - sp = 0x7fffecf41150, ip = 0x100dd44
[bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv               (+  0x159) - sp = 0x7fffecf41c60, ip = 0x100e019
[bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9siginfo_tPv (+  0x104) - sp = 0x7fffecf41f40, ip = 0xff4894
[bt]: ( 4) _L_unlock_18                                  (+   0x2c) - sp = 0x7fffecf41fc0, ip = 0x7ffff61d9630
[bt]: ( 5) failsafe_eth_dev_close                        (+  0x65e) - sp = 0x7fffecf42c90, ip = 0x9d8fbe
[bt]: ( 6) rte_eth_link_get_nowait                       (+   0x6a) - sp = 0x7fffecf42cf0, ip = 0x62fa0a
[bt]: ( 7) _ZN11StatsThread9statsLoopEP10CustomObject      (+  0x33e) - sp = 0x7fffecf42d20, ip = 0xedea2e
[bt]: ( 8) _ZN11StatsThread9statsLoopEP10CustomObject      (+  0x8dc) - sp = 0x7fffecf42d90, ip = 0xedefcc
[bt]: ( 9) ThreadFunction                                (+   0xe6) - sp = 0x7fffecf42db0, ip = 0x7ffff6b477e6
[bt]: (10) start_thread                                  (+   0xc5) - sp = 0x7fffecf42de0, ip = 0x7ffff61d1ea5
[bt]: (11) clone                                         (+   0x6d) - sp = 0x7fffecf42e80, ip = 0x7ffff0a6b96d

==================================================================================================================
DPDK 20.11.2
core mask is 00000000000000000000000000004000
DPDK Custom Process initialized with 2 ports
the min max TxQ is maxTxQueues 16
Using 1 RxQs for port 0 (# F-core=1)
Using 1 RxQs for port 3 (# F-core=1)
Core 14 (port=0, rxQ=0) kni_ring=(nil)
Core 14 (port=3, rxQ=0) kni_ring=(nil)
Core 14 txN = 0
Thread for core 14 using ring from usbc of 0x31117b29bb00
Ring size must be powers of 2, adjusting from 8196 to 16384
Thread for core 14 using ring from MEDIA of 0x31117b27b840
Encaps Memory Zone= 48044 sizeof encaps = 60
Trace Memory Zone= 272
Policy Memory Zone= 8196 sizeof policy = 240
link status for port 0 is 1
link status for port 3 is 1
PORT 0 supports 16 rx queues and 16 tx queues (driver_name = net_failsafe, driver_type = 16)
PORT 0 is polling for link-change, interrupts disabled
[DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File exists
[DPDK] net_failsafe: Failed to create flow on sub_device 1
add_flow(): create() fails for port 0; Reason: overlapping rules or Kernel too old for flower support
Error adding broadcast flow
PORT 3 supports 16 rx queues and 16 tx queues (driver_name = net_failsafe, driver_type = 16)
PORT 3 is polling for link-change, interrupts disabled
[DPDK] EAL: Failed to hotplug add device on primary
[DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File exists
[DPDK] net_failsafe: Failed to create flow on sub_device 1
add_flow(): create() fails for port 3; Reason: overlapping rules or Kernel too old for flower support
Error adding broadcast flow
Cmd Thread is available
Capture object initialized
init :Stats Thread is available
ifLinkUpdate: Sending OperStatus for port=0 stat=1
ifLinkUpdate: Port 0 Link Change - speed 40000 Mbps - full-duplex
[DPDK] EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket_2934_298e9db8d1:eal_dev_mp_request
[DPDK] EAL: rte_mp_request_sync failed
[DPDK] EAL: Failed to send hotplug request to secondary
[DPDK] EAL: Fail to recv reply for request /var/run/dpdk/oracusbc/mp_socket_2934_298e9db8d1:eal_dev_mp_request
[DPDK] EAL: rte_mp_request_sync failed
[DPDK] EAL: Failed to hotplug add device on primary
[DPDK] Invalid port_id=2
[DPDK] net_failsafe: Operation rte_eth_stats_get failed for sub_device 1 with error -19

There appears to be a race in the secondary process, and the primary crashed because its data structures were only partially filled.
Let me know if you need the GDB analysis; I can share it in my next reply if the logs above are not sufficient. The GDB analysis will be considerably longer.
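
To illustrate the failure mode, here is a minimal stand-alone sketch; it is not the patch itself. The primary's fault address of 0x38 is consistent with reading a field through a NULL pointer, so the sketch assumes a link poll that walks sub-devices and must skip any whose probe has not completed. All names (fake_eth_dev, fake_sub_device, sub_state, poll_link) are hypothetical and are not the real failsafe PMD types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real failsafe PMD types. */
enum sub_state { SUB_UNDEFINED = 0, SUB_PROBED, SUB_ACTIVE };

struct fake_eth_dev {
	int link_up;	/* last link status reported by this device */
};

struct fake_sub_device {
	enum sub_state state;
	struct fake_eth_dev *dev;	/* stays NULL until the probe completes */
};

/*
 * Poll the link state of every usable sub-device.
 * Returns the number of sub-devices polled and sets *link_up to 1 if
 * any of them reports link up. Sub-devices that are not fully set up
 * are skipped instead of being dereferenced.
 */
static int
poll_link(const struct fake_sub_device *subs, size_t n, int *link_up)
{
	int polled = 0;
	size_t i;

	assert(subs != NULL && link_up != NULL);
	*link_up = 0;
	for (i = 0; i < n; i++) {
		/* Guard: a half-initialized sub-device must not be touched. */
		if (subs[i].state != SUB_ACTIVE || subs[i].dev == NULL)
			continue;
		polled++;
		if (subs[i].dev->link_up)
			*link_up = 1;
	}
	return polled;
}
```

Without the guard, the second sub-device in a setup like ours (probe failed, pointer still NULL) would be dereferenced as soon as the stats thread asks for the link state.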
Thanks!

Regards
--------------X0iRtJhSPoUg4B36CGJWqdhF--