From: Alexander Kozyrev
To: Ken Andrews, users@dpdk.org
Subject: Re: Mellanox - Unexpected CEQ error, rx stops receiving packets
Date: Mon, 22 Jul 2024 20:34:38 +0000

Hi Ken,

Here is what error syndrome 0x04 means:

0x4: Local_Protection_Error
"This event is generated when a user attempts to access an address outside of the registered memory region. For example, this may happen if the Lkey does not match the address in the WR."

It looks like a wrong buffer was passed to the NIC for packet acquisition.
Could you please share more details on your test case? What is the traffic pattern? What are the Rx/Tx queue and mbuf configurations?

Regards,
Alex

________________________________
From: Ken Andrews
Sent: Friday, June 28, 2024 5:22 AM
To: users@dpdk.org
Subject: Mellanox - Unexpected CEQ error, rx stops receiving packets

Hi,

I'm seeing an issue previously mentioned on this list in 2022, where my Mellanox NIC logs an "Unexpected CQE error syndrome" message. Once this condition is hit, the rxq->err_state variable in the mlx5 PMD is never cleared, and the Rx path just loops around without receiving any further packets.

It's not clear what causes the initial CQE error, as it can take upwards of an hour to occur.
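The reply above describes syndrome 0x04 as an access outside the registered memory region, or an Lkey that does not match the address in the WR. Purely as an illustration of that condition, the sketch below models it in plain C: a buffer passes only if it lies entirely inside a registered region whose lkey matches the one supplied. The `mem_region` struct and `buf_in_region()` helper are hypothetical (not DPDK or mlx5 PMD types), and the checks the NIC actually performs are not described in this thread; this only makes the two failure modes concrete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one registered memory region (not a DPDK type). */
struct mem_region {
	uintptr_t start; /* first byte of the region */
	size_t len;      /* region size in bytes */
	uint32_t lkey;   /* local key handed out at registration */
};

/*
 * Simplified model of the condition behind Local_Protection_Error:
 * the buffer [addr, addr + len) must fit entirely inside a region
 * whose lkey equals the one carried by the work request.
 */
static bool buf_in_region(const struct mem_region *mrs, size_t n_mrs,
			  uintptr_t addr, size_t len, uint32_t lkey)
{
	assert(mrs != NULL || n_mrs == 0);
	for (size_t i = 0; i < n_mrs; i++) {
		const struct mem_region *mr = &mrs[i];

		if (mr->lkey != lkey)
			continue; /* lkey belongs to a different region */
		/* Containment test written to avoid overflow in addr + len. */
		if (addr >= mr->start && len <= mr->len &&
		    addr - mr->start <= mr->len - len)
			return true;
	}
	return false; /* in this model, the access would be rejected */
}
```

With two regions registered, a buffer at a valid address but carrying the other region's lkey is rejected, as is a buffer that runs a single byte past the end of its own region.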

The full log entry is:

Unexpected CQE error syndrome 0x04 CQN = 256 SQN = 4679 wqe_counter = 2149 wq_ci = 3294 cq_ci = 42609
MLX5 Error CQ: at [0x292d26000], len=16384
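Since the error can take over an hour to appear, it may help to pull the fields out of log lines like the one above when triaging long runs. The sketch below is a hypothetical helper (`parse_cqe_err()` and `syndrome_name()` are not DPDK functions) that parses the message with `sscanf()` and names the syndrome. Only 0x04 = Local_Protection_Error is confirmed in this thread; the other codes are assumed to follow the `MLX5_CQE_SYNDROME_*` values in the public mlx5 headers and should be checked against the driver sources in use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields printed by the "Unexpected CQE error syndrome" log line. */
struct cqe_err_log {
	unsigned int syndrome;
	unsigned int cqn, sqn;
	unsigned int wqe_counter, wq_ci, cq_ci;
};

/* Parse one log line; any prefix before the message is skipped. Returns 0 on success. */
static int parse_cqe_err(const char *line, struct cqe_err_log *out)
{
	const char *msg;

	assert(line != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	msg = strstr(line, "Unexpected CQE error syndrome");
	if (msg == NULL)
		return -1;
	/* %x accepts an optional "0x" prefix, like strtoul() with base 16. */
	if (sscanf(msg, "Unexpected CQE error syndrome %x CQN = %u SQN = %u "
		   "wqe_counter = %u wq_ci = %u cq_ci = %u",
		   &out->syndrome, &out->cqn, &out->sqn,
		   &out->wqe_counter, &out->wq_ci, &out->cq_ci) != 6)
		return -1;
	return 0;
}

/*
 * Assumed to mirror MLX5_CQE_SYNDROME_* from the public mlx5 headers;
 * only 0x04 (Local_Protection_Error) is confirmed in this thread.
 */
static const char *syndrome_name(unsigned int syndrome)
{
	switch (syndrome) {
	case 0x01: return "LOCAL_LENGTH_ERR";
	case 0x02: return "LOCAL_QP_OP_ERR";
	case 0x04: return "LOCAL_PROT_ERR";
	case 0x05: return "WR_FLUSH_ERR";
	default:   return "UNKNOWN";
	}
}
```

Applied to the log line above, this yields syndrome 0x04 (LOCAL_PROT_ERR) with SQN 4679 and wqe_counter 2149; the "MLX5 Error CQ" dump line is not matched and returns -1.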

The NIC is: NVIDIA ConnectX-7 HHHL Adapter card, 400GbE / NDR IB (default mode), Single-port OSFP, PCIe 5.0 x16, Crypto Enabled, Secure Boot Enabled
Part number: MCX75310AAC-NEA_Ax
Firmware: 28.36.1010
OFED Version: 24.04-0.6.6.0
DPDK Version: 23.11.0

This issue was previously mentioned in this post: https://mails.dpdk.org/archives/users/2022-October/006779.html

Can anyone please help shed some light on this?

Thanks,
Ken Andrews
R&D Department
t: +44 1506 671416
e: ken.andrews@calnexsol.com
w: calnexsol.com

Calnex Solutions
Oracle Campus
Linlithgow
EH49 7LR
United Kingdom