From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
by inbox.dpdk.org (Postfix) with ESMTP id 6145643EF6;
Wed, 24 Apr 2024 10:38:50 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
by mails.dpdk.org (Postfix) with ESMTP id 3E421433D5;
Wed, 24 Apr 2024 10:38:50 +0200 (CEST)
Received: from inbox.dpdk.org (inbox.dpdk.org [95.142.172.178])
by mails.dpdk.org (Postfix) with ESMTP id F31F5433C0
for ; Wed, 24 Apr 2024 10:38:47 +0200 (CEST)
Received: by inbox.dpdk.org (Postfix, from userid 33)
id DF87C43EF7; Wed, 24 Apr 2024 10:38:47 +0200 (CEST)
From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/ethdev Bug 1419] [mlx5] Segfault when calling
rte_eth_dev_start() twice
Date: Wed, 24 Apr 2024 08:38:47 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: DPDK
X-Bugzilla-Component: ethdev
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: vojanec@cesnet.cz
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: Normal
X-Bugzilla-Assigned-To: dev@dpdk.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version rep_platform
op_sys bug_status bug_severity priority component assigned_to reporter
target_milestone attachments.created
Message-ID:
Content-Type: multipart/alternative; boundary=17139479270.EDc8bfcae.1777976
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://bugs.dpdk.org/
Auto-Submitted: auto-generated
X-Auto-Response-Suppress: All
MIME-Version: 1.0
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
--17139479270.EDc8bfcae.1777976
Date: Wed, 24 Apr 2024 10:38:47 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.dpdk.org/
Auto-Submitted: auto-generated
X-Auto-Response-Suppress: All
https://bugs.dpdk.org/show_bug.cgi?id=1419
Bug ID: 1419
Summary: [mlx5] Segfault when calling rte_eth_dev_start() twice
Product: DPDK
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: vojanec@cesnet.cz
Target Milestone: ---
Created attachment 279
--> https://bugs.dpdk.org/attachment.cgi?id=279&action=edit
Example application for reproducing
When 'rte_eth_dev_start()' is called on a port whose mempool is not large
enough, the function fails with error code '-ENOMEM' and the message:

  mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory

This is expected behaviour. However, when the same call is retried right
after the failure, the function fails with error code '-EINVAL' and the
message:

  mlx5_net: port 0 failed to set defaults flows

This is suspicious: the retry would be expected to fail with the same error,
since no more memory was allocated in the meantime.

The behaviour is even more clearly incorrect when flow isolated mode is
enabled. In that case the first call to 'rte_eth_dev_start()' fails as
expected, but the second call succeeds (return value 0). This leads to
undefined behaviour and a segfault when 'rte_eth_rx_burst()' is called later.
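The sequence described above can be captured in a small checker. This is only a sketch: 'start_fn', 'retry_report', 'check_start_retry' and the two fake start routines are hypothetical names introduced for illustration, not DPDK API. The start routine is passed in as a function pointer so the logic can be exercised without hardware; in a real application one would presumably pass 'rte_eth_dev_start', whose signature (int, taking a uint16_t port id) matches.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port start routine; rte_eth_dev_start()
 * has a matching signature: int (uint16_t port_id). */
typedef int (*start_fn)(uint16_t port_id);

/* Hypothetical report of two back-to-back start attempts. */
struct retry_report {
    int first_ret;      /* return value of the first attempt */
    int second_ret;     /* return value of the retry (0 if no retry was made) */
    bool retried;       /* true if the first attempt failed and was retried */
    bool inconsistent;  /* first attempt failed, identical retry returned 0 */
};

static struct retry_report
check_start_retry(start_fn start, uint16_t port_id)
{
    struct retry_report r = { 0, 0, false, false };

    assert(start != NULL);
    r.first_ret = start(port_id);
    if (r.first_ret == 0)
        return r; /* started cleanly, nothing to retry */

    /* Retry with nothing changed in between, as in the report. */
    r.retried = true;
    r.second_ret = start(port_id);
    r.inconsistent = (r.second_ret == 0);
    return r;
}

/* Fakes that only mimic the return codes quoted in the report;
 * they are not driver code. */
static int fake_calls_default;
static int fake_start_default(uint16_t port_id)
{
    (void)port_id;
    return fake_calls_default++ == 0 ? -ENOMEM : -EINVAL;
}

static int fake_calls_isolated;
static int fake_start_isolated(uint16_t port_id)
{
    (void)port_id;
    return fake_calls_isolated++ == 0 ? -ENOMEM : 0;
}
```

If 'inconsistent' is set, the port should not be polled with 'rte_eth_rx_burst()'; according to the report, that is where the segfault occurs.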
[Steps to reproduce]
See the attached patch, which introduces an example application. Apply the
patch and build the application using 'make'. Run the application as follows:

  # dpdk-hugepages --setup 2G
  # ./build/crash <EAL ARGS> -- 1024

The only application argument is the packet mempool size. Setting it to 1024
makes the mempool small enough to be allocated, yet too small for the first
'rte_eth_dev_start()' to succeed.

The application initializes a single DPDK port (use the '--allow' argument to
specify it), enables flow isolated mode and attempts to start the port twice.
After that, the application segfaults when calling 'rte_eth_rx_burst()'.
[Bug investigation]
The 'mlx5_dev_start()' function deallocates the memory it used when its first
call fails. However, it seems to deallocate more than it actually allocated,
which effectively unconfigures the queues (or possibly the entire port; this
was not verified). In flow isolated mode, the second call to
'mlx5_dev_start()' then seems to skip some initialization and does not return
an error.
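The hypothesis above can be illustrated with a toy model. This is not mlx5 code and makes no claim about the driver's internals: every struct field, function name and the pool sizes used here are hypothetical, chosen only to show how a rollback that undoes more than the start routine set up could produce the reported '-ENOMEM', then '-EINVAL' (or 0 in isolated mode) sequence.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names are hypothetical and do not correspond to
 * mlx5 internals. */
struct toy_port {
    bool queues_configured; /* set up by the application before start */
    bool rxq_started;       /* set up by toy_start() itself */
    bool isolated;          /* flow isolated mode enabled */
    bool started;           /* port reported as started */
    int pool_size;          /* mbufs available */
    int pool_needed;        /* mbufs required to fill the Rx queues */
};

/* Over-eager rollback: also discards state that start never set up. */
static void toy_rollback_buggy(struct toy_port *p)
{
    p->rxq_started = false;
    p->queues_configured = false; /* frees more than start allocated */
}

/* Symmetric rollback: undoes only what start did. */
static void toy_rollback_symmetric(struct toy_port *p)
{
    p->rxq_started = false;
}

static int toy_start(struct toy_port *p, void (*rollback)(struct toy_port *))
{
    assert(p != NULL && rollback != NULL);

    if (p->queues_configured) {
        if (p->pool_size < p->pool_needed) {
            rollback(p);
            return -ENOMEM;
        }
        p->rxq_started = true;
    }
    /* If the queue configuration is gone, Rx setup was silently skipped.
     * Outside isolated mode, default flows need Rx queues and fail. */
    if (!p->isolated && !p->rxq_started)
        return -EINVAL;

    p->started = true; /* in isolated mode nothing notices the gap */
    return 0;
}
```

In this model, the symmetric rollback makes the retry fail with '-ENOMEM' again, which is the behaviour the report expects, while the over-eager rollback leaves an isolated-mode port marked as started with no Rx queue behind it.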
[DPDK Version]
Tested on:
e2e546ab5b ("version: 24.07-rc0")
eeb0605f11 ("version: 23.11.0"), tag: v23.11
[OS Version]
Operating system: Red Hat Enterprise Linux release 8.9 (Ootpa)
Kernel: 4.18.0-477.10.1.el8_8.x86_64
Architecture: x86_64
[Network Devices]
0000:c4:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=ens3f0np0 drv=mlx5_core unused=
0000:c4:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=ens3f1np1 drv=mlx5_core unused=
-- 
You are receiving this mail because:
You are the assignee for the bug.