From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/vhost/virtio Bug 1394] vq_assert_lock__ fail in vhost_user_set_vring_addr during live migration with HW vDPA
Date: Mon, 04 Mar 2024 13:25:09 +0000

https://bugs.dpdk.org/show_bug.cgi?id=1394

Bug ID: 1394
Summary: vq_assert_lock__ fail in vhost_user_set_vring_addr during live migration with HW vDPA
Product: DPDK
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: critical
Priority: Normal
Component: vhost/virtio
Assignee: dev@dpdk.org
Reporter: yajunw@nvidia.com
Target Milestone: ---

DPDK version: v24.03-rc1

1. Boot vDPA and QEMU (8.1)
build/examples/dpdk-vdpa -a 5e:00.2,class=vdpa --file-prefix vf0 --log-level=.,info -- --client --iface /tmp/vfe-net

<qemu:arg value='-chardev'/>
<qemu:arg value='socket,id=charnet1,path=/tmp/vfe-net0,server'/>
<qemu:arg value='-netdev'/>
<qemu:arg value='vhost-user,chardev=charnet1,queues=2,id=hostnet1'/>
<qemu:arg value='-device'/>
<qemu:arg value='virtio-net-pci,mq=on,vectors=6,netdev=hostnet1,id=net1,mac=00:00:00:00:33:00,bus=pci.0,addr=0x8,page-per-vq=on'/>

2. Live-migrate the VM to another server
sudo virsh migrate --verbose --live --persistent gen-l-vrt-295-005-CentOS-7.4 qemu+ssh://gen-l-vrt-294/system --unsafe

3. DPDK crashes

Once the device is configured, vhost_user_lock_all_queue_pairs is no longer
called, so the vq_assert_lock__ check in vhost_user_set_vring_addr fails in
the vDPA case.
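
The failure mode described above can be illustrated with a small stand-alone
model. This is only a sketch under stated assumptions, not DPDK code: every
name prefixed with toy_ is hypothetical, the lock is modelled as a plain flag,
and a failed check is counted instead of aborting. It shows a dispatcher that
takes the lock only while the device is not yet configured, and a handler that
checks the lock unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a virtqueue (not DPDK's struct). */
struct toy_vq {
    bool access_lock_held;      /* models "the access lock is taken" */
    bool access_ok;             /* state that may only change under the lock */
    unsigned int failed_checks; /* failed lock checks, counted instead of panicking */
};

/* Models the lock check: report and count a failure if the lock is not held. */
static bool toy_assert_lock(struct toy_vq *vq, const char *func)
{
    assert(vq != NULL);
    if (!vq->access_lock_held) {
        fprintf(stderr, "%s() called without access lock taken.\n", func);
        vq->failed_checks++;
        return false;
    }
    return true;
}

/* Models a handler that changes queue state and therefore checks the lock first. */
static void toy_set_vring_addr(struct toy_vq *vq)
{
    if (!toy_assert_lock(vq, __func__))
        return;
    vq->access_ok = false; /* guarded state change */
}

/* Models a dispatcher that skips locking once the device is configured. */
static void toy_msg_handler(struct toy_vq *vq, bool device_configured)
{
    bool locked_here = false;

    if (!device_configured) {
        vq->access_lock_held = true;
        locked_here = true;
    }

    toy_set_vring_addr(vq);

    if (locked_here)
        vq->access_lock_held = false;
}
```

In this model, toy_msg_handler(&vq, false) runs the handler under the lock and
the check passes, while toy_msg_handler(&vq, true) reaches the same handler
with the lock not held and the check fails. That is the pattern the report
describes for the vDPA case during live migration.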

Related to commit:
commit 741dc052eaf9459cc576b0d87e96a40069485c32 (HEAD)
Author: David Marchand <david.marchand@redhat.com>
Date:   Tue Dec 5 10:45:34 2023 +0100

    vhost: annotate virtqueue access checks

    Modifying vq->access_ok should be done with a write lock taken.
    Annotate vring_translate() and vring_invalidate().



new port /tmp/vfe-net0, device : 5e:00.2
mlx5_vdpa: MTU cannot be set on device 5e:00.2.
mlx5_vdpa: Region 0: HVA 0x7fff00000000, GPA 0x0, size 0xc0000000.
mlx5_vdpa: Region 1: HVA 0x7ffdc0000000, GPA 0x100000000, size 0x140000000.
mlx5_vdpa: Indirect mkey mode is KLM Fixed Buffer Size.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 0.
mlx5_vdpa: Virtq 0 notifier state is enabled.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 1.
mlx5_vdpa: Virtq 1 notifier state is enabled.
[New Thread 0x7fffe7b9b400 (LWP 962699)]
mlx5_vdpa: vDPA device 0 was configured.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 0
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 1
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 2
mlx5_vdpa: Update virtq 2 status disable -> enable.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 2.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ENABLE
VHOST_CONFIG: (/tmp/vfe-net0) set queue enable: 1 to qp idx: 3
mlx5_vdpa: Update virtq 3 status disable -> enable.
mlx5_vdpa: Virtq 2 notifier state is enabled.
mlx5_vdpa: vid 0: Init last_avail_idx=0, last_used_idx=0 for virtq 3.
mlx5_vdpa: Virtq 3 notifier state is enabled.
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_LOG_BASE
VHOST_CONFIG: (/tmp/vfe-net0) log mmap size: 294912, offset: 0
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_FEATURES
VHOST_CONFIG: (/tmp/vfe-net0) negotiated Virtio features: 0x144601803
mlx5_vdpa: mlx5 vdpa: enabling dirty logging...
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_GET_FEATURES
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_GET_STATUS
VHOST_CONFIG: (/tmp/vfe-net0) read message VHOST_USER_SET_VRING_ADDR
EAL: PANIC in vq_assert_lock__():
VHOST_CONFIG: (/tmp/vfe-net0) vhost_user_set_vring_addr() called without access lock taken.
0: /images/vdpa/dpdk/build/examples/dpdk-vdpa (rte_dump_stack+0x1f) [aadfca]
1: /images/vdpa/dpdk/build/examples/dpdk-vdpa (__rte_panic+0xe2) [a8032b]
2: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x3bac83) [7bac83]
3: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x3bd0ef) [7bd0ef]
4: /images/vdpa/dpdk/build/examples/dpdk-vdpa (vhost_user_msg_handler+0x508) [7c264f]
5: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x36b797) [76b797]
6: /images/vdpa/dpdk/build/examples/dpdk-vdpa (fdset_event_dispatch+0x1cd) [769793]
7: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x6970ca) [a970ca]
8: /images/vdpa/dpdk/build/examples/dpdk-vdpa (400000+0x6af41c) [aaf41c]
9: /lib64/libpthread.so.0 (7ffff6426000+0x814a) [7ffff642e14a]
10: /lib64/libc.so.6 (clone+0x43) [7ffff615ddc3]

Thread 29 "dpdk-vhost-evt" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe839c400 (LWP 962487)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50        return ret;
Missing separate debuginfos, use: yum debuginfo-install
elfutils-libelf-0.182-3.el8.x86_64 libgcc-8.4.1-1.el8.x86_64
libibverbs-2307mlnx47-1.2310007.x86_64 libnl3-3.5.0-1.el8.x86_64
libpcap-1.9.1-5.el8.x86_64 numactl-libs-2.0.12-11.el8.x86_64
openssl-libs-1.1.1k-5.el8_5.x86_64 zlib-1.2.11-17.el8.x86_64
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff6082db5 in __GI_abort () at abort.c:79
#2  0x0000000000a80330 in __rte_panic (funcname=0x3461330 <__func__.33032> "vq_assert_lock__",
    format=0x345eb90 "VHOST_CONFIG: (%s) %s() called without access lock taken.\n%.0s") at ../lib/eal/common/eal_common_debug.c:26
#3  0x00000000007bac83 in vq_assert_lock__ (dev=0x17ffc2500, vq=0x17ffa2280, func=0x3461370 <__func__.33676> "vhost_user_set_vring_addr")
    at ../lib/vhost/vhost.h:547
#4  0x00000000007bd0ef in vhost_user_set_vring_addr (pdev=0x7fffe8399270, ctx=0x7fffe8398fc0, main_fd=115) at ../lib/vhost/vhost_user.c:930
#5  0x00000000007c264f in vhost_user_msg_handler (vid=0, fd=115) at ../lib/vhost/vhost_user.c:3197
#6  0x000000000076b797 in vhost_user_read_cb (connfd=115, dat=0x7fffe0000dd0, remove=0x7fffe8399354) at ../lib/vhost/socket.c:318
#7  0x0000000000769793 in fdset_event_dispatch (arg=0x3cb3940 <vhost_user+8192>) at ../lib/vhost/fd_man.c:282
#8  0x0000000000a970ca in control_thread_start (arg=0x9a5ed30) at ../lib/eal/common/eal_common_thread.c:282
#9  0x0000000000aaf41c in thread_start_wrapper (arg=0x7fffffffe090) at ../lib/eal/unix/rte_thread.c:114
#10 0x00007ffff642e14a in start_thread (arg=<optimized out>) at pthread_create.c:479
#11 0x00007ffff615ddc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
          


You are receiving this mail because:
  • You are the assignee for the bug.