Hi,

 

Internally the VM is using DPDK 17.05 on CentOS 7.9, but the issue also seems to reproduce with DPDK 18.11 in the guest.

 

The issue occurs when the DPDK PMDs are started in the guest, so our assumption is that the guest then presents bad or inaccessible memory to the host.

 

We did notice some misuse of SELinux permissions in the guest, and removing it reduced the frequency of the crash significantly.

 

Is there a way to map the shared memory between the VM and the host, to see where the segmentation fault is coming from?
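
To make the question concrete, here is a minimal sketch of the kind of lookup involved: translating a guest physical address into a host virtual address through a table of shared memory regions, and reporting when the address is not covered by any region. The struct `mem_region` and the function `gpa_to_hva` are hypothetical names used only for illustration; the field names are borrowed from `struct rte_vhost_mem_region`, but this is a standalone sketch and not the librte_vhost implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region record for illustration only. The field names
 * mirror struct rte_vhost_mem_region, but nothing here calls DPDK. */
struct mem_region {
    uint64_t guest_phys_addr; /* start of the region in guest physical space */
    uint64_t size;            /* length of the region in bytes */
    uint64_t host_user_addr;  /* where the host process mapped the region */
};

/*
 * Translate a guest physical address (gpa) to a host virtual address.
 *
 * On entry *len is the number of bytes the caller wants to access.
 * On return *len is clamped to the bytes that are contiguous inside
 * the matching region. Returns 0 and sets *len to 0 when no region
 * covers gpa, which is the case that would fault if dereferenced.
 */
static uint64_t
gpa_to_hva(const struct mem_region *regions, size_t nregions,
           uint64_t gpa, uint64_t *len)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct mem_region *r = &regions[i];

        /* Compare offsets rather than end addresses so that
         * guest_phys_addr + size cannot overflow. */
        if (gpa >= r->guest_phys_addr &&
            gpa - r->guest_phys_addr < r->size) {
            uint64_t off = gpa - r->guest_phys_addr;
            uint64_t avail = r->size - off;

            if (*len > avail)
                *len = avail;
            return r->host_user_addr + off;
        }
    }

    *len = 0;
    return 0;
}
```

A caller would treat a zero return, or a returned length shorter than the requested one, as a descriptor pointing outside the shared regions. In DPDK itself, `rte_vhost_get_mem_table()` and `rte_vhost_va_from_guest_pa()` appear to expose the equivalent table and lookup, though whether they are usable for debugging in the versions discussed here is an assumption on my part.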

 

I will see if I can upload the VM XML, but it is a multi-queue, 4-port VM.

 

Thanks for the assistance,

Eran

 

From: Xia, Chenbo <chenbo.xia@intel.com>
Sent: Friday, November 26, 2021 4:25 AM
To: Bendror, Eran (Nokia - US) <eran.bendror@nokia.com>; ktraynor@redhat.com
Cc: ayeh@cisco.com; dev@dpdk.org; Stokes, Ian <ian.stokes@intel.com>; maxime.coquelin@redhat.com; yega@cisco.com; Marco Varlese <marco.varlese@suse.com>
Subject: RE: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service

 

Hi,

 

Could you provide more information about this issue, such as the QEMU command line or libvirt XML, the OVS command line, and the guest driver version? Or is the issue hard to reproduce?

 

Thanks,

Chenbo

 

From: Bendror, Eran (Nokia - US) <eran.bendror@nokia.com>
Sent: Wednesday, November 17, 2021 10:42 PM
To: ktraynor@redhat.com
Cc: ayeh@cisco.com; Xia, Chenbo <chenbo.xia@intel.com>; dev@dpdk.org; Stokes, Ian <ian.stokes@intel.com>; maxime.coquelin@redhat.com; yega@cisco.com
Subject: Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service

 

Hello,

 

I am wondering whether there has been any progress on this topic. We are seeing a very similar issue, where an application restart inside the VM triggers a segmentation fault and a failure to allocate mbufs at the host level.

 

CentOS Linux release 7.8.2003 (Core)

dpdk-18.11.5-1.el7_8.x86_64

openvswitch-2.11.0-4.el7.x86_64

libvirt 4.5.0

QEMU 4.5.0 (API)

QEMU 2.12.0

3.10.0-1127.13.1.el7.x86_64

 

We get the same crash:

 

#0  0x00007f96cb72e7ee in rte_memcpy_generic () from /lib64/librte_vhost.so.4
#1  0x00007f96cb7350f2 in rte_vhost_dequeue_burst () from /lib64/librte_vhost.so.4
#2  0x00007f96caf97f03 in netdev_dpdk_vhost_rxq_recv () from /lib64/libopenvswitch-2.11.so.0
#3  0x00007f96caed21e6 in netdev_rxq_recv () from /lib64/libopenvswitch-2.11.so.0
#4  0x00007f96caea07ca in dp_netdev_process_rxq_port () from /lib64/libopenvswitch-2.11.so.0
#5  0x00007f96caea0ca5 in pmd_thread_main () from /lib64/libopenvswitch-2.11.so.0
#6  0x00007f96caf2da3f in ovsthread_wrapper () from /lib64/libopenvswitch-2.11.so.0
#7  0x00007f96c9ef3ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f96c94118dd in clone () from /lib64/libc.so.6

 

We have tried upgrading the host-level packages to:

 

dpdk-20.11.3-1.el7.x86_64

openvswitch-2.16.1-1.el7.x86_64

 

With these versions the crash still occurs, with the following backtrace:

 

#0  0x00007f6b8b49748c in virtio_dev_tx_split_legacy () from /lib64/librte_vhost.so.21
#1  0x00007f6b8b4c0fdb in rte_vhost_dequeue_burst () from /lib64/librte_vhost.so.21
#2  0x000055bd714c2802 in netdev_dpdk_vhost_rxq_recv ()
#3  0x000055bd713f8e51 in netdev_rxq_recv ()
#4  0x000055bd713c9d2a in dp_netdev_process_rxq_port ()
#5  0x000055bd713ca1f9 in pmd_thread_main ()
#6  0x000055bd71455cdf in ovsthread_wrapper ()
#7  0x00007f6b8a6a9ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f6b89bc78dd in clone () from /lib64/libc.so.6

 

Regards,

Eran