DPDK patches and discussions
* [dpdk-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
@ 2020-11-19  1:44 Alex Yeh (ayeh)
  2020-11-19 11:21 ` [dpdk-dev] [ovs-dev] " Stokes, Ian
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Yeh (ayeh) @ 2020-11-19  1:44 UTC (permalink / raw)
  To: dev, ovs-dev; +Cc: Yegappan Lakshmanan (yega)

Hi,
We are seeing an ovs-vswitchd crash (segfault in the librte_vhost library) when a DPDK application within a guest VM is stopped.

We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062 Linux kernel) with DPDK 18.11.2.

We are using OVS-DPDK on the host, and the guest VM runs a DPDK application. If the application service within the VM is restarted while traffic is flowing, OVS crashes.

The crash is not seen if the whole guest VM is restarted instead of only stopping the application within the VM.

The backtrace (attached below) points to the rte_memcpy_generic() function in rte_memcpy.h. It looks like the crash occurs when vhost tries to dequeue packets from the guest VM after the application in the guest has stopped and its hugepages have been returned to the guest kernel.

We have tried enabling IOMMU support in OVS by setting
"other_config:vhost-iommu-support=true" and enabling an IOMMU in QEMU using the following configuration in the guest domain XML:
<iommu model='intel'>
    <driver intremap='on'/>
</iommu>
With the IOMMU enabled, ovs-vswitchd still crashes when the guest VM restarts the network service.

Is this a known problem? Has anyone else seen a crash like this? How can we protect ovs-vswitchd from crashing when a guest VM restarts the network application or service?

Thanks
Alex
------------------------------------------------------------------------

Log:
Oct 7 19:54:16 Branch81-Bravo kernel: [2245909.596635] pmd16[25721]: segfault at 7f4d1d733000 ip 00007f4d2ae5d066 sp 00007f4d1ce65618 error 4 in librte_vhost.so.4[7f4d2ae52000+1a000]
Oct 7 19:54:19 Branch81-Bravo systemd[1]: ovs-vswitchd.service: main process exited, code=killed, status=11/SEGV

Environment:
CentOS 7.6.1810
openvswitch-2.11.1-1.el7.centos.x86_64
openvswitch-kmod-2.11.1-1.el7.centos.x86_64
dpdk-18.11-2.el7.centos.x86_64
3.10.0-1062.4.1.el7.x86_64
qemu-kvm-ev-2.12.0-18.el7.centos_6.1.1

Core dump trace:
(gdb) bt
#-1 0x00007ffff205602e in rte_memcpy_generic (dst=<optimized out>,
src=0x7fffcef3607c, n=<optimized out>)
at /usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793
Backtrace stopped: Cannot access memory at address 0x7ffff20558f0

(gdb) list *0x00007ffff205602e
0x7ffff205602e is in rte_memcpy_generic (/usr/src/debug/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_memcpy.h:793).
788 }
789
790 /**
791 * For copy with unaligned load
792 */
793 MOVEUNALIGNED_LEFT47(dst, src, n, srcofs);
794
795 /**
796 * Copy whatever left
797 */

(gdb) list *0x00007ffff205c192
0x7ffff205c192 is in rte_vhost_dequeue_burst (/usr/src/debug/dpdk-18.11/lib/librte_vhost/virtio_net.c:1192).
1187 * In zero copy mode, one mbuf can only reference data
1188 * for one or partial of one desc buff.
1189 */
1190 mbuf_avail = cpy_len;
1191 } else {
1192 if (likely(cpy_len > MAX_BATCH_LEN ||
1193 vq->batch_copy_nb_elems >= vq->size ||
1194 (hdr && cur == m))) {
1195 rte_memcpy(rte_pktmbuf_mtod_offset(cur, void *,
1196 mbuf_offset),
(gdb)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2020-11-19  1:44 [dpdk-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service Alex Yeh (ayeh)
@ 2020-11-19 11:21 ` Stokes, Ian
  2020-11-19 12:08   ` Kevin Traynor
  0 siblings, 1 reply; 10+ messages in thread
From: Stokes, Ian @ 2020-11-19 11:21 UTC (permalink / raw)
  To: Alex Yeh (ayeh), dev; +Cc: Yegappan Lakshmanan (yega)

> Hi,
> We are seeing an ovs-vswitchd service crash with segfault in the
> librte_vhost library when a DPDK application within a guest VM is stopped.
> 
>                We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062 Linux kernel) with
> DPDK 18.11.2.

Hi,

Is there a reason you are using OVS 2.11.1 and DPDK 18.11.2? These are quite old.

As a first step, I would recommend using the latest releases on these branches that have been validated by the OVS community.

As of now, that would be OVS 2.11.4 and DPDK 18.11.9. Please check whether the issue is still present there; my suspicion is that this could be an issue that has been resolved in the DPDK library since 18.11.2.

Regards
Ian

> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2020-11-19 11:21 ` [dpdk-dev] [ovs-dev] " Stokes, Ian
@ 2020-11-19 12:08   ` Kevin Traynor
  2021-01-08 19:35     ` Alex Yeh (ayeh)
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin Traynor @ 2020-11-19 12:08 UTC (permalink / raw)
  To: Stokes, Ian, Alex Yeh (ayeh), dev; +Cc: Yegappan Lakshmanan (yega)

On 19/11/2020 11:21, Stokes, Ian wrote:
>> Hi,
>> We are seeing an ovs-vswitchd service crash with segfault in the
>> librte_vhost library when a DPDK application within a guest VM is stopped.
>>
>>                We are using OVS 2.11.1 on CentOS 7.6 (3.10.0-1062 Linux kernel) with
>> DPDK 18.11.2.
> 
> Hi,
> 
> Is there a reason you are using OVS 2.11.1 and DPDK 18.11.2?  These are quite old.
> 
> As a first step, I would recommend using the latest releases on these branches that have been validated by the OVS community.
> 
> As of now, that would be OVS 2.11.4 and DPDK 18.11.9. Please check whether the issue is still present there; my suspicion is that this could be an issue that has been resolved in the DPDK library since 18.11.2.
> 

+1, there are 58 commits in the vhost library on the 18.11 branch since
18.11.2, so it might already be fixed. 18.11.10 is the latest release,
and the commit below has been included since 18.11.7.

$ git log --oneline v18.11.2..HEAD . | grep crash
90b5ba739f vhost: fix crash on port deletion

If you are planning to continue to use 18.11 for a while, I think you
will want to test the 18.11.11 Release Candidate that will be available
in a few weeks. It is the last planned 18.11 release, so any issues you
find *after* it is released won't be fixed.

Kevin.




* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2020-11-19 12:08   ` Kevin Traynor
@ 2021-01-08 19:35     ` Alex Yeh (ayeh)
  2021-01-12 18:20       ` Alex Yeh (ayeh)
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Yeh (ayeh) @ 2021-01-08 19:35 UTC (permalink / raw)
  To: Kevin Traynor, Stokes, Ian, dev; +Cc: Yegappan Lakshmanan (yega)

Hi Kevin, Stokes,
	Thanks for the suggestion.
	We have upgraded to OVS 2.11.4 and DPDK 18.11.10. OVS still crashes with the same segfault when the application within the guest VM restarts. Any suggestions on how to proceed?

Thanks
Alex

[root@nfvis ~]# ovs-vswitchd --version
ovs-vswitchd (Open vSwitch) 2.11.4
DPDK 18.11.10


* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2021-01-08 19:35     ` Alex Yeh (ayeh)
@ 2021-01-12 18:20       ` Alex Yeh (ayeh)
  2021-01-13 14:14         ` Kevin Traynor
  0 siblings, 1 reply; 10+ messages in thread
From: Alex Yeh (ayeh) @ 2021-01-12 18:20 UTC (permalink / raw)
  To: Kevin Traynor, Stokes, Ian, dev; +Cc: Yegappan Lakshmanan (yega)

Hi Kevin, Stokes,
	Resending just to make sure the email is not lost.
Thanks and looking forward to your suggestion,
Alex


* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2021-01-12 18:20       ` Alex Yeh (ayeh)
@ 2021-01-13 14:14         ` Kevin Traynor
  0 siblings, 0 replies; 10+ messages in thread
From: Kevin Traynor @ 2021-01-13 14:14 UTC (permalink / raw)
  To: Alex Yeh (ayeh), Stokes, Ian, dev
  Cc: Yegappan Lakshmanan (yega), Maxime Coquelin, Chenbo Xia

On 12/01/2021 18:20, Alex Yeh (ayeh) wrote:
> Hi Kevin, Stokes,
> 	Resending just to make sure the email is not lost.
> Thanks and looking forward to your suggestion,
> Alex
> 

+Cc vhost/virtio maintainers

Thanks for the report and for checking the newer versions. I think at this
stage you should log a report in https://bugs.dpdk.org and provide steps
for the vhost/virtio maintainers so they can reproduce this issue.


* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2021-11-26 14:09   ` Bendror, Eran (Nokia - US)
@ 2022-02-21 14:19     ` Marco Varlese
  0 siblings, 0 replies; 10+ messages in thread
From: Marco Varlese @ 2022-02-21 14:19 UTC (permalink / raw)
  To: Xia, Chenbo, ktraynor, maxime.coquelin, dev
  Cc: ayeh, Stokes, Ian, yega, Bendror, Eran (Nokia - US)

Hello,

I have been seeing the same issue with several different DPDK/OVS
versions as well as QEMU versions.

It looks like an issue with handling VHOST_USER_GET_VRING_BASE once
the application in the guest is restarted. It may have to do
with QEMU's asynchronous message passing...

I am not an expert on vhost/virtio, so I am asking for your help with
this. Has anybody had the chance to look into this issue and found a
solution or workaround?


Cheers,
Marco




^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2021-11-26  9:24 ` Xia, Chenbo
@ 2021-11-26 14:09   ` Bendror, Eran (Nokia - US)
  2022-02-21 14:19     ` Marco Varlese
  0 siblings, 1 reply; 10+ messages in thread
From: Bendror, Eran (Nokia - US) @ 2021-11-26 14:09 UTC (permalink / raw)
  To: Xia, Chenbo, ktraynor
  Cc: ayeh, dev, Stokes,  Ian, maxime.coquelin, yega, Marco Varlese

[-- Attachment #1: Type: text/plain, Size: 3729 bytes --]

Hi,

Inside the VM we are using DPDK 17.05 on CentOS 7.9, but this also seems to reproduce with DPDK 18.11 in the guest.

The issue appears when the DPDK PMDs are started in the guest, so our assumption is that the guest then presents bad or inaccessible memory to the host.

We did notice some misuse of SELinux permissions in the guest, and removing it reduced the frequency significantly.

Is there a way to map the shared memory between the VM and the host to see where the segmentation fault is coming from?
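One way to approach this question is through the vhost-user memory table: each region has a guest-physical start address, a size, and the address at which the back end (here ovs-vswitchd) mapped it. With that table (which may be obtainable via `rte_vhost_get_mem_table()`, or from the region details vhost logs when it receives the memory table, if your version logs them), the faulting address from the kernel log or gdb can be mapped back to a region and guest-physical address, and a descriptor address can be checked before it is dereferenced. The C sketch below assumes a hypothetical `mem_region` struct with just those three fields; it is not DPDK's `struct rte_vhost_mem_region`, and it ignores the IOVA/IOMMU translation that real vhost code also performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-table entry, loosely modelled on vhost-user regions. */
struct mem_region {
    uint64_t guest_phys_addr;  /* start of region in guest-physical space */
    uint64_t size;             /* region length in bytes */
    uint64_t host_addr;        /* where this process mapped the region */
};

/* Forward check: host address for [gpa, gpa + len), or 0 if the range
 * is not fully inside a single mapped region. Overflow-safe comparisons. */
static uint64_t gpa_to_host(const struct mem_region *regs, size_t nregs,
                            uint64_t gpa, uint64_t len)
{
    assert(regs != NULL || nregs == 0);
    for (size_t i = 0; i < nregs; i++) {
        const struct mem_region *r = &regs[i];
        if (gpa >= r->guest_phys_addr &&
            gpa - r->guest_phys_addr < r->size &&
            len <= r->size - (gpa - r->guest_phys_addr))
            return r->host_addr + (gpa - r->guest_phys_addr);
    }
    return 0;
}

/* Reverse lookup: which region (and guest-physical address) contains a
 * host address such as the one reported in "segfault at ...".
 * Returns 0 on success, -1 if the address is in no mapped region. */
static int host_to_gpa(const struct mem_region *regs, size_t nregs,
                       uint64_t host_addr, size_t *idx, uint64_t *gpa)
{
    for (size_t i = 0; i < nregs; i++) {
        const struct mem_region *r = &regs[i];
        if (host_addr >= r->host_addr && host_addr - r->host_addr < r->size) {
            *idx = i;
            *gpa = r->guest_phys_addr + (host_addr - r->host_addr);
            return 0;
        }
    }
    return -1;
}
```

If `host_to_gpa()` finds no region for the faulting address, the pointer did not come from the current memory table, which would support the theory that the host is following a stale or bogus descriptor after the guest application restarted. Note that a check like `gpa_to_host()` cannot help if the table itself is out of date.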

I will see if I can upload the VM XML, but it is a multi-queue, 4-port VM.

Thanks for the assistance,
Eran



[-- Attachment #2: Type: text/html, Size: 8656 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
  2021-11-17 14:42 Bendror, Eran (Nokia - US)
@ 2021-11-26  9:24 ` Xia, Chenbo
  2021-11-26 14:09   ` Bendror, Eran (Nokia - US)
  0 siblings, 1 reply; 10+ messages in thread
From: Xia, Chenbo @ 2021-11-26  9:24 UTC (permalink / raw)
  To: Bendror, Eran (Nokia - US), ktraynor
  Cc: ayeh, dev, Stokes,  Ian, maxime.coquelin, yega, Marco Varlese

[-- Attachment #1: Type: text/plain, Size: 2458 bytes --]

Hi,

Is it possible for you to provide more information about this issue, e.g. the QEMU command line or libvirt XML, the OVS command line, the guest driver version, etc.? Otherwise it is hard to reproduce the issue.

Thanks,
Chenbo



[-- Attachment #2: Type: text/html, Size: 45175 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] [ovs-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service
@ 2021-11-17 14:42 Bendror, Eran (Nokia - US)
  2021-11-26  9:24 ` Xia, Chenbo
  0 siblings, 1 reply; 10+ messages in thread
From: Bendror, Eran (Nokia - US) @ 2021-11-17 14:42 UTC (permalink / raw)
  To: ktraynor; +Cc: ayeh, chenbo.xia, dev, ian.stokes, maxime.coquelin, yega

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

Hello,

I am wondering if there has been any progress on this topic. We are seeing a very similar issue, where a VM-level application restart triggers a segmentation fault and mbuf allocation failures at the host level:

CentOS Linux release 7.8.2003 (Core)
dpdk-18.11.5-1.el7_8.x86_64
openvswitch-2.11.0-4.el7.x86_64
libvirt 4.5.0
QEMU 4.5.0 (API)
QEMU 2.12.0
3.10.0-1127.13.1.el7.x86_64

We get the same crash:

#0  0x00007f96cb72e7ee in rte_memcpy_generic () from /lib64/librte_vhost.so.4
#1  0x00007f96cb7350f2 in rte_vhost_dequeue_burst () from /lib64/librte_vhost.so.4
#2  0x00007f96caf97f03 in netdev_dpdk_vhost_rxq_recv () from /lib64/libopenvswitch-2.11.so.0
#3  0x00007f96caed21e6 in netdev_rxq_recv () from /lib64/libopenvswitch-2.11.so.0
#4  0x00007f96caea07ca in dp_netdev_process_rxq_port () from /lib64/libopenvswitch-2.11.so.0
#5  0x00007f96caea0ca5 in pmd_thread_main () from /lib64/libopenvswitch-2.11.so.0
#6  0x00007f96caf2da3f in ovsthread_wrapper () from /lib64/libopenvswitch-2.11.so.0
#7  0x00007f96c9ef3ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f96c94118dd in clone () from /lib64/libc.so.6

We have tried upgrading the host-level packages:

dpdk-20.11.3-1.el7.x86_64
openvswitch-2.16.1-1.el7.x86_64

The crash still occurs, with this backtrace:

#0  0x00007f6b8b49748c in virtio_dev_tx_split_legacy () from /lib64/librte_vhost.so.21
#1  0x00007f6b8b4c0fdb in rte_vhost_dequeue_burst () from /lib64/librte_vhost.so.21
#2  0x000055bd714c2802 in netdev_dpdk_vhost_rxq_recv ()
#3  0x000055bd713f8e51 in netdev_rxq_recv ()
#4  0x000055bd713c9d2a in dp_netdev_process_rxq_port ()
#5  0x000055bd713ca1f9 in pmd_thread_main ()
#6  0x000055bd71455cdf in ovsthread_wrapper ()
#7  0x00007f6b8a6a9ea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f6b89bc78dd in clone () from /lib64/libc.so.6

Regards,
Eran


[-- Attachment #2: Type: text/html, Size: 5002 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-02-21 14:19 UTC | newest]

Thread overview: 10+ messages
2020-11-19  1:44 [dpdk-dev] ovs-vswitchd with DPDK crashed when guest VM restarts network service Alex Yeh (ayeh)
2020-11-19 11:21 ` [dpdk-dev] [ovs-dev] " Stokes, Ian
2020-11-19 12:08   ` Kevin Traynor
2021-01-08 19:35     ` Alex Yeh (ayeh)
2021-01-12 18:20       ` Alex Yeh (ayeh)
2021-01-13 14:14         ` Kevin Traynor
2021-11-17 14:42 Bendror, Eran (Nokia - US)
2021-11-26  9:24 ` Xia, Chenbo
2021-11-26 14:09   ` Bendror, Eran (Nokia - US)
2022-02-21 14:19     ` Marco Varlese
