From: Sergey Matov
To: users@dpdk.org
Date: Mon, 27 Jun 2016 18:53:05 +0300
Subject: [dpdk-users] OpenVSwitch with DPDK causes pmd segfault

Hello dear community.

I've run into an unexpected segmentation fault while running Open vSwitch with DPDK under OpenStack. On Ubuntu 14.04 with the 3.13 kernel we use the Open vSwitch ML2 Neutron driver accelerated with DPDK. Booting a VM with 3 interfaces and vCPU pinning goes fine: the VM answers pings and SSH successfully.

Host configuration:

Hugepages:
HugePages_Total:   39552
HugePages_Free:    31360
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

NUMA topology:
root@node-1:~# lscpu | grep NUMA
NUMA node(s):          4
NUMA node0 CPU(s):     0-4,20-24
NUMA node1 CPU(s):     5-9,25-29
NUMA node2 CPU(s):     10-14,30-34
NUMA node3 CPU(s):     15-19,35-39

OpenStack release: stable/Mitaka
OVS version: 2.4.1 (slightly patched for DPDK 2.1 support)
DPDK: 2.1
QEMU: 2.3
Libvirt: 1.2.9.3
OVS coremask: 0x1
OVS PMD CPU mask: 0x308426

The guest is based on Ubuntu 14.04 with a 4.2 kernel and has 8 GB RAM, an 8 GB disk and 4 vCPUs. All pinned vCPUs belong to the same NUMA node, and we also make sure that every NUMA node has more than 12 GB of hugepages.

Brief guest memory/vCPU configuration (the libvirt XML markup did not survive the list's MIME filter):
8388608 8388608 4

The guest VM runs with virtio vNICs.
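For reference, this is roughly how we check the per-node hugepage reservation and how a PMD mask like the one above is normally applied (a sketch rather than our exact deployment commands; the sysfs path assumes 2 MB pages):

# per-NUMA-node count of reserved 2 MB hugepages
for n in /sys/devices/system/node/node*; do
    echo "$n: $(cat "$n"/hugepages/hugepages-2048kB/nr_hugepages)"
done

# PMD threads are pinned via the hex pmd-cpu-mask in the Open_vSwitch table
ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask=308426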
After the VM is set up successfully we run a DPDK application inside the guest (for example testpmd or pktgen), and it works for some time. But after a short while, or after restarting the DPDK application, the OVS DPDK-based ports (dpdk and vhost-user) go down with "Cannot allocate memory":

root@node-1:~# ovs-vsctl show
49af53c7-e8fc-46b2-b077-f5afd64302a1
    Bridge br-floating
        Port phy-br-floating
            Interface phy-br-floating
                type: patch
                options: {peer=int-br-floating}
        Port "p_ff798dba-0"
            Interface "p_ff798dba-0"
                type: internal
        Port br-floating
            Interface br-floating
                type: internal
    Bridge br-int
        fail_mode: secure
        Port "vhu775c67a4-1c"
            tag: 2
            Interface "vhu775c67a4-1c"
                type: dpdkvhostuser
                error: "could not open network device vhu775c67a4-1c (Cannot allocate memory)"
        Port "vhu50d83b4f-ed"
            tag: 4
            Interface "vhu50d83b4f-ed"
                type: dpdkvhostuser
                error: "could not open network device vhu50d83b4f-ed (Cannot allocate memory)"
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
        Port "fg-52a823ea-0f"
            tag: 1
            Interface "fg-52a823ea-0f"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "qr-96fc706d-60"
            tag: 2
            Interface "qr-96fc706d-60"
                type: internal
        Port int-br-floating
            Interface int-br-floating
                type: patch
                options: {peer=phy-br-floating}
        Port "vhu10b484ac-d9"
            tag: 3
            Interface "vhu10b484ac-d9"
                type: dpdkvhostuser
                error: "could not open network device vhu10b484ac-d9 (Cannot allocate memory)"
    Bridge br-prv
        Port br-prv
            Interface br-prv
                type: internal
        Port phy-br-prv
            Interface phy-br-prv
                type: patch
                options: {peer=int-br-prv}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (Cannot allocate memory)"
    ovs_version: "2.4.1"

dmesg shows the following error:

[11364.145636] pmd35[5499]: segfault at 8 ip 00007fe3abe4ea4e sp 00007fe325ffa7b0 error 4 in libdpdk.so[7fe3abcee000+1bb000]
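As an aside, the faulting offset inside libdpdk.so can be recovered from that dmesg line and resolved with addr2line, assuming the installed library still carries symbols (the library path below is only a placeholder):

# instruction pointer 0x7fe3abe4ea4e minus mapping base 0x7fe3abcee000 = offset 0x160a4e
addr2line -f -C -e /usr/lib/libdpdk.so 0x160a4e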
GDB backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8c56ffd700 (LWP 5650)]
__netdev_dpdk_vhost_send (netdev=0x7f8d1c803f00, pkts=pkts@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=<optimized out>) at lib/netdev-dpdk.c:1054
1054        while (!rte_vring_available_entries(virtio_dev, VIRTIO_RXQ)) {
(gdb) bt
#0  __netdev_dpdk_vhost_send (netdev=0x7f8d1c803f00, pkts=pkts@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=<optimized out>) at lib/netdev-dpdk.c:1054
#1  0x00000000005048f7 in netdev_dpdk_vhost_send (netdev=<optimized out>, qid=<optimized out>, pkts=0x7f8c56ff8fa0, cnt=32, may_steal=<optimized out>) at lib/netdev-dpdk.c:1196
#2  0x0000000000476950 in netdev_send (netdev=<optimized out>, qid=<optimized out>, buffers=buffers@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=may_steal@entry=true) at lib/netdev.c:740
#3  0x000000000045c28c in dp_execute_cb (aux_=aux_@entry=0x7f8c56ffc820, packets=packets@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, a=a@entry=0x7f8c9c009884, may_steal=true) at lib/dpif-netdev.c:3396
#4  0x000000000047c451 in odp_execute_actions (dp=dp@entry=0x7f8c56ffc820, packets=packets@entry=0x7f8c56ff8fa0, cnt=32, steal=steal@entry=true, actions=<optimized out>, actions_len=<optimized out>, dp_execute_action=dp_execute_action@entry=0x45c120 <dp_execute_cb>) at lib/odp-execute.c:518
#5  0x000000000045bcce in dp_netdev_execute_actions (actions_len=<optimized out>, actions=<optimized out>, may_steal=true, cnt=<optimized out>, packets=<optimized out>, pmd=0x1c12e00) at lib/dpif-netdev.c:3536
#6  packet_batch_execute (now=<optimized out>, pmd=<optimized out>, batch=0x7f8c56ff8f88) at lib/dpif-netdev.c:3084
#7  dp_netdev_input (pmd=pmd@entry=0x1c12e00, packets=packets@entry=0x7f8c56ffc920, cnt=<optimized out>) at lib/dpif-netdev.c:3320
#8  0x000000000045bf22 in dp_netdev_process_rxq_port (pmd=pmd@entry=0x1c12e00, port=0x17b8f50, rxq=<optimized out>) at lib/dpif-netdev.c:2513
#9  0x000000000045cf93 in pmd_thread_main (f_=0x1c12e00) at lib/dpif-netdev.c:2661
#10 0x00000000004aec34 in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:340
#11 0x00007f8d21e1b184 in start_thread (arg=0x7f8c56ffd700) at pthread_create.c:312
#12 0x00007f8d21b4837d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

OVS log error:

2016-06-27T15:32:31.235Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid 4628 died, killed (Segmentation fault), core dumped, restarting

This behavior is quite similar to https://bugzilla.redhat.com/show_bug.cgi?id=1293495. I can reproduce this scenario 100% of the time by running DPDK 2.1.0 with the latest pktgen, or with the in-tree testpmd, in the guest.

Can anyone clarify the issue? Is the vhost-user code in this particular DPDK version (2.1.0) known to suffer from this kind of problem? (I could not find an existing report.)

-- 
Best Regards
Sergey Matov
Mirantis Inc