From: Oleg Strikov
To: dev@dpdk.org
Date: Thu, 7 May 2015 19:22:02 +0300
Subject: [dpdk-dev] Issues met while running openvswitch/dpdk/virtio inside the VM

Hi DPDK users and developers,

A few weeks ago I came up with the idea of running openvswitch with the dpdk backend inside a qemu-kvm virtual machine.
I don't have enough supported NICs yet, so my plan was to start experimenting inside a virtualized environment, get all the components into a functional state and then switch to real hardware. An additional useful side effect of doing things inside the VM is that issues can be easily reproduced by someone else in a different environment. I (fondly) hoped that running openvswitch/dpdk inside the VM would be simpler than running the same set of components on real hardware. Unfortunately I ran into a bunch of issues on the way. All of them lie on the borderline between dpdk and openvswitch, but I think you might be interested in my story. Please note that I still don't have openvswitch/dpdk working inside the VM. I have definitely made some progress though.

Q: Does it sound okay from a functional (not performance) standpoint to run openvswitch/dpdk inside the VM? Do we want to be able to do this? Does anyone on the dpdk development team do this?

## Issue 1 ##

Openvswitch requires the backend pmd driver to provide N_CORES tx queues, where N_CORES is the number of cores available on the machine (openvswitch counts the cpu* entries inside the /sys/devices/system/node/node0/ folder). To my understanding it doesn't take into account the actual number of cores used by dpdk and just allocates a tx queue for each available core. You may refer to this chunk of code for details:
https://github.com/openvswitch/ovs/blob/master/lib/dpif-netdev.c#L1067

This approach works fine on real hardware but causes problems when we run openvswitch/dpdk inside a virtual machine. I tried both the emulated e1000 NIC and the virtio NIC, and neither of them worked out of the box. The emulated e1000 NIC doesn't support multiple tx queues at all (see http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_e1000/em_ethdev.c#n884) and the virtio NIC doesn't support multiple tx queues by default.
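To make the core counting concrete, here is a minimal sketch in C of counting cpu* entries in a directory. It is not the openvswitch code: the helper names is_cpu_entry() and count_cpu_entries() are hypothetical, and matching only "cpu" followed by digits is an assumption of this sketch (it skips sibling entries such as cpulist and cpumap).

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Returns 1 if name looks like "cpu<digits>" (e.g. "cpu0"), else 0.
 * Filtering on digits is an assumption of this sketch. */
static int is_cpu_entry(const char *name)
{
    assert(name != NULL);
    if (strncmp(name, "cpu", 3) != 0 || name[3] == '\0')
        return 0;
    for (const char *p = name + 3; *p != '\0'; p++) {
        if (!isdigit((unsigned char)*p))
            return 0;
    }
    return 1;
}

/* Counts cpu<N> entries in dir_path.
 * Returns -1 if the directory cannot be opened. */
static int count_cpu_entries(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    struct dirent *ent;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (is_cpu_entry(ent->d_name))
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_cpu_entries("/sys/devices/system/node/node0") inside the guest should give the N_CORES value described above. If that number is larger than the number of tx queues the NIC exposes, the queue setup is where I would expect things to go wrong.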
To enable multiple tx queues for the virtio NIC I had to add the following line to the interface section of my libvirt config: ''

## Issue 2 ##

Openvswitch calls rte_eth_tx_queue_setup() twice for the same port_id/queue_id. The first call takes place during device initialization (see the call to dpdk_eth_dev_init() inside netdev_dpdk_init(): https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L522). The second call takes place when openvswitch tries to add more tx queues to the device (see the call to dpdk_eth_dev_init() inside netdev_dpdk_set_multiq(): https://github.com/openvswitch/ovs/blob/master/lib/netdev-dpdk.c#L697). The second call not only initializes the new queues but also tries to re-initialize the existing ones. Unfortunately the virtio driver can't handle a second call to rte_eth_tx_queue_setup() and returns an error here:
http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_virtio/virtio_ethdev.c#n316

This happens because a memzone with the name portN_tvqN already exists when the second call takes place (the memzone was created during the first call). To deal with this issue I had to manually add an rte_memzone_lookup-based check for this situation and avoid allocating a new memzone if one already exists.

Q: Is it okay that openvswitch calls rte_eth_tx_queue_setup() twice? Right now I can't tell whether this is an issue with the virtio pmd driver or incorrect API usage by openvswitch. Could someone shed some light on this so I can move forward and maybe propose a fix?

## Issue 3 ##

This issue is also (somehow) related to the fact that openvswitch calls rte_eth_tx_queue_setup() twice. I fixed the previous issue with the method described above, and initialization now finishes. The whole machinery starts to work but crashes at the very beginning (maybe while fetching the first packet from the NIC).
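For reference, the lookup-before-reserve method mentioned above can be sketched as follows. This is a toy in-memory registry written for illustration, not the actual change to the virtio driver: find_zone() and reserve_zone() are hypothetical stand-ins for rte_memzone_lookup() and rte_memzone_reserve(), and struct zone is not the real rte_memzone layout. The only behaviour it assumes from DPDK is that reserving a name that already exists fails and that a lookup of an unknown name returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ZONES 16
#define ZONE_NAME_LEN 32

/* Toy replacement for a named memory zone (not the DPDK structure). */
struct zone {
    char name[ZONE_NAME_LEN];
    size_t len;
    int in_use;
};

static struct zone zones[MAX_ZONES];

/* Hypothetical stand-in for rte_memzone_lookup():
 * returns the zone with this name, or NULL if there is none. */
static struct zone *find_zone(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < MAX_ZONES; i++) {
        if (zones[i].in_use && strcmp(zones[i].name, name) == 0)
            return &zones[i];
    }
    return NULL;
}

/* Hypothetical stand-in for rte_memzone_reserve():
 * fails (returns NULL) if the name is already taken, too long,
 * or the registry is full. */
static struct zone *reserve_zone(const char *name, size_t len)
{
    assert(name != NULL);
    if (strlen(name) >= ZONE_NAME_LEN || find_zone(name) != NULL)
        return NULL;
    for (size_t i = 0; i < MAX_ZONES; i++) {
        if (!zones[i].in_use) {
            strcpy(zones[i].name, name);
            zones[i].len = len;
            zones[i].in_use = 1;
            return &zones[i];
        }
    }
    return NULL;
}

/* The workaround: reuse an existing zone instead of failing
 * when queue setup runs a second time for the same queue. */
static struct zone *lookup_or_reserve_zone(const char *name, size_t len)
{
    struct zone *z = find_zone(name);
    return z != NULL ? z : reserve_zone(name, len);
}
```

With this pattern a second setup call for "port0_tvq0" gets the same zone back instead of an error. Note that reusing the memory says nothing about whether the ring stored in it gets initialized again, which is what the rest of this issue is about.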
This crash happens here:
http://dpdk.org/browse/dpdk/tree/lib/librte_pmd_virtio/virtio_rxtx.c#n588

It takes place because the vq_ring structure contains zeros instead of correct values:

    vq_ring = {num = 0, desc = 0x0, avail = 0x0, used = 0x0}

My understanding is that vq_ring gets initialized by the first call to rte_eth_tx_queue_setup() and is then overwritten by the second call, which does not initialize it properly again. I'm trying to fix this issue right now.

Q: Does it sound like a realistic goal to make the virtio driver work in openvswitch-like scenarios? I'm definitely not an expert in the area of dpdk and can't estimate the time and resources required. Maybe it's better to wait until I get proper hardware?

Thanks for helping,
Oleg