From: James Yu
To: Stephen Hemminger
Cc: dev@dpdk.org
Date: Mon, 16 Dec 2013 15:35:27 -0800
Subject: Re: [dpdk-dev] outw() in virtio_ring_doorbell() in DPDK+virtio consume 40% of the CPU in oprofile

(A) The packets I am sending are 64 bytes, not big packets, so I am not sure GSO will help. For bigger packets it would.

(B) What do you mean by "multiple packets per send"?
Do you mean multi-queue support, so that multiple cores can send/receive in parallel to speed things up? Is that supported in DPDK 1.3.1r2?

(C) There are two call sites of dpdk_ring_doorbell() in virtio_user.c: eth_tx_burst(), and virtio_alloc_rxq(), which is called from virtio_recv_buf(). I looked at them further using "perf top -C 0"; outw() can occupy as much as 80% of logical core 0 on a CentOS 32-bit VM. Here is the implementation of outw() after gcc preprocessing (-E):

    static void outw(unsigned short int value, unsigned short int __port)
    {
        __asm__ __volatile__ ("outw %w0,%w1" : : "a" (value), "Nd" (__port));
    }

Is outw a blocking call? Based on this link, http://wiki.osdev.org/Inline_Assembly/Examples, I am not sure whether it blocks or waits. The question is what is causing it to spend so much time in the outw operation, and whether that is normal. If it is simply the I/O virtualization mapping guest port addresses to host port addresses, can it be improved by using VT-d? Or by using MMIO, as described below?

There are two components in vmxnet3: the userspace PMD code and the kernel driver. It actually uses MMIO to access the device memory:

    vmxnet3.ko:
        vmxnet3_alloc_pci_resources -> compat_pci_resource_start -> ioremap
    userspace PMD:
        vmxnet3_init_adapter:  adapter->hw_addr1 = (unsigned char *) mmap(...)

On the first fault on the virtual address, the vmxnet3 driver finds the mapped I/O region and caches it; subsequent accesses to that virtual address are faster. I wonder how virtio, using outw(), handles access to the I/O port address. Does it have to translate the guest I/O port address to the host physical port address on EVERY access? If so, could some improvement be made by using a model similar to vmxnet3's?

Thanks
James

On Fri, Dec 13, 2013 at 3:01 PM, Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Fri, 13 Dec 2013 14:04:35 -0800
> James Yu wrote:
>
> > Resending it due to missing [dpdk-dev] in the subject line.
> >
> > I am using Spirent to send 2 Gbps of traffic to a 10G port that is
> > looped back by l2fwd+DPDK+virtio in a CentOS 32-bit guest, and I receive
> > on the other port at only 700 Mbps. The CentOS 32-bit guest runs on a
> > Fedora 18 KVM host. The virtual interfaces are configured as virtio port
> > type, not e1000. vhost-net was automatically used in qemu-kvm when
> > virtio ports are used in the guest.
> >
> > The questions are:
> > A. Why can it only reach 2 Gbps?
> > B. Why is outw() using 40% of the entire measurement when it only
> > writes 2 bytes to the I/O port using the assembly outw instruction? Is
> > it a blocking call, or is the time spent mapping the I/O address of the
> > guest to the physical address of the I/O port on the host?
> > C. Is there any way to improve it?
> > D. The vmxnet PMD code uses memory-mapped I/O addresses, not port I/O
> > addresses. Would it be faster to use memory-mapped I/O?
> >
> > Any pointers or feedback will help.
> > Thanks
> >
> > James
>
> The outw is a VM exit to the hypervisor. It informs the hypervisor that
> data is ready to send, and the hypervisor runs then. To really get better
> performance, virtio needs to be able to do multiple packets per send. For
> bulk throughput GSO support would help, but that is a generic DPDK issue.
>
> Virtio uses I/O to signal the hypervisor (there is talk of using MMIO in
> later versions, but it won't be faster).