From: Alexander Duyck
To: "Michael S. Tsirkin", Bruce Richardson
Cc: "dev@dpdk.org", Avi Kivity
Date: Thu, 1 Oct 2015 14:02:24 -0700
Message-ID: <560D9F60.6040907@gmail.com>
In-Reply-To: <20151001155943-mutt-send-email-mst@redhat.com>
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance

On 10/01/2015 06:14 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 01:07:13PM +0100, Bruce Richardson wrote:
>>>> This in itself is going to use up
>>>> a good proportion of the processing time, as well as that we have to spend cycles
>>>> copying the descriptors from one ring in memory to another. Given that right now
>>>> with the vector ixgbe driver, the cycle cost per packet of RX is just a few dozen
>>>> cycles on modern cores, every additional cycle (fraction of a nanosecond) has
>>>> an impact.
>>>>
>>>> Regards,
>>>> /Bruce
>>> See above. There is no need for that on data path. Only re-adding
>>> buffers requires a system call.
>>>
>> Re-adding buffers is a key part of the data path!
>> Ok, the fact that it's only on
>> descriptor rearm does allow somewhat bigger batches,
> That was the point, yes.
>
>> but the whole point of having
>> the kernel do this extra work you propose is to allow the kernel to scan and
>> sanitize the physical addresses - and that will take a lot of cycles, especially
>> if it has to handle all the different descriptor formats of all the different NICs,
>> as has already been pointed out.
>>
>> /Bruce
> Well the driver would be per NIC, so there's only need to support
> specific formats supported by a given NIC.

One thing that seems to be overlooked in your discussion is the cost to translate these descriptors. It isn't as if most systems running DPDK have the cycles to spare. As I believe was brought up in another thread, we are looking at a budget of something like 68ns per packet at 10Gbps line rate (a minimum-size 64-byte frame plus 20 bytes of preamble and inter-frame gap is 672 bits on the wire, or 67.2ns at 10Gbps). The overhead of having to go through and translate/parse/validate the descriptors would end up being pretty significant. If you need proof of that, just try running the ixgbe driver and routing small packets. We end up spending something like 40ns in ixgbe_clean_rx_irq, and that is mostly just translating the descriptor bits into the correct sk_buff bits. Also, trying to maintain a user-space ring in addition to the kernel-space ring means that much more memory overhead and increases the likelihood of things getting pushed out of the L1 cache.

As far as the descriptor validation itself, the overhead for that would guarantee that you cannot get any performance out of the device. There are too many corner cases that would have to be addressed in validating user-space input to allow us to process packets in any sort of timely fashion. For starters, we would have to validate the size, alignment, and ownership of a given buffer. If it is a transmit buffer, we would also have to go through and validate any offloads being requested.
Likely just the validation and translation would add tens if not hundreds of nanoseconds to the time needed to process each packet. In addition, we are talking about doing this in kernel space, which means we wouldn't really be able to take advantage of things like SSE or AVX instructions.

> An alternative is to format the descriptors in kernel, based
> on just the list of addresses. This seems cleaner, but I don't
> know how efficient it would be.
>
> Device vendors and dpdk developers are probably the best people to
> figure out what's the best thing to do here.

As far as the bifurcated driver approach goes, the only way something like that would ever work is if you could limit the access via an IOMMU. At least everything I have seen proposed for a bifurcated driver still involved one if they expected to get any performance.

> But it looks like it's not going to happen unless security is made
> a requirement for upstreaming code.

The fact is we already ship uio_pci_generic. User space drivers are here to stay. What is being asked for is an extension to the existing infrastructure to allow MSI-X interrupts to trigger an event on a file descriptor. As far as I know that doesn't add any additional security risk: since it is the kernel PCIe subsystem itself that would be programming the address and data for said device, it wouldn't actually grant any more access other than the additional file descriptors to support MSI-X vectors.

Anyway that is just my $.02 on this.

- Alex