From: Avi Kivity
To: "Michael S. Tsirkin"
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
Date: Thu, 1 Oct 2015 14:20:37 +0300
Message-ID: <560D1705.30300@scylladb.com>
In-Reply-To: <20151001135054-mutt-send-email-mst@redhat.com>
References: <560C0171.7080507@scylladb.com>
 <20150930204016.GA29975@redhat.com>
 <20151001113828-mutt-send-email-mst@redhat.com>
 <560CF44A.60102@scylladb.com>
 <20151001120027-mutt-send-email-mst@redhat.com>
 <560CFB66.5050904@scylladb.com>
 <20151001124211-mutt-send-email-mst@redhat.com>
 <560D0413.5080401@scylladb.com>
 <20151001131754-mutt-send-email-mst@redhat.com>
 <560D0FE2.7010905@scylladb.com>
 <20151001135054-mutt-send-email-mst@redhat.com>

On 10/01/2015 02:09 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 01:50:10PM +0300, Avi Kivity wrote:
>>>> It's not just the lack of system calls, of course, the architecture is
>>>> completely different.
>>> Absolutely - I'm not saying move all of DPDK into kernel.
>>> We just need to protect the RX rings so hardware does
>>> not corrupt kernel memory.
>>>
>>>
>>> Thinking about it some more, many devices
>>> have separate rings for DMA: TX (device reads memory)
>>> and RX (device writes memory).
>>> With such devices, a mode where userspace can write TX ring
>>> but not RX ring might make sense.
>> I'm sure you can cause havoc just by reading, if you read from I/O memory.
> Not talking about I/O memory here. These are device rings in RAM.

Right.
But you program them with DMA addresses, so the device can read another device's memory.

>>> This will mean userspace might read kernel memory
>>> through the device, but can not corrupt it.
>>>
>>> That's already a big win!
>>>
>>> And RX buffers do not have to be added one at a time.
>>> If we assume 0.2usec per system call, batching some 100 buffers per
>>> system call gives you 2 nano seconds overhead. That seems quite
>>> reasonable.
>> You're ignoring the page table walk
> Some caching strategy might work here.

It may, or it may not. I'm not against this. I'm against blocking users' access to their hardware, using an existing, established interface, for a small subset of setups. It doesn't help you in any way (you can still get reports of oopses due to buggy userspace drivers on physical machines, or on virtual machines that don't require interrupts), and it harms them.

>> and other per-descriptor processing.
> You probably can let userspace pre-format it all,
> just validate addresses.

You have to figure out if the descriptor contains an address or not (many devices have several descriptor formats, some with addresses and some without, which are intermixed). You also have to parse the descriptor size and see if it crosses a page boundary or not.

>
>> Again^2, maybe this can work. But it shouldn't block a patch enabling
>> interrupt support of VFs. After the ring proxy is available and proven for
>> a few years, we can deprecate bus mastering from uio, and after a few more
>> years remove it.
> We are talking about DPDK patches posted in June 2015. It's not some
> software proven for years.

dpdk has been used for years; it just won't work on VFs if you need interrupt support.

> If Linux keeps enabling hacks, no one will
> bother doing the right thing. Upstream inclusion is the only carrot
> Linux has to make people do the right thing.

It's not a carrot, it's a stick.
Implementing your scheme will take a huge effort, is not guaranteed to provide the performance needed, and will not be available for years. Meanwhile, exactly the same thing on physical machines is supported. People will just use out-of-tree drivers (dpdk has several already). It's a pain, but nowhere near what you are proposing.
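
For illustration only, here is a minimal C sketch of the two costs discussed above: the amortized system call overhead per buffer when batching, and the per-descriptor checks a ring proxy would need (does the descriptor carry an address at all, and does the buffer cross a page boundary). Everything in it is an assumption made for the example: the descriptor layout, the names `ring_desc`, `desc_ok`, `within_one_page` and `amortized_ns`, the 4096-byte page size, and the single contiguous user-owned DMA window standing in for a real page table walk or cache lookup. It is not taken from DPDK, uio, or any real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical descriptor kinds: only DESC_DATA carries a DMA address. */
enum desc_type { DESC_DATA = 0, DESC_CONTEXT = 1 };

/* Hypothetical descriptor layout; real devices define their own formats. */
struct ring_desc {
    uint64_t addr; /* DMA address, meaningful only for DESC_DATA */
    uint32_t len;  /* buffer length in bytes */
    uint32_t type; /* one of enum desc_type */
};

/* System call cost spread over a batch of buffers, in nanoseconds. */
static double amortized_ns(double syscall_ns, unsigned batch)
{
    assert(batch > 0);
    return syscall_ns / batch;
}

/* True if [addr, addr + len) lies entirely within one page. */
static bool within_one_page(uint64_t addr, uint32_t len)
{
    if (len == 0 || len > PAGE_SIZE)
        return false;
    if (addr > UINT64_MAX - len) /* reject ranges that wrap around */
        return false;
    return (addr / PAGE_SIZE) == ((addr + len - 1) / PAGE_SIZE);
}

/*
 * Validate one descriptor against a user-owned DMA window
 * [base, base + size). The window is a stand-in for whatever
 * lookup a real proxy would use to decide ownership.
 */
static bool desc_ok(const struct ring_desc *d, uint64_t base, uint64_t size)
{
    if (d->type == DESC_CONTEXT)
        return true;  /* no address to validate */
    if (d->type != DESC_DATA)
        return false; /* unknown format: refuse */
    if (!within_one_page(d->addr, d->len))
        return false;
    /* Range check written to avoid overflow. */
    return d->addr >= base && d->len <= size &&
           d->addr - base <= size - d->len;
}
```

With the figures quoted above, `amortized_ns(200.0, 100)` gives 2 ns per buffer for the system call alone. `desc_ok` would run once per descriptor on top of that, and even this simplified version needs a type dispatch, a bounds check, and a page-crossing check before any ownership lookup.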