From: Avi Kivity
To: "Michael S. Tsirkin"
Cc: "dev@dpdk.org"
Date: Wed, 30 Sep 2015 18:36:17 +0300
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
Message-ID: <560C0171.7080507@scylladb.com>
In-Reply-To: <20150930175848-mutt-send-email-mst@redhat.com>

On 09/30/2015 06:21 PM, Michael S. Tsirkin wrote:
> On Wed, Sep 30, 2015 at 05:53:54PM +0300, Avi Kivity wrote:
>> On 09/30/2015 05:39 PM, Michael S. Tsirkin wrote:
>>> On Wed, Sep 30, 2015 at 04:05:40PM +0300, Avi Kivity wrote:
>>>> On 09/30/2015 03:27 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Sep 30, 2015 at 03:16:04PM +0300, Vlad Zolotarov wrote:
>>>>>> On 09/30/15 15:03, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Sep 30, 2015 at 02:53:19PM +0300, Vlad Zolotarov wrote:
>>>>>>>> On 09/30/15 14:41, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Sep 30, 2015 at 02:26:01PM +0300, Vlad Zolotarov wrote:
>>>>>>>>>> The whole idea is to bypass the kernel. Especially for networking...
>>>>>>>>> ... on dumb hardware that doesn't support doing that securely.
>>>>>>>> On very capable HW that supports whatever security requirements
>>>>>>>> are needed (e.g. Intel's 82599 SR-IOV VF devices).
>>>>>>> Network card type is irrelevant as long as you do not have an IOMMU,
>>>>>>> otherwise you would just use e.g. VFIO.
>>>>>> Sorry, but I don't follow your logic here - the Amazon EC2 environment
>>>>>> is an example where there *is* an iommu but it's not virtualized, and
>>>>>> thus VFIO is useless. There is an option to use a directly assigned
>>>>>> SR-IOV networking device there, where using the kernel drivers imposes
>>>>>> a performance impact compared to the UIO-based user space kernel
>>>>>> bypass mode of usage. How is it irrelevant? Could u, pls, clarify your
>>>>>> point?
>>>>>>
>>>>> So it's not even dumb hardware, it's another piece of software
>>>>> that forces an "all or nothing" approach where either the
>>>>> device has access to all VM memory, or none.
>>>>> And this, unfortunately, leaves you with no secure way to
>>>>> allow userspace drivers.
>>>> Some setups don't need security (they are single-user, single-application),
>>>> but do need a lot of performance (like 5X-10X performance). An example is
>>>> OpenVSwitch: security doesn't help it at all, and if you force it to use
>>>> the kernel drivers you cripple it.
>>> We'd have to see there are actual users that need this. So far, dpdk
>>> seems like the only one,
>> dpdk is a whole class of users. It's not a specific application.
>>
>>> and it wants to use UIO for slow path stuff
>>> like polling link status. Why this needs kernel bypass support, I don't
>>> know. I asked, and got no answer.
>> First, it's more than link status. dpdk also has an interrupt mode, which
>> applications can fall back to when the load is light in order to save
>> power (and in order not to get support calls about 100% cpu when idle).
> Aha, looks like it appeared in June. Interesting, thanks for the info.
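The interrupt-mode fallback mentioned above (busy-poll under load, sleep on an interrupt when idle) can be sketched in C. This is a hypothetical illustration of the general idea, not dpdk's implementation: `idle_step()`, `uio_wait_irq()` and the `IDLE_POLLS_BEFORE_SLEEP` threshold are made-up names. The sketch assumes a UIO device file descriptor, such as an open `/dev/uio0`. On that descriptor, `read()` of a 32-bit integer blocks until the next interrupt and returns the total interrupt count. Re-arming the device interrupt before sleeping depends on the driver and device and is left out.

```c
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical tuning knob: consecutive empty polls before sleeping. */
#define IDLE_POLLS_BEFORE_SLEEP 1000u

/* Block on a UIO device fd until an interrupt arrives.
 * UIO's read() yields a 32-bit total interrupt count.
 * Returns that count, or -1 on error or short read. */
static int64_t uio_wait_irq(int fd)
{
    int32_t count;
    ssize_t n;

    do {
        n = read(fd, &count, sizeof count);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    return n == (ssize_t)sizeof count ? (int64_t)count : -1;
}

/* Decide between busy-polling and sleeping after one receive burst.
 * Returns the updated idle counter; sets *sleep_now to 1 once the
 * threshold of consecutive empty polls is reached. */
static unsigned idle_step(unsigned idle, unsigned rx_packets, int *sleep_now)
{
    *sleep_now = 0;
    if (rx_packets > 0)
        return 0;                /* traffic seen: keep polling */
    if (++idle >= IDLE_POLLS_BEFORE_SLEEP) {
        *sleep_now = 1;          /* light load: fall back to interrupts */
        return 0;                /* start counting afresh after waking */
    }
    return idle;
}
```

A polling loop would call `idle_step()` after every receive burst and call `uio_wait_irq()` only when `*sleep_now` is set. The core then spins while traffic is flowing and sleeps in the kernel when the link is idle. The threshold trades wake-up latency against power.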
>
>> Even for link status, you don't want to poll for that, because accessing
>> device registers is expensive. An interrupt is the best approach for rare
>> events like link changes.
> Yea, but you probably can get by with a timer for that, even if it's ugly.

Maybe you can, but (a) why increase link status change detection latency?
(b) link status change detection is not the only user of the feature,
since June.

>>>> Also, I'm root. I can do anything I like, including loading a patched
>>>> uio_pci_generic. You're not providing _any_ security, you're simply making
>>>> life harder for users.
>>> Maybe that's true on your system. But I guess you know that's not true
>>> for everyone, not in 2015.
>> Why is it not true? If I'm root, I can do anything I like to my
>> system, and everyone is root in 2015. I can access the BARs directly
>> and program DMA; how am I more secure by uio not allowing me to set up
>> msix?
> That's not the point. The point always was that using uio for these
> devices (capable of DMA, in particular of msix) isn't possible in a
> secure way.

uio is used today for DMA-capable devices. Some users are perfectly
willing to give up security for functionality (that's all users who have
root access to their machines, not just uio users). You aren't adding any
security by disallowing uio; you're just removing functionality.

As it happens, you're removing the functionality from the users who have
no other option. They can't use vfio because it doesn't work on
virtualized setups. (Note that even on a setup that does support vfio,
high-performance users will want to avoid it.)

> And yes, if the same device happens to also do interrupts, UIO
> does not reject it as it probably should, and we can't change this
> without breaking some working setups. But this doesn't mean we should
> add more setups like this that we'll then be forced to maintain.

uio_pci_generic is maybe the driver with the lowest maintenance burden in
the entire kernel.
One driver supporting all pci devices, if you don't need msi/msix. And
with the patch, it will be one driver supporting all pci devices. I don't
really understand the tradeoff. By rejecting the patch you're denying
users the ability to use their devices, except through the much slower
kernel drivers.

The patch would not allow a non-root user to do ANYTHING. Root can
already do anything. So what security issue is there?

>
>
>> Non-root users are already secured by their inability to load the module,
>> and by the device permissions.
>>
>>>>> So it makes even less sense to add insecure work-arounds in the kernel.
>>>>> It seems quite likely that by the time the new kernel reaches
>>>>> production X years from now, EC2 will have a virtual iommu.
>>>> I can adopt a new kernel tomorrow. I have no influence on EC2.
>>>>
>>>>
>>> Xen grant tables sound like they could be the right interface
>>> for EC2. A Google search for "grant tables iommu" immediately gives me:
>>> http://lists.xenproject.org/archives/html/xen-devel/2014-04/msg00963.html
>>> Maybe latest Xen is already doing the right thing, and it's just the
>>> question of making VFIO use that.
>>>
>> Grant tables only work for virtual devices, not physical devices.
> Why not? That's what the patches above seem to do.
>

Oh, I think those are for emulating transient iommu maps (a new map for
every request) on top of a real iommu. The dpdk use case is permanently
mapping a large chunk of guest userspace; I don't think Xen exposes
enough grant table entries for that.

In addition, that leaves users of kvm, vmware, older Xen, or bare metal
machines without iommus out in the cold; and bare metal users that want
the iommu off for performance are forced to use it. And for what? To
prevent root from touching memory via dma that it can access in a million
other ways?
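Here is a rough count of the grant table entries needed to permanently map "a large chunk of guest userspace". Each entry grants a single page, so the count grows linearly with the mapped size. This is a sketch under stated assumptions. It takes a 4 KiB grant granularity (the x86 page size), and `grant_entries_needed()` is a hypothetical helper. It does not assume any particular Xen grant table limit, since that depends on the Xen version and configuration.

```c
#include <stdint.h>

/* Assumption: each grant entry covers one 4 KiB page. */
#define GRANT_PAGE_SIZE 4096u

/* Grant entries needed to map `bytes` permanently, rounded up to whole
 * pages. Written as quotient + remainder check to avoid overflow. */
static uint64_t grant_entries_needed(uint64_t bytes)
{
    return bytes / GRANT_PAGE_SIZE + (bytes % GRANT_PAGE_SIZE != 0);
}
```

Under these assumptions, a single 2 MiB hugepage needs 512 entries, and a 1 GiB region (`grant_entries_needed(1ull << 30)`) needs 262,144 entries, all held for the lifetime of the application rather than per request.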