From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 61F9EA0544;
	Thu, 13 Feb 2020 18:23:31 +0100 (CET)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 36F6B1BFE6;
	Thu, 13 Feb 2020 18:23:30 +0100 (CET)
Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com
 [207.211.31.120]) by dpdk.org (Postfix) with ESMTP id ECDC21BFC8
 for <dev@dpdk.org>; Thu, 13 Feb 2020 18:23:28 +0100 (CET)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1581614608;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=5ZMRVz/WdCTWZhSrZBtOojCPqykVcHGAI17WszWJS+8=;
 b=VR7hWvXHHwUirb3k+M9ovzT9MLC9BRpNB5esHycdWfUvz0i7q3woCxFkQWDtznpBLAXnsd
 XdPc1zMRDowDzTlUGio5gB03qztBqRfeiMwmIJsf4NkYZWkxysFHbWob28mKEWSS3bqFMH
 kgXuFbZrXeJ7qBrJYq79hHigcPKyHns=
Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com
 [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-105-ct_soC2HMr-QkKjI0riS7g-1; Thu, 13 Feb 2020 12:23:24 -0500
X-MC-Unique: ct_soC2HMr-QkKjI0riS7g-1
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com
 [10.5.11.12])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 545A6107ACCC;
 Thu, 13 Feb 2020 17:23:22 +0000 (UTC)
Received: from w520.home (ovpn-116-28.phx2.redhat.com [10.3.116.28])
 by smtp.corp.redhat.com (Postfix) with ESMTP id 7619D60BF1;
 Thu, 13 Feb 2020 17:23:21 +0000 (UTC)
Date: Thu, 13 Feb 2020 10:23:21 -0700
From: Alex Williamson <alex.williamson@redhat.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org, linux-pci@vger.kernel.org,
 linux-kernel@vger.kernel.org, dev@dpdk.org, mtosatti@redhat.com,
 thomas@monjalon.net, bluca@debian.org, jerinjacobk@gmail.com,
 bruce.richardson@intel.com
Message-ID: <20200213102321.0341db63@w520.home>
In-Reply-To: <20200213124654.76128d29.cohuck@redhat.com>
References: <158145472604.16827.15751375540102298130.stgit@gimli.home>
 <158146234273.16827.10591457733122265139.stgit@gimli.home>
 <20200213124654.76128d29.cohuck@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12
Subject: Re: [dpdk-dev] [PATCH 3/7] vfio/pci: Introduce VF token
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On Thu, 13 Feb 2020 12:46:54 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 11 Feb 2020 16:05:42 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
> > fully isolated from the PF.  The PF can always cause a denial of
> > service to the VF, if not access data passed through the VF directly.
> > This is why vfio-pci currently does not bind to PFs with SR-IOV enabled
> > and does not provide access itself to enabling SR-IOV on a PF.  The
> > IOMMU grouping mechanism might allow us a solution to this lack of
> > isolation, however the deficiency isn't actually in the DMA path, so
> > much as the potential cooperation between PF and VF devices.  Also,
> > if we were to force VFs into the same IOMMU group as the PF, we severely
> > limit the utility of having independent drivers managing PFs and VFs
> > with vfio.
> > 
> > Therefore we introduce the concept of a VF token.  The token is
> > implemented as a UUID and represents a shared secret which must be set
> > by the PF driver and used by the VF drivers in order to access a vfio
> > device file descriptor for the VF.  The ioctl to set the VF token will
> > be provided in a later commit, this commit implements the underlying
> > infrastructure.  The concept here is to augment the string the user
> > passes to match a device within a group in order to retrieve access to
> > the device descriptor.  For example, rather than passing only the PCI
> > device name (ex. "0000:03:00.0") the user would also pass a vf_token
> > UUID (ex. "2ab74924-c335-45f4-9b16-8569e5b08258").  The device match
> > string therefore becomes:
> > 
> > "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
> > 
> > This syntax is expected to be extensible to future options as well, with
> > the standard being:
> > 
> > "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"  
> 
> Is this designed to be an AND condition? (I read it as such.)

Not sure I understand, the device name is always required for
compatibility, then zero or more key/value pairs may be needed
depending on the vfio bus driver and device requirements.  I'm not
suggesting that the user would pass multiple vf_token= options and the
implementation here would only parse the first.  I'm really only
suggesting the parsing format we'd use for multiple options, I'm not
trying to dictate how a bus driver might make use of them.  Does that
make sense, should I change some wording?
 
> > 
> > The device name must be first and option=value pairs are separated by
> > spaces.
> > 
> > The vf_token option is only required for VFs where the PF device is
> > bound to vfio-pci.  There is no change for PFs using existing host
> > drivers.
> > 
> > Note that in order to protect existing VF users, not only is it required
> > to set a vf_token on the PF before VFs devices can be accessed, but also
> > if there are existing VF users, (re)opening the PF device must also
> > provide the current vf_token as authentication.  This is intended to
> > prevent a VF driver starting with a trusted PF driver and later being
> > replaced by an unknown driver.  A vf_token is not required to open the
> > PF device when none of the VF devices are in use by vfio-pci drivers.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > ---
> >  drivers/vfio/pci/vfio_pci.c         |  127 +++++++++++++++++++++++++++++++++++
> >  drivers/vfio/pci/vfio_pci_private.h |    8 ++
> >  2 files changed, 134 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 2ec6c31d0ab0..26aea9ac4863 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -466,6 +466,35 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> >  		vfio_pci_set_power_state(vdev, PCI_D3hot);
> >  }
> >  
> > +static struct pci_driver vfio_pci_driver;
> > +
> > +static void vfio_pci_vf_token_user_add(struct vfio_pci_device *vdev, int val)  
> 
> Suggestion: call this _user_modify(), and have _user_add() and
> _user_remove() as wrappers. That said, ...

I did consider something along these lines, but it seems overly
explicit for a helper that's used in two places with only two obvious
and discrete values.  If this were an exposed API, absolutely I'd agree.

> > +{
> > +	struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > +	struct vfio_device *pf_dev;
> > +	struct vfio_pci_device *pf_vdev;
> > +
> > +	if (!vdev->pdev->is_virtfn)
> > +		return;
> > +
> > +	pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > +	if (!pf_dev)
> > +		return;
> > +
> > +	if (pci_dev_driver(physfn) != &vfio_pci_driver) {
> > +		vfio_device_put(pf_dev);
> > +		return;
> > +	}
> > +
> > +	pf_vdev = vfio_device_data(pf_dev);
> > +
> > +	mutex_lock(&pf_vdev->vf_token->lock);
> > +	pf_vdev->vf_token->users += val;  
> 
> ...is this expected to always be >= 0? If yes, it looks like a bug if
> this is called with val==-n for n > users.

I'm not sure if you're inadvertently pointing out the bug in the
vfio_pci_open() path below where we increment token users before
vfio_pci_enable() which can fail, or your suggesting a WARN_ON here
should this go negative.  This is a static helper function, so I
generally don't try to sanitize the inputs like I would for an exposed
API.
 
> > +	mutex_unlock(&pf_vdev->vf_token->lock);
> > +
> > +	vfio_device_put(pf_dev);
> > +}
> > +
> >  static void vfio_pci_release(void *device_data)
> >  {
> >  	struct vfio_pci_device *vdev = device_data;
> > @@ -475,6 +504,7 @@ static void vfio_pci_release(void *device_data)
> >  	if (!(--vdev->refcnt)) {
> >  		vfio_spapr_pci_eeh_release(vdev->pdev);
> >  		vfio_pci_disable(vdev);
> > +		vfio_pci_vf_token_user_add(vdev, -1);
> >  	}
> >  
> >  	mutex_unlock(&vdev->reflck->lock);
> > @@ -493,6 +523,7 @@ static int vfio_pci_open(void *device_data)
> >  	mutex_lock(&vdev->reflck->lock);
> >  
> >  	if (!vdev->refcnt) {
> > +		vfio_pci_vf_token_user_add(vdev, 1);
> >  		ret = vfio_pci_enable(vdev);
> >  		if (ret)
> >  			goto error;

I think we want to include decrementing token users in the error path
here.

> > @@ -1278,11 +1309,86 @@ static void vfio_pci_request(void *device_data, unsigned int count)
> >  	mutex_unlock(&vdev->igate);
> >  }
> >  
> > +#define VF_TOKEN_ARG "vf_token="
> > +
> > +/* Called with vdev->vf_token->lock */
> > +static int vfio_pci_vf_token_match(struct vfio_pci_device *vdev, char *opts)
> > +{
> > +	char *token;
> > +	uuid_t uuid;
> > +	int ret;
> > +
> > +	if (!opts)
> > +		return -EINVAL;
> > +
> > +	token = strstr(opts, VF_TOKEN_ARG);
> > +	if (!token)
> > +		return -EINVAL;
> > +
> > +	token += strlen(VF_TOKEN_ARG);
> > +
> > +	ret = uuid_parse(token, &uuid);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (!uuid_equal(&uuid, &vdev->vf_token->uuid))
> > +		return -EACCES;
> > +
> > +	return 0;
> > +}
> > +
> >  static int vfio_pci_match(void *device_data, char *buf)
> >  {
> >  	struct vfio_pci_device *vdev = device_data;
> > +	char *opts;
> > +
> > +	opts = strchr(buf, ' ');
> > +	if (opts) {
> > +		*opts = 0;
> > +		opts++;
> > +	}
> > +
> > +	if (strcmp(pci_name(vdev->pdev), buf))
> > +		return 0; /* No match */  
> 
> Up to here, this function is fine; but below, it gets a bit hard to
> follow...
> 
> > +
> > +	if (vdev->pdev->is_virtfn) {
> > +		struct pci_dev *physfn = pci_physfn(vdev->pdev);
> > +		struct vfio_device *pf_dev;
> > +		int ret = 0;
> > +
> > +		pf_dev = vfio_device_get_from_dev(&physfn->dev);
> > +		if (pf_dev) {
> > +			if (pci_dev_driver(physfn) == &vfio_pci_driver) {
> > +				struct vfio_pci_device *pf_vdev =
> > +						vfio_device_data(pf_dev);
> > +
> > +				mutex_lock(&pf_vdev->vf_token->lock);
> > +				ret = vfio_pci_vf_token_match(pf_vdev, opts);
> > +				mutex_unlock(&pf_vdev->vf_token->lock);
> > +			}
> > +
> > +			vfio_device_put(pf_dev);
> > +
> > +			if (ret)
> > +				return -EACCES;
> > +		}
> > +	}  
> 
> If we are a VF, and the PF is bound to vfio, pass whatever stuff other
> than the device name that was passed in to an opaque match function.

vfio_pci_match() is an opaque match function relative to vfio.c, but
there's nothing opaque here.  We have a VF where the associated PF is
bound to vfio-pci, therefore we require that the additional options
include a vf_token matching the PF and we go looking to verify that.
 
> > -	return !strcmp(pci_name(vdev->pdev), buf);
> > +	if (vdev->vf_token) {
> > +		int ret = 0;
> > +
> > +		mutex_lock(&vdev->vf_token->lock);
> > +
> > +		if (vdev->vf_token->users)
> > +			ret = vfio_pci_vf_token_match(vdev, opts);
> > +
> > +		mutex_unlock(&vdev->vf_token->lock);
> > +
> > +		if (ret)
> > +			return -EACCES;
> > +	}  
> 
> If we have a VF token with users, pass whatever stuff other than the
> device name that was passed in to an opaque match function.

This description strays further off course a bit.  If we have a
vf_token then we are a PF and clearly bound to vfio-pci.  If there are
existing VF users then we require the user to provide a vf_token
matching the one currently on the device.

> What about the following instead:
> 
> - parse the passed-in string into device name and key/value pairs
> - maybe reject anything with an unknown key

This is definitely something that we should decided whether or not we
want to do it.  I think an argument for it is that a user can pick
arbitrary key=value options that would be ignored with this
implementation, but later might match a key that gets defined and then
we break them.  Misuse of the API on the part of the user, but maybe
better to just prevent it ahead of time.

> - check the device name
> - if we're a VF with the PF bound to vfio, require a VF token to be
>   specified and pass it to a token match function
> - if we have a VF token with users, require a VF token to be specified
>   and pass it to a token match function

This is essentially what we do above.  Maybe we just disagree about
whether we're calling an "opaque match function" versus a "token match
function".

> 
> My main gripes with the current code are:
> - key=value parsing is done in the match function for vf_token
> - it looks hard to extend the list of supported key/value pairs
> - I don't see a good way to find out (as the user) _why_ the VF isn't
>   matching

If we want to reject unknown options, then yes, the parsing should be
done ahead.  I don't see that it's hard to extend though, each new
requirement can follow the same methodology to check for it in the
remaining options string.

The last point is the one I brought up in the cover letter and where
I'm also not happy with the opaque error condition, but I have no
thoughts on how to resolve it.  Either we block the user from getting
the device file descriptor, and they're left scratching their heads
why, or we give them access to the file descriptor AND we need to
impose barriers to access mechanisms (ex. block read/write/mmap, limit
ioctls) AND we need to use VFIO_DEVICE_FEATURE and VFIO_DEVICE_GET_INFO
as a mechanism for the user to figure out that the device requires
"something" to get full access.  With the latter, I'm concerned whether
existing userspace code will fail in predictable ways and that it
snowballs into an ugly API.  For instance, if we add a flag to device
info to indicate it's locked, existing code won't know about that flag,
so we have to cripple device info to report no regions and no irqs to
make that code fail.  Then a user needs to know which feature to probe
for to figure out how the device is locked, then once they do we make
device info report real values?  It's maybe a little more deterministic
than blocking access to the file descriptor, but I'm not sure it's
worth it.  We could do a log-once if the match fails for token, but we
need to be careful not to provide an obvious point where the user could
spam the logs.  I've also considered if we could write an error back
into the user's buffer, but the ioctl isn't designed that way and we
don't know if we'd break how the user consumes that buffer later.

Ideas greatly welcomed in this space.  Thanks for the review!

Alex