From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 64EDFA0561;
	Thu, 18 Mar 2021 20:47:16 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 491E3140ECE;
	Thu, 18 Mar 2021 20:47:16 +0100 (CET)
Received: from relay3-d.mail.gandi.net (relay3-d.mail.gandi.net
 [217.70.183.195])
 by mails.dpdk.org (Postfix) with ESMTP id 002BF40698
 for <dev@dpdk.org>; Thu, 18 Mar 2021 20:47:14 +0100 (CET)
X-Originating-IP: 78.45.89.65
Received: from [192.168.1.23] (ip-78-45-89-65.net.upcbroadband.cz
 [78.45.89.65]) (Authenticated sender: i.maximets@ovn.org)
 by relay3-d.mail.gandi.net (Postfix) with ESMTPSA id 43C0B60007;
 Thu, 18 Mar 2021 19:47:12 +0000 (UTC)
To: Stefan Hajnoczi <stefanha@redhat.com>, Ilya Maximets <i.maximets@ovn.org>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>,
 Chenbo Xia <chenbo.xia@intel.com>, dev@dpdk.org,
 Adrian Moreno <amorenoz@redhat.com>, Julia Suvorova <jusual@redhat.com>,
 =?UTF-8?Q?Marc-Andr=c3=a9_Lureau?= <marcandre.lureau@redhat.com>,
 Daniel Berrange <berrange@redhat.com>
References: <20210317202530.4145673-1-i.maximets@ovn.org>
 <YFOTU0M50y5GlF25@stefanha-x1.localdomain>
From: Ilya Maximets <i.maximets@ovn.org>
Message-ID: <eeea4d9f-e600-9b4d-58f3-f8ced9485854@ovn.org>
Date: Thu, 18 Mar 2021 20:47:12 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.8.0
MIME-Version: 1.0
In-Reply-To: <YFOTU0M50y5GlF25@stefanha-x1.localdomain>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Subject: Re: [dpdk-dev] [RFC 0/4] SocketPair Broker support for vhost and
 virtio-user.
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 3/18/21 6:52 PM, Stefan Hajnoczi wrote:
> On Wed, Mar 17, 2021 at 09:25:26PM +0100, Ilya Maximets wrote:
> Hi,
> Some questions to understand the problems that SocketPair Broker solves:
> 
>> Even more configuration tricks required in order to share some sockets
>> between different containers and not only with the host, e.g. to
>> create service chains.
> 
> How does SocketPair Broker solve this? I guess the idea is that
> SocketPair Broker must be started before other containers. That way
> applications don't need to sleep and reconnect when a socket isn't
> available yet.
> 
> On the other hand, the SocketPair Broker might be unavailable (OOM
> killer, crash, etc), so applications still need to sleep and reconnect
> to the broker itself. I'm not sure the problem has actually been solved
> unless there is a reason why the broker is always guaranteed to be
> available?

Hi, Stefan.  Thanks for your feedback!

The idea is to have the SocketPair Broker running right from the
boot of the host.  If it uses systemd socket-based service
activation, the socket should persist while systemd is alive, IIUC.
An OOM kill, crash or restart of the broker should not affect the
existence of the socket, and systemd will spawn the service if it's
not running for any reason, without losing incoming connections.
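
A minimal sketch of the socket-activation handoff, for illustration
only.  It follows the sd_listen_fds(3) convention (LISTEN_PID,
LISTEN_FDS, first passed fd is 3).  The function names and
fallback_path are hypothetical, and this is not taken from the broker
sources:

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first fd passed by systemd, per sd_listen_fds(3)


def inherited_fds(environ, pid):
    """Return the fd numbers passed via socket activation, or []."""
    # systemd sets LISTEN_PID to the pid the fds are meant for.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    count = max(count, 0)
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def broker_listen_socket(fallback_path):
    """Use the socket systemd already holds open, else bind our own."""
    fds = inherited_fds(os.environ, os.getpid())
    if fds:
        # The socket outlives this process: systemd keeps its own copy,
        # so pending connections survive a broker crash or restart.
        return socket.socket(fileno=fds[0])
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(fallback_path)  # hypothetical path, only without systemd
    sock.listen()
    return sock
```

With this scheme clients never see the socket disappear; they only
see a delay while the service is being (re)started.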

> 
>> And some housekeeping usually required for applications in case the
>> socket server terminated abnormally and socket files left on a file
>> system:
>>  "failed to bind to vhu: Address already in use; remove it and try again"
> 
> QEMU avoids this by unlinking before binding. The drawback is that users
> might accidentally hijack an existing listen socket, but that can be
> solved with a pidfile.

How exactly could this be solved with a pidfile?  And what if it is
a different application that tries to create a socket on the same path?
E.g. QEMU creates a socket (started in server mode) and the user
accidentally creates a dpdkvhostuser port in Open vSwitch instead of
dpdkvhostuserclient.  In that case the rte_vhost library will try to
bind to an existing socket file and will fail.  Subsequently, port
creation in OVS will fail.  We can't allow OVS to unlink files, because
then OVS users would be able to unlink any socket that OVS has access
to, and we also have no idea whether it was QEMU that created the
file, a virtio-user application, or someone else.
There are probably ways to detect whether any live process has this
socket open, but that sounds like too much for this purpose, and I'm
not sure it's possible if the actual user is in a different
container.
So I don't see a good, reliable way to detect these conditions.  It
falls on the shoulders of higher-level management software or the
user to clean these socket files up before adding ports.
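
The failure mode described above can be reproduced with plain UNIX
sockets.  This is only an illustrative sketch: the file name "vhu" and
the function name are made up, and no DPDK or OVS code is involved:

```python
import errno
import os
import socket
import tempfile


def bind_twice():
    """Bind, 'crash' without unlinking, then try to bind again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vhu")

        first = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        first.bind(path)
        first.close()  # simulates abnormal exit: nobody unlinks the path
        leftover = os.path.exists(path)

        second = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            second.bind(path)
            err = None
        except OSError as e:
            err = e.errno  # the stale file alone makes bind(2) fail
        finally:
            second.close()
        return leftover, err
```

The second bind fails even though no process is listening any more,
and the caller has no way to tell a stale file from a live server by
looking at the error alone.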

> 
>> Additionally, all applications (system and user's!) should follow
>> naming conventions and place socket files in particular location on a
>> file system to make things work.
> 
> Does SocketPair Broker solve this? Applications now need to use a naming
> convention for keys, so it seems like this issue has not been
> eliminated.

A key is an arbitrary sequence of bytes, so it's hard to call it a
naming convention.  But applications need to know the keys, you're
right.  And to be careful I said "eliminates most of the
inconveniences". :)

> 
>> This patch-set aims to eliminate most of the inconveniences by
>> leveraging an infrastructure service provided by a SocketPair Broker.
> 
> I don't understand yet why this is useful for vhost-user, where the
> creation of the vhost-user device backend and its use by a VMM are
> closely managed by one piece of software:
> 
> 1. Unlink the socket path.
> 2. Create, bind, and listen on the socket path.
> 3. Instantiate the vhost-user device backend (e.g. talk to DPDK/SPDK
>    RPC, spawn a process, etc) and pass in the listen fd.
> 4. In the meantime the VMM can open the socket path and call connect(2).
>    As soon as the vhost-user device backend calls accept(2) the
>    connection will proceed (there is no need for sleeping).
> 
> This approach works across containers without a broker.
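
For reference, steps 1-4 quoted above can be sketched as follows.
Everything runs in one process here, the path name is hypothetical,
and step 3 (handing the listen fd to a backend) is only marked by a
comment:

```python
import os
import socket
import tempfile


def connect_before_accept():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vhost.sock")

        # 1. Unlink the socket path (ignore it if nothing is there).
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass

        # 2. Create, bind, and listen on the socket path.
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        listener.listen()

        # 3. A manager would pass listener.fileno() to the backend here.

        # 4. The client connects before anyone has called accept();
        #    the connection waits in the listen backlog.
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)
        client.sendall(b"ping")

        conn, _ = listener.accept()
        data = conn.recv(4)

        for s in (conn, client, listener):
            s.close()
        return data
```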

Not sure if I fully understood the question here, but anyway.

This approach works fine if you know what application to run.
In the case of a k8s cluster, it might be a random DPDK application
with virtio-user ports that runs inside a container and wants to
have a network connection.  Also, this application needs to run
virtio-user in server mode, otherwise a restart of OVS will
require a restart of the application.  So, you basically need to
rely on a third-party application to create a socket with the right
name and in the correct location that is shared with the host, so
OVS can find it and connect.

In the VM world everything is much simpler, since you have
libvirt and QEMU that take care of all of this and which are
also under full control of the management software and the
system administrator.
In the case of a container with a "random" DPDK application inside,
there is no such entity that can help.  Of course, some solution
might be implemented in the docker/podman daemon to create and manage
outward-facing sockets for an application inside the container,
but that is not available today AFAIK and I'm not sure if it
ever will be.

> 
> BTW what is the security model of the broker? Unlike pathname UNIX
> domain sockets there is no ownership permission check.

I thought about this.  Yes, we need to allow a wide group of
applications to connect to this socket.  That might be a problem.
However, 2 applications need to know the same key (1024 bytes at
most) in order to connect to each other.  This might be considered a
sufficient security model as long as these keys are not predictable.
Suggestions on how to make this more secure are welcome.
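
One way to get unpredictable keys, as a sketch only: draw them from
the OS CSPRNG.  The helper name and the 32-byte default are
assumptions; only the 1024-byte upper bound comes from this thread:

```python
import secrets

MAX_KEY_LEN = 1024  # upper bound on key size mentioned above


def new_broker_key(nbytes=32):
    """Return a random key that is infeasible to guess."""
    if not 0 < nbytes <= MAX_KEY_LEN:
        raise ValueError("key length must be in 1..%d" % MAX_KEY_LEN)
    return secrets.token_bytes(nbytes)
```

Both sides of a connection would still need to receive the same key
over some channel that other users of the broker can't read.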

If it's really necessary to completely isolate some connections
from other ones, one more broker could be started.  But I'm not
sure what the use case for that would be.

The broker itself closes the socketpair on its side, so the connection
between the 2 applications is direct and should be secure as long as
the kernel doesn't allow other processes in the system to intercept
data on arbitrary unix sockets.
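
A rough sketch of that pairing step, assuming the fds are handed over
with SCM_RIGHTS (socket.send_fds/recv_fds, Python 3.9+).  The Broker
class, the one-byte message and the key value are hypothetical; the
real broker protocol is not shown here:

```python
import socket


def pair_clients(ctrl_a, ctrl_b):
    """Give each client one end of a fresh socketpair."""
    end_a, end_b = socket.socketpair()
    try:
        socket.send_fds(ctrl_a, [b"P"], [end_a.fileno()])
        socket.send_fds(ctrl_b, [b"P"], [end_b.fileno()])
    finally:
        # The broker drops its copies, so it is out of the data path.
        end_a.close()
        end_b.close()


class Broker:
    def __init__(self):
        self.waiting = {}  # key -> control socket of the first client

    def register(self, key, ctrl):
        """Pair ctrl with a waiting client that used the same key."""
        peer = self.waiting.pop(key, None)
        if peer is None:
            self.waiting[key] = ctrl
            return False
        pair_clients(peer, ctrl)
        return True


def receive_end(ctrl):
    """Client side: turn the received fd into a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(ctrl, 1, 1)
    return socket.socket(fileno=fds[0])


def demo():
    broker = Broker()
    srv_a, app_a = socket.socketpair()  # control channels to the broker
    srv_b, app_b = socket.socketpair()
    key = b"example-key"
    first = broker.register(key, srv_a)   # waits for a peer
    second = broker.register(key, srv_b)  # same key: pair them
    srv_a.close()
    srv_b.close()

    peer_a = receive_end(app_a)
    peer_b = receive_end(app_b)
    peer_a.sendall(b"hello")  # goes straight to the other application
    data = peer_b.recv(5)
    for s in (peer_a, peer_b, app_a, app_b):
        s.close()
    return first, second, data
```

After the handoff the two applications talk over an ordinary
socketpair and the broker holds no fd for it.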

Best regards, Ilya Maximets.