From: Ilya Maximets <i.maximets@ovn.org>
To: Ilya Maximets, Stefan Hajnoczi
Cc: Maxime Coquelin, Chenbo Xia, dev@dpdk.org, Adrian Moreno,
 Julia Suvorova, Marc-André Lureau, Daniel Berrange
Date: Thu, 18 Mar 2021 21:14:27 +0100
Message-ID: <269ceb3d-3eda-ab5e-659d-e646a4c81957@ovn.org>
References: <20210317202530.4145673-1-i.maximets@ovn.org>
Subject: Re: [dpdk-dev] [RFC 0/4] SocketPair Broker support for vhost and virtio-user.

On 3/18/21 8:47 PM, Ilya Maximets wrote:
> On 3/18/21 6:52 PM, Stefan Hajnoczi wrote:
>> On Wed, Mar 17, 2021 at 09:25:26PM +0100, Ilya Maximets wrote:
>> Hi,
>> Some questions to understand the problems that SocketPair Broker solves:
>>
>>> Even more configuration tricks are required in order to share some
>>> sockets between different containers and not only with the host,
>>> e.g. to create service chains.
>>
>> How does SocketPair Broker solve this? I guess the idea is that
>> SocketPair Broker must be started before other containers. That way
>> applications don't need to sleep and reconnect when a socket isn't
>> available yet.
>>
>> On the other hand, the SocketPair Broker might be unavailable (OOM
>> killer, crash, etc.), so applications still need to sleep and reconnect
>> to the broker itself. I'm not sure the problem has actually been solved
>> unless there is a reason why the broker is always guaranteed to be
>> available?
>
> Hi, Stefan. Thanks for your feedback!
>
> The idea is to have the SocketPair Broker running right from the boot
> of the host. If it uses systemd socket-based service activation, the
> socket should persist while systemd is alive, IIUC. An OOM kill, crash
> or restart of the broker should not affect the existence of the socket,
> and systemd will spawn the service if it's not running for any reason,
> without losing incoming connections.
>
>>
>>> And some housekeeping is usually required for applications in case
>>> the socket server terminated abnormally and socket files were left
>>> on the file system:
>>> "failed to bind to vhu: Address already in use; remove it and try again"
>>
>> QEMU avoids this by unlinking before binding. The drawback is that users
>> might accidentally hijack an existing listen socket, but that can be
>> solved with a pidfile.
>
> How exactly could this be solved with a pidfile?
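
For reference, the pattern Stefan describes above would look roughly like
the sketch below.  This is only an illustration, not QEMU's actual code:
the function and path names are made up, error handling is minimal, and
it only helps if every application that may own the socket path follows
the same pidfile convention.

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Sketch: take over a vhost-user socket path, but only if the
     * pidfile of the previous owner is stale. */
    static int
    claim_socket(const char *sock_path, const char *pid_path)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        FILE *pf = fopen(pid_path, "r");
        int pid = 0;
        int fd;

        if (pf) {
            if (fscanf(pf, "%d", &pid) == 1 && pid > 0
                && (!kill(pid, 0) || errno == EPERM)) {
                fclose(pf);
                return -1;            /* Previous owner still alive. */
            }
            fclose(pf);
        }

        unlink(sock_path);            /* Stale or absent owner: reclaim. */

        fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        snprintf(addr.sun_path, sizeof addr.sun_path, "%s", sock_path);
        if (bind(fd, (struct sockaddr *) &addr, sizeof addr)
            || listen(fd, 16)) {
            close(fd);
            return -1;
        }

        pf = fopen(pid_path, "w");    /* Record ourselves as the owner. */
        if (pf) {
            fprintf(pf, "%d\n", getpid());
            fclose(pf);
        }
        return fd;
    }
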
> And what if this is a different application that tries to create a
> socket on the same path? E.g. QEMU creates a socket (started in server
> mode) and the user accidentally creates a dpdkvhostuser port in Open
> vSwitch instead of dpdkvhostuserclient. This way the rte_vhost library
> will try to bind to an existing socket file and will fail.
> Subsequently, port creation in OVS will fail. We can't allow OVS to
> unlink files, because this way OVS users would have the ability to
> unlink random sockets that OVS has access to, and we also have no idea
> whether it was QEMU that created the file, or a virtio-user
> application, or someone else.
> There are, probably, ways to detect if there is any alive process that
> has this socket open, but that sounds like too much for this purpose,
> and I'm also not sure if it's possible when the actual user is in a
> different container.
> So I don't see a good, reliable way to detect these conditions. It
> falls on the shoulders of higher-level management software or the user
> to clean these socket files up before adding ports.
>
>>
>>> Additionally, all applications (system and user's!) should follow
>>> naming conventions and place socket files in a particular location
>>> on the file system to make things work.
>>
>> Does SocketPair Broker solve this? Applications now need to use a naming
>> convention for keys, so it seems like this issue has not been
>> eliminated.
>
> The key is an arbitrary sequence of bytes, so it's hard to call it a
> naming convention. But they do need to know the keys, you're right.
> And to be careful I said "eliminates most of the inconveniences". :)
>
>>
>>> This patch-set aims to eliminate most of the inconveniences by
>>> leveraging an infrastructure service provided by a SocketPair Broker.
>>
>> I don't understand yet why this is useful for vhost-user, where the
>> creation of the vhost-user device backend and its use by a VMM are
>> closely managed by one piece of software:
>>
>> 1. Unlink the socket path.
>> 2. Create, bind, and listen on the socket path.
>> 3. Instantiate the vhost-user device backend (e.g. talk to DPDK/SPDK
>>    RPC, spawn a process, etc.) and pass in the listen fd.
>> 4. In the meantime the VMM can open the socket path and call connect(2).
>>    As soon as the vhost-user device backend calls accept(2) the
>>    connection will proceed (there is no need for sleeping).
>>
>> This approach works across containers without a broker.
>
> Not sure if I fully understood the question here, but anyway:
>
> This approach works fine if you know what application to run.
> In the case of a k8s cluster, it might be a random DPDK application
> with virtio-user ports running inside a container that wants to have a
> network connection. Also, this application needs to run virtio-user in
> server mode, otherwise a restart of OVS will require a restart of the
> application. So, you basically need to rely on a third-party
> application to create a socket with the right name and in the correct
> location that is shared with the host, so that OVS can find it and
> connect.
>
> In the VM world everything is much simpler, since you have libvirt and
> QEMU that take care of all of this and are also under the full control
> of the management software and the system administrator.
> In the case of a container with a "random" DPDK application inside,
> there is no such entity that can help.
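
Regarding the "ways to detect if there is any alive process that has
this socket open" mentioned above: the usual trick is a connect() probe,
roughly like the sketch below.  The function name is made up, the check
is inherently racy, and it still can't tell who the listener actually is
(QEMU, a virtio-user application, something else), so it doesn't really
change the conclusion.

    #include <errno.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Sketch: does a unix socket path still have a live listener?
     * Returns 1 if something accepted the connection, 0 if the file
     * looks stale (nobody listening), -1 on other errors. */
    static int
    socket_is_live(const char *path)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        int ret;

        if (fd < 0)
            return -1;
        snprintf(addr.sun_path, sizeof addr.sun_path, "%s", path);
        ret = connect(fd, (struct sockaddr *) &addr, sizeof addr);
        close(fd);

        if (ret == 0)
            return 1;                 /* Someone is listening.         */
        if (errno == ECONNREFUSED)
            return 0;                 /* Socket file exists, no owner. */
        return -1;                    /* ENOENT, EACCES, ...           */
    }
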
> Of course, some solution might be implemented in the docker/podman
> daemon to create and manage outside-facing sockets for an application
> inside the container, but that is not available today AFAIK, and I'm
> not sure if it ever will be.
>
>>
>> BTW what is the security model of the broker? Unlike pathname UNIX
>> domain sockets there is no ownership permission check.
>
> I thought about this. Yes, we should allow connection to this socket
> for a wide group of applications. That might be a problem.
> However, 2 applications need to know the (at most) 1024-byte key in
> order to connect to each other. This might be considered a sufficient
> security model as long as these keys are not predictable.
> Suggestions on how to make this more secure are welcome.

Digging more into unix sockets, I think that the broker might use
SO_PEERCRED to identify at least the uid and gid of a client. This way
we can implement policies, e.g. one client might request to be paired
only with clients from the same group or from the same user. This is
actually a great extension for the SocketPair Broker Protocol. It might
even use SO_PEERSEC to enforce stricter policies based on the SELinux
context.

>
> If it's really necessary to completely isolate some connections from
> other ones, one more broker could be started. But I'm not sure what
> that use case would be.
>
> The broker itself closes the socketpair on its side, so the connection
> between the 2 applications is direct and should be secure, as long as
> the kernel doesn't allow other system processes to intercept data on
> arbitrary unix sockets.
>
> Best regards, Ilya Maximets.
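
To make the SO_PEERCRED idea and the "broker closes the socketpair on
its side" part above a bit more concrete, here is roughly what I have in
mind on the broker side.  This is an untested sketch: function names are
made up and error handling is mostly omitted.

    #define _GNU_SOURCE             /* for struct ucred / SO_PEERCRED */
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    /* Who is on the other end of an accepted broker connection? */
    static int
    get_peer_creds(int client_fd, struct ucred *cred)
    {
        socklen_t len = sizeof *cred;

        /* uid/gid/pid of the connecting process; enough to implement
         * "pair me only with my own user/group" policies. */
        return getsockopt(client_fd, SOL_SOCKET, SO_PEERCRED, cred, &len);
    }

    /* Pass one end of a socketpair to a client over SCM_RIGHTS. */
    static int
    send_fd(int client_fd, int fd)
    {
        char byte = 0;
        struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
        union {
            struct cmsghdr align;
            char buf[CMSG_SPACE(sizeof(int))];
        } u;
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof u.buf,
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(client_fd, &msg, 0) < 0 ? -1 : 0;
    }

    /* Pair two clients that presented the same key (and passed whatever
     * SO_PEERCRED-based policy check we decide on). */
    static int
    pair_clients(int client_a, int client_b)
    {
        int sp[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp))
            return -1;
        send_fd(client_a, sp[0]);
        send_fd(client_b, sp[1]);
        close(sp[0]);   /* Broker keeps no end of the pair, so the two */
        close(sp[1]);   /* clients talk to each other directly.        */
        return 0;
    }

SO_PEERSEC would follow the same getsockopt() pattern, just returning
the peer's SELinux context as a string instead of a struct ucred.
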