Date: Tue, 30 Mar 2021 16:01:14 +0100
From: Stefan Hajnoczi
To: Ilya Maximets
Cc: Billy McFall, Adrian Moreno, Maxime Coquelin, Chenbo Xia, dev@dpdk.org, Julia Suvorova, Marc-André Lureau, Daniel Berrange
References: <53dd4b66-9e44-01c3-9f9a-b37dcadb14b7@ovn.org> <597d1ec7-d271-dc0d-522d-b900c9cb00ea@ovn.org> <2ba6ff01-fe2d-253f-cb36-303b63ba2133@ovn.org> <8a9c1923-7711-9962-fa37-a4e84e399d4f@ovn.org>
In-Reply-To: <8a9c1923-7711-9962-fa37-a4e84e399d4f@ovn.org>
Subject: Re: [dpdk-dev] [RFC 0/4] SocketPair Broker support for vhost and virtio-user.
List-Id: DPDK patches and discussions

On Thu, Mar 25, 2021 at 06:58:56PM +0100, Ilya Maximets wrote:
> On 3/25/21 5:43 PM, Stefan Hajnoczi wrote:
> > On Thu, Mar 25, 2021 at 12:00:11PM +0100, Ilya Maximets wrote:
> >> On 3/25/21 10:35 AM, Stefan Hajnoczi wrote:
> >>> On Wed, Mar 24, 2021 at 02:11:31PM +0100, Ilya Maximets wrote:
> >>>> On 3/24/21 1:05 PM, Stefan Hajnoczi wrote:
> >>>>> On Tue, Mar 23, 2021 at 04:54:57PM -0400, Billy McFall wrote:
> >>>>>> On Tue, Mar 23, 2021 at 3:52 PM Ilya Maximets wrote:
> >>>>>>> On 3/23/21 6:57 PM, Adrian Moreno wrote:
> >>>>>>>> On 3/19/21 6:21 PM, Stefan Hajnoczi wrote:
> >>>>>>>>> On Fri, Mar 19, 2021 at 04:29:21PM +0100, Ilya Maximets wrote:
> >>>>>>>>>> On 3/19/21 3:05 PM, Stefan Hajnoczi wrote:
> >>>>>>>>>>> On Thu, Mar 18, 2021 at 08:47:12PM +0100, Ilya Maximets wrote:
> >>>>>>>>>>>> On 3/18/21 6:52 PM, Stefan Hajnoczi wrote:
> >>>>>>>>>>>>> On Wed, Mar 17, 2021 at 09:25:26PM +0100, Ilya Maximets wrote:
> >>>> - How to get this fd again after the OVS restart?  CNI will not be invoked
> >>>>   at this point to pass a new fd.
> >>>>
> >>>> - If application will close the connection for any reason (restart, some
> >>>>   reconfiguration internal to the application) and OVS will be re-started
> >>>>   at the same time, abstract socket will be gone.  Need a persistent daemon
> >>>>   to hold it.
> >>>
> >>> I remembered that these two points can be solved by sd_notify(3)
> >>> FDSTORE=1. This requires that OVS runs as a systemd service. Not sure if
> >>> this is the case (at least in the CNI use case)?
> >>>
> >>> https://www.freedesktop.org/software/systemd/man/sd_notify.html
> >>
> >> IIUC, these file descriptors only passed on the restart of the service,
> >> so port-del + port-add scenario is not covered (and this is a very
> >> common usecase, users are implementing some configuration changes this
> >> way and also this is internally possible scenario, e.g. this sequence
> >> will be triggered internally to change the OpenFlow port number).
> >> port-del will release all the resources including the listening socket.
> >> Keeping the fd for later use is not an option, because OVS will not know
> >> if this port will be added back or not and fds is a limited resource.
> >
> > If users of the CNI plugin are reasonably expected to do this then it
> > sounds like a blocker for the sd_notify(3) approach. Maybe it could be
> > fixed by introducing an atomic port-rename (?) operation, but this is
> > starting to sound too invasive.
>
> It's hard to implement, actually.  Things like 'port-rename' will
> be internally implemented as del+add in most cases.  Otherwise, it
> will require a significant rework of OVS internals.
> There are things that could be adjusted on the fly, but some
> fundamental parts like OF port number that every other part depends
> on are not easy to change.

I see. In that case the sd_notify(3) approach won't work.

> >> OVS could run as a system pod or as a systemd service.  It differs from
> >> one setup to another.  So it might not be controlled by systemd.
> >
> > Does the CNI plugin allow both configurations?
>
> CNI runs as a DaemonSet (pod on each node) by itself, and it doesn't
> matter if OVS is running on the host or in a different pod.

Okay.

> >
> > It's impossible to come up with one approach that works for everyone in
> > the general case (beyond the CNI plugin, beyond Kubernetes).
>
> If we're looking for a solution to store abstract sockets somehow
> for OVS then it's hard to came up with something generic.  It will
> have dependency on specific init system anyway.
>
> OTOH, Broker solution will work for all cases. :)  One may think
> of a broker as a service that supplies abstract sockets for processes
> from different namespaces.  These sockets are already connected, for
> convenience.

I'm not sure what we're trying to come up with :). I haven't figured out
how much of what has been discussed is cosmetic and nice-to-have stuff
versus what is a real problem that needs a solution.

From the vhost-user point of view I would prefer to stick to the
existing UNIX domain socket approach. Any additional mechanism adds
extra complexity, won't be supported by all software, requires educating
users and developers, requires building new vhost-user application
container images, etc. IMO it's only worth doing if there is a real
problem with UNIX domain sockets that cannot be solved without
introducing a new connection mechanism.

Stefan
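
For readers following the thread, the "broker hands out already connected
sockets" idea quoted above can be illustrated with a small sketch. It is
not the SocketPair Broker protocol from the RFC, only a minimal
illustration of the underlying mechanism: a connected pair is created with
socketpair(2) and each end is passed to a client over an existing UNIX
domain control connection using SCM_RIGHTS. The names broker_pair,
receive_end and demo are hypothetical, the b"fd" payload is an arbitrary
placeholder, and the sketch assumes Python 3.9+ on a Unix system (for
socket.send_fds and socket.recv_fds).

```python
import socket


def broker_pair(ctrl_a, ctrl_b):
    """Hypothetical broker step: create a connected pair, give one end to each client."""
    end_a, end_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # SCM_RIGHTS needs at least one byte of ordinary payload to carry the fd.
        socket.send_fds(ctrl_a, [b"fd"], [end_a.fileno()])
        socket.send_fds(ctrl_b, [b"fd"], [end_b.fileno()])
    finally:
        # The kernel keeps the in-flight descriptors alive, so the broker
        # can drop its own copies once the messages are queued.
        end_a.close()
        end_b.close()


def receive_end(ctrl):
    """Hypothetical client step: receive one descriptor and wrap it as a socket."""
    msg, fds, _flags, _addr = socket.recv_fds(ctrl, 16, 1)
    if msg != b"fd" or len(fds) != 1:
        raise RuntimeError("unexpected broker message")
    return socket.socket(fileno=fds[0])


def demo():
    # Control connections stand in for each client's link to the broker.
    a_client, a_broker = socket.socketpair()
    b_client, b_broker = socket.socketpair()

    broker_pair(a_broker, b_broker)
    data_a = receive_end(a_client)
    data_b = receive_end(b_client)

    # The two clients can now talk directly, without the broker in the path.
    data_a.sendall(b"hello")
    reply = data_b.recv(5)

    for s in (a_client, a_broker, b_client, b_broker, data_a, data_b):
        s.close()
    return reply


if __name__ == "__main__":
    print(demo())
```

In this sketch everything runs in one process for brevity. In the scenario
discussed in the thread the two clients would be separate processes (for
example OVS and a vhost-user application), which is exactly the extra
moving part that the plain UNIX domain socket approach avoids.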