From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <thomas@monjalon.net>
Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com
 [66.111.4.25]) by dpdk.org (Postfix) with ESMTP id CFF402BF4
 for <dev@dpdk.org>; Fri, 29 Mar 2019 14:34:30 +0100 (CET)
Received: from compute1.internal (compute1.nyi.internal [10.202.2.41])
 by mailout.nyi.internal (Postfix) with ESMTP id 6590D21B96;
 Fri, 29 Mar 2019 09:34:30 -0400 (EDT)
Received: from mailfrontend1 ([10.202.2.162])
 by compute1.internal (MEProxy); Fri, 29 Mar 2019 09:34:30 -0400
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=monjalon.net; h=
 from:to:cc:subject:date:message-id:in-reply-to:references
 :mime-version:content-transfer-encoding:content-type; s=mesmtp;
 bh=VXzMq2cNSBTWhaZ1p9CZtpFjcrWXLp9TaIERDITgssg=; b=dkLnecogtSsI
 J6rdVCbWW4MdcWvlGKH9G/DZPEWdo1on74bP7QzxdEB23H/xuIrY30nIACRUZYma
 BpYX2yBAel8T//DUgV40NpB3lOHyf0qmd9u7C02wxqTiSbP8L6Eya2aR0O2xa8M+
 w6DBfLpsOgloQ9M6OTHDEQAxwCQYgD8=
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=
 messagingengine.com; h=cc:content-transfer-encoding:content-type
 :date:from:in-reply-to:message-id:mime-version:references
 :subject:to:x-me-proxy:x-me-proxy:x-me-sender:x-me-sender
 :x-sasl-enc; s=fm2; bh=VXzMq2cNSBTWhaZ1p9CZtpFjcrWXLp9TaIERDITgs
 sg=; b=TyWKa0GSNoIKChV6AYVpaqVNafn68j8J4ccwODv0isVQ9yPe5hOGryqUD
 xfmDKQpWBLvD+G4pDdIMsT4ekfTtykeM5Ue8veE8C6ZQrXJy1SSAzaGZXsvaX+SD
 5Smk/wrAWRZOTVKf3dl9Vl+nL1yPk9r797TKuPg2ksEqNmC72AKcaP2SLnJkBkEG
 ravNxebfA5vQNazEet+1YZmLtY5SnE7kEjnKuGIzjiWIcFYqhaROIqHWbLbSPY2R
 nDpM+n7JYn1TjSWQTOKRadl5PpiFv+ngmh+nCQ1x/Augl7o9m3fCkVe+vX+pw8og
 2J7EmfAOZX5uWWgzsuYh7xhN6aXpQ==
X-ME-Sender: <xms:5R6eXGEUFFPgVqjvusvGWjrNbxF-UlxAnuJk4Y6jIfMqXgDl6t5wvg>
X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgedutddrkeejgddvjecutefuodetggdotefrodftvf
 curfhrohhfihhlvgemucfhrghsthforghilhdpqfgfvfdpuffrtefokffrpgfnqfghnecu
 uegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenuc
 fjughrpefhvffufffkjghfggfgtgesthfuredttddtvdenucfhrhhomhepvfhhohhmrghs
 ucfoohhnjhgrlhhonhcuoehthhhomhgrshesmhhonhhjrghlohhnrdhnvghtqeenucffoh
 hmrghinhepughpughkrdhorhhgnecukfhppeejjedrudefgedrvddtfedrudekgeenucfr
 rghrrghmpehmrghilhhfrhhomhepthhhohhmrghssehmohhnjhgrlhhonhdrnhgvthenuc
 evlhhushhtvghrufhiiigvpedt
X-ME-Proxy: <xmx:5R6eXKbUvIEdMBwco0Q2c7w5zg6205hWeT7mEncA-qN3Xq7UG9_LaA>
 <xmx:5R6eXKyGmT0DdZGnHfhp0y6bs9-8f5EDjjjOdXqudPLwFEOPelQJtg>
 <xmx:5R6eXMRW71AAaXa6PoYmQfnj972KrrepbRRmOPcgd_y22ApMrFzC3Q>
 <xmx:5h6eXF-i_hst9oWYPW78WFf_5143_YyV2vn7LJIWHUIYplcEf8lK6A>
Received: from xps.localnet (184.203.134.77.rev.sfr.net [77.134.203.184])
 by mail.messagingengine.com (Postfix) with ESMTPA id B3B43E464B;
 Fri, 29 Mar 2019 09:34:28 -0400 (EDT)
From: Thomas Monjalon <thomas@monjalon.net>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: David Marchand <david.marchand@redhat.com>, dev <dev@dpdk.org>,
 John McNamara <john.mcnamara@intel.com>,
 Marko Kovacevic <marko.kovacevic@intel.com>, iain.barker@oracle.com,
 edwin.leung@oracle.com, maxime.coquelin@redhat.com
Date: Fri, 29 Mar 2019 14:34:27 +0100
Message-ID: <4406705.fyK4ph7NJL@xps>
In-Reply-To: <af1c5ca2-b309-f17a-fda5-88942e4090ac@intel.com>
References: <07f664c33ddedaa5dcfe82ecb97d931e68b7e33a.1550855529.git.anatoly.burakov@intel.com>
 <3255576.YcZt162MTL@xps> <af1c5ca2-b309-f17a-fda5-88942e4090ac@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
Subject: Re: [dpdk-dev] [PATCH] eal: add option to not store segment fd's
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 29 Mar 2019 13:34:31 -0000

29/03/2019 14:24, Burakov, Anatoly:
> On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
> > 29/03/2019 13:05, Burakov, Anatoly:
> >> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
> >>> 29/03/2019 11:33, Burakov, Anatoly:
> >>>> On 29-Mar-19 9:50 AM, David Marchand wrote:
> >>>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov
> >>>>> <anatoly.burakov@intel.com> wrote:
> >>>>>
> >>>>>       Due to internal glibc limitations [1], DPDK may exhaust internal
> >>>>>       file descriptor limits when using smaller page sizes, which results
> >>>>>       in inability to use system calls such as select() by user
> >>>>>       applications.
> >>>>>
> >>>>>       While the problem can be worked around using --single-file-segments
> >>>>>       option, it does not work if --legacy-mem mode is also used. Add a
> >>>>>       (yet another) EAL flag to disable storing fd's internally. This
> >>>>>       will sacrifice compatibility with Virtio with vhost-backend, but
> >>>>>       at least select() and friends will work.
> >>>>>
> >>>>>       [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
> >>>>>
> >>>>>
> >>>>> Sorry, I am a bit lost and I never took the time to look into the new
> >>>>> memory allocation system.
> >>>>> This gives the impression that we are accumulating workarounds, between
> >>>>> legacy-mem, single-file-segments, now no-seg-fds.
> >>>>
> >>>> Yep. I don't like this any more than you do, but I think there are users
> >>>> of all of these, so we can't just drop them willy-nilly. My great hope
> >>>> was that by now everyone would move on to use VFIO so legacy mem
> >>>> wouldn't be needed (the only reason it exists is to provide
> >>>> compatibility for use cases where lots of IOVA-contiguous memory is
> >>>> required, and VFIO cannot be used), but apparently that is too much to
> >>>> ask :/
> >>>>
> >>>>>
> >>>>> IIUC, everything revolves around the need for per-page locks.
> >>>>> Can you summarize why we need them?
> >>>>
> >>>> The short answer is multiprocess. We have to be able to map and unmap
> >>>> pages individually, and for that we need to be sure that we can, in
> >>>> fact, remove a page because no one else uses it. We also need to store
> >>>> fd's because virtio with vhost-user backend needs them to work, because
> >>>> it relies on sharing memory between processes using fd's.
> >>>
> >>> It's a pity to add an option to work around a limitation of a corner case.
> >>> It adds complexity that we will have to support forever,
> >>> and it's not even perfect because of vhost.
> >>>
> >>> Might there be another solution?
> >>>
> >>
> >> If there is one, I'm all ears. I don't see any solutions aside from
> >> adding limitations.
> >>
> >> For example, we could drop the single/multi file segments mode and just
> >> make single file segments a default and the only available mode, but
> >> this has certain risks because older kernels do not support fallocate()
> >> on hugetlbfs.
> >>
> >> We could further draw a line in the sand, and say that, for example,
> >> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use
> >> VFIO by now and if you don't it's your own fault.
> >>
> >> We could also cut down on the number of fd's we use in single-file
> >> segments mode by not using locks and simply deleting pages in the
> >> primary, but yanking out hugepages from under secondaries' feet makes me
> >> feel uneasy, even if technically by the time that happens, they're not
> >> supposed to be used anyway. This could mean that the patch is no longer
> >> necessary because we don't use that many fd's any more.
> > 
> > This last option is interesting. Is it realistic?
> > 
> 
> I can do it in the current release cycle, but I'm not sure if it's too late
> to do such changes. I guess it's OK since the validation cycle is just 
> starting? I'll throw something together and see if it crashes and burns.

OK let's try that.
