From: "Burakov, Anatoly"
To: Thomas Monjalon
Cc: David Marchand, dev, John McNamara, Marko Kovacevic, iain.barker@oracle.com, edwin.leung@oracle.com, maxime.coquelin@redhat.com
Date: Fri, 29 Mar 2019 13:24:32 +0000
Subject: Re: [dpdk-dev] [PATCH] eal: add option to not store segment fd's
References: <07f664c33ddedaa5dcfe82ecb97d931e68b7e33a.1550855529.git.anatoly.burakov@intel.com> <1682850.JO3elT0QtZ@xps> <3255576.YcZt162MTL@xps>
In-Reply-To: <3255576.YcZt162MTL@xps>
List-Id: DPDK patches and discussions

On 29-Mar-19 12:40 PM, Thomas Monjalon wrote:
> 29/03/2019 13:05, Burakov, Anatoly:
>> On 29-Mar-19 11:34 AM, Thomas Monjalon wrote:
>>> 29/03/2019 11:33, Burakov, Anatoly:
>>>> On 29-Mar-19 9:50 AM, David Marchand wrote:
>>>>> On Fri, Feb 22, 2019 at 6:12 PM Anatoly Burakov wrote:
>>>>>
>>>>>     Due to internal glibc limitations [1], DPDK may exhaust internal
>>>>>     file descriptor limits when using smaller page sizes, which results
>>>>>     in inability to use system calls such as select() by user
>>>>>     applications.
>>>>>
>>>>>     While the problem can be worked around using --single-file-segments
>>>>>     option, it does not work if --legacy-mem mode is also used. Add
>>>>>     (yet another) EAL flag to disable storing fd's internally. This
>>>>>     will sacrifice compatibility with Virtio with vhost-backend, but
>>>>>     at least select() and friends will work.
>>>>>
>>>>>     [1] https://mails.dpdk.org/archives/dev/2019-February/124386.html
>>>>>
>>>>>
>>>>> Sorry, I am a bit lost and I never took the time to look into the new
>>>>> memory allocation system.
>>>>> This gives the impression that we are accumulating workarounds, between
>>>>> legacy-mem, single-file-segments, now no-seg-fds.
>>>>
>>>> Yep. I don't like this any more than you do, but I think there are users
>>>> of all of these, so we can't just drop them willy-nilly. My great hope
>>>> was that by now everyone would move on to use VFIO so legacy mem
>>>> wouldn't be needed (the only reason it exists is to provide
>>>> compatibility for use cases where lots of IOVA-contiguous memory is
>>>> required, and VFIO cannot be used), but apparently that is too much to
>>>> ask :/
>>>>
>>>>>
>>>>> IIUC, everything revolves around the need for per page locks.
>>>>> Can you summarize why we need them?
>>>>
>>>> The short answer is multiprocess. We have to be able to map and unmap
>>>> pages individually, and for that we need to be sure that we can, in
>>>> fact, remove a page because no one else uses it. We also need to store
>>>> fd's because virtio with vhost-user backend needs them to work, because
>>>> it relies on sharing memory between processes using fd's.
>>>
>>> It's a pity adding an option to work around a limitation of a corner case.
>>> It adds complexity that we will have to support forever,
>>> and it's not even perfect because of vhost.
>>>
>>> Might there be another solution?
>>>
>>
>> If there is one, I'm all ears. I don't see any solutions aside from
>> adding limitations.
>>
>> For example, we could drop the single/multi file segments mode and just
>> make single file segments a default and the only available mode, but
>> this has certain risks because older kernels do not support fallocate()
>> on hugetlbfs.
>>
>> We could further draw a line in the sand, and say that, for example,
>> 19.11 (or 20.11) will not have legacy mem mode, and everyone should use
>> VFIO by now and if you don't it's your own fault.
>>
>> We could also cut down on the number of fd's we use in single-file
>> segments mode by not using locks and simply deleting pages in the
>> primary, but yanking out hugepages from under secondaries' feet makes me
>> feel uneasy, even if technically by the time that happens, they're not
>> supposed to be used anyway. This could mean that the patch is no longer
>> necessary because we don't use that many fd's any more.
>
> This last option is interesting. Is it realistic?
>

I can do it in the current release cycle, but I'm not sure if it's too
late to do such changes. I guess it's OK since the validation cycle is
just starting? I'll throw something together and see if it crashes and
burns.

--
Thanks,
Anatoly
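
For readers who have not hit the select() problem mentioned in the quoted patch description, here is a minimal sketch of how it can show up on the application side. It assumes the limitation in question is the fixed size of fd_set (FD_SETSIZE, 1024 on glibc), which the thread itself does not spell out. Both helper names are hypothetical and are not part of DPDK or of the patch under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper, not a DPDK API: an fd_set is a fixed-size bitmap,
 * so only descriptors in the range [0, FD_SETSIZE) may be stored in it.
 * Calling FD_SET() with a larger descriptor is undefined behaviour.
 */
static bool fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Hypothetical helper: wait until fd is readable, but refuse descriptors
 * that select() cannot represent. If a process already holds many open
 * fds (assumed here: one per memory segment), a newly opened socket can
 * be assigned a number at or above FD_SETSIZE and end up on this path.
 */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;

    assert(timeout_ms >= 0);

    if (!fd_fits_in_fd_set(fd)) {
        errno = EINVAL;
        return -1;
    }

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* Returns 1 if readable, 0 on timeout, -1 on error. */
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```

The point of the sketch is only that the failure depends on the numeric value of the descriptor, not on how many descriptors the application itself passes to select(), which is why reducing the number of fds held internally is relevant.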