To: Bruce Richardson, Venumadhav Josyula
Cc: users@dpdk.org, dev@dpdk.org, Venumadhav Josyula
From: "Burakov, Anatoly"
Message-ID: <70f4e9f0-70f7-aa4a-6c5d-c24308d196c2@intel.com>
Date: Wed, 13 Nov 2019 17:26:54 +0000
In-Reply-To: <20191113091927.GA1501@bricha3-MOBL.ger.corp.intel.com>
Subject: Re: [dpdk-dev] time taken for allocation of mempool.
On 13-Nov-19 9:19 AM, Bruce Richardson wrote:
> On Wed, Nov 13, 2019 at 10:37:57AM +0530, Venumadhav Josyula wrote:
>> Hi,
>> We are using 'rte_mempool_create' for allocation of flow memory. This has
>> been there for a while. We just migrated to dpdk-18.11 from dpdk-17.05.
>> Here is the problem statement:
>>
>> Problem statement:
>> In the new dpdk (18.11), 'rte_mempool_create' takes approximately ~4.4 sec
>> for allocation, compared to the older dpdk (17.05). We have some 8-9
>> mempools for our entire product. We do upfront allocation for all of them
>> (i.e. when the dpdk application is coming up). Our application is a
>> run-to-completion model.
>>
>> Questions:
>> i) Is that acceptable / has anybody seen such a thing?
>> ii) What has changed between the two dpdk versions (18.11 vs. 17.05) from
>> a memory perspective?
>>
>> Any pointers are welcome.
>>
> Hi,
>
> From 17.05 to 18.11 there was a change in the default memory model for
> DPDK. In 17.05 all DPDK memory was allocated statically upfront and then
> used for the memory pools. With 18.11, no large blocks of memory are
> allocated at init time; instead the memory is requested from the kernel
> as it is needed by the app. This makes the initial startup of an app
> faster, but the allocation of new objects like mempools slower, and it
> could be this you are seeing.
>
> Some things to try:
> 1. Use the "--socket-mem" EAL flag to do an upfront allocation of memory
> for use by your memory pools and see if it improves things.
> 2. Try using the "--legacy-mem" flag to revert to the old memory model.
>
> Regards,
> /Bruce
>

I would also add to this the fact that the mempool will, by default, attempt to allocate IOVA-contiguous memory, with a fallback to non-IOVA-contiguous memory whenever getting IOVA-contiguous memory isn't possible.

If you are running in IOVA-as-PA mode (as would be the case if you are using the igb_uio kernel driver), then, since it is now impossible to preallocate large PA-contiguous chunks in advance, what will likely happen is that the mempool will try to allocate IOVA-contiguous memory, fail, and retry with non-IOVA-contiguous memory (essentially allocating memory twice). For large mempools (or a large number of mempools) that can take a noticeable amount of time.

The obvious workaround is to use VFIO and IOVA-as-VA mode. The allocator is then able to get IOVA-contiguous memory at the outset, and allocation completes faster. The other two alternatives, already suggested in this thread by Bruce and Olivier, are:

1) use bigger page sizes (such as 1G)
2) use legacy mode (and lose out on all of the benefits provided by the new memory model)

The recommended solution is to use VFIO/IOMMU and IOVA-as-VA mode.

-- 
Thanks,
Anatoly
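[For reference, a sketch of the remedies discussed in this thread as command-line invocations. The application name `./app`, the core list, and the PCI address are placeholders; adjust them for your setup, and note that hugepages must already be configured.]

```shell
# Preferred: bind the device to vfio-pci (PCI address is a placeholder).
# With VFIO, EAL can run in IOVA-as-VA mode, where the IOVA-contiguous
# allocation succeeds on the first attempt instead of failing and retrying.
modprobe vfio-pci
./usertools/dpdk-devbind.py --bind=vfio-pci 0000:03:00.0

# Alternative 1: preallocate memory upfront, per NUMA socket (in MB),
# so mempool creation does not have to request memory from the kernel.
./app -l 0-3 -n 4 --socket-mem=1024,1024

# Alternative 2: revert to the 17.05-style static memory model.
./app -l 0-3 -n 4 --legacy-mem
```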