From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 03 Apr 2015 02:14:48 -0700 (PDT)
X-Google-Original-Date: Fri, 03 Apr 2015 11:14 +0200
From: Thomas Monjalon
To: "Gonzalez Monroy, Sergio", Lilijun
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] eal: decrease the memory init time with many hugepages setup
Message-ID: <2447953.M95UbNe7b9@xps13>
Organization: 6WIND
User-Agent: KMail/4.14.4 (Linux/3.18.4-1-ARCH; KDE/4.14.4; x86_64; ; )
In-Reply-To: <551E57A6.9070405@intel.com>
References: <1427974230-8572-1-git-send-email-jerry.lilijun@huawei.com> <551E57A6.9070405@intel.com>
List-Id: patches and discussions about DPDK

2015-04-03 10:04, Gonzalez Monroy, Sergio:
> On 02/04/2015 14:41, Jay Rolette wrote:
> > On Thu, Apr 2, 2015 at 7:55 AM, Thomas Monjalon
> > wrote:
> >
> >> 2015-04-02 19:30, jerry.lilijun@huawei.com:
> >>> From: Lilijun
> >>>
> >>> In the function map_all_hugepages(), hugepage memory is truly allocated by
> >>> memset(virtaddr, 0, hugepage_sz). Then it costs about 40s to finish the
> >>> dpdk memory initialization when 40000 2M hugepages are setup in host os.
> >> Yes it's something we should try to reduce.
> >>
> > I have a patch in my tree that does the same opto, but it is commented out
> > right now. In our case, 2/3's of the startup time for our entire app was
> > due to that particular call - memset(virtaddr, 0, hugepage_sz). Just
> > zeroing 1 byte per huge page reduces that by 30% in my tests.
> >
> > The only reason I have it commented out is that I didn't have time to make
> > sure there weren't side-effects for DPDK or my app. For normal shared
> > memory on Linux, pages are initialized to zero automatically once they are
> > touched, so the memset isn't required but I wasn't sure whether that
> > applied to huge pages. Also wasn't sure how hugetlbfs factored into the
> > equation.
> >
> > Hopefully someone can chime in on that. Would love to uncomment the opto :)
> >
> I think the opto/patch is good ;)
>
> I had a look at the Linux kernel sources (mm/hugetlb.c) and at least
> since 2.6.32 (minimum Linux kernel version supported by DPDK) the kernel
> clears the hugepage (clear_huge_page) when it faults (hugetlb_no_page).
>
> Primary DPDK apps do clear_hugedir, clearing previously allocated
> hugepages, thus triggering hugepage faults (hugetlb_no_page) during
> map_all_hugepages.
>
> Note that even when we exit a primary DPDK app, hugepages remain
> allocated, reason why apps such as dump_cfg are able to retrieve
> config/memory information.

OK, thanks Sergio.

So the patch should add a comment explaining that the memset is there to
trigger the page fault, and why 1 byte is enough.
I think we should also consider the remap_all_hugepages() function.

> >> Isn't it a security hole?
> >>
> > Not necessarily. If the kernel pre-zeros the huge pages via CoW like normal
> > pages, then definitely not.
> >
> > Even if the kernel doesn't pre-zero the pages, if DPDK takes care of
> > properly initializing memory structures on startup as they are carved out
> > of the huge pages, then it isn't a security hole. However, that approach is
> > susceptible to bit rot... You can audit the code and make sure everything
> > is kosher at first, but you have to worry about new code making assumptions
> > about how memory is initialized.
> >
> >> This article speaks about "prezeroing optimizations" in Linux kernel:
> >> http://landley.net/writing/memory-faq.txt
> >
> > I read through that when I was trying to figure out whether huge pages
> > were pre-zeroed or not. It doesn't talk about huge pages much beyond why
> > they are useful for reducing TLB swaps.
> >
> > Jay
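
To illustrate the point about touching 1 byte per page instead of running
memset() over the whole page, here is a minimal sketch in C. It is not the
patch under discussion. As an assumption made so that it runs without
hugetlbfs being mounted, it maps ordinary anonymous pages rather than
hugepage files; the helper names map_pages(), touch_first_byte() and
region_is_zero() are hypothetical and do not exist in DPDK.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map n_pages anonymous pages of page_sz bytes each.
 * Assumption: this stands in for the hugetlbfs file mapping that
 * map_all_hugepages() performs. Returns NULL on failure. */
static void *map_pages(size_t page_sz, size_t n_pages)
{
    void *p = mmap(NULL, page_sz * n_pages, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Write a single byte at the start of every page. Each write faults the
 * page in, so the kernel allocates it; the rest of the page is never
 * written from user space. */
static void touch_first_byte(void *base, size_t page_sz, size_t n_pages)
{
    volatile unsigned char *p = base;

    for (size_t i = 0; i < n_pages; i++)
        p[i * page_sz] = 0;
}

/* Return 1 if every byte in [base, base + len) is zero, 0 otherwise. */
static int region_is_zero(const void *base, size_t len)
{
    const unsigned char *p = base;

    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

Calling touch_first_byte() on a fresh mapping and then region_is_zero() over
the whole region checks that the pages come back zero-filled even though
only one byte per page was written. Whether the same holds for a given
hugetlbfs setup depends on the kernel clearing the page on fault, as Sergio
describes above.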