DPDK patches and discussions
From: 建明 <jianmingfan@126.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: dev@dpdk.org, "Jianming Fan" <fanjianming@jd.com>
Subject: Re: [dpdk-dev] [PATCH v2] mem: accelerate dpdk program startup by reuse page from page cache
Date: Sun, 11 Nov 2018 10:22:09 +0800 (CST)
Message-ID: <66c1cc99.1cf5.16700938cb0.Coremail.jianmingfan@126.com>
In-Reply-To: <357b1b24-68f8-2c83-ff42-6ea1dce11b9c@intel.com>


Hi Burakov,

  Thanks very much for your reply. 
 
  I ran testpmd from DPDK 18.08 on my router with 200GB of hugepages configured,
  and found that it still takes 50s to zero the 200GB of hugepages each time the program restarts.
  
  As you mentioned, an application should call rte_eal_cleanup() to do the cleanup.
  However, this API is not designed to speed up startup; its documentation says:
  >>
     During rte_eal_init() EAL allocates memory from hugepages to enable its core libraries to perform their tasks.
     The rte_eal_cleanup() function releases these resources, ensuring that no hugepage memory is leaked.
     It is expected that all DPDK applications call rte_eal_cleanup() before exiting.
     Not calling this function could result in leaking hugepages, leading to failure during initialization of secondary processes.
  >>
  I guess you are suggesting using a secondary process, which shares memory with the primary, for fast startup. However, as you know, existing applications would need a lot of changes to use that model well.
  
  You also mentioned that faster initialization is one of the key reasons why the new memory subsystem was developed.
  However, reading the following code, I suspect the community has not really considered the benefit of reusing the existing hugetlbfs page cache:
	hugepage_info_init(void)
	{
		...
		for (...) {	/* one iteration per hugepage size */
			...
			/* clear out the hugepages dir from unused pages */
			if (clear_hugedir(hpi->hugedir) == -1)
				break;
			...
		}
		...
	}



  The key to this patch is that it takes advantage of the hugetlbfs page cache.
  With this patch, when you start the program for the first time, the following steps are taken:
  
  1. User space: create files under /dev/hugepages.
  2. User space: call mmap() with the MAP_SHARED and MAP_POPULATE flags set.
  3. Kernel space:
     3.1 Find a free VMA.
     3.2 Allocate a huge page from the huge page pool reserved by hugetlbfs.
     3.3 Call clear_huge_page() to zero the page.
         ****************** This step is very time-consuming ******************
     3.4 Insert the page into the file inode's page cache.
     3.5 Insert the page into the page table.
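  The two user-space steps above can be sketched in C as follows. This is a minimal illustration, not the patch's code: map_new_backing_file() is a hypothetical helper, and it assumes the path is on a hugetlbfs mount such as /dev/hugepages and that len is a multiple of the hugepage size (the ftruncate() call also lets the same code work on an ordinary file). MAP_POPULATE asks the kernel to fault in every page during the mmap() call, which is where steps 3.1 to 3.5, including the zeroing, happen.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Sketch of user-space steps 1 and 2 on a first start.
 * Assumptions: 'path' is on a hugetlbfs mount and 'len' is a multiple
 * of the hugepage size. MAP_POPULATE is Linux-specific.
 * Returns MAP_FAILED on error.
 */
static void *map_new_backing_file(const char *path, size_t len)
{
	/* Step 1: create the file; O_EXCL means it must not exist yet. */
	int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
	if (fd < 0)
		return MAP_FAILED;

	/* Size the file so the whole mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		unlink(path);
		return MAP_FAILED;
	}

	/* Step 2: shared mapping, pre-faulted (and zeroed) up front. */
	void *va = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_POPULATE, fd, 0);
	close(fd); /* the mapping keeps the file referenced */
	if (va == MAP_FAILED)
		unlink(path);
	return va;
}
```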
	  
	
  Then, if you restart the program, the following steps are taken:
  1. User space: open the existing files under /dev/hugepages.
  2. User space: call mmap() with the MAP_SHARED and MAP_POPULATE flags set.
  3. Kernel space:
     3.1 Find a free VMA.
     3.2 Search the file inode's page cache and find that the page is already there.
     3.3 Insert the page into the page table.
  Note that restarting the program no longer requires clear_huge_page()!
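  The restart path differs only in that the file is re-opened rather than recreated, so mmap() finds the pages already in the file's page cache and nothing is zeroed. A minimal sketch follows; map_existing_backing_file() is a hypothetical helper, and its size check merely stands in for the "validity" check mentioned in the quoted patch description, whose real criteria are not shown in this thread. Because the pages are not cleared, whatever a previous run wrote is still visible, which is why the patch adds a memset() on the allocation path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch of the restart path: re-open a backing file left by a previous
 * run (no unlink, no O_TRUNC) and map it again. Its pages come from the
 * page cache, so old contents are still there.
 * Hypothetical validity check: the file must already be 'len' bytes.
 * Returns MAP_FAILED if the file is missing or has the wrong size.
 */
static void *map_existing_backing_file(const char *path, size_t len)
{
	int fd = open(path, O_RDWR);
	if (fd < 0)
		return MAP_FAILED;

	struct stat st;
	if (fstat(fd, &st) != 0 || (size_t)st.st_size != len) {
		close(fd);
		return MAP_FAILED;
	}

	void *va = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_POPULATE, fd, 0);
	close(fd);
	return va;
}
```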
  

  By the way, I worked for Intel several years ago. It's a great place to work.

Best regards
jianming


At 2018-11-09 22:03:25, "Burakov, Anatoly" <anatoly.burakov@intel.com> wrote:
>On 09-Nov-18 12:20 PM, Burakov, Anatoly wrote:
>> On 09-Nov-18 9:23 AM, jianmingfan wrote:
>>> --- fix coding style of the previous patch
>>>
>>> During process startup, DPDK invokes clear_hugedir() to unlink all
>>> hugepage files under /dev/hugepages. Then, in map_all_hugepages(),
>>> it invokes mmap to allocate and zero all the huge pages as configured
>>> in /sys/kernel/mm/hugepages/xxx/nr_hugepages.
>>>
>>> This makes startup extremely slow when a large amount of hugepage
>>> memory is configured.
>>>
>>> In our use case, we usually configure as much as 200GB of hugepages in
>>> our router. It takes more than 50s to clear the pages each time the
>>> DPDK process starts up.
>>>
>>> To address this issue, the user can turn on the --reuse-map switch.
>>> With it, DPDK checks the validity of the existing page cache under
>>> /dev/hugepages. If valid, the cache is reused rather than deleted,
>>> so that the OS doesn't need to zero the pages again.
>>>
>>> However, many users (e.g. rte_kni_alloc) rely on the OS zeroing
>>> pages. To keep them working, I add a memset in malloc_heap_alloc().
>>> This makes sense for the following reasons:
>>> 1) Users often configure far more hugepage memory than the program
>>> uses. In our router, 200GB is configured, but less than 2GB is
>>> actually used.
>>> 2) DPDK users don't call heap allocation in performance-critical
>>> paths; they allocate memory during process startup.
>>>
>>> Signed-off-by: Jianming Fan <fanjianming@jd.com>
>>> ---
>> 
>> I believe this issue is better solved by actually fixing all of the 
>> memory that DPDK leaves behind. We already have rte_eal_cleanup() call 
>> which will deallocate any EAL-allocated memory that has been reserved, 
>> and an exited application should free any memory it was using so that 
>> memory subsystem could free it back to the system, thereby not needing 
>> any cleaning of hugepages at startup.
>> 
>> If your application does not e.g. free its mempools on exit, it should 
>> :) Chances are, the problem will go away. The only circumstance where 
>> this may not work is if you preallocated your memory using 
>> -m/--socket-mem flag.
>> 
>
>To clarify - all of the above is only applicable to 18.05 and beyond. 
>The map_all_hugepages() function only gets called in the legacy mem 
>init, so this patch solves a problem that does not exist on recent DPDK 
>versions in the first place - faster initialization is one of the key 
>reasons why the new memory subsystem was developed.
>
>-- 
>Thanks,
>Anatoly
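  The quoted v2 patch compensates for reused, non-zeroed pages by adding a memset() in malloc_heap_alloc(). The sketch below illustrates that zero-on-allocate idea with a hypothetical bump allocator; struct region and region_zalloc() are illustrative names, not DPDK's heap. Clearing happens per allocation, so the cost scales with memory actually handed out rather than memory mapped: at the roughly 4 GB/s implied by zeroing 200GB in 50s, clearing the 2GB the router actually uses would take on the order of 0.5s.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical bump allocator over a region that may hold stale data
 * from a previous run (e.g. reused hugepage files). Since the kernel no
 * longer guarantees zeroed pages, each allocation is cleared explicitly.
 */
struct region {
	unsigned char *base;
	size_t size;
	size_t used;
};

/* 'align' must be a power of two. Returns NULL when out of space. */
static void *region_zalloc(struct region *r, size_t n, size_t align)
{
	uintptr_t start = (uintptr_t)(r->base + r->used);
	uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
	size_t pad = (size_t)(aligned - start);

	if (pad > r->size - r->used || n > r->size - r->used - pad)
		return NULL;

	r->used += pad + n;
	/* Clear only the bytes handed out; the rest stays untouched. */
	return memset((void *)aligned, 0, n);
}
```

  A real heap would also have to handle freed and re-allocated elements; this sketch covers only the first-allocation case.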


Thread overview: 8+ messages
2018-11-09  7:58 [dpdk-dev] [PATCH] " jianmingfan
2018-11-09  9:23 ` [dpdk-dev] [PATCH v2] " jianmingfan
2018-11-09 12:20   ` Burakov, Anatoly
2018-11-09 14:03     ` Burakov, Anatoly
2018-11-09 16:21       ` Stephen Hemminger
2018-11-11  2:19       ` [dpdk-dev] Re: " 范建明
2018-11-12  9:04         ` Burakov, Anatoly
2018-11-11  2:22       ` 建明 [this message]
