* [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel
@ 2014-09-10 22:40 Michael Hu (NSBU)
  2014-09-11  9:53 ` Richardson, Bruce
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Hu (NSBU) @ 2014-09-10 22:40 UTC (permalink / raw)
To: dev

Hi All,

We have a kernel config question to consult you on.
DPDK fails to start due to an mbuf creation issue with the new kernel 3.14.17 + grsecurity patches.
We tried to trace down the issue; it seems that the virtual addresses of the huge pages are allocated from high address to low address by the kernel, whereas DPDK expects them to go from low to high in order to treat them as consecutive. See the dumped virtual addresses below: the first is 0x710421400000, then 0x710421200000. Previously it would be 0x710421200000 first, then 0x710421400000. Either way the pages are still contiguous.
----
Initialize Port 0 -- TxQ 1, RxQ 1, Src MAC 00:0c:29:b3:30:db
Create: Default RX 0:0 - Memory used (MBUFs 4096 x (size 1984 + Hdr 64)) + 790720 = 8965 KB
Zone 0: name:<RG_MP_log_history>, phys:0x6ac00000, len:0x2080, virt:0x710421400000, socket_id:0, flags:0
Zone 1: name:<MP_log_history>, phys:0x6ac02080, len:0x1d10c0, virt:0x710421402080, socket_id:0, flags:0
Zone 2: name:<MALLOC_S0_HEAP_0>, phys:0x6ae00000, len:0x160000, virt:0x710421200000, socket_id:0, flags:0
Zone 3: name:<rte_eth_dev_data>, phys:0x6add3140, len:0x11a00, virt:0x7104215d3140, socket_id:0, flags:0
Zone 4: name:<rte_vmxnet3_pmd_0_shared>, phys:0x6ade4b40, len:0x300, virt:0x7104215e4b40, socket_id:0, flags:0
Zone 5: name:<rte_vmxnet3_pmd_0_queuedesc>, phys:0x6ade4e80, len:0x200, virt:0x7104215e4e80, socket_id:0, flags:0
Zone 6: name:<RG_MP_Default RX 0:0>, phys:0x6ade5080, len:0x10080, virt:0x7104215e5080, socket_id:0, flags:0
Segment 0: phys:0x6ac00000, len:2097152, virt:0x710421400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 1: phys:0x6ae00000, len:2097152, virt:0x710421200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 2: phys:0x6b000000, len:2097152, virt:0x710421000000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 3: phys:0x6b200000, len:2097152, virt:0x710420e00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 4: phys:0x6b400000, len:2097152, virt:0x710420c00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 5: phys:0x6b600000, len:2097152, virt:0x710420a00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 6: phys:0x6b800000, len:2097152, virt:0x710420800000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 7: phys:0x6ba00000, len:2097152, virt:0x710420600000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 8: phys:0x6bc00000, len:2097152, virt:0x710420400000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 9: phys:0x6be00000, len:2097152, virt:0x710420200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
---

The related DPDK code is in dpdk/lib/librte_eal/linuxapp/eal/eal_memory.c :: rte_eal_hugepage_init():

	for (i = 0; i < nr_hugefiles; i++) {
		new_memseg = 0;

		/* if this is a new section, create a new memseg */
		if (i == 0)
			new_memseg = 1;
		else if (hugepage[i].socket_id != hugepage[i-1].socket_id)
			new_memseg = 1;
		else if (hugepage[i].size != hugepage[i-1].size)
			new_memseg = 1;
		else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
		    hugepage[i].size)
			new_memseg = 1;
		else if (((unsigned long)hugepage[i].final_va -
		    (unsigned long)hugepage[i-1].final_va) != hugepage[i].size) {
			new_memseg = 1;
		}

Is this a known issue? Is there any workaround? Or could you advise which kernel config may be related to this kernel behavior change?

Thanks,
Michael
^ permalink raw reply	[flat|nested] 3+ messages in thread
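
To see concretely why the final_va check in rte_eal_hugepage_init() splits descending mappings, here is a minimal stand-alone sketch. It is not DPDK code; the addresses simply mirror Segments 0-2 in the dump above, and the consequence for the mbuf pool is inferred from the report.

	#include <stdio.h>

	int main(void)
	{
		/* Virtual addresses as the patched kernel handed them out,
		 * high to low, mirroring Segments 0-2 in the dump above. */
		unsigned long final_va[] = { 0x710421400000UL,
					     0x710421200000UL,
					     0x710421000000UL };
		unsigned long hugepage_sz = 0x200000UL; /* 2 MB */

		for (int i = 1; i < 3; i++) {
			/* Same unsigned subtraction as the final_va check above.
			 * With descending addresses it wraps to a huge value and
			 * never equals hugepage_sz, so each page starts a new
			 * memseg and no single segment is large enough to back
			 * the mbuf pool. */
			unsigned long delta = final_va[i] - final_va[i - 1];
			printf("page %d: delta = 0x%lx -> %s\n", i, delta,
			       delta == hugepage_sz ? "same memseg" : "new memseg");
		}
		return 0;
	}
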
* Re: [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel
  2014-09-10 22:40 [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel Michael Hu (NSBU)
@ 2014-09-11  9:53 ` Richardson, Bruce
  2014-09-11 18:59   ` Michael Hu (NSBU)
  0 siblings, 1 reply; 3+ messages in thread
From: Richardson, Bruce @ 2014-09-11 9:53 UTC (permalink / raw)
To: Michael Hu (NSBU), dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Michael Hu (NSBU)
> Sent: Wednesday, September 10, 2014 11:41 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] dpdk starting issue with descending virtual address
> allocation in new kernel
>
> Hi All,
>
> We have a kernel config question to consult you on.
> DPDK fails to start due to an mbuf creation issue with the new kernel
> 3.14.17 + grsecurity patches.
> We tried to trace down the issue; it seems that the virtual addresses of
> the huge pages are allocated from high address to low address by the
> kernel, whereas DPDK expects them to go from low to high in order to treat
> them as consecutive. See the dumped virtual addresses below: the first is
> 0x710421400000, then 0x710421200000. Previously it would be 0x710421200000
> first, then 0x710421400000. Either way the pages are still contiguous.
> [...]
>
> Is this a known issue? Is there any workaround? Or could you advise which
> kernel config may be related to this kernel behavior change?
>
> Thanks,
> Michael

This should not be a problem for Intel DPDK startup, as the EAL should take
care of mmaps being done in this order at startup.
By way of background, where I have seen this occur before is on 32-bit
systems, while 64-bit systems tend to mmap in ascending order in every case
I've looked at. [Q: is this 32-bit you are running, or 64-bit?]
In either case, we modified the EAL memory mapping code some time back to
take account of this - if you look at the map_all_hugepages function in
eal_memory.c, you will see that, when we go to do the second mapping of the
hugepages to line the pages up, we do so by explicitly specifying our
preferred address. We get this address by allocating a large block of memory
from /dev/zero, taking its address and then freeing it again. Then we map the
pages one at a time into that free address block, so that even when the
kernel wants to map things from high to low addresses, our address hints
still cause things to map from low to high. If this does not work for you,
I'd be curious to find out why. Do any of the security patches you have
applied prevent mmap address hinting from working, for instance?

Regards,
/Bruce
^ permalink raw reply	[flat|nested] 3+ messages in thread
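
A minimal sketch of the reserve-then-hint approach Bruce describes (illustrative only, not the actual map_all_hugepages() code; the hugetlbfs mount point and file names are assumptions):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define HUGEPAGE_SZ (2UL * 1024 * 1024)
	#define NUM_PAGES   4UL

	int main(void)
	{
		size_t total = NUM_PAGES * HUGEPAGE_SZ;

		/* Step 1: reserve a large free block of virtual address space
		 * by mapping /dev/zero, remember where it landed, then free it. */
		int zero_fd = open("/dev/zero", O_RDONLY);
		void *base = mmap(NULL, total, PROT_READ, MAP_PRIVATE, zero_fd, 0);
		close(zero_fd);
		if (base == MAP_FAILED)
			return 1;
		munmap(base, total);

		/* Step 2: map each hugepage file with an explicit address hint
		 * inside that block, so the pages end up virtually contiguous
		 * even if the kernel would otherwise allocate high-to-low. */
		for (unsigned long i = 0; i < NUM_PAGES; i++) {
			char path[64];
			/* assumed hugetlbfs mount point and file names */
			snprintf(path, sizeof(path), "/mnt/huge/rtemap_%lu", i);
			int fd = open(path, O_CREAT | O_RDWR, 0600);
			if (fd < 0)
				return 1;
			void *hint = (char *)base + i * HUGEPAGE_SZ;
			void *va = mmap(hint, HUGEPAGE_SZ, PROT_READ | PROT_WRITE,
					MAP_SHARED, fd, 0);
			close(fd);
			if (va != hint) /* hint not honoured by the kernel */
				fprintf(stderr, "page %lu mapped at %p, wanted %p\n",
					i, va, hint);
		}
		return 0;
	}
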
* Re: [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel
  2014-09-11  9:53 ` Richardson, Bruce
@ 2014-09-11 18:59   ` Michael Hu (NSBU)
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Hu (NSBU) @ 2014-09-11 18:59 UTC (permalink / raw)
To: Richardson, Bruce; +Cc: dev

Thanks for the info, Bruce.

We are using a 64-bit system. The kernel config that affects this behavior is CONFIG_PAX_RANDMMAP, which is defined as:
---
config PAX_RANDMMAP
	bool "Randomize user stack and mmap() bases"
	default y if GRKERNSEC_CONFIG_AUTO
	depends on PAX_ASLR
	select PAX_RANDUSTACK
	help
	  By saying Y here the kernel will randomize every task's userland
	  stack and use a randomized base address for mmap() requests that
	  do not specify one themselves.

	  The stack randomization is done in two steps where the second
	  one may apply a big amount of shift to the top of the stack and
	  cause problems for programs that want to use lots of memory (more
	  than 2.5 GB if SEGMEXEC is not active, or 1.25 GB when it is).

	  As a result of mmap randomization all dynamically loaded libraries
	  will appear at random addresses and therefore be harder to exploit
	  by a technique where an attacker attempts to execute library code
	  for his purposes (e.g. spawn a shell from an exploited program that
	  is running at an elevated privilege level).

	  Furthermore, if a program is relinked as a dynamic ELF file, its
	  base address will be randomized as well, completing the full
	  randomization of the address space layout. Attacking such programs
	  becomes a guess game. You can find an example of doing this at
	  http://pax.grsecurity.net/et_dyn.tar.gz and practical samples at
	  http://www.grsecurity.net/grsec-gcc-specs.tar.gz .

	  NOTE: you can use the 'chpax' or 'paxctl' utilities to control this
	  feature on a per file basis.
---
With this config enabled, DPDK does not seem able to start because of the change in mmap behavior. Please refer to the mmap patches in (search for CONFIG_PAX_RANDMMAP):
https://grsecurity.net/stable/grsecurity-3.0-3.14.18-201409082127.patch

Thanks,
Michael

On 9/11/14 2:53 AM, "Richardson, Bruce" <bruce.richardson@intel.com> wrote:

>[...]
>If this does not work for
>you, I'd be curious to find out why. Do any of the security patches you
>have applied prevent mmap address hinting from working, for instance?
>
>Regards,
>/Bruce
^ permalink raw reply	[flat|nested] 3+ messages in thread
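
A quick stand-alone check of whether a running kernel honours non-MAP_FIXED address hints for file-backed mappings, which is what the EAL's second mapping pass relies on (illustrative sketch; that PAX_RANDMMAP discards such hints is an inference from the discussion above, not something verified here):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 2UL * 1024 * 1024;

		/* Grab an address the kernel considers free, then release it. */
		void *probe = mmap(NULL, len, PROT_READ,
				   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (probe == MAP_FAILED)
			return 1;
		munmap(probe, len);

		/* Re-request that address, without MAP_FIXED, for a file-backed
		 * mapping (the EAL's hugepage mappings are file-backed). If the
		 * kernel randomizes or ignores the hint, the two addresses
		 * differ, which would defeat the EAL's second mapping pass. */
		int fd = open("/dev/zero", O_RDONLY);
		void *again = mmap(probe, len, PROT_READ, MAP_PRIVATE, fd, 0);
		close(fd);
		printf("hint %p -> got %p (%s)\n", probe, again,
		       again == probe ? "hint honoured" : "hint not honoured");
		if (again != MAP_FAILED)
			munmap(again, len);
		return 0;
	}
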
end of thread, other threads:[~2014-09-11 18:55 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-10 22:40 [dpdk-dev] dpdk starting issue with descending virtual address allocation in new kernel Michael Hu (NSBU)
2014-09-11  9:53 ` Richardson, Bruce
2014-09-11 18:59   ` Michael Hu (NSBU)