* KASLR enabled in Linux Kernel
@ 2024-01-22 9:58 Sudhakar Vajha
2024-01-23 21:00 ` Dmitry Kozlyuk
0 siblings, 1 reply; 2+ messages in thread
From: Sudhakar Vajha @ 2024-01-22 9:58 UTC (permalink / raw)
To: users
Hi Team,
The issue we are facing:
We have a telecom product called "Session Border Controller (SBC)", version 6400, in which we are using DPDK version 22.11.1.
Users have encountered cases where enabling ASLR in the Linux kernel causes DPDK initialization to fail on the SBC 6400 platform. As ASLR is needed for FIPS, this is a problem for users who need both the security provided by ASLR and the high-performance packet processing offered by DPDK.
Problem analysis
DPDK determines the number of memory types in the following way:
The number of huge page sizes * the number of NUMA nodes present in the system, i.e. 2 * 1 = 2, which means that there are two memory types (huge page sizes 1GB and 2MB) on one NUMA node.
Deciding the amount of memory going towards each memory type is a balancing act between maximum segments per type, maximum memory per type, and number of detected NUMA nodes. The goal is to make sure each memory type gets at least one memseg list.
The total amount of memory is limited by RTE_MAX_MEM_MB value.
The total amount of memory per type is limited by either RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number of detected NUMA nodes. Additionally, maximum number of segments per type is also limited by RTE_MAX_MEMSEG_PER_TYPE. This is because for smaller page sizes, it can take hundreds of thousands of segments to reach the above specified per-type memory limits.
Additionally, each type may have multiple memseg lists associated with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones. The number of memseg lists per type is decided based on the above limits, and also takes into account the number of detected NUMA nodes, to make sure that we don't run out of memseg lists before we populate all NUMA nodes with memory.
#define RTE_MAX_MEM_MB 524288           /* defined in rte_build_config.h */
#define RTE_MAX_MEM_MB_PER_TYPE 65536   /* defined in rte_config.h */
#define RTE_MAX_MEMSEG_PER_LIST 32768
#define RTE_MAX_MEM_MB_PER_LIST 65536
#define RTE_MAX_MEMSEG_PER_TYPE 32768
max_mem = (uint64_t)RTE_MAX_MEM_MB << 20;
max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20,
                           max_mem / n_memtypes);
The following logs are captured from 6400 during boot-up time:
EAL: eal_dynmem_memseg_lists_init:117 n_memtypes = 2!!!!!
EAL: eal_dynmem_memseg_lists_init:124 max_mem:549755813888 max_mem_per_type :68719476736
EAL: eal_dynmem_memseg_lists_init:132 max_seglists_per_type = 64!!!!!
EAL: eal_dynmem_memseg_lists_init:175 max_segs_per_type = 64!!!!!
EAL: eal_dynmem_memseg_lists_init:179 max_segs_per_list = 64!!!!!
EAL: eal_dynmem_memseg_lists_init:184 max_mem_per_list = 68719476736!!!!!
EAL: eal_dynmem_memseg_lists_init:188 n_segs = 64!!!!!
The following named memseg lists are created, one for each memory type:
* memseg-1048576k-0-0 (1GB) with 64 segments
* memseg-2048k-0-0 (2MB) with 32768 segments
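The logged values and the segment counts above can be reproduced with a short calculation. The following is a minimal sketch in C that applies the limits described in this message in sequence; it is modelled on that description and is not the DPDK source. The names `memtype_limits`, `calc_limits` and `min_u64` are hypothetical, and the constants are the values quoted above, which are not necessarily the DPDK defaults.

```c
#include <assert.h>
#include <stdint.h>

/* Build-time limits as quoted in this message (assumed, not DPDK defaults). */
#define MAX_MEM_MB          524288ULL
#define MAX_MEM_MB_PER_TYPE 65536ULL
#define MAX_MEMSEG_PER_LIST 32768ULL
#define MAX_MEM_MB_PER_LIST 65536ULL
#define MAX_MEMSEG_PER_TYPE 32768ULL

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Hypothetical result type: per-type and per-list limits for one page size. */
struct memtype_limits {
	uint64_t max_mem_per_type; /* bytes */
	uint64_t max_mem_per_list; /* bytes */
	uint64_t n_segs;           /* segments in one memseg list */
};

/* Apply the limits in the order described in the text above. */
static struct memtype_limits
calc_limits(uint64_t page_sz, unsigned int n_memtypes)
{
	struct memtype_limits l;
	uint64_t max_mem, max_segs_per_type, max_segs_per_list;

	assert(page_sz > 0 && n_memtypes > 0);

	max_mem = MAX_MEM_MB << 20;
	l.max_mem_per_type = min_u64(MAX_MEM_MB_PER_TYPE << 20,
				     max_mem / n_memtypes);
	max_segs_per_type = min_u64(l.max_mem_per_type / page_sz,
				    MAX_MEMSEG_PER_TYPE);
	max_segs_per_list = min_u64(max_segs_per_type, MAX_MEMSEG_PER_LIST);
	l.max_mem_per_list = min_u64(max_segs_per_list * page_sz,
				     MAX_MEM_MB_PER_LIST << 20);
	l.n_segs = min_u64(max_segs_per_list, l.max_mem_per_list / page_sz);
	return l;
}
```

Calling `calc_limits(1ULL << 30, 2)` gives `max_mem_per_type` = 68719476736 and `n_segs` = 64, as in the log; `calc_limits(2ULL << 20, 2)` gives 32768 segments for the 2MB list.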
During SBC 6400 initialization, we request the system to create 64 huge pages of 1GB size. By default, DPDK gets all 64 huge pages in one contiguous range of physical memory. In that case no issue has been observed: the huge pages are remapped into the memory segment list of size 64 at once, in one step. With ASLR enabled, however, it is not guaranteed that the huge pages are always allocated in contiguous physical memory. If they still happen to be contiguous, the remapping is done in one step as before.
The issue occurs when the 64 huge pages are not contiguous in physical memory and are remapped into the memory segment list. In that case, the remapping is done in two steps:
1st step:
Huge page memory layout:
0 1 2 3 4 5 6 7 8 9 ... 63
For example, if pages 0-9 are contiguous and the rest of the huge pages are located elsewhere in physical memory, only huge pages 0-9 are remapped into the memory segment list in this step.
Memory segment list:
0 1 2 3 4 5 6 7 8 9 ... 63
2nd step:
The rest of the huge pages are then remapped. This time the memory segment list is not empty (it already holds 10 segments, 0-9), so DPDK leaves space for one segment in the memory segment list and tries to remap the huge pages into the remaining segments. As the number of huge pages and the size of the memory segment list are both 64, DPDK cannot find enough free segments in the list, because one segment has already been set aside for the hole.
Huge page memory layout:
0 1 2 3 4 5 6 7 8 9 10 11 ... 63
54 huge pages remain. DPDK tries to remap these 54 huge pages into the memory segment list.
But the memory segment list has only 53 usable segments, as one segment is left for the hole. Hence the memory allocation fails, and DPDK initialization fails as well.
Memory segment list:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 ... 63
[Figures image001.png, image002.png: memory segment list, annotated "Leave space for a hole if memory segment list is not empty" and "The space is having only for 53 segments."]
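The two-step failure described above can be reproduced with a toy model. The sketch below is not DPDK code: `toy_list`, `find_free_run` and `toy_remap` are hypothetical names, and the rule "reserve one extra slot in front of the run when the list is not empty" is taken only from the description in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 128

/* Toy model of a memseg list: used[i] is true if slot i holds a page. */
struct toy_list {
	bool used[MAX_SLOTS];
	size_t len;   /* number of slots in the list */
	size_t count; /* number of used slots */
};

static void toy_init(struct toy_list *l, size_t len)
{
	assert(len <= MAX_SLOTS);
	*l = (struct toy_list){ .len = len }; /* all slots start free */
}

/* Return the start index of the first run of n free slots, or -1. */
static long find_free_run(const struct toy_list *l, size_t n)
{
	size_t run = 0;

	for (size_t i = 0; i < l->len; i++) {
		run = l->used[i] ? 0 : run + 1;
		if (run == n)
			return (long)(i + 1 - n);
	}
	return -1;
}

/*
 * Map one run of n contiguous pages. If the list is not empty, ask for
 * n + 1 free slots and skip the first one, leaving it as a hole.
 * Returns the index of the first mapped slot, or -1 on failure.
 */
static long toy_remap(struct toy_list *l, size_t n)
{
	bool empty = (l->count == 0);
	long idx;

	assert(n > 0);
	idx = find_free_run(l, n + (empty ? 0 : 1));
	if (idx < 0)
		return -1;
	if (!empty)
		idx++; /* leave the hole */
	for (size_t i = 0; i < n; i++)
		l->used[(size_t)idx + i] = true;
	l->count += n;
	return idx;
}
```

In this model, a 64-slot list accepts a single run of 64 pages, but after a first run of 10 pages a second run of 54 pages fails, because 55 free slots are requested and only 54 are left. With 65 slots the second run succeeds and starts at index 11.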
Note: Why does DPDK leave space for a hole in a memory segment list?
Basically, DPDK needs to know how many segments there are in order to map all pages into one address space, and it leaves appropriate holes between segments so that rte_malloc does not concatenate them into one big segment. But in this case, all 64 pages belong to one address space, so leaving space for a hole is not required.
The following are my queries:
1. Why is space left in the memseg list? And what is the significance of the hole?
2. Can I scale up the size of the memory segment list to more than 64 segments?
Regards,
Sudhakar
* Re: KASLR enabled in Linux Kernel
From: Dmitry Kozlyuk @ 2024-01-23 21:00 UTC (permalink / raw)
To: Sudhakar Vajha; +Cc: users
Hi Sudhakar,
2024-01-22 09:58 (UTC+0000), Sudhakar Vajha:
[...]
> During SBC 6400 initialization, requesting the system to create 64 huge
> pages of 1GB size.
Are you using <rte_memzone.h> API, possibly via <rte_mempool.h> functions?
What are the exact request parameters: size, alignment, flags?
> DPDK is leaving a space for one segment in the memory segment list and try
> to remap the huge pages into the rest of the segments in the memory segment
> list.
Can you please provide logs or debug prints
to confirm this is what's happening?
> Note: Why DPDK is leaving a space for a hole in a memory segment list?
>
> Basically, DPDK is leaving the space to know how many segments there are in
> order to map all pages into one address space, and leave appropriate holes
> between segments so that rte_malloc does not concatenate them into one big
> segment. But in this case, all the 64 pages are belongs to one address
> space and leaving space for a hole is not required.
Note: you seem to use the term "address space"
for what is called a DPDK allocator element (struct malloc_elem).
At the same time, rte_malloc doesn't have logic to leave holes,
as one element is allowed to contain memory from pages
that are not physically contiguous.
There is logic like this with the --legacy-mem EAL option
(page ranges that are not physically contiguous
are mapped to virtual addresses that are not contiguous as well),
but from your logs it seems that you are not using it.
> The following are my queries:
>
> 1. Why the space is leaving in the memseg list? And what is the
> significance of the hole?
If you are indeed using memzones, the following might happen.
A memzone is internally a regular element of the DPDK allocator.
It consists of the requested amount of usable memory, plus a header of 128B
(also a trailer when RTE_MALLOC_DEBUG is defined).
It is thus impossible to have all 64 GB of memory as usable via DPDK,
because some little part of it will be internally used for element headers.
Much more space may be lost due to alignment.
Maybe aligning multiple times when ASLR is enabled accumulates too much loss.
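The effect of the header and of alignment can be estimated with simple arithmetic. This is a sketch under stated assumptions: only the 128-byte header size comes from the text above. The placement rule (the header sits immediately before an aligned payload, and both must fit in one region) and the names `align_up` and `max_payload` are illustrative and do not reproduce the DPDK allocator.

```c
#include <assert.h>
#include <stdint.h>

#define ELEM_HEADER_LEN 128ULL /* header size mentioned above */

/* Round v up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t v, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (v + align - 1) & ~(align - 1);
}

/*
 * Largest payload that fits in the region [start, start + len) when a
 * header precedes the payload and the payload start must be aligned.
 * Returns 0 if nothing fits. Overflow is not handled in this sketch.
 */
static uint64_t max_payload(uint64_t start, uint64_t len, uint64_t align)
{
	uint64_t end = start + len;
	uint64_t data = align_up(start + ELEM_HEADER_LEN, align);

	return data < end ? end - data : 0;
}
```

For a 64 GiB region that starts on a 1 GiB boundary, `max_payload` returns 64 GiB minus 128 bytes with 64-byte alignment, but only 63 GiB with 1 GiB alignment: under these assumptions, a request for the full 64 GiB cannot be satisfied in either case.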
> 2. Can I scale up the size of the memory segment list to greater than the 64?
You have to rebuild DPDK with RTE_MAX_MEM_MB_PER_TYPE and
RTE_MAX_MEM_MB_PER_LIST changed as needed.