From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Wilder
To: stable@dpdk.org
Cc: pradeep@us.ibm.com, chaozhu@linux.vnet.ibm.com
Date: Fri, 9 Nov 2018 11:28:43 -0800
X-Mailer: git-send-email 2.19.1
MIME-Version: 1.0
Message-Id: <20181109192843.2718-1-dwilder@us.ibm.com>
Content-Type: text/plain
Content-Transfer-Encoding: 8bit
Subject: [dpdk-stable] [PATCH] mem: Add Power9 support in rte_eal_hugepage_init
List-Id: patches for DPDK stable branches

Determine if the ppc64 platform is Power9 or Power8 and perform huge page
mapping appropriately for the detected platform.

Signed-off-by: Pradeep Satyanarayana
Tested-by: David Wilder
---
On IBM Power8, when mmapping hugepage files, the address hint supplied to
mmap is not always honored; therefore we let the kernel pick the address by
specifying a NULL address hint. On Power9 the address hint is honored as
expected. This patch detects the platform; on Power9 the address hint is
supplied to mmap and the pages are sorted appropriately. Hugepage mapping
for both primary and secondary processes now works correctly on Power9.
I have retained the original behavior and limitations on Power8.
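
For reference, the platform check can be tried outside of EAL. The following
is a minimal sketch, assuming Linux with a libc that provides <sys/auxv.h>.
The helper names is_power8() and detect_power8() are hypothetical and are not
part of this patch, which sets the static p8 flag inline in
rte_eal_hugepage_init().

```c
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Hypothetical helper (not in the patch): return 1 when the
 * AT_BASE_PLATFORM string starts with "power8", 0 otherwise.
 * A NULL string is treated as "not Power8", which mirrors the
 * patch's fallback of assuming Power9.
 */
int
is_power8(const char *platform)
{
	return platform != NULL && strncmp(platform, "power8", 6) == 0;
}

/*
 * Hypothetical wrapper: query the auxiliary vector of the running
 * process. getauxval() returns 0 when the kernel did not supply
 * the requested entry, so the cast yields NULL in that case.
 */
int
detect_power8(void)
{
	const char *platform = (const char *)getauxval(AT_BASE_PLATFORM);

	return is_power8(platform);
}
```

On a host where AT_BASE_PLATFORM is not exported, detect_power8() returns 0,
so such a system follows the Power9 code path, as in the patch.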

Additionally, the flags supplied to mmap() have been corrected, eliminating
the "Cannot get a virtual area" messages previously seen during EAL init on
Power.

 lib/librte_eal/linuxapp/eal/eal_memory.c | 75 +++++++++++++++++-------
 1 file changed, 54 insertions(+), 21 deletions(-)

diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
index bac969a12..5b7001be8 100644
--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
@@ -50,6 +50,9 @@
 #include
 #include
 #include
+#ifdef RTE_ARCH_PPC_64
+#include
+#endif
 #include
 #include
 #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
@@ -107,6 +110,10 @@ static uint64_t baseaddr = 0x100000000;
 
 static bool phys_addrs_available = true;
 
+#ifdef RTE_ARCH_PPC_64
+static int p8;
+#endif
+
 #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
 
 static void
@@ -309,12 +316,7 @@ get_virtual_area(size_t *size, size_t hugepage_sz)
 		addr_hint = get_addr_hint();
 
 		addr = mmap(addr_hint,
-				(*size) + hugepage_sz, PROT_READ,
-#ifdef RTE_ARCH_PPC_64
-				MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
-#else
-				MAP_PRIVATE,
-#endif
+				(*size) + hugepage_sz, PROT_READ, MAP_PRIVATE,
 				fd, 0);
 		if (addr == MAP_FAILED) {
 			/* map failed. Let's try with less memory */
@@ -501,6 +503,15 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
 			 * vma_len. If it fails, vma_addr is NULL, so
 			 * let the kernel provide the address. */
 			vma_addr = get_virtual_area(&vma_len, hpi->hugepage_sz);
+#ifdef RTE_ARCH_PPC_64
+			/*
+			 * On power8 the address hint is not consistently
+			 * honored, therefore we always let the
+			 * kernel provide the address.
+			 */
+			if (p8)
+				vma_addr = NULL;
+#endif
 			if (vma_addr == NULL)
 				vma_len = hugepage_sz;
 		}
@@ -1059,6 +1070,23 @@ rte_eal_hugepage_init(void)
 	int nr_hugefiles, nr_hugepages = 0;
 	void *addr;
 
+#ifdef RTE_ARCH_PPC_64
+	char *platform;
+	platform = (char *)getauxval(AT_BASE_PLATFORM);
+
+	p8 = 0;
+
+	/* Alert the user in case our assumptions are incorrect */
+	if (platform == NULL)
+		printf("Some distros on P9 do not support "
+			"getauxval(AT_BASE_PLATFORM). Assuming P9\n");
+
+	if (platform && !strncmp(platform, "power8", 6)) {
+		RTE_LOG(DEBUG, EAL, "This must be a P8\n");
+		p8 = 1;
+	} else
+		RTE_LOG(DEBUG, EAL, "This must be a P9\n");
+#endif
 	test_phys_addrs_available();
 
 	memset(used_hp, 0, sizeof(used_hp));
@@ -1305,14 +1333,22 @@ rte_eal_hugepage_init(void)
 			new_memseg = 1;
 
 #ifdef RTE_ARCH_PPC_64
-		/* On PPC64 architecture, the mmap always start from higher
-		 * virtual address to lower address. Here, both the physical
-		 * address and virtual address are in descending order */
+		/*
+		 * On power8 we let the kernel select the virtual address
+		 * for mmapped segments; successive mmaps will start from
+		 * higher virtual address to lower address. Physical addresses
+		 * are in descending order for both platforms.
+		 */
 		else if ((hugepage[i-1].physaddr - hugepage[i].physaddr) !=
 		    hugepage[i].size)
 			new_memseg = 1;
-		else if (((unsigned long)hugepage[i-1].final_va -
-		    (unsigned long)hugepage[i].final_va) != hugepage[i].size)
+		else if ((((unsigned long)hugepage[i-1].final_va -
+		    (unsigned long)hugepage[i].final_va) !=
+		    hugepage[i].size) && (p8))
+			new_memseg = 1;
+		else if ((((unsigned long)hugepage[i].final_va -
+		    (unsigned long)hugepage[i-1].final_va) !=
+		    hugepage[i].size) && (!p8))
 			new_memseg = 1;
 #else
 		else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
@@ -1338,9 +1374,12 @@ rte_eal_hugepage_init(void)
 		else {
 #ifdef RTE_ARCH_PPC_64
 		/* Use the phy and virt address of the last page as segment
-		 * address for IBM Power architecture */
-			mcfg->memseg[j].iova = hugepage[i].physaddr;
-			mcfg->memseg[j].addr = hugepage[i].final_va;
+		 * address for IBM Power8 architecture.
+		 */
+			if (p8) {
+				mcfg->memseg[j].iova = hugepage[i].physaddr;
+				mcfg->memseg[j].addr = hugepage[i].final_va;
+			}
 #endif
 			mcfg->memseg[j].len += mcfg->memseg[j].hugepage_sz;
 		}
@@ -1437,13 +1476,7 @@ rte_eal_hugepage_attach(void)
 		 * use mmap to get identical addresses as the primary process.
 		 */
 		base_addr = mmap(mcfg->memseg[s].addr, mcfg->memseg[s].len,
-				 PROT_READ,
-#ifdef RTE_ARCH_PPC_64
-				 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
-#else
-				 MAP_PRIVATE,
-#endif
-				 fd_zero, 0);
+				 PROT_READ, MAP_PRIVATE, fd_zero, 0);
 		if (base_addr == MAP_FAILED ||
 		    base_addr != mcfg->memseg[s].addr) {
 			max_seg = s;
-- 
2.19.1