From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by dpdk.org (Postfix) with ESMTP id 0D4F691 for ; Sat, 10 Nov 2018 03:57:03 +0100 (CET)
Received: from pps.filterd (m0098413.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id wAA2s5jw143836 for ; Fri, 9 Nov 2018 21:57:03 -0500
Received: from e36.co.us.ibm.com (e36.co.us.ibm.com [32.97.110.154]) by mx0b-001b2d01.pphosted.com with ESMTP id 2nnhbwuf2u-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 09 Nov 2018 21:57:03 -0500
Received: from localhost by e36.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Sat, 10 Nov 2018 02:57:02 -0000
Received: from b03cxnp08026.gho.boulder.ibm.com (9.17.130.18) by e36.co.us.ibm.com (192.168.1.136) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Sat, 10 Nov 2018 02:57:00 -0000
Received: from b03ledav004.gho.boulder.ibm.com (b03ledav004.gho.boulder.ibm.com [9.17.130.235]) by b03cxnp08026.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id wAA2uxZu25624796 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Sat, 10 Nov 2018 02:56:59 GMT
Received: from b03ledav004.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9610178064; Sat, 10 Nov 2018 02:56:59 +0000 (GMT)
Received: from b03ledav004.gho.boulder.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 690D47805C; Sat, 10 Nov 2018 02:56:59 +0000 (GMT)
Received: from ltc.linux.ibm.com (unknown [9.16.170.189]) by b03ledav004.gho.boulder.ibm.com (Postfix) with ESMTP; Sat, 10 Nov 2018 02:56:59 +0000 (GMT)
MIME-Version: 1.0
Date: Fri, 09 Nov 2018 18:59:45 -0800
From: dwilder
To: David Wilder
Cc: stable@dpdk.org, pradeep@us.ibm.com, chaozhu@linux.vnet.ibm.com
In-Reply-To:
 <20181109192843.2718-1-dwilder@us.ibm.com>
References: <20181109192843.2718-1-dwilder@us.ibm.com>
X-Sender: dwilder@us.ibm.com
User-Agent: Roundcube Webmail/1.0.1
X-TM-AS-GCONF: 00
x-cbid: 18111002-0020-0000-0000-00000E87E182
X-IBM-SpamModules-Scores:
X-IBM-SpamModules-Versions: BY=3.00010018; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000270; SDB=6.01115204; UDB=6.00578244; IPR=6.00895303; MB=3.00024095; MTD=3.00000008; XFM=3.00000015; UTC=2018-11-10 02:57:01
X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused
x-cbparentid: 18111002-0021-0000-0000-000063AAC941
Message-Id: <683727cd7768be7498206c672f603d67@linux.vnet.ibm.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed
Content-Transfer-Encoding: 8bit
X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-11-09_09:, , signatures=0
X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=0 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1807170000 definitions=main-1811100023
Subject: Re: [dpdk-stable] [PATCH] mem: Add Power9 support in rte_eal_hugepage_init
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches for DPDK stable branches
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 10 Nov 2018 02:57:04 -0000

On 2018-11-09 11:28, David Wilder wrote:
> Determine if the ppc64 platform is Power9 or Power8 and perform huge
> page mapping appropriately for the selected platform.
>
> Signed-off-by: Pradeep Satyanarayana
> Tested-by: David Wilder
> ---
> On IBM Power8, when mmapping hugepage files the address hint supplied
> to mmap is not always honored, therefore we let the kernel pick the
> address by specifying a NULL address hint.
> On Power9 the address hint is honored as expected. This patch detects
> the platform; on Power9 the address hint is supplied to mmap and the
> pages are sorted appropriately. Hugepage mapping for both primary and
> secondary processes now works correctly on Power9. I have retained the
> original behavior and limitations on Power8. Additionally, the flags
> supplied to mmap() have been corrected, eliminating the "Cannot get a
> virtual area" messages previously seen during EAL init on Power.
>
>  lib/librte_eal/linuxapp/eal/eal_memory.c | 75 +++++++++++++++++-------
>  1 file changed, 54 insertions(+), 21 deletions(-)
>
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index bac969a12..5b7001be8 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -50,6 +50,9 @@
>  #include
>  #include
>  #include
> +#ifdef RTE_ARCH_PPC_64
> +#include
> +#endif
>  #include
>  #include
>  #ifdef RTE_EAL_NUMA_AWARE_HUGEPAGES
> @@ -107,6 +110,10 @@ static uint64_t baseaddr = 0x100000000;
>
>  static bool phys_addrs_available = true;
>
> +#ifdef RTE_ARCH_PPC_64
> +static int p8;
> +#endif
> +
>  #define RANDOMIZE_VA_SPACE_FILE "/proc/sys/kernel/randomize_va_space"
>
>  static void
> @@ -309,12 +316,7 @@ get_virtual_area(size_t *size, size_t hugepage_sz)
>  		addr_hint = get_addr_hint();
>
>  		addr = mmap(addr_hint,
> -				(*size) + hugepage_sz, PROT_READ,
> -#ifdef RTE_ARCH_PPC_64
> -				MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
> -#else
> -				MAP_PRIVATE,
> -#endif
> +				(*size) + hugepage_sz, PROT_READ, MAP_PRIVATE,
>  				fd, 0);
>  		if (addr == MAP_FAILED) {
>  			/* map failed. Let's try with less memory */
> @@ -501,6 +503,15 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl, struct hugepage_info *hpi,
>  			 * vma_len. If it fails, vma_addr is NULL, so
>  			 * let the kernel provide the address.
>  			 */
>  			vma_addr = get_virtual_area(&vma_len, hpi->hugepage_sz);
> +#ifdef RTE_ARCH_PPC_64
> +			/*
> +			 * On power8 the address hint is not consistently
> +			 * honored, therefore we always let the
> +			 * kernel provide the address.
> +			 */
> +			if (p8)
> +				vma_addr = NULL;
> +#endif
>  			if (vma_addr == NULL)
>  				vma_len = hugepage_sz;
>  		}
> @@ -1059,6 +1070,23 @@ rte_eal_hugepage_init(void)
>  	int nr_hugefiles, nr_hugepages = 0;
>  	void *addr;
>
> +#ifdef RTE_ARCH_PPC_64
> +	char *platform;
> +	platform = (char *)getauxval(AT_BASE_PLATFORM);
> +
> +	p8 = 0;
> +
> +	/* Alert the user in case our assumptions are incorrect */
> +	if (platform == NULL)
> +		printf("Some distros on P9 do not support "
> +		       "getauxval(AT_BASE_PLATFORM). Assuming P9\n");
> +
> +	if (platform && !strncmp(platform, "power8", 6)) {
> +		RTE_LOG(DEBUG, EAL, "This must be a P8\n");
> +		p8 = 1;
> +	} else
> +		RTE_LOG(DEBUG, EAL, "This must be a P9\n");
> +#endif
>  	test_phys_addrs_available();
>
>  	memset(used_hp, 0, sizeof(used_hp));
> @@ -1305,14 +1333,22 @@ rte_eal_hugepage_init(void)
>  			new_memseg = 1;
>
>  #ifdef RTE_ARCH_PPC_64
> -		/* On PPC64 architecture, the mmap always start from higher
> -		 * virtual address to lower address. Here, both the physical
> -		 * address and virtual address are in descending order */
> +		/*
> +		 * On power8 we let the kernel select the virtual address
> +		 * for mmapped segments; successive mmaps will start from
> +		 * higher virtual address to lower address. Physical addresses
> +		 * are in descending order for both platforms.
> +		 */
>  		else if ((hugepage[i-1].physaddr - hugepage[i].physaddr) !=
>  		    hugepage[i].size)
>  			new_memseg = 1;
> -		else if (((unsigned long)hugepage[i-1].final_va -
> -		    (unsigned long)hugepage[i].final_va) != hugepage[i].size)
> +		else if ((((unsigned long)hugepage[i-1].final_va -
> +		    (unsigned long)hugepage[i].final_va) !=
> +		    hugepage[i].size) && (p8))
> +			new_memseg = 1;
> +		else if ((((unsigned long)hugepage[i].final_va -
> +		    (unsigned long)hugepage[i-1].final_va) !=
> +		    hugepage[i].size) && (!p8))
>  			new_memseg = 1;
>  #else
>  		else if ((hugepage[i].physaddr - hugepage[i-1].physaddr) !=
> @@ -1338,9 +1374,12 @@ rte_eal_hugepage_init(void)
>  		else {
>  #ifdef RTE_ARCH_PPC_64
>  		/* Use the phy and virt address of the last page as segment
> -		 * address for IBM Power architecture */
> -			mcfg->memseg[j].iova = hugepage[i].physaddr;
> -			mcfg->memseg[j].addr = hugepage[i].final_va;
> +		 * address for IBM Power8 architecture.
> +		 */
> +			if (p8) {
> +				mcfg->memseg[j].iova = hugepage[i].physaddr;
> +				mcfg->memseg[j].addr = hugepage[i].final_va;
> +			}
>  #endif
>  			mcfg->memseg[j].len += mcfg->memseg[j].hugepage_sz;
>  		}
> @@ -1437,13 +1476,7 @@ rte_eal_hugepage_attach(void)
>  		 * use mmap to get identical addresses as the primary process.
>  		 */
>  		base_addr = mmap(mcfg->memseg[s].addr, mcfg->memseg[s].len,
> -				 PROT_READ,
> -#ifdef RTE_ARCH_PPC_64
> -				 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
> -#else
> -				 MAP_PRIVATE,
> -#endif
> -				 fd_zero, 0);
> +				 PROT_READ, MAP_PRIVATE, fd_zero, 0);
>  		if (base_addr == MAP_FAILED ||
>  		    base_addr != mcfg->memseg[s].addr) {
>  			max_seg = s;

Sorry, this breaks Chao's workaround that makes memory initialization
for the secondary process work on Power8 (setting nr_hugepages and
nr_overcommit_hugepages). I need to make the mmap flags change
conditional on Power8/Power9. I am working on a v2 patch.
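
For reference, a rough sketch of what "conditional on Power8/Power9"
could look like. This is only an illustration, not the v2 patch: the
helper names platform_is_power8(), reserve_flags() and
reserve_flags_for_this_host() are hypothetical, and keeping
MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB on Power8 is an assumption
based on the flags the v1 diff removed.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <string.h>
#include <sys/auxv.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: true if the AT_BASE_PLATFORM string names Power8.
 * A NULL string (getauxval() gave no answer) is treated as "not Power8",
 * mirroring the "assume P9" fallback in the v1 patch.
 */
static bool
platform_is_power8(const char *platform)
{
	return platform != NULL && strncmp(platform, "power8", 6) == 0;
}

/*
 * Hypothetical helper: pick the mmap flags used when reserving virtual
 * address space. Assumption: Power8 keeps the flags that v1 removed,
 * everything else uses plain MAP_PRIVATE.
 */
static int
reserve_flags(bool is_p8)
{
	if (is_p8)
		return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
	return MAP_PRIVATE;
}

/* Query the running system and return the flags to pass to mmap(). */
static int
reserve_flags_for_this_host(void)
{
	const char *platform = (const char *)getauxval(AT_BASE_PLATFORM);

	return reserve_flags(platform_is_power8(platform));
}
```

With something like this, both get_virtual_area() and
rte_eal_hugepage_attach() could pass the same computed flags to mmap()
instead of each carrying its own #ifdef block.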