Date: Tue, 22 Mar 2016 17:10:46 +0000
From: Bruce Richardson
To: Sergio Gonzalez Monroy
Cc: Gowrishankar, dev@dpdk.org, chaozhu@linux.vnet.ibm.com, David Marchand
Message-ID: <20160322171046.GG20448@bricha3-MOBL3>
References: <1457360003-30055-1-git-send-email-gowrishankar.m@linux.vnet.ibm.com>
 <56F17454.3010907@intel.com>
In-Reply-To: <56F17454.3010907@intel.com>
Subject: Re: [dpdk-dev] [PATCH] eal/ppc: fix secondary process to map hugepages in correct order

On Tue, Mar 22, 2016 at 04:35:32PM +0000, Sergio Gonzalez Monroy wrote:
> First of all, forgive my ignorance regarding ppc64, and forgive me if the
> questions are naive, but after having a look at the already existing code
> for ppc64 and now at this patch, why are we doing this reverse mapping at
> all?
>
> I guess the question revolves around the comment in eal_memory.c:
>  1316         /* On PPC64 architecture, the mmap always start from higher
>  1317          * virtual address to lower address. Here, both the physical
>  1318          * address and virtual address are in descending order */
>
> From looking at the code, for ppc64 we do qsort in reverse order and
> thereafter everything looks to be done to account for that reverse sorting.
>
> CC: Chao Zhu and David Marchand as the original author and reviewer of the
> code.
>
> Sergio
>
Just to add my 2c here. At one point, with (I believe) some i686 installs -
I don't remember the specific OS/kernel - we found that the mmap calls were
returning the highest free address first and then working downwards, much
like what seems to be described here.

To fix this, we changed the mmap code from assuming that addresses are mapped
upwards to instead explicitly requesting a large free block of memory (an
mmap of /dev/zero) to find a free address-space range of the correct size,
and then explicitly mmapping each individual page to the appropriate place
in that free range. With this scheme it didn't matter whether the OS tried
to mmap the pages from the highest or the lowest address, because we always
told the OS where to put each page (and we knew the slot was free from the
earlier block mmap).

Would this scheme not also work for PPC in a similar way? (Again, forgive my
unfamiliarity with PPC! :-) )

/Bruce
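For illustration only, here is a minimal standalone sketch of that
"reserve a block, then place each page into it" scheme. This is not the
actual EAL code: the hugepage file path, page size and page count below
are made-up values, and error handling is reduced to bare exits.

/*
 * Sketch of the reserve-then-place mapping scheme (illustrative only).
 * The file path, page size and page count are hypothetical.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define NUM_PAGES 4                    /* hypothetical page count */
#define PAGE_SZ   (2UL * 1024 * 1024)  /* hypothetical 2MB hugepages */

int main(void)
{
	size_t total = NUM_PAGES * PAGE_SZ;
	unsigned int i;

	/* Step 1: ask the kernel for one large free range; we only want
	 * its start address, not the mapping itself. */
	int fd_zero = open("/dev/zero", O_RDONLY);
	if (fd_zero < 0)
		return EXIT_FAILURE;
	void *base = mmap(NULL, total, PROT_READ, MAP_PRIVATE, fd_zero, 0);
	if (base == MAP_FAILED)
		return EXIT_FAILURE;
	/* Release the placeholder so the range is free again. */
	munmap(base, total);

	/* Step 2: map each hugepage file at its own slot in that range,
	 * so the order in which the kernel would otherwise hand out
	 * addresses no longer matters. */
	for (i = 0; i < NUM_PAGES; i++) {
		char path[64];
		void *want = (char *)base + i * PAGE_SZ;

		snprintf(path, sizeof(path), "/mnt/huge/rtemap_%u", i);
		int fd = open(path, O_RDWR);
		if (fd < 0)
			return EXIT_FAILURE;
		void *addr = mmap(want, PAGE_SZ, PROT_READ | PROT_WRITE,
				MAP_SHARED, fd, 0);
		close(fd);
		/* The hint must have been honoured, or we bail out. */
		if (addr == MAP_FAILED || addr != want)
			return EXIT_FAILURE;
	}
	close(fd_zero);
	printf("mapped %d pages contiguously at %p\n", NUM_PAGES, base);
	return EXIT_SUCCESS;
}

The point is that the address passed to the per-page mmap() calls is chosen
by the process itself, so whether the kernel allocates addresses top-down or
bottom-up becomes irrelevant.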
> 
> On 07/03/2016 14:13, Gowrishankar wrote:
> >From: Gowri Shankar
> >
> >For a secondary process address space to map hugepages from every segment
> >of the primary process, the hugepage_file entries have to be mapped in
> >reverse order from the list that the primary process updated for every
> >segment. This is because, in ppc64, hugepages are sorted in decreasing
> >address order.
> >
> >Signed-off-by: Gowrishankar
> >---
> > lib/librte_eal/linuxapp/eal/eal_memory.c | 26 ++++++++++++++++----------
> > 1 file changed, 16 insertions(+), 10 deletions(-)
> >
> >diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >index 5b9132c..6aea5d0 100644
> >--- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> >+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> >@@ -1400,7 +1400,7 @@ rte_eal_hugepage_attach(void)
> > {
> > 	const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
> > 	const struct hugepage_file *hp = NULL;
> >-	unsigned num_hp = 0;
> >+	unsigned num_hp = 0, mapped_hp = 0;
> > 	unsigned i, s = 0; /* s used to track the segment number */
> > 	off_t size;
> > 	int fd, fd_zero = -1, fd_hugepage = -1;
> >@@ -1486,14 +1486,12 @@ rte_eal_hugepage_attach(void)
> > 		goto error;
> > 	}
> > 
> >-	num_hp = size / sizeof(struct hugepage_file);
> >-	RTE_LOG(DEBUG, EAL, "Analysing %u files\n", num_hp);
> >-
> > 	s = 0;
> > 	while (s < RTE_MAX_MEMSEG && mcfg->memseg[s].len > 0){
> > 		void *addr, *base_addr;
> > 		uintptr_t offset = 0;
> > 		size_t mapping_size;
> >+		unsigned int index;
> > #ifdef RTE_LIBRTE_IVSHMEM
> > 		/*
> > 		 * if segment has ioremap address set, it's an IVSHMEM segment and
> >@@ -1504,6 +1502,8 @@ rte_eal_hugepage_attach(void)
> > 			continue;
> > 		}
> > #endif
> >+		num_hp = mcfg->memseg[s].len / mcfg->memseg[s].hugepage_sz;
> >+		RTE_LOG(DEBUG, EAL, "Analysing %u files in segment %u\n", num_hp, s);
> > 		/*
> > 		 * free previously mapped memory so we can map the
> > 		 * hugepages into the space
> >@@ -1514,18 +1514,23 @@ rte_eal_hugepage_attach(void)
> > 		/* find the hugepages for this segment and map them
> > 		 * we don't need to worry about order, as the server sorted the
> > 		 * entries before it did the second mmap of them */
> >+#ifdef RTE_ARCH_PPC_64
> >+		for (i = num_hp-1; i < num_hp && offset < mcfg->memseg[s].len; i--){
> >+#else
> > 		for (i = 0; i < num_hp && offset < mcfg->memseg[s].len; i++){
> >-			if (hp[i].memseg_id == (int)s){
> >-				fd = open(hp[i].filepath, O_RDWR);
> >+#endif
> >+			index = i + mapped_hp;
> >+			if (hp[index].memseg_id == (int)s){
> >+				fd = open(hp[index].filepath, O_RDWR);
> > 				if (fd < 0) {
> > 					RTE_LOG(ERR, EAL, "Could not open %s\n",
> >-							hp[i].filepath);
> >+							hp[index].filepath);
> > 					goto error;
> > 				}
> > #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
> >-				mapping_size = hp[i].size * hp[i].repeated;
> >+				mapping_size = hp[index].size * hp[index].repeated;
> > #else
> >-				mapping_size = hp[i].size;
> >+				mapping_size = hp[index].size;
> > #endif
> > 				addr = mmap(RTE_PTR_ADD(base_addr, offset),
> > 						mapping_size, PROT_READ | PROT_WRITE,
> >@@ -1534,7 +1539,7 @@ rte_eal_hugepage_attach(void)
> > 				if (addr == MAP_FAILED ||
> > 						addr != RTE_PTR_ADD(base_addr, offset)) {
> > 					RTE_LOG(ERR, EAL, "Could not mmap %s\n",
> >-							hp[i].filepath);
> >+							hp[index].filepath);
> > 					goto error;
> > 				}
> > 				offset+=mapping_size;
> >@@ -1543,6 +1548,7 @@ rte_eal_hugepage_attach(void)
> > 		RTE_LOG(DEBUG, EAL, "Mapped segment %u of size 0x%llx\n", s,
> > 				(unsigned long long)mcfg->memseg[s].len);
> > 		s++;
> >+		mapped_hp += num_hp;
> > 	}
> > 	/* unmap the hugepage config file, since we are done using it */
> > 	munmap((void *)(uintptr_t)hp, size);
>
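One side note on the PPC64 branch added above: since i is unsigned, the
reverse loop "for (i = num_hp-1; i < num_hp; i--)" terminates through
unsigned wrap-around rather than an "i >= 0" test. A tiny standalone
illustration of the idiom (the count here is arbitrary):

#include <stdio.h>

int main(void)
{
	unsigned int num_hp = 4;	/* arbitrary count, for illustration */
	unsigned int i;

	/* When i is decremented past 0 it wraps to UINT_MAX, so the
	 * "i < num_hp" test is what ends the loop. */
	for (i = num_hp - 1; i < num_hp; i--)
		printf("visiting hugepage entry %u\n", i);

	return 0;
}

The same condition also keeps the loop from running at all when num_hp is 0,
because num_hp - 1 wraps and "i < num_hp" is false on the first check.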