From: "Burakov, Anatoly"
To: Thomas Monjalon
CC: David Marchand, "dev@dpdk.org", "Gonzalez Monroy, Sergio", "Yigit, Ferruh", "Traynor, Kevin", "pmatilai@redhat.com"
Date: Fri, 10 Jun 2016 09:47:07 +0000
Subject: Re: [dpdk-dev] [PATCH] dropping librte_ivshmem - was log: deprecate history dump

> > Hi Thomas,
> >
> > Just a few notes:
> >
> > > 3/ The automatic mapped allocation of DPDK objects in the guest.
> > > It should not be done in EAL.
> > > An ivshmem driver would be called by rte_eal_dev_init.
> > > It would check where the shared DPDK structures are, as currently
> > > done with the IVSHMEM_MAGIC (0x0BADC0DE), and do the appropriate
> > > allocations. Thus only the driver would depend on ring and mempool.
> >
> > The problem here is that IVSHMEM doesn't allocate the memory from DPDK; it
> > allocates new memory segments by mapping a PCI device. That is, it doesn't do
> > mallocs, it modifies mem_config and adds memory to DPDK. Can that be
> > done from within a PMD?
>
> Everything is possible :)
> Maybe you just need to add an API to add some memory segments.
> Other question: why is it so important to register these memory segments in
> EAL? I think they just need to be known by the ivshmem driver which maps
> some objects on top.

That's because we need the memzone_lookup functionality. We can get by without it for rings because those are tailq-based, so we can just put rings there, but memzones are looked up through the memconfig, so IVSHMEM memzones have to be present there in order for the code to work without any additional APIs.

Although, I guess we don't really need to have _memsegs_ in order to look up memzones - we just have to create some memzones directly inside mcfg, bypassing the normal memzone_reserve stuff. That would still be a hack, but probably much less of a hack than what there is right now :)

Another possible issue here is the order in which the memory is allocated. We put IVSHMEM init in EAL because we have to map things at specific addresses. The later IVSHMEM initializes, the greater the chance that something else will have taken up the memory space IVSHMEM needs. This could probably be solved with --base-virtaddr, so the documentation will have to be updated to advise using that flag.

>
> > > The last step of the ivshmem cleanup will be to remove the memory
> > > hack RTE_EAL_SINGLE_FILE_SEGMENTS. Then CONFIG_RTE_LIBRTE_IVSHMEM
> > > could be removed.
> >
> > The reason for that hack is that we often need to map several hugepages,
> > and some of those pages could be 2M in size. If you're sharing 1G worth of
> > contiguous memory backed by 2M pages, that's 512 files on the command line
> > in vanilla DPDK, but it can be made into one file with
> > RTE_EAL_SINGLE_FILE_SEGMENTS, so that the QEMU command line doesn't get
> > overly long.
> >
> > So removing this hack, while definitely desirable, will adversely affect
> > some use cases, such as using IVSHMEM on platforms where 1G pages
> > aren't supported. Whether we want to go to the effort of supporting
> > those is of course an open question - I personally don't have any data
> > on the IVSHMEM userbase. Maybe Kevin/other OVS devs could help me out here
> > :)
>
> We can keep supporting 2M pages by having a command line option, instead
> of the #ifdef RTE_EAL_SINGLE_FILE_SEGMENTS.
> But as I said, it is not the top priority to remove this hack.

Ah, so you're not suggesting removing the _functionality_, just the #ifdef? That could be made to work, I guess...

Also, please correct me if I'm wrong, but I seem to remember some patches about putting all memory in a single file - I think that should work for IVSHMEM as well, because I believe IVSHMEM handles holes in files just fine, and can map objects even if everything resides inside a single file. So if that patch does what I think it does, we might just integrate it and remove the single-file-segments code entirely.

Thanks,
Anatoly