From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 491549ABF for ; Tue, 24 Feb 2015 14:24:43 +0100 (CET) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga101.fm.intel.com with ESMTP; 24 Feb 2015 05:24:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.09,638,1418112000"; d="scan'208";a="689998630" Received: from smonroyx-mobl.ger.corp.intel.com (HELO [10.237.221.21]) ([10.237.221.21]) by orsmga002.jf.intel.com with ESMTP; 24 Feb 2015 05:24:40 -0800 Message-ID: <54EC7B96.7070202@intel.com> Date: Tue, 24 Feb 2015 13:24:38 +0000 From: "Gonzalez Monroy, Sergio" User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: Neil Horman References: <54DCB3B6.1010204@redhat.com> <20150212155225.GB4634@neilslaptop.think-freely.org> <54DDCE68.7090400@redhat.com> <54DDDB12.3090100@intel.com> <20150213125142.GA11979@neilslaptop.think-freely.org> <54E74548.7010805@intel.com> <20150222233740.GB31293@neilslaptop.think-freely.org> <54EAFFFD.5000200@intel.com> <20150223135205.GA19230@hmsreliant.think-freely.org> <54EB4016.1040204@intel.com> <20150223182319.GC19230@hmsreliant.think-freely.org> In-Reply-To: <20150223182319.GC19230@hmsreliant.think-freely.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH 0/8] Improve build process X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 24 Feb 2015 13:24:44 -0000 On 23/02/2015 18:23, Neil Horman wrote: > On Mon, Feb 23, 2015 at 02:58:30PM +0000, Gonzalez Monroy, Sergio wrote: >> On 23/02/2015 13:52, Neil Horman wrote: >>> On Mon, Feb 23, 2015 at 10:25:01AM +0000, Gonzalez Monroy, Sergio wrote: >>>> On 22/02/2015 23:37, Neil Horman wrote: >>>>> On Fri, Feb 20, 2015 at 02:31:36PM +0000, Gonzalez Monroy, Sergio wrote: >>>>>> On 13/02/2015 12:51, Neil Horman wrote: >>>>>>> On Fri, Feb 13, 2015 at 11:08:02AM +0000, Gonzalez Monroy, Sergio wrote: >>>>>>>> On 13/02/2015 10:14, Panu Matilainen wrote: >>>>>>>>> On 02/12/2015 05:52 PM, Neil Horman wrote: >>>>>>>>>> On Thu, Feb 12, 2015 at 04:07:50PM +0200, Panu Matilainen wrote: >>>>>>>>>>> On 02/12/2015 02:23 PM, Neil Horman wrote: >>>>>>>>> [...snip...] >>>>>>>>>>>>>>> So I just realized that I was not having into account a possible >>>>>>>>>>>>>>> scenario, where >>>>>>>>>>>>>>> we have an app built with static dpdk libs then loading a dso >>>>>>>>>>>>>>> with -d >>>>>>>>>>>>>>> option. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In such case, because the pmd would have DT_NEEDED entries, >>>>>>>>>>>>>>> dlopen will >>>>>>>>>>>>>>> fail. >>>>>>>>>>>>>>> So to enable such scenario we would need to build PMDs without >>>>>>>>>>>>>>> DT_NEEDED >>>>>>>>>>>>>>> entries. >>>>>>>>>>>>>> Hmm, for that to be a problem you'd need to have the PMD built >>>>>>>>>>>>>> against >>>>>>>>>>>>>> shared dpdk libs and while the application is built against >>>>>>>>>>>>>> static dpdk >>>>>>>>>>>>>> libs. I dont think that's a supportable scenario in any case. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Or is there some other scenario that I'm not seeing? >>>>>>>>>>>>>> >>>>>>>>>>>>>> - Panu - >>>>>>>>>>>>>> >>>>>>>>>>>>> I agree with you. I suppose it comes down to, do we want to >>>>>>>>>>>>> support such >>>>>>>>>>>>> scenario? >>>>>>>>>>>>> >>>>>>>>>>>>> From what I can see, it seems that we do currently support such >>>>>>>>>>>>> scenario by >>>>>>>>>>>>> building dpdk apps against all static dpdk libs using >>>>>>>>>>>>> --whole-archive (all >>>>>>>>>>>>> libs and not only PMDs). >>>>>>>>>>>>> http://dpdk.org/browse/dpdk/commit/?id=20afd76a504155e947c770783ef5023e87136ad8 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Am I misunderstanding this? >>>>>>>>>>>>> >>>>>>>>>>>> Shoot, you're right, I missed the static build aspect to this. Yes, >>>>>>>>>>>> if we do the following: >>>>>>>>>>>> >>>>>>>>>>>> 1) Build the DPDK as a static library >>>>>>>>>>>> 2) Link an application against (1) >>>>>>>>>>>> 3) Use the dlopen mechanism to load a PMD built as a DSO >>>>>>>>>>>> >>>>>>>>>>>> Then the DT_NEEDED entries in the DSO will go unsatisfied, because >>>>>>>>>>>> the shared >>>>>>>>>>>> objects on which it (the PMD) depends will not exist in the file >>>>>>>>>>>> system. >>>>>>>>>>> I think its even more twisty: >>>>>>>>>>> >>>>>>>>>>> 1) Build the DPDK as a static library >>>>>>>>>>> 2) Link an application against (1) >>>>>>>>>>> 3) Do another build of DPDK as a shared library >>>>>>>>>>> 4) In app 2), use the dlopen mechanism to load a PMD built as a part >>>>>>>>>>> of or >>>>>>>>>>> against 3) >>>>>>>>>>> >>>>>>>>>>> Somehow I doubt this would work very well. >>>>>>>>>>> >>>>>>>>>> Ideally it should, presuming the ABI is preserved between (1) and (3), >>>>>>>>>> though I >>>>>>>>>> agree, up until recently, that was an assumption that was unreliable. >>>>>>>>> Versioning is a big and important step towards reliability but there are >>>>>>>>> more issues to solve. This of course getting pretty far from the original >>>>>>>>> topic, but at least one such issue is that there are some cases where a >>>>>>>>> config value affects what are apparently public structs (rte_mbuf wrt >>>>>>>>> RTE_MBUF_REFCNT for example), which really is a no-go. >>>>>>>>> >>>>>>>> Agree, the RTE_MBUF_REFCNT is something that needs to be dealt with asap. >>>>>>>> I'll look into it. >>>>>>>> >>>>>>>>>>>> I think the problem is a little bit orthogonal to the libdpdk_core >>>>>>>>>>>> problem you >>>>>>>>>>>> were initially addressing. That is to say, this problem of >>>>>>>>>>>> dlopen-ed PMD's >>>>>>>>>>>> exists regardless of weather you build the DPDK as part of a static >>>>>>>>>>>> or dynamic >>>>>>>>>>>> library. The problems just happen to intersect in their >>>>>>>>>>>> manipulation of the >>>>>>>>>>>> DT_NEEDED entries. >>>>>>>>>>>> >>>>>>>>>>>> Ok, so, given the above, I would say your approach is likely >>>>>>>>>>>> correct, just >>>>>>>>>>>> prevent DT_NEEDED entries from getting added to PMD's. Doing so will >>>>>>>>>>>> sidestep >>>>>>>>>>>> loading issue for libraries that may not exist in the filesystem, >>>>>>>>>>>> but thats ok, >>>>>>>>>>>> because by all rights, the symbols codified in those needed >>>>>>>>>>>> libraries should >>>>>>>>>>>> already be present in the running application (either made available >>>>>>>>>>>> by the >>>>>>>>>>>> application having statically linked them, or having the linker load >>>>>>>>>>>> them from >>>>>>>>>>>> the proper libraries at run time). >>>>>>>>>>> My 5c is that I'd much rather see the common case (all static or all >>>>>>>>>>> shared) >>>>>>>>>>> be simple and reliable, which in case of DSOs includes no lying >>>>>>>>>>> (whether by >>>>>>>>>>> omission or otherwise) about DT_NEEDED, ever. That way the issue is >>>>>>>>>>> dealt >>>>>>>>>>> once where it belongs. If somebody wants to go down the rabbit hole of >>>>>>>>>>> mixed >>>>>>>>>>> shared + static linkage, let them dig the hole by themselves :) >>>>>>>>>>> >>>>>>>>>> This is a fair point. Can DT_NEEDED sections be stripped via tools like >>>>>>>>>> objcopy >>>>>>>>>> after the build is complete? If so, end users can hack this corner case >>>>>>>>>> to work >>>>>>>>>> as needed. >>>>>>>>> Patchelf (http://nixos.org/patchelf.html) appears to support that, but >>>>>>>>> given that source is available it'd be easier to just modify the makefiles >>>>>>>>> if that's really needed. >>>>>>>>> >>>>>>>> I think we agree on the issue. >>>>>>>> >>>>>>>> So I'll be sending a patch to add DT_NEEDED entries to all libraries and >>>>>>>> PMDs. The only exception would be librte_eal, which would not have proper >>>>>>>> NEEDED entries. >>>>>>>> Do we bother adding a linker script for librte_eal that would include >>>>>>>> dependent libraries? >>>>>>>> >>>>>>> I say yes to the linker script, but will happily bow to an alternate consensus >>>>>>> Neil >>>>>>> >>>>>> So the case we want to solve is the following circular dependencies: >>>>>> eal -> mempool, malloc >>>>>> mempool -> eal , malloc, ring >>>>>> malloc -> eal >>>>>> ring -> eal, malloc >>>>>> >>>>>> We cannot write/create the proposed (below) linker script at least until we >>>>>> have built mempool and malloc. >>>>>> INPUT ( -lrte_eal.so -lrte_mempool -lrte_malloc ) >>>>>> >>>>> Not sure I understand why you have a build time dependency on this. Link time >>>>> perhaps, but not build time. Or am I reading too much into your use of the term >>>>> 'built' above? >>>> I meant 'built' as compiled + linked. Am I misusing the term? >>> No, you're not (though I misused the term link time above, I meant to say load >>> time). So you're saying that when you build shared libraries, you get linker >>> errors indicating that, during the build, you're missing symbols, is that >>> correct? I guess I'm confused because I don't see how thats not happening for >>> everyone, right now. In other words, I'm not sure what about your changes is >>> giving rise to that problem. >>> >>>>>> Few ways I have thought about implementing this (not particularly fond of >>>>>> any of them) : >>>>>> - Have the linker script file in the repo (scripts/ ?) in a fixed location >>>>>> and just copy it to $(RTE_OUTPUT)/lib/ once all libs have finished building. >>>>>> - Generate the file on build time from a defined make variable once all >>>>>> libs have finished >>>>>> >>>>> I'm still not sure I understand. Why does this dependency exist at build time? >>>>> The dependency between malloc and eal shouldn't be a problem during the build, >>>>> as symbols from each other should just remain undefined, and get resolved at >>>>> load time. >>>> Is that not the way it is currently implemented? >>>> I get the impression that we are talking about different goals (correct me >>>> if it is not the case) >>>> >>> We may well be, I'm not sure yet. >>> >>>> I thought that the agreed solution was to: >>>> 1) NOT to create/generate a 'core' library >>>> 2) Add DT_NEEDED entries for all libraries (except eal which is the first >>>> library we link) >>>> 3) Use linker script for eal >>>> >>> Ok, we're definately on the same page, as thats what I thought the goal was as >>> well. >>> >>>> Given the previously mentioned circular dependencies between eal, mempool, >>>> malloc and ring: >>>> - eal would not be linked against other libraries (no NEEDED entries) >>>> - malloc is linked against eal (previously built), so malloc would have a >>>> NEEDED entry for eal. >>>> >>>> In that scenario, if the linker script is setup/created after we build eal, >>>> then when we try to link malloc >>>> against eal, the linker will pull mempool and malloc too (because we >>>> included them in the linker script). >>>> Therefore, the link fails as none of those libraries (malloc and mempool) >>>> have been built yet. >>>> >>> Ah, I see now, I wasn't thinking about the extra requirements that DT_NEEDED >>> entries placed on the build conditions. >>> >>> I see now, apologies for being dense previously. Given what you indicate I >>> would say that the solution here is to link the libraries against individual >>> other specific libraries, not the core library that you generate as a linker >>> script. That way you avoid the circular dependency, and the core library just >>> becomes a convienience for application developers looking to link to a single >>> library. >>> >> I'm not sure I quite understand your suggestion. >> >> Could you roughly describe steps for building eal, malloc and mempool libs ? >> For example, something like this? >> 1) build eal, which creates librte_eal.so.1 >> 2) write linker script for librte_eal.so >> 3) build malloc against eal (-lrte_eal ) >> etc > Hm, so I spent a bit of time looking at this, and your right, I thought this was > really just a artifact of the introduction of --as-needed to the build to force > DT_NEEDED entries, and my suggestion was that you simply not link the libraries > that were causing the circular dependency. I had assumed that the link > directives included -lrte_malloc -lrte_mempool for the eal library, but they > weren't really needed, so you could remove them and it would work out. > > Unfortunately that turns out to not be the case. librte_eal does explicitly use > calls in librte_malloc, and vice versa. The current use of -no-as-needed in the > build system (which I was previously unaware of), is a hack to avoid having to > address that problem. > > That throws a monkey wrench into this plan. I would see 3 ways forward: > > 1) Fix the problem - That is to say, remove the use of --no-as-needed from the > build, and address the circular dependencies that arise. This could/will mean > actually merging libraries with circular dependencies into a single library, as > they should be so that they can completely resolve all of the symbols they use > at link time > > 2) Ignore the problem. If we just keep the lack of DT_NEEDED entries in place, > I think the problem goes away, and we can continue on. > > I think option 1 is likely the more correct approach, as removing DT_NEEDED to > avoid a circuar depenency is a hack, but it may not be the most pragmatic > approach. just living without DT_NEEDED entries and documenting link time needs > will certainly be faster, and might be the better course of action, especially > if we provide a 'core' pseudo library/linker script that embodies that action > for the end user. > > Neil > So basically for 1) the approach of creating a core library would be a solution. The last suggestion for the core library was to not merge the sources but generate a single library. The problem with that was the versioning wouldn't work as it currently is, given that is per library. So if we were to create a core library, we just need to merge the version.map files of each library into a single version.map for the core library. This approach, as you noted, would be the proper fix. The other solution would be to just leave eal without DT_NEEDED entries and specify in the docs that apps require eal, mempool, malloc and ring to be link such as: --no-as-needed -lrte_eal -lrte_mempool -lrte_malloc -lrte_ring --as-needed With those flags it should work regardless of --as-needed being set before hand. With this second approach we would have all libraries, but eal, with DT_NEEDED entries and we would not need to create a core library. We don't really need to create a linker script for that (not sure we can even create such linker script with those flags) and just documenting link time needs as you mentioned should be enough. So should I go forward with last suggested approach? Regards, Sergio >> I suppose that another way to go about this, instead of creating the linker >> script that pulls >> dependent libraries, is to always link (using --no-as-needed in case gcc >> adds it by default) >> against these libraries (eal, mempool, malloc, and ring) with necessary doc >> about how to build apps. >> >> Sergio >>> Neil >>> >>>> Was your suggestion to leave all of these libraries (eal, mempool, malloc, >>>> ring) without NEEDED entries? >>>> >>> No, you can add NEEDED entries there, they will just be for the individual >>> libraries, not the core linker script library. >>> >>> Best >>> Neil >>> >>>> Regards, >>>> Sergio >>>>> What is the error you are getting? >>>>> >>>>> Best >>>>> Neil >>>>> >>>>>> Thoughts? any other approached is more than welcome! >>>>>> >>>>>> Sergio >>>>>> >>>>>> PS: Thinking again on the core library and the issue of having multiple >>>>>> version.map files, we could have a core_version.map instead instead of >>>>>> multiple files per core library (eal, mempool, etc) >>>>>> >>>>>> >>>> >> >>