From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
    by inbox.dpdk.org (Postfix) with ESMTP id 63830A0487
    for ; Fri, 5 Jul 2019 18:53:52 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
    by dpdk.org (Postfix) with ESMTP id 2CE721BE07;
    Fri, 5 Jul 2019 18:53:52 +0200 (CEST)
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
    by dpdk.org (Postfix) with ESMTP id 33CD61BDFC
    for ; Fri, 5 Jul 2019 18:53:50 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
    by orsmga102.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
    05 Jul 2019 09:53:49 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.63,455,1557212400";
    d="scan'208";a="172717278"
Received: from vmedvedk-mobl.ger.corp.intel.com (HELO [10.237.220.98]) ([10.237.220.98])
    by FMSMGA003.fm.intel.com with ESMTP; 05 Jul 2019 09:53:47 -0700
To: Alex Kiselev
Cc: Stephen Hemminger, Honnappa Nagarahalli,
    "Ruifeng Wang (Arm Technology China)",
    "bruce.richardson@intel.com", "dev@dpdk.org",
    "Gavin Hu (Arm Technology China)", nd
References: <20190627093751.7746-1-ruifeng.wang@arm.com>
    <20190627082451.56719392@hermes.lan>
    <20190627213450.30082af6@hermes.lan>
    <185e012d-6f8a-66be-dc8c-a420065660fb@intel.com>
    <20190628083507.31eca1db@hermes.lan>
    <6a78a166-f19c-5444-0a1a-a74aa06463b1@intel.com>
From: "Medvedkin, Vladimir"
Message-ID: <344e9588-9dea-df1c-4ebf-583cc6830bec@intel.com>
Date: Fri, 5 Jul 2019 17:53:46 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: not inline unnecessary functions
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Alex,

On 05/07/2019 14:37, Alex Kiselev wrote:
> On Fri, 5 Jul 2019 at 13:31, Medvedkin, Vladimir wrote:
>> Hi Stephen,
>>
>> On 28/06/2019 16:35, Stephen Hemminger wrote:
>>> On Fri, 28 Jun 2019 15:16:30 +0100
>>> "Medvedkin, Vladimir" wrote:
>>>
>>>> Hi Honnappa,
>>>>
>>>> On 28/06/2019 14:57, Honnappa Nagarahalli wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> On 28/06/2019 05:34, Stephen Hemminger wrote:
>>>>>>> On Fri, 28 Jun 2019 02:44:54 +0000
>>>>>>> "Ruifeng Wang (Arm Technology China)" wrote:
>>>>>>>
>>>>>>>>>> Tests showed that the function inlining caused a performance drop on
>>>>>>>>>> some x86 platforms with the memory ordering patches applied.
>>>>>>>>>> By forcing the functions to be no-inline, the performance was better
>>>>>>>>>> than before on x86, with no impact on arm64 platforms.
>>>>>>>>>>
>>>>>>>>>> Suggested-by: Medvedkin Vladimir
>>>>>>>>>> Signed-off-by: Ruifeng Wang
>>>>>>>>>> Reviewed-by: Gavin Hu
>>>>>>>>> {
>>>>>>>>>
>>>>>>>>> Do you actually need to force noinline or is just taking off inline enough?
>>>>>>>>> In general, letting the compiler decide is often best practice.
>>>>>>>> The forced noinline is an optimization for x86 platforms to keep
>>>>>>>> rte_lpm_add() API performance with memory ordering applied.
>>>>>>> I don't think you answered my question. What does a recent version of
>>>>>>> GCC do if you drop the inline?
>>>>>>>
>>>>>>> Actually all the functions in rte_lpm should drop inline.
>>>>>> I agree with Stephen. If it is not a fastpath and the size of the function
>>>>>> is not minimal, it is good to remove the inline qualifier for other control
>>>>>> plane functions such as rule_add/delete/find/etc and let the compiler
>>>>>> decide whether to inline them (unless it affects performance).
>>>>> IMO, the rule needs to be simple. If it is a control plane function, we should leave it to the compiler to decide.
>>>>> I do not think we need to worry too much about performance for control plane functions.
>>>> Control plane speed is not as important as data plane speed, but it is
>>>> still important. For lpm we are talking not about initialization, but about
>>>> runtime route add/del related functions. If these are very slow the library
>>>> will be totally unusable, because after it receives a route update it will
>>>> be blocked for a long time and the route update queue will overflow.
>>> Control plane performance is more impacted by algorithmic choice.
>>> The original LPM had terrible (n^2?) control path. Current code is better.
>>> I had a patch using RB tree, but it was rejected because it used the
>>> /usr/include/bsd/sys/tree.h which added a dependency.
>> You're absolutely right, control plane performance mostly depends on the
>> algorithm. The current LPM implementation has a number of problems there.
>> One problem is rules_tbl[], which is a flat array containing routes for
>> control plane purposes. Replacing it with an rb-tree solves this problem,
>> but there are other problems. For example, when you try to add a route
>> 10.0.0.0/8 while a number of subroutes exist in the table (for
>> example 10.20.0.0/16), the current implementation will load tbl_entry -> do
>> some checks (depth, ext entry) -> conditionally store new entry. Under
>> several circumstances it would take a lot of time. But in fact it needs to
>> unconditionally rewrite only two ranges - from 10.0.0.0 to 10.19.255.255
>> and from 10.21.0.0 to 10.255.255.255. And the control plane could help us
>> to get these two ranges. The best structure for this is an lc-tree because
>> it is relatively easy to traverse the subtree (described by 10.0.0.0/8) and
>> get the subroutes. We are working on a new implementation, hopefully it
>> will be ready soon.
> Have you considered switching to this algorithm?
> http://www.nxlab.fer.hr/dxr/

I considered DXR, and others as well, for example poptrie. There are a number of pros and cons compared to DIR24-8.
In my opinion it would be great to provide an option to choose an algo for your routing table.

>

--
Regards,
Vladimir
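
For illustration only, the two ranges mentioned above (10.0.0.0/8 with an existing 10.20.0.0/16) can be computed as in the minimal sketch below. This is not the rte_lpm code and not the new implementation referred to in the thread. The names `struct ip_range`, `IPV4`, `prefix_first`, `prefix_last` and `uncovered_ranges` are hypothetical, addresses are assumed to be in host byte order, and only a single more-specific prefix is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a host-byte-order IPv4 address. */
#define IPV4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Inclusive IPv4 address range, host byte order. */
struct ip_range {
	uint32_t first;
	uint32_t last;
};

/* First address covered by ip/depth, with depth in 0..32. */
static uint32_t prefix_first(uint32_t ip, uint8_t depth)
{
	uint32_t mask = depth == 0 ? 0 : UINT32_MAX << (32 - depth);

	return ip & mask;
}

/* Last address covered by ip/depth, with depth in 0..32. */
static uint32_t prefix_last(uint32_t ip, uint8_t depth)
{
	uint32_t mask = depth == 0 ? 0 : UINT32_MAX << (32 - depth);

	return ip | ~mask;
}

/*
 * Write to out[] the ranges inside parent_ip/parent_depth that are NOT
 * covered by the more-specific prefix child_ip/child_depth.
 * Returns the number of ranges written (0, 1 or 2).
 */
static int uncovered_ranges(uint32_t parent_ip, uint8_t parent_depth,
			    uint32_t child_ip, uint8_t child_depth,
			    struct ip_range out[2])
{
	uint32_t pf, pl, cf, cl;
	int n = 0;

	assert(parent_depth <= child_depth && child_depth <= 32);

	pf = prefix_first(parent_ip, parent_depth);
	pl = prefix_last(parent_ip, parent_depth);
	cf = prefix_first(child_ip, child_depth);
	cl = prefix_last(child_ip, child_depth);

	/* The child must lie entirely inside the parent. */
	assert(cf >= pf && cl <= pl);

	if (cf > pf)	/* gap below the child */
		out[n++] = (struct ip_range){ pf, cf - 1 };
	if (cl < pl)	/* gap above the child */
		out[n++] = (struct ip_range){ cl + 1, pl };

	return n;
}
```

Calling `uncovered_ranges(IPV4(10, 0, 0, 0), 8, IPV4(10, 20, 0, 0), 16, out)` fills `out` with the two gaps on either side of the /16. With several subroutes, the same gap computation would have to be repeated while walking them in address order, which is where a tree-shaped control-plane structure helps.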