From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from dpdk.org (dpdk.org [92.243.14.124])
 by inbox.dpdk.org (Postfix) with ESMTP id E9522A0487
 for ; Fri, 5 Jul 2019 12:31:42 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
 by dpdk.org (Postfix) with ESMTP id CE1281BE42;
 Fri, 5 Jul 2019 12:31:41 +0200 (CEST)
Received: from mga06.intel.com (mga06.intel.com [134.134.136.31])
 by dpdk.org (Postfix) with ESMTP id 7BBA41BE23
 for ; Fri, 5 Jul 2019 12:31:40 +0200 (CEST)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 05 Jul 2019 03:31:39 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.63,454,1557212400"; d="scan'208";a="363565056"
Received: from vmedvedk-mobl.ger.corp.intel.com (HELO [10.237.220.98]) ([10.237.220.98])
 by fmsmga006.fm.intel.com with ESMTP; 05 Jul 2019 03:31:36 -0700
To: Stephen Hemminger
Cc: Honnappa Nagarahalli , "Ruifeng Wang (Arm Technology China)" ,
 "bruce.richardson@intel.com" , "dev@dpdk.org" ,
 "Gavin Hu (Arm Technology China)" , nd
References: <20190627093751.7746-1-ruifeng.wang@arm.com>
 <20190627082451.56719392@hermes.lan>
 <20190627213450.30082af6@hermes.lan>
 <185e012d-6f8a-66be-dc8c-a420065660fb@intel.com>
 <20190628083507.31eca1db@hermes.lan>
From: "Medvedkin, Vladimir"
Message-ID: <6a78a166-f19c-5444-0a1a-a74aa06463b1@intel.com>
Date: Fri, 5 Jul 2019 11:31:36 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2
MIME-Version: 1.0
In-Reply-To: <20190628083507.31eca1db@hermes.lan>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: not inline unnecessary functions
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Stephen,

On 28/06/2019 16:35, Stephen Hemminger wrote:
> On Fri, 28 Jun 2019 15:16:30 +0100
> "Medvedkin, Vladimir" wrote:
>
>> Hi Honnappa,
>>
>> On 28/06/2019 14:57, Honnappa Nagarahalli wrote:
>>>> Hi all,
>>>>
>>>> On 28/06/2019 05:34, Stephen Hemminger wrote:
>>>>> On Fri, 28 Jun 2019 02:44:54 +0000
>>>>> "Ruifeng Wang (Arm Technology China)" wrote:
>>>>>
>>>>>>>> Tests showed that the function inlining caused performance drop on
>>>>>>>> some x86 platforms with the memory ordering patches applied.
>>>>>>>> By force no-inline functions, the performance was better than
>>>>>>>> before on x86 and no impact to arm64 platforms.
>>>>>>>>
>>>>>>>> Suggested-by: Medvedkin Vladimir
>>>>>>>> Signed-off-by: Ruifeng Wang
>>>>>>>> Reviewed-by: Gavin Hu
>>>>>>> {
>>>>>>>
>>>>>>> Do you actually need to force noinline or is just taking of inline enough?
>>>>>>> In general, letting compiler decide is often best practice.
>>>>>> The force noinline is an optimization for x86 platforms to keep
>>>>>> rte_lpm_add() API performance with memory ordering applied.
>>>>> I don't think you answered my question. What does a recent version of
>>>>> GCC do if you drop the inline.
>>>>>
>>>>> Actually all the functions in rte_lpm should drop inline.
>>>> I'm agree with Stephen. If it is not a fastpath and size of function is not
>>>> minimal it is good to remove inline qualifier for other control plane functions
>>>> such as rule_add/delete/find/etc and let the compiler decide to inline it
>>>> (unless it affects performance).
>>> IMO, the rule needs to be simple. If it is control plane function, we should leave it to the compiler to decide. I do not think we need to worry too much about performance for control plane functions.
>> Control plane is not as important as data plane speed but it is still
>> important. For lpm we are talking not about initialization, but runtime
>> routes add/del related functions.
>> If it is very slow the library will be
>> totally unusable because after it receives a route update it will be
>> blocked for a long time and route update queue would overflow.
> Control plane performance is more impacted by algorithmic choice.
> The original LPM had terrible (n^2?) control path. Current code is better.
> I had a patch using RB tree, but it was rejected because it used the
> /usr/include/bsd/sys/tree.h which added a dependency.

You're absolutely right, control plane performance mostly depends on the
algorithm. The current LPM implementation has a number of problems there.
One problem is rules_tbl[], a flat array that holds routes for control
plane purposes. Replacing it with an rb-tree solves this problem, but
there are other problems. For example, when you try to add a route
10.0.0.0/8 while a number of subroutes already exist in the table (for
example 10.20.0.0/16), the current implementation will load tbl_entry ->
do some checks (depth, ext entry) -> conditionally store the new entry.
Under some circumstances this takes a lot of time. In fact it only needs
to unconditionally rewrite two ranges: from 10.0.0.0 to 10.19.255.255 and
from 10.21.0.0 to 10.255.255.255. The control plane could help us get
these two ranges. The best structure for this is an lc-tree, because it
is relatively easy to traverse the subtree (described by 10.0.0.0/8) and
get the subroutes. We are working on a new implementation, hopefully it
will be ready soon.

-- 
Regards,
Vladimir
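
To make the 10.0.0.0/8 example concrete, here is a minimal sketch of how
the ranges to rewrite could be derived once the control plane has handed
over the existing subroutes. It is not rte_lpm code and not the new
implementation mentioned above: struct prefix, struct ip_range,
IPV4_ADDR() and uncovered_ranges() are hypothetical names used only for
illustration, and the subroutes are assumed to arrive sorted by address
(roughly what an in-order walk of the subtree would produce).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a host-byte-order IPv4 address. */
#define IPV4_ADDR(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical types, not part of rte_lpm. */
struct prefix {
	uint32_t ip;    /* network address, host byte order */
	uint8_t depth;  /* prefix length, 0..32 */
};

struct ip_range {
	uint32_t first; /* inclusive */
	uint32_t last;  /* inclusive */
};

static uint32_t prefix_mask(uint8_t depth)
{
	assert(depth <= 32);
	/* A shift by 32 is undefined in C, so handle depth 0 separately. */
	return depth == 0 ? 0 : UINT32_MAX << (32 - depth);
}

static uint32_t prefix_first(struct prefix p)
{
	return p.ip & prefix_mask(p.depth);
}

static uint32_t prefix_last(struct prefix p)
{
	return p.ip | ~prefix_mask(p.depth);
}

/*
 * Collect the address ranges inside 'parent' that are not covered by any
 * of the more specific prefixes in 'subs'. These are the ranges a new
 * 'parent' route could overwrite unconditionally.
 *
 * Assumption: 'subs' is sorted by start address and every entry lies
 * inside 'parent'. Nested entries are tolerated.
 * Returns the number of ranges written to 'out' (at most 'max_out').
 */
static size_t uncovered_ranges(struct prefix parent,
			       const struct prefix *subs, size_t n_subs,
			       struct ip_range *out, size_t max_out)
{
	uint32_t end = prefix_last(parent);
	/* 64-bit cursor so it can step past 255.255.255.255 safely. */
	uint64_t cursor = prefix_first(parent);
	size_t n_out = 0;

	for (size_t i = 0; i < n_subs; i++) {
		uint32_t s_first = prefix_first(subs[i]);
		uint64_t s_next = (uint64_t)prefix_last(subs[i]) + 1;

		/* Gap between the previous subroute and this one. */
		if (cursor < s_first && n_out < max_out)
			out[n_out++] = (struct ip_range){
				(uint32_t)cursor, s_first - 1 };
		if (s_next > cursor)
			cursor = s_next;
	}
	/* Tail of the parent after the last subroute. */
	if (cursor <= end && n_out < max_out)
		out[n_out++] = (struct ip_range){ (uint32_t)cursor, end };
	return n_out;
}
```

With parent 10.0.0.0/8 and the single subroute 10.20.0.0/16, this yields
the two ranges named above. The cost is linear in the number of
subroutes rather than in the number of table entries under the parent.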