From: "Medvedkin, Vladimir"
To: Stephen Hemminger, "Ruifeng Wang (Arm Technology China)"
Cc: "bruce.richardson@intel.com", "dev@dpdk.org", Honnappa Nagarahalli, "Gavin Hu (Arm Technology China)", nd
References: <20190627093751.7746-1-ruifeng.wang@arm.com> <20190627082451.56719392@hermes.lan> <20190627213450.30082af6@hermes.lan>
In-Reply-To: <20190627213450.30082af6@hermes.lan>
Date: Fri, 28 Jun 2019 14:47:14 +0100
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: not inline unnecessary functions
List-Id: DPDK patches and discussions

Hi all,

On 28/06/2019 05:34, Stephen Hemminger wrote:
> On Fri, 28
Jun 2019 02:44:54 +0000
> "Ruifeng Wang (Arm Technology China)" wrote:
>
>>>> Tests showed that the function inlining caused a performance drop on
>>>> some x86 platforms with the memory ordering patches applied.
>>>> By forcing these functions no-inline, the performance was better than
>>>> before on x86, with no impact on arm64 platforms.
>>>>
>>>> Suggested-by: Medvedkin Vladimir
>>>> Signed-off-by: Ruifeng Wang
>>>> Reviewed-by: Gavin Hu
>>>
>>> Do you actually need to force noinline, or is just taking off inline enough?
>>> In general, letting the compiler decide is often best practice.
>>
>> The force noinline is an optimization for x86 platforms to keep rte_lpm_add() API
>> performance with memory ordering applied.
>
> I don't think you answered my question. What does a recent version of GCC do
> if you drop the inline?
>
> Actually all the functions in rte_lpm should drop inline.

I agree with Stephen. Where a function is not on the fast path and its size is not minimal, it is better to drop the inline qualifier — for the other control plane functions such as rule_add/delete/find/etc. as well — and let the compiler decide whether to inline (unless that affects performance).

> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 6b7b28a2e431..ffe07e980864 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -399,7 +399,7 @@ MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>   * are stored in the rule table from 0 - 31.
>   * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
>   */
> -static inline int32_t
> +static int32_t
>  rule_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint8_t next_hop)
>  {
> @@ -471,7 +471,7 @@ rule_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	return rule_index;
>  }
>
> -static inline int32_t
> +static int32_t
>  rule_add_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint32_t next_hop)
>  {
> @@ -547,7 +547,7 @@ rule_add_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   * Delete a rule from the rule table.
>   * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
>   */
> -static inline void
> +static void
>  rule_delete_v20(struct rte_lpm_v20 *lpm, int32_t rule_index, uint8_t depth)
>  {
>  	int i;
> @@ -570,7 +570,7 @@ rule_delete_v20(struct rte_lpm_v20 *lpm, int32_t rule_index, uint8_t depth)
>  	lpm->rule_info[depth - 1].used_rules--;
>  }
>
> -static inline void
> +static void
>  rule_delete_v1604(struct rte_lpm *lpm, int32_t rule_index, uint8_t depth)
>  {
>  	int i;
> @@ -597,7 +597,7 @@ rule_delete_v1604(struct rte_lpm *lpm, int32_t rule_index, uint8_t depth)
>   * Finds a rule in rule table.
>   * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
>   */
> -static inline int32_t
> +static int32_t
>  rule_find_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth)
>  {
>  	uint32_t rule_gindex, last_rule, rule_index;
> @@ -618,7 +618,7 @@ rule_find_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth)
>  	return -EINVAL;
>  }
>
> -static inline int32_t
> +static int32_t
>  rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>  {
>  	uint32_t rule_gindex, last_rule, rule_index;
> @@ -642,7 +642,7 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>  /*
>   * Find, clean and allocate a tbl8.
>   */
> -static inline int32_t
> +static int32_t
>  tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
> @@ -669,7 +669,7 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  	return -ENOSPC;
>  }
>
> -static inline int32_t
> +static int32_t
>  tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
> @@ -709,7 +709,7 @@ tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
>  	tbl8[tbl8_group_start].valid_group = INVALID;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	uint8_t next_hop)
>  {
> @@ -777,7 +777,7 @@ add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	uint32_t next_hop)
>  {
> @@ -846,7 +846,7 @@ add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint8_t next_hop)
>  {
> @@ -971,7 +971,7 @@ add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint32_t next_hop)
>  {
> @@ -1244,7 +1244,7 @@ BIND_DEFAULT_SYMBOL(rte_lpm_is_rule_present, _v1604, 16.04);
>  MAP_STATIC_SYMBOL(int rte_lpm_is_rule_present(struct rte_lpm *lpm, uint32_t ip,
>  	uint8_t depth, uint32_t *next_hop), rte_lpm_is_rule_present_v1604);
>
> -static inline int32_t
> +static int32_t
>  find_previous_rule_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	uint8_t *sub_rule_depth)
>  {
> @@ -1266,7 +1266,7 @@ find_previous_rule_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	return -1;
>  }
>
> -static inline int32_t
> +static int32_t
>  find_previous_rule_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	uint8_t *sub_rule_depth)
>  {
> @@ -1288,7 +1288,7 @@ find_previous_rule_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	return -1;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {
> @@ -1381,7 +1381,7 @@ delete_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {
> @@ -1483,7 +1483,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   * Return of value > -1 means tbl8 is in use but has all the same values and
>   * thus can be recycled
>   */
> -static inline int32_t
> +static int32_t
>  tbl8_recycle_check_v20(struct rte_lpm_tbl_entry_v20 *tbl8,
>  	uint32_t tbl8_group_start)
>  {
> @@ -1530,7 +1530,7 @@ tbl8_recycle_check_v20(struct rte_lpm_tbl_entry_v20 *tbl8,
>  	return -EINVAL;
>  }
>
> -static inline int32_t
> +static int32_t
>  tbl8_recycle_check_v1604(struct rte_lpm_tbl_entry *tbl8,
>  	uint32_t tbl8_group_start)
>  {
> @@ -1577,7 +1577,7 @@ tbl8_recycle_check_v1604(struct rte_lpm_tbl_entry *tbl8,
>  	return -EINVAL;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {
> @@ -1655,7 +1655,7 @@ delete_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {

-- 
Regards,
Vladimir