From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>
To: Stephen Hemminger
Cc: "bruce.richardson@intel.com", "vladimir.medvedkin@intel.com",
 "dev@dpdk.org", Honnappa Nagarahalli, "Gavin Hu (Arm Technology China)", nd
Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: not inline unnecessary functions
Date: Fri, 28 Jun 2019 05:48:13 +0000
Message-ID:
References: <20190627093751.7746-1-ruifeng.wang@arm.com>
 <20190627082451.56719392@hermes.lan>
 <20190627213450.30082af6@hermes.lan>
In-Reply-To: <20190627213450.30082af6@hermes.lan>
List-Id: DPDK patches and discussions
Sender: "dev" <dev-bounces@dpdk.org>

Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger
> Sent: Friday, June 28, 2019 12:35
> To: Ruifeng Wang (Arm Technology China)
> Cc: bruce.richardson@intel.com; vladimir.medvedkin@intel.com;
> dev@dpdk.org; Honnappa Nagarahalli; Gavin Hu (Arm Technology China); nd
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib/lpm: not inline unnecessary
> functions
>
> On Fri, 28 Jun 2019 02:44:54 +0000
> "Ruifeng Wang (Arm Technology China)" wrote:
>
> > > > > Tests showed that the function inlining caused performance drop on
> > > > > some x86 platforms with the memory ordering patches applied.
> > > > > By forcing these functions not to be inlined, the performance was
> > > > > better than before on x86, with no impact on arm64 platforms.
> > > > >
> > > > > Suggested-by: Medvedkin Vladimir
> > > > > Signed-off-by: Ruifeng Wang
> > > > > Reviewed-by: Gavin Hu
> > >
> > > Do you actually need to force noinline, or is just taking off the
> > > inline enough? In general, letting the compiler decide is often best
> > > practice.
> >
> > The forced noinline is an optimization for x86 platforms to keep
> > rte_lpm_add() API performance with memory ordering applied.
>
> I don't think you answered my question. What does a recent version of GCC
> do if you drop the inline?
>
GCC still inlines it when the inline keyword is dropped, so we need the
forced noinline. See Vladimir's comment in
http://patches.dpdk.org/patch/54936/.

> Actually all the functions in rte_lpm should drop inline.
>
I don't know if we have a guideline on the use of 'inline'. In general,
'inline' is not used and it is up to the compiler to decide?
Is it reasonable to use forced inline or forced noinline for performance
tuning?

> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 6b7b28a2e431..ffe07e980864 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -399,7 +399,7 @@ MAP_STATIC_SYMBOL(void rte_lpm_free(struct rte_lpm *lpm),
>   * are stored in the rule table from 0 - 31.
>   * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
>   */
> -static inline int32_t
> +static int32_t
>  rule_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint8_t next_hop)
>  {
> @@ -471,7 +471,7 @@ rule_add_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	return rule_index;
>  }
>
> -static inline int32_t
> +static int32_t
>  rule_add_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint32_t next_hop)
>  {
> @@ -547,7 +547,7 @@ rule_add_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>   * Delete a rule from the rule table.
>   * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
>   */
> -static inline void
> +static void
>  rule_delete_v20(struct rte_lpm_v20 *lpm, int32_t rule_index, uint8_t depth)
>  {
>  	int i;
> @@ -570,7 +570,7 @@ rule_delete_v20(struct rte_lpm_v20 *lpm, int32_t rule_index, uint8_t depth)
>  	lpm->rule_info[depth - 1].used_rules--;
>  }
>
> -static inline void
> +static void
>  rule_delete_v1604(struct rte_lpm *lpm, int32_t rule_index, uint8_t depth)
>  {
>  	int i;
> @@ -597,7 +597,7 @@ rule_delete_v1604(struct rte_lpm *lpm, int32_t rule_index, uint8_t depth)
>   * Finds a rule in rule table.
>   * NOTE: Valid range for depth parameter is 1 .. 32 inclusive.
>   */
> -static inline int32_t
> +static int32_t
>  rule_find_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth)
>  {
>  	uint32_t rule_gindex, last_rule, rule_index;
> @@ -618,7 +618,7 @@ rule_find_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth)
>  	return -EINVAL;
>  }
>
> -static inline int32_t
> +static int32_t
>  rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>  {
>  	uint32_t rule_gindex, last_rule, rule_index;
> @@ -642,7 +642,7 @@ rule_find_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>  /*
>   * Find, clean and allocate a tbl8.
>   */
> -static inline int32_t
> +static int32_t
>  tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
> @@ -669,7 +669,7 @@ tbl8_alloc_v20(struct rte_lpm_tbl_entry_v20 *tbl8)
>  	return -ENOSPC;
>  }
>
> -static inline int32_t
> +static int32_t
>  tbl8_alloc_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
> @@ -709,7 +709,7 @@ tbl8_free_v1604(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
>  	tbl8[tbl8_group_start].valid_group = INVALID;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	uint8_t next_hop)
>  {
> @@ -777,7 +777,7 @@ add_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	uint32_t next_hop)
>  {
> @@ -846,7 +846,7 @@ add_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint8_t next_hop)
>  {
> @@ -971,7 +971,7 @@ add_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked, uint8_t depth,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  add_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	uint32_t next_hop)
>  {
> @@ -1244,7 +1244,7 @@ BIND_DEFAULT_SYMBOL(rte_lpm_is_rule_present, _v1604, 16.04);
>  MAP_STATIC_SYMBOL(int rte_lpm_is_rule_present(struct rte_lpm *lpm, uint32_t ip,
>  	uint8_t depth, uint32_t *next_hop), rte_lpm_is_rule_present_v1604);
>
> -static inline int32_t
> +static int32_t
>  find_previous_rule_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	uint8_t *sub_rule_depth)
>  {
> @@ -1266,7 +1266,7 @@ find_previous_rule_v20(struct rte_lpm_v20 *lpm, uint32_t ip, uint8_t depth,
>  	return -1;
>  }
>
> -static inline int32_t
> +static int32_t
>  find_previous_rule_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	uint8_t *sub_rule_depth)
>  {
> @@ -1288,7 +1288,7 @@ find_previous_rule_v1604(struct rte_lpm *lpm, uint32_t ip, uint8_t depth,
>  	return -1;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {
> @@ -1381,7 +1381,7 @@ delete_depth_small_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {
> @@ -1483,7 +1483,7 @@ delete_depth_small_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>   * Return of value > -1 means tbl8 is in use but has all the same values and
>   * thus can be recycled
>   */
> -static inline int32_t
> +static int32_t
>  tbl8_recycle_check_v20(struct rte_lpm_tbl_entry_v20 *tbl8,
>  	uint32_t tbl8_group_start)
>  {
> @@ -1530,7 +1530,7 @@ tbl8_recycle_check_v20(struct rte_lpm_tbl_entry_v20 *tbl8,
>  	return -EINVAL;
>  }
>
> -static inline int32_t
> +static int32_t
>  tbl8_recycle_check_v1604(struct rte_lpm_tbl_entry *tbl8,
>  	uint32_t tbl8_group_start)
>  {
> @@ -1577,7 +1577,7 @@ tbl8_recycle_check_v1604(struct rte_lpm_tbl_entry *tbl8,
>  	return -EINVAL;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {
> @@ -1655,7 +1655,7 @@ delete_depth_big_v20(struct rte_lpm_v20 *lpm, uint32_t ip_masked,
>  	return 0;
>  }
>
> -static inline int32_t
> +static int32_t
>  delete_depth_big_v1604(struct rte_lpm *lpm, uint32_t ip_masked,
>  	uint8_t depth, int32_t sub_rule_index, uint8_t sub_rule_depth)
>  {