From: "Medvedkin, Vladimir"
Date: Wed, 8 Jul 2020 13:36:58 +0100
To: Ruifeng Wang, Bruce Richardson, John McNamara, Marko Kovacevic,
 Ray Kinsella, Neil Horman
Cc: dev@dpdk.org, konstantin.ananyev@intel.com, honnappa.nagarahalli@arm.com, nd@arm.com
Subject: Re: [dpdk-dev] [PATCH v7 1/3] lib/lpm: integrate RCU QSBR
In-Reply-To: <20200707151554.64431-2-ruifeng.wang@arm.com>
References: <20190906094534.36060-1-ruifeng.wang@arm.com>
 <20200707151554.64431-1-ruifeng.wang@arm.com>
 <20200707151554.64431-2-ruifeng.wang@arm.com>

On 07/07/2020 16:15, Ruifeng Wang wrote:
> Currently, the tbl8 group is freed even though the readers might be
> using the tbl8 group entries. The freed tbl8 group can be reallocated
> quickly. This results in incorrect lookup results.
>
> RCU QSBR process is integrated for safe tbl8 group reclaim.
> Refer to RCU documentation to understand various aspects of
> integrating RCU library into other libraries.
>
> Signed-off-by: Ruifeng Wang
> Reviewed-by: Honnappa Nagarahalli
> Acked-by: Ray Kinsella
> ---
>  doc/guides/prog_guide/lpm_lib.rst  |  32 ++++++++
>  lib/librte_lpm/Makefile            |   2 +-
>  lib/librte_lpm/meson.build         |   1 +
>  lib/librte_lpm/rte_lpm.c           | 120 ++++++++++++++++++++++++++---
>  lib/librte_lpm/rte_lpm.h           |  59 ++++++++++++++
>  lib/librte_lpm/rte_lpm_version.map |   6 ++
>  6 files changed, 208 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/prog_guide/lpm_lib.rst b/doc/guides/prog_guide/lpm_lib.rst
> index 1609a57d0..03945904b 100644
> --- a/doc/guides/prog_guide/lpm_lib.rst
> +++ b/doc/guides/prog_guide/lpm_lib.rst
> @@ -145,6 +145,38 @@ depending on whether we need to move to the next table or not.
>  Prefix expansion is one of the keys of this algorithm,
>  since it improves the speed dramatically by adding redundancy.
>
> +Deletion
> +~~~~~~~~
> +
> +When deleting a rule, a replacement rule is searched for. The replacement rule is an existing rule that has
> +the longest prefix match with the rule to be deleted, but has a shorter prefix.
> +
> +If a replacement rule is found, the target tbl24 and tbl8 entries are updated to have the same depth and next hop
> +value as the replacement rule.
> +
> +If no replacement rule can be found, the target tbl24 and tbl8 entries will be cleared.
> +
> +Prefix expansion is performed if the rule's depth is not exactly 24 bits or 32 bits.
> +
> +After deleting a rule, a group of tbl8s that belongs to the same tbl24 entry is freed in the following cases:
> +
> +* All tbl8s in the group are empty.
> +
> +* All tbl8s in the group have the same value, with a depth no greater than 24.
> +
> +Freeing of tbl8s has different behaviors:
> +
> +* If RCU is not used, tbl8s are cleared and reclaimed immediately.
> +
> +* If RCU is used, tbl8s are reclaimed when readers are in a quiescent state.
> +
> +When the LPM is not using RCU, a tbl8 group can be freed immediately even though the readers might be using
> +the tbl8 group entries. This might result in incorrect lookup results.
> +
> +RCU QSBR process is integrated for safe tbl8 group reclamation. The application has certain responsibilities
> +while using this feature. Please refer to the resource reclamation framework of the :ref:`RCU library `
> +for more details.
> +
>  Lookup
>  ~~~~~~
>
> diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile
> index d682785b6..6f06c5c03 100644
> --- a/lib/librte_lpm/Makefile
> +++ b/lib/librte_lpm/Makefile
> @@ -8,7 +8,7 @@ LIB = librte_lpm.a
>
>  CFLAGS += -O3
>  CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> -LDLIBS += -lrte_eal -lrte_hash
> +LDLIBS += -lrte_eal -lrte_hash -lrte_rcu
>
>  EXPORT_MAP := rte_lpm_version.map
>
> diff --git a/lib/librte_lpm/meson.build b/lib/librte_lpm/meson.build
> index 021ac6d8d..6cfc083c5 100644
> --- a/lib/librte_lpm/meson.build
> +++ b/lib/librte_lpm/meson.build
> @@ -7,3 +7,4 @@ headers = files('rte_lpm.h', 'rte_lpm6.h')
>  # without worrying about which architecture we actually need
>  headers += files('rte_lpm_altivec.h', 'rte_lpm_neon.h', 'rte_lpm_sse.h')
>  deps += ['hash']
> +deps += ['rcu']
> diff --git a/lib/librte_lpm/rte_lpm.c b/lib/librte_lpm/rte_lpm.c
> index 38ab512a4..d498ba761 100644
> --- a/lib/librte_lpm/rte_lpm.c
> +++ b/lib/librte_lpm/rte_lpm.c
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2020 Arm Limited
>   */
>
>  #include
> @@ -246,12 +247,82 @@ rte_lpm_free(struct rte_lpm *lpm)
>
>  	rte_mcfg_tailq_write_unlock();
>
> +	if (lpm->dq)
> +		rte_rcu_qsbr_dq_delete(lpm->dq);
>  	rte_free(lpm->tbl8);
>  	rte_free(lpm->rules_tbl);
>  	rte_free(lpm);
>  	rte_free(te);
>  }
>
> +static void
> +__lpm_rcu_qsbr_free_resource(void *p, void *data, unsigned int n)
> +{
> +	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
> +	uint32_t tbl8_group_index = *(uint32_t *)data;
> +	struct rte_lpm_tbl_entry *tbl8 = ((struct rte_lpm *)p)->tbl8;
> +
> +	RTE_SET_USED(n);
> +	/* Set tbl8 group invalid */
> +	__atomic_store(&tbl8[tbl8_group_index], &zero_tbl8_entry,
> +		__ATOMIC_RELAXED);
> +}
> +
> +/* Associate QSBR variable with an LPM object.
> + */
> +int
> +rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
> +	struct rte_rcu_qsbr_dq **dq)
> +{
> +	char rcu_dq_name[RTE_RCU_QSBR_DQ_NAMESIZE];
> +	struct rte_rcu_qsbr_dq_parameters params = {0};
> +
> +	if ((lpm == NULL) || (cfg == NULL)) {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +
> +	if (lpm->v) {
> +		rte_errno = EEXIST;
> +		return 1;
> +	}
> +
> +	if (cfg->mode == RTE_LPM_QSBR_MODE_SYNC) {
> +		/* No other things to do. */
> +	} else if (cfg->mode == RTE_LPM_QSBR_MODE_DQ) {
> +		/* Init QSBR defer queue. */
> +		snprintf(rcu_dq_name, sizeof(rcu_dq_name),
> +				"LPM_RCU_%s", lpm->name);
> +		params.name = rcu_dq_name;
> +		params.size = cfg->dq_size;
> +		if (params.size == 0)
> +			params.size = lpm->number_tbl8s;
> +		params.trigger_reclaim_limit = cfg->reclaim_thd;
> +		params.max_reclaim_size = cfg->reclaim_max;
> +		if (params.max_reclaim_size == 0)
> +			params.max_reclaim_size = RTE_LPM_RCU_DQ_RECLAIM_MAX;
> +		params.esize = sizeof(uint32_t);	/* tbl8 group index */
> +		params.free_fn = __lpm_rcu_qsbr_free_resource;
> +		params.p = lpm;
> +		params.v = cfg->v;
> +		lpm->dq = rte_rcu_qsbr_dq_create(&params);
> +		if (lpm->dq == NULL) {
> +			RTE_LOG(ERR, LPM,
> +					"LPM QS defer queue creation failed\n");
> +			return 1;
> +		}
> +		if (dq)
> +			*dq = lpm->dq;
> +	} else {
> +		rte_errno = EINVAL;
> +		return 1;
> +	}
> +	lpm->rcu_mode = cfg->mode;
> +	lpm->v = cfg->v;
> +
> +	return 0;
> +}
> +
>  /*
>   * Adds a rule to the rule table.
>   *
> @@ -394,14 +465,15 @@ rule_find(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth)
>   * Find, clean and allocate a tbl8.
>   */
>  static int32_t
> -tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
> +_tbl8_alloc(struct rte_lpm *lpm)
>  {
>  	uint32_t group_idx; /* tbl8 group index. */
>  	struct rte_lpm_tbl_entry *tbl8_entry;
>
>  	/* Scan through tbl8 to find a free (i.e. INVALID) tbl8 group. */
> -	for (group_idx = 0; group_idx < number_tbl8s; group_idx++) {
> -		tbl8_entry = &tbl8[group_idx * RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
> +	for (group_idx = 0; group_idx < lpm->number_tbl8s; group_idx++) {
> +		tbl8_entry = &lpm->tbl8[group_idx *
> +					RTE_LPM_TBL8_GROUP_NUM_ENTRIES];
>  		/* If a free tbl8 group is found clean it and set as VALID. */
>  		if (!tbl8_entry->valid_group) {
>  			struct rte_lpm_tbl_entry new_tbl8_entry = {
> @@ -427,14 +499,40 @@ tbl8_alloc(struct rte_lpm_tbl_entry *tbl8, uint32_t number_tbl8s)
>  	return -ENOSPC;
>  }
>
> +static int32_t
> +tbl8_alloc(struct rte_lpm *lpm)
> +{
> +	int32_t group_idx; /* tbl8 group index. */
> +
> +	group_idx = _tbl8_alloc(lpm);
> +	if ((group_idx == -ENOSPC) && (lpm->dq != NULL)) {
> +		/* If there are no tbl8 groups try to reclaim one. */
> +		if (rte_rcu_qsbr_dq_reclaim(lpm->dq, 1, NULL, NULL, NULL) == 0)
> +			group_idx = _tbl8_alloc(lpm);
> +	}
> +
> +	return group_idx;
> +}
> +
>  static void
> -tbl8_free(struct rte_lpm_tbl_entry *tbl8, uint32_t tbl8_group_start)
> +tbl8_free(struct rte_lpm *lpm, uint32_t tbl8_group_start)
>  {
> -	/* Set tbl8 group invalid*/
>  	struct rte_lpm_tbl_entry zero_tbl8_entry = {0};
>
> -	__atomic_store(&tbl8[tbl8_group_start], &zero_tbl8_entry,
> -			__ATOMIC_RELAXED);
> +	if (!lpm->v) {
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	} else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_SYNC) {
> +		/* Wait for quiescent state change.
> +		 */
> +		rte_rcu_qsbr_synchronize(lpm->v, RTE_QSBR_THRID_INVALID);
> +		/* Set tbl8 group invalid*/
> +		__atomic_store(&lpm->tbl8[tbl8_group_start], &zero_tbl8_entry,
> +				__ATOMIC_RELAXED);
> +	} else if (lpm->rcu_mode == RTE_LPM_QSBR_MODE_DQ) {
> +		/* Push into QSBR defer queue. */
> +		rte_rcu_qsbr_dq_enqueue(lpm->dq, (void *)&tbl8_group_start);
> +	}
>  }
>
>  static __rte_noinline int32_t
> @@ -523,7 +621,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>
>  	if (!lpm->tbl24[tbl24_index].valid) {
>  		/* Search for a free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc(lpm);
>
>  		/* Check tbl8 allocation was successful. */
>  		if (tbl8_group_index < 0) {
> @@ -569,7 +667,7 @@ add_depth_big(struct rte_lpm *lpm, uint32_t ip_masked, uint8_t depth,
>  	} /* If valid entry but not extended calculate the index into Table8. */
>  	else if (lpm->tbl24[tbl24_index].valid_group == 0) {
>  		/* Search for free tbl8 group. */
> -		tbl8_group_index = tbl8_alloc(lpm->tbl8, lpm->number_tbl8s);
> +		tbl8_group_index = tbl8_alloc(lpm);
>
>  		if (tbl8_group_index < 0) {
>  			return tbl8_group_index;
> @@ -977,7 +1075,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
>  		 */
>  		lpm->tbl24[tbl24_index].valid = 0;
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free(lpm->tbl8, tbl8_group_start);
> +		tbl8_free(lpm, tbl8_group_start);
>  	} else if (tbl8_recycle_index > -1) {
>  		/* Update tbl24 entry. */
>  		struct rte_lpm_tbl_entry new_tbl24_entry = {
> @@ -993,7 +1091,7 @@ delete_depth_big(struct rte_lpm *lpm, uint32_t ip_masked,
>  		__atomic_store(&lpm->tbl24[tbl24_index], &new_tbl24_entry,
>  				__ATOMIC_RELAXED);
>  		__atomic_thread_fence(__ATOMIC_RELEASE);
> -		tbl8_free(lpm->tbl8, tbl8_group_start);
> +		tbl8_free(lpm, tbl8_group_start);
>  	}
>  #undef group_idx
>  	return 0;
> diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h
> index b9d49ac87..7889f21b3 100644
> --- a/lib/librte_lpm/rte_lpm.h
> +++ b/lib/librte_lpm/rte_lpm.h
> @@ -1,5 +1,6 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2020 Arm Limited
>   */
>
>  #ifndef _RTE_LPM_H_
> @@ -20,6 +21,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #ifdef __cplusplus
>  extern "C" {
> @@ -62,6 +64,17 @@ extern "C" {
>  /** Bitmask used to indicate successful lookup */
>  #define RTE_LPM_LOOKUP_SUCCESS 0x01000000
>
> +/** @internal Default RCU defer queue entries to reclaim in one go. */
> +#define RTE_LPM_RCU_DQ_RECLAIM_MAX	16
> +
> +/** RCU reclamation modes */
> +enum rte_lpm_qsbr_mode {
> +	/** Create defer queue for reclaim. */
> +	RTE_LPM_QSBR_MODE_DQ = 0,
> +	/** Use blocking mode reclaim. No defer queue created. */
> +	RTE_LPM_QSBR_MODE_SYNC
> +};
> +
>  #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
>  /** @internal Tbl24 entry structure. */
>  __extension__
> @@ -130,6 +143,28 @@ struct rte_lpm {
>  			__rte_cache_aligned; /**< LPM tbl24 table. */
>  	struct rte_lpm_tbl_entry *tbl8; /**< LPM tbl8 table. */
>  	struct rte_lpm_rule *rules_tbl; /**< LPM rules. */
> +#ifdef ALLOW_EXPERIMENTAL_API
> +	/* RCU config. */
> +	struct rte_rcu_qsbr *v;		/* RCU QSBR variable. */
> +	enum rte_lpm_qsbr_mode rcu_mode;/* Blocking, defer queue. */
> +	struct rte_rcu_qsbr_dq *dq;	/* RCU QSBR defer queue. */
> +#endif
> +};
> +
> +/** LPM RCU QSBR configuration structure. */
> +struct rte_lpm_rcu_config {
> +	struct rte_rcu_qsbr *v;	/* RCU QSBR variable. */
> +	/* Mode of RCU QSBR. RTE_LPM_QSBR_MODE_xxx
> +	 * '0' for default: create defer queue for reclaim.
> +	 */
> +	enum rte_lpm_qsbr_mode mode;
> +	uint32_t dq_size;	/* RCU defer queue size.
> +				 * default: lpm->number_tbl8s.
> +				 */
> +	uint32_t reclaim_thd;	/* Threshold to trigger auto reclaim. */
> +	uint32_t reclaim_max;	/* Max entries to reclaim in one go.
> +				 * default: RTE_LPM_RCU_DQ_RECLAIM_MAX.
> +				 */
>  };
>
>  /**
> @@ -179,6 +214,30 @@ rte_lpm_find_existing(const char *name);
>  void
>  rte_lpm_free(struct rte_lpm *lpm);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Associate RCU QSBR variable with an LPM object.
> + *
> + * @param lpm
> + *   the lpm object to add RCU QSBR
> + * @param cfg
> + *   RCU QSBR configuration
> + * @param dq
> + *   handler of created RCU QSBR defer queue
> + * @return
> + *   On success - 0
> + *   On error - 1 with error code set in rte_errno.
> + *   Possible rte_errno codes are:
> + *   - EINVAL - invalid pointer
> + *   - EEXIST - already added QSBR
> + *   - ENOMEM - memory allocation failure
> + */
> +__rte_experimental
> +int rte_lpm_rcu_qsbr_add(struct rte_lpm *lpm, struct rte_lpm_rcu_config *cfg,
> +			 struct rte_rcu_qsbr_dq **dq);
> +
>  /**
>   * Add a rule to the LPM table.
>   *
> diff --git a/lib/librte_lpm/rte_lpm_version.map b/lib/librte_lpm/rte_lpm_version.map
> index 500f58b80..bfccd7eac 100644
> --- a/lib/librte_lpm/rte_lpm_version.map
> +++ b/lib/librte_lpm/rte_lpm_version.map
> @@ -21,3 +21,9 @@ DPDK_20.0 {
>
>  	local: *;
>  };
> +
> +EXPERIMENTAL {
> +	global:
> +
> +	rte_lpm_rcu_qsbr_add;
> +};

Acked-by: Vladimir Medvedkin

-- 
Regards,
Vladimir
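
Below is a minimal usage sketch, not part of the patch or of the thread, showing
how an application might tie a QSBR variable to an LPM table through the
rte_lpm_rcu_qsbr_add() API added above. The table sizes, names and thread id
are illustrative assumptions; the rte_rcu_* calls are the existing librte_rcu
API, and the code must be built with ALLOW_EXPERIMENTAL_API defined since the
new API is experimental.

#include <rte_ip.h>
#include <rte_lpm.h>
#include <rte_malloc.h>
#include <rte_rcu_qsbr.h>

#define APP_MAX_READER_THREADS 4	/* illustrative reader count */

static struct rte_lpm *
app_lpm_create_with_rcu(struct rte_rcu_qsbr **out_qsv)
{
	struct rte_lpm_config lpm_cfg = {
		.max_rules = 1024,	/* illustrative sizes */
		.number_tbl8s = 256,
	};
	struct rte_lpm_rcu_config rcu_cfg = {0};
	struct rte_rcu_qsbr *qsv;
	struct rte_lpm *lpm;
	size_t sz;

	lpm = rte_lpm_create("app_lpm", SOCKET_ID_ANY, &lpm_cfg);
	if (lpm == NULL)
		return NULL;

	/* Allocate and initialize the QSBR variable shared with readers. */
	sz = rte_rcu_qsbr_get_memsize(APP_MAX_READER_THREADS);
	qsv = rte_zmalloc(NULL, sz, RTE_CACHE_LINE_SIZE);
	if (qsv == NULL ||
			rte_rcu_qsbr_init(qsv, APP_MAX_READER_THREADS) != 0)
		goto fail;

	/* Default mode: freed tbl8 groups go through the defer queue. */
	rcu_cfg.v = qsv;
	rcu_cfg.mode = RTE_LPM_QSBR_MODE_DQ;
	if (rte_lpm_rcu_qsbr_add(lpm, &rcu_cfg, NULL) != 0)
		goto fail;

	*out_qsv = qsv;
	return lpm;

fail:
	rte_free(qsv);
	rte_lpm_free(lpm);
	return NULL;
}

/* Each reader registers with the QSBR variable and periodically reports
 * a quiescent state; only then can deferred tbl8 groups be reclaimed.
 */
static void
app_reader_loop(struct rte_lpm *lpm, struct rte_rcu_qsbr *qsv,
		unsigned int thread_id)
{
	uint32_t next_hop;

	rte_rcu_qsbr_thread_register(qsv, thread_id);
	rte_rcu_qsbr_thread_online(qsv, thread_id);
	for (;;) {
		(void)rte_lpm_lookup(lpm, RTE_IPV4(192, 0, 2, 1), &next_hop);
		/* No lookup results are held past this point. */
		rte_rcu_qsbr_quiescent(qsv, thread_id);
	}
}

With RTE_LPM_QSBR_MODE_SYNC instead, deletion would block in
rte_rcu_qsbr_synchronize() until all registered readers report a quiescent
state, trading writer latency for not needing a defer queue.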