From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id E97947272 for ; Wed, 14 Mar 2018 12:09:23 +0100 (CET) X-Amp-Result: UNSCANNABLE X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 14 Mar 2018 04:09:21 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.47,470,1515484800"; d="scan'208";a="211452221" Received: from bricha3-mobl.ger.corp.intel.com ([10.237.221.118]) by fmsmga005.fm.intel.com with SMTP; 14 Mar 2018 04:09:19 -0700 Received: by (sSMTP sendmail emulation); Wed, 14 Mar 2018 11:09:18 +0000 Date: Wed, 14 Mar 2018 11:09:18 +0000 From: Bruce Richardson To: Medvedkin Vladimir Cc: dev@dpdk.org Message-ID: <20180314110918.GA11720@bricha3-MOBL.ger.corp.intel.com> References: <1519249495-16594-1-git-send-email-medvedkinv@gmail.com> <1519249495-16594-2-git-send-email-medvedkinv@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1519249495-16594-2-git-send-email-medvedkinv@gmail.com> Organization: Intel Research and Development Ireland Ltd. User-Agent: Mutt/1.9.4 (2018-02-28) Subject: Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 14 Mar 2018 11:09:25 -0000 On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote: > RIB is an alternative to current LPM library. > It solves the following problems > - Increases the speed of control plane operations against lpm such as > adding/deleting routes > - Adds abstraction from dataplane algorithms, so it is possible to add > different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc > in addition to current dir24_8 > - It is possible to keep user defined application specific additional > information in struct rte_rib_node which represents route entry. > It can be next hop/set of next hops (i.e. active and feasible), > pointers to link rte_rib_node based on some criteria (i.e. next_hop), > plenty of additional control plane information. > - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth > field that helps to save 6 bits. > - Also new dir24_8 implementation supports different next_hop sizes > (1/2/4/8 bytes per next hop) > - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary operator. > Instead it returns special default value if there is no route. > > Signed-off-by: Medvedkin Vladimir > --- > config/common_base | 6 + > doc/api/doxy-api.conf | 1 + > lib/Makefile | 2 + > lib/librte_rib/Makefile | 22 ++ > lib/librte_rib/rte_dir24_8.c | 482 +++++++++++++++++++++++++++++++++ > lib/librte_rib/rte_dir24_8.h | 116 ++++++++ > lib/librte_rib/rte_rib.c | 526 +++++++++++++++++++++++++++++++++++++ > lib/librte_rib/rte_rib.h | 322 +++++++++++++++++++++++ > lib/librte_rib/rte_rib_version.map | 18 ++ > mk/rte.app.mk | 1 + > 10 files changed, 1496 insertions(+) > create mode 100644 lib/librte_rib/Makefile > create mode 100644 lib/librte_rib/rte_dir24_8.c > create mode 100644 lib/librte_rib/rte_dir24_8.h > create mode 100644 lib/librte_rib/rte_rib.c > create mode 100644 lib/librte_rib/rte_rib.h > create mode 100644 lib/librte_rib/rte_rib_version.map > First pass review comments. For now just reviewed the main public header file rte_rib.h. Later reviews will cover the other files as best I can. /Bruce > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h > new file mode 100644 > index 0000000..6eac8fb > --- /dev/null > +++ b/lib/librte_rib/rte_rib.h > @@ -0,0 +1,322 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2018 Vladimir Medvedkin > + */ > + > +#ifndef _RTE_RIB_H_ > +#define _RTE_RIB_H_ > + > +/** > + * @file > + * Compressed trie implementation for Longest Prefix Match > + */ > + > +/** @internal Macro to enable/disable run-time checks. */ > +#if defined(RTE_LIBRTE_RIB_DEBUG) > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do { \ > + if (cond) \ > + return retval; \ > +} while (0) > +#else > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) > +#endif use RTE_ASSERT? > + > +#define RTE_RIB_VALID_NODE 1 should there be an INVALID_NODE macro? > +#define RTE_RIB_GET_NXT_ALL 0 > +#define RTE_RIB_GET_NXT_COVER 1 > + > +#define RTE_RIB_INVALID_ROUTE 0 > +#define RTE_RIB_VALID_ROUTE 1 > + > +/** Max number of characters in RIB name. */ > +#define RTE_RIB_NAMESIZE 64 > + > +/** Maximum depth value possible for IPv4 RIB. */ > +#define RTE_RIB_MAXDEPTH 32 I think we should have IPv4 in the name here. Will it not be extended to support IPv6 in future? > + > +/** > + * Macro to check if prefix1 {key1/depth1} > + * is covered by prefix2 {key2/depth2} > + */ > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2) \ > + ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\ > + && (depth1 > depth2)) Neat check! Any particular reason for using UINT64_MAX here rather than UINT32_MAX? I think you can avoid the casting and have a slightly shorter mask by changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to "~(UINT32_MAX >> depth2)" I'd also suggest for readability putting the second check first, and, for maintainability, using an inline function rather than a macro. > + > +/** @internal Macro to get next node in tree*/ > +#define RTE_RIB_GET_NXT_NODE(node, key) \ > + ((key & (1 << (31 - node->depth))) ? node->right : node->left) > +/** @internal Macro to check if node is right child*/ > +#define RTE_RIB_IS_RIGHT_NODE(node) (node->parent->right == node) Again, consider inline fns rather than macros. For the latter macro, rather than doing additional pointer derefs to parent, can you also get if it's a right node by using: "(node->key & (1 << (32 - node->depth)))"? > + > + > +struct rte_rib_node { > + struct rte_rib_node *left; > + struct rte_rib_node *right; > + struct rte_rib_node *parent; > + uint32_t key; > + uint8_t depth; > + uint8_t flag; > + uint64_t nh; > + uint64_t ext[0]; > +}; > + > +struct rte_rib; > + > +/** Type of FIB struct*/ > +enum rte_rib_type { > + RTE_RIB_DIR24_8_1B, > + RTE_RIB_DIR24_8_2B, > + RTE_RIB_DIR24_8_4B, > + RTE_RIB_DIR24_8_8B, > + RTE_RIB_TYPE_MAX > +}; If the plan is to support multiple underlying fib types and algorithms under the rib library, would it not be better to separate out the algorithm part from the data storage part? So have the type just be DIR_24_8, and have the 1, 2, 4 or 8 specified separately. > + > +enum rte_rib_op { > + RTE_RIB_ADD, > + RTE_RIB_DEL > +}; > + > +/** RIB nodes allocation type */ > +enum rte_rib_alloc_type { > + RTE_RIB_MALLOC, > + RTE_RIB_MEMPOOL, > + RTE_RIB_ALLOC_MAX > +}; Not sure you need this any more. Malloc allocations and mempool allocations are now pretty much the same thing. > + > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key, > + uint8_t depth, uint64_t next_hop, enum rte_rib_op op); Do you anticipate more ops in future than just add and delete? If not, why not just split this function into two and drop the op struct. > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips, > + uint64_t *next_hops, const unsigned n); > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib *rib); > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib, > + struct rte_rib_node *node); > + > +struct rte_rib { > + char name[RTE_RIB_NAMESIZE]; > + /*pointer to rib trie*/ > + struct rte_rib_node *trie; > + /*pointer to dataplane struct*/ > + void *fib; > + /*prefix modification*/ > + rte_rib_modify_fn_t modify; > + /* Bulk lookup fn*/ > + rte_rib_tree_lookup_fn_t lookup; > + /*alloc trie element*/ > + rte_rib_alloc_node_fn_t alloc_node; > + /*free trie element*/ > + rte_rib_free_node_fn_t free_node; > + struct rte_mempool *node_pool; > + uint32_t cur_nodes; > + uint32_t cur_routes; > + int max_nodes; > + int node_sz; > + enum rte_rib_type type; > + enum rte_rib_alloc_type alloc_type; > +}; > + > +/** RIB configuration structure */ > +struct rte_rib_conf { > + enum rte_rib_type type; > + enum rte_rib_alloc_type alloc_type; > + int max_nodes; > + size_t node_sz; > + uint64_t def_nh; > +}; > + > +/** > + * Lookup an IP into the RIB structure > + * > + * @param rib > + * RIB object handle > + * @param key > + * IP to be looked up in the RIB > + * @return > + * pointer to struct rte_rib_node on success, > + * NULL otherwise > + */ > +struct rte_rib_node * > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key); > + > +/** > + * Lookup less specific route into the RIB structure > + * > + * @param ent > + * Pointer to struct rte_rib_node that represents target route > + * @return > + * pointer to struct rte_rib_node that represents > + * less specific route on success, > + * NULL otherwise > + */ > +struct rte_rib_node * > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent); > + > +/** > + * Lookup prefix into the RIB structure > + * > + * @param rib > + * RIB object handle > + * @param key > + * net to be looked up in the RIB > + * @param depth > + * prefix length > + * @return > + * pointer to struct rte_rib_node on success, > + * NULL otherwise > + */ > +struct rte_rib_node * > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t depth); Can you explain the difference between this and regular lookup, and how they would be used. I don't think the names convey the differences sufficiently, and so we should look to rename one or both to be clearer. > + > +/** > + * Retrieve next more specific prefix from the RIB s/more/most/ > + * that is covered by key/depth supernet > + * > + * @param rib > + * RIB object handle > + * @param key > + * net address of supernet prefix that covers returned more specific prefixes > + * @param depth > + * supernet prefix length > + * @param cur > + * pointer to the last returned prefix to get next prefix > + * or > + * NULL to get first more specific prefix > + * @param flag > + * -RTE_RIB_GET_NXT_ALL > + * get all prefixes from subtrie By all prefixes do you mean more specific, i.e. the final prefix? > + * -RTE_RIB_GET_NXT_COVER > + * get only first more specific prefix even if it have more specifics > + * @return > + * pointer to the next more specific prefix > + * or > + * NULL if there is no prefixes left > + */ > +struct rte_rib_node * > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth, > + struct rte_rib_node *cur, int flag); > + > +/** > + * Remove prefix from the RIB > + * > + * @param rib > + * RIB object handle > + * @param key > + * net to be removed from the RIB > + * @param depth > + * prefix length > + */ > +void > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth); > + > +/** > + * Insert prefix into the RIB > + * > + * @param rib > + * RIB object handle > + * @param key > + * net to be inserted to the RIB > + * @param depth > + * prefix length > + * @return > + * pointer to new rte_rib_node on success > + * NULL otherwise > + */ > +struct rte_rib_node * > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth); > + > +/** > + * Create RIB > + * > + * @param name > + * RIB name > + * @param socket_id > + * NUMA socket ID for RIB table memory allocation > + * @param conf > + * Structure containing the configuration > + * @return > + * Handle to RIB object on success > + * NULL otherwise with rte_errno set to an appropriate values. > + */ > +struct rte_rib * > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf *conf); > + > +/** > + * Find an existing RIB object and return a pointer to it. > + * > + * @param name > + * Name of the rib object as passed to rte_rib_create() > + * @return > + * Pointer to rib object or NULL if object not found with rte_errno > + * set appropriately. Possible rte_errno values include: > + * - ENOENT - required entry not available to return. > + */ > +struct rte_rib * > +rte_rib_find_existing(const char *name); > + > +/** > + * Free an RIB object. > + * > + * @param rib > + * RIB object handle > + * @return > + * None > + */ > +void > +rte_rib_free(struct rte_rib *rib); > + > +/** > + * Add a rule to the RIB. > + * > + * @param rib > + * RIB object handle > + * @param ip > + * IP of the rule to be added to the RIB > + * @param depth > + * Depth of the rule to be added to the RIB > + * @param next_hop > + * Next hop of the rule to be added to the RIB > + * @return > + * 0 on success, negative value otherwise > + */ > +int > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t next_hop); > + > +/** > + * Delete a rule from the RIB. > + * > + * @param rib > + * RIB object handle > + * @param ip > + * IP of the rule to be deleted from the RIB > + * @param depth > + * Depth of the rule to be deleted from the RIB > + * @return > + * 0 on success, negative value otherwise > + */ > +int > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth); > + > +/** > + * Lookup multiple IP addresses in an FIB. This may be implemented as a > + * macro, so the address of the function should not be used. > + * > + * @param RIB > + * RIB object handle > + * @param ips > + * Array of IPs to be looked up in the FIB > + * @param next_hops > + * Next hop of the most specific rule found for IP. > + * This is an array of eight byte values. > + * If the lookup for the given IP failed, then corresponding element would > + * contain default value, see description of then next parameter. > + * @param n > + * Number of elements in ips (and next_hops) array to lookup. This should be a > + * compile time constant, and divisible by 8 for best performance. > + * @param defv > + * Default value to populate into corresponding element of hop[] array, > + * if lookup would fail. > + * @return > + * -EINVAL for incorrect arguments, otherwise 0 > + */ > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n) \ > + rib->lookup(rib->fib, ips, next_hops, n) My main thought here is whether this needs to be a function at all? Given that it takes a full burst of addresses in a single go, how much performance would actually be lost by making this a regular function in the C file? IF we do convert this to a regular function, then a lot of the structure definitions above - most importantly, the rib structure itself - can probably be moved to a private header file and not exposed to applications at all. This will make ABI compatibility a *lot* easier, as the structures can be changed without affecting the public ABI. /Bruce > + > +#endif /* _RTE_RIB_H_ */