From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f175.google.com (mail-qk0-f175.google.com [209.85.220.175]) by dpdk.org (Postfix) with ESMTP id 23C6E5F18 for ; Sun, 25 Mar 2018 20:17:22 +0200 (CEST) Received: by mail-qk0-f175.google.com with SMTP id g184so17921518qkd.10 for ; Sun, 25 Mar 2018 11:17:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=51sNTIslNt3GBBo7Us+ZClLtP/u4Klip7omAjBaPUR0=; b=GezVbRrXNjB952YJidOe8C+51Ox7qNNeuFZfwIYMWL64aRxqWeP1qb/8D7E+sdvv72 BmcVM9393NOjFb3Bx5C5oP4wMO0JoIEf6+PmFMpO29dhY9meqjRihNu3sL8aLqqVDoZd 7tgjxh0xypZsmkiMGHrYYpD0aKxmjMjN6GTAYnuidL4m1+5F5iZx+e9gIWPQv8cu6Jql 7IVy8OHlLjRdNYM34GUIFdQaHVzo6926dkfXTaSdhcuIfq04QpG3NF4ptw8qK/zrQg8D /U/OauZl2sp0Yvj8zpeTQf0jwAOFGUbEqOPNd2AssaBQikJJmLGKItOGli9DMDpvLCWw clug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=51sNTIslNt3GBBo7Us+ZClLtP/u4Klip7omAjBaPUR0=; b=qdQ7+EC7upWbBzNzWAURHVhGS4NNjB5GTzvPIscqO1jNUME8sSAK814nGIkKfWpbFJ 0xB8rn/iT9S9dYZ+YslcpCHYLArx6kxAQwHxr0sZzWJmEr5GjHmYsyZqK2BqEwrWxSoT k5wT+UAfytWfgEYgN0v0JjfTdpxNXuoJeXMtKO1fKS9SD/kQ6GPD8Ew/BbO0L9NN02ZO woh833BF22VVtt9BHt0fHE++z4MVV6DTGwY5gmZMMUwi9wfhVqX9nZ3mqfXWvXeFyKRW hq2xl80qVJehiS1N50fzGMNbz8/0c7M+HT04gttso+YvclJb4QtnnWmivWMqgcnSulS+ HHig== X-Gm-Message-State: AElRT7GD/LRlRobD2fy6BYyrzQpop/PIJix4NIDNTFDklbr47byPy8Il vHW1DCH9zeSrXkELWZhfMBTvYzP7jGVRwcbCaa8= X-Google-Smtp-Source: AG47ELvkZGkC/guewrlYt/aFUUn97qfTt9dmzjujs0tTMlEhdSaNArV7TaNA5gCOz06m8kC/1qFL9GMmf7TYn7jnKu8= X-Received: by 10.55.10.6 with SMTP id 6mr50567059qkk.271.1522001841273; Sun, 25 Mar 2018 11:17:21 -0700 (PDT) MIME-Version: 1.0 Received: by 10.200.37.27 with HTTP; Sun, 25 Mar 2018 11:17:20 -0700 (PDT) In-Reply-To: <20180314110918.GA11720@bricha3-MOBL.ger.corp.intel.com> References: <1519249495-16594-1-git-send-email-medvedkinv@gmail.com> <1519249495-16594-2-git-send-email-medvedkinv@gmail.com> <20180314110918.GA11720@bricha3-MOBL.ger.corp.intel.com> From: Vladimir Medvedkin Date: Sun, 25 Mar 2018 21:17:20 +0300 Message-ID: To: Bruce Richardson Cc: dev@dpdk.org Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Subject: Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 25 Mar 2018 18:17:23 -0000 Hi, 2018-03-14 14:09 GMT+03:00 Bruce Richardson : > On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote: > > RIB is an alternative to current LPM library. > > It solves the following problems > > - Increases the speed of control plane operations against lpm such as > > adding/deleting routes > > - Adds abstraction from dataplane algorithms, so it is possible to add > > different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc > > in addition to current dir24_8 > > - It is possible to keep user defined application specific additional > > information in struct rte_rib_node which represents route entry. > > It can be next hop/set of next hops (i.e. active and feasible), > > pointers to link rte_rib_node based on some criteria (i.e. next_hop), > > plenty of additional control plane information. > > - For dir24_8 implementation it is possible to remove > rte_lpm_tbl_entry.depth > > field that helps to save 6 bits. > > - Also new dir24_8 implementation supports different next_hop sizes > > (1/2/4/8 bytes per next hop) > > - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary > operator. > > Instead it returns special default value if there is no route. > > > > Signed-off-by: Medvedkin Vladimir > > --- > > config/common_base | 6 + > > doc/api/doxy-api.conf | 1 + > > lib/Makefile | 2 + > > lib/librte_rib/Makefile | 22 ++ > > lib/librte_rib/rte_dir24_8.c | 482 ++++++++++++++++++++++++++++++ > +++ > > lib/librte_rib/rte_dir24_8.h | 116 ++++++++ > > lib/librte_rib/rte_rib.c | 526 ++++++++++++++++++++++++++++++ > +++++++ > > lib/librte_rib/rte_rib.h | 322 +++++++++++++++++++++++ > > lib/librte_rib/rte_rib_version.map | 18 ++ > > mk/rte.app.mk | 1 + > > 10 files changed, 1496 insertions(+) > > create mode 100644 lib/librte_rib/Makefile > > create mode 100644 lib/librte_rib/rte_dir24_8.c > > create mode 100644 lib/librte_rib/rte_dir24_8.h > > create mode 100644 lib/librte_rib/rte_rib.c > > create mode 100644 lib/librte_rib/rte_rib.h > > create mode 100644 lib/librte_rib/rte_rib_version.map > > > > First pass review comments. For now just reviewed the main public header > file rte_rib.h. Later reviews will cover the other files as best I can. > > /Bruce > > > > diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h > > new file mode 100644 > > index 0000000..6eac8fb > > --- /dev/null > > +++ b/lib/librte_rib/rte_rib.h > > @@ -0,0 +1,322 @@ > > +/* SPDX-License-Identifier: BSD-3-Clause > > + * Copyright(c) 2018 Vladimir Medvedkin > > + */ > > + > > +#ifndef _RTE_RIB_H_ > > +#define _RTE_RIB_H_ > > + > > +/** > > + * @file > > + * Compressed trie implementation for Longest Prefix Match > > + */ > > + > > +/** @internal Macro to enable/disable run-time checks. */ > > +#if defined(RTE_LIBRTE_RIB_DEBUG) > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do { \ > > + if (cond) \ > > + return retval; \ > > +} while (0) > > +#else > > +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) > > +#endif > > use RTE_ASSERT? > it was done just like it was done in the LPM lib. But if you think it should be RTE_ASSERT so be it. > > > + > > +#define RTE_RIB_VALID_NODE 1 > > should there be an INVALID_NODE macro? > No > > > +#define RTE_RIB_GET_NXT_ALL 0 > > +#define RTE_RIB_GET_NXT_COVER 1 > > + > > +#define RTE_RIB_INVALID_ROUTE 0 > > +#define RTE_RIB_VALID_ROUTE 1 > > + > > +/** Max number of characters in RIB name. */ > > +#define RTE_RIB_NAMESIZE 64 > > + > > +/** Maximum depth value possible for IPv4 RIB. */ > > +#define RTE_RIB_MAXDEPTH 32 > > I think we should have IPv4 in the name here. Will it not be extended to > support IPv6 in future? > I think there should be a separate implementation of the library for ipv6 > > > + > > +/** > > + * Macro to check if prefix1 {key1/depth1} > > + * is covered by prefix2 {key2/depth2} > > + */ > > +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2) > \ > > + ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\ > > + && (depth1 > depth2)) > Neat check! > > Any particular reason for using UINT64_MAX here rather than UINT32_MAX? in case when depth2 = 0 UINT32_MAX shifted left by 32 bit will remain UINT32_MAX because shift count will be masked to 5 bits. I think you can avoid the casting and have a slightly shorter mask by > changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to > "~(UINT32_MAX >> depth2)" > I'd also suggest for readability putting the second check first, and, > for maintainability, using an inline function rather than a macro. > Agree, it looks clearer > > + > > +/** @internal Macro to get next node in tree*/ > > +#define RTE_RIB_GET_NXT_NODE(node, key) > \ > > + ((key & (1 << (31 - node->depth))) ? node->right : node->left) > > +/** @internal Macro to check if node is right child*/ > > +#define RTE_RIB_IS_RIGHT_NODE(node) (node->parent->right == node) > > Again, consider inline fns rather than macros. > Ok For the latter macro, rather than doing additional pointer derefs to > parent, can you also get if it's a right node by using: > "(node->key & (1 << (32 - node->depth)))"? > No, it is not possible. Decision whether node be left or right is made using parent and child common depth. Consider case with 10.0.0.0/8 and 10.128.0.0/24. In this way common depth will be /8 and 10.128.0.0/24 will be right child. > > + > > + > > +struct rte_rib_node { > > + struct rte_rib_node *left; > > + struct rte_rib_node *right; > > + struct rte_rib_node *parent; > > + uint32_t key; > > + uint8_t depth; > > + uint8_t flag; > > + uint64_t nh; > > + uint64_t ext[0]; > > +}; > > + > > +struct rte_rib; > > + > > +/** Type of FIB struct*/ > > +enum rte_rib_type { > > + RTE_RIB_DIR24_8_1B, > > + RTE_RIB_DIR24_8_2B, > > + RTE_RIB_DIR24_8_4B, > > + RTE_RIB_DIR24_8_8B, > > + RTE_RIB_TYPE_MAX > > +}; > > If the plan is to support multiple underlying fib types and algorithms > under the rib library, would it not be better to separate out the > algorithm part from the data storage part? So have the type just be > DIR_24_8, and have the 1, 2, 4 or 8 specified separately. > Yes, we were talk about it in IRC, agree. Now I pass next hop size in union rte_rib_fib_conf inside rte_rib_conf > > > + > > +enum rte_rib_op { > > + RTE_RIB_ADD, > > + RTE_RIB_DEL > > +}; > > + > > +/** RIB nodes allocation type */ > > +enum rte_rib_alloc_type { > > + RTE_RIB_MALLOC, > > + RTE_RIB_MEMPOOL, > > + RTE_RIB_ALLOC_MAX > > +}; > > Not sure you need this any more. Malloc allocations and mempool > allocations are now pretty much the same thing. > Actually I think to remove malloc. On performance tests with adding/deleting huge amount of routes malloc is slower. Maybe because of fragmentation. What do you think? > > + > > +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key, > > + uint8_t depth, uint64_t next_hop, enum rte_rib_op op); > > Do you anticipate more ops in future than just add and delete? If not, > why not just split this function into two and drop the op struct. > It is difficult question. I'm not ready to make decision at the moment. > > > +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips, > > + uint64_t *next_hops, const unsigned n); > > +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib > *rib); > > +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib, > > + struct rte_rib_node *node); > > + > > +struct rte_rib { > > + char name[RTE_RIB_NAMESIZE]; > > + /*pointer to rib trie*/ > > + struct rte_rib_node *trie; > > + /*pointer to dataplane struct*/ > > + void *fib; > > + /*prefix modification*/ > > + rte_rib_modify_fn_t modify; > > + /* Bulk lookup fn*/ > > + rte_rib_tree_lookup_fn_t lookup; > > + /*alloc trie element*/ > > + rte_rib_alloc_node_fn_t alloc_node; > > + /*free trie element*/ > > + rte_rib_free_node_fn_t free_node; > > + struct rte_mempool *node_pool; > > + uint32_t cur_nodes; > > + uint32_t cur_routes; > > + int max_nodes; > > + int node_sz; > > + enum rte_rib_type type; > > + enum rte_rib_alloc_type alloc_type; > > +}; > > + > > +/** RIB configuration structure */ > > +struct rte_rib_conf { > > + enum rte_rib_type type; > > + enum rte_rib_alloc_type alloc_type; > > + int max_nodes; > > + size_t node_sz; > > + uint64_t def_nh; > > +}; > > + > > +/** > > + * Lookup an IP into the RIB structure > > + * > > + * @param rib > > + * RIB object handle > > + * @param key > > + * IP to be looked up in the RIB > > + * @return > > + * pointer to struct rte_rib_node on success, > > + * NULL otherwise > > + */ > > +struct rte_rib_node * > > +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key); > > + > > +/** > > + * Lookup less specific route into the RIB structure > > + * > > + * @param ent > > + * Pointer to struct rte_rib_node that represents target route > > + * @return > > + * pointer to struct rte_rib_node that represents > > + * less specific route on success, > > + * NULL otherwise > > + */ > > +struct rte_rib_node * > > +rte_rib_tree_lookup_parent(struct rte_rib_node *ent); > > + > > +/** > > + * Lookup prefix into the RIB structure > > + * > > + * @param rib > > + * RIB object handle > > + * @param key > > + * net to be looked up in the RIB > > + * @param depth > > + * prefix length > > + * @return > > + * pointer to struct rte_rib_node on success, > > + * NULL otherwise > > + */ > > +struct rte_rib_node * > > +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t > depth); > > Can you explain the difference between this and regular lookup, and how > they would be used. I don't think the names convey the differences > sufficiently, and so we should look to rename one or both to be clearer. > Regular lookup (rte_rib_tree_lookup) will lookup for most specific node for passed key. rte_rib_tree_lookup_exact will lookup node contained key and depth equal to passed in args. It used to find exact route. > > > + > > +/** > > + * Retrieve next more specific prefix from the RIB > s/more/most/ > > > + * that is covered by key/depth supernet > > + * > > + * @param rib > > + * RIB object handle > > + * @param key > > + * net address of supernet prefix that covers returned more specific > prefixes > > + * @param depth > > + * supernet prefix length > > + * @param cur > > + * pointer to the last returned prefix to get next prefix > > + * or > > + * NULL to get first more specific prefix > > + * @param flag > > + * -RTE_RIB_GET_NXT_ALL > > + * get all prefixes from subtrie > > By all prefixes do you mean more specific, i.e. the final prefix? > What do you mean the final prefix? > > + * -RTE_RIB_GET_NXT_COVER > > + * get only first more specific prefix even if it have more specifics > > + * @return > > + * pointer to the next more specific prefix > > + * or > > + * NULL if there is no prefixes left > > + */ > > +struct rte_rib_node * > > +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth, > > + struct rte_rib_node *cur, int flag); > > + > > +/** > > + * Remove prefix from the RIB > > + * > > + * @param rib > > + * RIB object handle > > + * @param key > > + * net to be removed from the RIB > > + * @param depth > > + * prefix length > > + */ > > +void > > +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth); > > + > > +/** > > + * Insert prefix into the RIB > > + * > > + * @param rib > > + * RIB object handle > > + * @param key > > + * net to be inserted to the RIB > > + * @param depth > > + * prefix length > > + * @return > > + * pointer to new rte_rib_node on success > > + * NULL otherwise > > + */ > > +struct rte_rib_node * > > +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth); > > + > > +/** > > + * Create RIB > > + * > > + * @param name > > + * RIB name > > + * @param socket_id > > + * NUMA socket ID for RIB table memory allocation > > + * @param conf > > + * Structure containing the configuration > > + * @return > > + * Handle to RIB object on success > > + * NULL otherwise with rte_errno set to an appropriate values. > > + */ > > +struct rte_rib * > > +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf > *conf); > > + > > +/** > > + * Find an existing RIB object and return a pointer to it. > > + * > > + * @param name > > + * Name of the rib object as passed to rte_rib_create() > > + * @return > > + * Pointer to rib object or NULL if object not found with rte_errno > > + * set appropriately. Possible rte_errno values include: > > + * - ENOENT - required entry not available to return. > > + */ > > +struct rte_rib * > > +rte_rib_find_existing(const char *name); > > + > > +/** > > + * Free an RIB object. > > + * > > + * @param rib > > + * RIB object handle > > + * @return > > + * None > > + */ > > +void > > +rte_rib_free(struct rte_rib *rib); > > + > > +/** > > + * Add a rule to the RIB. > > + * > > + * @param rib > > + * RIB object handle > > + * @param ip > > + * IP of the rule to be added to the RIB > > + * @param depth > > + * Depth of the rule to be added to the RIB > > + * @param next_hop > > + * Next hop of the rule to be added to the RIB > > + * @return > > + * 0 on success, negative value otherwise > > + */ > > +int > > +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t > next_hop); > > + > > +/** > > + * Delete a rule from the RIB. > > + * > > + * @param rib > > + * RIB object handle > > + * @param ip > > + * IP of the rule to be deleted from the RIB > > + * @param depth > > + * Depth of the rule to be deleted from the RIB > > + * @return > > + * 0 on success, negative value otherwise > > + */ > > +int > > +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth); > > + > > +/** > > + * Lookup multiple IP addresses in an FIB. This may be implemented as a > > + * macro, so the address of the function should not be used. > > + * > > + * @param RIB > > + * RIB object handle > > + * @param ips > > + * Array of IPs to be looked up in the FIB > > + * @param next_hops > > + * Next hop of the most specific rule found for IP. > > + * This is an array of eight byte values. > > + * If the lookup for the given IP failed, then corresponding element > would > > + * contain default value, see description of then next parameter. > > + * @param n > > + * Number of elements in ips (and next_hops) array to lookup. This > should be a > > + * compile time constant, and divisible by 8 for best performance. > > + * @param defv > > + * Default value to populate into corresponding element of hop[] > array, > > + * if lookup would fail. > > + * @return > > + * -EINVAL for incorrect arguments, otherwise 0 > > + */ > > +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n) \ > > + rib->lookup(rib->fib, ips, next_hops, n) > > My main thought here is whether this needs to be a function at all? > Given that it takes a full burst of addresses in a single go, how much > performance would actually be lost by making this a regular function in > the C file? > IF we do convert this to a regular function, then a lot of the structure > definitions above - most importantly, the rib structure itself - can > probably be moved to a private header file and not exposed to > applications at all. This will make ABI compatibility a *lot* easier, as > the structures can be changed without affecting the public ABI. > I didn't quite understand what you mean. > /Bruce > > > + > > +#endif /* _RTE_RIB_H_ */ > -- Regards, Vladimir