From: Bruce Richardson <bruce.richardson@intel.com>
To: Medvedkin Vladimir <medvedkinv@gmail.com>
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v2 1/2] Add RIB library
Date: Wed, 14 Mar 2018 11:09:18 +0000 [thread overview]
Message-ID: <20180314110918.GA11720@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <1519249495-16594-2-git-send-email-medvedkinv@gmail.com>
On Wed, Feb 21, 2018 at 09:44:54PM +0000, Medvedkin Vladimir wrote:
> RIB is an alternative to current LPM library.
> It solves the following problems
> - Increases the speed of control plane operations against lpm such as
> adding/deleting routes
> - Adds abstraction from dataplane algorithms, so it is possible to add
> different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
> in addition to current dir24_8
> - It is possible to keep user defined application specific additional
> information in struct rte_rib_node which represents route entry.
> It can be next hop/set of next hops (i.e. active and feasible),
> pointers to link rte_rib_node based on some criteria (i.e. next_hop),
> plenty of additional control plane information.
> - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
> field that helps to save 6 bits.
> - Also new dir24_8 implementation supports different next_hop sizes
> (1/2/4/8 bytes per next hop)
> - Removed RTE_LPM_LOOKUP_SUCCESS to save 1 bit and to eleminate ternary operator.
> Instead it returns special default value if there is no route.
>
> Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
> ---
> config/common_base | 6 +
> doc/api/doxy-api.conf | 1 +
> lib/Makefile | 2 +
> lib/librte_rib/Makefile | 22 ++
> lib/librte_rib/rte_dir24_8.c | 482 +++++++++++++++++++++++++++++++++
> lib/librte_rib/rte_dir24_8.h | 116 ++++++++
> lib/librte_rib/rte_rib.c | 526 +++++++++++++++++++++++++++++++++++++
> lib/librte_rib/rte_rib.h | 322 +++++++++++++++++++++++
> lib/librte_rib/rte_rib_version.map | 18 ++
> mk/rte.app.mk | 1 +
> 10 files changed, 1496 insertions(+)
> create mode 100644 lib/librte_rib/Makefile
> create mode 100644 lib/librte_rib/rte_dir24_8.c
> create mode 100644 lib/librte_rib/rte_dir24_8.h
> create mode 100644 lib/librte_rib/rte_rib.c
> create mode 100644 lib/librte_rib/rte_rib.h
> create mode 100644 lib/librte_rib/rte_rib_version.map
>
First pass review comments. For now just reviewed the main public header
file rte_rib.h. Later reviews will cover the other files as best I can.
/Bruce
<snip>
> diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
> new file mode 100644
> index 0000000..6eac8fb
> --- /dev/null
> +++ b/lib/librte_rib/rte_rib.h
> @@ -0,0 +1,322 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Vladimir Medvedkin <medvedkinv@gmail.com>
> + */
> +
> +#ifndef _RTE_RIB_H_
> +#define _RTE_RIB_H_
> +
> +/**
> + * @file
> + * Compressed trie implementation for Longest Prefix Match
> + */
> +
> +/** @internal Macro to enable/disable run-time checks. */
> +#if defined(RTE_LIBRTE_RIB_DEBUG)
> +#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do { \
> + if (cond) \
> + return retval; \
> +} while (0)
> +#else
> +#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
> +#endif
use RTE_ASSERT?
> +
> +#define RTE_RIB_VALID_NODE 1
should there be an INVALID_NODE macro?
> +#define RTE_RIB_GET_NXT_ALL 0
> +#define RTE_RIB_GET_NXT_COVER 1
> +
> +#define RTE_RIB_INVALID_ROUTE 0
> +#define RTE_RIB_VALID_ROUTE 1
> +
> +/** Max number of characters in RIB name. */
> +#define RTE_RIB_NAMESIZE 64
> +
> +/** Maximum depth value possible for IPv4 RIB. */
> +#define RTE_RIB_MAXDEPTH 32
I think we should have IPv4 in the name here. Will it not be extended to
support IPv6 in future?
> +
> +/**
> + * Macro to check if prefix1 {key1/depth1}
> + * is covered by prefix2 {key2/depth2}
> + */
> +#define RTE_RIB_IS_COVERED(key1, depth1, key2, depth2) \
> + ((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - depth2))) == 0)\
> + && (depth1 > depth2))
Neat check!
Any particular reason for using UINT64_MAX here rather than UINT32_MAX?
I think you can avoid the casting and have a slightly shorter mask by
changing "(uint32_t)(UINT64_MAX << (32 - depth2)" to
"~(UINT32_MAX >> depth2)"
I'd also suggest for readability putting the second check first, and,
for maintainability, using an inline function rather than a macro.
> +
> +/** @internal Macro to get next node in tree*/
> +#define RTE_RIB_GET_NXT_NODE(node, key) \
> + ((key & (1 << (31 - node->depth))) ? node->right : node->left)
> +/** @internal Macro to check if node is right child*/
> +#define RTE_RIB_IS_RIGHT_NODE(node) (node->parent->right == node)
Again, consider inline fns rather than macros.
For the latter macro, rather than doing additional pointer derefs to
parent, can you also get if it's a right node by using:
"(node->key & (1 << (32 - node->depth)))"?
> +
> +
> +struct rte_rib_node {
> + struct rte_rib_node *left;
> + struct rte_rib_node *right;
> + struct rte_rib_node *parent;
> + uint32_t key;
> + uint8_t depth;
> + uint8_t flag;
> + uint64_t nh;
> + uint64_t ext[0];
> +};
> +
> +struct rte_rib;
> +
> +/** Type of FIB struct*/
> +enum rte_rib_type {
> + RTE_RIB_DIR24_8_1B,
> + RTE_RIB_DIR24_8_2B,
> + RTE_RIB_DIR24_8_4B,
> + RTE_RIB_DIR24_8_8B,
> + RTE_RIB_TYPE_MAX
> +};
If the plan is to support multiple underlying fib types and algorithms
under the rib library, would it not be better to separate out the
algorithm part from the data storage part? So have the type just be
DIR_24_8, and have the 1, 2, 4 or 8 specified separately.
> +
> +enum rte_rib_op {
> + RTE_RIB_ADD,
> + RTE_RIB_DEL
> +};
> +
> +/** RIB nodes allocation type */
> +enum rte_rib_alloc_type {
> + RTE_RIB_MALLOC,
> + RTE_RIB_MEMPOOL,
> + RTE_RIB_ALLOC_MAX
> +};
Not sure you need this any more. Malloc allocations and mempool
allocations are now pretty much the same thing.
> +
> +typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
> + uint8_t depth, uint64_t next_hop, enum rte_rib_op op);
Do you anticipate more ops in future than just add and delete? If not,
why not just split this function into two and drop the op struct.
> +typedef int (*rte_rib_tree_lookup_fn_t)(void *fib, const uint32_t *ips,
> + uint64_t *next_hops, const unsigned n);
> +typedef struct rte_rib_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib *rib);
> +typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
> + struct rte_rib_node *node);
> +
> +struct rte_rib {
> + char name[RTE_RIB_NAMESIZE];
> + /*pointer to rib trie*/
> + struct rte_rib_node *trie;
> + /*pointer to dataplane struct*/
> + void *fib;
> + /*prefix modification*/
> + rte_rib_modify_fn_t modify;
> + /* Bulk lookup fn*/
> + rte_rib_tree_lookup_fn_t lookup;
> + /*alloc trie element*/
> + rte_rib_alloc_node_fn_t alloc_node;
> + /*free trie element*/
> + rte_rib_free_node_fn_t free_node;
> + struct rte_mempool *node_pool;
> + uint32_t cur_nodes;
> + uint32_t cur_routes;
> + int max_nodes;
> + int node_sz;
> + enum rte_rib_type type;
> + enum rte_rib_alloc_type alloc_type;
> +};
> +
> +/** RIB configuration structure */
> +struct rte_rib_conf {
> + enum rte_rib_type type;
> + enum rte_rib_alloc_type alloc_type;
> + int max_nodes;
> + size_t node_sz;
> + uint64_t def_nh;
> +};
> +
> +/**
> + * Lookup an IP into the RIB structure
> + *
> + * @param rib
> + * RIB object handle
> + * @param key
> + * IP to be looked up in the RIB
> + * @return
> + * pointer to struct rte_rib_node on success,
> + * NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup(struct rte_rib *rib, uint32_t key);
> +
> +/**
> + * Lookup less specific route into the RIB structure
> + *
> + * @param ent
> + * Pointer to struct rte_rib_node that represents target route
> + * @return
> + * pointer to struct rte_rib_node that represents
> + * less specific route on success,
> + * NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup_parent(struct rte_rib_node *ent);
> +
> +/**
> + * Lookup prefix into the RIB structure
> + *
> + * @param rib
> + * RIB object handle
> + * @param key
> + * net to be looked up in the RIB
> + * @param depth
> + * prefix length
> + * @return
> + * pointer to struct rte_rib_node on success,
> + * NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t depth);
Can you explain the difference between this and regular lookup, and how
they would be used. I don't think the names convey the differences
sufficiently, and so we should look to rename one or both to be clearer.
> +
> +/**
> + * Retrieve next more specific prefix from the RIB
s/more/most/
> + * that is covered by key/depth supernet
> + *
> + * @param rib
> + * RIB object handle
> + * @param key
> + * net address of supernet prefix that covers returned more specific prefixes
> + * @param depth
> + * supernet prefix length
> + * @param cur
> + * pointer to the last returned prefix to get next prefix
> + * or
> + * NULL to get first more specific prefix
> + * @param flag
> + * -RTE_RIB_GET_NXT_ALL
> + * get all prefixes from subtrie
By all prefixes do you mean more specific, i.e. the final prefix?
> + * -RTE_RIB_GET_NXT_COVER
> + * get only first more specific prefix even if it have more specifics
> + * @return
> + * pointer to the next more specific prefix
> + * or
> + * NULL if there is no prefixes left
> + */
> +struct rte_rib_node *
> +rte_rib_tree_get_nxt(struct rte_rib *rib, uint32_t key, uint8_t depth,
> + struct rte_rib_node *cur, int flag);
> +
> +/**
> + * Remove prefix from the RIB
> + *
> + * @param rib
> + * RIB object handle
> + * @param key
> + * net to be removed from the RIB
> + * @param depth
> + * prefix length
> + */
> +void
> +rte_rib_tree_remove(struct rte_rib *rib, uint32_t key, uint8_t depth);
> +
> +/**
> + * Insert prefix into the RIB
> + *
> + * @param rib
> + * RIB object handle
> + * @param key
> + * net to be inserted to the RIB
> + * @param depth
> + * prefix length
> + * @return
> + * pointer to new rte_rib_node on success
> + * NULL otherwise
> + */
> +struct rte_rib_node *
> +rte_rib_tree_insert(struct rte_rib *rib, uint32_t key, uint8_t depth);
> +
> +/**
> + * Create RIB
> + *
> + * @param name
> + * RIB name
> + * @param socket_id
> + * NUMA socket ID for RIB table memory allocation
> + * @param conf
> + * Structure containing the configuration
> + * @return
> + * Handle to RIB object on success
> + * NULL otherwise with rte_errno set to an appropriate values.
> + */
> +struct rte_rib *
> +rte_rib_create(const char *name, int socket_id, struct rte_rib_conf *conf);
> +
> +/**
> + * Find an existing RIB object and return a pointer to it.
> + *
> + * @param name
> + * Name of the rib object as passed to rte_rib_create()
> + * @return
> + * Pointer to rib object or NULL if object not found with rte_errno
> + * set appropriately. Possible rte_errno values include:
> + * - ENOENT - required entry not available to return.
> + */
> +struct rte_rib *
> +rte_rib_find_existing(const char *name);
> +
> +/**
> + * Free an RIB object.
> + *
> + * @param rib
> + * RIB object handle
> + * @return
> + * None
> + */
> +void
> +rte_rib_free(struct rte_rib *rib);
> +
> +/**
> + * Add a rule to the RIB.
> + *
> + * @param rib
> + * RIB object handle
> + * @param ip
> + * IP of the rule to be added to the RIB
> + * @param depth
> + * Depth of the rule to be added to the RIB
> + * @param next_hop
> + * Next hop of the rule to be added to the RIB
> + * @return
> + * 0 on success, negative value otherwise
> + */
> +int
> +rte_rib_add(struct rte_rib *rib, uint32_t ip, uint8_t depth, uint64_t next_hop);
> +
> +/**
> + * Delete a rule from the RIB.
> + *
> + * @param rib
> + * RIB object handle
> + * @param ip
> + * IP of the rule to be deleted from the RIB
> + * @param depth
> + * Depth of the rule to be deleted from the RIB
> + * @return
> + * 0 on success, negative value otherwise
> + */
> +int
> +rte_rib_delete(struct rte_rib *rib, uint32_t ip, uint8_t depth);
> +
> +/**
> + * Lookup multiple IP addresses in an FIB. This may be implemented as a
> + * macro, so the address of the function should not be used.
> + *
> + * @param RIB
> + * RIB object handle
> + * @param ips
> + * Array of IPs to be looked up in the FIB
> + * @param next_hops
> + * Next hop of the most specific rule found for IP.
> + * This is an array of eight byte values.
> + * If the lookup for the given IP failed, then corresponding element would
> + * contain default value, see description of then next parameter.
> + * @param n
> + * Number of elements in ips (and next_hops) array to lookup. This should be a
> + * compile time constant, and divisible by 8 for best performance.
> + * @param defv
> + * Default value to populate into corresponding element of hop[] array,
> + * if lookup would fail.
> + * @return
> + * -EINVAL for incorrect arguments, otherwise 0
> + */
> +#define rte_rib_fib_lookup_bulk(rib, ips, next_hops, n) \
> + rib->lookup(rib->fib, ips, next_hops, n)
My main thought here is whether this needs to be a function at all?
Given that it takes a full burst of addresses in a single go, how much
performance would actually be lost by making this a regular function in
the C file?
IF we do convert this to a regular function, then a lot of the structure
definitions above - most importantly, the rib structure itself - can
probably be moved to a private header file and not exposed to
applications at all. This will make ABI compatibility a *lot* easier, as
the structures can be changed without affecting the public ABI.
/Bruce
> +
> +#endif /* _RTE_RIB_H_ */
next prev parent reply other threads:[~2018-03-14 11:09 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-21 21:44 [dpdk-dev] [PATCH v2 0/2] lib/rib: Add Routing Information Base library Medvedkin Vladimir
2018-02-21 21:44 ` [dpdk-dev] [PATCH v2 1/2] Add RIB library Medvedkin Vladimir
2018-03-14 11:09 ` Bruce Richardson [this message]
2018-03-14 12:05 ` Richardson, Bruce
2018-03-25 18:17 ` Vladimir Medvedkin
2018-03-26 9:50 ` Bruce Richardson
2018-03-29 19:59 ` Vladimir Medvedkin
2018-03-29 10:27 ` Bruce Richardson
2018-03-29 20:11 ` Vladimir Medvedkin
2018-03-29 20:41 ` Bruce Richardson
2018-02-21 21:44 ` [dpdk-dev] [PATCH v2 2/2] Add autotests for " Medvedkin Vladimir
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180314110918.GA11720@bricha3-MOBL.ger.corp.intel.com \
--to=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=medvedkinv@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).