DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [RFC] Add RIB library
@ 2017-07-11 19:33 Medvedkin Vladimir
  2017-07-11 19:33 ` [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library Medvedkin Vladimir
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Medvedkin Vladimir @ 2017-07-11 19:33 UTC (permalink / raw)
  To: dev; +Cc: Medvedkin Vladimir

Hi,

I want to introduce new library for ip routing lookup that have some advantages
over current LPM library. In short:
     - Increases the speed of control plane operations against lpm such as
       adding/deleting routes
     - Adds abstraction from dataplane algorythms, so it is possible to add
       different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
       in addition to current dir24_8
     - It is possible to keep user defined application specific additional
       information in struct rte_rib_v4_node which represents route entry.
       It can be next hop/set of next hops (i.e. active and feasible),
       pointers to link rte_rib_v4_node based on some criteria (i.e. next_hop),
       plenty of additional control plane information.
     - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
       field that helps to save 6 bits.
     - Also new dir24_8 implementation supports different next_hop sizes
       (1/2/4/8 bytes per next hop)

It would be nice to hear your opinion. The draft is below.

Medvedkin Vladimir (1):
  lib/rib: Add Routing Information Base library

 config/common_base           |   6 +
 doc/api/doxy-api.conf        |   1 +
 lib/Makefile                 |   2 +
 lib/librte_rib/Makefile      |  43 ++++
 lib/librte_rib/rte_dir24_8.c | 411 +++++++++++++++++++++++++++++++++++++++
 lib/librte_rib/rte_dir24_8.h | 144 ++++++++++++++
 lib/librte_rib/rte_rib.c     | 454 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_rib/rte_rib.h     | 260 +++++++++++++++++++++++++
 8 files changed, 1321 insertions(+)
 create mode 100644 lib/librte_rib/Makefile
 create mode 100644 lib/librte_rib/rte_dir24_8.c
 create mode 100644 lib/librte_rib/rte_dir24_8.h
 create mode 100644 lib/librte_rib/rte_rib.c
 create mode 100644 lib/librte_rib/rte_rib.h

-- 
1.9.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library
  2017-07-11 19:33 [dpdk-dev] [RFC] Add RIB library Medvedkin Vladimir
@ 2017-07-11 19:33 ` Medvedkin Vladimir
  2017-07-11 20:28   ` Stephen Hemminger
  2017-07-11 20:31 ` [dpdk-dev] [RFC] Add RIB library Stephen Hemminger
  2017-08-14 10:51 ` Bruce Richardson
  2 siblings, 1 reply; 12+ messages in thread
From: Medvedkin Vladimir @ 2017-07-11 19:33 UTC (permalink / raw)
  To: dev; +Cc: Medvedkin Vladimir

RIB is an alternative to current LPM library.
It solves the following problems
 - Increases the speed of control plane operations against lpm such as
   adding/deleting routes
 - Adds abstraction from dataplane algorythms, so it is possible to add
   different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
   in addition to current dir24_8
 - It is possible to keep user defined application specific additional
   information in struct rte_rib_v4_node which represents route entry.
   It can be next hop/set of next hops (i.e. active and feasible),
   pointers to link rte_rib_v4_node based on some criteria (i.e. next_hop),
   plenty of additional control plane information.
 - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
   field that helps to save 6 bits.
 - Also new dir24_8 implementation supports different next_hop sizes
   (1/2/4/8 bytes per next hop)

Signed-off-by: Medvedkin Vladimir <medvedkinv@gmail.com>
---
 config/common_base           |   6 +
 doc/api/doxy-api.conf        |   1 +
 lib/Makefile                 |   2 +
 lib/librte_rib/Makefile      |  43 ++++
 lib/librte_rib/rte_dir24_8.c | 411 +++++++++++++++++++++++++++++++++++++++
 lib/librte_rib/rte_dir24_8.h | 144 ++++++++++++++
 lib/librte_rib/rte_rib.c     | 454 +++++++++++++++++++++++++++++++++++++++++++
 lib/librte_rib/rte_rib.h     | 260 +++++++++++++++++++++++++
 8 files changed, 1321 insertions(+)
 create mode 100644 lib/librte_rib/Makefile
 create mode 100644 lib/librte_rib/rte_dir24_8.c
 create mode 100644 lib/librte_rib/rte_dir24_8.h
 create mode 100644 lib/librte_rib/rte_rib.c
 create mode 100644 lib/librte_rib/rte_rib.h

diff --git a/config/common_base b/config/common_base
index 8ae6e92..1b0921f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -615,6 +615,12 @@ CONFIG_RTE_LIBRTE_LPM=y
 CONFIG_RTE_LIBRTE_LPM_DEBUG=n
 
 #
+# Compile librte_rib
+#
+CONFIG_RTE_LIBRTE_RIB=y
+CONFIG_RTE_LIBRTE_RIB_DEBUG=n
+
+#
 # Compile librte_acl
 #
 CONFIG_RTE_LIBRTE_ACL=y
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 823554f..f80c8d5 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -54,6 +54,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_kvargs \
                           lib/librte_latencystats \
                           lib/librte_lpm \
+			  lib/librte_rib \
                           lib/librte_mbuf \
                           lib/librte_mempool \
                           lib/librte_meter \
diff --git a/lib/Makefile b/lib/Makefile
index 86caba1..943c383 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -61,6 +61,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_EFD) += librte_efd
 DEPDIRS-librte_efd := librte_eal librte_ring librte_hash
 DIRS-$(CONFIG_RTE_LIBRTE_LPM) += librte_lpm
 DEPDIRS-librte_lpm := librte_eal
+DIRS-$(CONFIG_RTE_LIBRTE_RIB) += librte_rib
+DEPDIRS-librte_rib := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_ACL) += librte_acl
 DEPDIRS-librte_acl := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_NET) += librte_net
diff --git a/lib/librte_rib/Makefile b/lib/librte_rib/Makefile
new file mode 100644
index 0000000..8d434a3
--- /dev/null
+++ b/lib/librte_rib/Makefile
@@ -0,0 +1,43 @@
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Vladimir Medvedkin <medvedkinv@gmail.com>
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_rib.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_rib.c rte_dir24_8.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_rib.h rte_dir24_8.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_rib/rte_dir24_8.c b/lib/librte_rib/rte_dir24_8.c
new file mode 100644
index 0000000..915df3d
--- /dev/null
+++ b/lib/librte_rib/rte_dir24_8.c
@@ -0,0 +1,411 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <rte_memory.h>
+#include <rte_branch_prediction.h>
+
+#include <rte_rib.h>
+#include <rte_dir24_8.h>
+
+#define ROUNDUP(x, y)   ((((x - 1) >> (32 - y)) + 1) << (32 - y))
+#define RTE_DIR24_8_GET_TBL24_P(fib, ip)			\
+	((void *)&((uint8_t *)fib->tbl24)[(ip &			\
+		RTE_DIR24_8_TBL24_MASK) >> (8 - fib->nh_sz)])	\
+
+int
+rte_dir24_8_lookup_1b(void *fib_p, uint32_t ip, uint64_t *next_hop)
+{
+	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
+	uint32_t tbl24_index = (ip >> 8);
+
+	RTE_RIB_RETURN_IF_TRUE(((fib_p == NULL) ||
+		(next_hop == NULL)), -EINVAL);
+
+	RTE_DIR24_8_LOOKUP(uint8_t);
+
+	return (tbl_entry & RTE_DIR24_8_LOOKUP_SUCCESS) ? 0 : -ENOENT;
+}
+
+int
+rte_dir24_8_lookup_2b(void *fib_p, uint32_t ip, uint64_t *next_hop)
+{
+	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
+	uint32_t tbl24_index = (ip >> 8);
+
+	RTE_RIB_RETURN_IF_TRUE(((fib_p == NULL) ||
+		(next_hop == NULL)), -EINVAL);
+
+	RTE_DIR24_8_LOOKUP(uint16_t);
+
+	return (tbl_entry & RTE_DIR24_8_LOOKUP_SUCCESS) ? 0 : -ENOENT;
+}
+
+int
+rte_dir24_8_lookup_4b(void *fib_p, uint32_t ip, uint64_t *next_hop)
+{
+	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
+	uint32_t tbl24_index = (ip >> 8);
+
+	RTE_RIB_RETURN_IF_TRUE(((fib_p == NULL) ||
+		(next_hop == NULL)), -EINVAL);
+
+	RTE_DIR24_8_LOOKUP(uint32_t);
+
+	return (tbl_entry & RTE_DIR24_8_LOOKUP_SUCCESS) ? 0 : -ENOENT;
+}
+
+int
+rte_dir24_8_lookup_8b(void *fib_p, uint32_t ip, uint64_t *next_hop)
+{
+	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
+	uint32_t tbl24_index = (ip >> 8);
+
+	RTE_RIB_RETURN_IF_TRUE(((fib_p == NULL) ||
+		(next_hop == NULL)), -EINVAL);
+
+	RTE_DIR24_8_LOOKUP(uint64_t);
+
+	return (tbl_entry & RTE_DIR24_8_LOOKUP_SUCCESS) ? 0 : -ENOENT;
+}
+
+static int
+tbl8_alloc(struct rte_dir24_8_tbl *fib)
+{
+	uint32_t i;
+	uint8_t *ptr8 = (uint8_t *)fib->tbl8;
+	uint16_t *ptr16 = (uint16_t *)fib->tbl8;
+	uint32_t *ptr32 = (uint32_t *)fib->tbl8;
+	uint64_t *ptr64 = (uint64_t *)fib->tbl8;
+
+	if (fib->cur_tbl8s >= fib->number_tbl8s)
+		return -ENOSPC;
+
+	switch (fib->nh_sz) {
+	case RTE_DIR24_8_1B:
+		for (i = 0; i < fib->number_tbl8s; i++) {
+			if ((ptr8[i * RTE_DIR24_8_TBL8_GRP_NUM_ENT] &
+					RTE_DIR24_8_VALID_EXT_ENT) !=
+					RTE_DIR24_8_VALID_EXT_ENT) {
+				fib->cur_tbl8s++;
+				return i;
+			}
+		}
+		break;
+	case RTE_DIR24_8_2B:
+		for (i = 0; i < fib->number_tbl8s; i++) {
+			if ((ptr16[i * RTE_DIR24_8_TBL8_GRP_NUM_ENT] &
+					RTE_DIR24_8_VALID_EXT_ENT) !=
+					RTE_DIR24_8_VALID_EXT_ENT) {
+				fib->cur_tbl8s++;
+				return i;
+			}
+		}
+	case RTE_DIR24_8_4B:
+		for (i = 0; i < fib->number_tbl8s; i++) {
+			if ((ptr32[i * RTE_DIR24_8_TBL8_GRP_NUM_ENT] &
+					RTE_DIR24_8_VALID_EXT_ENT) !=
+					RTE_DIR24_8_VALID_EXT_ENT) {
+				fib->cur_tbl8s++;
+				return i;
+			}
+		}
+	case RTE_DIR24_8_8B:
+		for (i = 0; i < fib->number_tbl8s; i++) {
+			if ((ptr64[i * RTE_DIR24_8_TBL8_GRP_NUM_ENT] &
+					RTE_DIR24_8_VALID_EXT_ENT) !=
+					RTE_DIR24_8_VALID_EXT_ENT) {
+				fib->cur_tbl8s++;
+				return i;
+			}
+		}
+	}
+	return -ENOSPC;
+}
+
+static void
+write_to_fib(void *ptr, uint64_t val, enum rte_dir24_8_nh_sz size, int n)
+{
+	int i;
+	uint8_t *ptr8 = (uint8_t *)ptr;
+	uint16_t *ptr16 = (uint16_t *)ptr;
+	uint32_t *ptr32 = (uint32_t *)ptr;
+	uint64_t *ptr64 = (uint64_t *)ptr;
+
+	switch (size) {
+	case RTE_DIR24_8_1B:
+		for (i = 0; i < n; i++)
+			ptr8[i] = (uint8_t)val;
+		break;
+	case RTE_DIR24_8_2B:
+		for (i = 0; i < n; i++)
+			ptr16[i] = (uint16_t)val;
+		break;
+	case RTE_DIR24_8_4B:
+		for (i = 0; i < n; i++)
+			ptr32[i] = (uint32_t)val;
+		break;
+	case RTE_DIR24_8_8B:
+		for (i = 0; i < n; i++)
+			ptr64[i] = (uint64_t)val;
+		break;
+	}
+}
+
+static int
+install_to_fib(struct rte_dir24_8_tbl *fib, uint32_t ledge, uint32_t redge,
+	uint64_t next_hop, int valid)
+{
+	int tbl8_idx;
+	uint64_t tbl24_tmp;
+	uint8_t *tbl8_ptr;
+
+	if (ROUNDUP(ledge, 24) <= redge) {
+		if (ledge < ROUNDUP(ledge, 24)) {
+			tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, ledge);
+			if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
+					RTE_DIR24_8_VALID_EXT_ENT) {
+				tbl8_idx = tbl8_alloc(fib);
+				if (tbl8_idx < 0)
+					return tbl8_idx;
+				tbl8_ptr = (uint8_t *)fib->tbl8 +
+					((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
+					fib->nh_sz);
+				/*Init tbl8 entries with nexthop from tbl24*/
+				write_to_fib((void *)tbl8_ptr, tbl24_tmp|
+					RTE_DIR24_8_VALID_EXT_ENT, fib->nh_sz,
+					RTE_DIR24_8_TBL8_GRP_NUM_ENT);
+				/*update dir24 entry with tbl8 index*/
+				write_to_fib(RTE_DIR24_8_GET_TBL24_P(fib, ledge),
+					(tbl8_idx << 2)|
+					RTE_DIR24_8_VALID_EXT_ENT|
+					valid, fib->nh_sz, 1);
+			} else
+				tbl8_idx = tbl24_tmp >> 2;
+			tbl8_ptr = (uint8_t *)fib->tbl8 +
+				(((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) +
+				(ledge & ~RTE_DIR24_8_TBL24_MASK)) <<
+				fib->nh_sz);
+			/*update tbl8 with new next hop*/
+			write_to_fib((void *)tbl8_ptr, (next_hop << 2)|
+				RTE_DIR24_8_VALID_EXT_ENT|valid,
+				fib->nh_sz, ROUNDUP(ledge, 24) - ledge);
+		}
+		if (ROUNDUP(ledge, 24) < (redge & RTE_DIR24_8_TBL24_MASK)) {
+			write_to_fib(RTE_DIR24_8_GET_TBL24_P(fib,
+				ROUNDUP(ledge, 24)), (next_hop << 2)|valid,
+				fib->nh_sz, ((redge & RTE_DIR24_8_TBL24_MASK) -
+				ROUNDUP(ledge, 24)) >> 8);
+		}
+		if (redge & ~RTE_DIR24_8_TBL24_MASK) {
+			tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, redge);
+			if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
+					RTE_DIR24_8_VALID_EXT_ENT) {
+				tbl8_idx = tbl8_alloc(fib);
+				if (tbl8_idx < 0)
+					return tbl8_idx;
+				tbl8_ptr = (uint8_t *)fib->tbl8 +
+					((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
+					fib->nh_sz);
+				/*Init tbl8 entries with nexthop from tbl24*/
+				write_to_fib((void *)tbl8_ptr, tbl24_tmp|
+					RTE_DIR24_8_VALID_EXT_ENT, fib->nh_sz,
+					RTE_DIR24_8_TBL8_GRP_NUM_ENT);
+				/*update dir24 entry with tbl8 index*/
+				write_to_fib(RTE_DIR24_8_GET_TBL24_P(fib, redge),
+					(tbl8_idx << 2)|
+					RTE_DIR24_8_VALID_EXT_ENT|
+					valid, fib->nh_sz, 1);
+			} else
+				tbl8_idx = tbl24_tmp >> 2;
+			tbl8_ptr = (uint8_t *)fib->tbl8 +
+				((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
+				fib->nh_sz);
+			/*update tbl8 with new next hop*/
+			write_to_fib((void *)tbl8_ptr, (next_hop << 2)|
+				RTE_DIR24_8_VALID_EXT_ENT|valid,
+				fib->nh_sz, redge & ~RTE_DIR24_8_TBL24_MASK);
+		}
+	} else {
+		tbl24_tmp = RTE_DIR24_8_GET_TBL24(fib, ledge);
+			if ((tbl24_tmp & RTE_DIR24_8_VALID_EXT_ENT) !=
+				RTE_DIR24_8_VALID_EXT_ENT) {
+				tbl8_idx = tbl8_alloc(fib);
+				if (tbl8_idx < 0)
+					return tbl8_idx;
+				tbl8_ptr = (uint8_t *)fib->tbl8 +
+					((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
+					fib->nh_sz);
+
+				/*Init tbl8 entries with nexthop from tbl24*/
+				write_to_fib((void *)tbl8_ptr, tbl24_tmp|
+					RTE_DIR24_8_VALID_EXT_ENT, fib->nh_sz,
+					RTE_DIR24_8_TBL8_GRP_NUM_ENT);
+				/*update dir24 entry with tbl8 index*/
+				write_to_fib(RTE_DIR24_8_GET_TBL24_P(fib, ledge),
+					(tbl8_idx << 2)|
+					RTE_DIR24_8_VALID_EXT_ENT|
+					valid, fib->nh_sz, 1);
+			} else
+				tbl8_idx = tbl24_tmp >> 2;
+			tbl8_ptr = (uint8_t *)fib->tbl8 +
+				(((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) +
+				(ledge & ~RTE_DIR24_8_TBL24_MASK)) <<
+				fib->nh_sz);
+			/*update tbl8 with new next hop*/
+			write_to_fib((void *)tbl8_ptr, (next_hop << 2)|
+				RTE_DIR24_8_VALID_EXT_ENT|valid,
+				fib->nh_sz, redge - ledge);
+	}
+	return 0;
+}
+
+int
+rte_dir24_8_modify(struct rte_rib *rib, uint32_t ip, uint8_t mask_len,
+	uint64_t next_hop, enum rte_rib_op rib_op)
+{
+	int ret, valid = RTE_DIR24_8_LOOKUP_SUCCESS;
+	struct rte_rib_v4_node *ent, *tmp;
+	struct rte_rib_simple_ext *ext, *tmp_ext;
+	struct rte_dir24_8_tbl *fib;
+	uint32_t ledge, redge;
+	int tbl8_idx;
+	uint8_t *tbl8_ptr;
+
+	if ((rib == NULL) || (mask_len > RTE_RIB_V4_MAXDEPTH))
+		return -EINVAL;
+
+	fib = rib->fib;
+	ip &= (uint32_t)(UINT64_MAX << (32 - mask_len));
+
+	ent = rte_rib_v4_lookup_exact(rib, ip, mask_len);
+	if (rib_op == RTE_RIB_OP_ADD) {
+		if (ent)
+			return -EEXIST;
+		ent = rte_rib_v4_insert(rib, ip, mask_len);
+	}
+	if (ent == NULL)
+		return -ENOENT;
+	ext = (struct rte_rib_simple_ext *)ent->ext;
+
+	switch (rib_op) {
+	case RTE_RIB_OP_DEL:
+		tmp = rte_rib_v4_lookup_parent(ent);
+		if (tmp == NULL) {
+			valid = 0;
+			break;
+		}
+		tmp_ext = (struct rte_rib_simple_ext *)tmp->ext;
+		if (ext->next_hop == tmp_ext->next_hop)
+			goto delete_finish;
+		next_hop = tmp_ext->next_hop;
+		break;
+	case RTE_RIB_OP_MODIFY:
+		if (ext->next_hop == next_hop)
+			return 0;
+		break;
+	case RTE_RIB_OP_ADD:
+		ext->next_hop = next_hop;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	tmp = NULL;
+	ledge = ip;
+	do {
+		tmp = rte_rib_v4_get_next_child(rib, ip, mask_len, tmp,
+			GET_NXT_COVER);
+		if (tmp != NULL) {
+			if (tmp->mask_len == mask_len)
+				continue;
+			redge = tmp->key;
+			if (ledge == redge) {
+				ledge = redge +
+					(uint32_t)(1ULL << (32 - tmp->mask_len));
+				continue;
+			}
+			ret = install_to_fib(rib->fib, ledge, redge,
+				next_hop, valid);
+			if (ret != 0) {
+				if ((ret == -ENOSPC) && (mask_len > 24))
+					rte_rib_v4_remove(rib, ip, mask_len);
+				return ret;
+			}
+			ledge = redge +
+				(uint32_t)(1ULL << (32 - tmp->mask_len));
+		} else {
+			redge = ip + (uint32_t)(1ULL << (32 - mask_len));
+			ret = install_to_fib(rib->fib, ledge, redge,
+				next_hop, valid);
+			if (ret != 0) {
+				if ((ret == -ENOSPC) && (mask_len > 24))
+					rte_rib_v4_remove(rib, ip, mask_len);
+				return ret;
+			}
+		}
+	} while (tmp);
+
+	if (rib_op == RTE_RIB_OP_DEL) {
+		/*tbl8 recycle check*/
+delete_finish:
+		rte_rib_v4_remove(rib, ip, mask_len);
+		if (mask_len > 24) {
+			tmp = rte_rib_v4_get_next_child(rib,
+				(ip & RTE_DIR24_8_TBL24_MASK), 24, NULL,
+				GET_NXT_ALL);
+			if (tmp == NULL) {
+				ent = rte_rib_v4_lookup(rib, ip);
+				if (ent) {
+					ext = (struct rte_rib_simple_ext *)ent->ext;
+					next_hop = ext->next_hop << 2|
+						RTE_DIR24_8_LOOKUP_SUCCESS;
+				} else {
+					next_hop = 0;
+				}
+				tbl8_idx = RTE_DIR24_8_GET_TBL24(fib, ip) >> 2;
+				tbl8_ptr = (uint8_t *)fib->tbl8 +
+					((tbl8_idx * RTE_DIR24_8_TBL8_GRP_NUM_ENT) <<
+					fib->nh_sz);
+				/*update tbl24*/
+				write_to_fib(RTE_DIR24_8_GET_TBL24_P(fib, ip),
+					next_hop, fib->nh_sz, 1);
+				/*Cleanup tbl8*/
+				write_to_fib((void *)tbl8_ptr, 0, fib->nh_sz,
+					RTE_DIR24_8_TBL8_GRP_NUM_ENT);
+			}
+		}
+	}
+	return 0;
+}
diff --git a/lib/librte_rib/rte_dir24_8.h b/lib/librte_rib/rte_dir24_8.h
new file mode 100644
index 0000000..080cd6e
--- /dev/null
+++ b/lib/librte_rib/rte_dir24_8.h
@@ -0,0 +1,144 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIR24_8_H_
+#define _RTE_DIR24_8_H_
+
+/**
+ * @file
+ * RTE Longest Prefix Match (LPM)
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/** @internal Total number of tbl24 entries. */
+#define RTE_DIR24_8_TBL24_NUM_ENT	(1 << 24)
+
+/** Maximum depth value possible for IPv4 LPM. */
+#define RTE_DIR24_8_MAX_DEPTH		32
+
+/** @internal Number of entries in a tbl8 group. */
+#define RTE_DIR24_8_TBL8_GRP_NUM_ENT	256
+
+/** @internal bitmask with valid and valid_group fields set */
+#define RTE_DIR24_8_VALID_EXT_ENT	0x03
+
+/** Bitmask used to indicate successful lookup */
+#define RTE_DIR24_8_LOOKUP_SUCCESS	0x01
+
+#define RTE_DIR24_8_TBL24_MASK		0xffffff00
+
+/** Size of nexthop (1 << nh_sz) bits */
+enum rte_dir24_8_nh_sz {
+	RTE_DIR24_8_1B,
+	RTE_DIR24_8_2B,
+	RTE_DIR24_8_4B,
+	RTE_DIR24_8_8B
+};
+
+#define DIR24_8_TBL_IDX(a)	((a) >> (3 - fib->nh_sz))
+#define DIR24_8_PSD_IDX(a)	((a) & ((1 << (3 - fib->nh_sz)) - 1))
+#define DIR24_8_BITS_IN_NH	(8 * (1 << fib->nh_sz))
+
+#define DIR24_8_TBL24_VAL(ip)	(ip >> 8)
+#define DIR24_8_TBL8_VAL(res, ip)					\
+	((res >> 2) * RTE_DIR24_8_TBL8_GRP_NUM_ENT + (uint8_t)ip)	\
+
+#define DIR24_8_LOOKUP_MSK	((((uint64_t)1 << ((1 << (fib->nh_sz + 3)) - 1)) << 1) - 1)
+#define RTE_DIR24_8_GET_TBL24(fib, ip)					\
+	((fib->tbl24[DIR24_8_TBL_IDX(DIR24_8_TBL24_VAL(ip))] >>		\
+	(DIR24_8_PSD_IDX(DIR24_8_TBL24_VAL(ip)) *			\
+	DIR24_8_BITS_IN_NH)) & DIR24_8_LOOKUP_MSK)			\
+
+#define RTE_DIR24_8_GET_TBL8(fib, res, ip)				\
+	((fib->tbl8[DIR24_8_TBL_IDX(DIR24_8_TBL8_VAL(res, ip))] >>	\
+	(DIR24_8_PSD_IDX(DIR24_8_TBL8_VAL(res, ip)) *			\
+	DIR24_8_BITS_IN_NH)) & DIR24_8_LOOKUP_MSK)			\
+
+#define RTE_DIR24_8_LOOKUP(type)					\
+	const type *ptbl = (const type *)				\
+		&((type *)fib->tbl24)[tbl24_index];			\
+	type tbl_entry = *ptbl;						\
+	if (unlikely((tbl_entry & RTE_DIR24_8_VALID_EXT_ENT) ==		\
+			RTE_DIR24_8_VALID_EXT_ENT)) {			\
+		uint32_t tbl8_index = (uint8_t)ip + ((tbl_entry >> 2) *	\
+				RTE_DIR24_8_TBL8_GRP_NUM_ENT);		\
+									\
+		ptbl = (const type *)&((type *)fib->tbl8)[tbl8_index];	\
+		tbl_entry = *ptbl;					\
+	}								\
+									\
+	*next_hop = (uint64_t)tbl_entry >> 2				\
+									\
+
+struct rte_dir24_8_tbl {
+	uint32_t	number_tbl8s;	/**< Total number of tbl8s. */
+	uint32_t	cur_tbl8s;	/**< Current cumber of tbl8s. */
+	enum rte_dir24_8_nh_sz	nh_sz;	/**< Size of nexthop entry */
+	uint64_t	*tbl8;		/**< LPM tbl8 table. */
+	uint64_t	tbl24[0] __rte_cache_aligned; /**< LPM tbl24 table. */
+};
+
+int rte_dir24_8_modify(struct rte_rib *rib, uint32_t key, uint8_t mask_len,
+	uint64_t next_hop, enum rte_rib_op rib_op);
+int rte_dir24_8_lookup_1b(void *fib_p, uint32_t ip, uint64_t *next_hop);
+int rte_dir24_8_lookup_2b(void *fib_p, uint32_t ip, uint64_t *next_hop);
+int rte_dir24_8_lookup_4b(void *fib_p, uint32_t ip, uint64_t *next_hop);
+int rte_dir24_8_lookup_8b(void *fib_p, uint32_t ip, uint64_t *next_hop);
+int rte_dir24_8_modify(struct rte_rib *rib, uint32_t key, uint8_t mask_len,
+	uint64_t next_hop, enum rte_rib_op rib_op);
+
+static inline int
+rte_dir24_8_lookup(void *fib_p, uint32_t ip, uint64_t *next_hop)
+{
+	uint64_t res;
+	struct rte_dir24_8_tbl *fib = (struct rte_dir24_8_tbl *)fib_p;
+
+	/* DEBUG: Check user input arguments. */
+	RTE_RIB_RETURN_IF_TRUE(((fib == NULL) || (ip == NULL) ||
+		(next_hop == NULL)), -EINVAL);
+
+	res = RTE_DIR24_8_GET_TBL24(fib, ip);
+	if (unlikely((res & RTE_DIR24_8_VALID_EXT_ENT) ==
+		RTE_DIR24_8_VALID_EXT_ENT)) {
+		res = RTE_DIR24_8_GET_TBL8(fib, res, ip);
+	}
+	*next_hop = res >> 2;
+	return (res & RTE_DIR24_8_LOOKUP_SUCCESS) ? 0 : -ENOENT;
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DIR24_8_H_ */
+
diff --git a/lib/librte_rib/rte_rib.c b/lib/librte_rib/rte_rib.c
new file mode 100644
index 0000000..0fbe9a5
--- /dev/null
+++ b/lib/librte_rib/rte_rib.c
@@ -0,0 +1,454 @@
+/*
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Vladimir Medvedkin <medvedkinv@gmail.com>
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <x86intrin.h>
+#include <sys/queue.h>
+
+#include <rte_eal.h>
+#include <rte_eal_memconfig.h>
+#include <rte_common.h>
+#include <rte_tailq.h>
+#include <rte_errno.h>
+#include <rte_rwlock.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_mempool.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+
+#include <rte_rib.h>
+#include <rte_dir24_8.h>
+
+TAILQ_HEAD(rte_rib_list, rte_tailq_entry);
+static struct rte_tailq_elem rte_rib_tailq = {
+	.name = "RTE_RIB",
+};
+EAL_REGISTER_TAILQ(rte_rib_tailq)
+
+#define MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
+
+static struct rte_rib_v4_node *
+new_node_malloc(struct rte_rib *rib)
+{
+	return malloc(rib->node_sz);
+}
+
+static void
+free_node_malloc(__rte_unused struct rte_rib *rib, struct rte_rib_v4_node *ent)
+{
+	free(ent);
+}
+
+static struct rte_rib_v4_node *
+new_node_mempool(struct rte_rib *rib)
+{
+	struct rte_rib_v4_node *ent;
+	int ret;
+
+	ret = rte_mempool_get(rib->node_pool, (void *)&ent);
+	if (ret != 0)
+		return NULL;
+	return ent;
+}
+
+static void
+free_node_mempool(struct rte_rib *rib, struct rte_rib_v4_node *ent)
+{
+	rte_mempool_put(rib->node_pool, ent);
+}
+
+struct rte_rib_v4_node *
+rte_rib_v4_lookup(struct rte_rib *rib, uint32_t key)
+{
+	struct rte_rib_v4_node *cur = rib->trie;
+	struct rte_rib_v4_node *prev = NULL;
+
+	while ((cur != NULL) && (((cur->key ^ key) &
+		(uint32_t)(UINT64_MAX << (32 - cur->mask_len))) != 0)) {
+		if (cur->flag & VALID_NODE)
+			prev = cur;
+		cur = GET_NXT_NODE(cur, key);
+	}
+	return prev;
+}
+
+struct rte_rib_v4_node *
+rte_rib_v4_lookup_parent(struct rte_rib_v4_node *ent)
+{
+	struct rte_rib_v4_node *tmp;
+
+	if (ent == NULL)
+		return NULL;
+	tmp = ent->parent;
+	while ((tmp != NULL) && (tmp->flag & VALID_NODE) != VALID_NODE)
+		tmp = tmp->parent;
+	return tmp;
+}
+
+struct rte_rib_v4_node *
+rte_rib_v4_lookup_exact(struct rte_rib *rib, uint32_t key, uint8_t mask_len)
+{
+	struct rte_rib_v4_node *cur = rib->trie;
+
+	key &= (uint32_t)(UINT64_MAX << (32 - mask_len));
+	while (cur != NULL) {
+		if ((cur->key == key) && (cur->mask_len == mask_len) &&
+				(cur->flag & VALID_NODE))
+			return cur;
+		if ((cur->mask_len > mask_len) ||
+				(((uint64_t)key >> (32 - cur->mask_len)) !=
+				((uint64_t)cur->key >> (32 - cur->mask_len))))
+			break;
+		cur = GET_NXT_NODE(cur, key);
+	}
+	return NULL;
+}
+
+struct rte_rib_v4_node *
+rte_rib_v4_get_next_child(struct rte_rib *rib, uint32_t key,
+	uint8_t mask_len, struct rte_rib_v4_node *cur, int flag)
+{
+	struct rte_rib_v4_node *tmp, *prev = NULL;
+
+	if (cur == NULL) {
+		tmp = rib->trie;
+		while ((tmp) && (tmp->mask_len < mask_len))
+			tmp = GET_NXT_NODE(tmp, key);
+	} else {
+		tmp = cur;
+		while ((tmp->parent != NULL) && (IS_RIGHT_NODE(tmp) ||
+				(tmp->parent->right == NULL))) {
+			tmp = tmp->parent;
+			if ((tmp->flag & VALID_NODE) &&
+				(IS_COVERED(tmp->key, tmp->mask_len, key, mask_len)))
+				return tmp;
+		}
+		tmp = (tmp->parent) ? tmp->parent->right : NULL;
+	}
+	while (tmp) {
+		if ((tmp->flag & VALID_NODE) &&
+			(IS_COVERED(tmp->key, tmp->mask_len, key, mask_len))) {
+			prev = tmp;
+			if (flag == GET_NXT_COVER)
+				return prev;
+		}
+		tmp = (tmp->left) ? tmp->left : tmp->right;
+	}
+	return prev;
+}
+
+void
+rte_rib_v4_remove(struct rte_rib *rib, uint32_t key, uint8_t mask_len)
+{
+	struct rte_rib_v4_node *cur, *prev, *child;
+
+	cur = rte_rib_v4_lookup_exact(rib, key, mask_len);
+	if (cur == NULL)
+		return;
+
+	cur->flag &= ~VALID_NODE;
+	while ((cur->flag & VALID_NODE) != VALID_NODE) {
+		if ((cur->left != NULL) && (cur->right != NULL)) {
+			rib->free_node(rib, cur);
+			return;
+		}
+		child = (cur->left == NULL) ? cur->right : cur->left;
+		if (child != NULL)
+			child->parent = cur->parent;
+		if (cur->parent == NULL) {
+			rib->trie = child;
+			rib->free_node(rib, cur);
+			return;
+		}
+		if (cur->parent->left == cur)
+			cur->parent->left = child;
+		else
+			cur->parent->right = child;
+		prev = cur;
+		cur = cur->parent;
+		rib->free_node(rib, prev);
+	}
+}
+
+struct rte_rib_v4_node *
+rte_rib_v4_insert(struct rte_rib *rib, uint32_t key, uint8_t mask_len)
+{
+	struct rte_rib_v4_node **tmp = &rib->trie;
+	struct rte_rib_v4_node *prev = NULL;
+	struct rte_rib_v4_node *new_node = NULL;
+	struct rte_rib_v4_node *common_node = NULL;
+	int i = 0;
+	uint32_t common_prefix;
+	uint8_t common_mask_len;
+
+	if (mask_len > 32)
+		return NULL;
+
+	key &= (uint32_t)(UINT64_MAX << (32 - mask_len));
+	new_node = rte_rib_v4_lookup_exact(rib, key, mask_len);
+	if (new_node)
+		return NULL;
+
+	new_node = rib->alloc_node(rib);
+	if (new_node == NULL) {
+		RTE_LOG(ERR, LPM, "RIB node allocation failed\n");
+		return NULL;
+	}
+	new_node->left = NULL;
+	new_node->right = NULL;
+	new_node->parent = NULL;
+	new_node->key = key;
+	new_node->mask_len = mask_len;
+	new_node->flag = VALID_NODE;
+
+	while (1) {
+		if (*tmp == NULL) {
+			*tmp = new_node;
+			new_node->parent = prev;
+		}
+		if ((key == (*tmp)->key) && (mask_len == (*tmp)->mask_len)) {
+			if (new_node != *tmp) {
+				rib->free_node(rib, new_node);
+				(*tmp)->flag |= VALID_NODE;
+			}
+			return *tmp;
+		}
+		i = (*tmp)->mask_len;
+		if ((i >= mask_len) || (((uint64_t)key >> (32 - i)) !=
+				((uint64_t)(*tmp)->key >> (32 - i))))
+			break;
+		prev = *tmp;
+		tmp = (key & (1 << (31 - i))) ? &(*tmp)->right : &(*tmp)->left;
+	}
+	common_mask_len = MIN(mask_len, (*tmp)->mask_len);
+	common_prefix = key ^ (*tmp)->key;
+#ifdef __LZCNT__
+	i = _lzcnt_u32(common_prefix);
+#else
+	for (i = 0; i <= common_mask_len; i++) {
+		if ((common_prefix & (1 << (31 - i))) != 0)
+			break;
+	}
+#endif
+	common_mask_len = MIN(i, common_mask_len);
+	common_prefix = key & (uint32_t)(UINT64_MAX << (32 - common_mask_len));
+	if ((common_prefix == key) && (common_mask_len == mask_len)) {
+		if ((*tmp)->key & (1 << (31 - mask_len)))
+			new_node->right = *tmp;
+		else
+			new_node->left = *tmp;
+		new_node->parent = (*tmp)->parent;
+		(*tmp)->parent = new_node;
+		*tmp = new_node;
+	} else {
+		common_node = rib->alloc_node(rib);
+		if (common_node == NULL) {
+			RTE_LOG(ERR, LPM, "RIB node allocation failed\n");
+			rib->free_node(rib, new_node);
+			return NULL;
+		}
+		common_node->key = common_prefix;
+		common_node->mask_len = common_mask_len;
+		common_node->flag = 0;
+		common_node->parent = (*tmp)->parent;
+		new_node->parent = common_node;
+		(*tmp)->parent = common_node;
+		if ((new_node->key & (1 << (31 - common_mask_len))) == 0) {
+			common_node->left = new_node;
+			common_node->right = *tmp;
+		} else {
+			common_node->left = *tmp;
+			common_node->right = new_node;
+		}
+		*tmp = common_node;
+	}
+	return new_node;
+}
+
+struct rte_rib *
+rte_rib_v4_create(const char *name, int socket_id, struct rte_rib_conf *conf)
+{
+	char mem_name[RTE_RIB_NAMESIZE];
+	struct rte_rib *rib = NULL;
+	struct rte_tailq_entry *te;
+	struct rte_rib_list *rib_list;
+	int dir24_8_nh_size;
+	rte_rib_lookup_fn_t lookup_fn;
+
+	/* Check user arguments. */
+	if ((name == NULL) || (conf == NULL) || (socket_id < -1) ||
+			(conf->type >= RTE_RIB_TYPE_MAX) ||
+			(conf->alloc_type >= RTE_RIB_ALLOC_MAX) ||
+			(conf->max_nodes == 0) ||
+			(conf->node_sz < sizeof(struct rte_rib_v4_node))) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mem_name, sizeof(mem_name), "RIB_%s", name);
+
+	rib_list = RTE_TAILQ_CAST(rte_rib_tailq.head, rte_rib_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	/* guarantee there's no existing */
+	TAILQ_FOREACH(te, rib_list, next) {
+		rib = (struct rte_rib *)te->data;
+		if (strncmp(name, rib->name, RTE_RIB_NAMESIZE) == 0)
+			break;
+	}
+	rib = NULL;
+	if (te != NULL) {
+		rte_errno = EEXIST;
+		goto exit;
+	}
+
+	/* allocate tailq entry */
+	te = rte_zmalloc("RIB_TAILQ_ENTRY", sizeof(*te), 0);
+	if (te == NULL) {
+		RTE_LOG(ERR, LPM, "Failed to allocate tailq entry\n");
+		goto exit;
+	}
+
+	/* Allocate memory to store the LPM data structures. */
+	rib = (struct rte_rib *)rte_zmalloc_socket(mem_name,
+		sizeof(struct rte_rib),	RTE_CACHE_LINE_SIZE, socket_id);
+	if (rib == NULL) {
+		RTE_LOG(ERR, LPM, "RIB memory allocation failed\n");
+		goto free_te;
+	}
+	snprintf(rib->name, sizeof(rib->name), "%s", name);
+	rib->trie = NULL;
+	rib->node_sz = conf->node_sz;
+
+	snprintf(mem_name, sizeof(mem_name), "FIB_%s", name);
+
+	if (conf->type <= RTE_RIB_DIR24_8_8B) {
+		switch (conf->type) {
+		case RTE_RIB_DIR24_8_1B:
+			dir24_8_nh_size = RTE_DIR24_8_1B;
+			lookup_fn = rte_dir24_8_lookup_1b;
+			break;
+		case RTE_RIB_DIR24_8_2B:
+			dir24_8_nh_size = RTE_DIR24_8_2B;
+			lookup_fn = rte_dir24_8_lookup_2b;
+			break;
+		case RTE_RIB_DIR24_8_4B:
+			dir24_8_nh_size = RTE_DIR24_8_4B;
+			lookup_fn = rte_dir24_8_lookup_4b;
+			break;
+		case RTE_RIB_DIR24_8_8B:
+			dir24_8_nh_size = RTE_DIR24_8_8B;
+			lookup_fn = rte_dir24_8_lookup_8b;
+			break;
+		case RTE_RIB_TYPE_MAX:
+		default:
+			RTE_LOG(ERR, LPM, "Bad RIB type\n");
+			goto free_rib;
+		}
+		rib->fib = (void *)rte_zmalloc_socket(mem_name,
+			sizeof(struct rte_dir24_8_tbl) +
+			RTE_DIR24_8_TBL24_NUM_ENT * (1 << dir24_8_nh_size),
+			RTE_CACHE_LINE_SIZE, socket_id);
+		if (rib->fib == NULL) {
+			RTE_LOG(ERR, LPM, "Failed to allocate FIB\n");
+			goto free_rib;
+		}
+		snprintf(mem_name, sizeof(mem_name), "TBL8_%s", name);
+		((struct rte_dir24_8_tbl *)rib->fib)->tbl8 =
+			(void *)rte_zmalloc_socket(mem_name,
+			RTE_DIR24_8_TBL8_GRP_NUM_ENT * (1 << dir24_8_nh_size) *
+			conf->number_tbl8s, RTE_CACHE_LINE_SIZE, socket_id);
+		if (((struct rte_dir24_8_tbl *)rib->fib)->tbl8 == NULL) {
+			RTE_LOG(ERR, LPM, "Failed to allocate TBL8\n");
+			rte_free(rib->fib);
+			goto free_rib;
+		}
+		rib->modify_cb = rte_dir24_8_modify;
+		rib->lookup_cb = lookup_fn;
+		((struct rte_dir24_8_tbl *)rib->fib)->nh_sz =
+			(enum rte_dir24_8_nh_sz)dir24_8_nh_size;
+		((struct rte_dir24_8_tbl *)rib->fib)->number_tbl8s =
+			conf->number_tbl8s;
+	}
+
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	switch (conf->alloc_type) {
+	case RTE_RIB_MALLOC:
+		rib->alloc_node = new_node_malloc;
+		rib->free_node = free_node_malloc;
+		break;
+	case RTE_RIB_MEMPOOL:
+		snprintf(mem_name, sizeof(mem_name), "MP_%s", name);
+		rib->node_pool = rte_mempool_create(mem_name, conf->max_nodes,
+			conf->node_sz, 0, 0, NULL, NULL, NULL, NULL,
+			socket_id, 0);
+
+		if (rib->node_pool == NULL) {
+			RTE_LOG(ERR, LPM, "Failed to allocate mempool\n");
+			rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+			goto free_fib;
+		}
+		rib->alloc_node = new_node_mempool;
+		rib->free_node = free_node_mempool;
+		break;
+	case RTE_RIB_ALLOC_MAX:
+	default:
+		RTE_LOG(ERR, LPM, "Bad RIB alloc type\n");
+		rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+		goto free_fib;
+	}
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+
+	te->data = (void *)rib;
+	TAILQ_INSERT_TAIL(rib_list, te, next);
+
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return rib;
+
+free_fib:
+	rte_free(((struct rte_dir24_8_tbl *)rib->fib)->tbl8);
+	rte_free(rib->fib);
+free_rib:
+	rte_free(rib);
+	rib = NULL;
+free_te:
+	rte_free(te);
+exit:
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return NULL;
+}
diff --git a/lib/librte_rib/rte_rib.h b/lib/librte_rib/rte_rib.h
new file mode 100644
index 0000000..279c85c
--- /dev/null
+++ b/lib/librte_rib/rte_rib.h
@@ -0,0 +1,260 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Vladimir Medvedkin <medvedkinv@gmail.com>
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_RIB_H_
+#define _RTE_RIB_H_
+
+/**
+ * @file
+ * Compressed trie implementation for Longest Prefix Match
+ */
+
+/** @internal Macro to enable/disable run-time checks. */
+#if defined(RTE_LIBRTE_DIR24_8_DEBUG)
+#define RTE_RIB_RETURN_IF_TRUE(cond, retval) do {	\
+	if (cond)					\
+		return retval;				\
+} while (0)
+#else
+#define RTE_RIB_RETURN_IF_TRUE(cond, retval)
+#endif
+
+#define VALID_NODE	1
+#define GET_NXT_ALL	0
+#define GET_NXT_COVER	1
+
+/** Max number of characters in RIB name. */
+#define RTE_RIB_NAMESIZE	64
+
+/** Maximum depth value possible for IPv4 RIB. */
+#define RTE_RIB_V4_MAXDEPTH	32
+
+/**
+ * Macro to check if prefix1 {key1/mask_len1}
+ * is covered by prefix2 {key2/mask_len2}
+ */
+#define IS_COVERED(key1, mask_len1, key2, mask_len2)			     \
+	((((key1 ^ key2) & (uint32_t)(UINT64_MAX << (32 - mask_len2))) == 0) \
+		&& (mask_len1 > mask_len2))
+
+/** @internal Macro to get next node in tree*/
+#define GET_NXT_NODE(node, key)						 \
+	((key & (1 << (31 - node->mask_len))) ? node->right : node->left)
+/** @internal Macro to check if node is right child*/
+#define IS_RIGHT_NODE(node)	(node->parent->right == node)
+
+
+struct rte_rib_v4_node {
+	struct rte_rib_v4_node *left;
+	struct rte_rib_v4_node *right;
+	struct rte_rib_v4_node *parent;
+	uint32_t	key;
+	uint8_t		mask_len;
+	uint8_t		flag;
+	uint64_t	ext[0];
+};
+
+/** simple extension to keep only single next_hop */
+struct rte_rib_simple_ext {
+	uint64_t next_hop;
+};
+
+struct rte_rib;
+
+enum rte_rib_op {
+	RTE_RIB_OP_ADD = 0,
+	RTE_RIB_OP_DEL,
+	RTE_RIB_OP_MODIFY
+};
+
+/** Type of FIB struct*/
+enum rte_rib_type {
+	RTE_RIB_DIR24_8_1B,
+	RTE_RIB_DIR24_8_2B,
+	RTE_RIB_DIR24_8_4B,
+	RTE_RIB_DIR24_8_8B,
+	RTE_RIB_TYPE_MAX
+};
+
+/** RIB nodes allocation type */
+enum rte_rib_alloc_type {
+	RTE_RIB_MALLOC,
+	RTE_RIB_MEMPOOL,
+	RTE_RIB_ALLOC_MAX
+};
+
+typedef int (*rte_rib_modify_fn_t)(struct rte_rib *rib, uint32_t key,
+	uint8_t mask_len, uint64_t next_hop, enum rte_rib_op rib_op);
+typedef int (*rte_rib_lookup_fn_t)(void *fib, uint32_t key, uint64_t *next_hop);
+typedef struct rte_rib_v4_node *(*rte_rib_alloc_node_fn_t)(struct rte_rib *rib);
+typedef void (*rte_rib_free_node_fn_t)(struct rte_rib *rib,
+	struct rte_rib_v4_node *node);
+
+struct rte_rib {
+	char name[RTE_RIB_NAMESIZE];
+	/*pointer to rib trie*/
+	struct rte_rib_v4_node	*trie;
+	/*pointer to dataplane struct*/
+	void	*fib;
+	/*prefix modification callback*/
+	rte_rib_modify_fn_t	modify_cb;
+	/* lookup fn callback*/
+	rte_rib_lookup_fn_t	lookup_cb;
+	/*alloc trie element*/
+	rte_rib_alloc_node_fn_t	alloc_node;
+	/*free trie element*/
+	rte_rib_free_node_fn_t	free_node;
+	void	*node_pool;
+	int	node_sz;
+};
+
+/** RIB configuration structure */
+struct rte_rib_conf {
+	enum rte_rib_type type;
+	int	max_nodes;
+	size_t	node_sz;
+	int	number_tbl8s;
+	enum rte_rib_alloc_type alloc_type;
+};
+
+/**
+ * Lookup an IP into the RIB structure
+ *
+ * @param rib
+ *  RIB object handle
+ * @param key
+ *  IP to be looked up in the RIB
+ * @return
+ *  pointer to struct rte_rib_v4_node on success,
+ *  NULL otherwise
+ */
+struct rte_rib_v4_node *rte_rib_v4_lookup(struct rte_rib *rib, uint32_t key);
+
+/**
+ * Lookup less specific route into the RIB structure
+ *
+ * @param ent
+ *  Pointer to struct rte_rib_v4_node that represents target route
+ * @return
+ *  pointer to struct rte_rib_v4_node that represents
+ *  less specific route on success,
+ *  NULL otherwise
+ */
+struct rte_rib_v4_node *rte_rib_v4_lookup_parent(struct rte_rib_v4_node *ent);
+
+/**
+ * Lookup prefix into the RIB structure
+ *
+ * @param rib
+ *  RIB object handle
+ * @param key
+ *  net to be looked up in the RIB
+ * @param mask_len
+ *  prefix length
+ * @return
+ *  pointer to struct rte_rib_v4_node on success,
+ *  NULL otherwise
+ */
+struct rte_rib_v4_node *rte_rib_v4_lookup_exact(struct rte_rib *rib,
+	uint32_t key, uint8_t mask_len);
+
+/**
+ * Retrieve next more specific prefix from the RIB
+ * that is covered by key/mask_len supernet
+ *
+ * @param rib
+ *  RIB object handle
+ * @param key
+ *  net address of supernet prefix that covers returned more specific prefixes
+ * @param mask_len
+ *  supernet prefix length
+ * @param cur
+ *   pointer to the last returned prefix to get next prefix
+ *   or
+ *   NULL to get first more specific prefix
+ * @param flag
+ *  -GET_NXT_ALL
+ *   get all prefixes from subtrie
+ *  -GET_NXT_COVER
+ *   get only first more specific prefix even if it have more specifics
+ * @return
+ *  pointer to the next more specific prefix
+ *  or
+ *  NULL if there is no prefixes left
+ */
+struct rte_rib_v4_node *rte_rib_v4_get_next_child(struct rte_rib *rib,
+	uint32_t key, uint8_t mask_len, struct rte_rib_v4_node *cur, int flag);
+
+/**
+ * Remove prefix from the RIB
+ *
+ * @param rib
+ *  RIB object handle
+ * @param key
+ *  net to be removed from the RIB
+ * @param mask_len
+ *  prefix length
+ */
+void rte_rib_v4_remove(struct rte_rib *rib, uint32_t key, uint8_t mask_len);
+
+/**
+ * Insert prefix into the RIB
+ *
+ * @param rib
+ *  RIB object handle
+ * @param key
+ *  net to be inserted to the RIB
+ * @param mask_len
+ *  prefix length
+ * @return
+ *  pointer to new rte_rib_v4_node on success
+ *  NULL otherwise
+ */
+struct rte_rib_v4_node *rte_rib_v4_insert(struct rte_rib *rib, uint32_t key,
+	uint8_t mask_len);
+
+/**
+ * Create RIB
+ *
+ * @param name
+ *  RIB name
+ * @param socket_id
+ *  NUMA socket ID for RIB table memory allocation
+ * @param conf
+ *  Structure containing the configuration
+ * @return
+ *  Handle to LPM object on success
+ *  NULL otherwise with rte_errno set to an appropriate values.
+ */
+struct rte_rib *rte_rib_v4_create(const char *name, int socket_id,
+	struct rte_rib_conf *conf);
+
+#endif /* _RTE_RIB_H_ */
+
-- 
1.9.1

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library
  2017-07-11 19:33 ` [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library Medvedkin Vladimir
@ 2017-07-11 20:28   ` Stephen Hemminger
  2017-07-11 23:17     ` Vladimir Medvedkin
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-11 20:28 UTC (permalink / raw)
  To: Medvedkin Vladimir; +Cc: dev

On Tue, 11 Jul 2017 19:33:05 +0000
Medvedkin Vladimir <medvedkinv@gmail.com> wrote:

> +
> +#define ROUNDUP(x, y)   ((((x - 1) >> (32 - y)) + 1) << (32 - y))

There is already RTE_ALIGN_FLOOR/RTE_ALIGN_CEIL


> +#define RTE_DIR24_8_GET_TBL24_P(fib, ip)			\
> +	((void *)&((uint8_t *)fib->tbl24)[(ip &			\
> +		RTE_DIR24_8_TBL24_MASK) >> (8 - fib->nh_sz)])	\
> +

Why is this a macro and not an inline function.
The expresion could also be split up to be simpler, and compiler
would generate same result.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-07-11 19:33 [dpdk-dev] [RFC] Add RIB library Medvedkin Vladimir
  2017-07-11 19:33 ` [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library Medvedkin Vladimir
@ 2017-07-11 20:31 ` Stephen Hemminger
  2017-07-11 23:13   ` Vladimir Medvedkin
  2017-08-14 10:51 ` Bruce Richardson
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2017-07-11 20:31 UTC (permalink / raw)
  To: Medvedkin Vladimir; +Cc: dev

On Tue, 11 Jul 2017 19:33:04 +0000
Medvedkin Vladimir <medvedkinv@gmail.com> wrote:

> Hi,
> 
> I want to introduce new library for ip routing lookup that have some advantages
> over current LPM library. In short:
>      - Increases the speed of control plane operations against lpm such as
>        adding/deleting routes
>      - Adds abstraction from dataplane algorythms, so it is possible to add
>        different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
>        in addition to current dir24_8
>      - It is possible to keep user defined application specific additional
>        information in struct rte_rib_v4_node which represents route entry.
>        It can be next hop/set of next hops (i.e. active and feasible),
>        pointers to link rte_rib_v4_node based on some criteria (i.e. next_hop),
>        plenty of additional control plane information.
>      - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
>        field that helps to save 6 bits.
>      - Also new dir24_8 implementation supports different next_hop sizes
>        (1/2/4/8 bytes per next hop)
> 
> It would be nice to hear your opinion. The draft is below.
> 
> Medvedkin Vladimir (1):
>   lib/rib: Add Routing Information Base library
> 
>  config/common_base           |   6 +
>  doc/api/doxy-api.conf        |   1 +
>  lib/Makefile                 |   2 +
>  lib/librte_rib/Makefile      |  43 ++++
>  lib/librte_rib/rte_dir24_8.c | 411 +++++++++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_dir24_8.h | 144 ++++++++++++++
>  lib/librte_rib/rte_rib.c     | 454 +++++++++++++++++++++++++++++++++++++++++++
>  lib/librte_rib/rte_rib.h     | 260 +++++++++++++++++++++++++
>  8 files changed, 1321 insertions(+)
>  create mode 100644 lib/librte_rib/Makefile
>  create mode 100644 lib/librte_rib/rte_dir24_8.c
>  create mode 100644 lib/librte_rib/rte_dir24_8.h
>  create mode 100644 lib/librte_rib/rte_rib.c
>  create mode 100644 lib/librte_rib/rte_rib.h
> 

In network paralance a RIB is usually a full route table and FIB is the forwarding
table in use. You probably don't want to call this a RIB. It looks more like an
abstraction above FIB.


https://networkengineering.stackexchange.com/questions/38588/rib-vs-fib-differences
http://aftabsiddiqui.com/index.php/ip-routing-table-rib-and-forwarding-table-fib/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-07-11 20:31 ` [dpdk-dev] [RFC] Add RIB library Stephen Hemminger
@ 2017-07-11 23:13   ` Vladimir Medvedkin
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Medvedkin @ 2017-07-11 23:13 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Actually that is vendor specific. For example Juniper takes all route
information from protocol specific tables and compiles it in rib (yes, they
call it rib) so-called inet.0 (for ipv4 default VRF). In general RIB
contains control plane information and is used for control plane purpose
such as fib modification and for example showing route information in CLI.
FIB is used on dataplane only to make forwarding decision.
Actually with this new library you can keep information about all
particular route sources and all protocol specific information. For example
I have in my rib
10.0.0.0/8          *[Static/5] 42w2d 10:58:02
                    > to 10.201.254.1 via ae1.996
                    [Static/5] 35w6d 05:39:47
                      Discard
                    [OSPF/150] 3w0d 07:49:33, metric 15, tag 0
                    > to 1.1.1.1 via ae3.0
                    [BGP/170] 07:17:26, localpref 100, from 1.1.2.1
                      AS path: I, validation-state: unverified
                    > to 1.1.2.2 via ae1.879, label-switched-path M9-OS1
                    [BGP/170] 2w1d 01:13:34, localpref 100, from 1.1.3.1
                      AS path: I, validation-state: unverified
                    > to 1.1.3.2 via ae1.878, label-switched-path M9-OS0
                    [BGP/170] 2w0d 02:54:47, localpref 100, from 1.1.4.1
                      AS path: I, validation-state: unverified
                    > to 1.1.4.2 via ae2.811, label-switched-path M9-KR0

So you have only one struct rte_rib_v4_node for prefix  10.0.0.0/8 that
contains app specific information like
struct my_app_static_ext {
        int admin_dist;
        time_t time;
        uint64_t nh_id;
}

struct my_app_ospf_ext {
        int admin_dist;
        time_t time;
        uint64_t nh_id;
        int metric;
        int type;
}
struct my_app_bgp_ext {
        int admin_dist;
        time_t time;
        uint64_t nh_id
        uint32_t source;
        int med;
        int localpref;
        char * as_path;
        char* community;
}

union my_app_proto_ext {
        struct my_app_static_ext;
        struct my_app_ospf_ext;
        struct my_app_bgp_ext;
}

struct my_app_ext {
        struct my_app_ext *next;
        int type;
        union my_app_proto_ext;
}
 in it's .ext field. In this way you'll keep information about all routes.


2017-07-11 23:31 GMT+03:00 Stephen Hemminger <stephen@networkplumber.org>:

> On Tue, 11 Jul 2017 19:33:04 +0000
> Medvedkin Vladimir <medvedkinv@gmail.com> wrote:
>
> > Hi,
> >
> > I want to introduce new library for ip routing lookup that have some
> advantages
> > over current LPM library. In short:
> >      - Increases the speed of control plane operations against lpm such
> as
> >        adding/deleting routes
> >      - Adds abstraction from dataplane algorythms, so it is possible to
> add
> >        different ip route lookup algorythms such as
> DXR/poptrie/lpc-trie/etc
> >        in addition to current dir24_8
> >      - It is possible to keep user defined application specific
> additional
> >        information in struct rte_rib_v4_node which represents route
> entry.
> >        It can be next hop/set of next hops (i.e. active and feasible),
> >        pointers to link rte_rib_v4_node based on some criteria (i.e.
> next_hop),
> >        plenty of additional control plane information.
> >      - For dir24_8 implementation it is possible to remove
> rte_lpm_tbl_entry.depth
> >        field that helps to save 6 bits.
> >      - Also new dir24_8 implementation supports different next_hop sizes
> >        (1/2/4/8 bytes per next hop)
> >
> > It would be nice to hear your opinion. The draft is below.
> >
> > Medvedkin Vladimir (1):
> >   lib/rib: Add Routing Information Base library
> >
> >  config/common_base           |   6 +
> >  doc/api/doxy-api.conf        |   1 +
> >  lib/Makefile                 |   2 +
> >  lib/librte_rib/Makefile      |  43 ++++
> >  lib/librte_rib/rte_dir24_8.c | 411 ++++++++++++++++++++++++++++++
> +++++++++
> >  lib/librte_rib/rte_dir24_8.h | 144 ++++++++++++++
> >  lib/librte_rib/rte_rib.c     | 454 ++++++++++++++++++++++++++++++
> +++++++++++++
> >  lib/librte_rib/rte_rib.h     | 260 +++++++++++++++++++++++++
> >  8 files changed, 1321 insertions(+)
> >  create mode 100644 lib/librte_rib/Makefile
> >  create mode 100644 lib/librte_rib/rte_dir24_8.c
> >  create mode 100644 lib/librte_rib/rte_dir24_8.h
> >  create mode 100644 lib/librte_rib/rte_rib.c
> >  create mode 100644 lib/librte_rib/rte_rib.h
> >
>
> In network paralance a RIB is usually a full route table and FIB is the
> forwarding
> table in use. You probably don't want to call this a RIB. It looks more
> like an
> abstraction above FIB.
>
>
> https://networkengineering.stackexchange.com/questions/
> 38588/rib-vs-fib-differences
> http://aftabsiddiqui.com/index.php/ip-routing-table-
> rib-and-forwarding-table-fib/
>



-- 
Regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library
  2017-07-11 20:28   ` Stephen Hemminger
@ 2017-07-11 23:17     ` Vladimir Medvedkin
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir Medvedkin @ 2017-07-11 23:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

Thanks, I'll change it.

2017-07-11 23:28 GMT+03:00 Stephen Hemminger <stephen@networkplumber.org>:

> On Tue, 11 Jul 2017 19:33:05 +0000
> Medvedkin Vladimir <medvedkinv@gmail.com> wrote:
>
> > +
> > +#define ROUNDUP(x, y)   ((((x - 1) >> (32 - y)) + 1) << (32 - y))
>
> There is already RTE_ALIGN_FLOOR/RTE_ALIGN_CEIL
>
>
> > +#define RTE_DIR24_8_GET_TBL24_P(fib, ip)                     \
> > +     ((void *)&((uint8_t *)fib->tbl24)[(ip &                 \
> > +             RTE_DIR24_8_TBL24_MASK) >> (8 - fib->nh_sz)])   \
> > +
>
> Why is this a macro and not an inline function.
> The expresion could also be split up to be simpler, and compiler
> would generate same result.
>



-- 
Regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-07-11 19:33 [dpdk-dev] [RFC] Add RIB library Medvedkin Vladimir
  2017-07-11 19:33 ` [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library Medvedkin Vladimir
  2017-07-11 20:31 ` [dpdk-dev] [RFC] Add RIB library Stephen Hemminger
@ 2017-08-14 10:51 ` Bruce Richardson
  2017-08-14 22:28   ` Vladimir Medvedkin
  2 siblings, 1 reply; 12+ messages in thread
From: Bruce Richardson @ 2017-08-14 10:51 UTC (permalink / raw)
  To: Medvedkin Vladimir; +Cc: dev

On Tue, Jul 11, 2017 at 07:33:04PM +0000, Medvedkin Vladimir wrote:
> Hi,
> 
> I want to introduce new library for ip routing lookup that have some advantages
> over current LPM library. In short:
>      - Increases the speed of control plane operations against lpm such as
>        adding/deleting routes
>      - Adds abstraction from dataplane algorythms, so it is possible to add
>        different ip route lookup algorythms such as DXR/poptrie/lpc-trie/etc
>        in addition to current dir24_8
>      - It is possible to keep user defined application specific additional
>        information in struct rte_rib_v4_node which represents route entry.
>        It can be next hop/set of next hops (i.e. active and feasible),
>        pointers to link rte_rib_v4_node based on some criteria (i.e. next_hop),
>        plenty of additional control plane information.
>      - For dir24_8 implementation it is possible to remove rte_lpm_tbl_entry.depth
>        field that helps to save 6 bits.
>      - Also new dir24_8 implementation supports different next_hop sizes
>        (1/2/4/8 bytes per next hop)
> 
> It would be nice to hear your opinion. The draft is below.
> 
> Medvedkin Vladimir (1):
>   lib/rib: Add Routing Information Base library
> 

On reading this patch and then having discussion with you offline, it
appears there are two major new elements in this patchset:

1. a re-implementation of LPM, with the major advantage of having a
flexible data-size
2. a separate control plane structure that is designed to fit on top off
possibly multiple lookup structures for the data plane

Is this correct?

For the first part, I don't think we should carry about two separate LPM
implementations, but rather look to take the improvements in your
version back into the existing lib. [Or else replace the existing one,
but I prefer pulling the new stuff into it, so as to keep backward
compatibility]

For the second part, perhaps you could expand a bit more on the thought
here, and explain what all different data plane implementations would
fit under it. Would, for instance a hash-lookup work? In that case, what
would the data plane APIs be, and the control plane ones.

Thanks,
/Bruce

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-08-14 10:51 ` Bruce Richardson
@ 2017-08-14 22:28   ` Vladimir Medvedkin
  2017-08-15  8:23     ` Bruce Richardson
  0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Medvedkin @ 2017-08-14 22:28 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

2017-08-14 13:51 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:

> On Tue, Jul 11, 2017 at 07:33:04PM +0000, Medvedkin Vladimir wrote:
> > Hi,
> >
> > I want to introduce new library for ip routing lookup that have some
> advantages
> > over current LPM library. In short:
> >      - Increases the speed of control plane operations against lpm such
> as
> >        adding/deleting routes
> >      - Adds abstraction from dataplane algorythms, so it is possible to
> add
> >        different ip route lookup algorythms such as
> DXR/poptrie/lpc-trie/etc
> >        in addition to current dir24_8
> >      - It is possible to keep user defined application specific
> additional
> >        information in struct rte_rib_v4_node which represents route
> entry.
> >        It can be next hop/set of next hops (i.e. active and feasible),
> >        pointers to link rte_rib_v4_node based on some criteria (i.e.
> next_hop),
> >        plenty of additional control plane information.
> >      - For dir24_8 implementation it is possible to remove
> rte_lpm_tbl_entry.depth
> >        field that helps to save 6 bits.
> >      - Also new dir24_8 implementation supports different next_hop sizes
> >        (1/2/4/8 bytes per next hop)
> >
> > It would be nice to hear your opinion. The draft is below.
> >
> > Medvedkin Vladimir (1):
> >   lib/rib: Add Routing Information Base library
> >
>
> On reading this patch and then having discussion with you offline, it
> appears there are two major new elements in this patchset:
>
> 1. a re-implementation of LPM, with the major advantage of having a
> flexible data-size
> 2. a separate control plane structure that is designed to fit on top off
> possibly multiple lookup structures for the data plane
>
> Is this correct?
>
Correct

>
> For the first part, I don't think we should carry about two separate LPM
> implementations, but rather look to take the improvements in your
> version back into the existing lib. [Or else replace the existing one,
> but I prefer pulling the new stuff into it, so as to keep backward
> compatibility]
>

> For the second part, perhaps you could expand a bit more on the thought
> here, and explain what all different data plane implementations would
> fit under it. Would, for instance a hash-lookup work? In that case, what
> would the data plane APIs be, and the control plane ones.
>

 I'm not sure for _all_ data plane implementations, but from my point of
view compressed prefix trie (rte_rib structure) could be useful at least
for dir24_8, dxr, bitmap handling. Concerning to hash lookup, it depends on
algorithm (array of hash tables indexed by mask length, unrolling prefix to
number of /32).
Perhaps it is better to waive the abstraction and make LPM as primary
struct that keeps rte_rib inside (instead of rules_tbl[ ]).
In that case rte_rib becomes side structure and it's API only for working
with a trie. LPM's API remains the same (except next_hop size and LPM
creation).


> Thanks,
> /Bruce
>



-- 
Regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-08-14 22:28   ` Vladimir Medvedkin
@ 2017-08-15  8:23     ` Bruce Richardson
  2017-08-15 10:49       ` Vladimir Medvedkin
  0 siblings, 1 reply; 12+ messages in thread
From: Bruce Richardson @ 2017-08-15  8:23 UTC (permalink / raw)
  To: Vladimir Medvedkin; +Cc: dev

On Tue, Aug 15, 2017 at 01:28:26AM +0300, Vladimir Medvedkin wrote:
> 2017-08-14 13:51 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:
> 
> > On Tue, Jul 11, 2017 at 07:33:04PM +0000, Medvedkin Vladimir wrote:
> > > Hi,
> > >
> > > I want to introduce new library for ip routing lookup that have some
> > advantages
> > > over current LPM library. In short:
> > >      - Increases the speed of control plane operations against lpm such
> > as
> > >        adding/deleting routes
> > >      - Adds abstraction from dataplane algorythms, so it is possible to
> > add
> > >        different ip route lookup algorythms such as
> > DXR/poptrie/lpc-trie/etc
> > >        in addition to current dir24_8
> > >      - It is possible to keep user defined application specific
> > additional
> > >        information in struct rte_rib_v4_node which represents route
> > entry.
> > >        It can be next hop/set of next hops (i.e. active and feasible),
> > >        pointers to link rte_rib_v4_node based on some criteria (i.e.
> > next_hop),
> > >        plenty of additional control plane information.
> > >      - For dir24_8 implementation it is possible to remove
> > rte_lpm_tbl_entry.depth
> > >        field that helps to save 6 bits.
> > >      - Also new dir24_8 implementation supports different next_hop sizes
> > >        (1/2/4/8 bytes per next hop)
> > >
> > > It would be nice to hear your opinion. The draft is below.
> > >
> > > Medvedkin Vladimir (1):
> > >   lib/rib: Add Routing Information Base library
> > >
> >
> > On reading this patch and then having discussion with you offline, it
> > appears there are two major new elements in this patchset:
> >
> > 1. a re-implementation of LPM, with the major advantage of having a
> > flexible data-size
> > 2. a separate control plane structure that is designed to fit on top off
> > possibly multiple lookup structures for the data plane
> >
> > Is this correct?
> >
> Correct
> 
> >
> > For the first part, I don't think we should carry about two separate LPM
> > implementations, but rather look to take the improvements in your
> > version back into the existing lib. [Or else replace the existing one,
> > but I prefer pulling the new stuff into it, so as to keep backward
> > compatibility]
> >
> 
> > For the second part, perhaps you could expand a bit more on the thought
> > here, and explain what all different data plane implementations would
> > fit under it. Would, for instance a hash-lookup work? In that case, what
> > would the data plane APIs be, and the control plane ones.
> >
> 
>  I'm not sure for _all_ data plane implementations, but from my point of
> view compressed prefix trie (rte_rib structure) could be useful at least
> for dir24_8, dxr, bitmap handling. Concerning to hash lookup, it depends on
> algorithm (array of hash tables indexed by mask length, unrolling prefix to
> number of /32).
> Perhaps it is better to waive the abstraction and make LPM as primary
> struct that keeps rte_rib inside (instead of rules_tbl[ ]).
> In that case rte_rib becomes side structure and it's API only for working
> with a trie. LPM's API remains the same (except next_hop size and LPM
> creation).
> 
> 
What is the advantage of using the rte_rib for control plane access over
the existing rules table structure. Are not the basic operations needed
for adding/removing/looking-up rules supported by both?

/Bruce

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-08-15  8:23     ` Bruce Richardson
@ 2017-08-15 10:49       ` Vladimir Medvedkin
  2017-08-15 11:01         ` Vladimir Medvedkin
  0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Medvedkin @ 2017-08-15 10:49 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

The advantage is in increasing control plane operations speed. I tested
with my fullview + internal routes. It had 650030 prefixes(shuffled) with
1000 specific (longer /24) prefixes within 73 /24 networks. Every prefix
had unique next hop. In this test the speed of inserting new routes was
increased 70 times against current LPM. This was achieved due to
1. keeping routes in a trie structure instead of array (no need to get free
room for rule)
2. avoid unnecessary reads in tbl24 (checking for .depth). Thanks to
rte_rib_v4_get_next_child() (that is reverse order traversal without
recursion) you can get all more specific prefixes inside your target prefix
(you want to add/del), so you can get all ranges between that more
specifics and write next hop unconditionally to tbl24 and tbl8.

2017-08-15 11:23 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:

> On Tue, Aug 15, 2017 at 01:28:26AM +0300, Vladimir Medvedkin wrote:
> > 2017-08-14 13:51 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com
> >:
> >
> > > On Tue, Jul 11, 2017 at 07:33:04PM +0000, Medvedkin Vladimir wrote:
> > > > Hi,
> > > >
> > > > I want to introduce new library for ip routing lookup that have some
> > > advantages
> > > > over current LPM library. In short:
> > > >      - Increases the speed of control plane operations against lpm
> such
> > > as
> > > >        adding/deleting routes
> > > >      - Adds abstraction from dataplane algorythms, so it is possible
> to
> > > add
> > > >        different ip route lookup algorythms such as
> > > DXR/poptrie/lpc-trie/etc
> > > >        in addition to current dir24_8
> > > >      - It is possible to keep user defined application specific
> > > additional
> > > >        information in struct rte_rib_v4_node which represents route
> > > entry.
> > > >        It can be next hop/set of next hops (i.e. active and
> feasible),
> > > >        pointers to link rte_rib_v4_node based on some criteria (i.e.
> > > next_hop),
> > > >        plenty of additional control plane information.
> > > >      - For dir24_8 implementation it is possible to remove
> > > rte_lpm_tbl_entry.depth
> > > >        field that helps to save 6 bits.
> > > >      - Also new dir24_8 implementation supports different next_hop
> sizes
> > > >        (1/2/4/8 bytes per next hop)
> > > >
> > > > It would be nice to hear your opinion. The draft is below.
> > > >
> > > > Medvedkin Vladimir (1):
> > > >   lib/rib: Add Routing Information Base library
> > > >
> > >
> > > On reading this patch and then having discussion with you offline, it
> > > appears there are two major new elements in this patchset:
> > >
> > > 1. a re-implementation of LPM, with the major advantage of having a
> > > flexible data-size
> > > 2. a separate control plane structure that is designed to fit on top
> off
> > > possibly multiple lookup structures for the data plane
> > >
> > > Is this correct?
> > >
> > Correct
> >
> > >
> > > For the first part, I don't think we should carry about two separate
> LPM
> > > implementations, but rather look to take the improvements in your
> > > version back into the existing lib. [Or else replace the existing one,
> > > but I prefer pulling the new stuff into it, so as to keep backward
> > > compatibility]
> > >
> >
> > > For the second part, perhaps you could expand a bit more on the thought
> > > here, and explain what all different data plane implementations would
> > > fit under it. Would, for instance a hash-lookup work? In that case,
> what
> > > would the data plane APIs be, and the control plane ones.
> > >
> >
> >  I'm not sure for _all_ data plane implementations, but from my point of
> > view compressed prefix trie (rte_rib structure) could be useful at least
> > for dir24_8, dxr, bitmap handling. Concerning to hash lookup, it depends
> on
> > algorithm (array of hash tables indexed by mask length, unrolling prefix
> to
> > number of /32).
> > Perhaps it is better to waive the abstraction and make LPM as primary
> > struct that keeps rte_rib inside (instead of rules_tbl[ ]).
> > In that case rte_rib becomes side structure and it's API only for working
> > with a trie. LPM's API remains the same (except next_hop size and LPM
> > creation).
> >
> >
> What is the advantage of using the rte_rib for control plane access over
> the existing rules table structure. Are not the basic operations needed
> for adding/removing/looking-up rules supported by both?
>
> /Bruce
>



-- 
Regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-08-15 10:49       ` Vladimir Medvedkin
@ 2017-08-15 11:01         ` Vladimir Medvedkin
  2018-01-16  0:41           ` Thomas Monjalon
  0 siblings, 1 reply; 12+ messages in thread
From: Vladimir Medvedkin @ 2017-08-15 11:01 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Moreover rte_rib_v4_node could contain app specific extension (.ext field).
For example you can implement PIC (prefix independent convergence) by
linking rte_rib_v4_node with similar next hop together and precalculate
feasible next hop for each. Something like:
struct rte_rib_pic_nh {
    struct *rte_rib_v4_node;
    uint64_t nh;
    uint64_t feasible_nh;
}
and keep that linked list's head in next hop structure.
When next hop fails you just jump from rte_rib_v4_node rte_rib_v4_node and
change next hop very fast.

2017-08-15 13:49 GMT+03:00 Vladimir Medvedkin <medvedkinv@gmail.com>:

> The advantage is in increasing control plane operations speed. I tested
> with my fullview + internal routes. It had 650030 prefixes(shuffled) with
> 1000 specific (longer /24) prefixes within 73 /24 networks. Every prefix
> had unique next hop. In this test the speed of inserting new routes was
> increased 70 times against current LPM. This was achieved due to
> 1. keeping routes in a trie structure instead of array (no need to get
> free room for rule)
> 2. avoid unnecessary reads in tbl24 (checking for .depth). Thanks to
> rte_rib_v4_get_next_child() (that is reverse order traversal without
> recursion) you can get all more specific prefixes inside your target prefix
> (you want to add/del), so you can get all ranges between that more
> specifics and write next hop unconditionally to tbl24 and tbl8.
>
> 2017-08-15 11:23 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:
>
>> On Tue, Aug 15, 2017 at 01:28:26AM +0300, Vladimir Medvedkin wrote:
>> > 2017-08-14 13:51 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com
>> >:
>> >
>> > > On Tue, Jul 11, 2017 at 07:33:04PM +0000, Medvedkin Vladimir wrote:
>> > > > Hi,
>> > > >
>> > > > I want to introduce new library for ip routing lookup that have some
>> > > advantages
>> > > > over current LPM library. In short:
>> > > >      - Increases the speed of control plane operations against lpm
>> such
>> > > as
>> > > >        adding/deleting routes
>> > > >      - Adds abstraction from dataplane algorythms, so it is
>> possible to
>> > > add
>> > > >        different ip route lookup algorythms such as
>> > > DXR/poptrie/lpc-trie/etc
>> > > >        in addition to current dir24_8
>> > > >      - It is possible to keep user defined application specific
>> > > additional
>> > > >        information in struct rte_rib_v4_node which represents route
>> > > entry.
>> > > >        It can be next hop/set of next hops (i.e. active and
>> feasible),
>> > > >        pointers to link rte_rib_v4_node based on some criteria (i.e.
>> > > next_hop),
>> > > >        plenty of additional control plane information.
>> > > >      - For dir24_8 implementation it is possible to remove
>> > > rte_lpm_tbl_entry.depth
>> > > >        field that helps to save 6 bits.
>> > > >      - Also new dir24_8 implementation supports different next_hop
>> sizes
>> > > >        (1/2/4/8 bytes per next hop)
>> > > >
>> > > > It would be nice to hear your opinion. The draft is below.
>> > > >
>> > > > Medvedkin Vladimir (1):
>> > > >   lib/rib: Add Routing Information Base library
>> > > >
>> > >
>> > > On reading this patch and then having discussion with you offline, it
>> > > appears there are two major new elements in this patchset:
>> > >
>> > > 1. a re-implementation of LPM, with the major advantage of having a
>> > > flexible data-size
>> > > 2. a separate control plane structure that is designed to fit on top
>> off
>> > > possibly multiple lookup structures for the data plane
>> > >
>> > > Is this correct?
>> > >
>> > Correct
>> >
>> > >
>> > > For the first part, I don't think we should carry about two separate
>> LPM
>> > > implementations, but rather look to take the improvements in your
>> > > version back into the existing lib. [Or else replace the existing one,
>> > > but I prefer pulling the new stuff into it, so as to keep backward
>> > > compatibility]
>> > >
>> >
>> > > For the second part, perhaps you could expand a bit more on the
>> thought
>> > > here, and explain what all different data plane implementations would
>> > > fit under it. Would, for instance a hash-lookup work? In that case,
>> what
>> > > would the data plane APIs be, and the control plane ones.
>> > >
>> >
>> >  I'm not sure for _all_ data plane implementations, but from my point of
>> > view compressed prefix trie (rte_rib structure) could be useful at least
>> > for dir24_8, dxr, bitmap handling. Concerning to hash lookup, it
>> depends on
>> > algorithm (array of hash tables indexed by mask length, unrolling
>> prefix to
>> > number of /32).
>> > Perhaps it is better to waive the abstraction and make LPM as primary
>> > struct that keeps rte_rib inside (instead of rules_tbl[ ]).
>> > In that case rte_rib becomes side structure and it's API only for
>> working
>> > with a trie. LPM's API remains the same (except next_hop size and LPM
>> > creation).
>> >
>> >
>> What is the advantage of using the rte_rib for control plane access over
>> the existing rules table structure. Are not the basic operations needed
>> for adding/removing/looking-up rules supported by both?
>>
>> /Bruce
>>
>
>
>
> --
> Regards,
> Vladimir
>



-- 
Regards,
Vladimir

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dpdk-dev] [RFC] Add RIB library
  2017-08-15 11:01         ` Vladimir Medvedkin
@ 2018-01-16  0:41           ` Thomas Monjalon
  0 siblings, 0 replies; 12+ messages in thread
From: Thomas Monjalon @ 2018-01-16  0:41 UTC (permalink / raw)
  To: Vladimir Medvedkin, Bruce Richardson; +Cc: dev

Bruce, Vladimir,
There was no progress since August.
Is there a plan to benefit from Vladimir's work?

15/08/2017 13:01, Vladimir Medvedkin:
> Moreover rte_rib_v4_node could contain app specific extension (.ext field).
> For example you can implement PIC (prefix independent convergence) by
> linking rte_rib_v4_node with similar next hop together and precalculate
> feasible next hop for each. Something like:
> struct rte_rib_pic_nh {
>     struct *rte_rib_v4_node;
>     uint64_t nh;
>     uint64_t feasible_nh;
> }
> and keep that linked list's head in next hop structure.
> When next hop fails you just jump from rte_rib_v4_node rte_rib_v4_node and
> change next hop very fast.
> 
> 2017-08-15 13:49 GMT+03:00 Vladimir Medvedkin <medvedkinv@gmail.com>:
> 
> > The advantage is in increasing control plane operations speed. I tested
> > with my fullview + internal routes. It had 650030 prefixes(shuffled) with
> > 1000 specific (longer /24) prefixes within 73 /24 networks. Every prefix
> > had unique next hop. In this test the speed of inserting new routes was
> > increased 70 times against current LPM. This was achieved due to
> > 1. keeping routes in a trie structure instead of array (no need to get
> > free room for rule)
> > 2. avoid unnecessary reads in tbl24 (checking for .depth). Thanks to
> > rte_rib_v4_get_next_child() (that is reverse order traversal without
> > recursion) you can get all more specific prefixes inside your target prefix
> > (you want to add/del), so you can get all ranges between that more
> > specifics and write next hop unconditionally to tbl24 and tbl8.
> >
> > 2017-08-15 11:23 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com>:
> >
> >> On Tue, Aug 15, 2017 at 01:28:26AM +0300, Vladimir Medvedkin wrote:
> >> > 2017-08-14 13:51 GMT+03:00 Bruce Richardson <bruce.richardson@intel.com
> >> >:
> >> >
> >> > > On Tue, Jul 11, 2017 at 07:33:04PM +0000, Medvedkin Vladimir wrote:
> >> > > > Hi,
> >> > > >
> >> > > > I want to introduce new library for ip routing lookup that have some
> >> > > advantages
> >> > > > over current LPM library. In short:
> >> > > >      - Increases the speed of control plane operations against lpm
> >> such
> >> > > as
> >> > > >        adding/deleting routes
> >> > > >      - Adds abstraction from dataplane algorythms, so it is
> >> possible to
> >> > > add
> >> > > >        different ip route lookup algorythms such as
> >> > > DXR/poptrie/lpc-trie/etc
> >> > > >        in addition to current dir24_8
> >> > > >      - It is possible to keep user defined application specific
> >> > > additional
> >> > > >        information in struct rte_rib_v4_node which represents route
> >> > > entry.
> >> > > >        It can be next hop/set of next hops (i.e. active and
> >> feasible),
> >> > > >        pointers to link rte_rib_v4_node based on some criteria (i.e.
> >> > > next_hop),
> >> > > >        plenty of additional control plane information.
> >> > > >      - For dir24_8 implementation it is possible to remove
> >> > > rte_lpm_tbl_entry.depth
> >> > > >        field that helps to save 6 bits.
> >> > > >      - Also new dir24_8 implementation supports different next_hop
> >> sizes
> >> > > >        (1/2/4/8 bytes per next hop)
> >> > > >
> >> > > > It would be nice to hear your opinion. The draft is below.
> >> > > >
> >> > > > Medvedkin Vladimir (1):
> >> > > >   lib/rib: Add Routing Information Base library
> >> > > >
> >> > >
> >> > > On reading this patch and then having discussion with you offline, it
> >> > > appears there are two major new elements in this patchset:
> >> > >
> >> > > 1. a re-implementation of LPM, with the major advantage of having a
> >> > > flexible data-size
> >> > > 2. a separate control plane structure that is designed to fit on top
> >> off
> >> > > possibly multiple lookup structures for the data plane
> >> > >
> >> > > Is this correct?
> >> > >
> >> > Correct
> >> >
> >> > >
> >> > > For the first part, I don't think we should carry about two separate
> >> LPM
> >> > > implementations, but rather look to take the improvements in your
> >> > > version back into the existing lib. [Or else replace the existing one,
> >> > > but I prefer pulling the new stuff into it, so as to keep backward
> >> > > compatibility]
> >> > >
> >> >
> >> > > For the second part, perhaps you could expand a bit more on the
> >> thought
> >> > > here, and explain what all different data plane implementations would
> >> > > fit under it. Would, for instance a hash-lookup work? In that case,
> >> what
> >> > > would the data plane APIs be, and the control plane ones.
> >> > >
> >> >
> >> >  I'm not sure for _all_ data plane implementations, but from my point of
> >> > view compressed prefix trie (rte_rib structure) could be useful at least
> >> > for dir24_8, dxr, bitmap handling. Concerning to hash lookup, it
> >> depends on
> >> > algorithm (array of hash tables indexed by mask length, unrolling
> >> prefix to
> >> > number of /32).
> >> > Perhaps it is better to waive the abstraction and make LPM as primary
> >> > struct that keeps rte_rib inside (instead of rules_tbl[ ]).
> >> > In that case rte_rib becomes side structure and it's API only for
> >> working
> >> > with a trie. LPM's API remains the same (except next_hop size and LPM
> >> > creation).
> >> >
> >> >
> >> What is the advantage of using the rte_rib for control plane access over
> >> the existing rules table structure. Are not the basic operations needed
> >> for adding/removing/looking-up rules supported by both?
> >>
> >> /Bruce
> >>
> >
> >
> >
> > --
> > Regards,
> > Vladimir
> >
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-01-16  0:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11 19:33 [dpdk-dev] [RFC] Add RIB library Medvedkin Vladimir
2017-07-11 19:33 ` [dpdk-dev] [PATCH] lib/rib: Add Routing Information Base library Medvedkin Vladimir
2017-07-11 20:28   ` Stephen Hemminger
2017-07-11 23:17     ` Vladimir Medvedkin
2017-07-11 20:31 ` [dpdk-dev] [RFC] Add RIB library Stephen Hemminger
2017-07-11 23:13   ` Vladimir Medvedkin
2017-08-14 10:51 ` Bruce Richardson
2017-08-14 22:28   ` Vladimir Medvedkin
2017-08-15  8:23     ` Bruce Richardson
2017-08-15 10:49       ` Vladimir Medvedkin
2017-08-15 11:01         ` Vladimir Medvedkin
2018-01-16  0:41           ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).