From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail.mhcomputing.net (master.mhcomputing.net [74.208.46.186]) by dpdk.org (Postfix) with ESMTP id 8F768C67A for ; Mon, 22 Jun 2015 19:55:38 +0200 (CEST) Received: by mail.mhcomputing.net (Postfix, from userid 1000) id D521A80A0A6; Mon, 22 Jun 2015 10:53:02 -0700 (PDT) Date: Mon, 22 Jun 2015 10:53:02 -0700 From: Matthew Hall To: Vladimir Medvedkin Message-ID: <20150622175302.GA15788@mhcomputing.net> References: <5A3882CB-0DE0-43DB-8DCA-051D561AA943@mhcomputing.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Cc: "" Subject: Re: [dpdk-dev] rte_lpm with larger nexthops or another method? X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 22 Jun 2015 17:55:38 -0000 That's a lot better indeed! 16 million CIDR blocks would be a huge improvement. Do you happen to know what is also possible for rte_lpm6? Matthew. On Mon, Jun 22, 2015 at 01:11:14PM +0300, Vladimir Medvedkin wrote: > Hi Matthew, > > I just recently thought about next_hop extension. For ipv4 we can do > something like: > struct rte_lpm_tbl24_entry { > /* Stores Next hop or group index (i.e. gindex)into tbl8. */ > union { > uint32_t next_hop :24; > uint32_t tbl8_gindex :24; > }__attribute__((__packed__)); > /* Using single uint8_t to store 3 values. */ > uint32_t valid :1; /**< Validation flag. */ > uint32_t ext_entry :1; /**< External entry. */ > uint32_t depth :6; /**< Rule depth. */ > }; > so we have 24 bit for next_hop. > > 2015-06-22 5:29 GMT+03:00 Matthew Hall : > > > Hello, > > > > I have gone out on the internet for days looking at a bunch of different > > radix tree implementations to see if I could figure a way to implement my > > own tree, just to work around the really low 255 CIDR block limitation in > > librte_lpm. Unfortunately every single one I could find falls into one of > > these two annoying categories: > > > > 1) bloated with a lot of irrelevant kernel code I don't care about > > (especially the Linux version but also the BSD one, which also makes a > > weird assumption every address object stores its length in byte 0 of the > > address struct). These are hard to convert into something that plays nice > > with raw packet data. > > > > 2) very seemingly simple code, which breaks horribly if you try to add > > IPv6 support (such as the radix tree from University of Michigan / LLVM > > compiler benchmark suite, and the one from the old unmaintained mrt daemon, > > which includes a bizarre custom reference-counted memory manager that is > > very convoluted). These are easy to set up, but cause a lot of weird > > segfaults which I am having a difficult time to try to debug. > > > > So it seems like I am going nowhere with this approach. Instead, I'd like > > to know, what would I need to do to add this support to my local copy of > > librte_lpm? Let's assume for the sake of this discussion, that I don't care > > one iota about any performance cost, and I am happy if I need to prefetch > > two cachelines instead of just one (which I recall from a past thread is > > why librte_lpm has such a low nexthop limit to start with). > > > > Failing that, does anybody have a known good userspace version of any of > > these sort of items: > > > > 1) Hash Based FIB (forwarding information base), > > 2) Tree Based FIB, > > 3) Patricia trie (which does not break horribly on IPv6 or make bad > > assumptions about data format besides uint8_t* and length), > > 4) Crit-Bit tree > > 5) any other good way of taking IPv4 and IPv6 and finding the longest > > prefix match against a table of pre-loaded CIDR blocks? > > > > I am really pulling out my hair trying to find a way to do something which > > doesn't seem like it should have to be be this difficult. I must be missing > > a more obvious way to handle this. > > > > Thanks, > > Matthew