From: "Dumitrescu, Cristian"
To: "Yeddula, Avinash" , "dev@dpdk.org"
Date: Mon, 28 Sep 2015 20:46:24 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D89126478DA312@IRSMSX108.ger.corp.intel.com>
Subject: Re: [dpdk-dev] Need your thoughts on DPDK hash table / DPDK lookup/insert API's
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yeddula, Avinash
> Sent: Friday, September 25, 2015 11:27 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] Need your thoughts on DPDK hash table / DPDK
> lookup/insert API's
>
> Hello All,
>
> 1.
> I have a scenario where I need to walk the entire hash table to retrieve
> the data. I'm currently using the DPDK extensible bucket hash in the
> rte_table library of the Packet Framework.
>
> Since I will not be storing the keys anywhere else, I don't have a way to
> walk the hash table.
>
> I'm planning to write one for my application, but just wanted to check
> with the DPDK community on their thoughts.
>

Please take a look at the examples/ip_pipeline application (from DPDK release 2.1+). You can look at any pipeline (flow classification, firewall, routing, etc.); all of them implement the strategy detailed below.

The way we solve this problem is by creating two copies of the same functional table, one fast copy and one slow copy, which are kept in sync while being used for different purposes:

-fast table copy: used by the data plane; implemented using the rte_table API; lookup is packet-oriented (works with a packet burst) and optimized for performance

-slow table copy: used by the control/management plane; not necessarily slow for lookup, but optimized for queries and kept in sync with the fast table copy. The main point is that queries from the management plane are executed without impacting data plane performance.

To avoid locks, the data plane thread is the only thread that accesses the fast table copy, including lookups and add/delete operations. Besides polling the input packet queues for packet processing, the data plane thread also polls its input message queues (at a much lower frequency) to handle requests for updates to the fast table copy (such as adding new entries, updating existing entries, or deleting entries). The management thread updates the slow copy and sends requests to the data plane thread to update the fast table copy.

This way, complex table queries required by the management plane are implemented without impact on the data plane thread.
These queries are usually defined by the application, and it is not possible to implement them in a generic way. Here are some examples:

-hash table used for flow classification: list all flows with a certain value/regex for any field of the flow key tuple; list all flows ordered by some application-dependent criterion (e.g. subscriber name, area code, etc.)

-LPM table used for routing: list all routes with a specific output interface, in descending order of the depth of their IP prefix

A few documentation pointers:

http://www.dpdk.org/doc/guides/sample_app_ug/ip_pipeline.html#table-copies
http://www.dpdk.org/doc/guides/prog_guide/packet_framework.html#shared-data-structures

>
> 2. I have a scenario where components that are not part of the
> pipeline need to call the DPDK lookup/add APIs. Moreover, they are
> interested in looking up/inserting one entry at a time. With the current
> approach, I know everything works in bursts to achieve better performance.
>
> Currently the lookup API looks like this:
>
> static int rte_table_array_lookup(void *table, struct rte_mbuf **pkts,
>     uint64_t pkts_mask, uint64_t *lookup_hit_mask, void **entries)
>
> In addition to the existing lookup, I would like something like this
> (not exactly, but similar). With this, the outside caller doesn't have
> to construct an rte_mbuf and put the key in the metadata for the DPDK
> lookup API to act on:
>
> static int rte_table_array_single_pkt_lookup(void *table, void *key,
>     void *entry)
>

The packet-oriented lookup API of rte_table is intended for packet processing threads (data plane). You should have a different lookup mechanism for the management thread, which should probably use a different table copy, as proposed above.
For example, please take a look at the flow classification pipeline from the examples/ip_pipeline application:

-file pipeline_flow_classification_be.c: the back-end of the pipeline (the data plane thread) uses the rte_table lookup API (packet-oriented)

-file pipeline_flow_classification.c: the front-end of the pipeline (the CLI code executed by the management/master thread) implements a different hash mechanism with linked-list buckets built on top of TAILQ (struct app_pipeline_fc::flows), which is kept in sync with the back-end copy through message requests

The back-end copy is used exclusively for packet processing, while the front-end copy is used for queries and list/display operations.

>
> Any thoughts on adding these 2 items officially to the DPDK library?
>
> Thanks
> -Avinash