From: "Dumitrescu, Cristian"
To: "Yeddula, Avinash" , "dev@dpdk.org"
Date: Mon, 28 Sep 2015 20:46:24 +0000
Message-ID: <3EB4FA525960D640B5BDFFD6A3D89126478DA312@IRSMSX108.ger.corp.intel.com>
Subject: Re: [dpdk-dev] Need your thoughts on DPDK hash table / DPDK lookup/insert API's
List-Id: patches and discussions about DPDK

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Yeddula, Avinash
> Sent: Friday, September 25, 2015 11:27 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] Need your thoughts on DPDK hash table / DPDK
> lookup/insert API's
>
> Hello All,
>
> 1.
> I have a scenario where I need to walk the entire hash table to retrieve
> the data. I'm currently using the DPDK extensible bucket hash in the
> rte_table library of the Packet Framework.
>
> Since I will not be storing the keys anywhere else, I don't have a way to
> walk the hash table.
>
> I'm planning to write one for my application, but just wanted to check
> with the DPDK community on their thoughts.
>

Please take a look at the examples/ip_pipeline application (from DPDK release 2.1+). You can look at any pipeline (flow classification, firewall, routing, etc.); all of them implement the strategy detailed below.

The way we solve this problem is by creating two copies of the same functional table, one fast copy and one slow copy, which are kept in sync while being used for different purposes:

-fast table copy: used by the data plane; implemented using the rte_table API; lookup is packet-oriented (works with a packet burst) and optimized for performance

-slow table copy: used by the control/management plane; not necessarily slow for lookup, but optimized for queries and kept in sync with the fast table copy. The main point is that queries from the management plane are executed without impacting data plane performance.

To avoid locks, the data plane thread is the only thread that accesses the fast table copy, including lookups and add/delete operations. Besides polling the input packet queues for packet processing, the data plane thread also polls its input message queues (at a much lower frequency) to handle requests for updates to the fast table copy (such as adding new entries, updating existing entries, or deleting entries). The management thread updates the slow copy and sends requests to the data plane thread to update the fast table copy.

This way, complex table queries required by the management plane are implemented without impact on the data plane thread.
These queries are usually defined by the application, and it is not possible to implement them in a generic way. Here are some examples:

-hash table used for flow classification: list all flows with a certain value/regex for any field of the flow key tuple; list all flows ordered by some application-dependent criterion (e.g. subscriber name, area code, etc.)

-LPM table used for routing: list all routes with a specific output interface, in descending order of the depth of their IP prefix

A few documentation pointers:

http://www.dpdk.org/doc/guides/sample_app_ug/ip_pipeline.html#table-copies
http://www.dpdk.org/doc/guides/prog_guide/packet_framework.html#shared-data-structures

>
> 2. I have a scenario where components that are not part of the
> pipeline need to call the DPDK lookup/add APIs. Moreover, they are
> interested in looking up/inserting one entry at a time. With the current
> approach, I know everything works in bursts to achieve better performance.
>
> Currently the lookup API looks like this:
>
> static int rte_table_array_lookup(void *table, struct rte_mbuf **pkts,
>     uint64_t pkts_mask, uint64_t *lookup_hit_mask, void **entries)
>
> In addition to the existing lookup, I would like something like this
> (not exactly, but similar). With this, the outside caller doesn't have
> to construct an rte_mbuf and put the key in the metadata for the DPDK
> lookup API to act on:
>
> static int rte_table_array_single_pkt_lookup(void *table, void *key,
>     void *entry)
>

The packet-oriented lookup API of rte_table is intended for packet processing threads (data plane). You should have a different lookup mechanism for the management thread, which should probably use a different table copy, as proposed above.
For example, please take a look at the flow classification pipeline from the examples/ip_pipeline application:

-file pipeline_flow_classification_be.c: the back-end of the pipeline (the data plane thread) uses the rte_table lookup API (packet-oriented)

-file pipeline_flow_classification.c: the front-end of the pipeline (the CLI code executed by the management/master thread) implements a different hash mechanism with linked-list buckets built on top of TAILQ (struct app_pipeline_fc::flows), which is kept in sync with the back-end copy through message requests

The back-end copy is used exclusively for packet processing, while the front-end copy is used for queries and list/display operations.

>
> Any thoughts on adding these 2 items officially to the DPDK library?
>
> Thanks
> -Avinash