From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tom Barbette
To: jerinj@marvell.com
Cc: dev@dpdk.org, thomas@monjalon.net, david.marchand@redhat.com,
 mdr@ashroe.eu, mattias.ronnblom@ericsson.com, kirankumark@marvell.com,
 pbhagavatula@marvell.com, ndabilpuram@marvell.com, xiao.w.wang@intel.com,
 amo@semihalf.com
Subject: Re: [dpdk-dev] [PATCH v5 00/29] graph: introduce graph subsystem
Date: Thu, 30 Apr 2020 10:07:43 +0200
Message-ID: <263134f2-f2a4-a873-5431-602e680ca347@kth.se>
In-Reply-To: <20200411141428.1987768-1-jerinj@marvell.com>
References: <20200405085613.1336841-1-jerinj@marvell.com>
 <20200411141428.1987768-1-jerinj@marvell.com>
Content-Type: text/plain; charset=utf-8; format=flowed
List-Id: DPDK patches and discussions

Hi all,

I have not been able to follow every discussion about the graph
subsystem, but I could not find the rationale for creating yet another
graph processing system like VPP, BESS, Click/FastClick and a few
others, all of which already support DPDK and come with up to thousands
of "nodes" already built.

Is there something fundamentally better here than in those? Or is this
just to provide a clean in-house API?

Thanks,

Tom

On 11/04/2020 at 16:13, jerinj@marvell.com wrote:
> From: Jerin Jacob
>
> Using graph traversal for packet processing is a proven architecture
> that has been implemented in various open source libraries.
>
> Graph architecture for packet processing enables abstracting the data
> processing functions as “nodes” and “linking” them together to create a
> complex “graph” of reusable/modular data processing functions.
>
> The patchset further includes performance enhancements and modularity
> to the DPDK as discussed in more detail below.
>
> v5..v4:
> -------
> Addressed the following review comments from Andrzej Ostruszka.
>
> 1) Addressed the comment in (http://mails.dpdk.org/archives/dev/2020-April/162184.html)
>    and improved the following function prototypes/return types and
>    adjusted the implementation
>    a) rte_graph_node_get
>    b) rte_graph_max_count
>    c) rte_graph_export
>    d) rte_graph_destroy
> 2) Updated UT and l3fwd-graph for the updated function prototypes
> 3) Bug fix in edge_update
> 4) Avoid reading graph_src_nodes_count() twice in rte_graph_create()
> 5) Fix graph_mem_fixup_secondray typo
> 6) Fixed Doxygen comments for rte_node_next_stream_put
> 7) Updated the documentation to reflect the same.
> 8) Removed RTE prefix from rte_node_mbuf_priv[1|2] * as they are
>    internal defines
> 9) Limited next_hop id provided to LPM route add in
>    librte_node/ip4_lookup.c to 24 bits ()
> 10) Fixed pattern array overflow issue in l3fwd-graph/main.c by
>     splitting the pattern array into a default and a non-default array.
>     Updated doc with the same info.
> 11) Fixed parsing issues in parse_config() in l3fwd-graph/main.c in line
>     with the issues reported in l2fwd-event
> 12) Removed next_hop field in l3fwd-graph/main.c main()
> 13) Fixed graph create error check in l3fwd-graph/main.c main()
>
> v4..v3:
> -------
> Addressed the following review comments from Wang, Xiao W
>
> 1) Remove unnecessary line from rte_graph.h
> 2) Fix a typo in rte_graph.h
> 3) Move NODE_ID_CHECK to the 3rd patch, where it is first used.
> 4) Fixed bug in edge_update()
>
> v3..v2:
> -------
> 1) Refactor ipv4 node lookup by moving SSE and NEON specific code to
>    lib/librte_node/ip4_lookup_sse.h and lib/librte_node/ip4_lookup_neon.h
> 2) Add scalar version of process() function for ipv4 lookup to make
>    the node work on machines other than x86 and arm64.
>
> v2..v1:
> -------
> 1) Added programmer guide/implementation documentation and l3fwd-graph doc
>
> RFC..v1:
> --------
>
> 1) Split the patch into more logical ones for review.
> 2) Added doxygen comments for the API
> 3) Code cleanup
> 4) Additional performance improvements.
>    The delta between l3fwd and l3fwd-graph is now negligible
>    (~1%) on octeontx2.
> 5) Added SIMD routines for x86 in addition to arm64.
>
> Hosted in netlify for easy reference:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Programmer’s Guide:
> https://dpdk-graph.netlify.com/doc/html/guides/prog_guide/graph_lib.html
>
> l3fwd-graph doc:
> https://dpdk-graph.netlify.com/doc/html/guides/sample_app_ug/l3_forward_graph.html
>
> API doc:
> https://dpdk-graph.netlify.com/doc/html/api/rte__graph_8h.html
> https://dpdk-graph.netlify.com/doc/html/api/rte__graph__worker_8h.html
> https://dpdk-graph.netlify.com/doc/html/api/rte__node__eth__api_8h.html
> https://dpdk-graph.netlify.com/doc/html/api/rte__node__ip4__api_8h.html
>
> 2) Added the release notes for this feature
>
> 3) Fix build issues reported by CI for v1:
> http://mails.dpdk.org/archives/test-report/2020-March/121326.html
>
>
> Additional nodes planned for v20.08
> -----------------------------------
> 1) Packet classification node
> 2) Support for IPV6 LPM node
>
>
> This patchset contains
> ----------------------
> 1) The API definition to "create" nodes and "link" them together to
>    create a "graph" for packet processing. See lib/librte_graph/rte_graph.h
>
> 2) The fast path API definition for the graph walker and enqueue
>    function used by the workers. See lib/librte_graph/rte_graph_worker.h
>
> 3) Optimized SW implementation for (1) and (2). See lib/librte_graph/
>
> 4) Test case to verify the graph infrastructure functionality.
>    See app/test/test_graph.c
>
> 5) Performance test cases to evaluate the cost of the graph walker and
>    the node enqueue fast-path function for various combinations.
>
>    See app/test/test_graph_perf.c
>
> 6) Packet processing nodes (Null, Rx, Tx, Pkt drop, IPv4 rewrite, IPv4
>    lookup) using the graph infrastructure. See lib/librte_node/*
>
> 7) An example application to showcase l3fwd
>    (same functionality as the existing examples/l3fwd) using the graph
>    infrastructure and the packet processing nodes (item (6)).
>    See examples/l3fwd-graph/.
>
> Performance
> -----------
> 1) Graph walk and node enqueue overhead can be tested with the
>    performance test case application [1]
>    # If all packets of a burst go from one node to the same next node
>    # (we call this a "homerun"), moving them is just a pointer swap for
>    # the whole burst.
>    # In the worst case, it takes a handful of cycles to move an object
>    # from one node to another.
>
> 2) Performance comparison of the existing l3fwd (completely static code
>    without any nodes) vs the modular l3fwd-graph with 5 nodes
>    (ip4_lookup, ip4_rewrite, ethdev_tx, ethdev_rx, pkt_drop).
>    Here is a graphical representation of the l3fwd-graph as a Graphviz
>    dot file:
>    http://bit.ly/39UPPGm
>
>    # l3fwd-graph performance is -1.2% wrt static l3fwd.
>
>    # We have simulated a similar test with the existing librte_pipeline
>    # application [4].
>    The ip_pipeline application is -48.62% wrt static l3fwd.
>
> The above results are on octeontx2. They may vary on other platforms.
> Platforms with larger L1 and L2 caches will perform even better.
>
>
> Tested architectures:
> --------------------
> 1) AArch64
> 2) X86
>
>
> Identified tweaking for better performance on different targets
> ---------------------------------------------------------------
> 1) Test with various burst size values (256, 128, 64, 32) using the
>    CONFIG_RTE_GRAPH_BURST_SIZE config option.
>    Based on our testing, on x86 and arm64 servers, the sweet spot is a
>    burst size of 256, while on arm64 embedded SoCs it is either 64 or 128.
>
> 2) Disable node statistics (use the CONFIG_RTE_LIBRTE_GRAPH_STATS config
>    option) if not needed.
>
> 3) Use the arm64 optimized memory copy on arm64 by selecting
>    CONFIG_RTE_ARCH_ARM64_MEMCPY.
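
To make the "homerun" claim quoted above concrete, here is a minimal sketch of how I understand the two paths: a whole-burst pointer swap when every object goes to one idle next node, and a per-object pointer copy otherwise. Everything in it (toy_node, toy_stream_move, toy_enqueue, TOY_BURST) is hypothetical and written for illustration only; it is not the rte_graph API or its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capacity, chosen to mirror the 256 burst size quoted above. */
#define TOY_BURST 256

/* Hypothetical toy node: a pending stream of object pointers. */
struct toy_node {
	void **objs;      /* array with room for TOY_BURST pointers */
	uint16_t nb_objs; /* number of valid entries in objs */
};

/*
 * "Homerun" path: the whole burst goes to one next node and that node
 * has nothing pending, so the two arrays are swapped in O(1) instead
 * of copying each pointer. Returns 0 on success, -1 if dst is busy.
 */
static int toy_stream_move(struct toy_node *src, struct toy_node *dst)
{
	assert(src != dst);
	if (dst->nb_objs != 0)
		return -1; /* caller has to fall back to toy_enqueue() */

	void **tmp = dst->objs;
	dst->objs = src->objs;
	dst->nb_objs = src->nb_objs;
	src->objs = tmp;
	src->nb_objs = 0;
	return 0;
}

/*
 * General path: append n object pointers to dst by copying them.
 * Cost grows with n. Returns 0 on success, -1 if dst would overflow.
 */
static int toy_enqueue(struct toy_node *dst, void **objs, uint16_t n)
{
	if ((size_t)dst->nb_objs + n > TOY_BURST)
		return -1;
	memcpy(&dst->objs[dst->nb_objs], objs, n * sizeof(void *));
	dst->nb_objs += n;
	return 0;
}
```

In this toy model the swap costs the same for 1 or 256 objects, which is presumably why larger bursts amortize the graph overhead better on the server parts mentioned above.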
>
> Commands to run tests
> ---------------------
>
> [1]
> perf test:
> echo "graph_perf_autotest" | sudo ./build/app/test/dpdk-test -c 0x30
>
> [2]
> functionality test:
> echo "graph_autotest" | sudo ./build/app/test/dpdk-test -c 0x30
>
> [3]
> l3fwd-graph:
> ./l3fwd-graph -c 0x100 -- -p 0x3 --config="(0, 0, 8)" -P
>
> [4]
> # ./ip_pipeline --c 0xff0000 -- -s route.cli
>
> Route.cli: (Copy paste to the shell to avoid dos format issues)
>
> https://pastebin.com/raw/B4Ktx7TT
>
> Jerin Jacob (13):
>   graph: define the public API for graph support
>   graph: implement node registration
>   graph: implement node operations
>   graph: implement node debug routines
>   graph: implement internal graph operation helpers
>   graph: populate fastpath memory for graph reel
>   graph: implement create and destroy APIs
>   graph: implement graph operation APIs
>   graph: implement Graphviz export
>   graph: implement debug routines
>   graph: implement stats support
>   graph: implement fastpath API routines
>   doc: add graph library programmer's guide guide
>
> Kiran Kumar K (2):
>   graph: add unit test case
>   node: add ipv4 rewrite node
>
> Nithin Dabilpuram (11):
>   node: add log infra and null node
>   node: add ethdev Rx node
>   node: add ethdev Tx node
>   node: add ethdev Rx and Tx node ctrl API
>   node: ipv4 lookup for arm64
>   node: add ipv4 rewrite and lookup ctrl API
>   node: add packet drop node
>   l3fwd-graph: add graph based l3fwd skeleton
>   l3fwd-graph: add ethdev configuration changes
>   l3fwd-graph: add graph config and main loop
>   doc: add l3fwd graph application user guide
>
> Pavan Nikhilesh (3):
>   graph: add performance testcase
>   node: add generic ipv4 lookup node
>   node: ipv4 lookup for x86
>
>  MAINTAINERS | 14 +
>  app/test/Makefile | 7 +
>  app/test/meson.build | 12 +-
>  app/test/test_graph.c | 819 ++++
>  app/test/test_graph_perf.c | 1057 ++++++
>  config/common_base | 12 +
>  config/rte_config.h | 4 +
>  doc/api/doxy-api-index.md | 5 +
>  doc/api/doxy-api.conf.in | 2 +
>  doc/guides/prog_guide/graph_lib.rst | 397 ++
>  .../prog_guide/img/anatomy_of_a_node.svg | 1078 ++++++
>  .../prog_guide/img/graph_mem_layout.svg | 702 ++++
>  doc/guides/prog_guide/img/link_the_nodes.svg | 3330 +++++++++++++++++
>  doc/guides/prog_guide/index.rst | 1 +
>  doc/guides/rel_notes/release_20_05.rst | 32 +
>  doc/guides/sample_app_ug/index.rst | 1 +
>  doc/guides/sample_app_ug/intro.rst | 4 +
>  doc/guides/sample_app_ug/l3_forward_graph.rst | 334 ++
>  examples/Makefile | 3 +
>  examples/l3fwd-graph/Makefile | 58 +
>  examples/l3fwd-graph/main.c | 1126 ++++++
>  examples/l3fwd-graph/meson.build | 13 +
>  examples/meson.build | 6 +-
>  lib/Makefile | 6 +
>  lib/librte_graph/Makefile | 28 +
>  lib/librte_graph/graph.c | 587 +++
>  lib/librte_graph/graph_debug.c | 84 +
>  lib/librte_graph/graph_ops.c | 169 +
>  lib/librte_graph/graph_populate.c | 234 ++
>  lib/librte_graph/graph_private.h | 347 ++
>  lib/librte_graph/graph_stats.c | 406 ++
>  lib/librte_graph/meson.build | 11 +
>  lib/librte_graph/node.c | 421 +++
>  lib/librte_graph/rte_graph.h | 668 ++++
>  lib/librte_graph/rte_graph_version.map | 47 +
>  lib/librte_graph/rte_graph_worker.h | 510 +++
>  lib/librte_node/Makefile | 32 +
>  lib/librte_node/ethdev_ctrl.c | 115 +
>  lib/librte_node/ethdev_rx.c | 221 ++
>  lib/librte_node/ethdev_rx_priv.h | 81 +
>  lib/librte_node/ethdev_tx.c | 86 +
>  lib/librte_node/ethdev_tx_priv.h | 62 +
>  lib/librte_node/ip4_lookup.c | 215 ++
>  lib/librte_node/ip4_lookup_neon.h | 238 ++
>  lib/librte_node/ip4_lookup_sse.h | 244 ++
>  lib/librte_node/ip4_rewrite.c | 326 ++
>  lib/librte_node/ip4_rewrite_priv.h | 77 +
>  lib/librte_node/log.c | 14 +
>  lib/librte_node/meson.build | 10 +
>  lib/librte_node/node_private.h | 79 +
>  lib/librte_node/null.c | 23 +
>  lib/librte_node/pkt_drop.c | 26 +
>  lib/librte_node/rte_node_eth_api.h | 64 +
>  lib/librte_node/rte_node_ip4_api.h | 78 +
>  lib/librte_node/rte_node_version.map | 9 +
>  lib/meson.build | 5 +-
>  meson.build | 1 +
>  mk/rte.app.mk | 2 +
>  58 files changed, 14538 insertions(+), 5 deletions(-)
>  create mode 100644 app/test/test_graph.c
>  create mode 100644 app/test/test_graph_perf.c
>  create mode 100644 doc/guides/prog_guide/graph_lib.rst
>  create mode 100644 doc/guides/prog_guide/img/anatomy_of_a_node.svg
>  create mode 100644 doc/guides/prog_guide/img/graph_mem_layout.svg
>  create mode 100644 doc/guides/prog_guide/img/link_the_nodes.svg
>  create mode 100644 doc/guides/sample_app_ug/l3_forward_graph.rst
>  create mode 100644 examples/l3fwd-graph/Makefile
>  create mode 100644 examples/l3fwd-graph/main.c
>  create mode 100644 examples/l3fwd-graph/meson.build
>  create mode 100644 lib/librte_graph/Makefile
>  create mode 100644 lib/librte_graph/graph.c
>  create mode 100644 lib/librte_graph/graph_debug.c
>  create mode 100644 lib/librte_graph/graph_ops.c
>  create mode 100644 lib/librte_graph/graph_populate.c
>  create mode 100644 lib/librte_graph/graph_private.h
>  create mode 100644 lib/librte_graph/graph_stats.c
>  create mode 100644 lib/librte_graph/meson.build
>  create mode 100644 lib/librte_graph/node.c
>  create mode 100644 lib/librte_graph/rte_graph.h
>  create mode 100644 lib/librte_graph/rte_graph_version.map
>  create mode 100644 lib/librte_graph/rte_graph_worker.h
>  create mode 100644 lib/librte_node/Makefile
>  create mode 100644 lib/librte_node/ethdev_ctrl.c
>  create mode 100644 lib/librte_node/ethdev_rx.c
>  create mode 100644 lib/librte_node/ethdev_rx_priv.h
>  create mode 100644 lib/librte_node/ethdev_tx.c
>  create mode 100644 lib/librte_node/ethdev_tx_priv.h
>  create mode 100644 lib/librte_node/ip4_lookup.c
>  create mode 100644 lib/librte_node/ip4_lookup_neon.h
>  create mode 100644 lib/librte_node/ip4_lookup_sse.h
>  create mode 100644 lib/librte_node/ip4_rewrite.c
>  create mode 100644 lib/librte_node/ip4_rewrite_priv.h
>  create mode 100644 lib/librte_node/log.c
>  create mode 100644 lib/librte_node/meson.build
>  create mode 100644 lib/librte_node/node_private.h
>  create mode 100644 lib/librte_node/null.c
>  create mode 100644 lib/librte_node/pkt_drop.c
>  create mode 100644 lib/librte_node/rte_node_eth_api.h
>  create mode 100644 lib/librte_node/rte_node_ip4_api.h
>  create mode 100644 lib/librte_node/rte_node_version.map
>
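
For readers who do not know the frameworks named at the top of this mail, the node/link/graph model that the quoted cover letter describes can be sketched roughly as follows. This is a deliberately simplified toy under my own assumptions: every name (toy_gnode, toy_link, toy_walk, toy_forward, toy_sink) is hypothetical, a whole burst always follows a single edge, and the graph is assumed to be acyclic. It is not the rte_graph walker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EDGES 4 /* hypothetical per-node edge limit */

struct toy_gnode;

/*
 * Process callback: handles a burst of n objects and returns the index
 * of the edge to forward the burst on, or -1 if the node consumed it.
 */
typedef int (*toy_process_t)(struct toy_gnode *self, void **objs, uint16_t n);

struct toy_gnode {
	const char *name;
	toy_process_t process;
	struct toy_gnode *next[TOY_MAX_EDGES]; /* the "links" */
	uint16_t nb_edges;
	uint64_t seen; /* objects processed so far, a stats-like counter */
};

/* Add a directed edge from one node to another. */
static void toy_link(struct toy_gnode *from, struct toy_gnode *to)
{
	assert(from->nb_edges < TOY_MAX_EDGES);
	from->next[from->nb_edges++] = to;
}

/*
 * Walk the graph from a source node, handing the burst to each node in
 * turn until one consumes it. Returns the number of nodes visited.
 * Assumes an acyclic graph; a cycle would make this loop forever.
 */
static uint16_t toy_walk(struct toy_gnode *src, void **objs, uint16_t n)
{
	uint16_t hops = 0;
	struct toy_gnode *node = src;

	while (node != NULL) {
		int edge = node->process(node, objs, n);
		node->seen += n;
		hops++;
		node = (edge >= 0 && edge < node->nb_edges) ? node->next[edge] : NULL;
	}
	return hops;
}

/* Example callbacks: always take edge 0, or consume the burst. */
static int toy_forward(struct toy_gnode *self, void **objs, uint16_t n)
{
	(void)self; (void)objs; (void)n;
	return 0;
}

static int toy_sink(struct toy_gnode *self, void **objs, uint16_t n)
{
	(void)self; (void)objs; (void)n;
	return -1;
}
```

Linking three such nodes as rx -> lookup -> tx with toy_link() and calling toy_walk() on rx visits all three with the same burst, which is the modularity the cover letter is after: nodes only know their outgoing edges, not the application they are used in.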