From: Jerin Jacob
To: John McNamara, Marko Kovacevic, Jerin Jacob, Kiran Kumar K
Date: Thu, 26 Mar 2020 22:26:28 +0530
Message-ID: <20200326165644.866053-13-jerinj@marvell.com>
In-Reply-To: <20200326165644.866053-1-jerinj@marvell.com>
References: <20200318213551.3489504-1-jerinj@marvell.com> <20200326165644.866053-1-jerinj@marvell.com>
Subject: [dpdk-dev] [PATCH v2 12/28] graph: implement fastpath API routines

From: Jerin Jacob

Adding implementation for rte_graph_walk() API. This will perform a walk
on the circular buffer and call the process function of each node
and collect the stats if stats collection is enabled.

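For readers who want to follow the walk order without building DPDK, here is a minimal standalone sketch of the same idea. It is not the patch code: every `toy_*` name, the single source node, and the ring size of 4 are assumptions made purely for illustration. It only mirrors the shape of the walk, where `head` starts at minus the number of source nodes, pending nodes are appended at `tail`, and `head` is masked only once it is inside the pending ring.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NB_SRC 1    /* hypothetical: one source node */
#define TOY_RING_SIZE 4 /* hypothetical: power-of-two ring size */

struct toy_graph;

struct toy_node {
	/* Returns the number of objects handled, like a node process callback. */
	uint16_t (*process)(struct toy_graph *g, struct toy_node *self);
	uint16_t idx;   /* objects pending in this node's stream */
	uint32_t calls; /* stats: process() invocations */
	uint32_t objs;  /* stats: objects handled */
};

struct toy_graph {
	int slots[TOY_NB_SRC + TOY_RING_SIZE]; /* source ids, then the ring */
	int *cir_start; /* start of the pending ring inside slots[] */
	int32_t head;   /* -TOY_NB_SRC, so the walk starts on the sources */
	uint32_t tail;  /* next free ring slot */
	uint32_t mask;  /* TOY_RING_SIZE - 1 */
	struct toy_node nodes[3];
};

/* Mark node 'id' pending (only on its first object) and add n objects. */
static void toy_enqueue(struct toy_graph *g, int id, uint16_t n)
{
	if (g->nodes[id].idx == 0) {
		g->cir_start[g->tail] = id;
		g->tail = (g->tail + 1) & g->mask;
	}
	g->nodes[id].idx += n;
}

static uint16_t toy_source(struct toy_graph *g, struct toy_node *self)
{
	(void)self;
	toy_enqueue(g, 1, 4); /* produce four objects for node 1 */
	return 4;
}

static uint16_t toy_forward(struct toy_graph *g, struct toy_node *self)
{
	toy_enqueue(g, 2, self->idx); /* pass everything on to node 2 */
	return self->idx;
}

static uint16_t toy_sink(struct toy_graph *g, struct toy_node *self)
{
	(void)g;
	return self->idx; /* consume the objects */
}

static void toy_graph_init(struct toy_graph *g)
{
	static uint16_t (*const fn[3])(struct toy_graph *, struct toy_node *) =
		{ toy_source, toy_forward, toy_sink };
	int i;

	for (i = 0; i < 3; i++)
		g->nodes[i] = (struct toy_node){ .process = fn[i] };
	g->slots[0] = 0; /* node 0 is the only source */
	g->cir_start = &g->slots[TOY_NB_SRC];
	g->head = -TOY_NB_SRC;
	g->tail = 0;
	g->mask = TOY_RING_SIZE - 1;
}

static void toy_walk(struct toy_graph *g)
{
	int32_t head = g->head;

	while (head != (int32_t)g->tail) {
		struct toy_node *node = &g->nodes[g->cir_start[head++]];
		uint16_t rc = node->process(g, node);

		node->calls++;
		node->objs += rc;
		node->idx = 0;
		/* Wrap only once the walk is inside the pending ring. */
		head = head > 0 ? (int32_t)(head & g->mask) : head;
	}
	g->tail = 0;
}
```

Calling `toy_graph_init()` and then `toy_walk()` visits the source first, then the forwarding node, then the sink, and leaves `tail` at 0 ready for the next walk. The toy always collects stats; in the patch that branch is taken only when rte_graph_has_stats_feature() is true.
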
Signed-off-by: Jerin Jacob
Signed-off-by: Kiran Kumar K
Signed-off-by: Pavan Nikhilesh
Signed-off-by: Nithin Dabilpuram
---
 doc/api/doxy-api-index.md              |   1 +
 lib/librte_graph/graph.c               |  16 +
 lib/librte_graph/rte_graph_version.map |  10 +
 lib/librte_graph/rte_graph_worker.h    | 434 +++++++++++++++++++++++++
 4 files changed, 461 insertions(+)

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 5cc50f750..fd2ff64d7 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -160,6 +160,7 @@ The public API headers are grouped by topics:
     [port_in_action]   (@ref rte_port_in_action.h)
     [table_action]     (@ref rte_table_action.h)
   * [graph]            (@ref rte_graph.h):
+    [graph_worker]     (@ref rte_graph_worker.h)
 
 - **basic**:
   [approx fraction]    (@ref rte_approx.h),
diff --git a/lib/librte_graph/graph.c b/lib/librte_graph/graph.c
index e96363777..d5d816c71 100644
--- a/lib/librte_graph/graph.c
+++ b/lib/librte_graph/graph.c
@@ -474,6 +474,22 @@ __rte_node_stream_alloc(struct rte_graph *graph, struct rte_node *node)
 	node->realloc_count++;
 }
 
+void __rte_noinline
+__rte_node_stream_alloc_size(struct rte_graph *graph, struct rte_node *node,
+			     uint16_t req_size)
+{
+	uint16_t size = node->size;
+
+	RTE_VERIFY(size != UINT16_MAX);
+	/* Allocate double amount of size to avoid immediate realloc */
+	size = RTE_MIN(UINT16_MAX, RTE_MAX(RTE_GRAPH_BURST_SIZE, req_size * 2));
+	node->objs = rte_realloc_socket(node->objs, size * sizeof(void *),
+					RTE_CACHE_LINE_SIZE, graph->socket);
+	RTE_VERIFY(node->objs);
+	node->size = size;
+	node->realloc_count++;
+}
+
 static int
 graph_to_dot(FILE *f, struct graph *graph)
 {
diff --git a/lib/librte_graph/rte_graph_version.map b/lib/librte_graph/rte_graph_version.map
index adf55d406..13b838752 100644
--- a/lib/librte_graph/rte_graph_version.map
+++ b/lib/librte_graph/rte_graph_version.map
@@ -3,6 +3,7 @@ EXPERIMENTAL {
 
 	__rte_node_register;
 	__rte_node_stream_alloc;
+	__rte_node_stream_alloc_size;
 
 	rte_graph_create;
 	rte_graph_destroy;
@@ -16,6 +17,7 @@ EXPERIMENTAL {
 	rte_graph_node_get;
 	rte_graph_node_get_by_name;
 	rte_graph_obj_dump;
+	rte_graph_walk;
 
 	rte_graph_cluster_stats_create;
 	rte_graph_cluster_stats_destroy;
@@ -28,10 +30,18 @@ EXPERIMENTAL {
 	rte_node_edge_get;
 	rte_node_edge_shrink;
 	rte_node_edge_update;
+	rte_node_enqueue;
+	rte_node_enqueue_x1;
+	rte_node_enqueue_x2;
+	rte_node_enqueue_x4;
+	rte_node_enqueue_next;
 	rte_node_from_name;
 	rte_node_id_to_name;
 	rte_node_list_dump;
 	rte_node_max_count;
+	rte_node_next_stream_get;
+	rte_node_next_stream_put;
+	rte_node_next_stream_move;
 
 	local: *;
 };
diff --git a/lib/librte_graph/rte_graph_worker.h b/lib/librte_graph/rte_graph_worker.h
index a8133739d..a1bfc498b 100644
--- a/lib/librte_graph/rte_graph_worker.h
+++ b/lib/librte_graph/rte_graph_worker.h
@@ -101,6 +101,440 @@ struct rte_node {
 __rte_experimental
 void __rte_node_stream_alloc(struct rte_graph *graph, struct rte_node *node);
 
+/**
+ * @internal
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Allocate a stream with requested number of objects.
+ *
+ * If stream already exists then re-allocate it to a larger size.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param req_size
+ *   Number of objects to be allocated.
+ */
+__rte_experimental
+void __rte_node_stream_alloc_size(struct rte_graph *graph,
+				  struct rte_node *node, uint16_t req_size);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+	uint64_t start;
+	uint16_t rc;
+	void **objs;
+
+	/*
+	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
+	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
+	 * in a circular buffer fashion.
+	 *
+	 *	+-----+ <= cir_start - head [number of source nodes]
+	 *	|     |
+	 *	| ... | <= source nodes
+	 *	|     |
+	 *	+-----+ <= cir_start [head = 0] [tail = 0]
+	 *	|     |
+	 *	| ... | <= pending streams
+	 *	|     |
+	 *	+-----+ <= cir_start + mask
+	 */
+	while (likely(head != graph->tail)) {
+		node = RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+		objs = node->objs;
+		rte_prefetch0(objs);
+
+		if (rte_graph_has_stats_feature()) {
+			start = rte_rdtsc();
+			rc = node->process(graph, node, objs, node->idx);
+			node->total_cycles += rte_rdtsc() - start;
+			node->total_calls++;
+			node->total_objs += rc;
+		} else {
+			node->process(graph, node, objs, node->idx);
+		}
+		node->idx = 0;
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+	graph->tail = 0;
+}
+
+/* Fast path helper functions */
+
+/**
+ * @internal
+ *
+ * Enqueue a given node to the tail of the graph reel.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object to be enqueued.
+ */
+static __rte_always_inline void
+__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
+{
+	uint32_t tail;
+
+	tail = graph->tail;
+	graph->cir_start[tail++] = node->off;
+	graph->tail = tail & graph->cir_mask;
+}
+
+/**
+ * @internal
+ *
+ * Enqueue sequence prologue function.
+ *
+ * Updates the node to tail of graph reel and resizes the number of objects
+ * available in the stream as needed.
+ *
+ * @param graph
+ *   Pointer to the graph object.
+ * @param node
+ *   Pointer to the node object.
+ * @param idx
+ *   Index at which the object enqueue starts from.
+ * @param space
+ *   Space required for the object enqueue.
+ */
+static __rte_always_inline void
+__rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
+			    const uint16_t idx, const uint16_t space)
+{
+
+	/* Add to the pending stream list if the node is new */
+	if (idx == 0)
+		__rte_node_enqueue_tail_update(graph, node);
+
+	if (unlikely(node->size < (idx + space)))
+		__rte_node_stream_alloc(graph, node);
+}
+
+/**
+ * @internal
+ *
+ * Get the node pointer from current node edge id.
+ *
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Edge id of the required node.
+ *
+ * @return
+ *   Pointer to the node denoted by the edge id.
+ */
+static __rte_always_inline struct rte_node *
+__rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
+{
+	RTE_ASSERT(next < node->nb_edges);
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+	node = node->nodes[next];
+	RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
+
+	return node;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue the objs to next node for further processing and set
+ * the next node to pending state in the circular buffer.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param objs
+ *   Objs to enqueue.
+ * @param nb_objs
+ *   Number of objs to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
+		 rte_edge_t next, void **objs, uint16_t nb_objs)
+{
+	node = __rte_node_next_node_get(node, next);
+	const uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, nb_objs);
+
+	rte_memcpy(&node->objs[idx], objs, nb_objs * sizeof(void *));
+	node->idx = idx + nb_objs;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue only one obj to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj
+ *   Obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 1);
+
+	node->objs[idx++] = obj;
+	node->idx = idx;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue only two objs to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ * Same as rte_node_enqueue_x1 but enqueue two objs.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj0
+ *   Obj to enqueue.
+ * @param obj1
+ *   Obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj0, void *obj1)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 2);
+
+	node->objs[idx++] = obj0;
+	node->objs[idx++] = obj1;
+	node->idx = idx;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue only four objs to next node for further processing and
+ * set the next node to pending state in the circular buffer.
+ * Same as rte_node_enqueue_x1 but enqueue four objs.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to enqueue objs.
+ * @param obj0
+ *   1st obj to enqueue.
+ * @param obj1
+ *   2nd obj to enqueue.
+ * @param obj2
+ *   3rd obj to enqueue.
+ * @param obj3
+ *   4th obj to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
+		    rte_edge_t next, void *obj0, void *obj1, void *obj2,
+		    void *obj3)
+{
+	node = __rte_node_next_node_get(node, next);
+	uint16_t idx = node->idx;
+
+	__rte_node_enqueue_prologue(graph, node, idx, 4);
+
+	node->objs[idx++] = obj0;
+	node->objs[idx++] = obj1;
+	node->objs[idx++] = obj2;
+	node->objs[idx++] = obj3;
+	node->idx = idx;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Enqueue objs to multiple next nodes for further processing and
+ * set the next nodes to pending state in the circular buffer.
+ * objs[i] will be enqueued to nexts[i].
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param nexts
+ *   List of relative next node indices to enqueue objs.
+ * @param objs
+ *   List of objs to enqueue.
+ * @param nb_objs
+ *   Number of objs to enqueue.
+ */
+__rte_experimental
+static inline void
+rte_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
+		      rte_edge_t *nexts, void **objs, uint16_t nb_objs)
+{
+	uint16_t i;
+
+	for (i = 0; i < nb_objs; i++)
+		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Get the stream of next node to enqueue the objs.
+ * Once done updating the objs, call rte_node_next_stream_put()
+ * to put the next node into the pending state.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index to get stream.
+ * @param nb_objs
+ *   Requested free size of the next stream.
+ *
+ * @return
+ *   Valid next stream on success.
+ *
+ * @see rte_node_next_stream_put().
+ */
+__rte_experimental
+static inline void **
+rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
+			 rte_edge_t next, uint16_t nb_objs)
+{
+	node = __rte_node_next_node_get(node, next);
+	const uint16_t idx = node->idx;
+	uint16_t free_space = node->size - idx;
+
+	if (unlikely(free_space < nb_objs))
+		__rte_node_stream_alloc_size(graph, node, node->size + nb_objs);
+
+	return &node->objs[idx];
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Put the next stream to pending state in the circular buffer
+ * for further processing. Should be invoked after
+ * rte_node_next_stream_get().
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param node
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index.
+ * @param idx
+ *   Number of objs updated in the stream after getting the stream using
+ *   rte_node_next_stream_get.
+ *
+ * @see rte_node_next_stream_get().
+ */
+__rte_experimental
+static inline void
+rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
+			 rte_edge_t next, uint16_t idx)
+{
+	if (unlikely(!idx))
+		return;
+
+	node = __rte_node_next_node_get(node, next);
+	if (node->idx == 0)
+		__rte_node_enqueue_tail_update(graph, node);
+
+	node->idx += idx;
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice
+ *
+ * Home run scenario: enqueue all the objs of the current node to the next
+ * node in an optimized way by swapping the streams of both nodes.
+ * Performs well when the next node is not already in pending state.
+ * If the next node is already in pending state, then normal enqueue
+ * will be used.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup().
+ * @param src
+ *   Current node pointer.
+ * @param next
+ *   Relative next node index.
+ */
+__rte_experimental
+static inline void
+rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
+			  rte_edge_t next)
+{
+	struct rte_node *dst = __rte_node_next_node_get(src, next);
+
+	/* Swap the pointers if dst doesn't have valid objs */
+	if (likely(dst->idx == 0)) {
+		void **dobjs = dst->objs;
+		uint16_t dsz = dst->size;
+		dst->objs = src->objs;
+		dst->size = src->size;
+		src->objs = dobjs;
+		src->size = dsz;
+		dst->idx = src->idx;
+		__rte_node_enqueue_tail_update(graph, dst);
+	} else { /* Move the objects from src node to dst node */
+		rte_node_enqueue(graph, src, next, src->objs, src->idx);
+	}
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.25.1
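
As a reading aid for the "home run" path in rte_node_next_stream_move(), the swap-or-copy decision can be sketched in isolation. This is a simplified model under stated assumptions, not the patch code: `toy_stream` and `toy_stream_move` are hypothetical names, there is no graph or pending ring, and instead of growing the destination the copy path returns an error when the objects do not fit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the objs/size/idx fields of a node. */
struct toy_stream {
	void **objs;   /* object pointer array */
	uint16_t size; /* capacity in objects */
	uint16_t idx;  /* number of valid objects */
};

/*
 * Move all objects from src to dst.
 * Returns 1 when the buffers were swapped (dst was empty), 0 when the
 * objects were copied behind the ones dst already held, and -1 when dst
 * has no room (the real code would grow the stream instead).
 * src->idx is left untouched; in the patch the walk loop resets it after
 * the node's process callback returns.
 */
static int toy_stream_move(struct toy_stream *src, struct toy_stream *dst)
{
	if (dst->idx == 0) {
		/* Home run: exchange the arrays, no per-object copy. */
		void **dobjs = dst->objs;
		uint16_t dsz = dst->size;

		dst->objs = src->objs;
		dst->size = src->size;
		dst->idx = src->idx;
		src->objs = dobjs;
		src->size = dsz;
		return 1;
	}

	if ((uint32_t)dst->idx + src->idx > dst->size)
		return -1;

	/* dst already holds objects: append a copy of src's objects. */
	memcpy(&dst->objs[dst->idx], src->objs, src->idx * sizeof(void *));
	dst->idx += src->idx;
	return 0;
}
```

The swap costs the same regardless of how many objects are in flight, which is why the doc comment above singles out the case where the next node is not yet pending.
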