From: Andrzej Ostruszka
To: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v4 12/29] graph: implement fastpath API routines
Date: Fri, 10 Apr 2020 01:07:05 +0200
Message-ID: <491dec21-e05a-2a06-9bb7-f6c5802250ef@semihalf.com>
In-Reply-To: <20200405085613.1336841-13-jerinj@marvell.com>

On 4/5/20 10:55 AM, jerinj@marvell.com wrote:
> From: Jerin Jacob
>
> Adding implementation for rte_graph_walk() API. This will perform a walk
> on the circular buffer and call the process function of each node
> and collect the stats if stats collection is enabled.
>
> Signed-off-by: Jerin Jacob
> Signed-off-by: Kiran Kumar K
> Signed-off-by: Pavan Nikhilesh
> Signed-off-by: Nithin Dabilpuram
> ---
[...]
> +__rte_experimental
> +static inline void
> +rte_graph_walk(struct rte_graph *graph)
> +{
> +	const rte_graph_off_t *cir_start = graph->cir_start;
> +	const rte_node_t mask = graph->cir_mask;
> +	uint32_t head = graph->head;
> +	struct rte_node *node;
> +	uint64_t start;
> +	uint16_t rc;
> +	void **objs;
> +
> +	/*
> +	 * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
> +	 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
> +	 * in a circular buffer fashion.
> +	 *
> +	 *	+-----+ <= cir_start - head [number of source nodes]
> +	 *	|     |
> +	 *	| ... | <= source nodes
> +	 *	|     |
> +	 *	+-----+ <= cir_start [head = 0] [tail = 0]
> +	 *	|     |
> +	 *	| ... | <= pending streams
> +	 *	|     |
> +	 *	+-----+ <= cir_start + mask
> +	 */
> +	while (likely(head != graph->tail)) {
> +		node = RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
> +		RTE_ASSERT(node->fence == RTE_GRAPH_FENCE);
> +		objs = node->objs;
> +		rte_prefetch0(objs);
> +
> +		if (rte_graph_has_stats_feature()) {
> +			start = rte_rdtsc();
> +			rc = node->process(graph, node, objs, node->idx);
> +			node->total_cycles += rte_rdtsc() - start;
> +			node->total_calls++;
> +			node->total_objs += rc;
> +		} else {
> +			node->process(graph, node, objs, node->idx);
> +		}
> +		node->idx = 0;

So I guess it is the responsibility of a node's process function to handle
all of its objects. What should it do if that is not possible? E.g. after
tx_burst we usually drop packets - how do you drop objects in a graph? Do
you simply free them (does the node know how an object was allocated?) or
do you need to pass them to a "null" node? The process function returns
the number of objects processed (e.g. in the later RX/TX nodes), so why is
that return value not used here?
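To illustrate what I am asking about - I would imagine a TX-ish node doing
something like the below (untested sketch; the node name, the "drop" edge
index 0 and the hard-coded port/queue are all made up by me):

#include <rte_ethdev.h>
#include <rte_graph.h>
#include <rte_graph_worker.h>

static uint16_t
my_tx_node_process(struct rte_graph *graph, struct rte_node *node,
		   void **objs, uint16_t nb_objs)
{
	uint16_t sent;

	/* Transmit as many objects (mbufs) as the NIC accepts. */
	sent = rte_eth_tx_burst(0 /* port */, 0 /* queue */,
				(struct rte_mbuf **)objs, nb_objs);

	/* Push the leftovers to a hypothetical "drop"/"null" next node
	 * (edge 0 here) instead of freeing them in place - is this the
	 * intended pattern? */
	if (sent != nb_objs)
		rte_node_enqueue(graph, node, 0 /* drop edge */,
				 &objs[sent], nb_objs - sent);

	return sent;
}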
> +		head = likely((int32_t)head > 0) ? head & mask : head;
> +	}
> +	graph->tail = 0;
> +}
[...]
> +__rte_experimental
> +static inline void
> +rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
> +		 rte_edge_t next, void **objs, uint16_t nb_objs)
> +{
> +	node = __rte_node_next_node_get(node, next);
> +	const uint16_t idx = node->idx;
> +
> +	__rte_node_enqueue_prologue(graph, node, idx, nb_objs);
> +
> +	rte_memcpy(&node->objs[idx], objs, nb_objs * sizeof(void *));
> +	node->idx = idx + nb_objs;
> +}

I see how this works for the usual scenario, but is there some kind of
fork/join operation? I am trying to imagine a scenario where I have a
bunch of co-processors doing different operations on a given
object/packet. To minimize latency I would like to use them in parallel
and, once all of them are done, enqueue the object to the next node. Is
there support for that in the graph lib, or should that be handled inside
the process function of a single node?

[...]
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Enqueue objs to multiple next nodes for further processing and
> + * set the next nodes to pending state in the circular buffer.
> + * objs[i] will be enqueued to nexts[i].
> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup().
> + * @param node
> + *   Current node pointer.
> + * @param nexts
> + *   List of relative next node indices to enqueue objs.
> + * @param objs
> + *   List of objs to enqueue.
> + * @param nb_objs
> + *   Number of objs to enqueue.
> + */
> +__rte_experimental
> +static inline void
> +rte_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
> +		      rte_edge_t *nexts, void **objs, uint16_t nb_objs)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < nb_objs; i++)
> +		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);

I have noticed the comments about the x1/x2/x4 functions, and since you
defended their performance there, why not make some use of them here
(Duff's device like unrolling)? Just like you have in your performance
test.
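E.g. something along these lines (untested; I am assuming here that the
x2/x4 variants take 2/4 objects for a single next edge, so four objects
can only be batched when their next edges match):

static inline void
my_node_enqueue_next(struct rte_graph *graph, struct rte_node *node,
		     rte_edge_t *nexts, void **objs, uint16_t nb_objs)
{
	uint16_t i = 0;

	/* Batch runs of four objects that share the same next edge. */
	while ((uint16_t)(nb_objs - i) >= 4) {
		if (nexts[i] == nexts[i + 1] && nexts[i] == nexts[i + 2] &&
		    nexts[i] == nexts[i + 3]) {
			rte_node_enqueue_x4(graph, node, nexts[i],
					    objs[i], objs[i + 1],
					    objs[i + 2], objs[i + 3]);
			i += 4;
		} else {
			rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
			i++;
		}
	}
	/* Remaining tail goes one by one. */
	for (; i < nb_objs; i++)
		rte_node_enqueue_x1(graph, node, nexts[i], objs[i]);
}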
[...]
> +__rte_experimental
> +static inline void **
> +rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
> +			 rte_edge_t next, uint16_t nb_objs)
> +{
> +	node = __rte_node_next_node_get(node, next);
> +	const uint16_t idx = node->idx;
> +	uint16_t free_space = node->size - idx;
> +
> +	if (unlikely(free_space < nb_objs))
> +		__rte_node_stream_alloc_size(graph, node, nb_objs);
> +
> +	return &node->objs[idx];
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Put the next stream to pending state in the circular buffer
> + * for further processing. Should be invoked followed by
> + * rte_node_next_stream_get().

Is the last sentence correct? It reads as if _get() should follow _put(),
while I would expect it to be the other way around.

> + *
> + * @param graph
> + *   Graph pointer returned from rte_graph_lookup().
> + * @param node
> + *   Current node pointer.
> + * @param next
> + *   Relative next node index..
> + * @param idx
> + *   Number of objs updated in the stream after getting the stream using
> + *   rte_node_next_stream_get.
> + *
> + * @see rte_node_next_stream_get().
> + */
> +__rte_experimental
> +static inline void
> +rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
> +			 rte_edge_t next, uint16_t idx)

I understood stream_move as an optimized enqueue, but could you describe
how these _get/_put are meant to be used, and why not use
stream_move/enqueue instead? I see in the later (testing) patches that
they are used for source nodes, but I am not sure I understand the
difference between them.

With regards
Andrzej Ostruszka
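PS. To make the last question concrete - from the test patches I gather
that a source node would use the _get/_put pair roughly like this
(untested sketch; port/queue 0, edge 0 and the burst size are my
assumptions, not from the patch):

#include <rte_ethdev.h>
#include <rte_graph.h>
#include <rte_graph_worker.h>

#define MY_BURST 32

static uint16_t
my_source_node_process(struct rte_graph *graph, struct rte_node *node,
		       void **objs, uint16_t nb_objs)
{
	void **stream;
	uint16_t n;

	/* A source node gets no input objects from the walk. */
	RTE_SET_USED(objs);
	RTE_SET_USED(nb_objs);

	/* Reserve room for MY_BURST objects in the next node's stream... */
	stream = rte_node_next_stream_get(graph, node, 0 /* edge */, MY_BURST);

	/* ...fill it in place, e.g. straight from an RX queue... */
	n = rte_eth_rx_burst(0 /* port */, 0 /* queue */,
			     (struct rte_mbuf **)stream, MY_BURST);

	/* ...and mark only the n filled slots as pending. */
	if (n != 0)
		rte_node_next_stream_put(graph, node, 0 /* edge */, n);

	return n;
}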