From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 56E5242B19; Mon, 15 May 2023 22:52:48 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E7A8440A80; Mon, 15 May 2023 22:52:47 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id DCE5D4021F for ; Mon, 15 May 2023 22:52:45 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 6B5631C420 for ; Mon, 15 May 2023 22:52:45 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id 69E541BFFF; Mon, 15 May 2023 22:52:45 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-3.4 required=5.0 tests=ALL_TRUSTED, AWL, NICE_REPLY_A, T_SCC_BODY_TEXT_LINE autolearn=disabled version=3.4.6 X-Spam-Score: -3.4 Received: from [192.168.1.59] (h-62-63-215-114.A163.priv.bahnhof.se [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 9D8F41BE6B; Mon, 15 May 2023 22:52:44 +0200 (CEST) Message-ID: Date: Mon, 15 May 2023 22:52:44 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 From: =?UTF-8?Q?Mattias_R=c3=b6nnblom?= Subject: Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts To: Jerin Jacob , =?UTF-8?Q?Mattias_R=c3=b6nnblom?= Cc: "jerinj@marvell.com" , "dev@dpdk.org" , =?UTF-8?Q?Morten_Br=c3=b8rup?= References: <20230511081641.6693-1-mattias.ronnblom@ericsson.com> <20230511082415.6720-1-mattias.ronnblom@ericsson.com> <7c0013e1-d11b-2ad0-04b9-73be426d4719@ericsson.com> Content-Language: en-US In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2023-05-15 14:38, Jerin Jacob wrote: > On Fri, May 12, 2023 at 6:45 PM Mattias Rönnblom > wrote: >> >> On 2023-05-12 13:59, Jerin Jacob wrote: >>> On Thu, May 11, 2023 at 2:00 PM Mattias Rönnblom >>> wrote: >>>> >>>> Use non-burst event enqueue and dequeue calls from burst enqueue and >>>> dequeue only when the burst size is compile-time constant (and equal >>>> to one). >>>> >>>> Signed-off-by: Mattias Rönnblom >>>> >>>> --- >>>> >>>> v3: Actually include the change v2 claimed to contain. >>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings if >>>> application is compiled with -pedantic. (Morten Brørup) >>>> --- >>>> lib/eventdev/rte_eventdev.h | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h >>>> index a90e23ac8b..a471caeb6d 100644 >>>> --- a/lib/eventdev/rte_eventdev.h >>>> +++ b/lib/eventdev/rte_eventdev.h >>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id, >>>> * Allow zero cost non burst mode routine invocation if application >>>> * requests nb_events as const one >>>> */ >>>> - if (nb_events == 1) >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1) >>> >>> "Why" part is not clear from the commit message. Is this to avoid >>> nb_events read if it is built-in const. >> >> The __builtin_constant_p() is introduced to avoid having the compiler >> generate a conditional branch and two different code paths in case >> nb_elem is a run-time variable. >> >> In particular, this matters if nb_elems is run-time variable and varies >> between 1 and some larger value. >> >> I should have mention this in the commit message. >> >> A very slight performance improvement. It also makes the code better >> match the comment, imo. Zero cost for const one enqueues, but no impact >> non-compile-time-constant-length enqueues. >> >> Feel free to ignore. > > > I did some performance comparison of the patch. > A low-end ARM machines shows 0.7% drop with single event case. No > regression see with high-end ARM cores with single event case. > > IMO, optimizing the check for burst mode(the new patch) may not show > any real improvement as the cost is divided by number of event. > Whereas optimizing the check for single event case(The current code) > shows better performance with single event case and no regression > with burst mode as cost is divided by number of events. I ran some tests on an AMD Zen 3 with DSW. In the below tests the enqueue burst size is not compile-time constant. Enqueue burst size Performance improvement Run-time constant 1 ~5% Run-time constant 2 ~0% Run-time variable 1-2 ~9% Run-time variable 1-16 ~0% The run-time variable enqueue sizes randomly (uniformly) distributed in the specified range. The first result may come as a surprise. The benchmark is using RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type in most apps). The single-event enqueue function only exists in a generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent). I suspect that is the reason for the performance improvement. This effect is large-enough to make it somewhat beneficial (+~1%) to use run-time variable single-event enqueue compared to keeping the burst size compile-time constant. The performance gain is counted toward both enqueue and dequeue costs (+benchmark app overhead), so an under-estimation if see this as an enqueue performance improvement. > If you agree, then we can skip this patch. > I have no strong opinion if this should be included or not. It was up to me, I would drop the single-enqueue special case handling altogether in the next ABI update. > >> >>> If so, check should be following. Right? >>> >>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events == 1) >>> || nb_events == 1) >>> >>> At least, It was my original intention in the code. >>> >>> >>> >>>> return (fp_ops->enqueue)(port, ev); >>>> else >>>> return fn(port, ev, nb_events); >>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_t port_id, struct rte_event ev[], >>>> * Allow zero cost non burst mode routine invocation if application >>>> * requests nb_events as const one >>>> */ >>>> - if (nb_events == 1) >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_events == 1) >>>> return (fp_ops->dequeue)(port, ev, timeout_ticks); >>>> else >>>> return (fp_ops->dequeue_burst)(port, ev, nb_events, >>>> -- >>>> 2.34.1 >>>> >>