From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9713842B22; Tue, 16 May 2023 15:08:51 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 2BF5B41141; Tue, 16 May 2023 15:08:51 +0200 (CEST) Received: from mail-vs1-f41.google.com (mail-vs1-f41.google.com [209.85.217.41]) by mails.dpdk.org (Postfix) with ESMTP id 35ADE410EE for ; Tue, 16 May 2023 15:08:50 +0200 (CEST) Received: by mail-vs1-f41.google.com with SMTP id ada2fe7eead31-4360e73d0d4so6174670137.1 for ; Tue, 16 May 2023 06:08:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684242529; x=1686834529; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=JfJXQZ4ycMbGEwH1zezBuOChI+Pz2b01etmWPSCr7OM=; b=jmvEiqcALE8k/XiHhzLWMwbXCbt0M9GFDN+ZzLdAqlov5Tjib8xHTn+3qpnmUnS611 tb7l9+MUXj7Zp+BNLb0mDHGSGZ4X6rnxQ8sOBcmRd2Vo9zoN/4iFdI96ZJm8z5EELVJo /pgsc35JpUYWujCsKBQ9NuTuKk0HOrhGrQp0nWy7j6jGBDV0FdmMExfJosm7PaT+X2Lz P6LaySbYpQhe/bJ12FFlL/RI2j+sVdeP8UGvG6xXrCtS9tO+e4gkjDtq+o+Wffhm4XdD 6tCGwgJ+re9aT8r4YkK/EJcxRu5S0ZS7rDU3ksn1bWUyjUX2GpLIktt6fQ6Hci6gPSPL o5qA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684242529; x=1686834529; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=JfJXQZ4ycMbGEwH1zezBuOChI+Pz2b01etmWPSCr7OM=; b=dWsci2NLLTuFxI9Q3EbyPwrthnnWGxaDcZFEH4LFJxsdiJqTCHaxQvilDGLlkZeFBu YYIbVDpbDd2fkvWTKj6SH5wzKPRNgEJGMIv1uJfIsBPagEDJpbz4n4FaUQQ4eNGYqB9c QsOA8VaaIY4NrQRjVoO97+lNa7Zd82oP0RBxQUuvl4LnQQdSSes3Tpspyuy2mNTw8SZD MFSbevYUw3kvNRN/nQoRfyyxjPx29BPA+2XvxrMX/0AHm15tj4vFtSfRVkGreT87NXEy 6LcTKwd1ARPa6b+n5EA/aJHKKI7DWNWM1fnKPsPQd77ey7e3poboSOhbo+DtE5r7XJTo m8YA== X-Gm-Message-State: AC+VfDxmaU66YblyvVQWoMu9jTMLVXKggenD8VA+T+c2h4AbpNBGqU1y myx1ljGajesQlnoi56DT9WgU6CPoxEBnCT46zpU= X-Google-Smtp-Source: ACHHUZ5Q4xmn7UgqcY2j8lMsluvyjLqH27gMuAGKYzCQAO1N51bRPY/GtJdTYwxZAYEsisludjP0LTxMpSEViPT7oFA= X-Received: by 2002:a67:f291:0:b0:42e:6748:13dc with SMTP id m17-20020a67f291000000b0042e674813dcmr16426622vsk.0.1684242529397; Tue, 16 May 2023 06:08:49 -0700 (PDT) MIME-Version: 1.0 References: <20230511081641.6693-1-mattias.ronnblom@ericsson.com> <20230511082415.6720-1-mattias.ronnblom@ericsson.com> <7c0013e1-d11b-2ad0-04b9-73be426d4719@ericsson.com> In-Reply-To: From: Jerin Jacob Date: Tue, 16 May 2023 18:38:23 +0530 Message-ID: Subject: Re: [PATCH v3] eventdev: avoid non-burst shortcut for variable-size bursts To: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= Cc: =?UTF-8?Q?Mattias_R=C3=B6nnblom?= , "jerinj@marvell.com" , "dev@dpdk.org" , =?UTF-8?Q?Morten_Br=C3=B8rup?= Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Tue, May 16, 2023 at 2:22=E2=80=AFAM Mattias R=C3=B6nnblom wrote: > > On 2023-05-15 14:38, Jerin Jacob wrote: > > On Fri, May 12, 2023 at 6:45=E2=80=AFPM Mattias R=C3=B6nnblom > > wrote: > >> > >> On 2023-05-12 13:59, Jerin Jacob wrote: > >>> On Thu, May 11, 2023 at 2:00=E2=80=AFPM Mattias R=C3=B6nnblom > >>> wrote: > >>>> > >>>> Use non-burst event enqueue and dequeue calls from burst enqueue and > >>>> dequeue only when the burst size is compile-time constant (and equal > >>>> to one). > >>>> > >>>> Signed-off-by: Mattias R=C3=B6nnblom > >>>> > >>>> --- > >>>> > >>>> v3: Actually include the change v2 claimed to contain. > >>>> v2: Wrap builtin call in __extension__, to avoid compiler warnings i= f > >>>> application is compiled with -pedantic. (Morten Br=C3=B8rup) > >>>> --- > >>>> lib/eventdev/rte_eventdev.h | 4 ++-- > >>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev= .h > >>>> index a90e23ac8b..a471caeb6d 100644 > >>>> --- a/lib/eventdev/rte_eventdev.h > >>>> +++ b/lib/eventdev/rte_eventdev.h > >>>> @@ -1944,7 +1944,7 @@ __rte_event_enqueue_burst(uint8_t dev_id, uint= 8_t port_id, > >>>> * Allow zero cost non burst mode routine invocation if ap= plication > >>>> * requests nb_events as const one > >>>> */ > >>>> - if (nb_events =3D=3D 1) > >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_eve= nts =3D=3D 1) > >>> > >>> "Why" part is not clear from the commit message. Is this to avoid > >>> nb_events read if it is built-in const. > >> > >> The __builtin_constant_p() is introduced to avoid having the compiler > >> generate a conditional branch and two different code paths in case > >> nb_elem is a run-time variable. > >> > >> In particular, this matters if nb_elems is run-time variable and varie= s > >> between 1 and some larger value. > >> > >> I should have mention this in the commit message. > >> > >> A very slight performance improvement. It also makes the code better > >> match the comment, imo. Zero cost for const one enqueues, but no impac= t > >> non-compile-time-constant-length enqueues. > >> > >> Feel free to ignore. > > > > > > I did some performance comparison of the patch. > > A low-end ARM machines shows 0.7% drop with single event case. No > > regression see with high-end ARM cores with single event case. > > > > IMO, optimizing the check for burst mode(the new patch) may not show > > any real improvement as the cost is divided by number of event. > > Whereas optimizing the check for single event case(The current code) > > shows better performance with single event case and no regression > > with burst mode as cost is divided by number of events. > > I ran some tests on an AMD Zen 3 with DSW. > In the below tests the enqueue burst size is not compile-time constant. > > Enqueue burst size Performance improvement > Run-time constant 1 ~5% > Run-time constant 2 ~0% > Run-time variable 1-2 ~9% > Run-time variable 1-16 ~0% > > The run-time variable enqueue sizes randomly (uniformly) distributed in > the specified range. > > The first result may come as a surprise. The benchmark is using > RTE_EVENT_OP_FORWARD type events (which likely is the dominating op type > in most apps). The single-event enqueue function only exists in a > generic variant (i.e., no rte_event_enqueue_forward_burst() equivalent). > I suspect that is the reason for the performance improvement. > > This effect is large-enough to make it somewhat beneficial (+~1%) to use > run-time variable single-event enqueue compared to keeping the burst > size compile-time constant. # Interesting, Could you share your testeventdev command to test it. # By having quick glance on DSW code, following change can be added(or similar cases). Not sure such change in DSW driver is making a difference or nor? diff --git a/drivers/event/dsw/dsw_event.c b/drivers/event/dsw/dsw_event.c index e84b65d99f..455470997b 100644 --- a/drivers/event/dsw/dsw_event.c +++ b/drivers/event/dsw/dsw_event.c @@ -1251,7 +1251,7 @@ dsw_port_flush_out_buffers(struct dsw_evdev *dsw, struct dsw_port *source_port) uint16_t dsw_event_enqueue(void *port, const struct rte_event *ev) { - return dsw_event_enqueue_burst(port, ev, unlikely(ev =3D=3D NULL) ?= 0 : 1); + return dsw_event_enqueue_burst(port, ev, 1); } static __rte_always_inline uint16_t @@ -1340,7 +1340,7 @@ dsw_event_enqueue_burst_generic(struct dsw_port *source_port, return (num_non_release + num_release); } -uint16_t +inline uint16_t dsw_event_enqueue_burst(void *port, const struct rte_event events[], uint16_t events_len) { # I am testing with command like this "/build/app/dpdk-test-eventdev -l 0-23 -a 0002:0e:00.0 -- --test=3Dperf_atq --plcores 1 --wlcores 8 --stlist p --nb_pkts=3D10000000000" > > The performance gain is counted toward both enqueue and dequeue costs > (+benchmark app overhead), so an under-estimation if see this as an > enqueue performance improvement. > > > If you agree, then we can skip this patch. > > > > I have no strong opinion if this should be included or not. > > It was up to me, I would drop the single-enqueue special case handling > altogether in the next ABI update. That's a reasonable path. If we are willing to push a patch, we can test it and give feedback. Or in our spare time, We can do that as well. > > > > >> > >>> If so, check should be following. Right? > >>> > >>> if (__extension__((__builtin_constant_p(nb_events)) && nb_events =3D= =3D 1) > >>> || nb_events =3D=3D 1) > >>> > >>> At least, It was my original intention in the code. > >>> > >>> > >>> > >>>> return (fp_ops->enqueue)(port, ev); > >>>> else > >>>> return fn(port, ev, nb_events); > >>>> @@ -2200,7 +2200,7 @@ rte_event_dequeue_burst(uint8_t dev_id, uint8_= t port_id, struct rte_event ev[], > >>>> * Allow zero cost non burst mode routine invocation if ap= plication > >>>> * requests nb_events as const one > >>>> */ > >>>> - if (nb_events =3D=3D 1) > >>>> + if (__extension__(__builtin_constant_p(nb_events)) && nb_eve= nts =3D=3D 1) > >>>> return (fp_ops->dequeue)(port, ev, timeout_ticks); > >>>> else > >>>> return (fp_ops->dequeue_burst)(port, ev, nb_events= , > >>>> -- > >>>> 2.34.1 > >>>> > >>