From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerin Jacob
Date: Tue, 23 May 2023 11:39:15 +0530
Subject: Re: [PATCH v3] event/cnxk: use LMTST for enqueue new burst
To: pbhagavatula@marvell.com
Cc: jerinj@marvell.com, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
 Satha Rao, Shijith Thotton, dev@dpdk.org
In-Reply-To: <20230522115635.1568-1-pbhagavatula@marvell.com>
References: <20230425195110.4223-1-pbhagavatula@marvell.com>
 <20230522115635.1568-1-pbhagavatula@marvell.com>

On Mon, May 22, 2023 at 5:26 PM wrote:
>
> From: Pavan Nikhilesh
>
> Use LMTST when all events in the burst are enqueued with
> rte_event::op set to RTE_EVENT_OP_NEW, i.e. when the events are
> enqueued with the `rte_event_enqueue_new_burst` API.
>
> Signed-off-by: Pavan Nikhilesh
> Acked-by: Shijith Thotton

Applied to dpdk-next-net-eventdev/for-main.
Thanks
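For reference, a minimal caller-side sketch of the path this patch speeds up: every event in the burst carries RTE_EVENT_OP_NEW and the burst is submitted through rte_event_enqueue_new_burst(). This is illustrative only and not part of the patch; queue 0, atomic scheduling and the 32-event burst size are assumptions made here for the example.

#include <rte_common.h>
#include <rte_eventdev.h>

/* Build a burst of OP_NEW events and enqueue it in one call; with this
 * patch the cn10k driver can then issue one LMTST (burst add-work)
 * instead of one add-work operation per event.
 */
static uint16_t
enqueue_new_only(uint8_t dev_id, uint8_t port_id, void **objs, uint16_t n)
{
	struct rte_event ev[32] = {0};
	uint16_t i, nb = RTE_MIN(n, (uint16_t)RTE_DIM(ev));

	for (i = 0; i < nb; i++) {
		ev[i].op = RTE_EVENT_OP_NEW;
		ev[i].queue_id = 0;
		ev[i].sched_type = RTE_SCHED_TYPE_ATOMIC;
		ev[i].event_type = RTE_EVENT_TYPE_CPU;
		ev[i].flow_id = i;
		ev[i].event_ptr = objs[i];
	}
	/* May return fewer than nb when the SSO has no XAQ space left. */
	return rte_event_enqueue_new_burst(dev_id, port_id, ev, nb);
}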
> ---
>  v3 Changes:
>  - Fix checkpatch issues.
>  v2 Changes:
>  - Fix spell check.
>
>  drivers/common/cnxk/hw/sso.h        |   1 +
>  drivers/common/cnxk/roc_sso.c       |  10 +-
>  drivers/common/cnxk/roc_sso.h       |   3 +
>  drivers/event/cnxk/cn10k_eventdev.c |   9 +-
>  drivers/event/cnxk/cn10k_eventdev.h |   6 +-
>  drivers/event/cnxk/cn10k_worker.c   | 304 +++++++++++++++++++++++++++-
>  drivers/event/cnxk/cn9k_eventdev.c  |   2 +-
>  drivers/event/cnxk/cnxk_eventdev.c  |  15 +-
>  drivers/event/cnxk/cnxk_eventdev.h  |   4 +-
>  9 files changed, 338 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/common/cnxk/hw/sso.h b/drivers/common/cnxk/hw/sso.h
> index 25deaa4c14..09b8d4955f 100644
> --- a/drivers/common/cnxk/hw/sso.h
> +++ b/drivers/common/cnxk/hw/sso.h
> @@ -157,6 +157,7 @@
>  #define SSO_LF_GGRP_AQ_CNT       (0x1c0ull)
>  #define SSO_LF_GGRP_AQ_THR       (0x1e0ull)
>  #define SSO_LF_GGRP_MISC_CNT     (0x200ull)
> +#define SSO_LF_GGRP_OP_AW_LMTST  (0x400ull)
>
>  #define SSO_AF_IAQ_FREE_CNT_MASK  0x3FFFull
>  #define SSO_AF_IAQ_RSVD_FREE_MASK 0x3FFFull
> diff --git a/drivers/common/cnxk/roc_sso.c b/drivers/common/cnxk/roc_sso.c
> index 4a6a5080f7..99a55e49b0 100644
> --- a/drivers/common/cnxk/roc_sso.c
> +++ b/drivers/common/cnxk/roc_sso.c
> @@ -6,6 +6,7 @@
>  #include "roc_priv.h"
>
>  #define SSO_XAQ_CACHE_CNT (0x7)
> +#define SSO_XAQ_SLACK     (16)
>
>  /* Private functions. */
>  int
> @@ -493,9 +494,13 @@ sso_hwgrp_init_xaq_aura(struct dev *dev, struct roc_sso_xaq_data *xaq,
>
>          xaq->nb_xae = nb_xae;
>
> -        /* Taken from HRM 14.3.3(4) */
> +        /** SSO will reserve up to 0x4 XAQ buffers per group when GetWork engine
> +         * is inactive and it might prefetch an additional 0x3 buffers due to
> +         * pipelining.
> +         */
>          xaq->nb_xaq = (SSO_XAQ_CACHE_CNT * nb_hwgrp);
>          xaq->nb_xaq += PLT_MAX(1 + ((xaq->nb_xae - 1) / xae_waes), xaq->nb_xaq);
> +        xaq->nb_xaq += SSO_XAQ_SLACK;
>
>          xaq->mem = plt_zmalloc(xaq_buf_size * xaq->nb_xaq, xaq_buf_size);
>          if (xaq->mem == NULL) {
> @@ -537,7 +542,8 @@ sso_hwgrp_init_xaq_aura(struct dev *dev, struct roc_sso_xaq_data *xaq,
>           * There should be a minimum headroom of 7 XAQs per HWGRP for SSO
>           * to request XAQ to cache them even before enqueue is called.
>           */
> -        xaq->xaq_lmt = xaq->nb_xaq - (nb_hwgrp * SSO_XAQ_CACHE_CNT);
> +        xaq->xaq_lmt =
> +                xaq->nb_xaq - (nb_hwgrp * SSO_XAQ_CACHE_CNT) - SSO_XAQ_SLACK;
>
>          return 0;
>  npa_fill_fail:
> diff --git a/drivers/common/cnxk/roc_sso.h b/drivers/common/cnxk/roc_sso.h
> index e67797b046..a2bb6fcb22 100644
> --- a/drivers/common/cnxk/roc_sso.h
> +++ b/drivers/common/cnxk/roc_sso.h
> @@ -7,6 +7,9 @@
>
>  #include "hw/ssow.h"
>
> +#define ROC_SSO_AW_PER_LMT_LINE_LOG2 3
> +#define ROC_SSO_XAE_PER_XAQ          352
> +
>  struct roc_sso_hwgrp_qos {
>          uint16_t hwgrp;
>          uint8_t xaq_prcnt;
> diff --git a/drivers/event/cnxk/cn10k_eventdev.c b/drivers/event/cnxk/cn10k_eventdev.c
> index 071ea5a212..855c92da83 100644
> --- a/drivers/event/cnxk/cn10k_eventdev.c
> +++ b/drivers/event/cnxk/cn10k_eventdev.c
> @@ -91,8 +91,10 @@ cn10k_sso_hws_setup(void *arg, void *hws, uintptr_t grp_base)
>          uint64_t val;
>
>          ws->grp_base = grp_base;
> -        ws->fc_mem = (uint64_t *)dev->fc_iova;
> +        ws->fc_mem = (int64_t *)dev->fc_iova;
>          ws->xaq_lmt = dev->xaq_lmt;
> +        ws->fc_cache_space = dev->fc_cache_space;
> +        ws->aw_lmt = ws->lmt_base;
>
>          /* Set get_work timeout for HWS */
>          val = NSEC2USEC(dev->deq_tmo_ns);
> @@ -624,6 +626,7 @@ cn10k_sso_info_get(struct rte_eventdev *event_dev,
>
>          dev_info->driver_name = RTE_STR(EVENTDEV_NAME_CN10K_PMD);
>          cnxk_sso_info_get(dev, dev_info);
> +        dev_info->max_event_port_enqueue_depth = UINT32_MAX;
>  }
>
>  static int
> @@ -632,7 +635,7 @@ cn10k_sso_dev_configure(const struct rte_eventdev *event_dev)
>          struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
>          int rc;
>
> -        rc = cnxk_sso_dev_validate(event_dev);
> +        rc = cnxk_sso_dev_validate(event_dev, 1, UINT32_MAX);
>          if (rc < 0) {
>                  plt_err("Invalid event device configuration");
>                  return -EINVAL;
> @@ -871,7 +874,7 @@ cn10k_sso_set_priv_mem(const struct rte_eventdev *event_dev, void *lookup_mem)
>          for (i = 0; i < dev->nb_event_ports; i++) {
>                  struct cn10k_sso_hws *ws = event_dev->data->ports[i];
>                  ws->xaq_lmt = dev->xaq_lmt;
> -                ws->fc_mem = (uint64_t *)dev->fc_iova;
> +                ws->fc_mem = (int64_t *)dev->fc_iova;
>                  ws->tstamp = dev->tstamp;
>                  if (lookup_mem)
>                          ws->lookup_mem = lookup_mem;
> diff --git a/drivers/event/cnxk/cn10k_eventdev.h b/drivers/event/cnxk/cn10k_eventdev.h
> index aaa01d1ec1..29567728cd 100644
> --- a/drivers/event/cnxk/cn10k_eventdev.h
> +++ b/drivers/event/cnxk/cn10k_eventdev.h
> @@ -19,9 +19,11 @@ struct cn10k_sso_hws {
>          struct cnxk_timesync_info **tstamp;
>          uint64_t meta_aura;
>          /* Add Work Fastpath data */
> -        uint64_t xaq_lmt __rte_cache_aligned;
> -        uint64_t *fc_mem;
> +        int64_t *fc_mem __rte_cache_aligned;
> +        int64_t *fc_cache_space;
> +        uintptr_t aw_lmt;
>          uintptr_t grp_base;
> +        int32_t xaq_lmt;
>          /* Tx Fastpath data */
>          uintptr_t lmt_base __rte_cache_aligned;
>          uint64_t lso_tun_fmt;
> diff --git a/drivers/event/cnxk/cn10k_worker.c b/drivers/event/cnxk/cn10k_worker.c
> index 562d2fca13..9b5bf90159 100644
> --- a/drivers/event/cnxk/cn10k_worker.c
> +++ b/drivers/event/cnxk/cn10k_worker.c
> @@ -77,6 +77,36 @@ cn10k_sso_hws_forward_event(struct cn10k_sso_hws *ws,
>                  cn10k_sso_hws_fwd_group(ws, ev, grp);
>  }
>
> +static inline int32_t
> +sso_read_xaq_space(struct cn10k_sso_hws *ws)
> +{
> +        return (ws->xaq_lmt - __atomic_load_n(ws->fc_mem, __ATOMIC_RELAXED)) *
> +               ROC_SSO_XAE_PER_XAQ;
> +}
> +
> +static inline void
> +sso_lmt_aw_wait_fc(struct cn10k_sso_hws *ws, int64_t req)
> +{
> +        int64_t cached, refill;
> +
> +retry:
> +        while (__atomic_load_n(ws->fc_cache_space, __ATOMIC_RELAXED) < 0)
> +                ;
> +
> +        cached = __atomic_fetch_sub(ws->fc_cache_space, req, __ATOMIC_ACQUIRE) - req;
> +        /* Check if there is enough space, else update and retry. */
> +        if (cached < 0) {
> +                /* Check if we have space else retry. */
> +                do {
> +                        refill = sso_read_xaq_space(ws);
> +                } while (refill <= 0);
> +                __atomic_compare_exchange(ws->fc_cache_space, &cached, &refill,
> +                                          0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
> +                goto retry;
> +        }
> +}
> +
>  uint16_t __rte_hot
>  cn10k_sso_hws_enq(void *port, const struct rte_event *ev)
>  {
> @@ -103,6 +133,253 @@ cn10k_sso_hws_enq(void *port, const struct rte_event *ev)
>          return 1;
>  }
>
> +#define VECTOR_SIZE_BITS             0xFFFFFFFFFFF80000ULL
> +#define VECTOR_GET_LINE_OFFSET(line) (19 + (3 * line))
> +
> +static uint64_t
> +vector_size_partial_mask(uint16_t off, uint16_t cnt)
> +{
> +        return (VECTOR_SIZE_BITS & ~(~0x0ULL << off)) |
> +               ((uint64_t)(cnt - 1) << off);
> +}
> +
> +static __rte_always_inline uint16_t
> +cn10k_sso_hws_new_event_lmtst(struct cn10k_sso_hws *ws, uint8_t queue_id,
> +                              const struct rte_event ev[], uint16_t n)
> +{
> +        uint16_t lines, partial_line, burst, left;
> +        uint64_t wdata[2], pa[2] = {0};
> +        uintptr_t lmt_addr;
> +        uint16_t sz0, sz1;
> +        uint16_t lmt_id;
> +
> +        sz0 = sz1 = 0;
> +        lmt_addr = ws->lmt_base;
> +        ROC_LMT_BASE_ID_GET(lmt_addr, lmt_id);
> +
> +        left = n;
> +again:
> +        burst = RTE_MIN(
> +                BIT(ROC_SSO_AW_PER_LMT_LINE_LOG2 + ROC_LMT_LINES_PER_CORE_LOG2),
> +                left);
> +
> +        /* Set wdata */
> +        lines = burst >> ROC_SSO_AW_PER_LMT_LINE_LOG2;
> +        partial_line = burst & (BIT(ROC_SSO_AW_PER_LMT_LINE_LOG2) - 1);
> +        wdata[0] = wdata[1] = 0;
> +        if (lines > BIT(ROC_LMT_LINES_PER_STR_LOG2)) {
> +                wdata[0] = lmt_id;
> +                wdata[0] |= 15ULL << 12;
> +                wdata[0] |= VECTOR_SIZE_BITS;
> +                pa[0] = (ws->grp_base + (queue_id << 12) +
> +                         SSO_LF_GGRP_OP_AW_LMTST) | (0x7 << 4);
> +                sz0 = 16 << ROC_SSO_AW_PER_LMT_LINE_LOG2;
> +
> +                wdata[1] = lmt_id + 16;
> +                pa[1] = (ws->grp_base + (queue_id << 12) +
> +                         SSO_LF_GGRP_OP_AW_LMTST) | (0x7 << 4);
> +
> +                lines -= 17;
> +                wdata[1] |= partial_line ? (uint64_t)(lines + 1) << 12 :
> +                                           (uint64_t)(lines << 12);
> +                wdata[1] |= partial_line ?
> +                                    vector_size_partial_mask(
> +                                            VECTOR_GET_LINE_OFFSET(lines),
> +                                            partial_line) :
> +                                    VECTOR_SIZE_BITS;
> +                sz1 = burst - sz0;
> +                partial_line = 0;
> +        } else if (lines) {
> +                /* We need to handle two cases here:
> +                 * 1. Partial line spill over to wdata[1] i.e. lines == 16
> +                 * 2. Partial line with spill lines < 16.
> +                 */
> +                wdata[0] = lmt_id;
> +                pa[0] = (ws->grp_base + (queue_id << 12) +
> +                         SSO_LF_GGRP_OP_AW_LMTST) | (0x7 << 4);
> +                sz0 = lines << ROC_SSO_AW_PER_LMT_LINE_LOG2;
> +                if (lines == 16) {
> +                        wdata[0] |= 15ULL << 12;
> +                        wdata[0] |= VECTOR_SIZE_BITS;
> +                        if (partial_line) {
> +                                wdata[1] = lmt_id + 16;
> +                                pa[1] = (ws->grp_base + (queue_id << 12) +
> +                                         SSO_LF_GGRP_OP_AW_LMTST) |
> +                                        ((partial_line - 1) << 4);
> +                        }
> +                } else {
> +                        lines -= 1;
> +                        wdata[0] |= partial_line ? (uint64_t)(lines + 1) << 12 :
> +                                                   (uint64_t)(lines << 12);
> +                        wdata[0] |= partial_line ?
> +                                            vector_size_partial_mask(
> +                                                    VECTOR_GET_LINE_OFFSET(lines),
> +                                                    partial_line) :
> +                                            VECTOR_SIZE_BITS;
> +                        sz0 += partial_line;
> +                }
> +                sz1 = burst - sz0;
> +                partial_line = 0;
> +        }
> +
> +        /* Only partial lines */
> +        if (partial_line) {
> +                wdata[0] = lmt_id;
> +                pa[0] = (ws->grp_base + (queue_id << 12) +
> +                         SSO_LF_GGRP_OP_AW_LMTST) |
> +                        ((partial_line - 1) << 4);
> +                sz0 = partial_line;
> +                sz1 = burst - sz0;
> +        }
> +
> +#if defined(RTE_ARCH_ARM64)
> +        uint64x2_t aw_mask = {0xC0FFFFFFFFULL, ~0x0ULL};
> +        uint64x2_t tt_mask = {0x300000000ULL, 0};
> +        uint16_t parts;
> +
> +        while (burst) {
> +                parts = burst > 7 ? 8 : plt_align32prevpow2(burst);
> +                burst -= parts;
> +                /* Lets try to fill at least one line per burst. */
> +                switch (parts) {
> +                case 8: {
> +                        uint64x2_t aw0, aw1, aw2, aw3, aw4, aw5, aw6, aw7;
> +
> +                        aw0 = vandq_u64(vld1q_u64((const uint64_t *)&ev[0]), aw_mask);
> +                        aw1 = vandq_u64(vld1q_u64((const uint64_t *)&ev[1]), aw_mask);
> +                        aw2 = vandq_u64(vld1q_u64((const uint64_t *)&ev[2]), aw_mask);
> +                        aw3 = vandq_u64(vld1q_u64((const uint64_t *)&ev[3]), aw_mask);
> +                        aw4 = vandq_u64(vld1q_u64((const uint64_t *)&ev[4]), aw_mask);
> +                        aw5 = vandq_u64(vld1q_u64((const uint64_t *)&ev[5]), aw_mask);
> +                        aw6 = vandq_u64(vld1q_u64((const uint64_t *)&ev[6]), aw_mask);
> +                        aw7 = vandq_u64(vld1q_u64((const uint64_t *)&ev[7]), aw_mask);
> +
> +                        aw0 = vorrq_u64(vandq_u64(vshrq_n_u64(aw0, 6), tt_mask), aw0);
> +                        aw1 = vorrq_u64(vandq_u64(vshrq_n_u64(aw1, 6), tt_mask), aw1);
> +                        aw2 = vorrq_u64(vandq_u64(vshrq_n_u64(aw2, 6), tt_mask), aw2);
> +                        aw3 = vorrq_u64(vandq_u64(vshrq_n_u64(aw3, 6), tt_mask), aw3);
> +                        aw4 = vorrq_u64(vandq_u64(vshrq_n_u64(aw4, 6), tt_mask), aw4);
> +                        aw5 = vorrq_u64(vandq_u64(vshrq_n_u64(aw5, 6), tt_mask), aw5);
> +                        aw6 = vorrq_u64(vandq_u64(vshrq_n_u64(aw6, 6), tt_mask), aw6);
> +                        aw7 = vorrq_u64(vandq_u64(vshrq_n_u64(aw7, 6), tt_mask), aw7);
> +
> +                        vst1q_u64((void *)lmt_addr, aw0);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 16), aw1);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 32), aw2);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 48), aw3);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 64), aw4);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 80), aw5);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 96), aw6);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 112), aw7);
> +                        lmt_addr = (uintptr_t)PLT_PTR_ADD(lmt_addr, 128);
> +                } break;
> +                case 4: {
> +                        uint64x2_t aw0, aw1, aw2, aw3;
> +                        aw0 = vandq_u64(vld1q_u64((const uint64_t *)&ev[0]), aw_mask);
> +                        aw1 = vandq_u64(vld1q_u64((const uint64_t *)&ev[1]), aw_mask);
> +                        aw2 = vandq_u64(vld1q_u64((const uint64_t *)&ev[2]), aw_mask);
> +                        aw3 = vandq_u64(vld1q_u64((const uint64_t *)&ev[3]), aw_mask);
> +
> +                        aw0 = vorrq_u64(vandq_u64(vshrq_n_u64(aw0, 6), tt_mask), aw0);
> +                        aw1 = vorrq_u64(vandq_u64(vshrq_n_u64(aw1, 6), tt_mask), aw1);
> +                        aw2 = vorrq_u64(vandq_u64(vshrq_n_u64(aw2, 6), tt_mask), aw2);
> +                        aw3 = vorrq_u64(vandq_u64(vshrq_n_u64(aw3, 6), tt_mask), aw3);
> +
> +                        vst1q_u64((void *)lmt_addr, aw0);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 16), aw1);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 32), aw2);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 48), aw3);
> +                        lmt_addr = (uintptr_t)PLT_PTR_ADD(lmt_addr, 64);
> +                } break;
> +                case 2: {
> +                        uint64x2_t aw0, aw1;
> +
> +                        aw0 = vandq_u64(vld1q_u64((const uint64_t *)&ev[0]), aw_mask);
> +                        aw1 = vandq_u64(vld1q_u64((const uint64_t *)&ev[1]), aw_mask);
> +
> +                        aw0 = vorrq_u64(vandq_u64(vshrq_n_u64(aw0, 6), tt_mask), aw0);
> +                        aw1 = vorrq_u64(vandq_u64(vshrq_n_u64(aw1, 6), tt_mask), aw1);
> +
> +                        vst1q_u64((void *)lmt_addr, aw0);
> +                        vst1q_u64((void *)PLT_PTR_ADD(lmt_addr, 16), aw1);
> +                        lmt_addr = (uintptr_t)PLT_PTR_ADD(lmt_addr, 32);
> +                } break;
> +                case 1: {
> +                        __uint128_t aw0;
> +
> +                        aw0 = ev[0].u64;
> +                        aw0 <<= 64;
> +                        aw0 |= ev[0].event & (BIT_ULL(32) - 1);
> +                        aw0 |= (uint64_t)ev[0].sched_type << 32;
> +
> +                        *((__uint128_t *)lmt_addr) = aw0;
> +                        lmt_addr = (uintptr_t)PLT_PTR_ADD(lmt_addr, 16);
> +                } break;
> +                }
> +                ev += parts;
> +        }
> +#else
> +        uint16_t i;
> +
> +        for (i = 0; i < burst; i++) {
> +                __uint128_t aw0;
> +
> +                aw0 = ev[0].u64;
> +                aw0 <<= 64;
> +                aw0 |= ev[0].event & (BIT_ULL(32) - 1);
> +                aw0 |= (uint64_t)ev[0].sched_type << 32;
> +                *((__uint128_t *)lmt_addr) = aw0;
> +                lmt_addr = (uintptr_t)PLT_PTR_ADD(lmt_addr, 16);
> +        }
> +#endif
> +
> +        /* wdata[0] will be always valid */
> +        sso_lmt_aw_wait_fc(ws, sz0);
> +        roc_lmt_submit_steorl(wdata[0], pa[0]);
> +        if (wdata[1]) {
> +                sso_lmt_aw_wait_fc(ws, sz1);
> +                roc_lmt_submit_steorl(wdata[1], pa[1]);
> +        }
> +
> +        left -= (sz0 + sz1);
> +        if (left)
> +                goto again;
> +
> +        return n;
> +}
> +
>  uint16_t __rte_hot
>  cn10k_sso_hws_enq_burst(void *port, const struct rte_event ev[],
>                          uint16_t nb_events)
> @@ -115,13 +392,32 @@ uint16_t __rte_hot
>  cn10k_sso_hws_enq_new_burst(void *port, const struct rte_event ev[],
>                              uint16_t nb_events)
>  {
> +        uint16_t idx = 0, done = 0, rc = 0;
>          struct cn10k_sso_hws *ws = port;
> -        uint16_t i, rc = 1;
> +        uint8_t queue_id;
> +        int32_t space;
> +
> +        /* Do a common back-pressure check and return */
> +        space = sso_read_xaq_space(ws) - ROC_SSO_XAE_PER_XAQ;
> +        if (space <= 0)
> +                return 0;
> +        nb_events = space < nb_events ? space : nb_events;
> +
> +        do {
> +                queue_id = ev[idx].queue_id;
> +                for (idx = idx + 1; idx < nb_events; idx++)
> +                        if (queue_id != ev[idx].queue_id)
> +                                break;
> +
> +                rc = cn10k_sso_hws_new_event_lmtst(ws, queue_id, &ev[done],
> +                                                   idx - done);
> +                if (rc != (idx - done))
> +                        return rc + done;
> +                done += rc;
>
> -        for (i = 0; i < nb_events && rc; i++)
> -                rc = cn10k_sso_hws_new_event(ws, &ev[i]);
> +        } while (done < nb_events);
>
> -        return nb_events;
> +        return done;
>  }
>
>  uint16_t __rte_hot
> diff --git a/drivers/event/cnxk/cn9k_eventdev.c b/drivers/event/cnxk/cn9k_eventdev.c
> index 7e8339bd3a..e59e537311 100644
> --- a/drivers/event/cnxk/cn9k_eventdev.c
> +++ b/drivers/event/cnxk/cn9k_eventdev.c
> @@ -753,7 +753,7 @@ cn9k_sso_dev_configure(const struct rte_eventdev *event_dev)
>          struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
>          int rc;
>
> -        rc = cnxk_sso_dev_validate(event_dev);
> +        rc = cnxk_sso_dev_validate(event_dev, 1, 1);
>          if (rc < 0) {
>                  plt_err("Invalid event device configuration");
>                  return -EINVAL;
> diff --git a/drivers/event/cnxk/cnxk_eventdev.c b/drivers/event/cnxk/cnxk_eventdev.c
> index cb9ba5d353..99f9cdcd0d 100644
> --- a/drivers/event/cnxk/cnxk_eventdev.c
> +++ b/drivers/event/cnxk/cnxk_eventdev.c
> @@ -145,7 +145,8 @@ cnxk_sso_restore_links(const struct rte_eventdev *event_dev,
>  }
>
>  int
> -cnxk_sso_dev_validate(const struct rte_eventdev *event_dev)
> +cnxk_sso_dev_validate(const struct rte_eventdev *event_dev, uint32_t deq_depth,
> +                      uint32_t enq_depth)
>  {
>          struct rte_event_dev_config *conf = &event_dev->data->dev_conf;
>          struct cnxk_sso_evdev *dev = cnxk_sso_pmd_priv(event_dev);
> @@ -173,12 +174,12 @@ cnxk_sso_dev_validate(const struct rte_eventdev *event_dev)
>                  return -EINVAL;
>          }
>
> -        if (conf->nb_event_port_dequeue_depth > 1) {
> +        if (conf->nb_event_port_dequeue_depth > deq_depth) {
>                  plt_err("Unsupported event port deq depth requested");
>                  return -EINVAL;
>          }
>
> -        if (conf->nb_event_port_enqueue_depth > 1) {
> +        if (conf->nb_event_port_enqueue_depth > enq_depth) {
>                  plt_err("Unsupported event port enq depth requested");
>                  return -EINVAL;
>          }
> @@ -630,6 +631,14 @@ cnxk_sso_init(struct rte_eventdev *event_dev)
>          }
>
>          dev = cnxk_sso_pmd_priv(event_dev);
> +        dev->fc_cache_space = rte_zmalloc("fc_cache", PLT_CACHE_LINE_SIZE,
> +                                          PLT_CACHE_LINE_SIZE);
> +        if (dev->fc_cache_space == NULL) {
> +                plt_memzone_free(mz);
> +                plt_err("Failed to reserve memory for XAQ fc cache");
> +                return -ENOMEM;
> +        }
> +
>          pci_dev = container_of(event_dev->dev, struct rte_pci_device, device);
>          dev->sso.pci_dev = pci_dev;
>
> diff --git a/drivers/event/cnxk/cnxk_eventdev.h b/drivers/event/cnxk/cnxk_eventdev.h
> index c7cbd722ab..a2f30bfe5f 100644
> --- a/drivers/event/cnxk/cnxk_eventdev.h
> +++ b/drivers/event/cnxk/cnxk_eventdev.h
> @@ -90,6 +90,7 @@ struct cnxk_sso_evdev {
>          uint32_t max_dequeue_timeout_ns;
>          int32_t max_num_events;
>          uint64_t xaq_lmt;
> +        int64_t *fc_cache_space;
>          rte_iova_t fc_iova;
>          uint64_t rx_offloads;
>          uint64_t tx_offloads;
> @@ -206,7 +207,8 @@ int cnxk_sso_fini(struct rte_eventdev *event_dev);
>  int cnxk_sso_remove(struct rte_pci_device *pci_dev);
>  void cnxk_sso_info_get(struct cnxk_sso_evdev *dev,
>                         struct rte_event_dev_info *dev_info);
> -int cnxk_sso_dev_validate(const struct rte_eventdev *event_dev);
> +int cnxk_sso_dev_validate(const struct rte_eventdev *event_dev,
> +                          uint32_t deq_depth, uint32_t enq_depth);
>  int cnxk_setup_event_ports(const struct rte_eventdev *event_dev,
>                             cnxk_sso_init_hws_mem_t init_hws_mem,
>                             cnxk_sso_hws_setup_t hws_setup);
> --
> 2.25.1
>
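The XAQ pool sizing in roc_sso.c above is plain arithmetic: 7 cached buffers per hardware group (up to 4 reserved plus 3 prefetched), enough XAQs to cover nb_xae events at xae_waes events per XAQ, plus a 16-buffer slack, with the enqueue limit leaving the cache and slack untouched. A standalone sketch of the same computation follows; the helper name is hypothetical and it assumes xae_waes corresponds to ROC_SSO_XAE_PER_XAQ (352), so treat it as an illustration rather than the driver code.

#include <stdint.h>

#define XAQ_CACHE_CNT 7   /* per-HWGRP cache: up to 4 reserved + 3 prefetched */
#define XAQ_SLACK     16  /* extra buffers kept out of the enqueue budget */

struct xaq_plan {
	uint32_t nb_xaq;  /* buffers to allocate */
	int32_t xaq_lmt;  /* enqueue cut-off checked against the in-use count */
};

/* Mirrors sso_hwgrp_init_xaq_aura(); assumes nb_xae >= 1. */
static struct xaq_plan
plan_xaq(uint32_t nb_xae, uint32_t xae_waes, uint32_t nb_hwgrp)
{
	struct xaq_plan p;
	uint32_t need = 1 + ((nb_xae - 1) / xae_waes); /* ceil(nb_xae / xae_waes) */
	uint32_t cache = XAQ_CACHE_CNT * nb_hwgrp;

	p.nb_xaq = cache + (need > cache ? need : cache) + XAQ_SLACK;
	p.xaq_lmt = p.nb_xaq - cache - XAQ_SLACK;
	return p;
}

For example, with nb_xae = 8192, xae_waes = 352 and 4 groups: need = 24 and cache = 28, so 72 buffers are allocated and enqueue is throttled once 28 XAQs are in flight, keeping the per-group cache and the slack in reserve.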
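The worker-side flow control keeps a shared cache of free-entry credits (fc_cache_space) so the fast path normally touches only that counter and re-reads the hardware-backed fc_mem count only when the cache underflows. A stripped-down sketch of that pattern with the cnxk specifics removed (the GCC __atomic builtins mirror the patch; the counters and limit here are stand-ins, not driver state):

#include <stdint.h>

static int64_t credit_cache;           /* stands in for *fc_cache_space */
static int64_t hw_in_use;              /* stands in for what *fc_mem counts */
static const int64_t hw_limit = 1024;  /* stands in for xaq_lmt */

static int64_t
read_real_free_space(void)
{
	/* In the patch this is sso_read_xaq_space(): (xaq_lmt - *fc_mem)
	 * scaled by ROC_SSO_XAE_PER_XAQ.
	 */
	return hw_limit - __atomic_load_n(&hw_in_use, __ATOMIC_RELAXED);
}

static void
wait_for_credits(int64_t req)
{
	int64_t cached, refill;

retry:
	/* Wait until no other thread has driven the cache negative. */
	while (__atomic_load_n(&credit_cache, __ATOMIC_RELAXED) < 0)
		;

	/* Optimistically claim req credits from the cache. */
	cached = __atomic_fetch_sub(&credit_cache, req, __ATOMIC_ACQUIRE) - req;
	if (cached < 0) {
		/* Cache exhausted: poll the authoritative counter, publish
		 * the refill, then retry the claim.
		 */
		do {
			refill = read_real_free_space();
		} while (refill <= 0);
		__atomic_compare_exchange(&credit_cache, &cached, &refill, 0,
					  __ATOMIC_RELEASE, __ATOMIC_RELAXED);
		goto retry;
	}
}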
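Finally, cn10k_sso_hws_enq_new_burst() splits the caller's burst into maximal runs that share a queue_id, since each LMTST burst targets a single SSO group. The splitting loop in isolation, with a hypothetical submit_run() standing in for cn10k_sso_hws_new_event_lmtst():

#include <stdint.h>

struct ev { uint8_t queue_id; };

/* Stand-in for the per-queue LMTST submit; returns how many it accepted. */
static uint16_t
submit_run(uint8_t queue_id, const struct ev *run, uint16_t n)
{
	(void)queue_id;
	(void)run;
	return n;
}

static uint16_t
enqueue_grouped(const struct ev *ev, uint16_t nb_events)
{
	uint16_t idx = 0, done = 0, rc;
	uint8_t queue_id;

	do {
		/* Extend the run while the queue_id stays the same. */
		queue_id = ev[idx].queue_id;
		for (idx = idx + 1; idx < nb_events; idx++)
			if (queue_id != ev[idx].queue_id)
				break;

		rc = submit_run(queue_id, &ev[done], idx - done);
		if (rc != (idx - done))
			return done + rc; /* partial enqueue: stop here */
		done += rc;
	} while (done < nb_events);

	return done;
}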