From: Jerin Jacob
Date: Wed, 6 Dec 2023 17:47:16 +0530
Subject: Re: [PATCH v2 2/3] net/octeon_ep: use SSE instructions for Rx routine
To: pbhagavatula@marvell.com
Cc: jerinj@marvell.com, Vamsi Attunuru, Bruce Richardson, Konstantin Ananyev, dev@dpdk.org
In-Reply-To: <20231125160349.2021-2-pbhagavatula@marvell.com>

On Sat, Nov 25, 2023 at 10:52 PM wrote:
>
> From: Pavan Nikhilesh
>
> Optimize Rx routine to use SSE instructions.
>
> Signed-off-by: Pavan Nikhilesh
> ---
> diff --git a/drivers/net/octeon_ep/cnxk_ep_rx_sse.c b/drivers/net/octeon_ep/cnxk_ep_rx_sse.c
> new file mode 100644
> index 0000000000..531f75a2e0
> --- /dev/null
> +++ b/drivers/net/octeon_ep/cnxk_ep_rx_sse.c
> @@ -0,0 +1,124 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2023 Marvell.
> + */
> +
> +#include "cnxk_ep_rx.h"
> +
> +static __rte_always_inline uint32_t
> +hadd(__m128i x)
> +{
> +        __m128i hi64 = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
> +        __m128i sum64 = _mm_add_epi32(hi64, x);
> +        __m128i hi32 = _mm_shufflelo_epi16(sum64, _MM_SHUFFLE(1, 0, 3, 2));
> +        __m128i sum32 = _mm_add_epi32(sum64, hi32);
> +        return _mm_cvtsi128_si32(sum32);
> +}
> +
> +static __rte_always_inline void
> +cnxk_ep_process_pkts_vec_sse(struct rte_mbuf **rx_pkts, struct otx_ep_droq *droq, uint16_t new_pkts)
> +{
> +        struct rte_mbuf **recv_buf_list = droq->recv_buf_list;
> +        uint32_t bytes_rsvd = 0, read_idx = droq->read_idx;
> +        uint32_t idx0, idx1, idx2, idx3;
> +        struct rte_mbuf *m0, *m1, *m2, *m3;
> +        uint16_t nb_desc = droq->nb_desc;
> +        uint16_t pkts = 0;
> +
> +        idx0 = read_idx;
> +        while (pkts < new_pkts) {
> +                const __m128i bswap_mask = _mm_set_epi8(0xFF, 0xFF, 12, 13, 0xFF, 0xFF, 8, 9, 0xFF,
> +                                                        0xFF, 4, 5, 0xFF, 0xFF, 0, 1);
> +                const __m128i cpy_mask = _mm_set_epi8(0xFF, 0xFF, 9, 8, 0xFF, 0xFF, 9, 8, 0xFF,
> +                                                      0xFF, 1, 0, 0xFF, 0xFF, 1, 0);
> +                __m128i s01, s23;
> +
> +                idx1 = otx_ep_incr_index(idx0, 1, nb_desc);
> +                idx2 = otx_ep_incr_index(idx1, 1, nb_desc);
> +                idx3 = otx_ep_incr_index(idx2, 1, nb_desc);
> +
> +                m0 = recv_buf_list[idx0];
> +                m1 = recv_buf_list[idx1];
> +                m2 = recv_buf_list[idx2];
> +                m3 = recv_buf_list[idx3];
> +

Please add some comments on the SSE usage in this section.

> +                s01 = _mm_set_epi32(rte_pktmbuf_mtod(m3, struct otx_ep_droq_info *)->length >> 48,
> +                                    rte_pktmbuf_mtod(m1, struct otx_ep_droq_info *)->length >> 48,
> +                                    rte_pktmbuf_mtod(m2, struct otx_ep_droq_info *)->length >> 48,
> +                                    rte_pktmbuf_mtod(m0, struct otx_ep_droq_info *)->length >> 48);
> +                s01 = _mm_shuffle_epi8(s01, bswap_mask);
> +                bytes_rsvd += hadd(s01);
> +                s23 = _mm_shuffle_epi32(s01, _MM_SHUFFLE(3, 3, 1, 1));
> +                s01 = _mm_shuffle_epi8(s01, cpy_mask);
> +                s23 = _mm_shuffle_epi8(s23, cpy_mask);

> diff --git a/drivers/net/octeon_ep/otx_ep_rxtx.h b/drivers/net/octeon_ep/otx_ep_rxtx.h
> index b159c32cae..af657dba50 100644
> --- a/drivers/net/octeon_ep/otx_ep_rxtx.h
> +++ b/drivers/net/octeon_ep/otx_ep_rxtx.h
> @@ -48,12 +48,22 @@ cnxk_ep_xmit_pkts_mseg(void *tx_queue, struct rte_mbuf **pkts, uint16_t nb_pkts)
>  uint16_t
>  cnxk_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
>
> +#ifdef RTE_ARCH_X86

We can skip the #ifdef for the function declaration. Same comment for AVX.

> +uint16_t
> +cnxk_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
> +#endif
> +
>  uint16_t
>  cnxk_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
>
>  uint16_t
>  cn9k_ep_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
>
> +#ifdef RTE_ARCH_X86

We can skip the #ifdef for the function declaration. Same comment for AVX.

> +uint16_t
> +cn9k_ep_recv_pkts_sse(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
> +#endif
> +
>  uint16_t
>  cn9k_ep_recv_pkts_mseg(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t budget);
>  #endif /* _OTX_EP_RXTX_H_ */
> --
> 2.25.1
>
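
To illustrate the kind of comments requested for the SSE section, here is a
rough standalone sketch of the horizontal-add helper with per-step lane
annotations. The instruction sequence matches hadd() in the quoted patch; the
names hadd_u32 and sum4_lengths are hypothetical and used only for this
example, and the comment wording is a suggestion, not what the driver has to
use. It needs only SSE2.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/*
 * Horizontal add of the four 32-bit lanes of x.
 * Lane diagrams are written as [lane3 lane2 lane1 lane0].
 * hadd_u32 is a hypothetical name; the sequence mirrors hadd() in the patch.
 */
static inline uint32_t
hadd_u32(__m128i x)
{
	/* x = [a3 a2 a1 a0]; swap the 64-bit halves: hi64 = [a1 a0 a3 a2] */
	__m128i hi64 = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
	/* sum64 = [a1+a3 a0+a2 a3+a1 a2+a0] */
	__m128i sum64 = _mm_add_epi32(hi64, x);
	/* Swap the two low 32-bit lanes (moved as 16-bit pairs): lane0 = a3+a1 */
	__m128i hi32 = _mm_shufflelo_epi16(sum64, _MM_SHUFFLE(1, 0, 3, 2));
	/* lane0 of sum32 = a0+a1+a2+a3 */
	__m128i sum32 = _mm_add_epi32(sum64, hi32);
	/* Extract lane0 */
	return (uint32_t)_mm_cvtsi128_si32(sum32);
}

/*
 * Hypothetical usage helper: sum four packet lengths held in a plain array.
 * The unaligned load makes no assumption about the alignment of len.
 */
static inline uint32_t
sum4_lengths(const uint32_t len[4])
{
	assert(len != NULL);
	return hadd_u32(_mm_loadu_si128((const __m128i *)len));
}
```

For example, calling sum4_lengths() on an array holding {1500, 64, 128, 9000}
gives the total number of bytes for those four packets, which is how
bytes_rsvd is accumulated four descriptors at a time in the loop above.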