From mboxrd@z Thu Jan  1 00:00:00 1970
From: Rahul Bhansali
To: Rahul Bhansali, "dev@dpdk.org", Radu Nicolau, Akhil Goyal, Ruifeng Wang
Cc: Jerin Jacob Kollanukkaran, Konstantin Ananyev
Subject: RE: [PATCH v2 2/2] examples/ipsec-secgw: add support of NEON with poll mode
Date: Fri, 17 Jun 2022 07:51:17 +0000
References: <20220524095717.3875284-1-rbhansali@marvell.com>
 <20220617074241.3260496-1-rbhansali@marvell.com>
 <20220617074241.3260496-2-rbhansali@marvell.com>
In-Reply-To: <20220617074241.3260496-2-rbhansali@marvell.com>
List-Id: DPDK patches and discussions

CC: Konstantin Ananyev

> -----Original Message-----
> From: Rahul Bhansali
> Sent: Friday, June 17, 2022 1:13 PM
> To: dev@dpdk.org; Radu Nicolau; Akhil Goyal; Ruifeng Wang
> Cc: Jerin Jacob Kollanukkaran; Rahul Bhansali
> Subject: [PATCH v2 2/2] examples/ipsec-secgw: add support of NEON with poll mode
>
> This adds support of NEON based lpm lookup along with multi packet
> processing for burst send in packets routing.
>
> Performance impact:
> On cn10k, with poll mode inline protocol, outbound performance increased by
> up to ~8% and inbound performance increased by up to ~6%.
>
> Signed-off-by: Rahul Bhansali
> ---
> Changes in v2: Removed Neon packet grouping function and used the common
> one.
>
>  examples/ipsec-secgw/Makefile         |   5 +-
>  examples/ipsec-secgw/ipsec-secgw.c    |  25 ++
>  examples/ipsec-secgw/ipsec_lpm_neon.h | 213 +++++++++++++++++
>  examples/ipsec-secgw/ipsec_neon.h     | 321 ++++++++++++++++++++++++++
>  examples/ipsec-secgw/ipsec_worker.c   |   9 +
>  5 files changed, 571 insertions(+), 2 deletions(-)
>  create mode 100644 examples/ipsec-secgw/ipsec_lpm_neon.h
>  create mode 100644 examples/ipsec-secgw/ipsec_neon.h
>
> diff --git a/examples/ipsec-secgw/Makefile b/examples/ipsec-secgw/Makefile
> index 89af54bd37..ffe232774d 100644
> --- a/examples/ipsec-secgw/Makefile
> +++ b/examples/ipsec-secgw/Makefile
> @@ -36,6 +36,7 @@ shared: build/$(APP)-shared
>  static: build/$(APP)-static
>  	ln -sf $(APP)-static build/$(APP)
>
> +INCLUDES =-I../common
>  PC_FILE := $(shell $(PKGCONF) --path libdpdk 2>/dev/null)
>  CFLAGS += -O3 $(shell $(PKGCONF) --cflags libdpdk)
>  LDFLAGS_SHARED = $(shell $(PKGCONF) --libs libdpdk)
> @@ -53,10 +54,10 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
>  CFLAGS += -Wno-address-of-packed-member
>
>  build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
> -	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> +	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
>
>  build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
> -	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
> +	$(CC) $(CFLAGS) $(SRCS-y) $(INCLUDES) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
>
>  build:
>  	@mkdir -p $@
> diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
> index 4d8a4a71b8..b650668305 100644
> --- a/examples/ipsec-secgw/ipsec-secgw.c
> +++ b/examples/ipsec-secgw/ipsec-secgw.c
> @@ -56,6 +56,10 @@
>  #include "parser.h"
>  #include "sad.h"
>
> +#if defined(__ARM_NEON)
> +#include "ipsec_lpm_neon.h"
> +#endif
> +
>  volatile bool force_quit;
>
>  #define MAX_JUMBO_PKT_LEN 9600
> @@ -100,6 +104,12 @@ struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = {
>  	{ 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) }
>  };
>
> +/*
> + * To hold ethernet header per port, which will be applied
> + * to outgoing packets.
> + */
> +xmm_t val_eth[RTE_MAX_ETHPORTS];
> +
>  struct flow_info flow_info_tbl[RTE_MAX_ETHPORTS];
>
>  #define CMD_LINE_OPT_CONFIG		"config"
> @@ -568,9 +578,16 @@ process_pkts(struct lcore_conf *qconf, struct rte_mbuf **pkts,
>  		process_pkts_outbound(&qconf->outbound, &traffic);
>  	}
>
> +#if defined __ARM_NEON
> +	/* Neon optimized packet routing */
> +	route4_pkts_neon(qconf->rt4_ctx, traffic.ip4.pkts, traffic.ip4.num,
> +			 qconf->outbound.ipv4_offloads, true);
> +	route6_pkts_neon(qconf->rt6_ctx, traffic.ip6.pkts, traffic.ip6.num);
> +#else
>  	route4_pkts(qconf->rt4_ctx, traffic.ip4.pkts, traffic.ip4.num,
>  		    qconf->outbound.ipv4_offloads, true);
>  	route6_pkts(qconf->rt6_ctx, traffic.ip6.pkts, traffic.ip6.num);
> +#endif
>  }
>
>  static inline void
> @@ -1403,6 +1420,8 @@ add_dst_ethaddr(uint16_t port, const struct rte_ether_addr *addr)
>  		return -EINVAL;
>
>  	ethaddr_tbl[port].dst = ETHADDR_TO_UINT64(addr);
> +	rte_ether_addr_copy((struct rte_ether_addr *)&ethaddr_tbl[port].dst,
> +			    (struct rte_ether_addr *)(val_eth + port));
>  	return 0;
>  }
>
> @@ -1865,6 +1884,12 @@ port_init(uint16_t portid, uint64_t req_rx_offloads, uint64_t req_tx_offloads)
>  			portid, rte_strerror(-ret));
>
>  	ethaddr_tbl[portid].src = ETHADDR_TO_UINT64(&ethaddr);
> +
> +	rte_ether_addr_copy((struct rte_ether_addr *)&ethaddr_tbl[portid].dst,
> +			    (struct rte_ether_addr *)(val_eth + portid));
> +	rte_ether_addr_copy((struct rte_ether_addr *)&ethaddr_tbl[portid].src,
> +			    (struct rte_ether_addr *)(val_eth + portid) + 1);
> +
>  	print_ethaddr("Address: ", &ethaddr);
>  	printf("\n");
>
> diff --git a/examples/ipsec-secgw/ipsec_lpm_neon.h b/examples/ipsec-secgw/ipsec_lpm_neon.h
> new file mode 100644
> index 0000000000..959a5a8666
> --- /dev/null
> +++ b/examples/ipsec-secgw/ipsec_lpm_neon.h
> @@ -0,0 +1,213 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Marvell.
> + */
> +
> +#ifndef __IPSEC_LPM_NEON_H__
> +#define __IPSEC_LPM_NEON_H__
> +
> +#include <arm_neon.h>
> +#include "ipsec_neon.h"
> +
> +/*
> + * Append ethernet header and read destination IPV4 addresses from 4 mbufs.
> + */
> +static inline void
> +processx4_step1(struct rte_mbuf *pkt[FWDSTEP], int32x4_t *dip,
> +		uint64_t *inline_flag)
> +{
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	struct rte_ether_hdr *eth_hdr;
> +	int32_t dst[FWDSTEP];
> +	int i;
> +
> +	for (i = 0; i < FWDSTEP; i++) {
> +		eth_hdr = (struct rte_ether_hdr *)rte_pktmbuf_prepend(pkt[i],
> +							RTE_ETHER_HDR_LEN);
> +		pkt[i]->ol_flags |= RTE_MBUF_F_TX_IPV4;
> +		pkt[i]->l2_len = RTE_ETHER_HDR_LEN;
> +
> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
> +
> +		/* Fetch destination IPv4 address */
> +		dst[i] = ipv4_hdr->dst_addr;
> +		*inline_flag |= pkt[i]->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD;
> +	}
> +
> +	dip[0] = vld1q_s32(dst);
> +}
> +
> +/*
> + * Lookup into LPM for destination port.
> + */
> +static inline void
> +processx4_step2(struct rt_ctx *rt_ctx, int32x4_t dip, uint64_t inline_flag,
> +		struct rte_mbuf *pkt[FWDSTEP], uint16_t dprt[FWDSTEP])
> +{
> +	uint32_t next_hop;
> +	rte_xmm_t dst;
> +	uint8_t i;
> +
> +	dip = vreinterpretq_s32_u8(vrev32q_u8(vreinterpretq_u8_s32(dip)));
> +
> +	/* If all 4 packets are non-inline */
> +	if (!inline_flag) {
> +		rte_lpm_lookupx4((struct rte_lpm *)rt_ctx, dip, dst.u32,
> +				 BAD_PORT);
> +		/* get rid of unused upper 16 bit for each dport. */
> +		vst1_s16((int16_t *)dprt, vqmovn_s32(dst.x));
> +		return;
> +	}
> +
> +	/* Inline and non-inline packets */
> +	dst.x = dip;
> +	for (i = 0; i < FWDSTEP; i++) {
> +		if (pkt[i]->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
> +			next_hop = get_hop_for_offload_pkt(pkt[i], 0);
> +			dprt[i] = (uint16_t) (((next_hop &
> +					RTE_LPM_LOOKUP_SUCCESS) != 0)
> +					? next_hop : BAD_PORT);
> +
> +		} else {
> +			dprt[i] = (uint16_t) ((rte_lpm_lookup(
> +					(struct rte_lpm *)rt_ctx,
> +					dst.u32[i], &next_hop) == 0)
> +					? next_hop : BAD_PORT);
> +		}
> +	}
> +}
> +
> +/*
> + * Process single packets for destination port.
> + */
> +static inline void
> +process_single_pkt(struct rt_ctx *rt_ctx, struct rte_mbuf *pkt,
> +		   uint16_t *dst_port)
> +{
> +	struct rte_ether_hdr *eth_hdr;
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	uint32_t next_hop;
> +	uint32_t dst_ip;
> +
> +	eth_hdr = (struct rte_ether_hdr *)rte_pktmbuf_prepend(pkt,
> +							RTE_ETHER_HDR_LEN);
> +	pkt->ol_flags |= RTE_MBUF_F_TX_IPV4;
> +	pkt->l2_len = RTE_ETHER_HDR_LEN;
> +
> +	if (pkt->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
> +		next_hop = get_hop_for_offload_pkt(pkt, 0);
> +		*dst_port = (uint16_t) (((next_hop &
> +				RTE_LPM_LOOKUP_SUCCESS) != 0)
> +				? next_hop : BAD_PORT);
> +	} else {
> +		ipv4_hdr = (struct rte_ipv4_hdr *)(eth_hdr + 1);
> +		dst_ip = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> +		*dst_port = (uint16_t) ((rte_lpm_lookup(
> +				(struct rte_lpm *)rt_ctx,
> +				dst_ip, &next_hop) == 0)
> +				? next_hop : BAD_PORT);
> +	}
> +}
> +
> +/*
> + * Buffer optimized handling of IPv6 packets.
> + */
> +static inline void
> +route6_pkts_neon(struct rt_ctx *rt_ctx, struct rte_mbuf **pkts, int nb_rx)
> +{
> +	uint8_t dst_ip6[MAX_PKT_BURST][16];
> +	int32_t dst_port[MAX_PKT_BURST];
> +	struct rte_ether_hdr *eth_hdr;
> +	struct rte_ipv6_hdr *ipv6_hdr;
> +	int32_t hop[MAX_PKT_BURST];
> +	struct rte_mbuf *pkt;
> +	uint8_t lpm_pkts = 0;
> +	int32_t i;
> +
> +	if (nb_rx == 0)
> +		return;
> +
> +	/* Need to do an LPM lookup for non-inline packets. Inline packets will
> +	 * have port ID in the SA
> +	 */
> +
> +	for (i = 0; i < nb_rx; i++) {
> +		pkt = pkts[i];
> +		eth_hdr = (struct rte_ether_hdr *)rte_pktmbuf_prepend(pkt,
> +							RTE_ETHER_HDR_LEN);
> +		pkt->l2_len = RTE_ETHER_HDR_LEN;
> +		pkt->ol_flags |= RTE_MBUF_F_TX_IPV6;
> +
> +		if (!(pkt->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD)) {
> +			/* Security offload not enabled. So an LPM lookup is
> +			 * required to get the hop
> +			 */
> +			ipv6_hdr = (struct rte_ipv6_hdr *)(eth_hdr + 1);
> +			memcpy(&dst_ip6[lpm_pkts][0],
> +			       ipv6_hdr->dst_addr, 16);
> +			lpm_pkts++;
> +		}
> +	}
> +
> +	rte_lpm6_lookup_bulk_func((struct rte_lpm6 *)rt_ctx, dst_ip6,
> +				  hop, lpm_pkts);
> +
> +	lpm_pkts = 0;
> +
> +	for (i = 0; i < nb_rx; i++) {
> +		pkt = pkts[i];
> +		if (pkt->ol_flags & RTE_MBUF_F_TX_SEC_OFFLOAD) {
> +			/* Read hop from the SA */
> +			dst_port[i] = get_hop_for_offload_pkt(pkt, 1);
> +		} else {
> +			/* Need to use hop returned by lookup */
> +			dst_port[i] = hop[lpm_pkts++];
> +		}
> +		if (dst_port[i] == -1)
> +			dst_port[i] = BAD_PORT;
> +	}
> +
> +	/* Send packets */
> +	send_multi_pkts(pkts, (uint16_t *)dst_port, nb_rx, 0, 0, false);
> +}
> +
> +/*
> + * Buffer optimized handling of IPv4 packets.
> + */
> +static inline void
> +route4_pkts_neon(struct rt_ctx *rt_ctx, struct rte_mbuf **pkts, int nb_rx,
> +		 uint64_t tx_offloads, bool ip_cksum)
> +{
> +	const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
> +	const int32_t m = nb_rx % FWDSTEP;
> +	uint16_t dst_port[MAX_PKT_BURST];
> +	uint64_t inline_flag = 0;
> +	int32x4_t dip;
> +	int32_t i;
> +
> +	if (nb_rx == 0)
> +		return;
> +
> +	for (i = 0; i != k; i += FWDSTEP) {
> +		processx4_step1(&pkts[i], &dip, &inline_flag);
> +		processx4_step2(rt_ctx, dip, inline_flag, &pkts[i],
> +				&dst_port[i]);
> +	}
> +
> +	/* Classify last up to 3 packets one by one */
> +	switch (m) {
> +	case 3:
> +		process_single_pkt(rt_ctx, pkts[i], &dst_port[i]);
> +		i++;
> +		/* fallthrough */
> +	case 2:
> +		process_single_pkt(rt_ctx, pkts[i], &dst_port[i]);
> +		i++;
> +		/* fallthrough */
> +	case 1:
> +		process_single_pkt(rt_ctx, pkts[i], &dst_port[i]);
> +	}
> +
> +	send_multi_pkts(pkts, dst_port, nb_rx, tx_offloads, ip_cksum, true);
> +}
> +
> +#endif /* __IPSEC_LPM_NEON_H__ */
> diff --git a/examples/ipsec-secgw/ipsec_neon.h b/examples/ipsec-secgw/ipsec_neon.h
> new file mode 100644
> index 0000000000..0f72219ed0
> --- /dev/null
> +++ b/examples/ipsec-secgw/ipsec_neon.h
> @@ -0,0 +1,321 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(C) 2022 Marvell.
> + */
> +
> +#ifndef _IPSEC_NEON_H_
> +#define _IPSEC_NEON_H_
> +
> +#include "ipsec.h"
> +#include "neon_common.h"
> +
> +#define MAX_TX_BURST	(MAX_PKT_BURST / 2)
> +#define BAD_PORT	((uint16_t)-1)
> +
> +extern xmm_t val_eth[RTE_MAX_ETHPORTS];
> +
> +/*
> + * Update source and destination MAC addresses in the ethernet header.
> + */
> +static inline void
> +processx4_step3(struct rte_mbuf *pkts[FWDSTEP], uint16_t dst_port[FWDSTEP],
> +		uint64_t tx_offloads, bool ip_cksum, uint8_t *l_pkt)
> +{
> +	uint32x4_t te[FWDSTEP];
> +	uint32x4_t ve[FWDSTEP];
> +	uint32_t *p[FWDSTEP];
> +	struct rte_mbuf *pkt;
> +	uint8_t i;
> +
> +	for (i = 0; i < FWDSTEP; i++) {
> +		pkt = pkts[i];
> +
> +		/* Check if it is a large packet */
> +		if (pkt->pkt_len - RTE_ETHER_HDR_LEN > mtu_size)
> +			*l_pkt |= 1;
> +
> +		p[i] = rte_pktmbuf_mtod(pkt, uint32_t *);
> +		ve[i] = vreinterpretq_u32_s32(val_eth[dst_port[i]]);
> +		te[i] = vld1q_u32(p[i]);
> +
> +		/* Update last 4 bytes */
> +		ve[i] = vsetq_lane_u32(vgetq_lane_u32(te[i], 3), ve[i], 3);
> +		vst1q_u32(p[i], ve[i]);
> +
> +		if (ip_cksum) {
> +			struct rte_ipv4_hdr *ip;
> +
> +			pkt->ol_flags |= tx_offloads;
> +
> +			ip = (struct rte_ipv4_hdr *)
> +				(p[i] + RTE_ETHER_HDR_LEN + 1);
> +			ip->hdr_checksum = 0;
> +
> +			/* calculate IPv4 cksum in SW */
> +			if ((pkt->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) == 0)
> +				ip->hdr_checksum = rte_ipv4_cksum(ip);
> +		}
> +
> +	}
> +}
> +
> +/**
> + * Process single packet:
> + * Update source and destination MAC addresses in the ethernet header.
> + */
> +static inline void
> +process_packet(struct rte_mbuf *pkt, uint16_t *dst_port, uint64_t tx_offloads,
> +	       bool ip_cksum, uint8_t *l_pkt)
> +{
> +	struct rte_ether_hdr *eth_hdr;
> +	uint32x4_t te, ve;
> +
> +	/* Check if it is a large packet */
> +	if (pkt->pkt_len - RTE_ETHER_HDR_LEN > mtu_size)
> +		*l_pkt |= 1;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
> +
> +	te = vld1q_u32((uint32_t *)eth_hdr);
> +	ve = vreinterpretq_u32_s32(val_eth[dst_port[0]]);
> +
> +	ve = vcopyq_laneq_u32(ve, 3, te, 3);
> +	vst1q_u32((uint32_t *)eth_hdr, ve);
> +
> +	if (ip_cksum) {
> +		struct rte_ipv4_hdr *ip;
> +
> +		pkt->ol_flags |= tx_offloads;
> +
> +		ip = (struct rte_ipv4_hdr *)(eth_hdr + 1);
> +		ip->hdr_checksum = 0;
> +
> +		/* calculate IPv4 cksum in SW */
> +		if ((pkt->ol_flags & RTE_MBUF_F_TX_IP_CKSUM) == 0)
> +			ip->hdr_checksum = rte_ipv4_cksum(ip);
> +	}
> +}
> +
> +static inline void
> +send_packets(struct rte_mbuf *m[], uint16_t port, uint32_t num, bool is_ipv4)
> +{
> +	uint8_t proto;
> +	uint32_t i;
> +
> +	proto = is_ipv4 ? IPPROTO_IP : IPPROTO_IPV6;
> +	for (i = 0; i < num; i++)
> +		send_single_packet(m[i], port, proto);
> +}
> +
> +static inline void
> +send_packetsx4(struct rte_mbuf *m[], uint16_t port, uint32_t num)
> +{
> +	unsigned int lcoreid = rte_lcore_id();
> +	struct lcore_conf *qconf;
> +	uint32_t len, j, n;
> +
> +	qconf = &lcore_conf[lcoreid];
> +
> +	len = qconf->tx_mbufs[port].len;
> +
> +	/*
> +	 * If TX buffer for that queue is empty, and we have enough packets,
> +	 * then send them straightway.
> +	 */
> +	if (num >= MAX_TX_BURST && len == 0) {
> +		n = rte_eth_tx_burst(port, qconf->tx_queue_id[port], m, num);
> +		core_stats_update_tx(n);
> +		if (unlikely(n < num)) {
> +			do {
> +				rte_pktmbuf_free(m[n]);
> +			} while (++n < num);
> +		}
> +		return;
> +	}
> +
> +	/*
> +	 * Put packets into TX buffer for that queue.
> +	 */
> +
> +	n = len + num;
> +	n = (n > MAX_PKT_BURST) ? MAX_PKT_BURST - len : num;
> +
> +	j = 0;
> +	switch (n % FWDSTEP) {
> +	while (j < n) {
> +	case 0:
> +		qconf->tx_mbufs[port].m_table[len + j] = m[j];
> +		j++;
> +		/* fallthrough */
> +	case 3:
> +		qconf->tx_mbufs[port].m_table[len + j] = m[j];
> +		j++;
> +		/* fallthrough */
> +	case 2:
> +		qconf->tx_mbufs[port].m_table[len + j] = m[j];
> +		j++;
> +		/* fallthrough */
> +	case 1:
> +		qconf->tx_mbufs[port].m_table[len + j] = m[j];
> +		j++;
> +	}
> +	}
> +
> +	len += n;
> +
> +	/* enough pkts to be sent */
> +	if (unlikely(len == MAX_PKT_BURST)) {
> +
> +		send_burst(qconf, MAX_PKT_BURST, port);
> +
> +		/* copy rest of the packets into the TX buffer. */
> +		len = num - n;
> +		if (len == 0)
> +			goto exit;
> +
> +		j = 0;
> +		switch (len % FWDSTEP) {
> +		while (j < len) {
> +		case 0:
> +			qconf->tx_mbufs[port].m_table[j] = m[n + j];
> +			j++;
> +			/* fallthrough */
> +		case 3:
> +			qconf->tx_mbufs[port].m_table[j] = m[n + j];
> +			j++;
> +			/* fallthrough */
> +		case 2:
> +			qconf->tx_mbufs[port].m_table[j] = m[n + j];
> +			j++;
> +			/* fallthrough */
> +		case 1:
> +			qconf->tx_mbufs[port].m_table[j] = m[n + j];
> +			j++;
> +		}
> +		}
> +	}
> +
> +exit:
> +	qconf->tx_mbufs[port].len = len;
> +}
> +
> +/**
> + * Send packets burst to the ports in dst_port array
> + */
> +static __rte_always_inline void
> +send_multi_pkts(struct rte_mbuf **pkts, uint16_t dst_port[MAX_PKT_BURST],
> +		int nb_rx, uint64_t tx_offloads, bool ip_cksum, bool is_ipv4)
> +{
> +	unsigned int lcoreid = rte_lcore_id();
> +	uint16_t pnum[MAX_PKT_BURST + 1];
> +	uint8_t l_pkt = 0;
> +	uint16_t dlp, *lp;
> +	int i = 0, k;
> +
> +	/*
> +	 * Finish packet processing and group consecutive
> +	 * packets with the same destination port.
> +	 */
> +	k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
> +
> +	if (k != 0) {
> +		uint16x8_t dp1, dp2;
> +
> +		lp = pnum;
> +		lp[0] = 1;
> +
> +		processx4_step3(pkts, dst_port, tx_offloads, ip_cksum, &l_pkt);
> +
> +		/* dp1: <d[0], d[1], d[2], d[3], ... > */
> +		dp1 = vld1q_u16(dst_port);
> +
> +		for (i = FWDSTEP; i != k; i += FWDSTEP) {
> +			processx4_step3(&pkts[i], &dst_port[i], tx_offloads,
> +					ip_cksum, &l_pkt);
> +
> +			/*
> +			 * dp2:
> +			 * <d[j-3], d[j-2], d[j-1], d[j], ... >
> +			 */
> +			dp2 = vld1q_u16(&dst_port[i - FWDSTEP + 1]);
> +			lp = neon_port_groupx4(&pnum[i - FWDSTEP], lp, dp1, dp2);
> +
> +			/*
> +			 * dp1:
> +			 * <d[j], d[j+1], d[j+2], d[j+3], ... >
> +			 */
> +			dp1 = vextq_u16(dp2, dp1, FWDSTEP - 1);
> +		}
> +
> +		/*
> +		 * dp2: <d[j-3], d[j-2], d[j-1], d[j-1], ... >
> +		 */
> +		dp2 = vextq_u16(dp1, dp1, 1);
> +		dp2 = vsetq_lane_u16(vgetq_lane_u16(dp2, 2), dp2, 3);
> +		lp = neon_port_groupx4(&pnum[i - FWDSTEP], lp, dp1, dp2);
> +
> +		/*
> +		 * remove values added by the last repeated
> +		 * dst port.
> +		 */
> +		lp[0]--;
> +		dlp = dst_port[i - 1];
> +	} else {
> +		/* set dlp and lp to the never used values. */
> +		dlp = BAD_PORT - 1;
> +		lp = pnum + MAX_PKT_BURST;
> +	}
> +
> +	/* Process up to last 3 packets one by one. */
> +	switch (nb_rx % FWDSTEP) {
> +	case 3:
> +		process_packet(pkts[i], dst_port + i, tx_offloads, ip_cksum,
> +			       &l_pkt);
> +		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, i);
> +		i++;
> +		/* fallthrough */
> +	case 2:
> +		process_packet(pkts[i], dst_port + i, tx_offloads, ip_cksum,
> +			       &l_pkt);
> +		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, i);
> +		i++;
> +		/* fallthrough */
> +	case 1:
> +		process_packet(pkts[i], dst_port + i, tx_offloads, ip_cksum,
> +			       &l_pkt);
> +		GROUP_PORT_STEP(dlp, dst_port, lp, pnum, i);
> +	}
> +
> +	/*
> +	 * Send packets out, through destination port.
> +	 * Consecutive packets with the same destination port
> +	 * are already grouped together.
> +	 * If destination port for the packet equals BAD_PORT,
> +	 * then free the packet without sending it out.
> +	 */
> +	for (i = 0; i < nb_rx; i += k) {
> +
> +		uint16_t pn;
> +
> +		pn = dst_port[i];
> +		k = pnum[i];
> +
> +		if (likely(pn != BAD_PORT)) {
> +			if (l_pkt)
> +				/* Large packet is present, need to send
> +				 * individual packets with fragment
> +				 */
> +				send_packets(pkts + i, pn, k, is_ipv4);
> +			else
> +				send_packetsx4(pkts + i, pn, k);
> +
> +		} else {
> +			free_pkts(&pkts[i], k);
> +			if (is_ipv4)
> +				core_statistics[lcoreid].lpm4.miss++;
> +			else
> +				core_statistics[lcoreid].lpm6.miss++;
> +		}
> +	}
> +}
> +
> +#endif /* _IPSEC_NEON_H_ */
> diff --git a/examples/ipsec-secgw/ipsec_worker.c b/examples/ipsec-secgw/ipsec_worker.c
> index e1d4e3d864..803157d8ee 100644
> --- a/examples/ipsec-secgw/ipsec_worker.c
> +++ b/examples/ipsec-secgw/ipsec_worker.c
> @@ -12,6 +12,10 @@
>  #include "ipsec-secgw.h"
>  #include "ipsec_worker.h"
>
> +#if defined(__ARM_NEON)
> +#include "ipsec_lpm_neon.h"
> +#endif
> +
>  struct port_drv_mode_data {
>  	struct rte_security_session *sess;
>  	struct rte_security_ctx *ctx;
> @@ -1248,8 +1252,13 @@ ipsec_poll_mode_wrkr_inl_pr(void)
>  			v6_num = ip6.num;
>  		}
>
> +#if defined __ARM_NEON
> +		route4_pkts_neon(rt4_ctx, v4, v4_num, 0, false);
> +		route6_pkts_neon(rt6_ctx, v6, v6_num);
> +#else
>  		route4_pkts(rt4_ctx, v4, v4_num, 0, false);
>  		route6_pkts(rt6_ctx, v6, v6_num);
> +#endif
>  		}
>  	}
>  }
> --
> 2.25.1