From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-VE1-obe.outbound.protection.outlook.com (mail-eopbgr50055.outbound.protection.outlook.com [40.107.5.55]) by dpdk.org (Postfix) with ESMTP id B377A44C3 for ; Tue, 12 Mar 2019 14:05:54 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=hkXwHzDF669+uMJKuxz/u40KAuNGKjXfpU8EDrQSdvs=; b=pkwwprcQZMblBCel8FGLA6fuqJXWY54eFkkq8cepslX2FvDDqKm+rH87zi5A3y7aAumJP6eEtTJ9/Wx1szMANIl9XoR1PXvxdZ5nlaC3I+PCKNjpQCJvFqdeMQY6dSLCtHxC/QtnFhQH20XoP0FwQogi2jKcgudHcZPFE+MXMRo= Received: from AM6PR08MB3672.eurprd08.prod.outlook.com (20.177.115.76) by AM6PR08MB4087.eurprd08.prod.outlook.com (20.179.2.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1709.13; Tue, 12 Mar 2019 13:05:53 +0000 Received: from AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::4d90:78f1:e670:14d5]) by AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::4d90:78f1:e670:14d5%3]) with mapi id 15.20.1686.021; Tue, 12 Mar 2019 13:05:53 +0000 From: Honnappa Nagarahalli To: "Ruifeng Wang (Arm Technology China)" , "wenzhuo.lu@intel.com" , "jingjing.wu@intel.com" , "bernard.iremonger@intel.com" CC: "dev@dpdk.org" , "jerinj@marvell.com" , "hemant.agrawal@nxp.com" , nd , "Ruifeng Wang (Arm Technology China)" , nd Thread-Topic: [PATCH v2] app/testpmd: optimized MAC swap by using neon intrinsics Thread-Index: AQHU2JV1vPABY2BIl0qimb9KaOUtS6YH9x0A Date: Tue, 12 Mar 2019 13:05:53 +0000 Message-ID: References: <1552368927-5485-1-git-send-email-ruifeng.wang@arm.com> In-Reply-To: <1552368927-5485-1-git-send-email-ruifeng.wang@arm.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Honnappa.Nagarahalli@arm.com; x-originating-ip: [217.140.111.135] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: 7572ced6-6ab5-46f9-32a5-08d6a6eb7536 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600127)(711020)(4605104)(4618075)(2017052603328)(7153060)(7193020); SRVR:AM6PR08MB4087; x-ms-traffictypediagnostic: AM6PR08MB4087: x-ld-processed: f34e5979-57d9-4aaa-ad4d-b122a662184d,ExtAddr nodisclaimer: True x-microsoft-antispam-prvs: x-forefront-prvs: 09749A275C x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(376002)(346002)(396003)(39860400002)(136003)(366004)(199004)(189003)(33656002)(9686003)(6246003)(76176011)(55016002)(53936002)(86362001)(102836004)(2906002)(66066001)(2201001)(7696005)(6506007)(106356001)(2501003)(26005)(97736004)(6436002)(99286004)(14444005)(186003)(105586002)(256004)(3846002)(8676002)(81156014)(446003)(486006)(305945005)(229853002)(7736002)(74316002)(72206003)(316002)(68736007)(478600001)(52536013)(14454004)(25786009)(5660300002)(54906003)(110136005)(11346002)(8936002)(476003)(4326008)(71190400001)(71200400001)(81166006)(6116002); DIR:OUT; SFP:1101; SCL:1; SRVR:AM6PR08MB4087; H:AM6PR08MB3672.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: Z6JEguVg+gDyTQS+sRa3MAbzZaClu5O4wYONiKoYhORRezneH4HSsIhUG2HPJ6psG2Zqaq02bEp/SaOhwHmMfXlOdPAhG0s99hOLL4Jvylbn7nDTUn2vlT6leqyJF9RvrtLFNZTq9nINL14Gkh3K+0KGNm2JhglD9n0hDzaHsby92r/vAmH5VTnaZ/7244XJud7vOJdtSyNLP33Dsv+z6BBy2oduwpqUPUdZiQKSJQnb77TxiY+06umPBXwFXUzxriisHpseNZesofQ3FLmTs4IzDR+7QcVO5jPtaYbd1qt4tM3r4/ZwzakivycQMYqz9OVuujfzqPh3MaYde5pCCLBTrsY/IlwkPoV1Dn32BRxUYVLAPFxLzJYdrPFIHHD1yg4FieiaCl6+6KqbbKQ109FPv0Wql/iZlGpR23c9Hfs= Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: 7572ced6-6ab5-46f9-32a5-08d6a6eb7536 X-MS-Exchange-CrossTenant-originalarrivaltime: 12 Mar 2019 13:05:53.2284 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB4087 Subject: Re: [dpdk-dev] [PATCH v2] app/testpmd: optimized MAC swap by using neon intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Mar 2019 13:05:54 -0000 > Improved MAC swap performance for ARM platform. > The improvement was achieved by using neon intrinsics to save CPU cycles > and doing swap for four packets at a time. > The optimization had 15% - 20% throughput boost in testpmd MAC swap > mode. >=20 > Signed-off-by: Ruifeng Wang > Reviewed-by: Gavin Hu > Reviewed-by: Phil Yang > Acked-by: Jerin Jacob > --- > v2: > * Defined idx_map as const. > * Added file header line to indicate derivation from macswap_sse.h. >=20 > app/test-pmd/macswap.c | 4 +- > app/test-pmd/macswap_neon.h | 97 > +++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 100 insertions(+), 1 deletion(-) create mode 100644 > app/test-pmd/macswap_neon.h >=20 > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c index > cbb41b7..71af916 100644 > --- a/app/test-pmd/macswap.c > +++ b/app/test-pmd/macswap.c > @@ -66,8 +66,10 @@ > #include >=20 > #include "testpmd.h" > -#ifdef RTE_ARCH_X86 > +#if defined(RTE_ARCH_X86) > #include "macswap_sse.h" > +#elif defined(RTE_MACHINE_CPUFLAG_NEON) #include "macswap_neon.h" > #else > #include "macswap.h" > #endif > diff --git a/app/test-pmd/macswap_neon.h b/app/test- > pmd/macswap_neon.h new file mode 100644 index 0000000..bdf416a > --- /dev/null > +++ b/app/test-pmd/macswap_neon.h > @@ -0,0 +1,97 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2019 Arm Limited > + * > + * Copyright(c) 2019 Intel Corporation > + * > + * Derived do_macswap implementation from app/test-pmd/macswap_sse.h > +*/ > + > +#ifndef _MACSWAP_NEON_H_ > +#define _MACSWAP_NEON_H_ > + > +#include "macswap_common.h" > +#include "rte_vect.h" > + > +static inline void > +do_macswap(struct rte_mbuf *pkts[], uint16_t nb, > + struct rte_port *txp) > +{ > + struct ether_hdr *eth_hdr[4]; > + struct rte_mbuf *mb[4]; > + uint64_t ol_flags; > + int i; > + int r; > + uint8x16_t v0, v1, v2, v3; > + /** > + * Index map be used to shuffle the 16 bytes. > + * byte 0-5 will be swapped with byte 6-11. > + * byte 12-15 will keep unchanged. > + */ > + const uint8x16_t idx_map =3D {6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, > + 12, 13, 14, 15}; > + > + ol_flags =3D ol_flags_init(txp->dev_conf.txmode.offloads); > + vlan_qinq_set(pkts, nb, ol_flags, > + txp->tx_vlan_id, txp->tx_vlan_id_outer); > + > + i =3D 0; > + r =3D nb; > + > + while (r >=3D 4) { > + if (r >=3D 8) { > + rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 4], void *)); > + rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 5], void *)); > + rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 6], void *)); > + rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 7], void *)); > + } > + > + mb[0] =3D pkts[i++]; > + eth_hdr[0] =3D rte_pktmbuf_mtod(mb[0], struct ether_hdr *); > + > + mb[1] =3D pkts[i++]; > + eth_hdr[1] =3D rte_pktmbuf_mtod(mb[1], struct ether_hdr *); > + > + mb[2] =3D pkts[i++]; > + eth_hdr[2] =3D rte_pktmbuf_mtod(mb[2], struct ether_hdr *); > + > + mb[3] =3D pkts[i++]; > + eth_hdr[3] =3D rte_pktmbuf_mtod(mb[3], struct ether_hdr *); > + > + v0 =3D vld1q_u8((uint8_t const *)eth_hdr[0]); > + v1 =3D vld1q_u8((uint8_t const *)eth_hdr[1]); > + v2 =3D vld1q_u8((uint8_t const *)eth_hdr[2]); > + v3 =3D vld1q_u8((uint8_t const *)eth_hdr[3]); > + > + v0 =3D vqtbl1q_u8(v0, idx_map); > + v1 =3D vqtbl1q_u8(v1, idx_map); > + v2 =3D vqtbl1q_u8(v2, idx_map); > + v3 =3D vqtbl1q_u8(v3, idx_map); > + > + vst1q_u8((uint8_t *)eth_hdr[0], v0); > + vst1q_u8((uint8_t *)eth_hdr[1], v1); > + vst1q_u8((uint8_t *)eth_hdr[2], v2); > + vst1q_u8((uint8_t *)eth_hdr[3], v3); > + > + mbuf_field_set(mb[0], ol_flags); > + mbuf_field_set(mb[1], ol_flags); > + mbuf_field_set(mb[2], ol_flags); > + mbuf_field_set(mb[3], ol_flags); > + r -=3D 4; > + } > + > + for ( ; i < nb; i++) { > + if (i < nb - 1) > + rte_prefetch0(rte_pktmbuf_mtod(pkts[i+1], void *)); > + mb[0] =3D pkts[i]; > + eth_hdr[0] =3D rte_pktmbuf_mtod(mb[0], struct ether_hdr *); > + > + /* Swap dest and src mac addresses. */ > + v0 =3D vld1q_u8((uint8_t const *)eth_hdr[0]); > + v0 =3D vqtbl1q_u8(v0, idx_map); > + vst1q_u8((uint8_t *)eth_hdr[0], v0); > + > + mbuf_field_set(mb[0], ol_flags); > + } > +} > + > +#endif /* _MACSWAP_NEON_H_ */ > -- > 2.7.4 Reviewed-by: Honnappa Nagarahalli