From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f50.google.com (mail-wm0-f50.google.com [74.125.82.50]) by dpdk.org (Postfix) with ESMTP id 039925682 for ; Sat, 27 Aug 2016 10:57:50 +0200 (CEST) Received: by mail-wm0-f50.google.com with SMTP id o80so22576666wme.1 for ; Sat, 27 Aug 2016 01:57:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:user-agent:in-reply-to :references:mime-version:content-transfer-encoding; bh=8+ojV5H7oG4/tEBXdwRe1q2ifZdjLBZrYBMQmrRaoYY=; b=KoqXiyU1IeLi3od0rrfbotbn8+IjAs/WRFjmizhByGZPWhbvYMyvzyvVXV7CE5mftK x2AcZGTfSVi8POtrYlZzaC4O7t3//5sj4528v8bouY7Gi0Qa7t1lWnKFyFrRlia8PCPf KN96UA5+yDN4GA7cp5KdNwQBv2LaYMJDFvK3qKhhAle3pUljE2Ll9E7Lb7thJ5t7NG24 CiDvXJcWPjbRp+30rbvtjuOdJBOva78HO0Gt/TxE9H4Z8YV+OZrhOanc/VlE0bnWZwYL rdmCdbaya0o6LSAK4wqpgbMtJxiadjCx9EhApLMcwiyUUCAD2SSAmfFXQxWxTvXtbm/W T1lg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:user-agent :in-reply-to:references:mime-version:content-transfer-encoding; bh=8+ojV5H7oG4/tEBXdwRe1q2ifZdjLBZrYBMQmrRaoYY=; b=iyD1hxTs5tPOZM+Jp2gTbfejRq6rznVdyje3KFhhrBrCUXQp5meYBe4keKTYugp/K+ mfpJLGUnWC7yu4nk04NPucAbXlLqUiXtAa4XUSkAHUjI/798cQcVNWeiH4svwjrPoFK4 LmAzEo0lgXBfg1J33Er7TuiCGrfDK9FLg8ERFz5als9DLHNMuOQsxs3/Fbu4afLG6+Vd ypPNIxgiBccnpCEajf2B18p75TrJMN3llA/FyOJX91AuB+VCBo5bIiCg4kKX96jwhwRD nNdesn6MdJRR4tP64LjpBAaprkdnDUr3DWpxh+2u9FBJckCe+72sikGm9nMrHP/wdo2J kfgw== X-Gm-Message-State: AE9vXwPVVC+C3Pb+tM66zqCc0N/VXsDIpUWThxmrrylIIbz3AXh8Omlv+h90gFo7CQhIQUgc X-Received: by 10.28.152.66 with SMTP id a63mr2329126wme.66.1472288269759; Sat, 27 Aug 2016 01:57:49 -0700 (PDT) Received: from xps13.localnet (184.203.134.77.rev.sfr.net. [77.134.203.184]) by smtp.gmail.com with ESMTPSA id ya1sm23752335wjb.23.2016.08.27.01.57.48 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 27 Aug 2016 01:57:48 -0700 (PDT) From: Thomas Monjalon To: Pablo de Lara , Byron Marohn Cc: dev@dpdk.org, bruce.richardson@intel.com, Saikrishna Edupuganti , jianbo.liu@linaro.org, chaozhu@linux.vnet.ibm.com, jerin.jacob@caviumnetworks.com Date: Sat, 27 Aug 2016 10:57:47 +0200 Message-ID: <5721729.LXq7JRZ983@xps13> User-Agent: KMail/4.14.10 (Linux/4.5.4-1-ARCH; KDE/4.14.11; x86_64; ; ) In-Reply-To: <1472247287-167011-3-git-send-email-pablo.de.lara.guarch@intel.com> References: <1472247287-167011-1-git-send-email-pablo.de.lara.guarch@intel.com> <1472247287-167011-3-git-send-email-pablo.de.lara.guarch@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Subject: Re: [dpdk-dev] [PATCH 2/3] hash: add vectorized comparison X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Aug 2016 08:57:50 -0000 2016-08-26 22:34, Pablo de Lara: > From: Byron Marohn > > In lookup bulk function, the signatures of all entries > are compared against the signature of the key that is being looked up. > Now that all the signatures are together, they can be compared > with vector instructions (SSE, AVX2), achieving higher lookup performance. > > Also, entries per bucket are increased to 8 when using processors > with AVX2, as 256 bits can be compared at once, which is the size of > 8x32-bit signatures. Please, would it be possible to use the generic SIMD intrinsics? We could define generic types compatible with Altivec and NEON: __attribute__ ((vector_size (n))) as described in https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html > +/* 8 entries per bucket */ > +#if defined(__AVX2__) Please prefer #ifdef RTE_MACHINE_CPUFLAG_AVX2 Ideally the vector support could be checked at runtime: if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2)) It would allow packaging one binary using the best optimization available. > + *prim_hash_matches |= _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32( > + _mm256_load_si256((__m256i const *)prim_bkt->sig_current), > + _mm256_set1_epi32(prim_hash))); > + *sec_hash_matches |= _mm256_movemask_ps((__m256)_mm256_cmpeq_epi32( > + _mm256_load_si256((__m256i const *)sec_bkt->sig_current), > + _mm256_set1_epi32(sec_hash))); > +/* 4 entries per bucket */ > +#elif defined(__SSE2__) > + *prim_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16( > + _mm_load_si128((__m128i const *)prim_bkt->sig_current), > + _mm_set1_epi32(prim_hash))); > + *sec_hash_matches |= _mm_movemask_ps((__m128)_mm_cmpeq_epi16( > + _mm_load_si128((__m128i const *)sec_bkt->sig_current), > + _mm_set1_epi32(sec_hash))); In order to allow such switch based on register size, we could have an abstraction in EAL supporting 128/256/512 width for x86/ARM/POWER. I think aliasing RTE_MACHINE_CPUFLAG_ and RTE_CPUFLAG_ may be enough.