From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f52.google.com (mail-oi0-f52.google.com [209.85.218.52]) by dpdk.org (Postfix) with ESMTP id 1D687C5A0 for ; Fri, 19 Feb 2016 18:50:50 +0100 (CET) Received: by mail-oi0-f52.google.com with SMTP id x21so17122966oix.2 for ; Fri, 19 Feb 2016 09:50:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Dn+mRt6SS/yg8vybZ9EU/w2nMIETd0lXrLSrdNf4zsg=; b=jjQR0yYzQkZJwUClUGQd8VBs8KSNUd/FMMq2ReYmBuMN96El6A5XLGYDnhMob5A31V 1DbvjSVvH0IT6qqpUfk5Twp0TryFuEOBHyomxAJ9x+WcRMvnZPgZs6jlWYxKoRAoiBUT uO4x/VvvfolPbS3frRpZ+JBQy5cpR2c8FsK/a9YeiJJUorZC64PaRhVtUrGZXfGgOYkM ZAktmGn3JvMwAEYhcy1h4aWNUjuCStEi1U2S64U3BuXmlo11QnPVnvBfwO4z7i6mDuJo QkXoZLa5wKMS4x2OjvbWkdMSIcIKSccmQamZY+6DVtmKUYLMm8hRW1AopSvT/Wa459Vx PpPQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=Dn+mRt6SS/yg8vybZ9EU/w2nMIETd0lXrLSrdNf4zsg=; b=iZ2DZm/AHt/Boc69l7O9do9ND7ZRW0/T9/o+o3E4RJgKmMXTPHvl1Avka9DDZPcX+M ITLvEMNYDeFHJx3MmBqyxaV+1XPJ3UMvzAwANOw543tgatc7TmMwjNQxB/aAb57GDfNJ xgLmfRmPa4RoAd8XVIolo1VnmzjfFD/4XjV90zGziVcEGDJs+6etM2WgK4/0S3IS5QxE qF3R4sit9acAaa5z/ctTJSTVsGfiCz3Yx3uxXEiZXl0NnuH+D7VholFaJ+ZQefQ9qBtl R4WILkw4uFn/vaQrcq5V3757trwa2oO2wM+st2mOT4kdLicdn6BUrXvvtFIB4v0jDShf 5mQg== X-Gm-Message-State: AG10YOQ76NvM7obzXaLmy7XamIZZ7gCHvtBlJpsl3BgOSxAAPWLoQBgK1fDl3l17qN/K61K4+8S5eI5/hyEB2Q== MIME-Version: 1.0 X-Received: by 10.202.206.66 with SMTP id e63mr12289685oig.88.1455904249596; Fri, 19 Feb 2016 09:50:49 -0800 (PST) Received: by 10.202.177.195 with HTTP; Fri, 19 Feb 2016 09:50:49 -0800 (PST) In-Reply-To: <1453950506-28067-1-git-send-email-zhihong.wang@intel.com> References: <1431979303-1346-2-git-send-email-rkerur@gmail.com> <1453950506-28067-1-git-send-email-zhihong.wang@intel.com> Date: Fri, 19 Feb 2016 09:50:49 -0800 Message-ID: From: Ravi Kerur To: Zhihong Wang Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.15 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [dpdk-dev, v3] Implement memcmp using Intel SIMD instrinsics. X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 19 Feb 2016 17:50:50 -0000 On Wed, Jan 27, 2016 at 7:08 PM, Zhihong Wang wrote: > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcmp.h b/lib > > /librte_eal/common/include/arch/x86/rte_memcmp.h > > [...] > > > +#ifdef __cplusplus > > +extern "C" { > > +#endif > > + > > +/** > > + * Compare bytes between two locations. The locations must not overlap. > > + * > > Parameter names should be kept consistent as they are in function body. > > > + * @param src_1 > > + * Pointer to the first source of the data. > > + * @param src_2 > > + * Pointer to the second source of the data. > > + * @param n > > + * Number of bytes to compare. > > + * @return > > + * zero if src_1 equal src_2 > > + * -ve if src_1 less than src_2 > > + * +ve if src_1 greater than src_2 > > + */ > > +static inline int > > +rte_memcmp(const void *src_1, const void *src, > > + size_t n) __attribute__((always_inline)); > > + > > +/** > > + * Find the first different bit for comparison. > > + */ > > +static inline int > > +rte_cmpffd (uint32_t x, uint32_t y) > > +{ > > + int i; > > + int pos = x ^ y; > > + for (i = 0; i < 32; i++) > > + if (pos & (1< > Coding style check :-) > BTW, does the bsf instruction provide this check? > > > + return i; > > + return -1; > > +} > > + > > [...] > > > +/** > > + * Compare 48 bytes between two locations. > > + * Locations should not overlap. > > + */ > > +static inline int > > +rte_cmp48(const void *src_1, const void *src_2) > > Guess this is not used. > I had left _unused_ with the assumption that it might be needed when actual performance tests are done on high end servers. > > [...] > > > +/** > > + * Compare 256 bytes between two locations. > > + * Locations should not overlap. > > + */ > > +static inline int > > +rte_cmp256(const void *src_1, const void *src_2) > > +{ > > + int ret; > > + > > + ret = rte_cmp64((const uint8_t *)src_1 + 0 * 64, > > + (const uint8_t *)src_2 + 0 * 64); > > Why not just use rte_cmp128? > > > [...] > > > +static inline int > > +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) > > +{ > > + const uint8_t *src_1 = (const uint8_t *)_src_1; > > + const uint8_t *src_2 = (const uint8_t *)_src_2; > > + int ret = 0; > > + > > + if (n < 16) > > + return rte_memcmp_regular(src_1, src_2, n); > > + > > + if (n <= 32) { > > + ret = rte_cmp16(src_1, src_2); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n); > > + } > > + > > Too many conditions here may harm the overall performance. > It's a trade-off thing, all about balancing the overhead. > Just make sure this is tuned based on actual test numbers. > > > > + if (n <= 48) { > > + ret = rte_cmp32(src_1, src_2); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n); > > + } > > + > > + if (n <= 64) { > > + ret = rte_cmp32(src_1, src_2); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + ret = rte_cmp16(src_1 + 32, src_2 + 32); > > + > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n); > > + } > > + > > + if (n <= 96) { > > + ret = rte_cmp64(src_1, src_2); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + ret = rte_cmp16(src_1 + 64, src_2 + 64); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n); > > + } > > + > > + if (n <= 128) { > > + ret = rte_cmp64(src_1, src_2); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + ret = rte_cmp32(src_1 + 64, src_2 + 64); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + ret = rte_cmp16(src_1 + 96, src_2 + 96); > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n); > > + } > > [...] > > > +/** > > + * Compare 48 bytes between two locations. > > + * Locations should not overlap. > > + */ > > +static inline int > > +rte_cmp48(const void *src_1, const void *src_2) > > Not used. > > > +{ > > + int ret; > > + > > + ret = rte_cmp16((const uint8_t *)src_1 + 0 * 16, > > + (const uint8_t *)src_2 + 0 * 16); > > + > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + ret = rte_cmp16((const uint8_t *)src_1 + 1 * 16, > > + (const uint8_t *)src_2 + 1 * 16); > > + > > + if (unlikely(ret != 0)) > > + return ret; > > + > > + return rte_cmp16((const uint8_t *)src_1 + 2 * 16, > > + (const uint8_t *)src_2 + 2 * 16); > > +} > > + > > +/** > > + * Compare 64 bytes between two locations. > > + * Locations should not overlap. > > + */ > > +static inline int > > +rte_cmp64(const void *src_1, const void *src_2) > > +{ > > + int ret; > > + > > + ret = rte_cmp16((const uint8_t *)src_1 + 0 * 16, > > + (const uint8_t *)src_2 + 0 * 16); > > Why not rte_cmp32? And use rte_cmp64 for rte_cmp128, and so on. > That should make the code looks clearer. > > > It'd be great if you could format this patch into a patch set with several > little ones. :-) > Also, the kernel checkpatch is very helpful. > Good coding style and patch organization make it easy for in-depth reviews. > > Combination of scalar and vector (32/64/128) was done to get optimal performance numbers. If there is enough interest in this I can work on it and provide an updated patch set. Thanks, Ravi