From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rkerur@gmail.com>
Received: from mail-oi0-f52.google.com (mail-oi0-f52.google.com
 [209.85.218.52]) by dpdk.org (Postfix) with ESMTP id 1D687C5A0
 for <dev@dpdk.org>; Fri, 19 Feb 2016 18:50:50 +0100 (CET)
Received: by mail-oi0-f52.google.com with SMTP id x21so17122966oix.2
 for <dev@dpdk.org>; Fri, 19 Feb 2016 09:50:50 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=Dn+mRt6SS/yg8vybZ9EU/w2nMIETd0lXrLSrdNf4zsg=;
 b=jjQR0yYzQkZJwUClUGQd8VBs8KSNUd/FMMq2ReYmBuMN96El6A5XLGYDnhMob5A31V
 1DbvjSVvH0IT6qqpUfk5Twp0TryFuEOBHyomxAJ9x+WcRMvnZPgZs6jlWYxKoRAoiBUT
 uO4x/VvvfolPbS3frRpZ+JBQy5cpR2c8FsK/a9YeiJJUorZC64PaRhVtUrGZXfGgOYkM
 ZAktmGn3JvMwAEYhcy1h4aWNUjuCStEi1U2S64U3BuXmlo11QnPVnvBfwO4z7i6mDuJo
 QkXoZLa5wKMS4x2OjvbWkdMSIcIKSccmQamZY+6DVtmKUYLMm8hRW1AopSvT/Wa459Vx
 PpPQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=Dn+mRt6SS/yg8vybZ9EU/w2nMIETd0lXrLSrdNf4zsg=;
 b=iZ2DZm/AHt/Boc69l7O9do9ND7ZRW0/T9/o+o3E4RJgKmMXTPHvl1Avka9DDZPcX+M
 ITLvEMNYDeFHJx3MmBqyxaV+1XPJ3UMvzAwANOw543tgatc7TmMwjNQxB/aAb57GDfNJ
 xgLmfRmPa4RoAd8XVIolo1VnmzjfFD/4XjV90zGziVcEGDJs+6etM2WgK4/0S3IS5QxE
 qF3R4sit9acAaa5z/ctTJSTVsGfiCz3Yx3uxXEiZXl0NnuH+D7VholFaJ+ZQefQ9qBtl
 R4WILkw4uFn/vaQrcq5V3757trwa2oO2wM+st2mOT4kdLicdn6BUrXvvtFIB4v0jDShf
 5mQg==
X-Gm-Message-State: AG10YOQ76NvM7obzXaLmy7XamIZZ7gCHvtBlJpsl3BgOSxAAPWLoQBgK1fDl3l17qN/K61K4+8S5eI5/hyEB2Q==
MIME-Version: 1.0
X-Received: by 10.202.206.66 with SMTP id e63mr12289685oig.88.1455904249596;
 Fri, 19 Feb 2016 09:50:49 -0800 (PST)
Received: by 10.202.177.195 with HTTP; Fri, 19 Feb 2016 09:50:49 -0800 (PST)
In-Reply-To: <1453950506-28067-1-git-send-email-zhihong.wang@intel.com>
References: <1431979303-1346-2-git-send-email-rkerur@gmail.com>
 <1453950506-28067-1-git-send-email-zhihong.wang@intel.com>
Date: Fri, 19 Feb 2016 09:50:49 -0800
Message-ID: <CAFb4SLCCPKYJZYXR8ik4RWrpVmM5APkF=C1g_AFCv9Y3A-P2nw@mail.gmail.com>
From: Ravi Kerur <rkerur@gmail.com>
To: Zhihong Wang <zhihong.wang@intel.com>
Content-Type: text/plain; charset=UTF-8
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [dpdk-dev,
	v3] Implement memcmp using Intel SIMD instrinsics.
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 19 Feb 2016 17:50:50 -0000

On Wed, Jan 27, 2016 at 7:08 PM, Zhihong Wang <zhihong.wang@intel.com>
wrote:

> > diff --git a/lib/librte_eal/common/include/arch/x86/rte_memcmp.h b/lib
> > /librte_eal/common/include/arch/x86/rte_memcmp.h
>
> [...]
>
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * Compare bytes between two locations. The locations must not overlap.
> > + *
>
> Parameter names should be kept consistent as they are in function body.
>
> > + * @param src_1
> > + *   Pointer to the first source of the data.
> > + * @param src_2
> > + *   Pointer to the second source of the data.
> > + * @param n
> > + *   Number of bytes to compare.
> > + * @return
> > + *   zero if src_1 equal src_2
> > + *   -ve if src_1 less than src_2
> > + *   +ve if src_1 greater than src_2
> > + */
> > +static inline int
> > +rte_memcmp(const void *src_1, const void *src,
> > +             size_t n) __attribute__((always_inline));
> > +
> > +/**
> > + * Find the first different bit for comparison.
> > + */
> > +static inline int
> > +rte_cmpffd (uint32_t x, uint32_t y)
> > +{
> > +     int i;
> > +     int pos = x ^ y;
> > +     for (i = 0; i < 32; i++)
> > +             if (pos & (1<<i))
>
> Coding style check :-)
> BTW, does the bsf instruction provide this check?
>
> > +                     return i;
> > +     return -1;
> > +}
> > +
>
> [...]
>
> > +/**
> > + * Compare 48 bytes between two locations.
> > + * Locations should not overlap.
> > + */
> > +static inline int
> > +rte_cmp48(const void *src_1, const void *src_2)
>
> Guess this is not used.
>

I had left _unused_ with the assumption that it might be needed when actual
performance tests are done on high end servers.

>
> [...]
>
> > +/**
> > + * Compare 256 bytes between two locations.
> > + * Locations should not overlap.
> > + */
> > +static inline int
> > +rte_cmp256(const void *src_1, const void *src_2)
> > +{
> > +     int ret;
> > +
> > +     ret = rte_cmp64((const uint8_t *)src_1 + 0 * 64,
> > +                     (const uint8_t *)src_2 + 0 * 64);
>
> Why not just use rte_cmp128?
>
>
> [...]
>
> > +static inline int
> > +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
> > +{
> > +     const uint8_t *src_1 = (const uint8_t *)_src_1;
> > +     const uint8_t *src_2 = (const uint8_t *)_src_2;
> > +     int ret = 0;
> > +
> > +     if (n < 16)
> > +             return rte_memcmp_regular(src_1, src_2, n);
> > +
> > +     if (n <= 32) {
> > +             ret = rte_cmp16(src_1, src_2);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n);
> > +     }
> > +
>
> Too many conditions here may harm the overall performance.
> It's a trade-off thing, all about balancing the overhead.
> Just make sure this is tuned based on actual test numbers.
>
>
> > +     if (n <= 48) {
> > +             ret = rte_cmp32(src_1, src_2);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n);
> > +     }
> > +
> > +     if (n <= 64) {
> > +             ret = rte_cmp32(src_1, src_2);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             ret = rte_cmp16(src_1 + 32, src_2 + 32);
> > +
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n);
> > +     }
> > +
> > +     if (n <= 96) {
> > +             ret = rte_cmp64(src_1, src_2);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             ret = rte_cmp16(src_1 + 64, src_2 + 64);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n);
> > +     }
> > +
> > +     if (n <= 128) {
> > +             ret = rte_cmp64(src_1, src_2);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             ret = rte_cmp32(src_1 + 64, src_2 + 64);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             ret = rte_cmp16(src_1 + 96, src_2 + 96);
> > +             if (unlikely(ret != 0))
> > +                     return ret;
> > +
> > +             return rte_cmp16(src_1 - 16 + n, src_2 - 16 + n);
> > +     }
>
> [...]
>
> > +/**
> > + * Compare 48 bytes between two locations.
> > + * Locations should not overlap.
> > + */
> > +static inline int
> > +rte_cmp48(const void *src_1, const void *src_2)
>
> Not used.
>
> > +{
> > +     int ret;
> > +
> > +     ret = rte_cmp16((const uint8_t *)src_1 + 0 * 16,
> > +                     (const uint8_t *)src_2 + 0 * 16);
> > +
> > +     if (unlikely(ret != 0))
> > +             return ret;
> > +
> > +     ret = rte_cmp16((const uint8_t *)src_1 + 1 * 16,
> > +                     (const uint8_t *)src_2 + 1 * 16);
> > +
> > +     if (unlikely(ret != 0))
> > +             return ret;
> > +
> > +     return rte_cmp16((const uint8_t *)src_1 + 2 * 16,
> > +                     (const uint8_t *)src_2 + 2 * 16);
> > +}
> > +
> > +/**
> > + * Compare 64 bytes between two locations.
> > + * Locations should not overlap.
> > + */
> > +static inline int
> > +rte_cmp64(const void *src_1, const void *src_2)
> > +{
> > +     int ret;
> > +
> > +     ret = rte_cmp16((const uint8_t *)src_1 + 0 * 16,
> > +                     (const uint8_t *)src_2 + 0 * 16);
>
> Why not rte_cmp32? And use rte_cmp64 for rte_cmp128, and so on.
> That should make the code looks clearer.
>
>
> It'd be great if you could format this patch into a patch set with several
> little ones. :-)
> Also, the kernel checkpatch is very helpful.
> Good coding style and patch organization make it easy for in-depth reviews.
>
>
Combination of scalar and vector (32/64/128) was done to get optimal
performance numbers. If there is enough interest in this I can work on it
and provide an updated patch set.

Thanks,
Ravi