From: Ravi Kerur
To: Bruce Richardson
Cc: "dev@dpdk.org"
Date: Thu, 23 Apr 2015 15:26:40 -0700
Subject: Re: [dpdk-dev] [PATCH] Implement memcmp using AVX/SSE instructions
In-Reply-To: <20150423140042.GA7248@bricha3-MOBL3>

On Thu, Apr 23, 2015 at 7:00 AM, Bruce
Richardson <bruce.richardson@intel.com> wrote:

> On Thu, Apr 23, 2015 at 06:53:44AM -0700, Ravi Kerur wrote:
> > On Thu, Apr 23, 2015 at 2:23 AM, Ananyev, Konstantin
> > <konstantin.ananyev@intel.com> wrote:
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > > Sent: Thursday, April 23, 2015 9:12 AM
> > > > To: Wodkowski, PawelX
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [PATCH] Implement memcmp using AVX/SSE
> > > > instructions
> > > >
> > > > On Thu, Apr 23, 2015 at 09:24:52AM +0200, Pawel Wodkowski wrote:
> > > > > On 2015-04-22 17:33, Ravi Kerur wrote:
> > > > > > +/**
> > > > > > + * Compare bytes between two locations. The locations must not
> > > > > > + * overlap.
> > > > > > + *
> > > > > > + * @note This is implemented as a macro, so its address should
> > > > > > + * not be taken and care is needed as parameter expressions may
> > > > > > + * be evaluated multiple times.
> > > > > > + *
> > > > > > + * @param src_1
> > > > > > + *   Pointer to the first source of the data.
> > > > > > + * @param src_2
> > > > > > + *   Pointer to the second source of the data.
> > > > > > + * @param n
> > > > > > + *   Number of bytes to compare.
> > > > > > + * @return
> > > > > > + *   true if equal, otherwise false.
> > > > > > + */
> > > > > > +static inline bool
> > > > > > +rte_memcmp(const void *src_1, const void *src_2,
> > > > > > +       size_t n) __attribute__((always_inline));
> > > > >
> > > > > You are exposing this as a public API, so I think you should follow
> > > > > the description below or not call this _memcmp_:
> > > > >
> > > > >     int memcmp(const void *s1, const void *s2, size_t n);
> > > > >
> > > > >     The memcmp() function returns an integer less than, equal to,
> > > > >     or greater than zero if the first n bytes of s1 is found,
> > > > >     respectively, to be less than, to match, or be greater than
> > > > >     the first n bytes of s2.
> > > >
> > > > +1 to this point.
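[The contract Pawel quotes can be checked directly. The sketch below uses a hypothetical `memeq()` wrapper, deliberately named differently from the patch's `rte_memcmp()`, to make the semantic mismatch explicit: standard memcmp() returns 0 on equality, so the usual `if (memcmp(a, b, n))` idiom means "not equal", which is the inverse of a boolean that is true on equality.]

```c
#include <stdbool.h>
#include <string.h>

/* Standard memcmp() is tri-state: the sign of the result gives an
 * ordering over the first n bytes, and 0 means the blocks match. */

/* Hypothetical boolean wrapper showing what the proposed rte_memcmp()
 * actually computes: true on equality, the inverse of the memcmp()
 * truth value used in the common "if (memcmp(...))" idiom. */
static inline bool
memeq(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n) == 0;
}
```

Under the memcmp() contract, `memcmp("abc", "abd", 3) < 0` and `memcmp("abc", "abc", 3) == 0`, while `memeq("abc", "abc", 3)` is true; a caller porting existing memcmp() call sites to a boolean rte_memcmp() would silently invert every test.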
> > > >
> > > > Also, if I read your quoted performance numbers in your earlier mail
> > > > correctly, we are only looking at a 1-4% performance increase. Is the
> > > > additional code to maintain worth the benefit?
> > >
> > > Yep, same thought here, is it really worth it?
> > > Konstantin
> > >
> > > > /Bruce
> > > >
> > > > > --
> > > > > Pawel
> >
> > I don't think I have exploited everything x86 has to offer to improve
> > performance, so I am looking for inputs. Until we have exhausted all
> > avenues I don't want to drop it. One thing I have noticed is that bigger
> > key sizes get better performance numbers. I plan to re-run the perf
> > tests with 64- and 128-byte key sizes and will report back. If there are
> > any other avenues to try, please let me know and I will give them a shot.
> >
> > Thanks,
> > Ravi
>
> Hi Ravi,
>
> Are 128-byte comparisons realistic? An IPv6 5-tuple with double VLAN tags
> is still only 41 bytes, or 48 with some padding added. While for a memcpy
> function you can see cases where you are going to copy a whole packet,
> meaning that sizes of 128B+ (up to multiple kB) are realistic, it's
> harder to see that for a compare function.
>
> In any case, we await the results of your further optimization work to
> see how that goes.

Hi Bruce,

A couple of things I am planning to try:

1. Use _xor_ and _testz_ instructions for the comparison instead of
   _cmpeq_ and _mask_.
2. I am currently using unaligned loads and am not sure about the penalty;
   I plan to try aligned loads when the address is aligned and compare the
   results.

Agreed that with just L3, or even L2 + L3 + L4 tuples, the key will not
exceed 64 bytes; 128 bytes is just a stretch for some weird MPLSoGRE
header formats. My focus is currently on improving performance for
< 64 byte and < 128 byte key lengths only.

Thanks,
Ravi

> Regards,
> /Bruce
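[The two comparison strategies under discussion can be sketched for a single 16-byte block as below. This is an illustration, not code from the patch; the function names are hypothetical, and the xor/testz variant assumes SSE4.1 (PTEST), requested here via a GCC/clang `target` attribute. Both use unaligned loads (`loadu`), matching point 2 above; on recent x86 cores the penalty versus aligned loads is small unless a load splits a cache line.]

```c
#include <immintrin.h>
#include <stdbool.h>

/* 16-byte equality via the cmpeq + movemask approach: compare bytes,
 * collect the per-byte results into a 16-bit mask, test the mask. */
static inline bool
eq16_cmpeq(const void *a, const void *b)
{
    __m128i x = _mm_loadu_si128((const __m128i *)a);
    __m128i y = _mm_loadu_si128((const __m128i *)b);
    /* movemask gathers the top bit of each byte lane:
     * 0xFFFF means every one of the 16 bytes compared equal. */
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
}

/* The xor/testz alternative: XOR the blocks, then use PTEST (SSE4.1)
 * to check for all-zero, avoiding the movemask + integer compare. */
__attribute__((target("sse4.1")))
static inline bool
eq16_xor_testz(const void *a, const void *b)
{
    __m128i x = _mm_loadu_si128((const __m128i *)a);
    __m128i y = _mm_loadu_si128((const __m128i *)b);
    __m128i diff = _mm_xor_si128(x, y);
    /* _mm_testz_si128(v, v) returns 1 iff every bit of v is zero,
     * i.e. the two blocks were identical. */
    return _mm_testz_si128(diff, diff);
}
```

Whether the testz form wins in practice depends on the surrounding loop (PTEST sets flags directly, which can fuse with a branch), which is presumably what the planned re-run of the perf tests would show.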