To: Jerin Jacob
Cc: dev@dpdk.org, thomas.monjalon@6wind.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com
From: Olivier MATZ
Date: Thu, 2 Jun 2016 23:16:16 +0200
Message-ID: <5750A220.6040804@6wind.com>
In-Reply-To: <20160602093936.GB6794@localhost.localdomain>
References: <1464101442-10501-1-git-send-email-jerin.jacob@caviumnetworks.com> <1464250025-9191-1-git-send-email-jerin.jacob@caviumnetworks.com> <574BFD97.2010505@6wind.com> <20160531125822.GA10995@localhost.localdomain> <574DFC9A.2050304@6wind.com> <20160601070018.GA26922@localhost.localdomain> <574FE202.2060306@6wind.com> <20160602093936.GB6794@localhost.localdomain>
Subject: Re: [dpdk-dev] [PATCH v2] mempool: replace c memcpy code semantics with optimized rte_memcpy
List-Id: patches and discussions about DPDK

Hi Jerin,

On 06/02/2016 11:39 AM, Jerin Jacob wrote:
> On Thu, Jun 02, 2016 at 09:36:34AM +0200, Olivier MATZ wrote:
>> I think the LIFO behavior should occur on a per-bulk basis.
>> I mean, it should behave like in the examples below:
>>
>> // pool cache is in state X
>> obj1 = mempool_get(mp)
>> obj2 = mempool_get(mp)
>> mempool_put(mp, obj2)
>> mempool_put(mp, obj1)
>> // pool cache is back in state X
>>
>> // pool cache is in state X
>> bulk1 = mempool_get_bulk(mp, 16)
>> bulk2 = mempool_get_bulk(mp, 16)
>> mempool_put_bulk(mp, bulk2, 16)
>> mempool_put_bulk(mp, bulk1, 16)
>> // pool cache is back in state X
>
> Per-entry LIFO behavior makes more sense in the _bulk_ case: a recently
> enqueued buffer comes out on the next "get", which makes it more likely
> that the buffer is still in the last-level cache.

Yes, from a memory cache perspective, I think you are right.

In practice, I'm not sure it's so important, because with many hardware
drivers a packet buffer returns to the pool only after a full round of
the tx ring. So I'd say it won't make a big difference here.

>> Note that today it's not the case for bulks: since object addresses
>> are reversed only in get(), we are not back in the original state.
>> I don't really see the advantage of this.
>>
>> Removing the reversing may accelerate the cache in case of bulk get,
>> I think.
>
> I tried it in my setup; it was dropping the performance. Have you seen
> an improvement in any setup?

I know that the mempool_perf autotest is not representative of a real
use case, but it gives a trend.
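To make the "reversing" point concrete, here is a minimal sketch of the two get() flavors being discussed. This is a simplified stand-in, not the real mempool code: the struct and function names are invented for the example, the cache is a plain array, and standard memcpy stands in for rte_memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the mempool per-lcore cache (names invented). */
struct obj_cache {
	unsigned int len;   /* number of cached object pointers */
	void *objs[512];    /* cache stack; top of stack is objs[len - 1] */
};

/* Legacy-style get: objects are copied out one by one in reverse order,
 * starting from the top of the cache stack. */
static void get_bulk_reversed(struct obj_cache *c, void **obj_table,
			      unsigned int n)
{
	unsigned int index, len;

	for (index = 0, len = c->len - 1; index < n; index++, len--)
		obj_table[index] = c->objs[len];
	c->len -= n;
}

/* Get without the reversing: the top n entries form one contiguous
 * block, so a single memcpy (rte_memcpy in DPDK) can move them all. */
static void get_bulk_straight(struct obj_cache *c, void **obj_table,
			      unsigned int n)
{
	memcpy(obj_table, &c->objs[c->len - n], n * sizeof(void *));
	c->len -= n;
}
```

The straight variant is what makes the single-rte_memcpy optimization possible in get(); the reversed variant forces a per-pointer loop.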
I did a quick test with:
 - the legacy code,
 - the rte_memcpy in put(),
 - the rte_memcpy in both put() and get() (no reverse).

It seems that removing the reversing brings ~50% of enhancement with
bulks of 32 (on a Westmere):

legacy:
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=839922483
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=849792204
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=1617022156
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1675087052
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=3202914713
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=3268725963

rte_memcpy in put() (your patch proposal):
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=762157465
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=744593817
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=1500276326
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1461347942
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=2974076107
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=2928122264

rte_memcpy in put() and get():
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=974834892
mempool_autotest cache=512 cores=1 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=1129329459
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=2147798220
mempool_autotest cache=512 cores=2 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=2232457625
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=32 rate_persec=4510816664
mempool_autotest cache=512 cores=4 n_get_bulk=32 n_put_bulk=32 n_keep=128 rate_persec=4582421298

This is probably more a measure of the pure CPU cost of the mempool
function, without considering the memory cache aspect. So, of course,
a real use-case test should be done to confirm whether it really
increases performance. I'll manage to do a test and let you know the
result.

By the way, not all drivers allocate or free their mbufs in bulk, so
this modification would only affect the ones that do. What driver are
you using for your test?

Regards,
Olivier
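For what it's worth, the ~50% figure above can be cross-checked from the cores=4 / n_keep=128 rows. The snippet below is my own arithmetic (not from the thread) and assumes the comparison is between the put()+get() variant and the put()-only patch:

```c
#include <assert.h>

/* Rates (operations/sec) taken from the cores=4, n_keep=128 rows above. */
static const double rate_legacy      = 3268725963.0;
static const double rate_put_only    = 2928122264.0; /* rte_memcpy in put() */
static const double rate_put_and_get = 4582421298.0; /* rte_memcpy in put()+get() */

/* Percentage gain of rate 'a' over baseline 'b'. */
static double gain_pct(double a, double b)
{
	return (a / b - 1.0) * 100.0;
}

/* gain_pct(rate_put_and_get, rate_put_only) is roughly +56%,
 * gain_pct(rate_put_and_get, rate_legacy)   is roughly +40%. */
```

So the ~50% reads as the gain over the put()-only patch; the gain over the legacy code is closer to 40%.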