Date: Tue, 18 Jan 2022 09:07:23 +0000
From: Bruce Richardson <bruce.richardson@intel.com>
To: Morten Brørup <mb@smartsharesystems.com>
Cc: olivier.matz@6wind.com, andrew.rybchenko@oktetlabs.ru, jerinjacobk@gmail.com, dev@dpdk.org
Subject: Re: [PATCH] mempool: fix get objects from mempool with cache
List-Id: DPDK patches and discussions
References: <98CBD80474FA8B44BF855DF32C47DC35D86DB2@smartserver.smartshare.dk>
 <20220114163650.94288-1-mb@smartsharesystems.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86E0F@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86E0F@smartserver.smartshare.dk>

On Tue, Jan 18, 2022 at 09:25:22AM +0100, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Monday, 17 January 2022 18.35
> > 
> > On Fri, Jan 14, 2022 at 05:36:50PM +0100, Morten Brørup wrote:
> > > A flush threshold for the mempool cache was introduced in DPDK
> > > version 1.3, but rte_mempool_do_generic_get() was not completely
> > > updated back then, and some inefficiencies were introduced.
> > >
> > > This patch fixes the following in rte_mempool_do_generic_get():
> > >
> > > 1. The code that initially screens the cache request was not updated
> > > with the change in DPDK version 1.3.
> > > The initial screening compared the request length to the cache size,
> > > which was correct before, but became irrelevant with the introduction
> > > of the flush threshold. E.g.
> > > the cache can hold up to flushthresh objects, which is more than
> > > its size, so some requests were not served from the cache, even
> > > though they could be.
> > > The initial screening has now been corrected to match the initial
> > > screening in rte_mempool_do_generic_put(), which verifies that a
> > > cache is present, and that the length of the request does not
> > > overflow the memory allocated for the cache.
> > >
> > > 2. The function is a helper for rte_mempool_generic_get(), so it
> > > must behave according to the description of that function.
> > > Specifically, objects must first be returned from the cache,
> > > subsequently from the ring.
> > > After the change in DPDK version 1.3, this was not the behavior when
> > > the request was partially satisfied from the cache; instead, the
> > > objects from the ring were returned ahead of the objects from the
> > > cache. This is bad for CPUs with a small L1 cache, which benefit
> > > from having the hot objects first in the returned array. (This is
> > > also the reason why the function returns the objects in reverse
> > > order.)
> > > Now, all code paths first return objects from the cache,
> > > subsequently from the ring.
> > >
> > > 3. If the cache could not be backfilled, the function would attempt
> > > to get all the requested objects from the ring (instead of only the
> > > number of requested objects minus the objects available in the
> > > cache), and the function would fail if that failed.
> > > Now, the first part of the request is always satisfied from the
> > > cache, and if the subsequent backfilling of the cache from the ring
> > > fails, only the remaining requested objects are retrieved from the
> > > ring.
> > >
> > > 4. The code flow for satisfying the request from the cache was
> > > slightly inefficient:
> > > The likely code path, where the objects are simply served from the
> > > cache, was treated as unlikely. Now it is treated as likely.
> > > And in the code path where the cache was backfilled first, numbers
> > > were added to and subtracted from the cache length; now this code
> > > path simply sets the cache length to its final value.
> > >
> > > 5. Some comments were no longer correct.
> > > The comments have been updated.
> > > Most importantly, the description of the successful return value
> > > was inaccurate. Success only returns 0, not >= 0.
> > >
> > > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > > ---
> > 
> > I am a little uncertain about the reversing of the copies taking
> > things out of the mempool - for machines where we are not that cache
> > constrained, will we lose out on possible optimizations where the
> > compiler optimizes the copy loop as a memcpy?
> 
> The objects are also returned in reverse order in the code it replaces,
> so this behavior is not introduced by this patch; I only describe the
> reason for it.
> 
> I floated a previous patch, in which the objects were returned in
> order, but Jerin argued [1] that we should keep it the way it was,
> unless I could show a performance improvement.
> 
> So I retracted that patch and split it up into two independent patches
> instead: this patch for get(), and [3] for put().
> 
> While experimenting with using rte_memcpy() for these, I couldn't
> achieve a performance boost - quite the opposite. So I gave up on it.
> 
> Reviewing the x86 variant of rte_memcpy() [2] makes me think that it is
> inefficient for copying small bulks of pointers, especially when n is
> unknown at compile time, and its code path goes through a great deal of
> branches.
> 

Thanks for all the explanation.

Reviewed-by: Bruce Richardson <bruce.richardson@intel.com>
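
For readers browsing the archive without the patch at hand: the behavior
described in points 1-4 of the commit message corresponds roughly to the
flow sketched below. This is a simplified, hypothetical sketch, not the
code from the patch itself - the function name is made up, and statistics
handling and error rollback of the cache state are omitted.

#include <rte_branch_prediction.h>
#include <rte_common.h>
#include <rte_mempool.h>

/* Hypothetical sketch of the fixed get flow described above. */
static inline int
sketch_do_generic_get(struct rte_mempool *mp, void **obj_table,
		      unsigned int n, struct rte_mempool_cache *cache)
{
	unsigned int remaining = n;
	uint32_t index, len;
	void **cache_objs;

	/* Point 1: only bypass the cache if there is none, or if the
	 * request would overflow the memory allocated for the cache. */
	if (unlikely(cache == NULL || n > RTE_MEMPOOL_CACHE_MAX_SIZE))
		goto ring_dequeue;

	/* Point 2: always serve the hot objects from the cache first,
	 * in reverse order, even when the request is only partially
	 * satisfied from the cache. */
	len = RTE_MIN(remaining, cache->len);
	cache_objs = &cache->objs[cache->len];
	cache->len -= len;
	remaining -= len;
	for (index = 0; index < len; index++)
		*obj_table++ = *--cache_objs;

	/* Point 4: the path where the cache covers the whole request is
	 * the likely one. */
	if (likely(remaining == 0))
		return 0;

	/* Backfill the cache and the remainder of the request together. */
	if (unlikely(rte_mempool_ops_dequeue_bulk(mp, cache->objs,
			cache->size + remaining) < 0))
		/* Point 3: if backfilling fails, get only the remaining
		 * objects directly from the ring instead of failing. */
		goto ring_dequeue;

	/* Serve the remainder from the freshly filled cache, reversed,
	 * and simply set the cache length to its final value (point 4). */
	cache_objs = &cache->objs[cache->size + remaining];
	for (index = 0; index < remaining; index++)
		*obj_table++ = *--cache_objs;
	cache->len = cache->size;
	return 0;

ring_dequeue:
	/* Get the remaining objects directly from the ring. */
	return rte_mempool_ops_dequeue_bulk(mp, obj_table, remaining);
}

Note how a failed backfill no longer fails the whole request: the cache
part has already been delivered, and only the remainder is fetched from
the ring.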
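
And regarding the reverse-order copy and rte_memcpy() discussion above:
the copy loop in question has roughly the following shape (again a
paraphrase with a made-up helper name, not the exact code from the
patch). The caller guarantees that n does not exceed the number of
objects in the cache.

#include <rte_mempool.h>

/* Paraphrased sketch: copy the n topmost objects out of the cache.
 * The most recently freed - and therefore hottest - objects sit at the
 * top of cache->objs, so copying from the top downwards places them
 * first in obj_table. Assumes n <= cache->len; updating cache->len is
 * left to the caller in this sketch. */
static inline void
sketch_copy_from_cache(void **obj_table, struct rte_mempool_cache *cache,
		       unsigned int n)
{
	unsigned int index;
	uint32_t len;

	for (index = 0, len = cache->len - 1; index < n;
	     index++, len--, obj_table++)
		*obj_table = cache->objs[len];
}

Because obj_table walks upwards while the cache index walks downwards,
the compiler cannot collapse this loop into a single forward memcpy(),
which is the trade-off raised in the question above.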