From: Morten Brørup
To: Olivier Matz
Cc: Bruce Richardson, Andrew Rybchenko, Honnappa Nagarahalli, dev@dpdk.org
Subject: RE: [RFC] mempool: modify flush threshold
Date: Fri, 28 Jan 2022 09:40:45 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D86E53@smartserver.smartshare.dk>
References: <98CBD80474FA8B44BF855DF32C47DC35D86DB9@smartserver.smartshare.dk> <98CBD80474FA8B44BF855DF32C47DC35D86DF0@smartserver.smartshare.dk>
List-Id: DPDK patches and discussions

+Honnappa, you have shown a keen interest in this topic.

> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, 24 January 2022 16.57
>
> On Mon, Jan 10, 2022 at 09:40:48AM +0000, Bruce Richardson wrote:
> > On Sat, Jan 08, 2022 at 12:00:17PM +0100, Morten Brørup wrote:
> > > > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > > > Sent: Friday, 7 January 2022 16.12
> > > >
> > > > On Tue, Dec 28, 2021 at 03:28:45PM +0100, Morten Brørup wrote:
> > > > > Hi mempool maintainers and DPDK team.
> > > > >
> > > > > Does anyone know the reason or history why CACHE_FLUSHTHRESH_MULTIPLIER was chosen to be 1.5? I think it is counterintuitive.
> > > > >
> > > > > The mempool cache flush threshold was introduced in DPDK version 1.3; it was not in DPDK version 1.2. The copyright notice for rte_mempool.c says year 2012.
> > > > >
> > > > > Here is my analysis:
> > > > >
> > > > > With the multiplier of 1.5, a mempool cache is allowed to be filled up to 50 % above its target size before its excess entries are flushed to the mempool (thereby reducing the cache length to the target size).
> > > > >
> > > > > In the opposite direction, a mempool cache is allowed to be drained completely, i.e. up to 100 % below its target size.
> > > > >
> > > > > My instinct tells me that it would be more natural to let a mempool cache go the same amount above and below its target size, i.e. using a flush multiplier of 2 instead of 1.5.
> > > > >
> > > > > Also, the cache should be allowed to fill up to and including the flush threshold, so it is flushed when the threshold is exceeded, instead of when it is reached.
> > > > >
> > > > > Here is a simplified example:
> > > > >
> > > > > Imagine a cache target size of 32, corresponding to a typical packet burst.
> > > > > With a flush threshold multiplier of 2 (and len > threshold instead of len >= threshold), the cache could hold 1 +/- 1 packet bursts. With the current multiplier it can only hold [0 .. 1.5[ packet bursts, not really providing a lot of elasticity.
> > > >
> > > > Hi Morten,
> > > >
> > > > Interesting to see this being looked at again. The original idea of adding in some extra room above the requested value was to avoid the worst-case scenario of a pool oscillating between full and empty repeatedly due to the addition/removal of perhaps a single packet. As for why 1.5 was chosen as the value, I don't recall any particular reason for it myself. The main objective was to have separate flush and size values, so that we could go a bit above full and, when flushing, not empty the entire cache down to zero.
> > >
> > > Thanks for providing the historical background for this feature, Bruce.
> > >
> > > > In terms of the behavioural points you make above, I wonder if symmetry is actually necessary or desirable in this case. After all, the ideal case is probably to keep the mempool neither full nor empty, so that both allocations and frees can be done without having to go to the underlying shared data structure. To accommodate this, the mempool will only flush when the number of elements is greater than size * 1.5, and then it only flushes elements down to size, ensuring that allocations can still take place. On allocation, new buffers are taken when we don't have enough in the cache to fulfil the request, and then the cache is filled up to size, not to the flush threshold.
> > >
> > > I agree with the ideal case.
> > >
> > > However, it looks like the addition of the flush threshold also changed the "size" parameter to effectively become "desired length" instead. This interpretation is also supported by the flush algorithm, which doesn't flush below the "size", but to the "size". So based on this interpretation, I was wondering why it is not symmetrical around the "desired length", but allowed to go 100 % below and only 50 % above.
> > > >
> > > > Now, for the scenario you describe - where the mempool cache size is set to be the same as the burst size - this scheme probably does break down, in that we don't really have any burst elasticity. However, I would query if that is a configuration that is used, since to the user it should correctly appear to provide no elasticity. Looking at testpmd and our example apps, the standard there is a burst size of 32 and a mempool cache of ~256. In OVS code, netdev-dpdk.c seems to initialize the mempool with a cache size of RTE_MEMPOOL_CACHE_MAX_SIZE (through the define MP_CACHE_SZ). In all these cases, I think the 1.5 threshold should work just fine for us.
> > >
> > > My example was for demonstration only, and hopefully not being used by any applications.
> > >
> > > The simplified example was intended to demonstrate the theoretical effect of the imbalance in using the 1.5 threshold.
> > > It will be similar with a cache size of 256 objects: You will be allowed to go 8 bursts (calculation: 256 objects / 32 objects per burst) below the cache "size" without retrieving objects from the backing pool, but only 4 bursts above the cache "size" before flushing objects to the backing pool.
> > > >
> > > > That said, if you want to bump it up to 2x, I can't say I'd object strongly, as it should be harmless, I think.
> > >
> > > From an academic viewpoint, 2x seems much better to me than 1.5x. And based on your background information about the introduction of the flush threshold, there was no academic reason why 1.5x was chosen - this is certainly valuable feedback.
>
> Just to say I agree with Morten's analysis. Having a threshold at 2x size would make more sense to me.
>
> Without the corner cases (big requests, or common pool empty), I would have suggested this, which looks more symmetric than what we have today:
>
> get()
>   fill request with as many objs as possible from cache
>   if cache is empty, request "size" more objects from common pool
>   fill the rest of the request from cache

I agree, and this is doable without a performance penalty.

>
> put()
>   fill cache with provided objects up to 2x size
>   if cache exceeds threshold, fill common pool with "size" objects
>   fill the cache with the remaining objs from request

I tried implementing this already, but it cannot be done without a performance cost, one way or the other:

1. If we flush the top of the cache (which is a stack), we are flushing the hot objects, leaving cold objects at the new top of the cache.

2. If we flush the bottom of the cache, we need to move the remaining objects in the cache down. E.g. flushing the objects at index [0..256[ in the cache means that the objects at index [256..len[ in the cache must be moved down to index [0..len-256[.

So I came to the conclusion that flushing the cache needs to flush it completely.

I also consider this the best choice for real applications:

Either an lcore thread in the application is in a state of balance, where it uses the mempool cache within its flush/refill boundaries - which makes the flush method less important.

Or an lcore thread in the application is out of balance (either permanently or temporarily), and mostly gets or puts objects from/to the mempool. If it mostly puts objects, not flushing all of the objects will cause more frequent flushing. E.g.:

Cache size=256, flushthresh=512 (2x, not 1.5x), initial len=256; application burst len=32.

If flushing leaves "size" objects in the cache, the cache is flushed every 8th burst. And with flushthresh=384 (1.5x), flushing occurs every 4th burst, which is what the mempool does today!

If the cache is flushed completely, the cache is only flushed at every 16th burst.

And when/if the application thread breaks its pattern of continuously putting objects, and suddenly starts to get objects instead, it will either get objects already in the cache, or the get() function will refill the cache.

The concept of not flushing the cache completely is based on the assumption that it is more likely for an application's lcore thread to get() after flushing than to put() after flushing. I strongly disagree with this assumption! If an application thread is continuously putting so much that it overflows the cache, it is much more likely to keep putting than it is to start getting. This is CPU branch prediction 101.
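
To illustrate the arithmetic above, here is a small standalone simulation (plain C, not the actual mempool code) that counts how often a put-only workload triggers a flush under the three variants discussed: the current 1.5x threshold, a 2x threshold that flushes down to "size", and a 2x threshold with a complete flush. The constants and the ">=" comparison mirror the example above; it only models flush frequency, nothing else.

/* Standalone sketch: simulate a put-only workload and count flushes to the
 * backing pool. Constants taken from the example above. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define CACHE_SIZE 256
#define BURST      32
#define NUM_BURSTS 1024

static unsigned int
count_flushes(uint32_t flushthresh, bool flush_completely)
{
        uint32_t len = CACHE_SIZE;      /* initial cache length, as in the example */
        unsigned int flushes = 0;
        unsigned int i;

        for (i = 0; i < NUM_BURSTS; i++) {
                len += BURST;
                if (len >= flushthresh) {       /* ">=", as in the current implementation */
                        flushes++;
                        /* A complete flush empties the cache; the other
                         * variants flush down to "size" only. */
                        len = flush_completely ? 0 : CACHE_SIZE;
                }
        }
        return flushes;
}

int main(void)
{
        printf("1.5x threshold (384), flush to size:  %u flushes per %u bursts\n",
               count_flushes(384, false), NUM_BURSTS);
        printf("2.0x threshold (512), flush to size:  %u flushes per %u bursts\n",
               count_flushes(512, false), NUM_BURSTS);
        printf("2.0x threshold (512), complete flush: %u flushes per %u bursts\n",
               count_flushes(512, true), NUM_BURSTS);
        return 0;
}

Running it shows roughly 256, 128 and 64 flushes over the 1024 bursts, i.e. every 4th, 8th and 16th burst respectively, matching the numbers above.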

>
> But this is quite a sensitive area, and modifying these functions should be done with care, as it impacts mbuf allocation.

Certainly!

The current mempool caching algorithm is defective, and I'm trying to repair it. Honnappa already provided very valuable feedback regarding CPUs with small L1 cache, which must be taken carefully into consideration.

For now, there seems to be consensus regarding changing the flush threshold multiplier from 1.5 to 2.

>
> > > I will consider providing a patch. Worst case, it probably doesn't do any harm. And if anyone is really interested, they can look at the mempool debug stats to see the difference it makes for their specific application, e.g. the put_common_pool_bulk/put_bulk and get_common_pool_bulk/get_success_bulk ratios should decrease.
> >
> > Sure, thanks.
> >
> > /Bruce
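
PS: For anyone who wants to measure this, below is a rough sketch of how the debug stats could be inspected. It assumes DPDK is built with mempool debug stats enabled (RTE_LIBRTE_MEMPOOL_DEBUG), and "mbuf_pool" is a placeholder for whatever pool name the application actually uses; rte_mempool_dump() then prints the counters from which the above ratios can be computed.

#include <stdio.h>

#include <rte_eal.h>
#include <rte_mempool.h>

int main(int argc, char **argv)
{
        /* Attach to the DPDK application (e.g. as a secondary process), or
         * build a call like this into the application itself. */
        if (rte_eal_init(argc, argv) < 0)
                return -1;

        /* "mbuf_pool" is a placeholder; use the name passed to
         * rte_pktmbuf_pool_create() / rte_mempool_create(). */
        struct rte_mempool *mp = rte_mempool_lookup("mbuf_pool");
        if (mp == NULL) {
                printf("mempool not found\n");
                return -1;
        }

        /* Prints the cache configuration and, when debug stats are enabled,
         * the put/get bulk counters mentioned above. */
        rte_mempool_dump(stdout, mp);

        rte_eal_cleanup();
        return 0;
}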