From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <27299e15-948d-9e0b-d6e0-efb740a3016e@oktetlabs.ru>
Date: Wed, 16 Nov 2022 14:29:55 +0300
Subject: Re: [PATCH] mempool: micro-optimize put function
From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Organization: OKTET Labs
To: Morten Brørup <mb@smartsharesystems.com>, olivier.matz@6wind.com,
 dev@dpdk.org
Cc: honnappa.nagarahalli@arm.com, bruce.richardson@intel.com,
 konstantin.ananyev@huawei.com
References: <20221116101855.93297-1-mb@smartsharesystems.com>
 <98CBD80474FA8B44BF855DF32C47DC35D874C7@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D874C7@smartserver.smartshare.dk>

On 11/16/22 14:10, Morten Brørup wrote:
>> From: Andrew Rybchenko [mailto:andrew.rybchenko@oktetlabs.ru]
>> Sent: Wednesday, 16 November 2022 12.05
>>
>> On 11/16/22 13:18, Morten Brørup wrote:
>>> Micro-optimization:
>>> Reduced the most likely code path in the generic put function by
>>> moving an unlikely check out of the most likely code path and
>>> further down.
>>>
>>> Also updated the comments in the function.
>>>
>>> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>>> ---
>>>   lib/mempool/rte_mempool.h | 35 ++++++++++++++++++-----------------
>>>   1 file changed, 18 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
>>> index 9f530db24b..aba90dbb5b 100644
>>> --- a/lib/mempool/rte_mempool.h
>>> +++ b/lib/mempool/rte_mempool.h
>>> @@ -1364,32 +1364,33 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>>>   {
>>>   	void **cache_objs;
>>>
>>> -	/* No cache provided */
>>> +	/* No cache provided? */
>>>   	if (unlikely(cache == NULL))
>>>   		goto driver_enqueue;
>>>
>>> -	/* increment stat now, adding in mempool always success */
>>> +	/* Increment stats now, adding in mempool always succeeds. */
>>>   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_bulk, 1);
>>>   	RTE_MEMPOOL_CACHE_STAT_ADD(cache, put_objs, n);
>>>
>>> -	/* The request itself is too big for the cache */
>>> -	if (unlikely(n > cache->flushthresh))
>>> -		goto driver_enqueue_stats_incremented;
>>
>> I've kept the check here since it protects against overflow in len
>> plus n below if n is really huge.
>
> We can fix that, see below.
>
>>
>>> -
>>> -	/*
>>> -	 * The cache follows the following algorithm:
>>> -	 *   1. If the objects cannot be added to the cache without
>>> -	 *      crossing the flush threshold, flush the cache to the
>>> -	 *      backend.
>>> -	 *   2. Add the objects to the cache.
>>> -	 */
>>> -
>>> -	if (cache->len + n <= cache->flushthresh) {
>>> +	if (likely(cache->len + n <= cache->flushthresh)) {
>
> It is an invariant that cache->len <= cache->flushthresh, so the above
> comparison can be rewritten to protect against overflow:
>
> if (likely(n <= cache->flushthresh - cache->len)) {
>

True, but it would be useful to highlight the use of the invariant here
with either a comment or an assert.

IMHO, it is wrong to use likely() here since, as far as I know, it makes
the else branch very expensive, whereas crossing the flush threshold is
an expected branch and must not be that expensive.

>>> +		/*
>>> +		 * The objects can be added to the cache without crossing
>>> +		 * the flush threshold.
>>> +		 */
>>>   		cache_objs = &cache->objs[cache->len];
>>>   		cache->len += n;
>>> -	} else {
>>> +	} else if (likely(n <= cache->flushthresh)) {
>>> +		/*
>>> +		 * The request itself fits into the cache.
>>> +		 * But first, the cache must be flushed to the backend, so
>>> +		 * adding the objects does not cross the flush threshold.
>>> +		 */
>>>   		cache_objs = &cache->objs[0];
>>>   		rte_mempool_ops_enqueue_bulk(mp, cache_objs, cache->len);
>>>   		cache->len = n;
>>> +	} else {
>>> +		/* The request itself is too big for the cache. */
>>> +		goto driver_enqueue_stats_incremented;
>>>   	}
>>>
>>>   	/* Add the objects to the cache. */
>>> @@ -1399,13 +1400,13 @@ rte_mempool_do_generic_put(struct rte_mempool *mp, void * const *obj_table,
>>>
>>> driver_enqueue:
>>>
>>> -	/* increment stat now, adding in mempool always success */
>>> +	/* Increment stats now, adding in mempool always succeeds. */
>>>   	RTE_MEMPOOL_STAT_ADD(mp, put_bulk, 1);
>>>   	RTE_MEMPOOL_STAT_ADD(mp, put_objs, n);
>>>
>>> driver_enqueue_stats_incremented:
>>>
>>> -	/* push objects to the backend */
>>> +	/* Push the objects to the backend. */
>>>   	rte_mempool_ops_enqueue_bulk(mp, obj_table, n);
>>>   }
>>>
>>
>
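For illustration, below is a minimal self-contained sketch of the branch
structure under discussion. It folds in the overflow-safe comparison and
documents the invariant with an assert, and it deliberately omits the
branch hints per the concern above. The toy_cache and backend_enqueue
names and the fixed array size are made up for the example; this is a
sketch of the idea, not the actual rte_mempool code.

#include <assert.h>
#include <stdio.h>

#define TOY_CACHE_SIZE 512	/* assume flushthresh <= TOY_CACHE_SIZE */

struct toy_cache {
	unsigned int len;		/* number of objects currently cached */
	unsigned int flushthresh;	/* invariant: len <= flushthresh */
	void *objs[TOY_CACHE_SIZE];
};

/* Stand-in for rte_mempool_ops_enqueue_bulk(). */
static void
backend_enqueue(void * const *objs, unsigned int n)
{
	(void)objs;
	printf("enqueue %u objects to the backend\n", n);
}

static void
toy_put(struct toy_cache *cache, void * const *obj_table, unsigned int n)
{
	void **cache_objs;
	unsigned int i;

	/* The subtraction below relies on this invariant. */
	assert(cache->len <= cache->flushthresh);

	/*
	 * Equivalent to "cache->len + n <= cache->flushthresh", but cannot
	 * wrap around for huge n: the invariant keeps the subtraction safe.
	 */
	if (n <= cache->flushthresh - cache->len) {
		/* The objects fit without crossing the flush threshold. */
		cache_objs = &cache->objs[cache->len];
		cache->len += n;
	} else if (n <= cache->flushthresh) {
		/* The request fits, but the cache must be flushed first. */
		backend_enqueue(cache->objs, cache->len);
		cache_objs = &cache->objs[0];
		cache->len = n;
	} else {
		/* Too big for the cache; push directly to the backend. */
		backend_enqueue(obj_table, n);
		return;
	}

	/* Add the objects to the cache. */
	for (i = 0; i < n; i++)
		cache_objs[i] = obj_table[i];
}

In the real code, RTE_ASSERT() could presumably play the role of the
plain assert() here, so the invariant would be checked only in builds
with assertions enabled.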