From: Jerin Jacob <jerinjacobk@gmail.com>
Date: Mon, 10 Jan 2022 12:56:33 +0530
Message-ID: <CALBAE1OjCswxUfaNLWg5y-tnPkFhvvKQ8sJ3JpBoo7ObgeB5OA@mail.gmail.com>
Subject: Re: [PATCH] mempool: optimize incomplete cache handling
To: Morten Brørup <mb@smartsharesystems.com>
Cc: Olivier Matz <olivier.matz@6wind.com>, 
 Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>, dpdk-dev <dev@dpdk.org>
Content-Type: text/plain; charset="UTF-8"

On Fri, Jan 7, 2022 at 2:16 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: Jerin Jacob [mailto:jerinjacobk@gmail.com]
> > Sent: Thursday, 6 January 2022 17.55
> >
> > On Thu, Jan 6, 2022 at 5:54 PM Morten Brørup
> > <mb@smartsharesystems.com> wrote:
> > >
> > > A flush threshold for the mempool cache was introduced in DPDK
> > version
> > > 1.3, but rte_mempool_do_generic_get() was not completely updated back
> > > then.
> > >
> > > The incompleteness did not cause any functional bugs, so this patch
> > > could be considered refactoring for the purpose of cleaning up.
> > >
> > > This patch completes the update of rte_mempool_do_generic_get() as
> > > follows:
> > >
> > > 1. A few comments were misplaced or no longer correct.
> > > Some comments have been updated/added/corrected.
> > >
> > > 2. The code that initially screens the cache request was not updated.
> > > The initial screening compared the request length to the cache size,
> > > which was correct before, but became irrelevant with the introduction
> > > of the flush threshold. E.g. the cache can hold up to flushthresh
> > > objects, which is more than its size, so some requests were not
> > > served from the cache, even though they could be.
> > > The initial screening has now been corrected to match the initial
> > > screening in rte_mempool_do_generic_put(), which verifies that a
> > > cache is present, and that the length of the request does not
> > > overflow the memory allocated for the cache.
> > >
> > > 3. The code flow for satisfying the request from the cache was weird.
> > > The likely code path, where the objects are simply served from the
> > > cache, was treated as unlikely; now it is treated as likely.
> > > And in the code path where the cache was backfilled first, numbers
> > > were added to and subtracted from the cache length; now this code
> > > path simply sets the cache length to its final value.
> > >
> > > 4. The objects were returned in reverse order.
> > > Returning the objects in reverse order is not necessary, so
> > > rte_memcpy() is now used instead.
> >
> > Have you checked the performance with a network workload?
> > IMO, the reverse order makes sense (LIFO vs. FIFO).
> > LIFO keeps the cache warm, as the same buffers are reused
> > frequently.
>
> I have not done any performance testing. We probably agree that the
> only major difference lies in how the objects are returned. And we
> probably also agree that rte_memcpy() is faster than the copy loop it
> replaced, especially when n is constant at compile time. So the
> performance difference mainly depends on the application, which I will
> discuss below.
>
> Let's first consider LIFO vs. FIFO.
>
> The key argument for the rte_memcpy() optimization is that we are
> still getting the burst of objects from the top of the stack (LIFO);
> only the order of the objects inside the burst is no longer reversed.
>
> Here is an example:
>
> The cache initially contains 8 objects: 01234567.
>
> 8 more objects are put into the cache: 89ABCDEF.
>
> The cache now holds: 0123456789ABCDEF.

Agreed. However, I think it may matter on machines with smaller L1
caches and with larger burst sizes, where the ordering scheme decides
what can stay in L1.

I would suggest splitting each performance improvement into a separate
patch, for better tracking and for quantifying each improvement.

I think the mempool performance test, and the tx-only stream mode in
testpmd, can quantify the patches.
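
As a sketch of how such measurements might be run (illustrative
commands only; binary paths, lcore lists and EAL options depend on the
build and platform):

```sh
# Run the mempool performance autotest from the dpdk-test application:
DPDK_TEST=mempool_perf_autotest ./build/app/dpdk-test -l 0-1

# Generate a TX-only stream in testpmd to stress the mbuf mempool path:
./build/app/dpdk-testpmd -l 0-3 -n 4 -- --forward-mode=txonly --stats-period=1
```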



>
> Getting 4 objects from the cache gives us CDEF instead of FEDC, i.e.
> we are still getting the 4 objects most recently put into the cache.
>
> Furthermore, if the application is working with fixed-size bursts, it
> will usually put and get the same size burst, i.e. put the burst
> 89ABCDEF into the cache, and then get the burst 89ABCDEF from the
> cache again.
>
>
> Here is an example unfavorable scenario:
>
> The cache initially contains 4 objects, which have gone cold: 0123.
>
> 4 more objects, which happen to be hot, are put into the cache: 4567.
>
> Getting 8 objects from the cache gives us 01234567 instead of 76543210.
>
> Now, if the application only processes the first 4 of the 8 objects
> in the burst, it would have benefitted from those objects being the
> hot 7654 objects instead of the cold 0123 objects.
>
> However, I think that most applications process the complete burst,
> so I consider this scenario unlikely.
>
> Similarly, a pipelined application doesn't process objects in reverse
> order at every other step in the pipeline, even though the previous
> step in the pipeline most recently touched the last object of the
> burst.
>
>
> My overall conclusion was that the benefit of using rte_memcpy()
> outweighs the disadvantage of the unfavorable scenario, because I
> consider the probability of the unfavorable scenario occurring very
> low. But again: it mainly depends on the application.
>
> If anyone disagrees with the risk analysis described above, I will
> happily provide a version 2 of the patch, where the objects are still
> returned in reverse order. After all, the rte_memcpy() benefit is
> relatively small compared to the impact if the unlikely scenario
> occurs.