From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
Dharmik Thakkar <Dharmik.Thakkar@arm.com>
Cc: Olivier Matz <olivier.matz@6wind.com>,
Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
"dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
"Ruifeng Wang" <Ruifeng.Wang@arm.com>
Subject: RE: [PATCH 1/1] mempool: implement index-based per core cache
Date: Fri, 21 Jan 2022 11:31:55 +0000 [thread overview]
Message-ID: <DM6PR11MB4491578D71A6FAF1B67469229A5B9@DM6PR11MB4491.namprd11.prod.outlook.com> (raw)
In-Reply-To: <DM6PR11MB4491B14574EE5E1B0A5CD8E79A5B9@DM6PR11MB4491.namprd11.prod.outlook.com>
>
>
> Hi Dharmik,
> > >
> > >>>
> > >>>> Current mempool per core cache implementation stores pointers to mbufs
> > >>>> On 64b architectures, each pointer consumes 8B
> > >>>> This patch replaces it with index-based implementation,
> > >>>> where in each buffer is addressed by (pool base address + index)
> > >>>> It reduces the amount of memory/cache required for per core cache
> > >>>>
> > >>>> L3Fwd performance testing reveals minor improvements in the cache
> > >>>> performance (L1 and L2 misses reduced by 0.60%)
> > >>>> with no change in throughput
> > >>>
> > >>> I feel really sceptical about that patch and the whole idea in general:
> > >>> - From what I read above there is no real performance improvement observed.
> > >>> (In fact on my IA boxes mempool_perf_autotest reports ~20% slowdown,
> > >>> see below for more details).
> > >>
> > >> Currently, the optimizations (loop unroll and vectorization) are only implemented for ARM64.
> > >> Similar optimizations can be implemented for x86 platforms which should close the performance gap
> > >> and in my understanding should give better performance for a bulk size of 32.
> > >
> > > Might be, but I still don't see the reason for such effort.
> > > As you mentioned there is no performance improvement in 'real' apps: l3fwd, etc.
> > > on ARM64 even with vectorized version of the code.
> > >
> >
> > IMO, even without performance improvement, it is advantageous because the same performance is being achieved
> > with less memory and cache utilization using the patch.
> >
> > >>> - Space utilization difference looks neglectable too.
> > >>
> > >> Sorry, I did not understand this point.
> > >
> > > As I understand one of the expectations from that patch was:
> > > reduce memory/cache required, which should improve cache utilization
> > > (less misses, etc.).
> > > Though I think such improvements would be neglectable and wouldn't
> > > cause any real performance gain.
> >
> > The cache utilization performance numbers are for the l3fwd app, which might not be bottlenecked at the mempool per core cache.
> > Theoretically, this patch enables storing twice the number of objects in the cache as compared to the original implementation.
>
> It saves you 4 just bytes per mbuf.
> Even for simple l2fwd-like workload we access ~100 bytes per mbuf.
> Let's do a simplistic estimation of number of affected cache-lines l for l2fwd.
> For bulk of 32 packets, assuming 64B per cache-line and 16B per HW desc:
>
> number of cache-lines accessed
> cache with pointers / cache with indexes
> mempool_get: (32*8)/64=4 / (32*4)/64=2
> RX (read HW desc): (32*16)/64=8 / (32*16)/64=8
> RX (write mbuf fields, 1st cache line): (32*64)/64=3 / (32*64)/64=32
Should be:
RX (write mbuf fields, 1st cache line): (32*64)/64=32 / (32*64)/64=32
off course
> update mac addrs: (32*64)/64=32 / (32*64)/64=32
> TX (write HW desc): (32*16)/64=8 / (32*16)/64=8
> free mbufs (read 2nd mbuf cache line): (32*64)/64=32 / (32*64)/64=32
> mempool_put: (32*8)/64=4 / (32*4)/64=2
> total: 120 116
>
> So, if my calculations are correct, max estimated gain for cache utilization would be:
> (120-116)*100/120=3.33%
> Note that numbers are for over-simplistic usage scenario.
> In more realistic ones, when we have to touch more cache-lines per packet,
> that difference would be even less noticeable.
> So I really doubt we will see some noticeable improvements in terms of cache utilization
> with that patch.
>
> > >
> > >>> - The change introduces a new build time config option with a major limitation:
> > >>> All memzones in a pool have to be within the same 4GB boundary.
> > >>> To address it properly, extra changes will be required in init(/populate) part of the code.
> > >>
> > >> I agree to the above mentioned challenges and I am currently working on resolving these issues.
> > >
> > > I still think that to justify such changes some really noticeable performance
> > > improvement needs to be demonstrated: double-digit speedup for l3fwd/ipsec-secgw/...
> > > Otherwise it just not worth the hassle.
> > >
> >
> > Like I mentioned earlier, the app might not be bottlenecked at the mempool per core cache.
> > That could be the reason the numbers with l3fwd don’t fully show the advantage of the patch.
>
> As I said above, I don’t think we'll see any real advantage here.
> But feel free to pick-up different app and prove me wrong.
> After all we have plenty of sample apps that do provide enough
> pressure on the cache: l3fwd-acl, ipsec-secgw.
> Or you can even apply these patches from Sean:
> https://patches.dpdk.org/project/dpdk/list/?series=20999
> to run l3fwd with configurable routes.
> That should help you to make it cache-bound.
>
> > I’m seeing double-digit improvement with mempool_perf_autotest which should not be ignored.
>
> And for other we are seeing double digit degradation.
> So far the whole idea doesn't look promising at all, at least to me.
> Konstantin
next prev parent reply other threads:[~2022-01-21 11:32 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-30 17:27 [dpdk-dev] [RFC] " Dharmik Thakkar
2021-10-01 12:36 ` Jerin Jacob
2021-10-01 15:44 ` Honnappa Nagarahalli
2021-10-01 17:32 ` Jerin Jacob
2021-10-01 17:57 ` Honnappa Nagarahalli
2021-10-01 18:21 ` Jerin Jacob
2021-10-01 21:30 ` Ananyev, Konstantin
2021-10-02 0:07 ` Honnappa Nagarahalli
2021-10-02 18:51 ` Ananyev, Konstantin
2021-10-04 16:36 ` Honnappa Nagarahalli
2021-10-30 10:23 ` Morten Brørup
2021-10-31 8:14 ` Morten Brørup
2021-11-03 15:12 ` Dharmik Thakkar
2021-11-03 15:52 ` Morten Brørup
2021-11-04 4:42 ` Dharmik Thakkar
2021-11-04 8:04 ` Morten Brørup
2021-11-08 4:32 ` Honnappa Nagarahalli
2021-11-08 7:22 ` Morten Brørup
2021-11-08 15:29 ` Honnappa Nagarahalli
2021-11-08 15:39 ` Morten Brørup
2021-11-08 15:46 ` Honnappa Nagarahalli
2021-11-08 16:03 ` Morten Brørup
2021-11-08 16:47 ` Jerin Jacob
2021-12-24 22:59 ` [PATCH 0/1] " Dharmik Thakkar
2021-12-24 22:59 ` [PATCH 1/1] " Dharmik Thakkar
2022-01-11 2:26 ` Ananyev, Konstantin
2022-01-13 5:17 ` Dharmik Thakkar
2022-01-13 10:37 ` Ananyev, Konstantin
2022-01-19 15:32 ` Dharmik Thakkar
2022-01-21 11:25 ` Ananyev, Konstantin
2022-01-21 11:31 ` Ananyev, Konstantin [this message]
2022-03-24 19:51 ` Dharmik Thakkar
2021-12-25 0:16 ` [PATCH 0/1] " Morten Brørup
2022-01-07 11:15 ` Bruce Richardson
2022-01-07 11:29 ` Morten Brørup
2022-01-07 13:50 ` Bruce Richardson
2022-01-08 9:37 ` Morten Brørup
2022-01-10 6:38 ` Jerin Jacob
2022-01-13 5:31 ` Dharmik Thakkar
2023-07-06 17:43 ` Stephen Hemminger
2023-07-31 12:23 ` Thomas Monjalon
2023-07-31 12:33 ` Morten Brørup
2023-07-31 14:57 ` Dharmik Jayesh Thakkar
2022-01-13 5:36 ` [PATCH v2 " Dharmik Thakkar
2022-01-13 5:36 ` [PATCH v2 1/1] " Dharmik Thakkar
2022-01-13 10:18 ` Jerin Jacob
2022-01-20 8:21 ` Morten Brørup
2022-01-21 6:01 ` Honnappa Nagarahalli
2022-01-21 7:36 ` Morten Brørup
2022-01-24 13:05 ` Ray Kinsella
2022-01-21 9:12 ` Bruce Richardson
2022-01-23 7:13 ` Wang, Haiyue
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=DM6PR11MB4491578D71A6FAF1B67469229A5B9@DM6PR11MB4491.namprd11.prod.outlook.com \
--to=konstantin.ananyev@intel.com \
--cc=Dharmik.Thakkar@arm.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=Ruifeng.Wang@arm.com \
--cc=andrew.rybchenko@oktetlabs.ru \
--cc=dev@dpdk.org \
--cc=nd@arm.com \
--cc=olivier.matz@6wind.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).