From: Morten Brørup
Date: Thu, 1 Sep 2022 10:21:07 +0200
Subject: Optimizing for common burst sizes
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35D872DC@smartserver.smartshare.dk>
List-Id: DPDK patches and discussions

Triggered by the discussion about the performance cost of function
pointers [1], I want to share some thoughts about variables vs.
constants:

A lot of flexibility - some of it only required to support more or less
exotic scenarios - has been sneaking into DPDK, and costing performance.
Not only function pointers, but also variables, which might as well be
build time constants.

E.g.: When the i40e driver allocates a bulk of mbufs, the non-vector
function allocates rxq->rx_free_thresh mbufs each time, but the
optimized vector functions allocate RTE_I40E_RXQ_REARM_THRESH (#defined
as 32) mbufs each time.
Using a constant provides higher performance, because the compiler at
build time knows the size of the mempool cache copy loop in the
rte_mempool_do_generic_get() function, which gets inlined into the i40e
driver. I suppose the non-vector variant uses a variable to support
latency-sensitive applications requiring very small bursts - but using a
build time configurable constant would provide even higher performance
(i.e. lower latency).

We might be able to achieve some general performance improvements by
agreeing on a few "extremely common" burst sizes, and giving them
special treatment. It would not only improve performance, but might also
reduce some complexity by using common standards in places where burst
sizes today are implementation specific. They might also be helpful
guidance for new DPDK application developers.

These constants could be:

RTE_BURST_MICRO
---------------
The burst size for latency-sensitive applications.
Default: 4 (?)

RTE_BURST_SMALL
---------------
A small, but still efficient burst; e.g. a cache line of pointers.
Default: RTE_CACHE_LINE_SIZE / sizeof(void *) = 8 or 16

RTE_BURST_DEFAULT
-----------------
The typical application burst size.
Default: 32

RTE_BURST_HUGE
--------------
A very large burst, but small enough to fit into a typical PMD egress
queue. E.g. the mempool cache size, or half of it.
Default: RTE_MEMPOOL_CACHE_MAX_SIZE = 512 (or half of it?)
Note: This one might be too rare to deserve special treatment, but is
included for the sake of discussion.

Obviously, these should be build time configurable.

[1] http://inbox.dpdk.org/dev/20220818114449.1408226-1-cristian.dumitrescu@intel.com/T/#m679f356f097c89d3a542b7a0967069d6d0bc25e3

Med venlig hilsen / Kind regards,
-Morten Brørup
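
As a rough illustration of the "special treatment" idea above, here is a
minimal C sketch, not DPDK code. The RTE_BURST_* names and defaults are
the ones proposed in this mail and do not exist in DPDK; copy_ptrs() and
burst_copy() are hypothetical helpers, and the fallback value for
RTE_CACHE_LINE_SIZE is an assumption for building outside DPDK. The only
point is that a call with a constant count gives the compiler a known
trip count for the inlined copy loop.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: fallback for building outside DPDK (DPDK sets this per platform). */
#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 64
#endif

/* Constants as proposed in this mail; hypothetical, not part of DPDK. */
#define RTE_BURST_MICRO   4
#define RTE_BURST_SMALL   (RTE_CACHE_LINE_SIZE / sizeof(void *))
#define RTE_BURST_DEFAULT 32
#define RTE_BURST_HUGE    512 /* proposed as RTE_MEMPOOL_CACHE_MAX_SIZE */

/*
 * Generic pointer copy loop. When this is inlined at a call site where n
 * is a build time constant, the compiler knows the trip count and is free
 * to unroll or vectorize the loop.
 */
static inline void
copy_ptrs(void **dst, void *const *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}

/*
 * Hypothetical dispatcher: the common burst sizes take a branch that calls
 * copy_ptrs() with a constant count; every other size uses the variable
 * count path.
 */
static inline void
burst_copy(void **dst, void *const *src, size_t n)
{
	assert(dst != NULL && src != NULL);

	if (n == RTE_BURST_MICRO)
		copy_ptrs(dst, src, RTE_BURST_MICRO);
	else if (n == RTE_BURST_DEFAULT)
		copy_ptrs(dst, src, RTE_BURST_DEFAULT);
	else if (n == RTE_BURST_SMALL)
		copy_ptrs(dst, src, RTE_BURST_SMALL);
	else if (n == RTE_BURST_HUGE)
		copy_ptrs(dst, src, RTE_BURST_HUGE);
	else
		copy_ptrs(dst, src, n); /* uncommon size: variable trip count */
}
```

A caller that always passes RTE_BURST_DEFAULT takes the second branch.
Whether the extra comparisons pay for themselves depends on the compiler
and target, so this would need measuring before drawing conclusions.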