From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 4 Nov 2020 16:00:53 +0100
From: Olivier Matz
To: Morten Brørup
Cc: Slava Ovsiienko, NBU-Contact-Thomas Monjalon, dev@dpdk.org,
 techboard@dpdk.org, Ajit Khaparde, "Ananyev, Konstantin",
 Andrew Rybchenko, "Yigit, Ferruh", david.marchand@redhat.com,
 "Richardson, Bruce", jerinj@marvell.com, honnappa.nagarahalli@arm.com,
 maxime.coquelin@redhat.com, stephen@networkplumber.org,
 hemant.agrawal@nxp.com, Matan Azrad, Shahaf Shuler
Message-ID: <20201104150053.GI1898@platinum>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C613DF@smartserver.smartshare.dk>
Subject: Re: [dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

Hi,

On Tue, Nov 03, 2020 at 04:03:46PM +0100, Morten Brørup wrote:
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Slava Ovsiienko
> > Sent: Tuesday, November 3, 2020 3:03 PM
> >
> > Hi, Morten
> >
> > > From: Morten Brørup
> > > Sent: Tuesday, November 3, 2020 14:10
> > >
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > Sent: Monday, November 2, 2020 4:58 PM
> > > >
> > > > +Cc techboard
> > > >
> > > > We need benchmark numbers in order to take a decision.
> > > > Please all, prepare some arguments and numbers so we can discuss
> > > > the mbuf layout in the next techboard meeting.

I did some quick tests, and it appears to me that just moving the pool
pointer to the first cache line does not have a significant impact.

However, I agree with Morten that there is some room for optimization
around m->pool: I did a hack in the ixgbe driver to assume there is
only one mbuf pool. This greatly simplifies the freeing of mbufs in Tx,
because we don't have to group them into bulks that share the same pool
(see ixgbe_tx_free_bufs()). The impact of this hack is quite good:
~+5% on a real-life forwarding use case.

It may be possible to store the pool in the sw ring to avoid a later
access to m->pool. Having a pool index as suggested by Morten would
also help to reduce the room used in the sw ring in this case. But this
is a bit off-topic :)
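To make the idea concrete, here is a rough sketch of the two Tx free
strategies (purely illustrative: the function names are invented and
the real ixgbe_tx_free_bufs() differs in detail; both assume the caller
passes at most 64 mbufs):

	#include <rte_mbuf.h>
	#include <rte_mempool.h>

	/* Generic path: mbufs may come from different pools, so they are
	 * grouped into runs sharing the same pool before bulk-freeing.
	 * The m->pool comparison touches every mbuf. */
	static void
	tx_free_bufs_generic(struct rte_mbuf **txep, unsigned int n)
	{
		struct rte_mbuf *to_free[64];
		struct rte_mbuf *m;
		unsigned int i, nb_free = 0;

		for (i = 0; i < n; i++) {
			m = rte_pktmbuf_prefree_seg(txep[i]);
			if (m == NULL)
				continue;
			if (nb_free > 0 && m->pool != to_free[0]->pool) {
				/* pool changed: flush the current run */
				rte_mempool_put_bulk(to_free[0]->pool,
						     (void **)to_free, nb_free);
				nb_free = 0;
			}
			to_free[nb_free++] = m;
		}
		if (nb_free > 0)
			rte_mempool_put_bulk(to_free[0]->pool,
					     (void **)to_free, nb_free);
	}

	/* "Single pool" hack: m->pool is read once per burst, so the
	 * per-mbuf comparison disappears. Only valid when all mbufs
	 * really come from the same pool. */
	static void
	tx_free_bufs_single_pool(struct rte_mbuf **txep, unsigned int n)
	{
		struct rte_mbuf *to_free[64];
		struct rte_mempool *pool = NULL;
		struct rte_mbuf *m;
		unsigned int i, nb_free = 0;

		for (i = 0; i < n; i++) {
			m = rte_pktmbuf_prefree_seg(txep[i]);
			if (m == NULL)
				continue;
			if (pool == NULL)
				pool = m->pool;
			to_free[nb_free++] = m;
		}
		if (nb_free > 0)
			rte_mempool_put_bulk(pool, (void **)to_free, nb_free);
	}

The generic path reads m->pool (today in the second cache line) for
every mbuf just to compare it; the single-pool variant reads it once
per burst, which is where the gain comes from.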
> > > I propose that the techboard considers this from two angles:
> > >
> > > 1. Long term goals and their relative priority. I.e. what can be
> > > achieved with wide-ranging modifications, requiring yet another ABI
> > > break and due notices.
> > >
> > > 2. Short term goals, i.e. what can be achieved for this release.
> > >
> > > My suggestions follow...
> > >
> > > 1. Regarding long term goals:
> > >
> > > I have argued that simple forwarding of non-segmented packets using
> > > only the first mbuf cache line can be achieved by making three
> > > modifications:
> > >
> > > a) Move m->tx_offload to the first cache line.
> >
> > Not all PMDs use this field on Tx. HW might support the checksum
> > offloads directly, not requiring these fields at all.

To me, a driver should use m->tx_offload, because the application
specifies the offset where the checksum has to be done, in case the hw
is not able to recognize the protocol.

> > > b) Use an 8 bit pktmbuf mempool index in the first cache line,
> > > instead of the 64 bit m->pool pointer in the second cache line.
> >
> > 256 mpools look like enough to me. Regarding the indirect access to
> > the pool (via some table) - it might introduce some performance
> > impact.
>
> It might, but I hope that it is negligible, so the benefits outweigh
> the disadvantages.
>
> It would have to be measured, though.
>
> And m->pool is only used for free()'ing (and detach()'ing) mbufs.
>
> > For example, mlx5 PMD strongly relies on the pool field for
> > allocating mbufs in the Rx datapath. We're going to update it (o-o,
> > we found a point to optimize), but for now it does.
>
> Without looking at the source code, I don't think the PMD is using
> m->pool in the RX datapath, I think it is using a pool dedicated to a
> receive queue used for RX descriptors in the PMD (i.e.
> driver->queue->pool).
>
> > > c) Do not access m->next when we know that it is NULL.
> > > We can use m->nb_segs == 1 or some other invariant as the gate.
> > > It can be implemented by adding an m->next accessor function:
> > >
> > > struct rte_mbuf *rte_mbuf_next(struct rte_mbuf *m)
> > > {
> > >     return m->nb_segs == 1 ? NULL : m->next;
> > > }
> >
> > Sorry, not sure about this. IIRC, nb_segs is valid in the first
> > segment/mbuf only. If we have 4 segments in the pkt, we see nb_segs=4
> > in the first one, and nb_segs=1 in the others. The next field is NULL
> > in the last mbuf only. Am I wrong and missing something?
>
> You are correct.
>
> This would have to be updated too. Either by increasing m->nb_segs in
> the following segments, or by splitting up the relevant functions into
> functions for working on first segments (incl. non-segmented packets),
> and functions for working on following segments of segmented packets.

Instead of maintaining a valid nb_segs, a HAS_NEXT flag would be easier
to implement. However, it means that an accessor needs to be used
instead of any direct m->next access.
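Roughly, something like this hypothetical accessor is what I have in
mind (the flag name and bit value below are invented for illustration;
a really free bit in ol_flags would have to be reserved for it):

	#include <rte_mbuf.h>

	/* Hypothetical "this segment has a chained next" flag; it does
	 * not exist in DPDK today and the bit value is illustrative. */
	#define PKT_HAS_NEXT (1ULL << 40)

	static inline struct rte_mbuf *
	rte_mbuf_next(struct rte_mbuf *m)
	{
		/* Only dereference m->next (second cache line) when the
		 * flag in the first cache line says a chained segment
		 * actually exists. */
		if (!(m->ol_flags & PKT_HAS_NEXT))
			return NULL;
		return m->next;
	}

The Rx/Tx and chaining code would then set/clear the flag when
(un)linking segments, and all direct reads of m->next would have to go
through the accessor.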
> > > Regarding the priority of this goal, I guess that simple forwarding
> > > of non-segmented packets is probably the path taken by the majority
> > > of packets handled by DPDK.
> > >
> > > An alternative goal could be:
> > > Do not touch the second cache line during RX.
> > > A comment in the mbuf structure says so, but it is not true anymore.
> > >
> > > (I guess that regression testing didn't catch this because the tests
> > > perform TX immediately after RX, so the cache miss just moves from
> > > the TX to the RX part of the test application.)
> > >
> > >
> > > 2. Regarding short term goals:
> > >
> > > The current DPDK source code looks to me like m->next is the most
> > > frequently accessed field in the second cache line, so it makes
> > > sense moving this to the first cache line, rather than m->pool.
> > > Benchmarking may help here.
> >
> > Moreover, for segmented packets the packet size is supposed to be
> > large, and that imposes a relatively low packet rate, so the
> > optimization of moving next to the 1st cache line might be
> > negligible. Just compare 148Mpps of 64B pkts and 4Mpps of 3000B pkts
> > over a 100Gbps link. Currently we are benchmarking and have not yet
> > succeeded in finding a difference. The benefit can't be expressed in
> > mpps delta, we should measure CPU clocks, but the Rx queue is almost
> > always empty - we have empty loops. So, if there is a boost - it is
> > extremely hard to catch.
>
> Very good point regarding the value of such an optimization, Slava!
>
> And when free()'ing packets, both m->next and m->pool are touched.
>
> So perhaps the free()/detach() functions in the mbuf library can be
> modified to handle first segments (and non-segmented packets) and
> following segments differently, so accessing m->next can be avoided
> for non-segmented packets. Then m->pool should be moved to the first
> cache line.

I also think that moving m->pool without doing something else about
m->next is probably useless. And it's too late for 20.11 to do
additional changes, so I suggest postponing the field move to 21.11,
once we have a clearer view of the possible optimizations.

Olivier