From: Thomas Monjalon
To: dev@dpdk.org, techboard@dpdk.org
Cc: Ajit Khaparde, "Ananyev, Konstantin", Andrew Rybchenko, dev@dpdk.org,
 "Yigit, Ferruh", david.marchand@redhat.com, "Richardson, Bruce",
 olivier.matz@6wind.com, jerinj@marvell.com, viacheslavo@nvidia.com,
 honnappa.nagarahalli@arm.com, maxime.coquelin@redhat.com,
 stephen@networkplumber.org, hemant.agrawal@nxp.com, Matan Azrad,
 Shahaf Shuler, Morten Brørup
Date: Mon, 02 Nov 2020 16:58:08 +0100
Message-ID: <13044489.RHGIMAnax8@thomas>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C613CD@smartserver.smartshare.dk>
References: <20201029092751.3837177-1-thomas@monjalon.net>
 <3086227.yllCKDRCEA@thomas>
 <98CBD80474FA8B44BF855DF32C47DC35C613CD@smartserver.smartshare.dk>
Subject: Re: [dpdk-dev] [PATCH 15/15] mbuf: move pool pointer in hotter first half

+Cc techboard

We need benchmark numbers in order to take a decision.
Please all, prepare some arguments and numbers so we can discuss
the mbuf layout in the next techboard meeting.

01/11/2020 21:59, Morten Brørup:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > Sent: Sunday, November 1, 2020 5:38 PM
> >
> > 01/11/2020 10:12, Morten Brørup:
> > > One thing has always puzzled me:
> > > Why do we use 64 bits to indicate which memory pool
> > > an mbuf belongs to?
> > > The portid only uses 16 bits and an indirection index.
> > > Why don't we use the same kind of indirection index for mbuf pools?
> >
> > I wonder what would be the cost of indirection. Probably negligible.
>
> Probably. The portid does it, and that indirection is heavily used
> everywhere.
>
> The size of the mbuf memory pool indirection array should be
> compile-time configurable, like the size of the portid indirection
> array.
>
> And for reference, the indirection array will fit into one cache line
> if we default to 8 mbuf pools, thus supporting an 8 CPU socket system
> with one mbuf pool per CPU socket, or a 4 CPU socket system with two
> mbuf pools per CPU socket.
>
> (And as a side note: Our application is optimized for single-socket
> systems, and we only use one mbuf pool. I guess many applications were
> developed without carefully optimizing for multi-socket systems, and
> also just use one mbuf pool. In these cases, the mbuf structure doesn't
> really need a pool field. But it is still there, and the DPDK libraries
> use it, so we didn't bother removing it.)
>
> > I think it is a good proposal...
> > ... for next year, after a deprecation notice.
> >
> > > I can easily imagine using one mbuf pool (or perhaps a few pools)
> > > per CPU socket (or per physical memory bus closest to an attached NIC),
> > > but not more than 256 mbuf memory pools in total.
> > > So, let's introduce an mbufpoolid like the portid,
> > > and cut this mbuf field down from 64 to 8 bits.

We will need to measure the performance of the solution.
There is a chance that the cost is too high.

> > > If we also cut down m->pkt_len from 32 to 24 bits,
> >
> > Who is using packets larger than 64k? Are 16 bits enough?
>
> I personally consider 64k a reasonable packet size limit. Exotic
> applications with even larger packets would have to live with this
> constraint. But let's see if there are any objections. For reference,
> 64k corresponds to ca. 44 Ethernet (1500 byte) packets.
>
> (The limit could be 65535 bytes, to avoid translation of the value 0
> into 65536 bytes.)
>
> This modification would go nicely hand in hand with the mbuf pool
> indirection modification.
>
> ... after yet another round of ABI stability discussions, deprecation
> notices, and so on. :-)

After more thought, I'm afraid 64k is too small in some cases.
And 24-bit manipulation would probably hurt performance.
I'm afraid we are stuck with a 32-bit length.
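
Coming back to the pool indirection, and to make the discussion more
concrete, below is a very rough sketch of the idea. The macro, array
and helper names are invented for illustration only; nothing like this
exists in DPDK today, and a real design would go through the
deprecation process discussed above.

/*
 * Sketch only: an 8-bit pool index resolved through a small global
 * array, analogous to how a 16-bit portid indexes rte_eth_devices[].
 * RTE_MAX_MBUF_POOLS, rte_mbuf_pools[] and mbuf_pool_from_index()
 * are hypothetical names.
 */
#include <stdint.h>
#include <rte_mempool.h>

/* Compile-time configurable, like RTE_MAX_ETHPORTS for port ids.
 * With 8 entries, the pointer array fits in one 64-byte cache line. */
#define RTE_MAX_MBUF_POOLS 8

/* Global indirection array, filled when the mempools are created. */
extern struct rte_mempool *rte_mbuf_pools[RTE_MAX_MBUF_POOLS];

/* An 8-bit index in the mbuf would replace the 64-bit pool pointer
 * and be resolved only when the pool is actually needed (e.g. free). */
static inline struct rte_mempool *
mbuf_pool_from_index(uint8_t pool_idx)
{
	return rte_mbuf_pools[pool_idx];
}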
> > > we can get the 8 bit mbuf pool index into the first cache line
> > > at no additional cost.
> >
> > I like the idea.
> > It means we don't need to move the pool pointer now,
> > i.e. it does not have to replace the timestamp field.
>
> Agreed! Don't move m->pool to the first cache line; it is not used for RX.
>
> >
> > > In other words: This would free up another 64 bit field in the mbuf
> > > structure!
> >
> > That would be great!
> >
> >
> > > And even though the m->next pointer for scattered packets resides
> > > in the second cache line, the libraries and applications know
> > > that m->next is NULL when m->nb_segs is 1.
> > > This shows that my suggestion would make touching
> > > the second cache line unnecessary (in simple cases),
> > > even for re-initializing the mbuf.
> >
> > So you think the "next" pointer should stay in the second half of the mbuf?
> >
> > I feel you would like to move the Tx offloads into the first half
> > to improve performance of very simple apps.
>
> "Very simple apps" sounds like a minority of apps. I would rather say
> "very simple packet handling scenarios", e.g. forwarding of normal size
> non-segmented packets. I would guess that the vast majority of packets
> handled by DPDK applications actually match this scenario. So I'm
> proposing to optimize for what I think is the most common scenario.
>
> If segmented packets are common, then m->next could be moved to the
> first cache line. But it will only improve the pure RX steps of the
> pipeline. When preparing the packet for TX, m->tx_offload will need to
> be set, and the second cache line comes into play. So I'm wondering how
> big the benefit of having m->next in the first cache line really is -
> assuming that m->nb_segs will be checked before accessing m->next.
>
> > I am thinking the opposite: we could have some dynamic field space
> > in the first half to improve performance of complex Rx.
> > Note: we can add a flag hint for field registration in this first half.
> >
>
> I have had the same thoughts. However, I would prefer being able to
> forward ordinary packets without using the second mbuf cache line at
> all (although only in specific scenarios like my example above).
>
> Furthermore, the application can abuse the 64 bit m->tx_offload field
> for private purposes until it is time to prepare the packet for TX and
> pass it on to the driver. This hack somewhat resembles a dynamic field
> in the first cache line, and will not be possible if the m->pool or
> m->next field is moved there.
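
For reference, the access pattern we keep coming back to looks roughly
like the sketch below. It only uses existing mbuf fields; the helper
itself is invented (and redundant with m->pkt_len), it is only meant to
show which cache lines get touched.

#include <stdint.h>
#include <rte_branch_prediction.h>
#include <rte_mbuf.h>

/* Sum data_len over all segments of a packet (illustration only). */
static inline uint32_t
chain_bytes(const struct rte_mbuf *m)
{
	uint32_t bytes;

	/* Fast path: nb_segs and data_len live in the first cache line,
	 * so m->next (second cache line) is never read. */
	if (likely(m->nb_segs == 1))
		return m->data_len;

	/* Slow path: walking the chain reads m->next from the second
	 * cache line for every segment. */
	for (bytes = 0; m != NULL; m = m->next)
		bytes += m->data_len;

	return bytes;
}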