From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
	by dpdk.org (Postfix) with ESMTP id 181DF68A7
	for ; Thu, 11 Sep 2014 15:11:22 +0200 (CEST)
Received: from orsmga002.jf.intel.com ([10.7.209.21])
	by orsmga102.jf.intel.com with ESMTP; 11 Sep 2014 06:09:45 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.04,505,1406617200";
	d="scan'208";a="601443242"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
	by orsmga002.jf.intel.com with ESMTP; 11 Sep 2014 06:15:51 -0700
Received: from sivswdev02.ir.intel.com (sivswdev02.ir.intel.com [10.237.217.46])
	by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id s8BDFnBG002155;
	Thu, 11 Sep 2014 14:15:49 +0100
Received: from sivswdev02.ir.intel.com (localhost [127.0.0.1])
	by sivswdev02.ir.intel.com with ESMTP id s8BDFnM0023675;
	Thu, 11 Sep 2014 14:15:49 +0100
Received: (from bricha3@localhost)
	by sivswdev02.ir.intel.com with id s8BDFnAe023671;
	Thu, 11 Sep 2014 14:15:49 +0100
From: Bruce Richardson
To: dev@dpdk.org
Date: Thu, 11 Sep 2014 14:15:44 +0100
Message-Id: <1410441347-22840-11-git-send-email-bruce.richardson@intel.com>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1410441347-22840-1-git-send-email-bruce.richardson@intel.com>
References: <1409759378-10113-1-git-send-email-bruce.richardson@intel.com>
	<1410441347-22840-1-git-send-email-bruce.richardson@intel.com>
Subject: [dpdk-dev] [PATCH v2 10/13] mbuf: split mbuf across two cache lines.
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 11 Sep 2014 13:11:23 -0000

This change splits the mbuf in two to move the pool and next pointers to
the second cache line. This frees up 16 bytes in first cache line.

The reason for this change is that we believe there is no way we can
ever fit all the fields we need into a 64-byte mbuf, and so we need to
start looking at a 128-byte mbuf instead. Examples of new fields that
need to fit in include:
* 32 bits more for filter information, to support the new filters in
  the i40e driver (and possibly other future drivers)
* an additional 2-4 bytes for storing info on a second VLAN tag, to
  allow drivers to support double VLAN/QinQ
* 4 bytes for storing a sequence number, to enable out-of-order packet
  processing and subsequent packet reordering
as well as potentially a number of other fields, or splitting out fields
that are superimposed over each other right now, e.g. for the qos
scheduler. We also want to allow space for use by other non-Intel NIC
drivers that may be open-sourced to dpdk.org in the future, where they
support fields and offloads that currently supported hardware doesn't.

If we accept the fact of a 2-cache-line mbuf, then the issue becomes how
to rework things so that we spread our fields over the two cache lines
while causing the lowest slow-down possible. The general approach that
we are looking to take is to focus the first cache line on fields that
are updated on RX, so that receive only deals with one cache line. The
second cache line can be used for application data and information that
will only be used on the TX leg. This would allow us to work on the
first cache line in RX as now, and have the second cache line being
prefetched in the background so that it is available when necessary.
Hardware prefetches should help us out here. We may also move rarely
used or slow-path RX fields, e.g. those for chained mbufs with jumbo
frames, to the second cache line, depending upon the performance impact
and byte savings achieved.
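
The approach described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: struct sketch_mbuf, its
fields, SKETCH_CACHE_LINE and sketch_rx_fill() are hypothetical names
and do not reflect the real rte_mbuf layout or any DPDK API. The prefetch
uses the GCC/Clang __builtin_prefetch builtin, and a 64-byte cache line
is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical two-cache-line buffer header; NOT the real rte_mbuf. */
struct sketch_mbuf {
	/* first cache line: fields written on RX */
	void *buf_addr;
	uint16_t data_len;
	uint32_t pkt_len;
	uint32_t rss_hash;

	/* second cache line: TX and slow-path fields */
	void *pool __attribute__((aligned(SKETCH_CACHE_LINE)));
	struct sketch_mbuf *next;
} __attribute__((aligned(SKETCH_CACHE_LINE)));

/* RX-side fill: every store lands in the first cache line. */
static inline void
sketch_rx_fill(struct sketch_mbuf *m, void *buf, uint16_t len, uint32_t hash)
{
	/* hint that the second line is wanted later; no store is made to it */
	__builtin_prefetch(&m->pool, 0, 3);

	m->buf_addr = buf;
	m->data_len = len;
	m->pkt_len = len;
	m->rss_hash = hash;
}
```

In this sketch the receive path never writes past the first 64 bytes, so
the second line only has to be resident by the time the TX or slow-path
code reads pool or next.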

Updated in V2:
* Expanded commit description to include contents of a previous mail
  describing some of the logic behind expanding mbuf to two cache lines
* Update kni mbuf structure

Signed-off-by: Bruce Richardson
---
 app/test/test_mbuf.c                                          | 2 +-
 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h | 6 +++---
 lib/librte_mbuf/rte_mbuf.h                                    | 3 ++-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index 1b25481..66bcbc5 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -782,7 +782,7 @@ test_failing_mbuf_sanity_check(void)
 static int
 test_mbuf(void)
 {
-	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != 64);
+	RTE_BUILD_BUG_ON(sizeof(struct rte_mbuf) != CACHE_LINE_SIZE * 2);
 
 	/* create pktmbuf pool if it does not exist */
 	if (pktmbuf_pool == NULL) {
diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
index ab022bd..25ed672 100644
--- a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
+++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kni_common.h
@@ -108,7 +108,7 @@ struct rte_kni_fifo {
  * Padding is necessary to assure the offsets of these fields
  */
 struct rte_kni_mbuf {
-	void *buf_addr;
+	void *buf_addr __attribute__((__aligned__(64)));
 	char pad0[10];
 	uint16_t data_off;      /**< Start address of data in segment buffer. */
 	char pad1[4];
@@ -117,9 +117,9 @@ struct rte_kni_mbuf {
 	uint16_t data_len;      /**< Amount of data in segment buffer. */
 	uint32_t pkt_len;       /**< Total pkt len: sum of all segment data_len. */
 	char pad3[8];
-	void *pool;
+	void *pool __attribute__((__aligned__(64)));
 	void *next;
-} __attribute__((__aligned__(64)));
+};
 
 /*
  * Struct used to create a KNI device. Passed to the kernel in IOCTL call
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 34900d4..508021b 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -176,7 +176,8 @@ struct rte_mbuf {
 		uint32_t sched;     /**< Hierarchical scheduler */
 	} hash;                 /**< hash information */
 
-	/* fields only used in slow path or on TX */
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_aligned;
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
 
-- 
1.9.3
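
For readers unfamiliar with the marker idiom used in the rte_mbuf.h hunk,
the following is a minimal standalone sketch of how a zero-size, cache
aligned member can push the fields after it onto the next cache line,
together with compile-time checks in the spirit of the RTE_BUILD_BUG_ON
test. All names here (sketch_marker, struct sketch_hdr, SKETCH_LINE,
sketch_second_line_offset) are hypothetical, a 64-byte cache line is
assumed, and the zero-length array relies on a GNU C extension; this is
not the DPDK definition itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LINE 64 /* assumed cache line size */

/* Zero-size marker type (GNU C zero-length array); hypothetical name. */
typedef void *sketch_marker[0];

/* Hypothetical header; NOT the real rte_mbuf field set. */
struct sketch_hdr {
	/* first cache line */
	void *buf_addr;
	uint32_t pkt_len;

	/* the aligned marker occupies no bytes but forces a new cache line */
	sketch_marker cacheline1 __attribute__((aligned(SKETCH_LINE)));
	void *pool;
	struct sketch_hdr *next;
} __attribute__((aligned(SKETCH_LINE)));

/* Compile-time layout checks (C11 _Static_assert). */
_Static_assert(offsetof(struct sketch_hdr, cacheline1) == SKETCH_LINE,
	       "marker must start the second cache line");
_Static_assert(offsetof(struct sketch_hdr, pool) == SKETCH_LINE,
	       "pool must be the first field on the second cache line");
_Static_assert(sizeof(struct sketch_hdr) == 2 * SKETCH_LINE,
	       "header must span exactly two cache lines");

/* Byte offset at which the second cache line begins. */
static inline size_t
sketch_second_line_offset(void)
{
	return offsetof(struct sketch_hdr, cacheline1);
}
```

A marker keeps the alignment attribute off the real fields, so fields can
later be added or reordered after it without touching the attribute.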