Date: Mon, 30 Jan 2023 09:20:18 +0000
From: Bruce Richardson
To: "Jiang, Cheng1"
CC: thomas@monjalon.net, mb@smartsharesystems.com, dev@dpdk.org,
	"Hu, Jiayu", "Ding, Xuan", "Ma, WenwuX", "Wang, YuanX", "He, Xingguang"
Subject: Re: [PATCH v3] app/dma-perf: introduce dma-perf application
References: <20221220010619.31829-1-cheng1.jiang@intel.com>
	<20230117120526.39375-1-cheng1.jiang@intel.com>

On Sat, Jan 28, 2023 at 01:32:05PM +0000, Jiang, Cheng1 wrote:
> Hi Bruce,
>
> Sorry for the late reply. We were on the Spring Festival holiday last week.
> Thanks for your comments.
> Replies are inline.
>
> Thanks,
> Cheng
>
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Wednesday, January 18, 2023 12:52 AM
> > To: Jiang, Cheng1
> > Cc: thomas@monjalon.net; mb@smartsharesystems.com; dev@dpdk.org;
> > Hu, Jiayu; Ding, Xuan; Ma, WenwuX; Wang, YuanX; He, Xingguang
> > Subject: Re: [PATCH v3] app/dma-perf: introduce dma-perf application
> >
> > On Tue, Jan 17, 2023 at 12:05:26PM +0000, Cheng Jiang wrote:
> > > There are many high-performance DMA devices supported in DPDK now,
> > > and these DMA devices can also be integrated into other modules of
> > > DPDK as accelerators, such as Vhost. Before integrating DMA into
> > > applications, developers need to know the performance of these DMA
> > > devices in various scenarios, and the performance of the CPU in the
> > > same scenarios, for example with different buffer lengths. Only then
> > > can we know the target performance of an application accelerated by
> > > using them. This patch introduces a high-performance testing tool
> > > which supports comparing the performance of CPU and DMA in different
> > > scenarios automatically, driven by a pre-set config file. Memory copy
> > > performance tests are supported for now.
> > >
> > > Signed-off-by: Cheng Jiang
> > > Signed-off-by: Jiayu Hu
> > > Signed-off-by: Yuan Wang
> > > Acked-by: Morten Brørup
> > > ---
> >
> > More input based on trying out the application, including some thoughts
> > on the testing methodology, below.
> >
> > > +static void
> > > +output_result(uint8_t scenario_id, uint32_t lcore_id, uint16_t dev_id, uint64_t ave_cycle,
> > > +			uint32_t buf_size, uint32_t nr_buf, uint32_t memory,
> > > +			float bandwidth, uint64_t ops, bool is_dma)
> > > +{
> > > +	if (is_dma)
> > > +		printf("lcore %u, DMA %u:\n"
> > > +			"average cycles: %" PRIu64 ","
> > > +			" buffer size: %u, nr_buf: %u,"
> > > +			" memory: %uMB, frequency: %" PRIu64 ".\n",
> > > +			lcore_id,
> > > +			dev_id,
> > > +			ave_cycle,
> > > +			buf_size,
> > > +			nr_buf,
> > > +			memory,
> > > +			rte_get_timer_hz());
> > > +	else
> > > +		printf("lcore %u\n"
> > > +			"average cycles: %" PRIu64 ","
> > > +			" buffer size: %u, nr_buf: %u,"
> > > +			" memory: %uMB, frequency: %" PRIu64 ".\n",
> > > +			lcore_id,
> > > +			ave_cycle,
> > > +			buf_size,
> > > +			nr_buf,
> > > +			memory,
> > > +			rte_get_timer_hz());
> > > +
> >
> > The term "average cycles" is unclear here - is it average cycles per
> > test iteration, or average cycles per buffer copy?
>
> The average cycles stands for average cycles per buffer copy; I'll clarify it in the next version.
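
Thanks for confirming. To be sure we mean the same thing by "per buffer
copy", I read the reported numbers as being derived more or less as below.
This is only my illustration - the function and variable names are mine,
not from the patch:

	/* Sketch only, not the patch code; assumes <rte_cycles.h>,
	 * <inttypes.h> and <stdio.h> are included. */
	static void
	report_copy_rate(uint64_t start_cycles, uint64_t end_cycles,
			uint32_t nr_buf, uint32_t buf_size)
	{
		uint64_t total_cycles = end_cycles - start_cycles;
		uint64_t ave_cycle = total_cycles / nr_buf;   /* cycles per buffer copy */
		double secs = (double)total_cycles / rte_get_timer_hz();
		double gbps = (double)nr_buf * buf_size * 8 / secs / 1000000000.0;
		uint64_t ops = (uint64_t)((double)nr_buf / secs);

		printf("average cycles/copy: %" PRIu64 ", bandwidth: %.3lf Gbps, OPS: %" PRIu64 "\n",
				ave_cycle, gbps, ops);
	}

If that is roughly the calculation, then saying "average cycles per copied
buffer" explicitly in the output would remove the ambiguity.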
> >
> > > +	printf("Average bandwidth: %.3lfGbps, OPS: %" PRIu64 "\n",
> > > +		bandwidth, ops);
> > > +
> >
> > > +
> > > +static inline void
> > > +do_dma_mem_copy(uint16_t dev_id, uint32_t nr_buf, uint16_t kick_batch, uint32_t buf_size,
> > > +			uint16_t mpool_iter_step, struct rte_mbuf **srcs, struct rte_mbuf **dsts)
> > > +{
> > > +	int64_t async_cnt = 0;
> > > +	int nr_cpl = 0;
> > > +	uint32_t index;
> > > +	uint16_t offset;
> > > +	uint32_t i;
> > > +
> > > +	for (offset = 0; offset < mpool_iter_step; offset++) {
> > > +		for (i = 0; index = i * mpool_iter_step + offset, index < nr_buf; i++) {
> > > +			if (unlikely(rte_dma_copy(dev_id,
> > > +						0,
> > > +						srcs[index]->buf_iova + srcs[index]->data_off,
> > > +						dsts[index]->buf_iova + dsts[index]->data_off,
> > > +						buf_size,
> > > +						0) < 0)) {
> > > +				rte_dma_submit(dev_id, 0);
> > > +				while (rte_dma_burst_capacity(dev_id, 0) == 0) {
> > > +					nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB,
> > > +								NULL, NULL);
> > > +					async_cnt -= nr_cpl;
> > > +				}
> > > +				if (rte_dma_copy(dev_id,
> > > +						0,
> > > +						srcs[index]->buf_iova + srcs[index]->data_off,
> > > +						dsts[index]->buf_iova + dsts[index]->data_off,
> > > +						buf_size,
> > > +						0) < 0) {
> > > +					printf("enqueue fail again at %u\n", index);
> > > +					printf("space:%d\n", rte_dma_burst_capacity(dev_id, 0));
> > > +					rte_exit(EXIT_FAILURE, "DMA enqueue failed\n");
> > > +				}
> > > +			}
> > > +			async_cnt++;
> > > +
> > > +			/**
> > > +			 * When '&' is used to wrap an index, mask must be a power of 2.
> > > +			 * That is, kick_batch must be 2^n.
> > > +			 */
> > > +			if (unlikely((async_cnt % kick_batch) == 0)) {
> > > +				rte_dma_submit(dev_id, 0);
> > > +				/* add a poll to avoid ring full */
> > > +				nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > > +				async_cnt -= nr_cpl;
> > > +			}
> > > +		}
> > > +
> > > +		rte_dma_submit(dev_id, 0);
> > > +		while (async_cnt > 0) {
> > > +			nr_cpl = rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
> > > +			async_cnt -= nr_cpl;
> > > +		}
> >
> > I have a couple of concerns about the methodology for testing the HW DMA
> > performance. For example, the inclusion of that final block means that we
> > are including the latency of the copy operation in the result.
> >
> > If the objective of the test application is to determine whether it is
> > cheaper for software to offload a copy operation to HW or to do it in SW,
> > then the primary concern is the HW offload cost. That offload cost should
> > remain constant irrespective of the size of the copy - since all you are
> > doing is writing a descriptor and reading a completion result. However,
> > looking at the results of running the app, I notice that the reported
> > average cycles increase as the packet size increases, which tends to
> > indicate that we are not giving a realistic measurement of offload cost.
>
> We are trying to compare the time required to complete a certain amount of
> work using DMA with the time required to complete it using the CPU. I
> think that, in addition to the offload cost, the capability of the DMA
> engine itself is also an important factor to be considered. The offload
> cost should be constant, but when DMA copies memory of different lengths,
> the time costs are different, so the reported average cycles increase as
> the packet size increases. Therefore, this test result includes both the
> offload cost and the DMA operation cost; to that extent, it should be a
> relatively realistic measurement.
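
Just to be explicit about what I mean by the two costs being mixed: the
region that ends up in the timed average is, in effect, the below. This is
my own simplification of the loop above, not the actual patch code - single
vchan 0, error handling dropped, and MAX_DMA_CPL_NB plus the iova arrays
assumed to be set up as in the patch:

	static uint64_t
	time_copies(int16_t dev_id, rte_iova_t *src_iova, rte_iova_t *dst_iova,
			uint32_t nr_buf, uint32_t buf_size, uint16_t kick_batch)
	{
		uint64_t start = rte_get_timer_cycles();
		uint32_t i, completed = 0;

		for (i = 0; i < nr_buf; i++) {
			/* descriptor write - this part is pure offload cost */
			while (rte_dma_copy(dev_id, 0, src_iova[i], dst_iova[i], buf_size, 0) < 0) {
				rte_dma_submit(dev_id, 0);
				completed += rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);
			}
			if ((i % kick_batch) == 0)
				rte_dma_submit(dev_id, 0);
		}
		rte_dma_submit(dev_id, 0);
		/* tail drain - waiting here pulls the HW copy latency into the average */
		while (completed < nr_buf)
			completed += rte_dma_completed(dev_id, 0, MAX_DMA_CPL_NB, NULL, NULL);

		return (rte_get_timer_cycles() - start) / nr_buf;
	}

The descriptor writes and completion polls are the offload cost; the tail
drain is where the copy latency gets folded in, and that is the part that
grows with the buffer size.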
>
> Do you think it makes sense to you?
>

Hi,

Yes, I get your point about the job latency being different when the
packet/copy sizes increase, but, on the other hand, as I state above, the
actual cycle cost to the application should not increase. If an application
is doing what this test app is doing - just sitting around waiting for job
completion in the fast path - then it is likely that the programmer should
look at improving how the offload is integrated into the app.

The main issue here is that by outputting a single number you are mixing
two separate values - offload cost and job latency. If you want to show the
effects of larger/smaller packets on both, then you should output the two
values separately. For most applications, where copies are offloaded and
other work is done while the copy completes, the offload cost is the
primary concern. For some applications the latency figure may also be
important, but in those cases the user will want to see the latency called
out explicitly, not mixed up with the offload cost in a single figure.

> >
> > The trouble then becomes how to do so in a more realistic manner. The
> > most accurate way I can think of in a unit test like this is to offload
> > entries to the device and measure the cycles taken there. Then wait
> > until such time as all copies are completed (to eliminate the latency
> > time, which in a real-world case would be spent by a core doing
> > something else), and then do a second measurement of the time taken to
> > process all the completions. In the same way as for a SW copy, any time
> > not spent in memcpy is not copy time; for HW copies, any time spent not
> > writing descriptors or reading completions is not part of the offload
> > cost.
>
> Agreed, we are thinking about adding the offload cost as one of the test
> results in the future.
>
> >
> > That said, doing the above is still not fully realistic, as a real-world
> > app will likely still have some amount of other overhead, for example,
> > polling occasionally for completions in between doing other work (though
> > one would expect this to be relatively cheap). Similarly, if the
> > submission queue fills, the app may have to delay waiting for space to
> > submit jobs, and therefore see some of the HW copy latency.
> >
> > Therefore, I think the most realistic way to measure this is to look at
> > the rate of operations while processing is being done in the middle of
> > the test. For example, if we have a simple packet processing
> > application, running the application just doing RX and TX and measuring
> > the rate allows us to determine the basic packet I/O cost. Adding in an
> > offload to HW for each packet and again measuring the rate will then
> > allow us to compute the true offload copy cost of the operation, and
> > should give us a number that remains flat even as packet size increases.
> > For previous work done on vhost with DMA acceleration, I believe we saw
> > exactly that - while SW PPS reduced as packet size increased, with HW
> > copies the PPS remained constant even as packet size increased.
> >
> > The challenge, to my mind, is therefore how to implement this in a
> > suitable unit-test style way, to fit into the framework you have given
> > here. I would suggest that the actual performance measurement needs to
> > be done not on a total-time basis but on a fixed-time basis within each
> > test. For example, when doing HW copies, 1ms into each test run we would
> > snapshot the completed entries, and then, say, 1ms later, measure the
> > number that have been completed since.
> > In this way, we avoid the initial startup latency while we wait for jobs
> > to start completing, and we avoid the final latency as we await the last
> > job to complete. We would also include time for some potentially empty
> > polls, and if a queue size is too small, see that reflected in the
> > performance too.
>
> I understand your concerns, but I think maybe we are not discussing the
> same performance number here. We are trying to test the maximum bandwidth
> of the DMA engine, while what you describe, if I understand it correctly,
> is how to measure the offload cost more accurately. I think both of these
> performance figures are important. Maybe we can add your test methodology
> as another performance aspect for DMA in the future; I need to reconsider
> it and will get back to you later.
>

Max bandwidth of the HW is a third number, separate from both offload cost
and latency. Again, it should be measured and reported separately if you
want the app to provide it.

Regards,
/Bruce
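
PS: to make the fixed-window idea a little more concrete, the rough shape
of what I have in mind is below. It is untested and purely illustrative:
the worker lcore keeps running the enqueue/completion loop from the patch,
and "worker_completed_count" is only a placeholder for a counter that the
worker would have to update (atomically) with each batch of completions;
the main lcore then just samples it over a fixed window:

	/* run on the main lcore while the worker keeps copying */
	uint64_t hz = rte_get_timer_hz();
	uint64_t t0, t1, c0, c1;

	rte_delay_us_block(1000);		/* skip the start-up phase */
	t0 = rte_get_timer_cycles();
	c0 = worker_completed_count;		/* snapshot #1 */
	rte_delay_us_block(1000);		/* fixed sampling window */
	t1 = rte_get_timer_cycles();
	c1 = worker_completed_count;		/* snapshot #2 */

	double ops_per_sec = (double)(c1 - c0) * hz / (t1 - t0);
	double gbps = ops_per_sec * buf_size * 8 / 1000000000.0;

That way the startup and tail latencies fall outside the measurement, while
empty polls and any queue-full stalls during the window are still reflected
in the reported rate.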