From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9655743BCD; Fri, 1 Mar 2024 10:45:34 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 55F0E4336D; Fri, 1 Mar 2024 10:45:34 +0100 (CET) Received: from szxga05-in.huawei.com (szxga05-in.huawei.com [45.249.212.191]) by mails.dpdk.org (Postfix) with ESMTP id 9F68A400D5 for ; Fri, 1 Mar 2024 10:45:32 +0100 (CET) Received: from mail.maildlp.com (unknown [172.19.163.44]) by szxga05-in.huawei.com (SkyGuard) with ESMTP id 4TmNVt2BpSz1FKSt; Fri, 1 Mar 2024 17:45:26 +0800 (CST) Received: from dggpeml500024.china.huawei.com (unknown [7.185.36.10]) by mail.maildlp.com (Postfix) with ESMTPS id F1B4714037B; Fri, 1 Mar 2024 17:45:28 +0800 (CST) Received: from [10.67.121.161] (10.67.121.161) by dggpeml500024.china.huawei.com (7.185.36.10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.35; Fri, 1 Mar 2024 17:45:28 +0800 Subject: Re: [EXTERNAL] Re: [EXT] Re: [PATCH v10 4/4] app/dma-perf: add SG copy support To: Gowrishankar Muthukrishnan , Amit Prakash Shukla , Cheng Jiang CC: "dev@dpdk.org" , Jerin Jacob , Anoob Joseph , Kevin Laatz , Bruce Richardson , Pavan Nikhilesh Bhagavatula References: <20240227160031.3931694-1-amitprakashs@marvell.com> <20240227185631.3932799-1-amitprakashs@marvell.com> <8148d7bc-f7a3-ad8c-71bf-c3f9c3c31a01@huawei.com> <3fa6868b-2d6d-9c09-c993-aa7ace1c7d80@huawei.com> From: fengchengwen Message-ID: Date: Fri, 1 Mar 2024 17:45:28 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [10.67.121.161] X-ClientProxiedBy: dggems701-chm.china.huawei.com (10.3.19.178) To dggpeml500024.china.huawei.com (7.185.36.10) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On 2024/3/1 16:06, Gowrishankar Muthukrishnan wrote: > Hi Fengcheng, > > >>>>> -output_result(uint8_t scenario_id, uint32_t lcore_id, char >>>>> *dma_name, >>>> uint16_t ring_size, >>>>> - uint16_t kick_batch, uint64_t ave_cycle, uint32_t >>>> buf_size, uint32_t nr_buf, >>>>> - float memory, float bandwidth, float mops, bool >>>> is_dma) >>>>> +output_result(struct test_configure *cfg, struct lcore_params *para, >>>>> + uint16_t kick_batch, uint64_t ave_cycle, uint32_t >>>> buf_size, >>>>> + uint32_t nr_buf, float memory, float bandwidth, float >>>> mops) >>>>> { >>>>> - if (is_dma) >>>>> - printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: >>>> %u.\n", >>>>> - lcore_id, dma_name, ring_size, kick_batch); >>>>> - else >>>>> + uint16_t ring_size = cfg->ring_size.cur; >>>>> + uint8_t scenario_id = cfg->scenario_id; >>>>> + uint32_t lcore_id = para->lcore_id; >>>>> + char *dma_name = para->dma_name; >>>>> + >>>>> + if (cfg->is_dma) { >>>>> + printf("lcore %u, DMA %s, DMA Ring Size: %u, Kick Batch Size: >>>> %u", lcore_id, >>>>> + dma_name, ring_size, kick_batch); >>>>> + if (cfg->is_sg) >>>>> + printf(" DMA src ptrs: %u, dst ptrs: %u", >>>>> + para->src_ptrs, para->dst_ptrs); >>>> >>>> DMA src sges: %u DMA dst sges: %u >>>> >>>> I think we should add a column which title maybe misc, some like >>>> sg-src[4]- dst[1], and later we may add fill test, then this field >>>> could be pattern- >>>> 0x12345678 >>>> >>>> And in "[PATCH v10 2/4] app/dma-perf: add PCI device support" commit, >>>> if the DMA was worked in non-mem2mem direction, we could add simple >>>> descriptor of direction and pcie.info in the above misc column. >>>> >>> >>> I am sorry, I could not understand complete picture here. Do you mean >>> we reserve a column and use it as per test type. >>> >>> For plain mem copy, nothing added. >>> For SG mem copy, instead of showing "DMA src sges: 1, dst sges: 4", print >> "sg-src[1]-dst[4]". >>> In future, when we add fill test in benchmark, this line instead be "pattern- >> 0x12345678". >>> >>> Is my understanding correct over here ? >> >> Yes, some like this. >> > This patch adds SGE info in an alignment with existing output. > > I think it is better to add further extensions as we add new features. Since the app doesn't support the features that you mentioned, it is difficult to anticipate the requirements. > In fact, if the additional frameworks that we put in are not useful for those features, it could lead to stale code. > I would prefer if we can make these changes as we add new features. > >>> > >>>>> } >>>>> >>>>> while (1) { >>>>> @@ -541,13 +702,53 @@ mem_copy_benchmark(struct test_configure >>>> *cfg, bool is_dma) >>>>> >>>>> rte_eal_mp_wait_lcore(); >>>>> >>>>> - for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) { >>>>> - if (memcmp(rte_pktmbuf_mtod(srcs[i], void *), >>>>> - rte_pktmbuf_mtod(dsts[i], void *), >>>>> - cfg->buf_size.cur) != 0) { >>>>> - printf("Copy validation fails for buffer number %d\n", >>>> i); >>>>> - ret = -1; >>>>> - goto out; >>>>> + if (!cfg->is_sg && cfg->transfer_dir == RTE_DMA_DIR_MEM_TO_MEM) >>>> { >>>>> + for (i = 0; i < (nr_buf / nb_workers) * nb_workers; i++) { >>>>> + if (memcmp(rte_pktmbuf_mtod(srcs[i], void *), >>>>> + rte_pktmbuf_mtod(dsts[i], void *), >>>>> + cfg->buf_size.cur) != 0) { >>>>> + printf("Copy validation fails for buffer number >>>> %d\n", i); >>>>> + ret = -1; >>>>> + goto out; >>>>> + } >>>>> + } >>>>> + } else if (cfg->is_sg && cfg->transfer_dir == >>>> RTE_DMA_DIR_MEM_TO_MEM) { >>>>> + size_t src_remsz = buf_size % cfg->src_ptrs; >>>>> + size_t dst_remsz = buf_size % cfg->dst_ptrs; >>>>> + size_t src_sz = buf_size / cfg->src_ptrs; >>>>> + size_t dst_sz = buf_size / cfg->dst_ptrs; >>>>> + uint8_t src[buf_size], dst[buf_size]; >>>>> + uint8_t *sbuf, *dbuf, *ptr; >>>>> + >>>>> + for (i = 0; i < (nr_buf / RTE_MAX(cfg->src_ptrs, cfg->dst_ptrs)); >>>> i++) { >>>>> + sbuf = src; >>>>> + dbuf = dst; >>>>> + ptr = NULL; >>>>> + >>>>> + for (j = 0; j < cfg->src_ptrs; j++) { >>>>> + ptr = rte_pktmbuf_mtod(srcs[i * cfg->src_ptrs >>>> + j], uint8_t *); >>>>> + memcpy(sbuf, ptr, src_sz); >>>>> + sbuf += src_sz; >>>>> + } >>>>> + >>>>> + if (src_remsz) >>>>> + memcpy(sbuf, ptr + src_sz, src_remsz); >>>>> + >>>>> + for (j = 0; j < cfg->dst_ptrs; j++) { >>>>> + ptr = rte_pktmbuf_mtod(dsts[i * cfg->dst_ptrs >>>> + j], uint8_t *); >>>>> + memcpy(dbuf, ptr, dst_sz); >>>>> + dbuf += dst_sz; >>>>> + } >>>>> + >>>>> + if (dst_remsz) >>>>> + memcpy(dbuf, ptr + dst_sz, dst_remsz); >>>>> + >>>>> + if (memcmp(src, dst, buf_size) != 0) { >>>>> + printf("SG Copy validation fails for buffer >>>> number %d\n", >>>>> + i * cfg->src_ptrs); >>>>> + ret = -1; >>>>> + goto out; >>>>> + } >>>> >>>> Now I doubt the value of verify, this verify can't find the middle >>>> round copy failure, because as long as the last round copy is >>>> successful, the validation will pass. >>>> >>> Validation is on entire buffer. If any middle copy is a failure, >>> entire memcmp would have failed. Or do I miss something ? >>> >>>> And adding validatation in every round copy will impact performance. >>>> >>> This validation is just after worker function is stopped measuring perf. >>> How would this impact performance ? >> >> Yes, it will don't impact performance. >> >> What I said before is that is not valid, pls consider following scene: >> >> >> while (1) { >> for (i = 0; i < nr_buf; i++) { // this for loop will copy all nr_bufs, >> let's defind this is a round copy. >> dma_copy: >> ret = rte_dma_copy(dev_id, 0, >> rte_mbuf_data_iova(srcs[i]), >> rte_mbuf_data_iova(dsts[i]), buf_size, 0); >> if (unlikely(ret < 0)) { >> if (ret == -ENOSPC) { >> do_dma_submit_and_poll(dev_id, >> &async_cnt, worker_info); >> goto dma_copy; >> } else >> error_exit(dev_id); >> } >> async_cnt++; >> >> if ((async_cnt % kick_batch) == 0) >> do_dma_submit_and_poll(dev_id, >> &async_cnt, worker_info); >> } >> >> if (worker_info->stop_flag) // if don't stop, it will do many >> round copies. >> break; >> } >> >> and the later validation just verify the last round, let's assume there are 100 >> round, and if the last round copy work well, but round 0~98 both copy fail, >> then the validation will not detect it. >> >> >> So if we want do all the validation, then we should add the velidation after >> every round copy, but it will impact the performance. >> >> >>> >>>> Also app/test_dmadev already verify data. so I think we should drop >>>> the validation commit. >>> >>> Even in some corner cases or unknown issues, copy would have failed >>> and taking perf cycles then is meaningless. That is the reason, this >>> validation is added after perf function doing its job. >> >> How about: >> >> while (1) { >> for (i = 0; i < nr_buf; i++) { // this for loop will copy all nr_bufs, >> let's defind this is a round copy. >> dma_copy: >> ret = rte_dma_copy(dev_id, 0, >> rte_mbuf_data_iova(srcs[i]), >> rte_mbuf_data_iova(dsts[i]), buf_size, 0); >> if (unlikely(ret < 0)) { >> if (ret == -ENOSPC) { >> do_dma_submit_and_poll(dev_id, >> &async_cnt, worker_info); >> goto dma_copy; >> } else >> error_exit(dev_id); >> } >> async_cnt++; >> >> if ((async_cnt % kick_batch) == 0) >> do_dma_submit_and_poll(dev_id, >> &async_cnt, worker_info); >> } >> >> if (unlikely(work_info->verify)) { >> ret = verify(); >> if (ret != 0) { >> // error trace, >> break; >> } >> } >> >> if (worker_info->stop_flag) // if don't stop, it will do many >> round copies. >> break; >> } >> >> and make this verify as a config entry > > I believe there is a difference in understanding of what this is intended to do. Intention here is not to validate every operation done by DMA, and that is already taken care by UT. > > Is it possible that we are we misreporting numbers if the application is buggy or PMD is misbehaving for the scenario under test and the copies are not actually performed? Yes. Think about a scenario where PMD is buggy when trying bursts of more than 1. > > Checking last set of buffers is more like testing a sample from the perf test to make sure perf test was indeed performing what it is claiming to do. If you think it is unnecessary to do so, we can drop this from upstream. But adding complete verification in performance app would be repeating what a unit test is expected to do. I would suggest not to do that. I think this commit is mainly test the dma-perf itself. OK with continue this commit. Thanks > > Thanks, > Gowrishankar >