From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id C425CA0C48; Thu, 8 Jul 2021 20:36:10 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 40AAD4014F; Thu, 8 Jul 2021 20:36:10 +0200 (CEST) Received: from mail-il1-f170.google.com (mail-il1-f170.google.com [209.85.166.170]) by mails.dpdk.org (Postfix) with ESMTP id 7BFB14014E for ; Thu, 8 Jul 2021 20:36:08 +0200 (CEST) Received: by mail-il1-f170.google.com with SMTP id i13so7587428ilu.4 for ; Thu, 08 Jul 2021 11:36:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ilL5Mz6fVodGmu4dawlUnCiCec3peoH9fXyFYYpBq2M=; b=dbbO0VdiRM7BdEkwdpa1/JFcxwDbTVlZpBkX9foQ/5ZueJGogGzpGkf3X5Z2rOK/d9 6aP17PxQcFmyD9DgivvoZAg7zLWdXtJxYRbqBGqldRMg9hskQO/DfpVfw1Ms/R1nPhIu WagBFzTaw0D4oU8KhwafH8mpxgp+UO2D5NEnuatznmAnkYOoeuBmXC+2RlDbURTerfMQ rdrciAlYNIqwiPlBev/46k22nl5KmEdYNlo4DeAL/knydCDQa2lKfZTpnKVTW9P3l1d+ n6XZ/HPWYLxGPfH4v1CRC+/0a/U286+cOlQSGFQxnj2m/bI8fSj4rmUghhQ28DJsD/Vt e4yA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ilL5Mz6fVodGmu4dawlUnCiCec3peoH9fXyFYYpBq2M=; b=O19WgSF4QfoVKBln9JjB77XZWIDTo6nSPMmbEOMWYg6dorsrmCman/7qzlBOoBo1nr ybBfgyVdrn8pVp+4+6jFLHAj/1pEw+2IM9QfioKq/zIbNgkpSXzkaNYKsc5aUp3WG+/r JZNb+L7tsqUpaA4D+RtmIXCK9HV1iWRe0FnbYh69SAzc09o2pdFoo7YclnvS4C3/6bdC Yvuve8YwDBnoO1AeHwdyVx2G66hFdSoP2GWZWNqOnwEDzh6BrSBs2JboqPmJnuOBYnOx OQLj2nU5m5lilFzXnoUlYInCdRdWX4g/LPLutIlSeXHlCgxejhZV7cGUwg3OZm/OxyCc l/LA== X-Gm-Message-State: AOAM531EB/NY2CWtI0ffNLhNZgwqqpABeVIezPQlMXBjyRL/ld7pBrx/ jfkqeyEnLtYaHqiFAgr/FfZWqtfv46fAlHM+Wg0= X-Google-Smtp-Source: ABdhPJxjJW6ksklQaUOFuevrlyG626HkC3n6TDVVZU+FIeBhyVa2PpAiSFx8FRGictT483qPpLh/iFZ8kYsGPEHC7f4= X-Received: by 2002:a92:b00b:: with SMTP id x11mr23500826ilh.130.1625769367704; Thu, 08 Jul 2021 11:36:07 -0700 (PDT) MIME-Version: 1.0 References: <1625231891-2963-1-git-send-email-fengchengwen@huawei.com> <9b063d9f-5b52-8e1b-e12a-f24f4ea3b122@huawei.com> In-Reply-To: <9b063d9f-5b52-8e1b-e12a-f24f4ea3b122@huawei.com> From: Jerin Jacob Date: Fri, 9 Jul 2021 00:05:40 +0530 Message-ID: To: fengchengwen Cc: Bruce Richardson , Thomas Monjalon , Ferruh Yigit , Jerin Jacob , dpdk-dev , =?UTF-8?Q?Morten_Br=C3=B8rup?= , Nipun Gupta , Hemant Agrawal , Maxime Coquelin , Honnappa Nagarahalli , David Marchand , Satananda Burla , Prasun Kapoor , "Ananyev, Konstantin" , liangma@liangbit.com, Radha Mohan Chintakuntla Content-Type: text/plain; charset="UTF-8" Subject: Re: [dpdk-dev] [PATCH] dmadev: introduce DMA device library X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On Thu, Jul 8, 2021 at 8:41 AM fengchengwen wrote: > > >>> > >>> It's just more conditionals and branches all through the code. Inside the > >>> user application, the user has to check whether to set the flag or not (or > >>> special-case the last transaction outside the loop), and within the driver, > >>> there has to be a branch whether or not to call the doorbell function. The > >>> code on both sides is far simpler and more readable if the doorbell > >>> function is exactly that - a separate function. > >> > >> I disagree. The reason is: > >> > >> We will have two classes of applications > >> > >> a) do dma copy request as and when it has data(I think, this is the > >> prime use case), for those, > >> I think, it is considerable overhead to have two function invocation > >> per transfer i.e > >> rte_dma_copy() and rte_dma_perform() > >> > >> b) do dma copy when the data is reached to a logical state, like copy > >> IP frame from Ethernet packets or so, > >> In that case, the application will have a LOGIC to detect when to > >> perform it so on the end of > >> that rte_dma_copy() flag can be updated to fire the doorbell. > >> > >> IMO, We are comparing against a branch(flag is already in register) vs > >> a set of instructions for > >> 1) function pointer overhead > >> 2) Need to use the channel context again back in another function. > >> > >> IMO, a single branch is most optimal from performance PoV. > >> > > Ok, let's try it and see how it goes. > > Test result show: > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell in flags > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function > 3) Both platform could benefit with doorbell in flags if burst < 5 > > There is a performance gain in small bursts (<5). Given the extensive use of bursts in DPDK applications and users are accustomed to the concept, I do not recommend > using the 'doorbell' in flags. There is NO concept change between one option vs other option. Just argument differnet. Also, _perform() scheme not used anywhere in DPDK. I Regarding performance, I have added dummy instructions to simulate the real work load[1], now burst also has some gain in both x86 and arm64[3] I have modified your application[2] to dpdk test application to use cpu isolation etc. So this is gain in flag scheme ad code is checked in to Github[2[ [1] static inline void delay(void) { volatile int k; for (k = 0; k < 16; k++) { } } __rte_noinline int hisi_dma_enqueue(struct dmadev *dev, int vchan, void *src, void *dst, int len, const int flags) { delay(); *ring = 1; return 0; } __rte_noinline int hisi_dma_enqueue_doorbell(struct dmadev *dev, int vchan, void *src, void *dst, int len, const int flags) { delay(); *ring = 1; if (unlikely(flags == 1)) { rte_wmb(); *doorbell = 1; } return 0; } [2] https://github.com/jerinjacobk/dpdk-dmatest/commit/4fc9bc3029543bbc4caaa5183d98bac93c34f588 Update results [3] Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz echo "dma_perf_autotest" | ./build/app/test/dpdk-test --no-huge -c 0xf00 core=24 Timer running at 2600.00MHz test_for_perform_after_multiple_enqueue: burst=1 cycles=46.000000 test_for_last_enqueue_issue_doorbell: burst=1 cycles=45.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=2 cycles=90.000000 test_for_last_enqueue_issue_doorbell: burst=2 cycles=89.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=3 cycles=134.000000 test_for_last_enqueue_issue_doorbell: burst=3 cycles=133.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=4 cycles=177.000000 test_for_last_enqueue_issue_doorbell: burst=4 cycles=176.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=5 cycles=221.000000 test_for_last_enqueue_issue_doorbell: burst=5 cycles=221.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=6 cycles=265.000000 test_for_last_enqueue_issue_doorbell: burst=6 cycles=265.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=7 cycles=333.000000 test_for_last_enqueue_issue_doorbell: burst=7 cycles=309.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=8 cycles=375.000000 test_for_last_enqueue_issue_doorbell: burst=8 cycles=373.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=9 cycles=418.000000 test_for_last_enqueue_issue_doorbell: burst=9 cycles=414.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=10 cycles=462.000000 test_for_last_enqueue_issue_doorbell: burst=10 cycles=458.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=11 cycles=507.000000 test_for_last_enqueue_issue_doorbell: burst=11 cycles=501.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=12 cycles=552.000000 test_for_last_enqueue_issue_doorbell: burst=12 cycles=546.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=13 cycles=593.000000 test_for_last_enqueue_issue_doorbell: burst=13 cycles=590.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=14 cycles=638.000000 test_for_last_enqueue_issue_doorbell: burst=14 cycles=634.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=15 cycles=681.000000 test_for_last_enqueue_issue_doorbell: burst=15 cycles=678.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=16 cycles=725.000000 test_for_last_enqueue_issue_doorbell: burst=16 cycles=722.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=17 cycles=770.000000 test_for_last_enqueue_issue_doorbell: burst=17 cycles=767.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=18 cycles=815.000000 test_for_last_enqueue_issue_doorbell: burst=18 cycles=812.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=19 cycles=857.000000 test_for_last_enqueue_issue_doorbell: burst=19 cycles=854.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=20 cycles=902.000000 test_for_last_enqueue_issue_doorbell: burst=20 cycles=899.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=21 cycles=945.000000 test_for_last_enqueue_issue_doorbell: burst=21 cycles=943.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=22 cycles=990.000000 test_for_last_enqueue_issue_doorbell: burst=22 cycles=988.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=23 cycles=1033.000000 test_for_last_enqueue_issue_doorbell: burst=23 cycles=1031.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=24 cycles=1077.000000 test_for_last_enqueue_issue_doorbell: burst=24 cycles=1075.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=25 cycles=1121.000000 test_for_last_enqueue_issue_doorbell: burst=25 cycles=1119.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=26 cycles=1166.000000 test_for_last_enqueue_issue_doorbell: burst=26 cycles=1163.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=27 cycles=1208.000000 test_for_last_enqueue_issue_doorbell: burst=27 cycles=1208.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=28 cycles=1252.000000 test_for_last_enqueue_issue_doorbell: burst=28 cycles=1252.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=29 cycles=1295.000000 test_for_last_enqueue_issue_doorbell: burst=29 cycles=1295.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=30 cycles=1342.000000 test_for_last_enqueue_issue_doorbell: burst=30 cycles=1340.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=31 cycles=1386.000000 test_for_last_enqueue_issue_doorbell: burst=31 cycles=1384.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=32 cycles=1429.000000 test_for_last_enqueue_issue_doorbell: burst=32 cycles=1428.000000 ------------------------------------------------------------------------------- octeontx2: See https://doc.dpdk.org/guides/prog_guide/profile_app.html section 62.2.3. High-resolution cycle counter meson --cross config/arm/arm64_octeontx2_linux_gcc -Dc_args='-DRTE_ARM_EAL_RDTSC_USE_PMU' build echo "dma_perf_autotest" | ./build/app/test/dpdk-test --no-huge -c 0xff0000 RTE>>dma_perf_autotest^M lcore=16 Timer running at 2400.00MHz test_for_perform_after_multiple_enqueue: burst=1 cycles=105.000000 test_for_last_enqueue_issue_doorbell: burst=1 cycles=105.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=2 cycles=207.000000 test_for_last_enqueue_issue_doorbell: burst=2 cycles=207.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=3 cycles=309.000000 test_for_last_enqueue_issue_doorbell: burst=3 cycles=310.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=4 cycles=411.000000 test_for_last_enqueue_issue_doorbell: burst=4 cycles=410.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=5 cycles=513.000000 test_for_last_enqueue_issue_doorbell: burst=5 cycles=512.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=6 cycles=615.000000 test_for_last_enqueue_issue_doorbell: burst=6 cycles=615.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=7 cycles=717.000000 test_for_last_enqueue_issue_doorbell: burst=7 cycles=716.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=8 cycles=819.000000 test_for_last_enqueue_issue_doorbell: burst=8 cycles=818.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=9 cycles=921.000000 test_for_last_enqueue_issue_doorbell: burst=9 cycles=922.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=10 cycles=1023.000000 test_for_last_enqueue_issue_doorbell: burst=10 cycles=1022.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=11 cycles=1126.000000 test_for_last_enqueue_issue_doorbell: burst=11 cycles=1124.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=12 cycles=1227.000000 test_for_last_enqueue_issue_doorbell: burst=12 cycles=1227.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=13 cycles=1329.000000 test_for_last_enqueue_issue_doorbell: burst=13 cycles=1328.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=14 cycles=1431.000000 test_for_last_enqueue_issue_doorbell: burst=14 cycles=1430.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=15 cycles=1534.000000 test_for_last_enqueue_issue_doorbell: burst=15 cycles=1534.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=16 cycles=1638.000000 test_for_last_enqueue_issue_doorbell: burst=16 cycles=1640.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=17 cycles=1746.000000 test_for_last_enqueue_issue_doorbell: burst=17 cycles=1739.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=18 cycles=1847.000000 test_for_last_enqueue_issue_doorbell: burst=18 cycles=1841.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=19 cycles=1950.000000 test_for_last_enqueue_issue_doorbell: burst=19 cycles=1944.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=20 cycles=2051.000000 test_for_last_enqueue_issue_doorbell: burst=20 cycles=2045.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=21 cycles=2154.000000 test_for_last_enqueue_issue_doorbell: burst=21 cycles=2148.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=22 cycles=2257.000000 test_for_last_enqueue_issue_doorbell: burst=22 cycles=2249.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=23 cycles=2358.000000 test_for_last_enqueue_issue_doorbell: burst=23 cycles=2352.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=24 cycles=2459.000000 test_for_last_enqueue_issue_doorbell: burst=24 cycles=2454.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=25 cycles=2562.000000 test_for_last_enqueue_issue_doorbell: burst=25 cycles=2555.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=26 cycles=2665.000000 test_for_last_enqueue_issue_doorbell: burst=26 cycles=2657.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=27 cycles=2766.000000 test_for_last_enqueue_issue_doorbell: burst=27 cycles=2760.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=28 cycles=2867.000000 test_for_last_enqueue_issue_doorbell: burst=28 cycles=2861.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=29 cycles=2970.000000 test_for_last_enqueue_issue_doorbell: burst=29 cycles=2964.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=30 cycles=3073.000000 test_for_last_enqueue_issue_doorbell: burst=30 cycles=3065.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=31 cycles=3174.000000 test_for_last_enqueue_issue_doorbell: burst=31 cycles=3168.000000 ------------------------------------------------------------------------------- test_for_perform_after_multiple_enqueue: burst=32 cycles=3275.000000 test_for_last_enqueue_issue_doorbell: burst=32 cycles=3269.000000 ------------------------------------------------------------------------------- Test OK RTE> > And also user may confuse about the doorbell operations. > > Kunpeng platform test result: > [root@SZ tmp]# ./a1 1 > burst = 1 > perform_after_multiple_enqueue: burst:1 cost:0s.554422us > doorbell_for_every_enqueue: burst:1 cost:0s.450927us > last_enqueue_issue_doorbell: burst:1 cost:0s.450479us > [root@SZ tmp]# > [root@SZ tmp]# ./a1 2 > burst = 2 > perform_after_multiple_enqueue: burst:2 cost:0s.900884us > doorbell_for_every_enqueue: burst:2 cost:0s.866732us > last_enqueue_issue_doorbell: burst:2 cost:0s.732469us > [root@SZ tmp]# ./a1 5 > burst = 5 > perform_after_multiple_enqueue: burst:5 cost:1s.732410us > doorbell_for_every_enqueue: burst:5 cost:2s.115479us > last_enqueue_issue_doorbell: burst:5 cost:1s.759349us > [root@SZ tmp]# ./a1 10 > burst = 10 > perform_after_multiple_enqueue: burst:10 cost:3s.490716us > doorbell_for_every_enqueue: burst:10 cost:4s.194691us > last_enqueue_issue_doorbell: burst:10 cost:3s.331825us > [root@SZ tmp]# ./a1 30 > burst = 30 > perform_after_multiple_enqueue: burst:30 cost:9s.61761us > doorbell_for_every_enqueue: burst:30 cost:12s.517082us > last_enqueue_issue_doorbell: burst:30 cost:9s.614802us > > X86 platform test result: > fengchengwen@SZ:~/tmp$ ./a1 1 > burst = 1 > perform_after_multiple_enqueue: burst:1 cost:0s.406331us > doorbell_for_every_enqueue: burst:1 cost:0s.331109us > last_enqueue_issue_doorbell: burst:1 cost:0s.381782us > fengchengwen@SZ:~/tmp$ ./a1 2 > burst = 2 > perform_after_multiple_enqueue: burst:2 cost:0s.569024us > doorbell_for_every_enqueue: burst:2 cost:0s.643449us > last_enqueue_issue_doorbell: burst:2 cost:0s.486639us > fengchengwen@SZ:~/tmp$ ./a1 5 > burst = 5 > perform_after_multiple_enqueue: burst:5 cost:1s.166384us > doorbell_for_every_enqueue: burst:5 cost:1s.602369us > last_enqueue_issue_doorbell: burst:5 cost:1s.209392us > fengchengwen@SZ:~/tmp$ ./a1 10 > burst = 10 > perform_after_multiple_enqueue: burst:10 cost:2s.229901us > doorbell_for_every_enqueue: burst:10 cost:3s.754802us > last_enqueue_issue_doorbell: burst:10 cost:2s.328705us > fengchengwen@SZ:~/tmp$ > fengchengwen@SZ:~/tmp$ ./a1 30 > burst = 30 > perform_after_multiple_enqueue: burst:30 cost:6s.132817us > doorbell_for_every_enqueue: burst:30 cost:9s.944619us > last_enqueue_issue_doorbell: burst:30 cost:7s.73551us > > > test-code: > > #include > #include > #include > #include > #include > > struct dmadev; > > unsigned int dev_reg[10240]; > volatile unsigned int *ring; > volatile unsigned int *doorbell; > > void init_global(void) > { > ring = &dev_reg[100]; > doorbell = &dev_reg[10000]; > } > > #define rte_wmb() asm volatile("dmb oshst" : : : "memory") > //#define rte_wmb() asm volatile ("" : : : "memory") > > typedef int (*enqueue_t)(struct dmadev *dev, int vchan, void *src, void *dst, int len, int flags); > typedef void (*perform_t)(struct dmadev *dev, int vchan); > > struct dmadev { > enqueue_t enqueue; > perform_t perform; > char rsv[512]; > }; > > int hisi_dma_enqueue(struct dmadev *dev, int vchan, void *src, void *dst, int len, int flags) > { > *ring = 1; > } > > int hisi_dma_enqueue_doorbell(struct dmadev *dev, int vchan, void *src, void *dst, int len, int flags) > { > *ring = 1; > if (flags == 1) { > rte_wmb(); > *doorbell = 1; > } > } > > void hisi_dma_perform(struct dmadev *dev, int vchan) > { > rte_wmb(); > *doorbell = 1; > } > > struct dmadev devlist[64]; > > void init_devlist(bool enq_doorbell) > { > int i; > for (i = 0; i < 64; i++) { > devlist[i].enqueue = enq_doorbell ? hisi_dma_enqueue_doorbell : hisi_dma_enqueue; > devlist[i].perform = hisi_dma_perform; > } > } > > static inline int dma_enqueue(int dev_id, int vchan, void *src, void *dst, int len, int flags) > { > struct dmadev *dev = &devlist[dev_id]; > return dev->enqueue(dev, vchan, src, dst, len, flags); > } > > static inline void dma_perform(int dev_id, int vchan) > { > struct dmadev *dev = &devlist[dev_id]; > return dev->perform(dev, vchan); > } > > #define MAX_LOOPS 90000000 > > void test_for_perform_after_multiple_enqueue(int burst) > { > struct timeval start, end, delta; > unsigned int i, j; > init_devlist(false); > gettimeofday(&start, NULL); > for (i = 0; i < MAX_LOOPS; i++) { > for (j = 0; j < burst; j++) > (void)dma_enqueue(10, 0, NULL, NULL, 0, 0); > dma_perform(10, 0); > } > gettimeofday(&end, NULL); > timersub(&end, &start, &delta); > printf("perform_after_multiple_enqueue: burst:%d cost:%us.%uus \n", burst, delta.tv_sec, delta.tv_usec); > } > > void test_for_doorbell_for_every_enqueue(int burst) > { > struct timeval start, end, delta; > unsigned int i, j; > init_devlist(true); > gettimeofday(&start, NULL); > for (i = 0; i < MAX_LOOPS; i++) { > for (j = 0; j < burst; j++) > (void)dma_enqueue(10, 0, NULL, NULL, 0, 1); > } > gettimeofday(&end, NULL); > timersub(&end, &start, &delta); > printf("doorbell_for_every_enqueue: burst:%d cost:%us.%uus \n", burst, delta.tv_sec, delta.tv_usec); > } > > void test_for_last_enqueue_issue_doorbell(int burst) > { > struct timeval start, end, delta; > unsigned int i, j; > init_devlist(true); > gettimeofday(&start, NULL); > for (i = 0; i < MAX_LOOPS; i++) { > for (j = 0; j < burst - 1; j++) > (void)dma_enqueue(10, 0, NULL, NULL, 0, 0); > dma_enqueue(10, 0, NULL, NULL, 0, 1); > } > gettimeofday(&end, NULL); > timersub(&end, &start, &delta); > printf("last_enqueue_issue_doorbell: burst:%d cost:%us.%uus \n", burst, delta.tv_sec, delta.tv_usec); > } > > void main(int argc, char *argv[]) > { > if (argc < 2) { > printf("please input burst parameter!\n"); > return; > } > init_global(); > int burst = atol(argv[1]); > printf("burst = %d \n", burst); > test_for_perform_after_multiple_enqueue(burst); > test_for_doorbell_for_every_enqueue(burst); > test_for_last_enqueue_issue_doorbell(burst); > } > > > > >> > >>> > >>>> > >>>>> > >>>>>> > >>>>>>> > >>>>>>>> > >>>>> > >>>>>>>>> + +/** + * @warning + * @b EXPERIMENTAL: this API may change > >>>>>>>>> without prior notice. + * + * Returns the number of operations > >>>>>>>>> that failed to complete. + * NOTE: This API was used when > >>>>>>>>> rte_dmadev_completed has_error was set. + * + * @param dev_id > >>>>>>>>> + * The identifier of the device. + * @param vq_id + * The > >>>>>>>>> identifier of virt queue. > >>>>>>>> (> + * @param nb_status > >>>>>>>>> + * Indicates the size of status array. + * @param[out] > >>>>>>>>> status + * The error code of operations that failed to > >>>>>>>>> complete. + * @param[out] cookie + * The last failed > >>>>>>>>> completed operation's cookie. + * + * @return + * The number > >>>>>>>>> of operations that failed to complete. + * + * NOTE: The > >>>>>>>>> caller must ensure that the input parameter is valid and the + > >>>>>>>>> * corresponding device supports the operation. + */ > >>>>>>>>> +__rte_experimental +static inline uint16_t > >>>>>>>>> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vq_id, + > >>>>>>>>> const uint16_t nb_status, uint32_t *status, + > >>>>>>>>> dma_cookie_t *cookie) > >>>>>>>> > >>>>>>>> IMO, it is better to move cookie/rind_idx at 3. Why it would > >>>>>>>> return any array of errors? since it called after > >>>>>>>> rte_dmadev_completed() has has_error. Is it better to change > >>>>>>>> > >>>>>>>> rte_dmadev_error_status((uint16_t dev_id, uint16_t vq_id, > >>>>>>>> dma_cookie_t *cookie, uint32_t *status) > >>>>>>>> > >>>>>>>> I also think, we may need to set status as bitmask and enumerate > >>>>>>>> all the combination of error codes of all the driver and return > >>>>>>>> string from driver existing rte_flow_error > >>>>>>>> > >>>>>>>> See struct rte_flow_error { enum rte_flow_error_type type; /**< > >>>>>>>> Cause field and error types. */ const void *cause; /**< Object > >>>>>>>> responsible for the error. */ const char *message; /**< > >>>>>>>> Human-readable error message. */ }; > >>>>>>>> > >>>>>>> > >>>>>>> I think we need a multi-return value API here, as we may add > >>>>>>> operations in future which have non-error status values to return. > >>>>>>> The obvious case is DMA engines which support "compare" operations. > >>>>>>> In that case a successful compare (as in there were no DMA or HW > >>>>>>> errors) can return "equal" or "not-equal" as statuses. For general > >>>>>>> "copy" operations, the faster completion op can be used to just > >>>>>>> return successful values (and only call this status version on > >>>>>>> error), while apps using those compare ops or a mixture of copy and > >>>>>>> compare ops, would always use the slower one that returns status > >>>>>>> values for each and every op.. > >>>>>>> > >>>>>>> The ioat APIs used 32-bit integer values for this status array so > >>>>>>> as to allow e.g. 16-bits for error code and 16-bits for future > >>>>>>> status values. For most operations there should be a fairly small > >>>>>>> set of things that can go wrong, i.e. bad source address, bad > >>>>>>> destination address or invalid length. Within that we may have a > >>>>>>> couple of specifics for why an address is bad, but even so I don't > >>>>>>> think we need to start having multiple bit combinations. > >>>>>> > >>>>>> OK. What is the purpose of errors status? Is it for application > >>>>>> printing it or Does the application need to take any action based on > >>>>>> specific error requests? > >>>>> > >>>>> It's largely for information purposes, but in the case of SVA/SVM > >>>>> errors could occur due to the memory not being pinned, i.e. a page > >>>>> fault, in some cases. If that happens, then it's up the app to either > >>>>> touch the memory and retry the copy, or to do a SW memcpy as a > >>>>> fallback. > >>>>> > >>>>> In other error cases, I think it's good to tell the application if it's > >>>>> passing around bad data, or data that is beyond the scope of hardware, > >>>>> e.g. a copy that is beyond what can be done in a single transaction > >>>>> for a HW instance. Given that there are always things that can go > >>>>> wrong, I think we need some error reporting mechanism. > >>>>> > >>>>>> If the former is scope, then we need to define the standard enum > >>>>>> value for the error right? ie. uint32_t *status needs to change to > >>>>>> enum rte_dma_error or so. > >>>>>> > >>>>> Sure. Perhaps an error/status structure either is an option, where we > >>>>> explicitly call out error info from status info. > >>>> > >>>> Agree. Better to have a structure with filed like, > >>>> > >>>> 1) enum rte_dma_error_type 2) memory to store, informative message on > >>>> fine aspects of error. LIke address caused issue etc.(Which will be > >>>> driver-specific information). > >>>> > >>> The only issue I have with that is that once we have driver specific > >>> information it is of little use to the application, since it can't know > >>> anything about it excepy maybe log it. I'd much rather have a set of error > >>> codes telling user that "source address is wrong", "dest address is wrong", > >>> and a generic "an address is wrong" in case driver/HW cannot distinguish > >>> source of error. Can we see how far we get with just error codes before we > >>> start into passing string messages around and all the memory management > >>> headaches that implies. > >> > >> Works for me. It should be "enum rte_dma_error_type" then, which has a standard > >> error type. Which is missing in the spec now. > >> > > +1 > > . > >