From: Yan Lei
To: Tom Barbette, Raslan Darawsheh, Wisam Monther, Thomas Monjalon
CC: "users@dpdk.org"
Date: Fri, 24 Apr 2020 12:40:07 +0000
Subject: Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec
Message-ID: <2b2bf9b21c5e4e73b96901176437d606@epfl.ch>
In-Reply-To: <895c65c8-3662-f656-72df-54cbb406ef7b@kth.se>
List-Id: DPDK usage discussions

Hi Tom,

I also did some measurements on the RSS flows. I got results similar to yours (~7ms for installing w/ group=1 and 512 entries, DPDK 19.11).

@Wisam @Thomas @Raslan Also, there is an issue in the benchmark patch mentioned earlier in this thread. I was able to get an 80K/sec RSS flow insertion rate (group = 1, 512 entries) with the patch. But with some debugging I found that the first insertion always takes ~7ms, while all the following insertions take ~12us. It turns out that all the RSS flows generated in the benchmark have the same RETA, and the PMD/Drv/FW/NIC are smart enough to reuse the RETA for all the flow rules other than the first one. If I change the RETA for every new rule, the first insertion takes ~7ms and all other insertions take ~4.7ms.

Is RSS flow insertion supposed to take this long? As @Tom mentioned, other rules have been improved a lot.

I attached a patch so you can reproduce the results I mentioned above. The patch changes the RETA to have 512 entries and varies the RETA for each new rule (up to 256 variations). The patch should be applied on top of https://patches.dpdk.org/patch/68059/

The cmd I used to get the results:

sudo ./flow_perf -l 3-7 -n 4 -w 02:00.0 -- --ingress --group=1 --ether --ipv4 --rss --flows-count=10

BTW, I measured latency using rdtsc rather than the clock() used in the patch.
When creating an RSS flow, most of the latency is just idle time (I guess it just waits for the RETA to be allocated on the NIC), so clock() is not accurate in this case since it measures CPU time.

Thanks and cheers,
Lei

________________________________
From: Tom Barbette
Sent: Friday, April 24, 2020 12:12 PM
To: Raslan Darawsheh; Yan Lei; Wisam Monther; Thomas Monjalon
Cc: users@dpdk.org
Subject: Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec

Hi Raslan!

Thanks for your concern.

You have an example there:
https://github.com/rsspp/fastclick/blob/bef6413c66ea13cb42bcafbe487d7a31bb0ce58a/vendor/nicscheduler/methods/rss.cc#L193

It's basically "eth ipv4" with an RSS action. The goal is to do more or less what irqbalance does with IRQs, but with RSS, which allows for much finer tuning of the load balancing.

That rule takes around 10ms to be installed (timing of rte_flow_create) with 512 entries, and 4ms with 128 entries. However, the redirection rule we use to simply jump between tables (https://github.com/rsspp/fastclick/blob/bef6413c66ea13cb42bcafbe487d7a31bb0ce58a/vendor/nicscheduler/methods/rss.cc#L140), to approach atomicity of updates by updating different tables in alternating cycles, takes 9usec, which is pretty fast.

In comparison, on group 0 RETA rules take around 35ms with 512 entries, and 30ms with 128. So the improvement is not as high as with "standard" rules, sadly.

That being said, an RSS update on the XL710 takes around 20us (the global RSS table; here I use rte_flow because MLX5 is not updateable while the device is running with DPDK, but it is with the kernel).

Tom

On 21/04/2020 at 14:30, Raslan Darawsheh wrote:
> Hi Tom,
>
> Can you send an example of an rte_flow rule that you are trying?
> I guess since you are using RSS, which flows are being used might affect the performance more.
>
> Kindest regards,
> Raslan Darawsheh
>
>> -----Original Message-----
>> From: users On Behalf Of Tom Barbette
>> Sent: Tuesday, April 21, 2020 12:00 PM
>> To: Yan Lei; Wisam Monther; Thomas Monjalon
>> Cc: users@dpdk.org
>> Subject: Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec
>>
>> Interesting! No, I did not try flow_perf; it was from our own application.
>>
>> I'm actually taking that number from the installation time of a single rule that has an RSS action, which is probably more costly. So this and that may bring down the performance.
>>
>> Tom
>>
>> On 20/04/2020 at 15:48, Yan Lei wrote:
>>>
>>> Hi Tom,
>>>
>>> I guess "SW steering" refers to the "direct verbs/rules" (https://mails.dpdk.org/archives/dev/2019-February/125303.html). group=0 is still the same old (pre DPDK 19.05) slow implementation of flow insertion. But just my guess.
>>>
>>> How did you measure the flow insertion rate? Did you use the patch they mentioned earlier in the thread? With that patch I got 330K with
>>> sudo ./flow_perf -l 3-7 -n 4 -w 02:00.0,dv_flow_en=1 -- --ingress --group=1 --ether --ipv4 --udp --queue --flows-count=1000000.
>>>
>>> Cheers,
>>> Lei
>>>
>>> ------------------------------------------------------------------------
>>> *From:* Tom Barbette
>>> *Sent:* Monday, April 20, 2020 2:24 PM
>>> *To:* Wisam Monther; Thomas Monjalon; Yan Lei
>>> *Cc:* users@dpdk.org
>>> *Subject:* Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec
>>> Hi,
>>>
>>> On 19/04/2020 at 16:07, Wisam Monther wrote:
>>>> Hey Yan,
>>>>
>>>> For FW steering there is a HW limitation on the number of flows that can be added to it.
>>>> That is table 0, and I think the limit is 65536.
>>>>
>>>> But to get millions of rules, use --group=1, which is SW steering.
>>> What do you mean by SW steering?
>>>
>>> Using group 1 we had better performance, but only around 10K. I think the whole process lacks an update feature instead of delete+create, and the ability to batch rules.
>>>
>>>> Moreover, make sure you have enough memory in the app to get a good insertion rate.
>>>>
>>>> If you have enough 1G huge pages then it's OK.
>>>> If you are working with 2M pages, your command should be like this:
>>>>
>>>> sudo ./flow_perf -l 3-7 -n 4 -w 02:00.0,dv_flow_en=1 --socket-mem=4096 -- --ingress --group=1 --ether --ipv4 --udp --queue --flows-count=1000000
>>>>
>>>> BRs,
>>>> Wisam Jaddo
>>>>
>>>>> -----Original Message-----
>>>>> From: Thomas Monjalon
>>>>> Sent: Sunday, April 19, 2020 4:58 PM
>>>>> To: Yan Lei
>>>>> Cc: users@dpdk.org; Wisam Monther
>>>>> Subject: Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec
>>>>>
>>>>> +Cc Wisam
>>>>>
>>>>> 16/04/2020 17:32, Yan Lei:
>>>>>> Hi Thomas,
>>>>>>
>>>>>> I tried the patch (68057 + 68058) on DPDK 19.11/20.02 + ofed 4.7.3.
>>>>>>
>>>>>> TL;DR
>>>>>>
>>>>>> 1. I was only able to generate 3K rules per second.
>>>>>>
>>>>>> 2. The maximum number of distinct rules the NIC can support seems to be 65536.
>>>>>>
>>>>>> How can I increase the insertion rate?
>>>>>> Any firmware/driver config I need to tune? Also, is 65536 distinct flows truly a limit of the NIC? The patch defaults to generating 4 million distinct flows though...
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>> Initially, running
>>>>>>
>>>>>> ```
>>>>>> sudo ./flow_perf -l 3-7 -n 4 -w 02:00.0,dv_flow_en=1 -- --ingress --ether --ipv4 --udp --queue --flows-count=1000000
>>>>>> ```
>>>>>>
>>>>>> failed after a few seconds and it gave
>>>>>>
>>>>>> ```
>>>>>> Flow can't be created 1 message: hardware refuses to create flow
>>>>>> EAL: Error - exiting with code: 1
>>>>>> Cause: error in creating flow
>>>>>> ```
>>>>>>
>>>>>> Then I added a small debug patch (attached) and it showed that the error happens when creating the 65536th flow rule.
>>>
>>> The first table is indeed limited to something around that number. But performance is already degrading before that point, even with OFED 5 and the firmware that comes with it.
>>>
>>>>>>
>>>>>> ```
>>>>>> Flow can't be created 1 message: hardware refuses to create flow
>>>>>> EAL: Error - exiting with code: 1
>>>>>> Cause: error in creating flow,flows generated: 65536
>>>>>> ```
>>>>>>
>>>>>> My guess is that the NIC can only accept 65536 concurrent rules. Once I changed the outer ip mask to 0xffff, the above command runs fine.
>>>>>>
>>>>>> To see how many rules I can generate per second, I ran (with the outer ip mask 0xffff)
>>>>>>
>>>>>> ```
>>>>>> sudo ./flow_perf -l 3-7 -n 4 -w 02:00.0,dv_flow_en=1 -- --ingress --ether --ipv4 --udp --queue --flows-count=65536
>>>>>> ```
>>>>>>
>>>>>> and it gives
>>>>>>
>>>>>> ```
>>>>>> :: Total flow insertion rate -> 3.015922 K/Sec
>>>>>> :: The time for creating 65536 in flows 21.730005 seconds
>>>>>> :: EAGAIN counter = 0
>>>>>> ```
>>>>>> So ~3K rules per sec.
>>>>>> Which is close to what I observed before.
>>>>>>
>>>>>> ```
>>>>>> sudo ./flow_perf -l 3-7 -n 4 -w 02:00.0,dv_flow_en=1 -- --ingress --ether --ipv4 --udp --queue --flows-count=100000
>>>>>> ```
>>>>>> gives
>>>>>>
>>>>>> ```
>>>>>> :: Total flow insertion rate -> 0.949381 K/Sec
>>>>>> :: The time for creating 100000 in flows 105.331842 seconds
>>>>>> :: EAGAIN counter = 0
>>>>>> ```
>>>>>> Have no idea why it's only 1k/sec in this case...
>>>>>>
>>>>>> Thanks and cheers,
>>>>>> Lei
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: users on behalf of Yan Lei
>>>>>> Sent: Tuesday, April 14, 2020 1:20 PM
>>>>>> To: Thomas Monjalon
>>>>>> Cc: users@dpdk.org
>>>>>> Subject: Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec
>>>>>>
>>>>>> Hi Thomas,
>>>>>>
>>>>>> Thanks! I will give it a try (using DPDK 19.11 + ofed 4.7.3).
>>>>>>
>>>>>> Cheers,
>>>>>> Lei
>>>>>> ________________________________
>>>>>> From: Thomas Monjalon
>>>>>> Sent: Tuesday, April 14, 2020 12:12:28 PM
>>>>>> To: Yan Lei
>>>>>> Cc: users@dpdk.org
>>>>>> Subject: Re: [dpdk-users] [mlx5 + DPDK 19.11] Flow insertion rate less than 4K per sec
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> 10/04/2020 20:11, Yan Lei:
>>>>>>> I am doing a study that requires inserting more than 1 million flow rules per second into the NIC, and I run DPDK 19.11 on a ConnectX-5 NIC.
>>>>>>>
>>>>>>> But I only managed to create around 3.3K rules per second.
>>>>>>> Below is the code I used to measure the insertion rate:
>>>>>>
>>>>>> Could you please review this new application designed for such measurements?
>>>>>>
>>>>>> https://patches.dpdk.org/patch/68058/
>>>>>>
>>>>>> Any feedback about the above patch is welcome. Feel free to try and review it.
>>>
>>> Tom
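
The clock() caveat at the top of this message can be illustrated with a minimal sketch, assuming a POSIX system. Nothing below is taken from flow_perf or the attached patch: idle_wait_ms() is a hypothetical stand-in for a call such as rte_flow_create() that spends most of its time waiting on the device, and clock_gettime(CLOCK_MONOTONIC) is used as the wall clock in place of the rdtsc approach mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for a call that mostly waits on the device:
 * sleep for `ms` milliseconds without burning CPU. */
static void idle_wait_ms(long ms)
{
    struct timespec req = {
        .tv_sec = ms / 1000,
        .tv_nsec = (ms % 1000) * 1000000L
    };
    /* If a signal interrupts the sleep, resume with the remaining time. */
    while (nanosleep(&req, &req) != 0 && errno == EINTR)
        ;
}

/* Wall-clock seconds read from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; /* silence unused warning when NDEBUG is defined */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds consumed by this process, as reported by clock(). */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Time one idle wait with both clocks; results go to the out-params. */
static void measure_idle(long ms, double *wall_elapsed, double *cpu_elapsed)
{
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    idle_wait_ms(ms);
    *cpu_elapsed = cpu_seconds() - c0;
    *wall_elapsed = wall_seconds() - w0;
}
```

Calling measure_idle(50, &wall, &cpu) should report a wall time of at least 50 ms and a CPU time near zero. Under that assumption, an insertion rate computed from clock() can look much higher than the real one whenever the call is mostly idle.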