Date: Tue, 08 Dec 2020 11:52:34 +0100
From: Alex Kiselev
To: "Singh, Jasvinder"
Cc: users@dpdk.org, "Dumitrescu, Cristian", "Dharmappa, Savinay"
Subject: Re: [dpdk-users] scheduler issue

On 2020-12-07 23:32, Singh, Jasvinder wrote:
>> On 7 Dec 2020, at 22:16, Alex Kiselev wrote:
>>
>> On 2020-12-07 21:34, Alex Kiselev wrote:
>>> On 2020-12-07 20:29, Singh, Jasvinder wrote:
>>>>> On 7 Dec 2020, at 19:09, Alex Kiselev wrote:
>>>>> On 2020-12-07 20:07, Alex Kiselev wrote:
>>>>>>> On 2020-12-07 19:18, Alex Kiselev wrote:
>>>>>>> On 2020-12-07 18:59, Singh, Jasvinder wrote:
>>>>>>>>> On 7 Dec 2020, at 17:45, Alex Kiselev wrote:
>>>>>>>>> On 2020-12-07 18:31, Singh, Jasvinder wrote:
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Alex Kiselev
>>>>>>>>>>> Sent: Monday, December 7, 2020 4:50 PM
>>>>>>>>>>> To: Singh, Jasvinder
>>>>>>>>>>> Cc: users@dpdk.org; Dumitrescu, Cristian; Dharmappa, Savinay
>>>>>>>>>>> Subject: Re: [dpdk-users] scheduler issue
>>>>>>>>>>>>> On 2020-12-07 12:32, Singh, Jasvinder wrote:
>>>>>>>>>>>> >> -----Original Message-----
>>>>>>>>>>>> >> From: Alex Kiselev
>>>>>>>>>>>> >> Sent: Monday, December 7, 2020 10:46 AM
>>>>>>>>>>>> >> To: Singh, Jasvinder
>>>>>>>>>>>> >> Cc: users@dpdk.org; Dumitrescu, Cristian; Dharmappa, Savinay
>>>>>>>>>>>> >> Subject: Re: [dpdk-users] scheduler issue
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On 2020-12-07 11:00, Singh, Jasvinder wrote:
>>>>>>>>>>>> >> >> -----Original Message-----
>>>>>>>>>>>> >> >> From: users On Behalf Of Alex Kiselev
>>>>>>>>>>>> >> >> Sent: Friday, November 27, 2020 12:12 PM
>>>>>>>>>>>> >> >> To: users@dpdk.org
>>>>>>>>>>>> >> >> Cc: Dumitrescu, Cristian
>>>>>>>>>>>> >> >> Subject: Re: [dpdk-users] scheduler issue
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> On 2020-11-25 16:04, Alex Kiselev wrote:
>>>>>>>>>>>> >> >> > On 2020-11-24 16:34, Alex Kiselev wrote:
>>>>>>>>>>>> >> >> >> Hello,
>>>>>>>>>>>> >> >> >>
>>>>>>>>>>>> >> >> >> I am facing a problem with the scheduler library in DPDK 18.11.10
>>>>>>>>>>>> >> >> >> with default scheduler settings (RED is off).
>>>>>>>>>>>> >> >> >> It seems like some of the pipes (last time it was 4 out of 600
>>>>>>>>>>>> >> >> >> pipes) start incorrectly dropping most of the traffic after a
>>>>>>>>>>>> >> >> >> couple of days of successful work.
>>>>>>>>>>>> >> >> >>
>>>>>>>>>>>> >> >> >> So far I've checked that there are no mbuf leaks or any other
>>>>>>>>>>>> >> >> >> errors in my code, and I am sure that traffic enters the problematic pipes.
>>>>>>>>>>>> >> >> >> Also, switching the traffic at runtime to pipes of another
>>>>>>>>>>>> >> >> >> port restores the traffic flow.
>>>>>>>>>>>> >> >> >>
>>>>>>>>>>>> >> >> >> How do I approach debugging this issue?
>>>>>>>>>>>> >> >> >>
>>>>>>>>>>>> >> >> >> I've added calls to rte_sched_queue_read_stats(), but it doesn't
>>>>>>>>>>>> >> >> >> give me counters that accumulate values (packet drops, for
>>>>>>>>>>>> >> >> >> example); it gives me some kind of current values and after a
>>>>>>>>>>>> >> >> >> couple of seconds those values are reset to zero, so I can say
>>>>>>>>>>>> >> >> >> nothing based on that API.
>>>>>>>>>>>> >> >> >>
>>>>>>>>>>>> >> >> >> I would appreciate any ideas and help.
>>>>>>>>>>>> >> >> >> Thanks.
>>>>>>>>>>>> >> >> >
>>>>>>>>>>>> >> >> > Problematic pipes had a very low bandwidth limit (1 Mbit/s) and
>>>>>>>>>>>> >> >> > there is also an oversubscription configuration at subport 0 of port 13
>>>>>>>>>>>> >> >> > to which those pipes belong, and CONFIG_RTE_SCHED_SUBPORT_TC_OV is
>>>>>>>>>>>> >> >> > disabled.
>>>>>>>>>>>> >> >> >
>>>>>>>>>>>> >> >> > Could congestion at that subport be the reason for the problem?
>>>>>>>>>>>> >> >> >
>>>>>>>>>>>> >> >> > How much overhead and performance degradation will enabling the
>>>>>>>>>>>> >> >> > CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
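
Regarding the rte_sched_queue_read_stats() counters mentioned above: in 18.11 that call returns the counters gathered since the previous read and then clears them, so accumulation has to be done in the application. Below is a minimal sketch of that approach; the app_* names are made up for illustration, and the flat queue index layout is an assumption to verify against rte_sched_port_qindex() in lib/librte_sched/rte_sched.c.

#include <stdint.h>
#include <rte_sched.h>

/* Application-side totals; rte_sched_queue_read_stats() is read-and-clear
 * in 18.11, so the library itself never accumulates across reads. */
struct app_queue_stats {
        uint64_t n_pkts;
        uint64_t n_pkts_dropped;
        uint64_t n_bytes;
        uint64_t n_bytes_dropped;
};

/* Assumed per-subport flat queue index:
 * pipe * RTE_SCHED_QUEUES_PER_PIPE + tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + q */
static inline uint32_t
app_qindex(uint32_t pipe, uint32_t tc, uint32_t q)
{
        return pipe * RTE_SCHED_QUEUES_PER_PIPE +
               tc * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS + q;
}

/* Poll one queue and add the per-interval deltas to the running totals. */
static int
app_accumulate_queue_stats(struct rte_sched_port *port,
                           uint32_t pipe, uint32_t tc, uint32_t q,
                           struct app_queue_stats *acc)
{
        struct rte_sched_queue_stats stats;
        uint16_t qlen;
        int ret;

        ret = rte_sched_queue_read_stats(port, app_qindex(pipe, tc, q),
                                         &stats, &qlen);
        if (ret != 0)
                return ret;

        acc->n_pkts          += stats.n_pkts;
        acc->n_pkts_dropped  += stats.n_pkts_dropped;
        acc->n_bytes         += stats.n_bytes;
        acc->n_bytes_dropped += stats.n_bytes_dropped;
        return 0;
}

Polling this periodically (for example once a second) from the code that already owns the scheduler port gives monotonic drop counters per pipe/TC/queue.
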
>>>>>>>>>>>> >> >> >
>>>>>>>>>>>> >> >> > Configuration:
>>>>>>>>>>>> >> >> >
>>>>>>>>>>>> >> >> > #
>>>>>>>>>>>> >> >> > # QoS Scheduler Profiles
>>>>>>>>>>>> >> >> > #
>>>>>>>>>>>> >> >> > hqos add profile 1 rate 8 K size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 2 rate 400 K size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 3 rate 600 K size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 4 rate 800 K size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 5 rate 1 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 6 rate 1500 K size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 8 rate 3 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 9 rate 4 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 10 rate 5 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 11 rate 6 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 12 rate 8 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 13 rate 10 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 14 rate 12 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 15 rate 15 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 16 rate 16 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 17 rate 20 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 18 rate 30 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 19 rate 32 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 20 rate 40 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 21 rate 50 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 22 rate 60 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 23 rate 100 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 24 rate 25 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> > hqos add profile 25 rate 50 M size 1000000 tc period 40
>>>>>>>>>>>> >> >> >
>>>>>>>>>>>> >> >> > #
>>>>>>>>>>>> >> >> > # Port 13
>>>>>>>>>>>> >> >> > #
>>>>>>>>>>>> >> >> > hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 pipes 3000 profile 2
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 pipes 3000 profile 5
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 pipes 3000 profile 6
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 pipes 3000 profile 7
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 pipes 3000 profile 9
>>>>>>>>>>>> >> >> > hqos add port 13 subport 0 pipes 3000 profile 11
>>>>>>>>>>>> >> >> > hqos set port 13 lcore 5
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> I've enabled the TC_OV feature and redirected most of the traffic to TC3.
>>>>>>>>>>>> >> >> But the issue still exists.
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Below are the queue statistics of one of the problematic pipes.
>>>>>>>>>>>> >> >> Almost all of the traffic entering the pipe is dropped.
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> The pipe is also configured with the 1 Mbit/s profile.
>>>>>>>>>>>> >> >> So, the issue occurs only with very low bandwidth pipe profiles.
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> And this time there was no congestion on the subport.
>>>>>>>>>>>> >> >>
>>>>>>>>>>>> >> >> Egress qdisc
>>>>>>>>>>>> >> >> dir 0
>>>>>>>>>>>> >> >> rate 1M
>>>>>>>>>>>> >> >> port 6, subport 0, pipe_id 138, profile_id 5
>>>>>>>>>>>> >> >> tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
>>>>>>>>>>>> >> >> tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>>> >> >> tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
>>>>>>>>>>>> >> >> tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
>>>>>>>>>>>> >> >> tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
>>>>>>>>>>>> >> >> tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Hi Alex,
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Can you try a newer version of the library, say DPDK 20.11?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Right now, no, since switching to another DPDK version will take a lot
>>>>>>>>>>>> >> of time because I am using a lot of custom patches.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I've tried to simply copy the entire rte_sched lib from DPDK 19 to DPDK 18.
>>>>>>>>>>>> >> I was able to successfully back-port it and resolve all dependency
>>>>>>>>>>>> >> issues, but it will also take some time to test this approach.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> > Are you using the dpdk qos sample app or your own app?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> My own app.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> > What are the packet sizes?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> The application is used as a BRAS/BNG server, so it provides
>>>>>>>>>>>> >> internet access to residential customers. Therefore packet sizes are
>>>>>>>>>>>> >> typical of the internet and vary from 64 to 1500 bytes. Most of the
>>>>>>>>>>>> >> packets are around 1000 bytes.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> > Couple of other things for clarification -
>>>>>>>>>>>> >> > 1. At what rate are you injecting traffic into the low bandwidth pipes?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Well, the rate varies too; there could be congestion on some pipes at
>>>>>>>>>>>> >> certain times of day.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> But the problem is that once the problem occurs at a pipe or at some
>>>>>>>>>>>> >> queues inside the pipe, the pipe stops transmitting even when the
>>>>>>>>>>>> >> incoming traffic rate is much lower than the pipe's rate.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> > 2. How is traffic distributed among pipes and their traffic classes?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I am using the IPv4 TOS field to choose the TC and there is a tos2tc map.
>>>>>>>>>>>> >> Most of my traffic has a TOS value of 0, which is mapped to TC3 inside my app.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Recently I've switched to a tos2tc map which maps all traffic to TC3 to
>>>>>>>>>>>> >> see if it solves the problem.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Packet distribution to queues is done using the formula
>>>>>>>>>>>> >> (ipv4.src + ipv4.dst) & 3
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> > 3. Can you try putting your own counters on those pipe queues,
>>>>>>>>>>>> >> > which periodically show the #packets in the queues, to understand
>>>>>>>>>>>> >> > the dynamics?
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I will try.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> P.S.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Recently I've run into another problem with the scheduler.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> After enabling the TC_OV feature one of the ports stopped transmitting.
>>>>>>>>>>>> >> All the port's pipes were affected.
>>>>>>>>>>>> >> The port had only one subport, and there were only pipes with the
>>>>>>>>>>>> >> 1 Mbit/s profile.
>>>>>>>>>>>> >> The problem was solved by adding a 10 Mbit/s profile to that port. Only
>>>>>>>>>>>> >> after that did the port's pipes start to transmit.
>>>>>>>>>>>> >> I guess it has something to do with calculating tc_ov_wm as it
>>>>>>>>>>>> >> depends on the maximum pipe rate.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> I am going to set up a test lab and a test build to reproduce this.
>>>>>>>>>>>> I've made some tests and was able to reproduce the port configuration issue
>>>>>>>>>>>> using a test build of my app.
>>>>>>>>>>>> Tests showed that the TC_OV feature does not work correctly in DPDK 18.11,
>>>>>>>>>>>> but there are workarounds.
>>>>>>>>>>>> I still can't reproduce my main problem, which is random pipes that stop
>>>>>>>>>>>> transmitting.
>>>>>>>>>>>> Here are the details:
>>>>>>>>>>>> All tests use the same test traffic generator that produces
>>>>>>>>>>>> 10 traffic flows entering 10 different pipes of port 1 subport 0.
>>>>>>>>>>>> Only queue 0 of each pipe is used.
>>>>>>>>>>>> TX rate is 800 kbit/s. Packet size is 800 bytes.
>>>>>>>>>>>> Pipe rates are 1 Mbit/s. Subport 0 rate is 500 Mbit/s.
>>>>>>>>>>>> ###
>>>>>>>>>>>> ### test 1
>>>>>>>>>>>> ###
>>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>>> Configuration:
>>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>>> hqos add profile 23 rate 100 M size 1000000 tc period 40
>>>>>>>>>>>> # qos test port
>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>> Results:
>>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>>> rcv 0: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 1: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 2: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 3: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 4: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 5: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 6: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 7: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 8: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> rcv 9: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>>> ! BUG !
>>>>>>>>>>>> RX rate is lower than the expected 800000 bit/s, even though there is
>>>>>>>>>>>> no congestion at either the subport or the pipe level.
>>>>>>>>>>> [JS] - Can you elaborate on your scheduler hierarchy?
>>>>>>>>>> sure, take a look below at the output
>>>>>>>>>> "number of pipes per subport"
>>>>>>>>>> The TR application always rounds the total number of pipes per port
>>>>>>>>>> up to a power-of-2 value.
>>>>>>>>>>> I mean - how many pipes per subport? It has to be a number that can be
>>>>>>>>>>> expressed as a power of 2, e.g. 4K, 2K, 1K etc. At run time, the
>>>>>>>>>>> scheduler will scan all the pipes and will process only those which
>>>>>>>>>>> have got packets in their queues.
>>>>>>>>>> Configuration of port 1 with profile 23 enabled:
>>>>>>>>>> h5 ~ # rcli sh hqos ports
>>>>>>>>>> hqos scheduler port: 1
>>>>>>>>>> lcore_id: 3
>>>>>>>>>> socket: 0
>>>>>>>>>> rate: 0
>>>>>>>>>> mtu: 1522
>>>>>>>>>> frame overhead: 24
>>>>>>>>>> number of pipes per subport: 4096
>>>>>>>>>> pipe profiles: 2
>>>>>>>>>> pipe profile id: 27
>>>>>>>>>> pipe rate: 1000000
>>>>>>>>>> number of pipes: 2000
>>>>>>>>>> pipe pool size: 2000
>>>>>>>>>> number of pipes in use: 0
>>>>>>>>>> pipe profile id: 23
>>>>>>>>>> pipe rate: 100000000
>>>>>>>>>> number of pipes: 200
>>>>>>>>>> pipe pool size: 200
>>>>>>>>>> number of pipes in use: 0
>>>>>>>>>> Configuration with only one profile at port 1:
>>>>>>>>>> hqos scheduler port: 1
>>>>>>>>>> lcore_id: 3
>>>>>>>>>> socket: 0
>>>>>>>>>> rate: 0
>>>>>>>>>> mtu: 1522
>>>>>>>>>> frame overhead: 24
>>>>>>>>>> number of pipes per subport: 2048
>>>>>>>>>> pipe profiles: 1
>>>>>>>>>> pipe profile id: 27
>>>>>>>>>> pipe rate: 1000000
>>>>>>>>>> number of pipes: 2000
>>>>>>>>>> pipe pool size: 2000
>>>>>>>>>> number of pipes in use: 0
>>>>>>>>>> [JS] What is the meaning of "number of pipes", "pipe pool size",
>>>>>>>>>> and "number of pipes in use", which is zero above? Does your
>>>>>>>>>> application map packet field values to these pipes at run time?
>>>>>>>>>> Can you give me an example of the mapping of packet field values
>>>>>>>>>> to pipe id, tc, queue?
>>>>>>>> Please ignore all information in the outputs above except the
>>>>>>>> "number of pipes per subport",
>>>>>>>> since the tests were made with a simple test application which is
>>>>>>>> based on TR but doesn't use its production QoS logic.
>>>>>>>> Tests 1 - 5 were made with a simple test application with very
>>>>>>>> straightforward QoS mappings,
>>>>>>>> which I described at the beginning.
>>>>>>>> Here they are:
>>>>>>>> All tests use the same test traffic generator that produces
>>>>>>>> 10 traffic flows entering 10 different pipes (0 - 9) of port 1 subport 0.
>>>>>>>> Only queue 0 of each pipe is used.
>>>>>>>> TX rate is 800 kbit/s. Packet size is 800 bytes.
>>>>>>>> Pipe rates are 1 Mbit/s. Subport 0 rate is 500 Mbit/s.
>>>>>>>>>>>> ###
>>>>>>>>>>>> ### test 2
>>>>>>>>>>>> ###
>>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>>> !!! profile 23 has been added to the test port.
>>>>>>>>>>>> Configuration:
>>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>>> hqos add profile 23 rate 100 M size 1000000 tc period 40
>>>>>>>>>>>> # qos test port
>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>> Results:
>>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>>> rcv 0: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 1: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 2: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 3: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 4: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 5: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 6: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 7: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 8: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> rcv 9: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>>> OK.
>>>>>>>>>>>> The received traffic rate is equal to the expected values.
>>>>>>>>>>>> So, just adding pipes which are not being used solves the problem.
>>>>>>>>>>>> ###
>>>>>>>>>>>> ### test 3
>>>>>>>>>>>> ###
>>>>>>>>>>>> !!! traffic generator uses TC 0, so tc_ov is not being used in this test.
>>>>>>>>>>>> profile 23 is not used.
>>>>>>>>>>>> Configuration without profile 23:
>>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>>> hqos add profile 23 rate 100 M size 1000000 tc period 40
>>>>>>>>>>>> # qos test port
>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>> Results:
>>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>>> rcv 0: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 1: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 2: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 3: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 4: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 5: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 6: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 7: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 8: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> rcv 9: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>>> OK.
>>>>>>>>>>>> The received traffic rate is equal to the expected values.
>>>>>>>>>>>> ###
>>>>>>>>>>>> ### test 4
>>>>>>>>>>>> ###
>>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>>> no profile 23.
>>>>>>>>>>>> !! subport tc period has been changed from 10 to 5.
>>>>>>>>>>>> Configuration:
>>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>>> hqos add profile 23 rate 100 M size 1000000 tc period 40
>>>>>>>>>>>> # qos test port
>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>> Results:
>>>>>>>>>>>> rcv 0: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 1: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 2: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 3: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 4: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 5: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 6: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 7: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 8: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> rcv 9: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>>> ! zero traffic
>>>>>>>>>>>> ###
>>>>>>>>>>>> ### test 5
>>>>>>>>>>>> ###
>>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>>> profile 23 is enabled.
>>>>>>>>>>>> subport tc period has been changed from 10 to 5.
>>>>>>>>>>>> Configuration:
>>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>>> hqos add profile 23 rate 100 M size 1000000 tc period 40
>>>>>>>>>>>> # qos test port
>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>> Results:
>>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>>> rcv 0: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 1: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 2: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 3: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 4: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 5: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 6: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 7: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 8: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> rcv 9: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>>> OK
>>>>>>>>>>>> >
>>>>>>>>>>>> > Does this problem exist when you disable oversubscription mode? Worth
>>>>>>>>>>>> > looking at grinder_tc_ov_credits_update() and grinder_credits_update()
>>>>>>>>>>>> > functions where tc_ov_wm is altered.
>>>>>>>>>>>> >
>>>>>>>>>>>> >> >
>>>>>>>>>>>> >> > Thanks,
>>>>>>>>>>>> >> > Jasvinder
>>>>>>> Ok, these two new tests show even more clearly that the TC_OV feature
>>>>>>> is broken.
>>>>>>> Test 1 doesn't use TC_OV, and all available subport bandwidth is
>>>>>>> distributed to 300 pipes in a very fair way. There are 300 generators
>>>>>>> with a tx rate of 1 Mbit/s. They produce 300 traffic flows which each
>>>>>>> enter their own pipe (queue 0) of port 1 subport 0.
>>>>>>> port 1
>>>>>>> subport rate 300 M
>>>>>>> Then the application measures the rx rate of each flow after the flow's
>>>>>>> traffic leaves the scheduler.
>>>>>>> For example, the following line
>>>>>>> rcv 284 rx rate 995840 nb pkts 778
>>>>>>> shows that the rx rate of flow number 284 is 995840 bit/s.
>>>>>>> All 300 rx rates are about 995840 bit/s (1 Mbit/s), as expected.
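
The behaviour in tests 2 and 5 above, where merely adding an unused 100 Mbit/s profile makes the 1 Mbit/s pipes work, fits the tc_ov_wm sizing that Jasvinder points at. The following is a rough paraphrase, not the actual DPDK source (check rte_sched_subport_config() and grinder_tc_ov_credits_update() in lib/librte_sched/rte_sched.c of 18.11), of how the oversubscription watermark upper bound appears to be derived:

#include <stdint.h>

/* Bytes that can be sent at 'rate' bytes/s during 'time_ms' milliseconds. */
static uint32_t
time_ms_to_bytes(uint32_t time_ms, uint32_t rate)
{
        return (uint32_t)(((uint64_t)time_ms * rate) / 1000);
}

/*
 * tc_ov_wm is allowed to move between roughly one MTU and tc_ov_wm_max, and
 * tc_ov_wm_max appears to be derived from the subport tc_period and the
 * highest TC3 rate among the pipe profiles known to the port:
 *
 *   only the 1 Mbit/s profile (125000 B/s), subport tc_period 10 ms:
 *       tc_ov_wm_max = 10 * 125000 / 1000 = 1250 bytes
 *       (barely more than one 824-byte frame: 800 B packet + 24 B overhead)
 *
 *   after adding the 100 Mbit/s profile (12500000 B/s):
 *       tc_ov_wm_max = 10 * 12500000 / 1000 = 125000 bytes
 */
static uint32_t
tc_ov_wm_max_estimate(uint32_t subport_tc_period_ms,
                      uint32_t pipe_tc3_rate_max_bytes)
{
        return time_ms_to_bytes(subport_tc_period_ms, pipe_tc3_rate_max_bytes);
}

If that reading is right, a port whose fastest profile is 1 Mbit/s ends up with a TC3 oversubscription allowance of about one frame per subport tc_period, which would make it very sensitive to quantization; the unused fast profile simply raises pipe_tc3_rate_max.
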
>>>> [JS] Maybe try repeating the same test but change the traffic from TC0 to
>>>> TC3. See if this works.
>>> The second test with incorrect results was already done with TC3.
>>>>>> The second test uses the same configuration
>>>>>> but uses TC3, so the TC_OV function is being used.
>>>>>> And the distribution of traffic in the test is very unfair.
>>>>>> Some of the pipes get 875520 bit/s, some of the pipes get only
>>>>>> 604160 bit/s despite that there is
>>>> [JS] Try repeating the test with increased pipe bandwidth, let's say
>>>> 50 Mbit/s or even greater.
>>> I increased the pipe rate to 10 Mbit/s and both tests (TC0 and TC3) showed
>>> correct and identical results.
>>> But then I changed the tests and increased the number of pipes to 600
>>> to see how it would work with subport congestion. I added 600 pipes
>>> generating 10 Mbit/s and a 3G subport limit, therefore each pipe should
>>> get an equal share, which is about 5 Mbit/s.
>>> And the results of both tests (TC0 or TC3) are not very good.
>>> The first pipes are getting much more bandwidth than the last ones.
>>> The difference is 3 times. So, TC_OV is still not working!!
>>
>> Decreasing the subport tc_period from 10 to 5 has solved that problem,
>> and the scheduler started to distribute the subport bandwidth between the
>> 10 Mbit/s pipes almost ideally.
>>
> [JS] Now, returning to the 1 Mbit/s pipes situation, try reducing the tc
> period first at the subport and then at the pipe level, and see if that
> helps in getting even traffic across the low bandwidth pipes.

Reducing the subport tc period from 10 to 5 also solved the problem with
the 1 Mbit/s pipes.

So, my second problem has been solved, but the first one, where some low
bandwidth pipes stop transmitting, still remains.

>
>
>
>>> rcv 0 rx rate 7324160 nb pkts 5722
>>> rcv 1 rx rate 7281920 nb pkts 5689
>>> rcv 2 rx rate 7226880 nb pkts 5646
>>> rcv 3 rx rate 7124480 nb pkts 5566
>>> rcv 4 rx rate 7324160 nb pkts 5722
>>> rcv 5 rx rate 7271680 nb pkts 5681
>>> rcv 6 rx rate 7188480 nb pkts 5616
>>> rcv 7 rx rate 7150080 nb pkts 5586
>>> rcv 8 rx rate 7328000 nb pkts 5725
>>> rcv 9 rx rate 7249920 nb pkts 5664
>>> rcv 10 rx rate 7188480 nb pkts 5616
>>> rcv 11 rx rate 7179520 nb pkts 5609
>>> rcv 12 rx rate 7324160 nb pkts 5722
>>> rcv 13 rx rate 7208960 nb pkts 5632
>>> rcv 14 rx rate 7152640 nb pkts 5588
>>> rcv 15 rx rate 7127040 nb pkts 5568
>>> rcv 16 rx rate 7303680 nb pkts 5706
>>> ....
>>> rcv 587 rx rate 2406400 nb pkts 1880
>>> rcv 588 rx rate 2406400 nb pkts 1880
>>> rcv 589 rx rate 2406400 nb pkts 1880
>>> rcv 590 rx rate 2406400 nb pkts 1880
>>> rcv 591 rx rate 2406400 nb pkts 1880
>>> rcv 592 rx rate 2398720 nb pkts 1874
>>> rcv 593 rx rate 2400000 nb pkts 1875
>>> rcv 594 rx rate 2400000 nb pkts 1875
>>> rcv 595 rx rate 2400000 nb pkts 1875
>>> rcv 596 rx rate 2401280 nb pkts 1876
>>> rcv 597 rx rate 2401280 nb pkts 1876
>>> rcv 598 rx rate 2401280 nb pkts 1876
>>> rcv 599 rx rate 2402560 nb pkts 1877
>>> rx rate sum 3156416000
>>
>>
>>
>>>>> ... despite that there is _NO_ congestion
>>>>> at the subport or the pipes.
>>>>>> And the subport doesn't use about 42 Mbit/s of the available bandwidth!
>>>>>> The only difference between those test configurations is the TC of the
>>>>>> generated traffic.
>>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV function).
>>>>>> So, enabling TC_OV changes the results dramatically.
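
For reference, the per-tc_period byte budgets behind these observations (the test 1 / test 2 configurations are quoted just below) are easy to compute. A small stand-alone example using the thread's own numbers, assuming the usual credits = rate * tc_period model:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Credits (bytes) refilled per tc_period for a given rate. */
static uint64_t
bytes_per_period(uint64_t rate_bit_per_s, uint32_t tc_period_ms)
{
        return (rate_bit_per_s / 8) * tc_period_ms / 1000;
}

int main(void)
{
        /* 1 Mbit/s pipe, pipe tc period 40 ms: 5000 bytes per period,
         * i.e. about six 824-byte frames (800 B packets + 24 B overhead). */
        printf("pipe 1M, 40 ms:      %" PRIu64 " bytes\n",
               bytes_per_period(1000000, 40));

        /* 500 Mbit/s subport, tc period 10 ms: 625000 bytes per period. */
        printf("subport 500M, 10 ms: %" PRIu64 " bytes\n",
               bytes_per_period(500000000, 10));

        /* Same subport at tc period 5 ms: 312500 bytes, refilled twice as
         * often. With low-rate pipes and ~824-byte frames the per-pipe
         * budgets are only a handful of frames, so the period length
         * matters a lot. */
        printf("subport 500M, 5 ms:  %" PRIu64 " bytes\n",
               bytes_per_period(500000000, 5));

        return 0;
}
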
>>>>>> ##
>>>>>> ## test1
>>>>>> ##
>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>> # qos test port
>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>> hqos set port 1 lcore 3
>>>>>> port 1
>>>>>> subport rate 300 M
>>>>>> number of tx flows 300
>>>>>> generator tx rate 1M
>>>>>> TC 1
>>>>>> ...
>>>>>> rcv 284 rx rate 995840 nb pkts 778
>>>>>> rcv 285 rx rate 995840 nb pkts 778
>>>>>> rcv 286 rx rate 995840 nb pkts 778
>>>>>> rcv 287 rx rate 995840 nb pkts 778
>>>>>> rcv 288 rx rate 995840 nb pkts 778
>>>>>> rcv 289 rx rate 995840 nb pkts 778
>>>>>> rcv 290 rx rate 995840 nb pkts 778
>>>>>> rcv 291 rx rate 995840 nb pkts 778
>>>>>> rcv 292 rx rate 995840 nb pkts 778
>>>>>> rcv 293 rx rate 995840 nb pkts 778
>>>>>> rcv 294 rx rate 995840 nb pkts 778
>>>>>> ...
>>>>>> The sum of the pipes' rx rates is 298 494 720.
>>>>>> OK.
>>>>>> The subport rate is equally distributed to 300 pipes.
>>>>>> ##
>>>>>> ## test 2
>>>>>> ##
>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>> # qos test port
>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>> hqos set port 1 lcore 3
>>>>>> port 1
>>>>>> subport rate 300 M
>>>>>> number of tx flows 300
>>>>>> generator tx rate 1M
>>>>>> TC 3
>>>>>> h5 ~ # rcli sh qos rcv
>>>>>> rcv 0 rx rate 875520 nb pkts 684
>>>>>> rcv 1 rx rate 856320 nb pkts 669
>>>>>> rcv 2 rx rate 849920 nb pkts 664
>>>>>> rcv 3 rx rate 853760 nb pkts 667
>>>>>> rcv 4 rx rate 867840 nb pkts 678
>>>>>> rcv 5 rx rate 844800 nb pkts 660
>>>>>> rcv 6 rx rate 852480 nb pkts 666
>>>>>> rcv 7 rx rate 855040 nb pkts 668
>>>>>> rcv 8 rx rate 865280 nb pkts 676
>>>>>> rcv 9 rx rate 846080 nb pkts 661
>>>>>> rcv 10 rx rate 858880 nb pkts 671
>>>>>> rcv 11 rx rate 870400 nb pkts 680
>>>>>> rcv 12 rx rate 864000 nb pkts 675
>>>>>> rcv 13 rx rate 852480 nb pkts 666
>>>>>> rcv 14 rx rate 855040 nb pkts 668
>>>>>> rcv 15 rx rate 857600 nb pkts 670
>>>>>> rcv 16 rx rate 864000 nb pkts 675
>>>>>> rcv 17 rx rate 866560 nb pkts 677
>>>>>> rcv 18 rx rate 865280 nb pkts 676
>>>>>> rcv 19 rx rate 858880 nb pkts 671
>>>>>> rcv 20 rx rate 856320 nb pkts 669
>>>>>> rcv 21 rx rate 864000 nb pkts 675
>>>>>> rcv 22 rx rate 869120 nb pkts 679
>>>>>> rcv 23 rx rate 856320 nb pkts 669
>>>>>> rcv 24 rx rate 862720 nb pkts 674
>>>>>> rcv 25 rx rate 865280 nb pkts 676
>>>>>> rcv 26 rx rate 867840 nb pkts 678
>>>>>> rcv 27 rx rate 870400 nb pkts 680
>>>>>> rcv 28 rx rate 860160 nb pkts 672
>>>>>> rcv 29 rx rate 870400 nb pkts 680
>>>>>> rcv 30 rx rate 869120 nb pkts 679
>>>>>> rcv 31 rx rate 870400 nb pkts 680
>>>>>> rcv 32 rx rate 858880 nb pkts 671
>>>>>> rcv 33 rx rate 858880 nb pkts 671
>>>>>> rcv 34 rx rate 852480 nb pkts 666
>>>>>> rcv 35 rx rate 874240 nb pkts 683
>>>>>> rcv 36 rx rate 855040 nb pkts 668
>>>>>> rcv 37 rx rate 853760 nb pkts 667
>>>>>> rcv 38 rx rate 869120 nb pkts 679
>>>>>> rcv 39 rx rate 885760 nb pkts 692
>>>>>> rcv 40 rx rate 861440 nb pkts 673
>>>>>> rcv 41 rx rate 852480 nb pkts 666
>>>>>> rcv 42 rx rate 871680 nb pkts 681
>>>>>> ...
>>>>>> ...
>>>>>> rcv 288 rx rate 766720 nb pkts 599
>>>>>> rcv 289 rx rate 766720 nb pkts 599
>>>>>> rcv 290 rx rate 766720 nb pkts 599
>>>>>> rcv 291 rx rate 766720 nb pkts 599
>>>>>> rcv 292 rx rate 762880 nb pkts 596
>>>>>> rcv 293 rx rate 762880 nb pkts 596
>>>>>> rcv 294 rx rate 762880 nb pkts 596
>>>>>> rcv 295 rx rate 760320 nb pkts 594
>>>>>> rcv 296 rx rate 604160 nb pkts 472
>>>>>> rcv 297 rx rate 604160 nb pkts 472
>>>>>> rcv 298 rx rate 604160 nb pkts 472
>>>>>> rcv 299 rx rate 604160 nb pkts 472
>>>>>> rx rate sum 258839040
>>>>>> FAILED.
>>>>>> The subport rate is NOT distributed equally among the 300 pipes.
>>>>>> Some of the subport bandwidth (about 42 Mbit/s) is not being used!
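
For completeness, the classification described earlier in the thread (a tos2tc map plus (ipv4.src + ipv4.dst) & 3 for the queue) could look roughly like the sketch below. This is illustrative only, not TR's actual code: the subport/pipe lookup is stubbed out, the tos2tc table here implements the "everything to TC3" variant, and it assumes the DPDK 18.11 prototype of rte_sched_port_pkt_write() (mbuf-based, with an enum rte_meter_color argument); newer releases changed that signature.

#include <string.h>
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_meter.h>
#include <rte_sched.h>

/* Hypothetical tos2tc map; filled so that every TOS value maps to TC3,
 * like the "map all traffic to TC3" variant mentioned above. */
static uint8_t tos2tc[256];

static void
tos2tc_init(void)
{
        memset(tos2tc, 3, sizeof(tos2tc));
}

/* Write subport/pipe/tc/queue into the mbuf before rte_sched_port_enqueue().
 * The subport and pipe are assumed to come from the application's own
 * subscriber lookup, which is not shown here. */
static void
classify_pkt(struct rte_mbuf *pkt, uint32_t subport, uint32_t pipe)
{
        const struct ipv4_hdr *ip;
        uint32_t tc, queue;

        /* IPv4 header right after the Ethernet header (no VLAN handling). */
        ip = rte_pktmbuf_mtod_offset(pkt, const struct ipv4_hdr *,
                                     sizeof(struct ether_hdr));

        tc = tos2tc[ip->type_of_service];
        queue = (rte_be_to_cpu_32(ip->src_addr) +
                 rte_be_to_cpu_32(ip->dst_addr)) & 3;

        rte_sched_port_pkt_write(pkt, subport, pipe, tc, queue,
                                 e_RTE_METER_GREEN);
}

On the verification side, rte_sched_port_pkt_read_tree_path() should show where a given packet was mapped, which may help when instrumenting the problematic pipes.
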