Date: Mon, 07 Dec 2020 17:49:50 +0100
From: Alex Kiselev <alex@therouter.net>
To: "Singh, Jasvinder"
Cc: users@dpdk.org, "Dumitrescu, Cristian", "Dharmappa, Savinay"
Subject: Re: [dpdk-users] scheduler issue

On 2020-12-07 12:32, Singh, Jasvinder wrote:
>> -----Original Message-----
>> From: Alex Kiselev
>> Sent: Monday, December 7, 2020 10:46 AM
>> To: Singh, Jasvinder
>> Cc: users@dpdk.org; Dumitrescu, Cristian; Dharmappa, Savinay
>> Subject: Re: [dpdk-users] scheduler issue
>>
>> On 2020-12-07 11:00, Singh, Jasvinder wrote:
>> >> -----Original Message-----
>> >> From: users On Behalf Of Alex Kiselev
>> >> Sent: Friday, November 27, 2020 12:12 PM
>> >> To: users@dpdk.org
>> >> Cc: Dumitrescu, Cristian
>> >> Subject: Re: [dpdk-users] scheduler issue
>> >>
>> >> On 2020-11-25 16:04, Alex Kiselev wrote:
>> >> > On 2020-11-24 16:34, Alex Kiselev wrote:
>> >> >> Hello,
>> >> >>
>> >> >> I am facing a problem with the scheduler library in DPDK 18.11.10
>> >> >> with default scheduler settings (RED is off).
>> >> >> It seems like some of the pipes (last time it was 4 out of 600
>> >> >> pipes) start incorrectly dropping most of the traffic after a
>> >> >> couple of days of successful work.
>> >> >>
>> >> >> So far I've checked that there are no mbuf leaks or any other
>> >> >> errors in my code, and I am sure that traffic enters the
>> >> >> problematic pipes. Also, switching the traffic at runtime to
>> >> >> pipes of another port restores the traffic flow.
>> >> >>
>> >> >> How do I approach debugging this issue?
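
One practical way to get cumulative per-queue drop counters here (relevant
to the next paragraph as well): the library clears its per-queue stats on
every rte_sched_queue_read_stats() call, so the running totals have to be
kept by the caller. Below is a minimal sketch, not the application's actual
code; the function and struct names are made up for illustration, and the
linear queue index assumes the 18.11 layout (16 queues per pipe, 4 per
traffic class, power-of-two pipes per subport).

  /*
   * Sketch: accumulate per-queue counters in the application, because
   * rte_sched_queue_read_stats() returns-and-resets the library counters.
   */
  #include <stdio.h>
  #include <inttypes.h>
  #include <rte_sched.h>

  struct queue_acct {
          uint64_t pkts;
          uint64_t pkts_dropped;
          uint64_t bytes;
          uint64_t bytes_dropped;
  };

  /* one accumulator per queue of the pipe being watched */
  static struct queue_acct acct[RTE_SCHED_QUEUES_PER_PIPE];

  static void
  poll_pipe_stats(struct rte_sched_port *port, uint32_t subport,
                  uint32_t n_pipes_per_subport, uint32_t pipe)
  {
          /* assumes power-of-two pipes per subport, so this linear index
           * matches the library's bit-packed queue index */
          uint32_t base = (subport * n_pipes_per_subport + pipe) *
                          RTE_SCHED_QUEUES_PER_PIPE;
          uint32_t q;

          for (q = 0; q < RTE_SCHED_QUEUES_PER_PIPE; q++) {
                  struct rte_sched_queue_stats st;
                  uint16_t qlen;

                  if (rte_sched_queue_read_stats(port, base + q, &st, &qlen) != 0)
                          continue;

                  /* the call above clears the counters, so add to totals */
                  acct[q].pkts          += st.n_pkts;
                  acct[q].pkts_dropped  += st.n_pkts_dropped;
                  acct[q].bytes         += st.n_bytes;
                  acct[q].bytes_dropped += st.n_bytes_dropped;

                  printf("tc %u queue %u: qlen %u, dropped %" PRIu64 " pkts total\n",
                         q / RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS,
                         q % RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS,
                         (unsigned)qlen, acct[q].pkts_dropped);
          }
  }

Calling something like this once per second from a management thread gives
both the instantaneous queue length and true cumulative drop counts.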
>> >> >>
>> >> >> I've added calls to rte_sched_queue_read_stats(), but it doesn't
>> >> >> give me counters that accumulate values (packet drops, for
>> >> >> example); it gives me some kind of current values, and after a
>> >> >> couple of seconds those values are reset to zero, so I can say
>> >> >> nothing based on that API.
>> >> >>
>> >> >> I would appreciate any ideas and help.
>> >> >> Thanks.
>> >> >
>> >> > Problematic pipes had a very low bandwidth limit (1 Mbit/s). There
>> >> > is also an oversubscription configuration at subport 0 of port 13,
>> >> > to which those pipes belong, while CONFIG_RTE_SCHED_SUBPORT_TC_OV
>> >> > is disabled.
>> >> >
>> >> > Could congestion at that subport be the reason for the problem?
>> >> >
>> >> > How much overhead and performance degradation would enabling the
>> >> > CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
>> >> >
>> >> > Configuration:
>> >> >
>> >> > #
>> >> > # QoS Scheduler Profiles
>> >> > #
>> >> > hqos add profile 1 rate 8 K size 1000000 tc period 40
>> >> > hqos add profile 2 rate 400 K size 1000000 tc period 40
>> >> > hqos add profile 3 rate 600 K size 1000000 tc period 40
>> >> > hqos add profile 4 rate 800 K size 1000000 tc period 40
>> >> > hqos add profile 5 rate 1 M size 1000000 tc period 40
>> >> > hqos add profile 6 rate 1500 K size 1000000 tc period 40
>> >> > hqos add profile 7 rate 2 M size 1000000 tc period 40
>> >> > hqos add profile 8 rate 3 M size 1000000 tc period 40
>> >> > hqos add profile 9 rate 4 M size 1000000 tc period 40
>> >> > hqos add profile 10 rate 5 M size 1000000 tc period 40
>> >> > hqos add profile 11 rate 6 M size 1000000 tc period 40
>> >> > hqos add profile 12 rate 8 M size 1000000 tc period 40
>> >> > hqos add profile 13 rate 10 M size 1000000 tc period 40
>> >> > hqos add profile 14 rate 12 M size 1000000 tc period 40
>> >> > hqos add profile 15 rate 15 M size 1000000 tc period 40
>> >> > hqos add profile 16 rate 16 M size 1000000 tc period 40
>> >> > hqos add profile 17 rate 20 M size 1000000 tc period 40
>> >> > hqos add profile 18 rate 30 M size 1000000 tc period 40
>> >> > hqos add profile 19 rate 32 M size 1000000 tc period 40
>> >> > hqos add profile 20 rate 40 M size 1000000 tc period 40
>> >> > hqos add profile 21 rate 50 M size 1000000 tc period 40
>> >> > hqos add profile 22 rate 60 M size 1000000 tc period 40
>> >> > hqos add profile 23 rate 100 M size 1000000 tc period 40
>> >> > hqos add profile 24 rate 25 M size 1000000 tc period 40
>> >> > hqos add profile 25 rate 50 M size 1000000 tc period 40
>> >> >
>> >> > #
>> >> > # Port 13
>> >> > #
>> >> > hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>> >> > hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
>> >> > hqos add port 13 subport 0 pipes 3000 profile 2
>> >> > hqos add port 13 subport 0 pipes 3000 profile 5
>> >> > hqos add port 13 subport 0 pipes 3000 profile 6
>> >> > hqos add port 13 subport 0 pipes 3000 profile 7
>> >> > hqos add port 13 subport 0 pipes 3000 profile 9
>> >> > hqos add port 13 subport 0 pipes 3000 profile 11
>> >> > hqos set port 13 lcore 5
>> >>
>> >> I've enabled the TC_OV feature and redirected most of the traffic
>> >> to TC3, but the issue still exists.
>> >>
>> >> Below are the queue statistics of one of the problematic pipes.
>> >> Almost all of the traffic entering the pipe is dropped.
>> >>
>> >> And the pipe is also configured with the 1 Mbit/s profile,
>> >> so the issue is only with very low bandwidth pipe profiles.
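
For reference, here is a back-of-the-envelope credit budget for such a
low-rate pipe. This is a standalone arithmetic sketch, not a call into
DPDK; it only assumes that TC credits per period come out to roughly
rate_in_bytes_per_second * tc_period_ms / 1000, which is how the 18.11
sources appear to convert the tc period into bytes.

  #include <stdio.h>
  #include <inttypes.h>
  #include <stdint.h>

  static uint64_t
  ms_to_bytes(uint64_t time_ms, uint64_t rate_bytes_per_s)
  {
          return rate_bytes_per_s * time_ms / 1000;
  }

  int main(void)
  {
          uint64_t pipe_rate = 1000000 / 8;  /* 1 Mbit/s profile -> 125000 bytes/s */
          uint64_t frame = 1000 + 24;        /* typical ~1000-byte packet + 24 B frame overhead */
          uint64_t credits = ms_to_bytes(40, pipe_rate);  /* pipe tc period 40 ms */

          printf("pipe TC credits per 40 ms period: %" PRIu64 " bytes\n", credits);
          printf("typical frames per refill:        %" PRIu64 "\n", credits / frame);
          /* prints 5000 bytes and 4 frames: a 1 Mbit/s pipe refills only a
           * handful of packets' worth of credit per period, so any credit
           * accounting hiccup shows up immediately as drops */
          return 0;
  }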
>> >>
>> >> And this time there was no congestion on the subport.
>> >>
>> >> Egress qdisc
>> >> dir 0
>> >>   rate 1M
>> >>   port 6, subport 0, pipe_id 138, profile_id 5
>> >>   tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
>> >>   tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> >>   tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
>> >>   tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
>> >>   tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
>> >>   tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
>> >
>> >
>> > Hi Alex,
>> >
>> > Can you try a newer version of the library, say DPDK 20.11?
>>
>> Right now, no, since switching to another DPDK version would take a lot
>> of time because I am using a lot of custom patches.
>>
>> I've tried simply copying the entire rte_sched library from DPDK 19 into
>> DPDK 18. I was able to successfully backport it and resolve all the
>> dependency issues, but it will also take some time to test this approach.
>>
>> > Are you using the dpdk qos sample app or your own app?
>>
>> My own app.
>>
>> > What are the packet sizes?
>>
>> The application is used as a BRAS/BNG server, i.e. it provides internet
>> access to residential customers. Therefore packet sizes are typical of
>> internet traffic and vary from 64 to 1500 bytes. Most packets are around
>> 1000 bytes.
>>
>> > Couple of other things for clarification -
>> > 1. At what rate are you injecting traffic into the low bandwidth pipes?
>>
>> Well, the rate varies as well; there could be congestion on some pipes
>> at certain times of day.
>>
>> But the problem is that once the problem occurs on a pipe, or on some
>> queues inside the pipe, the pipe stops transmitting even when the
>> incoming traffic rate is much lower than the pipe's rate.
>>
>> > 2. How is traffic distributed among pipes and their traffic classes?
>>
>> I am using the IPv4 TOS field to choose the TC via a tos2tc map.
>> Most of my traffic has a TOS value of 0, which is mapped to TC3 inside
>> my app.
>>
>> Recently I've switched to a tos2tc map which maps all traffic to TC3,
>> to see if it solves the problem.
>>
>> Packet distribution to queues is done using the formula
>> (ipv4.src + ipv4.dst) & 3.
>>
>> > 3. Can you try putting your own counters on those pipes' queues which
>> > periodically show the number of packets in the queues, to understand
>> > the dynamics?
>>
>> I will try.
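
For context, the classification described above boils down to something
like the sketch below. It is illustrative only: the table contents and
function name are placeholders, not the application's real code; the
resulting pair is what would then be written into the mbuf with
rte_sched_port_pkt_write() before rte_sched_port_enqueue().

  #include <stdint.h>

  /* tos2tc map: with the current configuration every entry is set to 3 (TC3) */
  static uint8_t tos2tc[256];

  static void
  classify(uint8_t tos, uint32_t src_ip, uint32_t dst_ip,
           uint32_t *tc, uint32_t *queue)
  {
          *tc = tos2tc[tos];              /* TOS -> traffic class lookup */
          *queue = (src_ip + dst_ip) & 3; /* spread flows over the 4 queues of the TC */
          /* (tc, queue), together with subport and pipe, go into the mbuf
           * via rte_sched_port_pkt_write() before rte_sched_port_enqueue() */
  }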
>>
>> P.S.
>>
>> Recently I've run into another problem with the scheduler.
>>
>> After enabling the TC_OV feature, one of the ports stopped transmitting.
>> All of the port's pipes were affected.
>> The port had only one subport, and there were only pipes with the
>> 1 Mbit/s profile.
>> The problem was solved by adding a 10 Mbit/s profile to that port. Only
>> after that did the port's pipes start to transmit.
>> I guess it has something to do with the calculation of tc_ov_wm,
>> since it depends on the maximum pipe rate.
>>
>> I am going to set up a test lab and a test build to reproduce this.

I've run some tests and was able to reproduce the port configuration issue
using a test build of my app.

The tests showed that the TC_OV feature does not work correctly in
DPDK 18.11, but there are workarounds. I still can't reproduce my main
problem, which is that random pipes stop transmitting.

Here are the details:

All tests use the same traffic generator, which produces 10 traffic flows
entering 10 different pipes of port 1, subport 0. Only queue 0 of each
pipe is used. The TX rate is 800 kbit/s, the packet size is 800 bytes,
the pipe rate is 1 Mbit/s and the subport 0 rate is 500 Mbit/s.

###
### test 1
###

Traffic generator is configured to use TC3.

Configuration:

hqos add profile 27 rate 1 M size 1000000 tc period 40
hqos add profile 23 rate 100 M size 1000000 tc period 40

# qos test port
hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
hqos add port 1 subport 0 pipes 2000 profile 27
hqos set port 1 lcore 3

Results:

h5 ~ # rcli sh qos rcv
rcv 0: rx rate 641280, nb pkts 501, ind 1
rcv 1: rx rate 641280, nb pkts 501, ind 1
rcv 2: rx rate 641280, nb pkts 501, ind 1
rcv 3: rx rate 641280, nb pkts 501, ind 1
rcv 4: rx rate 641280, nb pkts 501, ind 1
rcv 5: rx rate 641280, nb pkts 501, ind 1
rcv 6: rx rate 641280, nb pkts 501, ind 1
rcv 7: rx rate 641280, nb pkts 501, ind 1
rcv 8: rx rate 641280, nb pkts 501, ind 1
rcv 9: rx rate 641280, nb pkts 501, ind 1

! BUG !
The RX rate is lower than the expected 800000 bit/s even though there is
no congestion at either the subport or the pipe level.

###
### test 2
###

Traffic generator is configured to use TC3.

!!! Profile 23 has been added to the test port.

Configuration:

hqos add profile 27 rate 1 M size 1000000 tc period 40
hqos add profile 23 rate 100 M size 1000000 tc period 40

# qos test port
hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
hqos add port 1 subport 0 pipes 2000 profile 27
hqos add port 1 subport 0 pipes 200 profile 23
hqos set port 1 lcore 3

Results:

h5 ~ # rcli sh qos rcv
rcv 0: rx rate 798720, nb pkts 624, ind 1
rcv 1: rx rate 798720, nb pkts 624, ind 1
rcv 2: rx rate 798720, nb pkts 624, ind 1
rcv 3: rx rate 798720, nb pkts 624, ind 1
rcv 4: rx rate 798720, nb pkts 624, ind 1
rcv 5: rx rate 798720, nb pkts 624, ind 1
rcv 6: rx rate 798720, nb pkts 624, ind 1
rcv 7: rx rate 798720, nb pkts 624, ind 1
rcv 8: rx rate 798720, nb pkts 624, ind 1
rcv 9: rx rate 798720, nb pkts 624, ind 1

OK. The receive rate matches the expected value.
So, just adding pipes that are not being used solves the problem.
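
A possible explanation for tests 1 and 2, extending the tc_ov_wm guess
above (a rough sketch based on my reading of the 18.11 sources, so treat
the formula as an assumption): the subport's TC3 oversubscription
watermark appears to be capped at a value derived from the subport tc
period and the highest TC3 rate among the pipe profiles in use on that
port. With only 1 Mbit/s pipes that cap is close to a single 824-byte
frame, so roughly one frame per 10 ms period gets through per pipe
(about 640 kbit/s); adding pipes with the 100 Mbit/s profile raises the
maximum pipe TC3 rate seen by the port and the cap stops mattering.

  /*
   * Rough check only; assumes
   *   tc_ov_wm_max = subport_tc_period_ms * max_pipe_tc3_rate_in_bytes / 1000
   * which is how the 18.11 sources appear to size the watermark.
   * This does not call into DPDK.
   */
  #include <stdio.h>
  #include <inttypes.h>
  #include <stdint.h>

  static uint64_t
  wm_max(uint64_t subport_tc_period_ms, uint64_t max_pipe_tc3_bps)
  {
          return (max_pipe_tc3_bps / 8) * subport_tc_period_ms / 1000;
  }

  int main(void)
  {
          uint64_t frame = 800 + 24;  /* test traffic: 800-byte packets + 24 B frame overhead */

          /* test 1: only the 1 Mbit/s profile is in use, subport tc period 10 ms */
          printf("test 1 cap: %" PRIu64 " bytes (~%" PRIu64 " frame per period -> ~640 kbit/s)\n",
                 wm_max(10, 1000000), wm_max(10, 1000000) / frame);

          /* test 2: pipes with the 100 Mbit/s profile are also configured */
          printf("test 2 cap: %" PRIu64 " bytes (no practical limit for an 800 kbit/s flow)\n",
                 wm_max(10, 100000000));
          return 0;
  }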

###
### test 3
###

!!! The traffic generator uses TC0, so tc_ov is not used in this test.
Profile 23 is defined but no pipes are assigned to it.

Configuration:

hqos add profile 27 rate 1 M size 1000000 tc period 40
hqos add profile 23 rate 100 M size 1000000 tc period 40

# qos test port
hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
hqos add port 1 subport 0 pipes 2000 profile 27
hqos set port 1 lcore 3

Results:

h5 ~ # rcli sh qos rcv
rcv 0: rx rate 798720, nb pkts 624, ind 0
rcv 1: rx rate 798720, nb pkts 624, ind 0
rcv 2: rx rate 798720, nb pkts 624, ind 0
rcv 3: rx rate 798720, nb pkts 624, ind 0
rcv 4: rx rate 798720, nb pkts 624, ind 0
rcv 5: rx rate 798720, nb pkts 624, ind 0
rcv 6: rx rate 798720, nb pkts 624, ind 0
rcv 7: rx rate 798720, nb pkts 624, ind 0
rcv 8: rx rate 798720, nb pkts 624, ind 0
rcv 9: rx rate 798720, nb pkts 624, ind 0

OK. The receive rate matches the expected value.

###
### test 4
###

Traffic generator is configured to use TC3.
No profile 23.
!! The subport tc period has been changed from 10 to 5.

Configuration:

hqos add profile 27 rate 1 M size 1000000 tc period 40
hqos add profile 23 rate 100 M size 1000000 tc period 40

# qos test port
hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
hqos add port 1 subport 0 pipes 2000 profile 27
hqos set port 1 lcore 3

Results:

rcv 0: rx rate 0, nb pkts 0, ind 1
rcv 1: rx rate 0, nb pkts 0, ind 1
rcv 2: rx rate 0, nb pkts 0, ind 1
rcv 3: rx rate 0, nb pkts 0, ind 1
rcv 4: rx rate 0, nb pkts 0, ind 1
rcv 5: rx rate 0, nb pkts 0, ind 1
rcv 6: rx rate 0, nb pkts 0, ind 1
rcv 7: rx rate 0, nb pkts 0, ind 1
rcv 8: rx rate 0, nb pkts 0, ind 1
rcv 9: rx rate 0, nb pkts 0, ind 1

! Zero traffic.

###
### test 5
###

Traffic generator is configured to use TC3.
Profile 23 is enabled.
The subport tc period has been changed from 10 to 5.

Configuration:

hqos add profile 27 rate 1 M size 1000000 tc period 40
hqos add profile 23 rate 100 M size 1000000 tc period 40

# qos test port
hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
hqos add port 1 subport 0 pipes 2000 profile 27
hqos add port 1 subport 0 pipes 200 profile 23
hqos set port 1 lcore 3

Results:

h5 ~ # rcli sh qos rcv
rcv 0: rx rate 800000, nb pkts 625, ind 1
rcv 1: rx rate 800000, nb pkts 625, ind 1
rcv 2: rx rate 800000, nb pkts 625, ind 1
rcv 3: rx rate 800000, nb pkts 625, ind 1
rcv 4: rx rate 800000, nb pkts 625, ind 1
rcv 5: rx rate 800000, nb pkts 625, ind 1
rcv 6: rx rate 800000, nb pkts 625, ind 1
rcv 7: rx rate 800000, nb pkts 625, ind 1
rcv 8: rx rate 800000, nb pkts 625, ind 1
rcv 9: rx rate 800000, nb pkts 625, ind 1

OK.

>
> Does this problem exist when you disable the oversubscription mode?
> It is worth looking at the grinder_tc_ov_credits_update() and
> grinder_credits_update() functions, where tc_ov_wm is altered.
>
>> >
>> > Thanks,
>> > Jasvinder
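
Under the same assumption as the sketch after test 2 (watermark cap of
roughly subport tc period times the maximum pipe TC3 rate in bytes), the
tc period 5 results also appear to line up:

  test 4: 5 ms * 125000 B/s   =   625 bytes  <  824-byte frame  -> zero traffic
  test 5: 5 ms * 12500000 B/s = 62500 bytes                     -> full 800 kbit/s

If that reading of the code is right, the oversubscription watermark never
grows large enough to admit even one frame in test 4, which would be
consistent with tc_ov_wm being clamped by the maximum pipe TC3 rate and is
worth checking in grinder_tc_ov_credits_update().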