Date: Mon, 07 Dec 2020 11:46:12 +0100
From: Alex Kiselev <alex@therouter.net>
To: "Singh, Jasvinder"
Cc: users@dpdk.org, "Dumitrescu, Cristian", "Dharmappa, Savinay"
Message-ID: <7e314aa3562c380a573781a4c0562b93@therouter.net>
Subject: Re: [dpdk-users] scheduler issue

On 2020-12-07 11:00, Singh, Jasvinder wrote:
>> -----Original Message-----
>> From: users On Behalf Of Alex Kiselev
>> Sent: Friday, November 27, 2020 12:12 PM
>> To: users@dpdk.org
>> Cc: Dumitrescu, Cristian
>> Subject: Re: [dpdk-users] scheduler issue
>>
>> On 2020-11-25 16:04, Alex Kiselev wrote:
>> > On 2020-11-24 16:34, Alex Kiselev wrote:
>> >> Hello,
>> >>
>> >> I am facing a problem with the scheduler library of DPDK 18.11.10
>> >> with default scheduler settings (RED is off).
>> >> It seems like some of the pipes (last time it was 4 out of 600
>> >> pipes) start incorrectly dropping most of the traffic after a
>> >> couple of days of successful operation.
>> >>
>> >> So far I've checked that there are no mbuf leaks or any other
>> >> errors in my code, and I am sure that traffic enters the
>> >> problematic pipes. Also, switching traffic at runtime to pipes of
>> >> another port restores the traffic flow.
>> >>
>> >> How do I approach debugging this issue?
>> >>
>> >> I've added calls to rte_sched_queue_read_stats(), but it doesn't
>> >> give me counters that accumulate values (packet drops, for
>> >> example); it gives me some kind of current values which, after a
>> >> couple of seconds, are reset to zero, so I can say nothing based
>> >> on that API.
>> >>
>> >> I would appreciate any ideas and help.
>> >> Thanks.
>> >
>> > The problematic pipes had a very low bandwidth limit (1 Mbit/s),
>> > and there is an oversubscription configuration at subport 0 of
>> > port 13, to which those pipes belong, while
>> > CONFIG_RTE_SCHED_SUBPORT_TC_OV is disabled.
>> >
>> > Could congestion at that subport be the reason of the problem?
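Coming back to the counters mentioned above: as far as I can tell,
rte_sched_queue_read_stats() copies the queue counters and then clears
them, so each call returns the delta since the previous call rather
than a running total, which would explain the values resetting to
zero. Keeping totals therefore has to be done on the caller side.
A minimal sketch against the 18.11 API (the accumulator struct and
helper name are mine):

#include <inttypes.h>
#include <stdio.h>
#include <rte_sched.h>

/* Application-side running totals; the library clears its per-queue
 * counters on every rte_sched_queue_read_stats() call. */
struct queue_stats_acc {
	uint64_t n_pkts;
	uint64_t n_pkts_dropped;
	uint64_t n_bytes;
	uint64_t n_bytes_dropped;
};

/* Read one queue and fold the returned delta into the accumulator. */
static int
queue_stats_poll(struct rte_sched_port *port, uint32_t queue_id,
		 struct queue_stats_acc *acc)
{
	struct rte_sched_queue_stats stats;
	uint16_t qlen;
	int ret = rte_sched_queue_read_stats(port, queue_id, &stats, &qlen);

	if (ret != 0)
		return ret;

	acc->n_pkts += stats.n_pkts;
	acc->n_pkts_dropped += stats.n_pkts_dropped;
	acc->n_bytes += stats.n_bytes;
	acc->n_bytes_dropped += stats.n_bytes_dropped;

	printf("queue %" PRIu32 ": qlen %u, pkts %" PRIu64
	       ", drops %" PRIu64 "\n",
	       queue_id, qlen, acc->n_pkts, acc->n_pkts_dropped);
	return 0;
}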
>> >
>> > How much overhead and performance degradation would enabling the
>> > CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
>> >
>> > Configuration:
>> >
>> > #
>> > # QoS Scheduler Profiles
>> > #
>> > hqos add profile 1 rate 8 K size 1000000 tc period 40
>> > hqos add profile 2 rate 400 K size 1000000 tc period 40
>> > hqos add profile 3 rate 600 K size 1000000 tc period 40
>> > hqos add profile 4 rate 800 K size 1000000 tc period 40
>> > hqos add profile 5 rate 1 M size 1000000 tc period 40
>> > hqos add profile 6 rate 1500 K size 1000000 tc period 40
>> > hqos add profile 7 rate 2 M size 1000000 tc period 40
>> > hqos add profile 8 rate 3 M size 1000000 tc period 40
>> > hqos add profile 9 rate 4 M size 1000000 tc period 40
>> > hqos add profile 10 rate 5 M size 1000000 tc period 40
>> > hqos add profile 11 rate 6 M size 1000000 tc period 40
>> > hqos add profile 12 rate 8 M size 1000000 tc period 40
>> > hqos add profile 13 rate 10 M size 1000000 tc period 40
>> > hqos add profile 14 rate 12 M size 1000000 tc period 40
>> > hqos add profile 15 rate 15 M size 1000000 tc period 40
>> > hqos add profile 16 rate 16 M size 1000000 tc period 40
>> > hqos add profile 17 rate 20 M size 1000000 tc period 40
>> > hqos add profile 18 rate 30 M size 1000000 tc period 40
>> > hqos add profile 19 rate 32 M size 1000000 tc period 40
>> > hqos add profile 20 rate 40 M size 1000000 tc period 40
>> > hqos add profile 21 rate 50 M size 1000000 tc period 40
>> > hqos add profile 22 rate 60 M size 1000000 tc period 40
>> > hqos add profile 23 rate 100 M size 1000000 tc period 40
>> > hqos add profile 24 rate 25 M size 1000000 tc period 40
>> > hqos add profile 25 rate 50 M size 1000000 tc period 40
>> >
>> > #
>> > # Port 13
>> > #
>> > hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>> > hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
>> > hqos add port 13 subport 0 pipes 3000 profile 2
>> > hqos add port 13 subport 0 pipes 3000 profile 5
>> > hqos add port 13 subport 0 pipes 3000 profile 6
>> > hqos add port 13 subport 0 pipes 3000 profile 7
>> > hqos add port 13 subport 0 pipes 3000 profile 9
>> > hqos add port 13 subport 0 pipes 3000 profile 11
>> > hqos set port 13 lcore 5
>>
>> I've enabled the TC_OV feature and redirected most of the traffic to
>> TC3, but the issue still exists.
>>
>> Below are the queue statistics of one of the problematic pipes.
>> Almost all of the traffic entering the pipe is dropped.
>>
>> The pipe is also configured with the 1 Mbit/s profile, so the issue
>> shows up only with very low bandwidth pipe profiles.
>>
>> And this time there was no congestion on the subport.
>>
>> Egress qdisc
>> dir 0
>> rate 1M
>> port 6, subport 0, pipe_id 138, profile_id 5
>> tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
>> tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>> tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
>> tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
>> tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
>> tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
>
> Hi Alex,
>
> Can you try a newer version of the library, say dpdk 20.11?

Right now, no, since switching to another DPDK version would take a
lot of time: I am carrying a lot of custom patches.

I've tried simply copying the entire rte_sched lib from DPDK 19 into
DPDK 18. I was able to backport it and resolve all the dependency
issues, but it will also take some time to test this approach.

> Are you using the dpdk qos sample app or your own app?

My own app.

> What are the packet sizes?

The application is used as a BRAS/BNG server, i.e. it provides
internet access to residential customers, so the packet sizes are
typical of internet traffic and vary from 64 to 1500 bytes. Most of
the packets are around 1000 bytes.

> Couple of other things for clarification -
>
> 1. At what rate are you injecting traffic into the low bandwidth
> pipes?

Well, the rate varies as well; there can be congestion on some pipes
at certain times of day. But the problem is that once the problem
occurs on a pipe, or on some queues inside the pipe, the pipe stops
transmitting even when the incoming traffic rate is much lower than
the pipe's rate.

> 2. How is traffic distributed among pipes and their traffic classes?

I am using the IPv4 TOS field to choose the TC via a tos2tc map. Most
of my traffic has TOS value 0, which is mapped to TC3 inside my app.
Recently I've switched to a tos2tc map that maps all traffic to TC3
to see if that solves the problem.

Packet distribution to queues is done using the formula
(ipv4.src + ipv4.dst) & 3.

> 3. Can you try putting your own counters on those pipes' queues,
> which periodically show the #packets in the queues, to understand
> the dynamics?

I will try; a sketch of what I have in mind is at the end of this
message.

P.S. Recently I ran into another problem with the scheduler. After
enabling the TC_OV feature, one of the ports stopped transmitting,
and all of the port's pipes were affected. The port had only one
subport, and there were only pipes with the 1 Mbit/s profile. The
problem was solved by adding a 10 Mbit/s profile to that port; only
after that did the port's pipes start to transmit. I guess it has
something to do with the calculation of tc_ov_wm, as it depends on
the maximum pipe rate (see the back-of-the-envelope arithmetic at the
end of this message).

I am going to set up a test lab and a test build to reproduce this.

> Thanks,
> Jasvinder
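The sketch mentioned above, re point 3: a small routine that walks the
16 queues of one pipe and prints the instantaneous queue occupancy, to
be called periodically from a control thread. The flat queue index
layout (16 queues per pipe within a subport, tc * 4 + q) is my reading
of the 18.11 sources, so treat it as an assumption; the helper names
are mine.

#include <stdio.h>
#include <rte_sched.h>

/* Assumed 18.11 flat queue numbering within a port:
 * ((subport * n_pipes_per_subport + pipe) << 4) | (tc << 2) | q.
 * Verify against rte_sched.c before relying on it. */
static uint32_t
pipe_queue_id(uint32_t subport, uint32_t n_pipes_per_subport,
	      uint32_t pipe, uint32_t tc, uint32_t q)
{
	return ((subport * n_pipes_per_subport + pipe) << 4) |
		(tc << 2) | q;
}

/* Print the instantaneous occupancy of all 16 queues of one pipe.
 * Note: rte_sched_queue_read_stats() also clears the delta counters,
 * so in a real build this would be merged with the accumulator shown
 * earlier in this message. */
static void
pipe_dump_qlen(struct rte_sched_port *port, uint32_t subport,
	       uint32_t n_pipes_per_subport, uint32_t pipe)
{
	uint32_t tc, q;

	for (tc = 0; tc < 4; tc++) {
		for (q = 0; q < 4; q++) {
			struct rte_sched_queue_stats stats;
			uint16_t qlen = 0;
			uint32_t qid = pipe_queue_id(subport,
					n_pipes_per_subport, pipe, tc, q);

			if (rte_sched_queue_read_stats(port, qid,
					&stats, &qlen) == 0)
				printf("tc %u, queue %u: qlen %u\n",
				       tc, q, qlen);
		}
	}
}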
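And the back-of-the-envelope arithmetic for the tc_ov_wm guess in the
P.S. This assumes, from my reading of rte_sched.c in 18.11 (internals,
not a documented API), that the subport's oversubscription watermark
ceiling is the subport tc_period multiplied by the maximum TC3 rate
over all pipe profiles of the port, while dequeuing a full-size frame
needs MTU plus frame overhead worth of credits:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
	/* Subport tc period from the config above (ms). */
	uint64_t tc_period_ms = 10;
	/* Port MTU plus frame overhead from the config above. */
	uint64_t frame = 1522 + 24;

	/* Max pipe TC3 rate in bytes/s: port with only 1 Mbit/s
	 * profiles vs. the same port after adding a 10 Mbit/s one. */
	uint64_t rate_1m = 1000000 / 8;
	uint64_t rate_10m = 10000000 / 8;

	/* Assumed ceiling: tc_ov_wm_max = tc_period * pipe_tc3_rate_max. */
	uint64_t wm_max_1m = tc_period_ms * rate_1m / 1000;   /* 1250 B */
	uint64_t wm_max_10m = tc_period_ms * rate_10m / 1000; /* 12500 B */

	printf("frame %" PRIu64 " B, wm_max(1M) %" PRIu64
	       " B, wm_max(10M) %" PRIu64 " B\n",
	       frame, wm_max_1m, wm_max_10m);
	return 0;
}

If this reading is right, with only 1 Mbit/s profiles the watermark
ceiling (1250 bytes) is smaller than a single full-size frame (1546
bytes), which would explain the port going completely silent, and
adding the 10 Mbit/s profile raised the ceiling to 12500 bytes, which
would explain why that fixed it. I'll verify this in the test lab.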