From: Alex Kiselev
To: users@dpdk.org
Cc: cristian.dumitrescu@intel.com
Date: Fri, 27 Nov 2020 13:11:39 +0100
Message-ID: <090256f7b7a6739f80353be3339fd062@therouter.net>
List-Id: DPDK usage discussions
Subject: Re: [dpdk-users] scheduler issue

On 2020-11-25 16:04, Alex Kiselev wrote:
> On 2020-11-24 16:34, Alex Kiselev wrote:
>> Hello,
>>
>> I am facing a problem with the scheduler library DPDK 18.11.10 with
>> default scheduler settings (RED is off).
>> It seems like some of the pipes (last time it was 4 out of 600 pipes)
>> start incorrectly dropping most of the traffic after a couple of days
>> of successful work.
>>
>> So far I've checked that there are no mbuf leaks or any
>> other errors in my code, and I am sure that traffic enters the
>> problematic pipes.
>> Also, switching the traffic at runtime to pipes of another port
>> restores the traffic flow.
>>
>> How do I approach debugging this issue?
>>
>> I've added a call to rte_sched_queue_read_stats(), but it doesn't give
>> me counters that accumulate values (packet drops, for example);
>> it gives me some kind of current values, and after a couple of seconds
>> those values are reset to zero, so I can say nothing based on that
>> API.
>>
>> I would appreciate any ideas and help.
>> Thanks.
>
> The problematic pipes had a very low bandwidth limit (1 Mbit/s), and
> there is also an oversubscribed configuration at subport 0
> of port 13, to which those pipes belong, and
> CONFIG_RTE_SCHED_SUBPORT_TC_OV is disabled.
>
> Could congestion at that subport be the reason for the problem?
>
> How much overhead and performance degradation would enabling the
> CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
>
> Configuration:
>
> #
> # QoS Scheduler Profiles
> #
> hqos add profile 1 rate 8 K size 1000000 tc period 40
> hqos add profile 2 rate 400 K size 1000000 tc period 40
> hqos add profile 3 rate 600 K size 1000000 tc period 40
> hqos add profile 4 rate 800 K size 1000000 tc period 40
> hqos add profile 5 rate 1 M size 1000000 tc period 40
> hqos add profile 6 rate 1500 K size 1000000 tc period 40
> hqos add profile 7 rate 2 M size 1000000 tc period 40
> hqos add profile 8 rate 3 M size 1000000 tc period 40
> hqos add profile 9 rate 4 M size 1000000 tc period 40
> hqos add profile 10 rate 5 M size 1000000 tc period 40
> hqos add profile 11 rate 6 M size 1000000 tc period 40
> hqos add profile 12 rate 8 M size 1000000 tc period 40
> hqos add profile 13 rate 10 M size 1000000 tc period 40
> hqos add profile 14 rate 12 M size 1000000 tc period 40
> hqos add profile 15 rate 15 M size 1000000 tc period 40
> hqos add profile 16 rate 16 M size 1000000 tc period 40
> hqos add profile 17 rate 20 M size 1000000 tc period 40
> hqos add profile 18 rate 30 M size 1000000 tc period 40
> hqos add profile 19 rate 32 M size 1000000 tc period 40
> hqos add profile 20 rate 40 M size 1000000 tc period 40
> hqos add profile 21 rate 50 M size 1000000 tc period 40
> hqos add profile 22 rate 60 M size 1000000 tc period 40
> hqos add profile 23 rate 100 M size 1000000 tc period 40
> hqos add profile 24 rate 25 M size 1000000 tc period 40
> hqos add profile 25 rate 50 M size 1000000 tc period 40
>
> #
> # Port 13
> #
> hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
> hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
> hqos add port 13 subport 0 pipes 3000 profile 2
> hqos add port 13 subport 0 pipes 3000 profile 5
> hqos add port 13 subport 0 pipes 3000 profile 6
> hqos add port 13 subport 0 pipes 3000 profile 7
> hqos add port 13 subport 0 pipes 3000 profile 9
> hqos add port 13 subport 0 pipes 3000 profile 11
> hqos set port 13 lcore 5

I've
enabled the TC_OV feature and redirected most of the traffic to TC3.
But the issue still exists.

Below are the queue statistics of one of the problematic pipes.
Almost all of the traffic entering the pipe is dropped.

The pipe is also configured with the 1 Mbit/s profile.
So the issue occurs only with very low bandwidth pipe profiles.

And this time there was no congestion on the subport.

Egress qdisc
dir 0
   rate 1M
   port 6, subport 0, pipe_id 138, profile_id 5
   tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
   tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
   tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
   tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
   tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
   tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
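
To put numbers on "almost all of the traffic is dropped", here is a minimal sketch that parses the non-empty queue lines from the dump above and computes the per-queue packet drop ratio. The helper name drop_ratios and the regular expression are illustrative only and not part of any DPDK tool. It also assumes that "pkts" counts only packets that were successfully enqueued, so that offered packets = pkts + pkts dropped.

```python
import re

# Non-empty queue lines copied from the pipe statistics above.
STATS = """\
tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
"""

# Matches one statistics line in the format printed above.
LINE_RE = re.compile(
    r"tc (\d+), queue (\d+): bytes (\d+), bytes dropped (\d+), "
    r"pkts (\d+), pkts dropped (\d+)"
)


def drop_ratios(text):
    """Return {(tc, queue): fraction of offered packets that were dropped}."""
    ratios = {}
    for match in LINE_RE.finditer(text):
        tc, queue, _bytes, _bytes_dropped, pkts, pkts_dropped = map(
            int, match.groups()
        )
        # Assumption: "pkts" excludes dropped packets.
        offered = pkts + pkts_dropped
        ratios[(tc, queue)] = pkts_dropped / offered if offered else 0.0
    return ratios


if __name__ == "__main__":
    for (tc, queue), ratio in sorted(drop_ratios(STATS).items()):
        print(f"tc {tc}, queue {queue}: {ratio:.1%} of packets dropped")
```

Under that assumption, the four TC3 queues lose roughly 73% to 96% of the packets offered to them, while the TC0 queue drops nothing.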