DPDK usage discussions
From: Alex Kiselev <alex@therouter.net>
To: "Singh, Jasvinder" <jasvinder.singh@intel.com>
Cc: users@dpdk.org, "Dumitrescu,
	Cristian" <cristian.dumitrescu@intel.com>,
	"Dharmappa, Savinay" <savinay.dharmappa@intel.com>
Subject: Re: [dpdk-users] scheduler issue
Date: Mon, 07 Dec 2020 11:46:12 +0100
Message-ID: <7e314aa3562c380a573781a4c0562b93@therouter.net>
In-Reply-To: <CY4PR1101MB2134727AA8A986E0F5675561E0CE0@CY4PR1101MB2134.namprd11.prod.outlook.com>

On 2020-12-07 11:00, Singh, Jasvinder wrote:
>> -----Original Message-----
>> From: users <users-bounces@dpdk.org> On Behalf Of Alex Kiselev
>> Sent: Friday, November 27, 2020 12:12 PM
>> To: users@dpdk.org
>> Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>> Subject: Re: [dpdk-users] scheduler issue
>> 
>> On 2020-11-25 16:04, Alex Kiselev wrote:
>> > On 2020-11-24 16:34, Alex Kiselev wrote:
>> >> Hello,
>> >>
>> >> I am facing a problem with the scheduler library in DPDK 18.11.10 with
>> >> default scheduler settings (RED is off).
>> >> It seems like some of the pipes (last time it was 4 out of 600 pipes)
>> >> start incorrectly dropping most of their traffic after a couple of days
>> >> of normal operation.
>> >>
>> >> So far I've checked that there are no mbuf leaks or any other errors
>> >> in my code, and I am sure that traffic enters the problematic pipes.
>> >> Also, switching the traffic at runtime to pipes of another port
>> >> restores the traffic flow.
>> >>
>> >> How do I approach debugging this issue?
>> >>
>> >> I've added calls to rte_sched_queue_read_stats(), but it doesn't give
>> >> me counters that accumulate values (packet drops, for example). It
>> >> gives me some kind of current values, and after a couple of seconds
>> >> those values are reset to zero, so I can't conclude anything from
>> >> that API.
>> >>
>> >> I would appreciate any ideas and help.
>> >> Thanks.
>> >
>> > The problematic pipes had a very low bandwidth limit (1 Mbit/s), and
>> > subport 0 of port 13, to which those pipes belong, is also configured
>> > with oversubscription, while CONFIG_RTE_SCHED_SUBPORT_TC_OV is
>> > disabled.
>> >
>> > Could congestion at that subport be the cause of the problem?
>> >
>> > How much overhead and performance degradation would enabling the
>> > CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
>> >
>> > Configuration:
>> >
>> >   #
>> >   # QoS Scheduler Profiles
>> >   #
>> >   hqos add profile  1 rate    8 K size 1000000 tc period 40
>> >   hqos add profile  2 rate  400 K size 1000000 tc period 40
>> >   hqos add profile  3 rate  600 K size 1000000 tc period 40
>> >   hqos add profile  4 rate  800 K size 1000000 tc period 40
>> >   hqos add profile  5 rate    1 M size 1000000 tc period 40
>> >   hqos add profile  6 rate 1500 K size 1000000 tc period 40
>> >   hqos add profile  7 rate    2 M size 1000000 tc period 40
>> >   hqos add profile  8 rate    3 M size 1000000 tc period 40
>> >   hqos add profile  9 rate    4 M size 1000000 tc period 40
>> >   hqos add profile 10 rate    5 M size 1000000 tc period 40
>> >   hqos add profile 11 rate    6 M size 1000000 tc period 40
>> >   hqos add profile 12 rate    8 M size 1000000 tc period 40
>> >   hqos add profile 13 rate   10 M size 1000000 tc period 40
>> >   hqos add profile 14 rate   12 M size 1000000 tc period 40
>> >   hqos add profile 15 rate   15 M size 1000000 tc period 40
>> >   hqos add profile 16 rate   16 M size 1000000 tc period 40
>> >   hqos add profile 17 rate   20 M size 1000000 tc period 40
>> >   hqos add profile 18 rate   30 M size 1000000 tc period 40
>> >   hqos add profile 19 rate   32 M size 1000000 tc period 40
>> >   hqos add profile 20 rate   40 M size 1000000 tc period 40
>> >   hqos add profile 21 rate   50 M size 1000000 tc period 40
>> >   hqos add profile 22 rate   60 M size 1000000 tc period 40
>> >   hqos add profile 23 rate  100 M size 1000000 tc period 40
>> >   hqos add profile 24 rate   25 M size 1000000 tc period 40
>> >   hqos add profile 25 rate   50 M size 1000000 tc period 40
>> >
>> >   #
>> >   # Port 13
>> >   #
>> >   hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>> >   hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
>> >   hqos add port 13 subport 0 pipes 3000 profile 2
>> >   hqos add port 13 subport 0 pipes 3000 profile 5
>> >   hqos add port 13 subport 0 pipes 3000 profile 6
>> >   hqos add port 13 subport 0 pipes 3000 profile 7
>> >   hqos add port 13 subport 0 pipes 3000 profile 9
>> >   hqos add port 13 subport 0 pipes 3000 profile 11
>> >   hqos set port 13 lcore 5
>> 
>> I've enabled the TC_OV feature and redirected most of the traffic to TC3.
>> But the issue still exists.
>> 
>> Below are the queue statistics of one of the problematic pipes.
>> Almost all of the traffic entering the pipe is dropped.
>> 
>> And the pipe is also configured with the 1 Mbit/s profile.
>> So the issue occurs only with very low bandwidth pipe profiles.
>> 
>> And this time there was no congestion on the subport.
>> 
>> 
>> Egress qdisc
>> dir 0
>>    rate 1M
>>    port 6, subport 0, pipe_id 138, profile_id 5
>>    tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
>>    tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>    tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
>>    tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
>>    tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
>>    tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
> 
> 
> Hi Alex,
> 
> Can you try a newer version of the library, say DPDK 20.11?

Not right now, since switching to another DPDK version would take a lot
of time because I am using a lot of custom patches.

I've tried simply copying the entire rte_sched library from DPDK 19 to
DPDK 18, and I was able to backport it successfully and resolve all
dependency issues, but testing this approach will also take some time.


> Are you using the DPDK QoS sample app or your own app?

My own app.

> What are the packet sizes?

The application is used as a BRAS/BNG server, providing internet access
to residential customers. Therefore packet sizes are typical of internet
traffic and vary from 64 to 1500 bytes. Most of the packets are around
1000 bytes.
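For scale, a small arithmetic sketch of how many frames a 1 Mbit/s traffic class can send per TC period with the configuration quoted above (tc period 40 ms, frame overhead 24 bytes, queue size 64). It assumes that "1 M" in the hqos CLI means 10^6 bit/s, that the TC rate equals the pipe rate, and that the scheduler converts a period into credits as period_ms * rate_bytes_per_sec / 1000; the helper names are hypothetical and the token bucket size is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Credits (bytes) accumulated over time_ms at the given byte rate
 * (assumed to match rte_sched's ms-to-bytes conversion). */
static uint64_t ms_to_bytes(uint64_t time_ms, uint64_t rate_bytes_per_sec)
{
    return time_ms * rate_bytes_per_sec / 1000;
}

/* Whole frames that fit into one TC period; each frame is charged its
 * length plus the configured per-frame overhead. */
static uint64_t frames_per_tc_period(uint64_t rate_bits_per_sec,
                                     uint64_t tc_period_ms,
                                     uint64_t frame_len,
                                     uint64_t frame_overhead)
{
    uint64_t cost = frame_len + frame_overhead;

    assert(cost > 0);
    return ms_to_bytes(tc_period_ms, rate_bits_per_sec / 8) / cost;
}

/* Time (ms) needed to drain a full queue of qsize frames at the pipe rate. */
static uint64_t full_queue_drain_ms(uint64_t qsize, uint64_t frame_len,
                                    uint64_t frame_overhead,
                                    uint64_t rate_bits_per_sec)
{
    uint64_t rate_bytes = rate_bits_per_sec / 8;

    assert(rate_bytes > 0);
    return qsize * (frame_len + frame_overhead) * 1000 / rate_bytes;
}
```

Under these assumptions a 1 Mbit/s TC earns 5000 bytes per 40 ms period, i.e. 4 frames of 1000 bytes or 3 of 1500 bytes, and a full 64-packet queue of ~1000-byte packets needs about 524 ms to drain. Bursts above 1 Mbit/s would therefore fill the queue and tail-drop, but a healthy pipe would be expected to recover once the input rate falls.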

> 
> A couple of other things for clarification:
> 1. At what rate are you injecting the traffic into the low bandwidth pipes?

Well, the rate varies too; there could be congestion on some pipes at
certain times of day.

But once the problem occurs at a pipe, or at some queues inside the
pipe, the pipe stops transmitting even when the incoming traffic rate
is much lower than the pipe's rate.

> 2. How is traffic distributed among pipes and their traffic classes?

I am using the IPv4 TOS field to choose the TC via a tos2tc map.
Most of my traffic has a TOS value of 0, which my app maps to TC3.

Recently I've switched to a tos2tc map that maps all traffic to TC3
to see if it solves the problem.

Packets are distributed to queues using the formula
(ipv4.src + ipv4.dst) & 3.
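The classification described above can be sketched as follows. This is a minimal illustration, not the app's actual code: the table name tos2tc, the struct sched_class and the helper functions are hypothetical, and the addresses are taken in whatever byte order the app uses (the chosen queue depends on it).

```c
#include <assert.h>
#include <stdint.h>

#define N_TRAFFIC_CLASSES 4 /* TC0..TC3 in DPDK 18.11 rte_sched */
#define N_QUEUES_PER_TC   4

/* Hypothetical tos2tc table, filled as in the "all traffic to TC3" test. */
static uint8_t tos2tc[256];

static void tos2tc_init_all_tc3(void)
{
    for (int i = 0; i < 256; i++)
        tos2tc[i] = N_TRAFFIC_CLASSES - 1;
}

struct sched_class {
    uint32_t tc;
    uint32_t queue;
};

static struct sched_class classify(uint8_t tos, uint32_t src, uint32_t dst)
{
    struct sched_class c;

    c.tc = tos2tc[tos];
    /* (src + dst) & 3: unsigned addition wraps mod 2^32 and only the low
     * two bits are kept, so the queue is always in 0..3. */
    c.queue = (src + dst) & (N_QUEUES_PER_TC - 1);
    assert(c.tc < N_TRAFFIC_CLASSES);
    return c;
}
```

Because the sum is symmetric and depends only on the address pair, all traffic between one subscriber and one remote host (a single large download, for example) lands in one queue, so the four TC3 queues of a pipe can fill unevenly.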

> 3. Can you try putting your own counters on those pipes' queues, which
> periodically show the number of packets in the queues, to understand
> the dynamics?

I will try.
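rte_sched_queue_read_stats() in DPDK 18.11 appears to copy the queue counters and then clear them, which would explain values that "reset to zero": each call returns only the delta since the previous call, so the application has to accumulate totals itself, and only one place should poll a given queue. A minimal sketch for the 16 queues of one pipe follows. The struct mirrors the counters of struct rte_sched_queue_stats so the sketch builds without DPDK (check the field types in rte_sched.h for your version); the queue index layout is the one assumed from examples/qos_sched and holds when n_pipes_per_subport is a power of two; the stall heuristic and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUES_PER_TC   4
#define TCS_PER_PIPE    4
#define QUEUES_PER_PIPE (TCS_PER_PIPE * QUEUES_PER_TC) /* 16 in DPDK 18.11 */

/* One read-and-clear sample; mirrors the counters of
 * struct rte_sched_queue_stats so the sketch builds without DPDK. */
struct queue_stats_delta {
    uint32_t n_pkts;         /* assumed: packets accepted into the queue */
    uint32_t n_pkts_dropped;
    uint32_t n_bytes;
    uint32_t n_bytes_dropped;
};

/* Running totals kept by the application (hypothetical). */
struct queue_stats_total {
    uint64_t n_pkts;
    uint64_t n_pkts_dropped;
    uint64_t n_bytes;
    uint64_t n_bytes_dropped;
    uint16_t last_qlen;
    uint16_t max_qlen;
    uint32_t stuck_samples; /* consecutive samples: backlog, drops, no accepts */
};

/* Queue index as computed in examples/qos_sched (assumed layout). */
static uint32_t pipe_queue_id(uint32_t subport, uint32_t n_pipes_per_subport,
                              uint32_t pipe, uint32_t tc, uint32_t queue)
{
    assert(pipe < n_pipes_per_subport);
    assert(tc < TCS_PER_PIPE && queue < QUEUES_PER_TC);
    return QUEUES_PER_PIPE * (subport * n_pipes_per_subport + pipe) +
           tc * QUEUES_PER_TC + queue;
}

/* Fold one sample (counters plus current queue length) into the totals. */
static void stats_accumulate(struct queue_stats_total *t,
                             const struct queue_stats_delta *d, uint16_t qlen)
{
    t->n_pkts += d->n_pkts;
    t->n_pkts_dropped += d->n_pkts_dropped;
    t->n_bytes += d->n_bytes;
    t->n_bytes_dropped += d->n_bytes_dropped;

    t->last_qlen = qlen;
    if (qlen > t->max_qlen)
        t->max_qlen = qlen;

    /* Heuristic: a backlog that only drops and accepts nothing over many
     * consecutive samples suggests the queue is not being dequeued. */
    if (qlen > 0 && d->n_pkts == 0 && d->n_pkts_dropped > 0)
        t->stuck_samples++;
    else
        t->stuck_samples = 0;
}
```

In the app, each tick would call rte_sched_queue_read_stats(port, pipe_queue_id(...), &stats, &qlen) for every queue of the pipe, copy the four counters into a queue_stats_delta and call stats_accumulate(). Calling it from the lcore that runs the port is assumed to be the safe choice, since the read is not synchronized with enqueue/dequeue. A queue whose qlen stays at the queue size (64) while n_pkts stays flat and only drops grow is not being served at all, which is different from ordinary congestion.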

P.S.

Recently I've run into another problem with the scheduler.

After enabling the TC_OV feature, one of the ports stopped transmitting.
All of the port's pipes were affected.
The port had only one subport, and all of its pipes used the 1 Mbit/s
profile. The problem was solved by adding a 10 Mbit/s profile to that
port; only after that did the port's pipes start transmitting.
I guess it has something to do with how tc_ov_wm is calculated,
since it depends on the maximum pipe rate.
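The tc_ov_wm guess can be checked with arithmetic. This sketch assumes that DPDK 18.11 sets the subport's tc_ov_wm_max to subport_tc_period_ms * (highest TC3 rate among the port's pipe profiles, in bytes/s) / 1000, sets tc_ov_wm_min to the port MTU, and lets a TC3 packet leave only if its length plus frame overhead fits into the pipe's oversubscription credits (tc_ov_wm * tc_ov_weight). The 10 ms subport tc period is borrowed from port 13's config above, the weight of 1 is hypothetical, and the formulas should be verified against rte_sched.c.

```c
#include <assert.h>
#include <stdint.h>

/* bytes = ms * rate / 1000 (assumed rte_sched ms-to-bytes conversion). */
static uint64_t ms_to_bytes(uint64_t time_ms, uint64_t rate_bytes_per_sec)
{
    return time_ms * rate_bytes_per_sec / 1000;
}

/* Assumed TC_OV watermark ceiling: subport tc period times the highest
 * TC3 rate among the pipe profiles of the port. */
static uint64_t tc_ov_wm_max(uint64_t subport_tc_period_ms,
                             uint64_t max_pipe_tc3_rate_bits)
{
    return ms_to_bytes(subport_tc_period_ms, max_pipe_tc3_rate_bits / 8);
}

/* Assumed credit check: a TC3 frame (length + overhead) can be sent only
 * if it fits into the pipe's oversubscription credits wm * weight. */
static int frame_fits_tc_ov(uint64_t wm, uint8_t tc_ov_weight,
                            uint64_t frame_len, uint64_t frame_overhead)
{
    assert(tc_ov_weight > 0);
    return frame_len + frame_overhead <= wm * tc_ov_weight;
}
```

With only 1 Mbit/s pipes, the ceiling would be 1250 bytes, below both the 1522-byte MTU and a full-size 1514-byte Ethernet frame plus 24 bytes of overhead (1538 bytes), so a full-size TC3 packet at the head of a queue could never be sent; a 10 Mbit/s profile raises the ceiling to 12500 bytes. This is consistent with the observation above, but it is not a confirmed diagnosis.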

I am going to set up a test lab and a test build to reproduce this.

> 
> Thanks,
> Jasvinder


Thread overview: 29+ messages
2020-11-24 13:34 Alex Kiselev
2020-11-25 15:04 ` Alex Kiselev
2020-11-27 12:11   ` Alex Kiselev
2020-12-07 10:00     ` Singh, Jasvinder
2020-12-07 10:46       ` Alex Kiselev [this message]
2020-12-07 11:32         ` Singh, Jasvinder
2020-12-07 12:29           ` Alex Kiselev
2020-12-07 16:49           ` Alex Kiselev
2020-12-07 17:31             ` Singh, Jasvinder
2020-12-07 17:45               ` Alex Kiselev
     [not found]                 ` <49019BC8-DDA6-4B39-B395-2A68E91AB424@intel.com>
     [not found]                   ` <226b13286c876e69ad40a65858131b66@therouter.net>
     [not found]                     ` <4536a02973015dc8049834635f145a19@therouter.net>
     [not found]                       ` <f9a27b6493ae1e1e2850a3b459ab9d33@therouter.net>
     [not found]                         ` <B8241A33-0927-4411-A340-9DD0BEE07968@intel.com>
     [not found]                           ` <e6a0429dc4a1a33861a066e3401e85b6@therouter.net>
2020-12-07 22:16                             ` Alex Kiselev
2020-12-07 22:32                               ` Singh, Jasvinder
2020-12-08 10:52                                 ` Alex Kiselev
2020-12-08 13:24                                   ` Singh, Jasvinder
2020-12-09 13:41                                     ` Alex Kiselev
2020-12-10 10:29                                       ` Singh, Jasvinder
2020-12-11 21:29                                     ` Alex Kiselev
2020-12-11 22:06                                       ` Singh, Jasvinder
2020-12-11 22:27                                         ` Alex Kiselev
2020-12-11 22:36                                           ` Alex Kiselev
2020-12-11 22:55                                           ` Singh, Jasvinder
2020-12-11 23:36                                             ` Alex Kiselev
2020-12-12  0:20                                               ` Singh, Jasvinder
2020-12-12  0:45                                                 ` Alex Kiselev
2020-12-12  0:54                                                   ` Alex Kiselev
2020-12-12  1:45                                                     ` Alex Kiselev
2020-12-12 10:22                                                       ` Singh, Jasvinder
2020-12-12 10:46                                                         ` Alex Kiselev
2020-12-12 17:19                                                           ` Alex Kiselev
