From: "Singh, Jasvinder" <jasvinder.singh@intel.com>
To: Alex Kiselev <alex@therouter.net>
Cc: "users@dpdk.org" <users@dpdk.org>,
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
"Dharmappa, Savinay" <savinay.dharmappa@intel.com>
Subject: Re: [dpdk-users] scheduler issue
Date: Mon, 7 Dec 2020 11:32:55 +0000
Message-ID: <CY4PR1101MB2134CDBBF15FCC39AC068B23E0CE0@CY4PR1101MB2134.namprd11.prod.outlook.com>
In-Reply-To: <7e314aa3562c380a573781a4c0562b93@therouter.net>
> -----Original Message-----
> From: Alex Kiselev <alex@therouter.net>
> Sent: Monday, December 7, 2020 10:46 AM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>
> Cc: users@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>;
> Dharmappa, Savinay <savinay.dharmappa@intel.com>
> Subject: Re: [dpdk-users] scheduler issue
>
> On 2020-12-07 11:00, Singh, Jasvinder wrote:
> >> -----Original Message-----
> >> From: users <users-bounces@dpdk.org> On Behalf Of Alex Kiselev
> >> Sent: Friday, November 27, 2020 12:12 PM
> >> To: users@dpdk.org
> >> Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
> >> Subject: Re: [dpdk-users] scheduler issue
> >>
> >> On 2020-11-25 16:04, Alex Kiselev wrote:
> >> > On 2020-11-24 16:34, Alex Kiselev wrote:
> >> >> Hello,
> >> >>
> >> >> I am facing a problem with the scheduler library in DPDK 18.11.10
> >> >> with default scheduler settings (RED is off).
> >> >> It seems that some of the pipes (last time it was 4 out of 600
> >> >> pipes) start incorrectly dropping most of the traffic after a
> >> >> couple of days of successful work.
> >> >>
> >> >> So far I've checked that there are no mbuf leaks or any other
> >> >> errors in my code, and I am sure that traffic enters the problematic pipes.
> >> >> Also, switching the traffic at runtime to pipes of another port
> >> >> restores the traffic flow.
> >> >>
> >> >> How do I approach debugging this issue?
> >> >>
> >> >> I've added calls to rte_sched_queue_read_stats(), but it doesn't give
> >> >> me counters that accumulate values (packet drops, for example); it
> >> >> gives me some kind of current values, and after a couple of seconds
> >> >> those values are reset to zero, so I can say nothing based on that API.
> >> >>
> >> >> I would appreciate any ideas and help.
> >> >> Thanks.
> >> >
> >> > The problematic pipes had a very low bandwidth limit (1 Mbit/s), and
> >> > there is also an oversubscription configuration at subport 0 of
> >> > port 13, to which those pipes belong, and
> >> > CONFIG_RTE_SCHED_SUBPORT_TC_OV is disabled.
> >> >
> >> > Could congestion at that subport be the cause of the problem?
> >> >
> >> > How much overhead and performance degradation will enabling the
> >> > CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
> >> >
> >> > Configuration:
> >> >
> >> > #
> >> > # QoS Scheduler Profiles
> >> > #
> >> > hqos add profile 1 rate 8 K size 1000000 tc period 40
> >> > hqos add profile 2 rate 400 K size 1000000 tc period 40
> >> > hqos add profile 3 rate 600 K size 1000000 tc period 40
> >> > hqos add profile 4 rate 800 K size 1000000 tc period 40
> >> > hqos add profile 5 rate 1 M size 1000000 tc period 40
> >> > hqos add profile 6 rate 1500 K size 1000000 tc period 40
> >> > hqos add profile 7 rate 2 M size 1000000 tc period 40
> >> > hqos add profile 8 rate 3 M size 1000000 tc period 40
> >> > hqos add profile 9 rate 4 M size 1000000 tc period 40
> >> > hqos add profile 10 rate 5 M size 1000000 tc period 40
> >> > hqos add profile 11 rate 6 M size 1000000 tc period 40
> >> > hqos add profile 12 rate 8 M size 1000000 tc period 40
> >> > hqos add profile 13 rate 10 M size 1000000 tc period 40
> >> > hqos add profile 14 rate 12 M size 1000000 tc period 40
> >> > hqos add profile 15 rate 15 M size 1000000 tc period 40
> >> > hqos add profile 16 rate 16 M size 1000000 tc period 40
> >> > hqos add profile 17 rate 20 M size 1000000 tc period 40
> >> > hqos add profile 18 rate 30 M size 1000000 tc period 40
> >> > hqos add profile 19 rate 32 M size 1000000 tc period 40
> >> > hqos add profile 20 rate 40 M size 1000000 tc period 40
> >> > hqos add profile 21 rate 50 M size 1000000 tc period 40
> >> > hqos add profile 22 rate 60 M size 1000000 tc period 40
> >> > hqos add profile 23 rate 100 M size 1000000 tc period 40
> >> > hqos add profile 24 rate 25 M size 1000000 tc period 40
> >> > hqos add profile 25 rate 50 M size 1000000 tc period 40
> >> >
> >> > #
> >> > # Port 13
> >> > #
> >> > hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
> >> > hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
> >> > hqos add port 13 subport 0 pipes 3000 profile 2
> >> > hqos add port 13 subport 0 pipes 3000 profile 5
> >> > hqos add port 13 subport 0 pipes 3000 profile 6
> >> > hqos add port 13 subport 0 pipes 3000 profile 7
> >> > hqos add port 13 subport 0 pipes 3000 profile 9
> >> > hqos add port 13 subport 0 pipes 3000 profile 11
> >> > hqos set port 13 lcore 5
> >>
> >> I've enabled the TC_OV feature and redirected most of the traffic to TC3.
> >> But the issue still exists.
> >>
> >> Below are the queue statistics of one of the problematic pipes.
> >> Almost all of the traffic entering the pipe is dropped.
> >>
> >> And the pipe is also configured with the 1 Mbit/s profile.
> >> So, the issue is only with very low bandwidth pipe profiles.
> >>
> >> And this time there was no congestion on the subport.
> >>
> >>
> >> Egress qdisc
> >> dir 0
> >> rate 1M
> >> port 6, subport 0, pipe_id 138, profile_id 5
> >> tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
> >> tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
> >> tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
> >> tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
> >> tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
> >> tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
> >
> >
> > Hi Alex,
> >
> > Can you try a newer version of the library, say DPDK 20.11?
>
> Right now, no: switching to another DPDK version would take a lot of time
> because I am using a lot of custom patches.
>
> I've tried simply copying the entire rte_sched lib from DPDK 19 to DPDK 18.
> I was able to backport it successfully and resolve all dependency issues, but
> testing this approach will also take some time.
>
>
> > Are you
> > using dpdk qos sample app or your own app?
>
> My own app.
>
> > What are the packet sizes?
>
> The application is used as a BRAS/BNG server, so it provides internet
> access to residential customers. Therefore packet sizes are typical of
> internet traffic and vary from 64 to 1500 bytes. Most of the packets are
> around 1000 bytes.
>
> >
> > A couple of other things for clarification: 1. At what rate are you
> > injecting the traffic into the low bandwidth pipes?
>
> Well, the rate varies too; there could be congestion on some pipes at
> certain times.
>
> But the problem is that once the problem occurs at a pipe or at some queues
> inside the pipe, the pipe stops transmitting even when the incoming traffic
> rate is much lower than the pipe's rate.
>
> > 2. How is traffic distributed among pipes and their traffic class?
>
> I am using the IPv4 TOS field to choose the TC, and there is a tos2tc map.
> Most of my traffic has a TOS value of 0, which is mapped to TC3 inside my app.
>
> Recently I've switched to a tos2tc map that maps all traffic to TC3 to see
> if it solves the problem.
>
> Packet distribution to queues is done using the formula
> (ipv4.src + ipv4.dst) & 3
>
> > 3. Can you try putting your own counters on those pipes' queues that
> > periodically show the number of packets in the queues, to understand
> > the dynamics?
>
> I will try.
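
Since the values returned by rte_sched_queue_read_stats() are reported above to reset between reads, one option is to keep the running totals on the application side. Below is a minimal sketch, not taken from any DPDK code: struct queue_sample, struct queue_totals and both helpers are hypothetical names, and the sample fields only mirror the counters printed in the statistics quoted earlier. It also assumes that the "pkts" counter covers accepted packets only, so that offered load is pkts plus pkts dropped.

```c
#include <stdint.h>

/* Hypothetical copy of what one stats read returns for one queue. */
struct queue_sample {
	uint32_t n_pkts;          /* packets accepted since the last read */
	uint32_t n_pkts_dropped;  /* packets dropped since the last read */
	uint32_t n_bytes;         /* bytes accepted since the last read */
	uint32_t n_bytes_dropped; /* bytes dropped since the last read */
	uint16_t qlen;            /* queue occupancy at the time of the read */
};

/* Application-side totals that never reset. */
struct queue_totals {
	uint64_t n_pkts;
	uint64_t n_pkts_dropped;
	uint64_t n_bytes;
	uint64_t n_bytes_dropped;
	uint16_t last_qlen; /* occupancy seen at the most recent read */
	uint16_t max_qlen;  /* highest occupancy seen so far */
};

/* Fold one per-read sample into the running totals. */
static void
queue_totals_add(struct queue_totals *t, const struct queue_sample *s)
{
	t->n_pkts += s->n_pkts;
	t->n_pkts_dropped += s->n_pkts_dropped;
	t->n_bytes += s->n_bytes;
	t->n_bytes_dropped += s->n_bytes_dropped;
	t->last_qlen = s->qlen;
	if (s->qlen > t->max_qlen)
		t->max_qlen = s->qlen;
}

/* Fraction of offered packets that were dropped (0.0 if nothing was offered). */
static double
queue_totals_drop_ratio(const struct queue_totals *t)
{
	uint64_t offered = t->n_pkts + t->n_pkts_dropped;

	if (offered == 0)
		return 0.0;
	return (double)t->n_pkts_dropped / (double)offered;
}
```

Calling queue_totals_add() after every stats read for each queue of a suspect pipe, and logging last_qlen alongside the totals, would show whether the queue stays full while the dequeue side is idle.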
>
> P.S.
>
> Recently I've hit another problem with the scheduler.
>
> After enabling the TC_OV feature, one of the ports stopped transmitting.
> All of the port's pipes were affected.
> The port had only one subport, and there were only pipes with the 1 Mbit/s
> profile. The problem was solved by adding a 10 Mbit/s profile to that port.
> Only after that did the port's pipes start to transmit.
> I guess it has something to do with calculating tc_ov_wm, as it depends on
> the maximum pipe rate.
>
> I am going to set up a test lab and a test build to reproduce this.
Does this problem exist when you disable oversubscription mode? It is worth looking at the grinder_tc_ov_credits_update() and grinder_credits_update() functions, where tc_ov_wm is altered.
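
A quick way to sanity-check the guess about the maximum pipe rate is to compare the two bounds between which tc_ov_wm would be clamped. The sketch below rests on two assumptions that should be verified against rte_sched.c in the DPDK version in use: that the lower bound equals the port MTU, and that the upper bound is the number of bytes the fastest pipe can send in one subport tc period. The helper names are hypothetical, and the numbers are the port 13 values quoted above (mtu 1522, subport tc period 10, rates in bit/s); the settings of the port that stopped transmitting are not given in this thread.

```c
#include <stdint.h>

/* Bytes that can be sent in one period at the given rate (bit/s, ms). */
static uint64_t
bytes_per_period(uint64_t rate_bits_per_s, uint64_t period_ms)
{
	return rate_bits_per_s / 8 * period_ms / 1000;
}

/*
 * Assumed bounds (verify against rte_sched.c):
 *   lower bound = port MTU in bytes
 *   upper bound = bytes sent in one tc period by the fastest pipe
 * Returns 1 when the upper bound is not below the lower bound, else 0.
 */
static int
tc_ov_wm_bounds_ok(uint64_t max_pipe_rate_bits_per_s,
		   uint64_t tc_period_ms, uint64_t mtu_bytes)
{
	uint64_t wm_min = mtu_bytes;
	uint64_t wm_max = bytes_per_period(max_pipe_rate_bits_per_s,
					   tc_period_ms);

	return wm_max >= wm_min;
}
```

Under these assumptions, a port whose fastest pipe is 1 Mbit/s with a 10 ms period gets an upper bound of 1250 bytes, which is below the 1522-byte MTU, while adding a 10 Mbit/s profile raises it to 12500 bytes. That would be consistent with the port recovering once the 10 Mbit/s profile was added, but it needs to be confirmed in the code.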
> >
> > Thanks,
> > Jasvinder