DPDK usage discussions
From: "Singh, Jasvinder" <jasvinder.singh@intel.com>
To: Alex Kiselev <alex@therouter.net>
Cc: "users@dpdk.org" <users@dpdk.org>,
	"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
	"Dharmappa, Savinay" <savinay.dharmappa@intel.com>
Subject: Re: [dpdk-users] scheduler issue
Date: Mon, 7 Dec 2020 22:32:06 +0000
Message-ID: <5FD54115-155E-4492-B1D6-041C8782BB8E@intel.com>
In-Reply-To: <4e5bde1cf78b0f77f4a5ec016a7217d6@therouter.net>



> On 7 Dec 2020, at 22:16, Alex Kiselev <alex@therouter.net> wrote:
> 
> On 2020-12-07 21:34, Alex Kiselev wrote:
>> On 2020-12-07 20:29, Singh, Jasvinder wrote:
>>>> On 7 Dec 2020, at 19:09, Alex Kiselev <alex@therouter.net> wrote:
>>>> On 2020-12-07 20:07, Alex Kiselev wrote:
>>>>>> On 2020-12-07 19:18, Alex Kiselev wrote:
>>>>>> On 2020-12-07 18:59, Singh, Jasvinder wrote:
>>>>>>>> On 7 Dec 2020, at 17:45, Alex Kiselev <alex@therouter.net> wrote:
>>>>>>>> On 2020-12-07 18:31, Singh, Jasvinder wrote:
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Alex Kiselev <alex@therouter.net>
>>>>>>>>>> Sent: Monday, December 7, 2020 4:50 PM
>>>>>>>>>> To: Singh, Jasvinder <jasvinder.singh@intel.com>
>>>>>>>>>> Cc: users@dpdk.org; Dumitrescu, Cristian <cristian.dumitrescu@intel.com>;
>>>>>>>>>> Dharmappa, Savinay <savinay.dharmappa@intel.com>
>>>>>>>>>> Subject: Re: [dpdk-users] scheduler issue
>>>>>>>>>>>> On 2020-12-07 12:32, Singh, Jasvinder wrote:
>>>>>>>>>>> >> -----Original Message-----
>>>>>>>>>>> >> From: Alex Kiselev <alex@therouter.net>
>>>>>>>>>>> >> Sent: Monday, December 7, 2020 10:46 AM
>>>>>>>>>>> >> To: Singh, Jasvinder <jasvinder.singh@intel.com>
>>>>>>>>>>> >> Cc: users@dpdk.org; Dumitrescu, Cristian
>>>>>>>>>>> >> <cristian.dumitrescu@intel.com>; Dharmappa, Savinay
>>>>>>>>>>> >> <savinay.dharmappa@intel.com>
>>>>>>>>>>> >> Subject: Re: [dpdk-users] scheduler issue
>>>>>>>>>>> >>
>>>>>>>>>>> >> On 2020-12-07 11:00, Singh, Jasvinder wrote:
>>>>>>>>>>> >> >> -----Original Message-----
>>>>>>>>>>> >> >> From: users <users-bounces@dpdk.org> On Behalf Of Alex Kiselev
>>>>>>>>>>> >> >> Sent: Friday, November 27, 2020 12:12 PM
>>>>>>>>>>> >> >> To: users@dpdk.org
>>>>>>>>>>> >> >> Cc: Dumitrescu, Cristian <cristian.dumitrescu@intel.com>
>>>>>>>>>>> >> >> Subject: Re: [dpdk-users] scheduler issue
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> On 2020-11-25 16:04, Alex Kiselev wrote:
>>>>>>>>>>> >> >> > On 2020-11-24 16:34, Alex Kiselev wrote:
>>>>>>>>>>> >> >> >> Hello,
>>>>>>>>>>> >> >> >>
>>>>>>>>>>> >> >> >> I am facing a problem with the scheduler library DPDK 18.11.10
>>>>>>>>>>> >> >> >> with default scheduler settings (RED is off).
>>>>>>>>>>> >> >> >> It seems like some of the pipes (last time it was 4 out of 600
>>>>>>>>>>> >> >> >> pipes) start incorrectly dropping most of the traffic after a
>>>>>>>>>>> >> >> >> couple of days of successful work.
>>>>>>>>>>> >> >> >>
>>>>>>>>>>> >> >> >> So far I've checked that there are no mbuf leaks or any other
>>>>>>>>>>> >> >> >> errors in my code, and I am sure that traffic enters the
>>>>>>>>>>> >> >> >> problematic pipes.
>>>>>>>>>>> >> >> >> Also, switching traffic at runtime to pipes of another
>>>>>>>>>>> >> >> >> port restores the traffic flow.
>>>>>>>>>>> >> >> >>
>>>>>>>>>>> >> >> >> How do I approach debugging this issue?
>>>>>>>>>>> >> >> >>
>>>>>>>>>>> >> >> >> I've added calls to rte_sched_queue_read_stats(), but it doesn't
>>>>>>>>>>> >> >> >> give me counters that accumulate values (packet drops, for
>>>>>>>>>>> >> >> >> example); it gives me some kind of current values, and after a
>>>>>>>>>>> >> >> >> couple of seconds those values are reset to zero, so I can
>>>>>>>>>>> >> >> >> conclude nothing from that API.
>>>>>>>>>>> >> >> >>
>>>>>>>>>>> >> >> >> I would appreciate any ideas and help.
>>>>>>>>>>> >> >> >> Thanks.
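
A minimal sketch, assuming the DPDK 18.11 rte_sched API, of how an application could keep its own accumulated drop counters: rte_sched_queue_read_stats() returns the counters gathered since the previous call and then clears them inside the library, which is why the values appear to reset. The helper and struct names below are illustrative, not part of DPDK.

    #include <stdint.h>
    #include <rte_sched.h>

    struct queue_totals {
            uint64_t pkts;
            uint64_t pkts_dropped;
            uint64_t bytes;
            uint64_t bytes_dropped;
    };

    /* Call periodically (e.g. once per second) for each queue of interest;
     * the per-call deltas are added to the application's lifetime counters. */
    static int
    accumulate_queue_stats(struct rte_sched_port *port, uint32_t queue_id,
                           struct queue_totals *tot)
    {
            struct rte_sched_queue_stats stats;
            uint16_t qlen = 0;
            int ret = rte_sched_queue_read_stats(port, queue_id, &stats, &qlen);

            if (ret != 0)
                    return ret;

            tot->pkts          += stats.n_pkts;
            tot->pkts_dropped  += stats.n_pkts_dropped;
            tot->bytes         += stats.n_bytes;
            tot->bytes_dropped += stats.n_bytes_dropped;
            return 0;
    }
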
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> > Problematic pipes had a very low bandwidth limit (1 Mbit/s), and
>>>>>>>>>>> >> >> > there is also an oversubscription configuration at subport 0 of
>>>>>>>>>>> >> >> > port 13, to which those pipes belong, while
>>>>>>>>>>> >> >> > CONFIG_RTE_SCHED_SUBPORT_TC_OV is disabled.
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> > Could congestion at that subport be the reason for the problem?
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> > How much overhead and performance degradation will enabling the
>>>>>>>>>>> >> >> > CONFIG_RTE_SCHED_SUBPORT_TC_OV feature add?
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> > Configuration:
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> >   #
>>>>>>>>>>> >> >> >   # QoS Scheduler Profiles
>>>>>>>>>>> >> >> >   #
>>>>>>>>>>> >> >> >   hqos add profile  1 rate    8 K size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  2 rate  400 K size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  3 rate  600 K size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  4 rate  800 K size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  5 rate    1 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  6 rate 1500 K size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  8 rate    3 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile  9 rate    4 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 10 rate    5 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 11 rate    6 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 12 rate    8 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 13 rate   10 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 14 rate   12 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 15 rate   15 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 16 rate   16 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 17 rate   20 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 18 rate   30 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 19 rate   32 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 20 rate   40 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 21 rate   50 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 22 rate   60 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 23 rate  100 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 24 rate   25 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >   hqos add profile 25 rate   50 M size 1000000 tc period 40
>>>>>>>>>>> >> >> >
>>>>>>>>>>> >> >> >   #
>>>>>>>>>>> >> >> >   # Port 13
>>>>>>>>>>> >> >> >   #
>>>>>>>>>>> >> >> >   hqos add port 13 rate 40 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 rate 1500 M size 1000000 tc period 10
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 pipes 3000 profile 2
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 pipes 3000 profile 5
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 pipes 3000 profile 6
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 pipes 3000 profile 7
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 pipes 3000 profile 9
>>>>>>>>>>> >> >> >   hqos add port 13 subport 0 pipes 3000 profile 11
>>>>>>>>>>> >> >> >   hqos set port 13 lcore 5
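
For reference, a sketch of what a 1 Mbit/s profile such as "hqos add profile 5 rate 1 M size 1000000 tc period 40" roughly corresponds to at the library level, assuming the DPDK 18.11 rte_sched structures (rates there are in bytes per second; the variable name and exact values are illustrative, not the TR code):

    #include <rte_sched.h>

    /* 1 Mbit/s = 125000 bytes/s; tc_period is in milliseconds. */
    static struct rte_sched_pipe_params pipe_profile_1mbit = {
            .tb_rate     = 125000,
            .tb_size     = 1000000,
            .tc_rate     = { 125000, 125000, 125000, 125000 },
            .tc_period   = 40,
    #ifdef RTE_SCHED_SUBPORT_TC_OV
            .tc_ov_weight = 1,      /* TC3 oversubscription weight */
    #endif
            .wrr_weights = { 1, 1, 1, 1, 1, 1, 1, 1,
                             1, 1, 1, 1, 1, 1, 1, 1 },
    };
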
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> I've enabled the TC_OV feature and redirected most of the traffic to TC3.
>>>>>>>>>>> >> >> But the issue still exists.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Below are the queue statistics of one of the problematic pipes.
>>>>>>>>>>> >> >> Almost all of the traffic entering the pipe is dropped.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> And the pipe is also configured with the 1Mbit/s profile.
>>>>>>>>>>> >> >> So, the issue is only with very low bandwidth pipe profiles.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> And this time there was no congestion on the subport.
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >>
>>>>>>>>>>> >> >> Egress qdisc
>>>>>>>>>>> >> >> dir 0
>>>>>>>>>>> >> >>    rate 1M
>>>>>>>>>>> >> >>    port 6, subport 0, pipe_id 138, profile_id 5
>>>>>>>>>>> >> >>    tc 0, queue 0: bytes 752, bytes dropped 0, pkts 8, pkts dropped 0
>>>>>>>>>>> >> >>    tc 0, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 0, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 0, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 1, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 1, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 1, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 1, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 2, queue 0: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 2, queue 1: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 2, queue 2: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 2, queue 3: bytes 0, bytes dropped 0, pkts 0, pkts dropped 0
>>>>>>>>>>> >> >>    tc 3, queue 0: bytes 56669, bytes dropped 360242, pkts 150, pkts dropped 3749
>>>>>>>>>>> >> >>    tc 3, queue 1: bytes 63005, bytes dropped 648782, pkts 150, pkts dropped 3164
>>>>>>>>>>> >> >>    tc 3, queue 2: bytes 9984, bytes dropped 49704, pkts 128, pkts dropped 636
>>>>>>>>>>> >> >>    tc 3, queue 3: bytes 15436, bytes dropped 107198, pkts 130, pkts dropped 354
>>>>>>>>>>> >> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Hi Alex,
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Can you try a newer version of the library, say DPDK 20.11?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Right now, no, since switching to another DPDK version will take a
>>>>>>>>>>> >> lot of time because I am using a lot of custom patches.
>>>>>>>>>>> >>
>>>>>>>>>>> >> I've tried to simply copy the entire rte_sched lib from DPDK 19 to
>>>>>>>>>>> >> DPDK 18. I was able to successfully backport it and resolve all
>>>>>>>>>>> >> dependency issues, but it will also take some time to test this
>>>>>>>>>>> >> approach.
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >> > Are you using the DPDK QoS sample app or your own app?
>>>>>>>>>>> >>
>>>>>>>>>>> >> My own app.
>>>>>>>>>>> >>
>>>>>>>>>>> >> >> What are the packet sizes?
>>>>>>>>>>> >>
>>>>>>>>>>> >> The application is used as a BRAS/BNG server, so it's used to provide
>>>>>>>>>>> >> internet access to residential customers. Therefore packet sizes are
>>>>>>>>>>> >> typical of the internet and vary from 64 to 1500 bytes. Most of the
>>>>>>>>>>> >> packets are around 1000 bytes.
>>>>>>>>>>> >>
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > A couple of other things for clarification: 1. At what rate are you
>>>>>>>>>>> >> > injecting the traffic into the low bandwidth pipes?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Well, the rate varies as well; there could be congestion on some pipes
>>>>>>>>>>> >> at certain times.
>>>>>>>>>>> >>
>>>>>>>>>>> >> But the problem is that once the issue occurs at a pipe or at some
>>>>>>>>>>> >> queues inside the pipe, the pipe stops transmitting even when the
>>>>>>>>>>> >> incoming traffic rate is much lower than the pipe's rate.
>>>>>>>>>>> >>
>>>>>>>>>>> >> > 2. How is traffic distributed among pipes and their traffic class?
>>>>>>>>>>> >>
>>>>>>>>>>> >> I am using the IPv4 TOS field to choose the TC, and there is a tos2tc map.
>>>>>>>>>>> >> Most of my traffic has a TOS value of 0, which is mapped to TC3 inside
>>>>>>>>>>> >> my app.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Recently I've switched to a tos2tc map which maps all traffic to TC3,
>>>>>>>>>>> >> to see if it solves the problem.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Packet distribution to queues is done using the formula (ipv4.src +
>>>>>>>>>>> >> ipv4.dst) & 3
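
A sketch of a classifier consistent with that mapping (not the actual TheRouter code), written against the DPDK 18.11 API; note that newer DPDK releases changed rte_sched_port_pkt_write() to also take the port pointer and to use enum rte_color, so the call below is version-specific. The tos2tc table contents are an assumption matching the description above.

    #include <rte_byteorder.h>
    #include <rte_ether.h>
    #include <rte_ip.h>
    #include <rte_mbuf.h>
    #include <rte_meter.h>
    #include <rte_sched.h>

    /* TOS-to-TC map; in the setup described above everything ends up in TC3.
     * The range initializer is a GCC/clang extension, which DPDK builds use. */
    static uint8_t tos2tc[256] = { [0 ... 255] = 3 };

    static void
    classify_pkt(struct rte_mbuf *m, uint32_t subport, uint32_t pipe)
    {
            struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(m, struct ipv4_hdr *,
                                                          sizeof(struct ether_hdr));
            uint32_t tc    = tos2tc[ip->type_of_service];
            uint32_t queue = (rte_be_to_cpu_32(ip->src_addr) +
                              rte_be_to_cpu_32(ip->dst_addr)) & 3;

            /* Store subport/pipe/tc/queue in the mbuf for rte_sched_port_enqueue(). */
            rte_sched_port_pkt_write(m, subport, pipe, tc, queue, e_RTE_METER_GREEN);
    }
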
>>>>>>>>>>> >>
>>>>>>>>>>> >> > 3. Can you try putting your own counters on those pipes' queues,
>>>>>>>>>>> >> > which periodically show the number of packets in the queues, to
>>>>>>>>>>> >> > understand the dynamics?
>>>>>>>>>>> >>
>>>>>>>>>>> >> I will try.
>>>>>>>>>>> >>
>>>>>>>>>>> >> P.S.
>>>>>>>>>>> >>
>>>>>>>>>>> >> Recently I've got another problem with scheduler.
>>>>>>>>>>> >>
>>>>>>>>>>> >> After enabling the TC_OV feature, one of the ports stopped transmitting.
>>>>>>>>>>> >> All of the port's pipes were affected.
>>>>>>>>>>> >> The port had only one subport, and there were only pipes with the
>>>>>>>>>>> >> 1 Mbit/s profile.
>>>>>>>>>>> >> The problem was solved by adding a 10 Mbit/s profile to that port. Only
>>>>>>>>>>> >> after that did the port's pipes start to transmit.
>>>>>>>>>>> >> I guess it has something to do with the calculation of tc_ov_wm, as it
>>>>>>>>>>> >> depends on the maximum pipe rate.
>>>>>>>>>>> >>
>>>>>>>>>>> >> I am going to set up a test lab and a test build to reproduce this.
>>>>>>>>>>> I've made some tests and was able to reproduce the port configuration issue
>>>>>>>>>>> using a test build of my app.
>>>>>>>>>>> Tests showed that the TC_OV feature does not work correctly in DPDK
>>>>>>>>>>> 18.11, but there are workarounds.
>>>>>>>>>>> I still can't reproduce my main problem, which is random pipes that
>>>>>>>>>>> stop transmitting.
>>>>>>>>>>> Here are details:
>>>>>>>>>>> All tests use the same test traffic generator that produces
>>>>>>>>>>> 10 traffic flows entering 10 different pipes of port 1 subport 0.
>>>>>>>>>>> Only queue 0 of each pipe is used.
>>>>>>>>>>> TX rate is 800 kbit/s. Packet size is 800 bytes.
>>>>>>>>>>> Pipe rates are 1 Mbit/s. The subport 0 rate is 500 Mbit/s.
>>>>>>>>>>> ###
>>>>>>>>>>> ### test 1
>>>>>>>>>>> ###
>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>> Configuration:
>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>> hqos add profile 23 rate  100 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64
>>>>>>>>>>> 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> Results:
>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>> rcv 0: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 1: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 2: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 3: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 4: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 5: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 6: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 7: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 8: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> rcv 9: rx rate 641280, nb pkts 501, ind 1
>>>>>>>>>>> ! BUG
>>>>>>>>>>> ! RX rate is lower than the expected 800000 bit/s despite there being
>>>>>>>>>>> ! no congestion at either the subport or the pipe level.
>>>>>>>>>> [JS] - Can you elaborate on your scheduler hierarchy?
>>>>>>>>> sure, take a look below at the output
>>>>>>>>> "number of pipes per subport"
>>>>>>>>> The TR application always rounds the total number of pipes per
>>>>>>>>> port up to a power-of-2 value.
>>>>>>>>>> I mean - how many pipes per subport? It has to be a number that can
>>>>>>>>>> be expressed as a power of 2, e.g. 4K, 2K, 1K, etc. At run time, the
>>>>>>>>>> scheduler will scan all the pipes and will process only those which
>>>>>>>>>> have packets in their queues.
>>>>>>>>> Configuration of port 1 with enabled profile 23
>>>>>>>>> h5 ~ # rcli sh hqos ports
>>>>>>>>> hqos scheduler port: 1
>>>>>>>>> lcore_id: 3
>>>>>>>>> socket: 0
>>>>>>>>> rate: 0
>>>>>>>>> mtu: 1522
>>>>>>>>> frame overhead: 24
>>>>>>>>> number of pipes per subport: 4096
>>>>>>>>> pipe profiles: 2
>>>>>>>>>  pipe profile id: 27
>>>>>>>>>  pipe rate: 1000000
>>>>>>>>>  number of pipes: 2000
>>>>>>>>>  pipe pool size: 2000
>>>>>>>>>  number of pipes in use: 0
>>>>>>>>>  pipe profile id: 23
>>>>>>>>>  pipe rate: 100000000
>>>>>>>>>  number of pipes: 200
>>>>>>>>>  pipe pool size: 200
>>>>>>>>>  number of pipes in use: 0
>>>>>>>>> Configuration with only one profile at port 1
>>>>>>>>> hqos scheduler port: 1
>>>>>>>>> lcore_id: 3
>>>>>>>>> socket: 0
>>>>>>>>> rate: 0
>>>>>>>>> mtu: 1522
>>>>>>>>> frame overhead: 24
>>>>>>>>> number of pipes per subport: 2048
>>>>>>>>> pipe profiles: 1
>>>>>>>>>  pipe profile id: 27
>>>>>>>>>  pipe rate: 1000000
>>>>>>>>>  number of pipes: 2000
>>>>>>>>>  pipe pool size: 2000
>>>>>>>>>  number of pipes in use: 0
>>>>>>>>> [JS] What is the meaning of "number of pipes", "pipe pool size", and
>>>>>>>>> "number of pipes in use" (which is zero above)? Does your application
>>>>>>>>> map packet field values to these pipes at run time? Can you give me an
>>>>>>>>> example of the mapping of packet field values to pipe id, tc, and queue?
>>>>>>> Please ignore all information from the outputs above except the
>>>>>>> "number of pipes per subport", since the tests were made with a simple
>>>>>>> test application which is based on TR but doesn't use its production
>>>>>>> QoS logic.
>>>>>>> Tests 1 - 5 were made with a simple test application with very
>>>>>>> straightforward QoS mappings, which I described at the beginning.
>>>>>>> Here they are:
>>>>>>> All tests use the same test traffic generator that produces
>>>>>>> 10 traffic flows entering 10 different pipes (0 - 9) of port 1 subport 0.
>>>>>>> Only queue 0 of each pipe is used.
>>>>>>> TX rate is 800 kbit/s. Packet size is 800 bytes.
>>>>>>> Pipe rates are 1 Mbit/s. The subport 0 rate is 500 Mbit/s.
>>>>>>>>>>> ###
>>>>>>>>>>> ### test 2
>>>>>>>>>>> ###
>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>> !!! profile 23 has been added to the test port.
>>>>>>>>>>> Configuration:
>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>> hqos add profile 23 rate  100 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64
>>>>>>>>>>> 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> Results:
>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>> rcv 0: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 1: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 2: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 3: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 4: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 5: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 6: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 7: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 8: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> rcv 9: rx rate 798720, nb pkts 624, ind 1
>>>>>>>>>>> OK.
>>>>>>>>>>> The received traffic rate is equal to the expected values.
>>>>>>>>>>> So, just adding pipes which are not being used solves the problem.
>>>>>>>>>>> ###
>>>>>>>>>>> ### test 3
>>>>>>>>>>> ###
>>>>>>>>>>> !!! traffic generator uses TC 0, so tc_ov is not being used in this test.
>>>>>>>>>>> profile 23 is not used.
>>>>>>>>>>> Configuration without profile 23.
>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>> hqos add profile 23 rate  100 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64
>>>>>>>>>>> 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 10
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> Results:
>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>> rcv 0: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 1: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 2: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 3: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 4: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 5: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 6: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 7: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 8: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> rcv 9: rx rate 798720, nb pkts 624, ind 0
>>>>>>>>>>> OK.
>>>>>>>>>>> The received traffic rate is equal to the expected values.
>>>>>>>>>>> ###
>>>>>>>>>>> ### test 4
>>>>>>>>>>> ###
>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>> no profile 23.
>>>>>>>>>>> !! subport tc period has been changed from 10 to 5.
>>>>>>>>>>> Configuration:
>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>> hqos add profile 23 rate  100 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64
>>>>>>>>>>> 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> Results:
>>>>>>>>>>> rcv 0: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 1: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 2: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 3: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 4: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 5: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 6: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 7: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 8: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> rcv 9: rx rate 0, nb pkts 0, ind 1
>>>>>>>>>>> ! zero traffic
>>>>>>>>>>> ###
>>>>>>>>>>> ### test 5
>>>>>>>>>>> ###
>>>>>>>>>>> Traffic generator is configured to use TC3.
>>>>>>>>>>> profile 23 is enabled.
>>>>>>>>>>> subport tc period has been changed from 10 to 5.
>>>>>>>>>>> Configuration:
>>>>>>>>>>> hqos add profile 27 rate 1 M size 1000000 tc period 40
>>>>>>>>>>> hqos add profile 23 rate  100 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64
>>>>>>>>>>> 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 27
>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> Results:
>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>> rcv 0: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 1: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 2: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 3: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 4: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 5: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 6: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 7: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 8: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> rcv 9: rx rate 800000, nb pkts 625, ind 1
>>>>>>>>>>> OK
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> > Does this problem exist when you disable oversubscription mode? Worth
>>>>>>>>>>> > looking at grinder_tc_ov_credits_update() and grinder_credits_update()
>>>>>>>>>>> > functions where tc_ov_wm is altered.
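
For what it's worth, a simplified, self-contained illustration (not the actual DPDK source) of the adaptive watermark idea those functions implement: every subport tc_period the watermark is nudged down when TC3 consumption exceeded what the subport could spare for TC3, and nudged up otherwise, clamped between a minimum around the MTU and a maximum derived from the largest pipe TC3 rate, which would be consistent with the guess above that tc_ov_wm depends on the maximum pipe rate. All names and the exact arithmetic below are illustrative only.

    #include <stdint.h>

    /* Simplified tc_ov watermark update, run once per subport tc_period. */
    static uint32_t
    tc_ov_wm_update(uint32_t wm, uint32_t wm_min, uint32_t wm_max,
                    uint32_t tc3_consumed, uint32_t tc3_available)
    {
            if (tc3_consumed > tc3_available) {
                    /* TC3 was oversubscribed: shrink the per-pipe TC3 allowance. */
                    wm -= wm >> 7;
                    if (wm < wm_min)
                            wm = wm_min;
            } else {
                    /* Spare capacity: grow the allowance again. */
                    wm += (wm >> 7) + 1;
                    if (wm > wm_max)
                            wm = wm_max;
            }
            return wm;
    }
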
>>>>>>>>>>> >
>>>>>>>>>>> >
>>>>>>>>>>> >> >
>>>>>>>>>>> >> > Thanks,
>>>>>>>>>>> >> > Jasvinder
>>>>>> Ok, these two new tests show even more clearly that the TC_OV feature is broken.
>>>>>> Test 1 doesn't use TC_OV, and all available subport bandwidth is
>>>>>> distributed to 300 pipes in a very fair way. There are 300 generators
>>>>>> with a tx rate of 1M. They produce 300 traffic flows, each entering its
>>>>>> own pipe (queue 0) of port 1 subport 0.
>>>>>> port 1
>>>>>> subport rate 300 M
>>>>>> Then the application measures the rx rate of each flow after the flow's
>>>>>> traffic leaves the scheduler.
>>>>>> For example, the following line
>>>>>> rcv 284 rx rate 995840  nb pkts 778
>>>>>> shows that the rx rate of the flow with number 284 is 995840 bit/s.
>>>>>> All 300 rx rates are about 995840 bit/s (1 Mbit/s), as expected.
>>> [JS] Maybe try repeating the same test but change the traffic from TC0 to
>>> TC3. See if this works.
>> The second test, with the incorrect results, was already done with TC3.
>>>>> The second test uses the same configuration
>>>>> but uses TC3, so the TC_OV function is being used.
>>>>> And the distribution of traffic in this test is very unfair.
>>>>> Some of the pipes get 875520 bit/s, while some of the pipes get only
>>>>> 604160 bit/s, despite the fact that there is
>>> [JS] Try repeating the test with increased pipe bandwidth, let's say 50 Mbps
>>> or even greater.
>> I increased the pipe rate to 10 Mbit/s and both tests (tc0 and tc3) showed
>> correct and identical results.
>> But then I changed the tests and increased the number of pipes to 600
>> to see how it would work with subport congestion. I added 600 pipes
>> generating 10 Mbit/s each and a 3G subport limit, therefore each pipe should
>> get an equal share, which is about 5 Mbit/s.
>> And the results of both tests (tc0 and tc3) are not very good.
>> The first pipes are getting much more bandwidth than the last ones.
>> The difference is 3x. So, TC_OV is still not working!!
> 
> Decreasing the subport tc_period from 10 to 5 has solved that problem,
> and the scheduler started to distribute subport bandwidth between the 10 Mbit/s
> pipes almost ideally.
> 
[JS] Now, returning to the 1 Mbps pipes situation, try reducing the tc period first at the subport level and then at the pipe level, and see if that helps in getting even traffic across the low bandwidth pipes.
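
For example (illustrative values only, applied to the test configuration quoted above), that would mean something along the lines of:

  # subport tc period 10 -> 5
  hqos add port 1 subport 0 rate 500 M size 1000000 tc period 5
  # pipe profile tc period 40 -> e.g. 20 or 10
  hqos add profile 27 rate 1 M size 1000000 tc period 10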



>> rcv 0   rx rate 7324160 nb pkts 5722
>> rcv 1   rx rate 7281920 nb pkts 5689
>> rcv 2   rx rate 7226880 nb pkts 5646
>> rcv 3   rx rate 7124480 nb pkts 5566
>> rcv 4   rx rate 7324160 nb pkts 5722
>> rcv 5   rx rate 7271680 nb pkts 5681
>> rcv 6   rx rate 7188480 nb pkts 5616
>> rcv 7   rx rate 7150080 nb pkts 5586
>> rcv 8   rx rate 7328000 nb pkts 5725
>> rcv 9   rx rate 7249920 nb pkts 5664
>> rcv 10  rx rate 7188480 nb pkts 5616
>> rcv 11  rx rate 7179520 nb pkts 5609
>> rcv 12  rx rate 7324160 nb pkts 5722
>> rcv 13  rx rate 7208960 nb pkts 5632
>> rcv 14  rx rate 7152640 nb pkts 5588
>> rcv 15  rx rate 7127040 nb pkts 5568
>> rcv 16  rx rate 7303680 nb pkts 5706
>> ....
>> rcv 587 rx rate 2406400 nb pkts 1880
>> rcv 588 rx rate 2406400 nb pkts 1880
>> rcv 589 rx rate 2406400 nb pkts 1880
>> rcv 590 rx rate 2406400 nb pkts 1880
>> rcv 591 rx rate 2406400 nb pkts 1880
>> rcv 592 rx rate 2398720 nb pkts 1874
>> rcv 593 rx rate 2400000 nb pkts 1875
>> rcv 594 rx rate 2400000 nb pkts 1875
>> rcv 595 rx rate 2400000 nb pkts 1875
>> rcv 596 rx rate 2401280 nb pkts 1876
>> rcv 597 rx rate 2401280 nb pkts 1876
>> rcv 598 rx rate 2401280 nb pkts 1876
>> rcv 599 rx rate 2402560 nb pkts 1877
>> rx rate sum 3156416000
> 
> 
> 
>>>> ... despite that there is _NO_ congestion
>>>> at the subport or pipe level.
>>>>> And the subport doesn't use about 42 Mbit/s of the available bandwidth!
>>>>> The only difference between those test configurations is the TC of the
>>>>> generated traffic.
>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV function).
>>>>> So, enabling TC_OV changes the results dramatically.
>>>>> ##
>>>>> ## test1
>>>>> ##
>>>>> hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>> # qos test port
>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>> hqos set port 1 lcore 3
>>>>> port 1
>>>>> subport rate 300 M
>>>>> number of tx flows 300
>>>>> generator tx rate 1M
>>>>> TC 1
>>>>> ...
>>>>> rcv 284 rx rate 995840  nb pkts 778
>>>>> rcv 285 rx rate 995840  nb pkts 778
>>>>> rcv 286 rx rate 995840  nb pkts 778
>>>>> rcv 287 rx rate 995840  nb pkts 778
>>>>> rcv 288 rx rate 995840  nb pkts 778
>>>>> rcv 289 rx rate 995840  nb pkts 778
>>>>> rcv 290 rx rate 995840  nb pkts 778
>>>>> rcv 291 rx rate 995840  nb pkts 778
>>>>> rcv 292 rx rate 995840  nb pkts 778
>>>>> rcv 293 rx rate 995840  nb pkts 778
>>>>> rcv 294 rx rate 995840  nb pkts 778
>>>>> ...
>>>>> sum pipe's rx rate is 298 494 720
>>>>> OK.
>>>>> The subport rate is equally distributed to 300 pipes.
>>>>> ##
>>>>> ##  test 2
>>>>> ##
>>>>> hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>> # qos test port
>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>> hqos set port 1 lcore 3
>>>>> port 1
>>>>> subport rate 300 M
>>>>> number of tx flows 300
>>>>> generator tx rate 1M
>>>>> TC 3
>>>>> h5 ~ # rcli sh qos rcv
>>>>> rcv 0   rx rate 875520  nb pkts 684
>>>>> rcv 1   rx rate 856320  nb pkts 669
>>>>> rcv 2   rx rate 849920  nb pkts 664
>>>>> rcv 3   rx rate 853760  nb pkts 667
>>>>> rcv 4   rx rate 867840  nb pkts 678
>>>>> rcv 5   rx rate 844800  nb pkts 660
>>>>> rcv 6   rx rate 852480  nb pkts 666
>>>>> rcv 7   rx rate 855040  nb pkts 668
>>>>> rcv 8   rx rate 865280  nb pkts 676
>>>>> rcv 9   rx rate 846080  nb pkts 661
>>>>> rcv 10  rx rate 858880  nb pkts 671
>>>>> rcv 11  rx rate 870400  nb pkts 680
>>>>> rcv 12  rx rate 864000  nb pkts 675
>>>>> rcv 13  rx rate 852480  nb pkts 666
>>>>> rcv 14  rx rate 855040  nb pkts 668
>>>>> rcv 15  rx rate 857600  nb pkts 670
>>>>> rcv 16  rx rate 864000  nb pkts 675
>>>>> rcv 17  rx rate 866560  nb pkts 677
>>>>> rcv 18  rx rate 865280  nb pkts 676
>>>>> rcv 19  rx rate 858880  nb pkts 671
>>>>> rcv 20  rx rate 856320  nb pkts 669
>>>>> rcv 21  rx rate 864000  nb pkts 675
>>>>> rcv 22  rx rate 869120  nb pkts 679
>>>>> rcv 23  rx rate 856320  nb pkts 669
>>>>> rcv 24  rx rate 862720  nb pkts 674
>>>>> rcv 25  rx rate 865280  nb pkts 676
>>>>> rcv 26  rx rate 867840  nb pkts 678
>>>>> rcv 27  rx rate 870400  nb pkts 680
>>>>> rcv 28  rx rate 860160  nb pkts 672
>>>>> rcv 29  rx rate 870400  nb pkts 680
>>>>> rcv 30  rx rate 869120  nb pkts 679
>>>>> rcv 31  rx rate 870400  nb pkts 680
>>>>> rcv 32  rx rate 858880  nb pkts 671
>>>>> rcv 33  rx rate 858880  nb pkts 671
>>>>> rcv 34  rx rate 852480  nb pkts 666
>>>>> rcv 35  rx rate 874240  nb pkts 683
>>>>> rcv 36  rx rate 855040  nb pkts 668
>>>>> rcv 37  rx rate 853760  nb pkts 667
>>>>> rcv 38  rx rate 869120  nb pkts 679
>>>>> rcv 39  rx rate 885760  nb pkts 692
>>>>> rcv 40  rx rate 861440  nb pkts 673
>>>>> rcv 41  rx rate 852480  nb pkts 666
>>>>> rcv 42  rx rate 871680  nb pkts 681
>>>>> ...
>>>>> ...
>>>>> rcv 288 rx rate 766720  nb pkts 599
>>>>> rcv 289 rx rate 766720  nb pkts 599
>>>>> rcv 290 rx rate 766720  nb pkts 599
>>>>> rcv 291 rx rate 766720  nb pkts 599
>>>>> rcv 292 rx rate 762880  nb pkts 596
>>>>> rcv 293 rx rate 762880  nb pkts 596
>>>>> rcv 294 rx rate 762880  nb pkts 596
>>>>> rcv 295 rx rate 760320  nb pkts 594
>>>>> rcv 296 rx rate 604160  nb pkts 472
>>>>> rcv 297 rx rate 604160  nb pkts 472
>>>>> rcv 298 rx rate 604160  nb pkts 472
>>>>> rcv 299 rx rate 604160  nb pkts 472
>>>>> rx rate sum 258839040
>>>>> FAILED.
>>>>> The subport rate is NOT distributed equally between the 300 pipes.
>>>>> Some subport bandwidth (about 42 Mbit/s) is not being used!
