From: "Singh, Jasvinder" <jasvinder.singh@intel.com>
To: Alex Kiselev <alex@therouter.net>
Cc: "users@dpdk.org" <users@dpdk.org>,
"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
"Dharmappa, Savinay" <savinay.dharmappa@intel.com>
Subject: Re: [dpdk-users] scheduler issue
Date: Sat, 12 Dec 2020 10:22:08 +0000
Message-ID: <3CC4F951-89A4-4845-9DD0-9982AB1A32AB@intel.com>
In-Reply-To: <7b20d7eec340a8de3843ea60a7166715@therouter.net>
> On 12 Dec 2020, at 01:45, Alex Kiselev <alex@therouter.net> wrote:
>
> On 2020-12-12 01:54, Alex Kiselev wrote:
>>> On 2020-12-12 01:45, Alex Kiselev wrote:
>>> On 2020-12-12 01:20, Singh, Jasvinder wrote:
>>>>> On 11 Dec 2020, at 23:37, Alex Kiselev <alex@therouter.net> wrote:
>>>>> On 2020-12-11 23:55, Singh, Jasvinder wrote:
>>>>> On 11 Dec 2020, at 22:27, Alex Kiselev <alex@therouter.net> wrote:
>>>>>> On 2020-12-11 23:06, Singh, Jasvinder wrote:
>>>>> On 11 Dec 2020, at 21:29, Alex Kiselev <alex@therouter.net> wrote:
>>>>> On 2020-12-08 14:24, Singh, Jasvinder wrote:
>>>>> <snip>
>>>>>> [JS] Now, returning to the 1 Mbit/s pipes situation: try reducing the tc
>>>>>> period first at the subport and then at the pipe level, and see if that
>>>>>> helps in getting even traffic across the low-bandwidth pipes.
>>>>> Reducing the subport tc period from 10 to 5 also solved the problem
>>>>> with 1 Mbit/s pipes.
>>>>> So my second problem has been solved,
>>>>> but the first one, where some low-bandwidth pipes stop
>>>>> transmitting, still remains.
>>>>> I see. Try removing the "pkt_len <= pipe_tc_ov_credits" condition in
>>>>> the grinder_credits_check() code for the oversubscription case, and
>>>>> use "pkt_len <= pipe_tc_credits + pipe_tc_ov_credits" instead.
>>>>> if I do what you suggest, I will get this code
>>>>> enough_credits = (pkt_len <= subport_tb_credits) &&
>>>>> (pkt_len <= subport_tc_credits) &&
>>>>> (pkt_len <= pipe_tb_credits) &&
>>>>> (pkt_len <= pipe_tc_credits) &&
>>>>> (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
>>>>> And this doesn't make sense since if condition pkt_len <=
>>>>> pipe_tc_credits is true
>>>>> then condition (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits) is
>>>>> also always true.
>>>>> [JS] My suggestion is to remove "pkt_len <= pipe_tc_credits" and
>>>>> "pkt_len <= pipe_tc_ov_credits", and use only
>>>>> "pkt_len <= pipe_tc_credits + pipe_tc_ov_credits",
>>>>> while keeping the tc_ov flag on.
>>>>> Your suggestion just turns off TC_OV feature.
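To make the variants under discussion easier to compare, here is a minimal sketch in C. It is not the DPDK source: the struct and the function names (credits, common_ok, check_tc_ov, check_no_tc_ov, check_proposed) are hypothetical, and only the comparisons themselves are taken from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the credit counters named in the thread. */
struct credits {
	uint32_t subport_tb;
	uint32_t subport_tc;
	uint32_t pipe_tb;
	uint32_t pipe_tc;
	uint32_t pipe_tc_ov;
};

/* Conditions shared by every variant. */
static bool common_ok(const struct credits *c, uint32_t pkt_len)
{
	return pkt_len <= c->subport_tb &&
	       pkt_len <= c->subport_tc &&
	       pkt_len <= c->pipe_tb;
}

/* Check with TC_OV enabled, as described in the thread. */
static bool check_tc_ov(const struct credits *c, uint32_t pkt_len)
{
	return common_ok(c, pkt_len) &&
	       pkt_len <= c->pipe_tc &&
	       pkt_len <= c->pipe_tc_ov;
}

/* Check without the extra TC_OV condition. */
static bool check_no_tc_ov(const struct credits *c, uint32_t pkt_len)
{
	return common_ok(c, pkt_len) && pkt_len <= c->pipe_tc;
}

/* Proposed variant: a single comparison against the sum of both counters. */
static bool check_proposed(const struct credits *c, uint32_t pkt_len)
{
	uint64_t sum = (uint64_t)c->pipe_tc + c->pipe_tc_ov;

	return common_ok(c, pkt_len) && pkt_len <= sum;
}
```

With illustrative values pipe_tc = 1000 and pipe_tc_ov = 500, an 800-byte packet is held back only by check_tc_ov, while check_proposed also admits a 1500-byte packet that check_no_tc_ov rejects. That is the "100% of its rate plus some extra" objection raised in the reply.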
>>>>>> I don't see your point.
>>>>>> This new suggestion will also effectively turn off the TC_OV
>>>>>> feature, since
>>>>>> the only effect of enabling TC_OV is adding the additional condition
>>>>>> pkt_len <= pipe_tc_ov_credits
>>>>>> which doesn't allow a pipe to spend more resources than it should.
>>>>>> And in the case of subport congestion a pipe should spend less
>>>>>> than 100% of the pipe's maximum rate.
>>>>>> And you suggest allowing a pipe to spend 100% of its rate plus some
>>>>>> extra.
>>>>>> I guess the effect of this would be an even more unfair distribution
>>>>>> of the subport's bandwidth.
>>>>>> Btw, a pipe might stop transmitting even when there is no
>>>>>> congestion at a subport.
>>>>> Although I didn’t try this solution, the idea here is: in a
>>>>> particular round, if pkt_len is less than pipe_tc_credits (which is
>>>>> a constant value each time) but greater than pipe_tc_ov_credits, then
>>>>> it might hit the situation where no packet is scheduled from the
>>>>> pipe even though fixed credits greater than the packet size are
>>>>> available.
>>>> But that is a perfectly normal situation and that's exactly the idea
>>>> behind TC_OV.
>>>> It means a pipe should wait for the next subport->tc_ov_period_id
>>>> when pipe_tc_ov_credits will be reset to a new value
>>>> But here it’s not guaranteed that new value of pipe_tc_ov_credits
>>>> will be sufficient for low bandwidth pipe to send their packets as
>>>> each time pipe_tc_ov_credits is freshly computed.
>>>>> pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
>>>>> which allows the pipe to continue transmitting.
>>>> No, that won’t happen if the new tc_ov_credits value is again less than
>>>> pkt_len; the pipe will then hit a deadlock.
>>> The new tc_ov_credits cannot be less than subport->tc_ov_wm_min,
>>> and tc_ov_wm_min is equal to port->mtu.
>>> All my scheduler ports are configured with mtu 1522. The etherdev ports also use
>>> the same mtu, so there should be no packets bigger than 1522.
>> Also, tc_ov_credits is set to tc_ov_wm_min only in the case of constant
>> congestion, and today I detected the problem when there was no congestion.
>> So it's highly unlikely that tc_ov_credits is always set to a value
>> less than pkt_size. The only scenario in which this might be the case is
>> when the scheduler port gets a corrupted mbuf with an incorrect pkt_len,
>> which causes a queue deadlock.
>
> Also, a defragmented IPv4 packet (multi-segment mbuf) might have a pkt_len much bigger
> than the scheduler port's MTU, so you are right: there is absolutely no guarantee
> that a packet will not cause a queue deadlock. This explanation sounds very plausible
> to me, and I bet this is my case.
>
But you mentioned earlier that your packet lengths are small and never exceed that threshold. Maybe test with fixed-size packets of 256 or 512 bytes and see whether the same no-transmission situation occurs.
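The stall scenario discussed here can be sketched in a few lines of C. This is an illustration under stated assumptions, not the scheduler code: it uses the refresh rule quoted in the thread (tc_ov_credits = tc_ov_wm * tc_ov_weight), takes the watermark at its floor (tc_ov_wm_min, equal to the port MTU according to the thread), and the function names and the weight of 1 are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Refresh rule quoted in the thread: credits = watermark * weight. */
static uint32_t refreshed_tc_ov_credits(uint32_t tc_ov_wm, uint32_t tc_ov_weight)
{
	return tc_ov_wm * tc_ov_weight;
}

/*
 * Returns true if a head-of-queue packet of pkt_len bytes passes the
 * "pkt_len <= pipe_tc_ov_credits" comparison right after a refresh,
 * when the watermark sits at its floor (assumed equal to the MTU).
 */
static bool can_pass_at_floor(uint32_t pkt_len, uint32_t mtu, uint32_t tc_ov_weight)
{
	return pkt_len <= refreshed_tc_ov_credits(mtu, tc_ov_weight);
}
```

With mtu = 1522 and an assumed weight of 1, packets of 256, 512, or 1522 bytes pass after a refresh, whereas a 3000-byte multi-segment packet does not. If the watermark stays at the floor, the refreshed value is the same every period, so such a packet would keep failing the comparison and block its queue.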
>
>>> Maybe I should increase port's MTU? to 1540?
>>>>> And it could not cause a permanent pipe stop which is what I am
>>>>> facing.
>>>>>> In fairness, a pipe should send as many packets as its
>>>>>> pipe_tc_credits allow, regardless of pipe_tc_ov_credits,
>>>>>> which
>>>>>> is extra on top of pipe_tc_credits.
>>>>> I think it's quite the opposite. That's why I got much more
>>>>> fairness after I reduced the subport tc_period, since reducing
>>>>> the subport tc_period also reduces the tc_ov_wm_max value:
>>>>> s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(params->tc_period,
>>>>> port->pipe_tc3_rate_max)
>>>>> As a result, a pipe transmits fewer bytes in one round, so pipe
>>>>> rotation inside a grinder
>>>>> happens much more often and a pipe can't monopolise resources.
>>>>> In other QoS implementations this is called a "quantum".
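The effect of the tc period on the upper watermark can be worked through with a small sketch. The helper below is a hypothetical stand-in for the conversion named in the thread, assuming it turns a period in milliseconds and a rate in bytes per second into a byte count. The rate of 250000 bytes/s (2 Mbit/s) is an illustrative value, not the port->pipe_tc3_rate_max from the setup being discussed.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the helper named in the thread:
 * number of bytes that fit into time_ms at rate_bytes_per_sec.
 */
static uint64_t time_ms_to_bytes(uint64_t time_ms, uint64_t rate_bytes_per_sec)
{
	return (time_ms * rate_bytes_per_sec) / 1000;
}
```

Under these assumptions a 10 ms period gives 2500 bytes and a 5 ms period gives 1250 bytes, so halving the subport tc period halves the largest burst a pipe can send before the grinder moves on to the next pipe.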
>>>> Yes, so reducing the tc period means all pipes (high and low
>>>> bandwidth) get lower tc_ov_credits values, which allows less
>>>> transmission from the higher-bandwidth pipes and leaves bandwidth for
>>>> the low-bandwidth pipes. So here is the thing: either tune the tc
>>>> period to a value that prevents high-bandwidth pipes from hogging most
>>>> of the bandwidth, or make changes in the code so that oversubscription
>>>> adds extra credits on top of the guaranteed ones.
>>>> One question: don’t your low-bandwidth pipes have higher-priority
>>>> traffic (tc0, tc1, tc2)? Packets from those TCs must be going out.
>>>> Isn’t this the case?
>>> Well, it would be the case after I find out
>>> what's going on. Right now I am using a tos2tc map configured
>>> in such a way that all IPv4 packets, with any TOS value,
>>> go into TC3.
>>>>>>>> rcv 0 rx rate 7324160 nb pkts 5722
>>>>>>>> rcv 1 rx rate 7281920 nb pkts 5689
>>>>>>>> rcv 2 rx rate 7226880 nb pkts 5646
>>>>>>>> rcv 3 rx rate 7124480 nb pkts 5566
>>>>>>>> rcv 4 rx rate 7324160 nb pkts 5722
>>>>>>>> rcv 5 rx rate 7271680 nb pkts 5681
>>>>>>>> rcv 6 rx rate 7188480 nb pkts 5616
>>>>>>>> rcv 7 rx rate 7150080 nb pkts 5586
>>>>>>>> rcv 8 rx rate 7328000 nb pkts 5725
>>>>>>>> rcv 9 rx rate 7249920 nb pkts 5664
>>>>>>>> rcv 10 rx rate 7188480 nb pkts 5616
>>>>>>>> rcv 11 rx rate 7179520 nb pkts 5609
>>>>>>>> rcv 12 rx rate 7324160 nb pkts 5722
>>>>>>>> rcv 13 rx rate 7208960 nb pkts 5632
>>>>>>>> rcv 14 rx rate 7152640 nb pkts 5588
>>>>>>>> rcv 15 rx rate 7127040 nb pkts 5568
>>>>>>>> rcv 16 rx rate 7303680 nb pkts 5706
>>>>>>>> ....
>>>>>>>> rcv 587 rx rate 2406400 nb pkts 1880
>>>>>>>> rcv 588 rx rate 2406400 nb pkts 1880
>>>>>>>> rcv 589 rx rate 2406400 nb pkts 1880
>>>>>>>> rcv 590 rx rate 2406400 nb pkts 1880
>>>>>>>> rcv 591 rx rate 2406400 nb pkts 1880
>>>>>>>> rcv 592 rx rate 2398720 nb pkts 1874
>>>>>>>> rcv 593 rx rate 2400000 nb pkts 1875
>>>>>>>> rcv 594 rx rate 2400000 nb pkts 1875
>>>>>>>> rcv 595 rx rate 2400000 nb pkts 1875
>>>>>>>> rcv 596 rx rate 2401280 nb pkts 1876
>>>>>>>> rcv 597 rx rate 2401280 nb pkts 1876
>>>>>>>> rcv 598 rx rate 2401280 nb pkts 1876
>>>>>>>> rcv 599 rx rate 2402560 nb pkts 1877
>>>>>>>> rx rate sum 3156416000
>>>>>>>>>> ... despite that there is _NO_ congestion...
>>>>>>>>>> congestion at the subport or pipe.
>>>>>>>>>>> And the subport !! doesn't use about 42 mbit/s of available
>>>>>>>>>>> bandwidth.
>>>>>>>>>>> The only difference between those test configurations is the TC of
>>>>>>>>>>> the generated traffic.
>>>>>>>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV
>>>>>>>>>>> function).
>>>>>>>>>>> So, enabling TC_OV changes the results dramatically.
>>>>>>>>>>> ##
>>>>>>>>>>> ## test1
>>>>>>>>>>> ##
>>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> port 1
>>>>>>>>>>> subport rate 300 M
>>>>>>>>>>> number of tx flows 300
>>>>>>>>>>> generator tx rate 1M
>>>>>>>>>>> TC 1
>>>>>>>>>>> ...
>>>>>>>>>>> rcv 284 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 285 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 286 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 287 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 288 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 289 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 290 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 291 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 292 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 293 rx rate 995840 nb pkts 778
>>>>>>>>>>> rcv 294 rx rate 995840 nb pkts 778
>>>>>>>>>>> ...
>>>>>>>>>>> sum pipe's rx rate is 298 494 720 OK.
>>>>>>>>>>> The subport rate is equally distributed to 300 pipes.
>>>>>>>>>>> ##
>>>>>>>>>>> ## test 2
>>>>>>>>>>> ##
>>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>> # qos test port
>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>> port 1
>>>>>>>>>>> subport rate 300 M
>>>>>>>>>>> number of tx flows 300
>>>>>>>>>>> generator tx rate 1M
>>>>>>>>>>> TC 3
>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>> rcv 0 rx rate 875520 nb pkts 684
>>>>>>>>>>> rcv 1 rx rate 856320 nb pkts 669
>>>>>>>>>>> rcv 2 rx rate 849920 nb pkts 664
>>>>>>>>>>> rcv 3 rx rate 853760 nb pkts 667
>>>>>>>>>>> rcv 4 rx rate 867840 nb pkts 678
>>>>>>>>>>> rcv 5 rx rate 844800 nb pkts 660
>>>>>>>>>>> rcv 6 rx rate 852480 nb pkts 666
>>>>>>>>>>> rcv 7 rx rate 855040 nb pkts 668
>>>>>>>>>>> rcv 8 rx rate 865280 nb pkts 676
>>>>>>>>>>> rcv 9 rx rate 846080 nb pkts 661
>>>>>>>>>>> rcv 10 rx rate 858880 nb pkts 671
>>>>>>>>>>> rcv 11 rx rate 870400 nb pkts 680
>>>>>>>>>>> rcv 12 rx rate 864000 nb pkts 675
>>>>>>>>>>> rcv 13 rx rate 852480 nb pkts 666
>>>>>>>>>>> rcv 14 rx rate 855040 nb pkts 668
>>>>>>>>>>> rcv 15 rx rate 857600 nb pkts 670
>>>>>>>>>>> rcv 16 rx rate 864000 nb pkts 675
>>>>>>>>>>> rcv 17 rx rate 866560 nb pkts 677
>>>>>>>>>>> rcv 18 rx rate 865280 nb pkts 676
>>>>>>>>>>> rcv 19 rx rate 858880 nb pkts 671
>>>>>>>>>>> rcv 20 rx rate 856320 nb pkts 669
>>>>>>>>>>> rcv 21 rx rate 864000 nb pkts 675
>>>>>>>>>>> rcv 22 rx rate 869120 nb pkts 679
>>>>>>>>>>> rcv 23 rx rate 856320 nb pkts 669
>>>>>>>>>>> rcv 24 rx rate 862720 nb pkts 674
>>>>>>>>>>> rcv 25 rx rate 865280 nb pkts 676
>>>>>>>>>>> rcv 26 rx rate 867840 nb pkts 678
>>>>>>>>>>> rcv 27 rx rate 870400 nb pkts 680
>>>>>>>>>>> rcv 28 rx rate 860160 nb pkts 672
>>>>>>>>>>> rcv 29 rx rate 870400 nb pkts 680
>>>>>>>>>>> rcv 30 rx rate 869120 nb pkts 679
>>>>>>>>>>> rcv 31 rx rate 870400 nb pkts 680
>>>>>>>>>>> rcv 32 rx rate 858880 nb pkts 671
>>>>>>>>>>> rcv 33 rx rate 858880 nb pkts 671
>>>>>>>>>>> rcv 34 rx rate 852480 nb pkts 666
>>>>>>>>>>> rcv 35 rx rate 874240 nb pkts 683
>>>>>>>>>>> rcv 36 rx rate 855040 nb pkts 668
>>>>>>>>>>> rcv 37 rx rate 853760 nb pkts 667
>>>>>>>>>>> rcv 38 rx rate 869120 nb pkts 679
>>>>>>>>>>> rcv 39 rx rate 885760 nb pkts 692
>>>>>>>>>>> rcv 40 rx rate 861440 nb pkts 673
>>>>>>>>>>> rcv 41 rx rate 852480 nb pkts 666
>>>>>>>>>>> rcv 42 rx rate 871680 nb pkts 681
>>>>>>>>>>> ...
>>>>>>>>>>> ...
>>>>>>>>>>> rcv 288 rx rate 766720 nb pkts 599
>>>>>>>>>>> rcv 289 rx rate 766720 nb pkts 599
>>>>>>>>>>> rcv 290 rx rate 766720 nb pkts 599
>>>>>>>>>>> rcv 291 rx rate 766720 nb pkts 599
>>>>>>>>>>> rcv 292 rx rate 762880 nb pkts 596
>>>>>>>>>>> rcv 293 rx rate 762880 nb pkts 596
>>>>>>>>>>> rcv 294 rx rate 762880 nb pkts 596
>>>>>>>>>>> rcv 295 rx rate 760320 nb pkts 594
>>>>>>>>>>> rcv 296 rx rate 604160 nb pkts 472
>>>>>>>>>>> rcv 297 rx rate 604160 nb pkts 472
>>>>>>>>>>> rcv 298 rx rate 604160 nb pkts 472
>>>>>>>>>>> rcv 299 rx rate 604160 nb pkts 472
>>>>>>>>>>> rx rate sum 258839040 FAILED.
>>>>>>>>>>> The subport rate is NOT distributed equally between the 300 pipes.
>>>>>>>>>>> Some subport bandwidth (about 42 Mbit/s) is not being used!
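The size of the unused share can be checked with simple arithmetic. The sketch below assumes that "300 M" means 300,000,000 in the same units as the reported rx rate sums. The function name is hypothetical.

```c
#include <stdint.h>

/* Difference between the configured subport rate and the measured sum. */
static int64_t unused_rate(int64_t configured, int64_t measured_sum)
{
	return configured - measured_sum;
}
```

Under that assumption, test 1 leaves 300000000 - 298494720 = 1505280 unused, while test 2 leaves 300000000 - 258839040 = 41160960, which is close to the figure of about 42 mentioned in the message.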