DPDK usage discussions
From: Alex Kiselev <alex@therouter.net>
To: "Singh, Jasvinder" <jasvinder.singh@intel.com>
Cc: users@dpdk.org, "Dumitrescu,
	Cristian" <cristian.dumitrescu@intel.com>,
	"Dharmappa, Savinay" <savinay.dharmappa@intel.com>
Subject: Re: [dpdk-users] scheduler issue
Date: Sat, 12 Dec 2020 01:54:07 +0100
Message-ID: <d1f9857d4eb9c7d78f5ade278ae0ab17@therouter.net>
In-Reply-To: <4ed02c4280efcfe2bf9e6c51803f807b@therouter.net>

On 2020-12-12 01:45, Alex Kiselev wrote:
> On 2020-12-12 01:20, Singh, Jasvinder wrote:
>>> On 11 Dec 2020, at 23:37, Alex Kiselev <alex@therouter.net> wrote:
>> 
>>> On 2020-12-11 23:55, Singh, Jasvinder wrote:
>>> On 11 Dec 2020, at 22:27, Alex Kiselev <alex@therouter.net> wrote:
>> 
>>>> On 2020-12-11 23:06, Singh, Jasvinder wrote:
>> 
>>> On 11 Dec 2020, at 21:29, Alex Kiselev <alex@therouter.net> wrote:
>> 
>>> On 2020-12-08 14:24, Singh, Jasvinder wrote:
>> 
>>> <snip>
>> 
>>>> [JS] Now, returning to the 1 Mbit/s pipes situation, try reducing the
>>>> tc period first at the subport and then at the pipe level, to see if
>>>> that helps in getting even traffic across low bandwidth pipes.
>> 
>>> Reducing the subport tc period from 10 to 5 also solved the problem
>>> with 1 Mbit/s pipes.
>>
>>> So, my second problem has been solved, but the first one, where some
>>> low bandwidth pipes stop transmitting, still remains.
>> 
>>> I see, try removing "pkt_len <= pipe_tc_ov_credits" condition in
>>> the
>> 
>>> grinder_credits_check() code for oversubscription case, instead use
>> 
>>> this pkt_len <= pipe_tc_credits + pipe_tc_ov_credits;
>> 
>>> if I do what you suggest, I will get this code
>> 
>>> enough_credits = (pkt_len <= subport_tb_credits) &&
>>>                  (pkt_len <= subport_tc_credits) &&
>>>                  (pkt_len <= pipe_tb_credits) &&
>>>                  (pkt_len <= pipe_tc_credits) &&
>>>                  (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
>> 
>>> And this doesn't make sense since if condition pkt_len <=
>>> pipe_tc_credits is true
>> 
>>> then condition (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits) is
>>> also always true.
>> 
>>> [JS] My suggestion is to remove "pkt_len <= pipe_tc_credits" and
>>> "pkt_len <= pipe_tc_ov_credits", and use only
>>> "pkt_len <= pipe_tc_credits + pipe_tc_ov_credits"
>> 
>>> While keeping tc_ov flag on.
>> 
>>> Your suggestion just turns off TC_OV feature.
>> 
>>>> I don't see your point.
>> 
>>>> This new suggestion will also effectively turn off the TC_OV
>>>> feature since
>> 
>>>> the only effect of enabling TC_OV is adding additional condition
>> 
>>>> pkt_len <= pipe_tc_ov_credits
>> 
>>>> which doesn't allow a pipe to spend more resources than it should.
>> 
>>>> And in the case of subport congestion a pipe should spend less
>>>> than 100% of the pipe's maximum rate.
>>
>>>> And you suggest allowing a pipe to spend 100% of its rate plus some
>>>> extra.
>>
>>>> I guess the effect of this would be an even more unfair distribution
>>>> of the subport's bandwidth.
>> 
>>>> Btw, a pipe might stop transmitting even when there is no
>>>> congestion at a subport.
>> 
>>> Although I didn't try this solution, the idea here is: in a
>>> particular round, if pkt_len is less than pipe_tc_credits (which is a
>>> constant value each time) but greater than pipe_tc_ov_credits, then it
>>> might hit the situation where no packet will be scheduled from the
>>> pipe even though fixed credits greater than the packet size are
>>> available.
>> 
>> But that is a perfectly normal situation and that's exactly the idea
>> behind TC_OV.
>> It means a pipe should wait for the next subport->tc_ov_period_id
>> when pipe_tc_ov_credits will be reset to a new value
>> 
>> But here it’s not guaranteed that new value of pipe_tc_ov_credits
>> will be sufficient for low bandwidth pipe to send their packets as
>> each time pipe_tc_ov_credits is freshly computed.
>> 
>>> pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
>>> 
>>> which allows the pipe to continue transmitting.
>> 
>> No, that won't happen if the new tc_ov_credits value is again less
>> than pkt_len; the pipe will hit a deadlock.
> 
> The new tc_ov_credits can't be less than subport->tc_ov_wm_min,
> and tc_ov_wm_min is equal to port->mtu.
> All my scheduler ports are configured with mtu 1522. The ethdev ports
> also use the same mtu, therefore there should be no packets bigger
> than 1522.

Also, tc_ov_credits is set to tc_ov_wm_min only in the case of constant
congestion, and today I detected the problem when there was no
congestion.
So it's highly unlikely that tc_ov_credits is always set to a value
less than pkt_len. The only scenario in which this might be the case is
when the scheduler port gets a corrupted mbuf with an incorrect pkt_len,
which causes a queue deadlock.
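
To make the argument above concrete, here is a minimal sketch in C of the
credit floor as I understand it. It is a simplified model, not the actual
rte_sched code: the struct tc_ov_model and both helper functions are
hypothetical names, the fields only echo the names used in this thread, and
the sketch assumes tc_ov_weight >= 1 and that the watermark never drops
below tc_ov_wm_min.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the TC_OV state discussed in this
 * thread. Field names echo the thread; this is not the real rte_sched
 * structure layout. */
struct tc_ov_model {
    uint32_t tc_ov_wm_min;  /* watermark floor (port MTU, per the thread) */
    uint32_t tc_ov_wm_max;  /* watermark ceiling */
    uint32_t tc_ov_wm;      /* current watermark */
    uint32_t tc_ov_weight;  /* per-pipe weight, assumed >= 1 in this sketch */
};

/* Credits a pipe receives when a new tc_ov period starts. */
static uint32_t pipe_tc_ov_credits_reset(const struct tc_ov_model *m)
{
    assert(m->tc_ov_weight >= 1);           /* assumption of this sketch */
    assert(m->tc_ov_wm >= m->tc_ov_wm_min); /* assumption of this sketch */
    return m->tc_ov_wm * m->tc_ov_weight;
}

/* Credit check with TC_OV enabled: every bucket must cover the packet. */
static bool enough_credits(uint32_t pkt_len,
                           uint32_t subport_tb_credits,
                           uint32_t subport_tc_credits,
                           uint32_t pipe_tb_credits,
                           uint32_t pipe_tc_credits,
                           uint32_t pipe_tc_ov_credits)
{
    return (pkt_len <= subport_tb_credits) &&
           (pkt_len <= subport_tc_credits) &&
           (pkt_len <= pipe_tb_credits) &&
           (pkt_len <= pipe_tc_credits) &&
           (pkt_len <= pipe_tc_ov_credits);
}
```

Under these assumptions, with the watermark at its floor of 1522 and a
weight of 1, a 1522-byte packet still passes the TC_OV condition right after
a reset. Only a pkt_len above the MTU would fail it in every period.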

> 
> Maybe I should increase port's MTU? to 1540?
> 
>> 
>>> And it could not cause a permanent pipe stop which is what I am
>>> facing.
>> 
>>>> In fairness, a pipe should send as many packets as its
>>>> pipe_tc_credits allow, regardless of pipe_tc_ov_credits, which
>>>> are extra on top of pipe_tc_credits.
>>> 
>>> I think it's quite the opposite. That's why I got much more fairness
>>> after I reduced the subport tc_period, since reducing the subport
>>> tc_period also reduces the tc_ov_wm_max value:
>>> s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(params->tc_period,
>>> port->pipe_tc3_rate_max)
>>> As a result, a pipe transmits fewer bytes in one round, so pipe
>>> rotation inside a grinder happens much more often and a pipe can't
>>> monopolise resources.
>>>
>>> In other QoS implementations this is called the "quantum".
>> 
>> Yes, so reducing the tc period means all pipes (high and low
>> bandwidth) get lower tc_ov_credits values, which allows less
>> transmission from higher bw pipes and leaves bandwidth for low
>> bw pipes. So, here is the thing: either tune the tc period to a value
>> which prevents high bw pipes from hogging most of the bw, or make
>> changes in the code so that oversubscription adds extra credits on top
>> of the guaranteed ones.
>>
>> One question: don't your low bw pipes have higher priority traffic
>> (tc0, tc1, tc2)? Packets from those tcs must be going out. Isn't this
>> the case?
> 
> Well, it would be the case after I find out
> what's going on. Right now I am using a tos2tc map configured
> in such a way that all ipv4 packets with any TOS value
> go into TC3.
> 
>> 
>>>> 
>> 
>>>> 
>> 
>>>> 
>> 
>>>>>> rcv 0   rx rate 7324160 nb pkts 5722
>>>>>> rcv 1   rx rate 7281920 nb pkts 5689
>>>>>> rcv 2   rx rate 7226880 nb pkts 5646
>>>>>> rcv 3   rx rate 7124480 nb pkts 5566
>>>>>> rcv 4   rx rate 7324160 nb pkts 5722
>>>>>> rcv 5   rx rate 7271680 nb pkts 5681
>>>>>> rcv 6   rx rate 7188480 nb pkts 5616
>>>>>> rcv 7   rx rate 7150080 nb pkts 5586
>>>>>> rcv 8   rx rate 7328000 nb pkts 5725
>>>>>> rcv 9   rx rate 7249920 nb pkts 5664
>>>>>> rcv 10  rx rate 7188480 nb pkts 5616
>>>>>> rcv 11  rx rate 7179520 nb pkts 5609
>>>>>> rcv 12  rx rate 7324160 nb pkts 5722
>>>>>> rcv 13  rx rate 7208960 nb pkts 5632
>>>>>> rcv 14  rx rate 7152640 nb pkts 5588
>>>>>> rcv 15  rx rate 7127040 nb pkts 5568
>>>>>> rcv 16  rx rate 7303680 nb pkts 5706
>>>>>> ....
>>>>>> rcv 587 rx rate 2406400 nb pkts 1880
>>>>>> rcv 588 rx rate 2406400 nb pkts 1880
>>>>>> rcv 589 rx rate 2406400 nb pkts 1880
>>>>>> rcv 590 rx rate 2406400 nb pkts 1880
>>>>>> rcv 591 rx rate 2406400 nb pkts 1880
>>>>>> rcv 592 rx rate 2398720 nb pkts 1874
>>>>>> rcv 593 rx rate 2400000 nb pkts 1875
>>>>>> rcv 594 rx rate 2400000 nb pkts 1875
>>>>>> rcv 595 rx rate 2400000 nb pkts 1875
>>>>>> rcv 596 rx rate 2401280 nb pkts 1876
>>>>>> rcv 597 rx rate 2401280 nb pkts 1876
>>>>>> rcv 598 rx rate 2401280 nb pkts 1876
>>>>>> rcv 599 rx rate 2402560 nb pkts 1877
>>>>>> rx rate sum 3156416000
>> 
>>>>> 
>> 
>>>>> 
>> 
>>>>> 
>> 
>>>>>>>> ... despite that there is _NO_ congestion...
>> 
>>>>>>>> congestion at the subport or pipe.
>> 
>>>>>>>>> And the subport !! doesn't use about 42 mbit/s of available
>> 
>>>>>>>>> bandwidth.
>> 
>>>>>>>>> The only difference between those test configurations is the TC
>>>>>>>>> of the generated traffic.
>>
>>>>>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV
>>>>>>>>> function).
>> 
>>>>>>>>> So, enabling TC_OV changes the results dramatically.
>> 
>>>>>>>>> ##
>> 
>>>>>>>>> ## test1
>> 
>>>>>>>>> ##
>> 
>>>>>>>>> hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>>>>>> # qos test port
>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>> port 1
>>>>>>>>> subport rate 300 M
>>>>>>>>> number of tx flows 300
>>>>>>>>> generator tx rate 1M
>>>>>>>>> TC 1
>>>>>>>>> ...
>> 
>>>>>>>>> rcv 284 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 285 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 286 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 287 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 288 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 289 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 290 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 291 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 292 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 293 rx rate 995840  nb pkts 778
>>>>>>>>> rcv 294 rx rate 995840  nb pkts 778
>>>>>>>>> ...
>> 
>>>>>>>>> sum pipe's rx rate is 298 494 720 OK.
>> 
>>>>>>>>> The subport rate is equally distributed to 300 pipes.
>> 
>>>>>>>>> ##
>> 
>>>>>>>>> ##  test 2
>> 
>>>>>>>>> ##
>> 
>>>>>>>>> hqos add profile  7 rate    2 M size 1000000 tc period 40
>>>>>>>>> # qos test port
>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>> port 1
>>>>>>>>> subport rate 300 M
>>>>>>>>> number of tx flows 300
>>>>>>>>> generator tx rate 1M
>>>>>>>>> TC 3
>> 
>>>>>>>>> h5 ~ # rcli sh qos rcv
>> 
>>>>>>>>> rcv 0   rx rate 875520  nb pkts 684
>> 
>>>>>>>>> rcv 1   rx rate 856320  nb pkts 669
>> 
>>>>>>>>> rcv 2   rx rate 849920  nb pkts 664
>> 
>>>>>>>>> rcv 3   rx rate 853760  nb pkts 667
>> 
>>>>>>>>> rcv 4   rx rate 867840  nb pkts 678
>> 
>>>>>>>>> rcv 5   rx rate 844800  nb pkts 660
>> 
>>>>>>>>> rcv 6   rx rate 852480  nb pkts 666
>> 
>>>>>>>>> rcv 7   rx rate 855040  nb pkts 668
>> 
>>>>>>>>> rcv 8   rx rate 865280  nb pkts 676
>> 
>>>>>>>>> rcv 9   rx rate 846080  nb pkts 661
>> 
>>>>>>>>> rcv 10  rx rate 858880  nb pkts 671
>>>>>>>>> rcv 11  rx rate 870400  nb pkts 680
>>>>>>>>> rcv 12  rx rate 864000  nb pkts 675
>>>>>>>>> rcv 13  rx rate 852480  nb pkts 666
>>>>>>>>> rcv 14  rx rate 855040  nb pkts 668
>>>>>>>>> rcv 15  rx rate 857600  nb pkts 670
>>>>>>>>> rcv 16  rx rate 864000  nb pkts 675
>>>>>>>>> rcv 17  rx rate 866560  nb pkts 677
>>>>>>>>> rcv 18  rx rate 865280  nb pkts 676
>>>>>>>>> rcv 19  rx rate 858880  nb pkts 671
>>>>>>>>> rcv 20  rx rate 856320  nb pkts 669
>>>>>>>>> rcv 21  rx rate 864000  nb pkts 675
>>>>>>>>> rcv 22  rx rate 869120  nb pkts 679
>>>>>>>>> rcv 23  rx rate 856320  nb pkts 669
>>>>>>>>> rcv 24  rx rate 862720  nb pkts 674
>>>>>>>>> rcv 25  rx rate 865280  nb pkts 676
>>>>>>>>> rcv 26  rx rate 867840  nb pkts 678
>>>>>>>>> rcv 27  rx rate 870400  nb pkts 680
>>>>>>>>> rcv 28  rx rate 860160  nb pkts 672
>>>>>>>>> rcv 29  rx rate 870400  nb pkts 680
>>>>>>>>> rcv 30  rx rate 869120  nb pkts 679
>>>>>>>>> rcv 31  rx rate 870400  nb pkts 680
>>>>>>>>> rcv 32  rx rate 858880  nb pkts 671
>>>>>>>>> rcv 33  rx rate 858880  nb pkts 671
>>>>>>>>> rcv 34  rx rate 852480  nb pkts 666
>>>>>>>>> rcv 35  rx rate 874240  nb pkts 683
>>>>>>>>> rcv 36  rx rate 855040  nb pkts 668
>>>>>>>>> rcv 37  rx rate 853760  nb pkts 667
>>>>>>>>> rcv 38  rx rate 869120  nb pkts 679
>>>>>>>>> rcv 39  rx rate 885760  nb pkts 692
>>>>>>>>> rcv 40  rx rate 861440  nb pkts 673
>>>>>>>>> rcv 41  rx rate 852480  nb pkts 666
>>>>>>>>> rcv 42  rx rate 871680  nb pkts 681
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>> rcv 288 rx rate 766720  nb pkts 599
>>>>>>>>> rcv 289 rx rate 766720  nb pkts 599
>>>>>>>>> rcv 290 rx rate 766720  nb pkts 599
>>>>>>>>> rcv 291 rx rate 766720  nb pkts 599
>>>>>>>>> rcv 292 rx rate 762880  nb pkts 596
>>>>>>>>> rcv 293 rx rate 762880  nb pkts 596
>>>>>>>>> rcv 294 rx rate 762880  nb pkts 596
>>>>>>>>> rcv 295 rx rate 760320  nb pkts 594
>>>>>>>>> rcv 296 rx rate 604160  nb pkts 472
>>>>>>>>> rcv 297 rx rate 604160  nb pkts 472
>>>>>>>>> rcv 298 rx rate 604160  nb pkts 472
>>>>>>>>> rcv 299 rx rate 604160  nb pkts 472
>>>>>>>>> rx rate sum 258839040 FAILED.
>> 
>>>>>>>>> The subport rate is NOT distributed equally between the 300
>>>>>>>>> pipes.
>>
>>>>>>>>> Some subport bandwidth (about 42 Mbit/s) is not being used!
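
For reference, the unused share quoted above can be checked from the two
sums in the test output. The sketch below is only arithmetic on those
figures. It assumes the rx rate values are in bit/s and that "300 M" means
300,000,000 bit/s. The function name unused_rate is hypothetical.

```c
#include <stdint.h>

/* Rate left unused: configured subport rate minus the measured sum of
 * per-pipe rx rates. Returns 0 if the measured sum exceeds the
 * configured rate. Units are whatever the inputs use (assumed bit/s). */
static uint64_t unused_rate(uint64_t configured, uint64_t measured)
{
    return configured > measured ? configured - measured : 0;
}
```

With the figures from the thread, test 1 leaves 300000000 - 298494720 =
1505280 bit/s unused, while test 2 leaves 300000000 - 258839040 = 41160960
bit/s unused, i.e. roughly 41 Mbit/s, in line with the "about 42" estimate.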

