DPDK usage discussions
From: Alex Kiselev <alex@therouter.net>
To: "Singh, Jasvinder" <jasvinder.singh@intel.com>
Cc: users@dpdk.org,
	"Dumitrescu, Cristian" <cristian.dumitrescu@intel.com>,
	"Dharmappa, Savinay" <savinay.dharmappa@intel.com>
Subject: Re: [dpdk-users] scheduler issue
Date: Sat, 12 Dec 2020 02:45:33 +0100
Message-ID: <7b20d7eec340a8de3843ea60a7166715@therouter.net>
In-Reply-To: <d1f9857d4eb9c7d78f5ade278ae0ab17@therouter.net>

On 2020-12-12 01:54, Alex Kiselev wrote:
> On 2020-12-12 01:45, Alex Kiselev wrote:
>> On 2020-12-12 01:20, Singh, Jasvinder wrote:
>>>> On 11 Dec 2020, at 23:37, Alex Kiselev <alex@therouter.net> wrote:
>>> 
>>>> On 2020-12-11 23:55, Singh, Jasvinder wrote:
>>>> On 11 Dec 2020, at 22:27, Alex Kiselev <alex@therouter.net> wrote:
>>> 
>>>>> On 2020-12-11 23:06, Singh, Jasvinder wrote:
>>> 
>>>> On 11 Dec 2020, at 21:29, Alex Kiselev <alex@therouter.net> wrote:
>>> 
>>>> On 2020-12-08 14:24, Singh, Jasvinder wrote:
>>> 
>>>> <snip>
>>> 
>>>>> [JS] Now, returning to the 1 Mbps pipes situation: try reducing the
>>>>> tc period, first at the subport and then at the pipe level, and see
>>>>> if that helps in getting even traffic across the low-bandwidth pipes.
>>> 
>>>> Reducing the subport tc period from 10 to 5 also solved the problem
>>>> with the 1 Mbit/s pipes.
>>> 
>>>> So, my second problem has been solved, but the first one, where some
>>>> of the low-bandwidth pipes stop transmitting, still remains.
>>> 
>>>> I see. Try removing the "pkt_len <= pipe_tc_ov_credits" condition in
>>>> the grinder_credits_check() code for the oversubscription case;
>>>> instead, use pkt_len <= pipe_tc_credits + pipe_tc_ov_credits.
>>> 
>>>> If I do what you suggest, I will get this code:
>>> 
>>>> enough_credits = (pkt_len <= subport_tb_credits) &&
>>>>     (pkt_len <= subport_tc_credits) &&
>>>>     (pkt_len <= pipe_tb_credits) &&
>>>>     (pkt_len <= pipe_tc_credits) &&
>>>>     (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
>>> 
>>>> And this doesn't make sense, since if the condition pkt_len <=
>>>> pipe_tc_credits is true, then the condition (pkt_len <=
>>>> pipe_tc_credits + pipe_tc_ov_credits) is also always true.
>>> 
>>>> [JS] My suggestion is to remove "pkt_len <= pipe_tc_credits" and
>>>> "pkt_len <= pipe_tc_ov_credits" and use only "pkt_len <=
>>>> pipe_tc_credits + pipe_tc_ov_credits", while keeping the tc_ov flag
>>>> on.
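For readers without the source at hand, here are the two variants under
discussion as a condensed side-by-side sketch. It is assembled from the
fragments quoted in this thread, not verbatim rte_sched code; pkt_len here
already includes the frame overhead:

    /* Current TC_OV behavior: the packet must fit within the pipe's own
     * TC budget AND within its share of the oversubscription watermark. */
    enough_credits = (pkt_len <= subport_tb_credits) &&
                     (pkt_len <= subport_tc_credits) &&
                     (pkt_len <= pipe_tb_credits) &&
                     (pkt_len <= pipe_tc_credits) &&
                     (pkt_len <= pipe_tc_ov_credits);

    /* Suggested variant: pool the two budgets into a single bound. */
    enough_credits = (pkt_len <= subport_tb_credits) &&
                     (pkt_len <= subport_tc_credits) &&
                     (pkt_len <= pipe_tb_credits) &&
                     (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);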
>>> 
>>>> Your suggestion just turns off the TC_OV feature.
>>> 
>>>>> I don't see your point.
>>> 
>>>>> This new suggestion will also effectively turn off the TC_OV
>>>>> feature, since the only effect of enabling TC_OV is the additional
>>>>> condition
>>>>> pkt_len <= pipe_tc_ov_credits,
>>>>> which doesn't allow a pipe to spend more resources than it should.
>>> 
>>>>> And in the case of subport congestion, a pipe should spend less
>>>>> than 100% of the pipe's maximum rate.
>>> 
>>>>> And you suggest allowing a pipe to spend 100% of its rate plus some
>>>>> extra. I guess the effect of this would be an even more unfair
>>>>> distribution of the subport's bandwidth.
>>> 
>>>>> Btw, a pipe might stop transmitting even when there is no
>>>>> congestion at the subport.
>>> 
>>>> Although I didn't try this solution, the idea here is: in a
>>>> particular round, if pkt_len is less than pipe_tc_credits (which is a
>>>> constant value each time) but greater than pipe_tc_ov_credits, then
>>>> it might hit the situation where no packet will be scheduled from the
>>>> pipe, even though fixed credits greater than the packet size are
>>>> available.
>>> 
>>> But that is a perfectly normal situation, and that's exactly the idea
>>> behind TC_OV. It means a pipe should wait for the next
>>> subport->tc_ov_period_id, when pipe_tc_ov_credits will be reset to a
>>> new value.
>>> 
>>> But here it's not guaranteed that the new value of pipe_tc_ov_credits
>>> will be sufficient for a low-bandwidth pipe to send its packets, since
>>> pipe_tc_ov_credits is freshly computed each time.
>>> 
>>>> pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
>>>> 
>>>> which allows the pipe to continue transmitting.
>>> 
>>> No, that won't happen: if the new tc_ov_credits value is again less
>>> than pkt_len, it will hit a deadlock.
>> 
>> The new tc_ov_credits value can't be less than subport->tc_ov_wm_min,
>> and tc_ov_wm_min is equal to port->mtu.
>> All my scheduler ports are configured with MTU 1522. The ethdev ports
>> also use the same MTU, so there should be no packets bigger than 1522.
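For context, the watermark that feeds pipe_tc_ov_credits is recomputed once
per tc_ov period roughly along these lines. This is a condensed paraphrase
of grinder_tc_ov_credits_update() from the rte_sched of that era, with the
congestion test simplified into a named flag, not verbatim source:

    uint32_t tc_ov_wm = subport->tc_ov_wm;

    if (tc3_was_congested_last_period) {      /* simplified condition */
        tc_ov_wm -= tc_ov_wm >> 7;            /* decay by ~1/128 */
        if (tc_ov_wm < subport->tc_ov_wm_min)
            tc_ov_wm = subport->tc_ov_wm_min; /* floor: port->mtu */
    } else {
        tc_ov_wm += (tc_ov_wm >> 7) + 1;      /* grow by ~1/128 */
        if (tc_ov_wm > subport->tc_ov_wm_max)
            tc_ov_wm = subport->tc_ov_wm_max;
    }
    subport->tc_ov_wm = tc_ov_wm;

    /* Each pipe's grant for the new period, as quoted earlier:
     * pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight; */

So the per-period grant can never drop below port->mtu (which, if memory
serves, rte_sched stores internally as the configured MTU plus frame
overhead), and any packet that fits within the MTU eventually gets credits.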
> 
> Also, tc_ov_credits is set to tc_ov_wm_min only in the case of constant
> congestion, and today I detected the problem when there was no
> congestion. So it's highly unlikely that tc_ov_credits is always set to
> a value less than pkt_size. The only scenario in which this might be
> the case is when the scheduler port gets a corrupted mbuf with an
> incorrect pkt len, which causes a queue deadlock.

Also, a reassembled (defragmented) IPv4 packet, i.e. a multisegment mbuf,
might have a pkt_len much bigger than the scheduler port's MTU, so you are
right: there is absolutely no guarantee that a packet will not cause a
queue deadlock. This explanation sounds very plausible to me, and I bet
this is my case.
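A minimal sketch of the kind of guard that would rule this out, assuming
the application can simply drop (or software-fragment) oversized packets
before they reach the scheduler. This is illustrative only, not
TheRouter's actual code:

    #include <rte_mbuf.h>

    /* Must match the scheduler port config:
     * "hqos add port 1 rate 10 G mtu 1522 frame overhead 24 ..." */
    #define SCHED_MTU 1522

    /* rte_sched checks (pkt_len + frame overhead) against credit budgets
     * that can shrink down to tc_ov_wm_min (= the port MTU), so a
     * reassembled multisegment packet longer than the configured MTU may
     * never pass the check and wedges its queue. Filter it out first. */
    static inline int
    sched_pkt_len_ok(const struct rte_mbuf *m)
    {
        return m->pkt_len <= SCHED_MTU;
    }

Anything failing the check would have to be dropped or fragmented before
rte_sched_port_enqueue().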


> 
>> 
>> Maybe I should increase the port's MTU to 1540?
>> 
>>> 
>>>> And it could not cause a permanent pipe stop, which is what I am
>>>> facing.
>>> 
>>>>> In fairness, a pipe should send as many packets as pipe_tc_credits
>>>>> allows, regardless of pipe_tc_ov_credits, which is extra on top of
>>>>> pipe_tc_credits.
>>>> 
>>>> I think it's quite the opposite. That's why I got much more fairness
>>>> after I reduced the subport tc_period: reducing the subport tc_period
>>>> also reduces the tc_ov_wm_max value:
>>>> 
>>>> s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(params->tc_period,
>>>>     port->pipe_tc3_rate_max);
>>>> 
>>>> As a result, a pipe transmits fewer bytes in one round, so pipe
>>>> rotation inside a grinder happens much more often and a pipe can't
>>>> monopolize resources.
>>>> 
>>>> In other QoS implementations this is called a "quantum".
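To put rough numbers on the quantum effect (illustrative only: the thread
does not show profile 23's TC3 rate, so assume pipe_tc3_rate_max
corresponds to 10 Mbit/s, i.e. 1,250,000 bytes/s; rte_sched_time_ms_to_bytes(ms, rate)
is simply rate * ms / 1000):

    tc period 10 ms: tc_ov_wm_max = 1,250,000 * 10 / 1000 = 12,500 bytes
                     (about 8 frames of 1546 bytes)
    tc period  5 ms: tc_ov_wm_max = 1,250,000 *  5 / 1000 =  6,250 bytes
                     (about 4 frames)

Halving the tc period halves the burst a busy pipe can take in one round
before the grinder moves on to other pipes.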
>>> 
>>> Yes, reducing the tc period means all pipes (high and low bandwidth)
>>> get lower tc_ov_credits values, which allows less transmission from
>>> the higher-bandwidth pipes and leaves bandwidth for the low-bandwidth
>>> pipes. So, here is the thing: either tune the tc period to a value
>>> that prevents high-bandwidth pipes from hogging most of the bandwidth,
>>> or make changes in the code so that oversubscription adds extra
>>> credits on top of the guaranteed ones.
>>> 
>>> One question: don't your low-bandwidth pipes have higher-priority
>>> traffic on tc0, tc1, tc2? Packets from those TCs must be going out.
>>> Isn't this the case?
>> 
>> Well, it would be the case after I find out what's going on. Right now
>> I am using a tos2tc map configured in such a way that all IPv4 packets
>> with any TOS value go into TC3.
>> 
>>> 
>>>>>>> rcv 0   rx rate 7324160 nb pkts 5722
>>>>>>> rcv 1   rx rate 7281920 nb pkts 5689
>>>>>>> rcv 2   rx rate 7226880 nb pkts 5646
>>>>>>> rcv 3   rx rate 7124480 nb pkts 5566
>>>>>>> rcv 4   rx rate 7324160 nb pkts 5722
>>>>>>> rcv 5   rx rate 7271680 nb pkts 5681
>>>>>>> rcv 6   rx rate 7188480 nb pkts 5616
>>>>>>> rcv 7   rx rate 7150080 nb pkts 5586
>>>>>>> rcv 8   rx rate 7328000 nb pkts 5725
>>>>>>> rcv 9   rx rate 7249920 nb pkts 5664
>>>>>>> rcv 10  rx rate 7188480 nb pkts 5616
>>>>>>> rcv 11  rx rate 7179520 nb pkts 5609
>>>>>>> rcv 12  rx rate 7324160 nb pkts 5722
>>>>>>> rcv 13  rx rate 7208960 nb pkts 5632
>>>>>>> rcv 14  rx rate 7152640 nb pkts 5588
>>>>>>> rcv 15  rx rate 7127040 nb pkts 5568
>>>>>>> rcv 16  rx rate 7303680 nb pkts 5706
>>>>>>> ...
>>>>>>> rcv 587 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 588 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 589 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 590 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 591 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 592 rx rate 2398720 nb pkts 1874
>>>>>>> rcv 593 rx rate 2400000 nb pkts 1875
>>>>>>> rcv 594 rx rate 2400000 nb pkts 1875
>>>>>>> rcv 595 rx rate 2400000 nb pkts 1875
>>>>>>> rcv 596 rx rate 2401280 nb pkts 1876
>>>>>>> rcv 597 rx rate 2401280 nb pkts 1876
>>>>>>> rcv 598 rx rate 2401280 nb pkts 1876
>>>>>>> rcv 599 rx rate 2402560 nb pkts 1877
>>>>>>> rx rate sum 3156416000
>>> 
>>>>>>>>> ... despite that there is _NO_ congestion at the subport or pipe.
>>> 
>>>>>>>>>> And the subport !! doesn't use about 42 Mbit/s of available
>>>>>>>>>> bandwidth.
>>> 
>>>>>>>>>> The only difference between those test configurations is the TC
>>>>>>>>>> of the generated traffic. Test 1 uses TC 1, while test 2 uses
>>>>>>>>>> TC 3 (which uses the TC_OV function).
>>> 
>>>>>>>>>> So, enabling TC_OV changes the results dramatically.
>>> 
>>>>>>>>>> ##
>>>>>>>>>> ## test 1
>>>>>>>>>> ##
>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>> 
>>>>>>>>>> # qos test port
>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>> hqos set port 1 lcore 3
>>> 
>>>>>>>>>> port 1, subport rate 300 M, number of tx flows 300, generator
>>>>>>>>>> tx rate 1M, TC 1
>>>>>>>>>> ...
>>>>>>>>>> rcv 284 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 285 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 286 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 287 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 288 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 289 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 290 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 291 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 292 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 293 rx rate 995840  nb pkts 778
>>>>>>>>>> rcv 294 rx rate 995840  nb pkts 778
>>>>>>>>>> ...
>>>>>>>>>> Sum of pipes' rx rates is 298 494 720. OK.
>>>>>>>>>> The subport rate is equally distributed to 300 pipes.
>>> 
>>>>>>>>>> ##
>>>>>>>>>> ## test 2
>>>>>>>>>> ##
>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>> 
>>>>>>>>>> # qos test port
>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>> hqos set port 1 lcore 3
>>> 
>>>>>>>>>> port 1, subport rate 300 M, number of tx flows 300, generator
>>>>>>>>>> tx rate 1M, TC 3
>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>> rcv 0   rx rate 875520  nb pkts 684
>>>>>>>>>> rcv 1   rx rate 856320  nb pkts 669
>>>>>>>>>> rcv 2   rx rate 849920  nb pkts 664
>>>>>>>>>> rcv 3   rx rate 853760  nb pkts 667
>>>>>>>>>> rcv 4   rx rate 867840  nb pkts 678
>>>>>>>>>> rcv 5   rx rate 844800  nb pkts 660
>>>>>>>>>> rcv 6   rx rate 852480  nb pkts 666
>>>>>>>>>> rcv 7   rx rate 855040  nb pkts 668
>>>>>>>>>> rcv 8   rx rate 865280  nb pkts 676
>>>>>>>>>> rcv 9   rx rate 846080  nb pkts 661
>>>>>>>>>> rcv 10  rx rate 858880  nb pkts 671
>>>>>>>>>> rcv 11  rx rate 870400  nb pkts 680
>>>>>>>>>> rcv 12  rx rate 864000  nb pkts 675
>>>>>>>>>> rcv 13  rx rate 852480  nb pkts 666
>>>>>>>>>> rcv 14  rx rate 855040  nb pkts 668
>>>>>>>>>> rcv 15  rx rate 857600  nb pkts 670
>>>>>>>>>> rcv 16  rx rate 864000  nb pkts 675
>>>>>>>>>> rcv 17  rx rate 866560  nb pkts 677
>>>>>>>>>> rcv 18  rx rate 865280  nb pkts 676
>>>>>>>>>> rcv 19  rx rate 858880  nb pkts 671
>>>>>>>>>> rcv 20  rx rate 856320  nb pkts 669
>>>>>>>>>> rcv 21  rx rate 864000  nb pkts 675
>>>>>>>>>> rcv 22  rx rate 869120  nb pkts 679
>>>>>>>>>> rcv 23  rx rate 856320  nb pkts 669
>>>>>>>>>> rcv 24  rx rate 862720  nb pkts 674
>>>>>>>>>> rcv 25  rx rate 865280  nb pkts 676
>>>>>>>>>> rcv 26  rx rate 867840  nb pkts 678
>>>>>>>>>> rcv 27  rx rate 870400  nb pkts 680
>>>>>>>>>> rcv 28  rx rate 860160  nb pkts 672
>>>>>>>>>> rcv 29  rx rate 870400  nb pkts 680
>>>>>>>>>> rcv 30  rx rate 869120  nb pkts 679
>>>>>>>>>> rcv 31  rx rate 870400  nb pkts 680
>>>>>>>>>> rcv 32  rx rate 858880  nb pkts 671
>>>>>>>>>> rcv 33  rx rate 858880  nb pkts 671
>>>>>>>>>> rcv 34  rx rate 852480  nb pkts 666
>>>>>>>>>> rcv 35  rx rate 874240  nb pkts 683
>>>>>>>>>> rcv 36  rx rate 855040  nb pkts 668
>>>>>>>>>> rcv 37  rx rate 853760  nb pkts 667
>>>>>>>>>> rcv 38  rx rate 869120  nb pkts 679
>>>>>>>>>> rcv 39  rx rate 885760  nb pkts 692
>>>>>>>>>> rcv 40  rx rate 861440  nb pkts 673
>>>>>>>>>> rcv 41  rx rate 852480  nb pkts 666
>>>>>>>>>> rcv 42  rx rate 871680  nb pkts 681
>>>>>>>>>> ...
>>>>>>>>>> rcv 288 rx rate 766720  nb pkts 599
>>>>>>>>>> rcv 289 rx rate 766720  nb pkts 599
>>>>>>>>>> rcv 290 rx rate 766720  nb pkts 599
>>>>>>>>>> rcv 291 rx rate 766720  nb pkts 599
>>>>>>>>>> rcv 292 rx rate 762880  nb pkts 596
>>>>>>>>>> rcv 293 rx rate 762880  nb pkts 596
>>>>>>>>>> rcv 294 rx rate 762880  nb pkts 596
>>>>>>>>>> rcv 295 rx rate 760320  nb pkts 594
>>>>>>>>>> rcv 296 rx rate 604160  nb pkts 472
>>>>>>>>>> rcv 297 rx rate 604160  nb pkts 472
>>>>>>>>>> rcv 298 rx rate 604160  nb pkts 472
>>>>>>>>>> rcv 299 rx rate 604160  nb pkts 472
>>>>>>>>>> rx rate sum 258839040. FAILED.
>>>>>>>>>> The subport rate is NOT distributed equally between the 300
>>>>>>>>>> pipes. Some subport bandwidth (about 42 Mbit/s) is not being
>>>>>>>>>> used!
