From: Alex Kiselev
To: "Singh, Jasvinder"
Cc: users@dpdk.org, "Dumitrescu, Cristian", "Dharmappa, Savinay"
Date: Sat, 12 Dec 2020 02:45:33 +0100
Subject: Re: [dpdk-users] scheduler issue

On 2020-12-12 01:54, Alex Kiselev wrote:
> On 2020-12-12 01:45, Alex Kiselev wrote:
>> On 2020-12-12 01:20, Singh, Jasvinder wrote:
>>>> On 11 Dec 2020, at 23:37, Alex Kiselev wrote:
>>>
>>>> On 2020-12-11 23:55, Singh, Jasvinder wrote:
>>>> On 11 Dec 2020, at 22:27, Alex Kiselev wrote:
>>>
>>>>> On 2020-12-11 23:06, Singh, Jasvinder wrote:
>>>
>>>> On 11 Dec 2020, at 21:29, Alex Kiselev wrote:
>>>
>>>> On 2020-12-08 14:24, Singh, Jasvinder wrote:
>>>
>>>>> [JS] Now, returning to the 1 Mbps pipes situation: try reducing the
>>>>> tc period, first at the subport and then at the pipe level, and see
>>>>> if that helps in getting even traffic across the low-bandwidth pipes.
>>>
>>>> Reducing the subport tc period from 10 to 5 also solved the problem
>>>> with the 1 Mbit/s pipes.
>>>
>>>> So my second problem has been solved,
>>>> but the first one, where some low-bandwidth pipes stop transmitting,
>>>> still remains.
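For context, a rough sketch of what shrinking the subport tc period does to
the per-period byte budget. The helper below is a stand-in that simply
multiplies rate by period, mirroring the rte_sched_time_ms_to_bytes() call
quoted further down in this thread; the rate value is taken from the 300 M
subport used in the tests below, and the exact way a given DPDK release
derives the TC credits may differ.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes credited per tc period: rate in bytes/s, period in ms
 * (same conversion as the rte_sched_time_ms_to_bytes() call quoted below). */
static uint64_t
tc_period_to_bytes(uint64_t tc_period_ms, uint64_t rate_bytes_per_s)
{
        return (tc_period_ms * rate_bytes_per_s) / 1000;
}

int main(void)
{
        /* Subport from the tests below: 300 Mbit/s = 37,500,000 bytes/s */
        uint64_t subport_rate = 300000000 / 8;

        printf("tc period 10 ms -> %" PRIu64 " bytes per refill\n",
               tc_period_to_bytes(10, subport_rate)); /* 375000 */
        printf("tc period  5 ms -> %" PRIu64 " bytes per refill\n",
               tc_period_to_bytes(5, subport_rate));  /* 187500 */
        return 0;
}

Halving the period halves the bytes handed out per refill, so a busy pipe
exhausts its share sooner and the grinder moves on to other pipes more
often, which is consistent with the improvement reported above for the
1 Mbit/s pipes.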
>>>> I see. Try removing the "pkt_len <= pipe_tc_ov_credits" condition in
>>>> the grinder_credits_check() code for the oversubscription case;
>>>> instead use this: pkt_len <= pipe_tc_credits + pipe_tc_ov_credits;
>>>
>>>> If I do what you suggest, I will get this code:
>>>>
>>>> enough_credits = (pkt_len <= subport_tb_credits) &&
>>>>     (pkt_len <= subport_tc_credits) &&
>>>>     (pkt_len <= pipe_tb_credits) &&
>>>>     (pkt_len <= pipe_tc_credits) &&
>>>>     (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
>>>>
>>>> And this doesn't make sense, since if the condition
>>>> pkt_len <= pipe_tc_credits is true, then the condition
>>>> (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits) is also always true.
>>>
>>>> [JS] My suggestion is to remove "pkt_len <= pipe_tc_credits" and
>>>> "pkt_len <= pipe_tc_ov_credits" and use only
>>>> "pkt_len <= pipe_tc_credits + pipe_tc_ov_credits",
>>>> while keeping the tc_ov flag on.
>>>
>>>> Your suggestion just turns off the TC_OV feature.
>>>
>>>>> I don't see your point.
>>>>> This new suggestion will also effectively turn off the TC_OV feature,
>>>>> since the only effect of enabling TC_OV is adding the additional
>>>>> condition
>>>>> pkt_len <= pipe_tc_ov_credits,
>>>>> which doesn't allow a pipe to spend more resources than it should.
>>>>> And in the case of subport congestion a pipe should spend less than
>>>>> 100% of the pipe's maximum rate.
>>>>> And you suggest allowing a pipe to spend 100% of its rate plus some
>>>>> extra.
>>>>> I guess the effect of this would be an even more unfair distribution
>>>>> of the subport's bandwidth.
>>>>> Btw, a pipe might stop transmitting even when there is no congestion
>>>>> at a subport.
>>>
>>>> Although I didn't try this solution, the idea here is: in a
>>>> particular round, if pkt_len is less than pipe_tc_credits (which is a
>>>> constant value each time) but greater than pipe_tc_ov_credits, then it
>>>> might hit the situation where no packet will be scheduled from the
>>>> pipe, even though fixed credits greater than the packet size are
>>>> available.
>>>
>>> But that is a perfectly normal situation and that's exactly the idea
>>> behind TC_OV.
>>> It means a pipe should wait for the next subport->tc_ov_period_id,
>>> when pipe_tc_ov_credits will be reset to a new value.
>>>
>>> But here it's not guaranteed that the new value of pipe_tc_ov_credits
>>> will be sufficient for a low-bandwidth pipe to send its packets, as
>>> each time pipe_tc_ov_credits is freshly computed.
>>>
>>>> pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
>>>>
>>>> which allows the pipe to continue transmitting.
>>>
>>> No, that won't happen if the new tc_ov_credits value is again less than
>>> pkt_len; it will hit a deadlock.
>>
>> A new tc_ov_credits value can't be less than subport->tc_ov_wm_min,
>> and tc_ov_wm_min is equal to port->mtu.
>> All my scheduler ports are configured with MTU 1522. The ethdev ports
>> also use the same MTU, therefore there should be no packets bigger
>> than 1522.
>
> Also, tc_ov_credits is set to tc_ov_wm_min only in the case of constant
> congestion, and today I detected the problem when there was no
> congestion.
> So it's highly unlikely that tc_ov_credits is always set to a value
> less than pkt_size. The only scenario in which this might be the case is
> when the scheduler port gets a corrupted mbuf with an incorrect pkt len,
> which would cause a queue deadlock.
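To keep the argument concrete, here is a minimal, self-contained sketch of
the oversubscription credit check and the per-period refill as they are
described in this thread. The struct and function names are illustrative
stand-ins for the rte_sched.c internals quoted above, not the exact code of
any particular DPDK release.

#include <stdint.h>

/* Illustrative subset of the scheduler state referred to above. */
struct pipe_state {
        uint32_t tb_credits;     /* pipe token bucket */
        uint32_t tc_credits;     /* guaranteed credits for this TC */
        uint32_t tc_ov_credits;  /* oversubscription credits (TC3 only) */
};

struct subport_state {
        uint32_t tb_credits;
        uint32_t tc_credits;
        uint32_t tc_ov_wm;      /* current oversubscription watermark */
        uint32_t tc_ov_wm_min;  /* equals the port MTU */
        uint32_t tc_ov_wm_max;  /* from tc_period and max pipe TC3 rate */
};

/* The check under debate: with TC_OV enabled, a TC3 packet must also fit
 * into the pipe's tc_ov_credits. If pkt_len can never fit, the queue
 * stalls. */
static int
credits_check_tc_ov(uint32_t pkt_len,
                    const struct subport_state *subport,
                    const struct pipe_state *pipe)
{
        return (pkt_len <= subport->tb_credits) &&
               (pkt_len <= subport->tc_credits) &&
               (pkt_len <= pipe->tb_credits) &&
               (pkt_len <= pipe->tc_credits) &&
               (pkt_len <= pipe->tc_ov_credits);
}

/* Per-period refill of the pipe's oversubscription credits. The watermark
 * is kept within [tc_ov_wm_min, tc_ov_wm_max]; since tc_ov_wm_min equals
 * the port MTU, a pkt_len above the MTU can stay stuck at the check above
 * once the watermark has collapsed to its minimum. */
static void
pipe_tc_ov_refill(const struct subport_state *subport,
                  struct pipe_state *pipe, uint32_t tc_ov_weight)
{
        uint32_t wm = subport->tc_ov_wm;

        if (wm < subport->tc_ov_wm_min)
                wm = subport->tc_ov_wm_min;
        if (wm > subport->tc_ov_wm_max)
                wm = subport->tc_ov_wm_max;

        pipe->tc_ov_credits = wm * tc_ov_weight;
}

Seen this way, the two positions in the thread differ on what should happen
when pkt_len fits into tc_credits but not into tc_ov_credits: the current
code makes the pipe wait for the next refill, while the proposed change
would let it borrow against the sum of both.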
Also, a reassembled (defragmented) IPv4 packet carried in a multi-segment
mbuf might have a pkt_len much bigger than the scheduler port's MTU, so you
are right: there is absolutely no guarantee that a packet will not cause a
queue deadlock. This explanation sounds very plausible to me and I bet this
is my case (a sketch of a pre-enqueue guard for exactly this case is at the
end of this message).

>
>> Maybe I should increase the port's MTU to 1540?
>>
>>>
>>>> And it could not cause a permanent pipe stop, which is what I am
>>>> facing.
>>>
>>>>> In fairness, a pipe should send as many packets as consume
>>>>> pipe_tc_credits, regardless of the extra pipe_tc_ov_credits, which
>>>>> is extra on top of pipe_tc_credits.
>>>>
>>>> I think it's quite the opposite. That's why after I reduced the
>>>> subport tc_period I got much more fairness. Reducing the subport
>>>> tc_period also reduces the tc_ov_wm_max value:
>>>>
>>>> s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(params->tc_period,
>>>>     port->pipe_tc3_rate_max)
>>>>
>>>> As a result a pipe transmits fewer bytes in one round, so pipe
>>>> rotation inside a grinder happens much more often and a pipe can't
>>>> monopolise resources.
>>>>
>>>> In other QoS implementations this is called a "quantum".
>>>
>>> Yes, so reducing the tc period means that all pipes (high and low
>>> bandwidth) get lower tc_ov_credits values, which allows less
>>> transmission from the higher-bandwidth pipes and leaves bandwidth for
>>> the low-bandwidth pipes. So, here is the thing: either tune the tc
>>> period to a value that prevents a high-bandwidth pipe from hogging
>>> most of the bandwidth, or make changes in the code, where
>>> oversubscription adds extra credits on top of the guaranteed ones.
>>>
>>> One question: don't your low-bandwidth pipes have higher-priority
>>> traffic on tc0, tc1, tc2? Packets from those TCs must be going out.
>>> Isn't this the case?
>>
>> Well, it would be the case after I find out what's going on. Right now
>> I am using a tos2tc map configured in such a way that all IPv4 packets
>> with any TOS value go into TC3.
>>
>>>
>>>>>>> rcv 0 rx rate 7324160 nb pkts 5722
>>>>>>> rcv 1 rx rate 7281920 nb pkts 5689
>>>>>>> rcv 2 rx rate 7226880 nb pkts 5646
>>>>>>> rcv 3 rx rate 7124480 nb pkts 5566
>>>>>>> rcv 4 rx rate 7324160 nb pkts 5722
>>>>>>> rcv 5 rx rate 7271680 nb pkts 5681
>>>>>>> rcv 6 rx rate 7188480 nb pkts 5616
>>>>>>> rcv 7 rx rate 7150080 nb pkts 5586
>>>>>>> rcv 8 rx rate 7328000 nb pkts 5725
>>>>>>> rcv 9 rx rate 7249920 nb pkts 5664
>>>>>>> rcv 10 rx rate 7188480 nb pkts 5616
>>>>>>> rcv 11 rx rate 7179520 nb pkts 5609
>>>>>>> rcv 12 rx rate 7324160 nb pkts 5722
>>>>>>> rcv 13 rx rate 7208960 nb pkts 5632
>>>>>>> rcv 14 rx rate 7152640 nb pkts 5588
>>>>>>> rcv 15 rx rate 7127040 nb pkts 5568
>>>>>>> rcv 16 rx rate 7303680 nb pkts 5706
>>>>>>> ....
>>>>>>> rcv 587 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 588 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 589 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 590 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 591 rx rate 2406400 nb pkts 1880
>>>>>>> rcv 592 rx rate 2398720 nb pkts 1874
>>>>>>> rcv 593 rx rate 2400000 nb pkts 1875
>>>>>>> rcv 594 rx rate 2400000 nb pkts 1875
>>>>>>> rcv 595 rx rate 2400000 nb pkts 1875
>>>>>>> rcv 596 rx rate 2401280 nb pkts 1876
>>>>>>> rcv 597 rx rate 2401280 nb pkts 1876
>>>>>>> rcv 598 rx rate 2401280 nb pkts 1876
>>>>>>> rcv 599 rx rate 2402560 nb pkts 1877
>>>>>>> rx rate sum 3156416000
>>>
>>>>>>>>> ... despite that there is _NO_ congestion...
>>>>>>>>> congestion at the subport or pipe.
>>>
>>>>>>>>>> And the subport (!!) doesn't use about 42 Mbit/s of available
>>>>>>>>>> bandwidth.
>>>>>>>>>> The only difference between those test configurations is the TC
>>>>>>>>>> of the generated traffic.
>>>>>>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV
>>>>>>>>>> function).
>>>>>>>>>> So, enabling TC_OV changes the results dramatically.
>>>>>>>>>>
>>>>>>>>>> ##
>>>>>>>>>> ## test 1
>>>>>>>>>> ##
>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>
>>>>>>>>>> # qos test port
>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>
>>>>>>>>>> port 1
>>>>>>>>>> subport rate 300 M
>>>>>>>>>> number of tx flows 300
>>>>>>>>>> generator tx rate 1M TC 1
>>>>>>>>>> ...
>>>>>>>>>> rcv 284 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 285 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 286 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 287 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 288 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 289 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 290 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 291 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 292 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 293 rx rate 995840 nb pkts 778
>>>>>>>>>> rcv 294 rx rate 995840 nb pkts 778
>>>>>>>>>> ...
>>>>>>>>>> sum of pipes' rx rates is 298 494 720
>>>>>>>>>>
>>>>>>>>>> OK. The subport rate is equally distributed to 300 pipes.
>>>>>>>>>>
>>>>>>>>>> ##
>>>>>>>>>> ## test 2
>>>>>>>>>> ##
>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>
>>>>>>>>>> # qos test port
>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>
>>>>>>>>>> port 1
>>>>>>>>>> subport rate 300 M
>>>>>>>>>> number of tx flows 300
>>>>>>>>>> generator tx rate 1M TC 3
>>>>>>>>>>
>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>> rcv 0 rx rate 875520 nb pkts 684
>>>>>>>>>> rcv 1 rx rate 856320 nb pkts 669
>>>>>>>>>> rcv 2 rx rate 849920 nb pkts 664
>>>>>>>>>> rcv 3 rx rate 853760 nb pkts 667
>>>>>>>>>> rcv 4 rx rate 867840 nb pkts 678
>>>>>>>>>> rcv 5 rx rate 844800 nb pkts 660
>>>>>>>>>> rcv 6 rx rate 852480 nb pkts 666
>>>>>>>>>> rcv 7 rx rate 855040 nb pkts 668
>>>>>>>>>> rcv 8 rx rate 865280 nb pkts 676
>>>>>>>>>> rcv 9 rx rate 846080 nb pkts 661
>>>>>>>>>> rcv 10 rx rate 858880 nb pkts 671
>>>>>>>>>> rcv 11 rx rate 870400 nb pkts 680
>>>>>>>>>> rcv 12 rx rate 864000 nb pkts 675
>>>>>>>>>> rcv 13 rx rate 852480 nb pkts 666
>>>>>>>>>> rcv 14 rx rate 855040 nb pkts 668
>>>>>>>>>> rcv 15 rx rate 857600 nb pkts 670
>>>>>>>>>> rcv 16 rx rate 864000 nb pkts 675
>>>>>>>>>> rcv 17 rx rate 866560 nb pkts 677
>>>>>>>>>> rcv 18 rx rate 865280 nb pkts 676
>>>>>>>>>> rcv 19 rx rate 858880 nb pkts 671
>>>>>>>>>> rcv 20 rx rate 856320 nb pkts 669
>>>>>>>>>> rcv 21 rx rate 864000 nb pkts 675
>>>>>>>>>> rcv 22 rx rate 869120 nb pkts 679
>>>>>>>>>> rcv 23 rx rate 856320 nb pkts 669
>>>>>>>>>> rcv 24 rx rate 862720 nb pkts 674
>>>>>>>>>> rcv 25 rx rate 865280 nb pkts 676
>>>>>>>>>> rcv 26 rx rate 867840 nb pkts 678
>>>>>>>>>> rcv 27 rx rate 870400 nb pkts 680
>>>>>>>>>> rcv 28 rx rate 860160 nb pkts 672
>>>>>>>>>> rcv 29 rx rate 870400 nb pkts 680
>>>>>>>>>> rcv 30 rx rate 869120 nb pkts 679
>>>>>>>>>> rcv 31 rx rate 870400 nb pkts 680
>>>>>>>>>> rcv 32 rx rate 858880 nb pkts 671
>>>>>>>>>> rcv 33 rx rate 858880 nb pkts 671
>>>>>>>>>> rcv 34 rx rate 852480 nb pkts 666
>>>>>>>>>> rcv 35 rx rate 874240 nb pkts 683
>>>>>>>>>> rcv 36 rx rate 855040 nb pkts 668
>>>>>>>>>> rcv 37 rx rate 853760 nb pkts 667
>>>>>>>>>> rcv 38 rx rate 869120 nb pkts 679
>>>>>>>>>> rcv 39 rx rate 885760 nb pkts 692
>>>>>>>>>> rcv 40 rx rate 861440 nb pkts 673
>>>>>>>>>> rcv 41 rx rate 852480 nb pkts 666
>>>>>>>>>> rcv 42 rx rate 871680 nb pkts 681
>>>>>>>>>> ...
>>>>>>>>>> ...
>>>>>>>>>> rcv 288 rx rate 766720 nb pkts 599
>>>>>>>>>> rcv 289 rx rate 766720 nb pkts 599
>>>>>>>>>> rcv 290 rx rate 766720 nb pkts 599
>>>>>>>>>> rcv 291 rx rate 766720 nb pkts 599
>>>>>>>>>> rcv 292 rx rate 762880 nb pkts 596
>>>>>>>>>> rcv 293 rx rate 762880 nb pkts 596
>>>>>>>>>> rcv 294 rx rate 762880 nb pkts 596
>>>>>>>>>> rcv 295 rx rate 760320 nb pkts 594
>>>>>>>>>> rcv 296 rx rate 604160 nb pkts 472
>>>>>>>>>> rcv 297 rx rate 604160 nb pkts 472
>>>>>>>>>> rcv 298 rx rate 604160 nb pkts 472
>>>>>>>>>> rcv 299 rx rate 604160 nb pkts 472
>>>>>>>>>> rx rate sum 258839040
>>>>>>>>>>
>>>>>>>>>> FAILED. The subport rate is distributed NOT equally between
>>>>>>>>>> 300 pipes.
>>>>>>>>>> Some subport bandwidth (about 42 Mbit/s) is not being used!
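Following up on the reassembled-packet theory above, here is a minimal
sketch of a guard that could sit in front of rte_sched_port_enqueue() and
weed out mbufs whose total pkt_len exceeds the scheduler MTU. The function
name, the drop policy and the 1522 constant (taken from the test
configuration above) are assumptions for illustration; rte_sched itself
does not provide such a filter.

#include <stdint.h>
#include <rte_mbuf.h>

#define SCHED_MTU 1522  /* scheduler port MTU from the config above */

/* Hypothetical pre-enqueue filter: keeps packets whose total length fits
 * the scheduler MTU and frees the rest. rte_pktmbuf_pkt_len() covers all
 * segments of a multi-segment (e.g. reassembled IPv4) mbuf. Returns the
 * number of packets kept, compacted to the front of the array. */
static uint16_t
sched_filter_oversized(struct rte_mbuf **pkts, uint16_t n_pkts)
{
        uint16_t i, kept = 0;

        for (i = 0; i < n_pkts; i++) {
                if (rte_pktmbuf_pkt_len(pkts[i]) <= SCHED_MTU) {
                        pkts[kept++] = pkts[i];
                } else {
                        /* Candidate culprit for a permanently stalled
                         * queue: such a packet may never fit into
                         * tc_ov_credits. */
                        rte_pktmbuf_free(pkts[i]);
                }
        }

        return kept;
}

Counting how often this path fires, or logging the offending pkt_len
values, would confirm or rule out the multi-segment theory without touching
the scheduler code itself.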