From: Alex Kiselev <alex@therouter.net>
To: "Singh, Jasvinder"
Cc: users@dpdk.org, "Dumitrescu, Cristian", "Dharmappa, Savinay"
Date: Sat, 12 Dec 2020 18:19:56 +0100
Subject: Re: [dpdk-users] scheduler issue
Message-ID: <7da84f91234331918758eb643089088d@therouter.net>

On 2020-12-12 11:46, Alex Kiselev wrote:
> On 2020-12-12 11:22, Singh, Jasvinder wrote:
>>> On 12 Dec 2020, at 01:45, Alex Kiselev wrote:
>>>
>>> On 2020-12-12 01:54, Alex Kiselev wrote:
>>>>> On 2020-12-12 01:45, Alex Kiselev wrote:
>>>>> On 2020-12-12 01:20, Singh, Jasvinder wrote:
>>>>>>> On 11 Dec 2020, at 23:37, Alex Kiselev wrote:
>>>>>>> On 2020-12-11 23:55, Singh, Jasvinder wrote:
>>>>>>> On 11 Dec 2020, at 22:27, Alex Kiselev wrote:
>>>>>>>> On 2020-12-11 23:06, Singh, Jasvinder wrote:
>>>>>>> On 11 Dec 2020, at 21:29, Alex Kiselev wrote:
>>>>>>> On 2020-12-08 14:24, Singh, Jasvinder wrote:
>>>>>>>
>>>>>>>> [JS] now, returning to the 1 mbps pipes situation, try reducing the
>>>>>>>> tc period first at the subport and then at the pipe level, and see
>>>>>>>> if that helps in getting even traffic across the low bandwidth pipes.
>>>>>>> reducing the subport tc period from 10 to 5 also solved the problem
>>>>>>> with the 1 Mbit/s pipes.
>>>>>>> so, my second problem has been solved,
>>>>>>> but the first one, where some of the low bandwidth pipes stop
>>>>>>> transmitting, still remains.
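To put rough numbers on why a smaller subport tc period helps: credits are replenished as roughly rate * period (the rte_sched_time_ms_to_bytes() arithmetic that also drives tc_ov_wm_max, quoted further down), so halving the period halves how many bytes each TC, and the TC3 oversubscription watermark, may consume per enforcement round. A minimal standalone sketch of that arithmetic, using assumed example rates (300 Mbit/s subport, 2 Mbit/s pipe TC3) rather than the real profile values:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/* Same arithmetic as rte_sched_time_ms_to_bytes(): the number of bytes a
 * given rate may consume during one tc_period. */
static uint64_t
time_ms_to_bytes(uint64_t time_ms, uint64_t rate_bytes_per_sec)
{
    return (time_ms * rate_bytes_per_sec) / 1000;
}

int main(void)
{
    /* Assumed example figures, not the real profiles:
     * subport rate 300 Mbit/s = 37,500,000 bytes/s,
     * largest pipe TC3 rate 2 Mbit/s = 250,000 bytes/s. */
    const uint64_t subport_rate = 37500000;
    const uint64_t pipe_tc3_rate_max = 250000;
    const uint64_t periods[] = { 10, 5 };

    for (int i = 0; i < 2; i++) {
        printf("tc_period %" PRIu64 " ms: subport tc credits/period %" PRIu64
               " bytes, tc_ov_wm_max %" PRIu64 " bytes\n",
               periods[i],
               time_ms_to_bytes(periods[i], subport_rate),
               time_ms_to_bytes(periods[i], pipe_tc3_rate_max));
    }
    return 0;
}

With a 10 ms period a single pipe may burst up to 2,500 bytes of TC3 traffic per round in this example; with 5 ms only 1,250, so the grinder rotates between pipes more often, which matches the fairness improvement described above.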
>>>>>>> I see, try removing the "pkt_len <= pipe_tc_ov_credits" condition in
>>>>>>> the grinder_credits_check() code for the oversubscription case;
>>>>>>> instead use this: pkt_len <= pipe_tc_credits + pipe_tc_ov_credits;
>>>>>>> if I do what you suggest, I will get this code
>>>>>>> enough_credits = (pkt_len <= subport_tb_credits) &&
>>>>>>>   (pkt_len <= subport_tc_credits) &&
>>>>>>>   (pkt_len <= pipe_tb_credits) &&
>>>>>>>   (pkt_len <= pipe_tc_credits) &&
>>>>>>>   (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
>>>>>>> And this doesn't make sense, since if the condition
>>>>>>> pkt_len <= pipe_tc_credits is true, then the condition
>>>>>>> (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits) is also always true.
>>>>>>> [JS] my suggestion is to remove "pkt_len <= pipe_tc_credits" and
>>>>>>> "pkt_len <= pipe_tc_ov_credits" and use only
>>>>>>> "pkt_len <= pipe_tc_credits + pipe_tc_ov_credits",
>>>>>>> while keeping the tc_ov flag on.
>>>>>>> Your suggestion just turns off the TC_OV feature.
>>>>>>>> I don't see your point.
>>>>>>>> This new suggestion will also effectively turn off the TC_OV
>>>>>>>> feature, since the only effect of enabling TC_OV is the additional
>>>>>>>> condition
>>>>>>>> pkt_len <= pipe_tc_ov_credits
>>>>>>>> which doesn't allow a pipe to spend more resources than it should.
>>>>>>>> And in the case of subport congestion a pipe should spend less than
>>>>>>>> 100% of the pipe's maximum rate.
>>>>>>>> And you suggest allowing a pipe to spend 100% of its rate plus some
>>>>>>>> extra.
>>>>>>>> I guess the effect of this would be an even more unfair distribution
>>>>>>>> of the subport's bandwidth.
>>>>>>>> Btw, a pipe might stop transmitting even when there is no congestion
>>>>>>>> at a subport.
>>>>>>> Although I didn't try this solution, the idea here is: if, in a
>>>>>>> particular round, pkt_len is less than pipe_tc_credits (which is a
>>>>>>> constant value each time) but greater than pipe_tc_ov_credits, then
>>>>>>> it might hit the situation where no packet will be scheduled from the
>>>>>>> pipe even though fixed credits greater than the packet size are
>>>>>>> available.
>>>>>> But that is a perfectly normal situation and that's exactly the idea
>>>>>> behind TC_OV.
>>>>>> It means a pipe should wait for the next subport->tc_ov_period_id,
>>>>>> when pipe_tc_ov_credits will be reset to a new value
>>>>>> But here it's not guaranteed that the new value of pipe_tc_ov_credits
>>>>>> will be sufficient for low bandwidth pipes to send their packets, as
>>>>>> pipe_tc_ov_credits is freshly computed each time.
>>>>>>> pipe->tc_ov_credits = subport->tc_ov_wm * params->tc_ov_weight;
>>>>>>> which allows the pipe to continue transmitting.
>>>>>> No, that won't happen if the new tc_ov_credits value is again less
>>>>>> than pkt_len; it will hit a deadlock.
>>>>> new tc_ov_credits can't be less than subport->tc_ov_wm_min,
>>>>> and tc_ov_wm_min is equal to port->mtu.
>>>>> all my scheduler ports are configured with mtu 1522. the etherdev
>>>>> ports also use the same mtu, therefore there should be no packets
>>>>> bigger than 1522.
>>>> also, tc_ov_credits is set to tc_ov_wm_min only in the case of constant
>>>> congestion, and today I detected the problem when there was no
>>>> congestion.
>>>> so, it's highly unlikely that tc_ov_credits is always set to a value
>>>> less than pkt_size. The only scenario in which this might be the case
>>>> is when the scheduler port gets a corrupted mbuf with an incorrect
>>>> pkt len, which causes a queue deadlock.
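To make the two variants under discussion concrete, here is a minimal standalone sketch (a paraphrase of the grinder_credits_check() logic being debated, not the exact rte_sched source). With the first form, a packet larger than the per-round pipe_tc_ov_credits is never scheduled no matter how large pipe_tc_credits is; the second form lets it through at the cost of borrowing beyond the oversubscription allowance:

#include <stdint.h>
#include <stdbool.h>

/* Oversubscription check as discussed above (paraphrased): the packet must
 * fit into every credit bucket, including the per-round pipe_tc_ov_credits. */
bool
credits_check_tc_ov(uint32_t pkt_len,
                    uint32_t subport_tb_credits, uint32_t subport_tc_credits,
                    uint32_t pipe_tb_credits, uint32_t pipe_tc_credits,
                    uint32_t pipe_tc_ov_credits)
{
    return (pkt_len <= subport_tb_credits) &&
           (pkt_len <= subport_tc_credits) &&
           (pkt_len <= pipe_tb_credits) &&
           (pkt_len <= pipe_tc_credits) &&
           (pkt_len <= pipe_tc_ov_credits);   /* can hold back large packets */
}

/* Suggested variant: let the packet go if it fits into the sum of the
 * guaranteed and the oversubscription credits. */
bool
credits_check_combined(uint32_t pkt_len,
                       uint32_t subport_tb_credits, uint32_t subport_tc_credits,
                       uint32_t pipe_tb_credits, uint32_t pipe_tc_credits,
                       uint32_t pipe_tc_ov_credits)
{
    return (pkt_len <= subport_tb_credits) &&
           (pkt_len <= subport_tc_credits) &&
           (pkt_len <= pipe_tb_credits) &&
           (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);
}

The second form is the check that was actually tested as "your patch" further down.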
>>>
>>> also, a defragmented ipv4 packet (a multisegment mbuf) might have a
>>> pkt_len much bigger than the scheduler port's MTU, therefore you are
>>> right, there is absolutely no guarantee that a packet will not cause a
>>> queue deadlock. and this explanation sounds very plausible to me and I
>>> bet this is my case.
>>>
>>
>> But you mentioned earlier that your packet length is low, never
>> exceeding that threshold. Maybe test with fixed 256/512 byte packets
>> and see if it faces the same no-transmission situation.
>
> No, I could only say so about the test lab which I was using
> to test the pipe fairness and which uses a packet generator with a
> constant pkt size.
>
> My main issue happens in a production network. And I mentioned
> that it is a network providing internet access to residential
> customers, therefore packet sizes are up to 1522 bytes. Also,
> fragmented packets are valid packets in such networks. My application
> performs ipv4 defragmentation and then sends packets to the scheduler,
> so the scheduler might receive a multisegment packet of up to
> 1522 * 8 bytes.

I've tested your patch:

/* Check pipe and subport credits */
enough_credits = (pkt_len <= subport_tb_credits) &&
  (pkt_len <= subport_tc_credits) &&
  (pkt_len <= pipe_tb_credits) &&
  (pkt_len <= pipe_tc_credits + pipe_tc_ov_credits);

and the effect is quite positive.

Also, I moved the packet defragmentation block in my app so that all
packets entering the scheduler are guaranteed to have a size of no more
than 1522 bytes. Thus pipe_tc_ov_credits will always be greater than
pkt_size.

This build is going to be tested in production in a couple of days.
I'll let you know about the results.
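For what it's worth, the same 1522-byte guarantee could also be enforced defensively right in front of the scheduler enqueue. A rough sketch with assumed names (filter_oversize() and sched_mtu are hypothetical, not TheRouter code); the surviving packets would then be handed to rte_sched_port_enqueue() as usual:

#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical guard: drop (or divert) any packet whose total length
 * exceeds the scheduler port MTU, so pkt_len can never exceed the
 * tc_ov_wm_min floor (which equals the port MTU). Kept packets are
 * compacted to the front of the array; returns how many were kept. */
uint32_t
filter_oversize(struct rte_mbuf **pkts, uint32_t n_pkts, uint32_t sched_mtu)
{
    uint32_t i, n_ok = 0;

    for (i = 0; i < n_pkts; i++) {
        if (pkts[i]->pkt_len <= sched_mtu)
            pkts[n_ok++] = pkts[i];
        else
            rte_pktmbuf_free(pkts[i]);  /* or re-fragment it instead */
    }
    return n_ok;
}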
>
>>
>>>
>>>>> Maybe I should increase the port's MTU? to 1540?
>>>>>>> And it could not cause a permanent pipe stop, which is what I am
>>>>>>> facing.
>>>>>>>> In fairness, a pipe should send as many packets as pipe_tc_credits
>>>>>>>> allows, regardless of the extra pipe_tc_ov_credits, which is extra
>>>>>>>> on top of pipe_tc_credits.
>>>>>>> I think it's quite the opposite. That's why after I reduced the
>>>>>>> subport tc_period I got much more fairness. Since reducing the
>>>>>>> subport tc_period also reduces the tc_ov_wm_max value:
>>>>>>> s->tc_ov_wm_max = rte_sched_time_ms_to_bytes(params->tc_period,
>>>>>>>   port->pipe_tc3_rate_max)
>>>>>>> as a result a pipe transmits fewer bytes in one round, so pipe
>>>>>>> rotation inside a grinder happens much more often and a pipe can't
>>>>>>> monopolise resources.
>>>>>>> in other qos implementations this is called a "quantum".
>>>>>> Yes, so reducing the tc period means that all pipes (high and low
>>>>>> bandwidth) get lower tc_ov_credits values, which allows less
>>>>>> transmission from the higher bw pipes and leaves bandwidth for the
>>>>>> low bw pipes. So, here is the thing - either tune the tc period to a
>>>>>> value which prevents high bw pipes from hogging most of the bw, or
>>>>>> make changes in the code so that oversubscription adds extra credits
>>>>>> on top of the guaranteed ones.
>>>>>> One question: don't your low bw pipes have higher priority traffic
>>>>>> (tc0, tc1, tc2)? Packets from those tc must be going out. Isn't this
>>>>>> the case?
>>>>> well, it would be the case after I find out what's going on.
>>>>> Right now I am using a tos2tc map configured in such a way that all
>>>>> ipv4 packets with any TOS value go into TC3.
>>>>>>>>>> rcv 0 rx rate 7324160 nb pkts 5722
>>>>>>>>>> rcv 1 rx rate 7281920 nb pkts 5689
>>>>>>>>>> rcv 2 rx rate 7226880 nb pkts 5646
>>>>>>>>>> rcv 3 rx rate 7124480 nb pkts 5566
>>>>>>>>>> rcv 4 rx rate 7324160 nb pkts 5722
>>>>>>>>>> rcv 5 rx rate 7271680 nb pkts 5681
>>>>>>>>>> rcv 6 rx rate 7188480 nb pkts 5616
>>>>>>>>>> rcv 7 rx rate 7150080 nb pkts 5586
>>>>>>>>>> rcv 8 rx rate 7328000 nb pkts 5725
>>>>>>>>>> rcv 9 rx rate 7249920 nb pkts 5664
>>>>>>>>>> rcv 10 rx rate 7188480 nb pkts 5616
>>>>>>>>>> rcv 11 rx rate 7179520 nb pkts 5609
>>>>>>>>>> rcv 12 rx rate 7324160 nb pkts 5722
>>>>>>>>>> rcv 13 rx rate 7208960 nb pkts 5632
>>>>>>>>>> rcv 14 rx rate 7152640 nb pkts 5588
>>>>>>>>>> rcv 15 rx rate 7127040 nb pkts 5568
>>>>>>>>>> rcv 16 rx rate 7303680 nb pkts 5706
>>>>>>>>>> ....
>>>>>>>>>> rcv 587 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 588 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 589 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 590 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 591 rx rate 2406400 nb pkts 1880
>>>>>>>>>> rcv 592 rx rate 2398720 nb pkts 1874
>>>>>>>>>> rcv 593 rx rate 2400000 nb pkts 1875
>>>>>>>>>> rcv 594 rx rate 2400000 nb pkts 1875
>>>>>>>>>> rcv 595 rx rate 2400000 nb pkts 1875
>>>>>>>>>> rcv 596 rx rate 2401280 nb pkts 1876
>>>>>>>>>> rcv 597 rx rate 2401280 nb pkts 1876
>>>>>>>>>> rcv 598 rx rate 2401280 nb pkts 1876
>>>>>>>>>> rcv 599 rx rate 2402560 nb pkts 1877
>>>>>>>>>> rx rate sum 3156416000
>>>>>>>>>>>> ... despite that there is _NO_ congestion at the subport or pipe.
>>>>>>>>>>>>> And the subport !! doesn't use about 42 mbit/s of the available
>>>>>>>>>>>>> bandwidth.
>>>>>>>>>>>>> The only difference between those test configurations is the TC
>>>>>>>>>>>>> of the generated traffic.
>>>>>>>>>>>>> Test 1 uses TC 1 while test 2 uses TC 3 (which uses the TC_OV
>>>>>>>>>>>>> function).
>>>>>>>>>>>>> So, enabling TC_OV changes the results dramatically.
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> ## test1
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>>>> # qos test port
>>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>>> port 1 subport rate 300 M
>>>>>>>>>>>>> number of tx flows 300
>>>>>>>>>>>>> generator tx rate 1M TC 1
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> rcv 284 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 285 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 286 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 287 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 288 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 289 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 290 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 291 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 292 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 293 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> rcv 294 rx rate 995840 nb pkts 778
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> sum pipe's rx rate is 298 494 720
>>>>>>>>>>>>> OK.
>>>>>>>>>>>>> The subport rate is equally distributed to 300 pipes.
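(A quick sanity check on the test 1 numbers, assuming the reported rates are in bits per second: 300 generator flows at 1 Mbit/s each offer roughly 300 Mbit/s, the per-pipe receive rates sit around 995,840 bit/s, and the reported sum of 298,494,720 bit/s is within about half a percent of the 300 Mbit/s subport rate. So with TC 1 the subport is fully used and evenly shared.)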
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> ## test 2
>>>>>>>>>>>>> ##
>>>>>>>>>>>>> hqos add profile 7 rate 2 M size 1000000 tc period 40
>>>>>>>>>>>>> # qos test port
>>>>>>>>>>>>> hqos add port 1 rate 10 G mtu 1522 frame overhead 24 queue sizes 64 64 64 64
>>>>>>>>>>>>> hqos add port 1 subport 0 rate 300 M size 1000000 tc period 10
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 2000 profile 7
>>>>>>>>>>>>> hqos add port 1 subport 0 pipes 200 profile 23
>>>>>>>>>>>>> hqos set port 1 lcore 3
>>>>>>>>>>>>> port 1 subport rate 300 M
>>>>>>>>>>>>> number of tx flows 300
>>>>>>>>>>>>> generator tx rate 1M TC 3
>>>>>>>>>>>>> h5 ~ # rcli sh qos rcv
>>>>>>>>>>>>> rcv 0 rx rate 875520 nb pkts 684
>>>>>>>>>>>>> rcv 1 rx rate 856320 nb pkts 669
>>>>>>>>>>>>> rcv 2 rx rate 849920 nb pkts 664
>>>>>>>>>>>>> rcv 3 rx rate 853760 nb pkts 667
>>>>>>>>>>>>> rcv 4 rx rate 867840 nb pkts 678
>>>>>>>>>>>>> rcv 5 rx rate 844800 nb pkts 660
>>>>>>>>>>>>> rcv 6 rx rate 852480 nb pkts 666
>>>>>>>>>>>>> rcv 7 rx rate 855040 nb pkts 668
>>>>>>>>>>>>> rcv 8 rx rate 865280 nb pkts 676
>>>>>>>>>>>>> rcv 9 rx rate 846080 nb pkts 661
>>>>>>>>>>>>> rcv 10 rx rate 858880 nb pkts 671
>>>>>>>>>>>>> rcv 11 rx rate 870400 nb pkts 680
>>>>>>>>>>>>> rcv 12 rx rate 864000 nb pkts 675
>>>>>>>>>>>>> rcv 13 rx rate 852480 nb pkts 666
>>>>>>>>>>>>> rcv 14 rx rate 855040 nb pkts 668
>>>>>>>>>>>>> rcv 15 rx rate 857600 nb pkts 670
>>>>>>>>>>>>> rcv 16 rx rate 864000 nb pkts 675
>>>>>>>>>>>>> rcv 17 rx rate 866560 nb pkts 677
>>>>>>>>>>>>> rcv 18 rx rate 865280 nb pkts 676
>>>>>>>>>>>>> rcv 19 rx rate 858880 nb pkts 671
>>>>>>>>>>>>> rcv 20 rx rate 856320 nb pkts 669
>>>>>>>>>>>>> rcv 21 rx rate 864000 nb pkts 675
>>>>>>>>>>>>> rcv 22 rx rate 869120 nb pkts 679
>>>>>>>>>>>>> rcv 23 rx rate 856320 nb pkts 669
>>>>>>>>>>>>> rcv 24 rx rate 862720 nb pkts 674
>>>>>>>>>>>>> rcv 25 rx rate 865280 nb pkts 676
>>>>>>>>>>>>> rcv 26 rx rate 867840 nb pkts 678
>>>>>>>>>>>>> rcv 27 rx rate 870400 nb pkts 680
>>>>>>>>>>>>> rcv 28 rx rate 860160 nb pkts 672
>>>>>>>>>>>>> rcv 29 rx rate 870400 nb pkts 680
>>>>>>>>>>>>> rcv 30 rx rate 869120 nb pkts 679
>>>>>>>>>>>>> rcv 31 rx rate 870400 nb pkts 680
>>>>>>>>>>>>> rcv 32 rx rate 858880 nb pkts 671
>>>>>>>>>>>>> rcv 33 rx rate 858880 nb pkts 671
>>>>>>>>>>>>> rcv 34 rx rate 852480 nb pkts 666
>>>>>>>>>>>>> rcv 35 rx rate 874240 nb pkts 683
>>>>>>>>>>>>> rcv 36 rx rate 855040 nb pkts 668
>>>>>>>>>>>>> rcv 37 rx rate 853760 nb pkts 667
>>>>>>>>>>>>> rcv 38 rx rate 869120 nb pkts 679
>>>>>>>>>>>>> rcv 39 rx rate 885760 nb pkts 692
>>>>>>>>>>>>> rcv 40 rx rate 861440 nb pkts 673
>>>>>>>>>>>>> rcv 41 rx rate 852480 nb pkts 666
>>>>>>>>>>>>> rcv 42 rx rate 871680 nb pkts 681
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> rcv 288 rx rate 766720 nb pkts 599
>>>>>>>>>>>>> rcv 289 rx rate 766720 nb pkts 599
>>>>>>>>>>>>> rcv 290 rx rate 766720 nb pkts 599
>>>>>>>>>>>>> rcv 291 rx rate 766720 nb pkts 599
>>>>>>>>>>>>> rcv 292 rx rate 762880 nb pkts 596
>>>>>>>>>>>>> rcv 293 rx rate 762880 nb pkts 596
>>>>>>>>>>>>> rcv 294 rx rate 762880 nb pkts 596
>>>>>>>>>>>>> rcv 295 rx rate 760320 nb pkts 594
>>>>>>>>>>>>> rcv 296 rx rate 604160 nb pkts 472
>>>>>>>>>>>>> rcv 297 rx rate 604160 nb pkts 472
>>>>>>>>>>>>> rcv 298 rx rate 604160 nb pkts 472
>>>>>>>>>>>>> rcv 299 rx rate 604160 nb pkts 472
>>>>>>>>>>>>> rx rate sum 258839040
>>>>>>>>>>>>> FAILED.
>>>>>>>>>>>>> The subport rate is NOT distributed equally between the 300 pipes.
>>>>>>>>>>>>> Some subport bandwidth (about 42 mbit/s) is not being used!
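(Again assuming the rates are in bits per second: the test 2 sum of 258,839,040 leaves 300,000,000 - 258,839,040 = 41,160,960, i.e. roughly 41 Mbit/s of the subport rate unused, which matches the "about 42 mbit/s" figure quoted above; the worst pipes (296-299) receive only about 604 kbit/s of the ~1 Mbit/s they are offered.)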