Message-ID: <1535731291.11823.33.camel@debian.org>
From: Luca Boccassi
To: Matan Azrad, Chas Williams <3chas3@gmail.com>
Cc: Eric Kinzie, "dev@dpdk.org", Declan Doherty, Chas Williams,
 "stable@dpdk.org"
Date: Fri, 31 Aug 2018 17:01:31 +0100
References: <20180816125202.15980-1-bluca@debian.org>
 <20180816133208.26566-1-bluca@debian.org>
 <1534933159.5764.107.camel@debian.org>
 <20180822174316.GA29821@roosta>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v4] net/bonding: per-slave
 intermediate rx ring

On Wed, 2018-08-29 at 15:20 +0000, Matan Azrad wrote:
> 
> From: Chas Williams
> > On Tue, Aug 28, 2018 at 5:51 AM Matan Azrad wrote:
> > 
> > From: Chas Williams
> > > On Mon, Aug 27, 2018 at 11:30 AM Matan Azrad wrote:
> > 
> > > > > Because rings are generally quite efficient.
> > > > 
> > > > But you are using a ring in addition to the regular array
> > > > management, so it must hurt the performance of the bonding PMD
> > > > (meaning the bonding itself, not the slave PMDs which are
> > > > called from the bonding).
> > > 
> > > It adds latency.
> > 
> > And by that it hurts the application performance, because it takes
> > more CPU time in the bonding PMD.
> > 
> > No, as I said before it takes _less_ CPU time in the bonding PMD,
> > because we use a more optimal read from the slaves.
> 
> Each packet pointer has to be copied 2 more times because of this
> patch, plus some management (the ring overhead), so in the bonding
> code you lose performance.
> 
> > > It increases performance because we spend less CPU time reading
> > > from the PMDs.
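
For readers following the thread, this is roughly the mechanism being
debated - a minimal sketch, not the actual patch code. Each slave is
drained with a full-size read into a per-slave rte_ring, and the bond's
rx burst dequeues from that ring. The read size, the drop-on-overflow
policy and the function names below are illustrative assumptions only;
the standard rte_ethdev/rte_ring calls are used.

/* Sketch of the per-slave intermediate ring idea under discussion.
 * The rte_ring enqueue/dequeue pair is where the "2 more pointer
 * copies per packet" mentioned in the thread comes from.
 */
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define SLAVE_READ_SZ 64	/* hypothetical per-slave read size */

/* Poll one slave with a full-size burst and buffer the result. */
static inline void
slave_fill_ring(uint16_t slave_port, uint16_t queue,
		struct rte_ring *ring)
{
	struct rte_mbuf *pkts[SLAVE_READ_SZ];
	uint16_t nb, kept;

	/* Read as much as possible from the slave in one call, which
	 * is what lets a vectorized PMD use its fast path. */
	nb = rte_eth_rx_burst(slave_port, queue, pkts, SLAVE_READ_SZ);
	if (nb == 0)
		return;

	/* First extra pointer copy: into the intermediate ring. */
	kept = rte_ring_enqueue_burst(ring, (void **)pkts, nb, NULL);

	/* A real implementation must decide what to do on overflow;
	 * this sketch simply drops, which is one of the costs debated. */
	while (kept < nb)
		rte_pktmbuf_free(pkts[kept++]);
}

/* Bond rx burst path: second extra pointer copy, out of the ring. */
static inline uint16_t
bond_rx_from_ring(struct rte_ring *ring, struct rte_mbuf **bufs,
		  uint16_t nb_pkts)
{
	return rte_ring_dequeue_burst(ring, (void **)bufs, nb_pkts, NULL);
}
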
> > 
> > So, it's a hack in the bonding PMD to improve the performance of
> > some slaves' code, but it hurts the bonding code performance.
> > Overall, the performance we gain for those slaves improves the
> > application performance only when working with those slaves.
> > But it may hurt the application performance when working with
> > other slaves.
> > 
> > What is your evidence that it hurts bonding performance?  Your
> > argument is purely theoretical.
> 
> Yes, we cannot test all the scenarios across the PMDs.

Chas has evidence that this helps, a _lot_, in some very common cases.
We haven't seen evidence of negative impact anywhere in 2 years.

Given this, surely it's not unreasonable to ask to substantiate
theoretical arguments with some testing?

> > I could easily argue that even for non-vectorized PMDs there is a
> > performance gain, because we spend less time switching between
> > PMDs.
> 
> But you spend more time in the bonding part.
> 
> > If you are going to read from a PMD you should attempt to read as
> > much as possible. It's expensive to read the card's registers and
> > perform the queue manipulations.
> 
> You do it anyway.
> 
> The context switching is expensive, but so are the extra copies per
> packet and the ring management.
> 
> We have a tradeoff here that may affect other scenarios differently.
> 
> > > This means we have more CPU to use for post processing
> > > (i.e. routing).
> > 
> > > > > Bonding is in a middle ground between application and PMD.
> > > > 
> > > > Yes.
> > > > > What bonding is doing may not improve all applications.
> > > > 
> > > > Yes, but it can be solved using some bonding modes.
> > > > > If using a ring to buffer the vectorized receive routines
> > > > > improves your particular application, that's great.
> > > > 
> > > > It may be not great, and even bad, for some other PMDs which
> > > > are not vectorized.
> > > > 
> > > > > However, I don't think I can say that it would help all
> > > > > applications.  As you point out, there is overhead
> > > > > associated with a ring.
> > > > 
> > > > Yes.
> > > > > Bonding's receive burst isn't especially efficient (in mode
> > > > > 4).
> > > > 
> > > > Why?
> > > > 
> > > > It makes a copy of the slaves, has a fair bit of stack usage,
> > > > needs to check the slave status, and needs to examine each
> > > > packet to see if it is a slow protocol packet.  So each packet
> > > > is essentially read twice.  The fast queue code for mode 4
> > > > avoids some of this (and probably, incorrectly, skips checking
> > > > the collecting state).  If you find a slow protocol packet,
> > > > you need to chop it out of the array with memmove due to
> > > > overlap.
> > > 
> > > Agree.
> > > So to improve the bonding performance you need to fix the above
> > > problems.
> > > There is no connection to the ring.
> > > 
> > > And as I have described numerous times, these problems can't be
> > > easily fixed while preserving the existing API.
> > 
> > Sometimes we need to work harder to see a gain for all.
> > We should not apply a patch just because it is easy and shows a
> > gain in specific scenarios.
> > 
> > > > > Bonding benefits from being able to read as much as possible
> > > > > (within limits of course, large reads would blow out caches)
> > > > > from each slave.
> > > > 
> > > > The slave PMDs can benefit in the same way.
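
To make the mode 4 receive overhead described above concrete, here is a
rough illustration of the examine-and-memmove pattern, not the actual
driver code. The function name, the constant name and the
drop-instead-of-LACP-handling behaviour are assumptions for the sketch,
and the struct naming follows the DPDK releases of that era.

/* Every received packet is inspected for the slow protocols ethertype,
 * and any such frame is chopped out of the array with an overlapping
 * copy, so the mbuf array is effectively touched a second time.
 */
#include <stdint.h>
#include <string.h>
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_mbuf.h>

#define SLOW_PROTO_ETHERTYPE 0x8809	/* IEEE 802.3 slow protocols (LACP) */

static uint16_t
filter_slow_protocol_frames(struct rte_mbuf **bufs, uint16_t nb)
{
	const uint16_t slow_be = rte_cpu_to_be_16(SLOW_PROTO_ETHERTYPE);
	uint16_t i = 0;

	while (i < nb) {
		struct ether_hdr *hdr =
			rte_pktmbuf_mtod(bufs[i], struct ether_hdr *);

		if (hdr->ether_type == slow_be) {
			/* The real driver hands the frame to the 802.3ad
			 * state machine; this sketch just drops it. */
			rte_pktmbuf_free(bufs[i]);
			/* Close the gap: overlapping copy, hence memmove. */
			memmove(&bufs[i], &bufs[i + 1],
				sizeof(bufs[0]) * (nb - i - 1));
			nb--;
		} else {
			i++;
		}
	}
	return nb;
}
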
> > > > 
> > > > > It can't return all that data though, because applications
> > > > > tend to use the burst size that would be efficient for a
> > > > > typical PMD.
> > > > 
> > > > What is the preferred burst size of the bonding? Maybe the
> > > > application should use it when it is using bonding.
> > > > 
> > > > The preferred burst size for bonding would be the sum of all
> > > > the slaves' ideal read sizes.  However, that's not likely to
> > > > be simple, since most applications decide early on the size
> > > > for the read/write burst operations.
> > > > 
> > > > > An alternative might be to ask bonding applications to
> > > > > simply issue larger reads for certain modes.  That's
> > > > > probably not as easy as it sounds, given the way that the
> > > > > burst length affects multiplexing.
> > > > 
> > > > Can you explain it more?
> > > > 
> > > > A longer burst size on one PMD will tend to favor that PMD
> > > > over others.  It will fill your internal queues with more of
> > > > its packets.
> > > 
> > > Agree, it's about fairness.
> > > 
> > > > > Another solution might be to just alternately poll the
> > > > > individual slaves on each rx burst.  But that means you need
> > > > > to poll at a faster rate.  Depending on your application,
> > > > > you might not be able to do that.
> > > > 
> > > > Again, can you be more precise in the above explanation?
> > > > 
> > > > If the application knows that there are two slaves backing a
> > > > bonding interface, the application could just read twice from
> > > > the bonding interface, knowing that the bonding interface is
> > > > going to alternate between the slaves.  But this requires the
> > > > application to know things about the bonding PMD, like the
> > > > number of slaves.
> > > 
> > > Why should the application poll twice?
> > > Poll slave 0, then process its packets, poll slave 1, then
> > > process its packets...
> > > What is the problem?
> > > 
> > > Because let's say that each slave is 10G and you are using link
> > > aggregation with two slaves.  If you have tuned your application
> > > on the assumption that a PMD is approximately 10G, then you are
> > > going to be under-polling the bonding PMD.
> > > For the above to work, you need to ensure that the application
> > > is polling often enough to keep up.
> > 
> > But each poll will be shorter: no loop over the slaves, only one
> > slave call.
> > 
> > But you still need to poll the bonding PMD N times as fast, where
> > N is the number of slaves.  The "scheduler" in the application may
> > not be aware of that.  That was (apparently) the point of the
> > current bonding PMD's design.  It hides everything from the
> > application.
> 
> Sorry, I'm not sure I understand you here.
> Also here we have a tradeoff:
> reading a full burst from each slave + less bonding code per burst,
> against
> more bonding calls (it is not an N-times relation, it depends on the
> traffic and the burst size).
> 
> > > > > We can avoid this scheduling overhead by just doing the
> > > > > extra reads in bonding and buffering in a ring.
> > > > > 
> > > > > Since bonding is going to be buffering everything in a ring,
> > > > 
> > > > ? I'm not familiar with it. For now I don't think we need a
> > > > ring.
> > > > 
> > > > We have some empirical testing that shows performance
> > > > improvements.
> > > > How do you explain this?
> > > 
> > > You are using a hack in bonding which hurts the bonding but
> > > improves the vectorized PMDs you use.
> > > 
> > > > Rings aren't inefficient.
> > > 
> > > Only when there is a need to use them.
> > > 
> > > I believe we have identified a need.
> > > 
> > > > There's significant overhead in calling the rx burst routine
> > > > on any PMD.
> > > > You need to get the PMD data structures into local memory.
> > > 
> > > Yes.
> > > 
> > > > Reading as much as possible makes sense.
> > > 
> > > Agree.
> > > 
> > > > Could you generally do this for all PMDs?  Yes, but the ring
> > > > adds latency.  Latency that isn't an issue for the bonding PMD
> > > > because of all the other inefficiencies (for mode 4).
> > > 
> > > Enlarging the bonding latency this way makes some vectorized
> > > slave PMDs happy and makes the bonding worse for other slave
> > > PMDs - it is not a good idea to put it upstream.
> > > 
> > > Fixing the other problems in the bonding (reducing the bonding
> > > latency) will do the job for you and for others.
> > > 
> > > Latency is not the issue with bonding.  The issue with bonding
> > > is the overhead associated with making an rx burst call.  We add
> > > latency (via rings) to make part of the bonding driver more
> > > efficient.
> > > 
> > > Again, I suspect it even helps the non-vectorized PMDs.
> > 
> > > Calling a PMD's rx burst routine is costly and we are switching
> > > between PMDs inside the bonding driver.  Bonding is halfway
> > > between an application and a PMD.  What we are doing in the
> > > bonding PMD is what an application would typically do.  But
> > > instead of forcing all the bonding users to do this, we are
> > > doing it for them.
> > 
> > I don't agree.
> > The overhead in bonding may be critical for some other
> > PMDs/scenarios.
> > 
> > The overhead in bonding is undesirable.  It's not easy to address
> > because of the initial design goals.
> > 
> > To summarize:
> > We do not agree.
> > 
> > Bottom line on my side:
> > it is not a good idea to add overhead in the bonding PMD, which is
> > a generic PMD, just to get a gain for some specific scenarios with
> > some specific PMDs, while for other scenarios/PMDs it is bad.
> > 
> > You are making some assumptions which are simply not valid.
> > Bonding is _not_ a generic PMD.
> 
> What?
> Should the bonding be good only for specific scenarios with specific
> PMDs?
> If not, it is generic.
> 
> > As discussed above, bonding is somewhere between application and
> > PMD.
> 
> Yes.
> 
> > The choices made for bonding were to make it easy to integrate
> > into an existing application without the application having to
> > know anything about bonding.
> 
> The bonding should be generic at least from the following
> perspectives:
> Different PMDs.
> Different application scenarios.
> 
> > Matan.

-- 
Kind regards,
Luca Boccassi