From: Luca Boccassi
To: Matan Azrad, Chas Williams <3chas3@gmail.com>
Cc: Eric Kinzie, dev@dpdk.org, Declan Doherty, Chas Williams, stable@dpdk.org
Date: Fri, 31 Aug 2018 17:01:31 +0100
Message-ID: <1535731291.11823.33.camel@debian.org>
References: <20180816125202.15980-1-bluca@debian.org>
 <20180816133208.26566-1-bluca@debian.org>
 <1534933159.5764.107.camel@debian.org>
 <20180822174316.GA29821@roosta>
Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH v4] net/bonding: per-slave
 intermediate rx ring

On Wed, 2018-08-29 at 15:20 +0000, Matan Azrad wrote:
> 
> From: Chas Williams
> > On Tue, Aug 28, 2018 at 5:51 AM Matan Azrad wrote:
> > 
> > From: Chas Williams
> > > On Mon, Aug 27, 2018 at 11:30 AM Matan Azrad wrote:
> > 
> > > > > Because rings are generally quite efficient.
> > > > 
> > > > But you are using a ring in addition to regular array
> > > > management, it must hurt performance of the bonding PMD
> > > > (meaning the bonding itself, not the slave PMDs which are
> > > > called from the bonding).
> > > 
> > > It adds latency.
> > 
> > And by that it hurts the application performance because it takes
> > more CPU time in the bonding PMD.
> > 
> > No, as I said before, it takes _less_ CPU time in the bonding PMD
> > because we use a more optimal read from the slaves.
> 
> Each packet pointer has to be copied 2 more times because of this
> patch, plus some management (the ring overhead).
> So in the bonding code you lose performance.
> 
> > 
> > > It increases performance because we spend less CPU time reading
> > > from the PMDs.
> > 
> > So, it's a hack in the bonding PMD to improve some slaves' code
> > performance but hurt the bonding code performance.
> > Overall, the performance we gain for those slaves improves the
> > application performance only when working with those slaves,
> > but it may hurt the application performance when working with
> > other slaves.
> > 
> > What is your evidence that it hurts bonding performance?  Your
> > argument is purely theoretical.
> 
> Yes, we cannot test all the scenarios across the PMDs.

Chas has evidence that this helps, a _lot_, in some very common cases.
We haven't seen evidence of negative impact anywhere in 2 years.

Given this, surely it's not unreasonable to ask to substantiate
theoretical arguments with some testing?

> > I could easily argue that even for non-vectorized PMDs there is a
> > performance gain because we spend less time switching between PMDs.
> 
> But you spend more time in the bonding part.
> 
> > If you are going to read from a PMD you should attempt to read as
> > much as possible. It's expensive to read the card's registers and
> > perform the queue manipulations.
> 
> You do it anyway.
> 
> The context switching is expensive, but so are the extra copies per
> packet and the ring management.
> 
> We have a tradeoff here that may affect other scenarios differently.
> 
> > 
> > > This means we have more CPU to use for
> > > post processing (i.e. routing).
> > 
> > > > > Bonding is in a middle ground between application and PMD.
> > > > 
> > > > Yes.
> > > > > What bonding is doing may not improve all applications.
> > > > 
> > > > Yes, but it can be solved using some bonding modes.
> > > > > If using a ring to buffer the vectorized receive routines
> > > > > improves your particular application, that's great.
> > > > 
> > > > It may not be great, and may even be bad, for some other PMDs
> > > > which are not vectorized.
> > > > 
> > > > > However, I don't think I can say that it would help all
> > > > > applications.  As you point out, there is overhead associated
> > > > > with a ring.
> > > > 
> > > > Yes.
> > > > > Bonding's receive burst isn't especially efficient (in mode
> > > > > 4).
> > > > 
> > > > Why?
> > > > 
> > > > It makes a copy of the slaves, has a fair bit of stack usage,
> > > > needs to check the slave status, and needs to examine each
> > > > packet to see if it is a slow protocol packet.  So each
> > > > packet is essentially read twice.  The fast queue code for
> > > > mode 4 avoids some of this (and probably ignores checking
> > > > collecting incorrectly).  If you find a slow protocol packet,
> > > > you need to chop it out of the array with memmove due to
> > > > overlap.
> > > 
> > > Agree.
> > > So to improve the bonding performance you need to optimize the
> > > above problems.
> > > There is no connection to the ring.
> > > 
> > > And as I have described numerous times, these problems
> > > can't be easily fixed while preserving the existing API.
> > 
> > Sometimes we need to work harder to see a gain for all.
> > We should not apply a patch just because it is easy and shows a
> > gain for specific scenarios.
> > 
> > > > > Bonding benefits from being able to read as much as possible
> > > > > (within limits of course, large reads would blow out caches)
> > > > > from each slave.
> > > > 
> > > > The slave PMDs can benefit in the same way.
> > > > 
> > > > > It can't return all that data though because applications
> > > > > tend to use the burst size that would be efficient for a
> > > > > typical PMD.
> > > > 
> > > > What is the preferred burst size of the bonding? Maybe the
> > > > applications should use it when they are using bonding.
> > > > 
> > > > The preferred burst size for bonding would be the sum of all
> > > > the slaves' ideal read sizes.  However, that's not likely to
> > > > be simple since most applications decide early the size for
> > > > the read/write burst operations.
> > > > 
> > > > > An alternative might be to ask bonding applications to
> > > > > simply issue larger reads for certain modes.  That's
> > > > > probably not as easy as it sounds given the way that the
> > > > > burst length affects multiplexing.
> > > > 
> > > > Can you explain it more?
> > > > 
> > > > A longer burst size on one PMD will tend to favor that PMD
> > > > over others.  It will fill your internal queues with more
> > > > of its packets.
> > > 
> > > Agree, it's about fairness.
> > > 
> > > > > Another solution might be to just alternately poll the
> > > > > individual slaves on each rx burst.  But that means you
> > > > > need to poll at a faster rate.  Depending on your
> > > > > application, you might not be able to do that.
> > > > 
> > > > Again, can you be more precise in the above explanation?
> > > > 
> > > > If the application knows that there are two slaves backing
> > > > a bonding interface, the application could just read twice
> > > > from the bonding interface, knowing that the bonding
> > > > interface is going to alternate between the slaves.  But
> > > > this requires the application to know things about the
> > > > bonding PMD, like the number of slaves.
> > > 
> > > Why should the application poll twice?
> > > Poll slave 0, then process its packets, poll slave 1, then
> > > process its packets...
> > > What is the problem?
> > > 
> > > Because let's say that each slave is 10G and you are using
> > > link aggregation with two slaves.  If you have tuned your
> > > application on the assumption that a PMD is approximately
> > > 10G, then you are going to be under-polling the bonding PMD.
> > > For the above to work, you need to ensure that the application
> > > is polling sufficiently to keep up.
> > 
> > But each poll will be shorter: no slave loops, only one slave
> > call.
> > 
> > But you still need to poll the bonding PMD N times as
> > fast, where N is the number of slaves.  The "scheduler"
> > in the application may not be aware of that.  That was
> > (apparently) the point of the design of the current bonding
> > PMD.  It hides everything from the application.
> 
> Sorry, I'm not sure I understand you here.
> Here too we have a tradeoff:
> a full burst read from each slave plus less bonding code per burst,
> against more bonding calls (it is not an N-times relation, it
> depends on how heavy the traffic is and on the burst size).
> 
> > > > > We can avoid this scheduling overhead by just
> > > > > doing the extra reads in bonding and buffering in a ring.
> > > > > 
> > > > > Since bonding is going to be buffering everything in a ring,
> > > > 
> > > > ? I'm not familiar with it. For now I don't think we need a
> > > > ring.
> > > > 
> > > > We have some empirical testing that shows performance
> > > > improvements.
> > > > How do you explain this?
> > > 
> > > You are using a hack in bonding which hurts the bonding but
> > > improves the vectorized PMD you use.
> > > 
> > > > Rings aren't inefficient.
> > > 
> > > Only when there is a need to use it.
> > > 
> > > I believe we have identified a need.
> > > 
> > > > There's significant overhead in calling the rx burst routine
> > > > on any PMD.
> > > > You need to get the PMD data structures into local memory.
> > > 
> > > Yes.
> > > 
> > > > Reading as much as possible makes sense.
> > > 
> > > Agree.
> > > 
> > > > Could you generally do this for all PMDs?  Yes, but the ring
> > > > adds latency.  Latency that isn't an issue for the bonding
> > > > PMD because of all the other inefficiencies (for mode 4).
> > > 
> > > Increasing the bonding latency that way makes some vectorized
> > > slave PMDs happy and makes the bonding worse for other slave
> > > PMDs; this is not a good idea to put upstream.
> > > 
> > > Improving the other problems in the bonding (reducing the
> > > bonding latency) will do the job for you and for others.
> > > 
> > > Latency is not the issue with bonding.  The issue with bonding
> > > is the overhead associated with making an rx burst call.  We
> > > add latency (via rings) to make part of the bonding driver more
> > > efficient.
> > > 
> > > Again, I suspect it even helps the non-vectorized PMDs.
> > 
> > > Calling a PMD's rx burst routine is costly and we are switching
> > > between PMDs inside the bonding driver.  Bonding is halfway
> > > between an application and a PMD.  What we are doing in the
> > > bonding PMD is what an application would typically do.  But
> > > instead of forcing all the bonding users to do this, we are
> > > doing it for them.
> > 
> > I don't agree.
> > The overhead in bonding may be critical for some other
> > PMDs/scenarios.
> > 
> > The overhead in bonding is undesirable.  It's not easy to address
> > because of the initial design goals.
> > 
> > To summarize:
> > we do not agree.
> > 
> > Bottom line on my side:
> > it is not a good idea to add overhead in the bonding PMD, which is
> > a generic PMD, just to gain in some specific scenarios with some
> > specific PMDs while it is bad for other scenarios/PMDs.
> > 
> > You are making some assumptions which are simply not valid.
> > Bonding is _not_ a generic PMD.
> 
> What?
> Should the bonding be good only for specific scenarios with specific
> PMDs?
> If not, it is generic.
> 
> > As discussed above, bonding
> > is somewhere between application and PMD.
> 
> Yes.
> 
> > The choices made for bonding were to make it easy to integrate
> > into an existing application without the application having to
> > know anything about bonding.
> 
> The bonding should be generic at least from the following
> perspectives:
> different PMDs,
> different application scenarios.
> 
> > 
> > Matan.

-- 
Kind regards,
Luca Boccassi
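A minimal sketch of the per-slave intermediate rx ring idea debated in the
thread: read a large burst from every slave into a per-slave ring, then hand
the application only the burst size it asked for, leaving the remainder
buffered for the next call. This is an illustration under assumptions, not
the actual patch code; the struct, burst sizes, and helper name are invented
for the example, and error handling (e.g. mbufs that do not fit in the ring)
is elided.

```c
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

#define SLAVE_READ_SZ 64 /* read a larger burst from each slave than the app asks for */

/* Hypothetical per-slave state; the ring would be created at slave-add time,
 * e.g. with rte_ring_create(name, size, socket, RING_F_SP_ENQ | RING_F_SC_DEQ). */
struct slave_rx_buf {
	uint16_t port_id;
	uint16_t queue_id;
	struct rte_ring *ring; /* buffered mbuf pointers for this slave */
};

/*
 * Ring-buffered bonding rx path: pull a big burst from every slave into its
 * ring, then give the application at most nb_pkts mbufs. The enqueue/dequeue
 * pair is the "2 more pointer copies" discussed above; leftovers stay in the
 * ring for the next call.
 */
static uint16_t
bond_rx_buffered(struct slave_rx_buf *slaves, uint16_t n_slaves,
		 struct rte_mbuf **bufs, uint16_t nb_pkts)
{
	struct rte_mbuf *pkts[SLAVE_READ_SZ];
	uint16_t i, got, total = 0;

	for (i = 0; i < n_slaves; i++) {
		got = rte_eth_rx_burst(slaves[i].port_id, slaves[i].queue_id,
				       pkts, SLAVE_READ_SZ);
		if (got > 0)
			/* A real implementation must free or retry mbufs that
			 * do not fit; this sketch ignores that case. */
			rte_ring_enqueue_burst(slaves[i].ring, (void **)pkts,
					       got, NULL);
	}

	for (i = 0; i < n_slaves && total < nb_pkts; i++)
		total += rte_ring_dequeue_burst(slaves[i].ring,
						(void **)&bufs[total],
						nb_pkts - total, NULL);

	return total;
}
```

Whether the extra enqueue/dequeue copies cost more than the larger slave
reads save is exactly the tradeoff argued back and forth in the thread.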
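A rough illustration, using current DPDK naming, of the mode 4 receive cost
Chas describes: every received mbuf is inspected again for a slow-protocols
(LACP) frame, and matches are removed from the burst array with an
overlapping memmove. The function name and the simple free are illustrative
stand-ins, not the real bonding mode 4 rx path, which hands the frame to the
LACP state machine instead of dropping it.

```c
#include <stdint.h>
#include <string.h>
#include <rte_byteorder.h>
#include <rte_ether.h>
#include <rte_mbuf.h>

#define SLOW_PROTOCOLS_ETHER_TYPE 0x8809 /* IEEE 802.3 slow protocols (LACP/marker) */

/*
 * Sketch of why each packet is "essentially read twice" in mode 4: after the
 * slave's rx burst, every Ethernet header is examined again, and any
 * slow-protocol frame is chopped out of the array with memmove.
 */
static uint16_t
filter_slow_packets(struct rte_mbuf **bufs, uint16_t nb_rx)
{
	uint16_t i = 0;

	while (i < nb_rx) {
		struct rte_ether_hdr *hdr =
			rte_pktmbuf_mtod(bufs[i], struct rte_ether_hdr *);

		if (hdr->ether_type ==
		    rte_cpu_to_be_16(SLOW_PROTOCOLS_ETHER_TYPE)) {
			/* The real driver passes this frame to the LACP state
			 * machine; the sketch just drops it, then closes the
			 * gap in the burst array (the regions overlap, hence
			 * memmove rather than memcpy). */
			rte_pktmbuf_free(bufs[i]);
			memmove(&bufs[i], &bufs[i + 1],
				sizeof(bufs[0]) * (nb_rx - i - 1));
			nb_rx--;
		} else {
			i++;
		}
	}

	return nb_rx;
}
```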