From: Matan Azrad <matan@mellanox.com>
To: Chas Williams <3chas3@gmail.com>
Cc: Eric Kinzie <ehkinzie@gmail.com>,
"bluca@debian.org" <bluca@debian.org>,
"dev@dpdk.org" <dev@dpdk.org>,
Declan Doherty <declan.doherty@intel.com>,
Chas Williams <chas3@att.com>,
"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH v4] net/bonding: per-slave intermediate rx ring
Date: Wed, 29 Aug 2018 15:20:13 +0000 [thread overview]
Message-ID: <AM0PR0502MB4019B1FA822A85132E983041D2090@AM0PR0502MB4019.eurprd05.prod.outlook.com> (raw)
In-Reply-To: <CAG2-Gk=FkBiLNTWkiaERkrs2CY3ko-t7f09_cJtKcQSd5K_mTQ@mail.gmail.com>
From: Chas Williams
>On Tue, Aug 28, 2018 at 5:51 AM Matan Azrad <matan@mellanox.com> wrote:
>
>
>From: Chas Williams
>>On Mon, Aug 27, 2018 at 11:30 AM Matan Azrad <matan@mellanox.com> wrote:
><snip>
>>>>Because rings are generally quite efficient.
>>>
>>>But you are using a ring in addition to the regular array management; it must hurt the performance of the bonding PMD
>>>(meaning the bonding itself, not the slave PMDs which are called from the bonding).
>>
>>It adds latency.
>
>And by that hurts the application performance because it takes more CPU time in the bonding PMD.
>
>No, as I said before it takes _less_ CPU time in the bonding PMD
>because we use a more optimal read from the slaves.
Each packet pointer must be copied two more times because of this patch, plus some management (the ring overhead),
so in the bonding code itself you lose performance.
>
>>It increases performance because we spend less CPU time reading from the PMDs.
>
>So it's a hack in the bonding PMD to improve the performance of some slave PMDs while hurting the performance of the bonding code itself.
>Overall, the gain for those slaves improves the application performance only when working with those slaves,
>but it may hurt the application performance when working with other slaves.
>
>What is your evidence that is hurts bonding performance? Your
>argument is purely theoretical.
Yes, we cannot test all the scenarios across all the PMDs.
> I could easily argue that even for non-vectorized PMDs there is a performance gain because we
>spend less time switching between PMDs.
But you spend more time in the bonding part.
> If you are going to read from a PMD you should attempt to read as much as possible. It's
>expensive to read the cards registers and perform the queue
>manipulations.
You do that anyway.
The context switching is expensive, but so are the extra copies per packet and the ring management.
We have a tradeoff here that may affect other scenarios differently.
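To make the per-packet cost concrete, here is a minimal sketch of what an intermediate rx ring adds on the receive path. This is plain C with illustrative names, not the actual patch (which uses `struct rte_mbuf` and `rte_ring`): each packet pointer is stored into the ring once by the producer side and loaded back out on the next rx burst, i.e. the two extra pointer copies plus head/tail bookkeeping being discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a packet buffer; in DPDK this would be
 * struct rte_mbuf, and the ring would be an rte_ring. */
typedef struct pkt { int id; } pkt;

#define RING_SIZE 16 /* power of two, so index wrap is a mask */

struct pkt_ring {
    pkt *slots[RING_SIZE];
    unsigned head; /* next slot to write */
    unsigned tail; /* next slot to read */
};

/* Extra copy #1: pointers read from the slave are stored in the ring. */
static unsigned ring_enqueue(struct pkt_ring *r, pkt **pkts, unsigned n)
{
    unsigned i, free_slots = RING_SIZE - (r->head - r->tail);
    if (n > free_slots)
        n = free_slots;
    for (i = 0; i < n; i++)
        r->slots[(r->head + i) & (RING_SIZE - 1)] = pkts[i];
    r->head += n;
    return n;
}

/* Extra copy #2: the bonding rx burst loads them back out for the app. */
static unsigned ring_dequeue(struct pkt_ring *r, pkt **pkts, unsigned n)
{
    unsigned i, avail = r->head - r->tail;
    if (n > avail)
        n = avail;
    for (i = 0; i < n; i++)
        pkts[i] = r->slots[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += n;
    return n;
}
```

Each mbuf pointer crosses the ring twice (one store, one load) on top of the index management; that per-packet cost is what is being weighed against making fewer, larger rx burst calls into the slaves.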
>
>> This means we have more CPU to use for
>>post processing (i.e. routing).
>
>>>>Bonding is in a middle ground between application and PMD.
>>>Yes.
>>>>What bonding is doing, may not improve all applications.
>>>Yes, but it can be solved using some bonding modes.
>>>> If using a ring to buffer the vectorized receive routines, improves your particular application,
>>>>that's great.
>>>It may not be great, and may even be bad for some other PMDs which are not vectorized.
>>>
>>>> However, I don't think I can say that it would help all
>>>>applications. As you point out, there is overhead associated with
>>>>a ring.
>>>Yes.
>>>>Bonding's receive burst isn't especially efficient (in mode 4).
>>>
>>>Why?
>>>
>>>It makes a copy of the slaves, has a fair bit of stack usage,
>>>needs to check the slave status, and needs to examine each
>>>packet to see if it is a slow protocol packet. So each
>>>packet is essentially read twice. The fast queue code for mode 4
>>>avoids some of this (and probably ignores checking collecting
>>>incorrectly). If you find a slow protocol packet, you need to
>>>chop it out of the array with memmove due to overlap.
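The memmove cost above can be sketched as follows. This is a simplified plain-C illustration of the pattern, not the actual bonding code (which operates on `struct rte_mbuf*` arrays and hands the LACP frame to the control path instead of just dropping it); the slow-protocols Ethernet type is 0x8809.

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for struct rte_mbuf. */
typedef struct pkt { unsigned short ether_type; } pkt;

#define ETHER_TYPE_SLOW 0x8809

/* Scan the burst; when a slow-protocol frame is found, remove it from
 * the array and close the gap with memmove - source and destination
 * overlap, so memcpy is not an option. Returns the new burst size. */
static unsigned filter_slow_pkts(pkt **pkts, unsigned n)
{
    unsigned i = 0;
    while (i < n) {
        if (pkts[i]->ether_type == ETHER_TYPE_SLOW) {
            memmove(&pkts[i], &pkts[i + 1],
                    (n - i - 1) * sizeof(pkt *));
            n--; /* don't advance i: the next packet slid into slot i */
        } else {
            i++;
        }
    }
    return n;
}
```

Note that every packet gets examined here on top of the read already done inside the slave PMD, which is why each packet is essentially read twice.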
>>
>>Agree.
>>So to improve the bonding performance you need to fix the above problems.
>>There is no connection to the ring.
>>
>>And as I have described numerous times, these problems
>>can't be easily fixed and preserve the existing API.
>
>Sometimes we need to work harder to see a gain for all.
>We should not apply a patch because it is easy and show a gain for specific scenarios.
>
>>>> Bonding benefits from being able to read as much as possible (within limits of
>>>>course, large reads would blow out caches) from each slave.
>>>
>>>The slave PMDs can benefit in the same way.
>>>
>>>>It can't return all that data though because applications tend to use the
>>>>burst size that would be efficient for a typical PMD.
>>>
>>>What is the preferred burst size of the bonding? Maybe the application should use it when using bonding.
>>>
>>>The preferred burst size for bonding would be the sum of all the
>>>slaves ideal read size. However, that's not likely to be simple
>>>since most applications decide early the size for the read/write
>>>burst operations.
>>>
>>>>An alternative might be to ask bonding applications to simply issue larger reads for
>>>>certain modes. That's probably not as easy as it sounds given the
>>>>way that the burst length effects multiplexing.
>>>
>>>Can you explain it more?
>>>
>>>A longer burst size on one PMD will tend to favor that PMD
>>>over others. It will fill your internal queues with more
>>>of its packets.
>>
>>Agree, it's about fairness.
>>
>>>
>>>>Another solution might be just alternatively poll the individual
>>>>slaves on each rx burst. But that means you need to poll at a
>>>>faster rate. Depending on your application, you might not be
>>>>able to do that.
>>
>>>Again, can you be more precise in the above explanation?
>>>
>>>If the application knows that there are two slaves backing
>>>a bonding interface, the application could just read twice
>>>from the bonding interface, knowing that the bonding
>>>interface is going to alternate between the slaves. But
>>>this requires the application to know things about the bonding
>>>PMD, like the number of slaves.
>>
>>Why should the application poll twice?
>>Poll slave 0, then process its packets; poll slave 1, then process its packets...
>>What is the problem?
>>
>>Because let's say that each slave is 10G and you are using
>>link aggregation with two slaves. If you have tuned your
>>application on the assumption that a PMD is approximately
>>10G, then you are going to be under-polling the bonding PMD.
>>For the above to work, you need to ensure that the application
>>is polling sufficiently to keep up.
>
>But each poll will be shorter: no loop over the slaves, only a single slave call.
>
>But you still need to poll the bonding PMD N times as
>fast where N is the number of slaves. The "scheduler"
>in the application may not be aware of that. That was
>(apparently) the point of the way the current bonding
>PMD. It hides everything from the application.
Sorry, I'm not sure I understand you here.
Here, too, we have a tradeoff:
reading a full burst from each slave, plus less bonding code per burst,
against
more bonding calls (it is not an N-to-1 relation; it depends on the load and the burst size).
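The alternating-poll scheme under discussion can be sketched as follows. This is a hypothetical illustration (names and structure are mine, not the bonding PMD's): if each bonding rx burst serves only one slave via a round-robin cursor, the application has to call the bonding port once per slave to drain every slave once, which is the "poll N times as fast" scheduling requirement Chas describes.

```c
#include <assert.h>

#define MAX_BURST 32

/* Hypothetical per-slave state standing in for a real PMD rx queue. */
struct slave { int backlog; /* packets waiting on this slave */ };

/* One bonding rx burst that serves a single slave per call and then
 * advances a round-robin cursor. Each call is short (no loop over
 * slaves), but covering all slaves takes num_slaves calls. */
static int rx_burst_one_slave(struct slave *slaves, int num_slaves,
                              int *cursor, int burst)
{
    struct slave *s = &slaves[*cursor];
    int got = s->backlog < burst ? s->backlog : burst;

    s->backlog -= got;                       /* "receive" the packets */
    *cursor = (*cursor + 1) % num_slaves;    /* next slave next time */
    return got;
}
```

The tradeoff in the text is visible here: each call does less work, but an application scheduler tuned for one rx call per port per cycle would leave the non-current slaves unread.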
>>>> We can avoid this scheduling overhead by just
>>>>doing the extra reads in bonding and buffering in a ring.
>>>>
>>>>Since bonding is going to be buffering everything in a ring,
>>>
>>>I'm not familiar with it. For now I don't think we need a ring.
>>>
>>>We have some empirical testing that shows performance improvements.
>>>How do you explain this?
>>
>>You are using a hack in bonding which hurts the bonding but improves the vectorized PMD you use.
>>
>>>Rings aren't inefficient.
>>
>>Only when there is a need to use it.
>>
>>I believe we have identified a need.
>>
>>
>>>There's significant overhead of calling the rx burst routine on any PMD.
>>>You need to get the PMD data structures into local memory.
>>
>>Yes.
>>
>>>Reading as much as possible makes sense.
>>
>>Agree.
>>
>>> Could you generally do this for all PMDs? Yes, but the ring adds latency. Latency
>>>that isn't an issue for the bonding PMD because of all the
>>>other inefficiencies (for mode 4).
>>
>>Enlarging the bonding latency this way makes some vectorized slave PMDs happy but makes the bonding worse
>>for other slave PMDs - this is not a good idea to put upstream.
>>
>>Improving the other problems in the bonding (reducing the bonding latency) will do the job for you and for others.
>>
>>Latency is not the issue with bonding. The issue with bonding
>>is the overhead associated with making an rx burst call. We
>>add latency (via rings) to make part of the bonding driver more
>>efficient.
>>
>>Again, I suspect it even helps the non-vectorized PMDs.
>>Calling a PMD's rx burst routine is costly and we are switching between
>>PMD's inside the bonding driver. Bonding is halfway between an
>>application and a PMD. What we are doing in the bonding PMD is
>>what an application would typically do. But instead of forcing
>>all the bonding users to do this, we are doing it for them.
>
>I do not agree.
>The overhead in bonding may be critical for some other PMDs/scenarios.
>
>The overhead in bonding is undesirable. It's not easy to address
>because of the initial design goals.
>
>
>To summarize:
>We do not agree.
>
>Bottom line on my side:
>It is not a good idea to add overhead in the bonding PMD, which is a generic PMD, just to get a gain for some specific scenarios in some specific PMDs while for other scenarios/PMDs it is bad.
>
>You are making some assumptions which are simply not valid.
>Bonding is _not_ a generic PMD.
What?
Should the bonding be good only for specific scenarios with specific PMDs?
If not, it is generic.
> As discussed above, bonding
>is somewhere between application and PMD.
Yes.
> The choices made
>for bonding were to make it easy to integrate into an existing
>application without the application having to know anything about
>bonding.
The bonding should be generic at least from the following perspectives:
Different PMDs.
Different application scenarios.
>
>Matan.
>
>
>
>
Thread overview: 28+ messages
2018-08-15 15:46 [dpdk-dev] [PATCH] " Luca Boccassi
2018-08-15 16:06 ` [dpdk-dev] [PATCH v2] " Luca Boccassi
2018-08-16 12:52 ` [dpdk-dev] [PATCH v3] " Luca Boccassi
2018-08-16 13:32 ` [dpdk-dev] [PATCH v4] " Luca Boccassi
2018-08-20 14:11 ` Chas Williams
2018-08-21 10:56 ` Matan Azrad
2018-08-21 11:13 ` Luca Boccassi
2018-08-21 14:58 ` Chas Williams
2018-08-21 15:43 ` Matan Azrad
2018-08-21 18:19 ` Chas Williams
2018-08-22 7:09 ` Matan Azrad
2018-08-22 10:19 ` [dpdk-dev] [dpdk-stable] " Luca Boccassi
2018-08-22 11:42 ` Matan Azrad
2018-08-22 17:43 ` Eric Kinzie
2018-08-23 7:28 ` Matan Azrad
2018-08-23 15:51 ` Chas Williams
2018-08-26 7:40 ` Matan Azrad
2018-08-27 13:22 ` Chas Williams
2018-08-27 15:30 ` Matan Azrad
2018-08-27 15:51 ` Chas Williams
2018-08-28 9:51 ` Matan Azrad
2018-08-29 14:30 ` Chas Williams
2018-08-29 15:20 ` Matan Azrad [this message]
2018-08-31 16:01 ` Luca Boccassi
2018-09-02 11:34 ` Matan Azrad
2018-09-09 20:57 ` Chas Williams
2018-09-12 5:38 ` Matan Azrad
2018-09-19 18:09 ` [dpdk-dev] " Luca Boccassi