From: dave seddon <dave.seddon.ca@gmail.com>
To: Pavel Vajarov <freakpv@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>, users <users@dpdk.org>
Subject: Re: [dpdk-users] Performance troubleshooting of TCP/IP stack over DPDK.
Date: Thu, 7 May 2020 07:09:44 -0700
Message-ID: <CANypexQRVdiMC4as0Y0kXVkS0uiQ=8MMb0be5j9UOMH8-_gAdQ@mail.gmail.com>
In-Reply-To: <CAK9EM1-EZG7KC79SNsYM5D=wFz1DD6hPAWtLrN2jpfmGNwsPmw@mail.gmail.com>
tc qdisc
https://linux.die.net/man/8/tc
On Thu, May 7, 2020 at 3:47 AM Pavel Vajarov <freakpv@gmail.com> wrote:
> On Wed, May 6, 2020 at 5:55 PM Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> > On Wed, 6 May 2020 08:14:20 +0300
> > Pavel Vajarov <freakpv@gmail.com> wrote:
> >
> > > Hi there,
> > >
> > > We are trying to compare the performance of DPDK+FreeBSD networking stack
> > > vs standard Linux kernel and we have problems finding out why the former is
> > > slower. The details are below.
> > >
> > > There is a project called F-Stack <https://github.com/F-Stack/f-stack>.
> > > It glues the networking stack from FreeBSD 11.01 over DPDK. We made a
> > > setup to test the performance of a transparent TCP proxy based on F-Stack
> > > and another one running on the standard Linux kernel.
> > > We did the tests on KVM with 2 cores (Intel(R) Xeon(R) Gold 6139 CPU @
> > > 2.30GHz) and 32GB RAM. A 10 Gbps NIC was attached in passthrough mode.
> > > The application-level code of both proxy applications (the code which
> > > handles epoll notifications and copies data between the sockets with
> > > memcpy) is 100% the same. Both proxy applications are single-threaded, and
> > > in all tests we pinned the applications on core 1. For the test with the
> > > standard Linux application, the interrupts from the network card were
> > > pinned to the same core 1.
> > >
> > > Here are the test results:
> > > 1. The Linux-based proxy was able to handle about 1.7-1.8 Gbps before it
> > > started to throttle the traffic. No visible CPU usage was observed on core
> > > 0 during the tests; only core 1, where the application and the IRQs were
> > > pinned, took the load.
> > > 2. The DPDK+FreeBSD proxy was able to handle 700-800 Mbps before it
> > > started to throttle the traffic. No visible CPU usage was observed on core
> > > 0 during the tests; only core 1, where the application was pinned, took
> > > the load. In some of the later tests I changed the number of packets read
> > > from the network card in one call and the number of events handled in one
> > > call to epoll. With these changes I was able to increase the throughput to
> > > 900-1000 Mbps, but couldn't increase it further.
> > > 3. We did another test with the DPDK+FreeBSD proxy just to give us some
> > > more info about the problem. We disabled the TCP proxy functionality and
> > > let the packets simply be IP-forwarded by the FreeBSD stack. In this test
> > > we reached up to 5 Gbps without being able to throttle the traffic; we
> > > just don't have more traffic to redirect there at the moment. So the
> > > bottleneck seems to be either in the upper levels of the network stack or
> > > in the application code.
> > >
> > > There is a Huawei switch which redirects the traffic to this server. It
> > > regularly sends arping, and if the server doesn't respond it stops the
> > > redirection. So we assumed that when the redirection stops, it's because
> > > the server throttles the traffic and drops packets, and can't respond to
> > > the arping because of the packet drops.
> > >
> > > The whole application can be very roughly represented in the following way:
> > > - Write pending outgoing packets to the network card
> > > - Read incoming packets from the network card
> > > - Push the incoming packets to the FreeBSD stack
> > > - Call epoll_wait/kevent without waiting
> > > - Handle the events
> > > - loop from the beginning
> > > According to the performance profiling that we did, aside from packet
> > > processing, about 25-30% of the application time seems to be spent in
> > > epoll_wait/kevent, even though the `timeout` parameter of this call is set
> > > to 0, i.e. it shouldn't block waiting for events if there are none.
> > >
> > > I can give you much more details and code for everything, if needed.
> > >
> > > My questions are:
> > > 1. Does somebody have observations or educated guesses about what amount
> > > of traffic I should expect the DPDK + FreeBSD stack + kevent to process in
> > > the above scenario? Are the numbers low or expected?
> > > We expected to see better performance than with the standard Linux kernel,
> > > but so far we can't get this performance.
> > > 2. Do you think the difference comes from how the time is split between
> > > handling packets and handling epoll in the two tests? What I mean is: in
> > > the standard Linux test, interrupt handling has higher priority than epoll
> > > handling, so the application can spend much more time handling packets and
> > > processing them in the kernel than handling epoll events in user space. In
> > > the DPDK+FreeBSD case, the time for handling packets and the time for
> > > processing epoll are roughly equal. I think this is why we were able to get
> > > more performance by increasing the number of packets read in one go and
> > > decreasing the epoll events. However, we couldn't increase the throughput
> > > enough with these tweaks.
> > > 3. Can you suggest something else that we can test/measure/profile to get
> > > a better idea of what exactly is happening here and to improve the
> > > performance further?
> > >
> > > Any help is appreciated!
> > >
> > > Thanks in advance,
> > > Pavel.
> >
> > First off, if you are testing on KVM, are you using PCI passthrough or
> > SR-IOV to make the device available to the guest directly? The default mode
> > uses a Linux bridge, and this results in multiple copies and context
> > switches. You end up testing Linux bridge and virtio performance, not TCP.
> >
> > To get full speed with TCP and most software stacks you need TCP
> > segmentation offload.
> >
> > Also, the software queue discipline, kernel version, and TCP congestion
> > control can play a big role in your result.
> >
>
> Hi,
>
> Thanks for the response.
>
> We did the tests on Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-96-generic x86_64).
> The NIC was given to the guest using SR-IOV.
> The TCP segmentation offload was enabled for both tests (standard Linux and
> DPDK+FreeBSD).
> The congestion control algorithm for both tests was 'cubic'.
>
> What do you mean by 'software queue discipline'?
>
> Regards,
> Pavel.
>
--
Regards,
Dave Seddon
+1 415 857 5102