DPDK usage discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: Pavel Vajarov <freakpv@gmail.com>
Cc: users@dpdk.org
Subject: Re: [dpdk-users] Peformance troubleshouting of TCP/IP stack over DPDK.
Date: Wed, 6 May 2020 07:54:56 -0700
Message-ID: <20200506075456.140625fb@hermes.lan>
In-Reply-To: <CAK9EM1_c_eicdL4zU7BKF6i5KRd02SSfJfE=0CFa8w2iMDfe=w@mail.gmail.com>

On Wed, 6 May 2020 08:14:20 +0300
Pavel Vajarov <freakpv@gmail.com> wrote:

> Hi there,
> We are trying to compare the performance of DPDK+FreeBSD networking stack
> vs standard Linux kernel and we have problems finding out why the former is
> slower. The details are below.
> There is a project called F-Stack <https://github.com/F-Stack/f-stack>.
> It glues the networking stack from
> FreeBSD 11.01 over DPDK. We made a setup to test the performance of
> transparent
> TCP proxy based on F-Stack and another one running on Standard Linux
> kernel.
> We did the tests on KVM with 2 cores (Intel(R) Xeon(R) Gold 6139 CPU @
> 2.30GHz)
> and 32GB RAM. A 10Gbps NIC was attached in passthrough mode.
> The application level code, the one which handles epoll notifications and
> memcpy data between the sockets, of both proxy applications is 100% the
> same. Both proxy applications are single threaded and in all tests we
> pinned the applications on core 1. The interrupts from the network card
> were pinned to the same core 1 for the test with the standard Linux
> application.
> Here are the test results:
> 1. The Linux based proxy was able to handle about 1.7-1.8 Gbps before it
> started to throttle the traffic. No visible CPU usage was observed on core
> 0 during the tests, only core 1, where the application and the IRQs were
> pinned, took the load.
> 2. The DPDK+FreeBSD proxy was able to handle 700-800 Mbps before it
> started to throttle the traffic. No visible CPU usage was observed on core
> 0 during the tests; only core 1, where the application was pinned, took the
> load. In some of the later tests I made some changes to the number of read
> packets in one call from the network card and the number of handled events
> in one call to epoll. With these changes I was able to increase the
> throughput
> to 900-1000 Mbps but couldn't increase it more.
> 3. We did another test with the DPDK+FreeBSD proxy just to give us some
> more info about the problem. We disabled the TCP proxy functionality and
> let the packets be simply ip forwarded by the FreeBSD stack. In this test
> we reached up to 5Gbps without being able to throttle the traffic. We just
> don't have more traffic to redirect there at the moment. So the bottleneck
> seems to be either in the upper level of the network stack or in the
> application
> code.
> There is a huawei switch which redirects the traffic to this server. It
> regularly
> sends arping and if the server doesn't respond it stops the redirection.
> So we assumed that when the redirection stops it's because the server
> throttles the traffic and drops packets and can't respond to the arping
> because
> of the packet drops.
> The whole application can be very roughly represented in the following way:
>  - Write pending outgoing packets to the network card
>  - Read incoming packets from the network card
>  - Push the incoming packets to the FreeBSD stack
>  - Call epoll_wait/kevent without waiting
>  - Handle the events
>  - loop from the beginning
> According to the performance profiling that we did, aside from packet
> processing,
>  about 25-30% of the application time seems to be spent in the
> epoll_wait/kevent
> even though the `timeout` parameter of this call is set to 0 i.e.
> it shouldn't block waiting for events if there are none.
> I can give you many more details and code for everything, if needed.
> My questions are:
> 1. Does somebody have observations or educated guesses about how much
> traffic I should expect the DPDK + FreeBSD stack + kevent to process in the
> above
> scenario? Are the numbers low or expected?
> We expected to see better performance than the standard Linux kernel one
> but
> so far we can't get this performance.
> 2. Do you think the difference comes from the time spent handling
> packets
> and handling epoll in the two tests? Here is what I mean. For the standard
> Linux tests
> interrupt handling has higher priority than the epoll handling and
> thus the application
> can spend much more time handling packets and processing them in the kernel
> than
> handling epoll events in the user space. For the DPDK+FreeBSD case the time
> for
> handling packets and the time for processing epolls is kind of equal. I
> think that this was
> the reason why we were able to get more performance by increasing the number
> of read
> packets at one go and decreasing the epoll events. However, we couldn't
> increase the
> throughput enough with these tweaks.
> 3. Can you suggest something else that we can test/measure/profile to get
> a better idea of
> what exactly is happening here and to improve the performance more?
> Any help is appreciated!
> Thanks in advance,
> Pavel.

First off, if you are testing on KVM, are you using PCI passthrough or SR-IOV
to make the device available to the guest directly? The default mode uses
a Linux bridge, which results in multiple copies and context switches.
You end up testing Linux bridge and virtio performance, not TCP.
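
One way to check this from inside the guest on the Linux side is to look at
which kernel driver is bound to the interface. The following is only a
sketch: the helper names are made up for illustration, and it assumes the
usual sysfs layout where /sys/class/net/<iface>/device/driver is a symlink
to the driver. A port that is already bound to DPDK has no kernel netdev,
so it will not show up here at all.

```python
import os

# Assumption: on KVM the paravirtual NIC shows up with the virtio_net driver.
PARAVIRT_DRIVERS = {"virtio_net"}


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if unknown."""
    link = os.path.join(sysfs_root, iface, "device", "driver")
    try:
        # The symlink target ends with the driver name, e.g. .../virtio_net
        return os.path.basename(os.readlink(link))
    except OSError:
        # Virtual interfaces such as lo have no device/driver entry.
        return None


def looks_paravirtual(driver):
    """True if the driver name suggests a virtio device, not passthrough."""
    return driver in PARAVIRT_DRIVERS


if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    for iface in sorted(os.listdir("/sys/class/net")):
        drv = nic_driver(iface)
        note = " (paravirtual, not passthrough)" if looks_paravirtual(drv) else ""
        print(f"{iface}: {drv}{note}")
```

If the interface under test reports virtio_net, the numbers mostly reflect
the bridge and virtio path rather than the TCP stack.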

To get full speed with TCP and most software stacks you need TCP segmentation
offload.

Also, the software queue discipline, kernel version, and TCP congestion control
can play a big role in your results.
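
For the Linux side of the comparison, these settings can be recorded next to
each measurement. The sketch below is illustrative only: the function names
are invented, it parses the text printed by `ethtool -k <iface>`, and it
reads the congestion control name from
/proc/sys/net/ipv4/tcp_congestion_control. The queue discipline can be
listed separately with `tc qdisc show dev <iface>`. None of this sees a port
that is bound to DPDK.

```python
import subprocess
import sys


def parse_offloads(ethtool_output):
    """Parse `ethtool -k <iface>` text into a {feature: enabled} dict."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        words = value.split()
        # Values look like "on", "off" or "off [fixed]"; skip the header line.
        if words and words[0] in ("on", "off"):
            features[name.strip()] = words[0] == "on"
    return features


def congestion_control(path="/proc/sys/net/ipv4/tcp_congestion_control"):
    """Return the active TCP congestion control name, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__" and len(sys.argv) == 2:
    iface = sys.argv[1]
    try:
        out = subprocess.run(["ethtool", "-k", iface], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        out = ""  # ethtool missing or interface unknown
    feats = parse_offloads(out)
    for key in ("tcp-segmentation-offload", "generic-segmentation-offload",
                "generic-receive-offload"):
        print(f"{key}: {feats.get(key)}")
    print(f"congestion control: {congestion_control()}")
```

Run it with the interface name as the only argument and keep the output
with the throughput numbers, so that two runs can be compared on equal
terms.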
