From: Pavel Vajarov
Date: Fri, 8 May 2020 08:03:57 +0300
To: Stephen Hemminger
Cc: dave seddon, users
Subject: Re: [dpdk-users] Performance troubleshooting of TCP/IP stack over DPDK

Thanks for the response. F-Stack has a TSO option in its config file, which we turned on for the tests. I'll check fd.io.
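
For reference, here is a minimal sketch (assuming DPDK 19.11 or newer and port 0; not taken from the proxy discussed in this thread) of how the PMD's advertised TX offloads can be checked for TCP TSO with plain ethdev calls, independent of the F-Stack config switch:

#include <stdio.h>
#include <rte_ethdev.h>

static void check_port_tso(uint16_t port_id)
{
    struct rte_eth_dev_info dev_info;

    /* Ask the PMD which TX offloads this port supports. */
    if (rte_eth_dev_info_get(port_id, &dev_info) != 0) {
        printf("port %u: rte_eth_dev_info_get failed\n", port_id);
        return;
    }

    if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_TCP_TSO)
        printf("port %u: TCP TSO supported by the PMD\n", port_id);
    else
        printf("port %u: no TCP TSO, segmentation falls back to software\n",
               port_id);
}

If the VF does not advertise DEV_TX_OFFLOAD_TCP_TSO, the config-file switch alone cannot help.
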
On Thu, May 7, 2020 at 11:31 PM Stephen Hemminger <stephen@networkplumber.org> wrote:

> On Thu, 7 May 2020 07:09:44 -0700
> dave seddon wrote:
>
> > tc qdisc
> > https://linux.die.net/man/8/tc
> >
> > On Thu, May 7, 2020 at 3:47 AM Pavel Vajarov wrote:
> >
> > > On Wed, May 6, 2020 at 5:55 PM Stephen Hemminger <stephen@networkplumber.org> wrote:
> > >
> > > > On Wed, 6 May 2020 08:14:20 +0300
> > > > Pavel Vajarov wrote:
> > > >
> > > > > Hi there,
> > > > >
> > > > > We are trying to compare the performance of the DPDK+FreeBSD networking
> > > > > stack against the standard Linux kernel, and we are having trouble
> > > > > finding out why the former is slower. The details are below.
> > > > >
> > > > > There is a project called F-Stack <https://github.com/F-Stack/f-stack>.
> > > > > It glues the networking stack from FreeBSD 11.01 on top of DPDK. We made
> > > > > a setup to test the performance of a transparent TCP proxy based on
> > > > > F-Stack against the same proxy running on the standard Linux kernel.
> > > > > We ran the tests on KVM with 2 cores (Intel(R) Xeon(R) Gold 6139 CPU @
> > > > > 2.30GHz) and 32 GB RAM. A 10 Gbps NIC was attached in passthrough mode.
> > > > > The application-level code, the part that handles epoll notifications
> > > > > and memcpys data between the sockets, is 100% the same in both proxy
> > > > > applications. Both proxies are single threaded and in all tests we
> > > > > pinned the applications to core 1. For the test with the standard Linux
> > > > > application, the interrupts from the network card were pinned to the
> > > > > same core 1.
> > > > >
> > > > > Here are the test results:
> > > > > 1. The Linux-based proxy was able to handle about 1.7-1.8 Gbps before it
> > > > > started to throttle the traffic. No visible CPU usage was observed on
> > > > > core 0 during the tests; only core 1, where the application and the IRQs
> > > > > were pinned, took the load.
> > > > > 2. The DPDK+FreeBSD proxy was able to handle 700-800 Mbps before it
> > > > > started to throttle the traffic. No visible CPU usage was observed on
> > > > > core 0 during the tests; only core 1, where the application was pinned,
> > > > > took the load. In some of the later tests I changed the number of
> > > > > packets read from the network card in one call and the number of events
> > > > > handled in one call to epoll. With these changes I was able to increase
> > > > > the throughput to 900-1000 Mbps, but couldn't increase it further.
> > > > > 3. We did another test with the DPDK+FreeBSD proxy just to give us some
> > > > > more information about the problem. We disabled the TCP proxy
> > > > > functionality and let the packets simply be IP-forwarded by the FreeBSD
> > > > > stack. In this test we reached up to 5 Gbps without being able to
> > > > > throttle the traffic; we just don't have more traffic to redirect there
> > > > > at the moment. So the bottleneck seems to be either in the upper levels
> > > > > of the network stack or in the application code.
> > > > >
> > > > > There is a Huawei switch which redirects the traffic to this server. It
> > > > > regularly sends arping and if the server doesn't respond it stops the
> > > > > redirection.
> > > > > So we assumed that when the redirection stops, it's because the server
> > > > > throttles the traffic and drops packets, and can't respond to the arping
> > > > > because of the packet drops.
> > > > >
> > > > > The whole application can be very roughly represented in the following
> > > > > way:
> > > > > - Write pending outgoing packets to the network card
> > > > > - Read incoming packets from the network card
> > > > > - Push the incoming packets to the FreeBSD stack
> > > > > - Call epoll_wait/kevent without waiting
> > > > > - Handle the events
> > > > > - Loop from the beginning
> > > > > According to the performance profiling that we did, aside from packet
> > > > > processing, about 25-30% of the application time seems to be spent in
> > > > > epoll_wait/kevent, even though the `timeout` parameter of this call is
> > > > > set to 0, i.e. it shouldn't block waiting for events if there are none.
> > > > >
> > > > > I can give you much more detail and code for everything, if needed.
> > > > >
> > > > > My questions are:
> > > > > 1. Does somebody have observations or educated guesses about what amount
> > > > > of traffic I should expect the DPDK + FreeBSD stack + kevent to process
> > > > > in the above scenario? Are the numbers low or expected? We expected to
> > > > > see better performance than the standard Linux kernel one, but so far we
> > > > > can't get it.
> > > > > 2. Do you think the difference comes from the time spent handling
> > > > > packets versus handling epoll in the two tests? What I mean is: in the
> > > > > standard Linux tests the interrupt handling has higher priority than the
> > > > > epoll handling, so the application can spend much more time handling and
> > > > > processing packets in the kernel than handling epoll events in user
> > > > > space. In the DPDK+FreeBSD case the time for handling packets and the
> > > > > time for processing epoll events is roughly equal. I think that this is
> > > > > why we were able to get more performance by increasing the number of
> > > > > packets read in one go and decreasing the number of epoll events handled
> > > > > per call. However, we couldn't increase the throughput enough with these
> > > > > tweaks.
> > > > > 3. Can you suggest something else that we can test/measure/profile to
> > > > > get a better idea of what exactly is happening here and to improve the
> > > > > performance further?
> > > > >
> > > > > Any help is appreciated!
> > > > >
> > > > > Thanks in advance,
> > > > > Pavel.
> > > >
> > > > First off, if you are testing on KVM, are you using PCI passthrough or
> > > > SR-IOV to make the device available to the guest directly? The default
> > > > mode uses a Linux bridge, and this results in multiple copies and context
> > > > switches. You end up testing Linux bridge and virtio performance, not TCP.
> > > >
> > > > To get full speed with TCP and most software stacks you need TCP
> > > > segmentation offload.
> > > >
> > > > Also, the software queue discipline, kernel version, and TCP congestion
> > > > control can have a big role in your result.
> > >
> > > Hi,
> > >
> > > Thanks for the response.
> > >
> > > We did the tests on Ubuntu 18.04.4 LTS (GNU/Linux 4.15.0-96-generic
> > > x86_64).
> > > The NIC was given to the guest using SR-IOV.
> > > TCP segmentation offload was enabled for both tests (standard Linux and
> > > DPDK+FreeBSD).
> > > The congestion control algorithm for both tests was 'cubic'.
> > >
> > > What do you mean by 'software queue discipline'?
>
> The default qdisc in Ubuntu should be fq_codel (see `tc qdisc show`),
> and that in general has a positive effect on reducing bufferbloat.
>
> F-stack probably doesn't use TSO; you might want to look at the TCP stack
> from FD.io for comparison.
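
The loop described in the quoted message (write pending packets, read a burst, push it into the stack, poll with a zero timeout, handle events, repeat) is a busy-poll pattern. Below is a minimal, self-contained sketch of that pattern using plain Linux epoll for illustration; the nic_*/handle_event helpers are hypothetical placeholders and F-Stack's own ff_* wrappers are not used here. The comment at step 4 is the relevant point: even with timeout = 0, the poll call runs once per loop iteration, which is consistent with it accounting for a sizable share of CPU samples in a tight loop.

#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 256

/* Hypothetical placeholders for the NIC/stack plumbing described above. */
static void nic_tx_flush(void) { /* write pending outgoing packets */ }
static void nic_rx_and_stack_input(void) { /* read packets, push to the stack */ }
static void handle_event(struct epoll_event *ev) { (void)ev; /* proxy logic */ }

static void proxy_loop(int epfd)
{
    struct epoll_event events[MAX_EVENTS];

    for (;;) {
        nic_tx_flush();            /* 1. write pending outgoing packets */
        nic_rx_and_stack_input();  /* 2-3. read packets, feed the stack */

        /*
         * 4. Poll for socket events without blocking (timeout = 0).
         * Even with a zero timeout this is one full call per loop
         * iteration, so in a tight busy-poll loop it can legitimately
         * account for a noticeable share of the CPU samples.
         */
        int n = epoll_wait(epfd, events, MAX_EVENTS, 0);
        if (n < 0) {
            perror("epoll_wait");
            continue;
        }
        for (int i = 0; i < n; i++)
            handle_event(&events[i]);  /* 5. handle the events */
        /* 6. loop from the beginning */
    }
}

int main(void)
{
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        perror("epoll_create1");
        return 1;
    }
    proxy_loop(epfd);  /* never returns */
    return 0;
}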
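
On the core pinning mentioned in the thread: for the plain-Linux proxy this is typically done with pthread_setaffinity_np (with IRQ affinity set separately via /proc/irq/<n>/smp_affinity), while on the DPDK/F-Stack side the lcore mask normally takes care of it. A small illustrative sketch of the Linux-side pinning, with core 1 hard-coded as in the tests:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Pin the calling thread to one CPU core. */
static int pin_to_core(int core)
{
    cpu_set_t set;
    int err;

    CPU_ZERO(&set);
    CPU_SET(core, &set);

    err = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (err != 0) {
        fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(err));
        return -1;
    }
    return 0;
}

int main(void)
{
    if (pin_to_core(1) == 0)
        printf("pinned to core 1\n");
    /* ... run the proxy's event loop here ... */
    return 0;
}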