Date: Thu, 1 Oct 2015 13:38:37 +0300
From: "Michael S. Tsirkin"
To: Avi Kivity
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
Message-ID: <20151001131754-mutt-send-email-mst@redhat.com>
In-Reply-To: <560D0413.5080401@scylladb.com>

On Thu, Oct 01, 2015 at 12:59:47PM +0300, Avi Kivity wrote:
>
>
> On 10/01/2015 12:55 PM, Michael S. Tsirkin wrote:
> >On Thu, Oct 01, 2015 at 12:22:46PM +0300, Avi Kivity wrote:
> >>It's easy to claim that
> >>a solution is around the corner, only no one was looking for it, but the
> >>reality is that kernel bypass has been a solution for years for high
> >>performance users,
> >I never said that it's trivial.
> >
> >It's probably a lot of work. It's definitely more work than just abusing
> >sysfs.
> >
> >But it looks like a write system call into an eventfd is about 1.5
> >microseconds on my laptop. Even with a system call per packet, system
> >call overhead is not what makes DPDK drivers outperform Linux ones.
> >
>
> 1.5 us = 0.6 Mpps per core limit.

Oh, I calculated it incorrectly. It's 0.15 us. So 6 Mpps.
But for RX, you can batch a lot of packets.

You can see by now I'm not that good at benchmarking.
Here's what I wrote:

    #include <stdint.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int e = eventfd(0, 0);
        uint64_t v = 1;
        int i;

        for (i = 0; i < 10000000; ++i) {
            write(e, &v, sizeof v);
        }
    }

This takes 1.5 seconds to run on my laptop:

    $ time ./a.out

    real    0m1.507s
    user    0m0.179s
    sys     0m1.328s

> dpdk performance is in the tens of
> millions of packets per system.

I think that's with a bunch of batching though.

> It's not just the lack of system calls, of course, the architecture is
> completely different.

Absolutely - I'm not saying move all of DPDK into kernel.
We just need to protect the RX rings so hardware does
not corrupt kernel memory.

Thinking about it some more, many devices
have separate rings for DMA: TX (device reads memory)
and RX (device writes memory).
With such devices, a mode where userspace can write the TX ring
but not the RX ring might make sense.

This will mean userspace might read kernel memory
through the device, but cannot corrupt it.

That's already a big win!

And RX buffers do not have to be added one at a time.
If we assume 0.2 usec per system call, batching some 100 buffers per
system call gives you 2 nanoseconds of overhead per buffer.
That seems quite reasonable.

--
MST