From: Stephen Hemminger
To: "Richardson, Bruce"
Cc: "dev@dpdk.org"
Date: Fri, 11 Jul 2014 08:16:23 -0700
Subject: Re: [dpdk-dev] [PATCH] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices
Message-ID: <20140711081623.4c026199@samsung-9>
In-Reply-To: <59AF69C657FD0841A61C55336867B5B0343ACD8B@IRSMSX103.ger.corp.intel.com>
References: <1405024369-30058-1-git-send-email-linville@tuxdriver.com>
 <20140711061147.06c12136@samsung-9>
 <20140711144912.GA25478@tuxdriver.com>
 <59AF69C657FD0841A61C55336867B5B0343ACD8B@IRSMSX103.ger.corp.intel.com>

On Fri, 11 Jul 2014 15:06:25 +0000
"Richardson, Bruce" wrote:

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of John W. Linville
> > Sent: Friday, July 11, 2014 7:49 AM
> > To: Stephen Hemminger
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] librte_pmd_packet: add PMD for AF_PACKET-based virtual devices
> >
> > On Fri, Jul 11, 2014 at 06:11:47AM -0700, Stephen Hemminger wrote:
> > > On Thu, 10 Jul 2014 16:32:49 -0400
> > > "John W. Linville" wrote:
> > >
> > > > This is a Linux-specific virtual PMD driver backed by an AF_PACKET
> > > > socket. This implementation uses mmap'ed ring buffers to limit copying
> > > > and user/kernel transitions. The PACKET_FANOUT_HASH behavior of
> > > > AF_PACKET is used for frame reception. In the current implementation,
> > > > Tx and Rx queues are always paired, and therefore are always equal
> > > > in number -- changing this would be a Simple Matter Of Programming.
> > > >
> > > > Interfaces of this type are created with a command line option like
> > > > "--vdev=eth_packet0,iface=...". There are a number of options available
> > > > as arguments:
> > > >
> > > > - Interface is chosen by "iface" (required)
> > > > - Number of queue pairs set by "qpairs" (optional, default: 16)
> > > > - AF_PACKET MMAP block size set by "blocksz" (optional, default: 4096)
> > > > - AF_PACKET MMAP frame size set by "framesz" (optional, default: 2048)
> > > > - AF_PACKET MMAP frame count set by "framecnt" (optional, default: 512)
> > > >
> > > > Signed-off-by: John W. Linville
> > > > ---
> > > > This PMD is intended to provide a means for using DPDK on a broad
> > > > range of hardware without hardware-specific PMDs and (hopefully)
> > > > with better performance than what PCAP offers in Linux. This might
> > > > be useful as a development platform for DPDK applications when
> > > > DPDK-supported hardware is expensive or unavailable.
> > > >
> > > > config/common_bsdapp | 5 +
> > > > config/common_linuxapp | 5 +
> > > > lib/Makefile | 1 +
> > > > lib/librte_eal/linuxapp/eal/Makefile | 1 +
> > > > lib/librte_pmd_packet/Makefile | 60 +++
> > > > lib/librte_pmd_packet/rte_eth_packet.c | 826 +++++++++++++++++++++++++++++++++
> > > > lib/librte_pmd_packet/rte_eth_packet.h | 55 +++
> > > > mk/rte.app.mk | 4 +
> > > > 8 files changed, 957 insertions(+)
> > > > create mode 100644 lib/librte_pmd_packet/Makefile
> > > > create mode 100644 lib/librte_pmd_packet/rte_eth_packet.c
> > > > create mode 100644 lib/librte_pmd_packet/rte_eth_packet.h
> > > >
> > > > diff --git a/config/common_bsdapp b/config/common_bsdapp
> > > > index 943dce8f1ede..c317f031278e 100644
> > > > --- a/config/common_bsdapp
> > > > +++ b/config/common_bsdapp
> > > > @@ -226,6 +226,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=y
> > > > CONFIG_RTE_LIBRTE_PMD_BOND=y
> > > >
> > > > #
> > > > +# Compile software PMD backed by AF_PACKET sockets (Linux only)
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_PMD_PACKET=n
> > > > +
> > > > +#
> > > > # Do prefetch of packet data within PMD driver receive function
> > > > #
> > > > CONFIG_RTE_PMD_PACKET_PREFETCH=y
> > > > diff --git a/config/common_linuxapp b/config/common_linuxapp
> > > > index 7bf5d80d4e26..f9e7bc3015ec 100644
> > > > --- a/config/common_linuxapp
> > > > +++ b/config/common_linuxapp
> > > > @@ -249,6 +249,11 @@ CONFIG_RTE_LIBRTE_PMD_PCAP=n
> > > > CONFIG_RTE_LIBRTE_PMD_BOND=y
> > > >
> > > > #
> > > > +# Compile software PMD backed by AF_PACKET sockets (Linux only)
> > > > +#
> > > > +CONFIG_RTE_LIBRTE_PMD_PACKET=y
> > > > +
> > > > +#
> > > > # Compile Xen PMD
> > > > #
> > > > CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
> > > > diff --git a/lib/Makefile b/lib/Makefile
> > > > index 10c5bb3045bc..930fadf29898 100644
> > > > --- a/lib/Makefile
> > > > +++ b/lib/Makefile
> > > > @@ -47,6 +47,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += librte_pmd_i40e
> > > > DIRS-$(CONFIG_RTE_LIBRTE_PMD_BOND) += librte_pmd_bond
> > > > DIRS-$(CONFIG_RTE_LIBRTE_PMD_RING) += librte_pmd_ring
> > > > DIRS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += librte_pmd_pcap
> > > > +DIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += librte_pmd_packet
> > > > DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += librte_pmd_virtio
> > > > DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += librte_pmd_vmxnet3
> > > > DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += librte_pmd_xenvirt
> > > > diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile
> > > > index 756d6b0c9301..feed24a63272 100644
> > > > --- a/lib/librte_eal/linuxapp/eal/Makefile
> > > > +++ b/lib/librte_eal/linuxapp/eal/Makefile
> > > > @@ -44,6 +44,7 @@ CFLAGS += -I$(RTE_SDK)/lib/librte_ether
> > > > CFLAGS += -I$(RTE_SDK)/lib/librte_ivshmem
> > > > CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_ring
> > > > CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_pcap
> > > > +CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_packet
> > > > CFLAGS += -I$(RTE_SDK)/lib/librte_pmd_xenvirt
> > > > CFLAGS += $(WERROR_FLAGS) -O3
> > > >
> > > > diff --git a/lib/librte_pmd_packet/Makefile b/lib/librte_pmd_packet/Makefile
> > > > new file mode 100644
> > > > index 000000000000..e1266fb992cd
> > > > --- /dev/null
> > > > +++ b/lib/librte_pmd_packet/Makefile
> > > > @@ -0,0 +1,60 @@
> > > > +# BSD LICENSE
> > > > +#
> > > > +# Copyright(c) 2014 John W. Linville
> > > > +# Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > > +# Copyright(c) 2014 6WIND S.A.
> > > > +# All rights reserved.
> > > > +#
> > > > +# Redistribution and use in source and binary forms, with or without
> > > > +# modification, are permitted provided that the following conditions
> > > > +# are met:
> > > > +#
> > > > +#     * Redistributions of source code must retain the above copyright
> > > > +#       notice, this list of conditions and the following disclaimer.
> > > > +#     * Redistributions in binary form must reproduce the above copyright
> > > > +#       notice, this list of conditions and the following disclaimer in
> > > > +#       the documentation and/or other materials provided with the
> > > > +#       distribution.
> > > > +#     * Neither the name of Intel Corporation nor the names of its
> > > > +#       contributors may be used to endorse or promote products derived
> > > > +#       from this software without specific prior written permission.
> > > > +#
> > > > +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.vars.mk
> > > > +
> > > > +#
> > > > +# library name
> > > > +#
> > > > +LIB = librte_pmd_packet.a
> > > > +
> > > > +CFLAGS += -O3
> > > > +CFLAGS += $(WERROR_FLAGS)
> > > > +
> > > > +#
> > > > +# all source are stored in SRCS-y
> > > > +#
> > > > +SRCS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += rte_eth_packet.c
> > > > +
> > > > +#
> > > > +# Export include files
> > > > +#
> > > > +SYMLINK-y-include += rte_eth_packet.h
> > > > +
> > > > +# this lib depends upon:
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_mbuf
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_ether
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_malloc
> > > > +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_PACKET) += lib/librte_kvargs
> > > > +
> > > > +include $(RTE_SDK)/mk/rte.lib.mk
> > > > diff --git a/lib/librte_pmd_packet/rte_eth_packet.c b/lib/librte_pmd_packet/rte_eth_packet.c
> > > > new file mode 100644
> > > > index 000000000000..fceb6258aad6
> > > > --- /dev/null
> > > > +++ b/lib/librte_pmd_packet/rte_eth_packet.c
> > > > @@ -0,0 +1,826 @@
> > > > +/*-
> > > > + * BSD LICENSE
> > > > + *
> > > > + * Copyright(c) 2014 John W. Linville
> > > > + *
> > > > + * Originally based upon librte_pmd_pcap code:
> > > > + *
> > > > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > > > + * Copyright(c) 2014 6WIND S.A.
> > > > + * All rights reserved.
> > > > + *
> > > > + * Redistribution and use in source and binary forms, with or without
> > > > + * modification, are permitted provided that the following conditions
> > > > + * are met:
> > > > + *
> > > > + *     * Redistributions of source code must retain the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer.
> > > > + *     * Redistributions in binary form must reproduce the above copyright
> > > > + *       notice, this list of conditions and the following disclaimer in
> > > > + *       the documentation and/or other materials provided with the
> > > > + *       distribution.
> > > > + *     * Neither the name of Intel Corporation nor the names of its
> > > > + *       contributors may be used to endorse or promote products derived
> > > > + *       from this software without specific prior written permission.
> > > > + *
> > > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > > > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > > > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > > > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > > > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > > > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > > > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > > > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > > > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > > > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > > > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > > > + */
> > > > +
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +#include
> > > > +
> > > > +#include "rte_eth_packet.h"
> > > > +
> > > > +#define ETH_PACKET_IFACE_ARG "iface"
> > > > +#define ETH_PACKET_NUM_Q_ARG "qpairs"
> > > > +#define ETH_PACKET_BLOCKSIZE_ARG "blocksz"
> > > > +#define ETH_PACKET_FRAMESIZE_ARG "framesz"
> > > > +#define ETH_PACKET_FRAMECOUNT_ARG "framecnt"
> > > > +
> > > > +#define DFLT_BLOCK_SIZE (1 << 12)
> > > > +#define DFLT_FRAME_SIZE (1 << 11)
> > > > +#define DFLT_FRAME_COUNT (1 << 9)
> > > > +
> > > > +struct pkt_rx_queue {
> > > > +	int sockfd;
> > > > +
> > > > +	struct iovec *rd;
> > > > +	uint8_t *map;
> > > > +	unsigned int framecount;
> > > > +	unsigned int framenum;
> > > > +
> > > > +	struct rte_mempool *mb_pool;
> > > > +
> > > > +	volatile unsigned long rx_pkts;
> > > > +	volatile unsigned long err_pkts;
> > >
> > > Use of volatile will generate slow code, don't think
> > > it is necessary, especially when only one CPU can use a queue
> > > at a time.
> >
> > That is a good point, worth checking out. FWIW, those lines are
> > boilerplate originally copied from the pcap PMD. :-)
> >
> >
> Yes, I agree it's worth checking out if there is a performance impact,
> but if we assume that the stats for RX/TX are possibly going to be read
> by another core, they really should be volatile for correctness.

Since only one core updates the counters, volatile is not necessary.
The add will generate a valid value, and a reader will read a valid
value. Two adds could collide only if two CPUs were using the same
queue, but the DPDK queue documentation specifically says queues are
not MP safe.
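
To make the single-writer argument concrete, here is a minimal sketch
of the pattern. It is not code from the patch: struct sketch_rx_queue,
sketch_rx_burst() and sketch_read_rx() are hypothetical names invented
for illustration, and the frame_ok array stands in for walking the
mmap'ed ring. The sketch assumes, as the discussion does, that exactly
one core ever calls the burst function for a given queue and that an
aligned unsigned long is loaded and stored as a single unit on the
target CPU (the C standard itself does not promise this).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's struct pkt_rx_queue. */
struct sketch_rx_queue {
	unsigned long rx_pkts;	/* plain counter: written only by the owning core */
	unsigned long err_pkts;	/* plain counter: written only by the owning core */
};

/*
 * Hypothetical receive burst. Frames are counted in locals, which the
 * compiler is free to keep in registers, and each shared counter is
 * updated with a single add once the burst is done.
 */
static uint16_t
sketch_rx_burst(struct sketch_rx_queue *q, const int *frame_ok,
		uint16_t nb_frames)
{
	uint16_t i, num_rx = 0, num_err = 0;

	assert(q != NULL);

	for (i = 0; i < nb_frames; i++) {
		if (frame_ok[i])
			num_rx++;
		else
			num_err++;
	}

	/* Single writer: no other core performs these adds on this queue. */
	q->rx_pkts += num_rx;
	q->err_pkts += num_err;

	return num_rx;
}

/* Stats reader, possibly on another core: it only ever loads the value. */
static unsigned long
sketch_read_rx(const struct sketch_rx_queue *q)
{
	return q->rx_pkts;
}
```

With volatile on the fields, accumulating per frame directly into
q->rx_pkts would force a load and a store on every iteration. Batching
into locals sidesteps that cost either way, so the qualifier mostly
matters for how fresh a value the reader sees, not for whether the
value is valid.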