* [dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux
@ 2016-01-27 16:32 Ferruh Yigit
  2016-01-27 16:32 ` [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module Ferruh Yigit
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Ferruh Yigit @ 2016-01-27 16:32 UTC
  To: dev

This is a slow data path communication implementation based on the
existing KNI. The differences are: librte_kni is converted into a PMD,
and the kdp kernel module is almost the same as the KNI module, except
that all control path functionality is removed and some simplifications
are made.

The motivation is to simplify slow path data communication.
Any application can now use this new PMD to send data to and receive
data from the Linux kernel.

The PMD supports two communication methods:

1) KDP kernel module
The PMD initialization functions create the virtual interfaces (with the
help of the kdp kernel module) and create the FIFO. The FIFO is used to
share data between userspace and kernelspace. This is the default method.

2) tun/tap module
When the KDP module is not inserted, the PMD creates a tap interface and
transfers packets through that tap interface.

In the long term this patch is intended to replace KNI, and KNI will be
deprecated.

Sample usage:

1) Transfer any packet received from a NIC that is bound to DPDK to the
   Linux kernel

 a) Insert the kdp kernel module
    insmod build/kmod/rte_kdp.ko

 b) Bind the NIC to DPDK using dpdk_nic_bind.py

 c) ./testpmd --vdev eth_kdp0

 c1) testpmd shows two ports, one physical and the other virtual
     ...
     Configuring Port 0 (socket 0)
     Port 0: 00:00:00:00:00:00
     Configuring Port 1 (socket 0)
     ...
     Checking link statuses...
     Port 0 Link Up - speed 10000 Mbps - full-duplex
     Port 1 Link Up - speed 10000 Mbps - full-duplex
     Done

 c2) This creates the "kdp0" Linux interface
     $ ip l show kdp0
     21: kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

 d) The Linux port can be used for data

 d1)
     $ ifconfig kdp0 1.0.0.2
     $ ping 1.0.0.1
     PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
     64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=0.789 ms
     64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=0.881 ms

 d2)
     $ tcpdump -nn -i kdp0
     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
     15:01:22.407506 IP 1.0.0.1 > 1.0.0.2: ICMP echo request, id 40016, seq 18, length 64
     15:01:22.408521 IP 1.0.0.2 > 1.0.0.1: ICMP echo reply, id 40016, seq 18, length 64

2) Data traveling between virtual Linux interfaces passes through the
   DPDK application, and the application can alter the data

 a) Insert the kdp kernel module
    insmod build/kmod/rte_kdp.ko

 b) No physical NIC is involved

 c) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1

 c1) testpmd shows two ports, both of them virtual
     ...
     Configuring Port 0 (socket 0)
     Port 0: 00:00:00:00:00:00
     Configuring Port 1 (socket 0)
     Port 1: 00:00:00:00:00:00
     Checking link statuses...
     Port 0 Link Up - speed 10000 Mbps - full-duplex
     Port 1 Link Up - speed 10000 Mbps - full-duplex
     Done

 c2) This creates the "kdp0" and "kdp1" Linux interfaces
     $ ip l show kdp0; ip l show kdp1
     22: kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
     23: kdp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
         link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

 d) Data traveling between the virtual ports passes through the DPDK
    application
     $ ifconfig kdp0 1.0.0.1
     $ ifconfig kdp1 1.0.0.2

 d1)
     $ ping 1.0.0.1
     PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data.
     64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=3.57 ms
     64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=1.85 ms
     64 bytes from 1.0.0.1: icmp_seq=3 ttl=64 time=1.89 ms

 d2)
     $ tcpdump -nn -i kdp0
     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes
     15:20:51.908543 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 1, length 64
     15:20:51.909570 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 1, length 64
     15:20:52.909551 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 2, length 64
     15:20:52.910577 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 2, length 64

3) tun/tap interface usage

 a) No external module is required, but tun/tap support in the kernel is
    required

 b) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1

 b1) This creates the "tap_kdp0" and "tap_kdp1" Linux interfaces
     $ ip l show tap_kdp0; ip l show tap_kdp1
     25: tap_kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
         link/ether 56:47:97:9c:03:8e brd ff:ff:ff:ff:ff:ff
     26: tap_kdp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500
         link/ether 5e:15:22:b0:52:42 brd ff:ff:ff:ff:ff:ff

Ferruh Yigit (2):
  kdp: add kernel data path kernel module
  kdp: add virtual PMD for kernel slow data path communication

 config/common_linuxapp                             |   9 +-
 doc/guides/nics/pcap_ring.rst                      | 125 ++++-
 doc/guides/rel_notes/release_2_3.rst               |   6 +
 drivers/net/Makefile                               |   3 +-
 drivers/net/kdp/Makefile                           |  61 +++
 drivers/net/kdp/rte_eth_kdp.c                      | 481 +++++++++++++++++
 drivers/net/kdp/rte_kdp.c                          | 365 +++++++++++++
 drivers/net/kdp/rte_kdp.h                          | 126 +++++
 drivers/net/kdp/rte_kdp_fifo.h                     |  91 ++++
 drivers/net/kdp/rte_kdp_tap.c                      |  96 ++++
 drivers/net/kdp/rte_pmd_kdp_version.map            |   4 +
 lib/librte_eal/common/include/rte_log.h            |   3 +-
 lib/librte_eal/linuxapp/Makefile                   |   5 +-
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 143 +++++
 lib/librte_eal/linuxapp/kdp/Makefile               |  56 ++
 lib/librte_eal/linuxapp/kdp/kdp_dev.h              |  82 +++
 lib/librte_eal/linuxapp/kdp/kdp_fifo.h             |  91 ++++
 lib/librte_eal/linuxapp/kdp/kdp_misc.c             | 463 +++++++++++++++++
 lib/librte_eal/linuxapp/kdp/kdp_net.c              | 573 +++++++++++++++++++++
 mk/rte.app.mk                                      |   3 +-
 21 files changed, 2780 insertions(+), 9 deletions(-)
 create mode 100644 drivers/net/kdp/Makefile
 create mode 100644 drivers/net/kdp/rte_eth_kdp.c
 create mode 100644 drivers/net/kdp/rte_kdp.c
 create mode 100644 drivers/net/kdp/rte_kdp.h
 create mode 100644 drivers/net/kdp/rte_kdp_fifo.h
 create mode 100644 drivers/net/kdp/rte_kdp_tap.c
 create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_misc.c
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c

-- 
2.5.0
* [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module
  2016-01-27 16:32 [dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
@ 2016-01-27 16:32 ` Ferruh Yigit
  2016-02-08 17:14   ` Reshma Pattan
  2016-01-27 16:32 ` [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
  2016-02-19  5:05 ` [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
  2 siblings, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2016-01-27 16:32 UTC
  To: dev

This kernel module is based on the KNI module, but it is a stripped-down
version that handles data messages only; no control functionality is
provided.

The FIFO implementation of KNI is kept exactly the same, but the
ethtool-related code is removed and the virtual network management code
is simplified.

This module contains the kernel support for creating network devices,
and it has a simple driver for the virtual network device. The driver
simply puts packets to and gets packets from the FIFO instead of real
hardware.

The FIFO is created and owned by the userspace application, which in
this case is the KDP PMD.

In the long term this patch is intended to replace KNI, and KNI will be
deprecated.
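The FIFO described above can be sketched in plain userspace C as follows. This mirrors the put/get/count logic of kdp_fifo.h in this patch, but it is an illustration only: the demo_ names, the fixed DEMO_FIFO_LEN and the embedded buffer array are assumptions made so the example stands alone, whereas the real struct rte_kdp_fifo lives in shared memory and uses a flexible buffer. The length must be a power of two so that "& (len - 1)" wraps the indices, and one slot always stays empty so that write == read can only mean "empty".

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_LEN 8u	/* hypothetical size; must be a power of two */

/* Hypothetical standalone mirror of struct rte_kdp_fifo. */
struct demo_fifo {
	volatile unsigned write;	/* next position to be written */
	volatile unsigned read;		/* next position to be read */
	unsigned len;			/* circular buffer length */
	void *buffer[DEMO_FIFO_LEN];	/* holds the queued pointers */
};

static void
demo_fifo_init(struct demo_fifo *fifo)
{
	/* The index masking below only works for power-of-two lengths. */
	assert((DEMO_FIFO_LEN & (DEMO_FIFO_LEN - 1)) == 0);
	fifo->write = 0;
	fifo->read = 0;
	fifo->len = DEMO_FIFO_LEN;
}

/* Add up to num elements; return the number actually written. */
static unsigned
demo_fifo_put(struct demo_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned fifo_write = fifo->write;
	unsigned fifo_read = fifo->read;
	unsigned new_write = fifo_write;

	for (i = 0; i < num; i++) {
		new_write = (new_write + 1) & (fifo->len - 1);
		if (new_write == fifo_read)
			break;	/* full: never overwrite the read position */
		fifo->buffer[fifo_write] = data[i];
		fifo_write = new_write;
	}
	fifo->write = fifo_write;

	return i;
}

/* Remove up to num elements; return the number actually read. */
static unsigned
demo_fifo_get(struct demo_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned new_read = fifo->read;
	unsigned fifo_write = fifo->write;

	for (i = 0; i < num; i++) {
		if (new_read == fifo_write)
			break;	/* empty */
		data[i] = fifo->buffer[new_read];
		new_read = (new_read + 1) & (fifo->len - 1);
	}
	fifo->read = new_read;

	return i;
}

/* Number of elements currently queued. */
static unsigned
demo_fifo_count(const struct demo_fifo *fifo)
{
	return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1);
}
```

Because the producer only updates write and the consumer only updates read, this layout suits one writer and one reader per FIFO. The sketch ignores memory ordering between the two sides, which matters when the producer and consumer run on different cores.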
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> --- config/common_linuxapp | 8 +- lib/librte_eal/linuxapp/Makefile | 5 +- lib/librte_eal/linuxapp/eal/Makefile | 3 +- .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 143 +++++ lib/librte_eal/linuxapp/kdp/Makefile | 56 ++ lib/librte_eal/linuxapp/kdp/kdp_dev.h | 82 +++ lib/librte_eal/linuxapp/kdp/kdp_fifo.h | 91 ++++ lib/librte_eal/linuxapp/kdp/kdp_misc.c | 463 +++++++++++++++++ lib/librte_eal/linuxapp/kdp/kdp_net.c | 573 +++++++++++++++++++++ 9 files changed, 1421 insertions(+), 3 deletions(-) create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_misc.c create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c diff --git a/config/common_linuxapp b/config/common_linuxapp index 74bc515..73c91d8 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -320,6 +320,12 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n CONFIG_RTE_LIBRTE_PMD_NULL=y # +# Compile KDP PMD +# +CONFIG_RTE_KDP_KMOD=y +CONFIG_RTE_KDP_PREEMPT_DEFAULT=y + +# # Do prefetch of packet data within PMD driver receive function # CONFIG_RTE_PMD_PACKET_PREFETCH=y diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile index d9c5233..e3f91a7 100644 --- a/lib/librte_eal/linuxapp/Makefile +++ b/lib/librte_eal/linuxapp/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -38,6 +38,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal ifeq ($(CONFIG_RTE_KNI_KMOD),y) DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni endif +ifeq ($(CONFIG_RTE_KDP_KMOD),y) +DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kdp +endif ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y) DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0 endif diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index 26eced5..ac72aea 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -116,6 +116,7 @@ CFLAGS_eal_thread.o += -Wno-return-type endif INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h +INC += rte_kdp_common.h SYMLINK-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP)-include/exec-env := \ $(addprefix include/exec-env/,$(INC)) diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h new file mode 100644 index 0000000..0c77f58 --- /dev/null +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h @@ -0,0 +1,143 @@ +/*- + * This file is provided under a dual BSD/LGPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GNU LESSER GENERAL PUBLIC LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + * + * + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + */ + +#ifndef _RTE_KDP_COMMON_H_ +#define _RTE_KDP_COMMON_H_ + +#ifdef __KERNEL__ +#include <linux/if.h> +#endif + +/** + * KDP name is part of memzone name. + */ +#define RTE_KDP_NAMESIZE 32 + +#ifndef RTE_CACHE_LINE_SIZE +#define RTE_CACHE_LINE_SIZE 64 /**< Cache line size. */ +#endif + +/* + * Fifo struct mapped in a shared memory. It describes a circular buffer FIFO + * Write and read should wrap around. Fifo is empty when write == read + * Writing should never overwrite the read position + */ +struct rte_kdp_fifo { + volatile unsigned write; /**< Next position to be written*/ + volatile unsigned read; /**< Next position to be read */ + unsigned len; /**< Circular buffer length */ + unsigned elem_size; /**< Pointer size - for 32/64 bit OS */ + void * volatile buffer[0]; /**< The buffer contains mbuf pointers */ +}; + +/* + * The kernel image of the rte_mbuf struct, with only the relevant fields. + * Padding is necessary to assure the offsets of these fields + */ +struct rte_kdp_mbuf { + void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE))); + char pad0[10]; + + /**< Start address of data in segment buffer. */ + uint16_t data_off; + char pad1[4]; + uint64_t ol_flags; /**< Offload features. */ + char pad2[4]; + + /**< Total pkt len: sum of all segment data_len. */ + uint32_t pkt_len; + + /**< Amount of data in segment buffer. 
*/ + uint16_t data_len; + + /* fields on second cache line */ + char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE))); + void *pool; + void *next; +}; + +/* + * Struct used to create a KDP device. Passed to the kernel in IOCTL call + */ +struct rte_kdp_device_info { + char name[RTE_KDP_NAMESIZE]; /**< Network device name for KDP */ + + phys_addr_t tx_phys; + phys_addr_t rx_phys; + phys_addr_t alloc_phys; + phys_addr_t free_phys; + + /* mbuf mempool */ + void *mbuf_va; + phys_addr_t mbuf_phys; + + uint16_t group_id; /**< Group ID */ + uint32_t core_id; /**< core ID to bind for kernel thread */ + + uint8_t force_bind : 1; /**< Flag for kernel thread binding */ + + /* mbuf size */ + unsigned mbuf_size; +}; + +#define KDP_DEVICE "kdp" + +#define RTE_KDP_IOCTL_TEST _IOWR(0, 1, int) +#define RTE_KDP_IOCTL_CREATE _IOWR(0, 2, struct rte_kdp_device_info) +#define RTE_KDP_IOCTL_RELEASE _IOWR(0, 3, struct rte_kdp_device_info) + +#endif /* _RTE_KDP_COMMON_H_ */ diff --git a/lib/librte_eal/linuxapp/kdp/Makefile b/lib/librte_eal/linuxapp/kdp/Makefile new file mode 100644 index 0000000..764f6a8 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/Makefile @@ -0,0 +1,56 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. 
+# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# module name and path +# +MODULE = rte_kdp + +# +# CFLAGS +# +MODULE_CFLAGS += -I$(SRCDIR) --param max-inline-insns-single=50 +MODULE_CFLAGS += -I$(RTE_OUTPUT)/include +MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h +MODULE_CFLAGS += -Wall -Werror + +# this lib needs main eal +DEPDIRS-y += lib/librte_eal/linuxapp/eal + +# +# all source are stored in SRCS-y +# +SRCS-y += kdp_misc.c +SRCS-y += kdp_net.c + +include $(RTE_SDK)/mk/rte.module.mk diff --git a/lib/librte_eal/linuxapp/kdp/kdp_dev.h b/lib/librte_eal/linuxapp/kdp/kdp_dev.h new file mode 100644 index 0000000..52952b4 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_dev.h @@ -0,0 +1,82 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#ifndef _KDP_DEV_H_ +#define _KDP_DEV_H_ + +#include <exec-env/rte_kdp_common.h> + +/** + * A structure describing the private information for a kdp device. + */ +struct kdp_dev { + /* kdp list */ + struct list_head list; + + struct net_device_stats stats; + uint16_t group_id; /* Group ID of a group of KDP devices */ + unsigned core_id; /* Core ID to bind */ + char name[RTE_KDP_NAMESIZE]; /* Network device name */ + struct task_struct *pthread; + + /* wait queue for req/resp */ + wait_queue_head_t wq; + struct mutex sync_lock; + + /* kdp device */ + struct net_device *net_dev; + + /* queue for packets to be sent out */ + void *tx_q; + + /* queue for the packets received */ + void *rx_q; + + /* queue for the allocated mbufs those can be used to save sk buffs */ + void *alloc_q; + + /* free queue for the mbufs to be freed */ + void *free_q; + + void *sync_kva; + void *sync_va; + + void *mbuf_kva; + void *mbuf_va; + + /* mbuf size */ + unsigned mbuf_size; +}; + +void kdp_net_rx(struct kdp_dev *kdp); +void kdp_net_init(struct net_device *dev); +void kdp_net_config_lo_mode(char *lo_str); + +#define KDP_ERR(args...) printk(KERN_DEBUG "KDP: Error: " args) +#define KDP_PRINT(args...) printk(KERN_DEBUG "KDP: " args) + +#ifdef RTE_KDP_KO_DEBUG +#define KDP_DBG(args...) printk(KERN_DEBUG "KDP: " args) +#else +#define KDP_DBG(args...) +#endif + +#endif diff --git a/lib/librte_eal/linuxapp/kdp/kdp_fifo.h b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h new file mode 100644 index 0000000..a5fe080 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h @@ -0,0 +1,91 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#ifndef _KDP_FIFO_H_ +#define _KDP_FIFO_H_ + +#include <exec-env/rte_kdp_common.h> + +/** + * Adds num elements into the fifo. Return the number actually written + */ +static inline unsigned +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned fifo_write = fifo->write; + unsigned fifo_read = fifo->read; + unsigned new_write = fifo_write; + + for (i = 0; i < num; i++) { + new_write = (new_write + 1) & (fifo->len - 1); + + if (new_write == fifo_read) + break; + fifo->buffer[fifo_write] = data[i]; + fifo_write = new_write; + } + fifo->write = fifo_write; + + return i; +} + +/** + * Get up to num elements from the fifo. 
Return the number actually read
+ */
+static inline unsigned
+kdp_fifo_get(struct rte_kdp_fifo *fifo, void **data, unsigned num)
+{
+	unsigned i = 0;
+	unsigned new_read = fifo->read;
+	unsigned fifo_write = fifo->write;
+
+	for (i = 0; i < num; i++) {
+		if (new_read == fifo_write)
+			break;
+
+		data[i] = fifo->buffer[new_read];
+		new_read = (new_read + 1) & (fifo->len - 1);
+	}
+	fifo->read = new_read;
+
+	return i;
+}
+
+/**
+ * Get the num of elements in the fifo
+ */
+static inline unsigned
+kdp_fifo_count(struct rte_kdp_fifo *fifo)
+{
+	return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1);
+}
+
+/**
+ * Get the num of available elements in the fifo
+ */
+static inline unsigned
+kdp_fifo_free_count(struct rte_kdp_fifo *fifo)
+{
+	return (fifo->read - fifo->write - 1) & (fifo->len - 1);
+}
+
+#endif /* _KDP_FIFO_H_ */
diff --git a/lib/librte_eal/linuxapp/kdp/kdp_misc.c b/lib/librte_eal/linuxapp/kdp/kdp_misc.c
new file mode 100644
index 0000000..d97d1c0
--- /dev/null
+++ b/lib/librte_eal/linuxapp/kdp/kdp_misc.c
@@ -0,0 +1,463 @@
+/*-
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program;
+ *
+ * The full GNU General Public License is included in this distribution
+ * in the file called LICENSE.GPL.
+ * + * Contact Information: + * Intel Corporation + */ + +#include <linux/version.h> +#include <linux/miscdevice.h> +#include <linux/netdevice.h> +#include <linux/pci.h> +#include <linux/kthread.h> +#include <net/netns/generic.h> + +#include "kdp_dev.h" + +#define KDP_RX_LOOP_NUM 1000 +#define KDP_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */ +#define KDP_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ + +static unsigned long device_in_use; /* device in use flag */ +static struct task_struct *kdp_kthread; +static struct rw_semaphore kdp_list_lock; +static struct list_head kdp_list_head; + +/* loopback mode */ +static char *lo_mode; + +/* Kernel thread mode */ +static char *kthread_mode; +static unsigned multiple_kthread_on; + +static int +kdp_thread_single(void *data) +{ + struct kdp_dev *dev; + int j; + + while (!kthread_should_stop()) { + down_read(&kdp_list_lock); + for (j = 0; j < KDP_RX_LOOP_NUM; j++) { + list_for_each_entry(dev, &kdp_list_head, list) { + kdp_net_rx(dev); + } + } + up_read(&kdp_list_lock); +#ifdef RTE_KDP_PREEMPT_DEFAULT + /* reschedule out for a while */ + schedule_timeout_interruptible( + usecs_to_jiffies(KDP_KTHREAD_RESCHEDULE_INTERVAL)); +#endif + } + + return 0; +} + +static int +kdp_thread_multiple(void *param) +{ + int j; + struct kdp_dev *dev = (struct kdp_dev *)param; + + while (!kthread_should_stop()) { + for (j = 0; j < KDP_RX_LOOP_NUM; j++) + kdp_net_rx(dev); + +#ifdef RTE_KDP_PREEMPT_DEFAULT + schedule_timeout_interruptible( + usecs_to_jiffies(KDP_KTHREAD_RESCHEDULE_INTERVAL)); +#endif + } + + return 0; +} + +static int +kdp_dev_remove(struct kdp_dev *dev) +{ + if (!dev) + return -ENODEV; + + if (dev->net_dev) { + unregister_netdev(dev->net_dev); + free_netdev(dev->net_dev); + } + + return 0; +} + +static int +kdp_check_param(struct kdp_dev *kdp, struct rte_kdp_device_info *dev) +{ + if (!kdp || !dev) + return -1; + + /* Check if network name has been used */ + if (!strncmp(kdp->name, dev->name, RTE_KDP_NAMESIZE)) { + 
KDP_ERR("KDP name %s duplicated\n", dev->name); + return -1; + } + + return 0; +} + +static int +kdp_ioctl_create(unsigned int ioctl_num, unsigned long ioctl_param) +{ + int ret; + struct rte_kdp_device_info dev_info; + struct net_device *net_dev = NULL; + struct kdp_dev *kdp, *dev, *n; + + printk(KERN_INFO "KDP: Creating kdp...\n"); + /* Check the buffer size, to avoid warning */ + if (_IOC_SIZE(ioctl_num) > sizeof(dev_info)) + return -EINVAL; + + /* Copy kdp info from user space */ + ret = copy_from_user(&dev_info, (void *)ioctl_param, sizeof(dev_info)); + if (ret) { + KDP_ERR("copy_from_user in kdp_ioctl_create"); + return -EIO; + } + + /** + * Check if the cpu core id is valid for binding, + * for multiple kernel thread mode. + */ + if (multiple_kthread_on && dev_info.force_bind && + !cpu_online(dev_info.core_id)) { + KDP_ERR("cpu %u is not online\n", dev_info.core_id); + return -EINVAL; + } + + /* Check if it has been created */ + down_read(&kdp_list_lock); + list_for_each_entry_safe(dev, n, &kdp_list_head, list) { + if (kdp_check_param(dev, &dev_info) < 0) { + up_read(&kdp_list_lock); + return -EINVAL; + } + } + up_read(&kdp_list_lock); + + net_dev = alloc_netdev(sizeof(struct kdp_dev), dev_info.name, +#ifdef NET_NAME_UNKNOWN + NET_NAME_UNKNOWN, +#endif + kdp_net_init); + if (net_dev == NULL) { + KDP_ERR("error allocating device \"%s\"\n", dev_info.name); + return -EBUSY; + } + + kdp = netdev_priv(net_dev); + + kdp->net_dev = net_dev; + kdp->group_id = dev_info.group_id; + kdp->core_id = dev_info.core_id; + strncpy(kdp->name, dev_info.name, RTE_KDP_NAMESIZE); + + /* Translate user space info into kernel space info */ + kdp->tx_q = phys_to_virt(dev_info.tx_phys); + kdp->rx_q = phys_to_virt(dev_info.rx_phys); + kdp->alloc_q = phys_to_virt(dev_info.alloc_phys); + kdp->free_q = phys_to_virt(dev_info.free_phys); + + kdp->mbuf_kva = phys_to_virt(dev_info.mbuf_phys); + kdp->mbuf_va = dev_info.mbuf_va; + + kdp->mbuf_size = dev_info.mbuf_size; + + KDP_PRINT("tx_phys: 
0x%016llx, tx_q addr: 0x%p\n", + (unsigned long long) dev_info.tx_phys, kdp->tx_q); + KDP_PRINT("rx_phys: 0x%016llx, rx_q addr: 0x%p\n", + (unsigned long long) dev_info.rx_phys, kdp->rx_q); + KDP_PRINT("alloc_phys: 0x%016llx, alloc_q addr: 0x%p\n", + (unsigned long long) dev_info.alloc_phys, kdp->alloc_q); + KDP_PRINT("free_phys: 0x%016llx, free_q addr: 0x%p\n", + (unsigned long long) dev_info.free_phys, kdp->free_q); + KDP_PRINT("mbuf_phys: 0x%016llx, mbuf_kva: 0x%p\n", + (unsigned long long) dev_info.mbuf_phys, kdp->mbuf_kva); + KDP_PRINT("mbuf_va: 0x%p\n", dev_info.mbuf_va); + KDP_PRINT("mbuf_size: %u\n", kdp->mbuf_size); + + ret = register_netdev(net_dev); + if (ret) { + KDP_ERR("error %i registering device \"%s\"\n", + ret, dev_info.name); + kdp_dev_remove(kdp); + return -ENODEV; + } + + /** + * Create a new kernel thread for multiple mode, set its core affinity, + * and finally wake it up. + */ + if (multiple_kthread_on) { + kdp->pthread = kthread_create(kdp_thread_multiple, + (void *)kdp, + "kdp_%s", kdp->name); + if (IS_ERR(kdp->pthread)) { + kdp_dev_remove(kdp); + return -ECANCELED; + } + if (dev_info.force_bind) + kthread_bind(kdp->pthread, kdp->core_id); + wake_up_process(kdp->pthread); + } + + down_write(&kdp_list_lock); + list_add(&kdp->list, &kdp_list_head); + up_write(&kdp_list_lock); + + return 0; +} + +static int +kdp_ioctl_release(unsigned int ioctl_num, unsigned long ioctl_param) +{ + int ret = -EINVAL; + struct kdp_dev *dev, *n; + struct rte_kdp_device_info dev_info; + + if (_IOC_SIZE(ioctl_num) > sizeof(dev_info)) + return -EINVAL; + + ret = copy_from_user(&dev_info, (void *)ioctl_param, sizeof(dev_info)); + if (ret) { + KDP_ERR("copy_from_user in kdp_ioctl_release"); + return -EIO; + } + + /* Release the network device according to its name */ + if (strlen(dev_info.name) == 0) + return ret; + + down_write(&kdp_list_lock); + list_for_each_entry_safe(dev, n, &kdp_list_head, list) { + if (strncmp(dev->name, dev_info.name, RTE_KDP_NAMESIZE) != 0) 
+			continue;
+
+		if (multiple_kthread_on && dev->pthread != NULL) {
+			kthread_stop(dev->pthread);
+			dev->pthread = NULL;
+		}
+
+		kdp_dev_remove(dev);
+		list_del(&dev->list);
+		ret = 0;
+		break;
+	}
+	up_write(&kdp_list_lock);
+	printk(KERN_INFO "KDP: %s release kdp named %s\n",
+		(ret == 0 ? "Successfully" : "Unsuccessfully"), dev_info.name);
+
+	return ret;
+}
+
+static int
+kdp_ioctl(struct inode *inode, unsigned int ioctl_num,
+	unsigned long ioctl_param)
+{
+	int ret = -EINVAL;
+
+	KDP_DBG("IOCTL num=0x%0x param=0x%0lx\n", ioctl_num, ioctl_param);
+
+	/*
+	 * Switch according to the ioctl called
+	 */
+	switch (_IOC_NR(ioctl_num)) {
+	case _IOC_NR(RTE_KDP_IOCTL_TEST):
+		/* For test only, not used */
+		break;
+	case _IOC_NR(RTE_KDP_IOCTL_CREATE):
+		ret = kdp_ioctl_create(ioctl_num, ioctl_param);
+		break;
+	case _IOC_NR(RTE_KDP_IOCTL_RELEASE):
+		ret = kdp_ioctl_release(ioctl_num, ioctl_param);
+		break;
+	default:
+		KDP_DBG("IOCTL default\n");
+		break;
+	}
+
+	return ret;
+}
+
+static int
+kdp_open(struct inode *inode, struct file *file)
+{
+	/* kdp device can be opened by one user only per netns */
+	if (test_and_set_bit(KDP_DEV_IN_USE_BIT_NUM, &device_in_use))
+		return -EBUSY;
+
+	/* Create kernel thread for single mode */
+	if (multiple_kthread_on == 0) {
+		KDP_PRINT("Single kernel thread for all KDP devices\n");
+		/* Create kernel thread for RX */
+		kdp_kthread = kthread_run(kdp_thread_single, NULL,
+						"kdp_single");
+		if (IS_ERR(kdp_kthread)) {
+			KDP_ERR("Unable to create kernel thread\n");
+			return PTR_ERR(kdp_kthread);
+		}
+	} else
+		KDP_PRINT("Multiple kernel thread mode enabled\n");
+
+	KDP_PRINT("/dev/kdp opened\n");
+
+	return 0;
+}
+
+static int
+kdp_release(struct inode *inode, struct file *file)
+{
+	struct kdp_dev *dev, *n;
+
+	/* Stop kernel thread for single mode */
+	if (multiple_kthread_on == 0) {
+		/* Stop kernel thread */
+		kthread_stop(kdp_kthread);
+		kdp_kthread = NULL;
+	}
+
+	down_write(&kdp_list_lock);
+	list_for_each_entry_safe(dev, n,
&kdp_list_head, list) { + /* Stop kernel thread for multiple mode */ + if (multiple_kthread_on && dev->pthread != NULL) { + kthread_stop(dev->pthread); + dev->pthread = NULL; + } + + kdp_dev_remove(dev); + list_del(&dev->list); + } + up_write(&kdp_list_lock); + + /* Clear the bit of device in use */ + clear_bit(KDP_DEV_IN_USE_BIT_NUM, &device_in_use); + + KDP_PRINT("/dev/kdp closed\n"); + + return 0; +} + +static int +kdp_compat_ioctl(struct inode *inode, unsigned int ioctl_num, + unsigned long ioctl_param) +{ + /* 32 bits app on 64 bits OS to be supported later */ + KDP_PRINT("Not implemented.\n"); + + return -EINVAL; +} + +static const struct file_operations kdp_fops = { + .owner = THIS_MODULE, + .open = kdp_open, + .release = kdp_release, + .unlocked_ioctl = (void *)kdp_ioctl, + .compat_ioctl = (void *)kdp_compat_ioctl, +}; + +static struct miscdevice kdp_misc = { + .minor = MISC_DYNAMIC_MINOR, + .name = KDP_DEVICE, + .fops = &kdp_fops, +}; + +static int __init +kdp_parse_kthread_mode(void) +{ + if (!kthread_mode) + return 0; + + if (strcmp(kthread_mode, "single") == 0) + return 0; + else if (strcmp(kthread_mode, "multiple") == 0) + multiple_kthread_on = 1; + else + return -1; + + return 0; +} + +static int __init +kdp_init(void) +{ + int rc; + + KDP_PRINT("######## DPDK kdp module loading ########\n"); + + if (kdp_parse_kthread_mode() < 0) { + KDP_ERR("Invalid parameter for kthread_mode\n"); + return -EINVAL; + } + + rc = misc_register(&kdp_misc); + if (rc != 0) { + KDP_ERR("Misc registration failed\n"); + return rc; + } + + /* Configure the lo mode according to the input parameter */ + kdp_net_config_lo_mode(lo_mode); + + /* Clear the bit of device in use */ + clear_bit(KDP_DEV_IN_USE_BIT_NUM, &device_in_use); + init_rwsem(&kdp_list_lock); + INIT_LIST_HEAD(&kdp_list_head); + + KDP_PRINT("######## DPDK kdp module loaded ########\n"); + + return 0; +} +module_init(kdp_init); + +static void __exit +kdp_exit(void) +{ + misc_deregister(&kdp_misc); + 
KDP_PRINT("####### DPDK kdp module unloaded #######\n"); +} +module_exit(kdp_exit); + +module_param(lo_mode, charp, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(lo_mode, +"KDP loopback mode (default=lo_mode_none):\n" +" lo_mode_none Kernel loopback disabled\n" +" lo_mode_fifo Enable kernel loopback with fifo\n" +" lo_mode_fifo_skb Enable kernel loopback with fifo and skb buffer\n" +"\n" +); + +module_param(kthread_mode, charp, S_IRUGO); +MODULE_PARM_DESC(kthread_mode, +"Kernel thread mode (default=single):\n" +" single Single kernel thread mode enabled.\n" +" multiple Multiple kernel thread mode enabled.\n" +"\n" +); + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Intel Corporation"); +MODULE_DESCRIPTION("Kernel Module for managing kdp devices"); diff --git a/lib/librte_eal/linuxapp/kdp/kdp_net.c b/lib/librte_eal/linuxapp/kdp/kdp_net.c new file mode 100644 index 0000000..5c669f5 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_net.c @@ -0,0 +1,573 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +/* + * This code is inspired from the book "Linux Device Drivers" by + * Alessandro Rubini and Jonathan Corbet, published by O'Reilly & Associates + */ + +#include <linux/version.h> +#include <linux/etherdevice.h> /* eth_type_trans */ + +#include "kdp_fifo.h" +#include "kdp_dev.h" + +#define WD_TIMEOUT 5 /*jiffies */ + +#define MBUF_BURST_SZ 32 + +/* typedef for rx function */ +typedef void (*kdp_net_rx_t)(struct kdp_dev *kdp); + +/* + * Open and close + */ +static int +kdp_net_open(struct net_device *dev) +{ + random_ether_addr(dev->dev_addr); + netif_start_queue(dev); + + return 0; +} + +static int +kdp_net_release(struct net_device *dev) +{ + netif_stop_queue(dev); /* can't transmit any more */ + + return 0; +} + +/* + * Configuration changes (passed on by ifconfig) + */ +static int +kdp_net_config(struct net_device *dev, struct ifmap *map) +{ + if (dev->flags & IFF_UP) /* can't act on a running interface */ + return -EBUSY; + + /* ignore other fields */ + return 0; +} + +/* + * Transmit a packet (called by the kernel) + */ +static int +kdp_net_tx(struct sk_buff *skb, struct net_device *dev) +{ + int len = 0; + unsigned ret; + struct kdp_dev *kdp = netdev_priv(dev); + struct rte_kdp_mbuf *pkt_kva = NULL; + struct rte_kdp_mbuf *pkt_va = NULL; + + dev->trans_start = jiffies; /* save the timestamp */ + + /* Check if the length of skb is less than mbuf size */ + if (skb->len > kdp->mbuf_size) + goto drop; + + /** + * Check if it has at least one free entry in tx_q and + * one entry in alloc_q. + */ + if (kdp_fifo_free_count(kdp->tx_q) == 0 || + kdp_fifo_count(kdp->alloc_q) == 0) { + /** + * If no free entry in tx_q or no entry in alloc_q, + * drops skb and goes out. 
+ */ + goto drop; + } + + /* dequeue a mbuf from alloc_q */ + ret = kdp_fifo_get(kdp->alloc_q, (void **)&pkt_va, 1); + if (likely(ret == 1)) { + void *data_kva; + + pkt_kva = (void *)pkt_va - kdp->mbuf_va + kdp->mbuf_kva; + data_kva = pkt_kva->buf_addr + pkt_kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + + len = skb->len; + memcpy(data_kva, skb->data, len); + if (unlikely(len < ETH_ZLEN)) { + memset(data_kva + len, 0, ETH_ZLEN - len); + len = ETH_ZLEN; + } + pkt_kva->pkt_len = len; + pkt_kva->data_len = len; + + /* enqueue mbuf into tx_q */ + ret = kdp_fifo_put(kdp->tx_q, (void **)&pkt_va, 1); + if (unlikely(ret != 1)) { + /* Failing should not happen */ + KDP_ERR("Fail to enqueue mbuf into tx_q\n"); + goto drop; + } + } else { + /* Failing should not happen */ + KDP_ERR("Fail to dequeue mbuf from alloc_q\n"); + goto drop; + } + + /* Free skb and update statistics */ + dev_kfree_skb(skb); + kdp->stats.tx_bytes += len; + kdp->stats.tx_packets++; + + return NETDEV_TX_OK; + +drop: + /* Free skb and update statistics */ + dev_kfree_skb(skb); + kdp->stats.tx_dropped++; + + return NETDEV_TX_OK; +} + +static int +kdp_net_change_mtu(struct net_device *dev, int new_mtu) +{ + KDP_DBG("kdp_net_change_mtu new mtu %d to be set\n", new_mtu); + + dev->mtu = new_mtu; + + return 0; +} + +/* + * Ioctl commands + */ +static int +kdp_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) +{ + KDP_DBG("kdp_net_ioctl %d\n", + ((struct kdp_dev *)netdev_priv(dev))->group_id); + + return 0; +} + +static void +kdp_net_set_rx_mode(struct net_device *dev) +{ +} + +/* + * Return statistics to the caller + */ +static struct net_device_stats * +kdp_net_stats(struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + return &kdp->stats; +} + +/* + * Deal with a transmit timeout. 
+ */ +static void +kdp_net_tx_timeout(struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + + KDP_DBG("Transmit timeout at %ld, latency %ld\n", jiffies, + jiffies - dev->trans_start); + + kdp->stats.tx_errors++; + netif_wake_queue(dev); +} + +/** + * kdp_net_set_mac - Change the Ethernet Address of the KDP NIC + * @netdev: network interface device structure + * @p: pointer to an address structure + * + * Returns 0 on success, negative on failure + **/ +static int kdp_net_set_mac(struct net_device *netdev, void *p) +{ + struct sockaddr *addr = p; + if (!is_valid_ether_addr((unsigned char *)(addr->sa_data))) + return -EADDRNOTAVAIL; + memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len); + return 0; +} + +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)) +static int kdp_net_change_carrier(struct net_device *dev, bool new_carrier) +{ + if (new_carrier) + netif_carrier_on(dev); + else + netif_carrier_off(dev); + return 0; +} +#endif + +static const struct net_device_ops kdp_net_netdev_ops = { + .ndo_open = kdp_net_open, + .ndo_stop = kdp_net_release, + .ndo_set_config = kdp_net_config, + .ndo_start_xmit = kdp_net_tx, + .ndo_change_mtu = kdp_net_change_mtu, + .ndo_do_ioctl = kdp_net_ioctl, + .ndo_set_rx_mode = kdp_net_set_rx_mode, + .ndo_get_stats = kdp_net_stats, + .ndo_tx_timeout = kdp_net_tx_timeout, + .ndo_set_mac_address = kdp_net_set_mac, +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)) + .ndo_change_carrier = kdp_net_change_carrier, +#endif +}; + +/* + * Fill the eth header + */ +static int +kdp_net_header(struct sk_buff *skb, struct net_device *dev, + unsigned short type, const void *daddr, + const void *saddr, unsigned int len) +{ + struct ethhdr *eth = (struct ethhdr *) skb_push(skb, ETH_HLEN); + + memcpy(eth->h_source, saddr ? saddr : dev->dev_addr, dev->addr_len); + memcpy(eth->h_dest, daddr ? 
daddr : dev->dev_addr, dev->addr_len); + eth->h_proto = htons(type); + + return dev->hard_header_len; +} + +/* + * Re-fill the eth header + */ +#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)) +static int +kdp_net_rebuild_header(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct ethhdr *eth = (struct ethhdr *) skb->data; + + memcpy(eth->h_source, dev->dev_addr, dev->addr_len); + memcpy(eth->h_dest, dev->dev_addr, dev->addr_len); + + return 0; +} +#endif /* < 4.1.0 */ + +static const struct header_ops kdp_net_header_ops = { + .create = kdp_net_header, +#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)) + .rebuild = kdp_net_rebuild_header, +#endif /* < 4.1.0 */ + .cache = NULL, /* disable caching */ +}; + +void +kdp_net_init(struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + + KDP_DBG("kdp_net_init\n"); + + init_waitqueue_head(&kdp->wq); + mutex_init(&kdp->sync_lock); + + ether_setup(dev); /* assign some of the fields */ + dev->netdev_ops = &kdp_net_netdev_ops; + dev->header_ops = &kdp_net_header_ops; + dev->watchdog_timeo = WD_TIMEOUT; +} + +/* + * RX: normal working mode + */ +static void +kdp_net_rx_normal(struct kdp_dev *kdp) +{ + unsigned ret; + uint32_t len; + unsigned i, num_rx, num_fq; + struct rte_kdp_mbuf *kva; + struct rte_kdp_mbuf *va[MBUF_BURST_SZ]; + void *data_kva; + unsigned mbuf_burst_size = MBUF_BURST_SZ; + + struct sk_buff *skb; + struct net_device *dev = kdp->net_dev; + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + if (num_fq == 0) { + /* No room on the free_q, bail out */ + return; + } + + /* Calculate the number of entries to dequeue from rx_q */ + num_rx = min(num_fq, mbuf_burst_size); + + /* Burst dequeue from rx_q */ + num_rx = kdp_fifo_get(kdp->rx_q, (void **)va, num_rx); + if (num_rx == 0) + return; + + /* Transfer received packets to netif */ + for (i = 0; i < num_rx; i++) { + kva = (void *)va[i] - kdp->mbuf_va + kdp->mbuf_kva; + len = kva->data_len; 
+ data_kva = kva->buf_addr + kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + + skb = dev_alloc_skb(len + 2); + if (!skb) { + KDP_ERR("Out of mem, dropping pkts\n"); + /* Update statistics */ + kdp->stats.rx_dropped++; + } else { + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data_kva, len); + skb->dev = dev; + skb->protocol = eth_type_trans(skb, dev); + skb->ip_summed = CHECKSUM_UNNECESSARY; + + /* Call netif interface */ + netif_rx(skb); + + /* Update statistics */ + kdp->stats.rx_bytes += len; + kdp->stats.rx_packets++; + } + } + + /* Burst enqueue mbufs into free_q */ + ret = kdp_fifo_put(kdp->free_q, (void **)va, num_rx); + if (ret != num_rx) + /* Failing should not happen */ + KDP_ERR("Fail to enqueue entries into free_q\n"); +} + +/* + * RX: loopback with enqueue/dequeue fifos. + */ +static void +kdp_net_rx_lo_fifo(struct kdp_dev *kdp) +{ + unsigned ret; + uint32_t len; + unsigned i, num, num_rq, num_tq, num_aq, num_fq; + struct rte_kdp_mbuf *kva; + struct rte_kdp_mbuf *va[MBUF_BURST_SZ]; + void *data_kva; + struct rte_kdp_mbuf *alloc_kva; + struct rte_kdp_mbuf *alloc_va[MBUF_BURST_SZ]; + void *alloc_data_kva; + unsigned mbuf_burst_size = MBUF_BURST_SZ; + + /* Get the number of entries in rx_q */ + num_rq = kdp_fifo_count(kdp->rx_q); + + /* Get the number of free entrie in tx_q */ + num_tq = kdp_fifo_free_count(kdp->tx_q); + + /* Get the number of entries in alloc_q */ + num_aq = kdp_fifo_count(kdp->alloc_q); + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + + /* Calculate the number of entries to be dequeued from rx_q */ + num = min(num_rq, num_tq); + num = min(num, num_aq); + num = min(num, num_fq); + num = min(num, mbuf_burst_size); + + /* Return if no entry to dequeue from rx_q */ + if (num == 0) + return; + + /* Burst dequeue from rx_q */ + ret = kdp_fifo_get(kdp->rx_q, (void **)va, num); + if (ret == 0) + return; /* Failing should not happen */ + + /* Dequeue entries 
from alloc_q */ + ret = kdp_fifo_get(kdp->alloc_q, (void **)alloc_va, num); + if (ret) { + num = ret; + /* Copy mbufs */ + for (i = 0; i < num; i++) { + kva = (void *)va[i] - kdp->mbuf_va + kdp->mbuf_kva; + len = kva->pkt_len; + data_kva = kva->buf_addr + kva->data_off - + kdp->mbuf_va + kdp->mbuf_kva; + + alloc_kva = (void *)alloc_va[i] - kdp->mbuf_va + + kdp->mbuf_kva; + alloc_data_kva = alloc_kva->buf_addr + + alloc_kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + memcpy(alloc_data_kva, data_kva, len); + alloc_kva->pkt_len = len; + alloc_kva->data_len = len; + + kdp->stats.tx_bytes += len; + kdp->stats.rx_bytes += len; + } + + /* Burst enqueue mbufs into tx_q */ + ret = kdp_fifo_put(kdp->tx_q, (void **)alloc_va, num); + if (ret != num) + /* Failing should not happen */ + KDP_ERR("Fail to enqueue mbufs into tx_q\n"); + } + + /* Burst enqueue mbufs into free_q */ + ret = kdp_fifo_put(kdp->free_q, (void **)va, num); + if (ret != num) + /* Failing should not happen */ + KDP_ERR("Fail to enqueue mbufs into free_q\n"); + + /** + * Update statistic, and enqueue/dequeue failure is impossible, + * as all queues are checked at first. + */ + kdp->stats.tx_packets += num; + kdp->stats.rx_packets += num; +} + +/* + * RX: loopback with enqueue/dequeue fifos and sk buffer copies. 
+ */ +static void +kdp_net_rx_lo_fifo_skb(struct kdp_dev *kdp) +{ + unsigned ret; + uint32_t len; + unsigned i, num_rq, num_fq, num; + struct rte_kdp_mbuf *kva; + struct rte_kdp_mbuf *va[MBUF_BURST_SZ]; + void *data_kva; + struct sk_buff *skb; + struct net_device *dev = kdp->net_dev; + unsigned mbuf_burst_size = MBUF_BURST_SZ; + + /* Get the number of entries in rx_q */ + num_rq = kdp_fifo_count(kdp->rx_q); + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + + /* Calculate the number of entries to dequeue from rx_q */ + num = min(num_rq, num_fq); + num = min(num, mbuf_burst_size); + + /* Return if no entry to dequeue from rx_q */ + if (num == 0) + return; + + /* Burst dequeue mbufs from rx_q */ + ret = kdp_fifo_get(kdp->rx_q, (void **)va, num); + if (ret == 0) + return; + + /* Copy mbufs to sk buffer and then call tx interface */ + for (i = 0; i < num; i++) { + kva = (void *)va[i] - kdp->mbuf_va + kdp->mbuf_kva; + len = kva->data_len; + data_kva = kva->buf_addr + kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + + skb = dev_alloc_skb(len + 2); + if (skb == NULL) + KDP_ERR("Out of mem, dropping pkts\n"); + else { + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data_kva, len); + skb->dev = dev; + skb->ip_summed = CHECKSUM_UNNECESSARY; + dev_kfree_skb(skb); + } + + /* Simulate real usage, allocate/copy skb twice */ + skb = dev_alloc_skb(len + 2); + if (skb == NULL) { + KDP_ERR("Out of mem, dropping pkts\n"); + kdp->stats.rx_dropped++; + } else { + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data_kva, len); + skb->dev = dev; + skb->ip_summed = CHECKSUM_UNNECESSARY; + + kdp->stats.rx_bytes += len; + kdp->stats.rx_packets++; + + /* call tx interface */ + kdp_net_tx(skb, dev); + } + } + + /* enqueue all the mbufs from rx_q into free_q */ + ret = kdp_fifo_put(kdp->free_q, (void **)&va, num); + if (ret != num) + /* Failing should not happen */ + 
KDP_ERR("Fail to enqueue mbufs into free_q\n"); +} + +/* kdp rx function pointer, with default to normal rx */ +static kdp_net_rx_t kdp_net_rx_func = kdp_net_rx_normal; + +void +kdp_net_config_lo_mode(char *lo_str) +{ + if (!lo_str) { + KDP_PRINT("loopback disabled"); + return; + } + + if (!strcmp(lo_str, "lo_mode_none")) + KDP_PRINT("loopback disabled"); + else if (!strcmp(lo_str, "lo_mode_fifo")) { + KDP_PRINT("loopback mode=lo_mode_fifo enabled"); + kdp_net_rx_func = kdp_net_rx_lo_fifo; + } else if (!strcmp(lo_str, "lo_mode_fifo_skb")) { + KDP_PRINT("loopback mode=lo_mode_fifo_skb enabled"); + kdp_net_rx_func = kdp_net_rx_lo_fifo_skb; + } else + KDP_PRINT("Incognizant parameter, loopback disabled"); +} + +/* rx interface */ +void +kdp_net_rx(struct kdp_dev *kdp) +{ + /** + * It doesn't need to check if it is NULL pointer, + * as it has a default value + */ + (*kdp_net_rx_func)(kdp); +} -- 2.5.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
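The RX and TX paths in kdp_net.c above repeatedly convert a userspace mbuf
pointer into a kernel pointer with "va - kdp->mbuf_va + kdp->mbuf_kva". A
minimal userspace sketch of that offset arithmetic follows. It assumes the
mbuf pool is one contiguous region mapped at two different base addresses;
the struct and function names (demo_pool_map, demo_va_to_kva,
demo_data_kva) are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two base addresses kept in struct kdp_dev. */
struct demo_pool_map {
	uintptr_t mbuf_va;	/* pool base as seen by the userspace application */
	uintptr_t mbuf_kva;	/* the same pool base as mapped in the kernel */
};

/* Translate a userspace address inside the pool to its kernel-side alias. */
static uintptr_t
demo_va_to_kva(const struct demo_pool_map *m, uintptr_t va)
{
	/* The translation only makes sense for addresses inside the pool. */
	assert(va >= m->mbuf_va);
	return va - m->mbuf_va + m->mbuf_kva;
}

/*
 * Translate the start of packet data. buf_addr is a userspace address
 * stored in the mbuf, so the data offset is added before translating,
 * mirroring the data_kva computation in the patch.
 */
static uintptr_t
demo_data_kva(const struct demo_pool_map *m, uintptr_t buf_addr,
	      uint16_t data_off)
{
	return demo_va_to_kva(m, buf_addr + data_off);
}
```

Only the distance from the pool base is preserved by this translation, so
it holds as long as both mappings cover the same contiguous memory.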
* Re: [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module
  2016-01-27 16:32 ` [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module Ferruh Yigit
@ 2016-02-08 17:14   ` Reshma Pattan
  2016-02-09 10:53     ` Ferruh Yigit
  0 siblings, 1 reply; 29+ messages in thread
From: Reshma Pattan @ 2016-02-08 17:14 UTC
To: Ferruh Yigit, dev

On 1/27/2016 4:32 PM, Ferruh Yigit wrote:
> This kernel module is based on KNI module, but this one is stripped
> version of it and only for data messages, no control functionality
> provided.
>
> FIFO implementation of the KNI is kept exact same, but ethtool related
> code removed and virtual network management related code simplified.
>
> This module contains kernel support to create network devices and
> this module has a simple driver for virtual network device, the driver
> simply puts/gets packets to/from FIFO instead of real hardware.
>
> FIFO is created owned by userspace application, which is for this case
> KDP PMD.
>
> In long term this patch intends to replace the KNI and KNI will be
> deprecated.
>
> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
> ---
>
>
> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
> new file mode 100644
> index 0000000..0c77f58
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
>
> +/**
> + * KDP name is part of memzone name.
> + */
> +#define RTE_KDP_NAMESIZE 32
> +
> +#ifndef RTE_CACHE_LINE_SIZE
> +#define RTE_CACHE_LINE_SIZE 64 /**< Cache line size. */
> +#endif

Jerin Jacob has a patch that cleans up the RTE_CACHE_LINE_SIZE macro and
puts CONFIG_RTE_CACHE_LINE_SIZE in the config file. You may need to remove
this once those changes are available in the code.

> +
> +/*
> + * The kernel image of the rte_mbuf struct, with only the relevant fields.
> + * Padding is necessary to assure the offsets of these fields
> + */
> +struct rte_kdp_mbuf {
> +	void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> +	char pad0[10];
> +
> +	/**< Start address of data in segment buffer. */
> +	uint16_t data_off;
> +	char pad1[4];
> +	uint64_t ol_flags; /**< Offload features. */

ol_flags is not used anywhere further down in the code. Should it be
removed?

> +	char pad2[4];
> +
> +	/**< Total pkt len: sum of all segment data_len. */
> +	uint32_t pkt_len;
> +
> +	/**< Amount of data in segment buffer. */
> +	uint16_t data_len;
> +
> +	/* fields on second cache line */
> +	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
> +	void *pool;
> +	void *next;
> +};
> +

Should all structures have "__rte_cache_aligned" in their declarations,
like other DPDK structs?

> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_dev.h b/lib/librte_eal/linuxapp/kdp/kdp_dev.h
> new file mode 100644
> index 0000000..52952b4
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/kdp/kdp_dev.h
>
> +
> +#define KDP_ERR(args...) printk(KERN_DEBUG "KDP: Error: " args)
> +#define KDP_PRINT(args...) printk(KERN_DEBUG "KDP: " args)
> +
> +#ifdef RTE_KDP_KO_DEBUG
> +#define KDP_DBG(args...) printk(KERN_DEBUG "KDP: " args)

Would it be good to have KERN_DEBUG "KDP: Debug: " here, like the prefix
used for errors?

> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_fifo.h b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h
> new file mode 100644
> index 0000000..a5fe080
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h
>
> +/**
> + * Adds num elements into the fifo. Return the number actually written
> + */
> +static inline unsigned
> +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num)
> +{
> +	unsigned i = 0;
> +	unsigned fifo_write = fifo->write;
> +	unsigned fifo_read = fifo->read;
> +	unsigned new_write = fifo_write;
> +
> +	for (i = 0; i < num; i++) {
> +		new_write = (new_write + 1) & (fifo->len - 1);
> +
> +		if (new_write == fifo_read)
> +			break;
> +		fifo->buffer[fifo_write] = data[i];
> +		fifo_write = new_write;
> +	}
> +	fifo->write = fifo_write;
> +
> +	return i;
> +}

You can add a comment header for all function declarations inside the
header file, in the format below. Same for the other header files and
functions.

 *@Description
 *@params
 *@Return value

> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_misc.c b/lib/librte_eal/linuxapp/kdp/kdp_misc.c
> new file mode 100644
> index 0000000..d97d1c0
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/kdp/kdp_misc.c
> +static int
> +kdp_compat_ioctl(struct inode *inode, unsigned int ioctl_num,
> +		unsigned long ioctl_param)
> +{
> +	/* 32 bits app on 64 bits OS to be supported later */
> +	KDP_PRINT("Not implemented.\n");

Should this be a warning/ERR instead of PRINT?

> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_net.c b/lib/librte_eal/linuxapp/kdp/kdp_net.c
> new file mode 100644
> index 0000000..5c669f5
> --- /dev/null
> +++ b/lib/librte_eal/linuxapp/kdp/kdp_net.c
> +
> +static void
> +kdp_net_set_rx_mode(struct net_device *dev)
> +{
> +}

Empty function body?

Thanks,
Reshma
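The kdp_fifo_put() quoted in the review above relies on the FIFO length
being a power of two, so that "& (len - 1)" wraps the index, and it always
leaves one slot unused so that a full ring can be told apart from an empty
one. A minimal userspace sketch of that behaviour follows. The put routine
mirrors the quoted code; the get and count helpers are assumed
counterparts written for illustration (the module's kdp_fifo_get(),
kdp_fifo_count() and kdp_fifo_free_count() are not quoted here), and all
demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_LEN 8	/* must be a power of two */

struct demo_fifo {
	unsigned write;		/* next slot to write */
	unsigned read;		/* next slot to read */
	unsigned len;		/* number of slots, power of two */
	void *buffer[DEMO_FIFO_LEN];
};

static void
demo_fifo_init(struct demo_fifo *f)
{
	/* The index mask below only works for power-of-two lengths. */
	assert((DEMO_FIFO_LEN & (DEMO_FIFO_LEN - 1)) == 0);
	f->write = 0;
	f->read = 0;
	f->len = DEMO_FIFO_LEN;
}

/* Same logic as the quoted kdp_fifo_put(): returns the number written. */
static unsigned
demo_fifo_put(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned fifo_write = f->write;
	unsigned fifo_read = f->read;
	unsigned new_write = fifo_write;

	for (i = 0; i < num; i++) {
		new_write = (new_write + 1) & (f->len - 1);
		if (new_write == fifo_read)
			break;	/* full: one slot is always kept free */
		f->buffer[fifo_write] = data[i];
		fifo_write = new_write;
	}
	f->write = fifo_write;

	return i;
}

/* Assumed counterpart of put: returns the number of elements read. */
static unsigned
demo_fifo_get(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned new_read = f->read;
	unsigned fifo_write = f->write;

	for (i = 0; i < num; i++) {
		if (new_read == fifo_write)
			break;	/* empty */
		data[i] = f->buffer[new_read];
		new_read = (new_read + 1) & (f->len - 1);
	}
	f->read = new_read;

	return i;
}

/* Number of elements currently stored. */
static unsigned
demo_fifo_count(const struct demo_fifo *f)
{
	return (f->len + f->write - f->read) & (f->len - 1);
}

/* Number of elements that can still be stored (at most len - 1). */
static unsigned
demo_fifo_free_count(const struct demo_fifo *f)
{
	return (f->read - f->write - 1) & (f->len - 1);
}
```

With 8 slots, a put of 10 pointers into an empty ring accepts only 7,
which is why callers such as kdp_net_tx() check the free count before
enqueueing.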
* Re: [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module
  2016-02-08 17:14   ` Reshma Pattan
@ 2016-02-09 10:53     ` Ferruh Yigit
  0 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2016-02-09 10:53 UTC
To: Reshma Pattan; +Cc: dev

On Mon, Feb 08, 2016 at 05:14:54PM +0000, Reshma Pattan wrote:

Hi Reshma,

>
>
> On 1/27/2016 4:32 PM, Ferruh Yigit wrote:
>> This kernel module is based on KNI module, but this one is stripped
>> version of it and only for data messages, no control functionality
>> provided.
>>
>> FIFO implementation of the KNI is kept exact same, but ethtool related
>> code removed and virtual network management related code simplified.
>>
>> This module contains kernel support to create network devices and
>> this module has a simple driver for virtual network device, the driver
>> simply puts/gets packets to/from FIFO instead of real hardware.
>>
>> FIFO is created owned by userspace application, which is for this case
>> KDP PMD.
>>
>> In long term this patch intends to replace the KNI and KNI will be
>> deprecated.
>>
>> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
>> ---
>>
>> diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
>> new file mode 100644
>> index 0000000..0c77f58
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
>>
>> +/**
>> + * KDP name is part of memzone name.
>> + */
>> +#define RTE_KDP_NAMESIZE 32
>> +
>> +#ifndef RTE_CACHE_LINE_SIZE
>> +#define RTE_CACHE_LINE_SIZE 64 /**< Cache line size. */
>> +#endif
>
> Jerin Jacob has a patch that cleans up the RTE_CACHE_LINE_SIZE macro and
> puts CONFIG_RTE_CACHE_LINE_SIZE in the config file. You may need to remove
> this once those changes are available in the code.
>

Thanks, when that patch is applied, I can rebase the code.

>> +
>> +/*
>> + * The kernel image of the rte_mbuf struct, with only the relevant fields.
>> + * Padding is necessary to assure the offsets of these fields
>> + */
>> +struct rte_kdp_mbuf {
>> +	void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
>> +	char pad0[10];
>> +
>> +	/**< Start address of data in segment buffer. */
>> +	uint16_t data_off;
>> +	char pad1[4];
>> +	uint64_t ol_flags; /**< Offload features. */
>
> ol_flags is not used anywhere further down in the code. Should it be
> removed?
>

Can't remove it, this struct has to match rte_mbuf.

>> +	char pad2[4];
>> +
>> +	/**< Total pkt len: sum of all segment data_len. */
>> +	uint32_t pkt_len;
>> +
>> +	/**< Amount of data in segment buffer. */
>> +	uint16_t data_len;
>> +
>> +	/* fields on second cache line */
>> +	char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE)));
>> +	void *pool;
>> +	void *next;
>> +};
>> +
>
> Should all structures have "__rte_cache_aligned" in their declarations,
> like other DPDK structs?
>

This is a kernel module. It doesn't know about userspace library macros.

>
>> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_dev.h b/lib/librte_eal/linuxapp/kdp/kdp_dev.h
>> new file mode 100644
>> index 0000000..52952b4
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/kdp/kdp_dev.h
>>
>> +
>> +#define KDP_ERR(args...) printk(KERN_DEBUG "KDP: Error: " args)
>> +#define KDP_PRINT(args...) printk(KERN_DEBUG "KDP: " args)
>> +
>> +#ifdef RTE_KDP_KO_DEBUG
>> +#define KDP_DBG(args...) printk(KERN_DEBUG "KDP: " args)
>
> Would it be good to have KERN_DEBUG "KDP: Debug: " here, like the prefix
> used for errors?
>

I think an extra "Debug" prefix is not required here.

>
>> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_fifo.h b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h
>> new file mode 100644
>> index 0000000..a5fe080
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h
>>
>> +/**
>> + * Adds num elements into the fifo. Return the number actually written
>> + */
>> +static inline unsigned
>> +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num)
>> +{
>> +	unsigned i = 0;
>> +	unsigned fifo_write = fifo->write;
>> +	unsigned fifo_read = fifo->read;
>> +	unsigned new_write = fifo_write;
>> +
>> +	for (i = 0; i < num; i++) {
>> +		new_write = (new_write + 1) & (fifo->len - 1);
>> +
>> +		if (new_write == fifo_read)
>> +			break;
>> +		fifo->buffer[fifo_write] = data[i];
>> +		fifo_write = new_write;
>> +	}
>> +	fifo->write = fifo_write;
>> +
>> +	return i;
>> +}
>
> You can add a comment header for all function declarations inside the
> header file, in the format below. Same for the other header files and
> functions.
>
> *@Description
>
> *@params
>
> *@Return value
>

This is a private header.

>
>> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_misc.c b/lib/librte_eal/linuxapp/kdp/kdp_misc.c
>> new file mode 100644
>> index 0000000..d97d1c0
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/kdp/kdp_misc.c
>> +static int
>> +kdp_compat_ioctl(struct inode *inode, unsigned int ioctl_num,
>> +		unsigned long ioctl_param)
>> +{
>> +	/* 32 bits app on 64 bits OS to be supported later */
>> +	KDP_PRINT("Not implemented.\n");
>
> Should this be a warning/ERR instead of PRINT?
>
>> diff --git a/lib/librte_eal/linuxapp/kdp/kdp_net.c b/lib/librte_eal/linuxapp/kdp/kdp_net.c
>> new file mode 100644
>> index 0000000..5c669f5
>> --- /dev/null
>> +++ b/lib/librte_eal/linuxapp/kdp/kdp_net.c
>> +
>> +static void
>> +kdp_net_set_rx_mode(struct net_device *dev)
>> +{
>> +}
>
> Empty function body?
>

Yes, this is part of net_device_ops, and is required to fake multicast
support.

Regards,
ferruh
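The reply above says ol_flags cannot be dropped because struct
rte_kdp_mbuf has to match rte_mbuf. The point is that every field after a
removed member would shift. A minimal sketch of how the offsets could be
checked in userspace follows, assuming a GCC or Clang compiler, 64-bit
pointers and a 64-byte cache line. The struct copies the fields quoted in
the thread; the demo_* names and the expected offsets (worked out by hand
under those assumptions) are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64	/* assumed cache line size */

/* Copy of the field layout quoted in the thread. */
struct demo_kdp_mbuf {
	void *buf_addr __attribute__((__aligned__(DEMO_CACHE_LINE_SIZE)));
	char pad0[10];
	uint16_t data_off;
	char pad1[4];
	uint64_t ol_flags;
	char pad2[4];
	uint32_t pkt_len;
	uint16_t data_len;
	char pad3[8] __attribute__((__aligned__(DEMO_CACHE_LINE_SIZE)));
	void *pool;
	void *next;
};

/* Same struct with ol_flags removed, to show what would shift. */
struct demo_kdp_mbuf_no_flags {
	void *buf_addr __attribute__((__aligned__(DEMO_CACHE_LINE_SIZE)));
	char pad0[10];
	uint16_t data_off;
	char pad1[4];
	char pad2[4];
	uint32_t pkt_len;
	uint16_t data_len;
	char pad3[8] __attribute__((__aligned__(DEMO_CACHE_LINE_SIZE)));
	void *pool;
	void *next;
};

/* The struct spans exactly two cache lines regardless of pointer size. */
static_assert(sizeof(struct demo_kdp_mbuf) == 2 * DEMO_CACHE_LINE_SIZE,
	      "demo_kdp_mbuf should occupy two cache lines");

/* Returns 1 when the offsets match the values expected for 64-bit builds. */
static int
demo_layout_ok(void)
{
	return offsetof(struct demo_kdp_mbuf, data_off) == 18 &&
	       offsetof(struct demo_kdp_mbuf, ol_flags) == 24 &&
	       offsetof(struct demo_kdp_mbuf, pkt_len) == 36 &&
	       offsetof(struct demo_kdp_mbuf, data_len) == 40 &&
	       offsetof(struct demo_kdp_mbuf, pad3) == 64;
}

/* Offset of pkt_len once ol_flags is gone. */
static size_t
demo_pkt_len_offset_without_flags(void)
{
	return offsetof(struct demo_kdp_mbuf_no_flags, pkt_len);
}
```

Under these assumptions, removing ol_flags moves pkt_len from offset 36 to
28, so the kernel side would read the wrong bytes of the userspace mbuf.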
* [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication
  2016-01-27 16:32 [dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
  2016-01-27 16:32 ` [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module Ferruh Yigit
@ 2016-01-27 16:32 ` Ferruh Yigit
  2016-01-28  8:16   ` Xu, Qian Q
  2016-02-09 17:33   ` Reshma Pattan
  2016-02-19  5:05 ` [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
  2 siblings, 2 replies; 29+ messages in thread
From: Ferruh Yigit @ 2016-01-27 16:32 UTC
To: dev

This patch provides slow data path communication to the Linux kernel.
The patch is based on librte_kni and heavily re-uses it.

The main difference is that the librte_kni library is converted into a
PMD, to provide ease of use for applications.

Now any application can use slow path communication without any change
to the application, because of the existing EAL support for virtual PMDs.

This PMD also supports two methods to send packets to Linux: the first
is a custom FIFO implementation with the help of the KDP kernel module,
the second is the Linux in-kernel tun/tap support. The PMD first checks
for the KDP kernel module; if that fails, it tries to create and use a
tap interface.

With the FIFO method: the PMD's rx_pkt_burst() gets packets from the
FIFO, and tx_pkt_burst() puts packets into the FIFO.
The corresponding Linux virtual network device driver code also
gets/puts packets from/to the FIFO as if they were coming from hardware.

With the tun/tap method: no external kernel module is required, the PMD
reads packets from and writes packets to the tap interface file
descriptor. The tap interface has a performance penalty compared to the
FIFO implementation.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> --- config/common_linuxapp | 1 + doc/guides/nics/pcap_ring.rst | 125 ++++++++- doc/guides/rel_notes/release_2_3.rst | 6 + drivers/net/Makefile | 3 +- drivers/net/kdp/Makefile | 61 ++++ drivers/net/kdp/rte_eth_kdp.c | 481 ++++++++++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.c | 365 ++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.h | 126 +++++++++ drivers/net/kdp/rte_kdp_fifo.h | 91 ++++++ drivers/net/kdp/rte_kdp_tap.c | 96 +++++++ drivers/net/kdp/rte_pmd_kdp_version.map | 4 + lib/librte_eal/common/include/rte_log.h | 3 +- mk/rte.app.mk | 3 +- 13 files changed, 1359 insertions(+), 6 deletions(-) create mode 100644 drivers/net/kdp/Makefile create mode 100644 drivers/net/kdp/rte_eth_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.h create mode 100644 drivers/net/kdp/rte_kdp_fifo.h create mode 100644 drivers/net/kdp/rte_kdp_tap.c create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map diff --git a/config/common_linuxapp b/config/common_linuxapp index 73c91d8..b9dec0c 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -322,6 +322,7 @@ CONFIG_RTE_LIBRTE_PMD_NULL=y # # Compile KDP PMD # +CONFIG_RTE_LIBRTE_PMD_KDP=y CONFIG_RTE_KDP_KMOD=y CONFIG_RTE_KDP_PREEMPT_DEFAULT=y diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst index 46aa3ac..78b7b61 100644 --- a/doc/guides/nics/pcap_ring.rst +++ b/doc/guides/nics/pcap_ring.rst @@ -28,11 +28,11 @@ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -Libpcap and Ring Based Poll Mode Drivers -======================================== +Software Poll Mode Drivers +========================== In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware, -the DPDK also includes two pure-software PMDs. These two drivers are: +the DPDK also includes pure-software PMDs. 
These drivers are: * A libpcap -based PMD (librte_pmd_pcap) that reads and writes packets using libpcap, - both from files on disk, as well as from physical NIC devices using standard Linux kernel drivers. @@ -40,6 +40,10 @@ the DPDK also includes two pure-software PMDs. These two drivers are: * A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs (that is, rte_ring) to be accessed using the PMD APIs, as though they were physical NICs. +* A slow data path PMD (librte_pmd_kdp) that allows send/get packets to/from OS network + stack as it is a physical NIC. + + .. note:: The libpcap -based PMD is disabled by default in the build configuration files, @@ -211,6 +215,121 @@ Multiple devices may be specified, separated by commas. Done. +Kernel Data Path PMD +~~~~~~~~~~~~~~~~~~~~ + +Kernel Data Path (KDP) PMD is to communicate with OS network stack easily by application. + +.. code-block:: console + + ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 -- -i + ... + Configuring Port 0 (socket 0) + Port 0: 00:00:00:00:00:00 + Configuring Port 1 (socket 0) + Port 1: 00:00:00:00:00:00 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + +KDP PMD supports two type of communication: + +* Custom FIFO implementation +* tun/tap implementation + +Custom FIFO implementation gives more performance but requires KDP kernel module (rte_kdp.ko) inserted. + +By default FIFO communication has priority, if KDP kernel module is not inserted, tun/tap communication used. + +If KDP kernel module inserted, above testpmd command will create following virtual interfaces, these can be used as any interface. + +.. 
code-block:: console + + # ifconfig kdp0; ifconfig kdp1 + kdp0: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + kdp1: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + +With the tun/tap communication method, the following interfaces are created: + +.. code-block:: console + + # ifconfig tap_kdp0; ifconfig tap_kdp1 + tap_kdp0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 + inet6 fe80::341f:afff:feb7:23db prefixlen 64 scopeid 0x20<link> + ether 36:1f:af:b7:23:db txqueuelen 500 (Ethernet) + RX packets 126624864 bytes 6184828655 (5.7 GiB) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 126236898 bytes 6150306636 (5.7 GiB) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + tap_kdp1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 + inet6 fe80::f030:b4ff:fe94:b720 prefixlen 64 scopeid 0x20<link> + ether f2:30:b4:94:b7:20 txqueuelen 500 (Ethernet) + RX packets 126237370 bytes 6150329717 (5.7 GiB) + RX errors 0 dropped 9 overruns 0 frame 0 + TX packets 126624896 bytes 6184826874 (5.7 GiB) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + +A DPDK application can be used to forward packets between these interfaces: + +..
code-block:: console + + In Linux: + ip l add br0 type bridge + ip l set tap_kdp0 master br0 + ip l set tap_kdp1 master br0 + ip l set br0 up + ip l set tap_kdp0 up + ip l set tap_kdp1 up + + + In testpmd: + testpmd> start + io packet forwarding - CRC stripping disabled - packets/burst=32 + nb forwarding cores=1 - nb forwarding ports=2 + RX queues=1 - RX desc=128 - RX free threshold=0 + RX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX queues=1 - TX desc=512 - TX free threshold=0 + TX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX RS bit threshold=0 - TXQ flags=0x0 + testpmd> stop + Telling cores to stop... + Waiting for lcores to finish... + + ---------------------- Forward statistics for port 0 ---------------------- + RX-packets: 973900 RX-dropped: 0 RX-total: 973900 + TX-packets: 973903 TX-dropped: 0 TX-total: 973903 + ---------------------------------------------------------------------------- + + ---------------------- Forward statistics for port 1 ---------------------- + RX-packets: 973903 RX-dropped: 0 RX-total: 973903 + TX-packets: 973900 TX-dropped: 0 TX-total: 973900 + ---------------------------------------------------------------------------- + + +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++ + RX-packets: 1947803 RX-dropped: 0 RX-total: 1947803 + TX-packets: 1947803 TX-dropped: 0 TX-total: 1947803 + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + + Done. + + + + + Using the Poll Mode Driver from an Application ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst index 99de186..faf6a17 100644 --- a/doc/guides/rel_notes/release_2_3.rst +++ b/doc/guides/rel_notes/release_2_3.rst @@ -4,6 +4,12 @@ DPDK Release 2.3 New Features ------------ +* **Added Slow Data Path support.** + + * This is based on KNI work and in long term intends to replace it. + * Added Kernel Data Path (KDP) kernel module. 
+ * Added KDP virtual PMD. + Resolved Issues --------------- diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 6e4497e..0be06f5 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -51,6 +51,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt +DIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += kdp include $(RTE_SDK)/mk/rte.sharelib.mk include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/net/kdp/Makefile b/drivers/net/kdp/Makefile new file mode 100644 index 0000000..035056e --- /dev/null +++ b/drivers/net/kdp/Makefile @@ -0,0 +1,61 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_kdp.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +EXPORT_MAP := rte_pmd_kdp_version.map + +LIBABIVER := 1 + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_eth_kdp.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp_tap.c + +# +# Export include files +# +SYMLINK-y-include += + +# this lib depends upon: +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_mbuf +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_ether + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/kdp/rte_eth_kdp.c b/drivers/net/kdp/rte_eth_kdp.c new file mode 100644 index 0000000..ac650d7 --- /dev/null +++ b/drivers/net/kdp/rte_eth_kdp.c @@ -0,0 +1,481 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. 
+ * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include <rte_ethdev.h> +#include <rte_dev.h> +#include <rte_kvargs.h> + +#include "rte_kdp.h" + +#define MAX_PACKET_SZ 2048 + +struct kdp_queue { + struct pmd_internals *internals; + struct rte_mempool *mb_pool; + + uint64_t rx_pkts; + uint64_t rx_bytes; + uint64_t rx_err_pkts; + uint64_t tx_pkts; + uint64_t tx_bytes; + uint64_t tx_err_pkts; +}; + +struct pmd_internals { + struct rte_kdp *kdp; + struct rte_kdp_tap *kdp_tap; + + struct kdp_queue rx_queues[RTE_MAX_QUEUES_PER_PORT]; + struct kdp_queue tx_queues[RTE_MAX_QUEUES_PER_PORT]; +}; + +static struct ether_addr eth_addr = { .addr_bytes = {0} }; +static const char *drivername = "KDP PMD"; +static struct rte_eth_link pmd_link = { + .link_speed = 10000, + .link_duplex = ETH_LINK_FULL_DUPLEX, + .link_status = 0 +}; + +static uint16_t +eth_kdp_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct kdp_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + uint16_t nb_pkts; + + nb_pkts = rte_kdp_rx_burst(internals->kdp, bufs, nb_bufs); + + kdp_q->rx_pkts += nb_pkts; + kdp_q->rx_err_pkts += nb_bufs - nb_pkts; + + return nb_pkts; +} + +static uint16_t +eth_kdp_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct kdp_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + uint16_t nb_pkts; + + nb_pkts = rte_kdp_tx_burst(internals->kdp, bufs, nb_bufs); + + kdp_q->tx_pkts += nb_pkts; + kdp_q->tx_err_pkts += nb_bufs - nb_pkts; + + return nb_pkts; +} + +static uint16_t +eth_kdp_tap_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct kdp_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + struct rte_kdp_tap *kdp_tap = internals->kdp_tap; + struct rte_mbuf *m; + int ret; + unsigned i; + + for (i = 0; i < nb_bufs; i++) { + m = rte_pktmbuf_alloc(kdp_q->mb_pool); + bufs[i] = m; + ret = read(kdp_tap->tap_fd, rte_pktmbuf_mtod(m, void *), + MAX_PACKET_SZ); + if (ret < 0) { + rte_pktmbuf_free(m); + break; + } + + m->nb_segs = 1; 
+ m->next = NULL; + m->pkt_len = (uint16_t)ret; + m->data_len = (uint16_t)ret; + } + + kdp_q->rx_pkts += i; + kdp_q->rx_err_pkts += nb_bufs - i; + + return i; +} + +static uint16_t +eth_kdp_tap_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct kdp_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + struct rte_kdp_tap *kdp_tap = internals->kdp_tap; + struct rte_mbuf *m; + unsigned i; + + for (i = 0; i < nb_bufs; i++) { + m = bufs[i]; + write(kdp_tap->tap_fd, rte_pktmbuf_mtod(m, void*), + rte_pktmbuf_data_len(m)); + rte_pktmbuf_free(m); + } + + kdp_q->tx_pkts += i; + kdp_q->tx_err_pkts += nb_bufs - i; + + return i; +} + +static int +kdp_start(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct rte_kdp_conf conf; + uint16_t port_id = dev->data->port_id; + int ret = 0; + + if (internals->kdp) { + snprintf(conf.name, RTE_KDP_NAMESIZE, "kdp%u", port_id); + conf.force_bind = 0; + conf.group_id = port_id; + conf.mbuf_size = MAX_PACKET_SZ; + + ret = rte_kdp_start(internals->kdp, + internals->rx_queues[0].mb_pool, + &conf); + if (ret) + RTE_LOG(ERR, KDP, "Fail to create kdp for port: %d\n", + port_id); + } + + return ret; +} + +static int +eth_dev_start(struct rte_eth_dev *dev) +{ + int ret; + + ret = kdp_start(dev); + if (ret) + return -1; + + dev->data->dev_link.link_status = 1; + return 0; +} + +static void +eth_dev_stop(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + rte_kdp_release(internals->kdp); + dev->data->dev_link.link_status = 0; +} + +static void +eth_dev_close(struct rte_eth_dev *dev __rte_unused) +{ + rte_kdp_close(); +} + +static int +eth_dev_configure(struct rte_eth_dev *dev __rte_unused) +{ + return 0; +} + +static void +eth_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) +{ + struct rte_eth_dev_data *data = dev->data; + + dev_info->driver_name = data->drv_name; + dev_info->max_mac_addrs = 1; + 
dev_info->max_rx_pktlen = (uint32_t)-1; + dev_info->max_rx_queues = data->nb_rx_queues; + dev_info->max_tx_queues = data->nb_tx_queues; + dev_info->min_rx_bufsize = 0; + dev_info->pci_dev = NULL; +} + +static int +eth_rx_queue_setup(struct rte_eth_dev *dev, + uint16_t rx_queue_id __rte_unused, + uint16_t nb_rx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_rxconf *rx_conf __rte_unused, + struct rte_mempool *mb_pool) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct kdp_queue *q; + + q = &internals->rx_queues[rx_queue_id]; + q->internals = internals; + q->mb_pool = mb_pool; + + dev->data->rx_queues[rx_queue_id] = q; + + return 0; +} + +static int +eth_tx_queue_setup(struct rte_eth_dev *dev, + uint16_t tx_queue_id, + uint16_t nb_tx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_txconf *tx_conf __rte_unused) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct kdp_queue *q; + + q = &internals->tx_queues[tx_queue_id]; + q->internals = internals; + + dev->data->tx_queues[tx_queue_id] = q; + + return 0; +} + +static void +eth_queue_release(void *q __rte_unused) +{ +} + +static int +eth_link_update(struct rte_eth_dev *dev __rte_unused, + int wait_to_complete __rte_unused) +{ + return 0; +} + +static void +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + unsigned i, num_stats; + unsigned long rx_packets_total = 0, rx_bytes_total = 0; + unsigned long tx_packets_total = 0, tx_bytes_total = 0; + unsigned long tx_packets_err_total = 0; + struct rte_eth_dev_data *data = dev->data; + struct kdp_queue *q; + + num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS, + data->nb_rx_queues); + for (i = 0; i < num_stats; i++) { + q = data->rx_queues[i]; + stats->q_ipackets[i] = q->rx_pkts; + stats->q_ibytes[i] = q->rx_bytes; + rx_packets_total += stats->q_ipackets[i]; + rx_bytes_total += stats->q_ibytes[i]; + } + + num_stats = 
RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS, + data->nb_tx_queues); + for (i = 0; i < num_stats; i++) { + q = data->tx_queues[i]; + stats->q_opackets[i] = q->tx_pkts; + stats->q_obytes[i] = q->tx_bytes; + stats->q_errors[i] = q->tx_err_pkts; + tx_packets_total += stats->q_opackets[i]; + tx_bytes_total += stats->q_obytes[i]; + tx_packets_err_total += stats->q_errors[i]; + } + + stats->ipackets = rx_packets_total; + stats->ibytes = rx_bytes_total; + stats->opackets = tx_packets_total; + stats->obytes = tx_bytes_total; + stats->oerrors = tx_packets_err_total; +} + +static void +eth_stats_reset(struct rte_eth_dev *dev) +{ + unsigned i; + struct rte_eth_dev_data *data = dev->data; + struct kdp_queue *q; + + for (i = 0; i < data->nb_rx_queues; i++) { + q = data->rx_queues[i]; + q->rx_pkts = 0; + q->rx_bytes = 0; + } + for (i = 0; i < data->nb_tx_queues; i++) { + q = data->tx_queues[i]; + q->tx_pkts = 0; + q->tx_bytes = 0; + q->tx_err_pkts = 0; + } +} + +static const struct eth_dev_ops ops = { + .dev_start = eth_dev_start, + .dev_stop = eth_dev_stop, + .dev_close = eth_dev_close, + .dev_configure = eth_dev_configure, + .dev_infos_get = eth_dev_info, + .rx_queue_setup = eth_rx_queue_setup, + .tx_queue_setup = eth_tx_queue_setup, + .rx_queue_release = eth_queue_release, + .tx_queue_release = eth_queue_release, + .link_update = eth_link_update, + .stats_get = eth_stats_get, + .stats_reset = eth_stats_reset, +}; + +static struct rte_eth_dev * +eth_dev_kdp_create(const char *name, unsigned numa_node) +{ + uint16_t nb_rx_queues = 1; + uint16_t nb_tx_queues = 1; + struct rte_eth_dev_data *data = NULL; + struct pmd_internals *internals = NULL; + struct rte_eth_dev *eth_dev = NULL; + + if (name == NULL) + return NULL; + + RTE_LOG(INFO, PMD, "Creating kdp ethdev on numa socket %u\n", + numa_node); + + data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node); + if (data == NULL) + goto error; + + internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node); + if
(internals == NULL) + goto error; + + /* reserve an ethdev entry */ + eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); + if (eth_dev == NULL) + goto error; + + data->dev_private = internals; + data->port_id = eth_dev->data->port_id; + memmove(data->name, eth_dev->data->name, sizeof(data->name)); + data->nb_rx_queues = nb_rx_queues; + data->nb_tx_queues = nb_tx_queues; + data->dev_link = pmd_link; + data->mac_addrs = &eth_addr; + + eth_dev->data = data; + eth_dev->dev_ops = &ops; + eth_dev->driver = NULL; + + data->dev_flags = RTE_ETH_DEV_DETACHABLE; + data->kdrv = RTE_KDRV_NONE; + data->drv_name = drivername; + data->numa_node = numa_node; + + return eth_dev; + +error: + rte_free(data); + rte_free(internals); + + return NULL; +} + +static int +rte_pmd_kdp_devinit(const char *name, const char *params __rte_unused) +{ + struct rte_eth_dev *eth_dev = NULL; + struct pmd_internals *internals; + struct rte_kdp *kdp; + struct rte_kdp_tap *kdp_tap = NULL; + uint16_t port_id; + + RTE_LOG(INFO, PMD, "Initializing eth_kdp for %s\n", name); + + eth_dev = eth_dev_kdp_create(name, rte_socket_id()); + if (eth_dev == NULL) + return -1; + + internals = eth_dev->data->dev_private; + port_id = eth_dev->data->port_id; + + kdp = rte_kdp_init(port_id); + if (kdp == NULL) + kdp_tap = rte_kdp_tap_init(port_id); + + if (kdp == NULL && kdp_tap == NULL) { + rte_eth_dev_release_port(eth_dev); + rte_free(internals); + + /* Not return error to prevent panic in rte_eal_init() */ + return 0; + } + + internals->kdp = kdp; + internals->kdp_tap = kdp_tap; + + if (kdp == NULL) { + eth_dev->rx_pkt_burst = eth_kdp_tap_rx; + eth_dev->tx_pkt_burst = eth_kdp_tap_tx; + } else { + eth_dev->rx_pkt_burst = eth_kdp_rx; + eth_dev->tx_pkt_burst = eth_kdp_tx; + } + + return 0; +} + +static int +rte_pmd_kdp_devuninit(const char *name) +{ + struct rte_eth_dev *eth_dev = NULL; + + if (name == NULL) + return -EINVAL; + + RTE_LOG(INFO, PMD, "Un-Initializing eth_kdp for %s\n", name); + + /* find the ethdev entry
*/ + eth_dev = rte_eth_dev_allocated(name); + if (eth_dev == NULL) + return -1; + + eth_dev_stop(eth_dev); + + if (eth_dev->data) + rte_free(eth_dev->data->dev_private); + rte_free(eth_dev->data); + + rte_eth_dev_release_port(eth_dev); + return 0; +} + +static struct rte_driver pmd_kdp_drv = { + .name = "eth_kdp", + .type = PMD_VDEV, + .init = rte_pmd_kdp_devinit, + .uninit = rte_pmd_kdp_devuninit, +}; + +PMD_REGISTER_DRIVER(pmd_kdp_drv); diff --git a/drivers/net/kdp/rte_kdp.c b/drivers/net/kdp/rte_kdp.c new file mode 100644 index 0000000..604f697 --- /dev/null +++ b/drivers/net/kdp/rte_kdp.c @@ -0,0 +1,365 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef RTE_EXEC_ENV_LINUXAPP +#error "KDP is not supported" +#endif + +#include <rte_spinlock.h> +#include <rte_ethdev.h> +#include <rte_memzone.h> + +#include "rte_kdp.h" +#include "rte_kdp_fifo.h" + +#define MAX_MBUF_BURST_NUM 32 + +/* Maximum number of ring entries */ +#define KDP_FIFO_COUNT_MAX 1024 +#define KDP_FIFO_SIZE (KDP_FIFO_COUNT_MAX * sizeof(void *) + \ + sizeof(struct rte_kdp_fifo)) + +static volatile int kdp_fd = -1; + +static const struct rte_memzone * +kdp_memzone_reserve(const char *name, size_t len, int socket_id, + unsigned flags) +{ + const struct rte_memzone *mz = rte_memzone_lookup(name); + + if (mz == NULL) + mz = rte_memzone_reserve(name, len, socket_id, flags); + + return mz; +} + +static int +slot_init(struct rte_kdp_memzone_slot *slot) +{ +#define OBJNAMSIZ 32 + char obj_name[OBJNAMSIZ]; + const struct rte_memzone *mz; + + /* TX RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_tx_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_tx_q = mz; + + /* RX RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_rx_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_rx_q = mz; + + /* ALLOC RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_alloc_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + 
slot->m_alloc_q = mz; + + /* FREE RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_free_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_free_q = mz; + + return 0; + +kdp_fail: + return -1; +} + +static void +ring_init(struct rte_kdp *kdp) +{ + struct rte_kdp_memzone_slot *slot = kdp->slot; + const struct rte_memzone *mz; + + /* TX RING */ + mz = slot->m_tx_q; + kdp->tx_q = mz->addr; + kdp_fifo_init(kdp->tx_q, KDP_FIFO_COUNT_MAX); + + /* RX RING */ + mz = slot->m_rx_q; + kdp->rx_q = mz->addr; + kdp_fifo_init(kdp->rx_q, KDP_FIFO_COUNT_MAX); + + /* ALLOC RING */ + mz = slot->m_alloc_q; + kdp->alloc_q = mz->addr; + kdp_fifo_init(kdp->alloc_q, KDP_FIFO_COUNT_MAX); + + /* FREE RING */ + mz = slot->m_free_q; + kdp->free_q = mz->addr; + kdp_fifo_init(kdp->free_q, KDP_FIFO_COUNT_MAX); +} + +/* Shall be called before any allocation happens */ +struct rte_kdp * +rte_kdp_init(uint16_t port_id) +{ + struct rte_kdp_memzone_slot *slot = NULL; + struct rte_kdp *kdp = NULL; + int ret; + + /* Check FD and open */ + if (kdp_fd < 0) { + kdp_fd = open("/dev/kdp", O_RDWR); + if (kdp_fd < 0) { + RTE_LOG(ERR, KDP, "Can not open /dev/kdp\n"); + return NULL; + } + } + + slot = rte_malloc(NULL, sizeof(struct rte_kdp_memzone_slot), 0); + if (slot == NULL) + goto kdp_fail; + slot->id = port_id; + + kdp = rte_malloc(NULL, sizeof(struct rte_kdp), 0); + if (kdp == NULL) + goto kdp_fail; + kdp->slot = slot; + + ret = slot_init(slot); + if (ret < 0) + goto kdp_fail; + + ring_init(kdp); + + return kdp; + +kdp_fail: + rte_free(slot); + rte_free(kdp); + RTE_LOG(ERR, KDP, "Unable to allocate memory\n"); + return NULL; +} + +static void +kdp_allocate_mbufs(struct rte_kdp *kdp) +{ + int i, ret; + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) != + offsetof(struct rte_kdp_mbuf, pool)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_addr) != + offsetof(struct rte_kdp_mbuf, 
buf_addr)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, next) != + offsetof(struct rte_kdp_mbuf, next)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) != + offsetof(struct rte_kdp_mbuf, data_off)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_len) != + offsetof(struct rte_kdp_mbuf, data_len)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) != + offsetof(struct rte_kdp_mbuf, pkt_len)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != + offsetof(struct rte_kdp_mbuf, ol_flags)); + + /* Check if pktmbuf pool has been configured */ + if (kdp->pktmbuf_pool == NULL) { + RTE_LOG(ERR, KDP, "No valid mempool for allocating mbufs\n"); + return; + } + + for (i = 0; i < MAX_MBUF_BURST_NUM; i++) { + pkts[i] = rte_pktmbuf_alloc(kdp->pktmbuf_pool); + if (unlikely(pkts[i] == NULL)) { + /* Out of memory */ + RTE_LOG(ERR, KDP, "Out of memory\n"); + break; + } + } + + /* No pkt mbuf allocated */ + if (i <= 0) + return; + + ret = kdp_fifo_put(kdp->alloc_q, (void **)pkts, i); + + /* Check if any mbufs not put into alloc_q, and then free them */ + if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) { + int j; + + for (j = ret; j < i; j++) + rte_pktmbuf_free(pkts[j]); + } +} + +int +rte_kdp_start(struct rte_kdp *kdp, struct rte_mempool *pktmbuf_pool, + const struct rte_kdp_conf *conf) +{ + struct rte_kdp_memzone_slot *slot = kdp->slot; + struct rte_kdp_device_info dev_info; + char mz_name[RTE_MEMZONE_NAMESIZE]; + const struct rte_memzone *mz; + int ret; + + if (!kdp || !pktmbuf_pool || !conf || !conf->name[0]) + return -1; + + snprintf(kdp->name, RTE_KDP_NAMESIZE, "%s", conf->name); + kdp->pktmbuf_pool = pktmbuf_pool; + kdp->group_id = conf->group_id; + + memset(&dev_info, 0, sizeof(dev_info)); + dev_info.core_id = conf->core_id; + dev_info.force_bind = conf->force_bind; + dev_info.group_id = conf->group_id; + dev_info.mbuf_size = conf->mbuf_size; + snprintf(dev_info.name, RTE_KDP_NAMESIZE, "%s", conf->name); + + dev_info.tx_phys = slot->m_tx_q->phys_addr; +
dev_info.rx_phys = slot->m_rx_q->phys_addr; + dev_info.alloc_phys = slot->m_alloc_q->phys_addr; + dev_info.free_phys = slot->m_free_q->phys_addr; + + /* MBUF mempool */ + snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_OBJ_NAME, + pktmbuf_pool->name); + mz = rte_memzone_lookup(mz_name); + if (mz == NULL) + goto kdp_fail; + dev_info.mbuf_va = mz->addr; + dev_info.mbuf_phys = mz->phys_addr; + + ret = ioctl(kdp_fd, RTE_KDP_IOCTL_CREATE, &dev_info); + if (ret < 0) + goto kdp_fail; + + kdp->in_use = 1; + + /* Allocate mbufs and then put them into alloc_q */ + kdp_allocate_mbufs(kdp); + + return 0; + +kdp_fail: + return -1; +} + +static void +kdp_free_mbufs(struct rte_kdp *kdp) +{ + int i, ret; + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + + ret = kdp_fifo_get(kdp->free_q, (void **)pkts, MAX_MBUF_BURST_NUM); + if (likely(ret > 0)) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } +} + +unsigned +rte_kdp_tx_burst(struct rte_kdp *kdp, struct rte_mbuf **mbufs, unsigned num) +{ + unsigned ret = kdp_fifo_put(kdp->rx_q, (void **)mbufs, num); + + /* Get mbufs from free_q and then free them */ + kdp_free_mbufs(kdp); + + return ret; +} + +unsigned +rte_kdp_rx_burst(struct rte_kdp *kdp, struct rte_mbuf **mbufs, unsigned num) +{ + unsigned ret = kdp_fifo_get(kdp->tx_q, (void **)mbufs, num); + + /* If buffers removed, allocate mbufs and then put them into alloc_q */ + if (ret) + kdp_allocate_mbufs(kdp); + + return ret; +} + +static void +kdp_free_fifo(struct rte_kdp_fifo *fifo) +{ + int ret; + struct rte_mbuf *pkt; + + do { + ret = kdp_fifo_get(fifo, (void **)&pkt, 1); + if (ret) + rte_pktmbuf_free(pkt); + } while (ret); +} + +int +rte_kdp_release(struct rte_kdp *kdp) +{ + struct rte_kdp_device_info dev_info; + + if (!kdp || !kdp->in_use) + return -1; + + snprintf(dev_info.name, sizeof(dev_info.name), "%s", kdp->name); + if (ioctl(kdp_fd, RTE_KDP_IOCTL_RELEASE, &dev_info) < 0) { + RTE_LOG(ERR, KDP, "Fail to release kdp device\n"); + return -1; + } + + /* mbufs in all 
fifo should be released, except request/response */ + kdp_free_fifo(kdp->tx_q); + kdp_free_fifo(kdp->rx_q); + kdp_free_fifo(kdp->alloc_q); + kdp_free_fifo(kdp->free_q); + + rte_free(kdp->slot); + + /* Memset the KDP struct */ + memset(kdp, 0, sizeof(struct rte_kdp)); + + return 0; +} + +void +rte_kdp_close(void) +{ + if (kdp_fd < 0) + return; + + close(kdp_fd); + kdp_fd = -1; +} diff --git a/drivers/net/kdp/rte_kdp.h b/drivers/net/kdp/rte_kdp.h new file mode 100644 index 0000000..b9db048 --- /dev/null +++ b/drivers/net/kdp/rte_kdp.h @@ -0,0 +1,126 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_KDP_H_ +#define _RTE_KDP_H_ + +#include <fcntl.h> +#include <unistd.h> + +#include <sys/ioctl.h> + +#include <rte_malloc.h> +#include <rte_mbuf.h> +#include <rte_memcpy.h> +#include <rte_memory.h> +#include <rte_mempool.h> + +#include <exec-env/rte_kdp_common.h> + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * KDP memzone pool slot + */ +struct rte_kdp_memzone_slot { + uint32_t id; + + /* Memzones */ + const struct rte_memzone *m_tx_q; /**< TX queue */ + const struct rte_memzone *m_rx_q; /**< RX queue */ + const struct rte_memzone *m_alloc_q; /**< Allocated mbufs queue */ + const struct rte_memzone *m_free_q; /**< To be freed mbufs queue */ +}; + +/** + * KDP context + */ +struct rte_kdp { + char name[RTE_KDP_NAMESIZE]; /**< KDP interface name */ + struct rte_mempool *pktmbuf_pool; /**< pkt mbuf mempool */ + struct rte_kdp_memzone_slot *slot; + uint16_t group_id; /**< Group ID of KDP devices */ + + struct rte_kdp_fifo *tx_q; /**< TX queue */ + struct rte_kdp_fifo *rx_q; /**< RX queue */ + struct rte_kdp_fifo *alloc_q; /**< Allocated mbufs queue */ + struct rte_kdp_fifo *free_q; /**< To be freed mbufs queue */ + + uint8_t in_use; /**< kdp in use */ +}; + +struct rte_kdp_tap { + char name[RTE_KDP_NAMESIZE]; + int tap_fd; +}; + +/** + * Structure for configuring KDP device. + */ +struct rte_kdp_conf { + /* + * KDP name which will be used in relevant network device. 
+ * Keep the name as short as possible, as it will be part of + * the memzone name. + */ + char name[RTE_KDP_NAMESIZE]; + uint32_t core_id; /* Core ID to bind kernel thread on */ + uint16_t group_id; + unsigned mbuf_size; + + uint8_t force_bind; /* Flag to bind kernel thread */ +}; + +struct rte_kdp_tap *rte_kdp_tap_init(uint16_t port_id); +struct rte_kdp *rte_kdp_init(uint16_t port_id); + +int rte_kdp_start(struct rte_kdp *kdp, struct rte_mempool *pktmbuf_pool, + const struct rte_kdp_conf *conf); + +unsigned rte_kdp_rx_burst(struct rte_kdp *kdp, + struct rte_mbuf **mbufs, unsigned num); + +unsigned rte_kdp_tx_burst(struct rte_kdp *kdp, + struct rte_mbuf **mbufs, unsigned num); + +int rte_kdp_release(struct rte_kdp *kdp); + +void rte_kdp_close(void); + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_KDP_H_ */ diff --git a/drivers/net/kdp/rte_kdp_fifo.h b/drivers/net/kdp/rte_kdp_fifo.h new file mode 100644 index 0000000..1a7e063 --- /dev/null +++ b/drivers/net/kdp/rte_kdp_fifo.h @@ -0,0 +1,91 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission.
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/** + * Initializes the kdp fifo structure + */ +static void +kdp_fifo_init(struct rte_kdp_fifo *fifo, unsigned size) +{ + /* Ensure size is power of 2 */ + if (size & (size - 1)) + rte_panic("KDP fifo size must be power of 2\n"); + + fifo->write = 0; + fifo->read = 0; + fifo->len = size; + fifo->elem_size = sizeof(void *); +} + +/** + * Adds num elements into the fifo. Return the number actually written + */ +static inline unsigned +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned fifo_write = fifo->write; + unsigned fifo_read = fifo->read; + unsigned new_write = fifo_write; + + for (i = 0; i < num; i++) { + new_write = (new_write + 1) & (fifo->len - 1); + + if (new_write == fifo_read) + break; + fifo->buffer[fifo_write] = data[i]; + fifo_write = new_write; + } + fifo->write = fifo_write; + return i; +} + +/** + * Get up to num elements from the fifo. 
Return the number actully read + */ +static inline unsigned +kdp_fifo_get(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned new_read = fifo->read; + unsigned fifo_write = fifo->write; + for (i = 0; i < num; i++) { + if (new_read == fifo_write) + break; + + data[i] = fifo->buffer[new_read]; + new_read = (new_read + 1) & (fifo->len - 1); + } + fifo->read = new_read; + return i; +} diff --git a/drivers/net/kdp/rte_kdp_tap.c b/drivers/net/kdp/rte_kdp_tap.c new file mode 100644 index 0000000..f07ba98 --- /dev/null +++ b/drivers/net/kdp/rte_kdp_tap.c @@ -0,0 +1,96 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <string.h> + +#include <sys/socket.h> +#include <linux/if.h> +#include <linux/if_tun.h> + +#include "rte_kdp.h" + +static int +tap_create(char *name) +{ + struct ifreq ifr; + int fd, ret; + + fd = open("/dev/net/tun", O_RDWR); + if (fd < 0) + return fd; + + memset(&ifr, 0, sizeof(ifr)); + + /* TAP device without packet information */ + ifr.ifr_flags = IFF_TAP | IFF_NO_PI; + + if (name && *name) + snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name); + + ret = ioctl(fd, TUNSETIFF, (void *)&ifr); + if (ret < 0) { + close(fd); + return ret; + } + + if (name) + snprintf(name, IFNAMSIZ, "%s", ifr.ifr_name); + + return fd; +} + +struct rte_kdp_tap * +rte_kdp_tap_init(uint16_t port_id) +{ + struct rte_kdp_tap *kdp_tap = NULL; + int flags; + + kdp_tap = rte_malloc(NULL, sizeof(struct rte_kdp_tap), 0); + if (kdp_tap == NULL) + goto error; + + snprintf(kdp_tap->name, IFNAMSIZ, "tap_kdp%u", port_id); + kdp_tap->tap_fd = tap_create(kdp_tap->name); + if (kdp_tap->tap_fd < 0) + goto error; + + flags = fcntl(kdp_tap->tap_fd, F_GETFL, 0); + fcntl(kdp_tap->tap_fd, F_SETFL, flags | O_NONBLOCK); + + return kdp_tap; + +error: + rte_free(kdp_tap); + return NULL; +} + diff --git a/drivers/net/kdp/rte_pmd_kdp_version.map b/drivers/net/kdp/rte_pmd_kdp_version.map new file mode 100644 index 0000000..0812bb1 --- /dev/null +++ b/drivers/net/kdp/rte_pmd_kdp_version.map @@ -0,0 +1,4 @@ +DPDK_2.3 { + + local: *; +}; diff --git 
a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h index 2e47e7f..5a0048b 100644 --- a/lib/librte_eal/common/include/rte_log.h +++ b/lib/librte_eal/common/include/rte_log.h @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2016 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -79,6 +79,7 @@ extern struct rte_logs rte_logs; #define RTE_LOGTYPE_PIPELINE 0x00008000 /**< Log related to pipeline. */ #define RTE_LOGTYPE_MBUF 0x00010000 /**< Log related to mbuf. */ #define RTE_LOGTYPE_CRYPTODEV 0x00020000 /**< Log related to cryptodev. */ +#define RTE_LOGTYPE_KDP 0x00080000 /**< Log related to KDP. */ /* these log types can be used in an application */ #define RTE_LOGTYPE_USER1 0x01000000 /**< User-defined log type 1. */ diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 8ecab41..eb18972 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # Copyright(c) 2014-2015 6WIND S.A. # All rights reserved. # @@ -154,6 +154,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += -lrte_pmd_qat +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += -lrte_pmd_kdp # AESNI MULTI BUFFER is dependent on the IPSec_MB library _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB) += -lrte_pmd_aesni_mb -- 2.5.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
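The FIFO logic in rte_kdp_fifo.h above can be tried outside DPDK with a small stand-alone sketch. This is only an illustration under assumptions: the sketch_* names and SKETCH_FIFO_LEN are hypothetical and not part of the patch, the buffer is a fixed-size array instead of the flexible array living in a memzone, and assert() stands in for rte_panic(). It keeps the same rule as the patch: the ring is empty when write == read, so one slot always stays unused.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FIFO_LEN 8 /* hypothetical size; must be a power of 2 */

/* Simplified stand-in for struct rte_kdp_fifo (fixed-size buffer). */
struct sketch_fifo {
	volatile unsigned write; /* next position to be written */
	volatile unsigned read;  /* next position to be read */
	unsigned len;            /* ring length, power of 2 */
	void *buffer[SKETCH_FIFO_LEN];
};

static void
sketch_fifo_init(struct sketch_fifo *fifo)
{
	/* a power-of-2 length lets "& (len - 1)" replace "% len" */
	assert((SKETCH_FIFO_LEN & (SKETCH_FIFO_LEN - 1)) == 0);

	fifo->write = 0;
	fifo->read = 0;
	fifo->len = SKETCH_FIFO_LEN;
}

/* Same loop shape as kdp_fifo_put(): stop before write catches up with read. */
static unsigned
sketch_fifo_put(struct sketch_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned fifo_write = fifo->write;
	unsigned fifo_read = fifo->read;
	unsigned new_write = fifo_write;

	for (i = 0; i < num; i++) {
		new_write = (new_write + 1) & (fifo->len - 1);

		if (new_write == fifo_read)
			break; /* ring is full */
		fifo->buffer[fifo_write] = data[i];
		fifo_write = new_write;
	}
	fifo->write = fifo_write;
	return i; /* number of elements actually written */
}

/* Same loop shape as kdp_fifo_get(): stop when read reaches write. */
static unsigned
sketch_fifo_get(struct sketch_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned new_read = fifo->read;
	unsigned fifo_write = fifo->write;

	for (i = 0; i < num; i++) {
		if (new_read == fifo_write)
			break; /* ring is empty */
		data[i] = fifo->buffer[new_read];
		new_read = (new_read + 1) & (fifo->len - 1);
	}
	fifo->read = new_read;
	return i; /* number of elements actually read */
}
```

With a length of 8, a put of 10 pointers stores only 7 of them, and a following get returns those 7 in order. The sketch does not cover the memory ordering needed when the producer and the consumer run in userspace and kernelspace, as they do in the patch.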
* Re: [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication 2016-01-27 16:32 ` [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit @ 2016-01-28 8:16 ` Xu, Qian Q 2016-01-29 16:04 ` Yigit, Ferruh 2016-02-09 17:33 ` Reshma Pattan 1 sibling, 1 reply; 29+ messages in thread From: Xu, Qian Q @ 2016-01-28 8:16 UTC (permalink / raw) To: Yigit, Ferruh, dev Any dependencies with kernel versions? What kernel versions should it support? Thanks Qian -----Original Message----- From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ferruh Yigit Sent: Thursday, January 28, 2016 12:33 AM To: dev@dpdk.org Subject: [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication This patch provides slow data path communication to the Linux kernel. Patch is based on librte_kni, and heavily re-uses it. The main difference is librte_kni library converted into a PMD, to provide ease of use for applications. Now any application can use slow path communication without any update in application, because of existing eal support for virtual PMD. Also this PMD supports two methods to send packets to the Linux, first one is custom FIFO implementation with help of KDP kernel module, second one is Linux in-kernel tun/tap support. PMD first checks for KDP kernel module, if fails it tries to create and use a tap interface. With FIFO method: PMD's rx_pkt_burst() get packets from FIFO, and tx_pkt_burst() puts packet to the FIFO. The corresponding Linux virtual network device driver code also gets/puts packets from FIFO as they are coming from hardware. With tun/tap method: no external kernel module required, PMD reads from and writes packets to the tap interface file descriptor. Tap interface has performance penalty against FIFO implementation. 
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> --- config/common_linuxapp | 1 + doc/guides/nics/pcap_ring.rst | 125 ++++++++- doc/guides/rel_notes/release_2_3.rst | 6 + drivers/net/Makefile | 3 +- drivers/net/kdp/Makefile | 61 ++++ drivers/net/kdp/rte_eth_kdp.c | 481 ++++++++++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.c | 365 ++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.h | 126 +++++++++ drivers/net/kdp/rte_kdp_fifo.h | 91 ++++++ drivers/net/kdp/rte_kdp_tap.c | 96 +++++++ drivers/net/kdp/rte_pmd_kdp_version.map | 4 + lib/librte_eal/common/include/rte_log.h | 3 +- mk/rte.app.mk | 3 +- 13 files changed, 1359 insertions(+), 6 deletions(-) create mode 100644 drivers/net/kdp/Makefile create mode 100644 drivers/net/kdp/rte_eth_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.h create mode 100644 drivers/net/kdp/rte_kdp_fifo.h create mode 100644 drivers/net/kdp/rte_kdp_tap.c create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication 2016-01-28 8:16 ` Xu, Qian Q @ 2016-01-29 16:04 ` Yigit, Ferruh 0 siblings, 0 replies; 29+ messages in thread From: Yigit, Ferruh @ 2016-01-29 16:04 UTC (permalink / raw) To: Xu, Qian Q; +Cc: dev On Thu, Jan 28, 2016 at 08:16:09AM +0000, Xu, Qian Q wrote: > Any dependencies with kernel versions? What kernel versions should it support? > Hi Qian, The kernel module has the same dependencies as KNI: DPDK supports kernel versions >= 2.6.34, and this also holds for KDP. The PMD itself has no such dependency, but it uses the tun/tap interface, and tun/tap is also supported on kernel versions >= 2.6.34. Thanks, ferruh ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication 2016-01-27 16:32 ` [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit 2016-01-28 8:16 ` Xu, Qian Q @ 2016-02-09 17:33 ` Reshma Pattan 2016-02-09 17:51 ` Ferruh Yigit 1 sibling, 1 reply; 29+ messages in thread From: Reshma Pattan @ 2016-02-09 17:33 UTC (permalink / raw) To: Ferruh Yigit, dev Hi Ferruh, On 1/27/2016 4:32 PM, Ferruh Yigit wrote: > This patch provides slow data path communication to the Linux kernel. > Patch is based on librte_kni, and heavily re-uses it. > > The main difference is librte_kni library converted into a PMD, to > provide ease of use for applications. > > Now any application can use slow path communication without any update > in application, because of existing eal support for virtual PMD. > > Also this PMD supports two methods to send packets to the Linux, first > one is custom FIFO implementation with help of KDP kernel module, second > one is Linux in-kernel tun/tap support. PMD first checks for KDP kernel > module, if fails it tries to create and use a tap interface. > > With FIFO method: PMD's rx_pkt_burst() get packets from FIFO, > and tx_pkt_burst() puts packet to the FIFO. > The corresponding Linux virtual network device driver code > also gets/puts packets from FIFO as they are coming from hardware. > > With tun/tap method: no external kernel module required, PMD reads from > and writes packets to the tap interface file descriptor. Tap interface > has performance penalty against FIFO implementation. > > Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> > --- > > diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst > index 46aa3ac..78b7b61 100644 > --- a/doc/guides/nics/pcap_ring.rst > +++ b/doc/guides/nics/pcap_ring.rst > @@ -28,11 +28,11 @@ > + > + > +DPDK application can be used to forward packages between these interfaces: > + Packages ==> packets.? 
> diff --git a/drivers/net/kdp/rte_eth_kdp.c b/drivers/net/kdp/rte_eth_kdp.c > new file mode 100644 > index 0000000..ac650d7 > --- /dev/null > +++ b/drivers/net/kdp/rte_eth_kdp.c > @@ -0,0 +1,481 @@ > No public API to create KDP PMD device. We should have one right? > diff --git a/drivers/net/kdp/rte_kdp.h b/drivers/net/kdp/rte_kdp.h > new file mode 100644 > index 0000000..b9db048 > --- /dev/null > +++ b/drivers/net/kdp/rte_kdp.h > @@ -0,0 +1,126 @@ > > +struct rte_kdp_tap *rte_kdp_tap_init(uint16_t port_id); > +struct rte_kdp *rte_kdp_init(uint16_t port_id); > + > +int rte_kdp_start(struct rte_kdp *kdp, struct rte_mempool *pktmbuf_pool, > + const struct rte_kdp_conf *conf); > + > +unsigned rte_kdp_rx_burst(struct rte_kdp *kdp, > + struct rte_mbuf **mbufs, unsigned num); > + > +unsigned rte_kdp_tx_burst(struct rte_kdp *kdp, > + struct rte_mbuf **mbufs, unsigned num); > + > +int rte_kdp_release(struct rte_kdp *kdp); > + > +void rte_kdp_close(void); > These functions can be static. Thanks, Reshma ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication 2016-02-09 17:33 ` Reshma Pattan @ 2016-02-09 17:51 ` Ferruh Yigit 0 siblings, 0 replies; 29+ messages in thread From: Ferruh Yigit @ 2016-02-09 17:51 UTC (permalink / raw) To: Reshma Pattan; +Cc: dev On Tue, Feb 09, 2016 at 05:33:55PM +0000, Reshma Pattan wrote: > Hi Ferruh, > Hi Reshma, > On 1/27/2016 4:32 PM, Ferruh Yigit wrote: >> This patch provides slow data path communication to the Linux kernel. >> Patch is based on librte_kni, and heavily re-uses it. >> >> The main difference is librte_kni library converted into a PMD, to >> provide ease of use for applications. >> >> Now any application can use slow path communication without any update >> in application, because of existing eal support for virtual PMD. >> >> Also this PMD supports two methods to send packets to the Linux, first >> one is custom FIFO implementation with help of KDP kernel module, second >> one is Linux in-kernel tun/tap support. PMD first checks for KDP kernel >> module, if fails it tries to create and use a tap interface. >> >> With FIFO method: PMD's rx_pkt_burst() get packets from FIFO, >> and tx_pkt_burst() puts packet to the FIFO. >> The corresponding Linux virtual network device driver code >> also gets/puts packets from FIFO as they are coming from hardware. >> >> With tun/tap method: no external kernel module required, PMD reads from >> and writes packets to the tap interface file descriptor. Tap interface >> has performance penalty against FIFO implementation. >> >> Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> >> --- >> diff --git a/doc/guides/nics/pcap_ring.rst >> b/doc/guides/nics/pcap_ring.rst >> index 46aa3ac..78b7b61 100644 >> --- a/doc/guides/nics/pcap_ring.rst >> +++ b/doc/guides/nics/pcap_ring.rst >> @@ -28,11 +28,11 @@ >> + >> + >> +DPDK application can be used to forward packages between these interfaces: >> + > > Packages ==> packets.? > Right, I will fix, thanks. 
>> diff --git a/drivers/net/kdp/rte_eth_kdp.c b/drivers/net/kdp/rte_eth_kdp.c >> new file mode 100644 >> index 0000000..ac650d7 >> --- /dev/null >> +++ b/drivers/net/kdp/rte_eth_kdp.c >> @@ -0,0 +1,481 @@ >> > > No public API to create KDP PMD device. We should have one right? > It doesn't have to have one; KDP has no requirement for one right now. The PMD can be created with the EAL --vdev parameter... >> diff --git a/drivers/net/kdp/rte_kdp.h b/drivers/net/kdp/rte_kdp.h >> new file mode 100644 >> index 0000000..b9db048 >> --- /dev/null >> +++ b/drivers/net/kdp/rte_kdp.h >> @@ -0,0 +1,126 @@ >> >> +struct rte_kdp_tap *rte_kdp_tap_init(uint16_t port_id); >> +struct rte_kdp *rte_kdp_init(uint16_t port_id); >> + >> +int rte_kdp_start(struct rte_kdp *kdp, struct rte_mempool *pktmbuf_pool, >> + const struct rte_kdp_conf *conf); >> + >> +unsigned rte_kdp_rx_burst(struct rte_kdp *kdp, >> + struct rte_mbuf **mbufs, unsigned num); >> + >> +unsigned rte_kdp_tx_burst(struct rte_kdp *kdp, >> + struct rte_mbuf **mbufs, unsigned num); >> + >> +int rte_kdp_release(struct rte_kdp *kdp); >> + >> +void rte_kdp_close(void); >> > > These functions can be static. > No, this header is used by multiple source files; the functions declared here are the ones that need to be visible to other files. Thanks, ferruh ^ permalink raw reply [flat|nested] 29+ messages in thread
* [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux 2016-01-27 16:32 [dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit 2016-01-27 16:32 ` [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module Ferruh Yigit 2016-01-27 16:32 ` [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit @ 2016-02-19 5:05 ` Ferruh Yigit 2016-02-19 5:05 ` [dpdk-dev] [PATCH v2 1/2] kdp: add kernel data path kernel module Ferruh Yigit ` (2 more replies) 2 siblings, 3 replies; 29+ messages in thread From: Ferruh Yigit @ 2016-02-19 5:05 UTC (permalink / raw) To: dev This is a slow data path communication implementation based on the existing KNI. The differences are: librte_kni is converted into a PMD, and the kdp kernel module is almost the same as the KNI one, except that all control path functionality is removed and some simplification is done. The motivation is to simplify slow path data communication. Now any application can use this new PMD to send data to and get data from the Linux kernel. The PMD supports two communication methods: 1) KDP kernel module The PMD initialization functions handle creating the virtual interfaces (with the help of the kdp kernel module) and the FIFO. The FIFO is used to share data between userspace and kernelspace. This is the default method. 2) tun/tap module When the KDP module is not inserted, the PMD creates a tap interface and transfers packets using that tap interface. In the long term this patch intends to replace KNI, and KNI will be deprecated. v2: * Use rtnetlink to create interfaces * include modules.h to prevent compile error in old kernels Sample usage: 1) Transfer any packet received from a NIC that is bound to DPDK to the Linux kernel a) insert kdp kernel module insmod build/kmod/rte_kdp.ko b) bind NIC to the DPDK using dpdk_nic_bind.py c) ./testpmd --vdev eth_kdp0 c1) testpmd shows two ports, one of them physical, the other virtual ... Configuring Port 0 (socket 0) Port 0: 00:00:00:00:00:00 Configuring Port 1 (socket 0) ...
Checking link statuses... Port 0 Link Up - speed 10000 Mbps - full-duplex Port 1 Link Up - speed 10000 Mbps - full-duplex Done c2) This will create "kdp0" Linux interface $ ip l show kdp0 21: kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff d) Linux port can be used for data d1) $ ifconfig kdp0 1.0.0.2 $ ping 1.0.0.1 PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data. 64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=0.789 ms 64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=0.881 ms d2) $ tcpdump -nn -i kdp0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes 15:01:22.407506 IP 1.0.0.1 > 1.0.0.2: ICMP echo request, id 40016, seq 18, length 64 15:01:22.408521 IP 1.0.0.2 > 1.0.0.1: ICMP echo reply, id 40016, seq 18, length 64 2) Data travels between virtual Linux interfaces pass from DPDK application, application can alter data a) insert kdp kernel module insmod build/kmod/rte_kdp.ko b) No physical NIC involved c) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 c1) testpmd show two ports, both of them are virtual ... Configuring Port 0 (socket 0) Port 0: 00:00:00:00:00:00 Configuring Port 1 (socket 0) Port 1: 00:00:00:00:00:00 Checking link statuses... 
Port 0 Link Up - speed 10000 Mbps - full-duplex Port 1 Link Up - speed 10000 Mbps - full-duplex Done c2) This will create "kdp0" and "kdp1" Linux interfaces $ ip l show kdp0; ip l show kdp1 22: kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff 23: kdp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff d) Data travel between virtual ports pass from DPDK application $ifconfig kdp0 1.0.0.1 $ifconfig kdp1 1.0.0.2 d1) $ ping 1.0.0.1 PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data. 64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=3.57 ms 64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=1.85 ms 64 bytes from 1.0.0.1: icmp_seq=3 ttl=64 time=1.89 ms d2) $ tcpdump -nn -i kdp0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes 15:20:51.908543 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 1, length 64 15:20:51.909570 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 1, length 64 15:20:52.909551 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 2, length 64 15:20:52.910577 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 2, length 64 3) tun/tap interface usage a) No external module required, tun/tap support in kernel required b) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 b1) This will create "tap_kdp0" and "tap_kdp1" Linux interfaces $ ip l show tap_kdp0; ip l show tap_kdp1 25: tap_kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 56:47:97:9c:03:8e brd ff:ff:ff:ff:ff:ff 26: tap_kdp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:15:22:b0:52:42 brd ff:ff:ff:ff:ff:ff Ferruh Yigit (2): kdp: add kernel data path kernel module kdp: add virtual PMD for kernel slow data path 
communication MAINTAINERS | 5 + config/common_linuxapp | 9 +- doc/guides/nics/pcap_ring.rst | 125 ++- doc/guides/rel_notes/release_16_04.rst | 6 + drivers/net/Makefile | 3 +- drivers/net/kdp/Makefile | 61 ++ drivers/net/kdp/rte_eth_kdp.c | 501 ++++++++++++ drivers/net/kdp/rte_kdp.c | 633 +++++++++++++++ drivers/net/kdp/rte_kdp.h | 116 +++ drivers/net/kdp/rte_kdp_fifo.h | 91 +++ drivers/net/kdp/rte_kdp_tap.c | 101 +++ drivers/net/kdp/rte_pmd_kdp_version.map | 4 + lib/librte_eal/common/include/rte_log.h | 3 +- lib/librte_eal/linuxapp/Makefile | 5 +- lib/librte_eal/linuxapp/eal/Makefile | 3 +- .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 139 ++++ lib/librte_eal/linuxapp/kdp/Makefile | 55 ++ lib/librte_eal/linuxapp/kdp/kdp_dev.h | 78 ++ lib/librte_eal/linuxapp/kdp/kdp_fifo.h | 91 +++ lib/librte_eal/linuxapp/kdp/kdp_net.c | 862 +++++++++++++++++++++ mk/rte.app.mk | 3 +- 21 files changed, 2885 insertions(+), 9 deletions(-) create mode 100644 drivers/net/kdp/Makefile create mode 100644 drivers/net/kdp/rte_eth_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.h create mode 100644 drivers/net/kdp/rte_kdp_fifo.h create mode 100644 drivers/net/kdp/rte_kdp_tap.c create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c -- 2.5.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
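For the tun/tap fallback described in 3) above, the following is a minimal stand-alone sketch of how a tap interface can be opened and switched to non-blocking mode. It follows the same steps as tap_create() and rte_kdp_tap_init() in the patch, but sketch_tap_open() is a hypothetical helper, not a function from the patch, and unlike the patch it checks the fcntl() return values. It assumes /dev/net/tun exists and that the caller has enough privilege (typically CAP_NET_ADMIN) to create the interface.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/*
 * Hypothetical helper, not part of the patch: create a tap device and
 * put its file descriptor into non-blocking mode.
 * name must point to a buffer of at least IFNAMSIZ bytes; on success the
 * interface name reported by the kernel is written back into it.
 * Returns the file descriptor, or -1 on any failure.
 */
static int
sketch_tap_open(char *name)
{
	struct ifreq ifr;
	int fd, flags;

	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0)
		return -1;

	memset(&ifr, 0, sizeof(ifr));

	/* TAP device (Ethernet frames) without the packet information header */
	ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
	snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name);

	if (ioctl(fd, TUNSETIFF, (void *)&ifr) < 0)
		goto fail;

	/* non-blocking, so a burst-style read returns when no packet waits */
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		goto fail;

	snprintf(name, IFNAMSIZ, "%s", ifr.ifr_name);
	return fd;

fail:
	close(fd);
	return -1;
}
```

A caller would pass a writable buffer such as char name[IFNAMSIZ] = "tap_kdp0"; and then use read() and write() on the returned descriptor to receive and send whole Ethernet frames. Without the required privilege the helper simply returns -1.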
* [dpdk-dev] [PATCH v2 1/2] kdp: add kernel data path kernel module 2016-02-19 5:05 ` [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit @ 2016-02-19 5:05 ` Ferruh Yigit 2016-02-19 5:05 ` [dpdk-dev] [PATCH v2 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit 2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit 2 siblings, 0 replies; 29+ messages in thread From: Ferruh Yigit @ 2016-02-19 5:05 UTC (permalink / raw) To: dev This kernel module is based on the KNI module, but it is a stripped-down version that handles only data messages; no control functionality is provided. The FIFO implementation of KNI is kept exactly the same, but the ethtool related code is removed and the virtual network management related code is simplified. This module contains kernel support to create network devices and a simple driver for the virtual network device; the driver simply puts/gets packets to/from the FIFO instead of real hardware. The FIFO is created and owned by the userspace application, which in this case is the KDP PMD. In the long term this patch intends to replace KNI, and KNI will be deprecated.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> --- v2: * Use rtnetlink to create interfaces * include modules.h to prevent compile error in old kernels --- MAINTAINERS | 4 + config/common_linuxapp | 8 +- lib/librte_eal/linuxapp/Makefile | 5 +- lib/librte_eal/linuxapp/eal/Makefile | 3 +- .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 139 ++++ lib/librte_eal/linuxapp/kdp/Makefile | 55 ++ lib/librte_eal/linuxapp/kdp/kdp_dev.h | 78 ++ lib/librte_eal/linuxapp/kdp/kdp_fifo.h | 91 +++ lib/librte_eal/linuxapp/kdp/kdp_net.c | 862 +++++++++++++++++++++ 9 files changed, 1242 insertions(+), 3 deletions(-) create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c diff --git a/MAINTAINERS b/MAINTAINERS index 628bc05..05ffe26 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -257,6 +257,10 @@ F: app/test/test_kni.c F: examples/kni/ F: doc/guides/sample_app_ug/kernel_nic_interface.rst +Linux KDP +M: Ferruh Yigit <ferruh.yigit@gmail.com> +F: lib/librte_eal/linuxapp/kdp/ + Linux AF_PACKET M: John W. Linville <linville@tuxdriver.com> F: drivers/net/af_packet/ diff --git a/config/common_linuxapp b/config/common_linuxapp index f1638db..e1b5032 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -314,6 +314,12 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n CONFIG_RTE_LIBRTE_PMD_NULL=y # +# Compile KDP PMD +# +CONFIG_RTE_KDP_KMOD=y +CONFIG_RTE_KDP_PREEMPT_DEFAULT=y + +# # Do prefetch of packet data within PMD driver receive function # CONFIG_RTE_PMD_PACKET_PREFETCH=y diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile index d9c5233..e3f91a7 100644 --- a/lib/librte_eal/linuxapp/Makefile +++ b/lib/librte_eal/linuxapp/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -38,6 +38,9 @@ DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal ifeq ($(CONFIG_RTE_KNI_KMOD),y) DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kni endif +ifeq ($(CONFIG_RTE_KDP_KMOD),y) +DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += kdp +endif ifeq ($(CONFIG_RTE_LIBRTE_XEN_DOM0),y) DIRS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += xen_dom0 endif diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index 6e26250..a70b793 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -121,6 +121,7 @@ CFLAGS_eal_thread.o += -Wno-return-type endif INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h +INC += rte_kdp_common.h SYMLINK-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP)-include/exec-env := \ $(addprefix include/exec-env/,$(INC)) diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h new file mode 100644 index 0000000..0334876 --- /dev/null +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h @@ -0,0 +1,139 @@ +/*- + * This file is provided under a dual BSD/LGPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GNU LESSER GENERAL PUBLIC LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + * + * + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + */ + +#ifndef _RTE_KDP_COMMON_H_ +#define _RTE_KDP_COMMON_H_ + +/** + * KDP name + */ +#define RTE_KDP_NAMESIZE 32 + +#define KDP_DEVICE "kdp" + +/* + * Fifo struct mapped in a shared memory. It describes a circular buffer FIFO + * Write and read should wrap around. Fifo is empty when write == read + * Writing should never overwrite the read position + */ +struct rte_kdp_fifo { + volatile unsigned write; /**< Next position to be written*/ + volatile unsigned read; /**< Next position to be read */ + unsigned len; /**< Circular buffer length */ + unsigned elem_size; /**< Pointer size - for 32/64 bit OS */ + void * volatile buffer[0]; /**< The buffer contains mbuf pointers */ +}; + +/* + * The kernel image of the rte_mbuf struct, with only the relevant fields. 
+ * Padding is necessary to assure the offsets of these fields + */ +struct rte_kdp_mbuf { + void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE))); + char pad0[10]; + + /**< Start address of data in segment buffer. */ + uint16_t data_off; + char pad1[4]; + uint64_t ol_flags; /**< Offload features. */ + char pad2[4]; + + /**< Total pkt len: sum of all segment data_len. */ + uint32_t pkt_len; + + /**< Amount of data in segment buffer. */ + uint16_t data_len; + + /* fields on second cache line */ + char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE))); + void *pool; + void *next; +}; + +/* + * Struct used to create a KDP device. Passed to the kernel in IOCTL call + */ +struct rte_kdp_device_info { + char name[RTE_KDP_NAMESIZE]; /**< Network device name for KDP */ + + phys_addr_t tx_phys; + phys_addr_t rx_phys; + phys_addr_t alloc_phys; + phys_addr_t free_phys; + + /* mbuf mempool */ + void *mbuf_va; + phys_addr_t mbuf_phys; + + uint16_t port_id; /**< Group ID */ + uint32_t core_id; /**< core ID to bind for kernel thread */ + + uint8_t force_bind : 1; /**< Flag for kernel thread binding */ + + /* mbuf size */ + unsigned mbuf_size; +}; + +enum { + IFLA_KDP_UNSPEC, + IFLA_KDP_PORTID, + IFLA_KDP_DEVINFO, + __IFLA_KDP_MAX, +}; +#define IFLA_KDP_MAX (__IFLA_KDP_MAX - 1) + +#endif /* _RTE_KDP_COMMON_H_ */ diff --git a/lib/librte_eal/linuxapp/kdp/Makefile b/lib/librte_eal/linuxapp/kdp/Makefile new file mode 100644 index 0000000..3897dc6 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/Makefile @@ -0,0 +1,55 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. 
+# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# module name and path +# +MODULE = rte_kdp + +# +# CFLAGS +# +MODULE_CFLAGS += -I$(SRCDIR) --param max-inline-insns-single=50 +MODULE_CFLAGS += -I$(RTE_OUTPUT)/include +MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h +MODULE_CFLAGS += -Wall -Werror + +# this lib needs main eal +DEPDIRS-y += lib/librte_eal/linuxapp/eal + +# +# all source are stored in SRCS-y +# +SRCS-y += kdp_net.c + +include $(RTE_SDK)/mk/rte.module.mk diff --git a/lib/librte_eal/linuxapp/kdp/kdp_dev.h b/lib/librte_eal/linuxapp/kdp/kdp_dev.h new file mode 100644 index 0000000..61f4288 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_dev.h @@ -0,0 +1,78 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#ifndef _KDP_DEV_H_ +#define _KDP_DEV_H_ + +#include <exec-env/rte_kdp_common.h> + +/** + * A structure describing the private information for a kdp device. + */ +struct kdp_dev { + /* kdp list */ + struct list_head list; + + struct net_device_stats stats; + uint16_t port_id; /* Group ID of a group of KDP devices */ + unsigned core_id; /* Core ID to bind */ + char name[RTE_KDP_NAMESIZE]; /* Network device name */ + struct task_struct *pthread; + + /* wait queue for req/resp */ + wait_queue_head_t wq; + struct mutex sync_lock; + + /* kdp device */ + struct net_device *net_dev; + + /* queue for packets to be sent out */ + void *tx_q; + + /* queue for the packets received */ + void *rx_q; + + /* queue for the allocated mbufs those can be used to save sk buffs */ + void *alloc_q; + + /* free queue for the mbufs to be freed */ + void *free_q; + + void *sync_kva; + void *sync_va; + + void *mbuf_kva; + void *mbuf_va; + + /* mbuf size */ + unsigned mbuf_size; +}; + +#define KDP_ERR(args...) printk(KERN_ERR "KDP: " args) +#define KDP_PRINT(args...) printk(KERN_DEBUG "KDP: " args) + +#ifdef RTE_KDP_KO_DEBUG +#define KDP_DBG(args...) printk(KERN_DEBUG "KDP: " args) +#else +#define KDP_DBG(args...) 
+#endif + +#endif diff --git a/lib/librte_eal/linuxapp/kdp/kdp_fifo.h b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h new file mode 100644 index 0000000..a5fe080 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h @@ -0,0 +1,91 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#ifndef _KDP_FIFO_H_ +#define _KDP_FIFO_H_ + +#include <exec-env/rte_kdp_common.h> + +/** + * Adds num elements into the fifo. Return the number actually written + */ +static inline unsigned +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned fifo_write = fifo->write; + unsigned fifo_read = fifo->read; + unsigned new_write = fifo_write; + + for (i = 0; i < num; i++) { + new_write = (new_write + 1) & (fifo->len - 1); + + if (new_write == fifo_read) + break; + fifo->buffer[fifo_write] = data[i]; + fifo_write = new_write; + } + fifo->write = fifo_write; + + return i; +} + +/** + * Get up to num elements from the fifo. 
Return the number actully read + */ +static inline unsigned +kdp_fifo_get(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned new_read = fifo->read; + unsigned fifo_write = fifo->write; + + for (i = 0; i < num; i++) { + if (new_read == fifo_write) + break; + + data[i] = fifo->buffer[new_read]; + new_read = (new_read + 1) & (fifo->len - 1); + } + fifo->read = new_read; + + return i; +} + +/** + * Get the num of elements in the fifo + */ +static inline unsigned +kdp_fifo_count(struct rte_kdp_fifo *fifo) +{ + return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1); +} + +/** + * Get the num of available elements in the fifo + */ +static inline unsigned +kdp_fifo_free_count(struct rte_kdp_fifo *fifo) +{ + return (fifo->read - fifo->write - 1) & (fifo->len - 1); +} + +#endif /* _KDP_FIFO_H_ */ diff --git a/lib/librte_eal/linuxapp/kdp/kdp_net.c b/lib/librte_eal/linuxapp/kdp/kdp_net.c new file mode 100644 index 0000000..08229f1 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_net.c @@ -0,0 +1,862 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +/* + * This code is inspired from the book "Linux Device Drivers" by + * Alessandro Rubini and Jonathan Corbet, published by O'Reilly & Associates + */ + +#include <linux/version.h> +#include <linux/module.h> +#include <linux/etherdevice.h> /* eth_type_trans */ +#include <linux/kthread.h> +#include <net/rtnetlink.h> + +#include "kdp_fifo.h" +#include "kdp_dev.h" + +#define WD_TIMEOUT 5 /*jiffies */ +#define MBUF_BURST_SZ 32 + +#define KDP_RX_LOOP_NUM 1000 +#define KDP_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ + +static struct task_struct *kdp_kthread; +static struct rw_semaphore kdp_list_lock; +static struct list_head kdp_list_head; + +/* loopback mode */ +static char *lo_mode; + +/* Kernel thread mode */ +static char *kthread_mode; +static unsigned multiple_kthread_on; + +/* typedef for rx function */ +typedef void (*kdp_net_rx_t)(struct kdp_dev *kdp); + +/* + * Open and close + */ +static int kdp_net_open(struct net_device *dev) +{ + random_ether_addr(dev->dev_addr); + netif_start_queue(dev); + + return 0; +} + +static int kdp_net_release(struct net_device *dev) +{ + netif_stop_queue(dev); /* can't transmit any more */ + + return 0; +} + +/* + * Configuration changes (passed on by ifconfig) + */ +static int kdp_net_config(struct net_device *dev, struct ifmap *map) +{ + if (dev->flags & IFF_UP) /* can't act on a running interface */ + return -EBUSY; + + /* ignore other fields */ + return 0; +} + +/* + * Transmit a packet (called by the kernel) + */ +static int kdp_net_tx(struct sk_buff *skb, struct net_device *dev) +{ + int len = 0; + unsigned ret; + struct kdp_dev *kdp = netdev_priv(dev); + struct rte_kdp_mbuf *pkt_kva = NULL; + struct rte_kdp_mbuf *pkt_va = NULL; + + dev->trans_start = jiffies; /* save the timestamp */ + + /* Check if the length of skb is less than mbuf size */ + if (skb->len > 
kdp->mbuf_size) + goto drop; + + /** + * Check if it has at least one free entry in tx_q and + * one entry in alloc_q. + */ + if (kdp_fifo_free_count(kdp->tx_q) == 0 || + kdp_fifo_count(kdp->alloc_q) == 0) { + /** + * If no free entry in tx_q or no entry in alloc_q, + * drops skb and goes out. + */ + goto drop; + } + + /* dequeue a mbuf from alloc_q */ + ret = kdp_fifo_get(kdp->alloc_q, (void **)&pkt_va, 1); + if (likely(ret == 1)) { + void *data_kva; + + pkt_kva = (void *)pkt_va - kdp->mbuf_va + kdp->mbuf_kva; + data_kva = pkt_kva->buf_addr + pkt_kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + + len = skb->len; + memcpy(data_kva, skb->data, len); + if (unlikely(len < ETH_ZLEN)) { + memset(data_kva + len, 0, ETH_ZLEN - len); + len = ETH_ZLEN; + } + pkt_kva->pkt_len = len; + pkt_kva->data_len = len; + + /* enqueue mbuf into tx_q */ + ret = kdp_fifo_put(kdp->tx_q, (void **)&pkt_va, 1); + if (unlikely(ret != 1)) { + /* Failing should not happen */ + KDP_ERR("Fail to enqueue mbuf into tx_q\n"); + goto drop; + } + } else { + /* Failing should not happen */ + KDP_ERR("Fail to dequeue mbuf from alloc_q\n"); + goto drop; + } + + /* Free skb and update statistics */ + dev_kfree_skb(skb); + kdp->stats.tx_bytes += len; + kdp->stats.tx_packets++; + + return NETDEV_TX_OK; + +drop: + /* Free skb and update statistics */ + dev_kfree_skb(skb); + kdp->stats.tx_dropped++; + + return NETDEV_TX_OK; +} + +static int kdp_net_change_mtu(struct net_device *dev, int new_mtu) +{ + KDP_DBG("kdp_net_change_mtu new mtu %d to be set\n", new_mtu); + + dev->mtu = new_mtu; + + return 0; +} + +/* + * Ioctl commands + */ +static int kdp_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) +{ + KDP_DBG("kdp_net_ioctl %d\n", + ((struct kdp_dev *)netdev_priv(dev))->port_id); + + return 0; +} + +static void kdp_net_set_rx_mode(struct net_device *dev) +{ +} + +/* + * Return statistics to the caller + */ +static struct net_device_stats *kdp_net_stats(struct net_device *dev) +{ + struct kdp_dev 
*kdp = netdev_priv(dev); + + return &kdp->stats; +} + +/* + * Deal with a transmit timeout. + */ +static void kdp_net_tx_timeout(struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + + KDP_DBG("Transmit timeout at %ld, latency %ld\n", jiffies, + jiffies - dev->trans_start); + + kdp->stats.tx_errors++; + netif_wake_queue(dev); +} + +/** + * kdp_net_set_mac - Change the Ethernet Address of the KDP NIC + * @netdev: network interface device structure + * @p: pointer to an address structure + * + * Returns 0 on success, negative on failure + **/ +static int kdp_net_set_mac(struct net_device *netdev, void *p) +{ + struct sockaddr *addr = p; + if (!is_valid_ether_addr((unsigned char *)(addr->sa_data))) + return -EADDRNOTAVAIL; + memcpy(netdev->dev_addr, addr->sa_data, netdev->addr_len); + + return 0; +} + +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)) +static int kdp_net_change_carrier(struct net_device *dev, bool new_carrier) +{ + if (new_carrier) + netif_carrier_on(dev); + else + netif_carrier_off(dev); + + return 0; +} +#endif + +static const struct net_device_ops kdp_net_netdev_ops = { + .ndo_open = kdp_net_open, + .ndo_stop = kdp_net_release, + .ndo_set_config = kdp_net_config, + .ndo_start_xmit = kdp_net_tx, + .ndo_change_mtu = kdp_net_change_mtu, + .ndo_do_ioctl = kdp_net_ioctl, + .ndo_set_rx_mode = kdp_net_set_rx_mode, + .ndo_get_stats = kdp_net_stats, + .ndo_tx_timeout = kdp_net_tx_timeout, + .ndo_set_mac_address = kdp_net_set_mac, +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)) + .ndo_change_carrier = kdp_net_change_carrier, +#endif +}; + +/* + * Fill the eth header + */ +static int kdp_net_header(struct sk_buff *skb, struct net_device *dev, + unsigned short type, const void *daddr, + const void *saddr, unsigned int len) +{ + struct ethhdr *eth = (struct ethhdr *) skb_push(skb, ETH_HLEN); + + memcpy(eth->h_source, saddr ? saddr : dev->dev_addr, dev->addr_len); + memcpy(eth->h_dest, daddr ? 
daddr : dev->dev_addr, dev->addr_len); + eth->h_proto = htons(type); + + return dev->hard_header_len; +} + +/* + * Re-fill the eth header + */ +#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)) +static int kdp_net_rebuild_header(struct sk_buff *skb) +{ + struct net_device *dev = skb->dev; + struct ethhdr *eth = (struct ethhdr *) skb->data; + + memcpy(eth->h_source, dev->dev_addr, dev->addr_len); + memcpy(eth->h_dest, dev->dev_addr, dev->addr_len); + + return 0; +} +#endif /* < 4.1.0 */ + +static const struct header_ops kdp_net_header_ops = { + .create = kdp_net_header, +#if (LINUX_VERSION_CODE < KERNEL_VERSION(4, 1, 0)) + .rebuild = kdp_net_rebuild_header, +#endif /* < 4.1.0 */ + .cache = NULL, /* disable caching */ +}; + +static void kdp_net_setup(struct net_device *dev) +{ + struct kdp_dev *kdp; + + ether_setup(dev); + dev->netdev_ops = &kdp_net_netdev_ops; + dev->header_ops = &kdp_net_header_ops; + dev->watchdog_timeo = WD_TIMEOUT; + + kdp = netdev_priv(dev); + init_waitqueue_head(&kdp->wq); + mutex_init(&kdp->sync_lock); + + dev->flags |= IFF_UP; +} + +/* + * RX: normal working mode + */ +static void kdp_net_rx_normal(struct kdp_dev *kdp) +{ + unsigned ret; + uint32_t len; + unsigned i, num_rx, num_fq; + struct rte_kdp_mbuf *kva; + struct rte_kdp_mbuf *va[MBUF_BURST_SZ]; + void *data_kva; + unsigned mbuf_burst_size = MBUF_BURST_SZ; + + struct sk_buff *skb; + struct net_device *dev = kdp->net_dev; + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + if (num_fq == 0) { + /* No room on the free_q, bail out */ + return; + } + + /* Calculate the number of entries to dequeue from rx_q */ + num_rx = min(num_fq, mbuf_burst_size); + + /* Burst dequeue from rx_q */ + num_rx = kdp_fifo_get(kdp->rx_q, (void **)va, num_rx); + if (num_rx == 0) + return; + + /* Transfer received packets to netif */ + for (i = 0; i < num_rx; i++) { + kva = (void *)va[i] - kdp->mbuf_va + kdp->mbuf_kva; + len = kva->data_len; + data_kva = 
kva->buf_addr + kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + + skb = dev_alloc_skb(len + 2); + if (!skb) { + KDP_ERR("Out of mem, dropping pkts\n"); + /* Update statistics */ + kdp->stats.rx_dropped++; + } else { + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data_kva, len); + skb->dev = dev; + skb->protocol = eth_type_trans(skb, dev); + skb->ip_summed = CHECKSUM_UNNECESSARY; + + /* Call netif interface */ + netif_rx(skb); + + /* Update statistics */ + kdp->stats.rx_bytes += len; + kdp->stats.rx_packets++; + } + } + + /* Burst enqueue mbufs into free_q */ + ret = kdp_fifo_put(kdp->free_q, (void **)va, num_rx); + if (ret != num_rx) + /* Failing should not happen */ + KDP_ERR("Fail to enqueue entries into free_q\n"); +} + +/* + * RX: loopback with enqueue/dequeue fifos. + */ +static void kdp_net_rx_lo_fifo(struct kdp_dev *kdp) +{ + unsigned ret; + uint32_t len; + unsigned i, num, num_rq, num_tq, num_aq, num_fq; + struct rte_kdp_mbuf *kva; + struct rte_kdp_mbuf *va[MBUF_BURST_SZ]; + void *data_kva; + struct rte_kdp_mbuf *alloc_kva; + struct rte_kdp_mbuf *alloc_va[MBUF_BURST_SZ]; + void *alloc_data_kva; + unsigned mbuf_burst_size = MBUF_BURST_SZ; + + /* Get the number of entries in rx_q */ + num_rq = kdp_fifo_count(kdp->rx_q); + + /* Get the number of free entrie in tx_q */ + num_tq = kdp_fifo_free_count(kdp->tx_q); + + /* Get the number of entries in alloc_q */ + num_aq = kdp_fifo_count(kdp->alloc_q); + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + + /* Calculate the number of entries to be dequeued from rx_q */ + num = min(num_rq, num_tq); + num = min(num, num_aq); + num = min(num, num_fq); + num = min(num, mbuf_burst_size); + + /* Return if no entry to dequeue from rx_q */ + if (num == 0) + return; + + /* Burst dequeue from rx_q */ + ret = kdp_fifo_get(kdp->rx_q, (void **)va, num); + if (ret == 0) + return; /* Failing should not happen */ + + /* Dequeue entries from alloc_q 
*/ + ret = kdp_fifo_get(kdp->alloc_q, (void **)alloc_va, num); + if (ret) { + num = ret; + /* Copy mbufs */ + for (i = 0; i < num; i++) { + kva = (void *)va[i] - kdp->mbuf_va + kdp->mbuf_kva; + len = kva->pkt_len; + data_kva = kva->buf_addr + kva->data_off - + kdp->mbuf_va + kdp->mbuf_kva; + + alloc_kva = (void *)alloc_va[i] - kdp->mbuf_va + + kdp->mbuf_kva; + alloc_data_kva = alloc_kva->buf_addr + + alloc_kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + memcpy(alloc_data_kva, data_kva, len); + alloc_kva->pkt_len = len; + alloc_kva->data_len = len; + + kdp->stats.tx_bytes += len; + kdp->stats.rx_bytes += len; + } + + /* Burst enqueue mbufs into tx_q */ + ret = kdp_fifo_put(kdp->tx_q, (void **)alloc_va, num); + if (ret != num) + /* Failing should not happen */ + KDP_ERR("Fail to enqueue mbufs into tx_q\n"); + } + + /* Burst enqueue mbufs into free_q */ + ret = kdp_fifo_put(kdp->free_q, (void **)va, num); + if (ret != num) + /* Failing should not happen */ + KDP_ERR("Fail to enqueue mbufs into free_q\n"); + + /** + * Update statistic, and enqueue/dequeue failure is impossible, + * as all queues are checked at first. + */ + kdp->stats.tx_packets += num; + kdp->stats.rx_packets += num; +} + +/* + * RX: loopback with enqueue/dequeue fifos and sk buffer copies. 
+ */ +static void kdp_net_rx_lo_fifo_skb(struct kdp_dev *kdp) +{ + unsigned ret; + uint32_t len; + unsigned i, num_rq, num_fq, num; + struct rte_kdp_mbuf *kva; + struct rte_kdp_mbuf *va[MBUF_BURST_SZ]; + void *data_kva; + struct sk_buff *skb; + struct net_device *dev = kdp->net_dev; + unsigned mbuf_burst_size = MBUF_BURST_SZ; + + /* Get the number of entries in rx_q */ + num_rq = kdp_fifo_count(kdp->rx_q); + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + + /* Calculate the number of entries to dequeue from rx_q */ + num = min(num_rq, num_fq); + num = min(num, mbuf_burst_size); + + /* Return if no entry to dequeue from rx_q */ + if (num == 0) + return; + + /* Burst dequeue mbufs from rx_q */ + ret = kdp_fifo_get(kdp->rx_q, (void **)va, num); + if (ret == 0) + return; + + /* Copy mbufs to sk buffer and then call tx interface */ + for (i = 0; i < num; i++) { + kva = (void *)va[i] - kdp->mbuf_va + kdp->mbuf_kva; + len = kva->data_len; + data_kva = kva->buf_addr + kva->data_off - kdp->mbuf_va + + kdp->mbuf_kva; + + skb = dev_alloc_skb(len + 2); + if (skb == NULL) + KDP_ERR("Out of mem, dropping pkts\n"); + else { + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data_kva, len); + skb->dev = dev; + skb->ip_summed = CHECKSUM_UNNECESSARY; + dev_kfree_skb(skb); + } + + /* Simulate real usage, allocate/copy skb twice */ + skb = dev_alloc_skb(len + 2); + if (skb == NULL) { + KDP_ERR("Out of mem, dropping pkts\n"); + kdp->stats.rx_dropped++; + } else { + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data_kva, len); + skb->dev = dev; + skb->ip_summed = CHECKSUM_UNNECESSARY; + + kdp->stats.rx_bytes += len; + kdp->stats.rx_packets++; + + /* call tx interface */ + kdp_net_tx(skb, dev); + } + } + + /* enqueue all the mbufs from rx_q into free_q */ + ret = kdp_fifo_put(kdp->free_q, (void **)&va, num); + if (ret != num) + /* Failing should not happen */ + 
KDP_ERR("Fail to enqueue mbufs into free_q\n"); +} + +/* kdp rx function pointer, with default to normal rx */ +static kdp_net_rx_t kdp_net_rx_func = kdp_net_rx_normal; + +/* rx interface */ +static void kdp_net_rx(struct kdp_dev *kdp) +{ + /** + * It doesn't need to check if it is NULL pointer, + * as it has a default value + */ + (*kdp_net_rx_func)(kdp); +} + +static int kdp_thread_single(void *data) +{ + struct kdp_dev *dev; + int j; + + while (!kthread_should_stop()) { + down_read(&kdp_list_lock); + for (j = 0; j < KDP_RX_LOOP_NUM; j++) { + list_for_each_entry(dev, &kdp_list_head, list) { + kdp_net_rx(dev); + } + } + up_read(&kdp_list_lock); +#ifdef RTE_KDP_PREEMPT_DEFAULT + /* reschedule out for a while */ + schedule_timeout_interruptible( + usecs_to_jiffies(KDP_KTHREAD_RESCHEDULE_INTERVAL)); +#endif + } + + return 0; +} + +static int kdp_thread_multiple(void *param) +{ + int j; + struct kdp_dev *dev = (struct kdp_dev *)param; + + while (!kthread_should_stop()) { + for (j = 0; j < KDP_RX_LOOP_NUM; j++) + kdp_net_rx(dev); + +#ifdef RTE_KDP_PREEMPT_DEFAULT + schedule_timeout_interruptible( + usecs_to_jiffies(KDP_KTHREAD_RESCHEDULE_INTERVAL)); +#endif + } + + return 0; +} + +static void kdp_setup(struct kdp_dev *kdp, + struct rte_kdp_device_info *info) +{ + kdp->port_id = info->port_id; + kdp->core_id = info->core_id; + strncpy(kdp->name, info->name, RTE_KDP_NAMESIZE); + + /* Translate user space info into kernel space info */ + kdp->tx_q = phys_to_virt(info->tx_phys); + kdp->rx_q = phys_to_virt(info->rx_phys); + kdp->alloc_q = phys_to_virt(info->alloc_phys); + kdp->free_q = phys_to_virt(info->free_phys); + + kdp->mbuf_kva = phys_to_virt(info->mbuf_phys); + kdp->mbuf_va = info->mbuf_va; + + kdp->mbuf_size = info->mbuf_size; + + KDP_PRINT("tx_phys: 0x%016llx, tx_q addr: 0x%p\n", + (unsigned long long) info->tx_phys, kdp->tx_q); + KDP_PRINT("rx_phys: 0x%016llx, rx_q addr: 0x%p\n", + (unsigned long long) info->rx_phys, kdp->rx_q); + KDP_PRINT("alloc_phys: 0x%016llx, 
alloc_q addr: 0x%p\n", + (unsigned long long) info->alloc_phys, kdp->alloc_q); + KDP_PRINT("free_phys: 0x%016llx, free_q addr: 0x%p\n", + (unsigned long long) info->free_phys, kdp->free_q); + KDP_PRINT("mbuf_phys: 0x%016llx, mbuf_kva: 0x%p\n", + (unsigned long long) info->mbuf_phys, kdp->mbuf_kva); + KDP_PRINT("mbuf_va: 0x%p\n", info->mbuf_va); + KDP_PRINT("mbuf_size: %u\n", kdp->mbuf_size); +} + +static int create_kthread(struct kdp_dev *kdp, + struct rte_kdp_device_info *info) +{ + /** + * Create a new kernel thread for multiple mode, set its core affinity, + * and finally wake it up. + */ + if (multiple_kthread_on) { + kdp->pthread = kthread_create(kdp_thread_multiple, + (void *)kdp, "kdp_%s", kdp->name); + if (IS_ERR(kdp->pthread)) + return -ECANCELED; + + if (info->force_bind) + kthread_bind(kdp->pthread, kdp->core_id); + + wake_up_process(kdp->pthread); + + return 0; + } + + /* single thread */ + if (kdp_kthread == NULL) { + KDP_PRINT("Single kernel thread for all KDP devices\n"); + + /* Create kernel thread for RX */ + kdp_kthread = kthread_run(kdp_thread_single, NULL, + "kdp_single"); + if (IS_ERR(kdp_kthread)) { + KDP_ERR("Unable to create kernel threaed\n"); + return PTR_ERR(kdp_kthread); + } + } + + return 0; +} + +static int kdp_net_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct rte_kdp_device_info dev_info; + struct kdp_dev *kdp; + int ret; + + kdp = netdev_priv(dev); + + if (data && data[IFLA_KDP_PORTID]) + kdp->port_id = nla_get_u8(data[IFLA_KDP_PORTID]); + else + goto error_free; + + if (data && data[IFLA_KDP_DEVINFO]) + memcpy(&dev_info, nla_data(data[IFLA_KDP_DEVINFO]), + sizeof(struct rte_kdp_device_info)); + else + goto error_free; + + /** + * Check if the cpu core id is valid for binding, + * for multiple kernel thread mode. 
+ */ + if (multiple_kthread_on && dev_info.force_bind && + !cpu_online(dev_info.core_id)) { + KDP_ERR("cpu %u is not online\n", dev_info.core_id); + goto error_free; + } + + kdp->net_dev = dev; + kdp_setup(kdp, &dev_info); + + ret = register_netdevice(dev); + if (ret < 0) + goto error_free; + + ret = create_kthread(kdp, &dev_info); + if (ret < 0) + goto error_unregister; + + down_write(&kdp_list_lock); + list_add(&kdp->list, &kdp_list_head); + up_write(&kdp_list_lock); + + return 0; + +error_unregister: + unregister_netdev(dev); +error_free: + free_netdev(dev); + return -EINVAL; +} + +static void single_kthread_stop(void) +{ + /* Stop kernel thread for single mode */ + if (multiple_kthread_on == 0 && kdp_kthread != NULL) { + kthread_stop(kdp_kthread); + kdp_kthread = NULL; + } +} + +static void multiple_kthread_stop(struct kdp_dev *kdp) +{ + /* Stop kernel thread for multiple mode */ + if (multiple_kthread_on && kdp->pthread != NULL) { + kthread_stop(kdp->pthread); + kdp->pthread = NULL; + } +} + +static void kdp_net_dellink(struct net_device *dev, struct list_head *head) +{ + struct kdp_dev *kdp; + + kdp = netdev_priv(dev); + + down_write(&kdp_list_lock); + list_del(&kdp->list); + up_write(&kdp_list_lock); + + multiple_kthread_stop(kdp); + + down_write(&kdp_list_lock); + if (list_empty(&kdp_list_head)) + single_kthread_stop(); + up_write(&kdp_list_lock); + + unregister_netdevice_queue(dev, head); +} + +static struct rtnl_link_ops kdp_link_ops __read_mostly = { + .kind = KDP_DEVICE, + .priv_size = sizeof(struct kdp_dev), + .setup = kdp_net_setup, + .maxtype = IFLA_KDP_MAX, + .newlink = kdp_net_newlink, + .dellink = kdp_net_dellink, +}; + +static int __init +kdp_parse_kthread_mode(void) +{ + if (!kthread_mode) + return 0; + + if (strcmp(kthread_mode, "single") == 0) + return 0; + else if (strcmp(kthread_mode, "multiple") == 0) + multiple_kthread_on = 1; + else + return -1; + + return 0; +} + +static void kdp_net_config_lo_mode(char *lo_str) +{ + if (!lo_str) { + 
KDP_PRINT("loopback disabled"); + return; + } + + if (!strcmp(lo_str, "lo_mode_none")) + KDP_PRINT("loopback disabled"); + else if (!strcmp(lo_str, "lo_mode_fifo")) { + KDP_PRINT("loopback mode=lo_mode_fifo enabled"); + kdp_net_rx_func = kdp_net_rx_lo_fifo; + } else if (!strcmp(lo_str, "lo_mode_fifo_skb")) { + KDP_PRINT("loopback mode=lo_mode_fifo_skb enabled"); + kdp_net_rx_func = kdp_net_rx_lo_fifo_skb; + } else + KDP_PRINT("Incognizant parameter, loopback disabled"); +} + +static int __init kdp_init(void) +{ + if (kdp_parse_kthread_mode() < 0) { + KDP_ERR("Invalid parameter for kthread_mode\n"); + return -EINVAL; + } + + /* Configure the lo mode according to the input parameter */ + kdp_net_config_lo_mode(lo_mode); + + init_rwsem(&kdp_list_lock); + INIT_LIST_HEAD(&kdp_list_head); + + return rtnl_link_register(&kdp_link_ops); +} +module_init(kdp_init); + +static void kdp_release(void) +{ + struct kdp_dev *kdp, *n; + + single_kthread_stop(); + + down_write(&kdp_list_lock); + list_for_each_entry_safe(kdp, n, &kdp_list_head, list) { + multiple_kthread_stop(kdp); + list_del(&kdp->list); + } + up_write(&kdp_list_lock); +} + +static void __exit kdp_exit(void) +{ + kdp_release(); + rtnl_link_unregister(&kdp_link_ops); +} +module_exit(kdp_exit); + +module_param(lo_mode, charp, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(lo_mode, +"KDP loopback mode (default=lo_mode_none):\n" +" lo_mode_none Kernel loopback disabled\n" +" lo_mode_fifo Enable kernel loopback with fifo\n" +" lo_mode_fifo_skb Enable kernel loopback with fifo and skb buffer\n" +"\n" +); + +module_param(kthread_mode, charp, S_IRUGO); +MODULE_PARM_DESC(kthread_mode, +"Kernel thread mode (default=single):\n" +" single Single kernel thread mode enabled.\n" +" multiple Multiple kernel thread mode enabled.\n" +"\n" +); + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Intel Corporation"); +MODULE_DESCRIPTION("Kernel Module for managing kdp devices"); -- 2.5.0
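The FIFO helpers in kdp_fifo.h above rely on the ring length being a power of two, so that wrap-around is a single mask operation, and on one slot always staying empty so that write == read unambiguously means "empty". The following is a minimal userspace sketch of that scheme, not the module's code: the demo_fifo names and demo_fifo_create() are hypothetical, introduced only for illustration, and the sketch ignores the memory-ordering concerns of a real producer/consumer pair sharing memory between userspace and the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_kdp_fifo; names are illustrative. */
struct demo_fifo {
	volatile unsigned write; /* next slot to be written */
	volatile unsigned read;  /* next slot to be read */
	unsigned len;            /* number of slots, must be a power of two */
	void *buffer[];          /* slots holding opaque pointers */
};

/* Allocate an empty fifo with len slots (assumed helper, not in the patch). */
static struct demo_fifo *demo_fifo_create(unsigned len)
{
	struct demo_fifo *f;

	/* The mask arithmetic below is only valid for power-of-two lengths. */
	assert(len >= 2 && (len & (len - 1)) == 0);
	f = calloc(1, sizeof(*f) + len * sizeof(void *));
	if (f != NULL)
		f->len = len;
	return f;
}

/* Add up to num elements; return how many were actually written. */
static unsigned demo_fifo_put(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned w = f->write;
	unsigned r = f->read;

	for (i = 0; i < num; i++) {
		unsigned next = (w + 1) & (f->len - 1);

		if (next == r) /* writing would catch up with read: full */
			break;
		f->buffer[w] = data[i];
		w = next;
	}
	f->write = w;
	return i;
}

/* Remove up to num elements; return how many were actually read. */
static unsigned demo_fifo_get(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned r = f->read;
	unsigned w = f->write;

	for (i = 0; i < num; i++) {
		if (r == w) /* empty */
			break;
		data[i] = f->buffer[r];
		r = (r + 1) & (f->len - 1);
	}
	f->read = r;
	return i;
}

/* Number of elements currently stored. */
static unsigned demo_fifo_count(struct demo_fifo *f)
{
	return (f->len + f->write - f->read) & (f->len - 1);
}

/* Number of elements that can still be stored. */
static unsigned demo_fifo_free_count(struct demo_fifo *f)
{
	return (f->read - f->write - 1) & (f->len - 1);
}
```

Because one slot is always left unused, a fifo created with len slots holds at most len - 1 entries. That is why kdp_net_tx() can treat a free count of zero on tx_q as "full" before it dequeues an mbuf from alloc_q.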
* [dpdk-dev] [PATCH v2 2/2] kdp: add virtual PMD for kernel slow data path communication 2016-02-19 5:05 ` [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit 2016-02-19 5:05 ` [dpdk-dev] [PATCH v2 1/2] kdp: add kernel data path kernel module Ferruh Yigit @ 2016-02-19 5:05 ` Ferruh Yigit 2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit 2 siblings, 0 replies; 29+ messages in thread From: Ferruh Yigit @ 2016-02-19 5:05 UTC To: dev This patch provides slow data path communication with the Linux kernel. It is based on librte_kni and heavily reuses it. The main difference is that the librte_kni library is converted into a PMD for ease of use. Any application can now use slow path communication without modification, thanks to the existing EAL support for virtual PMDs. The PMD supports two methods of sending packets to Linux: the first is a custom FIFO implementation that relies on the KDP kernel module, the second is the Linux in-kernel tun/tap support. The PMD first checks for the KDP kernel module; if that fails, it tries to create and use a tap interface. With the FIFO method, the PMD's rx_pkt_burst() gets packets from the FIFO and tx_pkt_burst() puts packets into the FIFO. The corresponding Linux virtual network device driver code also gets packets from and puts packets into the FIFO, as if they were coming from hardware. With the tun/tap method, no external kernel module is required; the PMD reads packets from and writes packets to the tap interface file descriptor. The tap interface has a performance penalty compared to the FIFO implementation.
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> --- v2: * Use rtnetlink to create interfaces --- MAINTAINERS | 1 + config/common_linuxapp | 1 + doc/guides/nics/pcap_ring.rst | 125 ++++++- doc/guides/rel_notes/release_16_04.rst | 6 + drivers/net/Makefile | 3 +- drivers/net/kdp/Makefile | 61 +++ drivers/net/kdp/rte_eth_kdp.c | 501 +++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.c | 633 ++++++++++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.h | 116 ++++++ drivers/net/kdp/rte_kdp_fifo.h | 91 +++++ drivers/net/kdp/rte_kdp_tap.c | 101 +++++ drivers/net/kdp/rte_pmd_kdp_version.map | 4 + lib/librte_eal/common/include/rte_log.h | 3 +- mk/rte.app.mk | 3 +- 14 files changed, 1643 insertions(+), 6 deletions(-) create mode 100644 drivers/net/kdp/Makefile create mode 100644 drivers/net/kdp/rte_eth_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.h create mode 100644 drivers/net/kdp/rte_kdp_fifo.h create mode 100644 drivers/net/kdp/rte_kdp_tap.c create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map diff --git a/MAINTAINERS b/MAINTAINERS index 05ffe26..deaeea3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -260,6 +260,7 @@ F: doc/guides/sample_app_ug/kernel_nic_interface.rst Linux KDP M: Ferruh Yigit <ferruh.yigit@gmail.com> F: lib/librte_eal/linuxapp/kdp/ +F: drivers/net/kdp/ Linux AF_PACKET M: John W. 
Linville <linville@tuxdriver.com> diff --git a/config/common_linuxapp b/config/common_linuxapp index e1b5032..aa13719 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -316,6 +316,7 @@ CONFIG_RTE_LIBRTE_PMD_NULL=y # # Compile KDP PMD # +CONFIG_RTE_LIBRTE_PMD_KDP=y CONFIG_RTE_KDP_KMOD=y CONFIG_RTE_KDP_PREEMPT_DEFAULT=y diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst index aa48d33..b602e65 100644 --- a/doc/guides/nics/pcap_ring.rst +++ b/doc/guides/nics/pcap_ring.rst @@ -28,11 +28,11 @@ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -Libpcap and Ring Based Poll Mode Drivers -======================================== +Software Poll Mode Drivers +========================== In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware, -the DPDK also includes two pure-software PMDs. These two drivers are: +the DPDK also includes pure-software PMDs. These drivers are: * A libpcap -based PMD (librte_pmd_pcap) that reads and writes packets using libpcap, - both from files on disk, as well as from physical NIC devices using standard Linux kernel drivers. @@ -40,6 +40,10 @@ the DPDK also includes two pure-software PMDs. These two drivers are: * A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs (that is, rte_ring) to be accessed using the PMD APIs, as though they were physical NICs. +* A slow data path PMD (librte_pmd_kdp) that allows send/get packets to/from OS network + stack as it is a physical NIC. + + .. note:: The libpcap -based PMD is disabled by default in the build configuration files, @@ -211,6 +215,121 @@ Multiple devices may be specified, separated by commas. Done. +Kernel Data Path PMD +~~~~~~~~~~~~~~~~~~~~ + +Kernel Data Path (KDP) PMD is to communicate with OS network stack easily by application. + +.. code-block:: console + + ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 -- -i + ... 
+ Configuring Port 0 (socket 0) + Port 0: 00:00:00:00:00:00 + Configuring Port 1 (socket 0) + Port 1: 00:00:00:00:00:00 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + +KDP PMD supports two type of communication: + +* Custom FIFO implementation +* tun/tap implementation + +Custom FIFO implementation gives more performance but requires KDP kernel module (rte_kdp.ko) inserted. + +By default FIFO communication has priority, if KDP kernel module is not inserted, tun/tap communication used. + +If KDP kernel module inserted, above testpmd command will create following virtual interfaces, these can be used as any interface. + +.. code-block:: console + + # ifconfig kdp0; ifconfig kdp1 + kdp0: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + kdp1: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + +With tun/tap communication method, following interfaces are created: + +.. 
code-block:: console + + # ifconfig tap_kdp0; ifconfig tap_kdp1 + tap_kdp0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 + inet6 fe80::341f:afff:feb7:23db prefixlen 64 scopeid 0x20<link> + ether 36:1f:af:b7:23:db txqueuelen 500 (Ethernet) + RX packets 126624864 bytes 6184828655 (5.7 GiB) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 126236898 bytes 6150306636 (5.7 GiB) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + tap_kdp1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 + inet6 fe80::f030:b4ff:fe94:b720 prefixlen 64 scopeid 0x20<link> + ether f2:30:b4:94:b7:20 txqueuelen 500 (Ethernet) + RX packets 126237370 bytes 6150329717 (5.7 GiB) + RX errors 0 dropped 9 overruns 0 frame 0 + TX packets 126624896 bytes 6184826874 (5.7 GiB) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + +DPDK application can be used to forward packets between these interfaces: + +.. code-block:: console + + In Linux: + ip l add br0 type bridge + ip l set tap_kdp0 master br0 + ip l set tap_kdp1 master br0 + ip l set br0 up + ip l set tap_kdp0 up + ip l set tap_kdp1 up + + + In testpmd: + testpmd> start + io packet forwarding - CRC stripping disabled - packets/burst=32 + nb forwarding cores=1 - nb forwarding ports=2 + RX queues=1 - RX desc=128 - RX free threshold=0 + RX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX queues=1 - TX desc=512 - TX free threshold=0 + TX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX RS bit threshold=0 - TXQ flags=0x0 + testpmd> stop + Telling cores to stop... + Waiting for lcores to finish... 
+ + ---------------------- Forward statistics for port 0 ---------------------- + RX-packets: 973900 RX-dropped: 0 RX-total: 973900 + TX-packets: 973903 TX-dropped: 0 TX-total: 973903 + ---------------------------------------------------------------------------- + + ---------------------- Forward statistics for port 1 ---------------------- + RX-packets: 973903 RX-dropped: 0 RX-total: 973903 + TX-packets: 973900 TX-dropped: 0 TX-total: 973900 + ---------------------------------------------------------------------------- + + +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++ + RX-packets: 1947803 RX-dropped: 0 RX-total: 1947803 + TX-packets: 1947803 TX-dropped: 0 TX-total: 1947803 + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + + Done. + + + + + Using the Poll Mode Driver from an Application ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst index eb1b3b2..d17778c 100644 --- a/doc/guides/rel_notes/release_16_04.rst +++ b/doc/guides/rel_notes/release_16_04.rst @@ -44,6 +44,12 @@ This section should contain new features added in this release. Sample format: Add the offload and negotiation of checksum and TSO between vhost-user and vanilla Linux virtio guest. +* **Added Slow Data Path support.** + + * This is based on KNI work and in long term intends to replace it. + * Added Kernel Data Path (KDP) kernel module. + * Added KDP virtual PMD. + Resolved Issues --------------- diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 6e4497e..0be06f5 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -51,6 +51,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt +DIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += kdp include $(RTE_SDK)/mk/rte.sharelib.mk include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/net/kdp/Makefile b/drivers/net/kdp/Makefile new file mode 100644 index 0000000..035056e --- /dev/null +++ b/drivers/net/kdp/Makefile @@ -0,0 +1,61 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_kdp.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +EXPORT_MAP := rte_pmd_kdp_version.map + +LIBABIVER := 1 + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_eth_kdp.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp_tap.c + +# +# Export include files +# +SYMLINK-y-include += + +# this lib depends upon: +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_mbuf +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_ether + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/kdp/rte_eth_kdp.c b/drivers/net/kdp/rte_eth_kdp.c new file mode 100644 index 0000000..68dd734 --- /dev/null +++ b/drivers/net/kdp/rte_eth_kdp.c @@ -0,0 +1,501 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. 
+ * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include <rte_ethdev.h> + +#include "rte_kdp.h" + +#define MAX_PACKET_SZ 2048 + +struct pmd_queue_stats { + uint64_t pkts; + uint64_t bytes; + uint64_t err_pkts; +}; + +struct pmd_queue { + struct pmd_internals *internals; + struct rte_mempool *mb_pool; + + struct pmd_queue_stats rx; + struct pmd_queue_stats tx; +}; + +struct pmd_internals { + struct kdp_data *kdp; + struct kdp_tap_data *kdp_tap; + + struct pmd_queue rx_queues[RTE_MAX_QUEUES_PER_PORT]; + struct pmd_queue tx_queues[RTE_MAX_QUEUES_PER_PORT]; +}; + +static struct ether_addr eth_addr = { .addr_bytes = {0} }; +static const char *drivername = "KDP PMD"; +static struct rte_eth_link pmd_link = { + .link_speed = 10000, + .link_duplex = ETH_LINK_FULL_DUPLEX, + .link_status = 0 +}; + +static uint16_t +eth_kdp_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct kdp_data *kdp = kdp_q->internals->kdp; + uint16_t nb_pkts; + + nb_pkts = kdp_rx_burst(kdp, bufs, nb_bufs); + + kdp_q->rx.pkts += nb_pkts; + kdp_q->rx.err_pkts += nb_bufs - nb_pkts; + + return nb_pkts; +} + +static uint16_t +eth_kdp_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct kdp_data *kdp = kdp_q->internals->kdp; + uint16_t nb_pkts; + + nb_pkts = kdp_tx_burst(kdp, bufs, nb_bufs); + + kdp_q->tx.pkts += nb_pkts; + kdp_q->tx.err_pkts += nb_bufs - nb_pkts; + + return nb_pkts; +} + +static uint16_t +eth_kdp_tap_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + struct kdp_tap_data *kdp_tap = internals->kdp_tap; + struct rte_mbuf *m; + int ret; + unsigned i; + + for (i = 0; i < nb_bufs; i++) { + m = rte_pktmbuf_alloc(kdp_q->mb_pool); + bufs[i] = m; + ret = read(kdp_tap->tap_fd, rte_pktmbuf_mtod(m, void *), + MAX_PACKET_SZ); + if (ret < 0) { + rte_pktmbuf_free(m); + break; + } + + m->nb_segs = 1; + m->next = NULL; + m->pkt_len = (uint16_t)ret; + m->data_len = 
(uint16_t)ret; + } + + kdp_q->rx.pkts += i; + kdp_q->rx.err_pkts += nb_bufs - i; + + return i; +} + +static uint16_t +eth_kdp_tap_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + struct kdp_tap_data *kdp_tap = internals->kdp_tap; + struct rte_mbuf *m; + unsigned i; + + for (i = 0; i < nb_bufs; i++) { + m = bufs[i]; + write(kdp_tap->tap_fd, rte_pktmbuf_mtod(m, void*), + rte_pktmbuf_data_len(m)); + rte_pktmbuf_free(m); + } + + kdp_q->tx.pkts += i; + kdp_q->tx.err_pkts += nb_bufs - i; + + return i; +} + +static int +eth_kdp_start(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct kdp_conf conf; + uint16_t port_id = dev->data->port_id; + int ret = 0; + + snprintf(conf.name, RTE_KDP_NAMESIZE, KDP_DEVICE "%u", + port_id); + conf.force_bind = 0; + conf.port_id = port_id; + conf.mbuf_size = MAX_PACKET_SZ; + + ret = kdp_start(internals->kdp, + internals->rx_queues[0].mb_pool, + &conf); + if (ret) + RTE_LOG(ERR, KDP, "Fail to create kdp for port: %d\n", + port_id); + + return ret; +} + +static int +eth_kdp_dev_start(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + int ret; + + if (internals->kdp) { + ret = eth_kdp_start(dev); + if (ret) + return -1; + } + + dev->data->dev_link.link_status = 1; + return 0; +} + +static void +eth_kdp_dev_stop(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + if (internals->kdp) + kdp_stop(internals->kdp); + + dev->data->dev_link.link_status = 0; +} + +static void +eth_kdp_dev_close(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct kdp_data *kdp = internals->kdp; + struct kdp_tap_data *kdp_tap = internals->kdp_tap; + + if (kdp) { + kdp_close(kdp); + + rte_free(kdp); + internals->kdp = NULL; + } + + if (kdp_tap) { + kdp_tap_close(kdp_tap); + + rte_free(kdp_tap); + 
internals->kdp_tap = NULL; + } + + rte_free(dev->data->dev_private); + dev->data->dev_private = NULL; +} + +static int +eth_kdp_dev_configure(struct rte_eth_dev *dev __rte_unused) +{ + return 0; +} + +static void +eth_kdp_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) +{ + struct rte_eth_dev_data *data = dev->data; + + dev_info->driver_name = data->drv_name; + dev_info->max_mac_addrs = 1; + dev_info->max_rx_pktlen = (uint32_t)-1; + dev_info->max_rx_queues = data->nb_rx_queues; + dev_info->max_tx_queues = data->nb_tx_queues; + dev_info->min_rx_bufsize = 0; + dev_info->pci_dev = NULL; +} + +static int +eth_kdp_rx_queue_setup(struct rte_eth_dev *dev, + uint16_t rx_queue_id __rte_unused, + uint16_t nb_rx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_rxconf *rx_conf __rte_unused, + struct rte_mempool *mb_pool) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pmd_queue *q; + + q = &internals->rx_queues[rx_queue_id]; + q->internals = internals; + q->mb_pool = mb_pool; + + dev->data->rx_queues[rx_queue_id] = q; + + return 0; +} + +static int +eth_kdp_tx_queue_setup(struct rte_eth_dev *dev, + uint16_t tx_queue_id, + uint16_t nb_tx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_txconf *tx_conf __rte_unused) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pmd_queue *q; + + q = &internals->tx_queues[tx_queue_id]; + q->internals = internals; + + dev->data->tx_queues[tx_queue_id] = q; + + return 0; +} + +static void +eth_kdp_queue_release(void *q __rte_unused) +{ +} + +static int +eth_kdp_link_update(struct rte_eth_dev *dev __rte_unused, + int wait_to_complete __rte_unused) +{ + return 0; +} + +static void +eth_kdp_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + unsigned i, num_stats; + unsigned long rx_packets_total = 0, rx_bytes_total = 0; + unsigned long tx_packets_total = 0, tx_bytes_total = 0; + unsigned long 
tx_packets_err_total = 0; + struct rte_eth_dev_data *data = dev->data; + struct pmd_queue *q; + + num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS, + data->nb_rx_queues); + for (i = 0; i < num_stats; i++) { + q = data->rx_queues[i]; + stats->q_ipackets[i] = q->rx.pkts; + stats->q_ibytes[i] = q->rx.bytes; + rx_packets_total += stats->q_ipackets[i]; + rx_bytes_total += stats->q_ibytes[i]; + } + + num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS, + data->nb_tx_queues); + for (i = 0; i < num_stats; i++) { + q = data->tx_queues[i]; + stats->q_opackets[i] = q->tx.pkts; + stats->q_obytes[i] = q->tx.bytes; + stats->q_errors[i] = q->tx.err_pkts; + tx_packets_total += stats->q_opackets[i]; + tx_bytes_total += stats->q_obytes[i]; + tx_packets_err_total += stats->q_errors[i]; + } + + stats->ipackets = rx_packets_total; + stats->ibytes = rx_bytes_total; + stats->opackets = tx_packets_total; + stats->obytes = tx_bytes_total; + stats->oerrors = tx_packets_err_total; +} + +static void +eth_kdp_stats_reset(struct rte_eth_dev *dev) +{ + unsigned i; + struct rte_eth_dev_data *data = dev->data; + struct pmd_queue *q; + + for (i = 0; i < data->nb_rx_queues; i++) { + q = data->rx_queues[i]; + q->rx.pkts = 0; + q->rx.bytes = 0; + } + for (i = 0; i < data->nb_tx_queues; i++) { + q = data->tx_queues[i]; + q->tx.pkts = 0; + q->tx.bytes = 0; + q->tx.err_pkts = 0; + } +} + +static const struct eth_dev_ops eth_kdp_ops = { + .dev_start = eth_kdp_dev_start, + .dev_stop = eth_kdp_dev_stop, + .dev_close = eth_kdp_dev_close, + .dev_configure = eth_kdp_dev_configure, + .dev_infos_get = eth_kdp_dev_info, + .rx_queue_setup = eth_kdp_rx_queue_setup, + .tx_queue_setup = eth_kdp_tx_queue_setup, + .rx_queue_release = eth_kdp_queue_release, + .tx_queue_release = eth_kdp_queue_release, + .link_update = eth_kdp_link_update, + .stats_get = eth_kdp_stats_get, + .stats_reset = eth_kdp_stats_reset, +}; + +static struct rte_eth_dev * +eth_kdp_create(const char *name, unsigned numa_node) +{ + 
+ uint16_t nb_rx_queues = 1; + uint16_t nb_tx_queues = 1; + struct rte_eth_dev_data *data = NULL; + struct pmd_internals *internals = NULL; + struct rte_eth_dev *eth_dev = NULL; + + RTE_LOG(INFO, PMD, "Creating kdp ethdev on numa socket %u\n", + numa_node); + + data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node); + if (data == NULL) + goto error; + + internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node); + if (internals == NULL) + goto error; + + /* reserve an ethdev entry */ + eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); + if (eth_dev == NULL) + goto error; + + data->dev_private = internals; + data->port_id = eth_dev->data->port_id; + memmove(data->name, eth_dev->data->name, sizeof(data->name)); + data->nb_rx_queues = nb_rx_queues; + data->nb_tx_queues = nb_tx_queues; + data->dev_link = pmd_link; + data->mac_addrs = &eth_addr; + + eth_dev->data = data; + eth_dev->dev_ops = &eth_kdp_ops; + eth_dev->driver = NULL; + + data->dev_flags = RTE_ETH_DEV_DETACHABLE; + data->kdrv = RTE_KDRV_NONE; + data->drv_name = drivername; + data->numa_node = numa_node; + + return eth_dev; + +error: + rte_free(data); + rte_free(internals); + + return NULL; +} + +static int +eth_kdp_devinit(const char *name, const char *params __rte_unused) +{ + struct rte_eth_dev *eth_dev = NULL; + struct pmd_internals *internals; + struct kdp_data *kdp; + struct kdp_tap_data *kdp_tap = NULL; + uint16_t port_id; + + RTE_LOG(INFO, PMD, "Initializing eth_kdp for %s\n", name); + + eth_dev = eth_kdp_create(name, rte_socket_id()); + if (eth_dev == NULL) + return -1; + + internals = eth_dev->data->dev_private; + port_id = eth_dev->data->port_id; + + kdp = kdp_init(port_id); + if (kdp == NULL) + kdp_tap = kdp_tap_init(port_id); + + if (kdp == NULL && kdp_tap == NULL) { + rte_eth_dev_release_port(eth_dev); + rte_free(internals); + + /* Not return error to prevent panic in rte_eal_init() */ + return 0; + } + + internals->kdp = kdp; + internals->kdp_tap = kdp_tap; + + if (kdp ==
NULL) { + eth_dev->rx_pkt_burst = eth_kdp_tap_rx; + eth_dev->tx_pkt_burst = eth_kdp_tap_tx; + } else { + eth_dev->rx_pkt_burst = eth_kdp_rx; + eth_dev->tx_pkt_burst = eth_kdp_tx; + } + + return 0; +} + +static int +eth_kdp_devuninit(const char *name) +{ + struct rte_eth_dev *eth_dev = NULL; + + RTE_LOG(INFO, PMD, "Un-Initializing eth_kdp for %s\n", name); + + /* find the ethdev entry */ + eth_dev = rte_eth_dev_allocated(name); + if (eth_dev == NULL) + return -1; + + eth_kdp_dev_stop(eth_dev); + + if (eth_dev->data) + rte_free(eth_dev->data->dev_private); + rte_free(eth_dev->data); + + rte_eth_dev_release_port(eth_dev); + + kdp_uninit(); + + return 0; +} + +static struct rte_driver eth_kdp_drv = { + .name = "eth_kdp", + .type = PMD_VDEV, + .init = eth_kdp_devinit, + .uninit = eth_kdp_devuninit, +}; + +PMD_REGISTER_DRIVER(eth_kdp_drv); diff --git a/drivers/net/kdp/rte_kdp.c b/drivers/net/kdp/rte_kdp.c new file mode 100644 index 0000000..ed50a0f --- /dev/null +++ b/drivers/net/kdp/rte_kdp.c @@ -0,0 +1,633 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef RTE_EXEC_ENV_LINUXAPP +#error "KDP is not supported" +#endif + +#include <sys/socket.h> +#include <linux/netlink.h> +#include <linux/rtnetlink.h> + +#include <rte_spinlock.h> +#include <rte_ethdev.h> +#include <rte_memzone.h> + +#include "rte_kdp.h" +#include "rte_kdp_fifo.h" + +#define KDP_MODULE_NAME "rte_kdp" +#define MAX_MBUF_BURST_NUM 32 + +/* Maximum number of ring entries */ +#define KDP_FIFO_COUNT_MAX 1024 +#define KDP_FIFO_SIZE (KDP_FIFO_COUNT_MAX * sizeof(void *) + \ + sizeof(struct rte_kdp_fifo)) + +#define BUFSZ 1024 +struct kdp_request { + struct nlmsghdr nlmsg; + char buf[BUFSZ]; +}; + +static int kdp_fd = -1; +static int kdp_ref_count; + +static const struct rte_memzone * +kdp_memzone_reserve(const char *name, size_t len, int socket_id, + unsigned flags) +{ + const struct rte_memzone *mz = rte_memzone_lookup(name); + + if (mz == NULL) + mz = rte_memzone_reserve(name, len, socket_id, flags); + + return mz; +} + +static int +kdp_slot_init(struct kdp_memzone_slot *slot) +{ +#define OBJNAMSIZ 32 + char obj_name[OBJNAMSIZ]; + const struct rte_memzone *mz; + + /* TX RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_tx_%d", slot->id); + mz = 
kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_tx_q = mz; + + /* RX RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_rx_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_rx_q = mz; + + /* ALLOC RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_alloc_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_alloc_q = mz; + + /* FREE RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_free_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_free_q = mz; + + return 0; + +kdp_fail: + return -1; +} + +static void +kdp_ring_init(struct kdp_data *kdp) +{ + struct kdp_memzone_slot *slot = kdp->slot; + const struct rte_memzone *mz; + + /* TX RING */ + mz = slot->m_tx_q; + kdp->tx_q = mz->addr; + kdp_fifo_init(kdp->tx_q, KDP_FIFO_COUNT_MAX); + + /* RX RING */ + mz = slot->m_rx_q; + kdp->rx_q = mz->addr; + kdp_fifo_init(kdp->rx_q, KDP_FIFO_COUNT_MAX); + + /* ALLOC RING */ + mz = slot->m_alloc_q; + kdp->alloc_q = mz->addr; + kdp_fifo_init(kdp->alloc_q, KDP_FIFO_COUNT_MAX); + + /* FREE RING */ + mz = slot->m_free_q; + kdp->free_q = mz->addr; + kdp_fifo_init(kdp->free_q, KDP_FIFO_COUNT_MAX); +} + +static int +kdp_module_check(void) +{ + int fd; + + fd = open("/sys/module/" KDP_MODULE_NAME "/initstate", O_RDONLY); + if (fd < 0) + return -1; + close(fd); + + return 0; +} + +static int +rtnl_socket_open(void) +{ + struct sockaddr_nl src; + int ret; + + /* Check FD and open */ + if (kdp_fd < 0) { + kdp_fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (kdp_fd < 0) { + RTE_LOG(ERR, KDP, "socket for create failed.\n"); + return -1; + } + + memset(&src, 0, sizeof(struct sockaddr_nl)); + + src.nl_family = AF_NETLINK; + src.nl_pid = getpid(); + + ret = bind(kdp_fd, (struct sockaddr *)&src, + sizeof(struct 
sockaddr_nl)); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Bind for create failed.\n"); + close(kdp_fd); + kdp_fd = -1; + return -1; + } + } + + kdp_ref_count++; + + return 0; +} + +static void +kdp_ref_put(void) +{ + /* not initialized? */ + if (!kdp_ref_count) + return; + + kdp_ref_count--; + + /* not last one? */ + if (kdp_ref_count) + return; + + if (kdp_fd < 0) + return; + + close(kdp_fd); + kdp_fd = -1; +} + +struct kdp_data * +kdp_init(uint16_t port_id) +{ + struct kdp_memzone_slot *slot = NULL; + struct kdp_data *kdp = NULL; + int ret; + + ret = kdp_module_check(); + if (ret) + return NULL; + + ret = rtnl_socket_open(); + if (ret) + return NULL; + + slot = rte_malloc(NULL, sizeof(struct kdp_memzone_slot), 0); + if (slot == NULL) + goto kdp_fail; + slot->id = port_id; + + kdp = rte_malloc(NULL, sizeof(struct kdp_data), 0); + if (kdp == NULL) + goto kdp_fail; + kdp->slot = slot; + + ret = kdp_slot_init(slot); + if (ret < 0) + goto kdp_fail; + + kdp_ring_init(kdp); + + return kdp; + +kdp_fail: + kdp_ref_put(); + rte_free(slot); + rte_free(kdp); + RTE_LOG(ERR, KDP, "Unable to allocate memory\n"); + return NULL; +} + +static void +kdp_mbufs_allocate(struct kdp_data *kdp) +{ + int i, ret; + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) != + offsetof(struct rte_kdp_mbuf, pool)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_addr) != + offsetof(struct rte_kdp_mbuf, buf_addr)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, next) != + offsetof(struct rte_kdp_mbuf, next)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) != + offsetof(struct rte_kdp_mbuf, data_off)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_len) != + offsetof(struct rte_kdp_mbuf, data_len)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) != + offsetof(struct rte_kdp_mbuf, pkt_len)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != + offsetof(struct rte_kdp_mbuf, ol_flags)); + + /* Check if pktmbuf pool has been configured */ 
+ if (kdp->pktmbuf_pool == NULL) { + RTE_LOG(ERR, KDP, "No valid mempool for allocating mbufs\n"); + return; + } + + for (i = 0; i < MAX_MBUF_BURST_NUM; i++) { + pkts[i] = rte_pktmbuf_alloc(kdp->pktmbuf_pool); + if (unlikely(pkts[i] == NULL)) { + /* Out of memory */ + RTE_LOG(ERR, KDP, "Out of memory\n"); + break; + } + } + + /* No pkt mbuf allocated */ + if (i <= 0) + return; + + ret = kdp_fifo_put(kdp->alloc_q, (void **)pkts, i); + + /* Check if any mbufs not put into alloc_q, and then free them */ + if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) { + int j; + + for (j = ret; j < i; j++) + rte_pktmbuf_free(pkts[j]); + } +} + +static int +attr_add(struct kdp_request *req, unsigned short type, void *buf, size_t len) +{ + struct rtattr *rta; + int nlmsg_len; + + nlmsg_len = NLMSG_ALIGN(req->nlmsg.nlmsg_len); + rta = (struct rtattr *)((char *)&req->nlmsg + nlmsg_len); + if (nlmsg_len + RTA_LENGTH(len) > sizeof(struct kdp_request)) + return -1; + rta->rta_type = type; + rta->rta_len = RTA_LENGTH(len); + memcpy(RTA_DATA(rta), buf, len); + req->nlmsg.nlmsg_len = nlmsg_len + RTA_LENGTH(len); + + return 0; +} + +static struct +rtattr *attr_nested_add(struct kdp_request *req, unsigned short type) +{ + struct rtattr *rta; + int nlmsg_len; + + nlmsg_len = NLMSG_ALIGN(req->nlmsg.nlmsg_len); + rta = (struct rtattr *)((char *)&req->nlmsg + nlmsg_len); + if (nlmsg_len + RTA_LENGTH(0) > sizeof(struct kdp_request)) + return NULL; + rta->rta_type = type; + rta->rta_len = nlmsg_len; + req->nlmsg.nlmsg_len = nlmsg_len + RTA_LENGTH(0); + + return rta; +} + +static void +attr_nested_end(struct kdp_request *req, struct rtattr *rta) +{ + rta->rta_len = req->nlmsg.nlmsg_len - rta->rta_len; +} + +static int +rtnl_create(struct rte_kdp_device_info *dev_info) +{ + struct kdp_request req; + struct ifinfomsg *info; + struct rtattr *rta1; + struct rtattr *rta2; + char name[RTE_KDP_NAMESIZE]; + char type[RTE_KDP_NAMESIZE]; + struct iovec iov; + struct msghdr msg; + struct sockaddr_nl nladdr;
+ int ret; + char buf[BUFSZ]; + + memset(&req, 0, sizeof(struct kdp_request)); + + req.nlmsg.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nlmsg.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; + req.nlmsg.nlmsg_flags |= NLM_F_ACK; + req.nlmsg.nlmsg_type = RTM_NEWLINK; + + info = NLMSG_DATA(&req.nlmsg); + + info->ifi_family = AF_UNSPEC; + info->ifi_index = 0; + + snprintf(name, RTE_KDP_NAMESIZE, "%s", dev_info->name); + ret = attr_add(&req, IFLA_IFNAME, name, strlen(name) + 1); + if (ret < 0) + return -1; + + rta1 = attr_nested_add(&req, IFLA_LINKINFO); + if (rta1 == NULL) + return -1; + + snprintf(type, RTE_KDP_NAMESIZE, KDP_DEVICE); + ret = attr_add(&req, IFLA_INFO_KIND, type, strlen(type) + 1); + if (ret < 0) + return -1; + + rta2 = attr_nested_add(&req, IFLA_INFO_DATA); + if (rta2 == NULL) + return -1; + + ret = attr_add(&req, IFLA_KDP_PORTID, &dev_info->port_id, + sizeof(uint8_t)); + if (ret < 0) + return -1; + + ret = attr_add(&req, IFLA_KDP_DEVINFO, dev_info, + sizeof(struct rte_kdp_device_info)); + if (ret < 0) + return -1; + + attr_nested_end(&req, rta2); + attr_nested_end(&req, rta1); + + memset(&nladdr, 0, sizeof(nladdr)); + nladdr.nl_family = AF_NETLINK; + + iov.iov_base = (void *)&req.nlmsg; + iov.iov_len = req.nlmsg.nlmsg_len; + + memset(&msg, 0, sizeof(struct msghdr)); + msg.msg_name = &nladdr; + msg.msg_namelen = sizeof(nladdr); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + ret = sendmsg(kdp_fd, &msg, 0); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Send for create failed %d.\n", errno); + return -1; + } + + memset(buf, 0, sizeof(buf)); + iov.iov_base = buf; + iov.iov_len = sizeof(buf); + + ret = recvmsg(kdp_fd, &msg, 0); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Recv for create failed.\n"); + return -1; + } + + return 0; +} + +int +kdp_start(struct kdp_data *kdp, struct rte_mempool *pktmbuf_pool, + const struct kdp_conf *conf) +{ + struct kdp_memzone_slot *slot = kdp->slot; + struct rte_kdp_device_info dev_info; + char 
mz_name[RTE_MEMZONE_NAMESIZE]; + const struct rte_memzone *mz; + int ret; + + if (!kdp || !pktmbuf_pool || !conf || !conf->name[0]) + return -1; + + snprintf(kdp->name, RTE_KDP_NAMESIZE, "%s", conf->name); + kdp->pktmbuf_pool = pktmbuf_pool; + kdp->port_id = conf->port_id; + + memset(&dev_info, 0, sizeof(dev_info)); + dev_info.core_id = conf->core_id; + dev_info.force_bind = conf->force_bind; + dev_info.port_id = conf->port_id; + dev_info.mbuf_size = conf->mbuf_size; + snprintf(dev_info.name, RTE_KDP_NAMESIZE, "%s", conf->name); + + dev_info.tx_phys = slot->m_tx_q->phys_addr; + dev_info.rx_phys = slot->m_rx_q->phys_addr; + dev_info.alloc_phys = slot->m_alloc_q->phys_addr; + dev_info.free_phys = slot->m_free_q->phys_addr; + + /* MBUF mempool */ + snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_OBJ_NAME, + pktmbuf_pool->name); + mz = rte_memzone_lookup(mz_name); + if (mz == NULL) + goto kdp_fail; + dev_info.mbuf_va = mz->addr; + dev_info.mbuf_phys = mz->phys_addr; + + ret = rtnl_create(&dev_info); + if (ret < 0) + goto kdp_fail; + + kdp->in_use = 1; + + /* Allocate mbufs and then put them into alloc_q */ + kdp_mbufs_allocate(kdp); + + return 0; + +kdp_fail: + return -1; +} + +static void +kdp_mbufs_free(struct kdp_data *kdp) +{ + int i, ret; + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + + ret = kdp_fifo_get(kdp->free_q, (void **)pkts, MAX_MBUF_BURST_NUM); + if (likely(ret > 0)) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } +} + +unsigned +kdp_tx_burst(struct kdp_data *kdp, struct rte_mbuf **mbufs, unsigned num) +{ + unsigned ret = kdp_fifo_put(kdp->rx_q, (void **)mbufs, num); + + /* Get mbufs from free_q and then free them */ + kdp_mbufs_free(kdp); + + return ret; +} + +unsigned +kdp_rx_burst(struct kdp_data *kdp, struct rte_mbuf **mbufs, unsigned num) +{ + unsigned ret = kdp_fifo_get(kdp->tx_q, (void **)mbufs, num); + + /* If buffers removed, allocate mbufs and then put them into alloc_q */ + if (ret) + kdp_mbufs_allocate(kdp); + + return ret; +} + 
+static void +kdp_fifo_free(struct rte_kdp_fifo *fifo) +{ + int ret; + struct rte_mbuf *pkt; + + do { + ret = kdp_fifo_get(fifo, (void **)&pkt, 1); + if (ret) + rte_pktmbuf_free(pkt); + } while (ret); +} + +static int +rtnl_destroy(struct kdp_data *kdp) +{ + struct kdp_request req; + struct ifinfomsg *info; + struct iovec iov; + struct msghdr msg; + struct sockaddr_nl nladdr; + int ret; + + memset(&req, 0, sizeof(struct kdp_request)); + + req.nlmsg.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nlmsg.nlmsg_flags = NLM_F_REQUEST; + req.nlmsg.nlmsg_type = RTM_DELLINK; + + info = NLMSG_DATA(&req.nlmsg); + + info->ifi_family = AF_UNSPEC; + info->ifi_index = 0; + + ret = attr_add(&req, IFLA_IFNAME, kdp->name, strlen(kdp->name) + 1); + if (ret < 0) + return -1; + + memset(&nladdr, 0, sizeof(nladdr)); + nladdr.nl_family = AF_NETLINK; + + iov.iov_base = (void *)&req.nlmsg; + iov.iov_len = req.nlmsg.nlmsg_len; + + memset(&msg, 0, sizeof(struct msghdr)); + msg.msg_name = &nladdr; + msg.msg_namelen = sizeof(nladdr); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + ret = sendmsg(kdp_fd, &msg, 0); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Send for destroy failed.\n"); + return -1; + } + return 0; +} + +int +kdp_stop(struct kdp_data *kdp) +{ + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + int ret; + int i; + + if (!kdp || !kdp->in_use) + return -1; + + rtnl_destroy(kdp); + + do { + ret = kdp_fifo_get(kdp->free_q, (void **)pkts, + MAX_MBUF_BURST_NUM); + if (ret > 0) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } + } while (ret > 0); + + do { + ret = kdp_fifo_get(kdp->alloc_q, (void **)pkts, + MAX_MBUF_BURST_NUM); + if (ret > 0) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } + } while (ret > 0); + return 0; +} + +void +kdp_close(struct kdp_data *kdp) +{ + /* mbufs in all fifo should be released, except request/response */ + kdp_fifo_free(kdp->tx_q); + kdp_fifo_free(kdp->rx_q); + kdp_fifo_free(kdp->alloc_q); + kdp_fifo_free(kdp->free_q); + + 
rte_free(kdp->slot); + + /* Memset the KDP struct */ + memset(kdp, 0, sizeof(struct kdp_data)); +} + +void +kdp_uninit(void) +{ + kdp_ref_put(); +} diff --git a/drivers/net/kdp/rte_kdp.h b/drivers/net/kdp/rte_kdp.h new file mode 100644 index 0000000..20ad93d --- /dev/null +++ b/drivers/net/kdp/rte_kdp.h @@ -0,0 +1,116 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef _RTE_KDP_H_ +#define _RTE_KDP_H_ + +#include <fcntl.h> +#include <unistd.h> + +#include <sys/ioctl.h> + +#include <rte_malloc.h> +#include <rte_mbuf.h> + +#include <exec-env/rte_kdp_common.h> + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * KDP memzone pool slot + */ +struct kdp_memzone_slot { + uint32_t id; + + /* Memzones */ + const struct rte_memzone *m_tx_q; /**< TX queue */ + const struct rte_memzone *m_rx_q; /**< RX queue */ + const struct rte_memzone *m_alloc_q; /**< Allocated mbufs queue */ + const struct rte_memzone *m_free_q; /**< To be freed mbufs queue */ +}; + +/** + * KDP context + */ +struct kdp_data { + char name[RTE_KDP_NAMESIZE]; /**< KDP interface name */ + struct rte_mempool *pktmbuf_pool; /**< pkt mbuf mempool */ + struct kdp_memzone_slot *slot; + uint16_t port_id; /**< Group ID of KDP devices */ + + struct rte_kdp_fifo *tx_q; /**< TX queue */ + struct rte_kdp_fifo *rx_q; /**< RX queue */ + struct rte_kdp_fifo *alloc_q; /**< Allocated mbufs queue */ + struct rte_kdp_fifo *free_q; /**< To be freed mbufs queue */ + + uint8_t in_use; /**< kdp in use */ +}; + +struct kdp_tap_data { + char name[RTE_KDP_NAMESIZE]; + int tap_fd; +}; + +/** + * Structure for configuring KDP device. 
+ */ +struct kdp_conf { + char name[RTE_KDP_NAMESIZE]; + uint32_t core_id; /* Core ID to bind kernel thread on */ + uint16_t port_id; + unsigned mbuf_size; + + uint8_t force_bind; /* Flag to bind kernel thread */ +}; + +struct kdp_data *kdp_init(uint16_t port_id); +int kdp_start(struct kdp_data *kdp, struct rte_mempool *pktmbuf_pool, + const struct kdp_conf *conf); +unsigned kdp_rx_burst(struct kdp_data *kdp, + struct rte_mbuf **mbufs, unsigned num); +unsigned kdp_tx_burst(struct kdp_data *kdp, + struct rte_mbuf **mbufs, unsigned num); +int kdp_stop(struct kdp_data *kdp); +void kdp_close(struct kdp_data *kdp); +void kdp_uninit(void); + +struct kdp_tap_data *kdp_tap_init(uint16_t port_id); +void kdp_tap_close(struct kdp_tap_data *kdp_tap); + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_KDP_H_ */ diff --git a/drivers/net/kdp/rte_kdp_fifo.h b/drivers/net/kdp/rte_kdp_fifo.h new file mode 100644 index 0000000..1a7e063 --- /dev/null +++ b/drivers/net/kdp/rte_kdp_fifo.h @@ -0,0 +1,91 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/** + * Initializes the kdp fifo structure + */ +static void +kdp_fifo_init(struct rte_kdp_fifo *fifo, unsigned size) +{ + /* Ensure size is power of 2 */ + if (size & (size - 1)) + rte_panic("KDP fifo size must be power of 2\n"); + + fifo->write = 0; + fifo->read = 0; + fifo->len = size; + fifo->elem_size = sizeof(void *); +} + +/** + * Adds num elements into the fifo. Return the number actually written + */ +static inline unsigned +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned fifo_write = fifo->write; + unsigned fifo_read = fifo->read; + unsigned new_write = fifo_write; + + for (i = 0; i < num; i++) { + new_write = (new_write + 1) & (fifo->len - 1); + + if (new_write == fifo_read) + break; + fifo->buffer[fifo_write] = data[i]; + fifo_write = new_write; + } + fifo->write = fifo_write; + return i; +} + +/** + * Get up to num elements from the fifo. 
Return the number actually read + */ +static inline unsigned +kdp_fifo_get(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned new_read = fifo->read; + unsigned fifo_write = fifo->write; + for (i = 0; i < num; i++) { + if (new_read == fifo_write) + break; + + data[i] = fifo->buffer[new_read]; + new_read = (new_read + 1) & (fifo->len - 1); + } + fifo->read = new_read; + return i; +} diff --git a/drivers/net/kdp/rte_kdp_tap.c b/drivers/net/kdp/rte_kdp_tap.c new file mode 100644 index 0000000..12f3ad2 --- /dev/null +++ b/drivers/net/kdp/rte_kdp_tap.c @@ -0,0 +1,101 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <string.h> + +#include <sys/socket.h> +#include <linux/if.h> +#include <linux/if_tun.h> + +#include "rte_kdp.h" + +static int +tap_create(char *name) +{ + struct ifreq ifr; + int fd, ret; + + fd = open("/dev/net/tun", O_RDWR); + if (fd < 0) + return fd; + + memset(&ifr, 0, sizeof(ifr)); + + /* TAP device without packet information */ + ifr.ifr_flags = IFF_TAP | IFF_NO_PI; + + if (name && *name) + snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name); + + ret = ioctl(fd, TUNSETIFF, (void *)&ifr); + if (ret < 0) { + close(fd); + return ret; + } + + if (name) + snprintf(name, IFNAMSIZ, "%s", ifr.ifr_name); + + return fd; +} + +struct kdp_tap_data * +kdp_tap_init(uint16_t port_id) +{ + struct kdp_tap_data *kdp_tap = NULL; + int flags; + + kdp_tap = rte_malloc(NULL, sizeof(struct kdp_tap_data), 0); + if (kdp_tap == NULL) + goto error; + + snprintf(kdp_tap->name, IFNAMSIZ, "tap_kdp%u", port_id); + kdp_tap->tap_fd = tap_create(kdp_tap->name); + if (kdp_tap->tap_fd < 0) + goto error; + + flags = fcntl(kdp_tap->tap_fd, F_GETFL, 0); + fcntl(kdp_tap->tap_fd, F_SETFL, flags | O_NONBLOCK); + + return kdp_tap; + +error: + rte_free(kdp_tap); + return NULL; +} + +void +kdp_tap_close(struct kdp_tap_data *kdp_tap) +{ + close(kdp_tap->tap_fd); +} diff --git a/drivers/net/kdp/rte_pmd_kdp_version.map b/drivers/net/kdp/rte_pmd_kdp_version.map new file mode 100644 index 0000000..0812bb1 --- /dev/null +++ b/drivers/net/kdp/rte_pmd_kdp_version.map @@ 
-0,0 +1,4 @@ +DPDK_2.3 { + + local: *; +}; diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h index 2e47e7f..5a0048b 100644 --- a/lib/librte_eal/common/include/rte_log.h +++ b/lib/librte_eal/common/include/rte_log.h @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2016 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -79,6 +79,7 @@ extern struct rte_logs rte_logs; #define RTE_LOGTYPE_PIPELINE 0x00008000 /**< Log related to pipeline. */ #define RTE_LOGTYPE_MBUF 0x00010000 /**< Log related to mbuf. */ #define RTE_LOGTYPE_CRYPTODEV 0x00020000 /**< Log related to cryptodev. */ +#define RTE_LOGTYPE_KDP 0x00080000 /**< Log related to KDP. */ /* these log types can be used in an application */ #define RTE_LOGTYPE_USER1 0x01000000 /**< User-defined log type 1. */ diff --git a/mk/rte.app.mk b/mk/rte.app.mk index 8ecab41..eb18972 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # Copyright(c) 2014-2015 6WIND S.A. # All rights reserved. # @@ -154,6 +154,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += -lrte_pmd_qat +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += -lrte_pmd_kdp # AESNI MULTI BUFFER is dependent on the IPSec_MB library _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB) += -lrte_pmd_aesni_mb -- 2.5.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
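As a reading aid for the FIFO in rte_kdp_fifo.h from the patch above, here is a minimal standalone sketch of the same ring logic. It is only an illustration under stated assumptions: the demo_* names and the fixed-size buffer are hypothetical and not part of the patch, the sketch runs in a single thread, and it does not address memory ordering between the kernel and userspace sides. It shows two properties of the put/get loops: a ring of length N holds at most N - 1 pointers, because write == read means "empty", and both calls return the number of elements actually transferred.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FIFO_LEN 8u /* hypothetical size; must be a power of two */

/* Hypothetical stand-in for struct rte_kdp_fifo with a fixed-size buffer. */
struct demo_fifo {
	volatile unsigned write; /* next position to be written */
	volatile unsigned read;  /* next position to be read */
	unsigned len;            /* ring length, power of two */
	void *buffer[DEMO_FIFO_LEN];
};

void demo_fifo_init(struct demo_fifo *f)
{
	f->write = 0;
	f->read = 0;
	f->len = DEMO_FIFO_LEN;
}

/* Same loop shape as kdp_fifo_put(): stop before write catches up with read. */
unsigned demo_fifo_put(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned w = f->write;
	unsigned r = f->read;
	unsigned next = w;

	for (i = 0; i < num; i++) {
		next = (next + 1) & (f->len - 1);
		if (next == r)
			break; /* ring is full */
		f->buffer[w] = data[i];
		w = next;
	}
	f->write = w;
	return i;
}

/* Same loop shape as kdp_fifo_get(): stop when read reaches write. */
unsigned demo_fifo_get(struct demo_fifo *f, void **data, unsigned num)
{
	unsigned i;
	unsigned r = f->read;
	unsigned w = f->write;

	for (i = 0; i < num; i++) {
		if (r == w)
			break; /* ring is empty */
		data[i] = f->buffer[r];
		r = (r + 1) & (f->len - 1);
	}
	f->read = r;
	return i;
}

/* Exercise the capacity limit, FIFO ordering and index wrap-around. */
void demo_fifo_selfcheck(void)
{
	struct demo_fifo f;
	int vals[DEMO_FIFO_LEN];
	void *in[DEMO_FIFO_LEN];
	void *out[DEMO_FIFO_LEN];
	unsigned i;

	for (i = 0; i < DEMO_FIFO_LEN; i++) {
		vals[i] = (int)i;
		in[i] = &vals[i];
	}
	demo_fifo_init(&f);

	/* Only len - 1 slots are usable. */
	assert(demo_fifo_put(&f, in, DEMO_FIFO_LEN) == DEMO_FIFO_LEN - 1);
	assert(demo_fifo_get(&f, out, DEMO_FIFO_LEN) == DEMO_FIFO_LEN - 1);
	assert(out[0] == in[0] && out[6] == in[6]);

	/* The ring is empty again; the next put wraps around the end. */
	assert(demo_fifo_get(&f, out, 1) == 0);
	assert(demo_fifo_put(&f, in, 3) == 3);
	assert(demo_fifo_get(&f, out, 3) == 3);
	assert(out[2] == in[2]);
}
```

The short return value is what kdp_mbufs_allocate() in the patch relies on: it frees the mbufs that kdp_fifo_put() could not place into alloc_q.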
* [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-02-19  5:05 ` [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
  2016-02-19  5:05   ` [dpdk-dev] [PATCH v2 1/2] kdp: add kernel data path kernel module Ferruh Yigit
  2016-02-19  5:05   ` [dpdk-dev] [PATCH v2 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
@ 2016-03-09 11:17   ` Ferruh Yigit
  2016-03-09 11:17     ` [dpdk-dev] [PATCH v3 1/2] kdp: add kernel data path kernel module Ferruh Yigit
                       ` (2 more replies)
  2 siblings, 3 replies; 29+ messages in thread
From: Ferruh Yigit @ 2016-03-09 11:17 UTC
To: dev

This patch is sent to keep a record of the latest status of the work.

This is a slow data path communication implementation based on the existing KNI.
The differences are: librte_kni is converted into a PMD, and the kdp kernel module is almost the same, except that all control path functionality is removed and some simplification is done.

The motivation is to simplify slow path data communication.
Now any application can use this new PMD to send/get data to/from the Linux kernel.

The PMD supports two communication methods:

1) KDP kernel module
The PMD initialization functions handle creating the virtual interfaces (with the help of the kdp kernel module) and the FIFO. The FIFO is used to share data between userspace and kernelspace. This is the default method.

2) tun/tap module
When the KDP module is not inserted, the PMD creates a tap interface and transfers packets using the tap interface.

In the long term this patch intends to replace KNI, and KNI will be deprecated.

v3:
* Remove logging helper macros, use pr_fmt
* Replace rw_semaphore with mutex
* Devices are not up by default
* Use unsigned primitive types where possible
* Update module parameters
* Code cleanup, remove useless comments, reorder fields/code.
v2: * Use rtnetlink to create interfaces * Include modules.h to prevent compile error in old kernels Sample usage: 1) Transfer any packet received from NIC that bound to DPDK, to the Linux kernel a) insert kdp kernel module insmod build/kmod/rte_kdp.ko b) bind NIC to the DPDK using dpdk_nic_bind.py c) ./testpmd --vdev eth_kdp0 c1) testpmd show two ports, one of them physical, other virtual ... Configuring Port 0 (socket 0) Port 0: 00:00:00:00:00:00 Configuring Port 1 (socket 0) ... Checking link statuses... Port 0 Link Up - speed 10000 Mbps - full-duplex Port 1 Link Up - speed 10000 Mbps - full-duplex Done c2) This will create "kdp0" Linux interface $ ip l show kdp0 21: kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff d) Linux port can be used for data d1) $ ifconfig kdp0 1.0.0.2 $ ping 1.0.0.1 PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data. 64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=0.789 ms 64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=0.881 ms d2) $ tcpdump -nn -i kdp0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes 15:01:22.407506 IP 1.0.0.1 > 1.0.0.2: ICMP echo request, id 40016, seq 18, length 64 15:01:22.408521 IP 1.0.0.2 > 1.0.0.1: ICMP echo reply, id 40016, seq 18, length 64 2) Data travels between virtual Linux interfaces pass from DPDK application, application can alter data a) insert kdp kernel module insmod build/kmod/rte_kdp.ko b) No physical NIC involved c) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 c1) testpmd show two ports, both of them are virtual ... Configuring Port 0 (socket 0) Port 0: 00:00:00:00:00:00 Configuring Port 1 (socket 0) Port 1: 00:00:00:00:00:00 Checking link statuses... 
Port 0 Link Up - speed 10000 Mbps - full-duplex Port 1 Link Up - speed 10000 Mbps - full-duplex Done c2) This will create "kdp0" and "kdp1" Linux interfaces $ ip l show kdp0; ip l show kdp1 22: kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff 23: kdp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff d) Data travel between virtual ports pass from DPDK application $ifconfig kdp0 1.0.0.1 $ifconfig kdp1 1.0.0.2 d1) $ ping 1.0.0.1 PING 1.0.0.1 (1.0.0.1) 56(84) bytes of data. 64 bytes from 1.0.0.1: icmp_seq=1 ttl=64 time=3.57 ms 64 bytes from 1.0.0.1: icmp_seq=2 ttl=64 time=1.85 ms 64 bytes from 1.0.0.1: icmp_seq=3 ttl=64 time=1.89 ms d2) $ tcpdump -nn -i kdp0 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on kdp0, link-type EN10MB (Ethernet), capture size 262144 bytes 15:20:51.908543 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 1, length 64 15:20:51.909570 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 1, length 64 15:20:52.909551 IP 1.0.0.2 > 1.0.0.1: ICMP echo request, id 41234, seq 2, length 64 15:20:52.910577 IP 1.0.0.1 > 1.0.0.2: ICMP echo reply, id 41234, seq 2, length 64 3) tun/tap interface usage a) No external module required, tun/tap support in kernel required b) ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 b1) This will create "tap_kdp0" and "tap_kdp1" Linux interfaces $ ip l show tap_kdp0; ip l show tap_kdp1 25: tap_kdp0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 56:47:97:9c:03:8e brd ff:ff:ff:ff:ff:ff 26: tap_kdp1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 500 link/ether 5e:15:22:b0:52:42 brd ff:ff:ff:ff:ff:ff Ferruh Yigit (2): kdp: add kernel data path kernel module kdp: add virtual PMD for kernel slow data path 
communication

 MAINTAINERS                                        |   5 +
 config/common_base                                 |   7 +
 config/common_linuxapp                             |   2 +
 doc/guides/nics/pcap_ring.rst                      | 125 +++-
 doc/guides/rel_notes/release_16_04.rst             |   5 +
 drivers/net/Makefile                               |   3 +-
 drivers/net/kdp/Makefile                           |  61 ++
 drivers/net/kdp/rte_eth_kdp.c                      | 501 ++++++++++++++
 drivers/net/kdp/rte_kdp.c                          | 633 ++++++++++++++++++
 drivers/net/kdp/rte_kdp.h                          | 116 ++++
 drivers/net/kdp/rte_kdp_fifo.h                     |  91 +++
 drivers/net/kdp/rte_kdp_tap.c                      | 101 +++
 drivers/net/kdp/rte_pmd_kdp_version.map            |   4 +
 lib/librte_eal/common/include/rte_log.h            |   3 +-
 lib/librte_eal/linuxapp/Makefile                   |   3 +-
 lib/librte_eal/linuxapp/eal/Makefile               |   3 +-
 .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 134 ++++
 lib/librte_eal/linuxapp/kdp/Makefile               |  55 ++
 lib/librte_eal/linuxapp/kdp/kdp_dev.h              |  76 +++
 lib/librte_eal/linuxapp/kdp/kdp_fifo.h             |  91 +++
 lib/librte_eal/linuxapp/kdp/kdp_net.c              | 718 +++++++++++++++++++++
 mk/rte.app.mk                                      |   3 +-
 22 files changed, 2732 insertions(+), 8 deletions(-)
 create mode 100644 drivers/net/kdp/Makefile
 create mode 100644 drivers/net/kdp/rte_eth_kdp.c
 create mode 100644 drivers/net/kdp/rte_kdp.c
 create mode 100644 drivers/net/kdp/rte_kdp.h
 create mode 100644 drivers/net/kdp/rte_kdp_fifo.h
 create mode 100644 drivers/net/kdp/rte_kdp_tap.c
 create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map
 create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h
 create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c

--
2.5.0
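The tun/tap method in the cover letter above (method 2, used when the KDP module is not inserted) comes down to opening /dev/net/tun and attaching to a named TAP interface. The following is a minimal sketch along the lines of tap_create() and kdp_tap_init() in rte_kdp_tap.c, assuming a Linux host with tun/tap support. The demo_* names are hypothetical and not part of the patch, and unlike the patch this sketch also checks the fcntl() results. Actually creating the interface normally needs CAP_NET_ADMIN, so only the ifreq preparation can be exercised without privileges.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/*
 * Fill an ifreq for a TAP device without the packet-information header,
 * the same flags the patch uses. An empty or NULL name leaves the choice
 * of interface name to the kernel.
 */
void demo_tap_ifreq(struct ifreq *ifr, const char *name)
{
	memset(ifr, 0, sizeof(*ifr));
	ifr->ifr_flags = IFF_TAP | IFF_NO_PI;
	if (name != NULL && name[0] != '\0')
		snprintf(ifr->ifr_name, IFNAMSIZ, "%s", name);
}

/*
 * Open the tun/tap clone device, attach to (or create) the named TAP
 * interface and switch the descriptor to non-blocking mode.
 * Returns the file descriptor, or -1 on any failure.
 */
int demo_tap_open(const char *name)
{
	struct ifreq ifr;
	int fd, flags;

	fd = open("/dev/net/tun", O_RDWR);
	if (fd < 0)
		return -1;

	demo_tap_ifreq(&ifr, name);
	if (ioctl(fd, TUNSETIFF, (void *)&ifr) < 0) {
		close(fd);
		return -1;
	}

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

With a descriptor obtained this way, each read() returns one Ethernet frame sent by the kernel through the interface and each write() injects one frame, which is what lets a PMD map its receive and transmit bursts onto the tap device.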
* [dpdk-dev] [PATCH v3 1/2] kdp: add kernel data path kernel module
  2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
@ 2016-03-09 11:17   ` Ferruh Yigit
  2016-03-09 11:17   ` [dpdk-dev] [PATCH v3 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
  2016-03-14 15:32   ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
  2 siblings, 0 replies; 29+ messages in thread
From: Ferruh Yigit @ 2016-03-09 11:17 UTC
To: dev

This kernel module is based on the KNI module, but it is a stripped-down version that handles data messages only; no control functionality is provided.

The KNI FIFO implementation is kept exactly the same, but the ethtool-related code is removed and the virtual network management code is simplified.

This module contains the kernel support to create network devices, along with a simple driver for the virtual network device. The driver simply puts/gets packets to/from the FIFO instead of real hardware.

The FIFO is created and owned by the userspace application, which in this case is the KDP PMD.

In the long term this patch intends to replace KNI, and KNI will be deprecated.

Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---

v3:
* Remove logging helper macros, use pr_fmt
* Replace rw_semaphore with mutex
* Devices are not up by default
* Use unsigned primitive types where possible
* Update module parameters
* Code cleanup, remove useless comments, reorder fields/code.
v2: * Use rtnetlink to create interfaces * include modules.h to prevent compile error in old kernels --- MAINTAINERS | 4 + config/common_base | 6 + config/common_linuxapp | 1 + lib/librte_eal/linuxapp/Makefile | 3 +- lib/librte_eal/linuxapp/eal/Makefile | 3 +- .../linuxapp/eal/include/exec-env/rte_kdp_common.h | 134 ++++ lib/librte_eal/linuxapp/kdp/Makefile | 55 ++ lib/librte_eal/linuxapp/kdp/kdp_dev.h | 76 +++ lib/librte_eal/linuxapp/kdp/kdp_fifo.h | 91 +++ lib/librte_eal/linuxapp/kdp/kdp_net.c | 718 +++++++++++++++++++++ 10 files changed, 1089 insertions(+), 2 deletions(-) create mode 100644 lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h create mode 100644 lib/librte_eal/linuxapp/kdp/Makefile create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_dev.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_fifo.h create mode 100644 lib/librte_eal/linuxapp/kdp/kdp_net.c diff --git a/MAINTAINERS b/MAINTAINERS index e253bf7..edcc4cc 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -258,6 +258,10 @@ F: app/test/test_kni.c F: examples/kni/ F: doc/guides/sample_app_ug/kernel_nic_interface.rst +Linux KDP +M: Ferruh Yigit <ferruh.yigit@gmail.com> +F: lib/librte_eal/linuxapp/kdp/ + Linux AF_PACKET M: John W. 
Linville <linville@tuxdriver.com> F: drivers/net/af_packet/ diff --git a/config/common_base b/config/common_base index c73f71a..973baff 100644 --- a/config/common_base +++ b/config/common_base @@ -302,6 +302,12 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n CONFIG_RTE_LIBRTE_PMD_NULL=y # +# Compile KDP PMD +# +CONFIG_RTE_KDP_KMOD=n +CONFIG_RTE_KDP_PREEMPT_DEFAULT=y + +# # Do prefetch of packet data within PMD driver receive function # CONFIG_RTE_PMD_PACKET_PREFETCH=y diff --git a/config/common_linuxapp b/config/common_linuxapp index ffbe260..569a0fe 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -39,6 +39,7 @@ CONFIG_RTE_EAL_IGB_UIO=y CONFIG_RTE_EAL_VFIO=y CONFIG_RTE_KNI_KMOD=y CONFIG_RTE_LIBRTE_KNI=y +CONFIG_RTE_KDP_KMOD=y CONFIG_RTE_LIBRTE_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_POWER=y diff --git a/lib/librte_eal/linuxapp/Makefile b/lib/librte_eal/linuxapp/Makefile index 20d2a91..26c70f4 100644 --- a/lib/librte_eal/linuxapp/Makefile +++ b/lib/librte_eal/linuxapp/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2014 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. # # Redistribution and use in source and binary forms, with or without @@ -34,6 +34,7 @@ include $(RTE_SDK)/mk/rte.vars.mk DIRS-$(CONFIG_RTE_EXEC_ENV_LINUXAPP) += eal DIRS-$(CONFIG_RTE_EAL_IGB_UIO) += igb_uio DIRS-$(CONFIG_RTE_KNI_KMOD) += kni +DIRS-$(CONFIG_RTE_KDP_KMOD) += kdp DIRS-$(CONFIG_RTE_LIBRTE_XEN_DOM0) += xen_dom0 include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/lib/librte_eal/linuxapp/eal/Makefile b/lib/librte_eal/linuxapp/eal/Makefile index c5490e4..e75662d 100644 --- a/lib/librte_eal/linuxapp/eal/Makefile +++ b/lib/librte_eal/linuxapp/eal/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -121,6 +121,7 @@ CFLAGS_eal_thread.o += -Wno-return-type endif INC := rte_interrupts.h rte_kni_common.h rte_dom0_common.h +INC += rte_kdp_common.h SYMLINK-$(CONFIG_RTE_EXEC_ENV_LINUXAPP)-include/exec-env := \ $(addprefix include/exec-env/,$(INC)) diff --git a/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h new file mode 100644 index 0000000..b9db8ef --- /dev/null +++ b/lib/librte_eal/linuxapp/eal/include/exec-env/rte_kdp_common.h @@ -0,0 +1,134 @@ +/*- + * This file is provided under a dual BSD/LGPLv2 license. When using or + * redistributing this file, you may do so under either license. + * + * GNU LESSER GENERAL PUBLIC LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + * + * + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + */ + +#ifndef _RTE_KDP_COMMON_H_ +#define _RTE_KDP_COMMON_H_ + +/** + * KDP name + */ +#define RTE_KDP_NAMESIZE 32 + +#define KDP_DEVICE "kdp" + +/* + * Fifo struct mapped in a shared memory. It describes a circular buffer FIFO + * Write and read should wrap around. Fifo is empty when write == read + * Writing should never overwrite the read position + */ +struct rte_kdp_fifo { + volatile unsigned write; /**< Next position to be written*/ + volatile unsigned read; /**< Next position to be read */ + unsigned len; /**< Circular buffer length */ + unsigned elem_size; /**< Pointer size - for 32/64 bit OS */ + void * volatile buffer[0]; /**< The buffer contains mbuf pointers */ +}; + +/* + * The kernel image of the rte_mbuf struct, with only the relevant fields. 
+ * Padding is necessary to assure the offsets of these fields + */ +struct rte_kdp_mbuf { + void *buf_addr __attribute__((__aligned__(RTE_CACHE_LINE_SIZE))); + char pad0[10]; + + uint16_t data_off; /**< Start address of data in segment buffer. */ + char pad1[4]; + uint64_t ol_flags; /**< Offload features. */ + char pad2[4]; + + uint32_t pkt_len; /**< Total pkt len: sum of all segment data_len. */ + + uint16_t data_len; /**< Amount of data in segment buffer. */ + + /* fields on second cache line */ + char pad3[8] __attribute__((__aligned__(RTE_CACHE_LINE_SIZE))); + void *pool; + void *next; +}; + +/* + * Struct used to create a KDP device. Passed to the kernel in IOCTL call + */ +struct rte_kdp_device_info { + char name[RTE_KDP_NAMESIZE]; /**< Network device name for KDP */ + uint16_t port_id; + + phys_addr_t tx_phys; + phys_addr_t rx_phys; + phys_addr_t alloc_phys; + phys_addr_t free_phys; + + /* mbuf mempool */ + void *mbuf_va; + phys_addr_t mbuf_phys; + + unsigned mbuf_size; + + uint8_t force_bind; /**< Flag for kernel thread binding */ + uint32_t core_id; /**< core ID to bind for kernel thread */ +}; + +enum { + IFLA_KDP_UNSPEC, + IFLA_KDP_PORTID, + IFLA_KDP_DEVINFO, + __IFLA_KDP_MAX, +}; +#define IFLA_KDP_MAX (__IFLA_KDP_MAX - 1) + +#endif /* _RTE_KDP_COMMON_H_ */ diff --git a/lib/librte_eal/linuxapp/kdp/Makefile b/lib/librte_eal/linuxapp/kdp/Makefile new file mode 100644 index 0000000..3897dc6 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/Makefile @@ -0,0 +1,55 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. +# All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. 
+# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# module name and path +# +MODULE = rte_kdp + +# +# CFLAGS +# +MODULE_CFLAGS += -I$(SRCDIR) --param max-inline-insns-single=50 +MODULE_CFLAGS += -I$(RTE_OUTPUT)/include +MODULE_CFLAGS += -include $(RTE_OUTPUT)/include/rte_config.h +MODULE_CFLAGS += -Wall -Werror + +# this lib needs main eal +DEPDIRS-y += lib/librte_eal/linuxapp/eal + +# +# all source are stored in SRCS-y +# +SRCS-y += kdp_net.c + +include $(RTE_SDK)/mk/rte.module.mk diff --git a/lib/librte_eal/linuxapp/kdp/kdp_dev.h b/lib/librte_eal/linuxapp/kdp/kdp_dev.h new file mode 100644 index 0000000..0689e4f --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_dev.h @@ -0,0 +1,76 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#ifndef _KDP_DEV_H_ +#define _KDP_DEV_H_ + +#include <exec-env/rte_kdp_common.h> + +#ifdef pr_fmt +#undef pr_fmt +#endif +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +/** + * A structure describing the private information for a kdp device. + */ +struct kdp_dev { + /* kdp list */ + struct list_head list; + + char name[RTE_KDP_NAMESIZE]; /* Network device name */ + + u8 port_id; + u32 core_id; /* Core ID to bind */ + + /* kdp device */ + struct net_device *net_dev; + + struct task_struct *pthread; + struct net_device_stats stats; + + /* queue for packets to be sent out */ + void *tx_q; + + /* queue for the packets received */ + void *rx_q; + + /* queue for the allocated mbufs those can be used to save sk buffs */ + void *alloc_q; + + /* free queue for the mbufs to be freed */ + void *free_q; + + void *mbuf_kva; + void *mbuf_va; + ssize_t addr_diff; + + /* mbuf size */ + unsigned mbuf_size; +}; + +#ifdef RTE_KDP_KO_DEBUG +#define KDP_DBG(args...) pr_debug(args) +#else +#define KDP_DBG(args...) +#endif + +#endif /* _KDP_DEV_H_ */ diff --git a/lib/librte_eal/linuxapp/kdp/kdp_fifo.h b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h new file mode 100644 index 0000000..b70ce25 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_fifo.h @@ -0,0 +1,91 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#ifndef _KDP_FIFO_H_ +#define _KDP_FIFO_H_ + +#include <exec-env/rte_kdp_common.h> + +/** + * Adds num elements into the fifo. Return the number actually written + */ +static inline size_t +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, size_t num) +{ + size_t i; + u32 fifo_write = fifo->write; + u32 fifo_read = fifo->read; + u32 new_write = fifo_write; + + for (i = 0; i < num; i++) { + new_write = (new_write + 1) & (fifo->len - 1); + + if (new_write == fifo_read) + break; + fifo->buffer[fifo_write] = data[i]; + fifo_write = new_write; + } + fifo->write = fifo_write; + + return i; +} + +/** + * Get up to num elements from the fifo. 
Return the number actually read + */ +static inline size_t +kdp_fifo_get(struct rte_kdp_fifo *fifo, void **data, size_t num) +{ + size_t i = 0; + u32 new_read = fifo->read; + u32 fifo_write = fifo->write; + + for (i = 0; i < num; i++) { + if (new_read == fifo_write) + break; + + data[i] = fifo->buffer[new_read]; + new_read = (new_read + 1) & (fifo->len - 1); + } + fifo->read = new_read; + + return i; +} + +/** + * Get the number of elements in the fifo + */ +static inline size_t +kdp_fifo_count(struct rte_kdp_fifo *fifo) +{ + return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1); +} + +/** + * Get the number of free entries in the fifo + */ +static inline size_t +kdp_fifo_free_count(struct rte_kdp_fifo *fifo) +{ + return (fifo->read - fifo->write - 1) & (fifo->len - 1); +} + +#endif /* _KDP_FIFO_H_ */ diff --git a/lib/librte_eal/linuxapp/kdp/kdp_net.c b/lib/librte_eal/linuxapp/kdp/kdp_net.c new file mode 100644 index 0000000..f089339 --- /dev/null +++ b/lib/librte_eal/linuxapp/kdp/kdp_net.c @@ -0,0 +1,718 @@ +/*- + * GPL LICENSE SUMMARY + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details.
+ * + * You should have received a copy of the GNU General Public License + * along with this program; + * + * Contact Information: + * Intel Corporation + */ + +#include <linux/etherdevice.h> +#include <linux/kthread.h> +#include <linux/module.h> +#include <linux/version.h> +#include <net/rtnetlink.h> + +#include "kdp_dev.h" +#include "kdp_fifo.h" + +#define WD_TIMEOUT 5 /* jiffies */ +#define MBUF_BURST_SZ 32 + +#define KDP_RX_LOOP_NUM 1000 +#define KDP_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ + +static struct task_struct *kdp_kthread; +static struct mutex kdp_list_lock; +static struct list_head kdp_list_head; + +/* loopback mode */ +static char *lo_mode; +module_param(lo_mode, charp, S_IRUGO); +MODULE_PARM_DESC(lo_mode, "Enable loopback mode: fifo or fifo_skb."); + +/* Kernel thread mode */ +static bool multiple_kthread; +module_param(multiple_kthread, bool, S_IRUGO); +MODULE_PARM_DESC(multiple_kthread, "Enable multiple kernel thread mode."); + +/* typedef for rx function */ +typedef void (*kdp_net_rx_t)(struct kdp_dev *kdp); + +static int kdp_net_open(struct net_device *dev) +{ + random_ether_addr(dev->dev_addr); + netif_start_queue(dev); + + return 0; +} + +static int kdp_net_close(struct net_device *dev) +{ + netif_stop_queue(dev); + + return 0; +} + +static inline void *va_to_kva(void *va, struct kdp_dev *kdp) +{ + return va + kdp->addr_diff; +} + +static inline void *pkt_data(struct rte_kdp_mbuf *pkt, struct kdp_dev *kdp) +{ + return va_to_kva(pkt->buf_addr + pkt->data_off, kdp); +} + +/* + * Transmit a packet (called by the kernel) + */ +static int kdp_net_tx(struct sk_buff *skb, struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + struct rte_kdp_mbuf *pkt; + void *pkt_va; + void *data; + u32 len; + u32 ret; + + dev->trans_start = jiffies; /* save the timestamp */ + + /* Check if the length of skb is less than mbuf size */ + if (skb->len > kdp->mbuf_size) + goto drop; + + /** + * Check if it has at least one free entry in tx_q and + * one
entry in alloc_q. + */ + if (kdp_fifo_free_count(kdp->tx_q) == 0 || + kdp_fifo_count(kdp->alloc_q) == 0) { + /** + * If no free entry in tx_q or no entry in alloc_q, + * drops skb and goes out. + */ + goto drop; + } + + /* dequeue a mbuf from alloc_q */ + ret = kdp_fifo_get(kdp->alloc_q, &pkt_va, 1); + if (likely(ret == 1)) { + pkt = va_to_kva(pkt_va, kdp); + data = pkt_data(pkt, kdp); + + len = skb->len; + memcpy(data, skb->data, len); + if (unlikely(len < ETH_ZLEN)) { + memset(data + len, 0, ETH_ZLEN - len); + len = ETH_ZLEN; + } + pkt->pkt_len = len; + pkt->data_len = len; + + /* enqueue mbuf into tx_q */ + ret = kdp_fifo_put(kdp->tx_q, &pkt_va, 1); + if (unlikely(ret != 1)) { + /* Failing should not happen */ + pr_err("Fail to enqueue mbuf into tx_q\n"); + goto drop; + } + } else { + /* Failing should not happen */ + pr_err("Fail to dequeue mbuf from alloc_q\n"); + goto drop; + } + + /* Free skb and update statistics */ + dev_kfree_skb(skb); + kdp->stats.tx_bytes += len; + kdp->stats.tx_packets++; + + return NETDEV_TX_OK; + +drop: + /* Free skb and update statistics */ + dev_kfree_skb(skb); + kdp->stats.tx_dropped++; + + return NETDEV_TX_OK; +} + +static void kdp_net_set_rx_mode(struct net_device *dev) +{ +} + +static int kdp_net_set_mac(struct net_device *dev, void *p) +{ + struct sockaddr *addr = p; + + if (!is_valid_ether_addr(addr->sa_data)) + return -EADDRNOTAVAIL; + + memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); + + return 0; +} + +static int kdp_net_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) +{ + return -EOPNOTSUPP; +} + +/* + * Configuration changes (passed on by ifconfig) + */ +static int kdp_net_config(struct net_device *dev, struct ifmap *map) +{ + if (dev->flags & IFF_UP) + return -EBUSY; + + return -EOPNOTSUPP; +} + +static int kdp_net_change_mtu(struct net_device *dev, int new_mtu) +{ + dev->mtu = new_mtu; + + return 0; +} + +/* + * Deal with a transmit timeout. 
+ */ +static void kdp_net_tx_timeout(struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + + KDP_DBG("Transmit timeout at %ld, latency %ld\n", jiffies, + jiffies - dev->trans_start); + + kdp->stats.tx_errors++; + netif_wake_queue(dev); +} + +/* + * Return statistics to the caller + */ +static struct net_device_stats *kdp_net_stats(struct net_device *dev) +{ + struct kdp_dev *kdp = netdev_priv(dev); + + return &kdp->stats; +} + +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)) +static int kdp_net_change_carrier(struct net_device *dev, bool new_carrier) +{ + if (new_carrier) + netif_carrier_on(dev); + else + netif_carrier_off(dev); + return 0; +} +#endif + +static const struct net_device_ops kdp_net_netdev_ops = { + .ndo_open = kdp_net_open, + .ndo_stop = kdp_net_close, + .ndo_start_xmit = kdp_net_tx, + .ndo_set_rx_mode = kdp_net_set_rx_mode, + .ndo_set_mac_address = kdp_net_set_mac, + .ndo_do_ioctl = kdp_net_ioctl, + .ndo_set_config = kdp_net_config, + .ndo_change_mtu = kdp_net_change_mtu, + .ndo_tx_timeout = kdp_net_tx_timeout, + .ndo_get_stats = kdp_net_stats, +#if (LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0)) + .ndo_change_carrier = kdp_net_change_carrier, +#endif +}; + +static void kdp_net_setup(struct net_device *dev) +{ + ether_setup(dev); + dev->netdev_ops = &kdp_net_netdev_ops; + dev->watchdog_timeo = WD_TIMEOUT; +} + +/* + * RX: normal working mode + */ +static void kdp_net_rx_normal(struct kdp_dev *kdp) +{ + struct net_device *dev = kdp->net_dev; + void *va[MBUF_BURST_SZ]; + struct rte_kdp_mbuf *pkt; + void *data; + struct sk_buff *skb; + size_t num_rx, num_fq; + size_t len; + size_t ret; + u32 i; + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + if (num_fq == 0) + return; /* No room on the free_q, bail out */ + + /* Calculate the number of entries to dequeue from rx_q */ + num_rx = min_t(size_t, num_fq, MBUF_BURST_SZ); + + /* Burst dequeue from rx_q */ + num_rx = kdp_fifo_get(kdp->rx_q, 
va, num_rx); + if (num_rx == 0) + return; + + /* Transfer received packets to netif */ + for (i = 0; i < num_rx; i++) { + pkt = va_to_kva(va[i], kdp); + data = pkt_data(pkt, kdp); + len = pkt->data_len; + + skb = dev_alloc_skb(len + 2); + if (!skb) { + kdp->stats.rx_dropped++; + continue; + } + + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data, len); + skb->dev = dev; + skb->protocol = eth_type_trans(skb, dev); + skb->ip_summed = CHECKSUM_UNNECESSARY; + + /* Call netif interface */ + netif_rx(skb); + + /* Update statistics */ + kdp->stats.rx_bytes += len; + kdp->stats.rx_packets++; + } + + /* Burst enqueue mbufs into free_q */ + ret = kdp_fifo_put(kdp->free_q, va, num_rx); + if (ret != num_rx) + /* Failing should not happen */ + pr_err("Fail to enqueue entries into free_q\n"); +} + +/* + * RX: loopback with enqueue/dequeue fifos. + */ +static void kdp_net_rx_lo_fifo(struct kdp_dev *kdp) +{ + void *va[MBUF_BURST_SZ]; + struct rte_kdp_mbuf *pkt; + void *data; + void *alloc_va[MBUF_BURST_SZ]; + struct rte_kdp_mbuf *alloc_pkt; + void *alloc_data; + size_t num, num_q; + size_t ret; + size_t len; + u32 i; + + /* Get the number of entries in rx_q */ + num_q = kdp_fifo_count(kdp->rx_q); + num = min_t(size_t, num_q, MBUF_BURST_SZ); + + /* Get the number of free entrie in tx_q */ + num_q = kdp_fifo_free_count(kdp->tx_q); + num = min_t(size_t, num, num_q); + + /* Get the number of entries in alloc_q */ + num_q = kdp_fifo_count(kdp->alloc_q); + num = min_t(size_t, num, num_q); + + /* Get the number of free entries in free_q */ + num_q = kdp_fifo_free_count(kdp->free_q); + num = min_t(size_t, num, num_q); + + /* Return if no entry to dequeue from rx_q */ + if (num == 0) + return; + + /* Dequeue entries from alloc_q */ + ret = kdp_fifo_get(kdp->alloc_q, alloc_va, num); + if (ret == 0) + return; + + /* Burst dequeue from rx_q */ + ret = kdp_fifo_get(kdp->rx_q, va, num); + if (ret == 0) { + /* recover enties from alloc_q before return */ + 
ret = kdp_fifo_put(kdp->free_q, alloc_va, num); + if (ret != num) + pr_err("Fail to enqueue alloc mbufs into free_q\n"); + return; + } + + num = ret; + /* Copy mbufs */ + for (i = 0; i < num; i++) { + pkt = va_to_kva(va[i], kdp); + data = pkt_data(pkt, kdp); + + alloc_pkt = va_to_kva(alloc_va[i], kdp); + alloc_data = pkt_data(alloc_pkt, kdp); + + len = pkt->pkt_len; + memcpy(alloc_data, data, len); + + alloc_pkt->pkt_len = len; + alloc_pkt->data_len = len; + + kdp->stats.tx_bytes += len; + kdp->stats.rx_bytes += len; + } + + /* Burst enqueue mbufs into tx_q */ + ret = kdp_fifo_put(kdp->tx_q, alloc_va, num); + if (ret != num) + /* Failing should not happen */ + pr_err("Fail to enqueue mbufs into tx_q\n"); + + /* Burst enqueue mbufs into free_q */ + ret = kdp_fifo_put(kdp->free_q, va, num); + if (ret != num) + /* Failing should not happen */ + pr_err("Fail to enqueue mbufs into free_q\n"); + + /** + * Update statistic, and enqueue/dequeue failure is impossible, + * as all queues are checked at first. + */ + kdp->stats.tx_packets += num; + kdp->stats.rx_packets += num; +} + +/* + * RX: loopback with enqueue/dequeue fifos and sk buffer copies. 
+ */ +static void kdp_net_rx_lo_fifo_skb(struct kdp_dev *kdp) +{ + struct net_device *dev = kdp->net_dev; + void *va[MBUF_BURST_SZ]; + struct rte_kdp_mbuf *pkt; + void *data; + struct sk_buff *skb; + size_t num_rq, num_fq; + size_t ret; + size_t len; + size_t num; + u32 i; + + /* Get the number of entries in rx_q */ + num_rq = kdp_fifo_count(kdp->rx_q); + + /* Get the number of free entries in free_q */ + num_fq = kdp_fifo_free_count(kdp->free_q); + + /* Calculate the number of entries to dequeue from rx_q */ + num = min_t(size_t, num_rq, num_fq); + num = min_t(size_t, num, MBUF_BURST_SZ); + + /* Return if no entry to dequeue from rx_q */ + if (num == 0) + return; + + /* Burst dequeue mbufs from rx_q */ + ret = kdp_fifo_get(kdp->rx_q, va, num); + if (ret == 0) + return; + + num = ret; + /* Copy mbufs to sk buffer and then call tx interface */ + for (i = 0; i < num; i++) { + pkt = va_to_kva(va[i], kdp); + data = pkt_data(pkt, kdp); + len = pkt->data_len; + + skb = dev_alloc_skb(len + 2); + if (!skb) { + kdp->stats.rx_dropped++; + continue; + } + + /* Align IP on 16B boundary */ + skb_reserve(skb, 2); + memcpy(skb_put(skb, len), data, len); + skb->dev = dev; + skb->ip_summed = CHECKSUM_UNNECESSARY; + + kdp->stats.rx_bytes += len; + kdp->stats.rx_packets++; + + /* call tx interface */ + kdp_net_tx(skb, dev); + } + + /* enqueue all the mbufs from rx_q into free_q */ + ret = kdp_fifo_put(kdp->free_q, va, num); + if (ret != num) + /* Failing should not happen */ + pr_err("Fail to enqueue mbufs into free_q\n"); +} + +/* kdp rx function pointer, with default to normal rx */ +static kdp_net_rx_t kdp_net_rx_func = kdp_net_rx_normal; + +static int kdp_thread_single(void *data) +{ + struct kdp_dev *kdp; + u32 i; + + while (!kthread_should_stop()) { + mutex_lock(&kdp_list_lock); + for (i = 0; i < KDP_RX_LOOP_NUM; i++) + list_for_each_entry(kdp, &kdp_list_head, list) + (*kdp_net_rx_func)(kdp); + mutex_unlock(&kdp_list_lock); + +#ifdef RTE_KDP_PREEMPT_DEFAULT + /* reschedule out 
for a while */ + schedule_timeout_interruptible( + usecs_to_jiffies(KDP_KTHREAD_RESCHEDULE_INTERVAL)); +#endif + } + + return 0; +} + +static int kdp_thread_multiple(void *param) +{ + struct kdp_dev *kdp = param; + u32 i; + + while (!kthread_should_stop()) { + for (i = 0; i < KDP_RX_LOOP_NUM; i++) + (*kdp_net_rx_func)(kdp); + +#ifdef RTE_KDP_PREEMPT_DEFAULT + schedule_timeout_interruptible( + usecs_to_jiffies(KDP_KTHREAD_RESCHEDULE_INTERVAL)); +#endif + } + + return 0; +} + +static void kdp_setup(struct kdp_dev *kdp, struct rte_kdp_device_info *info) +{ + kdp->port_id = info->port_id; + kdp->core_id = info->core_id; + strncpy(kdp->name, info->name, RTE_KDP_NAMESIZE); + + /* Translate user space info into kernel space info */ + kdp->tx_q = phys_to_virt(info->tx_phys); + kdp->rx_q = phys_to_virt(info->rx_phys); + kdp->alloc_q = phys_to_virt(info->alloc_phys); + kdp->free_q = phys_to_virt(info->free_phys); + + kdp->mbuf_kva = phys_to_virt(info->mbuf_phys); + kdp->mbuf_va = info->mbuf_va; + kdp->addr_diff = kdp->mbuf_kva - kdp->mbuf_va; + + kdp->mbuf_size = info->mbuf_size; + + pr_info("tx_phys: 0x%016llx, tx_q addr: 0x%p\n", + (unsigned long long) info->tx_phys, kdp->tx_q); + pr_info("rx_phys: 0x%016llx, rx_q addr: 0x%p\n", + (unsigned long long) info->rx_phys, kdp->rx_q); + pr_info("alloc_phys: 0x%016llx, alloc_q addr: 0x%p\n", + (unsigned long long) info->alloc_phys, kdp->alloc_q); + pr_info("free_phys: 0x%016llx, free_q addr: 0x%p\n", + (unsigned long long) info->free_phys, kdp->free_q); + pr_info("mbuf_phys: 0x%016llx, mbuf_kva: 0x%p\n", + (unsigned long long) info->mbuf_phys, kdp->mbuf_kva); + pr_info("mbuf_va: 0x%p\n", info->mbuf_va); + pr_info("mbuf_size: %u\n", kdp->mbuf_size); +} + +static int create_kthread(struct kdp_dev *kdp, + struct rte_kdp_device_info *info) +{ + /** + * Create a new kernel thread for multiple mode, set its core affinity, + * and finally wake it up. 
+ */ + if (multiple_kthread) { + /** + * Check if the cpu core id is valid for binding, + * for multiple kernel thread mode. + */ + if (info->force_bind && !cpu_online(kdp->core_id)) { + pr_err("cpu %u is not online\n", kdp->core_id); + return -EINVAL; + } + + kdp->pthread = kthread_create(kdp_thread_multiple, + (void *)kdp, "kdp_%s", kdp->name); + if (IS_ERR(kdp->pthread)) + return -ECANCELED; + + if (info->force_bind) + kthread_bind(kdp->pthread, kdp->core_id); + + wake_up_process(kdp->pthread); + + return 0; + } + + /* single thread */ + if (kdp_kthread == NULL) { + pr_info("Single kernel thread for all KDP devices\n"); + + /* Create kernel thread for RX */ + kdp_kthread = kthread_run(kdp_thread_single, NULL, + "kdp_single"); + if (IS_ERR(kdp_kthread)) { + pr_err("Unable to create kernel thread\n"); + return -ECANCELED; + } + } + + return 0; +} + +static int kdp_net_newlink(struct net *net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[]) +{ + struct rte_kdp_device_info dev_info; + struct kdp_dev *kdp = netdev_priv(dev); + int ret; + + if (data && data[IFLA_KDP_PORTID]) + kdp->port_id = nla_get_u8(data[IFLA_KDP_PORTID]); + else + goto error_free; + + if (data && data[IFLA_KDP_DEVINFO]) + memcpy(&dev_info, nla_data(data[IFLA_KDP_DEVINFO]), + sizeof(struct rte_kdp_device_info)); + else + goto error_free; + + kdp->net_dev = dev; + kdp_setup(kdp, &dev_info); + + ret = register_netdevice(dev); + if (ret < 0) + goto error_free; + + ret = create_kthread(kdp, &dev_info); + if (ret < 0) + goto error_unregister; + + mutex_lock(&kdp_list_lock); + list_add(&kdp->list, &kdp_list_head); + mutex_unlock(&kdp_list_lock); + + return 0; + +error_unregister: + unregister_netdev(dev); +error_free: + free_netdev(dev); + return -EINVAL; +} + +static void single_kthread_stop(void) +{ + /* Stop kernel thread for single mode */ + if (!multiple_kthread && kdp_kthread) { + kthread_stop(kdp_kthread); + kdp_kthread = NULL; + } +} + +static void 
multiple_kthread_stop(struct kdp_dev *kdp) +{ + /* Stop kernel thread for multiple mode */ + if (multiple_kthread && kdp->pthread) { + kthread_stop(kdp->pthread); + kdp->pthread = NULL; + } +} + +static void kdp_kthread_stop_one(struct kdp_dev *kdp) +{ + multiple_kthread_stop(kdp); + + mutex_lock(&kdp_list_lock); + if (list_empty(&kdp_list_head)) + single_kthread_stop(); + mutex_unlock(&kdp_list_lock); +} + +static void kdp_net_dellink(struct net_device *dev, struct list_head *head) +{ + struct kdp_dev *kdp = netdev_priv(dev); + + mutex_lock(&kdp_list_lock); + list_del(&kdp->list); + mutex_unlock(&kdp_list_lock); + + kdp_kthread_stop_one(kdp); + + unregister_netdevice_queue(dev, head); +} + +static struct rtnl_link_ops kdp_link_ops __read_mostly = { + .kind = KDP_DEVICE, + .priv_size = sizeof(struct kdp_dev), + .setup = kdp_net_setup, + .maxtype = IFLA_KDP_MAX, + .newlink = kdp_net_newlink, + .dellink = kdp_net_dellink, +}; + +static void __init kdp_net_config_lo_mode(char *lo_str) +{ + if (!lo_str) + return; + + if (!strcmp(lo_str, "fifo")) { + pr_info("loopback mode fifo enabled"); + kdp_net_rx_func = kdp_net_rx_lo_fifo; + } else if (!strcmp(lo_str, "fifo_skb")) { + pr_info("loopback mode fifo_skb enabled"); + kdp_net_rx_func = kdp_net_rx_lo_fifo_skb; + } else + pr_info("Incognizant parameter, loopback disabled"); +} + +static int __init kdp_init(void) +{ + /* Configure the loopback mode according to the input parameter */ + kdp_net_config_lo_mode(lo_mode); + + mutex_init(&kdp_list_lock); + INIT_LIST_HEAD(&kdp_list_head); + + return rtnl_link_register(&kdp_link_ops); +} +module_init(kdp_init); + +static void __exit kdp_exit(void) +{ + rtnl_link_unregister(&kdp_link_ops); +} +module_exit(kdp_exit); + +MODULE_LICENSE("Dual BSD/GPL"); +MODULE_AUTHOR("Intel Corporation"); +MODULE_DESCRIPTION("Kernel Module for managing kdp devices"); -- 2.5.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
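Note on the FIFO helpers in kdp_fifo.h above: the index arithmetic only works when fifo->len is a power of two, and one slot is always left unused so that write == read can mean "empty". The following is a minimal userspace sketch of the same put/get/count logic, assuming a fixed-size buffer in place of the flexible array member. The demo_fifo names and DEMO_FIFO_LEN are hypothetical and are not part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct rte_kdp_fifo; length must be a power of two. */
#define DEMO_FIFO_LEN 8u

struct demo_fifo {
	volatile unsigned write;  /* next position to be written */
	volatile unsigned read;   /* next position to be read */
	unsigned len;             /* circular buffer length */
	void *volatile buffer[DEMO_FIFO_LEN];
};

static inline void demo_fifo_init(struct demo_fifo *f)
{
	f->write = 0;
	f->read = 0;
	f->len = DEMO_FIFO_LEN;
}

/* Add up to num pointers; stop before write would catch up with read. */
static inline size_t demo_fifo_put(struct demo_fifo *f, void **data, size_t num)
{
	size_t i;
	unsigned w = f->write;
	unsigned r = f->read;
	unsigned new_w = w;

	for (i = 0; i < num; i++) {
		new_w = (new_w + 1) & (f->len - 1);
		if (new_w == r)
			break;          /* full: one slot always stays empty */
		f->buffer[w] = data[i];
		w = new_w;
	}
	f->write = w;
	return i;
}

/* Remove up to num pointers; stop when read reaches write (empty). */
static inline size_t demo_fifo_get(struct demo_fifo *f, void **data, size_t num)
{
	size_t i;
	unsigned r = f->read;
	unsigned w = f->write;

	for (i = 0; i < num; i++) {
		if (r == w)
			break;
		data[i] = f->buffer[r];
		r = (r + 1) & (f->len - 1);
	}
	f->read = r;
	return i;
}

/* Number of queued elements. */
static inline size_t demo_fifo_count(const struct demo_fifo *f)
{
	return (f->len + f->write - f->read) & (f->len - 1);
}

/* Number of elements that can still be queued (at most len - 1). */
static inline size_t demo_fifo_free_count(const struct demo_fifo *f)
{
	return (f->read - f->write - 1) & (f->len - 1);
}
```

With DEMO_FIFO_LEN set to 8, at most 7 pointers can be queued at once; a larger put stops early and returns the number actually stored, which is why the callers in kdp_net.c check free_count before enqueueing.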
* [dpdk-dev] [PATCH v3 2/2] kdp: add virtual PMD for kernel slow data path communication 2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit 2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 1/2] kdp: add kernel data path kernel module Ferruh Yigit @ 2016-03-09 11:17 ` Ferruh Yigit 2016-03-14 15:32 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit 2 siblings, 0 replies; 29+ messages in thread From: Ferruh Yigit @ 2016-03-09 11:17 UTC (permalink / raw) To: dev

This patch provides slow data path communication to the Linux kernel. It is based on librte_kni and heavily reuses it.

The main difference is that the librte_kni library is converted into a PMD, to make it easier for applications to use. Any application can now use slow path communication without modification, because EAL already supports virtual PMDs.

The PMD supports two methods of sending packets to Linux: the first is a custom FIFO implementation that relies on the KDP kernel module, the second is the Linux in-kernel tun/tap support. The PMD first checks for the KDP kernel module; if that fails, it tries to create and use a tap interface.

With the FIFO method: the PMD's rx_pkt_burst() gets packets from the FIFO and tx_pkt_burst() puts packets into the FIFO. The corresponding Linux virtual network device driver code also gets packets from and puts packets into the FIFO, as if they were coming from hardware.

With the tun/tap method: no external kernel module is required; the PMD reads packets from and writes packets to the tap interface file descriptor. The tap interface has a performance penalty compared to the FIFO implementation.
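For reference, the tun/tap method described above relies on the standard Linux /dev/net/tun interface. The following is a minimal sketch of attaching to a tap device, assuming the usual TUNSETIFF ioctl with IFF_TAP | IFF_NO_PI. The demo_tap_* helpers are hypothetical and are not taken from rte_kdp_tap.c, which is not shown here.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

/* Hypothetical helper: prepare the request for a tap (L2) device without
 * the extra packet-information header. */
static void demo_tap_fill_ifreq(struct ifreq *ifr, const char *name)
{
	memset(ifr, 0, sizeof(*ifr));
	ifr->ifr_flags = IFF_TAP | IFF_NO_PI;
	strncpy(ifr->ifr_name, name, IFNAMSIZ - 1); /* stays NUL terminated */
}

/* Hypothetical helper: returns a file descriptor on which read()/write()
 * move whole Ethernet frames, or -1 on failure. Needs CAP_NET_ADMIN. */
static int demo_tap_open(const char *name)
{
	struct ifreq ifr;
	int fd = open("/dev/net/tun", O_RDWR);

	if (fd < 0)
		return -1;

	demo_tap_fill_ifreq(&ifr, name);
	if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would typically invoke demo_tap_open("tap_kdp0") once per port and then use plain read() and write() on the returned descriptor from the burst functions; each call crosses into the kernel, which is one plausible source of the performance penalty mentioned above.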
Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com> --- v3: * No update v2: * Use rtnetlink to create interfaces --- MAINTAINERS | 1 + config/common_base | 1 + config/common_linuxapp | 1 + doc/guides/nics/pcap_ring.rst | 125 ++++++- doc/guides/rel_notes/release_16_04.rst | 5 + drivers/net/Makefile | 3 +- drivers/net/kdp/Makefile | 61 +++ drivers/net/kdp/rte_eth_kdp.c | 501 +++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.c | 633 ++++++++++++++++++++++++++++++++ drivers/net/kdp/rte_kdp.h | 116 ++++++ drivers/net/kdp/rte_kdp_fifo.h | 91 +++++ drivers/net/kdp/rte_kdp_tap.c | 101 +++++ drivers/net/kdp/rte_pmd_kdp_version.map | 4 + lib/librte_eal/common/include/rte_log.h | 3 +- mk/rte.app.mk | 3 +- 15 files changed, 1643 insertions(+), 6 deletions(-) create mode 100644 drivers/net/kdp/Makefile create mode 100644 drivers/net/kdp/rte_eth_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.c create mode 100644 drivers/net/kdp/rte_kdp.h create mode 100644 drivers/net/kdp/rte_kdp_fifo.h create mode 100644 drivers/net/kdp/rte_kdp_tap.c create mode 100644 drivers/net/kdp/rte_pmd_kdp_version.map diff --git a/MAINTAINERS b/MAINTAINERS index edcc4cc..2174bac 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -261,6 +261,7 @@ F: doc/guides/sample_app_ug/kernel_nic_interface.rst Linux KDP M: Ferruh Yigit <ferruh.yigit@gmail.com> F: lib/librte_eal/linuxapp/kdp/ +F: drivers/net/kdp/ Linux AF_PACKET M: John W. 
Linville <linville@tuxdriver.com> diff --git a/config/common_base b/config/common_base index 973baff..767f391 100644 --- a/config/common_base +++ b/config/common_base @@ -306,6 +306,7 @@ CONFIG_RTE_LIBRTE_PMD_NULL=y # CONFIG_RTE_KDP_KMOD=n CONFIG_RTE_KDP_PREEMPT_DEFAULT=y +CONFIG_RTE_LIBRTE_PMD_KDP=n # # Do prefetch of packet data within PMD driver receive function diff --git a/config/common_linuxapp b/config/common_linuxapp index 569a0fe..fd25a38 100644 --- a/config/common_linuxapp +++ b/config/common_linuxapp @@ -40,6 +40,7 @@ CONFIG_RTE_EAL_VFIO=y CONFIG_RTE_KNI_KMOD=y CONFIG_RTE_LIBRTE_KNI=y CONFIG_RTE_KDP_KMOD=y +CONFIG_RTE_LIBRTE_PMD_KDP=y CONFIG_RTE_LIBRTE_VHOST=y CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y CONFIG_RTE_LIBRTE_POWER=y diff --git a/doc/guides/nics/pcap_ring.rst b/doc/guides/nics/pcap_ring.rst index aa48d33..b602e65 100644 --- a/doc/guides/nics/pcap_ring.rst +++ b/doc/guides/nics/pcap_ring.rst @@ -28,11 +28,11 @@ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -Libpcap and Ring Based Poll Mode Drivers -======================================== +Software Poll Mode Drivers +========================== In addition to Poll Mode Drivers (PMDs) for physical and virtual hardware, -the DPDK also includes two pure-software PMDs. These two drivers are: +the DPDK also includes pure-software PMDs. These drivers are: * A libpcap -based PMD (librte_pmd_pcap) that reads and writes packets using libpcap, - both from files on disk, as well as from physical NIC devices using standard Linux kernel drivers. @@ -40,6 +40,10 @@ the DPDK also includes two pure-software PMDs. These two drivers are: * A ring-based PMD (librte_pmd_ring) that allows a set of software FIFOs (that is, rte_ring) to be accessed using the PMD APIs, as though they were physical NICs. +* A slow data path PMD (librte_pmd_kdp) that allows send/get packets to/from OS network + stack as it is a physical NIC. + + .. 
note:: The libpcap -based PMD is disabled by default in the build configuration files, @@ -211,6 +215,121 @@ Multiple devices may be specified, separated by commas. Done. +Kernel Data Path PMD +~~~~~~~~~~~~~~~~~~~~ + +The Kernel Data Path (KDP) PMD lets an application communicate easily with the OS network stack. + +.. code-block:: console + + ./testpmd --vdev eth_kdp0 --vdev eth_kdp1 -- -i + ... + Configuring Port 0 (socket 0) + Port 0: 00:00:00:00:00:00 + Configuring Port 1 (socket 0) + Port 1: 00:00:00:00:00:00 + Checking link statuses... + Port 0 Link Up - speed 10000 Mbps - full-duplex + Port 1 Link Up - speed 10000 Mbps - full-duplex + Done + +The KDP PMD supports two types of communication: + +* Custom FIFO implementation +* tun/tap implementation + +The custom FIFO implementation gives better performance but requires the KDP kernel module (rte_kdp.ko) to be inserted. + +By default FIFO communication has priority; if the KDP kernel module is not inserted, tun/tap communication is used. + +If the KDP kernel module is inserted, the testpmd command above creates the following virtual interfaces, which can be used like any other interface. + +.. code-block:: console + + # ifconfig kdp0; ifconfig kdp1 + kdp0: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + kdp1: flags=4098<BROADCAST,MULTICAST> mtu 1500 + ether 00:00:00:00:00:00 txqueuelen 1000 (Ethernet) + RX packets 0 bytes 0 (0.0 B) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 0 bytes 0 (0.0 B) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + +With the tun/tap communication method, the following interfaces are created: + +..
code-block:: console + + # ifconfig tap_kdp0; ifconfig tap_kdp1 + tap_kdp0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 + inet6 fe80::341f:afff:feb7:23db prefixlen 64 scopeid 0x20<link> + ether 36:1f:af:b7:23:db txqueuelen 500 (Ethernet) + RX packets 126624864 bytes 6184828655 (5.7 GiB) + RX errors 0 dropped 0 overruns 0 frame 0 + TX packets 126236898 bytes 6150306636 (5.7 GiB) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + + tap_kdp1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 + inet6 fe80::f030:b4ff:fe94:b720 prefixlen 64 scopeid 0x20<link> + ether f2:30:b4:94:b7:20 txqueuelen 500 (Ethernet) + RX packets 126237370 bytes 6150329717 (5.7 GiB) + RX errors 0 dropped 9 overruns 0 frame 0 + TX packets 126624896 bytes 6184826874 (5.7 GiB) + TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 + +DPDK application can be used to forward packets between these interfaces: + +.. code-block:: console + + In Linux: + ip l add br0 type bridge + ip l set tap_kdp0 master br0 + ip l set tap_kdp1 master br0 + ip l set br0 up + ip l set tap_kdp0 up + ip l set tap_kdp1 up + + + In testpmd: + testpmd> start + io packet forwarding - CRC stripping disabled - packets/burst=32 + nb forwarding cores=1 - nb forwarding ports=2 + RX queues=1 - RX desc=128 - RX free threshold=0 + RX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX queues=1 - TX desc=512 - TX free threshold=0 + TX threshold registers: pthresh=0 hthresh=0 wthresh=0 + TX RS bit threshold=0 - TXQ flags=0x0 + testpmd> stop + Telling cores to stop... + Waiting for lcores to finish... 
+ + ---------------------- Forward statistics for port 0 ---------------------- + RX-packets: 973900 RX-dropped: 0 RX-total: 973900 + TX-packets: 973903 TX-dropped: 0 TX-total: 973903 + ---------------------------------------------------------------------------- + + ---------------------- Forward statistics for port 1 ---------------------- + RX-packets: 973903 RX-dropped: 0 RX-total: 973903 + TX-packets: 973900 TX-dropped: 0 TX-total: 973900 + ---------------------------------------------------------------------------- + + +++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++ + RX-packets: 1947803 RX-dropped: 0 RX-total: 1947803 + TX-packets: 1947803 TX-dropped: 0 TX-total: 1947803 + ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + + Done. + + + + + Using the Poll Mode Driver from an Application ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/guides/rel_notes/release_16_04.rst b/doc/guides/rel_notes/release_16_04.rst index 96f144e..7f6b3aa 100644 --- a/doc/guides/rel_notes/release_16_04.rst +++ b/doc/guides/rel_notes/release_16_04.rst @@ -63,6 +63,11 @@ This section should contain new features added in this release. Sample format: space bytes, to boost the performance. In the meanwhile, it deprecated the legacy way via reading/writing sysfile supported by kernel module igb_uio. +* **Added Slow Data Path support.** + + * This is based on KNI work and in long term intends to replace it. + * Added Kernel Data Path (KDP) kernel module. + * Added KDP virtual PMD. Resolved Issues --------------- diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 0c3393f..78f923a 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # All rights reserved. 
# # Redistribution and use in source and binary forms, with or without @@ -51,5 +51,6 @@ DIRS-$(CONFIG_RTE_LIBRTE_PMD_SZEDATA2) += szedata2 DIRS-$(CONFIG_RTE_LIBRTE_VIRTIO_PMD) += virtio DIRS-$(CONFIG_RTE_LIBRTE_VMXNET3_PMD) += vmxnet3 DIRS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT) += xenvirt +DIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += kdp include $(RTE_SDK)/mk/rte.subdir.mk diff --git a/drivers/net/kdp/Makefile b/drivers/net/kdp/Makefile new file mode 100644 index 0000000..035056e --- /dev/null +++ b/drivers/net/kdp/Makefile @@ -0,0 +1,61 @@ +# BSD LICENSE +# +# Copyright(c) 2016 Intel Corporation. All rights reserved. +# +# Redistribution and use in source and binary forms, with or without +# modification, are permitted provided that the following conditions +# are met: +# +# * Redistributions of source code must retain the above copyright +# notice, this list of conditions and the following disclaimer. +# * Redistributions in binary form must reproduce the above copyright +# notice, this list of conditions and the following disclaimer in +# the documentation and/or other materials provided with the +# distribution. +# * Neither the name of Intel Corporation nor the names of its +# contributors may be used to endorse or promote products derived +# from this software without specific prior written permission. +# +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR +# A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +include $(RTE_SDK)/mk/rte.vars.mk + +# +# library name +# +LIB = librte_pmd_kdp.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) + +EXPORT_MAP := rte_pmd_kdp_version.map + +LIBABIVER := 1 + +# +# all source are stored in SRCS-y +# +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_eth_kdp.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp.c +SRCS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += rte_kdp_tap.c + +# +# Export include files +# +SYMLINK-y-include += + +# this lib depends upon: +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_mbuf +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += lib/librte_ether + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/drivers/net/kdp/rte_eth_kdp.c b/drivers/net/kdp/rte_eth_kdp.c new file mode 100644 index 0000000..68dd734 --- /dev/null +++ b/drivers/net/kdp/rte_eth_kdp.c @@ -0,0 +1,501 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. 
+ * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#include <rte_ethdev.h> + +#include "rte_kdp.h" + +#define MAX_PACKET_SZ 2048 + +struct pmd_queue_stats { + uint64_t pkts; + uint64_t bytes; + uint64_t err_pkts; +}; + +struct pmd_queue { + struct pmd_internals *internals; + struct rte_mempool *mb_pool; + + struct pmd_queue_stats rx; + struct pmd_queue_stats tx; +}; + +struct pmd_internals { + struct kdp_data *kdp; + struct kdp_tap_data *kdp_tap; + + struct pmd_queue rx_queues[RTE_MAX_QUEUES_PER_PORT]; + struct pmd_queue tx_queues[RTE_MAX_QUEUES_PER_PORT]; +}; + +static struct ether_addr eth_addr = { .addr_bytes = {0} }; +static const char *drivername = "KDP PMD"; +static struct rte_eth_link pmd_link = { + .link_speed = 10000, + .link_duplex = ETH_LINK_FULL_DUPLEX, + .link_status = 0 +}; + +static uint16_t +eth_kdp_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct kdp_data *kdp = kdp_q->internals->kdp; + uint16_t nb_pkts; + + nb_pkts = kdp_rx_burst(kdp, bufs, nb_bufs); + + kdp_q->rx.pkts += nb_pkts; + kdp_q->rx.err_pkts += nb_bufs - nb_pkts; + + return nb_pkts; +} + +static uint16_t +eth_kdp_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct kdp_data *kdp = kdp_q->internals->kdp; + uint16_t nb_pkts; + + nb_pkts = kdp_tx_burst(kdp, bufs, nb_bufs); + + kdp_q->tx.pkts += nb_pkts; + kdp_q->tx.err_pkts += nb_bufs - nb_pkts; + + return nb_pkts; +} + +static uint16_t +eth_kdp_tap_rx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + struct kdp_tap_data *kdp_tap = internals->kdp_tap; + struct rte_mbuf *m; + int ret; + unsigned i; + + for (i = 0; i < nb_bufs; i++) { + m = rte_pktmbuf_alloc(kdp_q->mb_pool); + bufs[i] = m; + ret = read(kdp_tap->tap_fd, rte_pktmbuf_mtod(m, void *), + MAX_PACKET_SZ); + if (ret < 0) { + rte_pktmbuf_free(m); + break; + } + + m->nb_segs = 1; + m->next = NULL; + m->pkt_len = (uint16_t)ret; + m->data_len = 
(uint16_t)ret; + } + + kdp_q->rx.pkts += i; + kdp_q->rx.err_pkts += nb_bufs - i; + + return i; +} + +static uint16_t +eth_kdp_tap_tx(void *q, struct rte_mbuf **bufs, uint16_t nb_bufs) +{ + struct pmd_queue *kdp_q = q; + struct pmd_internals *internals = kdp_q->internals; + struct kdp_tap_data *kdp_tap = internals->kdp_tap; + struct rte_mbuf *m; + unsigned i; + + for (i = 0; i < nb_bufs; i++) { + m = bufs[i]; + write(kdp_tap->tap_fd, rte_pktmbuf_mtod(m, void*), + rte_pktmbuf_data_len(m)); + rte_pktmbuf_free(m); + } + + kdp_q->tx.pkts += i; + kdp_q->tx.err_pkts += nb_bufs - i; + + return i; +} + +static int +eth_kdp_start(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct kdp_conf conf; + uint16_t port_id = dev->data->port_id; + int ret = 0; + + snprintf(conf.name, RTE_KDP_NAMESIZE, KDP_DEVICE "%u", + port_id); + conf.force_bind = 0; + conf.port_id = port_id; + conf.mbuf_size = MAX_PACKET_SZ; + + ret = kdp_start(internals->kdp, + internals->rx_queues[0].mb_pool, + &conf); + if (ret) + RTE_LOG(ERR, KDP, "Fail to create kdp for port: %d\n", + port_id); + + return ret; +} + +static int +eth_kdp_dev_start(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + int ret; + + if (internals->kdp) { + ret = eth_kdp_start(dev); + if (ret) + return -1; + } + + dev->data->dev_link.link_status = 1; + return 0; +} + +static void +eth_kdp_dev_stop(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + + if (internals->kdp) + kdp_stop(internals->kdp); + + dev->data->dev_link.link_status = 0; +} + +static void +eth_kdp_dev_close(struct rte_eth_dev *dev) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct kdp_data *kdp = internals->kdp; + struct kdp_tap_data *kdp_tap = internals->kdp_tap; + + if (kdp) { + kdp_close(kdp); + + rte_free(kdp); + internals->kdp = NULL; + } + + if (kdp_tap) { + kdp_tap_close(kdp_tap); + + rte_free(kdp_tap); + 
internals->kdp_tap = NULL; + } + + rte_free(dev->data->dev_private); + dev->data->dev_private = NULL; +} + +static int +eth_kdp_dev_configure(struct rte_eth_dev *dev __rte_unused) +{ + return 0; +} + +static void +eth_kdp_dev_info(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info) +{ + struct rte_eth_dev_data *data = dev->data; + + dev_info->driver_name = data->drv_name; + dev_info->max_mac_addrs = 1; + dev_info->max_rx_pktlen = (uint32_t)-1; + dev_info->max_rx_queues = data->nb_rx_queues; + dev_info->max_tx_queues = data->nb_tx_queues; + dev_info->min_rx_bufsize = 0; + dev_info->pci_dev = NULL; +} + +static int +eth_kdp_rx_queue_setup(struct rte_eth_dev *dev, + uint16_t rx_queue_id __rte_unused, + uint16_t nb_rx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_rxconf *rx_conf __rte_unused, + struct rte_mempool *mb_pool) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pmd_queue *q; + + q = &internals->rx_queues[rx_queue_id]; + q->internals = internals; + q->mb_pool = mb_pool; + + dev->data->rx_queues[rx_queue_id] = q; + + return 0; +} + +static int +eth_kdp_tx_queue_setup(struct rte_eth_dev *dev, + uint16_t tx_queue_id, + uint16_t nb_tx_desc __rte_unused, + unsigned int socket_id __rte_unused, + const struct rte_eth_txconf *tx_conf __rte_unused) +{ + struct pmd_internals *internals = dev->data->dev_private; + struct pmd_queue *q; + + q = &internals->tx_queues[tx_queue_id]; + q->internals = internals; + + dev->data->tx_queues[tx_queue_id] = q; + + return 0; +} + +static void +eth_kdp_queue_release(void *q __rte_unused) +{ +} + +static int +eth_kdp_link_update(struct rte_eth_dev *dev __rte_unused, + int wait_to_complete __rte_unused) +{ + return 0; +} + +static void +eth_kdp_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats) +{ + unsigned i, num_stats; + unsigned long rx_packets_total = 0, rx_bytes_total = 0; + unsigned long tx_packets_total = 0, tx_bytes_total = 0; + unsigned long 
tx_packets_err_total = 0; + struct rte_eth_dev_data *data = dev->data; + struct pmd_queue *q; + + num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS, + data->nb_rx_queues); + for (i = 0; i < num_stats; i++) { + q = data->rx_queues[i]; + stats->q_ipackets[i] = q->rx.pkts; + stats->q_ibytes[i] = q->rx.bytes; + rx_packets_total += stats->q_ipackets[i]; + rx_bytes_total += stats->q_ibytes[i]; + } + + num_stats = RTE_MIN((unsigned)RTE_ETHDEV_QUEUE_STAT_CNTRS, + data->nb_tx_queues); + for (i = 0; i < num_stats; i++) { + q = data->tx_queues[i]; + stats->q_opackets[i] = q->tx.pkts; + stats->q_obytes[i] = q->tx.bytes; + stats->q_errors[i] = q->tx.err_pkts; + tx_packets_total += stats->q_opackets[i]; + tx_bytes_total += stats->q_obytes[i]; + tx_packets_err_total += stats->q_errors[i]; + } + + stats->ipackets = rx_packets_total; + stats->ibytes = rx_bytes_total; + stats->opackets = tx_packets_total; + stats->obytes = tx_bytes_total; + stats->oerrors = tx_packets_err_total; +} + +static void +eth_kdp_stats_reset(struct rte_eth_dev *dev) +{ + unsigned i; + struct rte_eth_dev_data *data = dev->data; + struct pmd_queue *q; + + for (i = 0; i < data->nb_rx_queues; i++) { + q = data->rx_queues[i]; + q->rx.pkts = 0; + q->rx.bytes = 0; + } + for (i = 0; i < data->nb_tx_queues; i++) { + q = data->tx_queues[i]; + q->tx.pkts = 0; + q->tx.bytes = 0; + q->tx.err_pkts = 0; + } +} + +static const struct eth_dev_ops eth_kdp_ops = { + .dev_start = eth_kdp_dev_start, + .dev_stop = eth_kdp_dev_stop, + .dev_close = eth_kdp_dev_close, + .dev_configure = eth_kdp_dev_configure, + .dev_infos_get = eth_kdp_dev_info, + .rx_queue_setup = eth_kdp_rx_queue_setup, + .tx_queue_setup = eth_kdp_tx_queue_setup, + .rx_queue_release = eth_kdp_queue_release, + .tx_queue_release = eth_kdp_queue_release, + .link_update = eth_kdp_link_update, + .stats_get = eth_kdp_stats_get, + .stats_reset = eth_kdp_stats_reset, +}; + +static struct rte_eth_dev * +eth_kdp_create(const char *name, unsigned numa_node) +{ + 
uint16_t nb_rx_queues = 1; + uint16_t nb_tx_queues = 1; + struct rte_eth_dev_data *data = NULL; + struct pmd_internals *internals = NULL; + struct rte_eth_dev *eth_dev = NULL; + + RTE_LOG(INFO, PMD, "Creating kdp ethdev on numa socket %u\n", + numa_node); + + data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node); + if (data == NULL) + goto error; + + internals = rte_zmalloc_socket(name, sizeof(*internals), 0, numa_node); + if (internals == NULL) + goto error; + + /* reserve an ethdev entry */ + eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL); + if (eth_dev == NULL) + goto error; + + data->dev_private = internals; + data->port_id = eth_dev->data->port_id; + memmove(data->name, eth_dev->data->name, sizeof(data->name)); + data->nb_rx_queues = nb_rx_queues; + data->nb_tx_queues = nb_tx_queues; + data->dev_link = pmd_link; + data->mac_addrs = &eth_addr; + + eth_dev->data = data; + eth_dev->dev_ops = &eth_kdp_ops; + eth_dev->driver = NULL; + + data->dev_flags = RTE_ETH_DEV_DETACHABLE; + data->kdrv = RTE_KDRV_NONE; + data->drv_name = drivername; + data->numa_node = numa_node; + + return eth_dev; + +error: + rte_free(data); + rte_free(internals); + + return NULL; +} + +static int +eth_kdp_devinit(const char *name, const char *params __rte_unused) +{ + struct rte_eth_dev *eth_dev = NULL; + struct pmd_internals *internals; + struct kdp_data *kdp; + struct kdp_tap_data *kdp_tap = NULL; + uint16_t port_id; + + RTE_LOG(INFO, PMD, "Initializing eth_kdp for %s\n", name); + + eth_dev = eth_kdp_create(name, rte_socket_id()); + if (eth_dev == NULL) + return -1; + + internals = eth_dev->data->dev_private; + port_id = eth_dev->data->port_id; + + kdp = kdp_init(port_id); + if (kdp == NULL) + kdp_tap = kdp_tap_init(port_id); + + if (kdp == NULL && kdp_tap == NULL) { + rte_eth_dev_release_port(eth_dev); + rte_free(internals); + + /* Not return error to prevent panic in rte_eal_init() */ + return 0; + } + + internals->kdp = kdp; + internals->kdp_tap = kdp_tap; + + if (kdp == 
NULL) { + eth_dev->rx_pkt_burst = eth_kdp_tap_rx; + eth_dev->tx_pkt_burst = eth_kdp_tap_tx; + } else { + eth_dev->rx_pkt_burst = eth_kdp_rx; + eth_dev->tx_pkt_burst = eth_kdp_tx; + } + + return 0; +} + +static int +eth_kdp_devuninit(const char *name) +{ + struct rte_eth_dev *eth_dev = NULL; + + RTE_LOG(INFO, PMD, "Un-Initializing eth_kdp for %s\n", name); + + /* find the ethdev entry */ + eth_dev = rte_eth_dev_allocated(name); + if (eth_dev == NULL) + return -1; + + eth_kdp_dev_stop(eth_dev); + + if (eth_dev->data) + rte_free(eth_dev->data->dev_private); + rte_free(eth_dev->data); + + rte_eth_dev_release_port(eth_dev); + + kdp_uninit(); + + return 0; +} + +static struct rte_driver eth_kdp_drv = { + .name = "eth_kdp", + .type = PMD_VDEV, + .init = eth_kdp_devinit, + .uninit = eth_kdp_devuninit, +}; + +PMD_REGISTER_DRIVER(eth_kdp_drv); diff --git a/drivers/net/kdp/rte_kdp.c b/drivers/net/kdp/rte_kdp.c new file mode 100644 index 0000000..ed50a0f --- /dev/null +++ b/drivers/net/kdp/rte_kdp.c @@ -0,0 +1,633 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef RTE_EXEC_ENV_LINUXAPP +#error "KDP is not supported" +#endif + +#include <sys/socket.h> +#include <linux/netlink.h> +#include <linux/rtnetlink.h> + +#include <rte_spinlock.h> +#include <rte_ethdev.h> +#include <rte_memzone.h> + +#include "rte_kdp.h" +#include "rte_kdp_fifo.h" + +#define KDP_MODULE_NAME "rte_kdp" +#define MAX_MBUF_BURST_NUM 32 + +/* Maximum number of ring entries */ +#define KDP_FIFO_COUNT_MAX 1024 +#define KDP_FIFO_SIZE (KDP_FIFO_COUNT_MAX * sizeof(void *) + \ + sizeof(struct rte_kdp_fifo)) + +#define BUFSZ 1024 +struct kdp_request { + struct nlmsghdr nlmsg; + char buf[BUFSZ]; +}; + +static int kdp_fd = -1; +static int kdp_ref_count; + +static const struct rte_memzone * +kdp_memzone_reserve(const char *name, size_t len, int socket_id, + unsigned flags) +{ + const struct rte_memzone *mz = rte_memzone_lookup(name); + + if (mz == NULL) + mz = rte_memzone_reserve(name, len, socket_id, flags); + + return mz; +} + +static int +kdp_slot_init(struct kdp_memzone_slot *slot) +{ +#define OBJNAMSIZ 32 + char obj_name[OBJNAMSIZ]; + const struct rte_memzone *mz; + + /* TX RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_tx_%d", slot->id); + mz = 
kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_tx_q = mz; + + /* RX RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_rx_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_rx_q = mz; + + /* ALLOC RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_alloc_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_alloc_q = mz; + + /* FREE RING */ + snprintf(obj_name, OBJNAMSIZ, "kdp_free_%d", slot->id); + mz = kdp_memzone_reserve(obj_name, KDP_FIFO_SIZE, SOCKET_ID_ANY, 0); + if (mz == NULL) + goto kdp_fail; + slot->m_free_q = mz; + + return 0; + +kdp_fail: + return -1; +} + +static void +kdp_ring_init(struct kdp_data *kdp) +{ + struct kdp_memzone_slot *slot = kdp->slot; + const struct rte_memzone *mz; + + /* TX RING */ + mz = slot->m_tx_q; + kdp->tx_q = mz->addr; + kdp_fifo_init(kdp->tx_q, KDP_FIFO_COUNT_MAX); + + /* RX RING */ + mz = slot->m_rx_q; + kdp->rx_q = mz->addr; + kdp_fifo_init(kdp->rx_q, KDP_FIFO_COUNT_MAX); + + /* ALLOC RING */ + mz = slot->m_alloc_q; + kdp->alloc_q = mz->addr; + kdp_fifo_init(kdp->alloc_q, KDP_FIFO_COUNT_MAX); + + /* FREE RING */ + mz = slot->m_free_q; + kdp->free_q = mz->addr; + kdp_fifo_init(kdp->free_q, KDP_FIFO_COUNT_MAX); +} + +static int +kdp_module_check(void) +{ + int fd; + + fd = open("/sys/module/" KDP_MODULE_NAME "/initstate", O_RDONLY); + if (fd < 0) + return -1; + close(fd); + + return 0; +} + +static int +rtnl_socket_open(void) +{ + struct sockaddr_nl src; + int ret; + + /* Check FD and open */ + if (kdp_fd < 0) { + kdp_fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (kdp_fd < 0) { + RTE_LOG(ERR, KDP, "socket for create failed.\n"); + return -1; + } + + memset(&src, 0, sizeof(struct sockaddr_nl)); + + src.nl_family = AF_NETLINK; + src.nl_pid = getpid(); + + ret = bind(kdp_fd, (struct sockaddr *)&src, + sizeof(struct 
sockaddr_nl)); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Bind for create failed.\n"); + close(kdp_fd); + kdp_fd = -1; + return -1; + } + } + + kdp_ref_count++; + + return 0; +} + +static void +kdp_ref_put(void) +{ + /* not initialized? */ + if (!kdp_ref_count) + return; + + kdp_ref_count--; + + /* not last one? */ + if (kdp_ref_count) + return; + + if (kdp_fd < 0) + return; + + close(kdp_fd); + kdp_fd = -1; +} + +struct kdp_data * +kdp_init(uint16_t port_id) +{ + struct kdp_memzone_slot *slot = NULL; + struct kdp_data *kdp = NULL; + int ret; + + ret = kdp_module_check(); + if (ret) + return NULL; + + ret = rtnl_socket_open(); + if (ret) + return NULL; + + slot = rte_malloc(NULL, sizeof(struct kdp_memzone_slot), 0); + if (slot == NULL) + goto kdp_fail; + slot->id = port_id; + + kdp = rte_malloc(NULL, sizeof(struct kdp_data), 0); + if (kdp == NULL) + goto kdp_fail; + kdp->slot = slot; + + ret = kdp_slot_init(slot); + if (ret < 0) + goto kdp_fail; + + kdp_ring_init(kdp); + + return kdp; + +kdp_fail: + kdp_ref_put(); + rte_free(slot); + rte_free(kdp); + RTE_LOG(ERR, KDP, "Unable to allocate memory\n"); + return NULL; +} + +static void +kdp_mbufs_allocate(struct kdp_data *kdp) +{ + int i, ret; + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pool) != + offsetof(struct rte_kdp_mbuf, pool)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_addr) != + offsetof(struct rte_kdp_mbuf, buf_addr)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, next) != + offsetof(struct rte_kdp_mbuf, next)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_off) != + offsetof(struct rte_kdp_mbuf, data_off)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, data_len) != + offsetof(struct rte_kdp_mbuf, data_len)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, pkt_len) != + offsetof(struct rte_kdp_mbuf, pkt_len)); + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, ol_flags) != + offsetof(struct rte_kdp_mbuf, ol_flags)); + + /* Check if pktmbuf pool has been configured */ 
+ if (kdp->pktmbuf_pool == NULL) { + RTE_LOG(ERR, KDP, "No valid mempool for allocating mbufs\n"); + return; + } + + for (i = 0; i < MAX_MBUF_BURST_NUM; i++) { + pkts[i] = rte_pktmbuf_alloc(kdp->pktmbuf_pool); + if (unlikely(pkts[i] == NULL)) { + /* Out of memory */ + RTE_LOG(ERR, KDP, "Out of memory\n"); + break; + } + } + + /* No pkt mbuf alocated */ + if (i <= 0) + return; + + ret = kdp_fifo_put(kdp->alloc_q, (void **)pkts, i); + + /* Check if any mbufs not put into alloc_q, and then free them */ + if (ret >= 0 && ret < i && ret < MAX_MBUF_BURST_NUM) { + int j; + + for (j = ret; j < i; j++) + rte_pktmbuf_free(pkts[j]); + } +} + +static int +attr_add(struct kdp_request *req, unsigned short type, void *buf, size_t len) +{ + struct rtattr *rta; + int nlmsg_len; + + nlmsg_len = NLMSG_ALIGN(req->nlmsg.nlmsg_len); + rta = (struct rtattr *)((char *)&req->nlmsg + nlmsg_len); + if (nlmsg_len + RTA_LENGTH(len) > sizeof(struct kdp_request)) + return -1; + rta->rta_type = type; + rta->rta_len = RTA_LENGTH(len); + memcpy(RTA_DATA(rta), buf, len); + req->nlmsg.nlmsg_len = nlmsg_len + RTA_LENGTH(len); + + return 0; +} + +static struct +rtattr *attr_nested_add(struct kdp_request *req, unsigned short type) +{ + struct rtattr *rta; + int nlmsg_len; + + nlmsg_len = NLMSG_ALIGN(req->nlmsg.nlmsg_len); + rta = (struct rtattr *)((char *)&req->nlmsg + nlmsg_len); + if (nlmsg_len + RTA_LENGTH(0) > sizeof(struct kdp_request)) + return NULL; + rta->rta_type = type; + rta->rta_len = nlmsg_len; + req->nlmsg.nlmsg_len = nlmsg_len + RTA_LENGTH(0); + + return rta; +} + +static void +attr_nested_end(struct kdp_request *req, struct rtattr *rta) +{ + rta->rta_len = req->nlmsg.nlmsg_len - rta->rta_len; +} + +static int +rtnl_create(struct rte_kdp_device_info *dev_info) +{ + struct kdp_request req; + struct ifinfomsg *info; + struct rtattr *rta1; + struct rtattr *rta2; + char name[RTE_KDP_NAMESIZE]; + char type[RTE_KDP_NAMESIZE]; + struct iovec iov; + struct msghdr msg; + struct sockaddr_nl nladdr; 
+ int ret; + char buf[BUFSZ]; + + memset(&req, 0, sizeof(struct kdp_request)); + + req.nlmsg.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nlmsg.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL; + req.nlmsg.nlmsg_flags |= NLM_F_ACK; + req.nlmsg.nlmsg_type = RTM_NEWLINK; + + info = NLMSG_DATA(&req.nlmsg); + + info->ifi_family = AF_UNSPEC; + info->ifi_index = 0; + + snprintf(name, RTE_KDP_NAMESIZE, "%s", dev_info->name); + ret = attr_add(&req, IFLA_IFNAME, name, strlen(name) + 1); + if (ret < 0) + return -1; + + rta1 = attr_nested_add(&req, IFLA_LINKINFO); + if (rta1 == NULL) + return -1; + + snprintf(type, RTE_KDP_NAMESIZE, KDP_DEVICE); + ret = attr_add(&req, IFLA_INFO_KIND, type, strlen(type) + 1); + if (ret < 0) + return -1; + + rta2 = attr_nested_add(&req, IFLA_INFO_DATA); + if (rta2 == NULL) + return -1; + + ret = attr_add(&req, IFLA_KDP_PORTID, &dev_info->port_id, + sizeof(uint8_t)); + if (ret < 0) + return -1; + + ret = attr_add(&req, IFLA_KDP_DEVINFO, dev_info, + sizeof(struct rte_kdp_device_info)); + if (ret < 0) + return -1; + + attr_nested_end(&req, rta2); + attr_nested_end(&req, rta1); + + memset(&nladdr, 0, sizeof(nladdr)); + nladdr.nl_family = AF_NETLINK; + + iov.iov_base = (void *)&req.nlmsg; + iov.iov_len = req.nlmsg.nlmsg_len; + + memset(&msg, 0, sizeof(struct msghdr)); + msg.msg_name = &nladdr; + msg.msg_namelen = sizeof(nladdr); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + ret = sendmsg(kdp_fd, &msg, 0); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Send for create failed %d.\n", errno); + return -1; + } + + memset(buf, 0, sizeof(buf)); + iov.iov_base = buf; + iov.iov_len = sizeof(buf); + + ret = recvmsg(kdp_fd, &msg, 0); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Recv for create failed.\n"); + return -1; + } + + return 0; +} + +int +kdp_start(struct kdp_data *kdp, struct rte_mempool *pktmbuf_pool, + const struct kdp_conf *conf) +{ + struct kdp_memzone_slot *slot = kdp->slot; + struct rte_kdp_device_info dev_info; + char 
mz_name[RTE_MEMZONE_NAMESIZE]; + const struct rte_memzone *mz; + int ret; + + if (!kdp || !pktmbuf_pool || !conf || !conf->name[0]) + return -1; + + snprintf(kdp->name, RTE_KDP_NAMESIZE, "%s", conf->name); + kdp->pktmbuf_pool = pktmbuf_pool; + kdp->port_id = conf->port_id; + + memset(&dev_info, 0, sizeof(dev_info)); + dev_info.core_id = conf->core_id; + dev_info.force_bind = conf->force_bind; + dev_info.port_id = conf->port_id; + dev_info.mbuf_size = conf->mbuf_size; + snprintf(dev_info.name, RTE_KDP_NAMESIZE, "%s", conf->name); + + dev_info.tx_phys = slot->m_tx_q->phys_addr; + dev_info.rx_phys = slot->m_rx_q->phys_addr; + dev_info.alloc_phys = slot->m_alloc_q->phys_addr; + dev_info.free_phys = slot->m_free_q->phys_addr; + + /* MBUF mempool */ + snprintf(mz_name, sizeof(mz_name), RTE_MEMPOOL_OBJ_NAME, + pktmbuf_pool->name); + mz = rte_memzone_lookup(mz_name); + if (mz == NULL) + goto kdp_fail; + dev_info.mbuf_va = mz->addr; + dev_info.mbuf_phys = mz->phys_addr; + + ret = rtnl_create(&dev_info); + if (ret < 0) + goto kdp_fail; + + kdp->in_use = 1; + + /* Allocate mbufs and then put them into alloc_q */ + kdp_mbufs_allocate(kdp); + + return 0; + +kdp_fail: + return -1; +} + +static void +kdp_mbufs_free(struct kdp_data *kdp) +{ + int i, ret; + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + + ret = kdp_fifo_get(kdp->free_q, (void **)pkts, MAX_MBUF_BURST_NUM); + if (likely(ret > 0)) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } +} + +unsigned +kdp_tx_burst(struct kdp_data *kdp, struct rte_mbuf **mbufs, unsigned num) +{ + unsigned ret = kdp_fifo_put(kdp->rx_q, (void **)mbufs, num); + + /* Get mbufs from free_q and then free them */ + kdp_mbufs_free(kdp); + + return ret; +} + +unsigned +kdp_rx_burst(struct kdp_data *kdp, struct rte_mbuf **mbufs, unsigned num) +{ + unsigned ret = kdp_fifo_get(kdp->tx_q, (void **)mbufs, num); + + /* If buffers removed, allocate mbufs and then put them into alloc_q */ + if (ret) + kdp_mbufs_allocate(kdp); + + return ret; +} + 
+static void +kdp_fifo_free(struct rte_kdp_fifo *fifo) +{ + int ret; + struct rte_mbuf *pkt; + + do { + ret = kdp_fifo_get(fifo, (void **)&pkt, 1); + if (ret) + rte_pktmbuf_free(pkt); + } while (ret); +} + +static int +rtnl_destroy(struct kdp_data *kdp) +{ + struct kdp_request req; + struct ifinfomsg *info; + struct iovec iov; + struct msghdr msg; + struct sockaddr_nl nladdr; + int ret; + + memset(&req, 0, sizeof(struct kdp_request)); + + req.nlmsg.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nlmsg.nlmsg_flags = NLM_F_REQUEST; + req.nlmsg.nlmsg_type = RTM_DELLINK; + + info = NLMSG_DATA(&req.nlmsg); + + info->ifi_family = AF_UNSPEC; + info->ifi_index = 0; + + ret = attr_add(&req, IFLA_IFNAME, kdp->name, strlen(kdp->name) + 1); + if (ret < 0) + return -1; + + memset(&nladdr, 0, sizeof(nladdr)); + nladdr.nl_family = AF_NETLINK; + + iov.iov_base = (void *)&req.nlmsg; + iov.iov_len = req.nlmsg.nlmsg_len; + + memset(&msg, 0, sizeof(struct msghdr)); + msg.msg_name = &nladdr; + msg.msg_namelen = sizeof(nladdr); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + ret = sendmsg(kdp_fd, &msg, 0); + if (ret < 0) { + RTE_LOG(ERR, KDP, "Send for destroy failed.\n"); + return -1; + } + return 0; +} + +int +kdp_stop(struct kdp_data *kdp) +{ + struct rte_mbuf *pkts[MAX_MBUF_BURST_NUM]; + int ret; + int i; + + if (!kdp || !kdp->in_use) + return -1; + + rtnl_destroy(kdp); + + do { + ret = kdp_fifo_get(kdp->free_q, (void **)pkts, + MAX_MBUF_BURST_NUM); + if (ret > 0) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } + } while (ret > 0); + + do { + ret = kdp_fifo_get(kdp->alloc_q, (void **)pkts, + MAX_MBUF_BURST_NUM); + if (ret > 0) { + for (i = 0; i < ret; i++) + rte_pktmbuf_free(pkts[i]); + } + } while (ret > 0); + return 0; +} + +void +kdp_close(struct kdp_data *kdp) +{ + /* mbufs in all fifo should be released, except request/response */ + kdp_fifo_free(kdp->tx_q); + kdp_fifo_free(kdp->rx_q); + kdp_fifo_free(kdp->alloc_q); + kdp_fifo_free(kdp->free_q); + + 
rte_free(kdp->slot); + + /* Memset the KDP struct */ + memset(kdp, 0, sizeof(struct kdp_data)); +} + +void +kdp_uninit(void) +{ + kdp_ref_put(); +} diff --git a/drivers/net/kdp/rte_kdp.h b/drivers/net/kdp/rte_kdp.h new file mode 100644 index 0000000..20ad93d --- /dev/null +++ b/drivers/net/kdp/rte_kdp.h @@ -0,0 +1,116 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef _RTE_KDP_H_ +#define _RTE_KDP_H_ + +#include <fcntl.h> +#include <unistd.h> + +#include <sys/ioctl.h> + +#include <rte_malloc.h> +#include <rte_mbuf.h> + +#include <exec-env/rte_kdp_common.h> + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * KDP memzone pool slot + */ +struct kdp_memzone_slot { + uint32_t id; + + /* Memzones */ + const struct rte_memzone *m_tx_q; /**< TX queue */ + const struct rte_memzone *m_rx_q; /**< RX queue */ + const struct rte_memzone *m_alloc_q; /**< Allocated mbufs queue */ + const struct rte_memzone *m_free_q; /**< To be freed mbufs queue */ +}; + +/** + * KDP context + */ +struct kdp_data { + char name[RTE_KDP_NAMESIZE]; /**< KDP interface name */ + struct rte_mempool *pktmbuf_pool; /**< pkt mbuf mempool */ + struct kdp_memzone_slot *slot; + uint16_t port_id; /**< Group ID of KDP devices */ + + struct rte_kdp_fifo *tx_q; /**< TX queue */ + struct rte_kdp_fifo *rx_q; /**< RX queue */ + struct rte_kdp_fifo *alloc_q; /**< Allocated mbufs queue */ + struct rte_kdp_fifo *free_q; /**< To be freed mbufs queue */ + + uint8_t in_use; /**< kdp in use */ +}; + +struct kdp_tap_data { + char name[RTE_KDP_NAMESIZE]; + int tap_fd; +}; + +/** + * Structure for configuring KDP device. 
+ */ +struct kdp_conf { + char name[RTE_KDP_NAMESIZE]; + uint32_t core_id; /* Core ID to bind kernel thread on */ + uint16_t port_id; + unsigned mbuf_size; + + uint8_t force_bind; /* Flag to bind kernel thread */ +}; + +struct kdp_data *kdp_init(uint16_t port_id); +int kdp_start(struct kdp_data *kdp, struct rte_mempool *pktmbuf_pool, + const struct kdp_conf *conf); +unsigned kdp_rx_burst(struct kdp_data *kdp, + struct rte_mbuf **mbufs, unsigned num); +unsigned kdp_tx_burst(struct kdp_data *kdp, + struct rte_mbuf **mbufs, unsigned num); +int kdp_stop(struct kdp_data *kdp); +void kdp_close(struct kdp_data *kdp); +void kdp_uninit(void); + +struct kdp_tap_data *kdp_tap_init(uint16_t port_id); +void kdp_tap_close(struct kdp_tap_data *kdp_tap); + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_KDP_H_ */ diff --git a/drivers/net/kdp/rte_kdp_fifo.h b/drivers/net/kdp/rte_kdp_fifo.h new file mode 100644 index 0000000..1a7e063 --- /dev/null +++ b/drivers/net/kdp/rte_kdp_fifo.h @@ -0,0 +1,91 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +/** + * Initializes the kdp fifo structure + */ +static void +kdp_fifo_init(struct rte_kdp_fifo *fifo, unsigned size) +{ + /* Ensure size is power of 2 */ + if (size & (size - 1)) + rte_panic("KDP fifo size must be power of 2\n"); + + fifo->write = 0; + fifo->read = 0; + fifo->len = size; + fifo->elem_size = sizeof(void *); +} + +/** + * Adds num elements into the fifo. Return the number actually written + */ +static inline unsigned +kdp_fifo_put(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned fifo_write = fifo->write; + unsigned fifo_read = fifo->read; + unsigned new_write = fifo_write; + + for (i = 0; i < num; i++) { + new_write = (new_write + 1) & (fifo->len - 1); + + if (new_write == fifo_read) + break; + fifo->buffer[fifo_write] = data[i]; + fifo_write = new_write; + } + fifo->write = fifo_write; + return i; +} + +/** + * Get up to num elements from the fifo. 
Return the number actually read + */ +static inline unsigned +kdp_fifo_get(struct rte_kdp_fifo *fifo, void **data, unsigned num) +{ + unsigned i = 0; + unsigned new_read = fifo->read; + unsigned fifo_write = fifo->write; + for (i = 0; i < num; i++) { + if (new_read == fifo_write) + break; + + data[i] = fifo->buffer[new_read]; + new_read = (new_read + 1) & (fifo->len - 1); + } + fifo->read = new_read; + return i; +} diff --git a/drivers/net/kdp/rte_kdp_tap.c b/drivers/net/kdp/rte_kdp_tap.c new file mode 100644 index 0000000..12f3ad2 --- /dev/null +++ b/drivers/net/kdp/rte_kdp_tap.c @@ -0,0 +1,101 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2016 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include <string.h> + +#include <sys/socket.h> +#include <linux/if.h> +#include <linux/if_tun.h> + +#include "rte_kdp.h" + +static int +tap_create(char *name) +{ + struct ifreq ifr; + int fd, ret; + + fd = open("/dev/net/tun", O_RDWR); + if (fd < 0) + return fd; + + memset(&ifr, 0, sizeof(ifr)); + + /* TAP device without packet information */ + ifr.ifr_flags = IFF_TAP | IFF_NO_PI; + + if (name && *name) + snprintf(ifr.ifr_name, IFNAMSIZ, "%s", name); + + ret = ioctl(fd, TUNSETIFF, (void *)&ifr); + if (ret < 0) { + close(fd); + return ret; + } + + if (name) + snprintf(name, IFNAMSIZ, "%s", ifr.ifr_name); + + return fd; +} + +struct kdp_tap_data * +kdp_tap_init(uint16_t port_id) +{ + struct kdp_tap_data *kdp_tap = NULL; + int flags; + + kdp_tap = rte_malloc(NULL, sizeof(struct kdp_tap_data), 0); + if (kdp_tap == NULL) + goto error; + + snprintf(kdp_tap->name, IFNAMSIZ, "tap_kdp%u", port_id); + kdp_tap->tap_fd = tap_create(kdp_tap->name); + if (kdp_tap->tap_fd < 0) + goto error; + + flags = fcntl(kdp_tap->tap_fd, F_GETFL, 0); + fcntl(kdp_tap->tap_fd, F_SETFL, flags | O_NONBLOCK); + + return kdp_tap; + +error: + rte_free(kdp_tap); + return NULL; +} + +void +kdp_tap_close(struct kdp_tap_data *kdp_tap) +{ + close(kdp_tap->tap_fd); +} diff --git a/drivers/net/kdp/rte_pmd_kdp_version.map b/drivers/net/kdp/rte_pmd_kdp_version.map new file mode 100644 index 0000000..349c6e1 --- /dev/null +++ b/drivers/net/kdp/rte_pmd_kdp_version.map @@ 
-0,0 +1,4 @@ +DPDK_16.04 { + + local: *; +}; diff --git a/lib/librte_eal/common/include/rte_log.h b/lib/librte_eal/common/include/rte_log.h index 2e47e7f..5a0048b 100644 --- a/lib/librte_eal/common/include/rte_log.h +++ b/lib/librte_eal/common/include/rte_log.h @@ -1,7 +1,7 @@ /*- * BSD LICENSE * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * Copyright(c) 2010-2016 Intel Corporation. All rights reserved. * All rights reserved. * * Redistribution and use in source and binary forms, with or without @@ -79,6 +79,7 @@ extern struct rte_logs rte_logs; #define RTE_LOGTYPE_PIPELINE 0x00008000 /**< Log related to pipeline. */ #define RTE_LOGTYPE_MBUF 0x00010000 /**< Log related to mbuf. */ #define RTE_LOGTYPE_CRYPTODEV 0x00020000 /**< Log related to cryptodev. */ +#define RTE_LOGTYPE_KDP 0x00080000 /**< Log related to KDP. */ /* these log types can be used in an application */ #define RTE_LOGTYPE_USER1 0x01000000 /**< User-defined log type 1. */ diff --git a/mk/rte.app.mk b/mk/rte.app.mk index daac09f..cdce5e9 100644 --- a/mk/rte.app.mk +++ b/mk/rte.app.mk @@ -1,6 +1,6 @@ # BSD LICENSE # -# Copyright(c) 2010-2015 Intel Corporation. All rights reserved. +# Copyright(c) 2010-2016 Intel Corporation. All rights reserved. # Copyright(c) 2014-2015 6WIND S.A. # All rights reserved. # @@ -145,6 +145,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_PCAP) += -lrte_pmd_pcap _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AF_PACKET) += -lrte_pmd_af_packet _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_NULL) += -lrte_pmd_null _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT) += -lrte_pmd_qat +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KDP) += -lrte_pmd_kdp # AESNI MULTI BUFFER is dependent on the IPSec_MB library _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB) += -lrte_pmd_aesni_mb -- 2.5.0 ^ permalink raw reply [flat|nested] 29+ messages in thread
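Editor's note on the FIFO in rte_kdp_fifo.h above: the put/get pair is a ring
with a power-of-2 length, where the index wraps with a mask and one slot is
always left empty so that "full" and "empty" can be told apart. Below is a
minimal standalone sketch of that technique, not the patch code itself. The
names demo_fifo, demo_fifo_init, demo_fifo_put, demo_fifo_get and
DEMO_FIFO_LEN are hypothetical, the buffer is a fixed array instead of
memzone-backed storage, and init returns an error where the patch calls
rte_panic().

```c
#include <stddef.h>

#define DEMO_FIFO_LEN 8	/* hypothetical capacity, must be a power of 2 */

/* Hypothetical stand-in for struct rte_kdp_fifo. */
struct demo_fifo {
	unsigned write;			/* next slot to write */
	unsigned read;			/* next slot to read */
	unsigned len;			/* ring length, power of 2 */
	void *buffer[DEMO_FIFO_LEN];	/* pointer slots */
};

/* Returns 0 on success, -1 if size is not a usable power of 2. */
static int
demo_fifo_init(struct demo_fifo *fifo, unsigned size)
{
	if (size == 0 || (size & (size - 1)) || size > DEMO_FIFO_LEN)
		return -1;

	fifo->write = 0;
	fifo->read = 0;
	fifo->len = size;
	return 0;
}

/* Adds up to num elements, returns the number actually written. */
static unsigned
demo_fifo_put(struct demo_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned fifo_write = fifo->write;
	unsigned fifo_read = fifo->read;

	for (i = 0; i < num; i++) {
		/* Mask instead of modulo: valid because len is 2^n. */
		unsigned new_write = (fifo_write + 1) & (fifo->len - 1);

		/* One slot stays empty, so usable capacity is len - 1. */
		if (new_write == fifo_read)
			break;
		fifo->buffer[fifo_write] = data[i];
		fifo_write = new_write;
	}
	fifo->write = fifo_write;
	return i;
}

/* Gets up to num elements, returns the number actually read. */
static unsigned
demo_fifo_get(struct demo_fifo *fifo, void **data, unsigned num)
{
	unsigned i;
	unsigned new_read = fifo->read;
	unsigned fifo_write = fifo->write;

	for (i = 0; i < num; i++) {
		if (new_read == fifo_write)
			break;	/* ring is empty */
		data[i] = fifo->buffer[new_read];
		new_read = (new_read + 1) & (fifo->len - 1);
	}
	fifo->read = new_read;
	return i;
}
```

With a length of 8, a put of 8 pointers stores 7 and reports 7. Like the
posted header, this sketch uses plain loads and stores with no memory
barriers, so it only illustrates the indexing and should not be read as a
statement about what is safe between userspace and the kernel.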
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
  2016-03-09 11:17   ` [dpdk-dev] [PATCH v3 1/2] kdp: add kernel data path kernel module Ferruh Yigit
  2016-03-09 11:17   ` [dpdk-dev] [PATCH v3 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
@ 2016-03-14 15:32   ` Ferruh Yigit
  2016-03-16 7:26     ` Panu Matilainen
  2 siblings, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2016-03-14 15:32 UTC (permalink / raw)
  To: dev; +Cc: David Marchand, Helin Zhang

On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
> This patch sent to keep record of latest status of the work.
>
>
> This is slow data path communication implementation based on existing KNI.
>
> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
> same except all control path functionality removed and some simplification done.
>
> Motivation is to simplify slow path data communication.
> Now any application can use this new PMD to send/get data to Linux kernel.
>
> PMD supports two communication methods:
>
> 1) KDP kernel module
> PMD initialization functions handles creating virtual interfaces (with help of
> kdp kernel module) and created FIFO. FIFO is used to share data between
> userspace and kernelspace. This is default method.
>
> 2) tun/tap module
> When KDP module is not inserted, PMD creates tap interface and transfers
> packets using tap interface.
>
> In long term this patch intends to replace the KNI and KNI will be
> depreciated.
>

Self-NACK: Will work on another option that does not introduce new
kernel module.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-14 15:32 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
@ 2016-03-16 7:26   ` Panu Matilainen
  2016-03-16 8:19     ` Ferruh Yigit
  0 siblings, 1 reply; 29+ messages in thread
From: Panu Matilainen @ 2016-03-16 7:26 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: David Marchand, Helin Zhang

On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
>> This patch sent to keep record of latest status of the work.
>>
>>
>> This is slow data path communication implementation based on existing KNI.
>>
>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
>> same except all control path functionality removed and some simplification done.
>>
>> Motivation is to simplify slow path data communication.
>> Now any application can use this new PMD to send/get data to Linux kernel.
>>
>> PMD supports two communication methods:
>>
>> 1) KDP kernel module
>> PMD initialization functions handles creating virtual interfaces (with help of
>> kdp kernel module) and created FIFO. FIFO is used to share data between
>> userspace and kernelspace. This is default method.
>>
>> 2) tun/tap module
>> When KDP module is not inserted, PMD creates tap interface and transfers
>> packets using tap interface.
>>
>> In long term this patch intends to replace the KNI and KNI will be
>> depreciated.
>>
>
> Self-NACK: Will work on another option that does not introduce new
> kernel module.
>

Hmm, care to elaborate a bit? The second mode of this PMD already was
free of external kernel modules. Do you mean you'll be just removing
mode 1) from the PMD or looking at something completely different?

Just thinking that tun/tap PMD sounds like a useful thing to have, I
hope you're not abandoning that.

	- Panu -

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 7:26 ` Panu Matilainen
@ 2016-03-16 8:19   ` Ferruh Yigit
  2016-03-16 8:22     ` Panu Matilainen
  0 siblings, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2016-03-16 8:19 UTC (permalink / raw)
  To: Panu Matilainen, dev; +Cc: David Marchand, Helin Zhang

On 3/16/2016 7:26 AM, Panu Matilainen wrote:
> On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
>>> This patch sent to keep record of latest status of the work.
>>>
>>>
>>> This is slow data path communication implementation based on existing KNI.
>>>
>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
>>> same except all control path functionality removed and some simplification done.
>>>
>>> Motivation is to simplify slow path data communication.
>>> Now any application can use this new PMD to send/get data to Linux kernel.
>>>
>>> PMD supports two communication methods:
>>>
>>> 1) KDP kernel module
>>> PMD initialization functions handles creating virtual interfaces (with help of
>>> kdp kernel module) and created FIFO. FIFO is used to share data between
>>> userspace and kernelspace. This is default method.
>>>
>>> 2) tun/tap module
>>> When KDP module is not inserted, PMD creates tap interface and transfers
>>> packets using tap interface.
>>>
>>> In long term this patch intends to replace the KNI and KNI will be
>>> depreciated.
>>>
>>
>> Self-NACK: Will work on another option that does not introduce new
>> kernel module.
>>
>
> Hmm, care to elaborate a bit? The second mode of this PMD already was
> free of external kernel modules. Do you mean you'll be just removing
> mode 1) from the PMD or looking at something completely different?
>
> Just thinking that tun/tap PMD sounds like a useful thing to have, I
> hope you're not abandoning that.
>

It will be KNI PMD.
Plan is to have something like KDP, but with existing KNI kernel module.
There will be tun/tap support as fallback.

Regards,
ferruh

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 8:19 ` Ferruh Yigit
@ 2016-03-16 8:22   ` Panu Matilainen
  2016-03-16 10:26     ` Ferruh Yigit
  2016-03-16 11:07     ` Bruce Richardson
  0 siblings, 2 replies; 29+ messages in thread
From: Panu Matilainen @ 2016-03-16 8:22 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: David Marchand, Helin Zhang

On 03/16/2016 10:19 AM, Ferruh Yigit wrote:
> On 3/16/2016 7:26 AM, Panu Matilainen wrote:
>> On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
>>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
>>>> This patch sent to keep record of latest status of the work.
>>>>
>>>>
>>>> This is slow data path communication implementation based on existing KNI.
>>>>
>>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
>>>> same except all control path functionality removed and some simplification done.
>>>>
>>>> Motivation is to simplify slow path data communication.
>>>> Now any application can use this new PMD to send/get data to Linux kernel.
>>>>
>>>> PMD supports two communication methods:
>>>>
>>>> 1) KDP kernel module
>>>> PMD initialization functions handles creating virtual interfaces (with help of
>>>> kdp kernel module) and created FIFO. FIFO is used to share data between
>>>> userspace and kernelspace. This is default method.
>>>>
>>>> 2) tun/tap module
>>>> When KDP module is not inserted, PMD creates tap interface and transfers
>>>> packets using tap interface.
>>>>
>>>> In long term this patch intends to replace the KNI and KNI will be
>>>> depreciated.
>>>>
>>>
>>> Self-NACK: Will work on another option that does not introduce new
>>> kernel module.
>>>
>>
>> Hmm, care to elaborate a bit? The second mode of this PMD already was
>> free of external kernel modules. Do you mean you'll be just removing
>> mode 1) from the PMD or looking at something completely different?
>>
>> Just thinking that tun/tap PMD sounds like a useful thing to have, I
>> hope you're not abandoning that.
>>
>
> It will be KNI PMD.
> Plan is to have something like KDP, but with existing KNI kernel module.
> There will be tun/tap support as fallback.

Hum, now I'm confused. I was under the impression everybody hated KNI
and wanted to get rid of it, and certainly not build future solutions on
top of it?

	- Panu -

>
> Regards,
> ferruh
>

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 8:22 ` Panu Matilainen
@ 2016-03-16 10:26   ` Ferruh Yigit
  2016-03-16 10:45     ` Thomas Monjalon
  2016-03-16 13:15     ` Panu Matilainen
  2016-03-16 11:07   ` Bruce Richardson
  1 sibling, 2 replies; 29+ messages in thread
From: Ferruh Yigit @ 2016-03-16 10:26 UTC (permalink / raw)
  To: Panu Matilainen, dev; +Cc: David Marchand, Helin Zhang

On 3/16/2016 8:22 AM, Panu Matilainen wrote:
> On 03/16/2016 10:19 AM, Ferruh Yigit wrote:
>> On 3/16/2016 7:26 AM, Panu Matilainen wrote:
>>> On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
>>>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
>>>>> This patch sent to keep record of latest status of the work.
>>>>>
>>>>>
>>>>> This is slow data path communication implementation based on existing KNI.
>>>>>
>>>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
>>>>> same except all control path functionality removed and some simplification done.
>>>>>
>>>>> Motivation is to simplify slow path data communication.
>>>>> Now any application can use this new PMD to send/get data to Linux kernel.
>>>>>
>>>>> PMD supports two communication methods:
>>>>>
>>>>> 1) KDP kernel module
>>>>> PMD initialization functions handles creating virtual interfaces (with help of
>>>>> kdp kernel module) and created FIFO. FIFO is used to share data between
>>>>> userspace and kernelspace. This is default method.
>>>>>
>>>>> 2) tun/tap module
>>>>> When KDP module is not inserted, PMD creates tap interface and transfers
>>>>> packets using tap interface.
>>>>>
>>>>> In long term this patch intends to replace the KNI and KNI will be
>>>>> depreciated.
>>>>>
>>>>
>>>> Self-NACK: Will work on another option that does not introduce new
>>>> kernel module.
>>>>
>>>
>>> Hmm, care to elaborate a bit? The second mode of this PMD already was
>>> free of external kernel modules. Do you mean you'll be just removing
>>> mode 1) from the PMD or looking at something completely different?
>>>
>>> Just thinking that tun/tap PMD sounds like a useful thing to have, I
>>> hope you're not abandoning that.
>>>
>>
>> It will be KNI PMD.
>> Plan is to have something like KDP, but with existing KNI kernel module.
>> There will be tun/tap support as fallback.
>
> Hum, now I'm confused. I was under the impression everybody hated KNI
> and wanted to get rid of it, and certainly not build future solutions on
> top of it?
>

We can't remove it.
We can't replace/improve it -you were one of the major opposition to this.
This doesn't leave more option other than using it.

There won't be any update in KNI kernel module, library + sample app
will be converted into PMD.

Regards,
ferruh

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 10:26 ` Ferruh Yigit
@ 2016-03-16 10:45   ` Thomas Monjalon
  2016-03-16 11:07     ` Mcnamara, John
  2016-03-16 11:13     ` Ferruh Yigit
  2016-03-16 13:15   ` Panu Matilainen
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Monjalon @ 2016-03-16 10:45 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Panu Matilainen, David Marchand, Helin Zhang

2016-03-16 10:26, Ferruh Yigit:
> On 3/16/2016 8:22 AM, Panu Matilainen wrote:
> > On 03/16/2016 10:19 AM, Ferruh Yigit wrote:
> >> On 3/16/2016 7:26 AM, Panu Matilainen wrote:
> >>> On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
> >>>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
> >>>>> This patch sent to keep record of latest status of the work.
> >>>>>
> >>>>>
> >>>>> This is slow data path communication implementation based on existing KNI.
> >>>>>
> >>>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
> >>>>> same except all control path functionality removed and some simplification done.
> >>>>>
> >>>>> Motivation is to simplify slow path data communication.
> >>>>> Now any application can use this new PMD to send/get data to Linux kernel.
> >>>>>
> >>>>> PMD supports two communication methods:
> >>>>>
> >>>>> 1) KDP kernel module
> >>>>> PMD initialization functions handles creating virtual interfaces (with help of
> >>>>> kdp kernel module) and created FIFO. FIFO is used to share data between
> >>>>> userspace and kernelspace. This is default method.
> >>>>>
> >>>>> 2) tun/tap module
> >>>>> When KDP module is not inserted, PMD creates tap interface and transfers
> >>>>> packets using tap interface.
> >>>>>
> >>>>> In long term this patch intends to replace the KNI and KNI will be
> >>>>> depreciated.
> >>>>>
> >>>>
> >>>> Self-NACK: Will work on another option that does not introduce new
> >>>> kernel module.
> >>>>
> >>>
> >>> Hmm, care to elaborate a bit? The second mode of this PMD already was
> >>> free of external kernel modules. Do you mean you'll be just removing
> >>> mode 1) from the PMD or looking at something completely different?
> >>>
> >>> Just thinking that tun/tap PMD sounds like a useful thing to have, I
> >>> hope you're not abandoning that.
> >>>
> >>
> >> It will be KNI PMD.
> >> Plan is to have something like KDP, but with existing KNI kernel module.
> >> There will be tun/tap support as fallback.
> >
> > Hum, now I'm confused. I was under the impression everybody hated KNI
> > and wanted to get rid of it, and certainly not build future solutions on
> > top of it?
>
> We can't remove it.

Why?

> We can't replace/improve it -you were one of the major opposition to this.
> This doesn't leave more option other than using it.

Why cannot we replace it by something upstream?

> There won't be any update in KNI kernel module, library + sample app
> will be converted into PMD.

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 10:45 ` Thomas Monjalon
@ 2016-03-16 11:07   ` Mcnamara, John
  2016-03-16 11:13   ` Ferruh Yigit
  1 sibling, 0 replies; 29+ messages in thread
From: Mcnamara, John @ 2016-03-16 11:07 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh
  Cc: dev, Panu Matilainen, David Marchand, Zhang, Helin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Wednesday, March 16, 2016 10:46 AM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Panu Matilainen <pmatilai@redhat.com>; David
> Marchand <david.marchand@6wind.com>; Zhang, Helin
> <helin.zhang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication
> between DPDK port and Linux
>
> >
> > We can't remove it.
>
> Why?

There are a lot of people using KNI.

> > We can't replace/improve it -you were one of the major opposition to this.
> > This doesn't leave more option other than using it.
>
> Why cannot we replace it by something upstream?

In theory it could be upstreamed. Let's see how we get on with upstreaming
the KCP component first.

John

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 10:45 ` Thomas Monjalon
  2016-03-16 11:07   ` Mcnamara, John
@ 2016-03-16 11:13   ` Ferruh Yigit
  2016-03-16 13:23     ` Panu Matilainen
  1 sibling, 1 reply; 29+ messages in thread
From: Ferruh Yigit @ 2016-03-16 11:13 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, Panu Matilainen, David Marchand, Helin Zhang

On 3/16/2016 10:45 AM, Thomas Monjalon wrote:
> 2016-03-16 10:26, Ferruh Yigit:
>> On 3/16/2016 8:22 AM, Panu Matilainen wrote:
>>> On 03/16/2016 10:19 AM, Ferruh Yigit wrote:
>>>> On 3/16/2016 7:26 AM, Panu Matilainen wrote:
>>>>> On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
>>>>>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
>>>>>>> This patch sent to keep record of latest status of the work.
>>>>>>>
>>>>>>>
>>>>>>> This is slow data path communication implementation based on existing KNI.
>>>>>>>
>>>>>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
>>>>>>> same except all control path functionality removed and some simplification done.
>>>>>>>
>>>>>>> Motivation is to simplify slow path data communication.
>>>>>>> Now any application can use this new PMD to send/get data to Linux kernel.
>>>>>>>
>>>>>>> PMD supports two communication methods:
>>>>>>>
>>>>>>> 1) KDP kernel module
>>>>>>> PMD initialization functions handles creating virtual interfaces (with help of
>>>>>>> kdp kernel module) and created FIFO. FIFO is used to share data between
>>>>>>> userspace and kernelspace. This is default method.
>>>>>>>
>>>>>>> 2) tun/tap module
>>>>>>> When KDP module is not inserted, PMD creates tap interface and transfers
>>>>>>> packets using tap interface.
>>>>>>>
>>>>>>> In long term this patch intends to replace the KNI and KNI will be
>>>>>>> depreciated.
>>>>>>>
>>>>>>
>>>>>> Self-NACK: Will work on another option that does not introduce new
>>>>>> kernel module.
>>>>>>
>>>>>
>>>>> Hmm, care to elaborate a bit? The second mode of this PMD already was
>>>>> free of external kernel modules. Do you mean you'll be just removing
>>>>> mode 1) from the PMD or looking at something completely different?
>>>>>
>>>>> Just thinking that tun/tap PMD sounds like a useful thing to have, I
>>>>> hope you're not abandoning that.
>>>>>
>>>>
>>>> It will be KNI PMD.
>>>> Plan is to have something like KDP, but with existing KNI kernel module.
>>>> There will be tun/tap support as fallback.
>>>
>>> Hum, now I'm confused. I was under the impression everybody hated KNI
>>> and wanted to get rid of it, and certainly not build future solutions on
>>> top of it?
>>
>> We can't remove it.
>
> Why?
>
>> We can't replace/improve it -you were one of the major opposition to this.
>> This doesn't leave more option other than using it.
>
> Why cannot we replace it by something upstream?
>

I doubt KDP is upstream-able to Linux community. If somebody can, that
is great.

Even for KCP, upstreaming task is still under discussion, and as a heads
up, it is likely to be dropped.

Regards,
ferruh

>> There won't be any update in KNI kernel module, library + sample app
>> will be converted into PMD.
>
>

^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux 2016-03-16 11:13 ` Ferruh Yigit @ 2016-03-16 13:23 ` Panu Matilainen 0 siblings, 0 replies; 29+ messages in thread From: Panu Matilainen @ 2016-03-16 13:23 UTC (permalink / raw) To: Ferruh Yigit, Thomas Monjalon; +Cc: dev, David Marchand, Helin Zhang On 03/16/2016 01:13 PM, Ferruh Yigit wrote: > On 3/16/2016 10:45 AM, Thomas Monjalon wrote: >> 2016-03-16 10:26, Ferruh Yigit: >>> On 3/16/2016 8:22 AM, Panu Matilainen wrote: >>>> On 03/16/2016 10:19 AM, Ferruh Yigit wrote: >>>>> On 3/16/2016 7:26 AM, Panu Matilainen wrote: >>>>>> On 03/14/2016 05:32 PM, Ferruh Yigit wrote: >>>>>>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote: >>>>>>>> This patch sent to keep record of latest status of the work. >>>>>>>> >>>>>>>> >>>>>>>> This is slow data path communication implementation based on existing KNI. >>>>>>>> >>>>>>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost >>>>>>>> same except all control path functionality removed and some simplification done. >>>>>>>> >>>>>>>> Motivation is to simplify slow path data communication. >>>>>>>> Now any application can use this new PMD to send/get data to Linux kernel. >>>>>>>> >>>>>>>> PMD supports two communication methods: >>>>>>>> >>>>>>>> 1) KDP kernel module >>>>>>>> PMD initialization functions handles creating virtual interfaces (with help of >>>>>>>> kdp kernel module) and created FIFO. FIFO is used to share data between >>>>>>>> userspace and kernelspace. This is default method. >>>>>>>> >>>>>>>> 2) tun/tap module >>>>>>>> When KDP module is not inserted, PMD creates tap interface and transfers >>>>>>>> packets using tap interface. >>>>>>>> >>>>>>>> In long term this patch intends to replace the KNI and KNI will be >>>>>>>> depreciated. >>>>>>>> >>>>>>> >>>>>>> Self-NACK: Will work on another option that does not introduce new >>>>>>> kernel module. >>>>>>> >>>>>> >>>>>> Hmm, care to elaborate a bit? 
>>>>>> The second mode of this PMD already was
>>>>>> free of external kernel modules. Do you mean you'll be just removing
>>>>>> mode 1) from the PMD or looking at something completely different?
>>>>>>
>>>>>> Just thinking that tun/tap PMD sounds like a useful thing to have, I
>>>>>> hope you're not abandoning that.
>>>>>>
>>>>>
>>>>> It will be KNI PMD.
>>>>> Plan is to have something like KDP, but with existing KNI kernel module.
>>>>> There will be tun/tap support as fallback.
>>>>
>>>> Hum, now I'm confused. I was under the impression everybody hated KNI
>>>> and wanted to get rid of it, and certainly not build future solutions on
>>>> top of it?
>>>
>>> We can't remove it.
>>
>> Why?
>>
>>> We can't replace/improve it -you were one of the major opposition to this.
>>> This doesn't leave more option other than using it.
>>
>> Why cannot we replace it by something upstream?
>>
> I doubt KDP is upstream-able to Linux community. If somebody can, that
> is great.
>
> Even for KCP, upstreaming task is still under discussion, and as a heads
> up, it is likely to be dropped.

If KCP/KDP are not upstreamable then the solution is to find another way
that is. Easier said than done, no doubt.

	- Panu -
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 10:26 ` Ferruh Yigit
  2016-03-16 10:45 ` Thomas Monjalon
@ 2016-03-16 13:15 ` Panu Matilainen
  2016-03-16 13:58 ` Thomas Monjalon
  1 sibling, 1 reply; 29+ messages in thread
From: Panu Matilainen @ 2016-03-16 13:15 UTC
To: Ferruh Yigit, dev; +Cc: David Marchand, Helin Zhang, Thomas Monjalon

On 03/16/2016 12:26 PM, Ferruh Yigit wrote:
> On 3/16/2016 8:22 AM, Panu Matilainen wrote:
>> On 03/16/2016 10:19 AM, Ferruh Yigit wrote:
>>> On 3/16/2016 7:26 AM, Panu Matilainen wrote:
>>>> On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
>>>>> On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
>>>>>> This patch sent to keep record of latest status of the work.
>>>>>>
>>>>>>
>>>>>> This is slow data path communication implementation based on existing KNI.
>>>>>>
>>>>>> Difference is: librte_kni converted into a PMD, kdp kernel module is almost
>>>>>> same except all control path functionality removed and some simplification done.
>>>>>>
>>>>>> Motivation is to simplify slow path data communication.
>>>>>> Now any application can use this new PMD to send/get data to Linux kernel.
>>>>>>
>>>>>> PMD supports two communication methods:
>>>>>>
>>>>>> 1) KDP kernel module
>>>>>> PMD initialization functions handles creating virtual interfaces (with help of
>>>>>> kdp kernel module) and created FIFO. FIFO is used to share data between
>>>>>> userspace and kernelspace. This is default method.
>>>>>>
>>>>>> 2) tun/tap module
>>>>>> When KDP module is not inserted, PMD creates tap interface and transfers
>>>>>> packets using tap interface.
>>>>>>
>>>>>> In long term this patch intends to replace the KNI and KNI will be
>>>>>> deprecated.
>>>>>>
>>>>>
>>>>> Self-NACK: Will work on another option that does not introduce new
>>>>> kernel module.
>>>>>
>>>>
>>>> Hmm, care to elaborate a bit? The second mode of this PMD already was
>>>> free of external kernel modules.
>>>> Do you mean you'll be just removing
>>>> mode 1) from the PMD or looking at something completely different?
>>>>
>>>> Just thinking that tun/tap PMD sounds like a useful thing to have, I
>>>> hope you're not abandoning that.
>>>>
>>>
>>> It will be KNI PMD.
>>> Plan is to have something like KDP, but with existing KNI kernel module.
>>> There will be tun/tap support as fallback.
>>
>> Hum, now I'm confused. I was under the impression everybody hated KNI
>> and wanted to get rid of it, and certainly not build future solutions on
>> top of it?
>>
>
> We can't remove it.
> We can't replace/improve it -you were one of the major opposition to this.

No no no. There's a misunderstanding somewhere in there.

I understand the functionality provided by KNI is important. I'd LOVE to
see it replaced. With something that does not require out-of-tree
kernel modules.

As long as out-of-tree kernel modules are in the picture, the feature
might as well not exist at all for the audience I'm dealing with. To
that audience, replacing KNI with out-of-tree KCP/KDP or whatever is
just irrelevant, there's no progress being made.

I also understand there are a lot of users to whom out-of-tree kernel
modules are not a problem at all, and I'm in no position to tell them
that's somehow wrong. If KCP/KDP is better than KNI for that audience
then more power to them. But I don't see why such modules would *have* to
be within the dpdk source - as suggested several times around this
thread/topic such work could live in a separate repository or such.

What I really would like to see is a clear policy regarding kernel
modules in DPDK. I certainly am in no position to dictate one, and
that's why I've been asking questions and throwing around crazy (or not)
ideas around the topic.

	- Panu -
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 13:15 ` Panu Matilainen
@ 2016-03-16 13:58 ` Thomas Monjalon
  2016-03-16 15:03 ` Panu Matilainen
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Monjalon @ 2016-03-16 13:58 UTC
To: Panu Matilainen; +Cc: Ferruh Yigit, dev, David Marchand, Helin Zhang

2016-03-16 15:15, Panu Matilainen:
> What I really would like to see is a clear policy regarding kernel
> modules in DPDK. I certainly am in no position to dictate one, and
> that's why I've been asking questions and throwing around crazy (or not)
> ideas around the topic.

I think the consensus is to avoid new kernel modules,
but allow them in a staging directory while being discussed upstream.

About the existing out-of-tree kernel modules, we must continue trying
to obsolete them with upstream work.

If you feel the consensus must be clearly stated and acked,
please send a patch for doc/guides/contributing/design.rst.
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 13:58 ` Thomas Monjalon
@ 2016-03-16 15:03 ` Panu Matilainen
  2016-03-16 15:15 ` Thomas Monjalon
  0 siblings, 1 reply; 29+ messages in thread
From: Panu Matilainen @ 2016-03-16 15:03 UTC
To: Thomas Monjalon; +Cc: Ferruh Yigit, dev, David Marchand, Helin Zhang

On 03/16/2016 03:58 PM, Thomas Monjalon wrote:
> 2016-03-16 15:15, Panu Matilainen:
>> What I really would like to see is a clear policy regarding kernel
>> modules in DPDK. I certainly am in no position to dictate one, and
>> that's why I've been asking questions and throwing around crazy (or not)
>> ideas around the topic.
>
> I think the consensus is to avoid new kernel modules,
> but allow them in a staging directory while being discussed upstream.

To me the more interesting question is: what happens after that?
As in, if upstream says no, does it mean axe from dpdk, no ifs and buts?
If accepted upstream, does a version of the module still live within
dpdk codebase (for example to provide the version for older kernel
versions, I don't see that as unreasonable at all)?

> About the existing out-of-tree kernel modules, we must continue trying
> to obsolete them with upstream work.

Agreed.

>
> If you feel the consensus must be clearly stated and acked,
> please send a patch for doc/guides/contributing/design.rst.

I'll be happy to, once we have a clear consensus on what the policy
actually is.

	- Panu -
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16 15:03 ` Panu Matilainen
@ 2016-03-16 15:15 ` Thomas Monjalon
  0 siblings, 0 replies; 29+ messages in thread
From: Thomas Monjalon @ 2016-03-16 15:15 UTC
To: Panu Matilainen; +Cc: Ferruh Yigit, dev, David Marchand, Helin Zhang

2016-03-16 17:03, Panu Matilainen:
> On 03/16/2016 03:58 PM, Thomas Monjalon wrote:
> > 2016-03-16 15:15, Panu Matilainen:
> >> What I really would like to see is a clear policy regarding kernel
> >> modules in DPDK. I certainly am in no position to dictate one, and
> >> that's why I've been asking questions and throwing around crazy (or not)
> >> ideas around the topic.
> >
> > I think the consensus is to avoid new kernel modules,
> > but allow them in a staging directory while being discussed upstream.
>
> To me the more interesting question is: what happens after that?
> As in, if upstream says no, does it mean axe from dpdk, no ifs and buts?
> If accepted upstream, does a version of the module still live within
> dpdk codebase (for example to provide the version for older kernel
> versions, I don't see that as unreasonable at all)?
>
>
> > About the existing out-of-tree kernel modules, we must continue trying
> > to obsolete them with upstream work.
>
> Agreed.
>
> >
> > If you feel the consensus must be clearly stated and acked,
> > please send a patch for doc/guides/contributing/design.rst.
>
> I'll be happy to, once we have a clear consensus on what the policy
> actually is.

Sending a patch is the most efficient way of having the discussion
happen with more contributors.
We, as a technical community, take some patch-based decisions ;)
* Re: [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux
  2016-03-16  8:22 ` Panu Matilainen
  2016-03-16 10:26 ` Ferruh Yigit
@ 2016-03-16 11:07 ` Bruce Richardson
  1 sibling, 0 replies; 29+ messages in thread
From: Bruce Richardson @ 2016-03-16 11:07 UTC
To: Panu Matilainen; +Cc: Ferruh Yigit, dev, David Marchand, Helin Zhang

On Wed, Mar 16, 2016 at 10:22:05AM +0200, Panu Matilainen wrote:
> On 03/16/2016 10:19 AM, Ferruh Yigit wrote:
> >On 3/16/2016 7:26 AM, Panu Matilainen wrote:
> >>On 03/14/2016 05:32 PM, Ferruh Yigit wrote:
> >>>On 3/9/2016 11:17 AM, Ferruh Yigit wrote:
> >>>>This patch sent to keep record of latest status of the work.
> >>>>
> >>>>
> >>>>This is slow data path communication implementation based on existing KNI.
> >>>>
> >>>>Difference is: librte_kni converted into a PMD, kdp kernel module is almost
> >>>>same except all control path functionality removed and some simplification done.
> >>>>
> >>>>Motivation is to simplify slow path data communication.
> >>>>Now any application can use this new PMD to send/get data to Linux kernel.
> >>>>
> >>>>PMD supports two communication methods:
> >>>>
> >>>>1) KDP kernel module
> >>>>PMD initialization functions handles creating virtual interfaces (with help of
> >>>>kdp kernel module) and created FIFO. FIFO is used to share data between
> >>>>userspace and kernelspace. This is default method.
> >>>>
> >>>>2) tun/tap module
> >>>>When KDP module is not inserted, PMD creates tap interface and transfers
> >>>>packets using tap interface.
> >>>>
> >>>>In long term this patch intends to replace the KNI and KNI will be
> >>>>deprecated.
> >>>>
> >>>
> >>>Self-NACK: Will work on another option that does not introduce new
> >>>kernel module.
> >>>
> >>
> >>Hmm, care to elaborate a bit? The second mode of this PMD already was
> >>free of external kernel modules.
> >>Do you mean you'll be just removing
> >>mode 1) from the PMD or looking at something completely different?
> >>
> >>Just thinking that tun/tap PMD sounds like a useful thing to have, I
> >>hope you're not abandoning that.
> >>
> >
> >It will be KNI PMD.
> >Plan is to have something like KDP, but with existing KNI kernel module.
> >There will be tun/tap support as fallback.
>
> Hum, now I'm confused. I was under the impression everybody hated KNI and
> wanted to get rid of it, and certainly not build future solutions on top of
> it?
>

KNI has its issues - mainly: a) not being upstream and b) having large
amounts of code to do port management in it, which is best handled by
other means - but the code for transferring packets between kernel space
and userspace is more performant and scalable than TUN/TAP, so we need to
keep that around unless/until we can get TUN/TAP to reach the same
performance levels. Now, we are thinking of some ways in which that can be
achieved, but any such solution is going to be a bit out, so making any
driver for transferring packets from user->kernel and vice versa might as
well take advantage of KNI as well as TUN/TAP so as to allow those who
want the extra performance to have it.

Regards,
/Bruce
end of thread, other threads:[~2016-03-16 15:16 UTC | newest]

Thread overview: 29+ messages
2016-01-27 16:32 [dpdk-dev] [PATCH 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
2016-01-27 16:32 ` [dpdk-dev] [PATCH 1/2] kdp: add kernel data path kernel module Ferruh Yigit
2016-02-08 17:14 ` Reshma Pattan
2016-02-09 10:53 ` Ferruh Yigit
2016-01-27 16:32 ` [dpdk-dev] [PATCH 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
2016-01-28  8:16 ` Xu, Qian Q
2016-01-29 16:04 ` Yigit, Ferruh
2016-02-09 17:33 ` Reshma Pattan
2016-02-09 17:51 ` Ferruh Yigit
2016-02-19  5:05 ` [dpdk-dev] [PATCH v2 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
2016-02-19  5:05 ` [dpdk-dev] [PATCH v2 1/2] kdp: add kernel data path kernel module Ferruh Yigit
2016-02-19  5:05 ` [dpdk-dev] [PATCH v2 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 1/2] kdp: add kernel data path kernel module Ferruh Yigit
2016-03-09 11:17 ` [dpdk-dev] [PATCH v3 2/2] kdp: add virtual PMD for kernel slow data path communication Ferruh Yigit
2016-03-14 15:32 ` [dpdk-dev] [PATCH v3 0/2] slow data path communication between DPDK port and Linux Ferruh Yigit
2016-03-16  7:26 ` Panu Matilainen
2016-03-16  8:19 ` Ferruh Yigit
2016-03-16  8:22 ` Panu Matilainen
2016-03-16 10:26 ` Ferruh Yigit
2016-03-16 10:45 ` Thomas Monjalon
2016-03-16 11:07 ` Mcnamara, John
2016-03-16 11:13 ` Ferruh Yigit
2016-03-16 13:23 ` Panu Matilainen
2016-03-16 13:15 ` Panu Matilainen
2016-03-16 13:58 ` Thomas Monjalon
2016-03-16 15:03 ` Panu Matilainen
2016-03-16 15:15 ` Thomas Monjalon
2016-03-16 11:07 ` Bruce Richardson