* [dpdk-dev] [RFC v5] net: memory interface (memif)
From: Jakub Grajciar @ 2019-03-22 11:57 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif) provides high-performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 200 ++++
doc/guides/rel_notes/release_19_05.rst | 4 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 28 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1104 ++++++++++++++++++
drivers/net/memif/memif_socket.h | 104 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1132 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 203 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
18 files changed, 3001 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as its control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
The patch above is required because a received message can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
diff --git a/MAINTAINERS b/MAINTAINERS
index 452b8eb82..145b5282a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -790,6 +790,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/config/common_base b/config/common_base
index 0b09a9348..eb6055ea8 100644
--- a/config/common_base
+++ b/config/common_base
@@ -416,6 +416,11 @@ CONFIG_RTE_LIBRTE_VMXNET3_DEBUG_TX_FREE=n
#
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 5c80e3baa..167e0dacb 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -34,6 +34,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..7cc547105
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,200 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+The shared memory packet interface (memif) PMD allows DPDK to communicate with
+any other client using memif (DPDK, VPP, libmemif) over shared memory. Memif
+is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. The slave is also the producer of the shared memory
+file and initializes the shared memory. The master creates the socket and
+listens for slave connection requests. The socket may already exist on the
+system; be sure to remove any such sockets when creating a master interface,
+or you will see an "Address already in use" error. The function
+``rte_pmd_memif_remove()``, which removes a memif interface, also removes the
+listener socket if it is not used by any other interface.
+
+Interfaces are enabled with the ``--vdev=net_memif0`` option on the DPDK
+application command line. Each additional ``--vdev`` option creates another
+interface, named net_memif0, net_memif1, and so on. Memif uses a unix domain
+socket to transmit control messages. Each memif has a unique id per socket;
+if you are connecting multiple interfaces using the same socket, be sure to
+specify unique ids ``id=0``, ``id=1``, etc. Note that a socket assigned to a
+master interface becomes a listener socket. A listener socket cannot be used
+by a slave interface in the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Each memif on same socket must be given a unique id", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+
+**Connection establishment**
+
+In order to create memif connection, two memif interfaces, each in separate
+process, are needed. Once interface in ``master`` role and other in
+``slave`` role. It is not possible to connect two interfaces in a single
+process.
+
+The memif driver uses a unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens for connection requests
+from other processes. One socket can be used by multiple interfaces. One
+process can have ``slave`` and ``master`` interfaces at the same time,
+provided each role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket extracts the connection request and creates
+a new connected socket (control channel). It then sends a 'hello' message
+(``MEMIF_MSG_TYPE_HELLO``) containing its configuration boundaries. The slave
+interface adjusts its configuration accordingly and sends an 'init' message
+(``MEMIF_MSG_TYPE_INIT``), which among other things contains the interface
+id. The driver uses this id to find the master interface and assigns the
+control channel to that interface. If such an interface is found, an 'ack'
+message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface then sends an
+'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region
+allocated, and the master responds to each with an 'ack' message. The same
+applies to rings: the slave sends an 'add ring' message
+(``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the master
+again responds to each with an 'ack'. To finalize the connection, the slave
+interface sends a 'connect' message (``MEMIF_MSG_TYPE_CONNECT``). Upon
+receiving this message, the master maps the regions into its address space,
+initializes the rings and responds with a 'connected' message
+(``MEMIF_MSG_TYPE_CONNECTED``). A disconnect (``MEMIF_MSG_TYPE_DISCONNECT``)
+can be sent by both master and slave interfaces at any time, due to a driver
+error or because the interface is being deleted.
+
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, bits 16:31)**
+
+Index of the memory region in which the buffer is located.
+
+**Data length - length (Quad Word 0, bits 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, bits 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, bits 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+
+
+Example: testpmd and testpmd
+----------------------------
+In this example we run two instances of the testpmd application and transmit packets over memif.
+
+First create ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP, please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 61a2c7383..be55faacc 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -91,6 +91,10 @@ New Features
* Added promiscuous mode support.
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 502869a87..1ecbf5155 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -33,6 +33,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_IAVF_PMD) += iavf
DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..346cf6473
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..15a56888b
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+/**
+ * M2S
+ * Contains master interface configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request master to add a new shared memory region to the master interface.
+ * The shared file's file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm region index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add new ring to master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..ad6a5e483
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1104 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = rte_zmalloc("memif_msg",
+ sizeof(struct
+ memif_msg_queue_elt), 0);
+
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ pmd = cc->dev->data->dev_private;
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_IDX;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.buffer_size = pmd->cfg.buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strcmp(pmd->secret, (char *)i->secret) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *mr;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index > ETH_MEMIF_MAX_REGION_IDX) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ mr = rte_realloc(pmd->regions, sizeof(struct memif_region) *
+ (ar->index + 1), 0);
+ if (mr == NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Device error", 0);
+ return -1;
+ }
+
+ pmd->regions = mr;
+ pmd->regions[ar->index].fd = fd;
+ pmd->regions[ar->index].region_size = ar->size;
+ pmd->regions[ar->index].addr = NULL;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memif_disconnect(rte_eth_dev_allocated
+ (rte_vdev_device_name(pmd->vdev)));
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = &pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt;
+ struct memif_queue *mq;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ while ((elt = TAILQ_FIRST(&pmd->cc->msg_queue)) != NULL) {
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ if (pmd->cc->intr_handle.fd > 0) {
+ ret = rte_intr_callback_unregister(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ /* *INDENT-OFF* */
+ ret = rte_intr_callback_unregister_pending(
+ &pmd->cc->intr_handle,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ /* *INDENT-ON* */
+ } else if (ret > 0) {
+ close(pmd->cc->intr_handle.fd);
+ rte_free(pmd->cc);
+ }
+ if (ret <= 0)
+ MIF_LOG(WARNING,
+ "%s: Failed to unregister control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ rte_intr_disable(&mq->intr_handle);
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ rte_intr_disable(&mq->intr_handle);
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ struct rte_eth_dev *dev;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ dev = cc->dev;
+ memif_disconnect(dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ int addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle,
+ memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL) {
+ rte_free(cc);
+ cc = NULL;
+ }
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ memset(&un, 0, sizeof(un));
+ un.sun_family = AF_UNIX;
+ strlcpy(un.sun_path, sock->filename, sizeof(un.sun_path));
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue",
+ sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) <
+ 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ memset(&sun, 0, sizeof(sun));
+ sun.sun_family = AF_UNIX;
+
+ strlcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path));
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
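Aside for reviewers unfamiliar with the mechanism: the control channel above receives region and ring file descriptors as SCM_RIGHTS ancillary data on a SOCK_SEQPACKET Unix socket (parsed in memif_msg_receive()). The following standalone sketch demonstrates that fd-passing technique in isolation; it is illustrative only and shares no code with the PMD (all names here are made up).

```c
/* Minimal sketch of SCM_RIGHTS fd passing over AF_UNIX SOCK_SEQPACKET,
 * the same mechanism memif's control channel uses. Illustrative only. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one payload byte plus an fd as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd_to_send)
{
	char payload = 'x';
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	char ctl[CMSG_SPACE(sizeof(int))];
	struct msghdr mh;
	struct cmsghdr *cmsg;

	memset(&mh, 0, sizeof(mh));
	memset(ctl, 0, sizeof(ctl));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = ctl;
	mh.msg_controllen = sizeof(ctl);
	cmsg = CMSG_FIRSTHDR(&mh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
	return (sendmsg(sock, &mh, 0) == 1) ? 0 : -1;
}

/* Receive the message and extract the passed fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char payload;
	struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
	char ctl[CMSG_SPACE(sizeof(int))];
	struct msghdr mh;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&mh, 0, sizeof(mh));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = ctl;
	mh.msg_controllen = sizeof(ctl);
	if (recvmsg(sock, &mh, 0) != 1)
		return -1;
	for (cmsg = CMSG_FIRSTHDR(&mh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&mh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS)
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	}
	return fd;
}

/* Round trip over a socketpair: returns 0 if the fd arrived intact. */
static int demo_fd_pass(void)
{
	int sv[2], pipefd[2], received, ok = -1;
	char buf[2] = { 0 };

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;
	if (pipe(pipefd) < 0)
		return -1;
	if (send_fd(sv[0], pipefd[1]) == 0) {
		received = recv_fd(sv[1]);
		/* prove the received fd refers to the same pipe */
		if (received >= 0 && write(received, "ok", 2) == 2 &&
		    read(pipefd[0], buf, 2) == 2 &&
		    buf[0] == 'o' && buf[1] == 'k')
			ok = 0;
		if (received >= 0)
			close(received);
	}
	close(sv[0]); close(sv[1]);
	close(pipefd[0]); close(pipefd[1]);
	return ok;
}
```

The kernel duplicates the descriptor into the receiver's fd table, which is why both sides can use the shared-memory regions after the handshake.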
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..8caea270b
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ uint8_t listener; /**< if not zero socket is listener */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ memif_msg_t msg; /**< control message */
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..7ef1bca93
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1132 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+static struct rte_vdev_driver pmd_memif_drv;
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0].addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region].addr + d->offset);
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head = NULL;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+ next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
+
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ rte_pktmbuf_chain(mbuf_head, mbuf);
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_pkt_len(mbuf) =
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+ no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+ refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_free && n_tx_pkts < nb_pkts) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.buffer_size : d0->length;
+
+ next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+ no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions + i;
+ if (r->addr == NULL)
+ continue;
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(pmd->regions);
+}
+
+static int
+memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
+{
+ struct memif_region *r;
+ char shm_name[32];
+ int i;
+ int ret = 0;
+
+ r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate regions.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ pmd->regions = r;
+ pmd->regions_num = brn + 1;
+
+ /*
+ * Create shm for every region. Region 0 is reserved for descriptors.
+ * Other regions contain buffers.
+ */
+ for (i = 0; i < (brn + 1); i++) {
+ r = &pmd->regions[i];
+
+ r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) +
+ sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;
+ r->region_size = (i == 0) ? r->buffer_offset :
+ (uint32_t)(pmd->run.buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(shm_name));
+ snprintf(shm_name, sizeof(shm_name), "memif region %d", i);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ return -1;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ return -1;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ return -1;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ r->addr = NULL;
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 1;
+ ring->desc[j].offset = pmd->regions[1].buffer_offset +
+ (uint32_t)(slot * pmd->run.buffer_size);
+ ring->desc[j].length = pmd->run.buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 1;
+ ring->desc[j].offset = pmd->regions[1].buffer_offset +
+ (uint32_t)(slot * pmd->run.buffer_size);
+ ring->desc[j].length = pmd->run.buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0].addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0].addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_alloc_regions(dev->data->dev_private, /* num of buffer regions */ 1);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions + i;
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED) {
+ mr->addr = NULL;
+ return -1;
+ }
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region].addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region].addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *q = (struct memif_queue *)queue;
+
+ if (!q)
+ return;
+
+ rte_free(q);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t buffer_size, const char *secret,
+ struct ether_addr *eth_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(pmd->secret));
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.buffer_size = buffer_size;
+
+ rte_memcpy(&pmd->eth_addr, eth_addr, sizeof(struct ether_addr));
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = &pmd->eth_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid directory: '%s'.", dir);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *eth_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &eth_addr->addr_bytes[0], &eth_addr->addr_bytes[1],
+ &eth_addr->addr_bytes[2], &eth_addr->addr_bytes[3],
+ &eth_addr->addr_bytes[4], &eth_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t buffer_size = ETH_MEMIF_DEFAULT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr eth_addr;
+
+ eth_random_addr(eth_addr.addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_BUFFER_SIZE_ARG,
+ &memif_set_bs, &buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, &eth_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, buffer_size, secret, &eth_addr);
+
+ exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internals *pmd;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ pmd = eth_dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
+ memif_disconnect(eth_dev);
+
+ memif_socket_remove_device(eth_dev);
+
+ pmd->vdev = NULL;
+
+ rte_free(eth_dev->data->dev_private);
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..930de38ed
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,203 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_IDX 255
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t buffer_offset; /**< offset at which buffers start */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ uint16_t in_port; /**< port id */
+
+ struct pmd_internals *pmd; /**< device internals */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ /* ring info */
+ memif_ring_type_t type; /**< ring type */
+ memif_ring_t *ring; /**< pointer to ring */
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+
+ memif_region_index_t region; /**< shared memory region index */
+ memif_region_offset_t offset; /**< offset at which the queue begins */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ struct ether_addr eth_addr; /**< mac address */
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[24]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions; /**< shared memory regions */
+ uint8_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[64]; /**< remote app name */
+ char remote_if_name[64]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[96]; /**< local disconnect reason */
+ char remote_disc_string[96]; /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing a connection.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..4b2e62193
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index 3ecc78cee..86ad1ec9e 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -22,6 +22,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 262132fc6..d82cb0fb5 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -171,6 +171,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-03-22 11:57 ` [dpdk-dev] [RFC v5] " Jakub Grajciar
2019-03-22 11:57 ` Jakub Grajciar
@ 2019-03-25 20:58 ` Ferruh Yigit
2019-03-25 20:58 ` Ferruh Yigit
2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-09 8:30 ` [dpdk-dev] [RFC v6] " Jakub Grajciar
2 siblings, 2 replies; 58+ messages in thread
From: Ferruh Yigit @ 2019-03-25 20:58 UTC (permalink / raw)
To: Jakub Grajciar, dev
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
<...>
> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also a producer of shared memory file and initializes
> +the shared memory. Master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets, if you are creating a master interface, or you will
> +see an "Address already in use" error. Function ``rte_pmd_memif_remove()``,
Would it be possible to remove this existing socket on successful termination of
the DPDK application with the 'master' role?
Otherwise every run of a DPDK app requires deleting the socket file first.
> +which removes memif interface, will also remove a listener socket, if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +``--vdev=net_memif1`` option given will create an interface named net_memif0,
> +net_memif1, and so on. Memif uses unix domain socket to transmit control
> +messages. Each memif has a unique id per socket. If you are connecting multiple
> +interfaces using same socket, be sure to specify unique ids ``id=0``, ``id=1``,
> +etc. Note that if you assign a socket to a master interface it becomes a
> +listener socket. Listener socket can not be used by a slave interface on same
> +client.
When I connect two slaves, id=0 & id=1, to the master, both the master and the
second slave crashed as soon as the second slave connected; is this a known
issue? Can you please check this?
master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
net_memif0,role=slave,id=1 -- -i
<...>
> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.
How does this play with multi-process support? When I run a secondary app while
the memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure that at least nothing crashes when a secondary app runs?
> +
> +First create ``master`` interface::
> +
> + #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> + #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> + testpmd> set fwd rxonly
> + testpmd> start
> +
> + testpmd> set fwd txonly
> + testpmd> start
Would it be useful to add a loopback option to the PMD, for testing/debug?
Also I am getting low performance numbers above compared to the ring PMD, for
example; is there any performance target for the PMD?
With the same forwarding core for both testpmds this is terrible; either
something is wrong or I am doing something wrong, it is ~40Kpps.
With different forwarding cores for each testpmd it is still low, ~18Mpps.
<...>
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
Can you please add here which experimental APIs are called (as a comment)? This
helps to keep track of them in the long term.
<...>
> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> + uint16_t flags; /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
Is this define used?
<...>
> +
> +static struct rte_vdev_driver pmd_memif_drv;
Same comment from the previous review: is this forward declaration required?
<...>
> +static memif_ring_t *
> +memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
> +{
> + /* rings only in region 0 */
> + void *p = pmd->regions[0].addr;
> + int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
> + (1 << pmd->run.log2_ring_size);
> +
> + p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
According to the above code, I guess the layout is as follows [1]; can you
please correct it if it is wrong?
Can you please put this information somewhere, possibly in the comment of the
function that allocates it, so that everyone doesn't need to figure it out?
[1]
region 0:
+-----------------------+
| S2M rings | M2S rings |
+-----------------------+
S2M OR M2S Rings:
+-----------------------------------------+
| ring 0 | ring 1 | ring num_s2m_rings - 1|
+-----------------------------------------+
ring 0:
+-----------------------------------------------------+
| Ring Header | (1 << pmd->run.log2_ring_size) * desc |
+-----------------------------------------------------+
region 1:
+-------------------------------------------------------------------------+
| packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
+-------------------------------------------------------------------------+
Is there any order on packet buffers?
<...>
> + while (n_slots && n_rx_pkts < nb_pkts) {
> + mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf_head == NULL))
> + goto no_free_bufs;
> + mbuf = mbuf_head;
> + mbuf->port = mq->in_port;
> +
> + next_slot:
> + s0 = cur_slot & mask;
> + d0 = &ring->desc[s0];
> +
> + src_len = d0->length;
> + dst_off = 0;
> + src_off = 0;
> +
> + do {
> + dst_len = mbuf_size - dst_off;
> + if (dst_len == 0) {
> + dst_off = 0;
> + dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
> +
> + mbuf = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf == NULL))
> + goto no_free_bufs;
> + mbuf->port = mq->in_port;
> + rte_pktmbuf_chain(mbuf_head, mbuf);
> + }
> + cp_len = RTE_MIN(dst_len, src_len);
> +
> + rte_pktmbuf_pkt_len(mbuf) =
> + rte_pktmbuf_data_len(mbuf) += cp_len;
Can you please split this into two statements to prevent confusion?
Also, shouldn't 'cp_len' be added to 'mbuf_head->pkt_len'?
'rte_pktmbuf_chain' updates 'mbuf_head', but at that stage 'mbuf->pkt_len' is
not yet set to 'cp_len'...
<...>
> +void
> +memif_free_regions(struct pmd_internals *pmd)
> +{
> + int i;
> + struct memif_region *r;
> +
> + for (i = 0; i < pmd->regions_num; i++) {
> + r = pmd->regions + i;
> + if (r == NULL)
> + return;
'r' can be NULL only when 'pmd->regions' is NULL and 'i == 0', so would it be
better to check whether 'pmd->regions' is NULL before the loop?
> + if (r->addr == NULL)
> + return;
> + munmap(r->addr, r->region_size);
> + if (r->fd > 0) {
> + close(r->fd);
> + r->fd = -1;
> + }
> + }
> + rte_free(pmd->regions);
> +}
> +
> +static int
> +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
> +{
> + struct memif_region *r;
> + char shm_name[32];
> + int i;
> + int ret = 0;
> +
> + r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
> + if (r == NULL) {
> + MIF_LOG(ERR, "%s: Failed to allocate regions.",
> + rte_vdev_device_name(pmd->vdev));
> + return -ENOMEM;
> + }
> +
> + pmd->regions = r;
> + pmd->regions_num = brn + 1;
> +
> + /*
> + * Create shm for every region. Region 0 is reserved for descriptors.
> + * Other regions contain buffers.
> + */
> + for (i = 0; i < (brn + 1); i++) {
> + r = &pmd->regions[i];
> +
> + r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings) *
> + (sizeof(memif_ring_t) +
> + sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;
For complex operations can you please prefer a regular if check over a ternary?
Even with the formatting it is hard to follow this.
What exactly is "buffer_offset"? For 'region 0' it calculates the size of
'region 0', otherwise 0. This is an offset starting from where? And it seems
to be used only for the size assignment below.
> + r->region_size = (i == 0) ? r->buffer_offset :
> + (uint32_t)(pmd->run.buffer_size *
I guess 'buffer_size' is the buffer size per packet; to make this clear, what
do you think about renaming it to 'packet_buffer_size'?
> + (1 << pmd->run.log2_ring_size) *
> + (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings));
There is an illusion that packet buffers can be in multiple regions (the
comment above implies it), but this logic assumes all packet buffers are in
the same region other than region 0, which gives us 'region 1', and this is
already hardcoded in a few places. Is there a benefit to assuming there will
be more regions? Would it be simpler to state that 'region 0' is for rings
and 'region 1' is for packet buffers?
> +
> + memset(shm_name, 0, sizeof(char) * 32);
> + sprintf(shm_name, "memif region %d", i);
'snprintf' please, and can you please add a define for the shm_name size?
<...>
> +static void
> +memif_init_rings(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_ring_t *ring;
> + int i, j;
> + uint16_t slot;
> +
> + for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> + ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
> + ring->head = 0;
> + ring->tail = 0;
> + ring->cookie = MEMIF_COOKIE;
> + ring->flags = 0;
> + for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
> + slot = i * (1 << pmd->run.log2_ring_size) + j;
> + ring->desc[j].region = 1;
Why is 'region' 1 hardcoded for buffers? Would it be possible to use multiple
regions for buffers?
<...>
> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_region *mr;
> + struct memif_queue *mq;
> + int i;
> +
> + for (i = 0; i < pmd->regions_num; i++) {
> + mr = pmd->regions + i;
> + if (mr != NULL) {
> + if (mr->addr == NULL) {
> + if (mr->fd < 0)
> + return -1;
> + mr->addr = mmap(NULL, mr->region_size,
> + PROT_READ | PROT_WRITE,
> + MAP_SHARED, mr->fd, 0);
> + if (mr->addr == NULL)
> + return -1;
> + }
> + }
> + }
For one master to work with multiple slaves, there should be an array of
regions, right? One for each slave.
Also, the same concern is valid for Rx/Tx queues: master and slave use the
same dev->data->tx_queues / dev->data->rx_queues, but crossed over. I think
more complex logic is required for multiple slave support.
If multiple slaves are not supported, is the 'id' devarg still valid, or is
it used at all?
<...>
> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> + memif_interface_id_t id, uint32_t flags,
> + const char *socket_filename,
> + memif_log2_ring_size_t log2_ring_size,
> + uint16_t buffer_size, const char *secret,
> + struct ether_addr *eth_addr)
> +{
> + int ret = 0;
> + struct rte_eth_dev *eth_dev;
> + struct rte_eth_dev_data *data;
> + struct pmd_internals *pmd;
> + const unsigned int numa_node = vdev->device.numa_node;
> + const char *name = rte_vdev_device_name(vdev);
> +
> + if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> + MIF_LOG(ERR, "Zero-copy not supported.");
> + return -1;
> + }
What is the plan for zero copy?
Can you please put the status & plan in the documentation?
<...>
> +struct memif_queue {
> + struct rte_mempool *mempool; /**< mempool for RX packets */
> + uint16_t in_port; /**< port id */
> +
> + struct pmd_internals *pmd; /**< device internals */
> +
> + struct rte_intr_handle intr_handle; /**< interrupt handle */
> +
> + /* ring info */
> + memif_ring_type_t type; /**< ring type */
> + memif_ring_t *ring; /**< pointer to ring */
> + memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
> +
> + memif_region_index_t region; /**< shared memory region index */
> + memif_region_offset_t offset; /**< offset at which the queue begins */
Offset from where? It is the start of the region, but I think it is better to
detail that in the comment.
> +
> + uint16_t last_head; /**< last ring head */
> + uint16_t last_tail; /**< last ring tail */
The ring already has head and tail variables; what is the difference in the
queue ones?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-03-25 20:58 ` Ferruh Yigit
@ 2019-03-25 20:58 ` Ferruh Yigit
2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
1 sibling, 0 replies; 58+ messages in thread
From: Ferruh Yigit @ 2019-03-25 20:58 UTC (permalink / raw)
To: Jakub Grajciar, dev
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
<...>
> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also a producer of shared memory file and initializes
> +the shared memory. Master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets, if you are creating a master interface, or you will
> +see an "Address already in use" error. Function ``rte_pmd_memif_remove()``,
Would it be possible to remove this existing socket on successful termination
of the DPDK application with the 'master' role?
Otherwise, each run of a DPDK app requires deleting the socket file first.
> +which removes memif interface, will also remove a listener socket, if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +``--vdev=net_memif1`` option given will create an interface named net_memif0,
> +net_memif1, and so on. Memif uses unix domain socket to transmit control
> +messages. Each memif has a unique id per socket. If you are connecting multiple
> +interfaces using same socket, be sure to specify unique ids ``id=0``, ``id=1``,
> +etc. Note that if you assign a socket to a master interface it becomes a
> +listener socket. Listener socket can not be used by a slave interface on same
> +client.
When I connect two slaves, id=0 & id=1, to the master, both the master and the
second slave crash as soon as the second slave connects. Is this a known
issue? Can you please check this?
master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
net_memif0,role=slave,id=1 -- -i
<...>
> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.
How does this play with multi-process support? When I run a secondary app
while the memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure at least nothing crashes when a secondary app
is run?
> +
> +First create ``master`` interface::
> +
> + #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> + #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> + testpmd> set fwd rxonly
> + testpmd> start
> +
> + testpmd> set fwd txonly
> + testpmd> start
Would it be useful to add a loopback option to the PMD, for testing/debug?
Also, I am getting low performance numbers above compared to the ring PMD,
for example; is there any performance target for the PMD?
With the same forwarding core for both testpmds this is terrible, either
something is wrong or I am doing something wrong; it is ~40Kpps.
With different forwarding cores for each testpmd it is still low, ~18Mpps.
<...>
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
Can you please add here which experimental APIs are called (as a comment)?
This helps to keep track of them in the long term.
<...>
> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> + uint16_t flags; /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
Is this define used?
<...>
> +
> +static struct rte_vdev_driver pmd_memif_drv;
Same comment from the previous review: is this forward declaration required?
<...>
> +static memif_ring_t *
> +memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
> +{
> + /* rings only in region 0 */
> + void *p = pmd->regions[0].addr;
> + int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
> + (1 << pmd->run.log2_ring_size);
> +
> + p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
According to the code above, I guess the layout is as follows [1]; can you
please correct it if it is wrong?
Can you please put this information somewhere, possibly in the comment of the
function that allocates it, so that everyone doesn't need to figure it out.
[1]
region 0:
+-----------------------+
| S2M rings | M2S rings |
+-----------------------+
S2M OR M2S Rings:
+-----------------------------------------+
| ring 0 | ring 1 | ring num_s2m_rings - 1|
+-----------------------------------------+
ring 0:
+-----------------------------------------------------+
| Ring Header | (1 << pmd->run.log2_ring_size) * desc |
+-----------------------------------------------------+
region 1:
+-------------------------------------------------------------------------+
| packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
+-------------------------------------------------------------------------+
Is there any order on packet buffers?
<...>
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-03-25 20:58 ` Ferruh Yigit
2019-03-25 20:58 ` Ferruh Yigit
@ 2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-03 4:27 ` Honnappa Nagarahalli
1 sibling, 2 replies; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-05-02 12:35 UTC (permalink / raw)
To: Ferruh Yigit, dev
________________________________________
From: Ferruh Yigit <ferruh.yigit@intel.com>
Sent: Monday, March 25, 2019 9:58 PM
To: Jakub Grajciar; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
<...>
> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also a producer of shared memory file and initializes
> +the shared memory. Master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets, if you are creating a master interface, or you will
> +see an "Address already in use" error. Function ``rte_pmd_memif_remove()``,
Would it be possible to remove this existing socket on successful termination
of the DPDK application with the 'master' role?
Otherwise, each run of a DPDK app requires deleting the socket file first.
net_memif is the same case as net_virtio_user; I'd use the same workaround,
testpmd.c:pmd_test_exit(). This, however, is only valid for testpmd.
> +which removes memif interface, will also remove a listener socket, if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +``--vdev=net_memif1`` option given will create an interface named net_memif0,
> +net_memif1, and so on. Memif uses unix domain socket to transmit control
> +messages. Each memif has a unique id per socket. If you are connecting multiple
> +interfaces using same socket, be sure to specify unique ids ``id=0``, ``id=1``,
> +etc. Note that if you assign a socket to a master interface it becomes a
> +listener socket. Listener socket can not be used by a slave interface on same
> +client.
When I connect two slaves, id=0 & id=1, to the master, both the master and the
second slave crash as soon as the second slave connects. Is this a known
issue? Can you please check this?
master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
net_memif0,role=slave,id=1 -- -i
<...>
Each interface can be connected to one peer interface at the same time, I'll
add more details to the documentation.
> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.
How does this play with multi-process support? When I run a secondary app
while the memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure at least nothing crashes when a secondary app
is run?
For now I'd disable multi-process support, so we can get the patch applied, then
provide multi-process support in a separate patch.
> +
> +First create ``master`` interface::
> +
> + #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> + #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> + testpmd> set fwd rxonly
> + testpmd> start
> +
> + testpmd> set fwd txonly
> + testpmd> start
Would it be useful to add a loopback option to the PMD, for testing/debug?
Also, I am getting low performance numbers above compared to the ring PMD,
for example; is there any performance target for the PMD?
With the same forwarding core for both testpmds this is terrible, either
something is wrong or I am doing something wrong; it is ~40Kpps.
With different forwarding cores for each testpmd it is still low, ~18Mpps.
<...>
The difference between the ring PMD and the memif PMD is that while memif
transfers packets, ring transfers whole buffers. (Also, ring cannot be used
process to process.) That means it does not have to alloc/free buffers.
I did a simple test where I modified the code so that the tx function only
frees the given buffers and rx allocates new buffers. On my machine, even
this can only handle 45-50Mpps.
With that in mind, I believe that 23Mpps is fine performance. No performance
target is defined; the goal is to be as fast as possible.
The cause of the issue, where traffic drops to 40Kpps while using the same
core for both applications, is that within the timeslice given to the testpmd
process, the memif driver fills its queues but keeps polling, not giving the
other process a chance to receive the packets.
The same applies to the rx side, where an empty queue is polled over and over
again. In such a configuration, interrupt rx-mode should be used instead of
polling, or the application could suspend
> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> + uint16_t flags; /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
Is this define used?
<...>
Yes, see eth_memif_tx() and eth_memif_rx()
> + while (n_slots && n_rx_pkts < nb_pkts) {
> + mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf_head == NULL))
> + goto no_free_bufs;
> + mbuf = mbuf_head;
> + mbuf->port = mq->in_port;
> +
> + next_slot:
> + s0 = cur_slot & mask;
> + d0 = &ring->desc[s0];
> +
> + src_len = d0->length;
> + dst_off = 0;
> + src_off = 0;
> +
> + do {
> + dst_len = mbuf_size - dst_off;
> + if (dst_len == 0) {
> + dst_off = 0;
> + dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
> +
> + mbuf = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf == NULL))
> + goto no_free_bufs;
> + mbuf->port = mq->in_port;
> + rte_pktmbuf_chain(mbuf_head, mbuf);
> + }
> + cp_len = RTE_MIN(dst_len, src_len);
> +
> + rte_pktmbuf_pkt_len(mbuf) =
> + rte_pktmbuf_data_len(mbuf) += cp_len;
Can you please split this into two lines to prevent confusion?
Also, shouldn't 'cp_len' be added to 'mbuf_head->pkt_len'?
'rte_pktmbuf_chain' updates 'mbuf_head', but at that stage 'mbuf->pkt_len' is
not yet set to 'cp_len'...
<...>
Fix in next patch.
> + if (r->addr == NULL)
> + return;
> + munmap(r->addr, r->region_size);
> + if (r->fd > 0) {
> + close(r->fd);
> + r->fd = -1;
> + }
> + }
> + rte_free(pmd->regions);
> +}
> +
> +static int
> +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
> +{
> + struct memif_region *r;
> + char shm_name[32];
> + int i;
> + int ret = 0;
> +
> + r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
> + if (r == NULL) {
> + MIF_LOG(ERR, "%s: Failed to allocate regions.",
> + rte_vdev_device_name(pmd->vdev));
> + return -ENOMEM;
> + }
> +
> + pmd->regions = r;
> + pmd->regions_num = brn + 1;
> +
> + /*
> + * Create shm for every region. Region 0 is reserved for descriptors.
> + * Other regions contain buffers.
> + */
> + for (i = 0; i < (brn + 1); i++) {
> + r = &pmd->regions[i];
> +
> + r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings) *
> + (sizeof(memif_ring_t) +
> + sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;
What exactly is "buffer_offset"? For 'region 0' it calculates the size of
'region 0', otherwise 0. This is an offset starting from where? And it seems
it is only used for the size assignment below.
It is possible to use a single region for one connection (non-zero-copy) to
reduce the number of files opened; it will be implemented in the next patch.
'buffer_offset' is the offset from the start of the shared memory file to the
first packet buffer.
> + (1 << pmd->run.log2_ring_size) *
> + (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings));
There is an illusion that packet buffers can be in multiple regions (the
comment above implies it), but this logic assumes all packet buffers are in
the same region other than region 0, which gives us 'region 1', and this is
already hardcoded in a few places. Is there a benefit to assuming there will
be more regions? Would it be simpler to state that 'region 0' is for rings
and 'region 1' is for packet buffers?
Multiple regions are required in the case of a zero-copy slave. The zero-copy
will expose the DPDK shared memory, where each memseg/memseg_list will be
represented by a memif_region.
> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_region *mr;
> + struct memif_queue *mq;
> + int i;
> +
> + for (i = 0; i < pmd->regions_num; i++) {
> + mr = pmd->regions + i;
> + if (mr != NULL) {
> + if (mr->addr == NULL) {
> + if (mr->fd < 0)
> + return -1;
> + mr->addr = mmap(NULL, mr->region_size,
> + PROT_READ | PROT_WRITE,
> + MAP_SHARED, mr->fd, 0);
> + if (mr->addr == NULL)
> + return -1;
> + }
> + }
> + }
If multiple slaves are not supported, is the 'id' devarg still valid, or is
it used at all?
'id' is used to identify the peer interface. An interface with id=0 will only
connect to an interface with id=0, and so on...
<...>
> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> + memif_interface_id_t id, uint32_t flags,
> + const char *socket_filename,
> + memif_log2_ring_size_t log2_ring_size,
> + uint16_t buffer_size, const char *secret,
> + struct ether_addr *eth_addr)
> +{
> + int ret = 0;
> + struct rte_eth_dev *eth_dev;
> + struct rte_eth_dev_data *data;
> + struct pmd_internals *pmd;
> + const unsigned int numa_node = vdev->device.numa_node;
> + const char *name = rte_vdev_device_name(vdev);
> +
> + if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> + MIF_LOG(ERR, "Zero-copy not supported.");
> + return -1;
> + }
What is the plan for zero copy?
Can you please put the status & plan in the documentation?
<...>
I don't think that putting the status in the docs is needed. The zero-copy
slave is in the testing stage, and I should be able to submit a patch once we
get this one applied.
> +
> + uint16_t last_head; /**< last ring head */
> + uint16_t last_tail; /**< last ring tail */
The ring already has head and tail variables; what is the difference in the
queue ones?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
@ 2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-03 4:27 ` Honnappa Nagarahalli
1 sibling, 0 replies; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-05-02 12:35 UTC (permalink / raw)
To: Ferruh Yigit, dev
________________________________________
From: Ferruh Yigit <ferruh.yigit@intel.com>
Sent: Monday, March 25, 2019 9:58 PM
To: Jakub Grajciar; dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
<...>
> @@ -0,0 +1,200 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also a producer of shared memory file and initializes
> +the shared memory. Master creates the socket and listens for any slave
> +connection requests. The socket may already exist on the system. Be sure to
> +remove any such sockets, if you are creating a master interface, or you will
> +see an "Address already in use" error. Function ``rte_pmd_memif_remove()``,
Can it be possible to remove this existing socket on successfully termination of
the dpdk application with 'master' role?
Otherwise each time to run a dpdk app, requires to delete the socket file first.
net_memif is the same case as net_virtio_user, I'd use the same workaround.
testpmd.c:pmd_test_exit() this, however, is only valid for testpmd.
> +which removes memif interface, will also remove a listener socket, if it is
> +not being used by any other interface.
> +
> +The method to enable one or more interfaces is to use the
> +``--vdev=net_memif0`` option on the DPDK application command line. Each
> +``--vdev=net_memif1`` option given will create an interface named net_memif0,
> +net_memif1, and so on. Memif uses unix domain socket to transmit control
> +messages. Each memif has a unique id per socket. If you are connecting multiple
> +interfaces using same socket, be sure to specify unique ids ``id=0``, ``id=1``,
> +etc. Note that if you assign a socket to a master interface it becomes a
> +listener socket. Listener socket can not be used by a slave interface on same
> +client.
When I connect two slaves, id=0 & id=1 to the master, both master and second
slave crashed as soon as second slave connected, is this a know issue? Can you
please check this?
master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master -- -i
slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
net_memif0,role=slave,id=0 -- -i
slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
net_memif0,role=slave,id=1 -- -i
<...>
Each interface can be connected to one peer interface at the same time, I'll
add more details to the documentation.
> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.
How does this play with multi-process support? When I run a secondary app while the
memif PMD is enabled in the primary, the secondary process crashes.
Can you please check and ensure at least nothing crashes when a secondary app runs?
For now I'd disable multi-process support, so we can get the patch applied, then
provide multi-process support in a separate patch.
> +
> +First create ``master`` interface::
> +
> + #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> + #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> + testpmd> set fwd rxonly
> + testpmd> start
> +
> + testpmd> set fwd txonly
> + testpmd> start
Would it be useful to add a loopback option to the PMD, for testing/debug?
Also I am getting low performance numbers above, compared to the ring PMD for
example; is there any performance target for the PMD?
With the same forwarding core for both testpmds, this is terrible, either something is
wrong or I am doing something wrong, it is ~40Kpps.
With different forwarding cores for each testpmd, it is still low, ~18Mpps.
<...>
The difference between the ring PMD and the memif PMD is that memif transfers packets,
while ring transfers whole buffers. (Also, ring cannot be used process to process.)
That means ring does not have to alloc/free buffers.
I did a simple test where I modified the code so that the tx function only frees the given
buffers and rx allocates new buffers. On my machine, this alone can only handle 45-50Mpps.
With that in mind, I believe that 23Mpps is fine performance. No performance target is
defined; the goal is to be as fast as possible.
The cause of the issue, where traffic drops to 40Kpps while using the same core for both applications,
is that within the timeslice given to the testpmd process, the memif driver fills its queues,
but keeps polling, not giving the other process a chance to receive the packets.
The same applies to the rx side, where an empty queue is polled over and over again. In such a
configuration, interrupt rx-mode should be used instead of polling, or the application could
suspend the port.
> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> + uint16_t flags; /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
Is this define used?
<...>
Yes, see eth_memif_tx() and eth_memif_rx()
> + while (n_slots && n_rx_pkts < nb_pkts) {
> + mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf_head == NULL))
> + goto no_free_bufs;
> + mbuf = mbuf_head;
> + mbuf->port = mq->in_port;
> +
> + next_slot:
> + s0 = cur_slot & mask;
> + d0 = &ring->desc[s0];
> +
> + src_len = d0->length;
> + dst_off = 0;
> + src_off = 0;
> +
> + do {
> + dst_len = mbuf_size - dst_off;
> + if (dst_len == 0) {
> + dst_off = 0;
> + dst_len = mbuf_size + RTE_PKTMBUF_HEADROOM;
> +
> + mbuf = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf == NULL))
> + goto no_free_bufs;
> + mbuf->port = mq->in_port;
> + rte_pktmbuf_chain(mbuf_head, mbuf);
> + }
> + cp_len = RTE_MIN(dst_len, src_len);
> +
> + rte_pktmbuf_pkt_len(mbuf) =
> + rte_pktmbuf_data_len(mbuf) += cp_len;
Can you please make this two statements to prevent confusion?
Also, shouldn't 'cp_len' be added to 'mbuf_head->pkt_len'?
'rte_pktmbuf_chain' updates 'mbuf_head', but at that stage 'mbuf->pkt_len' is not
yet set to 'cp_len'...
<...>
Will fix in the next patch.
> + if (r->addr == NULL)
> + return;
> + munmap(r->addr, r->region_size);
> + if (r->fd > 0) {
> + close(r->fd);
> + r->fd = -1;
> + }
> + }
> + rte_free(pmd->regions);
> +}
> +
> +static int
> +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn)
> +{
> + struct memif_region *r;
> + char shm_name[32];
> + int i;
> + int ret = 0;
> +
> + r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1), 0);
> + if (r == NULL) {
> + MIF_LOG(ERR, "%s: Failed to allocate regions.",
> + rte_vdev_device_name(pmd->vdev));
> + return -ENOMEM;
> + }
> +
> + pmd->regions = r;
> + pmd->regions_num = brn + 1;
> +
> + /*
> + * Create shm for every region. Region 0 is reserved for descriptors.
> + * Other regions contain buffers.
> + */
> + for (i = 0; i < (brn + 1); i++) {
> + r = &pmd->regions[i];
> +
> + r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings) *
> + (sizeof(memif_ring_t) +
> + sizeof(memif_desc_t) * (1 << pmd->run.log2_ring_size)) : 0;
what exactly "buffer_offset" is? For 'region 0' it calculates the size of the
size of the 'region 0', otherwise 0. This is offset starting from where? And it
seems only used for below size assignment.
It is possible to use single region for one connection (non-zero-copy) to reduce
the number of files opened. It will be implemented in the next patch.
'buffer_offset' is offset from the start of shared memory file to the first packet buffer.
> + (1 << pmd->run.log2_ring_size) *
> + (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings));
There is an illusion that packet buffers can be in multiple regions (the above comment
implies it), but this logic assumes all packet buffers are in the same region other
than region 0, which gives us 'region 1', and this is already hardcoded in a few
places. Is there a benefit to assuming there will be more regions? Would it be
simpler to accept that 'region 0' is for rings and 'region 1' is for packet buffers?
Multiple regions are required in the case of a zero-copy slave. The zero-copy slave will
expose DPDK shared memory, where each memseg/memseg_list will be
represented by a memif_region.
> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_region *mr;
> + struct memif_queue *mq;
> + int i;
> +
> + for (i = 0; i < pmd->regions_num; i++) {
> + mr = pmd->regions + i;
> + if (mr != NULL) {
> + if (mr->addr == NULL) {
> + if (mr->fd < 0)
> + return -1;
> + mr->addr = mmap(NULL, mr->region_size,
> + PROT_READ | PROT_WRITE,
> + MAP_SHARED, mr->fd, 0);
> + if (mr->addr == NULL)
> + return -1;
> + }
> + }
> + }
If multiple slaves are not supported, is the 'id' devarg still valid, or is it used at
all?
'id' is used to identify the peer interface. An interface with id=0 will only connect to an
interface with id=0, and so on...
<...>
> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> + memif_interface_id_t id, uint32_t flags,
> + const char *socket_filename,
> + memif_log2_ring_size_t log2_ring_size,
> + uint16_t buffer_size, const char *secret,
> + struct ether_addr *eth_addr)
> +{
> + int ret = 0;
> + struct rte_eth_dev *eth_dev;
> + struct rte_eth_dev_data *data;
> + struct pmd_internals *pmd;
> + const unsigned int numa_node = vdev->device.numa_node;
> + const char *name = rte_vdev_device_name(vdev);
> +
> + if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> + MIF_LOG(ERR, "Zero-copy not supported.");
> + return -1;
> + }
What is the plan for zero-copy?
Can you please put the status & plan in the documentation?
<...>
I don't think putting the status in the docs is needed. The zero-copy slave is in the testing stage
and I should be able to submit a patch once we get this one applied.
> +
> + uint16_t last_head; /**< last ring head */
> + uint16_t last_tail; /**< last ring tail */
The ring already has head and tail variables; what is the difference with the queue ones?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-02 12:35 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
@ 2019-05-03 4:27 ` Honnappa Nagarahalli
2019-05-03 4:27 ` Honnappa Nagarahalli
2019-05-06 11:00 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
1 sibling, 2 replies; 58+ messages in thread
From: Honnappa Nagarahalli @ 2019-05-03 4:27 UTC (permalink / raw)
To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco),
Ferruh Yigit, dev, Honnappa Nagarahalli
Cc: nd, nd
> On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> > Memory interface (memif), provides high performance packet transfer
> > over shared memory.
> >
> > Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
>
> <...>
>
> > @@ -0,0 +1,200 @@
> > +.. SPDX-License-Identifier: BSD-3-Clause
> > + Copyright(c) 2018-2019 Cisco Systems, Inc.
> > +
> > +======================
> > +Memif Poll Mode Driver
> > +======================
> > +Shared memory packet interface (memif) PMD allows for DPDK and any
> > +other client using memif (DPDK, VPP, libmemif) to communicate using
> > +shared memory. Memif is Linux only.
> > +
> > +The created device transmits packets in a raw format. It can be used
> > +with Ethernet mode, IP mode, or Punt/Inject. At this moment, only
> > +Ethernet mode is supported in DPDK memif implementation.
> > +
> > +Memif works in two roles: master and slave. Slave connects to master
> > +over an existing socket. It is also a producer of shared memory file
> > +and initializes the shared memory. Master creates the socket and
> > +listens for any slave connection requests. The socket may already
> > +exist on the system. Be sure to remove any such sockets, if you are
> > +creating a master interface, or you will see an "Address already in
> > +use" error. Function ``rte_pmd_memif_remove()``,
>
> Can it be possible to remove this existing socket on successfully termination
> of the dpdk application with 'master' role?
> Otherwise each time to run a dpdk app, requires to delete the socket file first.
>
>
> net_memif is the same case as net_virtio_user, I'd use the same
> workaround.
> testpmd.c:pmd_test_exit() this, however, is only valid for testpmd.
>
>
> > +which removes memif interface, will also remove a listener socket, if
> > +it is not being used by any other interface.
> > +
> > +The method to enable one or more interfaces is to use the
> > +``--vdev=net_memif0`` option on the DPDK application command line.
> > +Each ``--vdev=net_memif1`` option given will create an interface
> > +named net_memif0, net_memif1, and so on. Memif uses unix domain
> > +socket to transmit control messages. Each memif has a unique id per
> > +socket. If you are connecting multiple interfaces using same socket,
> > +be sure to specify unique ids ``id=0``, ``id=1``, etc. Note that if
> > +you assign a socket to a master interface it becomes a listener
> > +socket. Listener socket can not be used by a slave interface on same client.
>
> When I connect two slaves, id=0 & id=1 to the master, both master and
> second slave crashed as soon as second slave connected, is this a know issue?
> Can you please check this?
>
> master: ./build/app/testpmd -w0:0.0 -l20,21 --vdev net_memif0,role=master
> -- -i
> slave1: ./build/app/testpmd -w0:0.0 -l20,22 --file-prefix=slave --vdev
> net_memif0,role=slave,id=0 -- -i
> slave2: ./build/app/testpmd -w0:0.0 -l20,23 --file-prefix=slave2 --vdev
> net_memif0,role=slave,id=1 -- -i
>
> <...>
>
>
> Each interface can be connected to one peer interface at the same time, I'll
> add more details to the documentation.
>
>
> > +Example: testpmd and testpmd
> > +----------------------------
> > +In this example we run two instances of testpmd application and transmit
> packets over memif.
>
> How this play with multi process support? When I run a secondary app when
> memif PMD is enabled in primary, secondary process crashes.
> Can you please check and ensure at least nothing crashes when a secondary
> app run?
>
>
> For now I'd disable multi-process support, so we can get the patch applied,
> then
> provide multi-process support in a separate patch.
>
>
> > +
> > +First create ``master`` interface::
> > +
> > + #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1
> > + --vdev=net_memif,role=master -- -i
> > +
> > +Now create ``slave`` interface (master must be already running so the
> slave will connect)::
> > +
> > + #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2
> > + --vdev=net_memif -- -i
> > +
> > +Set forwarding mode in one of the instances to 'rx only' and the other to
> 'tx only'::
> > +
> > + testpmd> set fwd rxonly
> > + testpmd> start
> > +
> > + testpmd> set fwd txonly
> > + testpmd> start
>
> Would it be useful to add loopback option to the PMD, for testing/debug ?
>
> Also I am getting low performance numbers above, comparing the ring pmd
> for example, is there any performance target for the pmd?
> Same forwarding core for both testpmds, this is terrible, either something is
> wrong or I am doing something wrong, it is ~40Kpps Different forwarding
> cores for each testpmd, still low, !18Mpps
>
> <...>
>
>
> The difference between ring pmd and memif pmd is that while memif
> transfers packets,
> ring transfers whole buffers. (Also ring can not be used process to process)
> That means, it does not have to alloc/free buffers.
> I did a simple test where I modified the code so the tx function will only
> free given buffers
> and rx allocates new buffers. On my machine, this can only handle 45-
> 50Mpps.
>
> With that in mind, I believe that 23Mpps is fine performance. No
> performance target is
> defined, the goal is to be as fast as possible.
Use of C11 atomics has proven to provide better performance on weakly ordered architectures (at least on Arm). IMO, C11 atomics should be used to implement at least the fast-path functions. This ensures optimal performance on all supported architectures in DPDK.
>
> The cause of the issue, where traffic drops to 40Kpps while using the same
> core for both applications,
> is that within the timeslice given to testpmd process, memif driver fills its
> queues,
> but keeps polling, not giving the other process a chance to receive the
> packets.
> Same applies to rx side, where an empty queue is polled over and over
> again. In such configuration,
> interrupt rx-mode should be used instead of polling. Or the application
> could suspend
> the port.
>
>
> > +/**
> > + * Buffer descriptor.
> > + */
> > +typedef struct __rte_packed {
> > + uint16_t flags; /**< flags */
> > +#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
>
> Is this define used?
>
> <...>
>
>
> Yes, see eth_memif_tx() and eth_memif_rx()
>
>
> > + while (n_slots && n_rx_pkts < nb_pkts) {
> > + mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> > + if (unlikely(mbuf_head == NULL))
> > + goto no_free_bufs;
> > + mbuf = mbuf_head;
> > + mbuf->port = mq->in_port;
> > +
> > + next_slot:
> > + s0 = cur_slot & mask;
> > + d0 = &ring->desc[s0];
> > +
> > + src_len = d0->length;
> > + dst_off = 0;
> > + src_off = 0;
> > +
> > + do {
> > + dst_len = mbuf_size - dst_off;
> > + if (dst_len == 0) {
> > + dst_off = 0;
> > + dst_len = mbuf_size +
> > + RTE_PKTMBUF_HEADROOM;
> > +
> > + mbuf = rte_pktmbuf_alloc(mq->mempool);
> > + if (unlikely(mbuf == NULL))
> > + goto no_free_bufs;
> > + mbuf->port = mq->in_port;
> > + rte_pktmbuf_chain(mbuf_head, mbuf);
> > + }
> > + cp_len = RTE_MIN(dst_len, src_len);
> > +
> > + rte_pktmbuf_pkt_len(mbuf) =
> > + rte_pktmbuf_data_len(mbuf) += cp_len;
>
> Can you please make this two lines to prevent confusion?
>
> Also shouldn't need to add 'cp_len' to 'mbuf_head->pkt_len'?
> 'rte_pktmbuf_chain' updates 'mbuf_head' but at that stage 'mbuf->pkt_len' is
> not set to 'cp_len'...
>
> <...>
>
>
> Fix in next patch.
>
>
> > + if (r->addr == NULL)
> > + return;
> > + munmap(r->addr, r->region_size);
> > + if (r->fd > 0) {
> > + close(r->fd);
> > + r->fd = -1;
> > + }
> > + }
> > + rte_free(pmd->regions);
> > +}
> > +
> > +static int
> > +memif_alloc_regions(struct pmd_internals *pmd, uint8_t brn) {
> > + struct memif_region *r;
> > + char shm_name[32];
> > + int i;
> > + int ret = 0;
> > +
> > + r = rte_zmalloc("memif_region", sizeof(struct memif_region) * (brn + 1),
> 0);
> > + if (r == NULL) {
> > + MIF_LOG(ERR, "%s: Failed to allocate regions.",
> > + rte_vdev_device_name(pmd->vdev));
> > + return -ENOMEM;
> > + }
> > +
> > + pmd->regions = r;
> > + pmd->regions_num = brn + 1;
> > +
> > + /*
> > + * Create shm for every region. Region 0 is reserved for descriptors.
> > + * Other regions contain buffers.
> > + */
> > + for (i = 0; i < (brn + 1); i++) {
> > + r = &pmd->regions[i];
> > +
> > + r->buffer_offset = (i == 0) ? (pmd->run.num_s2m_rings +
> > + pmd->run.num_m2s_rings) *
> > + (sizeof(memif_ring_t) +
> > + sizeof(memif_desc_t) * (1 <<
> > + pmd->run.log2_ring_size)) : 0;
>
> what exactly "buffer_offset" is? For 'region 0' it calculates the size of the size
> of the 'region 0', otherwise 0. This is offset starting from where? And it seems
> only used for below size assignment.
>
>
> It is possible to use single region for one connection (non-zero-copy) to
> reduce
> the number of files opened. It will be implemented in the next patch.
> 'buffer_offset' is offset from the start of shared memory file to the first
> packet buffer.
>
>
> > + (1 << pmd->run.log2_ring_size) *
> > + (pmd->run.num_s2m_rings +
> > + pmd->run.num_m2s_rings));
>
> There is an illusion of packet buffers can be in multiple regions (above
> comment implies it) but this logic assumes all packet buffer are in same
> region other than region 0, which gives us 'region 1' and this is already
> hardcoded in a few places. Is there a benefit of assuming there will be more
> regions, will it make simple to accept 'region 0' for rings and 'region 1' is for
> packet buffer?
>
>
> Multiple regions are required in case of zero-copy slave. The zero-copy will
> expose dpdk shared memory, where each memseg/memseg_list will be
> represended by memif_region.
>
>
> > +int
> > +memif_connect(struct rte_eth_dev *dev) {
> > + struct pmd_internals *pmd = dev->data->dev_private;
> > + struct memif_region *mr;
> > + struct memif_queue *mq;
> > + int i;
> > +
> > + for (i = 0; i < pmd->regions_num; i++) {
> > + mr = pmd->regions + i;
> > + if (mr != NULL) {
> > + if (mr->addr == NULL) {
> > + if (mr->fd < 0)
> > + return -1;
> > + mr->addr = mmap(NULL, mr->region_size,
> > + PROT_READ | PROT_WRITE,
> > + MAP_SHARED, mr->fd, 0);
> > + if (mr->addr == NULL)
> > + return -1;
> > + }
> > + }
> > + }
>
> If multiple slave is not supported, is 'id' devarg still valid, or is it used at all?
>
>
> 'id' is used to identify peer interface. Interface with id=0 will only connect
> to an
> interface with id=0 and so on...
>
>
> <...>
>
> > +static int
> > +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> > + memif_interface_id_t id, uint32_t flags,
> > + const char *socket_filename,
> > + memif_log2_ring_size_t log2_ring_size,
> > + uint16_t buffer_size, const char *secret,
> > + struct ether_addr *eth_addr) {
> > + int ret = 0;
> > + struct rte_eth_dev *eth_dev;
> > + struct rte_eth_dev_data *data;
> > + struct pmd_internals *pmd;
> > + const unsigned int numa_node = vdev->device.numa_node;
> > + const char *name = rte_vdev_device_name(vdev);
> > +
> > + if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> > + MIF_LOG(ERR, "Zero-copy not supported.");
> > + return -1;
> > + }
>
> What is the plan for the zero copy?
> Can you please put the status & plan to the documentation?
>
> <...>
>
>
> I don't think that putting the status in the docs is needed. Zero-copy slave
> is in testing stage
> and I should be able to submit a patch once we get this one applied.
>
>
> > +
> > + uint16_t last_head; /**< last ring head */
> > + uint16_t last_tail; /**< last ring tail */
>
> ring already has head, tail variables, what is the difference in queue ones?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-03 4:27 ` Honnappa Nagarahalli
2019-05-03 4:27 ` Honnappa Nagarahalli
@ 2019-05-06 11:00 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-06 11:00 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-06 11:04 ` Damjan Marion (damarion)
1 sibling, 2 replies; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-05-06 11:00 UTC (permalink / raw)
To: Honnappa Nagarahalli, Ferruh Yigit, dev; +Cc: nd, Damjan Marion (damarion)
________________________________________
From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Sent: Friday, May 3, 2019 6:27 AM
To: Jakub Grajciar; Ferruh Yigit; dev@dpdk.org; Honnappa Nagarahalli
Cc: nd; nd
Subject: RE: [dpdk-dev] [RFC v5] /net: memory interface (memif)
> On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> > Memory interface (memif), provides high performance packet transfer
> > over shared memory.
> >
> > Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
>
<...>
> With that in mind, I believe that 23Mpps is fine performance. No
> performance target is
> defined, the goal is to be as fast as possible.
Use of C11 atomics has proven to provide better performance on weakly ordered architectures (at least on Arm). IMO, C11 atomics should be used to implement the fast path functions at least. This ensures optimal performance on all supported architectures in DPDK.
Atomics are not required by the memif driver.
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-06 11:00 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-06 11:00 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
@ 2019-05-06 11:04 ` Damjan Marion (damarion)
2019-05-06 11:04 ` Damjan Marion (damarion)
2019-05-07 11:29 ` Honnappa Nagarahalli
1 sibling, 2 replies; 58+ messages in thread
From: Damjan Marion (damarion) @ 2019-05-06 11:04 UTC (permalink / raw)
To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
Cc: Honnappa Nagarahalli, Ferruh Yigit, dev, nd
> On 6 May 2019, at 13:00, Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) <jgrajcia@cisco.com> wrote:
>
>
> ________________________________________
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Friday, May 3, 2019 6:27 AM
> To: Jakub Grajciar; Ferruh Yigit; dev@dpdk.org; Honnappa Nagarahalli
> Cc: nd; nd
> Subject: RE: [dpdk-dev] [RFC v5] /net: memory interface (memif)
>
>> On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
>>> Memory interface (memif), provides high performance packet transfer
>>> over shared memory.
>>>
>>> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
>>
>
> <...>
>
>> With that in mind, I believe that 23Mpps is fine performance. No
>> performance target is
>> defined, the goal is to be as fast as possible.
> Use of C11 atomics have proven to provide better performance on weakly ordered architectures (at least on Arm). IMO, C11 atomics should be used to implement the fast path functions at least. This ensures optimal performance on all supported architectures in DPDK.
>
> Atomics are not required by memif driver.
Correct, the only thing we need is a store barrier once per batch of packets,
to make sure that descriptor changes are globally visible before we bump the head pointer.
--
Damjan
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-06 11:04 ` Damjan Marion (damarion)
2019-05-06 11:04 ` Damjan Marion (damarion)
@ 2019-05-07 11:29 ` Honnappa Nagarahalli
2019-05-07 11:29 ` Honnappa Nagarahalli
2019-05-07 11:37 ` Damjan Marion (damarion)
1 sibling, 2 replies; 58+ messages in thread
From: Honnappa Nagarahalli @ 2019-05-07 11:29 UTC (permalink / raw)
To: Damjan Marion (damarion),
Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
Cc: Ferruh Yigit, dev, nd, nd
> >
> >> On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
> >>> Memory interface (memif), provides high performance packet transfer
> >>> over shared memory.
> >>>
> >>> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
> >>
> >
> > <...>
> >
> >> With that in mind, I believe that 23Mpps is fine performance. No
> >> performance target is
> >> defined, the goal is to be as fast as possible.
> > Use of C11 atomics have proven to provide better performance on weakly
> ordered architectures (at least on Arm). IMO, C11 atomics should be used to
> implement the fast path functions at least. This ensures optimal performance
> on all supported architectures in DPDK.
> >
> > Atomics are not required by memif driver.
>
> Correct, only thing we need is store barrier once per batch of packets, to
> make sure that descriptor changes are globally visible before we bump head
> pointer.
Maybe I was not clear in my comments; I meant that the use of GCC C++11 memory model aware atomic operations [1] shows better performance. So, instead of using full memory barriers you can use store-release and load-acquire semantics. A similar change was done to the svm_fifo data structure in VPP [2] (though the original algorithm used was different from the one used in this memif patch).
[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
[2] https://gerrit.fd.io/r/#/c/18223/
>
> --
> Damjan
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-07 11:29 ` Honnappa Nagarahalli
2019-05-07 11:29 ` Honnappa Nagarahalli
@ 2019-05-07 11:37 ` Damjan Marion (damarion)
2019-05-07 11:37 ` Damjan Marion (damarion)
2019-05-08 7:53 ` Honnappa Nagarahalli
1 sibling, 2 replies; 58+ messages in thread
From: Damjan Marion (damarion) @ 2019-05-07 11:37 UTC (permalink / raw)
To: Honnappa Nagarahalli
Cc: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco),
Ferruh Yigit, dev, nd
--
Damjan
On 7 May 2019, at 13:29, Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>> wrote:
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
Memory interface (memif), provides high performance packet transfer
over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com<mailto:jgrajcia@cisco.com>>
<...>
With that in mind, I believe that 23Mpps is fine performance. No
performance target is
defined, the goal is to be as fast as possible.
Use of C11 atomics have proven to provide better performance on weakly
ordered architectures (at least on Arm). IMO, C11 atomics should be used to
implement the fast path functions at least. This ensures optimal performance
on all supported architectures in DPDK.
Atomics are not required by memif driver.
Correct, only thing we need is store barrier once per batch of packets, to
make sure that descriptor changes are globally visible before we bump head
pointer.
May be I was not clear in my comments, I meant that the use of GCC C++11 memory model aware atomic operations [1] show better performance. So, instead of using full memory barriers you can use store-release and load-acquire semantics. A similar change was done to svm_fifo data structure in VPP [2] (though the original algorithm used was different from the one used in this memif patch).
[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
[2] https://gerrit.fd.io/r/#/c/18223/
Typically we have hundreds of normal memory stores into the packet ring, then a single store fence, and then finally one more store to bump the head pointer.
Sorry, I'm not getting what you are suggesting here, and how that can be faster...
* Re: [dpdk-dev] [RFC v5] /net: memory interface (memif)
2019-05-07 11:37 ` Damjan Marion (damarion)
2019-05-07 11:37 ` Damjan Marion (damarion)
@ 2019-05-08 7:53 ` Honnappa Nagarahalli
2019-05-08 7:53 ` Honnappa Nagarahalli
1 sibling, 1 reply; 58+ messages in thread
From: Honnappa Nagarahalli @ 2019-05-08 7:53 UTC (permalink / raw)
To: Damjan Marion (damarion)
Cc: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco),
Ferruh Yigit, dev, nd, nd
--
Damjan
On 7 May 2019, at 13:29, Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>> wrote:
On 3/22/2019 11:57 AM, Jakub Grajciar wrote:
Memory interface (memif), provides high performance packet transfer
over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com<mailto:jgrajcia@cisco.com>>
<...>
With that in mind, I believe that 23Mpps is fine performance. No
performance target is
defined, the goal is to be as fast as possible.
Use of C11 atomics have proven to provide better performance on weakly
ordered architectures (at least on Arm). IMO, C11 atomics should be used to
implement the fast path functions at least. This ensures optimal performance
on all supported architectures in DPDK.
Atomics are not required by memif driver.
Correct, only thing we need is store barrier once per batch of packets, to
make sure that descriptor changes are globally visible before we bump head
pointer.
May be I was not clear in my comments, I meant that the use of GCC C++11 memory model aware atomic operations [1] show better performance. So, instead of using full memory barriers you can use store-release and load-acquire semantics. A similar change was done to svm_fifo data structure in VPP [2] (though the original algorithm used was different from the one used in this memif patch).
[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
[2] https://gerrit.fd.io/r/#/c/18223/
Typically we have hundreds of normal memory stores into the packet ring, then single store fence and then finally one more store to bump head pointer.
Sorry. I’m not getting what are you suggesting here, and how that can be faster...
[Honnappa] I will get back with a patch and we can go from there
* [dpdk-dev] [RFC v6] /net: memory interface (memif)
2019-03-22 11:57 ` [dpdk-dev] [RFC v5] " Jakub Grajciar
2019-03-22 11:57 ` Jakub Grajciar
2019-03-25 20:58 ` Ferruh Yigit
@ 2019-05-09 8:30 ` Jakub Grajciar
2019-05-09 8:30 ` Jakub Grajciar
2019-05-13 10:45 ` [dpdk-dev] [RFC v7] " Jakub Grajciar
2 siblings, 2 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-09 8:30 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif) provides high performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
app/test-pmd/testpmd.c | 2 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_05.rst | 4 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 30 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1105 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1211 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 207 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
19 files changed, 3124 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as a control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required because a received message can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error); multi-process support will be implemented in a separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6fbfd29dd..9eec4c19d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2493,6 +2493,8 @@ pmd_test_exit(void)
device = rte_eth_devices[pt_id].device;
if (device && !strcmp(device->driver->name, "net_virtio_user"))
detach_port_device(pt_id);
+ else if (device && !strcmp(device->driver->name, "net_memif"))
+ detach_port_device(pt_id);
}
}
diff --git a/config/common_base b/config/common_base
index 6b96e0e80..dca119a65 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..0d0312442
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK to communicate with
+any other client using memif (DPDK, VPP, libmemif) over shared memory. Memif
+is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. It is also the producer of the shared memory file
+and initializes the shared memory. Each interface can be connected to one
+peer interface at a time; the peer interface is identified by the id
+parameter. The master creates the socket and listens for slave connection
+requests. The socket may already exist on the system. Be sure to remove any
+such sockets if you are creating a master interface, or you will see an
+"Address already in use" error. The function ``rte_pmd_memif_remove()``,
+which removes a memif interface, will also remove the listener socket if it
+is not being used by any other interface.
+
+To enable one or more interfaces, use the ``--vdev=net_memif0``,
+``--vdev=net_memif1``, ... options on the DPDK application command line.
+Each option given creates an interface named net_memif0, net_memif1, and so
+on. Memif uses a unix domain socket to transmit control messages. Each
+memif has a unique id per socket; this id is used to identify the peer
+interface. If you are connecting multiple interfaces using the same socket,
+be sure to specify unique ids ``id=0``, ``id=1``, etc. Note that if you
+assign a socket to a master interface it becomes a listener socket. A
+listener socket can not be used by a slave interface on the same
+client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "MAC address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which, if specified, must be matched by the peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+To create a memif connection, two memif interfaces are needed, each in a
+separate process: one interface in the ``master`` role and the other in the
+``slave`` role. It is not possible to connect two interfaces in a single
+process. Each interface can be connected to one interface at a time,
+identified by the matching id parameter.
+
+The memif driver uses a unix domain socket to exchange required information
+between memif interfaces. The socket file path is specified at interface
+creation; see the *Memif configuration options* table above. If a socket is
+used by a ``master`` interface, it is marked as a listener socket (in the scope
+of the current process) and listens to connection requests from other
+processes. One socket can be used by multiple interfaces. One process can have
+``slave`` and ``master`` interfaces at the same time, provided each role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket extracts the connection request and creates a
+new connected socket (control channel). It then sends the 'hello' message
+(``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. The slave
+interface adjusts its configuration accordingly and sends the 'init' message
+(``MEMIF_MSG_TYPE_INIT``), which among other things contains the interface id.
+The driver uses this id to find the matching master interface; if one is found,
+the control channel is assigned to it and an 'ack' message (``MEMIF_MSG_TYPE_ACK``)
+is sent. The slave interface sends an 'add region' message
+(``MEMIF_MSG_TYPE_ADD_REGION``) for every region allocated; the master responds
+to each with an 'ack' message. The same behavior applies to rings: the slave
+sends an 'add ring' message (``MEMIF_MSG_TYPE_ADD_RING``) for every initialized
+ring, and the master again responds to each with an 'ack' message. To finalize
+the connection, the slave interface sends the 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving it, the master maps the regions
+into its address space, initializes the rings, and responds with the 'connected'
+message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect (``MEMIF_MSG_TYPE_DISCONNECT``)
+can be sent by both master and slave at any time, due to a driver error or if the interface is being deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers. Rings
+and buffers can also be separated into multiple regions. For non-zero-copy,
+rings and buffers are stored inside a single memory region to reduce the number of open files.
+
+region n (non-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in the order of ring creation. If we
+have one ring in each direction and the ring size is 1024, then the first 1024
+buffers will belong to the S2M ring and the last 1024 to the M2S ring. In the
+case of zero-copy, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region in which the buffer is located.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+**Files**
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd and testpmd
+----------------------------
+In this example, we run two instances of the testpmd application and transmit packets over memif.
+
+First, create the ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set the forwarding mode in one instance to ``rxonly`` and in the other to ``txonly``::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP, please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command-line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the master.
+In testpmd, set the forwarding mode to ``icmpecho`` and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 5044ac7df..c77001cd5 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -221,6 +221,10 @@ New Features
Added support for Intel SST-BF feature so that the distributor core is
pinned to a high frequency core if available.
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..2ae3c37a2
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+/**
+ * M2S
+ * Contains the master interface's configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum number of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum number of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum number of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains the information required to identify the interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request the master to add a new shared memory region to the master interface.
+ * The shared memory file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm region index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request the master to add a new ring to the master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..92dd34ca7
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strcmp(pmd->secret, (char *)i->secret) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memif_disconnect(rte_eth_dev_allocated
+ (rte_vdev_device_name(pmd->vdev)));
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING,
+ "%s: Failed to unregister control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ int addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ un.sun_family = AF_UNIX;
+ memcpy(un.sun_path, sock->filename,
+ sizeof(un.sun_path) - 1);
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ close(sockfd);
+ rte_free(sock);
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sockfd >= 0)
+ close(sockfd);
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role == MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue",
+ sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename,
+ (void **)&socket) < 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, 96);
+ memset(pmd->remote_disc_string, 0, 96);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, 96);
+ memset(pmd->remote_disc_string, 0, 96);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ sun.sun_family = AF_UNIX;
+
+ memcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path) - 1);
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..2e8970712
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (at most 96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for the specified device. If the socket does not exist, create it.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ uint8_t listener; /**< if not zero socket is listener */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ memif_msg_t msg; /**< control message */
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..c50f14f08
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1211 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+/* Message header to synchronize queues via IPC */
+struct ipc_queues {
+ char port_name[RTE_DEV_NAME_MAX_LEN];
+ int rxq_count;
+ int txq_count;
+ /*
+ * The file descriptors are in the dedicated part
+ * of the Unix message to be translated by the kernel.
+ */
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr +
+ pmd->regions[d->region]->pkt_buffer_offset + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd >= 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd >= 0)
+ close(r->fd);
+ r->fd = -1;
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ /* calculate packet buffer offset */
+ mr->pkt_buffer_offset = (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *q = (struct memif_queue *)queue;
+
+ if (!q)
+ return;
+
+ rte_free(q);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+ MIF_LOG(DEBUG, "%s: Role: %s.", name, (role == MEMIF_ROLE_SLAVE) ? "slave" : "master");
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(pmd->secret));
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid directory: '%s'.",
+ (dir != NULL) ? dir : filename);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr *ether_addr = rte_zmalloc("", sizeof(struct ether_addr), 0);
+
+ if (ether_addr == NULL)
+ return -1;
+ eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internals *pmd;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ pmd = eth_dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device removed", 0);
+ memif_disconnect(eth_dev);
+
+ memif_socket_remove_device(eth_dev);
+
+ pmd->vdev = NULL;
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..e640c975f
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ uint16_t in_port; /**< port id */
+
+ struct pmd_internals *pmd; /**< device internals */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ /* ring info */
+ memif_ring_type_t type; /**< ring type */
+ memif_ring_t *ring; /**< pointer to ring */
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+
+ memif_region_index_t region; /**< shared memory region index */
+ memif_region_offset_t offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[24]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[64]; /**< remote app name */
+ char remote_if_name[64]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[96]; /**< local disconnect reason */
+ char remote_disc_string[96]; /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..4b2e62193
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
^ permalink raw reply [flat|nested] 58+ messages in thread
* [dpdk-dev] [RFC v6] /net: memory interface (memif)
2019-05-09 8:30 ` [dpdk-dev] [RFC v6] " Jakub Grajciar
@ 2019-05-09 8:30 ` Jakub Grajciar
2019-05-13 10:45 ` [dpdk-dev] [RFC v7] " Jakub Grajciar
1 sibling, 0 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-09 8:30 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif) provides high-performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
app/test-pmd/testpmd.c | 2 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_05.rst | 4 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 30 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1105 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1211 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 207 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
19 files changed, 3124 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
Requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as its control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
The patch above is required because a received message
can be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6fbfd29dd..9eec4c19d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2493,6 +2493,8 @@ pmd_test_exit(void)
device = rte_eth_devices[pt_id].device;
if (device && !strcmp(device->driver->name, "net_virtio_user"))
detach_port_device(pt_id);
+ else if (device && !strcmp(device->driver->name, "net_memif"))
+ detach_port_device(pt_id);
}
}
diff --git a/config/common_base b/config/common_base
index 6b96e0e80..dca119a65 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..0d0312442
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK to communicate over
+shared memory with any other client using memif (DPDK, VPP, libmemif). Memif
+is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket; it also produces the shared memory file and
+initializes the shared memory. Each interface can be connected to one peer
+interface at a time, identified by the id parameter. The master creates the
+socket and listens for slave connection requests. The socket may already
+exist on the system; be sure to remove any such socket before creating a
+master interface, or you will see an "Address already in use" error. The
+function ``rte_pmd_memif_remove()``, which removes a memif interface, also
+removes the listener socket if it is not being used by any other interface.
+
+Interfaces are created with the ``--vdev=net_memif0`` option on the DPDK
+application command line. Each additional ``--vdev`` option (``net_memif1``
+and so on) creates another interface. Memif uses a unix domain socket to
+transmit control messages. Each memif has a unique id per socket, which is
+used to identify the peer interface. If you are connecting multiple
+interfaces using the same socket, be sure to specify unique ids (``id=0``,
+``id=1``, etc.). Note that a socket assigned to a master interface becomes a
+listener socket. A listener socket cannot be used by a slave interface on the
+same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+In order to create a memif connection, two memif interfaces are needed, each
+in a separate process: one interface in the ``master`` role and the other in
+the ``slave`` role. It is not possible to connect two interfaces in a single
+process. Each interface can be connected to one interface at a time,
+identified by the matching id parameter.
+
+The memif driver uses a unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens to connection requests from
+other processes. One socket can be used by multiple interfaces. One process
+can have ``slave`` and ``master`` interfaces at the same time, provided each
+role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket extracts the connection request and creates
+a new connected socket (control channel). It then sends the 'hello' message
+(``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. The slave
+interface adjusts its configuration accordingly and sends the 'init' message
+(``MEMIF_MSG_TYPE_INIT``). This message, among others, contains the interface
+id. The driver uses this id to find the master interface and assigns the
+control channel to it. If such an interface is found, an 'ack' message
+(``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface sends an 'add region'
+message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region allocated. The
+master responds to each of these messages with an 'ack' message. The same
+behavior applies to rings: the slave sends an 'add ring' message
+(``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the master
+again responds to each message with an 'ack' message. To finalize the
+connection, the slave interface sends a 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message, the master maps
+the regions to its address space, initializes the rings, and responds with a
+'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave
+interfaces at any time, due to a driver error or if the interface is being
+deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers; rings
+and buffers can also be separated into multiple regions. In non-zero-copy
+mode, rings and buffers are stored inside a single memory region to reduce
+the number of opened files.
+
+region n (no-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If there
+is one ring in each direction and the ring size is 1024, the first 1024
+buffers belong to the S2M ring and the last 1024 to the M2S ring. In the
+zero-copy case, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region in which the buffer is located.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd and testpmd
+----------------------------
+
+In this example, we run two instances of the testpmd application and transmit
+packets over memif.
+
+First create ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+
+For information on how to get and run VPP, please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (it should be by default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command-line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 5044ac7df..c77001cd5 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -221,6 +221,10 @@ New Features
Added support for Intel SST-BF feature so that the distributor core is
pinned to a high frequency core if available.
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..2ae3c37a2
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all sources are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+ /**
+ * M2S
+ * Contains the master interface's configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify the interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request the master to add a new shared memory region to the master interface.
+ * The shared file's file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm region index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request the master to add a new ring to the master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..92dd34ca7
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strcmp(pmd->secret, (char *)i->secret) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, 96);
+ memif_disconnect(rte_eth_dev_allocated
+ (rte_vdev_device_name(pmd->vdev)));
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i = &e->msg.init;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING,
+ "%s: Failed to unregister control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused;
+ cr = 0;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ int addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ un.sun_family = AF_UNIX;
+ memcpy(un.sun_path, sock->filename,
+ sizeof(un.sun_path) - 1);
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt =
+ rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt),
+ 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) <
+ 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ sun.sun_family = AF_UNIX;
+
+ strlcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path));
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..2e8970712
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ uint8_t listener; /**< if not zero socket is listener */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ memif_msg_t msg; /**< control message */
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..c50f14f08
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1211 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+/* Message header to synchronize queues via IPC */
+struct ipc_queues {
+ char port_name[RTE_DEV_NAME_MAX_LEN];
+ int rxq_count;
+ int txq_count;
+ /*
+ * The file descriptors are in the dedicated part
+ * of the Unix message to be translated by the kernel.
+ */
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr +
+ pmd->regions[d->region]->pkt_buffer_offset + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (mbuf->next != NULL) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ r->fd = -1;
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ /* calculate packet buffer offset */
+ mr->pkt_buffer_offset = (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *q = (struct memif_queue *)queue;
+
+ if (!q)
+ return;
+
+ rte_free(q);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(pmd->secret));
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid socket directory for '%s'.", filename);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr *ether_addr = rte_zmalloc("", sizeof(struct ether_addr), 0);
+
+ eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internals *pmd;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ pmd = eth_dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device removed", 0);
+ memif_disconnect(eth_dev);
+
+ memif_socket_remove_device(eth_dev);
+
+ pmd->vdev = NULL;
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..e640c975f
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ uint16_t in_port; /**< port id */
+
+ struct pmd_internals *pmd; /**< device internals */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ /* ring info */
+ memif_ring_type_t type; /**< ring type */
+ memif_ring_t *ring; /**< pointer to ring */
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+
+ memif_region_index_t region; /**< shared memory region index */
+ memif_region_offset_t offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and cannot accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[24]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[64]; /**< remote app name */
+ char remote_if_name[64]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[96]; /**< local disconnect reason */
+ char remote_disc_string[96]; /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by the slave when establishing the connection.
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..4b2e62193
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* [dpdk-dev] [RFC v7] /net: memory interface (memif)
2019-05-09 8:30 ` [dpdk-dev] [RFC v6] " Jakub Grajciar
2019-05-09 8:30 ` Jakub Grajciar
@ 2019-05-13 10:45 ` Jakub Grajciar
2019-05-13 10:45 ` Jakub Grajciar
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
1 sibling, 2 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-13 10:45 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif) provides high-performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
app/test-pmd/testpmd.c | 2 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_05.rst | 4 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 30 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1103 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1202 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 207 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
19 files changed, 3113 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a Unix domain socket as its control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required because a received message can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6fbfd29dd..9eec4c19d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2493,6 +2493,8 @@ pmd_test_exit(void)
device = rte_eth_devices[pt_id].device;
if (device && !strcmp(device->driver->name, "net_virtio_user"))
detach_port_device(pt_id);
+ else if (device && !strcmp(device->driver->name, "net_memif"))
+ detach_port_device(pt_id);
}
}
diff --git a/config/common_base b/config/common_base
index 6b96e0e80..dca119a65 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..0d0312442
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK to communicate
+with any other client using memif (DPDK, VPP, libmemif) over shared memory.
+Memif is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket; it is also the producer of the shared memory files
+and initializes the shared memory. Each interface can be connected to one
+peer interface at a time, identified by the ``id`` parameter. The master
+creates the socket and listens for slave connection requests. The socket
+may already exist on the system; be sure to remove any such socket before
+creating a master interface, or you will see an "Address already in use"
+error. The function ``rte_pmd_memif_remove()``, which removes a memif
+interface, also removes the listener socket if it is not used by any other
+interface.
+
+Interfaces are created with the ``--vdev=net_memif0`` option on the DPDK
+application command line; each additional ``--vdev`` option (``net_memif1``
+and so on) creates another interface. Memif uses a Unix domain socket to
+transmit control messages. Each memif has an id that is unique per socket
+and is used to identify the peer interface. If you are connecting multiple
+interfaces over the same socket, be sure to specify unique ids (``id=0``,
+``id=1``, etc.). Note that a socket assigned to a master interface becomes
+a listener socket, and a listener socket cannot be used by a slave interface
+in the same process.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+To create a memif connection, two memif interfaces are needed, each in a
+separate process: one in the ``master`` role and the other in the ``slave``
+role. It is not possible to connect two interfaces in a single process.
+Each interface can be connected to one peer at a time, identified by a
+matching ``id`` parameter.
+
+The memif driver uses a Unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens for connection requests
+from other processes. One socket can be used by multiple interfaces. One
+process can have ``slave`` and ``master`` interfaces at the same time,
+provided each role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket.
+The process listening on this socket extracts the connection request and
+creates a new connected socket (control channel). It then sends a 'hello'
+message (``MEMIF_MSG_TYPE_HELLO``) containing its configuration boundaries.
+The slave interface adjusts its configuration accordingly and sends an
+'init' message (``MEMIF_MSG_TYPE_INIT``), which among other fields contains
+the interface id. The driver uses this id to find the master interface; if
+such an interface is found, the control channel is assigned to it and an
+'ack' message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface then
+sends an 'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every
+region allocated, and the master responds to each of these messages with an
+'ack' message. The same applies to rings: the slave sends an 'add ring'
+message (``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the
+master again responds to each message with an 'ack'. To finalize the
+connection, the slave interface sends a 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving it, the master maps the regions
+into its address space, initializes the rings, and responds with a
+'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave
+interfaces at any time, due to a driver error or because the interface is
+being deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers, which
+can also be split across multiple regions. For non-zero-copy, rings and
+buffers are stored inside a single memory region to reduce the number of
+opened files.
+
+region n (no-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If there
+is one ring in each direction and the ring size is 1024, the first 1024
+buffers belong to the S2M ring and the last 1024 to the M2S ring. In
+zero-copy mode, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region in which the buffer is located.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd and testpmd
+----------------------------
+In this example we run two instances of the testpmd application and transmit packets over memif.
+
+First create ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command-line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 5044ac7df..c77001cd5 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -221,6 +221,10 @@ New Features
Added support for Intel SST-BF feature so that the distributor core is
pinned to a high frequency core if available.
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..2ae3c37a2
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+/**
+ * M2S
+ * Contains the master interface's configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< name of the client application (the DPDK version string in this case) */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum number of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum number of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum number of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains the information required to identify the interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< name of the client application (the DPDK version string in this case) */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request the master to add a new shared memory region to the master interface.
+ * The shared memory file's descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm region index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request the master to add a new ring to the master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
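For reference, the head/tail fields of memif_ring_t above are free-running uint16_t counters: a slot is selected by masking the counter with (ring size - 1), and a descriptor's buffer lives at region base + offset. A minimal standalone sketch of that arithmetic (the helper names are illustrative, not part of this patch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers (not driver API): memif ring counters are
 * free-running uint16_t values, so a slot index is obtained by
 * masking with (size - 1); a descriptor's buffer is addressed as
 * region base + offset. */
static uint16_t ring_size(uint8_t log2_ring_size)
{
	return (uint16_t)(1 << log2_ring_size);
}

static uint16_t slot_index(uint16_t counter, uint8_t log2_ring_size)
{
	return counter & (uint16_t)(ring_size(log2_ring_size) - 1);
}

static void *desc_buffer(void *region_base, uint32_t offset)
{
	return (uint8_t *)region_base + offset;
}
```

Because the counters wrap naturally at 2^16, the number of slots in flight is simply (uint16_t)(head - tail), which is why the ring size must be a power of two no larger than 2^16.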
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..3596eb686
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1103 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strcmp(pmd->secret, (char *)i->secret) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memif_disconnect(dev);
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING,
+ "%s: Failed to unregister control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ socklen_t addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ memset(&un, 0, sizeof(un));
+ un.sun_family = AF_UNIX;
+ memcpy(un.sun_path, sock->filename,
+ sizeof(un.sun_path) - 1);
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %u already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) <
+ 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ memset(&sun, 0, sizeof(sun));
+ sun.sun_family = AF_UNIX;
+ memcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path) - 1);
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
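As a side note for reviewers, the control channel hands region and interrupt file descriptors to the peer through the standard SCM_RIGHTS ancillary-data mechanism (see memif_msg_send() and memif_msg_receive() above). A self-contained sketch of that pattern over a socketpair, with helper names of our own choosing:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Illustrative helpers (not driver API): pass one fd alongside a
 * payload using SCM_RIGHTS ancillary data, as memif_msg_send() does. */
static int send_with_fd(int sock, const void *buf, size_t len, int fd)
{
	struct msghdr mh = { 0 };
	struct iovec iov = { (void *)buf, len };
	char ctl[CMSG_SPACE(sizeof(int))];
	struct cmsghdr *cmsg;

	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	if (fd >= 0) {
		memset(ctl, 0, sizeof(ctl));
		mh.msg_control = ctl;
		mh.msg_controllen = sizeof(ctl);
		cmsg = CMSG_FIRSTHDR(&mh);
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	}
	return (int)sendmsg(sock, &mh, 0);
}

static int recv_with_fd(int sock, void *buf, size_t len, int *fd_out)
{
	struct msghdr mh = { 0 };
	struct iovec iov = { buf, len };
	char ctl[CMSG_SPACE(sizeof(int))];
	struct cmsghdr *cmsg;
	ssize_t n;

	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = ctl;
	mh.msg_controllen = sizeof(ctl);
	n = recvmsg(sock, &mh, 0);
	*fd_out = -1;
	for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg))
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS)
			memcpy(fd_out, CMSG_DATA(cmsg), sizeof(int));
	return (int)n;
}

/* Round-trip check: returns 0 when the fd arrived and is usable. */
static int fd_passing_roundtrip(void)
{
	int sp[2], pipefd[2], rfd = -1;
	char msg = 'm', got = 0, byte = 'x', seen = 0;

	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sp) < 0 || pipe(pipefd) < 0)
		return -1;
	if (send_with_fd(sp[0], &msg, 1, pipefd[0]) != 1)
		return -1;
	if (recv_with_fd(sp[1], &got, 1, &rfd) != 1 || got != 'm' || rfd < 0)
		return -1;
	/* prove the duplicated descriptor really refers to the pipe */
	if (write(pipefd[1], &byte, 1) != 1 || read(rfd, &seen, 1) != 1)
		return -1;
	close(sp[0]); close(sp[1]);
	close(pipefd[0]); close(pipefd[1]); close(rfd);
	return seen == 'x' ? 0 : -1;
}
```

The kernel duplicates the descriptor into the receiving process, which is what lets the master mmap() a region file created by the slave.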
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..2e8970712
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ uint8_t listener; /**< non-zero if the socket is a listener */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ memif_msg_t msg; /**< control message */
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..f42c39635
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr +
+ pmd->regions[d->region]->pkt_buffer_offset + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ r->fd = -1;
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot *
+ pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot *
+ pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ /* calculate packet buffer offset */
+ mr->pkt_buffer_offset = (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *q = (struct memif_queue *)queue;
+
+ if (!q)
+ return;
+
+ rte_free(q);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(char) * 24);
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid socket directory in: '%s'.", filename);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr *ether_addr = rte_zmalloc("", sizeof(struct ether_addr), 0);
+
+ if (ether_addr == NULL)
+ return -ENOMEM;
+
+ eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internals *pmd;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ pmd = eth_dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device removed", 0);
+ memif_disconnect(eth_dev);
+
+ memif_socket_remove_device(eth_dev);
+
+ pmd->vdev = NULL;
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..e640c975f
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* _GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ uint16_t in_port; /**< port id */
+
+ struct pmd_internals *pmd; /**< device internals */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ /* ring info */
+ memif_ring_type_t type; /**< ring type */
+ memif_ring_t *ring; /**< pointer to ring */
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+
+ memif_region_index_t region; /**< shared memory region index */
+ memif_region_offset_t offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and cannot accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[24]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[64]; /**< remote app name */
+ char remote_if_name[64]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[96]; /**< local disconnect reason */
+ char remote_disc_string[96]; /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param dev
+ *   memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection.
+ *
+ * @param dev
+ *   memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* _RTE_ETH_MEMIF_H_ */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..4b2e62193
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* [dpdk-dev] [RFC v7] /net: memory interface (memif)
2019-05-13 10:45 ` [dpdk-dev] [RFC v7] " Jakub Grajciar
@ 2019-05-13 10:45 ` Jakub Grajciar
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
1 sibling, 0 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-13 10:45 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif) provides high-performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
app/test-pmd/testpmd.c | 2 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_05.rst | 4 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 30 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1103 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1202 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 207 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
19 files changed, 3113 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required, because the message received can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6fbfd29dd..9eec4c19d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2493,6 +2493,8 @@ pmd_test_exit(void)
device = rte_eth_devices[pt_id].device;
if (device && !strcmp(device->driver->name, "net_virtio_user"))
detach_port_device(pt_id);
+ else if (device && !strcmp(device->driver->name, "net_memif"))
+ detach_port_device(pt_id);
}
}
diff --git a/config/common_base b/config/common_base
index 6b96e0e80..dca119a65 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..0d0312442
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK and any other
+client using memif (DPDK, VPP, libmemif) to communicate over shared memory.
+Memif is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. It is also the producer of the shared memory file
+and initializes the shared memory. Each interface can be connected to one
+peer interface at a time, identified by the id parameter. The master creates
+the socket and listens for slave connection requests. The socket may already
+exist on the system; be sure to remove any such socket before creating a
+master interface, or you will see an "Address already in use" error. The
+function ``rte_pmd_memif_remove()``, which removes a memif interface, will
+also remove the listener socket, if it is not being used by any other
+interface.
+
+Interfaces are enabled with the ``--vdev=net_memif0`` option on the DPDK
+application command line; each additional ``--vdev`` option (``net_memif1``
+and so on) creates another interface. Memif uses a unix domain socket to
+transmit control messages. Each memif has a unique id per socket, which is
+used to identify the peer interface. If you are connecting multiple
+interfaces using the same socket, be sure to specify unique ids ``id=0``,
+``id=1``, etc. Note that if you assign a socket to a master interface, it
+becomes a listener socket. A listener socket can not be used by a slave
+interface on the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+In order to create a memif connection, two memif interfaces are needed, each
+in a separate process: one in the ``master`` role and the other in the
+``slave`` role. It is not possible to connect two interfaces in a single
+process. Each interface can be connected to one interface at a time,
+identified by the matching id parameter.
+
+The memif driver uses a unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens to connection requests from
+other processes. One socket can be used by multiple interfaces. One process
+can have ``slave`` and ``master`` interfaces at the same time, provided each
+role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+A slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket extracts the connection request and creates
+a new connected socket (control channel). It then sends the 'hello' message
+(``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. The slave
+interface adjusts its configuration accordingly and sends the 'init' message
+(``MEMIF_MSG_TYPE_INIT``). Among other fields, this message contains the
+interface id. The driver uses this id to find the matching master interface
+and assigns the control channel to it. If such an interface is found, an
+'ack' message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface sends an
+'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region
+allocated, and the master responds to each of these messages with an 'ack'.
+The same behavior applies to rings: the slave sends an 'add ring' message
+(``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the master
+again responds to each message with an 'ack'. To finalize the connection, the
+slave interface sends a 'connect' message (``MEMIF_MSG_TYPE_CONNECT``). Upon
+receiving this message, the master maps the regions into its address space,
+initializes the rings and responds with a 'connected' message
+(``MEMIF_MSG_TYPE_CONNECTED``). A disconnect (``MEMIF_MSG_TYPE_DISCONNECT``)
+can be sent by both master and slave interfaces at any time, due to a driver
+error or if the interface is being deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers; rings
+and buffers can also be separated into multiple regions. In the non-zero-copy
+case, rings and buffers are stored inside a single memory region to reduce
+the number of opened files.
+
+region n (no-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If there
+is one ring in each direction and the ring size is 1024, the first 1024
+buffers belong to the S2M ring and the last 1024 to the M2S ring. In the
+zero-copy case, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region the buffer is located in.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd and testpmd
+----------------------------
+In this example we run two instances of the testpmd application and transmit packets over memif.
+
+First create ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so that the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set the forwarding mode in one of the instances to ``rxonly`` and in the other to ``txonly``::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command-line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_05.rst b/doc/guides/rel_notes/release_19_05.rst
index 5044ac7df..c77001cd5 100644
--- a/doc/guides/rel_notes/release_19_05.rst
+++ b/doc/guides/rel_notes/release_19_05.rst
@@ -221,6 +221,10 @@ New Features
Added support for Intel SST-BF feature so that the distributor core is
pinned to a high frequency core if available.
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..2ae3c37a2
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+/**
+ * M2S
+ * Contains the master interface's configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S ring */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request the master to add a new shared memory region to the master
+ * interface. The shared file's file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm regions index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add new ring to master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..3596eb686
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1103 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strcmp(pmd->secret, (char *)i->secret) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memif_disconnect(rte_eth_dev_allocated(rte_vdev_device_name(pmd->vdev)));
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * The callback is still executing (we are disconnecting
+ * in response to a received control message), so defer
+ * the cleanup until it finishes.
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING,
+ "%s: Failed to unregister control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "Control channel has no device assigned.");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ int addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd = -1;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ memset(&un, 0, sizeof(un));
+ un.sun_family = AF_UNIX;
+ strlcpy(un.sun_path, sock->filename, sizeof(un.sun_path));
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ close(sockfd);
+ rte_free(sock);
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sockfd >= 0)
+ close(sockfd);
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, sizeof(key));
+ strlcpy(key, socket_filename, sizeof(key));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) <
+ 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ memset(&sun, 0, sizeof(sun));
+ sun.sun_family = AF_UNIX;
+ strlcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path));
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..2e8970712
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for the specified device. If the socket does not exist, create it.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ uint8_t listener; /**< if not zero socket is listener */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ memif_msg_t msg; /**< control message */
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..f42c39635
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr +
+ pmd->regions[d->region]->pkt_buffer_offset + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (mbuf->next != NULL) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ r->fd = -1;
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot *
+ pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = (uint32_t)(slot *
+ pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ /* calculate packet buffer offset */
+ mr->pkt_buffer_offset = (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *q = (struct memif_queue *)queue;
+
+ if (!q)
+ return;
+
+ rte_free(q);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(pmd->secret));
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strcmp(value, "master") == 0) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strcmp(value, "slave") == 0) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strcmp(value, "yes") == 0) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strcmp(value, "no") == 0) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid directory: '%s'.", dir);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr *ether_addr = rte_zmalloc("", sizeof(struct ether_addr), 0);
+
+ if (ether_addr == NULL)
+ return -ENOMEM;
+
+ eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internals *pmd;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ pmd = eth_dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device removed", 0);
+ memif_disconnect(eth_dev);
+
+ memif_socket_remove_device(eth_dev);
+
+ pmd->vdev = NULL;
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..e640c975f
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* _GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ uint16_t in_port; /**< port id */
+
+ struct pmd_internals *pmd; /**< device internals */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ /* ring info */
+ memif_ring_type_t type; /**< ring type */
+ memif_ring_t *ring; /**< pointer to ring */
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+
+ memif_region_index_t region; /**< shared memory region index */
+ memif_region_offset_t offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[24]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[64]; /**< remote app name */
+ char remote_if_name[64]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[96]; /**< local disconnect reason */
+ char remote_disc_string[96]; /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..4b2e62193
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.05 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-13 10:45 ` [dpdk-dev] [RFC v7] " Jakub Grajciar
2019-05-13 10:45 ` Jakub Grajciar
@ 2019-05-16 11:46 ` Jakub Grajciar
2019-05-16 15:18 ` Stephen Hemminger
` (5 more replies)
1 sibling, 6 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-16 11:46 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif), provides high performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
app/test-pmd/testpmd.c | 2 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_08.rst | 5 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 30 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1103 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1194 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 207 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
19 files changed, 3106 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required, because the message received can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
v8:
- release notes: 19.05 -> 19.08
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index f0061d99f..b05f1462d 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2493,6 +2493,8 @@ pmd_test_exit(void)
device = rte_eth_devices[pt_id].device;
if (device && !strcmp(device->driver->name, "net_virtio_user"))
detach_port_device(pt_id);
+ else if (device && !strcmp(device->driver->name, "net_memif"))
+ detach_port_device(pt_id);
}
}
diff --git a/config/common_base b/config/common_base
index 6b96e0e80..dca119a65 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..0d0312442
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK to communicate with
+any other memif client (DPDK, VPP, libmemif) over shared memory. Memif is
+Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. It is also the producer of the shared memory file and
+initializes the shared memory. Each interface can be connected to one peer
+interface at a time, identified by the ``id`` parameter. The master creates
+the socket and listens for any slave connection requests. The socket may
+already exist on the system. Be sure to remove any such sockets if you are
+creating a master interface, or you will see an "Address already in use"
+error. The function ``rte_pmd_memif_remove()``, which removes a memif
+interface, will also remove the listener socket, if it is not being used by
+any other interface.
+
+One or more interfaces are enabled with the ``--vdev=net_memif0`` option on
+the DPDK application command line. Each additional ``--vdev`` option (e.g.
+``--vdev=net_memif1``) creates another interface, named net_memif0,
+net_memif1, and so on. Memif uses a Unix domain socket to transmit control
+messages. Each memif has a unique id per socket, used to identify the peer
+interface. If you are connecting multiple interfaces using the same socket,
+be sure to specify unique ids: ``id=0``, ``id=1``, etc. Note that a socket
+assigned to a master interface becomes a listener socket. A listener socket
+cannot be used by a slave interface on the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "MAC address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+In order to create a memif connection, two memif interfaces are needed, each
+in a separate process: one interface in the ``master`` role and the other in
+the ``slave`` role. It is not possible to connect two interfaces in a single
+process. Each interface can be connected to one peer interface at a time,
+identified by the matching ``id`` parameter.
+
+The memif driver uses a Unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens to connection requests from
+other processes. One socket can be used by multiple interfaces. One process
+can have ``slave`` and ``master`` interfaces at the same time, provided each
+role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket will extract the connection request and
+create a new connected socket (control channel). It then sends the 'hello'
+message (``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. The
+slave interface adjusts its configuration accordingly and sends the 'init'
+message (``MEMIF_MSG_TYPE_INIT``). This message contains, among other things,
+the interface id. The driver uses this id to find the master interface and
+assigns the control channel to that interface. If such an interface is found,
+an 'ack' message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface sends
+an 'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region
+allocated. The master responds to each of these messages with an 'ack'
+message. The same behavior applies to rings: the slave sends an 'add ring'
+message (``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the
+master again responds to each message with an 'ack' message. To finalize the
+connection, the slave interface sends the 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message, the master maps
+the regions into its address space, initializes the rings, and responds with
+the 'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect message
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave
+interfaces at any time, due to a driver error or because the interface is
+being deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers. Rings
+and buffers can also be separated into multiple regions. For non-zero-copy
+mode, rings and buffers are stored inside a single memory region to reduce
+the number of opened files.
+
+region n (non-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If we have
+one ring in each direction and the ring size is 1024, then the first 1024
+buffers will belong to the S2M ring and the last 1024 will belong to the M2S
+ring. In the case of zero-copy, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region the buffer is located in.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd and testpmd
+----------------------------
+In this example we run two instances of the testpmd application and transmit packets over memif.
+
+First create the ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create memif interface by running testpmd with these command line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index b9510f93a..42a7922b3 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,11 @@ New Features
Also, make sure to start the actual text at the margin.
=========================================================
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
+
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..2ae3c37a2
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+ /**
+ * M2S
+ * Contains master interfaces configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S ring */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request master to add new shared memory region to master interface.
+ * Shared files file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm regions index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add new ring to master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..81bb17224
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1103 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strcmp(pmd->secret, (char *)i->secret) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->ring_offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memif_disconnect(dev);
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->ring_offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING, "%s: Failed to unregister "
+ "control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ int addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ strlcpy(sock->filename, key, sizeof(sock->filename));
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ memset(&un, 0, sizeof(un));
+ un.sun_family = AF_UNIX;
+ strlcpy(un.sun_path, sock->filename, sizeof(un.sun_path));
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, sizeof(key));
+ strlcpy(key, socket_filename, sizeof(key));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename,
+ (void **)&socket) < 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, sizeof(pmd->local_disc_string));
+ memset(pmd->remote_disc_string, 0, sizeof(pmd->remote_disc_string));
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ memset(&sun, 0, sizeof(sun));
+ sun.sun_family = AF_UNIX;
+ strlcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path));
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..2e8970712
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ uint8_t listener; /**< if not zero socket is listener */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ memif_msg_t msg; /**< control message */
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..7e1950b4d
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1194 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
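The devargs above are consumed through rte_kvargs when the vdev is probed. A hypothetical testpmd invocation pairing a master and slave over one control socket might look like the following (paths, core lists, and file prefixes are illustrative, not mandated by the driver):

```shell
# master side (listener on the control socket) -- illustrative only
./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 \
    --vdev=net_memif0,role=master,id=0,socket=/tmp/memif.sock -- -i

# slave side connects to the same socket with a matching id
./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 \
    --vdev=net_memif0,role=slave,id=0,socket=/tmp/memif.sock -- -i
```

Both sides must agree on `id` and `socket`; the remaining keys (`bsize`, `rsize`, `mac`, `secret`, `zero-copy`) are negotiated or validated during the control-channel handshake.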
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
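
The RX loop above relies on free-running 16-bit head/tail counters that are only masked down to a ring index when a descriptor is accessed. A minimal standalone sketch (function names are illustrative, not the driver's API) shows why the unsigned subtraction `last_slot - cur_slot` and the power-of-two mask stay correct even after the counters wrap:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the free-running ring counters used above.
 * Head/tail only ever increase; slots are mapped into the ring with a
 * power-of-two mask, and unsigned 16-bit subtraction stays correct
 * across wraparound. */
static uint16_t ring_used_slots(uint16_t last_slot, uint16_t cur_slot)
{
	/* valid even when last_slot has wrapped past 65535 */
	return (uint16_t)(last_slot - cur_slot);
}

static uint16_t ring_slot_index(uint16_t counter, uint8_t log2_ring_size)
{
	uint16_t mask = (uint16_t)((1u << log2_ring_size) - 1);
	return counter & mask;
}
```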
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
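
Both datapaths signal the peer by writing an 8-byte counter to an eventfd, which the receiving side consumes with a `read()`. A minimal standalone Linux sketch of that mechanism (illustrative, outside any DPDK context):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Minimal model of the interrupt signaling used by the memif queues:
 * the producer writes an 8-byte value to the eventfd, and the consumer
 * reads it back, which also resets the counter. */
static uint64_t eventfd_roundtrip(void)
{
	uint64_t out = 0, in = 1;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return 0;
	if (write(fd, &in, sizeof(in)) == sizeof(in))
		(void)read(fd, &out, sizeof(out)); /* consumes and resets */
	close(fd);
	return out;
}
```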
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ rte_free(r);
+
+ return ret;
+}
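
memif_region_init_shm() packs all ring metadata at the start of the region and places the packet buffers behind it at `pkt_buffer_offset`. A standalone sketch of that arithmetic (the function names and the ring-header size parameter are illustrative; only the 16-byte descriptor size is fixed, by the build-time check in the probe function):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model of the shared-memory layout: every ring
 * (header + descriptor array) is placed first, and packet buffers
 * start at pkt_buffer_offset. ring_hdr_size and desc_size stand in
 * for sizeof(memif_ring_t) and sizeof(memif_desc_t). */
static size_t region_pkt_buffer_offset(unsigned int num_rings,
				       size_t ring_hdr_size,
				       size_t desc_size,
				       unsigned int log2_ring_size)
{
	return num_rings *
	       (ring_hdr_size + desc_size * (1u << log2_ring_size));
}

static size_t region_total_size(size_t pkt_buffer_offset,
				unsigned int num_rings,
				unsigned int log2_ring_size,
				size_t pkt_buffer_size)
{
	/* buffers for every slot of every ring follow the ring metadata */
	return pkt_buffer_offset +
	       pkt_buffer_size * (1u << log2_ring_size) * num_rings;
}
```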
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
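
The descriptor initialization above numbers buffer slots globally across all rings, S2M rings first, and derives each descriptor's buffer offset from that global slot number. A hedged standalone model of the mapping (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Global slot number for descriptor j of a ring: M2S rings are
 * numbered after all S2M rings, matching the two loops above. */
static uint32_t memif_global_slot(int is_m2s, uint8_t num_s2m_rings,
				  uint8_t ring_idx, uint16_t j,
				  uint8_t log2_ring_size)
{
	uint32_t ring = is_m2s ?
		(uint32_t)ring_idx + num_s2m_rings : ring_idx;

	return ring * (1u << log2_ring_size) + j;
}

/* Buffer offset within the region for a given global slot. */
static uint32_t memif_desc_offset(uint32_t pkt_buffer_offset,
				  uint32_t global_slot,
				  uint16_t pkt_buffer_size)
{
	return pkt_buffer_offset + global_slot * pkt_buffer_size;
}
```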
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *q = (struct memif_queue *)queue;
+
+ if (!q)
+ return;
+
+ rte_free(q);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(char) * 24);
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(AT_FDCWD, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid directory: '%s'.", dir);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr *ether_addr = rte_zmalloc("", sizeof(struct ether_addr), 0);
+
+ eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ struct pmd_internals *pmd;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ pmd = eth_dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device removed", 0);
+ memif_disconnect(eth_dev);
+
+ memif_socket_remove_device(eth_dev);
+
+ pmd->vdev = NULL;
+
+ rte_eth_dev_release_port(eth_dev);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
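
The kvargs registered above map directly onto an EAL `--vdev` string; a sketch of connecting two endpoints with testpmd over the default socket (core lists, file prefixes, and the MAC value are illustrative):

```shell
# Master side:
./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 \
    --vdev=net_memif,role=master,id=0 -- -i

# Slave side (separate process, distinct file prefix):
./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 \
    --vdev=net_memif,role=slave,id=0,mac=aa:bb:cc:dd:ee:ff -- -i
```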
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..175eef667
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ uint16_t in_port; /**< port id */
+
+ struct pmd_internals *pmd; /**< device internals */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ /* ring info */
+ memif_ring_type_t type; /**< ring type */
+ memif_ring_t *ring; /**< pointer to ring */
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+
+ memif_region_index_t region; /**< shared memory region index */
+ memif_region_offset_t ring_offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[24]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[64]; /**< remote app name */
+ char remote_if_name[64]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[96]; /**< local disconnect reason */
+ char remote_disc_string[96]; /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..8861484fb
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.08 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
@ 2019-05-16 15:18 ` Stephen Hemminger
2019-05-16 15:19 ` Stephen Hemminger
` (4 subsequent siblings)
5 siblings, 0 replies; 58+ messages in thread
From: Stephen Hemminger @ 2019-05-16 15:18 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
On Thu, 16 May 2019 13:46:58 +0200
Jakub Grajciar <jgrajcia@cisco.com> wrote:
> +struct memif_queue {
> + struct rte_mempool *mempool; /**< mempool for RX packets */
> + uint16_t in_port; /**< port id */
> +
> + struct pmd_internals *pmd; /**< device internals */
> +
> + struct rte_intr_handle intr_handle; /**< interrupt handle */
> +
> + /* ring info */
> + memif_ring_type_t type; /**< ring type */
> + memif_ring_t *ring; /**< pointer to ring */
> + memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
> +
> + memif_region_index_t region; /**< shared memory region index */
> + memif_region_offset_t ring_offset;
> + /**< ring offset from start of shm region (ring - memif_region.addr) */
> +
> + uint16_t last_head; /**< last ring head */
> + uint16_t last_tail; /**< last ring tail */
> +
> + /* rx/tx info */
> + uint64_t n_pkts; /**< number of rx/tx packets */
> + uint64_t n_bytes; /**< number of rx/tx bytes */
> + uint64_t n_err; /**< number of tx errors */
> +};
> +
The layout of this structure has a lot of holes; you might want to rearrange its elements.
struct memif_queue {
struct rte_mempool * mempool; /* 0 8 */
uint16_t in_port; /* 8 2 */
/* XXX 6 bytes hole, try to pack */
struct pmd_internals * pmd; /* 16 8 */
struct rte_intr_handle intr_handle; /* 24 26656 */
/* --- cacheline 416 boundary (26624 bytes) was 56 bytes ago --- */
memif_ring_type_t type; /* 26680 4 */
/* XXX 4 bytes hole, try to pack */
/* --- cacheline 417 boundary (26688 bytes) --- */
memif_ring_t * ring; /* 26688 8 */
memif_log2_ring_size_t log2_ring_size; /* 26696 1 */
/* XXX 1 byte hole, try to pack */
memif_region_index_t region; /* 26698 2 */
memif_region_offset_t ring_offset; /* 26700 4 */
uint16_t last_head; /* 26704 2 */
uint16_t last_tail; /* 26706 2 */
/* XXX 4 bytes hole, try to pack */
uint64_t n_pkts; /* 26712 8 */
uint64_t n_bytes; /* 26720 8 */
uint64_t n_err; /* 26728 8 */
/* size: 26736, cachelines: 418, members: 14 */
/* sum members: 26721, holes: 4, sum holes: 15 */
/* last cacheline: 48 bytes */
};
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
2019-05-16 15:18 ` Stephen Hemminger
@ 2019-05-16 15:19 ` Stephen Hemminger
2019-05-16 15:21 ` Stephen Hemminger
` (3 subsequent siblings)
5 siblings, 0 replies; 58+ messages in thread
From: Stephen Hemminger @ 2019-05-16 15:19 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
On Thu, 16 May 2019 13:46:58 +0200
Jakub Grajciar <jgrajcia@cisco.com> wrote:
> +
> +#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/tmp/memif.sock"
> +#define ETH_MEMIF_DEFAULT_RING_SIZE 10
> +#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
The current best practice on Linux is to use /run for sockets.
It is even in the latest Filesystem Hierarchy Standard.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
2019-05-16 15:18 ` Stephen Hemminger
2019-05-16 15:19 ` Stephen Hemminger
@ 2019-05-16 15:21 ` Stephen Hemminger
2019-05-20 9:22 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-16 15:25 ` Stephen Hemminger
` (2 subsequent siblings)
5 siblings, 1 reply; 58+ messages in thread
From: Stephen Hemminger @ 2019-05-16 15:21 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
On Thu, 16 May 2019 13:46:58 +0200
Jakub Grajciar <jgrajcia@cisco.com> wrote:
> +enum memif_role_t {
> + MEMIF_ROLE_MASTER,
> + MEMIF_ROLE_SLAVE,
> +};
Because master/slave terminology is potentially culturally offensive, it
is flagged by many corporate source scanning tools.
Could you use primary/secondary in memif instead?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-16 15:21 ` Stephen Hemminger
@ 2019-05-20 9:22 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
0 siblings, 0 replies; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-05-20 9:22 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, Damjan Marion (damarion)
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, May 16, 2019 5:22 PM
> To: Jakub Grajciar
> <jgrajcia@cisco.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
>
> On Thu, 16 May 2019 13:46:58 +0200
> Jakub Grajciar <jgrajcia@cisco.com> wrote:
>
> > +enum memif_role_t {
> > + MEMIF_ROLE_MASTER,
> > + MEMIF_ROLE_SLAVE,
> > +};
>
> Because master/slave terminology is potentially culturally offensive it is
> flagged by many corporate source scanning tools.
>
> Could you use primary/secondary in memif instead?
Other implementations also use master/slave terminology, so changing it would be confusing. However, we will consider using primary/secondary in the next protocol version.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
` (2 preceding siblings ...)
2019-05-16 15:21 ` Stephen Hemminger
@ 2019-05-16 15:25 ` Stephen Hemminger
2019-05-16 15:28 ` Stephen Hemminger
2019-05-20 10:18 ` [dpdk-dev] [RFC v9] " Jakub Grajciar
5 siblings, 0 replies; 58+ messages in thread
From: Stephen Hemminger @ 2019-05-16 15:25 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
On Thu, 16 May 2019 13:46:58 +0200
Jakub Grajciar <jgrajcia@cisco.com> wrote:
> + /* remote info */
> + char remote_name[64]; /**< remote app name */
> + char remote_if_name[64];
Hard-coding magic string sizes has future potential for disaster.
Could you at least add a #define?
> +typedef struct __rte_packed {
> + uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
> + memif_version_t min_version; /**< lowest supported memif version */
> + memif_version_t max_version; /**< highest supported memif version */
> + memif_region_index_t max_region; /**< maximum num of regions */
> + memif_ring_index_t max_m2s_ring; /**< maximum num of M2S ring */
> + memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
> + memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
> +} memif_msg_hello_t;
Why is name a uint8_t and not char? You end up having to cast it.
Maybe it is because it is UTF-8, or you have some subsystem where sizeof(char) != sizeof(uint8_t)?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v8] /net: memory interface (memif)
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
` (3 preceding siblings ...)
2019-05-16 15:25 ` Stephen Hemminger
@ 2019-05-16 15:28 ` Stephen Hemminger
2019-05-20 10:18 ` [dpdk-dev] [RFC v9] " Jakub Grajciar
5 siblings, 0 replies; 58+ messages in thread
From: Stephen Hemminger @ 2019-05-16 15:28 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
On Thu, 16 May 2019 13:46:58 +0200
Jakub Grajciar <jgrajcia@cisco.com> wrote:
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index f0061d99f..b05f1462d 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2493,6 +2493,8 @@ pmd_test_exit(void)
> device = rte_eth_devices[pt_id].device;
> if (device && !strcmp(device->driver->name, "net_virtio_user"))
> detach_port_device(pt_id);
> + else if (device && !strcmp(device->driver->name, "net_memif"))
> + detach_port_device(pt_id);
> }
> }
Why not do a proper implementation of dev_close instead of pushing
the problem onto the application?
The existing virtio_user hack is a bug and it should be fixed.
^ permalink raw reply [flat|nested] 58+ messages in thread
* [dpdk-dev] [RFC v9] /net: memory interface (memif)
2019-05-16 11:46 ` [dpdk-dev] [RFC v8] " Jakub Grajciar
` (4 preceding siblings ...)
2019-05-16 15:28 ` Stephen Hemminger
@ 2019-05-20 10:18 ` Jakub Grajciar
2019-05-29 17:29 ` Ferruh Yigit
2019-05-31 6:22 ` [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD Jakub Grajciar
5 siblings, 2 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-20 10:18 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif) provides high-performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_08.rst | 5 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 30 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1124 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 13 +
drivers/net/memif/rte_eth_memif.c | 1198 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 212 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
18 files changed, 3134 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as a control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required because a received message can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
v8:
- release notes: 19.05 -> 19.08
v9:
- rearrange elements in structures to remove holes
- default socket file in /run
- #define name sizes
- implement dev_close op
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/config/common_base b/config/common_base
index 6b96e0e80..dca119a65 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..0d0312442
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK and any other
+client using memif (DPDK, VPP, libmemif) to communicate using shared memory.
+Memif is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. Slave connects to master over an
+existing socket. It is also the producer of the shared memory file and
+initializes the shared memory. Each interface can be connected to one peer
+interface at a time. The peer interface is identified by the id parameter.
+Master creates the socket and listens for any slave connection requests. The
+socket may already exist on the system. Be sure to remove any such sockets,
+if you are creating a master interface, or you will see an "Address already
+in use" error. Function ``rte_pmd_memif_remove()``, which removes the memif
+interface, will also remove the listener socket, if it is not being used by
+any other interface.
+
+To enable one or more interfaces, use the ``--vdev=net_memif0`` option on the
+DPDK application command line. Each ``--vdev`` option given creates another
+interface, named net_memif0, net_memif1, and so on. Memif uses a unix domain
+socket to transmit control messages. Each memif has a unique id per socket,
+which is used to identify the peer interface. If you are connecting multiple
+interfaces using the same socket, be sure to specify unique ids ``id=0``,
+``id=1``, etc. Note that if you assign a socket to a master interface, it
+becomes a listener socket. A listener socket can not be used by a slave
+interface on the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "nrxq=2", "Number of RX queues", "1", "255"
+ "ntxq=2", "Number of TX queues", "1", "255"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+In order to create a memif connection, two memif interfaces are needed, each
+in a separate process: one interface in the ``master`` role and the other in
+the ``slave`` role. It is not possible to connect two interfaces in a single
+process. Each interface can be connected to one interface at a time,
+identified by the matching id parameter.
+
+The memif driver uses a unix domain socket to exchange required information
+between memif interfaces. The socket file path is specified at interface
+creation; see the *Memif configuration options* table above. If a socket is
+used by a ``master`` interface, it is marked as a listener socket (in the
+scope of the current process) and listens to connection requests from other
+processes. One socket can be used by multiple interfaces. One process can
+have ``slave`` and ``master`` interfaces at the same time, provided each
+role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket will extract the connection request and
+create a new connected socket (control channel). Then it sends the 'hello'
+message (``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. The
+slave interface adjusts its configuration accordingly, and sends the 'init'
+message (``MEMIF_MSG_TYPE_INIT``). This message contains, among other things,
+the interface id. The driver uses this id to find the master interface, and
+assigns the control channel to this interface. If such an interface is found,
+an 'ack' message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface sends
+an 'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region
+allocated. The master responds to each of these messages with an 'ack'
+message. The same behavior applies to rings: the slave sends an 'add ring'
+message (``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the
+master again responds to each message with an 'ack' message. To finalize the
+connection, the slave interface sends a 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message, the master maps
+the regions to its address space, initializes the rings and responds with a
+'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave
+interfaces at any time, due to a driver error or if the interface is being
+deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers. Rings
+and buffers can also be separated into multiple regions. For non-zero-copy,
+rings and buffers are stored inside a single memory region to reduce the
+number of opened files.
+
+region n (no-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in the order of ring creation. If we
+have one ring in each direction and the ring size is 1024, then the first
+1024 buffers will belong to the S2M ring and the last 1024 will belong to
+the M2S ring. In the case of zero-copy, buffers are dequeued and enqueued
+as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of memory region, the buffer is located in.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd and testpmd
+----------------------------
+In this example we run two instances of the testpmd application and transmit packets over memif.
+
+First create ``master`` interface::
+
+ #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
+
+ testpmd> set fwd rxonly
+ testpmd> start
+
+ testpmd> set fwd txonly
+ testpmd> start
+
+Show status::
+
+ testpmd> show port stats 0
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (it should be by default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forward option to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index b9510f93a..42a7922b3 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,11 @@ New Features
Also, make sure to start the actual text at the margin.
=========================================================
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
+
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..2ae3c37a2
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+ /**
+ * M2S
+ * Contains master interfaces configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request master to add a new shared memory region to the master interface.
+ * The shared file's file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm regions index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add new ring to master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..98e8ceefd
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strncmp(pmd->secret, (char *)i->secret,
+ ETH_MEMIF_SECRET_SIZE) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->ring_offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memif_disconnect(dev);
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->ring_offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING, "%s: Failed to unregister "
+ "control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_SLAVE) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_MASTER) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ socklen_t addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ &addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd = -1;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ un.sun_family = AF_UNIX;
+ memcpy(un.sun_path, sock->filename,
+ sizeof(un.sun_path) - 1);
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sockfd >= 0)
+ close(sockfd);
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (pmd->socket_filename == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) < 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ sun.sun_family = AF_UNIX;
+
+ memcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path) - 1);
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..c60614721
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (truncated to fit 96 bytes)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for the specified device. If the socket does not exist, create it.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+ uint8_t listener; /**< non-zero if socket is listener */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ memif_msg_t msg; /**< control message */
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..4dfe37d3a
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..8700e7589
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1198 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ r->fd = -1;
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static void
+memif_dev_close(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
+ memif_disconnect(dev);
+
+ memif_socket_remove_device(dev);
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *mq = (struct memif_queue *)queue;
+
+ if (!mq)
+ return;
+
+ rte_free(mq);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->q_errors[i] = mq->n_err;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_close = memif_dev_close,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(char) * ETH_MEMIF_SECRET_SIZE);
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_CLOSE_REMOVE;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid directory: '%s'.", dir);
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct ether_addr *ether_addr = rte_zmalloc("", sizeof(struct ether_addr), 0);
+
+ eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ rte_eth_dev_close(eth_dev->data->port_id);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..8cee2643c
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* _GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/run/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+#define ETH_MEMIF_DISC_STRING_SIZE 96
+#define ETH_MEMIF_SECRET_SIZE 24
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ struct pmd_internals *pmd; /**< device internals */
+
+ memif_ring_type_t type; /**< ring type */
+ memif_region_index_t region; /**< shared memory region index */
+
+ uint16_t in_port; /**< port id */
+
+ memif_region_offset_t ring_offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+
+ memif_ring_t *ring; /**< pointer to ring */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[ETH_MEMIF_SECRET_SIZE]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[RTE_DEV_NAME_MAX_LEN]; /**< remote app name */
+ char remote_if_name[RTE_DEV_NAME_MAX_LEN]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< local disconnect reason */
+ char remote_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by the slave when establishing a connection.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..8861484fb
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.08 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)
2019-05-20 10:18 ` [dpdk-dev] [RFC v9] " Jakub Grajciar
@ 2019-05-29 17:29 ` Ferruh Yigit
2019-05-30 12:38 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-05-31 6:22 ` [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD Jakub Grajciar
1 sibling, 1 reply; 58+ messages in thread
From: Ferruh Yigit @ 2019-05-29 17:29 UTC (permalink / raw)
To: Jakub Grajciar, dev
On 5/20/2019 11:18 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
Can you please update the patch title as follows in the next version:
net/memif: introduce memory interface (memif) PMD
Can you please drop the RFC tag from the next version? The patch is mature
enough to start regular versioning (without the RFC tag), and the RFC tag
prevents the DPDK community lab from processing the patch, so dropping it
will help there.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
<...>
> +
> +.. csv-table:: **Memif configuration options**
> + :header: "Option", "Description", "Default", "Valid value"
> +
> + "id=0", "Used to identify peer interface", "0", "uint32_t"
> + "role=master", "Set memif role", "slave", "master|slave"
> + "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
What happens if 'bsize < mbuf size'? I didn't see any check in the code, but
is there any assumption around this?
Or any assumption that the slave and master packet sizes should be the same?
Or any other relation?
If there is any assumption, it may be good to add checks to the code and
document it here.
> + "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
> + "nrxq=2", "Number of RX queues", "1", "255"
> + "ntxq=2", "Number of TX queues", "1", "255"
> + "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
> + "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
> + "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
> + "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
> +
<...>
> +Example: testpmd and testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.
> +
> +First create ``master`` interface::
> +
> + #./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> + #./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Set forwarding mode in one of the instances to 'rx only' and the other to 'tx only'::
> +
> + testpmd> set fwd rxonly
> + testpmd> start
> +
> + testpmd> set fwd txonly
> + testpmd> start
The following also works. The sample above left me unsure whether the shared
buffer is only for one-direction communication, but that is not the case; you
can consider adding the below as well to clarify this:
Slave:
testpmd> start
Master:
testpmd> start tx_first
And if you want, you can add a multiple-queue sample too, which gives better
performance:
master:
./testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1
--vdev=net_memif,role=master -- -i --rxq=4 --txq=4
slave:
./testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2
--vdev=net_memif,role=slave -- -i --rxq=4 --txq=4
<...>
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
> +LDLIBS += -lrte_ethdev -lrte_kvargs
> +LDLIBS += -lrte_hash
The shared build requires linking against the vdev bus library:
LDLIBS += -lrte_bus_vdev
Can you please make sure the PMD can be compiled as a shared library?
<...>
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> +
> +if host_machine.system() != 'linux'
> + build = false
> +endif
> +
> +sources = files('rte_eth_memif.c',
> + 'memif_socket.c')
> +
> +allow_experimental_apis = true
Can you please list the used experimental APIs here in a commented way, as
is done in the Makefile?
<...>
> +static void
> +memif_dev_close(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> +
> + memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
> + memif_disconnect(dev);
> +
> + memif_socket_remove_device(dev);
Per the latest agreement, .dev_close should free all internal data
structures, and since the 'RTE_ETH_DEV_CLOSE_REMOVE' flag is set, the generic
ones will be cleaned by the API.
Please check what is cleaned by the API in 'rte_eth_dev_release_port()'; the
remaining memory allocated by the PMD should be freed here (rx and tx queues,
for example).
<...>
> + /* TX stats */
> + for (i = 0; i < nq; i++) {
> + mq = dev->data->tx_queues[i];
> + stats->q_opackets[i] = mq->n_pkts;
> + stats->q_obytes[i] = mq->n_bytes;
> + stats->q_errors[i] = mq->n_err;
'q_errors' is for Rx; there is a patchset on top of the next-net head. Right
now this value should only be incorporated into the overall Tx error stats
(oerrors), not reported in this field.
<...>
> +
> + eth_dev->data->dev_flags &= RTE_ETH_DEV_CLOSE_REMOVE;
Are you sure you want to '&=' this flag? Shouldn't it be '|='?
<...>
> +/* check if directory exists and if we have permission to read/write */
> +static int
> +memif_check_socket_filename(const char *filename)
> +{
> + char *dir = NULL, *tmp;
> + uint32_t idx;
> + int ret = 0;
> +
> + tmp = strrchr(filename, '/');
> + if (tmp != NULL) {
> + idx = tmp - filename;
> + dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
> + if (dir == NULL) {
> + MIF_LOG(ERR, "Failed to allocate memory.");
> + return -1;
> + }
> + strlcpy(dir, filename, sizeof(char) * (idx + 1));
> + }
> +
> + if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
> + W_OK, AT_EACCESS) < 0)) {
> + MIF_LOG(ERR, "Invalid directory: '%s'.", dir);
Getting following build error from "gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1)":
In file included from .../dpdk/drivers/net/memif/rte_eth_memif.c:27:
In function ‘memif_check_socket_filename’,
inlined from ‘memif_set_socket_filename’ at
.../dpdk/drivers/net/memif/rte_eth_memif.c:1052:9:
.../dpdk/drivers/net/memif/rte_eth_memif.h:35:2: error: ‘%s’ directive argument
is null [-Werror=format-overflow=]
35 | rte_log(RTE_LOG_ ## level, memif_logtype, \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
36 | "%s(): " fmt "\n", __func__, ##args)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.../dpdk/drivers/net/memif/rte_eth_memif.c:1035:3: note: in expansion of macro
‘MIF_LOG’
1035 | MIF_LOG(ERR, "Invalid directory: '%s'.", dir);
| ^~~~~~~
.../dpdk/drivers/net/memif/rte_eth_memif.c: In function ‘memif_set_socket_filename’:
.../dpdk/drivers/net/memif/rte_eth_memif.c:1098:37: note: format string is
defined here
1098 | MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
<...>
> +static int
> +memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + struct ether_addr *ether_addr = (struct ether_addr *)extra_args;
Can you please rebase the next version on top of the latest head, which got
some patches adding the rte_ prefix to these structs?
<...>
> +#ifndef _RTE_ETH_MEMIF_H_
> +#define _RTE_ETH_MEMIF_H_
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif /* GNU_SOURCE */
Why was this required?
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)
2019-05-29 17:29 ` Ferruh Yigit
@ 2019-05-30 12:38 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
0 siblings, 0 replies; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-05-30 12:38 UTC (permalink / raw)
To: Ferruh Yigit, dev
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Wednesday, May 29, 2019 7:29 PM
> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
> <jgrajcia@cisco.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC v9] /net: memory interface (memif)
> > +
> > +.. csv-table:: **Memif configuration options**
> > + :header: "Option", "Description", "Default", "Valid value"
> > +
> > + "id=0", "Used to identify peer interface", "0", "uint32_t"
> > + "role=master", "Set memif role", "slave", "master|slave"
> > + "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
>
> What happens is 'bsize < mbuf size'? I didn't see any check in the code but is
> there any assumption around this?
> Or any assumption that slave and master packet should be same? Or any
> other relation?
> If there is any assumption it may be good to add checks to the code and
> document here.
There is no relation between bsize and mbuf size. The memif driver will consume as many buffers as it needs (chaining them).
> > +#ifndef _RTE_ETH_MEMIF_H_
> > +#define _RTE_ETH_MEMIF_H_
> > +
> > +#ifndef _GNU_SOURCE
> > +#define _GNU_SOURCE
> > +#endif /* GNU_SOURCE */
>
> Why this was required?
_GNU_SOURCE is required by memfd_create().
^ permalink raw reply [flat|nested] 58+ messages in thread
* [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-05-20 10:18 ` [dpdk-dev] [RFC v9] " Jakub Grajciar
2019-05-29 17:29 ` Ferruh Yigit
@ 2019-05-31 6:22 ` Jakub Grajciar
2019-05-31 7:43 ` Ye Xiaolong
` (3 more replies)
1 sibling, 4 replies; 58+ messages in thread
From: Jakub Grajciar @ 2019-05-31 6:22 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Memory interface (memif), provides high performance
packet transfer over shared memory.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_08.rst | 5 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 31 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1124 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 15 +
drivers/net/memif/rte_eth_memif.c | 1206 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 212 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
18 files changed, 3145 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required, because the message received can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
v8:
- release notes: 19.05 -> 19.08
v9:
- rearrange elements in structures to remove holes
- default socket file in /run
- #define name sizes
- implement dev_close op
v10:
- fix shared lib build
- clear memory allocated by PMD in .dev_close
- fix NULL pointer string 'error: ‘%s’ directive argument is null'
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/config/common_base b/config/common_base
index 6f19ad5d2..e406e7836 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..de2d481eb
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+Shared memory packet interface (memif) PMD allows for DPDK and any other client
+using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
+Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. It is also the producer of the shared memory file
+and initializes the shared memory. Each interface can be connected to one
+peer interface at a time. The peer interface is identified by the id
+parameter. The master creates the socket and listens for any slave connection
+requests. The socket may already exist on the system. Be sure to remove any
+such sockets, if you are creating a master interface, or you will see an
+"Address already in use" error. The function ``rte_pmd_memif_remove()``,
+which removes a memif interface, will also remove the listener socket, if it
+is not being used by any other interface.
+
+To enable one or more interfaces, use the ``--vdev=net_memif`` option on the
+DPDK application command line. Each ``--vdev`` option given creates an
+interface named net_memif0, net_memif1, and so on. Memif uses a unix domain
+socket to transmit control messages. Each memif has a unique id per socket,
+which is used to identify the peer interface. If you are connecting multiple
+interfaces using the same socket, be sure to specify unique ids ``id=0``,
+``id=1``, etc. Note that if you assign a socket to a master interface it
+becomes a listener socket. A listener socket can not be used by a slave
+interface on the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+In order to create a memif connection, two memif interfaces, each in a
+separate process, are needed. One interface is in the ``master`` role and the
+other in the ``slave`` role. It is not possible to connect two interfaces in
+a single process. Each interface can be connected to one interface at a time,
+identified by the matching id parameter.
+
+The memif driver uses a unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens to connection requests from
+other processes. One socket can be used by multiple interfaces. One process
+can have ``slave`` and ``master`` interfaces at the same time, provided each
+role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+A slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket will extract the connection request and
+create a new connected socket (control channel). It then sends the 'hello'
+message (``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. The
+slave interface adjusts its configuration accordingly and sends the 'init'
+message (``MEMIF_MSG_TYPE_INIT``). Among other fields, this message contains
+the interface id. The driver uses this id to find the master interface and
+assigns the control channel to that interface. If such an interface is found,
+an 'ack' message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface sends
+an 'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region
+allocated. The master responds to each of these messages with an 'ack'
+message. The same behavior applies to rings: the slave sends an 'add ring'
+message (``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the
+master again responds to each message with an 'ack' message. To finalize the
+connection, the slave interface sends a 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message, the master maps
+the regions into its address space, initializes the rings, and responds with
+a 'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave
+interfaces at any time, due to a driver error or if the interface is being
+deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers. Rings
+and buffers can also be separated into multiple regions. For non-zero-copy,
+rings and buffers are stored inside a single memory region to reduce the
+number of opened files.
+
+region n (no-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If we have
+one ring in each direction and the ring size is 1024, then the first 1024
+buffers will belong to the S2M ring and the last 1024 to the M2S ring. In the
+case of zero-copy, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region in which the buffer is located.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd
+----------------------------
+In this example we run two instances of testpmd application and transmit packets over memif.
+
+First create ``master`` interface::
+
+ #./build/app/testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create ``slave`` interface (master must be already running so the slave will connect)::
+
+ #./build/app/testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Start forwarding packets::
+
+ Slave:
+ testpmd> start
+
+ Master:
+ testpmd> start tx_first
+
+Show status::
+
+ testpmd> show port stats 0
+
+For more details on testpmd please refer to :doc:`../testpmd_app_ug/index`.
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (should be by default). Create memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see socket filename use show memif command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create memif interface by running testpmd with these command line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create memif slave interface and try to connect to master.
+In testpmd set forward option to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index a17e7dea5..46efd13f0 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,11 @@ New Features
Also, make sure to start the actual text at the margin.
=========================================================
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
+
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..c3119625c
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+ /**
+ * M2S
+ * Contains master interfaces configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum region index */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify the interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request master to add a new shared memory region to the master interface.
+ * The shared memory file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm region index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add a new ring to the master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..98e8ceefd
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strncmp(pmd->secret, (char *)i->secret,
+ ETH_MEMIF_SECRET_SIZE) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->ring_offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memif_disconnect(rte_eth_dev_allocated
+ (rte_vdev_device_name(pmd->vdev)));
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->ring_offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING, "%s: Failed to unregister "
+ "control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_SLAVE) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_MASTER) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused;
+ cr = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ afd = *(int *)CMSG_DATA(cmsg);
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ int addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ (socklen_t *)&addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ un.sun_family = AF_UNIX;
+ memcpy(un.sun_path, sock->filename,
+ sizeof(un.sun_path) - 1);
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (pmd->socket_filename == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) < 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ sun.sun_family = AF_UNIX;
+
+ memcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path) - 1);
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd.", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd >= 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..c60614721
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ *   memif ethernet device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif ethernet device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * ethernet device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif ethernet device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+ uint8_t listener; /**< non-zero if socket is listener */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ memif_msg_t msg; /**< control message */
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..287a30ef2
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..3b9c1ce66
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1206 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd >= 0)
+ close(r->fd);
+ rte_free(r);
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static void
+memif_dev_close(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
+ memif_disconnect(dev);
+
+ memif_socket_remove_device(dev);
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *mq = (struct memif_queue *)queue;
+
+ if (!mq)
+ return;
+
+ rte_free(mq);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_close = memif_dev_close,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct rte_ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(char) * ETH_MEMIF_SECRET_SIZE);
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_CLOSE_REMOVE;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid socket directory.");
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct rte_ether_addr *ether_addr = (struct rte_ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct rte_ether_addr *ether_addr = rte_zmalloc("",
+ sizeof(struct rte_ether_addr), 0);
+
+ if (ether_addr == NULL)
+ return -ENOMEM;
+
+ rte_eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ int i;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+ (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->rx_queues[i]);
+ for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+ (*eth_dev->dev_ops->tx_queue_release)(eth_dev->data->tx_queues[i]);
+
+ rte_free(eth_dev->process_private);
+ eth_dev->process_private = NULL;
+
+ rte_eth_dev_close(eth_dev->data->port_id);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..8cee2643c
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* _GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/run/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+#define ETH_MEMIF_DISC_STRING_SIZE 96
+#define ETH_MEMIF_SECRET_SIZE 24
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ struct pmd_internals *pmd; /**< device internals */
+
+ memif_ring_type_t type; /**< ring type */
+ memif_region_index_t region; /**< shared memory region index */
+
+ uint16_t in_port; /**< port id */
+
+ memif_region_offset_t ring_offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+
+ memif_ring_t *ring; /**< pointer to ring */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[ETH_MEMIF_SECRET_SIZE]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[RTE_DEV_NAME_MAX_LEN]; /**< remote app name */
+ char remote_if_name[RTE_DEV_NAME_MAX_LEN]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< local disconnect reason */
+ char remote_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection
+ *
+ * @param pmd
+ * device internals
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* _RTE_ETH_MEMIF_H_ */
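The memfd_create()/seal compatibility shims above can be exercised with a short sketch of how a slave might create and seal a region file before handing the fd to the master. This is illustrative only; the function name and error handling are hypothetical, not driver code, and it assumes a Linux host (as memif itself does).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MFD_ALLOW_SEALING
#define MFD_ALLOW_SEALING 0x0002U
#endif
#ifndef F_ADD_SEALS
#define F_ADD_SEALS (1024 + 9)
#endif
#ifndef F_SEAL_SHRINK
#define F_SEAL_SHRINK 0x0002
#endif

/* Sketch: create an anonymous, sealable shared-memory file sized for one
 * region and forbid shrinking, roughly what a memif slave does before
 * passing the fd to the master over the control channel. */
static int
create_region_fd(const char *name, size_t size)
{
	int fd = syscall(SYS_memfd_create, name, MFD_ALLOW_SEALING);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size) < 0 ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Sealing against shrink protects the consumer: once the master has mapped the region, the producer can no longer truncate the file underneath it.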
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..8861484fb
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.08 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-05-31 6:22 ` [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD Jakub Grajciar
@ 2019-05-31 7:43 ` Ye Xiaolong
2019-06-03 11:28 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-06-05 12:01 ` Ferruh Yigit
2019-06-03 13:37 ` Aaron Conole
` (2 subsequent siblings)
3 siblings, 2 replies; 58+ messages in thread
From: Ye Xiaolong @ 2019-05-31 7:43 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
Minor nit, this should be a V1 instead of V10 after your RFC.
On 05/31, Jakub Grajciar wrote:
>Memory interface (memif), provides high performance
>packet transfer over shared memory.
It'd be better to have more description of this new PMD in the commit log, something
like you have in memif.rst
Thanks,
Xiaolong
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-05-31 7:43 ` Ye Xiaolong
@ 2019-06-03 11:28 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-06-03 14:25 ` Ye Xiaolong
2019-06-05 12:01 ` Ferruh Yigit
1 sibling, 1 reply; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-06-03 11:28 UTC (permalink / raw)
To: Ye Xiaolong; +Cc: dev
> -----Original Message-----
> From: Ye Xiaolong <xiaolong.ye@intel.com>
> Sent: Friday, May 31, 2019 9:43 AM
> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
> <jgrajcia@cisco.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface
> (memif) PMD
>
> Minor nit, this should be a V1 instead of V10 after your RFC.
I'll keep that in mind, but what version number should I use in the next patch, since I did use V10 in this one?
Thanks,
Jakub
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-06-03 11:28 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
@ 2019-06-03 14:25 ` Ye Xiaolong
0 siblings, 0 replies; 58+ messages in thread
From: Ye Xiaolong @ 2019-06-03 14:25 UTC (permalink / raw)
To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco),
Ferruh Yigit
Cc: dev
On 06/03, Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) wrote:
>
>
>> -----Original Message-----
>> From: Ye Xiaolong <xiaolong.ye@intel.com>
>> Sent: Friday, May 31, 2019 9:43 AM
>> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
>> <jgrajcia@cisco.com>
>> Cc: dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface
>> (memif) PMD
>>
>> Minor nit, this should be a V1 instead of V10 after your RFC.
>
>I'll keep that in mind, but what version number should I use in the next patch, since I did use V10 in this one?
>
I am not sure what the DPDK convention is for this case; maybe Ferruh can suggest
on this.
Thanks,
Xiaolong
>Thanks,
>Jakub
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-05-31 7:43 ` Ye Xiaolong
2019-06-03 11:28 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
@ 2019-06-05 12:01 ` Ferruh Yigit
1 sibling, 0 replies; 58+ messages in thread
From: Ferruh Yigit @ 2019-06-05 12:01 UTC (permalink / raw)
To: Ye Xiaolong, Jakub Grajciar; +Cc: dev
On 5/31/2019 8:43 AM, Ye Xiaolong wrote:
> Minor nit, this should be a V1 instead of V10 after your RFC.
Normally Xiaolong is right: after an RFC the actual patch numbering resets. But
for this case I think it is OK as long as people can follow the order, so I am OK
to continue as v11 for the next version, up to you.
>
> On 05/31, Jakub Grajciar wrote:
>> Memory interface (memif), provides high performance
>> packet transfer over shared memory.
>
> It'd be better to have more description of this new PMD in the commit log, something
> like you have in memif.rst
+1, some more detail in commit log is better.
>
> Thanks,
> Xiaolong
>
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-05-31 6:22 ` [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD Jakub Grajciar
2019-05-31 7:43 ` Ye Xiaolong
@ 2019-06-03 13:37 ` Aaron Conole
2019-06-05 11:55 ` Ferruh Yigit
2019-06-06 8:24 ` [dpdk-dev] [PATCH v11] " Jakub Grajciar
3 siblings, 0 replies; 58+ messages in thread
From: Aaron Conole @ 2019-06-03 13:37 UTC (permalink / raw)
To: Jakub Grajciar; +Cc: dev
Jakub Grajciar <jgrajcia@cisco.com> writes:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
> ---
> MAINTAINERS | 6 +
> config/common_base | 5 +
> config/common_linux | 1 +
> doc/guides/nics/features/memif.ini | 14 +
> doc/guides/nics/index.rst | 1 +
> doc/guides/nics/memif.rst | 234 ++++
> doc/guides/rel_notes/release_19_08.rst | 5 +
> drivers/net/Makefile | 1 +
> drivers/net/memif/Makefile | 31 +
> drivers/net/memif/memif.h | 179 +++
> drivers/net/memif/memif_socket.c | 1124 +++++++++++++++++
> drivers/net/memif/memif_socket.h | 105 ++
> drivers/net/memif/meson.build | 15 +
> drivers/net/memif/rte_eth_memif.c | 1206 +++++++++++++++++++
> drivers/net/memif/rte_eth_memif.h | 212 ++++
> drivers/net/memif/rte_pmd_memif_version.map | 4 +
> drivers/net/meson.build | 1 +
> mk/rte.app.mk | 1 +
> 18 files changed, 3145 insertions(+)
> create mode 100644 doc/guides/nics/features/memif.ini
> create mode 100644 doc/guides/nics/memif.rst
> create mode 100644 drivers/net/memif/Makefile
> create mode 100644 drivers/net/memif/memif.h
> create mode 100644 drivers/net/memif/memif_socket.c
> create mode 100644 drivers/net/memif/memif_socket.h
> create mode 100644 drivers/net/memif/meson.build
> create mode 100644 drivers/net/memif/rte_eth_memif.c
> create mode 100644 drivers/net/memif/rte_eth_memif.h
> create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
>
> requires patch: http://patchwork.dpdk.org/patch/51511/
> Memif uses a unix domain socket as control channel (cc).
> rte_interrupts.h is used to handle interrupts for this cc.
> This patch is required, because the message received can
> be a reason for disconnecting.
>
> v3:
> - coding style fixes
> - documentation
> - doxygen comments
> - use strlcpy() instead of strncpy()
> - use RTE_BUILD_BUG_ON instead of _Static_assert()
> - fix meson build deps
>
> v4:
> - coding style fixes
> - doc update (desc format, messaging, features, ...)
> - pointer arithmetic fix
> - __rte_packed and __rte_aligned instead of __attribute__()
> - fixed multi-queue
> - support additional archs (memfd_create syscall)
>
> v5:
> - coding style fixes
> - add release notes
> - remove unused dependencies
> - doc update
>
> v6:
> - remove socket on termination of testpmd application
> - disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
> - fix chained buffer packet length
> - use single region for rings and buffers (non-zero-copy)
> - doc update
>
> v7:
> - coding style fixes
>
> v8:
> - release notes: 19.05 -> 19.08
>
> v9:
> - rearrange elements in structures to remove holes
> - default socket file in /run
> - #define name sizes
> - implement dev_close op
>
> v10:
> - fix shared lib build
> - clear memory allocated by PMD in .dev_close
> - fix NULL pointer string 'error: ‘%s’ directive argument is null'
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 15d0829c5..3698c146b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -841,6 +841,12 @@ F: drivers/net/softnic/
> F: doc/guides/nics/features/softnic.ini
> F: doc/guides/nics/softnic.rst
>
> +Memif PMD
> +M: Jakub Grajciar <jgrajcia@cisco.com>
> +F: drivers/net/memif/
> +F: doc/guides/nics/memif.rst
> +F: doc/guides/nics/features/memif.ini
> +
>
> Crypto Drivers
> --------------
> diff --git a/config/common_base b/config/common_base
> index 6f19ad5d2..e406e7836 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
> #
> CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
>
> +#
> +# Compile Memory Interface PMD driver (Linux only)
> +#
> +CONFIG_RTE_LIBRTE_PMD_MEMIF=n
> +
> #
> # Compile link bonding PMD library
> #
> diff --git a/config/common_linux b/config/common_linux
> index 75334273d..87514fe4f 100644
> --- a/config/common_linux
> +++ b/config/common_linux
> @@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
> CONFIG_RTE_LIBRTE_PMD_VHOST=y
> CONFIG_RTE_LIBRTE_IFC_PMD=y
> CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
> +CONFIG_RTE_LIBRTE_PMD_MEMIF=y
> CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
> CONFIG_RTE_LIBRTE_PMD_TAP=y
> CONFIG_RTE_LIBRTE_AVP_PMD=y
> diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
> new file mode 100644
> index 000000000..807d9ecdc
> --- /dev/null
> +++ b/doc/guides/nics/features/memif.ini
> @@ -0,0 +1,14 @@
> +;
> +; Supported features of the 'memif' network poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Link status = Y
> +Basic stats = Y
> +Jumbo frame = Y
> +ARMv8 = Y
> +Power8 = Y
> +x86-32 = Y
> +x86-64 = Y
> +Usage doc = Y
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 2221c35f2..691e7209f 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -36,6 +36,7 @@ Network Interface Controller Drivers
> intel_vf
> kni
> liquidio
> + memif
> mlx4
> mlx5
> mvneta
> diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
> new file mode 100644
> index 000000000..de2d481eb
> --- /dev/null
> +++ b/doc/guides/nics/memif.rst
> @@ -0,0 +1,234 @@
> +.. SPDX-License-Identifier: BSD-3-Clause
> + Copyright(c) 2018-2019 Cisco Systems, Inc.
> +
> +======================
> +Memif Poll Mode Driver
> +======================
> +
> +Shared memory packet interface (memif) PMD allows for DPDK and any other client
> +using memif (DPDK, VPP, libmemif) to communicate using shared memory. Memif is
> +Linux only.
> +
> +The created device transmits packets in a raw format. It can be used with
> +Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
> +supported in DPDK memif implementation.
> +
> +Memif works in two roles: master and slave. Slave connects to master over an
> +existing socket. It is also the producer of the shared memory file and initializes
> +the shared memory. Each interface can be connected to one peer interface
> +at a time. The peer interface is identified by the id parameter. Master
> +creates the socket and listens for any slave connection requests. The socket
> +may already exist on the system. Be sure to remove any such sockets, if you
> +are creating a master interface, or you will see an "Address already in use"
> +error. Function ``rte_pmd_memif_remove()``, which removes memif interface,
> +will also remove a listener socket, if it is not being used by any other
> +interface.
> +
> +To enable one or more interfaces, use the ``--vdev=net_memif0`` option on the
> +DPDK application command line. Each additional ``--vdev=net_memif1`` option
> +creates another interface, named net_memif0, net_memif1, and so on. Memif uses
> +a unix domain socket to transmit control messages. Each memif has a unique id
> +per socket; this id is used to identify the peer interface. If you are
> +connecting multiple interfaces using the same socket, be sure to specify unique
> +ids ``id=0``, ``id=1``, etc. Note that if you assign a socket to a master
> +interface it becomes a listener socket. A listener socket cannot be used by a
> +slave interface in the same client.
> +
> +.. csv-table:: **Memif configuration options**
> + :header: "Option", "Description", "Default", "Valid value"
> +
> + "id=0", "Used to identify peer interface", "0", "uint32_t"
> + "role=master", "Set memif role", "slave", "master|slave"
> + "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
> + "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
> + "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
> + "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
> + "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
> + "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
> +
> +**Connection establishment**
> +
> +In order to create a memif connection, two memif interfaces are needed, each in
> +a separate process: one in the ``master`` role and the other in the ``slave``
> +role. It is not possible to connect two interfaces in a single process. Each
> +interface can be connected to one interface at a time, identified by a matching
> +id parameter.
> +
> +The memif driver uses a unix domain socket to exchange the required information
> +between memif interfaces. The socket file path is specified at interface
> +creation; see the *Memif configuration options* table above. If a socket is
> +used by a ``master`` interface, it is marked as a listener socket (in the scope
> +of the current process) and listens for connection requests from other
> +processes. One socket can be used by multiple interfaces. One process can have
> +``slave`` and ``master`` interfaces at the same time, provided each role is
> +assigned a unique socket.
> +
> +For detailed information on memif control messages, see: net/memif/memif.h.
> +
> +Slave interface attempts to make a connection on assigned socket. Process
> +listening on this socket will extract the connection request and create a new
> +connected socket (control channel). Then it sends the 'hello' message
> +(``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. Slave interface
> +adjusts its configuration accordingly, and sends 'init' message
> +(``MEMIF_MSG_TYPE_INIT``). This message among others contains interface id. Driver
> +uses this id to find master interface, and assigns the control channel to this
> +interface. If such interface is found, 'ack' message (``MEMIF_MSG_TYPE_ACK``) is
> +sent. Slave interface sends 'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for
> +every region allocated. Master responds to each of these messages with 'ack'
> +message. Same behavior applies to rings. Slave sends 'add ring' message
> +(``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring. Master again responds to
> +each message with 'ack' message. To finalize the connection, slave interface
> +sends 'connect' message (``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message
> +master maps regions to its address space, initializes rings and responds with
> +'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). Disconnect
> +(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave interfaces at
> +any time, due to driver error or if the interface is being deleted.
> +
> +Files
> +
> +- net/memif/memif.h *- control messages definitions*
> +- net/memif/memif_socket.h
> +- net/memif/memif_socket.c
> +
> +Shared memory
> +~~~~~~~~~~~~~
> +
> +**Shared memory format**
> +
> +The slave is the producer and the master the consumer. Memory regions are
> +mapped shared memory files, created by the memif slave and provided to the
> +master at connection establishment. Regions contain rings and buffers; rings
> +and buffers can also be separated into multiple regions. For non-zero-copy,
> +rings and buffers are stored inside a single memory region to reduce the number
> +of opened files.
> +
> +region n (no-zero-copy):
> +
> ++-----------------------+-------------------------------------------------------------------------+
> +| Rings | Buffers |
> ++-----------+-----------+-----------------+---+---------------------------------------------------+
> +| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
> ++-----------+-----------+-----------------+---+---------------------------------------------------+
> +
> +S2M OR M2S Rings:
> +
> ++--------+--------+-----------------------+
> +| ring 0 | ring 1 | ring num_s2m_rings - 1|
> ++--------+--------+-----------------------+
> +
> +ring 0:
> +
> ++-------------+---------------------------------------+
> +| ring header | (1 << pmd->run.log2_ring_size) * desc |
> ++-------------+---------------------------------------+
> +
> +Descriptors are assigned packet buffers in order of rings creation. If we have one ring
> +in each direction and ring size is 1024, then first 1024 buffers will belong to S2M ring and
> +last 1024 will belong to M2S ring. In case of zero-copy, buffers are dequeued and
> +enqueued as needed.
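The buffer-count arithmetic above can be sketched as a small helper (hypothetical name, not part of the driver): with one ring in each direction and log2 ring size 10, a region backs 2048 buffers, the first 1024 belonging to the S2M ring and the last 1024 to the M2S ring.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: number of packet buffers backing a non-zero-copy region,
 * per the layout above: (1 << log2_ring_size) buffers per ring,
 * summed over all S2M and M2S rings. */
static inline uint32_t
region_pkt_buffers(uint8_t log2_ring_size, uint8_t num_s2m, uint8_t num_m2s)
{
	return (uint32_t)(1u << log2_ring_size) * (uint32_t)(num_s2m + num_m2s);
}
```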
> +
> +**Descriptor format**
> +
> ++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
> +| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
> ++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +|0 |length |region |flags |
> ++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
> +|1 |metadata |offset |
> ++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
> +| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
> ++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> +
> +**Flags field - flags (Quad Word 0, bits 0:15)**
> +
> ++-----+--------------------+------------------------------------------------------------------------------------------------+
> +|Bits |Name |Functionality |
> ++=====+====================+================================================================================================+
> +|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
> ++-----+--------------------+------------------------------------------------------------------------------------------------+
> +
> +**Region index - region (Quad Word 0, 16:31)**
> +
> +Index of memory region, the buffer is located in.
> +
> +**Data length - length (Quad Word 0, 32:63)**
> +
> +Length of transmitted/received data.
> +
> +**Data Offset - offset (Quad Word 1, 0:31)**
> +
> +Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
> +
> +**Metadata - metadata (Quad Word 1, 32:63)**
> +
> +Buffer metadata.
> +
> +Files
> +
> +- net/memif/memif.h *- descriptor and ring definitions*
> +- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
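Read together, the table and the field descriptions above imply a plain 16-byte C layout. The sketch below is illustrative only; the authoritative definition is `memif_desc_t` in net/memif/memif.h, whose size the driver verifies with `RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 16-byte descriptor implied by the table above; field
 * names mirror the documentation, not the driver's memif_desc_t. */
typedef struct {
	uint16_t flags;    /* QW0 bits 0:15,  MEMIF_DESC_FLAG_NEXT = (1 << 0) */
	uint16_t region;   /* QW0 bits 16:31, index of region holding the buffer */
	uint32_t length;   /* QW0 bits 32:63, bytes of data in the buffer */
	uint32_t offset;   /* QW1 bits 0:31,  buffer start = region addr + offset */
	uint32_t metadata; /* QW1 bits 32:63, buffer metadata */
} desc_sketch_t;
```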
> +
> +Example: testpmd
> +----------------------------
> +In this example we run two instances of testpmd application and transmit packets over memif.
> +
> +First create ``master`` interface::
> +
> + #./build/app/testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
> +
> +Now create ``slave`` interface (master must be already running so the slave will connect)::
> +
> + #./build/app/testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
> +
> +Start forwarding packets::
> +
> + Slave:
> + testpmd> start
> +
> + Master:
> + testpmd> start tx_first
> +
> +Show status::
> +
> + testpmd> show port stats 0
> +
> +For more details on testpmd please refer to :doc:`../testpmd_app_ug/index`.
> +
> +Example: testpmd and VPP
> +------------------------
> +For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
> +
> +Start VPP in interactive mode (should be by default). Create memif master interface in VPP::
> +
> + vpp# create interface memif id 0 master no-zero-copy
> + vpp# set interface state memif0/0 up
> + vpp# set interface ip address memif0/0 192.168.1.1/24
> +
> +To see the socket filename, use the show memif command::
> +
> + vpp# show memif
> + sockets
> + id listener filename
> + 0 yes (1) /run/vpp/memif.sock
> + ...
> +
> +Now create memif interface by running testpmd with these command line options::
> +
> + #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
> +
> +Testpmd should now create a memif slave interface and try to connect to the
> +master. In testpmd, set the forward option to icmpecho and start forwarding::
> +
> + testpmd> set fwd icmpecho
> + testpmd> start
> +
> +Send ping from VPP::
> +
> + vpp# ping 192.168.1.2
> + 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
> + 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
> + 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
> + 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index a17e7dea5..46efd13f0 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -54,6 +54,11 @@ New Features
> Also, make sure to start the actual text at the margin.
> =========================================================
>
> +* **Added memif PMD.**
> +
> + Added the new Shared Memory Packet Interface (``memif``) PMD.
> + See the :doc:`../nics/memif` guide for more details on this new driver.
> +
>
> Removed Items
> -------------
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 3a72cf38c..78cb10fc6 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
> DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
> DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
> DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
> DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
> DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
> DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
> diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
> new file mode 100644
> index 000000000..c3119625c
> --- /dev/null
> +++ b/drivers/net/memif/Makefile
> @@ -0,0 +1,31 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +#
> +# library name
> +#
> +LIB = librte_pmd_memif.a
> +
> +EXPORT_MAP := rte_pmd_memif_version.map
> +
> +LIBABIVER := 1
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +# Experimental APIs:
> +# - rte_intr_callback_unregister_pending
> +LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
> +LDLIBS += -lrte_ethdev -lrte_kvargs
> +LDLIBS += -lrte_hash
> +LDLIBS += -lrte_bus_vdev
> +
> +#
> +# all sources are stored in SRCS-y
> +#
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
> new file mode 100644
> index 000000000..3948b1f50
> --- /dev/null
> +++ b/drivers/net/memif/memif.h
> @@ -0,0 +1,179 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> + */
> +
> +#ifndef _MEMIF_H_
> +#define _MEMIF_H_
> +
> +#define MEMIF_COOKIE 0x3E31F20
> +#define MEMIF_VERSION_MAJOR 2
> +#define MEMIF_VERSION_MINOR 0
> +#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
> +#define MEMIF_NAME_SZ 32
> +
> +/*
> + * S2M: direction slave -> master
> + * M2S: direction master -> slave
> + */
> +
> +/*
> + * Type definitions
> + */
> +
> +typedef enum memif_msg_type {
> + MEMIF_MSG_TYPE_NONE,
> + MEMIF_MSG_TYPE_ACK,
> + MEMIF_MSG_TYPE_HELLO,
> + MEMIF_MSG_TYPE_INIT,
> + MEMIF_MSG_TYPE_ADD_REGION,
> + MEMIF_MSG_TYPE_ADD_RING,
> + MEMIF_MSG_TYPE_CONNECT,
> + MEMIF_MSG_TYPE_CONNECTED,
> + MEMIF_MSG_TYPE_DISCONNECT,
> +} memif_msg_type_t;
> +
> +typedef enum {
> + MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
> + MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
> +} memif_ring_type_t;
> +
> +typedef enum {
> + MEMIF_INTERFACE_MODE_ETHERNET,
> + MEMIF_INTERFACE_MODE_IP,
> + MEMIF_INTERFACE_MODE_PUNT_INJECT,
> +} memif_interface_mode_t;
> +
> +typedef uint16_t memif_region_index_t;
> +typedef uint32_t memif_region_offset_t;
> +typedef uint64_t memif_region_size_t;
> +typedef uint16_t memif_ring_index_t;
> +typedef uint32_t memif_interface_id_t;
> +typedef uint16_t memif_version_t;
> +typedef uint8_t memif_log2_ring_size_t;
> +
> +/*
> + * Socket messages
> + */
> +
> + /**
> + * M2S
> + * Contains the master interface configuration.
> + */
> +typedef struct __rte_packed {
> + uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
> + memif_version_t min_version; /**< lowest supported memif version */
> + memif_version_t max_version; /**< highest supported memif version */
> + memif_region_index_t max_region; /**< maximum number of regions */
> + memif_ring_index_t max_m2s_ring; /**< maximum number of M2S rings */
> + memif_ring_index_t max_s2m_ring; /**< maximum number of S2M rings */
> + memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
> +} memif_msg_hello_t;
> +
> +/**
> + * S2M
> + * Contains information required to identify the interface
> + * to which the slave wants to connect.
> + */
> +typedef struct __rte_packed {
> + memif_version_t version; /**< memif version */
> + memif_interface_id_t id; /**< interface id */
> + memif_interface_mode_t mode:8; /**< interface mode */
> + uint8_t secret[24]; /**< optional security parameter */
> + uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
> +} memif_msg_init_t;
> +
> +/**
> + * S2M
> + * Request master to add a new shared memory region to the master interface.
> + * The shared memory file descriptor is passed in cmsghdr.
> + */
> +typedef struct __rte_packed {
> + memif_region_index_t index; /**< shm region index */
> + memif_region_size_t size; /**< shm region size */
> +} memif_msg_add_region_t;
> +
> +/**
> + * S2M
> + * Request master to add new ring to master interface.
> + */
> +typedef struct __rte_packed {
> + uint16_t flags; /**< flags */
> +#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
> + memif_ring_index_t index; /**< ring index */
> + memif_region_index_t region; /**< region index on which this ring is located */
> + memif_region_offset_t offset; /**< buffer start offset */
> + memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
> + uint16_t private_hdr_size; /**< used for private metadata */
> +} memif_msg_add_ring_t;
> +
> +/**
> + * S2M
> + * Finalize connection establishment.
> + */
> +typedef struct __rte_packed {
> + uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
> +} memif_msg_connect_t;
> +
> +/**
> + * M2S
> + * Finalize connection establishment.
> + */
> +typedef struct __rte_packed {
> + uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
> +} memif_msg_connected_t;
> +
> +/**
> + * S2M & M2S
> + * Disconnect interfaces.
> + */
> +typedef struct __rte_packed {
> + uint32_t code; /**< error code */
> + uint8_t string[96]; /**< disconnect reason */
> +} memif_msg_disconnect_t;
> +
> +typedef struct __rte_packed __rte_aligned(128)
> +{
> + memif_msg_type_t type:16;
> + union {
> + memif_msg_hello_t hello;
> + memif_msg_init_t init;
> + memif_msg_add_region_t add_region;
> + memif_msg_add_ring_t add_ring;
> + memif_msg_connect_t connect;
> + memif_msg_connected_t connected;
> + memif_msg_disconnect_t disconnect;
> + };
> +} memif_msg_t;
> +
> +/*
> + * Ring and Descriptor Layout
> + */
> +
> +/**
> + * Buffer descriptor.
> + */
> +typedef struct __rte_packed {
> + uint16_t flags; /**< flags */
> +#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
> + memif_region_index_t region; /**< region index on which the buffer is located */
> + uint32_t length; /**< buffer length */
> + memif_region_offset_t offset; /**< buffer offset */
> + uint32_t metadata;
> +} memif_desc_t;
> +
> +#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
> + uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
> +
> +typedef struct {
> + MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
> + uint32_t cookie; /**< MEMIF_COOKIE */
> + uint16_t flags; /**< flags */
> +#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
> + volatile uint16_t head; /**< pointer to ring buffer head */
> + MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
> + volatile uint16_t tail; /**< pointer to ring buffer tail */
> + MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
> + memif_desc_t desc[0]; /**< buffer descriptors */
> +} memif_ring_t;
> +
> +#endif /* _MEMIF_H_ */
> diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
> new file mode 100644
> index 000000000..98e8ceefd
> --- /dev/null
> +++ b/drivers/net/memif/memif_socket.c
> @@ -0,0 +1,1124 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> + */
> +
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <sys/ioctl.h>
> +#include <errno.h>
> +
> +#include <rte_version.h>
> +#include <rte_mbuf.h>
> +#include <rte_ether.h>
> +#include <rte_ethdev_driver.h>
> +#include <rte_ethdev_vdev.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +#include <rte_bus_vdev.h>
> +#include <rte_hash.h>
> +#include <rte_jhash.h>
> +#include <rte_string_fns.h>
> +
> +#include "rte_eth_memif.h"
> +#include "memif_socket.h"
> +
> +static void memif_intr_handler(void *arg);
> +
> +static ssize_t
> +memif_msg_send(int fd, memif_msg_t *msg, int afd)
> +{
> + struct msghdr mh = { 0 };
> + struct iovec iov[1];
> + struct cmsghdr *cmsg;
> + char ctl[CMSG_SPACE(sizeof(int))];
> +
> + iov[0].iov_base = msg;
> + iov[0].iov_len = sizeof(memif_msg_t);
> + mh.msg_iov = iov;
> + mh.msg_iovlen = 1;
> +
> + if (afd > 0) {
> + memset(&ctl, 0, sizeof(ctl));
> + mh.msg_control = ctl;
> + mh.msg_controllen = sizeof(ctl);
> + cmsg = CMSG_FIRSTHDR(&mh);
> + cmsg->cmsg_len = CMSG_LEN(sizeof(int));
> + cmsg->cmsg_level = SOL_SOCKET;
> + cmsg->cmsg_type = SCM_RIGHTS;
> + rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
> + }
> +
> + return sendmsg(fd, &mh, 0);
> +}
> +
> +static int
> +memif_msg_send_from_queue(struct memif_control_channel *cc)
> +{
> + ssize_t size;
> + int ret = 0;
> + struct memif_msg_queue_elt *e;
> +
> + e = TAILQ_FIRST(&cc->msg_queue);
> + if (e == NULL)
> + return 0;
> +
> + size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
> + if (size != sizeof(memif_msg_t)) {
> + MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
> + ret = -1;
> + } else {
> + MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
> + }
> + TAILQ_REMOVE(&cc->msg_queue, e, next);
> + rte_free(e);
> +
> + return ret;
> +}
> +
> +static struct memif_msg_queue_elt *
> +memif_msg_enq(struct memif_control_channel *cc)
> +{
> + struct memif_msg_queue_elt *e;
> +
> + e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
> + if (e == NULL) {
> + MIF_LOG(ERR, "Failed to allocate control message.");
> + return NULL;
> + }
> +
> + e->fd = -1;
> + TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
> +
> + return e;
> +}
> +
> +void
> +memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
> + int err_code)
> +{
> + struct memif_msg_queue_elt *e;
> + struct pmd_internals *pmd;
> + memif_msg_disconnect_t *d;
> +
> + if (cc == NULL) {
> + MIF_LOG(DEBUG, "Missing control channel.");
> + return;
> + }
> +
> + e = memif_msg_enq(cc);
> + if (e == NULL) {
> + MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
> + return;
> + }
> +
> + d = &e->msg.disconnect;
> +
> + e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
> + d->code = err_code;
> +
> + if (reason != NULL) {
> + strlcpy((char *)d->string, reason, sizeof(d->string));
> + if (cc->dev != NULL) {
> + pmd = cc->dev->data->dev_private;
> + strlcpy(pmd->local_disc_string, reason,
> + sizeof(pmd->local_disc_string));
> + }
> + }
> +}
> +
> +static int
> +memif_msg_enq_hello(struct memif_control_channel *cc)
> +{
> + struct memif_msg_queue_elt *e = memif_msg_enq(cc);
> + memif_msg_hello_t *h;
> +
> + if (e == NULL)
> + return -1;
> +
> + h = &e->msg.hello;
> +
> + e->msg.type = MEMIF_MSG_TYPE_HELLO;
> + h->min_version = MEMIF_VERSION;
> + h->max_version = MEMIF_VERSION;
> + h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
> + h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
> + h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
> + h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
> +
> + strlcpy((char *)h->name, rte_version(), sizeof(h->name));
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_msg_hello_t *h = &msg->hello;
> +
> + if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
> + memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
> + return -1;
> + }
> +
> + /* Set parameters for active connection */
> + pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
> + pmd->cfg.num_s2m_rings);
> + pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
> + pmd->cfg.num_m2s_rings);
> + pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
> + pmd->cfg.log2_ring_size);
> + pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
> +
> + strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
> +
> + MIF_LOG(DEBUG, "%s: Connecting to %s.",
> + rte_vdev_device_name(pmd->vdev), pmd->remote_name);
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
> +{
> + memif_msg_init_t *i = &msg->init;
> + struct memif_socket_dev_list_elt *elt;
> + struct pmd_internals *pmd;
> + struct rte_eth_dev *dev;
> +
> + if (i->version != MEMIF_VERSION) {
> + memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
> + return -1;
> + }
> +
> + if (cc->socket == NULL) {
> + memif_msg_enq_disconnect(cc, "Device error", 0);
> + return -1;
> + }
> +
> + /* Find device with requested ID */
> + TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
> + dev = elt->dev;
> + pmd = dev->data->dev_private;
> + if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
> + pmd->id == i->id) {
> + /* assign control channel to device */
> + cc->dev = dev;
> + pmd->cc = cc;
> +
> + if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
> + memif_msg_enq_disconnect(pmd->cc,
> + "Only ethernet mode supported",
> + 0);
> + return -1;
> + }
> +
> + if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
> + ETH_MEMIF_FLAG_CONNECTED)) {
> + memif_msg_enq_disconnect(pmd->cc,
> + "Already connected", 0);
> + return -1;
> + }
> + strlcpy(pmd->remote_name, (char *)i->name,
> + sizeof(pmd->remote_name));
> +
> + if (*pmd->secret != '\0') {
> + if (*i->secret == '\0') {
> + memif_msg_enq_disconnect(pmd->cc,
> + "Secret required", 0);
> + return -1;
> + }
> + if (strncmp(pmd->secret, (char *)i->secret,
> + ETH_MEMIF_SECRET_SIZE) != 0) {
> + memif_msg_enq_disconnect(pmd->cc,
> + "Incorrect secret", 0);
> + return -1;
> + }
> + }
> +
> + pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
> + return 0;
> + }
> + }
> +
> + /* ID not found on this socket */
> + MIF_LOG(DEBUG, "ID %u not found.", i->id);
> + memif_msg_enq_disconnect(cc, "ID not found", 0);
> + return -1;
> +}
> +
> +static int
> +memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
> + int fd)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_msg_add_region_t *ar = &msg->add_region;
> + struct memif_region *r;
> +
> + if (fd < 0) {
> + memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
> + return -1;
> + }
> +
> + if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
> + pmd->regions[ar->index] != NULL) {
> + memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
> + return -1;
> + }
> +
> + r = rte_zmalloc("region", sizeof(struct memif_region), 0);
> + if (r == NULL) {
> + MIF_LOG(ERR, "%s: Failed to alloc memif region.",
> + rte_vdev_device_name(pmd->vdev));
> + return -ENOMEM;
> + }
> +
> + r->fd = fd;
> + r->region_size = ar->size;
> + r->addr = NULL;
> +
> + pmd->regions[ar->index] = r;
> + pmd->regions_num++;
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_msg_add_ring_t *ar = &msg->add_ring;
> + struct memif_queue *mq;
> +
> + if (fd < 0) {
> + memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
> + return -1;
> + }
> +
> + /* check if we have enough queues */
> + if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
> + if (ar->index >= pmd->cfg.num_s2m_rings) {
> + memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
> + return -1;
> + }
> + pmd->run.num_s2m_rings++;
> + } else {
> + if (ar->index >= pmd->cfg.num_m2s_rings) {
> + memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
> + return -1;
> + }
> + pmd->run.num_m2s_rings++;
> + }
> +
> + mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
> + dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
> +
> + mq->intr_handle.fd = fd;
> + mq->log2_ring_size = ar->log2_ring_size;
> + mq->region = ar->region;
> + mq->ring_offset = ar->offset;
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_msg_connect_t *c = &msg->connect;
> + int ret;
> +
> + ret = memif_connect(dev);
> + if (ret < 0)
> + return ret;
> +
> + strlcpy(pmd->remote_if_name, (char *)c->if_name,
> + sizeof(pmd->remote_if_name));
> + MIF_LOG(INFO, "%s: Remote interface %s connected.",
> + rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_msg_connected_t *c = &msg->connected;
> + int ret;
> +
> + ret = memif_connect(dev);
> + if (ret < 0)
> + return ret;
> +
> + strlcpy(pmd->remote_if_name, (char *)c->if_name,
> + sizeof(pmd->remote_if_name));
> + MIF_LOG(INFO, "%s: Remote interface %s connected.",
> + rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_msg_disconnect_t *d = &msg->disconnect;
> +
> + memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
> + strlcpy(pmd->remote_disc_string, (char *)d->string,
> + sizeof(pmd->remote_disc_string));
> +
> + MIF_LOG(INFO, "%s: Disconnect received: %s",
> + rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
> +
> + memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
> + memif_disconnect(rte_eth_dev_allocated
> + (rte_vdev_device_name(pmd->vdev)));
> + return 0;
> +}
> +
> +static int
> +memif_msg_enq_ack(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
> +
> + if (e == NULL)
> + return -1;
> +
> + e->msg.type = MEMIF_MSG_TYPE_ACK;
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_enq_init(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
> + memif_msg_init_t *i;
> +
> + if (e == NULL)
> + return -1;
> +
> + i = &e->msg.init;
> + e->msg.type = MEMIF_MSG_TYPE_INIT;
> + i->version = MEMIF_VERSION;
> + i->id = pmd->id;
> + i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
> +
> + strlcpy((char *)i->name, rte_version(), sizeof(i->name));
> +
> + if (*pmd->secret != '\0')
> + strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
> + memif_msg_add_region_t *ar;
> + struct memif_region *mr = pmd->regions[idx];
> +
> + if (e == NULL)
> + return -1;
> +
> + ar = &e->msg.add_region;
> + e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
> + e->fd = mr->fd;
> + ar->index = idx;
> + ar->size = mr->region_size;
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
> + memif_ring_type_t type)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
> + struct memif_queue *mq;
> + memif_msg_add_ring_t *ar;
> +
> + if (e == NULL)
> + return -1;
> +
> + ar = &e->msg.add_ring;
> + mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
> + dev->data->rx_queues[idx];
> +
> + e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
> + e->fd = mq->intr_handle.fd;
> + ar->index = idx;
> + ar->offset = mq->ring_offset;
> + ar->region = mq->region;
> + ar->log2_ring_size = mq->log2_ring_size;
> + ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
> + ar->private_hdr_size = 0;
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_enq_connect(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
> + const char *name = rte_vdev_device_name(pmd->vdev);
> + memif_msg_connect_t *c;
> +
> + if (e == NULL)
> + return -1;
> +
> + c = &e->msg.connect;
> + e->msg.type = MEMIF_MSG_TYPE_CONNECT;
> + strlcpy((char *)c->if_name, name, sizeof(c->if_name));
> +
> + return 0;
> +}
> +
> +static int
> +memif_msg_enq_connected(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
> + const char *name = rte_vdev_device_name(pmd->vdev);
> + memif_msg_connected_t *c;
> +
> + if (e == NULL)
> + return -1;
> +
> + c = &e->msg.connected;
> + e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
> + strlcpy((char *)c->if_name, name, sizeof(c->if_name));
> +
> + return 0;
> +}
> +
> +static void
> +memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
> +{
> + struct memif_msg_queue_elt *elt;
> + struct memif_control_channel *cc = arg;
> +
> + /* close control channel fd */
> + close(intr_handle->fd);
> + /* clear message queue */
> + while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
> + TAILQ_REMOVE(&cc->msg_queue, elt, next);
> + rte_free(elt);
> + }
> + /* free control channel */
> + rte_free(cc);
> +}
> +
> +void
> +memif_disconnect(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_msg_queue_elt *elt, *next;
> + struct memif_queue *mq;
> + struct rte_intr_handle *ih;
> + int i;
> + int ret;
> +
> + if (pmd->cc != NULL) {
> + /* Clear control message queue (except disconnect message if any). */
> + for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
> + next = TAILQ_NEXT(elt, next);
> + if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
> + TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
> + rte_free(elt);
> + }
> + }
> + /* send disconnect message (if there is any in queue) */
> + memif_msg_send_from_queue(pmd->cc);
> +
> + /* at this point, there should be no more messages in queue */
> + if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
> + MIF_LOG(WARNING,
> + "%s: Unexpected message(s) in message queue.",
> + rte_vdev_device_name(pmd->vdev));
> + }
> +
> + ih = &pmd->cc->intr_handle;
> + if (ih->fd > 0) {
> + ret = rte_intr_callback_unregister(ih,
> + memif_intr_handler,
> + pmd->cc);
> + /*
> + * If callback is active (disconnecting based on
> + * received control message).
> + */
> + if (ret == -EAGAIN) {
> + ret = rte_intr_callback_unregister_pending(ih,
> + memif_intr_handler,
> + pmd->cc,
> + memif_intr_unregister_handler);
> + } else if (ret > 0) {
> + close(ih->fd);
> + rte_free(pmd->cc);
> + }
> + pmd->cc = NULL;
> + if (ret <= 0)
> + MIF_LOG(WARNING, "%s: Failed to unregister "
> + "control channel callback.",
> + rte_vdev_device_name(pmd->vdev));
> + }
> + }
> +
> + /* unconfig interrupts */
> + for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
> + if (pmd->role == MEMIF_ROLE_SLAVE) {
> + if (dev->data->tx_queues != NULL)
> + mq = dev->data->tx_queues[i];
> + else
> + continue;
> + } else {
> + if (dev->data->rx_queues != NULL)
> + mq = dev->data->rx_queues[i];
> + else
> + continue;
> + }
> + if (mq->intr_handle.fd > 0) {
> + close(mq->intr_handle.fd);
> + mq->intr_handle.fd = -1;
> + }
> + mq->ring = NULL;
> + }
> + for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
> + if (pmd->role == MEMIF_ROLE_MASTER) {
> + if (dev->data->tx_queues != NULL)
> + mq = dev->data->tx_queues[i];
> + else
> + continue;
> + } else {
> + if (dev->data->rx_queues != NULL)
> + mq = dev->data->rx_queues[i];
> + else
> + continue;
> + }
> + if (mq->intr_handle.fd > 0) {
> + close(mq->intr_handle.fd);
> + mq->intr_handle.fd = -1;
> + }
> + mq->ring = NULL;
> + }
> +
> + memif_free_regions(pmd);
> +
> + /* reset connection configuration */
> + memset(&pmd->run, 0, sizeof(pmd->run));
> +
> + dev->data->dev_link.link_status = ETH_LINK_DOWN;
> + pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
> + pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
> + MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
> +}
> +
> +static int
> +memif_msg_receive(struct memif_control_channel *cc)
> +{
> + char ctl[CMSG_SPACE(sizeof(int)) +
> + CMSG_SPACE(sizeof(struct ucred))] = { 0 };
> + struct msghdr mh = { 0 };
> + struct iovec iov[1];
> + memif_msg_t msg = { 0 };
> + ssize_t size;
> + int ret = 0;
> + struct ucred *cr __rte_unused = NULL;
> + struct cmsghdr *cmsg;
> + int afd = -1;
> + int i;
> + struct pmd_internals *pmd;
> +
> + iov[0].iov_base = (void *)&msg;
> + iov[0].iov_len = sizeof(memif_msg_t);
> + mh.msg_iov = iov;
> + mh.msg_iovlen = 1;
> + mh.msg_control = ctl;
> + mh.msg_controllen = sizeof(ctl);
> +
> + size = recvmsg(cc->intr_handle.fd, &mh, 0);
> + if (size != sizeof(memif_msg_t)) {
> + MIF_LOG(DEBUG, "Invalid message size.");
> + memif_msg_enq_disconnect(cc, "Invalid message size", 0);
> + return -1;
> + }
> + MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
> +
> + cmsg = CMSG_FIRSTHDR(&mh);
> + while (cmsg) {
> + if (cmsg->cmsg_level == SOL_SOCKET) {
> + if (cmsg->cmsg_type == SCM_CREDENTIALS)
> + cr = (struct ucred *)CMSG_DATA(cmsg);
> + else if (cmsg->cmsg_type == SCM_RIGHTS)
> + afd = *(int *)CMSG_DATA(cmsg);
I see that GCC from Xenial:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Complains as follows:
../drivers/net/memif/memif_socket.c: In function ‘memif_msg_receive’:
../drivers/net/memif/memif_socket.c:665:33: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
afd = *(int *)CMSG_DATA(cmsg);
^
Here's the travis build:
https://travis-ci.com/ovsrobot/dpdk/builds/113842433
> + }
> + cmsg = CMSG_NXTHDR(&mh, cmsg);
> + }
> +
> + if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
> + MIF_LOG(DEBUG, "Unexpected message.");
> + memif_msg_enq_disconnect(cc, "Unexpected message", 0);
> + return -1;
> + }
> +
> + /* get device from hash data */
> + switch (msg.type) {
> + case MEMIF_MSG_TYPE_ACK:
> + break;
> + case MEMIF_MSG_TYPE_HELLO:
> + ret = memif_msg_receive_hello(cc->dev, &msg);
> + if (ret < 0)
> + goto exit;
> + ret = memif_init_regions_and_queues(cc->dev);
> + if (ret < 0)
> + goto exit;
> + ret = memif_msg_enq_init(cc->dev);
> + if (ret < 0)
> + goto exit;
> + pmd = cc->dev->data->dev_private;
> + for (i = 0; i < pmd->regions_num; i++) {
> + ret = memif_msg_enq_add_region(cc->dev, i);
> + if (ret < 0)
> + goto exit;
> + }
> + for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> + ret = memif_msg_enq_add_ring(cc->dev, i,
> + MEMIF_RING_S2M);
> + if (ret < 0)
> + goto exit;
> + }
> + for (i = 0; i < pmd->run.num_m2s_rings; i++) {
> + ret = memif_msg_enq_add_ring(cc->dev, i,
> + MEMIF_RING_M2S);
> + if (ret < 0)
> + goto exit;
> + }
> + ret = memif_msg_enq_connect(cc->dev);
> + if (ret < 0)
> + goto exit;
> + break;
> + case MEMIF_MSG_TYPE_INIT:
> + /*
> + * This cc does not have an interface associated with it.
> + * If a suitable interface is found, it will be assigned here.
> + */
> + ret = memif_msg_receive_init(cc, &msg);
> + if (ret < 0)
> + goto exit;
> + ret = memif_msg_enq_ack(cc->dev);
> + if (ret < 0)
> + goto exit;
> + break;
> + case MEMIF_MSG_TYPE_ADD_REGION:
> + ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
> + if (ret < 0)
> + goto exit;
> + ret = memif_msg_enq_ack(cc->dev);
> + if (ret < 0)
> + goto exit;
> + break;
> + case MEMIF_MSG_TYPE_ADD_RING:
> + ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
> + if (ret < 0)
> + goto exit;
> + ret = memif_msg_enq_ack(cc->dev);
> + if (ret < 0)
> + goto exit;
> + break;
> + case MEMIF_MSG_TYPE_CONNECT:
> + ret = memif_msg_receive_connect(cc->dev, &msg);
> + if (ret < 0)
> + goto exit;
> + ret = memif_msg_enq_connected(cc->dev);
> + if (ret < 0)
> + goto exit;
> + break;
> + case MEMIF_MSG_TYPE_CONNECTED:
> + ret = memif_msg_receive_connected(cc->dev, &msg);
> + break;
> + case MEMIF_MSG_TYPE_DISCONNECT:
> + ret = memif_msg_receive_disconnect(cc->dev, &msg);
> + if (ret < 0)
> + goto exit;
> + break;
> + default:
> + memif_msg_enq_disconnect(cc, "Unknown message type", 0);
> + ret = -1;
> + goto exit;
> + }
> +
> + exit:
> + return ret;
> +}
> +
> +static void
> +memif_intr_handler(void *arg)
> +{
> + struct memif_control_channel *cc = arg;
> + int ret;
> +
> + ret = memif_msg_receive(cc);
> + /* if driver failed to assign device */
> + if (cc->dev == NULL) {
> + ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
> + memif_intr_handler,
> + cc,
> + memif_intr_unregister_handler);
> + if (ret < 0)
> + MIF_LOG(WARNING,
> + "Failed to unregister control channel callback.");
> + return;
> + }
> + /* if memif_msg_receive failed */
> + if (ret < 0)
> + goto disconnect;
> +
> + ret = memif_msg_send_from_queue(cc);
> + if (ret < 0)
> + goto disconnect;
> +
> + return;
> +
> + disconnect:
> + if (cc->dev == NULL) {
> + MIF_LOG(WARNING, "eth dev not allocated");
> + return;
> + }
> + memif_disconnect(cc->dev);
> +}
> +
> +static void
> +memif_listener_handler(void *arg)
> +{
> + struct memif_socket *socket = arg;
> + int sockfd;
> + int addr_len;
> + struct sockaddr_un client;
> + struct memif_control_channel *cc;
> + int ret;
> +
> + addr_len = sizeof(client);
> + sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
> + (socklen_t *)&addr_len);
> + if (sockfd < 0) {
> + MIF_LOG(ERR,
> + "Failed to accept connection request on socket fd %d",
> + socket->intr_handle.fd);
> + return;
> + }
> +
> + MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
> +
> + cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
> + if (cc == NULL) {
> + MIF_LOG(ERR, "Failed to allocate control channel.");
> + goto error;
> + }
> +
> + cc->intr_handle.fd = sockfd;
> + cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
> + cc->socket = socket;
> + cc->dev = NULL;
> + TAILQ_INIT(&cc->msg_queue);
> +
> + ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
> + if (ret < 0) {
> + MIF_LOG(ERR, "Failed to register control channel callback.");
> + goto error;
> + }
> +
> + ret = memif_msg_enq_hello(cc);
> + if (ret < 0) {
> + MIF_LOG(ERR, "Failed to enqueue hello message.");
> + goto error;
> + }
> + ret = memif_msg_send_from_queue(cc);
> + if (ret < 0)
> + goto error;
> +
> + return;
> +
> + error:
> + if (sockfd > 0) {
> + close(sockfd);
> + sockfd = -1;
> + }
> + if (cc != NULL)
> + rte_free(cc);
> +}
> +
> +static struct memif_socket *
> +memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
> +{
> + struct memif_socket *sock;
> + struct sockaddr_un un;
> + int sockfd;
> + int ret;
> + int on = 1;
> +
> + sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
> + if (sock == NULL) {
> + MIF_LOG(ERR, "Failed to allocate memory for memif socket");
> + return NULL;
> + }
> +
> + sock->listener = listener;
> + rte_memcpy(sock->filename, key, 256);
> + TAILQ_INIT(&sock->dev_queue);
> +
> + if (listener != 0) {
> + sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
> + if (sockfd < 0)
> + goto error;
> +
> + un.sun_family = AF_UNIX;
> + memcpy(un.sun_path, sock->filename,
> + sizeof(un.sun_path) - 1);
> +
> + ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
> + sizeof(on));
> + if (ret < 0)
> + goto error;
> + ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
> + if (ret < 0)
> + goto error;
> + ret = listen(sockfd, 1);
> + if (ret < 0)
> + goto error;
> +
> + MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
> + rte_vdev_device_name(pmd->vdev), sock->filename);
> +
> + sock->intr_handle.fd = sockfd;
> + sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
> + ret = rte_intr_callback_register(&sock->intr_handle,
> + memif_listener_handler, sock);
> + if (ret < 0) {
> + MIF_LOG(ERR, "%s: Failed to register interrupt "
> + "callback for listener socket",
> + rte_vdev_device_name(pmd->vdev));
> + return NULL;
> + }
> + }
> +
> + return sock;
> +
> + error:
> + MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
> + rte_vdev_device_name(pmd->vdev), key, strerror(errno));
> + if (sock != NULL)
> + rte_free(sock);
> + return NULL;
> +}
> +
> +static struct rte_hash *
> +memif_create_socket_hash(void)
> +{
> + struct rte_hash_parameters params = { 0 };
> + params.name = MEMIF_SOCKET_HASH_NAME;
> + params.entries = 256;
> + params.key_len = 256;
> + params.hash_func = rte_jhash;
> + params.hash_func_init_val = 0;
> + return rte_hash_create(&params);
> +}
> +
> +int
> +memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_socket *socket = NULL;
> + struct memif_socket_dev_list_elt *elt;
> + struct pmd_internals *tmp_pmd;
> + struct rte_hash *hash;
> + int ret;
> + char key[256];
> +
> + hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
> + if (hash == NULL) {
> + hash = memif_create_socket_hash();
> + if (hash == NULL) {
> + MIF_LOG(ERR, "Failed to create memif socket hash.");
> + return -1;
> + }
> + }
> +
> + memset(key, 0, 256);
> + rte_memcpy(key, socket_filename, strlen(socket_filename));
> + ret = rte_hash_lookup_data(hash, key, (void **)&socket);
> + if (ret < 0) {
> + socket = memif_socket_create(pmd, key,
> + (pmd->role ==
> + MEMIF_ROLE_SLAVE) ? 0 : 1);
> + if (socket == NULL)
> + return -1;
> + ret = rte_hash_add_key_data(hash, key, socket);
> + if (ret < 0) {
> + MIF_LOG(ERR, "Failed to add socket to socket hash.");
> + return ret;
> + }
> + }
> + pmd->socket_filename = socket->filename;
> +
> + if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
> + MIF_LOG(ERR, "Socket is a listener.");
> + return -1;
> + } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
> + MIF_LOG(ERR, "Socket is not a listener.");
> + return -1;
> + }
> +
> + TAILQ_FOREACH(elt, &socket->dev_queue, next) {
> + tmp_pmd = elt->dev->data->dev_private;
> + if (tmp_pmd->id == pmd->id) {
> + MIF_LOG(ERR, "Memif device with id %d already "
> + "exists on socket %s",
> + pmd->id, socket->filename);
> + return -1;
> + }
> + }
> +
> + elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
> + if (elt == NULL) {
> + MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
> + rte_vdev_device_name(pmd->vdev));
> + return -1;
> + }
> + elt->dev = dev;
> + TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
> +
> + return 0;
> +}
> +
> +void
> +memif_socket_remove_device(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_socket *socket = NULL;
> + struct memif_socket_dev_list_elt *elt, *next;
> + struct rte_hash *hash;
> +
> + hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
> + if (hash == NULL)
> + return;
> +
> + if (pmd->socket_filename == NULL)
> + return;
> +
> + if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) < 0)
> + return;
> +
> + for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
> + next = TAILQ_NEXT(elt, next);
> + if (elt->dev == dev) {
> + TAILQ_REMOVE(&socket->dev_queue, elt, next);
> + rte_free(elt);
> + pmd->socket_filename = NULL;
> + }
> + }
> +
> + /* remove socket, if this was the last device using it */
> + if (TAILQ_EMPTY(&socket->dev_queue)) {
> + rte_hash_del_key(hash, socket->filename);
> + if (socket->listener) {
> + /* remove listener socket file,
> + * so we can create a new one later.
> + */
> + remove(socket->filename);
> + }
> + rte_free(socket);
> + }
> +}
> +
> +int
> +memif_connect_master(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> +
> + memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
> + memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
> + pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
> + return 0;
> +}
> +
> +int
> +memif_connect_slave(struct rte_eth_dev *dev)
> +{
> + int sockfd;
> + int ret;
> + struct sockaddr_un sun;
> + struct pmd_internals *pmd = dev->data->dev_private;
> +
> + memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
> + memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
> + pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
> +
> + sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
> + if (sockfd < 0) {
> + MIF_LOG(ERR, "%s: Failed to open socket.",
> + rte_vdev_device_name(pmd->vdev));
> + return -1;
> + }
> +
> + sun.sun_family = AF_UNIX;
> +
> + memcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path) - 1);
> +
> + ret = connect(sockfd, (struct sockaddr *)&sun,
> + sizeof(struct sockaddr_un));
> + if (ret < 0) {
> + MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
> + rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
> + goto error;
> + }
> +
> + MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
> + rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
> +
> + pmd->cc = rte_zmalloc("memif-cc",
> + sizeof(struct memif_control_channel), 0);
> + if (pmd->cc == NULL) {
> + MIF_LOG(ERR, "%s: Failed to allocate control channel.",
> + rte_vdev_device_name(pmd->vdev));
> + goto error;
> + }
> +
> + pmd->cc->intr_handle.fd = sockfd;
> + pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
> + pmd->cc->socket = NULL;
> + pmd->cc->dev = dev;
> + TAILQ_INIT(&pmd->cc->msg_queue);
> +
> + ret = rte_intr_callback_register(&pmd->cc->intr_handle,
> + memif_intr_handler, pmd->cc);
> + if (ret < 0) {
> + MIF_LOG(ERR, "%s: Failed to register interrupt callback "
> + "for control fd", rte_vdev_device_name(pmd->vdev));
> + goto error;
> + }
> +
> + return 0;
> +
> + error:
> + if (sockfd > 0) {
> + close(sockfd);
> + sockfd = -1;
> + }
> + if (pmd->cc != NULL) {
> + rte_free(pmd->cc);
> + pmd->cc = NULL;
> + }
> + return -1;
> +}
> diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
> new file mode 100644
> index 000000000..c60614721
> --- /dev/null
> +++ b/drivers/net/memif/memif_socket.h
> @@ -0,0 +1,105 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> + */
> +
> +#ifndef _MEMIF_SOCKET_H_
> +#define _MEMIF_SOCKET_H_
> +
> +#include <sys/queue.h>
> +
> +/**
> + * Remove device from socket device list. If no device is left on the socket,
> + * remove the socket as well.
> + *
> + * @param dev
> + * memif ethernet device
> + */
> +void memif_socket_remove_device(struct rte_eth_dev *dev);
> +
> +/**
> + * Enqueue disconnect message to control channel message queue.
> + *
> + * @param cc
> + * control channel
> + * @param reason
> + * const string stating disconnect reason (96 characters)
> + * @param err_code
> + * error code
> + */
> +void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
> + int err_code);
> +
> +/**
> + * Initialize memif socket for specified device. If the socket does not exist, create it.
> + *
> + * @param dev
> + * memif ethernet device
> + * @param socket_filename
> + * socket filename
> + * @return
> + * - On success, zero.
> + * - On failure, a negative value.
> + */
> +int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
> +
> +/**
> + * Disconnect memif device. Close control channel and shared memory.
> + *
> + * @param dev
> + * ethernet device
> + */
> +void memif_disconnect(struct rte_eth_dev *dev);
> +
> +/**
> + * If device is properly configured, enable connection establishment.
> + *
> + * @param dev
> + * memif ethernet device
> + * @return
> + * - On success, zero.
> + * - On failure, a negative value.
> + */
> +int memif_connect_master(struct rte_eth_dev *dev);
> +
> +/**
> + * If device is properly configured, send connection request.
> + *
> + * @param dev
> + * memif ethernet device
> + * @return
> + * - On success, zero.
> + * - On failure, a negative value.
> + */
> +int memif_connect_slave(struct rte_eth_dev *dev);
> +
> +struct memif_socket_dev_list_elt {
> + TAILQ_ENTRY(memif_socket_dev_list_elt) next;
> + struct rte_eth_dev *dev; /**< pointer to device internals */
> + char dev_name[RTE_ETH_NAME_MAX_LEN];
> +};
> +
> +#define MEMIF_SOCKET_HASH_NAME "memif-sh"
> +struct memif_socket {
> + struct rte_intr_handle intr_handle; /**< interrupt handle */
> + char filename[256]; /**< socket filename */
> +
> + TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
> + /**< Queue of devices using this socket */
> + uint8_t listener; /**< if not zero socket is listener */
> +};
> +
> +/* Control message queue. */
> +struct memif_msg_queue_elt {
> + memif_msg_t msg; /**< control message */
> + TAILQ_ENTRY(memif_msg_queue_elt) next;
> + int fd; /**< fd to be sent to peer */
> +};
> +
> +struct memif_control_channel {
> + struct rte_intr_handle intr_handle; /**< interrupt handle */
> + TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
> + struct memif_socket *socket; /**< pointer to socket */
> + struct rte_eth_dev *dev; /**< pointer to device */
> +};
> +
> +#endif /* _MEMIF_SOCKET_H_ */
> diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
> new file mode 100644
> index 000000000..287a30ef2
> --- /dev/null
> +++ b/drivers/net/memif/meson.build
> @@ -0,0 +1,15 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> +
> +if host_machine.system() != 'linux'
> + build = false
> +endif
> +
> +sources = files('rte_eth_memif.c',
> + 'memif_socket.c')
> +
> +allow_experimental_apis = true
> +# Experimental APIs:
> +# - rte_intr_callback_unregister_pending
> +
> +deps += ['hash']
> diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
> new file mode 100644
> index 000000000..3b9c1ce66
> --- /dev/null
> +++ b/drivers/net/memif/rte_eth_memif.c
> @@ -0,0 +1,1206 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> + */
> +
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/socket.h>
> +#include <sys/un.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +#include <linux/if_ether.h>
> +#include <errno.h>
> +#include <sys/eventfd.h>
> +
> +#include <rte_version.h>
> +#include <rte_mbuf.h>
> +#include <rte_ether.h>
> +#include <rte_ethdev_driver.h>
> +#include <rte_ethdev_vdev.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +#include <rte_bus_vdev.h>
> +#include <rte_string_fns.h>
> +
> +#include "rte_eth_memif.h"
> +#include "memif_socket.h"
> +
> +#define ETH_MEMIF_ID_ARG "id"
> +#define ETH_MEMIF_ROLE_ARG "role"
> +#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
> +#define ETH_MEMIF_RING_SIZE_ARG "rsize"
> +#define ETH_MEMIF_SOCKET_ARG "socket"
> +#define ETH_MEMIF_MAC_ARG "mac"
> +#define ETH_MEMIF_ZC_ARG "zero-copy"
> +#define ETH_MEMIF_SECRET_ARG "secret"
> +
> +static const char *valid_arguments[] = {
> + ETH_MEMIF_ID_ARG,
> + ETH_MEMIF_ROLE_ARG,
> + ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
> + ETH_MEMIF_RING_SIZE_ARG,
> + ETH_MEMIF_SOCKET_ARG,
> + ETH_MEMIF_MAC_ARG,
> + ETH_MEMIF_ZC_ARG,
> + ETH_MEMIF_SECRET_ARG,
> + NULL
> +};
> +
> +const char *
> +memif_version(void)
> +{
> + return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
> +}
> +
> +static void
> +memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
> +{
> + dev_info->max_mac_addrs = 1;
> + dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
> + dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
> + dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
> + dev_info->min_rx_bufsize = 0;
> +}
> +
> +static memif_ring_t *
> +memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
> +{
> + /* rings only in region 0 */
> + void *p = pmd->regions[0]->addr;
> + int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
> + (1 << pmd->run.log2_ring_size);
> +
> + p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
> +
> + return (memif_ring_t *)p;
> +}
> +
> +static void *
> +memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
> +{
> + return ((uint8_t *)pmd->regions[d->region]->addr + d->offset);
> +}
> +
> +static int
> +memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
> + struct rte_mbuf *tail)
> +{
> + /* Check for number-of-segments-overflow */
> + if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
> + return -EOVERFLOW;
> +
> + /* Chain 'tail' onto the old tail */
> + cur_tail->next = tail;
> +
> + /* accumulate number of segments and total length. */
> + head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
> +
> + tail->pkt_len = tail->data_len;
> + head->pkt_len += tail->pkt_len;
> +
> + return 0;
> +}
> +
> +static uint16_t
> +eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> + struct memif_queue *mq = queue;
> + struct pmd_internals *pmd = mq->pmd;
> + memif_ring_t *ring = mq->ring;
> + uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
> + uint16_t n_rx_pkts = 0;
> + uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
> + RTE_PKTMBUF_HEADROOM;
> + uint16_t src_len, src_off, dst_len, dst_off, cp_len;
> + memif_ring_type_t type = mq->type;
> + memif_desc_t *d0;
> + struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
> + uint64_t b;
> + ssize_t size __rte_unused;
> + uint16_t head;
> + int ret;
> +
> + if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
> + return 0;
> + if (unlikely(ring == NULL))
> + return 0;
> +
> + /* consume interrupt */
> + if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
> + size = read(mq->intr_handle.fd, &b, sizeof(b));
> +
> + ring_size = 1 << mq->log2_ring_size;
> + mask = ring_size - 1;
> +
> + cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
> + last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
> + if (cur_slot == last_slot)
> + goto refill;
> + n_slots = last_slot - cur_slot;
> +
> + while (n_slots && n_rx_pkts < nb_pkts) {
> + mbuf_head = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf_head == NULL))
> + goto no_free_bufs;
> + mbuf = mbuf_head;
> + mbuf->port = mq->in_port;
> +
> +next_slot:
> + s0 = cur_slot & mask;
> + d0 = &ring->desc[s0];
> +
> + src_len = d0->length;
> + dst_off = 0;
> + src_off = 0;
> +
> + do {
> + dst_len = mbuf_size - dst_off;
> + if (dst_len == 0) {
> + dst_off = 0;
> + dst_len = mbuf_size;
> +
> + /* store pointer to tail */
> + mbuf_tail = mbuf;
> + mbuf = rte_pktmbuf_alloc(mq->mempool);
> + if (unlikely(mbuf == NULL))
> + goto no_free_bufs;
> + mbuf->port = mq->in_port;
> + ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
> + if (unlikely(ret < 0)) {
> + MIF_LOG(ERR, "%s: number-of-segments-overflow",
> + rte_vdev_device_name(pmd->vdev));
> + rte_pktmbuf_free(mbuf);
> + goto no_free_bufs;
> + }
> + }
> + cp_len = RTE_MIN(dst_len, src_len);
> +
> + rte_pktmbuf_data_len(mbuf) += cp_len;
> + rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
> + if (mbuf != mbuf_head)
> + rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
> +
> + memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
> + (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
> +
> + src_off += cp_len;
> + dst_off += cp_len;
> + src_len -= cp_len;
> + } while (src_len);
> +
> + cur_slot++;
> + n_slots--;
> +
> + if (d0->flags & MEMIF_DESC_FLAG_NEXT)
> + goto next_slot;
> +
> + mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
> + *bufs++ = mbuf_head;
> + n_rx_pkts++;
> + }
> +
> +no_free_bufs:
> + if (type == MEMIF_RING_S2M) {
> + rte_mb();
> + ring->tail = cur_slot;
> + mq->last_head = cur_slot;
> + } else {
> + mq->last_tail = cur_slot;
> + }
> +
> +refill:
> + if (type == MEMIF_RING_M2S) {
> + head = ring->head;
> + n_slots = ring_size - head + mq->last_tail;
> +
> + while (n_slots--) {
> + s0 = head++ & mask;
> + d0 = &ring->desc[s0];
> + d0->length = pmd->run.pkt_buffer_size;
> + }
> + rte_mb();
> + ring->head = head;
> + }
> +
> + mq->n_pkts += n_rx_pkts;
> + return n_rx_pkts;
> +}
> +
> +static uint16_t
> +eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
> +{
> + struct memif_queue *mq = queue;
> + struct pmd_internals *pmd = mq->pmd;
> + memif_ring_t *ring = mq->ring;
> + uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
> + uint16_t src_len, src_off, dst_len, dst_off, cp_len;
> + memif_ring_type_t type = mq->type;
> + memif_desc_t *d0;
> + struct rte_mbuf *mbuf;
> + struct rte_mbuf *mbuf_head;
> + uint64_t a;
> + ssize_t size;
> +
> + if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
> + return 0;
> + if (unlikely(ring == NULL))
> + return 0;
> +
> + ring_size = 1 << mq->log2_ring_size;
> + mask = ring_size - 1;
> +
> + n_free = ring->tail - mq->last_tail;
> + mq->last_tail += n_free;
> + slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
> +
> + if (type == MEMIF_RING_S2M)
> + n_free = ring_size - ring->head + mq->last_tail;
> + else
> + n_free = ring->head - ring->tail;
> +
> + while (n_tx_pkts < nb_pkts && n_free) {
> + mbuf_head = *bufs++;
> + mbuf = mbuf_head;
> +
> + saved_slot = slot;
> + d0 = &ring->desc[slot & mask];
> + dst_off = 0;
> + dst_len = (type == MEMIF_RING_S2M) ?
> + pmd->run.pkt_buffer_size : d0->length;
> +
> +next_in_chain:
> + src_off = 0;
> + src_len = rte_pktmbuf_data_len(mbuf);
> +
> + while (src_len) {
> + if (dst_len == 0) {
> + if (n_free) {
> + slot++;
> + n_free--;
> + d0->flags |= MEMIF_DESC_FLAG_NEXT;
> + d0 = &ring->desc[slot & mask];
> + dst_off = 0;
> + dst_len = (type == MEMIF_RING_S2M) ?
> + pmd->run.pkt_buffer_size : d0->length;
> + d0->flags = 0;
> + } else {
> + slot = saved_slot;
> + goto no_free_slots;
> + }
> + }
> + cp_len = RTE_MIN(dst_len, src_len);
> +
> + memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
> + rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
> + cp_len);
> +
> + mq->n_bytes += cp_len;
> + src_off += cp_len;
> + dst_off += cp_len;
> + src_len -= cp_len;
> + dst_len -= cp_len;
> +
> + d0->length = dst_off;
> + }
> +
> + if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
> + mbuf = mbuf->next;
> + goto next_in_chain;
> + }
> +
> + n_tx_pkts++;
> + slot++;
> + n_free--;
> + rte_pktmbuf_free(mbuf_head);
> + }
> +
> +no_free_slots:
> + rte_mb();
> + if (type == MEMIF_RING_S2M)
> + ring->head = slot;
> + else
> + ring->tail = slot;
> +
> + if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
> + a = 1;
> + size = write(mq->intr_handle.fd, &a, sizeof(a));
> + if (unlikely(size < 0)) {
> + MIF_LOG(WARNING,
> + "%s: Failed to send interrupt. %s",
> + rte_vdev_device_name(pmd->vdev), strerror(errno));
> + }
> + }
> +
> + mq->n_err += nb_pkts - n_tx_pkts;
> + mq->n_pkts += n_tx_pkts;
> + return n_tx_pkts;
> +}
> +
> +void
> +memif_free_regions(struct pmd_internals *pmd)
> +{
> + int i;
> + struct memif_region *r;
> +
> + /* regions are allocated contiguously, so it's
> + * enough to loop until 'pmd->regions_num'
> + */
> + for (i = 0; i < pmd->regions_num; i++) {
> + r = pmd->regions[i];
> + if (r != NULL) {
> + if (r->addr != NULL) {
> + munmap(r->addr, r->region_size);
> + if (r->fd > 0) {
> + close(r->fd);
> + r->fd = -1;
> + }
> + }
> + rte_free(r);
> + pmd->regions[i] = NULL;
> + }
> + }
> + pmd->regions_num = 0;
> +}
> +
> +static int
> +memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
> +{
> + char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
> + int ret = 0;
> + struct memif_region *r;
> +
> + if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
> + MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
> + return -1;
> + }
> +
> + r = rte_zmalloc("region", sizeof(struct memif_region), 0);
> + if (r == NULL) {
> + MIF_LOG(ERR, "%s: Failed to alloc memif region.",
> + rte_vdev_device_name(pmd->vdev));
> + return -ENOMEM;
> + }
> +
> + /* calculate buffer offset */
> + r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
> + (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
> + (1 << pmd->run.log2_ring_size));
> +
> + r->region_size = r->pkt_buffer_offset;
> + /* if region has buffers, add buffers size to region_size */
> + if (has_buffers == 1)
> + r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
> + (1 << pmd->run.log2_ring_size) *
> + (pmd->run.num_s2m_rings +
> + pmd->run.num_m2s_rings));
> +
> + memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
> + snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
> + pmd->regions_num);
> +
> + r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
> + if (r->fd < 0) {
> + MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
> + rte_vdev_device_name(pmd->vdev),
> + strerror(errno));
> + ret = -1;
> + goto error;
> + }
> +
> + ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
> + if (ret < 0) {
> + MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
> + rte_vdev_device_name(pmd->vdev),
> + strerror(errno));
> + goto error;
> + }
> +
> + ret = ftruncate(r->fd, r->region_size);
> + if (ret < 0) {
> + MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
> + rte_vdev_device_name(pmd->vdev),
> + strerror(errno));
> + goto error;
> + }
> +
> + r->addr = mmap(NULL, r->region_size, PROT_READ |
> + PROT_WRITE, MAP_SHARED, r->fd, 0);
> + if (r->addr == MAP_FAILED) {
> + MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
> + rte_vdev_device_name(pmd->vdev),
> + strerror(errno));
> + ret = -1;
> + goto error;
> + }
> +
> + pmd->regions[pmd->regions_num] = r;
> + pmd->regions_num++;
> +
> + return ret;
> +
> +error:
> + if (r->fd > 0)
> + close(r->fd);
> + r->fd = -1;
> +
> + return ret;
> +}
> +
> +static int
> +memif_regions_init(struct pmd_internals *pmd)
> +{
> + int ret;
> +
> + /* create one buffer region */
> + ret = memif_region_init_shm(pmd, /* has buffer */ 1);
> + if (ret < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static void
> +memif_init_rings(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + memif_ring_t *ring;
> + int i, j;
> + uint16_t slot;
> +
> + for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> + ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
> + ring->head = 0;
> + ring->tail = 0;
> + ring->cookie = MEMIF_COOKIE;
> + ring->flags = 0;
> + for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
> + slot = i * (1 << pmd->run.log2_ring_size) + j;
> + ring->desc[j].region = 0;
> + ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
> + (uint32_t)(slot * pmd->run.pkt_buffer_size);
> + ring->desc[j].length = pmd->run.pkt_buffer_size;
> + }
> + }
> +
> + for (i = 0; i < pmd->run.num_m2s_rings; i++) {
> + ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
> + ring->head = 0;
> + ring->tail = 0;
> + ring->cookie = MEMIF_COOKIE;
> + ring->flags = 0;
> + for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
> + slot = (i + pmd->run.num_s2m_rings) *
> + (1 << pmd->run.log2_ring_size) + j;
> + ring->desc[j].region = 0;
> + ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
> + (uint32_t)(slot * pmd->run.pkt_buffer_size);
> + ring->desc[j].length = pmd->run.pkt_buffer_size;
> + }
> + }
> +}
> +
> +/* called only by slave */
> +static void
> +memif_init_queues(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_queue *mq;
> + int i;
> +
> + for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> + mq = dev->data->tx_queues[i];
> + mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
> + mq->log2_ring_size = pmd->run.log2_ring_size;
> + /* queues located only in region 0 */
> + mq->region = 0;
> + mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
> + mq->last_head = 0;
> + mq->last_tail = 0;
> + mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
> + if (mq->intr_handle.fd < 0) {
> + MIF_LOG(WARNING,
> + "%s: Failed to create eventfd for tx queue %d: %s.",
> + rte_vdev_device_name(pmd->vdev), i,
> + strerror(errno));
> + }
> + }
> +
> + for (i = 0; i < pmd->run.num_m2s_rings; i++) {
> + mq = dev->data->rx_queues[i];
> + mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
> + mq->log2_ring_size = pmd->run.log2_ring_size;
> + /* queues located only in region 0 */
> + mq->region = 0;
> + mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
> + mq->last_head = 0;
> + mq->last_tail = 0;
> + mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
> + if (mq->intr_handle.fd < 0) {
> + MIF_LOG(WARNING,
> + "%s: Failed to create eventfd for rx queue %d: %s.",
> + rte_vdev_device_name(pmd->vdev), i,
> + strerror(errno));
> + }
> + }
> +}
> +
> +int
> +memif_init_regions_and_queues(struct rte_eth_dev *dev)
> +{
> + int ret;
> +
> + ret = memif_regions_init(dev->data->dev_private);
> + if (ret < 0)
> + return ret;
> +
> + memif_init_rings(dev);
> +
> + memif_init_queues(dev);
> +
> + return 0;
> +}
> +
> +int
> +memif_connect(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_region *mr;
> + struct memif_queue *mq;
> + int i;
> +
> + for (i = 0; i < pmd->regions_num; i++) {
> + mr = pmd->regions[i];
> + if (mr != NULL) {
> + if (mr->addr == NULL) {
> + if (mr->fd < 0)
> + return -1;
> + mr->addr = mmap(NULL, mr->region_size,
> + PROT_READ | PROT_WRITE,
> + MAP_SHARED, mr->fd, 0);
> + if (mr->addr == MAP_FAILED)
> + return -1;
> + }
> + }
> + }
> +
> + for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> + mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
> + dev->data->tx_queues[i] : dev->data->rx_queues[i];
> + mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
> + mq->ring_offset);
> + if (mq->ring->cookie != MEMIF_COOKIE) {
> + MIF_LOG(ERR, "%s: Wrong cookie",
> + rte_vdev_device_name(pmd->vdev));
> + return -1;
> + }
> + mq->ring->head = 0;
> + mq->ring->tail = 0;
> + mq->last_head = 0;
> + mq->last_tail = 0;
> + /* enable polling mode */
> + if (pmd->role == MEMIF_ROLE_MASTER)
> + mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
> + }
> + for (i = 0; i < pmd->run.num_m2s_rings; i++) {
> + mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
> + dev->data->rx_queues[i] : dev->data->tx_queues[i];
> + mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
> + mq->ring_offset);
> + if (mq->ring->cookie != MEMIF_COOKIE) {
> + MIF_LOG(ERR, "%s: Wrong cookie",
> + rte_vdev_device_name(pmd->vdev));
> + return -1;
> + }
> + mq->ring->head = 0;
> + mq->ring->tail = 0;
> + mq->last_head = 0;
> + mq->last_tail = 0;
> + /* enable polling mode */
> + if (pmd->role == MEMIF_ROLE_SLAVE)
> + mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
> + }
> +
> + pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
> + pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
> + dev->data->dev_link.link_status = ETH_LINK_UP;
> + MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
> + return 0;
> +}
> +
> +static int
> +memif_dev_start(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + int ret = 0;
> +
> + switch (pmd->role) {
> + case MEMIF_ROLE_SLAVE:
> + ret = memif_connect_slave(dev);
> + break;
> + case MEMIF_ROLE_MASTER:
> + ret = memif_connect_master(dev);
> + break;
> + default:
> + MIF_LOG(ERR, "%s: Unknown role: %d.",
> + rte_vdev_device_name(pmd->vdev), pmd->role);
> + ret = -1;
> + break;
> + }
> +
> + return ret;
> +}
> +
> +static void
> +memif_dev_close(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> +
> + memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
> + memif_disconnect(dev);
> +
> + memif_socket_remove_device(dev);
> +}
> +
> +static int
> +memif_dev_configure(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> +
> + /*
> + * SLAVE - TXQ
> + * MASTER - RXQ
> + */
> + pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
> + dev->data->nb_tx_queues : dev->data->nb_rx_queues;
> +
> + /*
> + * SLAVE - RXQ
> + * MASTER - TXQ
> + */
> + pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
> + dev->data->nb_rx_queues : dev->data->nb_tx_queues;
> +
> + return 0;
> +}
> +
> +static int
> +memif_tx_queue_setup(struct rte_eth_dev *dev,
> + uint16_t qid,
> + uint16_t nb_tx_desc __rte_unused,
> + unsigned int socket_id __rte_unused,
> + const struct rte_eth_txconf *tx_conf __rte_unused)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_queue *mq;
> +
> + mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
> + if (mq == NULL) {
> + MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
> + rte_vdev_device_name(pmd->vdev), qid);
> + return -ENOMEM;
> + }
> +
> + mq->type =
> + (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
> + mq->n_pkts = 0;
> + mq->n_bytes = 0;
> + mq->n_err = 0;
> + mq->intr_handle.fd = -1;
> + mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
> + mq->pmd = pmd;
> + dev->data->tx_queues[qid] = mq;
> +
> + return 0;
> +}
> +
> +static int
> +memif_rx_queue_setup(struct rte_eth_dev *dev,
> + uint16_t qid,
> + uint16_t nb_rx_desc __rte_unused,
> + unsigned int socket_id __rte_unused,
> + const struct rte_eth_rxconf *rx_conf __rte_unused,
> + struct rte_mempool *mb_pool)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_queue *mq;
> +
> + mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
> + if (mq == NULL) {
> + MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
> + rte_vdev_device_name(pmd->vdev), qid);
> + return -ENOMEM;
> + }
> +
> + mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
> + mq->n_pkts = 0;
> + mq->n_bytes = 0;
> + mq->n_err = 0;
> + mq->intr_handle.fd = -1;
> + mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
> + mq->mempool = mb_pool;
> + mq->in_port = dev->data->port_id;
> + mq->pmd = pmd;
> + dev->data->rx_queues[qid] = mq;
> +
> + return 0;
> +}
> +
> +static void
> +memif_queue_release(void *queue)
> +{
> + struct memif_queue *mq = (struct memif_queue *)queue;
> +
> + if (!mq)
> + return;
> +
> + rte_free(mq);
> +}
> +
> +static int
> +memif_link_update(struct rte_eth_dev *dev __rte_unused,
> + int wait_to_complete __rte_unused)
> +{
> + return 0;
> +}
> +
> +static int
> +memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + struct memif_queue *mq;
> + int i;
> + uint8_t tmp, nq;
> +
> + stats->ipackets = 0;
> + stats->ibytes = 0;
> + stats->opackets = 0;
> + stats->obytes = 0;
> + stats->oerrors = 0;
> +
> + tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
> + pmd->run.num_m2s_rings;
> + nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
> + RTE_ETHDEV_QUEUE_STAT_CNTRS;
> +
> + /* RX stats */
> + for (i = 0; i < nq; i++) {
> + mq = dev->data->rx_queues[i];
> + stats->q_ipackets[i] = mq->n_pkts;
> + stats->q_ibytes[i] = mq->n_bytes;
> + stats->ipackets += mq->n_pkts;
> + stats->ibytes += mq->n_bytes;
> + }
> +
> + tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
> + pmd->run.num_s2m_rings;
> + nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
> + RTE_ETHDEV_QUEUE_STAT_CNTRS;
> +
> + /* TX stats */
> + for (i = 0; i < nq; i++) {
> + mq = dev->data->tx_queues[i];
> + stats->q_opackets[i] = mq->n_pkts;
> + stats->q_obytes[i] = mq->n_bytes;
> + stats->opackets += mq->n_pkts;
> + stats->obytes += mq->n_bytes;
> + stats->oerrors += mq->n_err;
> + }
> + return 0;
> +}
> +
> +static void
> +memif_stats_reset(struct rte_eth_dev *dev)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> + int i;
> + struct memif_queue *mq;
> +
> + for (i = 0; i < pmd->run.num_s2m_rings; i++) {
> + mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
> + dev->data->rx_queues[i];
> + mq->n_pkts = 0;
> + mq->n_bytes = 0;
> + mq->n_err = 0;
> + }
> + for (i = 0; i < pmd->run.num_m2s_rings; i++) {
> + mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
> + dev->data->tx_queues[i];
> + mq->n_pkts = 0;
> + mq->n_bytes = 0;
> + mq->n_err = 0;
> + }
> +}
> +
> +static int
> +memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
> +{
> + struct pmd_internals *pmd = dev->data->dev_private;
> +
> + MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
> + rte_vdev_device_name(pmd->vdev));
> +
> + return -1;
> +}
> +
> +static int
> +memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
> +{
> + struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
> +
> + return 0;
> +}
> +
> +static const struct eth_dev_ops ops = {
> + .dev_start = memif_dev_start,
> + .dev_close = memif_dev_close,
> + .dev_infos_get = memif_dev_info,
> + .dev_configure = memif_dev_configure,
> + .tx_queue_setup = memif_tx_queue_setup,
> + .rx_queue_setup = memif_rx_queue_setup,
> + .rx_queue_release = memif_queue_release,
> + .tx_queue_release = memif_queue_release,
> + .rx_queue_intr_enable = memif_rx_queue_intr_enable,
> + .rx_queue_intr_disable = memif_rx_queue_intr_disable,
> + .link_update = memif_link_update,
> + .stats_get = memif_stats_get,
> + .stats_reset = memif_stats_reset,
> +};
> +
> +static int
> +memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
> + memif_interface_id_t id, uint32_t flags,
> + const char *socket_filename,
> + memif_log2_ring_size_t log2_ring_size,
> + uint16_t pkt_buffer_size, const char *secret,
> + struct rte_ether_addr *ether_addr)
> +{
> + int ret = 0;
> + struct rte_eth_dev *eth_dev;
> + struct rte_eth_dev_data *data;
> + struct pmd_internals *pmd;
> + const unsigned int numa_node = vdev->device.numa_node;
> + const char *name = rte_vdev_device_name(vdev);
> +
> + if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
> + MIF_LOG(ERR, "Zero-copy slave not supported.");
> + return -1;
> + }
> +
> + eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
> + if (eth_dev == NULL) {
> + MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
> + return -1;
> + }
> +
> + pmd = eth_dev->data->dev_private;
> + memset(pmd, 0, sizeof(*pmd));
> +
> + pmd->vdev = vdev;
> + pmd->id = id;
> + pmd->flags = flags;
> + pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
> + pmd->role = role;
> +
> + ret = memif_socket_init(eth_dev, socket_filename);
> + if (ret < 0)
> + return ret;
> +
> + memset(pmd->secret, 0, sizeof(char) * ETH_MEMIF_SECRET_SIZE);
> + if (secret != NULL)
> + strlcpy(pmd->secret, secret, sizeof(pmd->secret));
> +
> + pmd->cfg.log2_ring_size = log2_ring_size;
> + /* set in .dev_configure() */
> + pmd->cfg.num_s2m_rings = 0;
> + pmd->cfg.num_m2s_rings = 0;
> +
> + pmd->cfg.pkt_buffer_size = pkt_buffer_size;
> +
> + data = eth_dev->data;
> + data->dev_private = pmd;
> + data->numa_node = numa_node;
> + data->mac_addrs = ether_addr;
> +
> + eth_dev->dev_ops = &ops;
> + eth_dev->device = &vdev->device;
> + eth_dev->rx_pkt_burst = eth_memif_rx;
> + eth_dev->tx_pkt_burst = eth_memif_tx;
> +
> + eth_dev->data->dev_flags |= RTE_ETH_DEV_CLOSE_REMOVE;
> +
> + rte_eth_dev_probing_finish(eth_dev);
> +
> + return 0;
> +}
> +
> +static int
> +memif_set_role(const char *key __rte_unused, const char *value,
> + void *extra_args)
> +{
> + enum memif_role_t *role = (enum memif_role_t *)extra_args;
> +
> + if (strstr(value, "master") != NULL) {
> + *role = MEMIF_ROLE_MASTER;
> + } else if (strstr(value, "slave") != NULL) {
> + *role = MEMIF_ROLE_SLAVE;
> + } else {
> + MIF_LOG(ERR, "Unknown role: %s.", value);
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +static int
> +memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + uint32_t *flags = (uint32_t *)extra_args;
> +
> + if (strstr(value, "yes") != NULL) {
> + *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
> + } else if (strstr(value, "no") != NULL) {
> + *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
> + } else {
> + MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
> + return -EINVAL;
> + }
> + return 0;
> +}
> +
> +static int
> +memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
> +
> + /* even if parsing fails, 0 is a valid id */
> + *id = strtoul(value, NULL, 10);
> + return 0;
> +}
> +
> +static int
> +memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + unsigned long tmp;
> + uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
> +
> + tmp = strtoul(value, NULL, 10);
> + if (tmp == 0 || tmp > 0xFFFF) {
> + MIF_LOG(ERR, "Invalid buffer size: %s.", value);
> + return -EINVAL;
> + }
> + *pkt_buffer_size = tmp;
> + return 0;
> +}
> +
> +static int
> +memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + unsigned long tmp;
> + memif_log2_ring_size_t *log2_ring_size =
> + (memif_log2_ring_size_t *)extra_args;
> +
> + tmp = strtoul(value, NULL, 10);
> + if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
> + MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
> + value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
> + return -EINVAL;
> + }
> + *log2_ring_size = tmp;
> + return 0;
> +}
> +
> +/* check if directory exists and if we have permission to read/write */
> +static int
> +memif_check_socket_filename(const char *filename)
> +{
> + char *dir = NULL, *tmp;
> + uint32_t idx;
> + int ret = 0;
> +
> + tmp = strrchr(filename, '/');
> + if (tmp != NULL) {
> + idx = tmp - filename;
> + dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
> + if (dir == NULL) {
> + MIF_LOG(ERR, "Failed to allocate memory.");
> + return -1;
> + }
> + strlcpy(dir, filename, sizeof(char) * (idx + 1));
> + }
> +
> + if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
> + W_OK, AT_EACCESS) < 0)) {
> + MIF_LOG(ERR, "Invalid socket directory.");
> + ret = -EINVAL;
> + }
> +
> + if (dir != NULL)
> + rte_free(dir);
> +
> + return ret;
> +}
> +
> +static int
> +memif_set_socket_filename(const char *key __rte_unused, const char *value,
> + void *extra_args)
> +{
> + const char **socket_filename = (const char **)extra_args;
> +
> + *socket_filename = value;
> + return memif_check_socket_filename(*socket_filename);
> +}
> +
> +static int
> +memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + struct rte_ether_addr *ether_addr = (struct rte_ether_addr *)extra_args;
> + int ret = 0;
> +
> + ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
> + &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
> + &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
> + &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
> + if (ret != 6)
> + MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
> + return 0;
> +}
> +
> +static int
> +memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
> +{
> + const char **secret = (const char **)extra_args;
> +
> + *secret = value;
> + return 0;
> +}
> +
> +static int
> +rte_pmd_memif_probe(struct rte_vdev_device *vdev)
> +{
> + RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
> + RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
> + int ret = 0;
> + struct rte_kvargs *kvlist;
> + const char *name = rte_vdev_device_name(vdev);
> + enum memif_role_t role = MEMIF_ROLE_SLAVE;
> + memif_interface_id_t id = 0;
> + uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
> + memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
> + const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
> + uint32_t flags = 0;
> + const char *secret = NULL;
> + struct rte_ether_addr *ether_addr = rte_zmalloc("", sizeof(struct rte_ether_addr), 0);
> +
> + rte_eth_random_addr(ether_addr->addr_bytes);
> +
> + MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
> +
> + if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> + MIF_LOG(ERR, "Multi-processing not supported for memif.");
> + /* TODO:
> + * Request connection information.
> + *
> + * Once memif in the primary process is connected,
> + * broadcast connection information.
> + */
> + return -1;
> + }
> +
> + kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
> +
> + /* parse parameters */
> + if (kvlist != NULL) {
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
> + &memif_set_role, &role);
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
> + &memif_set_id, &id);
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
> + &memif_set_bs, &pkt_buffer_size);
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
> + &memif_set_rs, &log2_ring_size);
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
> + &memif_set_socket_filename,
> + (void *)(&socket_filename));
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
> + &memif_set_mac, ether_addr);
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
> + &memif_set_zc, &flags);
> + if (ret < 0)
> + goto exit;
> + ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
> + &memif_set_secret, (void *)(&secret));
> + if (ret < 0)
> + goto exit;
> + }
> +
> + /* create interface */
> + ret = memif_create(vdev, role, id, flags, socket_filename,
> + log2_ring_size, pkt_buffer_size, secret, ether_addr);
> +
> +exit:
> + if (kvlist != NULL)
> + rte_kvargs_free(kvlist);
> + return ret;
> +}
> +
> +static int
> +rte_pmd_memif_remove(struct rte_vdev_device *vdev)
> +{
> + struct rte_eth_dev *eth_dev;
> + int i;
> +
> + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
> + if (eth_dev == NULL)
> + return 0;
> +
> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->rx_queues[i]);
> + for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->tx_queues[i]);
> +
> + rte_free(eth_dev->process_private);
> + eth_dev->process_private = NULL;
> +
> + rte_eth_dev_close(eth_dev->data->port_id);
> +
> + return 0;
> +}
> +
> +static struct rte_vdev_driver pmd_memif_drv = {
> + .probe = rte_pmd_memif_probe,
> + .remove = rte_pmd_memif_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
> +
> +RTE_PMD_REGISTER_PARAM_STRING(net_memif,
> + ETH_MEMIF_ID_ARG "=<int>"
> + ETH_MEMIF_ROLE_ARG "=master|slave"
> + ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
> + ETH_MEMIF_RING_SIZE_ARG "=<int>"
> + ETH_MEMIF_SOCKET_ARG "=<string>"
> + ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
> + ETH_MEMIF_ZC_ARG "=yes|no"
> + ETH_MEMIF_SECRET_ARG "=<string>");
> +
> +int memif_logtype;
> +
> +RTE_INIT(memif_init_log)
> +{
> + memif_logtype = rte_log_register("pmd.net.memif");
> + if (memif_logtype >= 0)
> + rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
> +}
> diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
> new file mode 100644
> index 000000000..8cee2643c
> --- /dev/null
> +++ b/drivers/net/memif/rte_eth_memif.h
> @@ -0,0 +1,212 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
> + */
> +
> +#ifndef _RTE_ETH_MEMIF_H_
> +#define _RTE_ETH_MEMIF_H_
> +
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif /* _GNU_SOURCE */
> +
> +#include <sys/queue.h>
> +
> +#include <rte_ethdev_driver.h>
> +#include <rte_ether.h>
> +#include <rte_interrupts.h>
> +
> +#include "memif.h"
> +
> +#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/run/memif.sock"
> +#define ETH_MEMIF_DEFAULT_RING_SIZE 10
> +#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
> +
> +#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
> +#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
> +#define ETH_MEMIF_MAX_REGION_NUM 256
> +
> +#define ETH_MEMIF_SHM_NAME_SIZE 32
> +#define ETH_MEMIF_DISC_STRING_SIZE 96
> +#define ETH_MEMIF_SECRET_SIZE 24
> +
> +extern int memif_logtype;
> +
> +#define MIF_LOG(level, fmt, args...) \
> + rte_log(RTE_LOG_ ## level, memif_logtype, \
> + "%s(): " fmt "\n", __func__, ##args)
> +
> +enum memif_role_t {
> + MEMIF_ROLE_MASTER,
> + MEMIF_ROLE_SLAVE,
> +};
> +
> +struct memif_region {
> + void *addr; /**< shared memory address */
> + memif_region_size_t region_size; /**< shared memory size */
> + int fd; /**< shared memory file descriptor */
> + uint32_t pkt_buffer_offset;
> + /**< offset from 'addr' to first packet buffer */
> +};
> +
> +struct memif_queue {
> + struct rte_mempool *mempool; /**< mempool for RX packets */
> + struct pmd_internals *pmd; /**< device internals */
> +
> + memif_ring_type_t type; /**< ring type */
> + memif_region_index_t region; /**< shared memory region index */
> +
> + uint16_t in_port; /**< port id */
> +
> + memif_region_offset_t ring_offset;
> + /**< ring offset from start of shm region (ring - memif_region.addr) */
> +
> + uint16_t last_head; /**< last ring head */
> + uint16_t last_tail; /**< last ring tail */
> +
> + /* rx/tx info */
> + uint64_t n_pkts; /**< number of rx/tx packets */
> + uint64_t n_bytes; /**< number of rx/tx bytes */
> + uint64_t n_err; /**< number of tx errors */
> +
> + memif_ring_t *ring; /**< pointer to ring */
> +
> + struct rte_intr_handle intr_handle; /**< interrupt handle */
> +
> + memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
> +};
> +
> +struct pmd_internals {
> + memif_interface_id_t id; /**< unique id */
> + enum memif_role_t role; /**< device role */
> + uint32_t flags; /**< device status flags */
> +#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
> +/**< device is connecting */
> +#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
> +/**< device is connected */
> +#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
> +/**< device is zero-copy enabled */
> +#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
> +/**< device has not been configured and can not accept connection requests */
> +
> + char *socket_filename; /**< pointer to socket filename */
> + char secret[ETH_MEMIF_SECRET_SIZE]; /**< secret (optional security parameter) */
> +
> + struct memif_control_channel *cc; /**< control channel */
> +
> + struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
> + /**< shared memory regions */
> + memif_region_index_t regions_num; /**< number of regions */
> +
> + /* remote info */
> + char remote_name[RTE_DEV_NAME_MAX_LEN]; /**< remote app name */
> + char remote_if_name[RTE_DEV_NAME_MAX_LEN]; /**< remote peer name */
> +
> + struct {
> + memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
> + uint8_t num_s2m_rings; /**< number of slave to master rings */
> + uint8_t num_m2s_rings; /**< number of master to slave rings */
> + uint16_t pkt_buffer_size; /**< buffer size */
> + } cfg; /**< Configured parameters (max values) */
> +
> + struct {
> + memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
> + uint8_t num_s2m_rings; /**< number of slave to master rings */
> + uint8_t num_m2s_rings; /**< number of master to slave rings */
> + uint16_t pkt_buffer_size; /**< buffer size */
> + } run;
> + /**< Parameters used in active connection */
> +
> + char local_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
> + /**< local disconnect reason */
> + char remote_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
> + /**< remote disconnect reason */
> +
> + struct rte_vdev_device *vdev; /**< vdev handle */
> +};
> +
> +/**
> + * Unmap shared memory and free regions from memory.
> + *
> + * @param pmd
> + * device internals
> + */
> +void memif_free_regions(struct pmd_internals *pmd);
> +
> +/**
> + * Finalize connection establishment process. Map shared memory file
> + * (master role), initialize ring queue, set link status up.
> + *
> + * @param pmd
> + * device internals
> + * @return
> + * - On success, zero.
> + * - On failure, a negative value.
> + */
> +int memif_connect(struct rte_eth_dev *dev);
> +
> +/**
> + * Create shared memory file and initialize ring queue.
> + * Only called by slave when establishing connection
> + *
> + * @param pmd
> + * device internals
> + * @return
> + * - On success, zero.
> + * - On failure, a negative value.
> + */
> +int memif_init_regions_and_queues(struct rte_eth_dev *dev);
> +
> +/**
> + * Get memif version string.
> + *
> + * @return
> + * - memif version string
> + */
> +const char *memif_version(void);
> +
> +#ifndef MFD_HUGETLB
> +#ifndef __NR_memfd_create
> +
> +#if defined __x86_64__
> +#define __NR_memfd_create 319
> +#elif defined __x86_32__
> +#define __NR_memfd_create 1073742143
> +#elif defined __arm__
> +#define __NR_memfd_create 385
> +#elif defined __aarch64__
> +#define __NR_memfd_create 279
> +#elif defined __powerpc__
> +#define __NR_memfd_create 360
> +#elif defined __i386__
> +#define __NR_memfd_create 356
> +#else
> +#error "__NR_memfd_create unknown for this architecture"
> +#endif
> +
> +#endif /* __NR_memfd_create */
> +
> +static inline int memfd_create(const char *name, unsigned int flags)
> +{
> + return syscall(__NR_memfd_create, name, flags);
> +}
> +#endif /* MFD_HUGETLB */
> +
> +#ifndef F_LINUX_SPECIFIC_BASE
> +#define F_LINUX_SPECIFIC_BASE 1024
> +#endif
> +
> +#ifndef MFD_ALLOW_SEALING
> +#define MFD_ALLOW_SEALING 0x0002U
> +#endif
> +
> +#ifndef F_ADD_SEALS
> +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
> +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
> +
> +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
> +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
> +#define F_SEAL_GROW 0x0004 /* prevent file from growing */
> +#define F_SEAL_WRITE 0x0008 /* prevent writes */
> +#endif
> +
> +#endif /* _RTE_ETH_MEMIF_H_ */
> diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
> new file mode 100644
> index 000000000..8861484fb
> --- /dev/null
> +++ b/drivers/net/memif/rte_pmd_memif_version.map
> @@ -0,0 +1,4 @@
> +DPDK_19.08 {
> +
> + local: *;
> +};
> diff --git a/drivers/net/meson.build b/drivers/net/meson.build
> index ed99896c3..b57073483 100644
> --- a/drivers/net/meson.build
> +++ b/drivers/net/meson.build
> @@ -24,6 +24,7 @@ drivers = ['af_packet',
> 'ixgbe',
> 'kni',
> 'liquidio',
> + 'memif',
> 'mlx4',
> 'mlx5',
> 'mvneta',
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 7c9b4b538..326b2a30b 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
> _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
> endif
> _LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
> _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
> _LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
> ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
> --
> 2.17.1
^ permalink raw reply [flat|nested] 58+ messages in thread
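[Editor's note: for readers trying the driver, the parameter string registered via RTE_PMD_REGISTER_PARAM_STRING in the patch maps directly to vdev arguments. A hypothetical testpmd invocation follows; the socket path, core lists, and device name are illustrative, only the argument keys come from the patch.]

```shell
# Master side: create a memif vdev listening on an illustrative socket path.
./testpmd -l 0-1 --vdev=net_memif0,role=master,id=0,socket=/run/memif.sock -- -i

# Slave side (separate process): connect to the same socket.
./testpmd -l 2-3 --file-prefix=slave \
    --vdev=net_memif0,role=slave,id=0,socket=/run/memif.sock -- -i
```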
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-05-31 6:22 ` [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD Jakub Grajciar
2019-05-31 7:43 ` Ye Xiaolong
2019-06-03 13:37 ` Aaron Conole
@ 2019-06-05 11:55 ` Ferruh Yigit
2019-06-06 9:24 ` Ferruh Yigit
2019-06-06 8:24 ` [dpdk-dev] [PATCH v11] " Jakub Grajciar
3 siblings, 1 reply; 58+ messages in thread
From: Ferruh Yigit @ 2019-06-05 11:55 UTC (permalink / raw)
To: Jakub Grajciar, dev
On 5/31/2019 7:22 AM, Jakub Grajciar wrote:
> Memory interface (memif), provides high performance
> packet transfer over shared memory.
Almost there, can you please check below comments? I am hoping to merge next
version of the patch.
Thanks,
ferruh
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
<...>
> +static const char *valid_arguments[] = {
> + ETH_MEMIF_ID_ARG,
> + ETH_MEMIF_ROLE_ARG,
> + ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
> + ETH_MEMIF_RING_SIZE_ARG,
> + ETH_MEMIF_SOCKET_ARG,
> + ETH_MEMIF_MAC_ARG,
> + ETH_MEMIF_ZC_ARG,
> + ETH_MEMIF_SECRET_ARG,
> + NULL
> +};
Checkpatch is giving following warning:
WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should probably be
static const char * const
#1885: FILE: drivers/net/memif/rte_eth_memif.c:39:
+static const char *valid_arguments[] = {
<...>
> +static int
> +rte_pmd_memif_probe(struct rte_vdev_device *vdev)
> +{
> + RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
> + RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
> + int ret = 0;
> + struct rte_kvargs *kvlist;
> + const char *name = rte_vdev_device_name(vdev);
> + enum memif_role_t role = MEMIF_ROLE_SLAVE;
> + memif_interface_id_t id = 0;
> + uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
> + memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
> + const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
> + uint32_t flags = 0;
> + const char *secret = NULL;
> + struct rte_ether_addr *ether_addr = rte_zmalloc("", sizeof(struct rte_ether_addr), 0);
This is a long line, and breaking it won't reduce readability; can you
please break it? Checkpatch warning:
WARNING:LONG_LINE: line over 80 characters
#2939: FILE: drivers/net/memif/rte_eth_memif.c:1093:
+ struct rte_ether_addr *ether_addr = rte_zmalloc("", sizeof(struct
rte_ether_addr), 0);
<...>
> +static int
> +rte_pmd_memif_remove(struct rte_vdev_device *vdev)
> +{
> + struct rte_eth_dev *eth_dev;
> + int i;
> +
> + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
> + if (eth_dev == NULL)
> + return 0;
> +
> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->rx_queues[i]);
> + for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->tx_queues[i]);
Although they point same function, better to use 'dev_ops->tx_queue_release' for
Tx queues.
> +
> + rte_free(eth_dev->process_private);
> + eth_dev->process_private = NULL;
"process_private" is not used in this PMD at all, no need to free it I think.
> +
> + rte_eth_dev_close(eth_dev->data->port_id);
There are two exit path from a PMD:
1) rte_eth_dev_close() API
2) rte_vdev_driver->remove() called by hotplug remove APIs ('rte_dev_remove()'
or 'rte_eal_hotplug_remove()')
Both should clear all PMD allocated resources. Since you are calling
'rte_eth_dev_close() from this .remove() function, it makes sense to move all
resource cleanup to .dev_close (like queue cleanup calls above).
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-06-05 11:55 ` Ferruh Yigit
@ 2019-06-06 9:24 ` Ferruh Yigit
2019-06-06 10:25 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
0 siblings, 1 reply; 58+ messages in thread
From: Ferruh Yigit @ 2019-06-06 9:24 UTC (permalink / raw)
To: Jakub Grajciar, dev
On 6/5/2019 12:55 PM, Ferruh Yigit wrote:
> On 5/31/2019 7:22 AM, Jakub Grajciar wrote:
>> Memory interface (memif), provides high performance
>> packet transfer over shared memory.
>
> Almost there, can you please check below comments? I am hoping to merge next
> version of the patch.
>
> Thanks,
> ferruh
>
>>
>> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
>
> <...>
>
>> +static const char *valid_arguments[] = {
>> + ETH_MEMIF_ID_ARG,
>> + ETH_MEMIF_ROLE_ARG,
>> + ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
>> + ETH_MEMIF_RING_SIZE_ARG,
>> + ETH_MEMIF_SOCKET_ARG,
>> + ETH_MEMIF_MAC_ARG,
>> + ETH_MEMIF_ZC_ARG,
>> + ETH_MEMIF_SECRET_ARG,
>> + NULL
>> +};
>
> Checkpatch is giving following warning:
>
> WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should probably be
> static const char * const
> #1885: FILE: drivers/net/memif/rte_eth_memif.c:39:
> +static const char *valid_arguments[] = {
>
> <...>
>
>> +static int
>> +rte_pmd_memif_remove(struct rte_vdev_device *vdev)
>> +{
>> + struct rte_eth_dev *eth_dev;
>> + int i;
>> +
>> + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
>> + if (eth_dev == NULL)
>> + return 0;
>> +
>> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
>> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->rx_queues[i]);
>> + for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
>> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->tx_queues[i]);
>
> Although they point same function, better to use 'dev_ops->tx_queue_release' for
> Tx queues.
>
>> +
>> + rte_free(eth_dev->process_private);
>> + eth_dev->process_private = NULL;
>
> "process_private" is not used in this PMD at all, no need to free it I think.
>
>> +
>> + rte_eth_dev_close(eth_dev->data->port_id);
>
> There are two exit path from a PMD:
> 1) rte_eth_dev_close() API
> 2) rte_vdev_driver->remove() called by hotplug remove APIs ('rte_dev_remove()'
> or 'rte_eal_hotplug_remove()')
>
> Both should clear all PMD allocated resources. Since you are calling
> 'rte_eth_dev_close() from this .remove() function, it makes sense to move all
> resource cleanup to .dev_close (like queue cleanup calls above).
>
Hi Jakub,
Above comments seem not implemented in v11, can you please check them?
If anything is not clear feel free to reach me on the irc.
Thanks,
ferruh
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-06-06 9:24 ` Ferruh Yigit
@ 2019-06-06 10:25 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
2019-06-06 11:18 ` Ferruh Yigit
0 siblings, 1 reply; 58+ messages in thread
From: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco) @ 2019-06-06 10:25 UTC (permalink / raw)
To: Ferruh Yigit, dev
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@intel.com>
> Sent: Thursday, June 6, 2019 11:24 AM
> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
> <jgrajcia@cisco.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface
> (memif) PMD
>
> On 6/5/2019 12:55 PM, Ferruh Yigit wrote:
> > On 5/31/2019 7:22 AM, Jakub Grajciar wrote:
> >> Memory interface (memif), provides high performance packet transfer
> >> over shared memory.
> >
> > Almost there, can you please check below comments? I am hoping to
> > merge next version of the patch.
> >
> > Thanks,
> > ferruh
> >
> >>
> >> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
> >
> > <...>
> >
> >> +static const char *valid_arguments[] = {
> >> + ETH_MEMIF_ID_ARG,
> >> + ETH_MEMIF_ROLE_ARG,
> >> + ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
> >> + ETH_MEMIF_RING_SIZE_ARG,
> >> + ETH_MEMIF_SOCKET_ARG,
> >> + ETH_MEMIF_MAC_ARG,
> >> + ETH_MEMIF_ZC_ARG,
> >> + ETH_MEMIF_SECRET_ARG,
> >> + NULL
> >> +};
> >
> > Checkpatch is giving following warning:
> >
> > WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should
> > probably be static const char * const
> > #1885: FILE: drivers/net/memif/rte_eth_memif.c:39:
> > +static const char *valid_arguments[] = {
> >
> > <...>
> >
> >> +static int
> >> +rte_pmd_memif_remove(struct rte_vdev_device *vdev) {
> >> + struct rte_eth_dev *eth_dev;
> >> + int i;
> >> +
> >> + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
> >> + if (eth_dev == NULL)
> >> + return 0;
> >> +
> >> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
> >> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data-
> >rx_queues[i]);
> >> + for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
> >> +
> >> +(*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->tx_queues[i]);
> >
> > Although they point same function, better to use
> > 'dev_ops->tx_queue_release' for Tx queues.
> >
> >> +
> >> + rte_free(eth_dev->process_private);
> >> + eth_dev->process_private = NULL;
> >
> > "process_private" is not used in this PMD at all, no need to free it I think.
> >
> >> +
> >> + rte_eth_dev_close(eth_dev->data->port_id);
> >
> > There are two exit path from a PMD:
> > 1) rte_eth_dev_close() API
> > 2) rte_vdev_driver->remove() called by hotplug remove APIs
> ('rte_dev_remove()'
> > or 'rte_eal_hotplug_remove()')
> >
> > Both should clear all PMD allocated resources. Since you are calling
> > 'rte_eth_dev_close() from this .remove() function, it makes sense to
> > move all resource cleanup to .dev_close (like queue cleanup calls above).
> >
>
> Hi Jakub,
>
> Above comments seems not implemented in v11, can you please check them?
> If anything is not clear feel free to reach me on the irc.
>
Sorry, I missed that mail. I'll get it fixed right away, but I don't understand what's wrong with 'static const char *valid_arguments[]...'; other PMDs handle args the same way. Can you please give me a hint?
Thanks
> Thanks,
> ferruh
* Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD
2019-06-06 10:25 ` Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
@ 2019-06-06 11:18 ` Ferruh Yigit
0 siblings, 0 replies; 58+ messages in thread
From: Ferruh Yigit @ 2019-06-06 11:18 UTC (permalink / raw)
To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco), dev
On 6/6/2019 11:25 AM, Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at
Cisco) wrote:
>
>
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@intel.com>
>> Sent: Thursday, June 6, 2019 11:24 AM
>> To: Jakub Grajciar -X (jgrajcia - PANTHEON TECHNOLOGIES at Cisco)
>> <jgrajcia@cisco.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v10] net/memif: introduce memory interface
>> (memif) PMD
>>
>> On 6/5/2019 12:55 PM, Ferruh Yigit wrote:
>>> On 5/31/2019 7:22 AM, Jakub Grajciar wrote:
>>>> Memory interface (memif), provides high performance packet transfer
>>>> over shared memory.
>>>
>>> Almost there, can you please check below comments? I am hoping to
>>> merge next version of the patch.
>>>
>>> Thanks,
>>> ferruh
>>>
>>>>
>>>> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
>>>
>>> <...>
>>>
>>>> +static const char *valid_arguments[] = {
>>>> + ETH_MEMIF_ID_ARG,
>>>> + ETH_MEMIF_ROLE_ARG,
>>>> + ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
>>>> + ETH_MEMIF_RING_SIZE_ARG,
>>>> + ETH_MEMIF_SOCKET_ARG,
>>>> + ETH_MEMIF_MAC_ARG,
>>>> + ETH_MEMIF_ZC_ARG,
>>>> + ETH_MEMIF_SECRET_ARG,
>>>> + NULL
>>>> +};
>>>
>>> Checkpatch is giving following warning:
>>>
>>> WARNING:STATIC_CONST_CHAR_ARRAY: static const char * array should
>>> probably be static const char * const
>>> #1885: FILE: drivers/net/memif/rte_eth_memif.c:39:
>>> +static const char *valid_arguments[] = {
>>>
>>> <...>
>>>
>>>> +static int
>>>> +rte_pmd_memif_remove(struct rte_vdev_device *vdev) {
>>>> + struct rte_eth_dev *eth_dev;
>>>> + int i;
>>>> +
>>>> + eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
>>>> + if (eth_dev == NULL)
>>>> + return 0;
>>>> +
>>>> + for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
>>>> + (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data-
>>> rx_queues[i]);
>>>> + for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
>>>> +
>>>> +(*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->tx_queues[i]);
>>>
>>> Although they point same function, better to use
>>> 'dev_ops->tx_queue_release' for Tx queues.
>>>
>>>> +
>>>> + rte_free(eth_dev->process_private);
>>>> + eth_dev->process_private = NULL;
>>>
>>> "process_private" is not used in this PMD at all, no need to free it I think.
>>>
>>>> +
>>>> + rte_eth_dev_close(eth_dev->data->port_id);
>>>
>>> There are two exit path from a PMD:
>>> 1) rte_eth_dev_close() API
>>> 2) rte_vdev_driver->remove() called by hotplug remove APIs
>> ('rte_dev_remove()'
>>> or 'rte_eal_hotplug_remove()')
>>>
>>> Both should clear all PMD allocated resources. Since you are calling
>>> 'rte_eth_dev_close() from this .remove() function, it makes sense to
>>> move all resource cleanup to .dev_close (like queue cleanup calls above).
>>>
>>
>> Hi Jakub,
>>
>> Above comments seems not implemented in v11, can you please check them?
>> If anything is not clear feel free to reach me on the irc.
>>
>
>
> Sorry, I missed that mail. I'll get it fixed right away, but I don't understand what's wrong with 'static const char *valid_arguments[]...' other PMDs handle args the same way, can you please give me a hint?
It is not wrong, but it is good to have that second 'const'; it prevents you from
updating valid_arguments[] by mistake. For example, the following will compile without that 'const':
char *p = "test";
valid_arguments[0] = p;
Since we don't support dynamically changing device arguments at runtime, it is good to
have the protection; it won't hurt.
>
> Thanks
>
>> Thanks,
>> ferruh
* [dpdk-dev] [PATCH v11] net/memif: introduce memory interface (memif) PMD
2019-05-31 6:22 ` [dpdk-dev] [PATCH v10] net/memif: introduce memory interface (memif) PMD Jakub Grajciar
` (2 preceding siblings ...)
2019-06-05 11:55 ` Ferruh Yigit
@ 2019-06-06 8:24 ` Jakub Grajciar
2019-06-06 11:38 ` [dpdk-dev] [PATCH v12] " Jakub Grajciar
3 siblings, 1 reply; 58+ messages in thread
From: Jakub Grajciar @ 2019-06-06 8:24 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
Shared memory packet interface (memif) PMD allows DPDK and any other client
using memif (DPDK, VPP, libmemif) to communicate using shared memory.
The created device transmits packets in a raw format. It can be used with
Ethernet mode, IP mode, or Punt/Inject. At the moment, only Ethernet mode is
supported in the DPDK memif implementation. Memif is Linux only.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_08.rst | 5 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 31 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1124 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 15 +
drivers/net/memif/rte_eth_memif.c | 1207 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 212 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
18 files changed, 3146 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a Unix domain socket as its control channel (cc).
rte_interrupts.h is used to handle interrupts for this cc.
This patch is required because the received message can
be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in a secondary process (return error); multi-process support will be implemented in a separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
v8:
- release notes: 19.05 -> 19.08
v9:
- rearrange elements in structures to remove holes
- default socket file in /run
- #define name sizes
- implement dev_close op
v10:
- fix shared lib build
- clear memory allocated by PMD in .dev_close
- fix NULL pointer string 'error: ‘%s’ directive argument is null'
v11:
- fix strict-aliasing
- correct parameter names in doxygen comments
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/config/common_base b/config/common_base
index 6f19ad5d2..e406e7836 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..de2d481eb
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK and any other
+client using memif (DPDK, VPP, libmemif) to communicate using shared memory.
+Memif is Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At the moment, only Ethernet mode is
+supported in the DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. It is also the producer of the shared memory file and
+initializes the shared memory. Each interface can be connected to one peer
+interface at a time; the peer interface is identified by the id parameter.
+The master creates the socket and listens for slave connection requests. The
+socket may already exist on the system. Be sure to remove any such sockets if
+you are creating a master interface, or you will see an "Address already in
+use" error. The function ``rte_pmd_memif_remove()``, which removes a memif
+interface, also removes the listener socket, if it is not being used by any
+other interface.
+
+To enable one or more interfaces, use the ``--vdev=net_memif0`` option on the
+DPDK application command line. Each additional ``--vdev`` option (e.g.
+``--vdev=net_memif1``) creates another interface, named net_memif0,
+net_memif1, and so on. Memif uses a Unix domain socket to transmit control
+messages. Each memif has a unique id per socket; this id is used to identify
+the peer interface. If you are connecting multiple interfaces using the same
+socket, be sure to specify unique ids (``id=0``, ``id=1``, etc.). Note that if
+you assign a socket to a master interface, it becomes a listener socket. A
+listener socket cannot be used by a slave interface on the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "socket=/tmp/memif.sock", "Socket filename", "/tmp/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "MAC address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+To create a memif connection, two memif interfaces are needed, each in a
+separate process: one in the ``master`` role and the other in the ``slave``
+role. It is not possible to connect two interfaces in a single process. Each
+interface can be connected to one interface at a time, identified by the
+matching id parameter.
+
+The memif driver uses a Unix domain socket to exchange the required
+information between memif interfaces. The socket file path is specified at
+interface creation; see the *Memif configuration options* table above. If a
+socket is used by a ``master`` interface, it is marked as a listener socket
+(in the scope of the current process) and listens to connection requests from
+other processes. One socket can be used by multiple interfaces. One process
+can have ``slave`` and ``master`` interfaces at the same time, provided each
+role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+The slave interface attempts to make a connection on the assigned socket. The
+process listening on this socket extracts the connection request and creates
+a new connected socket (control channel). It then sends the 'hello' message
+(``MEMIF_MSG_TYPE_HELLO``), containing its configuration boundaries. The slave
+interface adjusts its configuration accordingly and sends the 'init' message
+(``MEMIF_MSG_TYPE_INIT``). Among other fields, this message contains the
+interface id. The driver uses this id to find the master interface and assigns
+the control channel to that interface. If such an interface is found, an 'ack'
+message (``MEMIF_MSG_TYPE_ACK``) is sent. The slave interface then sends an
+'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for every region
+allocated. The master responds to each of these messages with an 'ack'
+message. The same behavior applies to rings: the slave sends an 'add ring'
+message (``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring, and the
+master again responds to each message with an 'ack'. To finalize the
+connection, the slave interface sends a 'connect' message
+(``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message, the master maps the
+regions into its address space, initializes the rings, and responds with a
+'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). A disconnect
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave
+interfaces at any time, due to a driver error or because the interface is
+being deleted.
+
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master is the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers; rings
+and buffers can also be separated into multiple regions. For non-zero-copy,
+rings and buffers are stored inside a single memory region to reduce the
+number of opened files.
+
+region n (no-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If we have
+one ring in each direction and the ring size is 1024, then the first 1024
+buffers belong to the S2M ring and the last 1024 belong to the M2S ring. In
+the zero-copy case, buffers are dequeued and enqueued as needed.
+
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region the buffer is located in.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd
+----------------
+
+In this example we run two instances of the testpmd application and transmit
+packets over memif.
+
+First, create the ``master`` interface::
+
+ #./build/app/testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so the slave can connect)::
+
+ #./build/app/testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Start forwarding packets::
+
+ Slave:
+ testpmd> start
+
+ Master:
+ testpmd> start tx_first
+
+Show status::
+
+ testpmd> show port stats 0
+
+For more details on testpmd please refer to :doc:`../testpmd_app_ug/index`.
+
+Example: testpmd and VPP
+------------------------
+
+For information on how to get and run VPP, please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to ``icmpecho`` and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index a17e7dea5..46efd13f0 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,11 @@ New Features
Also, make sure to start the actual text at the margin.
=========================================================
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
+
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..c3119625c
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+ /**
+ * M2S
+ * Contains master interfaces configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request master to add new shared memory region to master interface.
+ * The shared file's file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm regions index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add new ring to master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..1e046b6b1
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strncmp(pmd->secret, (char *)i->secret,
+ ETH_MEMIF_SECRET_SIZE) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->ring_offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memif_disconnect(dev);
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->ring_offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING, "%s: Failed to unregister "
+ "control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_SLAVE) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_MASTER) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ memcpy(&afd, CMSG_DATA(cmsg), sizeof(int));
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* get device from hash data */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ socklen_t addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ &addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ un.sun_family = AF_UNIX;
+ memcpy(un.sun_path, sock->filename,
+ sizeof(un.sun_path) - 1);
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ close(sockfd);
+ rte_free(sock);
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role == MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (pmd->socket_filename == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) < 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ sun.sun_family = AF_UNIX;
+
+ memcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path) - 1);
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..db293e200
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * memif device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+ uint8_t listener; /**< if not zero socket is listener */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ memif_msg_t msg; /**< control message */
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
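The control channel batches messages: each memif_msg_enq_* helper appends a memif_msg_queue_elt to the TAILQ in struct memif_control_channel, and memif_msg_send_from_queue later flushes the queue over the socket. A minimal stand-alone sketch of that enqueue/drain pattern (the struct names and fields here are simplified stand-ins, not the driver's real types):

```c
#include <stdlib.h>
#include <sys/queue.h>

/* Simplified stand-ins for memif_msg_t / memif_msg_queue_elt
 * (illustrative only; layout differs from the real driver). */
struct msg_elt {
	int type;			/* message type */
	int fd;				/* fd to be sent to peer, -1 if none */
	TAILQ_ENTRY(msg_elt) next;
};
TAILQ_HEAD(msg_queue, msg_elt);

/* Mirrors memif_msg_enq(): allocate an element and append it to the queue. */
static struct msg_elt *
msg_enq(struct msg_queue *q, int type)
{
	struct msg_elt *e = calloc(1, sizeof(*e));

	if (e == NULL)
		return NULL;
	e->type = type;
	e->fd = -1;
	TAILQ_INSERT_TAIL(q, e, next);
	return e;
}

/* Mirrors memif_msg_send_from_queue(): pop, "send" and free each element. */
static int
msg_drain(struct msg_queue *q)
{
	struct msg_elt *e;
	int sent = 0;

	while ((e = TAILQ_FIRST(q)) != NULL) {
		TAILQ_REMOVE(q, e, next);
		free(e);
		sent++;
	}
	return sent;
}
```

This is also why the helpers only return -1 on allocation failure: a message is never sent inline, it just joins the queue until the caller flushes it.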
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..287a30ef2
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..4e6eeb7b6
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1207 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char *valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
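memif_pktmbuf_chain() folds a freshly allocated segment into a multi-segment packet: the new tail is linked after the current tail, and the head accumulates the segment count and total length. The same accounting on a simplified mbuf stand-in (the field set is reduced to what the function touches):

```c
#include <stdint.h>

#define MAX_NB_SEGS 0xffff	/* stand-in for RTE_MBUF_MAX_NB_SEGS */

struct seg {			/* minimal mbuf stand-in */
	uint16_t nb_segs;
	uint32_t pkt_len;
	uint32_t data_len;
	struct seg *next;
};

/* Mirrors memif_pktmbuf_chain(): link 'tail' after 'cur_tail' and fold
 * its segment count and length into the head. */
static int
chain(struct seg *head, struct seg *cur_tail, struct seg *tail)
{
	if (head->nb_segs + tail->nb_segs > MAX_NB_SEGS)
		return -1;	/* number-of-segments overflow */
	cur_tail->next = tail;
	head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
	tail->pkt_len = tail->data_len;
	head->pkt_len += tail->pkt_len;
	return 0;
}
```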
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
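The ring's head and tail are free-running 16-bit counters: a slot index is obtained by masking with ring_size - 1, and unsigned subtraction yields the correct count of pending slots even after a counter wraps past 65535. A sketch of that arithmetic:

```c
#include <stdint.h>

/* Pending slots between two free-running uint16_t counters; correct
 * across wraparound thanks to modulo-2^16 unsigned arithmetic. */
static uint16_t
slots_available(uint16_t head, uint16_t tail)
{
	return (uint16_t)(head - tail);
}

/* Map a free-running counter to a slot index; ring_size must be a
 * power of two (the driver guarantees this via log2_ring_size). */
static uint16_t
slot_index(uint16_t counter, uint16_t ring_size)
{
	return counter & (uint16_t)(ring_size - 1);
}
```

This is why the receive path can compute n_slots = last_slot - cur_slot without ever resetting the counters.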
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ if (rte_pktmbuf_is_contiguous(mbuf) == 0) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ r->fd = -1;
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED)
+ return -1;
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static void
+memif_dev_close(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
+ memif_disconnect(dev);
+
+ memif_socket_remove_device(dev);
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *mq = (struct memif_queue *)queue;
+
+ if (!mq)
+ return;
+
+ rte_free(mq);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_close = memif_dev_close,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct rte_ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(char) * ETH_MEMIF_SECRET_SIZE);
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_CLOSE_REMOVE;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(-1, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid socket directory.");
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct rte_ether_addr *ether_addr = (struct rte_ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct rte_ether_addr *ether_addr = rte_zmalloc("",
+ sizeof(struct rte_ether_addr), 0);
+
+ rte_eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+ int i;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ for (i = 0; i < eth_dev->data->nb_rx_queues; i++)
+ (*eth_dev->dev_ops->rx_queue_release)(eth_dev->data->rx_queues[i]);
+ for (i = 0; i < eth_dev->data->nb_tx_queues; i++)
+ (*eth_dev->dev_ops->tx_queue_release)(eth_dev->data->tx_queues[i]);
+
+ rte_free(eth_dev->process_private);
+ eth_dev->process_private = NULL;
+
+ rte_eth_dev_close(eth_dev->data->port_id);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..5f631e925
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* _GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/run/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+#define ETH_MEMIF_DISC_STRING_SIZE 96
+#define ETH_MEMIF_SECRET_SIZE 24
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ struct pmd_internals *pmd; /**< device internals */
+
+ memif_ring_type_t type; /**< ring type */
+ memif_region_index_t region; /**< shared memory region index */
+
+ uint16_t in_port; /**< port id */
+
+ memif_region_offset_t ring_offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+
+ memif_ring_t *ring; /**< pointer to ring */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[ETH_MEMIF_SECRET_SIZE]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[RTE_DEV_NAME_MAX_LEN]; /**< remote app name */
+ char remote_if_name[RTE_DEV_NAME_MAX_LEN]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< local disconnect reason */
+ char remote_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __x86_32__
+#define __NR_memfd_create 1073742143
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..8861484fb
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.08 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* [dpdk-dev] [PATCH v12] net/memif: introduce memory interface (memif) PMD
2019-06-06 8:24 ` [dpdk-dev] [PATCH v11] " Jakub Grajciar
@ 2019-06-06 11:38 ` Jakub Grajciar
2019-06-06 14:07 ` Ferruh Yigit
0 siblings, 1 reply; 58+ messages in thread
From: Jakub Grajciar @ 2019-06-06 11:38 UTC (permalink / raw)
To: dev; +Cc: Jakub Grajciar
The shared memory packet interface (memif) PMD allows DPDK and any other
client using memif (DPDK, VPP, libmemif) to communicate over shared
memory. The created device transmits packets in a raw format. It can be
used with Ethernet mode, IP mode, or Punt/Inject. At the moment, only
Ethernet mode is supported in the DPDK memif implementation. Memif is
Linux only.
Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
---
MAINTAINERS | 6 +
config/common_base | 5 +
config/common_linux | 1 +
doc/guides/nics/features/memif.ini | 14 +
doc/guides/nics/index.rst | 1 +
doc/guides/nics/memif.rst | 234 ++++
doc/guides/rel_notes/release_19_08.rst | 5 +
drivers/net/Makefile | 1 +
drivers/net/memif/Makefile | 31 +
drivers/net/memif/memif.h | 179 +++
drivers/net/memif/memif_socket.c | 1124 +++++++++++++++++
drivers/net/memif/memif_socket.h | 105 ++
drivers/net/memif/meson.build | 15 +
drivers/net/memif/rte_eth_memif.c | 1204 +++++++++++++++++++
drivers/net/memif/rte_eth_memif.h | 212 ++++
drivers/net/memif/rte_pmd_memif_version.map | 4 +
drivers/net/meson.build | 1 +
mk/rte.app.mk | 1 +
18 files changed, 3143 insertions(+)
create mode 100644 doc/guides/nics/features/memif.ini
create mode 100644 doc/guides/nics/memif.rst
create mode 100644 drivers/net/memif/Makefile
create mode 100644 drivers/net/memif/memif.h
create mode 100644 drivers/net/memif/memif_socket.c
create mode 100644 drivers/net/memif/memif_socket.h
create mode 100644 drivers/net/memif/meson.build
create mode 100644 drivers/net/memif/rte_eth_memif.c
create mode 100644 drivers/net/memif/rte_eth_memif.h
create mode 100644 drivers/net/memif/rte_pmd_memif_version.map
requires patch: http://patchwork.dpdk.org/patch/51511/
Memif uses a unix domain socket as its control channel (cc);
rte_interrupts.h is used to handle interrupts for this cc.
That patch is required because a received control message
can be a reason for disconnecting.
v3:
- coding style fixes
- documentation
- doxygen comments
- use strlcpy() instead of strncpy()
- use RTE_BUILD_BUG_ON instead of _Static_assert()
- fix meson build deps
v4:
- coding style fixes
- doc update (desc format, messaging, features, ...)
- pointer arithmetic fix
- __rte_packed and __rte_aligned instead of __attribute__()
- fixed multi-queue
- support additional archs (memfd_create syscall)
v5:
- coding style fixes
- add release notes
- remove unused dependencies
- doc update
v6:
- remove socket on termination of testpmd application
- disallow memif probing in secondary process (return error), multi-process support will be implemented in separate patch
- fix chained buffer packet length
- use single region for rings and buffers (non-zero-copy)
- doc update
v7:
- coding style fixes
v8:
- release notes: 19.05 -> 19.08
v9:
- rearrange elements in structures to remove holes
- default socket file in /run
- #define name sizes
- implement dev_close op
v10:
- fix shared lib build
- clear memory allocated by PMD in .dev_close
- fix NULL pointer string 'error: ‘%s’ directive argument is null'
v11:
- fix strict-aliasing
- correct parameter names in doxygen comments
v12:
- fix coding style issues
- free resources in .dev_close
diff --git a/MAINTAINERS b/MAINTAINERS
index 15d0829c5..3698c146b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -841,6 +841,12 @@ F: drivers/net/softnic/
F: doc/guides/nics/features/softnic.ini
F: doc/guides/nics/softnic.rst
+Memif PMD
+M: Jakub Grajciar <jgrajcia@cisco.com>
+F: drivers/net/memif/
+F: doc/guides/nics/memif.rst
+F: doc/guides/nics/features/memif.ini
+
Crypto Drivers
--------------
diff --git a/config/common_base b/config/common_base
index 6f19ad5d2..e406e7836 100644
--- a/config/common_base
+++ b/config/common_base
@@ -444,6 +444,11 @@ CONFIG_RTE_LIBRTE_PMD_AF_PACKET=n
#
CONFIG_RTE_LIBRTE_PMD_AF_XDP=n
+#
+# Compile Memory Interface PMD driver (Linux only)
+#
+CONFIG_RTE_LIBRTE_PMD_MEMIF=n
+
#
# Compile link bonding PMD library
#
diff --git a/config/common_linux b/config/common_linux
index 75334273d..87514fe4f 100644
--- a/config/common_linux
+++ b/config/common_linux
@@ -19,6 +19,7 @@ CONFIG_RTE_LIBRTE_VHOST_POSTCOPY=n
CONFIG_RTE_LIBRTE_PMD_VHOST=y
CONFIG_RTE_LIBRTE_IFC_PMD=y
CONFIG_RTE_LIBRTE_PMD_AF_PACKET=y
+CONFIG_RTE_LIBRTE_PMD_MEMIF=y
CONFIG_RTE_LIBRTE_PMD_SOFTNIC=y
CONFIG_RTE_LIBRTE_PMD_TAP=y
CONFIG_RTE_LIBRTE_AVP_PMD=y
diff --git a/doc/guides/nics/features/memif.ini b/doc/guides/nics/features/memif.ini
new file mode 100644
index 000000000..807d9ecdc
--- /dev/null
+++ b/doc/guides/nics/features/memif.ini
@@ -0,0 +1,14 @@
+;
+; Supported features of the 'memif' network poll mode driver.
+;
+; Refer to default.ini for the full list of available PMD features.
+;
+[Features]
+Link status = Y
+Basic stats = Y
+Jumbo frame = Y
+ARMv8 = Y
+Power8 = Y
+x86-32 = Y
+x86-64 = Y
+Usage doc = Y
diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
index 2221c35f2..691e7209f 100644
--- a/doc/guides/nics/index.rst
+++ b/doc/guides/nics/index.rst
@@ -36,6 +36,7 @@ Network Interface Controller Drivers
intel_vf
kni
liquidio
+ memif
mlx4
mlx5
mvneta
diff --git a/doc/guides/nics/memif.rst b/doc/guides/nics/memif.rst
new file mode 100644
index 000000000..de2d481eb
--- /dev/null
+++ b/doc/guides/nics/memif.rst
@@ -0,0 +1,234 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2018-2019 Cisco Systems, Inc.
+
+======================
+Memif Poll Mode Driver
+======================
+
+The shared memory packet interface (memif) PMD allows DPDK and any other client
+using memif (DPDK, VPP, libmemif) to communicate over shared memory. Memif is
+Linux only.
+
+The created device transmits packets in a raw format. It can be used with
+Ethernet mode, IP mode, or Punt/Inject. At this moment, only Ethernet mode is
+supported in DPDK memif implementation.
+
+Memif works in two roles: master and slave. The slave connects to the master
+over an existing socket. It is also the producer of the shared memory files and
+initializes the shared memory. Each interface can be connected to one peer
+interface at a time. The peer interface is identified by the id parameter. The
+master creates the socket and listens for any slave connection requests. The
+socket may already exist on the system. Be sure to remove any such sockets when
+creating a master interface, or you will see an "Address already in use" error.
+The function ``rte_pmd_memif_remove()``, which removes a memif interface, also
+removes the listener socket if it is not being used by any other interface.
+
+Interfaces are created using the ``--vdev=net_memif0`` option on the DPDK
+application command line. Each additional ``--vdev`` option creates another
+interface, named net_memif0, net_memif1, and so on. Memif uses a Unix domain
+socket to transmit control messages. Each memif has a unique id per socket,
+which is used to identify the peer interface. If you are connecting multiple
+interfaces using the same socket, be sure to specify unique ids (``id=0``,
+``id=1``, etc.). Note that if you assign a socket to a master interface it
+becomes a listener socket. A listener socket cannot be used by a slave
+interface on the same client.
+
+.. csv-table:: **Memif configuration options**
+ :header: "Option", "Description", "Default", "Valid value"
+
+ "id=0", "Used to identify peer interface", "0", "uint32_t"
+ "role=master", "Set memif role", "slave", "master|slave"
+ "bsize=1024", "Size of single packet buffer", "2048", "uint16_t"
+ "rsize=11", "Log2 of ring size. If rsize is 10, actual ring size is 1024", "10", "1-14"
+ "socket=/tmp/memif.sock", "Socket filename", "/run/memif.sock", "string len 256"
+ "mac=01:23:45:ab:cd:ef", "Mac address", "01:ab:23:cd:45:ef", ""
+ "secret=abc123", "Secret is an optional security option, which if specified, must be matched by peer", "", "string len 24"
+ "zero-copy=yes", "Enable/disable zero-copy slave mode", "no", "yes|no"
+
+**Connection establishment**
+
+In order to create a memif connection, two memif interfaces, each in a separate
+process, are needed: one interface in the ``master`` role and the other in the
+``slave`` role. It is not possible to connect two interfaces in a single
+process. Each interface can be connected to one interface at a time,
+identified by a matching id parameter.
+
+The memif driver uses a Unix domain socket to exchange required information
+between memif interfaces. The socket file path is specified at interface
+creation; see the *Memif configuration options* table above. A socket used by a
+``master`` interface is marked as a listener socket (in the scope of the current
+process) and listens to connection requests from other processes. One socket can
+be used by multiple interfaces. One process can have ``slave`` and ``master``
+interfaces at the same time, provided each role is assigned a unique socket.
+
+For detailed information on memif control messages, see: net/memif/memif.h.
+
+Slave interface attempts to make a connection on assigned socket. Process
+listening on this socket will extract the connection request and create a new
+connected socket (control channel). Then it sends the 'hello' message
+(``MEMIF_MSG_TYPE_HELLO``), containing configuration boundaries. Slave interface
+adjusts its configuration accordingly and sends an 'init' message
+(``MEMIF_MSG_TYPE_INIT``). Among other things, this message contains the
+interface id. The driver uses this id to find the matching master interface and
+assigns the control channel to it. If found, an 'ack' message (``MEMIF_MSG_TYPE_ACK``) is
+sent. Slave interface sends 'add region' message (``MEMIF_MSG_TYPE_ADD_REGION``) for
+every region allocated. Master responds to each of these messages with 'ack'
+message. Same behavior applies to rings. Slave sends 'add ring' message
+(``MEMIF_MSG_TYPE_ADD_RING``) for every initialized ring. Master again responds to
+each message with 'ack' message. To finalize the connection, slave interface
+sends 'connect' message (``MEMIF_MSG_TYPE_CONNECT``). Upon receiving this message
+master maps regions to its address space, initializes rings and responds with
+'connected' message (``MEMIF_MSG_TYPE_CONNECTED``). Disconnect
+(``MEMIF_MSG_TYPE_DISCONNECT``) can be sent by both master and slave interfaces at
+any time, due to driver error or if the interface is being deleted.
+
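The configuration adjustment on 'hello' amounts to clamping the slave's requested values to the master's advertised boundaries. A minimal standalone sketch of that negotiation (struct and function names here are illustrative, not the driver's; the `+1` mirrors the driver treating `max_*_ring` as a highest usable index):

```c
#include <stdint.h>

/* Boundaries advertised by the master in MEMIF_MSG_TYPE_HELLO
 * (hypothetical names, field meanings as described above). */
struct hello_caps {
	uint16_t max_s2m_ring;       /* highest usable S2M ring index */
	uint16_t max_m2s_ring;       /* highest usable M2S ring index */
	uint8_t  max_log2_ring_size; /* largest supported log2 ring size */
};

struct run_cfg {
	uint16_t num_s2m_rings;
	uint16_t num_m2s_rings;
	uint8_t  log2_ring_size;
};

static uint16_t min_u16(uint16_t a, uint16_t b) { return a < b ? a : b; }

/* Slave-side negotiation: clamp configured values to master's maxima. */
static struct run_cfg
negotiate(const struct hello_caps *h, const struct run_cfg *cfg)
{
	struct run_cfg run;

	/* max_*_ring is an index, hence +1 when comparing to ring counts */
	run.num_s2m_rings = min_u16(h->max_s2m_ring + 1, cfg->num_s2m_rings);
	run.num_m2s_rings = min_u16(h->max_m2s_ring + 1, cfg->num_m2s_rings);
	run.log2_ring_size = h->max_log2_ring_size < cfg->log2_ring_size ?
			     h->max_log2_ring_size : cfg->log2_ring_size;
	return run;
}
```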
+Files
+
+- net/memif/memif.h *- control messages definitions*
+- net/memif/memif_socket.h
+- net/memif/memif_socket.c
+
+Shared memory
+~~~~~~~~~~~~~
+
+**Shared memory format**
+
+The slave is the producer and the master the consumer. Memory regions are
+mapped shared memory files, created by the memif slave and provided to the
+master at connection establishment. Regions contain rings and buffers; they can
+also be separated into multiple regions. For non-zero-copy, rings and buffers
+are stored inside a single memory region to reduce the number of opened files.
+
+region n (non-zero-copy):
+
++-----------------------+-------------------------------------------------------------------------+
+| Rings | Buffers |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+| S2M rings | M2S rings | packet buffer 0 | . | pb ((1 << pmd->run.log2_ring_size)*(s2m + m2s))-1 |
++-----------+-----------+-----------------+---+---------------------------------------------------+
+
+S2M OR M2S Rings:
+
++--------+--------+-----------------------+
+| ring 0 | ring 1 | ring num_s2m_rings - 1|
++--------+--------+-----------------------+
+
+ring 0:
+
++-------------+---------------------------------------+
+| ring header | (1 << pmd->run.log2_ring_size) * desc |
++-------------+---------------------------------------+
+
+Descriptors are assigned packet buffers in order of ring creation. If there is
+one ring in each direction and the ring size is 1024, then the first 1024
+buffers will belong to the S2M ring and the last 1024 will belong to the M2S
+ring. In the case of zero-copy, buffers are dequeued and enqueued as needed.
+
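The buffer-to-ring assignment above can be checked with a small standalone helper (purely illustrative arithmetic, not driver code; only the layout rule from the paragraph is assumed):

```c
#include <stdint.h>

enum buf_dir { BUF_S2M, BUF_M2S };

/* In the non-zero-copy layout, the first (1 << log2_ring_size) * num_s2m
 * packet buffers belong to S2M rings; the rest belong to M2S rings. */
static enum buf_dir
buffer_direction(uint32_t buf_index, uint8_t log2_ring_size, uint16_t num_s2m)
{
	uint32_t per_ring = 1u << log2_ring_size;

	return buf_index < per_ring * num_s2m ? BUF_S2M : BUF_M2S;
}
```

With one ring in each direction and `log2_ring_size = 10`, buffers 0..1023 land in S2M and 1024..2047 in M2S, matching the description above.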
+**Descriptor format**
+
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Quad|6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | |1|1| | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|Word|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | |6|5| | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+|0 |length |region |flags |
++----+---------------------------------------------------------------+-------------------------------+-------------------------------+
+|1 |metadata |offset |
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |6| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |3|3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
+| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+| |3| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |2|1| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |0|
++----+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+**Flags field - flags (Quad Word 0, bits 0:15)**
+
++-----+--------------------+------------------------------------------------------------------------------------------------+
+|Bits |Name |Functionality |
++=====+====================+================================================================================================+
+|0 |MEMIF_DESC_FLAG_NEXT|Is chained buffer. When set, the packet is divided into multiple buffers. May not be contiguous.|
++-----+--------------------+------------------------------------------------------------------------------------------------+
+
+**Region index - region (Quad Word 0, 16:31)**
+
+Index of the memory region the buffer is located in.
+
+**Data length - length (Quad Word 0, 32:63)**
+
+Length of transmitted/received data.
+
+**Data Offset - offset (Quad Word 1, 0:31)**
+
+Data start offset from memory region address. *.regions[desc->region].addr + desc->offset*
+
+**Metadata - metadata (Quad Word 1, 32:63)**
+
+Buffer metadata.
+
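Putting the descriptor fields together: a consumer resolves the data pointer as *regions[desc->region].addr + desc->offset* and walks ``MEMIF_DESC_FLAG_NEXT`` chains to get the full packet length. A minimal sketch (simplified struct mirroring the field layout in memif.h; the region table is hypothetical):

```c
#include <stdint.h>

#define MEMIF_DESC_FLAG_NEXT 1

/* Simplified descriptor mirroring memif.h field order. */
struct desc {
	uint16_t flags;
	uint16_t region;
	uint32_t length;
	uint32_t offset;
	uint32_t metadata;
};

struct region { void *addr; };

/* Data start for one descriptor: region base plus offset. */
static void *
desc_data(const struct region *regions, const struct desc *d)
{
	return (uint8_t *)regions[d->region].addr + d->offset;
}

/* Total packet length across a MEMIF_DESC_FLAG_NEXT chain. */
static uint32_t
packet_length(const struct desc *ring, uint16_t first, uint16_t mask)
{
	uint32_t len = 0;
	uint16_t i = first;

	for (;;) {
		len += ring[i].length;
		if (!(ring[i].flags & MEMIF_DESC_FLAG_NEXT))
			break;
		i = (i + 1) & mask; /* descriptors wrap around the ring */
	}
	return len;
}
```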
+Files
+
+- net/memif/memif.h *- descriptor and ring definitions*
+- net/memif/rte_eth_memif.c *- eth_memif_rx() eth_memif_tx()*
+
+Example: testpmd
+----------------------------
+In this example we run two instances of the testpmd application and transmit packets over memif.
+
+First create ``master`` interface::
+
+ #./build/app/testpmd -l 0-1 --proc-type=primary --file-prefix=pmd1 --vdev=net_memif,role=master -- -i
+
+Now create the ``slave`` interface (the master must already be running so that the slave can connect)::
+
+ #./build/app/testpmd -l 2-3 --proc-type=primary --file-prefix=pmd2 --vdev=net_memif -- -i
+
+Start forwarding packets::
+
+ Slave:
+ testpmd> start
+
+ Master:
+ testpmd> start tx_first
+
+Show status::
+
+ testpmd> show port stats 0
+
+For more details on testpmd please refer to :doc:`../testpmd_app_ug/index`.
+
+Example: testpmd and VPP
+------------------------
+For information on how to get and run VPP please see `<https://wiki.fd.io/view/VPP>`_.
+
+Start VPP in interactive mode (the default). Create a memif master interface in VPP::
+
+ vpp# create interface memif id 0 master no-zero-copy
+ vpp# set interface state memif0/0 up
+ vpp# set interface ip address memif0/0 192.168.1.1/24
+
+To see the socket filename, use the ``show memif`` command::
+
+ vpp# show memif
+ sockets
+ id listener filename
+ 0 yes (1) /run/vpp/memif.sock
+ ...
+
+Now create a memif interface by running testpmd with these command line options::
+
+ #./testpmd --vdev=net_memif,socket=/run/vpp/memif.sock -- -i
+
+Testpmd should now create a memif slave interface and try to connect to the
+master. In testpmd, set the forwarding mode to icmpecho and start forwarding::
+
+ testpmd> set fwd icmpecho
+ testpmd> start
+
+Send ping from VPP::
+
+ vpp# ping 192.168.1.2
+ 64 bytes from 192.168.1.2: icmp_seq=2 ttl=254 time=36.2918 ms
+ 64 bytes from 192.168.1.2: icmp_seq=3 ttl=254 time=23.3927 ms
+ 64 bytes from 192.168.1.2: icmp_seq=4 ttl=254 time=24.2975 ms
+ 64 bytes from 192.168.1.2: icmp_seq=5 ttl=254 time=17.7049 ms
diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
index a17e7dea5..46efd13f0 100644
--- a/doc/guides/rel_notes/release_19_08.rst
+++ b/doc/guides/rel_notes/release_19_08.rst
@@ -54,6 +54,11 @@ New Features
Also, make sure to start the actual text at the margin.
=========================================================
+* **Added memif PMD.**
+
+ Added the new Shared Memory Packet Interface (``memif``) PMD.
+ See the :doc:`../nics/memif` guide for more details on this new driver.
+
Removed Items
-------------
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3a72cf38c..78cb10fc6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -35,6 +35,7 @@ DIRS-$(CONFIG_RTE_LIBRTE_ICE_PMD) += ice
DIRS-$(CONFIG_RTE_LIBRTE_IPN3KE_PMD) += ipn3ke
DIRS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD) += ixgbe
DIRS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += liquidio
+DIRS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif
DIRS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += mlx4
DIRS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += mlx5
DIRS-$(CONFIG_RTE_LIBRTE_MVNETA_PMD) += mvneta
diff --git a/drivers/net/memif/Makefile b/drivers/net/memif/Makefile
new file mode 100644
index 000000000..c3119625c
--- /dev/null
+++ b/drivers/net/memif/Makefile
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+#
+# library name
+#
+LIB = librte_pmd_memif.a
+
+EXPORT_MAP := rte_pmd_memif_version.map
+
+LIBABIVER := 1
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+LDLIBS += -lrte_eal -lrte_mbuf -lrte_mempool
+LDLIBS += -lrte_ethdev -lrte_kvargs
+LDLIBS += -lrte_hash
+LDLIBS += -lrte_bus_vdev
+
+#
+# all source are stored in SRCS-y
+#
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += rte_eth_memif.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += memif_socket.c
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/drivers/net/memif/memif.h b/drivers/net/memif/memif.h
new file mode 100644
index 000000000..3948b1f50
--- /dev/null
+++ b/drivers/net/memif/memif.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_H_
+#define _MEMIF_H_
+
+#define MEMIF_COOKIE 0x3E31F20
+#define MEMIF_VERSION_MAJOR 2
+#define MEMIF_VERSION_MINOR 0
+#define MEMIF_VERSION ((MEMIF_VERSION_MAJOR << 8) | MEMIF_VERSION_MINOR)
+#define MEMIF_NAME_SZ 32
+
+/*
+ * S2M: direction slave -> master
+ * M2S: direction master -> slave
+ */
+
+/*
+ * Type definitions
+ */
+
+typedef enum memif_msg_type {
+ MEMIF_MSG_TYPE_NONE,
+ MEMIF_MSG_TYPE_ACK,
+ MEMIF_MSG_TYPE_HELLO,
+ MEMIF_MSG_TYPE_INIT,
+ MEMIF_MSG_TYPE_ADD_REGION,
+ MEMIF_MSG_TYPE_ADD_RING,
+ MEMIF_MSG_TYPE_CONNECT,
+ MEMIF_MSG_TYPE_CONNECTED,
+ MEMIF_MSG_TYPE_DISCONNECT,
+} memif_msg_type_t;
+
+typedef enum {
+ MEMIF_RING_S2M, /**< buffer ring in direction slave -> master */
+ MEMIF_RING_M2S, /**< buffer ring in direction master -> slave */
+} memif_ring_type_t;
+
+typedef enum {
+ MEMIF_INTERFACE_MODE_ETHERNET,
+ MEMIF_INTERFACE_MODE_IP,
+ MEMIF_INTERFACE_MODE_PUNT_INJECT,
+} memif_interface_mode_t;
+
+typedef uint16_t memif_region_index_t;
+typedef uint32_t memif_region_offset_t;
+typedef uint64_t memif_region_size_t;
+typedef uint16_t memif_ring_index_t;
+typedef uint32_t memif_interface_id_t;
+typedef uint16_t memif_version_t;
+typedef uint8_t memif_log2_ring_size_t;
+
+/*
+ * Socket messages
+ */
+
+ /**
+ * M2S
+ * Contains master interfaces configuration.
+ */
+typedef struct __rte_packed {
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+ memif_version_t min_version; /**< lowest supported memif version */
+ memif_version_t max_version; /**< highest supported memif version */
+ memif_region_index_t max_region; /**< maximum num of regions */
+ memif_ring_index_t max_m2s_ring; /**< maximum num of M2S rings */
+ memif_ring_index_t max_s2m_ring; /**< maximum num of S2M rings */
+ memif_log2_ring_size_t max_log2_ring_size; /**< maximum ring size (as log2) */
+} memif_msg_hello_t;
+
+/**
+ * S2M
+ * Contains information required to identify interface
+ * to which the slave wants to connect.
+ */
+typedef struct __rte_packed {
+ memif_version_t version; /**< memif version */
+ memif_interface_id_t id; /**< interface id */
+ memif_interface_mode_t mode:8; /**< interface mode */
+ uint8_t secret[24]; /**< optional security parameter */
+ uint8_t name[MEMIF_NAME_SZ]; /**< Client app name. In this case DPDK version */
+} memif_msg_init_t;
+
+/**
+ * S2M
+ * Request master to add new shared memory region to master interface.
+ * The shared file's file descriptor is passed in cmsghdr.
+ */
+typedef struct __rte_packed {
+ memif_region_index_t index; /**< shm regions index */
+ memif_region_size_t size; /**< shm region size */
+} memif_msg_add_region_t;
+
+/**
+ * S2M
+ * Request master to add new ring to master interface.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_MSG_ADD_RING_FLAG_S2M 1 /**< ring is in S2M direction */
+ memif_ring_index_t index; /**< ring index */
+ memif_region_index_t region; /**< region index on which this ring is located */
+ memif_region_offset_t offset; /**< buffer start offset */
+ memif_log2_ring_size_t log2_ring_size; /**< ring size (log2) */
+ uint16_t private_hdr_size; /**< used for private metadata */
+} memif_msg_add_ring_t;
+
+/**
+ * S2M
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< slave interface name */
+} memif_msg_connect_t;
+
+/**
+ * M2S
+ * Finalize connection establishment.
+ */
+typedef struct __rte_packed {
+ uint8_t if_name[MEMIF_NAME_SZ]; /**< master interface name */
+} memif_msg_connected_t;
+
+/**
+ * S2M & M2S
+ * Disconnect interfaces.
+ */
+typedef struct __rte_packed {
+ uint32_t code; /**< error code */
+ uint8_t string[96]; /**< disconnect reason */
+} memif_msg_disconnect_t;
+
+typedef struct __rte_packed __rte_aligned(128)
+{
+ memif_msg_type_t type:16;
+ union {
+ memif_msg_hello_t hello;
+ memif_msg_init_t init;
+ memif_msg_add_region_t add_region;
+ memif_msg_add_ring_t add_ring;
+ memif_msg_connect_t connect;
+ memif_msg_connected_t connected;
+ memif_msg_disconnect_t disconnect;
+ };
+} memif_msg_t;
+
+/*
+ * Ring and Descriptor Layout
+ */
+
+/**
+ * Buffer descriptor.
+ */
+typedef struct __rte_packed {
+ uint16_t flags; /**< flags */
+#define MEMIF_DESC_FLAG_NEXT 1 /**< is chained buffer */
+ memif_region_index_t region; /**< region index on which the buffer is located */
+ uint32_t length; /**< buffer length */
+ memif_region_offset_t offset; /**< buffer offset */
+ uint32_t metadata;
+} memif_desc_t;
+
+#define MEMIF_CACHELINE_ALIGN_MARK(mark) \
+ uint8_t mark[0] __rte_aligned(RTE_CACHE_LINE_SIZE)
+
+typedef struct {
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline0);
+ uint32_t cookie; /**< MEMIF_COOKIE */
+ uint16_t flags; /**< flags */
+#define MEMIF_RING_FLAG_MASK_INT 1 /**< disable interrupt mode */
+ volatile uint16_t head; /**< pointer to ring buffer head */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline1);
+ volatile uint16_t tail; /**< pointer to ring buffer tail */
+ MEMIF_CACHELINE_ALIGN_MARK(cacheline2);
+ memif_desc_t desc[0]; /**< buffer descriptors */
+} memif_ring_t;
+
+#endif /* _MEMIF_H_ */
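For orientation, the per-ring shared-memory footprint implied by these definitions is the ring header plus ``(1 << log2_ring_size)`` packed 16-byte descriptors. A quick standalone check (field types copied from the header above; packing done with a portable pragma instead of ``__rte_packed``, and the header size passed in as an assumption since the real header is cache-line padded):

```c
#include <stdint.h>
#include <stddef.h>

/* memif_desc_t field layout from memif.h; when packed this is
 * exactly 2 + 2 + 4 + 4 + 4 = 16 bytes. */
#pragma pack(push, 1)
struct desc16 {
	uint16_t flags;
	uint16_t region;
	uint32_t length;
	uint32_t offset;
	uint32_t metadata;
};
#pragma pack(pop)

/* Bytes one ring occupies: header size (caller-supplied assumption)
 * plus the descriptor array. */
static size_t
ring_mem_size(size_t hdr_size, uint8_t log2_ring_size)
{
	return hdr_size + ((size_t)1 << log2_ring_size) * sizeof(struct desc16);
}
```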
diff --git a/drivers/net/memif/memif_socket.c b/drivers/net/memif/memif_socket.c
new file mode 100644
index 000000000..1e046b6b1
--- /dev/null
+++ b/drivers/net/memif/memif_socket.c
@@ -0,0 +1,1124 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <errno.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_hash.h>
+#include <rte_jhash.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+static void memif_intr_handler(void *arg);
+
+static ssize_t
+memif_msg_send(int fd, memif_msg_t *msg, int afd)
+{
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ struct cmsghdr *cmsg;
+ char ctl[CMSG_SPACE(sizeof(int))];
+
+ iov[0].iov_base = msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+
+ if (afd > 0) {
+ memset(&ctl, 0, sizeof(ctl));
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+ cmsg = CMSG_FIRSTHDR(&mh);
+ cmsg->cmsg_len = CMSG_LEN(sizeof(int));
+ cmsg->cmsg_level = SOL_SOCKET;
+ cmsg->cmsg_type = SCM_RIGHTS;
+ rte_memcpy(CMSG_DATA(cmsg), &afd, sizeof(int));
+ }
+
+ return sendmsg(fd, &mh, 0);
+}
+
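memif_msg_send() above attaches a file descriptor (region or interrupt fd) as SCM_RIGHTS ancillary data. For reference, the receiving side of that pattern can be sketched standalone like this (not the driver's actual receive path, which lives later in memif_socket.c):

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Receive a message and an optional fd passed via SCM_RIGHTS, mirroring
 * the control-channel convention used by memif_msg_send(). Returns the
 * number of payload bytes; *fd_out is the received fd or -1. */
static ssize_t
recv_with_fd(int sock, void *buf, size_t len, int *fd_out)
{
	struct msghdr mh;
	struct iovec iov;
	struct cmsghdr *cmsg;
	char ctl[CMSG_SPACE(sizeof(int))];
	ssize_t n;

	memset(&mh, 0, sizeof(mh));
	iov.iov_base = buf;
	iov.iov_len = len;
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = ctl;
	mh.msg_controllen = sizeof(ctl);

	n = recvmsg(sock, &mh, 0);
	if (n < 0)
		return n;

	*fd_out = -1;
	for (cmsg = CMSG_FIRSTHDR(&mh); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(&mh, cmsg))
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS)
			memcpy(fd_out, CMSG_DATA(cmsg), sizeof(int));

	return n;
}
```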
+static int
+memif_msg_send_from_queue(struct memif_control_channel *cc)
+{
+ ssize_t size;
+ int ret = 0;
+ struct memif_msg_queue_elt *e;
+
+ e = TAILQ_FIRST(&cc->msg_queue);
+ if (e == NULL)
+ return 0;
+
+ size = memif_msg_send(cc->intr_handle.fd, &e->msg, e->fd);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(ERR, "sendmsg fail: %s.", strerror(errno));
+ ret = -1;
+ } else {
+ MIF_LOG(DEBUG, "Sent msg type %u.", e->msg.type);
+ }
+ TAILQ_REMOVE(&cc->msg_queue, e, next);
+ rte_free(e);
+
+ return ret;
+}
+
+static struct memif_msg_queue_elt *
+memif_msg_enq(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e;
+
+ e = rte_zmalloc("memif_msg", sizeof(struct memif_msg_queue_elt), 0);
+ if (e == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control message.");
+ return NULL;
+ }
+
+ e->fd = -1;
+ TAILQ_INSERT_TAIL(&cc->msg_queue, e, next);
+
+ return e;
+}
+
+void
+memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code)
+{
+ struct memif_msg_queue_elt *e;
+ struct pmd_internals *pmd;
+ memif_msg_disconnect_t *d;
+
+ if (cc == NULL) {
+ MIF_LOG(DEBUG, "Missing control channel.");
+ return;
+ }
+
+ e = memif_msg_enq(cc);
+ if (e == NULL) {
+ MIF_LOG(WARNING, "Failed to enqueue disconnect message.");
+ return;
+ }
+
+ d = &e->msg.disconnect;
+
+ e->msg.type = MEMIF_MSG_TYPE_DISCONNECT;
+ d->code = err_code;
+
+ if (reason != NULL) {
+ strlcpy((char *)d->string, reason, sizeof(d->string));
+ if (cc->dev != NULL) {
+ pmd = cc->dev->data->dev_private;
+ strlcpy(pmd->local_disc_string, reason,
+ sizeof(pmd->local_disc_string));
+ }
+ }
+}
+
+static int
+memif_msg_enq_hello(struct memif_control_channel *cc)
+{
+ struct memif_msg_queue_elt *e = memif_msg_enq(cc);
+ memif_msg_hello_t *h;
+
+ if (e == NULL)
+ return -1;
+
+ h = &e->msg.hello;
+
+ e->msg.type = MEMIF_MSG_TYPE_HELLO;
+ h->min_version = MEMIF_VERSION;
+ h->max_version = MEMIF_VERSION;
+ h->max_s2m_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_m2s_ring = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ h->max_region = ETH_MEMIF_MAX_REGION_NUM - 1;
+ h->max_log2_ring_size = ETH_MEMIF_MAX_LOG2_RING_SIZE;
+
+ strlcpy((char *)h->name, rte_version(), sizeof(h->name));
+
+ return 0;
+}
+
+static int
+memif_msg_receive_hello(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_hello_t *h = &msg->hello;
+
+ if (h->min_version > MEMIF_VERSION || h->max_version < MEMIF_VERSION) {
+ memif_msg_enq_disconnect(pmd->cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ /* Set parameters for active connection */
+ pmd->run.num_s2m_rings = RTE_MIN(h->max_s2m_ring + 1,
+ pmd->cfg.num_s2m_rings);
+ pmd->run.num_m2s_rings = RTE_MIN(h->max_m2s_ring + 1,
+ pmd->cfg.num_m2s_rings);
+ pmd->run.log2_ring_size = RTE_MIN(h->max_log2_ring_size,
+ pmd->cfg.log2_ring_size);
+ pmd->run.pkt_buffer_size = pmd->cfg.pkt_buffer_size;
+
+ strlcpy(pmd->remote_name, (char *)h->name, sizeof(pmd->remote_name));
+
+ MIF_LOG(DEBUG, "%s: Connecting to %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_init(struct memif_control_channel *cc, memif_msg_t *msg)
+{
+ memif_msg_init_t *i = &msg->init;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *pmd;
+ struct rte_eth_dev *dev;
+
+ if (i->version != MEMIF_VERSION) {
+ memif_msg_enq_disconnect(cc, "Incompatible memif version", 0);
+ return -1;
+ }
+
+ if (cc->socket == NULL) {
+ memif_msg_enq_disconnect(cc, "Device error", 0);
+ return -1;
+ }
+
+ /* Find device with requested ID */
+ TAILQ_FOREACH(elt, &cc->socket->dev_queue, next) {
+ dev = elt->dev;
+ pmd = dev->data->dev_private;
+ if (((pmd->flags & ETH_MEMIF_FLAG_DISABLED) == 0) &&
+ pmd->id == i->id) {
+ /* assign control channel to device */
+ cc->dev = dev;
+ pmd->cc = cc;
+
+ if (i->mode != MEMIF_INTERFACE_MODE_ETHERNET) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Only ethernet mode supported",
+ 0);
+ return -1;
+ }
+
+ if (pmd->flags & (ETH_MEMIF_FLAG_CONNECTING |
+ ETH_MEMIF_FLAG_CONNECTED)) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Already connected", 0);
+ return -1;
+ }
+ strlcpy(pmd->remote_name, (char *)i->name,
+ sizeof(pmd->remote_name));
+
+ if (*pmd->secret != '\0') {
+ if (*i->secret == '\0') {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Secret required", 0);
+ return -1;
+ }
+ if (strncmp(pmd->secret, (char *)i->secret,
+ ETH_MEMIF_SECRET_SIZE) != 0) {
+ memif_msg_enq_disconnect(pmd->cc,
+ "Incorrect secret", 0);
+ return -1;
+ }
+ }
+
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTING;
+ return 0;
+ }
+ }
+
+ /* ID not found on this socket */
+ MIF_LOG(DEBUG, "ID %u not found.", i->id);
+ memif_msg_enq_disconnect(cc, "ID not found", 0);
+ return -1;
+}
+
+static int
+memif_msg_receive_add_region(struct rte_eth_dev *dev, memif_msg_t *msg,
+ int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_region_t *ar = &msg->add_region;
+ struct memif_region *r;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing region fd", 0);
+ return -1;
+ }
+
+ if (ar->index >= ETH_MEMIF_MAX_REGION_NUM || ar->index != pmd->regions_num ||
+ pmd->regions[ar->index] != NULL) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid region index", 0);
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ r->fd = fd;
+ r->region_size = ar->size;
+ r->addr = NULL;
+
+ pmd->regions[ar->index] = r;
+ pmd->regions_num++;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_add_ring(struct rte_eth_dev *dev, memif_msg_t *msg, int fd)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_add_ring_t *ar = &msg->add_ring;
+ struct memif_queue *mq;
+
+ if (fd < 0) {
+ memif_msg_enq_disconnect(pmd->cc, "Missing interrupt fd", 0);
+ return -1;
+ }
+
+ /* check if we have enough queues */
+ if (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) {
+ if (ar->index >= pmd->cfg.num_s2m_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_s2m_rings++;
+ } else {
+ if (ar->index >= pmd->cfg.num_m2s_rings) {
+ memif_msg_enq_disconnect(pmd->cc, "Invalid ring index", 0);
+ return -1;
+ }
+ pmd->run.num_m2s_rings++;
+ }
+
+ mq = (ar->flags & MEMIF_MSG_ADD_RING_FLAG_S2M) ?
+ dev->data->rx_queues[ar->index] : dev->data->tx_queues[ar->index];
+
+ mq->intr_handle.fd = fd;
+ mq->log2_ring_size = ar->log2_ring_size;
+ mq->region = ar->region;
+ mq->ring_offset = ar->offset;
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connect_t *c = &msg->connect;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_connected(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_connected_t *c = &msg->connected;
+ int ret;
+
+ ret = memif_connect(dev);
+ if (ret < 0)
+ return ret;
+
+ strlcpy(pmd->remote_if_name, (char *)c->if_name,
+ sizeof(pmd->remote_if_name));
+ MIF_LOG(INFO, "%s: Remote interface %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_if_name);
+
+ return 0;
+}
+
+static int
+memif_msg_receive_disconnect(struct rte_eth_dev *dev, memif_msg_t *msg)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_msg_disconnect_t *d = &msg->disconnect;
+
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ strlcpy(pmd->remote_disc_string, (char *)d->string,
+ sizeof(pmd->remote_disc_string));
+
+ MIF_LOG(INFO, "%s: Disconnect received: %s",
+ rte_vdev_device_name(pmd->vdev), pmd->remote_disc_string);
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memif_disconnect(rte_eth_dev_allocated
+ (rte_vdev_device_name(pmd->vdev)));
+ return 0;
+}
+
+static int
+memif_msg_enq_ack(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ if (e == NULL)
+ return -1;
+
+ e->msg.type = MEMIF_MSG_TYPE_ACK;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_init(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_init_t *i;
+
+ if (e == NULL)
+ return -1;
+
+ i = &e->msg.init;
+ e->msg.type = MEMIF_MSG_TYPE_INIT;
+ i->version = MEMIF_VERSION;
+ i->id = pmd->id;
+ i->mode = MEMIF_INTERFACE_MODE_ETHERNET;
+
+ strlcpy((char *)i->name, rte_version(), sizeof(i->name));
+
+ if (*pmd->secret != '\0')
+ strlcpy((char *)i->secret, pmd->secret, sizeof(i->secret));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_region(struct rte_eth_dev *dev, uint8_t idx)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ memif_msg_add_region_t *ar;
+ struct memif_region *mr = pmd->regions[idx];
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_region;
+ e->msg.type = MEMIF_MSG_TYPE_ADD_REGION;
+ e->fd = mr->fd;
+ ar->index = idx;
+ ar->size = mr->region_size;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_add_ring(struct rte_eth_dev *dev, uint8_t idx,
+ memif_ring_type_t type)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ struct memif_queue *mq;
+ memif_msg_add_ring_t *ar;
+
+ if (e == NULL)
+ return -1;
+
+ ar = &e->msg.add_ring;
+ mq = (type == MEMIF_RING_S2M) ? dev->data->tx_queues[idx] :
+ dev->data->rx_queues[idx];
+
+ e->msg.type = MEMIF_MSG_TYPE_ADD_RING;
+ e->fd = mq->intr_handle.fd;
+ ar->index = idx;
+ ar->offset = mq->ring_offset;
+ ar->region = mq->region;
+ ar->log2_ring_size = mq->log2_ring_size;
+ ar->flags = (type == MEMIF_RING_S2M) ? MEMIF_MSG_ADD_RING_FLAG_S2M : 0;
+ ar->private_hdr_size = 0;
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connect_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connect;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECT;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static int
+memif_msg_enq_connected(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *e = memif_msg_enq(pmd->cc);
+ const char *name = rte_vdev_device_name(pmd->vdev);
+ memif_msg_connected_t *c;
+
+ if (e == NULL)
+ return -1;
+
+ c = &e->msg.connected;
+ e->msg.type = MEMIF_MSG_TYPE_CONNECTED;
+ strlcpy((char *)c->if_name, name, sizeof(c->if_name));
+
+ return 0;
+}
+
+static void
+memif_intr_unregister_handler(struct rte_intr_handle *intr_handle, void *arg)
+{
+ struct memif_msg_queue_elt *elt;
+ struct memif_control_channel *cc = arg;
+
+ /* close control channel fd */
+ close(intr_handle->fd);
+ /* clear message queue */
+ while ((elt = TAILQ_FIRST(&cc->msg_queue)) != NULL) {
+ TAILQ_REMOVE(&cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ /* free control channel */
+ rte_free(cc);
+}
+
+void
+memif_disconnect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_msg_queue_elt *elt, *next;
+ struct memif_queue *mq;
+ struct rte_intr_handle *ih;
+ int i;
+ int ret;
+
+ if (pmd->cc != NULL) {
+ /* Clear control message queue (except disconnect message if any). */
+ for (elt = TAILQ_FIRST(&pmd->cc->msg_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->msg.type != MEMIF_MSG_TYPE_DISCONNECT) {
+ TAILQ_REMOVE(&pmd->cc->msg_queue, elt, next);
+ rte_free(elt);
+ }
+ }
+ /* send disconnect message (if there is any in queue) */
+ memif_msg_send_from_queue(pmd->cc);
+
+ /* at this point, there should be no more messages in queue */
+ if (TAILQ_FIRST(&pmd->cc->msg_queue) != NULL) {
+ MIF_LOG(WARNING,
+ "%s: Unexpected message(s) in message queue.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+
+ ih = &pmd->cc->intr_handle;
+ if (ih->fd > 0) {
+ ret = rte_intr_callback_unregister(ih,
+ memif_intr_handler,
+ pmd->cc);
+ /*
+ * If callback is active (disconnecting based on
+ * received control message).
+ */
+ if (ret == -EAGAIN) {
+ ret = rte_intr_callback_unregister_pending(ih,
+ memif_intr_handler,
+ pmd->cc,
+ memif_intr_unregister_handler);
+ } else if (ret > 0) {
+ close(ih->fd);
+ rte_free(pmd->cc);
+ }
+ pmd->cc = NULL;
+ if (ret <= 0)
+ MIF_LOG(WARNING, "%s: Failed to unregister "
+ "control channel callback.",
+ rte_vdev_device_name(pmd->vdev));
+ }
+ }
+
+ /* unconfig interrupts */
+ for (i = 0; i < pmd->cfg.num_s2m_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_SLAVE) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+ for (i = 0; i < pmd->cfg.num_m2s_rings; i++) {
+ if (pmd->role == MEMIF_ROLE_MASTER) {
+ if (dev->data->tx_queues != NULL)
+ mq = dev->data->tx_queues[i];
+ else
+ continue;
+ } else {
+ if (dev->data->rx_queues != NULL)
+ mq = dev->data->rx_queues[i];
+ else
+ continue;
+ }
+ if (mq->intr_handle.fd > 0) {
+ close(mq->intr_handle.fd);
+ mq->intr_handle.fd = -1;
+ }
+ mq->ring = NULL;
+ }
+
+ memif_free_regions(pmd);
+
+ /* reset connection configuration */
+ memset(&pmd->run, 0, sizeof(pmd->run));
+
+ dev->data->dev_link.link_status = ETH_LINK_DOWN;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTED;
+ MIF_LOG(DEBUG, "%s: Disconnected.", rte_vdev_device_name(pmd->vdev));
+}
+
+static int
+memif_msg_receive(struct memif_control_channel *cc)
+{
+ char ctl[CMSG_SPACE(sizeof(int)) +
+ CMSG_SPACE(sizeof(struct ucred))] = { 0 };
+ struct msghdr mh = { 0 };
+ struct iovec iov[1];
+ memif_msg_t msg = { 0 };
+ ssize_t size;
+ int ret = 0;
+ struct ucred *cr __rte_unused = NULL;
+ struct cmsghdr *cmsg;
+ int afd = -1;
+ int i;
+ struct pmd_internals *pmd;
+
+ iov[0].iov_base = (void *)&msg;
+ iov[0].iov_len = sizeof(memif_msg_t);
+ mh.msg_iov = iov;
+ mh.msg_iovlen = 1;
+ mh.msg_control = ctl;
+ mh.msg_controllen = sizeof(ctl);
+
+ size = recvmsg(cc->intr_handle.fd, &mh, 0);
+ if (size != sizeof(memif_msg_t)) {
+ MIF_LOG(DEBUG, "Invalid message size.");
+ memif_msg_enq_disconnect(cc, "Invalid message size", 0);
+ return -1;
+ }
+ MIF_LOG(DEBUG, "Received msg type: %u.", msg.type);
+
+ cmsg = CMSG_FIRSTHDR(&mh);
+ while (cmsg) {
+ if (cmsg->cmsg_level == SOL_SOCKET) {
+ if (cmsg->cmsg_type == SCM_CREDENTIALS)
+ cr = (struct ucred *)CMSG_DATA(cmsg);
+ else if (cmsg->cmsg_type == SCM_RIGHTS)
+ memcpy(&afd, CMSG_DATA(cmsg), sizeof(int));
+ }
+ cmsg = CMSG_NXTHDR(&mh, cmsg);
+ }
+
+ if (cc->dev == NULL && msg.type != MEMIF_MSG_TYPE_INIT) {
+ MIF_LOG(DEBUG, "Unexpected message.");
+ memif_msg_enq_disconnect(cc, "Unexpected message", 0);
+ return -1;
+ }
+
+ /* process message and enqueue reply, if any */
+ switch (msg.type) {
+ case MEMIF_MSG_TYPE_ACK:
+ break;
+ case MEMIF_MSG_TYPE_HELLO:
+ ret = memif_msg_receive_hello(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_init_regions_and_queues(cc->dev);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_init(cc->dev);
+ if (ret < 0)
+ goto exit;
+ pmd = cc->dev->data->dev_private;
+ for (i = 0; i < pmd->regions_num; i++) {
+ ret = memif_msg_enq_add_region(cc->dev, i);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_S2M);
+ if (ret < 0)
+ goto exit;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ret = memif_msg_enq_add_ring(cc->dev, i,
+ MEMIF_RING_M2S);
+ if (ret < 0)
+ goto exit;
+ }
+ ret = memif_msg_enq_connect(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_INIT:
+ /*
+ * This cc does not have an interface associated with it.
+ * If a suitable interface is found, it will be assigned here.
+ */
+ ret = memif_msg_receive_init(cc, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_REGION:
+ ret = memif_msg_receive_add_region(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_ADD_RING:
+ ret = memif_msg_receive_add_ring(cc->dev, &msg, afd);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_ack(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECT:
+ ret = memif_msg_receive_connect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ ret = memif_msg_enq_connected(cc->dev);
+ if (ret < 0)
+ goto exit;
+ break;
+ case MEMIF_MSG_TYPE_CONNECTED:
+ ret = memif_msg_receive_connected(cc->dev, &msg);
+ break;
+ case MEMIF_MSG_TYPE_DISCONNECT:
+ ret = memif_msg_receive_disconnect(cc->dev, &msg);
+ if (ret < 0)
+ goto exit;
+ break;
+ default:
+ memif_msg_enq_disconnect(cc, "Unknown message type", 0);
+ ret = -1;
+ goto exit;
+ }
+
+ exit:
+ return ret;
+}
+
+static void
+memif_intr_handler(void *arg)
+{
+ struct memif_control_channel *cc = arg;
+ int ret;
+
+ ret = memif_msg_receive(cc);
+ /* if driver failed to assign device */
+ if (cc->dev == NULL) {
+ ret = rte_intr_callback_unregister_pending(&cc->intr_handle,
+ memif_intr_handler,
+ cc,
+ memif_intr_unregister_handler);
+ if (ret < 0)
+ MIF_LOG(WARNING,
+ "Failed to unregister control channel callback.");
+ return;
+ }
+ /* if memif_msg_receive failed */
+ if (ret < 0)
+ goto disconnect;
+
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto disconnect;
+
+ return;
+
+ disconnect:
+ if (cc->dev == NULL) {
+ MIF_LOG(WARNING, "eth dev not allocated");
+ return;
+ }
+ memif_disconnect(cc->dev);
+}
+
+static void
+memif_listener_handler(void *arg)
+{
+ struct memif_socket *socket = arg;
+ int sockfd;
+ socklen_t addr_len;
+ struct sockaddr_un client;
+ struct memif_control_channel *cc;
+ int ret;
+
+ addr_len = sizeof(client);
+ sockfd = accept(socket->intr_handle.fd, (struct sockaddr *)&client,
+ &addr_len);
+ if (sockfd < 0) {
+ MIF_LOG(ERR,
+ "Failed to accept connection request on socket fd %d",
+ socket->intr_handle.fd);
+ return;
+ }
+
+ MIF_LOG(DEBUG, "%s: Connection request accepted.", socket->filename);
+
+ cc = rte_zmalloc("memif-cc", sizeof(struct memif_control_channel), 0);
+ if (cc == NULL) {
+ MIF_LOG(ERR, "Failed to allocate control channel.");
+ goto error;
+ }
+
+ cc->intr_handle.fd = sockfd;
+ cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ cc->socket = socket;
+ cc->dev = NULL;
+ TAILQ_INIT(&cc->msg_queue);
+
+ ret = rte_intr_callback_register(&cc->intr_handle, memif_intr_handler, cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to register control channel callback.");
+ goto error;
+ }
+
+ ret = memif_msg_enq_hello(cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to enqueue hello message.");
+ goto error;
+ }
+ ret = memif_msg_send_from_queue(cc);
+ if (ret < 0)
+ goto error;
+
+ return;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (cc != NULL)
+ rte_free(cc);
+}
+
+static struct memif_socket *
+memif_socket_create(struct pmd_internals *pmd, char *key, uint8_t listener)
+{
+ struct memif_socket *sock;
+ struct sockaddr_un un;
+ int sockfd;
+ int ret;
+ int on = 1;
+
+ sock = rte_zmalloc("memif-socket", sizeof(struct memif_socket), 0);
+ if (sock == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory for memif socket");
+ return NULL;
+ }
+
+ sock->listener = listener;
+ rte_memcpy(sock->filename, key, 256);
+ TAILQ_INIT(&sock->dev_queue);
+
+ if (listener != 0) {
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0)
+ goto error;
+
+ un.sun_family = AF_UNIX;
+ strlcpy(un.sun_path, sock->filename, sizeof(un.sun_path));
+
+ ret = setsockopt(sockfd, SOL_SOCKET, SO_PASSCRED, &on,
+ sizeof(on));
+ if (ret < 0)
+ goto error;
+ ret = bind(sockfd, (struct sockaddr *)&un, sizeof(un));
+ if (ret < 0)
+ goto error;
+ ret = listen(sockfd, 1);
+ if (ret < 0)
+ goto error;
+
+ MIF_LOG(DEBUG, "%s: Memif listener socket %s created.",
+ rte_vdev_device_name(pmd->vdev), sock->filename);
+
+ sock->intr_handle.fd = sockfd;
+ sock->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ ret = rte_intr_callback_register(&sock->intr_handle,
+ memif_listener_handler, sock);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt "
+ "callback for listener socket",
+ rte_vdev_device_name(pmd->vdev));
+ return NULL;
+ }
+ }
+
+ return sock;
+
+ error:
+ MIF_LOG(ERR, "%s: Failed to setup socket %s: %s",
+ rte_vdev_device_name(pmd->vdev), key, strerror(errno));
+ if (sock != NULL)
+ rte_free(sock);
+ return NULL;
+}
+
+static struct rte_hash *
+memif_create_socket_hash(void)
+{
+ struct rte_hash_parameters params = { 0 };
+ params.name = MEMIF_SOCKET_HASH_NAME;
+ params.entries = 256;
+ params.key_len = 256;
+ params.hash_func = rte_jhash;
+ params.hash_func_init_val = 0;
+ return rte_hash_create(&params);
+}
+
+int
+memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt;
+ struct pmd_internals *tmp_pmd;
+ struct rte_hash *hash;
+ int ret;
+ char key[256];
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL) {
+ hash = memif_create_socket_hash();
+ if (hash == NULL) {
+ MIF_LOG(ERR, "Failed to create memif socket hash.");
+ return -1;
+ }
+ }
+
+ memset(key, 0, 256);
+ rte_memcpy(key, socket_filename, strlen(socket_filename));
+ ret = rte_hash_lookup_data(hash, key, (void **)&socket);
+ if (ret < 0) {
+ socket = memif_socket_create(pmd, key,
+ (pmd->role ==
+ MEMIF_ROLE_SLAVE) ? 0 : 1);
+ if (socket == NULL)
+ return -1;
+ ret = rte_hash_add_key_data(hash, key, socket);
+ if (ret < 0) {
+ MIF_LOG(ERR, "Failed to add socket to socket hash.");
+ return ret;
+ }
+ }
+ pmd->socket_filename = socket->filename;
+
+ if (socket->listener != 0 && pmd->role == MEMIF_ROLE_SLAVE) {
+ MIF_LOG(ERR, "Socket is a listener.");
+ return -1;
+ } else if ((socket->listener == 0) && (pmd->role == MEMIF_ROLE_MASTER)) {
+ MIF_LOG(ERR, "Socket is not a listener.");
+ return -1;
+ }
+
+ TAILQ_FOREACH(elt, &socket->dev_queue, next) {
+ tmp_pmd = elt->dev->data->dev_private;
+ if (tmp_pmd->id == pmd->id) {
+ MIF_LOG(ERR, "Memif device with id %d already "
+ "exists on socket %s",
+ pmd->id, socket->filename);
+ return -1;
+ }
+ }
+
+ elt = rte_malloc("pmd-queue", sizeof(struct memif_socket_dev_list_elt), 0);
+ if (elt == NULL) {
+ MIF_LOG(ERR, "%s: Failed to add device to socket device list.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ elt->dev = dev;
+ TAILQ_INSERT_TAIL(&socket->dev_queue, elt, next);
+
+ return 0;
+}
+
+void
+memif_socket_remove_device(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_socket *socket = NULL;
+ struct memif_socket_dev_list_elt *elt, *next;
+ struct rte_hash *hash;
+
+ hash = rte_hash_find_existing(MEMIF_SOCKET_HASH_NAME);
+ if (hash == NULL)
+ return;
+
+ if (pmd->socket_filename == NULL)
+ return;
+
+ if (rte_hash_lookup_data(hash, pmd->socket_filename, (void **)&socket) < 0)
+ return;
+
+ for (elt = TAILQ_FIRST(&socket->dev_queue); elt != NULL; elt = next) {
+ next = TAILQ_NEXT(elt, next);
+ if (elt->dev == dev) {
+ TAILQ_REMOVE(&socket->dev_queue, elt, next);
+ rte_free(elt);
+ pmd->socket_filename = NULL;
+ }
+ }
+
+ /* remove socket, if this was the last device using it */
+ if (TAILQ_EMPTY(&socket->dev_queue)) {
+ rte_hash_del_key(hash, socket->filename);
+ if (socket->listener) {
+ /* remove listener socket file,
+ * so we can create new one later.
+ */
+ remove(socket->filename);
+ }
+ rte_free(socket);
+ }
+}
+
+int
+memif_connect_master(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+ return 0;
+}
+
+int
+memif_connect_slave(struct rte_eth_dev *dev)
+{
+ int sockfd;
+ int ret;
+ struct sockaddr_un sun;
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ memset(pmd->local_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ memset(pmd->remote_disc_string, 0, ETH_MEMIF_DISC_STRING_SIZE);
+ pmd->flags &= ~ETH_MEMIF_FLAG_DISABLED;
+
+ sockfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
+ if (sockfd < 0) {
+ MIF_LOG(ERR, "%s: Failed to open socket.",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ sun.sun_family = AF_UNIX;
+
+ strlcpy(sun.sun_path, pmd->socket_filename, sizeof(sun.sun_path));
+
+ ret = connect(sockfd, (struct sockaddr *)&sun,
+ sizeof(struct sockaddr_un));
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to connect socket: %s.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+ goto error;
+ }
+
+ MIF_LOG(DEBUG, "%s: Memif socket: %s connected.",
+ rte_vdev_device_name(pmd->vdev), pmd->socket_filename);
+
+ pmd->cc = rte_zmalloc("memif-cc",
+ sizeof(struct memif_control_channel), 0);
+ if (pmd->cc == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate control channel.",
+ rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ pmd->cc->intr_handle.fd = sockfd;
+ pmd->cc->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ pmd->cc->socket = NULL;
+ pmd->cc->dev = dev;
+ TAILQ_INIT(&pmd->cc->msg_queue);
+
+ ret = rte_intr_callback_register(&pmd->cc->intr_handle,
+ memif_intr_handler, pmd->cc);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to register interrupt callback "
+ "for control fd", rte_vdev_device_name(pmd->vdev));
+ goto error;
+ }
+
+ return 0;
+
+ error:
+ if (sockfd > 0) {
+ close(sockfd);
+ sockfd = -1;
+ }
+ if (pmd->cc != NULL) {
+ rte_free(pmd->cc);
+ pmd->cc = NULL;
+ }
+ return -1;
+}
diff --git a/drivers/net/memif/memif_socket.h b/drivers/net/memif/memif_socket.h
new file mode 100644
index 000000000..db293e200
--- /dev/null
+++ b/drivers/net/memif/memif_socket.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _MEMIF_SOCKET_H_
+#define _MEMIF_SOCKET_H_
+
+#include <sys/queue.h>
+
+/**
+ * Remove device from socket device list. If no device is left on the socket,
+ * remove the socket as well.
+ *
+ * @param dev
+ * memif device
+ */
+void memif_socket_remove_device(struct rte_eth_dev *dev);
+
+/**
+ * Enqueue disconnect message to control channel message queue.
+ *
+ * @param cc
+ * control channel
+ * @param reason
+ * const string stating disconnect reason (96 characters)
+ * @param err_code
+ * error code
+ */
+void memif_msg_enq_disconnect(struct memif_control_channel *cc, const char *reason,
+ int err_code);
+
+/**
+ * Initialize memif socket for specified device. If socket doesn't exist, create socket.
+ *
+ * @param dev
+ * memif device
+ * @param socket_filename
+ * socket filename
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_socket_init(struct rte_eth_dev *dev, const char *socket_filename);
+
+/**
+ * Disconnect memif device. Close control channel and shared memory.
+ *
+ * @param dev
+ * memif device
+ */
+void memif_disconnect(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, enable connection establishment.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_master(struct rte_eth_dev *dev);
+
+/**
+ * If device is properly configured, send connection request.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect_slave(struct rte_eth_dev *dev);
+
+struct memif_socket_dev_list_elt {
+ TAILQ_ENTRY(memif_socket_dev_list_elt) next;
+ struct rte_eth_dev *dev; /**< pointer to device internals */
+ char dev_name[RTE_ETH_NAME_MAX_LEN];
+};
+
+#define MEMIF_SOCKET_HASH_NAME "memif-sh"
+struct memif_socket {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ char filename[256]; /**< socket filename */
+
+ TAILQ_HEAD(, memif_socket_dev_list_elt) dev_queue;
+ /**< Queue of devices using this socket */
+ uint8_t listener; /**< if not zero socket is listener */
+};
+
+/* Control message queue. */
+struct memif_msg_queue_elt {
+ memif_msg_t msg; /**< control message */
+ TAILQ_ENTRY(memif_msg_queue_elt) next;
+ int fd; /**< fd to be sent to peer */
+};
+
+struct memif_control_channel {
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+ TAILQ_HEAD(, memif_msg_queue_elt) msg_queue; /**< control message queue */
+ struct memif_socket *socket; /**< pointer to socket */
+ struct rte_eth_dev *dev; /**< pointer to device */
+};
+
+#endif /* _MEMIF_SOCKET_H_ */
diff --git a/drivers/net/memif/meson.build b/drivers/net/memif/meson.build
new file mode 100644
index 000000000..287a30ef2
--- /dev/null
+++ b/drivers/net/memif/meson.build
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+
+if host_machine.system() != 'linux'
+ build = false
+endif
+
+sources = files('rte_eth_memif.c',
+ 'memif_socket.c')
+
+allow_experimental_apis = true
+# Experimental APIs:
+# - rte_intr_callback_unregister_pending
+
+deps += ['hash']
diff --git a/drivers/net/memif/rte_eth_memif.c b/drivers/net/memif/rte_eth_memif.c
new file mode 100644
index 000000000..b9f05a624
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.c
@@ -0,0 +1,1204 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/un.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <linux/if_ether.h>
+#include <errno.h>
+#include <sys/eventfd.h>
+
+#include <rte_version.h>
+#include <rte_mbuf.h>
+#include <rte_ether.h>
+#include <rte_ethdev_driver.h>
+#include <rte_ethdev_vdev.h>
+#include <rte_malloc.h>
+#include <rte_kvargs.h>
+#include <rte_bus_vdev.h>
+#include <rte_string_fns.h>
+
+#include "rte_eth_memif.h"
+#include "memif_socket.h"
+
+#define ETH_MEMIF_ID_ARG "id"
+#define ETH_MEMIF_ROLE_ARG "role"
+#define ETH_MEMIF_PKT_BUFFER_SIZE_ARG "bsize"
+#define ETH_MEMIF_RING_SIZE_ARG "rsize"
+#define ETH_MEMIF_SOCKET_ARG "socket"
+#define ETH_MEMIF_MAC_ARG "mac"
+#define ETH_MEMIF_ZC_ARG "zero-copy"
+#define ETH_MEMIF_SECRET_ARG "secret"
+
+static const char * const valid_arguments[] = {
+ ETH_MEMIF_ID_ARG,
+ ETH_MEMIF_ROLE_ARG,
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ ETH_MEMIF_RING_SIZE_ARG,
+ ETH_MEMIF_SOCKET_ARG,
+ ETH_MEMIF_MAC_ARG,
+ ETH_MEMIF_ZC_ARG,
+ ETH_MEMIF_SECRET_ARG,
+ NULL
+};
+
+const char *
+memif_version(void)
+{
+ return ("memif-" RTE_STR(MEMIF_VERSION_MAJOR) "." RTE_STR(MEMIF_VERSION_MINOR));
+}
+
+static void
+memif_dev_info(struct rte_eth_dev *dev __rte_unused, struct rte_eth_dev_info *dev_info)
+{
+ dev_info->max_mac_addrs = 1;
+ dev_info->max_rx_pktlen = (uint32_t)ETH_FRAME_LEN;
+ dev_info->max_rx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->max_tx_queues = ETH_MEMIF_MAX_NUM_Q_PAIRS;
+ dev_info->min_rx_bufsize = 0;
+}
+
+static memif_ring_t *
+memif_get_ring(struct pmd_internals *pmd, memif_ring_type_t type, uint16_t ring_num)
+{
+ /* rings only in region 0 */
+ void *p = pmd->regions[0]->addr;
+ int ring_size = sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size);
+
+ p = (uint8_t *)p + (ring_num + type * pmd->run.num_s2m_rings) * ring_size;
+
+ return (memif_ring_t *)p;
+}
+
+static void *
+memif_get_buffer(struct pmd_internals *pmd, memif_desc_t *d)
+{
+ return ((uint8_t *)pmd->regions[d->region]->addr + d->offset);
+}
+
+static int
+memif_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *cur_tail,
+ struct rte_mbuf *tail)
+{
+ /* Check for number-of-segments-overflow */
+ if (unlikely(head->nb_segs + tail->nb_segs > RTE_MBUF_MAX_NB_SEGS))
+ return -EOVERFLOW;
+
+ /* Chain 'tail' onto the old tail */
+ cur_tail->next = tail;
+
+ /* accumulate number of segments and total length. */
+ head->nb_segs = (uint16_t)(head->nb_segs + tail->nb_segs);
+
+ tail->pkt_len = tail->data_len;
+ head->pkt_len += tail->pkt_len;
+
+ return 0;
+}
+
+static uint16_t
+eth_memif_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t cur_slot, last_slot, n_slots, ring_size, mask, s0;
+ uint16_t n_rx_pkts = 0;
+ uint16_t mbuf_size = rte_pktmbuf_data_room_size(mq->mempool) -
+ RTE_PKTMBUF_HEADROOM;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf, *mbuf_head, *mbuf_tail;
+ uint64_t b;
+ ssize_t size __rte_unused;
+ uint16_t head;
+ int ret;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ /* consume interrupt */
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0)
+ size = read(mq->intr_handle.fd, &b, sizeof(b));
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ cur_slot = (type == MEMIF_RING_S2M) ? mq->last_head : mq->last_tail;
+ last_slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+ if (cur_slot == last_slot)
+ goto refill;
+ n_slots = last_slot - cur_slot;
+
+ while (n_slots && n_rx_pkts < nb_pkts) {
+ mbuf_head = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf_head == NULL))
+ goto no_free_bufs;
+ mbuf = mbuf_head;
+ mbuf->port = mq->in_port;
+
+next_slot:
+ s0 = cur_slot & mask;
+ d0 = &ring->desc[s0];
+
+ src_len = d0->length;
+ dst_off = 0;
+ src_off = 0;
+
+ do {
+ dst_len = mbuf_size - dst_off;
+ if (dst_len == 0) {
+ dst_off = 0;
+ dst_len = mbuf_size;
+
+ /* store pointer to tail */
+ mbuf_tail = mbuf;
+ mbuf = rte_pktmbuf_alloc(mq->mempool);
+ if (unlikely(mbuf == NULL))
+ goto no_free_bufs;
+ mbuf->port = mq->in_port;
+ ret = memif_pktmbuf_chain(mbuf_head, mbuf_tail, mbuf);
+ if (unlikely(ret < 0)) {
+ MIF_LOG(ERR, "%s: number-of-segments-overflow",
+ rte_vdev_device_name(pmd->vdev));
+ rte_pktmbuf_free(mbuf);
+ goto no_free_bufs;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ rte_pktmbuf_data_len(mbuf) += cp_len;
+ rte_pktmbuf_pkt_len(mbuf) = rte_pktmbuf_data_len(mbuf);
+ if (mbuf != mbuf_head)
+ rte_pktmbuf_pkt_len(mbuf_head) += cp_len;
+
+ memcpy(rte_pktmbuf_mtod_offset(mbuf, void *, dst_off),
+ (uint8_t *)memif_get_buffer(pmd, d0) + src_off, cp_len);
+
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ } while (src_len);
+
+ cur_slot++;
+ n_slots--;
+
+ if (d0->flags & MEMIF_DESC_FLAG_NEXT)
+ goto next_slot;
+
+ mq->n_bytes += rte_pktmbuf_pkt_len(mbuf_head);
+ *bufs++ = mbuf_head;
+ n_rx_pkts++;
+ }
+
+no_free_bufs:
+ if (type == MEMIF_RING_S2M) {
+ rte_mb();
+ ring->tail = cur_slot;
+ mq->last_head = cur_slot;
+ } else {
+ mq->last_tail = cur_slot;
+ }
+
+refill:
+ if (type == MEMIF_RING_M2S) {
+ head = ring->head;
+ n_slots = ring_size - head + mq->last_tail;
+
+ while (n_slots--) {
+ s0 = head++ & mask;
+ d0 = &ring->desc[s0];
+ d0->length = pmd->run.pkt_buffer_size;
+ }
+ rte_mb();
+ ring->head = head;
+ }
+
+ mq->n_pkts += n_rx_pkts;
+ return n_rx_pkts;
+}
+
+static uint16_t
+eth_memif_tx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
+{
+ struct memif_queue *mq = queue;
+ struct pmd_internals *pmd = mq->pmd;
+ memif_ring_t *ring = mq->ring;
+ uint16_t slot, saved_slot, n_free, ring_size, mask, n_tx_pkts = 0;
+ uint16_t src_len, src_off, dst_len, dst_off, cp_len;
+ memif_ring_type_t type = mq->type;
+ memif_desc_t *d0;
+ struct rte_mbuf *mbuf;
+ struct rte_mbuf *mbuf_head;
+ uint64_t a;
+ ssize_t size;
+
+ if (unlikely((pmd->flags & ETH_MEMIF_FLAG_CONNECTED) == 0))
+ return 0;
+ if (unlikely(ring == NULL))
+ return 0;
+
+ ring_size = 1 << mq->log2_ring_size;
+ mask = ring_size - 1;
+
+ n_free = ring->tail - mq->last_tail;
+ mq->last_tail += n_free;
+ slot = (type == MEMIF_RING_S2M) ? ring->head : ring->tail;
+
+ if (type == MEMIF_RING_S2M)
+ n_free = ring_size - ring->head + mq->last_tail;
+ else
+ n_free = ring->head - ring->tail;
+
+ while (n_tx_pkts < nb_pkts && n_free) {
+ mbuf_head = *bufs++;
+ mbuf = mbuf_head;
+
+ saved_slot = slot;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+
+next_in_chain:
+ src_off = 0;
+ src_len = rte_pktmbuf_data_len(mbuf);
+
+ while (src_len) {
+ if (dst_len == 0) {
+ if (n_free) {
+ slot++;
+ n_free--;
+ d0->flags |= MEMIF_DESC_FLAG_NEXT;
+ d0 = &ring->desc[slot & mask];
+ dst_off = 0;
+ dst_len = (type == MEMIF_RING_S2M) ?
+ pmd->run.pkt_buffer_size : d0->length;
+ d0->flags = 0;
+ } else {
+ slot = saved_slot;
+ goto no_free_slots;
+ }
+ }
+ cp_len = RTE_MIN(dst_len, src_len);
+
+ memcpy((uint8_t *)memif_get_buffer(pmd, d0) + dst_off,
+ rte_pktmbuf_mtod_offset(mbuf, void *, src_off),
+ cp_len);
+
+ mq->n_bytes += cp_len;
+ src_off += cp_len;
+ dst_off += cp_len;
+ src_len -= cp_len;
+ dst_len -= cp_len;
+
+ d0->length = dst_off;
+ }
+
+ /* rte_pktmbuf_is_contiguous() is only valid for the first
+ * segment; walk the chain via the next pointer instead.
+ */
+ if (mbuf->next != NULL) {
+ mbuf = mbuf->next;
+ goto next_in_chain;
+ }
+
+ n_tx_pkts++;
+ slot++;
+ n_free--;
+ rte_pktmbuf_free(mbuf_head);
+ }
+
+no_free_slots:
+ rte_mb();
+ if (type == MEMIF_RING_S2M)
+ ring->head = slot;
+ else
+ ring->tail = slot;
+
+ if ((ring->flags & MEMIF_RING_FLAG_MASK_INT) == 0) {
+ a = 1;
+ size = write(mq->intr_handle.fd, &a, sizeof(a));
+ if (unlikely(size < 0)) {
+ MIF_LOG(WARNING,
+ "%s: Failed to send interrupt. %s",
+ rte_vdev_device_name(pmd->vdev), strerror(errno));
+ }
+ }
+
+ mq->n_err += nb_pkts - n_tx_pkts;
+ mq->n_pkts += n_tx_pkts;
+ return n_tx_pkts;
+}
+
+void
+memif_free_regions(struct pmd_internals *pmd)
+{
+ int i;
+ struct memif_region *r;
+
+ /* regions are allocated contiguously, so it's
+ * enough to loop until 'pmd->regions_num'
+ */
+ for (i = 0; i < pmd->regions_num; i++) {
+ r = pmd->regions[i];
+ if (r != NULL) {
+ if (r->addr != NULL) {
+ munmap(r->addr, r->region_size);
+ if (r->fd > 0) {
+ close(r->fd);
+ r->fd = -1;
+ }
+ }
+ rte_free(r);
+ pmd->regions[i] = NULL;
+ }
+ }
+ pmd->regions_num = 0;
+}
+
+static int
+memif_region_init_shm(struct pmd_internals *pmd, uint8_t has_buffers)
+{
+ char shm_name[ETH_MEMIF_SHM_NAME_SIZE];
+ int ret = 0;
+ struct memif_region *r;
+
+ if (pmd->regions_num >= ETH_MEMIF_MAX_REGION_NUM) {
+ MIF_LOG(ERR, "%s: Too many regions.", rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+
+ r = rte_zmalloc("region", sizeof(struct memif_region), 0);
+ if (r == NULL) {
+ MIF_LOG(ERR, "%s: Failed to alloc memif region.",
+ rte_vdev_device_name(pmd->vdev));
+ return -ENOMEM;
+ }
+
+ /* calculate buffer offset */
+ r->pkt_buffer_offset = (pmd->run.num_s2m_rings + pmd->run.num_m2s_rings) *
+ (sizeof(memif_ring_t) + sizeof(memif_desc_t) *
+ (1 << pmd->run.log2_ring_size));
+
+ r->region_size = r->pkt_buffer_offset;
+ /* if region has buffers, add buffers size to region_size */
+ if (has_buffers == 1)
+ r->region_size += (uint32_t)(pmd->run.pkt_buffer_size *
+ (1 << pmd->run.log2_ring_size) *
+ (pmd->run.num_s2m_rings +
+ pmd->run.num_m2s_rings));
+
+ memset(shm_name, 0, sizeof(char) * ETH_MEMIF_SHM_NAME_SIZE);
+ snprintf(shm_name, ETH_MEMIF_SHM_NAME_SIZE, "memif_region_%d",
+ pmd->regions_num);
+
+ r->fd = memfd_create(shm_name, MFD_ALLOW_SEALING);
+ if (r->fd < 0) {
+ MIF_LOG(ERR, "%s: Failed to create shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ ret = fcntl(r->fd, F_ADD_SEALS, F_SEAL_SHRINK);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to add seals to shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ ret = ftruncate(r->fd, r->region_size);
+ if (ret < 0) {
+ MIF_LOG(ERR, "%s: Failed to truncate shm file: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ goto error;
+ }
+
+ r->addr = mmap(NULL, r->region_size, PROT_READ |
+ PROT_WRITE, MAP_SHARED, r->fd, 0);
+ if (r->addr == MAP_FAILED) {
+ MIF_LOG(ERR, "%s: Failed to mmap shm region: %s.",
+ rte_vdev_device_name(pmd->vdev),
+ strerror(errno));
+ ret = -1;
+ goto error;
+ }
+
+ pmd->regions[pmd->regions_num] = r;
+ pmd->regions_num++;
+
+ return ret;
+
+error:
+ if (r->fd > 0)
+ close(r->fd);
+ rte_free(r);
+
+ return ret;
+}
+
+static int
+memif_regions_init(struct pmd_internals *pmd)
+{
+ int ret;
+
+ /* create one buffer region */
+ ret = memif_region_init_shm(pmd, /* has buffer */ 1);
+ if (ret < 0)
+ return ret;
+
+ return 0;
+}
+
+static void
+memif_init_rings(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ memif_ring_t *ring;
+ int i, j;
+ uint16_t slot;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = i * (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ ring->head = 0;
+ ring->tail = 0;
+ ring->cookie = MEMIF_COOKIE;
+ ring->flags = 0;
+ for (j = 0; j < (1 << pmd->run.log2_ring_size); j++) {
+ slot = (i + pmd->run.num_s2m_rings) *
+ (1 << pmd->run.log2_ring_size) + j;
+ ring->desc[j].region = 0;
+ ring->desc[j].offset = pmd->regions[0]->pkt_buffer_offset +
+ (uint32_t)(slot * pmd->run.pkt_buffer_size);
+ ring->desc[j].length = pmd->run.pkt_buffer_size;
+ }
+ }
+}
+
+/* called only by slave */
+static void
+memif_init_queues(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = dev->data->tx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_S2M, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for tx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = dev->data->rx_queues[i];
+ mq->ring = memif_get_ring(pmd, MEMIF_RING_M2S, i);
+ mq->log2_ring_size = pmd->run.log2_ring_size;
+ /* queues located only in region 0 */
+ mq->region = 0;
+ mq->ring_offset = (uint8_t *)mq->ring - (uint8_t *)pmd->regions[0]->addr;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ mq->intr_handle.fd = eventfd(0, EFD_NONBLOCK);
+ if (mq->intr_handle.fd < 0) {
+ MIF_LOG(WARNING,
+ "%s: Failed to create eventfd for rx queue %d: %s.",
+ rte_vdev_device_name(pmd->vdev), i,
+ strerror(errno));
+ }
+ }
+}
+
+int
+memif_init_regions_and_queues(struct rte_eth_dev *dev)
+{
+ int ret;
+
+ ret = memif_regions_init(dev->data->dev_private);
+ if (ret < 0)
+ return ret;
+
+ memif_init_rings(dev);
+
+ memif_init_queues(dev);
+
+ return 0;
+}
+
+int
+memif_connect(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_region *mr;
+ struct memif_queue *mq;
+ int i;
+
+ for (i = 0; i < pmd->regions_num; i++) {
+ mr = pmd->regions[i];
+ if (mr != NULL) {
+ if (mr->addr == NULL) {
+ if (mr->fd < 0)
+ return -1;
+ mr->addr = mmap(NULL, mr->region_size,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED, mr->fd, 0);
+ if (mr->addr == MAP_FAILED) {
+ mr->addr = NULL;
+ return -1;
+ }
+ }
+ }
+ }
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->tx_queues[i] : dev->data->rx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_MASTER)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->rx_queues[i] : dev->data->tx_queues[i];
+ mq->ring = (memif_ring_t *)((uint8_t *)pmd->regions[mq->region]->addr +
+ mq->ring_offset);
+ if (mq->ring->cookie != MEMIF_COOKIE) {
+ MIF_LOG(ERR, "%s: Wrong cookie",
+ rte_vdev_device_name(pmd->vdev));
+ return -1;
+ }
+ mq->ring->head = 0;
+ mq->ring->tail = 0;
+ mq->last_head = 0;
+ mq->last_tail = 0;
+ /* enable polling mode */
+ if (pmd->role == MEMIF_ROLE_SLAVE)
+ mq->ring->flags = MEMIF_RING_FLAG_MASK_INT;
+ }
+
+ pmd->flags &= ~ETH_MEMIF_FLAG_CONNECTING;
+ pmd->flags |= ETH_MEMIF_FLAG_CONNECTED;
+ dev->data->dev_link.link_status = ETH_LINK_UP;
+ MIF_LOG(INFO, "%s: Connected.", rte_vdev_device_name(pmd->vdev));
+ return 0;
+}
+
+static int
+memif_dev_start(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int ret = 0;
+
+ switch (pmd->role) {
+ case MEMIF_ROLE_SLAVE:
+ ret = memif_connect_slave(dev);
+ break;
+ case MEMIF_ROLE_MASTER:
+ ret = memif_connect_master(dev);
+ break;
+ default:
+ MIF_LOG(ERR, "%s: Unknown role: %d.",
+ rte_vdev_device_name(pmd->vdev), pmd->role);
+ ret = -1;
+ break;
+ }
+
+ return ret;
+}
+
+static void
+memif_dev_close(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+
+ memif_msg_enq_disconnect(pmd->cc, "Device closed", 0);
+ memif_disconnect(dev);
+
+ for (i = 0; i < dev->data->nb_rx_queues; i++)
+ (*dev->dev_ops->rx_queue_release)(dev->data->rx_queues[i]);
+ for (i = 0; i < dev->data->nb_tx_queues; i++)
+ (*dev->dev_ops->tx_queue_release)(dev->data->tx_queues[i]);
+
+ memif_socket_remove_device(dev);
+}
+
+static int
+memif_dev_configure(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ /*
+ * SLAVE - TXQ
+ * MASTER - RXQ
+ */
+ pmd->cfg.num_s2m_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_tx_queues : dev->data->nb_rx_queues;
+
+ /*
+ * SLAVE - RXQ
+ * MASTER - TXQ
+ */
+ pmd->cfg.num_m2s_rings = (pmd->role == MEMIF_ROLE_SLAVE) ?
+ dev->data->nb_rx_queues : dev->data->nb_tx_queues;
+
+ return 0;
+}
+
+static int
+memif_tx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_tx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_txconf *tx_conf __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("tx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate tx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type =
+ (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_S2M : MEMIF_RING_M2S;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->pmd = pmd;
+ dev->data->tx_queues[qid] = mq;
+
+ return 0;
+}
+
+static int
+memif_rx_queue_setup(struct rte_eth_dev *dev,
+ uint16_t qid,
+ uint16_t nb_rx_desc __rte_unused,
+ unsigned int socket_id __rte_unused,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mb_pool)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+
+ mq = rte_zmalloc("rx-queue", sizeof(struct memif_queue), 0);
+ if (mq == NULL) {
+ MIF_LOG(ERR, "%s: Failed to allocate rx queue id: %u",
+ rte_vdev_device_name(pmd->vdev), qid);
+ return -ENOMEM;
+ }
+
+ mq->type = (pmd->role == MEMIF_ROLE_SLAVE) ? MEMIF_RING_M2S : MEMIF_RING_S2M;
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ mq->intr_handle.fd = -1;
+ mq->intr_handle.type = RTE_INTR_HANDLE_EXT;
+ mq->mempool = mb_pool;
+ mq->in_port = dev->data->port_id;
+ mq->pmd = pmd;
+ dev->data->rx_queues[qid] = mq;
+
+ return 0;
+}
+
+static void
+memif_queue_release(void *queue)
+{
+ struct memif_queue *mq = (struct memif_queue *)queue;
+
+ if (!mq)
+ return;
+
+ rte_free(mq);
+}
+
+static int
+memif_link_update(struct rte_eth_dev *dev __rte_unused,
+ int wait_to_complete __rte_unused)
+{
+ return 0;
+}
+
+static int
+memif_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *stats)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ struct memif_queue *mq;
+ int i;
+ uint8_t tmp, nq;
+
+ stats->ipackets = 0;
+ stats->ibytes = 0;
+ stats->opackets = 0;
+ stats->obytes = 0;
+ stats->oerrors = 0;
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_s2m_rings :
+ pmd->run.num_m2s_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* RX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->rx_queues[i];
+ stats->q_ipackets[i] = mq->n_pkts;
+ stats->q_ibytes[i] = mq->n_bytes;
+ stats->ipackets += mq->n_pkts;
+ stats->ibytes += mq->n_bytes;
+ }
+
+ tmp = (pmd->role == MEMIF_ROLE_SLAVE) ? pmd->run.num_m2s_rings :
+ pmd->run.num_s2m_rings;
+ nq = (tmp < RTE_ETHDEV_QUEUE_STAT_CNTRS) ? tmp :
+ RTE_ETHDEV_QUEUE_STAT_CNTRS;
+
+ /* TX stats */
+ for (i = 0; i < nq; i++) {
+ mq = dev->data->tx_queues[i];
+ stats->q_opackets[i] = mq->n_pkts;
+ stats->q_obytes[i] = mq->n_bytes;
+ stats->opackets += mq->n_pkts;
+ stats->obytes += mq->n_bytes;
+ stats->oerrors += mq->n_err;
+ }
+ return 0;
+}
+
+static void
+memif_stats_reset(struct rte_eth_dev *dev)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+ int i;
+ struct memif_queue *mq;
+
+ for (i = 0; i < pmd->run.num_s2m_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->tx_queues[i] :
+ dev->data->rx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+ for (i = 0; i < pmd->run.num_m2s_rings; i++) {
+ mq = (pmd->role == MEMIF_ROLE_SLAVE) ? dev->data->rx_queues[i] :
+ dev->data->tx_queues[i];
+ mq->n_pkts = 0;
+ mq->n_bytes = 0;
+ mq->n_err = 0;
+ }
+}
+
+static int
+memif_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd = dev->data->dev_private;
+
+ MIF_LOG(WARNING, "%s: Interrupt mode not supported.",
+ rte_vdev_device_name(pmd->vdev));
+
+ return -1;
+}
+
+static int
+memif_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t qid __rte_unused)
+{
+ struct pmd_internals *pmd __rte_unused = dev->data->dev_private;
+
+ return 0;
+}
+
+static const struct eth_dev_ops ops = {
+ .dev_start = memif_dev_start,
+ .dev_close = memif_dev_close,
+ .dev_infos_get = memif_dev_info,
+ .dev_configure = memif_dev_configure,
+ .tx_queue_setup = memif_tx_queue_setup,
+ .rx_queue_setup = memif_rx_queue_setup,
+ .rx_queue_release = memif_queue_release,
+ .tx_queue_release = memif_queue_release,
+ .rx_queue_intr_enable = memif_rx_queue_intr_enable,
+ .rx_queue_intr_disable = memif_rx_queue_intr_disable,
+ .link_update = memif_link_update,
+ .stats_get = memif_stats_get,
+ .stats_reset = memif_stats_reset,
+};
+
+static int
+memif_create(struct rte_vdev_device *vdev, enum memif_role_t role,
+ memif_interface_id_t id, uint32_t flags,
+ const char *socket_filename,
+ memif_log2_ring_size_t log2_ring_size,
+ uint16_t pkt_buffer_size, const char *secret,
+ struct rte_ether_addr *ether_addr)
+{
+ int ret = 0;
+ struct rte_eth_dev *eth_dev;
+ struct rte_eth_dev_data *data;
+ struct pmd_internals *pmd;
+ const unsigned int numa_node = vdev->device.numa_node;
+ const char *name = rte_vdev_device_name(vdev);
+
+ if (flags & ETH_MEMIF_FLAG_ZERO_COPY) {
+ MIF_LOG(ERR, "Zero-copy slave not supported.");
+ return -1;
+ }
+
+ eth_dev = rte_eth_vdev_allocate(vdev, sizeof(*pmd));
+ if (eth_dev == NULL) {
+ MIF_LOG(ERR, "%s: Unable to allocate device struct.", name);
+ return -1;
+ }
+
+ pmd = eth_dev->data->dev_private;
+ memset(pmd, 0, sizeof(*pmd));
+
+ pmd->vdev = vdev;
+ pmd->id = id;
+ pmd->flags = flags;
+ pmd->flags |= ETH_MEMIF_FLAG_DISABLED;
+ pmd->role = role;
+
+ ret = memif_socket_init(eth_dev, socket_filename);
+ if (ret < 0)
+ return ret;
+
+ memset(pmd->secret, 0, sizeof(char) * ETH_MEMIF_SECRET_SIZE);
+ if (secret != NULL)
+ strlcpy(pmd->secret, secret, sizeof(pmd->secret));
+
+ pmd->cfg.log2_ring_size = log2_ring_size;
+ /* set in .dev_configure() */
+ pmd->cfg.num_s2m_rings = 0;
+ pmd->cfg.num_m2s_rings = 0;
+
+ pmd->cfg.pkt_buffer_size = pkt_buffer_size;
+
+ data = eth_dev->data;
+ data->dev_private = pmd;
+ data->numa_node = numa_node;
+ data->mac_addrs = ether_addr;
+
+ eth_dev->dev_ops = &ops;
+ eth_dev->device = &vdev->device;
+ eth_dev->rx_pkt_burst = eth_memif_rx;
+ eth_dev->tx_pkt_burst = eth_memif_tx;
+
+ eth_dev->data->dev_flags |= RTE_ETH_DEV_CLOSE_REMOVE;
+
+ rte_eth_dev_probing_finish(eth_dev);
+
+ return 0;
+}
+
+static int
+memif_set_role(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ enum memif_role_t *role = (enum memif_role_t *)extra_args;
+
+ if (strstr(value, "master") != NULL) {
+ *role = MEMIF_ROLE_MASTER;
+ } else if (strstr(value, "slave") != NULL) {
+ *role = MEMIF_ROLE_SLAVE;
+ } else {
+ MIF_LOG(ERR, "Unknown role: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_zc(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ uint32_t *flags = (uint32_t *)extra_args;
+
+ if (strstr(value, "yes") != NULL) {
+ *flags |= ETH_MEMIF_FLAG_ZERO_COPY;
+ } else if (strstr(value, "no") != NULL) {
+ *flags &= ~ETH_MEMIF_FLAG_ZERO_COPY;
+ } else {
+ MIF_LOG(ERR, "Failed to parse zero-copy param: %s.", value);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
+memif_set_id(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ memif_interface_id_t *id = (memif_interface_id_t *)extra_args;
+
+ /* even if parsing fails, 0 is a valid id */
+ *id = strtoul(value, NULL, 10);
+ return 0;
+}
+
+static int
+memif_set_bs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ uint16_t *pkt_buffer_size = (uint16_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > 0xFFFF) {
+ MIF_LOG(ERR, "Invalid buffer size: %s.", value);
+ return -EINVAL;
+ }
+ *pkt_buffer_size = tmp;
+ return 0;
+}
+
+static int
+memif_set_rs(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ unsigned long tmp;
+ memif_log2_ring_size_t *log2_ring_size =
+ (memif_log2_ring_size_t *)extra_args;
+
+ tmp = strtoul(value, NULL, 10);
+ if (tmp == 0 || tmp > ETH_MEMIF_MAX_LOG2_RING_SIZE) {
+ MIF_LOG(ERR, "Invalid ring size: %s (max %u).",
+ value, ETH_MEMIF_MAX_LOG2_RING_SIZE);
+ return -EINVAL;
+ }
+ *log2_ring_size = tmp;
+ return 0;
+}
+
+/* check if directory exists and if we have permission to read/write */
+static int
+memif_check_socket_filename(const char *filename)
+{
+ char *dir = NULL, *tmp;
+ uint32_t idx;
+ int ret = 0;
+
+ tmp = strrchr(filename, '/');
+ if (tmp != NULL) {
+ idx = tmp - filename;
+ dir = rte_zmalloc("memif_tmp", sizeof(char) * (idx + 1), 0);
+ if (dir == NULL) {
+ MIF_LOG(ERR, "Failed to allocate memory.");
+ return -1;
+ }
+ strlcpy(dir, filename, sizeof(char) * (idx + 1));
+ }
+
+ if (dir == NULL || (faccessat(AT_FDCWD, dir, F_OK | R_OK |
+ W_OK, AT_EACCESS) < 0)) {
+ MIF_LOG(ERR, "Invalid socket directory.");
+ ret = -EINVAL;
+ }
+
+ if (dir != NULL)
+ rte_free(dir);
+
+ return ret;
+}
+
+static int
+memif_set_socket_filename(const char *key __rte_unused, const char *value,
+ void *extra_args)
+{
+ const char **socket_filename = (const char **)extra_args;
+
+ *socket_filename = value;
+ return memif_check_socket_filename(*socket_filename);
+}
+
+static int
+memif_set_mac(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ struct rte_ether_addr *ether_addr = (struct rte_ether_addr *)extra_args;
+ int ret = 0;
+
+ ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
+ &ether_addr->addr_bytes[0], &ether_addr->addr_bytes[1],
+ &ether_addr->addr_bytes[2], &ether_addr->addr_bytes[3],
+ &ether_addr->addr_bytes[4], &ether_addr->addr_bytes[5]);
+ if (ret != 6)
+ MIF_LOG(WARNING, "Failed to parse mac '%s'.", value);
+ return 0;
+}
+
+static int
+memif_set_secret(const char *key __rte_unused, const char *value, void *extra_args)
+{
+ const char **secret = (const char **)extra_args;
+
+ *secret = value;
+ return 0;
+}
+
+static int
+rte_pmd_memif_probe(struct rte_vdev_device *vdev)
+{
+ RTE_BUILD_BUG_ON(sizeof(memif_msg_t) != 128);
+ RTE_BUILD_BUG_ON(sizeof(memif_desc_t) != 16);
+ int ret = 0;
+ struct rte_kvargs *kvlist;
+ const char *name = rte_vdev_device_name(vdev);
+ enum memif_role_t role = MEMIF_ROLE_SLAVE;
+ memif_interface_id_t id = 0;
+ uint16_t pkt_buffer_size = ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE;
+ memif_log2_ring_size_t log2_ring_size = ETH_MEMIF_DEFAULT_RING_SIZE;
+ const char *socket_filename = ETH_MEMIF_DEFAULT_SOCKET_FILENAME;
+ uint32_t flags = 0;
+ const char *secret = NULL;
+ struct rte_ether_addr *ether_addr = rte_zmalloc("",
+ sizeof(struct rte_ether_addr), 0);
+
+ if (ether_addr == NULL)
+ return -1;
+
+ rte_eth_random_addr(ether_addr->addr_bytes);
+
+ MIF_LOG(INFO, "Initialize MEMIF: %s.", name);
+
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ MIF_LOG(ERR, "Multi-processing not supported for memif.");
+ /* TODO:
+ * Request connection information.
+ *
+ * Once memif in the primary process is connected,
+ * broadcast connection information.
+ */
+ return -1;
+ }
+
+ kvlist = rte_kvargs_parse(rte_vdev_device_args(vdev), valid_arguments);
+
+ /* parse parameters */
+ if (kvlist != NULL) {
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ROLE_ARG,
+ &memif_set_role, &role);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ID_ARG,
+ &memif_set_id, &id);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_PKT_BUFFER_SIZE_ARG,
+ &memif_set_bs, &pkt_buffer_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_RING_SIZE_ARG,
+ &memif_set_rs, &log2_ring_size);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SOCKET_ARG,
+ &memif_set_socket_filename,
+ (void *)(&socket_filename));
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_MAC_ARG,
+ &memif_set_mac, ether_addr);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_ZC_ARG,
+ &memif_set_zc, &flags);
+ if (ret < 0)
+ goto exit;
+ ret = rte_kvargs_process(kvlist, ETH_MEMIF_SECRET_ARG,
+ &memif_set_secret, (void *)(&secret));
+ if (ret < 0)
+ goto exit;
+ }
+
+ /* create interface */
+ ret = memif_create(vdev, role, id, flags, socket_filename,
+ log2_ring_size, pkt_buffer_size, secret, ether_addr);
+
+exit:
+ if (kvlist != NULL)
+ rte_kvargs_free(kvlist);
+ return ret;
+}
+
+static int
+rte_pmd_memif_remove(struct rte_vdev_device *vdev)
+{
+ struct rte_eth_dev *eth_dev;
+
+ eth_dev = rte_eth_dev_allocated(rte_vdev_device_name(vdev));
+ if (eth_dev == NULL)
+ return 0;
+
+ rte_eth_dev_close(eth_dev->data->port_id);
+
+ return 0;
+}
+
+static struct rte_vdev_driver pmd_memif_drv = {
+ .probe = rte_pmd_memif_probe,
+ .remove = rte_pmd_memif_remove,
+};
+
+RTE_PMD_REGISTER_VDEV(net_memif, pmd_memif_drv);
+
+RTE_PMD_REGISTER_PARAM_STRING(net_memif,
+ ETH_MEMIF_ID_ARG "=<int>"
+ ETH_MEMIF_ROLE_ARG "=master|slave"
+ ETH_MEMIF_PKT_BUFFER_SIZE_ARG "=<int>"
+ ETH_MEMIF_RING_SIZE_ARG "=<int>"
+ ETH_MEMIF_SOCKET_ARG "=<string>"
+ ETH_MEMIF_MAC_ARG "=xx:xx:xx:xx:xx:xx"
+ ETH_MEMIF_ZC_ARG "=yes|no"
+ ETH_MEMIF_SECRET_ARG "=<string>");
+
+int memif_logtype;
+
+RTE_INIT(memif_init_log)
+{
+ memif_logtype = rte_log_register("pmd.net.memif");
+ if (memif_logtype >= 0)
+ rte_log_set_level(memif_logtype, RTE_LOG_NOTICE);
+}
diff --git a/drivers/net/memif/rte_eth_memif.h b/drivers/net/memif/rte_eth_memif.h
new file mode 100644
index 000000000..5f631e925
--- /dev/null
+++ b/drivers/net/memif/rte_eth_memif.h
@@ -0,0 +1,212 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright 2018-2019 Cisco Systems, Inc. All rights reserved.
+ */
+
+#ifndef _RTE_ETH_MEMIF_H_
+#define _RTE_ETH_MEMIF_H_
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif /* _GNU_SOURCE */
+
+#include <sys/queue.h>
+
+#include <rte_ethdev_driver.h>
+#include <rte_ether.h>
+#include <rte_interrupts.h>
+
+#include "memif.h"
+
+#define ETH_MEMIF_DEFAULT_SOCKET_FILENAME "/run/memif.sock"
+#define ETH_MEMIF_DEFAULT_RING_SIZE 10
+#define ETH_MEMIF_DEFAULT_PKT_BUFFER_SIZE 2048
+
+#define ETH_MEMIF_MAX_NUM_Q_PAIRS 255
+#define ETH_MEMIF_MAX_LOG2_RING_SIZE 14
+#define ETH_MEMIF_MAX_REGION_NUM 256
+
+#define ETH_MEMIF_SHM_NAME_SIZE 32
+#define ETH_MEMIF_DISC_STRING_SIZE 96
+#define ETH_MEMIF_SECRET_SIZE 24
+
+extern int memif_logtype;
+
+#define MIF_LOG(level, fmt, args...) \
+ rte_log(RTE_LOG_ ## level, memif_logtype, \
+ "%s(): " fmt "\n", __func__, ##args)
+
+enum memif_role_t {
+ MEMIF_ROLE_MASTER,
+ MEMIF_ROLE_SLAVE,
+};
+
+struct memif_region {
+ void *addr; /**< shared memory address */
+ memif_region_size_t region_size; /**< shared memory size */
+ int fd; /**< shared memory file descriptor */
+ uint32_t pkt_buffer_offset;
+ /**< offset from 'addr' to first packet buffer */
+};
+
+struct memif_queue {
+ struct rte_mempool *mempool; /**< mempool for RX packets */
+ struct pmd_internals *pmd; /**< device internals */
+
+ memif_ring_type_t type; /**< ring type */
+ memif_region_index_t region; /**< shared memory region index */
+
+ uint16_t in_port; /**< port id */
+
+ memif_region_offset_t ring_offset;
+ /**< ring offset from start of shm region (ring - memif_region.addr) */
+
+ uint16_t last_head; /**< last ring head */
+ uint16_t last_tail; /**< last ring tail */
+
+ /* rx/tx info */
+ uint64_t n_pkts; /**< number of rx/tx packets */
+ uint64_t n_bytes; /**< number of rx/tx bytes */
+ uint64_t n_err; /**< number of tx errors */
+
+ memif_ring_t *ring; /**< pointer to ring */
+
+ struct rte_intr_handle intr_handle; /**< interrupt handle */
+
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+};
+
+struct pmd_internals {
+ memif_interface_id_t id; /**< unique id */
+ enum memif_role_t role; /**< device role */
+ uint32_t flags; /**< device status flags */
+#define ETH_MEMIF_FLAG_CONNECTING (1 << 0)
+/**< device is connecting */
+#define ETH_MEMIF_FLAG_CONNECTED (1 << 1)
+/**< device is connected */
+#define ETH_MEMIF_FLAG_ZERO_COPY (1 << 2)
+/**< device is zero-copy enabled */
+#define ETH_MEMIF_FLAG_DISABLED (1 << 3)
+/**< device has not been configured and can not accept connection requests */
+
+ char *socket_filename; /**< pointer to socket filename */
+ char secret[ETH_MEMIF_SECRET_SIZE]; /**< secret (optional security parameter) */
+
+ struct memif_control_channel *cc; /**< control channel */
+
+ struct memif_region *regions[ETH_MEMIF_MAX_REGION_NUM];
+ /**< shared memory regions */
+ memif_region_index_t regions_num; /**< number of regions */
+
+ /* remote info */
+ char remote_name[RTE_DEV_NAME_MAX_LEN]; /**< remote app name */
+ char remote_if_name[RTE_DEV_NAME_MAX_LEN]; /**< remote peer name */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } cfg; /**< Configured parameters (max values) */
+
+ struct {
+ memif_log2_ring_size_t log2_ring_size; /**< log2 of ring size */
+ uint8_t num_s2m_rings; /**< number of slave to master rings */
+ uint8_t num_m2s_rings; /**< number of master to slave rings */
+ uint16_t pkt_buffer_size; /**< buffer size */
+ } run;
+ /**< Parameters used in active connection */
+
+ char local_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< local disconnect reason */
+ char remote_disc_string[ETH_MEMIF_DISC_STRING_SIZE];
+ /**< remote disconnect reason */
+
+ struct rte_vdev_device *vdev; /**< vdev handle */
+};
+
+/**
+ * Unmap shared memory and free regions from memory.
+ *
+ * @param pmd
+ * device internals
+ */
+void memif_free_regions(struct pmd_internals *pmd);
+
+/**
+ * Finalize connection establishment process. Map shared memory file
+ * (master role), initialize ring queue, set link status up.
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_connect(struct rte_eth_dev *dev);
+
+/**
+ * Create shared memory file and initialize ring queue.
+ * Only called by slave when establishing connection
+ *
+ * @param dev
+ * memif device
+ * @return
+ * - On success, zero.
+ * - On failure, a negative value.
+ */
+int memif_init_regions_and_queues(struct rte_eth_dev *dev);
+
+/**
+ * Get memif version string.
+ *
+ * @return
+ * - memif version string
+ */
+const char *memif_version(void);
+
+#ifndef MFD_HUGETLB
+#ifndef __NR_memfd_create
+
+#if defined __x86_64__ && defined __ILP32__
+#define __NR_memfd_create 1073742143
+#elif defined __x86_64__
+#define __NR_memfd_create 319
+#elif defined __arm__
+#define __NR_memfd_create 385
+#elif defined __aarch64__
+#define __NR_memfd_create 279
+#elif defined __powerpc__
+#define __NR_memfd_create 360
+#elif defined __i386__
+#define __NR_memfd_create 356
+#else
+#error "__NR_memfd_create unknown for this architecture"
+#endif
+
+#endif /* __NR_memfd_create */
+
+static inline int memfd_create(const char *name, unsigned int flags)
+{
+ return syscall(__NR_memfd_create, name, flags);
+}
+#endif /* MFD_HUGETLB */
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef MFD_ALLOW_SEALING
+#define MFD_ALLOW_SEALING 0x0002U
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
+#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#endif
+
+#endif /* RTE_ETH_MEMIF_H */
diff --git a/drivers/net/memif/rte_pmd_memif_version.map b/drivers/net/memif/rte_pmd_memif_version.map
new file mode 100644
index 000000000..8861484fb
--- /dev/null
+++ b/drivers/net/memif/rte_pmd_memif_version.map
@@ -0,0 +1,4 @@
+DPDK_19.08 {
+
+ local: *;
+};
diff --git a/drivers/net/meson.build b/drivers/net/meson.build
index ed99896c3..b57073483 100644
--- a/drivers/net/meson.build
+++ b/drivers/net/meson.build
@@ -24,6 +24,7 @@ drivers = ['af_packet',
'ixgbe',
'kni',
'liquidio',
+ 'memif',
'mlx4',
'mlx5',
'mvneta',
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 7c9b4b538..326b2a30b 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -174,6 +174,7 @@ ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_KNI) += -lrte_pmd_kni
endif
_LDLIBS-$(CONFIG_RTE_LIBRTE_LIO_PMD) += -lrte_pmd_lio
+_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_MEMIF) += -lrte_pmd_memif
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX4_PMD) += -lrte_pmd_mlx4
_LDLIBS-$(CONFIG_RTE_LIBRTE_MLX5_PMD) += -lrte_pmd_mlx5 -lmnl
ifeq ($(CONFIG_RTE_IBVERBS_LINK_DLOPEN),y)
--
2.17.1
* Re: [dpdk-dev] [PATCH v12] net/memif: introduce memory interface (memif) PMD
2019-06-06 11:38 ` [dpdk-dev] [PATCH v12] " Jakub Grajciar
@ 2019-06-06 14:07 ` Ferruh Yigit
0 siblings, 0 replies; 58+ messages in thread
From: Ferruh Yigit @ 2019-06-06 14:07 UTC (permalink / raw)
To: Jakub Grajciar, dev
On 6/6/2019 12:38 PM, Jakub Grajciar wrote:
> Shared memory packet interface (memif) PMD allows for DPDK and any other
> client using memif (DPDK, VPP, libmemif) to communicate using shared
> memory. The created device transmits packets in a raw format. It can be
> used with Ethernet mode, IP mode, or Punt/Inject. At this moment, only
> Ethernet mode is supported in DPDK memif implementation. Memif is Linux
> only.
>
> Signed-off-by: Jakub Grajciar <jgrajcia@cisco.com>
> ---
> MAINTAINERS | 6 +
> config/common_base | 5 +
> config/common_linux | 1 +
> doc/guides/nics/features/memif.ini | 14 +
> doc/guides/nics/index.rst | 1 +
> doc/guides/nics/memif.rst | 234 ++++
> doc/guides/rel_notes/release_19_08.rst | 5 +
> drivers/net/Makefile | 1 +
> drivers/net/memif/Makefile | 31 +
> drivers/net/memif/memif.h | 179 +++
> drivers/net/memif/memif_socket.c | 1124 +++++++++++++++++
> drivers/net/memif/memif_socket.h | 105 ++
> drivers/net/memif/meson.build | 15 +
> drivers/net/memif/rte_eth_memif.c | 1204 +++++++++++++++++++
> drivers/net/memif/rte_eth_memif.h | 212 ++++
> drivers/net/memif/rte_pmd_memif_version.map | 4 +
> drivers/net/meson.build | 1 +
> mk/rte.app.mk | 1 +
Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
Applied to dpdk-next-net/master, thanks.