DPDK patches and discussions
* [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code
@ 2018-03-09 16:42 Konstantin Ananyev
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
                   ` (14 more replies)
  0 siblings, 15 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

BPF is used quite intensively inside the Linux (and BSD) kernels
for various purposes and has proved to be extremely useful.

BPF inside DPDK could also be used in many places for similar purposes.
For example:
- packet filtering/tracing (aka tcpdump)
- packet classification
- statistics collection
- HW/PMD live-system debugging/prototyping - tracing HW descriptors,
  internal PMD SW state, etc.
...

All of that in a dynamic, user-defined and extensible manner.

This series introduces a new library - librte_bpf.
librte_bpf provides an API to load and execute eBPF bytecode within
a user-space DPDK application.
It supports a basic set of features from the eBPF spec.
It also introduces a basic framework to load/unload BPF-based filters
on eth devices (right now via SW RX/TX callbacks).
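
As an illustration, application code could use the new API roughly like
this (a minimal sketch based on the rte_bpf.h introduced in patch 1/5;
the "filter.o" filename and ".text" section name are just placeholders):

#include <rte_bpf.h>

/* load an eBPF program from an ELF object and run it over one data buffer */
static uint64_t
run_bpf_once(void *pkt_data)
{
	uint64_t rc;
	struct rte_bpf *bpf;
	struct rte_bpf_prm prm = {
		.xsym = NULL,	/* no external symbols exposed to the program */
		.nb_xsym = 0,
		.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,	/* input is raw data */
	};

	bpf = rte_bpf_elf_load(&prm, "filter.o", ".text");
	if (bpf == NULL)
		return 0;	/* rte_errno holds the error code */

	rc = rte_bpf_exec(bpf, pkt_data);
	rte_bpf_destroy(bpf);
	return rc;
}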

How to try it:
===============

1) run testpmd as usual and start your favorite forwarding case.
2) build the BPF program you'd like to load
(you'll need clang v3.7 or above):
$ cd test/bpf
$ clang -O2 -target bpf -c t1.c
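
For reference, such a program is just restricted C returning a 64-bit
value; a trivial sketch (not the actual test/bpf/dummy.c, and the
function name is arbitrary):

#include <stdint.h>

uint64_t
entry(void *arg)
{
	/* accept everything, do nothing else */
	(void)arg;
	return 1;
}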

3) load bpf program(s):
testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>

<load-flags>:  [-][J][M]
J - use JIT-generated native code; otherwise the BPF interpreter will be used.
M - assume the input parameter is a pointer to rte_mbuf,
    otherwise assume it is a pointer to the first segment's data.

A few examples:

# to load (not JITed) dummy.o at TX queue 0, port 0:
testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o

# to load (and JIT compile) t1.o at RX queue 0, port 1:
testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o

# to load and JIT t3.o (note that it expects mbuf as an input):
testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o

If you are curious to inspect the JIT-generated native code:
gdb -p `pgrep testpmd`
(gdb) disas 0x7fd173c5f000,+76
Dump of assembler code from 0x7fd173c5f000 to 0x7fd173c5f04c:
   0x00007fd173c5f000:  mov    %rdi,%rsi
   0x00007fd173c5f003:  movzwq 0x10(%rsi),%rdi
   0x00007fd173c5f008:  mov    0x0(%rsi),%rdx
   0x00007fd173c5f00c:  add    %rdi,%rdx
   0x00007fd173c5f00f:  movzbq 0xc(%rdx),%rdi
   0x00007fd173c5f014:  movzbq 0xd(%rdx),%rdx
   0x00007fd173c5f019:  shl    $0x8,%rdx
   0x00007fd173c5f01d:  or     %rdi,%rdx
   0x00007fd173c5f020:  cmp    $0x608,%rdx
   0x00007fd173c5f027:  jne    0x7fd173c5f044
   0x00007fd173c5f029:  mov    $0xb712e8,%rdi
   0x00007fd173c5f030:  mov    0x0(%rdi),%rdi
   0x00007fd173c5f034:  mov    $0x40,%rdx
   0x00007fd173c5f03b:  mov    $0x4db2f0,%rax
   0x00007fd173c5f042:  callq  *%rax
   0x00007fd173c5f044:  mov    $0x1,%rax
   0x00007fd173c5f04b:  retq
End of assembler dump.

4) observe the changed traffic behavior
Let's say, with the examples above:
 - dummy.o - does literally nothing, so no change should be visible here,
   except a possible slowdown.
 - t1.o - should drop all packets that don't match the
   'dst 1.2.3.4 && udp && dst port 5000' filter (see the sketch after this list).
 - t3.o - should dump ARP packets to stdout.
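
As a rough illustration, such a filter could look like the following in
restricted C (a hypothetical sketch, not the actual test/bpf/t1.c; it
assumes the input is a pointer to the first segment's data, i.e. no 'M'
flag, and that returning zero means 'drop'):

#include <stdint.h>

uint64_t
filter(void *pkt)
{
	uint8_t *p = pkt;
	uint32_t ihl, dport;

	/* ether type must be IPv4 (0x0800), assuming no VLAN tag */
	if (p[12] != 0x08 || p[13] != 0x00)
		return 0;
	/* IPv4 protocol must be UDP (17) */
	if (p[14 + 9] != 17)
		return 0;
	/* IPv4 destination address must be 1.2.3.4 */
	if (p[14 + 16] != 1 || p[14 + 17] != 2 ||
			p[14 + 18] != 3 || p[14 + 19] != 4)
		return 0;
	/* UDP destination port must be 5000 */
	ihl = (p[14] & 0xf) * 4;
	dport = (uint32_t)p[14 + ihl + 2] << 8 | p[14 + ihl + 3];
	return dport == 5000;
}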

5) unload some or all BPF programs:
testpmd> bpf-unload tx 0 0

6) continue with step 3) or exit

TODO list:
==========
- meson build
- unit tests (UT) for it
- implement proper validate()
- allow JIT to generate bulk version
- FreeBSD support

Not currently supported features:
=================================
- cBPF
- tail-pointer call
- eBPF MAP
- JIT for non-x86_64 targets
- skb

Konstantin Ananyev (5):
  bpf: add BPF loading and execution framework
  bpf: add JIT compilation for x86_64 ISA.
  bpf: introduce basic RX/TX BPF filters
  testpmd: new commands to load/unload BPF filters
  test: add few eBPF samples

 app/test-pmd/bpf_sup.h             |   25 +
 app/test-pmd/cmdline.c             |  146 ++++
 config/common_base                 |    5 +
 config/common_linuxapp             |    1 +
 lib/Makefile                       |    2 +
 lib/librte_bpf/Makefile            |   35 +
 lib/librte_bpf/bpf.c               |   52 ++
 lib/librte_bpf/bpf_exec.c          |  452 ++++++++++++
 lib/librte_bpf/bpf_impl.h          |   37 +
 lib/librte_bpf/bpf_jit_x86.c       | 1329 ++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_load.c          |  380 +++++++++++
 lib/librte_bpf/bpf_pkt.c           |  524 ++++++++++++++
 lib/librte_bpf/bpf_validate.c      |   55 ++
 lib/librte_bpf/rte_bpf.h           |  158 +++++
 lib/librte_bpf/rte_bpf_ethdev.h    |   50 ++
 lib/librte_bpf/rte_bpf_version.map |   16 +
 mk/rte.app.mk                      |    2 +
 test/bpf/dummy.c                   |   20 +
 test/bpf/mbuf.h                    |  556 +++++++++++++++
 test/bpf/t1.c                      |   53 ++
 test/bpf/t2.c                      |   30 +
 test/bpf/t3.c                      |   36 +
 22 files changed, 3964 insertions(+)
 create mode 100644 app/test-pmd/bpf_sup.h
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

-- 
2.13.6


* [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-03-09 16:42 ` Konstantin Ananyev
  2018-03-13 13:24   ` Jerin Jacob
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 2/5] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds a dependency on libelf.
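
For reference, the API can also be fed raw instructions directly,
without going through an ELF file; a minimal sketch (instruction encoding
and register names come from <linux/bpf.h>; the program below just does
'r0 = 1; exit'):

#include <linux/bpf.h>
#include <rte_bpf.h>

static const struct bpf_insn prog[] = {
	{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 1 },
	{ .code = BPF_JMP | BPF_EXIT, },
};

static uint64_t
run_trivial_prog(void *ctx)
{
	uint64_t rc;
	struct rte_bpf *bpf;
	const struct rte_bpf_prm prm = {
		.ins = prog,
		.nb_ins = RTE_DIM(prog),
		.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
	};

	bpf = rte_bpf_load(&prm);
	if (bpf == NULL)
		return 0;	/* rte_errno holds the error code */
	rc = rte_bpf_exec(bpf, ctx);	/* expected to return 1 */
	rte_bpf_destroy(bpf);
	return rc;
}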

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 config/common_linuxapp             |   1 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  48 ++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  37 +++
 lib/librte_bpf/bpf_load.c          | 380 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/rte_bpf.h           | 158 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 mk/rte.app.mk                      |   2 +
 12 files changed, 1182 insertions(+)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ad03cf433..2205b684f 100644
--- a/config/common_base
+++ b/config/common_base
@@ -823,3 +823,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=n
diff --git a/config/common_linuxapp b/config/common_linuxapp
index ff98f2355..7b4a0ce7d 100644
--- a/config/common_linuxapp
+++ b/config/common_linuxapp
@@ -10,6 +10,7 @@ CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
 CONFIG_RTE_EAL_IGB_UIO=y
 CONFIG_RTE_EAL_VFIO=y
 CONFIG_RTE_KNI_KMOD=y
+CONFIG_RTE_LIBRTE_BPF=y
 CONFIG_RTE_LIBRTE_KNI=y
 CONFIG_RTE_LIBRTE_PMD_KNI=y
 CONFIG_RTE_LIBRTE_VHOST=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..4727d2251
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+
+	if (rc != 0)
+		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..f1c1d3be3
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_LOG(ERR, USER1, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_LOG(ERR, USER1,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..f09417088
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..6ced9c640
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,380 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm = (uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_LOG(ERR, USER1, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_LOG(ERR, USER1, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_LOG(ERR, USER1,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_LOG(ERR, USER1, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_LOG(ERR, USER1,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_LOG(INFO, USER1, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..7c1267cbd
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_LOG(ERR, USER1, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..efee35ad4
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,158 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <linux/bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname of the ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for a given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 3eb41d176..fb41c77d2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6


* [dpdk-dev] [PATCH v1 2/5] bpf: add JIT compilation for x86_64 ISA.
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-03-09 16:42 ` Konstantin Ananyev
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile      |    3 +
 lib/librte_bpf/bpf.c         |    4 +
 lib/librte_bpf/bpf_jit_x86.c | 1329 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1336 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index e0f434e77..44b12c439 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -23,6 +23,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
index 4727d2251..b69d20fc8 100644
--- a/lib/librte_bpf/bpf.c
+++ b/lib/librte_bpf/bpf.c
@@ -39,7 +39,11 @@ bpf_jit(struct rte_bpf *bpf)
 {
 	int32_t rc;
 
+#ifdef RTE_ARCH_X86_64
+	rc = bpf_jit_x86(bpf);
+#else
 	rc = -ENOTSUP;
+#endif
 
 	if (rc != 0)
 		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
diff --git a/lib/librte_bpf/bpf_jit_x86.c b/lib/librte_bpf/bpf_jit_x86.c
new file mode 100644
index 000000000..e6a331353
--- /dev/null
+++ b/lib/librte_bpf/bpf_jit_x86.c
@@ -0,0 +1,1329 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define GET_BPF_OP(op)	(BPF_OP(op) >> 4)
+
+enum {
+	RAX = 0,  /* scratch, return value */
+	RCX = 1,  /* scratch, 4th arg */
+	RDX = 2,  /* scratch, 3rd arg */
+	RBX = 3,  /* callee saved */
+	RSP = 4,  /* stack pointer */
+	RBP = 5,  /* frame pointer, callee saved */
+	RSI = 6,  /* scratch, 2nd arg */
+	RDI = 7,  /* scratch, 1st arg */
+	R8  = 8,  /* scratch, 5th arg */
+	R9  = 9,  /* scratch, 6th arg */
+	R10 = 10, /* scratch */
+	R11 = 11, /* scratch */
+	R12 = 12, /* callee saved */
+	R13 = 13, /* callee saved */
+	R14 = 14, /* callee saved */
+	R15 = 15, /* callee saved */
+};
+
+#define IS_EXT_REG(r)	((r) >= R8)
+
+enum {
+	REX_PREFIX = 0x40, /* fixed value 0100 */
+	REX_W = 0x8,       /* 64bit operand size */
+	REX_R = 0x4,       /* extension of the ModRM.reg field */
+	REX_X = 0x2,       /* extension of the SIB.index field */
+	REX_B = 0x1,       /* extension of the ModRM.rm field */
+};
+
+enum {
+	MOD_INDIRECT = 0,
+	MOD_IDISP8 = 1,
+	MOD_IDISP32 = 2,
+	MOD_DIRECT = 3,
+};
+
+enum {
+	SIB_SCALE_1 = 0,
+	SIB_SCALE_2 = 1,
+	SIB_SCALE_4 = 2,
+	SIB_SCALE_8 = 3,
+};
+
+/*
+ * eBPF to x86_64 register mappings.
+ */
+static const uint32_t ebpf2x86[] = {
+	[BPF_REG_0] = RAX,
+	[BPF_REG_1] = RDI,
+	[BPF_REG_2] = RSI,
+	[BPF_REG_3] = RDX,
+	[BPF_REG_4] = RCX,
+	[BPF_REG_5] = R8,
+	[BPF_REG_6] = RBX,
+	[BPF_REG_7] = R13,
+	[BPF_REG_8] = R14,
+	[BPF_REG_9] = R15,
+	[BPF_REG_10] = RBP,
+};
+
+/*
+ * r10 and r11 are used as scratch temporary registers.
+ */
+enum {
+	REG_DIV_IMM = R9,
+	REG_TMP0 = R11,
+	REG_TMP1 = R10,
+};
+
+/*
+ * callee saved registers list.
+ * keep RBP as the last one.
+ */
+static const uint32_t save_regs[] = {RBX, R12, R13, R14, R15, RBP};
+
+struct bpf_jit_state {
+	uint32_t idx;
+	size_t sz;
+	struct {
+		uint32_t num;
+		int32_t off;
+	} exit;
+	uint32_t reguse;
+	int32_t *off;
+	uint8_t *ins;
+};
+
+#define	INUSE(v, r)	(((v) >> (r)) & 1)
+#define	USED(v, r)	((v) |= 1 << (r))
+
+union bpf_jit_imm {
+	uint32_t u32;
+	uint8_t u8[4];
+};
+
+static size_t
+bpf_size(uint32_t bpf_op_sz)
+{
+	if (bpf_op_sz == BPF_B)
+		return sizeof(uint8_t);
+	else if (bpf_op_sz == BPF_H)
+		return sizeof(uint16_t);
+	else if (bpf_op_sz == BPF_W)
+		return sizeof(uint32_t);
+	else if (bpf_op_sz == BPF_DW)
+		return sizeof(uint64_t);
+	return 0;
+}
+
+/*
+ * In many cases for imm8 we can produce shorter code.
+ */
+static size_t
+imm_size(int32_t v)
+{
+	if (v == (int8_t)v)
+		return sizeof(int8_t);
+	return sizeof(int32_t);
+}
+
+static void
+emit_bytes(struct bpf_jit_state *st, const uint8_t ins[], uint32_t sz)
+{
+	uint32_t i;
+
+	if (st->ins != NULL) {
+		for (i = 0; i != sz; i++)
+			st->ins[st->sz + i] = ins[i];
+	}
+	st->sz += sz;
+}
+
+static void
+emit_imm(struct bpf_jit_state *st, const uint32_t imm, uint32_t sz)
+{
+	union bpf_jit_imm v;
+
+	v.u32 = imm;
+	emit_bytes(st, v.u8, sz);
+}
+
+/*
+ * emit REX byte
+ */
+static void
+emit_rex(struct bpf_jit_state *st, uint32_t op, uint32_t reg, uint32_t rm)
+{
+	uint8_t rex;
+
+	/* mark operand registers as used*/
+	USED(st->reguse, reg);
+	USED(st->reguse, rm);
+
+	rex = 0;
+	if (BPF_CLASS(op) == BPF_ALU64 ||
+			op == (BPF_ST | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_XADD | BPF_DW) ||
+			op == (BPF_LD | BPF_IMM | BPF_DW) ||
+			(BPF_CLASS(op) == BPF_LDX &&
+			BPF_MODE(op) == BPF_MEM &&
+			BPF_SIZE(op) != BPF_W))
+		rex |= REX_W;
+
+	if (IS_EXT_REG(reg))
+		rex |= REX_R;
+
+	if (IS_EXT_REG(rm))
+		rex |= REX_B;
+
+	/* store using SIL, DIL */
+	if (op == (BPF_STX | BPF_MEM | BPF_B) && (reg == RDI || reg == RSI))
+		rex |= REX_PREFIX;
+
+	if (rex != 0) {
+		rex |= REX_PREFIX;
+		emit_bytes(st, &rex, sizeof(rex));
+	}
+}
+
+/*
+ * emit MODRegRM byte
+ */
+static void
+emit_modregrm(struct bpf_jit_state *st, uint32_t mod, uint32_t reg, uint32_t rm)
+{
+	uint8_t v;
+
+	v = mod << 6 | (reg & 7) << 3 | (rm & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit SIB byte
+ */
+static void
+emit_sib(struct bpf_jit_state *st, uint32_t scale, uint32_t idx, uint32_t base)
+{
+	uint8_t v;
+
+	v = scale << 6 | (idx & 7) << 3 | (base & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit xchg %<sreg>, %<dreg>
+ */
+static void
+emit_xchg_reg(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	const uint8_t ops = 0x87;
+
+	emit_rex(st, BPF_ALU64, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit neg %<dreg>
+ */
+static void
+emit_neg(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 3;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+/*
+ * emit mov %<sreg>, %<dreg>
+ */
+static void
+emit_mov_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x89;
+
+	/* if operands are 32-bit, then it can be used to clear upper 32-bit */
+	if (sreg != dreg || BPF_CLASS(op) == BPF_ALU) {
+		emit_rex(st, op, sreg, dreg);
+		emit_bytes(st, &ops, sizeof(ops));
+		emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+	}
+}
+
+/*
+ * emit movzwl %<sreg>, %<dreg>
+ */
+static void
+emit_movzwl(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	static const uint8_t ops[] = {0x0F, 0xB7};
+
+	emit_rex(st, BPF_ALU, sreg, dreg);
+	emit_bytes(st, ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit ror <imm8>, %<dreg>
+ */
+static void
+emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t prfx = 0x66;
+	const uint8_t ops = 0xC1;
+	const uint8_t mods = 1;
+
+	emit_bytes(st, &prfx, sizeof(prfx));
+	emit_rex(st, BPF_ALU, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit bswap %<dreg>
+ */
+static void
+emit_be2le_48(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	uint32_t rop;
+
+	const uint8_t ops = 0x0F;
+	const uint8_t mods = 1;
+
+	rop = (imm == 64) ? BPF_ALU64 : BPF_ALU;
+	emit_rex(st, rop, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+static void
+emit_be2le(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16) {
+		emit_ror_imm(st, dreg, 8);
+		emit_movzwl(st, dreg, dreg);
+	} else
+		emit_be2le_48(st, dreg, imm);
+}
+
+/*
+ * In general it is NOP for x86.
+ * Just clear the upper bits.
+ */
+static void
+emit_le2be(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16)
+		emit_movzwl(st, dreg, dreg);
+	else if (imm == 32)
+		emit_mov_reg(st, BPF_ALU | BPF_MOV | BPF_X, dreg, dreg);
+}
+
+/*
+ * emit one of:
+ *   add <imm>, %<dreg>
+ *   and <imm>, %<dreg>
+ *   or  <imm>, %<dreg>
+ *   sub <imm>, %<dreg>
+ *   xor <imm>, %<dreg>
+ */
+static void
+emit_alu_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t mod, opcode;
+	uint32_t bop, imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0,
+		[GET_BPF_OP(BPF_AND)] = 4,
+		[GET_BPF_OP(BPF_OR)] =  1,
+		[GET_BPF_OP(BPF_SUB)] = 5,
+		[GET_BPF_OP(BPF_XOR)] = 6,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+
+	imsz = imm_size(imm);
+	opcode = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &opcode, sizeof(opcode));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit one of:
+ *   add %<sreg>, %<dreg>
+ *   and %<sreg>, %<dreg>
+ *   or  %<sreg>, %<dreg>
+ *   sub %<sreg>, %<dreg>
+ *   xor %<sreg>, %<dreg>
+ */
+static void
+emit_alu_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0x01,
+		[GET_BPF_OP(BPF_AND)] = 0x21,
+		[GET_BPF_OP(BPF_OR)] =  0x09,
+		[GET_BPF_OP(BPF_SUB)] = 0x29,
+		[GET_BPF_OP(BPF_XOR)] = 0x31,
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_shift(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	uint8_t mod;
+	uint32_t bop, opx;
+
+	static const uint8_t ops[] = {0xC1, 0xD3};
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_LSH)] = 4,
+		[GET_BPF_OP(BPF_RSH)] = 5,
+		[GET_BPF_OP(BPF_ARSH)] = 7,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+	opx = (BPF_SRC(op) == BPF_X);
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+}
+
+/*
+ * emit one of:
+ *   shl <imm>, %<dreg>
+ *   shr <imm>, %<dreg>
+ *   sar <imm>, %<dreg>
+ */
+static void
+emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm)
+{
+	emit_shift(st, op, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit one of:
+ *   shl %<dreg>
+ *   shr %<dreg>
+ *   sar %<dreg>
+ * note that rcx is implicitly used as a source register, so a few extra
+ * instructions for register spilling might be necessary.
+ */
+static void
+emit_shift_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+
+	emit_shift(st, op, (dreg == RCX) ? sreg : dreg);
+
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+}
+
+/*
+ * emit mov <imm>, %<dreg>
+ */
+static void
+emit_mov_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xC7;
+
+	if (imm == 0) {
+		/* replace 'mov 0, %<dst>' with 'xor %<dst>, %<dst>' */
+		op = BPF_CLASS(op) | BPF_XOR | BPF_X;
+		emit_alu_reg(st, op, dreg, dreg);
+		return;
+	}
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+	emit_imm(st, imm, sizeof(imm));
+}
+
+/*
+ * emit mov <imm64>, %<dreg>
+ */
+static void
+emit_ld_imm64(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm0,
+	uint32_t imm1)
+{
+	const uint8_t ops = 0xB8;
+
+	if (imm1 == 0) {
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, dreg, imm0);
+		return;
+	}
+
+	emit_rex(st, BPF_ALU64, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+
+	emit_imm(st, imm0, sizeof(imm0));
+	emit_imm(st, imm1, sizeof(imm1));
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * either:
+ *   mov %<sreg>, %rdx
+ * OR
+ *   mov <imm>, %rdx
+ * mul %rdx
+ * mov %r10, %rdx
+ * mov %rax, %<dreg>
+ * mov %r11, %rax
+ */
+static void
+emit_mul(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 4;
+
+	/* save rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* rax = dreg */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+
+	if (BPF_CLASS(op) == BPF_X)
+		/* rdx = sreg */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X,
+			sreg == RAX ? REG_TMP0 : sreg, RDX);
+	else
+		/* rdx = imm */
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, imm);
+
+	emit_rex(st, op, RAX, RDX);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RDX);
+
+	if (dreg != RDX)
+		/* restore rdx */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+
+	if (dreg != RAX) {
+		/* dreg = rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+		/* restore rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	}
+}
+
+/*
+ * emit mov <ofs>(%<sreg>), %<dreg>
+ * note that for non 64-bit ops, higher bits have to be cleared.
+ */
+static void
+emit_ld_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	uint32_t mods, opsz;
+	const uint8_t op32 = 0x8B;
+	const uint8_t op16[] = {0x0F, 0xB7};
+	const uint8_t op8[] = {0x0F, 0xB6};
+
+	emit_rex(st, op, dreg, sreg);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_B)
+		emit_bytes(st, op8, sizeof(op8));
+	else if (opsz == BPF_H)
+		emit_bytes(st, op16, sizeof(op16));
+	else
+		emit_bytes(st, &op32, sizeof(op32));
+
+	mods = (imm_size(ofs) == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, dreg, sreg);
+	if (sreg == RSP || sreg == R12)
+		emit_sib(st, SIB_SCALE_1, sreg, sreg);
+	emit_imm(st, ofs, imm_size(ofs));
+}
+
+/*
+ * emit one of:
+ *   mov %<sreg>, <ofs>(%<dreg>)
+ *   mov <imm>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_common(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, uint32_t imm, int32_t ofs)
+{
+	uint32_t mods, imsz, opsz, opx;
+	const uint8_t prfx16 = 0x66;
+
+	/* 8 bit instruction opcodes */
+	static const uint8_t op8[] = {0xC6, 0x88};
+
+	/* 16/32/64 bit instruction opcodes */
+	static const uint8_t ops[] = {0xC7, 0x89};
+
+	/* does the instruction take an immediate value or a src reg? */
+	opx = (BPF_CLASS(op) == BPF_STX);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_H)
+		emit_bytes(st, &prfx16, sizeof(prfx16));
+
+	emit_rex(st, op, sreg, dreg);
+
+	if (opsz == BPF_B)
+		emit_bytes(st, &op8[opx], sizeof(op8[opx]));
+	else
+		emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, sreg, dreg);
+
+	if (dreg == RSP || dreg == R12)
+		emit_sib(st, SIB_SCALE_1, dreg, dreg);
+
+	emit_imm(st, ofs, imsz);
+
+	if (opx == 0)
+		emit_imm(st, imm, bpf_size(opsz));
+}
+
+static void
+emit_st_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm,
+	int32_t ofs)
+{
+	emit_st_common(st, op, 0, dreg, imm, ofs);
+}
+
+static void
+emit_st_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	emit_st_common(st, op, sreg, dreg, 0, ofs);
+}
+
+/*
+ * emit lock add %<sreg>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_xadd(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	uint32_t imsz, mods;
+
+	const uint8_t lck = 0xF0; /* lock prefix */
+	const uint8_t ops = 0x01; /* add opcode */
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_bytes(st, &lck, sizeof(lck));
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, mods, sreg, dreg);
+	emit_imm(st, ofs, imsz);
+}
+
+/*
+ * emit:
+ *    mov <imm64>, %rax
+ *    call *%rax
+ */
+static void
+emit_call(struct bpf_jit_state *st, uintptr_t trg)
+{
+	const uint8_t ops = 0xFF;
+	const uint8_t mods = 2;
+
+	emit_ld_imm64(st, RAX, trg, trg >> 32);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RAX);
+}
+
+/*
+ * emit jmp <ofs>
+ */
+static void
+emit_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	int32_t joff;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0xEB;
+	const uint8_t op32 = 0xE9;
+
+	const int32_t sz8 = sizeof(op8) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32) + sizeof(uint32_t);
+
+	/* max possible jmp instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = st->off[st->idx + ofs] - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8, sizeof(op8));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, &op32, sizeof(op32));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit one of:
+ *    cmovz %<sreg>, <%dreg>
+ *    cmovne %<sreg>, <%dreg>
+ *    cmova %<sreg>, <%dreg>
+ *    cmovb %<sreg>, <%dreg>
+ *    cmovae %<sreg>, <%dreg>
+ *    cmovbe %<sreg>, <%dreg>
+ *    cmovg %<sreg>, <%dreg>
+ *    cmovl %<sreg>, <%dreg>
+ *    cmovge %<sreg>, <%dreg>
+ *    cmovle %<sreg>, <%dreg>
+ */
+static void
+emit_movcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x44},  /* CMOVZ */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x45},  /* CMOVNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x47},  /* CMOVA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x42},  /* CMOVB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x43},  /* CMOVAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x46},  /* CMOVBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x4F}, /* CMOVG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x4C}, /* CMOVL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x4D}, /* CMOVGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x4E}, /* CMOVLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x45}, /* CMOVNE */
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ */
+static void
+emit_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	uint32_t bop, imsz;
+	int32_t joff;
+
+	static const uint8_t op8[] = {
+		[GET_BPF_OP(BPF_JEQ)] = 0x74,  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = 0x75,  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = 0x77,  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = 0x72,  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = 0x73,  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = 0x76,  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = 0x7F, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = 0x7C, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = 0x7D, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = 0x7E, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = 0x75, /* JNE */
+	};
+
+	static const uint8_t op32[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x84},  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x85},  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x87},  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x82},  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x83},  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x86},  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x8F}, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x8C}, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x8D}, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x8E}, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x85}, /* JNE */
+	};
+
+	const int32_t sz8 = sizeof(op8[0]) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32[0]) + sizeof(uint32_t);
+
+	/* max possible jcc instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = st->off[st->idx + ofs] - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	bop = GET_BPF_OP(op);
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8[bop], sizeof(op8[bop]));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, op32[bop], sizeof(op32[bop]));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit cmp <imm>, %<dreg>
+ */
+static void
+emit_cmp_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t ops;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	const uint8_t mods = 7;
+
+	imsz = imm_size(imm);
+	ops = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit test <imm>, %<dreg>
+ */
+static void
+emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 0;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+static void
+emit_jcc_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_imm(st, BPF_ALU64, dreg, imm);
+	else
+		emit_cmp_imm(st, BPF_ALU64, dreg, imm);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * emit test %<sreg>, %<dreg>
+ */
+static void
+emit_tst_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x85;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit cmp %<sreg>, %<dreg>
+ */
+static void
+emit_cmp_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x39;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_jcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_reg(st, BPF_ALU64, sreg, dreg);
+	else
+		emit_cmp_reg(st, BPF_ALU64, sreg, dreg);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some register spilling is necessary.
+ * emit:
+ *   mov %rax, %r11
+ *   mov %rdx, %r10
+ *   mov %<dreg>, %rax
+ *   xor %rdx, %rdx
+ * for divisor as immediate value:
+ *   mov <imm>, %r9
+ * div %<divisor_reg>
+ * for BPF_DIV:
+ *   mov %rax, %<dreg>
+ * for BPF_MOD:
+ *   mov %rdx, %<dreg>
+ * then restore:
+ *   mov %r11, %rax
+ *   mov %r10, %rdx
+ */
+static void
+emit_div(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	uint32_t sr;
+
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 6;
+
+	if (BPF_SRC(op) == BPF_X) {
+		emit_tst_reg(st, BPF_CLASS(op), sreg, sreg);
+		emit_movcc_reg(st, BPF_CLASS(op) | BPF_JEQ | BPF_X, sreg, RAX);
+		emit_jcc(st, BPF_JMP | BPF_JEQ | BPF_K, st->exit.off);
+	}
+
+	/* save rax & rdx */
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* fill rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+	emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, 0);
+
+	if (BPF_SRC(op) == BPF_X) {
+		sr = sreg;
+		if (sr == RAX)
+			sr = REG_TMP0;
+		else if (sr == RDX)
+			sr = REG_TMP1;
+	} else {
+		sr = REG_DIV_IMM;
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, sr, imm);
+	}
+
+	emit_rex(st, op, 0, sr);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, sr);
+
+	if (BPF_OP(op) == BPF_DIV)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+	else
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, dreg);
+
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+}
+
+static void
+emit_prolog(struct bpf_jit_state *st, int32_t stack_size)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	/* we can avoid touching the stack at all */
+	if (spil == 0)
+		return;
+
+	emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP,
+		spil * sizeof(uint64_t));
+
+	ofs = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++) {
+		if (INUSE(st->reguse, save_regs[i]) != 0) {
+			emit_st_reg(st, BPF_STX | BPF_MEM | BPF_DW,
+				save_regs[i], RSP, ofs);
+			ofs += sizeof(uint64_t);
+		}
+	}
+
+	if (INUSE(st->reguse, RBP) != 0) {
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RSP, RBP);
+		emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP, stack_size);
+	}
+}
+
+/*
+ * emit ret
+ */
+static void
+emit_ret(struct bpf_jit_state *st)
+{
+	const uint8_t ops = 0xC3;
+
+	emit_bytes(st, &ops, sizeof(ops));
+}
+
+static void
+emit_epilog(struct bpf_jit_state *st)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	/* if we already have an epilog, generate a jump to it */
+	if (st->exit.num++ != 0) {
+		emit_jcc(st, BPF_JMP | BPF_JA | BPF_K, st->exit.off);
+		return;
+	}
+
+	/* store offset of epilog block */
+	st->exit.off = st->sz;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	if (spil != 0) {
+
+		if (INUSE(st->reguse, RBP) != 0)
+			emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RBP, RSP);
+
+		ofs = 0;
+		for (i = 0; i != RTE_DIM(save_regs); i++) {
+			if (INUSE(st->reguse, save_regs[i]) != 0) {
+				emit_ld_reg(st, BPF_LDX | BPF_MEM | BPF_DW,
+					RSP, save_regs[i], ofs);
+				ofs += sizeof(uint64_t);
+			}
+		}
+
+		emit_alu_imm(st, BPF_ALU64 | BPF_ADD | BPF_K, RSP,
+			spil * sizeof(uint64_t));
+	}
+
+	emit_ret(st);
+}
+
+/*
+ * walk through the BPF code and translate it into x86_64 code.
+ */
+static int
+emit(struct bpf_jit_state *st, const struct rte_bpf *bpf)
+{
+	uint32_t i, dr, op, sr;
+	const struct bpf_insn *ins;
+
+	/* reset state fields */
+	st->sz = 0;
+	st->exit.num = 0;
+
+	emit_prolog(st, bpf->stack_sz);
+
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		st->idx = i;
+		st->off[i] = st->sz;
+
+		ins = bpf->prm.ins + i;
+
+		dr = ebpf2x86[ins->dst_reg];
+		sr = ebpf2x86[ins->src_reg];
+		op = ins->code;
+
+		switch (op) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+		case (BPF_ALU | BPF_SUB | BPF_K):
+		case (BPF_ALU | BPF_AND | BPF_K):
+		case (BPF_ALU | BPF_OR | BPF_K):
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+		case (BPF_ALU | BPF_SUB | BPF_X):
+		case (BPF_ALU | BPF_AND | BPF_X):
+		case (BPF_ALU | BPF_OR | BPF_X):
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			emit_be2le(st, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			emit_le2be(st, dr, ins->imm);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		/* multiply instructions */
+		case (BPF_ALU | BPF_MUL | BPF_K):
+		case (BPF_ALU | BPF_MUL | BPF_X):
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			emit_mul(st, op, sr, dr, ins->imm);
+			break;
+		/* divide instructions */
+		case (BPF_ALU | BPF_DIV | BPF_K):
+		case (BPF_ALU | BPF_MOD | BPF_K):
+		case (BPF_ALU | BPF_DIV | BPF_X):
+		case (BPF_ALU | BPF_MOD | BPF_X):
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			emit_div(st, op, sr, dr, ins->imm);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+		case (BPF_LDX | BPF_MEM | BPF_H):
+		case (BPF_LDX | BPF_MEM | BPF_W):
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			emit_ld_reg(st, op, sr, dr, ins->off);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			emit_ld_imm64(st, dr, ins[0].imm, ins[1].imm);
+			i++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+		case (BPF_STX | BPF_MEM | BPF_H):
+		case (BPF_STX | BPF_MEM | BPF_W):
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			emit_st_reg(st, op, sr, dr, ins->off);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+		case (BPF_ST | BPF_MEM | BPF_H):
+		case (BPF_ST | BPF_MEM | BPF_W):
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			emit_st_imm(st, op, dr, ins->imm, ins->off);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			emit_st_xadd(st, op, sr, dr, ins->off);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			emit_jmp(st, ins->off + 1);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			emit_jcc_imm(st, op, dr, ins->imm, ins->off + 1);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			emit_jcc_reg(st, op, sr, dr, ins->off + 1);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			emit_call(st, (uintptr_t)bpf->prm.xsym[ins->imm].func);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			emit_epilog(st);
+			break;
+		default:
+			RTE_LOG(ERR, USER1,
+				"%s(%p): invalid opcode %#x at pc: %u;\n",
+				__func__, bpf, ins->code, i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * produce a native ISA version of the given BPF code.
+ */
+int
+bpf_jit_x86(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	uint32_t i;
+	size_t sz;
+	struct bpf_jit_state st;
+
+	/* init state */
+	memset(&st, 0, sizeof(st));
+	st.off = malloc(bpf->prm.nb_ins * sizeof(st.off[0]));
+	if (st.off == NULL)
+		return -ENOMEM;
+
+	/* fill with fake offsets */
+	st.exit.off = INT32_MAX;
+	for (i = 0; i != bpf->prm.nb_ins; i++)
+		st.off[i] = INT32_MAX;
+
+	/*
+	 * dry runs, used to calculate total code size and valid jump offsets.
+	 * stop once the emitted size stops changing (minimal possible size)
+	 */
+	do {
+		sz = st.sz;
+		rc = emit(&st, bpf);
+	} while (rc == 0 && sz != st.sz);
+
+	if (rc == 0) {
+
+		/* allocate memory needed */
+		st.ins = mmap(NULL, st.sz, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (st.ins == MAP_FAILED)
+			rc = -ENOMEM;
+		else
+			/* generate code */
+			rc = emit(&st, bpf);
+	}
+
+	if (rc == 0 && mprotect(st.ins, st.sz, PROT_READ | PROT_EXEC) != 0)
+		rc = -ENOMEM;
+
+	if (rc != 0) {
+		/* don't munmap() a region that was never successfully mapped */
+		if (st.ins != MAP_FAILED && st.ins != NULL)
+			munmap(st.ins, st.sz);
+	} else {
+		bpf->jit.func = (void *)st.ins;
+		bpf->jit.sz = st.sz;
+	}
+
+	free(st.off);
+	return rc;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread
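
A side note on the last function above: bpf_jit_x86() follows a fairly
standard JIT pattern: dry-run emit() passes are repeated until the
generated size stops changing (so all jump offsets are final), then the
code is emitted into an anonymous read/write mapping and the pages are
flipped to read+execute with mprotect() before the function pointer is
exposed. A tiny standalone illustration of that mmap/emit/mprotect/call
flow (x86_64 Linux assumed; this is only a toy, not the DPDK code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
	/* hand-written code for: mov eax, 42 ; ret */
	static const uint8_t code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
	uint64_t (*fn)(void);
	void *p;

	p = mmap(NULL, sizeof(code), PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memcpy(p, code, sizeof(code));

	/* never keep the buffer writable and executable at the same time */
	if (mprotect(p, sizeof(code), PROT_READ | PROT_EXEC) != 0)
		return 1;

	fn = (uint64_t (*)(void))p;
	printf("jit result: %lu\n", (unsigned long)fn());

	munmap(p, sizeof(code));
	return 0;
}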

* [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 2/5] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
@ 2018-03-09 16:42 ` Konstantin Ananyev
  2018-03-13 13:39   ` Jerin Jacob
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 4/5] testpmd: new commands to load/unload " Konstantin Ananyev
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce an API to install BPF based filters on the ethdev RX/TX path.
The current implementation is a pure SW one, based on the ethdev RX/TX
callback mechanism.
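
To give a feel for the API from an application's point of view, a minimal
caller could look like the sketch below (hypothetical code, mirroring what
the testpmd command from patch 4/5 does; the object file name "t1.o" is
just an example and no extra xsym table is set up, so the program may not
reference external symbols):

#include <string.h>
#include <rte_bpf_ethdev.h>

static int
attach_rx_filter(uint16_t port, uint16_t queue)
{
	struct rte_bpf_prm prm;

	memset(&prm, 0, sizeof(prm));
	/* program treats its argument as a pointer to packet data */
	prm.prog_type = RTE_BPF_PROG_TYPE_UNSPEC;

	/* load section ".text" from t1.o and JIT it */
	return rte_bpf_eth_rx_elf_load(port, queue, &prm,
		"t1.o", ".text", RTE_BPF_ETH_F_JIT);
}

A later rte_bpf_eth_rx_unload(port, queue) removes the callback and
destroys the loaded program.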

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile            |   2 +
 lib/librte_bpf/bpf_pkt.c           | 524 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/rte_bpf_ethdev.h    |  50 ++++
 lib/librte_bpf/rte_bpf_version.map |   4 +
 4 files changed, 580 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 44b12c439..501c49c60 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -22,6 +22,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_pkt.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
@@ -29,5 +30,6 @@ endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf_ethdev.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf_pkt.c b/lib/librte_bpf/bpf_pkt.c
new file mode 100644
index 000000000..b0177ad82
--- /dev/null
+++ b/lib/librte_bpf/bpf_pkt.c
@@ -0,0 +1,524 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include <rte_bpf_ethdev.h>
+
+/*
+ * information about all installed BPF rx/tx callbacks
+ */
+
+struct bpf_eth_cbi {
+	uint32_t use;    /* usage counter */
+	void *cb;        /* callback handle */
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+} __rte_cache_aligned;
+
+/*
+ * Odd number means that callback is used by datapath.
+ * Even number means that callback is not used by datapath.
+ */
+#define BPF_ETH_CBI_INUSE  1
+
+static struct bpf_eth_cbi rx_cbi[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+static struct bpf_eth_cbi tx_cbi[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+
+/*
+ * Marks given callback as used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
+{
+	cbi->use++;
+	/* make sure no store/load reordering could happen */
+	rte_smp_mb();
+}
+
+/*
+ * Marks given callback list as not used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
+{
+	/* make sure all previous loads are completed */
+	rte_smp_rmb();
+	cbi->use++;
+}
+
+/*
+ * Waits till datapath finished using given callback.
+ */
+static void
+bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
+{
+	uint32_t nuse, puse;
+
+	/* make sure all previous loads and stores are completed */
+	rte_smp_mb();
+
+	puse = cbi->use;
+
+	/* in use, busy wait till current RX/TX iteration is finished */
+	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
+		do {
+			rte_pause();
+			rte_compiler_barrier();
+			nuse = cbi->use;
+		} while (nuse == puse);
+	}
+}
+
+static void
+bpf_eth_cbi_cleanup(struct bpf_eth_cbi *bc)
+{
+	bc->bpf = NULL;
+	memset(&bc->jit, 0, sizeof(bc->jit));
+}
+
+/*
+ * BPF packet processing routines.
+ */
+
+static inline uint32_t
+apply_filter(struct rte_mbuf *mb[], const uint64_t rc[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i, j, k;
+	struct rte_mbuf *dr[num];
+
+	for (i = 0, j = 0, k = 0; i != num; i++) {
+
+		/* filter matches */
+		if (rc[i] != 0)
+			mb[j++] = mb[i];
+		/* no match */
+		else
+			dr[k++] = mb[i];
+	}
+
+	if (drop != 0) {
+		/* free filtered out mbufs */
+		for (i = 0; i != k; i++)
+			rte_pktmbuf_free(dr[i]);
+	} else {
+		/* copy filtered out mbufs beyond good ones */
+		for (i = 0; i != k; i++)
+			mb[j + i] = dr[i];
+	}
+
+	return j;
+}
+
+static inline uint32_t
+pkt_filter_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i;
+	void *dp[num];
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		dp[i] = rte_pktmbuf_mtod(mb[i], void *);
+
+	rte_bpf_exec_burst(bpf, dp, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i;
+	void *dp;
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++) {
+		dp = rte_pktmbuf_mtod(mb[i], void *);
+		rc[i] = (jit->func(dp) != 0);
+	}
+
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint64_t rc[num];
+
+	rte_bpf_exec_burst(bpf, (void **)mb, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i;
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		rc[i] = (jit->func(mb[i]) != 0);
+
+	return apply_filter(mb, rc, num, drop);
+}
+
+/*
+ * RX/TX callbacks for raw data bpf.
+ */
+
+static uint16_t
+bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+/*
+ * RX/TX callbacks for mbuf.
+ */
+
+static uint16_t
+bpf_rx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static rte_rx_callback_fn
+select_rx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_rx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_rx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_rx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_rx_callback_mb_vm;
+
+	return NULL;
+}
+
+static rte_tx_callback_fn
+select_tx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_tx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_tx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_tx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_tx_callback_mb_vm;
+
+	return NULL;
+}
+
+/*
+ * helper function to perform BPF unload for given port/queue.
+ * we have to introduce extra complexity (and slowdown) here,
+ * as right now there is no safe generic way to remove an RX/TX callback
+ * while IO is active.
+ * we also don't free the memory allocated for the callback handle itself,
+ * as right now there is no safe way to do that without stopping RX/TX
+ * on the given port/queue first.
+ */
+static void
+bpf_eth_unload(struct bpf_eth_cbi *bc)
+{
+	/* mark this cbi as empty */
+	bc->cb = NULL;
+	rte_smp_mb();
+
+	/* make sure datapath doesn't use bpf anymore, then destroy bpf */
+	bpf_eth_cbi_wait(bc);
+	rte_bpf_destroy(bc->bpf);
+	bpf_eth_cbi_cleanup(bc);
+}
+
+__rte_experimental void
+rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+	void *cb;
+
+	bc = &rx_cbi[port][queue];
+	cb = bc->cb;
+
+	if (cb == NULL)
+		return;
+
+	rte_eth_remove_rx_callback(port, queue, cb);
+	bpf_eth_unload(bc);
+}
+
+__rte_experimental void
+rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+	void *cb;
+
+	bc = &tx_cbi[port][queue];
+	cb = bc->cb;
+
+	if (cb == NULL)
+		return;
+
+	rte_eth_remove_tx_callback(port, queue, cb);
+	bpf_eth_unload(bc);
+}
+
+__rte_experimental int
+rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_rx_callback_fn fn;
+
+	if (prm == NULL)
+		return -EINVAL;
+
+	/* remove old one, if any */
+	rte_bpf_eth_rx_unload(port, queue);
+
+	fn = select_rx_callback(prm->prog_type, flags);
+	if (fn == NULL) {
+		RTE_LOG(ERR, USER1, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	/* update global callback info */
+	bc = &rx_cbi[port][queue];
+	bc->bpf = bpf;
+	rte_bpf_get_jit(bpf, &bc->jit);
+
+	rc = 0;
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && bc->jit.func == NULL) {
+		RTE_LOG(ERR, USER1, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rc = -EINVAL;
+	} else {
+		bc->cb = rte_eth_add_rx_callback(port, queue, fn, bc);
+		if (bc->cb == NULL)
+			rc = -rte_errno;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	}
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_tx_callback_fn fn;
+
+	if (prm == NULL)
+		return -EINVAL;
+
+	/* remove old one, if any */
+	rte_bpf_eth_tx_unload(port, queue);
+
+	fn = select_tx_callback(prm->prog_type, flags);
+	if (fn == NULL) {
+		RTE_LOG(ERR, USER1, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	/* update global callback info */
+	bc = &tx_cbi[port][queue];
+	bc->bpf = bpf;
+	rte_bpf_get_jit(bpf, &bc->jit);
+
+	rc = 0;
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && bc->jit.func == NULL) {
+		RTE_LOG(ERR, USER1, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rc = -EINVAL;
+	} else {
+		bc->cb = rte_eth_add_tx_callback(port, queue, fn, bc);
+		if (bc->cb == NULL)
+			rc = -rte_errno;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	}
+
+	return rc;
+}
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
new file mode 100644
index 000000000..abc3b8e5f
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_ETHDEV_H_
+#define _RTE_BPF_ETHDEV_H_
+
+#include <rte_bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+	RTE_BPF_ETH_F_NONE = 0,
+	RTE_BPF_ETH_F_JIT  = 0x1, /**< compile BPF into native ISA */
+};
+
+/*
+ * API to install BPF filters as RX/TX callbacks for eth devices.
+ * Note that right now:
+ * - it is not MT safe, i.e. it is not allowed to do load/unload for the
+ *   same port/queue from different threads in parallel.
+ * - though it does allow load/unload at runtime
+ *   (while RX/TX is ongoing on given port/queue).
+ * - it allows only one BPF program per port/queue,
+ *   i.e. a new load will replace the BPF program previously loaded for
+ *   that port/queue.
+ * Filter behaviour - if the BPF program returns a zero value for a given
+ * packet, then that packet is filtered out:
+ *   on RX - it will be dropped inside the callback and no further
+ *   processing for that packet will happen.
+ *   on TX - the packet will remain unsent, and it is the responsibility of
+ *   the user to handle such a situation (drop, try to send again, etc.).
+ */
+
+void rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue);
+void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
+
+int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_ETHDEV_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ff65144df..a203e088e 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -3,6 +3,10 @@ EXPERIMENTAL {
 
 	rte_bpf_destroy;
 	rte_bpf_elf_load;
+	rte_bpf_eth_rx_elf_load;
+	rte_bpf_eth_rx_unload;
+	rte_bpf_eth_tx_elf_load;
+	rte_bpf_eth_tx_unload;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread
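
One detail in the patch above that is easy to miss is how unload
synchronizes with the datapath without taking any lock: the per-queue
bpf_eth_cbi counter is bumped by the datapath around every burst (odd
means "inside a burst"), and the control path first clears the callback
pointer, then waits for any in-flight burst to finish before destroying
the BPF program. A stripped-down sketch of that handshake (plain volatile
instead of the rte_smp_*mb() barriers, so it only illustrates the protocol
and is not production-safe):

#include <stdint.h>
#include <stddef.h>

struct cbi {
	volatile uint32_t use;  /* odd while a burst is being processed */
	void *cb;               /* callback handle, NULL means "unloaded" */
};

/* datapath, once per RX/TX burst */
static void
datapath_burst(struct cbi *c)
{
	c->use++;                       /* odd: burst in progress */
	if (c->cb != NULL) {
		/* run the BPF filter over the burst here */
	}
	c->use++;                       /* even: burst finished */
}

/* control path, during unload */
static void
control_unload(struct cbi *c)
{
	uint32_t puse;

	c->cb = NULL;                   /* new bursts will skip the filter */
	puse = c->use;
	if (puse & 1) {
		/* a burst may still be using the old bpf pointer */
		while (c->use == puse)
			;               /* busy wait till it completes */
	}
	/* from here on it is safe to destroy the BPF program */
}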

* [dpdk-dev] [PATCH v1 4/5] testpmd: new commands to load/unload BPF filters
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (2 preceding siblings ...)
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-03-09 16:42 ` Konstantin Ananyev
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 5/5] test: add few eBPF samples Konstantin Ananyev
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce new testpmd commands to load/unload RX/TX BPF-based filters.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/bpf_sup.h |  25 +++++++++
 app/test-pmd/cmdline.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 171 insertions(+)
 create mode 100644 app/test-pmd/bpf_sup.h

diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
new file mode 100644
index 000000000..35f91a07f
--- /dev/null
+++ b/app/test-pmd/bpf_sup.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _BPF_SUP_H_
+#define _BPF_SUP_H_
+
+#include <stdio.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_ethdev.h>
+
+static const struct rte_bpf_xsym bpf_xsym[] = {
+	{
+		.name = RTE_STR(stdout),
+		.type = RTE_BPF_XTYPE_VAR,
+		.var = &stdout,
+	},
+	{
+		.name = RTE_STR(rte_pktmbuf_dump),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)rte_pktmbuf_dump,
+	},
+};
+
+#endif /* _BPF_SUP_H_ */
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index d1dc1de6c..56b680e0e 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include "testpmd.h"
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
+#include "bpf_sup.h"
 
 static struct cmdline *testpmd_cl;
 
@@ -16030,6 +16031,149 @@ cmdline_parse_inst_t cmd_load_from_file = {
 	},
 };
 
+/* *** load BPF program *** */
+struct cmd_bpf_ld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+	cmdline_fixed_string_t op;
+	cmdline_fixed_string_t flags;
+	cmdline_fixed_string_t prm;
+};
+
+static void
+bpf_parse_flags(const char *str, enum rte_bpf_prog_type *ptype, uint32_t *flags)
+{
+	uint32_t i, v;
+
+	*flags = RTE_BPF_ETH_F_NONE;
+	*ptype = RTE_BPF_PROG_TYPE_UNSPEC;
+
+	for (i = 0; str[i] != 0; i++) {
+		v = toupper(str[i]);
+		if (v == 'J')
+			*flags |= RTE_BPF_ETH_F_JIT;
+		else if (v == 'M')
+			*ptype = RTE_BPF_PROG_TYPE_MBUF;
+		else if (v == '-')
+			continue;
+		else
+			printf("unknown flag: \'%c\'", v);
+	}
+}
+
+static void cmd_operate_bpf_ld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	int32_t rc;
+	uint32_t flags;
+	struct cmd_bpf_ld_result *res;
+	struct rte_bpf_prm prm;
+	const char *fname, *sname;
+
+	res = parsed_result;
+	memset(&prm, 0, sizeof(prm));
+	prm.xsym = bpf_xsym;
+	prm.nb_xsym = RTE_DIM(bpf_xsym);
+
+	bpf_parse_flags(res->flags, &prm.prog_type, &flags);
+	fname = res->prm;
+	sname = ".text";
+
+	if (strcmp(res->dir, "rx") == 0) {
+		rc = rte_bpf_eth_rx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else if (strcmp(res->dir, "tx") == 0) {
+		rc = rte_bpf_eth_tx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_load_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			bpf, "bpf-load");
+cmdline_parse_token_string_t cmd_load_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_load_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_load_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, queue, UINT16);
+cmdline_parse_token_string_t cmd_load_bpf_flags =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			flags, NULL);
+cmdline_parse_token_string_t cmd_load_bpf_prm =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			prm, NULL);
+
+cmdline_parse_inst_t cmd_operate_bpf_ld_parse = {
+	.f = cmd_operate_bpf_ld_parsed,
+	.data = NULL,
+	.help_str = "bpf-load rx|tx <port> <queue> <-|J|M> <file_name>",
+	.tokens = {
+		(void *)&cmd_load_bpf_start,
+		(void *)&cmd_load_bpf_dir,
+		(void *)&cmd_load_bpf_port,
+		(void *)&cmd_load_bpf_queue,
+		(void *)&cmd_load_bpf_flags,
+		(void *)&cmd_load_bpf_prm,
+		NULL,
+	},
+};
+
+/* *** unload BPF program *** */
+struct cmd_bpf_unld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+};
+
+static void cmd_operate_bpf_unld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	struct cmd_bpf_unld_result *res;
+
+	res = parsed_result;
+
+	if (strcmp(res->dir, "rx") == 0)
+		rte_bpf_eth_rx_unload(res->port, res->queue);
+	else if (strcmp(res->dir, "tx") == 0)
+		rte_bpf_eth_tx_unload(res->port, res->queue);
+	else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_unload_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			bpf, "bpf-unload");
+cmdline_parse_token_string_t cmd_unload_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_unload_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_unload_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, queue, UINT16);
+
+cmdline_parse_inst_t cmd_operate_bpf_unld_parse = {
+	.f = cmd_operate_bpf_unld_parsed,
+	.data = NULL,
+	.help_str = "bpf-unload rx|tx <port> <queue>",
+	.tokens = {
+		(void *)&cmd_unload_bpf_start,
+		(void *)&cmd_unload_bpf_dir,
+		(void *)&cmd_unload_bpf_port,
+		(void *)&cmd_unload_bpf_queue,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -16272,6 +16416,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_del_port_tm_node,
 	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
 	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
 	NULL,
 };
 
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread
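
The bpf_xsym[] table in bpf_sup.h above is the host-side half of the
external-symbol mechanism: any name listed there (here "stdout" and
"rte_pktmbuf_dump") can be referenced by the loaded program and is
resolved at load time. On the BPF side that might look roughly like the
following (a hypothetical program in the spirit of test/bpf/t3.c from
patch 5/5, not the actual file; it expects an mbuf pointer, so it would be
loaded with the 'M' flag):

#include <stdint.h>
#include <stdio.h>

#include "mbuf.h"

extern void rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m,
	unsigned int dump_len);

uint64_t
entry(void *pkt)
{
	const struct rte_mbuf *mb = pkt;

	/* dump the first 64 bytes of every packet, always let it pass */
	rte_pktmbuf_dump(stdout, mb, 64);
	return 1;
}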

* [dpdk-dev] [PATCH v1 5/5] test: add few eBPF samples
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (3 preceding siblings ...)
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 4/5] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-03-09 16:42 ` Konstantin Ananyev
  2018-03-13 13:02 ` [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Jerin Jacob
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-09 16:42 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add a few simple eBPF programs as examples.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/bpf/dummy.c |  20 ++
 test/bpf/mbuf.h  | 556 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/bpf/t1.c    |  53 ++++++
 test/bpf/t2.c    |  30 +++
 test/bpf/t3.c    |  36 ++++
 5 files changed, 695 insertions(+)
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

diff --git a/test/bpf/dummy.c b/test/bpf/dummy.c
new file mode 100644
index 000000000..5851469e7
--- /dev/null
+++ b/test/bpf/dummy.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * does nothing, always returns success.
+ * used to measure BPF infrastructure overhead.
+ * To compile:
+ * clang -O2 -target bpf -c dummy.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+uint64_t
+entry(void *arg)
+{
+	return 1;
+}
diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
new file mode 100644
index 000000000..aeef6339d
--- /dev/null
+++ b/test/bpf/mbuf.h
@@ -0,0 +1,556 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation.
+ * Copyright 2014 6WIND S.A.
+ */
+
+/*
+ * Snippet from dpdk.org rte_mbuf.h,
+ * used to provide BPF programs with information about the rte_mbuf layout.
+ */
+
+#ifndef _MBUF_H_
+#define _MBUF_H_
+
+#include <stdint.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Packet Offload Features Flags. It also carry packet type information.
+ * Critical resources. Both rx/tx shared these bits. Be cautious on any change
+ *
+ * - RX flags start at bit position zero, and get added to the left of previous
+ *   flags.
+ * - The most-significant 3 bits are reserved for generic mbuf flags
+ * - TX flags therefore start at bit position 60 (i.e. 63-3), and new flags get
+ *   added to the right of the previously defined flags i.e. they should count
+ *   downwards, not upwards.
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
+ */
+
+/**
+ * RX packet is a 802.1q VLAN packet. This flag was set by PMDs when
+ * the packet is recognized as a VLAN, but the behavior between PMDs
+ * was not the same. This flag is kept for some time to avoid breaking
+ * applications and should be replaced by PKT_RX_VLAN_STRIPPED.
+ */
+#define PKT_RX_VLAN_PKT      (1ULL << 0)
+
+#define PKT_RX_RSS_HASH      (1ULL << 1)  /**< RX packet with RSS hash result. */
+#define PKT_RX_FDIR          (1ULL << 2)  /**< RX packet with FDIR match indicate. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)  /**< External IP header checksum error. */
+
+/**
+ * A vlan has been stripped by the hardware and its tci is saved in
+ * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
+ * in the RX configuration of the PMD.
+ */
+#define PKT_RX_VLAN_STRIPPED (1ULL << 6)
+
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_IEEE1588_PTP  (1ULL << 9)  /**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 10) /**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID       (1ULL << 13) /**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX      (1ULL << 14) /**< Flexible bytes reported if FDIR match. */
+
+/**
+ * The 2 vlans have been stripped by the hardware and their tci are
+ * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * This can only happen if vlan stripping is enabled in the RX
+ * configuration of the PMD. If this flag is set, PKT_RX_VLAN_STRIPPED
+ * must also be set.
+ */
+#define PKT_RX_QINQ_STRIPPED (1ULL << 15)
+
+/**
+ * Deprecated.
+ * RX packet with double VLAN stripped.
+ * This flag is replaced by PKT_RX_QINQ_STRIPPED.
+ */
+#define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
+
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP     (1ULL << 17)
+
+/* add new RX flags here */
+
+/* add new TX flags here */
+
+/**
+ * Offload the MACsec. This flag must be set by the application to enable
+ * this offload feature for a packet to be transmitted.
+ */
+#define PKT_TX_MACSEC        (1ULL << 44)
+
+/**
+ * Bits 45:48 used for the tunnel type.
+ * When doing Tx offload like TSO or checksum, the HW needs to configure the
+ * tunnel type into the HW descriptors.
+ */
+#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
+#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
+#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
+#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
+/**< TX packet with MPLS-in-UDP RFC 7510 header. */
+#define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/* add new TX TUNNEL type here */
+#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
+
+/**
+ * Second VLAN insertion (QinQ) flag.
+ */
+#define PKT_TX_QINQ_PKT    (1ULL << 49)   /**< TX packet with double VLAN inserted. */
+
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len in account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 50)
+
+#define PKT_TX_IEEE1588_TMST (1ULL << 51) /**< TX IEEE1588 packet to timestamp. */
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
+ */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52) /**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_TCP_CKSUM     (1ULL << 52) /**< TCP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_SCTP_CKSUM    (2ULL << 52) /**< SCTP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_UDP_CKSUM     (3ULL << 52) /**< UDP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_L4_MASK       (3ULL << 52) /**< Mask for L4 cksum offload request. */
+
+/**
+ * Offload the IP checksum in the hardware. The flag PKT_TX_IPV4 should
+ * also be set by the application, although a PMD will only check
+ * PKT_TX_IP_CKSUM.
+ *  - set the IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: l2_len, l3_len
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
+
+/**
+ * Packet is IPv4. This flag must be set when using any offload feature
+ * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV4          (1ULL << 55)
+
+/**
+ * Packet is IPv6. This flag must be set when using an offload feature
+ * (TSO or L4 checksum) to tell the NIC that the packet is an IPv6
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV6          (1ULL << 56)
+
+#define PKT_TX_VLAN_PKT      (1ULL << 57) /**< TX packet is a 802.1q VLAN packet. */
+
+/**
+ * Offload the IP checksum of an external header in the hardware. The
+ * flag PKT_TX_OUTER_IPV4 should also be set by the application, although
+ * a PMD will only check PKT_TX_IP_CKSUM.  The IP checksum field in the
+ * packet must be set to 0.
+ *  - set the outer IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: outer_l2_len, outer_l3_len
+ */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/**
+ * Packet outer header is IPv4. This flag must be set when using any
+ * outer offload feature (L3 or L4 checksum) to tell the NIC that the
+ * outer header of the tunneled packet is an IPv4 packet.
+ */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/**
+ * Packet outer header is IPv6. This flag must be set when using any
+ * outer offload feature (L4 checksum) to tell the NIC that the outer
+ * header of the tunneled packet is an IPv6 packet.
+ */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
+/**
+ * Bitmask of all supported packet Tx offload features flags,
+ * which can be set for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_IEEE1588_TMST |	 \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK |	 \
+		PKT_TX_MACSEC)
+
+#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
+
+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
+/* Use final bit of flags to indicate a control mbuf */
+#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
+
+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of RX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the RX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of TX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the TX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Some NICs need at least 2KB buffer to RX standard Ethernet frame without
+ * splitting it into multiple segments.
+ * So, for mbufs that planned to be involved into RX/TX, the recommended
+ * minimal buffer length is 2KB + RTE_PKTMBUF_HEADROOM.
+ */
+#define	RTE_MBUF_DEFAULT_DATAROOM	2048
+#define	RTE_MBUF_DEFAULT_BUF_SIZE	\
+	(RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM)
+
+/* define a set of marker types that can be used to refer to set points in the
+ * mbuf */
+__extension__
+typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
+__extension__
+typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
+__extension__
+typedef uint64_t MARKER64[0]; /**< marker that allows us to overwrite 8 bytes
+                               * with a single assignment */
+
+typedef struct {
+        volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * The generic rte_mbuf, containing a packet mbuf.
+ */
+struct rte_mbuf {
+	MARKER cacheline0;
+
+	void *buf_addr;           /**< Virtual address of segment buffer. */
+	/**
+	 * Physical address of segment buffer.
+	 * Force alignment to 8-bytes, so as to ensure we have the exact
+	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * working on vector drivers easier.
+	 */
+	phys_addr_t buf_physaddr __rte_aligned(sizeof(phys_addr_t));
+
+	/* next 8 bytes are initialised on RX descriptor rearm */
+	MARKER64 rearm_data;
+	uint16_t data_off;
+
+	/**
+	 * Reference counter. Its size should at least equal to the size
+	 * of port field (16 bits), to support zero-copy broadcast.
+	 * It should only be accessed using the following functions:
+	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
+	 * config option.
+	 */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;              /**< Non-atomically accessed refcnt */
+	};
+	uint16_t nb_segs;         /**< Number of segments. */
+
+	/** Input port (16 bits to support more than 256 virtual ports). */
+	uint16_t port;
+
+	uint64_t ol_flags;        /**< Offload features. */
+
+	/* remaining bytes are set on RX when pulling packet from descriptor */
+	MARKER rx_descriptor_fields1;
+
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types. The packet_type is about data really present in the
+	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+	 * vlan is stripped from the data.
+	 */
+	RTE_STD_C11
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+	uint16_t vlan_tci;
+
+	union {
+		uint32_t rss;     /**< RSS hash result if RSS enabled */
+		struct {
+			RTE_STD_C11
+			union {
+				struct {
+					uint16_t hash;
+					uint16_t id;
+				};
+				uint32_t lo;
+				/**< Second 4 flexible bytes */
+			};
+			uint32_t hi;
+			/**< First 4 flexible bytes or FD ID, dependent on
+			     PKT_RX_FDIR_* flag in ol_flags. */
+		} fdir;           /**< Filter identifier if FDIR enabled */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;          /**< Hierarchical scheduler */
+		uint32_t usr;	  /**< User defined tags. See rte_distributor_process() */
+	} hash;                   /**< hash information */
+
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
+	uint16_t buf_len;         /**< Length of segment buffer. */
+
+	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+	 * are not normalized but are always the same for a given port.
+	 */
+	uint64_t timestamp;
+
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	RTE_STD_C11
+	union {
+		void *userdata;   /**< Can be used for external metadata */
+		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+	};
+
+	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+
+	/* fields to support TX offloads */
+	RTE_STD_C11
+	union {
+		uint64_t tx_offload;       /**< combined for easy fetch */
+		__extension__
+		struct {
+			uint64_t l2_len:7;
+			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+			 */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
+
+			/* fields for TX offloading of tunnels */
+			uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
+		};
+	};
+
+	/** Size of the application private data. In case of an indirect
+	 * mbuf, it stores the direct mbuf private data size. */
+	uint16_t priv_size;
+
+	/** Timesync flags for use with IEEE1588. */
+	uint16_t timesync;
+
+	/** Sequence number. See also rte_reorder_insert(). */
+	uint32_t seqn;
+
+} __rte_cache_aligned;
+
+
+/**
+ * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
+ */
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
+
+/**
+ * Returns TRUE if given mbuf is direct, or FALSE otherwise.
+ */
+#define RTE_MBUF_DIRECT(mb)     (!RTE_MBUF_INDIRECT(mb))
+
+/**
+ * Private data in case of pktmbuf pool.
+ *
+ * A structure that contains some pktmbuf_pool-specific data that are
+ * appended after the mempool structure (in private data).
+ */
+struct rte_pktmbuf_pool_private {
+	uint16_t mbuf_data_room_size; /**< Size of data space in each mbuf. */
+	uint16_t mbuf_priv_size;      /**< Size of private area in each mbuf. */
+};
+
+/**
+ * A macro that points to an offset into the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param o
+ *   The offset into the mbuf data.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod_offset(m, t, o)	\
+	((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
+
+/**
+ * A macro that points to the start of the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _MBUF_H_ */
diff --git a/test/bpf/t1.c b/test/bpf/t1.c
new file mode 100644
index 000000000..6f4dec743
--- /dev/null
+++ b/test/bpf/t1.c
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to first segment packet data as an input parameter.
+ * analog of tcpdump -s 1 -d 'dst 1.2.3.4 && udp && dst port 5000'
+ * (000) ldh      [12]
+ * (001) jeq      #0x800           jt 2    jf 12
+ * (002) ld       [30]
+ * (003) jeq      #0x1020304       jt 4    jf 12
+ * (004) ldb      [23]
+ * (005) jeq      #0x11            jt 6    jf 12
+ * (006) ldh      [20]
+ * (007) jset     #0x1fff          jt 12   jf 8
+ * (008) ldxb     4*([14]&0xf)
+ * (009) ldh      [x + 16]
+ * (010) jeq      #0x1388          jt 11   jf 12
+ * (011) ret      #1
+ * (012) ret      #0
+ *
+ * To compile:
+ * clang -O2 -DRTE_CACHE_LINE_SIZE=64 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -c t1.c
+ */
+
+#include <stdint.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+
+uint64_t
+entry(void *pkt)
+{
+	struct ether_header *ether_header = (void *)pkt;
+
+	if (ether_header->ether_type != __builtin_bswap16(0x0800))
+		return 0;
+
+	struct iphdr *iphdr = (void *)(ether_header + 1);
+	if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1ffff) != 0 ||
+			iphdr->daddr != __builtin_bswap32(0x1020304))
+		return 0;
+
+	int hlen = iphdr->ihl * 4;
+	struct udphdr *udphdr = (void *)iphdr + hlen;
+
+	if (udphdr->dest !=  __builtin_bswap16(5000))
+		return 0;
+
+	return 1;
+}
diff --git a/test/bpf/t2.c b/test/bpf/t2.c
new file mode 100644
index 000000000..ee2becb26
--- /dev/null
+++ b/test/bpf/t2.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Clean up the mbuf's vlan_tci and all related RX flags
+ * (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED).
+ * Doesn't touch contents of packet data.
+ * To compile:
+ * clang -O2 -DRTE_CACHE_LINE_SIZE=... -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t2.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include "mbuf.h"
+
+uint64_t
+entry(void *pkt)
+{
+	struct rte_mbuf *mb;
+
+	mb = pkt;
+	mb->vlan_tci = 0;
+	mb->ol_flags &= ~(PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+
+	return 1;
+}
diff --git a/test/bpf/t3.c b/test/bpf/t3.c
new file mode 100644
index 000000000..f7e775fd7
--- /dev/null
+++ b/test/bpf/t3.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Dump the mbuf to stdout if it is an ARP packet (aka tcpdump 'arp').
+ * To compile:
+ * clang -O2 -DRTE_CACHE_LINE_SIZE=... -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t3.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <net/ethernet.h>
+#include "mbuf.h"
+
+extern void rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m,
+	unsigned dump_len);
+
+uint64_t
+entry(const void *pkt)
+{
+	const struct rte_mbuf *mb;
+	const struct ether_header *eth;
+
+	mb = pkt;
+	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+
+	if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP))
+		rte_pktmbuf_dump(stdout, mb, 64);
+
+	return 1;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (4 preceding siblings ...)
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 5/5] test: add few eBPF samples Konstantin Ananyev
@ 2018-03-13 13:02 ` Jerin Jacob
  2018-03-13 17:24   ` Ananyev, Konstantin
  2018-03-14 16:43 ` Alejandro Lucero
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-03-13 13:02 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

-----Original Message-----
> Date: Fri, 9 Mar 2018 16:42:00 +0000
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> To: dev@dpdk.org
> CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF
>  code
> X-Mailer: git-send-email 1.7.0.7

Hi Konstantin,

> 
> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
> 
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
>  As an example to:
> - packet filtering/tracing (aka tcpdump)
> - packet classification
> - statistics collection
> - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>   internal PMD SW state, etc.
>  ...
> 
> All of that in a dynamic, user-defined and extensible manner.
> 
> So these series introduce new library - librte_bpf.
> librte_bpf provides API to load and execute BPF bytecode within
> user-space dpdk app.
> It supports basic set of features from eBPF spec.
> Also it introduces basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).

It is an interesting feature.
I am yet to catch up on your implementation details.
Meanwhile, I have tried to run the non-JIT version on arm64.
I had some compilation issues with a 4.9 kernel and gcc 5.3 toolchain.
The following patch fixes that.

Just wondering what we will do with FreeBSD. Maybe it would be better to
kill the dependency on linux/filter.h and different kernel versions
by making bpf_impl.h self-sufficient. Just a thought.

diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
index f094170..e500e26 100644
--- a/lib/librte_bpf/bpf_impl.h
+++ b/lib/librte_bpf/bpf_impl.h
@@ -13,6 +13,26 @@
 extern "C" {
 #endif
 
+#ifndef BPF_JLT
+#define BPF_JLT        0xa0    /* LT is unsigned, '<' */
+#endif
+
+#ifndef BPF_JLE
+#define BPF_JLE        0xb0    /* LE is unsigned, '<=' */
+#endif
+
+#ifndef BPF_JSLT
+#define BPF_JSLT       0xc0    /* SLT is signed, '<' */
+#endif
+
+#ifndef BPF_JSLE
+#define BPF_JSLE       0xd0    /* SLE is signed, '<=' */
+#endif
+
+#ifndef EM_BPF
+#define EM_BPF         247     /* Linux BPF - in-kernel virtual machine
*/
+#endif
+

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-03-13 13:24   ` Jerin Jacob
  2018-03-13 17:47     ` Ananyev, Konstantin
  0 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-03-13 13:24 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

-----Original Message-----
> Date: Fri, 9 Mar 2018 16:42:01 +0000
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> To: dev@dpdk.org
> CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution
>  framework
> X-Mailer: git-send-email 1.7.0.7
> 
> librte_bpf provides a framework to load and execute eBPF bytecode
> inside user-space dpdk based applications.
> It supports basic set of features from eBPF spec
> (https://www.kernel.org/doc/Documentation/networking/filter.txt).
> 
> Not currently supported features:
>  - JIT
>  - cBPF
>  - tail-pointer call
>  - eBPF MAP
>  - skb
> 
> It also adds dependency on libelf.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  config/common_base                 |   5 +
>  config/common_linuxapp             |   1 +
>  lib/Makefile                       |   2 +
>  lib/librte_bpf/Makefile            |  30 +++
>  lib/librte_bpf/bpf.c               |  48 ++++
>  lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
>  lib/librte_bpf/bpf_impl.h          |  37 +++
>  lib/librte_bpf/bpf_load.c          | 380 +++++++++++++++++++++++++++++++
>  lib/librte_bpf/bpf_validate.c      |  55 +++++
>  lib/librte_bpf/rte_bpf.h           | 158 +++++++++++++
>  lib/librte_bpf/rte_bpf_version.map |  12 +
>  mk/rte.app.mk                      |   2 +
>  12 files changed, 1182 insertions(+)
>  create mode 100644 lib/librte_bpf/Makefile
>  create mode 100644 lib/librte_bpf/bpf.c
>  create mode 100644 lib/librte_bpf/bpf_exec.c
>  create mode 100644 lib/librte_bpf/bpf_impl.h
>  create mode 100644 lib/librte_bpf/bpf_load.c
>  create mode 100644 lib/librte_bpf/bpf_validate.c
>  create mode 100644 lib/librte_bpf/rte_bpf.h
>  create mode 100644 lib/librte_bpf/rte_bpf_version.map
> 
> diff --git a/config/common_base b/config/common_base
> index ad03cf433..2205b684f 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -823,3 +823,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
>  # Compile the eventdev application
>  #
>  CONFIG_RTE_APP_EVENTDEV=y
> +
> +#
> +# Compile librte_bpf
> +#
> +CONFIG_RTE_LIBRTE_BPF=n
> diff --git a/config/common_linuxapp b/config/common_linuxapp
> index ff98f2355..7b4a0ce7d 100644
> --- a/config/common_linuxapp
> +++ b/config/common_linuxapp
> @@ -10,6 +10,7 @@ CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES=y
>  CONFIG_RTE_EAL_IGB_UIO=y
>  CONFIG_RTE_EAL_VFIO=y
>  CONFIG_RTE_KNI_KMOD=y
> +CONFIG_RTE_LIBRTE_BPF=y
>  CONFIG_RTE_LIBRTE_KNI=y
>  CONFIG_RTE_LIBRTE_PMD_KNI=y
>  CONFIG_RTE_LIBRTE_VHOST=y
> diff --git a/lib/Makefile b/lib/Makefile
> index ec965a606..a4a2329f9 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
>  DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
>  DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
>  DEPDIRS-librte_gso += librte_mempool
> +DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
> +DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
>  
>  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>  DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
> diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
> new file mode 100644
> index 000000000..e0f434e77
> --- /dev/null
> +++ b/lib/librte_bpf/Makefile
> @@ -0,0 +1,30 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2018 Intel Corporation
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# library name
> +LIB = librte_bpf.a
> +
> +CFLAGS += -O3
> +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +LDLIBS += -lrte_net -lrte_eal
> +LDLIBS += -lrte_mempool -lrte_ring
> +LDLIBS += -lrte_mbuf -lrte_ethdev
> +LDLIBS += -lelf
> +
> +EXPORT_MAP := rte_bpf_version.map
> +
> +LIBABIVER := 1
> +
> +# all source are stored in SRCS-y
> +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
> +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
> +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
> +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
> +
> +# install header files
> +SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
> new file mode 100644
> index 000000000..4727d2251
> --- /dev/null
> +++ b/lib/librte_bpf/bpf.c
> @@ -0,0 +1,48 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2018 Intel Corporation
> + */
> +
> +#include <stdarg.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <stdint.h>
> +#include <inttypes.h>
> +
> +#include <rte_common.h>
> +#include <rte_eal.h>
> +
> +#include "bpf_impl.h"
> +
> +__rte_experimental void
> +rte_bpf_destroy(struct rte_bpf *bpf)
> +{
> +	if (bpf != NULL) {
> +		if (bpf->jit.func != NULL)
> +			munmap(bpf->jit.func, bpf->jit.sz);
> +		munmap(bpf, bpf->sz);


Any specific reason not to allocate this memory from huge pages using rte_zmalloc()
to avoid normal TLB misses?


> +	}
> +}
> +
> +__rte_experimental int
> +rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
> +{
> +	if (bpf == NULL || jit == NULL)
> +		return -EINVAL;
> +
> +	jit[0] = bpf->jit;
> +	return 0;
> +}
> +
> +int
> +bpf_jit(struct rte_bpf *bpf)
> +{
> +	int32_t rc;
> +
> +	rc = -ENOTSUP;
> +
> +	if (rc != 0)
> +		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
> +			__func__, bpf, rc);

How about using the new dynamic logging option for this library?

> +	return rc;

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters
  2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-03-13 13:39   ` Jerin Jacob
  2018-03-13 18:07     ` Ananyev, Konstantin
  0 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-03-13 13:39 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

-----Original Message-----
> Date: Fri, 9 Mar 2018 16:42:03 +0000
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> To: dev@dpdk.org
> CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters
> X-Mailer: git-send-email 1.7.0.7
> 
> Introduce API to install BPF based filters on ethdev RX/TX path.
> Current implementation is pure SW one, based on ethdev RX/TX
> callback mechanism.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  lib/librte_bpf/Makefile            |   2 +
>  lib/librte_bpf/bpf_pkt.c           | 524 +++++++++++++++++++++++++++++++++++++
>  lib/librte_bpf/rte_bpf_ethdev.h    |  50 ++++
>  lib/librte_bpf/rte_bpf_version.map |   4 +
>  4 files changed, 580 insertions(+)
>  create mode 100644 lib/librte_bpf/bpf_pkt.c
>  create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
> 
> diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
> +
> +/*
> + * information about all installed BPF rx/tx callbacks
> + */
> +
> +struct bpf_eth_cbi {
> +	uint32_t use;    /*usage counter */
> +	void *cb;        /* callback handle */
> +	struct rte_bpf *bpf;
> +	struct rte_bpf_jit jit;
> +} __rte_cache_aligned;
> +
> +/*
> + * Odd number means that callback is used by datapath.
> + * Even number means that callback is not used by datapath.
> + */
> +#define BPF_ETH_CBI_INUSE  1
> +
> +static struct bpf_eth_cbi rx_cbi[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static struct bpf_eth_cbi tx_cbi[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];

How about allocating this memory from huge pages?

> +
> +/*
> + * Marks given callback as used by datapath.
> + */
> +static __rte_always_inline void
> +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> +{
> +	cbi->use++;
> +	/* make sure no store/load reordering could happen */
> +	rte_smp_mb();

This is a full barrier on non-x86. How about a light version of this
logic? See below.

> +}
> +
> +/*
> + * Marks given callback list as not used by datapath.
> + */
> +static __rte_always_inline void
> +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> +{
> +	/* make sure all previous loads are completed */
> +	rte_smp_rmb();
> +	cbi->use++;
> +}
> +
> +/*
> + * Waits till datapath finished using given callback.
> + */
> +static void
> +bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
> +{
> +	uint32_t nuse, puse;
> +
> +	/* make sure all previous loads and stores are completed */
> +	rte_smp_mb();
> +

Read in conjunction with the change below

#if 0
> +	puse = cbi->use;
> +
> +	/* in use, busy wait till current RX/TX iteration is finished */
> +	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
> +		do {
> +			rte_pause();
> +			rte_compiler_barrier();
> +			nuse = cbi->use;
> +		} while (nuse == puse);
> +	}
#else
	cbi->cb = NULL;
	while (likely(cbi->done != 1)) {
		rte_pause();
		rte_smp_rmb();
	}

or any other logic using a flag to wait until the callback completes.
#endif

> +}
> +
> +
> +/*
> + * RX/TX callbacks for raw data bpf.
> + */
> +
> +static uint16_t
> +bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
> +	struct rte_mbuf *pkt[], uint16_t nb_pkts,
> +	__rte_unused uint16_t max_pkts, void *user_param)
> +{
> +	struct bpf_eth_cbi *cbi;
> +	uint16_t rc;
> +
> +	cbi = user_param;
> +

Read in conjunction with the change above

#if 0
> +	bpf_eth_cbi_inuse(cbi);
> +	rc = (cbi->cb != NULL) ?
> +		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
> +		nb_pkts;
> +	bpf_eth_cbi_unuse(cbi);
#else
	if (likely(cbi->cb != NULL))
		return pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1);
	else {
		cbi->done = 1;
		rte_smp_wmb();
		return nb_pkts;
	}
#endif

> +	return rc;
> +}
> +

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code
  2018-03-13 13:02 ` [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Jerin Jacob
@ 2018-03-13 17:24   ` Ananyev, Konstantin
  0 siblings, 0 replies; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-03-13 17:24 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hi Jerin,

> 
> Hi Konstantin,
> 
> >
> > BPF is used quite intensively inside Linux (and BSD) kernels
> > for various different purposes and proved to be extremely useful.
> >
> > BPF inside DPDK might also be used in a lot of places
> > for a lot of similar things.
> >  As an example to:
> > - packet filtering/tracing (aka tcpdump)
> > - packet classification
> > - statistics collection
> > - HW/PMD live-system debugging/prototyping - trace HW descriptors,
> >   internal PMD SW state, etc.
> >  ...
> >
> > All of that in a dynamic, user-defined and extensible manner.
> >
> > So these series introduce new library - librte_bpf.
> > librte_bpf provides API to load and execute BPF bytecode within
> > user-space dpdk app.
> > It supports basic set of features from eBPF spec.
> > Also it introduces basic framework to load/unload BPF-based filters
> > on eth devices (right now via SW RX/TX callbacks).
> 
> It is an interesting feature.
> I am yet to catch up on your implementation details.
> Meanwhile, I have tried to run the non-JIT version on arm64.
> I had some compilation issues with a 4.9 kernel and gcc 5.3 toolchain.
> The following patch fixes that.
> 
> Just wondering what we will do with FreeBSD. Maybe it would be better to
> kill the dependency on linux/filter.h and different kernel versions
> by making bpf_impl.h self-sufficient. Just a thought.

Good point, I have pretty much the same thought:
we already have a rudimentary bpf-related include,
drivers/net/tap/tap_bpf.h,
which is under a dual (BSD and GPL) license.
Maybe we can move it to lib/librte_net/bpf (or so)
and extend it to contain all the necessary bpf-related definitions.
Then it could be used by both the TAP PMD and librte_bpf,
and maybe by something else in the future.
Konstantin

> 
> diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
> index f094170..e500e26 100644
> --- a/lib/librte_bpf/bpf_impl.h
> +++ b/lib/librte_bpf/bpf_impl.h
> @@ -13,6 +13,26 @@
>  extern "C" {
>  #endif
> 
> +#ifndef BPF_JLT
> +#define BPF_JLT        0xa0    /* LT is unsigned, '<' */
> +#endif
> +
> +#ifndef BPF_JLE
> +#define BPF_JLE        0xb0    /* LE is unsigned, '<=' */
> +#endif
> +
> +#ifndef BPF_JSLT
> +#define BPF_JSLT       0xc0    /* SLT is signed, '<' */
> +#endif
> +
> +#ifndef BPF_JSLE
> +#define BPF_JSLE       0xd0    /* SLE is signed, '<=' */
> +#endif
> +
> +#ifndef EM_BPF
> +#define EM_BPF         247     /* Linux BPF - in-kernel virtual machine
> */
> +#endif
> +

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework
  2018-03-13 13:24   ` Jerin Jacob
@ 2018-03-13 17:47     ` Ananyev, Konstantin
  0 siblings, 0 replies; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-03-13 17:47 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev



> > diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
> > new file mode 100644
> > index 000000000..4727d2251
> > --- /dev/null
> > +++ b/lib/librte_bpf/bpf.c
> > @@ -0,0 +1,48 @@
> > +/* SPDX-License-Identifier: BSD-3-Clause
> > + * Copyright(c) 2018 Intel Corporation
> > + */
> > +
> > +#include <stdarg.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <errno.h>
> > +#include <stdint.h>
> > +#include <inttypes.h>
> > +
> > +#include <rte_common.h>
> > +#include <rte_eal.h>
> > +
> > +#include "bpf_impl.h"
> > +
> > +__rte_experimental void
> > +rte_bpf_destroy(struct rte_bpf *bpf)
> > +{
> > +	if (bpf != NULL) {
> > +		if (bpf->jit.func != NULL)
> > +			munmap(bpf->jit.func, bpf->jit.sz);
> > +		munmap(bpf, bpf->sz);
> 
> 
> Any specific reason not to allocate this memory from huge pages using rte_zmalloc()
> to avoid normal TLB misses?

The main reason - I'd like to keep the BPF code read-only,
and the jitted one read-only and executable.
About your concern - I don't think it would cause a lot of TLB misses:
most BPF programs are quite small and should fit into a few 4K pages.
At least so far, I haven't observed any dTLB miss-rate increase.
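
To illustrate the intent, here is a minimal sketch of that mapping policy (illustrative only,
not the actual librte_bpf code - just the mmap()/mprotect() pattern meant above):

#include <stddef.h>
#include <sys/mman.h>

/* emit native code into a private anonymous mapping, writable during generation */
static void *
jit_buf_alloc(size_t sz)
{
	return mmap(NULL, sz, PROT_READ | PROT_WRITE,
		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* once code generation is finished, make the buffer read-only and executable */
static int
jit_buf_seal(void *buf, size_t sz)
{
	return mprotect(buf, sz, PROT_READ | PROT_EXEC);
}

(note that mmap() returns MAP_FAILED rather than NULL on error, so a caller would
check for that before sealing the buffer).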

> 
> 
> > +	}
> > +}
> > +
> > +__rte_experimental int
> > +rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
> > +{
> > +	if (bpf == NULL || jit == NULL)
> > +		return -EINVAL;
> > +
> > +	jit[0] = bpf->jit;
> > +	return 0;
> > +}
> > +
> > +int
> > +bpf_jit(struct rte_bpf *bpf)
> > +{
> > +	int32_t rc;
> > +
> > +	rc = -ENOTSUP;
> > +
> > +	if (rc != 0)
> > +		RTE_LOG(WARNING, USER1, "%s(%p) failed, error code: %d;\n",
> > +			__func__, bpf, rc);
> 
> How about using the new dynamic logging option for this library?

Good point, will try to switch to it with v2.
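Probably something along these lines (just a sketch - the logtype name and the
wrapper macro are illustrative, not the final code):

#include <rte_log.h>

static int rte_bpf_logtype;

/* register a dedicated logtype once at init time (e.g. from a constructor) */
static void
bpf_log_init(void)
{
	rte_bpf_logtype = rte_log_register("lib.bpf");
	if (rte_bpf_logtype >= 0)
		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
}

#define BPF_LOG(lvl, fmt, args...) \
	rte_log(RTE_LOG_ ## lvl, rte_bpf_logtype, fmt, ## args)

so the RTE_LOG(WARNING, USER1, ...) call above would become something like:
BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n", __func__, bpf, rc);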
Konstantin

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters
  2018-03-13 13:39   ` Jerin Jacob
@ 2018-03-13 18:07     ` Ananyev, Konstantin
  0 siblings, 0 replies; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-03-13 18:07 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev



> 
> -----Original Message-----
> > Date: Fri, 9 Mar 2018 16:42:03 +0000
> > From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > To: dev@dpdk.org
> > CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > Subject: [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters
> > X-Mailer: git-send-email 1.7.0.7
> >
> > Introduce API to install BPF based filters on ethdev RX/TX path.
> > Current implementation is pure SW one, based on ethdev RX/TX
> > callback mechanism.
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> > ---
> >  lib/librte_bpf/Makefile            |   2 +
> >  lib/librte_bpf/bpf_pkt.c           | 524 +++++++++++++++++++++++++++++++++++++
> >  lib/librte_bpf/rte_bpf_ethdev.h    |  50 ++++
> >  lib/librte_bpf/rte_bpf_version.map |   4 +
> >  4 files changed, 580 insertions(+)
> >  create mode 100644 lib/librte_bpf/bpf_pkt.c
> >  create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
> >
> > diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
> > +
> > +/*
> > + * information about all installed BPF rx/tx callbacks
> > + */
> > +
> > +struct bpf_eth_cbi {
> > +	uint32_t use;    /*usage counter */
> > +	void *cb;        /* callback handle */
> > +	struct rte_bpf *bpf;
> > +	struct rte_bpf_jit jit;
> > +} __rte_cache_aligned;
> > +
> > +/*
> > + * Odd number means that callback is used by datapath.
> > + * Even number means that callback is not used by datapath.
> > + */
> > +#define BPF_ETH_CBI_INUSE  1
> > +
> > +static struct bpf_eth_cbi rx_cbi[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> > +static struct bpf_eth_cbi tx_cbi[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> 
> How about allocating this memory from huge pages?

Yep, in v2 will switch to using rte_malloc() for cbi allocation.
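Roughly like this (a sketch only - the init function and flat layout are illustrative,
reusing struct bpf_eth_cbi from the patch; indexing would then be
port * RTE_MAX_QUEUES_PER_PORT + queue):

#include <errno.h>
#include <rte_config.h>
#include <rte_malloc.h>

static struct bpf_eth_cbi *rx_cbi;

static int
bpf_eth_cbi_init(void)
{
	/* one cache-aligned, zeroed slot per port/queue, from the rte_malloc heap */
	rx_cbi = rte_zmalloc("bpf_eth_cbi", sizeof(rx_cbi[0]) *
		RTE_MAX_ETHPORTS * RTE_MAX_QUEUES_PER_PORT,
		RTE_CACHE_LINE_SIZE);
	return (rx_cbi == NULL) ? -ENOMEM : 0;
}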

> 
> > +
> > +/*
> > + * Marks given callback as used by datapath.
> > + */
> > +static __rte_always_inline void
> > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > +{
> > +	cbi->use++;
> > +	/* make sure no store/load reordering could happen */
> > +	rte_smp_mb();
> 
> This is a full barrier on non-x86.

This is a full barrier on x86 too.
Unfortunately I think it is unavoidable, though open for suggestions.

> How about a light version of this
> logic? See below.
> 
> > +}
> > +
> > +/*
> > + * Marks given callback list as not used by datapath.
> > + */
> > +static __rte_always_inline void
> > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > +{
> > +	/* make sure all previous loads are completed */
> > +	rte_smp_rmb();
> > +	cbi->use++;
> > +}
> > +
> > +/*
> > + * Waits till datapath finished using given callback.
> > + */
> > +static void
> > +bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
> > +{
> > +	uint32_t nuse, puse;
> > +
> > +	/* make sure all previous loads and stores are completed */
> > +	rte_smp_mb();
> > +
> 
> Read in conjunction with the change below
> 
> #if 0
> > +	puse = cbi->use;
> > +
> > +	/* in use, busy wait till current RX/TX iteration is finished */
> > +	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
> > +		do {
> > +			rte_pause();
> > +			rte_compiler_barrier();
> > +			nuse = cbi->use;
> > +		} while (nuse == puse);
> > +	}
> #else
> 	cbi->cb = NULL;
> 	while (likely(cbi->done != 1)) {
> 		rte_pause();
> 		rte_smp_rmb();
> 	}
> 
> or any other logic using a flag to wait until the callback completes.
> #endif

I thought about it, but such an approach makes control-path progress dependent
on the simultaneous invocation of data-path functions.
In some cases it would cause the control-path to hang.
Let's say there is no traffic for that port/queue (i.e. tx_burst() wouldn't be called),
or the user invokes the control-path and data-path functions from the same thread, i.e.:

rte_bpf_eth_rx_elf_load(port, queue, ...);
....
rte_eth_rx_burst(port, queue, ...);
...
rte_bpf_eth_rx_unload(port,queue);

Konstantin

> 
> > +}
> > +
> > +
> > +/*
> > + * RX/TX callbacks for raw data bpf.
> > + */
> > +
> > +static uint16_t
> > +bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
> > +	struct rte_mbuf *pkt[], uint16_t nb_pkts,
> > +	__rte_unused uint16_t max_pkts, void *user_param)
> > +{
> > +	struct bpf_eth_cbi *cbi;
> > +	uint16_t rc;
> > +
> > +	cbi = user_param;
> > +
> 
> Read in conjunction with the change above
> 
> #if 0
> > +	bpf_eth_cbi_inuse(cbi);
> > +	rc = (cbi->cb != NULL) ?
> > +		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
> > +		nb_pkts;
> > +	bpf_eth_cbi_unuse(cbi);
> #else
> 	if (likely(cbi->cb != NULL))
> 		return pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1);
> 	else {
> 		cbi->done = 1;
> 		rte_smp_wmb();
> 		return nb_pkts;
> 	}
> #endif
> 
> > +	return rc;
> > +}
> > +

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (5 preceding siblings ...)
  2018-03-13 13:02 ` [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Jerin Jacob
@ 2018-03-14 16:43 ` Alejandro Lucero
       [not found]   ` <2601191342CEEE43887BDE71AB9772589E29032C@irsmsx105.ger.corp.intel.com>
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 0/7] " Konstantin Ananyev
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 83+ messages in thread
From: Alejandro Lucero @ 2018-03-14 16:43 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, nick viljoen, Quentin Monnet

I tried to start a discussion about eBPF support with DPDK at the last DPDK
meeting in Santa Clara:

https://dpdksummit.com/Archive/pdf/2017USA/DPDK%20support%20for%20new%20hardware%20offloads.pdf

In slide 17 I have some points which, IMHO, are worth discussing before
adding this support.

I can see compatibility with eBPF programs used with the kernel being just
enough for adding this to DPDK, but if I understand where eBPF inside the
kernel is going (regarding network stack), those programs are going to (or
could) refer to kernel "code", so maybe this compatibility is just
impossible to support. That would force a check for avoiding those programs
with such references, and I can see this would become a mess quickly.

Assuming this issue could be overcome (or not an issue at all), maybe it
makes sense to execute eBPF programs but, does it make sense to execute
eBPF code? To start with, we are going to execute userspace code in
userspace context, so some (I would say main) reasons behind eBPF do not
apply. And from a performance point of view, can we ensure eBPF code
execution is going to be at the same level as DPDK? Would it not be a better
idea to translate eBPF programs to another language like ... C?

Don't get me wrong. I'm not against adding eBPF at all. In fact, from my
company's point of view, Netronome, we would be happy to have this with
DPDK and to support eBPF offload as this is possible now with the netdev
driver.


On Fri, Mar 9, 2018 at 4:42 PM, Konstantin Ananyev <
konstantin.ananyev@intel.com> wrote:

> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
>
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
>  As an example to:
> - packet filtering/tracing (aka tcpdump)
> - packet classification
> - statistics collection
> - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>   internal PMD SW state, etc.
>  ...
>
> All of that in a dynamic, user-defined and extensible manner.
>
> So these series introduce new library - librte_bpf.
> librte_bpf provides API to load and execute BPF bytecode within
> user-space dpdk app.
> It supports basic set of features from eBPF spec.
> Also it introduces basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).
>
> How to try it:
> ===============
>
> 1) run testpmd as usual and start your favorite forwarding case.
> 2) build bpf program you'd like to load
> (you'll need clang v3.7 or above):
> $ cd test/bpf
> $ clang -O2 -target bpf -c t1.c
>
> 3) load bpf program(s):
> testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
>
> <load-flags>:  [-][J][M]
> J - use JIT generated native code, otherwise BPF interpreter will be used.
> M - assume input parameter is a pointer to rte_mbuf,
>     otherwise assume it is a pointer to first segment's data.
>
> Few examples:
>
> # to load (not JITed) dummy.o at TX queue 0, port 0:
> testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
>
> #to load (and JIT compile) t1.o at RX queue 0, port 1:
> testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
>
> #to load and JIT t3.o (note that it expects mbuf as an input):
> testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
>
> If you are curious to check JIT generated native code:
> gdb -p `pgrep testpmd`
> (gdb) disas 0x7fd173c5f000,+76
> Dump of assembler code from 0x7fd173c5f000 to 0x7fd173c5f04c:
>    0x00007fd173c5f000:  mov    %rdi,%rsi
>    0x00007fd173c5f003:  movzwq 0x10(%rsi),%rdi
>    0x00007fd173c5f008:  mov    0x0(%rsi),%rdx
>    0x00007fd173c5f00c:  add    %rdi,%rdx
>    0x00007fd173c5f00f:  movzbq 0xc(%rdx),%rdi
>    0x00007fd173c5f014:  movzbq 0xd(%rdx),%rdx
>    0x00007fd173c5f019:  shl    $0x8,%rdx
>    0x00007fd173c5f01d:  or     %rdi,%rdx
>    0x00007fd173c5f020:  cmp    $0x608,%rdx
>    0x00007fd173c5f027:  jne    0x7fd173c5f044
>    0x00007fd173c5f029:  mov    $0xb712e8,%rdi
>    0x00007fd173c5f030:  mov    0x0(%rdi),%rdi
>    0x00007fd173c5f034:  mov    $0x40,%rdx
>    0x00007fd173c5f03b:  mov    $0x4db2f0,%rax
>    0x00007fd173c5f042:  callq  *%rax
>    0x00007fd173c5f044:  mov    $0x1,%rax
>    0x00007fd173c5f04b:  retq
> End of assembler dump.
>
> 4) observe changed traffic behavior
> Let say with the examples above:
>   - dummy.o  does literally nothing, so no changes should be here,
>     except some possible slowdown.
>  - t1.o - should force to drop all packets that doesn't match:
>    'dst 1.2.3.4 && udp && dst port 5000' filter.
>  - t3.o - should dump to stdout ARP packets.
>
> 5) unload some or all bpf programs:
> testpmd> bpf-unload tx 0 0
>
> 6) continue with step 3) or exit
>
> TODO list:
> ==========
> - meson build
> - UT for it
> - implement proper validate()
> - allow JIT to generate bulk version
> - FreeBSD support
>
> Not currently supported features:
> =================================
> - cBPF
> - tail-pointer call
> - eBPF MAP
> - JIT for non X86_64 targets
> - skb
>
> Konstantin Ananyev (5):
>   bpf: add BPF loading and execution framework
>   bpf: add JIT compilation for x86_64 ISA.
>   bpf: introduce basic RX/TX BPF filters
>   testpmd: new commands to load/unload BPF filters
>   test: add few eBPF samples
>
>  app/test-pmd/bpf_sup.h             |   25 +
>  app/test-pmd/cmdline.c             |  146 ++++
>  config/common_base                 |    5 +
>  config/common_linuxapp             |    1 +
>  lib/Makefile                       |    2 +
>  lib/librte_bpf/Makefile            |   35 +
>  lib/librte_bpf/bpf.c               |   52 ++
>  lib/librte_bpf/bpf_exec.c          |  452 ++++++++++++
>  lib/librte_bpf/bpf_impl.h          |   37 +
>  lib/librte_bpf/bpf_jit_x86.c       | 1329 ++++++++++++++++++++++++++++++
> ++++++
>  lib/librte_bpf/bpf_load.c          |  380 +++++++++++
>  lib/librte_bpf/bpf_pkt.c           |  524 ++++++++++++++
>  lib/librte_bpf/bpf_validate.c      |   55 ++
>  lib/librte_bpf/rte_bpf.h           |  158 +++++
>  lib/librte_bpf/rte_bpf_ethdev.h    |   50 ++
>  lib/librte_bpf/rte_bpf_version.map |   16 +
>  mk/rte.app.mk                      |    2 +
>  test/bpf/dummy.c                   |   20 +
>  test/bpf/mbuf.h                    |  556 +++++++++++++++
>  test/bpf/t1.c                      |   53 ++
>  test/bpf/t2.c                      |   30 +
>  test/bpf/t3.c                      |   36 +
>  22 files changed, 3964 insertions(+)
>  create mode 100644 app/test-pmd/bpf_sup.h
>  create mode 100644 lib/librte_bpf/Makefile
>  create mode 100644 lib/librte_bpf/bpf.c
>  create mode 100644 lib/librte_bpf/bpf_exec.c
>  create mode 100644 lib/librte_bpf/bpf_impl.h
>  create mode 100644 lib/librte_bpf/bpf_jit_x86.c
>  create mode 100644 lib/librte_bpf/bpf_load.c
>  create mode 100644 lib/librte_bpf/bpf_pkt.c
>  create mode 100644 lib/librte_bpf/bpf_validate.c
>  create mode 100644 lib/librte_bpf/rte_bpf.h
>  create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
>  create mode 100644 lib/librte_bpf/rte_bpf_version.map
>  create mode 100644 test/bpf/dummy.c
>  create mode 100644 test/bpf/mbuf.h
>  create mode 100644 test/bpf/t1.c
>  create mode 100644 test/bpf/t2.c
>  create mode 100644 test/bpf/t3.c
>
> --
> 2.13.6
>
>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code
       [not found]   ` <2601191342CEEE43887BDE71AB9772589E29032C@irsmsx105.ger.corp.intel.com>
@ 2018-03-16  9:45     ` Ananyev, Konstantin
  0 siblings, 0 replies; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-03-16  9:45 UTC (permalink / raw)
  To: alejandro.lucero; +Cc: dev

> 
> I tried to start a discussion about eBPF support with DPDK at the last DPDK meeting in Santa Clara:
> 
> https://dpdksummit.com/Archive/pdf/2017USA/DPDK%20support%20for%20new%20hardware%20offloads.pdf
> 
> In slide 17 I have some points which, IMHO, are worth discussing before adding this support.
> 
> I can see compatibility with eBPF programs used with the kernel being just enough for adding this to DPDK, but if I understand where eBPF
> inside the kernel is going (regarding network stack), those programs are going to (or could) refer to kernel "code", so maybe this
> compatibility is just impossible to support. That would force a check for avoiding those programs with such references, and I can see this
> would become a mess quickly.

Inside DPDK we can (and should, I think) support the eBPF ISA
(https://github.com/iovisor/bpf-docs/blob/master/eBPF.md).
Though of course it would be hard (if possible at all) to support kernel-specific structures and functions.
And I don't think we have to go that way; instead it would be much more plausible for DPDK users to allow eBPF inside DPDK
to refer to DPDK-specific structures/functions (rte_mbuf, etc.).
So if we have an eBPF program that accepts a pointer to raw packet data as an input and doesn't refer to any
external symbols - it should run unmodified on both the kernel and DPDK BPF VMs.
In other cases we wouldn't have full compatibility here.

> 
> Assuming this issue could be overcome (or not an issue at all), maybe it makes sense to execute eBPF programs but, does it make sense to
> execute eBPF code? To start with, we are going to execute userspace code in userspace context, so some (I would say main) reasons behind
> eBPF do not apply.

Well, these days BPF is used for many different purposes inside the kernel.
Some of these purposes would be valid for DPDK apps too, others probably wouldn't.
For example - the ability to dynamically create/destroy packet filters to classify/trace/drop/collect statistics
in a user-defined way - that, I think, is what many users would be interested in and what DPDK is missing these days.
Again, nothing prevents people from using BPF inside DPDK for something totally different from current kernel usages.

> And from a performance point of view, can we ensure eBPF code execution is going to be at the same level as
> DPDK?

Obviously performance depends on many things:
  - actual eBPF code you are going to execute
  - interpreter/JIT/HW offload you are going to use for that
  - the context in which the eBPF VM will be executed
  - etc.

In general, if you load a new packet filter running in SW -
yes that would consume some extra CPU cycles and might affect performance.
But in many cases it is an acceptable tradeoff - functionality vs performance.
Again, it is totally up to the user - if he feels he doesn't need that functionality,
he just won't load BPF programs.

>Would it not be a better idea to translate eBPF programs to another language like ... C?

clang (starting from v3.7) supports eBPF as one of its backend targets, so now it is possible to
write eBPF procedures using C (a restricted version).
In fact, all samples in patch #5 are written in pure C.

> 
> Don't take me wrong. I'm not against adding eBPF at all. In fact, from my company's point of view, Netronome, we would be happy to have
> this with DPDK and to support eBPF offload as this is possible now with the netdev driver.

Konstantin


> 
> 
> On Fri, Mar 9, 2018 at 4:42 PM, Konstantin Ananyev <konstantin.ananyev@intel.com> wrote:
> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
> 
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
>  As an example to:
> - packet filtering/tracing (aka tcpdump)
> - packet classification
> - statistics collection
> - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>   internal PMD SW state, etc.
>  ...
> 
> All of that in a dynamic, user-defined and extensible manner.
> 
> So these series introduce new library - librte_bpf.
> librte_bpf provides API to load and execute BPF bytecode within
> user-space dpdk app.
> It supports basic set of features from eBPF spec.
> Also it introduces basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).
> 
> How to try it:
> ===============
> 
> 1) run testpmd as usual and start your favorite forwarding case.
> 2) build bpf program you'd like to load
> (you'll need clang v3.7 or above):
> $ cd test/bpf
> $ clang -O2 -target bpf -c t1.c
> 
> 3) load bpf program(s):
> testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
> 
> <load-flags>:  [-][J][M]
> J - use JIT generated native code, otherwise BPF interpreter will be used.
> M - assume input parameter is a pointer to rte_mbuf,
>     otherwise assume it is a pointer to first segment's data.
> 
> Few examples:
> 
> # to load (not JITed) dummy.o at TX queue 0, port 0:
> testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
> 
> #to load (and JIT compile) t1.o at RX queue 0, port 1:
> testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
> 
> #to load and JIT t3.o (note that it expects mbuf as an input):
> testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
> 
> If you are curious to check JIT generated native code:
> gdb -p `pgrep testpmd`
> (gdb) disas 0x7fd173c5f000,+76
> Dump of assembler code from 0x7fd173c5f000 to 0x7fd173c5f04c:
>    0x00007fd173c5f000:  mov    %rdi,%rsi
>    0x00007fd173c5f003:  movzwq 0x10(%rsi),%rdi
>    0x00007fd173c5f008:  mov    0x0(%rsi),%rdx
>    0x00007fd173c5f00c:  add    %rdi,%rdx
>    0x00007fd173c5f00f:  movzbq 0xc(%rdx),%rdi
>    0x00007fd173c5f014:  movzbq 0xd(%rdx),%rdx
>    0x00007fd173c5f019:  shl    $0x8,%rdx
>    0x00007fd173c5f01d:  or     %rdi,%rdx
>    0x00007fd173c5f020:  cmp    $0x608,%rdx
>    0x00007fd173c5f027:  jne    0x7fd173c5f044
>    0x00007fd173c5f029:  mov    $0xb712e8,%rdi
>    0x00007fd173c5f030:  mov    0x0(%rdi),%rdi
>    0x00007fd173c5f034:  mov    $0x40,%rdx
>    0x00007fd173c5f03b:  mov    $0x4db2f0,%rax
>    0x00007fd173c5f042:  callq  *%rax
>    0x00007fd173c5f044:  mov    $0x1,%rax
>    0x00007fd173c5f04b:  retq
> End of assembler dump.
> 
> 4) observe changed traffic behavior
> Let say with the examples above:
>   - dummy.o  does literally nothing, so no changes should be here,
>     except some possible slowdown.
>  - t1.o - should force to drop all packets that doesn't match:
>    'dst 1.2.3.4 && udp && dst port 5000' filter.
>  - t3.o - should dump to stdout ARP packets.
> 
> 5) unload some or all bpf programs:
> testpmd> bpf-unload tx 0 0
> 
> 6) continue with step 3) or exit
> 
> TODO list:
> ==========
> - meson build
> - UT for it
> - implement proper validate()
> - allow JIT to generate bulk version
> - FreeBSD support
> 
> Not currently supported features:
> =================================
> - cBPF
> - tail-pointer call
> - eBPF MAP
> - JIT for non X86_64 targets
> - skb
> 
> Konstantin Ananyev (5):
>   bpf: add BPF loading and execution framework
>   bpf: add JIT compilation for x86_64 ISA.
>   bpf: introduce basic RX/TX BPF filters
>   testpmd: new commands to load/unload BPF filters
>   test: add few eBPF samples
> 
>  app/test-pmd/bpf_sup.h             |   25 +
>  app/test-pmd/cmdline.c             |  146 ++++
>  config/common_base                 |    5 +
>  config/common_linuxapp             |    1 +
>  lib/Makefile                       |    2 +
>  lib/librte_bpf/Makefile            |   35 +
>  lib/librte_bpf/bpf.c               |   52 ++
>  lib/librte_bpf/bpf_exec.c          |  452 ++++++++++++
>  lib/librte_bpf/bpf_impl.h          |   37 +
>  lib/librte_bpf/bpf_jit_x86.c       | 1329 ++++++++++++++++++++++++++++++++++++
>  lib/librte_bpf/bpf_load.c          |  380 +++++++++++
>  lib/librte_bpf/bpf_pkt.c           |  524 ++++++++++++++
>  lib/librte_bpf/bpf_validate.c      |   55 ++
>  lib/librte_bpf/rte_bpf.h           |  158 +++++
>  lib/librte_bpf/rte_bpf_ethdev.h    |   50 ++
>  lib/librte_bpf/rte_bpf_version.map |   16 +
>  mk/rte.app.mk                      |    2 +
>  test/bpf/dummy.c                   |   20 +
>  test/bpf/mbuf.h                    |  556 +++++++++++++++
>  test/bpf/t1.c                      |   53 ++
>  test/bpf/t2.c                      |   30 +
>  test/bpf/t3.c                      |   36 +
>  22 files changed, 3964 insertions(+)
>  create mode 100644 app/test-pmd/bpf_sup.h
>  create mode 100644 lib/librte_bpf/Makefile
>  create mode 100644 lib/librte_bpf/bpf.c
>  create mode 100644 lib/librte_bpf/bpf_exec.c
>  create mode 100644 lib/librte_bpf/bpf_impl.h
>  create mode 100644 lib/librte_bpf/bpf_jit_x86.c
>  create mode 100644 lib/librte_bpf/bpf_load.c
>  create mode 100644 lib/librte_bpf/bpf_pkt.c
>  create mode 100644 lib/librte_bpf/bpf_validate.c
>  create mode 100644 lib/librte_bpf/rte_bpf.h
>  create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
>  create mode 100644 lib/librte_bpf/rte_bpf_version.map
>  create mode 100644 test/bpf/dummy.c
>  create mode 100644 test/bpf/mbuf.h
>  create mode 100644 test/bpf/t1.c
>  create mode 100644 test/bpf/t2.c
>  create mode 100644 test/bpf/t3.c
> 
> --
> 2.13.6


^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 0/7] add framework to load and execute BPF code
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (6 preceding siblings ...)
  2018-03-14 16:43 ` Alejandro Lucero
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

BPF is used quite intensively inside Linux (and BSD) kernels
for various different purposes and proved to be extremely useful.

BPF inside DPDK might also be used in a lot of places
for a lot of similar things.
 As an example to:
- packet filtering/tracing (aka tcpdump)
- packet classification
- statistics collection
- HW/PMD live-system debugging/prototyping - trace HW descriptors,
  internal PMD SW state, etc.
- Come up with your own idea

All of that in a dynamic, user-defined and extensible manner.

So these series introduce new library - librte_bpf.
librte_bpf provides API to load and execute BPF bytecode within
user-space dpdk app.
It supports basic set of features from eBPF spec.
Also it introduces basic framework to load/unload BPF-based filters
on eth devices (right now via SW RX/TX callbacks).

How to try it:
===============

1) run testpmd as usual and start your favorite forwarding case.
2) build bpf program you'd like to load
(you'll need clang v3.7 or above):
$ cd test/bpf
$ clang -O2 -target bpf -c t1.c

3) load bpf program(s):
testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>

<load-flags>:  [-][J][M]
J - use JIT generated native code, otherwise BPF interpreter will be used.
M - assume input parameter is a pointer to rte_mbuf,
    otherwise assume it is a pointer to first segment's data.

Few examples:

# to load (not JITed) dummy.o at TX queue 0, port 0:
testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o

#to load (and JIT compile) t1.o at RX queue 0, port 1:
testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o

#to load and JIT t3.o (note that it expects mbuf as an input):
testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o

If you are curious to check JIT generated native code:
gdb -p `pgrep testpmd`
(gdb) disas 0x7fd173c5f000,+76
Dump of assembler code from 0x7fd173c5f000 to 0x7fd173c5f04c:
   0x00007fd173c5f000:  mov    %rdi,%rsi
   0x00007fd173c5f003:  movzwq 0x10(%rsi),%rdi
   0x00007fd173c5f008:  mov    0x0(%rsi),%rdx
   0x00007fd173c5f00c:  add    %rdi,%rdx
   0x00007fd173c5f00f:  movzbq 0xc(%rdx),%rdi
   0x00007fd173c5f014:  movzbq 0xd(%rdx),%rdx
   0x00007fd173c5f019:  shl    $0x8,%rdx
   0x00007fd173c5f01d:  or     %rdi,%rdx
   0x00007fd173c5f020:  cmp    $0x608,%rdx
   0x00007fd173c5f027:  jne    0x7fd173c5f044
   0x00007fd173c5f029:  mov    $0xb712e8,%rdi
   0x00007fd173c5f030:  mov    0x0(%rdi),%rdi
   0x00007fd173c5f034:  mov    $0x40,%rdx
   0x00007fd173c5f03b:  mov    $0x4db2f0,%rax
   0x00007fd173c5f042:  callq  *%rax
   0x00007fd173c5f044:  mov    $0x1,%rax
   0x00007fd173c5f04b:  retq
End of assembler dump.


4) observe changed traffic behavior
Let's say with the examples above:
  - dummy.o does literally nothing, so no changes should be seen here,
    except some possible slowdown.
 - t1.o - should force dropping of all packets that don't match the
   'dst 1.2.3.4 && udp && dst port 5000' filter.
 - t3.o - should dump ARP packets to stdout.

5) unload some or all bpf programs:
testpmd> bpf-unload tx 0 0

6) continue with step 3) or exit

TODO list:
==========
- UT for it
- allow JIT to generate bulk version

Not currently supported features:
=================================
- cBPF
- tail-pointer call
- eBPF MAP
- JIT for non X86_64 targets
- skb

v2:
 - add meson build
 - add freebsd build
 - use new logging API
 - using rte_malloc() for cbi allocation
 - add extra logic into bpf_validate()

Konstantin Ananyev (7):
  net: move BPF related definitions into librte_net
  bpf: add BPF loading and execution framework
  bpf: add more logic into bpf_validate()
  bpf: add JIT compilation for x86_64 ISA
  bpf: introduce basic RX/TX BPF filters
  testpmd: new commands to load/unload BPF filters
  test: add few eBPF samples

 app/test-pmd/bpf_sup.h             |   25 +
 app/test-pmd/cmdline.c             |  146 ++++
 app/test-pmd/meson.build           |    2 +-
 config/common_base                 |    5 +
 drivers/net/tap/tap_bpf.h          |   80 +--
 lib/Makefile                       |    2 +
 lib/librte_bpf/Makefile            |   35 +
 lib/librte_bpf/bpf.c               |   64 ++
 lib/librte_bpf/bpf_exec.c          |  452 ++++++++++++
 lib/librte_bpf/bpf_impl.h          |   41 ++
 lib/librte_bpf/bpf_jit_x86.c       | 1329 ++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_load.c          |  385 +++++++++++
 lib/librte_bpf/bpf_pkt.c           |  607 ++++++++++++++++
 lib/librte_bpf/bpf_validate.c      | 1166 +++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   24 +
 lib/librte_bpf/rte_bpf.h           |  160 +++++
 lib/librte_bpf/rte_bpf_ethdev.h    |  100 +++
 lib/librte_bpf/rte_bpf_version.map |   16 +
 lib/librte_net/Makefile            |    1 +
 lib/librte_net/meson.build         |    3 +-
 lib/librte_net/rte_bpf_def.h       |  370 ++++++++++
 lib/meson.build                    |    2 +-
 mk/rte.app.mk                      |    2 +
 test/bpf/dummy.c                   |   20 +
 test/bpf/mbuf.h                    |  578 ++++++++++++++++
 test/bpf/t1.c                      |   52 ++
 test/bpf/t2.c                      |   31 +
 test/bpf/t3.c                      |   36 +
 28 files changed, 5652 insertions(+), 82 deletions(-)
 create mode 100644 app/test-pmd/bpf_sup.h
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map
 create mode 100644 lib/librte_net/rte_bpf_def.h
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (7 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 0/7] " Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code Konstantin Ananyev
                     ` (10 more replies)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 2/7] bpf: add BPF loading and execution framework Konstantin Ananyev
                   ` (5 subsequent siblings)
  14 siblings, 11 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, olivier.matz, pascal.mazon

Different linux distros may include partial definitions for eBPF,
while the linux kernel (or dpdk) may support a given eBPF feature.
To avoid issues, define in one place all that is needed (and/or supported)
by various DPDK components.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/tap/tap_bpf.h    |  80 +---------
 lib/librte_net/Makefile      |   1 +
 lib/librte_net/meson.build   |   3 +-
 lib/librte_net/rte_bpf_def.h | 370 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 374 insertions(+), 80 deletions(-)
 create mode 100644 lib/librte_net/rte_bpf_def.h

diff --git a/drivers/net/tap/tap_bpf.h b/drivers/net/tap/tap_bpf.h
index 1a70ffe21..d059648d8 100644
--- a/drivers/net/tap/tap_bpf.h
+++ b/drivers/net/tap/tap_bpf.h
@@ -6,85 +6,7 @@
 #define __TAP_BPF_H__
 
 #include <tap_autoconf.h>
-
-/* Do not #include <linux/bpf.h> since eBPF must compile on different
- * distros which may include partial definitions for eBPF (while the
- * kernel itself may support eBPF). Instead define here all that is needed
- */
-
-/* BPF_MAP_UPDATE_ELEM command flags */
-#define	BPF_ANY	0 /* create a new element or update an existing */
-
-/* BPF architecture instruction struct */
-struct bpf_insn {
-	__u8	code;
-	__u8	dst_reg:4;
-	__u8	src_reg:4;
-	__s16	off;
-	__s32	imm; /* immediate value */
-};
-
-/* BPF program types */
-enum bpf_prog_type {
-	BPF_PROG_TYPE_UNSPEC,
-	BPF_PROG_TYPE_SOCKET_FILTER,
-	BPF_PROG_TYPE_KPROBE,
-	BPF_PROG_TYPE_SCHED_CLS,
-	BPF_PROG_TYPE_SCHED_ACT,
-};
-
-/* BPF commands types */
-enum bpf_cmd {
-	BPF_MAP_CREATE,
-	BPF_MAP_LOOKUP_ELEM,
-	BPF_MAP_UPDATE_ELEM,
-	BPF_MAP_DELETE_ELEM,
-	BPF_MAP_GET_NEXT_KEY,
-	BPF_PROG_LOAD,
-};
-
-/* BPF maps types */
-enum bpf_map_type {
-	BPF_MAP_TYPE_UNSPEC,
-	BPF_MAP_TYPE_HASH,
-};
-
-/* union of anonymous structs used with TAP BPF commands */
-union bpf_attr {
-	/* BPF_MAP_CREATE command */
-	struct {
-		__u32	map_type;
-		__u32	key_size;
-		__u32	value_size;
-		__u32	max_entries;
-		__u32	map_flags;
-		__u32	inner_map_fd;
-	};
-
-	/* BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM commands */
-	struct {
-		__u32		map_fd;
-		__aligned_u64	key;
-		union {
-			__aligned_u64 value;
-			__aligned_u64 next_key;
-		};
-		__u64		flags;
-	};
-
-	/* BPF_PROG_LOAD command */
-	struct {
-		__u32		prog_type;
-		__u32		insn_cnt;
-		__aligned_u64	insns;
-		__aligned_u64	license;
-		__u32		log_level;
-		__u32		log_size;
-		__aligned_u64	log_buf;
-		__u32		kern_version;
-		__u32		prog_flags;
-	};
-} __attribute__((aligned(8)));
+#include <rte_bpf_def.h>
 
 #ifndef __NR_bpf
 # if defined(__i386__)
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 95ff54900..8876b3dcc 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -20,5 +20,6 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_esp
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_bpf_def.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 78c0f03e5..ab0b82962 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -12,7 +12,8 @@ headers = files('rte_ip.h',
 	'rte_ether.h',
 	'rte_gre.h',
 	'rte_net.h',
-	'rte_net_crc.h')
+	'rte_net_crc.h',
+	'rte_bpf_def.h')
 
 sources = files('rte_arp.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
diff --git a/lib/librte_net/rte_bpf_def.h b/lib/librte_net/rte_bpf_def.h
new file mode 100644
index 000000000..3f4a5a3e7
--- /dev/null
+++ b/lib/librte_net/rte_bpf_def.h
@@ -0,0 +1,370 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd.
+ */
+
+#ifndef _RTE_BPF_DEF_H_
+#define _RTE_BPF_DEF_H_
+
+#ifdef __linux__
+#include <linux/types.h>
+#else
+
+typedef uint8_t __u8;
+typedef uint16_t __u16;
+typedef uint32_t __u32;
+typedef uint64_t __u64;
+
+typedef int8_t __s8;
+typedef int16_t __s16;
+typedef int32_t __s32;
+
+#define __aligned_u64 __u64 __attribute__((aligned(8)))
+
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Do not #include <linux/bpf.h> since eBPF must compile on different
+ * distros which may include partial definitions for eBPF (while the
+ * kernel itself may support eBPF). Instead define here all that is needed
+ * by various DPDK components.
+ */
+
+/* Instruction classes */
+#define BPF_CLASS(code) ((code) & 0x07)
+#define		BPF_LD		0x00
+#define		BPF_LDX		0x01
+#define		BPF_ST		0x02
+#define		BPF_STX		0x03
+#define		BPF_ALU		0x04
+#define		BPF_JMP		0x05
+#define		BPF_RET		0x06
+#define		BPF_MISC        0x07
+
+/* ld/ldx fields */
+#define BPF_SIZE(code)  ((code) & 0x18)
+#define		BPF_W		0x00
+#define		BPF_H		0x08
+#define		BPF_B		0x10
+#define BPF_MODE(code)  ((code) & 0xe0)
+#define		BPF_IMM		0x00
+#define		BPF_ABS		0x20
+#define		BPF_IND		0x40
+#define		BPF_MEM		0x60
+#define		BPF_LEN		0x80
+#define		BPF_MSH		0xa0
+
+/* alu/jmp fields */
+#define BPF_OP(code)    ((code) & 0xf0)
+#define		BPF_ADD		0x00
+#define		BPF_SUB		0x10
+#define		BPF_MUL		0x20
+#define		BPF_DIV		0x30
+#define		BPF_OR		0x40
+#define		BPF_AND		0x50
+#define		BPF_LSH		0x60
+#define		BPF_RSH		0x70
+#define		BPF_NEG		0x80
+#define		BPF_MOD		0x90
+#define		BPF_XOR		0xa0
+
+#define		BPF_JA		0x00
+#define		BPF_JEQ		0x10
+#define		BPF_JGT		0x20
+#define		BPF_JGE		0x30
+#define		BPF_JSET        0x40
+#define BPF_SRC(code)   ((code) & 0x08)
+#define		BPF_K		0x00
+#define		BPF_X		0x08
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/* Extended instruction set based on top of classic BPF */
+
+/* instruction classes */
+#define BPF_ALU64	0x07	/* alu mode in double word width */
+
+/* ld/ldx fields */
+#define BPF_DW		0x18	/* double word */
+#define BPF_XADD	0xc0	/* exclusive add */
+
+/* alu/jmp fields */
+#define BPF_MOV		0xb0	/* mov reg to reg */
+#define BPF_ARSH	0xc0	/* sign extending arithmetic shift right */
+
+/* change endianness of a register */
+#define BPF_END		0xd0	/* flags for endianness conversion: */
+#define BPF_TO_LE	0x00	/* convert to little-endian */
+#define BPF_TO_BE	0x08	/* convert to big-endian */
+#define BPF_FROM_LE	BPF_TO_LE
+#define BPF_FROM_BE	BPF_TO_BE
+
+/* jmp encodings */
+#define BPF_JNE		0x50	/* jump != */
+#define BPF_JLT		0xa0	/* LT is unsigned, '<' */
+#define BPF_JLE		0xb0	/* LE is unsigned, '<=' */
+#define BPF_JSGT	0x60	/* SGT is signed '>', GT in x86 */
+#define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
+#define BPF_JSLT	0xc0	/* SLT is signed, '<' */
+#define BPF_JSLE	0xd0	/* SLE is signed, '<=' */
+#define BPF_CALL	0x80	/* function call */
+#define BPF_EXIT	0x90	/* function return */
+
+/* Register numbers */
+enum {
+	BPF_REG_0 = 0,
+	BPF_REG_1,
+	BPF_REG_2,
+	BPF_REG_3,
+	BPF_REG_4,
+	BPF_REG_5,
+	BPF_REG_6,
+	BPF_REG_7,
+	BPF_REG_8,
+	BPF_REG_9,
+	BPF_REG_10,
+	__MAX_BPF_REG,
+};
+
+/* BPF has 10 general purpose 64-bit registers and stack frame. */
+#define MAX_BPF_REG	__MAX_BPF_REG
+
+struct bpf_insn {
+	__u8	code;		/* opcode */
+	__u8	dst_reg:4;	/* dest register */
+	__u8	src_reg:4;	/* source register */
+	__s16	off;		/* signed offset */
+	__s32	imm;		/* signed immediate constant */
+};
+
+/* BPF syscall commands, see bpf(2) man-page for details. */
+enum bpf_cmd {
+	BPF_MAP_CREATE,
+	BPF_MAP_LOOKUP_ELEM,
+	BPF_MAP_UPDATE_ELEM,
+	BPF_MAP_DELETE_ELEM,
+	BPF_MAP_GET_NEXT_KEY,
+	BPF_PROG_LOAD,
+	BPF_OBJ_PIN,
+	BPF_OBJ_GET,
+	BPF_PROG_ATTACH,
+	BPF_PROG_DETACH,
+	BPF_PROG_TEST_RUN,
+	BPF_PROG_GET_NEXT_ID,
+	BPF_MAP_GET_NEXT_ID,
+	BPF_PROG_GET_FD_BY_ID,
+	BPF_MAP_GET_FD_BY_ID,
+	BPF_OBJ_GET_INFO_BY_FD,
+};
+
+enum bpf_map_type {
+	BPF_MAP_TYPE_UNSPEC,
+	BPF_MAP_TYPE_HASH,
+	BPF_MAP_TYPE_ARRAY,
+	BPF_MAP_TYPE_PROG_ARRAY,
+	BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	BPF_MAP_TYPE_PERCPU_HASH,
+	BPF_MAP_TYPE_PERCPU_ARRAY,
+	BPF_MAP_TYPE_STACK_TRACE,
+	BPF_MAP_TYPE_CGROUP_ARRAY,
+	BPF_MAP_TYPE_LRU_HASH,
+	BPF_MAP_TYPE_LRU_PERCPU_HASH,
+	BPF_MAP_TYPE_LPM_TRIE,
+	BPF_MAP_TYPE_ARRAY_OF_MAPS,
+	BPF_MAP_TYPE_HASH_OF_MAPS,
+	BPF_MAP_TYPE_DEVMAP,
+	BPF_MAP_TYPE_SOCKMAP,
+};
+
+enum bpf_prog_type {
+	BPF_PROG_TYPE_UNSPEC,
+	BPF_PROG_TYPE_SOCKET_FILTER,
+	BPF_PROG_TYPE_KPROBE,
+	BPF_PROG_TYPE_SCHED_CLS,
+	BPF_PROG_TYPE_SCHED_ACT,
+	BPF_PROG_TYPE_TRACEPOINT,
+	BPF_PROG_TYPE_XDP,
+	BPF_PROG_TYPE_PERF_EVENT,
+	BPF_PROG_TYPE_CGROUP_SKB,
+	BPF_PROG_TYPE_CGROUP_SOCK,
+	BPF_PROG_TYPE_LWT_IN,
+	BPF_PROG_TYPE_LWT_OUT,
+	BPF_PROG_TYPE_LWT_XMIT,
+	BPF_PROG_TYPE_SOCK_OPS,
+	BPF_PROG_TYPE_SK_SKB,
+};
+
+enum bpf_attach_type {
+	BPF_CGROUP_INET_INGRESS,
+	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_INET_SOCK_CREATE,
+	BPF_CGROUP_SOCK_OPS,
+	BPF_SK_SKB_STREAM_PARSER,
+	BPF_SK_SKB_STREAM_VERDICT,
+	__MAX_BPF_ATTACH_TYPE
+};
+
+#define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
+
+/* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
+ * to the given target_fd cgroup the descendent cgroup will be able to
+ * override effective bpf program that was inherited from this cgroup
+ */
+#define BPF_F_ALLOW_OVERRIDE	(1U << 0)
+
+/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
+ * verifier will perform strict alignment checking as if the kernel
+ * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
+ * and NET_IP_ALIGN defined to 2.
+ */
+#define BPF_F_STRICT_ALIGNMENT	(1U << 0)
+
+#define BPF_PSEUDO_MAP_FD	1
+
+/* flags for BPF_MAP_UPDATE_ELEM command */
+#define BPF_ANY		0 /* create new element or update existing */
+#define BPF_NOEXIST	1 /* create new element if it didn't exist */
+#define BPF_EXIST	2 /* update existing element */
+
+/* flags for BPF_MAP_CREATE command */
+#define BPF_F_NO_PREALLOC	(1U << 0)
+/* Instead of having one common LRU list in the
+ * BPF_MAP_TYPE_LRU_[PERCPU_]HASH map, use a percpu LRU list
+ * which can scale and perform better.
+ * Note, the LRU nodes (including free nodes) cannot be moved
+ * across different LRU lists.
+ */
+#define BPF_F_NO_COMMON_LRU	(1U << 1)
+/* Specify numa node during map creation */
+#define BPF_F_NUMA_NODE		(1U << 2)
+
+union bpf_attr {
+	struct { /* anonymous struct used by BPF_MAP_CREATE command */
+		__u32	map_type;	/* one of enum bpf_map_type */
+		__u32	key_size;	/* size of key in bytes */
+		__u32	value_size;	/* size of value in bytes */
+		__u32	max_entries;	/* max number of entries in a map */
+		__u32	map_flags;	/* BPF_MAP_CREATE related
+					 * flags defined above.
+					 */
+		__u32	inner_map_fd;	/* fd pointing to the inner map */
+		__u32	numa_node;	/* numa node (effective only if
+					 * BPF_F_NUMA_NODE is set).
+					 */
+	};
+
+	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
+		__u32		map_fd;
+		__aligned_u64	key;
+		union {
+			__aligned_u64 value;
+			__aligned_u64 next_key;
+		};
+		__u64		flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_LOAD command */
+		__u32		prog_type;	/* one of enum bpf_prog_type */
+		__u32		insn_cnt;
+		__aligned_u64	insns;
+		__aligned_u64	license;
+		__u32		log_level;
+		/* verbosity level of verifier */
+		__u32		log_size;	/* size of user buffer */
+		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		kern_version;
+		/* checked when prog_type=kprobe */
+		__u32		prog_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_* commands */
+		__aligned_u64	pathname;
+		__u32		bpf_fd;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
+		__u32		target_fd;
+		/* container object to attach to */
+		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		__u32		attach_type;
+		__u32		attach_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
+		__u32		prog_fd;
+		__u32		retval;
+		__u32		data_size_in;
+		__u32		data_size_out;
+		__aligned_u64	data_in;
+		__aligned_u64	data_out;
+		__u32		repeat;
+		__u32		duration;
+	} test;
+
+	struct { /* anonymous struct used by BPF_*_GET_*_ID */
+		union {
+			__u32		start_id;
+			__u32		prog_id;
+			__u32		map_id;
+		};
+		__u32		next_id;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+		__u32		bpf_fd;
+		__u32		info_len;
+		__aligned_u64	info;
+	} info;
+} __attribute__((aligned(8)));
+
+/* Generic BPF return codes which all BPF program types may support.
+ * The values are binary compatible with their TC_ACT_* counter-part to
+ * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
+ * programs.
+ *
+ * XDP is handled separately, see XDP_*.
+ */
+enum bpf_ret_code {
+	BPF_OK = 0,
+	/* 1 reserved */
+	BPF_DROP = 2,
+	/* 3-6 reserved */
+	BPF_REDIRECT = 7,
+	/* >127 are reserved for prog type specific return codes */
+};
+
+enum sk_action {
+	SK_DROP = 0,
+	SK_PASS,
+};
+
+#define BPF_TAG_SIZE	8
+
+struct bpf_prog_info {
+	__u32 type;
+	__u32 id;
+	__u8  tag[BPF_TAG_SIZE];
+	__u32 jited_prog_len;
+	__u32 xlated_prog_len;
+	__aligned_u64 jited_prog_insns;
+	__aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+	__u32 type;
+	__u32 id;
+	__u32 key_size;
+	__u32 value_size;
+	__u32 max_entries;
+	__u32 map_flags;
+} __attribute__((aligned(8)));
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_DEF_H_ */
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 2/7] bpf: add BPF loading and execution framework
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (8 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 3/7] bpf: add more logic into bpf_validate() Konstantin Ananyev
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Features not currently supported:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds a dependency on libelf.
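
For illustration only, here is a minimal, hypothetical usage sketch of the
new API. The ELF file name, section name and external symbol below are
assumptions made for the example, not part of this patch:

#include <stdio.h>
#include <inttypes.h>

#include <rte_common.h>
#include <rte_errno.h>
#include <rte_bpf.h>

/* hypothetical external function that the BPF code is allowed to call */
static uint64_t
example_trace(uint64_t arg1, uint64_t arg2, uint64_t arg3,
        uint64_t arg4, uint64_t arg5)
{
        RTE_SET_USED(arg2); RTE_SET_USED(arg3);
        RTE_SET_USED(arg4); RTE_SET_USED(arg5);
        printf("bpf: %" PRIu64 "\n", arg1);
        return 0;
}

static const struct rte_bpf_xsym example_xsym[] = {
        {
                .name = "example_trace",        /* assumed symbol name */
                .type = RTE_BPF_XTYPE_FUNC,
                .func = example_trace,
        },
};

static int
example_run(void *pkt_data)
{
        uint64_t rc;
        struct rte_bpf *bpf;
        const struct rte_bpf_prm prm = {
                .xsym = example_xsym,
                .nb_xsym = RTE_DIM(example_xsym),
                .prog_type = RTE_BPF_PROG_TYPE_UNSPEC, /* raw data input */
        };

        /* "filter.o" and ".text" are example names only */
        bpf = rte_bpf_elf_load(&prm, "filter.o", ".text");
        if (bpf == NULL)
                return -rte_errno;

        rc = rte_bpf_exec(bpf, pkt_data);
        printf("filter returned %" PRIu64 "\n", rc);

        rte_bpf_destroy(bpf);
        return 0;
}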

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  59 +++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  41 ++++
 lib/librte_bpf/bpf_load.c          | 385 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/meson.build         |  18 ++
 lib/librte_bpf/rte_bpf.h           | 160 +++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 lib/meson.build                    |   2 +-
 mk/rte.app.mk                      |   2 +
 13 files changed, 1222 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ee10b449b..97b60f9ff 100644
--- a/config/common_base
+++ b/config/common_base
@@ -827,3 +827,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..d7f68c017
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+int rte_bpf_logtype;
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+	if (rc != 0)
+		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
+
+RTE_INIT(rte_bpf_init_log);
+
+static void
+rte_bpf_init_log(void)
+{
+	rte_bpf_logtype = rte_log_register("lib.bpf");
+	if (rte_bpf_logtype >= 0)
+		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..0382ade98
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_BPF_LOG(ERR, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..5d7e65c31
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+extern int rte_bpf_logtype;
+
+#define	RTE_BPF_LOG(lvl, fmt, args...) \
+	rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..e1ff5714a
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,385 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+/* Define EM_BPF if the system ELF headers do not provide it yet. */
+#ifndef EM_BPF
+#define	EM_BPF	247
+#endif
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm = (uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_BPF_LOG(ERR,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..1911e1381
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
new file mode 100644
index 000000000..05c48c7ff
--- /dev/null
+++ b/lib/librte_bpf/meson.build
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+sources = files('bpf.c',
+		'bpf_exec.c',
+		'bpf_load.c',
+		'bpf_validate.c')
+
+install_headers = files('rte_bpf.h')
+
+deps += ['mbuf', 'net']
+
+dep = dependency('libelf', required: false)
+if dep.found() == false
+	build = false
+endif
+ext_deps += dep
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..4d4b93599
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_def.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ * Use negative values for DPDK-specific prog-types, to make sure they will
+ * not interfere with Linux-related ones.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF = INT32_MIN,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname for an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about the natively compiled code for the given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index ef6159170..7ff7aaaa5 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 94525dc80..07a9bcfe2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 3/7] bpf: add more logic into bpf_validate()
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (9 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 2/7] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 4/7] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add checks for:
 - all instructions are valid
   (known opcodes, correct syntax, valid reg/off/imm values, etc.)
 - no unreachable instructions
 - no loops
 - basic stack boundary checks
 - division by zero

Checks still to be added:
 - use/return only initialized registers and stack data
 - memory boundary violations
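
As an illustration (a hypothetical sketch, not part of this patch), a program
containing a division by a zero immediate is one example that the extended
bpf_validate() now rejects when called through rte_bpf_load():

#include <rte_common.h>
#include <rte_errno.h>
#include <rte_bpf.h>

/* r0 = 1; r0 /= 0; exit; -- the constant divisor of zero fails validation */
static const struct bpf_insn bad_prog[] = {
        { .code = (BPF_ALU64 | BPF_MOV | BPF_K), .dst_reg = BPF_REG_0, .imm = 1 },
        { .code = (BPF_ALU64 | BPF_DIV | BPF_K), .dst_reg = BPF_REG_0, .imm = 0 },
        { .code = (BPF_JMP | BPF_EXIT), },
};

static int
try_load(void)
{
        const struct rte_bpf_prm prm = {
                .ins = bad_prog,
                .nb_ins = RTE_DIM(bad_prog),
        };
        struct rte_bpf *bpf;

        bpf = rte_bpf_load(&prm);
        if (bpf == NULL)
                return -rte_errno;      /* expected: validation error */

        rte_bpf_destroy(bpf);
        return 0;
}

With the earlier dummy validator such a program would have been accepted at
load time.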

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/bpf_validate.c | 1163 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 1137 insertions(+), 26 deletions(-)

diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
index 1911e1381..816aa519a 100644
--- a/lib/librte_bpf/bpf_validate.c
+++ b/lib/librte_bpf/bpf_validate.c
@@ -14,42 +14,1153 @@
 
 #include "bpf_impl.h"
 
+/* possible instruction node colour */
+enum {
+	WHITE,
+	GREY,
+	BLACK,
+	MAX_NODE_COLOUR
+};
+
+/* possible edge types */
+enum {
+	UNKNOWN_EDGE,
+	TREE_EDGE,
+	BACK_EDGE,
+	CROSS_EDGE,
+	MAX_EDGE_TYPE
+};
+
+struct bpf_reg_state {
+	uint64_t val;
+};
+
+struct bpf_eval_state {
+	struct bpf_reg_state rs[MAX_BPF_REG];
+};
+
+#define	MAX_EDGES	2
+
+struct inst_node {
+	uint8_t colour;
+	uint8_t nb_edge:4;
+	uint8_t cur_edge:4;
+	uint8_t edge_type[MAX_EDGES];
+	uint32_t edge_dest[MAX_EDGES];
+	uint32_t prev_node;
+	struct bpf_eval_state *evst;
+};
+
+struct bpf_verifier {
+	const struct rte_bpf_prm *prm;
+	struct inst_node *in;
+	int32_t stack_sz;
+	uint32_t nb_nodes;
+	uint32_t nb_jcc_nodes;
+	uint32_t node_colour[MAX_NODE_COLOUR];
+	uint32_t edge_type[MAX_EDGE_TYPE];
+	struct bpf_eval_state *evst;
+	struct {
+		uint32_t num;
+		uint32_t cur;
+		struct bpf_eval_state *ent;
+	} evst_pool;
+};
+
+struct bpf_ins_check {
+	struct {
+		uint16_t dreg;
+		uint16_t sreg;
+	} mask;
+	struct {
+		uint16_t min;
+		uint16_t max;
+	} off;
+	struct {
+		uint32_t min;
+		uint32_t max;
+	} imm;
+	const char * (*check)(const struct bpf_insn *);
+	const char * (*eval)(struct bpf_verifier *, const struct bpf_insn *);
+};
+
+#define	ALL_REGS	RTE_LEN2MASK(MAX_BPF_REG, uint16_t)
+#define	WRT_REGS	RTE_LEN2MASK(BPF_REG_10, uint16_t)
+#define	ZERO_REG	RTE_LEN2MASK(BPF_REG_1, uint16_t)
+
 /*
- * dummy one for now, need more work.
+ * check and evaluate functions for particular instruction types.
  */
-int
-bpf_validate(struct rte_bpf *bpf)
+
+static const char *
+check_alu_bele(const struct bpf_insn *ins)
+{
+	if (ins->imm != 16 && ins->imm != 32 && ins->imm != 64)
+		return "invalid imm field";
+	return NULL;
+}
+
+static const char *
+eval_stack(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	int32_t ofs;
+
+	ofs = ins->off;
+
+	if (ofs >= 0 || ofs < -MAX_BPF_STACK_SIZE)
+		return "stack boundary violation";
+
+	ofs = -ofs;
+	bvf->stack_sz = RTE_MAX(bvf->stack_sz, ofs);
+	return NULL;
+}
+
+static const char *
+eval_store(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	if (ins->dst_reg == BPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_load(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	if (ins->src_reg == BPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_call(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	uint32_t idx;
+
+	idx = ins->imm;
+
+	if (idx >= bvf->prm->nb_xsym ||
+			bvf->prm->xsym[idx].type != RTE_BPF_XTYPE_FUNC)
+		return "invalid external function index";
+	return NULL;
+}
+
+/*
+ * validate parameters for each instruction type.
+ */
+static const struct bpf_ins_check ins_chk[UINT8_MAX] = {
+	/* ALU IMM 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU IMM 64-bit instructions */
+	[(BPF_ALU64 | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_ARSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU64 | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU REG 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_END | BPF_TO_BE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	[(BPF_ALU | BPF_END | BPF_TO_LE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	/* ALU REG 64-bit instructions */
+	[(BPF_ALU64 | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* load instructions */
+	[(BPF_LDX | BPF_MEM | BPF_B)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_H)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_W)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_DW)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	/* load 64 bit immediate value */
+	[(BPF_LD | BPF_IMM | BPF_DW)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* store REG instructions */
+	[(BPF_STX | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* atomic add instructions */
+	[(BPF_STX | BPF_XADD | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_XADD | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* store IMM instructions */
+	[(BPF_ST | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	/* jump instruction */
+	[(BPF_JMP | BPF_JA)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* jcc IMM instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JNE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* jcc REG instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JNE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* call instruction */
+	[(BPF_JMP | BPF_CALL)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_call,
+	},
+	/* ret instruction */
+	[(BPF_JMP | BPF_EXIT)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+};
+
+/*
+ * make sure that instruction syntax is valid,
+ * and its fields don't violate the particular instruction type restrictions.
+ */
+static const char *
+check_syntax(const struct bpf_insn *ins)
+{
+
+	uint8_t op;
+	uint16_t off;
+	uint32_t imm;
+
+	op = ins->code;
+
+	if (ins_chk[op].mask.dreg == 0)
+		return "invalid opcode";
+
+	if ((ins_chk[op].mask.dreg & 1 << ins->dst_reg) == 0)
+		return "invalid dst-reg field";
+
+	if ((ins_chk[op].mask.sreg & 1 << ins->src_reg) == 0)
+		return "invalid src-reg field";
+
+	off = ins->off;
+	if (ins_chk[op].off.min > off || ins_chk[op].off.max < off)
+		return "invalid off field";
+
+	imm = ins->imm;
+	if (ins_chk[op].imm.min > imm || ins_chk[op].imm.max < imm)
+		return "invalid imm field";
+
+	if (ins_chk[op].check != NULL)
+		return ins_chk[op].check(ins);
+
+	return NULL;
+}
+
+/*
+ * helper function, return instruction index for the given node.
+ */
+static uint32_t
+get_node_idx(const struct bpf_verifier *bvf, const struct inst_node *node)
 {
-	int32_t rc, ofs, stack_sz;
-	uint32_t i, op, dr;
+	return node - bvf->in;
+}
+
+/*
+ * helper function, used to walk through constructed CFG.
+ */
+static struct inst_node *
+get_next_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	uint32_t ce, ne, dst;
+
+	ne = node->nb_edge;
+	ce = node->cur_edge;
+	if (ce == ne)
+		return NULL;
+
+	node->cur_edge++;
+	dst = node->edge_dest[ce];
+	return bvf->in + dst;
+}
+
+static void
+set_node_colour(struct bpf_verifier *bvf, struct inst_node *node,
+	uint32_t new)
+{
+	uint32_t prev;
+
+	prev = node->colour;
+	node->colour = new;
+
+	bvf->node_colour[prev]--;
+	bvf->node_colour[new]++;
+}
+
+/*
+ * helper function, add new edge between two nodes.
+ */
+static int
+add_edge(struct bpf_verifier *bvf, struct inst_node *node, uint32_t nidx)
+{
+	uint32_t ne;
+
+	if (nidx > bvf->prm->nb_ins) {
+		RTE_BPF_LOG(ERR, "%s: program boundary violation at pc: %u, "
+			"next pc: %u\n",
+			__func__, get_node_idx(bvf, node), nidx);
+		return -EINVAL;
+	}
+
+	ne = node->nb_edge;
+	if (ne >= RTE_DIM(node->edge_dest)) {
+		RTE_BPF_LOG(ERR, "%s: internal error at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -EINVAL;
+	}
+
+	node->edge_dest[ne] = nidx;
+	node->nb_edge = ne + 1;
+	return 0;
+}
+
+/*
+ * helper function, determine type of edge between two nodes.
+ */
+static void
+set_edge_type(struct bpf_verifier *bvf, struct inst_node *node,
+	const struct inst_node *next)
+{
+	uint32_t ce, clr, type;
+
+	ce = node->cur_edge - 1;
+	clr = next->colour;
+
+	type = UNKNOWN_EDGE;
+
+	if (clr == WHITE)
+		type = TREE_EDGE;
+	else if (clr == GREY)
+		type = BACK_EDGE;
+	else if (clr == BLACK)
+		/*
+		 * in fact it could be either a forward or a cross edge,
+		 * but for now we don't need to distinguish between them.
+		 */
+		type = CROSS_EDGE;
+
+	node->edge_type[ce] = type;
+	bvf->edge_type[type]++;
+}
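+
+/*
+ * Illustrative example (explanatory only, not used by the code):
+ * for the three-instruction program
+ *   0: jcc +1    ; edges to pc 2 and pc 1
+ *   1: ja  -2    ; edge back to pc 0
+ *   2: exit
+ * the DFS below paints pc 0 GREY, reaches pc 2 and pc 1 over TREE edges
+ * (both are WHITE when first seen), and classifies the edge 1 -> 0 as a
+ * BACK_EDGE because pc 0 is still GREY (on the current DFS path) -
+ * this is how loops are detected and later rejected.
+ */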
+
+static struct inst_node *
+get_prev_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	return  bvf->in + node->prev_node;
+}
+
+/*
+ * Depth-First Search (DFS) through previously constructed
+ * Control Flow Graph (CFG).
+ * Information collected during this pass is used later
+ * to determine whether there are any loops and/or unreachable instructions.
+ */
+static void
+dfs(struct bpf_verifier *bvf)
+{
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	while (node != NULL) {
+
+		if (node->colour == WHITE)
+			set_node_colour(bvf, node, GREY);
+
+		if (node->colour == GREY) {
+
+			/* find next unprocessed child node */
+			do {
+				next = get_next_node(bvf, node);
+				if (next == NULL)
+					break;
+				set_edge_type(bvf, node, next);
+			} while (next->colour != WHITE);
+
+			if (next != NULL) {
+				/* proceed with next child */
+				next->prev_node = get_node_idx(bvf, node);
+				node = next;
+			} else {
+				/*
+				 * finished with the current node and all its kids,
+				 * proceed with parent
+				 */
+				set_node_colour(bvf, node, BLACK);
+				node->cur_edge = 0;
+				node = get_prev_node(bvf, node);
+			}
+		} else
+			node = NULL;
+	}
+}
+
+/*
+ * report unreachable instructions.
+ */
+static void
+log_unreachable(const struct bpf_verifier *bvf)
+{
+	uint32_t i;
+	struct inst_node *node;
 	const struct bpf_insn *ins;
 
-	rc = 0;
-	stack_sz = 0;
-	for (i = 0; i != bpf->prm.nb_ins; i++) {
-
-		ins = bpf->prm.ins + i;
-		op = ins->code;
-		dr = ins->dst_reg;
-		ofs = ins->off;
-
-		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
-				dr == BPF_REG_10) {
-			ofs -= sizeof(uint64_t);
-			stack_sz = RTE_MIN(ofs, stack_sz);
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		ins = bvf->prm->ins + i;
+
+		if (node->colour == WHITE &&
+				ins->code != (BPF_LD | BPF_IMM | BPF_DW))
+			RTE_BPF_LOG(ERR, "unreachable code at pc: %u;\n", i);
+	}
+}
+
+/*
+ * report loops detected.
+ */
+static void
+log_loop(const struct bpf_verifier *bvf)
+{
+	uint32_t i, j;
+	struct inst_node *node;
+
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		if (node->colour != BLACK)
+			continue;
+
+		for (j = 0; j != node->nb_edge; j++) {
+			if (node->edge_type[j] == BACK_EDGE)
+				RTE_BPF_LOG(ERR,
+					"loop at pc:%u --> pc:%u;\n",
+					i, node->edge_dest[j]);
 		}
 	}
+}
+
+/*
+ * First pass goes through all instructions in the set, checks that each
+ * instruction is a valid one (correct syntax, valid field values, etc.)
+ * and constructs a control flow graph (CFG).
+ * Then a depth-first search is performed over the constructed graph.
+ * Programs with unreachable instructions and/or loops will be rejected.
+ */
+static int
+validate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t i;
+	struct inst_node *node;
+	const struct bpf_insn *ins;
+	const char *err;
+
+	rc = 0;
+	for (i = 0; i < bvf->prm->nb_ins; i++) {
+
+		ins = bvf->prm->ins + i;
+		node = bvf->in + i;
 
-	if (stack_sz != 0) {
-		stack_sz = -stack_sz;
-		if (stack_sz > MAX_BPF_STACK_SIZE)
-			rc = -ERANGE;
-		else
-			bpf->stack_sz = stack_sz;
+		err = check_syntax(ins);
+		if (err != 0) {
+			RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+				__func__, err, i);
+			rc |= -EINVAL;
+		}
+
+		/*
+		 * construct CFG: jcc nodes have two outgoing edges,
+		 * 'exit' nodes have none, all other nodes have exactly one
+		 * outgoing edge.
+		 */
+		switch (ins->code) {
+		case (BPF_JMP | BPF_EXIT):
+			break;
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			rc |= add_edge(bvf, node, i + 1);
+			bvf->nb_jcc_nodes++;
+			break;
+		case (BPF_JMP | BPF_JA):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			rc |= add_edge(bvf, node, i + 2);
+			i++;
+			break;
+		default:
+			rc |= add_edge(bvf, node, i + 1);
+			break;
+		}
+
+		bvf->nb_nodes++;
+		bvf->node_colour[WHITE]++;
 	}
 
 	if (rc != 0)
-		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
-			__func__, bpf, rc);
+		return rc;
+
+	dfs(bvf);
+
+	RTE_BPF_LOG(INFO, "%s(%p) stats:\n"
+		"nb_nodes=%u;\n"
+		"nb_jcc_nodes=%u;\n"
+		"node_color={[WHITE]=%u, [GREY]=%u, [BLACK]=%u};\n"
+		"edge_type={[UNKNOWN]=%u, [TREE]=%u, [BACK]=%u, [CROSS]=%u};\n",
+		__func__, bvf,
+		bvf->nb_nodes,
+		bvf->nb_jcc_nodes,
+		bvf->node_colour[WHITE], bvf->node_colour[GREY],
+			bvf->node_colour[BLACK],
+		bvf->edge_type[UNKNOWN_EDGE], bvf->edge_type[TREE_EDGE],
+		bvf->edge_type[BACK_EDGE], bvf->edge_type[CROSS_EDGE]);
+
+	if (bvf->node_colour[BLACK] != bvf->nb_nodes) {
+		RTE_BPF_LOG(ERR, "%s(%p) unreachable instructions;\n",
+			__func__, bvf);
+		log_unreachable(bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->node_colour[GREY] != 0 || bvf->node_colour[WHITE] != 0 ||
+			bvf->edge_type[UNKNOWN_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) DFS internal error;\n",
+			__func__, bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->edge_type[BACK_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) loops detected;\n",
+			__func__, bvf);
+		log_loop(bvf);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper functions to get/free eval states.
+ */
+static struct bpf_eval_state *
+pull_eval_state(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->evst_pool.cur;
+	if (n == bvf->evst_pool.num)
+		return NULL;
+
+	bvf->evst_pool.cur = n + 1;
+	return bvf->evst_pool.ent + n;
+}
+
+static void
+push_eval_state(struct bpf_verifier *bvf)
+{
+	bvf->evst_pool.cur--;
+}
+
+static void
+evst_pool_fini(struct bpf_verifier *bvf)
+{
+	bvf->evst = NULL;
+	free(bvf->evst_pool.ent);
+	memset(&bvf->evst_pool, 0, sizeof(bvf->evst_pool));
+}
+
+static int
+evst_pool_init(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->nb_jcc_nodes + 1;
+
+	bvf->evst_pool.ent = calloc(n, sizeof(bvf->evst_pool.ent[0]));
+	if (bvf->evst_pool.ent == NULL)
+		return -ENOMEM;
+
+	bvf->evst_pool.num = n;
+	bvf->evst_pool.cur = 0;
+
+	bvf->evst = pull_eval_state(bvf);
+	return 0;
+}
+
+/*
+ * Save current eval state.
+ */
+static int
+save_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	struct bpf_eval_state *st;
+
+	/* get new eval_state for this node */
+	st = pull_eval_state(bvf);
+	if (st == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s: internal error (out of space) at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -ENOMEM;
+	}
+
+	/* make a copy of current state */
+	memcpy(st, bvf->evst, sizeof(*st));
+
+	/* swap current state with new one */
+	node->evst = bvf->evst;
+	bvf->evst = st;
+
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), node->evst, bvf->evst);
+
+	return 0;
+}
+
+/*
+ * Restore previous eval state and mark current eval state as free.
+ */
+static void
+restore_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), bvf->evst, node->evst);
+
+	bvf->evst = node->evst;
+	node->evst = NULL;
+	push_eval_state(bvf);
+}
+
+/*
+ * Do second pass through CFG and try to evaluate instructions
+ * via each possible path.
+ * Right now evaluation functionality is quite limited.
+ * Still need to add extra checks for:
+ * - use/return of uninitialized registers.
+ * - use of uninitialized data from the stack.
+ * - memory boundary violations.
+ */
+static int
+evaluate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t idx, op;
+	const char *err;
+	const struct bpf_insn *ins;
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	ins = bvf->prm->ins;
+	rc = 0;
+
+	while (node != NULL && rc == 0) {
+
+		/* current node evaluation */
+		idx = get_node_idx(bvf, node);
+		op = ins[idx].code;
+
+		if (ins_chk[op].eval != NULL) {
+			err = ins_chk[op].eval(bvf, ins + idx);
+			if (err != NULL) {
+				RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+					__func__, err, idx);
+				rc = -EINVAL;
+			}
+		}
+
+		/* proceed through CFG */
+		next = get_next_node(bvf, node);
+		if (next != NULL) {
+
+			/* proceed with next child */
+			if (node->cur_edge != node->nb_edge)
+				rc |= save_eval_state(bvf, node);
+			else if (node->evst != NULL)
+				restore_eval_state(bvf, node);
+
+			next->prev_node = get_node_idx(bvf, node);
+			node = next;
+		} else {
+			/*
+			 * finished with the current node and all its kids,
+			 * proceed with parent
+			 */
+			node->cur_edge = 0;
+			node = get_prev_node(bvf, node);
+
+			/* finished */
+			if (node == bvf->in)
+				node = NULL;
+		}
+	}
+
+	return rc;
+}
+
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	struct bpf_verifier bvf;
+
+	memset(&bvf, 0, sizeof(bvf));
+	bvf.prm = &bpf->prm;
+	bvf.in = calloc(bpf->prm.nb_ins, sizeof(bvf.in[0]));
+	if (bvf.in == NULL)
+		return -ENOMEM;
+
+	rc = validate(&bvf);
+
+	if (rc == 0) {
+		rc = evst_pool_init(&bvf);
+		if (rc == 0)
+			rc = evaluate(&bvf);
+		evst_pool_fini(&bvf);
+	}
+
+	free(bvf.in);
+
+	/* copy collected info */
+	if (rc == 0)
+		bpf->stack_sz = bvf.stack_sz;
+
 	return rc;
 }
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 4/7] bpf: add JIT compilation for x86_64 ISA
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (10 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 3/7] bpf: add more logic into bpf_validate() Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile      |    3 +
 lib/librte_bpf/bpf.c         |    5 +
 lib/librte_bpf/bpf_jit_x86.c | 1329 ++++++++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build   |    4 +
 4 files changed, 1341 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index e0f434e77..44b12c439 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -23,6 +23,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
index d7f68c017..dc6d10991 100644
--- a/lib/librte_bpf/bpf.c
+++ b/lib/librte_bpf/bpf.c
@@ -41,7 +41,12 @@ bpf_jit(struct rte_bpf *bpf)
 {
 	int32_t rc;
 
+#ifdef RTE_ARCH_X86_64
+	rc = bpf_jit_x86(bpf);
+#else
 	rc = -ENOTSUP;
+#endif
+
 	if (rc != 0)
 		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
 			__func__, bpf, rc);
diff --git a/lib/librte_bpf/bpf_jit_x86.c b/lib/librte_bpf/bpf_jit_x86.c
new file mode 100644
index 000000000..0face0d8e
--- /dev/null
+++ b/lib/librte_bpf/bpf_jit_x86.c
@@ -0,0 +1,1329 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
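+/* use bits 4-7 of the eBPF opcode (BPF_OP) as a compact 0..15 table index */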
+#define GET_BPF_OP(op)	(BPF_OP(op) >> 4)
+
+enum {
+	RAX = 0,  /* scratch, return value */
+	RCX = 1,  /* scratch, 4th arg */
+	RDX = 2,  /* scratch, 3rd arg */
+	RBX = 3,  /* callee saved */
+	RSP = 4,  /* stack pointer */
+	RBP = 5,  /* frame pointer, callee saved */
+	RSI = 6,  /* scratch, 2nd arg */
+	RDI = 7,  /* scratch, 1st arg */
+	R8  = 8,  /* scratch, 5th arg */
+	R9  = 9,  /* scratch, 6th arg */
+	R10 = 10, /* scratch */
+	R11 = 11, /* scratch */
+	R12 = 12, /* callee saved */
+	R13 = 13, /* callee saved */
+	R14 = 14, /* callee saved */
+	R15 = 15, /* callee saved */
+};
+
+#define IS_EXT_REG(r)	((r) >= R8)
+
+enum {
+	REX_PREFIX = 0x40, /* fixed value 0100 */
+	REX_W = 0x8,       /* 64bit operand size */
+	REX_R = 0x4,       /* extension of the ModRM.reg field */
+	REX_X = 0x2,       /* extension of the SIB.index field */
+	REX_B = 0x1,       /* extension of the ModRM.rm field */
+};
+
+enum {
+	MOD_INDIRECT = 0,
+	MOD_IDISP8 = 1,
+	MOD_IDISP32 = 2,
+	MOD_DIRECT = 3,
+};
+
+enum {
+	SIB_SCALE_1 = 0,
+	SIB_SCALE_2 = 1,
+	SIB_SCALE_4 = 2,
+	SIB_SCALE_8 = 3,
+};
+
+/*
+ * eBPF to x86_64 register mappings.
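+ * Note: the choice mirrors the SysV AMD64 calling convention - eBPF argument
+ * registers R1-R5 land directly in RDI/RSI/RDX/RCX/R8, so external helpers
+ * can be called without reshuffling arguments, while the eBPF callee-saved
+ * registers R6-R9 map onto callee-saved x86_64 registers.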
+ */
+static const uint32_t ebpf2x86[] = {
+	[BPF_REG_0] = RAX,
+	[BPF_REG_1] = RDI,
+	[BPF_REG_2] = RSI,
+	[BPF_REG_3] = RDX,
+	[BPF_REG_4] = RCX,
+	[BPF_REG_5] = R8,
+	[BPF_REG_6] = RBX,
+	[BPF_REG_7] = R13,
+	[BPF_REG_8] = R14,
+	[BPF_REG_9] = R15,
+	[BPF_REG_10] = RBP,
+};
+
+/*
+ * r10 and r11 are used as scratch temporary registers.
+ */
+enum {
+	REG_DIV_IMM = R9,
+	REG_TMP0 = R11,
+	REG_TMP1 = R10,
+};
+
+/*
+ * callee saved registers list.
+ * keep RBP as the last one.
+ */
+static const uint32_t save_regs[] = {RBX, R12, R13, R14, R15, RBP};
+
+struct bpf_jit_state {
+	uint32_t idx;
+	size_t sz;
+	struct {
+		uint32_t num;
+		int32_t off;
+	} exit;
+	uint32_t reguse;
+	int32_t *off;
+	uint8_t *ins;
+};
+
+#define	INUSE(v, r)	(((v) >> (r)) & 1)
+#define	USED(v, r)	((v) |= 1 << (r))
+
+union bpf_jit_imm {
+	uint32_t u32;
+	uint8_t u8[4];
+};
+
+static size_t
+bpf_size(uint32_t bpf_op_sz)
+{
+	if (bpf_op_sz == BPF_B)
+		return sizeof(uint8_t);
+	else if (bpf_op_sz == BPF_H)
+		return sizeof(uint16_t);
+	else if (bpf_op_sz == BPF_W)
+		return sizeof(uint32_t);
+	else if (bpf_op_sz == BPF_DW)
+		return sizeof(uint64_t);
+	return 0;
+}
+
+/*
+ * In many cases for imm8 we can produce shorter code.
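+ * For instance, values in the signed 8-bit range (-128..127) can use the
+ * short imm8 encodings, anything else needs a full 32-bit immediate.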
+ */
+static size_t
+imm_size(int32_t v)
+{
+	if (v == (int8_t)v)
+		return sizeof(int8_t);
+	return sizeof(int32_t);
+}
+
+static void
+emit_bytes(struct bpf_jit_state *st, const uint8_t ins[], uint32_t sz)
+{
+	uint32_t i;
+
+	if (st->ins != NULL) {
+		for (i = 0; i != sz; i++)
+			st->ins[st->sz + i] = ins[i];
+	}
+	st->sz += sz;
+}
+
+static void
+emit_imm(struct bpf_jit_state *st, const uint32_t imm, uint32_t sz)
+{
+	union bpf_jit_imm v;
+
+	v.u32 = imm;
+	emit_bytes(st, v.u8, sz);
+}
+
+/*
+ * emit REX byte
+ */
+static void
+emit_rex(struct bpf_jit_state *st, uint32_t op, uint32_t reg, uint32_t rm)
+{
+	uint8_t rex;
+
+	/* mark operand registers as used */
+	USED(st->reguse, reg);
+	USED(st->reguse, rm);
+
+	rex = 0;
+	if (BPF_CLASS(op) == BPF_ALU64 ||
+			op == (BPF_ST | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_XADD | BPF_DW) ||
+			op == (BPF_LD | BPF_IMM | BPF_DW) ||
+			(BPF_CLASS(op) == BPF_LDX &&
+			BPF_MODE(op) == BPF_MEM &&
+			BPF_SIZE(op) != BPF_W))
+		rex |= REX_W;
+
+	if (IS_EXT_REG(reg))
+		rex |= REX_R;
+
+	if (IS_EXT_REG(rm))
+		rex |= REX_B;
+
+	/* store using SIL, DIL */
+	if (op == (BPF_STX | BPF_MEM | BPF_B) && (reg == RDI || reg == RSI))
+		rex |= REX_PREFIX;
+
+	if (rex != 0) {
+		rex |= REX_PREFIX;
+		emit_bytes(st, &rex, sizeof(rex));
+	}
+}
+
+/*
+ * emit MODRegRM byte
+ */
+static void
+emit_modregrm(struct bpf_jit_state *st, uint32_t mod, uint32_t reg, uint32_t rm)
+{
+	uint8_t v;
+
+	v = mod << 6 | (reg & 7) << 3 | (rm & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit SIB byte
+ */
+static void
+emit_sib(struct bpf_jit_state *st, uint32_t scale, uint32_t idx, uint32_t base)
+{
+	uint8_t v;
+
+	v = scale << 6 | (idx & 7) << 3 | (base & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit xchg %<sreg>, %<dreg>
+ */
+static void
+emit_xchg_reg(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	const uint8_t ops = 0x87;
+
+	emit_rex(st, BPF_ALU64, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit neg %<dreg>
+ */
+static void
+emit_neg(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 3;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+/*
+ * emit mov %<sreg>, %<dreg>
+ */
+static void
+emit_mov_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x89;
+
+	/* 32-bit mov also clears upper bits, so emit even if sreg == dreg */
+	if (sreg != dreg || BPF_CLASS(op) == BPF_ALU) {
+		emit_rex(st, op, sreg, dreg);
+		emit_bytes(st, &ops, sizeof(ops));
+		emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+	}
+}
+
+/*
+ * emit movzwl %<sreg>, %<dreg>
+ */
+static void
+emit_movzwl(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	static const uint8_t ops[] = {0x0F, 0xB7};
+
+	emit_rex(st, BPF_ALU, sreg, dreg);
+	emit_bytes(st, ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit ror <imm8>, %<dreg>
+ */
+static void
+emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t prfx = 0x66;
+	const uint8_t ops = 0xC1;
+	const uint8_t mods = 1;
+
+	emit_bytes(st, &prfx, sizeof(prfx));
+	emit_rex(st, BPF_ALU, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit bswap %<dreg>
+ */
+static void
+emit_be2le_48(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	uint32_t rop;
+
+	const uint8_t ops = 0x0F;
+	const uint8_t mods = 1;
+
+	rop = (imm == 64) ? BPF_ALU64 : BPF_ALU;
+	emit_rex(st, rop, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+static void
+emit_be2le(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16) {
+		emit_ror_imm(st, dreg, 8);
+		emit_movzwl(st, dreg, dreg);
+	} else
+		emit_be2le_48(st, dreg, imm);
+}
+
+/*
+ * On little-endian x86 this is generally a NOP;
+ * just clear the upper bits where needed.
+ */
+static void
+emit_le2be(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16)
+		emit_movzwl(st, dreg, dreg);
+	else if (imm == 32)
+		emit_mov_reg(st, BPF_ALU | BPF_MOV | BPF_X, dreg, dreg);
+}
+
+/*
+ * emit one of:
+ *   add <imm>, %<dreg>
+ *   and <imm>, %<dreg>
+ *   or  <imm>, %<dreg>
+ *   sub <imm>, %<dreg>
+ *   xor <imm>, %<dreg>
+ */
+static void
+emit_alu_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t mod, opcode;
+	uint32_t bop, imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0,
+		[GET_BPF_OP(BPF_AND)] = 4,
+		[GET_BPF_OP(BPF_OR)] =  1,
+		[GET_BPF_OP(BPF_SUB)] = 5,
+		[GET_BPF_OP(BPF_XOR)] = 6,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+
+	imsz = imm_size(imm);
+	opcode = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &opcode, sizeof(opcode));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit one of:
+ *   add %<sreg>, %<dreg>
+ *   and %<sreg>, %<dreg>
+ *   or  %<sreg>, %<dreg>
+ *   sub %<sreg>, %<dreg>
+ *   xor %<sreg>, %<dreg>
+ */
+static void
+emit_alu_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0x01,
+		[GET_BPF_OP(BPF_AND)] = 0x21,
+		[GET_BPF_OP(BPF_OR)] =  0x09,
+		[GET_BPF_OP(BPF_SUB)] = 0x29,
+		[GET_BPF_OP(BPF_XOR)] = 0x31,
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_shift(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	uint8_t mod;
+	uint32_t bop, opx;
+
+	static const uint8_t ops[] = {0xC1, 0xD3};
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_LSH)] = 4,
+		[GET_BPF_OP(BPF_RSH)] = 5,
+		[GET_BPF_OP(BPF_ARSH)] = 7,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+	opx = (BPF_SRC(op) == BPF_X);
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+}
+
+/*
+ * emit one of:
+ *   shl <imm>, %<dreg>
+ *   shr <imm>, %<dreg>
+ *   sar <imm>, %<dreg>
+ */
+static void
+emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm)
+{
+	emit_shift(st, op, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit one of:
+ *   shl %<dreg>
+ *   shr %<dreg>
+ *   sar %<dreg>
+ * note that rcx is implicitly used as the shift-count register, so a few
+ * extra instructions for register spillage might be necessary.
+ */
+static void
+emit_shift_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+
+	emit_shift(st, op, (dreg == RCX) ? sreg : dreg);
+
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+}
+
+/*
+ * emit mov <imm>, %<dreg>
+ */
+static void
+emit_mov_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xC7;
+
+	if (imm == 0) {
+		/* replace 'mov 0, %<dst>' with 'xor %<dst>, %<dst>' */
+		op = BPF_CLASS(op) | BPF_XOR | BPF_X;
+		emit_alu_reg(st, op, dreg, dreg);
+		return;
+	}
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+	emit_imm(st, imm, sizeof(imm));
+}
+
+/*
+ * emit mov <imm64>, %<dreg>
+ */
+static void
+emit_ld_imm64(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm0,
+	uint32_t imm1)
+{
+	const uint8_t ops = 0xB8;
+
+	if (imm1 == 0) {
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, dreg, imm0);
+		return;
+	}
+
+	emit_rex(st, BPF_ALU64, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+
+	emit_imm(st, imm0, sizeof(imm0));
+	emit_imm(st, imm1, sizeof(imm1));
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * either:
+ *   mov %<sreg>, %rdx
+ * OR
+ *   mov <imm>, %rdx
+ * mul %rdx
+ * mov %r10, %rdx
+ * mov %rax, %<dreg>
+ * mov %r11, %rax
+ */
+static void
+emit_mul(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 4;
+
+	/* save rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* rax = dreg */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+
+	if (BPF_SRC(op) == BPF_X)
+		/* rdx = sreg */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X,
+			sreg == RAX ? REG_TMP0 : sreg, RDX);
+	else
+		/* rdx = imm */
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, imm);
+
+	emit_rex(st, op, RAX, RDX);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RDX);
+
+	if (dreg != RDX)
+		/* restore rdx */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+
+	if (dreg != RAX) {
+		/* dreg = rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+		/* restore rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	}
+}
+
+/*
+ * emit mov <ofs>(%<sreg>), %<dreg>
+ * note that for non 64-bit ops, higher bits have to be cleared.
+ */
+static void
+emit_ld_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	uint32_t mods, opsz;
+	const uint8_t op32 = 0x8B;
+	const uint8_t op16[] = {0x0F, 0xB7};
+	const uint8_t op8[] = {0x0F, 0xB6};
+
+	emit_rex(st, op, dreg, sreg);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_B)
+		emit_bytes(st, op8, sizeof(op8));
+	else if (opsz == BPF_H)
+		emit_bytes(st, op16, sizeof(op16));
+	else
+		emit_bytes(st, &op32, sizeof(op32));
+
+	mods = (imm_size(ofs) == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, dreg, sreg);
+	if (sreg == RSP || sreg == R12)
+		emit_sib(st, SIB_SCALE_1, sreg, sreg);
+	emit_imm(st, ofs, imm_size(ofs));
+}
+
+/*
+ * emit one of:
+ *   mov %<sreg>, <ofs>(%<dreg>)
+ *   mov <imm>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_common(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, uint32_t imm, int32_t ofs)
+{
+	uint32_t mods, imsz, opsz, opx;
+	const uint8_t prfx16 = 0x66;
+
+	/* 8 bit instruction opcodes */
+	static const uint8_t op8[] = {0xC6, 0x88};
+
+	/* 16/32/64 bit instruction opcodes */
+	static const uint8_t ops[] = {0xC7, 0x89};
+
+	/* does the instruction use an immediate value or a src reg? */
+	opx = (BPF_CLASS(op) == BPF_STX);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_H)
+		emit_bytes(st, &prfx16, sizeof(prfx16));
+
+	emit_rex(st, op, sreg, dreg);
+
+	if (opsz == BPF_B)
+		emit_bytes(st, &op8[opx], sizeof(op8[opx]));
+	else
+		emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, sreg, dreg);
+
+	if (dreg == RSP || dreg == R12)
+		emit_sib(st, SIB_SCALE_1, dreg, dreg);
+
+	emit_imm(st, ofs, imsz);
+
+	if (opx == 0)
+		emit_imm(st, imm, bpf_size(opsz));
+}
+
+static void
+emit_st_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm,
+	int32_t ofs)
+{
+	emit_st_common(st, op, 0, dreg, imm, ofs);
+}
+
+static void
+emit_st_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	emit_st_common(st, op, sreg, dreg, 0, ofs);
+}
+
+/*
+ * emit lock add %<sreg>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_xadd(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	uint32_t imsz, mods;
+
+	const uint8_t lck = 0xF0; /* lock prefix */
+	const uint8_t ops = 0x01; /* add opcode */
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_bytes(st, &lck, sizeof(lck));
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, mods, sreg, dreg);
+	emit_imm(st, ofs, imsz);
+}
+
+/*
+ * emit:
+ *    mov <imm64>, %rax
+ *    call *%rax
+ */
+static void
+emit_call(struct bpf_jit_state *st, uintptr_t trg)
+{
+	const uint8_t ops = 0xFF;
+	const uint8_t mods = 2;
+
+	emit_ld_imm64(st, RAX, trg, trg >> 32);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RAX);
+}
+
+/*
+ * emit jmp <ofs>
+ */
+static void
+emit_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	int32_t joff;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0xEB;
+	const uint8_t op32 = 0xE9;
+
+	const int32_t sz8 = sizeof(op8) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32) + sizeof(uint32_t);
+
+	/* max possible jmp instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = st->off[st->idx + ofs] - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8, sizeof(op8));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, &op32, sizeof(op32));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit one of:
+ *    cmovz %<sreg>, <%dreg>
+ *    cmovne %<sreg>, <%dreg>
+ *    cmova %<sreg>, <%dreg>
+ *    cmovb %<sreg>, <%dreg>
+ *    cmovae %<sreg>, <%dreg>
+ *    cmovbe %<sreg>, <%dreg>
+ *    cmovg %<sreg>, <%dreg>
+ *    cmovl %<sreg>, <%dreg>
+ *    cmovge %<sreg>, <%dreg>
+ *    cmovle %<sreg>, <%dreg>
+ */
+static void
+emit_movcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x44},  /* CMOVZ */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x45},  /* CMOVNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x47},  /* CMOVA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x42},  /* CMOVB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x43},  /* CMOVAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x46},  /* CMOVBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x4F}, /* CMOVG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x4C}, /* CMOVL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x4D}, /* CMOVGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x4E}, /* CMOVLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x45}, /* CMOVNE */
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ */
+static void
+emit_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	uint32_t bop, imsz;
+	int32_t joff;
+
+	static const uint8_t op8[] = {
+		[GET_BPF_OP(BPF_JEQ)] = 0x74,  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = 0x75,  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = 0x77,  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = 0x72,  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = 0x73,  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = 0x76,  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = 0x7F, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = 0x7C, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = 0x7D, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = 0x7E, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = 0x75, /* JNE */
+	};
+
+	static const uint8_t op32[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x84},  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x85},  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x87},  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x82},  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x83},  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x86},  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x8F}, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x8C}, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x8D}, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x8E}, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x85}, /* JNE */
+	};
+
+	const int32_t sz8 = sizeof(op8[0]) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32[0]) + sizeof(uint32_t);
+
+	/* max possible jcc instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = st->off[st->idx + ofs] - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	bop = GET_BPF_OP(op);
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8[bop], sizeof(op8[bop]));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, op32[bop], sizeof(op32[bop]));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit cmp <imm>, %<dreg>
+ */
+static void
+emit_cmp_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t ops;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	const uint8_t mods = 7;
+
+	imsz = imm_size(imm);
+	ops = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit test <imm>, %<dreg>
+ */
+static void
+emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 0;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+static void
+emit_jcc_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_imm(st, BPF_ALU64, dreg, imm);
+	else
+		emit_cmp_imm(st, BPF_ALU64, dreg, imm);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * emit test %<sreg>, %<dreg>
+ */
+static void
+emit_tst_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x85;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit cmp %<sreg>, %<dreg>
+ */
+static void
+emit_cmp_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x39;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+
+}
+
+static void
+emit_jcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_reg(st, BPF_ALU64, sreg, dreg);
+	else
+		emit_cmp_reg(st, BPF_ALU64, sreg, dreg);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * xor %rdx, %rdx
+ * for divisor as immediate value:
+ *   mov <imm>, %r9
+ * div %<divisor_reg>
+ * either (for BPF_DIV):
+ *   mov %rax, %<dreg>
+ * OR (for BPF_MOD):
+ *   mov %rdx, %<dreg>
+ * mov %r11, %rax
+ * mov %r10, %rdx
+ */
+static void
+emit_div(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	uint32_t sr;
+
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 6;
+
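+	/*
+	 * guard against division by zero when the divisor is a register:
+	 * if sreg == 0, force a zero return value and jump to the epilog.
+	 */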
+	if (BPF_SRC(op) == BPF_X) {
+		emit_tst_reg(st, BPF_CLASS(op), sreg, sreg);
+		emit_movcc_reg(st, BPF_CLASS(op) | BPF_JEQ | BPF_X, sreg, RAX);
+		emit_jcc(st, BPF_JMP | BPF_JEQ | BPF_K, st->exit.off);
+	}
+
+	/* save rax & rdx */
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* fill rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+	emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, 0);
+
+	if (BPF_SRC(op) == BPF_X) {
+		sr = sreg;
+		if (sr == RAX)
+			sr = REG_TMP0;
+		else if (sr == RDX)
+			sr = REG_TMP1;
+	} else {
+		sr = REG_DIV_IMM;
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, sr, imm);
+	}
+
+	emit_rex(st, op, 0, sr);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, sr);
+
+	if (BPF_OP(op) == BPF_DIV)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+	else
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, dreg);
+
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+}
+
+static void
+emit_prolog(struct bpf_jit_state *st, int32_t stack_size)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	/* we can avoid touching the stack at all */
+	if (spil == 0)
+		return;
+
+
+	emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP,
+		spil * sizeof(uint64_t));
+
+	ofs = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++) {
+		if (INUSE(st->reguse, save_regs[i]) != 0) {
+			emit_st_reg(st, BPF_STX | BPF_MEM | BPF_DW,
+				save_regs[i], RSP, ofs);
+			ofs += sizeof(uint64_t);
+		}
+	}
+
+	if (INUSE(st->reguse, RBP) != 0) {
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RSP, RBP);
+		emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP, stack_size);
+	}
+}
+
+/*
+ * emit ret
+ */
+static void
+emit_ret(struct bpf_jit_state *st)
+{
+	const uint8_t ops = 0xC3;
+
+	emit_bytes(st, &ops, sizeof(ops));
+}
+
+static void
+emit_epilog(struct bpf_jit_state *st)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	/* if we already have an epilog, generate a jump to it */
+	if (st->exit.num++ != 0) {
+		emit_jcc(st, BPF_JMP | BPF_JA | BPF_K, st->exit.off);
+		return;
+	}
+
+	/* store offset of epilog block */
+	st->exit.off = st->sz;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	if (spil != 0) {
+
+		if (INUSE(st->reguse, RBP) != 0)
+			emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RBP, RSP);
+
+		ofs = 0;
+		for (i = 0; i != RTE_DIM(save_regs); i++) {
+			if (INUSE(st->reguse, save_regs[i]) != 0) {
+				emit_ld_reg(st, BPF_LDX | BPF_MEM | BPF_DW,
+					RSP, save_regs[i], ofs);
+				ofs += sizeof(uint64_t);
+			}
+		}
+
+		emit_alu_imm(st, BPF_ALU64 | BPF_ADD | BPF_K, RSP,
+			spil * sizeof(uint64_t));
+	}
+
+	emit_ret(st);
+}
+
+/*
+ * walk through the BPF code and translate it into x86_64 instructions.
+ */
+static int
+emit(struct bpf_jit_state *st, const struct rte_bpf *bpf)
+{
+	uint32_t i, dr, op, sr;
+	const struct bpf_insn *ins;
+
+	/* reset state fields */
+	st->sz = 0;
+	st->exit.num = 0;
+
+	emit_prolog(st, bpf->stack_sz);
+
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		st->idx = i;
+		st->off[i] = st->sz;
+
+		ins = bpf->prm.ins + i;
+
+		dr = ebpf2x86[ins->dst_reg];
+		sr = ebpf2x86[ins->src_reg];
+		op = ins->code;
+
+		switch (op) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+		case (BPF_ALU | BPF_SUB | BPF_K):
+		case (BPF_ALU | BPF_AND | BPF_K):
+		case (BPF_ALU | BPF_OR | BPF_K):
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+		case (BPF_ALU | BPF_SUB | BPF_X):
+		case (BPF_ALU | BPF_AND | BPF_X):
+		case (BPF_ALU | BPF_OR | BPF_X):
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			emit_be2le(st, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			emit_le2be(st, dr, ins->imm);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		/* multiply instructions */
+		case (BPF_ALU | BPF_MUL | BPF_K):
+		case (BPF_ALU | BPF_MUL | BPF_X):
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			emit_mul(st, op, sr, dr, ins->imm);
+			break;
+		/* divide instructions */
+		case (BPF_ALU | BPF_DIV | BPF_K):
+		case (BPF_ALU | BPF_MOD | BPF_K):
+		case (BPF_ALU | BPF_DIV | BPF_X):
+		case (BPF_ALU | BPF_MOD | BPF_X):
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			emit_div(st, op, sr, dr, ins->imm);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+		case (BPF_LDX | BPF_MEM | BPF_H):
+		case (BPF_LDX | BPF_MEM | BPF_W):
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			emit_ld_reg(st, op, sr, dr, ins->off);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			emit_ld_imm64(st, dr, ins[0].imm, ins[1].imm);
+			i++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+		case (BPF_STX | BPF_MEM | BPF_H):
+		case (BPF_STX | BPF_MEM | BPF_W):
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			emit_st_reg(st, op, sr, dr, ins->off);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+		case (BPF_ST | BPF_MEM | BPF_H):
+		case (BPF_ST | BPF_MEM | BPF_W):
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			emit_st_imm(st, op, dr, ins->imm, ins->off);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			emit_st_xadd(st, op, sr, dr, ins->off);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			emit_jmp(st, ins->off + 1);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			emit_jcc_imm(st, op, dr, ins->imm, ins->off + 1);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			emit_jcc_reg(st, op, sr, dr, ins->off + 1);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			emit_call(st, (uintptr_t)bpf->prm.xsym[ins->imm].func);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			emit_epilog(st);
+			break;
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %u;\n",
+				__func__, bpf, ins->code, i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * produce a native ISA version of the given BPF code.
+ */
+int
+bpf_jit_x86(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	uint32_t i;
+	size_t sz;
+	struct bpf_jit_state st;
+
+	/* init state */
+	memset(&st, 0, sizeof(st));
+	st.off = malloc(bpf->prm.nb_ins * sizeof(st.off[0]));
+	if (st.off == NULL)
+		return -ENOMEM;
+
+	/* fill with fake offsets */
+	st.exit.off = INT32_MAX;
+	for (i = 0; i != bpf->prm.nb_ins; i++)
+		st.off[i] = INT32_MAX;
+
+	/*
+	 * dry runs, used to calculate total code size and valid jump offsets.
+	 * stop when we get minimal possible size
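+	 * (offsets start out pessimistic, so jump encodings can only shrink
+	 * from pass to pass; total size decreases monotonically and the loop
+	 * terminates once the size stops changing)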
+	 */
+	do {
+		sz = st.sz;
+		rc = emit(&st, bpf);
+	} while (rc == 0 && sz != st.sz);
+
+	if (rc == 0) {
+
+		/* allocate memory needed */
+		st.ins = mmap(NULL, st.sz, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (st.ins == MAP_FAILED)
+			rc = -ENOMEM;
+		else
+			/* generate code */
+			rc = emit(&st, bpf);
+	}
+
+	if (rc == 0 && mprotect(st.ins, st.sz, PROT_READ | PROT_EXEC) != 0)
+		rc = -ENOMEM;
+
+	if (rc != 0)
+		munmap(st.ins, st.sz);
+	else {
+		bpf->jit.func = (void *)st.ins;
+		bpf->jit.sz = st.sz;
+	}
+
+	free(st.off);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 05c48c7ff..67ca30533 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -7,6 +7,10 @@ sources = files('bpf.c',
 		'bpf_load.c',
 		'bpf_validate.c')
 
+if arch_subdir == 'x86'
+	sources += files('bpf_jit_x86.c')
+endif
+
 install_headers = files('rte_bpf.h')
 
 deps += ['mbuf', 'net']
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (11 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 4/7] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-04-02 22:44   ` Jerin Jacob
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 6/7] testpmd: new commands to load/unload " Konstantin Ananyev
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 7/7] test: add few eBPF samples Konstantin Ananyev
  14 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce API to install BPF based filters on ethdev RX/TX path.
The current implementation is a pure SW one, based on the ethdev RX/TX
callback mechanism.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
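As an illustration (not part of the patch itself), a minimal usage sketch of
the new API; the port/queue values, the "./t1.o" path and the
install_rx_filter() helper name are placeholders, and error handling is
trimmed:

#include <string.h>
#include <stdint.h>

#include <rte_bpf_ethdev.h>

/* load the ".text" section of a BPF ELF object and install it as a
 * JIT-ed filter on the RX path of the given port/queue
 */
static int
install_rx_filter(uint16_t port, uint16_t queue)
{
	struct rte_bpf_prm prm;

	memset(&prm, 0, sizeof(prm));
	prm.prog_type = RTE_BPF_PROG_TYPE_UNSPEC;

	return rte_bpf_eth_rx_elf_load(port, queue, &prm, "./t1.o", ".text",
		RTE_BPF_ETH_F_JIT);
}

The filter (and its RX callback) is removed again with
rte_bpf_eth_rx_unload(port, queue), which also destroys the loaded program.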
 lib/librte_bpf/Makefile            |   2 +
 lib/librte_bpf/bpf_pkt.c           | 607 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   6 +-
 lib/librte_bpf/rte_bpf_ethdev.h    | 100 ++++++
 lib/librte_bpf/rte_bpf_version.map |   4 +
 5 files changed, 717 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 44b12c439..501c49c60 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -22,6 +22,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_pkt.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
@@ -29,5 +30,6 @@ endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf_ethdev.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf_pkt.c b/lib/librte_bpf/bpf_pkt.c
new file mode 100644
index 000000000..287d40564
--- /dev/null
+++ b/lib/librte_bpf/bpf_pkt.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include <rte_bpf_ethdev.h>
+#include "bpf_impl.h"
+
+/*
+ * information about installed BPF rx/tx callback
+ */
+
+struct bpf_eth_cbi {
+	/* used by both data & control path */
+	uint32_t use;    /* usage counter */
+	void *cb;        /* callback handle */
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	/* used by control path only */
+	LIST_ENTRY(bpf_eth_cbi) link;
+	uint16_t port;
+	uint16_t queue;
+} __rte_cache_aligned;
+
+/*
+ * Odd number means that callback is used by datapath.
+ * Even number means that callback is not used by datapath.
+ */
+#define BPF_ETH_CBI_INUSE  1
+
+/*
+ * List to manage RX/TX installed callbacks.
+ */
+LIST_HEAD(bpf_eth_cbi_list, bpf_eth_cbi);
+
+enum {
+	BPF_ETH_RX,
+	BPF_ETH_TX,
+	BPF_ETH_NUM,
+};
+
+/*
+ * information about all installed BPF rx/tx callbacks
+ */
+struct bpf_eth_cbh {
+	rte_spinlock_t lock;
+	struct bpf_eth_cbi_list list;
+	uint32_t type;
+};
+
+static struct bpf_eth_cbh rx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_RX,
+};
+
+static struct bpf_eth_cbh tx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_TX,
+};
+
+/*
+ * Marks given callback as used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
+{
+	cbi->use++;
+	/* make sure no store/load reordering could happen */
+	rte_smp_mb();
+}
+
+/*
+ * Marks given callback list as not used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
+{
+	/* make sure all previous loads are completed */
+	rte_smp_rmb();
+	cbi->use++;
+}
+
+/*
+ * Waits till datapath finished using given callback.
+ */
+static void
+bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
+{
+	uint32_t nuse, puse;
+
+	/* make sure all previous loads and stores are completed */
+	rte_smp_mb();
+
+	puse = cbi->use;
+
+	/* in use, busy wait till current RX/TX iteration is finished */
+	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
+		do {
+			rte_pause();
+			rte_compiler_barrier();
+			nuse = cbi->use;
+		} while (nuse == puse);
+	}
+}
+
+static void
+bpf_eth_cbi_cleanup(struct bpf_eth_cbi *bc)
+{
+	bc->bpf = NULL;
+	memset(&bc->jit, 0, sizeof(bc->jit));
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_find(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	LIST_FOREACH(cbi, &cbh->list, link) {
+		if (cbi->port == port && cbi->queue == queue)
+			break;
+	}
+	return cbi;
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_add(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	/* return an existing one */
+	cbi = bpf_eth_cbh_find(cbh, port, queue);
+	if (cbi != NULL)
+		return cbi;
+
+	cbi = rte_zmalloc(NULL, sizeof(*cbi), RTE_CACHE_LINE_SIZE);
+	if (cbi != NULL) {
+		cbi->port = port;
+		cbi->queue = queue;
+		LIST_INSERT_HEAD(&cbh->list, cbi, link);
+	}
+	return cbi;
+}
+
+/*
+ * BPF packet processing routines.
+ */
+
+static inline uint32_t
+apply_filter(struct rte_mbuf *mb[], const uint64_t rc[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i, j, k;
+	struct rte_mbuf *dr[num];
+
+	for (i = 0, j = 0, k = 0; i != num; i++) {
+
+		/* filter matches */
+		if (rc[i] != 0)
+			mb[j++] = mb[i];
+		/* no match */
+		else
+			dr[k++] = mb[i];
+	}
+
+	if (drop != 0) {
+		/* free filtered out mbufs */
+		for (i = 0; i != k; i++)
+			rte_pktmbuf_free(dr[i]);
+	} else {
+		/* copy filtered out mbufs beyond good ones */
+		for (i = 0; i != k; i++)
+			mb[j + i] = dr[i];
+	}
+
+	return j;
+}
+
+static inline uint32_t
+pkt_filter_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i;
+	void *dp[num];
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		dp[i] = rte_pktmbuf_mtod(mb[i], void *);
+
+	rte_bpf_exec_burst(bpf, dp, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	void *dp;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		dp = rte_pktmbuf_mtod(mb[i], void *);
+		rc[i] = jit->func(dp);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+static inline uint32_t
+pkt_filter_mb_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint64_t rc[num];
+
+	rte_bpf_exec_burst(bpf, (void **)mb, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		rc[i] = jit->func(mb[i]);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+/*
+ * RX/TX callbacks for raw data bpf.
+ */
+
+static uint16_t
+bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+/*
+ * RX/TX callbacks for mbuf.
+ */
+
+static uint16_t
+bpf_rx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static rte_rx_callback_fn
+select_rx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_rx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_rx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_rx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_rx_callback_mb_vm;
+
+	return NULL;
+}
+
+static rte_tx_callback_fn
+select_tx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_tx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_tx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_tx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_tx_callback_mb_vm;
+
+	return NULL;
+}
+
+/*
+ * Helper function to perform BPF unload for the given port/queue.
+ * We have to introduce extra complexity (and a possible slowdown) here,
+ * as right now there is no safe generic way to remove an RX/TX callback
+ * while IO is active.
+ * We still don't free the memory allocated for the callback handle itself;
+ * again, right now there is no safe way to do that without stopping RX/TX
+ * on the given port/queue first.
+ */
+static void
+bpf_eth_cbi_unload(struct bpf_eth_cbi *bc)
+{
+	/* mark this cbi as empty */
+	bc->cb = NULL;
+	rte_smp_mb();
+
+	/* make sure datapath doesn't use bpf anymore, then destroy bpf */
+	bpf_eth_cbi_wait(bc);
+	rte_bpf_destroy(bc->bpf);
+	bpf_eth_cbi_cleanup(bc);
+}
+
+static void
+bpf_eth_unload(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+
+	bc = bpf_eth_cbh_find(cbh, port, queue);
+	if (bc == NULL || bc->cb == NULL)
+		return;
+
+	if (cbh->type == BPF_ETH_RX)
+		rte_eth_remove_rx_callback(port, queue, bc->cb);
+	else
+		rte_eth_remove_tx_callback(port, queue, bc->cb);
+
+	bpf_eth_cbi_unload(bc);
+}
+
+
+__rte_experimental void
+rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+__rte_experimental void
+rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+static int
+bpf_eth_elf_load(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_rx_callback_fn frx;
+	rte_tx_callback_fn ftx;
+	struct rte_bpf_jit jit;
+
+	frx = NULL;
+	ftx = NULL;
+
+	if (prm == NULL || rte_eth_dev_is_valid_port(port) == 0 ||
+			queue >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	if (cbh->type == BPF_ETH_RX)
+		frx = select_rx_callback(prm->prog_type, flags);
+	else
+		ftx = select_tx_callback(prm->prog_type, flags);
+
+	if (frx == NULL && ftx == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	rte_bpf_get_jit(bpf, &jit);
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && jit.func == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rte_bpf_destroy(bpf);
+		return -ENOTSUP;
+	}
+
+	/* setup/update global callback info */
+	bc = bpf_eth_cbh_add(cbh, port, queue);
+	if (bc == NULL)
+		return -ENOMEM;
+
+	/* remove old one, if any */
+	if (bc->cb != NULL)
+		bpf_eth_unload(cbh, port, queue);
+
+	bc->bpf = bpf;
+	bc->jit = jit;
+
+	if (cbh->type == BPF_ETH_RX)
+		bc->cb = rte_eth_add_rx_callback(port, queue, frx, bc);
+	else
+		bc->cb = rte_eth_add_tx_callback(port, queue, ftx, bc);
+
+	if (bc->cb == NULL) {
+		rc = -rte_errno;
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	} else
+		rc = 0;
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 67ca30533..39b464041 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -5,15 +5,17 @@ allow_experimental_apis = true
 sources = files('bpf.c',
 		'bpf_exec.c',
 		'bpf_load.c',
+		'bpf_pkt.c',
 		'bpf_validate.c')
 
 if arch_subdir == 'x86'
 	sources += files('bpf_jit_x86.c')
 endif
 
-install_headers = files('rte_bpf.h')
+install_headers = files('rte_bpf.h',
+			'rte_bpf_ethdev.h')
 
-deps += ['mbuf', 'net']
+deps += ['mbuf', 'net', 'ethdev']
 
 dep = dependency('libelf', required: false)
 if dep.found() == false
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
new file mode 100644
index 000000000..33ce0c6c7
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_ETHDEV_H_
+#define _RTE_BPF_ETHDEV_H_
+
+#include <rte_bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+	RTE_BPF_ETH_F_NONE = 0,
+	RTE_BPF_ETH_F_JIT  = 0x1, /**< use code compiled into native ISA */
+};
+
+/**
+ * API to install BPF filter as RX/TX callbacks for eth devices.
+ * Note that right now:
+ * - it is not MT safe, i.e. it is not allowed to do load/unload for the
+ *   same port/queue from different threads in parallel.
+ * - it does allow load/unload at runtime
+ *   (while RX/TX is ongoing on the given port/queue).
+ * - only one BPF program per port/queue is allowed, i.e. a new load will
+ *   replace the BPF program previously loaded for that port/queue.
+ * Filter behaviour - if the BPF program returns zero value for a given
+ * packet, then that packet is considered filtered out:
+ *   on RX - it will be dropped inside the callback and no further processing
+ *   for that packet will happen.
+ *   on TX - the packet will remain unsent, and it is the responsibility of
+ *   the user to handle such situation (drop, try to send again, etc.).
+ */
+
+/**
+ * Unload previously loaded BPF program (if any) from given RX port/queue
+ * and remove appropriate RX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ */
+void rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Unload previously loaded BPF program (if any) from given TX port/queue
+ * and remove appropriate TX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ */
+void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given RX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ * @param fname
+ *  Pathname of the ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given TX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ * @param fname
+ *  Pathname of the ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_ETHDEV_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ff65144df..a203e088e 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -3,6 +3,10 @@ EXPERIMENTAL {
 
 	rte_bpf_destroy;
 	rte_bpf_elf_load;
+	rte_bpf_eth_rx_elf_load;
+	rte_bpf_eth_rx_unload;
+	rte_bpf_eth_tx_elf_load;
+	rte_bpf_eth_tx_unload;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 6/7] testpmd: new commands to load/unload BPF filters
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (12 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 7/7] test: add few eBPF samples Konstantin Ananyev
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce new testpmd commands to load/unload RX/TX BPF-based filters.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
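The bpf_sup.h header below defines the table of external symbols that loaded
BPF programs may reference. As a side note (not part of the patch), an
application could expose additional symbols the same way; a hypothetical
sketch, where my_pkt_note() is an illustrative helper that does not exist in
DPDK:

#include <rte_common.h>
#include <rte_mbuf.h>
#include <rte_bpf.h>

/* hypothetical application helper made callable from BPF programs */
extern void my_pkt_note(const struct rte_mbuf *mb);

static const struct rte_bpf_xsym app_xsym[] = {
	{
		.name = RTE_STR(rte_pktmbuf_dump),
		.type = RTE_BPF_XTYPE_FUNC,
		.func = (void *)rte_pktmbuf_dump,
	},
	{
		.name = RTE_STR(my_pkt_note),
		.type = RTE_BPF_XTYPE_FUNC,
		.func = (void *)my_pkt_note,
	},
};

Such a table is passed to the load call via prm.xsym / prm.nb_xsym, exactly
as cmd_operate_bpf_ld_parsed() does below.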
 app/test-pmd/bpf_sup.h   |  25 ++++++++
 app/test-pmd/cmdline.c   | 146 +++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/meson.build |   2 +-
 3 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 app/test-pmd/bpf_sup.h

diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
new file mode 100644
index 000000000..35f91a07f
--- /dev/null
+++ b/app/test-pmd/bpf_sup.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _BPF_SUP_H_
+#define _BPF_SUP_H_
+
+#include <stdio.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_ethdev.h>
+
+static const struct rte_bpf_xsym bpf_xsym[] = {
+	{
+		.name = RTE_STR(stdout),
+		.type = RTE_BPF_XTYPE_VAR,
+		.var = &stdout,
+	},
+	{
+		.name = RTE_STR(rte_pktmbuf_dump),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)rte_pktmbuf_dump,
+	},
+};
+
+#endif /* _BPF_SUP_H_ */
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 40b31ad7e..d0ad27871 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include "testpmd.h"
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
+#include "bpf_sup.h"
 
 static struct cmdline *testpmd_cl;
 
@@ -16030,6 +16031,149 @@ cmdline_parse_inst_t cmd_load_from_file = {
 	},
 };
 
+/* *** load BPF program *** */
+struct cmd_bpf_ld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+	cmdline_fixed_string_t op;
+	cmdline_fixed_string_t flags;
+	cmdline_fixed_string_t prm;
+};
+
+static void
+bpf_parse_flags(const char *str, enum rte_bpf_prog_type *ptype, uint32_t *flags)
+{
+	uint32_t i, v;
+
+	*flags = RTE_BPF_ETH_F_NONE;
+	*ptype = RTE_BPF_PROG_TYPE_UNSPEC;
+
+	for (i = 0; str[i] != 0; i++) {
+		v = toupper(str[i]);
+		if (v == 'J')
+			*flags |= RTE_BPF_ETH_F_JIT;
+		else if (v == 'M')
+			*ptype = RTE_BPF_PROG_TYPE_MBUF;
+		else if (v == '-')
+			continue;
+		else
+			printf("unknown flag: \'%c\'", v);
+	}
+}
+
+static void cmd_operate_bpf_ld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	int32_t rc;
+	uint32_t flags;
+	struct cmd_bpf_ld_result *res;
+	struct rte_bpf_prm prm;
+	const char *fname, *sname;
+
+	res = parsed_result;
+	memset(&prm, 0, sizeof(prm));
+	prm.xsym = bpf_xsym;
+	prm.nb_xsym = RTE_DIM(bpf_xsym);
+
+	bpf_parse_flags(res->flags, &prm.prog_type, &flags);
+	fname = res->prm;
+	sname = ".text";
+
+	if (strcmp(res->dir, "rx") == 0) {
+		rc = rte_bpf_eth_rx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else if (strcmp(res->dir, "tx") == 0) {
+		rc = rte_bpf_eth_tx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_load_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			bpf, "bpf-load");
+cmdline_parse_token_string_t cmd_load_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_load_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_load_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, queue, UINT16);
+cmdline_parse_token_string_t cmd_load_bpf_flags =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			flags, NULL);
+cmdline_parse_token_string_t cmd_load_bpf_prm =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			prm, NULL);
+
+cmdline_parse_inst_t cmd_operate_bpf_ld_parse = {
+	.f = cmd_operate_bpf_ld_parsed,
+	.data = NULL,
+	.help_str = "bpf-load rx|tx <port> <queue> <-|J|M> <file_name>",
+	.tokens = {
+		(void *)&cmd_load_bpf_start,
+		(void *)&cmd_load_bpf_dir,
+		(void *)&cmd_load_bpf_port,
+		(void *)&cmd_load_bpf_queue,
+		(void *)&cmd_load_bpf_flags,
+		(void *)&cmd_load_bpf_prm,
+		NULL,
+	},
+};
+
+/* *** unload BPF program *** */
+struct cmd_bpf_unld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+};
+
+static void cmd_operate_bpf_unld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	struct cmd_bpf_unld_result *res;
+
+	res = parsed_result;
+
+	if (strcmp(res->dir, "rx") == 0)
+		rte_bpf_eth_rx_unload(res->port, res->queue);
+	else if (strcmp(res->dir, "tx") == 0)
+		rte_bpf_eth_tx_unload(res->port, res->queue);
+	else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_unload_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			bpf, "bpf-unload");
+cmdline_parse_token_string_t cmd_unload_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_unload_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_unload_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, queue, UINT16);
+
+cmdline_parse_inst_t cmd_operate_bpf_unld_parse = {
+	.f = cmd_operate_bpf_unld_parsed,
+	.data = NULL,
+	.help_str = "bpf-unload rx|tx <port> <queue>",
+	.tokens = {
+		(void *)&cmd_unload_bpf_start,
+		(void *)&cmd_unload_bpf_dir,
+		(void *)&cmd_unload_bpf_port,
+		(void *)&cmd_unload_bpf_queue,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -16272,6 +16416,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_del_port_tm_node,
 	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
 	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
 	NULL,
 };
 
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index b47537642..602e20ac3 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -21,7 +21,7 @@ sources = files('cmdline.c',
 	'testpmd.c',
 	'txonly.c')
 
-deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci']
+deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci', 'bpf']
 if dpdk_conf.has('RTE_LIBRTE_PDUMP')
 	deps += 'pdump'
 endif
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v2 7/7] test: add few eBPF samples
  2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
                   ` (13 preceding siblings ...)
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 6/7] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-03-30 17:32 ` Konstantin Ananyev
  14 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-03-30 17:32 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add a few simple eBPF programs as examples.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
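The samples share one convention: entry() receives either a pointer to packet
data or to an rte_mbuf (depending on how the program is loaded) and returns
non-zero to keep the packet, zero to have it filtered out. A minimal extra
sketch in the same style (hypothetical t_ipv6.c, not part of this patch;
compile it like the other samples with clang -O2 -target bpf -c t_ipv6.c):

#include <stdint.h>
#include <net/ethernet.h>

/* keep only IPv6 packets; input is a pointer to the first segment's data */
uint64_t
entry(void *pkt)
{
	struct ether_header *eh = pkt;

	return eh->ether_type == __builtin_bswap16(ETHERTYPE_IPV6);
}

Loaded on an RX queue it would drop everything except IPv6 traffic.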
 test/bpf/dummy.c |  20 ++
 test/bpf/mbuf.h  | 578 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/bpf/t1.c    |  52 +++++
 test/bpf/t2.c    |  31 +++
 test/bpf/t3.c    |  36 ++++
 5 files changed, 717 insertions(+)
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

diff --git a/test/bpf/dummy.c b/test/bpf/dummy.c
new file mode 100644
index 000000000..5851469e7
--- /dev/null
+++ b/test/bpf/dummy.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Does nothing, always returns success.
+ * Used to measure BPF infrastructure overhead.
+ * To compile:
+ * clang -O2 -target bpf -c dummy.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+uint64_t
+entry(void *arg)
+{
+	return 1;
+}
diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
new file mode 100644
index 000000000..f24f908d7
--- /dev/null
+++ b/test/bpf/mbuf.h
@@ -0,0 +1,578 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation.
+ * Copyright 2014 6WIND S.A.
+ */
+
+/*
+ * Snippet from dpdk.org rte_mbuf.h,
+ * used to provide BPF programs with information about the rte_mbuf layout.
+ */
+
+#ifndef _MBUF_H_
+#define _MBUF_H_
+
+#include <stdint.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Packet Offload Features Flags. It also carry packet type information.
+ * Critical resources. Both rx/tx shared these bits. Be cautious on any change
+ *
+ * - RX flags start at bit position zero, and get added to the left of previous
+ *   flags.
+ * - The most-significant 3 bits are reserved for generic mbuf flags
+ * - TX flags therefore start at bit position 60 (i.e. 63-3), and new flags get
+ *   added to the right of the previously defined flags i.e. they should count
+ *   downwards, not upwards.
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
+ */
+
+/**
+ * RX packet is a 802.1q VLAN packet. This flag was set by PMDs when
+ * the packet is recognized as a VLAN, but the behavior between PMDs
+ * was not the same. This flag is kept for some time to avoid breaking
+ * applications and should be replaced by PKT_RX_VLAN_STRIPPED.
+ */
+#define PKT_RX_VLAN_PKT      (1ULL << 0)
+
+#define PKT_RX_RSS_HASH      (1ULL << 1)
+/**< RX packet with RSS hash result. */
+#define PKT_RX_FDIR          (1ULL << 2)
+/**< RX packet with FDIR match indicate. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)
+/**< External IP header checksum error. */
+
+/**
+ * A vlan has been stripped by the hardware and its tci is saved in
+ * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
+ * in the RX configuration of the PMD.
+ */
+#define PKT_RX_VLAN_STRIPPED (1ULL << 6)
+
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_IEEE1588_PTP  (1ULL << 9)
+/**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 10)
+/**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID       (1ULL << 13)
+/**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX      (1ULL << 14)
+/**< Flexible bytes reported if FDIR match. */
+
+/**
+ * The 2 vlans have been stripped by the hardware and their tci are
+ * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * This can only happen if vlan stripping is enabled in the RX
+ * configuration of the PMD. If this flag is set, PKT_RX_VLAN_STRIPPED
+ * must also be set.
+ */
+#define PKT_RX_QINQ_STRIPPED (1ULL << 15)
+
+/**
+ * Deprecated.
+ * RX packet with double VLAN stripped.
+ * This flag is replaced by PKT_RX_QINQ_STRIPPED.
+ */
+#define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
+
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP     (1ULL << 17)
+
+/* add new RX flags here */
+
+/* add new TX flags here */
+
+/**
+ * Offload the MACsec. This flag must be set by the application to enable
+ * this offload feature for a packet to be transmitted.
+ */
+#define PKT_TX_MACSEC        (1ULL << 44)
+
+/**
+ * Bits 45:48 used for the tunnel type.
+ * When doing Tx offload like TSO or checksum, the HW needs to configure the
+ * tunnel type into the HW descriptors.
+ */
+#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
+#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
+#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
+#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
+/**< TX packet with MPLS-in-UDP RFC 7510 header. */
+#define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/* add new TX TUNNEL type here */
+#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
+
+/**
+ * Second VLAN insertion (QinQ) flag.
+ */
+#define PKT_TX_QINQ_PKT    (1ULL << 49)
+/**< TX packet with double VLAN inserted. */
+
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len in account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 50)
+
+#define PKT_TX_IEEE1588_TMST (1ULL << 51)
+/**< TX IEEE1588 packet to timestamp. */
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
+ */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52)
+/**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_TCP_CKSUM     (1ULL << 52)
+/**< TCP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_SCTP_CKSUM    (2ULL << 52)
+/**< SCTP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_UDP_CKSUM     (3ULL << 52)
+/**< UDP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_L4_MASK       (3ULL << 52)
+/**< Mask for L4 cksum offload request. */
+
+/**
+ * Offload the IP checksum in the hardware. The flag PKT_TX_IPV4 should
+ * also be set by the application, although a PMD will only check
+ * PKT_TX_IP_CKSUM.
+ *  - set the IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: l2_len, l3_len
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
+
+/**
+ * Packet is IPv4. This flag must be set when using any offload feature
+ * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV4          (1ULL << 55)
+
+/**
+ * Packet is IPv6. This flag must be set when using an offload feature
+ * (TSO or L4 checksum) to tell the NIC that the packet is an IPv6
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV6          (1ULL << 56)
+
+#define PKT_TX_VLAN_PKT      (1ULL << 57)
+/**< TX packet is a 802.1q VLAN packet. */
+
+/**
+ * Offload the IP checksum of an external header in the hardware. The
+ * flag PKT_TX_OUTER_IPV4 should also be set by the application, although
+ * a PMD will only check PKT_TX_IP_CKSUM.  The IP checksum field in the
+ * packet must be set to 0.
+ *  - set the outer IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: outer_l2_len, outer_l3_len
+ */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/**
+ * Packet outer header is IPv4. This flag must be set when using any
+ * outer offload feature (L3 or L4 checksum) to tell the NIC that the
+ * outer header of the tunneled packet is an IPv4 packet.
+ */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/**
+ * Packet outer header is IPv6. This flag must be set when using any
+ * outer offload feature (L4 checksum) to tell the NIC that the outer
+ * header of the tunneled packet is an IPv6 packet.
+ */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
+/**
+ * Bitmask of all supported packet Tx offload features flags,
+ * which can be set for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_IEEE1588_TMST |	 \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK |	 \
+		PKT_TX_MACSEC)
+
+#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
+
+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
+/* Use final bit of flags to indicate a control mbuf */
+#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
+
+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of RX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the RX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of TX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the TX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Some NICs need at least 2KB buffer to RX standard Ethernet frame without
+ * splitting it into multiple segments.
+ * So, for mbufs that planned to be involved into RX/TX, the recommended
+ * minimal buffer length is 2KB + RTE_PKTMBUF_HEADROOM.
+ */
+#define	RTE_MBUF_DEFAULT_DATAROOM	2048
+#define	RTE_MBUF_DEFAULT_BUF_SIZE	\
+	(RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM)
+
+/* define a set of marker types that can be used to refer to set points in the
+ * mbuf.
+ */
+__extension__
+typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
+__extension__
+typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
+__extension__
+typedef uint64_t MARKER64[0];
+/**< marker that allows us to overwrite 8 bytes with a single assignment */
+
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * The generic rte_mbuf, containing a packet mbuf.
+ */
+struct rte_mbuf {
+	MARKER cacheline0;
+
+	void *buf_addr;           /**< Virtual address of segment buffer. */
+	/**
+	 * Physical address of segment buffer.
+	 * Force alignment to 8-bytes, so as to ensure we have the exact
+	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * working on vector drivers easier.
+	 */
+	phys_addr_t buf_physaddr __rte_aligned(sizeof(phys_addr_t));
+
+	/* next 8 bytes are initialised on RX descriptor rearm */
+	MARKER64 rearm_data;
+	uint16_t data_off;
+
+	/**
+	 * Reference counter. Its size should at least equal to the size
+	 * of port field (16 bits), to support zero-copy broadcast.
+	 * It should only be accessed using the following functions:
+	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
+	 * config option.
+	 */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+		/**< Non-atomically accessed refcnt */
+	};
+	uint16_t nb_segs;         /**< Number of segments. */
+
+	/** Input port (16 bits to support more than 256 virtual ports). */
+	uint16_t port;
+
+	uint64_t ol_flags;        /**< Offload features. */
+
+	/* remaining bytes are set on RX when pulling packet from descriptor */
+	MARKER rx_descriptor_fields1;
+
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types. The packet_type is about data really present in the
+	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+	 * vlan is stripped from the data.
+	 */
+	RTE_STD_C11
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+	uint16_t vlan_tci;
+
+	union {
+		uint32_t rss;     /**< RSS hash result if RSS enabled */
+		struct {
+			RTE_STD_C11
+			union {
+				struct {
+					uint16_t hash;
+					uint16_t id;
+				};
+				uint32_t lo;
+				/**< Second 4 flexible bytes */
+			};
+			uint32_t hi;
+			/**< First 4 flexible bytes or FD ID, dependent on
+			 *   PKT_RX_FDIR_* flag in ol_flags.
+			 */
+		} fdir;           /**< Filter identifier if FDIR enabled */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;          /**< Hierarchical scheduler */
+		uint32_t usr;
+		/**< User defined tags. See rte_distributor_process() */
+	} hash;                   /**< hash information */
+
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
+	uint16_t buf_len;         /**< Length of segment buffer. */
+
+	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+	 * are not normalized but are always the same for a given port.
+	 */
+	uint64_t timestamp;
+
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	RTE_STD_C11
+	union {
+		void *userdata;   /**< Can be used for external metadata */
+		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+	};
+
+	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+
+	/* fields to support TX offloads */
+	RTE_STD_C11
+	union {
+		uint64_t tx_offload;       /**< combined for easy fetch */
+		__extension__
+		struct {
+			uint64_t l2_len:7;
+			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+			 */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
+
+			/* fields for TX offloading of tunnels */
+			uint64_t outer_l3_len:9;
+			/**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7;
+			/**< Outer L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
+		};
+	};
+
+	/** Size of the application private data. In case of an indirect
+	 * mbuf, it stores the direct mbuf private data size.
+	 */
+	uint16_t priv_size;
+
+	/** Timesync flags for use with IEEE1588. */
+	uint16_t timesync;
+
+	/** Sequence number. See also rte_reorder_insert(). */
+	uint32_t seqn;
+
+} __rte_cache_aligned;
+
+
+/**
+ * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
+ */
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
+
+/**
+ * Returns TRUE if given mbuf is direct, or FALSE otherwise.
+ */
+#define RTE_MBUF_DIRECT(mb)     (!RTE_MBUF_INDIRECT(mb))
+
+/**
+ * Private data in case of pktmbuf pool.
+ *
+ * A structure that contains some pktmbuf_pool-specific data that are
+ * appended after the mempool structure (in private data).
+ */
+struct rte_pktmbuf_pool_private {
+	uint16_t mbuf_data_room_size; /**< Size of data space in each mbuf. */
+	uint16_t mbuf_priv_size;      /**< Size of private area in each mbuf. */
+};
+
+/**
+ * A macro that points to an offset into the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param o
+ *   The offset into the mbuf data.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod_offset(m, t, o)	\
+	((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
+
+/**
+ * A macro that points to the start of the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _MBUF_H_ */
diff --git a/test/bpf/t1.c b/test/bpf/t1.c
new file mode 100644
index 000000000..60f9434ab
--- /dev/null
+++ b/test/bpf/t1.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to first segment packet data as an input parameter.
+ * analog of tcpdump -s 1 -d 'dst 1.2.3.4 && udp && dst port 5000'
+ * (000) ldh      [12]
+ * (001) jeq      #0x800           jt 2    jf 12
+ * (002) ld       [30]
+ * (003) jeq      #0x1020304       jt 4    jf 12
+ * (004) ldb      [23]
+ * (005) jeq      #0x11            jt 6    jf 12
+ * (006) ldh      [20]
+ * (007) jset     #0x1fff          jt 12   jf 8
+ * (008) ldxb     4*([14]&0xf)
+ * (009) ldh      [x + 16]
+ * (010) jeq      #0x1388          jt 11   jf 12
+ * (011) ret      #1
+ * (012) ret      #0
+ *
+ * To compile:
+ * clang -O2 -target bpf -c t1.c
+ */
+
+#include <stdint.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+
+uint64_t
+entry(void *pkt)
+{
+	struct ether_header *ether_header = (void *)pkt;
+
+	if (ether_header->ether_type != __builtin_bswap16(0x0800))
+		return 0;
+
+	struct iphdr *iphdr = (void *)(ether_header + 1);
+	if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1ffff) != 0 ||
+			iphdr->daddr != __builtin_bswap32(0x1020304))
+		return 0;
+
+	int hlen = iphdr->ihl * 4;
+	struct udphdr *udphdr = (void *)iphdr + hlen;
+
+	if (udphdr->dest !=  __builtin_bswap16(5000))
+		return 0;
+
+	return 1;
+}
diff --git a/test/bpf/t2.c b/test/bpf/t2.c
new file mode 100644
index 000000000..69d7a4fe1
--- /dev/null
+++ b/test/bpf/t2.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Clean up mbuf's vlan_tci and all related RX flags
+ * (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED).
+ * Doesn't touch contents of packet data.
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t2.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+uint64_t
+entry(void *pkt)
+{
+	struct rte_mbuf *mb;
+
+	mb = pkt;
+	mb->vlan_tci = 0;
+	mb->ol_flags &= ~(PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+
+	return 1;
+}
diff --git a/test/bpf/t3.c b/test/bpf/t3.c
new file mode 100644
index 000000000..531b9cb8c
--- /dev/null
+++ b/test/bpf/t3.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Dump the mbuf into stdout if it is an ARP packet (aka tcpdump 'arp').
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t3.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <net/ethernet.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+extern void rte_pktmbuf_dump(FILE *, const struct rte_mbuf *, unsigned int);
+
+uint64_t
+entry(const void *pkt)
+{
+	const struct rte_mbuf *mb;
+	const struct ether_header *eth;
+
+	mb = pkt;
+	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+
+	if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP))
+		rte_pktmbuf_dump(stdout, mb, 64);
+
+	return 1;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-04-02 22:44   ` Jerin Jacob
  2018-04-03 14:57     ` Ananyev, Konstantin
  0 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-04-02 22:44 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

-----Original Message-----
> Date: Fri, 30 Mar 2018 18:32:41 +0100
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> To: dev@dpdk.org
> CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
> X-Mailer: git-send-email 1.7.0.7
> 
> Introduce API to install BPF based filters on ethdev RX/TX path.
> Current implementation is pure SW one, based on ethdev RX/TX
> callback mechanism.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---

Hi Konstantin,

> +/*
> + * Marks given callback as used by datapath.
> + */
> +static __rte_always_inline void
> +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> +{
> +	cbi->use++;
> +	/* make sure no store/load reordering could happen */
> +	rte_smp_mb();
> +}
> +
> +/*
> + * Marks given callback list as not used by datapath.
> + */
> +static __rte_always_inline void
> +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> +{
> +	/* make sure all previous loads are completed */
> +	rte_smp_rmb();

We earlier discussed this barrier. Would the following scheme work to
implement bpf_eth_cbi_wait() without the cbi->use scheme?

i.e. we need to exit from the jitted or interpreted code irrespective of its
state. IMO, we can do that with an _arch_ specific function that fills the
jitted memory with the "exit" opcode (value: 0x95, exit, return r0), so that
the code above comes out in any case on the next instruction execution.
I know jitted memory is read-only in your design; I think we can change the
permission to "write" in order to fill in the "exit" opcode (both jitted and
interpreted case) for termination.


What do you think?

> +	cbi->use++;
> +}
> +
> +/*
> + * Waits till datapath finished using given callback.
> + */
> +static void
> +bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
> +{
> +	uint32_t nuse, puse;
> +
> +	/* make sure all previous loads and stores are completed */
> +	rte_smp_mb();
> +
> +	puse = cbi->use;
> +
> +	/* in use, busy wait till current RX/TX iteration is finished */
> +	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
> +		do {
> +			rte_pause();
> +			rte_compiler_barrier();
> +			nuse = cbi->use;
> +		} while (nuse == puse);
> +	}
> +}

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-04-02 22:44   ` Jerin Jacob
@ 2018-04-03 14:57     ` Ananyev, Konstantin
  2018-04-03 17:17       ` Jerin Jacob
  0 siblings, 1 reply; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-04-03 14:57 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hi Jerin,

> 
> Hi Konstantin,
> 
> > +/*
> > + * Marks given callback as used by datapath.
> > + */
> > +static __rte_always_inline void
> > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > +{
> > +	cbi->use++;
> > +	/* make sure no store/load reordering could happen */
> > +	rte_smp_mb();
> > +}
> > +
> > +/*
> > + * Marks given callback list as not used by datapath.
> > + */
> > +static __rte_always_inline void
> > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > +{
> > +	/* make sure all previous loads are completed */
> > +	rte_smp_rmb();
> 
> We earlier discussed this barrier. Will following scheme works out to
> fix the bpf_eth_cbi_wait() without cbi->use scheme?
> 
> #ie. We need to exit from jitted or interpreted code irrespective of its
> state. IMO, We can do that by an _arch_ specific function to fill jitted  memory with
> "exit" opcode(value:0x95, exit, return r0),so that above code needs to be come out i n anycase,
> on next instruction execution. I know, jitted memory is read-only in your
> design, I think, we can change the permission to "write" to the fill
> "exit" opcode(both jitted or interpreted case) for termination.
>
> What you think?

Not sure I understand your proposal...
Are you suggesting to change bpf_exec() and bpf_jit() to make them execute sync primitives in an arch specific manner?
But some users will probably use the bpf_exec/jitted program in an environment that doesn't require such synchronization.
For these people it would just be an unnecessary slowdown.

If you are looking for a way to replace 'smp_rmb' in bpf_eth_cbi_unuse() with something arch specific, then
I can make cbi_inuse/cbi_unuse arch specific, keeping the current implementation as the generic one.
Would that help?

Konstantin

> 
> > +	cbi->use++;
> > +}
> > +
> > +/*
> > + * Waits till datapath finished using given callback.
> > + */
> > +static void
> > +bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
> > +{
> > +	uint32_t nuse, puse;
> > +
> > +	/* make sure all previous loads and stores are completed */
> > +	rte_smp_mb();
> > +
> > +	puse = cbi->use;
> > +
> > +	/* in use, busy wait till current RX/TX iteration is finished */
> > +	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
> > +		do {
> > +			rte_pause();
> > +			rte_compiler_barrier();
> > +			nuse = cbi->use;
> > +		} while (nuse == puse);
> > +	}
> > +}

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-04-03 14:57     ` Ananyev, Konstantin
@ 2018-04-03 17:17       ` Jerin Jacob
  2018-04-04 11:39         ` Ananyev, Konstantin
  0 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-04-03 17:17 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

-----Original Message-----
> Date: Tue, 3 Apr 2018 14:57:32 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF
>  filters
> 

Hi Konstantin,

> Hi Jerin,
> 
> > 
> > Hi Konstantin,
> > 
> > > +/*
> > > + * Marks given callback as used by datapath.
> > > + */
> > > +static __rte_always_inline void
> > > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > > +{
> > > +	cbi->use++;
> > > +	/* make sure no store/load reordering could happen */
> > > +	rte_smp_mb();
> > > +}
> > > +
> > > +/*
> > > + * Marks given callback list as not used by datapath.
> > > + */
> > > +static __rte_always_inline void
> > > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > > +{
> > > +	/* make sure all previous loads are completed */
> > > +	rte_smp_rmb();
> > 
> > We earlier discussed this barrier. Will following scheme works out to
> > fix the bpf_eth_cbi_wait() without cbi->use scheme?
> > 
> > #ie. We need to exit from jitted or interpreted code irrespective of its
> > state. IMO, We can do that by an _arch_ specific function to fill jitted  memory with
> > "exit" opcode(value:0x95, exit, return r0),so that above code needs to be come out i n anycase,
> > on next instruction execution. I know, jitted memory is read-only in your
> > design, I think, we can change the permission to "write" to the fill
> > "exit" opcode(both jitted or interpreted case) for termination.
> >
> > What you think?
> 
> Not sure I understand your proposal...

If I understand it correctly, bpf_eth_cbi_wait() is used to _wait_ until the
eBPF program exits, right? Instead of using the bpf_eth_cbi_[un]use()
scheme, which involves the barrier, how about the following:

in bpf_eth_cbi_wait()
{

memset the eBPF "program memory" with the value 0x95, which is the "exit"
("return r0") eBPF opcode. That makes the program terminate on its own:
on the 0x95 instruction the CPU decodes it and gets out of the eBPF program.

}

In the jitted case it is not the 0x95 instruction but arch-specific
instructions; we can have an arch abstraction to generate such an "exit"
instruction, and use common code to fill in the exit instructions provided
by the arch code.

Does that make sense?
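
For the interpreted case, a minimal sketch of that idea (editor's sketch only;
it assumes the instruction array is reachable and writable via bpf->prm.ins and
a nb_ins count, and, as the following replies point out, it still leaves the
update and execute threads unsynchronized):

/* 0x95 == (BPF_JMP | BPF_EXIT): "exit, return r0" */
static void
force_exit_interpreted(struct rte_bpf *bpf)
{
	uint32_t i;
	struct bpf_insn *ins = (struct bpf_insn *)(uintptr_t)bpf->prm.ins;

	for (i = 0; i != bpf->prm.nb_ins; i++) {
		ins[i].code = BPF_JMP | BPF_EXIT;
		ins[i].imm = 0;
	}
	/* stores above still have to become visible to the executing thread */
	rte_smp_wmb();
}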


> Are you suggesting to change bpf_exec() and bpf_jit() to make them execute sync primitives in an arch specific manner?
> But some users probably will use bpf_exec/jitted program in the environment that wouldn't require such synchronization.
> For these people it would be just unnecessary slowdown.
> 
> If you are looking for a ways to replace 'smp_rmb'  in bpf_eth_cbi_unuse() with something arch specific, then
> I can make cbi_inuse/cbi_unuse - arch specific with keeping current implementation as generic one.
> Would that help?
> 
> Konstantin
> 
> > 
> > > +	cbi->use++;
> > > +}
> > > +
> > > +/*
> > > + * Waits till datapath finished using given callback.
> > > + */
> > > +static void
> > > +bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
> > > +{
> > > +	uint32_t nuse, puse;
> > > +
> > > +	/* make sure all previous loads and stores are completed */
> > > +	rte_smp_mb();
> > > +
> > > +	puse = cbi->use;
> > > +
> > > +	/* in use, busy wait till current RX/TX iteration is finished */
> > > +	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
> > > +		do {
> > > +			rte_pause();
> > > +			rte_compiler_barrier();
> > > +			nuse = cbi->use;
> > > +		} while (nuse == puse);
> > > +	}
> > > +}

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-04-03 17:17       ` Jerin Jacob
@ 2018-04-04 11:39         ` Ananyev, Konstantin
  2018-04-04 17:51           ` Jerin Jacob
  0 siblings, 1 reply; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-04-04 11:39 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev

Hi Jerin,

> > >
> > > > +/*
> > > > + * Marks given callback as used by datapath.
> > > > + */
> > > > +static __rte_always_inline void
> > > > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > > > +{
> > > > +	cbi->use++;
> > > > +	/* make sure no store/load reordering could happen */
> > > > +	rte_smp_mb();
> > > > +}
> > > > +
> > > > +/*
> > > > + * Marks given callback list as not used by datapath.
> > > > + */
> > > > +static __rte_always_inline void
> > > > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > > > +{
> > > > +	/* make sure all previous loads are completed */
> > > > +	rte_smp_rmb();
> > >
> > > We earlier discussed this barrier. Will following scheme works out to
> > > fix the bpf_eth_cbi_wait() without cbi->use scheme?
> > >
> > > #ie. We need to exit from jitted or interpreted code irrespective of its
> > > state. IMO, We can do that by an _arch_ specific function to fill jitted  memory with
> > > "exit" opcode(value:0x95, exit, return r0),so that above code needs to be come out i n anycase,
> > > on next instruction execution. I know, jitted memory is read-only in your
> > > design, I think, we can change the permission to "write" to the fill
> > > "exit" opcode(both jitted or interpreted case) for termination.
> > >
> > > What you think?
> >
> > Not sure I understand your proposal...
> 
> If I understand it correctly, bpf_eth_cbi_wait() is used to _wait_ until
> eBPF program exits? Right?

Kind of, but not only.
After bpf_eth_cbi_wait() finishes, it is guaranteed that the data-path won't try
to access the resources associated with the given bpf_eth_cbi (bpf, jit), so we
can proceed with freeing them.
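
In other words, the intended unload sequence is roughly the following (editor's
sketch; bpf_eth_unload() is a hypothetical name standing in for the actual
control-path code in bpf_pkt.c):

static void
bpf_eth_unload(struct bpf_eth_cbi *cbi, struct rte_bpf *bpf)
{
	/* 1) detach the RX/TX callback so new iterations won't pick it up,
	 *    e.g. via rte_eth_remove_rx_callback()/rte_eth_remove_tx_callback() */

	/* 2) wait till any in-flight RX/TX iteration stops using cbi */
	bpf_eth_cbi_wait(cbi);

	/* 3) only now is it safe to free the program and its JIT image */
	rte_bpf_destroy(bpf);
}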

> . Instead of using bpf_eth_cbi_[un]use()
> scheme which involves the barrier. How about,
> 
> in bpf_eth_cbi_wait()
> {
> 
> memset the EBPF "program memory" with 0x95 value. Which is an "exit" and
> "return r0" EPBF opcode, Which makes program to terminate by it own
> as on 0x95 instruction, CPU decodes and it gets out from EPBF program.
> 
> }
> 
> In jitted case, it is not 0x95 instruction, which will be an arch
> specific instructions, We can have arch abstraction to generated
> such instruction for "exit" opcode. And use common code to fill the instructions
> to exit from EPBF program provided by arch code.
> 
> Does that makes sense?

There is not much point in doing it.
What we need is a guarantee that after some point the data-path won't try to access
the given bpf context, so we can destroy it.
Konstantin

> 
> 
> > Are you suggesting to change bpf_exec() and bpf_jit() to make them execute sync primitives in an arch specific manner?
> > But some users probably will use bpf_exec/jitted program in the environment that wouldn't require such synchronization.
> > For these people it would be just unnecessary slowdown.
> >
> > If you are looking for a ways to replace 'smp_rmb'  in bpf_eth_cbi_unuse() with something arch specific, then
> > I can make cbi_inuse/cbi_unuse - arch specific with keeping current implementation as generic one.
> > Would that help?
> >
> > Konstantin
> >
> > >
> > > > +	cbi->use++;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Waits till datapath finished using given callback.
> > > > + */
> > > > +static void
> > > > +bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
> > > > +{
> > > > +	uint32_t nuse, puse;
> > > > +
> > > > +	/* make sure all previous loads and stores are completed */
> > > > +	rte_smp_mb();
> > > > +
> > > > +	puse = cbi->use;
> > > > +
> > > > +	/* in use, busy wait till current RX/TX iteration is finished */
> > > > +	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
> > > > +		do {
> > > > +			rte_pause();
> > > > +			rte_compiler_barrier();
> > > > +			nuse = cbi->use;
> > > > +		} while (nuse == puse);
> > > > +	}
> > > > +}

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-04-04 11:39         ` Ananyev, Konstantin
@ 2018-04-04 17:51           ` Jerin Jacob
  2018-04-05 12:51             ` Ananyev, Konstantin
  0 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-04-04 17:51 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

-----Original Message-----
> Date: Wed, 4 Apr 2018 11:39:59 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF
>  filters
> 

Hi Konstantin,

> 
> > > >
> > > > > +/*
> > > > > + * Marks given callback as used by datapath.
> > > > > + */
> > > > > +static __rte_always_inline void
> > > > > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > > > > +{
> > > > > +	cbi->use++;
> > > > > +	/* make sure no store/load reordering could happen */
> > > > > +	rte_smp_mb();
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Marks given callback list as not used by datapath.
> > > > > + */
> > > > > +static __rte_always_inline void
> > > > > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > > > > +{
> > > > > +	/* make sure all previous loads are completed */
> > > > > +	rte_smp_rmb();
> > > >
> > > > We earlier discussed this barrier. Will following scheme works out to
> > > > fix the bpf_eth_cbi_wait() without cbi->use scheme?
> > > >
> > > > #ie. We need to exit from jitted or interpreted code irrespective of its
> > > > state. IMO, We can do that by an _arch_ specific function to fill jitted  memory with
> > > > "exit" opcode(value:0x95, exit, return r0),so that above code needs to be come out i n anycase,
> > > > on next instruction execution. I know, jitted memory is read-only in your
> > > > design, I think, we can change the permission to "write" to the fill
> > > > "exit" opcode(both jitted or interpreted case) for termination.
> > > >
> > > > What you think?
> > >
> > > Not sure I understand your proposal...
> > 
> > If I understand it correctly, bpf_eth_cbi_wait() is used to _wait_ until
> > eBPF program exits? Right?
> 
> Kind off, but not only. 
> After  bpf_eth_cbi_wait() finishes it is guaranteed that data-path wouldn't try
> to access the resources associated with given bpf_eth_cbi (bpf, jit), so we
> can proceed with freeing them. 
> 
> > . Instead of using bpf_eth_cbi_[un]use()
> > scheme which involves the barrier. How about,
> > 
> > in bpf_eth_cbi_wait()
> > {
> > 
> > memset the EBPF "program memory" with 0x95 value. Which is an "exit" and
> > "return r0" EPBF opcode, Which makes program to terminate by it own
> > as on 0x95 instruction, CPU decodes and it gets out from EPBF program.
> > 
> > }
> > 
> > In jitted case, it is not 0x95 instruction, which will be an arch
> > specific instructions, We can have arch abstraction to generated
> > such instruction for "exit" opcode. And use common code to fill the instructions
> > to exit from EPBF program provided by arch code.
> > 
> > Does that makes sense?
> 
> There is no much point in doing it.

It helps to avoid the barrier in the non-x86 case, right? So it is a useful
thing, right? And it avoids the extra fast-path logic of incrementing/decrementing
the "inuse" counters on all archs.

> What we need is a guarantee that after some point data-path wouldn't try to access
> given bpf context, so we can destroy it.

Is there any reason why you think the proposed solution above won't
guarantee the termination of the eBPF program?

i.e.,
1) memset the eBPF memory to the "exit" instruction
2) wait for N instruction cycles for the program to terminate,
where N is the maximum number of cycles required to complete an eBPF instruction.

OR

Are you saying that eBPF program termination alone is not enough and there are other things to
relinquish in order to free the bpf context? If so, what else needs to be
relinquished apart from eBPF program termination?

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-04-04 17:51           ` Jerin Jacob
@ 2018-04-05 12:51             ` Ananyev, Konstantin
  2018-04-09  4:38               ` Jerin Jacob
  0 siblings, 1 reply; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-04-05 12:51 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev


Hi Jerin,

> 
> >
> > > > >
> > > > > > +/*
> > > > > > + * Marks given callback as used by datapath.
> > > > > > + */
> > > > > > +static __rte_always_inline void
> > > > > > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > > > > > +{
> > > > > > +	cbi->use++;
> > > > > > +	/* make sure no store/load reordering could happen */
> > > > > > +	rte_smp_mb();
> > > > > > +}
> > > > > > +
> > > > > > +/*
> > > > > > + * Marks given callback list as not used by datapath.
> > > > > > + */
> > > > > > +static __rte_always_inline void
> > > > > > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > > > > > +{
> > > > > > +	/* make sure all previous loads are completed */
> > > > > > +	rte_smp_rmb();
> > > > >
> > > > > We earlier discussed this barrier. Will following scheme works out to
> > > > > fix the bpf_eth_cbi_wait() without cbi->use scheme?
> > > > >
> > > > > #ie. We need to exit from jitted or interpreted code irrespective of its
> > > > > state. IMO, We can do that by an _arch_ specific function to fill jitted  memory with
> > > > > "exit" opcode(value:0x95, exit, return r0),so that above code needs to be come out i n anycase,
> > > > > on next instruction execution. I know, jitted memory is read-only in your
> > > > > design, I think, we can change the permission to "write" to the fill
> > > > > "exit" opcode(both jitted or interpreted case) for termination.
> > > > >
> > > > > What you think?
> > > >
> > > > Not sure I understand your proposal...
> > >
> > > If I understand it correctly, bpf_eth_cbi_wait() is used to _wait_ until
> > > eBPF program exits? Right?
> >
> > Kind off, but not only.
> > After  bpf_eth_cbi_wait() finishes it is guaranteed that data-path wouldn't try
> > to access the resources associated with given bpf_eth_cbi (bpf, jit), so we
> > can proceed with freeing them.
> >
> > > . Instead of using bpf_eth_cbi_[un]use()
> > > scheme which involves the barrier. How about,
> > >
> > > in bpf_eth_cbi_wait()
> > > {
> > >
> > > memset the EBPF "program memory" with 0x95 value. Which is an "exit" and
> > > "return r0" EPBF opcode, Which makes program to terminate by it own
> > > as on 0x95 instruction, CPU decodes and it gets out from EPBF program.
> > >
> > > }
> > >
> > > In jitted case, it is not 0x95 instruction, which will be an arch
> > > specific instructions, We can have arch abstraction to generated
> > > such instruction for "exit" opcode. And use common code to fill the instructions
> > > to exit from EPBF program provided by arch code.
> > >
> > > Does that makes sense?
> >
> > There is no much point in doing it.
> 
> It helps in avoiding the barrier on non x86 case. Right? 

Nope, I believe it doesn't, see below.

> So it is useful
> thing. Right? and avoid the extra logic in fastpath increment/decrement
> "inuse" counters for all the archs.
> 
> > What we need is a guarantee that after some point data-path wouldn't try to access
> > given bpf context, so we can destroy it.
> 
> Is there any reason why you think, above proposed solution wont
> guarantee the termination eBPF program?
> 
> -ie,
> 1)memset to "exit" instruction in eBPF memory

Even when the code is just interpreted (bpf_exec()) there will still be cases
where you need to synchronize the executing thread with the thread updating the code
(32-bit systems, the 16B LDDW instruction, etc.).
With JIT-ed code things become much more complicated (icache, variable-size instructions)
and I can't see how it could be done without extra synchronization between the execute and update threads.

> 2)Wait for N instruction cycles to terminate the program.

There is no way to guarantee that execution would take exactly N cycles.
The execution thread could be preempted/interrupted, it could be executing a syscall,
or there could be a CPU stall (slow memory access, CPU frequency change, etc.).

So even if we solve all the problems with 1), it wouldn't buy us a safe solution.

Actually quite a lot of research has been done on how to speed up slow/fast path synchronization
in user-space:

https://lwn.net/Articles/573424/
some theory behind it:
https://lttng.org/files/thesis/desnoyers-dissertation-2009-12-v27.pdf (chapter 6)
They even introduced a new syscall in Linux for these purposes:
http://man7.org/linux/man-pages/man2/membarrier.2.html

I thought about something similar based on membarrier(), but it has a
few implications:
1. Only recent Linux kernels (4.14+).
2. Not sure whether it is available on non-x86 platforms.
3. Need to measure the real impact.

Because of 1) and 2) we would probably need both mb() and membarrier() code paths.
Anyway, it is probably worth investigating as a more generic solution,
but I suppose it is out of scope for this patch.
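
For reference, a minimal sketch of a membarrier(2) based slow path (editor's
sketch; the constants come from <linux/membarrier.h>, and the registration
required by the *_PRIVATE_EXPEDITED variants, error handling and fallbacks
are all omitted):

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/membarrier.h>

/* slow path only: force a barrier on all running threads of the process,
 * so bpf_eth_cbi_inuse()/_unuse() could rely on compiler barriers alone */
static inline int
bpf_eth_sys_membarrier(void)
{
	return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
}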
Konstantin

> Where N can maximum cycles required to complete an eBPF instruction.
> 
> OR
> 
> Are you recommending the eBPF program termination is not just enough, there are others stuffs to
> relinquish in order to free the bpf context? if so, what other stuffs to
> relinquish apart from eBPF program termination.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-09  4:54     ` Jerin Jacob
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (9 subsequent siblings)
  10 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

BPF is used quite intensively inside Linux (and BSD) kernels
for various different purposes and proved to be extremely useful.

BPF inside DPDK might also be used in a lot of places
for a lot of similar things.
 As an example to:
- packet filtering/tracing (aka tcpdump)
- packet classification
- statistics collection
- HW/PMD live-system debugging/prototyping - trace HW descriptors,
  internal PMD SW state, etc.
- Come up with your own idea

All of that in a dynamic, user-defined and extensible manner.

So these series introduce new library - librte_bpf.
librte_bpf provides API to load and execute BPF bytecode within
user-space dpdk app.
It supports basic set of features from eBPF spec.
Also it introduces basic framework to load/unload BPF-based filters
on eth devices (right now via SW RX/TX callbacks).

How to try it:
===============

1) run testpmd as usual and start your favorite forwarding case.
2) build bpf program you'd like to load
(you'll need clang v3.7 or above):
$ cd test/bpf
$ clang -O2 -target bpf -c t1.c

3) load bpf program(s):
testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>

<load-flags>:  [-][J][M]
J - use JIT generated native code, otherwise BPF interpreter will be used.
M - assume input parameter is a pointer to rte_mbuf,
    otherwise assume it is a pointer to first segment's data.

Few examples:

# to load (not JITed) dummy.o at TX queue 0, port 0:
testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
#to load (and JIT compile) t1.o at RX queue 0, port 1:
testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o

#to load and JIT t3.o (note that it expects mbuf as an input):
testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o

4) observe changed traffic behavior
Let say with the examples above:
  - dummy.o  does literally nothing, so no changes should be here,
    except some possible slowdown.
 - t1.o - should force to drop all packets that doesn't match:
   'dst 1.2.3.4 && udp && dst port 5000' filter.
 - t3.o - should dump to stdout ARP packets.

5) unload some or all bpf programs:
testpmd> bpf-unload tx 0 0

6) continue with step 3) or exit

Not currently supported features:
=================================
- cBPF
- tail-pointer call
- eBPF MAP
- JIT for non X86_64 targets
- skb

v2:
 - add meson build
 - add freebsd build
 - use new logging API
 - using rte_malloc() for cbi allocation
 - add extra logic into bpf_validate()

v3:
 - add new test-case for it
 - update docs
 - update MAINTAINERS

Konstantin Ananyev (10):
  net: move BPF related definitions into librte_net
  bpf: add BPF loading and execution framework
  bpf: add more logic into bpf_validate()
  bpf: add JIT compilation for x86_64 ISA
  bpf: introduce basic RX/TX BPF filters
  testpmd: new commands to load/unload BPF filters
  test: add few eBPF samples
  test: introduce functional test for librte_bpf
  doc: add librte_bpf related info
  MAINTAINERS: add librte_bpf related info

 MAINTAINERS                        |    4 +
 app/test-pmd/bpf_sup.h             |   25 +
 app/test-pmd/cmdline.c             |  146 ++++
 app/test-pmd/meson.build           |    2 +-
 config/common_base                 |    5 +
 doc/api/doxy-api-index.md          |    3 +-
 doc/api/doxy-api.conf              |    1 +
 doc/guides/prog_guide/bpf_lib.rst  |   37 +
 doc/guides/prog_guide/index.rst    |    1 +
 drivers/net/tap/tap_bpf.h          |   80 +--
 lib/Makefile                       |    2 +
 lib/librte_bpf/Makefile            |   35 +
 lib/librte_bpf/bpf.c               |   64 ++
 lib/librte_bpf/bpf_exec.c          |  452 ++++++++++++
 lib/librte_bpf/bpf_impl.h          |   41 ++
 lib/librte_bpf/bpf_jit_x86.c       | 1368 ++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_load.c          |  386 ++++++++++
 lib/librte_bpf/bpf_pkt.c           |  607 ++++++++++++++++
 lib/librte_bpf/bpf_validate.c      | 1166 ++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   24 +
 lib/librte_bpf/rte_bpf.h           |  170 +++++
 lib/librte_bpf/rte_bpf_ethdev.h    |  102 +++
 lib/librte_bpf/rte_bpf_version.map |   16 +
 lib/librte_net/Makefile            |    1 +
 lib/librte_net/bpf_def.h           |  370 ++++++++++
 lib/librte_net/meson.build         |    3 +-
 lib/meson.build                    |    2 +-
 mk/rte.app.mk                      |    2 +
 test/bpf/dummy.c                   |   20 +
 test/bpf/mbuf.h                    |  578 +++++++++++++++
 test/bpf/t1.c                      |   52 ++
 test/bpf/t2.c                      |   31 +
 test/bpf/t3.c                      |   36 +
 test/test/Makefile                 |    2 +
 test/test/meson.build              |    2 +
 test/test/test_bpf.c               |  633 +++++++++++++++++
 36 files changed, 6386 insertions(+), 83 deletions(-)
 create mode 100644 app/test-pmd/bpf_sup.h
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map
 create mode 100644 lib/librte_net/bpf_def.h
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c
 create mode 100644 test/test/test_bpf.c

-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code Konstantin Ananyev
                       ` (10 more replies)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 02/10] bpf: add BPF loading and execution framework Konstantin Ananyev
                     ` (8 subsequent siblings)
  10 siblings, 11 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, olivier.matz, pascal.mazon

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/tap/tap_bpf.h  |  80 +---------
 lib/librte_net/Makefile    |   1 +
 lib/librte_net/bpf_def.h   | 370 +++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_net/meson.build |   3 +-
 4 files changed, 374 insertions(+), 80 deletions(-)
 create mode 100644 lib/librte_net/bpf_def.h

diff --git a/drivers/net/tap/tap_bpf.h b/drivers/net/tap/tap_bpf.h
index 1a70ffe21..baaf3b25c 100644
--- a/drivers/net/tap/tap_bpf.h
+++ b/drivers/net/tap/tap_bpf.h
@@ -6,85 +6,7 @@
 #define __TAP_BPF_H__
 
 #include <tap_autoconf.h>
-
-/* Do not #include <linux/bpf.h> since eBPF must compile on different
- * distros which may include partial definitions for eBPF (while the
- * kernel itself may support eBPF). Instead define here all that is needed
- */
-
-/* BPF_MAP_UPDATE_ELEM command flags */
-#define	BPF_ANY	0 /* create a new element or update an existing */
-
-/* BPF architecture instruction struct */
-struct bpf_insn {
-	__u8	code;
-	__u8	dst_reg:4;
-	__u8	src_reg:4;
-	__s16	off;
-	__s32	imm; /* immediate value */
-};
-
-/* BPF program types */
-enum bpf_prog_type {
-	BPF_PROG_TYPE_UNSPEC,
-	BPF_PROG_TYPE_SOCKET_FILTER,
-	BPF_PROG_TYPE_KPROBE,
-	BPF_PROG_TYPE_SCHED_CLS,
-	BPF_PROG_TYPE_SCHED_ACT,
-};
-
-/* BPF commands types */
-enum bpf_cmd {
-	BPF_MAP_CREATE,
-	BPF_MAP_LOOKUP_ELEM,
-	BPF_MAP_UPDATE_ELEM,
-	BPF_MAP_DELETE_ELEM,
-	BPF_MAP_GET_NEXT_KEY,
-	BPF_PROG_LOAD,
-};
-
-/* BPF maps types */
-enum bpf_map_type {
-	BPF_MAP_TYPE_UNSPEC,
-	BPF_MAP_TYPE_HASH,
-};
-
-/* union of anonymous structs used with TAP BPF commands */
-union bpf_attr {
-	/* BPF_MAP_CREATE command */
-	struct {
-		__u32	map_type;
-		__u32	key_size;
-		__u32	value_size;
-		__u32	max_entries;
-		__u32	map_flags;
-		__u32	inner_map_fd;
-	};
-
-	/* BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM commands */
-	struct {
-		__u32		map_fd;
-		__aligned_u64	key;
-		union {
-			__aligned_u64 value;
-			__aligned_u64 next_key;
-		};
-		__u64		flags;
-	};
-
-	/* BPF_PROG_LOAD command */
-	struct {
-		__u32		prog_type;
-		__u32		insn_cnt;
-		__aligned_u64	insns;
-		__aligned_u64	license;
-		__u32		log_level;
-		__u32		log_size;
-		__aligned_u64	log_buf;
-		__u32		kern_version;
-		__u32		prog_flags;
-	};
-} __attribute__((aligned(8)));
+#include <bpf_def.h>
 
 #ifndef __NR_bpf
 # if defined(__i386__)
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 95ff54900..52bb418b8 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -20,5 +20,6 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_esp
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += bpf_def.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/bpf_def.h b/lib/librte_net/bpf_def.h
new file mode 100644
index 000000000..3f4a5a3e7
--- /dev/null
+++ b/lib/librte_net/bpf_def.h
@@ -0,0 +1,370 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd.
+ */
+
+#ifndef _RTE_BPF_DEF_H_
+#define _RTE_BPF_DEF_H_
+
+#ifdef __linux__
+#include <linux/types.h>
+#else
+
+typedef uint8_t __u8;
+typedef uint16_t __u16;
+typedef uint32_t __u32;
+typedef uint64_t __u64;
+
+typedef int8_t __s8;
+typedef int16_t __s16;
+typedef int32_t __s32;
+
+#define __aligned_u64 __u64 __attribute__((aligned(8)))
+
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Do not #include <linux/bpf.h> since eBPF must compile on different
+ * distros which may include partial definitions for eBPF (while the
+ * kernel itself may support eBPF). Instead define here all that is needed
+ * by various DPDK components.
+ */
+
+/* Instruction classes */
+#define BPF_CLASS(code) ((code) & 0x07)
+#define		BPF_LD		0x00
+#define		BPF_LDX		0x01
+#define		BPF_ST		0x02
+#define		BPF_STX		0x03
+#define		BPF_ALU		0x04
+#define		BPF_JMP		0x05
+#define		BPF_RET		0x06
+#define		BPF_MISC        0x07
+
+/* ld/ldx fields */
+#define BPF_SIZE(code)  ((code) & 0x18)
+#define		BPF_W		0x00
+#define		BPF_H		0x08
+#define		BPF_B		0x10
+#define BPF_MODE(code)  ((code) & 0xe0)
+#define		BPF_IMM		0x00
+#define		BPF_ABS		0x20
+#define		BPF_IND		0x40
+#define		BPF_MEM		0x60
+#define		BPF_LEN		0x80
+#define		BPF_MSH		0xa0
+
+/* alu/jmp fields */
+#define BPF_OP(code)    ((code) & 0xf0)
+#define		BPF_ADD		0x00
+#define		BPF_SUB		0x10
+#define		BPF_MUL		0x20
+#define		BPF_DIV		0x30
+#define		BPF_OR		0x40
+#define		BPF_AND		0x50
+#define		BPF_LSH		0x60
+#define		BPF_RSH		0x70
+#define		BPF_NEG		0x80
+#define		BPF_MOD		0x90
+#define		BPF_XOR		0xa0
+
+#define		BPF_JA		0x00
+#define		BPF_JEQ		0x10
+#define		BPF_JGT		0x20
+#define		BPF_JGE		0x30
+#define		BPF_JSET        0x40
+#define BPF_SRC(code)   ((code) & 0x08)
+#define		BPF_K		0x00
+#define		BPF_X		0x08
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/* Extended instruction set based on top of classic BPF */
+
+/* instruction classes */
+#define BPF_ALU64	0x07	/* alu mode in double word width */
+
+/* ld/ldx fields */
+#define BPF_DW		0x18	/* double word */
+#define BPF_XADD	0xc0	/* exclusive add */
+
+/* alu/jmp fields */
+#define BPF_MOV		0xb0	/* mov reg to reg */
+#define BPF_ARSH	0xc0	/* sign extending arithmetic shift right */
+
+/* change endianness of a register */
+#define BPF_END		0xd0	/* flags for endianness conversion: */
+#define BPF_TO_LE	0x00	/* convert to little-endian */
+#define BPF_TO_BE	0x08	/* convert to big-endian */
+#define BPF_FROM_LE	BPF_TO_LE
+#define BPF_FROM_BE	BPF_TO_BE
+
+/* jmp encodings */
+#define BPF_JNE		0x50	/* jump != */
+#define BPF_JLT		0xa0	/* LT is unsigned, '<' */
+#define BPF_JLE		0xb0	/* LE is unsigned, '<=' */
+#define BPF_JSGT	0x60	/* SGT is signed '>', GT in x86 */
+#define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
+#define BPF_JSLT	0xc0	/* SLT is signed, '<' */
+#define BPF_JSLE	0xd0	/* SLE is signed, '<=' */
+#define BPF_CALL	0x80	/* function call */
+#define BPF_EXIT	0x90	/* function return */
+
+/* Register numbers */
+enum {
+	BPF_REG_0 = 0,
+	BPF_REG_1,
+	BPF_REG_2,
+	BPF_REG_3,
+	BPF_REG_4,
+	BPF_REG_5,
+	BPF_REG_6,
+	BPF_REG_7,
+	BPF_REG_8,
+	BPF_REG_9,
+	BPF_REG_10,
+	__MAX_BPF_REG,
+};
+
+/* BPF has 10 general purpose 64-bit registers and stack frame. */
+#define MAX_BPF_REG	__MAX_BPF_REG
+
+struct bpf_insn {
+	__u8	code;		/* opcode */
+	__u8	dst_reg:4;	/* dest register */
+	__u8	src_reg:4;	/* source register */
+	__s16	off;		/* signed offset */
+	__s32	imm;		/* signed immediate constant */
+};
+
+/* BPF syscall commands, see bpf(2) man-page for details. */
+enum bpf_cmd {
+	BPF_MAP_CREATE,
+	BPF_MAP_LOOKUP_ELEM,
+	BPF_MAP_UPDATE_ELEM,
+	BPF_MAP_DELETE_ELEM,
+	BPF_MAP_GET_NEXT_KEY,
+	BPF_PROG_LOAD,
+	BPF_OBJ_PIN,
+	BPF_OBJ_GET,
+	BPF_PROG_ATTACH,
+	BPF_PROG_DETACH,
+	BPF_PROG_TEST_RUN,
+	BPF_PROG_GET_NEXT_ID,
+	BPF_MAP_GET_NEXT_ID,
+	BPF_PROG_GET_FD_BY_ID,
+	BPF_MAP_GET_FD_BY_ID,
+	BPF_OBJ_GET_INFO_BY_FD,
+};
+
+enum bpf_map_type {
+	BPF_MAP_TYPE_UNSPEC,
+	BPF_MAP_TYPE_HASH,
+	BPF_MAP_TYPE_ARRAY,
+	BPF_MAP_TYPE_PROG_ARRAY,
+	BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	BPF_MAP_TYPE_PERCPU_HASH,
+	BPF_MAP_TYPE_PERCPU_ARRAY,
+	BPF_MAP_TYPE_STACK_TRACE,
+	BPF_MAP_TYPE_CGROUP_ARRAY,
+	BPF_MAP_TYPE_LRU_HASH,
+	BPF_MAP_TYPE_LRU_PERCPU_HASH,
+	BPF_MAP_TYPE_LPM_TRIE,
+	BPF_MAP_TYPE_ARRAY_OF_MAPS,
+	BPF_MAP_TYPE_HASH_OF_MAPS,
+	BPF_MAP_TYPE_DEVMAP,
+	BPF_MAP_TYPE_SOCKMAP,
+};
+
+enum bpf_prog_type {
+	BPF_PROG_TYPE_UNSPEC,
+	BPF_PROG_TYPE_SOCKET_FILTER,
+	BPF_PROG_TYPE_KPROBE,
+	BPF_PROG_TYPE_SCHED_CLS,
+	BPF_PROG_TYPE_SCHED_ACT,
+	BPF_PROG_TYPE_TRACEPOINT,
+	BPF_PROG_TYPE_XDP,
+	BPF_PROG_TYPE_PERF_EVENT,
+	BPF_PROG_TYPE_CGROUP_SKB,
+	BPF_PROG_TYPE_CGROUP_SOCK,
+	BPF_PROG_TYPE_LWT_IN,
+	BPF_PROG_TYPE_LWT_OUT,
+	BPF_PROG_TYPE_LWT_XMIT,
+	BPF_PROG_TYPE_SOCK_OPS,
+	BPF_PROG_TYPE_SK_SKB,
+};
+
+enum bpf_attach_type {
+	BPF_CGROUP_INET_INGRESS,
+	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_INET_SOCK_CREATE,
+	BPF_CGROUP_SOCK_OPS,
+	BPF_SK_SKB_STREAM_PARSER,
+	BPF_SK_SKB_STREAM_VERDICT,
+	__MAX_BPF_ATTACH_TYPE
+};
+
+#define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
+
+/* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
+ * to the given target_fd cgroup the descendent cgroup will be able to
+ * override effective bpf program that was inherited from this cgroup
+ */
+#define BPF_F_ALLOW_OVERRIDE	(1U << 0)
+
+/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
+ * verifier will perform strict alignment checking as if the kernel
+ * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
+ * and NET_IP_ALIGN defined to 2.
+ */
+#define BPF_F_STRICT_ALIGNMENT	(1U << 0)
+
+#define BPF_PSEUDO_MAP_FD	1
+
+/* flags for BPF_MAP_UPDATE_ELEM command */
+#define BPF_ANY		0 /* create new element or update existing */
+#define BPF_NOEXIST	1 /* create new element if it didn't exist */
+#define BPF_EXIST	2 /* update existing element */
+
+/* flags for BPF_MAP_CREATE command */
+#define BPF_F_NO_PREALLOC	(1U << 0)
+/* Instead of having one common LRU list in the
+ * BPF_MAP_TYPE_LRU_[PERCPU_]HASH map, use a percpu LRU list
+ * which can scale and perform better.
+ * Note, the LRU nodes (including free nodes) cannot be moved
+ * across different LRU lists.
+ */
+#define BPF_F_NO_COMMON_LRU	(1U << 1)
+/* Specify numa node during map creation */
+#define BPF_F_NUMA_NODE		(1U << 2)
+
+union bpf_attr {
+	struct { /* anonymous struct used by BPF_MAP_CREATE command */
+		__u32	map_type;	/* one of enum bpf_map_type */
+		__u32	key_size;	/* size of key in bytes */
+		__u32	value_size;	/* size of value in bytes */
+		__u32	max_entries;	/* max number of entries in a map */
+		__u32	map_flags;	/* BPF_MAP_CREATE related
+					 * flags defined above.
+					 */
+		__u32	inner_map_fd;	/* fd pointing to the inner map */
+		__u32	numa_node;	/* numa node (effective only if
+					 * BPF_F_NUMA_NODE is set).
+					 */
+	};
+
+	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
+		__u32		map_fd;
+		__aligned_u64	key;
+		union {
+			__aligned_u64 value;
+			__aligned_u64 next_key;
+		};
+		__u64		flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_LOAD command */
+		__u32		prog_type;	/* one of enum bpf_prog_type */
+		__u32		insn_cnt;
+		__aligned_u64	insns;
+		__aligned_u64	license;
+		__u32		log_level;
+		/* verbosity level of verifier */
+		__u32		log_size;	/* size of user buffer */
+		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		kern_version;
+		/* checked when prog_type=kprobe */
+		__u32		prog_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_* commands */
+		__aligned_u64	pathname;
+		__u32		bpf_fd;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
+		__u32		target_fd;
+		/* container object to attach to */
+		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		__u32		attach_type;
+		__u32		attach_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
+		__u32		prog_fd;
+		__u32		retval;
+		__u32		data_size_in;
+		__u32		data_size_out;
+		__aligned_u64	data_in;
+		__aligned_u64	data_out;
+		__u32		repeat;
+		__u32		duration;
+	} test;
+
+	struct { /* anonymous struct used by BPF_*_GET_*_ID */
+		union {
+			__u32		start_id;
+			__u32		prog_id;
+			__u32		map_id;
+		};
+		__u32		next_id;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+		__u32		bpf_fd;
+		__u32		info_len;
+		__aligned_u64	info;
+	} info;
+} __attribute__((aligned(8)));
+
+/* Generic BPF return codes which all BPF program types may support.
+ * The values are binary compatible with their TC_ACT_* counter-part to
+ * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
+ * programs.
+ *
+ * XDP is handled seprately, see XDP_*.
+ */
+enum bpf_ret_code {
+	BPF_OK = 0,
+	/* 1 reserved */
+	BPF_DROP = 2,
+	/* 3-6 reserved */
+	BPF_REDIRECT = 7,
+	/* >127 are reserved for prog type specific return codes */
+};
+
+enum sk_action {
+	SK_DROP = 0,
+	SK_PASS,
+};
+
+#define BPF_TAG_SIZE	8
+
+struct bpf_prog_info {
+	__u32 type;
+	__u32 id;
+	__u8  tag[BPF_TAG_SIZE];
+	__u32 jited_prog_len;
+	__u32 xlated_prog_len;
+	__aligned_u64 jited_prog_insns;
+	__aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+	__u32 type;
+	__u32 id;
+	__u32 key_size;
+	__u32 value_size;
+	__u32 max_entries;
+	__u32 map_flags;
+} __attribute__((aligned(8)));
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_DEF_H_ */
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 78c0f03e5..3acc1602a 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -12,7 +12,8 @@ headers = files('rte_ip.h',
 	'rte_ether.h',
 	'rte_gre.h',
 	'rte_net.h',
-	'rte_net_crc.h')
+	'rte_net_crc.h',
+	'bpf_def.h')
 
 sources = files('rte_arp.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread
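
As an aside, the encodings above compose as expected; for example, a trivial
"return 1" program built from them (editor's sketch, assuming bpf_def.h is on
the include path as installed by this patch):

#include <bpf_def.h>

/* r0 = 1; exit  --  note that (BPF_JMP | BPF_EXIT) == 0x95,
 * the "exit" opcode discussed elsewhere in this thread */
static const struct bpf_insn ret_one[] = {
	{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 1 },
	{ .code = BPF_JMP | BPF_EXIT },
};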

* [dpdk-dev] [PATCH v3 02/10] bpf: add BPF loading and execution framework
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 03/10] bpf: add more logic into bpf_validate() Konstantin Ananyev
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space dpdk based applications.
It supports basic set of features from eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds a dependency on libelf.
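
For illustration, the basic usage flow looks roughly like this (editor's sketch;
rte_bpf_elf_load() and the rte_bpf_prm contents are simplified and may not match
the final API exactly):

#include <rte_bpf.h>

static uint64_t
run_filter_once(void *pkt)
{
	struct rte_bpf_prm prm = {
		.xsym = NULL,	/* no external symbols for a simple filter */
		.nb_xsym = 0,
	};
	struct rte_bpf *bpf;
	uint64_t rc;

	/* load the ".text" section of a clang-built eBPF object file */
	bpf = rte_bpf_elf_load(&prm, "t1.o", ".text");
	if (bpf == NULL)
		return 0;

	rc = rte_bpf_exec(bpf, pkt);	/* pkt ends up in R1 */
	rte_bpf_destroy(bpf);
	return rc;
}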

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  59 +++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  41 ++++
 lib/librte_bpf/bpf_load.c          | 386 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/meson.build         |  18 ++
 lib/librte_bpf/rte_bpf.h           | 170 ++++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 lib/meson.build                    |   2 +-
 mk/rte.app.mk                      |   2 +
 13 files changed, 1233 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index c09c7cf88..d68c2e211 100644
--- a/config/common_base
+++ b/config/common_base
@@ -821,3 +821,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..d7f68c017
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+int rte_bpf_logtype;
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+	if (rc != 0)
+		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
+
+RTE_INIT(rte_bpf_init_log);
+
+static void
+rte_bpf_init_log(void)
+{
+	rte_bpf_logtype = rte_log_register("lib.bpf");
+	if (rte_bpf_logtype >= 0)
+		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..0382ade98
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_BPF_LOG(ERR, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..5d7e65c31
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+extern int rte_bpf_logtype;
+
+#define	RTE_BPF_LOG(lvl, fmt, args...) \
+	rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..3c7279a6c
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,386 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+/* To overcome a compatibility issue: EM_BPF is not defined in older elf.h headers */
+#ifndef EM_BPF
+#define	EM_BPF	247
+#endif
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address (index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for functions we just need an index into our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variables we need to store the absolute address */
+	else {
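+		/*
+		 * BPF_LD | BPF_IMM | BPF_DW occupies two instruction slots:
+		 * the low 32 bits of the address go into ins[idx].imm,
+		 * the high 32 bits into ins[idx + 1].imm.
+		 */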
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm =
+			(uint64_t)(uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr *eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_BPF_LOG(ERR,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz = prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
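+	/*
+	 * Use one anonymous mapping for everything:
+	 * [struct rte_bpf | copy of xsym[] | copy of ins[]],
+	 * so the whole context can later be made read-only with
+	 * a single mprotect() call.
+	 */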
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..1911e1381
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
new file mode 100644
index 000000000..05c48c7ff
--- /dev/null
+++ b/lib/librte_bpf/meson.build
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+sources = files('bpf.c',
+		'bpf_exec.c',
+		'bpf_load.c',
+		'bpf_validate.c')
+
+install_headers = files('rte_bpf.h')
+
+deps += ['mbuf', 'net']
+
+dep = dependency('libelf', required: false)
+if not dep.found()
+	build = false
+endif
+ext_deps += dep
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..825621404
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+/**
+ * @file
+ *
+ * RTE BPF support.
+ * librte_bpf provides a framework to load and execute eBPF bytecode
+ * inside user-space DPDK-based applications.
+ * It supports a basic set of features from the eBPF spec
+ * (https://www.kernel.org/doc/Documentation/networking/filter.txt).
+ */
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <bpf_def.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ * Use negative values for DPDK-specific prog-types, to make sure they will
+ * not interfere with the Linux-related ones.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF = INT32_MIN,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about eBPF code compiled into the native ISA.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *); /**< JIT-ed native code */
+	size_t sz;                /**< size of JIT-ed code */
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname of an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for the given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index ef6159170..7ff7aaaa5 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 258590819..405a13147 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 03/10] bpf: add more logic into bpf_validate()
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (2 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 02/10] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 04/10] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add checks for:
 - all instructions are valid ones
   (known opcodes, correct syntax, valid reg/off/imm values, etc.)
 - no unreachable instructions
 - no loops
 - basic stack boundary checks
 - division by zero (see the sketch below)

Still need to add checks for:
 - use/return only initialized registers and stack data
 - memory boundary violations
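
A quick illustration (a hedged sketch, not part of the patch; the helper
and variable names are made up for the example, and it assumes the usual
struct bpf_insn layout from bpf_def.h): with the new checks, a program
that divides by an immediate zero is rejected at load time, because the
BPF_DIV | BPF_K entry in the instruction check table requires imm >= 1.

#include <stdio.h>
#include <rte_bpf.h>
#include <rte_errno.h>

static const struct bpf_insn bad_prog[] = {
	{ .code = (BPF_ALU64 | BPF_MOV | BPF_K), .dst_reg = BPF_REG_0, .imm = 1 },
	/* imm == 0 violates the imm.min == 1 constraint for BPF_DIV | BPF_K */
	{ .code = (BPF_ALU64 | BPF_DIV | BPF_K), .dst_reg = BPF_REG_0, .imm = 0 },
	{ .code = (BPF_JMP | BPF_EXIT), },
};

static void
try_load_bad_prog(void)
{
	const struct rte_bpf_prm prm = {
		.ins = bad_prog,
		.nb_ins = RTE_DIM(bad_prog),
		.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
	};

	/* expected to return NULL with rte_errno set to EINVAL */
	if (rte_bpf_load(&prm) == NULL)
		printf("rejected: %s\n", rte_strerror(rte_errno));
}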

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/bpf_validate.c | 1163 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 1137 insertions(+), 26 deletions(-)

diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
index 1911e1381..816aa519a 100644
--- a/lib/librte_bpf/bpf_validate.c
+++ b/lib/librte_bpf/bpf_validate.c
@@ -14,42 +14,1153 @@
 
 #include "bpf_impl.h"
 
+/* possible instruction node colour */
+enum {
+	WHITE,
+	GREY,
+	BLACK,
+	MAX_NODE_COLOUR
+};
+
+/* possible edge types */
+enum {
+	UNKNOWN_EDGE,
+	TREE_EDGE,
+	BACK_EDGE,
+	CROSS_EDGE,
+	MAX_EDGE_TYPE
+};
+
+struct bpf_reg_state {
+	uint64_t val;
+};
+
+struct bpf_eval_state {
+	struct bpf_reg_state rs[MAX_BPF_REG];
+};
+
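+/*
+ * Each node of the CFG has at most two outgoing edges:
+ * a conditional jump has a "taken" and a "not taken" successor,
+ * all other instructions have at most one.
+ */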
+#define	MAX_EDGES	2
+
+struct inst_node {
+	uint8_t colour;
+	uint8_t nb_edge:4;
+	uint8_t cur_edge:4;
+	uint8_t edge_type[MAX_EDGES];
+	uint32_t edge_dest[MAX_EDGES];
+	uint32_t prev_node;
+	struct bpf_eval_state *evst;
+};
+
+struct bpf_verifier {
+	const struct rte_bpf_prm *prm;
+	struct inst_node *in;
+	int32_t stack_sz;
+	uint32_t nb_nodes;
+	uint32_t nb_jcc_nodes;
+	uint32_t node_colour[MAX_NODE_COLOUR];
+	uint32_t edge_type[MAX_EDGE_TYPE];
+	struct bpf_eval_state *evst;
+	struct {
+		uint32_t num;
+		uint32_t cur;
+		struct bpf_eval_state *ent;
+	} evst_pool;
+};
+
+struct bpf_ins_check {
+	struct {
+		uint16_t dreg;
+		uint16_t sreg;
+	} mask;
+	struct {
+		uint16_t min;
+		uint16_t max;
+	} off;
+	struct {
+		uint32_t min;
+		uint32_t max;
+	} imm;
+	const char * (*check)(const struct bpf_insn *);
+	const char * (*eval)(struct bpf_verifier *, const struct bpf_insn *);
+};
+
+#define	ALL_REGS	RTE_LEN2MASK(MAX_BPF_REG, uint16_t)
+#define	WRT_REGS	RTE_LEN2MASK(BPF_REG_10, uint16_t)
+#define	ZERO_REG	RTE_LEN2MASK(BPF_REG_1, uint16_t)
+
 /*
- * dummy one for now, need more work.
+ * check and evaluate functions for particular instruction types.
  */
-int
-bpf_validate(struct rte_bpf *bpf)
+
+static const char *
+check_alu_bele(const struct bpf_insn *ins)
+{
+	if (ins->imm != 16 && ins->imm != 32 && ins->imm != 64)
+		return "invalid imm field";
+	return NULL;
+}
+
+static const char *
+eval_stack(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	int32_t ofs;
+
+	ofs = ins->off;
+
+	if (ofs >= 0 || ofs < -MAX_BPF_STACK_SIZE)
+		return "stack boundary violation";
+
+	ofs = -ofs;
+	bvf->stack_sz = RTE_MAX(bvf->stack_sz, ofs);
+	return NULL;
+}
+
+static const char *
+eval_store(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	if (ins->dst_reg == BPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_load(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	if (ins->src_reg == BPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_call(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	uint32_t idx;
+
+	idx = ins->imm;
+
+	if (idx >= bvf->prm->nb_xsym ||
+			bvf->prm->xsym[idx].type != RTE_BPF_XTYPE_FUNC)
+		return "invalid external function index";
+	return NULL;
+}
+
+/*
+ * validate parameters for each instruction type.
+ */
+static const struct bpf_ins_check ins_chk[UINT8_MAX] = {
+	/* ALU IMM 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU IMM 64-bit instructions */
+	[(BPF_ALU64 | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_ARSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU64 | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU REG 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_END | BPF_TO_BE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	[(BPF_ALU | BPF_END | BPF_TO_LE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	/* ALU REG 64-bit instructions */
+	[(BPF_ALU64 | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* load instructions */
+	[(BPF_LDX | BPF_MEM | BPF_B)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_H)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_W)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_DW)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	/* load 64 bit immediate value */
+	[(BPF_LD | BPF_IMM | BPF_DW)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* store REG instructions */
+	[(BPF_STX | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* atomic add instructions */
+	[(BPF_STX | BPF_XADD | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_XADD | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* store IMM instructions */
+	[(BPF_ST | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	/* jump instruction */
+	[(BPF_JMP | BPF_JA)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* jcc IMM instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JNE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* jcc REG instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JNE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* call instruction */
+	[(BPF_JMP | BPF_CALL)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_call,
+	},
+	/* ret instruction */
+	[(BPF_JMP | BPF_EXIT)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+};
+
+/*
+ * make sure that instruction syntax is valid,
+ * and its fields don't violate particular instruction type restrictions.
+ */
+static const char *
+check_syntax(const struct bpf_insn *ins)
+{
+
+	uint8_t op;
+	uint16_t off;
+	uint32_t imm;
+
+	op = ins->code;
+
+	if (ins_chk[op].mask.dreg == 0)
+		return "invalid opcode";
+
+	if ((ins_chk[op].mask.dreg & 1 << ins->dst_reg) == 0)
+		return "invalid dst-reg field";
+
+	if ((ins_chk[op].mask.sreg & 1 << ins->src_reg) == 0)
+		return "invalid src-reg field";
+
+	off = ins->off;
+	if (ins_chk[op].off.min > off || ins_chk[op].off.max < off)
+		return "invalid off field";
+
+	imm = ins->imm;
+	if (ins_chk[op].imm.min > imm || ins_chk[op].imm.max < imm)
+		return "invalid imm field";
+
+	if (ins_chk[op].check != NULL)
+		return ins_chk[op].check(ins);
+
+	return NULL;
+}
+
+/*
+ * helper function, return instruction index for the given node.
+ */
+static uint32_t
+get_node_idx(const struct bpf_verifier *bvf, const struct inst_node *node)
 {
-	int32_t rc, ofs, stack_sz;
-	uint32_t i, op, dr;
+	return node - bvf->in;
+}
+
+/*
+ * helper function, used to walk through constructed CFG.
+ */
+static struct inst_node *
+get_next_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	uint32_t ce, ne, dst;
+
+	ne = node->nb_edge;
+	ce = node->cur_edge;
+	if (ce == ne)
+		return NULL;
+
+	node->cur_edge++;
+	dst = node->edge_dest[ce];
+	return bvf->in + dst;
+}
+
+static void
+set_node_colour(struct bpf_verifier *bvf, struct inst_node *node,
+	uint32_t new)
+{
+	uint32_t prev;
+
+	prev = node->colour;
+	node->colour = new;
+
+	bvf->node_colour[prev]--;
+	bvf->node_colour[new]++;
+}
+
+/*
+ * helper function, add new edge between two nodes.
+ */
+static int
+add_edge(struct bpf_verifier *bvf, struct inst_node *node, uint32_t nidx)
+{
+	uint32_t ne;
+
+	if (nidx > bvf->prm->nb_ins) {
+		RTE_BPF_LOG(ERR, "%s: program boundary violation at pc: %u, "
+			"next pc: %u\n",
+			__func__, get_node_idx(bvf, node), nidx);
+		return -EINVAL;
+	}
+
+	ne = node->nb_edge;
+	if (ne >= RTE_DIM(node->edge_dest)) {
+		RTE_BPF_LOG(ERR, "%s: internal error at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -EINVAL;
+	}
+
+	node->edge_dest[ne] = nidx;
+	node->nb_edge = ne + 1;
+	return 0;
+}
+
+/*
+ * helper function, determine type of edge between two nodes.
+ */
+static void
+set_edge_type(struct bpf_verifier *bvf, struct inst_node *node,
+	const struct inst_node *next)
+{
+	uint32_t ce, clr, type;
+
+	ce = node->cur_edge - 1;
+	clr = next->colour;
+
+	type = UNKNOWN_EDGE;
+
+	if (clr == WHITE)
+		type = TREE_EDGE;
+	else if (clr == GREY)
+		type = BACK_EDGE;
+	else if (clr == BLACK)
+		/*
+		 * in fact it could be either a forward or a cross edge,
+		 * but for now, we don't need to distinguish between them.
+		 */
+		type = CROSS_EDGE;
+
+	node->edge_type[ce] = type;
+	bvf->edge_type[type]++;
+}
+
+static struct inst_node *
+get_prev_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	return bvf->in + node->prev_node;
+}
+
+/*
+ * Depth-First Search (DFS) through previously constructed
+ * Control Flow Graph (CFG).
+ * Information collected along this path is used later
+ * to determine whether there are any loops and/or unreachable instructions.
+ */
+static void
+dfs(struct bpf_verifier *bvf)
+{
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	while (node != NULL) {
+
+		if (node->colour == WHITE)
+			set_node_colour(bvf, node, GREY);
+
+		if (node->colour == GREY) {
+
+			/* find next unprocessed child node */
+			do {
+				next = get_next_node(bvf, node);
+				if (next == NULL)
+					break;
+				set_edge_type(bvf, node, next);
+			} while (next->colour != WHITE);
+
+			if (next != NULL) {
+				/* proceed with next child */
+				next->prev_node = get_node_idx(bvf, node);
+				node = next;
+			} else {
+				/*
+				 * finished with current node and all its kids,
+				 * proceed with parent
+				 */
+				set_node_colour(bvf, node, BLACK);
+				node->cur_edge = 0;
+				node = get_prev_node(bvf, node);
+			}
+		} else
+			node = NULL;
+	}
+}
+
+/*
+ * report unreachable instructions.
+ */
+static void
+log_unreachable(const struct bpf_verifier *bvf)
+{
+	uint32_t i;
+	struct inst_node *node;
 	const struct bpf_insn *ins;
 
-	rc = 0;
-	stack_sz = 0;
-	for (i = 0; i != bpf->prm.nb_ins; i++) {
-
-		ins = bpf->prm.ins + i;
-		op = ins->code;
-		dr = ins->dst_reg;
-		ofs = ins->off;
-
-		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
-				dr == BPF_REG_10) {
-			ofs -= sizeof(uint64_t);
-			stack_sz = RTE_MIN(ofs, stack_sz);
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		ins = bvf->prm->ins + i;
+
+		if (node->colour == WHITE &&
+				ins->code != (BPF_LD | BPF_IMM | BPF_DW))
+			RTE_BPF_LOG(ERR, "unreachable code at pc: %u;\n", i);
+	}
+}
+
+/*
+ * report loops detected.
+ */
+static void
+log_loop(const struct bpf_verifier *bvf)
+{
+	uint32_t i, j;
+	struct inst_node *node;
+
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		if (node->colour != BLACK)
+			continue;
+
+		for (j = 0; j != node->nb_edge; j++) {
+			if (node->edge_type[j] == BACK_EDGE)
+				RTE_BPF_LOG(ERR,
+					"loop at pc:%u --> pc:%u;\n",
+					i, node->edge_dest[j]);
 		}
 	}
+}
+
+/*
+ * First pass goes through all instructions in the set, checks that each
+ * instruction is a valid one (correct syntax, valid field values, etc.)
+ * and constructs a control flow graph (CFG).
+ * Then a depth-first search is performed over the constructed graph.
+ * Programs with unreachable instructions and/or loops will be rejected.
+ */
+static int
+validate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t i;
+	struct inst_node *node;
+	const struct bpf_insn *ins;
+	const char *err;
+
+	rc = 0;
+	for (i = 0; i < bvf->prm->nb_ins; i++) {
+
+		ins = bvf->prm->ins + i;
+		node = bvf->in + i;
 
-	if (stack_sz != 0) {
-		stack_sz = -stack_sz;
-		if (stack_sz > MAX_BPF_STACK_SIZE)
-			rc = -ERANGE;
-		else
-			bpf->stack_sz = stack_sz;
+		err = check_syntax(ins);
+		if (err != NULL) {
+			RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+				__func__, err, i);
+			rc |= -EINVAL;
+		}
+
+		/*
+		 * construct CFG: jcc nodes have two outgoing edges,
+		 * 'exit' nodes have none, all other nodes have exactly one
+		 * outgoing edge.
+		 */
+		switch (ins->code) {
+		case (BPF_JMP | BPF_EXIT):
+			break;
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			rc |= add_edge(bvf, node, i + 1);
+			bvf->nb_jcc_nodes++;
+			break;
+		case (BPF_JMP | BPF_JA):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			rc |= add_edge(bvf, node, i + 2);
+			i++;
+			break;
+		default:
+			rc |= add_edge(bvf, node, i + 1);
+			break;
+		}
+
+		bvf->nb_nodes++;
+		bvf->node_colour[WHITE]++;
 	}
 
 	if (rc != 0)
-		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
-			__func__, bpf, rc);
+		return rc;
+
+	dfs(bvf);
+
+	RTE_BPF_LOG(INFO, "%s(%p) stats:\n"
+		"nb_nodes=%u;\n"
+		"nb_jcc_nodes=%u;\n"
+		"node_colour={[WHITE]=%u, [GREY]=%u, [BLACK]=%u};\n"
+		"edge_type={[UNKNOWN]=%u, [TREE]=%u, [BACK]=%u, [CROSS]=%u};\n",
+		__func__, bvf,
+		bvf->nb_nodes,
+		bvf->nb_jcc_nodes,
+		bvf->node_colour[WHITE], bvf->node_colour[GREY],
+			bvf->node_colour[BLACK],
+		bvf->edge_type[UNKNOWN_EDGE], bvf->edge_type[TREE_EDGE],
+		bvf->edge_type[BACK_EDGE], bvf->edge_type[CROSS_EDGE]);
+
+	if (bvf->node_colour[BLACK] != bvf->nb_nodes) {
+		RTE_BPF_LOG(ERR, "%s(%p) unreachable instructions;\n",
+			__func__, bvf);
+		log_unreachable(bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->node_colour[GREY] != 0 || bvf->node_colour[WHITE] != 0 ||
+			bvf->edge_type[UNKNOWN_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) DFS internal error;\n",
+			__func__, bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->edge_type[BACK_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) loops detected;\n",
+			__func__, bvf);
+		log_loop(bvf);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper functions get/free eval states.
+ */
+static struct bpf_eval_state *
+pull_eval_state(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->evst_pool.cur;
+	if (n == bvf->evst_pool.num)
+		return NULL;
+
+	bvf->evst_pool.cur = n + 1;
+	return bvf->evst_pool.ent + n;
+}
+
+static void
+push_eval_state(struct bpf_verifier *bvf)
+{
+	bvf->evst_pool.cur--;
+}
+
+static void
+evst_pool_fini(struct bpf_verifier *bvf)
+{
+	bvf->evst = NULL;
+	free(bvf->evst_pool.ent);
+	memset(&bvf->evst_pool, 0, sizeof(bvf->evst_pool));
+}
+
+static int
+evst_pool_init(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->nb_jcc_nodes + 1;
+
+	bvf->evst_pool.ent = calloc(n, sizeof(bvf->evst_pool.ent[0]));
+	if (bvf->evst_pool.ent == NULL)
+		return -ENOMEM;
+
+	bvf->evst_pool.num = n;
+	bvf->evst_pool.cur = 0;
+
+	bvf->evst = pull_eval_state(bvf);
+	return 0;
+}
+
+/*
+ * Save current eval state.
+ */
+static int
+save_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	struct bpf_eval_state *st;
+
+	/* get new eval_state for this node */
+	st = pull_eval_state(bvf);
+	if (st == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s: internal error (out of space) at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -ENOMEM;
+	}
+
+	/* make a copy of current state */
+	memcpy(st, bvf->evst, sizeof(*st));
+
+	/* swap current state with new one */
+	node->evst = bvf->evst;
+	bvf->evst = st;
+
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), node->evst, bvf->evst);
+
+	return 0;
+}
+
+/*
+ * Restore previous eval state and mark current eval state as free.
+ */
+static void
+restore_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), bvf->evst, node->evst);
+
+	bvf->evst = node->evst;
+	node->evst = NULL;
+	push_eval_state(bvf);
+}
+
+/*
+ * Do second pass through CFG and try to evaluate instructions
+ * via each possible path.
+ * Right now evaluation functionality is quite limited.
+ * Still need to add extra checks for:
+ * - use/return uninitialized registers.
+ * - use uninitialized data from the stack.
+ * - memory boundaries violation.
+ */
+static int
+evaluate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t idx, op;
+	const char *err;
+	const struct bpf_insn *ins;
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	ins = bvf->prm->ins;
+	rc = 0;
+
+	while (node != NULL && rc == 0) {
+
+		/* current node evaluation */
+		idx = get_node_idx(bvf, node);
+		op = ins[idx].code;
+
+		if (ins_chk[op].eval != NULL) {
+			err = ins_chk[op].eval(bvf, ins + idx);
+			if (err != NULL) {
+				RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+					__func__, err, idx);
+				rc = -EINVAL;
+			}
+		}
+
+		/* proceed through CFG */
+		next = get_next_node(bvf, node);
+		if (next != NULL) {
+
+			/* proceed with next child */
+			if (node->cur_edge != node->nb_edge)
+				rc |= save_eval_state(bvf, node);
+			else if (node->evst != NULL)
+				restore_eval_state(bvf, node);
+
+			next->prev_node = get_node_idx(bvf, node);
+			node = next;
+		} else {
+			/*
+			 * finished with current node and all its kids,
+			 * proceed with parent
+			 */
+			node->cur_edge = 0;
+			node = get_prev_node(bvf, node);
+
+			/* finished */
+			if (node == bvf->in)
+				node = NULL;
+		}
+	}
+
+	return rc;
+}
+
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	struct bpf_verifier bvf;
+
+	memset(&bvf, 0, sizeof(bvf));
+	bvf.prm = &bpf->prm;
+	bvf.in = calloc(bpf->prm.nb_ins, sizeof(bvf.in[0]));
+	if (bvf.in == NULL)
+		return -ENOMEM;
+
+	rc = validate(&bvf);
+
+	if (rc == 0) {
+		rc = evst_pool_init(&bvf);
+		if (rc == 0)
+			rc = evaluate(&bvf);
+		evst_pool_fini(&bvf);
+	}
+
+	free(bvf.in);
+
+	/* copy collected info */
+	if (rc == 0)
+		bpf->stack_sz = bvf.stack_sz;
+
 	return rc;
 }
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 04/10] bpf: add JIT compilation for x86_64 ISA
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (3 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 03/10] bpf: add more logic into bpf_validate() Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 05/10] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
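
A minimal usage sketch (illustration only, not part of the patch; "bpf"
and "pkt" below are assumed to be an already loaded rte_bpf handle and
an input context respectively): once JIT compilation has succeeded, the
native entry point can be fetched with rte_bpf_get_jit() and invoked
directly instead of going through the interpreter.

	struct rte_bpf_jit jit;
	uint64_t rc;

	if (rte_bpf_get_jit(bpf, &jit) == 0 && jit.func != NULL)
		rc = jit.func(pkt);          /* run natively compiled code */
	else
		rc = rte_bpf_exec(bpf, pkt); /* fall back to the interpreter */
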
 lib/librte_bpf/Makefile      |    3 +
 lib/librte_bpf/bpf.c         |    5 +
 lib/librte_bpf/bpf_jit_x86.c | 1368 ++++++++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build   |    4 +
 4 files changed, 1380 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index e0f434e77..44b12c439 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -23,6 +23,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
index d7f68c017..dc6d10991 100644
--- a/lib/librte_bpf/bpf.c
+++ b/lib/librte_bpf/bpf.c
@@ -41,7 +41,12 @@ bpf_jit(struct rte_bpf *bpf)
 {
 	int32_t rc;
 
+#ifdef RTE_ARCH_X86_64
+	rc = bpf_jit_x86(bpf);
+#else
 	rc = -ENOTSUP;
+#endif
+
 	if (rc != 0)
 		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
 			__func__, bpf, rc);
diff --git a/lib/librte_bpf/bpf_jit_x86.c b/lib/librte_bpf/bpf_jit_x86.c
new file mode 100644
index 000000000..d024470c2
--- /dev/null
+++ b/lib/librte_bpf/bpf_jit_x86.c
@@ -0,0 +1,1368 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define GET_BPF_OP(op)	(BPF_OP(op) >> 4)
+
+enum {
+	RAX = 0,  /* scratch, return value */
+	RCX = 1,  /* scratch, 4th arg */
+	RDX = 2,  /* scratch, 3rd arg */
+	RBX = 3,  /* callee saved */
+	RSP = 4,  /* stack pointer */
+	RBP = 5,  /* frame pointer, callee saved */
+	RSI = 6,  /* scratch, 2nd arg */
+	RDI = 7,  /* scratch, 1st arg */
+	R8  = 8,  /* scratch, 5th arg */
+	R9  = 9,  /* scratch, 6th arg */
+	R10 = 10, /* scratch */
+	R11 = 11, /* scratch */
+	R12 = 12, /* callee saved */
+	R13 = 13, /* callee saved */
+	R14 = 14, /* callee saved */
+	R15 = 15, /* callee saved */
+};
+
+#define IS_EXT_REG(r)	((r) >= R8)
+
+enum {
+	REX_PREFIX = 0x40, /* fixed value 0100 */
+	REX_W = 0x8,       /* 64bit operand size */
+	REX_R = 0x4,       /* extension of the ModRM.reg field */
+	REX_X = 0x2,       /* extension of the SIB.index field */
+	REX_B = 0x1,       /* extension of the ModRM.rm field */
+};
+
+enum {
+	MOD_INDIRECT = 0,
+	MOD_IDISP8 = 1,
+	MOD_IDISP32 = 2,
+	MOD_DIRECT = 3,
+};
+
+enum {
+	SIB_SCALE_1 = 0,
+	SIB_SCALE_2 = 1,
+	SIB_SCALE_4 = 2,
+	SIB_SCALE_8 = 3,
+};
+
+/*
+ * eBPF to x86_64 register mappings.
+ */
+static const uint32_t ebpf2x86[] = {
+	[BPF_REG_0] = RAX,
+	[BPF_REG_1] = RDI,
+	[BPF_REG_2] = RSI,
+	[BPF_REG_3] = RDX,
+	[BPF_REG_4] = RCX,
+	[BPF_REG_5] = R8,
+	[BPF_REG_6] = RBX,
+	[BPF_REG_7] = R13,
+	[BPF_REG_8] = R14,
+	[BPF_REG_9] = R15,
+	[BPF_REG_10] = RBP,
+};
+
+/*
+ * r10 and r11 are used as scratch temporary registers.
+ */
+enum {
+	REG_DIV_IMM = R9,
+	REG_TMP0 = R11,
+	REG_TMP1 = R10,
+};
+
+/*
+ * callee saved registers list.
+ * keep RBP as the last one.
+ */
+static const uint32_t save_regs[] = {RBX, R12, R13, R14, R15, RBP};
+
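+/*
+ * Per-program JIT generation state.
+ * Note that the emit functions tolerate st->ins == NULL, so the same
+ * code path can be used both to calculate the size of the generated
+ * code and to actually produce it.
+ */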
+struct bpf_jit_state {
+	uint32_t idx;
+	size_t sz;
+	struct {
+		uint32_t num;
+		int32_t off;
+	} exit;
+	uint32_t reguse;
+	int32_t *off;
+	uint8_t *ins;
+};
+
+#define	INUSE(v, r)	(((v) >> (r)) & 1)
+#define	USED(v, r)	((v) |= 1 << (r))
+
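+/* helper to emit a 32-bit immediate value one byte at a time */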
+union bpf_jit_imm {
+	uint32_t u32;
+	uint8_t u8[4];
+};
+
+static size_t
+bpf_size(uint32_t bpf_op_sz)
+{
+	if (bpf_op_sz == BPF_B)
+		return sizeof(uint8_t);
+	else if (bpf_op_sz == BPF_H)
+		return sizeof(uint16_t);
+	else if (bpf_op_sz == BPF_W)
+		return sizeof(uint32_t);
+	else if (bpf_op_sz == BPF_DW)
+		return sizeof(uint64_t);
+	return 0;
+}
+
+/*
+ * In many cases for imm8 we can produce shorter code.
+ */
+static size_t
+imm_size(int32_t v)
+{
+	if (v == (int8_t)v)
+		return sizeof(int8_t);
+	return sizeof(int32_t);
+}
+
+static void
+emit_bytes(struct bpf_jit_state *st, const uint8_t ins[], uint32_t sz)
+{
+	uint32_t i;
+
+	if (st->ins != NULL) {
+		for (i = 0; i != sz; i++)
+			st->ins[st->sz + i] = ins[i];
+	}
+	st->sz += sz;
+}
+
+static void
+emit_imm(struct bpf_jit_state *st, const uint32_t imm, uint32_t sz)
+{
+	union bpf_jit_imm v;
+
+	v.u32 = imm;
+	emit_bytes(st, v.u8, sz);
+}
+
+/*
+ * emit REX byte
+ */
+static void
+emit_rex(struct bpf_jit_state *st, uint32_t op, uint32_t reg, uint32_t rm)
+{
+	uint8_t rex;
+
+	/* mark operand registers as used */
+	USED(st->reguse, reg);
+	USED(st->reguse, rm);
+
+	rex = 0;
+	if (BPF_CLASS(op) == BPF_ALU64 ||
+			op == (BPF_ST | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_XADD | BPF_DW) ||
+			op == (BPF_LD | BPF_IMM | BPF_DW) ||
+			(BPF_CLASS(op) == BPF_LDX &&
+			BPF_MODE(op) == BPF_MEM &&
+			BPF_SIZE(op) != BPF_W))
+		rex |= REX_W;
+
+	if (IS_EXT_REG(reg))
+		rex |= REX_R;
+
+	if (IS_EXT_REG(rm))
+		rex |= REX_B;
+
+	/* store using SIL, DIL */
+	if (op == (BPF_STX | BPF_MEM | BPF_B) && (reg == RDI || reg == RSI))
+		rex |= REX_PREFIX;
+
+	if (rex != 0) {
+		rex |= REX_PREFIX;
+		emit_bytes(st, &rex, sizeof(rex));
+	}
+}
+
+/*
+ * emit MODRegRM byte
+ */
+static void
+emit_modregrm(struct bpf_jit_state *st, uint32_t mod, uint32_t reg, uint32_t rm)
+{
+	uint8_t v;
+
+	v = mod << 6 | (reg & 7) << 3 | (rm & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit SIB byte
+ */
+static void
+emit_sib(struct bpf_jit_state *st, uint32_t scale, uint32_t idx, uint32_t base)
+{
+	uint8_t v;
+
+	v = scale << 6 | (idx & 7) << 3 | (base & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit xchg %<sreg>, %<dreg>
+ */
+static void
+emit_xchg_reg(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	const uint8_t ops = 0x87;
+
+	emit_rex(st, BPF_ALU64, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit neg %<dreg>
+ */
+static void
+emit_neg(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 3;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+/*
+ * emit mov %<sreg>, %<dreg>
+ */
+static void
+emit_mov_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x89;
+
+	/* if operands are 32-bit, then it can be used to clear upper 32-bit */
+	if (sreg != dreg || BPF_CLASS(op) == BPF_ALU) {
+		emit_rex(st, op, sreg, dreg);
+		emit_bytes(st, &ops, sizeof(ops));
+		emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+	}
+}
+
+/*
+ * emit movzwl %<sreg>, %<dreg>
+ */
+static void
+emit_movzwl(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	static const uint8_t ops[] = {0x0F, 0xB7};
+
+	emit_rex(st, BPF_ALU, sreg, dreg);
+	emit_bytes(st, ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit ror <imm8>, %<dreg>
+ */
+static void
+emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t prfx = 0x66;
+	const uint8_t ops = 0xC1;
+	const uint8_t mods = 1;
+
+	emit_bytes(st, &prfx, sizeof(prfx));
+	emit_rex(st, BPF_ALU, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit bswap %<dreg>
+ */
+static void
+emit_be2le_48(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	uint32_t rop;
+
+	const uint8_t ops = 0x0F;
+	const uint8_t mods = 1;
+
+	rop = (imm == 64) ? BPF_ALU64 : BPF_ALU;
+	emit_rex(st, rop, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+static void
+emit_be2le(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16) {
+		emit_ror_imm(st, dreg, 8);
+		emit_movzwl(st, dreg, dreg);
+	} else
+		emit_be2le_48(st, dreg, imm);
+}
+
+/*
+ * On (little-endian) x86 this conversion is generally a NOP;
+ * just clear the upper bits where needed.
+ */
+static void
+emit_le2be(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16)
+		emit_movzwl(st, dreg, dreg);
+	else if (imm == 32)
+		emit_mov_reg(st, BPF_ALU | BPF_MOV | BPF_X, dreg, dreg);
+}
+
+/*
+ * emit one of:
+ *   add <imm>, %<dreg>
+ *   and <imm>, %<dreg>
+ *   or  <imm>, %<dreg>
+ *   sub <imm>, %<dreg>
+ *   xor <imm>, %<dreg>
+ */
+static void
+emit_alu_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t mod, opcode;
+	uint32_t bop, imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0,
+		[GET_BPF_OP(BPF_AND)] = 4,
+		[GET_BPF_OP(BPF_OR)] =  1,
+		[GET_BPF_OP(BPF_SUB)] = 5,
+		[GET_BPF_OP(BPF_XOR)] = 6,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+
+	imsz = imm_size(imm);
+	opcode = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &opcode, sizeof(opcode));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit one of:
+ *   add %<sreg>, %<dreg>
+ *   and %<sreg>, %<dreg>
+ *   or  %<sreg>, %<dreg>
+ *   sub %<sreg>, %<dreg>
+ *   xor %<sreg>, %<dreg>
+ */
+static void
+emit_alu_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0x01,
+		[GET_BPF_OP(BPF_AND)] = 0x21,
+		[GET_BPF_OP(BPF_OR)] =  0x09,
+		[GET_BPF_OP(BPF_SUB)] = 0x29,
+		[GET_BPF_OP(BPF_XOR)] = 0x31,
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_shift(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	uint8_t mod;
+	uint32_t bop, opx;
+
+	static const uint8_t ops[] = {0xC1, 0xD3};
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_LSH)] = 4,
+		[GET_BPF_OP(BPF_RSH)] = 5,
+		[GET_BPF_OP(BPF_ARSH)] = 7,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+	opx = (BPF_SRC(op) == BPF_X);
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+}
+
+/*
+ * emit one of:
+ *   shl <imm>, %<dreg>
+ *   shr <imm>, %<dreg>
+ *   sar <imm>, %<dreg>
+ */
+static void
+emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm)
+{
+	emit_shift(st, op, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit one of:
+ *   shl %<dreg>
+ *   shr %<dreg>
+ *   sar %<dreg>
+ * note that %rcx (CL) is implicitly used as the shift count, so a few extra
+ * instructions for register spilling might be necessary.
+ */
+static void
+emit_shift_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+
+	emit_shift(st, op, (dreg == RCX) ? sreg : dreg);
+
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+}
+
+/*
+ * emit mov <imm>, %<dreg>
+ */
+static void
+emit_mov_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xC7;
+
+	if (imm == 0) {
+		/* replace 'mov 0, %<dst>' with 'xor %<dst>, %<dst>' */
+		op = BPF_CLASS(op) | BPF_XOR | BPF_X;
+		emit_alu_reg(st, op, dreg, dreg);
+		return;
+	}
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+	emit_imm(st, imm, sizeof(imm));
+}
+
+/*
+ * emit mov <imm64>, %<dreg>
+ */
+static void
+emit_ld_imm64(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm0,
+	uint32_t imm1)
+{
+	const uint8_t ops = 0xB8;
+
+	if (imm1 == 0) {
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, dreg, imm0);
+		return;
+	}
+
+	emit_rex(st, BPF_ALU64, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+
+	emit_imm(st, imm0, sizeof(imm0));
+	emit_imm(st, imm1, sizeof(imm1));
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some register spilling is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * either:
+ *   mov %<sreg>, %rdx
+ * OR
+ *   mov <imm>, %rdx
+ * mul %rdx
+ * mov %r10, %rdx
+ * mov %rax, %<dreg>
+ * mov %r11, %rax
+ */
+static void
+emit_mul(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 4;
+
+	/* save rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* rax = dreg */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+
+	if (BPF_SRC(op) == BPF_X)
+		/* rdx = sreg */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X,
+			sreg == RAX ? REG_TMP0 : sreg, RDX);
+	else
+		/* rdx = imm */
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, imm);
+
+	emit_rex(st, op, RAX, RDX);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RDX);
+
+	if (dreg != RDX)
+		/* restore rdx */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+
+	if (dreg != RAX) {
+		/* dreg = rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+		/* restore rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	}
+}
+
+/*
+ * emit mov <ofs>(%<sreg>), %<dreg>
+ * note that for non 64-bit ops, higher bits have to be cleared.
+ */
+static void
+emit_ld_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	uint32_t mods, opsz;
+	const uint8_t op32 = 0x8B;
+	const uint8_t op16[] = {0x0F, 0xB7};
+	const uint8_t op8[] = {0x0F, 0xB6};
+
+	emit_rex(st, op, dreg, sreg);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_B)
+		emit_bytes(st, op8, sizeof(op8));
+	else if (opsz == BPF_H)
+		emit_bytes(st, op16, sizeof(op16));
+	else
+		emit_bytes(st, &op32, sizeof(op32));
+
+	mods = (imm_size(ofs) == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, dreg, sreg);
+	if (sreg == RSP || sreg == R12)
+		emit_sib(st, SIB_SCALE_1, sreg, sreg);
+	emit_imm(st, ofs, imm_size(ofs));
+}
+
+/*
+ * emit one of:
+ *   mov %<sreg>, <ofs>(%<dreg>)
+ *   mov <imm>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_common(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, uint32_t imm, int32_t ofs)
+{
+	uint32_t mods, imsz, opsz, opx;
+	const uint8_t prfx16 = 0x66;
+
+	/* 8 bit instruction opcodes */
+	static const uint8_t op8[] = {0xC6, 0x88};
+
+	/* 16/32/64 bit instruction opcodes */
+	static const uint8_t ops[] = {0xC7, 0x89};
+
+	/* does the instruction use an immediate value or a source register? */
+	opx = (BPF_CLASS(op) == BPF_STX);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_H)
+		emit_bytes(st, &prfx16, sizeof(prfx16));
+
+	emit_rex(st, op, sreg, dreg);
+
+	if (opsz == BPF_B)
+		emit_bytes(st, &op8[opx], sizeof(op8[opx]));
+	else
+		emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, sreg, dreg);
+
+	if (dreg == RSP || dreg == R12)
+		emit_sib(st, SIB_SCALE_1, dreg, dreg);
+
+	emit_imm(st, ofs, imsz);
+
+	if (opx == 0) {
+		imsz = RTE_MIN(bpf_size(opsz), sizeof(imm));
+		emit_imm(st, imm, imsz);
+	}
+}
+
+static void
+emit_st_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm,
+	int32_t ofs)
+{
+	emit_st_common(st, op, 0, dreg, imm, ofs);
+}
+
+static void
+emit_st_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	emit_st_common(st, op, sreg, dreg, 0, ofs);
+}
+
+/*
+ * emit lock add %<sreg>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_xadd(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	uint32_t imsz, mods;
+
+	const uint8_t lck = 0xF0; /* lock prefix */
+	const uint8_t ops = 0x01; /* add opcode */
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_bytes(st, &lck, sizeof(lck));
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, mods, sreg, dreg);
+	emit_imm(st, ofs, imsz);
+}
+
+/*
+ * emit:
+ *    mov <imm64>, %rax
+ *    call *%rax
+ */
+static void
+emit_call(struct bpf_jit_state *st, uintptr_t trg)
+{
+	const uint8_t ops = 0xFF;
+	const uint8_t mods = 2;
+
+	emit_ld_imm64(st, RAX, trg, trg >> 32);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RAX);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	int32_t joff;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0xEB;
+	const uint8_t op32 = 0xE9;
+
+	const int32_t sz8 = sizeof(op8) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32) + sizeof(uint32_t);
+
+	/* max possible jmp instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8, sizeof(op8));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, &op32, sizeof(op32));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	emit_abs_jmp(st, st->off[st->idx + ofs]);
+}
+
+/*
+ * emit one of:
+ *    cmovz %<sreg>, <%dreg>
+ *    cmovne %<sreg>, <%dreg>
+ *    cmova %<sreg>, <%dreg>
+ *    cmovb %<sreg>, <%dreg>
+ *    cmovae %<sreg>, <%dreg>
+ *    cmovbe %<sreg>, <%dreg>
+ *    cmovg %<sreg>, <%dreg>
+ *    cmovl %<sreg>, <%dreg>
+ *    cmovge %<sreg>, <%dreg>
+ *    cmovle %<sreg>, <%dreg>
+ */
+static void
+emit_movcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x44},  /* CMOVZ */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x45},  /* CMOVNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x47},  /* CMOVA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x42},  /* CMOVB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x43},  /* CMOVAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x46},  /* CMOVBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x4F}, /* CMOVG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x4C}, /* CMOVL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x4D}, /* CMOVGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x4E}, /* CMOVLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x45}, /* CMOVNE */
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, dreg, sreg);
+	emit_bytes(st, ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, dreg, sreg);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	uint32_t bop, imsz;
+	int32_t joff;
+
+	static const uint8_t op8[] = {
+		[GET_BPF_OP(BPF_JEQ)] = 0x74,  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = 0x75,  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = 0x77,  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = 0x72,  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = 0x73,  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = 0x76,  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = 0x7F, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = 0x7C, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = 0x7D, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = 0x7E, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = 0x75, /* JNE */
+	};
+
+	static const uint8_t op32[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x84},  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x85},  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x87},  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x82},  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x83},  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x86},  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x8F}, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x8C}, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x8D}, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x8E}, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x85}, /* JNE */
+	};
+
+	const int32_t sz8 = sizeof(op8[0]) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32[0]) + sizeof(uint32_t);
+
+	/* max possible jcc instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	bop = GET_BPF_OP(op);
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8[bop], sizeof(op8[bop]));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, op32[bop], sizeof(op32[bop]));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	emit_abs_jcc(st, op, st->off[st->idx + ofs]);
+}
+
+
+/*
+ * emit cmp <imm>, %<dreg>
+ */
+static void
+emit_cmp_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t ops;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	const uint8_t mods = 7;
+
+	imsz = imm_size(imm);
+	ops = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit test <imm>, %<dreg>
+ */
+static void
+emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 0;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+static void
+emit_jcc_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_imm(st, BPF_ALU64, dreg, imm);
+	else
+		emit_cmp_imm(st, BPF_ALU64, dreg, imm);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * emit test %<sreg>, %<dreg>
+ */
+static void
+emit_tst_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x85;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit cmp %<sreg>, %<dreg>
+ */
+static void
+emit_cmp_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x39;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+
+}
+
+static void
+emit_jcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_reg(st, BPF_ALU64, sreg, dreg);
+	else
+		emit_cmp_reg(st, BPF_ALU64, sreg, dreg);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some register spilling is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * xor %rdx, %rdx
+ * for divisor as immediate value:
+ *   mov <imm>, %r9
+ * div %<divisor_reg>
+ * either:
+ *   mov %rax, %<dreg>
+ * OR
+ *   mov %rdx, %<dreg>
+ * mov %r11, %rax
+ * mov %r10, %rdx
+ */
+static void
+emit_div(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	uint32_t sr;
+
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 6;
+
+	if (BPF_SRC(op) == BPF_X) {
+
+		/* check that src divisor is not zero */
+		emit_tst_reg(st, BPF_CLASS(op), sreg, sreg);
+
+		/* exit with return value zero */
+		emit_movcc_reg(st, BPF_CLASS(op) | BPF_JEQ | BPF_X, sreg, RAX);
+		emit_abs_jcc(st, BPF_JMP | BPF_JEQ | BPF_K, st->exit.off);
+	}
+
+	/* save rax & rdx */
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* fill rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+	emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, 0);
+
+	if (BPF_SRC(op) == BPF_X) {
+		sr = sreg;
+		if (sr == RAX)
+			sr = REG_TMP0;
+		else if (sr == RDX)
+			sr = REG_TMP1;
+	} else {
+		sr = REG_DIV_IMM;
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, sr, imm);
+	}
+
+	emit_rex(st, op, 0, sr);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, sr);
+
+	if (BPF_OP(op) == BPF_DIV)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+	else
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, dreg);
+
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+}
+
+static void
+emit_prolog(struct bpf_jit_state *st, int32_t stack_size)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	/* we can avoid touching the stack at all */
+	if (spil == 0)
+		return;
+
+
+	emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP,
+		spil * sizeof(uint64_t));
+
+	ofs = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++) {
+		if (INUSE(st->reguse, save_regs[i]) != 0) {
+			emit_st_reg(st, BPF_STX | BPF_MEM | BPF_DW,
+				save_regs[i], RSP, ofs);
+			ofs += sizeof(uint64_t);
+		}
+	}
+
+	if (INUSE(st->reguse, RBP) != 0) {
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RSP, RBP);
+		emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP, stack_size);
+	}
+}
+
+/*
+ * emit ret
+ */
+static void
+emit_ret(struct bpf_jit_state *st)
+{
+	const uint8_t ops = 0xC3;
+
+	emit_bytes(st, &ops, sizeof(ops));
+}
+
+static void
+emit_epilog(struct bpf_jit_state *st)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	/* if we already have an epilog, generate a jump to it */
+	if (st->exit.num++ != 0) {
+		emit_abs_jmp(st, st->exit.off);
+		return;
+	}
+
+	/* store offset of epilog block */
+	st->exit.off = st->sz;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	if (spil != 0) {
+
+		if (INUSE(st->reguse, RBP) != 0)
+			emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RBP, RSP);
+
+		ofs = 0;
+		for (i = 0; i != RTE_DIM(save_regs); i++) {
+			if (INUSE(st->reguse, save_regs[i]) != 0) {
+				emit_ld_reg(st, BPF_LDX | BPF_MEM | BPF_DW,
+					RSP, save_regs[i], ofs);
+				ofs += sizeof(uint64_t);
+			}
+		}
+
+		emit_alu_imm(st, BPF_ALU64 | BPF_ADD | BPF_K, RSP,
+			spil * sizeof(uint64_t));
+	}
+
+	emit_ret(st);
+}
+
+/*
+ * walk through the BPF code and translate it into x86_64 instructions.
+ */
+static int
+emit(struct bpf_jit_state *st, const struct rte_bpf *bpf)
+{
+	uint32_t i, dr, op, sr;
+	const struct bpf_insn *ins;
+
+	/* reset state fields */
+	st->sz = 0;
+	st->exit.num = 0;
+
+	emit_prolog(st, bpf->stack_sz);
+
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		st->idx = i;
+		st->off[i] = st->sz;
+
+		ins = bpf->prm.ins + i;
+
+		dr = ebpf2x86[ins->dst_reg];
+		sr = ebpf2x86[ins->src_reg];
+		op = ins->code;
+
+		switch (op) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+		case (BPF_ALU | BPF_SUB | BPF_K):
+		case (BPF_ALU | BPF_AND | BPF_K):
+		case (BPF_ALU | BPF_OR | BPF_K):
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+		case (BPF_ALU | BPF_SUB | BPF_X):
+		case (BPF_ALU | BPF_AND | BPF_X):
+		case (BPF_ALU | BPF_OR | BPF_X):
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			emit_be2le(st, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			emit_le2be(st, dr, ins->imm);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		/* multiply instructions */
+		case (BPF_ALU | BPF_MUL | BPF_K):
+		case (BPF_ALU | BPF_MUL | BPF_X):
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			emit_mul(st, op, sr, dr, ins->imm);
+			break;
+		/* divide instructions */
+		case (BPF_ALU | BPF_DIV | BPF_K):
+		case (BPF_ALU | BPF_MOD | BPF_K):
+		case (BPF_ALU | BPF_DIV | BPF_X):
+		case (BPF_ALU | BPF_MOD | BPF_X):
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			emit_div(st, op, sr, dr, ins->imm);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+		case (BPF_LDX | BPF_MEM | BPF_H):
+		case (BPF_LDX | BPF_MEM | BPF_W):
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			emit_ld_reg(st, op, sr, dr, ins->off);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			emit_ld_imm64(st, dr, ins[0].imm, ins[1].imm);
+			i++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+		case (BPF_STX | BPF_MEM | BPF_H):
+		case (BPF_STX | BPF_MEM | BPF_W):
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			emit_st_reg(st, op, sr, dr, ins->off);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+		case (BPF_ST | BPF_MEM | BPF_H):
+		case (BPF_ST | BPF_MEM | BPF_W):
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			emit_st_imm(st, op, dr, ins->imm, ins->off);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			emit_st_xadd(st, op, sr, dr, ins->off);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			emit_jmp(st, ins->off + 1);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			emit_jcc_imm(st, op, dr, ins->imm, ins->off + 1);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			emit_jcc_reg(st, op, sr, dr, ins->off + 1);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			emit_call(st, (uintptr_t)bpf->prm.xsym[ins->imm].func);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			emit_epilog(st);
+			break;
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %u;\n",
+				__func__, bpf, ins->code, i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * produce a native ISA version of the given BPF code.
+ */
+int
+bpf_jit_x86(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	uint32_t i;
+	size_t sz;
+	struct bpf_jit_state st;
+
+	/* init state */
+	memset(&st, 0, sizeof(st));
+	st.off = malloc(bpf->prm.nb_ins * sizeof(st.off[0]));
+	if (st.off == NULL)
+		return -ENOMEM;
+
+	/* fill with fake offsets */
+	st.exit.off = INT32_MAX;
+	for (i = 0; i != bpf->prm.nb_ins; i++)
+		st.off[i] = INT32_MAX;
+
+	/*
+	 * Dry runs are used to calculate the total code size and valid jump
+	 * offsets. Iterate until the generated code size stops shrinking.
+	 */
+	do {
+		sz = st.sz;
+		rc = emit(&st, bpf);
+	} while (rc == 0 && sz != st.sz);
+
+	if (rc == 0) {
+
+		/* allocate memory needed */
+		st.ins = mmap(NULL, st.sz, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (st.ins == MAP_FAILED)
+			rc = -ENOMEM;
+		else
+			/* generate code */
+			rc = emit(&st, bpf);
+	}
+
+	if (rc == 0 && mprotect(st.ins, st.sz, PROT_READ | PROT_EXEC) != 0)
+		rc = -ENOMEM;
+
+	if (rc != 0)
+		munmap(st.ins, st.sz);
+	else {
+		bpf->jit.func = (void *)st.ins;
+		bpf->jit.sz = st.sz;
+	}
+
+	free(st.off);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 05c48c7ff..67ca30533 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -7,6 +7,10 @@ sources = files('bpf.c',
 		'bpf_load.c',
 		'bpf_validate.c')
 
+if arch_subdir == 'x86'
+	sources += files('bpf_jit_x86.c')
+endif
+
 install_headers = files('rte_bpf.h')
 
 deps += ['mbuf', 'net']
-- 
2.13.6

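For reviewers, a rough sketch of how an application is expected to consume
the JIT output produced above (the function names come from this series;
the file, section and packet arguments are made-up examples, and the
interpreter fallback via rte_bpf_exec() is only assumed here):

#include <stdint.h>
#include <rte_bpf.h>

/* load a BPF program from an ELF file and run it over one packet buffer */
static uint64_t
run_bpf_once(const struct rte_bpf_prm *prm, void *pkt)
{
	uint64_t rc;
	struct rte_bpf *bpf;
	struct rte_bpf_jit jit;

	bpf = rte_bpf_elf_load(prm, "t1.o", ".text");
	if (bpf == NULL)
		return 0;

	rte_bpf_get_jit(bpf, &jit);

	/* prefer the native code produced by bpf_jit_x86(), if available */
	if (jit.func != NULL)
		rc = jit.func(pkt);
	else
		rc = rte_bpf_exec(bpf, pkt);

	rte_bpf_destroy(bpf);
	return rc;
}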
^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 05/10] bpf: introduce basic RX/TX BPF filters
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (4 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 04/10] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 06/10] testpmd: new commands to load/unload " Konstantin Ananyev
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce API to install BPF based filters on ethdev RX/TX path.
Current implementation is pure SW one, based on ethdev RX/TX
callback mechanism.
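
As an illustration, a minimal usage sketch from application code could look
like the one below (a sketch only: port/queue numbers, the object file name
and the section name are made-up examples, and error handling is omitted):

#include <string.h>
#include <rte_bpf_ethdev.h>

/* install a JIT-ed BPF filter on RX queue 0 of port 0 */
static int
install_rx_filter(void)
{
	struct rte_bpf_prm prm;

	memset(&prm, 0, sizeof(prm));
	/* no external symbols; BPF input is raw packet data */
	prm.prog_type = RTE_BPF_PROG_TYPE_UNSPEC;

	/* "t1.o" and ".text" are example file/section names */
	return rte_bpf_eth_rx_elf_load(0, 0, &prm, "t1.o", ".text",
		RTE_BPF_ETH_F_JIT);
}

/* later, to remove the filter: rte_bpf_eth_rx_unload(0, 0); */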

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile            |   2 +
 lib/librte_bpf/bpf_pkt.c           | 607 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   6 +-
 lib/librte_bpf/rte_bpf_ethdev.h    | 102 +++++++
 lib/librte_bpf/rte_bpf_version.map |   4 +
 5 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 44b12c439..501c49c60 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -22,6 +22,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_pkt.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
@@ -29,5 +30,6 @@ endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf_ethdev.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf_pkt.c b/lib/librte_bpf/bpf_pkt.c
new file mode 100644
index 000000000..a8735456e
--- /dev/null
+++ b/lib/librte_bpf/bpf_pkt.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include <rte_bpf_ethdev.h>
+#include "bpf_impl.h"
+
+/*
+ * information about installed BPF rx/tx callback
+ */
+
+struct bpf_eth_cbi {
+	/* used by both data & control path */
+	uint32_t use;    /* usage counter */
+	const struct rte_eth_rxtx_callback *cb;  /* callback handle */
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	/* used by control path only */
+	LIST_ENTRY(bpf_eth_cbi) link;
+	uint16_t port;
+	uint16_t queue;
+} __rte_cache_aligned;
+
+/*
+ * Odd number means that callback is used by datapath.
+ * Even number means that callback is not used by datapath.
+ */
+#define BPF_ETH_CBI_INUSE  1
+
+/*
+ * List to manage RX/TX installed callbacks.
+ */
+LIST_HEAD(bpf_eth_cbi_list, bpf_eth_cbi);
+
+enum {
+	BPF_ETH_RX,
+	BPF_ETH_TX,
+	BPF_ETH_NUM,
+};
+
+/*
+ * information about all installed BPF rx/tx callbacks
+ */
+struct bpf_eth_cbh {
+	rte_spinlock_t lock;
+	struct bpf_eth_cbi_list list;
+	uint32_t type;
+};
+
+static struct bpf_eth_cbh rx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_RX,
+};
+
+static struct bpf_eth_cbh tx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_TX,
+};
+
+/*
+ * Marks given callback as used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
+{
+	cbi->use++;
+	/* make sure no store/load reordering could happen */
+	rte_smp_mb();
+}
+
+/*
+ * Marks given callback as not used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
+{
+	/* make sure all previous loads are completed */
+	rte_smp_rmb();
+	cbi->use++;
+}
+
+/*
+ * Waits till datapath finishes using given callback.
+ */
+static void
+bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
+{
+	uint32_t nuse, puse;
+
+	/* make sure all previous loads and stores are completed */
+	rte_smp_mb();
+
+	puse = cbi->use;
+
+	/* in use, busy wait till current RX/TX iteration is finished */
+	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
+		do {
+			rte_pause();
+			rte_compiler_barrier();
+			nuse = cbi->use;
+		} while (nuse == puse);
+	}
+}
+
+static void
+bpf_eth_cbi_cleanup(struct bpf_eth_cbi *bc)
+{
+	bc->bpf = NULL;
+	memset(&bc->jit, 0, sizeof(bc->jit));
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_find(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	LIST_FOREACH(cbi, &cbh->list, link) {
+		if (cbi->port == port && cbi->queue == queue)
+			break;
+	}
+	return cbi;
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_add(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	/* return an existing one */
+	cbi = bpf_eth_cbh_find(cbh, port, queue);
+	if (cbi != NULL)
+		return cbi;
+
+	cbi = rte_zmalloc(NULL, sizeof(*cbi), RTE_CACHE_LINE_SIZE);
+	if (cbi != NULL) {
+		cbi->port = port;
+		cbi->queue = queue;
+		LIST_INSERT_HEAD(&cbh->list, cbi, link);
+	}
+	return cbi;
+}
+
+/*
+ * BPF packet processing routines.
+ */
+
+static inline uint32_t
+apply_filter(struct rte_mbuf *mb[], const uint64_t rc[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i, j, k;
+	struct rte_mbuf *dr[num];
+
+	for (i = 0, j = 0, k = 0; i != num; i++) {
+
+		/* filter matches */
+		if (rc[i] != 0)
+			mb[j++] = mb[i];
+		/* no match */
+		else
+			dr[k++] = mb[i];
+	}
+
+	if (drop != 0) {
+		/* free filtered out mbufs */
+		for (i = 0; i != k; i++)
+			rte_pktmbuf_free(dr[i]);
+	} else {
+		/* copy filtered out mbufs beyond good ones */
+		for (i = 0; i != k; i++)
+			mb[j + i] = dr[i];
+	}
+
+	return j;
+}
+
+static inline uint32_t
+pkt_filter_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i;
+	void *dp[num];
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		dp[i] = rte_pktmbuf_mtod(mb[i], void *);
+
+	rte_bpf_exec_burst(bpf, dp, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	void *dp;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		dp = rte_pktmbuf_mtod(mb[i], void *);
+		rc[i] = jit->func(dp);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+static inline uint32_t
+pkt_filter_mb_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint64_t rc[num];
+
+	rte_bpf_exec_burst(bpf, (void **)mb, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		rc[i] = jit->func(mb[i]);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+/*
+ * RX/TX callbacks for raw data bpf.
+ */
+
+static uint16_t
+bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+/*
+ * RX/TX callbacks for mbuf.
+ */
+
+static uint16_t
+bpf_rx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static rte_rx_callback_fn
+select_rx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_rx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_rx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_rx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_rx_callback_mb_vm;
+
+	return NULL;
+}
+
+static rte_tx_callback_fn
+select_tx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_tx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_tx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_tx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_tx_callback_mb_vm;
+
+	return NULL;
+}
+
+/*
+ * Helper function to perform BPF unload for given port/queue.
+ * We have to introduce extra complexity (and a possible slowdown) here,
+ * as right now there is no safe generic way to remove an RX/TX callback
+ * while IO is active.
+ * We also don't free the memory allocated for the callback handle itself:
+ * again, right now there is no safe way to do that without first stopping
+ * RX/TX on the given port/queue.
+ */
+static void
+bpf_eth_cbi_unload(struct bpf_eth_cbi *bc)
+{
+	/* mark this cbi as empty */
+	bc->cb = NULL;
+	rte_smp_mb();
+
+	/* make sure datapath doesn't use bpf anymore, then destroy bpf */
+	bpf_eth_cbi_wait(bc);
+	rte_bpf_destroy(bc->bpf);
+	bpf_eth_cbi_cleanup(bc);
+}
+
+static void
+bpf_eth_unload(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+
+	bc = bpf_eth_cbh_find(cbh, port, queue);
+	if (bc == NULL || bc->cb == NULL)
+		return;
+
+	if (cbh->type == BPF_ETH_RX)
+		rte_eth_remove_rx_callback(port, queue, bc->cb);
+	else
+		rte_eth_remove_tx_callback(port, queue, bc->cb);
+
+	bpf_eth_cbi_unload(bc);
+}
+
+
+__rte_experimental void
+rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+__rte_experimental void
+rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+static int
+bpf_eth_elf_load(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_rx_callback_fn frx;
+	rte_tx_callback_fn ftx;
+	struct rte_bpf_jit jit;
+
+	frx = NULL;
+	ftx = NULL;
+
+	if (prm == NULL || rte_eth_dev_is_valid_port(port) == 0 ||
+			queue >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	if (cbh->type == BPF_ETH_RX)
+		frx = select_rx_callback(prm->prog_type, flags);
+	else
+		ftx = select_tx_callback(prm->prog_type, flags);
+
+	if (frx == NULL && ftx == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	rte_bpf_get_jit(bpf, &jit);
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && jit.func == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rte_bpf_destroy(bpf);
+		return -ENOTSUP;
+	}
+
+	/* setup/update global callback info */
+	bc = bpf_eth_cbh_add(cbh, port, queue);
+	if (bc == NULL)
+		return -ENOMEM;
+
+	/* remove old one, if any */
+	if (bc->cb != NULL)
+		bpf_eth_unload(cbh, port, queue);
+
+	bc->bpf = bpf;
+	bc->jit = jit;
+
+	if (cbh->type == BPF_ETH_RX)
+		bc->cb = rte_eth_add_rx_callback(port, queue, frx, bc);
+	else
+		bc->cb = rte_eth_add_tx_callback(port, queue, ftx, bc);
+
+	if (bc->cb == NULL) {
+		rc = -rte_errno;
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	} else
+		rc = 0;
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 67ca30533..39b464041 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -5,15 +5,17 @@ allow_experimental_apis = true
 sources = files('bpf.c',
 		'bpf_exec.c',
 		'bpf_load.c',
+		'bpf_pkt.c',
 		'bpf_validate.c')
 
 if arch_subdir == 'x86'
 	sources += files('bpf_jit_x86.c')
 endif
 
-install_headers = files('rte_bpf.h')
+install_headers = files('rte_bpf.h',
+			'rte_bpf_ethdev.h')
 
-deps += ['mbuf', 'net']
+deps += ['mbuf', 'net', 'ethdev']
 
 dep = dependency('libelf', required: false)
 if dep.found() == false
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
new file mode 100644
index 000000000..4800bbdaa
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_ETHDEV_H_
+#define _RTE_BPF_ETHDEV_H_
+
+/**
+ * @file
+ *
+ * API to install BPF filter as RX/TX callbacks for eth devices.
+ * Note that right now:
+ * - it is not MT safe, i.e. it is not allowed to do load/unload for the
+ *   same port/queue from different threads in parallel.
+ * - though load/unload can be done at runtime
+ *   (while RX/TX is ongoing on given port/queue).
+ * - only one BPF program per port/queue is allowed,
+ *   i.e. a new load will replace the previously loaded BPF program
+ *   for that port/queue.
+ * Filter behaviour - if the BPF program returns zero value for a given
+ * packet, then:
+ *   on RX - the packet will be dropped inside the callback and no further
+ *   processing for that packet will happen;
+ *   on TX - the packet will remain unsent, and it is the responsibility of
+ *   the user to handle such a situation (drop, try to send again, etc.).
+ */
+
+#include <rte_bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+	RTE_BPF_ETH_F_NONE = 0,
+	RTE_BPF_ETH_F_JIT  = 0x1, /**< use code JIT-compiled into native ISA */
+};
+
+/**
+ * Unload previously loaded BPF program (if any) from given RX port/queue
+ * and remove appropriate RX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ */
+void rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Unload previously loaded BPF program (if any) from given TX port/queue
+ * and remove appropriate TX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ */
+void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given RX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ * @param prm
+ *  Parameters (program type, external symbols, etc.) used to load the
+ *  BPF code.
+ * @param fname
+ *  Pathname of the ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @param flags
+ *  Bitmask of RTE_BPF_ETH_F_* flags.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given TX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ * @param prm
+ *  Parameters (program type, external symbols, etc.) used to load the
+ *  BPF code.
+ * @param fname
+ *  Pathname of the ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @param flags
+ *  Bitmask of RTE_BPF_ETH_F_* flags.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_ETHDEV_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ff65144df..a203e088e 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -3,6 +3,10 @@ EXPERIMENTAL {
 
 	rte_bpf_destroy;
 	rte_bpf_elf_load;
+	rte_bpf_eth_rx_elf_load;
+	rte_bpf_eth_rx_unload;
+	rte_bpf_eth_tx_elf_load;
+	rte_bpf_eth_tx_unload;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 06/10] testpmd: new commands to load/unload BPF filters
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (5 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 05/10] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 07/10] test: add few eBPF samples Konstantin Ananyev
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce new testpmd commands to load/unload RX/TX BPF-based filters.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/bpf_sup.h   |  25 ++++++++
 app/test-pmd/cmdline.c   | 146 +++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/meson.build |   2 +-
 3 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 app/test-pmd/bpf_sup.h

diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
new file mode 100644
index 000000000..35f91a07f
--- /dev/null
+++ b/app/test-pmd/bpf_sup.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _BPF_SUP_H_
+#define _BPF_SUP_H_
+
+#include <stdio.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_ethdev.h>
+
+static const struct rte_bpf_xsym bpf_xsym[] = {
+	{
+		.name = RTE_STR(stdout),
+		.type = RTE_BPF_XTYPE_VAR,
+		.var = &stdout,
+	},
+	{
+		.name = RTE_STR(rte_pktmbuf_dump),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)rte_pktmbuf_dump,
+	},
+};
+
+#endif /* _BPF_SUP_H_ */
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 40b31ad7e..d0ad27871 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include "testpmd.h"
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
+#include "bpf_sup.h"
 
 static struct cmdline *testpmd_cl;
 
@@ -16030,6 +16031,149 @@ cmdline_parse_inst_t cmd_load_from_file = {
 	},
 };
 
+/* *** load BPF program *** */
+struct cmd_bpf_ld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+	cmdline_fixed_string_t op;
+	cmdline_fixed_string_t flags;
+	cmdline_fixed_string_t prm;
+};
+
+static void
+bpf_parse_flags(const char *str, enum rte_bpf_prog_type *ptype, uint32_t *flags)
+{
+	uint32_t i, v;
+
+	*flags = RTE_BPF_ETH_F_NONE;
+	*ptype = RTE_BPF_PROG_TYPE_UNSPEC;
+
+	for (i = 0; str[i] != 0; i++) {
+		v = toupper(str[i]);
+		if (v == 'J')
+			*flags |= RTE_BPF_ETH_F_JIT;
+		else if (v == 'M')
+			*ptype = RTE_BPF_PROG_TYPE_MBUF;
+		else if (v == '-')
+			continue;
+		else
+			printf("unknown flag: \'%c\'", v);
+	}
+}
+
+static void cmd_operate_bpf_ld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	int32_t rc;
+	uint32_t flags;
+	struct cmd_bpf_ld_result *res;
+	struct rte_bpf_prm prm;
+	const char *fname, *sname;
+
+	res = parsed_result;
+	memset(&prm, 0, sizeof(prm));
+	prm.xsym = bpf_xsym;
+	prm.nb_xsym = RTE_DIM(bpf_xsym);
+
+	bpf_parse_flags(res->flags, &prm.prog_type, &flags);
+	fname = res->prm;
+	sname = ".text";
+
+	if (strcmp(res->dir, "rx") == 0) {
+		rc = rte_bpf_eth_rx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else if (strcmp(res->dir, "tx") == 0) {
+		rc = rte_bpf_eth_tx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_load_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			bpf, "bpf-load");
+cmdline_parse_token_string_t cmd_load_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_load_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_load_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, queue, UINT16);
+cmdline_parse_token_string_t cmd_load_bpf_flags =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			flags, NULL);
+cmdline_parse_token_string_t cmd_load_bpf_prm =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			prm, NULL);
+
+cmdline_parse_inst_t cmd_operate_bpf_ld_parse = {
+	.f = cmd_operate_bpf_ld_parsed,
+	.data = NULL,
+	.help_str = "bpf-load rx|tx <port> <queue> <J|M|-> <file_name>",
+	.tokens = {
+		(void *)&cmd_load_bpf_start,
+		(void *)&cmd_load_bpf_dir,
+		(void *)&cmd_load_bpf_port,
+		(void *)&cmd_load_bpf_queue,
+		(void *)&cmd_load_bpf_flags,
+		(void *)&cmd_load_bpf_prm,
+		NULL,
+	},
+};
+
+/* *** unload BPF program *** */
+struct cmd_bpf_unld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+};
+
+static void cmd_operate_bpf_unld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	struct cmd_bpf_unld_result *res;
+
+	res = parsed_result;
+
+	if (strcmp(res->dir, "rx") == 0)
+		rte_bpf_eth_rx_unload(res->port, res->queue);
+	else if (strcmp(res->dir, "tx") == 0)
+		rte_bpf_eth_tx_unload(res->port, res->queue);
+	else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_unload_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			bpf, "bpf-unload");
+cmdline_parse_token_string_t cmd_unload_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_unload_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_unload_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, queue, UINT16);
+
+cmdline_parse_inst_t cmd_operate_bpf_unld_parse = {
+	.f = cmd_operate_bpf_unld_parsed,
+	.data = NULL,
+	.help_str = "bpf-unload rx|tx <port> <queue>",
+	.tokens = {
+		(void *)&cmd_unload_bpf_start,
+		(void *)&cmd_unload_bpf_dir,
+		(void *)&cmd_unload_bpf_port,
+		(void *)&cmd_unload_bpf_queue,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -16272,6 +16416,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_del_port_tm_node,
 	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
 	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
 	NULL,
 };
 
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index b47537642..602e20ac3 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -21,7 +21,7 @@ sources = files('cmdline.c',
 	'testpmd.c',
 	'txonly.c')
 
-deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci']
+deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci', 'bpf']
 if dpdk_conf.has('RTE_LIBRTE_PDUMP')
 	deps += 'pdump'
 endif
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 07/10] test: add few eBPF samples
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (6 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 06/10] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 08/10] test: introduce functional test for librte_bpf Konstantin Ananyev
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add a few simple eBPF programs as examples.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/bpf/dummy.c |  20 ++
 test/bpf/mbuf.h  | 578 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/bpf/t1.c    |  52 +++++
 test/bpf/t2.c    |  31 +++
 test/bpf/t3.c    |  36 ++++
 5 files changed, 717 insertions(+)
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

diff --git a/test/bpf/dummy.c b/test/bpf/dummy.c
new file mode 100644
index 000000000..5851469e7
--- /dev/null
+++ b/test/bpf/dummy.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Does nothing and always returns success.
+ * Used to measure the BPF infrastructure overhead.
+ * To compile:
+ * clang -O2 -target bpf -c dummy.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+uint64_t
+entry(void *arg)
+{
+	return 1;
+}
diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
new file mode 100644
index 000000000..f24f908d7
--- /dev/null
+++ b/test/bpf/mbuf.h
@@ -0,0 +1,578 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation.
+ * Copyright 2014 6WIND S.A.
+ */
+
+/*
+ * Snippet from dpdk.org rte_mbuf.h,
+ * used to provide BPF programs with information about the rte_mbuf layout.
+ */
+
+#ifndef _MBUF_H_
+#define _MBUF_H_
+
+#include <stdint.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Packet Offload Features Flags. It also carries packet type information.
+ * Critical resources. Both rx/tx share these bits. Be cautious on any change
+ *
+ * - RX flags start at bit position zero, and get added to the left of previous
+ *   flags.
+ * - The most-significant 3 bits are reserved for generic mbuf flags
+ * - TX flags therefore start at bit position 60 (i.e. 63-3), and new flags get
+ *   added to the right of the previously defined flags i.e. they should count
+ *   downwards, not upwards.
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
+ */
+
+/**
+ * RX packet is a 802.1q VLAN packet. This flag was set by PMDs when
+ * the packet is recognized as a VLAN, but the behavior between PMDs
+ * was not the same. This flag is kept for some time to avoid breaking
+ * applications and should be replaced by PKT_RX_VLAN_STRIPPED.
+ */
+#define PKT_RX_VLAN_PKT      (1ULL << 0)
+
+#define PKT_RX_RSS_HASH      (1ULL << 1)
+/**< RX packet with RSS hash result. */
+#define PKT_RX_FDIR          (1ULL << 2)
+/**< RX packet with FDIR match indicate. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)
+/**< External IP header checksum error. */
+
+/**
+ * A vlan has been stripped by the hardware and its tci is saved in
+ * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
+ * in the RX configuration of the PMD.
+ */
+#define PKT_RX_VLAN_STRIPPED (1ULL << 6)
+
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_IEEE1588_PTP  (1ULL << 9)
+/**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 10)
+/**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID       (1ULL << 13)
+/**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX      (1ULL << 14)
+/**< Flexible bytes reported if FDIR match. */
+
+/**
+ * The 2 vlans have been stripped by the hardware and their tci are
+ * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * This can only happen if vlan stripping is enabled in the RX
+ * configuration of the PMD. If this flag is set, PKT_RX_VLAN_STRIPPED
+ * must also be set.
+ */
+#define PKT_RX_QINQ_STRIPPED (1ULL << 15)
+
+/**
+ * Deprecated.
+ * RX packet with double VLAN stripped.
+ * This flag is replaced by PKT_RX_QINQ_STRIPPED.
+ */
+#define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
+
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP     (1ULL << 17)
+
+/* add new RX flags here */
+
+/* add new TX flags here */
+
+/**
+ * Offload the MACsec. This flag must be set by the application to enable
+ * this offload feature for a packet to be transmitted.
+ */
+#define PKT_TX_MACSEC        (1ULL << 44)
+
+/**
+ * Bits 45:48 used for the tunnel type.
+ * When doing Tx offload like TSO or checksum, the HW needs to configure the
+ * tunnel type into the HW descriptors.
+ */
+#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
+#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
+#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
+#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
+/**< TX packet with MPLS-in-UDP RFC 7510 header. */
+#define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/* add new TX TUNNEL type here */
+#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
+
+/**
+ * Second VLAN insertion (QinQ) flag.
+ */
+#define PKT_TX_QINQ_PKT    (1ULL << 49)
+/**< TX packet with double VLAN inserted. */
+
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len in account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 50)
+
+#define PKT_TX_IEEE1588_TMST (1ULL << 51)
+/**< TX IEEE1588 packet to timestamp. */
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
+ */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52)
+/**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_TCP_CKSUM     (1ULL << 52)
+/**< TCP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_SCTP_CKSUM    (2ULL << 52)
+/**< SCTP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_UDP_CKSUM     (3ULL << 52)
+/**< UDP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_L4_MASK       (3ULL << 52)
+/**< Mask for L4 cksum offload request. */
+
+/**
+ * Offload the IP checksum in the hardware. The flag PKT_TX_IPV4 should
+ * also be set by the application, although a PMD will only check
+ * PKT_TX_IP_CKSUM.
+ *  - set the IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: l2_len, l3_len
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
+
+/**
+ * Packet is IPv4. This flag must be set when using any offload feature
+ * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV4          (1ULL << 55)
+
+/**
+ * Packet is IPv6. This flag must be set when using an offload feature
+ * (TSO or L4 checksum) to tell the NIC that the packet is an IPv6
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV6          (1ULL << 56)
+
+#define PKT_TX_VLAN_PKT      (1ULL << 57)
+/**< TX packet is a 802.1q VLAN packet. */
+
+/**
+ * Offload the IP checksum of an external header in the hardware. The
+ * flag PKT_TX_OUTER_IPV4 should also be set by the application, although
+ * a PMD will only check PKT_TX_IP_CKSUM.  The IP checksum field in the
+ * packet must be set to 0.
+ *  - set the outer IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: outer_l2_len, outer_l3_len
+ */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/**
+ * Packet outer header is IPv4. This flag must be set when using any
+ * outer offload feature (L3 or L4 checksum) to tell the NIC that the
+ * outer header of the tunneled packet is an IPv4 packet.
+ */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/**
+ * Packet outer header is IPv6. This flag must be set when using any
+ * outer offload feature (L4 checksum) to tell the NIC that the outer
+ * header of the tunneled packet is an IPv6 packet.
+ */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
+/**
+ * Bitmask of all supported packet Tx offload features flags,
+ * which can be set for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_IEEE1588_TMST |	 \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK |	 \
+		PKT_TX_MACSEC)
+
+#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
+
+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
+/* Use final bit of flags to indicate a control mbuf */
+#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
+
+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of RX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the RX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of TX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the TX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Some NICs need at least 2KB buffer to RX standard Ethernet frame without
+ * splitting it into multiple segments.
+ * So, for mbufs that planned to be involved into RX/TX, the recommended
+ * minimal buffer length is 2KB + RTE_PKTMBUF_HEADROOM.
+ */
+#define	RTE_MBUF_DEFAULT_DATAROOM	2048
+#define	RTE_MBUF_DEFAULT_BUF_SIZE	\
+	(RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM)
+
+/* define a set of marker types that can be used to refer to set points in the
+ * mbuf.
+ */
+__extension__
+typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
+__extension__
+typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
+__extension__
+typedef uint64_t MARKER64[0];
+/**< marker that allows us to overwrite 8 bytes with a single assignment */
+
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * The generic rte_mbuf, containing a packet mbuf.
+ */
+struct rte_mbuf {
+	MARKER cacheline0;
+
+	void *buf_addr;           /**< Virtual address of segment buffer. */
+	/**
+	 * Physical address of segment buffer.
+	 * Force alignment to 8-bytes, so as to ensure we have the exact
+	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * working on vector drivers easier.
+	 */
+	phys_addr_t buf_physaddr __rte_aligned(sizeof(phys_addr_t));
+
+	/* next 8 bytes are initialised on RX descriptor rearm */
+	MARKER64 rearm_data;
+	uint16_t data_off;
+
+	/**
+	 * Reference counter. Its size should at least equal to the size
+	 * of port field (16 bits), to support zero-copy broadcast.
+	 * It should only be accessed using the following functions:
+	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
+	 * config option.
+	 */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+		/**< Non-atomically accessed refcnt */
+	};
+	uint16_t nb_segs;         /**< Number of segments. */
+
+	/** Input port (16 bits to support more than 256 virtual ports). */
+	uint16_t port;
+
+	uint64_t ol_flags;        /**< Offload features. */
+
+	/* remaining bytes are set on RX when pulling packet from descriptor */
+	MARKER rx_descriptor_fields1;
+
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types. The packet_type is about data really present in the
+	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+	 * vlan is stripped from the data.
+	 */
+	RTE_STD_C11
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+	uint16_t vlan_tci;
+
+	union {
+		uint32_t rss;     /**< RSS hash result if RSS enabled */
+		struct {
+			RTE_STD_C11
+			union {
+				struct {
+					uint16_t hash;
+					uint16_t id;
+				};
+				uint32_t lo;
+				/**< Second 4 flexible bytes */
+			};
+			uint32_t hi;
+			/**< First 4 flexible bytes or FD ID, dependent on
+			 *   PKT_RX_FDIR_* flag in ol_flags.
+			 */
+		} fdir;           /**< Filter identifier if FDIR enabled */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;          /**< Hierarchical scheduler */
+		uint32_t usr;
+		/**< User defined tags. See rte_distributor_process() */
+	} hash;                   /**< hash information */
+
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
+	uint16_t buf_len;         /**< Length of segment buffer. */
+
+	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+	 * are not normalized but are always the same for a given port.
+	 */
+	uint64_t timestamp;
+
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	RTE_STD_C11
+	union {
+		void *userdata;   /**< Can be used for external metadata */
+		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+	};
+
+	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+
+	/* fields to support TX offloads */
+	RTE_STD_C11
+	union {
+		uint64_t tx_offload;       /**< combined for easy fetch */
+		__extension__
+		struct {
+			uint64_t l2_len:7;
+			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+			 */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
+
+			/* fields for TX offloading of tunnels */
+			uint64_t outer_l3_len:9;
+			/**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7;
+			/**< Outer L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
+		};
+	};
+
+	/** Size of the application private data. In case of an indirect
+	 * mbuf, it stores the direct mbuf private data size.
+	 */
+	uint16_t priv_size;
+
+	/** Timesync flags for use with IEEE1588. */
+	uint16_t timesync;
+
+	/** Sequence number. See also rte_reorder_insert(). */
+	uint32_t seqn;
+
+} __rte_cache_aligned;
+
+
+/**
+ * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
+ */
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
+
+/**
+ * Returns TRUE if given mbuf is direct, or FALSE otherwise.
+ */
+#define RTE_MBUF_DIRECT(mb)     (!RTE_MBUF_INDIRECT(mb))
+
+/**
+ * Private data in case of pktmbuf pool.
+ *
+ * A structure that contains some pktmbuf_pool-specific data that are
+ * appended after the mempool structure (in private data).
+ */
+struct rte_pktmbuf_pool_private {
+	uint16_t mbuf_data_room_size; /**< Size of data space in each mbuf. */
+	uint16_t mbuf_priv_size;      /**< Size of private area in each mbuf. */
+};
+
+/**
+ * A macro that points to an offset into the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param o
+ *   The offset into the mbuf data.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod_offset(m, t, o)	\
+	((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
+
+/**
+ * A macro that points to the start of the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _MBUF_H_ */
diff --git a/test/bpf/t1.c b/test/bpf/t1.c
new file mode 100644
index 000000000..60f9434ab
--- /dev/null
+++ b/test/bpf/t1.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to first segment packet data as an input parameter.
+ * analog of tcpdump -s 1 -d 'dst 1.2.3.4 && udp && dst port 5000'
+ * (000) ldh      [12]
+ * (001) jeq      #0x800           jt 2    jf 12
+ * (002) ld       [30]
+ * (003) jeq      #0x1020304       jt 4    jf 12
+ * (004) ldb      [23]
+ * (005) jeq      #0x11            jt 6    jf 12
+ * (006) ldh      [20]
+ * (007) jset     #0x1fff          jt 12   jf 8
+ * (008) ldxb     4*([14]&0xf)
+ * (009) ldh      [x + 16]
+ * (010) jeq      #0x1388          jt 11   jf 12
+ * (011) ret      #1
+ * (012) ret      #0
+ *
+ * To compile:
+ * clang -O2 -target bpf -c t1.c
+ */
+
+#include <stdint.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+
+uint64_t
+entry(void *pkt)
+{
+	struct ether_header *ether_header = (void *)pkt;
+
+	if (ether_header->ether_type != __builtin_bswap16(0x0800))
+		return 0;
+
+	struct iphdr *iphdr = (void *)(ether_header + 1);
+	if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1ffff) != 0 ||
+			iphdr->daddr != __builtin_bswap32(0x1020304))
+		return 0;
+
+	int hlen = iphdr->ihl * 4;
+	struct udphdr *udphdr = (void *)iphdr + hlen;
+
+	if (udphdr->dest !=  __builtin_bswap16(5000))
+		return 0;
+
+	return 1;
+}
diff --git a/test/bpf/t2.c b/test/bpf/t2.c
new file mode 100644
index 000000000..69d7a4fe1
--- /dev/null
+++ b/test/bpf/t2.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Cleans up the mbuf's vlan_tci and all related RX flags
+ * (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED).
+ * Doesn't touch contents of packet data.
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t2.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+uint64_t
+entry(void *pkt)
+{
+	struct rte_mbuf *mb;
+
+	mb = pkt;
+	mb->vlan_tci = 0;
+	mb->ol_flags &= ~(PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+
+	return 1;
+}
diff --git a/test/bpf/t3.c b/test/bpf/t3.c
new file mode 100644
index 000000000..531b9cb8c
--- /dev/null
+++ b/test/bpf/t3.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Dump the mbuf to stdout if it is an ARP packet (aka tcpdump 'arp').
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t3.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <net/ethernet.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+extern void rte_pktmbuf_dump(FILE *, const struct rte_mbuf *, unsigned int);
+
+uint64_t
+entry(const void *pkt)
+{
+	const struct rte_mbuf *mb;
+	const struct ether_header *eth;
+
+	mb = pkt;
+	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+
+	if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP))
+		rte_pktmbuf_dump(stdout, mb, 64);
+
+	return 1;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 08/10] test: introduce functional test for librte_bpf
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (7 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 07/10] test: add few eBPF samples Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 09/10] doc: add librte_bpf related info Konstantin Ananyev
  2018-04-06 23:18   ` [dpdk-dev] [PATCH v3 10/10] MAINTAINERS: " Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/test/Makefile    |   2 +
 test/test/meson.build |   2 +
 test/test/test_bpf.c  | 633 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 637 insertions(+)
 create mode 100644 test/test/test_bpf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index a88cc38bf..61ac6880d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -193,6 +193,8 @@ endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += test_bpf.c
+
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
 CFLAGS += -O3
diff --git a/test/test/meson.build b/test/test/meson.build
index eb3d87a4d..101446984 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -8,6 +8,7 @@ test_sources = files('commands.c',
 	'test_alarm.c',
 	'test_atomic.c',
 	'test_barrier.c',
+	'test_bpf.c',
 	'test_byteorder.c',
 	'test_cmdline.c',
 	'test_cmdline_cirbuf.c',
@@ -98,6 +99,7 @@ test_sources = files('commands.c',
 )
 
 test_deps = ['acl',
+	'bpf',
 	'cfgfile',
 	'cmdline',
 	'cryptodev',
diff --git a/test/test/test_bpf.c b/test/test/test_bpf.c
new file mode 100644
index 000000000..20b6de9de
--- /dev/null
+++ b/test/test/test_bpf.c
@@ -0,0 +1,633 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_memory.h>
+#include <rte_debug.h>
+#include <rte_hexdump.h>
+#include <rte_random.h>
+#include <rte_errno.h>
+#include <rte_bpf.h>
+
+#include "test.h"
+
+/*
+ * Basic functional tests for librte_bpf.
+ * The main procedure: load an eBPF program, execute it and
+ * compare results with expected values.
+ * Note that more tests have to be added to cover remaining instructions.
+ */
+
+struct dummy_offset {
+	uint64_t u64;
+	uint32_t u32;
+	uint16_t u16;
+	uint8_t  u8;
+};
+
+struct dummy_vect8 {
+	struct dummy_offset in[8];
+	struct dummy_offset out[8];
+};
+
+#define	TEST_FILL_1	0xDEADBEEF
+
+#define	TEST_MUL_1	21
+#define TEST_MUL_2	-100
+
+struct bpf_test {
+	const char *name;
+	size_t arg_sz;
+	struct rte_bpf_prm prm;
+	void (*prepare)(void *);
+	int (*check_result)(uint64_t, const void *);
+};
+
+/* store immediate test-cases */
+static const struct bpf_insn test_store1_prog[] = {
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_B),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+		.imm = TEST_FILL_1,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_store1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+	memset(df, 0, sizeof(*df));
+}
+
+static int
+test_store1_check(uint64_t rc, const void *arg)
+{
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+
+	if (rc != 1) {
+		printf("%s@%d: invalid return value %" PRIu64 "\n",
+			__func__, __LINE__, rc);
+		return -1;
+	}
+
+	memset(&dfe, 0, sizeof(dfe));
+	dfe.u64 = (int32_t)TEST_FILL_1;
+	dfe.u32 = dfe.u64;
+	dfe.u16 = dfe.u64;
+	dfe.u8 = dfe.u64;
+
+	if (memcmp(dft, &dfe, sizeof(dfe)) != 0) {
+		printf("%s: invalid value\n", __func__);
+		rte_memdump(stdout, "expected", &dfe, sizeof(dfe));
+		rte_memdump(stdout, "result", dft, sizeof(*dft));
+		return -1;
+	}
+
+	return 0;
+}
+
+/* store register test-cases */
+static const struct bpf_insn test_store2_prog[] = {
+
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_B),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+/* load test-cases */
+static const struct bpf_insn test_load1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return sum */
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_4,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_load1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u64 = (int32_t)TEST_FILL_1;
+	df->u32 = df->u64;
+	df->u16 = df->u64;
+	df->u8 = df->u64;
+}
+
+static int
+test_load1_check(uint64_t rc, const void *arg)
+{
+	uint64_t v;
+	const struct dummy_offset *dft;
+
+	dft = arg;
+	v = dft->u64;
+	v += dft->u32;
+	v += dft->u16;
+	v += dft->u8;
+
+	if (v != rc) {
+		printf("%s@%d: invalid return value "
+			"expected=0x%" PRIx64 ", actual=0x%" PRIx64 "\n",
+			__func__, __LINE__, v, rc);
+		return -1;
+	}
+	return 0;
+}
+
+/* alu mul test-cases */
+static const struct bpf_insn test_mul1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MUL | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MUL | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_mul1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = -v;
+}
+
+static int
+test_mul1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+
+	if (rc != 1) {
+		printf("%s@%d: invalid return value %" PRIu64 "\n",
+			__func__, __LINE__, rc);
+		return -1;
+	}
+
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 * TEST_MUL_1;
+	r3 *= TEST_MUL_2;
+	r4 = (uint32_t)(r4 * r2);
+	r4 *= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	if (memcmp(dvt->out, dve.out, sizeof(dve.out)) != 0) {
+		printf("%s: invalid value\n", __func__);
+		rte_memdump(stdout, "expected", dve.out, sizeof(dve.out));
+		rte_memdump(stdout, "result", dvt->out, sizeof(dvt->out));
+		return -1;
+	}
+
+	return 0;
+}
+
+/* alu div test-cases */
+static const struct bpf_insn test_div1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOD | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_MOD | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_DIV | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* check that we can handle division by zero gracefully. */
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[3].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static int
+test_div1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+
+	/*
+	 * in the test prog we attempted to divide by zero.
+	 * so it should return 0.
+	 */
+	if (rc != 0) {
+		printf("%s@%d: invalid return value %" PRIu64 "\n",
+			__func__, __LINE__, rc);
+		return -1;
+	}
+
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 / TEST_MUL_1;
+	r3 %= TEST_MUL_2;
+	r2 |= 1;
+	r3 |= 1;
+	r4 = (uint32_t)(r4 % r2);
+	r4 /= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	if (memcmp(dvt->out, dve.out, sizeof(dve.out)) != 0) {
+		printf("%s: invalid value\n", __func__);
+		rte_memdump(stdout, "expected", dve.out, sizeof(dve.out));
+		rte_memdump(stdout, "result", dvt->out, sizeof(dvt->out));
+		return -1;
+	}
+
+	return 0;
+}
+
+static const struct bpf_test tests[] = {
+	{
+		.name = "test_store1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store1_prog,
+			.nb_ins = RTE_DIM(test_store1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_store2",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store2_prog,
+			.nb_ins = RTE_DIM(test_store2_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_load1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_load1_prog,
+			.nb_ins = RTE_DIM(test_load1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_load1_check,
+	},
+	{
+		.name = "test_mul1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_mul1_prog,
+			.nb_ins = RTE_DIM(test_mul1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_mul1_check,
+	},
+	{
+		.name = "test_div1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_div1_prog,
+			.nb_ins = RTE_DIM(test_div1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_div1_check,
+	},
+};
+
+static int
+run_test(const struct bpf_test *tst)
+{
+	int32_t ret, rv;
+	int64_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	uint8_t tbuf[tst->arg_sz];
+
+	printf("%s(%s) start\n", __func__, tst->name);
+
+	bpf = rte_bpf_load(&tst->prm);
+	if (bpf == NULL) {
+		printf("%s%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		return -1;
+	}
+
+	tst->prepare(tbuf);
+
+	rc = rte_bpf_exec(bpf, tbuf);
+	ret = tst->check_result(rc, tbuf);
+	if (ret != 0) {
+		printf("%s%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, ret, strerror(ret));
+	}
+
+	rte_bpf_get_jit(bpf, &jit);
+	if (jit.func == NULL)
+		return 0;
+
+	tst->prepare(tbuf);
+	rc = jit.func(tbuf);
+	rv = tst->check_result(rc, tbuf);
+	ret |= rv;
+	if (rv != 0) {
+		printf("%s%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, rv, strerror(ret));
+	}
+
+	rte_bpf_destroy(bpf);
+	return ret;
+
+}
+
+static int
+test_bpf(void)
+{
+	int32_t rc;
+	uint32_t i;
+
+	rc = 0;
+	for (i = 0; i != RTE_DIM(tests); i++)
+		rc |= run_test(tests + i);
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
-- 
2.13.6
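
As the header comment in test_bpf.c notes, more cases are still needed to cover the
remaining instructions. A new case plugs into the same table-driven framework; the
sketch below is hypothetical (the test_add1_* names are not part of the patch) and only
illustrates the pieces a minimal extra ALU test would need.

/* hypothetical extra test-case, following the pattern used in the patch */
static const struct bpf_insn test_add1_prog[] = {
	{
		.code = (BPF_LDX | BPF_MEM | BPF_DW),
		.dst_reg = BPF_REG_0,
		.src_reg = BPF_REG_1,
		.off = offsetof(struct dummy_offset, u64),
	},
	{
		.code = (BPF_ALU64 | BPF_ADD | BPF_K),
		.dst_reg = BPF_REG_0,
		.imm = TEST_MUL_1,
	},
	{
		.code = (BPF_JMP | BPF_EXIT),
	},
};

static int
test_add1_check(uint64_t rc, const void *arg)
{
	const struct dummy_offset *dft = arg;
	uint64_t v = dft->u64 + TEST_MUL_1;

	return (v == rc) ? 0 : -1;
}

/*
 * plus an entry in tests[], reusing test_load1_prepare():
 * { .name = "test_add1", .arg_sz = sizeof(struct dummy_offset),
 *   .prm = { .ins = test_add1_prog, .nb_ins = RTE_DIM(test_add1_prog),
 *            .prog_type = RTE_BPF_PROG_TYPE_UNSPEC, },
 *   .prepare = test_load1_prepare, .check_result = test_add1_check, },
 */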

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 09/10] doc: add librte_bpf related info
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (8 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 08/10] test: introduce functional test for librte_bpf Konstantin Ananyev
@ 2018-04-06 18:49   ` Konstantin Ananyev
  2018-04-23 13:22     ` Kovacevic, Marko
  2018-04-06 23:18   ` [dpdk-dev] [PATCH v3 10/10] MAINTAINERS: " Konstantin Ananyev
  10 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 18:49 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 doc/api/doxy-api-index.md         |  3 ++-
 doc/api/doxy-api.conf             |  1 +
 doc/guides/prog_guide/bpf_lib.rst | 37 +++++++++++++++++++++++++++++++++++++
 doc/guides/prog_guide/index.rst   |  1 +
 4 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 2f04619cb..d0c1c37ad 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -133,7 +133,8 @@ The public API headers are grouped by topics:
   [EFD]                (@ref rte_efd.h),
   [ACL]                (@ref rte_acl.h),
   [member]             (@ref rte_member.h),
-  [flow classify]      (@ref rte_flow_classify.h)
+  [flow classify]      (@ref rte_flow_classify.h),
+  [BPF]                (@ref rte_bpf.h)
 
 - **containers**:
   [mbuf]               (@ref rte_mbuf.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index cda52fdfb..c8eb6d893 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -42,6 +42,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_acl \
                           lib/librte_bbdev \
                           lib/librte_bitratestats \
+                          lib/librte_bpf \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/bpf_lib.rst b/doc/guides/prog_guide/bpf_lib.rst
new file mode 100644
index 000000000..edc0baafd
--- /dev/null
+++ b/doc/guides/prog_guide/bpf_lib.rst
@@ -0,0 +1,37 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+Berkeley Packet Filter Library
+==============================
+
+The DPDK provides a BPF library that gives the ability
+to load and execute Enhanced Berkeley Packet Filter (eBPF) bytecode within
+a user-space dpdk application.
+
+It supports a basic set of features from the eBPF spec.
+Please refer to the
+`eBPF spec <https://www.kernel.org/doc/Documentation/networking/filter.txt>`_
+for more information.
+It also introduces a basic framework to load/unload BPF-based filters
+on eth devices (right now only via SW RX/TX callbacks).
+
+The library API provides the following basic operations:
+
+*   Create a new BPF execution context and load user provided eBPF code into it.
+
+*   Destroy a BPF execution context and its runtime structures and free the associated memory.
+
+*   Execute eBPF bytecode associated with the provided input parameter.
+
+*   Provide information about natively compiled code for a given BPF context.
+
+*   Load a BPF program from an ELF file and install a callback to execute it on a given ethdev port/queue.
+
+Not currently supported eBPF features
+-------------------------------------
+
+ - JIT
+ - cBPF
+ - tail-pointer call
+ - eBPF MAP
+ - skb
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index bbbe7895d..76b079c3f 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -49,6 +49,7 @@ Programmer's Guide
     vhost_lib
     metrics_lib
     port_hotplug_framework
+    bpf_lib
     source_org
     dev_kit_build_system
     dev_kit_root_make_help
-- 
2.13.6
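
To connect the operations listed in bpf_lib.rst with concrete calls: a minimal usage
sketch based only on the API already visible in test_bpf.c above (rte_bpf_load,
rte_bpf_exec, rte_bpf_get_jit, rte_bpf_destroy) could look roughly as follows. The
run_bpf_once() wrapper and its prog/nb_ins parameters are placeholders for
illustration, not part of the library.

#include <stdint.h>
#include <rte_bpf.h>

/* minimal sketch: build a context from raw eBPF instructions, run it
 * once over 'arg', preferring JIT-ed code when it is available.
 */
static uint64_t
run_bpf_once(const struct bpf_insn *prog, uint32_t nb_ins, void *arg)
{
	struct rte_bpf *bpf;
	struct rte_bpf_jit jit;
	uint64_t rc;

	const struct rte_bpf_prm prm = {
		.ins = prog,
		.nb_ins = nb_ins,
		.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
	};

	/* create execution context and load (validate) the eBPF code */
	bpf = rte_bpf_load(&prm);
	if (bpf == NULL)
		return 0;

	/* query information about natively compiled code */
	rte_bpf_get_jit(bpf, &jit);

	if (jit.func != NULL)
		rc = jit.func(arg);		/* execute JIT-ed code */
	else
		rc = rte_bpf_exec(bpf, arg);	/* fall back to interpreter */

	/* destroy the context and free associated memory */
	rte_bpf_destroy(bpf);
	return rc;
}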

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v3 10/10] MAINTAINERS: add librte_bpf related info
  2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
                     ` (9 preceding siblings ...)
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 09/10] doc: add librte_bpf related info Konstantin Ananyev
@ 2018-04-06 23:18   ` Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-06 23:18 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 MAINTAINERS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ed3251da7..db7fec362 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -964,6 +964,10 @@ Latency statistics
 M: Reshma Pattan <reshma.pattan@intel.com>
 F: lib/librte_latencystats/
 
+BPF
+M: Konstantin Ananyev <konstantin.ananyev@intel.com>
+F: lib/librte_bpf/
+F: doc/guides/prog_guide/bpf_lib.rst
 
 Test Applications
 -----------------
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters
  2018-04-05 12:51             ` Ananyev, Konstantin
@ 2018-04-09  4:38               ` Jerin Jacob
  0 siblings, 0 replies; 83+ messages in thread
From: Jerin Jacob @ 2018-04-09  4:38 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

-----Original Message-----
> Date: Thu, 5 Apr 2018 12:51:16 +0000
> From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: "dev@dpdk.org" <dev@dpdk.org>
> Subject: RE: [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF
>  filters
> 
> 
> Hi Jerin,
> 
> > 
> > >
> > > > > >
> > > > > > > +/*
> > > > > > > + * Marks given callback as used by datapath.
> > > > > > > + */
> > > > > > > +static __rte_always_inline void
> > > > > > > +bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
> > > > > > > +{
> > > > > > > +	cbi->use++;
> > > > > > > +	/* make sure no store/load reordering could happen */
> > > > > > > +	rte_smp_mb();
> > > > > > > +}
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Marks given callback list as not used by datapath.
> > > > > > > + */
> > > > > > > +static __rte_always_inline void
> > > > > > > +bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
> > > > > > > +{
> > > > > > > +	/* make sure all previous loads are completed */
> > > > > > > +	rte_smp_rmb();
> > > > > >
> > > > > > We earlier discussed this barrier. Will the following scheme work out to
> > > > > > fix bpf_eth_cbi_wait() without the cbi->use scheme?
> > > > > >
> > > > > > #ie. We need to exit from jitted or interpreted code irrespective of its
> > > > > > state. IMO, we can do that by an _arch_ specific function to fill jitted memory with
> > > > > > an "exit" opcode (value:0x95, exit, return r0), so that the above code has to come out in any case
> > > > > > on the next instruction execution. I know, jitted memory is read-only in your
> > > > > > design; I think we can change the permission to "write" and fill the
> > > > > > "exit" opcode (both jitted or interpreted case) for termination.
> > > > > >
> > > > > > What do you think?
> > > > >
> > > > > Not sure I understand your proposal...
> > > >
> > > > If I understand it correctly, bpf_eth_cbi_wait() is used to _wait_ until
> > > > eBPF program exits? Right?
> > >
> > > Kind off, but not only.
> > > After  bpf_eth_cbi_wait() finishes it is guaranteed that data-path wouldn't try
> > > to access the resources associated with given bpf_eth_cbi (bpf, jit), so we
> > > can proceed with freeing them.
> > >
> > > > . Instead of using bpf_eth_cbi_[un]use()
> > > > scheme which involves the barrier. How about,
> > > >
> > > > in bpf_eth_cbi_wait()
> > > > {
> > > >
> > > > memset the EBPF "program memory" with 0x95 value. Which is an "exit" and
> > > > "return r0" EPBF opcode, Which makes program to terminate by it own
> > > > as on 0x95 instruction, CPU decodes and it gets out from EPBF program.
> > > >
> > > > }
> > > >
> > > > In jitted case, it is not 0x95 instruction, which will be an arch
> > > > specific instructions, We can have arch abstraction to generated
> > > > such instruction for "exit" opcode. And use common code to fill the instructions
> > > > to exit from EPBF program provided by arch code.
> > > >
> > > > Does that makes sense?
> > >
> > > There is no much point in doing it.
> > 
> > It helps in avoiding the barrier on non x86 case. Right? 
> 
> Nope, I believe it doesn't, see below.
> 
> > So it is useful
> > thing. Right? and avoid the extra logic in fastpath increment/decrement
> > "inuse" counters for all the archs.
> > 
> > > What we need is a guarantee that after some point data-path wouldn't try to access
> > > given bpf context, so we can destroy it.
> > 
> > Is there any reason why you think, above proposed solution wont
> > guarantee the termination eBPF program?
> > 
> > -ie,
> > 1)memset to "exit" instruction in eBPF memory
> 
> Even when code is just interpreted (bpf_exec()) - there will still be cases
> when you need to synchronize the execution thread with the thread updating the code
> (32bit systems, 16B LDDW instruction, etc.).  
> With JIT-ed code things will become much more complicated (icache, variable size instructions)
> and I can't see  how it could be done without extra synchronization between execute and update threads.
> 
> > 2)Wait for N instruction cycles to terminate the program.
> 
> There is no way to guarantee that execution would take exactly N cycles.
> Execution thread could be preempted/interrupted, it could be executing syscall,
> there could be CPU stall (access slow memory, cpu freq change, etc.). 

I agree. Things get worse with eBPF tail calls etc.

> 
> So even we'll solve all problems with 1) - it wouldn't buy us a safe solution.
> 
> Actually quite a lot of research was done how to speedup slow/fast path synchronization
> in user-space:
> 
> https://lwn.net/Articles/573424/
> some theory beyond:
> https://lttng.org/files/thesis/desnoyers-dissertation-2009-12-v27.pdf (chapter 6)
> They even introduced a new syscall in Linux for these purposes:
> http://man7.org/linux/man-pages/man2/membarrier.2.html
> 
> I thought about something similar based on membarrier(), but it has
> few implications:
> 1. only latest linux kernels (4.14+) 
> 2. Not sure is it available on non x86 platforms.
> 3. Need to measure real impact.
> 
> Because of 1) and 2) we probably would need both mb() and membarrier() code paths.
> Anyway - it is probably worth investigating for more generic solution,
> but I suppose it is out of scope for that patch.

Yes.
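
To make the scheme under discussion concrete, a rough sketch of the counter-based
synchronization follows. The structure layout, the second increment in unuse() and the
wait loop are assumptions reconstructed from the fragments quoted above, not the code
of the patch itself; 'use' is made volatile only to keep this standalone sketch
correct, where the real code relies on explicit barriers instead.

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_pause.h>

/* sketch only: the data-path brackets every callback run with inuse()/unuse(),
 * the control path waits until it has observed the callback idle before
 * freeing the associated bpf/jit resources.
 */
struct bpf_eth_cbi {
	volatile uint32_t use;	/* odd while the data-path is inside the callback */
	/* ... bpf/jit state lives here in the real structure ... */
};

static inline void
bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
{
	cbi->use++;
	rte_smp_mb();	/* no store/load reordering past this point */
}

static inline void
bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
{
	rte_smp_rmb();	/* all previous loads completed */
	cbi->use++;	/* counter becomes even: callback not running */
}

static void
bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
{
	uint32_t puse = cbi->use;

	/* odd value: data-path is inside the callback, wait till it moves on */
	if (puse & 1) {
		do {
			rte_pause();
		} while (cbi->use == puse);
	}
}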

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-04-09  4:54     ` Jerin Jacob
  2018-04-09 11:10       ` Ananyev, Konstantin
  0 siblings, 1 reply; 83+ messages in thread
From: Jerin Jacob @ 2018-04-09  4:54 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

-----Original Message-----
> Date: Fri, 6 Apr 2018 19:49:32 +0100
> From: Konstantin Ananyev <konstantin.ananyev@intel.com>
> To: dev@dpdk.org
> CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
> Subject: [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF
>  code
> X-Mailer: git-send-email 1.7.0.7
> 
> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
> 
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
>  As an example to:
> - packet filtering/tracing (aka tcpdump)
> - packet classification
> - statistics collection
> - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>   internal PMD SW state, etc.
> - Comeup with your own idea
> 
> All of that in a dynamic, user-defined and extensible manner.
> 
> So these series introduce new library - librte_bpf.
> librte_bpf provides API to load and execute BPF bytecode within
> user-space dpdk app.
> It supports basic set of features from eBPF spec.
> Also it introduces basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).
> 
> How to try it:
> ===============
> 
> 1) run testpmd as usual and start your favorite forwarding case.
> 2) build bpf program you'd like to load
> (you'll need clang v3.7 or above):
> $ cd test/bpf
> $ clang -O2 -target bpf -c t1.c
> 
> 3) load bpf program(s):
> testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
> 
> <load-flags>:  [-][J][M]
> J - use JIT generated native code, otherwise BPF interpreter will be used.
> M - assume input parameter is a pointer to rte_mbuf,
>     otherwise assume it is a pointer to first segment's data.
> 
> Few examples:
> 
> # to load (not JITed) dummy.o at TX queue 0, port 0:
> testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
> #to load (and JIT compile) t1.o at RX queue 0, port 1:
> testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
> 
> #to load and JIT t3.o (note that it expects mbuf as an input):
> testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
> 
> 4) observe changed traffic behavior
> Let say with the examples above:
>   - dummy.o  does literally nothing, so no changes should be here,
>     except some possible slowdown.
>  - t1.o - should force to drop all packets that doesn't match:
>    'dst 1.2.3.4 && udp && dst port 5000' filter.
>  - t3.o - should dump to stdout ARP packets.
> 
> 5) unload some or all bpf programs:
> testpmd> bpf-unload tx 0 0
> 
> 6) continue with step 3) or exit
> 
> Not currently supported features:
> =================================
> - cBPF
> - tail-pointer call
> - eBPF MAP
> - JIT for non X86_64 targets

May be for next release, we are planning to add arm64 JIT support.
Just wondering, How do you test all EBPF opcodes in JIT/Interpreter mode?
Are you planning to add any UT like linux kernel in dpdk ?  or it was
similar to https://github.com/iovisor/ubpf/tree/master/tests ?

Just asking because, when we introduce arm64 JIT support similar
test cases should be required to verify the implementation.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code
  2018-04-09  4:54     ` Jerin Jacob
@ 2018-04-09 11:10       ` Ananyev, Konstantin
  0 siblings, 0 replies; 83+ messages in thread
From: Ananyev, Konstantin @ 2018-04-09 11:10 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev


Hi Jerin,

> >
> > BPF is used quite intensively inside Linux (and BSD) kernels
> > for various different purposes and proved to be extremely useful.
> >
> > BPF inside DPDK might also be used in a lot of places
> > for a lot of similar things.
> >  As an example to:
> > - packet filtering/tracing (aka tcpdump)
> > - packet classification
> > - statistics collection
> > - HW/PMD live-system debugging/prototyping - trace HW descriptors,
> >   internal PMD SW state, etc.
> > - Comeup with your own idea
> >
> > All of that in a dynamic, user-defined and extensible manner.
> >
> > So these series introduce new library - librte_bpf.
> > librte_bpf provides API to load and execute BPF bytecode within
> > user-space dpdk app.
> > It supports basic set of features from eBPF spec.
> > Also it introduces basic framework to load/unload BPF-based filters
> > on eth devices (right now via SW RX/TX callbacks).
> >
> > How to try it:
> > ===============
> >
> > 1) run testpmd as usual and start your favorite forwarding case.
> > 2) build bpf program you'd like to load
> > (you'll need clang v3.7 or above):
> > $ cd test/bpf
> > $ clang -O2 -target bpf -c t1.c
> >
> > 3) load bpf program(s):
> > testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
> >
> > <load-flags>:  [-][J][M]
> > J - use JIT generated native code, otherwise BPF interpreter will be used.
> > M - assume input parameter is a pointer to rte_mbuf,
> >     otherwise assume it is a pointer to first segment's data.
> >
> > Few examples:
> >
> > # to load (not JITed) dummy.o at TX queue 0, port 0:
> > testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
> > #to load (and JIT compile) t1.o at RX queue 0, port 1:
> > testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
> >
> > #to load and JIT t3.o (note that it expects mbuf as an input):
> > testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
> >
> > 4) observe changed traffic behavior
> > Let say with the examples above:
> >   - dummy.o  does literally nothing, so no changes should be here,
> >     except some possible slowdown.
> >  - t1.o - should force to drop all packets that doesn't match:
> >    'dst 1.2.3.4 && udp && dst port 5000' filter.
> >  - t3.o - should dump to stdout ARP packets.
> >
> > 5) unload some or all bpf programs:
> > testpmd> bpf-unload tx 0 0
> >
> > 6) continue with step 3) or exit
> >
> > Not currently supported features:
> > =================================
> > - cBPF
> > - tail-pointer call
> > - eBPF MAP
> > - JIT for non X86_64 targets
> 
> May be for next release, we are planning to add arm64 JIT support.

Sounds great :)

> Just wondering, How do you test all EBPF opcodes in JIT/Interpreter mode?
> Are you planning to add any UT like linux kernel in dpdk ?  or it was
> similar to https://github.com/iovisor/ubpf/tree/master/tests ?

I added UT for it in v3:
http://dpdk.org/dev/patchwork/patch/37456/
But it doesn't cover the whole ISA yet.
In fact - that's what I am working on right now - adding more test-cases to it,
so hopefully by the next release we will have much better test coverage.
Another thing I plan to add - harden validate() to catch more cases of
invalid code. 

Konstantin

> 
> Just asking because, when we introduce arm64 JIT support similar
> test cases should be required to verify the implementation.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-16 21:25       ` Thomas Monjalon
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (9 subsequent siblings)
  10 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

BPF is used quite intensively inside Linux (and BSD) kernels
for various different purposes and proved to be extremely useful.

BPF inside DPDK might also be used in a lot of places
for a lot of similar things.
 As an example to:
- packet filtering/tracing (aka tcpdump)
- packet classification
- statistics collection
- HW/PMD live-system debugging/prototyping - trace HW descriptors,
  internal PMD SW state, etc.
- Come up with your own idea

All of that in a dynamic, user-defined and extensible manner.

So this series introduces a new library - librte_bpf.
librte_bpf provides an API to load and execute BPF bytecode within
a user-space DPDK app.
It supports a basic set of features from the eBPF spec.
It also introduces a basic framework to load/unload BPF-based filters
on eth devices (right now via SW RX/TX callbacks).

How to try it:
===============

1) run testpmd as usual and start your favorite forwarding case.
2) build bpf program you'd like to load
(you'll need clang v3.7 or above):
$ cd test/bpf
$ clang -O2 -target bpf -c t1.c
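
For reference, a minimal filter of that kind could look like the sketch
below (a hypothetical example, not the actual t1.c; it assumes the plain
'-' load mode, where the input argument points to the first segment's data,
and that a zero return value means "drop the packet"):

/* hypothetical sample: keep only ARP frames (EtherType 0x0806) */
#include <stdint.h>

uint64_t
entry(void *pkt)
{
	const uint8_t *data = pkt;
	/* EtherType lives at bytes 12-13 of the Ethernet header */
	uint16_t eth_type = ((uint16_t)data[12] << 8) | data[13];

	/* assumption: non-zero - keep the packet, zero - drop it */
	return eth_type == 0x0806 ? 1 : 0;
}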

3) load bpf program(s):
testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>

<load-flags>:  [-][J][M]
J - use JIT generated native code, otherwise BPF interpreter will be used.
M - assume input parameter is a pointer to rte_mbuf,
    otherwise assume it is a pointer to first segment's data.

Few examples:

# to load (not JITed) dummy.o at TX queue 0, port 0:
testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o

# to load (and JIT compile) t1.o at RX queue 0, port 1:
testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o

# to load and JIT t3.o (note that it expects mbuf as an input):
testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o

4) observe changed traffic behavior
Let's say with the examples above:
 - dummy.o - does literally nothing, so no changes are expected here,
   except some possible slowdown.
 - t1.o - should drop all packets that don't match the
   'dst 1.2.3.4 && udp && dst port 5000' filter.
 - t3.o - should dump ARP packets to stdout.

5) unload some or all bpf programs:
testpmd> bpf-unload tx 0 0

6) continue with step 3) or exit

Not currently supported features:
=================================
- cBPF
- tail-pointer call
- eBPF MAP
- JIT for non X86_64 targets
- skb
- function calls for 32-bit apps

v2:
 - add meson build
 - add freebsd build
 - use new logging API
 - using rte_malloc() for cbi allocation
 - add extra logic into bpf_validate()

v3:
 - add new test-case for it
 - update docs
 - update MAINTAINERS

v4:
 - add more tests to cover BPF ISA
 - fix a few issues

Konstantin Ananyev (10):
  net: move BPF related definitions into librte_net
  bpf: add BPF loading and execution framework
  bpf: add more logic into bpf_validate()
  bpf: add JIT compilation for x86_64 ISA
  bpf: introduce basic RX/TX BPF filters
  testpmd: new commands to load/unload BPF filters
  test: add few eBPF samples
  test: introduce functional test for librte_bpf
  doc: add librte_bpf related info
  MAINTAINERS: add librte_bpf related info

 MAINTAINERS                        |    4 +
 app/test-pmd/bpf_sup.h             |   25 +
 app/test-pmd/cmdline.c             |  146 +++
 app/test-pmd/meson.build           |    2 +-
 config/common_base                 |    5 +
 doc/api/doxy-api-index.md          |    3 +-
 doc/api/doxy-api.conf              |    1 +
 doc/guides/prog_guide/bpf_lib.rst  |   38 +
 doc/guides/prog_guide/index.rst    |    1 +
 drivers/net/tap/tap_bpf.h          |   80 +-
 lib/Makefile                       |    2 +
 lib/librte_bpf/Makefile            |   35 +
 lib/librte_bpf/bpf.c               |   64 ++
 lib/librte_bpf/bpf_exec.c          |  452 ++++++++++
 lib/librte_bpf/bpf_impl.h          |   41 +
 lib/librte_bpf/bpf_jit_x86.c       | 1368 ++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_load.c          |  386 ++++++++
 lib/librte_bpf/bpf_pkt.c           |  607 +++++++++++++
 lib/librte_bpf/bpf_validate.c      | 1175 ++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   24 +
 lib/librte_bpf/rte_bpf.h           |  170 ++++
 lib/librte_bpf/rte_bpf_ethdev.h    |  102 +++
 lib/librte_bpf/rte_bpf_version.map |   16 +
 lib/librte_net/Makefile            |    1 +
 lib/librte_net/bpf_def.h           |  370 ++++++++
 lib/librte_net/meson.build         |    3 +-
 lib/meson.build                    |    2 +-
 mk/rte.app.mk                      |    2 +
 test/bpf/dummy.c                   |   20 +
 test/bpf/mbuf.h                    |  578 ++++++++++++
 test/bpf/t1.c                      |   52 ++
 test/bpf/t2.c                      |   31 +
 test/bpf/t3.c                      |   36 +
 test/test/Makefile                 |    2 +
 test/test/meson.build              |    2 +
 test/test/test_bpf.c               | 1726 ++++++++++++++++++++++++++++++++++++
 36 files changed, 7489 insertions(+), 83 deletions(-)
 create mode 100644 app/test-pmd/bpf_sup.h
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map
 create mode 100644 lib/librte_net/bpf_def.h
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c
 create mode 100644 test/test/test_bpf.c

-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code Konstantin Ananyev
                         ` (8 more replies)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 02/10] bpf: add BPF loading and execution framework Konstantin Ananyev
                       ` (8 subsequent siblings)
  10 siblings, 9 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, olivier.matz, pascal.mazon

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 drivers/net/tap/tap_bpf.h  |  80 +---------
 lib/librte_net/Makefile    |   1 +
 lib/librte_net/bpf_def.h   | 370 +++++++++++++++++++++++++++++++++++++++++++++
 lib/librte_net/meson.build |   3 +-
 4 files changed, 374 insertions(+), 80 deletions(-)
 create mode 100644 lib/librte_net/bpf_def.h

diff --git a/drivers/net/tap/tap_bpf.h b/drivers/net/tap/tap_bpf.h
index 1a70ffe21..baaf3b25c 100644
--- a/drivers/net/tap/tap_bpf.h
+++ b/drivers/net/tap/tap_bpf.h
@@ -6,85 +6,7 @@
 #define __TAP_BPF_H__
 
 #include <tap_autoconf.h>
-
-/* Do not #include <linux/bpf.h> since eBPF must compile on different
- * distros which may include partial definitions for eBPF (while the
- * kernel itself may support eBPF). Instead define here all that is needed
- */
-
-/* BPF_MAP_UPDATE_ELEM command flags */
-#define	BPF_ANY	0 /* create a new element or update an existing */
-
-/* BPF architecture instruction struct */
-struct bpf_insn {
-	__u8	code;
-	__u8	dst_reg:4;
-	__u8	src_reg:4;
-	__s16	off;
-	__s32	imm; /* immediate value */
-};
-
-/* BPF program types */
-enum bpf_prog_type {
-	BPF_PROG_TYPE_UNSPEC,
-	BPF_PROG_TYPE_SOCKET_FILTER,
-	BPF_PROG_TYPE_KPROBE,
-	BPF_PROG_TYPE_SCHED_CLS,
-	BPF_PROG_TYPE_SCHED_ACT,
-};
-
-/* BPF commands types */
-enum bpf_cmd {
-	BPF_MAP_CREATE,
-	BPF_MAP_LOOKUP_ELEM,
-	BPF_MAP_UPDATE_ELEM,
-	BPF_MAP_DELETE_ELEM,
-	BPF_MAP_GET_NEXT_KEY,
-	BPF_PROG_LOAD,
-};
-
-/* BPF maps types */
-enum bpf_map_type {
-	BPF_MAP_TYPE_UNSPEC,
-	BPF_MAP_TYPE_HASH,
-};
-
-/* union of anonymous structs used with TAP BPF commands */
-union bpf_attr {
-	/* BPF_MAP_CREATE command */
-	struct {
-		__u32	map_type;
-		__u32	key_size;
-		__u32	value_size;
-		__u32	max_entries;
-		__u32	map_flags;
-		__u32	inner_map_fd;
-	};
-
-	/* BPF_MAP_UPDATE_ELEM, BPF_MAP_DELETE_ELEM commands */
-	struct {
-		__u32		map_fd;
-		__aligned_u64	key;
-		union {
-			__aligned_u64 value;
-			__aligned_u64 next_key;
-		};
-		__u64		flags;
-	};
-
-	/* BPF_PROG_LOAD command */
-	struct {
-		__u32		prog_type;
-		__u32		insn_cnt;
-		__aligned_u64	insns;
-		__aligned_u64	license;
-		__u32		log_level;
-		__u32		log_size;
-		__aligned_u64	log_buf;
-		__u32		kern_version;
-		__u32		prog_flags;
-	};
-} __attribute__((aligned(8)));
+#include <bpf_def.h>
 
 #ifndef __NR_bpf
 # if defined(__i386__)
diff --git a/lib/librte_net/Makefile b/lib/librte_net/Makefile
index 95ff54900..52bb418b8 100644
--- a/lib/librte_net/Makefile
+++ b/lib/librte_net/Makefile
@@ -20,5 +20,6 @@ SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include := rte_ip.h rte_tcp.h rte_udp.h rte_esp
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_sctp.h rte_icmp.h rte_arp.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_ether.h rte_gre.h rte_net.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += rte_net_crc.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_NET)-include += bpf_def.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_net/bpf_def.h b/lib/librte_net/bpf_def.h
new file mode 100644
index 000000000..3f4a5a3e7
--- /dev/null
+++ b/lib/librte_net/bpf_def.h
@@ -0,0 +1,370 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd.
+ */
+
+#ifndef _RTE_BPF_DEF_H_
+#define _RTE_BPF_DEF_H_
+
+#ifdef __linux__
+#include <linux/types.h>
+#else
+
+typedef uint8_t __u8;
+typedef uint16_t __u16;
+typedef uint32_t __u32;
+typedef uint64_t __u64;
+
+typedef int8_t __s8;
+typedef int16_t __s16;
+typedef int32_t __s32;
+
+#define __aligned_u64 __u64 __attribute__((aligned(8)))
+
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Do not #include <linux/bpf.h> since eBPF must compile on different
+ * distros which may include partial definitions for eBPF (while the
+ * kernel itself may support eBPF). Instead define here all that is needed
+ * by various DPDK components.
+ */
+
+/* Instruction classes */
+#define BPF_CLASS(code) ((code) & 0x07)
+#define		BPF_LD		0x00
+#define		BPF_LDX		0x01
+#define		BPF_ST		0x02
+#define		BPF_STX		0x03
+#define		BPF_ALU		0x04
+#define		BPF_JMP		0x05
+#define		BPF_RET		0x06
+#define		BPF_MISC        0x07
+
+/* ld/ldx fields */
+#define BPF_SIZE(code)  ((code) & 0x18)
+#define		BPF_W		0x00
+#define		BPF_H		0x08
+#define		BPF_B		0x10
+#define BPF_MODE(code)  ((code) & 0xe0)
+#define		BPF_IMM		0x00
+#define		BPF_ABS		0x20
+#define		BPF_IND		0x40
+#define		BPF_MEM		0x60
+#define		BPF_LEN		0x80
+#define		BPF_MSH		0xa0
+
+/* alu/jmp fields */
+#define BPF_OP(code)    ((code) & 0xf0)
+#define		BPF_ADD		0x00
+#define		BPF_SUB		0x10
+#define		BPF_MUL		0x20
+#define		BPF_DIV		0x30
+#define		BPF_OR		0x40
+#define		BPF_AND		0x50
+#define		BPF_LSH		0x60
+#define		BPF_RSH		0x70
+#define		BPF_NEG		0x80
+#define		BPF_MOD		0x90
+#define		BPF_XOR		0xa0
+
+#define		BPF_JA		0x00
+#define		BPF_JEQ		0x10
+#define		BPF_JGT		0x20
+#define		BPF_JGE		0x30
+#define		BPF_JSET        0x40
+#define BPF_SRC(code)   ((code) & 0x08)
+#define		BPF_K		0x00
+#define		BPF_X		0x08
+
+#ifndef BPF_MAXINSNS
+#define BPF_MAXINSNS 4096
+#endif
+
+/* Extended instruction set based on top of classic BPF */
+
+/* instruction classes */
+#define BPF_ALU64	0x07	/* alu mode in double word width */
+
+/* ld/ldx fields */
+#define BPF_DW		0x18	/* double word */
+#define BPF_XADD	0xc0	/* exclusive add */
+
+/* alu/jmp fields */
+#define BPF_MOV		0xb0	/* mov reg to reg */
+#define BPF_ARSH	0xc0	/* sign extending arithmetic shift right */
+
+/* change endianness of a register */
+#define BPF_END		0xd0	/* flags for endianness conversion: */
+#define BPF_TO_LE	0x00	/* convert to little-endian */
+#define BPF_TO_BE	0x08	/* convert to big-endian */
+#define BPF_FROM_LE	BPF_TO_LE
+#define BPF_FROM_BE	BPF_TO_BE
+
+/* jmp encodings */
+#define BPF_JNE		0x50	/* jump != */
+#define BPF_JLT		0xa0	/* LT is unsigned, '<' */
+#define BPF_JLE		0xb0	/* LE is unsigned, '<=' */
+#define BPF_JSGT	0x60	/* SGT is signed '>', GT in x86 */
+#define BPF_JSGE	0x70	/* SGE is signed '>=', GE in x86 */
+#define BPF_JSLT	0xc0	/* SLT is signed, '<' */
+#define BPF_JSLE	0xd0	/* SLE is signed, '<=' */
+#define BPF_CALL	0x80	/* function call */
+#define BPF_EXIT	0x90	/* function return */
+
+/* Register numbers */
+enum {
+	BPF_REG_0 = 0,
+	BPF_REG_1,
+	BPF_REG_2,
+	BPF_REG_3,
+	BPF_REG_4,
+	BPF_REG_5,
+	BPF_REG_6,
+	BPF_REG_7,
+	BPF_REG_8,
+	BPF_REG_9,
+	BPF_REG_10,
+	__MAX_BPF_REG,
+};
+
+/* BPF has 10 general purpose 64-bit registers and stack frame. */
+#define MAX_BPF_REG	__MAX_BPF_REG
+
+struct bpf_insn {
+	__u8	code;		/* opcode */
+	__u8	dst_reg:4;	/* dest register */
+	__u8	src_reg:4;	/* source register */
+	__s16	off;		/* signed offset */
+	__s32	imm;		/* signed immediate constant */
+};
+
+/* BPF syscall commands, see bpf(2) man-page for details. */
+enum bpf_cmd {
+	BPF_MAP_CREATE,
+	BPF_MAP_LOOKUP_ELEM,
+	BPF_MAP_UPDATE_ELEM,
+	BPF_MAP_DELETE_ELEM,
+	BPF_MAP_GET_NEXT_KEY,
+	BPF_PROG_LOAD,
+	BPF_OBJ_PIN,
+	BPF_OBJ_GET,
+	BPF_PROG_ATTACH,
+	BPF_PROG_DETACH,
+	BPF_PROG_TEST_RUN,
+	BPF_PROG_GET_NEXT_ID,
+	BPF_MAP_GET_NEXT_ID,
+	BPF_PROG_GET_FD_BY_ID,
+	BPF_MAP_GET_FD_BY_ID,
+	BPF_OBJ_GET_INFO_BY_FD,
+};
+
+enum bpf_map_type {
+	BPF_MAP_TYPE_UNSPEC,
+	BPF_MAP_TYPE_HASH,
+	BPF_MAP_TYPE_ARRAY,
+	BPF_MAP_TYPE_PROG_ARRAY,
+	BPF_MAP_TYPE_PERF_EVENT_ARRAY,
+	BPF_MAP_TYPE_PERCPU_HASH,
+	BPF_MAP_TYPE_PERCPU_ARRAY,
+	BPF_MAP_TYPE_STACK_TRACE,
+	BPF_MAP_TYPE_CGROUP_ARRAY,
+	BPF_MAP_TYPE_LRU_HASH,
+	BPF_MAP_TYPE_LRU_PERCPU_HASH,
+	BPF_MAP_TYPE_LPM_TRIE,
+	BPF_MAP_TYPE_ARRAY_OF_MAPS,
+	BPF_MAP_TYPE_HASH_OF_MAPS,
+	BPF_MAP_TYPE_DEVMAP,
+	BPF_MAP_TYPE_SOCKMAP,
+};
+
+enum bpf_prog_type {
+	BPF_PROG_TYPE_UNSPEC,
+	BPF_PROG_TYPE_SOCKET_FILTER,
+	BPF_PROG_TYPE_KPROBE,
+	BPF_PROG_TYPE_SCHED_CLS,
+	BPF_PROG_TYPE_SCHED_ACT,
+	BPF_PROG_TYPE_TRACEPOINT,
+	BPF_PROG_TYPE_XDP,
+	BPF_PROG_TYPE_PERF_EVENT,
+	BPF_PROG_TYPE_CGROUP_SKB,
+	BPF_PROG_TYPE_CGROUP_SOCK,
+	BPF_PROG_TYPE_LWT_IN,
+	BPF_PROG_TYPE_LWT_OUT,
+	BPF_PROG_TYPE_LWT_XMIT,
+	BPF_PROG_TYPE_SOCK_OPS,
+	BPF_PROG_TYPE_SK_SKB,
+};
+
+enum bpf_attach_type {
+	BPF_CGROUP_INET_INGRESS,
+	BPF_CGROUP_INET_EGRESS,
+	BPF_CGROUP_INET_SOCK_CREATE,
+	BPF_CGROUP_SOCK_OPS,
+	BPF_SK_SKB_STREAM_PARSER,
+	BPF_SK_SKB_STREAM_VERDICT,
+	__MAX_BPF_ATTACH_TYPE
+};
+
+#define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
+
+/* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
+ * to the given target_fd cgroup the descendent cgroup will be able to
+ * override effective bpf program that was inherited from this cgroup
+ */
+#define BPF_F_ALLOW_OVERRIDE	(1U << 0)
+
+/* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
+ * verifier will perform strict alignment checking as if the kernel
+ * has been built with CONFIG_EFFICIENT_UNALIGNED_ACCESS not set,
+ * and NET_IP_ALIGN defined to 2.
+ */
+#define BPF_F_STRICT_ALIGNMENT	(1U << 0)
+
+#define BPF_PSEUDO_MAP_FD	1
+
+/* flags for BPF_MAP_UPDATE_ELEM command */
+#define BPF_ANY		0 /* create new element or update existing */
+#define BPF_NOEXIST	1 /* create new element if it didn't exist */
+#define BPF_EXIST	2 /* update existing element */
+
+/* flags for BPF_MAP_CREATE command */
+#define BPF_F_NO_PREALLOC	(1U << 0)
+/* Instead of having one common LRU list in the
+ * BPF_MAP_TYPE_LRU_[PERCPU_]HASH map, use a percpu LRU list
+ * which can scale and perform better.
+ * Note, the LRU nodes (including free nodes) cannot be moved
+ * across different LRU lists.
+ */
+#define BPF_F_NO_COMMON_LRU	(1U << 1)
+/* Specify numa node during map creation */
+#define BPF_F_NUMA_NODE		(1U << 2)
+
+union bpf_attr {
+	struct { /* anonymous struct used by BPF_MAP_CREATE command */
+		__u32	map_type;	/* one of enum bpf_map_type */
+		__u32	key_size;	/* size of key in bytes */
+		__u32	value_size;	/* size of value in bytes */
+		__u32	max_entries;	/* max number of entries in a map */
+		__u32	map_flags;	/* BPF_MAP_CREATE related
+					 * flags defined above.
+					 */
+		__u32	inner_map_fd;	/* fd pointing to the inner map */
+		__u32	numa_node;	/* numa node (effective only if
+					 * BPF_F_NUMA_NODE is set).
+					 */
+	};
+
+	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
+		__u32		map_fd;
+		__aligned_u64	key;
+		union {
+			__aligned_u64 value;
+			__aligned_u64 next_key;
+		};
+		__u64		flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_LOAD command */
+		__u32		prog_type;	/* one of enum bpf_prog_type */
+		__u32		insn_cnt;
+		__aligned_u64	insns;
+		__aligned_u64	license;
+		__u32		log_level;
+		/* verbosity level of verifier */
+		__u32		log_size;	/* size of user buffer */
+		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		kern_version;
+		/* checked when prog_type=kprobe */
+		__u32		prog_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_* commands */
+		__aligned_u64	pathname;
+		__u32		bpf_fd;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
+		__u32		target_fd;
+		/* container object to attach to */
+		__u32		attach_bpf_fd;	/* eBPF program to attach */
+		__u32		attach_type;
+		__u32		attach_flags;
+	};
+
+	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command */
+		__u32		prog_fd;
+		__u32		retval;
+		__u32		data_size_in;
+		__u32		data_size_out;
+		__aligned_u64	data_in;
+		__aligned_u64	data_out;
+		__u32		repeat;
+		__u32		duration;
+	} test;
+
+	struct { /* anonymous struct used by BPF_*_GET_*_ID */
+		union {
+			__u32		start_id;
+			__u32		prog_id;
+			__u32		map_id;
+		};
+		__u32		next_id;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+		__u32		bpf_fd;
+		__u32		info_len;
+		__aligned_u64	info;
+	} info;
+} __attribute__((aligned(8)));
+
+/* Generic BPF return codes which all BPF program types may support.
+ * The values are binary compatible with their TC_ACT_* counter-part to
+ * provide backwards compatibility with existing SCHED_CLS and SCHED_ACT
+ * programs.
+ *
+ * XDP is handled separately, see XDP_*.
+ */
+enum bpf_ret_code {
+	BPF_OK = 0,
+	/* 1 reserved */
+	BPF_DROP = 2,
+	/* 3-6 reserved */
+	BPF_REDIRECT = 7,
+	/* >127 are reserved for prog type specific return codes */
+};
+
+enum sk_action {
+	SK_DROP = 0,
+	SK_PASS,
+};
+
+#define BPF_TAG_SIZE	8
+
+struct bpf_prog_info {
+	__u32 type;
+	__u32 id;
+	__u8  tag[BPF_TAG_SIZE];
+	__u32 jited_prog_len;
+	__u32 xlated_prog_len;
+	__aligned_u64 jited_prog_insns;
+	__aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+	__u32 type;
+	__u32 id;
+	__u32 key_size;
+	__u32 value_size;
+	__u32 max_entries;
+	__u32 map_flags;
+} __attribute__((aligned(8)));
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_DEF_H_ */
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 78c0f03e5..3acc1602a 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -12,7 +12,8 @@ headers = files('rte_ip.h',
 	'rte_ether.h',
 	'rte_gre.h',
 	'rte_net.h',
-	'rte_net_crc.h')
+	'rte_net_crc.h',
+	'bpf_def.h')
 
 sources = files('rte_arp.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 02/10] bpf: add BPF loading and execution framework
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 03/10] bpf: add more logic into bpf_validate() Konstantin Ananyev
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb
 - function calls for 32-bit apps

It also adds a dependency on libelf.
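
As a quick illustration, a minimal usage sketch of the new API could look
like the snippet below (assumptions: the program sits in the ".text"
section of the object file, takes an mbuf pointer as input and needs no
external symbols; error handling is trimmed for brevity):

#include <rte_bpf.h>

/* one-shot run of a BPF filter loaded from an ELF object */
static uint64_t
run_bpf_once(struct rte_mbuf *mb, const char *fname)
{
	uint64_t rc;
	struct rte_bpf *bpf;
	const struct rte_bpf_prm prm = {
		.xsym = NULL,
		.nb_xsym = 0,
		.prog_type = RTE_BPF_PROG_TYPE_MBUF,
	};

	bpf = rte_bpf_elf_load(&prm, fname, ".text");
	if (bpf == NULL)
		return 0;	/* rte_errno holds the error code */

	rc = rte_bpf_exec(bpf, mb);
	rte_bpf_destroy(bpf);
	return rc;
}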

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  59 +++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  41 ++++
 lib/librte_bpf/bpf_load.c          | 386 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/meson.build         |  18 ++
 lib/librte_bpf/rte_bpf.h           | 170 ++++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 lib/meson.build                    |   2 +-
 mk/rte.app.mk                      |   2 +
 13 files changed, 1233 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index c09c7cf88..d68c2e211 100644
--- a/config/common_base
+++ b/config/common_base
@@ -821,3 +821,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..e0f434e77
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..d7f68c017
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+int rte_bpf_logtype;
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+	if (rc != 0)
+		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
+
+RTE_INIT(rte_bpf_init_log);
+
+static void
+rte_bpf_init_log(void)
+{
+	rte_bpf_logtype = rte_log_register("lib.bpf");
+	if (rte_bpf_logtype >= 0)
+		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
+}
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..0382ade98
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,452 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define BPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define BPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_BPF_LOG(ERR, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG])
+{
+	const struct bpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			BPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			BPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3],
+				reg[BPF_REG_4], reg[BPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			return reg[BPF_REG_0];
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[MAX_BPF_REG];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[BPF_REG_1] = (uintptr_t)ctx[i];
+		reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..5d7e65c31
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+extern int rte_bpf_logtype;
+
+#define	RTE_BPF_LOG(lvl, fmt, args...) \
+	rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..3c7279a6c
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,386 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+/* To overcome compatibility issue */
+#ifndef EM_BPF
+#define	EM_BPF	247
+#endif
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | BPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm =
+			(uint64_t)(uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr * eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct bpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_BPF_LOG(ERR,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct bpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..1911e1381
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct bpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == BPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
new file mode 100644
index 000000000..05c48c7ff
--- /dev/null
+++ b/lib/librte_bpf/meson.build
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+sources = files('bpf.c',
+		'bpf_exec.c',
+		'bpf_load.c',
+		'bpf_validate.c')
+
+install_headers = files('rte_bpf.h')
+
+deps += ['mbuf', 'net']
+
+dep = dependency('libelf', required: false)
+if dep.found() == false
+	build = false
+endif
+ext_deps += dep
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..825621404
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,170 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+/**
+ * @file
+ *
+ * RTE BPF support.
+ * librte_bpf provides a framework to load and execute eBPF bytecode
+ * inside user-space dpdk based applications.
+ * It supports basic set of features from eBPF spec
+ * (https://www.kernel.org/doc/Documentation/networking/filter.txt).
+ */
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <bpf_def.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR, /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Possible BPF program types.
+ * Use negative values for DPDK specific prog-types, to make sure they will
+ * not interfere with Linux related ones.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF = INT32_MIN,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *); /**< JIT-ed native code */
+	size_t sz;                /**< size of JIT-ed code */
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname for an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index ef6159170..7ff7aaaa5 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 258590819..405a13147 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 03/10] bpf: add more logic into bpf_validate()
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (2 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 02/10] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 04/10] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add checks for:
 - all instructions are valid ones
   (known opcodes, correct syntax, valid reg/off/imm values, etc.)
 - no unreachable instructions
 - no loops
 - basic stack boundaries checks
 - division by zero

Still need to add checks for:
 - use/return only initialized registers and stack data.
 - memory boundaries violation
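
For illustration, a hand-crafted program like the sketch below (a
hypothetical example, not part of the test suite) should now be rejected
by rte_bpf_load() at validation time - the jump back to instruction 0 is a
back-edge (i.e. a loop), and the exit instruction becomes unreachable -
instead of being accepted and looping forever at run time:

static const struct bpf_insn loop_prog[] = {
	/* 0: r0 = 1 */
	{ .code = (BPF_ALU64 | BPF_MOV | BPF_K), .dst_reg = BPF_REG_0, .imm = 1 },
	/* 1: goto 0 - a back-edge, i.e. an infinite loop */
	{ .code = (BPF_JMP | BPF_JA), .off = -2 },
	/* 2: unreachable exit */
	{ .code = (BPF_JMP | BPF_EXIT), },
};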

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/bpf_validate.c | 1172 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 1146 insertions(+), 26 deletions(-)

diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
index 1911e1381..7d67be7b4 100644
--- a/lib/librte_bpf/bpf_validate.c
+++ b/lib/librte_bpf/bpf_validate.c
@@ -14,42 +14,1162 @@
 
 #include "bpf_impl.h"
 
+/* possible instruction node colour */
+enum {
+	WHITE,
+	GREY,
+	BLACK,
+	MAX_NODE_COLOUR
+};
+
+/* possible edge types */
+enum {
+	UNKNOWN_EDGE,
+	TREE_EDGE,
+	BACK_EDGE,
+	CROSS_EDGE,
+	MAX_EDGE_TYPE
+};
+
+struct bpf_reg_state {
+	uint64_t val;
+};
+
+struct bpf_eval_state {
+	struct bpf_reg_state rs[MAX_BPF_REG];
+};
+
+#define	MAX_EDGES	2
+
+struct inst_node {
+	uint8_t colour;
+	uint8_t nb_edge:4;
+	uint8_t cur_edge:4;
+	uint8_t edge_type[MAX_EDGES];
+	uint32_t edge_dest[MAX_EDGES];
+	uint32_t prev_node;
+	struct bpf_eval_state *evst;
+};
+
+struct bpf_verifier {
+	const struct rte_bpf_prm *prm;
+	struct inst_node *in;
+	int32_t stack_sz;
+	uint32_t nb_nodes;
+	uint32_t nb_jcc_nodes;
+	uint32_t node_colour[MAX_NODE_COLOUR];
+	uint32_t edge_type[MAX_EDGE_TYPE];
+	struct bpf_eval_state *evst;
+	struct {
+		uint32_t num;
+		uint32_t cur;
+		struct bpf_eval_state *ent;
+	} evst_pool;
+};
+
+struct bpf_ins_check {
+	struct {
+		uint16_t dreg;
+		uint16_t sreg;
+	} mask;
+	struct {
+		uint16_t min;
+		uint16_t max;
+	} off;
+	struct {
+		uint32_t min;
+		uint32_t max;
+	} imm;
+	const char * (*check)(const struct bpf_insn *);
+	const char * (*eval)(struct bpf_verifier *, const struct bpf_insn *);
+};
+
+#define	ALL_REGS	RTE_LEN2MASK(MAX_BPF_REG, uint16_t)
+#define	WRT_REGS	RTE_LEN2MASK(BPF_REG_10, uint16_t)
+#define	ZERO_REG	RTE_LEN2MASK(BPF_REG_1, uint16_t)
+
 /*
- * dummy one for now, need more work.
+ * check and evaluate functions for particular instruction types.
  */
-int
-bpf_validate(struct rte_bpf *bpf)
+
+static const char *
+check_alu_bele(const struct bpf_insn *ins)
+{
+	if (ins->imm != 16 && ins->imm != 32 && ins->imm != 64)
+		return "invalid imm field";
+	return NULL;
+}
+
+static const char *
+eval_stack(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	int32_t ofs;
+
+	ofs = ins->off;
+
+	if (ofs >= 0 || ofs < -MAX_BPF_STACK_SIZE)
+		return "stack boundary violation";
+
+	ofs = -ofs;
+	bvf->stack_sz = RTE_MAX(bvf->stack_sz, ofs);
+	return NULL;
+}
+
+static const char *
+eval_store(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	if (ins->dst_reg == BPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_load(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	if (ins->src_reg == BPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_call(struct bpf_verifier *bvf, const struct bpf_insn *ins)
+{
+	uint32_t idx;
+
+	idx = ins->imm;
+
+	if (idx >= bvf->prm->nb_xsym ||
+			bvf->prm->xsym[idx].type != RTE_BPF_XTYPE_FUNC)
+		return "invalid external function index";
+
+	/* for now don't support function calls on 32 bit platform */
+	if (sizeof(uint64_t) != sizeof(uintptr_t))
+		return "function calls are supported only for 64 bit apps";
+	return NULL;
+}
+
+/*
+ * validate parameters for each instruction type.
+ */
+static const struct bpf_ins_check ins_chk[UINT8_MAX] = {
+	/* ALU IMM 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU IMM 64-bit instructions */
+	[(BPF_ALU64 | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_ARSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU64 | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU64 | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU REG 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_END | BPF_TO_BE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	[(BPF_ALU | BPF_END | BPF_TO_LE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	/* ALU REG 64-bit instructions */
+	[(BPF_ALU64 | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_ARSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU64 | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* load instructions */
+	[(BPF_LDX | BPF_MEM | BPF_B)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_H)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_W)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_DW)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	/* load 64 bit immediate value */
+	[(BPF_LD | BPF_IMM | BPF_DW)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* store REG instructions */
+	[(BPF_STX | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* atomic add instructions */
+	[(BPF_STX | BPF_XADD | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_XADD | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* store IMM instructions */
+	[(BPF_ST | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	/* jump instruction */
+	[(BPF_JMP | BPF_JA)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* jcc IMM instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JNE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* jcc REG instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JNE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* call instruction */
+	[(BPF_JMP | BPF_CALL)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_call,
+	},
+	/* ret instruction */
+	[(BPF_JMP | BPF_EXIT)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+};
+
+/*
+ * make sure that instruction syntax is valid,
+ * and its fields don't violate the particular instruction type restrictions.
+ */
+static const char *
+check_syntax(const struct bpf_insn *ins)
+{
+
+	uint8_t op;
+	uint16_t off;
+	uint32_t imm;
+
+	op = ins->code;
+
+	if (ins_chk[op].mask.dreg == 0)
+		return "invalid opcode";
+
+	if ((ins_chk[op].mask.dreg & 1 << ins->dst_reg) == 0)
+		return "invalid dst-reg field";
+
+	if ((ins_chk[op].mask.sreg & 1 << ins->src_reg) == 0)
+		return "invalid src-reg field";
+
+	off = ins->off;
+	if (ins_chk[op].off.min > off || ins_chk[op].off.max < off)
+		return "invalid off field";
+
+	imm = ins->imm;
+	if (ins_chk[op].imm.min > imm || ins_chk[op].imm.max < imm)
+		return "invalid imm field";
+
+	if (ins_chk[op].check != NULL)
+		return ins_chk[op].check(ins);
+
+	return NULL;
+}
+
+/*
+ * helper function, return instruction index for the given node.
+ */
+static uint32_t
+get_node_idx(const struct bpf_verifier *bvf, const struct inst_node *node)
 {
-	int32_t rc, ofs, stack_sz;
-	uint32_t i, op, dr;
+	return node - bvf->in;
+}
+
+/*
+ * helper function, used to walk through constructed CFG.
+ */
+static struct inst_node *
+get_next_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	uint32_t ce, ne, dst;
+
+	ne = node->nb_edge;
+	ce = node->cur_edge;
+	if (ce == ne)
+		return NULL;
+
+	node->cur_edge++;
+	dst = node->edge_dest[ce];
+	return bvf->in + dst;
+}
+
+static void
+set_node_colour(struct bpf_verifier *bvf, struct inst_node *node,
+	uint32_t new)
+{
+	uint32_t prev;
+
+	prev = node->colour;
+	node->colour = new;
+
+	bvf->node_colour[prev]--;
+	bvf->node_colour[new]++;
+}
+
+/*
+ * helper function, add new edge between two nodes.
+ */
+static int
+add_edge(struct bpf_verifier *bvf, struct inst_node *node, uint32_t nidx)
+{
+	uint32_t ne;
+
+	if (nidx > bvf->prm->nb_ins) {
+		RTE_BPF_LOG(ERR, "%s: program boundary violation at pc: %u, "
+			"next pc: %u\n",
+			__func__, get_node_idx(bvf, node), nidx);
+		return -EINVAL;
+	}
+
+	ne = node->nb_edge;
+	if (ne >= RTE_DIM(node->edge_dest)) {
+		RTE_BPF_LOG(ERR, "%s: internal error at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -EINVAL;
+	}
+
+	node->edge_dest[ne] = nidx;
+	node->nb_edge = ne + 1;
+	return 0;
+}
+
+/*
+ * helper function, determine type of edge between two nodes.
+ */
+static void
+set_edge_type(struct bpf_verifier *bvf, struct inst_node *node,
+	const struct inst_node *next)
+{
+	uint32_t ce, clr, type;
+
+	ce = node->cur_edge - 1;
+	clr = next->colour;
+
+	type = UNKNOWN_EDGE;
+
+	if (clr == WHITE)
+		type = TREE_EDGE;
+	else if (clr == GREY)
+		type = BACK_EDGE;
+	else if (clr == BLACK)
+		/*
+		 * in fact it could be either direct or cross edge,
+		 * but for now, we don't need to distinguish between them.
+		 */
+		type = CROSS_EDGE;
+
+	node->edge_type[ce] = type;
+	bvf->edge_type[type]++;
+}
+
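+/*
+ * helper function, return the node recorded as the parent of the given one.
+ */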
+static struct inst_node *
+get_prev_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	return  bvf->in + node->prev_node;
+}
+
+/*
+ * Depth-First Search (DFS) through previously constructed
+ * Control Flow Graph (CFG).
+ * Information collected during this pass is used later
+ * to determine whether there are any loops and/or unreachable instructions.
+ */
+static void
+dfs(struct bpf_verifier *bvf)
+{
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	while (node != NULL) {
+
+		if (node->colour == WHITE)
+			set_node_colour(bvf, node, GREY);
+
+		if (node->colour == GREY) {
+
+			/* find next unprocessed child node */
+			do {
+				next = get_next_node(bvf, node);
+				if (next == NULL)
+					break;
+				set_edge_type(bvf, node, next);
+			} while (next->colour != WHITE);
+
+			if (next != NULL) {
+				/* proceed with next child */
+				next->prev_node = get_node_idx(bvf, node);
+				node = next;
+			} else {
+				/*
+				 * finished with current node and all its kids,
+				 * proceed with parent
+				 */
+				set_node_colour(bvf, node, BLACK);
+				node->cur_edge = 0;
+				node = get_prev_node(bvf, node);
+			}
+		} else
+			node = NULL;
+	}
+}
+
+/*
+ * report unreachable instructions.
+ */
+static void
+log_unreachable(const struct bpf_verifier *bvf)
+{
+	uint32_t i;
+	struct inst_node *node;
 	const struct bpf_insn *ins;
 
-	rc = 0;
-	stack_sz = 0;
-	for (i = 0; i != bpf->prm.nb_ins; i++) {
-
-		ins = bpf->prm.ins + i;
-		op = ins->code;
-		dr = ins->dst_reg;
-		ofs = ins->off;
-
-		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
-				dr == BPF_REG_10) {
-			ofs -= sizeof(uint64_t);
-			stack_sz = RTE_MIN(ofs, stack_sz);
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		ins = bvf->prm->ins + i;
+
+		if (node->colour == WHITE &&
+				ins->code != (BPF_LD | BPF_IMM | BPF_DW))
+			RTE_BPF_LOG(ERR, "unreachable code at pc: %u;\n", i);
+	}
+}
+
+/*
+ * report loops detected.
+ */
+static void
+log_loop(const struct bpf_verifier *bvf)
+{
+	uint32_t i, j;
+	struct inst_node *node;
+
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		if (node->colour != BLACK)
+			continue;
+
+		for (j = 0; j != node->nb_edge; j++) {
+			if (node->edge_type[j] == BACK_EDGE)
+				RTE_BPF_LOG(ERR,
+					"loop at pc:%u --> pc:%u;\n",
+					i, node->edge_dest[j]);
 		}
 	}
+}
+
+/*
+ * First pass goes through all instructions in the set, checks that each
+ * instruction is a valid one (correct syntax, valid field values, etc.)
+ * and constructs the control flow graph (CFG).
+ * Then a depth-first search is performed over the constructed graph.
+ * Programs with unreachable instructions and/or loops will be rejected.
+ */
+static int
+validate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t i;
+	struct inst_node *node;
+	const struct bpf_insn *ins;
+	const char *err;
+
+	rc = 0;
+	for (i = 0; i < bvf->prm->nb_ins; i++) {
+
+		ins = bvf->prm->ins + i;
+		node = bvf->in + i;
 
-	if (stack_sz != 0) {
-		stack_sz = -stack_sz;
-		if (stack_sz > MAX_BPF_STACK_SIZE)
-			rc = -ERANGE;
-		else
-			bpf->stack_sz = stack_sz;
+		err = check_syntax(ins);
+		if (err != 0) {
+			RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+				__func__, err, i);
+			rc |= -EINVAL;
+		}
+
+		/*
+		 * construct CFG: jcc nodes have two outgoing edges,
+		 * 'exit' nodes - none, all other nodes have exactly one
+		 * outgoing edge.
+		 */
+		switch (ins->code) {
+		case (BPF_JMP | BPF_EXIT):
+			break;
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			rc |= add_edge(bvf, node, i + 1);
+			bvf->nb_jcc_nodes++;
+			break;
+		case (BPF_JMP | BPF_JA):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			rc |= add_edge(bvf, node, i + 2);
+			i++;
+			break;
+		default:
+			rc |= add_edge(bvf, node, i + 1);
+			break;
+		}
+
+		bvf->nb_nodes++;
+		bvf->node_colour[WHITE]++;
 	}
 
 	if (rc != 0)
-		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
-			__func__, bpf, rc);
+		return rc;
+
+	dfs(bvf);
+
+	RTE_BPF_LOG(INFO, "%s(%p) stats:\n"
+		"nb_nodes=%u;\n"
+		"nb_jcc_nodes=%u;\n"
+		"node_color={[WHITE]=%u, [GREY]=%u, [BLACK]=%u};\n"
+		"edge_type={[UNKNOWN]=%u, [TREE]=%u, [BACK]=%u, [CROSS]=%u};\n",
+		__func__, bvf,
+		bvf->nb_nodes,
+		bvf->nb_jcc_nodes,
+		bvf->node_colour[WHITE], bvf->node_colour[GREY],
+			bvf->node_colour[BLACK],
+		bvf->edge_type[UNKNOWN_EDGE], bvf->edge_type[TREE_EDGE],
+		bvf->edge_type[BACK_EDGE], bvf->edge_type[CROSS_EDGE]);
+
+	if (bvf->node_colour[BLACK] != bvf->nb_nodes) {
+		RTE_BPF_LOG(ERR, "%s(%p) unreachable instructions;\n",
+			__func__, bvf);
+		log_unreachable(bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->node_colour[GREY] != 0 || bvf->node_colour[WHITE] != 0 ||
+			bvf->edge_type[UNKNOWN_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) DFS internal error;\n",
+			__func__, bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->edge_type[BACK_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) loops detected;\n",
+			__func__, bvf);
+		log_loop(bvf);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper functions get/free eval states.
+ */
+static struct bpf_eval_state *
+pull_eval_state(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->evst_pool.cur;
+	if (n == bvf->evst_pool.num)
+		return NULL;
+
+	bvf->evst_pool.cur = n + 1;
+	return bvf->evst_pool.ent + n;
+}
+
+static void
+push_eval_state(struct bpf_verifier *bvf)
+{
+	bvf->evst_pool.cur--;
+}
+
+static void
+evst_pool_fini(struct bpf_verifier *bvf)
+{
+	bvf->evst = NULL;
+	free(bvf->evst_pool.ent);
+	memset(&bvf->evst_pool, 0, sizeof(bvf->evst_pool));
+}
+
+static int
+evst_pool_init(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->nb_jcc_nodes + 1;
+
+	bvf->evst_pool.ent = calloc(n, sizeof(bvf->evst_pool.ent[0]));
+	if (bvf->evst_pool.ent == NULL)
+		return -ENOMEM;
+
+	bvf->evst_pool.num = n;
+	bvf->evst_pool.cur = 0;
+
+	bvf->evst = pull_eval_state(bvf);
+	return 0;
+}
+
+/*
+ * Save current eval state.
+ */
+static int
+save_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	struct bpf_eval_state *st;
+
+	/* get new eval_state for this node */
+	st = pull_eval_state(bvf);
+	if (st == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s: internal error (out of space) at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -ENOMEM;
+	}
+
+	/* make a copy of current state */
+	memcpy(st, bvf->evst, sizeof(*st));
+
+	/* swap current state with new one */
+	node->evst = bvf->evst;
+	bvf->evst = st;
+
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), node->evst, bvf->evst);
+
+	return 0;
+}
+
+/*
+ * Restore previous eval state and mark current eval state as free.
+ */
+static void
+restore_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), bvf->evst, node->evst);
+
+	bvf->evst = node->evst;
+	node->evst = NULL;
+	push_eval_state(bvf);
+}
+
+/*
+ * Do a second pass through the CFG and try to evaluate instructions
+ * via each possible path.
+ * Right now evaluation functionality is quite limited.
+ * Still need to add extra checks for:
+ * - use/return uninitialized registers.
+ * - use uninitialized data from the stack.
+ * - memory boundaries violation.
+ */
+static int
+evaluate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t idx, op;
+	const char *err;
+	const struct bpf_insn *ins;
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	ins = bvf->prm->ins;
+	rc = 0;
+
+	while (node != NULL && rc == 0) {
+
+		/* current node evaluation */
+		idx = get_node_idx(bvf, node);
+		op = ins[idx].code;
+
+		if (ins_chk[op].eval != NULL) {
+			err = ins_chk[op].eval(bvf, ins + idx);
+			if (err != NULL) {
+				RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+					__func__, err, idx);
+				rc = -EINVAL;
+			}
+		}
+
+		/* proceed through CFG */
+		next = get_next_node(bvf, node);
+		if (next != NULL) {
+
+			/* proceed with next child */
+			if (node->cur_edge != node->nb_edge)
+				rc |= save_eval_state(bvf, node);
+			else if (node->evst != NULL)
+				restore_eval_state(bvf, node);
+
+			next->prev_node = get_node_idx(bvf, node);
+			node = next;
+		} else {
+			/*
+			 * finished with current node and all its kids,
+			 * proceed with parent
+			 */
+			node->cur_edge = 0;
+			node = get_prev_node(bvf, node);
+
+			/* finished */
+			if (node == bvf->in)
+				node = NULL;
+		}
+	}
+
+	return rc;
+}
+
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	struct bpf_verifier bvf;
+
+	memset(&bvf, 0, sizeof(bvf));
+	bvf.prm = &bpf->prm;
+	bvf.in = calloc(bpf->prm.nb_ins, sizeof(bvf.in[0]));
+	if (bvf.in == NULL)
+		return -ENOMEM;
+
+	rc = validate(&bvf);
+
+	if (rc == 0) {
+		rc = evst_pool_init(&bvf);
+		if (rc == 0)
+			rc = evaluate(&bvf);
+		evst_pool_fini(&bvf);
+	}
+
+	free(bvf.in);
+
+	/* copy collected info */
+	if (rc == 0)
+		bpf->stack_sz = bvf.stack_sz;
+
 	return rc;
 }
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 04/10] bpf: add JIT compilation for x86_64 ISA
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (3 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 03/10] bpf: add more logic into bpf_validate() Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 05/10] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile      |    3 +
 lib/librte_bpf/bpf.c         |    5 +
 lib/librte_bpf/bpf_jit_x86.c | 1368 ++++++++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build   |    4 +
 4 files changed, 1380 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index e0f434e77..44b12c439 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -23,6 +23,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
index d7f68c017..dc6d10991 100644
--- a/lib/librte_bpf/bpf.c
+++ b/lib/librte_bpf/bpf.c
@@ -41,7 +41,12 @@ bpf_jit(struct rte_bpf *bpf)
 {
 	int32_t rc;
 
+#ifdef RTE_ARCH_X86_64
+	rc = bpf_jit_x86(bpf);
+#else
 	rc = -ENOTSUP;
+#endif
+
 	if (rc != 0)
 		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
 			__func__, bpf, rc);
diff --git a/lib/librte_bpf/bpf_jit_x86.c b/lib/librte_bpf/bpf_jit_x86.c
new file mode 100644
index 000000000..d024470c2
--- /dev/null
+++ b/lib/librte_bpf/bpf_jit_x86.c
@@ -0,0 +1,1368 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define GET_BPF_OP(op)	(BPF_OP(op) >> 4)
+
+enum {
+	RAX = 0,  /* scratch, return value */
+	RCX = 1,  /* scratch, 4th arg */
+	RDX = 2,  /* scratch, 3rd arg */
+	RBX = 3,  /* callee saved */
+	RSP = 4,  /* stack pointer */
+	RBP = 5,  /* frame pointer, callee saved */
+	RSI = 6,  /* scratch, 2nd arg */
+	RDI = 7,  /* scratch, 1st arg */
+	R8  = 8,  /* scratch, 5th arg */
+	R9  = 9,  /* scratch, 6th arg */
+	R10 = 10, /* scratch */
+	R11 = 11, /* scratch */
+	R12 = 12, /* callee saved */
+	R13 = 13, /* callee saved */
+	R14 = 14, /* callee saved */
+	R15 = 15, /* callee saved */
+};
+
+#define IS_EXT_REG(r)	((r) >= R8)
+
+enum {
+	REX_PREFIX = 0x40, /* fixed value 0100 */
+	REX_W = 0x8,       /* 64bit operand size */
+	REX_R = 0x4,       /* extension of the ModRM.reg field */
+	REX_X = 0x2,       /* extension of the SIB.index field */
+	REX_B = 0x1,       /* extension of the ModRM.rm field */
+};
+
+enum {
+	MOD_INDIRECT = 0,
+	MOD_IDISP8 = 1,
+	MOD_IDISP32 = 2,
+	MOD_DIRECT = 3,
+};
+
+enum {
+	SIB_SCALE_1 = 0,
+	SIB_SCALE_2 = 1,
+	SIB_SCALE_4 = 2,
+	SIB_SCALE_8 = 3,
+};
+
+/*
+ * eBPF to x86_64 register mappings.
+ */
+static const uint32_t ebpf2x86[] = {
+	[BPF_REG_0] = RAX,
+	[BPF_REG_1] = RDI,
+	[BPF_REG_2] = RSI,
+	[BPF_REG_3] = RDX,
+	[BPF_REG_4] = RCX,
+	[BPF_REG_5] = R8,
+	[BPF_REG_6] = RBX,
+	[BPF_REG_7] = R13,
+	[BPF_REG_8] = R14,
+	[BPF_REG_9] = R15,
+	[BPF_REG_10] = RBP,
+};
+
+/*
+ * r10 and r11 are used as scratch temporary registers.
+ */
+enum {
+	REG_DIV_IMM = R9,
+	REG_TMP0 = R11,
+	REG_TMP1 = R10,
+};
+
+/*
+ * callee saved registers list.
+ * keep RBP as the last one.
+ */
+static const uint32_t save_regs[] = {RBX, R12, R13, R14, R15, RBP};
+
+struct bpf_jit_state {
+	uint32_t idx;
+	size_t sz;
+	struct {
+		uint32_t num;
+		int32_t off;
+	} exit;
+	uint32_t reguse;
+	int32_t *off;
+	uint8_t *ins;
+};
+
+#define	INUSE(v, r)	(((v) >> (r)) & 1)
+#define	USED(v, r)	((v) |= 1 << (r))
+
+union bpf_jit_imm {
+	uint32_t u32;
+	uint8_t u8[4];
+};
+
+static size_t
+bpf_size(uint32_t bpf_op_sz)
+{
+	if (bpf_op_sz == BPF_B)
+		return sizeof(uint8_t);
+	else if (bpf_op_sz == BPF_H)
+		return sizeof(uint16_t);
+	else if (bpf_op_sz == BPF_W)
+		return sizeof(uint32_t);
+	else if (bpf_op_sz == BPF_DW)
+		return sizeof(uint64_t);
+	return 0;
+}
+
+/*
+ * In many cases for imm8 we can produce shorter code.
+ */
+static size_t
+imm_size(int32_t v)
+{
+	if (v == (int8_t)v)
+		return sizeof(int8_t);
+	return sizeof(int32_t);
+}
+
+static void
+emit_bytes(struct bpf_jit_state *st, const uint8_t ins[], uint32_t sz)
+{
+	uint32_t i;
+
+	if (st->ins != NULL) {
+		for (i = 0; i != sz; i++)
+			st->ins[st->sz + i] = ins[i];
+	}
+	st->sz += sz;
+}
+
+static void
+emit_imm(struct bpf_jit_state *st, const uint32_t imm, uint32_t sz)
+{
+	union bpf_jit_imm v;
+
+	v.u32 = imm;
+	emit_bytes(st, v.u8, sz);
+}
+
+/*
+ * emit REX byte
+ */
+static void
+emit_rex(struct bpf_jit_state *st, uint32_t op, uint32_t reg, uint32_t rm)
+{
+	uint8_t rex;
+
+	/* mark operand registers as used */
+	USED(st->reguse, reg);
+	USED(st->reguse, rm);
+
+	rex = 0;
+	if (BPF_CLASS(op) == BPF_ALU64 ||
+			op == (BPF_ST | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_MEM | BPF_DW) ||
+			op == (BPF_STX | BPF_XADD | BPF_DW) ||
+			op == (BPF_LD | BPF_IMM | BPF_DW) ||
+			(BPF_CLASS(op) == BPF_LDX &&
+			BPF_MODE(op) == BPF_MEM &&
+			BPF_SIZE(op) != BPF_W))
+		rex |= REX_W;
+
+	if (IS_EXT_REG(reg))
+		rex |= REX_R;
+
+	if (IS_EXT_REG(rm))
+		rex |= REX_B;
+
+	/* store using SIL, DIL */
+	if (op == (BPF_STX | BPF_MEM | BPF_B) && (reg == RDI || reg == RSI))
+		rex |= REX_PREFIX;
+
+	if (rex != 0) {
+		rex |= REX_PREFIX;
+		emit_bytes(st, &rex, sizeof(rex));
+	}
+}
+
+/*
+ * emit MODRegRM byte
+ */
+static void
+emit_modregrm(struct bpf_jit_state *st, uint32_t mod, uint32_t reg, uint32_t rm)
+{
+	uint8_t v;
+
+	v = mod << 6 | (reg & 7) << 3 | (rm & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit SIB byte
+ */
+static void
+emit_sib(struct bpf_jit_state *st, uint32_t scale, uint32_t idx, uint32_t base)
+{
+	uint8_t v;
+
+	v = scale << 6 | (idx & 7) << 3 | (base & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit xchg %<sreg>, %<dreg>
+ */
+static void
+emit_xchg_reg(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	const uint8_t ops = 0x87;
+
+	emit_rex(st, BPF_ALU64, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit neg %<dreg>
+ */
+static void
+emit_neg(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 3;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+/*
+ * emit mov %<sreg>, %<dreg>
+ */
+static void
+emit_mov_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x89;
+
+	/* if operands are 32-bit, then it can be used to clear upper 32-bit */
+	if (sreg != dreg || BPF_CLASS(op) == BPF_ALU) {
+		emit_rex(st, op, sreg, dreg);
+		emit_bytes(st, &ops, sizeof(ops));
+		emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+	}
+}
+
+/*
+ * emit movzwl %<sreg>, %<dreg>
+ */
+static void
+emit_movzwl(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	static const uint8_t ops[] = {0x0F, 0xB7};
+
+	emit_rex(st, BPF_ALU, sreg, dreg);
+	emit_bytes(st, ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit ror <imm8>, %<dreg>
+ */
+static void
+emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t prfx = 0x66;
+	const uint8_t ops = 0xC1;
+	const uint8_t mods = 1;
+
+	emit_bytes(st, &prfx, sizeof(prfx));
+	emit_rex(st, BPF_ALU, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit bswap %<dreg>
+ */
+static void
+emit_be2le_48(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	uint32_t rop;
+
+	const uint8_t ops = 0x0F;
+	const uint8_t mods = 1;
+
+	rop = (imm == 64) ? BPF_ALU64 : BPF_ALU;
+	emit_rex(st, rop, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+static void
+emit_be2le(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16) {
+		emit_ror_imm(st, dreg, 8);
+		emit_movzwl(st, dreg, dreg);
+	} else
+		emit_be2le_48(st, dreg, imm);
+}
+
+/*
+ * In general it is a NOP for x86.
+ * Just clear the upper bits.
+ */
+static void
+emit_le2be(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16)
+		emit_movzwl(st, dreg, dreg);
+	else if (imm == 32)
+		emit_mov_reg(st, BPF_ALU | BPF_MOV | BPF_X, dreg, dreg);
+}
+
+/*
+ * emit one of:
+ *   add <imm>, %<dreg>
+ *   and <imm>, %<dreg>
+ *   or  <imm>, %<dreg>
+ *   sub <imm>, %<dreg>
+ *   xor <imm>, %<dreg>
+ */
+static void
+emit_alu_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t mod, opcode;
+	uint32_t bop, imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0,
+		[GET_BPF_OP(BPF_AND)] = 4,
+		[GET_BPF_OP(BPF_OR)] =  1,
+		[GET_BPF_OP(BPF_SUB)] = 5,
+		[GET_BPF_OP(BPF_XOR)] = 6,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+
+	imsz = imm_size(imm);
+	opcode = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &opcode, sizeof(opcode));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit one of:
+ *   add %<sreg>, %<dreg>
+ *   and %<sreg>, %<dreg>
+ *   or  %<sreg>, %<dreg>
+ *   sub %<sreg>, %<dreg>
+ *   xor %<sreg>, %<dreg>
+ */
+static void
+emit_alu_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0x01,
+		[GET_BPF_OP(BPF_AND)] = 0x21,
+		[GET_BPF_OP(BPF_OR)] =  0x09,
+		[GET_BPF_OP(BPF_SUB)] = 0x29,
+		[GET_BPF_OP(BPF_XOR)] = 0x31,
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
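+/*
+ * emit the REX/opcode/ModRM bytes common to all shift instructions;
+ * the shift count (imm8 or implicit %cl) is handled by the callers.
+ */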
+static void
+emit_shift(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	uint8_t mod;
+	uint32_t bop, opx;
+
+	static const uint8_t ops[] = {0xC1, 0xD3};
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_LSH)] = 4,
+		[GET_BPF_OP(BPF_RSH)] = 5,
+		[GET_BPF_OP(BPF_ARSH)] = 7,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+	opx = (BPF_SRC(op) == BPF_X);
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+}
+
+/*
+ * emit one of:
+ *   shl <imm>, %<dreg>
+ *   shr <imm>, %<dreg>
+ *   sar <imm>, %<dreg>
+ */
+static void
+emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm)
+{
+	emit_shift(st, op, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit one of:
+ *   shl %<dreg>
+ *   shr %<dreg>
+ *   sar %<dreg>
+ * note that rcx is implicitly used as a source register, so a few extra
+ * instructions for register spillage might be necessary.
+ */
+static void
+emit_shift_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+
+	emit_shift(st, op, (dreg == RCX) ? sreg : dreg);
+
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+}
+
+/*
+ * emit mov <imm>, %<dreg>
+ */
+static void
+emit_mov_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xC7;
+
+	if (imm == 0) {
+		/* replace 'mov 0, %<dst>' with 'xor %<dst>, %<dst>' */
+		op = BPF_CLASS(op) | BPF_XOR | BPF_X;
+		emit_alu_reg(st, op, dreg, dreg);
+		return;
+	}
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+	emit_imm(st, imm, sizeof(imm));
+}
+
+/*
+ * emit mov <imm64>, %<dreg>
+ */
+static void
+emit_ld_imm64(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm0,
+	uint32_t imm1)
+{
+	const uint8_t ops = 0xB8;
+
+	if (imm1 == 0) {
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, dreg, imm0);
+		return;
+	}
+
+	emit_rex(st, BPF_ALU64, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+
+	emit_imm(st, imm0, sizeof(imm0));
+	emit_imm(st, imm1, sizeof(imm1));
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * either:
+ *   mov %<sreg>, %rdx
+ * OR
+ *   mov <imm>, %rdx
+ * mul %rdx
+ * mov %r10, %rdx
+ * mov %rax, %<dreg>
+ * mov %r11, %rax
+ */
+static void
+emit_mul(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 4;
+
+	/* save rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* rax = dreg */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+
+	if (BPF_SRC(op) == BPF_X)
+		/* rdx = sreg */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X,
+			sreg == RAX ? REG_TMP0 : sreg, RDX);
+	else
+		/* rdx = imm */
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, imm);
+
+	emit_rex(st, op, RAX, RDX);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RDX);
+
+	if (dreg != RDX)
+		/* restore rdx */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+
+	if (dreg != RAX) {
+		/* dreg = rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+		/* restore rax */
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	}
+}
+
+/*
+ * emit mov <ofs>(%<sreg>), %<dreg>
+ * note that for non 64-bit ops, higher bits have to be cleared.
+ */
+static void
+emit_ld_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	uint32_t mods, opsz;
+	const uint8_t op32 = 0x8B;
+	const uint8_t op16[] = {0x0F, 0xB7};
+	const uint8_t op8[] = {0x0F, 0xB6};
+
+	emit_rex(st, op, dreg, sreg);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_B)
+		emit_bytes(st, op8, sizeof(op8));
+	else if (opsz == BPF_H)
+		emit_bytes(st, op16, sizeof(op16));
+	else
+		emit_bytes(st, &op32, sizeof(op32));
+
+	mods = (imm_size(ofs) == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, dreg, sreg);
+	if (sreg == RSP || sreg == R12)
+		emit_sib(st, SIB_SCALE_1, sreg, sreg);
+	emit_imm(st, ofs, imm_size(ofs));
+}
+
+/*
+ * emit one of:
+ *   mov %<sreg>, <ofs>(%<dreg>)
+ *   mov <imm>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_common(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, uint32_t imm, int32_t ofs)
+{
+	uint32_t mods, imsz, opsz, opx;
+	const uint8_t prfx16 = 0x66;
+
+	/* 8 bit instruction opcodes */
+	static const uint8_t op8[] = {0xC6, 0x88};
+
+	/* 16/32/64 bit instruction opcodes */
+	static const uint8_t ops[] = {0xC7, 0x89};
+
+	/* does the instruction use an immediate value or a src reg? */
+	opx = (BPF_CLASS(op) == BPF_STX);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_H)
+		emit_bytes(st, &prfx16, sizeof(prfx16));
+
+	emit_rex(st, op, sreg, dreg);
+
+	if (opsz == BPF_B)
+		emit_bytes(st, &op8[opx], sizeof(op8[opx]));
+	else
+		emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, sreg, dreg);
+
+	if (dreg == RSP || dreg == R12)
+		emit_sib(st, SIB_SCALE_1, dreg, dreg);
+
+	emit_imm(st, ofs, imsz);
+
+	if (opx == 0) {
+		imsz = RTE_MIN(bpf_size(opsz), sizeof(imm));
+		emit_imm(st, imm, imsz);
+	}
+}
+
+static void
+emit_st_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm,
+	int32_t ofs)
+{
+	emit_st_common(st, op, 0, dreg, imm, ofs);
+}
+
+static void
+emit_st_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	emit_st_common(st, op, sreg, dreg, 0, ofs);
+}
+
+/*
+ * emit lock add %<sreg>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_xadd(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	uint32_t imsz, mods;
+
+	const uint8_t lck = 0xF0; /* lock prefix */
+	const uint8_t ops = 0x01; /* add opcode */
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_bytes(st, &lck, sizeof(lck));
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, mods, sreg, dreg);
+	emit_imm(st, ofs, imsz);
+}
+
+/*
+ * emit:
+ *    mov <imm64>, %rax
+ *    call *%rax
+ */
+static void
+emit_call(struct bpf_jit_state *st, uintptr_t trg)
+{
+	const uint8_t ops = 0xFF;
+	const uint8_t mods = 2;
+
+	emit_ld_imm64(st, RAX, trg, trg >> 32);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RAX);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	int32_t joff;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0xEB;
+	const uint8_t op32 = 0xE9;
+
+	const int32_t sz8 = sizeof(op8) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32) + sizeof(uint32_t);
+
+	/* max possible jmp instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8, sizeof(op8));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, &op32, sizeof(op32));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	emit_abs_jmp(st, st->off[st->idx + ofs]);
+}
+
+/*
+ * emit one of:
+ *    cmovz %<sreg>, <%dreg>
+ *    cmovne %<sreg>, <%dreg>
+ *    cmova %<sreg>, <%dreg>
+ *    cmovb %<sreg>, <%dreg>
+ *    cmovae %<sreg>, <%dreg>
+ *    cmovbe %<sreg>, <%dreg>
+ *    cmovg %<sreg>, <%dreg>
+ *    cmovl %<sreg>, <%dreg>
+ *    cmovge %<sreg>, <%dreg>
+ *    cmovle %<sreg>, <%dreg>
+ */
+static void
+emit_movcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x44},  /* CMOVZ */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x45},  /* CMOVNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x47},  /* CMOVA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x42},  /* CMOVB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x43},  /* CMOVAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x46},  /* CMOVBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x4F}, /* CMOVG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x4C}, /* CMOVL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x4D}, /* CMOVGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x4E}, /* CMOVLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x45}, /* CMOVNE */
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, dreg, sreg);
+	emit_bytes(st, ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, dreg, sreg);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	uint32_t bop, imsz;
+	int32_t joff;
+
+	static const uint8_t op8[] = {
+		[GET_BPF_OP(BPF_JEQ)] = 0x74,  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = 0x75,  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = 0x77,  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = 0x72,  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = 0x73,  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = 0x76,  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = 0x7F, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = 0x7C, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = 0x7D, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = 0x7E, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = 0x75, /* JNE */
+	};
+
+	static const uint8_t op32[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x84},  /* JE */
+		[GET_BPF_OP(BPF_JNE)] = {0x0F, 0x85},  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x87},  /* JA */
+		[GET_BPF_OP(BPF_JLT)] = {0x0F, 0x82},  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x83},  /* JAE */
+		[GET_BPF_OP(BPF_JLE)] = {0x0F, 0x86},  /* JBE */
+		[GET_BPF_OP(BPF_JSGT)] = {0x0F, 0x8F}, /* JG */
+		[GET_BPF_OP(BPF_JSLT)] = {0x0F, 0x8C}, /* JL */
+		[GET_BPF_OP(BPF_JSGE)] = {0x0F, 0x8D}, /* JGE */
+		[GET_BPF_OP(BPF_JSLE)] = {0x0F, 0x8E}, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x85}, /* JNE */
+	};
+
+	const int32_t sz8 = sizeof(op8[0]) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32[0]) + sizeof(uint32_t);
+
+	/* max possible jcc instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	bop = GET_BPF_OP(op);
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8[bop], sizeof(op8[bop]));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, op32[bop], sizeof(op32[bop]));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	emit_abs_jcc(st, op, st->off[st->idx + ofs]);
+}
+
+
+/*
+ * emit cmp <imm>, %<dreg>
+ */
+static void
+emit_cmp_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t ops;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	const uint8_t mods = 7;
+
+	imsz = imm_size(imm);
+	ops = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit test <imm>, %<dreg>
+ */
+static void
+emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 0;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
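+/*
+ * emit cmp (or test for BPF_JSET) against an immediate value,
+ * followed by the corresponding jcc to the given BPF offset.
+ */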
+static void
+emit_jcc_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_imm(st, BPF_ALU64, dreg, imm);
+	else
+		emit_cmp_imm(st, BPF_ALU64, dreg, imm);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * emit test %<sreg>, %<dreg>
+ */
+static void
+emit_tst_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x85;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit cmp %<sreg>, %<dreg>
+ */
+static void
+emit_cmp_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x39;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+
+}
+
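+/*
+ * emit cmp (or test for BPF_JSET) against a source register,
+ * followed by the corresponding jcc to the given BPF offset.
+ */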
+static void
+emit_jcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_reg(st, BPF_ALU64, sreg, dreg);
+	else
+		emit_cmp_reg(st, BPF_ALU64, sreg, dreg);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * xor %rdx, %rdx
+ * for divisor as immediate value:
+ *   mov <imm>, %r9
+ * div %<divisor_reg>
+ * either:
+ *   mov %rax, %<dreg>
+ * OR
+ *   mov %rdx, %<dreg>
+ * mov %r11, %rax
+ * mov %r10, %rdx
+ */
+static void
+emit_div(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	uint32_t sr;
+
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 6;
+
+	if (BPF_SRC(op) == BPF_X) {
+
+		/* check that src divisor is not zero */
+		emit_tst_reg(st, BPF_CLASS(op), sreg, sreg);
+
+		/* exit with return value zero */
+		emit_movcc_reg(st, BPF_CLASS(op) | BPF_JEQ | BPF_X, sreg, RAX);
+		emit_abs_jcc(st, BPF_JMP | BPF_JEQ | BPF_K, st->exit.off);
+	}
+
+	/* save rax & rdx */
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, REG_TMP0);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* fill rax & rdx */
+	emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, dreg, RAX);
+	emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, RDX, 0);
+
+	if (BPF_SRC(op) == BPF_X) {
+		sr = sreg;
+		if (sr == RAX)
+			sr = REG_TMP0;
+		else if (sr == RDX)
+			sr = REG_TMP1;
+	} else {
+		sr = REG_DIV_IMM;
+		emit_mov_imm(st, BPF_ALU64 | BPF_MOV | BPF_K, sr, imm);
+	}
+
+	emit_rex(st, op, 0, sr);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, sr);
+
+	if (BPF_OP(op) == BPF_DIV)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RAX, dreg);
+	else
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RDX, dreg);
+
+	if (dreg != RAX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP0, RAX);
+	if (dreg != RDX)
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, REG_TMP1, RDX);
+}
+
+static void
+emit_prolog(struct bpf_jit_state *st, int32_t stack_size)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	/* we can avoid touching the stack at all */
+	if (spil == 0)
+		return;
+
+
+	emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP,
+		spil * sizeof(uint64_t));
+
+	ofs = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++) {
+		if (INUSE(st->reguse, save_regs[i]) != 0) {
+			emit_st_reg(st, BPF_STX | BPF_MEM | BPF_DW,
+				save_regs[i], RSP, ofs);
+			ofs += sizeof(uint64_t);
+		}
+	}
+
+	if (INUSE(st->reguse, RBP) != 0) {
+		emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RSP, RBP);
+		emit_alu_imm(st, BPF_ALU64 | BPF_SUB | BPF_K, RSP, stack_size);
+	}
+}
+
+/*
+ * emit ret
+ */
+static void
+emit_ret(struct bpf_jit_state *st)
+{
+	const uint8_t ops = 0xC3;
+
+	emit_bytes(st, &ops, sizeof(ops));
+}
+
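+/*
+ * emit epilog:
+ * restore spilled callee-saved registers, release stack space and return.
+ * the epilog block is generated only once; subsequent BPF_EXIT
+ * instructions just jump to it.
+ */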
+static void
+emit_epilog(struct bpf_jit_state *st)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	/* if we already have an epilog, generate a jump to it */
+	if (st->exit.num++ != 0) {
+		emit_abs_jmp(st, st->exit.off);
+		return;
+	}
+
+	/* store offset of epilog block */
+	st->exit.off = st->sz;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	if (spil != 0) {
+
+		if (INUSE(st->reguse, RBP) != 0)
+			emit_mov_reg(st, BPF_ALU64 | BPF_MOV | BPF_X, RBP, RSP);
+
+		ofs = 0;
+		for (i = 0; i != RTE_DIM(save_regs); i++) {
+			if (INUSE(st->reguse, save_regs[i]) != 0) {
+				emit_ld_reg(st, BPF_LDX | BPF_MEM | BPF_DW,
+					RSP, save_regs[i], ofs);
+				ofs += sizeof(uint64_t);
+			}
+		}
+
+		emit_alu_imm(st, BPF_ALU64 | BPF_ADD | BPF_K, RSP,
+			spil * sizeof(uint64_t));
+	}
+
+	emit_ret(st);
+}
+
+/*
+ * walk through the BPF code and translate it into x86_64 instructions.
+ */
+static int
+emit(struct bpf_jit_state *st, const struct rte_bpf *bpf)
+{
+	uint32_t i, dr, op, sr;
+	const struct bpf_insn *ins;
+
+	/* reset state fields */
+	st->sz = 0;
+	st->exit.num = 0;
+
+	emit_prolog(st, bpf->stack_sz);
+
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		st->idx = i;
+		st->off[i] = st->sz;
+
+		ins = bpf->prm.ins + i;
+
+		dr = ebpf2x86[ins->dst_reg];
+		sr = ebpf2x86[ins->src_reg];
+		op = ins->code;
+
+		switch (op) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+		case (BPF_ALU | BPF_SUB | BPF_K):
+		case (BPF_ALU | BPF_AND | BPF_K):
+		case (BPF_ALU | BPF_OR | BPF_K):
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+		case (BPF_ALU | BPF_SUB | BPF_X):
+		case (BPF_ALU | BPF_AND | BPF_X):
+		case (BPF_ALU | BPF_OR | BPF_X):
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_BE):
+			emit_be2le(st, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_END | BPF_TO_LE):
+			emit_le2be(st, dr, ins->imm);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_K):
+		case (BPF_ALU64 | BPF_SUB | BPF_K):
+		case (BPF_ALU64 | BPF_AND | BPF_K):
+		case (BPF_ALU64 | BPF_OR | BPF_K):
+		case (BPF_ALU64 | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_K):
+		case (BPF_ALU64 | BPF_RSH | BPF_K):
+		case (BPF_ALU64 | BPF_ARSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 64 bit ALU REG operations */
+		case (BPF_ALU64 | BPF_ADD | BPF_X):
+		case (BPF_ALU64 | BPF_SUB | BPF_X):
+		case (BPF_ALU64 | BPF_AND | BPF_X):
+		case (BPF_ALU64 | BPF_OR | BPF_X):
+		case (BPF_ALU64 | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_LSH | BPF_X):
+		case (BPF_ALU64 | BPF_RSH | BPF_X):
+		case (BPF_ALU64 | BPF_ARSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU64 | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		/* multiply instructions */
+		case (BPF_ALU | BPF_MUL | BPF_K):
+		case (BPF_ALU | BPF_MUL | BPF_X):
+		case (BPF_ALU64 | BPF_MUL | BPF_K):
+		case (BPF_ALU64 | BPF_MUL | BPF_X):
+			emit_mul(st, op, sr, dr, ins->imm);
+			break;
+		/* divide instructions */
+		case (BPF_ALU | BPF_DIV | BPF_K):
+		case (BPF_ALU | BPF_MOD | BPF_K):
+		case (BPF_ALU | BPF_DIV | BPF_X):
+		case (BPF_ALU | BPF_MOD | BPF_X):
+		case (BPF_ALU64 | BPF_DIV | BPF_K):
+		case (BPF_ALU64 | BPF_MOD | BPF_K):
+		case (BPF_ALU64 | BPF_DIV | BPF_X):
+		case (BPF_ALU64 | BPF_MOD | BPF_X):
+			emit_div(st, op, sr, dr, ins->imm);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+		case (BPF_LDX | BPF_MEM | BPF_H):
+		case (BPF_LDX | BPF_MEM | BPF_W):
+		case (BPF_LDX | BPF_MEM | BPF_DW):
+			emit_ld_reg(st, op, sr, dr, ins->off);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | BPF_DW):
+			emit_ld_imm64(st, dr, ins[0].imm, ins[1].imm);
+			i++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+		case (BPF_STX | BPF_MEM | BPF_H):
+		case (BPF_STX | BPF_MEM | BPF_W):
+		case (BPF_STX | BPF_MEM | BPF_DW):
+			emit_st_reg(st, op, sr, dr, ins->off);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+		case (BPF_ST | BPF_MEM | BPF_H):
+		case (BPF_ST | BPF_MEM | BPF_W):
+		case (BPF_ST | BPF_MEM | BPF_DW):
+			emit_st_imm(st, op, dr, ins->imm, ins->off);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | BPF_XADD | BPF_W):
+		case (BPF_STX | BPF_XADD | BPF_DW):
+			emit_st_xadd(st, op, sr, dr, ins->off);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			emit_jmp(st, ins->off + 1);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | BPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | BPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | BPF_JLE | BPF_K):
+		case (BPF_JMP | BPF_JSGT | BPF_K):
+		case (BPF_JMP | BPF_JSLT | BPF_K):
+		case (BPF_JMP | BPF_JSGE | BPF_K):
+		case (BPF_JMP | BPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			emit_jcc_imm(st, op, dr, ins->imm, ins->off + 1);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | BPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | BPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | BPF_JLE | BPF_X):
+		case (BPF_JMP | BPF_JSGT | BPF_X):
+		case (BPF_JMP | BPF_JSLT | BPF_X):
+		case (BPF_JMP | BPF_JSGE | BPF_X):
+		case (BPF_JMP | BPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			emit_jcc_reg(st, op, sr, dr, ins->off + 1);
+			break;
+		/* call instructions */
+		case (BPF_JMP | BPF_CALL):
+			emit_call(st, (uintptr_t)bpf->prm.xsym[ins->imm].func);
+			break;
+		/* return instruction */
+		case (BPF_JMP | BPF_EXIT):
+			emit_epilog(st);
+			break;
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %u;\n",
+				__func__, bpf, ins->code, i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * produce a native ISA version of the given BPF code.
+ */
+int
+bpf_jit_x86(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	uint32_t i;
+	size_t sz;
+	struct bpf_jit_state st;
+
+	/* init state */
+	memset(&st, 0, sizeof(st));
+	st.off = malloc(bpf->prm.nb_ins * sizeof(st.off[0]));
+	if (st.off == NULL)
+		return -ENOMEM;
+
+	/* fill with fake offsets */
+	st.exit.off = INT32_MAX;
+	for (i = 0; i != bpf->prm.nb_ins; i++)
+		st.off[i] = INT32_MAX;
+
+	/*
+	 * dry runs, used to calculate total code size and valid jump offsets.
+	 * jump offsets can shrink between passes, so keep re-emitting until
+	 * the generated size stops changing, i.e. reaches its minimal value.
+	 */
+	do {
+		sz = st.sz;
+		rc = emit(&st, bpf);
+	} while (rc == 0 && sz != st.sz);
+
+	if (rc == 0) {
+
+		/* allocate memory needed */
+		st.ins = mmap(NULL, st.sz, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (st.ins == MAP_FAILED)
+			rc = -ENOMEM;
+		else
+			/* generate code */
+			rc = emit(&st, bpf);
+	}
+
+	if (rc == 0 && mprotect(st.ins, st.sz, PROT_READ | PROT_EXEC) != 0)
+		rc = -ENOMEM;
+
+	if (rc != 0)
+		munmap(st.ins, st.sz);
+	else {
+		bpf->jit.func = (void *)st.ins;
+		bpf->jit.sz = st.sz;
+	}
+
+	free(st.off);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 05c48c7ff..67ca30533 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -7,6 +7,10 @@ sources = files('bpf.c',
 		'bpf_load.c',
 		'bpf_validate.c')
 
+if arch_subdir == 'x86'
+	sources += files('bpf_jit_x86.c')
+endif
+
 install_headers = files('rte_bpf.h')
 
 deps += ['mbuf', 'net']
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 05/10] bpf: introduce basic RX/TX BPF filters
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (4 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 04/10] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 06/10] testpmd: new commands to load/unload " Konstantin Ananyev
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce API to install BPF-based filters on the ethdev RX/TX path.
The current implementation is a pure SW one, based on the ethdev RX/TX
callback mechanism.
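
For illustration, an application is expected to use the new API roughly
as in the sketch below (the ELF file name is just a placeholder):

#include <string.h>
#include <stdint.h>
#include <rte_bpf_ethdev.h>

static int
install_rx_filter(uint16_t port, uint16_t queue)
{
	struct rte_bpf_prm prm;

	memset(&prm, 0, sizeof(prm));
	/* program takes a pointer to packet data as its input */
	prm.prog_type = RTE_BPF_PROG_TYPE_UNSPEC;

	/* load section ".text" from filter.o and attach it as RX callback */
	return rte_bpf_eth_rx_elf_load(port, queue, &prm, "./filter.o",
		".text", RTE_BPF_ETH_F_JIT);
}

The filter can later be removed with rte_bpf_eth_rx_unload(port, queue).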

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile            |   2 +
 lib/librte_bpf/bpf_pkt.c           | 607 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   6 +-
 lib/librte_bpf/rte_bpf_ethdev.h    | 102 +++++++
 lib/librte_bpf/rte_bpf_version.map |   4 +
 5 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 44b12c439..501c49c60 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -22,6 +22,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_pkt.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
@@ -29,5 +30,6 @@ endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf_ethdev.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf_pkt.c b/lib/librte_bpf/bpf_pkt.c
new file mode 100644
index 000000000..a8735456e
--- /dev/null
+++ b/lib/librte_bpf/bpf_pkt.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include <rte_bpf_ethdev.h>
+#include "bpf_impl.h"
+
+/*
+ * information about installed BPF rx/tx callback
+ */
+
+struct bpf_eth_cbi {
+	/* used by both data & control path */
+	uint32_t use;    /* usage counter */
+	const struct rte_eth_rxtx_callback *cb;  /* callback handle */
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	/* used by control path only */
+	LIST_ENTRY(bpf_eth_cbi) link;
+	uint16_t port;
+	uint16_t queue;
+} __rte_cache_aligned;
+
+/*
+ * Odd number means that callback is used by datapath.
+ * Even number means that callback is not used by datapath.
+ */
+#define BPF_ETH_CBI_INUSE  1
+
+/*
+ * List to manage RX/TX installed callbacks.
+ */
+LIST_HEAD(bpf_eth_cbi_list, bpf_eth_cbi);
+
+enum {
+	BPF_ETH_RX,
+	BPF_ETH_TX,
+	BPF_ETH_NUM,
+};
+
+/*
+ * information about all installed BPF rx/tx callbacks
+ */
+struct bpf_eth_cbh {
+	rte_spinlock_t lock;
+	struct bpf_eth_cbi_list list;
+	uint32_t type;
+};
+
+static struct bpf_eth_cbh rx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_RX,
+};
+
+static struct bpf_eth_cbh tx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_TX,
+};
+
+/*
+ * Marks given callback as used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
+{
+	cbi->use++;
+	/* make sure no store/load reordering could happen */
+	rte_smp_mb();
+}
+
+/*
+ * Marks given callback as not used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
+{
+	/* make sure all previous loads are completed */
+	rte_smp_rmb();
+	cbi->use++;
+}
+
+/*
+ * Waits till datapath finished using given callback.
+ */
+static void
+bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
+{
+	uint32_t nuse, puse;
+
+	/* make sure all previous loads and stores are completed */
+	rte_smp_mb();
+
+	puse = cbi->use;
+
+	/* in use, busy wait till current RX/TX iteration is finished */
+	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
+		do {
+			rte_pause();
+			rte_compiler_barrier();
+			nuse = cbi->use;
+		} while (nuse == puse);
+	}
+}
+
+static void
+bpf_eth_cbi_cleanup(struct bpf_eth_cbi *bc)
+{
+	bc->bpf = NULL;
+	memset(&bc->jit, 0, sizeof(bc->jit));
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_find(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	LIST_FOREACH(cbi, &cbh->list, link) {
+		if (cbi->port == port && cbi->queue == queue)
+			break;
+	}
+	return cbi;
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_add(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	/* return an existing one */
+	cbi = bpf_eth_cbh_find(cbh, port, queue);
+	if (cbi != NULL)
+		return cbi;
+
+	cbi = rte_zmalloc(NULL, sizeof(*cbi), RTE_CACHE_LINE_SIZE);
+	if (cbi != NULL) {
+		cbi->port = port;
+		cbi->queue = queue;
+		LIST_INSERT_HEAD(&cbh->list, cbi, link);
+	}
+	return cbi;
+}
+
+/*
+ * BPF packet processing routines.
+ */
+
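+/*
+ * Compact the mbuf array according to the BPF run results:
+ * packets with non-zero return value are kept at the front of the array,
+ * the rest are either freed (drop != 0) or moved past the kept ones.
+ * Returns the number of packets that passed the filter.
+ */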
+static inline uint32_t
+apply_filter(struct rte_mbuf *mb[], const uint64_t rc[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i, j, k;
+	struct rte_mbuf *dr[num];
+
+	for (i = 0, j = 0, k = 0; i != num; i++) {
+
+		/* filter matches */
+		if (rc[i] != 0)
+			mb[j++] = mb[i];
+		/* no match */
+		else
+			dr[k++] = mb[i];
+	}
+
+	if (drop != 0) {
+		/* free filtered out mbufs */
+		for (i = 0; i != k; i++)
+			rte_pktmbuf_free(dr[i]);
+	} else {
+		/* copy filtered out mbufs beyond good ones */
+		for (i = 0; i != k; i++)
+			mb[j + i] = dr[i];
+	}
+
+	return j;
+}
+
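+/*
+ * Run the BPF interpreter over a burst of packets
+ * (program input is a pointer to the first segment's data)
+ * and filter the mbuf array by the results.
+ */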
+static inline uint32_t
+pkt_filter_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i;
+	void *dp[num];
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		dp[i] = rte_pktmbuf_mtod(mb[i], void *);
+
+	rte_bpf_exec_burst(bpf, dp, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	void *dp;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		dp = rte_pktmbuf_mtod(mb[i], void *);
+		rc[i] = jit->func(dp);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+static inline uint32_t
+pkt_filter_mb_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint64_t rc[num];
+
+	rte_bpf_exec_burst(bpf, (void **)mb, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		rc[i] = jit->func(mb[i]);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+/*
+ * RX/TX callbacks for raw data bpf.
+ */
+
+static uint16_t
+bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+/*
+ * RX/TX callbacks for mbuf.
+ */
+
+static uint16_t
+bpf_rx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
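+/*
+ * Select an RX callback variant based on the program input type
+ * (raw packet data vs rte_mbuf) and whether JIT-ed code was requested.
+ */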
+static rte_rx_callback_fn
+select_rx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_rx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_rx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_rx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_rx_callback_mb_vm;
+
+	return NULL;
+}
+
+static rte_tx_callback_fn
+select_tx_callback(enum rte_bpf_prog_type ptype, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+			return bpf_tx_callback_jit;
+		else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+			return bpf_tx_callback_mb_jit;
+	} else if (ptype == RTE_BPF_PROG_TYPE_UNSPEC)
+		return bpf_tx_callback_vm;
+	else if (ptype == RTE_BPF_PROG_TYPE_MBUF)
+		return bpf_tx_callback_mb_vm;
+
+	return NULL;
+}
+
+/*
+ * Helper function to perform BPF unload for given port/queue.
+ * We have to introduce extra complexity (and a possible slowdown) here,
+ * as right now there is no safe generic way to remove an RX/TX callback
+ * while IO is active.
+ * Note that we still don't free the memory allocated for the callback
+ * handle itself: right now there is no safe way to do that without
+ * stopping RX/TX on the given port/queue first.
+ */
+static void
+bpf_eth_cbi_unload(struct bpf_eth_cbi *bc)
+{
+	/* mark this cbi as empty */
+	bc->cb = NULL;
+	rte_smp_mb();
+
+	/* make sure datapath doesn't use bpf anymore, then destroy bpf */
+	bpf_eth_cbi_wait(bc);
+	rte_bpf_destroy(bc->bpf);
+	bpf_eth_cbi_cleanup(bc);
+}
+
+static void
+bpf_eth_unload(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+
+	bc = bpf_eth_cbh_find(cbh, port, queue);
+	if (bc == NULL || bc->cb == NULL)
+		return;
+
+	if (cbh->type == BPF_ETH_RX)
+		rte_eth_remove_rx_callback(port, queue, bc->cb);
+	else
+		rte_eth_remove_tx_callback(port, queue, bc->cb);
+
+	bpf_eth_cbi_unload(bc);
+}
+
+
+__rte_experimental void
+rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+__rte_experimental void
+rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+static int
+bpf_eth_elf_load(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_rx_callback_fn frx;
+	rte_tx_callback_fn ftx;
+	struct rte_bpf_jit jit;
+
+	frx = NULL;
+	ftx = NULL;
+
+	if (prm == NULL || rte_eth_dev_is_valid_port(port) == 0 ||
+			queue >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	if (cbh->type == BPF_ETH_RX)
+		frx = select_rx_callback(prm->prog_type, flags);
+	else
+		ftx = select_tx_callback(prm->prog_type, flags);
+
+	if (frx == NULL && ftx == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	rte_bpf_get_jit(bpf, &jit);
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && jit.func == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rte_bpf_destroy(bpf);
+		return -ENOTSUP;
+	}
+
+	/* setup/update global callback info */
+	bc = bpf_eth_cbh_add(cbh, port, queue);
+	if (bc == NULL)
+		return -ENOMEM;
+
+	/* remove old one, if any */
+	if (bc->cb != NULL)
+		bpf_eth_unload(cbh, port, queue);
+
+	bc->bpf = bpf;
+	bc->jit = jit;
+
+	if (cbh->type == BPF_ETH_RX)
+		bc->cb = rte_eth_add_rx_callback(port, queue, frx, bc);
+	else
+		bc->cb = rte_eth_add_tx_callback(port, queue, ftx, bc);
+
+	if (bc->cb == NULL) {
+		rc = -rte_errno;
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	} else
+		rc = 0;
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 67ca30533..39b464041 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -5,15 +5,17 @@ allow_experimental_apis = true
 sources = files('bpf.c',
 		'bpf_exec.c',
 		'bpf_load.c',
+		'bpf_pkt.c',
 		'bpf_validate.c')
 
 if arch_subdir == 'x86'
 	sources += files('bpf_jit_x86.c')
 endif
 
-install_headers = files('rte_bpf.h')
+install_headers = files('rte_bpf.h',
+			'rte_bpf_ethdev.h')
 
-deps += ['mbuf', 'net']
+deps += ['mbuf', 'net', 'ethdev']
 
 dep = dependency('libelf', required: false)
 if dep.found() == false
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
new file mode 100644
index 000000000..4800bbdaa
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_ETHDEV_H_
+#define _RTE_BPF_ETHDEV_H_
+
+/**
+ * @file
+ *
+ * API to install BPF filter as RX/TX callbacks for eth devices.
+ * Note that right now:
+ * - it is not MT safe, i.e. it is not allowed to do load/unload for the
+ *   same port/queue from different threads in parallel.
+ * - though it allows to do load/unload at runtime
+ *   (while RX/TX is ongoing on given port/queue).
+ * - allows only one BPF program per port/queue,
+ *   i.e. a new load will replace the BPF program previously loaded for
+ *   that port/queue.
+ * Filter behaviour - if the BPF program returns zero value for a given
+ * packet, then:
+ *   on RX - the packet will be dropped inside the callback and no further
+ *   processing for that packet will happen.
+ *   on TX - the packet will remain unsent, and it is the responsibility of
+ *   the user to handle such situation (drop, try to send again, etc.).
+ */
+
+#include <rte_bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+	RTE_BPF_ETH_F_NONE = 0,
+	RTE_BPF_ETH_F_JIT  = 0x1, /**< use JIT-compiled native ISA code */
+};
+
+/**
+ * Unload previously loaded BPF program (if any) from given RX port/queue
+ * and remove appropriate RX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ */
+void rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Unload previously loaded BPF program (if any) from given TX port/queue
+ * and remove appropriate TX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ */
+void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given RX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ * @param prm
+ *   Parameters used to load and run the BPF program
+ *   (allowed external symbols, program input type, etc.).
+ * @param fname
+ *   Pathname of the ELF file to load the program from.
+ * @param sname
+ *   Name of the executable section within the file to load.
+ * @param flags
+ *   Bitmask of RTE_BPF_ETH_F_* flags (e.g. RTE_BPF_ETH_F_JIT).
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given TX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ * @param prm
+ *   Parameters used to load and run the BPF program
+ *   (allowed external symbols, program input type, etc.).
+ * @param fname
+ *   Pathname of the ELF file to load the program from.
+ * @param sname
+ *   Name of the executable section within the file to load.
+ * @param flags
+ *   Bitmask of RTE_BPF_ETH_F_* flags (e.g. RTE_BPF_ETH_F_JIT).
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_ETHDEV_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ff65144df..a203e088e 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -3,6 +3,10 @@ EXPERIMENTAL {
 
 	rte_bpf_destroy;
 	rte_bpf_elf_load;
+	rte_bpf_eth_rx_elf_load;
+	rte_bpf_eth_rx_unload;
+	rte_bpf_eth_tx_elf_load;
+	rte_bpf_eth_tx_unload;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 06/10] testpmd: new commands to load/unload BPF filters
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (5 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 05/10] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 07/10] test: add few eBPF samples Konstantin Ananyev
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce new testpmd commands to load/unload BPF-based RX/TX filters.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/bpf_sup.h   |  25 ++++++++
 app/test-pmd/cmdline.c   | 146 +++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/meson.build |   2 +-
 3 files changed, 172 insertions(+), 1 deletion(-)
 create mode 100644 app/test-pmd/bpf_sup.h

diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
new file mode 100644
index 000000000..35f91a07f
--- /dev/null
+++ b/app/test-pmd/bpf_sup.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _BPF_SUP_H_
+#define _BPF_SUP_H_
+
+#include <stdio.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_ethdev.h>
+
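+/*
+ * List of external symbols (the stdout variable and the
+ * rte_pktmbuf_dump() function) that loaded BPF programs are allowed
+ * to reference.
+ */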
+static const struct rte_bpf_xsym bpf_xsym[] = {
+	{
+		.name = RTE_STR(stdout),
+		.type = RTE_BPF_XTYPE_VAR,
+		.var = &stdout,
+	},
+	{
+		.name = RTE_STR(rte_pktmbuf_dump),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)rte_pktmbuf_dump,
+	},
+};
+
+#endif /* _BPF_SUP_H_ */
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 40b31ad7e..d0ad27871 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include "testpmd.h"
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
+#include "bpf_sup.h"
 
 static struct cmdline *testpmd_cl;
 
@@ -16030,6 +16031,149 @@ cmdline_parse_inst_t cmd_load_from_file = {
 	},
 };
 
+/* *** load BPF program *** */
+struct cmd_bpf_ld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+	cmdline_fixed_string_t op;
+	cmdline_fixed_string_t flags;
+	cmdline_fixed_string_t prm;
+};
+
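+/*
+ * Parse the load-flags string:
+ * 'J' - use JIT generated native code,
+ * 'M' - program expects a pointer to rte_mbuf as input,
+ * '-' - placeholder for no flags.
+ */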
+static void
+bpf_parse_flags(const char *str, enum rte_bpf_prog_type *ptype, uint32_t *flags)
+{
+	uint32_t i, v;
+
+	*flags = RTE_BPF_ETH_F_NONE;
+	*ptype = RTE_BPF_PROG_TYPE_UNSPEC;
+
+	for (i = 0; str[i] != 0; i++) {
+		v = toupper(str[i]);
+		if (v == 'J')
+			*flags |= RTE_BPF_ETH_F_JIT;
+		else if (v == 'M')
+			*ptype = RTE_BPF_PROG_TYPE_MBUF;
+		else if (v == '-')
+			continue;
+		else
+			printf("unknown flag: \'%c\'\n", v);
+	}
+}
+
+static void cmd_operate_bpf_ld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	int32_t rc;
+	uint32_t flags;
+	struct cmd_bpf_ld_result *res;
+	struct rte_bpf_prm prm;
+	const char *fname, *sname;
+
+	res = parsed_result;
+	memset(&prm, 0, sizeof(prm));
+	prm.xsym = bpf_xsym;
+	prm.nb_xsym = RTE_DIM(bpf_xsym);
+
+	bpf_parse_flags(res->flags, &prm.prog_type, &flags);
+	fname = res->prm;
+	sname = ".text";
+
+	if (strcmp(res->dir, "rx") == 0) {
+		rc = rte_bpf_eth_rx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else if (strcmp(res->dir, "tx") == 0) {
+		rc = rte_bpf_eth_tx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_load_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			bpf, "bpf-load");
+cmdline_parse_token_string_t cmd_load_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_load_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_load_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, queue, UINT16);
+cmdline_parse_token_string_t cmd_load_bpf_flags =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			flags, NULL);
+cmdline_parse_token_string_t cmd_load_bpf_prm =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			prm, NULL);
+
+cmdline_parse_inst_t cmd_operate_bpf_ld_parse = {
+	.f = cmd_operate_bpf_ld_parsed,
+	.data = NULL,
+	.help_str = "bpf-load rx|tx <port> <queue> <J|M|-> <file_name>",
+	.tokens = {
+		(void *)&cmd_load_bpf_start,
+		(void *)&cmd_load_bpf_dir,
+		(void *)&cmd_load_bpf_port,
+		(void *)&cmd_load_bpf_queue,
+		(void *)&cmd_load_bpf_flags,
+		(void *)&cmd_load_bpf_prm,
+		NULL,
+	},
+};
+
+/* *** unload BPF program *** */
+struct cmd_bpf_unld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+};
+
+static void cmd_operate_bpf_unld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	struct cmd_bpf_unld_result *res;
+
+	res = parsed_result;
+
+	if (strcmp(res->dir, "rx") == 0)
+		rte_bpf_eth_rx_unload(res->port, res->queue);
+	else if (strcmp(res->dir, "tx") == 0)
+		rte_bpf_eth_tx_unload(res->port, res->queue);
+	else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_unload_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			bpf, "bpf-unload");
+cmdline_parse_token_string_t cmd_unload_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_unload_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_unload_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, queue, UINT16);
+
+cmdline_parse_inst_t cmd_operate_bpf_unld_parse = {
+	.f = cmd_operate_bpf_unld_parsed,
+	.data = NULL,
+	.help_str = "bpf-unload rx|tx <port> <queue>",
+	.tokens = {
+		(void *)&cmd_unload_bpf_start,
+		(void *)&cmd_unload_bpf_dir,
+		(void *)&cmd_unload_bpf_port,
+		(void *)&cmd_unload_bpf_queue,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -16272,6 +16416,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_del_port_tm_node,
 	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
 	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
 	NULL,
 };
 
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index b47537642..602e20ac3 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -21,7 +21,7 @@ sources = files('cmdline.c',
 	'testpmd.c',
 	'txonly.c')
 
-deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci']
+deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci', 'bpf']
 if dpdk_conf.has('RTE_LIBRTE_PDUMP')
 	deps += 'pdump'
 endif
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 07/10] test: add few eBPF samples
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (6 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 06/10] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 08/10] test: introduce functional test for librte_bpf Konstantin Ananyev
                       ` (2 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add a few simple eBPF programs as examples.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/bpf/dummy.c |  20 ++
 test/bpf/mbuf.h  | 578 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/bpf/t1.c    |  52 +++++
 test/bpf/t2.c    |  31 +++
 test/bpf/t3.c    |  36 ++++
 5 files changed, 717 insertions(+)
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

diff --git a/test/bpf/dummy.c b/test/bpf/dummy.c
new file mode 100644
index 000000000..5851469e7
--- /dev/null
+++ b/test/bpf/dummy.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Does nothing, always returns success.
+ * Used to measure BPF infrastructure overhead.
+ * To compile:
+ * clang -O2 -target bpf -c dummy.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+uint64_t
+entry(void *arg)
+{
+	return 1;
+}
diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
new file mode 100644
index 000000000..f24f908d7
--- /dev/null
+++ b/test/bpf/mbuf.h
@@ -0,0 +1,578 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation.
+ * Copyright 2014 6WIND S.A.
+ */
+
+/*
+ * Snippet from dpdk.org rte_mbuf.h,
+ * used to provide BPF programs with information about the rte_mbuf layout.
+ */
+
+#ifndef _MBUF_H_
+#define _MBUF_H_
+
+#include <stdint.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Packet Offload Features Flags. It also carries packet type information.
+ * Critical resources. Both RX/TX share these bits. Be cautious on any change.
+ *
+ * - RX flags start at bit position zero, and get added to the left of previous
+ *   flags.
+ * - The most-significant 3 bits are reserved for generic mbuf flags
+ * - TX flags therefore start at bit position 60 (i.e. 63-3), and new flags get
+ *   added to the right of the previously defined flags i.e. they should count
+ *   downwards, not upwards.
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
+ */
+
+/**
+ * RX packet is a 802.1q VLAN packet. This flag was set by PMDs when
+ * the packet is recognized as a VLAN, but the behavior between PMDs
+ * was not the same. This flag is kept for some time to avoid breaking
+ * applications and should be replaced by PKT_RX_VLAN_STRIPPED.
+ */
+#define PKT_RX_VLAN_PKT      (1ULL << 0)
+
+#define PKT_RX_RSS_HASH      (1ULL << 1)
+/**< RX packet with RSS hash result. */
+#define PKT_RX_FDIR          (1ULL << 2)
+/**< RX packet with FDIR match indicate. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)
+/**< External IP header checksum error. */
+
+/**
+ * A vlan has been stripped by the hardware and its tci is saved in
+ * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
+ * in the RX configuration of the PMD.
+ */
+#define PKT_RX_VLAN_STRIPPED (1ULL << 6)
+
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_IEEE1588_PTP  (1ULL << 9)
+/**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 10)
+/**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID       (1ULL << 13)
+/**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX      (1ULL << 14)
+/**< Flexible bytes reported if FDIR match. */
+
+/**
+ * The 2 vlans have been stripped by the hardware and their tci are
+ * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * This can only happen if vlan stripping is enabled in the RX
+ * configuration of the PMD. If this flag is set, PKT_RX_VLAN_STRIPPED
+ * must also be set.
+ */
+#define PKT_RX_QINQ_STRIPPED (1ULL << 15)
+
+/**
+ * Deprecated.
+ * RX packet with double VLAN stripped.
+ * This flag is replaced by PKT_RX_QINQ_STRIPPED.
+ */
+#define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
+
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP     (1ULL << 17)
+
+/* add new RX flags here */
+
+/* add new TX flags here */
+
+/**
+ * Offload the MACsec. This flag must be set by the application to enable
+ * this offload feature for a packet to be transmitted.
+ */
+#define PKT_TX_MACSEC        (1ULL << 44)
+
+/**
+ * Bits 45:48 used for the tunnel type.
+ * When doing Tx offload like TSO or checksum, the HW needs to configure the
+ * tunnel type into the HW descriptors.
+ */
+#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
+#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
+#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
+#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
+/**< TX packet with MPLS-in-UDP RFC 7510 header. */
+#define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/* add new TX TUNNEL type here */
+#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
+
+/**
+ * Second VLAN insertion (QinQ) flag.
+ */
+#define PKT_TX_QINQ_PKT    (1ULL << 49)
+/**< TX packet with double VLAN inserted. */
+
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len in account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 50)
+
+#define PKT_TX_IEEE1588_TMST (1ULL << 51)
+/**< TX IEEE1588 packet to timestamp. */
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
+ */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52)
+/**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_TCP_CKSUM     (1ULL << 52)
+/**< TCP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_SCTP_CKSUM    (2ULL << 52)
+/**< SCTP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_UDP_CKSUM     (3ULL << 52)
+/**< UDP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_L4_MASK       (3ULL << 52)
+/**< Mask for L4 cksum offload request. */
+
+/**
+ * Offload the IP checksum in the hardware. The flag PKT_TX_IPV4 should
+ * also be set by the application, although a PMD will only check
+ * PKT_TX_IP_CKSUM.
+ *  - set the IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: l2_len, l3_len
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
+
+/**
+ * Packet is IPv4. This flag must be set when using any offload feature
+ * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV4          (1ULL << 55)
+
+/**
+ * Packet is IPv6. This flag must be set when using an offload feature
+ * (TSO or L4 checksum) to tell the NIC that the packet is an IPv6
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV6          (1ULL << 56)
+
+#define PKT_TX_VLAN_PKT      (1ULL << 57)
+/**< TX packet is a 802.1q VLAN packet. */
+
+/**
+ * Offload the IP checksum of an external header in the hardware. The
+ * flag PKT_TX_OUTER_IPV4 should also be set by the application, although
+ * a PMD will only check PKT_TX_IP_CKSUM.  The IP checksum field in the
+ * packet must be set to 0.
+ *  - set the outer IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: outer_l2_len, outer_l3_len
+ */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/**
+ * Packet outer header is IPv4. This flag must be set when using any
+ * outer offload feature (L3 or L4 checksum) to tell the NIC that the
+ * outer header of the tunneled packet is an IPv4 packet.
+ */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/**
+ * Packet outer header is IPv6. This flag must be set when using any
+ * outer offload feature (L4 checksum) to tell the NIC that the outer
+ * header of the tunneled packet is an IPv6 packet.
+ */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
+/**
+ * Bitmask of all supported packet Tx offload features flags,
+ * which can be set for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_IEEE1588_TMST |	 \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK |	 \
+		PKT_TX_MACSEC)
+
+#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
+
+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
+/* Use final bit of flags to indicate a control mbuf */
+#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
+
+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of RX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the RX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of TX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the TX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Some NICs need at least 2KB buffer to RX standard Ethernet frame without
+ * splitting it into multiple segments.
+ * So, for mbufs that planned to be involved into RX/TX, the recommended
+ * minimal buffer length is 2KB + RTE_PKTMBUF_HEADROOM.
+ */
+#define	RTE_MBUF_DEFAULT_DATAROOM	2048
+#define	RTE_MBUF_DEFAULT_BUF_SIZE	\
+	(RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM)
+
+/* define a set of marker types that can be used to refer to set points in the
+ * mbuf.
+ */
+__extension__
+typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
+__extension__
+typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
+__extension__
+typedef uint64_t MARKER64[0];
+/**< marker that allows us to overwrite 8 bytes with a single assignment */
+
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * The generic rte_mbuf, containing a packet mbuf.
+ */
+struct rte_mbuf {
+	MARKER cacheline0;
+
+	void *buf_addr;           /**< Virtual address of segment buffer. */
+	/**
+	 * Physical address of segment buffer.
+	 * Force alignment to 8-bytes, so as to ensure we have the exact
+	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * working on vector drivers easier.
+	 */
+	phys_addr_t buf_physaddr __rte_aligned(sizeof(phys_addr_t));
+
+	/* next 8 bytes are initialised on RX descriptor rearm */
+	MARKER64 rearm_data;
+	uint16_t data_off;
+
+	/**
+	 * Reference counter. Its size should at least equal to the size
+	 * of port field (16 bits), to support zero-copy broadcast.
+	 * It should only be accessed using the following functions:
+	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
+	 * config option.
+	 */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+		/**< Non-atomically accessed refcnt */
+	};
+	uint16_t nb_segs;         /**< Number of segments. */
+
+	/** Input port (16 bits to support more than 256 virtual ports). */
+	uint16_t port;
+
+	uint64_t ol_flags;        /**< Offload features. */
+
+	/* remaining bytes are set on RX when pulling packet from descriptor */
+	MARKER rx_descriptor_fields1;
+
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types. The packet_type is about data really present in the
+	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+	 * vlan is stripped from the data.
+	 */
+	RTE_STD_C11
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+	uint16_t vlan_tci;
+
+	union {
+		uint32_t rss;     /**< RSS hash result if RSS enabled */
+		struct {
+			RTE_STD_C11
+			union {
+				struct {
+					uint16_t hash;
+					uint16_t id;
+				};
+				uint32_t lo;
+				/**< Second 4 flexible bytes */
+			};
+			uint32_t hi;
+			/**< First 4 flexible bytes or FD ID, dependent on
+			 *   PKT_RX_FDIR_* flag in ol_flags.
+			 */
+		} fdir;           /**< Filter identifier if FDIR enabled */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;          /**< Hierarchical scheduler */
+		uint32_t usr;
+		/**< User defined tags. See rte_distributor_process() */
+	} hash;                   /**< hash information */
+
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
+	uint16_t buf_len;         /**< Length of segment buffer. */
+
+	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+	 * are not normalized but are always the same for a given port.
+	 */
+	uint64_t timestamp;
+
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	RTE_STD_C11
+	union {
+		void *userdata;   /**< Can be used for external metadata */
+		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+	};
+
+	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+
+	/* fields to support TX offloads */
+	RTE_STD_C11
+	union {
+		uint64_t tx_offload;       /**< combined for easy fetch */
+		__extension__
+		struct {
+			uint64_t l2_len:7;
+			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+			 */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
+
+			/* fields for TX offloading of tunnels */
+			uint64_t outer_l3_len:9;
+			/**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7;
+			/**< Outer L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
+		};
+	};
+
+	/** Size of the application private data. In case of an indirect
+	 * mbuf, it stores the direct mbuf private data size.
+	 */
+	uint16_t priv_size;
+
+	/** Timesync flags for use with IEEE1588. */
+	uint16_t timesync;
+
+	/** Sequence number. See also rte_reorder_insert(). */
+	uint32_t seqn;
+
+} __rte_cache_aligned;
+
+
+/**
+ * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
+ */
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
+
+/**
+ * Returns TRUE if given mbuf is direct, or FALSE otherwise.
+ */
+#define RTE_MBUF_DIRECT(mb)     (!RTE_MBUF_INDIRECT(mb))
+
+/**
+ * Private data in case of pktmbuf pool.
+ *
+ * A structure that contains some pktmbuf_pool-specific data that are
+ * appended after the mempool structure (in private data).
+ */
+struct rte_pktmbuf_pool_private {
+	uint16_t mbuf_data_room_size; /**< Size of data space in each mbuf. */
+	uint16_t mbuf_priv_size;      /**< Size of private area in each mbuf. */
+};
+
+/**
+ * A macro that points to an offset into the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param o
+ *   The offset into the mbuf data.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod_offset(m, t, o)	\
+	((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
+
+/**
+ * A macro that points to the start of the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
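+
+/*
+ * Usage sketch (illustrative only; 'mb' stands for any valid packet mbuf),
+ * as done in test/bpf/t3.c later in this patch:
+ *
+ *	const struct ether_header *eth;
+ *	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+ */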
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _MBUF_H_ */
diff --git a/test/bpf/t1.c b/test/bpf/t1.c
new file mode 100644
index 000000000..60f9434ab
--- /dev/null
+++ b/test/bpf/t1.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to first segment packet data as an input parameter.
+ * analog of tcpdump -s 1 -d 'dst 1.2.3.4 && udp && dst port 5000'
+ * (000) ldh      [12]
+ * (001) jeq      #0x800           jt 2    jf 12
+ * (002) ld       [30]
+ * (003) jeq      #0x1020304       jt 4    jf 12
+ * (004) ldb      [23]
+ * (005) jeq      #0x11            jt 6    jf 12
+ * (006) ldh      [20]
+ * (007) jset     #0x1fff          jt 12   jf 8
+ * (008) ldxb     4*([14]&0xf)
+ * (009) ldh      [x + 16]
+ * (010) jeq      #0x1388          jt 11   jf 12
+ * (011) ret      #1
+ * (012) ret      #0
+ *
+ * To compile:
+ * clang -O2 -target bpf -c t1.c
+ */
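+
+/*
+ * For illustration only: once compiled, t1.o can be loaded from testpmd with
+ *   testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
+ * (see the cover letter of this series for the full bpf-load syntax).
+ */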
+
+#include <stdint.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+
+uint64_t
+entry(void *pkt)
+{
+	struct ether_header *ether_header = (void *)pkt;
+
+	if (ether_header->ether_type != __builtin_bswap16(0x0800))
+		return 0;
+
+	struct iphdr *iphdr = (void *)(ether_header + 1);
+	if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1ffff) != 0 ||
+			iphdr->daddr != __builtin_bswap32(0x1020304))
+		return 0;
+
+	int hlen = iphdr->ihl * 4;
+	struct udphdr *udphdr = (void *)iphdr + hlen;
+
+	if (udphdr->dest !=  __builtin_bswap16(5000))
+		return 0;
+
+	return 1;
+}
diff --git a/test/bpf/t2.c b/test/bpf/t2.c
new file mode 100644
index 000000000..69d7a4fe1
--- /dev/null
+++ b/test/bpf/t2.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Clears the mbuf's vlan_tci and all related RX flags
+ * (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED).
+ * Doesn't touch the packet data contents.
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t2.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+uint64_t
+entry(void *pkt)
+{
+	struct rte_mbuf *mb;
+
+	mb = pkt;
+	mb->vlan_tci = 0;
+	mb->ol_flags &= ~(PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+
+	return 1;
+}
diff --git a/test/bpf/t3.c b/test/bpf/t3.c
new file mode 100644
index 000000000..531b9cb8c
--- /dev/null
+++ b/test/bpf/t3.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Dumps the mbuf to stdout if it is an ARP packet (aka tcpdump 'arp').
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t3.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <net/ethernet.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+extern void rte_pktmbuf_dump(FILE *, const struct rte_mbuf *, unsigned int);
+
+uint64_t
+entry(const void *pkt)
+{
+	const struct rte_mbuf *mb;
+	const struct ether_header *eth;
+
+	mb = pkt;
+	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+
+	if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP))
+		rte_pktmbuf_dump(stdout, mb, 64);
+
+	return 1;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 08/10] test: introduce functional test for librte_bpf
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (7 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 07/10] test: add few eBPF samples Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info Konstantin Ananyev
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 10/10] MAINTAINERS: " Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/test/Makefile    |    2 +
 test/test/meson.build |    2 +
 test/test/test_bpf.c  | 1726 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1730 insertions(+)
 create mode 100644 test/test/test_bpf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index a88cc38bf..61ac6880d 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -193,6 +193,8 @@ endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += test_bpf.c
+
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
 CFLAGS += -O3
diff --git a/test/test/meson.build b/test/test/meson.build
index eb3d87a4d..101446984 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -8,6 +8,7 @@ test_sources = files('commands.c',
 	'test_alarm.c',
 	'test_atomic.c',
 	'test_barrier.c',
+	'test_bpf.c',
 	'test_byteorder.c',
 	'test_cmdline.c',
 	'test_cmdline_cirbuf.c',
@@ -98,6 +99,7 @@ test_sources = files('commands.c',
 )
 
 test_deps = ['acl',
+	'bpf',
 	'cfgfile',
 	'cmdline',
 	'cryptodev',
diff --git a/test/test/test_bpf.c b/test/test/test_bpf.c
new file mode 100644
index 000000000..9158c676a
--- /dev/null
+++ b/test/test/test_bpf.c
@@ -0,0 +1,1726 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_memory.h>
+#include <rte_debug.h>
+#include <rte_hexdump.h>
+#include <rte_random.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+#include <rte_bpf.h>
+
+#include "test.h"
+
+/*
+ * Basic functional tests for librte_bpf.
+ * The main procedure: load an eBPF program, execute it and
+ * compare the results with the expected values.
+ */
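+
+/*
+ * Each test case below provides a 'prepare' callback to fill the input
+ * buffer and a 'check_result' callback to verify both the return value
+ * and any data the program is expected to modify.  run_test() executes
+ * every program twice: once via the rte_bpf_exec() interpreter and, when
+ * rte_bpf_get_jit() reports a native entry point, once more via the JIT
+ * generated code.
+ */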
+
+struct dummy_offset {
+	uint64_t u64;
+	uint32_t u32;
+	uint16_t u16;
+	uint8_t  u8;
+};
+
+struct dummy_vect8 {
+	struct dummy_offset in[8];
+	struct dummy_offset out[8];
+};
+
+#define	TEST_FILL_1	0xDEADBEEF
+
+#define	TEST_MUL_1	21
+#define TEST_MUL_2	-100
+
+#define TEST_SHIFT_1	15
+#define TEST_SHIFT_2	33
+
+#define TEST_JCC_1	0
+#define TEST_JCC_2	-123
+#define TEST_JCC_3	5678
+#define TEST_JCC_4	TEST_FILL_1
+
+struct bpf_test {
+	const char *name;
+	size_t arg_sz;
+	struct rte_bpf_prm prm;
+	void (*prepare)(void *);
+	int (*check_result)(uint64_t, const void *);
+	uint32_t allow_fail;
+};
+
+/*
+ * Compare return value and result data with expected ones.
+ * Report a failure if they don't match.
+ */
+static int
+cmp_res(const char *func, uint64_t exp_rc, uint64_t ret_rc,
+	const void *exp_res, const void *ret_res, size_t res_sz)
+{
+	int32_t ret;
+
+	ret = 0;
+	if (exp_rc != ret_rc) {
+		printf("%s@%d: invalid return value, expected: 0x%" PRIx64
+			",result: 0x%" PRIx64 "\n",
+			func, __LINE__, exp_rc, ret_rc);
+		ret |= -1;
+	}
+
+	if (memcmp(exp_res, ret_res, res_sz) != 0) {
+		printf("%s: invalid value\n", func);
+		rte_memdump(stdout, "expected", exp_res, res_sz);
+		rte_memdump(stdout, "result", ret_res, res_sz);
+		ret |= -1;
+	}
+
+	return ret;
+}
+
+/* store immediate test-cases */
+static const struct bpf_insn test_store1_prog[] = {
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_B),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+		.imm = TEST_FILL_1,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_store1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+	memset(df, 0, sizeof(*df));
+}
+
+static int
+test_store1_check(uint64_t rc, const void *arg)
+{
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+
+	memset(&dfe, 0, sizeof(dfe));
+	dfe.u64 = (int32_t)TEST_FILL_1;
+	dfe.u32 = dfe.u64;
+	dfe.u16 = dfe.u64;
+	dfe.u8 = dfe.u64;
+
+	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
+}
+
+/* store register test-cases */
+static const struct bpf_insn test_store2_prog[] = {
+
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_B),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+/* load test-cases */
+static const struct bpf_insn test_load1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return sum */
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_4,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_load1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u64 = (int32_t)TEST_FILL_1;
+	df->u32 = df->u64;
+	df->u16 = df->u64;
+	df->u8 = df->u64;
+}
+
+static int
+test_load1_check(uint64_t rc, const void *arg)
+{
+	uint64_t v;
+	const struct dummy_offset *dft;
+
+	dft = arg;
+	v = dft->u64;
+	v += dft->u32;
+	v += dft->u16;
+	v += dft->u8;
+
+	return cmp_res(__func__, v, rc, dft, dft, sizeof(*dft));
+}
+
+/* alu mul test-cases */
+static const struct bpf_insn test_mul1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MUL | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MUL | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_mul1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = -v;
+}
+
+static int
+test_mul1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 * TEST_MUL_1;
+	r3 *= TEST_MUL_2;
+	r4 = (uint32_t)(r4 * r2);
+	r4 *= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* alu shift test-cases */
+static const struct bpf_insn test_shift1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_LSH | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_SHIFT_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ARSH | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_SHIFT_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_RSH | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_4,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_LSH | BPF_X),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_4,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = sizeof(uint64_t) * CHAR_BIT - 1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ARSH | BPF_X),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = sizeof(uint32_t) * CHAR_BIT - 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_LSH | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_shift1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = (-v ^ 5);
+}
+
+static int
+test_shift1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 << TEST_SHIFT_1;
+	r3 = (int64_t)r3 >> TEST_SHIFT_2;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+
+	r2 = (uint32_t)r2 >> r4;
+	r3 <<= r4;
+
+	dve.out[2].u64 = r2;
+	dve.out[3].u64 = r3;
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 &= sizeof(uint64_t) * CHAR_BIT - 1;
+	r3 = (int64_t)r3 >> r2;
+	r2 &= sizeof(uint32_t) * CHAR_BIT - 1;
+	r4 = (uint32_t)r4 << r2;
+
+	dve.out[4].u64 = r4;
+	dve.out[5].u64 = r3;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* jmp test-cases */
+static const struct bpf_insn test_jump1_prog[] = {
+
+	[0] = {
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0,
+	},
+	[1] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	[2] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	[3] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u32),
+	},
+	[4] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_5,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	[5] = {
+		.code = (BPF_JMP | BPF_JEQ | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_JCC_1,
+		.off = 8,
+	},
+	[6] = {
+		.code = (BPF_JMP | BPF_JSLE | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_JCC_2,
+		.off = 9,
+	},
+	[7] = {
+		.code = (BPF_JMP | BPF_JGT | BPF_K),
+		.dst_reg = BPF_REG_4,
+		.imm = TEST_JCC_3,
+		.off = 10,
+	},
+	[8] = {
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = BPF_REG_5,
+		.imm = TEST_JCC_4,
+		.off = 11,
+	},
+	[9] = {
+		.code = (BPF_JMP | BPF_JNE | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_3,
+		.off = 12,
+	},
+	[10] = {
+		.code = (BPF_JMP | BPF_JSGT | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_4,
+		.off = 13,
+	},
+	[11] = {
+		.code = (BPF_JMP | BPF_JLE | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_5,
+		.off = 14,
+	},
+	[12] = {
+		.code = (BPF_JMP | BPF_JSET | BPF_X),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_5,
+		.off = 15,
+	},
+	[13] = {
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+	[14] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x1,
+	},
+	[15] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -10,
+	},
+	[16] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x2,
+	},
+	[17] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -11,
+	},
+	[18] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x4,
+	},
+	[19] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -12,
+	},
+	[20] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x8,
+	},
+	[21] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -13,
+	},
+	[22] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x10,
+	},
+	[23] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -14,
+	},
+	[24] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x20,
+	},
+	[25] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -15,
+	},
+	[26] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x40,
+	},
+	[27] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -16,
+	},
+	[28] = {
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 0x80,
+	},
+	[29] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -17,
+	},
+};
+
+static void
+test_jump1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v1, v2;
+
+	dv = arg;
+
+	v1 = rte_rand();
+	v2 = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u64 = v1;
+	dv->in[1].u64 = v2;
+	dv->in[0].u32 = (v1 << 12) + (v2 >> 6);
+	dv->in[1].u32 = (v2 << 12) - (v1 >> 6);
+}
+
+static int
+test_jump1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4, r5, rv;
+	const struct dummy_vect8 *dvt;
+
+	dvt = arg;
+
+	rv = 0;
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[0].u64;
+	r4 = dvt->in[1].u32;
+	r5 = dvt->in[1].u64;
+
+	if (r2 == TEST_JCC_1)
+		rv |= 0x1;
+	if ((int64_t)r3 <= TEST_JCC_2)
+		rv |= 0x2;
+	if (r4 > TEST_JCC_3)
+		rv |= 0x4;
+	if (r5 & TEST_JCC_4)
+		rv |= 0x8;
+	if (r2 != r3)
+		rv |= 0x10;
+	if ((int64_t)r2 > (int64_t)r4)
+		rv |= 0x20;
+	if (r2 <= r5)
+		rv |= 0x40;
+	if (r3 & r5)
+		rv |= 0x80;
+
+	return cmp_res(__func__, rv, rc, &rv, &rc, sizeof(rv));
+}
+
+/* alu (add, sub, and, or, xor, neg)  test-cases */
+static const struct bpf_insn test_alu1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_5,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ALU | BPF_XOR | BPF_K),
+		.dst_reg = BPF_REG_4,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_K),
+		.dst_reg = BPF_REG_5,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_5,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_OR | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_XOR | BPF_X),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_4,
+	},
+	{
+		.code = (BPF_ALU | BPF_SUB | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_5,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_AND | BPF_X),
+		.dst_reg = BPF_REG_5,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[6].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_5,
+		.off = offsetof(struct dummy_vect8, out[7].u64),
+	},
+	/* return (-r2 + (-r3)) */
+	{
+		.code = (BPF_ALU | BPF_NEG),
+		.dst_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_NEG),
+		.dst_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static int
+test_alu1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4, r5, rv;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[0].u64;
+	r4 = dvt->in[1].u32;
+	r5 = dvt->in[1].u64;
+
+	r2 = (uint32_t)r2 & TEST_FILL_1;
+	r3 |= (int32_t) TEST_FILL_1;
+	r4 = (uint32_t)r4 ^ TEST_FILL_1;
+	r5 += (int32_t)TEST_FILL_1;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+	dve.out[3].u64 = r5;
+
+	r2 = (uint32_t)r2 | (uint32_t)r3;
+	r3 ^= r4;
+	r4 = (uint32_t)r4 - (uint32_t)r5;
+	r5 &= r2;
+
+	dve.out[4].u64 = r2;
+	dve.out[5].u64 = r3;
+	dve.out[6].u64 = r4;
+	dve.out[7].u64 = r5;
+
+	r2 = -(int32_t)r2;
+	rv = (uint32_t)r2;
+	r3 = -r3;
+	rv += r3;
+
+	return cmp_res(__func__, rv, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* endianness conversions (BE->LE/LE->BE)  test-cases */
+static const struct bpf_insn test_bele1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_END | BPF_TO_BE),
+		.dst_reg = BPF_REG_2,
+		.imm = sizeof(uint16_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | BPF_END | BPF_TO_BE),
+		.dst_reg = BPF_REG_3,
+		.imm = sizeof(uint32_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | BPF_END | BPF_TO_BE),
+		.dst_reg = BPF_REG_4,
+		.imm = sizeof(uint64_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_END | BPF_TO_LE),
+		.dst_reg = BPF_REG_2,
+		.imm = sizeof(uint16_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | BPF_END | BPF_TO_LE),
+		.dst_reg = BPF_REG_3,
+		.imm = sizeof(uint32_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | BPF_END | BPF_TO_LE),
+		.dst_reg = BPF_REG_4,
+		.imm = sizeof(uint64_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+test_bele1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+
+	dv = arg;
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u64 = rte_rand();
+	dv->in[0].u32 = dv->in[0].u64;
+	dv->in[0].u16 = dv->in[0].u64;
+}
+
+static int
+test_bele1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u16;
+	r3 = dvt->in[0].u32;
+	r4 = dvt->in[0].u64;
+
+	r2 =  rte_cpu_to_be_16(r2);
+	r3 =  rte_cpu_to_be_32(r3);
+	r4 =  rte_cpu_to_be_64(r4);
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	r2 = dvt->in[0].u16;
+	r3 = dvt->in[0].u32;
+	r4 = dvt->in[0].u64;
+
+	r2 =  rte_cpu_to_le_16(r2);
+	r3 =  rte_cpu_to_le_32(r3);
+	r4 =  rte_cpu_to_le_64(r4);
+
+	dve.out[3].u64 = r2;
+	dve.out[4].u64 = r3;
+	dve.out[5].u64 = r4;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* atomic add test-cases */
+static const struct bpf_insn test_xadd1_prog[] = {
+
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = -1,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_4,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_5,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_5,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_5,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_6,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_6,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_6,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_7,
+		.imm = TEST_JCC_2,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_7,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_7,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_8,
+		.imm = TEST_JCC_3,
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_W),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_8,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_XADD | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_8,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static int
+test_xadd1_check(uint64_t rc, const void *arg)
+{
+	uint64_t rv;
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+	memset(&dfe, 0, sizeof(dfe));
+
+	rv = 1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = -1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = (int32_t)TEST_FILL_1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_MUL_1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_MUL_2;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_JCC_2;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_JCC_3;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
+}
+
+/* alu div test-cases */
+static const struct bpf_insn test_div1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOD | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_MOD | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_DIV | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_1,
+		.src_reg = BPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* check that we can handle division by zero gracefully. */
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[3].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_X),
+		.dst_reg = BPF_REG_4,
+		.src_reg = BPF_REG_2,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | BPF_MOV | BPF_K),
+		.dst_reg = BPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static int
+test_div1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 / TEST_MUL_1;
+	r3 %= TEST_MUL_2;
+	r2 |= 1;
+	r3 |= 1;
+	r4 = (uint32_t)(r4 % r2);
+	r4 /= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	/*
+	 * In the test program we attempted to divide by zero,
+	 * so the return value should be 0.
+	 */
+	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* call test-cases */
+static const struct bpf_insn test_call1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_10,
+		.src_reg = BPF_REG_2,
+		.off = -4,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_10,
+		.src_reg = BPF_REG_3,
+		.off = -16,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_X),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_10,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_SUB | BPF_K),
+		.dst_reg = BPF_REG_2,
+		.imm = 4,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_MOV | BPF_X),
+		.dst_reg = BPF_REG_3,
+		.src_reg = BPF_REG_10,
+	},
+	{
+		.code = (BPF_ALU64 | BPF_SUB | BPF_K),
+		.dst_reg = BPF_REG_3,
+		.imm = 16,
+	},
+	{
+		.code = (BPF_JMP | BPF_CALL),
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = BPF_REG_2,
+		.src_reg = BPF_REG_10,
+		.off = -4,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_DW),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_10,
+		.off = -16
+	},
+	{
+		.code = (BPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = BPF_REG_0,
+		.src_reg = BPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | BPF_EXIT),
+	},
+};
+
+static void
+dummy_func1(const void *p, uint32_t *v32, uint64_t *v64)
+{
+	const struct dummy_offset *dv;
+
+	dv = p;
+
+	v32[0] += dv->u16;
+	v64[0] += dv->u8;
+}
+
+static int
+test_call1_check(uint64_t rc, const void *arg)
+{
+	uint32_t v32;
+	uint64_t v64;
+	const struct dummy_offset *dv;
+
+	dv = arg;
+
+	v32 = dv->u32;
+	v64 = dv->u64;
+	dummy_func1(arg, &v32, &v64);
+	v64 += v32;
+
+	if (v64 != rc) {
+		printf("%s@%d: invalid return value "
+			"expected=0x%" PRIx64 ", actual=0x%" PRIx64 "\n",
+			__func__, __LINE__, v64, rc);
+		return -1;
+	}
+	return 0;
+}
+
+static const struct rte_bpf_xsym test_call1_xsym[] = {
+	{
+		.name = RTE_STR(dummy_func1),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)dummy_func1,
+	},
+};
+
+static const struct bpf_test tests[] = {
+	{
+		.name = "test_store1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store1_prog,
+			.nb_ins = RTE_DIM(test_store1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_store2",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store2_prog,
+			.nb_ins = RTE_DIM(test_store2_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_load1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_load1_prog,
+			.nb_ins = RTE_DIM(test_load1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_load1_check,
+	},
+	{
+		.name = "test_mul1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_mul1_prog,
+			.nb_ins = RTE_DIM(test_mul1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_mul1_check,
+	},
+	{
+		.name = "test_shift1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_shift1_prog,
+			.nb_ins = RTE_DIM(test_shift1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_shift1_prepare,
+		.check_result = test_shift1_check,
+	},
+	{
+		.name = "test_jump1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_jump1_prog,
+			.nb_ins = RTE_DIM(test_jump1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_jump1_prepare,
+		.check_result = test_jump1_check,
+	},
+	{
+		.name = "test_alu1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_alu1_prog,
+			.nb_ins = RTE_DIM(test_alu1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_jump1_prepare,
+		.check_result = test_alu1_check,
+	},
+	{
+		.name = "test_bele1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_bele1_prog,
+			.nb_ins = RTE_DIM(test_bele1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_bele1_prepare,
+		.check_result = test_bele1_check,
+	},
+	{
+		.name = "test_xadd1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_xadd1_prog,
+			.nb_ins = RTE_DIM(test_xadd1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_xadd1_check,
+	},
+	{
+		.name = "test_div1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_div1_prog,
+			.nb_ins = RTE_DIM(test_div1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_div1_check,
+	},
+	{
+		.name = "test_call1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_call1_prog,
+			.nb_ins = RTE_DIM(test_call1_prog),
+			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+			.xsym = test_call1_xsym,
+			.nb_xsym = RTE_DIM(test_call1_xsym),
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_call1_check,
+		/* for now don't support function calls on 32 bit platform */
+		.allow_fail = (sizeof(uint64_t) != sizeof(uintptr_t)),
+	},
+};
+
+static int
+run_test(const struct bpf_test *tst)
+{
+	int32_t ret, rv;
+	int64_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	uint8_t tbuf[tst->arg_sz];
+
+	printf("%s(%s) start\n", __func__, tst->name);
+
+	bpf = rte_bpf_load(&tst->prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		return -1;
+	}
+
+	tst->prepare(tbuf);
+
+	rc = rte_bpf_exec(bpf, tbuf);
+	ret = tst->check_result(rc, tbuf);
+	if (ret != 0) {
+		printf("%s@%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, ret, strerror(ret));
+	}
+
+	rte_bpf_get_jit(bpf, &jit);
+	if (jit.func == NULL)
+		return 0;
+
+	tst->prepare(tbuf);
+	rc = jit.func(tbuf);
+	rv = tst->check_result(rc, tbuf);
+	ret |= rv;
+	if (rv != 0) {
+		printf("%s@%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, rv, strerror(ret));
+	}
+
+	rte_bpf_destroy(bpf);
+	return ret;
+}
+
+static int
+test_bpf(void)
+{
+	int32_t rc, rv;
+	uint32_t i;
+
+	rc = 0;
+	for (i = 0; i != RTE_DIM(tests); i++) {
+		rv = run_test(tests + i);
+		if (tests[i].allow_fail == 0)
+			rc |= rv;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (8 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 08/10] test: introduce functional test for librte_bpf Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  2018-04-23 13:26       ` Kovacevic, Marko
  2018-04-23 13:34       ` Kovacevic, Marko
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 10/10] MAINTAINERS: " Konstantin Ananyev
  10 siblings, 2 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 doc/api/doxy-api-index.md         |  3 ++-
 doc/api/doxy-api.conf             |  1 +
 doc/guides/prog_guide/bpf_lib.rst | 38 ++++++++++++++++++++++++++++++++++++++
 doc/guides/prog_guide/index.rst   |  1 +
 4 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 2f04619cb..d0c1c37ad 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -133,7 +133,8 @@ The public API headers are grouped by topics:
   [EFD]                (@ref rte_efd.h),
   [ACL]                (@ref rte_acl.h),
   [member]             (@ref rte_member.h),
-  [flow classify]      (@ref rte_flow_classify.h)
+  [flow classify]      (@ref rte_flow_classify.h),
+  [BPF]                (@ref rte_bpf.h)
 
 - **containers**:
   [mbuf]               (@ref rte_mbuf.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index cda52fdfb..c8eb6d893 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -42,6 +42,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_acl \
                           lib/librte_bbdev \
                           lib/librte_bitratestats \
+                          lib/librte_bpf \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/bpf_lib.rst b/doc/guides/prog_guide/bpf_lib.rst
new file mode 100644
index 000000000..2fce4cefb
--- /dev/null
+++ b/doc/guides/prog_guide/bpf_lib.rst
@@ -0,0 +1,38 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+Berkeley Packet Filter Library
+==============================
+
+The DPDK provides an BPF library that gives the ability
+to load and execute Enhanced Berkeley Packet Filter (eBPF) bytecode within
+user-space dpdk appilication.
+
+It supports a basic set of features from the eBPF spec.
+Please refer to the
+`eBPF spec <https://www.kernel.org/doc/Documentation/networking/filter.txt>`_
+for more information.
+It also introduces a basic framework to load/unload BPF-based filters
+on eth devices (right now only via SW RX/TX callbacks).
+
+The library API provides the following basic operations:
+
+*  Create a new BPF execution context and load user provided eBPF code into it.
+
+*   Destroy an BPF execution context and its runtime structures and free the associated memory.
+
+*   Execute eBPF bytecode associated with provied input parameter.
+
+*   Provide information about natively compield code for given BPF context.
+
+*   Load a BPF program from an ELF file and install a callback to execute it on a given ethdev port/queue.
+
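+A minimal usage sketch (illustrative only; ``run_filter()`` below is a
+hypothetical wrapper, not part of the library API), based on the calls
+exercised in ``test/test/test_bpf.c`` from this series:
+
+.. code-block:: c
+
+    #include <stdint.h>
+    #include <rte_bpf.h>
+    #include <rte_errno.h>
+
+    /* load raw eBPF instructions and run them over 'pkt' */
+    static int
+    run_filter(const struct bpf_insn *ins, uint32_t nb_ins, void *pkt,
+        uint64_t *rc)
+    {
+        struct rte_bpf_prm prm = {
+            .ins = ins,
+            .nb_ins = nb_ins,
+            .prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
+        };
+        struct rte_bpf_jit jit;
+        struct rte_bpf *bpf;
+
+        bpf = rte_bpf_load(&prm);
+        if (bpf == NULL)
+            return -rte_errno;
+
+        /* execute through the interpreter ... */
+        *rc = rte_bpf_exec(bpf, pkt);
+
+        /* ... or through JIT generated native code, when available */
+        rte_bpf_get_jit(bpf, &jit);
+        if (jit.func != NULL)
+            *rc = jit.func(pkt);
+
+        rte_bpf_destroy(bpf);
+        return 0;
+    }
+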
+Not currently supported eBPF features
+-------------------------------------
+
+ - JIT for non X86_64 platforms
+ - cBPF
+ - tail-pointer call
+ - eBPF MAP
+ - skb
+ - external function calls for 32-bit platforms
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index bbbe7895d..76b079c3f 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -49,6 +49,7 @@ Programmer's Guide
     vhost_lib
     metrics_lib
     port_hotplug_framework
+    bpf_lib
     source_org
     dev_kit_build_system
     dev_kit_root_make_help
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v4 10/10] MAINTAINERS: add librte_bpf related info
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                       ` (9 preceding siblings ...)
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info Konstantin Ananyev
@ 2018-04-13 14:43     ` Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-04-13 14:43 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 MAINTAINERS | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ed3251da7..db7fec362 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -964,6 +964,10 @@ Latency statistics
 M: Reshma Pattan <reshma.pattan@intel.com>
 F: lib/librte_latencystats/
 
+BPF
+M: Konstantin Ananyev <konstantin.ananyev@intel.com>
+F: lib/librte_bpf/
+F: doc/guides/prog_guide/bpf_lib.rst
 
 Test Applications
 -----------------
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-04-16 21:25       ` Thomas Monjalon
  0 siblings, 0 replies; 83+ messages in thread
From: Thomas Monjalon @ 2018-04-16 21:25 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

13/04/2018 16:43, Konstantin Ananyev:
> Konstantin Ananyev (10):
>   net: move BPF related definitions into librte_net
>   bpf: add BPF loading and execution framework
>   bpf: add more logic into bpf_validate()
>   bpf: add JIT compilation for x86_64 ISA
>   bpf: introduce basic RX/TX BPF filters
>   testpmd: new commands to load/unload BPF filters
>   test: add few eBPF samples
>   test: introduce functional test for librte_bpf
>   doc: add librte_bpf related info
>   MAINTAINERS: add librte_bpf related info

Just a minor comment: you should try to squash last patches
(doc and MAINTAINERS) early in the series.
Thanks

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v3 09/10] doc: add librte_bpf related info
  2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 09/10] doc: add librte_bpf related info Konstantin Ananyev
@ 2018-04-23 13:22     ` Kovacevic, Marko
  0 siblings, 0 replies; 83+ messages in thread
From: Kovacevic, Marko @ 2018-04-23 13:22 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: Ananyev, Konstantin

Small typos below.

> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  doc/api/doxy-api-index.md         |  3 ++-
>  doc/api/doxy-api.conf             |  1 +
>  doc/guides/prog_guide/bpf_lib.rst | 37
> +++++++++++++++++++++++++++++++++++++


<...>

> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018 Intel Corporation.
> +
> +Berkeley Packet Filter Library
> +==============================
> +
> +The DPDK provides an BPF library that gives the ability to load and
> +execute Enhanced Berkeley Packet Filter (eBPF) bytecode within
> +user-space dpdk appilication.

appilication / application


<...>

> +*  Create a new BPF execution context and load user provided eBPF code
> into it.
> +
> +*   Destroy an BPF execution context and its runtime structures and free the
> associated memory.
> +
> +*   Execute eBPF bytecode associated with provied input parameter.

provied  /  provided 


> +
> +*   Provide information about natively compield code for given BPF context.

compield  /  compiled
 

<...>


Marko K.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info Konstantin Ananyev
@ 2018-04-23 13:26       ` Kovacevic, Marko
  2018-04-23 13:34       ` Kovacevic, Marko
  1 sibling, 0 replies; 83+ messages in thread
From: Kovacevic, Marko @ 2018-04-23 13:26 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: Ananyev, Konstantin

Small typos below.

> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  doc/api/doxy-api-index.md         |  3 ++-
>  doc/api/doxy-api.conf             |  1 +
>  doc/guides/prog_guide/bpf_lib.rst | 37
> +++++++++++++++++++++++++++++++++++++


<...>

> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2018 Intel Corporation.
> +
> +Berkeley Packet Filter Library
> +==============================
> +
> +The DPDK provides an BPF library that gives the ability to load and 
> +execute Enhanced Berkeley Packet Filter (eBPF) bytecode within 
> +user-space dpdk appilication.

appilication / application


<...>

> +*  Create a new BPF execution context and load user provided eBPF 
> +code
> into it.
> +
> +*   Destroy an BPF execution context and its runtime structures and free the
> associated memory.
> +
> +*   Execute eBPF bytecode associated with provied input parameter.

provied  /  provided 


> +
> +*   Provide information about natively compield code for given BPF context.

compield  /  compiled
 

<...>


Marko K.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info Konstantin Ananyev
  2018-04-23 13:26       ` Kovacevic, Marko
@ 2018-04-23 13:34       ` Kovacevic, Marko
  1 sibling, 0 replies; 83+ messages in thread
From: Kovacevic, Marko @ 2018-04-23 13:34 UTC (permalink / raw)
  To: Ananyev, Konstantin, dev; +Cc: Ananyev, Konstantin

One more thing, very small, but this popped up also

Running git check log on HEAD~1:  38051
======================================
Wrong headline format:
        doc: add librte_bpf related info

Think u just need to remove the underscore.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-09 17:11         ` Ferruh Yigit
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                         ` (7 subsequent siblings)
  8 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

BPF is used quite intensively inside Linux (and BSD) kernels
for various different purposes and proved to be extremely useful.

BPF inside DPDK might also be used in a lot of places
for a lot of similar things.
 As an example to:
- packet filtering/tracing (aka tcpdump)
- packet classification
- statistics collection
- HW/PMD live-system debugging/prototyping - trace HW descriptors,
  internal PMD SW state, etc.
- Come up with your own idea

All of that in a dynamic, user-defined and extensible manner.

So these series introduce new library - librte_bpf.
librte_bpf provides API to load and execute BPF bytecode within
user-space dpdk app.
It supports basic set of features from eBPF spec.
Also it introduces basic framework to load/unload BPF-based filters
on eth devices (right now via SW RX/TX callbacks).

How to try it:
===============

1) run testpmd as usual and start your favorite forwarding case.
2) build bpf program you'd like to load
(you'll need clang v3.7 or above):
$ cd test/bpf
$ clang -O2 -target bpf -c t1.c

3) load bpf program(s):
testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>

<load-flags>:  [-][J][M]
J - use JIT generated native code, otherwise BPF interpreter will be used.
M - assume input parameter is a pointer to rte_mbuf,
    otherwise assume it is a pointer to first segment's data.

Few examples:

# to load (not JITed) dummy.o at TX queue 0, port 0:
testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o

#to load (and JIT compile) t1.o at RX queue 0, port 1:
testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o

#to load and JIT t3.o (note that it expects mbuf as an input):
testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o

4) observe changed traffic behavior
Let's say with the examples above:
 - dummy.o  does literally nothing, so no changes should be here,
   except some possible slowdown.
 - t1.o - should force to drop all packets that don't match the
   'dst 1.2.3.4 && udp && dst port 5000' filter.
 - t3.o - should dump ARP packets to stdout.

5) unload some or all bpf programs:
testpmd> bpf-unload tx 0 0

6) continue with step 3) or exit

Not currently supported features:
=================================
- cBPF
- tail-pointer call
- eBPF MAP
- JIT for non X86_64 targets
- skb
- function calls for 32-bit apps
- mbuf pointer as input parameter for 32-bit apps

v2:
 - add meson build
 - add freebsd build
 - use new logging API
 - using rte_malloc() for cbi allocation
 - add extra logic into bpf_validate()

v3:
 - add new test-case for it
 - update docs
 - update MAINTAINERS

v4:
 - add more tests to cover BPF ISA
 - fix few issues

v5:
 - revert changes in tap_bpf.h
 - rename eBPF related defines
 - apply Thomas and Marco and Marco comments

Konstantin Ananyev (8):
  bpf: add BPF loading and execution framework
  bpf: add more logic into bpf_validate()
  bpf: add JIT compilation for x86_64 ISA
  bpf: introduce basic RX/TX BPF filters
  testpmd: new commands to load/unload BPF filters
  test: add few eBPF samples
  test: introduce functional test for librte_bpf
  doc: add bpf library related info

 MAINTAINERS                                 |    4 +
 app/test-pmd/bpf_sup.h                      |   25 +
 app/test-pmd/cmdline.c                      |  149 +++
 app/test-pmd/meson.build                    |    2 +-
 config/common_base                          |    5 +
 doc/api/doxy-api-index.md                   |    3 +-
 doc/api/doxy-api.conf                       |    1 +
 doc/guides/prog_guide/bpf_lib.rst           |   38 +
 doc/guides/prog_guide/index.rst             |    1 +
 doc/guides/rel_notes/release_18_05.rst      |    7 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   56 +
 lib/Makefile                                |    2 +
 lib/librte_bpf/Makefile                     |   36 +
 lib/librte_bpf/bpf.c                        |   64 +
 lib/librte_bpf/bpf_def.h                    |  138 +++
 lib/librte_bpf/bpf_exec.c                   |  453 +++++++
 lib/librte_bpf/bpf_impl.h                   |   41 +
 lib/librte_bpf/bpf_jit_x86.c                | 1369 +++++++++++++++++++++
 lib/librte_bpf/bpf_load.c                   |  386 ++++++
 lib/librte_bpf/bpf_pkt.c                    |  607 +++++++++
 lib/librte_bpf/bpf_validate.c               | 1184 ++++++++++++++++++
 lib/librte_bpf/meson.build                  |   25 +
 lib/librte_bpf/rte_bpf.h                    |  184 +++
 lib/librte_bpf/rte_bpf_ethdev.h             |  102 ++
 lib/librte_bpf/rte_bpf_version.map          |   16 +
 lib/meson.build                             |    2 +-
 mk/rte.app.mk                               |    2 +
 test/bpf/dummy.c                            |   20 +
 test/bpf/mbuf.h                             |  578 +++++++++
 test/bpf/t1.c                               |   52 +
 test/bpf/t2.c                               |   31 +
 test/bpf/t3.c                               |   36 +
 test/test/Makefile                          |    2 +
 test/test/meson.build                       |    2 +
 test/test/test_bpf.c                        | 1759 +++++++++++++++++++++++++++
 35 files changed, 7379 insertions(+), 3 deletions(-)
 create mode 100644 app/test-pmd/bpf_sup.h
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_def.h
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c
 create mode 100644 test/test/test_bpf.c

-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-09 17:09         ` Ferruh Yigit
                           ` (10 more replies)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 2/8] bpf: add more logic into bpf_validate() Konstantin Ananyev
                         ` (6 subsequent siblings)
  8 siblings, 11 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space dpdk-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb
 - function calls for 32-bit apps
 - mbuf pointer as input parameter for 32-bit apps

It also adds a dependency on libelf.
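
For illustration only (not part of this patch), typical usage of the new API
from application code would look roughly like the sketch below; the ELF
section name and the program-argument size are assumptions here, not values
mandated by the library:

#include <rte_bpf.h>

static uint64_t
run_filter(const char *fname, void *pkt)
{
	uint64_t rc;
	struct rte_bpf *bpf;
	const struct rte_bpf_prm prm = {
		.xsym = NULL,	/* no external symbols used by the program */
		.nb_xsym = 0,
		.prog_arg = {
			.type = RTE_BPF_ARG_PTR,
			.size = 64,	/* assumed size of data the program may access */
		},
	};

	/* ".text" is assumed to be the ELF section holding the program code */
	bpf = rte_bpf_elf_load(&prm, fname, ".text");
	if (bpf == NULL)
		return 0;	/* rte_errno holds the failure reason */

	rc = rte_bpf_exec(bpf, pkt);
	rte_bpf_destroy(bpf);
	return rc;
}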

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 MAINTAINERS                        |   4 +
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  31 +++
 lib/librte_bpf/bpf.c               |  59 +++++
 lib/librte_bpf/bpf_def.h           | 138 +++++++++++
 lib/librte_bpf/bpf_exec.c          | 453 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  41 ++++
 lib/librte_bpf/bpf_load.c          | 386 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/meson.build         |  19 ++
 lib/librte_bpf/rte_bpf.h           | 184 +++++++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 lib/meson.build                    |   2 +-
 mk/rte.app.mk                      |   2 +
 15 files changed, 1392 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_def.h
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index ce06e93c2..4a7edbcf7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1049,6 +1049,10 @@ Latency statistics
 M: Reshma Pattan <reshma.pattan@intel.com>
 F: lib/librte_latencystats/
 
+BPF
+M: Konstantin Ananyev <konstantin.ananyev@intel.com>
+F: lib/librte_bpf/
+F: doc/guides/prog_guide/bpf_lib.rst
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 03a8688b5..ac425491c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -863,3 +863,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/lib/Makefile b/lib/Makefile
index 057bf7890..29cea6429 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -98,6 +98,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ethdev
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ethdev librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ethdev
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..9b714389a
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+LDLIBS += -lelf
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..d7f68c017
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+int rte_bpf_logtype;
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+	if (rc != 0)
+		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
+
+RTE_INIT(rte_bpf_init_log);
+
+static void
+rte_bpf_init_log(void)
+{
+	rte_bpf_logtype = rte_log_register("lib.bpf");
+	if (rte_bpf_logtype >= 0)
+		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
+}
diff --git a/lib/librte_bpf/bpf_def.h b/lib/librte_bpf/bpf_def.h
new file mode 100644
index 000000000..6b69de345
--- /dev/null
+++ b/lib/librte_bpf/bpf_def.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 1982, 1986, 1990, 1993
+ *      The Regents of the University of California.
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef _RTE_BPF_DEF_H_
+#define _RTE_BPF_DEF_H_
+
+/**
+ * @file
+ *
+ * classic BPF (cBPF) and extended BPF (eBPF) related defines.
+ * For more information regarding cBPF and eBPF ISA and their differences,
+ * please refer to:
+ * https://www.kernel.org/doc/Documentation/networking/filter.txt.
+ * As a rule of thumb for that file:
+ * all definitions used by both cBPF and eBPF start with a bpf(BPF)_ prefix,
+ * while eBPF-only ones start with an ebpf(EBPF) prefix.
+ */
+
+#include <stdint.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * The instruction encodings.
+ */
+
+/* Instruction classes */
+#define BPF_CLASS(code) ((code) & 0x07)
+#define	BPF_LD		0x00
+#define	BPF_LDX		0x01
+#define	BPF_ST		0x02
+#define	BPF_STX		0x03
+#define	BPF_ALU		0x04
+#define	BPF_JMP		0x05
+#define	BPF_RET		0x06
+#define	BPF_MISC        0x07
+
+#define EBPF_ALU64	0x07
+
+/* ld/ldx fields */
+#define BPF_SIZE(code)  ((code) & 0x18)
+#define	BPF_W		0x00
+#define	BPF_H		0x08
+#define	BPF_B		0x10
+#define	EBPF_DW		0x18
+
+#define BPF_MODE(code)  ((code) & 0xe0)
+#define	BPF_IMM		0x00
+#define	BPF_ABS		0x20
+#define	BPF_IND		0x40
+#define	BPF_MEM		0x60
+#define	BPF_LEN		0x80
+#define	BPF_MSH		0xa0
+
+#define EBPF_XADD	0xc0
+
+/* alu/jmp fields */
+#define BPF_OP(code)    ((code) & 0xf0)
+#define	BPF_ADD		0x00
+#define	BPF_SUB		0x10
+#define	BPF_MUL		0x20
+#define	BPF_DIV		0x30
+#define	BPF_OR		0x40
+#define	BPF_AND		0x50
+#define	BPF_LSH		0x60
+#define	BPF_RSH		0x70
+#define	BPF_NEG		0x80
+#define	BPF_MOD		0x90
+#define	BPF_XOR		0xa0
+
+#define EBPF_MOV	0xb0
+#define EBPF_ARSH	0xc0
+#define EBPF_END	0xd0
+
+#define	BPF_JA		0x00
+#define	BPF_JEQ		0x10
+#define	BPF_JGT		0x20
+#define	BPF_JGE		0x30
+#define	BPF_JSET        0x40
+
+#define EBPF_JNE	0x50
+#define EBPF_JSGT	0x60
+#define EBPF_JSGE	0x70
+#define EBPF_CALL	0x80
+#define EBPF_EXIT	0x90
+#define EBPF_JLT	0xa0
+#define EBPF_JLE	0xb0
+#define EBPF_JSLT	0xc0
+#define EBPF_JSLE	0xd0
+
+#define BPF_SRC(code)   ((code) & 0x08)
+#define	BPF_K		0x00
+#define	BPF_X		0x08
+
+/* if BPF_OP(code) == EBPF_END */
+#define EBPF_TO_LE	0x00  /* convert to little-endian */
+#define EBPF_TO_BE	0x08  /* convert to big-endian */
+
+/*
+ * eBPF registers
+ */
+enum {
+	EBPF_REG_0,  /* return value from internal function/for eBPF program */
+	EBPF_REG_1,  /* 0-th argument to internal function */
+	EBPF_REG_2,  /* 1-th argument to internal function */
+	EBPF_REG_3,  /* 2-th argument to internal function */
+	EBPF_REG_4,  /* 3-th argument to internal function */
+	EBPF_REG_5,  /* 4-th argument to internal function */
+	EBPF_REG_6,  /* callee saved register */
+	EBPF_REG_7,  /* callee saved register */
+	EBPF_REG_8,  /* callee saved register */
+	EBPF_REG_9,  /* callee saved register */
+	EBPF_REG_10, /* stack pointer (read-only) */
+	EBPF_REG_NUM,
+};
+
+/*
+ * eBPF instruction format
+ */
+struct ebpf_insn {
+	uint8_t code;
+	uint8_t dst_reg:4;
+	uint8_t src_reg:4;
+	int16_t off;
+	int32_t imm;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* RTE_BPF_DEF_H_ */
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..e373b1f3d
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,453 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define EBPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define EBPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_BPF_LOG(ERR, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[EBPF_REG_NUM], const struct ebpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[EBPF_REG_NUM], const struct ebpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
+{
+	const struct ebpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_K):
+			EBPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_X):
+			EBPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (EBPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_K):
+			EBPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (EBPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_X):
+			EBPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | EBPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | EBPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | EBPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | EBPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | EBPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | EBPF_XADD | EBPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | EBPF_CALL):
+			reg[EBPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[EBPF_REG_1], reg[EBPF_REG_2],
+				reg[EBPF_REG_3], reg[EBPF_REG_4],
+				reg[EBPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | EBPF_EXIT):
+			return reg[EBPF_REG_0];
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[EBPF_REG_NUM];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[EBPF_REG_1] = (uintptr_t)ctx[i];
+		reg[EBPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..5d7e65c31
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+extern int rte_bpf_logtype;
+
+#define	RTE_BPF_LOG(lvl, fmt, args...) \
+	rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..9450b4b79
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,386 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+/* To overcome compatibility issues with system headers that lack EM_BPF */
+#ifndef EM_BPF
+#define	EM_BPF	247
+#endif
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct ebpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | EBPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | EBPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm =
+			(uint64_t)(uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr *eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct ebpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct ebpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_BPF_LOG(ERR,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct ebpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..6a1b33181
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct ebpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == EBPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
new file mode 100644
index 000000000..e7c8d3398
--- /dev/null
+++ b/lib/librte_bpf/meson.build
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+sources = files('bpf.c',
+		'bpf_exec.c',
+		'bpf_load.c',
+		'bpf_validate.c')
+
+install_headers = files('bpf_def.h',
+			'rte_bpf.h')
+
+deps += ['mbuf', 'net']
+
+dep = dependency('libelf', required: false)
+if dep.found() == false
+	build = false
+endif
+ext_deps += dep
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..1d6c4a9d2
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,184 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+/**
+ * @file
+ *
+ * RTE BPF support.
+ * librte_bpf provides a framework to load and execute eBPF bytecode
+ * inside user-space dpdk based applications.
+ * It supports basic set of features from eBPF spec
+ * (https://www.kernel.org/doc/Documentation/networking/filter.txt).
+ */
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <bpf_def.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for function/BPF program arguments.
+ */
+enum rte_bpf_arg_type {
+	RTE_BPF_ARG_UNDEF,      /**< undefined */
+	RTE_BPF_ARG_RAW,        /**< scalar value */
+	RTE_BPF_ARG_PTR = 0x10, /**< pointer to data buffer */
+	RTE_BPF_ARG_PTR_MBUF,   /**< pointer to rte_mbuf */
+	RTE_BPF_ARG_PTR_STACK,
+};
+
+/**
+ * function argument information
+ */
+struct rte_bpf_arg {
+	enum rte_bpf_arg_type type;
+	size_t size;     /**< for pointer types, size of data it points to */
+	size_t buf_size;
+	/**< for mbuf ptr type, max size of rte_mbuf data buffer */
+};
+
+/**
+ * determine is argument a pointer
+ */
+#define RTE_BPF_ARG_PTR_TYPE(x)	((x) & RTE_BPF_ARG_PTR)
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR,  /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct ebpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	struct rte_bpf_arg prog_arg; /**< eBPF program input arg description */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *); /**< JIT-ed native code */
+	size_t sz;                /**< size of JIT-ed code */
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname of an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for the given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 166905c1c..9635aff41 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 29a2a6095..6a3bde136 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -82,6 +82,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 2/8] bpf: add more logic into bpf_validate()
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 3/8] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
                         ` (5 subsequent siblings)
  8 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add checks for:
 - all instructions are valid ones
   (known opcodes, correct syntax, valid reg/off/imm values, etc.)
 - no unreachable instructions
 - no loops
 - basic stack boundaries checks
 - division by zero

Still need to add checks for:
 - use/return only initialized registers and stack data.
 - memory boundaries violation
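
As an illustration only (not something added by this patch), this is the kind
of program the new per-instruction checks reject - a constant division by
zero, which fails the imm range check for (BPF_ALU | BPF_DIV | BPF_K), so
rte_bpf_load() is expected to return NULL with rte_errno set:

	#include <rte_bpf.h>	/* brings in bpf_def.h: opcodes and struct ebpf_insn */

	/* hypothetical reject case: imm for BPF_DIV|BPF_K must be >= 1 */
	static const struct ebpf_insn bad_prog[] = {
		{
			.code = (BPF_ALU | EBPF_MOV | BPF_K),
			.dst_reg = EBPF_REG_0,
			.imm = 4,
		},
		{
			.code = (BPF_ALU | BPF_DIV | BPF_K),
			.dst_reg = EBPF_REG_0,
			.imm = 0,	/* rejected by bpf_validate() */
		},
		{
			.code = (BPF_JMP | EBPF_EXIT),
		},
	};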

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/bpf_validate.c | 1181 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 1155 insertions(+), 26 deletions(-)

diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
index 6a1b33181..b7081c853 100644
--- a/lib/librte_bpf/bpf_validate.c
+++ b/lib/librte_bpf/bpf_validate.c
@@ -14,42 +14,1171 @@
 
 #include "bpf_impl.h"
 
+/* possible instruction node colour */
+enum {
+	WHITE,
+	GREY,
+	BLACK,
+	MAX_NODE_COLOUR
+};
+
+/* possible edge types */
+enum {
+	UNKNOWN_EDGE,
+	TREE_EDGE,
+	BACK_EDGE,
+	CROSS_EDGE,
+	MAX_EDGE_TYPE
+};
+
+struct bpf_reg_state {
+	uint64_t val;
+};
+
+struct bpf_eval_state {
+	struct bpf_reg_state rs[EBPF_REG_NUM];
+};
+
+#define	MAX_EDGES	2
+
+struct inst_node {
+	uint8_t colour;
+	uint8_t nb_edge:4;
+	uint8_t cur_edge:4;
+	uint8_t edge_type[MAX_EDGES];
+	uint32_t edge_dest[MAX_EDGES];
+	uint32_t prev_node;
+	struct bpf_eval_state *evst;
+};
+
+struct bpf_verifier {
+	const struct rte_bpf_prm *prm;
+	struct inst_node *in;
+	int32_t stack_sz;
+	uint32_t nb_nodes;
+	uint32_t nb_jcc_nodes;
+	uint32_t node_colour[MAX_NODE_COLOUR];
+	uint32_t edge_type[MAX_EDGE_TYPE];
+	struct bpf_eval_state *evst;
+	struct {
+		uint32_t num;
+		uint32_t cur;
+		struct bpf_eval_state *ent;
+	} evst_pool;
+};
+
+struct bpf_ins_check {
+	struct {
+		uint16_t dreg;
+		uint16_t sreg;
+	} mask;
+	struct {
+		uint16_t min;
+		uint16_t max;
+	} off;
+	struct {
+		uint32_t min;
+		uint32_t max;
+	} imm;
+	const char * (*check)(const struct ebpf_insn *);
+	const char * (*eval)(struct bpf_verifier *, const struct ebpf_insn *);
+};
+
+#define	ALL_REGS	RTE_LEN2MASK(EBPF_REG_NUM, uint16_t)
+#define	WRT_REGS	RTE_LEN2MASK(EBPF_REG_10, uint16_t)
+#define	ZERO_REG	RTE_LEN2MASK(EBPF_REG_1, uint16_t)
+
 /*
- * dummy one for now, need more work.
+ * check and evaluate functions for particular instruction types.
  */
-int
-bpf_validate(struct rte_bpf *bpf)
+
+static const char *
+check_alu_bele(const struct ebpf_insn *ins)
+{
+	if (ins->imm != 16 && ins->imm != 32 && ins->imm != 64)
+		return "invalid imm field";
+	return NULL;
+}
+
+static const char *
+eval_stack(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	int32_t ofs;
+
+	ofs = ins->off;
+
+	if (ofs >= 0 || ofs < -MAX_BPF_STACK_SIZE)
+		return "stack boundary violation";
+
+	ofs = -ofs;
+	bvf->stack_sz = RTE_MAX(bvf->stack_sz, ofs);
+	return NULL;
+}
+
+static const char *
+eval_store(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	if (ins->dst_reg == EBPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_load(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	if (ins->src_reg == EBPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_call(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	uint32_t idx;
+
+	idx = ins->imm;
+
+	if (idx >= bvf->prm->nb_xsym ||
+			bvf->prm->xsym[idx].type != RTE_BPF_XTYPE_FUNC)
+		return "invalid external function index";
+
+	/* for now don't support function calls on 32 bit platform */
+	if (sizeof(uint64_t) != sizeof(uintptr_t))
+		return "function calls are supported only for 64 bit apps";
+	return NULL;
+}
+
+/*
+ * validate parameters for each instruction type.
+ */
+static const struct bpf_ins_check ins_chk[UINT8_MAX] = {
+	/* ALU IMM 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | EBPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU IMM 64-bit instructions */
+	[(EBPF_ALU64 | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | EBPF_ARSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | EBPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(EBPF_ALU64 | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU REG 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | EBPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | EBPF_END | EBPF_TO_BE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	[(BPF_ALU | EBPF_END | EBPF_TO_LE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	/* ALU REG 64-bit instructions */
+	[(EBPF_ALU64 | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | EBPF_ARSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | EBPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* load instructions */
+	[(BPF_LDX | BPF_MEM | BPF_B)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_H)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_W)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | EBPF_DW)] = {
+		.mask = {. dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	/* load 64 bit immediate value */
+	[(BPF_LD | BPF_IMM | EBPF_DW)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* store REG instructions */
+	[(BPF_STX | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | EBPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* atomic add instructions */
+	[(BPF_STX | EBPF_XADD | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | EBPF_XADD | EBPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* store IMM instructions */
+	[(BPF_ST | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | EBPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	/* jump instruction */
+	[(BPF_JMP | BPF_JA)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* jcc IMM instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JNE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* jcc REG instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JNE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* call instruction */
+	[(BPF_JMP | EBPF_CALL)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_call,
+	},
+	/* ret instruction */
+	[(BPF_JMP | EBPF_EXIT)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+};
+
+/*
+ * make sure that instruction syntax is valid,
+ * and its fields don't violate particular instruction type restrictions.
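+ * For illustration (editor's note, not part of the original patch): a
+ * (BPF_ALU | EBPF_END | EBPF_TO_LE) instruction with imm == 8 falls outside
+ * the [16, 64] range allowed by its ins_chk[] entry above and is reported
+ * as "invalid imm field".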
+ */
+static const char *
+check_syntax(const struct ebpf_insn *ins)
+{
+
+	uint8_t op;
+	uint16_t off;
+	uint32_t imm;
+
+	op = ins->code;
+
+	if (ins_chk[op].mask.dreg == 0)
+		return "invalid opcode";
+
+	if ((ins_chk[op].mask.dreg & 1 << ins->dst_reg) == 0)
+		return "invalid dst-reg field";
+
+	if ((ins_chk[op].mask.sreg & 1 << ins->src_reg) == 0)
+		return "invalid src-reg field";
+
+	off = ins->off;
+	if (ins_chk[op].off.min > off || ins_chk[op].off.max < off)
+		return "invalid off field";
+
+	imm = ins->imm;
+	if (ins_chk[op].imm.min > imm || ins_chk[op].imm.max < imm)
+		return "invalid imm field";
+
+	if (ins_chk[op].check != NULL)
+		return ins_chk[op].check(ins);
+
+	return NULL;
+}
+
+/*
+ * helper function, return instruction index for the given node.
+ */
+static uint32_t
+get_node_idx(const struct bpf_verifier *bvf, const struct inst_node *node)
 {
-	int32_t rc, ofs, stack_sz;
-	uint32_t i, op, dr;
+	return node - bvf->in;
+}
+
+/*
+ * helper function, used to walk through constructed CFG.
+ */
+static struct inst_node *
+get_next_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	uint32_t ce, ne, dst;
+
+	ne = node->nb_edge;
+	ce = node->cur_edge;
+	if (ce == ne)
+		return NULL;
+
+	node->cur_edge++;
+	dst = node->edge_dest[ce];
+	return bvf->in + dst;
+}
+
+static void
+set_node_colour(struct bpf_verifier *bvf, struct inst_node *node,
+	uint32_t new)
+{
+	uint32_t prev;
+
+	prev = node->colour;
+	node->colour = new;
+
+	bvf->node_colour[prev]--;
+	bvf->node_colour[new]++;
+}
+
+/*
+ * helper function, add new edge between two nodes.
+ */
+static int
+add_edge(struct bpf_verifier *bvf, struct inst_node *node, uint32_t nidx)
+{
+	uint32_t ne;
+
+	if (nidx > bvf->prm->nb_ins) {
+		RTE_BPF_LOG(ERR, "%s: program boundary violation at pc: %u, "
+			"next pc: %u\n",
+			__func__, get_node_idx(bvf, node), nidx);
+		return -EINVAL;
+	}
+
+	ne = node->nb_edge;
+	if (ne >= RTE_DIM(node->edge_dest)) {
+		RTE_BPF_LOG(ERR, "%s: internal error at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -EINVAL;
+	}
+
+	node->edge_dest[ne] = nidx;
+	node->nb_edge = ne + 1;
+	return 0;
+}
+
+/*
+ * helper function, determine type of edge between two nodes.
+ */
+static void
+set_edge_type(struct bpf_verifier *bvf, struct inst_node *node,
+	const struct inst_node *next)
+{
+	uint32_t ce, clr, type;
+
+	ce = node->cur_edge - 1;
+	clr = next->colour;
+
+	type = UNKNOWN_EDGE;
+
+	if (clr == WHITE)
+		type = TREE_EDGE;
+	else if (clr == GREY)
+		type = BACK_EDGE;
+	else if (clr == BLACK)
+		/*
+		 * in fact it could be either direct or cross edge,
+		 * but for now, we don't need to distinguish between them.
+		 */
+		type = CROSS_EDGE;
+
+	node->edge_type[ce] = type;
+	bvf->edge_type[type]++;
+}
+
+static struct inst_node *
+get_prev_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	return  bvf->in + node->prev_node;
+}
+
+/*
+ * Depth-First Search (DFS) through previously constructed
+ * Control Flow Graph (CFG).
+ * Information collected along this path is used later
+ * to determine whether there are any loops and/or unreachable instructions.
+ */
+static void
+dfs(struct bpf_verifier *bvf)
+{
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	while (node != NULL) {
+
+		if (node->colour == WHITE)
+			set_node_colour(bvf, node, GREY);
+
+		if (node->colour == GREY) {
+
+			/* find next unprocessed child node */
+			do {
+				next = get_next_node(bvf, node);
+				if (next == NULL)
+					break;
+				set_edge_type(bvf, node, next);
+			} while (next->colour != WHITE);
+
+			if (next != NULL) {
+				/* proceed with next child */
+				next->prev_node = get_node_idx(bvf, node);
+				node = next;
+			} else {
+				/*
+				 * finished with current node and all its children,
+				 * proceed with parent
+				 */
+				set_node_colour(bvf, node, BLACK);
+				node->cur_edge = 0;
+				node = get_prev_node(bvf, node);
+			}
+		} else
+			node = NULL;
+	}
+}
+
+/*
+ * report unreachable instructions.
+ */
+static void
+log_unreachable(const struct bpf_verifier *bvf)
+{
+	uint32_t i;
+	struct inst_node *node;
 	const struct ebpf_insn *ins;
 
-	rc = 0;
-	stack_sz = 0;
-	for (i = 0; i != bpf->prm.nb_ins; i++) {
-
-		ins = bpf->prm.ins + i;
-		op = ins->code;
-		dr = ins->dst_reg;
-		ofs = ins->off;
-
-		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
-				dr == EBPF_REG_10) {
-			ofs -= sizeof(uint64_t);
-			stack_sz = RTE_MIN(ofs, stack_sz);
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		ins = bvf->prm->ins + i;
+
+		if (node->colour == WHITE &&
+				ins->code != (BPF_LD | BPF_IMM | EBPF_DW))
+			RTE_BPF_LOG(ERR, "unreachable code at pc: %u;\n", i);
+	}
+}
+
+/*
+ * report loops detected.
+ */
+static void
+log_loop(const struct bpf_verifier *bvf)
+{
+	uint32_t i, j;
+	struct inst_node *node;
+
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		if (node->colour != BLACK)
+			continue;
+
+		for (j = 0; j != node->nb_edge; j++) {
+			if (node->edge_type[j] == BACK_EDGE)
+				RTE_BPF_LOG(ERR,
+					"loop at pc:%u --> pc:%u;\n",
+					i, node->edge_dest[j]);
 		}
 	}
+}
+
+/*
+ * First pass goes through all instructions in the set, checks that each
+ * instruction is a valid one (correct syntax, valid field values, etc.)
+ * and constructs control flow graph (CFG).
+ * Then depth-first search is performed over the constructed graph.
+ * Programs with unreachable instructions and/or loops will be rejected.
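+ * For illustration (editor's note, not part of the original patch), the
+ * two-instruction program below would be rejected: instruction 0 is an
+ * unconditional jump to itself (off == -1, so its only CFG edge is 0 -> 0
+ * and gets classified as a BACK_EDGE by the DFS), while instruction 1
+ * stays WHITE, i.e. unreachable:
+ *   0: (BPF_JMP | BPF_JA), off = -1
+ *   1: (BPF_JMP | EBPF_EXIT)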
+ */
+static int
+validate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t i;
+	struct inst_node *node;
+	const struct ebpf_insn *ins;
+	const char *err;
 
-	if (stack_sz != 0) {
-		stack_sz = -stack_sz;
-		if (stack_sz > MAX_BPF_STACK_SIZE)
-			rc = -ERANGE;
-		else
-			bpf->stack_sz = stack_sz;
+	rc = 0;
+	for (i = 0; i < bvf->prm->nb_ins; i++) {
+
+		ins = bvf->prm->ins + i;
+		node = bvf->in + i;
+
+		err = check_syntax(ins);
+		if (err != 0) {
+			RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+				__func__, err, i);
+			rc |= -EINVAL;
+		}
+
+		/*
+		 * construct CFG: jcc nodes have two outgoing edges,
+		 * 'exit' nodes have none, all other nodes have exactly one
+		 * outgoing edge.
+		 */
+		switch (ins->code) {
+		case (BPF_JMP | EBPF_EXIT):
+			break;
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | EBPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | EBPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | EBPF_JLE | BPF_K):
+		case (BPF_JMP | EBPF_JSGT | BPF_K):
+		case (BPF_JMP | EBPF_JSLT | BPF_K):
+		case (BPF_JMP | EBPF_JSGE | BPF_K):
+		case (BPF_JMP | EBPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | EBPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | EBPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | EBPF_JLE | BPF_X):
+		case (BPF_JMP | EBPF_JSGT | BPF_X):
+		case (BPF_JMP | EBPF_JSLT | BPF_X):
+		case (BPF_JMP | EBPF_JSGE | BPF_X):
+		case (BPF_JMP | EBPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			rc |= add_edge(bvf, node, i + 1);
+			bvf->nb_jcc_nodes++;
+			break;
+		case (BPF_JMP | BPF_JA):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | EBPF_DW):
+			rc |= add_edge(bvf, node, i + 2);
+			i++;
+			break;
+		default:
+			rc |= add_edge(bvf, node, i + 1);
+			break;
+		}
+
+		bvf->nb_nodes++;
+		bvf->node_colour[WHITE]++;
 	}
 
 	if (rc != 0)
-		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
-			__func__, bpf, rc);
+		return rc;
+
+	dfs(bvf);
+
+	RTE_BPF_LOG(DEBUG, "%s(%p) stats:\n"
+		"nb_nodes=%u;\n"
+		"nb_jcc_nodes=%u;\n"
+		"node_color={[WHITE]=%u, [GREY]=%u,, [BLACK]=%u};\n"
+		"edge_type={[UNKNOWN]=%u, [TREE]=%u, [BACK]=%u, [CROSS]=%u};\n",
+		__func__, bvf,
+		bvf->nb_nodes,
+		bvf->nb_jcc_nodes,
+		bvf->node_colour[WHITE], bvf->node_colour[GREY],
+			bvf->node_colour[BLACK],
+		bvf->edge_type[UNKNOWN_EDGE], bvf->edge_type[TREE_EDGE],
+		bvf->edge_type[BACK_EDGE], bvf->edge_type[CROSS_EDGE]);
+
+	if (bvf->node_colour[BLACK] != bvf->nb_nodes) {
+		RTE_BPF_LOG(ERR, "%s(%p) unreachable instructions;\n",
+			__func__, bvf);
+		log_unreachable(bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->node_colour[GREY] != 0 || bvf->node_colour[WHITE] != 0 ||
+			bvf->edge_type[UNKNOWN_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) DFS internal error;\n",
+			__func__, bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->edge_type[BACK_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) loops detected;\n",
+			__func__, bvf);
+		log_loop(bvf);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper functions get/free eval states.
+ */
+static struct bpf_eval_state *
+pull_eval_state(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->evst_pool.cur;
+	if (n == bvf->evst_pool.num)
+		return NULL;
+
+	bvf->evst_pool.cur = n + 1;
+	return bvf->evst_pool.ent + n;
+}
+
+static void
+push_eval_state(struct bpf_verifier *bvf)
+{
+	bvf->evst_pool.cur--;
+}
+
+static void
+evst_pool_fini(struct bpf_verifier *bvf)
+{
+	bvf->evst = NULL;
+	free(bvf->evst_pool.ent);
+	memset(&bvf->evst_pool, 0, sizeof(bvf->evst_pool));
+}
+
+static int
+evst_pool_init(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->nb_jcc_nodes + 1;
+
+	bvf->evst_pool.ent = calloc(n, sizeof(bvf->evst_pool.ent[0]));
+	if (bvf->evst_pool.ent == NULL)
+		return -ENOMEM;
+
+	bvf->evst_pool.num = n;
+	bvf->evst_pool.cur = 0;
+
+	bvf->evst = pull_eval_state(bvf);
+	return 0;
+}
+
+/*
+ * Save current eval state.
+ */
+static int
+save_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	struct bpf_eval_state *st;
+
+	/* get new eval_state for this node */
+	st = pull_eval_state(bvf);
+	if (st == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s: internal error (out of space) at pc: %u",
+			__func__, get_node_idx(bvf, node));
+		return -ENOMEM;
+	}
+
+	/* make a copy of current state */
+	memcpy(st, bvf->evst, sizeof(*st));
+
+	/* swap current state with new one */
+	node->evst = bvf->evst;
+	bvf->evst = st;
+
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), node->evst, bvf->evst);
+
+	return 0;
+}
+
+/*
+ * Restore previous eval state and mark current eval state as free.
+ */
+static void
+restore_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), bvf->evst, node->evst);
+
+	bvf->evst = node->evst;
+	node->evst = NULL;
+	push_eval_state(bvf);
+}
+
+/*
+ * Do second pass through CFG and try to evaluate instructions
+ * via each possible path.
+ * Right now evaluation functionality is quite limited.
+ * Still need to add extra checks for:
+ * - use/return uninitialized registers.
+ * - use uninitialized data from the stack.
+ * - memory boundaries violation.
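+ *
+ * Editor's note (not part of the original patch): at every conditional jump
+ * the walk below saves a snapshot of the current evaluation state
+ * (save_eval_state), so that once one branch has been explored the other
+ * branch can be evaluated starting from the same pre-branch state
+ * (restore_eval_state).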
+ */
+static int
+evaluate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t idx, op;
+	const char *err;
+	const struct ebpf_insn *ins;
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	ins = bvf->prm->ins;
+	rc = 0;
+
+	while (node != NULL && rc == 0) {
+
+		/* current node evaluation */
+		idx = get_node_idx(bvf, node);
+		op = ins[idx].code;
+
+		if (ins_chk[op].eval != NULL) {
+			err = ins_chk[op].eval(bvf, ins + idx);
+			if (err != NULL) {
+				RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+					__func__, err, idx);
+				rc = -EINVAL;
+			}
+		}
+
+		/* proceed through CFG */
+		next = get_next_node(bvf, node);
+		if (next != NULL) {
+
+			/* proceed with next child */
+			if (node->cur_edge != node->nb_edge)
+				rc |= save_eval_state(bvf, node);
+			else if (node->evst != NULL)
+				restore_eval_state(bvf, node);
+
+			next->prev_node = get_node_idx(bvf, node);
+			node = next;
+		} else {
+			/*
+			 * finished with current node and all its children,
+			 * proceed with parent
+			 */
+			node->cur_edge = 0;
+			node = get_prev_node(bvf, node);
+
+			/* finished */
+			if (node == bvf->in)
+				node = NULL;
+		}
+	}
+
+	return rc;
+}
+
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	struct bpf_verifier bvf;
+
+	/* check input argument type, don't allow mbuf ptr on 32-bit */
+	if (bpf->prm.prog_arg.type != RTE_BPF_ARG_RAW &&
+			bpf->prm.prog_arg.type != RTE_BPF_ARG_PTR &&
+			(sizeof(uint64_t) != sizeof(uintptr_t) ||
+			bpf->prm.prog_arg.type != RTE_BPF_ARG_PTR_MBUF)) {
+		RTE_BPF_LOG(ERR, "%s: unsupported argument type\n", __func__);
+		return -ENOTSUP;
+	}
+
+	memset(&bvf, 0, sizeof(bvf));
+	bvf.prm = &bpf->prm;
+	bvf.in = calloc(bpf->prm.nb_ins, sizeof(bvf.in[0]));
+	if (bvf.in == NULL)
+		return -ENOMEM;
+
+	rc = validate(&bvf);
+
+	if (rc == 0) {
+		rc = evst_pool_init(&bvf);
+		if (rc == 0)
+			rc = evaluate(&bvf);
+		evst_pool_fini(&bvf);
+	}
+
+	free(bvf.in);
+
+	/* copy collected info */
+	if (rc == 0)
+		bpf->stack_sz = bvf.stack_sz;
+
 	return rc;
 }
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 3/8] bpf: add JIT compilation for x86_64 ISA
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                         ` (2 preceding siblings ...)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 2/8] bpf: add more logic into bpf_validate() Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 4/8] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
                         ` (4 subsequent siblings)
  8 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile      |    3 +
 lib/librte_bpf/bpf.c         |    5 +
 lib/librte_bpf/bpf_jit_x86.c | 1369 ++++++++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build   |    4 +
 4 files changed, 1381 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 9b714389a..7901a0e78 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -23,6 +23,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
index d7f68c017..dc6d10991 100644
--- a/lib/librte_bpf/bpf.c
+++ b/lib/librte_bpf/bpf.c
@@ -41,7 +41,12 @@ bpf_jit(struct rte_bpf *bpf)
 {
 	int32_t rc;
 
+#ifdef RTE_ARCH_X86_64
+	rc = bpf_jit_x86(bpf);
+#else
 	rc = -ENOTSUP;
+#endif
+
 	if (rc != 0)
 		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
 			__func__, bpf, rc);
diff --git a/lib/librte_bpf/bpf_jit_x86.c b/lib/librte_bpf/bpf_jit_x86.c
new file mode 100644
index 000000000..111e028d2
--- /dev/null
+++ b/lib/librte_bpf/bpf_jit_x86.c
@@ -0,0 +1,1369 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
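+/*
+ * Editor's note: BPF_OP() extracts the operation field (the high nibble of
+ * the opcode byte), so shifting it right by four yields a small 0..15 index
+ * used to address the per-operation opcode/ModRM lookup tables below.
+ */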
+#define GET_BPF_OP(op)	(BPF_OP(op) >> 4)
+
+enum {
+	RAX = 0,  /* scratch, return value */
+	RCX = 1,  /* scratch, 4th arg */
+	RDX = 2,  /* scratch, 3rd arg */
+	RBX = 3,  /* callee saved */
+	RSP = 4,  /* stack pointer */
+	RBP = 5,  /* frame pointer, callee saved */
+	RSI = 6,  /* scratch, 2nd arg */
+	RDI = 7,  /* scratch, 1st arg */
+	R8  = 8,  /* scratch, 5th arg */
+	R9  = 9,  /* scratch, 6th arg */
+	R10 = 10, /* scratch */
+	R11 = 11, /* scratch */
+	R12 = 12, /* callee saved */
+	R13 = 13, /* callee saved */
+	R14 = 14, /* callee saved */
+	R15 = 15, /* callee saved */
+};
+
+#define IS_EXT_REG(r)	((r) >= R8)
+
+enum {
+	REX_PREFIX = 0x40, /* fixed value 0100 */
+	REX_W = 0x8,       /* 64bit operand size */
+	REX_R = 0x4,       /* extension of the ModRM.reg field */
+	REX_X = 0x2,       /* extension of the SIB.index field */
+	REX_B = 0x1,       /* extension of the ModRM.rm field */
+};
+
+enum {
+	MOD_INDIRECT = 0,
+	MOD_IDISP8 = 1,
+	MOD_IDISP32 = 2,
+	MOD_DIRECT = 3,
+};
+
+enum {
+	SIB_SCALE_1 = 0,
+	SIB_SCALE_2 = 1,
+	SIB_SCALE_4 = 2,
+	SIB_SCALE_8 = 3,
+};
+
+/*
+ * eBPF to x86_64 register mappings.
+ */
+static const uint32_t ebpf2x86[] = {
+	[EBPF_REG_0] = RAX,
+	[EBPF_REG_1] = RDI,
+	[EBPF_REG_2] = RSI,
+	[EBPF_REG_3] = RDX,
+	[EBPF_REG_4] = RCX,
+	[EBPF_REG_5] = R8,
+	[EBPF_REG_6] = RBX,
+	[EBPF_REG_7] = R13,
+	[EBPF_REG_8] = R14,
+	[EBPF_REG_9] = R15,
+	[EBPF_REG_10] = RBP,
+};
+
+/*
+ * r9, r10 and r11 are used as scratch/temporary registers.
+ */
+enum {
+	REG_DIV_IMM = R9,
+	REG_TMP0 = R11,
+	REG_TMP1 = R10,
+};
+
+/*
+ * callee saved registers list.
+ * keep RBP as the last one.
+ */
+static const uint32_t save_regs[] = {RBX, R12, R13, R14, R15, RBP};
+
+struct bpf_jit_state {
+	uint32_t idx;
+	size_t sz;
+	struct {
+		uint32_t num;
+		int32_t off;
+	} exit;
+	uint32_t reguse;
+	int32_t *off;
+	uint8_t *ins;
+};
+
+#define	INUSE(v, r)	(((v) >> (r)) & 1)
+#define	USED(v, r)	((v) |= 1 << (r))
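+
+/*
+ * Editor's note: st->reguse is a bitmask of the x86 registers referenced by
+ * the generated code (maintained via USED()/INUSE() above); emit_prolog()
+ * and emit_epilog() consult it so that only the callee-saved registers that
+ * are actually used get spilled and restored.  st->off[] records, for every
+ * BPF instruction, the offset of its translation within the native code,
+ * while st->ins stays NULL during the sizing passes, so emit_bytes() then
+ * only counts bytes.
+ */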
+
+union bpf_jit_imm {
+	uint32_t u32;
+	uint8_t u8[4];
+};
+
+static size_t
+bpf_size(uint32_t bpf_op_sz)
+{
+	if (bpf_op_sz == BPF_B)
+		return sizeof(uint8_t);
+	else if (bpf_op_sz == BPF_H)
+		return sizeof(uint16_t);
+	else if (bpf_op_sz == BPF_W)
+		return sizeof(uint32_t);
+	else if (bpf_op_sz == EBPF_DW)
+		return sizeof(uint64_t);
+	return 0;
+}
+
+/*
+ * In many cases for imm8 we can produce shorter code.
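+ * (Editor's example: imm_size(-5) is 1, since -5 fits into an int8_t,
+ * while imm_size(300) is 4.)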
+ */
+static size_t
+imm_size(int32_t v)
+{
+	if (v == (int8_t)v)
+		return sizeof(int8_t);
+	return sizeof(int32_t);
+}
+
+static void
+emit_bytes(struct bpf_jit_state *st, const uint8_t ins[], uint32_t sz)
+{
+	uint32_t i;
+
+	if (st->ins != NULL) {
+		for (i = 0; i != sz; i++)
+			st->ins[st->sz + i] = ins[i];
+	}
+	st->sz += sz;
+}
+
+static void
+emit_imm(struct bpf_jit_state *st, const uint32_t imm, uint32_t sz)
+{
+	union bpf_jit_imm v;
+
+	v.u32 = imm;
+	emit_bytes(st, v.u8, sz);
+}
+
+/*
+ * emit REX byte
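+ *
+ * Editor's example (not part of the original patch): for a 64-bit
+ * 'add %r13, %rbx' (EBPF_ALU64 | BPF_ADD | BPF_X with sreg = R13 and
+ * dreg = RBX) this sets REX_W and REX_R, so the emitted byte is
+ * 0x4c (REX_PREFIX | REX_W | REX_R), placed in front of the 0x01 opcode
+ * produced by emit_alu_reg().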
+ */
+static void
+emit_rex(struct bpf_jit_state *st, uint32_t op, uint32_t reg, uint32_t rm)
+{
+	uint8_t rex;
+
+	/* mark operand registers as used */
+	USED(st->reguse, reg);
+	USED(st->reguse, rm);
+
+	rex = 0;
+	if (BPF_CLASS(op) == EBPF_ALU64 ||
+			op == (BPF_ST | BPF_MEM | EBPF_DW) ||
+			op == (BPF_STX | BPF_MEM | EBPF_DW) ||
+			op == (BPF_STX | EBPF_XADD | EBPF_DW) ||
+			op == (BPF_LD | BPF_IMM | EBPF_DW) ||
+			(BPF_CLASS(op) == BPF_LDX &&
+			BPF_MODE(op) == BPF_MEM &&
+			BPF_SIZE(op) != BPF_W))
+		rex |= REX_W;
+
+	if (IS_EXT_REG(reg))
+		rex |= REX_R;
+
+	if (IS_EXT_REG(rm))
+		rex |= REX_B;
+
+	/* store using SIL, DIL */
+	if (op == (BPF_STX | BPF_MEM | BPF_B) && (reg == RDI || reg == RSI))
+		rex |= REX_PREFIX;
+
+	if (rex != 0) {
+		rex |= REX_PREFIX;
+		emit_bytes(st, &rex, sizeof(rex));
+	}
+}
+
+/*
+ * emit MODRegRM byte
+ */
+static void
+emit_modregrm(struct bpf_jit_state *st, uint32_t mod, uint32_t reg, uint32_t rm)
+{
+	uint8_t v;
+
+	v = mod << 6 | (reg & 7) << 3 | (rm & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit SIB byte
+ */
+static void
+emit_sib(struct bpf_jit_state *st, uint32_t scale, uint32_t idx, uint32_t base)
+{
+	uint8_t v;
+
+	v = scale << 6 | (idx & 7) << 3 | (base & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit xchg %<sreg>, %<dreg>
+ */
+static void
+emit_xchg_reg(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	const uint8_t ops = 0x87;
+
+	emit_rex(st, EBPF_ALU64, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit neg %<dreg>
+ */
+static void
+emit_neg(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 3;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+/*
+ * emit mov %<sreg>, %<dreg>
+ */
+static void
+emit_mov_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x89;
+
+	/* if operands are 32-bit, then it can be used to clear upper 32-bit */
+	if (sreg != dreg || BPF_CLASS(op) == BPF_ALU) {
+		emit_rex(st, op, sreg, dreg);
+		emit_bytes(st, &ops, sizeof(ops));
+		emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+	}
+}
+
+/*
+ * emit movzwl %<sreg>, %<dreg>
+ */
+static void
+emit_movzwl(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	static const uint8_t ops[] = {0x0F, 0xB7};
+
+	emit_rex(st, BPF_ALU, sreg, dreg);
+	emit_bytes(st, ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit ror <imm8>, %<dreg>
+ */
+static void
+emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t prfx = 0x66;
+	const uint8_t ops = 0xC1;
+	const uint8_t mods = 1;
+
+	emit_bytes(st, &prfx, sizeof(prfx));
+	emit_rex(st, BPF_ALU, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit bswap %<dreg>
+ */
+static void
+emit_be2le_48(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	uint32_t rop;
+
+	const uint8_t ops = 0x0F;
+	const uint8_t mods = 1;
+
+	rop = (imm == 64) ? EBPF_ALU64 : BPF_ALU;
+	emit_rex(st, rop, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+static void
+emit_be2le(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16) {
+		emit_ror_imm(st, dreg, 8);
+		emit_movzwl(st, dreg, dreg);
+	} else
+		emit_be2le_48(st, dreg, imm);
+}
+
+/*
+ * In general it is a NOP for x86.
+ * Just clear the upper bits.
+ */
+static void
+emit_le2be(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16)
+		emit_movzwl(st, dreg, dreg);
+	else if (imm == 32)
+		emit_mov_reg(st, BPF_ALU | EBPF_MOV | BPF_X, dreg, dreg);
+}
+
+/*
+ * emit one of:
+ *   add <imm>, %<dreg>
+ *   and <imm>, %<dreg>
+ *   or  <imm>, %<dreg>
+ *   sub <imm>, %<dreg>
+ *   xor <imm>, %<dreg>
+ */
+static void
+emit_alu_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t mod, opcode;
+	uint32_t bop, imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0,
+		[GET_BPF_OP(BPF_AND)] = 4,
+		[GET_BPF_OP(BPF_OR)] =  1,
+		[GET_BPF_OP(BPF_SUB)] = 5,
+		[GET_BPF_OP(BPF_XOR)] = 6,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+
+	imsz = imm_size(imm);
+	opcode = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &opcode, sizeof(opcode));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit one of:
+ *   add %<sreg>, %<dreg>
+ *   and %<sreg>, %<dreg>
+ *   or  %<sreg>, %<dreg>
+ *   sub %<sreg>, %<dreg>
+ *   xor %<sreg>, %<dreg>
+ */
+static void
+emit_alu_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0x01,
+		[GET_BPF_OP(BPF_AND)] = 0x21,
+		[GET_BPF_OP(BPF_OR)] =  0x09,
+		[GET_BPF_OP(BPF_SUB)] = 0x29,
+		[GET_BPF_OP(BPF_XOR)] = 0x31,
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_shift(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	uint8_t mod;
+	uint32_t bop, opx;
+
+	static const uint8_t ops[] = {0xC1, 0xD3};
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_LSH)] = 4,
+		[GET_BPF_OP(BPF_RSH)] = 5,
+		[GET_BPF_OP(EBPF_ARSH)] = 7,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+	opx = (BPF_SRC(op) == BPF_X);
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+}
+
+/*
+ * emit one of:
+ *   shl <imm>, %<dreg>
+ *   shr <imm>, %<dreg>
+ *   sar <imm>, %<dreg>
+ */
+static void
+emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm)
+{
+	emit_shift(st, op, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit one of:
+ *   shl %<dreg>
+ *   shr %<dreg>
+ *   sar %<dreg>
+ * note that rcx is implicitly used as a source register, so a few extra
+ * instructions for register spillage might be necessary.
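+ *
+ * Editor's example: with sreg = RSI and dreg = RBX the emitted sequence is
+ * 'xchg %rcx,%rsi; shl %cl,%rbx; xchg %rcx,%rsi', so the shift count
+ * originally held in %rsi sits in %cl for the shift itself.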
+ */
+static void
+emit_shift_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+
+	emit_shift(st, op, (dreg == RCX) ? sreg : dreg);
+
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+}
+
+/*
+ * emit mov <imm>, %<dreg>
+ */
+static void
+emit_mov_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xC7;
+
+	if (imm == 0) {
+		/* replace 'mov 0, %<dst>' with 'xor %<dst>, %<dst>' */
+		op = BPF_CLASS(op) | BPF_XOR | BPF_X;
+		emit_alu_reg(st, op, dreg, dreg);
+		return;
+	}
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+	emit_imm(st, imm, sizeof(imm));
+}
+
+/*
+ * emit mov <imm64>, %<dreg>
+ */
+static void
+emit_ld_imm64(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm0,
+	uint32_t imm1)
+{
+	const uint8_t ops = 0xB8;
+
+	if (imm1 == 0) {
+		emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, dreg, imm0);
+		return;
+	}
+
+	emit_rex(st, EBPF_ALU64, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+
+	emit_imm(st, imm0, sizeof(imm0));
+	emit_imm(st, imm1, sizeof(imm1));
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * either:
+ *   mov %<sreg>, %rdx
+ * OR
+ *   mov <imm>, %rdx
+ * mul %rdx
+ * mov %r10, %rdx
+ * mov %rax, %<dreg>
+ * mov %r11, %rax
+ */
+static void
+emit_mul(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 4;
+
+	/* save rax & rdx */
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, REG_TMP0);
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* rax = dreg */
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, dreg, RAX);
+
+	if (BPF_SRC(op) == BPF_X)
+		/* rdx = sreg */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X,
+			sreg == RAX ? REG_TMP0 : sreg, RDX);
+	else
+		/* rdx = imm */
+		emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, RDX, imm);
+
+	emit_rex(st, op, RAX, RDX);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RDX);
+
+	if (dreg != RDX)
+		/* restore rdx */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP1, RDX);
+
+	if (dreg != RAX) {
+		/* dreg = rax */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, dreg);
+		/* restore rax */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP0, RAX);
+	}
+}
+
+/*
+ * emit mov <ofs>(%<sreg>), %<dreg>
+ * note that for non 64-bit ops, higher bits have to be cleared.
+ */
+static void
+emit_ld_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	uint32_t mods, opsz;
+	const uint8_t op32 = 0x8B;
+	const uint8_t op16[] = {0x0F, 0xB7};
+	const uint8_t op8[] = {0x0F, 0xB6};
+
+	emit_rex(st, op, dreg, sreg);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_B)
+		emit_bytes(st, op8, sizeof(op8));
+	else if (opsz == BPF_H)
+		emit_bytes(st, op16, sizeof(op16));
+	else
+		emit_bytes(st, &op32, sizeof(op32));
+
+	mods = (imm_size(ofs) == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, dreg, sreg);
+	if (sreg == RSP || sreg == R12)
+		emit_sib(st, SIB_SCALE_1, sreg, sreg);
+	emit_imm(st, ofs, imm_size(ofs));
+}
+
+/*
+ * emit one of:
+ *   mov %<sreg>, <ofs>(%<dreg>)
+ *   mov <imm>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_common(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, uint32_t imm, int32_t ofs)
+{
+	uint32_t mods, imsz, opsz, opx;
+	const uint8_t prfx16 = 0x66;
+
+	/* 8 bit instruction opcodes */
+	static const uint8_t op8[] = {0xC6, 0x88};
+
+	/* 16/32/64 bit instruction opcodes */
+	static const uint8_t ops[] = {0xC7, 0x89};
+
+	/* does the instruction have an immediate value or a src reg? */
+	opx = (BPF_CLASS(op) == BPF_STX);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_H)
+		emit_bytes(st, &prfx16, sizeof(prfx16));
+
+	emit_rex(st, op, sreg, dreg);
+
+	if (opsz == BPF_B)
+		emit_bytes(st, &op8[opx], sizeof(op8[opx]));
+	else
+		emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, sreg, dreg);
+
+	if (dreg == RSP || dreg == R12)
+		emit_sib(st, SIB_SCALE_1, dreg, dreg);
+
+	emit_imm(st, ofs, imsz);
+
+	if (opx == 0) {
+		imsz = RTE_MIN(bpf_size(opsz), sizeof(imm));
+		emit_imm(st, imm, imsz);
+	}
+}
+
+static void
+emit_st_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm,
+	int32_t ofs)
+{
+	emit_st_common(st, op, 0, dreg, imm, ofs);
+}
+
+static void
+emit_st_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	emit_st_common(st, op, sreg, dreg, 0, ofs);
+}
+
+/*
+ * emit lock add %<sreg>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_xadd(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	uint32_t imsz, mods;
+
+	const uint8_t lck = 0xF0; /* lock prefix */
+	const uint8_t ops = 0x01; /* add opcode */
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_bytes(st, &lck, sizeof(lck));
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, mods, sreg, dreg);
+	emit_imm(st, ofs, imsz);
+}
+
+/*
+ * emit:
+ *    mov <imm64>, %rax
+ *    call *%rax
+ */
+static void
+emit_call(struct bpf_jit_state *st, uintptr_t trg)
+{
+	const uint8_t ops = 0xFF;
+	const uint8_t mods = 2;
+
+	emit_ld_imm64(st, RAX, trg, trg >> 32);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RAX);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	int32_t joff;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0xEB;
+	const uint8_t op32 = 0xE9;
+
+	const int32_t sz8 = sizeof(op8) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32) + sizeof(uint32_t);
+
+	/* max possible jmp instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8, sizeof(op8));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, &op32, sizeof(op32));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	emit_abs_jmp(st, st->off[st->idx + ofs]);
+}
+
+/*
+ * emit one of:
+ *    cmovz %<sreg>, <%dreg>
+ *    cmovne %<sreg>, <%dreg>
+ *    cmova %<sreg>, <%dreg>
+ *    cmovb %<sreg>, <%dreg>
+ *    cmovae %<sreg>, <%dreg>
+ *    cmovbe %<sreg>, <%dreg>
+ *    cmovg %<sreg>, <%dreg>
+ *    cmovl %<sreg>, <%dreg>
+ *    cmovge %<sreg>, <%dreg>
+ *    cmovle %<sreg>, <%dreg>
+ */
+static void
+emit_movcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x44},  /* CMOVZ */
+		[GET_BPF_OP(EBPF_JNE)] = {0x0F, 0x45},  /* CMOVNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x47},  /* CMOVA */
+		[GET_BPF_OP(EBPF_JLT)] = {0x0F, 0x42},  /* CMOVB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x43},  /* CMOVAE */
+		[GET_BPF_OP(EBPF_JLE)] = {0x0F, 0x46},  /* CMOVBE */
+		[GET_BPF_OP(EBPF_JSGT)] = {0x0F, 0x4F}, /* CMOVG */
+		[GET_BPF_OP(EBPF_JSLT)] = {0x0F, 0x4C}, /* CMOVL */
+		[GET_BPF_OP(EBPF_JSGE)] = {0x0F, 0x4D}, /* CMOVGE */
+		[GET_BPF_OP(EBPF_JSLE)] = {0x0F, 0x4E}, /* CMOVLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x45}, /* CMOVNE */
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, dreg, sreg);
+	emit_bytes(st, ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, dreg, sreg);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	uint32_t bop, imsz;
+	int32_t joff;
+
+	static const uint8_t op8[] = {
+		[GET_BPF_OP(BPF_JEQ)] = 0x74,  /* JE */
+		[GET_BPF_OP(EBPF_JNE)] = 0x75,  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = 0x77,  /* JA */
+		[GET_BPF_OP(EBPF_JLT)] = 0x72,  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = 0x73,  /* JAE */
+		[GET_BPF_OP(EBPF_JLE)] = 0x76,  /* JBE */
+		[GET_BPF_OP(EBPF_JSGT)] = 0x7F, /* JG */
+		[GET_BPF_OP(EBPF_JSLT)] = 0x7C, /* JL */
+		[GET_BPF_OP(EBPF_JSGE)] = 0x7D, /* JGE */
+		[GET_BPF_OP(EBPF_JSLE)] = 0x7E, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = 0x75, /* JNE */
+	};
+
+	static const uint8_t op32[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x84},  /* JE */
+		[GET_BPF_OP(EBPF_JNE)] = {0x0F, 0x85},  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x87},  /* JA */
+		[GET_BPF_OP(EBPF_JLT)] = {0x0F, 0x82},  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x83},  /* JAE */
+		[GET_BPF_OP(EBPF_JLE)] = {0x0F, 0x86},  /* JBE */
+		[GET_BPF_OP(EBPF_JSGT)] = {0x0F, 0x8F}, /* JG */
+		[GET_BPF_OP(EBPF_JSLT)] = {0x0F, 0x8C}, /* JL */
+		[GET_BPF_OP(EBPF_JSGE)] = {0x0F, 0x8D}, /* JGE */
+		[GET_BPF_OP(EBPF_JSLE)] = {0x0F, 0x8E}, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x85}, /* JNE */
+	};
+
+	const int32_t sz8 = sizeof(op8[0]) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32[0]) + sizeof(uint32_t);
+
+	/* max possible jcc instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	bop = GET_BPF_OP(op);
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8[bop], sizeof(op8[bop]));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, op32[bop], sizeof(op32[bop]));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	emit_abs_jcc(st, op, st->off[st->idx + ofs]);
+}
+
+/*
+ * emit cmp <imm>, %<dreg>
+ */
+static void
+emit_cmp_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t ops;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	const uint8_t mods = 7;
+
+	imsz = imm_size(imm);
+	ops = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit test <imm>, %<dreg>
+ */
+static void
+emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 0;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+static void
+emit_jcc_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_imm(st, EBPF_ALU64, dreg, imm);
+	else
+		emit_cmp_imm(st, EBPF_ALU64, dreg, imm);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * emit test %<sreg>, %<dreg>
+ */
+static void
+emit_tst_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x85;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit cmp %<sreg>, %<dreg>
+ */
+static void
+emit_cmp_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x39;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+
+}
+
+static void
+emit_jcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_reg(st, EBPF_ALU64, sreg, dreg);
+	else
+		emit_cmp_reg(st, EBPF_ALU64, sreg, dreg);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some reg spillage is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * xor %rdx, %rdx
+ * for divisor as immediate value:
+ *   mov <imm>, %r9
+ * div %<divisor_reg>
+ * either:
+ *   mov %rax, %<dreg>
+ * OR
+ *   mov %rdx, %<dreg>
+ * mov %r11, %rax
+ * mov %r10, %rdx
+ */
+static void
+emit_div(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	uint32_t sr;
+
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 6;
+
+	if (BPF_SRC(op) == BPF_X) {
+
+		/* check that src divisor is not zero */
+		emit_tst_reg(st, BPF_CLASS(op), sreg, sreg);
+
+		/* exit with return value zero */
+		emit_movcc_reg(st, BPF_CLASS(op) | BPF_JEQ | BPF_X, sreg, RAX);
+		emit_abs_jcc(st, BPF_JMP | BPF_JEQ | BPF_K, st->exit.off);
+	}
+
+	/* save rax & rdx */
+	if (dreg != RAX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, REG_TMP0);
+	if (dreg != RDX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* fill rax & rdx */
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, dreg, RAX);
+	emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, RDX, 0);
+
+	if (BPF_SRC(op) == BPF_X) {
+		sr = sreg;
+		if (sr == RAX)
+			sr = REG_TMP0;
+		else if (sr == RDX)
+			sr = REG_TMP1;
+	} else {
+		sr = REG_DIV_IMM;
+		emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, sr, imm);
+	}
+
+	emit_rex(st, op, 0, sr);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, sr);
+
+	if (BPF_OP(op) == BPF_DIV)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, dreg);
+	else
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RDX, dreg);
+
+	if (dreg != RAX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP0, RAX);
+	if (dreg != RDX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP1, RDX);
+}
+
+static void
+emit_prolog(struct bpf_jit_state *st, int32_t stack_size)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	/* we can avoid touching the stack at all */
+	if (spil == 0)
+		return;
+
+	emit_alu_imm(st, EBPF_ALU64 | BPF_SUB | BPF_K, RSP,
+		spil * sizeof(uint64_t));
+
+	ofs = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++) {
+		if (INUSE(st->reguse, save_regs[i]) != 0) {
+			emit_st_reg(st, BPF_STX | BPF_MEM | EBPF_DW,
+				save_regs[i], RSP, ofs);
+			ofs += sizeof(uint64_t);
+		}
+	}
+
+	if (INUSE(st->reguse, RBP) != 0) {
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RSP, RBP);
+		emit_alu_imm(st, EBPF_ALU64 | BPF_SUB | BPF_K, RSP, stack_size);
+	}
+}
+
+/*
+ * emit ret
+ */
+static void
+emit_ret(struct bpf_jit_state *st)
+{
+	const uint8_t ops = 0xC3;
+
+	emit_bytes(st, &ops, sizeof(ops));
+}
+
+static void
+emit_epilog(struct bpf_jit_state *st)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	/* if we already have an epilog, generate a jump to it */
+	if (st->exit.num++ != 0) {
+		emit_abs_jmp(st, st->exit.off);
+		return;
+	}
+
+	/* store offset of epilog block */
+	st->exit.off = st->sz;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	if (spil != 0) {
+
+		if (INUSE(st->reguse, RBP) != 0)
+			emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X,
+				RBP, RSP);
+
+		ofs = 0;
+		for (i = 0; i != RTE_DIM(save_regs); i++) {
+			if (INUSE(st->reguse, save_regs[i]) != 0) {
+				emit_ld_reg(st, BPF_LDX | BPF_MEM | EBPF_DW,
+					RSP, save_regs[i], ofs);
+				ofs += sizeof(uint64_t);
+			}
+		}
+
+		emit_alu_imm(st, EBPF_ALU64 | BPF_ADD | BPF_K, RSP,
+			spil * sizeof(uint64_t));
+	}
+
+	emit_ret(st);
+}
+
+/*
+ * walk through the BPF code and translate it into x86_64 instructions.
+ */
+static int
+emit(struct bpf_jit_state *st, const struct rte_bpf *bpf)
+{
+	uint32_t i, dr, op, sr;
+	const struct ebpf_insn *ins;
+
+	/* reset state fields */
+	st->sz = 0;
+	st->exit.num = 0;
+
+	emit_prolog(st, bpf->stack_sz);
+
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		st->idx = i;
+		st->off[i] = st->sz;
+
+		ins = bpf->prm.ins + i;
+
+		dr = ebpf2x86[ins->dst_reg];
+		sr = ebpf2x86[ins->src_reg];
+		op = ins->code;
+
+		switch (op) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+		case (BPF_ALU | BPF_SUB | BPF_K):
+		case (BPF_ALU | BPF_AND | BPF_K):
+		case (BPF_ALU | BPF_OR | BPF_K):
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+		case (BPF_ALU | BPF_SUB | BPF_X):
+		case (BPF_ALU | BPF_AND | BPF_X):
+		case (BPF_ALU | BPF_OR | BPF_X):
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_BE):
+			emit_be2le(st, dr, ins->imm);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_LE):
+			emit_le2be(st, dr, ins->imm);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_K):
+		case (EBPF_ALU64 | BPF_SUB | BPF_K):
+		case (EBPF_ALU64 | BPF_AND | BPF_K):
+		case (EBPF_ALU64 | BPF_OR | BPF_K):
+		case (EBPF_ALU64 | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_K):
+		case (EBPF_ALU64 | BPF_RSH | BPF_K):
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 64 bit ALU REG operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_X):
+		case (EBPF_ALU64 | BPF_SUB | BPF_X):
+		case (EBPF_ALU64 | BPF_AND | BPF_X):
+		case (EBPF_ALU64 | BPF_OR | BPF_X):
+		case (EBPF_ALU64 | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_X):
+		case (EBPF_ALU64 | BPF_RSH | BPF_X):
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (EBPF_ALU64 | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		/* multiply instructions */
+		case (BPF_ALU | BPF_MUL | BPF_K):
+		case (BPF_ALU | BPF_MUL | BPF_X):
+		case (EBPF_ALU64 | BPF_MUL | BPF_K):
+		case (EBPF_ALU64 | BPF_MUL | BPF_X):
+			emit_mul(st, op, sr, dr, ins->imm);
+			break;
+		/* divide instructions */
+		case (BPF_ALU | BPF_DIV | BPF_K):
+		case (BPF_ALU | BPF_MOD | BPF_K):
+		case (BPF_ALU | BPF_DIV | BPF_X):
+		case (BPF_ALU | BPF_MOD | BPF_X):
+		case (EBPF_ALU64 | BPF_DIV | BPF_K):
+		case (EBPF_ALU64 | BPF_MOD | BPF_K):
+		case (EBPF_ALU64 | BPF_DIV | BPF_X):
+		case (EBPF_ALU64 | BPF_MOD | BPF_X):
+			emit_div(st, op, sr, dr, ins->imm);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+		case (BPF_LDX | BPF_MEM | BPF_H):
+		case (BPF_LDX | BPF_MEM | BPF_W):
+		case (BPF_LDX | BPF_MEM | EBPF_DW):
+			emit_ld_reg(st, op, sr, dr, ins->off);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | EBPF_DW):
+			emit_ld_imm64(st, dr, ins[0].imm, ins[1].imm);
+			i++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+		case (BPF_STX | BPF_MEM | BPF_H):
+		case (BPF_STX | BPF_MEM | BPF_W):
+		case (BPF_STX | BPF_MEM | EBPF_DW):
+			emit_st_reg(st, op, sr, dr, ins->off);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+		case (BPF_ST | BPF_MEM | BPF_H):
+		case (BPF_ST | BPF_MEM | BPF_W):
+		case (BPF_ST | BPF_MEM | EBPF_DW):
+			emit_st_imm(st, op, dr, ins->imm, ins->off);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | EBPF_XADD | BPF_W):
+		case (BPF_STX | EBPF_XADD | EBPF_DW):
+			emit_st_xadd(st, op, sr, dr, ins->off);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			emit_jmp(st, ins->off + 1);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | EBPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | EBPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | EBPF_JLE | BPF_K):
+		case (BPF_JMP | EBPF_JSGT | BPF_K):
+		case (BPF_JMP | EBPF_JSLT | BPF_K):
+		case (BPF_JMP | EBPF_JSGE | BPF_K):
+		case (BPF_JMP | EBPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			emit_jcc_imm(st, op, dr, ins->imm, ins->off + 1);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | EBPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | EBPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | EBPF_JLE | BPF_X):
+		case (BPF_JMP | EBPF_JSGT | BPF_X):
+		case (BPF_JMP | EBPF_JSLT | BPF_X):
+		case (BPF_JMP | EBPF_JSGE | BPF_X):
+		case (BPF_JMP | EBPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			emit_jcc_reg(st, op, sr, dr, ins->off + 1);
+			break;
+		/* call instructions */
+		case (BPF_JMP | EBPF_CALL):
+			emit_call(st, (uintptr_t)bpf->prm.xsym[ins->imm].func);
+			break;
+		/* return instruction */
+		case (BPF_JMP | EBPF_EXIT):
+			emit_epilog(st);
+			break;
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %u;\n",
+				__func__, bpf, ins->code, i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * produce a native ISA version of the given BPF code.
+ */
+int
+bpf_jit_x86(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	uint32_t i;
+	size_t sz;
+	struct bpf_jit_state st;
+
+	/* init state */
+	memset(&st, 0, sizeof(st));
+	st.off = malloc(bpf->prm.nb_ins * sizeof(st.off[0]));
+	if (st.off == NULL)
+		return -ENOMEM;
+
+	/* fill with fake offsets */
+	st.exit.off = INT32_MAX;
+	for (i = 0; i != bpf->prm.nb_ins; i++)
+		st.off[i] = INT32_MAX;
+
+	/*
+	 * dry runs, used to calculate total code size and valid jump offsets.
+	 * jump displacement sizes depend on offsets that are not final yet,
+	 * so repeat until the emitted code size stops changing, i.e. reaches
+	 * the minimal possible size.
+	 */
+	do {
+		sz = st.sz;
+		rc = emit(&st, bpf);
+	} while (rc == 0 && sz != st.sz);
+
+	if (rc == 0) {
+
+		/* allocate memory needed */
+		st.ins = mmap(NULL, st.sz, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (st.ins == MAP_FAILED)
+			rc = -ENOMEM;
+		else
+			/* generate code */
+			rc = emit(&st, bpf);
+	}
+
+	if (rc == 0 && mprotect(st.ins, st.sz, PROT_READ | PROT_EXEC) != 0)
+		rc = -ENOMEM;
+
+	if (rc != 0)
+		munmap(st.ins, st.sz);
+	else {
+		bpf->jit.func = (void *)st.ins;
+		bpf->jit.sz = st.sz;
+	}
+
+	free(st.off);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index e7c8d3398..59d84cb2b 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -7,6 +7,10 @@ sources = files('bpf.c',
 		'bpf_load.c',
 		'bpf_validate.c')
 
+if arch_subdir == 'x86'
+	sources += files('bpf_jit_x86.c')
+endif
+
 install_headers = files('bpf_def.h',
 			'rte_bpf.h')
 
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 4/8] bpf: introduce basic RX/TX BPF filters
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                         ` (3 preceding siblings ...)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 3/8] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-09 17:09         ` Ferruh Yigit
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload " Konstantin Ananyev
                         ` (3 subsequent siblings)
  8 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce API to install BPF based filters on ethdev RX/TX path.
The current implementation is a pure SW one, based on the ethdev RX/TX
callback mechanism.
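
A minimal usage sketch (editor's illustration: the exact prototypes live in
rte_bpf_ethdev.h added below, so the function/flag names used here are
assumptions and may differ):

	#include <rte_bpf_ethdev.h>

	struct rte_bpf_prm prm = {
		.prog_arg = { .type = RTE_BPF_ARG_PTR, },
	};

	/* install a (JITed, if supported) BPF filter on RX queue 0, port 0 */
	rte_bpf_eth_rx_elf_load(0, 0, &prm, "t1.o", ".text",
		RTE_BPF_ETH_F_JIT);
	...
	/* remove it again */
	rte_bpf_eth_rx_unload(0, 0);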

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 lib/librte_bpf/Makefile            |   2 +
 lib/librte_bpf/bpf_pkt.c           | 607 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   6 +-
 lib/librte_bpf/rte_bpf_ethdev.h    | 102 +++++++
 lib/librte_bpf/rte_bpf_version.map |   4 +
 5 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 7901a0e78..f66541265 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -22,6 +22,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_pkt.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
@@ -30,5 +31,6 @@ endif
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf_ethdev.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf_pkt.c b/lib/librte_bpf/bpf_pkt.c
new file mode 100644
index 000000000..2200228df
--- /dev/null
+++ b/lib/librte_bpf/bpf_pkt.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include <rte_bpf_ethdev.h>
+#include "bpf_impl.h"
+
+/*
+ * information about installed BPF rx/tx callback
+ */
+
+struct bpf_eth_cbi {
+	/* used by both data & control path */
+	uint32_t use;    /* usage counter */
+	const struct rte_eth_rxtx_callback *cb;  /* callback handle */
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	/* used by control path only */
+	LIST_ENTRY(bpf_eth_cbi) link;
+	uint16_t port;
+	uint16_t queue;
+} __rte_cache_aligned;
+
+/*
+ * Odd number means that callback is used by datapath.
+ * Even number means that callback is not used by datapath.
+ */
+#define BPF_ETH_CBI_INUSE  1
+
+/*
+ * List to manage RX/TX installed callbacks.
+ */
+LIST_HEAD(bpf_eth_cbi_list, bpf_eth_cbi);
+
+enum {
+	BPF_ETH_RX,
+	BPF_ETH_TX,
+	BPF_ETH_NUM,
+};
+
+/*
+ * information about all installed BPF rx/tx callbacks
+ */
+struct bpf_eth_cbh {
+	rte_spinlock_t lock;
+	struct bpf_eth_cbi_list list;
+	uint32_t type;
+};
+
+static struct bpf_eth_cbh rx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_RX,
+};
+
+static struct bpf_eth_cbh tx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_TX,
+};
+
+/*
+ * Marks given callback as used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
+{
+	cbi->use++;
+	/* make sure no store/load reordering could happen */
+	rte_smp_mb();
+}
+
+/*
+ * Marks given callback as not used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
+{
+	/* make sure all previous loads are completed */
+	rte_smp_rmb();
+	cbi->use++;
+}
+
+/*
+ * Waits till the datapath has finished using the given callback.
+ */
+static void
+bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
+{
+	uint32_t nuse, puse;
+
+	/* make sure all previous loads and stores are completed */
+	rte_smp_mb();
+
+	puse = cbi->use;
+
+	/* in use, busy wait till current RX/TX iteration is finished */
+	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
+		do {
+			rte_pause();
+			rte_compiler_barrier();
+			nuse = cbi->use;
+		} while (nuse == puse);
+	}
+}
+
+static void
+bpf_eth_cbi_cleanup(struct bpf_eth_cbi *bc)
+{
+	bc->bpf = NULL;
+	memset(&bc->jit, 0, sizeof(bc->jit));
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_find(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	LIST_FOREACH(cbi, &cbh->list, link) {
+		if (cbi->port == port && cbi->queue == queue)
+			break;
+	}
+	return cbi;
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_add(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	/* return an existing one */
+	cbi = bpf_eth_cbh_find(cbh, port, queue);
+	if (cbi != NULL)
+		return cbi;
+
+	cbi = rte_zmalloc(NULL, sizeof(*cbi), RTE_CACHE_LINE_SIZE);
+	if (cbi != NULL) {
+		cbi->port = port;
+		cbi->queue = queue;
+		LIST_INSERT_HEAD(&cbh->list, cbi, link);
+	}
+	return cbi;
+}
+
+/*
+ * BPF packet processing routines.
+ */
+
+static inline uint32_t
+apply_filter(struct rte_mbuf *mb[], const uint64_t rc[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i, j, k;
+	struct rte_mbuf *dr[num];
+
+	for (i = 0, j = 0, k = 0; i != num; i++) {
+
+		/* filter matches */
+		if (rc[i] != 0)
+			mb[j++] = mb[i];
+		/* no match */
+		else
+			dr[k++] = mb[i];
+	}
+
+	if (drop != 0) {
+		/* free filtered out mbufs */
+		for (i = 0; i != k; i++)
+			rte_pktmbuf_free(dr[i]);
+	} else {
+		/* copy filtered out mbufs beyond good ones */
+		for (i = 0; i != k; i++)
+			mb[j + i] = dr[i];
+	}
+
+	return j;
+}
+
+static inline uint32_t
+pkt_filter_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i;
+	void *dp[num];
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		dp[i] = rte_pktmbuf_mtod(mb[i], void *);
+
+	rte_bpf_exec_burst(bpf, dp, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	void *dp;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		dp = rte_pktmbuf_mtod(mb[i], void *);
+		rc[i] = jit->func(dp);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+static inline uint32_t
+pkt_filter_mb_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint64_t rc[num];
+
+	rte_bpf_exec_burst(bpf, (void **)mb, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		rc[i] = jit->func(mb[i]);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+/*
+ * RX/TX callbacks for raw data bpf.
+ */
+
+static uint16_t
+bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+/*
+ * RX/TX callbacks for mbuf.
+ */
+
+static uint16_t
+bpf_rx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static rte_rx_callback_fn
+select_rx_callback(enum rte_bpf_arg_type type, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (type == RTE_BPF_ARG_PTR)
+			return bpf_rx_callback_jit;
+		else if (type == RTE_BPF_ARG_PTR_MBUF)
+			return bpf_rx_callback_mb_jit;
+	} else if (type == RTE_BPF_ARG_PTR)
+		return bpf_rx_callback_vm;
+	else if (type == RTE_BPF_ARG_PTR_MBUF)
+		return bpf_rx_callback_mb_vm;
+
+	return NULL;
+}
+
+static rte_tx_callback_fn
+select_tx_callback(enum rte_bpf_arg_type type, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (type == RTE_BPF_ARG_PTR)
+			return bpf_tx_callback_jit;
+		else if (type == RTE_BPF_ARG_PTR_MBUF)
+			return bpf_tx_callback_mb_jit;
+	} else if (type == RTE_BPF_ARG_PTR)
+		return bpf_tx_callback_vm;
+	else if (type == RTE_BPF_ARG_PTR_MBUF)
+		return bpf_tx_callback_mb_vm;
+
+	return NULL;
+}
+
+/*
+ * Helper function to perform BPF unload for a given port/queue.
+ * We have to introduce extra complexity (and a possible slowdown) here,
+ * as right now there is no safe generic way to remove an RX/TX callback
+ * while IO is active.
+ * We also don't free the memory allocated for the callback handle itself:
+ * again, right now there is no safe way to do that without first stopping
+ * RX/TX on the given port/queue.
+ */
+static void
+bpf_eth_cbi_unload(struct bpf_eth_cbi *bc)
+{
+	/* mark this cbi as empty */
+	bc->cb = NULL;
+	rte_smp_mb();
+
+	/* make sure datapath doesn't use bpf anymore, then destroy bpf */
+	bpf_eth_cbi_wait(bc);
+	rte_bpf_destroy(bc->bpf);
+	bpf_eth_cbi_cleanup(bc);
+}
+
+static void
+bpf_eth_unload(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+
+	bc = bpf_eth_cbh_find(cbh, port, queue);
+	if (bc == NULL || bc->cb == NULL)
+		return;
+
+	if (cbh->type == BPF_ETH_RX)
+		rte_eth_remove_rx_callback(port, queue, bc->cb);
+	else
+		rte_eth_remove_tx_callback(port, queue, bc->cb);
+
+	bpf_eth_cbi_unload(bc);
+}
+
+
+__rte_experimental void
+rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+__rte_experimental void
+rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+static int
+bpf_eth_elf_load(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_rx_callback_fn frx;
+	rte_tx_callback_fn ftx;
+	struct rte_bpf_jit jit;
+
+	frx = NULL;
+	ftx = NULL;
+
+	if (prm == NULL || rte_eth_dev_is_valid_port(port) == 0 ||
+			queue >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	if (cbh->type == BPF_ETH_RX)
+		frx = select_rx_callback(prm->prog_arg.type, flags);
+	else
+		ftx = select_tx_callback(prm->prog_arg.type, flags);
+
+	if (frx == NULL && ftx == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	rte_bpf_get_jit(bpf, &jit);
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && jit.func == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rte_bpf_destroy(bpf);
+		return -ENOTSUP;
+	}
+
+	/* setup/update global callback info */
+	bc = bpf_eth_cbh_add(cbh, port, queue);
+	if (bc == NULL)
+		return -ENOMEM;
+
+	/* remove old one, if any */
+	if (bc->cb != NULL)
+		bpf_eth_unload(cbh, port, queue);
+
+	bc->bpf = bpf;
+	bc->jit = jit;
+
+	if (cbh->type == BPF_ETH_RX)
+		bc->cb = rte_eth_add_rx_callback(port, queue, frx, bc);
+	else
+		bc->cb = rte_eth_add_tx_callback(port, queue, ftx, bc);
+
+	if (bc->cb == NULL) {
+		rc = -rte_errno;
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	} else
+		rc = 0;
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 59d84cb2b..07aa8887d 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -5,6 +5,7 @@ allow_experimental_apis = true
 sources = files('bpf.c',
 		'bpf_exec.c',
 		'bpf_load.c',
+		'bpf_pkt.c',
 		'bpf_validate.c')
 
 if arch_subdir == 'x86'
@@ -12,9 +13,10 @@ if arch_subdir == 'x86'
 endif
 
 install_headers = files('bpf_def.h',
-			'rte_bpf.h')
+			'rte_bpf.h',
+			'rte_bpf_ethdev.h')
 
-deps += ['mbuf', 'net']
+deps += ['mbuf', 'net', 'ethdev']
 
 dep = dependency('libelf', required: false)
 if dep.found() == false
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
new file mode 100644
index 000000000..4800bbdaa
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_ETHDEV_H_
+#define _RTE_BPF_ETHDEV_H_
+
+/**
+ * @file
+ *
+ * API to install BPF filter as RX/TX callbacks for eth devices.
+ * Note that right now:
+ * - it is not MT safe, i.e. it is not allowed to do load/unload for the
+ *   same port/queue from different threads in parallel.
+ * - though it allows to do load/unload at runtime
+ *   (while RX/TX is ongoing on given port/queue).
+ * - allows only one BPF program per port/queue, i.e. a new load will replace
+ *   the BPF program previously loaded for that port/queue.
+ * Filter behaviour - if the BPF program returns zero value for a given packet:
+ *   on RX - the packet will be dropped inside the callback and no further
+ *   processing for that packet will happen;
+ *   on TX - the packet will remain unsent, and it is the responsibility of
+ *   the user to handle such a situation (drop, try to send again, etc.).
+ */
+
+#include <rte_bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+	RTE_BPF_ETH_F_NONE = 0,
+	RTE_BPF_ETH_F_JIT  = 0x1, /**< use JIT-compiled native ISA code */
+};
+
+/**
+ * Unload previously loaded BPF program (if any) from given RX port/queue
+ * and remove appropriate RX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ */
+void rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Unload previously loaded BPF program (if any) from given TX port/queue
+ * and remove appropriate TX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ */
+void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given RX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ * @param fname
+ *  Pathname of an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given TX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ * @param fname
+ *  Pathname of an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_ETHDEV_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ff65144df..a203e088e 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -3,6 +3,10 @@ EXPERIMENTAL {
 
 	rte_bpf_destroy;
 	rte_bpf_elf_load;
+	rte_bpf_eth_rx_elf_load;
+	rte_bpf_eth_rx_unload;
+	rte_bpf_eth_tx_elf_load;
+	rte_bpf_eth_tx_unload;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload BPF filters
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                         ` (4 preceding siblings ...)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 4/8] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-09 17:09         ` Ferruh Yigit
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 6/8] test: add few eBPF samples Konstantin Ananyev
                         ` (2 subsequent siblings)
  8 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce new testpmd commands to load/unload RX/TX BPF-based filters.
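
For orientation, what the new bpf-load command effectively does for an
mbuf-oriented, JIT-ed program is sketched below; it condenses bpf_sup.h and
cmd_operate_bpf_ld_parsed() from this patch (port, queue and the file name
are placeholders, mbuf_data_size is an existing testpmd global):

    struct rte_bpf_prm prm;

    memset(&prm, 0, sizeof(prm));
    /* host symbols the program may reference (from bpf_sup.h) */
    prm.xsym = bpf_xsym;
    prm.nb_xsym = RTE_DIM(bpf_xsym);
    /* 'M' flag: the program receives a pointer to the rte_mbuf itself */
    prm.prog_arg.type = RTE_BPF_ARG_PTR_MBUF;
    prm.prog_arg.size = sizeof(struct rte_mbuf);
    prm.prog_arg.buf_size = mbuf_data_size;

    /* 'J' flag: run the JIT-generated code; install on port 0, RX queue 0 */
    rte_bpf_eth_rx_elf_load(0, 0, &prm, "t3.o", ".text", RTE_BPF_ETH_F_JIT);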

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test-pmd/bpf_sup.h   |  25 ++++++++
 app/test-pmd/cmdline.c   | 149 +++++++++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/meson.build |   2 +-
 3 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 app/test-pmd/bpf_sup.h

diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
new file mode 100644
index 000000000..35f91a07f
--- /dev/null
+++ b/app/test-pmd/bpf_sup.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2017 Intel Corporation
+ */
+
+#ifndef _BPF_SUP_H_
+#define _BPF_SUP_H_
+
+#include <stdio.h>
+#include <rte_mbuf.h>
+#include <rte_bpf_ethdev.h>
+
+static const struct rte_bpf_xsym bpf_xsym[] = {
+	{
+		.name = RTE_STR(stdout),
+		.type = RTE_BPF_XTYPE_VAR,
+		.var = &stdout,
+	},
+	{
+		.name = RTE_STR(rte_pktmbuf_dump),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)rte_pktmbuf_dump,
+	},
+};
+
+#endif /* _BPF_SUP_H_ */
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 961567070..31d777343 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include "testpmd.h"
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
+#include "bpf_sup.h"
 
 static struct cmdline *testpmd_cl;
 
@@ -16448,6 +16449,152 @@ cmdline_parse_inst_t cmd_load_from_file = {
 	},
 };
 
+/* *** load BPF program *** */
+struct cmd_bpf_ld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+	cmdline_fixed_string_t op;
+	cmdline_fixed_string_t flags;
+	cmdline_fixed_string_t prm;
+};
+
+static void
+bpf_parse_flags(const char *str, struct rte_bpf_arg *arg, uint32_t *flags)
+{
+	uint32_t i, v;
+
+	*flags = RTE_BPF_ETH_F_NONE;
+	arg->type = RTE_BPF_ARG_PTR;
+	arg->size = mbuf_data_size;
+
+	for (i = 0; str[i] != 0; i++) {
+		v = toupper(str[i]);
+		if (v == 'J')
+			*flags |= RTE_BPF_ETH_F_JIT;
+		else if (v == 'M') {
+			arg->type = RTE_BPF_ARG_PTR_MBUF;
+			arg->size = sizeof(struct rte_mbuf);
+			arg->buf_size = mbuf_data_size;
+		} else if (v == '-')
+			continue;
+		else
+			printf("unknown flag: \'%c\'", v);
+	}
+}
+
+static void cmd_operate_bpf_ld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	int32_t rc;
+	uint32_t flags;
+	struct cmd_bpf_ld_result *res;
+	struct rte_bpf_prm prm;
+	const char *fname, *sname;
+
+	res = parsed_result;
+	memset(&prm, 0, sizeof(prm));
+	prm.xsym = bpf_xsym;
+	prm.nb_xsym = RTE_DIM(bpf_xsym);
+
+	bpf_parse_flags(res->flags, &prm.prog_arg, &flags);
+	fname = res->prm;
+	sname = ".text";
+
+	if (strcmp(res->dir, "rx") == 0) {
+		rc = rte_bpf_eth_rx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else if (strcmp(res->dir, "tx") == 0) {
+		rc = rte_bpf_eth_tx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_load_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			bpf, "bpf-load");
+cmdline_parse_token_string_t cmd_load_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_load_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_load_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, queue, UINT16);
+cmdline_parse_token_string_t cmd_load_bpf_flags =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			flags, NULL);
+cmdline_parse_token_string_t cmd_load_bpf_prm =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			prm, NULL);
+
+cmdline_parse_inst_t cmd_operate_bpf_ld_parse = {
+	.f = cmd_operate_bpf_ld_parsed,
+	.data = NULL,
+	.help_str = "bpf-load rx|tx <port> <queue> <-|J|M> <file_name>",
+	.tokens = {
+		(void *)&cmd_load_bpf_start,
+		(void *)&cmd_load_bpf_dir,
+		(void *)&cmd_load_bpf_port,
+		(void *)&cmd_load_bpf_queue,
+		(void *)&cmd_load_bpf_flags,
+		(void *)&cmd_load_bpf_prm,
+		NULL,
+	},
+};
+
+/* *** unload BPF program *** */
+struct cmd_bpf_unld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+};
+
+static void cmd_operate_bpf_unld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	struct cmd_bpf_unld_result *res;
+
+	res = parsed_result;
+
+	if (strcmp(res->dir, "rx") == 0)
+		rte_bpf_eth_rx_unload(res->port, res->queue);
+	else if (strcmp(res->dir, "tx") == 0)
+		rte_bpf_eth_tx_unload(res->port, res->queue);
+	else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_unload_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			bpf, "bpf-unload");
+cmdline_parse_token_string_t cmd_unload_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_unload_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_unload_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, queue, UINT16);
+
+cmdline_parse_inst_t cmd_operate_bpf_unld_parse = {
+	.f = cmd_operate_bpf_unld_parsed,
+	.data = NULL,
+	.help_str = "bpf-unload rx|tx <port> <queue>",
+	.tokens = {
+		(void *)&cmd_unload_bpf_start,
+		(void *)&cmd_unload_bpf_dir,
+		(void *)&cmd_unload_bpf_port,
+		(void *)&cmd_unload_bpf_queue,
+		NULL,
+	},
+};
+
 /* ******************************************************************************** */
 
 /* list of instructions */
@@ -16695,6 +16842,8 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
 	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
 	(cmdline_parse_inst_t *)&cmd_cfg_tunnel_udp_port,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
 	NULL,
 };
 
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index b47537642..602e20ac3 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -21,7 +21,7 @@ sources = files('cmdline.c',
 	'testpmd.c',
 	'txonly.c')
 
-deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci']
+deps = ['ethdev', 'gro', 'gso', 'cmdline', 'metrics', 'meter', 'bus_pci', 'bpf']
 if dpdk_conf.has('RTE_LIBRTE_PDUMP')
 	deps += 'pdump'
 endif
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 6/8] test: add few eBPF samples
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                         ` (5 preceding siblings ...)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 7/8] test: introduce functional test for librte_bpf Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 8/8] doc: add bpf library related info Konstantin Ananyev
  8 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add a few simple eBPF programs as examples.
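
Besides being loaded through testpmd, the samples can also be exercised
directly via the base librte_bpf API; a hedged sketch (it assumes the
rte_bpf.h and rte_mbuf.h headers, and the buffer must be filled with a real
packet first - t1.c, for instance, expects an Ethernet/IPv4/UDP frame):

    struct rte_bpf_prm prm;
    struct rte_bpf *bpf;
    uint8_t buf[RTE_MBUF_DEFAULT_DATAROOM];
    uint64_t rc;

    memset(&prm, 0, sizeof(prm));
    prm.prog_arg.type = RTE_BPF_ARG_PTR;
    prm.prog_arg.size = sizeof(buf);

    bpf = rte_bpf_elf_load(&prm, "t1.o", ".text");
    if (bpf != NULL) {
        /* fill buf with a packet, then run the program over it */
        rc = rte_bpf_exec(bpf, buf);    /* non-zero: the filter matched */
        rte_bpf_destroy(bpf);
    }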

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/bpf/dummy.c |  20 ++
 test/bpf/mbuf.h  | 578 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/bpf/t1.c    |  52 +++++
 test/bpf/t2.c    |  31 +++
 test/bpf/t3.c    |  36 ++++
 5 files changed, 717 insertions(+)
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

diff --git a/test/bpf/dummy.c b/test/bpf/dummy.c
new file mode 100644
index 000000000..5851469e7
--- /dev/null
+++ b/test/bpf/dummy.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Does nothing, always returns success.
+ * Used to measure BPF infrastructure overhead.
+ * To compile:
+ * clang -O2 -target bpf -c dummy.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+uint64_t
+entry(void *arg)
+{
+	return 1;
+}
diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
new file mode 100644
index 000000000..f24f908d7
--- /dev/null
+++ b/test/bpf/mbuf.h
@@ -0,0 +1,578 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation.
+ * Copyright 2014 6WIND S.A.
+ */
+
+/*
+ * Snippet from dpdk.org rte_mbuf.h.
+ * Used to provide BPF programs with information about the rte_mbuf layout.
+ */
+
+#ifndef _MBUF_H_
+#define _MBUF_H_
+
+#include <stdint.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Packet Offload Features Flags. It also carry packet type information.
+ * Critical resources. Both rx/tx shared these bits. Be cautious on any change
+ *
+ * - RX flags start at bit position zero, and get added to the left of previous
+ *   flags.
+ * - The most-significant 3 bits are reserved for generic mbuf flags
+ * - TX flags therefore start at bit position 60 (i.e. 63-3), and new flags get
+ *   added to the right of the previously defined flags i.e. they should count
+ *   downwards, not upwards.
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
+ */
+
+/**
+ * RX packet is a 802.1q VLAN packet. This flag was set by PMDs when
+ * the packet is recognized as a VLAN, but the behavior between PMDs
+ * was not the same. This flag is kept for some time to avoid breaking
+ * applications and should be replaced by PKT_RX_VLAN_STRIPPED.
+ */
+#define PKT_RX_VLAN_PKT      (1ULL << 0)
+
+#define PKT_RX_RSS_HASH      (1ULL << 1)
+/**< RX packet with RSS hash result. */
+#define PKT_RX_FDIR          (1ULL << 2)
+/**< RX packet with FDIR match indicate. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)
+/**< External IP header checksum error. */
+
+/**
+ * A vlan has been stripped by the hardware and its tci is saved in
+ * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
+ * in the RX configuration of the PMD.
+ */
+#define PKT_RX_VLAN_STRIPPED (1ULL << 6)
+
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_IEEE1588_PTP  (1ULL << 9)
+/**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 10)
+/**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID       (1ULL << 13)
+/**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX      (1ULL << 14)
+/**< Flexible bytes reported if FDIR match. */
+
+/**
+ * The 2 vlans have been stripped by the hardware and their tci are
+ * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * This can only happen if vlan stripping is enabled in the RX
+ * configuration of the PMD. If this flag is set, PKT_RX_VLAN_STRIPPED
+ * must also be set.
+ */
+#define PKT_RX_QINQ_STRIPPED (1ULL << 15)
+
+/**
+ * Deprecated.
+ * RX packet with double VLAN stripped.
+ * This flag is replaced by PKT_RX_QINQ_STRIPPED.
+ */
+#define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
+
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP     (1ULL << 17)
+
+/* add new RX flags here */
+
+/* add new TX flags here */
+
+/**
+ * Offload the MACsec. This flag must be set by the application to enable
+ * this offload feature for a packet to be transmitted.
+ */
+#define PKT_TX_MACSEC        (1ULL << 44)
+
+/**
+ * Bits 45:48 used for the tunnel type.
+ * When doing Tx offload like TSO or checksum, the HW needs to configure the
+ * tunnel type into the HW descriptors.
+ */
+#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
+#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
+#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
+#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
+/**< TX packet with MPLS-in-UDP RFC 7510 header. */
+#define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/* add new TX TUNNEL type here */
+#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
+
+/**
+ * Second VLAN insertion (QinQ) flag.
+ */
+#define PKT_TX_QINQ_PKT    (1ULL << 49)
+/**< TX packet with double VLAN inserted. */
+
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len in account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 50)
+
+#define PKT_TX_IEEE1588_TMST (1ULL << 51)
+/**< TX IEEE1588 packet to timestamp. */
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
+ */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52)
+/**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_TCP_CKSUM     (1ULL << 52)
+/**< TCP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_SCTP_CKSUM    (2ULL << 52)
+/**< SCTP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_UDP_CKSUM     (3ULL << 52)
+/**< UDP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_L4_MASK       (3ULL << 52)
+/**< Mask for L4 cksum offload request. */
+
+/**
+ * Offload the IP checksum in the hardware. The flag PKT_TX_IPV4 should
+ * also be set by the application, although a PMD will only check
+ * PKT_TX_IP_CKSUM.
+ *  - set the IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: l2_len, l3_len
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
+
+/**
+ * Packet is IPv4. This flag must be set when using any offload feature
+ * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV4          (1ULL << 55)
+
+/**
+ * Packet is IPv6. This flag must be set when using an offload feature
+ * (TSO or L4 checksum) to tell the NIC that the packet is an IPv6
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV6          (1ULL << 56)
+
+#define PKT_TX_VLAN_PKT      (1ULL << 57)
+/**< TX packet is a 802.1q VLAN packet. */
+
+/**
+ * Offload the IP checksum of an external header in the hardware. The
+ * flag PKT_TX_OUTER_IPV4 should also be set by the application, although
+ * a PMD will only check PKT_TX_IP_CKSUM.  The IP checksum field in the
+ * packet must be set to 0.
+ *  - set the outer IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: outer_l2_len, outer_l3_len
+ */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/**
+ * Packet outer header is IPv4. This flag must be set when using any
+ * outer offload feature (L3 or L4 checksum) to tell the NIC that the
+ * outer header of the tunneled packet is an IPv4 packet.
+ */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/**
+ * Packet outer header is IPv6. This flag must be set when using any
+ * outer offload feature (L4 checksum) to tell the NIC that the outer
+ * header of the tunneled packet is an IPv6 packet.
+ */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
+/**
+ * Bitmask of all supported packet Tx offload features flags,
+ * which can be set for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_IEEE1588_TMST |	 \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK |	 \
+		PKT_TX_MACSEC)
+
+#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
+
+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
+/* Use final bit of flags to indicate a control mbuf */
+#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
+
+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of RX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the RX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of TX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the TX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Some NICs need at least 2KB buffer to RX standard Ethernet frame without
+ * splitting it into multiple segments.
+ * So, for mbufs that are planned to be used in RX/TX, the recommended
+ * minimal buffer length is 2KB + RTE_PKTMBUF_HEADROOM.
+ */
+#define	RTE_MBUF_DEFAULT_DATAROOM	2048
+#define	RTE_MBUF_DEFAULT_BUF_SIZE	\
+	(RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM)
+
+/* define a set of marker types that can be used to refer to set points in the
+ * mbuf.
+ */
+__extension__
+typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
+__extension__
+typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
+__extension__
+typedef uint64_t MARKER64[0];
+/**< marker that allows us to overwrite 8 bytes with a single assignment */
+
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * The generic rte_mbuf, containing a packet mbuf.
+ */
+struct rte_mbuf {
+	MARKER cacheline0;
+
+	void *buf_addr;           /**< Virtual address of segment buffer. */
+	/**
+	 * Physical address of segment buffer.
+	 * Force alignment to 8-bytes, so as to ensure we have the exact
+	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * working on vector drivers easier.
+	 */
+	phys_addr_t buf_physaddr __rte_aligned(sizeof(phys_addr_t));
+
+	/* next 8 bytes are initialised on RX descriptor rearm */
+	MARKER64 rearm_data;
+	uint16_t data_off;
+
+	/**
+	 * Reference counter. Its size should at least equal to the size
+	 * of port field (16 bits), to support zero-copy broadcast.
+	 * It should only be accessed using the following functions:
+	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
+	 * config option.
+	 */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+		/**< Non-atomically accessed refcnt */
+	};
+	uint16_t nb_segs;         /**< Number of segments. */
+
+	/** Input port (16 bits to support more than 256 virtual ports). */
+	uint16_t port;
+
+	uint64_t ol_flags;        /**< Offload features. */
+
+	/* remaining bytes are set on RX when pulling packet from descriptor */
+	MARKER rx_descriptor_fields1;
+
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types. The packet_type is about data really present in the
+	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+	 * vlan is stripped from the data.
+	 */
+	RTE_STD_C11
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+	uint16_t vlan_tci;
+
+	union {
+		uint32_t rss;     /**< RSS hash result if RSS enabled */
+		struct {
+			RTE_STD_C11
+			union {
+				struct {
+					uint16_t hash;
+					uint16_t id;
+				};
+				uint32_t lo;
+				/**< Second 4 flexible bytes */
+			};
+			uint32_t hi;
+			/**< First 4 flexible bytes or FD ID, dependent on
+			 *   PKT_RX_FDIR_* flag in ol_flags.
+			 */
+		} fdir;           /**< Filter identifier if FDIR enabled */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;          /**< Hierarchical scheduler */
+		uint32_t usr;
+		/**< User defined tags. See rte_distributor_process() */
+	} hash;                   /**< hash information */
+
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
+	uint16_t buf_len;         /**< Length of segment buffer. */
+
+	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+	 * are not normalized but are always the same for a given port.
+	 */
+	uint64_t timestamp;
+
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	RTE_STD_C11
+	union {
+		void *userdata;   /**< Can be used for external metadata */
+		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+	};
+
+	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+
+	/* fields to support TX offloads */
+	RTE_STD_C11
+	union {
+		uint64_t tx_offload;       /**< combined for easy fetch */
+		__extension__
+		struct {
+			uint64_t l2_len:7;
+			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+			 */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
+
+			/* fields for TX offloading of tunnels */
+			uint64_t outer_l3_len:9;
+			/**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7;
+			/**< Outer L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
+		};
+	};
+
+	/** Size of the application private data. In case of an indirect
+	 * mbuf, it stores the direct mbuf private data size.
+	 */
+	uint16_t priv_size;
+
+	/** Timesync flags for use with IEEE1588. */
+	uint16_t timesync;
+
+	/** Sequence number. See also rte_reorder_insert(). */
+	uint32_t seqn;
+
+} __rte_cache_aligned;
+
+
+/**
+ * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
+ */
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
+
+/**
+ * Returns TRUE if given mbuf is direct, or FALSE otherwise.
+ */
+#define RTE_MBUF_DIRECT(mb)     (!RTE_MBUF_INDIRECT(mb))
+
+/**
+ * Private data in case of pktmbuf pool.
+ *
+ * A structure that contains some pktmbuf_pool-specific data that are
+ * appended after the mempool structure (in private data).
+ */
+struct rte_pktmbuf_pool_private {
+	uint16_t mbuf_data_room_size; /**< Size of data space in each mbuf. */
+	uint16_t mbuf_priv_size;      /**< Size of private area in each mbuf. */
+};
+
+/**
+ * A macro that points to an offset into the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param o
+ *   The offset into the mbuf data.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod_offset(m, t, o)	\
+	((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
+
+/**
+ * A macro that points to the start of the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _MBUF_H_ */
diff --git a/test/bpf/t1.c b/test/bpf/t1.c
new file mode 100644
index 000000000..60f9434ab
--- /dev/null
+++ b/test/bpf/t1.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to first segment packet data as an input parameter.
+ * analog of tcpdump -s 1 -d 'dst 1.2.3.4 && udp && dst port 5000'
+ * (000) ldh      [12]
+ * (001) jeq      #0x800           jt 2    jf 12
+ * (002) ld       [30]
+ * (003) jeq      #0x1020304       jt 4    jf 12
+ * (004) ldb      [23]
+ * (005) jeq      #0x11            jt 6    jf 12
+ * (006) ldh      [20]
+ * (007) jset     #0x1fff          jt 12   jf 8
+ * (008) ldxb     4*([14]&0xf)
+ * (009) ldh      [x + 16]
+ * (010) jeq      #0x1388          jt 11   jf 12
+ * (011) ret      #1
+ * (012) ret      #0
+ *
+ * To compile:
+ * clang -O2 -target bpf -c t1.c
+ */
+
+#include <stdint.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+
+uint64_t
+entry(void *pkt)
+{
+	struct ether_header *ether_header = (void *)pkt;
+
+	if (ether_header->ether_type != __builtin_bswap16(0x0800))
+		return 0;
+
+	struct iphdr *iphdr = (void *)(ether_header + 1);
+	if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1ffff) != 0 ||
+			iphdr->daddr != __builtin_bswap32(0x1020304))
+		return 0;
+
+	int hlen = iphdr->ihl * 4;
+	struct udphdr *udphdr = (void *)iphdr + hlen;
+
+	if (udphdr->dest !=  __builtin_bswap16(5000))
+		return 0;
+
+	return 1;
+}
diff --git a/test/bpf/t2.c b/test/bpf/t2.c
new file mode 100644
index 000000000..69d7a4fe1
--- /dev/null
+++ b/test/bpf/t2.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Clears the mbuf's vlan_tci and all related RX flags
+ * (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED).
+ * Doesn't touch the contents of the packet data.
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t2.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+uint64_t
+entry(void *pkt)
+{
+	struct rte_mbuf *mb;
+
+	mb = pkt;
+	mb->vlan_tci = 0;
+	mb->ol_flags &= ~(PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+
+	return 1;
+}
diff --git a/test/bpf/t3.c b/test/bpf/t3.c
new file mode 100644
index 000000000..531b9cb8c
--- /dev/null
+++ b/test/bpf/t3.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Dumps the mbuf to stdout if it is an ARP packet (analog of tcpdump 'arp').
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t3.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <net/ethernet.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+extern void rte_pktmbuf_dump(FILE *, const struct rte_mbuf *, unsigned int);
+
+uint64_t
+entry(const void *pkt)
+{
+	const struct rte_mbuf *mb;
+	const struct ether_header *eth;
+
+	mb = pkt;
+	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+
+	if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP))
+		rte_pktmbuf_dump(stdout, mb, 64);
+
+	return 1;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 7/8] test: introduce functional test for librte_bpf
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                         ` (6 preceding siblings ...)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 6/8] test: add few eBPF samples Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 8/8] doc: add bpf library related info Konstantin Ananyev
  8 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 test/test/Makefile    |    2 +
 test/test/meson.build |    2 +
 test/test/test_bpf.c  | 1759 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1763 insertions(+)
 create mode 100644 test/test/test_bpf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 2630ab484..9a08e9af6 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -193,6 +193,8 @@ endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += test_bpf.c
+
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
 CFLAGS += -O3
diff --git a/test/test/meson.build b/test/test/meson.build
index ad0a65080..91d0408af 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -8,6 +8,7 @@ test_sources = files('commands.c',
 	'test_alarm.c',
 	'test_atomic.c',
 	'test_barrier.c',
+	'test_bpf.c',
 	'test_byteorder.c',
 	'test_cmdline.c',
 	'test_cmdline_cirbuf.c',
@@ -97,6 +98,7 @@ test_sources = files('commands.c',
 )
 
 test_deps = ['acl',
+	'bpf',
 	'cfgfile',
 	'cmdline',
 	'cryptodev',
diff --git a/test/test/test_bpf.c b/test/test/test_bpf.c
new file mode 100644
index 000000000..cbd6be63d
--- /dev/null
+++ b/test/test/test_bpf.c
@@ -0,0 +1,1759 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_memory.h>
+#include <rte_debug.h>
+#include <rte_hexdump.h>
+#include <rte_random.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+#include <rte_bpf.h>
+
+#include "test.h"
+
+/*
+ * Basic functional tests for librte_bpf.
+ * The main procedure - load an eBPF program, execute it and
+ * compare the results with expected values.
+ */
+
+struct dummy_offset {
+	uint64_t u64;
+	uint32_t u32;
+	uint16_t u16;
+	uint8_t  u8;
+};
+
+struct dummy_vect8 {
+	struct dummy_offset in[8];
+	struct dummy_offset out[8];
+};
+
+#define	TEST_FILL_1	0xDEADBEEF
+
+#define	TEST_MUL_1	21
+#define TEST_MUL_2	-100
+
+#define TEST_SHIFT_1	15
+#define TEST_SHIFT_2	33
+
+#define TEST_JCC_1	0
+#define TEST_JCC_2	-123
+#define TEST_JCC_3	5678
+#define TEST_JCC_4	TEST_FILL_1
+
+struct bpf_test {
+	const char *name;
+	size_t arg_sz;
+	struct rte_bpf_prm prm;
+	void (*prepare)(void *);
+	int (*check_result)(uint64_t, const void *);
+	uint32_t allow_fail;
+};
+
+/*
+ * Compare return value and result data with expected ones.
+ * Report a failure if they don't match.
+ */
+static int
+cmp_res(const char *func, uint64_t exp_rc, uint64_t ret_rc,
+	const void *exp_res, const void *ret_res, size_t res_sz)
+{
+	int32_t ret;
+
+	ret = 0;
+	if (exp_rc != ret_rc) {
+		printf("%s@%d: invalid return value, expected: 0x%" PRIx64
+			",result: 0x%" PRIx64 "\n",
+			func, __LINE__, exp_rc, ret_rc);
+		ret |= -1;
+	}
+
+	if (memcmp(exp_res, ret_res, res_sz) != 0) {
+		printf("%s: invalid value\n", func);
+		rte_memdump(stdout, "expected", exp_res, res_sz);
+		rte_memdump(stdout, "result", ret_res, res_sz);
+		ret |= -1;
+	}
+
+	return ret;
+}
+
+/* store immediate test-cases */
+static const struct ebpf_insn test_store1_prog[] = {
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+		.imm = TEST_FILL_1,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_store1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+	memset(df, 0, sizeof(*df));
+}
+
+static int
+test_store1_check(uint64_t rc, const void *arg)
+{
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+
+	memset(&dfe, 0, sizeof(dfe));
+	dfe.u64 = (int32_t)TEST_FILL_1;
+	dfe.u32 = dfe.u64;
+	dfe.u16 = dfe.u64;
+	dfe.u8 = dfe.u64;
+
+	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
+}
+
+/* store register test-cases */
+static const struct ebpf_insn test_store2_prog[] = {
+
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+/* load test-cases */
+static const struct ebpf_insn test_load1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return sum */
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_load1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u64 = (int32_t)TEST_FILL_1;
+	df->u32 = df->u64;
+	df->u16 = df->u64;
+	df->u8 = df->u64;
+}
+
+static int
+test_load1_check(uint64_t rc, const void *arg)
+{
+	uint64_t v;
+	const struct dummy_offset *dft;
+
+	dft = arg;
+	v = dft->u64;
+	v += dft->u32;
+	v += dft->u16;
+	v += dft->u8;
+
+	return cmp_res(__func__, v, rc, dft, dft, sizeof(*dft));
+}
+
+/* alu mul test-cases */
+static const struct ebpf_insn test_mul1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_MUL | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_MUL | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_mul1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = -v;
+}
+
+static int
+test_mul1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 * TEST_MUL_1;
+	r3 *= TEST_MUL_2;
+	r4 = (uint32_t)(r4 * r2);
+	r4 *= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* alu shift test-cases */
+static const struct ebpf_insn test_shift1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_LSH | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_SHIFT_1,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_SHIFT_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_RSH | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_LSH | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint64_t) * CHAR_BIT - 1,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint32_t) * CHAR_BIT - 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_LSH | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_shift1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = (-v ^ 5);
+}
+
+static int
+test_shift1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 << TEST_SHIFT_1;
+	r3 = (int64_t)r3 >> TEST_SHIFT_2;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+
+	r2 = (uint32_t)r2 >> r4;
+	r3 <<= r4;
+
+	dve.out[2].u64 = r2;
+	dve.out[3].u64 = r3;
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 &= sizeof(uint64_t) * CHAR_BIT - 1;
+	r3 = (int64_t)r3 >> r2;
+	r2 &= sizeof(uint32_t) * CHAR_BIT - 1;
+	r4 = (uint32_t)r4 << r2;
+
+	dve.out[4].u64 = r4;
+	dve.out[5].u64 = r3;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* jmp test-cases */
+static const struct ebpf_insn test_jump1_prog[] = {
+
+	[0] = {
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	[1] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	[2] = {
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	[3] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u32),
+	},
+	[4] = {
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_5,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	[5] = {
+		.code = (BPF_JMP | BPF_JEQ | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_JCC_1,
+		.off = 8,
+	},
+	[6] = {
+		.code = (BPF_JMP | EBPF_JSLE | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_JCC_2,
+		.off = 9,
+	},
+	[7] = {
+		.code = (BPF_JMP | BPF_JGT | BPF_K),
+		.dst_reg = EBPF_REG_4,
+		.imm = TEST_JCC_3,
+		.off = 10,
+	},
+	[8] = {
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_5,
+		.imm = TEST_JCC_4,
+		.off = 11,
+	},
+	[9] = {
+		.code = (BPF_JMP | EBPF_JNE | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_3,
+		.off = 12,
+	},
+	[10] = {
+		.code = (BPF_JMP | EBPF_JSGT | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_4,
+		.off = 13,
+	},
+	[11] = {
+		.code = (BPF_JMP | EBPF_JLE | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_5,
+		.off = 14,
+	},
+	[12] = {
+		.code = (BPF_JMP | BPF_JSET | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_5,
+		.off = 15,
+	},
+	[13] = {
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+	[14] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	[15] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -10,
+	},
+	[16] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	[17] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -11,
+	},
+	[18] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x4,
+	},
+	[19] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -12,
+	},
+	[20] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x8,
+	},
+	[21] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -13,
+	},
+	[22] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x10,
+	},
+	[23] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -14,
+	},
+	[24] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x20,
+	},
+	[25] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -15,
+	},
+	[26] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x40,
+	},
+	[27] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -16,
+	},
+	[28] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x80,
+	},
+	[29] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -17,
+	},
+};
+
+static void
+test_jump1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v1, v2;
+
+	dv = arg;
+
+	v1 = rte_rand();
+	v2 = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u64 = v1;
+	dv->in[1].u64 = v2;
+	dv->in[0].u32 = (v1 << 12) + (v2 >> 6);
+	dv->in[1].u32 = (v2 << 12) - (v1 >> 6);
+}
+
+static int
+test_jump1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4, r5, rv;
+	const struct dummy_vect8 *dvt;
+
+	dvt = arg;
+
+	rv = 0;
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[0].u64;
+	r4 = dvt->in[1].u32;
+	r5 = dvt->in[1].u64;
+
+	if (r2 == TEST_JCC_1)
+		rv |= 0x1;
+	if ((int64_t)r3 <= TEST_JCC_2)
+		rv |= 0x2;
+	if (r4 > TEST_JCC_3)
+		rv |= 0x4;
+	if (r5 & TEST_JCC_4)
+		rv |= 0x8;
+	if (r2 != r3)
+		rv |= 0x10;
+	if ((int64_t)r2 > (int64_t)r4)
+		rv |= 0x20;
+	if (r2 <= r5)
+		rv |= 0x40;
+	if (r3 & r5)
+		rv |= 0x80;
+
+	return cmp_res(__func__, rv, rc, &rv, &rc, sizeof(rv));
+}
+
+/* alu (add, sub, and, or, xor, neg)  test-cases */
+static const struct ebpf_insn test_alu1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_5,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ALU | BPF_XOR | BPF_K),
+		.dst_reg = EBPF_REG_4,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_K),
+		.dst_reg = EBPF_REG_5,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_OR | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_XOR | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (BPF_ALU | BPF_SUB | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_5,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_AND | BPF_X),
+		.dst_reg = EBPF_REG_5,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[6].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_vect8, out[7].u64),
+	},
+	/* return (-r2 + (-r3)) */
+	{
+		.code = (BPF_ALU | BPF_NEG),
+		.dst_reg = EBPF_REG_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_NEG),
+		.dst_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static int
+test_alu1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4, r5, rv;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[0].u64;
+	r4 = dvt->in[1].u32;
+	r5 = dvt->in[1].u64;
+
+	r2 = (uint32_t)r2 & TEST_FILL_1;
+	r3 |= (int32_t) TEST_FILL_1;
+	r4 = (uint32_t)r4 ^ TEST_FILL_1;
+	r5 += (int32_t)TEST_FILL_1;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+	dve.out[3].u64 = r5;
+
+	r2 = (uint32_t)r2 | (uint32_t)r3;
+	r3 ^= r4;
+	r4 = (uint32_t)r4 - (uint32_t)r5;
+	r5 &= r2;
+
+	dve.out[4].u64 = r2;
+	dve.out[5].u64 = r3;
+	dve.out[6].u64 = r4;
+	dve.out[7].u64 = r5;
+
+	r2 = -(int32_t)r2;
+	rv = (uint32_t)r2;
+	r3 = -r3;
+	rv += r3;
+
+	return cmp_res(__func__, rv, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* endianness conversions (BE->LE/LE->BE)  test-cases */
+static const struct ebpf_insn test_bele1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_BE),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint16_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_BE),
+		.dst_reg = EBPF_REG_3,
+		.imm = sizeof(uint32_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_BE),
+		.dst_reg = EBPF_REG_4,
+		.imm = sizeof(uint64_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_LE),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint16_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_LE),
+		.dst_reg = EBPF_REG_3,
+		.imm = sizeof(uint32_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_LE),
+		.dst_reg = EBPF_REG_4,
+		.imm = sizeof(uint64_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_bele1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+
+	dv = arg;
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u64 = rte_rand();
+	dv->in[0].u32 = dv->in[0].u64;
+	dv->in[0].u16 = dv->in[0].u64;
+}
+
+static int
+test_bele1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u16;
+	r3 = dvt->in[0].u32;
+	r4 = dvt->in[0].u64;
+
+	r2 =  rte_cpu_to_be_16(r2);
+	r3 =  rte_cpu_to_be_32(r3);
+	r4 =  rte_cpu_to_be_64(r4);
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	r2 = dvt->in[0].u16;
+	r3 = dvt->in[0].u32;
+	r4 = dvt->in[0].u64;
+
+	r2 =  rte_cpu_to_le_16(r2);
+	r3 =  rte_cpu_to_le_32(r3);
+	r4 =  rte_cpu_to_le_64(r4);
+
+	dve.out[3].u64 = r2;
+	dve.out[4].u64 = r3;
+	dve.out[5].u64 = r4;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* atomic add test-cases */
+static const struct ebpf_insn test_xadd1_prog[] = {
+
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = -1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_4,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_5,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_6,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_6,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_6,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_7,
+		.imm = TEST_JCC_2,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_7,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_7,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_8,
+		.imm = TEST_JCC_3,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_8,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_8,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static int
+test_xadd1_check(uint64_t rc, const void *arg)
+{
+	uint64_t rv;
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+	memset(&dfe, 0, sizeof(dfe));
+
+	rv = 1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = -1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = (int32_t)TEST_FILL_1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_MUL_1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_MUL_2;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_JCC_2;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_JCC_3;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
+}
+
+/* alu div test-cases */
+static const struct ebpf_insn test_div1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_MOD | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_MOD | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_DIV | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* check that we can handle division by zero gracefully. */
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[3].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static int
+test_div1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 / TEST_MUL_1;
+	r3 %= TEST_MUL_2;
+	r2 |= 1;
+	r3 |= 1;
+	r4 = (uint32_t)(r4 % r2);
+	r4 /= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	/*
+	 * in the test prog we attempted to divide by zero,
+	 * so the expected return value is 0.
+	 */
+	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* call test-cases */
+static const struct ebpf_insn test_call1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_10,
+		.src_reg = EBPF_REG_2,
+		.off = -4,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_10,
+		.src_reg = EBPF_REG_3,
+		.off = -16,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_10,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_SUB | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 4,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_10,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_SUB | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = 16,
+	},
+	{
+		.code = (BPF_JMP | EBPF_CALL),
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_10,
+		.off = -4,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_10,
+		.off = -16
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+dummy_func1(const void *p, uint32_t *v32, uint64_t *v64)
+{
+	const struct dummy_offset *dv;
+
+	dv = p;
+
+	v32[0] += dv->u16;
+	v64[0] += dv->u8;
+}
+
+static int
+test_call1_check(uint64_t rc, const void *arg)
+{
+	uint32_t v32;
+	uint64_t v64;
+	const struct dummy_offset *dv;
+
+	dv = arg;
+
+	v32 = dv->u32;
+	v64 = dv->u64;
+	dummy_func1(arg, &v32, &v64);
+	v64 += v32;
+
+	if (v64 != rc) {
+		printf("%s@%d: invalid return value "
+			"expected=0x%" PRIx64 ", actual=0x%" PRIx64 "\n",
+			__func__, __LINE__, v64, rc);
+		return -1;
+	}
+	return 0;
+}
+
+static const struct rte_bpf_xsym test_call1_xsym[] = {
+	{
+		.name = RTE_STR(dummy_func1),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)dummy_func1,
+	},
+};
+
+static const struct bpf_test tests[] = {
+	{
+		.name = "test_store1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store1_prog,
+			.nb_ins = RTE_DIM(test_store1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_store2",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store2_prog,
+			.nb_ins = RTE_DIM(test_store2_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_load1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_load1_prog,
+			.nb_ins = RTE_DIM(test_load1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_load1_check,
+	},
+	{
+		.name = "test_mul1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_mul1_prog,
+			.nb_ins = RTE_DIM(test_mul1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_mul1_check,
+	},
+	{
+		.name = "test_shift1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_shift1_prog,
+			.nb_ins = RTE_DIM(test_shift1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_shift1_prepare,
+		.check_result = test_shift1_check,
+	},
+	{
+		.name = "test_jump1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_jump1_prog,
+			.nb_ins = RTE_DIM(test_jump1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_jump1_prepare,
+		.check_result = test_jump1_check,
+	},
+	{
+		.name = "test_alu1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_alu1_prog,
+			.nb_ins = RTE_DIM(test_alu1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_jump1_prepare,
+		.check_result = test_alu1_check,
+	},
+	{
+		.name = "test_bele1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_bele1_prog,
+			.nb_ins = RTE_DIM(test_bele1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_bele1_prepare,
+		.check_result = test_bele1_check,
+	},
+	{
+		.name = "test_xadd1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_xadd1_prog,
+			.nb_ins = RTE_DIM(test_xadd1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_xadd1_check,
+	},
+	{
+		.name = "test_div1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_div1_prog,
+			.nb_ins = RTE_DIM(test_div1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_div1_check,
+	},
+	{
+		.name = "test_call1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_call1_prog,
+			.nb_ins = RTE_DIM(test_call1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+			.xsym = test_call1_xsym,
+			.nb_xsym = RTE_DIM(test_call1_xsym),
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_call1_check,
+		/* for now don't support function calls on 32 bit platform */
+		.allow_fail = (sizeof(uint64_t) != sizeof(uintptr_t)),
+	},
+};
+
+static int
+run_test(const struct bpf_test *tst)
+{
+	int32_t ret, rv;
+	int64_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	uint8_t tbuf[tst->arg_sz];
+
+	printf("%s(%s) start\n", __func__, tst->name);
+
+	bpf = rte_bpf_load(&tst->prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		return -1;
+	}
+
+	tst->prepare(tbuf);
+
+	rc = rte_bpf_exec(bpf, tbuf);
+	ret = tst->check_result(rc, tbuf);
+	if (ret != 0) {
+		printf("%s@%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, ret, strerror(ret));
+	}
+
+	rte_bpf_get_jit(bpf, &jit);
+	if (jit.func == NULL) {
+		rte_bpf_destroy(bpf);
+		return ret;
+	}
+
+	tst->prepare(tbuf);
+	rc = jit.func(tbuf);
+	rv = tst->check_result(rc, tbuf);
+	ret |= rv;
+	if (rv != 0) {
+		printf("%s@%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, rv, strerror(rv));
+	}
+
+	rte_bpf_destroy(bpf);
+	return ret;
+}
+
+static int
+test_bpf(void)
+{
+	int32_t rc, rv;
+	uint32_t i;
+
+	rc = 0;
+	for (i = 0; i != RTE_DIM(tests); i++) {
+		rv = run_test(tests + i);
+		if (tests[i].allow_fail == 0)
+			rc |= rv;
+	}
+
+	return rc;
+}
+
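+/*
+ * Quick way to run this suite from the dpdk test application, assuming
+ * the usual interactive test prompt (exact prompt/binary name may differ):
+ *   RTE>> bpf_autotest
+ */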
+REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v5 8/8] doc: add bpf library related info
  2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
                         ` (7 preceding siblings ...)
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 7/8] test: introduce functional test for librte_bpf Konstantin Ananyev
@ 2018-05-04 12:45       ` Konstantin Ananyev
  8 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-04 12:45 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 doc/api/doxy-api-index.md                   |  3 +-
 doc/api/doxy-api.conf                       |  1 +
 doc/guides/prog_guide/bpf_lib.rst           | 38 ++++++++++++++++++++
 doc/guides/prog_guide/index.rst             |  1 +
 doc/guides/rel_notes/release_18_05.rst      |  7 ++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 56 +++++++++++++++++++++++++++++
 6 files changed, 105 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst

diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 26ce7b44b..927ec59b2 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -108,7 +108,8 @@ The public API headers are grouped by topics:
   [EFD]                (@ref rte_efd.h),
   [ACL]                (@ref rte_acl.h),
   [member]             (@ref rte_member.h),
-  [flow classify]      (@ref rte_flow_classify.h)
+  [flow classify]      (@ref rte_flow_classify.h),
+  [BPF]                (@ref rte_bpf.h)
 
 - **containers**:
   [mbuf]               (@ref rte_mbuf.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 5686cbb9d..037166e76 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -42,6 +42,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_acl \
                           lib/librte_bbdev \
                           lib/librte_bitratestats \
+                          lib/librte_bpf \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/bpf_lib.rst b/doc/guides/prog_guide/bpf_lib.rst
new file mode 100644
index 000000000..7c08e6b2d
--- /dev/null
+++ b/doc/guides/prog_guide/bpf_lib.rst
@@ -0,0 +1,38 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+Berkeley Packet Filter Library
+==============================
+
+The DPDK provides a BPF library that gives the ability
+to load and execute Enhanced Berkeley Packet Filter (eBPF) bytecode within
+a user-space DPDK application.
+
+It supports a basic set of features from the eBPF spec.
+Please refer to the
+`eBPF spec <https://www.kernel.org/doc/Documentation/networking/filter.txt>`_
+for more information.
+It also introduces a basic framework to load/unload BPF-based filters
+on eth devices (right now only via SW RX/TX callbacks).
+
+The library API provides the following basic operations:
+
+*   Create a new BPF execution context and load user-provided eBPF code into it.
+
+*   Destroy a BPF execution context and its runtime structures and free the associated memory.
+
+*   Execute eBPF bytecode associated with a provided input parameter.
+
+*   Provide information about natively compiled code for a given BPF context.
+
+*   Load a BPF program from an ELF file and install a callback to execute it on a given ethdev port/queue.
+
+Not currently supported eBPF features
+-------------------------------------
+
+ - JIT for non X86_64 platforms
+ - cBPF
+ - tail-pointer call
+ - eBPF MAP
+ - skb
+ - external function calls for 32-bit platforms
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 235ad0201..2c40fb4ec 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -51,6 +51,7 @@ Programmer's Guide
     vhost_lib
     metrics_lib
     port_hotplug_framework
+    bpf_lib
     source_org
     dev_kit_build_system
     dev_kit_root_make_help
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 0ae61e87b..1ddb094c4 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -164,6 +164,13 @@ New Features
   stats/xstats on shared memory from secondary process, and also pdump packets on
   those virtual devices.
 
+* **Added the BPF Library.**
+
+  The BPF Library provides the ability to load and execute
+  Enhanced Berkeley Packet Filter (eBPF) bytecode within a user-space DPDK application.
+  It also introduces a basic framework to load/unload BPF-based filters
+  on eth devices (right now only via SW RX/TX callbacks).
+  It also adds a dependency on libelf.
 
 API Changes
 -----------
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 013a40549..e4afb03dc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3813,3 +3813,59 @@ Validate and create a QinQ rule on port 0 to steer traffic to a queue on the hos
    ID      Group   Prio    Attr    Rule
    0       0       0       i-      ETH VLAN VLAN=>VF QUEUE
    1       0       0       i-      ETH VLAN VLAN=>PF QUEUE
+
+BPF Functions
+--------------
+
+The following sections show functions to load/unload eBPF-based filters.
+
+bpf-load
+~~~~~~~~
+
+Load an eBPF program as a callback for a particular RX/TX queue::
+
+   testpmd> bpf-load rx|tx (portid) (queueid) (load-flags) (bpf-prog-filename)
+
+The available load-flags are:
+
+* ``J``: use JIT generated native code, otherwise BPF interpreter will be used.
+
+* ``M``: assume input parameter is a pointer to rte_mbuf, otherwise assume it is a pointer to first segment's data.
+
+* ``-``: none.
+
+.. note::
+
+   You'll need clang v3.7 or above to build the BPF program you'd like to load.
+
+For example:
+
+.. code-block:: console
+
+   cd test/bpf
+   clang -O2 -target bpf -c t1.c
+
+Then to load (and JIT compile) t1.o at RX queue 0, port 1:
+
+.. code-block:: console
+
+   testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
+
+To load (not JITed) t1.o at TX queue 0, port 0:
+
+.. code-block:: console
+
+   testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/t1.o
+
+bpf-unload
+~~~~~~~~~~
+
+Unload a previously loaded eBPF program for a particular RX/TX queue::
+
+   testpmd> bpf-unload rx|tx (portid) (queueid)
+
+For example, to unload the BPF filter from TX queue 0, port 0:
+
+.. code-block:: console
+
+   testpmd> bpf-unload tx 0 0
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-05-09 17:09         ` Ferruh Yigit
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code Konstantin Ananyev
                           ` (9 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Ferruh Yigit @ 2018-05-09 17:09 UTC (permalink / raw)
  To: Konstantin Ananyev, dev

On 5/4/2018 1:45 PM, Konstantin Ananyev wrote:
> librte_bpf provides a framework to load and execute eBPF bytecode
> inside user-space dpdk based applications.
> It supports basic set of features from eBPF spec
> (https://www.kernel.org/doc/Documentation/networking/filter.txt).
> 
> Not currently supported features:
>  - JIT
>  - cBPF
>  - tail-pointer call
>  - eBPF MAP
>  - skb
>  - function calls for 32-bit apps
>  - mbuf pointer as input parameter for 32-bit apps
> 
> It also adds dependency on libelf.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  MAINTAINERS                        |   4 +
>  config/common_base                 |   5 +
>  lib/Makefile                       |   2 +
>  lib/librte_bpf/Makefile            |  31 +++
>  lib/librte_bpf/bpf.c               |  59 +++++
>  lib/librte_bpf/bpf_def.h           | 138 +++++++++++
>  lib/librte_bpf/bpf_exec.c          | 453 +++++++++++++++++++++++++++++++++++++
>  lib/librte_bpf/bpf_impl.h          |  41 ++++
>  lib/librte_bpf/bpf_load.c          | 386 +++++++++++++++++++++++++++++++
>  lib/librte_bpf/bpf_validate.c      |  55 +++++
>  lib/librte_bpf/meson.build         |  19 ++
>  lib/librte_bpf/rte_bpf.h           | 184 +++++++++++++++
>  lib/librte_bpf/rte_bpf_version.map |  12 +
>  lib/meson.build                    |   2 +-
>  mk/rte.app.mk                      |   2 +

Can you please update release notes "Shared Library Versions" section and add
this new library?

>  15 files changed, 1392 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_bpf/Makefile
>  create mode 100644 lib/librte_bpf/bpf.c
>  create mode 100644 lib/librte_bpf/bpf_def.h
>  create mode 100644 lib/librte_bpf/bpf_exec.c
>  create mode 100644 lib/librte_bpf/bpf_impl.h
>  create mode 100644 lib/librte_bpf/bpf_load.c
>  create mode 100644 lib/librte_bpf/bpf_validate.c
>  create mode 100644 lib/librte_bpf/meson.build
>  create mode 100644 lib/librte_bpf/rte_bpf.h
>  create mode 100644 lib/librte_bpf/rte_bpf_version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ce06e93c2..4a7edbcf7 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1049,6 +1049,10 @@ Latency statistics
>  M: Reshma Pattan <reshma.pattan@intel.com>
>  F: lib/librte_latencystats/
>  
> +BPF
> +M: Konstantin Ananyev <konstantin.ananyev@intel.com>
> +F: lib/librte_bpf/
> +F: doc/guides/prog_guide/bpf_lib.rst

This file will be added in later patches; it would be possible to add this line in
the patch that adds the document.

>  
>  Test Applications
>  -----------------
> diff --git a/config/common_base b/config/common_base
> index 03a8688b5..ac425491c 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -863,3 +863,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
>  # Compile the eventdev application
>  #
>  CONFIG_RTE_APP_EVENTDEV=y
> +
> +#
> +# Compile librte_bpf
> +#
> +CONFIG_RTE_LIBRTE_BPF=y

This is enabled by default for all architectures and environments. Just to double
check: are all architectures supported? I am not able to test because libelf.h is
missing in my cross environment.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v5 4/8] bpf: introduce basic RX/TX BPF filters
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 4/8] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-05-09 17:09         ` Ferruh Yigit
  0 siblings, 0 replies; 83+ messages in thread
From: Ferruh Yigit @ 2018-05-09 17:09 UTC (permalink / raw)
  To: Konstantin Ananyev, dev

On 5/4/2018 1:45 PM, Konstantin Ananyev wrote:
> +/**
> + * Load BPF program from the ELF file and install callback to execute it
> + * on given RX port/queue.
> + *
> + * @param port
> + *   The identifier of the ethernet port
> + * @param queue
> + *   The identifier of the RX queue on the given port
> + * @param fname
> + *  Pathname for a ELF file.
> + * @param sname
> + *  Name of the executable section within the file to load.
> + * @return
> + *   Zero on successful completion or negative error code otherwise.
> + */
> +int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
> +	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
> +	uint32_t flags);
> +
> +/**
> + * Load BPF program from the ELF file and install callback to execute it
> + * on given TX port/queue.
> + *
> + * @param port
> + *   The identifier of the ethernet port
> + * @param queue
> + *   The identifier of the TX queue on the given port
> + * @param fname
> + *  Pathname for a ELF file.
> + * @param sname
> + *  Name of the executable section within the file to load.
> + * @return
> + *   Zero on successful completion or negative error code otherwise.
> + */
> +int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
> +	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
> +	uint32_t flags);

Doxygen is complaining about undocumented params for both functions:
prm & flags are missing.
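
Something roughly like this would do (wording is only a suggestion; "prm" and
"flags" are the parameter names from the prototypes above):

 * @param prm
 *   BPF program parameters (instructions, input argument description, etc.).
 * @param flags
 *   Flags that control how the program is loaded and installed.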

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload BPF filters
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-05-09 17:09         ` Ferruh Yigit
  2018-05-09 18:31           ` Kevin Traynor
  0 siblings, 1 reply; 83+ messages in thread
From: Ferruh Yigit @ 2018-05-09 17:09 UTC (permalink / raw)
  To: Konstantin Ananyev, dev

On 5/4/2018 1:45 PM, Konstantin Ananyev wrote:
> Introduce new testpmd commands to load/unload RX/TX BPF-based filters.
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  app/test-pmd/bpf_sup.h   |  25 ++++++++
>  app/test-pmd/cmdline.c   | 149 +++++++++++++++++++++++++++++++++++++++++++++++
>  app/test-pmd/meson.build |   2 +-
>  3 files changed, 175 insertions(+), 1 deletion(-)
>  create mode 100644 app/test-pmd/bpf_sup.h
> 
> diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
> new file mode 100644
> index 000000000..35f91a07f
> --- /dev/null
> +++ b/app/test-pmd/bpf_sup.h
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2017 Intel Corporation
> + */
> +
> +#ifndef _BPF_SUP_H_
> +#define _BPF_SUP_H_
> +
> +#include <stdio.h>
> +#include <rte_mbuf.h>
> +#include <rte_bpf_ethdev.h>

This makes testpmd depend on librte_bpf unconditionally. What do you think about
using the RTE_LIBRTE_BPF macro to make it possible to disable the bpf library?

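Just as a sketch of what I mean (assuming the RTE_LIBRTE_BPF config option
added by this series; the final guard may of course look different):

  #ifdef RTE_LIBRTE_BPF
  #include <rte_bpf_ethdev.h>
  #endif

plus guarding the bpf command registration and the Makefile/meson entries
the same way.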
<...>

> @@ -16695,6 +16842,8 @@ cmdline_parse_ctx_t main_ctx[] = {
>  	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
>  	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
>  	(cmdline_parse_inst_t *)&cmd_cfg_tunnel_udp_port,
> +	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
> +	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,

It may be better to move the testpmd_funcs.rst updates from the doc patch into this
patch; it helps with confirming both the doc and the patch.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-05-09 17:11         ` Ferruh Yigit
  0 siblings, 0 replies; 83+ messages in thread
From: Ferruh Yigit @ 2018-05-09 17:11 UTC (permalink / raw)
  To: Konstantin Ananyev, dev

On 5/4/2018 1:45 PM, Konstantin Ananyev wrote:
> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
> 
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
>  As an example to:
> - packet filtering/tracing (aka tcpdump)
> - packet classification
> - statistics collection
> - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>   internal PMD SW state, etc.
> - Come up with your own idea
> 
> All of that in a dynamic, user-defined and extensible manner.
> 
> So these series introduce new library - librte_bpf.
> librte_bpf provides API to load and execute BPF bytecode within
> user-space dpdk app.
> It supports basic set of features from eBPF spec.
> Also it introduces basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).
> 
> How to try it:
> ===============
> 
> 1) run testpmd as usual and start your favorite forwarding case.
> 2) build bpf program you'd like to load
> (you'll need clang v3.7 or above):
> $ cd test/bpf
> $ clang -O2 -target bpf -c t1.c
> 
> 3) load bpf program(s):
> testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
> 
> <load-flags>:  [-][J][M]
> J - use JIT generated native code, otherwise BPF interpreter will be used.
> M - assume input parameter is a pointer to rte_mbuf,
>     otherwise assume it is a pointer to first segment's data.
> 
> Few examples:
> 
> # to load (not JITed) dummy.o at TX queue 0, port 0:
> testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
> #to load (and JIT compile) t1.o at RX queue 0, port 1:
> testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
> 
> #to load and JIT t3.o (note that it expects mbuf as an input):
> testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
> 
> 4) observe changed traffic behavior
> Let say with the examples above:
>  - dummy.o  does literally nothing, so no changes should be here,
>    except some possible slowdown.
>  - t1.o - should force to drop all packets that doesn't match:
>    'dst 1.2.3.4 && udp && dst port 5000' filter.
>  - t3.o - should dump to stdout ARP packets.
> 
> 5) unload some or all bpf programs:
> testpmd> bpf-unload tx 0 0
> 
> 6) continue with step 3) or exit
> 
> Not currently supported features:
> =================================
> - cBPF
> - tail-pointer call
> - eBPF MAP
> - JIT for non X86_64 targets
> - skb
> - function calls for 32-bit apps
> - mbuf pointer as input parameter for 32-bit apps
> 
> v2:
>  - add meson build
>  - add freebsd build
>  - use new logging API
>  - using rte_malloc() for cbi allocation
>  - add extra logic into bpf_validate()
> 
> v3:
>  - add new test-case for it
>  - update docs
>  - update MAINTAINERS
> 
> v4:
>  - add more tests to cover BPF ISA
>  - fix few issues
> 
> v5:
>  - revert changes in tap_bpf.h
>  - rename eBPF related defines
>  - apply Thomas and Marco and Marco comments
> 
> Konstantin Ananyev (8):
>   bpf: add BPF loading and execution framework
>   bpf: add more logic into bpf_validate()
>   bpf: add JIT compilation for x86_64 ISA
>   bpf: introduce basic RX/TX BPF filters
>   testpmd: new commands to load/unload BPF filters
>   test: add few eBPF samples
>   test: introduce functional test for librte_bpf
>   doc: add bpf library related info

Not able to verify build for other architectures but x86 looks fine.

Some minor comments sent to the individual patches.

Taking into account that this is a new library not affecting other pieces and that
it is added as experimental with documentation and unit tests provided:

For series,
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload BPF filters
  2018-05-09 17:09         ` Ferruh Yigit
@ 2018-05-09 18:31           ` Kevin Traynor
  0 siblings, 0 replies; 83+ messages in thread
From: Kevin Traynor @ 2018-05-09 18:31 UTC (permalink / raw)
  To: Ferruh Yigit, Konstantin Ananyev, dev

On 05/09/2018 06:09 PM, Ferruh Yigit wrote:
> On 5/4/2018 1:45 PM, Konstantin Ananyev wrote:
>> Introduce new testpmd commands to load/unload RX/TX BPF-based filters.
>>
>> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
>> ---
>>  app/test-pmd/bpf_sup.h   |  25 ++++++++
>>  app/test-pmd/cmdline.c   | 149 +++++++++++++++++++++++++++++++++++++++++++++++
>>  app/test-pmd/meson.build |   2 +-
>>  3 files changed, 175 insertions(+), 1 deletion(-)
>>  create mode 100644 app/test-pmd/bpf_sup.h
>>
>> diff --git a/app/test-pmd/bpf_sup.h b/app/test-pmd/bpf_sup.h
>> new file mode 100644
>> index 000000000..35f91a07f
>> --- /dev/null
>> +++ b/app/test-pmd/bpf_sup.h
>> @@ -0,0 +1,25 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2010-2017 Intel Corporation
>> + */
>> +
>> +#ifndef _BPF_SUP_H_
>> +#define _BPF_SUP_H_
>> +
>> +#include <stdio.h>
>> +#include <rte_mbuf.h>
>> +#include <rte_bpf_ethdev.h>
> 
> This makes testpmd depend on librte_bpf unconditionally. What do you think about
> using the RTE_LIBRTE_BPF macro to make it possible to disable the bpf library?
> 

+1

> <...>
> 
>> @@ -16695,6 +16842,8 @@ cmdline_parse_ctx_t main_ctx[] = {
>>  	(cmdline_parse_inst_t *)&cmd_set_port_tm_node_parent,
>>  	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
>>  	(cmdline_parse_inst_t *)&cmd_cfg_tunnel_udp_port,
>> +	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
>> +	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
> 
> It may be better to move the testpmd_funcs.rst updates from the doc patch into this
> patch; it helps with confirming both the doc and the patch.
> 

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
  2018-05-09 17:09         ` Ferruh Yigit
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-11 14:23           ` Ferruh Yigit
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 1/9] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (8 subsequent siblings)
  10 siblings, 1 reply; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

BPF is used quite intensively inside Linux (and BSD) kernels
for various different purposes and proved to be extremely useful.

BPF inside DPDK might also be used in a lot of places
for a lot of similar things.
 As an example to:
- packet filtering/tracing (aka tcpdump)
- packet classification
- statistics collection
- HW/PMD live-system debugging/prototyping - trace HW descriptors,
  internal PMD SW state, etc.
- Come up with your own idea

All of that in a dynamic, user-defined and extensible manner.

So these series introduce new library - librte_bpf.
librte_bpf provides API to load and execute BPF bytecode within
user-space dpdk app.
It supports basic set of features from eBPF spec.
Also it introduces basic framework to load/unload BPF-based filters
on eth devices (right now via SW RX/TX callbacks).
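
To give a rough idea of the core API, here is a minimal sketch modelled on the
test_bpf.c cases in this series (the wrapper run_once() below is just an
illustration, not part of the library):

#include <rte_bpf.h>

static const struct ebpf_insn prog[] = {
	/* return 1 */
	{
		.code = (BPF_ALU | EBPF_MOV | BPF_K),
		.dst_reg = EBPF_REG_0,
		.imm = 1,
	},
	{
		.code = (BPF_JMP | EBPF_EXIT),
	},
};

static int
run_once(void *arg, size_t arg_sz)
{
	struct rte_bpf *bpf;
	uint64_t rc;
	const struct rte_bpf_prm prm = {
		.ins = prog,
		.nb_ins = RTE_DIM(prog),
		.prog_arg = {
			.type = RTE_BPF_ARG_PTR,
			.size = arg_sz,
		},
	};

	/* create an execution context from the instruction array */
	bpf = rte_bpf_load(&prm);
	if (bpf == NULL)
		return -rte_errno;

	/* run the bytecode over the user-provided argument */
	rc = rte_bpf_exec(bpf, arg);

	rte_bpf_destroy(bpf);
	return (rc == 1) ? 0 : -1;
}

rte_bpf_get_jit() can then be used to retrieve the JIT-generated entry point,
as the functional test in this series does.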

How to try it:
===============

1) run testpmd as usual and start your favorite forwarding case.
2) build bpf program you'd like to load
(you'll need clang v3.7 or above):
$ cd test/bpf
$ clang -O2 -target bpf -c t1.c
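
A trivial program of this kind looks roughly like the following (in the spirit
of test/bpf/dummy.c, which accepts every packet; the entry point name here is
illustrative):

#include <stdint.h>

uint64_t
entry(void *arg)
{
	/* accept every packet */
	(void)arg;
	return 1;
}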

3) load bpf program(s):
testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>

<load-flags>:  [-][J][M]
J - use JIT generated native code, otherwise BPF interpreter will be used.
M - assume input parameter is a pointer to rte_mbuf,
    otherwise assume it is a pointer to first segment's data.

Few examples:

# to load (not JITed) dummy.o at TX queue 0, port 0:
testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
#to load (and JIT compile) t1.o at RX queue 0, port 1:
testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o

#to load and JIT t3.o (note that it expects mbuf as an input):
testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o


4) observe changed traffic behavior
Let say with the examples above:
 - dummy.o  does literally nothing, so no changes should be here,
   except some possible slowdown.
 - t1.o - should force to drop all packets that doesn't match:
   'dst 1.2.3.4 && udp && dst port 5000' filter.
 - t3.o - should dump to stdout ARP packets.

5) unload some or all bpf programs:
testpmd> bpf-unload tx 0 0

6) continue with step 3) or exit

Not currently supported features:
=================================
- cBPF
- tail-pointer call
- eBPF MAP
- JIT for non X86_64 targets
- skb
- function calls for 32-bit apps
- mbuf pointer as input parameter for 32-bit apps

v2:
 - add meson build
 - add freebsd build
 - use new logging API
 - using rte_malloc() for cbi allocation
 - add extra logic into bpf_validate()

v3:
 - add new test-case for it
 - update docs
 - update MAINTAINERS

v4:
 - add more tests to cover BPF ISA
 - fix few issues

v5:
 - revert changes in tap_bpf.h
 - rename eBPF related defines
 - apply Thomas and Marco comments

v6:
 Address Thomas's, Kevin's and Ferruh's comments:
 - gracefully handle the case when libelf is not installed
 - allow testpmd to be built without librte_bpf
 - doc nits

Konstantin Ananyev (9):
  bpf: add BPF loading and execution framework
  bpf: add ability to load eBPF program from ELF object file
  bpf: add more logic into bpf_validate()
  bpf: add JIT compilation for x86_64 ISA
  bpf: introduce basic RX/TX BPF filters
  testpmd: new commands to load/unload BPF filters
  test: add few eBPF samples
  test: introduce functional test for librte_bpf
  doc: add bpf library related info

 MAINTAINERS                                 |    4 +
 app/test-pmd/Makefile                       |    1 +
 app/test-pmd/bpf_cmd.c                      |  175 +++
 app/test-pmd/bpf_cmd.h                      |   16 +
 app/test-pmd/cmdline.c                      |    5 +
 app/test-pmd/meson.build                    |    4 +
 config/common_base                          |    8 +
 doc/api/doxy-api-index.md                   |    3 +-
 doc/api/doxy-api.conf                       |    1 +
 doc/guides/prog_guide/bpf_lib.rst           |   38 +
 doc/guides/prog_guide/index.rst             |    1 +
 doc/guides/rel_notes/release_18_05.rst      |    8 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |   56 +
 lib/Makefile                                |    2 +
 lib/librte_bpf/Makefile                     |   41 +
 lib/librte_bpf/bpf.c                        |   64 +
 lib/librte_bpf/bpf_def.h                    |  138 +++
 lib/librte_bpf/bpf_exec.c                   |  453 +++++++
 lib/librte_bpf/bpf_impl.h                   |   41 +
 lib/librte_bpf/bpf_jit_x86.c                | 1369 +++++++++++++++++++++
 lib/librte_bpf/bpf_load.c                   |  101 ++
 lib/librte_bpf/bpf_load_elf.c               |  322 +++++
 lib/librte_bpf/bpf_pkt.c                    |  607 +++++++++
 lib/librte_bpf/bpf_validate.c               | 1184 ++++++++++++++++++
 lib/librte_bpf/meson.build                  |   25 +
 lib/librte_bpf/rte_bpf.h                    |  184 +++
 lib/librte_bpf/rte_bpf_ethdev.h             |  112 ++
 lib/librte_bpf/rte_bpf_version.map          |   16 +
 lib/meson.build                             |    2 +-
 mk/rte.app.mk                               |    5 +
 test/bpf/dummy.c                            |   20 +
 test/bpf/mbuf.h                             |  578 +++++++++
 test/bpf/t1.c                               |   52 +
 test/bpf/t2.c                               |   31 +
 test/bpf/t3.c                               |   36 +
 test/test/Makefile                          |    2 +
 test/test/meson.build                       |    2 +
 test/test/test_bpf.c                        | 1759 +++++++++++++++++++++++++++
 38 files changed, 7464 insertions(+), 2 deletions(-)
 create mode 100644 app/test-pmd/bpf_cmd.c
 create mode 100644 app/test-pmd/bpf_cmd.h
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_def.h
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_load_elf.c
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c
 create mode 100644 test/test/test_bpf.c

-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 1/9] bpf: add BPF loading and execution framework
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
  2018-05-09 17:09         ` Ferruh Yigit
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 2/9] bpf: add ability to load eBPF program from ELF object file Konstantin Ananyev
                           ` (7 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

librte_bpf provides a framework to load and execute eBPF bytecode
inside user-space DPDK-based applications.
It supports a basic set of features from the eBPF spec
(https://www.kernel.org/doc/Documentation/networking/filter.txt).

Not currently supported features:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb
 - function calls for 32-bit apps
 - mbuf pointer as input parameter for 32-bit apps
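
To illustrate the API (this snippet is mine, not part of the patch), a
minimal user could load a two-instruction program that just returns 1
and run it once:

/* Usage sketch, assuming a DPDK application context. */
#include <errno.h>
#include <rte_bpf.h>
#include <rte_errno.h>

static const struct ebpf_insn prog[] = {
	/* r0 = 1 */
	{ .code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
	  .dst_reg = EBPF_REG_0, .imm = 1, },
	/* return r0 */
	{ .code = (BPF_JMP | EBPF_EXIT), },
};

static int
run_once(void *ctx)
{
	uint64_t rc;
	struct rte_bpf *bpf;
	const struct rte_bpf_prm prm = {
		.ins = prog,
		.nb_ins = RTE_DIM(prog),
		.prog_arg = { .type = RTE_BPF_ARG_RAW, },
	};

	bpf = rte_bpf_load(&prm);
	if (bpf == NULL)
		return -rte_errno;

	rc = rte_bpf_exec(bpf, ctx);
	rte_bpf_destroy(bpf);
	return (rc == 1) ? 0 : -EINVAL;
}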

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 MAINTAINERS                            |   3 +
 config/common_base                     |   5 +
 doc/guides/rel_notes/release_18_05.rst |   1 +
 lib/Makefile                           |   2 +
 lib/librte_bpf/Makefile                |  30 +++
 lib/librte_bpf/bpf.c                   |  59 +++++
 lib/librte_bpf/bpf_def.h               | 138 ++++++++++
 lib/librte_bpf/bpf_exec.c              | 453 +++++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h              |  41 +++
 lib/librte_bpf/bpf_load.c              |  85 +++++++
 lib/librte_bpf/bpf_validate.c          |  55 ++++
 lib/librte_bpf/meson.build             |  13 +
 lib/librte_bpf/rte_bpf.h               | 164 ++++++++++++
 lib/librte_bpf/rte_bpf_version.map     |  11 +
 lib/meson.build                        |   2 +-
 mk/rte.app.mk                          |   2 +
 16 files changed, 1063 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_def.h
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/MAINTAINERS b/MAINTAINERS
index 7105920f3..7350f61ed 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1060,6 +1060,9 @@ Latency statistics
 M: Reshma Pattan <reshma.pattan@intel.com>
 F: lib/librte_latencystats/
 
+BPF
+M: Konstantin Ananyev <konstantin.ananyev@intel.com>
+F: lib/librte_bpf/
 
 Test Applications
 -----------------
diff --git a/config/common_base b/config/common_base
index 0d181ace8..b598feae8 100644
--- a/config/common_base
+++ b/config/common_base
@@ -873,3 +873,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 718734852..5d1cc1807 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -432,6 +432,7 @@ The libraries prepended with a plus sign were incremented in this version.
      librte_acl.so.2
      librte_bbdev.so.1
      librte_bitratestats.so.2
+   + librte_bpf.so.1
      librte_bus_dpaa.so.1
      librte_bus_fslmc.so.1
      librte_bus_pci.so.1
diff --git a/lib/Makefile b/lib/Makefile
index 057bf7890..29cea6429 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -98,6 +98,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ethdev
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ethdev librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ethdev
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
new file mode 100644
index 000000000..da9306564
--- /dev/null
+++ b/lib/librte_bpf/Makefile
@@ -0,0 +1,30 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_bpf.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+CFLAGS += -DALLOW_EXPERIMENTAL_API
+LDLIBS += -lrte_net -lrte_eal
+LDLIBS += -lrte_mempool -lrte_ring
+LDLIBS += -lrte_mbuf -lrte_ethdev
+
+EXPORT_MAP := rte_bpf_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+
+# install header files
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
new file mode 100644
index 000000000..d7f68c017
--- /dev/null
+++ b/lib/librte_bpf/bpf.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+int rte_bpf_logtype;
+
+__rte_experimental void
+rte_bpf_destroy(struct rte_bpf *bpf)
+{
+	if (bpf != NULL) {
+		if (bpf->jit.func != NULL)
+			munmap(bpf->jit.func, bpf->jit.sz);
+		munmap(bpf, bpf->sz);
+	}
+}
+
+__rte_experimental int
+rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit)
+{
+	if (bpf == NULL || jit == NULL)
+		return -EINVAL;
+
+	jit[0] = bpf->jit;
+	return 0;
+}
+
+int
+bpf_jit(struct rte_bpf *bpf)
+{
+	int32_t rc;
+
+	rc = -ENOTSUP;
+	if (rc != 0)
+		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
+
+RTE_INIT(rte_bpf_init_log);
+
+static void
+rte_bpf_init_log(void)
+{
+	rte_bpf_logtype = rte_log_register("lib.bpf");
+	if (rte_bpf_logtype >= 0)
+		rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO);
+}
diff --git a/lib/librte_bpf/bpf_def.h b/lib/librte_bpf/bpf_def.h
new file mode 100644
index 000000000..6b69de345
--- /dev/null
+++ b/lib/librte_bpf/bpf_def.h
@@ -0,0 +1,138 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 1982, 1986, 1990, 1993
+ *      The Regents of the University of California.
+ * Copyright(c) 2018 Intel Corporation.
+ */
+
+#ifndef _RTE_BPF_DEF_H_
+#define _RTE_BPF_DEF_H_
+
+/**
+ * @file
+ *
+ * classic BPF (cBPF) and extended BPF (eBPF) related defines.
+ * For more information regarding cBPF and eBPF ISA and their differences,
+ * please refer to:
+ * https://www.kernel.org/doc/Documentation/networking/filter.txt.
+ * As a rule of thumb for that file:
+ * all definitions used by both cBPF and eBPF start with bpf(BPF)_ prefix,
+ * while eBPF-only ones start with an ebpf(EBPF) prefix.
+ */
+
+#include <stdint.h>
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * The instruction encodings.
+ */
+
+/* Instruction classes */
+#define BPF_CLASS(code) ((code) & 0x07)
+#define	BPF_LD		0x00
+#define	BPF_LDX		0x01
+#define	BPF_ST		0x02
+#define	BPF_STX		0x03
+#define	BPF_ALU		0x04
+#define	BPF_JMP		0x05
+#define	BPF_RET		0x06
+#define	BPF_MISC        0x07
+
+#define EBPF_ALU64	0x07
+
+/* ld/ldx fields */
+#define BPF_SIZE(code)  ((code) & 0x18)
+#define	BPF_W		0x00
+#define	BPF_H		0x08
+#define	BPF_B		0x10
+#define	EBPF_DW		0x18
+
+#define BPF_MODE(code)  ((code) & 0xe0)
+#define	BPF_IMM		0x00
+#define	BPF_ABS		0x20
+#define	BPF_IND		0x40
+#define	BPF_MEM		0x60
+#define	BPF_LEN		0x80
+#define	BPF_MSH		0xa0
+
+#define EBPF_XADD	0xc0
+
+/* alu/jmp fields */
+#define BPF_OP(code)    ((code) & 0xf0)
+#define	BPF_ADD		0x00
+#define	BPF_SUB		0x10
+#define	BPF_MUL		0x20
+#define	BPF_DIV		0x30
+#define	BPF_OR		0x40
+#define	BPF_AND		0x50
+#define	BPF_LSH		0x60
+#define	BPF_RSH		0x70
+#define	BPF_NEG		0x80
+#define	BPF_MOD		0x90
+#define	BPF_XOR		0xa0
+
+#define EBPF_MOV	0xb0
+#define EBPF_ARSH	0xc0
+#define EBPF_END	0xd0
+
+#define	BPF_JA		0x00
+#define	BPF_JEQ		0x10
+#define	BPF_JGT		0x20
+#define	BPF_JGE		0x30
+#define	BPF_JSET        0x40
+
+#define EBPF_JNE	0x50
+#define EBPF_JSGT	0x60
+#define EBPF_JSGE	0x70
+#define EBPF_CALL	0x80
+#define EBPF_EXIT	0x90
+#define EBPF_JLT	0xa0
+#define EBPF_JLE	0xb0
+#define EBPF_JSLT	0xc0
+#define EBPF_JSLE	0xd0
+
+#define BPF_SRC(code)   ((code) & 0x08)
+#define	BPF_K		0x00
+#define	BPF_X		0x08
+
+/* if BPF_OP(code) == EBPF_END */
+#define EBPF_TO_LE	0x00  /* convert to little-endian */
+#define EBPF_TO_BE	0x08  /* convert to big-endian */
+
+/*
+ * eBPF registers
+ */
+enum {
+	EBPF_REG_0,  /* return value from internal function/for eBPF program */
+	EBPF_REG_1,  /* 0-th argument to internal function */
+	EBPF_REG_2,  /* 1-th argument to internal function */
+	EBPF_REG_3,  /* 2-th argument to internal function */
+	EBPF_REG_4,  /* 3-th argument to internal function */
+	EBPF_REG_5,  /* 4-th argument to internal function */
+	EBPF_REG_6,  /* callee saved register */
+	EBPF_REG_7,  /* callee saved register */
+	EBPF_REG_8,  /* callee saved register */
+	EBPF_REG_9,  /* callee saved register */
+	EBPF_REG_10, /* stack pointer (read-only) */
+	EBPF_REG_NUM,
+};
+
+/*
+ * eBPF instruction format
+ */
+struct ebpf_insn {
+	uint8_t code;
+	uint8_t dst_reg:4;
+	uint8_t src_reg:4;
+	int16_t off;
+	int32_t imm;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_DEF_H_ */
diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c
new file mode 100644
index 000000000..e373b1f3d
--- /dev/null
+++ b/lib/librte_bpf/bpf_exec.c
@@ -0,0 +1,453 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define BPF_JMP_UNC(ins)	((ins) += (ins)->off)
+
+#define BPF_JMP_CND_REG(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \
+		(ins)->off : 0)
+
+#define BPF_JMP_CND_IMM(reg, ins, op, type)	\
+	((ins) += \
+		((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? \
+		(ins)->off : 0)
+
+#define BPF_NEG_ALU(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg]))
+
+#define EBPF_MOV_ALU_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg])
+
+#define BPF_OP_ALU_REG(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg])
+
+#define EBPF_MOV_ALU_IMM(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = (type)(ins)->imm)
+
+#define BPF_OP_ALU_IMM(reg, ins, op, type)	\
+	((reg)[(ins)->dst_reg] = \
+		(type)(reg)[(ins)->dst_reg] op (type)(ins)->imm)
+
+#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \
+	if ((type)(reg)[(ins)->src_reg] == 0) { \
+		RTE_BPF_LOG(ERR, \
+			"%s(%p): division by 0 at pc: %#zx;\n", \
+			__func__, bpf, \
+			(uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \
+		return 0; \
+	} \
+} while (0)
+
+#define BPF_LD_REG(reg, ins, type)	\
+	((reg)[(ins)->dst_reg] = \
+		*(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off))
+
+#define BPF_ST_IMM(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(ins)->imm)
+
+#define BPF_ST_REG(reg, ins, type)	\
+	(*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \
+		(type)(reg)[(ins)->src_reg])
+
+#define BPF_ST_XADD_REG(reg, ins, tp)	\
+	(rte_atomic##tp##_add((rte_atomic##tp##_t *) \
+		(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \
+		reg[ins->src_reg]))
+
+static inline void
+bpf_alu_be(uint64_t reg[EBPF_REG_NUM], const struct ebpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_be_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_be_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_be_64(*v);
+		break;
+	}
+}
+
+static inline void
+bpf_alu_le(uint64_t reg[EBPF_REG_NUM], const struct ebpf_insn *ins)
+{
+	uint64_t *v;
+
+	v = reg + ins->dst_reg;
+	switch (ins->imm) {
+	case 16:
+		*v = rte_cpu_to_le_16(*v);
+		break;
+	case 32:
+		*v = rte_cpu_to_le_32(*v);
+		break;
+	case 64:
+		*v = rte_cpu_to_le_64(*v);
+		break;
+	}
+}
+
+static inline uint64_t
+bpf_exec(const struct rte_bpf *bpf, uint64_t reg[EBPF_REG_NUM])
+{
+	const struct ebpf_insn *ins;
+
+	for (ins = bpf->prm.ins; ; ins++) {
+		switch (ins->code) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_K):
+			EBPF_MOV_ALU_IMM(reg, ins, uint32_t);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint32_t);
+			break;
+		case (BPF_ALU | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint32_t);
+			break;
+		case (BPF_ALU | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint32_t);
+			break;
+		case (BPF_ALU | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint32_t);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint32_t);
+			break;
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint32_t);
+			break;
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint32_t);
+			break;
+		case (BPF_ALU | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint32_t);
+			break;
+		case (BPF_ALU | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint32_t);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_X):
+			EBPF_MOV_ALU_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint32_t);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_BE):
+			bpf_alu_be(reg, ins);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_LE):
+			bpf_alu_le(reg, ins);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, +, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_SUB | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, -, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_AND | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, &, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_OR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, |, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, <<, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_RSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, >>, int64_t);
+			break;
+		case (EBPF_ALU64 | BPF_XOR | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, ^, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MUL | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, *, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_DIV | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, /, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MOD | BPF_K):
+			BPF_OP_ALU_IMM(reg, ins, %, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_K):
+			EBPF_MOV_ALU_IMM(reg, ins, uint64_t);
+			break;
+		/* 64 bit ALU REG operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, +, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_SUB | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, -, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_AND | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, &, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_OR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, |, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, <<, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_RSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, >>, int64_t);
+			break;
+		case (EBPF_ALU64 | BPF_XOR | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, ^, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MUL | BPF_X):
+			BPF_OP_ALU_REG(reg, ins, *, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_DIV | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, /, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_MOD | BPF_X):
+			BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t);
+			BPF_OP_ALU_REG(reg, ins, %, uint64_t);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_X):
+			EBPF_MOV_ALU_REG(reg, ins, uint64_t);
+			break;
+		case (EBPF_ALU64 | BPF_NEG):
+			BPF_NEG_ALU(reg, ins, uint64_t);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+			BPF_LD_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_H):
+			BPF_LD_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_LDX | BPF_MEM | BPF_W):
+			BPF_LD_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_LDX | BPF_MEM | EBPF_DW):
+			BPF_LD_REG(reg, ins, uint64_t);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | EBPF_DW):
+			reg[ins->dst_reg] = (uint32_t)ins[0].imm |
+				(uint64_t)(uint32_t)ins[1].imm << 32;
+			ins++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+			BPF_ST_REG(reg, ins, uint8_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_H):
+			BPF_ST_REG(reg, ins, uint16_t);
+			break;
+		case (BPF_STX | BPF_MEM | BPF_W):
+			BPF_ST_REG(reg, ins, uint32_t);
+			break;
+		case (BPF_STX | BPF_MEM | EBPF_DW):
+			BPF_ST_REG(reg, ins, uint64_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+			BPF_ST_IMM(reg, ins, uint8_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_H):
+			BPF_ST_IMM(reg, ins, uint16_t);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_W):
+			BPF_ST_IMM(reg, ins, uint32_t);
+			break;
+		case (BPF_ST | BPF_MEM | EBPF_DW):
+			BPF_ST_IMM(reg, ins, uint64_t);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | EBPF_XADD | BPF_W):
+			BPF_ST_XADD_REG(reg, ins, 32);
+			break;
+		case (BPF_STX | EBPF_XADD | EBPF_DW):
+			BPF_ST_XADD_REG(reg, ins, 64);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			BPF_JMP_UNC(ins);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JNE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLT | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLE | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			BPF_JMP_CND_IMM(reg, ins, &, uint64_t);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, ==, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JNE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, !=, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, uint64_t);
+			break;
+		case (BPF_JMP | BPF_JGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, uint64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLT | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSGE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, >=, int64_t);
+			break;
+		case (BPF_JMP | EBPF_JSLE | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, <=, int64_t);
+			break;
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			BPF_JMP_CND_REG(reg, ins, &, uint64_t);
+			break;
+		/* call instructions */
+		case (BPF_JMP | EBPF_CALL):
+			reg[EBPF_REG_0] = bpf->prm.xsym[ins->imm].func(
+				reg[EBPF_REG_1], reg[EBPF_REG_2],
+				reg[EBPF_REG_3], reg[EBPF_REG_4],
+				reg[EBPF_REG_5]);
+			break;
+		/* return instruction */
+		case (BPF_JMP | EBPF_EXIT):
+			return reg[EBPF_REG_0];
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %#zx;\n",
+				__func__, bpf, ins->code,
+				(uintptr_t)ins - (uintptr_t)bpf->prm.ins);
+			return 0;
+		}
+	}
+
+	/* should never be reached */
+	RTE_VERIFY(0);
+	return 0;
+}
+
+__rte_experimental uint32_t
+rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[],
+	uint32_t num)
+{
+	uint32_t i;
+	uint64_t reg[EBPF_REG_NUM];
+	uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)];
+
+	for (i = 0; i != num; i++) {
+
+		reg[EBPF_REG_1] = (uintptr_t)ctx[i];
+		reg[EBPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack));
+
+		rc[i] = bpf_exec(bpf, reg);
+	}
+
+	return i;
+}
+
+__rte_experimental uint64_t
+rte_bpf_exec(const struct rte_bpf *bpf, void *ctx)
+{
+	uint64_t rc;
+
+	rte_bpf_exec_burst(bpf, &ctx, &rc, 1);
+	return rc;
+}
diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h
new file mode 100644
index 000000000..5d7e65c31
--- /dev/null
+++ b/lib/librte_bpf/bpf_impl.h
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_H_
+#define _BPF_H_
+
+#include <rte_bpf.h>
+#include <sys/mman.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MAX_BPF_STACK_SIZE	0x200
+
+struct rte_bpf {
+	struct rte_bpf_prm prm;
+	struct rte_bpf_jit jit;
+	size_t sz;
+	uint32_t stack_sz;
+};
+
+extern int bpf_validate(struct rte_bpf *bpf);
+
+extern int bpf_jit(struct rte_bpf *bpf);
+
+#ifdef RTE_ARCH_X86_64
+extern int bpf_jit_x86(struct rte_bpf *);
+#endif
+
+extern int rte_bpf_logtype;
+
+#define	RTE_BPF_LOG(lvl, fmt, args...) \
+	rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _BPF_H_ */
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
new file mode 100644
index 000000000..f28ecfb4d
--- /dev/null
+++ b/lib/librte_bpf/bpf_load.c
@@ -0,0 +1,85 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+static struct rte_bpf *
+bpf_load(const struct rte_bpf_prm *prm)
+{
+	uint8_t *buf;
+	struct rte_bpf *bpf;
+	size_t sz, bsz, insz, xsz;
+
+	xsz =  prm->nb_xsym * sizeof(prm->xsym[0]);
+	insz = prm->nb_ins * sizeof(prm->ins[0]);
+	bsz = sizeof(bpf[0]);
+	sz = insz + xsz + bsz;
+
+	buf = mmap(NULL, sz, PROT_READ | PROT_WRITE,
+		MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (buf == MAP_FAILED)
+		return NULL;
+
+	bpf = (void *)buf;
+	bpf->sz = sz;
+
+	memcpy(&bpf->prm, prm, sizeof(bpf->prm));
+
+	memcpy(buf + bsz, prm->xsym, xsz);
+	memcpy(buf + bsz + xsz, prm->ins, insz);
+
+	bpf->prm.xsym = (void *)(buf + bsz);
+	bpf->prm.ins = (void *)(buf + bsz + xsz);
+
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_load(const struct rte_bpf_prm *prm)
+{
+	struct rte_bpf *bpf;
+	int32_t rc;
+
+	if (prm == NULL || prm->ins == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load(prm);
+	if (bpf == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	rc = bpf_validate(bpf);
+	if (rc == 0) {
+		bpf_jit(bpf);
+		if (mprotect(bpf, bpf->sz, PROT_READ) != 0)
+			rc = -ENOMEM;
+	}
+
+	if (rc != 0) {
+		rte_bpf_destroy(bpf);
+		rte_errno = -rc;
+		return NULL;
+	}
+
+	return bpf;
+}
diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
new file mode 100644
index 000000000..6a1b33181
--- /dev/null
+++ b/lib/librte_bpf/bpf_validate.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_eal.h>
+
+#include "bpf_impl.h"
+
+/*
+ * dummy one for now, need more work.
+ */
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc, ofs, stack_sz;
+	uint32_t i, op, dr;
+	const struct ebpf_insn *ins;
+
+	rc = 0;
+	stack_sz = 0;
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		ins = bpf->prm.ins + i;
+		op = ins->code;
+		dr = ins->dst_reg;
+		ofs = ins->off;
+
+		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
+				dr == EBPF_REG_10) {
+			ofs -= sizeof(uint64_t);
+			stack_sz = RTE_MIN(ofs, stack_sz);
+		}
+	}
+
+	if (stack_sz != 0) {
+		stack_sz = -stack_sz;
+		if (stack_sz > MAX_BPF_STACK_SIZE)
+			rc = -ERANGE;
+		else
+			bpf->stack_sz = stack_sz;
+	}
+
+	if (rc != 0)
+		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
+			__func__, bpf, rc);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
new file mode 100644
index 000000000..4fa000f5a
--- /dev/null
+++ b/lib/librte_bpf/meson.build
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2018 Intel Corporation
+
+allow_experimental_apis = true
+sources = files('bpf.c',
+		'bpf_exec.c',
+		'bpf_load.c',
+		'bpf_validate.c')
+
+install_headers = files('bpf_def.h',
+			'rte_bpf.h')
+
+deps += ['mbuf', 'net']
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
new file mode 100644
index 000000000..4b3d970b9
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf.h
@@ -0,0 +1,164 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_H_
+#define _RTE_BPF_H_
+
+/**
+ * @file
+ *
+ * RTE BPF support.
+ * librte_bpf provides a framework to load and execute eBPF bytecode
+ * inside user-space dpdk based applications.
+ * It supports basic set of features from eBPF spec
+ * (https://www.kernel.org/doc/Documentation/networking/filter.txt).
+ */
+
+#include <rte_common.h>
+#include <rte_mbuf.h>
+#include <bpf_def.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Possible types for function/BPF program arguments.
+ */
+enum rte_bpf_arg_type {
+	RTE_BPF_ARG_UNDEF,      /**< undefined */
+	RTE_BPF_ARG_RAW,        /**< scalar value */
+	RTE_BPF_ARG_PTR = 0x10, /**< pointer to data buffer */
+	RTE_BPF_ARG_PTR_MBUF,   /**< pointer to rte_mbuf */
+	RTE_BPF_ARG_PTR_STACK,
+};
+
+/**
+ * function argument information
+ */
+struct rte_bpf_arg {
+	enum rte_bpf_arg_type type;
+	size_t size;     /**< for pointer types, size of data it points to */
+	size_t buf_size;
+	/**< for mbuf ptr type, max size of rte_mbuf data buffer */
+};
+
+/**
+ * determine whether the argument is a pointer
+ */
+#define RTE_BPF_ARG_PTR_TYPE(x)	((x) & RTE_BPF_ARG_PTR)
+
+/**
+ * Possible types for external symbols.
+ */
+enum rte_bpf_xtype {
+	RTE_BPF_XTYPE_FUNC, /**< function */
+	RTE_BPF_XTYPE_VAR,  /**< variable */
+	RTE_BPF_XTYPE_NUM
+};
+
+/**
+ * Definition for external symbols available in the BPF program.
+ */
+struct rte_bpf_xsym {
+	const char *name;        /**< name */
+	enum rte_bpf_xtype type; /**< type */
+	union {
+		uint64_t (*func)(uint64_t, uint64_t, uint64_t,
+				uint64_t, uint64_t);
+		void *var;
+	}; /**< value */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct ebpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym; /**< number of elements in xsym */
+	struct rte_bpf_arg prog_arg; /**< eBPF program input arg description */
+};
+
+/**
+ * Information about compiled into native ISA eBPF code.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *); /**< JIT-ed native code */
+	size_t sz;                /**< size of JIT-ed code */
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for the given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ea1d621c4
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,11 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index 166905c1c..9635aff41 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 26f35630a..9d3c421cc 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -82,6 +82,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf
+
 _LDLIBS-y += --whole-archive
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 2/9] bpf: add ability to load eBPF program from ELF object file
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (2 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 1/9] bpf: add BPF loading and execution framework Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 3/9] bpf: add more logic into bpf_validate() Konstantin Ananyev
                           ` (6 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce the rte_bpf_elf_load() function to provide the ability to
load an eBPF program from an ELF object file.
It also adds a dependency on libelf.
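
A usage sketch (mine, not from the patch; the ".text" section name and
the input-argument description below are assumptions - pick whatever
section your object file actually places the code in):

#include <stdio.h>
#include <rte_bpf.h>
#include <rte_errno.h>

static struct rte_bpf *
load_from_elf(const char *fname)
{
	struct rte_bpf *bpf;
	const struct rte_bpf_prm prm = {
		/* no external symbols, program gets a pointer to raw data */
		.prog_arg = { .type = RTE_BPF_ARG_PTR, .size = 64, },
	};

	bpf = rte_bpf_elf_load(&prm, fname, ".text");
	if (bpf == NULL)
		printf("rte_bpf_elf_load(%s) failed, rte_errno=%d\n",
			fname, rte_errno);
	return bpf;
}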

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 config/common_base                 |   3 +
 lib/librte_bpf/Makefile            |   6 +
 lib/librte_bpf/bpf_load.c          |  16 ++
 lib/librte_bpf/bpf_load_elf.c      | 322 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   6 +
 lib/librte_bpf/rte_bpf.h           |  20 +++
 lib/librte_bpf/rte_bpf_version.map |   1 +
 mk/rte.app.mk                      |   3 +
 8 files changed, 377 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_load_elf.c

diff --git a/config/common_base b/config/common_base
index b598feae8..6a1908104 100644
--- a/config/common_base
+++ b/config/common_base
@@ -878,3 +878,6 @@ CONFIG_RTE_APP_EVENTDEV=y
 # Compile librte_bpf
 #
 CONFIG_RTE_LIBRTE_BPF=y
+
+# allow loading BPF from ELF files (requires libelf)
+CONFIG_RTE_LIBRTE_BPF_ELF=n
diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index da9306564..885a31381 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -12,6 +12,9 @@ CFLAGS += -DALLOW_EXPERIMENTAL_API
 LDLIBS += -lrte_net -lrte_eal
 LDLIBS += -lrte_mempool -lrte_ring
 LDLIBS += -lrte_mbuf -lrte_ethdev
+ifeq ($(CONFIG_RTE_LIBRTE_BPF_ELF),y)
+LDLIBS += -lelf
+endif
 
 EXPORT_MAP := rte_bpf_version.map
 
@@ -22,6 +25,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
+ifeq ($(CONFIG_RTE_LIBRTE_BPF_ELF),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load_elf.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c
index f28ecfb4d..d1c9abd7f 100644
--- a/lib/librte_bpf/bpf_load.c
+++ b/lib/librte_bpf/bpf_load.c
@@ -83,3 +83,19 @@ rte_bpf_load(const struct rte_bpf_prm *prm)
 
 	return bpf;
 }
+
+__rte_experimental __attribute__ ((weak)) struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	RTE_BPF_LOG(ERR, "%s() is not supported with current config\n"
+		"rebuild with libelf installed\n",
+		__func__);
+	rte_errno = ENOTSUP;
+	return NULL;
+}
diff --git a/lib/librte_bpf/bpf_load_elf.c b/lib/librte_bpf/bpf_load_elf.c
new file mode 100644
index 000000000..6ab03d86e
--- /dev/null
+++ b/lib/librte_bpf/bpf_load_elf.c
@@ -0,0 +1,322 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/queue.h>
+#include <fcntl.h>
+
+#include <libelf.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+
+#include "bpf_impl.h"
+
+/* define EM_BPF locally to cope with older ELF headers that lack it */
+#ifndef EM_BPF
+#define	EM_BPF	247
+#endif
+
+static uint32_t
+bpf_find_xsym(const char *sn, enum rte_bpf_xtype type,
+	const struct rte_bpf_xsym fp[], uint32_t fn)
+{
+	uint32_t i;
+
+	if (sn == NULL || fp == NULL)
+		return UINT32_MAX;
+
+	for (i = 0; i != fn; i++) {
+		if (fp[i].type == type && strcmp(sn, fp[i].name) == 0)
+			break;
+	}
+
+	return (i != fn) ? i : UINT32_MAX;
+}
+
+/*
+ * update BPF code at offset *ofs* with a proper address(index) for external
+ * symbol *sn*
+ */
+static int
+resolve_xsym(const char *sn, size_t ofs, struct ebpf_insn *ins, size_t ins_sz,
+	const struct rte_bpf_prm *prm)
+{
+	uint32_t idx, fidx;
+	enum rte_bpf_xtype type;
+
+	if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz)
+		return -EINVAL;
+
+	idx = ofs / sizeof(ins[0]);
+	if (ins[idx].code == (BPF_JMP | EBPF_CALL))
+		type = RTE_BPF_XTYPE_FUNC;
+	else if (ins[idx].code == (BPF_LD | BPF_IMM | EBPF_DW) &&
+			ofs < ins_sz - sizeof(ins[idx]))
+		type = RTE_BPF_XTYPE_VAR;
+	else
+		return -EINVAL;
+
+	fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym);
+	if (fidx == UINT32_MAX)
+		return -ENOENT;
+
+	/* for function we just need an index in our xsym table */
+	if (type == RTE_BPF_XTYPE_FUNC)
+		ins[idx].imm = fidx;
+	/* for variable we need to store its absolute address */
+	else {
+		ins[idx].imm = (uintptr_t)prm->xsym[fidx].var;
+		ins[idx + 1].imm =
+			(uint64_t)(uintptr_t)prm->xsym[fidx].var >> 32;
+	}
+
+	return 0;
+}
+
+static int
+check_elf_header(const Elf64_Ehdr *eh)
+{
+	const char *err;
+
+	err = NULL;
+
+#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
+	if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
+#else
+	if (eh->e_ident[EI_DATA] != ELFDATA2MSB)
+#endif
+		err = "not native byte order";
+	else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE)
+		err = "unexpected OS ABI";
+	else if (eh->e_type != ET_REL)
+		err = "unexpected ELF type";
+	else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF)
+		err = "unexpected machine type";
+
+	if (err != NULL) {
+		RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find executable section by name.
+ */
+static int
+find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx)
+{
+	Elf_Scn *sc;
+	const Elf64_Ehdr *eh;
+	const Elf64_Shdr *sh;
+	Elf_Data *sd;
+	const char *sn;
+	int32_t rc;
+
+	eh = elf64_getehdr(elf);
+	if (eh == NULL) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	if (check_elf_header(eh) != 0)
+		return -EINVAL;
+
+	/* find given section by name */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL;
+			sc = elf_nextscn(elf, sc)) {
+		sh = elf64_getshdr(sc);
+		sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name);
+		if (sn != NULL && strcmp(section, sn) == 0 &&
+				sh->sh_type == SHT_PROGBITS &&
+				sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR))
+			break;
+	}
+
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL || sd->d_size == 0 ||
+			sd->d_size % sizeof(struct ebpf_insn) != 0) {
+		rc = elf_errno();
+		RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n",
+			__func__, elf, section, rc, elf_errmsg(rc));
+		return -EINVAL;
+	}
+
+	*psd = sd;
+	*pidx = elf_ndxscn(sc);
+	return 0;
+}
+
+/*
+ * helper function to process data from relocation table.
+ */
+static int
+process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz,
+	struct ebpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm)
+{
+	int32_t rc;
+	uint32_t i, n;
+	size_t ofs, sym;
+	const char *sn;
+	const Elf64_Ehdr *eh;
+	Elf_Scn *sc;
+	const Elf_Data *sd;
+	Elf64_Sym *sm;
+
+	eh = elf64_getehdr(elf);
+
+	/* get symtable by section index */
+	sc = elf_getscn(elf, sym_idx);
+	sd = elf_getdata(sc, NULL);
+	if (sd == NULL)
+		return -EINVAL;
+	sm = sd->d_buf;
+
+	n = re_sz / sizeof(re[0]);
+	for (i = 0; i != n; i++) {
+
+		ofs = re[i].r_offset;
+
+		/* retrieve index in the symtable */
+		sym = ELF64_R_SYM(re[i].r_info);
+		if (sym * sizeof(sm[0]) >= sd->d_size)
+			return -EINVAL;
+
+		sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name);
+
+		rc = resolve_xsym(sn, ofs, ins, ins_sz, prm);
+		if (rc != 0) {
+			RTE_BPF_LOG(ERR,
+				"resolve_xsym(%s, %zu) error code: %d\n",
+				sn, ofs, rc);
+			return rc;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * helper function, find relocation information (if any)
+ * and update bpf code.
+ */
+static int
+elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx,
+	const struct rte_bpf_prm *prm)
+{
+	Elf64_Rel *re;
+	Elf_Scn *sc;
+	const Elf64_Shdr *sh;
+	const Elf_Data *sd;
+	int32_t rc;
+
+	rc = 0;
+
+	/* walk through all sections */
+	for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0;
+			sc = elf_nextscn(elf, sc)) {
+
+		sh = elf64_getshdr(sc);
+
+		/* relocation data for our code section */
+		if (sh->sh_type == SHT_REL && sh->sh_info == sidx) {
+			sd = elf_getdata(sc, NULL);
+			if (sd == NULL || sd->d_size == 0 ||
+					sd->d_size % sizeof(re[0]) != 0)
+				return -EINVAL;
+			rc = process_reloc(elf, sh->sh_link,
+				sd->d_buf, sd->d_size, ed->d_buf, ed->d_size,
+				prm);
+		}
+	}
+
+	return rc;
+}
+
+static struct rte_bpf *
+bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section)
+{
+	Elf *elf;
+	Elf_Data *sd;
+	size_t sidx;
+	int32_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_prm np;
+
+	elf_version(EV_CURRENT);
+	elf = elf_begin(fd, ELF_C_READ, NULL);
+
+	rc = find_elf_code(elf, section, &sd, &sidx);
+	if (rc == 0)
+		rc = elf_reloc_code(elf, sd, sidx, prm);
+
+	if (rc == 0) {
+		np = prm[0];
+		np.ins = sd->d_buf;
+		np.nb_ins = sd->d_size / sizeof(struct ebpf_insn);
+		bpf = rte_bpf_load(&np);
+	} else {
+		bpf = NULL;
+		rte_errno = -rc;
+	}
+
+	elf_end(elf);
+	return bpf;
+}
+
+__rte_experimental struct rte_bpf *
+rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname,
+	const char *sname)
+{
+	int32_t fd, rc;
+	struct rte_bpf *bpf;
+
+	if (prm == NULL || fname == NULL || sname == NULL) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	fd = open(fname, O_RDONLY);
+	if (fd < 0) {
+		rc = errno;
+		RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n",
+			__func__, fname, rc, strerror(rc));
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	bpf = bpf_load_elf(prm, fd, sname);
+	close(fd);
+
+	if (bpf == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s(fname=\"%s\", sname=\"%s\") failed, "
+			"error code: %d\n",
+			__func__, fname, sname, rte_errno);
+		return NULL;
+	}
+
+	RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") "
+		"successfully creates %p(jit={.func=%p,.sz=%zu});\n",
+		__func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz);
+	return bpf;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 4fa000f5a..a6a9229bd 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -11,3 +11,9 @@ install_headers = files('bpf_def.h',
 			'rte_bpf.h')
 
 deps += ['mbuf', 'net']
+
+dep = cc.find_library('elf', required: false)
+if dep.found() == true and cc.has_header('libelf.h', dependencies: dep)
+	sources += files('bpf_load_elf.c')
+	ext_deps += dep
+endif
diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h
index 4b3d970b9..1d6c4a9d2 100644
--- a/lib/librte_bpf/rte_bpf.h
+++ b/lib/librte_bpf/rte_bpf.h
@@ -116,6 +116,26 @@ void rte_bpf_destroy(struct rte_bpf *bpf);
 struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
 
 /**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *  Pathname of the ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
  * Execute given BPF bytecode.
  *
  * @param bpf
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ea1d621c4..ff65144df 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -2,6 +2,7 @@ EXPERIMENTAL {
 	global:
 
 	rte_bpf_destroy;
+	rte_bpf_elf_load;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 9d3c421cc..e54cc5c4e 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,9 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf
+ifeq ($(CONFIG_RTE_LIBRTE_BPF_ELF),y)
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lelf
+endif
 
 _LDLIBS-y += --whole-archive
 
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 3/9] bpf: add more logic into bpf_validate()
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (3 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 2/9] bpf: add ability to load eBPF program from ELF object file Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 4/9] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
                           ` (5 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add checks for:
 - all instructions are valid ones
   (known opcodes, correct syntax, valid reg/off/imm values, etc.)
 - no unreachable instructions
 - no loops
 - basic stack boundaries checks
 - division by zero

Still need to add checks for:
 - use/return only initialized registers and stack data
 - memory boundary violations
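
As an illustration of what the new checks catch (my own example, not
part of the patch), a tiny program with a backward jump like the one
below should now be rejected at load time by the loop detection:

/* r0 += 1; then jump back to the add -> infinite loop, no exit. */
static const struct ebpf_insn loop_prog[] = {
	{ .code = (EBPF_ALU64 | BPF_ADD | BPF_K),
	  .dst_reg = EBPF_REG_0, .imm = 1, },
	{ .code = (BPF_JMP | BPF_JA), .off = -2, },
};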

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_bpf/bpf_validate.c | 1181 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 1155 insertions(+), 26 deletions(-)

diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c
index 6a1b33181..b7081c853 100644
--- a/lib/librte_bpf/bpf_validate.c
+++ b/lib/librte_bpf/bpf_validate.c
@@ -14,42 +14,1171 @@
 
 #include "bpf_impl.h"
 
+/* possible instruction node colour */
+enum {
+	WHITE,
+	GREY,
+	BLACK,
+	MAX_NODE_COLOUR
+};
+
+/* possible edge types */
+enum {
+	UNKNOWN_EDGE,
+	TREE_EDGE,
+	BACK_EDGE,
+	CROSS_EDGE,
+	MAX_EDGE_TYPE
+};
+
+struct bpf_reg_state {
+	uint64_t val;
+};
+
+struct bpf_eval_state {
+	struct bpf_reg_state rs[EBPF_REG_NUM];
+};
+
+#define	MAX_EDGES	2
+
+struct inst_node {
+	uint8_t colour;
+	uint8_t nb_edge:4;
+	uint8_t cur_edge:4;
+	uint8_t edge_type[MAX_EDGES];
+	uint32_t edge_dest[MAX_EDGES];
+	uint32_t prev_node;
+	struct bpf_eval_state *evst;
+};
+
+struct bpf_verifier {
+	const struct rte_bpf_prm *prm;
+	struct inst_node *in;
+	int32_t stack_sz;
+	uint32_t nb_nodes;
+	uint32_t nb_jcc_nodes;
+	uint32_t node_colour[MAX_NODE_COLOUR];
+	uint32_t edge_type[MAX_EDGE_TYPE];
+	struct bpf_eval_state *evst;
+	struct {
+		uint32_t num;
+		uint32_t cur;
+		struct bpf_eval_state *ent;
+	} evst_pool;
+};
+
+struct bpf_ins_check {
+	struct {
+		uint16_t dreg;
+		uint16_t sreg;
+	} mask;
+	struct {
+		uint16_t min;
+		uint16_t max;
+	} off;
+	struct {
+		uint32_t min;
+		uint32_t max;
+	} imm;
+	const char * (*check)(const struct ebpf_insn *);
+	const char * (*eval)(struct bpf_verifier *, const struct ebpf_insn *);
+};
+
+#define	ALL_REGS	RTE_LEN2MASK(EBPF_REG_NUM, uint16_t)
+#define	WRT_REGS	RTE_LEN2MASK(EBPF_REG_10, uint16_t)
+#define	ZERO_REG	RTE_LEN2MASK(EBPF_REG_1, uint16_t)
+
 /*
- * dummy one for now, need more work.
+ * check and evaluate functions for particular instruction types.
  */
-int
-bpf_validate(struct rte_bpf *bpf)
+
+static const char *
+check_alu_bele(const struct ebpf_insn *ins)
+{
+	if (ins->imm != 16 && ins->imm != 32 && ins->imm != 64)
+		return "invalid imm field";
+	return NULL;
+}
+
+static const char *
+eval_stack(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	int32_t ofs;
+
+	ofs = ins->off;
+
+	if (ofs >= 0 || ofs < -MAX_BPF_STACK_SIZE)
+		return "stack boundary violation";
+
+	ofs = -ofs;
+	bvf->stack_sz = RTE_MAX(bvf->stack_sz, ofs);
+	return NULL;
+}
+
+static const char *
+eval_store(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	if (ins->dst_reg == EBPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_load(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	if (ins->src_reg == EBPF_REG_10)
+		return eval_stack(bvf, ins);
+	return NULL;
+}
+
+static const char *
+eval_call(struct bpf_verifier *bvf, const struct ebpf_insn *ins)
+{
+	uint32_t idx;
+
+	idx = ins->imm;
+
+	if (idx >= bvf->prm->nb_xsym ||
+			bvf->prm->xsym[idx].type != RTE_BPF_XTYPE_FUNC)
+		return "invalid external function index";
+
+	/* for now don't support function calls on 32 bit platform */
+	if (sizeof(uint64_t) != sizeof(uintptr_t))
+		return "function calls are supported only for 64 bit apps";
+	return NULL;
+}
+
+/*
+ * validate parameters for each instruction type.
+ */
+static const struct bpf_ins_check ins_chk[UINT8_MAX] = {
+	/* ALU IMM 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | EBPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU IMM 64-bit instructions */
+	[(EBPF_ALU64 | BPF_ADD | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_SUB | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_AND | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_OR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_LSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_RSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | EBPF_ARSH | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_XOR | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_MUL | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | EBPF_MOV | BPF_K)] = {
+		.mask = {.dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX,},
+	},
+	[(EBPF_ALU64 | BPF_DIV | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	[(EBPF_ALU64 | BPF_MOD | BPF_K)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 1, .max = UINT32_MAX},
+	},
+	/* ALU REG 32-bit instructions */
+	[(BPF_ALU | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | EBPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_ALU | EBPF_END | EBPF_TO_BE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	[(BPF_ALU | EBPF_END | EBPF_TO_LE)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 16, .max = 64},
+		.check = check_alu_bele,
+	},
+	/* ALU REG 64-bit instructions */
+	[(EBPF_ALU64 | BPF_ADD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_SUB | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_AND | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_OR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_LSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_RSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | EBPF_ARSH | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_XOR | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_MUL | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_DIV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_MOD | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | EBPF_MOV | BPF_X)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(EBPF_ALU64 | BPF_NEG)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* load instructions */
+	[(BPF_LDX | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	[(BPF_LDX | BPF_MEM | EBPF_DW)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_load,
+	},
+	/* load 64 bit immediate value */
+	[(BPF_LD | BPF_IMM | EBPF_DW)] = {
+		.mask = { .dreg = WRT_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* store REG instructions */
+	[(BPF_STX | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | BPF_MEM | EBPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* atomic add instructions */
+	[(BPF_STX | EBPF_XADD | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	[(BPF_STX | EBPF_XADD | EBPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+		.eval = eval_store,
+	},
+	/* store IMM instructions */
+	[(BPF_ST | BPF_MEM | BPF_B)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_H)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | BPF_W)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	[(BPF_ST | BPF_MEM | EBPF_DW)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_store,
+	},
+	/* jump instruction */
+	[(BPF_JMP | BPF_JA)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* jcc IMM instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JNE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSGT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSLT | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSGE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | EBPF_JSLE | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_K)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = UINT32_MAX},
+	},
+	/* jcc REG instructions */
+	[(BPF_JMP | BPF_JEQ | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JNE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSGT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSLT | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSGE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | EBPF_JSLE | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	[(BPF_JMP | BPF_JSET | BPF_X)] = {
+		.mask = { .dreg = ALL_REGS, .sreg = ALL_REGS},
+		.off = { .min = 0, .max = UINT16_MAX},
+		.imm = { .min = 0, .max = 0},
+	},
+	/* call instruction */
+	[(BPF_JMP | EBPF_CALL)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = UINT32_MAX},
+		.eval = eval_call,
+	},
+	/* ret instruction */
+	[(BPF_JMP | EBPF_EXIT)] = {
+		.mask = { .dreg = ZERO_REG, .sreg = ZERO_REG},
+		.off = { .min = 0, .max = 0},
+		.imm = { .min = 0, .max = 0},
+	},
+};
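+
+/*
+ * Worked example of how the table above is used (illustrative only):
+ * for op == (BPF_ALU | BPF_DIV | BPF_K) the entry restricts dst_reg to the
+ * WRT_REGS mask, requires the src_reg field to be left at zero, forces
+ * off == 0 and bounds imm to [1, UINT32_MAX], so a divide-by-zero immediate
+ * is rejected already at syntax-check time.
+ */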
+
+/*
+ * make sure that instruction syntax is valid,
+ * and its fields don't violate particular instruction type restrictions.
+ */
+static const char *
+check_syntax(const struct ebpf_insn *ins)
+{
+
+	uint8_t op;
+	uint16_t off;
+	uint32_t imm;
+
+	op = ins->code;
+
+	if (ins_chk[op].mask.dreg == 0)
+		return "invalid opcode";
+
+	if ((ins_chk[op].mask.dreg & 1 << ins->dst_reg) == 0)
+		return "invalid dst-reg field";
+
+	if ((ins_chk[op].mask.sreg & 1 << ins->src_reg) == 0)
+		return "invalid src-reg field";
+
+	off = ins->off;
+	if (ins_chk[op].off.min > off || ins_chk[op].off.max < off)
+		return "invalid off field";
+
+	imm = ins->imm;
+	if (ins_chk[op].imm.min > imm || ins_chk[op].imm.max < imm)
+		return "invalid imm field";
+
+	if (ins_chk[op].check != NULL)
+		return ins_chk[op].check(ins);
+
+	return NULL;
+}
+
+/*
+ * helper function, return instruction index for the given node.
+ */
+static uint32_t
+get_node_idx(const struct bpf_verifier *bvf, const struct inst_node *node)
 {
-	int32_t rc, ofs, stack_sz;
-	uint32_t i, op, dr;
+	return node - bvf->in;
+}
+
+/*
+ * helper function, used to walk through constructed CFG.
+ */
+static struct inst_node *
+get_next_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	uint32_t ce, ne, dst;
+
+	ne = node->nb_edge;
+	ce = node->cur_edge;
+	if (ce == ne)
+		return NULL;
+
+	node->cur_edge++;
+	dst = node->edge_dest[ce];
+	return bvf->in + dst;
+}
+
+static void
+set_node_colour(struct bpf_verifier *bvf, struct inst_node *node,
+	uint32_t new)
+{
+	uint32_t prev;
+
+	prev = node->colour;
+	node->colour = new;
+
+	bvf->node_colour[prev]--;
+	bvf->node_colour[new]++;
+}
+
+/*
+ * helper function, add new edge between two nodes.
+ */
+static int
+add_edge(struct bpf_verifier *bvf, struct inst_node *node, uint32_t nidx)
+{
+	uint32_t ne;
+
+	if (nidx > bvf->prm->nb_ins) {
+		RTE_BPF_LOG(ERR, "%s: program boundary violation at pc: %u, "
+			"next pc: %u\n",
+			__func__, get_node_idx(bvf, node), nidx);
+		return -EINVAL;
+	}
+
+	ne = node->nb_edge;
+	if (ne >= RTE_DIM(node->edge_dest)) {
+		RTE_BPF_LOG(ERR, "%s: internal error at pc: %u\n",
+			__func__, get_node_idx(bvf, node));
+		return -EINVAL;
+	}
+
+	node->edge_dest[ne] = nidx;
+	node->nb_edge = ne + 1;
+	return 0;
+}
+
+/*
+ * helper function, determine type of edge between two nodes.
+ */
+static void
+set_edge_type(struct bpf_verifier *bvf, struct inst_node *node,
+	const struct inst_node *next)
+{
+	uint32_t ce, clr, type;
+
+	ce = node->cur_edge - 1;
+	clr = next->colour;
+
+	type = UNKNOWN_EDGE;
+
+	if (clr == WHITE)
+		type = TREE_EDGE;
+	else if (clr == GREY)
+		type = BACK_EDGE;
+	else if (clr == BLACK)
+		/*
+		 * in fact it could be either direct or cross edge,
+		 * but for now, we don't need to distinguish between them.
+		 */
+		type = CROSS_EDGE;
+
+	node->edge_type[ce] = type;
+	bvf->edge_type[type]++;
+}
+
+static struct inst_node *
+get_prev_node(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	return bvf->in + node->prev_node;
+}
+
+/*
+ * Depth-First Search (DFS) through previously constructed
+ * Control Flow Graph (CFG).
+ * Information collected along this path is used later
+ * to determine whether there are any loops and/or unreachable instructions.
+ */
+static void
+dfs(struct bpf_verifier *bvf)
+{
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	while (node != NULL) {
+
+		if (node->colour == WHITE)
+			set_node_colour(bvf, node, GREY);
+
+		if (node->colour == GREY) {
+
+			/* find next unprocessed child node */
+			do {
+				next = get_next_node(bvf, node);
+				if (next == NULL)
+					break;
+				set_edge_type(bvf, node, next);
+			} while (next->colour != WHITE);
+
+			if (next != NULL) {
+				/* proceed with next child */
+				next->prev_node = get_node_idx(bvf, node);
+				node = next;
+			} else {
+				/*
+				 * finished with current node and all its children,
+				 * proceed with parent
+				 */
+				set_node_colour(bvf, node, BLACK);
+				node->cur_edge = 0;
+				node = get_prev_node(bvf, node);
+			}
+		} else
+			node = NULL;
+	}
+}
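+
+/*
+ * Small illustrative example (not part of the implementation): for the
+ * two instruction program
+ *	0: (BPF_JMP | BPF_JA), off = 0	-> edge 0 -> 1
+ *	1: (BPF_JMP | BPF_JA), off = -2	-> edge 1 -> 0
+ * the DFS colours node 0 GREY, follows the TREE_EDGE to node 1 and, when
+ * it inspects the edge 1 -> 0, finds node 0 still GREY, so that edge is
+ * classified as BACK_EDGE and the program is later rejected as a loop.
+ */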
+
+/*
+ * report unreachable instructions.
+ */
+static void
+log_unreachable(const struct bpf_verifier *bvf)
+{
+	uint32_t i;
+	struct inst_node *node;
 	const struct ebpf_insn *ins;
 
-	rc = 0;
-	stack_sz = 0;
-	for (i = 0; i != bpf->prm.nb_ins; i++) {
-
-		ins = bpf->prm.ins + i;
-		op = ins->code;
-		dr = ins->dst_reg;
-		ofs = ins->off;
-
-		if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) &&
-				dr == EBPF_REG_10) {
-			ofs -= sizeof(uint64_t);
-			stack_sz = RTE_MIN(ofs, stack_sz);
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		ins = bvf->prm->ins + i;
+
+		if (node->colour == WHITE &&
+				ins->code != (BPF_LD | BPF_IMM | EBPF_DW))
+			RTE_BPF_LOG(ERR, "unreachable code at pc: %u;\n", i);
+	}
+}
+
+/*
+ * report loops detected.
+ */
+static void
+log_loop(const struct bpf_verifier *bvf)
+{
+	uint32_t i, j;
+	struct inst_node *node;
+
+	for (i = 0; i != bvf->prm->nb_ins; i++) {
+
+		node = bvf->in + i;
+		if (node->colour != BLACK)
+			continue;
+
+		for (j = 0; j != node->nb_edge; j++) {
+			if (node->edge_type[j] == BACK_EDGE)
+				RTE_BPF_LOG(ERR,
+					"loop at pc:%u --> pc:%u;\n",
+					i, node->edge_dest[j]);
 		}
 	}
+}
+
+/*
+ * First pass goes through all instructions in the set, checks that each
+ * instruction is a valid one (correct syntax, valid field values, etc.)
+ * and constructs the control flow graph (CFG).
+ * Then a depth-first search is performed over the constructed graph.
+ * Programs with unreachable instructions and/or loops will be rejected.
+ */
+static int
+validate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t i;
+	struct inst_node *node;
+	const struct ebpf_insn *ins;
+	const char *err;
 
-	if (stack_sz != 0) {
-		stack_sz = -stack_sz;
-		if (stack_sz > MAX_BPF_STACK_SIZE)
-			rc = -ERANGE;
-		else
-			bpf->stack_sz = stack_sz;
+	rc = 0;
+	for (i = 0; i < bvf->prm->nb_ins; i++) {
+
+		ins = bvf->prm->ins + i;
+		node = bvf->in + i;
+
+		err = check_syntax(ins);
+		if (err != NULL) {
+			RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+				__func__, err, i);
+			rc |= -EINVAL;
+		}
+
+		/*
+		 * construct CFG: jcc nodes have two outgoing edges,
+		 * 'exit' nodes have none, all other nodes have exactly one
+		 * outgoing edge.
+		 */
+		switch (ins->code) {
+		case (BPF_JMP | EBPF_EXIT):
+			break;
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | EBPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | EBPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | EBPF_JLE | BPF_K):
+		case (BPF_JMP | EBPF_JSGT | BPF_K):
+		case (BPF_JMP | EBPF_JSLT | BPF_K):
+		case (BPF_JMP | EBPF_JSGE | BPF_K):
+		case (BPF_JMP | EBPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | EBPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | EBPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | EBPF_JLE | BPF_X):
+		case (BPF_JMP | EBPF_JSGT | BPF_X):
+		case (BPF_JMP | EBPF_JSLT | BPF_X):
+		case (BPF_JMP | EBPF_JSGE | BPF_X):
+		case (BPF_JMP | EBPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			rc |= add_edge(bvf, node, i + 1);
+			bvf->nb_jcc_nodes++;
+			break;
+		case (BPF_JMP | BPF_JA):
+			rc |= add_edge(bvf, node, i + ins->off + 1);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | EBPF_DW):
+			rc |= add_edge(bvf, node, i + 2);
+			i++;
+			break;
+		default:
+			rc |= add_edge(bvf, node, i + 1);
+			break;
+		}
+
+		bvf->nb_nodes++;
+		bvf->node_colour[WHITE]++;
 	}
 
 	if (rc != 0)
-		RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n",
-			__func__, bpf, rc);
+		return rc;
+
+	dfs(bvf);
+
+	RTE_BPF_LOG(DEBUG, "%s(%p) stats:\n"
+		"nb_nodes=%u;\n"
+		"nb_jcc_nodes=%u;\n"
+		"node_color={[WHITE]=%u, [GREY]=%u,, [BLACK]=%u};\n"
+		"edge_type={[UNKNOWN]=%u, [TREE]=%u, [BACK]=%u, [CROSS]=%u};\n",
+		__func__, bvf,
+		bvf->nb_nodes,
+		bvf->nb_jcc_nodes,
+		bvf->node_colour[WHITE], bvf->node_colour[GREY],
+			bvf->node_colour[BLACK],
+		bvf->edge_type[UNKNOWN_EDGE], bvf->edge_type[TREE_EDGE],
+		bvf->edge_type[BACK_EDGE], bvf->edge_type[CROSS_EDGE]);
+
+	if (bvf->node_colour[BLACK] != bvf->nb_nodes) {
+		RTE_BPF_LOG(ERR, "%s(%p) unreachable instructions;\n",
+			__func__, bvf);
+		log_unreachable(bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->node_colour[GREY] != 0 || bvf->node_colour[WHITE] != 0 ||
+			bvf->edge_type[UNKNOWN_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) DFS internal error;\n",
+			__func__, bvf);
+		return -EINVAL;
+	}
+
+	if (bvf->edge_type[BACK_EDGE] != 0) {
+		RTE_BPF_LOG(ERR, "%s(%p) loops detected;\n",
+			__func__, bvf);
+		log_loop(bvf);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ * helper functions get/free eval states.
+ */
+static struct bpf_eval_state *
+pull_eval_state(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->evst_pool.cur;
+	if (n == bvf->evst_pool.num)
+		return NULL;
+
+	bvf->evst_pool.cur = n + 1;
+	return bvf->evst_pool.ent + n;
+}
+
+static void
+push_eval_state(struct bpf_verifier *bvf)
+{
+	bvf->evst_pool.cur--;
+}
+
+static void
+evst_pool_fini(struct bpf_verifier *bvf)
+{
+	bvf->evst = NULL;
+	free(bvf->evst_pool.ent);
+	memset(&bvf->evst_pool, 0, sizeof(bvf->evst_pool));
+}
+
+static int
+evst_pool_init(struct bpf_verifier *bvf)
+{
+	uint32_t n;
+
+	n = bvf->nb_jcc_nodes + 1;
+
+	bvf->evst_pool.ent = calloc(n, sizeof(bvf->evst_pool.ent[0]));
+	if (bvf->evst_pool.ent == NULL)
+		return -ENOMEM;
+
+	bvf->evst_pool.num = n;
+	bvf->evst_pool.cur = 0;
+
+	bvf->evst = pull_eval_state(bvf);
+	return 0;
+}
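+
+/*
+ * Note on pool sizing (informal reasoning, not taken from the original
+ * sources): only nodes with more than one outgoing edge, i.e. conditional
+ * jumps, ever keep a saved state, and each can hold at most one at a time,
+ * so nb_jcc_nodes + 1 entries (the extra one being the working state
+ * pulled here) are sufficient.
+ */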
+
+/*
+ * Save current eval state.
+ */
+static int
+save_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	struct bpf_eval_state *st;
+
+	/* get new eval_state for this node */
+	st = pull_eval_state(bvf);
+	if (st == NULL) {
+		RTE_BPF_LOG(ERR,
+			"%s: internal error (out of space) at pc: %u",
+			__func__, get_node_idx(bvf, node));
+		return -ENOMEM;
+	}
+
+	/* make a copy of current state */
+	memcpy(st, bvf->evst, sizeof(*st));
+
+	/* swap current state with new one */
+	node->evst = bvf->evst;
+	bvf->evst = st;
+
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), node->evst, bvf->evst);
+
+	return 0;
+}
+
+/*
+ * Restore previous eval state and mark current eval state as free.
+ */
+static void
+restore_eval_state(struct bpf_verifier *bvf, struct inst_node *node)
+{
+	RTE_BPF_LOG(DEBUG, "%s(bvf=%p,node=%u) old/new states: %p/%p;\n",
+		__func__, bvf, get_node_idx(bvf, node), bvf->evst, node->evst);
+
+	bvf->evst = node->evst;
+	node->evst = NULL;
+	push_eval_state(bvf);
+}
+
+/*
+ * Do second pass through CFG and try to evaluate instructions
+ * via each possible path.
+ * Right now evaluation functionality is quite limited.
+ * Still need to add extra checks for:
+ * - use/return uninitialized registers.
+ * - use uninitialized data from the stack.
+ * - memory boundaries violation.
+ */
+static int
+evaluate(struct bpf_verifier *bvf)
+{
+	int32_t rc;
+	uint32_t idx, op;
+	const char *err;
+	const struct ebpf_insn *ins;
+	struct inst_node *next, *node;
+
+	node = bvf->in;
+	ins = bvf->prm->ins;
+	rc = 0;
+
+	while (node != NULL && rc == 0) {
+
+		/* current node evaluation */
+		idx = get_node_idx(bvf, node);
+		op = ins[idx].code;
+
+		if (ins_chk[op].eval != NULL) {
+			err = ins_chk[op].eval(bvf, ins + idx);
+			if (err != NULL) {
+				RTE_BPF_LOG(ERR, "%s: %s at pc: %u\n",
+					__func__, err, idx);
+				rc = -EINVAL;
+			}
+		}
+
+		/* proceed through CFG */
+		next = get_next_node(bvf, node);
+		if (next != NULL) {
+
+			/* proceed with next child */
+			if (node->cur_edge != node->nb_edge)
+				rc |= save_eval_state(bvf, node);
+			else if (node->evst != NULL)
+				restore_eval_state(bvf, node);
+
+			next->prev_node = get_node_idx(bvf, node);
+			node = next;
+		} else {
+			/*
+			 * finished with current node and all its children,
+			 * proceed with parent
+			 */
+			node->cur_edge = 0;
+			node = get_prev_node(bvf, node);
+
+			/* finished */
+			if (node == bvf->in)
+				node = NULL;
+		}
+	}
+
+	return rc;
+}
+
+int
+bpf_validate(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	struct bpf_verifier bvf;
+
+	/* check input argument type, don't allow mbuf ptr on 32-bit */
+	if (bpf->prm.prog_arg.type != RTE_BPF_ARG_RAW &&
+			bpf->prm.prog_arg.type != RTE_BPF_ARG_PTR &&
+			(sizeof(uint64_t) != sizeof(uintptr_t) ||
+			bpf->prm.prog_arg.type != RTE_BPF_ARG_PTR_MBUF)) {
+		RTE_BPF_LOG(ERR, "%s: unsupported argument type\n", __func__);
+		return -ENOTSUP;
+	}
+
+	memset(&bvf, 0, sizeof(bvf));
+	bvf.prm = &bpf->prm;
+	bvf.in = calloc(bpf->prm.nb_ins, sizeof(bvf.in[0]));
+	if (bvf.in == NULL)
+		return -ENOMEM;
+
+	rc = validate(&bvf);
+
+	if (rc == 0) {
+		rc = evst_pool_init(&bvf);
+		if (rc == 0)
+			rc = evaluate(&bvf);
+		evst_pool_fini(&bvf);
+	}
+
+	free(bvf.in);
+
+	/* copy collected info */
+	if (rc == 0)
+		bpf->stack_sz = bvf.stack_sz;
+
 	return rc;
 }
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 4/9] bpf: add JIT compilation for x86_64 ISA
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (4 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 3/9] bpf: add more logic into bpf_validate() Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 5/9] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
                           ` (4 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_bpf/Makefile      |    3 +
 lib/librte_bpf/bpf.c         |    5 +
 lib/librte_bpf/bpf_jit_x86.c | 1369 ++++++++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build   |    4 +
 4 files changed, 1381 insertions(+)
 create mode 100644 lib/librte_bpf/bpf_jit_x86.c

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 885a31381..7a9e00cf3 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -28,6 +28,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_LIBRTE_BPF_ELF),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load_elf.c
 endif
+ifeq ($(CONFIG_RTE_ARCH_X86_64),y)
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_jit_x86.c
+endif
 
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c
index d7f68c017..dc6d10991 100644
--- a/lib/librte_bpf/bpf.c
+++ b/lib/librte_bpf/bpf.c
@@ -41,7 +41,12 @@ bpf_jit(struct rte_bpf *bpf)
 {
 	int32_t rc;
 
+#ifdef RTE_ARCH_X86_64
+	rc = bpf_jit_x86(bpf);
+#else
 	rc = -ENOTSUP;
+#endif
+
 	if (rc != 0)
 		RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n",
 			__func__, bpf, rc);
diff --git a/lib/librte_bpf/bpf_jit_x86.c b/lib/librte_bpf/bpf_jit_x86.c
new file mode 100644
index 000000000..111e028d2
--- /dev/null
+++ b/lib/librte_bpf/bpf_jit_x86.c
@@ -0,0 +1,1369 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_byteorder.h>
+
+#include "bpf_impl.h"
+
+#define GET_BPF_OP(op)	(BPF_OP(op) >> 4)
+
+enum {
+	RAX = 0,  /* scratch, return value */
+	RCX = 1,  /* scratch, 4th arg */
+	RDX = 2,  /* scratch, 3rd arg */
+	RBX = 3,  /* callee saved */
+	RSP = 4,  /* stack pointer */
+	RBP = 5,  /* frame pointer, callee saved */
+	RSI = 6,  /* scratch, 2nd arg */
+	RDI = 7,  /* scratch, 1st arg */
+	R8  = 8,  /* scratch, 5th arg */
+	R9  = 9,  /* scratch, 6th arg */
+	R10 = 10, /* scratch */
+	R11 = 11, /* scratch */
+	R12 = 12, /* callee saved */
+	R13 = 13, /* callee saved */
+	R14 = 14, /* callee saved */
+	R15 = 15, /* callee saved */
+};
+
+#define IS_EXT_REG(r)	((r) >= R8)
+
+enum {
+	REX_PREFIX = 0x40, /* fixed value 0100 */
+	REX_W = 0x8,       /* 64bit operand size */
+	REX_R = 0x4,       /* extension of the ModRM.reg field */
+	REX_X = 0x2,       /* extension of the SIB.index field */
+	REX_B = 0x1,       /* extension of the ModRM.rm field */
+};
+
+enum {
+	MOD_INDIRECT = 0,
+	MOD_IDISP8 = 1,
+	MOD_IDISP32 = 2,
+	MOD_DIRECT = 3,
+};
+
+enum {
+	SIB_SCALE_1 = 0,
+	SIB_SCALE_2 = 1,
+	SIB_SCALE_4 = 2,
+	SIB_SCALE_8 = 3,
+};
+
+/*
+ * eBPF to x86_64 register mappings.
+ */
+static const uint32_t ebpf2x86[] = {
+	[EBPF_REG_0] = RAX,
+	[EBPF_REG_1] = RDI,
+	[EBPF_REG_2] = RSI,
+	[EBPF_REG_3] = RDX,
+	[EBPF_REG_4] = RCX,
+	[EBPF_REG_5] = R8,
+	[EBPF_REG_6] = RBX,
+	[EBPF_REG_7] = R13,
+	[EBPF_REG_8] = R14,
+	[EBPF_REG_9] = R15,
+	[EBPF_REG_10] = RBP,
+};
+
+/*
+ * r9, r10 and r11 are used as scratch temporary registers.
+ */
+enum {
+	REG_DIV_IMM = R9,
+	REG_TMP0 = R11,
+	REG_TMP1 = R10,
+};
+
+/*
+ * callee saved registers list.
+ * keep RBP as the last one.
+ */
+static const uint32_t save_regs[] = {RBX, R12, R13, R14, R15, RBP};
+
+struct bpf_jit_state {
+	uint32_t idx;
+	size_t sz;
+	struct {
+		uint32_t num;
+		int32_t off;
+	} exit;
+	uint32_t reguse;
+	int32_t *off;
+	uint8_t *ins;
+};
+
+#define	INUSE(v, r)	(((v) >> (r)) & 1)
+#define	USED(v, r)	((v) |= 1 << (r))
+
+union bpf_jit_imm {
+	uint32_t u32;
+	uint8_t u8[4];
+};
+
+static size_t
+bpf_size(uint32_t bpf_op_sz)
+{
+	if (bpf_op_sz == BPF_B)
+		return sizeof(uint8_t);
+	else if (bpf_op_sz == BPF_H)
+		return sizeof(uint16_t);
+	else if (bpf_op_sz == BPF_W)
+		return sizeof(uint32_t);
+	else if (bpf_op_sz == EBPF_DW)
+		return sizeof(uint64_t);
+	return 0;
+}
+
+/*
+ * In many cases for imm8 we can produce shorter code.
+ */
+static size_t
+imm_size(int32_t v)
+{
+	if (v == (int8_t)v)
+		return sizeof(int8_t);
+	return sizeof(int32_t);
+}
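+
+/*
+ * Examples (illustrative): imm_size(-5) == 1, since -5 survives the
+ * round-trip through int8_t, while imm_size(300) == 4.
+ */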
+
+static void
+emit_bytes(struct bpf_jit_state *st, const uint8_t ins[], uint32_t sz)
+{
+	uint32_t i;
+
+	if (st->ins != NULL) {
+		for (i = 0; i != sz; i++)
+			st->ins[st->sz + i] = ins[i];
+	}
+	st->sz += sz;
+}
+
+static void
+emit_imm(struct bpf_jit_state *st, const uint32_t imm, uint32_t sz)
+{
+	union bpf_jit_imm v;
+
+	v.u32 = imm;
+	emit_bytes(st, v.u8, sz);
+}
+
+/*
+ * emit REX byte
+ */
+static void
+emit_rex(struct bpf_jit_state *st, uint32_t op, uint32_t reg, uint32_t rm)
+{
+	uint8_t rex;
+
+	/* mark operand registers as used */
+	USED(st->reguse, reg);
+	USED(st->reguse, rm);
+
+	rex = 0;
+	if (BPF_CLASS(op) == EBPF_ALU64 ||
+			op == (BPF_ST | BPF_MEM | EBPF_DW) ||
+			op == (BPF_STX | BPF_MEM | EBPF_DW) ||
+			op == (BPF_STX | EBPF_XADD | EBPF_DW) ||
+			op == (BPF_LD | BPF_IMM | EBPF_DW) ||
+			(BPF_CLASS(op) == BPF_LDX &&
+			BPF_MODE(op) == BPF_MEM &&
+			BPF_SIZE(op) != BPF_W))
+		rex |= REX_W;
+
+	if (IS_EXT_REG(reg))
+		rex |= REX_R;
+
+	if (IS_EXT_REG(rm))
+		rex |= REX_B;
+
+	/* store using SIL, DIL */
+	if (op == (BPF_STX | BPF_MEM | BPF_B) && (reg == RDI || reg == RSI))
+		rex |= REX_PREFIX;
+
+	if (rex != 0) {
+		rex |= REX_PREFIX;
+		emit_bytes(st, &rex, sizeof(rex));
+	}
+}
+
+/*
+ * emit MODRegRM byte
+ */
+static void
+emit_modregrm(struct bpf_jit_state *st, uint32_t mod, uint32_t reg, uint32_t rm)
+{
+	uint8_t v;
+
+	v = mod << 6 | (reg & 7) << 3 | (rm & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit SIB byte
+ */
+static void
+emit_sib(struct bpf_jit_state *st, uint32_t scale, uint32_t idx, uint32_t base)
+{
+	uint8_t v;
+
+	v = scale << 6 | (idx & 7) << 3 | (base & 7);
+	emit_bytes(st, &v, sizeof(v));
+}
+
+/*
+ * emit xchg %<sreg>, %<dreg>
+ */
+static void
+emit_xchg_reg(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	const uint8_t ops = 0x87;
+
+	emit_rex(st, EBPF_ALU64, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit neg %<dreg>
+ */
+static void
+emit_neg(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 3;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+/*
+ * emit mov %<sreg>, %<dreg>
+ */
+static void
+emit_mov_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x89;
+
+	/*
+	 * for 32-bit operands emit the mov even when sreg == dreg,
+	 * since a 32-bit mov also clears the upper 32 bits of the destination
+	 */
+	if (sreg != dreg || BPF_CLASS(op) == BPF_ALU) {
+		emit_rex(st, op, sreg, dreg);
+		emit_bytes(st, &ops, sizeof(ops));
+		emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+	}
+}
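+
+/*
+ * Worked encoding example (illustrative): emit_mov_reg() with
+ * op = (EBPF_ALU64 | EBPF_MOV | BPF_X), sreg = RSI, dreg = RDI emits
+ * REX.W (0x48), opcode 0x89 and ModRM 0xF7 (mod=3, reg=6, rm=7),
+ * i.e. the three bytes 48 89 f7: mov %rsi,%rdi.
+ */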
+
+/*
+ * emit movzwl %<sreg>, %<dreg>
+ */
+static void
+emit_movzwl(struct bpf_jit_state *st, uint32_t sreg, uint32_t dreg)
+{
+	static const uint8_t ops[] = {0x0F, 0xB7};
+
+	emit_rex(st, BPF_ALU, sreg, dreg);
+	emit_bytes(st, ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit ror <imm8>, %<dreg>
+ */
+static void
+emit_ror_imm(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t prfx = 0x66;
+	const uint8_t ops = 0xC1;
+	const uint8_t mods = 1;
+
+	emit_bytes(st, &prfx, sizeof(prfx));
+	emit_rex(st, BPF_ALU, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit bswap %<dreg>
+ */
+static void
+emit_be2le_48(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	uint32_t rop;
+
+	const uint8_t ops = 0x0F;
+	const uint8_t mods = 1;
+
+	rop = (imm == 64) ? EBPF_ALU64 : BPF_ALU;
+	emit_rex(st, rop, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+}
+
+static void
+emit_be2le(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16) {
+		emit_ror_imm(st, dreg, 8);
+		emit_movzwl(st, dreg, dreg);
+	} else
+		emit_be2le_48(st, dreg, imm);
+}
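+
+/*
+ * Example (illustrative): emit_be2le(st, dreg, 16) produces
+ * "rorw $0x8,%<dreg16>" followed by "movzwl %<dreg16>,%<dreg32>",
+ * which swaps the two low bytes and clears the rest of the register.
+ */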
+
+/*
+ * In general it is NOP for x86.
+ * Just clear the upper bits.
+ */
+static void
+emit_le2be(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm)
+{
+	if (imm == 16)
+		emit_movzwl(st, dreg, dreg);
+	else if (imm == 32)
+		emit_mov_reg(st, BPF_ALU | EBPF_MOV | BPF_X, dreg, dreg);
+}
+
+/*
+ * emit one of:
+ *   add <imm>, %<dreg>
+ *   and <imm>, %<dreg>
+ *   or  <imm>, %<dreg>
+ *   sub <imm>, %<dreg>
+ *   xor <imm>, %<dreg>
+ */
+static void
+emit_alu_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t mod, opcode;
+	uint32_t bop, imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0,
+		[GET_BPF_OP(BPF_AND)] = 4,
+		[GET_BPF_OP(BPF_OR)] =  1,
+		[GET_BPF_OP(BPF_SUB)] = 5,
+		[GET_BPF_OP(BPF_XOR)] = 6,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+
+	imsz = imm_size(imm);
+	opcode = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &opcode, sizeof(opcode));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit one of:
+ *   add %<sreg>, %<dreg>
+ *   and %<sreg>, %<dreg>
+ *   or  %<sreg>, %<dreg>
+ *   sub %<sreg>, %<dreg>
+ *   xor %<sreg>, %<dreg>
+ */
+static void
+emit_alu_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[] = {
+		[GET_BPF_OP(BPF_ADD)] = 0x01,
+		[GET_BPF_OP(BPF_AND)] = 0x21,
+		[GET_BPF_OP(BPF_OR)] =  0x09,
+		[GET_BPF_OP(BPF_SUB)] = 0x29,
+		[GET_BPF_OP(BPF_XOR)] = 0x31,
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_shift(struct bpf_jit_state *st, uint32_t op, uint32_t dreg)
+{
+	uint8_t mod;
+	uint32_t bop, opx;
+
+	static const uint8_t ops[] = {0xC1, 0xD3};
+	static const uint8_t mods[] = {
+		[GET_BPF_OP(BPF_LSH)] = 4,
+		[GET_BPF_OP(BPF_RSH)] = 5,
+		[GET_BPF_OP(EBPF_ARSH)] = 7,
+	};
+
+	bop = GET_BPF_OP(op);
+	mod = mods[bop];
+	opx = (BPF_SRC(op) == BPF_X);
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+	emit_modregrm(st, MOD_DIRECT, mod, dreg);
+}
+
+/*
+ * emit one of:
+ *   shl <imm>, %<dreg>
+ *   shr <imm>, %<dreg>
+ *   sar <imm>, %<dreg>
+ */
+static void
+emit_shift_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm)
+{
+	emit_shift(st, op, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+/*
+ * emit one of:
+ *   shl %<dreg>
+ *   shr %<dreg>
+ *   sar %<dreg>
+ * note that rcx is implicitly used as the shift-count register, so a few
+ * extra instructions for register spilling might be necessary.
+ */
+static void
+emit_shift_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+
+	emit_shift(st, op, (dreg == RCX) ? sreg : dreg);
+
+	if (sreg != RCX)
+		emit_xchg_reg(st, RCX, sreg);
+}
+
+/*
+ * emit mov <imm>, %<dreg>
+ */
+static void
+emit_mov_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xC7;
+
+	if (imm == 0) {
+		/* replace 'mov 0, %<dst>' with 'xor %<dst>, %<dst>' */
+		op = BPF_CLASS(op) | BPF_XOR | BPF_X;
+		emit_alu_reg(st, op, dreg, dreg);
+		return;
+	}
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+	emit_imm(st, imm, sizeof(imm));
+}
+
+/*
+ * emit mov <imm64>, %<dreg>
+ */
+static void
+emit_ld_imm64(struct bpf_jit_state *st, uint32_t dreg, uint32_t imm0,
+	uint32_t imm1)
+{
+	const uint8_t ops = 0xB8;
+
+	if (imm1 == 0) {
+		emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, dreg, imm0);
+		return;
+	}
+
+	emit_rex(st, EBPF_ALU64, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, 0, dreg);
+
+	emit_imm(st, imm0, sizeof(imm0));
+	emit_imm(st, imm1, sizeof(imm1));
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some register spilling is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * either:
+ *   mov %<sreg>, %rdx
+ * OR
+ *   mov <imm>, %rdx
+ * mul %rdx
+ * mov %r10, %rdx
+ * mov %rax, %<dreg>
+ * mov %r11, %rax
+ */
+static void
+emit_mul(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 4;
+
+	/* save rax & rdx */
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, REG_TMP0);
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* rax = dreg */
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, dreg, RAX);
+
+	if (BPF_SRC(op) == BPF_X)
+		/* rdx = sreg */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X,
+			sreg == RAX ? REG_TMP0 : sreg, RDX);
+	else
+		/* rdx = imm */
+		emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, RDX, imm);
+
+	emit_rex(st, op, RAX, RDX);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RDX);
+
+	if (dreg != RDX)
+		/* restore rdx */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP1, RDX);
+
+	if (dreg != RAX) {
+		/* dreg = rax */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, dreg);
+		/* restore rax */
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP0, RAX);
+	}
+}
+
+/*
+ * emit mov <ofs>(%<sreg>), %<dreg>
+ * note that for non 64-bit ops, higher bits have to be cleared.
+ */
+static void
+emit_ld_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	uint32_t mods, opsz;
+	const uint8_t op32 = 0x8B;
+	const uint8_t op16[] = {0x0F, 0xB7};
+	const uint8_t op8[] = {0x0F, 0xB6};
+
+	emit_rex(st, op, dreg, sreg);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_B)
+		emit_bytes(st, op8, sizeof(op8));
+	else if (opsz == BPF_H)
+		emit_bytes(st, op16, sizeof(op16));
+	else
+		emit_bytes(st, &op32, sizeof(op32));
+
+	mods = (imm_size(ofs) == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, dreg, sreg);
+	if (sreg == RSP || sreg == R12)
+		emit_sib(st, SIB_SCALE_1, sreg, sreg);
+	emit_imm(st, ofs, imm_size(ofs));
+}
+
+/*
+ * emit one of:
+ *   mov %<sreg>, <ofs>(%<dreg>)
+ *   mov <imm>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_common(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, uint32_t imm, int32_t ofs)
+{
+	uint32_t mods, imsz, opsz, opx;
+	const uint8_t prfx16 = 0x66;
+
+	/* 8 bit instruction opcodes */
+	static const uint8_t op8[] = {0xC6, 0x88};
+
+	/* 16/32/64 bit instruction opcodes */
+	static const uint8_t ops[] = {0xC7, 0x89};
+
+	/* does the instruction use an immediate value or a src reg? */
+	opx = (BPF_CLASS(op) == BPF_STX);
+
+	opsz = BPF_SIZE(op);
+	if (opsz == BPF_H)
+		emit_bytes(st, &prfx16, sizeof(prfx16));
+
+	emit_rex(st, op, sreg, dreg);
+
+	if (opsz == BPF_B)
+		emit_bytes(st, &op8[opx], sizeof(op8[opx]));
+	else
+		emit_bytes(st, &ops[opx], sizeof(ops[opx]));
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_modregrm(st, mods, sreg, dreg);
+
+	if (dreg == RSP || dreg == R12)
+		emit_sib(st, SIB_SCALE_1, dreg, dreg);
+
+	emit_imm(st, ofs, imsz);
+
+	if (opx == 0) {
+		imsz = RTE_MIN(bpf_size(opsz), sizeof(imm));
+		emit_imm(st, imm, imsz);
+	}
+}
+
+static void
+emit_st_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm,
+	int32_t ofs)
+{
+	emit_st_common(st, op, 0, dreg, imm, ofs);
+}
+
+static void
+emit_st_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	int32_t ofs)
+{
+	emit_st_common(st, op, sreg, dreg, 0, ofs);
+}
+
+/*
+ * emit lock add %<sreg>, <ofs>(%<dreg>)
+ */
+static void
+emit_st_xadd(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	uint32_t imsz, mods;
+
+	const uint8_t lck = 0xF0; /* lock prefix */
+	const uint8_t ops = 0x01; /* add opcode */
+
+	imsz = imm_size(ofs);
+	mods = (imsz == 1) ? MOD_IDISP8 : MOD_IDISP32;
+
+	emit_bytes(st, &lck, sizeof(lck));
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, mods, sreg, dreg);
+	emit_imm(st, ofs, imsz);
+}
+
+/*
+ * emit:
+ *    mov <imm64>, %rax
+ *    call *%rax
+ */
+static void
+emit_call(struct bpf_jit_state *st, uintptr_t trg)
+{
+	const uint8_t ops = 0xFF;
+	const uint8_t mods = 2;
+
+	emit_ld_imm64(st, RAX, trg, trg >> 32);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, RAX);
+}
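+
+/*
+ * Worked example (illustrative): after the 64-bit address is loaded into
+ * %rax, the FF /2 form with ModRM 0xD0 (mod=3, reg=2, rm=0) gives the
+ * final two bytes ff d0, i.e. "callq *%rax".
+ */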
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	int32_t joff;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0xEB;
+	const uint8_t op32 = 0xE9;
+
+	const int32_t sz8 = sizeof(op8) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32) + sizeof(uint32_t);
+
+	/* max possible jmp instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8, sizeof(op8));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, &op32, sizeof(op32));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
+
+/*
+ * emit jmp <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jmp(struct bpf_jit_state *st, int32_t ofs)
+{
+	emit_abs_jmp(st, st->off[st->idx + ofs]);
+}
+
+/*
+ * emit one of:
+ *    cmovz %<sreg>, <%dreg>
+ *    cmovne %<sreg>, <%dreg>
+ *    cmova %<sreg>, <%dreg>
+ *    cmovb %<sreg>, <%dreg>
+ *    cmovae %<sreg>, <%dreg>
+ *    cmovbe %<sreg>, <%dreg>
+ *    cmovg %<sreg>, <%dreg>
+ *    cmovl %<sreg>, <%dreg>
+ *    cmovge %<sreg>, <%dreg>
+ *    cmovle %<sreg>, <%dreg>
+ */
+static void
+emit_movcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	uint32_t bop;
+
+	static const uint8_t ops[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x44},  /* CMOVZ */
+		[GET_BPF_OP(EBPF_JNE)] = {0x0F, 0x45},  /* CMOVNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x47},  /* CMOVA */
+		[GET_BPF_OP(EBPF_JLT)] = {0x0F, 0x42},  /* CMOVB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x43},  /* CMOVAE */
+		[GET_BPF_OP(EBPF_JLE)] = {0x0F, 0x46},  /* CMOVBE */
+		[GET_BPF_OP(EBPF_JSGT)] = {0x0F, 0x4F}, /* CMOVG */
+		[GET_BPF_OP(EBPF_JSLT)] = {0x0F, 0x4C}, /* CMOVL */
+		[GET_BPF_OP(EBPF_JSGE)] = {0x0F, 0x4D}, /* CMOVGE */
+		[GET_BPF_OP(EBPF_JSLE)] = {0x0F, 0x4E}, /* CMOVLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x45}, /* CMOVNE */
+	};
+
+	bop = GET_BPF_OP(op);
+
+	emit_rex(st, op, dreg, sreg);
+	emit_bytes(st, ops[bop], sizeof(ops[bop]));
+	emit_modregrm(st, MOD_DIRECT, dreg, sreg);
+}
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the native code.
+ */
+static void
+emit_abs_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	uint32_t bop, imsz;
+	int32_t joff;
+
+	static const uint8_t op8[] = {
+		[GET_BPF_OP(BPF_JEQ)] = 0x74,  /* JE */
+		[GET_BPF_OP(EBPF_JNE)] = 0x75,  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = 0x77,  /* JA */
+		[GET_BPF_OP(EBPF_JLT)] = 0x72,  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = 0x73,  /* JAE */
+		[GET_BPF_OP(EBPF_JLE)] = 0x76,  /* JBE */
+		[GET_BPF_OP(EBPF_JSGT)] = 0x7F, /* JG */
+		[GET_BPF_OP(EBPF_JSLT)] = 0x7C, /* JL */
+		[GET_BPF_OP(EBPF_JSGE)] = 0x7D, /* JGE */
+		[GET_BPF_OP(EBPF_JSLE)] = 0x7E, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = 0x75, /* JNE */
+	};
+
+	static const uint8_t op32[][2] = {
+		[GET_BPF_OP(BPF_JEQ)] = {0x0F, 0x84},  /* JE */
+		[GET_BPF_OP(EBPF_JNE)] = {0x0F, 0x85},  /* JNE */
+		[GET_BPF_OP(BPF_JGT)] = {0x0F, 0x87},  /* JA */
+		[GET_BPF_OP(EBPF_JLT)] = {0x0F, 0x82},  /* JB */
+		[GET_BPF_OP(BPF_JGE)] = {0x0F, 0x83},  /* JAE */
+		[GET_BPF_OP(EBPF_JLE)] = {0x0F, 0x86},  /* JBE */
+		[GET_BPF_OP(EBPF_JSGT)] = {0x0F, 0x8F}, /* JG */
+		[GET_BPF_OP(EBPF_JSLT)] = {0x0F, 0x8C}, /* JL */
+		[GET_BPF_OP(EBPF_JSGE)] = {0x0F, 0x8D}, /* JGE */
+		[GET_BPF_OP(EBPF_JSLE)] = {0x0F, 0x8E}, /* JLE */
+		[GET_BPF_OP(BPF_JSET)] = {0x0F, 0x85}, /* JNE */
+	};
+
+	const int32_t sz8 = sizeof(op8[0]) + sizeof(uint8_t);
+	const int32_t sz32 = sizeof(op32[0]) + sizeof(uint32_t);
+
+	/* max possible jcc instruction size */
+	const int32_t iszm = RTE_MAX(sz8, sz32);
+
+	joff = ofs - st->sz;
+	imsz = RTE_MAX(imm_size(joff), imm_size(joff + iszm));
+
+	bop = GET_BPF_OP(op);
+
+	if (imsz == 1) {
+		emit_bytes(st, &op8[bop], sizeof(op8[bop]));
+		joff -= sz8;
+	} else {
+		emit_bytes(st, op32[bop], sizeof(op32[bop]));
+		joff -= sz32;
+	}
+
+	emit_imm(st, joff, imsz);
+}
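+
+/*
+ * Worked example (illustrative): a JNE with native target ofs = 0x44 while
+ * st->sz = 0x20 gives joff = 0x24, which still fits into imm8 after
+ * subtracting the instruction size, so the short form is used and the
+ * emitted bytes are 75 22 (jne to the target, relative to the next
+ * instruction).
+ */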
+
+/*
+ * emit one of:
+ * je <ofs>
+ * jne <ofs>
+ * ja <ofs>
+ * jb <ofs>
+ * jae <ofs>
+ * jbe <ofs>
+ * jg <ofs>
+ * jl <ofs>
+ * jge <ofs>
+ * jle <ofs>
+ * where 'ofs' is the target offset for the BPF bytecode.
+ */
+static void
+emit_jcc(struct bpf_jit_state *st, uint32_t op, int32_t ofs)
+{
+	emit_abs_jcc(st, op, st->off[st->idx + ofs]);
+}
+
+/*
+ * emit cmp <imm>, %<dreg>
+ */
+static void
+emit_cmp_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	uint8_t ops;
+	uint32_t imsz;
+
+	const uint8_t op8 = 0x83;
+	const uint8_t op32 = 0x81;
+	const uint8_t mods = 7;
+
+	imsz = imm_size(imm);
+	ops = (imsz == 1) ? op8 : op32;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imsz);
+}
+
+/*
+ * emit test <imm>, %<dreg>
+ */
+static void
+emit_tst_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg, uint32_t imm)
+{
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 0;
+
+	emit_rex(st, op, 0, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, dreg);
+	emit_imm(st, imm, imm_size(imm));
+}
+
+static void
+emit_jcc_imm(struct bpf_jit_state *st, uint32_t op, uint32_t dreg,
+	uint32_t imm, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_imm(st, EBPF_ALU64, dreg, imm);
+	else
+		emit_cmp_imm(st, EBPF_ALU64, dreg, imm);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * emit test %<sreg>, %<dreg>
+ */
+static void
+emit_tst_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x85;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+/*
+ * emit cmp %<sreg>, %<dreg>
+ */
+static void
+emit_cmp_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg)
+{
+	const uint8_t ops = 0x39;
+
+	emit_rex(st, op, sreg, dreg);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, sreg, dreg);
+}
+
+static void
+emit_jcc_reg(struct bpf_jit_state *st, uint32_t op, uint32_t sreg,
+	uint32_t dreg, int32_t ofs)
+{
+	if (BPF_OP(op) == BPF_JSET)
+		emit_tst_reg(st, EBPF_ALU64, sreg, dreg);
+	else
+		emit_cmp_reg(st, EBPF_ALU64, sreg, dreg);
+
+	emit_jcc(st, op, ofs);
+}
+
+/*
+ * note that rax:rdx are implicitly used as source/destination registers,
+ * so some register spilling is necessary.
+ * emit:
+ * mov %rax, %r11
+ * mov %rdx, %r10
+ * mov %<dreg>, %rax
+ * xor %rdx, %rdx
+ * for divisor as immediate value:
+ *   mov <imm>, %r9
+ * div %<divisor_reg>
+ * either:
+ *   mov %rax, %<dreg>
+ * OR
+ *   mov %rdx, %<dreg>
+ * mov %r11, %rax
+ * mov %r10, %rdx
+ */
+static void
+emit_div(struct bpf_jit_state *st, uint32_t op, uint32_t sreg, uint32_t dreg,
+	uint32_t imm)
+{
+	uint32_t sr;
+
+	const uint8_t ops = 0xF7;
+	const uint8_t mods = 6;
+
+	if (BPF_SRC(op) == BPF_X) {
+
+		/* check that src divisor is not zero */
+		emit_tst_reg(st, BPF_CLASS(op), sreg, sreg);
+
+		/* exit with return value zero */
+		emit_movcc_reg(st, BPF_CLASS(op) | BPF_JEQ | BPF_X, sreg, RAX);
+		emit_abs_jcc(st, BPF_JMP | BPF_JEQ | BPF_K, st->exit.off);
+	}
+
+	/* save rax & rdx */
+	if (dreg != RAX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, REG_TMP0);
+	if (dreg != RDX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RDX, REG_TMP1);
+
+	/* fill rax & rdx */
+	emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, dreg, RAX);
+	emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, RDX, 0);
+
+	if (BPF_SRC(op) == BPF_X) {
+		sr = sreg;
+		if (sr == RAX)
+			sr = REG_TMP0;
+		else if (sr == RDX)
+			sr = REG_TMP1;
+	} else {
+		sr = REG_DIV_IMM;
+		emit_mov_imm(st, EBPF_ALU64 | EBPF_MOV | BPF_K, sr, imm);
+	}
+
+	emit_rex(st, op, 0, sr);
+	emit_bytes(st, &ops, sizeof(ops));
+	emit_modregrm(st, MOD_DIRECT, mods, sr);
+
+	if (BPF_OP(op) == BPF_DIV)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RAX, dreg);
+	else
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RDX, dreg);
+
+	if (dreg != RAX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP0, RAX);
+	if (dreg != RDX)
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, REG_TMP1, RDX);
+}
+
+static void
+emit_prolog(struct bpf_jit_state *st, int32_t stack_size)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	/* we can avoid touching the stack at all */
+	if (spil == 0)
+		return;
+
+	emit_alu_imm(st, EBPF_ALU64 | BPF_SUB | BPF_K, RSP,
+		spil * sizeof(uint64_t));
+
+	ofs = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++) {
+		if (INUSE(st->reguse, save_regs[i]) != 0) {
+			emit_st_reg(st, BPF_STX | BPF_MEM | EBPF_DW,
+				save_regs[i], RSP, ofs);
+			ofs += sizeof(uint64_t);
+		}
+	}
+
+	if (INUSE(st->reguse, RBP) != 0) {
+		emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X, RSP, RBP);
+		emit_alu_imm(st, EBPF_ALU64 | BPF_SUB | BPF_K, RSP, stack_size);
+	}
+}
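+
+/*
+ * Illustrative example of the generated prologue: for a program that uses
+ * RBX and RBP with stack_size = 16 the logic above emits, at the assembly
+ * level:
+ *	sub  $0x10,%rsp
+ *	mov  %rbx,0x0(%rsp)
+ *	mov  %rbp,0x8(%rsp)
+ *	mov  %rsp,%rbp
+ *	sub  $0x10,%rsp
+ */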
+
+/*
+ * emit ret
+ */
+static void
+emit_ret(struct bpf_jit_state *st)
+{
+	const uint8_t ops = 0xC3;
+
+	emit_bytes(st, &ops, sizeof(ops));
+}
+
+static void
+emit_epilog(struct bpf_jit_state *st)
+{
+	uint32_t i;
+	int32_t spil, ofs;
+
+	/* if we already have an epilog, generate a jump to it */
+	if (st->exit.num++ != 0) {
+		emit_abs_jmp(st, st->exit.off);
+		return;
+	}
+
+	/* store offset of epilog block */
+	st->exit.off = st->sz;
+
+	spil = 0;
+	for (i = 0; i != RTE_DIM(save_regs); i++)
+		spil += INUSE(st->reguse, save_regs[i]);
+
+	if (spil != 0) {
+
+		if (INUSE(st->reguse, RBP) != 0)
+			emit_mov_reg(st, EBPF_ALU64 | EBPF_MOV | BPF_X,
+				RBP, RSP);
+
+		ofs = 0;
+		for (i = 0; i != RTE_DIM(save_regs); i++) {
+			if (INUSE(st->reguse, save_regs[i]) != 0) {
+				emit_ld_reg(st, BPF_LDX | BPF_MEM | EBPF_DW,
+					RSP, save_regs[i], ofs);
+				ofs += sizeof(uint64_t);
+			}
+		}
+
+		emit_alu_imm(st, EBPF_ALU64 | BPF_ADD | BPF_K, RSP,
+			spil * sizeof(uint64_t));
+	}
+
+	emit_ret(st);
+}
+
+/*
+ * walk through the BPF code and translate it into x86_64 instructions.
+ */
+static int
+emit(struct bpf_jit_state *st, const struct rte_bpf *bpf)
+{
+	uint32_t i, dr, op, sr;
+	const struct ebpf_insn *ins;
+
+	/* reset state fields */
+	st->sz = 0;
+	st->exit.num = 0;
+
+	emit_prolog(st, bpf->stack_sz);
+
+	for (i = 0; i != bpf->prm.nb_ins; i++) {
+
+		st->idx = i;
+		st->off[i] = st->sz;
+
+		ins = bpf->prm.ins + i;
+
+		dr = ebpf2x86[ins->dst_reg];
+		sr = ebpf2x86[ins->src_reg];
+		op = ins->code;
+
+		switch (op) {
+		/* 32 bit ALU IMM operations */
+		case (BPF_ALU | BPF_ADD | BPF_K):
+		case (BPF_ALU | BPF_SUB | BPF_K):
+		case (BPF_ALU | BPF_AND | BPF_K):
+		case (BPF_ALU | BPF_OR | BPF_K):
+		case (BPF_ALU | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_K):
+		case (BPF_ALU | BPF_RSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 32 bit ALU REG operations */
+		case (BPF_ALU | BPF_ADD | BPF_X):
+		case (BPF_ALU | BPF_SUB | BPF_X):
+		case (BPF_ALU | BPF_AND | BPF_X):
+		case (BPF_ALU | BPF_OR | BPF_X):
+		case (BPF_ALU | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_LSH | BPF_X):
+		case (BPF_ALU | BPF_RSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | EBPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (BPF_ALU | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_BE):
+			emit_be2le(st, dr, ins->imm);
+			break;
+		case (BPF_ALU | EBPF_END | EBPF_TO_LE):
+			emit_le2be(st, dr, ins->imm);
+			break;
+		/* 64 bit ALU IMM operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_K):
+		case (EBPF_ALU64 | BPF_SUB | BPF_K):
+		case (EBPF_ALU64 | BPF_AND | BPF_K):
+		case (EBPF_ALU64 | BPF_OR | BPF_K):
+		case (EBPF_ALU64 | BPF_XOR | BPF_K):
+			emit_alu_imm(st, op, dr, ins->imm);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_K):
+		case (EBPF_ALU64 | BPF_RSH | BPF_K):
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_K):
+			emit_shift_imm(st, op, dr, ins->imm);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_K):
+			emit_mov_imm(st, op, dr, ins->imm);
+			break;
+		/* 64 bit ALU REG operations */
+		case (EBPF_ALU64 | BPF_ADD | BPF_X):
+		case (EBPF_ALU64 | BPF_SUB | BPF_X):
+		case (EBPF_ALU64 | BPF_AND | BPF_X):
+		case (EBPF_ALU64 | BPF_OR | BPF_X):
+		case (EBPF_ALU64 | BPF_XOR | BPF_X):
+			emit_alu_reg(st, op, sr, dr);
+			break;
+		case (EBPF_ALU64 | BPF_LSH | BPF_X):
+		case (EBPF_ALU64 | BPF_RSH | BPF_X):
+		case (EBPF_ALU64 | EBPF_ARSH | BPF_X):
+			emit_shift_reg(st, op, sr, dr);
+			break;
+		case (EBPF_ALU64 | EBPF_MOV | BPF_X):
+			emit_mov_reg(st, op, sr, dr);
+			break;
+		case (EBPF_ALU64 | BPF_NEG):
+			emit_neg(st, op, dr);
+			break;
+		/* multiply instructions */
+		case (BPF_ALU | BPF_MUL | BPF_K):
+		case (BPF_ALU | BPF_MUL | BPF_X):
+		case (EBPF_ALU64 | BPF_MUL | BPF_K):
+		case (EBPF_ALU64 | BPF_MUL | BPF_X):
+			emit_mul(st, op, sr, dr, ins->imm);
+			break;
+		/* divide instructions */
+		case (BPF_ALU | BPF_DIV | BPF_K):
+		case (BPF_ALU | BPF_MOD | BPF_K):
+		case (BPF_ALU | BPF_DIV | BPF_X):
+		case (BPF_ALU | BPF_MOD | BPF_X):
+		case (EBPF_ALU64 | BPF_DIV | BPF_K):
+		case (EBPF_ALU64 | BPF_MOD | BPF_K):
+		case (EBPF_ALU64 | BPF_DIV | BPF_X):
+		case (EBPF_ALU64 | BPF_MOD | BPF_X):
+			emit_div(st, op, sr, dr, ins->imm);
+			break;
+		/* load instructions */
+		case (BPF_LDX | BPF_MEM | BPF_B):
+		case (BPF_LDX | BPF_MEM | BPF_H):
+		case (BPF_LDX | BPF_MEM | BPF_W):
+		case (BPF_LDX | BPF_MEM | EBPF_DW):
+			emit_ld_reg(st, op, sr, dr, ins->off);
+			break;
+		/* load 64 bit immediate value */
+		case (BPF_LD | BPF_IMM | EBPF_DW):
+			emit_ld_imm64(st, dr, ins[0].imm, ins[1].imm);
+			i++;
+			break;
+		/* store instructions */
+		case (BPF_STX | BPF_MEM | BPF_B):
+		case (BPF_STX | BPF_MEM | BPF_H):
+		case (BPF_STX | BPF_MEM | BPF_W):
+		case (BPF_STX | BPF_MEM | EBPF_DW):
+			emit_st_reg(st, op, sr, dr, ins->off);
+			break;
+		case (BPF_ST | BPF_MEM | BPF_B):
+		case (BPF_ST | BPF_MEM | BPF_H):
+		case (BPF_ST | BPF_MEM | BPF_W):
+		case (BPF_ST | BPF_MEM | EBPF_DW):
+			emit_st_imm(st, op, dr, ins->imm, ins->off);
+			break;
+		/* atomic add instructions */
+		case (BPF_STX | EBPF_XADD | BPF_W):
+		case (BPF_STX | EBPF_XADD | EBPF_DW):
+			emit_st_xadd(st, op, sr, dr, ins->off);
+			break;
+		/* jump instructions */
+		case (BPF_JMP | BPF_JA):
+			emit_jmp(st, ins->off + 1);
+			break;
+		/* jump IMM instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_K):
+		case (BPF_JMP | EBPF_JNE | BPF_K):
+		case (BPF_JMP | BPF_JGT | BPF_K):
+		case (BPF_JMP | EBPF_JLT | BPF_K):
+		case (BPF_JMP | BPF_JGE | BPF_K):
+		case (BPF_JMP | EBPF_JLE | BPF_K):
+		case (BPF_JMP | EBPF_JSGT | BPF_K):
+		case (BPF_JMP | EBPF_JSLT | BPF_K):
+		case (BPF_JMP | EBPF_JSGE | BPF_K):
+		case (BPF_JMP | EBPF_JSLE | BPF_K):
+		case (BPF_JMP | BPF_JSET | BPF_K):
+			emit_jcc_imm(st, op, dr, ins->imm, ins->off + 1);
+			break;
+		/* jump REG instructions */
+		case (BPF_JMP | BPF_JEQ | BPF_X):
+		case (BPF_JMP | EBPF_JNE | BPF_X):
+		case (BPF_JMP | BPF_JGT | BPF_X):
+		case (BPF_JMP | EBPF_JLT | BPF_X):
+		case (BPF_JMP | BPF_JGE | BPF_X):
+		case (BPF_JMP | EBPF_JLE | BPF_X):
+		case (BPF_JMP | EBPF_JSGT | BPF_X):
+		case (BPF_JMP | EBPF_JSLT | BPF_X):
+		case (BPF_JMP | EBPF_JSGE | BPF_X):
+		case (BPF_JMP | EBPF_JSLE | BPF_X):
+		case (BPF_JMP | BPF_JSET | BPF_X):
+			emit_jcc_reg(st, op, sr, dr, ins->off + 1);
+			break;
+		/* call instructions */
+		case (BPF_JMP | EBPF_CALL):
+			emit_call(st, (uintptr_t)bpf->prm.xsym[ins->imm].func);
+			break;
+		/* return instruction */
+		case (BPF_JMP | EBPF_EXIT):
+			emit_epilog(st);
+			break;
+		default:
+			RTE_BPF_LOG(ERR,
+				"%s(%p): invalid opcode %#x at pc: %u;\n",
+				__func__, bpf, ins->code, i);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * produce a native ISA version of the given BPF code.
+ */
+int
+bpf_jit_x86(struct rte_bpf *bpf)
+{
+	int32_t rc;
+	uint32_t i;
+	size_t sz;
+	struct bpf_jit_state st;
+
+	/* init state */
+	memset(&st, 0, sizeof(st));
+	st.off = malloc(bpf->prm.nb_ins * sizeof(st.off[0]));
+	if (st.off == NULL)
+		return -ENOMEM;
+
+	/* fill with fake offsets */
+	st.exit.off = INT32_MAX;
+	for (i = 0; i != bpf->prm.nb_ins; i++)
+		st.off[i] = INT32_MAX;
+
+	/*
+	 * Dry runs, used to calculate the total code size and valid jump
+	 * offsets. Each pass re-emits the code with the offsets learned on
+	 * the previous one; stop when the generated size no longer changes,
+	 * i.e. we have reached the minimal possible size.
+	 */
+	do {
+		sz = st.sz;
+		rc = emit(&st, bpf);
+	} while (rc == 0 && sz != st.sz);
+
+	if (rc == 0) {
+
+		/* allocate memory needed */
+		st.ins = mmap(NULL, st.sz, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (st.ins == MAP_FAILED)
+			rc = -ENOMEM;
+		else
+			/* generate code */
+			rc = emit(&st, bpf);
+	}
+
+	if (rc == 0 && mprotect(st.ins, st.sz, PROT_READ | PROT_EXEC) != 0)
+		rc = -ENOMEM;
+
+	if (rc != 0)
+		munmap(st.ins, st.sz);
+	else {
+		bpf->jit.func = (void *)st.ins;
+		bpf->jit.sz = st.sz;
+	}
+
+	free(st.off);
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index a6a9229bd..668c89184 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -7,6 +7,10 @@ sources = files('bpf.c',
 		'bpf_load.c',
 		'bpf_validate.c')
 
+if arch_subdir == 'x86'
+	sources += files('bpf_jit_x86.c')
+endif
+
 install_headers = files('bpf_def.h',
 			'rte_bpf.h')
 
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 5/9] bpf: introduce basic RX/TX BPF filters
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (5 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 4/9] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 6/9] testpmd: new commands to load/unload " Konstantin Ananyev
                           ` (3 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce API to install BPF based filters on ethdev RX/TX path.
The current implementation is a pure SW one, based on the ethdev RX/TX
callback mechanism.
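
As a minimal usage sketch (not part of this patch), an application could
attach/detach a filter roughly as below; 'prm' is assumed to be an already
initialised struct rte_bpf_prm describing the program argument and any
external symbols, and the port/queue/file/section names are arbitrary:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>
  #include <rte_bpf_ethdev.h>

  static void
  attach_rx_filter(uint16_t port, uint16_t queue,
          const struct rte_bpf_prm *prm)
  {
          int rc;

          /* load the ".text" section of t1.o and install it as an RX
           * callback on the given port/queue, using JITed native code */
          rc = rte_bpf_eth_rx_elf_load(port, queue, prm, "t1.o", ".text",
                  RTE_BPF_ETH_F_JIT);
          if (rc != 0)
                  printf("BPF load failed: %s\n", strerror(-rc));
  }

  static void
  detach_rx_filter(uint16_t port, uint16_t queue)
  {
          /* remove the RX callback and destroy the loaded BPF program */
          rte_bpf_eth_rx_unload(port, queue);
  }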

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 lib/librte_bpf/Makefile            |   2 +
 lib/librte_bpf/bpf_pkt.c           | 607 +++++++++++++++++++++++++++++++++++++
 lib/librte_bpf/meson.build         |   6 +-
 lib/librte_bpf/rte_bpf_ethdev.h    | 102 +++++++
 lib/librte_bpf/rte_bpf_version.map |   4 +
 5 files changed, 719 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_bpf/bpf_pkt.c
 create mode 100644 lib/librte_bpf/rte_bpf_ethdev.h

diff --git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile
index 7a9e00cf3..c0e8aaa68 100644
--- a/lib/librte_bpf/Makefile
+++ b/lib/librte_bpf/Makefile
@@ -24,6 +24,7 @@ LIBABIVER := 1
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_pkt.c
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c
 ifeq ($(CONFIG_RTE_LIBRTE_BPF_ELF),y)
 SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load_elf.c
@@ -35,5 +36,6 @@ endif
 # install header files
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += bpf_def.h
 SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf_ethdev.h
 
 include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_bpf/bpf_pkt.c b/lib/librte_bpf/bpf_pkt.c
new file mode 100644
index 000000000..2200228df
--- /dev/null
+++ b/lib/librte_bpf/bpf_pkt.c
@@ -0,0 +1,607 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdarg.h>
+#include <stdio.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include <sys/queue.h>
+#include <sys/stat.h>
+
+#include <rte_common.h>
+#include <rte_byteorder.h>
+#include <rte_malloc.h>
+#include <rte_log.h>
+#include <rte_debug.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_atomic.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+
+#include <rte_bpf_ethdev.h>
+#include "bpf_impl.h"
+
+/*
+ * information about installed BPF rx/tx callback
+ */
+
+struct bpf_eth_cbi {
+	/* used by both data & control path */
+	uint32_t use;    /* usage counter */
+	const struct rte_eth_rxtx_callback *cb;  /* callback handle */
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	/* used by control path only */
+	LIST_ENTRY(bpf_eth_cbi) link;
+	uint16_t port;
+	uint16_t queue;
+} __rte_cache_aligned;
+
+/*
+ * An odd value of the 'use' counter means the callback is in use by the
+ * datapath; an even value means it is not.
+ */
+#define BPF_ETH_CBI_INUSE  1
+
+/*
+ * List to manage RX/TX installed callbacks.
+ */
+LIST_HEAD(bpf_eth_cbi_list, bpf_eth_cbi);
+
+enum {
+	BPF_ETH_RX,
+	BPF_ETH_TX,
+	BPF_ETH_NUM,
+};
+
+/*
+ * information about all installed BPF rx/tx callbacks
+ */
+struct bpf_eth_cbh {
+	rte_spinlock_t lock;
+	struct bpf_eth_cbi_list list;
+	uint32_t type;
+};
+
+static struct bpf_eth_cbh rx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_RX,
+};
+
+static struct bpf_eth_cbh tx_cbh = {
+	.lock = RTE_SPINLOCK_INITIALIZER,
+	.list = LIST_HEAD_INITIALIZER(list),
+	.type = BPF_ETH_TX,
+};
+
+/*
+ * Marks given callback as used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_inuse(struct bpf_eth_cbi *cbi)
+{
+	cbi->use++;
+	/* make sure no store/load reordering could happen */
+	rte_smp_mb();
+}
+
+/*
+ * Marks given callback as not used by datapath.
+ */
+static __rte_always_inline void
+bpf_eth_cbi_unuse(struct bpf_eth_cbi *cbi)
+{
+	/* make sure all previous loads are completed */
+	rte_smp_rmb();
+	cbi->use++;
+}
+
+/*
+ * Waits till the datapath has finished using the given callback.
+ */
+static void
+bpf_eth_cbi_wait(const struct bpf_eth_cbi *cbi)
+{
+	uint32_t nuse, puse;
+
+	/* make sure all previous loads and stores are completed */
+	rte_smp_mb();
+
+	puse = cbi->use;
+
+	/* in use, busy wait till current RX/TX iteration is finished */
+	if ((puse & BPF_ETH_CBI_INUSE) != 0) {
+		do {
+			rte_pause();
+			rte_compiler_barrier();
+			nuse = cbi->use;
+		} while (nuse == puse);
+	}
+}
+
+static void
+bpf_eth_cbi_cleanup(struct bpf_eth_cbi *bc)
+{
+	bc->bpf = NULL;
+	memset(&bc->jit, 0, sizeof(bc->jit));
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_find(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	LIST_FOREACH(cbi, &cbh->list, link) {
+		if (cbi->port == port && cbi->queue == queue)
+			break;
+	}
+	return cbi;
+}
+
+static struct bpf_eth_cbi *
+bpf_eth_cbh_add(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *cbi;
+
+	/* return an existing one */
+	cbi = bpf_eth_cbh_find(cbh, port, queue);
+	if (cbi != NULL)
+		return cbi;
+
+	cbi = rte_zmalloc(NULL, sizeof(*cbi), RTE_CACHE_LINE_SIZE);
+	if (cbi != NULL) {
+		cbi->port = port;
+		cbi->queue = queue;
+		LIST_INSERT_HEAD(&cbh->list, cbi, link);
+	}
+	return cbi;
+}
+
+/*
+ * BPF packet processing routines.
+ */
+
+static inline uint32_t
+apply_filter(struct rte_mbuf *mb[], const uint64_t rc[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i, j, k;
+	struct rte_mbuf *dr[num];
+
+	for (i = 0, j = 0, k = 0; i != num; i++) {
+
+		/* filter matches */
+		if (rc[i] != 0)
+			mb[j++] = mb[i];
+		/* no match */
+		else
+			dr[k++] = mb[i];
+	}
+
+	if (drop != 0) {
+		/* free filtered out mbufs */
+		for (i = 0; i != k; i++)
+			rte_pktmbuf_free(dr[i]);
+	} else {
+		/* copy filtered out mbufs beyond good ones */
+		for (i = 0; i != k; i++)
+			mb[j + i] = dr[i];
+	}
+
+	return j;
+}
+
+static inline uint32_t
+pkt_filter_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint32_t i;
+	void *dp[num];
+	uint64_t rc[num];
+
+	for (i = 0; i != num; i++)
+		dp[i] = rte_pktmbuf_mtod(mb[i], void *);
+
+	rte_bpf_exec_burst(bpf, dp, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	void *dp;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		dp = rte_pktmbuf_mtod(mb[i], void *);
+		rc[i] = jit->func(dp);
+		n += (rc[i] == 0);
+	}
+
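+	/* some packets were rejected by the filter - free or reorder them */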
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+static inline uint32_t
+pkt_filter_mb_vm(const struct rte_bpf *bpf, struct rte_mbuf *mb[], uint32_t num,
+	uint32_t drop)
+{
+	uint64_t rc[num];
+
+	rte_bpf_exec_burst(bpf, (void **)mb, rc, num);
+	return apply_filter(mb, rc, num, drop);
+}
+
+static inline uint32_t
+pkt_filter_mb_jit(const struct rte_bpf_jit *jit, struct rte_mbuf *mb[],
+	uint32_t num, uint32_t drop)
+{
+	uint32_t i, n;
+	uint64_t rc[num];
+
+	n = 0;
+	for (i = 0; i != num; i++) {
+		rc[i] = jit->func(mb[i]);
+		n += (rc[i] == 0);
+	}
+
+	if (n != 0)
+		num = apply_filter(mb, rc, num, drop);
+
+	return num;
+}
+
+/*
+ * RX/TX callbacks for raw data bpf.
+ */
+
+static uint16_t
+bpf_rx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+/*
+ * RX/TX callbacks for mbuf.
+ */
+
+static uint16_t
+bpf_rx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_rx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts,
+	__rte_unused uint16_t max_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 1) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_vm(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_vm(cbi->bpf, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static uint16_t
+bpf_tx_callback_mb_jit(__rte_unused uint16_t port, __rte_unused uint16_t queue,
+	struct rte_mbuf *pkt[], uint16_t nb_pkts, void *user_param)
+{
+	struct bpf_eth_cbi *cbi;
+	uint16_t rc;
+
+	cbi = user_param;
+	bpf_eth_cbi_inuse(cbi);
+	rc = (cbi->cb != NULL) ?
+		pkt_filter_mb_jit(&cbi->jit, pkt, nb_pkts, 0) :
+		nb_pkts;
+	bpf_eth_cbi_unuse(cbi);
+	return rc;
+}
+
+static rte_rx_callback_fn
+select_rx_callback(enum rte_bpf_arg_type type, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (type == RTE_BPF_ARG_PTR)
+			return bpf_rx_callback_jit;
+		else if (type == RTE_BPF_ARG_PTR_MBUF)
+			return bpf_rx_callback_mb_jit;
+	} else if (type == RTE_BPF_ARG_PTR)
+		return bpf_rx_callback_vm;
+	else if (type == RTE_BPF_ARG_PTR_MBUF)
+		return bpf_rx_callback_mb_vm;
+
+	return NULL;
+}
+
+static rte_tx_callback_fn
+select_tx_callback(enum rte_bpf_arg_type type, uint32_t flags)
+{
+	if (flags & RTE_BPF_ETH_F_JIT) {
+		if (type == RTE_BPF_ARG_PTR)
+			return bpf_tx_callback_jit;
+		else if (type == RTE_BPF_ARG_PTR_MBUF)
+			return bpf_tx_callback_mb_jit;
+	} else if (type == RTE_BPF_ARG_PTR)
+		return bpf_tx_callback_vm;
+	else if (type == RTE_BPF_ARG_PTR_MBUF)
+		return bpf_tx_callback_mb_vm;
+
+	return NULL;
+}
+
+/*
+ * helper function to perform BPF unload for given port/queue.
+ * we have to introduce extra complexity (and a possible slowdown) here,
+ * as right now there is no safe generic way to remove an RX/TX callback
+ * while IO is active.
+ * Note that we still don't free the memory allocated for the callback
+ * handle itself - again, right now there is no safe way to do that
+ * without first stopping RX/TX on the given port/queue.
+ */
+static void
+bpf_eth_cbi_unload(struct bpf_eth_cbi *bc)
+{
+	/* mark this cbi as empty */
+	bc->cb = NULL;
+	rte_smp_mb();
+
+	/* make sure datapath doesn't use bpf anymore, then destroy bpf */
+	bpf_eth_cbi_wait(bc);
+	rte_bpf_destroy(bc->bpf);
+	bpf_eth_cbi_cleanup(bc);
+}
+
+static void
+bpf_eth_unload(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbi *bc;
+
+	bc = bpf_eth_cbh_find(cbh, port, queue);
+	if (bc == NULL || bc->cb == NULL)
+		return;
+
+	if (cbh->type == BPF_ETH_RX)
+		rte_eth_remove_rx_callback(port, queue, bc->cb);
+	else
+		rte_eth_remove_tx_callback(port, queue, bc->cb);
+
+	bpf_eth_cbi_unload(bc);
+}
+
+
+__rte_experimental void
+rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+__rte_experimental void
+rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue)
+{
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	bpf_eth_unload(cbh, port, queue);
+	rte_spinlock_unlock(&cbh->lock);
+}
+
+static int
+bpf_eth_elf_load(struct bpf_eth_cbh *cbh, uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbi *bc;
+	struct rte_bpf *bpf;
+	rte_rx_callback_fn frx;
+	rte_tx_callback_fn ftx;
+	struct rte_bpf_jit jit;
+
+	frx = NULL;
+	ftx = NULL;
+
+	if (prm == NULL || rte_eth_dev_is_valid_port(port) == 0 ||
+			queue >= RTE_MAX_QUEUES_PER_PORT)
+		return -EINVAL;
+
+	if (cbh->type == BPF_ETH_RX)
+		frx = select_rx_callback(prm->prog_arg.type, flags);
+	else
+		ftx = select_tx_callback(prm->prog_arg.type, flags);
+
+	if (frx == NULL && ftx == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no callback selected;\n",
+			__func__, port, queue);
+		return -EINVAL;
+	}
+
+	bpf = rte_bpf_elf_load(prm, fname, sname);
+	if (bpf == NULL)
+		return -rte_errno;
+
+	rte_bpf_get_jit(bpf, &jit);
+
+	if ((flags & RTE_BPF_ETH_F_JIT) != 0 && jit.func == NULL) {
+		RTE_BPF_LOG(ERR, "%s(%u, %u): no JIT generated;\n",
+			__func__, port, queue);
+		rte_bpf_destroy(bpf);
+		return -ENOTSUP;
+	}
+
+	/* setup/update global callback info */
+	bc = bpf_eth_cbh_add(cbh, port, queue);
+	if (bc == NULL)
+		return -ENOMEM;
+
+	/* remove old one, if any */
+	if (bc->cb != NULL)
+		bpf_eth_unload(cbh, port, queue);
+
+	bc->bpf = bpf;
+	bc->jit = jit;
+
+	if (cbh->type == BPF_ETH_RX)
+		bc->cb = rte_eth_add_rx_callback(port, queue, frx, bc);
+	else
+		bc->cb = rte_eth_add_tx_callback(port, queue, ftx, bc);
+
+	if (bc->cb == NULL) {
+		rc = -rte_errno;
+		rte_bpf_destroy(bpf);
+		bpf_eth_cbi_cleanup(bc);
+	} else
+		rc = 0;
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &rx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
+
+__rte_experimental int
+rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags)
+{
+	int32_t rc;
+	struct bpf_eth_cbh *cbh;
+
+	cbh = &tx_cbh;
+	rte_spinlock_lock(&cbh->lock);
+	rc = bpf_eth_elf_load(cbh, port, queue, prm, fname, sname, flags);
+	rte_spinlock_unlock(&cbh->lock);
+
+	return rc;
+}
diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build
index 668c89184..de9de0091 100644
--- a/lib/librte_bpf/meson.build
+++ b/lib/librte_bpf/meson.build
@@ -5,6 +5,7 @@ allow_experimental_apis = true
 sources = files('bpf.c',
 		'bpf_exec.c',
 		'bpf_load.c',
+		'bpf_pkt.c',
 		'bpf_validate.c')
 
 if arch_subdir == 'x86'
@@ -12,9 +13,10 @@ if arch_subdir == 'x86'
 endif
 
 install_headers = files('bpf_def.h',
-			'rte_bpf.h')
+			'rte_bpf.h',
+			'rte_bpf_ethdev.h')
 
-deps += ['mbuf', 'net']
+deps += ['mbuf', 'net', 'ethdev']
 
 dep = cc.find_library('elf', required: false)
 if dep.found() == true and cc.has_header('libelf.h', dependencies: dep)
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
new file mode 100644
index 000000000..4800bbdaa
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _RTE_BPF_ETHDEV_H_
+#define _RTE_BPF_ETHDEV_H_
+
+/**
+ * @file
+ *
+ * API to install BPF filter as RX/TX callbacks for eth devices.
+ * Note that right now:
+ * - it is not MT safe, i.e. it is not allowed to do load/unload for the
+ *   same port/queue from different threads in parallel.
+ * - though it does allow load/unload at runtime
+ *   (while RX/TX is ongoing on given port/queue).
+ * - it allows only one BPF program per port/queue,
+ *   i.e. a new load will replace the BPF program previously loaded
+ *   for that port/queue.
+ * Filter behaviour - if the BPF program returns zero value for a given packet:
+ *   on RX - the packet will be dropped inside the callback and no further
+ *   processing for that packet will happen.
+ *   on TX - the packet will remain unsent, and it is the responsibility of
+ *   the user to handle such a situation (drop, try to send again, etc.).
+ */
+
+#include <rte_bpf.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+enum {
+	RTE_BPF_ETH_F_NONE = 0,
+	RTE_BPF_ETH_F_JIT  = 0x1, /**< use BPF code compiled into native ISA */
+};
+
+/**
+ * Unload previously loaded BPF program (if any) from given RX port/queue
+ * and remove appropriate RX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ */
+void rte_bpf_eth_rx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Unload previously loaded BPF program (if any) from given TX port/queue
+ * and remove appropriate TX port/queue callback.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ */
+void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given RX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the RX queue on the given port
+ * @param fname
+ *  Pathname of an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+/**
+ * Load BPF program from the ELF file and install callback to execute it
+ * on given TX port/queue.
+ *
+ * @param port
+ *   The identifier of the ethernet port
+ * @param queue
+ *   The identifier of the TX queue on the given port
+ * @param fname
+ *  Pathname of an ELF file.
+ * @param sname
+ *  Name of the executable section within the file to load.
+ * @return
+ *   Zero on successful completion or negative error code otherwise.
+ */
+int rte_bpf_eth_tx_elf_load(uint16_t port, uint16_t queue,
+	const struct rte_bpf_prm *prm, const char *fname, const char *sname,
+	uint32_t flags);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_ETHDEV_H_ */
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
index ff65144df..a203e088e 100644
--- a/lib/librte_bpf/rte_bpf_version.map
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -3,6 +3,10 @@ EXPERIMENTAL {
 
 	rte_bpf_destroy;
 	rte_bpf_elf_load;
+	rte_bpf_eth_rx_elf_load;
+	rte_bpf_eth_rx_unload;
+	rte_bpf_eth_tx_elf_load;
+	rte_bpf_eth_tx_unload;
 	rte_bpf_exec;
 	rte_bpf_exec_burst;
 	rte_bpf_get_jit;
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 6/9] testpmd: new commands to load/unload BPF filters
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (6 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 5/9] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 7/9] test: add few eBPF samples Konstantin Ananyev
                           ` (2 subsequent siblings)
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Introduce new testpmd commands to load/unload RX/TX BPF-based filters.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 app/test-pmd/Makefile           |   1 +
 app/test-pmd/bpf_cmd.c          | 175 ++++++++++++++++++++++++++++++++++++++++
 app/test-pmd/bpf_cmd.h          |  16 ++++
 app/test-pmd/cmdline.c          |   5 ++
 app/test-pmd/meson.build        |   4 +
 lib/librte_bpf/rte_bpf_ethdev.h |  10 +++
 6 files changed, 211 insertions(+)
 create mode 100644 app/test-pmd/bpf_cmd.c
 create mode 100644 app/test-pmd/bpf_cmd.h

diff --git a/app/test-pmd/Makefile b/app/test-pmd/Makefile
index 60ae9b9c1..a5a827bbd 100644
--- a/app/test-pmd/Makefile
+++ b/app/test-pmd/Makefile
@@ -33,6 +33,7 @@ SRCS-y += txonly.c
 SRCS-y += csumonly.c
 SRCS-y += icmpecho.c
 SRCS-$(CONFIG_RTE_LIBRTE_IEEE1588) += ieee1588fwd.c
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_cmd.c
 
 ifeq ($(CONFIG_RTE_LIBRTE_PMD_SOFTNIC)$(CONFIG_RTE_LIBRTE_SCHED),yy)
 SRCS-y += tm.c
diff --git a/app/test-pmd/bpf_cmd.c b/app/test-pmd/bpf_cmd.c
new file mode 100644
index 000000000..584fad908
--- /dev/null
+++ b/app/test-pmd/bpf_cmd.c
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <rte_mbuf.h>
+#include <rte_ethdev.h>
+#include <rte_flow.h>
+#include <rte_bpf_ethdev.h>
+
+#include <cmdline.h>
+#include <cmdline_parse.h>
+#include <cmdline_parse_num.h>
+#include <cmdline_parse_string.h>
+
+#include "testpmd.h"
+
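+/*
+ * External symbols exposed to the loaded BPF programs:
+ * testpmd's stdout stream and the rte_pktmbuf_dump() function
+ * (used, for example, by the t3.c sample to dump ARP packets).
+ */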
+static const struct rte_bpf_xsym bpf_xsym[] = {
+	{
+		.name = RTE_STR(stdout),
+		.type = RTE_BPF_XTYPE_VAR,
+		.var = &stdout,
+	},
+	{
+		.name = RTE_STR(rte_pktmbuf_dump),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)rte_pktmbuf_dump,
+	},
+};
+
+/* *** load BPF program *** */
+struct cmd_bpf_ld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+	cmdline_fixed_string_t op;
+	cmdline_fixed_string_t flags;
+	cmdline_fixed_string_t prm;
+};
+
+static void
+bpf_parse_flags(const char *str, struct rte_bpf_arg *arg, uint32_t *flags)
+{
+	uint32_t i, v;
+
+	*flags = RTE_BPF_ETH_F_NONE;
+	arg->type = RTE_BPF_ARG_PTR;
+	arg->size = mbuf_data_size;
+
+	for (i = 0; str[i] != 0; i++) {
+		v = toupper(str[i]);
+		if (v == 'J')
+			*flags |= RTE_BPF_ETH_F_JIT;
+		else if (v == 'M') {
+			arg->type = RTE_BPF_ARG_PTR_MBUF;
+			arg->size = sizeof(struct rte_mbuf);
+			arg->buf_size = mbuf_data_size;
+		} else if (v == '-')
+			continue;
+		else
+			printf("unknown flag: \'%c\'\n", v);
+	}
+}
+
+static void cmd_operate_bpf_ld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	int32_t rc;
+	uint32_t flags;
+	struct cmd_bpf_ld_result *res;
+	struct rte_bpf_prm prm;
+	const char *fname, *sname;
+
+	res = parsed_result;
+	memset(&prm, 0, sizeof(prm));
+	prm.xsym = bpf_xsym;
+	prm.nb_xsym = RTE_DIM(bpf_xsym);
+
+	bpf_parse_flags(res->flags, &prm.prog_arg, &flags);
+	fname = res->prm;
+	sname = ".text";
+
+	if (strcmp(res->dir, "rx") == 0) {
+		rc = rte_bpf_eth_rx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else if (strcmp(res->dir, "tx") == 0) {
+		rc = rte_bpf_eth_tx_elf_load(res->port, res->queue, &prm,
+			fname, sname, flags);
+		printf("%d:%s\n", rc, strerror(-rc));
+	} else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_load_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			bpf, "bpf-load");
+cmdline_parse_token_string_t cmd_load_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_load_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_load_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_ld_result, queue, UINT16);
+cmdline_parse_token_string_t cmd_load_bpf_flags =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			flags, NULL);
+cmdline_parse_token_string_t cmd_load_bpf_prm =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_ld_result,
+			prm, NULL);
+
+cmdline_parse_inst_t cmd_operate_bpf_ld_parse = {
+	.f = cmd_operate_bpf_ld_parsed,
+	.data = NULL,
+	.help_str = "bpf-load rx|tx <port> <queue> <J|M|B> <file_name>",
+	.tokens = {
+		(void *)&cmd_load_bpf_start,
+		(void *)&cmd_load_bpf_dir,
+		(void *)&cmd_load_bpf_port,
+		(void *)&cmd_load_bpf_queue,
+		(void *)&cmd_load_bpf_flags,
+		(void *)&cmd_load_bpf_prm,
+		NULL,
+	},
+};
+
+/* *** unload BPF program *** */
+struct cmd_bpf_unld_result {
+	cmdline_fixed_string_t bpf;
+	cmdline_fixed_string_t dir;
+	uint8_t port;
+	uint16_t queue;
+};
+
+static void cmd_operate_bpf_unld_parsed(void *parsed_result,
+				__attribute__((unused)) struct cmdline *cl,
+				__attribute__((unused)) void *data)
+{
+	struct cmd_bpf_unld_result *res;
+
+	res = parsed_result;
+
+	if (strcmp(res->dir, "rx") == 0)
+		rte_bpf_eth_rx_unload(res->port, res->queue);
+	else if (strcmp(res->dir, "tx") == 0)
+		rte_bpf_eth_tx_unload(res->port, res->queue);
+	else
+		printf("invalid value: %s\n", res->dir);
+}
+
+cmdline_parse_token_string_t cmd_unload_bpf_start =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			bpf, "bpf-unload");
+cmdline_parse_token_string_t cmd_unload_bpf_dir =
+	TOKEN_STRING_INITIALIZER(struct cmd_bpf_unld_result,
+			dir, "rx#tx");
+cmdline_parse_token_num_t cmd_unload_bpf_port =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, port, UINT8);
+cmdline_parse_token_num_t cmd_unload_bpf_queue =
+	TOKEN_NUM_INITIALIZER(struct cmd_bpf_unld_result, queue, UINT16);
+
+cmdline_parse_inst_t cmd_operate_bpf_unld_parse = {
+	.f = cmd_operate_bpf_unld_parsed,
+	.data = NULL,
+	.help_str = "bpf-unload rx|tx <port> <queue>",
+	.tokens = {
+		(void *)&cmd_unload_bpf_start,
+		(void *)&cmd_unload_bpf_dir,
+		(void *)&cmd_unload_bpf_port,
+		(void *)&cmd_unload_bpf_queue,
+		NULL,
+	},
+};
diff --git a/app/test-pmd/bpf_cmd.h b/app/test-pmd/bpf_cmd.h
new file mode 100644
index 000000000..5ee4c9f79
--- /dev/null
+++ b/app/test-pmd/bpf_cmd.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#ifndef _BPF_CMD_H_
+#define _BPF_CMD_H_
+
+#ifdef RTE_LIBRTE_BPF
+
+ /* BPF CLI */
+extern cmdline_parse_inst_t cmd_operate_bpf_ld_parse;
+extern cmdline_parse_inst_t cmd_operate_bpf_unld_parse;
+
+#endif /* RTE_LIBRTE_BPF */
+
+#endif /* _BPF_CMD_H_ */
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 961567070..5c4bf4e5b 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -75,6 +75,7 @@
 #include "testpmd.h"
 #include "cmdline_mtr.h"
 #include "cmdline_tm.h"
+#include "bpf_cmd.h"
 
 static struct cmdline *testpmd_cl;
 
@@ -16695,6 +16696,10 @@ cmdline_parse_ctx_t main_ctx[] = {
 	(cmdline_parse_inst_t *)&cmd_resume_port_tm_node,
 	(cmdline_parse_inst_t *)&cmd_port_tm_hierarchy_commit,
 	(cmdline_parse_inst_t *)&cmd_cfg_tunnel_udp_port,
+#ifdef RTE_LIBRTE_BPF
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_ld_parse,
+	(cmdline_parse_inst_t *)&cmd_operate_bpf_unld_parse,
+#endif
 	NULL,
 };
 
diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index b47537642..a51514b03 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -38,3 +38,7 @@ endif
 if dpdk_conf.has('RTE_LIBRTE_DPAA_PMD')
 	deps += ['bus_dpaa', 'mempool_dpaa', 'pmd_dpaa']
 endif
+if dpdk_conf.has('RTE_LIBRTE_BPF')
+	sources += files('bpf_cmd.c')
+	deps += 'bpf'
+endif
diff --git a/lib/librte_bpf/rte_bpf_ethdev.h b/lib/librte_bpf/rte_bpf_ethdev.h
index 4800bbdaa..b4d4d3b16 100644
--- a/lib/librte_bpf/rte_bpf_ethdev.h
+++ b/lib/librte_bpf/rte_bpf_ethdev.h
@@ -69,6 +69,11 @@ void rte_bpf_eth_tx_unload(uint16_t port, uint16_t queue);
 *  Pathname of an ELF file.
  * @param sname
  *  Name of the executable section within the file to load.
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param flags
+ *  Flags that define the expected behavior of the loaded filter
+ *  (i.e. whether to use the JITed or non-JITed version).
  * @return
  *   Zero on successful completion or negative error code otherwise.
  */
@@ -88,6 +93,11 @@ int rte_bpf_eth_rx_elf_load(uint16_t port, uint16_t queue,
 *  Pathname of an ELF file.
  * @param sname
  *  Name of the executable section within the file to load.
+ * @param prm
+ *  Parameters used to create and initialise the BPF execution context.
+ * @param flags
+ *  Flags that define the expected behavior of the loaded filter
+ *  (i.e. whether to use the JITed or non-JITed version).
  * @return
  *   Zero on successful completion or negative error code otherwise.
  */
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 7/9] test: add few eBPF samples
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (7 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 6/9] testpmd: new commands to load/unload " Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 8/9] test: introduce functional test for librte_bpf Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 9/9] doc: add bpf library related info Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Add few simple eBPF programs as an example.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 test/bpf/dummy.c |  20 ++
 test/bpf/mbuf.h  | 578 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 test/bpf/t1.c    |  52 +++++
 test/bpf/t2.c    |  31 +++
 test/bpf/t3.c    |  36 ++++
 5 files changed, 717 insertions(+)
 create mode 100644 test/bpf/dummy.c
 create mode 100644 test/bpf/mbuf.h
 create mode 100644 test/bpf/t1.c
 create mode 100644 test/bpf/t2.c
 create mode 100644 test/bpf/t3.c

diff --git a/test/bpf/dummy.c b/test/bpf/dummy.c
new file mode 100644
index 000000000..5851469e7
--- /dev/null
+++ b/test/bpf/dummy.c
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Does nothing and always returns success (1).
+ * Used to measure BPF infrastructure overhead.
+ * To compile:
+ * clang -O2 -target bpf -c dummy.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+
+uint64_t
+entry(void *arg)
+{
+	return 1;
+}
diff --git a/test/bpf/mbuf.h b/test/bpf/mbuf.h
new file mode 100644
index 000000000..f24f908d7
--- /dev/null
+++ b/test/bpf/mbuf.h
@@ -0,0 +1,578 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2010-2014 Intel Corporation.
+ * Copyright 2014 6WIND S.A.
+ */
+
+/*
+ * Snippet from dpdk.org rte_mbuf.h.
+ * Used to provide BPF programs with information about the rte_mbuf layout.
+ */
+
+#ifndef _MBUF_H_
+#define _MBUF_H_
+
+#include <stdint.h>
+#include <rte_common.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * Packet Offload Features Flags. They also carry packet type information.
+ * Critical resources. Both RX/TX share these bits. Be cautious on any change.
+ *
+ * - RX flags start at bit position zero, and get added to the left of previous
+ *   flags.
+ * - The most-significant 3 bits are reserved for generic mbuf flags
+ * - TX flags therefore start at bit position 60 (i.e. 63-3), and new flags get
+ *   added to the right of the previously defined flags i.e. they should count
+ *   downwards, not upwards.
+ *
+ * Keep these flags synchronized with rte_get_rx_ol_flag_name() and
+ * rte_get_tx_ol_flag_name().
+ */
+
+/**
+ * RX packet is a 802.1q VLAN packet. This flag was set by PMDs when
+ * the packet is recognized as a VLAN, but the behavior between PMDs
+ * was not the same. This flag is kept for some time to avoid breaking
+ * applications and should be replaced by PKT_RX_VLAN_STRIPPED.
+ */
+#define PKT_RX_VLAN_PKT      (1ULL << 0)
+
+#define PKT_RX_RSS_HASH      (1ULL << 1)
+/**< RX packet with RSS hash result. */
+#define PKT_RX_FDIR          (1ULL << 2)
+/**< RX packet with FDIR match indicate. */
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_L4_CKSUM_MASK.
+ * This flag was set when the L4 checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_L4_CKSUM_BAD  (1ULL << 3)
+
+/**
+ * Deprecated.
+ * Checking this flag alone is deprecated: check the 2 bits of
+ * PKT_RX_IP_CKSUM_MASK.
+ * This flag was set when the IP checksum of a packet was detected as
+ * wrong by the hardware.
+ */
+#define PKT_RX_IP_CKSUM_BAD  (1ULL << 4)
+
+#define PKT_RX_EIP_CKSUM_BAD (1ULL << 5)
+/**< External IP header checksum error. */
+
+/**
+ * A vlan has been stripped by the hardware and its tci is saved in
+ * mbuf->vlan_tci. This can only happen if vlan stripping is enabled
+ * in the RX configuration of the PMD.
+ */
+#define PKT_RX_VLAN_STRIPPED (1ULL << 6)
+
+/**
+ * Mask of bits used to determine the status of RX IP checksum.
+ * - PKT_RX_IP_CKSUM_UNKNOWN: no information about the RX IP checksum
+ * - PKT_RX_IP_CKSUM_BAD: the IP checksum in the packet is wrong
+ * - PKT_RX_IP_CKSUM_GOOD: the IP checksum in the packet is valid
+ * - PKT_RX_IP_CKSUM_NONE: the IP checksum is not correct in the packet
+ *   data, but the integrity of the IP header is verified.
+ */
+#define PKT_RX_IP_CKSUM_MASK ((1ULL << 4) | (1ULL << 7))
+
+#define PKT_RX_IP_CKSUM_UNKNOWN 0
+#define PKT_RX_IP_CKSUM_BAD     (1ULL << 4)
+#define PKT_RX_IP_CKSUM_GOOD    (1ULL << 7)
+#define PKT_RX_IP_CKSUM_NONE    ((1ULL << 4) | (1ULL << 7))
+
+/**
+ * Mask of bits used to determine the status of RX L4 checksum.
+ * - PKT_RX_L4_CKSUM_UNKNOWN: no information about the RX L4 checksum
+ * - PKT_RX_L4_CKSUM_BAD: the L4 checksum in the packet is wrong
+ * - PKT_RX_L4_CKSUM_GOOD: the L4 checksum in the packet is valid
+ * - PKT_RX_L4_CKSUM_NONE: the L4 checksum is not correct in the packet
+ *   data, but the integrity of the L4 data is verified.
+ */
+#define PKT_RX_L4_CKSUM_MASK ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_L4_CKSUM_UNKNOWN 0
+#define PKT_RX_L4_CKSUM_BAD     (1ULL << 3)
+#define PKT_RX_L4_CKSUM_GOOD    (1ULL << 8)
+#define PKT_RX_L4_CKSUM_NONE    ((1ULL << 3) | (1ULL << 8))
+
+#define PKT_RX_IEEE1588_PTP  (1ULL << 9)
+/**< RX IEEE1588 L2 Ethernet PT Packet. */
+#define PKT_RX_IEEE1588_TMST (1ULL << 10)
+/**< RX IEEE1588 L2/L4 timestamped packet.*/
+#define PKT_RX_FDIR_ID       (1ULL << 13)
+/**< FD id reported if FDIR match. */
+#define PKT_RX_FDIR_FLX      (1ULL << 14)
+/**< Flexible bytes reported if FDIR match. */
+
+/**
+ * The 2 vlans have been stripped by the hardware and their tci are
+ * saved in mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * This can only happen if vlan stripping is enabled in the RX
+ * configuration of the PMD. If this flag is set, PKT_RX_VLAN_STRIPPED
+ * must also be set.
+ */
+#define PKT_RX_QINQ_STRIPPED (1ULL << 15)
+
+/**
+ * Deprecated.
+ * RX packet with double VLAN stripped.
+ * This flag is replaced by PKT_RX_QINQ_STRIPPED.
+ */
+#define PKT_RX_QINQ_PKT      PKT_RX_QINQ_STRIPPED
+
+/**
+ * When packets are coalesced by a hardware or virtual driver, this flag
+ * can be set in the RX mbuf, meaning that the m->tso_segsz field is
+ * valid and is set to the segment size of original packets.
+ */
+#define PKT_RX_LRO           (1ULL << 16)
+
+/**
+ * Indicate that the timestamp field in the mbuf is valid.
+ */
+#define PKT_RX_TIMESTAMP     (1ULL << 17)
+
+/* add new RX flags here */
+
+/* add new TX flags here */
+
+/**
+ * Offload the MACsec. This flag must be set by the application to enable
+ * this offload feature for a packet to be transmitted.
+ */
+#define PKT_TX_MACSEC        (1ULL << 44)
+
+/**
+ * Bits 45:48 used for the tunnel type.
+ * When doing Tx offload like TSO or checksum, the HW needs to configure the
+ * tunnel type into the HW descriptors.
+ */
+#define PKT_TX_TUNNEL_VXLAN   (0x1ULL << 45)
+#define PKT_TX_TUNNEL_GRE     (0x2ULL << 45)
+#define PKT_TX_TUNNEL_IPIP    (0x3ULL << 45)
+#define PKT_TX_TUNNEL_GENEVE  (0x4ULL << 45)
+/**< TX packet with MPLS-in-UDP RFC 7510 header. */
+#define PKT_TX_TUNNEL_MPLSINUDP (0x5ULL << 45)
+/* add new TX TUNNEL type here */
+#define PKT_TX_TUNNEL_MASK    (0xFULL << 45)
+
+/**
+ * Second VLAN insertion (QinQ) flag.
+ */
+#define PKT_TX_QINQ_PKT    (1ULL << 49)
+/**< TX packet with double VLAN inserted. */
+
+/**
+ * TCP segmentation offload. To enable this offload feature for a
+ * packet to be transmitted on hardware supporting TSO:
+ *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag implies
+ *    PKT_TX_TCP_CKSUM)
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - if it's IPv4, set the PKT_TX_IP_CKSUM flag and write the IP checksum
+ *    to 0 in the packet
+ *  - fill the mbuf offload information: l2_len, l3_len, l4_len, tso_segsz
+ *  - calculate the pseudo header checksum without taking ip_len in account,
+ *    and set it in the TCP header. Refer to rte_ipv4_phdr_cksum() and
+ *    rte_ipv6_phdr_cksum() that can be used as helpers.
+ */
+#define PKT_TX_TCP_SEG       (1ULL << 50)
+
+#define PKT_TX_IEEE1588_TMST (1ULL << 51)
+/**< TX IEEE1588 packet to timestamp. */
+
+/**
+ * Bits 52+53 used for L4 packet type with checksum enabled: 00: Reserved,
+ * 01: TCP checksum, 10: SCTP checksum, 11: UDP checksum. To use hardware
+ * L4 checksum offload, the user needs to:
+ *  - fill l2_len and l3_len in mbuf
+ *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or PKT_TX_UDP_CKSUM
+ *  - set the flag PKT_TX_IPV4 or PKT_TX_IPV6
+ *  - calculate the pseudo header checksum and set it in the L4 header (only
+ *    for TCP or UDP). See rte_ipv4_phdr_cksum() and rte_ipv6_phdr_cksum().
+ *    For SCTP, set the crc field to 0.
+ */
+#define PKT_TX_L4_NO_CKSUM   (0ULL << 52)
+/**< Disable L4 cksum of TX pkt. */
+#define PKT_TX_TCP_CKSUM     (1ULL << 52)
+/**< TCP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_SCTP_CKSUM    (2ULL << 52)
+/**< SCTP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_UDP_CKSUM     (3ULL << 52)
+/**< UDP cksum of TX pkt. computed by NIC. */
+#define PKT_TX_L4_MASK       (3ULL << 52)
+/**< Mask for L4 cksum offload request. */
+
+/**
+ * Offload the IP checksum in the hardware. The flag PKT_TX_IPV4 should
+ * also be set by the application, although a PMD will only check
+ * PKT_TX_IP_CKSUM.
+ *  - set the IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: l2_len, l3_len
+ */
+#define PKT_TX_IP_CKSUM      (1ULL << 54)
+
+/**
+ * Packet is IPv4. This flag must be set when using any offload feature
+ * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV4          (1ULL << 55)
+
+/**
+ * Packet is IPv6. This flag must be set when using an offload feature
+ * (TSO or L4 checksum) to tell the NIC that the packet is an IPv6
+ * packet. If the packet is a tunneled packet, this flag is related to
+ * the inner headers.
+ */
+#define PKT_TX_IPV6          (1ULL << 56)
+
+#define PKT_TX_VLAN_PKT      (1ULL << 57)
+/**< TX packet is a 802.1q VLAN packet. */
+
+/**
+ * Offload the IP checksum of an external header in the hardware. The
+ * flag PKT_TX_OUTER_IPV4 should also be set by the application, although
+ * a PMD will only check PKT_TX_IP_CKSUM.  The IP checksum field in the
+ * packet must be set to 0.
+ *  - set the outer IP checksum field in the packet to 0
+ *  - fill the mbuf offload information: outer_l2_len, outer_l3_len
+ */
+#define PKT_TX_OUTER_IP_CKSUM   (1ULL << 58)
+
+/**
+ * Packet outer header is IPv4. This flag must be set when using any
+ * outer offload feature (L3 or L4 checksum) to tell the NIC that the
+ * outer header of the tunneled packet is an IPv4 packet.
+ */
+#define PKT_TX_OUTER_IPV4   (1ULL << 59)
+
+/**
+ * Packet outer header is IPv6. This flag must be set when using any
+ * outer offload feature (L4 checksum) to tell the NIC that the outer
+ * header of the tunneled packet is an IPv6 packet.
+ */
+#define PKT_TX_OUTER_IPV6    (1ULL << 60)
+
+/**
+ * Bitmask of all supported packet Tx offload features flags,
+ * which can be set for packet.
+ */
+#define PKT_TX_OFFLOAD_MASK (    \
+		PKT_TX_IP_CKSUM |        \
+		PKT_TX_L4_MASK |         \
+		PKT_TX_OUTER_IP_CKSUM |  \
+		PKT_TX_TCP_SEG |         \
+		PKT_TX_IEEE1588_TMST |	 \
+		PKT_TX_QINQ_PKT |        \
+		PKT_TX_VLAN_PKT |        \
+		PKT_TX_TUNNEL_MASK |	 \
+		PKT_TX_MACSEC)
+
+#define __RESERVED           (1ULL << 61) /**< reserved for future mbuf use */
+
+#define IND_ATTACHED_MBUF    (1ULL << 62) /**< Indirect attached mbuf */
+
+/* Use final bit of flags to indicate a control mbuf */
+#define CTRL_MBUF_FLAG       (1ULL << 63) /**< Mbuf contains control data */
+
+/** Alignment constraint of mbuf private area. */
+#define RTE_MBUF_PRIV_ALIGN 8
+
+/**
+ * Get the name of a RX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid RX flag.
+ */
+const char *rte_get_rx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of RX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the RX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_rx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Get the name of a TX offload flag
+ *
+ * @param mask
+ *   The mask describing the flag. Usually only one bit must be set.
+ *   Several bits can be given if they belong to the same mask.
+ *   Ex: PKT_TX_L4_MASK.
+ * @return
+ *   The name of this flag, or NULL if it's not a valid TX flag.
+ */
+const char *rte_get_tx_ol_flag_name(uint64_t mask);
+
+/**
+ * Dump the list of TX offload flags in a buffer
+ *
+ * @param mask
+ *   The mask describing the TX flags.
+ * @param buf
+ *   The output buffer.
+ * @param buflen
+ *   The length of the buffer.
+ * @return
+ *   0 on success, (-1) on error.
+ */
+int rte_get_tx_ol_flag_list(uint64_t mask, char *buf, size_t buflen);
+
+/**
+ * Some NICs need at least 2KB buffer to RX standard Ethernet frame without
+ * splitting it into multiple segments.
+ * So, for mbufs that are planned to be involved in RX/TX, the recommended
+ * minimal buffer length is 2KB + RTE_PKTMBUF_HEADROOM.
+ */
+#define	RTE_MBUF_DEFAULT_DATAROOM	2048
+#define	RTE_MBUF_DEFAULT_BUF_SIZE	\
+	(RTE_MBUF_DEFAULT_DATAROOM + RTE_PKTMBUF_HEADROOM)
+
+/* define a set of marker types that can be used to refer to set points in the
+ * mbuf.
+ */
+__extension__
+typedef void    *MARKER[0];   /**< generic marker for a point in a structure */
+__extension__
+typedef uint8_t  MARKER8[0];  /**< generic marker with 1B alignment */
+__extension__
+typedef uint64_t MARKER64[0];
+/**< marker that allows us to overwrite 8 bytes with a single assignment */
+
+typedef struct {
+	volatile int16_t cnt; /**< An internal counter value. */
+} rte_atomic16_t;
+
+/**
+ * The generic rte_mbuf, containing a packet mbuf.
+ */
+struct rte_mbuf {
+	MARKER cacheline0;
+
+	void *buf_addr;           /**< Virtual address of segment buffer. */
+	/**
+	 * Physical address of segment buffer.
+	 * Force alignment to 8-bytes, so as to ensure we have the exact
+	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+	 * working on vector drivers easier.
+	 */
+	phys_addr_t buf_physaddr __rte_aligned(sizeof(phys_addr_t));
+
+	/* next 8 bytes are initialised on RX descriptor rearm */
+	MARKER64 rearm_data;
+	uint16_t data_off;
+
+	/**
+	 * Reference counter. Its size should at least equal to the size
+	 * of port field (16 bits), to support zero-copy broadcast.
+	 * It should only be accessed using the following functions:
+	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+	 * or non-atomic) is controlled by the CONFIG_RTE_MBUF_REFCNT_ATOMIC
+	 * config option.
+	 */
+	RTE_STD_C11
+	union {
+		rte_atomic16_t refcnt_atomic; /**< Atomically accessed refcnt */
+		uint16_t refcnt;
+		/**< Non-atomically accessed refcnt */
+	};
+	uint16_t nb_segs;         /**< Number of segments. */
+
+	/** Input port (16 bits to support more than 256 virtual ports). */
+	uint16_t port;
+
+	uint64_t ol_flags;        /**< Offload features. */
+
+	/* remaining bytes are set on RX when pulling packet from descriptor */
+	MARKER rx_descriptor_fields1;
+
+	/*
+	 * The packet type, which is the combination of outer/inner L2, L3, L4
+	 * and tunnel types. The packet_type is about data really present in the
+	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+	 * vlan is stripped from the data.
+	 */
+	RTE_STD_C11
+	union {
+		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		struct {
+			uint32_t l2_type:4; /**< (Outer) L2 type. */
+			uint32_t l3_type:4; /**< (Outer) L3 type. */
+			uint32_t l4_type:4; /**< (Outer) L4 type. */
+			uint32_t tun_type:4; /**< Tunnel type. */
+			uint32_t inner_l2_type:4; /**< Inner L2 type. */
+			uint32_t inner_l3_type:4; /**< Inner L3 type. */
+			uint32_t inner_l4_type:4; /**< Inner L4 type. */
+		};
+	};
+
+	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
+	uint16_t data_len;        /**< Amount of data in segment buffer. */
+	/** VLAN TCI (CPU order), valid if PKT_RX_VLAN_STRIPPED is set. */
+	uint16_t vlan_tci;
+
+	union {
+		uint32_t rss;     /**< RSS hash result if RSS enabled */
+		struct {
+			RTE_STD_C11
+			union {
+				struct {
+					uint16_t hash;
+					uint16_t id;
+				};
+				uint32_t lo;
+				/**< Second 4 flexible bytes */
+			};
+			uint32_t hi;
+			/**< First 4 flexible bytes or FD ID, dependent on
+			 *   PKT_RX_FDIR_* flag in ol_flags.
+			 */
+		} fdir;           /**< Filter identifier if FDIR enabled */
+		struct {
+			uint32_t lo;
+			uint32_t hi;
+		} sched;          /**< Hierarchical scheduler */
+		uint32_t usr;
+		/**< User defined tags. See rte_distributor_process() */
+	} hash;                   /**< hash information */
+
+	/** Outer VLAN TCI (CPU order), valid if PKT_RX_QINQ_STRIPPED is set. */
+	uint16_t vlan_tci_outer;
+
+	uint16_t buf_len;         /**< Length of segment buffer. */
+
+	/** Valid if PKT_RX_TIMESTAMP is set. The unit and time reference
+	 * are not normalized but are always the same for a given port.
+	 */
+	uint64_t timestamp;
+
+	/* second cache line - fields only used in slow path or on TX */
+	MARKER cacheline1 __rte_cache_min_aligned;
+
+	RTE_STD_C11
+	union {
+		void *userdata;   /**< Can be used for external metadata */
+		uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
+	};
+
+	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+	struct rte_mbuf *next;    /**< Next segment of scattered packet. */
+
+	/* fields to support TX offloads */
+	RTE_STD_C11
+	union {
+		uint64_t tx_offload;       /**< combined for easy fetch */
+		__extension__
+		struct {
+			uint64_t l2_len:7;
+			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+			 */
+			uint64_t l3_len:9; /**< L3 (IP) Header Length. */
+			uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
+			uint64_t tso_segsz:16; /**< TCP TSO segment size */
+
+			/* fields for TX offloading of tunnels */
+			uint64_t outer_l3_len:9;
+			/**< Outer L3 (IP) Hdr Length. */
+			uint64_t outer_l2_len:7;
+			/**< Outer L2 (MAC) Hdr Length. */
+
+			/* uint64_t unused:8; */
+		};
+	};
+
+	/** Size of the application private data. In case of an indirect
+	 * mbuf, it stores the direct mbuf private data size.
+	 */
+	uint16_t priv_size;
+
+	/** Timesync flags for use with IEEE1588. */
+	uint16_t timesync;
+
+	/** Sequence number. See also rte_reorder_insert(). */
+	uint32_t seqn;
+
+} __rte_cache_aligned;
+
+
+/**
+ * Returns TRUE if given mbuf is indirect, or FALSE otherwise.
+ */
+#define RTE_MBUF_INDIRECT(mb)   ((mb)->ol_flags & IND_ATTACHED_MBUF)
+
+/**
+ * Returns TRUE if given mbuf is direct, or FALSE otherwise.
+ */
+#define RTE_MBUF_DIRECT(mb)     (!RTE_MBUF_INDIRECT(mb))
+
+/**
+ * Private data in case of pktmbuf pool.
+ *
+ * A structure that contains some pktmbuf_pool-specific data that are
+ * appended after the mempool structure (in private data).
+ */
+struct rte_pktmbuf_pool_private {
+	uint16_t mbuf_data_room_size; /**< Size of data space in each mbuf. */
+	uint16_t mbuf_priv_size;      /**< Size of private area in each mbuf. */
+};
+
+/**
+ * A macro that points to an offset into the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param o
+ *   The offset into the mbuf data.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod_offset(m, t, o)	\
+	((t)((char *)(m)->buf_addr + (m)->data_off + (o)))
+
+/**
+ * A macro that points to the start of the data in the mbuf.
+ *
+ * The returned pointer is cast to type t. Before using this
+ * function, the user must ensure that the first segment is large
+ * enough to accommodate its data.
+ *
+ * @param m
+ *   The packet mbuf.
+ * @param t
+ *   The type to cast the result into.
+ */
+#define rte_pktmbuf_mtod(m, t) rte_pktmbuf_mtod_offset(m, t, 0)
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _MBUF_H_ */
diff --git a/test/bpf/t1.c b/test/bpf/t1.c
new file mode 100644
index 000000000..60f9434ab
--- /dev/null
+++ b/test/bpf/t1.c
@@ -0,0 +1,52 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to first segment packet data as an input parameter.
+ * analog of tcpdump -s 1 -d 'dst 1.2.3.4 && udp && dst port 5000'
+ * (000) ldh      [12]
+ * (001) jeq      #0x800           jt 2    jf 12
+ * (002) ld       [30]
+ * (003) jeq      #0x1020304       jt 4    jf 12
+ * (004) ldb      [23]
+ * (005) jeq      #0x11            jt 6    jf 12
+ * (006) ldh      [20]
+ * (007) jset     #0x1fff          jt 12   jf 8
+ * (008) ldxb     4*([14]&0xf)
+ * (009) ldh      [x + 16]
+ * (010) jeq      #0x1388          jt 11   jf 12
+ * (011) ret      #1
+ * (012) ret      #0
+ *
+ * To compile:
+ * clang -O2 -target bpf -c t1.c
+ */
+
+#include <stdint.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/udp.h>
+
+uint64_t
+entry(void *pkt)
+{
+	struct ether_header *ether_header = (void *)pkt;
+
+	if (ether_header->ether_type != __builtin_bswap16(0x0800))
+		return 0;
+
+	struct iphdr *iphdr = (void *)(ether_header + 1);
+	if (iphdr->protocol != 17 || (iphdr->frag_off & 0x1ffff) != 0 ||
+			iphdr->daddr != __builtin_bswap32(0x1020304))
+		return 0;
+
+	int hlen = iphdr->ihl * 4;
+	struct udphdr *udphdr = (void *)iphdr + hlen;
+
+	if (udphdr->dest !=  __builtin_bswap16(5000))
+		return 0;
+
+	return 1;
+}
diff --git a/test/bpf/t2.c b/test/bpf/t2.c
new file mode 100644
index 000000000..69d7a4fe1
--- /dev/null
+++ b/test/bpf/t2.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * cleanup mbuf's vlan_tci and all related RX flags
+ * (PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED).
+ * Doesn't touch contents of packet data.
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t2.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+uint64_t
+entry(void *pkt)
+{
+	struct rte_mbuf *mb;
+
+	mb = pkt;
+	mb->vlan_tci = 0;
+	mb->ol_flags &= ~(PKT_RX_VLAN_PKT | PKT_RX_VLAN_STRIPPED);
+
+	return 1;
+}
diff --git a/test/bpf/t3.c b/test/bpf/t3.c
new file mode 100644
index 000000000..531b9cb8c
--- /dev/null
+++ b/test/bpf/t3.c
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+/*
+ * eBPF program sample.
+ * Accepts pointer to struct rte_mbuf as an input parameter.
+ * Dump the mbuf into stdout if it is an ARP packet (aka tcpdump 'arp').
+ * To compile:
+ * clang -O2 -I${RTE_SDK}/${RTE_TARGET}/include \
+ * -target bpf -Wno-int-to-void-pointer-cast -c t3.c
+ */
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <net/ethernet.h>
+#include <rte_config.h>
+#include "mbuf.h"
+
+extern void rte_pktmbuf_dump(FILE *, const struct rte_mbuf *, unsigned int);
+
+uint64_t
+entry(const void *pkt)
+{
+	const struct rte_mbuf *mb;
+	const struct ether_header *eth;
+
+	mb = pkt;
+	eth = rte_pktmbuf_mtod(mb, const struct ether_header *);
+
+	if (eth->ether_type == __builtin_bswap16(ETHERTYPE_ARP))
+		rte_pktmbuf_dump(stdout, mb, 64);
+
+	return 1;
+}
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 8/9] test: introduce functional test for librte_bpf
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (8 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 7/9] test: add few eBPF samples Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 9/9] doc: add bpf library related info Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 test/test/Makefile    |    2 +
 test/test/meson.build |    2 +
 test/test/test_bpf.c  | 1759 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1763 insertions(+)
 create mode 100644 test/test/test_bpf.c

diff --git a/test/test/Makefile b/test/test/Makefile
index 2630ab484..9a08e9af6 100644
--- a/test/test/Makefile
+++ b/test/test/Makefile
@@ -193,6 +193,8 @@ endif
 
 SRCS-$(CONFIG_RTE_LIBRTE_KVARGS) += test_kvargs.c
 
+SRCS-$(CONFIG_RTE_LIBRTE_BPF) += test_bpf.c
+
 CFLAGS += -DALLOW_EXPERIMENTAL_API
 
 CFLAGS += -O3
diff --git a/test/test/meson.build b/test/test/meson.build
index ad0a65080..91d0408af 100644
--- a/test/test/meson.build
+++ b/test/test/meson.build
@@ -8,6 +8,7 @@ test_sources = files('commands.c',
 	'test_alarm.c',
 	'test_atomic.c',
 	'test_barrier.c',
+	'test_bpf.c',
 	'test_byteorder.c',
 	'test_cmdline.c',
 	'test_cmdline_cirbuf.c',
@@ -97,6 +98,7 @@ test_sources = files('commands.c',
 )
 
 test_deps = ['acl',
+	'bpf',
 	'cfgfile',
 	'cmdline',
 	'cryptodev',
diff --git a/test/test/test_bpf.c b/test/test/test_bpf.c
new file mode 100644
index 000000000..cbd6be63d
--- /dev/null
+++ b/test/test/test_bpf.c
@@ -0,0 +1,1759 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_memory.h>
+#include <rte_debug.h>
+#include <rte_hexdump.h>
+#include <rte_random.h>
+#include <rte_byteorder.h>
+#include <rte_errno.h>
+#include <rte_bpf.h>
+
+#include "test.h"
+
+/*
+ * Basic functional tests for librte_bpf.
+ * The main procedure - load eBPF program, execute it and
+ * compare results with expected values.
+ */
+
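+/* input/output data layouts shared by the test programs below */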
+struct dummy_offset {
+	uint64_t u64;
+	uint32_t u32;
+	uint16_t u16;
+	uint8_t  u8;
+};
+
+struct dummy_vect8 {
+	struct dummy_offset in[8];
+	struct dummy_offset out[8];
+};
+
+#define	TEST_FILL_1	0xDEADBEEF
+
+#define	TEST_MUL_1	21
+#define TEST_MUL_2	-100
+
+#define TEST_SHIFT_1	15
+#define TEST_SHIFT_2	33
+
+#define TEST_JCC_1	0
+#define TEST_JCC_2	-123
+#define TEST_JCC_3	5678
+#define TEST_JCC_4	TEST_FILL_1
+
+struct bpf_test {
+	const char *name;
+	size_t arg_sz;
+	struct rte_bpf_prm prm;
+	void (*prepare)(void *);
+	int (*check_result)(uint64_t, const void *);
+	uint32_t allow_fail;
+};
+
+/*
+ * Compare return value and result data with expected ones.
+ * Report a failure if they don't match.
+ */
+static int
+cmp_res(const char *func, uint64_t exp_rc, uint64_t ret_rc,
+	const void *exp_res, const void *ret_res, size_t res_sz)
+{
+	int32_t ret;
+
+	ret = 0;
+	if (exp_rc != ret_rc) {
+		printf("%s@%d: invalid return value, expected: 0x%" PRIx64
+			",result: 0x%" PRIx64 "\n",
+			func, __LINE__, exp_rc, ret_rc);
+		ret |= -1;
+	}
+
+	if (memcmp(exp_res, ret_res, res_sz) != 0) {
+		printf("%s: invalid value\n", func);
+		rte_memdump(stdout, "expected", exp_res, res_sz);
+		rte_memdump(stdout, "result", ret_res, res_sz);
+		ret |= -1;
+	}
+
+	return ret;
+}
+
+/* store immediate test-cases */
+static const struct ebpf_insn test_store1_prog[] = {
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ST | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+		.imm = TEST_FILL_1,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_store1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+	memset(df, 0, sizeof(*df));
+}
+
+static int
+test_store1_check(uint64_t rc, const void *arg)
+{
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+
+	memset(&dfe, 0, sizeof(dfe));
+	dfe.u64 = (int32_t)TEST_FILL_1;
+	dfe.u32 = dfe.u64;
+	dfe.u16 = dfe.u64;
+	dfe.u8 = dfe.u64;
+
+	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
+}
+
+/* store register test-cases */
+static const struct ebpf_insn test_store2_prog[] = {
+
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+/* load test-cases */
+static const struct ebpf_insn test_load1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_B),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u8),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return sum */
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_load1_prepare(void *arg)
+{
+	struct dummy_offset *df;
+
+	df = arg;
+
+	memset(df, 0, sizeof(*df));
+	df->u64 = (int32_t)TEST_FILL_1;
+	df->u32 = df->u64;
+	df->u16 = df->u64;
+	df->u8 = df->u64;
+}
+
+static int
+test_load1_check(uint64_t rc, const void *arg)
+{
+	uint64_t v;
+	const struct dummy_offset *dft;
+
+	dft = arg;
+	v = dft->u64;
+	v += dft->u32;
+	v += dft->u16;
+	v += dft->u8;
+
+	return cmp_res(__func__, v, rc, dft, dft, sizeof(*dft));
+}
+
+/* alu mul test-cases */
+static const struct ebpf_insn test_mul1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_MUL | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_MUL | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_MUL | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_mul1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = -v;
+}
+
+static int
+test_mul1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 * TEST_MUL_1;
+	r3 *= TEST_MUL_2;
+	r4 = (uint32_t)(r4 * r2);
+	r4 *= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* alu shift test-cases */
+static const struct ebpf_insn test_shift1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_LSH | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_SHIFT_1,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_SHIFT_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_RSH | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_LSH | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint64_t) * CHAR_BIT - 1,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_ARSH | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint32_t) * CHAR_BIT - 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_LSH | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_shift1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v;
+
+	dv = arg;
+
+	v = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u32 = v;
+	dv->in[1].u64 = v << 12 | v >> 6;
+	dv->in[2].u32 = (-v ^ 5);
+}
+
+static int
+test_shift1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 << TEST_SHIFT_1;
+	r3 = (int64_t)r3 >> TEST_SHIFT_2;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+
+	r2 = (uint32_t)r2 >> r4;
+	r3 <<= r4;
+
+	dve.out[2].u64 = r2;
+	dve.out[3].u64 = r3;
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 &= sizeof(uint64_t) * CHAR_BIT - 1;
+	r3 = (int64_t)r3 >> r2;
+	r2 &= sizeof(uint32_t) * CHAR_BIT - 1;
+	r4 = (uint32_t)r4 << r2;
+
+	dve.out[4].u64 = r4;
+	dve.out[5].u64 = r3;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* jmp test-cases */
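+/*
+ * Each taken conditional branch below jumps forward to an instruction that
+ * ORs a distinct flag bit into R0 and then jumps back to the next
+ * comparison, so the program returns a bitmask of the conditions that held.
+ */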
+static const struct ebpf_insn test_jump1_prog[] = {
+
+	[0] = {
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0,
+	},
+	[1] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	[2] = {
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	[3] = {
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u32),
+	},
+	[4] = {
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_5,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	[5] = {
+		.code = (BPF_JMP | BPF_JEQ | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_JCC_1,
+		.off = 8,
+	},
+	[6] = {
+		.code = (BPF_JMP | EBPF_JSLE | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_JCC_2,
+		.off = 9,
+	},
+	[7] = {
+		.code = (BPF_JMP | BPF_JGT | BPF_K),
+		.dst_reg = EBPF_REG_4,
+		.imm = TEST_JCC_3,
+		.off = 10,
+	},
+	[8] = {
+		.code = (BPF_JMP | BPF_JSET | BPF_K),
+		.dst_reg = EBPF_REG_5,
+		.imm = TEST_JCC_4,
+		.off = 11,
+	},
+	[9] = {
+		.code = (BPF_JMP | EBPF_JNE | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_3,
+		.off = 12,
+	},
+	[10] = {
+		.code = (BPF_JMP | EBPF_JSGT | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_4,
+		.off = 13,
+	},
+	[11] = {
+		.code = (BPF_JMP | EBPF_JLE | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_5,
+		.off = 14,
+	},
+	[12] = {
+		.code = (BPF_JMP | BPF_JSET | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_5,
+		.off = 15,
+	},
+	[13] = {
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+	[14] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x1,
+	},
+	[15] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -10,
+	},
+	[16] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x2,
+	},
+	[17] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -11,
+	},
+	[18] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x4,
+	},
+	[19] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -12,
+	},
+	[20] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x8,
+	},
+	[21] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -13,
+	},
+	[22] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x10,
+	},
+	[23] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -14,
+	},
+	[24] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x20,
+	},
+	[25] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -15,
+	},
+	[26] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x40,
+	},
+	[27] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -16,
+	},
+	[28] = {
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 0x80,
+	},
+	[29] = {
+		.code = (BPF_JMP | BPF_JA),
+		.off = -17,
+	},
+};
+
+static void
+test_jump1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+	uint64_t v1, v2;
+
+	dv = arg;
+
+	v1 = rte_rand();
+	v2 = rte_rand();
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u64 = v1;
+	dv->in[1].u64 = v2;
+	dv->in[0].u32 = (v1 << 12) + (v2 >> 6);
+	dv->in[1].u32 = (v2 << 12) - (v1 >> 6);
+}
+
+static int
+test_jump1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4, r5, rv;
+	const struct dummy_vect8 *dvt;
+
+	dvt = arg;
+
+	rv = 0;
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[0].u64;
+	r4 = dvt->in[1].u32;
+	r5 = dvt->in[1].u64;
+
+	if (r2 == TEST_JCC_1)
+		rv |= 0x1;
+	if ((int64_t)r3 <= TEST_JCC_2)
+		rv |= 0x2;
+	if (r4 > TEST_JCC_3)
+		rv |= 0x4;
+	if (r5 & TEST_JCC_4)
+		rv |= 0x8;
+	if (r2 != r3)
+		rv |= 0x10;
+	if ((int64_t)r2 > (int64_t)r4)
+		rv |= 0x20;
+	if (r2 <= r5)
+		rv |= 0x40;
+	if (r3 & r5)
+		rv |= 0x80;
+
+	return cmp_res(__func__, rv, rc, &rv, &rc, sizeof(rv));
+}
+
+/* alu (add, sub, and, or, xor, neg)  test-cases */
+static const struct ebpf_insn test_alu1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_5,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_AND | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_ALU | BPF_XOR | BPF_K),
+		.dst_reg = EBPF_REG_4,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_K),
+		.dst_reg = EBPF_REG_5,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_ALU | BPF_OR | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_XOR | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_4,
+	},
+	{
+		.code = (BPF_ALU | BPF_SUB | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_5,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_AND | BPF_X),
+		.dst_reg = EBPF_REG_5,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[6].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_vect8, out[7].u64),
+	},
+	/* return (-r2 + (-r3)) */
+	{
+		.code = (BPF_ALU | BPF_NEG),
+		.dst_reg = EBPF_REG_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_NEG),
+		.dst_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static int
+test_alu1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4, r5, rv;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[0].u64;
+	r4 = dvt->in[1].u32;
+	r5 = dvt->in[1].u64;
+
+	r2 = (uint32_t)r2 & TEST_FILL_1;
+	r3 |= (int32_t) TEST_FILL_1;
+	r4 = (uint32_t)r4 ^ TEST_FILL_1;
+	r5 += (int32_t)TEST_FILL_1;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+	dve.out[3].u64 = r5;
+
+	r2 = (uint32_t)r2 | (uint32_t)r3;
+	r3 ^= r4;
+	r4 = (uint32_t)r4 - (uint32_t)r5;
+	r5 &= r2;
+
+	dve.out[4].u64 = r2;
+	dve.out[5].u64 = r3;
+	dve.out[6].u64 = r4;
+	dve.out[7].u64 = r5;
+
+	r2 = -(int32_t)r2;
+	rv = (uint32_t)r2;
+	r3 = -r3;
+	rv += r3;
+
+	return cmp_res(__func__, rv, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* endianness conversions (BE->LE/LE->BE)  test-cases */
+static const struct ebpf_insn test_bele1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_BE),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint16_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_BE),
+		.dst_reg = EBPF_REG_3,
+		.imm = sizeof(uint32_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_BE),
+		.dst_reg = EBPF_REG_4,
+		.imm = sizeof(uint64_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_H),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u16),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u64),
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_LE),
+		.dst_reg = EBPF_REG_2,
+		.imm = sizeof(uint16_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_LE),
+		.dst_reg = EBPF_REG_3,
+		.imm = sizeof(uint32_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_ALU | EBPF_END | EBPF_TO_LE),
+		.dst_reg = EBPF_REG_4,
+		.imm = sizeof(uint64_t) * CHAR_BIT,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[3].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[4].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[5].u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+test_bele1_prepare(void *arg)
+{
+	struct dummy_vect8 *dv;
+
+	dv = arg;
+
+	memset(dv, 0, sizeof(*dv));
+	dv->in[0].u64 = rte_rand();
+	dv->in[0].u32 = dv->in[0].u64;
+	dv->in[0].u16 = dv->in[0].u64;
+}
+
+static int
+test_bele1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u16;
+	r3 = dvt->in[0].u32;
+	r4 = dvt->in[0].u64;
+
+	r2 =  rte_cpu_to_be_16(r2);
+	r3 =  rte_cpu_to_be_32(r3);
+	r4 =  rte_cpu_to_be_64(r4);
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	r2 = dvt->in[0].u16;
+	r3 = dvt->in[0].u32;
+	r4 = dvt->in[0].u64;
+
+	r2 =  rte_cpu_to_le_16(r2);
+	r3 =  rte_cpu_to_le_32(r3);
+	r4 =  rte_cpu_to_le_64(r4);
+
+	dve.out[3].u64 = r2;
+	dve.out[4].u64 = r3;
+	dve.out[5].u64 = r4;
+
+	return cmp_res(__func__, 1, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* atomic add test-cases */
+static const struct ebpf_insn test_xadd1_prog[] = {
+
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = -1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_4,
+		.imm = TEST_FILL_1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_5,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_5,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_6,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_6,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_6,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_7,
+		.imm = TEST_JCC_2,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_7,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_7,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_8,
+		.imm = TEST_JCC_3,
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | BPF_W),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_8,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_STX | EBPF_XADD | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_8,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static int
+test_xadd1_check(uint64_t rc, const void *arg)
+{
+	uint64_t rv;
+	const struct dummy_offset *dft;
+	struct dummy_offset dfe;
+
+	dft = arg;
+	memset(&dfe, 0, sizeof(dfe));
+
+	rv = 1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = -1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = (int32_t)TEST_FILL_1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_MUL_1;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_MUL_2;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_JCC_2;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	rv = TEST_JCC_3;
+	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
+	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+
+	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
+}
+
+/* alu div test-cases */
+static const struct ebpf_insn test_div1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[0].u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[1].u64),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[2].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = TEST_MUL_1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_MOD | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = TEST_MUL_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 1,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_OR | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_ALU | BPF_MOD | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_DIV | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_3,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_2,
+		.off = offsetof(struct dummy_vect8, out[0].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_3,
+		.off = offsetof(struct dummy_vect8, out[1].u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_1,
+		.src_reg = EBPF_REG_4,
+		.off = offsetof(struct dummy_vect8, out[2].u64),
+	},
+	/* check that we can handle division by zero gracefully. */
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_vect8, in[3].u32),
+	},
+	{
+		.code = (BPF_ALU | BPF_DIV | BPF_X),
+		.dst_reg = EBPF_REG_4,
+		.src_reg = EBPF_REG_2,
+	},
+	/* return 1 */
+	{
+		.code = (BPF_ALU | EBPF_MOV | BPF_K),
+		.dst_reg = EBPF_REG_0,
+		.imm = 1,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static int
+test_div1_check(uint64_t rc, const void *arg)
+{
+	uint64_t r2, r3, r4;
+	const struct dummy_vect8 *dvt;
+	struct dummy_vect8 dve;
+
+	dvt = arg;
+	memset(&dve, 0, sizeof(dve));
+
+	r2 = dvt->in[0].u32;
+	r3 = dvt->in[1].u64;
+	r4 = dvt->in[2].u32;
+
+	r2 = (uint32_t)r2 / TEST_MUL_1;
+	r3 %= TEST_MUL_2;
+	r2 |= 1;
+	r3 |= 1;
+	r4 = (uint32_t)(r4 % r2);
+	r4 /= r3;
+
+	dve.out[0].u64 = r2;
+	dve.out[1].u64 = r3;
+	dve.out[2].u64 = r4;
+
+	/*
+	 * In the test prog we attempted to divide by zero,
+	 * so the return value should be 0.
+	 */
+	return cmp_res(__func__, 0, rc, dve.out, dvt->out, sizeof(dve.out));
+}
+
+/* call test-cases */
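+/*
+ * The program below copies the 32-bit and 64-bit input fields onto the BPF
+ * stack, passes pointers to those stack slots (R1 still holds the original
+ * input pointer) to an external function, and returns the sum of the values
+ * read back from the stack.
+ */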
+static const struct ebpf_insn test_call1_prog[] = {
+
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u32),
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_1,
+		.off = offsetof(struct dummy_offset, u64),
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_10,
+		.src_reg = EBPF_REG_2,
+		.off = -4,
+	},
+	{
+		.code = (BPF_STX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_10,
+		.src_reg = EBPF_REG_3,
+		.off = -16,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_X),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_10,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_SUB | BPF_K),
+		.dst_reg = EBPF_REG_2,
+		.imm = 4,
+	},
+	{
+		.code = (EBPF_ALU64 | EBPF_MOV | BPF_X),
+		.dst_reg = EBPF_REG_3,
+		.src_reg = EBPF_REG_10,
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_SUB | BPF_K),
+		.dst_reg = EBPF_REG_3,
+		.imm = 16,
+	},
+	{
+		.code = (BPF_JMP | EBPF_CALL),
+		.imm = 0,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | BPF_W),
+		.dst_reg = EBPF_REG_2,
+		.src_reg = EBPF_REG_10,
+		.off = -4,
+	},
+	{
+		.code = (BPF_LDX | BPF_MEM | EBPF_DW),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_10,
+		.off = -16
+	},
+	{
+		.code = (EBPF_ALU64 | BPF_ADD | BPF_X),
+		.dst_reg = EBPF_REG_0,
+		.src_reg = EBPF_REG_2,
+	},
+	{
+		.code = (BPF_JMP | EBPF_EXIT),
+	},
+};
+
+static void
+dummy_func1(const void *p, uint32_t *v32, uint64_t *v64)
+{
+	const struct dummy_offset *dv;
+
+	dv = p;
+
+	v32[0] += dv->u16;
+	v64[0] += dv->u8;
+}
+
+static int
+test_call1_check(uint64_t rc, const void *arg)
+{
+	uint32_t v32;
+	uint64_t v64;
+	const struct dummy_offset *dv;
+
+	dv = arg;
+
+	v32 = dv->u32;
+	v64 = dv->u64;
+	dummy_func1(arg, &v32, &v64);
+	v64 += v32;
+
+	if (v64 != rc) {
+		printf("%s@%d: invalid return value "
+			"expected=0x%" PRIx64 ", actual=0x%" PRIx64 "\n",
+			__func__, __LINE__, v64, rc);
+		return -1;
+	}
+	return 0;
+}
+
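+/*
+ * External symbol table for the call test: it maps the name "dummy_func1"
+ * to the native function above, so the BPF CALL instruction in
+ * test_call1_prog can be resolved at load time.
+ */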
+static const struct rte_bpf_xsym test_call1_xsym[] = {
+	{
+		.name = RTE_STR(dummy_func1),
+		.type = RTE_BPF_XTYPE_FUNC,
+		.func = (void *)dummy_func1,
+	},
+};
+
+static const struct bpf_test tests[] = {
+	{
+		.name = "test_store1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store1_prog,
+			.nb_ins = RTE_DIM(test_store1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_store2",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_store2_prog,
+			.nb_ins = RTE_DIM(test_store2_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_store1_check,
+	},
+	{
+		.name = "test_load1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_load1_prog,
+			.nb_ins = RTE_DIM(test_load1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_load1_check,
+	},
+	{
+		.name = "test_mul1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_mul1_prog,
+			.nb_ins = RTE_DIM(test_mul1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_mul1_check,
+	},
+	{
+		.name = "test_shift1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_shift1_prog,
+			.nb_ins = RTE_DIM(test_shift1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_shift1_prepare,
+		.check_result = test_shift1_check,
+	},
+	{
+		.name = "test_jump1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_jump1_prog,
+			.nb_ins = RTE_DIM(test_jump1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_jump1_prepare,
+		.check_result = test_jump1_check,
+	},
+	{
+		.name = "test_alu1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_alu1_prog,
+			.nb_ins = RTE_DIM(test_alu1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_jump1_prepare,
+		.check_result = test_alu1_check,
+	},
+	{
+		.name = "test_bele1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_bele1_prog,
+			.nb_ins = RTE_DIM(test_bele1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_bele1_prepare,
+		.check_result = test_bele1_check,
+	},
+	{
+		.name = "test_xadd1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_xadd1_prog,
+			.nb_ins = RTE_DIM(test_xadd1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+		},
+		.prepare = test_store1_prepare,
+		.check_result = test_xadd1_check,
+	},
+	{
+		.name = "test_div1",
+		.arg_sz = sizeof(struct dummy_vect8),
+		.prm = {
+			.ins = test_div1_prog,
+			.nb_ins = RTE_DIM(test_div1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_vect8),
+			},
+		},
+		.prepare = test_mul1_prepare,
+		.check_result = test_div1_check,
+	},
+	{
+		.name = "test_call1",
+		.arg_sz = sizeof(struct dummy_offset),
+		.prm = {
+			.ins = test_call1_prog,
+			.nb_ins = RTE_DIM(test_call1_prog),
+			.prog_arg = {
+				.type = RTE_BPF_ARG_PTR,
+				.size = sizeof(struct dummy_offset),
+			},
+			.xsym = test_call1_xsym,
+			.nb_xsym = RTE_DIM(test_call1_xsym),
+		},
+		.prepare = test_load1_prepare,
+		.check_result = test_call1_check,
+		/* for now don't support function calls on 32 bit platform */
+		.allow_fail = (sizeof(uint64_t) != sizeof(uintptr_t)),
+	},
+};
+
+static int
+run_test(const struct bpf_test *tst)
+{
+	int32_t ret, rv;
+	int64_t rc;
+	struct rte_bpf *bpf;
+	struct rte_bpf_jit jit;
+	uint8_t tbuf[tst->arg_sz];
+
+	printf("%s(%s) start\n", __func__, tst->name);
+
+	bpf = rte_bpf_load(&tst->prm);
+	if (bpf == NULL) {
+		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
+			__func__, __LINE__, rte_errno, strerror(rte_errno));
+		return -1;
+	}
+
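+	/* fill the input buffer and run the program via the eBPF interpreter */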
+	tst->prepare(tbuf);
+
+	rc = rte_bpf_exec(bpf, tbuf);
+	ret = tst->check_result(rc, tbuf);
+	if (ret != 0) {
+		printf("%s@%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, ret, strerror(ret));
+	}
+
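+	/* if JIT generated native code is available, re-run the test with it */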
+	rte_bpf_get_jit(bpf, &jit);
+	if (jit.func == NULL)
+		return 0;
+
+	tst->prepare(tbuf);
+	rc = jit.func(tbuf);
+	rv = tst->check_result(rc, tbuf);
+	ret |= rv;
+	if (rv != 0) {
+		printf("%s@%d: check_result(%s) failed, error: %d(%s);\n",
+			__func__, __LINE__, tst->name, rv, strerror(rv));
+	}
+
+	rte_bpf_destroy(bpf);
+	return ret;
+
+}
+
+static int
+test_bpf(void)
+{
+	int32_t rc, rv;
+	uint32_t i;
+
+	rc = 0;
+	for (i = 0; i != RTE_DIM(tests); i++) {
+		rv = run_test(tests + i);
+		if (tests[i].allow_fail == 0)
+			rc |= rv;
+	}
+
+	return rc;
+}
+
+REGISTER_TEST_COMMAND(bpf_autotest, test_bpf);
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [dpdk-dev] [PATCH v6 9/9] doc: add bpf library related info
  2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
                           ` (9 preceding siblings ...)
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 8/9] test: introduce functional test for librte_bpf Konstantin Ananyev
@ 2018-05-10 10:23         ` Konstantin Ananyev
  10 siblings, 0 replies; 83+ messages in thread
From: Konstantin Ananyev @ 2018-05-10 10:23 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
Acked-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 MAINTAINERS                                 |  1 +
 doc/api/doxy-api-index.md                   |  3 +-
 doc/api/doxy-api.conf                       |  1 +
 doc/guides/prog_guide/bpf_lib.rst           | 38 ++++++++++++++++++++
 doc/guides/prog_guide/index.rst             |  1 +
 doc/guides/rel_notes/release_18_05.rst      |  7 ++++
 doc/guides/testpmd_app_ug/testpmd_funcs.rst | 56 +++++++++++++++++++++++++++++
 7 files changed, 106 insertions(+), 1 deletion(-)
 create mode 100644 doc/guides/prog_guide/bpf_lib.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 7350f61ed..3d4b92cc4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1063,6 +1063,7 @@ F: lib/librte_latencystats/
 BPF
 M: Konstantin Ananyev <konstantin.ananyev@intel.com>
 F: lib/librte_bpf/
+F: doc/guides/prog_guide/bpf_lib.rst
 
 Test Applications
 -----------------
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 55d075c66..6365917d5 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -111,7 +111,8 @@ The public API headers are grouped by topics:
   [EFD]                (@ref rte_efd.h),
   [ACL]                (@ref rte_acl.h),
   [member]             (@ref rte_member.h),
-  [flow classify]      (@ref rte_flow_classify.h)
+  [flow classify]      (@ref rte_flow_classify.h),
+  [BPF]                (@ref rte_bpf.h)
 
 - **containers**:
   [mbuf]               (@ref rte_mbuf.h),
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index aa66751a0..8bc9b0b1d 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -45,6 +45,7 @@ INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_acl \
                           lib/librte_bbdev \
                           lib/librte_bitratestats \
+                          lib/librte_bpf \
                           lib/librte_cfgfile \
                           lib/librte_cmdline \
                           lib/librte_compat \
diff --git a/doc/guides/prog_guide/bpf_lib.rst b/doc/guides/prog_guide/bpf_lib.rst
new file mode 100644
index 000000000..7c08e6b2d
--- /dev/null
+++ b/doc/guides/prog_guide/bpf_lib.rst
@@ -0,0 +1,38 @@
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+Berkeley Packet Filter Library
+==============================
+
+The DPDK provides a BPF library that gives the ability
+to load and execute Enhanced Berkeley Packet Filter (eBPF) bytecode within
+a user-space DPDK application.
+
+It supports a basic set of features from the eBPF spec.
+Please refer to the
+`eBPF spec <https://www.kernel.org/doc/Documentation/networking/filter.txt>`_
+for more information.
+It also introduces a basic framework to load/unload BPF-based filters
+on eth devices (right now only via SW RX/TX callbacks).
+
+The library API provides the following basic operations (a short usage sketch follows the list):
+
+*  Create a new BPF execution context and load user-provided eBPF code into it.
+
+*  Destroy a BPF execution context and its runtime structures and free the associated memory.
+
+*  Execute eBPF bytecode associated with a provided input parameter.
+
+*  Provide information about natively compiled code for a given BPF context.
+
+*  Load a BPF program from an ELF file and install a callback to execute it on a given ethdev port/queue.
+
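+A minimal usage sketch is shown below (error handling omitted; ``ins`` stands
+for an application-provided array of ``struct ebpf_insn`` and ``struct my_arg``
+is a placeholder for the program's input data type):
+
+.. code-block:: c
+
+   struct my_arg arg;                         /* placeholder input data */
+
+   struct rte_bpf_prm prm = {
+       .ins = ins,                            /* eBPF instruction array */
+       .nb_ins = RTE_DIM(ins),
+       .prog_arg = {
+           .type = RTE_BPF_ARG_PTR,
+           .size = sizeof(arg),
+       },
+   };
+
+   struct rte_bpf *bpf = rte_bpf_load(&prm);  /* create execution context */
+   uint64_t rc = rte_bpf_exec(bpf, &arg);     /* run the bytecode over &arg */
+   rte_bpf_destroy(bpf);                      /* free context and runtime data */
+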
+Not currently supported eBPF features
+-------------------------------------
+
+ - JIT for non X86_64 platforms
+ - cBPF
+ - tail-pointer call
+ - eBPF MAP
+ - skb
+ - external function calls for 32-bit platforms
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index 235ad0201..2c40fb4ec 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -51,6 +51,7 @@ Programmer's Guide
     vhost_lib
     metrics_lib
     port_hotplug_framework
+    bpf_lib
     source_org
     dev_kit_build_system
     dev_kit_root_make_help
diff --git a/doc/guides/rel_notes/release_18_05.rst b/doc/guides/rel_notes/release_18_05.rst
index 5d1cc1807..41a862959 100644
--- a/doc/guides/rel_notes/release_18_05.rst
+++ b/doc/guides/rel_notes/release_18_05.rst
@@ -183,6 +183,13 @@ New Features
   stats/xstats on shared memory from secondary process, and also pdump packets on
   those virtual devices.
 
+* **Added the BPF Library.**
+
+  The BPF Library provides the ability to load and execute
+  Enhanced Berkeley Packet Filter (eBPF) bytecode within a user-space DPDK
+  application. It also introduces a basic framework to load/unload BPF-based
+  filters on eth devices (right now only via SW RX/TX callbacks)
+  and adds a dependency on libelf.
 
 API Changes
 -----------
diff --git a/doc/guides/testpmd_app_ug/testpmd_funcs.rst b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
index 013a40549..e4afb03dc 100644
--- a/doc/guides/testpmd_app_ug/testpmd_funcs.rst
+++ b/doc/guides/testpmd_app_ug/testpmd_funcs.rst
@@ -3813,3 +3813,59 @@ Validate and create a QinQ rule on port 0 to steer traffic to a queue on the hos
    ID      Group   Prio    Attr    Rule
    0       0       0       i-      ETH VLAN VLAN=>VF QUEUE
    1       0       0       i-      ETH VLAN VLAN=>PF QUEUE
+
+BPF Functions
+--------------
+
+The following sections show functions to load/unload eBPF-based filters.
+
+bpf-load
+~~~~~~~~
+
+Load an eBPF program as a callback for a particular RX/TX queue::
+
+   testpmd> bpf-load rx|tx (portid) (queueid) (load-flags) (bpf-prog-filename)
+
+The available load-flags are:
+
+* ``J``: use JIT generated native code, otherwise the BPF interpreter will be used.
+
+* ``M``: assume the input parameter is a pointer to rte_mbuf, otherwise assume it is a pointer to the first segment's data.
+
+* ``-``: none.
+
+.. note::
+
+   You'll need clang v3.7 or above to build the BPF program you'd like to load.
+
+For example:
+
+.. code-block:: console
+
+   cd test/bpf
+   clang -O2 -target bpf -c t1.c
+
+Then to load (and JIT compile) t1.o at RX queue 0, port 1:
+
+.. code-block:: console
+
+   testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
+
+To load (not JITed) t1.o at TX queue 0, port 0:
+
+.. code-block:: console
+
+   testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/t1.o
+
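+Load-flags can be combined. For example, to load and JIT compile t3.o
+(which expects a pointer to rte_mbuf as its input) at RX queue 0, port 2:
+
+.. code-block:: console
+
+   testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
+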
+bpf-unload
+~~~~~~~~~~
+
+Unload a previously loaded eBPF program for a particular RX/TX queue::
+
+   testpmd> bpf-unload rx|tx (portid) (queueid)
+
+For example, to unload the BPF filter from TX queue 0, port 0:
+
+.. code-block:: console
+
+   testpmd> bpf-unload tx 0 0
-- 
2.13.6

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code
  2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code Konstantin Ananyev
@ 2018-05-11 14:23           ` Ferruh Yigit
  2018-05-11 22:46             ` Thomas Monjalon
  0 siblings, 1 reply; 83+ messages in thread
From: Ferruh Yigit @ 2018-05-11 14:23 UTC (permalink / raw)
  To: Konstantin Ananyev, dev

On 5/10/2018 11:23 AM, Konstantin Ananyev wrote:
> BPF is used quite intensively inside Linux (and BSD) kernels
> for various different purposes and proved to be extremely useful.
> 
> BPF inside DPDK might also be used in a lot of places
> for a lot of similar things.
>  As an example to:
> - packet filtering/tracing (aka tcpdump)
> - packet classification
> - statistics collection
> - HW/PMD live-system debugging/prototyping - trace HW descriptors,
>   internal PMD SW state, etc.
> - Comeup with your own idea
> 
> All of that in a dynamic, user-defined and extensible manner.
> 
> So these series introduce new library - librte_bpf.
> librte_bpf provides API to load and execute BPF bytecode within
> user-space dpdk app.
> It supports basic set of features from eBPF spec.
> Also it introduces basic framework to load/unload BPF-based filters
> on eth devices (right now via SW RX/TX callbacks).
> 
> How to try it:
> ===============
> 
> 1) run testpmd as usual and start your favorite forwarding case.
> 2) build bpf program you'd like to load
> (you'll need clang v3.7 or above):
> $ cd test/bpf
> $ clang -O2 -target bpf -c t1.c
> 
> 3) load bpf program(s):
> testpmd> bpf-load rx|tx <portid> <queueid> <load-flags> <bpf-prog-filename>
> 
> <load-flags>:  [-][J][M]
> J - use JIT generated native code, otherwise BPF interpreter will be used.
> M - assume input parameter is a pointer to rte_mbuf,
>     otherwise assume it is a pointer to first segment's data.
> 
> Few examples:
> 
> # to load (not JITed) dummy.o at TX queue 0, port 0:
> testpmd> bpf-load tx 0 0 - ./dpdk.org/test/bpf/dummy.o
> #to load (and JIT compile) t1.o at RX queue 0, port 1:
> testpmd> bpf-load rx 1 0 J ./dpdk.org/test/bpf/t1.o
> 
> #to load and JIT t3.o (note that it expects mbuf as an input):
> testpmd> bpf-load rx 2 0 JM ./dpdk.org/test/bpf/t3.o
> 
> 
> 4) observe changed traffic behavior
> Let say with the examples above:
>  - dummy.o  does literally nothing, so no changes should be here,
>    except some possible slowdown.
>  - t1.o - should force to drop all packets that doesn't match:
>    'dst 1.2.3.4 && udp && dst port 5000' filter.
>  - t3.o - should dump to stdout ARP packets.
> 
> 5) unload some or all bpf programs:
> testpmd> bpf-unload tx 0 0
> 
> 6) continue with step 3) or exit
> 
> Not currently supported features:
> =================================
> - cBPF
> - tail-pointer call
> - eBPF MAP
> - JIT for non X86_64 targets
> - skb
> - function calls for 32-bit apps
> - mbuf pointer as input parameter for 32-bit apps
> 
> v2:
>  - add meson build
>  - add freebsd build
>  - use new logging API
>  - using rte_malloc() for cbi allocation
>  - add extra logic into bpf_validate()
> 
> v3:
>  - add new test-case for it
>  - update docs
>  - update MAINTAINERS
> 
> v4:
>  - add more tests to cover BPF ISA
>  - fix few issues
> 
> v5:
>  - revert changes in tap_bpf.h
>  - rename eBPF related defines
>  - apply Thomas and Marco comments
> 
> v6:
>  Address Thomas, Kevin and Ferruh comments:
>  - handle case when libelf is not installed gracefully
>  - allow testpmd to be built without librte_bpf
>  - doc nits
> 
> Konstantin Ananyev (9):
>   bpf: add BPF loading and execution framework
>   bpf: add ability to load eBPF program from ELF object file
>   bpf: add more logic into bpf_validate()
>   bpf: add JIT compilation for x86_64 ISA
check-git-log.sh complains about "_"

>   bpf: introduce basic RX/TX BPF filters
s,RX/TX,Rx/Tx

>   testpmd: new commands to load/unload BPF filters
app/testpmd: ...

>   test: add few eBPF samples
>   test: introduce functional test for librte_bpf
>   doc: add bpf library related info

I confirm the patchset passes from my build scripts.
Also, the new RTE_LIBRTE_BPF_ELF config option, disabled by default, resolves
the dependency issue.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code
  2018-05-11 14:23           ` Ferruh Yigit
@ 2018-05-11 22:46             ` Thomas Monjalon
  0 siblings, 0 replies; 83+ messages in thread
From: Thomas Monjalon @ 2018-05-11 22:46 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, Ferruh Yigit

11/05/2018 16:23, Ferruh Yigit:
> On 5/10/2018 11:23 AM, Konstantin Ananyev wrote:
> > Konstantin Ananyev (9):
> >   bpf: add BPF loading and execution framework
> >   bpf: add ability to load eBPF program from ELF object file
> >   bpf: add more logic into bpf_validate()
> >   bpf: add JIT compilation for x86_64 ISA
> check-git-log.sh complains about "_"
> 
> >   bpf: introduce basic RX/TX BPF filters
> s,RX/TX,Rx/Tx
> 
> >   testpmd: new commands to load/unload BPF filters
> app/testpmd: ...
> 
> >   test: add few eBPF samples
> >   test: introduce functional test for librte_bpf
> >   doc: add bpf library related info

I have fixed the titles as Ferruh suggested,
moved the doxygen changes into the right commits,
moved the testpmd doc into the right commit,
fixed the MAINTAINERS file, fixed some typos,

and applied for 18.05-rc3.

^ permalink raw reply	[flat|nested] 83+ messages in thread

end of thread, other threads:[~2018-05-11 22:46 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-09 16:42 [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Konstantin Ananyev
2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 1/5] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-03-13 13:24   ` Jerin Jacob
2018-03-13 17:47     ` Ananyev, Konstantin
2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 2/5] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 3/5] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
2018-03-13 13:39   ` Jerin Jacob
2018-03-13 18:07     ` Ananyev, Konstantin
2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 4/5] testpmd: new commands to load/unload " Konstantin Ananyev
2018-03-09 16:42 ` [dpdk-dev] [PATCH v1 5/5] test: add few eBPF samples Konstantin Ananyev
2018-03-13 13:02 ` [dpdk-dev] [PATCH v1 0/5] add framework to load and execute BPF code Jerin Jacob
2018-03-13 17:24   ` Ananyev, Konstantin
2018-03-14 16:43 ` Alejandro Lucero
     [not found]   ` <2601191342CEEE43887BDE71AB9772589E29032C@irsmsx105.ger.corp.intel.com>
2018-03-16  9:45     ` Ananyev, Konstantin
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 0/7] " Konstantin Ananyev
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 1/7] net: move BPF related definitions into librte_net Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 00/10] add framework to load and execute BPF code Konstantin Ananyev
2018-04-09  4:54     ` Jerin Jacob
2018-04-09 11:10       ` Ananyev, Konstantin
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 00/10] add framework to load and execute BPF code Konstantin Ananyev
2018-04-16 21:25       ` Thomas Monjalon
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 01/10] net: move BPF related definitions into librte_net Konstantin Ananyev
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 0/8] add framework to load and execute BPF code Konstantin Ananyev
2018-05-09 17:11         ` Ferruh Yigit
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 1/8] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-05-09 17:09         ` Ferruh Yigit
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 0/9] add framework to load and execute BPF code Konstantin Ananyev
2018-05-11 14:23           ` Ferruh Yigit
2018-05-11 22:46             ` Thomas Monjalon
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 1/9] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 2/9] bpf: add ability to load eBPF program from ELF object file Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 3/9] bpf: add more logic into bpf_validate() Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 4/9] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 5/9] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 6/9] testpmd: new commands to load/unload " Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 7/9] test: add few eBPF samples Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 8/9] test: introduce functional test for librte_bpf Konstantin Ananyev
2018-05-10 10:23         ` [dpdk-dev] [PATCH v6 9/9] doc: add bpf library related info Konstantin Ananyev
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 2/8] bpf: add more logic into bpf_validate() Konstantin Ananyev
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 3/8] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 4/8] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
2018-05-09 17:09         ` Ferruh Yigit
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 5/8] testpmd: new commands to load/unload " Konstantin Ananyev
2018-05-09 17:09         ` Ferruh Yigit
2018-05-09 18:31           ` Kevin Traynor
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 6/8] test: add few eBPF samples Konstantin Ananyev
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 7/8] test: introduce functional test for librte_bpf Konstantin Ananyev
2018-05-04 12:45       ` [dpdk-dev] [PATCH v5 8/8] doc: add bpf library related info Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 02/10] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 03/10] bpf: add more logic into bpf_validate() Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 04/10] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 05/10] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 06/10] testpmd: new commands to load/unload " Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 07/10] test: add few eBPF samples Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 08/10] test: introduce functional test for librte_bpf Konstantin Ananyev
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 09/10] doc: add librte_bpf related info Konstantin Ananyev
2018-04-23 13:26       ` Kovacevic, Marko
2018-04-23 13:34       ` Kovacevic, Marko
2018-04-13 14:43     ` [dpdk-dev] [PATCH v4 10/10] MAINTAINERS: " Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 02/10] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 03/10] bpf: add more logic into bpf_validate() Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 04/10] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 05/10] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 06/10] testpmd: new commands to load/unload " Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 07/10] test: add few eBPF samples Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 08/10] test: introduce functional test for librte_bpf Konstantin Ananyev
2018-04-06 18:49   ` [dpdk-dev] [PATCH v3 09/10] doc: add librte_bpf related info Konstantin Ananyev
2018-04-23 13:22     ` Kovacevic, Marko
2018-04-06 23:18   ` [dpdk-dev] [PATCH v3 10/10] MAINTAINERS: " Konstantin Ananyev
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 2/7] bpf: add BPF loading and execution framework Konstantin Ananyev
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 3/7] bpf: add more logic into bpf_validate() Konstantin Ananyev
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 4/7] bpf: add JIT compilation for x86_64 ISA Konstantin Ananyev
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 5/7] bpf: introduce basic RX/TX BPF filters Konstantin Ananyev
2018-04-02 22:44   ` Jerin Jacob
2018-04-03 14:57     ` Ananyev, Konstantin
2018-04-03 17:17       ` Jerin Jacob
2018-04-04 11:39         ` Ananyev, Konstantin
2018-04-04 17:51           ` Jerin Jacob
2018-04-05 12:51             ` Ananyev, Konstantin
2018-04-09  4:38               ` Jerin Jacob
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 6/7] testpmd: new commands to load/unload " Konstantin Ananyev
2018-03-30 17:32 ` [dpdk-dev] [PATCH v2 7/7] test: add few eBPF samples Konstantin Ananyev
