From: Konstantin Ananyev
To: dev@dpdk.org
Cc: Konstantin Ananyev
Date: Fri, 30 Mar 2018 18:32:38 +0100
Message-Id: <1522431163-25621-3-git-send-email-konstantin.ananyev@intel.com>
X-Mailer: git-send-email 1.7.0.7
In-Reply-To: <1520613725-9176-1-git-send-email-konstantin.ananyev@intel.com>
References: <1520613725-9176-1-git-send-email-konstantin.ananyev@intel.com>
Subject: [dpdk-dev] [PATCH v2 2/7] bpf: add BPF loading and execution framework

librte_bpf provides a framework to load and execute eBPF bytecode inside
user-space DPDK-based applications. It supports a basic set of features from
the eBPF spec (https://www.kernel.org/doc/Documentation/networking/filter.txt).

Features not currently supported:
 - JIT
 - cBPF
 - tail-pointer call
 - eBPF MAP
 - skb

It also adds a dependency on libelf.

Signed-off-by: Konstantin Ananyev
---
 config/common_base                 |   5 +
 lib/Makefile                       |   2 +
 lib/librte_bpf/Makefile            |  30 +++
 lib/librte_bpf/bpf.c               |  59 +++++
 lib/librte_bpf/bpf_exec.c          | 452 +++++++++++++++++++++++++++++++
 lib/librte_bpf/bpf_impl.h          |  41 ++++
 lib/librte_bpf/bpf_load.c          | 385 +++++++++++++++++++++++++++
 lib/librte_bpf/bpf_validate.c      |  55 +++++
 lib/librte_bpf/meson.build         |  18 ++
 lib/librte_bpf/rte_bpf.h           | 160 +++++++++++
 lib/librte_bpf/rte_bpf_version.map |  12 +
 lib/meson.build                    |   2 +-
 mk/rte.app.mk                      |   2 +
 13 files changed, 1222 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_bpf/Makefile
 create mode 100644 lib/librte_bpf/bpf.c
 create mode 100644 lib/librte_bpf/bpf_exec.c
 create mode 100644 lib/librte_bpf/bpf_impl.h
 create mode 100644 lib/librte_bpf/bpf_load.c
 create mode 100644 lib/librte_bpf/bpf_validate.c
 create mode 100644 lib/librte_bpf/meson.build
 create mode 100644 lib/librte_bpf/rte_bpf.h
 create mode 100644 lib/librte_bpf/rte_bpf_version.map

diff --git a/config/common_base b/config/common_base
index ee10b449b..97b60f9ff 100644
--- a/config/common_base
+++ b/config/common_base
@@ -827,3 +827,8 @@ CONFIG_RTE_APP_CRYPTO_PERF=y
 # Compile the eventdev application
 #
 CONFIG_RTE_APP_EVENTDEV=y
+
+#
+# Compile librte_bpf
+#
+CONFIG_RTE_LIBRTE_BPF=y
diff --git a/lib/Makefile b/lib/Makefile
index ec965a606..a4a2329f9 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -97,6 +97,8 @@ DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
 DIRS-$(CONFIG_RTE_LIBRTE_GSO) += librte_gso
 DEPDIRS-librte_gso := librte_eal librte_mbuf librte_ether librte_net
 DEPDIRS-librte_gso += librte_mempool
+DIRS-$(CONFIG_RTE_LIBRTE_BPF) += librte_bpf
+DEPDIRS-librte_bpf := librte_eal librte_mempool librte_mbuf librte_ether
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
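A minimal usage sketch for reviewers (illustrative, not part of the patch:
the file name, the ".text" section name and the assumption that the object
was built with something like "clang -O2 -target bpf -c filter.c" are
examples; error handling is trimmed):

	#include <string.h>
	#include <stdio.h>
	#include <rte_errno.h>
	#include <rte_bpf.h>

	static uint64_t
	run_once(const char *fname, void *data)
	{
		uint64_t rc;
		struct rte_bpf *bpf;
		struct rte_bpf_prm prm;

		memset(&prm, 0, sizeof(prm));
		prm.prog_type = RTE_BPF_PROG_TYPE_UNSPEC; /* ctx is raw data */

		/* load eBPF code from an executable ELF section */
		bpf = rte_bpf_elf_load(&prm, fname, ".text");
		if (bpf == NULL) {
			printf("rte_bpf_elf_load: rte_errno=%d\n", rte_errno);
			return 0;
		}

		rc = rte_bpf_exec(bpf, data); /* data lands in r1 on entry */
		rte_bpf_destroy(bpf);
		return rc;
	}

diff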
--git a/lib/librte_bpf/Makefile b/lib/librte_bpf/Makefile new file mode 100644 index 000000000..e0f434e77 --- /dev/null +++ b/lib/librte_bpf/Makefile @@ -0,0 +1,30 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +include $(RTE_SDK)/mk/rte.vars.mk + +# library name +LIB = librte_bpf.a + +CFLAGS += -O3 +CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) +CFLAGS += -DALLOW_EXPERIMENTAL_API +LDLIBS += -lrte_net -lrte_eal +LDLIBS += -lrte_mempool -lrte_ring +LDLIBS += -lrte_mbuf -lrte_ethdev +LDLIBS += -lelf + +EXPORT_MAP := rte_bpf_version.map + +LIBABIVER := 1 + +# all source are stored in SRCS-y +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf.c +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_exec.c +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_load.c +SRCS-$(CONFIG_RTE_LIBRTE_BPF) += bpf_validate.c + +# install header files +SYMLINK-$(CONFIG_RTE_LIBRTE_BPF)-include += rte_bpf.h + +include $(RTE_SDK)/mk/rte.lib.mk diff --git a/lib/librte_bpf/bpf.c b/lib/librte_bpf/bpf.c new file mode 100644 index 000000000..d7f68c017 --- /dev/null +++ b/lib/librte_bpf/bpf.c @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "bpf_impl.h" + +int rte_bpf_logtype; + +__rte_experimental void +rte_bpf_destroy(struct rte_bpf *bpf) +{ + if (bpf != NULL) { + if (bpf->jit.func != NULL) + munmap(bpf->jit.func, bpf->jit.sz); + munmap(bpf, bpf->sz); + } +} + +__rte_experimental int +rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit) +{ + if (bpf == NULL || jit == NULL) + return -EINVAL; + + jit[0] = bpf->jit; + return 0; +} + +int +bpf_jit(struct rte_bpf *bpf) +{ + int32_t rc; + + rc = -ENOTSUP; + if (rc != 0) + RTE_BPF_LOG(WARNING, "%s(%p) failed, error code: %d;\n", + __func__, bpf, rc); + return rc; +} + +RTE_INIT(rte_bpf_init_log); + +static void +rte_bpf_init_log(void) +{ + rte_bpf_logtype = rte_log_register("lib.bpf"); + if (rte_bpf_logtype >= 0) + rte_log_set_level(rte_bpf_logtype, RTE_LOG_INFO); +} diff --git a/lib/librte_bpf/bpf_exec.c b/lib/librte_bpf/bpf_exec.c new file mode 100644 index 000000000..0382ade98 --- /dev/null +++ b/lib/librte_bpf/bpf_exec.c @@ -0,0 +1,452 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include "bpf_impl.h" + +#define BPF_JMP_UNC(ins) ((ins) += (ins)->off) + +#define BPF_JMP_CND_REG(reg, ins, op, type) \ + ((ins) += \ + ((type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) ? \ + (ins)->off : 0) + +#define BPF_JMP_CND_IMM(reg, ins, op, type) \ + ((ins) += \ + ((type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) ? 
\ + (ins)->off : 0) + +#define BPF_NEG_ALU(reg, ins, type) \ + ((reg)[(ins)->dst_reg] = (type)(-(reg)[(ins)->dst_reg])) + +#define BPF_MOV_ALU_REG(reg, ins, type) \ + ((reg)[(ins)->dst_reg] = (type)(reg)[(ins)->src_reg]) + +#define BPF_OP_ALU_REG(reg, ins, op, type) \ + ((reg)[(ins)->dst_reg] = \ + (type)(reg)[(ins)->dst_reg] op (type)(reg)[(ins)->src_reg]) + +#define BPF_MOV_ALU_IMM(reg, ins, type) \ + ((reg)[(ins)->dst_reg] = (type)(ins)->imm) + +#define BPF_OP_ALU_IMM(reg, ins, op, type) \ + ((reg)[(ins)->dst_reg] = \ + (type)(reg)[(ins)->dst_reg] op (type)(ins)->imm) + +#define BPF_DIV_ZERO_CHECK(bpf, reg, ins, type) do { \ + if ((type)(reg)[(ins)->src_reg] == 0) { \ + RTE_BPF_LOG(ERR, \ + "%s(%p): division by 0 at pc: %#zx;\n", \ + __func__, bpf, \ + (uintptr_t)(ins) - (uintptr_t)(bpf)->prm.ins); \ + return 0; \ + } \ +} while (0) + +#define BPF_LD_REG(reg, ins, type) \ + ((reg)[(ins)->dst_reg] = \ + *(type *)(uintptr_t)((reg)[(ins)->src_reg] + (ins)->off)) + +#define BPF_ST_IMM(reg, ins, type) \ + (*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \ + (type)(ins)->imm) + +#define BPF_ST_REG(reg, ins, type) \ + (*(type *)(uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off) = \ + (type)(reg)[(ins)->src_reg]) + +#define BPF_ST_XADD_REG(reg, ins, tp) \ + (rte_atomic##tp##_add((rte_atomic##tp##_t *) \ + (uintptr_t)((reg)[(ins)->dst_reg] + (ins)->off), \ + reg[ins->src_reg])) + +static inline void +bpf_alu_be(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins) +{ + uint64_t *v; + + v = reg + ins->dst_reg; + switch (ins->imm) { + case 16: + *v = rte_cpu_to_be_16(*v); + break; + case 32: + *v = rte_cpu_to_be_32(*v); + break; + case 64: + *v = rte_cpu_to_be_64(*v); + break; + } +} + +static inline void +bpf_alu_le(uint64_t reg[MAX_BPF_REG], const struct bpf_insn *ins) +{ + uint64_t *v; + + v = reg + ins->dst_reg; + switch (ins->imm) { + case 16: + *v = rte_cpu_to_le_16(*v); + break; + case 32: + *v = rte_cpu_to_le_32(*v); + break; + case 64: + *v = rte_cpu_to_le_64(*v); + break; + } +} + +static inline uint64_t +bpf_exec(const struct rte_bpf *bpf, uint64_t reg[MAX_BPF_REG]) +{ + const struct bpf_insn *ins; + + for (ins = bpf->prm.ins; ; ins++) { + switch (ins->code) { + /* 32 bit ALU IMM operations */ + case (BPF_ALU | BPF_ADD | BPF_K): + BPF_OP_ALU_IMM(reg, ins, +, uint32_t); + break; + case (BPF_ALU | BPF_SUB | BPF_K): + BPF_OP_ALU_IMM(reg, ins, -, uint32_t); + break; + case (BPF_ALU | BPF_AND | BPF_K): + BPF_OP_ALU_IMM(reg, ins, &, uint32_t); + break; + case (BPF_ALU | BPF_OR | BPF_K): + BPF_OP_ALU_IMM(reg, ins, |, uint32_t); + break; + case (BPF_ALU | BPF_LSH | BPF_K): + BPF_OP_ALU_IMM(reg, ins, <<, uint32_t); + break; + case (BPF_ALU | BPF_RSH | BPF_K): + BPF_OP_ALU_IMM(reg, ins, >>, uint32_t); + break; + case (BPF_ALU | BPF_XOR | BPF_K): + BPF_OP_ALU_IMM(reg, ins, ^, uint32_t); + break; + case (BPF_ALU | BPF_MUL | BPF_K): + BPF_OP_ALU_IMM(reg, ins, *, uint32_t); + break; + case (BPF_ALU | BPF_DIV | BPF_K): + BPF_OP_ALU_IMM(reg, ins, /, uint32_t); + break; + case (BPF_ALU | BPF_MOD | BPF_K): + BPF_OP_ALU_IMM(reg, ins, %, uint32_t); + break; + case (BPF_ALU | BPF_MOV | BPF_K): + BPF_MOV_ALU_IMM(reg, ins, uint32_t); + break; + /* 32 bit ALU REG operations */ + case (BPF_ALU | BPF_ADD | BPF_X): + BPF_OP_ALU_REG(reg, ins, +, uint32_t); + break; + case (BPF_ALU | BPF_SUB | BPF_X): + BPF_OP_ALU_REG(reg, ins, -, uint32_t); + break; + case (BPF_ALU | BPF_AND | BPF_X): + BPF_OP_ALU_REG(reg, ins, &, uint32_t); + break; + case (BPF_ALU | BPF_OR | BPF_X): + BPF_OP_ALU_REG(reg, ins, |, 
uint32_t); + break; + case (BPF_ALU | BPF_LSH | BPF_X): + BPF_OP_ALU_REG(reg, ins, <<, uint32_t); + break; + case (BPF_ALU | BPF_RSH | BPF_X): + BPF_OP_ALU_REG(reg, ins, >>, uint32_t); + break; + case (BPF_ALU | BPF_XOR | BPF_X): + BPF_OP_ALU_REG(reg, ins, ^, uint32_t); + break; + case (BPF_ALU | BPF_MUL | BPF_X): + BPF_OP_ALU_REG(reg, ins, *, uint32_t); + break; + case (BPF_ALU | BPF_DIV | BPF_X): + BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t); + BPF_OP_ALU_REG(reg, ins, /, uint32_t); + break; + case (BPF_ALU | BPF_MOD | BPF_X): + BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint32_t); + BPF_OP_ALU_REG(reg, ins, %, uint32_t); + break; + case (BPF_ALU | BPF_MOV | BPF_X): + BPF_MOV_ALU_REG(reg, ins, uint32_t); + break; + case (BPF_ALU | BPF_NEG): + BPF_NEG_ALU(reg, ins, uint32_t); + break; + case (BPF_ALU | BPF_END | BPF_TO_BE): + bpf_alu_be(reg, ins); + break; + case (BPF_ALU | BPF_END | BPF_TO_LE): + bpf_alu_le(reg, ins); + break; + /* 64 bit ALU IMM operations */ + case (BPF_ALU64 | BPF_ADD | BPF_K): + BPF_OP_ALU_IMM(reg, ins, +, uint64_t); + break; + case (BPF_ALU64 | BPF_SUB | BPF_K): + BPF_OP_ALU_IMM(reg, ins, -, uint64_t); + break; + case (BPF_ALU64 | BPF_AND | BPF_K): + BPF_OP_ALU_IMM(reg, ins, &, uint64_t); + break; + case (BPF_ALU64 | BPF_OR | BPF_K): + BPF_OP_ALU_IMM(reg, ins, |, uint64_t); + break; + case (BPF_ALU64 | BPF_LSH | BPF_K): + BPF_OP_ALU_IMM(reg, ins, <<, uint64_t); + break; + case (BPF_ALU64 | BPF_RSH | BPF_K): + BPF_OP_ALU_IMM(reg, ins, >>, uint64_t); + break; + case (BPF_ALU64 | BPF_ARSH | BPF_K): + BPF_OP_ALU_IMM(reg, ins, >>, int64_t); + break; + case (BPF_ALU64 | BPF_XOR | BPF_K): + BPF_OP_ALU_IMM(reg, ins, ^, uint64_t); + break; + case (BPF_ALU64 | BPF_MUL | BPF_K): + BPF_OP_ALU_IMM(reg, ins, *, uint64_t); + break; + case (BPF_ALU64 | BPF_DIV | BPF_K): + BPF_OP_ALU_IMM(reg, ins, /, uint64_t); + break; + case (BPF_ALU64 | BPF_MOD | BPF_K): + BPF_OP_ALU_IMM(reg, ins, %, uint64_t); + break; + case (BPF_ALU64 | BPF_MOV | BPF_K): + BPF_MOV_ALU_IMM(reg, ins, uint64_t); + break; + /* 64 bit ALU REG operations */ + case (BPF_ALU64 | BPF_ADD | BPF_X): + BPF_OP_ALU_REG(reg, ins, +, uint64_t); + break; + case (BPF_ALU64 | BPF_SUB | BPF_X): + BPF_OP_ALU_REG(reg, ins, -, uint64_t); + break; + case (BPF_ALU64 | BPF_AND | BPF_X): + BPF_OP_ALU_REG(reg, ins, &, uint64_t); + break; + case (BPF_ALU64 | BPF_OR | BPF_X): + BPF_OP_ALU_REG(reg, ins, |, uint64_t); + break; + case (BPF_ALU64 | BPF_LSH | BPF_X): + BPF_OP_ALU_REG(reg, ins, <<, uint64_t); + break; + case (BPF_ALU64 | BPF_RSH | BPF_X): + BPF_OP_ALU_REG(reg, ins, >>, uint64_t); + break; + case (BPF_ALU64 | BPF_ARSH | BPF_X): + BPF_OP_ALU_REG(reg, ins, >>, int64_t); + break; + case (BPF_ALU64 | BPF_XOR | BPF_X): + BPF_OP_ALU_REG(reg, ins, ^, uint64_t); + break; + case (BPF_ALU64 | BPF_MUL | BPF_X): + BPF_OP_ALU_REG(reg, ins, *, uint64_t); + break; + case (BPF_ALU64 | BPF_DIV | BPF_X): + BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t); + BPF_OP_ALU_REG(reg, ins, /, uint64_t); + break; + case (BPF_ALU64 | BPF_MOD | BPF_X): + BPF_DIV_ZERO_CHECK(bpf, reg, ins, uint64_t); + BPF_OP_ALU_REG(reg, ins, %, uint64_t); + break; + case (BPF_ALU64 | BPF_MOV | BPF_X): + BPF_MOV_ALU_REG(reg, ins, uint64_t); + break; + case (BPF_ALU64 | BPF_NEG): + BPF_NEG_ALU(reg, ins, uint64_t); + break; + /* load instructions */ + case (BPF_LDX | BPF_MEM | BPF_B): + BPF_LD_REG(reg, ins, uint8_t); + break; + case (BPF_LDX | BPF_MEM | BPF_H): + BPF_LD_REG(reg, ins, uint16_t); + break; + case (BPF_LDX | BPF_MEM | BPF_W): + BPF_LD_REG(reg, ins, uint32_t); + break; + case 
(BPF_LDX | BPF_MEM | BPF_DW): + BPF_LD_REG(reg, ins, uint64_t); + break; + /* load 64 bit immediate value */ + case (BPF_LD | BPF_IMM | BPF_DW): + reg[ins->dst_reg] = (uint32_t)ins[0].imm | + (uint64_t)(uint32_t)ins[1].imm << 32; + ins++; + break; + /* store instructions */ + case (BPF_STX | BPF_MEM | BPF_B): + BPF_ST_REG(reg, ins, uint8_t); + break; + case (BPF_STX | BPF_MEM | BPF_H): + BPF_ST_REG(reg, ins, uint16_t); + break; + case (BPF_STX | BPF_MEM | BPF_W): + BPF_ST_REG(reg, ins, uint32_t); + break; + case (BPF_STX | BPF_MEM | BPF_DW): + BPF_ST_REG(reg, ins, uint64_t); + break; + case (BPF_ST | BPF_MEM | BPF_B): + BPF_ST_IMM(reg, ins, uint8_t); + break; + case (BPF_ST | BPF_MEM | BPF_H): + BPF_ST_IMM(reg, ins, uint16_t); + break; + case (BPF_ST | BPF_MEM | BPF_W): + BPF_ST_IMM(reg, ins, uint32_t); + break; + case (BPF_ST | BPF_MEM | BPF_DW): + BPF_ST_IMM(reg, ins, uint64_t); + break; + /* atomic add instructions */ + case (BPF_STX | BPF_XADD | BPF_W): + BPF_ST_XADD_REG(reg, ins, 32); + break; + case (BPF_STX | BPF_XADD | BPF_DW): + BPF_ST_XADD_REG(reg, ins, 64); + break; + /* jump instructions */ + case (BPF_JMP | BPF_JA): + BPF_JMP_UNC(ins); + break; + /* jump IMM instructions */ + case (BPF_JMP | BPF_JEQ | BPF_K): + BPF_JMP_CND_IMM(reg, ins, ==, uint64_t); + break; + case (BPF_JMP | BPF_JNE | BPF_K): + BPF_JMP_CND_IMM(reg, ins, !=, uint64_t); + break; + case (BPF_JMP | BPF_JGT | BPF_K): + BPF_JMP_CND_IMM(reg, ins, >, uint64_t); + break; + case (BPF_JMP | BPF_JLT | BPF_K): + BPF_JMP_CND_IMM(reg, ins, <, uint64_t); + break; + case (BPF_JMP | BPF_JGE | BPF_K): + BPF_JMP_CND_IMM(reg, ins, >=, uint64_t); + break; + case (BPF_JMP | BPF_JLE | BPF_K): + BPF_JMP_CND_IMM(reg, ins, <=, uint64_t); + break; + case (BPF_JMP | BPF_JSGT | BPF_K): + BPF_JMP_CND_IMM(reg, ins, >, int64_t); + break; + case (BPF_JMP | BPF_JSLT | BPF_K): + BPF_JMP_CND_IMM(reg, ins, <, int64_t); + break; + case (BPF_JMP | BPF_JSGE | BPF_K): + BPF_JMP_CND_IMM(reg, ins, >=, int64_t); + break; + case (BPF_JMP | BPF_JSLE | BPF_K): + BPF_JMP_CND_IMM(reg, ins, <=, int64_t); + break; + case (BPF_JMP | BPF_JSET | BPF_K): + BPF_JMP_CND_IMM(reg, ins, &, uint64_t); + break; + /* jump REG instructions */ + case (BPF_JMP | BPF_JEQ | BPF_X): + BPF_JMP_CND_REG(reg, ins, ==, uint64_t); + break; + case (BPF_JMP | BPF_JNE | BPF_X): + BPF_JMP_CND_REG(reg, ins, !=, uint64_t); + break; + case (BPF_JMP | BPF_JGT | BPF_X): + BPF_JMP_CND_REG(reg, ins, >, uint64_t); + break; + case (BPF_JMP | BPF_JLT | BPF_X): + BPF_JMP_CND_REG(reg, ins, <, uint64_t); + break; + case (BPF_JMP | BPF_JGE | BPF_X): + BPF_JMP_CND_REG(reg, ins, >=, uint64_t); + break; + case (BPF_JMP | BPF_JLE | BPF_X): + BPF_JMP_CND_REG(reg, ins, <=, uint64_t); + break; + case (BPF_JMP | BPF_JSGT | BPF_X): + BPF_JMP_CND_REG(reg, ins, >, int64_t); + break; + case (BPF_JMP | BPF_JSLT | BPF_X): + BPF_JMP_CND_REG(reg, ins, <, int64_t); + break; + case (BPF_JMP | BPF_JSGE | BPF_X): + BPF_JMP_CND_REG(reg, ins, >=, int64_t); + break; + case (BPF_JMP | BPF_JSLE | BPF_X): + BPF_JMP_CND_REG(reg, ins, <=, int64_t); + break; + case (BPF_JMP | BPF_JSET | BPF_X): + BPF_JMP_CND_REG(reg, ins, &, uint64_t); + break; + /* call instructions */ + case (BPF_JMP | BPF_CALL): + reg[BPF_REG_0] = bpf->prm.xsym[ins->imm].func( + reg[BPF_REG_1], reg[BPF_REG_2], reg[BPF_REG_3], + reg[BPF_REG_4], reg[BPF_REG_5]); + break; + /* return instruction */ + case (BPF_JMP | BPF_EXIT): + return reg[BPF_REG_0]; + default: + RTE_BPF_LOG(ERR, + "%s(%p): invalid opcode %#x at pc: %#zx;\n", + __func__, bpf, ins->code, + 
(uintptr_t)ins - (uintptr_t)bpf->prm.ins); + return 0; + } + } + + /* should never be reached */ + RTE_VERIFY(0); + return 0; +} + +__rte_experimental uint32_t +rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[], uint64_t rc[], + uint32_t num) +{ + uint32_t i; + uint64_t reg[MAX_BPF_REG]; + uint64_t stack[MAX_BPF_STACK_SIZE / sizeof(uint64_t)]; + + for (i = 0; i != num; i++) { + + reg[BPF_REG_1] = (uintptr_t)ctx[i]; + reg[BPF_REG_10] = (uintptr_t)(stack + RTE_DIM(stack)); + + rc[i] = bpf_exec(bpf, reg); + } + + return i; +} + +__rte_experimental uint64_t +rte_bpf_exec(const struct rte_bpf *bpf, void *ctx) +{ + uint64_t rc; + + rte_bpf_exec_burst(bpf, &ctx, &rc, 1); + return rc; +} diff --git a/lib/librte_bpf/bpf_impl.h b/lib/librte_bpf/bpf_impl.h new file mode 100644 index 000000000..5d7e65c31 --- /dev/null +++ b/lib/librte_bpf/bpf_impl.h @@ -0,0 +1,41 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#ifndef _BPF_H_ +#define _BPF_H_ + +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +#define MAX_BPF_STACK_SIZE 0x200 + +struct rte_bpf { + struct rte_bpf_prm prm; + struct rte_bpf_jit jit; + size_t sz; + uint32_t stack_sz; +}; + +extern int bpf_validate(struct rte_bpf *bpf); + +extern int bpf_jit(struct rte_bpf *bpf); + +#ifdef RTE_ARCH_X86_64 +extern int bpf_jit_x86(struct rte_bpf *); +#endif + +extern int rte_bpf_logtype; + +#define RTE_BPF_LOG(lvl, fmt, args...) \ + rte_log(RTE_LOG_## lvl, rte_bpf_logtype, fmt, ##args) + +#ifdef __cplusplus +} +#endif + +#endif /* _BPF_H_ */ diff --git a/lib/librte_bpf/bpf_load.c b/lib/librte_bpf/bpf_load.c new file mode 100644 index 000000000..e1ff5714a --- /dev/null +++ b/lib/librte_bpf/bpf_load.c @@ -0,0 +1,385 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include + +#include + +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_impl.h" + +/* To overcome compatibility issue */ +#ifndef EM_BPF +#define EM_BPF 247 +#endif + +static uint32_t +bpf_find_xsym(const char *sn, enum rte_bpf_xtype type, + const struct rte_bpf_xsym fp[], uint32_t fn) +{ + uint32_t i; + + if (sn == NULL || fp == NULL) + return UINT32_MAX; + + for (i = 0; i != fn; i++) { + if (fp[i].type == type && strcmp(sn, fp[i].name) == 0) + break; + } + + return (i != fn) ? 
i : UINT32_MAX; +} + +/* + * update BPF code at offset *ofs* with a proper address(index) for external + * symbol *sn* + */ +static int +resolve_xsym(const char *sn, size_t ofs, struct bpf_insn *ins, size_t ins_sz, + const struct rte_bpf_prm *prm) +{ + uint32_t idx, fidx; + enum rte_bpf_xtype type; + + if (ofs % sizeof(ins[0]) != 0 || ofs >= ins_sz) + return -EINVAL; + + idx = ofs / sizeof(ins[0]); + if (ins[idx].code == (BPF_JMP | BPF_CALL)) + type = RTE_BPF_XTYPE_FUNC; + else if (ins[idx].code == (BPF_LD | BPF_IMM | BPF_DW) && + ofs < ins_sz - sizeof(ins[idx])) + type = RTE_BPF_XTYPE_VAR; + else + return -EINVAL; + + fidx = bpf_find_xsym(sn, type, prm->xsym, prm->nb_xsym); + if (fidx == UINT32_MAX) + return -ENOENT; + + /* for function we just need an index in our xsym table */ + if (type == RTE_BPF_XTYPE_FUNC) + ins[idx].imm = fidx; + /* for variable we need to store its absolute address */ + else { + ins[idx].imm = (uintptr_t)prm->xsym[fidx].var; + ins[idx + 1].imm = (uintptr_t)prm->xsym[fidx].var >> 32; + } + + return 0; +} + +static int +check_elf_header(const Elf64_Ehdr * eh) +{ + const char *err; + + err = NULL; + +#if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN + if (eh->e_ident[EI_DATA] != ELFDATA2LSB) +#else + if (eh->e_ident[EI_DATA] != ELFDATA2MSB) +#endif + err = "not native byte order"; + else if (eh->e_ident[EI_OSABI] != ELFOSABI_NONE) + err = "unexpected OS ABI"; + else if (eh->e_type != ET_REL) + err = "unexpected ELF type"; + else if (eh->e_machine != EM_NONE && eh->e_machine != EM_BPF) + err = "unexpected machine type"; + + if (err != NULL) { + RTE_BPF_LOG(ERR, "%s(): %s\n", __func__, err); + return -EINVAL; + } + + return 0; +} + +/* + * helper function, find executable section by name. + */ +static int +find_elf_code(Elf *elf, const char *section, Elf_Data **psd, size_t *pidx) +{ + Elf_Scn *sc; + const Elf64_Ehdr *eh; + const Elf64_Shdr *sh; + Elf_Data *sd; + const char *sn; + int32_t rc; + + eh = elf64_getehdr(elf); + if (eh == NULL) { + rc = elf_errno(); + RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n", + __func__, elf, section, rc, elf_errmsg(rc)); + return -EINVAL; + } + + if (check_elf_header(eh) != 0) + return -EINVAL; + + /* find given section by name */ + for (sc = elf_nextscn(elf, NULL); sc != NULL; + sc = elf_nextscn(elf, sc)) { + sh = elf64_getshdr(sc); + sn = elf_strptr(elf, eh->e_shstrndx, sh->sh_name); + if (sn != NULL && strcmp(section, sn) == 0 && + sh->sh_type == SHT_PROGBITS && + sh->sh_flags == (SHF_ALLOC | SHF_EXECINSTR)) + break; + } + + sd = elf_getdata(sc, NULL); + if (sd == NULL || sd->d_size == 0 || + sd->d_size % sizeof(struct bpf_insn) != 0) { + rc = elf_errno(); + RTE_BPF_LOG(ERR, "%s(%p, %s) error code: %d(%s)\n", + __func__, elf, section, rc, elf_errmsg(rc)); + return -EINVAL; + } + + *psd = sd; + *pidx = elf_ndxscn(sc); + return 0; +} + +/* + * helper function to process data from relocation table. 
+ */ +static int +process_reloc(Elf *elf, size_t sym_idx, Elf64_Rel *re, size_t re_sz, + struct bpf_insn *ins, size_t ins_sz, const struct rte_bpf_prm *prm) +{ + int32_t rc; + uint32_t i, n; + size_t ofs, sym; + const char *sn; + const Elf64_Ehdr *eh; + Elf_Scn *sc; + const Elf_Data *sd; + Elf64_Sym *sm; + + eh = elf64_getehdr(elf); + + /* get symtable by section index */ + sc = elf_getscn(elf, sym_idx); + sd = elf_getdata(sc, NULL); + if (sd == NULL) + return -EINVAL; + sm = sd->d_buf; + + n = re_sz / sizeof(re[0]); + for (i = 0; i != n; i++) { + + ofs = re[i].r_offset; + + /* retrieve index in the symtable */ + sym = ELF64_R_SYM(re[i].r_info); + if (sym * sizeof(sm[0]) >= sd->d_size) + return -EINVAL; + + sn = elf_strptr(elf, eh->e_shstrndx, sm[sym].st_name); + + rc = resolve_xsym(sn, ofs, ins, ins_sz, prm); + if (rc != 0) { + RTE_BPF_LOG(ERR, + "resolve_xsym(%s, %zu) error code: %d\n", + sn, ofs, rc); + return rc; + } + } + + return 0; +} + +/* + * helper function, find relocation information (if any) + * and update bpf code. + */ +static int +elf_reloc_code(Elf *elf, Elf_Data *ed, size_t sidx, + const struct rte_bpf_prm *prm) +{ + Elf64_Rel *re; + Elf_Scn *sc; + const Elf64_Shdr *sh; + const Elf_Data *sd; + int32_t rc; + + rc = 0; + + /* walk through all sections */ + for (sc = elf_nextscn(elf, NULL); sc != NULL && rc == 0; + sc = elf_nextscn(elf, sc)) { + + sh = elf64_getshdr(sc); + + /* relocation data for our code section */ + if (sh->sh_type == SHT_REL && sh->sh_info == sidx) { + sd = elf_getdata(sc, NULL); + if (sd == NULL || sd->d_size == 0 || + sd->d_size % sizeof(re[0]) != 0) + return -EINVAL; + rc = process_reloc(elf, sh->sh_link, + sd->d_buf, sd->d_size, ed->d_buf, ed->d_size, + prm); + } + } + + return rc; +} + +static struct rte_bpf * +bpf_load(const struct rte_bpf_prm *prm) +{ + uint8_t *buf; + struct rte_bpf *bpf; + size_t sz, bsz, insz, xsz; + + xsz = prm->nb_xsym * sizeof(prm->xsym[0]); + insz = prm->nb_ins * sizeof(prm->ins[0]); + bsz = sizeof(bpf[0]); + sz = insz + xsz + bsz; + + buf = mmap(NULL, sz, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); + if (buf == MAP_FAILED) + return NULL; + + bpf = (void *)buf; + bpf->sz = sz; + + memcpy(&bpf->prm, prm, sizeof(bpf->prm)); + + memcpy(buf + bsz, prm->xsym, xsz); + memcpy(buf + bsz + xsz, prm->ins, insz); + + bpf->prm.xsym = (void *)(buf + bsz); + bpf->prm.ins = (void *)(buf + bsz + xsz); + + return bpf; +} + +__rte_experimental struct rte_bpf * +rte_bpf_load(const struct rte_bpf_prm *prm) +{ + struct rte_bpf *bpf; + int32_t rc; + + if (prm == NULL || prm->ins == NULL) { + rte_errno = EINVAL; + return NULL; + } + + bpf = bpf_load(prm); + if (bpf == NULL) { + rte_errno = ENOMEM; + return NULL; + } + + rc = bpf_validate(bpf); + if (rc == 0) { + bpf_jit(bpf); + if (mprotect(bpf, bpf->sz, PROT_READ) != 0) + rc = -ENOMEM; + } + + if (rc != 0) { + rte_bpf_destroy(bpf); + rte_errno = -rc; + return NULL; + } + + return bpf; +} + +static struct rte_bpf * +bpf_load_elf(const struct rte_bpf_prm *prm, int32_t fd, const char *section) +{ + Elf *elf; + Elf_Data *sd; + size_t sidx; + int32_t rc; + struct rte_bpf *bpf; + struct rte_bpf_prm np; + + elf_version(EV_CURRENT); + elf = elf_begin(fd, ELF_C_READ, NULL); + + rc = find_elf_code(elf, section, &sd, &sidx); + if (rc == 0) + rc = elf_reloc_code(elf, sd, sidx, prm); + + if (rc == 0) { + np = prm[0]; + np.ins = sd->d_buf; + np.nb_ins = sd->d_size / sizeof(struct bpf_insn); + bpf = rte_bpf_load(&np); + } else { + bpf = NULL; + rte_errno = -rc; + } + + elf_end(elf); + return 
bpf; +} + +__rte_experimental struct rte_bpf * +rte_bpf_elf_load(const struct rte_bpf_prm *prm, const char *fname, + const char *sname) +{ + int32_t fd, rc; + struct rte_bpf *bpf; + + if (prm == NULL || fname == NULL || sname == NULL) { + rte_errno = EINVAL; + return NULL; + } + + fd = open(fname, O_RDONLY); + if (fd < 0) { + rc = errno; + RTE_BPF_LOG(ERR, "%s(%s) error code: %d(%s)\n", + __func__, fname, rc, strerror(rc)); + rte_errno = EINVAL; + return NULL; + } + + bpf = bpf_load_elf(prm, fd, sname); + close(fd); + + if (bpf == NULL) { + RTE_BPF_LOG(ERR, + "%s(fname=\"%s\", sname=\"%s\") failed, " + "error code: %d\n", + __func__, fname, sname, rte_errno); + return NULL; + } + + RTE_BPF_LOG(INFO, "%s(fname=\"%s\", sname=\"%s\") " + "successfully creates %p(jit={.func=%p,.sz=%zu});\n", + __func__, fname, sname, bpf, bpf->jit.func, bpf->jit.sz); + return bpf; +} diff --git a/lib/librte_bpf/bpf_validate.c b/lib/librte_bpf/bpf_validate.c new file mode 100644 index 000000000..1911e1381 --- /dev/null +++ b/lib/librte_bpf/bpf_validate.c @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#include +#include +#include +#include +#include +#include + +#include +#include + +#include "bpf_impl.h" + +/* + * dummy one for now, need more work. + */ +int +bpf_validate(struct rte_bpf *bpf) +{ + int32_t rc, ofs, stack_sz; + uint32_t i, op, dr; + const struct bpf_insn *ins; + + rc = 0; + stack_sz = 0; + for (i = 0; i != bpf->prm.nb_ins; i++) { + + ins = bpf->prm.ins + i; + op = ins->code; + dr = ins->dst_reg; + ofs = ins->off; + + if ((BPF_CLASS(op) == BPF_STX || BPF_CLASS(op) == BPF_ST) && + dr == BPF_REG_10) { + ofs -= sizeof(uint64_t); + stack_sz = RTE_MIN(ofs, stack_sz); + } + } + + if (stack_sz != 0) { + stack_sz = -stack_sz; + if (stack_sz > MAX_BPF_STACK_SIZE) + rc = -ERANGE; + else + bpf->stack_sz = stack_sz; + } + + if (rc != 0) + RTE_BPF_LOG(ERR, "%s(%p) failed, error code: %d;\n", + __func__, bpf, rc); + return rc; +} diff --git a/lib/librte_bpf/meson.build b/lib/librte_bpf/meson.build new file mode 100644 index 000000000..05c48c7ff --- /dev/null +++ b/lib/librte_bpf/meson.build @@ -0,0 +1,18 @@ +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2018 Intel Corporation + +allow_experimental_apis = true +sources = files('bpf.c', + 'bpf_exec.c', + 'bpf_load.c', + 'bpf_validate.c') + +install_headers = files('rte_bpf.h') + +deps += ['mbuf', 'net'] + +dep = dependency('libelf', required: false) +if dep.found() == false + build = false +endif +ext_deps += dep diff --git a/lib/librte_bpf/rte_bpf.h b/lib/librte_bpf/rte_bpf.h new file mode 100644 index 000000000..4d4b93599 --- /dev/null +++ b/lib/librte_bpf/rte_bpf.h @@ -0,0 +1,160 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2018 Intel Corporation + */ + +#ifndef _RTE_BPF_H_ +#define _RTE_BPF_H_ + +#include +#include +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Possible types for external symbols. + */ +enum rte_bpf_xtype { + RTE_BPF_XTYPE_FUNC, /**< function */ + RTE_BPF_XTYPE_VAR, /**< variable */ + RTE_BPF_XTYPE_NUM +}; + +/** + * Definition for external symbols available in the BPF program. + */ +struct rte_bpf_xsym { + const char *name; /**< name */ + enum rte_bpf_xtype type; /**< type */ + union { + uint64_t (*func)(uint64_t, uint64_t, uint64_t, + uint64_t, uint64_t); + void *var; + }; /**< value */ +}; + +/** + * Possible BPF program types. 
* Use negative values for DPDK-specific prog-types, to make sure they will
+ * not interfere with Linux related ones.
+ */
+enum rte_bpf_prog_type {
+	RTE_BPF_PROG_TYPE_UNSPEC = BPF_PROG_TYPE_UNSPEC,
+	/**< input is a pointer to raw data */
+	RTE_BPF_PROG_TYPE_MBUF = INT32_MIN,
+	/**< input is a pointer to rte_mbuf */
+};
+
+/**
+ * Input parameters for loading eBPF code.
+ */
+struct rte_bpf_prm {
+	const struct bpf_insn *ins; /**< array of eBPF instructions */
+	uint32_t nb_ins;            /**< number of instructions in ins */
+	const struct rte_bpf_xsym *xsym;
+	/**< array of external symbols that eBPF code is allowed to reference */
+	uint32_t nb_xsym;           /**< number of elements in xsym */
+	enum rte_bpf_prog_type prog_type; /**< eBPF program type */
+};
+
+/**
+ * Information about eBPF code compiled into the native ISA.
+ */
+struct rte_bpf_jit {
+	uint64_t (*func)(void *);
+	size_t sz;
+};
+
+struct rte_bpf;
+
+/**
+ * De-allocate all memory used by this eBPF execution context.
+ *
+ * @param bpf
+ *   BPF handle to destroy.
+ */
+void rte_bpf_destroy(struct rte_bpf *bpf);
+
+/**
+ * Create a new eBPF execution context and load given BPF code into it.
+ *
+ * @param prm
+ *   Parameters used to create and initialise the BPF execution context.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_load(const struct rte_bpf_prm *prm);
+
+/**
+ * Create a new eBPF execution context and load BPF code from given ELF
+ * file into it.
+ *
+ * @param prm
+ *   Parameters used to create and initialise the BPF execution context.
+ * @param fname
+ *   Pathname of an ELF file.
+ * @param sname
+ *   Name of the executable section within the file to load.
+ * @return
+ *   BPF handle that is used in future BPF operations,
+ *   or NULL on error, with error code set in rte_errno.
+ *   Possible rte_errno errors include:
+ *   - EINVAL - invalid parameter passed to function
+ *   - ENOMEM - can't reserve enough memory
+ */
+struct rte_bpf *rte_bpf_elf_load(const struct rte_bpf_prm *prm,
+	const char *fname, const char *sname);
+
+/**
+ * Execute given BPF bytecode.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   pointer to input context.
+ * @return
+ *   BPF execution return value.
+ */
+uint64_t rte_bpf_exec(const struct rte_bpf *bpf, void *ctx);
+
+/**
+ * Execute given BPF bytecode over a set of input contexts.
+ *
+ * @param bpf
+ *   handle for the BPF code to execute.
+ * @param ctx
+ *   array of pointers to the input contexts.
+ * @param rc
+ *   array of return values (one per input).
+ * @param num
+ *   number of elements in ctx[] (and rc[]).
+ * @return
+ *   number of successfully processed inputs.
+ */
+uint32_t rte_bpf_exec_burst(const struct rte_bpf *bpf, void *ctx[],
+	uint64_t rc[], uint32_t num);
+
+/**
+ * Provide information about natively compiled code for given BPF handle.
+ *
+ * @param bpf
+ *   handle for the BPF code.
+ * @param jit
+ *   pointer to the rte_bpf_jit structure to be filled with related data.
+ * @return
+ *   - -EINVAL if the parameters are invalid.
+ *   - Zero if operation completed successfully.
+ */
+int rte_bpf_get_jit(const struct rte_bpf *bpf, struct rte_bpf_jit *jit);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_BPF_H_ */
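As a quick reference for the API above, a sketch that pushes a hand-written
two-instruction program (r0 = 123; exit) through rte_bpf_load() and
rte_bpf_exec(); it assumes struct bpf_insn and the BPF_* opcode macros come
from linux/bpf.h, as used elsewhere in this patch:

	#include <errno.h>
	#include <linux/bpf.h>
	#include <rte_common.h>
	#include <rte_errno.h>
	#include <rte_bpf.h>

	/* mov64 r0, 123; exit - no memory access, no external symbols */
	static const struct bpf_insn prog[] = {
		{ .code = (BPF_ALU64 | BPF_MOV | BPF_K),
			.dst_reg = BPF_REG_0, .imm = 123, },
		{ .code = (BPF_JMP | BPF_EXIT), },
	};

	static int
	bpf_self_test(void)
	{
		uint64_t rc;
		struct rte_bpf *bpf;
		const struct rte_bpf_prm prm = {
			.ins = prog,
			.nb_ins = RTE_DIM(prog),
			.prog_type = RTE_BPF_PROG_TYPE_UNSPEC,
		};

		bpf = rte_bpf_load(&prm);
		if (bpf == NULL)
			return -rte_errno;

		rc = rte_bpf_exec(bpf, NULL); /* this program ignores ctx */
		rte_bpf_destroy(bpf);
		return (rc == 123) ? 0 : -EINVAL;
	}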
diff --git a/lib/librte_bpf/rte_bpf_version.map b/lib/librte_bpf/rte_bpf_version.map
new file mode 100644
index 000000000..ff65144df
--- /dev/null
+++ b/lib/librte_bpf/rte_bpf_version.map
@@ -0,0 +1,12 @@
+EXPERIMENTAL {
+	global:
+
+	rte_bpf_destroy;
+	rte_bpf_elf_load;
+	rte_bpf_exec;
+	rte_bpf_exec_burst;
+	rte_bpf_get_jit;
+	rte_bpf_load;
+
+	local: *;
+};
diff --git a/lib/meson.build b/lib/meson.build
index ef6159170..7ff7aaaa5 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -23,7 +23,7 @@ libraries = [ 'compat', # just a header, used for versioning
 	# add pkt framework libs which use other libs from above
 	'port', 'table', 'pipeline',
 	# flow_classify lib depends on pkt framework table lib
-	'flow_classify']
+	'flow_classify', 'bpf']
 
 foreach l:libraries
 	build = true
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 94525dc80..07a9bcfe2 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -83,6 +83,8 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_POWER)          += -lrte_power
 _LDLIBS-$(CONFIG_RTE_LIBRTE_TIMER)          += -lrte_timer
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EFD)            += -lrte_efd
+_LDLIBS-$(CONFIG_RTE_LIBRTE_BPF)            += -lrte_bpf -lelf
+
 _LDLIBS-y += --whole-archive
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
-- 
2.13.6