BPF standardization

DPDK patches and discussions
 help / color / mirror / Atom feed

* BPF standardization
@ 2024-05-30 23:23 Stephen Hemminger
  2024-07-19  9:12 ` Konstantin Ananyev
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2024-05-30 23:23 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

It would be good to make sure that DPDK BPF conforms to IETF draft.
https://datatracker.ietf.org/doc/draft-ietf-bpf-isa/

Based on LWN article on presentation at Linux Storage, Filesystem,
Memory Mangerment, and BPF Summit.

https://lwn.net/SubscriberLink/975830/3b32df6be23d3abf/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: BPF standardization
  2024-05-30 23:23 BPF standardization Stephen Hemminger
@ 2024-07-19  9:12 ` Konstantin Ananyev
  2024-07-19 17:13   ` Stephen Hemminger
  0 siblings, 1 reply; 4+ messages in thread
From: Konstantin Ananyev @ 2024-07-19  9:12 UTC (permalink / raw)
  To: Stephen Hemminger, Konstantin Ananyev; +Cc: dev

> It would be good to make sure that DPDK BPF conforms to IETF draft.
> https://datatracker.ietf.org/doc/draft-ietf-bpf-isa/
> 
> Based on LWN article on presentation at Linux Storage, Filesystem,
> Memory Mangerment, and BPF Summit.
> 
> https://lwn.net/SubscriberLink/975830/3b32df6be23d3abf/

Yes, it would be really good...
Another interesting option that was raised few times byt different people,
would be opportunity to re-use extrernal eBPF verifiers with DPDK eBPF progs:
either one from linux kernel or user-space (PREVAIL: https://github.com/vbpf/ebpf-verifier/tree/main),
or even both.
One of the main obstacle with that: both linux kernel and PREVAIL
assume input context for eBPF prog in particular format 
(usually struct __sk_buff or struct xdp_md).
In fact, PREVAIL is more flexible here, and allows to specify your own format,
but it still expects some main things (data, data_end) to be present and located
in the same way as linux kernel.
After another thought, might be a simple way to overcome it would be  
to mimic what linux kernel does with 'direct' packet access:
At verify stage it rewrites given BPF prog to
convert load instructions that access fields of a context type into a
sequence of instructions that access fields of the underlying structure: 
struct __sk_buff    -> struct sk_buff
struct bpf_sock_ops -> struct sock
etc. (for more details see convert_ctx_accesses() in linux/kernel/bpf/verifier.c).
Inside DPDK verifier/loader we can probably do the same:
convert direct access of __sk_buff and/or xdp_md fields into rte_mbuf fields.
I.E.:
(__sk_buff->data) -> (mbuf->buf_addr + mbuf->data_off)
(__sk_buff->data_end) -> (mbuf->buf_addr + mbuf->data_off + mbuf->data_len)
and so on. 

BTW, right now, eBPF programs produced by DPDK cBPF->eBPF converter
can be successfully verified by linux kernel.
Things are easy here, as cBPF converter doesn't try to access packet contents directly
(but only through special instructions: BPF_LD_ABS, BPF_LD_IND).  
Just small fix is required in rte_bpf_convert() to achieve that, see below.

In theory, that would not also give us ability to re-use external verifiers,
but also should make possible to execute subset of eBPF progs written for linux kernel
within DPDK app.
Of-course, not all of them, as right now linux eBPF has much richer functionality
that we missing (MAPs, tail calls, etc.), but that's another story.

We plan to do some work for eBPF+DPDK within next several months, so might
be able to look at it too... though not hard promises here.
Meanwhile interested in comments/thoughts/volunteers :)

Thanks
Konstantin

======================

 [PATCH 1/2] bpf: fix converter emitted code fails with linux verifier

bpf_convert_filter() uses standard approach with XOR-ing itself:
xor r0, r0, r0
to reset some register values.
Unfortunately linux verifier seems way too strict here and
doesn't allow access to register with undefined value.
It generates error log like that for this op:
Failed to verify program: Permission denied (13)
LOG: func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (af) r0 ^= r0
R0 !read_ok
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

To overcome that, simply replace XOR with itself to explicit
mov32 r0, #0x0

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/bpf/bpf_convert.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index d7ff2b4325..eceaa19c76 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -267,8 +267,11 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 		/* Classic BPF expects A and X to be reset first. These need
 		 * to be guaranteed to be the first two instructions.
 		 */
-		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
-		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		*new_insn++ = BPF_MOV32_IMM(BPF_REG_A, 0);
+		*new_insn++ = BPF_MOV32_IMM(BPF_REG_X, 0);

 		/* All programs must keep CTX in callee saved BPF_REG_CTX.
 		 * In eBPF case it's done by the compiler, here we need to
-- 
2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: BPF standardization
  2024-07-19  9:12 ` Konstantin Ananyev
@ 2024-07-19 17:13   ` Stephen Hemminger
  2024-07-19 18:29     ` Konstantin Ananyev
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Hemminger @ 2024-07-19 17:13 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: Konstantin Ananyev, dev

On Fri, 19 Jul 2024 09:12:58 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:

> bpf_convert_filter() uses standard approach with XOR-ing itself:
> xor r0, r0, r0
> to reset some register values.
> Unfortunately linux verifier seems way too strict here and
> doesn't allow access to register with undefined value.
> It generates error log like that for this op:
> Failed to verify program: Permission denied (13)
> LOG: func#0 @0
> 0: R1=ctx(id=0,off=0,imm=0) R10=fp0
> 0: (af) r0 ^= r0
> R0 !read_ok
> processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
> 
> To overcome that, simply replace XOR with itself to explicit
> mov32 r0, #0x0
> 
> Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
> ---
>  lib/bpf/bpf_convert.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
> index d7ff2b4325..eceaa19c76 100644
> --- a/lib/bpf/bpf_convert.c
> +++ b/lib/bpf/bpf_convert.c
> @@ -267,8 +267,11 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
>  		/* Classic BPF expects A and X to be reset first. These need
>  		 * to be guaranteed to be the first two instructions.
>  		 */
> -		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
> -		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
> +		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
> +		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
> +
> +		*new_insn++ = BPF_MOV32_IMM(BPF_REG_A, 0);
> +		*new_insn++ = BPF_MOV32_IMM(BPF_REG_X, 0);
>  
>  		/* All programs must keep CTX in callee saved BPF_REG_CTX.
>  		 * In eBPF case it's done by the compiler, here we need to

This was taken from how kernel converts cBPF and the prologue it generates.
Surprising that verifier gags?

see net/core/filter.c
	/* Classic BPF related prologue emission. */
	if (new_prog) {
		/* Classic BPF expects A and X to be reset first. These need
		 * to be guaranteed to be the first two instructions.
		 */
		*new_insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
		*new_insn++ = BPF_ALU32_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);

		/* All programs must keep CTX in callee saved BPF_REG_CTX.
		 * In eBPF case it's done by the compiler, here we need to
		 * do this ourself. Initial CTX is present in BPF_REG_ARG1.
		 */
		*new_insn++ = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1);
		if (*seen_ld_abs) {
			/* For packet access in classic BPF, cache skb->data
			 * in callee-saved BPF R8 and skb->len - skb->data_len
			 * (headlen) in BPF R9. Since classic BPF is read-only
			 * on CTX, we only need to cache it once.
			 */
			*new_insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, data),
						  BPF_REG_D, BPF_REG_CTX,
						  offsetof(struct sk_buff, data));
			*new_insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_H, BPF_REG_CTX,
						  offsetof(struct sk_buff, len));
			*new_insn++ = BPF_LDX_MEM(BPF_W, BPF_REG_TMP, BPF_REG_CTX,
						  offsetof(struct sk_buff, data_len));
			*new_insn++ = BPF_ALU32_REG(BPF_SUB, BPF_REG_H, BPF_REG_TMP);
		}



^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: BPF standardization
  2024-07-19 17:13   ` Stephen Hemminger
@ 2024-07-19 18:29     ` Konstantin Ananyev
  0 siblings, 0 replies; 4+ messages in thread
From: Konstantin Ananyev @ 2024-07-19 18:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Konstantin Ananyev, dev


> This was taken from how kernel converts cBPF and the prologue it generates.
> Surprising that verifier gags?

Indeed it looks strange...
Have to admit I didn't dig into how linux converter works...
Below is the patch I used to test it from DPDK UT - might be you'll spot something obvious here.  

================

Subject: [PATCH 2/2] test/bpf: call linux verifier for converted cBPF filter

To ensure DPDK/kernel compatibility for converted cBPF filters, etc.

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 app/test/meson.build             |  1 +
 app/test/test_bpf.c              | 11 +++++++++
 app/test/test_bpf.h              | 12 ++++++++++
 app/test/test_bpf_verify_linux.c | 39 ++++++++++++++++++++++++++++++++
 4 files changed, 63 insertions(+)
 create mode 100644 app/test/test_bpf.h
 create mode 100644 app/test/test_bpf_verify_linux.c

diff --git a/app/test/meson.build b/app/test/meson.build
index e29258e6ec..d46e688db5 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -35,6 +35,7 @@ source_file_deps = {
     'test_bitops.c': [],
     'test_bitratestats.c': ['metrics', 'bitratestats', 'ethdev'] + sample_packet_forward_deps,
     'test_bpf.c': ['bpf', 'net'],
+    'test_bpf_verify_linux.c': [],
     'test_byteorder.c': [],
 #    'test_cfgfile.c': ['cfgfile'],
     'test_cksum.c': ['net'],
diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index 7819d6aba9..749f9de20c 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -15,6 +15,7 @@
 #include <rte_byteorder.h>
 #include <rte_errno.h>
 #include "test.h"
+#include "test_bpf.h"
 
 #if !defined(RTE_LIB_BPF)
 
@@ -3431,10 +3432,13 @@ static const char * const sample_filters[] = {
 static int
 test_bpf_filter(pcap_t *pcap, const char *s)
 {
+	int32_t rc;
 	struct bpf_program fcode;
 	struct rte_bpf_prm *prm = NULL;
 	struct rte_bpf *bpf = NULL;
 
+	static char logbuf[UINT16_MAX + 1];
+
 	if (pcap_compile(pcap, &fcode, s, 1, PCAP_NETMASK_UNKNOWN)) {
 		printf("%s@%d: pcap_compile('%s') failed: %s;\n",
 		       __func__, __LINE__, s, pcap_geterr(pcap));
@@ -3451,6 +3455,13 @@ test_bpf_filter(pcap_t *pcap, const char *s)
 	printf("bpf convert for \"%s\" produced:\n", s);
 	rte_bpf_dump(stdout, prm->ins, prm->nb_ins);
 
+	rc = bpf_linux_verify(prm, logbuf, sizeof(logbuf));
+	printf("%s@%d: linux verifier reports: %d(%s);\n",
+		__func__, __LINE__, rc, strerror(-rc));
+	printf("linux verifier log: %s\n", logbuf);
+	if (rc != 0)
+		goto error;
+
 	bpf = rte_bpf_load(prm);
 	if (bpf == NULL) {
 		printf("%s@%d: failed to load bpf code, error=%d(%s);\n",
diff --git a/app/test/test_bpf.h b/app/test/test_bpf.h
new file mode 100644
index 0000000000..e5be2be920
--- /dev/null
+++ b/app/test/test_bpf.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#ifndef _TEST_BPF_H_
+#define _TEST_BPF_H_
+
+#include <rte_bpf.h>
+
+int bpf_linux_verify(const struct rte_bpf_prm *prm, char *logbuf,
+	uint32_t lbsz);
+
+#endif
diff --git a/app/test/test_bpf_verify_linux.c b/app/test/test_bpf_verify_linux.c
new file mode 100644
index 0000000000..2a42d61fb5
--- /dev/null
+++ b/app/test/test_bpf_verify_linux.c
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+
+#include "test_bpf.h"
+
+int
+bpf_linux_verify(const struct rte_bpf_prm *prm, char *logbuf, uint32_t lbsz)
+{
+	int32_t rc;
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+	attr.insn_cnt = prm->nb_ins;
+	attr.insns = (uintptr_t)prm->ins;
+	attr.license = (uintptr_t)"GPL";
+	attr.log_buf = (uintptr_t)logbuf;
+	attr.log_size = lbsz;
+	attr.log_level = (logbuf == NULL) ? 0 : 3;
+
+	rc = syscall(SYS_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	if (rc < 0)
+		rc = -errno;
+	else {
+		/* closed fd for BPF prog */
+		close(rc);
+		rc = 0;
+	}
+
+	return rc;
+}
-- 
2.35.3

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-07-19 18:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-30 23:23 BPF standardization Stephen Hemminger
2024-07-19  9:12 ` Konstantin Ananyev
2024-07-19 17:13   ` Stephen Hemminger
2024-07-19 18:29     ` Konstantin Ananyev

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).