RE: BPF standardization - Konstantin Ananyev

From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: Stephen Hemminger <stephen@networkplumber.org>,
	Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: RE: BPF standardization
Date: Fri, 19 Jul 2024 09:12:58 +0000
Message-ID: <956072b13b7b45fe8ce676989580f3e0@huawei.com>
In-Reply-To: <20240530162326.02218863@hermes.local>

> It would be good to make sure that DPDK BPF conforms to IETF draft.
> https://datatracker.ietf.org/doc/draft-ietf-bpf-isa/
> 
> Based on LWN article on presentation at Linux Storage, Filesystem,
> Memory Management, and BPF Summit.
> 
> https://lwn.net/SubscriberLink/975830/3b32df6be23d3abf/

Yes, it would be really good...
Another interesting option, raised a few times by different people,
would be the opportunity to re-use external eBPF verifiers with DPDK eBPF progs:
either the one from the linux kernel or a user-space one (PREVAIL: https://github.com/vbpf/ebpf-verifier/tree/main),
or even both.
One of the main obstacles with that: both the linux kernel and PREVAIL
assume the input context for an eBPF prog is in a particular format
(usually struct __sk_buff or struct xdp_md).
In fact, PREVAIL is more flexible here and allows you to specify your own format,
but it still expects some main things (data, data_end) to be present and located
in the same way as in the linux kernel.
On second thought, a simple way to overcome it might be
to mimic what the linux kernel does with 'direct' packet access:
at the verify stage it rewrites the given BPF prog to
convert load instructions that access fields of a context type into a
sequence of instructions that access fields of the underlying structure:
struct __sk_buff    -> struct sk_buff
struct bpf_sock_ops -> struct sock
etc. (for more details see convert_ctx_accesses() in linux/kernel/bpf/verifier.c).
Inside the DPDK verifier/loader we can probably do the same:
convert direct accesses to __sk_buff and/or xdp_md fields into accesses to rte_mbuf fields.
I.e.:
(__sk_buff->data) -> (mbuf->buf_addr + mbuf->data_off)
(__sk_buff->data_end) -> (mbuf->buf_addr + mbuf->data_off + mbuf->data_len)
and so on. 
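
To illustrate the mapping above, here is a minimal sketch in plain C.
struct mock_mbuf is a hypothetical stand-in that carries only the three
rte_mbuf fields named above, and the helper names are made up for
illustration (they are not part of any DPDK or kernel API). A real loader
would emit equivalent eBPF instructions at load time rather than call C helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: only the rte_mbuf fields used by the mapping. */
struct mock_mbuf {
	void *buf_addr;     /* start of the buffer */
	uint16_t data_off;  /* offset of packet data inside the buffer */
	uint16_t data_len;  /* bytes of packet data in this segment */
};

/* (__sk_buff->data) -> (mbuf->buf_addr + mbuf->data_off) */
static const uint8_t *ctx_data(const struct mock_mbuf *m)
{
	assert(m->buf_addr != NULL);
	return (const uint8_t *)m->buf_addr + m->data_off;
}

/* (__sk_buff->data_end) -> (mbuf->buf_addr + mbuf->data_off + mbuf->data_len) */
static const uint8_t *ctx_data_end(const struct mock_mbuf *m)
{
	return ctx_data(m) + m->data_len;
}

/*
 * Direct packet access in the style the external verifiers expect:
 * the load is only performed after a bounds check against data_end.
 * Returns 0 on success, -1 if the byte lies outside the packet data.
 */
static int ctx_load_u8(const struct mock_mbuf *m, uint16_t off, uint8_t *out)
{
	const uint8_t *data = ctx_data(m);
	const uint8_t *end = ctx_data_end(m);

	if ((size_t)(end - data) <= off)
		return -1;
	*out = data[off];
	return 0;
}
```

Note that this covers only the first segment of an mbuf; how multi-segment
packets should look through such a context is left open here.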

BTW, right now, eBPF programs produced by the DPDK cBPF->eBPF converter
can be successfully verified by the linux kernel.
Things are easy here, as the cBPF converter doesn't try to access packet contents directly
(only through the special instructions BPF_LD_ABS and BPF_LD_IND).
Just a small fix is required in rte_bpf_convert() to achieve that, see below.

In theory, such a context rewrite would not only give us the ability to re-use external verifiers,
but should also make it possible to execute a subset of eBPF progs written for the linux kernel
within a DPDK app.
Of course, not all of them, as right now linux eBPF has much richer functionality
that we are missing (maps, tail calls, etc.), but that's another story.

We plan to do some work on eBPF+DPDK within the next several months, so we might
be able to look at it too... though no hard promises here.
Meanwhile, I'm interested in comments/thoughts/volunteers :)

Thanks
Konstantin

======================

 [PATCH 1/2] bpf: fix converter emitted code fails with linux verifier

bpf_convert_filter() uses the standard approach of XOR-ing a register with itself:
xor r0, r0, r0
to reset some register values.
Unfortunately the linux verifier seems way too strict here and
doesn't allow access to a register with an undefined value.
It generates an error log like this for that op:
Failed to verify program: Permission denied (13)
LOG: func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (af) r0 ^= r0
R0 !read_ok
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

To overcome that, simply replace the XOR-with-itself by an explicit:
mov32 r0, #0x0

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/bpf/bpf_convert.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index d7ff2b4325..eceaa19c76 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -267,8 +267,11 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 		/* Classic BPF expects A and X to be reset first. These need
 		 * to be guaranteed to be the first two instructions.
 		 */
-		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
-		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		*new_insn++ = BPF_MOV32_IMM(BPF_REG_A, 0);
+		*new_insn++ = BPF_MOV32_IMM(BPF_REG_X, 0);

 		/* All programs must keep CTX in callee saved BPF_REG_CTX.
 		 * In eBPF case it's done by the compiler, here we need to
-- 
2.35.3
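
As a rough cross-check of the opcodes involved, the sketch below encodes
both instruction forms by hand. struct insn and the helper names are
hypothetical, written only for this illustration (the real code uses the
macros in lib/bpf/bpf_convert.c); the opcode constants follow the eBPF
instruction encoding. The XOR form yields opcode 0xaf, which matches the
"(af)" in the verifier log above, and it reads its destination register
before writing it. The MOV form only writes the register, and a 32-bit ALU
op zero-extends its result, so the whole 64-bit register ends up zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-byte eBPF instruction layout. */
struct insn {
	uint8_t code;   /* operation | source | class */
	uint8_t regs;   /* dst in the low nibble, src in the high nibble */
	int16_t off;
	int32_t imm;
};

enum {
	CLS_ALU   = 0x04,  /* 32-bit ALU class */
	CLS_ALU64 = 0x07,  /* 64-bit ALU class */
	SRC_K     = 0x00,  /* operand is the immediate */
	SRC_X     = 0x08,  /* operand is the source register */
	OP_XOR    = 0xa0,
	OP_MOV    = 0xb0,
};

/* dst ^= src (64-bit): the form the converter emitted before the patch. */
static struct insn xor64_reg(uint8_t dst, uint8_t src)
{
	assert(dst < 16 && src < 16);
	return (struct insn){
		.code = CLS_ALU64 | OP_XOR | SRC_X,
		.regs = (uint8_t)((src << 4) | dst),
	};
}

/* dst = imm (32-bit, zero-extended): the form emitted after the patch. */
static struct insn mov32_imm(uint8_t dst, int32_t imm)
{
	assert(dst < 16);
	return (struct insn){
		.code = CLS_ALU | OP_MOV | SRC_K,
		.regs = dst,
		.imm = imm,
	};
}

/* For ALU-class instructions: every op except MOV reads dst first. */
static int alu_reads_dst(const struct insn *i)
{
	return (i->code & 0xf0) != OP_MOV;
}
```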
