From: Konstantin Ananyev
To: Stephen Hemminger, Konstantin Ananyev
CC: "dev@dpdk.org"
Subject: RE: BPF standardization
Date: Fri, 19 Jul 2024 09:12:58 +0000
Message-ID: <956072b13b7b45fe8ce676989580f3e0@huawei.com>
References: <20240530162326.02218863@hermes.local>
In-Reply-To: <20240530162326.02218863@hermes.local>

> It would be good to make sure that DPDK BPF conforms to IETF draft.
> https://datatracker.ietf.org/doc/draft-ietf-bpf-isa/
>
> Based on LWN article on presentation at Linux Storage, Filesystem,
> Memory Management, and BPF Summit.
>
> https://lwn.net/SubscriberLink/975830/3b32df6be23d3abf/

Yes, it would be really good...
Another interesting option, raised a few times by different people,
would be the opportunity to re-use external eBPF verifiers with DPDK eBPF progs:
either the one from the linux kernel or a user-space one
(PREVAIL: https://github.com/vbpf/ebpf-verifier/tree/main), or even both.
One of the main obstacles with that: both the linux kernel and PREVAIL assume
that the input context for an eBPF prog is in a particular format
(usually struct __sk_buff or struct xdp_md).
In fact, PREVAIL is more flexible here and allows you to specify your own format,
but it still expects some main things (data, data_end) to be present and located
in the same way as in the linux kernel.
On second thought, a simple way to overcome this might be
to mimic what the linux kernel does for 'direct' packet access:
at the verify stage it rewrites the given BPF prog, converting load instructions
that access fields of a context type into a sequence of instructions
that access fields of the underlying structure:
struct __sk_buff -> struct sk_buff
struct bpf_sock_ops -> struct sock
etc. (for more details see convert_ctx_accesses() in linux/kernel/bpf/verifier.c).
Inside the DPDK verifier/loader we can probably do the same:
convert direct accesses to __sk_buff and/or xdp_md fields into accesses to
rte_mbuf fields.
I.e.:
(__sk_buff->data) -> (mbuf->buf_addr + mbuf->data_off)
(__sk_buff->data_end) -> (mbuf->buf_addr + mbuf->data_off + mbuf->data_len)
and so on.
BTW, right now, eBPF programs produced by the DPDK cBPF->eBPF converter can be
successfully verified by the linux kernel.
Things are easy here, as the cBPF converter doesn't try to access packet contents
directly (but only through special instructions: BPF_LD_ABS, BPF_LD_IND).
Just a small fix is required in rte_bpf_convert() to achieve that, see below.
In theory, that would not only give us the ability to re-use external verifiers,
but should also make it possible to execute a subset of eBPF progs written for
the linux kernel within a DPDK app.
Of course, not all of them, as right now linux eBPF has much richer functionality
that we are missing (MAPs, tail calls, etc.), but that's another story.
We plan to do some work on eBPF+DPDK within the next several months,
so might be able to look at it too... though no hard promises here.
Meanwhile, interested in comments/thoughts/volunteers :)
Thanks
Konstantin

======================

[PATCH 1/2] bpf: fix converter emitted code fails with linux verifier

bpf_convert_filter() uses the standard approach of XOR-ing a register with itself:
xor r0, r0, r0
to reset some register values.
Unfortunately the linux verifier seems way too strict here and
doesn't allow access to a register with an undefined value.
It generates an error log like this for that op:
Failed to verify program: Permission denied (13)
LOG: func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (af) r0 ^= r0
R0 !read_ok
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

To overcome that, simply replace the XOR-with-itself with an explicit:
mov32 r0, #0x0

Signed-off-by: Konstantin Ananyev
---
 lib/bpf/bpf_convert.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index d7ff2b4325..eceaa19c76 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -267,8 +267,11 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 	/* Classic BPF expects A and X to be reset first. These need
 	 * to be guaranteed to be the first two instructions.
 	 */
-	*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
-	*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+	//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+	//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+	*new_insn++ = BPF_MOV32_IMM(BPF_REG_A, 0);
+	*new_insn++ = BPF_MOV32_IMM(BPF_REG_X, 0);
 
 	/* All programs must keep CTX in callee saved BPF_REG_CTX.
 	 * In eBPF case it's done by the compiler, here we need to
-- 
2.35.3
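
To illustrate why a verifier can reject the XOR form yet accept the MOV form,
here is a minimal sketch of a read-before-write check. It is a toy model only,
not the linux verifier's actual logic: struct toy_insn, toy_check(), prog_xor
and prog_mov are hypothetical names made up for this illustration, and only
ALU-class instructions are modelled. The one assumption taken from the log in
the patch description is that R1 (ctx) and R10 (frame pointer) are the only
readable registers at entry. The opcode constants are the standard eBPF
encodings, so the XOR instruction encodes as 0xaf, matching the "(af)" in the
log, while the replacement MOV encodes as 0xb4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* eBPF opcode building blocks (standard ISA encodings). */
#define BPF_ALU    0x04  /* 32-bit ALU instruction class */
#define BPF_ALU64  0x07  /* 64-bit ALU instruction class */
#define BPF_K      0x00  /* source operand is an immediate */
#define BPF_X      0x08  /* source operand is a register */
#define BPF_XOR    0xa0
#define BPF_MOV    0xb0

#define TOY_NUM_REGS 11  /* R0..R10 */

/* Hypothetical, simplified instruction; NOT the DPDK or kernel layout. */
struct toy_insn {
	uint8_t code;
	uint8_t dst;
	uint8_t src;
	int32_t imm;
};

/* r0 ^= r0: reads r0 before anything has written it (opcode 0xaf). */
static const struct toy_insn prog_xor[] = {
	{ BPF_ALU64 | BPF_XOR | BPF_X, 0, 0, 0 },
};

/* w0 = 0: only writes r0 (opcode 0xb4), so a later XOR may read it. */
static const struct toy_insn prog_mov[] = {
	{ BPF_ALU | BPF_MOV | BPF_K, 0, 0, 0 },
	{ BPF_ALU64 | BPF_XOR | BPF_X, 0, 0, 0 },
};

/*
 * Toy read-before-write check for ALU-class instructions only.
 * Returns the index of the first instruction that reads a register
 * which has not been written yet, or -1 if there is no such instruction.
 */
static int toy_check(const struct toy_insn *prog, int n)
{
	bool init[TOY_NUM_REGS] = { false };

	init[1] = true;   /* R1 holds the context pointer at entry */
	init[10] = true;  /* R10 is the frame pointer */

	for (int i = 0; i < n; i++) {
		uint8_t op = prog[i].code & 0xf0;
		bool src_is_reg = (prog[i].code & BPF_X) != 0;

		assert(prog[i].dst < TOY_NUM_REGS && prog[i].src < TOY_NUM_REGS);

		/* A register source operand is always read. */
		if (src_is_reg && !init[prog[i].src])
			return i;
		/* Every ALU op except MOV also reads its destination. */
		if (op != BPF_MOV && !init[prog[i].dst])
			return i;

		init[prog[i].dst] = true;
	}
	return -1;
}
```

With this model, toy_check(prog_xor, 1) flags instruction 0, while
toy_check(prog_mov, 2) finds nothing to complain about, because the MOV has
already defined r0 by the time the XOR reads it.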