From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id D36424564D;
	Fri, 19 Jul 2024 11:15:42 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 5BA69427D0;
	Fri, 19 Jul 2024 11:15:35 +0200 (CEST)
Received: from frasgout.his.huawei.com (frasgout.his.huawei.com
 [185.176.79.56]) by mails.dpdk.org (Postfix) with ESMTP id 6BAF1402D2
 for <dev@dpdk.org>; Fri, 19 Jul 2024 11:12:59 +0200 (CEST)
Received: from mail.maildlp.com (unknown [172.18.186.216])
 by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4WQP6c6Chhz6F9BC;
 Fri, 19 Jul 2024 17:11:04 +0800 (CST)
Received: from frapeml100006.china.huawei.com (unknown [7.182.85.201])
 by mail.maildlp.com (Postfix) with ESMTPS id 92F541400D8;
 Fri, 19 Jul 2024 17:12:58 +0800 (CST)
Received: from frapeml500007.china.huawei.com (7.182.85.172) by
 frapeml100006.china.huawei.com (7.182.85.201) with Microsoft SMTP Server
 (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.1.2507.39; Fri, 19 Jul 2024 11:12:58 +0200
Received: from frapeml500007.china.huawei.com ([7.182.85.172]) by
 frapeml500007.china.huawei.com ([7.182.85.172]) with mapi id 15.01.2507.039;
 Fri, 19 Jul 2024 11:12:58 +0200
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: Stephen Hemminger <stephen@networkplumber.org>, Konstantin Ananyev
 <konstantin.v.ananyev@yandex.ru>
CC: "dev@dpdk.org" <dev@dpdk.org>
Subject: RE: BPF standardization
Thread-Topic: BPF standardization
Thread-Index: AQHasuiDB+5zO0SqFEuIA3Uk/IasZbH+CG3g
Date: Fri, 19 Jul 2024 09:12:58 +0000
Message-ID: <956072b13b7b45fe8ce676989580f3e0@huawei.com>
References: <20240530162326.02218863@hermes.local>
In-Reply-To: <20240530162326.02218863@hermes.local>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.206.138.42]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org


> It would be good to make sure that DPDK BPF conforms to IETF draft.
> https://datatracker.ietf.org/doc/draft-ietf-bpf-isa/
>
> Based on LWN article on presentation at Linux Storage, Filesystem,
> Memory Management, and BPF Summit.
>
> https://lwn.net/SubscriberLink/975830/3b32df6be23d3abf/

Yes, it would be really good...
Another interesting option, raised a few times by different people,
would be to re-use external eBPF verifiers with DPDK eBPF programs:
either the one from the Linux kernel, or a user-space one
(PREVAIL: https://github.com/vbpf/ebpf-verifier/tree/main),
or even both.
One of the main obstacles with that: both the Linux kernel and PREVAIL
assume that the input context for an eBPF program has a particular format
(usually struct __sk_buff or struct xdp_md).
In fact, PREVAIL is more flexible here and allows you to specify your own
format, but it still expects some main things (data, data_end) to be present
and laid out in the same way as in the Linux kernel.
On second thought, a simple way to overcome this might be
to mimic what the Linux kernel does for 'direct' packet access:
at the verify stage it rewrites the given BPF program,
converting load instructions that access fields of a context type into a
sequence of instructions that access fields of the underlying structure:
struct __sk_buff    -> struct sk_buff
struct bpf_sock_ops -> struct sock
etc. (for more details see convert_ctx_accesses() in linux/kernel/bpf/verifier.c).
Inside the DPDK verifier/loader we can probably do the same:
convert direct accesses to __sk_buff and/or xdp_md fields into accesses to
rte_mbuf fields.
I.e.:
(__sk_buff->data) -> (mbuf->buf_addr + mbuf->data_off)
(__sk_buff->data_end) -> (mbuf->buf_addr + mbuf->data_off + mbuf->data_len)
and so on.
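
To make the mapping above a bit more concrete, here is a minimal C sketch of
the values such rewritten loads would have to produce. It is only an
illustration: struct mock_mbuf is a hypothetical, trimmed-down stand-in that
carries just the three rte_mbuf fields named above, and ctx_data() /
ctx_data_end() are made-up helper names. A real rewrite would emit eBPF
instructions inside the verifier/loader rather than call C helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct rte_mbuf: only the three fields
 * used by the mapping are present, the real layout is much larger.
 */
struct mock_mbuf {
	void *buf_addr;     /* start of the buffer */
	uint16_t data_off;  /* offset of packet data inside the buffer */
	uint16_t data_len;  /* amount of packet data in this segment */
};

/* Value a rewritten load of __sk_buff->data would have to yield. */
uintptr_t
ctx_data(const struct mock_mbuf *m)
{
	return (uintptr_t)m->buf_addr + m->data_off;
}

/* Value a rewritten load of __sk_buff->data_end would have to yield. */
uintptr_t
ctx_data_end(const struct mock_mbuf *m)
{
	return ctx_data(m) + m->data_len;
}
```

With these two values a program can do the usual bounds check
(data + N <= data_end) before touching packet bytes, which is the pattern
the external verifiers expect to see.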

BTW, right now, eBPF programs produced by the DPDK cBPF->eBPF converter
can already be successfully verified by the Linux kernel.
Things are easy here, as the cBPF converter doesn't try to access packet
contents directly (but only through special instructions: BPF_LD_ABS,
BPF_LD_IND).
Just a small fix is required in rte_bpf_convert() to achieve that, see below.
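
As a rough illustration of why such a fix is needed: the kernel verifier
tracks which registers hold a defined value, and 'r0 ^= r0' counts as a read
of r0, while a move of an immediate only writes it. The toy model below shows
just that one rule. All names in it (toy_insn, toy_check, ...) are made up
for this sketch; it is not the kernel's code or data structures, and the real
verifier checks far more than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy instruction set: just the two ops of interest. */
enum toy_op { TOY_XOR_REG, TOY_MOV_IMM };

struct toy_insn {
	enum toy_op op;
	uint8_t dst;
	uint8_t src; /* ignored for TOY_MOV_IMM */
};

/*
 * Walk a straight-line program and track which registers are defined.
 * init_mask has bit i set if register i is defined on entry.
 * Returns the index of the first rejected instruction, or -1 if
 * every instruction is accepted.
 */
int
toy_check(const struct toy_insn *prog, int n, uint32_t init_mask)
{
	uint32_t defined = init_mask;
	int i;

	for (i = 0; i < n; i++) {
		const struct toy_insn *in = &prog[i];

		/* dst ^= src reads both operands, so both must be defined */
		if (in->op == TOY_XOR_REG &&
				((defined & (1u << in->dst)) == 0 ||
				(defined & (1u << in->src)) == 0))
			return i;

		/* either op leaves dst holding a defined value */
		defined |= 1u << in->dst;
	}
	return -1;
}
```

With only R1 (ctx) and R10 (fp) defined on entry, a program starting with
'xor r0, r0' is rejected at instruction 0, while one starting with
'mov32 r0, #0' passes.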

In theory, that would not only give us the ability to re-use external
verifiers, but should also make it possible to execute a subset of eBPF
programs written for the Linux kernel within a DPDK app.
Of course, not all of them, as right now Linux eBPF has much richer
functionality that we are missing (maps, tail calls, etc.), but that's
another story.

We plan to do some work on eBPF+DPDK within the next several months, so we
might be able to look at it too... though no hard promises here.
Meanwhile, I am interested in comments/thoughts/volunteers :)

Thanks
Konstantin

======================

 [PATCH 1/2] bpf: fix converter-emitted code failing Linux verifier

bpf_convert_filter() uses the standard approach of XOR-ing a register
with itself:
xor r0, r0, r0
to reset some register values.
Unfortunately, the Linux verifier is strict here and doesn't allow
reading a register whose value is undefined.
It generates an error log like this for this op:
Failed to verify program: Permission denied (13)
LOG: func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (af) r0 ^= r0
R0 !read_ok
processed 1 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

To overcome that, simply replace the XOR of a register with itself by an
explicit:
mov32 r0, #0x0

Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/bpf/bpf_convert.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/bpf/bpf_convert.c b/lib/bpf/bpf_convert.c
index d7ff2b4325..eceaa19c76 100644
--- a/lib/bpf/bpf_convert.c
+++ b/lib/bpf/bpf_convert.c
@@ -267,8 +267,11 @@ static int bpf_convert_filter(const struct bpf_insn *prog, size_t len,
 		/* Classic BPF expects A and X to be reset first. These need
 		 * to be guaranteed to be the first two instructions.
 		 */
-		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
-		*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_A, BPF_REG_A);
+		//*new_insn++ = EBPF_ALU64_REG(BPF_XOR, BPF_REG_X, BPF_REG_X);
+
+		*new_insn++ = BPF_MOV32_IMM(BPF_REG_A, 0);
+		*new_insn++ = BPF_MOV32_IMM(BPF_REG_X, 0);
 
 		/* All programs must keep CTX in callee saved BPF_REG_CTX.
 		 * In eBPF case it's done by the compiler, here we need to
-- 
2.35.3

