DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jerin Jacob <jerinj@marvell.com>,
	"Varghese, Vipin" <vipin.varghese@intel.com>,
	"mb@smartsharesystems.com" <mb@smartsharesystems.com>
Subject: Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
Date: Wed, 5 Feb 2020 21:16:01 +0000	[thread overview]
Message-ID: <DM6PR11MB25568543B04FD6F6EA77BD779A020@DM6PR11MB2556.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20200204065229.0e8a85f7@shemminger-XPS-13-9360>

> 
> As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> not usable for packet filters. I have ported the classic BPF to eBPF code
> and the generated code is not usable by DPDK.
> 
> The problem is that DPDK eBPF does not implement all the opcodes.
> BPF_ABS is not implemented and must be. It is in the Linux kernel.

Yep, it doesn't.
This is not a generic eBPF instruction, but sort of implicit function call
to access skb data. At initial stage of librte_bpf development,
I didn't think much about cBPF conversion, and to simplify things
decided to put all special skb features aside.
But sure, if that will enable DPDK cBPF support, let's add it in.
Please fill a Bugzilla ticket to track that issue, and I'll try to
find some time within 20.05 to look at it.
Unless of course, you or someone else would like to volunteer for it.

Though at first step, we probably need to decide what should be
our requirements for it in terms of DPDK.
From https://www.kernel.org/doc/Documentation/networking/filter.txt:
"eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.
They had to be carried over from classic to have strong performance of
socket filters running in eBPF interpreter. These instructions can only
be used when interpreter context is a pointer to 'struct sk_buff' and
have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers
and must not be used to store the data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.
These instructions have implicit program exit condition as well. When
eBPF program is trying to access the data beyond the packet boundary,
the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. src_reg and imm32 fields are
explicit inputs to these instructions.
For example:
BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 were scratched."

For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:

1) uint32_t tmp;
    R0 = &tmp;
    R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6,  src_reg + imm32,
	sizeof(uint32_t), R0);
     if (R0 == NULL) return FAILED; 
     R0 = ntohl(*(uint32_t *)R0); 
  
But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)?
Should it be just:
2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
3) not allow LD_ABS/LD_IND in this mode at all.

Second question is implementation.
I can see two main options here:
a) if we plan to have our own cBPF->eBPF converter and support only it,
we can add these extra instructions generation into converter itself.
I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
of generic eBPF instructions.
b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
 
(a) probably a simpler way (eBPF interpreter/jit/verifier would remain unchanged),
but seems way too limited. So I think (b) is a better choice, even more work implied 
(interpreter seems more or less straightforward, jit would probably need some effort).

Any thoughts/opinions?
Konstantin

> 
> PCAP filter string: ip dst fosdem.org
> cBPF program (6 insns):
>  L0:	28 00 00 00 0c 00 00 00 ldh [12]
>  L1:	15 00 00 03 00 08 00 00 jeq #0x800, L2, L5
>  L2:	20 00 00 00 1e 00 00 00 ld [30]
>  L3:	15 00 00 01 8c 16 16 1f jeq #0x1f16168c, L4, L5
>  L4:	06 00 00 00 ff ff ff ff ret #0xffffffff
>  L5:	06 00 00 00 00 00 00 00 ret #0x0
> eBPF program (11 insns):
>  L0:	af 00 00 00 00 00 00 00 xor r0, r0
>  L1:	af 77 00 00 00 00 00 00 xor r7, r7
>  L2:	bf 16 00 00 00 00 00 00 mov r6, r1
>  L3:	28 70 00 00 0c 00 00 00 ldh r0, [12]
>  L4:	55 00 04 00 00 08 00 00 jne r0, #0x800, L9
>  L5:	20 70 00 00 1e 00 00 00 ldw r0, [30]
>  L6:	55 00 02 00 8c 16 16 1f jne r0, #0x1f16168c, L9
>  L7:	b4 00 00 00 01 00 00 00 mov32 r0, #0x1
>  L8:	95 00 00 00 00 00 00 00 exit
>  L9:	b4 00 00 00 02 00 00 00 mov32 r0, #0x2
>  L10:	95 00 00 00 00 00 00 00 exit
> validate: invalid opcode at pc: 3
> validate: invalid opcode at pc: 5
> rte_bpf_load failed: Invalid argument

  reply	other threads:[~2020-02-05 21:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-04 14:52 Stephen Hemminger
2020-02-05 21:16 ` Ananyev, Konstantin [this message]
2020-02-06  4:22   ` Jerin Jacob
2020-02-06  8:54   ` Morten Brørup
2020-02-06  9:11     ` Stephen Hemminger
2020-02-06 10:16       ` Ananyev, Konstantin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR11MB25568543B04FD6F6EA77BD779A020@DM6PR11MB2556.namprd11.prod.outlook.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=mb@smartsharesystems.com \
    --cc=stephen@networkplumber.org \
    --cc=vipin.varghese@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).