DPDK patches and discussions
* [dpdk-dev] BUG: eBPF missing BPF_ABS
@ 2020-02-04 14:52 Stephen Hemminger
  2020-02-05 21:16 ` Ananyev, Konstantin
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2020-02-04 14:52 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev

As I mentioned in my FOSDEM talk, the current DPDK eBPF handling is
not usable for packet filters. I ported the classic BPF (cBPF) to eBPF
conversion code, and the generated eBPF is not usable by DPDK.

The problem is that DPDK eBPF does not implement all of the opcodes.
BPF_ABS is not implemented, and it must be; the Linux kernel implements it.

PCAP filter string: ip dst fosdem.org
cBPF program (6 insns):
 L0:	28 00 00 00 0c 00 00 00 ldh [12]
 L1:	15 00 00 03 00 08 00 00 jeq #0x800, L2, L5
 L2:	20 00 00 00 1e 00 00 00 ld [30]
 L3:	15 00 00 01 8c 16 16 1f jeq #0x1f16168c, L4, L5
 L4:	06 00 00 00 ff ff ff ff ret #0xffffffff
 L5:	06 00 00 00 00 00 00 00 ret #0x0
eBPF program (11 insns):
 L0:	af 00 00 00 00 00 00 00 xor r0, r0
 L1:	af 77 00 00 00 00 00 00 xor r7, r7
 L2:	bf 16 00 00 00 00 00 00 mov r6, r1
 L3:	28 70 00 00 0c 00 00 00 ldh r0, [12]
 L4:	55 00 04 00 00 08 00 00 jne r0, #0x800, L9
 L5:	20 70 00 00 1e 00 00 00 ldw r0, [30]
 L6:	55 00 02 00 8c 16 16 1f jne r0, #0x1f16168c, L9
 L7:	b4 00 00 00 01 00 00 00 mov32 r0, #0x1
 L8:	95 00 00 00 00 00 00 00 exit
 L9:	b4 00 00 00 02 00 00 00 mov32 r0, #0x2
 L10:	95 00 00 00 00 00 00 00 exit
validate: invalid opcode at pc: 3
validate: invalid opcode at pc: 5
rte_bpf_load failed: Invalid argument

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
  2020-02-04 14:52 [dpdk-dev] BUG: eBPF missing BPF_ABS Stephen Hemminger
@ 2020-02-05 21:16 ` Ananyev, Konstantin
  2020-02-06  4:22   ` Jerin Jacob
  2020-02-06  8:54   ` Morten Brørup
  0 siblings, 2 replies; 6+ messages in thread
From: Ananyev, Konstantin @ 2020-02-05 21:16 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Jerin Jacob, Varghese, Vipin, mb

> 
> As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> not usable for packet filters. I have ported the classic BPF to eBPF code
> and the generated code is not usable by DPDK.
> 
> The problem is that DPDK eBPF does not implement all the opcodes.
> BPF_ABS is not implemented and must be. It is in the Linux kernel.

Yep, it doesn't.
This is not a generic eBPF instruction, but a sort of implicit function call
to access skb data. At the initial stage of librte_bpf development,
I didn't think much about cBPF conversion, and to simplify things
I decided to put all the special skb features aside.
But sure, if that will enable DPDK cBPF support, let's add it in.
Please file a Bugzilla ticket to track that issue, and I'll try to
find some time within the 20.05 timeframe to look at it.
Unless, of course, you or someone else would like to volunteer for it.

Though as a first step, we probably need to decide what our
requirements for it should be in terms of DPDK.
From https://www.kernel.org/doc/Documentation/networking/filter.txt:
"eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.
They had to be carried over from classic to have strong performance of
socket filters running in eBPF interpreter. These instructions can only
be used when interpreter context is a pointer to 'struct sk_buff' and
have seven implicit operands. Register R6 is an implicit input that must
contain pointer to sk_buff. Register R0 is an implicit output which contains
the data fetched from the packet. Registers R1-R5 are scratch registers
and must not be used to store the data across BPF_ABS | BPF_LD or
BPF_IND | BPF_LD instructions.
These instructions have implicit program exit condition as well. When
eBPF program is trying to access the data beyond the packet boundary,
the interpreter will abort the execution of the program. JIT compilers
therefore must preserve this property. src_reg and imm32 fields are
explicit inputs to these instructions.
For example:
BPF_IND | BPF_W | BPF_LD means:
R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
and R1 - R5 were scratched."

For the RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
to Linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:

1) uint32_t tmp;
   R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6, src_reg + imm32,
	sizeof(uint32_t), &tmp);
   if (R0 == NULL) return FAILED;
   R0 = ntohl(*(uint32_t *)R0);

But what should we do when ctx is a raw data buffer (RTE_BPF_ARG_PTR)?
Should it be just:
2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
or:
3) not allow LD_ABS/LD_IND in this mode at all?

The second question is implementation.
I can see two main options here:
a) if we plan to have our own cBPF->eBPF converter and support only it,
we can add the generation of these extra instructions to the converter itself,
i.e. the cBPF->eBPF conversion for LD_ABS/LD_IND would generate a series
of generic eBPF instructions;
b) support eBPF LD_ABS/LD_IND in the eBPF interpreter/jit.

(a) is probably the simpler way (the eBPF interpreter/jit/verifier would remain unchanged),
but it seems way too limited. So I think (b) is the better choice, even though more work is implied
(the interpreter seems more or less straightforward; the jit would probably need some effort).
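To make option (a) concrete, here is a pseudo-assembly sketch of how the converter could expand cBPF "ld [30]" into generic eBPF instead of emitting the special opcode. The register roles are assumptions for illustration only (r6 = packet data pointer, r7 = packet length), not the actual converter's conventions:

```
	mov32  r2, 34		; 30 + sizeof(u32): end offset of the access
	jgt    r2, r7, Lrej	; implicit exit becomes an explicit branch
	ldxw   r0, [r6 + 30]	; plain generic load, no special opcode
	be32   r0		; byte-swap: packet data is big-endian (ntohl)
	...
Lrej:	mov32  r0, 0		; "no match"
	exit
```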

Any thoughts/opinions?
Konstantin



* Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
  2020-02-05 21:16 ` Ananyev, Konstantin
@ 2020-02-06  4:22   ` Jerin Jacob
  2020-02-06  8:54   ` Morten Brørup
  1 sibling, 0 replies; 6+ messages in thread
From: Jerin Jacob @ 2020-02-06  4:22 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Stephen Hemminger, dev, Jerin Jacob, Varghese, Vipin, mb

On Thu, Feb 6, 2020 at 2:46 AM Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
> [...]
> Second question is implementation.
> I can see two main options here:
> a) if we plan to have our own cBPF->eBPF converter and support only it,
> we can add these extra instructions generation into converter itself.
> I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
> of generic eBPF instructions.
> b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
>
> (a) probably a simpler way (eBPF interpreter/jit/verifier would remain unchanged),
> but seems way too limited. So I think (b) is a better choice, even more work implied
> (interpreter seems more or less straightforward, jit would probably need some effort).

Looks like cBPF is an important use case. If we are leveraging the
cBPF->eBPF converter code from somewhere and have limited options to
change it, then it is better to extend our JIT.
I can work on adding arm64 JIT support once the base code is in place.




* Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
  2020-02-05 21:16 ` Ananyev, Konstantin
  2020-02-06  4:22   ` Jerin Jacob
@ 2020-02-06  8:54   ` Morten Brørup
  2020-02-06  9:11     ` Stephen Hemminger
  1 sibling, 1 reply; 6+ messages in thread
From: Morten Brørup @ 2020-02-06  8:54 UTC (permalink / raw)
  To: Ananyev, Konstantin, Stephen Hemminger; +Cc: dev, Jerin Jacob, Varghese, Vipin

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Wednesday, February 5, 2020 10:16 PM
> 
> >
> > As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> > not usable for packet filters. I have ported the classic BPF to eBPF
> code
> > and the generated code is not usable by DPDK.
> >
> > The problem is that DPDK eBPF does not implement all the opcodes.
> > BPF_ABS is not implemented and must be. It is in the Linux kernel.
> 
> Yep, it doesn't.
> This is not a generic eBPF instruction, but sort of implicit function
> call
> to access skb data. At initial stage of librte_bpf development,
> I didn't think much about cBPF conversion, and to simplify things
> decided to put all special skb features aside.
> But sure, if that will enable DPDK cBPF support, let's add it in.
> Please fill a Bugzilla ticket to track that issue, and I'll try to
> find some time within 20.05 to look at it.
> Unless of course, you or someone else would like to volunteer for it.
> 
> Though at first step, we probably need to decide what should be
> our requirements for it in terms of DPDK.
> From https://www.kernel.org/doc/Documentation/networking/filter.txt:
> "eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
> (BPF_IND | <size> | BPF_LD) which are used to access packet data.
> They had to be carried over from classic to have strong performance of
> socket filters running in eBPF interpreter. These instructions can only
> be used when interpreter context is a pointer to 'struct sk_buff' and
> have seven implicit operands. Register R6 is an implicit input that
> must
> contain pointer to sk_buff. Register R0 is an implicit output which
> contains
> the data fetched from the packet. Registers R1-R5 are scratch registers
> and must not be used to store the data across BPF_ABS | BPF_LD or
> BPF_IND | BPF_LD instructions.
> These instructions have implicit program exit condition as well. When
> eBPF program is trying to access the data beyond the packet boundary,
> the interpreter will abort the execution of the program. JIT compilers
> therefore must preserve this property. src_reg and imm32 fields are
> explicit inputs to these instructions.
> For example:
> BPF_IND | BPF_W | BPF_LD means:
> R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
> and R1 - R5 were scratched."
> 
> For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
> to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:
> 
> 1) uint32_t tmp;
>     R0 = &tmp;
>     R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6,  src_reg +
> imm32,
> 	sizeof(uint32_t), R0);
>      if (R0 == NULL) return FAILED;
>      R0 = ntohl(*(uint32_t *)R0);
> 
> But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)?
> Should it be just:
> 2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
> 3) not allow LD_ABS/LD_IND in this mode at all.
> 

I think that 1) is the correct choice for the "cBPF filter" use case.

In that context I consider both 2) and 3) irrelevant because the RTE_BPF_ARG_PTR_MBUF type should be used for cBPF filtering. I have argued for this before.

Others have argued for using the RTE_BPF_ARG_PTR type instead. Let's consider RTE_BPF_ARG_PTR for a moment: is there an implicit understanding that the pointer points to packet data? If so, a range check in 2) might be relevant.

However, if the RTE_BPF_ARG_PTR_MBUF type is supported and used for cBPF->eBPF conversion, we would not need to support LD_ABS/LD_IND for the RTE_BPF_ARG_PTR type. So between 2) and 3), I support 3).

> Second question is implementation.
> I can see two main options here:
> a) if we plan to have our own cBPF->eBPF converter and support only it,
> we can add these extra instructions generation into converter itself.
> I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
> of generic eBPF instructions.
> b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
> 
> (a) probably a simpler way (eBPF interpreter/jit/verifier would remain
> unchanged),
> but seems way too limited. So I think (b) is a better choice, even more
> work implied
> (interpreter seems more or less straightforward, jit would probably
> need some effort).
> 

This is going to be used in the fast path, probably on all packets on an interface. So clearly b).

> Any thoughts/opinions?
> Konstantin

One more piece of information: Linux cBPF supports auxiliary data (VLAN ID, interface index, etc.), i.e. metadata that is not part of the packet data but can be found in the sk_buff/mbuf. Going for 1) and b) might make it easier to add support for these later.


Med venlig hilsen / kind regards
- Morten Brørup



* Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
  2020-02-06  8:54   ` Morten Brørup
@ 2020-02-06  9:11     ` Stephen Hemminger
  2020-02-06 10:16       ` Ananyev, Konstantin
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2020-02-06  9:11 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Ananyev, Konstantin, dev, Jerin Jacob, Varghese, Vipin

I agree that fixing the offsets in the cBPF-to-eBPF converter and passing
the mbuf is easier. There is still the pathological case of a multi-segment
mbuf, but Linux XDP doesn't handle that either.

Let me put an early version of filter2rteebf on GitHub.



* Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
  2020-02-06  9:11     ` Stephen Hemminger
@ 2020-02-06 10:16       ` Ananyev, Konstantin
  0 siblings, 0 replies; 6+ messages in thread
From: Ananyev, Konstantin @ 2020-02-06 10:16 UTC (permalink / raw)
  To: Stephen Hemminger, Morten Brørup; +Cc: dev, Jerin Jacob, Varghese, Vipin

> I agree fixing offsets in cbpf to ebpf converter and passing mbuf is easier. There is still the pathological case of multi segment mbuf. But
> Linux XDP doesn't handle it either.

Why do you think multi-seg would be a problem?
If we use rte_pktmbuf_read() for the interpreter and something similar for the jit,
the multi-seg case should be covered, no?



end of thread, other threads:[~2020-02-06 10:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-04 14:52 [dpdk-dev] BUG: eBPF missing BPF_ABS Stephen Hemminger
2020-02-05 21:16 ` Ananyev, Konstantin
2020-02-06  4:22   ` Jerin Jacob
2020-02-06  8:54   ` Morten Brørup
2020-02-06  9:11     ` Stephen Hemminger
2020-02-06 10:16       ` Ananyev, Konstantin
