DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Stephen Hemminger" <stephen@networkplumber.org>,
	"Morten Brørup" <mb@smartsharesystems.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Jerin Jacob <jerinj@marvell.com>,
	"Varghese, Vipin" <vipin.varghese@intel.com>
Subject: Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
Date: Thu, 6 Feb 2020 10:16:47 +0000	[thread overview]
Message-ID: <SN6PR11MB2558280F2F4465B4795A61099A1D0@SN6PR11MB2558.namprd11.prod.outlook.com> (raw)
In-Reply-To: <CAOaVG15Ld_AjMW8NF2bjT9aEbM2_65neui7yEsN24PUK2YnrvA@mail.gmail.com>

> I agree fixing offsets in cbpf to ebpf converter and passing mbuf is easier. There is still the pathological case of multi segment mbuf. But
> Linux XDP doesn't handle it either.

Why do you think multi-seg would be a problem?
If we’ll use rte_pktmbuf_read() for interpreter and something similar for jit,
multi-seg case should be covered, no?

>
> Let me put early version of filter2rteebf on GitHub
>
> On Thu, Feb 6, 2020, 8:54 AM Morten Brørup <mailto:mb@smartsharesystems.com> wrote:
> > From: dev [mailto:mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> > Sent: Wednesday, February 5, 2020 10:16 PM
> >
> > >
> > > As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> > > not usable for packet filters. I have ported the classic BPF to eBPF
> > code
> > > and the generated code is not usable by DPDK.
> > >
> > > The problem is that DPDK eBPF does not implement all the opcodes.
> > > BPF_ABS is not implemented and must be. It is in the Linux kernel.
> >
> > Yep, it doesn't.
> > This is not a generic eBPF instruction, but sort of implicit function
> > call
> > to access skb data. At initial stage of librte_bpf development,
> > I didn't think much about cBPF conversion, and to simplify things
> > decided to put all special skb features aside.
> > But sure, if that will enable DPDK cBPF support, let's add it in.
> > Please fill a Bugzilla ticket to track that issue, and I'll try to
> > find some time within 20.05 to look at it.
> > Unless of course, you or someone else would like to volunteer for it.
> >
> > Though at first step, we probably need to decide what should be
> > our requirements for it in terms of DPDK.
> > From https://www.kernel.org/doc/Documentation/networking/filter.txt:
> > "eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
> > (BPF_IND | <size> | BPF_LD) which are used to access packet data.
> > They had to be carried over from classic to have strong performance of
> > socket filters running in eBPF interpreter. These instructions can only
> > be used when interpreter context is a pointer to 'struct sk_buff' and
> > have seven implicit operands. Register R6 is an implicit input that
> > must
> > contain pointer to sk_buff. Register R0 is an implicit output which
> > contains
> > the data fetched from the packet. Registers R1-R5 are scratch registers
> > and must not be used to store the data across BPF_ABS | BPF_LD or
> > BPF_IND | BPF_LD instructions.
> > These instructions have implicit program exit condition as well. When
> > eBPF program is trying to access the data beyond the packet boundary,
> > the interpreter will abort the execution of the program. JIT compilers
> > therefore must preserve this property. src_reg and imm32 fields are
> > explicit inputs to these instructions.
> > For example:
> > BPF_IND | BPF_W | BPF_LD means:
> > R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
> > and R1 - R5 were scratched."
> >
> > For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
> > to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:
> >
> > 1) uint32_t tmp;
> >     R0 = &tmp;
> >     R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6,  src_reg +
> > imm32,
> >       sizeof(uint32_t), R0);
> >      if (R0 == NULL) return FAILED;
> >      R0 = ntohl(*(uint32_t *)R0);
> >
> > But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)?
> > Should it be just:
> > 2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
> > 3) not allow LD_ABS/LD_IND in this mode at all.
> >
>
> I think that 1) is the correct choice for the "cBPF filter" use case.
>
> In that context I consider both 2) and 3) irrelevant because the RTE_BPF_ARG_PTR_MBUF type should be used for cBPF filtering. I have
> argued for this before.
>
> Others have argued for using the RTE_BPF_ARG_PTR type instead. Let's consider using the RTE_BPF_ARG_PTR for a moment... Is there an
> implicit understanding that the data points to packet data? Then a range check in 2) might be relevant.
>
> However, if the RTE_BPF_ARG_PTR_MBUF type is supported and used for cBPF->eBPF conversion, we would not need to support
> LD_ABS/LD_IND for the RTE_BPF_ARG_PTR type. So between 2) and 3), I support 3).
>
> > Second question is implementation.
> > I can see two main options here:
> > a) if we plan to have our own cBPF->eBPF converter and support only it,
> > we can add these extra instructions generation into converter itself.
> > I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
> > of generic eBPF instructions.
> > b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
> >
> > (a) probably a simpler way (eBPF interpreter/jit/verifier would remain
> > unchanged),
> > but seems way too limited. So I think (b) is a better choice, even more
> > work implied
> > (interpreter seems more or less straightforward, jit would probably
> > need some effort).
> >
>
> This is going to be used in the fast path, probably on all packets on an interface. So clearly b).
>
> > Any thoughts/opinions?
> > Konstantin
>
> One more piece of information: Linux cBPF supports Auxiliary data (VLAN ID, Interface Index, etc.), i.e. metadata that are not part of the
> packet data, but can be found in the sk_buff/mbuf. Going for 1) and b) might make it easier adding support for these later.
>
>
> Med venlig hilsen / kind regards
> - Morten Brørup

      reply	other threads:[~2020-02-06 10:16 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-04 14:52 Stephen Hemminger
2020-02-05 21:16 ` Ananyev, Konstantin
2020-02-06  4:22   ` Jerin Jacob
2020-02-06  8:54   ` Morten Brørup
2020-02-06  9:11     ` Stephen Hemminger
2020-02-06 10:16       ` Ananyev, Konstantin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=SN6PR11MB2558280F2F4465B4795A61099A1D0@SN6PR11MB2558.namprd11.prod.outlook.com \
    --to=konstantin.ananyev@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=mb@smartsharesystems.com \
    --cc=stephen@networkplumber.org \
    --cc=vipin.varghese@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).