DPDK patches and discussions
 help / color / mirror / Atom feed
From: Stephen Hemminger <stephen@networkplumber.org>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	dev@dpdk.org,  Jerin Jacob <jerinj@marvell.com>,
	"Varghese, Vipin" <vipin.varghese@intel.com>
Subject: Re: [dpdk-dev] BUG: eBPF missing BPF_ABS
Date: Thu, 6 Feb 2020 09:11:27 +0000	[thread overview]
Message-ID: <CAOaVG15Ld_AjMW8NF2bjT9aEbM2_65neui7yEsN24PUK2YnrvA@mail.gmail.com> (raw)
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35C60DA0@smartserver.smartshare.dk>

I agree fixing offsets in cbpf to ebpf converter and passing mbuf is
easier. There is still the pathological case of multi segment mbuf. But
Linux XDP doesn't handle it either.

Let me put early version of filter2rteebf on GitHub

On Thu, Feb 6, 2020, 8:54 AM Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> > Konstantin
> > Sent: Wednesday, February 5, 2020 10:16 PM
> >
> > >
> > > As I mentioned in my FOSDEM talk the current DPDK eBPF handling is
> > > not usable for packet filters. I have ported the classic BPF to eBPF
> > code
> > > and the generated code is not usable by DPDK.
> > >
> > > The problem is that DPDK eBPF does not implement all the opcodes.
> > > BPF_ABS is not implemented and must be. It is in the Linux kernel.
> >
> > Yep, it doesn't.
> > This is not a generic eBPF instruction, but sort of implicit function
> > call
> > to access skb data. At initial stage of librte_bpf development,
> > I didn't think much about cBPF conversion, and to simplify things
> > decided to put all special skb features aside.
> > But sure, if that will enable DPDK cBPF support, let's add it in.
> > Please fill a Bugzilla ticket to track that issue, and I'll try to
> > find some time within 20.05 to look at it.
> > Unless of course, you or someone else would like to volunteer for it.
> >
> > Though at first step, we probably need to decide what should be
> > our requirements for it in terms of DPDK.
> > From https://www.kernel.org/doc/Documentation/networking/filter.txt:
> > "eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
> > (BPF_IND | <size> | BPF_LD) which are used to access packet data.
> > They had to be carried over from classic to have strong performance of
> > socket filters running in eBPF interpreter. These instructions can only
> > be used when interpreter context is a pointer to 'struct sk_buff' and
> > have seven implicit operands. Register R6 is an implicit input that
> > must
> > contain pointer to sk_buff. Register R0 is an implicit output which
> > contains
> > the data fetched from the packet. Registers R1-R5 are scratch registers
> > and must not be used to store the data across BPF_ABS | BPF_LD or
> > BPF_IND | BPF_LD instructions.
> > These instructions have implicit program exit condition as well. When
> > eBPF program is trying to access the data beyond the packet boundary,
> > the interpreter will abort the execution of the program. JIT compilers
> > therefore must preserve this property. src_reg and imm32 fields are
> > explicit inputs to these instructions.
> > For example:
> > BPF_IND | BPF_W | BPF_LD means:
> > R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
> > and R1 - R5 were scratched."
> >
> > For RTE_BPF_ARG_PTR_MBUF context we probably want behavior similar
> > to linux, i.e. BPF_IND | BPF_W | BPF_LD would mean something like:
> >
> > 1) uint32_t tmp;
> >     R0 = &tmp;
> >     R0 = rte_pktmbuf_read((const struct rte_mbuf *)R6,  src_reg +
> > imm32,
> >       sizeof(uint32_t), R0);
> >      if (R0 == NULL) return FAILED;
> >      R0 = ntohl(*(uint32_t *)R0);
> >
> > But what to do with when ctx is raw data buffer (RTE_BPF_ARG_PTR)?
> > Should it be just:
> > 2) R0 = ntohl(*(uint32_t *)(R6 + src_reg + imm32));
> > 3) not allow LD_ABS/LD_IND in this mode at all.
> >
>
> I think that 1) is the correct choice for the "cBPF filter" use case.
>
> In that context I consider both 2) and 3) irrelevant because the
> RTE_BPF_ARG_PTR_MBUF type should be used for cBPF filtering. I have argued
> for this before.
>
> Others have argued for using the RTE_BPF_ARG_PTR type instead. Let's
> consider using the RTE_BPF_ARG_PTR for a moment... Is there an implicit
> understanding that the data points to packet data? Then a range check in 2)
> might be relevant.
>
> However, if the RTE_BPF_ARG_PTR_MBUF type is supported and used for
> cBPF->eBPF conversion, we would not need to support LD_ABS/LD_IND for the
> RTE_BPF_ARG_PTR type. So between 2) and 3), I support 3).
>
> > Second question is implementation.
> > I can see two main options here:
> > a) if we plan to have our own cBPF->eBPF converter and support only it,
> > we can add these extra instructions generation into converter itself.
> > I.E. cBPF->eBPF conversion for LD_ABS/LD_IND will generate series
> > of generic eBPF instructions.
> > b) support eBPF LD_ABS/LD_IND in eBPF interpreter/jit
> >
> > (a) probably a simpler way (eBPF interpreter/jit/verifier would remain
> > unchanged),
> > but seems way too limited. So I think (b) is a better choice, even more
> > work implied
> > (interpreter seems more or less straightforward, jit would probably
> > need some effort).
> >
>
> This is going to be used in the fast path, probably on all packets on an
> interface. So clearly b).
>
> > Any thoughts/opinions?
> > Konstantin
>
> One more piece of information: Linux cBPF supports Auxiliary data (VLAN
> ID, Interface Index, etc.), i.e. metadata that are not part of the packet
> data, but can be found in the sk_buff/mbuf. Going for 1) and b) might make
> it easier adding support for these later.
>
>
> Med venlig hilsen / kind regards
> - Morten Brørup
>
>

  reply	other threads:[~2020-02-06  9:11 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-04 14:52 Stephen Hemminger
2020-02-05 21:16 ` Ananyev, Konstantin
2020-02-06  4:22   ` Jerin Jacob
2020-02-06  8:54   ` Morten Brørup
2020-02-06  9:11     ` Stephen Hemminger [this message]
2020-02-06 10:16       ` Ananyev, Konstantin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOaVG15Ld_AjMW8NF2bjT9aEbM2_65neui7yEsN24PUK2YnrvA@mail.gmail.com \
    --to=stephen@networkplumber.org \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=konstantin.ananyev@intel.com \
    --cc=mb@smartsharesystems.com \
    --cc=vipin.varghese@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).