From: "Zhang, Qi Z" <qi.z.zhang@intel.com>
To: William Tu <u9012063@gmail.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Topel, Bjorn" <bjorn.topel@intel.com>,
"Wu, Jingjing" <jingjing.wu@intel.com>,
"Li, Xiaoyun" <xiaoyun.li@intel.com>,
"Yigit, Ferruh" <ferruh.yigit@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 0/6] PMD driver for AF_XDP
Date: Tue, 28 Aug 2018 14:11:57 +0000 [thread overview]
Message-ID: <039ED4275CED7440929022BC67E706115327BF20@SHSMSX103.ccr.corp.intel.com> (raw)
In-Reply-To: <CALDO+SbVLosAARoHpQTtApj6m+dZ6mO4ZDVJ_VkWBuqgb4uhug@mail.gmail.com>
Hi William:
> -----Original Message-----
> From: William Tu [mailto:u9012063@gmail.com]
> Sent: Friday, August 24, 2018 12:25 AM
> To: Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: dev@dpdk.org; Karlsson, Magnus <magnus.karlsson@intel.com>; Topel,
> Bjorn <bjorn.topel@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>; Li,
> Xiaoyun <xiaoyun.li@intel.com>; Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/6] PMD driver for AF_XDP
>
> Hi Zhang Qi,
>
> I'm not familiar with the DPDK code, but I'm curious about the benefits of
> using the AF_XDP PMD; specifically, I have a couple of questions:
>
> 1) With zero-copy driver support, is the AF_XDP PMD expected to have
> similar performance to other PMDs?
Zero-copy will improve performance a lot, but there is still a gap compared with native DPDK PMDs;
basically it is a lower-performance but more flexible solution.
BTW, patches to enable zero copy for i40e were just published by Bjorn; they include some performance data for your reference:
http://lists.openwall.net/netdev/2018/08/28/62
> Since AF_XDP still uses the native device driver,
> isn't the interrupt still there, so it is not really
> "poll-mode" anymore?
Yes, it is still napi->poll triggered by interrupts.
>
> 2) Does the patch expect the user to customize the eBPF/XDP code, so that
> this becomes another way to extend the DPDK datapath?
Yes, this provides another option to use the kernel's eBPF ecosystem for packet filtering,
and I think it will be easy for us to develop some tool to load/link/expose eBPF as part of DPDK.
Regarding the AF_XDP PMD, my view is that since DPDK is very popular, it is becoming a standard way to develop network applications.
So a DPDK PMD is going to be a bridge for developers to take advantage of the AF_XDP technology, compared with dealing with the XDP socket and libc directly.
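
For example, the redirect program could look like the sketch below. This is only an illustration in the style of samples/bpf/xdpsock_kern.c, not code shipped by this patchset; the map name, its size, and the section name are my assumptions:

    /* Illustrative only: redirect each packet to the AF_XDP socket
     * registered for its rx queue. The xskmap plays the role of the
     * "key -> xsk" table in the diagram in the quoted cover letter. */
    #include <linux/bpf.h>
    #include "bpf_helpers.h"

    struct bpf_map_def SEC("maps") xsks_map = {
            .type        = BPF_MAP_TYPE_XSKMAP,
            .key_size    = sizeof(int),
            .value_size  = sizeof(int),   /* AF_XDP socket fd */
            .max_entries = 4,
    };

    SEC("xdp_sock")
    int xdp_sock_prog(struct xdp_md *ctx)
    {
            /* custom packet-filtering logic would go here; packets
             * that should stay in the kernel can return XDP_PASS */
            return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);
    }

Anything more complex (protocol filters, load balancing across sockets) can be expressed in the same program before the redirect.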
Regards
Qi
>
> Thank you
> William
>
> On Thu, Aug 16, 2018 at 7:42 AM Qi Zhang <qi.z.zhang@intel.com> wrote:
> >
> > Overview
> > ========
> >
> > The patch set adds a new PMD driver for AF_XDP, which is a proposed
> > faster version of the AF_PACKET interface in Linux; see the links below
> > for a detailed AF_XDP introduction:
> > https://lwn.net/Articles/750845/
> > https://fosdem.org/2018/schedule/event/af_xdp/
> >
> > AF_XDP roadmap
> > ==============
> > - Kernel 4.18 is out and AF_XDP is included.
> >   https://kernelnewbies.org/Linux_4.18
> > - So far no driver with zero-copy support has been merged, but some are
> >   on the way.
> >
> > Change logs
> > ===========
> >
> > v3:
> > - Reworked based on AF_XDP's interface changes.
> > - Support multiple queues; each DPDK queue has its own XDP socket.
> >   An XDP socket is always bound to a netdev queue.
> >   We assume all XDP sockets from the same ethdev are bound to the
> >   same netdev queue, though a netdev queue can still be bound by
> >   XDP sockets from different ethdev instances.
> >   Below is an example of the mapping.
> >   ------------------------------------------------------
> >   | dpdk q0 | dpdk q1 | dpdk q0 | dpdk q0 | dpdk q1  |
> >   ------------------------------------------------------
> >   |  xsk A  |  xsk B  |  xsk C  |  xsk D  |  xsk E   |<---|
> >   ------------------------------------------------------   |
> >   |      ETHDEV 0     | ETHDEV 1 |     ETHDEV 2      |     | DPDK
> >   ------------------------------------------------------------------
> >   |      netdev queue 0          |   netdev queue 1  |     | KERNEL
> >   ------------------------------------------------------   |
> >   |                  NETDEV eth0                     |     |
> >   ------------------------------------------------------   |
> >   |       key        xsk                             |     |
> >   |   ----------    --------------                   |     |
> >   |   |        |    | 0 | xsk A  |                   |     |
> >   |   |        |    --------------                   |     |
> >   |   |        |    | 2 | xsk B  |                   |     |
> >   |   |  ebpf  |    ---------------------------------------
> >   |   |        |    | 3 | xsk C  |                   |
> >   |   | redirect ->|--------------                   |
> >   |   |        |    | 4 | xsk D  |                   |
> >   |   |        |    --------------                   |
> >   |   |--------|    | 5 | xsk E  |                   |
> >   |                 --------------                   |
> >   |----------------------------------------------------
> >
> > - It is an open question how to load eBPF into the kernel and link it to
> >   a specific netdev in DPDK: should it be part of the PMD, or should it
> >   be handled by an independent tool? This patchset takes the second
> >   option: there will be a "bind" stage before we start the AF_XDP PMD,
> >   which includes the steps below:
> >   a) Load the eBPF program into the kernel (the eBPF program must contain
> >      the logic to redirect packets to an XDP socket based on a redirect map).
> >   b) Link the eBPF program to a specific network interface.
> >   c) Expose the XDP socket redirect map id and number of entries to the
> >      user, so this can be passed to the PMD, and the PMD will create an
> >      XDP socket for each queue and update the redirect map correctly.
> >      (example:
> >      --vdev,iface=eth0,xsk_map_id=53,xsk_map_key_base=0,xsk_map_key_count=4
> >      )
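
To illustrate step c) above: a minimal sketch of what the PMD side could do per queue. bpf_map_get_fd_by_id() and bpf_map_update_elem() are helpers from the kernel's tools/lib/bpf; the function and variable names here are made up for illustration and are not the patchset's actual implementation:

    #include <bpf/bpf.h>   /* kernel tools/lib/bpf */

    /* Attach one AF_XDP socket fd to its reserved xskmap entry, so
     * the kernel eBPF program's bpf_redirect_map() delivers packets
     * for this queue to the socket. */
    static int xsk_map_set(__u32 xsk_map_id, int key, int xsk_fd)
    {
            int map_fd = bpf_map_get_fd_by_id(xsk_map_id);

            if (map_fd < 0)
                    return map_fd;
            return bpf_map_update_elem(map_fd, &key, &xsk_fd, 0);
    }

With the example parameters above, the PMD would call something like xsk_map_set(53, key_base + queue_idx, fd) once per reserved key.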
> >
> > v2:
> > - fix license header
> > - clean up bpf dependency; the bpf program is embedded, no
> >   "xdpsock_kern.o" required
> > - clean up the make file; only linux_header is required
> > - fix all the compile warnings
> > - fix the packet count returned in Tx
> >
> > How to try
> > ==========
> >
> > 1. Take kernel v4.18.
> >    Make sure you turn on XDP sockets when compiling:
> >    Networking support -->
> >         Networking options -->
> >                 [ * ] XDP sockets
> > 2. In the kernel source tree, apply the patch below and compile the bpf
> >    sample code:
> >    #make samples/bpf/
> >    so the sample xdpsock can be used as a bind/unbind tool for the AF_XDP
> >    PMD. Sorry for this ugliness; in the future there could be a dedicated
> >    tool in DPDK, if we agree with the idea that bpf configuration in the
> >    kernel should be separated from the PMD.
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~PATCH START~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > diff --git a/samples/bpf/xdpsock_user.c b/samples/bpf/xdpsock_user.c
> > index d69c8d78d3fd..44a6318043e7 100644
> > --- a/samples/bpf/xdpsock_user.c
> > +++ b/samples/bpf/xdpsock_user.c
> > @@ -76,6 +76,8 @@ static int opt_poll;
> >  static int opt_shared_packet_buffer;
> >  static int opt_interval = 1;
> >  static u32 opt_xdp_bind_flags;
> > +static int opt_bind;
> > +static int opt_unbind;
> >
> >  struct xdp_umem_uqueue {
> >          u32 cached_prod;
> > @@ -662,6 +664,8 @@ static void usage(const char *prog)
> >                  "  -S, --xdp-skb=n      Use XDP skb-mod\n"
> >                  "  -N, --xdp-native=n   Enfore XDP native mode\n"
> >                  "  -n, --interval=n     Specify statistics update interval (default 1 sec).\n"
> > +                "  -b, --bind           Bind only.\n"
> > +                "  -u, --unbind         Unbind only.\n"
> >                  "\n";
> >          fprintf(stderr, str, prog);
> >          exit(EXIT_FAILURE);
> > @@ -674,7 +678,7 @@ static void parse_command_line(int argc, char **argv)
> >          opterr = 0;
> >
> >          for (;;) {
> > -                c = getopt_long(argc, argv, "rtli:q:psSNn:", long_options,
> > +                c = getopt_long(argc, argv, "rtli:q:psSNn:bu", long_options,
> >                                  &option_index);
> >                  if (c == -1)
> >                          break;
> > @@ -711,6 +715,12 @@ static void parse_command_line(int argc, char **argv)
> >                  case 'n':
> >                          opt_interval = atoi(optarg);
> >                          break;
> > +                case 'b':
> > +                        opt_bind = 1;
> > +                        break;
> > +                case 'u':
> > +                        opt_unbind = 1;
> > +                        break;
> >                  default:
> >                          usage(basename(argv[0]));
> >                  }
> > @@ -898,6 +908,12 @@ int main(int argc, char **argv)
> >                  exit(EXIT_FAILURE);
> >          }
> >
> > +        if (opt_unbind) {
> > +                bpf_set_link_xdp_fd(opt_ifindex, -1, opt_xdp_flags);
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~PATCH END~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > 3. bind
> >    #./samples/bpf/xdpsock -i eth0 -b
> >
> >    In this step, an eBPF binary xdpsock_kern.o is loaded into the kernel
> >    and linked to eth0; the eBPF source code is samples/bpf/xdpsock_kern.c.
> >    You can modify it and re-compile for a different test.
> >
> > 4. Dump the XDP socket map information:
> >    #./tools/bpf/bpftool/bpftool map -p
> >    You will see something like below.
> >
> > },{
> > "id": 56,
> > "type": "xskmap",
> > "name": "xsks_map",
> > "flags": 0,
> > "bytes_key": 4,
> > "bytes_value": 4,
> > "max_entries": 4,
> > "bytes_memlock": 4096
> > }
> >
> > In this case, 56 is the map id and the map has 4 entries.
> >
> > 5. start testpmd
> >
> > ./build/app/testpmd -c 0xc -n 4 --vdev
> > eth_af_xdp,iface=enp59s0f0,xsk_map_id=56,xsk_map_key_start=2,xsk_map_key_count=2
> > -- -i --rxq=2 --txq=2
> >
> > In this case, we reserved 2 entries (keys 2 and 3) in the map, and they
> > will be mapped to queue 0 and queue 1.
> >
> > 6. unbind after the test
> >    #./samples/bpf/xdpsock -i eth0 -u
> >
> > Performance
> > ===========
> > No zero-copy driver is ready yet, so we have so far only tested DRV and
> > SKB mode on i40e 25G; the results are identical to the kernel sample
> > "xdpsock".
> >
> > Qi Zhang (6):
> > net/af_xdp: new PMD driver
> > lib/mbuf: enable parse flags when create mempool
> > lib/mempool: allow page size aligned mempool
> > net/af_xdp: use mbuf mempool for buffer management
> > net/af_xdp: enable zero copy
> > app/testpmd: add mempool flags parameter
> >
> > app/test-pmd/parameters.c | 12 +
> > app/test-pmd/testpmd.c | 15 +-
> > app/test-pmd/testpmd.h | 1 +
> > config/common_base | 5 +
> > config/common_linuxapp | 1 +
> > drivers/net/Makefile | 1 +
> > drivers/net/af_xdp/Makefile | 30 +
> > drivers/net/af_xdp/meson.build | 7 +
> >  drivers/net/af_xdp/rte_eth_af_xdp.c           | 1345 +++++++++++++++++++++++++
> > drivers/net/af_xdp/rte_pmd_af_xdp_version.map | 4 +
> > lib/librte_mbuf/rte_mbuf.c | 15 +-
> > lib/librte_mbuf/rte_mbuf.h | 8 +-
> > lib/librte_mempool/rte_mempool.c | 3 +
> > lib/librte_mempool/rte_mempool.h | 1 +
> > mk/rte.app.mk | 1 +
> >  15 files changed, 1439 insertions(+), 10 deletions(-)
> >  create mode 100644 drivers/net/af_xdp/Makefile
> >  create mode 100644 drivers/net/af_xdp/meson.build
> >  create mode 100644 drivers/net/af_xdp/rte_eth_af_xdp.c
> >  create mode 100644 drivers/net/af_xdp/rte_pmd_af_xdp_version.map
> >
> > --
> > 2.13.6
> >