From: Stephen Hemminger <stephen@networkplumber.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org
Subject: Re: [RFC PATCH 0/5] Using shared mempools for zero-copy IO proxying
Date: Thu, 6 Feb 2025 17:55:42 -0800
Message-ID: <20250206175542.044244b7@hermes.local>
In-Reply-To: <20230922081912.7090-1-bruce.richardson@intel.com>
On Fri, 22 Sep 2023 09:19:07 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:
> Following my talk at the recent DPDK Summit [1], here is an RFC patchset
> containing the prototypes I created which led to the talk. This
> patchset is simply to demonstrate:
>
> * what is currently possible with DPDK in terms of zero-copy IPC
> * where the big gaps, and general problem areas are
> * what the performance is like doing zero-copy between processes
> * how we may look to have new deployment models for DPDK apps.
>
> This cover letter is quite long, as it covers how to run the demo app
> and use the drivers included in this set. I felt it more accessible this
> way than putting it in rst files in the patches. This patchset depends
> upon patchsets [2] and [3].
>
> [1] https://dpdksummit2023.sched.com/event/1P9wU
> [2] http://patches.dpdk.org/project/dpdk/list/?series=29536
> [3] http://patches.dpdk.org/project/dpdk/list/?series=29538
>
> Overview
> --------
>
> The patchset contains, at a high level, the following parts: a proxy
> application which performs packet IO and steers traffic on a per-queue
> basis to other applications which connect to it via unix sockets, and a
> set of drivers to be used by those applications so that they can
> (hopefully) receive packets from the proxy app without any changes to
> their own code. This all helps to demonstrate the feasibility of
> zero-copy packet transfer between independent DPDK apps.
>
> The drivers are:
> * a bus driver, which makes the connection to the proxy app via
> the unix socket. Thereafter it accepts the shared memory from the
> proxy and maps it into the running process for use for buffers,
> rings, etc. It also handles communication with the proxy app on
> behalf of the other two drivers
> * a mempool driver, which simply manages a set of buffers on the basis
> of offsets within the shared memory area rather than using pointers.
> The big downside of its use is that it assumes all the objects stored
> in the mempool are mbufs. (As described in my talk, this is a big
> issue for which I'm not sure we currently have a good solution.)
> * an ethernet driver, which creates an rx and tx ring in shared memory
> for use in communicating with the proxy app. All buffers sent/received
> are converted to offsets within the shared memory area (see the sketch
> after this list).
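>
> For illustration, the pointer/offset conversion that the mempool and
> ethernet drivers rely on is conceptually along these lines (a
> simplified sketch; the structure and helper names are illustrative,
> not the exact code in the patches):
>
> #include <stddef.h>
> #include <stdint.h>
> #include <rte_mbuf.h>
>
> /* local view of the memfd-backed memory received from the proxy */
> struct shared_mem_region {
>     void *base;  /* address at which this process maps the region */
>     size_t len;  /* total length of the region */
> };
>
> /* buffers are referred to by offset into the shared region, since the
>  * proxy and guest map the same memory at different virtual addresses */
> static inline uint64_t
> mbuf_to_offset(const struct shared_mem_region *r, const struct rte_mbuf *m)
> {
>     return (uintptr_t)m - (uintptr_t)r->base;
> }
>
> static inline struct rte_mbuf *
> offset_to_mbuf(const struct shared_mem_region *r, uint64_t off)
> {
>     return (struct rte_mbuf *)((uintptr_t)r->base + off);
> }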
>
> The proxy app itself implements all the other logic - mostly inside
> datapath.c - to allow the connecting app to run. When an app connects to
> the unix socket, the proxy app uses memfd to create a hugepage block to
> be passed through to the "guest" app, and then sends/receives the
> messages from the drivers until the app connection is up and running,
> ready to handle traffic. [Ideally, this IPC-over-unix-socket mechanism
> should probably be generalized into a library used by the app, but for
> now it's just built-in]. As stated above, the steering of traffic is
> done per-queue; that is, each app connects to a specific socket
> corresponding to a NIC queue. For demo purposes, the traffic to the
> queues is just distributed using RSS, but obviously it would be
> possible to use e.g. rte_flow to do more interesting distribution in
> the future.
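>
> To give a flavour of the memfd handover described above, here is a
> simplified sketch of the proxy side (error handling trimmed; the
> function name and details are illustrative, not the actual code in
> datapath.c):
>
> #define _GNU_SOURCE
> #include <string.h>
> #include <unistd.h>
> #include <sys/mman.h>
> #include <sys/socket.h>
> #include <sys/uio.h>
>
> /* create a hugepage-backed memfd and pass the fd to the connecting
>  * "guest" over its unix socket connection */
> static int
> send_memfd_to_guest(int conn_sock, size_t mem_len)
> {
>     int memfd = memfd_create("io-proxy-shm", MFD_CLOEXEC | MFD_HUGETLB);
>     if (memfd < 0 || ftruncate(memfd, mem_len) < 0)
>         return -1;
>
>     /* the file descriptor travels in an SCM_RIGHTS control message */
>     char data[1] = {0};
>     struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
>     char cbuf[CMSG_SPACE(sizeof(int))] = {0};
>     struct msghdr msg = {
>         .msg_iov = &iov, .msg_iovlen = 1,
>         .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
>     };
>     struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
>     cmsg->cmsg_level = SOL_SOCKET;
>     cmsg->cmsg_type = SCM_RIGHTS;
>     cmsg->cmsg_len = CMSG_LEN(sizeof(int));
>     memcpy(CMSG_DATA(cmsg), &memfd, sizeof(int));
>
>     if (sendmsg(conn_sock, &msg, 0) < 0)
>         return -1;
>     return memfd;  /* the proxy keeps its own fd to map the memory */
> }
>
> The guest-side bus driver receives this fd with recvmsg() and mmap()s
> it, after which both processes share the same buffer memory.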
>
> Running the Apps
> ----------------
>
> To get things all working, just do a DPDK build as normal, then run the
> io-proxy app. It takes a single parameter: the core number to use. For
> example, on my system I run it on lcore 25:
>
> ./build/app/dpdk-io-proxy 25
>
> The sockets to be created, and how they map to ports/queues, are
> controlled via the commandline, but a startup script can be provided,
> which just needs to be in the current directory and named
> "dpdk-io-proxy.cmds". Patch 5 of this set contains an example setup
> that I use, so it's recommended that you run the proxy app from a
> directory containing that file. If so, the proxy app will use two ports
> and create two queues on each, mapping them to 4 unix socket files in
> /tmp. (Each socket is created in its own directory to simplify use with
> docker containers, as described in the next section.)
>
> No traffic is handled by the app until other end-user apps connect to
> it. Testpmd works as that second "guest" app without any changes to it.
> To run multiple testpmd instances, each taking traffic from a unique RX
> queue and forwarding it back, the following sequence of commands can be
> used [in this case, doing forwarding on cores 26 through 29, and using
> the 4 unix sockets configured using the startup file referenced above].
>
> ./build/app/dpdk-testpmd -l 24,26 --no-huge -m1 --no-shconf \
> -a sock:/tmp/socket_0_0/sock -- --forward-mode=macswap
> ./build/app/dpdk-testpmd -l 24,27 --no-huge -m1 --no-shconf \
> -a sock:/tmp/socket_0_1/sock -- --forward-mode=macswap
> ./build/app/dpdk-testpmd -l 24,28 --no-huge -m1 --no-shconf \
> -a sock:/tmp/socket_1_0/sock -- --forward-mode=macswap
> ./build/app/dpdk-testpmd -l 24,29 --no-huge -m1 --no-shconf \
> -a sock:/tmp/socket_1_1/sock -- --forward-mode=macswap
>
> NOTE:
> * the "--no-huge -m1" is present to guarantee that no regular DPDK
> hugepage memory is being used by the app. It's all coming from the
> proxy app's memfd
> * the "--no-shconf" parameter is necessary just to avoid needing to
> specify a unique file-prefix for each instance
> * specifying the forwarding mode is optional; macswap is chosen just to
> have some work done inside testpmd, to prove it can touch the packet
> payload and not just the mbuf header.
>
> Using with docker containers
> ----------------------------
>
> The testpmd instances run above can also be run within a docker
> container. Using a dockerfile like the one below, we can run testpmd in
> a container, getting the packets in a zero-copy manner from the io-proxy
> running on the host.
>
> # syntax=docker/dockerfile:1-labs
> FROM alpine
> RUN apk add --update alpine-sdk \
> py3-elftools meson ninja \
> bsd-compat-headers \
> linux-headers \
> numactl-dev \
> bash
> ADD . dpdk
> WORKDIR dpdk
> RUN rm -rf build
> RUN meson setup -Denable_drivers=*/shared_mem -Ddisable_libs=* \
> -Denable_apps=test-pmd -Dtests=false build
> RUN ninja -v -C build
> ENTRYPOINT ["/dpdk/build/app/dpdk-testpmd"]
>
> To access the proxy, all the container needs is access to the unix
> socket on the filesystem. Since, in the example startup script, each
> socket is placed in its own directory, we can use the "--volume"
> parameter to give each instance its own unique unix socket, and
> therefore its own proxied NIC RX/TX queue. To run four testpmd
> instances as above, just in containers, the following commands can be
> used, assuming the dockerfile above is built into an image called
> "testpmd".
>
> docker run -it --volume=/tmp/socket_0_0:/run testpmd \
> -l 24,26 --no-huge -a sock:/run/sock -- \
> --no-mlockall --forward-mode=macswap
> docker run -it --volume=/tmp/socket_0_1:/run testpmd \
> -l 24,27 --no-huge -a sock:/run/sock -- \
> --no-mlockall --forward-mode=macswap
> docker run -it --volume=/tmp/socket_1_0:/run testpmd \
> -l 24,28 --no-huge -a sock:/run/sock -- \
> --no-mlockall --forward-mode=macswap
> docker run -it --volume=/tmp/socket_1_1:/run testpmd \
> -l 24,29 --no-huge -a sock:/run/sock -- \
> --no-mlockall --forward-mode=macswap
>
> NOTE: since these docker testpmd instances don't access IO or allocate
> hugepages directly, they should be runnable without extra privileges, so
> long as they can connect to the unix socket.
>
> Additional info
> ---------------
>
> * Stats are available via the app commandline
> * By default (#define in code), the proxy app only uses 2 queues per
> port, so you can't configure more than that via cmdline
> * Any ports used by the proxy script must support queue reconfiguration
> at runtime without stopping the port.
> * When a "guest" process connected to a socket terminates, all shared
> memory used by that process is destroyed and a new memfd is created on
> reconnect.
> * The above setups using testpmd are the only ways in which this app and
> drivers have been tested. I would be hopeful that other apps would
> work too, but there are quite a few limitations (see my DPDK summit
> talk for some more details on those).
>
> Congratulations on reading this far! :-)
> All comments/feedback on this welcome.
>
> Bruce Richardson (5):
> bus: new driver to accept shared memory over unix socket
> mempool: driver for mempools of mbufs on shared memory
> net: new ethdev driver to communicate using shared mem
> app: add IO proxy app using shared memory interfaces
> app/io-proxy: add startup commands
>
> app/io-proxy/command_fns.c | 160 ++++++
> app/io-proxy/commands.list | 6 +
> app/io-proxy/datapath.c | 595 +++++++++++++++++++++
> app/io-proxy/datapath.h | 37 ++
> app/io-proxy/datapath_mp.c | 78 +++
> app/io-proxy/dpdk-io-proxy.cmds | 6 +
> app/io-proxy/main.c | 71 +++
> app/io-proxy/meson.build | 12 +
> app/meson.build | 1 +
> drivers/bus/meson.build | 1 +
> drivers/bus/shared_mem/meson.build | 11 +
> drivers/bus/shared_mem/shared_mem_bus.c | 323 +++++++++++
> drivers/bus/shared_mem/shared_mem_bus.h | 75 +++
> drivers/bus/shared_mem/version.map | 11 +
> drivers/mempool/meson.build | 1 +
> drivers/mempool/shared_mem/meson.build | 10 +
> drivers/mempool/shared_mem/shared_mem_mp.c | 94 ++++
> drivers/net/meson.build | 1 +
> drivers/net/shared_mem/meson.build | 11 +
> drivers/net/shared_mem/shared_mem_eth.c | 295 ++++++++++
> 20 files changed, 1799 insertions(+)
> create mode 100644 app/io-proxy/command_fns.c
> create mode 100644 app/io-proxy/commands.list
> create mode 100644 app/io-proxy/datapath.c
> create mode 100644 app/io-proxy/datapath.h
> create mode 100644 app/io-proxy/datapath_mp.c
> create mode 100644 app/io-proxy/dpdk-io-proxy.cmds
> create mode 100644 app/io-proxy/main.c
> create mode 100644 app/io-proxy/meson.build
> create mode 100644 drivers/bus/shared_mem/meson.build
> create mode 100644 drivers/bus/shared_mem/shared_mem_bus.c
> create mode 100644 drivers/bus/shared_mem/shared_mem_bus.h
> create mode 100644 drivers/bus/shared_mem/version.map
> create mode 100644 drivers/mempool/shared_mem/meson.build
> create mode 100644 drivers/mempool/shared_mem/shared_mem_mp.c
> create mode 100644 drivers/net/shared_mem/meson.build
> create mode 100644 drivers/net/shared_mem/shared_mem_eth.c
>
> --
> 2.39.2
>
This looked interesting but appears to be a dead end.
There has been no more work on it, and it was never clear how it differed from memif.
It would need more documentation etc. to be a real NIC driver.
If there is still interest, please resubmit it.