DPDK patches and discussions
From: Stephen Hemminger <stephen@networkplumber.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org
Subject: Re: [RFC PATCH 0/5] Using shared mempools for zero-copy IO proxying
Date: Thu, 6 Feb 2025 17:55:42 -0800
Message-ID: <20250206175542.044244b7@hermes.local>
In-Reply-To: <20230922081912.7090-1-bruce.richardson@intel.com>

On Fri, 22 Sep 2023 09:19:07 +0100
Bruce Richardson <bruce.richardson@intel.com> wrote:

> Following my talk at the recent DPDK Summit [1], here is an RFC patchset
> containing the prototypes I created which led to the talk.  This
> patchset is simply to demonstrate:
> 
> * what is currently possible with DPDK in terms of zero-copy IPC
> * where the big gaps and general problem areas are
> * what the performance is like doing zero-copy between processes
> * how we may look to have new deployment models for DPDK apps.
> 
> This cover letter is quite long, as it covers how to run the demo app
> and use the drivers included in this set. I felt it more accessible this
> way than putting it in rst files in the patches. This patchset depends
> upon patchsets [2] and [3].
> 
> [1] https://dpdksummit2023.sched.com/event/1P9wU
> [2] http://patches.dpdk.org/project/dpdk/list/?series=29536
> [3] http://patches.dpdk.org/project/dpdk/list/?series=29538
> 
> Overview
> --------
> 
> At a high level, the patchset contains the following parts: a proxy
> application which performs packet IO and steers traffic on a per-queue
> basis to other applications which connect to it via unix sockets, and a
> set of drivers to be used by those applications so that they can
> (hopefully) receive packets from the proxy app without any changes to
> their own code. This all helps to demonstrate the feasibility of zero-
> copy packet transfer between independent DPDK apps.
> 
> The drivers are:
> * a bus driver, which makes the connection to the proxy app via
>   the unix socket. Thereafter it accepts the shared memory from the
>   proxy and maps it into the running process for use as buffers,
>   rings, etc. It also handles communication with the proxy app on
>   behalf of the other two drivers.
> * a mempool driver, which simply manages a set of buffers on the basis
>   of offsets within the shared memory area rather than using pointers
>   (a rough sketch of the offset conversion follows this list). The big
>   downside of its use is that it assumes all the objects stored in the
>   mempool are mbufs. (As described in my talk, this is a big issue for
>   which I'm not sure we have a good solution right now.)
> * an ethernet driver, which creates an rx and tx ring in shared memory
>   for use in communicating with the proxy app. All buffers sent/received
>   are converted to offsets within the shared memory area.
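> 
> To illustrate the offset approach, the conversion between mbuf pointers
> and offsets is just pointer arithmetic against the base of the local
> mapping. The snippet below is a simplified sketch only (names are
> illustrative and not taken from the patches):
> 
>    #include <stdint.h>
>    #include <rte_mbuf.h>
> 
>    /*
>     * Sketch only: both sides map the same memfd, possibly at different
>     * virtual addresses, so only offsets relative to each process's own
>     * mapping base are exchanged on the rings.
>     */
>    static inline uintptr_t
>    buf_to_offset(const void *shm_base, const struct rte_mbuf *m)
>    {
>        return (uintptr_t)m - (uintptr_t)shm_base;
>    }
> 
>    static inline struct rte_mbuf *
>    offset_to_buf(void *shm_base, uintptr_t off)
>    {
>        return (struct rte_mbuf *)((uintptr_t)shm_base + off);
>    }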
> 
> The proxy app itself implements all the other logic - mostly inside
> datapath.c - to allow the connecting app to run. When an app connects to
> the unix socket, the proxy app uses memfd to create a hugepage block to
> be passed through to the "guest" app, and then exchanges messages with
> the drivers until the app connection is up and running to handle
> traffic. [Ideally, this IPC-over-unix-socket mechanism should
> probably be generalized into a library used by the app, but for now it's
> just built-in]. As stated above, the steering of traffic is done
> per-queue, that is, each app connects to a specific socket corresponding
> to a NIC queue. For demo purposes, the traffic to the queues is just
> distributed using RSS, but obviously it would be possible to use e.g.
> rte_flow to do more interesting distribution in future.
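> 
> The hugepage handover itself needs only standard Linux primitives. As a
> rough sketch (error handling trimmed, names illustrative and not taken
> from the patches), the proxy creates a hugepage-backed memfd, sizes it,
> and passes the file descriptor to the connecting app over the unix
> socket using SCM_RIGHTS:
> 
>    #define _GNU_SOURCE
>    #include <string.h>
>    #include <sys/mman.h>     /* memfd_create(), MFD_* flags */
>    #include <sys/socket.h>   /* sendmsg(), SCM_RIGHTS, CMSG_* macros */
>    #include <sys/uio.h>      /* struct iovec */
>    #include <unistd.h>       /* ftruncate() */
> 
>    /* Create a hugepage-backed memfd of the given size and send its fd
>     * over the already-connected unix socket "sock_fd". Returns the
>     * memfd on success, -1 on failure. Requires hugepages to be
>     * configured on the host for MFD_HUGETLB to succeed. */
>    static int
>    send_shared_mem(int sock_fd, size_t size)
>    {
>        int mem_fd = memfd_create("io-proxy-shm",
>                MFD_CLOEXEC | MFD_HUGETLB);
>        if (mem_fd < 0 || ftruncate(mem_fd, size) < 0)
>            return -1;
> 
>        char dummy = 0;
>        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
>        union {      /* ensures correct alignment for the cmsg header */
>            char buf[CMSG_SPACE(sizeof(int))];
>            struct cmsghdr align;
>        } u;
>        struct msghdr msg = {
>            .msg_iov = &iov, .msg_iovlen = 1,
>            .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
>        };
>        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
>        cmsg->cmsg_level = SOL_SOCKET;
>        cmsg->cmsg_type = SCM_RIGHTS;
>        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
>        memcpy(CMSG_DATA(cmsg), &mem_fd, sizeof(int));
> 
>        return sendmsg(sock_fd, &msg, 0) < 0 ? -1 : mem_fd;
>    }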
> 
> Running the Apps
> ----------------
> 
> To get things working, just do a DPDK build as normal, then run the
> io-proxy app. It takes a single parameter: the core number to use. For
> example, on my system I run it on lcore 25:
> 
> 	./build/app/dpdk-io-proxy 25
> 
> The sockets to be created, and how they map to ports/queues, are
> controlled via the command line, but a startup script can be provided;
> it just needs to be in the current directory and be named
> "dpdk-io-proxy.cmds". Patch 5 of this set contains an example setup
> that I use, so it's recommended that you run the proxy app from a
> directory containing that file. If so, the proxy app will use two ports
> and create two queues on each, mapping them to 4 unix socket files in
> /tmp. (Each socket is created in its own directory to simplify use with
> docker containers, as described in the next section.)
> 
> No traffic is handled by the app until other end-user apps connect to
> it. Testpmd works as that second "guest" app without any changes to it.
> To run multiple testpmd instances, each taking traffic from a unique RX
> queue and forwarding it back, the following sequence of commands can be
> used [in this case, doing forwarding on cores 26 through 29, and using
> the 4 unix sockets configured using the startup file referenced above].
> 
> 	./build/app/dpdk-testpmd -l 24,26 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_0_0/sock  -- --forward-mode=macswap
> 	./build/app/dpdk-testpmd -l 24,27 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_0_1/sock  -- --forward-mode=macswap
> 	./build/app/dpdk-testpmd -l 24,28 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_1_0/sock  -- --forward-mode=macswap
> 	./build/app/dpdk-testpmd -l 24,29 --no-huge -m1 --no-shconf \
> 		-a sock:/tmp/socket_1_1/sock  -- --forward-mode=macswap
> 
> NOTE:
> * the "--no-huge -m1" is present to guarantee that no regular DPDK
>   hugepage memory is being used by the app. It's all coming from the
>   proxy app's memfd
> * the "--no-shconf" parameter is necessary just to avoid us needing to
>   specify a unix file-prefix for each instance
> * the forwarding type to be used is optional, macswap is chosen just to
>   have some work done inside testpmd to prove it can touch the packet
>   payload, not just the mbuf header.
> 
> Using with docker containers
> ----------------------------
> 
> The testpmd instances run above can also be run within a docker
> container. Using a dockerfile like the one below, we can run testpmd in
> a container, getting packets in a zero-copy manner from the io-proxy
> running on the host.
> 
>    # syntax=docker/dockerfile:1-labs
>    FROM alpine
>    RUN apk add --update alpine-sdk \
>            py3-elftools meson ninja \
>            bsd-compat-headers \
>            linux-headers \
>            numactl-dev \
>            bash
>    ADD . dpdk
>    WORKDIR dpdk
>    RUN rm -rf build
>    RUN meson setup -Denable_drivers=*/shared_mem -Ddisable_libs=* \
>         -Denable_apps=test-pmd -Dtests=false build
>    RUN ninja -v -C build
>    ENTRYPOINT ["/dpdk/build/app/dpdk-testpmd"]
> 
> To access the proxy, all the container needs is access to the unix
> socket on the filesystem. Since, in the example startup script, each
> socket is placed in its own directory, we can use the "--volume"
> parameter to give each instance its own unique unix socket, and
> therefore its own proxied NIC RX/TX queue. To run four testpmd
> instances as above, just in containers, the following commands can be
> used, assuming the dockerfile above is built into an image called
> "testpmd".
> 
> 	docker run -it --volume=/tmp/socket_0_0:/run testpmd \
> 		-l 24,26 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 	docker run -it --volume=/tmp/socket_0_1:/run testpmd \
> 		-l 24,27 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 	docker run -it --volume=/tmp/socket_1_0:/run testpmd \
> 		-l 24,28 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 	docker run -it --volume=/tmp/socket_1_1:/run testpmd \
> 		-l 24,29 --no-huge -a sock:/run/sock -- \
> 		--no-mlockall --forward-mode=macswap
> 
> NOTE: since these docker testpmd instances don't access IO or allocate
> hugepages directly, they should be runnable without extra privileges, so
> long as they can connect to the unix socket.
> 
> Additional info
> ---------------
> 
> * Stats are available via the app command line.
> * By default (#define in code), the proxy app only uses 2 queues per
>   port, so you can't configure more than that via the command line.
> * Any ports used by the proxy script must support queue reconfiguration
>   at runtime without stopping the port.
> * When a "guest" process connected to a socket terminates, all shared
>   memory used by that process is destroyed and a new memfd created on
>   reconnect.
> * The above setups using testpmd are the only ways in which this app and
>   drivers have been tested. I would be hopeful that other apps would
>   work too, but there are quite a few limitations (see my DPDK summit
>   talk for some more details on those).
> 
> Congratulations on reading this far! :-)
> All comments/feedback on this welcome.
> 
> Bruce Richardson (5):
>   bus: new driver to accept shared memory over unix socket
>   mempool: driver for mempools of mbufs on shared memory
>   net: new ethdev driver to communicate using shared mem
>   app: add IO proxy app using shared memory interfaces
>   app/io-proxy: add startup commands
> 
>  app/io-proxy/command_fns.c                 | 160 ++++++
>  app/io-proxy/commands.list                 |   6 +
>  app/io-proxy/datapath.c                    | 595 +++++++++++++++++++++
>  app/io-proxy/datapath.h                    |  37 ++
>  app/io-proxy/datapath_mp.c                 |  78 +++
>  app/io-proxy/dpdk-io-proxy.cmds            |   6 +
>  app/io-proxy/main.c                        |  71 +++
>  app/io-proxy/meson.build                   |  12 +
>  app/meson.build                            |   1 +
>  drivers/bus/meson.build                    |   1 +
>  drivers/bus/shared_mem/meson.build         |  11 +
>  drivers/bus/shared_mem/shared_mem_bus.c    | 323 +++++++++++
>  drivers/bus/shared_mem/shared_mem_bus.h    |  75 +++
>  drivers/bus/shared_mem/version.map         |  11 +
>  drivers/mempool/meson.build                |   1 +
>  drivers/mempool/shared_mem/meson.build     |  10 +
>  drivers/mempool/shared_mem/shared_mem_mp.c |  94 ++++
>  drivers/net/meson.build                    |   1 +
>  drivers/net/shared_mem/meson.build         |  11 +
>  drivers/net/shared_mem/shared_mem_eth.c    | 295 ++++++++++
>  20 files changed, 1799 insertions(+)
>  create mode 100644 app/io-proxy/command_fns.c
>  create mode 100644 app/io-proxy/commands.list
>  create mode 100644 app/io-proxy/datapath.c
>  create mode 100644 app/io-proxy/datapath.h
>  create mode 100644 app/io-proxy/datapath_mp.c
>  create mode 100644 app/io-proxy/dpdk-io-proxy.cmds
>  create mode 100644 app/io-proxy/main.c
>  create mode 100644 app/io-proxy/meson.build
>  create mode 100644 drivers/bus/shared_mem/meson.build
>  create mode 100644 drivers/bus/shared_mem/shared_mem_bus.c
>  create mode 100644 drivers/bus/shared_mem/shared_mem_bus.h
>  create mode 100644 drivers/bus/shared_mem/version.map
>  create mode 100644 drivers/mempool/shared_mem/meson.build
>  create mode 100644 drivers/mempool/shared_mem/shared_mem_mp.c
>  create mode 100644 drivers/net/shared_mem/meson.build
>  create mode 100644 drivers/net/shared_mem/shared_mem_eth.c
> 
> --
> 2.39.2
> 

This looked interesting but appears to be a dead end.
There has been no further work on it, and it was never clear how it differed from memif.
It would need more documentation etc. to be a real NIC.
If there is still interest, please resubmit it.


Thread overview: 8+ messages
2023-09-22  8:19 Bruce Richardson
2023-09-22  8:19 ` [RFC PATCH 1/5] bus: new driver to accept shared memory over unix socket Bruce Richardson
2023-11-23 14:50   ` Jerin Jacob
2023-09-22  8:19 ` [RFC PATCH 2/5] mempool: driver for mempools of mbufs on shared memory Bruce Richardson
2023-09-22  8:19 ` [RFC PATCH 3/5] net: new ethdev driver to communicate using shared mem Bruce Richardson
2023-09-22  8:19 ` [RFC PATCH 4/5] app: add IO proxy app using shared memory interfaces Bruce Richardson
2023-09-22  8:19 ` [RFC PATCH 5/5] app/io-proxy: add startup commands Bruce Richardson
2025-02-07  1:55 ` Stephen Hemminger [this message]
