* [v12 0/3] net/af_xdp: fix multi interface support for K8s
@ 2024-04-04 13:24 Maryam Tahhan
From: Maryam Tahhan @ 2024-04-04 13:24 UTC (permalink / raw)
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan
The original `use_cni` implementation was limited to
supporting only a single netdev in a DPDK pod. This patchset
aims to fix this limitation transparently to the end user.
It will also enable compatibility with the latest AF_XDP
Device Plugin.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
v12:
* Ensure backwards compatibility with libbpf versions that don't support
xsk_socket__update_xskmap().
v11:
* Fixed up typos picked up by checkpatch.
v10:
* Add UDS acronym
* Update `use_cni` in docs with ``use_cni``
* Remove reference to limitations and simply document behaviour
before and after DPDK 23.11.
v9:
* Fixup checkpatch issues.
v8:
* Go back to using `use_cni` vdev argument
* Introduce `use_map_pinning` vdev param.
* Rename `uds_path` to `dp_path` so that it can be used
with map pinning as well as `use_cni`.
* Set `dp_path` internally in the AF_XDP PMD if it's
not configured by the user.
* Clean up the original `use_cni` documentation separately
to coding changes.
v7:
* Give a more descriptive commit msg headline.
* Fixup typos in documentation.
v6:
* Add link to PR 81 in commit message
* Add release notes changes to this patchset
v5:
* Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
* Remove use_cni references in af_xdp.rst
v4:
* Rename af_xdp_cni.rst to af_xdp_dp.rst
* Removed all incorrect references to CNI throughout af_xdp
PMD file.
* Fixed Typos in af_xdp_dp.rst
v3:
* Remove `use_cni` vdev argument as it's no longer needed.
* Update incorrect CNI references for the AF_XDP DP in the
documentation.
* Update the documentation to run a simple example with the
AF_XDP DP plugin in K8s.
v2:
* Rename sock_path to uds_path.
* Update documentation to reflect when CAP_BPF is needed.
* Fix testpmd arguments in the provided example for Pods.
* Use AF_XDP API to update the xskmap entry.
Maryam Tahhan (3):
docs: AF_XDP Device Plugin
net/af_xdp: fix multi interface support for K8s
net/af_xdp: support AF_XDP DP pinned maps
doc/guides/howto/af_xdp_cni.rst | 253 ------------------
doc/guides/howto/af_xdp_dp.rst | 340 +++++++++++++++++++++++++
doc/guides/howto/index.rst | 2 +-
doc/guides/nics/af_xdp.rst | 44 +++-
doc/guides/rel_notes/release_24_07.rst | 17 ++
drivers/net/af_xdp/compat.h | 15 ++
drivers/net/af_xdp/meson.build | 4 +
drivers/net/af_xdp/rte_eth_af_xdp.c | 171 +++++++++----
8 files changed, 544 insertions(+), 302 deletions(-)
delete mode 100644 doc/guides/howto/af_xdp_cni.rst
create mode 100644 doc/guides/howto/af_xdp_dp.rst
--
2.41.0
* [v12 1/3] docs: AF_XDP Device Plugin
From: Maryam Tahhan @ 2024-04-04 13:24 UTC (permalink / raw)
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan, stable
Fix up the references to the AF_XDP Device Plugin in
the documentation (previously referred to as the CNI)
and document the single netdev limitation for deploying
an AF_XDP based DPDK pod. Also rename af_xdp_cni.rst to
af_xdp_dp.rst.
Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
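Editorial note: the Background section of af_xdp_dp.rst below describes the
PMD connecting to the Device Plugin's Unix Domain Socket and starting a
handshake. The following is a minimal sketch of that first step, not the PMD
code itself. The helper names dp_uds_connect() and dp_format_connect() are
hypothetical, the SOCK_SEQPACKET socket type is an assumption, and the
"/connect,<hostname>" request format is taken from the UDS_CONNECT_MSG define
and the handshake comment in rte_eth_af_xdp.c quoted later in this thread.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define UDS_CONNECT_MSG "/connect"
#define UDS_MAX_CMD_LEN 64

/* Hypothetical helper: connect to the Device Plugin UDS at dp_path.
 * Returns the connected socket fd, or -1 on any failure.
 */
int
dp_uds_connect(const char *dp_path)
{
	struct sockaddr_un server;
	int sock;

	/* Reject paths that would be silently truncated in sun_path. */
	if (strlen(dp_path) >= sizeof(server.sun_path))
		return -1;

	/* Assumption: the Device Plugin listens on a SOCK_SEQPACKET socket. */
	sock = socket(AF_UNIX, SOCK_SEQPACKET, 0);
	if (sock < 0)
		return -1;

	memset(&server, 0, sizeof(server));
	server.sun_family = AF_UNIX;
	memcpy(server.sun_path, dp_path, strlen(dp_path) + 1);

	if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
		close(sock);
		return -1;
	}
	return sock;
}

/* Hypothetical helper: build the first handshake request,
 * "/connect,<hostname>". Returns 0 on success, -1 if it does not fit.
 */
int
dp_format_connect(char *request, size_t len, const char *hostname)
{
	int n = snprintf(request, len, "%s,%s", UDS_CONNECT_MSG, hostname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would connect with the per-netdev socket path, write the formatted
request, and then wait for the Device Plugin's reply before asking for the
XSKMAP file descriptor.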
doc/guides/howto/af_xdp_cni.rst | 253 ---------------------------
doc/guides/howto/af_xdp_dp.rst | 299 ++++++++++++++++++++++++++++++++
doc/guides/howto/index.rst | 2 +-
doc/guides/nics/af_xdp.rst | 4 +-
4 files changed, 302 insertions(+), 256 deletions(-)
delete mode 100644 doc/guides/howto/af_xdp_cni.rst
create mode 100644 doc/guides/howto/af_xdp_dp.rst
diff --git a/doc/guides/howto/af_xdp_cni.rst b/doc/guides/howto/af_xdp_cni.rst
deleted file mode 100644
index a1a6d5b99c..0000000000
--- a/doc/guides/howto/af_xdp_cni.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
- Copyright(c) 2023 Intel Corporation.
-
-Using a CNI with the AF_XDP driver
-==================================
-
-Introduction
-------------
-
-CNI, the Container Network Interface, is a technology for configuring
-container network interfaces
-and which can be used to setup Kubernetes networking.
-AF_XDP is a Linux socket Address Family that enables an XDP program
-to redirect packets to a memory buffer in userspace.
-
-This document explains how to enable the `AF_XDP Plugin for Kubernetes`_ within
-a DPDK application using the :doc:`../nics/af_xdp` to connect and use these technologies.
-
-.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
-
-
-Background
-----------
-
-The standard :doc:`../nics/af_xdp` initialization process involves loading an eBPF program
-onto the kernel netdev to be used by the PMD.
-This operation requires root or escalated Linux privileges
-and thus prevents the PMD from working in an unprivileged container.
-The AF_XDP CNI plugin handles this situation
-by providing a device plugin that performs the program loading.
-
-At a technical level the CNI opens a Unix Domain Socket and listens for a client
-to make requests over that socket.
-A DPDK application acting as a client connects and initiates a configuration "handshake".
-The client then receives a file descriptor which points to the XSKMAP
-associated with the loaded eBPF program.
-The XSKMAP is a BPF map of AF_XDP sockets (XSK).
-The client can then proceed with creating an AF_XDP socket
-and inserting that socket into the XSKMAP pointed to by the descriptor.
-
-The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
-to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
-from the CNI.
-When this flag is set,
-the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
-should be used when creating the socket
-to instruct libbpf not to load the default libbpf program on the netdev.
-Instead the loading is handled by the CNI.
-
-.. note::
-
- The Unix Domain Socket file path appear in the end user is "/tmp/afxdp.sock".
-
-
-Prerequisites
--------------
-
-Docker and container prerequisites:
-
-* Set up the device plugin
- as described in the instructions for `AF_XDP Plugin for Kubernetes`_.
-
-* The Docker image should contain the libbpf and libxdp libraries,
- which are dependencies for AF_XDP,
- and should include support for the ``ethtool`` command.
-
-* The Pod should have enabled the capabilities ``CAP_NET_RAW`` and ``CAP_BPF``
- for AF_XDP along with support for hugepages.
-
-* Increase locked memory limit so containers have enough memory for packet buffers.
- For example:
-
- .. code-block:: console
-
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
-
-* dpdk-testpmd application should have AF_XDP feature enabled.
-
- For further information see the docs for the: :doc:`../../nics/af_xdp`.
-
-
-Example
--------
-
-Howto run dpdk-testpmd with CNI plugin:
-
-* Clone the CNI plugin
-
- .. code-block:: console
-
- # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
-
-* Build the CNI plugin
-
- .. code-block:: console
-
- # cd afxdp-plugins-for-kubernetes/
- # make build
-
- .. note::
-
- CNI plugin has a dependence on the config.json.
-
- Sample Config.json
-
- .. code-block:: json
-
- {
- "logLevel":"debug",
- "logFile":"afxdp-dp-e2e.log",
- "pools":[
- {
- "name":"e2e",
- "mode":"primary",
- "timeout":30,
- "ethtoolCmds" : ["-L -device- combined 1"],
- "devices":[
- {
- "name":"ens785f0"
- }
- ]
- }
- ]
- }
-
- For further reference please use the `config.json`_
-
- .. _config.json: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/config.json
-
-* Create the Network Attachment definition
-
- .. code-block:: console
-
- # kubectl create -f nad.yaml
-
- Sample nad.yml
-
- .. code-block:: yaml
-
- apiVersion: "k8s.cni.cncf.io/v1"
- kind: NetworkAttachmentDefinition
- metadata:
- name: afxdp-e2e-test
- annotations:
- k8s.v1.cni.cncf.io/resourceName: afxdp/e2e
- spec:
- config: '{
- "cniVersion": "0.3.0",
- "type": "afxdp",
- "mode": "cdq",
- "logFile": "afxdp-cni-e2e.log",
- "logLevel": "debug",
- "ipam": {
- "type": "host-local",
- "subnet": "192.168.1.0/24",
- "rangeStart": "192.168.1.200",
- "rangeEnd": "192.168.1.216",
- "routes": [
- { "dst": "0.0.0.0/0" }
- ],
- "gateway": "192.168.1.1"
- }
- }'
-
- For further reference please use the `nad.yaml`_
-
- .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/nad.yaml
-
-* Build the Docker image
-
- .. code-block:: console
-
- # docker build -t afxdp-e2e-test -f Dockerfile .
-
- Sample Dockerfile:
-
- .. code-block:: console
-
- FROM ubuntu:20.04
- RUN apt-get update -y
- RUN apt install build-essential libelf-dev -y
- RUN apt-get install iproute2 acl -y
- RUN apt install python3-pyelftools ethtool -y
- RUN apt install libnuma-dev libjansson-dev libpcap-dev net-tools -y
- RUN apt-get install clang llvm -y
- COPY ./libbpf<version>.tar.gz /tmp
- RUN cd /tmp && tar -xvmf libbpf<version>.tar.gz && cd libbpf/src && make install
- COPY ./libxdp<version>.tar.gz /tmp
- RUN cd /tmp && tar -xvmf libxdp<version>.tar.gz && cd libxdp && make install
-
- .. note::
-
- All the files that need to COPY-ed should be in the same directory as the Dockerfile
-
-* Run the Pod
-
- .. code-block:: console
-
- # kubectl create -f pod.yaml
-
- Sample pod.yaml:
-
- .. code-block:: yaml
-
- apiVersion: v1
- kind: Pod
- metadata:
- name: afxdp-e2e-test
- annotations:
- k8s.v1.cni.cncf.io/networks: afxdp-e2e-test
- spec:
- containers:
- - name: afxdp
- image: afxdp-e2e-test:latest
- imagePullPolicy: Never
- env:
- - name: LD_LIBRARY_PATH
- value: /usr/lib64/:/usr/local/lib/
- command: ["tail", "-f", "/dev/null"]
- securityContext:
- capabilities:
- add:
- - CAP_NET_RAW
- - CAP_BPF
- resources:
- requests:
- hugepages-2Mi: 2Gi
- memory: 2Gi
- afxdp/e2e: '1'
- limits:
- hugepages-2Mi: 2Gi
- memory: 2Gi
- afxdp/e2e: '1'
-
- For further reference please use the `pod.yaml`_
-
- .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/pod-1c1d.yaml
-
-* Run DPDK with a command like the following:
-
- .. code-block:: console
-
- kubectl exec -i <Pod name> --container <containers name> -- \
- /<Path>/dpdk-testpmd -l 0,1 --no-pci \
- --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
- -- --no-mlockall --in-memory
-
-For further reference please use the `e2e`_ test case in `AF_XDP Plugin for Kubernetes`_
-
- .. _e2e: https://github.com/intel/afxdp-plugins-for-kubernetes/tree/v0.0.2/test/e2e
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
new file mode 100644
index 0000000000..7166d904bd
--- /dev/null
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2023 Intel Corporation.
+
+Using the AF_XDP driver in Kubernetes
+=====================================
+
+Introduction
+------------
+
+Two infrastructure components are needed in order to provision a pod that is
+using the AF_XDP PMD in Kubernetes:
+
+1. AF_XDP Device Plugin (DP).
+2. AF_XDP Container Network Interface (CNI) binary.
+
+Both of these components are available through the `AF_XDP Device Plugin for Kubernetes`_
+repository.
+
+The AF_XDP DP provisions and advertises networking interfaces to Kubernetes,
+while the CNI configures and plumbs network interfaces for the Pod.
+
+This document explains how to use the `AF_XDP Device Plugin for Kubernetes`_ with
+a DPDK application using the :doc:`../nics/af_xdp`.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+Background
+----------
+
+The standard :doc:`../nics/af_xdp` initialization process involves loading an eBPF program
+onto the kernel netdev to be used by the PMD.
+This operation requires root or escalated Linux privileges
+and thus prevents the PMD from working in an unprivileged container.
+The AF_XDP Device Plugin handles this situation
+by managing the eBPF program(s) on behalf of the Pod, outside of the pod context.
+
+At a technical level the AF_XDP Device Plugin opens a Unix Domain Socket (UDS) and listens for a client
+to make requests over that socket.
+A DPDK application acting as a client connects and initiates a configuration "handshake".
+After some validation on the Device Plugin side, the client receives a file descriptor which points to the XSKMAP
+associated with the loaded eBPF program.
+The XSKMAP is an eBPF map of AF_XDP sockets (XSK).
+The client can then proceed with creating an AF_XDP socket
+and inserting that socket into the XSKMAP pointed to by the descriptor.
+
+The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
+to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
+from the CNI.
+When this flag is set,
+the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
+should be used when creating the socket
+to instruct libbpf not to load the default libbpf program on the netdev.
+Instead the loading is handled by the AF_XDP Device Plugin.
+
+Limitations
+-----------
+
+For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
+the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
+is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
+and the pod is limited to a single netdev.
+
+.. note::
+
+ DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
+ AF_XDP Device Plugin.
+
+The issue is if a single pod requests different devices from different pools it
+results in multiple UDS servers serving the pod with the container using only a
+single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
+device might be able to complete the handshake. This has been fixed in the AF_XDP
+Device Plugin so that the mount point in the pods for the UDS appear at
+``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
+in the PMD alongside the ``use_cni`` parameter.
+
+.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+
+
+Prerequisites
+-------------
+
+Device Plugin and DPDK container prerequisites:
+
+* Create a DPDK container image.
+
+* Set up the device plugin and prepare the Pod Spec as described in
+ the instructions for `AF_XDP Device Plugin for Kubernetes`_.
+
+* The Docker image should contain the libbpf and libxdp libraries,
+ which are dependencies for AF_XDP,
+ and should include support for the ``ethtool`` command.
+
+* The Pod should have enabled the capabilities ``CAP_NET_RAW`` for
+ AF_XDP socket creation, ``IPC_LOCK`` for umem creation and
+ ``CAP_BPF`` (for Kernel < 5.19) along with support for hugepages.
+
+ .. note::
+
+ For kernel versions < 5.19, all BPF syscalls require CAP_BPF, including those used to access
+ maps shared between the eBPF program and the userspace program. Kernels >= 5.19 only require
+ CAP_BPF for map creation (BPF_MAP_CREATE) and program loading (BPF_PROG_LOAD).
+
+* Increase locked memory limit so containers have enough memory for packet buffers.
+ For example:
+
+ .. code-block:: console
+
+ cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+ [Service]
+ LimitMEMLOCK=infinity
+ EOF
+
+* dpdk-testpmd application should have AF_XDP feature enabled.
+
+ For further information see the docs for the: :doc:`../../nics/af_xdp`.
+
+
+Example
+-------
+
+Build a DPDK container image (using Docker)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Create a Dockerfile (should be placed in top level DPDK directory):
+
+ .. code-block:: console
+
+ FROM fedora:38
+
+ # Setup container to build DPDK applications
+ RUN dnf -y upgrade && dnf -y install \
+ libbsd-devel \
+ numactl-libs \
+ libbpf-devel \
+ libbpf \
+ meson \
+ ninja-build \
+ libxdp-devel \
+ libxdp \
+ numactl-devel \
+ python3-pyelftools \
+ python38 \
+ iproute
+ RUN dnf groupinstall -y 'Development Tools'
+
+ # Create DPDK dir and copy over sources
+ # Create DPDK dir and copy over sources
+ COPY ./ /dpdk
+ WORKDIR /dpdk
+
+ # Build DPDK
+ RUN meson setup build
+ RUN ninja -C build
+
+2. Build a DPDK container image (using Docker)
+
+ .. code-block:: console
+
+ # docker build -t dpdk -f Dockerfile .
+
+Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* Clone the AF_XDP Device plugin and CNI
+
+ .. code-block:: console
+
+ # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
+
+ .. note::
+
+ Ensure you have the AF_XDP Device Plugin + CNI prerequisites installed.
+
+* Build the AF_XDP Device plugin and CNI
+
+ .. code-block:: console
+
+ # cd afxdp-plugins-for-kubernetes/
+ # make image
+
+* Make sure to modify the image used by the `daemonset.yml`_ file in the deployments directory with
+ the following configuration:
+
+ .. _daemonset.yml : https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/deployments/daemonset.yml
+
+ .. code-block:: yaml
+
+ image: afxdp-device-plugin:latest
+
+ .. note::
+
+ This will select the AF_XDP DP image that was built locally. Detailed configuration
+ options can be found in the AF_XDP Device Plugin `readme`_ .
+
+ .. _readme: https://github.com/intel/afxdp-plugins-for-kubernetes#readme
+
+* Deploy the AF_XDP Device Plugin and CNI
+
+ .. code-block:: console
+
+ # kubectl create -f deployments/daemonset.yml
+
+* Create the Network Attachment definition
+
+ .. code-block:: console
+
+ # kubectl create -f nad.yaml
+
+ Sample nad.yml
+
+ .. code-block:: yaml
+
+ apiVersion: "k8s.cni.cncf.io/v1"
+ kind: NetworkAttachmentDefinition
+ metadata:
+ name: afxdp-network
+ annotations:
+ k8s.v1.cni.cncf.io/resourceName: afxdp/myPool
+ spec:
+ config: '{
+ "cniVersion": "0.3.0",
+ "type": "afxdp",
+ "mode": "primary",
+ "logFile": "afxdp-cni.log",
+ "logLevel": "debug",
+ "ethtoolCmds" : ["-N -device- rx-flow-hash udp4 fn",
+ "-N -device- flow-type udp4 dst-port 2152 action 22"
+ ],
+ "ipam": {
+ "type": "host-local",
+ "subnet": "192.168.1.0/24",
+ "rangeStart": "192.168.1.200",
+ "rangeEnd": "192.168.1.220",
+ "routes": [
+ { "dst": "0.0.0.0/0" }
+ ],
+ "gateway": "192.168.1.1"
+ }
+ }'
+
+ For further reference please use the example provided by the AF_XDP DP `nad.yaml`_
+
+ .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/network-attachment-definition.yaml
+
+* Run the Pod
+
+ .. code-block:: console
+
+ # kubectl create -f pod.yaml
+
+ Sample pod.yaml:
+
+ .. code-block:: yaml
+
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: dpdk
+ annotations:
+ k8s.v1.cni.cncf.io/networks: afxdp-network
+ spec:
+ containers:
+ - name: testpmd
+ image: dpdk:latest
+ command: ["tail", "-f", "/dev/null"]
+ securityContext:
+ capabilities:
+ add:
+ - NET_RAW
+ - IPC_LOCK
+ resources:
+ requests:
+ afxdp/myPool: '1'
+ limits:
+ hugepages-1Gi: 2Gi
+ cpu: 2
+ memory: 256Mi
+ afxdp/myPool: '1'
+ volumeMounts:
+ - name: hugepages
+ mountPath: /dev/hugepages
+ volumes:
+ - name: hugepages
+ emptyDir:
+ medium: HugePages
+
+ For further reference please use the `pod.yaml`_
+
+ .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
+
+* Run DPDK with a command like the following:
+
+ .. code-block:: console
+
+ kubectl exec -i <Pod name> --container <containers name> -- \
+ /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+ --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
+ --no-mlockall --in-memory \
+ -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
diff --git a/doc/guides/howto/index.rst b/doc/guides/howto/index.rst
index 71a3381c36..a7692e8a97 100644
--- a/doc/guides/howto/index.rst
+++ b/doc/guides/howto/index.rst
@@ -8,7 +8,7 @@ HowTo Guides
:maxdepth: 2
:numbered:
- af_xdp_cni
+ af_xdp_dp
lm_bond_virtio_sriov
lm_virtio_vhost_user
flow_bifurcation
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 1932525d4d..4dd9c73742 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -155,9 +155,9 @@ use_cni
~~~~~~~
The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
-enable the `AF_XDP Plugin for Kubernetes`_ within a DPDK application.
+enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
-.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
.. code-block:: console
--
2.41.0
* [v12 2/3] net/af_xdp: fix multi interface support for K8s
From: Maryam Tahhan @ 2024-04-04 13:24 UTC (permalink / raw)
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan, stable
The original 'use_cni' implementation was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However, 'use_cni' used a hardcoded socket path rather
than a configurable one. If a DPDK pod requests
multiple net devices and these devices are from
different pools, then all the netdev UDSes are
mounted in the pod at the same path, /tmp/afxdp.sock,
which means that at best only one netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket path configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it is configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.
[1] https://github.com/intel/afxdp-plugins-for-kubernetes/pull/81
Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
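Editorial note: the commit message above says that 'dp_path' is configured
inside the PMD when the user does not pass it. The following is a minimal
sketch of that selection logic, not the code from this patch. The helper name
resolve_dp_path() is hypothetical. The two defines mirror DP_BASE_PATH and
DP_UDS_SOCK added to rte_eth_af_xdp.c below, and the resulting default matches
the /tmp/afxdp_dp/<interface name>/afxdp.sock path given in the documentation.

```c
#include <stdio.h>
#include <string.h>

#define DP_BASE_PATH "/tmp/afxdp_dp"
#define DP_UDS_SOCK  "afxdp.sock"

/* Hypothetical helper: choose the UDS path for one netdev.
 * A non-empty user_path (the dp_path vdev argument) wins; otherwise the
 * per-netdev default is built from ifname, which must not be NULL.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
int
resolve_dp_path(char *buf, size_t len, const char *user_path, const char *ifname)
{
	int n;

	if (user_path != NULL && user_path[0] != '\0')
		n = snprintf(buf, len, "%s", user_path);
	else
		n = snprintf(buf, len, "%s/%s/%s", DP_BASE_PATH, ifname, DP_UDS_SOCK);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Passing "/tmp/afxdp.sock" as user_path corresponds to the backwards
compatible case for older Device Plugin versions described in the
documentation changes below.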
doc/guides/howto/af_xdp_dp.rst | 62 ++++++++++------
doc/guides/nics/af_xdp.rst | 14 ++++
doc/guides/rel_notes/release_24_07.rst | 7 ++
drivers/net/af_xdp/compat.h | 15 ++++
drivers/net/af_xdp/meson.build | 4 ++
drivers/net/af_xdp/rte_eth_af_xdp.c | 98 ++++++++++++++++----------
6 files changed, 143 insertions(+), 57 deletions(-)
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 7166d904bd..4aa6b5499f 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,29 +52,33 @@ should be used when creating the socket
to instruct libbpf not to load the default libbpf program on the netdev.
Instead the loading is handled by the AF_XDP Device Plugin.
-Limitations
------------
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
-For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
-the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+.. note::
+
+ DPDK AF_XDP PMD <= v23.11 will only work with the AF_XDP Device Plugin
+ <= commit id `38317c2`_.
.. note::
- DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
- AF_XDP Device Plugin.
+ DPDK AF_XDP PMD > v23.11 will work with the latest version of the
+ AF_XDP Device Plugin through a combination of the ``dp_path`` and/or
+ the ``use_cni`` parameter. In these versions of the PMD if a user doesn't
+ explicitly set the ``dp_path`` parameter when using ``use_cni`` then that
+ path is transparently configured in the AF_XDP PMD to the default
+ `AF_XDP Device Plugin for Kubernetes`_ mount point path. The path can
+ be overridden by explicitly setting the ``dp_path`` param.
-The issue is if a single pod requests different devices from different pools it
-results in multiple UDS servers serving the pod with the container using only a
-single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
-device might be able to complete the handshake. This has been fixed in the AF_XDP
-Device Plugin so that the mount point in the pods for the UDS appear at
-``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
-in the PMD alongside the ``use_cni`` parameter.
+.. note::
-.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+ DPDK AF_XDP PMD > v23.11 is backwards compatible with (older) versions
+ of the AF_XDP DP <= commit id `38317c2`_ by explicitly setting ``dp_path`` to
+ ``/tmp/afxdp.sock``.
+.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
Prerequisites
-------------
@@ -105,10 +109,10 @@ Device Plugin and DPDK container prerequisites:
.. code-block:: console
- cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
- [Service]
- LimitMEMLOCK=infinity
- EOF
+ cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+ [Service]
+ LimitMEMLOCK=infinity
+ EOF
* dpdk-testpmd application should have AF_XDP feature enabled.
@@ -284,7 +288,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
emptyDir:
medium: HugePages
- For further reference please use the `pod.yaml`_
+ For further reference please see the `pod.yaml`_
.. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
@@ -297,3 +301,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
--no-mlockall --in-memory \
-- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+ Or
+
+ .. code-block:: console
+
+ kubectl exec -i <Pod name> --container <containers name> -- \
+ /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+ --vdev=net_af_xdp0,use_cni=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock" \
+ --no-mlockall --in-memory \
+ -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+.. note::
+
+ If the ``dp_path`` parameter isn't explicitly set (as in the first example above)
+ the AF_XDP PMD will set the parameter value to
+ ``/tmp/afxdp_dp/<interface name>/afxdp.sock``.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 4dd9c73742..7f8651beda 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,6 +171,20 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
so enabling and disabling of the promiscuous mode through the DPDK application
is also not supported.
+dp_path
+~~~~~~~
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
+alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+ --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock"
+
Limitations
-----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24cf99..2b85ae55aa 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -55,6 +55,13 @@ New Features
Also, make sure to start the actual text at the margin.
=======================================================
+* **Enabled AF_XDP PMD multi interface (UDS) support with AF_XDP Device Plugin**.
+
+ The EAL vdev argument for the AF_XDP PMD ``use_cni`` previously limited
+ a pod to using only a single netdev/interface. The latest changes (adding
+ the ``dp_path`` parameter) remove this limitation and maintain backward
+ compatibility for any applications already using the ``use_cni`` vdev
+ argument with the AF_XDP Device Plugin.
Removed Items
-------------
diff --git a/drivers/net/af_xdp/compat.h b/drivers/net/af_xdp/compat.h
index 28ea64aeaa..62a60d242b 100644
--- a/drivers/net/af_xdp/compat.h
+++ b/drivers/net/af_xdp/compat.h
@@ -46,6 +46,21 @@ create_shared_socket(struct xsk_socket **xsk_ptr __rte_unused,
}
#endif
+#ifdef ETH_AF_XDP_UPDATE_XSKMAP
+static __rte_always_inline int
+update_xskmap(struct xsk_socket *xsk, int map_fd, int xsk_queue_idx __rte_unused)
+{
+ return xsk_socket__update_xskmap(xsk, map_fd);
+}
+#else
+static __rte_always_inline int
+update_xskmap(struct xsk_socket *xsk, int map_fd, int xsk_queue_idx)
+{
+ int fd = xsk_socket__fd(xsk);
+ return bpf_map_update_elem(map_fd, &xsk_queue_idx, &fd, 0);
+}
+#endif
+
#ifdef XDP_USE_NEED_WAKEUP
static int
tx_syscall_needed(struct xsk_ring_prod *q)
diff --git a/drivers/net/af_xdp/meson.build b/drivers/net/af_xdp/meson.build
index 9f33e57fa2..280bfa8f80 100644
--- a/drivers/net/af_xdp/meson.build
+++ b/drivers/net/af_xdp/meson.build
@@ -77,6 +77,10 @@ if build
dependencies : bpf_dep, args: cflags)
cflags += ['-DRTE_NET_AF_XDP_LIBBPF_XDP_ATTACH']
endif
+ if cc.has_function('xsk_socket__update_xskmap', prefix : xsk_check_prefix,
+ dependencies : ext_deps, args: cflags)
+ cflags += ['-DETH_AF_XDP_UPDATE_XSKMAP']
+ endif
endif
require_iova_in_mbuf = false
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 268a130c49..83903ae82a 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -83,12 +83,13 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
#define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
+#define DP_BASE_PATH "/tmp/afxdp_dp"
+#define DP_UDS_SOCK "afxdp.sock"
#define MAX_LONG_OPT_SZ 64
#define UDS_MAX_FD_NUM 2
#define UDS_MAX_CMD_LEN 64
#define UDS_MAX_CMD_RESP 128
#define UDS_XSK_MAP_FD_MSG "/xsk_map_fd"
-#define UDS_SOCK "/tmp/afxdp.sock"
#define UDS_CONNECT_MSG "/connect"
#define UDS_HOST_OK_MSG "/host_ok"
#define UDS_HOST_NAK_MSG "/host_nak"
@@ -171,6 +172,7 @@ struct pmd_internals {
bool custom_prog_configured;
bool force_copy;
bool use_cni;
+ char dp_path[PATH_MAX];
struct bpf_map *map;
struct rte_ether_addr eth_addr;
@@ -191,6 +193,7 @@ struct pmd_process_private {
#define ETH_AF_XDP_BUDGET_ARG "busy_budget"
#define ETH_AF_XDP_FORCE_COPY_ARG "force_copy"
#define ETH_AF_XDP_USE_CNI_ARG "use_cni"
+#define ETH_AF_XDP_DP_PATH_ARG "dp_path"
static const char * const valid_arguments[] = {
ETH_AF_XDP_IFACE_ARG,
@@ -201,6 +204,7 @@ static const char * const valid_arguments[] = {
ETH_AF_XDP_BUDGET_ARG,
ETH_AF_XDP_FORCE_COPY_ARG,
ETH_AF_XDP_USE_CNI_ARG,
+ ETH_AF_XDP_DP_PATH_ARG,
NULL
};
@@ -1351,7 +1355,7 @@ configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
}
static int
-init_uds_sock(struct sockaddr_un *server)
+init_uds_sock(struct sockaddr_un *server, const char *dp_path)
{
int sock;
@@ -1362,7 +1366,7 @@ init_uds_sock(struct sockaddr_un *server)
}
server->sun_family = AF_UNIX;
- strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
+ strlcpy(server->sun_path, dp_path, sizeof(server->sun_path));
if (connect(sock, (struct sockaddr *)server, sizeof(struct sockaddr_un)) < 0) {
close(sock);
@@ -1382,7 +1386,7 @@ struct msg_internal {
};
static int
-send_msg(int sock, char *request, int *fd)
+send_msg(int sock, char *request, int *fd, const char *dp_path)
{
int snd;
struct iovec iov;
@@ -1393,7 +1397,7 @@ send_msg(int sock, char *request, int *fd)
memset(&dst, 0, sizeof(dst));
dst.sun_family = AF_UNIX;
- strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
+ strlcpy(dst.sun_path, dp_path, sizeof(dst.sun_path));
/* Initialize message header structure */
memset(&msgh, 0, sizeof(msgh));
@@ -1470,8 +1474,8 @@ read_msg(int sock, char *response, struct sockaddr_un *s, int *fd)
}
static int
-make_request_cni(int sock, struct sockaddr_un *server, char *request,
- int *req_fd, char *response, int *out_fd)
+make_request_dp(int sock, struct sockaddr_un *server, char *request,
+ int *req_fd, char *response, int *out_fd, const char *dp_path)
{
int rval;
@@ -1483,7 +1487,7 @@ make_request_cni(int sock, struct sockaddr_un *server, char *request,
if (req_fd == NULL)
rval = write(sock, request, strlen(request));
else
- rval = send_msg(sock, request, req_fd);
+ rval = send_msg(sock, request, req_fd, dp_path);
if (rval < 0) {
AF_XDP_LOG(ERR, "Write error %s\n", strerror(errno));
@@ -1507,7 +1511,7 @@ check_response(char *response, char *exp_resp, long size)
}
static int
-get_cni_fd(char *if_name)
+uds_get_xskmap_fd(char *if_name, const char *dp_path)
{
char request[UDS_MAX_CMD_LEN], response[UDS_MAX_CMD_RESP];
char hostname[MAX_LONG_OPT_SZ], exp_resp[UDS_MAX_CMD_RESP];
@@ -1520,14 +1524,14 @@ get_cni_fd(char *if_name)
return -1;
memset(&server, 0, sizeof(server));
- sock = init_uds_sock(&server);
+ sock = init_uds_sock(&server, dp_path);
if (sock < 0)
return -1;
- /* Initiates handshake to CNI send: /connect,hostname */
+ /* Initiates handshake to the AF_XDP Device Plugin send: /connect,hostname */
snprintf(request, sizeof(request), "%s,%s", UDS_CONNECT_MSG, hostname);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1541,7 +1545,7 @@ get_cni_fd(char *if_name)
/* Request for "/version" */
strlcpy(request, UDS_VERSION_MSG, UDS_MAX_CMD_LEN);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1549,7 +1553,7 @@ get_cni_fd(char *if_name)
/* Request for file descriptor for netdev name*/
snprintf(request, sizeof(request), "%s,%s", UDS_XSK_MAP_FD_MSG, if_name);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1571,7 +1575,7 @@ get_cni_fd(char *if_name)
/* Initiate close connection */
strlcpy(request, UDS_FIN_MSG, UDS_MAX_CMD_LEN);
memset(response, 0, sizeof(response));
- if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+ if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
goto err_close;
}
@@ -1695,21 +1699,22 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
}
if (internals->use_cni) {
- int err, fd, map_fd;
+ int err, map_fd;
- /* get socket fd from CNI plugin */
- map_fd = get_cni_fd(internals->if_name);
+ /* get socket fd from AF_XDP Device Plugin */
+ map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
if (map_fd < 0) {
- AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
+ AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
goto out_xsk;
}
- /* get socket fd */
- fd = xsk_socket__fd(rxq->xsk);
- err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx, &fd, 0);
+
+ err = update_xskmap(rxq->xsk, map_fd, rxq->xsk_queue_idx);
if (err) {
- AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in map.\n");
+ AF_XDP_LOG(ERR, "Failed to insert xsk in map.\n");
goto out_xsk;
}
+ AF_XDP_LOG(DEBUG, "Inserted xsk in map.\n");
+
} else if (rxq->busy_budget) {
ret = configure_preferred_busy_poll(rxq);
if (ret) {
@@ -1881,13 +1886,13 @@ static const struct eth_dev_ops ops = {
.get_monitor_addr = eth_get_monitor_addr,
};
-/* CNI option works in unprivileged container environment
- * and ethernet device functionality will be reduced. So
- * additional customiszed eth_dev_ops struct is needed
- * for cni. Promiscuous enable and disable functionality
- * is removed.
+/* AF_XDP Device Plugin option works in unprivileged
+ * container environments and ethernet device functionality
+ * will be reduced. So additional customised eth_dev_ops
+ * struct is needed for the Device Plugin. Promiscuous
+ * enable and disable functionality is removed.
**/
-static const struct eth_dev_ops ops_cni = {
+static const struct eth_dev_ops ops_afxdp_dp = {
.dev_start = eth_dev_start,
.dev_stop = eth_dev_stop,
.dev_close = eth_dev_close,
@@ -2023,7 +2028,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
static int
parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
int *queue_cnt, int *shared_umem, char *prog_path,
- int *busy_budget, int *force_copy, int *use_cni)
+ int *busy_budget, int *force_copy, int *use_cni,
+ char *dp_path)
{
int ret;
@@ -2069,6 +2075,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
if (ret < 0)
goto free_kvlist;
+ ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
+ &parse_prog_arg, dp_path);
+ if (ret < 0)
+ goto free_kvlist;
+
free_kvlist:
rte_kvargs_free(kvlist);
return ret;
@@ -2108,7 +2119,7 @@ static struct rte_eth_dev *
init_internals(struct rte_vdev_device *dev, const char *if_name,
int start_queue_idx, int queue_cnt, int shared_umem,
const char *prog_path, int busy_budget, int force_copy,
- int use_cni)
+ int use_cni, const char *dp_path)
{
const char *name = rte_vdev_device_name(dev);
const unsigned int numa_node = dev->device.numa_node;
@@ -2138,6 +2149,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
internals->shared_umem = shared_umem;
internals->force_copy = force_copy;
internals->use_cni = use_cni;
+ strlcpy(internals->dp_path, dp_path, PATH_MAX);
if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
&internals->combined_queue_cnt)) {
@@ -2199,7 +2211,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
if (!internals->use_cni)
eth_dev->dev_ops = &ops;
else
- eth_dev->dev_ops = &ops_cni;
+ eth_dev->dev_ops = &ops_afxdp_dp;
eth_dev->rx_pkt_burst = eth_af_xdp_rx;
eth_dev->tx_pkt_burst = eth_af_xdp_tx;
@@ -2328,6 +2340,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
int busy_budget = -1, ret;
int force_copy = 0;
int use_cni = 0;
+ char dp_path[PATH_MAX] = {'\0'};
struct rte_eth_dev *eth_dev = NULL;
const char *name = rte_vdev_device_name(dev);
@@ -2370,7 +2383,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
&xsk_queue_cnt, &shared_umem, prog_path,
- &busy_budget, &force_copy, &use_cni) < 0) {
+ &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
AF_XDP_LOG(ERR, "Invalid kvargs value\n");
return -EINVAL;
}
@@ -2384,7 +2397,19 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
if (use_cni && strnlen(prog_path, PATH_MAX)) {
AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
- return -EINVAL;
+ return -EINVAL;
+ }
+
+ if (use_cni && !strnlen(dp_path, PATH_MAX)) {
+ snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_UDS_SOCK);
+ AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+ ETH_AF_XDP_DP_PATH_ARG, dp_path);
+ }
+
+ if (!use_cni && strnlen(dp_path, PATH_MAX)) {
+ AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
+ ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+ return -EINVAL;
}
if (strlen(if_name) == 0) {
@@ -2410,7 +2435,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
xsk_queue_cnt, shared_umem, prog_path,
- busy_budget, force_copy, use_cni);
+ busy_budget, force_copy, use_cni, dp_path);
if (eth_dev == NULL) {
AF_XDP_LOG(ERR, "Failed to init internals\n");
return -1;
@@ -2471,4 +2496,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
"xdp_prog=<string> "
"busy_budget=<int> "
"force_copy=<int> "
- "use_cni=<int> ");
+ "use_cni=<int> "
+ "dp_path=<string> ");
--
2.41.0
* [v12 3/3] net/af_xdp: support AF_XDP DP pinned maps
2024-04-04 13:24 [v12 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-04-04 13:24 ` [v12 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
2024-04-04 13:24 ` [v12 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
@ 2024-04-04 13:24 ` Maryam Tahhan
2 siblings, 0 replies; 4+ messages in thread
From: Maryam Tahhan @ 2024-04-04 13:24 UTC (permalink / raw)
To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
david.marchand, shibin.koikkara.reeny, ciara.loftus
Cc: dev, Maryam Tahhan
Enable the AF_XDP PMD to retrieve the xskmap
from a pinned eBPF map. This map is expected
to be pinned by an external entity like the
AF_XDP Device Plugin. This enables unprivileged
pods to create and use AF_XDP sockets.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
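Note: the following is a minimal sketch of how the probe-time checks in this
patch resolve ``dp_path``, assuming the defaults defined in rte_eth_af_xdp.c
(DP_BASE_PATH, DP_UDS_SOCK, DP_XSK_MAP). The helper name resolve_dp_path() and
the truncation check are hypothetical and are not part of the patch; the PMD
performs these checks inline in rte_pmd_af_xdp_probe().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Defaults mirrored from the patch. */
#define DP_BASE_PATH "/tmp/afxdp_dp"
#define DP_UDS_SOCK  "afxdp.sock"
#define DP_XSK_MAP   "xsks_map"

/*
 * Hypothetical helper (not in the patch): validate the use_cni /
 * use_pinned_map combination and fill in dp_path when the user left it empty.
 * Returns 0 on success or a negative errno value.
 */
static int
resolve_dp_path(int use_cni, int use_pinned_map, const char *if_name,
		char *dp_path, size_t len)
{
	int n;

	assert(if_name != NULL && dp_path != NULL);

	/* The two modes are mutually exclusive. */
	if (use_cni && use_pinned_map)
		return -EINVAL;

	/* dp_path is only meaningful when one of the modes is enabled. */
	if (!use_cni && !use_pinned_map)
		return dp_path[0] != '\0' ? -EINVAL : 0;

	/* A user-supplied path is kept as-is. */
	if (dp_path[0] != '\0')
		return 0;

	/* Default: <base>/<iface>/<UDS socket or pinned map name>. */
	n = snprintf(dp_path, len, "%s/%s/%s", DP_BASE_PATH, if_name,
		     use_cni ? DP_UDS_SOCK : DP_XSK_MAP);
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG; /* assumption: the patch does not check this */

	return 0;
}
```

With an empty ``dp_path`` and an interface called eth0, this sketch yields
/tmp/afxdp_dp/eth0/afxdp.sock for ``use_cni`` and
/tmp/afxdp_dp/eth0/xsks_map for ``use_pinned_map``.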
doc/guides/howto/af_xdp_dp.rst | 35 ++++++++--
doc/guides/nics/af_xdp.rst | 34 ++++++++--
doc/guides/rel_notes/release_24_07.rst | 10 +++
drivers/net/af_xdp/rte_eth_af_xdp.c | 93 ++++++++++++++++++++------
4 files changed, 141 insertions(+), 31 deletions(-)
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 4aa6b5499f..8b9b5ebbad 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,10 +52,21 @@ should be used when creating the socket
to instruct libbpf not to load the default libbpf program on the netdev.
Instead the loading is handled by the AF_XDP Device Plugin.
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
-argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``use_pinned_map`` tells the AF_XDP PMD to
+retrieve the XSKMAP fd from a pinned eBPF map. This map is expected to be pinned
+by an external entity like the AF_XDP Device Plugin. This enables unprivileged pods
+to create and use AF_XDP sockets. When this flag is set, the
+``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag is used by the AF_XDP PMD when
+creating the AF_XDP socket.
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_ defaults.
.. note::
@@ -312,8 +323,18 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
--no-mlockall --in-memory \
-- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+ Or
+
+ .. code-block:: console
+
+ kubectl exec -i <Pod name> --container <containers name> -- \
+ /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+ --vdev=net_af_xdp0,use_pinned_map=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/xsks_map" \
+ --no-mlockall --in-memory \
+ -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
.. note::
- If the ``dp_path`` parameter isn't explicitly set (like the example above)
- the AF_XDP PMD will set the parameter value to
- ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
+ If the ``dp_path`` parameter isn't explicitly set with ``use_cni`` or ``use_pinned_map``
+ the AF_XDP PMD will set the parameter values to the `AF_XDP Device Plugin for Kubernetes`_
+ defaults.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 7f8651beda..940bbf60f2 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,13 +171,35 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
so enabling and disabling of the promiscuous mode through the DPDK application
is also not supported.
+use_pinned_map
+~~~~~~~~~~~~~~
+
+The EAL vdev argument ``use_pinned_map`` is used to indicate that the user wishes to
+load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the DPDK
+application/pod.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+ --vdev=net_af_xdp0,use_pinned_map=1
+
+.. note::
+
+ This feature can also be used with any external entity that can pin an eBPF map, not just
+ the `AF_XDP Device Plugin for Kubernetes`_.
+
dp_path
~~~~~~~
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
-alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_ defaults.
.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
@@ -185,6 +207,10 @@ alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
--vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
+.. code-block:: console
+
+ --vdev=net_af_xdp0,use_pinned_map=1,dp_path="/tmp/afxdp_dp/<<interface name>>/xsks_map"
+
Limitations
-----------
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index 2b85ae55aa..ffc4e02944 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -63,6 +63,16 @@ New Features
compatibility for any applications already using the ``use_cni`` vdev
argument with the AF_XDP Device Plugin.
+* **Integrated AF_XDP PMD with AF_XDP Device Plugin eBPF map pinning support**.
+
+ The EAL vdev argument for the AF_XDP PMD ``use_pinned_map`` was added
+ to allow Kubernetes Pods to use AF_XDP with DPDK, and run with limited
+ privileges, without having to do a full handshake over a Unix Domain
+ Socket with the Device Plugin. This flag indicates that the AF_XDP PMD
+ will be used in unprivileged mode and will obtain the XSKMAP FD by calling
+ ``bpf_obj_get()`` for an xskmap pinned (by the AF_XDP DP) inside the
+ container.
+
Removed Items
-------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 83903ae82a..3b861bff89 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -85,6 +85,7 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
#define DP_BASE_PATH "/tmp/afxdp_dp"
#define DP_UDS_SOCK "afxdp.sock"
+#define DP_XSK_MAP "xsks_map"
#define MAX_LONG_OPT_SZ 64
#define UDS_MAX_FD_NUM 2
#define UDS_MAX_CMD_LEN 64
@@ -172,6 +173,7 @@ struct pmd_internals {
bool custom_prog_configured;
bool force_copy;
bool use_cni;
+ bool use_pinned_map;
char dp_path[PATH_MAX];
struct bpf_map *map;
@@ -193,6 +195,7 @@ struct pmd_process_private {
#define ETH_AF_XDP_BUDGET_ARG "busy_budget"
#define ETH_AF_XDP_FORCE_COPY_ARG "force_copy"
#define ETH_AF_XDP_USE_CNI_ARG "use_cni"
+#define ETH_AF_XDP_USE_PINNED_MAP_ARG "use_pinned_map"
#define ETH_AF_XDP_DP_PATH_ARG "dp_path"
static const char * const valid_arguments[] = {
@@ -204,6 +207,7 @@ static const char * const valid_arguments[] = {
ETH_AF_XDP_BUDGET_ARG,
ETH_AF_XDP_FORCE_COPY_ARG,
ETH_AF_XDP_USE_CNI_ARG,
+ ETH_AF_XDP_USE_PINNED_MAP_ARG,
ETH_AF_XDP_DP_PATH_ARG,
NULL
};
@@ -1258,6 +1262,21 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
}
#endif
+static int
+get_pinned_map(const char *dp_path, int *map_fd)
+{
+ *map_fd = bpf_obj_get(dp_path);
+ if (*map_fd < 0) {
+ AF_XDP_LOG(ERR, "Failed to find xsks_map in %s\n", dp_path);
+ return -1;
+ }
+
+ AF_XDP_LOG(INFO, "Successfully retrieved map %s with fd %d\n",
+ dp_path, *map_fd);
+
+ return 0;
+}
+
static int
load_custom_xdp_prog(const char *prog_path, int if_index, struct bpf_map **map)
{
@@ -1644,7 +1663,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
#endif
/* Disable libbpf from loading XDP program */
- if (internals->use_cni)
+ if (internals->use_cni || internals->use_pinned_map)
cfg.libbpf_flags |= XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
if (strnlen(internals->prog_path, PATH_MAX)) {
@@ -1698,14 +1717,23 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
}
}
- if (internals->use_cni) {
+ if (internals->use_cni || internals->use_pinned_map) {
int err, map_fd;
- /* get socket fd from AF_XDP Device Plugin */
- map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
- if (map_fd < 0) {
- AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
- goto out_xsk;
+ if (internals->use_cni) {
+ /* get socket fd from AF_XDP Device Plugin */
+ map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
+ if (map_fd < 0) {
+ AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
+ goto out_xsk;
+ }
+ } else {
+ /* get xskmap fd from the pinned eBPF map */
+ err = get_pinned_map(internals->dp_path, &map_fd);
+ if (err < 0 || map_fd < 0) {
+ AF_XDP_LOG(ERR, "Failed to retrieve pinned map fd\n");
+ goto out_xsk;
+ }
}
err = update_xskmap(rxq->xsk, map_fd, rxq->xsk_queue_idx);
@@ -2029,7 +2057,7 @@ static int
parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
int *queue_cnt, int *shared_umem, char *prog_path,
int *busy_budget, int *force_copy, int *use_cni,
- char *dp_path)
+ int *use_pinned_map, char *dp_path)
{
int ret;
@@ -2075,6 +2103,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
if (ret < 0)
goto free_kvlist;
+ ret = rte_kvargs_process(kvlist, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+ &parse_integer_arg, use_pinned_map);
+ if (ret < 0)
+ goto free_kvlist;
+
ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
&parse_prog_arg, dp_path);
if (ret < 0)
@@ -2119,7 +2152,7 @@ static struct rte_eth_dev *
init_internals(struct rte_vdev_device *dev, const char *if_name,
int start_queue_idx, int queue_cnt, int shared_umem,
const char *prog_path, int busy_budget, int force_copy,
- int use_cni, const char *dp_path)
+ int use_cni, int use_pinned_map, const char *dp_path)
{
const char *name = rte_vdev_device_name(dev);
const unsigned int numa_node = dev->device.numa_node;
@@ -2149,6 +2182,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
internals->shared_umem = shared_umem;
internals->force_copy = force_copy;
internals->use_cni = use_cni;
+ internals->use_pinned_map = use_pinned_map;
strlcpy(internals->dp_path, dp_path, PATH_MAX);
if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
@@ -2208,7 +2242,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
eth_dev->data->dev_link = pmd_link;
eth_dev->data->mac_addrs = &internals->eth_addr;
eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
- if (!internals->use_cni)
+ if (!internals->use_cni && !internals->use_pinned_map)
eth_dev->dev_ops = &ops;
else
eth_dev->dev_ops = &ops_afxdp_dp;
@@ -2340,6 +2374,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
int busy_budget = -1, ret;
int force_copy = 0;
int use_cni = 0;
+ int use_pinned_map = 0;
char dp_path[PATH_MAX] = {'\0'};
struct rte_eth_dev *eth_dev = NULL;
const char *name = rte_vdev_device_name(dev);
@@ -2383,20 +2418,29 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
&xsk_queue_cnt, &shared_umem, prog_path,
- &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
+ &busy_budget, &force_copy, &use_cni, &use_pinned_map,
+ dp_path) < 0) {
AF_XDP_LOG(ERR, "Invalid kvargs value\n");
return -EINVAL;
}
- if (use_cni && busy_budget > 0) {
+ if (use_cni && use_pinned_map) {
AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
- ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_BUDGET_ARG);
+ ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG);
return -EINVAL;
}
- if (use_cni && strnlen(prog_path, PATH_MAX)) {
- AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
- ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
+ if ((use_cni || use_pinned_map) && busy_budget > 0) {
+ AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+ ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+ ETH_AF_XDP_BUDGET_ARG);
+ return -EINVAL;
+ }
+
+ if ((use_cni || use_pinned_map) && strnlen(prog_path, PATH_MAX)) {
+ AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+ ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+ ETH_AF_XDP_PROG_ARG);
return -EINVAL;
}
@@ -2406,9 +2450,16 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
ETH_AF_XDP_DP_PATH_ARG, dp_path);
}
- if (!use_cni && strnlen(dp_path, PATH_MAX)) {
- AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
- ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+ if (use_pinned_map && !strnlen(dp_path, PATH_MAX)) {
+ snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_XSK_MAP);
+ AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+ ETH_AF_XDP_DP_PATH_ARG, dp_path);
+ }
+
+ if ((!use_cni && !use_pinned_map) && strnlen(dp_path, PATH_MAX)) {
+ AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' or '%s' were not enabled\n",
+ ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG,
+ ETH_AF_XDP_USE_PINNED_MAP_ARG);
return -EINVAL;
}
@@ -2435,7 +2486,8 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
xsk_queue_cnt, shared_umem, prog_path,
- busy_budget, force_copy, use_cni, dp_path);
+ busy_budget, force_copy, use_cni, use_pinned_map,
+ dp_path);
if (eth_dev == NULL) {
AF_XDP_LOG(ERR, "Failed to init internals\n");
return -1;
@@ -2497,4 +2549,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
"busy_budget=<int> "
"force_copy=<int> "
"use_cni=<int> "
+ "use_pinned_map=<int> "
"dp_path=<string> ");
--
2.41.0