DPDK patches and discussions
 help / color / mirror / Atom feed
* [v8 0/3] net/af_xdp: fix multi interface support for K8s
@ 2024-02-14 16:06 Maryam Tahhan
  2024-02-14 16:06 ` [v8 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-14 16:06 UTC (permalink / raw)
  To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
	david.marchand, shibin.koikkara.reeny, ciara.loftus
  Cc: dev, Maryam Tahhan

The original `use_cni` implementation was limited to
supporting only a single netdev in a DPDK pod. This patchset
aims to fix this limitation transparently to the end user.
It will also enable compatibility with the latest AF_XDP
Device Plugin. 

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
v8:
* Go back to using `use_cni` vdev argument
* Introduce `use_map_pinning` vdev param.
* Rename `uds_path` to `dp_path` so that it can be used
  with map pinning as well as `use_cni`.
* Set `dp_path` internally in the AF_XDP PMD if it's
  not configured by the user.
* Clean up the original `use_cni` documentation separately
  to coding changes.

v7:
* Give a more descriptive commit msg headline.
* Fixup typos in documentation.

v6:
* Add link to PR 81 in commit message
* Add release notes changes to this patchset

v5:
* Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
* Remove use_cni references in af_xdp.rst

v4:
* Rename af_xdp_cni.rst to af_xdp_dp.rst
* Removed all incorrect references to CNI throughout af_xdp
  PMD file.
* Fixed Typos in af_xdp_dp.rst

v3:
* Remove `use_cni` vdev argument as it's no longer needed.
* Update incorrect CNI references for the AF_XDP DP in the
  documentation.
* Update the documentation to run a simple example with the
  AF_XDP DP plugin in K8s.

v2:
* Rename sock_path to uds_path.
* Update documentation to reflect when CAP_BPF is needed.
* Fix testpmd arguments in the provided example for Pods.
* Use AF_XDP API to update the xskmap entry.
---
Maryam Tahhan (3):
  docs: AF_XDP Device Plugin
  net/af_xdp: fix multi interface support for K8s
  net/af_xdp: support AF_XDP DP pinned maps

 doc/guides/howto/af_xdp_cni.rst        | 253 ------------------
 doc/guides/howto/af_xdp_dp.rst         | 349 +++++++++++++++++++++++++
 doc/guides/howto/index.rst             |   2 +-
 doc/guides/nics/af_xdp.rst             |  44 +++-
 doc/guides/rel_notes/release_24_03.rst |  17 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 162 ++++++++----
 6 files changed, 527 insertions(+), 300 deletions(-)
 delete mode 100644 doc/guides/howto/af_xdp_cni.rst
 create mode 100644 doc/guides/howto/af_xdp_dp.rst

-- 
2.41.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [v8 1/3] docs: AF_XDP Device Plugin
  2024-02-14 16:06 [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
@ 2024-02-14 16:06 ` Maryam Tahhan
  2024-02-14 16:06 ` [v8 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-14 16:06 UTC (permalink / raw)
  To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
	david.marchand, shibin.koikkara.reeny, ciara.loftus
  Cc: dev, Maryam Tahhan, stable

Fixup the references to the AF_XDP Device Plugin in
the documentation (was referred to as CNI previously)
and document the single netdev limitation for deploying
an AF_XDP based DPDK pod. Also renames af_xdp_cni.rst to
af_xdp_dp.rst

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
 doc/guides/howto/af_xdp_cni.rst | 253 ---------------------------
 doc/guides/howto/af_xdp_dp.rst  | 299 ++++++++++++++++++++++++++++++++
 doc/guides/howto/index.rst      |   2 +-
 doc/guides/nics/af_xdp.rst      |   4 +-
 4 files changed, 302 insertions(+), 256 deletions(-)
 delete mode 100644 doc/guides/howto/af_xdp_cni.rst
 create mode 100644 doc/guides/howto/af_xdp_dp.rst

diff --git a/doc/guides/howto/af_xdp_cni.rst b/doc/guides/howto/af_xdp_cni.rst
deleted file mode 100644
index a1a6d5b99c..0000000000
--- a/doc/guides/howto/af_xdp_cni.rst
+++ /dev/null
@@ -1,253 +0,0 @@
-.. SPDX-License-Identifier: BSD-3-Clause
-   Copyright(c) 2023 Intel Corporation.
-
-Using a CNI with the AF_XDP driver
-==================================
-
-Introduction
-------------
-
-CNI, the Container Network Interface, is a technology for configuring
-container network interfaces
-and which can be used to setup Kubernetes networking.
-AF_XDP is a Linux socket Address Family that enables an XDP program
-to redirect packets to a memory buffer in userspace.
-
-This document explains how to enable the `AF_XDP Plugin for Kubernetes`_ within
-a DPDK application using the :doc:`../nics/af_xdp` to connect and use these technologies.
-
-.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
-
-
-Background
-----------
-
-The standard :doc:`../nics/af_xdp` initialization process involves loading an eBPF program
-onto the kernel netdev to be used by the PMD.
-This operation requires root or escalated Linux privileges
-and thus prevents the PMD from working in an unprivileged container.
-The AF_XDP CNI plugin handles this situation
-by providing a device plugin that performs the program loading.
-
-At a technical level the CNI opens a Unix Domain Socket and listens for a client
-to make requests over that socket.
-A DPDK application acting as a client connects and initiates a configuration "handshake".
-The client then receives a file descriptor which points to the XSKMAP
-associated with the loaded eBPF program.
-The XSKMAP is a BPF map of AF_XDP sockets (XSK).
-The client can then proceed with creating an AF_XDP socket
-and inserting that socket into the XSKMAP pointed to by the descriptor.
-
-The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
-to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
-from the CNI.
-When this flag is set,
-the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
-should be used when creating the socket
-to instruct libbpf not to load the default libbpf program on the netdev.
-Instead the loading is handled by the CNI.
-
-.. note::
-
-   The Unix Domain Socket file path appear in the end user is "/tmp/afxdp.sock".
-
-
-Prerequisites
--------------
-
-Docker and container prerequisites:
-
-* Set up the device plugin
-  as described in the instructions for `AF_XDP Plugin for Kubernetes`_.
-
-* The Docker image should contain the libbpf and libxdp libraries,
-  which are dependencies for AF_XDP,
-  and should include support for the ``ethtool`` command.
-
-* The Pod should have enabled the capabilities ``CAP_NET_RAW`` and ``CAP_BPF``
-  for AF_XDP along with support for hugepages.
-
-* Increase locked memory limit so containers have enough memory for packet buffers.
-  For example:
-
-  .. code-block:: console
-
-     cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
-     [Service]
-     LimitMEMLOCK=infinity
-     EOF
-
-* dpdk-testpmd application should have AF_XDP feature enabled.
-
-  For further information see the docs for the: :doc:`../../nics/af_xdp`.
-
-
-Example
--------
-
-Howto run dpdk-testpmd with CNI plugin:
-
-* Clone the CNI plugin
-
-  .. code-block:: console
-
-     # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
-
-* Build the CNI plugin
-
-  .. code-block:: console
-
-     # cd afxdp-plugins-for-kubernetes/
-     # make build
-
-  .. note::
-
-     CNI plugin has a dependence on the config.json.
-
-  Sample Config.json
-
-  .. code-block:: json
-
-     {
-        "logLevel":"debug",
-        "logFile":"afxdp-dp-e2e.log",
-        "pools":[
-           {
-              "name":"e2e",
-              "mode":"primary",
-              "timeout":30,
-              "ethtoolCmds" : ["-L -device- combined 1"],
-              "devices":[
-                 {
-                    "name":"ens785f0"
-                 }
-              ]
-           }
-        ]
-     }
-
-  For further reference please use the `config.json`_
-
-  .. _config.json: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/config.json
-
-* Create the Network Attachment definition
-
-  .. code-block:: console
-
-     # kubectl create -f nad.yaml
-
-  Sample nad.yml
-
-  .. code-block:: yaml
-
-      apiVersion: "k8s.cni.cncf.io/v1"
-      kind: NetworkAttachmentDefinition
-      metadata:
-        name: afxdp-e2e-test
-        annotations:
-          k8s.v1.cni.cncf.io/resourceName: afxdp/e2e
-      spec:
-        config: '{
-            "cniVersion": "0.3.0",
-            "type": "afxdp",
-            "mode": "cdq",
-            "logFile": "afxdp-cni-e2e.log",
-            "logLevel": "debug",
-            "ipam": {
-              "type": "host-local",
-              "subnet": "192.168.1.0/24",
-              "rangeStart": "192.168.1.200",
-              "rangeEnd": "192.168.1.216",
-              "routes": [
-                { "dst": "0.0.0.0/0" }
-              ],
-              "gateway": "192.168.1.1"
-            }
-          }'
-
-  For further reference please use the `nad.yaml`_
-
-  .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/nad.yaml
-
-* Build the Docker image
-
-  .. code-block:: console
-
-     # docker build -t afxdp-e2e-test -f Dockerfile .
-
-  Sample Dockerfile:
-
-  .. code-block:: console
-
-     FROM ubuntu:20.04
-     RUN apt-get update -y
-     RUN apt install build-essential libelf-dev -y
-     RUN apt-get install iproute2  acl -y
-     RUN apt install python3-pyelftools ethtool -y
-     RUN apt install libnuma-dev libjansson-dev libpcap-dev net-tools -y
-     RUN apt-get install clang llvm -y
-     COPY ./libbpf<version>.tar.gz /tmp
-     RUN cd /tmp && tar -xvmf libbpf<version>.tar.gz && cd libbpf/src && make install
-     COPY ./libxdp<version>.tar.gz /tmp
-     RUN cd /tmp && tar -xvmf libxdp<version>.tar.gz && cd libxdp && make install
-
-  .. note::
-
-     All the files that need to COPY-ed should be in the same directory as the Dockerfile
-
-* Run the Pod
-
-  .. code-block:: console
-
-     # kubectl create -f pod.yaml
-
-  Sample pod.yaml:
-
-  .. code-block:: yaml
-
-     apiVersion: v1
-     kind: Pod
-     metadata:
-       name: afxdp-e2e-test
-       annotations:
-         k8s.v1.cni.cncf.io/networks: afxdp-e2e-test
-     spec:
-       containers:
-       - name: afxdp
-         image: afxdp-e2e-test:latest
-         imagePullPolicy: Never
-         env:
-         - name: LD_LIBRARY_PATH
-           value: /usr/lib64/:/usr/local/lib/
-         command: ["tail", "-f", "/dev/null"]
-         securityContext:
-          capabilities:
-             add:
-               - CAP_NET_RAW
-               - CAP_BPF
-         resources:
-           requests:
-             hugepages-2Mi: 2Gi
-             memory: 2Gi
-             afxdp/e2e: '1'
-           limits:
-             hugepages-2Mi: 2Gi
-             memory: 2Gi
-             afxdp/e2e: '1'
-
-  For further reference please use the `pod.yaml`_
-
-  .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/pod-1c1d.yaml
-
-* Run DPDK with a command like the following:
-
-  .. code-block:: console
-
-     kubectl exec -i <Pod name> --container <containers name> -- \
-           /<Path>/dpdk-testpmd -l 0,1 --no-pci \
-           --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
-           -- --no-mlockall --in-memory
-
-For further reference please use the `e2e`_ test case in `AF_XDP Plugin for Kubernetes`_
-
-  .. _e2e: https://github.com/intel/afxdp-plugins-for-kubernetes/tree/v0.0.2/test/e2e
diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
new file mode 100644
index 0000000000..657fc8d52c
--- /dev/null
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -0,0 +1,299 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright(c) 2023 Intel Corporation.
+
+Using the AF_XDP driver in Kubernetes
+=====================================
+
+Introduction
+------------
+
+Two infrastructure components are needed in order to provision a pod that is
+using the AF_XDP PMD in Kubernetes:
+
+1. AF_XDP Device Plugin (DP).
+2. AF_XDP Container Network Interface (CNI) binary.
+
+Both of these components are available through the `AF_XDP Device Plugin for Kubernetes`_
+repository.
+
+The AF_XDP DP provisions and advertises networking interfaces to Kubernetes,
+while the CNI configures and plumbs network interfaces for the Pod.
+
+This document explains how to use the `AF_XDP Device Plugin for Kubernetes`_ with
+a DPDK application using the :doc:`../nics/af_xdp`.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+Background
+----------
+
+The standard :doc:`../nics/af_xdp` initialization process involves loading an eBPF program
+onto the kernel netdev to be used by the PMD.
+This operation requires root or escalated Linux privileges
+and thus prevents the PMD from working in an unprivileged container.
+The AF_XDP Device Plugin handles this situation
+by managing the eBPF program(s) on behalf of the Pod, outside of the pod context.
+
+At a technical level the AF_XDP Device Plugin opens a Unix Domain Socket and listens for a client
+to make requests over that socket.
+A DPDK application acting as a client connects and initiates a configuration "handshake".
+After some validation on the Device Plugin side, the client receives a file descriptor which points to the XSKMAP
+associated with the loaded eBPF program.
+The XSKMAP is an eBPF map of AF_XDP sockets (XSK).
+The client can then proceed with creating an AF_XDP socket
+and inserting that socket into the XSKMAP pointed to by the descriptor.
+
+The EAL vdev argument ``use_cni`` is used to indicate that the user wishes
+to run the PMD in unprivileged mode and to receive the XSKMAP file descriptor
+from the CNI.
+When this flag is set,
+the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag
+should be used when creating the socket
+to instruct libbpf not to load the default libbpf program on the netdev.
+Instead the loading is handled by the AF_XDP Device Plugin.
+
+Limitations
+-----------
+
+For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
+the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
+is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
+and the pod is limited to a single netdev.
+
+.. note::
+
+    DPDK AF_XDP PMD <= v23.11 will not work with the latest version of the
+    AF_XDP Device Plugin.
+
+The issue is if a single pod requests different devices from different pools it
+results in multiple UDS servers serving the pod with the container using only a
+single mount point for their UDS as ``/tmp/afxdp.sock``. This means that at best one
+device might be able to complete the handshake. This has been fixed in the AF_XDP
+Device Plugin so that the mount point in the pods for the UDS appear at
+``/tmp/afxdp_dp/<netdev>/afxdp.sock``. Later versions of DPDK fix this hardcoded path
+in the PMD alongside the `use_cni` parameter.
+
+.. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
+
+
+Prerequisites
+-------------
+
+Device Plugin and DPDK container prerequisites:
+
+* Create a DPDK container image.
+
+* Set up the device plugin and prepare the Pod Spec as described in
+  the instructions for `AF_XDP Device Plugin for Kubernetes`_.
+
+* The Docker image should contain the libbpf and libxdp libraries,
+  which are dependencies for AF_XDP,
+  and should include support for the ``ethtool`` command.
+
+* The Pod should have enabled the capabilities ``CAP_NET_RAW`` for
+  AF_XDP socket creation, ``IPC_LOCK`` for umem creation and
+  ``CAP_BPF`` (for Kernel < 5.19) along with support for hugepages.
+
+  .. note::
+
+    For Kernel versions < 5.19, all BPF sys calls required CAP_BPF, to access maps shared
+    between the eBFP program and the userspace program. Kernels >= 5.19, only requires CAP_BPF
+    for map creation (BPF_MAP_CREATE) and loading programs (BPF_PROG_LOAD).
+
+* Increase locked memory limit so containers have enough memory for packet buffers.
+  For example:
+
+  .. code-block:: console
+
+     cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+     [Service]
+     LimitMEMLOCK=infinity
+     EOF
+
+* dpdk-testpmd application should have AF_XDP feature enabled.
+
+  For further information see the docs for the: :doc:`../../nics/af_xdp`.
+
+
+Example
+-------
+
+Build a DPDK container image (using Docker)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+1. Create a Dockerfile (should be placed in top level DPDK directory):
+
+  .. code-block:: console
+
+    FROM fedora:38
+
+    # Setup container to build DPDK applications
+    RUN dnf -y upgrade && dnf -y install \
+        libbsd-devel \
+        numactl-libs \
+        libbpf-devel \
+        libbpf \
+        meson \
+        ninja-build \
+        libxdp-devel \
+        libxdp \
+        numactl-devel \
+        python3-pyelftools \
+        python38 \
+        iproute
+    RUN dnf groupinstall -y 'Development Tools'
+
+    # Create DPDK dir and copy over sources
+    # Create DPDK dir and copy over sources
+    COPY ./ /dpdk
+    WORKDIR /dpdk
+
+    # Build DPDK
+    RUN meson setup build
+    RUN ninja -C build
+
+2. Build a DPDK container image (using Docker)
+
+  .. code-block:: console
+
+    # docker build -t dpdk -f Dockerfile
+
+Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* Clone the AF_XDP Device plugin and CNI
+
+  .. code-block:: console
+
+     # git clone https://github.com/intel/afxdp-plugins-for-kubernetes.git
+
+  .. note::
+
+    Ensure you have the AF_XDP Device Plugin + CNI prerequisites installed.
+
+* Build the AF_XDP Device plugin and CNI
+
+  .. code-block:: console
+
+     # cd afxdp-plugins-for-kubernetes/
+     # make image
+
+* Make sure to modify the image used by the `daemonset.yml`_ file in the deployments directory with
+  the following configuration:
+
+   .. _daemonset.yml : https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/deployments/daemonset.yml
+
+  .. code-block:: yaml
+
+    image: afxdp-device-plugin:latest
+
+  .. note::
+
+    This will select the AF_XDP DP image that was built locally. Detailed configuration
+    options can be found in the AF_XDP Device Plugin `readme`_ .
+
+  .. _readme: https://github.com/intel/afxdp-plugins-for-kubernetes#readme
+
+* Deploy the AF_XDP Device Plugin and CNI
+
+  .. code-block:: console
+
+    # kubectl create -f deployments/daemonset.yml
+
+* Create the Network Attachment definition
+
+  .. code-block:: console
+
+     # kubectl create -f nad.yaml
+
+  Sample nad.yml
+
+  .. code-block:: yaml
+
+    apiVersion: "k8s.cni.cncf.io/v1"
+    kind: NetworkAttachmentDefinition
+    metadata:
+      name: afxdp-network
+      annotations:
+        k8s.v1.cni.cncf.io/resourceName: afxdp/myPool
+    spec:
+      config: '{
+          "cniVersion": "0.3.0",
+          "type": "afxdp",
+          "mode": "primary",
+          "logFile": "afxdp-cni.log",
+          "logLevel": "debug",
+          "ethtoolCmds" : ["-N -device- rx-flow-hash udp4 fn",
+                           "-N -device- flow-type udp4 dst-port 2152 action 22"
+                        ],
+          "ipam": {
+            "type": "host-local",
+            "subnet": "192.168.1.0/24",
+            "rangeStart": "192.168.1.200",
+            "rangeEnd": "192.168.1.220",
+            "routes": [
+              { "dst": "0.0.0.0/0" }
+            ],
+            "gateway": "192.168.1.1"
+          }
+        }'
+
+  For further reference please use the example provided by the AF_XDP DP `nad.yaml`_
+
+  .. _nad.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/network-attachment-definition.yaml
+
+* Run the Pod
+
+  .. code-block:: console
+
+     # kubectl create -f pod.yaml
+
+  Sample pod.yaml:
+
+  .. code-block:: yaml
+
+    apiVersion: v1
+    kind: Pod
+    metadata:
+     name: dpdk
+     annotations:
+       k8s.v1.cni.cncf.io/networks: afxdp-network
+    spec:
+      containers:
+      - name: testpmd
+        image: dpdk:latest
+        command: ["tail", "-f", "/dev/null"]
+        securityContext:
+          capabilities:
+            add:
+              - NET_RAW
+              - IPC_LOCK
+        resources:
+          requests:
+            afxdp/myPool: '1'
+          limits:
+            hugepages-1Gi: 2Gi
+            cpu: 2
+            memory: 256Mi
+            afxdp/myPool: '1'
+        volumeMounts:
+        - name: hugepages
+          mountPath: /dev/hugepages
+      volumes:
+      - name: hugepages
+        emptyDir:
+          medium: HugePages
+
+  For further reference please use the `pod.yaml`_
+
+  .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
+
+* Run DPDK with a command like the following:
+
+  .. code-block:: console
+
+     kubectl exec -i <Pod name> --container <containers name> -- \
+           /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+           --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
+           --no-mlockall --in-memory \
+           -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
diff --git a/doc/guides/howto/index.rst b/doc/guides/howto/index.rst
index 71a3381c36..a7692e8a97 100644
--- a/doc/guides/howto/index.rst
+++ b/doc/guides/howto/index.rst
@@ -8,7 +8,7 @@ HowTo Guides
     :maxdepth: 2
     :numbered:
 
-    af_xdp_cni
+    af_xdp_dp
     lm_bond_virtio_sriov
     lm_virtio_vhost_user
     flow_bifurcation
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 1932525d4d..4dd9c73742 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -155,9 +155,9 @@ use_cni
 ~~~~~~~
 
 The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
-enable the `AF_XDP Plugin for Kubernetes`_ within a DPDK application.
+enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
 
-.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
 
 .. code-block:: console
 
-- 
2.41.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [v8 2/3] net/af_xdp: fix multi interface support for K8s
  2024-02-14 16:06 [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
  2024-02-14 16:06 ` [v8 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
@ 2024-02-14 16:06 ` Maryam Tahhan
  2024-02-14 16:06 ` [v8 3/3] net/af_xdp: support AF_XDP DP pinned maps Maryam Tahhan
  2024-02-14 16:18 ` [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
  3 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-14 16:06 UTC (permalink / raw)
  To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
	david.marchand, shibin.koikkara.reeny, ciara.loftus
  Cc: dev, Maryam Tahhan, stable

The original 'use_cni' implementation, was added
to enable support for the AF_XDP PMD in a K8s env
without any escalated privileges.
However 'use_cni' used a hardcoded socket rather
than a configurable one. If a DPDK pod is requesting
multiple net devices and these devices are from
different pools, then the AF_XDP PMD attempts to
mount all the netdev UDSes in the pod as /tmp/afxdp.sock.
Which means that at best only 1 netdev will handshake
correctly with the AF_XDP DP. This patch addresses
this by making the socket parameter configurable using
a new vdev param called 'dp_path' alongside the
original 'use_cni' param. If the 'dp_path' parameter
is not set alongside the 'use_cni' parameter, then
it's configured inside the AF_XDP PMD (transparently
to the user). This change has been tested
with the AF_XDP DP PR 81[1], with both single and
multiple interfaces.

Fixes: 7fc6ae50369d ("net/af_xdp: support CNI Integration")
Cc: stable@dpdk.org

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
 doc/guides/howto/af_xdp_dp.rst         | 43 ++++++++++--
 doc/guides/nics/af_xdp.rst             | 14 ++++
 doc/guides/rel_notes/release_24_03.rst |  7 ++
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 92 ++++++++++++++++----------
 4 files changed, 115 insertions(+), 41 deletions(-)

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 657fc8d52c..8a64ec5599 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,13 +52,18 @@ should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
+argument then the AF_XDP PMD configures it internally.
+
 Limitations
 -----------
 
 For DPDK versions <= v23.11 the Unix Domain Socket file path appears in
 the pod at "/tmp/afxdp.sock". The handshake implementation in the AF_XDP PMD
-is only compatible with the AF_XDP Device Plugin up to commit id `38317c2`_
-and the pod is limited to a single netdev.
+is only compatible with the `AF_XDP Device Plugin for Kubernetes`_  up to
+commit id `38317c2`_ and the pod is limited to a single netdev.
 
 .. note::
 
@@ -75,6 +80,14 @@ in the PMD alongside the `use_cni` parameter.
 
 .. _38317c2: https://github.com/intel/afxdp-plugins-for-kubernetes/commit/38317c256b5c7dfb39e013a0f76010c2ded03669
 
+.. note::
+
+    The introduction of the ``dp_path`` EAL vdev argument fixes the limitation above. If a
+    user doesn't explicitly set the ``dp_path``parameter when using ``use_cni`` then that
+    path is transparently configured in the AF_XDP PMD to the default
+    `AF_XDP Device Plugin for Kubernetes`_ mount point path. This is compatible with the latest
+    AF_XDP Device Plugin. For backwards compatibility with versions of the AF_XDP DP <= commit
+    id `38317c2`_ please explicitly set ``dp_path`` to ``/tmp/afxdp.sock``.
 
 Prerequisites
 -------------
@@ -105,10 +118,10 @@ Device Plugin and DPDK container prerequisites:
 
   .. code-block:: console
 
-     cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
-     [Service]
-     LimitMEMLOCK=infinity
-     EOF
+    cat << EOF | sudo tee /etc/systemd/system/containerd.service.d/limits.conf
+    [Service]
+    LimitMEMLOCK=infinity
+    EOF
 
 * dpdk-testpmd application should have AF_XDP feature enabled.
 
@@ -284,7 +297,7 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
         emptyDir:
           medium: HugePages
 
-  For further reference please use the `pod.yaml`_
+  For further reference please see the `pod.yaml`_
 
   .. _pod.yaml: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/examples/pod-spec.yaml
 
@@ -297,3 +310,19 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
            --vdev=net_af_xdp0,use_cni=1,iface=<interface name> \
            --no-mlockall --in-memory \
            -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+  Or
+
+  .. code-block:: console
+
+     kubectl exec -i <Pod name> --container <containers name> -- \
+           /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+           --vdev=net_af_xdp0,use_cni=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/afxdp.sock" \
+           --no-mlockall --in-memory \
+           -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
+.. note::
+
+    If the ``dp_path`` parameter isn't explicitly set (like the example above)
+    the AF_XDP PMD will set the parameter value to
+    ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 4dd9c73742..7f8651beda 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,6 +171,20 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
    so enabling and disabling of the promiscuous mode through the DPDK application
    is also not supported.
 
+dp_path
+~~~~~~~
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
+to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
+`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
+alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
+
 Limitations
 -----------
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 6f8ad27808..7f3653ad6f 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -55,6 +55,13 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Enabled AF_XDP PMD multi interface (UDS) support with AF_XDP Device Plugin**.
+
+  The EAL vdev argument for the AF_XDP PMD ``use_cni`` previously limited
+  a pod to using only a single netdev/interface. The latest changes (adding
+  the ``dp_path`` parameter) remove this limitation and maintain backward
+  compatibility for any applications already using the ``use_cni`` vdev
+  argument with the AF_XDP Device Plugin.
 
 Removed Items
 -------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 353c8688ec..6be8badf54 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -83,12 +83,13 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
 
 #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
 
+#define DP_BASE_PATH			"/tmp/afxdp_dp"
+#define DP_UDS_SOCK             "afxdp.sock"
 #define MAX_LONG_OPT_SZ			64
 #define UDS_MAX_FD_NUM			2
 #define UDS_MAX_CMD_LEN			64
 #define UDS_MAX_CMD_RESP		128
 #define UDS_XSK_MAP_FD_MSG		"/xsk_map_fd"
-#define UDS_SOCK			"/tmp/afxdp.sock"
 #define UDS_CONNECT_MSG			"/connect"
 #define UDS_HOST_OK_MSG			"/host_ok"
 #define UDS_HOST_NAK_MSG		"/host_nak"
@@ -171,6 +172,7 @@ struct pmd_internals {
 	bool custom_prog_configured;
 	bool force_copy;
 	bool use_cni;
+	char dp_path[PATH_MAX];
 	struct bpf_map *map;
 
 	struct rte_ether_addr eth_addr;
@@ -191,6 +193,7 @@ struct pmd_process_private {
 #define ETH_AF_XDP_BUDGET_ARG			"busy_budget"
 #define ETH_AF_XDP_FORCE_COPY_ARG		"force_copy"
 #define ETH_AF_XDP_USE_CNI_ARG			"use_cni"
+#define ETH_AF_XDP_DP_PATH_ARG			"dp_path"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
@@ -201,6 +204,7 @@ static const char * const valid_arguments[] = {
 	ETH_AF_XDP_BUDGET_ARG,
 	ETH_AF_XDP_FORCE_COPY_ARG,
 	ETH_AF_XDP_USE_CNI_ARG,
+	ETH_AF_XDP_DP_PATH_ARG,
 	NULL
 };
 
@@ -1351,7 +1355,7 @@ configure_preferred_busy_poll(struct pkt_rx_queue *rxq)
 }
 
 static int
-init_uds_sock(struct sockaddr_un *server)
+init_uds_sock(struct sockaddr_un *server, const char *dp_path)
 {
 	int sock;
 
@@ -1362,7 +1366,7 @@ init_uds_sock(struct sockaddr_un *server)
 	}
 
 	server->sun_family = AF_UNIX;
-	strlcpy(server->sun_path, UDS_SOCK, sizeof(server->sun_path));
+	strlcpy(server->sun_path, dp_path, sizeof(server->sun_path));
 
 	if (connect(sock, (struct sockaddr *)server, sizeof(struct sockaddr_un)) < 0) {
 		close(sock);
@@ -1382,7 +1386,7 @@ struct msg_internal {
 };
 
 static int
-send_msg(int sock, char *request, int *fd)
+send_msg(int sock, char *request, int *fd, const char *dp_path)
 {
 	int snd;
 	struct iovec iov;
@@ -1393,7 +1397,7 @@ send_msg(int sock, char *request, int *fd)
 
 	memset(&dst, 0, sizeof(dst));
 	dst.sun_family = AF_UNIX;
-	strlcpy(dst.sun_path, UDS_SOCK, sizeof(dst.sun_path));
+	strlcpy(dst.sun_path, dp_path, sizeof(dst.sun_path));
 
 	/* Initialize message header structure */
 	memset(&msgh, 0, sizeof(msgh));
@@ -1470,8 +1474,8 @@ read_msg(int sock, char *response, struct sockaddr_un *s, int *fd)
 }
 
 static int
-make_request_cni(int sock, struct sockaddr_un *server, char *request,
-		 int *req_fd, char *response, int *out_fd)
+make_request_dp(int sock, struct sockaddr_un *server, char *request,
+		 int *req_fd, char *response, int *out_fd, const char *dp_path)
 {
 	int rval;
 
@@ -1483,7 +1487,7 @@ make_request_cni(int sock, struct sockaddr_un *server, char *request,
 	if (req_fd == NULL)
 		rval = write(sock, request, strlen(request));
 	else
-		rval = send_msg(sock, request, req_fd);
+		rval = send_msg(sock, request, req_fd, dp_path);
 
 	if (rval < 0) {
 		AF_XDP_LOG(ERR, "Write error %s\n", strerror(errno));
@@ -1507,7 +1511,7 @@ check_response(char *response, char *exp_resp, long size)
 }
 
 static int
-get_cni_fd(char *if_name)
+uds_get_xskmap_fd(char *if_name, const char *dp_path)
 {
 	char request[UDS_MAX_CMD_LEN], response[UDS_MAX_CMD_RESP];
 	char hostname[MAX_LONG_OPT_SZ], exp_resp[UDS_MAX_CMD_RESP];
@@ -1520,14 +1524,14 @@ get_cni_fd(char *if_name)
 		return -1;
 
 	memset(&server, 0, sizeof(server));
-	sock = init_uds_sock(&server);
+	sock = init_uds_sock(&server, dp_path);
 	if (sock < 0)
 		return -1;
 
-	/* Initiates handshake to CNI send: /connect,hostname */
+	/* Initiates handshake to the AF_XDP Device Plugin send: /connect,hostname */
 	snprintf(request, sizeof(request), "%s,%s", UDS_CONNECT_MSG, hostname);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1541,7 +1545,7 @@ get_cni_fd(char *if_name)
 	/* Request for "/version" */
 	strlcpy(request, UDS_VERSION_MSG, UDS_MAX_CMD_LEN);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1549,7 +1553,7 @@ get_cni_fd(char *if_name)
 	/* Request for file descriptor for netdev name*/
 	snprintf(request, sizeof(request), "%s,%s", UDS_XSK_MAP_FD_MSG, if_name);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1571,7 +1575,7 @@ get_cni_fd(char *if_name)
 	/* Initiate close connection */
 	strlcpy(request, UDS_FIN_MSG, UDS_MAX_CMD_LEN);
 	memset(response, 0, sizeof(response));
-	if (make_request_cni(sock, &server, request, NULL, response, &out_fd) < 0) {
+	if (make_request_dp(sock, &server, request, NULL, response, &out_fd, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Error in processing cmd [%s]\n", request);
 		goto err_close;
 	}
@@ -1695,17 +1699,16 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	}
 
 	if (internals->use_cni) {
-		int err, fd, map_fd;
+		int err, map_fd;
 
-		/* get socket fd from CNI plugin */
-		map_fd = get_cni_fd(internals->if_name);
+		/* get socket fd from AF_XDP Device Plugin */
+		map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
 		if (map_fd < 0) {
-			AF_XDP_LOG(ERR, "Failed to receive CNI plugin fd\n");
+			AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
 			goto out_xsk;
 		}
-		/* get socket fd */
-		fd = xsk_socket__fd(rxq->xsk);
-		err = bpf_map_update_elem(map_fd, &rxq->xsk_queue_idx, &fd, 0);
+
+		err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
 		if (err) {
 			AF_XDP_LOG(ERR, "Failed to insert unprivileged xsk in map.\n");
 			goto out_xsk;
@@ -1881,13 +1884,13 @@ static const struct eth_dev_ops ops = {
 	.get_monitor_addr = eth_get_monitor_addr,
 };
 
-/* CNI option works in unprivileged container environment
- * and ethernet device functionality will be reduced. So
- * additional customiszed eth_dev_ops struct is needed
- * for cni. Promiscuous enable and disable functionality
- * is removed.
+/* AF_XDP Device Plugin option works in unprivileged
+ * container environments and ethernet device functionality
+ * will be reduced. So additional customised eth_dev_ops
+ * struct is needed for the Device Plugin. Promiscuous
+ * enable and disable functionality is removed.
  **/
-static const struct eth_dev_ops ops_cni = {
+static const struct eth_dev_ops ops_afxdp_dp = {
 	.dev_start = eth_dev_start,
 	.dev_stop = eth_dev_stop,
 	.dev_close = eth_dev_close,
@@ -2023,7 +2026,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		 int *queue_cnt, int *shared_umem, char *prog_path,
-		 int *busy_budget, int *force_copy, int *use_cni)
+		 int *busy_budget, int *force_copy, int *use_cni,
+		 char *dp_path)
 {
 	int ret;
 
@@ -2069,6 +2073,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
+				 &parse_prog_arg, dp_path);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -2108,7 +2117,7 @@ static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
 	       int start_queue_idx, int queue_cnt, int shared_umem,
 	       const char *prog_path, int busy_budget, int force_copy,
-	       int use_cni)
+	       int use_cni, const char *dp_path)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -2138,6 +2147,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->shared_umem = shared_umem;
 	internals->force_copy = force_copy;
 	internals->use_cni = use_cni;
+    strlcpy(internals->dp_path, dp_path, PATH_MAX);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
@@ -2199,7 +2209,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	if (!internals->use_cni)
 		eth_dev->dev_ops = &ops;
 	else
-		eth_dev->dev_ops = &ops_cni;
+		eth_dev->dev_ops = &ops_afxdp_dp;
 
 	eth_dev->rx_pkt_burst = eth_af_xdp_rx;
 	eth_dev->tx_pkt_burst = eth_af_xdp_tx;
@@ -2328,6 +2338,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	int busy_budget = -1, ret;
 	int force_copy = 0;
 	int use_cni = 0;
+    char dp_path[PATH_MAX] = {'\0'};
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name = rte_vdev_device_name(dev);
 
@@ -2370,7 +2381,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
 			     &xsk_queue_cnt, &shared_umem, prog_path,
-			     &busy_budget, &force_copy, &use_cni) < 0) {
+			     &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -2387,6 +2398,18 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 			return -EINVAL;
 	}
 
+	if (use_cni && !strnlen(dp_path, PATH_MAX)) {
+		snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_UDS_SOCK);
+		AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+			ETH_AF_XDP_DP_PATH_ARG, dp_path);
+	}
+
+    if (!use_cni && strnlen(dp_path, PATH_MAX)) {
+		AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
+			ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+			return -EINVAL;
+	}
+
 	if (strlen(if_name) == 0) {
 		AF_XDP_LOG(ERR, "Network interface must be specified\n");
 		return -EINVAL;
@@ -2410,7 +2433,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
 				 xsk_queue_cnt, shared_umem, prog_path,
-				 busy_budget, force_copy, use_cni);
+				 busy_budget, force_copy, use_cni, dp_path);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -2471,4 +2494,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "xdp_prog=<string> "
 			      "busy_budget=<int> "
 			      "force_copy=<int> "
-			      "use_cni=<int> ");
+			      "use_cni=<int> "
+			      "dp_path=<string> ");
-- 
2.41.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [v8 3/3] net/af_xdp: support AF_XDP DP pinned maps
  2024-02-14 16:06 [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
  2024-02-14 16:06 ` [v8 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
  2024-02-14 16:06 ` [v8 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
@ 2024-02-14 16:06 ` Maryam Tahhan
  2024-02-14 16:18 ` [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
  3 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-14 16:06 UTC (permalink / raw)
  To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
	david.marchand, shibin.koikkara.reeny, ciara.loftus
  Cc: dev, Maryam Tahhan

Enable the AF_XDP PMD to retrieve the xskmap
from a pinned eBPF map. This map is expected
to be pinned by an external entity like the
AF_XDP Device Plugin. This enabled unprivileged
pods to create and use AF_XDP sockets.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
---
 doc/guides/howto/af_xdp_dp.rst         | 35 ++++++++--
 doc/guides/nics/af_xdp.rst             | 34 ++++++++--
 doc/guides/rel_notes/release_24_03.rst | 10 +++
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 90 ++++++++++++++++++++------
 4 files changed, 138 insertions(+), 31 deletions(-)

diff --git a/doc/guides/howto/af_xdp_dp.rst b/doc/guides/howto/af_xdp_dp.rst
index 8a64ec5599..5cd87f93df 100644
--- a/doc/guides/howto/af_xdp_dp.rst
+++ b/doc/guides/howto/af_xdp_dp.rst
@@ -52,10 +52,21 @@ should be used when creating the socket
 to instruct libbpf not to load the default libbpf program on the netdev.
 Instead the loading is handled by the AF_XDP Device Plugin.
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-AF_XDP Device Plugin. If this argument is not passed alongside the ``use_cni``
-argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``use_pinned_map`` is used indicate to the AF_XDP PMD to
+retrieve the XSKMAP fd from a pinned eBPF map. This map is expected to be pinned
+by an external entity like the AF_XDP Device Plugin. This enabled unprivileged pods
+to create and use AF_XDP sockets. When this flag is set, the
+``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD`` libbpf flag is used by the AF_XDP PMD when
+creating the AF_XDP socket.
+
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_.
 
 Limitations
 -----------
@@ -321,8 +332,18 @@ Run dpdk-testpmd with the AF_XDP Device Plugin + CNI
            --no-mlockall --in-memory \
            -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
 
+  Or
+
+  .. code-block:: console
+
+     kubectl exec -i <Pod name> --container <containers name> -- \
+           /<Path>/dpdk-testpmd -l 0,1 --no-pci \
+           --vdev=net_af_xdp0,use_pinned_map=1,iface=<interface name>,dp_path="/tmp/afxdp_dp/<interface name>/xsks_map" \
+           --no-mlockall --in-memory \
+           -- -i --a --nb-cores=2 --rxq=1 --txq=1 --forward-mode=macswap;
+
 .. note::
 
-    If the ``dp_path`` parameter isn't explicitly set (like the example above)
-    the AF_XDP PMD will set the parameter value to
-    ``/tmp/afxdp_dp/<<interface name>>/afxdp.sock``.
+    If the ``dp_path`` parameter isn't explicitly set with ``use_cni`` or ``use_pinned_map``
+    the AF_XDP PMD will set the parameter values to the `AF_XDP Device Plugin for Kubernetes`_
+    defaults.
diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index 7f8651beda..940bbf60f2 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -171,13 +171,35 @@ enable the `AF_XDP Device Plugin for Kubernetes`_ with a DPDK application/pod.
    so enabling and disabling of the promiscuous mode through the DPDK application
    is also not supported.
 
+use_pinned_map
+~~~~~~~~~~~~~~
+
+The EAL vdev argument ``use_pinned_map`` is used to indicate that the user wishes to
+load a pinned xskmap mounted by `AF_XDP Device Plugin for Kubernetes`_ in the DPDK
+application/pod.
+
+.. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
+
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_pinned_map=1
+
+.. note::
+
+    This feature can also be used with any external entity that can pin an eBPF map, not just
+    the `AF_XDP Device Plugin for Kubernetes`_.
+
 dp_path
 ~~~~~~~
 
-The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` argument
-to explicitly tell the AF_XDP PMD where to find the UDS to interact with the
-`AF_XDP Device Plugin for Kubernetes`_. If this argument is not passed
-alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
+The EAL vdev argument ``dp_path`` is used alongside the ``use_cni`` or ``use_pinned_map``
+arguments to explicitly tell the AF_XDP PMD where to find either:
+
+1. The UDS to interact with the AF_XDP Device Plugin. OR
+2. The pinned xskmap to use when creating AF_XDP sockets.
+
+If this argument is not passed alongside the ``use_cni`` or ``use_pinned_map`` arguments then
+the AF_XDP PMD configures it internally to the `AF_XDP Device Plugin for Kubernetes`_.
 
 .. _AF_XDP Device Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
 
@@ -185,6 +207,10 @@ alongside the ``use_cni`` argument then the AF_XDP PMD configures it internally.
 
    --vdev=net_af_xdp0,use_cni=1,dp_path="/tmp/afxdp_dp/<<interface name>>/afxdp.sock"
 
+.. code-block:: console
+
+   --vdev=net_af_xdp0,use_pinned_map=1,dp_path="/tmp/afxdp_dp/<<interface name>>/xsks_map"
+
 Limitations
 -----------
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 7f3653ad6f..99336dcb2b 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -63,6 +63,16 @@ New Features
   compatibility for any applications already using the ``use_cni`` vdev
   argument with the AF_XDP Device Plugin.
 
+* **Integrated AF_XDP PMD with AF_XDP Device Plugin eBPF map pinning support**.
+
+  The EAL vdev argument for the AF_XDP PMD ``use_map_pinning`` was added
+  to allow Kubernetes Pods to use AF_XDP with DPDK, and run  with limited
+  privileges, without having to do a full handshake over a Unix Domain
+  Socket with the Device Plugin. This flag indicates that the AF_XDP PMD
+  will be used in unprivileged mode and will obtain the XSKMAP FD by calling
+  ``bpf_obj_get()`` for an xskmap pinned (by the AF_XDP DP) inside the
+  container.
+
 Removed Items
 -------------
 
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index 6be8badf54..bf86b994c5 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -85,6 +85,7 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
 
 #define DP_BASE_PATH			"/tmp/afxdp_dp"
 #define DP_UDS_SOCK             "afxdp.sock"
+#define DP_XSK_MAP				"xsks_map"
 #define MAX_LONG_OPT_SZ			64
 #define UDS_MAX_FD_NUM			2
 #define UDS_MAX_CMD_LEN			64
@@ -172,6 +173,7 @@ struct pmd_internals {
 	bool custom_prog_configured;
 	bool force_copy;
 	bool use_cni;
+	bool use_pinned_map;
 	char dp_path[PATH_MAX];
 	struct bpf_map *map;
 
@@ -193,6 +195,7 @@ struct pmd_process_private {
 #define ETH_AF_XDP_BUDGET_ARG			"busy_budget"
 #define ETH_AF_XDP_FORCE_COPY_ARG		"force_copy"
 #define ETH_AF_XDP_USE_CNI_ARG			"use_cni"
+#define ETH_AF_XDP_USE_PINNED_MAP_ARG	"use_pinned_map"
 #define ETH_AF_XDP_DP_PATH_ARG			"dp_path"
 
 static const char * const valid_arguments[] = {
@@ -204,6 +207,7 @@ static const char * const valid_arguments[] = {
 	ETH_AF_XDP_BUDGET_ARG,
 	ETH_AF_XDP_FORCE_COPY_ARG,
 	ETH_AF_XDP_USE_CNI_ARG,
+	ETH_AF_XDP_USE_PINNED_MAP_ARG,
 	ETH_AF_XDP_DP_PATH_ARG,
 	NULL
 };
@@ -1258,6 +1262,21 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
 }
 #endif
 
+static int
+get_pinned_map(const char *dp_path, int *map_fd)
+{
+	*map_fd  = bpf_obj_get(dp_path);
+	if (!*map_fd) {
+		AF_XDP_LOG(ERR, "Failed to find xsks_map in %s\n", dp_path);
+		return -1;
+	}
+
+	AF_XDP_LOG(INFO, "Successfully retrieved map %s with fd %d\n",
+				dp_path, *map_fd);
+
+	return 0;
+}
+
 static int
 load_custom_xdp_prog(const char *prog_path, int if_index, struct bpf_map **map)
 {
@@ -1644,7 +1663,7 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 #endif
 
 	/* Disable libbpf from loading XDP program */
-	if (internals->use_cni)
+	if (internals->use_cni || internals->use_pinned_map)
 		cfg.libbpf_flags |= XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD;
 
 	if (strnlen(internals->prog_path, PATH_MAX)) {
@@ -1698,14 +1717,23 @@ xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 		}
 	}
 
-	if (internals->use_cni) {
+	if (internals->use_cni || internals->use_pinned_map) {
 		int err, map_fd;
 
-		/* get socket fd from AF_XDP Device Plugin */
-		map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
-		if (map_fd < 0) {
-			AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
-			goto out_xsk;
+		if (internals->use_cni) {
+			/* get socket fd from AF_XDP Device Plugin */
+			map_fd = uds_get_xskmap_fd(internals->if_name, internals->dp_path);
+			if (map_fd < 0) {
+				AF_XDP_LOG(ERR, "Failed to receive xskmap fd from AF_XDP Device Plugin\n");
+				goto out_xsk;
+			}
+		} else {
+			/* get socket fd from AF_XDP plugin */
+			err = get_pinned_map(internals->dp_path, &map_fd);
+			if (err < 0 || map_fd < 0) {
+				AF_XDP_LOG(ERR, "Failed to retrieve pinned map fd\n");
+				goto out_xsk;
+			}
 		}
 
 		err = xsk_socket__update_xskmap(rxq->xsk, map_fd);
@@ -2027,7 +2055,7 @@ static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 		 int *queue_cnt, int *shared_umem, char *prog_path,
 		 int *busy_budget, int *force_copy, int *use_cni,
-		 char *dp_path)
+		 int *use_pinned_map, char *dp_path)
 {
 	int ret;
 
@@ -2073,6 +2101,11 @@ parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_USE_PINNED_MAP_ARG,
+				 &parse_integer_arg, use_pinned_map);
+	if (ret < 0)
+		goto free_kvlist;
+
 	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_DP_PATH_ARG,
 				 &parse_prog_arg, dp_path);
 	if (ret < 0)
@@ -2117,7 +2150,7 @@ static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
 	       int start_queue_idx, int queue_cnt, int shared_umem,
 	       const char *prog_path, int busy_budget, int force_copy,
-	       int use_cni, const char *dp_path)
+	       int use_cni, int use_pinned_map, const char *dp_path)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -2147,6 +2180,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->shared_umem = shared_umem;
 	internals->force_copy = force_copy;
 	internals->use_cni = use_cni;
+	internals->use_pinned_map = use_pinned_map;
     strlcpy(internals->dp_path, dp_path, PATH_MAX);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
@@ -2206,7 +2240,7 @@ init_internals(struct rte_vdev_device *dev, const char *if_name,
 	eth_dev->data->dev_link = pmd_link;
 	eth_dev->data->mac_addrs = &internals->eth_addr;
 	eth_dev->data->dev_flags |= RTE_ETH_DEV_AUTOFILL_QUEUE_XSTATS;
-	if (!internals->use_cni)
+	if (!internals->use_cni && !internals->use_pinned_map)
 		eth_dev->dev_ops = &ops;
 	else
 		eth_dev->dev_ops = &ops_afxdp_dp;
@@ -2338,6 +2372,7 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	int busy_budget = -1, ret;
 	int force_copy = 0;
 	int use_cni = 0;
+	int use_pinned_map = 0;
     char dp_path[PATH_MAX] = {'\0'};
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name = rte_vdev_device_name(dev);
@@ -2381,20 +2416,27 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
 			     &xsk_queue_cnt, &shared_umem, prog_path,
-			     &busy_budget, &force_copy, &use_cni, dp_path) < 0) {
+			     &busy_budget, &force_copy, &use_cni, &use_pinned_map,
+			     dp_path) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
 
-	if (use_cni && busy_budget > 0) {
+	if (use_cni && use_pinned_map) {
 		AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
-			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_BUDGET_ARG);
+			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG);
 		return -EINVAL;
 	}
 
-	if (use_cni && strnlen(prog_path, PATH_MAX)) {
-		AF_XDP_LOG(ERR, "When '%s' parameter is used, '%s' parameter is not valid\n",
-			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_PROG_ARG);
+	if ((use_cni || use_pinned_map) && busy_budget > 0) {
+		AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG, ETH_AF_XDP_BUDGET_ARG);
+		return -EINVAL;
+	}
+
+	if ((use_cni || use_pinned_map) && strnlen(prog_path, PATH_MAX)) {
+		AF_XDP_LOG(ERR, "When '%s' or '%s' parameter is used, '%s' parameter is not valid\n",
+			ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG, ETH_AF_XDP_PROG_ARG);
 			return -EINVAL;
 	}
 
@@ -2404,9 +2446,15 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 			ETH_AF_XDP_DP_PATH_ARG, dp_path);
 	}
 
-    if (!use_cni && strnlen(dp_path, PATH_MAX)) {
-		AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' was not enabled\n",
-			ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG);
+	if (use_pinned_map && !strnlen(dp_path, PATH_MAX)) {
+		snprintf(dp_path, sizeof(dp_path), "%s/%s/%s", DP_BASE_PATH, if_name, DP_XSK_MAP);
+		AF_XDP_LOG(INFO, "'%s' parameter not provided, setting value to '%s'\n",
+			ETH_AF_XDP_DP_PATH_ARG, dp_path);
+	}
+
+	if ((!use_cni && !use_pinned_map) && strnlen(dp_path, PATH_MAX)) {
+		AF_XDP_LOG(ERR, "'%s' parameter is set, but '%s' or '%s' were not enabled\n",
+			ETH_AF_XDP_DP_PATH_ARG, ETH_AF_XDP_USE_CNI_ARG, ETH_AF_XDP_USE_PINNED_MAP_ARG);
 			return -EINVAL;
 	}
 
@@ -2433,7 +2481,8 @@ rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
 				 xsk_queue_cnt, shared_umem, prog_path,
-				 busy_budget, force_copy, use_cni, dp_path);
+				 busy_budget, force_copy, use_cni, use_pinned_map,
+				 dp_path);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -2495,4 +2544,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "busy_budget=<int> "
 			      "force_copy=<int> "
 			      "use_cni=<int> "
+			      "use_pinned_map=<int> "
 			      "dp_path=<string> ");
-- 
2.41.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [v8 0/3] net/af_xdp: fix multi interface support for K8s
  2024-02-14 16:06 [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
                   ` (2 preceding siblings ...)
  2024-02-14 16:06 ` [v8 3/3] net/af_xdp: support AF_XDP DP pinned maps Maryam Tahhan
@ 2024-02-14 16:18 ` Maryam Tahhan
  3 siblings, 0 replies; 5+ messages in thread
From: Maryam Tahhan @ 2024-02-14 16:18 UTC (permalink / raw)
  To: ferruh.yigit, stephen, lihuisong, fengchengwen, liuyonglong,
	david.marchand, shibin.koikkara.reeny, ciara.loftus
  Cc: dev

Sorry folks

Just spotted the checkpatch warnings :( I will fix these now.

BR
Maryam


On 14/02/2024 16:06, Maryam Tahhan wrote:
> The original `use_cni` implementation was limited to
> supporting only a single netdev in a DPDK pod. This patchset
> aims to fix this limitation transparently to the end user.
> It will also enable compatibility with the latest AF_XDP
> Device Plugin.
>
> Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
> ---
> v8:
> * Go back to using `use_cni` vdev argument
> * Introduce `use_map_pinning` vdev param.
> * Rename `uds_path` to `dp_path` so that it can be used
>    with map pinning as well as `use_cni`.
> * Set `dp_path` internally in the AF_XDP PMD if it's
>    not configured by the user.
> * Clean up the original `use_cni` documentation separately
>    to coding changes.
>
> v7:
> * Give a more descriptive commit msg headline.
> * Fixup typos in documentation.
>
> v6:
> * Add link to PR 81 in commit message
> * Add release notes changes to this patchset
>
> v5:
> * Fix alignment for ETH_AF_XDP_USE_DP_UDS_PATH_ARG
> * Remove use_cni references in af_xdp.rst
>
> v4:
> * Rename af_xdp_cni.rst to af_xdp_dp.rst
> * Removed all incorrect references to CNI throughout af_xdp
>    PMD file.
> * Fixed Typos in af_xdp_dp.rst
>
> v3:
> * Remove `use_cni` vdev argument as it's no longer needed.
> * Update incorrect CNI references for the AF_XDP DP in the
>    documentation.
> * Update the documentation to run a simple example with the
>    AF_XDP DP plugin in K8s.
>
> v2:
> * Rename sock_path to uds_path.
> * Update documentation to reflect when CAP_BPF is needed.
> * Fix testpmd arguments in the provided example for Pods.
> * Use AF_XDP API to update the xskmap entry.
> ---
> Maryam Tahhan (3):
>    docs: AF_XDP Device Plugin
>    net/af_xdp: fix multi interface support for K8s
>    net/af_xdp: support AF_XDP DP pinned maps
>
>   doc/guides/howto/af_xdp_cni.rst        | 253 ------------------
>   doc/guides/howto/af_xdp_dp.rst         | 349 +++++++++++++++++++++++++
>   doc/guides/howto/index.rst             |   2 +-
>   doc/guides/nics/af_xdp.rst             |  44 +++-
>   doc/guides/rel_notes/release_24_03.rst |  17 ++
>   drivers/net/af_xdp/rte_eth_af_xdp.c    | 162 ++++++++----
>   6 files changed, 527 insertions(+), 300 deletions(-)
>   delete mode 100644 doc/guides/howto/af_xdp_cni.rst
>   create mode 100644 doc/guides/howto/af_xdp_dp.rst
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-14 16:19 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-14 16:06 [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-14 16:06 ` [v8 1/3] docs: AF_XDP Device Plugin Maryam Tahhan
2024-02-14 16:06 ` [v8 2/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-02-14 16:06 ` [v8 3/3] net/af_xdp: support AF_XDP DP pinned maps Maryam Tahhan
2024-02-14 16:18 ` [v8 0/3] net/af_xdp: fix multi interface support for K8s Maryam Tahhan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).