DPDK patches and discussions
From: Ferruh Yigit <ferruh.yigit@amd.com>
To: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>,
	dev@dpdk.org, qi.z.zhang@intel.com, anatoly.burakov@intel.com,
	bruce.richardson@intel.com, john.mcnamara@intel.com
Cc: ciara.loftus@intel.com
Subject: Re: [PATCH v3] net/af_xdp: AF_XDP PMD CNI Integration
Date: Wed, 8 Feb 2023 18:30:54 +0000
Message-ID: <a68005cf-2b9e-b41f-8df4-392f3438bfa4@amd.com>
In-Reply-To: <20230202165513.31012-1-shibin.koikkara.reeny@intel.com>

On 2/2/2023 4:55 PM, Shibin Koikkara Reeny wrote:
> Integrate support for the AF_XDP CNI and device plugin [1] so that the
> DPDK AF_XDP PMD can work in an unprivileged container environment.
> Part of the AF_XDP PMD initialization process involves loading
> an eBPF program onto the given netdev. This operation requires
> privileges, which prevents the PMD from being able to work in an
> unprivileged container (without root access). The plugin CNI handles
> the program loading. CNI open Unix Domain Socket (UDS) and waits
> listening for a client to make requests over that UDS. The client(DPDK)
> connects and a "handshake" occurs, then the File Descriptor which points
> to the XSKMAP associated with the loaded eBPF program is handed over
> to the client. The client can then proceed with creating an AF_XDP
> socket and inserting the socket into the XSKMAP pointed to by the
> FD received on the UDS.
> 
> A new vdev arg "use_cni" is created to indicate user wishes to run
> the PMD in unprivileged mode and to receive the XSKMAP FD from the CNI.
> When this flag is set, the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf flag
> should be used when creating the socket, which tells libbpf not to load the
> default libbpf program on the netdev. We tell libbpf not to do this because
> the loading is handled by the CNI in this scenario.
> 
> Patch include howto doc explain how to configure AF_XDP CNI to
> working with DPDK.
> 

I am not able to test this, but I will rely on Anatoly's Tested-by to proceed.
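
For reference, the FD hand-over described in the commit message above relies
on the standard Unix mechanism of passing descriptors as SCM_RIGHTS ancillary
data. The following is a minimal, generic sketch of that mechanism, not the
code from this patch: the helper names send_fd() and recv_fd() are
hypothetical, and the message text is whatever the caller supplies.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized for exactly one fd, aligned for struct cmsghdr. */
union fd_ctrl {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Hypothetical helper: send a NUL-terminated message plus one fd. */
static int
send_fd(int sock, const char *msg, int fd)
{
	union fd_ctrl ctrl;
	struct iovec iov = { .iov_base = (void *)msg, .iov_len = strlen(msg) + 1 };
	struct msghdr mh = { 0 };
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = ctrl.buf;
	mh.msg_controllen = sizeof(ctrl.buf);

	/* The fd travels in the ancillary data, not in the payload. */
	cmsg = CMSG_FIRSTHDR(&mh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &mh, 0) < 0 ? -1 : 0;
}

/* Hypothetical helper: receive a message and, if present, one fd.
 * *out_fd is left at -1 when no descriptor was attached.
 */
static int
recv_fd(int sock, char *buf, size_t len, int *out_fd)
{
	union fd_ctrl ctrl;
	struct iovec iov = { .iov_base = buf, .iov_len = len - 1 };
	struct msghdr mh = { 0 };
	struct cmsghdr *cmsg;
	ssize_t n;

	assert(len > 0 && out_fd != NULL);
	*out_fd = -1;
	memset(&ctrl, 0, sizeof(ctrl));
	mh.msg_iov = &iov;
	mh.msg_iovlen = 1;
	mh.msg_control = ctrl.buf;
	mh.msg_controllen = sizeof(ctrl.buf);

	n = recvmsg(sock, &mh, 0);
	if (n <= 0)
		return -1;
	buf[n] = '\0'; /* n <= len - 1, so this stays in bounds */

	for (cmsg = CMSG_FIRSTHDR(&mh); cmsg != NULL; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
			memcpy(out_fd, CMSG_DATA(cmsg), sizeof(int));
			break;
		}
	}
	return 0;
}
```

The received descriptor is a new fd number in the receiving process that
refers to the same kernel object, which is what lets an unprivileged client
use a map it could not have created itself.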

> [1]: https://github.com/intel/afxdp-plugins-for-kubernetes
> 
> Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>
> ---
>  doc/guides/howto/af_xdp_cni.rst     | 254 +++++++++++++++++++++
>  doc/guides/howto/index.rst          |   1 +
>  drivers/net/af_xdp/rte_eth_af_xdp.c | 337 +++++++++++++++++++++++++++-
>  3 files changed, 580 insertions(+), 12 deletions(-)
>  create mode 100644 doc/guides/howto/af_xdp_cni.rst
> 
> diff --git a/doc/guides/howto/af_xdp_cni.rst b/doc/guides/howto/af_xdp_cni.rst
> new file mode 100644
> index 0000000000..1cda08b2c6
> --- /dev/null
> +++ b/doc/guides/howto/af_xdp_cni.rst
> @@ -0,0 +1,254 @@
> +..  SPDX-License-Identifier: BSD-3-Clause
> +    Copyright(c) 2020 Intel Corporation.

2023?

> +
> +
> +Using a CNI with the AF_XDP driver
> +==================================
> +
> +
> +

Are multiple empty lines required?

> +Introduction
> +------------
> +
> +CNI, the Container Network Interface, is a technology for configuring container
> +network interfaces and which can be used to setup Kubernetes networking. AF_XDP
> +is a Linux socket Address Family that enables an XDP program to redirect packets
> +to a memory buffer in userspace.
> +
> +This document explains how to enable the `AF_XDP Plugin for Kubernetes`_ within
> +a DPDK application using the `AF_XDP PMD`_ to connect and use these technologies.
> +
> +.. _AF_XDP Plugin for Kubernetes: https://github.com/intel/afxdp-plugins-for-kubernetes
> +
> +Background
> +----------
> +
> +The standard `AF_XDP PMD`_ initialization process involves loading an eBPF program
> +onto the kernel netdev to be used by the PMD. This operation requires root or
> +escalated Linux privileges and thus prevents the PMD from working in an
> +unprivileged container. The AF_XDP CNI plugin handles this situation by
> +providing a device plugin that performs the program loading.
> +
> +At a technical level the CNI opens a Unix Domain Socket and listens for a client
> +to make requests over that socket. A DPDK application acting as a client
> +connects and initiates a configuration "handshake". The client then receives a
> +file descriptor which points to the XSKMAP associated with the loaded eBPF
> +program. The XSKMAP is a BPF map of AF_XDP sockets (XSK). The client can then
> +proceed with creating an AF_XDP socket and inserting that socket into the XSKMAP
> +pointed to by the descriptor.
> +
> +The EAL vdev argument ``use_cni`` is used to indicate that the user wishes to
> +run the PMD in unprivileged mode and to receive the XSKMAP file descriptor from
> +the CNI. When this flag is set, the ``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD``
> +libbpf flag should be used when creating the socket to instruct libbpf not to
> +load the default libbpf program on the netdev. Instead the loading is handled by
> +the CNI.
> +
> +.. _AF_XDP PMD: https://doc.dpdk.org/guides/nics/af_xdp.html
> +
> +.. Note::
> +
> +     The Unix Domain Socket file path appear in the end user is "/tmp/afxdp.sock".
> +

@Anatoly, I believe you sent a patch in the past to group runtime files in
a common folder, something like '/tmp/dpdk/<prefix>/'.
Does that runtime folder include the socket files?

Also, is this fixed socket name a blocker for running multiple primary
processes (--file-prefix), assuming both use the af_xdp driver?
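
If the socket were moved under a per-prefix runtime directory, the client
side could derive its address along these lines. This is only a sketch:
uds_addr_init() and UDS_SOCK_NAME are hypothetical names, the directory is
supplied by the caller, and it assumes the CNI side creates (or mounts) the
socket at the same location, whereas the patch as posted uses the fixed
"/tmp/afxdp.sock".

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

#define UDS_SOCK_NAME "afxdp.sock" /* hypothetical: file name only, no directory */

/* Fill 'addr' with "<runtime_dir>/afxdp.sock".
 * Returns 0 on success, -1 if the path does not fit in sun_path.
 */
static int
uds_addr_init(struct sockaddr_un *addr, const char *runtime_dir)
{
	int n;

	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	n = snprintf(addr->sun_path, sizeof(addr->sun_path), "%s/%s",
		     runtime_dir, UDS_SOCK_NAME);
	if (n < 0 || (size_t)n >= sizeof(addr->sun_path))
		return -1; /* sun_path is small (108 bytes on Linux) */

	return 0;
}
```

With a per-prefix directory, two primary processes started with different
--file-prefix values would no longer contend for the same socket path.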


> +Prerequisites
> +-------------
> +
> +Docker and container prerequisites:
> +
> +* Set up the device plugin as described in  the instructions for `AF_XDP Plugin
> +  for Kubernetes`_.
> +
> +* The Docker image should contain the libbpf and libxdp libraries, which
> +  are dependencies for AF_XDP, and should include support for the ``ethtool``
> +  command.
> +

Is there any specific library version dependency? If so, can you please
document it?

<...>

> +
> +  For further reference please use the `config.json`_
> +
> +  .. _config.json: https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/test/e2e/config.json
> +


Should references have a fixed tag instead of a dynamic branch that can
change later, like:
https://github.com/intel/afxdp-plugins-for-kubernetes/blob/v0.0.2/test/e2e/config.json

What do you think?

<...>

> @@ -81,6 +82,24 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype, NOTICE);
>  
>  #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
>  
> +#define MAX_LONG_OPT_SZ			64
> +#define UDS_MAX_FD_NUM			2
> +#define UDS_MAX_CMD_LEN			64
> +#define UDS_MAX_CMD_RESP		128
> +#define UDS_XSK_MAP_FD_MSG		"/xsk_map_fd"
> +#define UDS_SOCK			"/tmp/afxdp.sock"
> +#define UDS_CONNECT_MSG			"/connect"
> +#define UDS_HOST_OK_MSG			"/host_ok"
> +#define UDS_HOST_NAK_MSG		"/host_nak"
> +#define UDS_VERSION_MSG			"/version"
> +#define UDS_XSK_MAP_FD_MSG		"/xsk_map_fd"
> +#define UDS_XSK_SOCKET_MSG		"/xsk_socket"
> +#define UDS_FD_ACK_MSG			"/fd_ack"
> +#define UDS_FD_NAK_MSG			"/fd_nak"
> +#define UDS_FIN_MSG			"/fin"
> +#define UDS_FIN_ACK_MSG			"/fin_ack"
> +
> +

extra empty line

>  static int afxdp_dev_count;
>  
>  /* Message header to synchronize fds via IPC */
> @@ -151,6 +170,7 @@ struct pmd_internals {
>  	char prog_path[PATH_MAX];
>  	bool custom_prog_configured;
>  	bool force_copy;
> +	bool use_cni;
>  	struct bpf_map *map;
>  
>  	struct rte_ether_addr eth_addr;
> @@ -170,6 +190,7 @@ struct pmd_process_private {
>  #define ETH_AF_XDP_PROG_ARG			"xdp_prog"
>  #define ETH_AF_XDP_BUDGET_ARG			"busy_budget"
>  #define ETH_AF_XDP_FORCE_COPY_ARG		"force_copy"
> +#define ETH_AF_XDP_USE_CNI_ARG			"use_cni"
>  

The new devarg should be documented in the driver documentation:
doc/guides/nics/af_xdp.rst

It also needs to be added to the 'RTE_PMD_REGISTER_PARAM_STRING' macro for
pmdinfo.py.

>  static const char * const valid_arguments[] = {
>  	ETH_AF_XDP_IFACE_ARG,
> @@ -179,8 +200,8 @@ static const char * const valid_arguments[] = {
>  	ETH_AF_XDP_PROG_ARG,
>  	ETH_AF_XDP_BUDGET_ARG,
>  	ETH_AF_XDP_FORCE_COPY_ARG,
> -	NULL
> -};
> +	ETH_AF_XDP_USE_CNI_ARG,
> +	NULL};
>  

Please keep '}' on the next line.

<...>

> +static int
> +make_request_cni(int sock, struct sockaddr_un *server, char *request,
> +		 int *req_fd, char *response, int *out_fd)
> +{
> +	int rval;
> +
> +	AF_XDP_LOG(INFO, "Request: [%s]\n", request);
> +

Isn't INFO level too verbose for logging each request and response?
What do you think about using debug level instead?

<...>

> @@ -1584,6 +1865,26 @@ static const struct eth_dev_ops ops = {
>  	.get_monitor_addr = eth_get_monitor_addr,
>  };
>  
> +/* CNI option works in unprivileged container environment
> + * and ethernet device functionality will be reduced. So
> + * additional customiszed eth_dev_ops struct is needed
> + * for cni.
> + **/
> +static const struct eth_dev_ops ops_cni = {
> +	.dev_start = eth_dev_start,
> +	.dev_stop = eth_dev_stop,
> +	.dev_close = eth_dev_close,
> +	.dev_configure = eth_dev_configure,
> +	.dev_infos_get = eth_dev_info,
> +	.mtu_set = eth_dev_mtu_set,
> +	.rx_queue_setup = eth_rx_queue_setup,
> +	.tx_queue_setup = eth_tx_queue_setup,
> +	.link_update = eth_link_update,
> +	.stats_get = eth_stats_get,
> +	.stats_reset = eth_stats_reset,
> +	.get_monitor_addr = eth_get_monitor_addr,
> +};
> +

The customisation is a reduced dev_ops, right? Can you please comment on
which functionality is removed and why (what kind of permission
requirement prevents that functionality from being used)?

And can you please document in the driver documentation that 'use_cni'
results in reduced functionality, and list the functionality that is dropped?

>  /** parse busy_budget argument */
>  static int
>  parse_budget_arg(const char *key __rte_unused,
> @@ -1704,8 +2005,8 @@ xdp_get_channels_info(const char *if_name, int *max_queues,
>  
>  static int
>  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
> -			int *queue_cnt, int *shared_umem, char *prog_path,
> -			int *busy_budget, int *force_copy)
> +		 int *queue_cnt, int *shared_umem, char *prog_path,
> +		 int *busy_budget, int *force_copy, int *use_cni)

At some point we need to put these parameter variables into a struct
instead of continuing to grow the parameter list. Feel free to make a
preparation patch before this patch if you want.
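
One possible shape for such a struct is sketched below. The names
struct af_xdp_params and af_xdp_params_init() are hypothetical, the fields
simply mirror the current parse_parameters() arguments, and the default set
here is a placeholder rather than the driver's actual default.

```c
#include <limits.h>
#include <net/if.h>
#include <string.h>

/* Fallbacks so the sketch builds even where these are not exposed. */
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif
#ifndef IFNAMSIZ
#define IFNAMSIZ 16
#endif

/* Hypothetical container for everything parse_parameters() fills in. */
struct af_xdp_params {
	char if_name[IFNAMSIZ];
	int start_queue;
	int queue_cnt;
	int shared_umem;
	char prog_path[PATH_MAX];
	int busy_budget;
	int force_copy;
	int use_cni;
};

/* Zero everything, then apply defaults (placeholder value, see above). */
static void
af_xdp_params_init(struct af_xdp_params *p)
{
	memset(p, 0, sizeof(*p));
	p->queue_cnt = 1;
}

/*
 * The parser signature would then stop growing with every new devarg:
 *
 *   static int
 *   parse_parameters(struct rte_kvargs *kvlist, struct af_xdp_params *params);
 */
```

Adding a future devarg then means adding one field and one kvargs handler,
without touching the parse_parameters() prototype or its caller.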

