DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Koikkara Reeny, Shibin" <shibin.koikkara.reeny@intel.com>
To: Ferruh Yigit <ferruh.yigit@amd.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	"Zhang, Qi Z" <qi.z.zhang@intel.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>,
	"Mcnamara, John" <john.mcnamara@intel.com>
Cc: "Loftus, Ciara" <ciara.loftus@intel.com>
Subject: RE: [PATCH v3] net/af_xdp: AF_XDP PMD CNI Integration
Date: Thu, 9 Feb 2023 11:49:14 +0000	[thread overview]
Message-ID: <DM6PR11MB3995CBEE412EE02E54C4B3A2A2D99@DM6PR11MB3995.namprd11.prod.outlook.com> (raw)
In-Reply-To: <a68005cf-2b9e-b41f-8df4-392f3438bfa4@amd.com>



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Wednesday, February 8, 2023 6:31 PM
> To: Koikkara Reeny, Shibin <shibin.koikkara.reeny@intel.com>;
> dev@dpdk.org; Zhang, Qi Z <qi.z.zhang@intel.com>; Burakov, Anatoly
> <anatoly.burakov@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>
> Subject: Re: [PATCH v3] net/af_xdp: AF_XDP PMD CNI Integration
> 
> On 2/2/2023 4:55 PM, Shibin Koikkara Reeny wrote:
> > Integrate support for the AF_XDP CNI and device plugin [1] so that the
> > DPDK AF_XDP PMD can work in an unprivileged container environment.
> > Part of the AF_XDP PMD initialization process involves loading an eBPF
> > program onto the given netdev. This operation requires privileges,
> > which prevents the PMD from being able to work in an unprivileged
> > container (without root access). The plugin CNI handles the program
> > loading. CNI open Unix Domain Socket (UDS) and waits listening for a
> > client to make requests over that UDS. The client(DPDK) connects and a
> > "handshake" occurs, then the File Descriptor which points to the
> > XSKMAP associated with the loaded eBPF program is handed over to the
> > client. The client can then proceed with creating an AF_XDP socket and
> > inserting the socket into the XSKMAP pointed to by the FD received on
> > the UDS.
> >
> > A new vdev arg "use_cni" is created to indicate user wishes to run the
> > PMD in unprivileged mode and to receive the XSKMAP FD from the CNI.
> > When this flag is set, the XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD libbpf
> > flag should be used when creating the socket, which tells libbpf not
> > to load the default libbpf program on the netdev. We tell libbpf not
> > to do this because the loading is handled by the CNI in this scenario.
> >
> > Patch include howto doc explain how to configure AF_XDP CNI to working
> > with DPDK.
> >
> 
> I can't able to test this but I will rely on Anatoly's tested-by to proceed.
> 
> > [1]: https://github.com/intel/afxdp-plugins-for-kubernetes
> >
> > Signed-off-by: Shibin Koikkara Reeny <shibin.koikkara.reeny@intel.com>
> > ---
> >  doc/guides/howto/af_xdp_cni.rst     | 254 +++++++++++++++++++++
> >  doc/guides/howto/index.rst          |   1 +
> >  drivers/net/af_xdp/rte_eth_af_xdp.c | 337
> > +++++++++++++++++++++++++++-
> >  3 files changed, 580 insertions(+), 12 deletions(-)  create mode
> > 100644 doc/guides/howto/af_xdp_cni.rst
> >
> > diff --git a/doc/guides/howto/af_xdp_cni.rst
> > b/doc/guides/howto/af_xdp_cni.rst new file mode 100644 index
> > 0000000000..1cda08b2c6
> > --- /dev/null
> > +++ b/doc/guides/howto/af_xdp_cni.rst
> > @@ -0,0 +1,254 @@
> > +..  SPDX-License-Identifier: BSD-3-Clause
> > +    Copyright(c) 2020 Intel Corporation.
> 
> 2023?

Will update it in the next patch.

> 
> > +
> > +
> > +Using a CNI with the AF_XDP driver
> > +==================================
> > +
> > +
> > +
> 
> Is multiple empty lines required?


Will update it in the next patch.

> 
> > +Introduction
> > +------------
> > +
> > +CNI, the Container Network Interface, is a technology for configuring
> > +container network interfaces and which can be used to setup
> > +Kubernetes networking. AF_XDP is a Linux socket Address Family that
> > +enables an XDP program to redirect packets to a memory buffer in
> userspace.
> > +
> > +This document explains how to enable the `AF_XDP Plugin for
> > +Kubernetes`_ within a DPDK application using the `AF_XDP PMD`_ to
> connect and use these technologies.
> > +
> > +.. _AF_XDP Plugin for Kubernetes:
> > +https://github.com/intel/afxdp-plugins-for-kubernetes
> > +
> > +Background
> > +----------
> > +
> > +The standard `AF_XDP PMD`_ initialization process involves loading an
> > +eBPF program onto the kernel netdev to be used by the PMD. This
> > +operation requires root or escalated Linux privileges and thus
> > +prevents the PMD from working in an unprivileged container. The
> > +AF_XDP CNI plugin handles this situation by providing a device plugin that
> performs the program loading.
> > +
> > +At a technical level the CNI opens a Unix Domain Socket and listens
> > +for a client to make requests over that socket. A DPDK application
> > +acting as a client connects and initiates a configuration
> > +"handshake". The client then receives a file descriptor which points
> > +to the XSKMAP associated with the loaded eBPF program. The XSKMAP is
> > +a BPF map of AF_XDP sockets (XSK). The client can then proceed with
> > +creating an AF_XDP socket and inserting that socket into the XSKMAP
> pointed to by the descriptor.
> > +
> > +The EAL vdev argument ``use_cni`` is used to indicate that the user
> > +wishes to run the PMD in unprivileged mode and to receive the XSKMAP
> > +file descriptor from the CNI. When this flag is set, the
> > +``XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD``
> > +libbpf flag should be used when creating the socket to instruct
> > +libbpf not to load the default libbpf program on the netdev. Instead
> > +the loading is handled by the CNI.
> > +
> > +.. _AF_XDP PMD: https://doc.dpdk.org/guides/nics/af_xdp.html
> > +
> > +.. Note::
> > +
> > +     The Unix Domain Socket file path appear in the end user is
> "/tmp/afxdp.sock".
> > +
> 
> @Anatoly, I guess you did a patch in the past to group runtime files in a
> common folder, it was something like '/tmp/dpdk/<prefix>/'.
> Does that runtime folder includes the socket files?
> 
/tmp/afxdp.sock is created by the AF_XDP CNI plugin not  by the DPDK testpmd.
So we don’t have the control over the creation of this socket. 

> Also is this fixes socket name a blocker to have multiple primary process (--
> file-prefix)? (assuming both using af_xdp driver).
> 
> 
> > +Prerequisites
> > +-------------
> > +
> > +Docker and container prerequisites:
> > +
> > +* Set up the device plugin as described in  the instructions for
> > +`AF_XDP Plugin
> > +  for Kubernetes`_.
> > +
> > +* The Docker image should contain the libbpf and libxdp libraries,
> > +which
> > +  are dependencies for AF_XDP, and should include support for the
> > +``ethtool``
> > +  command.
> > +
> 
> Is there any specific library version dependency? If so can you please
> document?
> 

Library Libbpf and Libxdp are dependence for AF_XDP and it is already documented in the 
https://doc.dpdk.org/guides/nics/af_xdp.html. This link is also added as part of the Prerequisites.

> <...>
> 
> > +
> > +  For further reference please use the `config.json`_
> > +
> > +  .. _config.json:
> > + https://github.com/intel/afxdp-plugins-for-kubernetes/blob/main/test
> > + /e2e/config.json
> > +
> 
> 
> Should references have a fixed tag instead of a dynamic branch that can
> change later, like:
> https://github.com/intel/afxdp-plugins-for-
> kubernetes/blob/v0.0.2/test/e2e/config.json
> 
> What do you think?
> 

I agree will update in the next version.


> <...>
> 
> > @@ -81,6 +82,24 @@ RTE_LOG_REGISTER_DEFAULT(af_xdp_logtype,
> NOTICE);
> >
> >  #define ETH_AF_XDP_MP_KEY "afxdp_mp_send_fds"
> >
> > +#define MAX_LONG_OPT_SZ			64
> > +#define UDS_MAX_FD_NUM			2
> > +#define UDS_MAX_CMD_LEN			64
> > +#define UDS_MAX_CMD_RESP		128
> > +#define UDS_XSK_MAP_FD_MSG		"/xsk_map_fd"
> > +#define UDS_SOCK			"/tmp/afxdp.sock"
> > +#define UDS_CONNECT_MSG			"/connect"
> > +#define UDS_HOST_OK_MSG			"/host_ok"
> > +#define UDS_HOST_NAK_MSG		"/host_nak"
> > +#define UDS_VERSION_MSG			"/version"
> > +#define UDS_XSK_MAP_FD_MSG		"/xsk_map_fd"
> > +#define UDS_XSK_SOCKET_MSG		"/xsk_socket"
> > +#define UDS_FD_ACK_MSG			"/fd_ack"
> > +#define UDS_FD_NAK_MSG			"/fd_nak"
> > +#define UDS_FIN_MSG			"/fin"
> > +#define UDS_FIN_ACK_MSG			"/fin_ack"
> > +
> > +
> 
> extra empty line

Will remove it in the next version.


> 
> >  static int afxdp_dev_count;
> >
> >  /* Message header to synchronize fds via IPC */ @@ -151,6 +170,7 @@
> > struct pmd_internals {
> >  	char prog_path[PATH_MAX];
> >  	bool custom_prog_configured;
> >  	bool force_copy;
> > +	bool use_cni;
> >  	struct bpf_map *map;
> >
> >  	struct rte_ether_addr eth_addr;
> > @@ -170,6 +190,7 @@ struct pmd_process_private {
> >  #define ETH_AF_XDP_PROG_ARG			"xdp_prog"
> >  #define ETH_AF_XDP_BUDGET_ARG
> 	"busy_budget"
> >  #define ETH_AF_XDP_FORCE_COPY_ARG		"force_copy"
> > +#define ETH_AF_XDP_USE_CNI_ARG			"use_cni"
> >
> 
> Should document new devarg in driver documentation:
> doc/guides/nics/af_xdp.rst
> 
Will update it.

> Also need to add it to 'RTE_PMD_REGISTER_PARAM_STRING' macro for
> pmdinfo.py.
> 
Will be updating it in the next version.

> >  static const char * const valid_arguments[] = {
> >  	ETH_AF_XDP_IFACE_ARG,
> > @@ -179,8 +200,8 @@ static const char * const valid_arguments[] = {
> >  	ETH_AF_XDP_PROG_ARG,
> >  	ETH_AF_XDP_BUDGET_ARG,
> >  	ETH_AF_XDP_FORCE_COPY_ARG,
> > -	NULL
> > -};
> > +	ETH_AF_XDP_USE_CNI_ARG,
> > +	NULL};
> >
> 
> please keep '}' in next line.
> 

Will update it in the next version.

> <...>
> 
> > +static int
> > +make_request_cni(int sock, struct sockaddr_un *server, char *request,
> > +		 int *req_fd, char *response, int *out_fd) {
> > +	int rval;
> > +
> > +	AF_XDP_LOG(INFO, "Request: [%s]\n", request);
> > +
> 
> Isn't INFO level logging too verbose for log each request and response, what
> do you think for debug level?
> 

I agree. Will update in the next version.

> <...>
> 
> > @@ -1584,6 +1865,26 @@ static const struct eth_dev_ops ops = {
> >  	.get_monitor_addr = eth_get_monitor_addr,  };
> >
> > +/* CNI option works in unprivileged container environment
> > + * and ethernet device functionality will be reduced. So
> > + * additional customiszed eth_dev_ops struct is needed
> > + * for cni.
> > + **/
> > +static const struct eth_dev_ops ops_cni = {
> > +	.dev_start = eth_dev_start,
> > +	.dev_stop = eth_dev_stop,
> > +	.dev_close = eth_dev_close,
> > +	.dev_configure = eth_dev_configure,
> > +	.dev_infos_get = eth_dev_info,
> > +	.mtu_set = eth_dev_mtu_set,
> > +	.rx_queue_setup = eth_rx_queue_setup,
> > +	.tx_queue_setup = eth_tx_queue_setup,
> > +	.link_update = eth_link_update,
> > +	.stats_get = eth_stats_get,
> > +	.stats_reset = eth_stats_reset,
> > +	.get_monitor_addr = eth_get_monitor_addr, };
> > +
> 
> The customisation is reduced dev_ops, right? Can you please comment
> which functionality is removed and why (what kind of permission
> requirement prevents that functionality to be used)?
> 
Will update in the next version.

> And can you please document in the driver documentation that 'use_cni'
> will cause reduced functionality and what those dropped functionalities are?
> 

This is updated in the Prerequisites as Pod should enable some capabilities as PMD 
will have only limited functionality. 

> >  /** parse busy_budget argument */
> >  static int
> >  parse_budget_arg(const char *key __rte_unused, @@ -1704,8 +2005,8
> @@
> > xdp_get_channels_info(const char *if_name, int *max_queues,
> >
> >  static int
> >  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int
> *start_queue,
> > -			int *queue_cnt, int *shared_umem, char
> *prog_path,
> > -			int *busy_budget, int *force_copy)
> > +		 int *queue_cnt, int *shared_umem, char *prog_path,
> > +		 int *busy_budget, int *force_copy, int *use_cni)
> 
> At some point need to put these variables for paramters into a struct, instead
> of keep increasing the parameter list. Feel free to make a preperation patch
> before this patch if you want.

I think it will be good to do. But I prefer to do it after this patch.


  reply	other threads:[~2023-02-09 11:49 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <to=20221214154102.1521489-1-shibin.koikkara.reeny@intel.com>
2023-02-02 16:55 ` Shibin Koikkara Reeny
2023-02-03  9:46   ` Mcnamara, John
2023-02-08  8:23   ` Zhang, Qi Z
2023-02-08 18:30   ` Ferruh Yigit
2023-02-09 11:49     ` Koikkara Reeny, Shibin [this message]
2023-02-09 12:05   ` [PATCH v4] " Shibin Koikkara Reeny
2023-02-10 13:04     ` Ferruh Yigit
2023-02-10 15:38       ` Koikkara Reeny, Shibin
2023-02-10 20:50         ` Ferruh Yigit
2023-02-10 13:41     ` Ferruh Yigit
2023-02-10 15:48     ` [PATCH v5] net/af_xdp: support " Shibin Koikkara Reeny
2023-02-14 14:50       ` Mcnamara, John
2023-02-15 16:30       ` [PATCH v6] " Shibin Koikkara Reeny
2023-02-16 13:50         ` Ferruh Yigit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DM6PR11MB3995CBEE412EE02E54C4B3A2A2D99@DM6PR11MB3995.namprd11.prod.outlook.com \
    --to=shibin.koikkara.reeny@intel.com \
    --cc=anatoly.burakov@intel.com \
    --cc=bruce.richardson@intel.com \
    --cc=ciara.loftus@intel.com \
    --cc=dev@dpdk.org \
    --cc=ferruh.yigit@amd.com \
    --cc=john.mcnamara@intel.com \
    --cc=qi.z.zhang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).