From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <dev-bounces@dpdk.org> Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4B47BA2EDB for <public@inbox.dpdk.org>; Mon, 30 Sep 2019 09:53:07 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6ADA94C99; Mon, 30 Sep 2019 09:52:49 +0200 (CEST) Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 4D4144C9D for <dev@dpdk.org>; Mon, 30 Sep 2019 09:52:43 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 30 Sep 2019 00:52:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,565,1559545200"; d="scan'208";a="204778011" Received: from baranmx-mobl.ger.corp.intel.com ([10.103.104.83]) by fmsmga001.fm.intel.com with ESMTP; 30 Sep 2019 00:52:41 -0700 From: Marcin Baran <marcinx.baran@intel.com> To: dev@dpdk.org, bruce.richardson@intel.com Cc: Marcin Baran <marcinx.baran@intel.com> Date: Mon, 30 Sep 2019 09:50:34 +0200 Message-Id: <20190930075034.2000-7-marcinx.baran@intel.com> X-Mailer: git-send-email 2.22.0.windows.1 In-Reply-To: <20190930075034.2000-1-marcinx.baran@intel.com> References: <20190920073714.1314-1-marcinx.baran@intel.com> <20190930075034.2000-1-marcinx.baran@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: [dpdk-dev] [PATCH v6 6/6] doc/guides/: provide IOAT sample app guide X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions <dev.dpdk.org> List-Unsubscribe: <https://mails.dpdk.org/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://mails.dpdk.org/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <https://mails.dpdk.org/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> Added guide for IOAT sample app usage and code description. Signed-off-by: Marcin Baran <marcinx.baran@intel.com> --- doc/guides/sample_app_ug/index.rst | 1 + doc/guides/sample_app_ug/intro.rst | 4 + doc/guides/sample_app_ug/ioat.rst | 581 +++++++++++++++++++++++++++++ 3 files changed, 586 insertions(+) create mode 100644 doc/guides/sample_app_ug/ioat.rst diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst index f23f8f59e..a6a1d9e7a 100644 --- a/doc/guides/sample_app_ug/index.rst +++ b/doc/guides/sample_app_ug/index.rst @@ -23,6 +23,7 @@ Sample Applications User Guides ip_reassembly kernel_nic_interface keep_alive + ioat l2_forward_crypto l2_forward_job_stats l2_forward_real_virtual diff --git a/doc/guides/sample_app_ug/intro.rst b/doc/guides/sample_app_ug/intro.rst index 90704194a..74462312f 100644 --- a/doc/guides/sample_app_ug/intro.rst +++ b/doc/guides/sample_app_ug/intro.rst @@ -91,6 +91,10 @@ examples are highlighted below. forwarding, or ``l3fwd`` application does forwarding based on Internet Protocol, IPv4 or IPv6 like a simple router. +* :doc:`Hardware packet copying<ioat>`: The Hardware packet copying, + or ``ioatfwd`` application demonstrates how to use IOAT rawdev driver for + copying packets between two threads. + * :doc:`Packet Distributor<dist_app>`: The Packet Distributor demonstrates how to distribute packets arriving on an Rx port to different cores for processing and transmission. diff --git a/doc/guides/sample_app_ug/ioat.rst b/doc/guides/sample_app_ug/ioat.rst new file mode 100644 index 000000000..bab7654b8 --- /dev/null +++ b/doc/guides/sample_app_ug/ioat.rst @@ -0,0 +1,581 @@ +.. SPDX-License-Identifier: BSD-3-Clause + Copyright(c) 2019 Intel Corporation. + +.. include:: <isonum.txt> + +Packet copying using Intel\ |reg| QuickData Technology +====================================================== + +Overview +-------- + +This sample is intended as a demonstration of the basic components of a DPDK +forwarding application and example of how to use IOAT driver API to make +packets copies. + +Also while forwarding, the MAC addresses are affected as follows: + +* The source MAC address is replaced by the TX port MAC address + +* The destination MAC address is replaced by 02:00:00:00:00:TX_PORT_ID + +This application can be used to compare performance of using software packet +copy with copy done using a DMA device for different sizes of packets. +The example will print out statistics each second. The stats shows +received/send packets and packets dropped or failed to copy. + +Compiling the Application +------------------------- + +To compile the sample application see :doc:`compiling`. + +The application is located in the ``ioat`` sub-directory. + + +Running the Application +----------------------- + +In order to run the hardware copy application, the copying device +needs to be bound to user-space IO driver. + +Refer to the "IOAT Rawdev Driver" chapter in the "Rawdev Drivers" document +for information on using the driver. + +The application requires a number of command line options: + +.. code-block:: console + + ./build/ioatfwd [EAL options] -- [-p MASK] [-q NQ] [-s RS] [-c <sw|hw>] + [--[no-]mac-updating] + +where, + +* p MASK: A hexadecimal bitmask of the ports to configure (default is all) + +* q NQ: Number of Rx queues used per port equivalent to CBDMA channels + per port (default is 1) + +* c CT: Performed packet copy type: software (sw) or hardware using + DMA (hw) (default is hw) + +* s RS: Size of IOAT rawdev ring for hardware copy mode or rte_ring for + software copy mode (default is 2048) + +* --[no-]mac-updating: Whether MAC address of packets should be changed + or not (default is mac-updating) + +The application can be launched in various configurations depending on +provided parameters. The app can use up to 2 lcores: one of them receives +incoming traffic and makes a copy of each packet. The second lcore then +updates MAC address and sends the copy. If one lcore per port is used, +both operations are done sequentially. For each configuration an additional +lcore is needed since the master lcore does not handle traffic but is +responsible for configuration, statistics printing and safe shutdown of +all ports and devices. + +The application can use a maximum of 8 ports. + +To run the application in a Linux environment with 3 lcores (the master lcore, +plus two forwarding cores), a single port (port 0), software copying and MAC +updating issue the command: + +.. code-block:: console + + $ ./build/ioatfwd -l 0-2 -n 2 -- -p 0x1 --mac-updating -c sw + +To run the application in a Linux environment with 2 lcores (the master lcore, +plus one forwarding core), 2 ports (ports 0 and 1), hardware copying and no MAC +updating issue the command: + +.. code-block:: console + + $ ./build/ioatfwd -l 0-1 -n 1 -- -p 0x3 --no-mac-updating -c hw + +Refer to the *DPDK Getting Started Guide* for general information on +running applications and the Environment Abstraction Layer (EAL) options. + +Explanation +----------- + +The following sections provide an explanation of the main components of the +code. + +All DPDK library functions used in the sample code are prefixed with +``rte_`` and are explained in detail in the *DPDK API Documentation*. + + +The Main Function +~~~~~~~~~~~~~~~~~ + +The ``main()`` function performs the initialization and calls the execution +threads for each lcore. + +The first task is to initialize the Environment Abstraction Layer (EAL). +The ``argc`` and ``argv`` arguments are provided to the ``rte_eal_init()`` +function. The value returned is the number of parsed arguments: + +.. code-block:: c + + /* init EAL */ + ret = rte_eal_init(argc, argv); + if (ret < 0) + rte_exit(EXIT_FAILURE, "Invalid EAL arguments\n"); + + +The ``main()`` also allocates a mempool to hold the mbufs (Message Buffers) +used by the application: + +.. code-block:: c + + nb_mbufs = RTE_MAX(rte_eth_dev_count_avail() * (nb_rxd + nb_txd + + MAX_PKT_BURST + rte_lcore_count() * MEMPOOL_CACHE_SIZE), + MIN_POOL_SIZE); + + /* Create the mbuf pool */ + ioat_pktmbuf_pool = rte_pktmbuf_pool_create("mbuf_pool", nb_mbufs, + MEMPOOL_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, + rte_socket_id()); + if (ioat_pktmbuf_pool == NULL) + rte_exit(EXIT_FAILURE, "Cannot init mbuf pool\n"); + +Mbufs are the packet buffer structure used by DPDK. They are explained in +detail in the "Mbuf Library" section of the *DPDK Programmer's Guide*. + +The ``main()`` function also initializes the ports: + +.. code-block:: c + + /* Initialise each port */ + RTE_ETH_FOREACH_DEV(portid) { + port_init(portid, ioat_pktmbuf_pool); + } + +Each port is configured using ``port_init()`` function. The Ethernet +ports are configured with local settings using the ``rte_eth_dev_configure()`` +function and the ``port_conf`` struct. The RSS is enabled so that +multiple Rx queues could be used for packet receiving and copying by +multiple CBDMA channels per port: + +.. code-block:: c + + /* configuring port to use RSS for multiple RX queues */ + static const struct rte_eth_conf port_conf = { + .rxmode = { + .mq_mode = ETH_MQ_RX_RSS, + .max_rx_pkt_len = RTE_ETHER_MAX_LEN + }, + .rx_adv_conf = { + .rss_conf = { + .rss_key = NULL, + .rss_hf = ETH_RSS_PROTO_MASK, + } + } + }; + +For this example the ports are set up with the number of Rx queues provided +with -q option and 1 Tx queue using the ``rte_eth_rx_queue_setup()`` +and ``rte_eth_tx_queue_setup()`` functions. + +The Ethernet port is then started: + +.. code-block:: c + + ret = rte_eth_dev_start(portid); + if (ret < 0) + rte_exit(EXIT_FAILURE, "rte_eth_dev_start:err=%d, port=%u\n", + ret, portid); + + +Finally the Rx port is set in promiscuous mode: + +.. code-block:: c + + rte_eth_promiscuous_enable(portid); + + +After that each port application assigns resources needed. + +.. code-block:: c + + check_link_status(ioat_enabled_port_mask); + + if (!cfg.nb_ports) { + rte_exit(EXIT_FAILURE, + "All available ports are disabled. Please set portmask.\n"); + } + + /* Check if there is enough lcores for all ports. */ + cfg.nb_lcores = rte_lcore_count() - 1; + if (cfg.nb_lcores < 1) + rte_exit(EXIT_FAILURE, + "There should be at least one slave lcore.\n"); + + ret = 0; + + if (copy_mode == COPY_MODE_IOAT_NUM) { + assign_rawdevs(); + } else /* copy_mode == COPY_MODE_SW_NUM */ { + assign_rings(); + } + +Depending on mode set (whether copy should be done by software or by hardware) +special structures are assigned to each port. If software copy was chosen, +application have to assign ring structures for packet exchanging between lcores +assigned to ports. + +.. code-block:: c + + static void + assign_rings(void) + { + uint32_t i; + + for (i = 0; i < cfg.nb_ports; i++) { + char ring_name[20]; + + snprintf(ring_name, 20, "rx_to_tx_ring_%u", i); + /* Create ring for inter core communication */ + cfg.ports[i].rx_to_tx_ring = rte_ring_create( + ring_name, ring_size, + rte_socket_id(), RING_F_SP_ENQ); + + if (cfg.ports[i].rx_to_tx_ring == NULL) + rte_exit(EXIT_FAILURE, "%s\n", + rte_strerror(rte_errno)); + } + } + + +When using hardware copy each Rx queue of the port is assigned an +IOAT device (``assign_rawdevs()``) using IOAT Rawdev Driver API +functions: + +.. code-block:: c + + static void + assign_rawdevs(void) + { + uint16_t nb_rawdev = 0, rdev_id = 0; + uint32_t i, j; + + for (i = 0; i < cfg.nb_ports; i++) { + for (j = 0; j < cfg.ports[i].nb_queues; j++) { + struct rte_rawdev_info rdev_info = { 0 }; + + do { + if (rdev_id == rte_rawdev_count()) + goto end; + rte_rawdev_info_get(rdev_id++, &rdev_info); + } while (strcmp(rdev_info.driver_name, + IOAT_PMD_RAWDEV_NAME_STR) != 0); + + cfg.ports[i].ioat_ids[j] = rdev_id - 1; + configure_rawdev_queue(cfg.ports[i].ioat_ids[j]); + ++nb_rawdev; + } + } + end: + if (nb_rawdev < cfg.nb_ports * cfg.ports[0].nb_queues) + rte_exit(EXIT_FAILURE, + "Not enough IOAT rawdevs (%u) for all queues (%u).\n", + nb_rawdev, cfg.nb_ports * cfg.ports[0].nb_queues); + RTE_LOG(INFO, IOAT, "Number of used rawdevs: %u.\n", nb_rawdev); + } + + +The initialization of hardware device is done by ``rte_rawdev_configure()`` +function using ``rte_rawdev_info`` struct. After configuration the device is +started using ``rte_rawdev_start()`` function. Each of the above operations +is done in ``configure_rawdev_queue()``. + +.. code-block:: c + + static void + configure_rawdev_queue(uint32_t dev_id) + { + struct rte_ioat_rawdev_config dev_config = { .ring_size = ring_size }; + struct rte_rawdev_info info = { .dev_private = &dev_config }; + + if (rte_rawdev_configure(dev_id, &info) != 0) { + rte_exit(EXIT_FAILURE, + "Error with rte_rawdev_configure()\n"); + } + if (rte_rawdev_start(dev_id) != 0) { + rte_exit(EXIT_FAILURE, + "Error with rte_rawdev_start()\n"); + } + } + +If initialization is successful, memory for hardware device +statistics is allocated. + +Finally ``main()`` function starts all packet handling lcores and starts +printing stats in a loop on the master lcore. The application can be +interrupted and closed using ``Ctrl-C``. The master lcore waits for +all slave processes to finish, deallocates resources and exits. + +The processing lcores launching function are described below. + +The Lcores Launching Functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +As described above, ``main()`` function invokes ``start_forwarding_cores()`` +function in order to start processing for each lcore: + +.. code-block:: c + + static void start_forwarding_cores(void) + { + uint32_t lcore_id = rte_lcore_id(); + + RTE_LOG(INFO, IOAT, "Entering %s on lcore %u\n", + __func__, rte_lcore_id()); + + if (cfg.nb_lcores == 1) { + lcore_id = rte_get_next_lcore(lcore_id, true, true); + rte_eal_remote_launch((lcore_function_t *)rxtx_main_loop, + NULL, lcore_id); + } else if (cfg.nb_lcores > 1) { + lcore_id = rte_get_next_lcore(lcore_id, true, true); + rte_eal_remote_launch((lcore_function_t *)rx_main_loop, + NULL, lcore_id); + + lcore_id = rte_get_next_lcore(lcore_id, true, true); + rte_eal_remote_launch((lcore_function_t *)tx_main_loop, NULL, + lcore_id); + } + } + +The function launches Rx/Tx processing functions on configured lcores +using ``rte_eal_remote_launch()``. The configured ports, their number +and number of assigned lcores are stored in user-defined +``rxtx_transmission_config`` struct: + +.. code-block:: c + + struct rxtx_transmission_config { + struct rxtx_port_config ports[RTE_MAX_ETHPORTS]; + uint16_t nb_ports; + uint16_t nb_lcores; + }; + +The structure is initialized in 'main()' function with the values +corresponding to ports and lcores configuration provided by the user. + +The Lcores Processing Functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For receiving packets on each port, the ``ioat_rx_port()`` function is used. +The function receives packets on each configured Rx queue. Depending on the +mode the user chose, it will enqueue packets to IOAT rawdev channels and +then invoke copy process (hardware copy), or perform software copy of each +packet using ``pktmbuf_sw_copy()`` function and enqueue them to an rte_ring: + +.. code-block:: c + + /* Receive packets on one port and enqueue to IOAT rawdev or rte_ring. */ + static void + ioat_rx_port(struct rxtx_port_config *rx_config) + { + uint32_t nb_rx, nb_enq, i, j; + struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; + for (i = 0; i < rx_config->nb_queues; i++) { + + nb_rx = rte_eth_rx_burst(rx_config->rxtx_port, i, + pkts_burst, MAX_PKT_BURST); + + if (nb_rx == 0) + continue; + + port_statistics.rx[rx_config->rxtx_port] += nb_rx; + + if (copy_mode == COPY_MODE_IOAT_NUM) { + /* Perform packet hardware copy */ + nb_enq = ioat_enqueue_packets(pkts_burst, + nb_rx, rx_config->ioat_ids[i]); + if (nb_enq > 0) + rte_ioat_do_copies(rx_config->ioat_ids[i]); + } else { + /* Perform packet software copy, free source packets */ + int ret; + struct rte_mbuf *pkts_burst_copy[MAX_PKT_BURST]; + + ret = rte_mempool_get_bulk(ioat_pktmbuf_pool, + (void *)pkts_burst_copy, nb_rx); + + if (unlikely(ret < 0)) + rte_exit(EXIT_FAILURE, + "Unable to allocate memory.\n"); + + for (j = 0; j < nb_rx; j++) + pktmbuf_sw_copy(pkts_burst[j], + pkts_burst_copy[j]); + + rte_mempool_put_bulk(ioat_pktmbuf_pool, + (void *)pkts_burst, nb_rx); + + nb_enq = rte_ring_enqueue_burst( + rx_config->rx_to_tx_ring, + (void *)pkts_burst_copy, nb_rx, NULL); + + /* Free any not enqueued packets. */ + rte_mempool_put_bulk(ioat_pktmbuf_pool, + (void *)&pkts_burst_copy[nb_enq], + nb_rx - nb_enq); + } + + port_statistics.copy_dropped[rx_config->rxtx_port] += + (nb_rx - nb_enq); + } + } + +The packets are received in burst mode using ``rte_eth_rx_burst()`` +function. When using hardware copy mode the packets are enqueued in +copying device's buffer using ``ioat_enqueue_packets()`` which calls +``rte_ioat_enqueue_copy()``. When all received packets are in the +buffer the copy operations are started by calling ``rte_ioat_do_copies()``. +Function ``rte_ioat_enqueue_copy()`` operates on physical address of +the packet. Structure ``rte_mbuf`` contains only physical address to +start of the data buffer (``buf_iova``). Thus the address is adjusted +by ``addr_offset`` value in order to get the address of ``rearm_data`` +member of ``rte_mbuf``. That way both the packet data and metadata can +be copied in a single operation. This method can be used because the mbufs +are direct mbufs allocated by the apps. If another app uses external buffers, +or indirect mbufs, then multiple copy operations must be used. + +.. code-block:: c + + static uint32_t + ioat_enqueue_packets(struct rte_mbuf **pkts, + uint32_t nb_rx, uint16_t dev_id) + { + int ret; + uint32_t i; + struct rte_mbuf *pkts_copy[MAX_PKT_BURST]; + + const uint64_t addr_offset = RTE_PTR_DIFF(pkts[0]->buf_addr, + &pkts[0]->rearm_data); + + ret = rte_mempool_get_bulk(ioat_pktmbuf_pool, + (void *)pkts_copy, nb_rx); + + if (unlikely(ret < 0)) + rte_exit(EXIT_FAILURE, "Unable to allocate memory.\n"); + + for (i = 0; i < nb_rx; i++) { + /* Perform data copy */ + ret = rte_ioat_enqueue_copy(dev_id, + pkts[i]->buf_iova + - addr_offset, + pkts_copy[i]->buf_iova + - addr_offset, + rte_pktmbuf_data_len(pkts[i]) + + addr_offset, + (uintptr_t)pkts[i], + (uintptr_t)pkts_copy[i], + 0 /* nofence */); + + if (ret != 1) + break; + } + + ret = i; + /* Free any not enqueued packets. */ + rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts[i], nb_rx - i); + rte_mempool_put_bulk(ioat_pktmbuf_pool, (void *)&pkts_copy[i], + nb_rx - i); + + return ret; + } + + +All completed copies are processed by ``ioat_tx_port()`` function. When using +hardware copy mode the function invokes ``rte_ioat_completed_copies()`` +on each assigned IOAT channel to gather copied packets. If software copy +mode is used the function dequeues copied packets from the rte_ring. Then each +packet MAC address is changed if it was enabled. After that copies are sent +in burst mode using `` rte_eth_tx_burst()``. + + +.. code-block:: c + + /* Transmit packets from IOAT rawdev/rte_ring for one port. */ + static void + ioat_tx_port(struct rxtx_port_config *tx_config) + { + uint32_t i, j, nb_dq = 0; + struct rte_mbuf *mbufs_src[MAX_PKT_BURST]; + struct rte_mbuf *mbufs_dst[MAX_PKT_BURST]; + + for (i = 0; i < tx_config->nb_queues; i++) { + if (copy_mode == COPY_MODE_IOAT_NUM) { + /* Deque the mbufs from IOAT device. */ + nb_dq = rte_ioat_completed_copies( + tx_config->ioat_ids[i], MAX_PKT_BURST, + (void *)mbufs_src, (void *)mbufs_dst); + } else { + /* Deque the mbufs from rx_to_tx_ring. */ + nb_dq = rte_ring_dequeue_burst( + tx_config->rx_to_tx_ring, (void *)mbufs_dst, + MAX_PKT_BURST, NULL); + } + + if (nb_dq == 0) + return; + + if (copy_mode == COPY_MODE_IOAT_NUM) + rte_mempool_put_bulk(ioat_pktmbuf_pool, + (void *)mbufs_src, nb_dq); + + /* Update macs if enabled */ + if (mac_updating) { + for (j = 0; j < nb_dq; j++) + update_mac_addrs(mbufs_dst[j], + tx_config->rxtx_port); + } + + const uint16_t nb_tx = rte_eth_tx_burst( + tx_config->rxtx_port, 0, + (void *)mbufs_dst, nb_dq); + + port_statistics.tx[tx_config->rxtx_port] += nb_tx; + + /* Free any unsent packets. */ + if (unlikely(nb_tx < nb_dq)) + rte_mempool_put_bulk(ioat_pktmbuf_pool, + (void *)&mbufs_dst[nb_tx], + nb_dq - nb_tx); + } + } + +The Packet Copying Functions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In order to perform packet copy there is a user-defined function +``pktmbuf_sw_copy()`` used. It copies a whole packet by copying +metadata from source packet to new mbuf, and then copying a data +chunk of source packet. Both memory copies are done using +``rte_memcpy()``: + +.. code-block:: c + + static inline void + pktmbuf_sw_copy(struct rte_mbuf *src, struct rte_mbuf *dst) + { + /* Copy packet metadata */ + rte_memcpy(&dst->rearm_data, + &src->rearm_data, + offsetof(struct rte_mbuf, cacheline1) + - offsetof(struct rte_mbuf, rearm_data)); + + /* Copy packet data */ + rte_memcpy(rte_pktmbuf_mtod(dst, char *), + rte_pktmbuf_mtod(src, char *), src->data_len); + } + +The metadata in this example is copied from ``rearm_data`` member of +``rte_mbuf`` struct up to ``cacheline1``. + +In order to understand why software packet copying is done as shown +above please refer to the "Mbuf Library" section of the +*DPDK Programmer's Guide*. -- 2.22.0.windows.1