From: "Ananyev, Konstantin"
To: "Burakov, Anatoly", "dev@dpdk.org"
Cc: "laszlo.madarassy@ericsson.com", "laszlo.vadkerti@ericsson.com",
 "andras.kovacs@ericsson.com", "winnie.tian@ericsson.com",
 "daniel.andrasi@ericsson.com", "janos.kobor@ericsson.com",
 "geza.koblo@ericsson.com", "srinath.mannam@broadcom.com",
 "scott.branden@broadcom.com", "ajit.khaparde@broadcom.com",
 "Wiles, Keith", "Richardson, Bruce", "thomas@monjalon.net",
 "shreyansh.jain@nxp.com", "shahafs@mellanox.com", "arybchenko@solarflare.com"
Date: Thu, 20 Sep 2018 22:47:58 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258EA959482@irsmsx105.ger.corp.intel.com>
In-Reply-To: <665a207254fdcf2a86dd371f78e7c15c3f9ed298.1537443103.git.anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 17/20] examples: add external memory example app

Hi Anatoly,

>
> Introduce an example application demonstrating the use of
> external memory support. This is a simple application based on
> skeleton app, but instead of using internal DPDK memory, it is
> using externally allocated memory.
>
> The RX/TX and init path is a carbon-copy of skeleton app, with
> no modifications whatsoever. The only difference is an additional
> init stage to allocate memory and create a heap for it, and the
> socket ID supplied to the mempool initialization function. The
> memory used by this app is hugepage memory allocated anonymously.
>
> Anonymous hugepage memory will not be allocated in a NUMA-aware
> fashion, so there is a chance of performance degradation when
> using this app, but given that the kernel usually gives hugepages on
> the local socket first, this should not be a problem in most cases.

Do we need a new sample app just for that?
Couldn't it be added to testpmd instead, in the same way that we already
have 'mp-anon' for using a mempool over anonymous memory?
Konstantin

>
> Signed-off-by: Anatoly Burakov
> ---
>  examples/external_mem/Makefile    |  62 ++++
>  examples/external_mem/extmem.c    | 461 ++++++++++++++++++++++++++++++
>  examples/external_mem/meson.build |  12 +
>  3 files changed, 535 insertions(+)
>  create mode 100644 examples/external_mem/Makefile
>  create mode 100644 examples/external_mem/extmem.c
>  create mode 100644 examples/external_mem/meson.build
>
> diff --git a/examples/external_mem/Makefile b/examples/external_mem/Makefile
> new file mode 100644
> index 000000000..3b6ab3b2f
> --- /dev/null
> +++ b/examples/external_mem/Makefile
> @@ -0,0 +1,62 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2010-2018 Intel Corporation
> +
> +# binary name
> +APP = extmem
> +
> +# all source are stored in SRCS-y
> +SRCS-y := extmem.c
> +
> +# Build using pkg-config variables if possible
> +$(shell pkg-config --exists libdpdk)
> +ifeq ($(.SHELLSTATUS),0)
> +
> +all: shared
> +.PHONY: shared static
> +shared: build/$(APP)-shared
> +	ln -sf $(APP)-shared build/$(APP)
> +static: build/$(APP)-static
> +	ln -sf $(APP)-static build/$(APP)
> +
> +PC_FILE := $(shell pkg-config --path libdpdk)
> +CFLAGS += -O3 $(shell pkg-config --cflags libdpdk)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +LDFLAGS_SHARED = $(shell pkg-config --libs libdpdk)
> +LDFLAGS_STATIC = -Wl,-Bstatic $(shell pkg-config --static --libs libdpdk)
> +
> +build/$(APP)-shared: $(SRCS-y) Makefile $(PC_FILE) | build
> +	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_SHARED)
> +
> +build/$(APP)-static: $(SRCS-y) Makefile $(PC_FILE) | build
> +	$(CC) $(CFLAGS) $(SRCS-y) -o $@ $(LDFLAGS) $(LDFLAGS_STATIC)
> +
> +build:
> +	@mkdir -p $@
> +
> +.PHONY: clean
> +clean:
> +	rm -f build/$(APP) build/$(APP)-static build/$(APP)-shared
> +	rmdir --ignore-fail-on-non-empty build
> +
> +else # Build using legacy build system
> +
> +ifeq ($(RTE_SDK),)
> +$(error "Please define RTE_SDK environment variable")
> +endif
> +
> +# Default target, can be overridden by command line or environment
> +RTE_TARGET ?= x86_64-native-linuxapp-gcc
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -DALLOW_EXPERIMENTAL_API
> +
> +# workaround for a gcc bug with noreturn attribute
> +# http://gcc.gnu.org/bugzilla/show_bug.cgi?id=12603
> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> +CFLAGS_main.o += -Wno-return-type
> +endif
> +
> +include $(RTE_SDK)/mk/rte.extapp.mk
> +endif
> diff --git a/examples/external_mem/extmem.c b/examples/external_mem/extmem.c
> new file mode 100644
> index 000000000..818a02171
> --- /dev/null
> +++ b/examples/external_mem/extmem.c
> @@ -0,0 +1,461 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2018 Intel Corporation
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define RX_RING_SIZE 1024
> +#define TX_RING_SIZE 1024
> +
> +#define NUM_MBUFS 8191
> +#define MBUF_CACHE_SIZE 250
> +#define BURST_SIZE 32
> +#define EXTMEM_HEAP_NAME "extmem"
> +
> +static const struct rte_eth_conf port_conf_default = {
> +	.rxmode = {
> +		.max_rx_pkt_len = ETHER_MAX_LEN,
> +	},
> +};
> +
> +/* extmem.c: Basic DPDK skeleton forwarding example using external memory. */
> +
> +/*
> + * Initializes a given port using global settings and with the RX buffers
> + * coming from the mbuf_pool passed as a parameter.
> + */
> +static inline int
> +port_init(uint16_t port, struct rte_mempool *mbuf_pool)
> +{
> +	struct rte_eth_conf port_conf = port_conf_default;
> +	const uint16_t rx_rings = 1, tx_rings = 1;
> +	uint16_t nb_rxd = RX_RING_SIZE;
> +	uint16_t nb_txd = TX_RING_SIZE;
> +	int retval;
> +	uint16_t q;
> +	struct rte_eth_dev_info dev_info;
> +	struct rte_eth_txconf txconf;
> +
> +	if (!rte_eth_dev_is_valid_port(port))
> +		return -1;
> +
> +	rte_eth_dev_info_get(port, &dev_info);
> +	if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
> +		port_conf.txmode.offloads |=
> +			DEV_TX_OFFLOAD_MBUF_FAST_FREE;
> +
> +	/* Configure the Ethernet device. */
> +	retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
> +	if (retval != 0)
> +		return retval;
> +
> +	retval = rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
> +	if (retval != 0)
> +		return retval;
> +
> +	/* Allocate and set up 1 RX queue per Ethernet port. */
> +	for (q = 0; q < rx_rings; q++) {
> +		retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
> +				rte_eth_dev_socket_id(port), NULL, mbuf_pool);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	txconf = dev_info.default_txconf;
> +	txconf.offloads = port_conf.txmode.offloads;
> +	/* Allocate and set up 1 TX queue per Ethernet port. */
> +	for (q = 0; q < tx_rings; q++) {
> +		retval = rte_eth_tx_queue_setup(port, q, nb_txd,
> +				rte_eth_dev_socket_id(port), &txconf);
> +		if (retval < 0)
> +			return retval;
> +	}
> +
> +	/* Start the Ethernet port. */
> +	retval = rte_eth_dev_start(port);
> +	if (retval < 0)
> +		return retval;
> +
> +	/* Display the port MAC address. */
> +	struct ether_addr addr;
> +	rte_eth_macaddr_get(port, &addr);
> +	printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
> +			   " %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
> +			port,
> +			addr.addr_bytes[0], addr.addr_bytes[1],
> +			addr.addr_bytes[2], addr.addr_bytes[3],
> +			addr.addr_bytes[4], addr.addr_bytes[5]);
> +
> +	/* Enable RX in promiscuous mode for the Ethernet device. */
> +	rte_eth_promiscuous_enable(port);
> +
> +	return 0;
> +}
> +
> +/*
> + * The lcore main. This is the main thread that does the work, reading from
> + * an input port and writing to an output port.
> + */
> +static __attribute__((noreturn)) void
> +lcore_main(void)
> +{
> +	uint16_t port;
> +
> +	/*
> +	 * Check that the port is on the same NUMA node as the polling thread
> +	 * for best performance.
> +	 */
> +	RTE_ETH_FOREACH_DEV(port)
> +		if (rte_eth_dev_socket_id(port) > 0 &&
> +				rte_eth_dev_socket_id(port) !=
> +						(int)rte_socket_id())
> +			printf("WARNING, port %u is on remote NUMA node to "
> +					"polling thread.\n\tPerformance will "
> +					"not be optimal.\n", port);
> +
> +	printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
> +			rte_lcore_id());
> +
> +	/* Run until the application is quit or killed. */
> +	for (;;) {
> +		/*
> +		 * Receive packets on a port and forward them on the paired
> +		 * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
> +		 */
> +		RTE_ETH_FOREACH_DEV(port) {
> +
> +			/* Get burst of RX packets, from first port of pair. */
> +			struct rte_mbuf *bufs[BURST_SIZE];
> +			const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
> +					bufs, BURST_SIZE);
> +
> +			if (unlikely(nb_rx == 0))
> +				continue;
> +
> +			/* Send burst of TX packets, to second port of pair. */
> +			const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
> +					bufs, nb_rx);
> +
> +			/* Free any unsent packets. */
> +			if (unlikely(nb_tx < nb_rx)) {
> +				uint16_t buf;
> +				for (buf = nb_tx; buf < nb_rx; buf++)
> +					rte_pktmbuf_free(bufs[buf]);
> +			}
> +		}
> +	}
> +}
> +
> +/* extremely pessimistic estimation of memory required to create a mempool */
> +static int
> +calc_mem_size(uint32_t nb_ports, uint32_t nb_mbufs_per_port,
> +		uint32_t mbuf_sz, size_t pgsz, size_t *out)
> +{
> +	uint32_t nb_mbufs = nb_ports * nb_mbufs_per_port;
> +	uint64_t total_mem, mbuf_mem, obj_sz;
> +
> +	/* there is no good way to predict how much space the mempool will
> +	 * occupy because it will allocate chunks on the fly, and some of those
> +	 * will come from default DPDK memory while some will come from our
> +	 * external memory, so just assume 16MB will be enough for everyone.
> +	 */
> +	uint64_t hdr_mem = 16 << 20;
> +
> +	obj_sz = rte_mempool_calc_obj_size(mbuf_sz, 0, NULL);
> +	if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> +		/* contiguous - no need to account for page boundaries */
> +		mbuf_mem = nb_mbufs * obj_sz;
> +	} else {
> +		/* account for possible non-contiguousness */
> +		unsigned int n_pages, mbuf_per_pg, leftover;
> +
> +		mbuf_per_pg = pgsz / obj_sz;
> +		leftover = (nb_mbufs % mbuf_per_pg) > 0;
> +		n_pages = (nb_mbufs / mbuf_per_pg) + leftover;
> +
> +		mbuf_mem = n_pages * pgsz;
> +	}
> +
> +	total_mem = RTE_ALIGN(hdr_mem + mbuf_mem, pgsz);
> +
> +	if (total_mem > SIZE_MAX) {
> +		printf("Memory size too big\n");
> +		return -1;
> +	}
> +	*out = (size_t)total_mem;
> +
> +	return 0;
> +}
> +
> +static inline uint32_t
> +bsf64(uint64_t v)
> +{
> +	return (uint32_t)__builtin_ctzll(v);
> +}
> +
> +static inline uint32_t
> +log2_u64(uint64_t v)
> +{
> +	if (v == 0)
> +		return 0;
> +	v = rte_align64pow2(v);
> +	return bsf64(v);
> +}
> +
> +#ifndef MAP_HUGE_SHIFT
> +#define HUGE_SHIFT 26
> +#else
> +#define HUGE_SHIFT MAP_HUGE_SHIFT
> +#endif
> +
> +static int
> +pagesz_flags(uint64_t page_sz)
> +{
> +	/* as per mmap() manpage, all page sizes are log2 of page size
> +	 * shifted by MAP_HUGE_SHIFT
> +	 */
> +	int log2 = log2_u64(page_sz);
> +	return log2 << HUGE_SHIFT;
> +}
> +
> +static void *
> +alloc_mem(size_t memsz, size_t pgsz)
> +{
> +	void *addr;
> +	int flags;
> +
> +	/* allocate anonymous hugepages */
> +	flags = MAP_ANONYMOUS | MAP_PRIVATE | MAP_HUGETLB | pagesz_flags(pgsz);
> +
> +	addr = mmap(NULL, memsz, PROT_READ | PROT_WRITE, flags, -1, 0);
> +	if (addr == MAP_FAILED)
> +		return NULL;
> +
> +	return addr;
> +}
> +
> +struct extmem_param {
> +	void *addr;
> +	size_t len;
> +	size_t pgsz;
> +	rte_iova_t *iova_table;
> +	unsigned int iova_table_len;
> +};
> +
> +static int
> +create_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz,
> +		struct extmem_param *param)
> +{
> +	uint64_t pgsizes[] = {RTE_PGSIZE_2M, RTE_PGSIZE_1G, /* x86_64, ARM */
> +			RTE_PGSIZE_16M, RTE_PGSIZE_16G};    /* POWER */
> +	unsigned int n_pages, cur_page, pgsz_idx;
> +	size_t mem_sz, offset, cur_pgsz;
> +	bool vfio_supported = true;
> +	rte_iova_t *iovas = NULL;
> +	void *addr;
> +	int ret;
> +
> +	for (pgsz_idx = 0; pgsz_idx < RTE_DIM(pgsizes); pgsz_idx++) {
> +		/* skip anything that is too big */
> +		if (pgsizes[pgsz_idx] > SIZE_MAX)
> +			continue;
> +
> +		cur_pgsz = pgsizes[pgsz_idx];
> +
> +		ret = calc_mem_size(nb_ports, nb_mbufs_per_port,
> +				mbuf_sz, cur_pgsz, &mem_sz);
> +		if (ret < 0) {
> +			printf("Cannot calculate memory size\n");
> +			return -1;
> +		}
> +
> +		/* allocate our memory */
> +		addr = alloc_mem(mem_sz, cur_pgsz);
> +
> +		/* if we couldn't allocate memory with a specified page size,
> +		 * that doesn't mean we can't do it with other page sizes, so
> +		 * try another one.
> +		 */
> +		if (addr == NULL)
> +			continue;
> +
> +		/* store IOVA addresses for every page in this memory area */
> +		n_pages = mem_sz / cur_pgsz;
> +
> +		iovas = malloc(sizeof(*iovas) * n_pages);
> +
> +		if (iovas == NULL) {
> +			printf("Cannot allocate memory for iova addresses\n");
> +			goto fail;
> +		}
> +
> +		/* populate IOVA table */
> +		for (cur_page = 0; cur_page < n_pages; cur_page++) {
> +			rte_iova_t iova;
> +			void *cur;
> +
> +			offset = cur_pgsz * cur_page;
> +			cur = RTE_PTR_ADD(addr, offset);
> +
> +			/* touch the page so it is faulted in before its IOVA
> +			 * is looked up; an unfaulted page has no physical
> +			 * address yet
> +			 */
> +			*(volatile char *)cur = 0;
> +
> +			iova = (uintptr_t)rte_mem_virt2iova(cur);
> +
> +			iovas[cur_page] = iova;
> +
> +			if (vfio_supported) {
> +				/* map this page (not the area base) for DMA */
> +				ret = rte_vfio_dma_map((uintptr_t)cur,
> +						iova, cur_pgsz);
> +				if (ret < 0) {
> +					/*
> +					 * ENODEV means VFIO is not initialized
> +					 * ENOTSUP means current IOMMU mode
> +					 * doesn't support mapping
> +					 * both cases are not an error
> +					 */
> +					if (rte_errno == ENOTSUP ||
> +							rte_errno == ENODEV)
> +						/* VFIO is unsupported, don't
> +						 * try again.
> +						 */
> +						vfio_supported = false;
> +					else
> +						/* this is an actual error */
> +						goto fail;
> +				}
> +			}
> +		}
> +
> +		break;
> +	}
> +	/* if we couldn't allocate anything */
> +	if (iovas == NULL)
> +		return -1;
> +
> +	param->addr = addr;
> +	param->len = mem_sz;
> +	param->pgsz = cur_pgsz;
> +	param->iova_table = iovas;
> +	param->iova_table_len = n_pages;
> +
> +	return 0;
> +fail:
> +	if (iovas)
> +		free(iovas);
> +	if (addr)
> +		munmap(addr, mem_sz);
> +
> +	return -1;
> +}
> +
> +static int
> +setup_extmem(uint32_t nb_ports, uint32_t nb_mbufs_per_port, uint32_t mbuf_sz)
> +{
> +	struct extmem_param param;
> +	int ret;
> +
> +	/* create our heap */
> +	ret = rte_malloc_heap_create(EXTMEM_HEAP_NAME);
> +	if (ret < 0) {
> +		printf("Cannot create heap\n");
> +		return -1;
> +	}
> +
> +	ret = create_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz, &param);
> +	if (ret < 0) {
> +		printf("Cannot create memory area\n");
> +		return -1;
> +	}
> +
> +	/* we now have a valid memory area, so add it to heap */
> +	ret = rte_malloc_heap_memory_add(EXTMEM_HEAP_NAME,
> +			param.addr, param.len, param.iova_table,
> +			param.iova_table_len, param.pgsz);
> +
> +	/* not needed any more */
> +	free(param.iova_table);
> +
> +	if (ret < 0) {
> +		printf("Cannot add memory to heap\n");
> +		munmap(param.addr, param.len);
> +		return -1;
> +	}
> +
> +	printf("Allocated %zuMB of memory\n", param.len >> 20);
> +
> +	/* success */
> +	return 0;
> +}
> +
> +
> +/*
> + * The main function, which does initialization and calls the per-lcore
> + * functions.
> + */
> +int
> +main(int argc, char *argv[])
> +{
> +	struct rte_mempool *mbuf_pool;
> +	unsigned int nb_ports;
> +	int socket_id;
> +	uint16_t portid;
> +	uint32_t nb_mbufs_per_port, mbuf_sz;
> +
> +	/* Initialize the Environment Abstraction Layer (EAL). */
> +	int ret = rte_eal_init(argc, argv);
> +	if (ret < 0)
> +		rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
> +
> +	argc -= ret;
> +	argv += ret;
> +
> +	/* Check that there is an even number of ports to send/receive on. */
> +	nb_ports = rte_eth_dev_count_avail();
> +	if (nb_ports < 2 || (nb_ports & 1))
> +		rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
> +
> +	nb_mbufs_per_port = NUM_MBUFS;
> +	mbuf_sz = RTE_MBUF_DEFAULT_BUF_SIZE;
> +
> +	if (setup_extmem(nb_ports, nb_mbufs_per_port, mbuf_sz) < 0)
> +		rte_exit(EXIT_FAILURE, "Error: cannot set up external memory\n");
> +
> +	/* retrieve socket ID for our heap */
> +	socket_id = rte_malloc_heap_get_socket(EXTMEM_HEAP_NAME);
> +	if (socket_id < 0)
> +		rte_exit(EXIT_FAILURE, "Invalid socket for external heap\n");
> +
> +	/* Creates a new mempool in memory to hold the mbufs. */
> +	mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
> +			nb_mbufs_per_port * nb_ports, MBUF_CACHE_SIZE, 0,
> +			mbuf_sz, socket_id);
> +
> +	if (mbuf_pool == NULL)
> +		rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
> +
> +	/* Initialize all ports. */
> +	RTE_ETH_FOREACH_DEV(portid)
> +		if (port_init(portid, mbuf_pool) != 0)
> +			rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
> +					portid);
> +
> +	if (rte_lcore_count() > 1)
> +		printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");
> +
> +	/* Call lcore_main on the master core only. */
> +	lcore_main();
> +
> +	return 0;
> +}
> diff --git a/examples/external_mem/meson.build b/examples/external_mem/meson.build
> new file mode 100644
> index 000000000..17a363ad2
> --- /dev/null
> +++ b/examples/external_mem/meson.build
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2017 Intel Corporation
> +
> +# meson file, for building this example as part of a main DPDK build.
> +#
> +# To build this example as a standalone application with an already-installed
> +# DPDK instance, use 'make'
> +
> +allow_experimental_apis = true
> +sources = files(
> +	'extmem.c'
> +)
> --
> 2.17.1
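The sizing arithmetic in the quoted pagesz_flags() and calc_mem_size() helpers can be checked without DPDK. Below is a minimal standalone sketch, not the patch's code: the sketch_* names are hypothetical, MAP_HUGE_SHIFT is assumed to be 26 (the patch's own fallback value), __builtin_ctzll assumes GCC or Clang, and the 2432-byte object size is a hypothetical stand-in for whatever rte_mempool_calc_obj_size() actually returns. It models only the IOVA-as-PA branch, where objects are packed into whole pages.

```c
#include <stdint.h>

/* Assumption: same value as the patch's fallback when MAP_HUGE_SHIFT is absent. */
#define SKETCH_HUGE_SHIFT 26

/* Round v (v > 0) up to the next power of two; hypothetical stand-in
 * for DPDK's rte_align64pow2(). */
static uint64_t
sketch_align64pow2(uint64_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/* log2 of the page size after rounding up to a power of two. */
static uint32_t
sketch_log2_u64(uint64_t v)
{
	if (v == 0)
		return 0;
	return (uint32_t)__builtin_ctzll(sketch_align64pow2(v));
}

/* Page-size bits for mmap(): log2(page size) << MAP_HUGE_SHIFT.
 * Computed as uint32_t because a log2 of 32 or more (e.g. 16G pages)
 * no longer fits in a positive int once shifted by 26. */
static uint32_t
sketch_pagesz_flags(uint64_t pgsz)
{
	return sketch_log2_u64(pgsz) << SKETCH_HUGE_SHIFT;
}

/* Whole pages needed when objects must not straddle a page boundary. */
static uint64_t
sketch_pages_needed(uint64_t nb_objs, uint64_t obj_sz, uint64_t pgsz)
{
	uint64_t per_pg = pgsz / obj_sz;

	return nb_objs / per_pg + (nb_objs % per_pg != 0);
}

/* Fixed 16 MB header allowance plus object pages, rounded up to pgsz. */
static uint64_t
sketch_total_mem(uint64_t nb_objs, uint64_t obj_sz, uint64_t pgsz)
{
	uint64_t hdr_mem = 16ULL << 20;
	uint64_t obj_mem = sketch_pages_needed(nb_objs, obj_sz, pgsz) * pgsz;

	return (hdr_mem + obj_mem + pgsz - 1) / pgsz * pgsz;
}
```

With 2 ports of 8191 mbufs each (16382 objects), the hypothetical object size packs 862 objects into each 2 MB page, giving 20 pages and a 56 MB total once the 16 MB allowance is added; the same inputs with 1 GB pages round up to 2 GB, which shows how quickly the estimate grows with large pages.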