From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by dpdk.org (Postfix) with ESMTP id 8F44F7E7B for ; Tue, 21 Oct 2014 22:40:48 +0200 (CEST) Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by fmsmga101.fm.intel.com with ESMTP; 21 Oct 2014 10:56:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="4.97,862,1389772800"; d="scan'208";a="403513050" Received: from pgsmsx103.gar.corp.intel.com ([10.221.44.82]) by FMSMGA003.fm.intel.com with ESMTP; 21 Oct 2014 06:12:28 -0700 Received: from pgsmsx106.gar.corp.intel.com (10.221.44.98) by PGSMSX103.gar.corp.intel.com (10.221.44.82) with Microsoft SMTP Server (TLS) id 14.3.195.1; Tue, 21 Oct 2014 21:17:03 +0800 Received: from shsmsx152.ccr.corp.intel.com (10.239.6.52) by pgsmsx106.gar.corp.intel.com (10.221.44.98) with Microsoft SMTP Server (TLS) id 14.3.195.1; Tue, 21 Oct 2014 21:17:03 +0800 Received: from shsmsx102.ccr.corp.intel.com ([169.254.2.156]) by SHSMSX152.ccr.corp.intel.com ([169.254.6.13]) with mapi id 14.03.0195.001; Tue, 21 Oct 2014 21:17:01 +0800 From: "Liang, Cunming" To: Neil Horman Thread-Topic: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet Thread-Index: AQHP5IYD5aCkJuEDxEurfcum8uYUPpwpFr4AgALSmBCADfxZgIAApRJA Date: Tue, 21 Oct 2014 13:17:01 +0000 Message-ID: References: <1408947174-11323-1-git-send-email-cunming.liang@intel.com> <1412944201-30703-1-git-send-email-cunming.liang@intel.com> <1412944201-30703-2-git-send-email-cunming.liang@intel.com> <20141010175226.GG19499@hmsreliant.think-freely.org> <20141021103315.GB12795@hmsreliant.think-freely.org> In-Reply-To: <20141021103315.GB12795@hmsreliant.think-freely.org> Accept-Language: zh-CN, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [10.239.127.40] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Cc: "dev@dpdk.org" Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 21 Oct 2014 20:40:50 -0000 > -----Original Message----- > From: Neil Horman [mailto:nhorman@tuxdriver.com] > Sent: Tuesday, October 21, 2014 6:33 PM > To: Liang, Cunming > Cc: dev@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx > cycles/packet >=20 > On Sun, Oct 12, 2014 at 11:10:39AM +0000, Liang, Cunming wrote: > > Hi Neil, > > > > Very appreciate your comments. > > I add inline reply, will send v3 asap when we get alignment. > > > > BRs, > > Liang Cunming > > > > > -----Original Message----- > > > From: Neil Horman [mailto:nhorman@tuxdriver.com] > > > Sent: Saturday, October 11, 2014 1:52 AM > > > To: Liang, Cunming > > > Cc: dev@dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and= tx > cycles/packet > > > > > > On Fri, Oct 10, 2014 at 08:29:58PM +0800, Cunming Liang wrote: > > > > It provides unit test to measure cycles/packet in NIC loopback mode= . > > > > It simply gives the average cycles of IO used per packet without te= st > equipment. > > > > When doing the test, make sure the link is UP. > > > > > > > > Usage Example: > > > > 1. Run unit test app in interactive mode > > > > app/test -c f -n 4 -- -i > > > > 2. Run and wait for the result > > > > pmd_perf_autotest > > > > > > > > There's option to choose rx/tx pair, default is vector. > > > > set_rxtx_mode [vector|scalar|full|hybrid] > > > > Note: To get acurate scalar fast, please choose 'vector' or 'hybrid= ' without > > > INC_VEC=3Dy in config > > > > > > > > Signed-off-by: Cunming Liang > > > > Acked-by: Bruce Richardson > > > > > > Notes inline > > > > > > > --- > > > > app/test/Makefile | 1 + > > > > app/test/commands.c | 38 +++ > > > > app/test/packet_burst_generator.c | 4 +- > > > > app/test/test.h | 4 + > > > > app/test/test_pmd_perf.c | 626 > > > +++++++++++++++++++++++++++++++++++ > > > > lib/librte_pmd_ixgbe/ixgbe_ethdev.c | 6 + > > > > 6 files changed, 677 insertions(+), 2 deletions(-) > > > > create mode 100644 app/test/test_pmd_perf.c > > > > > > > > diff --git a/app/test/Makefile b/app/test/Makefile > > > > index 6af6d76..ebfa0ba 100644 > > > > --- a/app/test/Makefile > > > > +++ b/app/test/Makefile > > > > @@ -56,6 +56,7 @@ SRCS-y +=3D test_memzone.c > > > > > > > > SRCS-y +=3D test_ring.c > > > > SRCS-y +=3D test_ring_perf.c > > > > +SRCS-y +=3D test_pmd_perf.c > > > > > > > > ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y) > > > > SRCS-y +=3D test_table.c > > > > diff --git a/app/test/commands.c b/app/test/commands.c > > > > index a9e36b1..f1e746e 100644 > > > > --- a/app/test/commands.c > > > > +++ b/app/test/commands.c > > > > @@ -310,12 +310,50 @@ cmdline_parse_inst_t cmd_quit =3D { > > > > > > > > +#define NB_ETHPORTS_USED (1) > > > > +#define NB_SOCKETS (2) > > > > +#define MEMPOOL_CACHE_SIZE 250 > > > > +#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) + > > > RTE_PKTMBUF_HEADROOM) > > > Don't you want to size this in accordance with the amount of data you= r > sending > > > (64 Bytes as noted above)? > > [Liang, Cunming] The case is designed to measure small packet IO cost w= ith > normal mbuf size. > > Even if decreasing the size, it won't gain significant cycles. > > > > That presumes a non-memory constrained system, doesn't it? I suppose in = the > end > as long as you have consistency, its not overly relevant, but it seems li= ke > you'll want to add data sizing as a feature to this eventually (i.e. the = ability > to test performance for larger frames sizes), at which point you'll need = to make > this non-static anyway. [Liang, Cunming] For a normal Ethernet packet(w/o jumbo frame), packet size= is 1518B. As in really network, there won't have huge number of jumbo frames. The mbuf size 2048 is a reasonable value to cover most of the packet size. It's also be chosen by lots of NIC as the default receiving buffer size in = DMA register. In case larger than the size, it need do scatter and gather but lose some p= erformance. The unit test won't measure size from 64 to 9600, won't plan to measure sca= tter-gather rx/tx. It focus on 64B packet size and taking the mbuf size being used the most of= ten. >=20 > > > > +static void > > > > +print_ethaddr(const char *name, const struct ether_addr *eth_addr) > > > > +{ > > > > + printf("%s%02X:%02X:%02X:%02X:%02X:%02X", name, > > > > + eth_addr->addr_bytes[0], > > > > + eth_addr->addr_bytes[1], > > > > + eth_addr->addr_bytes[2], > > > > + eth_addr->addr_bytes[3], > > > > + eth_addr->addr_bytes[4], > > > > + eth_addr->addr_bytes[5]); > > > > +} > > > > + > > > This was copieed from print_ethaddr. Seems like a good candidate for= a > common > > > function in rte_ether.h > > [Liang, Cunming] Agree with you, some of samples now use it with the sa= me > copy. > > I'll rework it. Adding 'ether_format_addr' in rte_ether.h only for form= at the > 48bits address output. > > And leaving other prints for application customization. > > > > Sounds good. >=20 > > > > > > > +} > > > > + > > > > +static void > > > > +signal_handler(int signum) > > > > +{ > > > > + /* When we receive a USR1 signal, print stats */ > > > I think you mean SIGUSR2, below, SIGUSR1 tears the test down and exit= s the > > > program > > [Liang, Cunming] Thanks, it's a typo. > > > > > > > + if (signum =3D=3D SIGUSR1) { > > > SIGINT instead. Thats the common practice. > > [Liang, Cunming] I understood your opinion. > > The considerations I'm not using SIGINT instead are: > > 1. We unset ISIG in c_lflag of term. CRTL+C won't trigger SIGINT in com= mand > interactive. > > It always has to explicitly send signal. No matter SIGUSR1 or SIGINT. > > 2. By SIGINT semantic, expect to terminate the process. > > Here I expect to force stop this case, but still alive in command lin= e. > > After it stopped, it can run again or start to run other test cases. > > So I keep SIGINT, SIGUSR1 in different behavior. > > 3. It should be rarely used. > > Only when exception timeout, I leave this backdoor for automation tes= t > control. > > For manual test, we can easily force kill the process. > > > Hmm, ok, that sounds reasonable. >=20 > > > > > > > + printf("Force Stop!\n"); > > > > + stop =3D 1; > > > > + } > > > > + if (signum =3D=3D SIGUSR2) > > > > + stats_display(0); > > > > +} > > > > +/* main processing loop */ > > > > +static int > > > > +main_loop(__rte_unused void *args) > > > > +{ > > > > +#define PACKET_SIZE 64 > > > > +#define FRAME_GAP 12 > > > > +#define MAC_PREAMBLE 8 > > > > + struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; > > > > + unsigned lcore_id; > > > > + unsigned i, portid, nb_rx =3D 0, nb_tx =3D 0; > > > > + struct lcore_conf *conf; > > > > + uint64_t prev_tsc, cur_tsc; > > > > + int pkt_per_port; > > > > + uint64_t packets_per_second, total_packets; > > > > + > > > > + lcore_id =3D rte_lcore_id(); > > > > + conf =3D &lcore_conf[lcore_id]; > > > > + if (conf->status !=3D LCORE_USED) > > > > + return 0; > > > > + > > > > + pkt_per_port =3D MAX_TRAFIC_BURST / conf->nb_ports; > > > > + > > > > + int idx =3D 0; > > > > + for (i =3D 0; i < conf->nb_ports; i++) { > > > > + int num =3D pkt_per_port; > > > > + portid =3D conf->portlist[i]; > > > > + printf("inject %d packet to port %d\n", num, portid); > > > > + while (num) { > > > > + nb_tx =3D RTE_MIN(MAX_PKT_BURST, num); > > > > + nb_tx =3D rte_eth_tx_burst(portid, 0, > > > > + &tx_burst[idx], nb_tx); > > > > + num -=3D nb_tx; > > > > + idx +=3D nb_tx; > > > > + } > > > > + } > > > > + printf("Total packets inject to prime ports =3D %u\n", idx); > > > > + > > > > + packets_per_second =3D (link_mbps * 1000 * 1000) / > > > > + +((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) * CHAR_BIT); > > > > + printf("Each port will do %"PRIu64" packets per second\n", > > > > + +packets_per_second); > > > > + > > > > + total_packets =3D RTE_TEST_DURATION * conf->nb_ports * > > > packets_per_second; > > > > + printf("Test will stop after at least %"PRIu64" packets received\= n", > > > > + + total_packets); > > > > + > > > > + prev_tsc =3D rte_rdtsc(); > > > > + > > > > + while (likely(!stop)) { > > > > + for (i =3D 0; i < conf->nb_ports; i++) { > > > > + portid =3D conf->portlist[i]; > > > > + nb_rx =3D rte_eth_rx_burst((uint8_t) portid, 0, > > > > + pkts_burst, MAX_PKT_BURST); > > > > + if (unlikely(nb_rx =3D=3D 0)) { > > > > + idle++; > > > > + continue; > > > > + } > > > > + > > > > + count +=3D nb_rx; > > > Doesn't take into consideration error conditions. rte_eth_rx_burst c= an > return > > > -ENOTSUP > > [Liang, Cunming] It returns -ENOTSUP when turning on ETHDEV_DEBUG > CONFIG. > > The error is used to avoid no function call register. > > When ETHDEV_DEBUG turn off, the NULL function call cause segfault direc= tly. > > So I think it's a library internal error. > No, look at FUNC_PTR_OR_ERR_RET. If the passed in function pointer is nu= ll, > -ENOTSUPP will be returned to the application, you need to handle the err= or > condition. [Liang, Cunming] The runtime rte_eth_rx_burst is a inline function in rte_e= thdev.h. The one you're talking about is the one defined in rte_ethdev.c which is a = extern non-inline function. It works only when RTE_LIBRTE_ETHDEV_DEBUG turns on. If we always turns such library internal checking on, it lose performance. So I insist it's a library internal error checking, doesn't need to take ca= re in application level. >=20 > > In such library exceptional case, I prefer not expecting sample/applica= tion to > condition check library functional error. > But you're assertion about the library handling the exception is wrong. >=20 > > > > > > > + nb_tx =3D rte_eth_tx_burst(portid, 0, pkts_burst, nb_rx); > > > Ditto with -ENOTSUP > > > > > > > + if (unlikely(nb_tx < nb_rx)) { > > > What makes this unlikely? Seems like a perfectly reasonable conditio= n to > happen > > > to me. If the network is busy, its completely likely that you will r= eceive more > > > frames than you send, if you elect to receive all frames. > > [Liang, Cunming] For this case, NIC works in MAC loopback mode. > > It firstly injects numbers of packets to NIC. Then NIC will loopback al= l packets to > ingress side. > > The code here will receive the packets, and send them back to NIC > > Packets looping inside all come from initial injection. > > As the total number of injected packets is much less than in-chip queue= size, > the tx egress queue shouldn't block desc. ring update. > > So when receiving packets in line rate and nb_tx < nb_rx, it means tx c= annot > archive line rate. > > When it happens, the cycles/packets result make no sense, as the bottle= neck is > NIC. > > The drop counter can record it. > Ok. >=20 > > > > > > > + drop +=3D (nb_rx - nb_tx); > > > > + do { > > > > + rte_pktmbuf_free(pkts_burst[nb_tx]); > > > Defer this, it skews your timing > > [Liang, Cunming] Agree with you, I ever thought about it. > > This test cases is designed to measure pure IO RX/TX routine. > > When doing tx burst with result nb_tx < nb_rx, we either repeat re-send= or > drop it. > > Each way introduces noise(adding additional control code), resending mu= ch > times even cost more than free it. > > The cycles/packets is useful when there's no packet drop, otherwise it = gives the > hint where the problem comes from (by idle or drop). > I'm not sure what you're asserting here. Are you suggesting that you wan= t to > include the time it takes to free memory buffers in your testing? That s= eems > dubious at best to me. If you want to measure I/O, thats great, but memo= ry > management of packet buffers is a separate operation from that I/O [Liang, Cunming] Agree with you. I means it doesn't need to take care of th= e mbuf free cost. As I said in previous reply, it rarely happens in line rate. The cycle measurement doesn't make much sense if happens. On that time the unit test just notify it happens, and keep safe free all t= he unsent mbuf. >=20 > > > > > > > + } while (++nb_tx < nb_rx); > > > > + } > > > > + } > > > > + if (unlikely(count >=3D total_packets)) > > > > + break; > > > Whats the reasoning here? Do you only ever expect to receive frames = that > you > > > send? If so, seems like this should call for a big warning to get pr= inted. > > [Liang, Cunming] The loop exits when the pre-calculated total_packets a= re > received. > > As the nb_rx is unpredictable, the packet counter may large equal than > total_packets the last time. > > The reason unlikely used here is because the condition becomes true onl= y the > last time. > ok >=20 > > > > > > > + } > > > > + > > > > + cur_tsc =3D rte_rdtsc(); > > > > + > > > > + for (i =3D 0; i < conf->nb_ports; i++) { > > > > + portid =3D conf->portlist[i]; > > > > + int nb_free =3D pkt_per_port; > > > > + do { /* dry out */ > > > > + nb_rx =3D rte_eth_rx_burst((uint8_t) portid, 0, > > > > + pkts_burst, MAX_PKT_BURST); > > > > + nb_tx =3D 0; > > > > + while (nb_tx < nb_rx) > > > > + rte_pktmbuf_free(pkts_burst[nb_tx++]); > > > > + nb_free -=3D nb_rx; > > > > + } while (nb_free !=3D 0); > > > > + printf("free %d mbuf left in port %u\n", pkt_per_port, portid); > > > > + } > > > > + > > > Whats the purpose of this? Are you trying to flush the device? Woul= dn't it > be > > > enough just to stop the interface? > > [Liang, Cunming] If we only run the cases once and exit, it doesn't mat= ter. > > But it's designed to run multi-times without exit, for the purpose of w= arming > up or for getting average number. > > So stopping device is not enough, we have to release the flying packets= . > > > Ok >=20 > > > > > > > + if (count =3D=3D 0) > > > > + return -1; > > > > + > > > > + printf("%lu packet, %lu drop, %lu idle\n", count, drop, idle); > > > > + printf("Result: %ld cycles per packet\n", (cur_tsc - prev_tsc) / = count); > > > > + > > > Bad math here. Theres no guarantee that the tsc hasn't wrapped > (potentially > > > more than once) depending on your test length. you need to check the= tsc > before > > > and after each burst and record an average of deltas instead, account= ing in > each > > > instance for the possibility of wrap. > > [Liang, Cunming] I'm not sure catch your point correctly. > > I think both cur_tsc and prev_tsc are 64 bits width. > > For 3GHz, I think it won't wrapped so quick. > > As it's uint64_t, so even get wrapped, the delta should still be correc= t. > But theres no guarantee that the tsc starts at zero when you begin your t= est. > The system may have been up for a long time and near wrapping already. > Regardless, you need to account for the possibility that cur_tsc is small= er > than prev_tsc, or this breaks. [Liang, Cunming] In case prev_tsc near wrapping, and cur_tsc get wrapped. As they are unsigned, it still ok. e.g. cur_tsc=3D0x2, prev_tsc=3D0xFFFFFFFFFFFFFFFC Delta=3Dcur_tsc-prev_tsc is 6, which is still correct. And for uint64_t, we need to start computer for hundreds of years. >=20 > > > > > > > + return 0; > > > > +} > > > > + > > > > +static int > > > > +test_pmd_perf(void) > > > > +{ > > > > + uint16_t nb_ports, num, nb_lcores, slave_id =3D (uint16_t)-1; > > > > + uint16_t nb_rxd =3D RTE_TEST_RX_DESC_DEFAULT; > > > > + uint16_t nb_txd =3D RTE_TEST_TX_DESC_DEFAULT; > > > > + uint16_t portid; > > > > + uint16_t nb_rx_queue =3D 1, nb_tx_queue =3D 1; > > > > + int socketid =3D -1; > > > > + int ret; > > > > + > > > > + printf("Start PMD RXTX cycles cost test.\n"); > > > > + > > > > + signal(SIGUSR1, signal_handler); > > > Again SIGINT here. > > > > > > > + signal(SIGUSR2, signal_handler); > > > > + > > > > + nb_ports =3D rte_eth_dev_count(); > > > > + if (nb_ports < NB_ETHPORTS_USED) { > > > > + printf("At least %u port(s) used for perf. test\n", > > > > + NB_ETHPORTS_USED); > > > > + return -1; > > > > + } > > > > + > > > > + if (nb_ports > RTE_MAX_ETHPORTS) > > > > + nb_ports =3D RTE_MAX_ETHPORTS; > > > > + > > > > + nb_lcores =3D rte_lcore_count(); > > > > + > > > > + memset(lcore_conf, 0, sizeof(lcore_conf)); > > > > + init_lcores(); > > > > + > > > > + init_mbufpool(NB_MBUF); > > > > + > > > > + reset_count(); > > > > + num =3D 0; > > > > + for (portid =3D 0; portid < nb_ports; portid++) { > > > > + if (socketid =3D=3D -1) { > > > > + socketid =3D rte_eth_dev_socket_id(portid); > > > > + slave_id =3D alloc_lcore(socketid); > > > > + if (slave_id =3D=3D (uint16_t)-1) { > > > > + printf("No avail lcore to run test\n"); > > > > + return -1; > > > > + } > > > > + printf("Performance test runs on lcore %u socket %u\n", > > > > + slave_id, socketid); > > > > + } > > > > + > > > > + if (socketid !=3D rte_eth_dev_socket_id(portid)) { > > > > + printf("Skip port %d\n", portid); > > > > + continue; > > > > + } > > > > + > > > > + /* port configure */ > > > > + ret =3D rte_eth_dev_configure(portid, nb_rx_queue, > > > > + nb_tx_queue, &port_conf); > > > > + if (ret < 0) > > > > + rte_exit(EXIT_FAILURE, > > > > + "Cannot configure device: err=3D%d, port=3D%d\n", > > > > + ret, portid); > > > > + > > > > + rte_eth_macaddr_get(portid, &ports_eth_addr[portid]); > > > > + printf("Port %u ", portid); > > > > + print_ethaddr("Address:", &ports_eth_addr[portid]); > > > > + printf("\n"); > > > > + > > > > + /* tx queue setup */ > > > > + ret =3D rte_eth_tx_queue_setup(portid, 0, nb_txd, > > > > + socketid, &tx_conf); > > > > + if (ret < 0) > > > > + rte_exit(EXIT_FAILURE, > > > > + "rte_eth_tx_queue_setup: err=3D%d, " > > > > + "port=3D%d\n", ret, portid); > > > > + > > > > + /* rx queue steup */ > > > > + ret =3D rte_eth_rx_queue_setup(portid, 0, nb_rxd, > > > > + socketid, &rx_conf, > > > > + mbufpool[socketid]); > > > > + if (ret < 0) > > > > + rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup: err=3D%d," > > > > + "port=3D%d\n", ret, portid); > > > > + > > > > + /* Start device */ > > > > + stop =3D 0; > > > > + ret =3D rte_eth_dev_start(portid); > > > > + if (ret < 0) > > > > + rte_exit(EXIT_FAILURE, > > > > + "rte_eth_dev_start: err=3D%d, port=3D%d\n", > > > > + ret, portid); > > > > + > > > > + /* always eanble promiscuous */ > > > > + rte_eth_promiscuous_enable(portid); > > > > + > > > > + lcore_conf[slave_id].portlist[num++] =3D portid; > > > > + lcore_conf[slave_id].nb_ports++; > > > > + } > > > > + check_all_ports_link_status(nb_ports, RTE_PORT_ALL); > > > > + > > > > + init_traffic(mbufpool[socketid], tx_burst, MAX_TRAFIC_BURST); > > > > + > > > > + rte_eal_remote_launch(main_loop, NULL, slave_id); > > > > + if (rte_eal_wait_lcore(slave_id) < 0) > > > > + return -1; > > > > + > > > > + /* port tear down */ > > > > + for (portid =3D 0; portid < nb_ports; portid++) { > > > > + if (socketid !=3D rte_eth_dev_socket_id(portid)) > > > > + continue; > > > > + > > > > + rte_eth_dev_stop(portid); > > > > + } > > > > + > > > Clean up your allocated memory/lcores/etc? > > [Liang, Cunming] It do cleanup on the beginning of case. > > "Init_lcores","init_mbufpool","reset_count" guarantee the data clean be= fore > each testing. > > And mbufpool only allocated once even if we run multiple times. > Its a janitorial issue. Before the program exits, you need to free any > resources that you've allocated. [Liang, Cunming] Yes, I do but only not for the mbuf pool. Firstly the case exit is not equal to program exit. I expect the re-enter w= ill use the same mempool, so that can do cache warm up. Secondly, DPDK doesn't supply really mempool release. The program exit, the= memzone then free. >=20 > Neil >=20 > > > > > > Neil > > > >