From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 3045F5908
 for ; Fri, 18 Mar 2016 14:32:16 +0100 (CET)
Received: from fmsmga001.fm.intel.com ([10.253.24.23])
 by fmsmga104.fm.intel.com with ESMTP; 18 Mar 2016 06:32:13 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,355,1455004800";
 d="scan'208";a="927018859"
Received: from unknown (HELO Sent) ([10.217.248.31])
 by fmsmga001.fm.intel.com with SMTP; 18 Mar 2016 06:32:11 -0700
Received: by Sent (sSMTP sendmail emulation); Fri, 18 Mar 2016 14:31:49 +0100
From: Tomasz Kulasek
To: dev@dpdk.org
Date: Fri, 18 Mar 2016 14:31:46 +0100
Message-Id: <1458307906-4824-1-git-send-email-tomaszx.kulasek@intel.com>
X-Mailer: git-send-email 2.1.4
In-Reply-To: <1458294767-5996-1-git-send-email-tomaszx.kulasek@intel.com>
References: <1458294767-5996-1-git-send-email-tomaszx.kulasek@intel.com>
Subject: [dpdk-dev] [PATCH v6] examples/l3fwd: em path performance fix
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 18 Mar 2016 13:32:16 -0000

It seems that for most use cases the previous hash_multi_lookup method
provides better performance; moreover, sequential lookup can cause a
significant performance drop.

This patch makes the previously optional hash_multi_lookup method the
default. It also provides some minor optimizations, such as draining
queues only on the tx ports that are actually used.

This patch should be applied after Maciej Czekaj's patch
"l3fwd: Fix compilation with HASH_MULTI_LOOKUP"

v6 changes:
 - use RTE_MACHINE_CPUFLAG_NEON instead of __ARM_NEON for ARM Neon detection

v5 changes:
 - removed debug information, patch cleanup

v4 changes:
 - rebased to be applicable after patch "l3fwd: Fix compilation with
   HASH_MULTI_LOOKUP" of Maciej Czekaj

v3 changes:
 - the "lpm: extend IPv4 next hop field" patch extended the dst_port table
   from uint16_t to uint32_t but omitted the previously disabled
   l3fwd_em_hlm_sse.h, which causes an incompatible pointer type error
   once this header is enabled

v2 changes:
 - fixed copy-paste error that caused some packets to be misclassified in
   the hash_multi_lookup implementation when the burst size is not
   divisible by 8

Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
Fixes: dc81ebbacaeb ("lpm: extend IPv4 next hop field")
Fixes: 64d3955de1de ("examples/l3fwd: fix ARM build")

Reported-by: Qian Xu
Signed-off-by: Tomasz Kulasek
---
 examples/l3fwd/l3fwd.h            |  6 ++++++
 examples/l3fwd/l3fwd_em.c         |  8 ++++----
 examples/l3fwd/l3fwd_em_hlm_sse.h | 28 ++++++++++------------------
 examples/l3fwd/l3fwd_em_sse.h     |  9 +++++++++
 examples/l3fwd/l3fwd_lpm.c        |  4 ++--
 examples/l3fwd/main.c             |  7 +++++++
 6 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 7dcc7e5..726e8cc 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -40,6 +40,10 @@
 #define RTE_LOGTYPE_L3FWD RTE_LOGTYPE_USER1
+#if !defined(NO_HASH_MULTI_LOOKUP) && defined(RTE_MACHINE_CPUFLAG_NEON)
+#define NO_HASH_MULTI_LOOKUP 1
+#endif
+
 #define MAX_PKT_BURST 32
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
@@ -86,6 +90,8 @@ struct lcore_rx_queue {
 struct lcore_conf {
 	uint16_t n_rx_queue;
 	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUE_PER_LCORE];
+	uint16_t n_tx_port;
+	uint16_t tx_port_id[RTE_MAX_ETHPORTS];
 	uint16_t tx_queue_id[RTE_MAX_ETHPORTS];
 	struct mbuf_table tx_mbufs[RTE_MAX_ETHPORTS];
 	void *ipv4_lookup_struct;
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 0adf8f4..50f09e5 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -250,7 +250,7 @@ em_mask_key(void *key, xmm_t mask)
 	return _mm_and_si128(data, mask);
 }
-#elif defined(__ARM_NEON)
+#elif defined(RTE_MACHINE_CPUFLAG_NEON)
 static inline xmm_t
 em_mask_key(void *key, xmm_t mask)
 {
@@ -320,7 +320,7 @@ em_get_ipv6_dst_port(void *ipv6_hdr, uint8_t portid, void *lookup_struct)
  * buffer optimization i.e. ENABLE_MULTI_BUFFER_OPTIMIZE=1.
  */
 #if defined(__SSE4_1__)
-#ifndef HASH_MULTI_LOOKUP
+#if defined(NO_HASH_MULTI_LOOKUP)
 #include "l3fwd_em_sse.h"
 #else
 #include "l3fwd_em_hlm_sse.h"
@@ -568,8 +568,8 @@ em_main_loop(__attribute__((unused)) void *dummy)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-			for (i = 0; i < qconf->n_rx_queue; i++) {
-				portid = qconf->rx_queue_list[i].port_id;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
 				if (qconf->tx_mbufs[portid].len == 0)
 					continue;
 				send_burst(qconf,
diff --git a/examples/l3fwd/l3fwd_em_hlm_sse.h b/examples/l3fwd/l3fwd_em_hlm_sse.h
index 891ae2e..7faf04a 100644
--- a/examples/l3fwd/l3fwd_em_hlm_sse.h
+++ b/examples/l3fwd/l3fwd_em_hlm_sse.h
@@ -34,17 +34,9 @@
 #ifndef __L3FWD_EM_HLM_SSE_H__
 #define __L3FWD_EM_HLM_SSE_H__
-/**
- * @file
- * This is an optional implementation of packet classification in Exact-Match
- * path using rte_hash_lookup_multi method from previous implementation.
- * While sequential classification seems to be faster, it's disabled by default
- * and can be enabled with HASH_LOOKUP_MULTI global define in compilation time.
- */
-
 #include "l3fwd_sse.h"
-static inline void
+static inline __attribute__((always_inline)) void
 em_get_dst_port_ipv4x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
 		uint8_t portid, uint32_t dst_port[8])
 {
@@ -168,7 +160,7 @@ get_ipv6_5tuple(struct rte_mbuf *m0, __m128i mask0,
 	key->xmm[2] = _mm_and_si128(tmpdata2, mask1);
 }
-static inline void
+static inline __attribute__((always_inline)) void
 em_get_dst_port_ipv6x8(struct lcore_conf *qconf, struct rte_mbuf *m[8],
 		uint8_t portid, uint32_t dst_port[8])
 {
@@ -322,17 +314,17 @@ l3fwd_em_send_packets(int nb_rx, struct rte_mbuf **pkts_burst,
 		} else {
 			dst_port[j] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+1] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+2] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+3] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+4] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+5] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+6] = em_get_dst_port(qconf, pkts_burst[j], portid);
-			dst_port[j+7] = em_get_dst_port(qconf, pkts_burst[j], portid);
+			dst_port[j+1] = em_get_dst_port(qconf, pkts_burst[j+1], portid);
+			dst_port[j+2] = em_get_dst_port(qconf, pkts_burst[j+2], portid);
+			dst_port[j+3] = em_get_dst_port(qconf, pkts_burst[j+3], portid);
+			dst_port[j+4] = em_get_dst_port(qconf, pkts_burst[j+4], portid);
+			dst_port[j+5] = em_get_dst_port(qconf, pkts_burst[j+5], portid);
+			dst_port[j+6] = em_get_dst_port(qconf, pkts_burst[j+6], portid);
+			dst_port[j+7] = em_get_dst_port(qconf, pkts_burst[j+7], portid);
 		}
 	}
-	for (; j < n; j++)
+	for (; j < nb_rx; j++)
 		dst_port[j] = em_get_dst_port(qconf, pkts_burst[j], portid);
 	send_packets_multi(qconf, pkts_burst, dst_port, nb_rx);
diff --git a/examples/l3fwd/l3fwd_em_sse.h b/examples/l3fwd/l3fwd_em_sse.h
index d4a2a2d..8bd150a 100644
--- a/examples/l3fwd/l3fwd_em_sse.h
+++ b/examples/l3fwd/l3fwd_em_sse.h
@@ -34,6 +34,15 @@
 #ifndef __L3FWD_EM_SSE_H__
 #define __L3FWD_EM_SSE_H__
+/**
+ * @file
+ * This is an optional implementation of packet classification in Exact-Match
+ * path using sequential packet classification method.
+ * While hash lookup multi seems to provide better performance, it's disabled
+ * by default and can be enabled with NO_HASH_LOOKUP_MULTI global define in
+ * compilation time.
+ */
+
 #include "l3fwd_sse.h"
 static inline __attribute__((always_inline)) uint16_t
diff --git a/examples/l3fwd/l3fwd_lpm.c b/examples/l3fwd/l3fwd_lpm.c
index a354797..990a7f1 100644
--- a/examples/l3fwd/l3fwd_lpm.c
+++ b/examples/l3fwd/l3fwd_lpm.c
@@ -159,8 +159,8 @@ lpm_main_loop(__attribute__((unused)) void *dummy)
 		diff_tsc = cur_tsc - prev_tsc;
 		if (unlikely(diff_tsc > drain_tsc)) {
-			for (i = 0; i < qconf->n_rx_queue; i++) {
-				portid = qconf->rx_queue_list[i].port_id;
+			for (i = 0; i < qconf->n_tx_port; ++i) {
+				portid = qconf->tx_port_id[i];
 				if (qconf->tx_mbufs[portid].len == 0)
 					continue;
 				send_burst(qconf,
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 8520f71..792894f 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -791,6 +791,7 @@ main(int argc, char **argv)
 	unsigned lcore_id;
 	uint32_t n_tx_queue, nb_lcores;
 	uint8_t portid, nb_rx_queue, queue, socketid;
+	uint8_t nb_tx_port;
 	/* init EAL */
 	ret = rte_eal_init(argc, argv);
@@ -830,6 +831,7 @@ main(int argc, char **argv)
 		rte_exit(EXIT_FAILURE, "check_port_config failed\n");
 	nb_lcores = rte_lcore_count();
+	nb_tx_port = 0;
 	/* Setup function pointers for lookup method. */
 	setup_l3fwd_lookup_tables();
@@ -906,8 +908,13 @@ main(int argc, char **argv)
 			qconf = &lcore_conf[lcore_id];
 			qconf->tx_queue_id[portid] = queueid;
 			queueid++;
+
+			qconf->n_tx_port = nb_tx_port;
+			qconf->tx_port_id[qconf->n_tx_port] = portid;
 		}
 		printf("\n");
+
+		nb_tx_port++;
 	}
 	for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
-- 
1.7.9.5