From: "Zhang, Qi Z"
To: "Li, Xiaoyun", "Xing, Beilei"
CC: "dev@dpdk.org", "Yang, Zhiyong", "Richardson, Bruce", "Hunt, David"
Date: Wed, 12 Sep 2018 07:45:13 +0000
Message-ID: <039ED4275CED7440929022BC67E706115328439F@SHSMSX103.ccr.corp.intel.com>
References: <1535595399-430873-1-git-send-email-xiaoyun.li@intel.com> <20180910101746.68835-1-xiaoyun.li@intel.com>
In-Reply-To: <20180910101746.68835-1-xiaoyun.li@intel.com>
Subject: Re: [dpdk-dev] [PATCH v4] net/i40e: add interface to choose latest vector path
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Li, Xiaoyun
> Sent: Monday, September 10, 2018 6:18 PM
> To: Xing, Beilei; Zhang, Qi Z
> Cc: dev@dpdk.org; Yang, Zhiyong; Richardson, Bruce; Hunt, David; Li, Xiaoyun
> Subject: [PATCH v4] net/i40e: add interface to choose latest vector path
>
> Right now, vector path is limited to only use on later platform.
> This patch adds a devarg use-latest-vec to allow the users to use the latest
> vector path that the platform supported. Namely, using AVX2 vector path on
> broadwell is possible.
>
> Signed-off-by: Xiaoyun Li
> ---
> v4:
> * Polish the codes.
> v3:
> * Polish the doc and commit log.
> v2:
> * Correct the calling of the wrong function last time.
> * Fix seg fault bug.
> ---
>  doc/guides/nics/i40e.rst               |   8 ++
>  doc/guides/rel_notes/release_18_11.rst |   4 +
>  drivers/net/i40e/i40e_ethdev.c         |  46 ++++++++++-
>  drivers/net/i40e/i40e_ethdev.h         |   3 +
>  drivers/net/i40e/i40e_rxtx.c           | 103 ++++++++++++++++---------
>  5 files changed, 128 insertions(+), 36 deletions(-)
>
> diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst
> index 65d87f869..643e6a062 100644
> --- a/doc/guides/nics/i40e.rst
> +++ b/doc/guides/nics/i40e.rst
> @@ -163,6 +163,14 @@ Runtime Config Options
>    Currently hot-plugging of representor ports is not supported so all required
>    representors must be specified on the creation of the PF.
>
> +- ``Use latest vector`` (default ``disable``)
> +
> +  Vector path was limited to use only on later platform. But users may
> +  want the latest vector path. For example, VPP users may want to use
> +  AVX2 vector path on HSW/BDW because it can get better perf. So
> +  ``devargs`` parameter ``use-latest-vec`` is introduced, for example::
> +    -w 84:00.0,use-latest-vec=1
> +
>  Driver compilation and testing
>  ------------------------------
>
> diff --git a/doc/guides/rel_notes/release_18_11.rst b/doc/guides/rel_notes/release_18_11.rst
> index 3ae6b3f58..34af591a2 100644
> --- a/doc/guides/rel_notes/release_18_11.rst
> +++ b/doc/guides/rel_notes/release_18_11.rst
> @@ -54,6 +54,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **Added a devarg to use the latest vector path.**
> +  A new devarg ``use-latest-vec`` was introduced to allow users to choose
> +  the latest vector path that the platform supported. For example, VPP users
> +  can use AVX2 vector path on BDW/HSW to get better performance.
>
>  API Changes
>  -----------
> diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
> index 85a6a867f..72377d0b6 100644
> --- a/drivers/net/i40e/i40e_ethdev.c
> +++ b/drivers/net/i40e/i40e_ethdev.c
> @@ -44,6 +44,7 @@
>  #define ETH_I40E_FLOATING_VEB_LIST_ARG	"floating_veb_list"
>  #define ETH_I40E_SUPPORT_MULTI_DRIVER	"support-multi-driver"
>  #define ETH_I40E_QUEUE_NUM_PER_VF_ARG	"queue-num-per-vf"
> +#define ETH_I40E_USE_LATEST_VEC	"use-latest-vec"
>
>  #define I40E_CLEAR_PXE_WAIT_MS     200
>
> @@ -408,6 +409,7 @@ static const char *const valid_keys[] = {
>  	ETH_I40E_FLOATING_VEB_LIST_ARG,
>  	ETH_I40E_SUPPORT_MULTI_DRIVER,
>  	ETH_I40E_QUEUE_NUM_PER_VF_ARG,
> +	ETH_I40E_USE_LATEST_VEC,
>  	NULL};
>
>  static const struct rte_pci_id pci_id_i40e_map[] = {
> @@ -1201,6 +1203,46 @@ i40e_aq_debug_write_global_register(struct i40e_hw *hw,
>  	return i40e_aq_debug_write_register(hw, reg_addr, reg_val, cmd_details);
>  }
>
> +static int
> +i40e_parse_latest_vec(struct rte_eth_dev *dev)
> +{
> +	struct i40e_adapter *ad =
> +		I40E_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private);
> +	int kvargs_count, use_latest_vec;
> +	struct rte_kvargs *kvlist;
> +
> +	ad->use_latest_vec = false;
> +
> +	if (!dev->device->devargs)
> +		return 0;
> +
> +	kvlist = rte_kvargs_parse(dev->device->devargs->args, valid_keys);
> +	if (!kvlist)
> +		return -EINVAL;
> +
> +	kvargs_count = rte_kvargs_count(kvlist, ETH_I40E_USE_LATEST_VEC);
> +	if (!kvargs_count) {
> +		rte_kvargs_free(kvlist);
> +		return 0;
> +	}
> +
> +	if (kvargs_count > 1)
> +		PMD_DRV_LOG(WARNING, "More than one argument \"%s\" and only "
> +			    "the first one is used !",
> +			    ETH_I40E_USE_LATEST_VEC);
> +
> +	use_latest_vec = atoi((&kvlist->pairs[0])->value);
> +
> +	rte_kvargs_free(kvlist);
> +
> +	if (use_latest_vec != 0 && use_latest_vec != 1)
> +		PMD_DRV_LOG(WARNING, "Value should be 0 or 1, set it as 1!");
> +
> +	ad->use_latest_vec = (bool)use_latest_vec;
> +
> +	return 0;
> +}
> +
>  static int
>  eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
>  {
> @@ -1263,6 +1305,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev, void *init_params __rte_unused)
>
>  	/* Check if need to support multi-driver */
>  	i40e_support_multi_driver(dev);
> +	i40e_parse_latest_vec(dev);
>
>  	/* Make sure all is clean before doing PF reset */
>  	i40e_clear_hw(hw);
> @@ -12527,4 +12570,5 @@ RTE_PMD_REGISTER_PARAM_STRING(net_i40e,
>  			      ETH_I40E_FLOATING_VEB_ARG "=1"
>  			      ETH_I40E_FLOATING_VEB_LIST_ARG "="
>  			      ETH_I40E_QUEUE_NUM_PER_VF_ARG "=1|2|4|8|16"
> -			      ETH_I40E_SUPPORT_MULTI_DRIVER "=1");
> +			      ETH_I40E_SUPPORT_MULTI_DRIVER "=1"
> +			      ETH_I40E_USE_LATEST_VEC "=1");
> diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
> index 3fffe5a55..140c92b84 100644
> --- a/drivers/net/i40e/i40e_ethdev.h
> +++ b/drivers/net/i40e/i40e_ethdev.h
> @@ -1078,6 +1078,9 @@ struct i40e_adapter {
>  	uint64_t pctypes_tbl[I40E_FLOW_TYPE_MAX] __rte_cache_min_aligned;
>  	uint64_t flow_types_mask;
>  	uint64_t pctypes_mask;
> +
> +	/* For devargs */
> +	bool use_latest_vec;
>  };
>
>  /**
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> index 2a28ee348..e9fa7ed90 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -2909,6 +2909,34 @@ i40e_txq_info_get(struct rte_eth_dev *dev, uint16_t queue_id,
>  	qinfo->conf.offloads = txq->offloads;
>  }
>
> +static eth_rx_burst_t
> +i40e_get_latest_rx_vec(bool scatter)
> +{
> +#ifdef RTE_ARCH_X86
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> +		return scatter ? i40e_recv_scattered_pkts_vec_avx2 :
> +				 i40e_recv_pkts_vec_avx2;
> +#endif
> +	return scatter ? i40e_recv_scattered_pkts_vec :
> +			 i40e_recv_pkts_vec;
> +}
> +
> +static eth_rx_burst_t
> +i40e_get_recommend_rx_vec(bool scatter)
> +{
> +#ifdef RTE_ARCH_X86
> +	/*
> +	 * since AVX frequency can be different to base frequency, limit
> +	 * use of AVX2 version to later plaforms, not all those that could
> +	 * theoretically run it.
> +	 */
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
> +		return scatter ? i40e_recv_scattered_pkts_vec_avx2 :
> +				 i40e_recv_pkts_vec_avx2;
> +#endif
> +	return scatter ? i40e_recv_scattered_pkts_vec :
> +			 i40e_recv_pkts_vec;
> +}
>  void __attribute__((cold))
>  i40e_set_rx_function(struct rte_eth_dev *dev)
>  {
> @@ -2948,19 +2976,12 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
>  			PMD_INIT_LOG(DEBUG, "Using Vector Scattered Rx "
>  					    "callback (port=%d).",
>  				     dev->data->port_id);
> -
> -			dev->rx_pkt_burst = i40e_recv_scattered_pkts_vec;
> -#ifdef RTE_ARCH_X86
> -			/*
> -			 * since AVX frequency can be different to base
> -			 * frequency, limit use of AVX2 version to later
> -			 * plaforms, not all those that could theoretically
> -			 * run it.
> -			 */
> -			if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
> +			if (ad->use_latest_vec)
>  				dev->rx_pkt_burst =
> -					i40e_recv_scattered_pkts_vec_avx2;
> -#endif
> +					i40e_get_latest_rx_vec(true);
> +			else
> +				dev->rx_pkt_burst =
> +					i40e_get_recommend_rx_vec(true);
>  		} else {
>  			PMD_INIT_LOG(DEBUG, "Using a Scattered with bulk "
>  					   "allocation callback (port=%d).",
> @@ -2978,18 +2999,10 @@ i40e_set_rx_function(struct rte_eth_dev *dev)
>  				    "burst size no less than %d (port=%d).",
>  			     RTE_I40E_DESCS_PER_LOOP,
>  			     dev->data->port_id);
> -
> -		dev->rx_pkt_burst = i40e_recv_pkts_vec;
> -#ifdef RTE_ARCH_X86
> -		/*
> -		 * since AVX frequency can be different to base
> -		 * frequency, limit use of AVX2 version to later
> -		 * plaforms, not all those that could theoretically
> -		 * run it.
> -		 */
> -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
> -			dev->rx_pkt_burst = i40e_recv_pkts_vec_avx2;
> -#endif
> +		if (ad->use_latest_vec)
> +			dev->rx_pkt_burst = i40e_get_latest_rx_vec(false);
> +		else
> +			dev->rx_pkt_burst = i40e_get_recommend_rx_vec(false);

How about simplifying the code as below?

	/* default */
	dev->rx_pkt_burst = dev->data->scattered_rx ?
			i40e_recv_scattered_pkts : i40e_recv_pkts;

	if (ad->rx_vec_allowed) {
		/* overwrite by vec path */
		if (ad->use_latest_vec)
			dev->rx_pkt_burst =
				i40e_get_latest_rx_vec(dev->data->scattered_rx);
		else
			dev->rx_pkt_burst =
				i40e_get_recommend_rx_vec(dev->data->scattered_rx);
	} else if (ad->rx_bulk_alloc_allowed) {
		/* or overwrite by bulk alloc */
		dev->rx_pkt_burst = i40e_recv_pkts_bulk_alloc;
	}

>  	} else if (ad->rx_bulk_alloc_allowed) {
>  		PMD_INIT_LOG(DEBUG, "Rx Burst Bulk Alloc Preconditions are "
>  				    "satisfied. Rx Burst Bulk Alloc function "
> @@ -3049,6 +3062,31 @@ i40e_set_tx_function_flag(struct rte_eth_dev *dev, struct i40e_tx_queue *txq)
>  			     txq->queue_id);
>  }
>
> +static eth_tx_burst_t
> +i40e_get_latest_tx_vec(void)
> +{
> +#ifdef RTE_ARCH_X86
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2))
> +		return i40e_xmit_pkts_vec_avx2;
> +#endif
> +	return i40e_xmit_pkts_vec;
> +}
> +
> +static eth_tx_burst_t
> +i40e_get_recommend_tx_vec(void)
> +{
> +#ifdef RTE_ARCH_X86
> +	/*
> +	 * since AVX frequency can be different to base frequency, limit
> +	 * use of AVX2 version to later plaforms, not all those that could
> +	 * theoretically run it.
> +	 */
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
> +		return i40e_xmit_pkts_vec_avx2;
> +#endif
> +	return i40e_xmit_pkts_vec;
> +}
> +
>  void __attribute__((cold))
>  i40e_set_tx_function(struct rte_eth_dev *dev)
>  {
> @@ -3073,17 +3111,12 @@ i40e_set_tx_function(struct rte_eth_dev *dev)
>  	if (ad->tx_simple_allowed) {
>  		if (ad->tx_vec_allowed) {
>  			PMD_INIT_LOG(DEBUG, "Vector tx finally be used.");
> -			dev->tx_pkt_burst = i40e_xmit_pkts_vec;
> -#ifdef RTE_ARCH_X86
> -			/*
> -			 * since AVX frequency can be different to base
> -			 * frequency, limit use of AVX2 version to later
> -			 * plaforms, not all those that could theoretically
> -			 * run it.
> -			 */
> -			if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F))
> -				dev->tx_pkt_burst = i40e_xmit_pkts_vec_avx2;
> -#endif
> +			if (ad->use_latest_vec)
> +				dev->tx_pkt_burst =
> +					i40e_get_latest_tx_vec();
> +			else
> +				dev->tx_pkt_burst =
> +					i40e_get_recommend_tx_vec();
>  		} else {
>  			PMD_INIT_LOG(DEBUG, "Simple tx finally be used.");
>  			dev->tx_pkt_burst = i40e_xmit_pkts_simple;
> --
> 2.17.1
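
For reference, the Rx selection order suggested in the inline comment can be modelled outside the driver. This is only a minimal sketch, not driver code: enum rx_path, struct rx_cfg, pick_vec() and select_rx_path() are hypothetical names invented for illustration, and the two cpu_has_* booleans stand in for the rte_cpu_get_flag_enabled() checks in the patch. It also assumes, going by the quoted diff, that the bulk-alloc callback is only chosen for non-scattered Rx, so that branch carries an extra guard that the suggested snippet does not have.

```c
#include <stdbool.h>

/* Hypothetical labels for the burst callbacks; only their identity matters. */
enum rx_path {
	RX_SCALAR,              /* models i40e_recv_pkts */
	RX_SCATTERED,           /* models i40e_recv_scattered_pkts */
	RX_BULK_ALLOC,          /* models i40e_recv_pkts_bulk_alloc */
	RX_VEC,                 /* models i40e_recv_pkts_vec */
	RX_SCATTERED_VEC,       /* models i40e_recv_scattered_pkts_vec */
	RX_VEC_AVX2,            /* models i40e_recv_pkts_vec_avx2 */
	RX_SCATTERED_VEC_AVX2,  /* models i40e_recv_scattered_pkts_vec_avx2 */
};

/* Hypothetical bundle of the inputs the selection depends on. */
struct rx_cfg {
	bool scattered_rx;
	bool rx_vec_allowed;
	bool rx_bulk_alloc_allowed;
	bool use_latest_vec;    /* value of the use-latest-vec devarg */
	bool cpu_has_avx2;      /* stands in for the RTE_CPUFLAG_AVX2 check */
	bool cpu_has_avx512f;   /* stands in for the RTE_CPUFLAG_AVX512F check */
};

/* Pick the vector variant once the AVX2 decision has been made. */
static enum rx_path
pick_vec(bool scatter, bool use_avx2)
{
	if (use_avx2)
		return scatter ? RX_SCATTERED_VEC_AVX2 : RX_VEC_AVX2;
	return scatter ? RX_SCATTERED_VEC : RX_VEC;
}

/* Default first, then overwrite by the vector path or by bulk alloc. */
static enum rx_path
select_rx_path(const struct rx_cfg *c)
{
	enum rx_path path = c->scattered_rx ? RX_SCATTERED : RX_SCALAR;

	if (c->rx_vec_allowed) {
		/* "latest" keys on AVX2, "recommended" keys on AVX512F */
		bool use_avx2 = c->use_latest_vec ? c->cpu_has_avx2
						  : c->cpu_has_avx512f;
		path = pick_vec(c->scattered_rx, use_avx2);
	} else if (!c->scattered_rx && c->rx_bulk_alloc_allowed) {
		/* assumption: bulk alloc never replaces the scattered path */
		path = RX_BULK_ALLOC;
	}
	return path;
}
```

With a Broadwell-like configuration (AVX2 present, AVX512F absent, vector Rx allowed), flipping use_latest_vec from false to true moves the result from RX_VEC to RX_VEC_AVX2, which is the behaviour the devarg is meant to enable.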