From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Zhao1, Wei"
To: Luca Boccassi, "dev@dpdk.org"
CC: "Lu, Wenzhuo", "Ananyev, Konstantin", "stable@dpdk.org"
Date: Wed, 7 Nov 2018 09:17:33 +0000
Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
In-Reply-To: <20180815141430.13421-1-bluca@debian.org>
References: <20180815141430.13421-1-bluca@debian.org>
List-Id: DPDK patches and discussions

Hi Luca,

The purpose of this patch is to reduce the mailbox interrupts from VF to PF,
but some points in it need discussion.

First, I do not understand why you changed ixgbe_check_mac_link_vf(). For a
VF, both rte_eth_link_get_nowait() and rte_eth_link_get() go through

    ixgbe_dev_link_update() -> ixgbe_dev_link_update_share() -> ixgbevf_check_link()

NOT through ixgbe_check_mac_link_vf(), which is the function your patch
modifies!

Second, ixgbevf_check_link() contains a mailbox message read for the VF:

    if (mbx->ops.read(hw, &in_msg, 1, 0))

which ends up in ixgbe_read_mbx_vf(). This read causes an interrupt from VF
to PF. That is the point of this patch, and it is also the problem you want
to solve. So you use the autoneg_wait_to_complete flag to gate this mailbox
read; presumably you then call rte_eth_link_get_nowait(), which sets
autoneg_wait_to_complete = 0, so that the interrupts from VF to PF are
reduced.

But I do not think this patch is necessary, because ixgbevf_check_link()
already has:

    bool no_pflink_check = wait_to_complete == 0;
    ////////////////////////
    if (no_pflink_check) {
            if (*speed == IXGBE_LINK_SPEED_UNKNOWN)
                    mac->get_link_status = true;
            else
                    mac->get_link_status = false;
            goto out;
    }

The comment there, "for a quick link status checking, wait_to_compelet == 0,
skip PF link status checking", is clear. It means that
rte_eth_link_get_nowait() skips the mailbox read and its interrupt; only
rte_eth_link_get() triggers it. So I think all you need to do is replace
rte_eth_link_get() with rte_eth_link_get_nowait() in your application. That
will reduce the VF-to-PF interrupts caused by mailbox reads.

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Luca Boccassi
> Sent: Wednesday, August 15, 2018 10:15 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo; Ananyev, Konstantin; Luca Boccassi; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
>
> We have observed high rate of NIC PF interrupts when VNF is using DPDK
> APIs rte_eth_link_get_nowait() and rte_eth_link_get() functions, as they
> are causing VF driver to send many MBOX ACK messages.
>
> With these changes, the interrupt rates go down significantly. Here's some
> testing results:
>
> Without the patch:
>
> $ egrep 'CPU|ens1f' /proc/interrupts ; sleep 10; egrep 'CPU|ens1f' /proc/interrupts
>      CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8  CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
> 34:    88     0     0     0     0    41    30   509     0     0   350    24    88   114   461   562  PCI-MSI 1572864-edge  ens1f0-TxRx-0
> 35:    49    24     0     0    65   130    64    29    67     0    10     0     0    46    38   764  PCI-MSI 1572865-edge  ens1f0-TxRx-1
> 36:    53     0     0    64    15    85   132    71   108     0    30     0   165   215   303   104  PCI-MSI 1572866-edge  ens1f0-TxRx-2
> 37:    46   196     0     0    10    48    62    68    51     0     0     0   103    82    54   192  PCI-MSI 1572867-edge  ens1f0-TxRx-3
> 38:   226     0     0     0   159   145   749   265     0     0   202     0 69229   166   450     0  PCI-MSI 1572868-edge  ens1f0
> 52:    95   896     0     0     0    18    53     0   494     0     0     0     0   265    79   124  PCI-MSI 1574912-edge  ens1f1-TxRx-0
> 53:    50     0    18     0    72    33     0   168   330     0     0     0   141    22    12    65  PCI-MSI 1574913-edge  ens1f1-TxRx-1
> 54:    65     0     0     0   239   104   166    49   442     0     0     0   126    26   307     0  PCI-MSI 1574914-edge  ens1f1-TxRx-2
> 55:    57     0     0     0   123    35    83    54   157   106     0     0    26    29   312    97  PCI-MSI 1574915-edge  ens1f1-TxRx-3
> 56:   232     0 13910     0    16    21     0 54422     0     0     0    24    25     0    78     0  PCI-MSI 1574916-edge  ens1f1
>      CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8  CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
> 34:    88     0     0     0     0    41    30   509     0     0   350    24    88   119   461   562  PCI-MSI 1572864-edge  ens1f0-TxRx-0
> 35:    49    24     0     0    65   130    64    29    67     0    10     0     0    46    38   771  PCI-MSI 1572865-edge  ens1f0-TxRx-1
> 36:    53     0     0    64    15    85   132    71   108     0    30     0   165   215   303   113  PCI-MSI 1572866-edge  ens1f0-TxRx-2
> 37:    46   196     0     0    10    48    62    68    56     0     0     0   103    82    54   192  PCI-MSI 1572867-edge  ens1f0-TxRx-3
> 38:   226     0     0     0   159   145   749   265     0     0   202     0 71281   166   450     0  PCI-MSI 1572868-edge  ens1f0
> 52:    95   896     0     0     0    18    53     0   494     0     0     0     0   265    79   133  PCI-MSI 1574912-edge  ens1f1-TxRx-0
> 53:    50     0    18     0    72    33     0   173   330     0     0     0   141    22    12    65  PCI-MSI 1574913-edge  ens1f1-TxRx-1
> 54:    65     0     0     0   239   104   166    49   442     0     0     0   126    26   312     0  PCI-MSI 1574914-edge  ens1f1-TxRx-2
> 55:    57     0     0     0   123    35    83    59   157   106     0     0    26    29   312    97  PCI-MSI 1574915-edge  ens1f1-TxRx-3
> 56:   232     0 15910     0    16    21     0 54422     0     0     0    24    25     0    78     0  PCI-MSI 1574916-edge  ens1f1
>
> During the 10s interval, CPU2 jumped by 2000 interrupts, CPU12 by 2051
> interrupts, for about 200 interrupts/second. That's on the order of what we
> expect. I would have guessed 100/s but perhaps there are two mailbox
> messages.
>
> With the patch:
>
> $ egrep 'CPU|ens1f' /proc/interrupts ; sleep 10; egrep 'CPU|ens1f' /proc/interrupts
>      CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8  CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
> 34:    88     0     0     0     0    25    19   177     0     0   350    24    88   100   362   559  PCI-MSI 1572864-edge  ens1f0-TxRx-0
> 35:    49    19     0     0    65   130    64    29    67     0    10     0     0    46    38   543  PCI-MSI 1572865-edge  ens1f0-TxRx-1
> 36:    53     0     0    64    15    53    85    71   108     0    24     0    85   215   292    31  PCI-MSI 1572866-edge  ens1f0-TxRx-2
> 37:    46   196     0     0    10    43    57    39    19     0     0     0    78    69    49   149  PCI-MSI 1572867-edge  ens1f0-TxRx-3
> 38:   226     0     0     0   159   145   749   247     0     0   202     0 58250     0   450     0  PCI-MSI 1572868-edge  ens1f0
> 52:    95   896     0     0     0    18    53     0   189     0     0     0     0   265    79    25  PCI-MSI 1574912-edge  ens1f1-TxRx-0
> 53:    50     0    18     0    72    33     0    90   330     0     0     0   136     5    12     0  PCI-MSI 1574913-edge  ens1f1-TxRx-1
> 54:    65     0     0     0    10   104   166    49   442     0     0     0   126    26   226     0  PCI-MSI 1574914-edge  ens1f1-TxRx-2
> 55:    57     0     0     0    61    35    83    30   157   101     0     0    26    15   312     0  PCI-MSI 1574915-edge  ens1f1-TxRx-3
> 56:   232     0  2062     0    16    21     0 54422     0     0     0    24    25     0    78     0  PCI-MSI 1574916-edge  ens1f1
>      CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7  CPU8  CPU9 CPU10 CPU11 CPU12 CPU13 CPU14 CPU15
> 34:    88     0     0     0     0    25    19   177     0     0   350    24    88   102   362   562  PCI-MSI 1572864-edge  ens1f0-TxRx-0
> 35:    49    19     0     0    65   130    64    29    67     0    10     0     0    46    38   548  PCI-MSI 1572865-edge  ens1f0-TxRx-1
> 36:    53     0     0    64    15    53    85    71   108     0    24     0    85   215   292    36  PCI-MSI 1572866-edge  ens1f0-TxRx-2
> 37:    46   196     0     0    10    45    57    39    19     0     0     0    78    69    49   152  PCI-MSI 1572867-edge  ens1f0-TxRx-3
> 38:   226     0     0     0   159   145   749   247     0     0   202     0 58259     0   450     0  PCI-MSI 1572868-edge  ens1f0
> 52:    95   896     0     0     0    18    53     0   194     0     0     0     0   265    79    25  PCI-MSI 1574912-edge  ens1f1-TxRx-0
> 53:    50     0    18     0    72    33     0    95   330     0     0     0   136     5    12     0  PCI-MSI 1574913-edge  ens1f1-TxRx-1
> 54:    65     0     0     0    10   104   166    49   442     0     0     0   126    26   231     0  PCI-MSI 1574914-edge  ens1f1-TxRx-2
> 55:    57     0     0     0    66    35    83    30   157   101     0     0    26    15   312     0  PCI-MSI 1574915-edge  ens1f1-TxRx-3
> 56:   232     0  2071     0    16    21     0 54422     0     0     0    24    25     0    78     0  PCI-MSI 1574916-edge  ens1f1
>
> Note the interrupt rate has gone way down. During the 10s interval, we only
> saw a handful of interrupts.
>
> Note that this patch was originally provided by Intel directly to AT&T and
> Vyatta, but unfortunately I am unable to find records of the exact author.
>
> We have been using this in production for more than a year.
>
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
>
> Signed-off-by: Luca Boccassi
> ---
>  drivers/net/ixgbe/base/ixgbe_vf.c | 33 ++++++++++++++++---------------
>  1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ixgbe/base/ixgbe_vf.c b/drivers/net/ixgbe/base/ixgbe_vf.c
> index 5b25a6b4d4..16086670b1 100644
> --- a/drivers/net/ixgbe/base/ixgbe_vf.c
> +++ b/drivers/net/ixgbe/base/ixgbe_vf.c
> @@ -586,7 +586,6 @@ s32 ixgbe_check_mac_link_vf(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
>  	s32 ret_val = IXGBE_SUCCESS;
>  	u32 links_reg;
>  	u32 in_msg = 0;
> -	UNREFERENCED_1PARAMETER(autoneg_wait_to_complete);
>
>  	/* If we were hit with a reset drop the link */
>  	if (!mbx->ops.check_for_rst(hw, 0) || !mbx->timeout)
> @@ -643,23 +642,25 @@ s32 ixgbe_check_mac_link_vf(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
>  		*speed = IXGBE_LINK_SPEED_UNKNOWN;
>  	}
>
> -	/* if the read failed it could just be a mailbox collision, best wait
> -	 * until we are called again and don't report an error
> -	 */
> -	if (mbx->ops.read(hw, &in_msg, 1, 0))
> -		goto out;
> +	if (autoneg_wait_to_complete) {
> +		/* if the read failed it could just be a mailbox collision, best wait
> +		 * until we are called again and don't report an error
> +		 */
> +		if (mbx->ops.read(hw, &in_msg, 1, 0))
> +			goto out;
>
> -	if (!(in_msg & IXGBE_VT_MSGTYPE_CTS)) {
> -		/* msg is not CTS and is NACK we must have lost CTS status */
> -		if (in_msg & IXGBE_VT_MSGTYPE_NACK)
> +		if (!(in_msg & IXGBE_VT_MSGTYPE_CTS)) {
> +			/* msg is not CTS and is NACK we must have lost CTS status */
> +			if (in_msg & IXGBE_VT_MSGTYPE_NACK)
> +				ret_val = -1;
> +			goto out;
> +		}
> +
> +		/* the pf is talking, if we timed out in the past we reinit */
> +		if (!mbx->timeout) {
>  			ret_val = -1;
> -		goto out;
> -	}
> -
> -	/* the pf is talking, if we timed out in the past we reinit */
> -	if (!mbx->timeout) {
> -		ret_val = -1;
> -		goto out;
> +			goto out;
> +		}
>  	}
>
>  	/* if we passed all the tests above then the link is up and we no
> --
> 2.18.0