From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <wei.zhao1@intel.com>
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
 by dpdk.org (Postfix) with ESMTP id 98DBF239;
 Wed,  7 Nov 2018 10:17:37 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
 07 Nov 2018 01:17:36 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.54,475,1534834800"; d="scan'208";a="94315811"
Received: from pgsmsx112-dag.png.intel.com (HELO PGSMSX112.gar.corp.intel.com)
 ([10.108.55.234])
 by FMSMGA003.fm.intel.com with ESMTP; 07 Nov 2018 01:17:35 -0800
Received: from pgsmsx103.gar.corp.intel.com ([169.254.2.114]) by
 PGSMSX112.gar.corp.intel.com ([169.254.3.221]) with mapi id 14.03.0415.000;
 Wed, 7 Nov 2018 17:17:34 +0800
From: "Zhao1, Wei" <wei.zhao1@intel.com>
To: Luca Boccassi <bluca@debian.org>, "dev@dpdk.org" <dev@dpdk.org>
CC: "Lu, Wenzhuo" <wenzhuo.lu@intel.com>, "Ananyev, Konstantin"
 <konstantin.ananyev@intel.com>, "stable@dpdk.org" <stable@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
Thread-Index: AQHUNKKC3OuJ6g/U/USeuorm26D0SKVEim3Q
Date: Wed, 7 Nov 2018 09:17:33 +0000
Message-ID: <A2573D2ACFCADC41BB3BE09C6DE313CA07E699E8@PGSMSX103.gar.corp.intel.com>
References: <20180815141430.13421-1-bluca@debian.org>
In-Reply-To: <20180815141430.13421-1-bluca@debian.org>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
dlp-product: dlpe-windows
dlp-version: 11.0.400.15
dlp-reaction: no-action
x-originating-ip: [172.30.20.205]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Wed, 07 Nov 2018 09:17:38 -0000

Hi Luca,

The purpose of this patch is to reduce the rate of mailbox interrupts
from the VF to the PF, but some points in it need discussion.

First, I do not understand why you changed ixgbe_check_mac_link_vf().
For a VF, rte_eth_link_get_nowait() and rte_eth_link_get() call
ixgbe_dev_link_update() -> ixgbe_dev_link_update_share() ->
ixgbevf_check_link(), NOT the ixgbe_check_mac_link_vf() function that
your patch modifies.

Second, ixgbevf_check_link() contains a mailbox read for the VF:
"if (mbx->ops.read(hw, &in_msg, 1, 0))", which resolves to
ixgbe_read_mbx_vf(). This read causes an interrupt from the VF to the
PF, which is exactly the problem your patch is trying to solve.
Your patch uses the autoneg_wait_to_complete flag to gate this mailbox
read, presumably so that callers of rte_eth_link_get_nowait(), which
sets autoneg_wait_to_complete = 0, generate fewer VF-to-PF interrupts.

But I do not think this patch is necessary, because
ixgbevf_check_link() already has:
"
bool no_pflink_check = wait_to_complete == 0;

        ////////////////////////

        if (no_pflink_check) {
                if (*speed == IXGBE_LINK_SPEED_UNKNOWN)
                        mac->get_link_status = true;
                else
                        mac->get_link_status = false;

                goto out;
        }
"
The comment on that code is clear: "for a quick link status checking,
wait_to_compelet == 0, skip PF link status checking".
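
To make that fast path concrete, here is a minimal, self-contained C
model of it. This is only a sketch: every name prefixed with model_ or
MODEL_ is hypothetical and merely stands in for the real struct ixgbe_hw,
the IXGBE_* constants and mbx->ops.read(), and the full-check branch is
heavily simplified compared to the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's link speed constants. */
#define MODEL_LINK_SPEED_UNKNOWN 0u
#define MODEL_LINK_SPEED_10GB    0x80u

/* Hypothetical, reduced view of the VF hardware state. */
struct model_vf_hw {
	uint32_t links_speed;   /* speed decoded from the VF's own link register */
	bool get_link_status;   /* true: link state still needs to be re-checked */
	unsigned int mbx_reads; /* number of PF mailbox reads performed so far */
};

/* Stand-in for mbx->ops.read(): always succeeds and counts the access. */
static int model_mbx_read(struct model_vf_hw *hw)
{
	hw->mbx_reads++;
	return 0;
}

/* Simplified model of the link check; returns 0 on success. */
static int model_vf_check_link(struct model_vf_hw *hw, uint32_t *speed,
			       bool *link_up, int wait_to_complete)
{
	bool no_pflink_check = wait_to_complete == 0;

	assert(hw != NULL && speed != NULL && link_up != NULL);

	*speed = hw->links_speed;

	if (no_pflink_check) {
		/* Quick check: rely on local state, skip the PF mailbox. */
		hw->get_link_status = (*speed == MODEL_LINK_SPEED_UNKNOWN);
		goto out;
	}

	/* Full check: query the PF, which costs one mailbox read. */
	if (model_mbx_read(hw) != 0)
		goto out;
	hw->get_link_status = (*speed == MODEL_LINK_SPEED_UNKNOWN);
out:
	*link_up = !hw->get_link_status;
	return 0;
}
```

In this model, a call with wait_to_complete = 0 leaves mbx_reads
untouched, while a call with wait_to_complete = 1 increments it once.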

That means rte_eth_link_get_nowait() already skips the mailbox read and
its interrupt; only rte_eth_link_get() triggers it. So I think all you
need to do is replace rte_eth_link_get() with rte_eth_link_get_nowait()
in your application, which will reduce the VF-to-PF interrupts caused by
mailbox reads.
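
On the application side, the suggested change amounts to swapping
rte_eth_link_get(port_id, &link) for rte_eth_link_get_nowait(port_id,
&link) in the polling loop. The sketch below illustrates the expected
effect without needing DPDK: model_link_get() and model_link_get_nowait()
are hypothetical stand-ins for those two calls, and the mailbox counter
and reported link values are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter of mailbox reads triggered by link queries. */
static unsigned int model_mbx_reads;

/* Hypothetical, reduced link description (not DPDK's struct rte_eth_link). */
struct model_link {
	uint32_t speed_mbps;
	int up;
};

/* Stand-in for rte_eth_link_get(): wait_to_complete = 1, queries the PF. */
static void model_link_get(uint16_t port_id, struct model_link *link)
{
	(void)port_id;
	assert(link != NULL);
	model_mbx_reads++;        /* one mailbox read per call */
	link->speed_mbps = 10000; /* assumed value, for illustration only */
	link->up = 1;
}

/* Stand-in for rte_eth_link_get_nowait(): wait_to_complete = 0, local only. */
static void model_link_get_nowait(uint16_t port_id, struct model_link *link)
{
	(void)port_id;
	assert(link != NULL);
	link->speed_mbps = 10000; /* assumed value, for illustration only */
	link->up = 1;
}

typedef void (*model_link_query_fn)(uint16_t, struct model_link *);

/* Poll the link `polls` times; return how many mailbox reads that cost. */
static unsigned int model_poll_link(model_link_query_fn query,
				    uint16_t port_id, unsigned int polls)
{
	struct model_link link;
	unsigned int i;

	model_mbx_reads = 0;
	for (i = 0; i < polls; i++)
		query(port_id, &link);
	return model_mbx_reads;
}
```

Under these assumptions, polling through model_link_get() costs one
mailbox read per poll, while polling through model_link_get_nowait()
costs none.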


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Luca Boccassi
> Sent: Wednesday, August 15, 2018 10:15 PM
> To: dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Luca Boccassi <bluca@debian.org>;
> stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
>
> We have observed high rate of NIC PF interrupts when VNF is using DPDK
> APIs rte_eth_link_get_nowait() and rte_eth_link_get() functions, as they
> are causing VF driver to send many MBOX ACK messages.
>
> With these changes, the interrupt rates go down significantly. Here's some
> testing results:
>
> Without the patch:
>
> $ egrep 'CPU|ens1f' /proc/interrupts ; sleep 10; egrep 'CPU|ens1f' /proc/interrupts
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
>   34:         88          0          0          0          0         41         30        509          0          0        350         24         88        114        461        562   PCI-MSI 1572864-edge      ens1f0-TxRx-0
>   35:         49         24          0          0         65        130         64         29         67          0         10          0          0         46         38        764   PCI-MSI 1572865-edge      ens1f0-TxRx-1
>   36:         53          0          0         64         15         85        132         71        108          0         30          0        165        215        303        104   PCI-MSI 1572866-edge      ens1f0-TxRx-2
>   37:         46        196          0          0         10         48         62         68         51          0          0          0        103         82         54        192   PCI-MSI 1572867-edge      ens1f0-TxRx-3
>   38:        226          0          0          0        159        145        749        265          0          0        202          0      69229        166        450          0   PCI-MSI 1572868-edge      ens1f0
>   52:         95        896          0          0          0         18         53          0        494          0          0          0          0        265         79        124   PCI-MSI 1574912-edge      ens1f1-TxRx-0
>   53:         50          0         18          0         72         33          0        168        330          0          0          0        141         22         12         65   PCI-MSI 1574913-edge      ens1f1-TxRx-1
>   54:         65          0          0          0        239        104        166         49        442          0          0          0        126         26        307          0   PCI-MSI 1574914-edge      ens1f1-TxRx-2
>   55:         57          0          0          0        123         35         83         54        157        106          0          0         26         29        312         97   PCI-MSI 1574915-edge      ens1f1-TxRx-3
>   56:        232          0      13910          0         16         21          0      54422          0          0          0         24         25          0         78          0   PCI-MSI 1574916-edge      ens1f1
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
>   34:         88          0          0          0          0         41         30        509          0          0        350         24         88        119        461        562   PCI-MSI 1572864-edge      ens1f0-TxRx-0
>   35:         49         24          0          0         65        130         64         29         67          0         10          0          0         46         38        771   PCI-MSI 1572865-edge      ens1f0-TxRx-1
>   36:         53          0          0         64         15         85        132         71        108          0         30          0        165        215        303        113   PCI-MSI 1572866-edge      ens1f0-TxRx-2
>   37:         46        196          0          0         10         48         62         68         56          0          0          0        103         82         54        192   PCI-MSI 1572867-edge      ens1f0-TxRx-3
>   38:        226          0          0          0        159        145        749        265          0          0        202          0      71281        166        450          0   PCI-MSI 1572868-edge      ens1f0
>   52:         95        896          0          0          0         18         53          0        494          0          0          0          0        265         79        133   PCI-MSI 1574912-edge      ens1f1-TxRx-0
>   53:         50          0         18          0         72         33          0        173        330          0          0          0        141         22         12         65   PCI-MSI 1574913-edge      ens1f1-TxRx-1
>   54:         65          0          0          0        239        104        166         49        442          0          0          0        126         26        312          0   PCI-MSI 1574914-edge      ens1f1-TxRx-2
>   55:         57          0          0          0        123         35         83         59        157        106          0          0         26         29        312         97   PCI-MSI 1574915-edge      ens1f1-TxRx-3
>   56:        232          0      15910          0         16         21          0      54422          0          0          0         24         25          0         78          0   PCI-MSI 1574916-edge      ens1f1
>
> During the 10s interval, CPU2 jumped by 2000 interrupts, CPU12 by 2052
> interrupts, for about 200 interrupts/second. That's on the order of what we
> expect. I would have guessed 100/s but perhaps there are two mailbox
> messages.
>
> With the patch:
>
> $ egrep 'CPU|ens1f' /proc/interrupts ; sleep 10; egrep 'CPU|ens1f' /proc/interrupts
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
>   34:         88          0          0          0          0         25         19        177          0          0        350         24         88        100        362        559   PCI-MSI 1572864-edge      ens1f0-TxRx-0
>   35:         49         19          0          0         65        130         64         29         67          0         10          0          0         46         38        543   PCI-MSI 1572865-edge      ens1f0-TxRx-1
>   36:         53          0          0         64         15         53         85         71        108          0         24          0         85        215        292         31   PCI-MSI 1572866-edge      ens1f0-TxRx-2
>   37:         46        196          0          0         10         43         57         39         19          0          0          0         78         69         49        149   PCI-MSI 1572867-edge      ens1f0-TxRx-3
>   38:        226          0          0          0        159        145        749        247          0          0        202          0      58250          0        450          0   PCI-MSI 1572868-edge      ens1f0
>   52:         95        896          0          0          0         18         53          0        189          0          0          0          0        265         79         25   PCI-MSI 1574912-edge      ens1f1-TxRx-0
>   53:         50          0         18          0         72         33          0         90        330          0          0          0        136          5         12          0   PCI-MSI 1574913-edge      ens1f1-TxRx-1
>   54:         65          0          0          0         10        104        166         49        442          0          0          0        126         26        226          0   PCI-MSI 1574914-edge      ens1f1-TxRx-2
>   55:         57          0          0          0         61         35         83         30        157        101          0          0         26         15        312          0   PCI-MSI 1574915-edge      ens1f1-TxRx-3
>   56:        232          0       2062          0         16         21          0      54422          0          0          0         24         25          0         78          0   PCI-MSI 1574916-edge      ens1f1
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7       CPU8       CPU9       CPU10      CPU11      CPU12      CPU13      CPU14      CPU15
>   34:         88          0          0          0          0         25         19        177          0          0        350         24         88        102        362        562   PCI-MSI 1572864-edge      ens1f0-TxRx-0
>   35:         49         19          0          0         65        130         64         29         67          0         10          0          0         46         38        548   PCI-MSI 1572865-edge      ens1f0-TxRx-1
>   36:         53          0          0         64         15         53         85         71        108          0         24          0         85        215        292         36   PCI-MSI 1572866-edge      ens1f0-TxRx-2
>   37:         46        196          0          0         10         45         57         39         19          0          0          0         78         69         49        152   PCI-MSI 1572867-edge      ens1f0-TxRx-3
>   38:        226          0          0          0        159        145        749        247          0          0        202          0      58259          0        450          0   PCI-MSI 1572868-edge      ens1f0
>   52:         95        896          0          0          0         18         53          0        194          0          0          0          0        265         79         25   PCI-MSI 1574912-edge      ens1f1-TxRx-0
>   53:         50          0         18          0         72         33          0         95        330          0          0          0        136          5         12          0   PCI-MSI 1574913-edge      ens1f1-TxRx-1
>   54:         65          0          0          0         10        104        166         49        442          0          0          0        126         26        231          0   PCI-MSI 1574914-edge      ens1f1-TxRx-2
>   55:         57          0          0          0         66         35         83         30        157        101          0          0         26         15        312          0   PCI-MSI 1574915-edge      ens1f1-TxRx-3
>   56:        232          0       2071          0         16         21          0      54422          0          0          0         24         25          0         78          0   PCI-MSI 1574916-edge      ens1f1
>
> Note the interrupt rate has gone way down. During the 10s interval, we only
> saw a handful of interrupts.
>
> Note that this patch was originally provided by Intel directly to AT&T and
> Vyatta, but unfortunately I am unable to find records of the exact author.
>
> We have been using this in production for more than a year.
>
> Fixes: af75078fece3 ("first public release")
> Cc: stable@dpdk.org
>
> Signed-off-by: Luca Boccassi <bluca@debian.org>
> ---
>  drivers/net/ixgbe/base/ixgbe_vf.c | 33 ++++++++++++++++---------------
>  1 file changed, 17 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ixgbe/base/ixgbe_vf.c b/drivers/net/ixgbe/base/ixgbe_vf.c
> index 5b25a6b4d4..16086670b1 100644
> --- a/drivers/net/ixgbe/base/ixgbe_vf.c
> +++ b/drivers/net/ixgbe/base/ixgbe_vf.c
> @@ -586,7 +586,6 @@ s32 ixgbe_check_mac_link_vf(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
>  	s32 ret_val = IXGBE_SUCCESS;
>  	u32 links_reg;
>  	u32 in_msg = 0;
> -	UNREFERENCED_1PARAMETER(autoneg_wait_to_complete);
>
>  	/* If we were hit with a reset drop the link */
>  	if (!mbx->ops.check_for_rst(hw, 0) || !mbx->timeout)
> @@ -643,23 +642,25 @@ s32 ixgbe_check_mac_link_vf(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
>  		*speed = IXGBE_LINK_SPEED_UNKNOWN;
>  	}
>
> -	/* if the read failed it could just be a mailbox collision, best wait
> -	 * until we are called again and don't report an error
> -	 */
> -	if (mbx->ops.read(hw, &in_msg, 1, 0))
> -		goto out;
> +	if (autoneg_wait_to_complete) {
> +		/* if the read failed it could just be a mailbox collision, best wait
> +		 * until we are called again and don't report an error
> +		 */
> +		if (mbx->ops.read(hw, &in_msg, 1, 0))
> +			goto out;
>
> -	if (!(in_msg & IXGBE_VT_MSGTYPE_CTS)) {
> -		/* msg is not CTS and is NACK we must have lost CTS status */
> -		if (in_msg & IXGBE_VT_MSGTYPE_NACK)
> +		if (!(in_msg & IXGBE_VT_MSGTYPE_CTS)) {
> +			/* msg is not CTS and is NACK we must have lost CTS status */
> +			if (in_msg & IXGBE_VT_MSGTYPE_NACK)
> +				ret_val = -1;
> +			goto out;
> +		}
> +
> +		/* the pf is talking, if we timed out in the past we reinit */
> +		if (!mbx->timeout) {
>  			ret_val = -1;
> -		goto out;
> -	}
> -
> -	/* the pf is talking, if we timed out in the past we reinit */
> -	if (!mbx->timeout) {
> -		ret_val = -1;
> -		goto out;
> +			goto out;
> +		}
>  	}
>
>  	/* if we passed all the tests above then the link is up and we no
> --
> 2.18.0