From: Nobin Mathew
Date: Fri, 22 Jul 2022 07:49:08 +0530
Subject: Re: VF is still resetting
To: "Yang, SteveX"
Cc: "Xing, Beilei", users@dpdk.org, "Yang, Qiming"
List-Id: DPDK usage discussions

Thanks Steve. Is this something that will be fixed in the next iterations of the NIC/FW, or is it an architectural limitation?


On Thu, 21 Jul 2022, 2:47 pm Yang, SteveX, <stevex.yang@intel.com> wrote:
Hi Nobin,

It seems to be a limitation of the firmware/NIC implementation of VF reset.
The firmware needs some time to respond to a reset operation;
if the app/user resets the VF at a higher frequency, the HW/FW may hang (e.g. the 0xDEADBEEF code read from the register).
This patch (https://github.com/DPDK/dpdk/commit/be7226980c9ad4963b92b489c8afb17f08899953) is just a workaround that delays the reset check.
It cannot resolve the problem under this kind of stress testing.
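
To illustrate what "delaying the reset check" means, here is a minimal sketch of a bounded polling loop on the reset state. It is not the code from the patch above. The names wait_vf_reset_done, read_rstat_fn, delay_ms_fn and struct fake_vf are hypothetical, and the 2-bit state mask is an assumption taken from the "DEADBEEF masked gives 3" remark quoted later in this thread. The fake register only replays canned values so the loop can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the reset state sits in the two low bits of VFGEN_RSTAT. */
#define VFR_STATE_MASK 0x3u
#define VFR_COMPLETED  1u
#define VFR_VFACTIVE   2u

/* Hypothetical callbacks so the loop is independent of any driver API. */
typedef uint32_t (*read_rstat_fn)(void *ctx);
typedef void (*delay_ms_fn)(void *ctx, unsigned int ms);

/*
 * Poll the reset state up to max_polls times, sleeping interval_ms
 * between reads. Returns the number of reads used on success, or -1
 * if the state never became COMPLETED or VFACTIVE.
 */
static int wait_vf_reset_done(read_rstat_fn read_rstat, delay_ms_fn delay_ms,
                              void *ctx, unsigned int max_polls,
                              unsigned int interval_ms)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        uint32_t state = read_rstat(ctx) & VFR_STATE_MASK;

        if (state == VFR_COMPLETED || state == VFR_VFACTIVE)
            return (int)(i + 1);
        delay_ms(ctx, interval_ms);
    }
    return -1;
}

/* Fake register for illustration: replays values, repeating the last one. */
struct fake_vf {
    const uint32_t *values;
    unsigned int count;
    unsigned int reads;
    unsigned int slept_ms;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_vf *vf = ctx;
    unsigned int idx = vf->reads < vf->count ? vf->reads : vf->count - 1;

    vf->reads++;
    return vf->values[idx];
}

static void fake_delay(void *ctx, unsigned int ms)
{
    struct fake_vf *vf = ctx;

    vf->slept_ms += ms; /* record the delay instead of sleeping */
}
```

A longer interval or a larger poll count only widens the window. If the register keeps returning 0xDEADBEEF, the loop still ends with -1, which matches the point that a delay is a workaround and not a fix.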

Thanks & Regards,
Steve Yang.

> -----Original Message-----
> From: Nobin Mathew <nobin.mathew@gmail.com>
> Sent: Thursday, July 21, 2022 9:45 AM
> To: Xing, Beilei <beilei.xing@intel.com>
> Cc: users@dpdk.org; Yang, SteveX <stevex.yang@intel.com>; Yang, Qiming <qiming.yang@intel.com>
> Subject: Re: VF is still resetting
>
> Any pointers?
> Is this a firmware problem?
>
> I am not seeing
> " dev_err(&pf->pdev->dev, "VF reset check timeout on VF %d\n", " from
> the i40e driver anywhere in syslog.
>
> -Nobin
>
> On Wed, Jul 20, 2022 at 11:56 AM Xing, Beilei <beilei.xing@intel.com> wrote:
> >
> > Hi Steve,
> >
> > Could you please help on this? Thanks.
> >
> > BR,
> > Beilei
> >
> > > -----Original Message-----
> > > > From: Nobin Mathew <nobin.mathew@gmail.com>
> > > > Sent: Wednesday, July 20, 2022 12:18 AM
> > > To: users@dpdk.org
> > > Subject: VF is still resetting
> > >
> > > Hi,
> > >
> > > > We are running a DPDK app inside a pod, and orchestrating the app
> > > > very frequently (it is a test app).
> > > >
> > > > In roughly 1 out of 100 runs we get this error:
> > >
> > > > 2022-07-17T22:34:24.620291289+03:00 iavf_check_vf_reset_done(): reset VFR value: 3
> > > > 2022-07-17T22:34:24.620310455+03:00 iavf_init_vf(): VF is still resetting
> > > > 2022-07-17T22:34:24.620339697+03:00 iavf_dev_init(): Init vf failed
> > > > 2022-07-17T22:34:24.620390802+03:00 EAL: Releasing PCI mapped resource for 0000:3b:0f.5
> > > > 2022-07-17T22:34:24.620397381+03:00 EAL: Calling pci_unmap_resource for 0000:3b:0f.5 at 0x2101000000
> > > > 2022-07-17T22:34:24.620442514+03:00 EAL: Calling pci_unmap_resource for 0000:3b:0f.5 at 0x2101010000
> > > > 2022-07-17T22:34:24.729012277+03:00 EAL: Requested device 0000:3b:0f.5 cannot be used
> > > > 2022-07-17T22:34:24.729028758+03:00 EAL: Bus (pci) probe failed.
> > >
> > > > We added a log to the DPDK library to print the VFGEN_RSTAT register of
> > > > the VF. In the problematic cases we see the value 3, which is what
> > > > 0xDEADBEEF becomes after masking.
> > >
> > > > /* VF reset states - these are written into the RSTAT register:
> > > >  * VFGEN_RSTAT on the VF
> > > >  * When the PF initiates a reset, it writes 0
> > > >  * When the reset is complete, it writes 1
> > > >  * When the PF detects that the VF has recovered, it writes 2
> > > >  * VF checks this register periodically to determine if a reset has
> > > >  * occurred, then polls it to know when the reset is complete.
> > > >  * If either the PF or VF reads the register while the hardware
> > > >  * is in a reset state, it will return DEADBEEF, which, when masked
> > > >  * will result in 3.
> > > >  */
> > > > enum virtchnl_vfr_states {
> > > >         VIRTCHNL_VFR_INPROGRESS = 0,
> > > >         VIRTCHNL_VFR_COMPLETED,
> > > >         VIRTCHNL_VFR_VFACTIVE,
> > > > };
> > >
> > > > We also tried this patch, which increases the poll time; it did not help.
> > > >
> > > > https://github.com/DPDK/dpdk/commit/be7226980c9ad4963b92b489c8afb17f08899953
> > >
> > > Details of the setup:
> > >
> > > DPDK library version
> > > 21.11
> > > VF Driver:-
> > > intel-iavf version 4.0.1-3.2
> > > PF driver:-
> > > sudo ethtool -i enp94s0f1
> > > driver: i40e
> > > version: 2.14.13
> > > firmware-version: 8.15 0x800096ca 20.0.17
> > >
> > > > Since we are seeing 0xDEADBEEF, I am assuming that the VF-to-PF reset
> > > > mailbox message was received by the PF, and that the PF initiated the
> > > > reset sequence by writing VFSWR to the VPGEN_VFRTRIG register.
> > >
> > > > I am not seeing
> > > > " dev_err(&pf->pdev->dev, "VF reset check timeout on VF %d\n", "
> > > > anywhere in syslog.
> > >
> > > > Any pointers on why this happens (why the VF reset does not complete)?
> > >
> > > > One more question: what is the sequence of calls in the reset path?
> > > > Is it this one?
> > > >
> > > > i40e_vc_process_vf_msg() -> VIRTCHNL_OP_RESET_VF -> i40e_vc_reset_vf()
> > > > -> i40e_reset_vf() -> i40e_trigger_vf_reset() & i40e_cleanup_reset_vf()
> > >
> > > -Nobin