After the PF port is brought down and up several times with ifconfig, the
DPDK VF driver may fail to obtain the mailbox lock, resulting in
configuration failure and functional failure. To increase the reliability
of mailbox communication, the patch uses a retry strategy.

Fixes: abf7275bbaa2 ("ixgbe: move to drivers/net/")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
---
 drivers/net/ixgbe/base/ixgbe_mbx.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_mbx.c b/drivers/net/ixgbe/base/ixgbe_mbx.c
index 4dddff2c58..5a14fcc7b4 100644
--- a/drivers/net/ixgbe/base/ixgbe_mbx.c
+++ b/drivers/net/ixgbe/base/ixgbe_mbx.c
@@ -370,15 +370,23 @@ STATIC s32 ixgbe_check_for_rst_vf(struct ixgbe_hw *hw, u16 mbx_id)
 STATIC s32 ixgbe_obtain_mbx_lock_vf(struct ixgbe_hw *hw)
 {
 	s32 ret_val = IXGBE_ERR_MBX;
+	s32 timeout = hw->mbx.timeout;
+	s32 usec = hw->mbx.usec_delay;
 
 	DEBUGFUNC("ixgbe_obtain_mbx_lock_vf");
 
-	/* Take ownership of the buffer */
-	IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
+	do {
+		/* Take ownership of the buffer */
+		IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
 
-	/* reserve mailbox for vf use */
-	if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU)
-		ret_val = IXGBE_SUCCESS;
+		/* reserve mailbox for vf use */
+		if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU) {
+			ret_val = IXGBE_SUCCESS;
+			break;
+		}
+
+		usec_delay(usec);
+	} while (timeout--);
 
 	return ret_val;
 }
-- 
2.30.1.windows.1
> -----Original Message-----
> From: Qiming Chen <chenqiming_huawei@163.com>
> Sent: Tuesday, August 31, 2021 16:41
> To: dev@dpdk.org
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; Qiming Chen <chenqiming_huawei@163.com>; stable@dpdk.org
> Subject: [PATCH] net/ixgbe: fix probability of obtaining mailbox lock failure
>
> After the PF port is brought down and up several times with ifconfig, the
> DPDK VF driver may fail to obtain the mailbox lock, resulting in
> configuration failure and functional failure. To increase the reliability
> of mailbox communication, the patch uses a retry strategy.
>
> Fixes: abf7275bbaa2 ("ixgbe: move to drivers/net/")

Should be

Fixes: af75078fece3 ("first public release")
After the PF port is brought down and up several times with ifconfig, the
DPDK VF driver may fail to obtain the mailbox lock, resulting in
configuration failure and functional failure. To increase the reliability
of mailbox communication, the patch uses a retry strategy.

Fixes: af75078fece3 ("first public release")
Cc: stable@dpdk.org

Signed-off-by: Qiming Chen <chenqiming_huawei@163.com>
---
v2:
  Modify fixes commit
---
 drivers/net/ixgbe/base/ixgbe_mbx.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ixgbe/base/ixgbe_mbx.c b/drivers/net/ixgbe/base/ixgbe_mbx.c
index 4dddff2c58..5a14fcc7b4 100644
--- a/drivers/net/ixgbe/base/ixgbe_mbx.c
+++ b/drivers/net/ixgbe/base/ixgbe_mbx.c
@@ -370,15 +370,23 @@ STATIC s32 ixgbe_check_for_rst_vf(struct ixgbe_hw *hw, u16 mbx_id)
 STATIC s32 ixgbe_obtain_mbx_lock_vf(struct ixgbe_hw *hw)
 {
 	s32 ret_val = IXGBE_ERR_MBX;
+	s32 timeout = hw->mbx.timeout;
+	s32 usec = hw->mbx.usec_delay;
 
 	DEBUGFUNC("ixgbe_obtain_mbx_lock_vf");
 
-	/* Take ownership of the buffer */
-	IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
+	do {
+		/* Take ownership of the buffer */
+		IXGBE_WRITE_REG(hw, IXGBE_VFMAILBOX, IXGBE_VFMAILBOX_VFU);
 
-	/* reserve mailbox for vf use */
-	if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU)
-		ret_val = IXGBE_SUCCESS;
+		/* reserve mailbox for vf use */
+		if (ixgbe_read_v2p_mailbox(hw) & IXGBE_VFMAILBOX_VFU) {
+			ret_val = IXGBE_SUCCESS;
+			break;
+		}
+
+		usec_delay(usec);
+	} while (timeout--);
 
 	return ret_val;
 }
-- 
2.30.1.windows.1
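For readers following the diff, the control flow of the patched ixgbe_obtain_mbx_lock_vf() can be tried out in isolation. The following is only a minimal sketch under stated assumptions: `struct fake_mbx`, `fake_try_lock()` and `obtain_lock_with_retry()` are hypothetical stand-ins, not driver code, and the register write/read-back and the delay are replaced by a simple counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the VF mailbox: the simulated hardware grants
 * ownership only after a given number of failed attempts. */
struct fake_mbx {
	int32_t timeout;     /* retry budget, plays the role of hw->mbx.timeout */
	int32_t grant_after; /* failed attempts before the lock is granted */
	int32_t attempts;    /* attempts made so far */
};

/* Pretend to write the VFU bit and read the register back.
 * Returns 1 when the ownership bit would read back as set. */
static int fake_try_lock(struct fake_mbx *m)
{
	m->attempts++;
	return m->attempts > m->grant_after;
}

/* Same do/while shape as the loop in the patch; the delay is omitted. */
static int obtain_lock_with_retry(struct fake_mbx *m)
{
	int32_t timeout = m->timeout;

	assert(timeout >= 0); /* a negative budget would loop far too long */

	do {
		if (fake_try_lock(m))
			return 0; /* lock obtained */
		/* usec_delay(usec) would go here in the driver */
	} while (timeout--);

	return -1; /* retry budget exhausted */
}
```

Because the condition is a post-decrement checked after the body, a budget of N gives up to N + 1 attempts, and a budget of 0 still makes exactly one attempt, which is the pre-patch behaviour.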
> -----Original Message-----
> From: Qiming Chen <chenqiming_huawei@163.com>
> Sent: Monday, September 6, 2021 10:22
> To: dev@dpdk.org
> Cc: Wang, Haiyue <haiyue.wang@intel.com>; Qiming Chen <chenqiming_huawei@163.com>; stable@dpdk.org
> Subject: [PATCH v2] net/ixgbe: fix probability of obtaining mailbox lock failure
>
> After the PF port is brought down and up several times with ifconfig, the
> DPDK VF driver may fail to obtain the mailbox lock, resulting in
> configuration failure and functional failure. To increase the reliability
> of mailbox communication, the patch uses a retry strategy.

What does your log look like with
"--log-level=pmd.net.ixgbe.init:8 --log-level=pmd.net.ixgbe.driver:8"?

What I got after "ifconfig PF down/up" is just a few messages, with no
further function calls:

testpmd> ixgbevf_intr_disable():  >>
ixgbe_read_mbx(): ixgbe_read_mbx
ixgbe_read_mbx_vf(): ixgbe_read_mbx_vf
ixgbe_obtain_mbx_lock_vf(): ixgbe_obtain_mbx_lock_vf

Port 0: reset event
ixgbevf_intr_enable():  >>
This problem cannot be observed or located from the log. You can try the
following steps to reproduce it:

1) Use kernel PF + DPDK VF mode.
2) Have the VF control plane keep adding or querying configuration, for
   example by creating a thread that polls the link status.
3) Write a script that repeatedly performs "ifconfig pf down/up".

After a period of time, there is some probability that the mailbox lock
cannot be obtained, which causes an abnormality.

I reproduced this problem locally with a demo program. The probability is
relatively small and it may not be easy to reproduce, but the problem does
exist.

On 9/8/2021 11:33, Wang, Haiyue <haiyue.wang@intel.com> wrote:
> What does your log look like with
> "--log-level=pmd.net.ixgbe.init:8 --log-level=pmd.net.ixgbe.driver:8"?
Again, please DON'T REPLY with rich text; it is hard to handle in
patchwork. And DON'T REPLY on top.

BR,
Haiyue

From: Qiming Chen <chenqiming_huawei@163.com>
Sent: Thursday, September 9, 2021 09:57
To: Wang, Haiyue <haiyue.wang@intel.com>
Cc: dev@dpdk.org; stable@dpdk.org
Subject: Re: [PATCH v2] net/ixgbe: fix probability of obtaining mailbox lock failure
I have to say that the ixgbevf PMD has limitations in handling the reset
event, so in your application demo, if a link down/up event is detected,
the application needs to reset the ixgbevf port as the kernel does in the
function quoted at the end of this mail.

BTW, retrying doesn't make things better; you have to wait for the PF to
notify you that it is done.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c

static void ixgbevf_watchdog_update_link(struct ixgbevf_adapter *adapter)
{
	struct ixgbe_hw *hw = &adapter->hw;
	u32 link_speed = adapter->link_speed;
	bool link_up = adapter->link_up;
	s32 err;

	spin_lock_bh(&adapter->mbx_lock);

	err = hw->mac.ops.check_link(hw, &link_speed, &link_up, false);

	spin_unlock_bh(&adapter->mbx_lock);

	/* if check for link returns error we will need to reset */
	if (err && time_after(jiffies, adapter->last_reset + (10 * HZ))) {
		set_bit(__IXGBEVF_RESET_REQUESTED, &adapter->state);
		link_up = false;
	}

	adapter->link_up = link_up;
	adapter->link_speed = link_speed;
}

BR,
Haiyue
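The policy in the kernel function Haiyue cites, where a failed link check leads to a reset request instead of a retry, can be modeled on its own. This is a rough sketch, not kernel or DPDK code: `struct vf_state`, `watchdog_update_link()` and `RESET_HOLDOFF_SEC` are hypothetical names, time is passed in as plain seconds instead of jiffies, and locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed holdoff, standing in for the 10 * HZ window in the kernel code. */
#define RESET_HOLDOFF_SEC 10

/* Hypothetical, trimmed-down view of the adapter state. */
struct vf_state {
	bool link_up;
	bool reset_requested;
	uint64_t last_reset_sec; /* time of the previous reset */
};

/*
 * check_err: result of the link query over the mailbox (0 means success).
 * link_up:   link state reported by that query.
 * now_sec:   current time in seconds.
 *
 * If the query failed and the holdoff since the last reset has passed,
 * request a reset and treat the link as down; do not retry the mailbox.
 */
static void watchdog_update_link(struct vf_state *vf, int check_err,
				 bool link_up, uint64_t now_sec)
{
	if (check_err && now_sec > vf->last_reset_sec + RESET_HOLDOFF_SEC) {
		vf->reset_requested = true;
		link_up = false;
	}

	vf->link_up = link_up;
}
```

In a DPDK application the reset itself would presumably go through the ethdev reset path, for example `rte_eth_dev_reset()`, once the "reset event" seen in the testpmd log is delivered; that part is not shown here.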
This problem was triggered in a live network; I discovered it a long time
ago. I used the link state only as an example. The point is not that this
is a link-state problem, but that ixgbevf does have this probabilistic
failure. After repeated verification, the current modification does solve
the problem, although the specific root cause may not have been identified.

Since the mailbox itself has a reliability mechanism, why not use it here?
As I understand it, the status of the VF mailbox is read from a register.
If the PF is reset repeatedly, could the read fail transiently because the
register value has not been initialized yet, and then succeed later?