From: "Zhao1, Wei" <wei.zhao1@intel.com>
To: Laurent Hardy <laurent.hardy@6wind.com>,
Ilya Maximets <i.maximets@samsung.com>,
"Zhang, Qi Z" <qi.z.zhang@intel.com>
Cc: "Lu, Wenzhuo" <wenzhuo.lu@intel.com>,
"Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
"stable@dpdk.org" <stable@dpdk.org>,
"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber link update
Date: Fri, 12 Oct 2018 07:36:37 +0000 [thread overview]
Message-ID: <A2573D2ACFCADC41BB3BE09C6DE313CA07E4F2A0@PGSMSX103.gar.corp.intel.com> (raw)
In-Reply-To: <879a4ff6-b9a3-c8d8-9550-e252a053800a@6wind.com>
Hi, Laurent Hardy
> -----Original Message-----
> From: Laurent Hardy [mailto:laurent.hardy@6wind.com]
> Sent: Thursday, October 11, 2018 8:22 PM
> To: Ilya Maximets <i.maximets@samsung.com>; Zhao1, Wei
> <wei.zhao1@intel.com>; Zhang, Qi Z <qi.z.zhang@intel.com>
> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; stable@dpdk.org; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while fiber link
> update
>
> Hi Wei,
>
> Please see comments below.
>
>
> On 10/11/2018 12:26 PM, Ilya Maximets wrote:
> > On 11.10.2018 12:56, Zhao1, Wei wrote:
> >> Hi, Ilya Maximets AND laurent.hardy
> > Hi, thanks for sharing your thoughts.
> > Comments inline.
> >
> >>
> >>> -----Original Message-----
> >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ilya Maximets
> >>> Sent: Wednesday, September 12, 2018 4:05 PM
> >>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> >>> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Ananyev, Konstantin
> >>> <konstantin.ananyev@intel.com>; Laurent Hardy
> >>> <laurent.hardy@6wind.com>; Dai, Wei <wei.dai@intel.com>;
> >>> stable@dpdk.org
> >>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
> >>> fiber link update
> >>>
> >>> On 12.09.2018 09:49, Zhang, Qi Z wrote:
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> >>>>> Sent: Monday, September 10, 2018 11:09 PM
> >>>>> To: Zhang, Qi Z <qi.z.zhang@intel.com>; dev@dpdk.org
> >>>>> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Ananyev, Konstantin
> >>>>> <konstantin.ananyev@intel.com>; Laurent Hardy
> >>>>> <laurent.hardy@6wind.com>; Dai, Wei <wei.dai@intel.com>;
> >>>>> stable@dpdk.org
> >>>>> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
> >>>>> fiber link update
> >>>>>
> >>>>> On 04.09.2018 09:08, Zhang, Qi Z wrote:
> >>>>>> Hi Ilya:
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ilya
> >>>>>>> Maximets
> >>>>>>> Sent: Friday, August 31, 2018 8:40 PM
> >>>>>>> To: dev@dpdk.org
> >>>>>>> Cc: Lu, Wenzhuo <wenzhuo.lu@intel.com>; Ananyev, Konstantin
> >>>>>>> <konstantin.ananyev@intel.com>; Laurent Hardy
> >>>>>>> <laurent.hardy@6wind.com>; Dai, Wei <wei.dai@intel.com>; Ilya
> >>>>>>> Maximets <i.maximets@samsung.com>; stable@dpdk.org
> >>>>>>> Subject: [dpdk-dev] [PATCH] net/ixgbe: fix busy polling while
> >>>>>>> fiber link update
> >>>>>>>
> >>>>>>> If the multispeed fiber link is in DOWN state, ixgbe_setup_link
> >>>>>>> could take around a second of busy polling. This is highly
> >>>>>>> inconvenient for the case where single thread periodically
> >>>>>>> checks the
> >>> link statuses.
> >>>>>>> For example, OVS main thread periodically updates the link
> >>>>>>> statuses and hangs for a really long time busy waiting on
> >>>>>>> ixgbe_setup_link() for a DOWN fiber ports. For case with 3 down
> >>>>>>> ports it hangs for a 3 seconds and unable to do anything including
> packet processing.
> >>>>>>> Fix that by shifting that workaround to a separate thread by
> >>>>>>> alarm handler that will try to set up link if it is DOWN.
> >>>>>> Does that mean we will block the interrupt thread for 3 seconds?
> >>>>> Three times for one second. Other work could be scheduled between.
> >>>>> IMHO, it's much better than blocking usual caller for 3 seconds.
> >>>>>
> >>>>>> Also, can we guarantee there will not be any race condition if we
> >>>>>> call
> >>>>> ixgbe_setup_link at another thread, the base code API is not
> >>>>> assumed to be thread-safe as I know.
> >>>>>
> >>>>> The only user of 'ixgbe_setup_link' is 'ixgbe_dev_start', but it
> >>>>> could be called only if device stopped. 'ixgbe_dev_stop' cancels the
> alarm.
> >>>>> Race with 'link_update' avoided by
> 'IXGBE_FLAG_NEED_LINK_CONFIG'
> >>> flag.
> >>>> I guess, it' not only about when ixgb_setup_link race with itself,
> >>>> but also
> >>> when it race with other APIs.
> >>>> Also the concern is, even in current version, we can prove there is
> >>>> no issue,
> >>> how can we guarantee we are safe for future base code update? It's
> >>> not designed as thread-safe.
> >>>> For my option, the change is risky.
> >>> In current implementation interrupt handler already calls the
> >>> 'ixgbe_dev_link_update' which subsequently calls 'ixgbe_setup_link'
> >>> in our case if LSC interrupts enabled. So, my change makes the
> >>> driver even safer by moving 'ixgbe_setup_link' to the same interrupt
> thread.
> >>> Otherwise two threads (interrupts handler and the link status
> >>> checking
> >>> thread) could call 'ixgbe_setup_link' simultaneously.
> >>>
> >>>> Btw, since ixgbe support LSC, it is not necessary for "single
> >>>> thread
> >>> periodically checks the link statuses", right?
> >>>
> >>> In current implementation it will take at least 5 seconds (4 + 1)
> >>> for the interrupt handler to detect DOWN link state for ixgbe
> >>> multispeed fiber. This is too much for many real world cases.
> >> I have reviewed this patch, now I agree with you of the point that
> >> when port is down, " main thread periodically updates the link statuses
> and hangs for a really long time busy waiting on ixgbe_setup_link() for a
> DOWN fiber ports ".
> >> This is introduced by a patch in the following:
> >> SHA-1: c12d22f65b132c56db7b4fdbfd5ddce27d1e9572
> >> * net/ixgbe: ensure link status is updated
> >>
> >> Because in this patch, ixgbe_setup_link() is called with input parameter
> autoneg_wait_to_complete=1, this will cause loop check and sleep delay.
> >> At least 82599 seems has this delay.(BTW, whivh type of NIC are you
> >> use? X550 or 82599)
> > I have 82599.
> >
> >> Your solution is add a eal_alarm_set for ixgbe_setup_link in the thread of
> PMD driver, and do the set up work in that thread, is that right?
> >> And main thread avoid hang by the flag of
> IXGBE_FLAG_NEED_LINK_CONFIG.
> >> I think this is a good idea for this problem, but it may cause
> >> problem for other legacy user of ixgbe pmd, because their legacy code,
> which use main thread to check link state and setup_link when port is down,
> and they are not aware of it is done by other thread if add your patch.
> > What are these applications? I see no public API for setup_link function.
> > It's internal to driver and should not be used externally.
> > Am I missing something?
> >
> >> And is that ok if we change code in ixgbe_dev_link_update_share() to
> >>
> >> ixgbe_dev_link_update_share()
> >> {
> >>
> >> /* check if it needs to wait to complete, if lsc interrupt is enabled */
> >> if (wait_to_complete == 0 || dev->data->dev_conf.intr_conf.lsc != 0)
> >> wait = 0;
> >>
> >> if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) &&
> >> ixgbe_get_media_type(hw) == ixgbe_media_type_fiber) {
> >> speed = hw->phy.autoneg_advertised;
> >> if (!speed)
> >> ixgbe_get_link_capabilities(hw, &speed, &autoneg);
> >> ixgbe_setup_link(hw, speed, wait);
> >> }
> >> }
> >>
> >> Then, your application can call rte_eth_link_get_nowait () to make
> >> wait_to_complete=0 when doing periodic link state check, Which will not
> cause loop check and sleep delay. Legacy code of other user call
> rte_eth_link_get() will not be affected also.
> >> But, I am NOT confident ,whether this will introduce new problem when
> set up link without wait!
> >> So, this is just a discussion topic.
> > Unfortunately this will not help. Take a look to the function
> > 'ixgbe_setup_mac_link_multispeed_fiber()', which is the main
> > problematic function here. 'wait_to_complete' here used only as
> > argument for ixgbe_setup_mac_link(), and the busy waiting loops are
> outside of it.
> > Regardless of the 'wait_to_complete' value, this function will busy
> > poll the link for 1040 ms trying to setup 10GB speed and 140ms more
> > trying to setup 1GB speed. After that, it will call itself recursively
> > and wait again... Looks like I miscalculated last time. Right now
> > it'll be more than 2 seconds for each down port since following patch
> merged:
> > 8fc1f32fa615 ("net/ixgbe: wait longer for link after fiber MAC setup").
> >
> >> Hi, laurent.hardy
> >> You are the author for the patch (* net/ixgbe: ensure link status is
> updated), why do you implement code that way?
> >> Is that must that set up link with wait?
> >>
> >> ixgbe_setup_link(hw, speed, true);
> >>
> The main issue which has lead to this patch has been reported through a test
> case with the autoneg enabled (which has been also reported in the pmd
> test provided along with the patch:
> http://patches.dpdk.org/comment/46253/).
> In this context, without the flag set the patch wasn't effective.
My question is whether we can change to:
ixgbe_setup_link(hw, speed, false);
in your patch, it is " ixgbe_setup_link(hw, speed, true);"
some user may need flexible in wait for complete.
> >>
> >>>>>> Regards
> >>>>>> Qi
> >>>>>>
> >>>>>>> Fixes: c12d22f65b13 ("net/ixgbe: ensure link status is updated")
> >>>>>>> CC: stable@dpdk.org
> >>>>>>>
> >>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> >>>>>>> ---
> >>>>>>> drivers/net/ixgbe/ixgbe_ethdev.c | 43
> >>>>>>> ++++++++++++++++++++++++--------
> >>>>>>> 1 file changed, 32 insertions(+), 11 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> >>>>>>> b/drivers/net/ixgbe/ixgbe_ethdev.c
> >>>>>>> index 26b192737..a33b9a6e8 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> >>>>>>> @@ -221,6 +221,8 @@ static int ixgbe_dev_interrupt_action(struct
> >>>>>>> rte_eth_dev *dev,
> >>>>>>> struct rte_intr_handle *handle);
> static
> >>> void
> >>>>>>> ixgbe_dev_interrupt_handler(void *param); static void
> >>>>>>> ixgbe_dev_interrupt_delayed_handler(void *param);
> >>>>>>> +static void ixgbe_dev_setup_link_alarm_handler(void *param);
> >>>>>>> +
> >>>>>>> static int ixgbe_add_rar(struct rte_eth_dev *dev, struct
> >>>>>>> ether_addr *mac_addr,
> >>>>>>> uint32_t index, uint32_t pool); static void
> >>>>>>> ixgbe_remove_rar(struct rte_eth_dev *dev, uint32_t index); @@
> >>>>>>> -2791,6 +2793,8 @@ ixgbe_dev_stop(struct rte_eth_dev *dev)
> >>>>>>>
> >>>>>>> PMD_INIT_FUNC_TRACE();
> >>>>>>>
> >>>>>>> + rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler,
> dev);
> >>>>>>> +
> >>>>>>> /* disable interrupts */
> >>>>>>> ixgbe_disable_intr(hw);
> >>>>>>>
> >>>>>>> @@ -3969,6 +3973,25 @@ ixgbevf_check_link(struct ixgbe_hw
> *hw,
> >>>>>>> ixgbe_link_speed *speed,
> >>>>>>> return ret_val;
> >>>>>>> }
> >>>>>>>
> >>>>>>> +static void
> >>>>>>> +ixgbe_dev_setup_link_alarm_handler(void *param) {
> >>>>>>> + struct rte_eth_dev *dev = (struct rte_eth_dev *)param;
> >>>>>>> + struct ixgbe_hw *hw =
> >>>>>>> IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> >>>>>>> + struct ixgbe_interrupt *intr =
> >>>>>>> + IXGBE_DEV_PRIVATE_TO_INTR(dev->data-
> >dev_private);
> >>>>>>> + u32 speed;
> >>>>>>> + bool autoneg = false;
> >>>>>>> +
> >>>>>>> + speed = hw->phy.autoneg_advertised;
> >>>>>>> + if (!speed)
> >>>>>>> + ixgbe_get_link_capabilities(hw, &speed, &autoneg);
> >>>>>>> +
> >>>>>>> + ixgbe_setup_link(hw, speed, true);
> >>>>>>> +
> >>>>>>> + intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG; }
> >>>>>>> +
> >>>>>>> /* return 0 means link status changed, -1 means not changed */
> >>>>>>> int ixgbe_dev_link_update_share(struct rte_eth_dev *dev, @@ -
> >>> 3981,9
> >>>>>>> +4004,7 @@ ixgbe_dev_link_update_share(struct rte_eth_dev
> *dev,
> >>>>>>> IXGBE_DEV_PRIVATE_TO_INTR(dev->data-
> >dev_private);
> >>>>>>> int link_up;
> >>>>>>> int diag;
> >>>>>>> - u32 speed = 0;
> >>>>>>> int wait = 1;
> >>>>>>> - bool autoneg = false;
> >>>>>>>
> >>>>>>> memset(&link, 0, sizeof(link));
> >>>>>>> link.link_status = ETH_LINK_DOWN; @@ -3993,13 +4014,8
> @@
> >>>>>>> ixgbe_dev_link_update_share(struct
> >>>>> rte_eth_dev
> >>>>>>> *dev,
> >>>>>>>
> >>>>>>> hw->mac.get_link_status = true;
> >>>>>>>
> >>>>>>> - if ((intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG) &&
> >>>>>>> - ixgbe_get_media_type(hw) ==
> ixgbe_media_type_fiber) {
> >>>>>>> - speed = hw->phy.autoneg_advertised;
> >>>>>>> - if (!speed)
> >>>>>>> - ixgbe_get_link_capabilities(hw, &speed,
> &autoneg);
> >>>>>>> - ixgbe_setup_link(hw, speed, true);
> >>>>>>> - }
> >>>>>>> + if (intr->flags & IXGBE_FLAG_NEED_LINK_CONFIG)
> >>>>>>> + return rte_eth_linkstatus_set(dev, &link);
> >>>>>>>
> >>>>>>> /* check if it needs to wait to complete, if lsc interrupt is
> enabled */
> >>>>>>> if (wait_to_complete == 0 ||
> >>>>>>> dev->data->dev_conf.intr_conf.lsc !=
> >>>>>>> 0) @@
> >>>>>>> -4017,11 +4033,14 @@ ixgbe_dev_link_update_share(struct
> >>> rte_eth_dev
> >>>>> *dev,
> >>>>>>> }
> >>>>>>>
> >>>>>>> if (link_up == 0) {
> >>>>>>> - intr->flags |= IXGBE_FLAG_NEED_LINK_CONFIG;
> >>>>>>> + if (ixgbe_get_media_type(hw) ==
> ixgbe_media_type_fiber)
> >>> {
> >>>>>>> + intr->flags |=
> IXGBE_FLAG_NEED_LINK_CONFIG;
> >>>>>>> + rte_eal_alarm_set(10,
> >>>>>>> +
> ixgbe_dev_setup_link_alarm_handler, dev);
> >>>>>>> + }
> >>>>>>> return rte_eth_linkstatus_set(dev, &link);
> >>>>>>> }
> >>>>>>>
> >>>>>>> - intr->flags &= ~IXGBE_FLAG_NEED_LINK_CONFIG;
> >>>>>>> link.link_status = ETH_LINK_UP;
> >>>>>>> link.link_duplex = ETH_LINK_FULL_DUPLEX;
> >>>>>>>
> >>>>>>> @@ -5128,6 +5147,8 @@ ixgbevf_dev_stop(struct rte_eth_dev
> *dev)
> >>>>>>>
> >>>>>>> PMD_INIT_FUNC_TRACE();
> >>>>>>>
> >>>>>>> + rte_eal_alarm_cancel(ixgbe_dev_setup_link_alarm_handler,
> dev);
> >>>>>>> +
> >>>>>>> ixgbevf_intr_disable(dev);
> >>>>>>>
> >>>>>>> hw->adapter_stopped = 1;
> >>>>>>> --
> >>>>>>> 2.17.1
next prev parent reply other threads:[~2018-10-12 7:37 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20180831123824eucas1p1cd2981c716c4764703e24c3eeb4d33c7@eucas1p1.samsung.com>
2018-08-31 12:39 ` Ilya Maximets
2018-09-04 6:08 ` Zhang, Qi Z
2018-09-10 15:08 ` Ilya Maximets
2018-09-12 6:49 ` Zhang, Qi Z
2018-09-12 8:05 ` Ilya Maximets
2018-09-12 8:28 ` Zhang, Qi Z
2018-09-21 14:25 ` Zhang, Qi Z
2018-10-03 7:43 ` Ilya Maximets
2018-10-09 10:22 ` Zhao1, Wei
2018-10-11 9:56 ` Zhao1, Wei
2018-10-11 10:26 ` Ilya Maximets
2018-10-11 12:21 ` Laurent Hardy
2018-10-12 7:36 ` Zhao1, Wei [this message]
2018-10-15 10:43 ` Laurent Hardy
2018-10-16 8:29 ` Zhao1, Wei
2018-10-12 9:19 ` Zhao1, Wei
2018-10-12 10:14 ` Ilya Maximets
2018-10-15 3:03 ` Zhao1, Wei
2018-10-15 8:40 ` Ilya Maximets
2018-10-16 8:59 ` Zhao1, Wei
2018-10-30 10:20 ` Ilya Maximets
2018-11-01 15:45 ` Zhang, Qi Z
2018-11-01 16:05 ` Ilya Maximets
[not found] ` <CGME20181101160505eucas1p1fcf268f3febaa80dcbb3e573b2fc2c68@eucas1p1.samsung.com>
2018-11-01 16:04 ` [dpdk-dev] [PATCH v2] " Ilya Maximets
2018-11-02 13:49 ` Zhang, Qi Z
2018-11-07 15:52 ` Burakov, Anatoly
2018-11-08 10:27 ` Ilya Maximets
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=A2573D2ACFCADC41BB3BE09C6DE313CA07E4F2A0@PGSMSX103.gar.corp.intel.com \
--to=wei.zhao1@intel.com \
--cc=dev@dpdk.org \
--cc=i.maximets@samsung.com \
--cc=konstantin.ananyev@intel.com \
--cc=laurent.hardy@6wind.com \
--cc=qi.z.zhang@intel.com \
--cc=stable@dpdk.org \
--cc=wenzhuo.lu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).