DPDK patches and discussions
* [dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
@ 2014-01-27  2:56 Sangjin Han
  2014-01-27  9:19 ` Thomas Monjalon
  0 siblings, 1 reply; 5+ messages in thread
From: Sangjin Han @ 2014-01-27  2:56 UTC
  To: dev

Hi,

I encountered this error message when I tried to use the testpmd application.

Cause: No probed ethernet devices - check that
CONFIG_RTE_LIBRTE_IGB_PMD=y and that CONFIG_RTE_LIBRTE_EM_PMD=y and
that CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your configuration file

which is caused by rte_eth_dev_count() == 0. However, my 82599 ports
are already unbound from ixgbe. (I have two Xeon X5560 (@ 2.80GHz)
processors and two X520-DA2 cards).

I googled for possible causes and came across a similar case:
http://openetworking.blogspot.com/2014/01/debugging-no-probed-ethernet-devices.html

Based on the article, I dug into the source code, and found the cause:

ixgbe_82599.c: ixgbe_reset_pipeline_82599()
...
for (i = 0; i < 10; i++) {
        msec_delay(4);
        anlp1_reg = IXGBE_READ_REG(hw, IXGBE_ANLP1);
        if (anlp1_reg & IXGBE_ANLP1_AN_STATE_MASK)
                break;
}

if (!(anlp1_reg & IXGBE_ANLP1_AN_STATE_MASK)) {
        DEBUGOUT("auto negotiation not completed\n");
        ret_val = IXGBE_ERR_RESET_FAILED;
        goto reset_pipeline_out;
}
...

Ten iterations of the for loop were not enough. In my case, the loop
needed at least 12 iterations before everything worked fine.

The issue was that msec_delay() is not very accurate on my system.
While it reads the CPU frequency from /proc/cpuinfo, that value may not
reflect the actual number of TSC ticks per second. Since I did not
disable the P-state feature, /proc/cpuinfo reports 1.6GHz, but my TSC
runs at 2.8GHz. As a result, msec_delay(4) only waited 2.x
milliseconds, which in turn caused the failure.
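
The shortfall follows from simple arithmetic. The sketch below is
illustrative only and is not DPDK code: effective_delay_us() is a
hypothetical helper, and it assumes the reported and real rates are
exactly 1.6GHz and 2.8GHz. It sizes a busy-wait using the assumed
frequency, then converts that tick count back to time at the rate the
TSC really runs at.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of DPDK): real duration, in microseconds,
 * of a TSC busy-wait whose tick budget was computed from `assumed_hz`
 * while the counter actually advances at `actual_hz`.
 * The intermediate products must fit in 64 bits, which holds for
 * millisecond-scale delays and GHz-scale frequencies.
 */
static uint64_t
effective_delay_us(uint64_t requested_us, uint64_t assumed_hz,
		   uint64_t actual_hz)
{
	assert(actual_hz > 0);

	/* Ticks the delay loop waits for, derived from the assumed rate. */
	uint64_t ticks = requested_us * assumed_hz / 1000000ULL;

	/* Time those ticks really take at the true counter rate. */
	return ticks * 1000000ULL / actual_hz;
}
```

Under those assumptions, effective_delay_us(4000, 1600000000, 2800000000)
gives 2285, so a 4 ms request lasts about 2.29 ms and ten iterations
cover roughly 23 ms instead of the intended 40 ms.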

I think /proc/cpuinfo is not a reliable way to get
eal_tsc_resolution_hz, since it varies based on the current CPU clock
frequency. Enforcing applications to run at the max frequency can be
too restrictive. It would be nice if I can bypass
set_tsc_freq_from_cpuinfo() in set_tsc_freq().

Thanks,
Sangjin


* Re: [dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
  2014-01-27  2:56 [dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay() Sangjin Han
@ 2014-01-27  9:19 ` Thomas Monjalon
  2014-01-28  1:16   ` Sangjin Han
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Monjalon @ 2014-01-27  9:19 UTC
  To: Sangjin Han; +Cc: dev

Hello,

27/01/2014 03:56, Sangjin Han:
> Cause: No probed ethernet devices - check that
> CONFIG_RTE_LIBRTE_IGB_PMD=y and that CONFIG_RTE_LIBRTE_EM_PMD=y and
> that CONFIG_RTE_LIBRTE_IXGBE_PMD=y in your configuration file
[...] 
> I googled for possible causes and came across a similar case:
> http://openetworking.blogspot.com/2014/01/debugging-no-probed-ethernet-devices.html
[...]
>         msec_delay(4);
[...]
> I think /proc/cpuinfo is not a reliable way to get
> eal_tsc_resolution_hz, since it varies based on the current CPU clock
> frequency. Enforcing applications to run at the max frequency can be
> too restrictive.

Indeed, as described in the quick start page, the highest frequency must be 
set: http://dpdk.org/doc/quick-start

> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
> set_tsc_freq().

I think it would not solve the problem because your clock is varying and the 
TSC calibration must be updated accordingly with different values by core.

Feel free to submit a patch if you find a smart solution.
-- 
Thomas


* Re: [dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
  2014-01-27  9:19 ` Thomas Monjalon
@ 2014-01-28  1:16   ` Sangjin Han
  2014-01-28 16:23     ` Thomas Monjalon
  0 siblings, 1 reply; 5+ messages in thread
From: Sangjin Han @ 2014-01-28  1:16 UTC
  To: Thomas Monjalon; +Cc: dev

Hi,

>> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
>> set_tsc_freq().
>
> I think it would not solve the problem because your clock is varying and the
> TSC calibration must be updated accordingly with different values by core.

Reasonably new Intel CPUs (including Nehalem) have a constant TSC rate,
regardless of the current P/C-state (see the constant_tsc and
nonstop_tsc flags in /proc/cpuinfo). So the TSC calibration does not
need to be updated, even with a variable clock frequency on those CPUs.
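
One way to check for the constant_tsc and nonstop_tsc flags is to scan
the flags line of /proc/cpuinfo for whole-word matches. This is a
minimal sketch, not DPDK code: has_cpu_flag() and tsc_is_invariant()
are hypothetical names, and the caller is assumed to have already read
the flags line into a string.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated word. */
static bool
has_cpu_flag(const char *flags, const char *flag)
{
	assert(flags != NULL && flag != NULL);

	size_t len = strlen(flag);
	if (len == 0)
		return false;

	const char *p = flags;
	while ((p = strstr(p, flag)) != NULL) {
		/* Reject partial matches such as "tsc" inside "constant_tsc". */
		bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
		bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/* True when both flags that indicate a fixed-rate, non-stopping TSC are set. */
static bool
tsc_is_invariant(const char *flags)
{
	return has_cpu_flag(flags, "constant_tsc") &&
	       has_cpu_flag(flags, "nonstop_tsc");
}
```

A caller would pass the "flags" line of any one processor entry; if
either flag is missing, the TSC rate may change with the P/C-state.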

Also, it seems that there is no guarantee that the TSC rate is
identical to the CPU max clock frequency. While it happens to be true
for Intel CPUs, this article from AMD
(https://lkml.org/lkml/2005/11/4/173) says:

"The rate of the invariant TSC is implementation-dependent and will
likely *not* be the frequency of the processor core [...]"

It would be great if someone can actually measure TSC rate on AMD
processors to verify this.

I would like to suggest two possible options:

1. If we can assume that the TSC rate always equals the max clock
frequency, then we can use
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq instead of
/proc/cpuinfo (which reflects cpuinfo_cur_freq).

2. If we can't (AMD?), we can simply get rid of
set_tsc_freq_from_cpuinfo() and fall back to set_tsc_freq_from_clock()
or set_tsc_freq_fallback() instead. I always get reasonably good
accuracy with those two functions -- the only drawback is that it
takes 0.5 - 1 second for applications to boot up. Not sure if it is a
big deal or not, though.
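
The two options could be sketched roughly as follows. None of this is
the DPDK implementation: max_freq_text_to_hz(), monotonic_ns() and
measure_counter_hz() are hypothetical helpers. The first assumes the
caller has read the contents of cpuinfo_max_freq (which the kernel
reports in kHz) into a string. The second estimates the tick rate of
any counter against CLOCK_MONOTONIC, which is the general idea behind
measuring the rate instead of trusting /proc/cpuinfo.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define NS_PER_SEC 1000000000ULL

/* Option 1: convert cpuinfo_max_freq text (kHz) to Hz; 0 if not a number. */
static uint64_t
max_freq_text_to_hz(const char *text)
{
	assert(text != NULL);

	if (!isdigit((unsigned char)text[0]))
		return 0;
	errno = 0;
	unsigned long long khz = strtoull(text, NULL, 10);
	if (errno != 0)
		return 0;
	return (uint64_t)khz * 1000ULL;
}

/* Monotonic time in nanoseconds. */
static uint64_t
monotonic_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * NS_PER_SEC + (uint64_t)ts.tv_nsec;
}

/*
 * Option 2: estimate how fast `read_counter` ticks by sampling it across
 * a sleep of about `sample_ns` and dividing by the elapsed monotonic time.
 */
static uint64_t
measure_counter_hz(uint64_t (*read_counter)(void), uint64_t sample_ns)
{
	assert(read_counter != NULL && sample_ns > 0);

	struct timespec req = {
		.tv_sec = (time_t)(sample_ns / NS_PER_SEC),
		.tv_nsec = (long)(sample_ns % NS_PER_SEC),
	};
	uint64_t t0 = monotonic_ns();
	uint64_t c0 = read_counter();
	nanosleep(&req, NULL);	/* an early wake-up is fine: we time it */
	uint64_t t1 = monotonic_ns();
	uint64_t c1 = read_counter();

	if (t1 == t0)
		return 0;
	return (uint64_t)((double)(c1 - c0) * (double)NS_PER_SEC /
			  (double)(t1 - t0));
}
```

Passing a wrapper around rte_rdtsc() as read_counter with a sample of
500000000 ns would reproduce the start-up cost mentioned above; a
shorter sample trades accuracy for start-up time.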

---

Besides the TSC frequency, the 4ms * 10 delay in
ixgbe_reset_pipeline_82599() seems too tight. On my system, it
succeeds only after 7 (or so) iterations with correct msec_delay().
The per-iteration delay (4ms; in the kernel ixgbe driver, it is set to
be 4-8ms) and/or the number of iterations (10) should be increased, I
suppose.

Sangjin


* Re: [dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
  2014-01-28  1:16   ` Sangjin Han
@ 2014-01-28 16:23     ` Thomas Monjalon
  2014-01-28 18:13       ` Stephen Hemminger
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Monjalon @ 2014-01-28 16:23 UTC
  To: Sangjin Han; +Cc: dev

28/01/2014 02:16, Sangjin Han:
> >> It would be nice if I can bypass set_tsc_freq_from_cpuinfo() in
> >> set_tsc_freq().
> > 
> > I think it would not solve the problem because your clock is varying and
> > the TSC calibration must be updated accordingly with different values by
> > core.

[...]
> Also, it seems that there is no guarantee that the TSC rate is
> identical to the CPU max clock frequency.

So you may submit a revert of the commit a46154b9c6bc
(timer: get TSC frequency from /proc/cpuinfo)

> I would like to suggest two possible options:
> 
> 1. If we can assume that the TSC rate always equals the max clock
> frequency, then we can use
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq instead of
> /proc/cpuinfo (which reflects cpuinfo_cur_freq).
> 
> 2. If we can't (AMD?), we can simply get rid of
> set_tsc_freq_from_cpuinfo() and fall back to set_tsc_freq_from_clock()
> or set_tsc_freq_fallback() instead. I always get reasonably good
> accuracy with those two functions -- the only drawback is that it
> takes 0.5 - 1 second for applications to boot up. Not sure if it is a
> big deal or not, though.

Maybe you could choose between these two methods with a runtime option.

> Besides the TSC frequency, the 4ms * 10 delay in
> ixgbe_reset_pipeline_82599() seems too tight. On my system, it
> succeeds only after 7 (or so) iterations with correct msec_delay().
> The per-iteration delay (4ms; in the kernel ixgbe driver, it is set to
> be 4-8ms) and/or the number of iterations (10) should be increased, I
> suppose.

Feel free to submit a patch.

-- 
Thomas


* Re: [dpdk-dev] "No probed ethernet devices" caused by inaccurate msec_delay()
  2014-01-28 16:23     ` Thomas Monjalon
@ 2014-01-28 18:13       ` Stephen Hemminger
  0 siblings, 0 replies; 5+ messages in thread
From: Stephen Hemminger @ 2014-01-28 18:13 UTC
  To: Thomas Monjalon; +Cc: dev

TSC has lots of platform-related issues. It is not guaranteed to be
synchronized across physical packages, and AMD boxes have lots of problems.

Why does delay_ms not just use nanosleep() and let the OS worry about it?
On a related note, I have found that putting the worker (non-master) threads
into the real-time scheduling class also helps.
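
A sleep-based delay along those lines might look like the sketch below.
It is an assumption about what such a replacement could be, not existing
DPDK code, and delay_ms_sleep() is a hypothetical name. Note that
nanosleep() only guarantees a minimum sleep; the actual delay can be
longer depending on scheduling.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

/* Sleep for at least `ms` milliseconds using the OS timer, not the TSC. */
static void
delay_ms_sleep(unsigned int ms)
{
	struct timespec req = {
		.tv_sec = (time_t)(ms / 1000),
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};
	struct timespec rem;

	/* A signal can end the sleep early (EINTR); resume with what is left. */
	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;
}
```

Because the delay no longer depends on a TSC frequency estimate, a wrong
value in /proc/cpuinfo could not shorten it.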
