DPDK patches and discussions
* [dpdk-dev] DPDK memory error check and offline bad pages
@ 2017-11-07 21:13 Jianjian Huo
  2017-11-13  5:52 ` Jianjian Huo
  2017-11-13  7:08 ` Tan, Jianfeng
  0 siblings, 2 replies; 5+ messages in thread
From: Jianjian Huo @ 2017-11-07 21:13 UTC (permalink / raw)
  To: dev

Hi dpdk developers,

I have a question about how the DPDK memory module handles memory errors.

The Linux kernel has mechanisms (mcelog and EDAC) to monitor the memory controller and report correctable/uncorrectable memory errors. With some configurations, if memory errors exceed a threshold, the system can offline the bad memory pages so that applications do not access them and crash.
Does DPDK have a similar mechanism?

Thanks,
Jianjian
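
For context on the EDAC mechanism mentioned above: the Linux EDAC driver exposes per-memory-controller error counters in sysfs, for example /sys/devices/system/edac/mc/mc0/ce_count (correctable) and ue_count (uncorrectable). The following is only a rough sketch of how a user-space process might poll such a counter. The helper names and the threshold policy are hypothetical and are not part of DPDK or mcelog.

```c
#include <stdio.h>

/* Read one unsigned counter from a sysfs-style text file.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed. */
static int read_counter(const char *path, unsigned long long *out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical policy helper: reports whether the correctable-error count
 * of memory controller `mc` is above `threshold`.
 * Returns 1 if above, 0 if not, -1 if EDAC is unavailable for that index. */
static int edac_ce_over_threshold(unsigned int mc, unsigned long long threshold)
{
    char path[128];
    unsigned long long ce = 0;

    snprintf(path, sizeof(path),
             "/sys/devices/system/edac/mc/mc%u/ce_count", mc);
    if (read_counter(path, &ce) != 0)
        return -1;
    return ce > threshold ? 1 : 0;
}
```

A monitoring thread could call edac_ce_over_threshold(0, 100) periodically and treat -1 as "no EDAC driver loaded for this controller". The threshold of 100 is an arbitrary illustration, not a recommended value.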


* Re: [dpdk-dev] DPDK memory error check and offline bad pages
  2017-11-07 21:13 [dpdk-dev] DPDK memory error check and offline bad pages Jianjian Huo
@ 2017-11-13  5:52 ` Jianjian Huo
  2017-11-13  7:08 ` Tan, Jianfeng
  1 sibling, 0 replies; 5+ messages in thread
From: Jianjian Huo @ 2017-11-13  5:52 UTC (permalink / raw)
  To: dev

Does anyone have any idea about this?
I can't believe DPDK doesn't support such an important feature. This is going to be a showstopper for real production systems.

-Jianjian


* Re: [dpdk-dev] DPDK memory error check and offline bad pages
  2017-11-07 21:13 [dpdk-dev] DPDK memory error check and offline bad pages Jianjian Huo
  2017-11-13  5:52 ` Jianjian Huo
@ 2017-11-13  7:08 ` Tan, Jianfeng
  2017-11-13 21:40   ` Wiles, Keith
  1 sibling, 1 reply; 5+ messages in thread
From: Tan, Jianfeng @ 2017-11-13  7:08 UTC (permalink / raw)
  To: Jianjian Huo, dev

Hi Jianjian,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianjian Huo
> Sent: Wednesday, November 8, 2017 5:13 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] DPDK memory error check and offline bad pages
> 
> Hi dpdk developers,
> 
> I have a question regarding how DPDK memory module treats memory
> errors.

Do you mean hardware errors that cannot be corrected by ECC?

> 
> In Linux kernel, it has mechanism (mcelog and EDAC) to monitor the memory
> controller and report correctable/uncorrectable memory errors. Using some
> configurations, if memory errors exceed threshold, system can offline bad
> memory pages and avoid applications to access/crash.

A DPDK app is just one application among many. Is there any framework to notify applications of such errors?
Notification is the first step; recovery is a separate matter that takes more effort.

> Do we have similar mechanism in DPDK?

No, as far as I know.

Thanks,
Jianfeng

> 
> Thanks,
> Jianjian
> 



* Re: [dpdk-dev] DPDK memory error check and offline bad pages
  2017-11-13  7:08 ` Tan, Jianfeng
@ 2017-11-13 21:40   ` Wiles, Keith
  2017-11-14  0:03     ` Tan, Jianfeng
  0 siblings, 1 reply; 5+ messages in thread
From: Wiles, Keith @ 2017-11-13 21:40 UTC (permalink / raw)
  To: Tan, Jianfeng; +Cc: Jianjian Huo, dev



> On Nov 12, 2017, at 11:08 PM, Tan, Jianfeng <jianfeng.tan@intel.com> wrote:
> 
> Hi Jianjian,
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianjian Huo
>> Sent: Wednesday, November 8, 2017 5:13 AM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] DPDK memory error check and offline bad pages
>> 
>> Hi dpdk developers,
>> 
>> I have a question regarding how DPDK memory module treats memory
>> errors.
> 
> You mean hardware error which cannot be fixed by ECC?
> 
>> 
>> In Linux kernel, it has mechanism (mcelog and EDAC) to monitor the memory
>> controller and report correctable/uncorrectable memory errors. Using some
>> configurations, if memory errors exceed threshold, system can offline bad
>> memory pages and avoid applications to access/crash.
> 
> DPDK app is just one of applications. Are there any framework to notify such error to applications?
> To notify is the first thing, to recover is another thing which takes more effort.
> 
>> Do we have similar mechanism in DPDK?
> 
> No, as far as I know.


Since DPDK runs as a normal user-space application on Linux, the existing Linux kernel features can be used, correct?

> 
> Thanks,
> Jianfeng
> 
>> 
>> Thanks,
>> Jianjian
>> 
> 

Regards,
Keith


* Re: [dpdk-dev] DPDK memory error check and offline bad pages
  2017-11-13 21:40   ` Wiles, Keith
@ 2017-11-14  0:03     ` Tan, Jianfeng
  0 siblings, 0 replies; 5+ messages in thread
From: Tan, Jianfeng @ 2017-11-14  0:03 UTC (permalink / raw)
  To: Wiles, Keith; +Cc: Jianjian Huo, dev



On 11/14/2017 5:40 AM, Wiles, Keith wrote:
>
>> On Nov 12, 2017, at 11:08 PM, Tan, Jianfeng <jianfeng.tan@intel.com> wrote:
>>
>> Hi Jianjian,
>>
>>> -----Original Message-----
>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianjian Huo
>>> Sent: Wednesday, November 8, 2017 5:13 AM
>>> To: dev@dpdk.org
>>> Subject: [dpdk-dev] DPDK memory error check and offline bad pages
>>>
>>> Hi dpdk developers,
>>>
>>> I have a question regarding how DPDK memory module treats memory
>>> errors.
>> You mean hardware error which cannot be fixed by ECC?
>>
>>> In Linux kernel, it has mechanism (mcelog and EDAC) to monitor the memory
>>> controller and report correctable/uncorrectable memory errors. Using some
>>> configurations, if memory errors exceed threshold, system can offline bad
>>> memory pages and avoid applications to access/crash.
>> DPDK app is just one of applications. Are there any framework to notify such error to applications?
>> To notify is the first thing, to recover is another thing which takes more effort.
>>
>>> Do we have similar mechanism in DPDK?
>> No, as far as I know.
>
> Because DPDK runs as a normal user space application in Linux then the current features in the Linux Kernel can be used correct?

I suppose so, but AFAIK we have not leveraged any of those features
explicitly. Relying on them implicitly tends to be problematic, as DPDK
translates virtual addresses to physical addresses only once, at the very
beginning.
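
To illustrate the one-time translation: on Linux a process can look up the physical frame behind a virtual address through /proc/self/pagemap, where each page has a 64-bit entry with the PFN in bits 0-54 and a "present" flag in bit 63. The sketch below shows that kind of lookup. The function names are hypothetical and this is not DPDK's actual code. If the result is cached at startup and the kernel later migrates or offlines the page, the cached physical address no longer matches.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */
#define PAGEMAP_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */

/* Decode one pagemap entry. Returns UINT64_MAX if the page is not present
 * or the PFN is hidden (reads as zero without sufficient privileges). */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t page_size,
                                      uint64_t offset_in_page)
{
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;

    if (!(entry & PAGEMAP_PRESENT) || pfn == 0)
        return UINT64_MAX;
    return pfn * page_size + offset_in_page;
}

/* Translate a virtual address of the calling process to a physical address.
 * Returns UINT64_MAX on any failure. */
static uint64_t virt_to_phys(const void *virt)
{
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t vaddr = (uint64_t)(uintptr_t)virt;
    uint64_t entry = 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return UINT64_MAX;

    /* One 8-byte entry per virtual page, indexed by virtual page number. */
    off_t off = (off_t)(vaddr / page_size * sizeof(entry));
    ssize_t n = pread(fd, &entry, sizeof(entry), off);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return UINT64_MAX;

    return pagemap_entry_to_phys(entry, page_size, vaddr % page_size);
}
```

An application that wanted to cooperate with page offlining would have to repeat this lookup, or otherwise learn about the change, rather than rely on the value obtained at initialization.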

Thanks,
Jianfeng

>
>> Thanks,
>> Jianfeng
>>
>>> Thanks,
>>> Jianjian
>>>
> Regards,
> Keith
>

