Notes on RAS (Reliability, Availability and Serviceability) in the Linux Kernel

1 Introduction

Reliability, Availability and Serviceability (RAS) — The Linux Kernel documentation

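In fields such as servers and satellites, stability requirements are very high, and software and hardware errors have to be detected and handled promptly. RAS features are used to detect hardware errors in time.

RAS features require hardware support.

The RAS features I know of in the Linux kernel fall into the following categories:

  • EDAC: mainly used to detect physical memory and PCI hardware errors
  • APEI: ACPI-based RAS
  • RAS in the ARMv8 architecture: very few CPUs use this feature; the only one I currently know of is the Phytium D2000V.
  • AMDGPU RAS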

2 EDAC(Error Detection And Correction)

2.1 Introduction

The ``edac`` kernel module's goal is to detect and report hardware errors that occur within the computer system running under linux.
                                《<kernel_src>/Documentation/admin-guide/ras.rst》

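2.2 The EDAC core module: edac_core.ko

2.2.1 Obtaining hardware error information by interrupt or by polling

The global variable edac_op_state selects between interrupt mode and polling mode. Its value can be set through a module parameter, for example: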

drivers/edac/amd64_edac.c:3753:module_param(edac_op_state, int, 0444);
drivers/edac/x38_edac.c:523:module_param(edac_op_state, int, 0444);

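Polling is the default mode. In polling mode, the kernel relies on a dedicated workqueue, edac-poller, to fetch hardware error information periodically.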

 

2.2.2 Creating the dedicated workqueue: edac-poller

edac_init();
    -> edac_workqueue_setup();
        -> alloc_ordered_workqueue("edac-poller", WQ_MEM_RECLAIM);

Correspondingly, a workqueue worker thread can be seen on the running system:

# ps aux | grep edac-
root         124  0.0  0.0      0     0 ?        I<   10:09   0:00 [edac-poller]

2.2.3 Adding a work item to the dedicated workqueue (edac-poller)

bool edac_queue_work(struct delayed_work *work, unsigned long delay)
{
	return queue_delayed_work(wq, work, delay);
}
EXPORT_SYMBOL_GPL(edac_queue_work);

 

2.2.4 Module parameters

# ls /sys/module/edac_core/parameters/
check_pci_errors  edac_mc_log_ue       edac_mc_poll_msec
edac_mc_log_ce    edac_mc_panic_on_ue  edac_pci_panic_on_pe
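
The listing above only shows the parameter names. As a minimal sketch, the current values can be dumped by reading each file in that directory. The function name `read_module_params` is an illustrative assumption, not part of any kernel tool, and the directory only exists when edac_core is loaded.

```python
from pathlib import Path


def read_module_params(param_dir="/sys/module/edac_core/parameters"):
    """Return {parameter name: current value} for every file in param_dir."""
    base = Path(param_dir)
    if not base.is_dir():
        # edac_core is not loaded, or sysfs is not mounted here
        return {}
    params = {}
    for entry in sorted(base.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # The parameter may not be readable by the current user
            params[entry.name] = None
    return params


if __name__ == "__main__":
    for name, value in read_module_params().items():
        print(f"{name} = {value}")
```

The directory is passed as an argument so the same function can be pointed at a copy of the sysfs tree when testing.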

2.3 Using EDAC to obtain hardware ECC errors of physical memory

2.3.1 A brief introduction to ECC

How ECC works

As mentioned on the previous section, ECC memory has extra bits to be
used for error correction. So, on 64 bit systems, a memory module
has 64 bits of *data width*, and 72 bits of *total width*. So, there are
8 bits extra bits to be used for the error detection and correction
mechanisms. Those extra bits are called *syndrome*.

So, when the cpu requests the memory controller to write a word with 
*data width*, the memory controller calculates the *syndrome* in real time,
using Hamming code, or some other error correction code, like SECDED+,
producing a code with *total width* size. Such code is then written
on the memory modules.

At read, the *total width* bits code is converted back, using the same
ECC code used on write, producing a word with *data width* and a *syndrome*.
The word with *data width* is sent to the CPU, even when errors happen.

The memory controller also looks at the *syndrome* in order to check if
there was an error, and if the ECC code was able to fix such error.
If the error was corrected, a Corrected Error (CE) happened. If not, an
Uncorrected Error (UE) happened.
                                《<kernel_src>/Documentation/admin-guide/ras.rst》
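
To make the CE/UE distinction concrete, here is a small sketch of a textbook extended Hamming (SECDED) code with 64 data bits and 8 check bits, 72 bits in total. It is an illustration only: the bit layout and the names `encode` and `decode` are assumptions, and real memory controllers use their own vendor-specific codes.

```python
DATA_BITS = 64
CODE_BITS = 72                      # bit positions 0..71
PARITY_POSITIONS = (1, 2, 4, 8, 16, 32, 64)
# Position 0 holds the overall parity bit; the rest carry data.
DATA_POSITIONS = [p for p in range(1, CODE_BITS) if p not in PARITY_POSITIONS]


def _syndrome(code):
    """XOR of the positions (1..71) of all set bits."""
    s = 0
    for pos in range(1, CODE_BITS):
        if (code >> pos) & 1:
            s ^= pos
    return s


def encode(data):
    """Spread 64 data bits over a 72-bit code word and add check bits."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (data >> i) & 1:
            code |= 1 << pos
    s = _syndrome(code)
    for p in PARITY_POSITIONS:      # force the syndrome to zero
        if s & p:
            code |= 1 << p
    if bin(code).count("1") % 2:    # make the overall parity even
        code |= 1
    return code


def decode(code):
    """Return (data, status) with status "OK", "CE" or "UE"."""
    s = _syndrome(code)
    parity_even = bin(code).count("1") % 2 == 0
    if s == 0 and parity_even:
        status = "OK"
    elif not parity_even and s < CODE_BITS:
        code ^= 1 << s              # single-bit error: flip it back
        status = "CE"
    else:
        status = "UE"               # e.g. two flipped bits: detected only
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            data |= 1 << i
    return data, status


if __name__ == "__main__":
    word = 0x0123456789ABCDEF
    cw = encode(word)
    print(decode(cw)[1])                            # no error
    print(decode(cw ^ (1 << 17))[1])                # one flipped bit
    print(decode(cw ^ (1 << 3) ^ (1 << 40))[1])     # two flipped bits
```

As in the quoted text, `decode` always returns a data word, even for an uncorrected error; the status is what tells the caller whether the word can be trusted.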

2.3.2 Data structure: struct mem_ctl_info

struct mem_ctl_info {
	......
	/* pointer to edac checking routine */
	void (*edac_check) (struct mem_ctl_info * mci);
	......
};

2.3.3 Creating the files under /sys/devices/system/edac/mc/ and the work item

edac_mc_add_mc_with_groups();
    -> edac_create_sysfs_mci_device();
    -> INIT_DELAYED_WORK(&mci->work, edac_mc_workq_function);

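2.3.4 The work item handler: edac_mc_workq_function()

edac_mc_workq_function();
    -> mci->edac_check(mci);      //the function that actually fetches the hardware errors
    -> edac_queue_work(&mci->work, msecs_to_jiffies(edac_mc_get_poll_msec()));    //re-queues itself, so it keeps running periodically

The code above runs periodically; the period is given by the module parameter /sys/module/edac_core/parameters/edac_mc_poll_msec.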
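
The self-re-queuing pattern of this handler can be modelled in user space. The sketch below is a toy simulation with a virtual clock, not kernel code; `DelayedWorkQueue` and `MemoryController` are hypothetical names that only mirror the roles of the edac-poller workqueue and struct mem_ctl_info.

```python
import heapq
import itertools


class DelayedWorkQueue:
    """Toy single-threaded stand-in for an ordered workqueue."""

    def __init__(self):
        self.now_ms = 0
        self._heap = []
        self._seq = itertools.count()   # keeps insertion order for equal times

    def queue_delayed_work(self, func, delay_ms):
        heapq.heappush(self._heap, (self.now_ms + delay_ms, next(self._seq), func))

    def run_until(self, deadline_ms):
        """Run every work item that is due up to deadline_ms (virtual time)."""
        while self._heap and self._heap[0][0] <= deadline_ms:
            when, _, func = heapq.heappop(self._heap)
            self.now_ms = when
            func()
        self.now_ms = deadline_ms


class MemoryController:
    """Plays the role of struct mem_ctl_info with its edac_check callback."""

    def __init__(self, wq, edac_check, poll_msec):
        self.wq = wq
        self.edac_check = edac_check
        self.poll_msec = poll_msec

    def start(self):
        # Mirrors the first edac_queue_work() call made at registration time
        self.wq.queue_delayed_work(self._work, self.poll_msec)

    def _work(self):
        # Mirrors edac_mc_workq_function(): check, then re-queue itself
        self.edac_check(self)
        self.wq.queue_delayed_work(self._work, self.poll_msec)


if __name__ == "__main__":
    wq = DelayedWorkQueue()
    mc = MemoryController(wq, lambda m: print("check at", wq.now_ms, "ms"), 1000)
    mc.start()
    wq.run_until(3500)
```

Because the handler queues itself again at the end of each run, a slow check delays the next poll instead of overlapping with it.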

2.3.5 /sys/devices/system/edac/mc/

See 《<kernel_src>/Documentation/admin-guide/ras.rst》.
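
As a minimal sketch of how these files might be consumed, the following reads the per-controller `ce_count` and `ue_count` attributes described in that document. The helper name `read_mc_counters` is an assumption for illustration; a controller directory that lacks one of the files simply yields `None` for it.

```python
from pathlib import Path


def read_mc_counters(mc_root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": int, "ue_count": int}, ...}."""
    root = Path(mc_root)
    counters = {}
    if not root.is_dir():
        # No EDAC memory controller driver registered on this system
        return counters
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc_dir / name).read_text())
            except (OSError, ValueError):
                entry[name] = None
        counters[mc_dir.name] = entry
    return counters


if __name__ == "__main__":
    for mc, values in read_mc_counters().items():
        print(mc, values)
```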

2.3.6 A real-world example (the Freescale MPC8572 processor)

For the DDR memory controllers of the MPC8572, see 《MPC8572E PowerQUICC™ III Integrated Host Processor Family Reference Manual》, page 9-1.

 

Device tree

//arch/powerpc/boot/dts/fsl/mpc8572si-post.dtsi
memory-controller@2000 {
	compatible = "fsl,mpc8572-memory-controller";
	reg = <0x2000 0x1000>;
	interrupts = <18 2 0 0>;
};

memory-controller@6000 {
	compatible = "fsl,mpc8572-memory-controller";
	reg = <0x6000 0x1000>;
	interrupts = <18 2 0 0>;
};

EDAC driver

fsl_mc_err_probe();    //drivers/edac/mpc85xx_edac.c
    -> mci->edac_check = fsl_mc_check;    //the key function that fetches physical memory error information
    -> edac_mc_add_mc_with_groups(mci, fsl_ddr_dev_groups);

2.4 Using EDAC to obtain PCI hardware errors

See the CSDN blog post "Linux下通过EDAC功能检测PCIE硬件错误" (detecting PCIe hardware errors with EDAC on Linux).

2.5 Using EDAC to obtain errors from other types of hardware

edac_device_add_device();
    -> edac_device_create_sysfs();
    -> edac_device_workq_setup();
        -> INIT_DELAYED_WORK(&edac_dev->work, edac_device_workq_function);

3 APEI(ACPI Platform Error Interface)

3.1 Introduction

APEI allows to report errors (for example from the chipset) to the operating system. This improves NMI handling especially. In addition it supports error serialization and error injection.
                                《<kernel_src>/drivers/acpi/apei/Kconfig》

ACPI Platform Error Interfaces (APEI), which provide a means for a computer platform to convey error information to OSPM.
APEI consists of four separate tables:

  • Error Record Serialization Table (ERST)
  • Boot Error Record Table (BERT)
  • Hardware Error Source Table (HEST)
  • Error Injection Table (EINJ)

                                《Advanced Configuration and Power Interface (ACPI) Specification》P793

3.2 APEI Generic Hardware Error Source(GHES)

Kernel config: CONFIG_ACPI_APEI_GHES

Generic Hardware Error Source provides a way to report platform hardware errors (such as that from chipset). It works in so called "Firmware First" mode, that is, hardware errors are reported to firmware firstly, then reported to Linux by firmware. This way, some non-standard hardware error registers or non-standard hardware link can be checked by firmware to produce more valuable hardware error information for Linux.
                                《drivers/acpi/apei/Kconfig》

3.3 APEI PCIe AER logging/recovering support

Kernel config: CONFIG_ACPI_APEI_PCIEAER

PCIe AER errors may be reported via APEI firmware first mode. Turn on this option to enable the corresponding support.
                                《drivers/acpi/apei/Kconfig》

Debugging

/sys/kernel/debug/tracing/events/ras/aer_event/

3.4 APEI memory error recovering support

Kernel config: CONFIG_ACPI_APEI_MEMORY_FAILURE

Memory errors may be reported via APEI firmware first mode. Turn on this option to enable the memory recovering support.
                                《drivers/acpi/apei/Kconfig》

Debugging

/sys/kernel/debug/tracing/events/ras/mc_event/

3.5 APEI Error INJection (EINJ)

3.5.1 Introduction

Kernel config: CONFIG_ACPI_APEI_EINJ

EINJ provides a hardware error injection mechanism, it is mainly used for debugging and testing the other parts of APEI and some other RAS features.
                                《drivers/acpi/apei/Kconfig》

3.5.2 Error Injection Table

The Error Injection (EINJ) table provides a generic interface mechanism through which OSPM can inject hardware errors to the platform without requiring platform specific OSPM software. System firmware is responsible for building this table, which is made up of Injection Instruction entries.
                                《Advanced Configuration and Power Interface (ACPI) Specification》P832

3.5.3 Checking whether EINJ is supported

Check whether /sys/firmware/acpi/tables/EINJ exists.

go into BIOS setup to see if the BIOS has an option to enable error injection. Look for something called WHEA or similar. Often, you need to enable an ACPI5 support option prior, in order to see the APEI,EINJ,... functionality supported and exposed by the BIOS menu.
                                《Documentation/firmware-guide/acpi/apei/einj.rst》
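
The existence check can be scripted. This is a small sketch that looks for the four APEI tables named in section 3.1 under the ACPI tables directory; the helper names `apei_tables_present` and `einj_supported` are illustrative assumptions.

```python
from pathlib import Path

APEI_TABLES = ("ERST", "BERT", "HEST", "EINJ")


def apei_tables_present(tables_dir="/sys/firmware/acpi/tables"):
    """Return {table name: True/False} for the four APEI tables."""
    base = Path(tables_dir)
    return {name: (base / name).exists() for name in APEI_TABLES}


def einj_supported(tables_dir="/sys/firmware/acpi/tables"):
    """True if the firmware exposes an EINJ table."""
    return apei_tables_present(tables_dir)["EINJ"]


if __name__ == "__main__":
    for name, present in apei_tables_present().items():
        print(f"{name}: {'present' if present else 'missing'}")
```

A missing EINJ table only says that the firmware does not currently expose it; as the quoted text notes, a BIOS option may have to be enabled first.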

3.5.4 /sys/kernel/debug/apei/einj/

Usage: see the article "服务器内存故障预测居然可以这样做" (roughly, "Server memory failure prediction can actually be done this way").
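
Before injecting anything, it helps to know which error types the platform accepts. According to einj.rst, the file `available_error_type` in this directory lists them as a hexadecimal value followed by a description. The sketch below parses that format; the function names are assumptions, and reading the real file requires root and a mounted debugfs.

```python
from pathlib import Path


def parse_available_error_types(text):
    """Parse lines such as '0x00000008<TAB>Memory Correctable'."""
    types = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts:
            continue                      # skip blank lines
        value = int(parts[0], 16)
        description = parts[1] if len(parts) > 1 else ""
        types[value] = description
    return types


def read_available_error_types(einj_dir="/sys/kernel/debug/apei/einj"):
    """Read and parse available_error_type from the EINJ debugfs directory."""
    path = Path(einj_dir) / "available_error_type"
    return parse_available_error_types(path.read_text())


if __name__ == "__main__":
    sample = "0x00000008\tMemory Correctable\n0x00000010\tMemory Uncorrectable non-fatal\n"
    for value, description in parse_available_error_types(sample).items():
        print(f"{value:#010x}  {description}")
```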

3.6 Support for APEI interrupts on the ARMv8 architecture

APEI requires the equivalent of an SCI and an NMI on ARMv8. The SCI is used to notify the OSPM of errors that have occurred but can be corrected and the system can continue correct operation, even if possibly degraded. The NMI is used to indicate fatal errors that cannot be corrected, and require immediate attention.

Since there is no direct equivalent of the x86 SCI or NMI, arm64 handles these slightly differently. The SCI is handled as a high priority interrupt; given that these are corrected (or correctable) errors being reported, this is sufficient. The NMI is emulated as the highest priority interrupt possible. This implies some caution must be used since there could be interrupts at higher privilege levels or even interrupts at the same priority as the emulated NMI. In Linux, this should not be the case but one should be aware it could happen.
                                《<kernel_src>/Documentation/arm64/acpi_object_usage.rst》

3.7 Key function: ghes_do_proc()

3.8 Debugging

# ls /sys/kernel/debug/tracing/events/ras/ -l
总用量 0
drwxr-x--- 2 root root 0 5月  13 10:09 aer_event
drwxr-x--- 2 root root 0 5月  13 10:09 arm_event
-rw-r----- 1 root root 0 5月  13 10:09 enable
drwxr-x--- 2 root root 0 5月  13 10:09 extlog_mem_event
-rw-r----- 1 root root 0 5月  13 10:09 filter
drwxr-x--- 2 root root 0 5月  13 10:09 mc_event
drwxr-x--- 2 root root 0 5月  13 10:09 memory_failure_event
drwxr-x--- 2 root root 0 5月  13 10:09 non_standard_event

4 ARMv8 RAS

For the RAS System Architecture, see 《Arm Architecture Reference Manual for A-profile architecture》, page 11593.

Of the ARM processors I have worked with so far, only the Phytium D2000V uses the RAS features described in the ARMv8 manual.

5 AMDGPU RAS Support

https://www.kernel.org/doc/html/latest/gpu/amdgpu/ras.html
