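While debugging an unavailable k8s node, you may come across a message like:
System OOM encountered, victim process: xx
To understand what this OOM event is and how it is produced, we did some digging and wrote up the findings below. (This article focuses on how the OOM event is generated and delivered; how cAdvisor decides that an OOM has occurred is outside the scope of this article.)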
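Analysis
Main source file:
1) pkg/kubelet/oom/oom_watcher_linux.go
oom_watcher describes how the kubelet receives and logs the OOM events produced by the system.
2) oom_watcher_linux.go:
NewWatcher returns an object of type Watcher that holds a recorder and an oomStreamer. The recorder records events; the oomStreamer is an OomParser (from cAdvisor) that writes OomInstance objects into the outStream channel.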
package oom

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/client-go/tools/record"
	"k8s.io/klog/v2"

	"github.com/google/cadvisor/utils/oomparser"
)

// Watcher, systemOOMEvent and recordEventContainerName are defined elsewhere
// in the package and are omitted from this excerpt.

// The streamer interface declares a single method, StreamOoms.
// It takes a send-only channel of *oomparser.OomInstance and writes
// OomInstance values into it.
type streamer interface {
	StreamOoms(chan<- *oomparser.OomInstance)
}

var _ streamer = &oomparser.OomParser{}

type realWatcher struct {
	recorder    record.EventRecorder
	oomStreamer streamer
}

var _ Watcher = &realWatcher{}

// NewWatcher creates and initializes a OOMWatcher backed by Cadvisor as
// the oom streamer.
// It starts a new OOM watcher; the only parameter is an EventRecorder.
// EventRecorder is an interface that records events and puts them on a queue.
// In a Go function declaration, the first parenthesized list holds the
// parameters and the second holds the return values.
func NewWatcher(recorder record.EventRecorder) (Watcher, error) {
	// Create the oomStreamer using cAdvisor's oomparser.
	oomStreamer, err := oomparser.New()
	if err != nil {
		return nil, err
	}

	// Build a watcher that holds the two objects above: recorder and oomStreamer.
	watcher := &realWatcher{
		recorder:    recorder,
		oomStreamer: oomStreamer,
	}

	return watcher, nil
}

// Start watches for system oom's and records an event for every system oom encountered.
func (ow *realWatcher) Start(ref *v1.ObjectReference) error {
	// Create outStream, a channel of *oomparser.OomInstance buffered to hold
	// up to 10 elements. Then start a goroutine that calls
	// ow.oomStreamer.StreamOoms with outStream as its argument; that method
	// keeps writing OOM instances into the channel.
	outStream := make(chan *oomparser.OomInstance, 10)
	go ow.oomStreamer.StreamOoms(outStream)

	go func() {
		defer runtime.HandleCrash()

		// Read events from outStream. Only events whose victim container name
		// equals recordEventContainerName are treated as system OOMs; for
		// those, write a log line and record a warning event.
		for event := range outStream {
			if event.VictimContainerName == recordEventContainerName {
				klog.V(1).InfoS("Got sys oom event", "event", event)
				eventMsg := "System OOM encountered"
				if event.ProcessName != "" && event.Pid != 0 {
					eventMsg = fmt.Sprintf("%s, victim process: %s, pid: %d", eventMsg, event.ProcessName, event.Pid)
				}
				ow.recorder.Eventf(ref, v1.EventTypeWarning, systemOOMEvent, eventMsg)
			}
		}
		klog.ErrorS(nil, "Unexpectedly stopped receiving OOM notifications")
	}()
	return nil
}
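The filter and message-building step inside Start can be pulled out into a pure function, which makes it easier to see where the text "System OOM encountered, victim process: xx" comes from. The following is a minimal sketch, not kubelet code: oomInstance is a hypothetical local type that mirrors only the three OomInstance fields read above, systemOOMMessage is a made-up helper name, and the value "/" for recordEventContainerName is an assumption, since the excerpt does not show that constant.

```go
package main

import "fmt"

// oomInstance is a hypothetical stand-in for oomparser.OomInstance that
// keeps only the fields the watcher reads.
type oomInstance struct {
	Pid                 int
	ProcessName         string
	VictimContainerName string
}

// Assumed value: the excerpt does not show how this constant is defined.
const recordEventContainerName = "/"

// systemOOMMessage returns the event message and true when the OOM should be
// recorded as a system OOM, or "" and false when it should be skipped.
func systemOOMMessage(e *oomInstance) (string, bool) {
	if e.VictimContainerName != recordEventContainerName {
		return "", false
	}
	msg := "System OOM encountered"
	// The victim details are appended only when both are known.
	if e.ProcessName != "" && e.Pid != 0 {
		msg = fmt.Sprintf("%s, victim process: %s, pid: %d", msg, e.ProcessName, e.Pid)
	}
	return msg, true
}

func main() {
	events := []*oomInstance{
		{Pid: 4242, ProcessName: "java", VictimContainerName: "/"},
		{Pid: 17, ProcessName: "nginx", VictimContainerName: "/some/other/container"},
		{VictimContainerName: "/"},
	}
	for _, e := range events {
		if msg, ok := systemOOMMessage(e); ok {
			fmt.Println(msg)
		}
	}
}
```

With these sample inputs, the first event yields the full message with process name and pid, the second is skipped because the container name does not match, and the third yields only the bare "System OOM encountered" text.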
Next, let's look at how kubelet.go uses it.
kubelet.go:
Creating the oomWatcher
// Create a new oomWatcher with the NewWatcher function shown above.
oomWatcher, err := oomwatcher.NewWatcher(kubeDeps.Recorder)
// If creating the oomWatcher fails, work out why and decide whether the error can be ignored.
if err != nil {
	if libcontaineruserns.RunningInUserNS() {
		if utilfeature.DefaultFeatureGate.Enabled(features.KubeletInUserNamespace) {
			// oomwatcher.NewWatcher returns "open /dev/kmsg: operation not permitted" error,
			// when running in a user namespace with sysctl value `kernel.dmesg_restrict=1`.
			klog.V(2).InfoS("Failed to create an oomWatcher (running in UserNS, ignoring)", "err", err)
			oomWatcher = nil
		} else {
			klog.ErrorS(err, "Failed to create an oomWatcher (running in UserNS, Hint: enable KubeletInUserNamespace feature flag to ignore the error)")
			return nil, err
		}
	} else {
		return nil, err
	}
}
Starting the oomWatcher
// Start out of memory watcher.
if kl.oomWatcher != nil {
	if err := kl.oomWatcher.Start(kl.nodeRef); err != nil {
		return fmt.Errorf("failed to start OOM watcher: %w", err)
	}
}
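Diagram
The code above implements the flow below. The figure gives a fairly complete picture of how an OOM event is read by cAdvisor and ends up as an event on the node.
Image source: 启动oomWatcher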
References
1) https://www.jianshu.com/p/ef524b0b0119
2) 启动oomWatcher