OpenTelemetry: Observability Primer

Core observability concepts.

What is Observability?

Observability lets us understand a system from the outside, by letting us ask questions about that system without knowing its inner workings. (Translator's analogy: a doctor cannot look inside our blood vessels, but can still ask “What is your blood pressure?”) Furthermore, it allows us to easily troubleshoot and handle novel problems (i.e. “unknown unknowns”), and helps us answer the question, “Why is this happening?”

In order to be able to ask those questions of a system, the application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue, because they already have all of the information they need.

OpenTelemetry is the mechanism by which application code is instrumented, to help make a system observable.

Reliability & Metrics

Telemetry refers to data emitted from a system, about its behavior. The data can come in the form of traces, metrics, and logs.

Reliability answers the question: “Is the service doing what users expect it to be doing?” A system could be up 100% of the time, but if a user clicks “Add to Cart” to add a pair of black shoes to their shopping cart and the system does not always add black shoes, the system would be considered unreliable. (Translator's note: the point is that a service can be available and still be functionally wrong.)

Metrics are aggregations over a period of time of numeric data about your infrastructure or application. Examples include: system error rate, CPU utilization, request rate for a given service. For more on metrics and how they pertain to OpenTelemetry, see Metrics.
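To make "aggregation over a period of time" concrete, here is a minimal sketch in plain Python (standard library only, not the OpenTelemetry metrics API). The `Request` record, the 5xx rule for counting errors, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float  # seconds since an arbitrary start (hypothetical data)
    status_code: int  # HTTP response status


def error_rate(requests, window_start, window_end):
    """Fraction of requests in [window_start, window_end) that returned a 5xx status."""
    in_window = [r for r in requests if window_start <= r.timestamp < window_end]
    if not in_window:
        return 0.0
    errors = sum(1 for r in in_window if r.status_code >= 500)
    return errors / len(in_window)


def request_rate(requests, window_start, window_end):
    """Requests per second over the window."""
    count = sum(1 for r in requests if window_start <= r.timestamp < window_end)
    return count / (window_end - window_start)


# Five hypothetical requests; the last one falls outside the 0-10 s window.
sample = [
    Request(0.5, 200),
    Request(1.2, 500),
    Request(3.0, 200),
    Request(9.9, 200),
    Request(12.0, 503),
]
```

Both functions reduce many individual events to one number per window, which is what distinguishes a metric from the raw events it summarizes.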

SLI, or Service Level Indicator, represents a measurement of a service’s behavior. A good SLI measures your service from the perspective of your users. An example SLI can be the speed at which a web page loads.

SLO, or Service Level Objective, is the means by which reliability is communicated to an organization/other teams. This is accomplished by attaching one or more SLIs to business value.
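One common way to relate the two is to express the SLI as a ratio of good events and the SLO as a target for that ratio. The sketch below assumes this formulation; the 2-second threshold, the 95% target, and the function names are hypothetical and do not come from the text above.

```python
def page_load_sli(load_times_s, threshold_s=2.0):
    """SLI: fraction of page loads that finished within threshold_s seconds."""
    if not load_times_s:
        raise ValueError("no measurements in the window")
    good = sum(1 for t in load_times_s if t <= threshold_s)
    return good / len(load_times_s)


def meets_slo(sli, target=0.95):
    """SLO check: is the measured SLI at or above the agreed target?"""
    return sli >= target


# Hypothetical page-load measurements, in seconds.
measured = page_load_sli([0.8, 1.2, 2.5, 0.9])
```

Here three of the four loads are within the threshold, so the SLI is 0.75 and the assumed 95% objective is missed.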

Understanding Distributed Tracing

To understand Distributed Tracing, let’s start with some basics.

Logs

A log is a timestamped message emitted by services or other components. Unlike traces, however, logs are not necessarily associated with any particular user request or transaction. They are found almost everywhere in software, and developers and operators alike have long relied on them heavily to understand system behavior.

Sample log:

I, [2021-02-23T13:26:23.505892 #22473]  INFO -- : [6459ffe1-ea53-4044-aaa3-bf902868f730] Started GET "/" for ::1 at 2021-02-23 13:26:23 -0800

Unfortunately, logs aren’t extremely useful for tracking code execution, as they typically lack contextual information, such as where they were called from (that is, the call chain is unclear).

They become far more useful when they are included as part of a span, or when they are correlated with a trace and a span.
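As a rough illustration of correlating logs with a trace and a span, the sketch below uses only Python's standard `logging` module to stamp every record with a trace ID and a span ID. It is not the OpenTelemetry logging integration; the `TraceContextFilter` class, the `make_logger` helper, the logger name, and the ID values are all hypothetical.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Attach a trace ID and span ID to every log record that passes through."""

    def __init__(self, trace_id, span_id):
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record):
        record.trace_id = self.trace_id
        record.span_id = self.span_id
        return True  # never drop the record, only enrich it


def make_logger(stream, trace_id, span_id):
    """Build a logger whose output lines carry the given trace context."""
    logger = logging.getLogger("webshop")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    handler.addFilter(TraceContextFilter(trace_id, span_id))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = make_logger(stream,
                  trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
                  span_id="00f067aa0ba902b7")
log.info('Started GET "/"')
correlated_line = stream.getvalue().strip()
```

With both IDs on each line, a backend can group log lines by request and place them next to the span that was active when they were written.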

For more on logs and how they pertain to OTel, see Logs.

Spans

A span represents a unit of work or operation. It tracks specific operations that a request makes, painting a picture of what happened during the time in which that operation was executed.

A span contains a name, time-related data, structured log messages, and other metadata (that is, Attributes) to provide information about the operation it tracks.

Span attributes

The following table contains examples of span attributes:

| Key | Value |
| --- | --- |
| http.request.method | "GET" |
| network.protocol.version | "1.1" |
| url.path | "/webshop/articles/4" |
| url.query | "?s=1" |
| server.address | "example.com" |
| server.port | 8080 |
| url.scheme | "https" |
| http.route | "/webshop/articles/:article_id" |
| http.response.status_code | 200 |
| client.address | "192.0.2.4" |
| client.socket.address | "192.0.2.5" (the client goes through a proxy) |
| user_agent.original | "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0" |
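To show how these pieces fit together, here is a minimal sketch of a span as a plain data structure: a name, time-related data, structured log messages (events), and attributes. This is an illustrative model, not the OpenTelemetry SDK's span type; the `Span` class, its field names, the timestamps, and the "cache miss" event are assumptions. The attribute keys and values are taken from the table.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_ns: int  # start time in nanoseconds (hypothetical values below)
    end_ns: int    # end time in nanoseconds
    attributes: dict = field(default_factory=dict)  # key/value metadata
    events: list = field(default_factory=list)      # (timestamp_ns, message) pairs

    @property
    def duration_ms(self):
        """Elapsed time of the operation in milliseconds."""
        return (self.end_ns - self.start_ns) / 1_000_000


span = Span(
    name="GET /webshop/articles/:article_id",
    start_ns=1_000_000_000,
    end_ns=1_042_000_000,
    attributes={
        "http.request.method": "GET",
        "url.path": "/webshop/articles/4",
        "server.address": "example.com",
        "server.port": 8080,
        "http.response.status_code": 200,
    },
)
# A structured log message recorded while the span was active (made-up event).
span.events.append((1_010_000_000, "cache miss"))
```

Keeping attributes as key/value pairs lets a backend filter spans by any of them, for example all spans with `http.response.status_code` equal to 200.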

For more on spans and how they pertain to OpenTelemetry, see Spans.

Distributed Traces

A distributed trace, more commonly known as a trace, records the paths taken by requests (made by an application or end-user) as they propagate through multi-service architectures, like microservice and serverless applications.

Without tracing, it is challenging to pinpoint the cause of performance problems in a distributed system.

Tracing improves the visibility of our application or system’s health and lets us debug behavior that is difficult to reproduce locally. It is essential for distributed systems, which commonly have nondeterministic problems or are too complicated to reproduce locally.

Tracing makes debugging and understanding distributed systems less daunting by breaking down what happens within a request as it flows through a distributed system.

A trace is made of one or more spans. The first span represents the root span. Each root span represents a request from start to finish. The spans underneath the parent provide a more in-depth context of what occurs during a request (or what steps make up a request).

Many Observability backends visualize traces as waterfall diagrams that may look something like this:

[Image: sample waterfall diagram of a trace]

Waterfall diagrams show the parent-child relationship between a root span and its child spans. When a span encapsulates another span, this also represents a nested relationship.
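The parent-child structure can be sketched in a few lines of Python. The example below assumes each span records the ID of its parent (with `None` for the root span) and prints a crude text waterfall. The `SpanNode` class, the span names, and the timings are hypothetical, and real backends render this very differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanNode:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: int
    end_ms: int


def render_waterfall(spans, ms_per_char=10, label_width=20):
    """Return one text line per span, parents before children, indented by depth."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines = []

    def walk(span, depth):
        label = ("  " * depth + span.name).ljust(label_width)
        offset = " " * (span.start_ms // ms_per_char)
        bar = "#" * max(1, (span.end_ms - span.start_ms) // ms_per_char)
        lines.append(f"{label}|{offset}{bar}")
        for child in sorted(children.get(span.span_id, []), key=lambda s: s.start_ms):
            walk(child, depth + 1)

    for root in children.get(None, []):  # spans with no parent are root spans
        walk(root, 0)
    return lines


# A made-up trace: one root span, two children, one grandchild.
trace = [
    SpanNode("a", None, "GET /checkout", 0, 100),
    SpanNode("b", "a", "auth", 0, 20),
    SpanNode("c", "a", "payment", 20, 90),
    SpanNode("d", "c", "db query", 40, 70),
]
waterfall = render_waterfall(trace)
```

Calling `print("\n".join(waterfall))` shows each child indented under its parent, with a bar whose position and length follow the span's start time and duration.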

For more on traces and how they pertain to OpenTelemetry, see Traces.

