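4 - Configuration Guide
1. Overview
huatuo-bamai is the core collector of HUATUO (a BPF-based metrics and anomaly inspector). Its configuration file defines the data collection scope, which probes are enabled, the metric output format, anomaly detection rules, and logging behavior.
The configuration file is organized into several sections, including the global blacklist, logging, runtime resource limits, storage, and automatic tracing (AutoTracing). Each option carries a comment that states its purpose, default value, and caveats. This document explains every option in turn so that users can understand the configuration and customize it safely.
Note: most parameters appear in the configuration file as # comments that show the default value. To override one, remove the # and adjust the value for your environment. Restart the huatuo-bamai process for changes to take effect. In production, enable only what you need and avoid turning on high-overhead features unnecessarily.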
2. Global Blacklist
# The global blacklist for tracing and metrics
BlackList = ["netdev_hw", "metax_gpu"]
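- BlackList: global blacklist for tracing and metrics.
  Excludes specific modules from tracing and metric collection, which avoids irrelevant noise and high-overhead probes. For example, ["netdev_hw", "metax_gpu"] globally disables tracing and metrics for the network device hardware layer (netdev_hw) and for Metax GPUs (metax_gpu).
  Note: blacklisting modules lowers resource consumption, especially on hosts with particular hardware. The value is an array and can be extended as needed.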
3. Log Configuration
# Log Configuration
#
# - Level
# The log level for huatuo-bamai: Debug, Info, Warn, Error, Panic.
# Default: Info
#
# - File
# Store logs to where the logging file is. If it is empty, don't write log
# to any file.
# Default: empty
#
[Log]
# Level = "Info"
# File = ""
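- Level: log level.
  Valid values are Debug, Info, Warn, Error, and Panic. The default is Info.
  Note: controls how verbose the huatuo-bamai log output is. Info or Warn is recommended in production to limit log volume. Debug produces a large amount of output and is intended for troubleshooting only.
- File: log file path.
  Path of the file that logs are written to. If it is an empty string, no log file is written (logs go only to standard output or the system log). The default is empty.
  Note: in containerized deployments, set an explicit path so that logs persist.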
4. Runtime Resource Limits
# Runtime resource limit
#
# - LimitInitCPU
# The CPU limit applied to the process during huatuo-bamai startup.
# Default is 0.5 CPU.
#
# - LimitCPU
# The CPU resource restricted once the process starts.
# Default is 2.0 CPU.
#
# - LimitMem
# The memory resource limited for huatuo-bamai process.
# Default is 2048MB.
#
[RuntimeCgroup]
# LimitInitCPU = 0.5
# LimitCPU = 2.0
# LimitMem = 2048
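- LimitInitCPU: CPU limit during startup.
  Number of CPU cores the huatuo-bamai process may use while it starts. The default is 0.5 CPU.
  Note: keeps startup from taking CPU that host workloads need. The unit is CPU cores, and fractional values are supported.
- LimitCPU: CPU limit at runtime.
  Upper bound on CPU usage once the process is running normally. The default is 2.0 CPU.
  Note: adjust to node size and workload. In high-density container environments, consider lowering it to protect workload stability.
- LimitMem: memory limit.
  Maximum amount of memory the huatuo-bamai process may use. The default is 2048 MB.
  Note: the unit is MB. The limit is enforced through cgroup and guards against OOM (Out Of Memory) risk. In production, raise it as needed to match the actual collection scale.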
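As a rough illustration of what these three values mean at the cgroup level, the sketch below converts them into cgroup v2 interface values. It assumes cgroup v2 (`cpu.max`, `memory.max`), the common 100000 us period, and that MB means MiB; the function name is hypothetical and this is not taken from the huatuo-bamai source.

```python
def cgroup_v2_values(limit_cpu: float, limit_mem_mb: int,
                     period_us: int = 100_000) -> dict:
    """Translate CPU cores and MB into cgroup v2 file contents (illustrative only)."""
    # cpu.max holds "<quota> <period>" in microseconds; quota/period = allowed cores.
    quota_us = int(limit_cpu * period_us)
    # memory.max is a byte count; MB is treated as MiB here (an assumption).
    mem_bytes = limit_mem_mb * 1024 * 1024
    return {"cpu.max": f"{quota_us} {period_us}", "memory.max": str(mem_bytes)}


if __name__ == "__main__":
    print(cgroup_v2_values(0.5, 2048))  # startup phase (LimitInitCPU)
    print(cgroup_v2_values(2.0, 2048))  # steady state (LimitCPU)
```

With the defaults, the startup limit corresponds to half of one core per scheduling period and the runtime limit to two cores.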
5. Storage Configuration
5.1 Elasticsearch/OpenSearch Storage
# Storage configuration
[Storage]
# Elasticsearch and OpenSearch Storage
#
# Disable ES/OS storage if one of Address, Username, Password is empty.
# Store the tracing and events data of linux kernel to ES/OS.
#
# - Address
# Default address is :9200 of localhost. Port 9200 is used for all API calls
# over HTTP. This includes search and aggregations, monitoring and anything
# else that uses a HTTP or HTTPS request. All client libraries will use this port to
# talk to Elasticsearch or OpenSearch.
# e.g.
# http://127.0.0.1:9200
# https://127.0.0.1:9200
#
# Default: :9200
#
# - Index
# Elasticsearch or OpenSearch index, a logical namespace that holds a collection of
# documents for huatuo-bamai.
# Default: huatuo_bamai
#
# - Username
# - Password
# There is no default username and password.
#
[Storage.ES]
# Address = "http://127.0.0.1:9200"
# Index = "huatuo_bamai"
Username = "elastic"
Password = "huatuo-bamai"
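- Address: address of the Elasticsearch/OpenSearch service.
  The default is http://127.0.0.1:9200.
  Note: kernel tracing and event data are stored here. ES/OS storage is disabled if any one of Address, Username, or Password is empty. Both HTTP and HTTPS are supported.
- Index: index name.
  The default is huatuo_bamai.
  Note: an index is the logical namespace for Elasticsearch/OpenSearch documents. It organizes the tracing and event data produced by huatuo-bamai.
- Username: user name.
  There is no default (the sample uses elastic).
  Note: used for Basic Auth.
- Password: authentication password.
  There is no default (the sample uses huatuo-bamai).
  Note: used together with the user name. In production, use a strong password and encrypt traffic with TLS.
Overall: ES/OS storage persists kernel tracing and event data for later search and analysis. If you do not need Linux kernel events or AutoTracing data, you can disable it.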
5.2 Local File Storage
# LocalFile Storage
#
# Store data to local directory for troubleshooting on the host machine.
#
# - Path
# The directory for storing data. If the Path is empty, LocalFile will be disabled.
# Default: "huatuo-local"
#
# - RotationSize
# The maximum size in Megabytes of a record file before it gets rotated
# for per linux kernel tracer.
# Default: 100MB
#
# - MaxRotation
# The maximum number of old log files to retain for per tracer.
# Default: 10
#
[Storage.LocalFile]
# Path = "huatuo-local"
# RotationSize = 100
# MaxRotation = 10
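- Path: local data directory.
  The default is huatuo-local. If the path is empty, local file storage is disabled.
  Note: keeps data on the host machine, mainly for on-site troubleshooting. An absolute path is recommended.
- RotationSize: file size at which rotation happens.
  The record file of each tracer is rotated once it reaches this size. The default is 100 MB.
  Note: the unit is MB. Rotation keeps a single file from growing without bound.
- MaxRotation: maximum number of rotated files to keep.
  Maximum number of old files retained per tracer. The default is 10.
  Note: once the number is exceeded, the oldest files are deleted automatically, which bounds disk usage.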
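The two rotation settings together bound the disk space a tracer can occupy. The following back-of-the-envelope sketch assumes one active file plus MaxRotation retained old files per tracer; the function name and the tracer count are hypothetical inputs, not values read from huatuo-bamai.

```python
def max_disk_usage_mb(rotation_size_mb: int = 100,
                      max_rotation: int = 10,
                      tracers: int = 1) -> int:
    """Worst-case disk usage: (1 active + max_rotation old files) per tracer."""
    return rotation_size_mb * (max_rotation + 1) * tracers


if __name__ == "__main__":
    # Defaults, one tracer, then five tracers writing at full rate.
    print(max_disk_usage_mb())
    print(max_disk_usage_mb(tracers=5))
```

Under these assumptions, the defaults cap each tracer at roughly 1.1 GB, which is a useful figure when sizing the partition that holds Path.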
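6. AutoTracing Configuration
AutoTracing is one of HUATUO's intelligent features. It triggers specific performance tracing automatically when thresholds are crossed, which reduces manual intervention.
6.1 CPUIdle AutoTracing: sudden high CPU usage in containers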
# Autotracing configuration
[AutoTracing]
# cpuidle
#
# For a high cpu usage all of a sudden in containers.
#
# - UserThreshold
# User CPU usage threshold, when cpu usage reaches this threshold, cpu
# performance tracing will be triggered.
# Default: 75%
#
# - SysThreshold
# System CPU usage threshold, when reaching this threshold, cpu performance
# tracing will be triggered.
# Default: 45%
#
# - UsageThreshold
# The total cpu usage (system + user cpu usage) threshold, when reaching
# this threshold, cpu performance tracing will be triggered.
# Default: 45%
#
# - DeltaUserThreshold
# The range of this user cpu changes within a short period of time.
# Default: 45%
#
# - DeltaSysThreshold
# The range of this system cpu changes within a short period of time.
# Default: 20%
#
# - DeltaUsageThreshold
# The range of this cpu usage changes within a short period of time.
# Default: 55%
#
# - Interval
# The sample interval of the cpu usage for all containers.
# Default: 10s
#
# - IntervalTracing
# Time since last run. Avoid frequently executing this tracing to prevent
# damage to the system.
# Default: 1800s
#
# - RunTracingToolTimeout
# The executing time of this tracing program.
# Default: 10s
#
# NOTE:
# Running this performance tool, when:
# 1. UserThreshold and DeltaUserThreshold are true, or
# 2. SysThreshold and DeltaSysThreshold are true, or
# 3. UsageThreshold and DeltaUsageThreshold are true.
#
[AutoTracing.CPUIdle]
# UserThreshold = 75
# SysThreshold = 45
# UsageThreshold = 90
# DeltaUserThreshold = 45
# DeltaSysThreshold = 20
# DeltaUsageThreshold = 55
# Interval = 10
# IntervalTracing = 1800
# RunTracingToolTimeout = 10
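- UserThreshold: user-mode CPU usage threshold (%).
  Default 75%. CPU performance tracing may be triggered when a container's user-mode CPU usage reaches this value.
- SysThreshold: system-mode CPU usage threshold (%).
  Default 45%. Tracing may be triggered when system-mode CPU usage reaches this value.
- UsageThreshold: total CPU usage threshold (user + system, %).
  Default 90%, following the commented sample value (UsageThreshold = 90); note that the descriptive comment in the configuration file states 45%. Tracing may be triggered when total CPU usage reaches this value.
- DeltaUserThreshold: threshold for the short-term change in user-mode CPU usage (%).
  Default 45%. Met when user-mode CPU usage changes by more than this value within a short period.
- DeltaSysThreshold: threshold for the short-term change in system-mode CPU usage (%).
  Default 20%. Met when system-mode CPU usage changes by more than this value within a short period.
- DeltaUsageThreshold: threshold for the short-term change in total CPU usage (%).
  Default 55%. Met when total CPU usage changes by more than this value within a short period.
- Interval: CPU usage sampling interval (seconds).
  Default 10s. The period at which CPU usage is sampled for all containers.
- IntervalTracing: minimum interval between runs (seconds).
  Default 1800s (30 minutes). The minimum time between two automatic tracing runs, which keeps frequent runs from putting pressure on the system.
- RunTracingToolTimeout: timeout for a single performance tracing run (seconds).
  Default 10s. Caps how long the tracing program runs so that it does not hold resources for long.
Trigger logic: tracing is triggered when any one of the following holds:
- UserThreshold and DeltaUserThreshold are both met; or
- SysThreshold and DeltaSysThreshold are both met; or
- UsageThreshold and DeltaUsageThreshold are both met.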
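The three-way trigger rule can be sketched as a small predicate. This is only an illustration of the rule as documented: the class and function names are hypothetical, the deltas are taken as the difference between two consecutive samples, and the use of `>=` for every comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CPUIdleThresholds:
    # Defaults taken from the commented sample values in [AutoTracing.CPUIdle].
    user: float = 75
    sys: float = 45
    usage: float = 90
    delta_user: float = 45
    delta_sys: float = 20
    delta_usage: float = 55


def should_trace(prev, cur, t=CPUIdleThresholds()):
    """prev and cur are (user_pct, sys_pct) samples taken one Interval apart."""
    user, sys_ = cur
    d_user, d_sys = user - prev[0], sys_ - prev[1]
    usage, d_usage = user + sys_, d_user + d_sys
    # Each level threshold must hold together with its matching delta threshold.
    return ((user >= t.user and d_user >= t.delta_user)
            or (sys_ >= t.sys and d_sys >= t.delta_sys)
            or (usage >= t.usage and d_usage >= t.delta_usage))


if __name__ == "__main__":
    print(should_trace((20, 5), (80, 6)))   # sudden jump in user CPU
    print(should_trace((70, 5), (80, 6)))   # high, but no sudden change
```

Pairing each level threshold with a delta threshold means a container that is steadily busy does not trigger tracing; only a sudden rise does.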
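Container filtering (Filter): the Included/Excluded rule arrays control which containers are monitored.
# Each rule consists of Field (the field to filter on) and Pattern (a regular expression)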
# Field: container_host_namespace | container_hostname | container_qos
#
# [[AutoTracing.CPUIdle.Filter.Excluded]]
# Field = "container_qos"
# Pattern = "besteffort"
# [[AutoTracing.CPUIdle.Filter.Included]]
# Field = "container_host_namespace"
# Pattern = "^application-"
6.2 CPUSys AutoTracing: sudden high system CPU usage on the host
# cpusys
#
# For a high system cpu usage all of a sudden on host machine.
#
# - SysThreshold
# System CPU usage threshold, when reaching this threshold, cpu performance
# tracing will be triggered.
# Default: 45%
#
# - DeltaSysThreshold
# The range of system cpu changes within a short period of time.
# Default: 20%
#
# - Interval
# The sample interval of the cpu usage for host machine.
# Default: 10s
#
# - RunTracingToolTimeout
# The executing time of this tracing program.
# Default: 10s
#
# NOTE:
# Running this performance tool, when:
# SysThreshold and DeltaSysThreshold are true.
#
[AutoTracing.CPUSys]
# SysThreshold = 45
# DeltaSysThreshold = 20
# Interval = 10
# RunTracingToolTimeout = 10
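- SysThreshold: system-mode CPU usage threshold (%).
  Default 45%.
- DeltaSysThreshold: threshold for the short-term change in system-mode CPU usage (%).
  Default 20%.
- Interval: sampling interval for host CPU usage (seconds).
  Default 10s.
- RunTracingToolTimeout: timeout for a single tracing run (seconds).
  Default 10s.
Trigger logic: tracing is triggered when SysThreshold and DeltaSysThreshold are both met.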
6.3 Dload AutoTracing: D-state task profiling for containers
# dload
#
# linux tasks D state profiling for containers.
#
# - ThresholdLoad
# The loadavg threshold value, when reaching this threshold, dload profiling
# is triggered.
# Default: 5
#
# - Interval
# The sample interval of the load for all containers.
# Default: 10s
#
# - IntervalTracing
# Time since last run. Avoid frequently executing this tracing to prevent
# damage to the system.
# Default: 1800s
#
[AutoTracing.Dload]
# ThresholdLoad = 5
# Interval = 10
# IntervalTracing = 1800
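- ThresholdLoad: load average (loadavg) threshold for a container.
  Default 5. When loadavg reaches this value, profiling of tasks in D state (uninterruptible sleep) is triggered.
  Note: helps diagnose situations where many processes in a container enter D state.
- Interval: monitoring interval (seconds).
  Default 10s. The period of Dload monitoring.
- IntervalTracing: minimum interval between runs (seconds).
  Default 1800s (30 minutes). The minimum time between two automatic tracing runs, which keeps frequent runs from putting pressure on the system.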
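IntervalTracing behaves like a cooldown between runs. A minimal sketch of such a gate is shown below, assuming a monotonic clock in seconds; the class name is hypothetical and the code only illustrates the idea rather than huatuo-bamai's implementation.

```python
class TracingCooldown:
    """Allow a tracing run only if IntervalTracing seconds passed since the last one."""

    def __init__(self, interval_tracing_s: int = 1800):
        self.interval = interval_tracing_s
        self.last_run = None  # no run has happened yet

    def allow(self, now_s: float) -> bool:
        if self.last_run is not None and now_s - self.last_run < self.interval:
            return False  # still cooling down, skip this trigger
        self.last_run = now_s
        return True


if __name__ == "__main__":
    gate = TracingCooldown()
    for t in (0, 100, 1800, 1801):
        print(t, gate.allow(t))
```

A trigger that arrives during the cooldown is simply dropped in this sketch; whether the real collector drops or defers it is not stated in the configuration comments.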
6.4 IOTracing AutoTracing: I/O profiling for containers
# iotracing
#
# io profiling for containers.
#
# - WbpsThreshold
# Max write bytes per second, when reaching this threshold, iotracing is triggered.
# Please note that if it is an NVMe device, it must also meet the UtilThreshold.
# Default: 1500 MB/s
#
# - RbpsThreshold
# Max read bytes per second, when reaching this threshold, iotracing is triggered.
# Please note that if it is an NVMe device, it must also meet the UtilThreshold.
# Default: 2000 MB/s
#
# - UtilThreshold
# Disk utilization, Percentage of time the disk is busy. If this is consistently
# above 80-90%, the disk may be a bottleneck.
# Default: 90%
#
# - AwaitThreshold
# Await (Average IO wait time in ms): High values indicate slow disk response times.
# Default: 100ms
#
# - RunTracingToolTimeout
# The executing time of this tracing tool.
# Default: 10s
#
# - MaxProcDump
# The number of processes displayed by iotracing tool.
# Default: 10
#
# - MaxFilesPerProcDump
# The number of files per process displayed by iotracing tool.
# Default: 5
#
[AutoTracing.IOTracing]
# WbpsThreshold = 1500
# RbpsThreshold = 2000
# UtilThreshold = 90
# AwaitThreshold = 100
# RunTracingToolTimeout = 10
# MaxProcDump = 10
# MaxFilesPerProcDump = 5
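- WbpsThreshold: write throughput threshold (MB/s).
  Default 1500 MB/s. Reaching this value may trigger I/O tracing (for NVMe devices, UtilThreshold must also be met).
- RbpsThreshold: read throughput threshold (MB/s).
  Default 2000 MB/s. As with writes, reaching this value may trigger I/O tracing (for NVMe devices, UtilThreshold must also be met).
- UtilThreshold: disk utilization threshold (%).
  Default 90%. The percentage of time the disk is busy. If it stays above 80-90%, the disk may be a bottleneck.
- AwaitThreshold: average I/O wait time threshold (ms).
  Default 100ms. High values indicate slow disk response.
- RunTracingToolTimeout: timeout for the I/O tracing tool (seconds).
  Default 10s.
- MaxProcDump: maximum number of processes shown by I/O tracing.
  Default 10. Controls how many processes appear in the output.
- MaxFilesPerProcDump: maximum number of files shown per process.
  Default 5. Controls how many files are listed for each process.
Note: IOTracing diagnoses I/O hotspots in containers, with a focus on heavily loaded disks.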
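6.5 Memory Burst AutoTracing
This module detects sudden growth in host memory usage and, when triggered, captures the kernel context automatically to help diagnose memory pressure events.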
# memory burst
#
# If there is a memory used burst on the host, capture this kernel context.
#
# - Interval
# The sample interval of the memory used.
# Default: 10s
#
# - DeltaMemoryBurst
# A certain percentage of memory burst used. 100% that means, e.g.,
# memory used increased from 200MB to 400MB.
# Default: 100%
#
# - DeltaAnonThreshold
# A certain percentage of anon memory burst used. 100% that means, e.g.,
# anon memory used increased from 200MB to 400MB.
# Default: 70%
#
# - IntervalTracing
# Time since last run. Avoid frequently executing this tracing
# to prevent damage to the system.
# Default: 1800s
#
# - DumpProcessMaxNum
# How many processes to dump when this event is triggered.
# Default: 10
#
[AutoTracing.MemoryBurst]
# DeltaMemoryBurst = 100
# DeltaAnonThreshold = 70
# Interval = 10
# IntervalTracing = 1800
# SlidingWindowLength = 60
# DumpProcessMaxNum = 10
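- DeltaMemoryBurst: threshold for the percentage of sudden growth in memory usage.
  Default 100%. The ratio by which memory usage grows within the sampling window (for example, growing from 200MB to 400MB is 100%). Reaching this threshold may trigger memory burst tracing.
  Note: captures sharp rises in overall memory usage.
- DeltaAnonThreshold: threshold for the percentage of sudden growth in anonymous memory.
  Default 70%. Growth ratio threshold for anonymous memory. Anonymous pages are an important indicator when diagnosing memory pressure.
  Note: focuses on anonymous memory bursts, which tend to lead to OOM or swapping.
- Interval: sampling interval for memory usage (seconds).
  Default 10s. The interval at which host memory usage is sampled.
  Note: the sampling frequency trades detection sensitivity against overhead.
- IntervalTracing: minimum interval between runs (seconds).
  Default 1800s (30 minutes). The cooldown between two memory burst tracing runs, which keeps frequent runs from adding pressure to the system.
  Note: prevents the tracing tool from being triggered excessively.
- DumpProcessMaxNum: maximum number of processes dumped when the event fires.
  Default 10. When a memory burst event is triggered, details of at most this many related processes are dumped (including memory usage, call stacks, and so on).
  Note: limits output volume so that a single event does not produce too much diagnostic data.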
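The percentage used by DeltaMemoryBurst and DeltaAnonThreshold can be worked through with the example from the configuration comment (200MB growing to 400MB is 100%). The helper below is a hypothetical illustration of that arithmetic, not the collector's own code.

```python
def growth_percent(before_mb: float, after_mb: float) -> float:
    """Relative growth between two samples, expressed in percent."""
    if before_mb <= 0:
        raise ValueError("the earlier sample must be positive")
    return (after_mb - before_mb) * 100 / before_mb


if __name__ == "__main__":
    print(growth_percent(200, 400))  # the example from the config comment
    print(growth_percent(200, 340))  # reaches the 70% DeltaAnonThreshold default
```

Because the measure is relative, the same absolute increase counts for less on a host that already uses a lot of memory.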
6.6 Known Issue Filtering (IssuesList)
# IssuesList for known issue filtering in autotracing
IssuesList = []
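- IssuesList: known-issue filter. The format is [["issue name", "regex"], ...]. When a collected stack matches a regex, it is tagged with the corresponding issue name. The default is [].
Example: IssuesList = [["known_issue1", "softlockup"], ["known_issue2", "alloc_pages.*failed"]]
Note: known-issue filtering currently applies only to dload tracing. Other events are not supported yet.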
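To make the [["issue name", "regex"], ...] format concrete, here is a small sketch of how a stack could be tagged. It uses Python's `re` with unanchored search and returns the first matching name; both choices, the function name, and the sample stack strings are assumptions, and the actual matcher in huatuo-bamai may behave differently.

```python
import re


def tag_known_issue(stack: str, issues_list):
    """Return the name of the first issue whose regex matches the stack, else None."""
    for name, pattern in issues_list:
        if re.search(pattern, stack):
            return name
    return None


if __name__ == "__main__":
    issues = [["known_issue1", "softlockup"],
              ["known_issue2", "alloc_pages.*failed"]]
    # Hypothetical stack text, used only to exercise the patterns.
    print(tag_known_issue("alloc_pages order 3 failed in reclaim path", issues))
    print(tag_known_issue("io_schedule <- wait_on_page_bit", issues))
```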
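7. Event Tracing Configuration
This section covers the capture of key kernel events and latency monitoring, including softirq, memory reclaim, network receive latency, net device events, and packet drop monitoring. It is the core module for HUATUO's kernel-level anomaly context collection.
7.1 Softirq Disabled Tracing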
# linux kernel events capturing configuration
[EventTracing]
# softirq
#
# tracing the softirq disabled events of linux kernel.
#
# - DisabledThreshold
# When the disable duration of softirq exceeds the threshold, huatuo-bamai
# will collect kernel context.
# Default: 10000000 in nanoseconds, 10ms
#
[EventTracing.Softirq]
# DisabledThreshold = 10000000
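- DisabledThreshold: threshold for how long softirq stays disabled (nanoseconds).
  Default 10000000 ns (10ms). When kernel softirq stays disabled for longer than this threshold, huatuo-bamai collects the kernel context automatically.
  Note: long softirq-disabled periods can delay networking, timers, and similar work. This option helps diagnose interrupt storms and high-load situations.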
7.2 Memory Reclaim Blocking Tracing
# memreclaim
#
# The memory reclaim may block the process, if one process is blocked
# for a long time, reporting the events to userspace.
#
# - BlockedThreshold
# The blocked time when memory reclaiming.
# Default: 900000000ns, 900ms
#
[EventTracing.MemoryReclaim]
# BlockedThreshold = 900000000
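- BlockedThreshold: threshold for time blocked in memory reclaim (nanoseconds).
  Default 900000000 ns (900ms). When a single process is blocked by memory reclaim for longer than this time, an event is reported to userspace and the context is captured.
  Note: blocking in memory reclaim is a common cause of process stalls, especially in memory-constrained cloud-native environments.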
7.3 Network Receive Latency Tracing
# networking rx latency
#
# linux net stack rx latency for every tcp skbs.
#
# - Driver2NetRx
# The latency from driver to net rx, e.g., netif_receive_skb.
# Default: 5ms
#
# - Driver2TCP
# The latency from driver to tcp rx, e.g., tcp_v4_rcv.
# Default: 10ms
#
# - Driver2Userspace
# The latency from driver to userspace copy data, e.g., skb_copy_datagram_iovec.
# Default: 115ms
#
# - ExcludedContainerQos
# Blacklist: skip containers whose qos level matches.
# Values: "guaranteed", "burstable", "besteffort" (case-insensitive).
# Default: [].
#
# - ExcludedHostNetnamespace
# Don't care the skbs, packets in the host net namespace.
# Default: true
#
[EventTracing.NetRxLatency]
# Driver2NetRx = 5
# Driver2TCP = 10
# Driver2Userspace = 115
# ExcludedContainerQos = []
ExcludedContainerQos = ["besteffort"]
# ExcludedHostNetnamespace = true
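- Driver2NetRx: latency threshold from the driver to network-layer receive (milliseconds).
  Default 5ms. For example, the latency up to functions such as netif_receive_skb.
- Driver2TCP: latency threshold from the driver to TCP receive (milliseconds).
  Default 10ms. For example, the latency up to functions such as tcp_v4_rcv.
- Driver2Userspace: latency threshold from the driver to the copy of data into userspace (milliseconds).
  Default 115ms. For example, the latency up to functions such as skb_copy_datagram_iovec.
- ExcludedContainerQos: excluded container QoS levels (blacklist).
  Default [] (the sample configuration sets ["besteffort"]). Network receive latency is not monitored for containers at the listed QoS levels, which correspond to the Kubernetes Pod QoS classes Guaranteed, Burstable, and BestEffort (case-insensitive).
  Note: BestEffort containers are commonly excluded to reduce noise.
- ExcludedHostNetnamespace: whether to exclude the host network namespace.
  Default true. Latency of skbs and packets in the host net namespace is not monitored.
  Note: keeps the focus on container traffic and filters out unrelated host data.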
7.4 Net Device Event Monitoring
# netdev events
#
# monitor the net device events.
#
# - DeviceList
# The net devices we take care of.
# Default: [] is empty, meaning no devices.
#
[EventTracing.Netdev]
DeviceList = ["eth0", "eth1", "bond4", "lo"]
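- DeviceList: list of net devices to monitor.
  The sample configuration lists "eth0", "eth1", "bond4", and "lo". The default is an empty list, which means no devices are monitored. Monitored events include the physical link state of the net devices.
  Note: name exactly the interfaces you care about. Devices such as bond and lo are supported.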
7.5 Packet Drop Monitoring (EventTracing.Dropwatch)
# dropwatch
#
# monitor packets dropped events in the Linux kernel.
#
# - ExcludedNeighInvalidate
# Don't care of neigh_invalidate drop events.
# Default: true
#
[EventTracing.Dropwatch]
# ExcludedNeighInvalidate = true
7.6 Hardware Error Event Tracing (EventTracing.Ras)
# ras
#
# Hardware error event tracing (RAS: Reliability, Availability, Serviceability).
# Captures MCE, EDAC, ACPI/GHES, PCIe AER, and MCE threshold (THR) events via eBPF.
#
# - MceThrBackoff
# Minimum interval in seconds between consecutive MCE threshold (THR) event saves.
# THR events are fired by the local-APIC threshold interrupt and can storm at high
# frequency; this cooldown prevents flooding storage with redundant records.
# Default: 1800s (30 minutes)
#
[EventTracing.Ras]
# MceThrBackoff = 1800
7.7 Known Issue Filtering (IssuesList)
# IssuesList for known issue filtering in event tracing
IssuesList = []
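- IssuesList: known-issue filter. The format and usage are the same as for the AutoTracing IssuesList. Each regex is matched against the event context, and a matching event is tagged with the corresponding issue name. The default is [].
Example: IssuesList = [["known_issue1", "comm=ignored_process"]]
Note: filtering currently applies only to net_rx_latency events. Other events are not supported yet.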
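8. Metric Collector Configuration
This section defines the collection rules for system and network metrics. All Included/Excluded fields share the same underlying filter logic (regular expressions):
- No rules: collect everything
- Excluded only: blacklist; anything that matches is skipped
- Included only: whitelist; only matching items are collected
- Both present: an item must match Included and must not match Excluded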
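The four cases above reduce to one short predicate. The sketch below uses Python's `re.search` (unanchored matching) as a stand-in for whatever regex engine the collector uses; the function name `accept` is hypothetical and the matching semantics are an assumption.

```python
import re


def accept(name: str, included: str = "", excluded: str = "") -> bool:
    """Apply the Included/Excluded rules; an empty pattern means 'no rule'."""
    if included and not re.search(included, name):
        return False  # a whitelist exists and the name is not on it
    if excluded and re.search(excluded, name):
        return False  # the name is blacklisted
    return True


if __name__ == "__main__":
    # DeviceExcluded from the sample config (TOML "\\w" is the regex \w).
    device_excluded = r"^(lo)|(docker\w*)|(veth\w*)$"
    for dev in ("eth0", "lo", "docker0", "veth1a2b"):
        print(dev, accept(dev, excluded=device_excluded))
```

One point worth checking under unanchored matching: in `^(lo)|(docker\w*)|(veth\w*)$` the `^` applies only to the first alternative and the `$` only to the last, so a name such as local0 would also be excluded. Wrapping the alternatives in one group, as in `^(lo|docker\w*|veth\w*)$`, would anchor all of them.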
8.1 Net Device Statistics
# Metric Collector
[MetricCollector]
# Netdev statistic
#
# - EnableNetlink
# Use netlink instead of procfs net/dev to get netdev statistic.
# Only support the host environment to use `netlink` now.
# Default is "false".
#
# - DeviceIncluded
# Accept special devices in netdev statistic.
# Default: "" (empty), meaning include all.
#
# - DeviceExcluded
# Exclude special devices in netdev statistic.
# Default: "" (empty), meaning exclude nothing.
#
# Filter logic see MetricCollector section header.
#
[MetricCollector.NetdevStats]
# EnableNetlink = false
# DeviceIncluded = ""
DeviceExcluded = "^(lo)|(docker\\w*)|(veth\\w*)$"
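- EnableNetlink: whether to use netlink instead of procfs to read net device statistics.
  Default false. netlink is supported only in the host environment.
  Note: netlink is usually more efficient but requires kernel support.
- DeviceIncluded: regex of net devices to include in the statistics. Default empty (collect all).
- DeviceExcluded: regex of net devices to exclude. The sample excludes virtual interfaces such as lo, docker, and veth.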
8.2 Net Device DCB (Data Center Bridging) Collection
# netdev dcb, DCB (Data Center Bridging)
#
# Collecting the DCB PFC (Priority-based Flow Control).
#
# - DeviceList
# The net devices we take care of.
# Default: [] is empty, meaning no devices.
#
[MetricCollector.NetdevDCB]
DeviceList = ["eth0", "eth1"]
8.3 Net Device Hardware Statistics
# netdev hardware statistic
#
# Collecting the hardware statistic of net devices, e.g, rx_dropped.
#
# - DeviceList
# The net devices we take care of.
# Default: [] is empty, meaning no devices.
#
[MetricCollector.NetdevHW]
DeviceList = ["eth0", "eth1"]
8.4 Qdisc (Queueing Discipline) Collection
# Qdisc
#
# - DeviceIncluded / DeviceExcluded
# Same as above.
#
[MetricCollector.Qdisc]
# DeviceIncluded = ""
DeviceExcluded = "^(lo)|(docker\\w*)|(veth\\w*)$"
8.5 vmstat Metric Collection
# vmstat
#
# This metric supports host vmstat and cgroup vmstat.
# - IncludedOnHost / ExcludedOnHost: same filter logic, for host /proc/vmstat.
# - IncludedOnContainer / ExcludedOnContainer: same, for cgroup containers memory.stat.
#
[MetricCollector.Vmstat]
IncludedOnHost = "allocstall|nr_active_anon|nr_active_file|nr_boost_pages|nr_dirty|nr_free_pages|nr_inactive_anon|nr_inactive_file|nr_kswapd_boost|nr_mlock|nr_shmem|nr_slab_reclaimable|nr_slab_unreclaimable|nr_unevictable|nr_writeback|numa_pages_migrated|pgdeactivate|pgrefill|pgscan_direct|pgscan_kswapd|pgsteal_direct|pgsteal_kswapd"
ExcludedOnHost = "total"
IncludedOnContainer = "active_anon|active_file|dirty|inactive_anon|inactive_file|pgdeactivate|pgrefill|pgscan_direct|pgscan_kswapd|pgsteal_direct|pgsteal_kswapd|shmem|unevictable|writeback|pgscan_globaldirect|pgscan_globalkswapd|pgscan_cswapd|pgsteal_cswapd|pgsteal_globaldirect|pgsteal_globalkswapd"
ExcludedOnContainer = "total"
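- IncludedOnHost / ExcludedOnHost: regexes that filter the fields of the host /proc/vmstat.
- IncludedOnContainer / ExcludedOnContainer: regexes that filter the fields of the container cgroup memory.stat.
Note: these options give fine-grained control over vmstat collection. Host and containers can be configured differently, which avoids collecting irrelevant fields.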
8.6 Other Metric Collection
# MemoryEvents/Netstat/MountPointStat
#
# - Included / Excluded: same as above.
# - MountPointsIncluded: whitelist only (no Excluded), same logic.
#
[MetricCollector.MemoryEvents]
Included = "watermark_inc|watermark_dec"
# Excluded = ""
[MetricCollector.Netstat]
# Excluded = ""
# Included = ""
# MountPointStat
[MetricCollector.MountPointStat]
MountPointsIncluded = "(^/home$)|(^/$)|(^/boot$)"
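9. Pod Configuration
This section configures how Pod information is fetched from kubelet, so that containers can be associated with Pod-level labels and metrics can be separated per Pod.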
# Pod Configuration
#
# Configure these parameters for fetching pods from kubelet.
#
# - KubeletReadOnlyPort
# The KubeletReadOnlyPort is kubelet read-only port for the Kubelet to serve on with
# no authentication/authorization. The port number must be between 1 and 65535, inclusive.
# Setting this field to 0 disables fetching pods from kubelet read-only service.
# Default: 10255
#
# - KubeletAuthorizedPort
# The port is the HTTPs port of the kubelet. The port number must be between 1 and 65535,
# inclusive. Setting this field to 0 disables fetching pods from kubelet HTTPS port.
# Default: 10250
#
# - KubeletClientCertPath
# https://kubernetes.io/docs/setup/best-practices/certificates/
#
# Client certificate and private key file name. One file or two files:
# "/path/to/xxx-kubelet-client.crt,/path/to/xxx-kubelet-client.key",
# "/path/to/kubelet-client-current.pem"
#
# You can disable this kubelet fetching pods, for bare metal service, by
# KubeletReadOnlyPort = 0, and KubeletAuthorizedPort = 0.
#
[Pod]
KubeletClientCertPath = "/etc/kubernetes/pki/apiserver-kubelet-client.crt,/etc/kubernetes/pki/apiserver-kubelet-client.key"
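- KubeletReadOnlyPort: kubelet read-only port.
  Default 10255. Used to fetch the Pod list from kubelet without authentication. Setting it to 0 disables this method.
  Note: the port range is 1-65535. Suitable for testing or environments without security requirements.
- KubeletAuthorizedPort: kubelet HTTPS port.
  Default 10250. Used to fetch Pod information from kubelet securely (certificate authentication). Setting it to 0 disables this method.
  Note: in production, this port with certificate authentication is recommended.
- KubeletClientCertPath: path of the kubelet client certificate and private key.
  Supported formats: "/path/to/xxx-kubelet-client.crt,/path/to/xxx-kubelet-client.key", or a single PEM file.
  Note: see the Kubernetes certificate best practices. The certificate is used for mTLS authentication on the HTTPS port. On bare metal or outside Kubernetes, set both ports to 0 to disable Pod fetching.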
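The "one file or two files" convention of KubeletClientCertPath can be illustrated with a small parser. This is a sketch under the assumption that a single path means certificate and key live in the same PEM file; the function name is hypothetical and the code is not taken from huatuo-bamai.

```python
def parse_client_cert_path(value: str):
    """Split KubeletClientCertPath into (cert_file, key_file)."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    if len(parts) == 1:
        # A single combined PEM file serves as both certificate and key.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("expected one PEM file or 'cert,key'")


if __name__ == "__main__":
    print(parse_client_cert_path(
        "/etc/kubernetes/pki/apiserver-kubelet-client.crt,"
        "/etc/kubernetes/pki/apiserver-kubelet-client.key"))
    print(parse_client_cert_path("/path/to/kubelet-client-current.pem"))
```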
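10. Events Watch Configuration
This section controls the runtime behavior of the POST /v1/events/watch SSE streaming API, through which external clients can subscribe to the kernel event stream in real time.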
# Events Watch Configuration
#
# Controls the behavior of the POST /v1/events/watch SSE streaming API,
# which allows external clients to subscribe to kernel events in real-time.
#
# - MaxClients
# Maximum number of concurrent clients allowed to hold an open /v1/events/watch
# connection. Once the limit is reached, new requests are rejected with HTTP 429
# (Too Many Requests) until an existing client disconnects.
# Default: 100
#
# - KeepAliveInterval
# Interval in seconds at which the server sends an SSE comment ping to each
# connected client. The ping keeps the HTTP connection alive through load
# balancers and proxies that would otherwise time out idle connections.
# If writing the ping fails three consecutive times the server treats the
# client as gone and closes the connection.
# Default: 30s
#
[EventsWatch]
# MaxClients = 100
# KeepAliveInterval = 30
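- MaxClients: maximum number of concurrent client connections.
  Default 100. The upper bound on clients that may hold an open /v1/events/watch connection at the same time. Once the limit is reached, new requests are rejected with HTTP 429 (Too Many Requests) until an existing client disconnects.
  Note: size this to the node's resources and the actual number of subscribers. Each long-lived connection holds one goroutine and one subscription channel (buffered, 256 entries), so watch memory pressure when there are many connections.
- KeepAliveInterval: keep-alive ping interval (seconds).
  Default 30s. At this interval the server sends an SSE comment line (": ping") to each connected client. The ping keeps the HTTP connection alive through load balancers and proxies that would otherwise time out idle connections.
  Note: if writing the ping (or event data) fails 3 consecutive times, the server treats the client as gone, closes the connection, and releases its resources. Keep this value below the idle timeout of upstream proxies. Common production values are 15-60s.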
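On the client side, keep-alive pings arrive as SSE comment lines and must be skipped rather than treated as events. The sketch below parses a sequence of already-received lines according to the general SSE rules (comment lines start with ":", `data:` lines carry the payload, a blank line ends an event). The function name and the sample payloads are hypothetical; the actual event payload format of /v1/events/watch is not described here.

```python
def parse_sse(lines):
    """Collect the data payload of each SSE event, ignoring comment pings."""
    events, data = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # SSE comment, e.g. the ": ping" keep-alive
        if line == "":
            if data:  # a blank line dispatches the buffered event
                events.append("\n".join(data))
                data = []
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # Per the SSE format, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
    return events


if __name__ == "__main__":
    stream = [": ping\n", 'data: {"a":1}\n', "\n", ": ping\n",
              "data: x\n", "data: y\n", "\n"]
    print(parse_sse(stream))
```

A real client would feed this from the response body of the POST request and also handle reconnects after an HTTP 429.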
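11. Best Practices and Caveats
- Resource control: in production, tune the CPU and memory limits in RuntimeCgroup first so that huatuo-bamai does not affect workload containers.
- Storage choice: small deployments can start with LocalFile for local troubleshooting. Large clusters should configure Elasticsearch for centralized storage and querying.
- AutoTracing tuning: adjust thresholds to the characteristics of your workload. Thresholds that are too low cause frequent triggering, and thresholds that are too high can miss problems. Validate changes step by step in a test environment.
- Security: use a strong password in the ES configuration and consider enabling HTTPS. Avoid hard-coding sensitive information in the configuration file.
- Compatibility: configuration parameters are affected by kernel version and hardware. Verify them against the official HUATUO documentation.
With a well-tuned huatuo-bamai.conf, HUATUO's kernel-level anomaly detection and automatic tracing can improve the observability and fault diagnosis efficiency of cloud-native systems.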
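5.1 - Kernel Panoramic Observability
Metrics supported in the current version:
CPU subsystem
Scheduling latency
The following metrics show process scheduling latency, that is, the time from the moment a process becomes runnable (is placed on the run queue) until it actually starts executing on a CPU.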
# HELP huatuo_bamai_runqlat_container_latency cpu run queue latency for the containers
# TYPE huatuo_bamai_runqlat_container_latency gauge
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="0"} 226
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="1"} 0
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="2"} 0
huatuo_bamai_runqlat_container_latency{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev",zone="3"} 0
# HELP huatuo_bamai_runqlat_latency cpu run queue latency for the host
# TYPE huatuo_bamai_runqlat_latency gauge
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="0"} 35100
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="1"} 0
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="2"} 0
huatuo_bamai_runqlat_latency{host="hostname",region="dev",zone="3"} 0
| Metric | Meaning | Unit | Target | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| runqlat_container_latency | Scheduling latency count per bucket: zone0 0-10ms, zone1 10-20ms, zone2 20-50ms, zone3 50+ms | count | container | eBPF | container_host, container_hostnamespace, container_level, container_name, container_type, host, region, zone |
| runqlat_latency | Scheduling latency count per bucket: zone0 0-10ms, zone1 10-20ms, zone2 20-50ms, zone3 50+ms | count | physical host | eBPF | host, region, zone |
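The zone label is a latency bucket. The sketch below maps a latency in milliseconds to a zone index using the bucket edges from the table. Treating each bucket as half-open (a value of exactly 10ms falls into zone1) is an assumption, since the table does not say which side the boundaries belong to; the names are hypothetical.

```python
import bisect

# Upper edges of zone0, zone1, zone2 in milliseconds; zone3 is everything above.
RUNQLAT_EDGES_MS = [10, 20, 50]


def runqlat_zone(latency_ms: float) -> int:
    """Return the zone index (0-3) for a run queue latency in milliseconds."""
    return bisect.bisect_right(RUNQLAT_EDGES_MS, latency_ms)


if __name__ == "__main__":
    for ms in (3, 10, 25, 50, 120):
        print(ms, "ms -> zone", runqlat_zone(ms))
```

Reading the sample output with this mapping, a nonzero count in zone2 or zone3 is the signal to look at, because it means some tasks waited 20ms or longer for a CPU.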
Interrupt latency
Response latency of softirqs on each CPU (currently only NET_RX and NET_TX are collected).
# HELP huatuo_bamai_softirq_latency softirq latency
# TYPE huatuo_bamai_softirq_latency gauge
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="0"} 125
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="1"} 2
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="2"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_RX",zone="3"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="0"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="1"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="2"} 0
huatuo_bamai_softirq_latency{cpuid="0",host="hostname",region="dev",type="NET_TX",zone="3"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="0"} 110
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="1"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="2"} 1
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_RX",zone="3"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_TX",zone="0"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_TX",zone="1"} 0
huatuo_bamai_softirq_latency{cpuid="1",host="hostname",region="dev",type="NET_TX",zone="2"} 0
| Metric | Meaning | Unit | Target | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| softirq_latency | Softirq response latency count per bucket: zone0 0-10us, zone1 10-100us, zone2 100-1000us, zone3 1+ms | count | physical host | eBPF | cpuid, host, region, type, zone |
Resource utilization
The following metrics show the CPU usage of the physical host and of containers, in Prometheus format:
# HELP huatuo_bamai_cpu_util_sys cpu sys for the host
# TYPE huatuo_bamai_cpu_util_sys gauge
huatuo_bamai_cpu_util_sys{host="hostname",region="dev"} 6.268857848549965e-06
# HELP huatuo_bamai_cpu_util_total cpu total for the host
# TYPE huatuo_bamai_cpu_util_total gauge
huatuo_bamai_cpu_util_total{host="hostname",region="dev"} 1.7736934944144352e-05
# HELP huatuo_bamai_cpu_util_usr cpu usr for the host
# TYPE huatuo_bamai_cpu_util_usr gauge
huatuo_bamai_cpu_util_usr{host="hostname",region="dev"} 1.1468077095594387e-05
# HELP huatuo_bamai_cpu_util_container_sys cpu sys for the containers
# TYPE huatuo_bamai_cpu_util_container_sys gauge
huatuo_bamai_cpu_util_container_sys{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1.6708593420881415e-07
# HELP huatuo_bamai_cpu_util_container_total cpu total for the containers
# TYPE huatuo_bamai_cpu_util_container_total gauge
huatuo_bamai_cpu_util_container_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3.379584661890774e-07
# HELP huatuo_bamai_cpu_util_container_usr cpu usr for the containers
# TYPE huatuo_bamai_cpu_util_container_usr gauge
huatuo_bamai_cpu_util_container_usr{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1.7087253017325962e-07
| Metric | Meaning | Unit | Target | Labels |
| --- | --- | --- | --- | --- |
| cpu_util_sys | Kernel-mode CPU utilization | % | physical host | host, region |
| cpu_util_usr | User-mode CPU utilization | % | physical host | host, region |
| cpu_util_total | Total CPU utilization | % | physical host | host, region |
| cpu_util_container_sys | Kernel-mode CPU utilization | % | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_util_container_usr | User-mode CPU utilization | % | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_util_container_total | Total CPU utilization | % | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
Resource configuration
The following metric shows the CPU resources configured for a container, in Prometheus format:
# HELP huatuo_bamai_cpu_util_container_cores cpu core number for the containers
# TYPE huatuo_bamai_cpu_util_container_cores gauge
huatuo_bamai_cpu_util_container_cores{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="Burstable",container_name="coredns",container_type="Normal",host="hostname",region="dev"} 6
| Metric | Meaning | Unit | Target | Labels |
| --- | --- | --- | --- | --- |
| cpu_util_container_cores | Number of CPU cores | cores | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
Resource contention
These metrics reflect contention and throttling of containers, in Prometheus format:
# HELP huatuo_bamai_cpu_stat_container_nr_throttled throttle nr for the containers
# TYPE huatuo_bamai_cpu_stat_container_nr_throttled gauge
huatuo_bamai_cpu_stat_container_nr_throttled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_throttled_time throttle time for the containers
# TYPE huatuo_bamai_cpu_stat_container_throttled_time gauge
huatuo_bamai_cpu_stat_container_throttled_time{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
| Metric | Meaning | Unit | Target | Labels |
| --- | --- | --- | --- | --- |
| cpu_stat_container_nr_throttled | Number of times the cgroup has been throttled | count | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_stat_container_throttled_time | Total time the cgroup has been throttled | nanoseconds | container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
In addition, the DiDi kernel supports the following contention metrics, which will be opened up in the future:
# HELP huatuo_bamai_cpu_stat_container_wait_rate wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_wait_rate gauge
huatuo_bamai_cpu_stat_container_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_throttle_wait_rate throttle wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_throttle_wait_rate gauge
huatuo_bamai_cpu_stat_container_throttle_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_inner_wait_rate inner wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_inner_wait_rate gauge
huatuo_bamai_cpu_stat_container_inner_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_exter_wait_rate exter wait rate for the containers
# TYPE huatuo_bamai_cpu_stat_container_exter_wait_rate gauge
huatuo_bamai_cpu_stat_container_exter_wait_rate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
Resource bursts
The following metrics show when a container uses CPU in bursts:
# HELP huatuo_bamai_cpu_stat_container_nr_bursts burst nr for the containers
# TYPE huatuo_bamai_cpu_stat_container_nr_bursts gauge
huatuo_bamai_cpu_stat_container_nr_bursts{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_cpu_stat_container_burst_time burst time for the containers
# TYPE huatuo_bamai_cpu_stat_container_burst_time gauge
huatuo_bamai_cpu_stat_container_burst_time{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| cpu_stat_container_burst_time | Cumulative wall-clock time spent above the quota, summed over all periods | Nanoseconds | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| cpu_stat_container_nr_bursts | Number of periods in which usage exceeded the quota | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
Resource load
These metrics show the load on the host and on containers.
# HELP huatuo_bamai_loadavg_load1 system load average, 1 minute
# TYPE huatuo_bamai_loadavg_load1 gauge
huatuo_bamai_loadavg_load1{host="hostname",region="dev"} 0.3
# HELP huatuo_bamai_loadavg_load15 system load average, 15 minutes
# TYPE huatuo_bamai_loadavg_load15 gauge
huatuo_bamai_loadavg_load15{host="hostname",region="dev"} 0.22
# HELP huatuo_bamai_loadavg_load5 system load average, 5 minutes
# TYPE huatuo_bamai_loadavg_load5 gauge
huatuo_bamai_loadavg_load5{host="hostname",region="dev"} 0.2
# HELP huatuo_bamai_loadavg_container_nr_running nr_running of container
# TYPE huatuo_bamai_loadavg_container_nr_running gauge
huatuo_bamai_loadavg_container_nr_running{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_loadavg_container_nr_uninterruptible nr_uninterruptible of container
# TYPE huatuo_bamai_loadavg_container_nr_uninterruptible gauge
huatuo_bamai_loadavg_container_nr_uninterruptible{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
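| Metric | Meaning | Unit | Object | Labels | Notes |
| --- | --- | --- | --- | --- | --- |
| loadavg_load1 | System load average over the past 1 minute | Count | Host | host, region | |
| loadavg_load5 | System load average over the past 5 minutes | Count | Host | host, region | |
| loadavg_load15 | System load average over the past 15 minutes | Count | Host | host, region | |
| loadavg_container_nr_running | Number of running tasks in the container | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region | cgroup v1 only |
| loadavg_container_nr_uninterruptible | Number of uninterruptible tasks in the container | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region | cgroup v1 only |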
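A raw load average is hard to judge without knowing how many CPUs the host has. The sketch below normalizes a load value by CPU count and sums the two container task counts, since Linux load accounting covers both runnable and uninterruptible tasks. The function names and the CPU count of 4 are assumptions made for illustration.

```python
def load_per_cpu(load: float, cpus: int) -> float:
    """Normalize a load average by the number of online CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


def container_active_tasks(nr_running: int, nr_uninterruptible: int) -> int:
    """Tasks that contribute to load: runnable plus uninterruptible."""
    return nr_running + nr_uninterruptible


# loadavg_load1 = 0.3 and the container task counts come from the sample
# output; the CPU count is a made-up value.
print(load_per_cpu(0.3, 4))
print(container_active_tasks(1, 0))
```

A per-CPU value that stays well above 1 usually means tasks are queuing for CPU or waiting on I/O.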
Memory subsystem
Memory reclaim
Memory reclaim can block processes. These metrics help you understand the memory state of the system.
# HELP huatuo_bamai_memory_free_allocpages_stall time stalled in alloc pages
# TYPE huatuo_bamai_memory_free_allocpages_stall gauge
huatuo_bamai_memory_free_allocpages_stall{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_free_compaction_stall time stalled in memory compaction
# TYPE huatuo_bamai_memory_free_compaction_stall gauge
huatuo_bamai_memory_free_compaction_stall{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_reclaim_container_directstall counter of cgroup reclaim when try_charge
# TYPE huatuo_bamai_memory_reclaim_container_directstall gauge
huatuo_bamai_memory_reclaim_container_directstall{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
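| Metric | Meaning | Unit | Object | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| memory_free_allocpages_stall | Time the system spent stalled while allocating pages | Nanoseconds | Host | eBPF | host, region |
| memory_free_compaction_stall | Time the system spent stalled while compacting memory | Nanoseconds | Host | eBPF | host, region |
| memory_reclaim_container_directstall | Number of direct memory reclaim events in the container | Count | Container | eBPF | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |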
Resource state
The following metrics describe the memory state of the whole system and of containers.
# HELP huatuo_bamai_memory_vmstat_container_active_anon cgroup memory.stat active_anon
# TYPE huatuo_bamai_memory_vmstat_container_active_anon gauge
huatuo_bamai_memory_vmstat_container_active_anon{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1.47456e+07
# HELP huatuo_bamai_memory_vmstat_container_active_file cgroup memory.stat active_file
# TYPE huatuo_bamai_memory_vmstat_container_active_file gauge
huatuo_bamai_memory_vmstat_container_active_file{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2.3617536e+07
# HELP huatuo_bamai_memory_vmstat_container_file_dirty cgroup memory.stat file_dirty
# TYPE huatuo_bamai_memory_vmstat_container_file_dirty gauge
huatuo_bamai_memory_vmstat_container_file_dirty{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_file_writeback cgroup memory.stat file_writeback
# TYPE huatuo_bamai_memory_vmstat_container_file_writeback gauge
huatuo_bamai_memory_vmstat_container_file_writeback{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_inactive_anon cgroup memory.stat inactive_anon
# TYPE huatuo_bamai_memory_vmstat_container_inactive_anon gauge
huatuo_bamai_memory_vmstat_container_inactive_anon{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_inactive_file cgroup memory.stat inactive_file
# TYPE huatuo_bamai_memory_vmstat_container_inactive_file gauge
huatuo_bamai_memory_vmstat_container_inactive_file{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 65536
# HELP huatuo_bamai_memory_vmstat_container_pgdeactivate cgroup memory.stat pgdeactivate
# TYPE huatuo_bamai_memory_vmstat_container_pgdeactivate gauge
huatuo_bamai_memory_vmstat_container_pgdeactivate{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgrefill cgroup memory.stat pgrefill
# TYPE huatuo_bamai_memory_vmstat_container_pgrefill gauge
huatuo_bamai_memory_vmstat_container_pgrefill{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgscan_direct cgroup memory.stat pgscan_direct
# TYPE huatuo_bamai_memory_vmstat_container_pgscan_direct gauge
huatuo_bamai_memory_vmstat_container_pgscan_direct{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgscan_kswapd cgroup memory.stat pgscan_kswapd
# TYPE huatuo_bamai_memory_vmstat_container_pgscan_kswapd gauge
huatuo_bamai_memory_vmstat_container_pgscan_kswapd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgsteal_direct cgroup memory.stat pgsteal_direct
# TYPE huatuo_bamai_memory_vmstat_container_pgsteal_direct gauge
huatuo_bamai_memory_vmstat_container_pgsteal_direct{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_pgsteal_kswapd cgroup memory.stat pgsteal_kswapd
# TYPE huatuo_bamai_memory_vmstat_container_pgsteal_kswapd gauge
huatuo_bamai_memory_vmstat_container_pgsteal_kswapd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_shmem cgroup memory.stat shmem
# TYPE huatuo_bamai_memory_vmstat_container_shmem gauge
huatuo_bamai_memory_vmstat_container_shmem{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_shmem_thp cgroup memory.stat shmem_thp
# TYPE huatuo_bamai_memory_vmstat_container_shmem_thp gauge
huatuo_bamai_memory_vmstat_container_shmem_thp{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_container_unevictable cgroup memory.stat unevictable
# TYPE huatuo_bamai_memory_vmstat_container_unevictable gauge
huatuo_bamai_memory_vmstat_container_unevictable{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| memory_vmstat_container_active_file | Active file-backed memory | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_active_anon | Active anonymous memory | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_inactive_file | Inactive file-backed memory | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_inactive_anon | Inactive anonymous memory | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_file_dirty | File memory that has been modified but not yet written to disk | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_file_writeback | File memory that has been modified and is waiting to be written to disk | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_dirty | Memory that has been modified but not yet written to disk | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_writeback | File and anonymous memory that has been modified and is waiting to be written to disk | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgdeactivate | Number of pages moved from the active LRU to the inactive LRU | Pages | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgrefill | Total number of pages scanned on the active LRU list | Pages | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgscan_direct | Total number of pages scanned on the inactive LRU during direct reclaim | Pages | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgscan_kswapd | Total number of pages scanned on the inactive LRU list by kswapd | Pages | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgsteal_direct | Total number of pages successfully reclaimed from the inactive LRU during direct reclaim | Pages | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_pgsteal_kswapd | Total number of pages successfully reclaimed from the inactive LRU by kswapd | Pages | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_vmstat_container_unevictable | Memory held in unevictable pages | Bytes | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
Host memory metrics:
# HELP huatuo_bamai_memory_vmstat_allocstall_device /proc/vmstat allocstall_device
# TYPE huatuo_bamai_memory_vmstat_allocstall_device gauge
huatuo_bamai_memory_vmstat_allocstall_device{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_dma /proc/vmstat allocstall_dma
# TYPE huatuo_bamai_memory_vmstat_allocstall_dma gauge
huatuo_bamai_memory_vmstat_allocstall_dma{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_dma32 /proc/vmstat allocstall_dma32
# TYPE huatuo_bamai_memory_vmstat_allocstall_dma32 gauge
huatuo_bamai_memory_vmstat_allocstall_dma32{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_movable /proc/vmstat allocstall_movable
# TYPE huatuo_bamai_memory_vmstat_allocstall_movable gauge
huatuo_bamai_memory_vmstat_allocstall_movable{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_allocstall_normal /proc/vmstat allocstall_normal
# TYPE huatuo_bamai_memory_vmstat_allocstall_normal gauge
huatuo_bamai_memory_vmstat_allocstall_normal{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_active_anon /proc/vmstat nr_active_anon
# TYPE huatuo_bamai_memory_vmstat_nr_active_anon gauge
huatuo_bamai_memory_vmstat_nr_active_anon{host="hostname",region="dev"} 155449
# HELP huatuo_bamai_memory_vmstat_nr_active_file /proc/vmstat nr_active_file
# TYPE huatuo_bamai_memory_vmstat_nr_active_file gauge
huatuo_bamai_memory_vmstat_nr_active_file{host="hostname",region="dev"} 212425
# HELP huatuo_bamai_memory_vmstat_nr_dirty /proc/vmstat nr_dirty
# TYPE huatuo_bamai_memory_vmstat_nr_dirty gauge
huatuo_bamai_memory_vmstat_nr_dirty{host="hostname",region="dev"} 19047
# HELP huatuo_bamai_memory_vmstat_nr_dirty_background_threshold /proc/vmstat nr_dirty_background_threshold
# TYPE huatuo_bamai_memory_vmstat_nr_dirty_background_threshold gauge
huatuo_bamai_memory_vmstat_nr_dirty_background_threshold{host="hostname",region="dev"} 379858
# HELP huatuo_bamai_memory_vmstat_nr_dirty_threshold /proc/vmstat nr_dirty_threshold
# TYPE huatuo_bamai_memory_vmstat_nr_dirty_threshold gauge
huatuo_bamai_memory_vmstat_nr_dirty_threshold{host="hostname",region="dev"} 760646
# HELP huatuo_bamai_memory_vmstat_nr_free_pages /proc/vmstat nr_free_pages
# TYPE huatuo_bamai_memory_vmstat_nr_free_pages gauge
huatuo_bamai_memory_vmstat_nr_free_pages{host="hostname",region="dev"} 3.20535e+06
# HELP huatuo_bamai_memory_vmstat_nr_inactive_anon /proc/vmstat nr_inactive_anon
# TYPE huatuo_bamai_memory_vmstat_nr_inactive_anon gauge
huatuo_bamai_memory_vmstat_nr_inactive_anon{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_inactive_file /proc/vmstat nr_inactive_file
# TYPE huatuo_bamai_memory_vmstat_nr_inactive_file gauge
huatuo_bamai_memory_vmstat_nr_inactive_file{host="hostname",region="dev"} 428518
# HELP huatuo_bamai_memory_vmstat_nr_mlock /proc/vmstat nr_mlock
# TYPE huatuo_bamai_memory_vmstat_nr_mlock gauge
huatuo_bamai_memory_vmstat_nr_mlock{host="hostname",region="dev"} 6821
# HELP huatuo_bamai_memory_vmstat_nr_shmem /proc/vmstat nr_shmem
# TYPE huatuo_bamai_memory_vmstat_nr_shmem gauge
huatuo_bamai_memory_vmstat_nr_shmem{host="hostname",region="dev"} 541
# HELP huatuo_bamai_memory_vmstat_nr_shmem_hugepages /proc/vmstat nr_shmem_hugepages
# TYPE huatuo_bamai_memory_vmstat_nr_shmem_hugepages gauge
huatuo_bamai_memory_vmstat_nr_shmem_hugepages{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_shmem_pmdmapped /proc/vmstat nr_shmem_pmdmapped
# TYPE huatuo_bamai_memory_vmstat_nr_shmem_pmdmapped gauge
huatuo_bamai_memory_vmstat_nr_shmem_pmdmapped{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_slab_reclaimable /proc/vmstat nr_slab_reclaimable
# TYPE huatuo_bamai_memory_vmstat_nr_slab_reclaimable gauge
huatuo_bamai_memory_vmstat_nr_slab_reclaimable{host="hostname",region="dev"} 22322
# HELP huatuo_bamai_memory_vmstat_nr_slab_unreclaimable /proc/vmstat nr_slab_unreclaimable
# TYPE huatuo_bamai_memory_vmstat_nr_slab_unreclaimable gauge
huatuo_bamai_memory_vmstat_nr_slab_unreclaimable{host="hostname",region="dev"} 24168
# HELP huatuo_bamai_memory_vmstat_nr_unevictable /proc/vmstat nr_unevictable
# TYPE huatuo_bamai_memory_vmstat_nr_unevictable gauge
huatuo_bamai_memory_vmstat_nr_unevictable{host="hostname",region="dev"} 6839
# HELP huatuo_bamai_memory_vmstat_nr_writeback /proc/vmstat nr_writeback
# TYPE huatuo_bamai_memory_vmstat_nr_writeback gauge
huatuo_bamai_memory_vmstat_nr_writeback{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_nr_writeback_temp /proc/vmstat nr_writeback_temp
# TYPE huatuo_bamai_memory_vmstat_nr_writeback_temp gauge
huatuo_bamai_memory_vmstat_nr_writeback_temp{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_numa_pages_migrated /proc/vmstat numa_pages_migrated
# TYPE huatuo_bamai_memory_vmstat_numa_pages_migrated gauge
huatuo_bamai_memory_vmstat_numa_pages_migrated{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgdeactivate /proc/vmstat pgdeactivate
# TYPE huatuo_bamai_memory_vmstat_pgdeactivate gauge
huatuo_bamai_memory_vmstat_pgdeactivate{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgrefill /proc/vmstat pgrefill
# TYPE huatuo_bamai_memory_vmstat_pgrefill gauge
huatuo_bamai_memory_vmstat_pgrefill{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgscan_direct /proc/vmstat pgscan_direct
# TYPE huatuo_bamai_memory_vmstat_pgscan_direct gauge
huatuo_bamai_memory_vmstat_pgscan_direct{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgscan_direct_throttle /proc/vmstat pgscan_direct_throttle
# TYPE huatuo_bamai_memory_vmstat_pgscan_direct_throttle gauge
huatuo_bamai_memory_vmstat_pgscan_direct_throttle{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgscan_kswapd /proc/vmstat pgscan_kswapd
# TYPE huatuo_bamai_memory_vmstat_pgscan_kswapd gauge
huatuo_bamai_memory_vmstat_pgscan_kswapd{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgsteal_direct /proc/vmstat pgsteal_direct
# TYPE huatuo_bamai_memory_vmstat_pgsteal_direct gauge
huatuo_bamai_memory_vmstat_pgsteal_direct{host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_vmstat_pgsteal_kswapd /proc/vmstat pgsteal_kswapd
# TYPE huatuo_bamai_memory_vmstat_pgsteal_kswapd gauge
huatuo_bamai_memory_vmstat_pgsteal_kswapd{host="hostname",region="dev"} 0
- Page state & LRU distribution
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| nr_free_pages | Total number of free pages (directly allocatable from the buddy system) | Pages | Host | host, region |
| nr_inactive_anon | Number of inactive anonymous pages | Pages | Host | host, region |
| nr_inactive_file | Number of inactive file pages | Pages | Host | host, region |
| nr_active_anon | Number of active anonymous pages | Pages | Host | host, region |
| nr_active_file | Number of active file pages | Pages | Host | host, region |
| nr_unevictable | Number of unevictable pages (mlocked, hugetlbfs, etc.) | Pages | Host | host, region |
| nr_mlock | Number of pages locked by mlock() | Pages | Host | host, region |
| nr_shmem | Number of pages used by tmpfs / shmem | Pages | Host | host, region |
| nr_slab_reclaimable | Reclaimable slab cache memory | Pages | Host | host, region |
| nr_slab_unreclaimable | Unreclaimable slab cache memory | Pages | Host | host, region |
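The host-level values are reported in pages, so they need to be multiplied by the page size before they can be compared with byte-based metrics. This is a minimal sketch that assumes 4 KiB pages; on the host itself the real value can be read with `os.sysconf("SC_PAGE_SIZE")`. The helper name `pages_to_mib` is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical on x86_64


def pages_to_mib(pages: float, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count from /proc/vmstat into MiB."""
    return pages * page_size / (1024 * 1024)


# nr_free_pages from the sample output: 3.20535e+06 pages
print(f"{pages_to_mib(3.20535e+06):.1f} MiB free")
```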
- Dirty & writeback thresholds
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| nr_dirty | Current number of dirty pages | Pages | Host | host, region |
| nr_writeback | Number of pages currently being written back | Pages | Host | host, region |
| nr_dirty_threshold | Threshold at which forced writeback starts (determined by dirty_ratio) | Pages | Host | host, region |
| nr_dirty_background_threshold | Threshold at which background writeback starts (determined by dirty_background_ratio) | Pages | Host | host, region |
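Comparing `nr_dirty` with the two thresholds gives a quick picture of writeback pressure. The classification below is a simplified sketch (the kernel's actual throttling logic is more gradual), and the function name and the three state labels are assumptions made for illustration.

```python
def dirty_pressure(nr_dirty: float, background_threshold: float,
                   threshold: float) -> str:
    """Classify dirty-page pressure relative to the two thresholds."""
    if nr_dirty >= threshold:
        return "throttled"    # writers are forced to wait for writeback
    if nr_dirty >= background_threshold:
        return "background"   # background flushing should be running
    return "idle"


# Values from the sample output above.
print(dirty_pressure(19047, 379858, 760646))
```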
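- Page fault & swapping
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| pgfault | Total number of page faults | Count | Host | host, region |
| pgmajfault | Number of major page faults | Count | Host | host, region |
| pgpgin | Amount of data read in from block devices | KiB | Host | host, region |
| pgpgout | Amount of data written out to block devices | KiB | Host | host, region |
| pswpin/pswpout | Number of pages swapped in / swapped out | Pages | Host | host, region |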
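- Reclaim & scanning
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| pgscan_kswapd/direct/khugepaged | Number of pages scanned by kswapd / direct reclaim / khugepaged | Pages | Host | host, region |
| pgsteal_kswapd/direct/khugepaged | Number of pages successfully reclaimed | Pages | Host | host, region |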
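The ratio of reclaimed pages to scanned pages is a common way to read these counters together. The sketch below assumes the caller has already computed the increase of each counter between two scrapes; `reclaim_efficiency` is a hypothetical helper name.

```python
from typing import Optional


def reclaim_efficiency(pgscan_delta: float,
                       pgsteal_delta: float) -> Optional[float]:
    """Fraction of scanned pages that were actually reclaimed."""
    if pgscan_delta <= 0:
        return None  # no scanning happened in this interval
    return pgsteal_delta / pgscan_delta


# Hypothetical deltas: 1000 pages scanned, 250 reclaimed.
print(reclaim_efficiency(1000, 250))
```

A low ratio suggests the kernel is scanning many pages it cannot free, which tends to accompany memory pressure.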
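- Transparent huge pages, THP
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| thp_fault_alloc | Number of times a THP was successfully allocated on a page fault | Count | Host | host, region |
| thp_fault_fallback | Number of times THP allocation failed on a page fault and fell back to regular pages | Count | Host | host, region |
| thp_collapse_alloc | Number of times khugepaged successfully collapsed pages into a THP | Count | Host | host, region |
| thp_collapse_alloc_failed | Number of times khugepaged failed to collapse pages into a THP | Count | Host | host, region |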
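- NUMA balancing & allocation
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| numa_hit | Total number of pages successfully allocated on the node the process intended to allocate from. | Count | Host | host, region |
| numa_miss | Number of pages allocated on this node although the process intended to allocate from another node, for example because the target node was short of memory. | Count | Host | host, region |
| numa_foreign | Number of pages the process intended to allocate from this node but that were allocated on another node. | Count | Host | host, region |
| numa_local | Total number of pages the process allocated on its local node. | Count | Host | host, region |
| numa_other | Total number of pages the process allocated on a remote node. | Count | Host | host, region |
| numa_pages_migrated | Total number of pages successfully migrated by automatic NUMA balancing | Count | Host | host, region |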
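One way to summarize the allocation counters is the share of pages that landed on the intended node. The sketch below is illustrative only; `numa_hit_ratio` is a hypothetical name and the inputs are made-up values.

```python
from typing import Optional


def numa_hit_ratio(numa_hit: float, numa_miss: float) -> Optional[float]:
    """Share of page allocations satisfied on the intended node."""
    total = numa_hit + numa_miss
    if total == 0:
        return None  # nothing allocated yet
    return numa_hit / total


# Hypothetical counters: 900 hits and 100 misses.
print(numa_hit_ratio(900, 100))
```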
Resource events
Container-level memory event metrics.
# HELP huatuo_bamai_memory_events_container_high memory events high
# TYPE huatuo_bamai_memory_events_container_high gauge
huatuo_bamai_memory_events_container_high{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_low memory events low
# TYPE huatuo_bamai_memory_events_container_low gauge
huatuo_bamai_memory_events_container_low{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_max memory events max
# TYPE huatuo_bamai_memory_events_container_max gauge
huatuo_bamai_memory_events_container_max{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_oom memory events oom
# TYPE huatuo_bamai_memory_events_container_oom gauge
huatuo_bamai_memory_events_container_oom{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_oom_group_kill memory events oom_group_kill
# TYPE huatuo_bamai_memory_events_container_oom_group_kill gauge
huatuo_bamai_memory_events_container_oom_group_kill{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_memory_events_container_oom_kill memory events oom_kill
# TYPE huatuo_bamai_memory_events_container_oom_kill gauge
huatuo_bamai_memory_events_container_oom_kill{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| memory_events_container_low | Number of times the cgroup was reclaimed under heavy system memory pressure even though its usage was below memory.low. This indicates that memory.low is over-committed. | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_high | Number of times processes were throttled and forced into direct reclaim because memory usage exceeded memory.high (the soft limit). | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_max | Number of times memory usage reached or was about to exceed memory.max (the hard limit). | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_oom | Number of times memory usage hit the memory.max limit, an allocation was about to fail, and the OOM path was entered. | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_oom_kill | Number of processes in the cgroup killed by the OOM killer after reaching the memory limit. | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| memory_events_container_oom_group_kill | Number of times the whole cgroup was killed by the OOM killer. | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
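Because these values only ever grow, what matters for alerting is whether they increased between two scrapes. The sketch below compares two snapshots of the OOM-related events for one container; the dictionary layout and the name `new_oom_events` are assumptions made for illustration.

```python
WATCHED = ("oom", "oom_kill", "oom_group_kill")


def new_oom_events(prev: dict, curr: dict) -> dict:
    """Return the OOM-related events that increased between two scrapes."""
    increased = {}
    for name in WATCHED:
        delta = curr.get(name, 0) - prev.get(name, 0)
        if delta > 0:
            increased[name] = delta
    return increased


# Hypothetical snapshots keyed by the suffix of memory_events_container_*.
before = {"oom": 0, "oom_kill": 0, "oom_group_kill": 0}
after = {"oom": 1, "oom_kill": 2, "oom_group_kill": 0}
print(new_oom_events(before, after))
```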
Buddyinfo
Shows the distribution of free memory blocks held by the buddy allocator (the core algorithm of the kernel page allocator) in each NUMA node and each memory zone.
# HELP huatuo_bamai_memory_buddyinfo_blocks buddy info
# TYPE huatuo_bamai_memory_buddyinfo_blocks gauge
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="Normal"} 7
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="Normal"} 36
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="DMA"} 2
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="DMA32"} 743
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="Normal"} 2265
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="2",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="2",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="2",region="dev",zone="Normal"} 10
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="3",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="3",region="dev",zone="DMA32"} 2
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="3",region="dev",zone="Normal"} 224
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="4",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="4",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="4",region="dev",zone="Normal"} 376
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="5",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="5",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="5",region="dev",zone="Normal"} 165
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="6",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="6",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="6",region="dev",zone="Normal"} 118
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="7",region="dev",zone="DMA"} 0
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="7",region="dev",zone="DMA32"} 4
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="7",region="dev",zone="Normal"} 172
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="8",region="dev",zone="DMA"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="8",region="dev",zone="DMA32"} 4
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="8",region="dev",zone="Normal"} 35
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="9",region="dev",zone="DMA"} 2
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="9",region="dev",zone="DMA32"} 4
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="9",region="dev",zone="Normal"} 25
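| Metric | Meaning | Unit | Object | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| memory_buddyinfo_blocks | Number of free blocks of each order in the buddy allocator. | Blocks (each of 2^order pages) | Host | procfs | host, node, order, region, zone |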
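Each sample counts free blocks of 2^order pages, so the free memory of a zone is the sum of `blocks * 2^order * page_size` over all orders. The sketch below parses a few lines of the sample output with regular expressions and totals the bytes per (node, zone). The 4 KiB page size and the helper name `free_bytes_by_zone` are assumptions.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages

LINE = re.compile(
    r'^huatuo_bamai_memory_buddyinfo_blocks\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def free_bytes_by_zone(text: str, page_size: int = PAGE_SIZE) -> dict:
    """Sum free bytes per (node, zone) from buddyinfo metric lines."""
    totals = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and other metrics
        labels = dict(LABEL.findall(match.group("labels")))
        blocks = int(float(match.group("value")))
        pages = blocks * (1 << int(labels["order"]))
        key = (labels["node"], labels["zone"])
        totals[key] = totals.get(key, 0) + pages * page_size
    return totals


sample = """
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="0",region="dev",zone="DMA32"} 3
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="1",region="dev",zone="DMA32"} 1
huatuo_bamai_memory_buddyinfo_blocks{host="hostname",node="0",order="10",region="dev",zone="DMA32"} 743
"""
print(free_bytes_by_zone(sample))
```

Plenty of free memory concentrated in low orders, with few high-order blocks, is a typical sign of fragmentation.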
Network subsystem
TCP memory
The following metrics describe how much system memory the TCP protocol stack is using.
# HELP huatuo_bamai_tcp_memory_limit_pages tcp memory pages limit
# TYPE huatuo_bamai_tcp_memory_limit_pages gauge
huatuo_bamai_tcp_memory_limit_pages{host="hostname",region="dev"} 380526
# HELP huatuo_bamai_tcp_memory_usage_bytes tcp memory bytes usage
# TYPE huatuo_bamai_tcp_memory_usage_bytes gauge
huatuo_bamai_tcp_memory_usage_bytes{host="hostname",region="dev"} 0
# HELP huatuo_bamai_tcp_memory_usage_pages tcp memory pages usage
# TYPE huatuo_bamai_tcp_memory_usage_pages gauge
huatuo_bamai_tcp_memory_usage_pages{host="hostname",region="dev"} 0
# HELP huatuo_bamai_tcp_memory_usage_percent tcp memory usage percent
# TYPE huatuo_bamai_tcp_memory_usage_percent gauge
huatuo_bamai_tcp_memory_usage_percent{host="hostname",region="dev"} 0
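| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| tcp_memory_limit_pages | Total amount of memory TCP may use on the system | Pages | Host | host, region |
| tcp_memory_usage_bytes | Amount of TCP memory in use on the system | Bytes | Host | host, region |
| tcp_memory_usage_pages | Amount of TCP memory in use on the system | Pages | Host | host, region |
| tcp_memory_usage_percent | Percentage of TCP memory in use (relative to the total TCP memory limit) | % | Host | host, region |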
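The percentage metric is described as usage relative to the total limit, so it can be cross-checked from the two page-based metrics. This is a sketch of that relationship under the assumption that both values use the same page unit; it is not taken from the huatuo-bamai source, and the function name is hypothetical.

```python
def tcp_memory_usage_percent(usage_pages: float, limit_pages: float) -> float:
    """TCP memory in use as a percentage of the configured limit."""
    if limit_pages <= 0:
        return 0.0  # no limit known; avoid dividing by zero
    return 100.0 * usage_pages / limit_pages


# Limit from the sample output above; the usage value is hypothetical.
print(tcp_memory_usage_percent(190263, 380526))
```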
Neighbor entries
The following metrics describe neighbor (ARP) entry usage.
# HELP huatuo_bamai_arp_container_entries arp entries in container netns
# TYPE huatuo_bamai_arp_container_entries gauge
huatuo_bamai_arp_container_entries{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_arp_entries host init namespace
# TYPE huatuo_bamai_arp_entries gauge
huatuo_bamai_arp_entries{host="hostname",region="dev"} 5
# HELP huatuo_bamai_arp_total all entries in arp_cache for containers and host netns
# TYPE huatuo_bamai_arp_total gauge
huatuo_bamai_arp_total{host="hostname",region="dev"} 12
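| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| arp_entries | Number of ARP entries in the host network namespace | Count | Host namespace | host, region |
| arp_total | Total number of ARP entries across all network namespaces on the host | Count | Host | host, region |
| arp_container_entries | Number of ARP entries in the container network namespace | Count | Container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |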
Qdisc
Qdisc is an important module of the kernel network subsystem. Observing it gives a clear view of how packets are being processed and delayed.
# HELP huatuo_bamai_netdev_qdisc_backlog Number of bytes currently in queue to be sent.
# TYPE huatuo_bamai_netdev_qdisc_backlog gauge
huatuo_bamai_netdev_qdisc_backlog{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_bytes_total Number of bytes sent.
# TYPE huatuo_bamai_netdev_qdisc_bytes_total counter
huatuo_bamai_netdev_qdisc_bytes_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 2.578235443e+09
# HELP huatuo_bamai_netdev_qdisc_current_queue_length Number of packets currently in queue to be sent.
# TYPE huatuo_bamai_netdev_qdisc_current_queue_length gauge
huatuo_bamai_netdev_qdisc_current_queue_length{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_drops_total Number of packet drops.
# TYPE huatuo_bamai_netdev_qdisc_drops_total counter
huatuo_bamai_netdev_qdisc_drops_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_overlimits_total Number of packet overlimits.
# TYPE huatuo_bamai_netdev_qdisc_overlimits_total counter
huatuo_bamai_netdev_qdisc_overlimits_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
# HELP huatuo_bamai_netdev_qdisc_packets_total Number of packets sent.
# TYPE huatuo_bamai_netdev_qdisc_packets_total counter
huatuo_bamai_netdev_qdisc_packets_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 6.867714e+06
# HELP huatuo_bamai_netdev_qdisc_requeues_total Number of packets dequeued, not transmitted, and requeued.
# TYPE huatuo_bamai_netdev_qdisc_requeues_total counter
huatuo_bamai_netdev_qdisc_requeues_total{device="ens2",host="hostname",kind="fq_codel",region="dev"} 0
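| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| qdisc_backlog | Number of bytes currently queued and waiting to be sent | bytes | Physical machine | device, host, kind, region |
| qdisc_current_queue_length | Number of packets currently queued and waiting to be sent | count | Physical machine | device, host, kind, region |
| qdisc_overlimits_total | Number of overlimit events | count | Physical machine | device, host, kind, region |
| qdisc_requeues_total | Number of times a packet was put back on the queue because the NIC or driver was temporarily unable to send it | count | Physical machine | device, host, kind, region |
| qdisc_drops_total | Number of packets actively dropped (queue full, rate-limiting policy, and so on) | count | Physical machine | device, host, kind, region |
| qdisc_bytes_total | Number of bytes sent | bytes | Physical machine | device, host, kind, region |
| qdisc_packets_total | Number of packets sent | count | Physical machine | device, host, kind, region |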
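The `*_total` Qdisc metrics are counters, so they are usually read as a change between two scrapes rather than as absolute values. The sketch below shows one way to turn two scrapes into per-second rates and a drop ratio. It is only an illustration: the function names are invented for this example, the second scrape and the 15-second interval are hypothetical values, and treating "sent plus dropped" as the number of attempted packets is an approximation.

```python
def counter_increase(prev, curr):
    """Increase of a counter between two scrapes.

    If the counter went backwards (for example after a restart), assume it
    was reset to zero and count the current value as the increase.
    """
    return curr if curr < prev else curr - prev


def qdisc_interval_stats(prev, curr, seconds):
    """Per-second rates and drop ratio for one (device, kind) pair."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    packets = counter_increase(prev["packets_total"], curr["packets_total"])
    drops = counter_increase(prev["drops_total"], curr["drops_total"])
    sent_bytes = counter_increase(prev["bytes_total"], curr["bytes_total"])
    attempted = packets + drops  # approximation, see the note above
    return {
        "packets_per_sec": packets / seconds,
        "bytes_per_sec": sent_bytes / seconds,
        "drops_per_sec": drops / seconds,
        "drop_ratio": drops / attempted if attempted else 0.0,
    }


# First scrape: values taken from the sample output above (ens2, fq_codel).
prev = {"packets_total": 6867714, "bytes_total": 2578235443, "drops_total": 0}
# Second scrape, 15 seconds later: hypothetical values for illustration only.
curr = {"packets_total": 6868714, "bytes_total": 2578610443, "drops_total": 10}

stats = qdisc_interval_stats(prev, curr, 15)
print(stats["bytes_per_sec"])  # 25000.0
```

The gauges `qdisc_backlog` and `qdisc_current_queue_length` need no such treatment; their current value is already meaningful on its own.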
Hardware packet drops
Number of packets dropped by the network device hardware in the receive direction.
# HELP huatuo_bamai_netdev_hw_rx_dropped count of packets dropped at hardware level
# TYPE huatuo_bamai_netdev_hw_rx_dropped gauge
huatuo_bamai_netdev_hw_rx_dropped{device="eth0",driver="mlx5_core",host="hostname",region="dev"} 0
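| Metric | Meaning | Unit | Scope | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| netdev_hw_rx_dropped | Packets dropped by the NIC hardware in the receive direction | count | Physical machine | eBPF | device, driver, host, region |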
Network devices
# HELP huatuo_bamai_netdev_container_receive_bytes_total Network device statistic receive_bytes.
# TYPE huatuo_bamai_netdev_container_receive_bytes_total counter
huatuo_bamai_netdev_container_receive_bytes_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 6.4400018e+07
# HELP huatuo_bamai_netdev_container_receive_compressed_total Network device statistic receive_compressed.
# TYPE huatuo_bamai_netdev_container_receive_compressed_total counter
huatuo_bamai_netdev_container_receive_compressed_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_dropped_total Network device statistic receive_dropped.
# TYPE huatuo_bamai_netdev_container_receive_dropped_total counter
huatuo_bamai_netdev_container_receive_dropped_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_errors_total Network device statistic receive_errors.
# TYPE huatuo_bamai_netdev_container_receive_errors_total counter
huatuo_bamai_netdev_container_receive_errors_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_fifo_total Network device statistic receive_fifo.
# TYPE huatuo_bamai_netdev_container_receive_fifo_total counter
huatuo_bamai_netdev_container_receive_fifo_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_frame_total Network device statistic receive_frame.
# TYPE huatuo_bamai_netdev_container_receive_frame_total counter
huatuo_bamai_netdev_container_receive_frame_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_multicast_total Network device statistic receive_multicast.
# TYPE huatuo_bamai_netdev_container_receive_multicast_total counter
huatuo_bamai_netdev_container_receive_multicast_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_receive_packets_total Network device statistic receive_packets.
# TYPE huatuo_bamai_netdev_container_receive_packets_total counter
huatuo_bamai_netdev_container_receive_packets_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 693155
# HELP huatuo_bamai_netdev_container_transmit_bytes_total Network device statistic transmit_bytes.
# TYPE huatuo_bamai_netdev_container_transmit_bytes_total counter
huatuo_bamai_netdev_container_transmit_bytes_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 6.2347911e+07
# HELP huatuo_bamai_netdev_container_transmit_carrier_total Network device statistic transmit_carrier.
# TYPE huatuo_bamai_netdev_container_transmit_carrier_total counter
huatuo_bamai_netdev_container_transmit_carrier_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_colls_total Network device statistic transmit_colls.
# TYPE huatuo_bamai_netdev_container_transmit_colls_total counter
huatuo_bamai_netdev_container_transmit_colls_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_compressed_total Network device statistic transmit_compressed.
# TYPE huatuo_bamai_netdev_container_transmit_compressed_total counter
huatuo_bamai_netdev_container_transmit_compressed_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_dropped_total Network device statistic transmit_dropped.
# TYPE huatuo_bamai_netdev_container_transmit_dropped_total counter
huatuo_bamai_netdev_container_transmit_dropped_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_errors_total Network device statistic transmit_errors.
# TYPE huatuo_bamai_netdev_container_transmit_errors_total counter
huatuo_bamai_netdev_container_transmit_errors_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_fifo_total Network device statistic transmit_fifo.
# TYPE huatuo_bamai_netdev_container_transmit_fifo_total counter
huatuo_bamai_netdev_container_transmit_fifo_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netdev_container_transmit_packets_total Network device statistic transmit_packets.
# TYPE huatuo_bamai_netdev_container_transmit_packets_total counter
huatuo_bamai_netdev_container_transmit_packets_total{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",device="eth0",host="hostname",region="dev"} 660218
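| Metric | Meaning | Unit | Scope | Labels |
| --- | --- | --- | --- | --- |
| netdev_receive_bytes_total | Total number of bytes successfully received | bytes | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_packets_total | Total number of packets successfully received | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_compressed_total | Number of compressed packets received | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_frame_total | Number of receive framing errors | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_errors_total | Total number of receive errors | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_dropped_total | Number of received packets dropped by the kernel or driver for any reason | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_fifo_total | Number of receive FIFO/ring buffer overflow errors | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_receive_multicast_total | Number of multicast packets received | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_bytes_total | Total number of bytes successfully sent | bytes | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_packets_total | Total number of packets successfully sent | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_errors_total | Total number of transmit errors | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_dropped_total | Number of packets dropped during transmission (queue full, policy drops, and so on) | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_fifo_total | Number of transmit FIFO/ring buffer errors | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_carrier_total | Number of carrier errors | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_colls_total | Number of collisions detected while transmitting | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |
| netdev_transmit_compressed_total | Number of compressed packets sent | count | Physical machine or container | container_host, container_hostnamespace, container_level, container_name, container_type, device, host, region |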
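Most of the network device counters above stay at zero on a healthy interface, so a common pattern is to flag only the error-type counters that grew between two scrapes. The following Python sketch illustrates that idea for a single device. The grouping in `ERROR_COUNTERS` and the function name are assumptions made for this example, and the values in `curr` are hypothetical.

```python
# Error-type counters from the table above, with the metric prefix removed.
ERROR_COUNTERS = (
    "receive_errors_total",
    "receive_dropped_total",
    "receive_fifo_total",
    "receive_frame_total",
    "transmit_errors_total",
    "transmit_dropped_total",
    "transmit_fifo_total",
    "transmit_carrier_total",
)


def increased_error_counters(prev, curr):
    """Return {counter: increase} for error counters that grew between scrapes.

    Counters missing from a scrape are treated as zero.
    """
    grown = {}
    for name in ERROR_COUNTERS:
        delta = curr.get(name, 0) - prev.get(name, 0)
        if delta > 0:
            grown[name] = delta
    return grown


# First scrape: every error counter is 0, as in the sample output above.
prev = {name: 0 for name in ERROR_COUNTERS}
# Second scrape: hypothetical values for illustration only.
curr = dict(prev, receive_dropped_total=3, transmit_fifo_total=1)

print(increased_error_counters(prev, curr))
# {'receive_dropped_total': 3, 'transmit_fifo_total': 1}
```

In practice the comparison would be done per label set (for example per `device` and container), since each label combination is a separate time series.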
TCP
# HELP huatuo_bamai_netstat_container_TcpExt_ArpFilter statistic TcpExtArpFilter.
# TYPE huatuo_bamai_netstat_container_TcpExt_ArpFilter gauge
huatuo_bamai_netstat_container_TcpExt_ArpFilter{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_BusyPollRxPackets statistic TcpExtBusyPollRxPackets.
# TYPE huatuo_bamai_netstat_container_TcpExt_BusyPollRxPackets gauge
huatuo_bamai_netstat_container_TcpExt_BusyPollRxPackets{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_DelayedACKLocked statistic TcpExtDelayedACKLocked.
# TYPE huatuo_bamai_netstat_container_TcpExt_DelayedACKLocked gauge
huatuo_bamai_netstat_container_TcpExt_DelayedACKLocked{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_DelayedACKLost statistic TcpExtDelayedACKLost.
# TYPE huatuo_bamai_netstat_container_TcpExt_DelayedACKLost gauge
huatuo_bamai_netstat_container_TcpExt_DelayedACKLost{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_DelayedACKs statistic TcpExtDelayedACKs.
# TYPE huatuo_bamai_netstat_container_TcpExt_DelayedACKs gauge
huatuo_bamai_netstat_container_TcpExt_DelayedACKs{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 4650
# HELP huatuo_bamai_netstat_container_TcpExt_EmbryonicRsts statistic TcpExtEmbryonicRsts.
# TYPE huatuo_bamai_netstat_container_TcpExt_EmbryonicRsts gauge
huatuo_bamai_netstat_container_TcpExt_EmbryonicRsts{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_IPReversePathFilter statistic TcpExtIPReversePathFilter.
# TYPE huatuo_bamai_netstat_container_TcpExt_IPReversePathFilter gauge
huatuo_bamai_netstat_container_TcpExt_IPReversePathFilter{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_ListenDrops statistic TcpExtListenDrops.
# TYPE huatuo_bamai_netstat_container_TcpExt_ListenDrops gauge
huatuo_bamai_netstat_container_TcpExt_ListenDrops{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_ListenOverflows statistic TcpExtListenOverflows.
# TYPE huatuo_bamai_netstat_container_TcpExt_ListenOverflows gauge
huatuo_bamai_netstat_container_TcpExt_ListenOverflows{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_LockDroppedIcmps statistic TcpExtLockDroppedIcmps.
# TYPE huatuo_bamai_netstat_container_TcpExt_LockDroppedIcmps gauge
huatuo_bamai_netstat_container_TcpExt_LockDroppedIcmps{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_OfoPruned statistic TcpExtOfoPruned.
# TYPE huatuo_bamai_netstat_container_TcpExt_OfoPruned gauge
huatuo_bamai_netstat_container_TcpExt_OfoPruned{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_OutOfWindowIcmps statistic TcpExtOutOfWindowIcmps.
# TYPE huatuo_bamai_netstat_container_TcpExt_OutOfWindowIcmps gauge
huatuo_bamai_netstat_container_TcpExt_OutOfWindowIcmps{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PAWSActive statistic TcpExtPAWSActive.
# TYPE huatuo_bamai_netstat_container_TcpExt_PAWSActive gauge
huatuo_bamai_netstat_container_TcpExt_PAWSActive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PAWSEstab statistic TcpExtPAWSEstab.
# TYPE huatuo_bamai_netstat_container_TcpExt_PAWSEstab gauge
huatuo_bamai_netstat_container_TcpExt_PAWSEstab{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PFMemallocDrop statistic TcpExtPFMemallocDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_PFMemallocDrop gauge
huatuo_bamai_netstat_container_TcpExt_PFMemallocDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_PruneCalled statistic TcpExtPruneCalled.
# TYPE huatuo_bamai_netstat_container_TcpExt_PruneCalled gauge
huatuo_bamai_netstat_container_TcpExt_PruneCalled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_RcvPruned statistic TcpExtRcvPruned.
# TYPE huatuo_bamai_netstat_container_TcpExt_RcvPruned gauge
huatuo_bamai_netstat_container_TcpExt_RcvPruned{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_SyncookiesFailed statistic TcpExtSyncookiesFailed.
# TYPE huatuo_bamai_netstat_container_TcpExt_SyncookiesFailed gauge
huatuo_bamai_netstat_container_TcpExt_SyncookiesFailed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_SyncookiesRecv statistic TcpExtSyncookiesRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_SyncookiesRecv gauge
huatuo_bamai_netstat_container_TcpExt_SyncookiesRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_SyncookiesSent statistic TcpExtSyncookiesSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_SyncookiesSent gauge
huatuo_bamai_netstat_container_TcpExt_SyncookiesSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedChallenge statistic TcpExtTCPACKSkippedChallenge.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedChallenge gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedChallenge{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedFinWait2 statistic TcpExtTCPACKSkippedFinWait2.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedFinWait2 gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedFinWait2{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedPAWS statistic TcpExtTCPACKSkippedPAWS.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedPAWS gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedPAWS{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSeq statistic TcpExtTCPACKSkippedSeq.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSeq gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSeq{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSynRecv statistic TcpExtTCPACKSkippedSynRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSynRecv gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedSynRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedTimeWait statistic TcpExtTCPACKSkippedTimeWait.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedTimeWait gauge
huatuo_bamai_netstat_container_TcpExt_TCPACKSkippedTimeWait{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAOBad statistic TcpExtTCPAOBad.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAOBad gauge
huatuo_bamai_netstat_container_TcpExt_TCPAOBad{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAODroppedIcmps statistic TcpExtTCPAODroppedIcmps.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAODroppedIcmps gauge
huatuo_bamai_netstat_container_TcpExt_TCPAODroppedIcmps{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAOGood statistic TcpExtTCPAOGood.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAOGood gauge
huatuo_bamai_netstat_container_TcpExt_TCPAOGood{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAOKeyNotFound statistic TcpExtTCPAOKeyNotFound.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAOKeyNotFound gauge
huatuo_bamai_netstat_container_TcpExt_TCPAOKeyNotFound{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAORequired statistic TcpExtTCPAORequired.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAORequired gauge
huatuo_bamai_netstat_container_TcpExt_TCPAORequired{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortFailed statistic TcpExtTCPAbortFailed.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortFailed gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortFailed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnClose statistic TcpExtTCPAbortOnClose.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnClose gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnClose{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnData statistic TcpExtTCPAbortOnData.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnData gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnData{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnLinger statistic TcpExtTCPAbortOnLinger.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnLinger gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnLinger{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnMemory statistic TcpExtTCPAbortOnMemory.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnMemory gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnMemory{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAbortOnTimeout statistic TcpExtTCPAbortOnTimeout.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAbortOnTimeout gauge
huatuo_bamai_netstat_container_TcpExt_TCPAbortOnTimeout{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAckCompressed statistic TcpExtTCPAckCompressed.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAckCompressed gauge
huatuo_bamai_netstat_container_TcpExt_TCPAckCompressed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPAutoCorking statistic TcpExtTCPAutoCorking.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPAutoCorking gauge
huatuo_bamai_netstat_container_TcpExt_TCPAutoCorking{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPBacklogCoalesce statistic TcpExtTCPBacklogCoalesce.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPBacklogCoalesce gauge
huatuo_bamai_netstat_container_TcpExt_TCPBacklogCoalesce{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3
# HELP huatuo_bamai_netstat_container_TcpExt_TCPBacklogDrop statistic TcpExtTCPBacklogDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPBacklogDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPBacklogDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPChallengeACK statistic TcpExtTCPChallengeACK.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPChallengeACK gauge
huatuo_bamai_netstat_container_TcpExt_TCPChallengeACK{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredDubious statistic TcpExtTCPDSACKIgnoredDubious.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredDubious gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredDubious{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredNoUndo statistic TcpExtTCPDSACKIgnoredNoUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredNoUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredNoUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredOld statistic TcpExtTCPDSACKIgnoredOld.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredOld gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKIgnoredOld{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoRecv statistic TcpExtTCPDSACKOfoRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoRecv gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoSent statistic TcpExtTCPDSACKOfoSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoSent gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKOfoSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKOldSent statistic TcpExtTCPDSACKOldSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKOldSent gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKOldSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecv statistic TcpExtTCPDSACKRecv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecv gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecvSegs statistic TcpExtTCPDSACKRecvSegs.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecvSegs gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKRecvSegs{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDSACKUndo statistic TcpExtTCPDSACKUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDSACKUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPDSACKUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDeferAcceptDrop statistic TcpExtTCPDeferAcceptDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDeferAcceptDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPDeferAcceptDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDelivered statistic TcpExtTCPDelivered.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDelivered gauge
huatuo_bamai_netstat_container_TcpExt_TCPDelivered{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3.28098e+06
# HELP huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE statistic TcpExtTCPDeliveredCE.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE gauge
huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActive statistic TcpExtTCPFastOpenActive.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActive gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActiveFail statistic TcpExtTCPFastOpenActiveFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActiveFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenActiveFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenBlackhole statistic TcpExtTCPFastOpenBlackhole.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenBlackhole gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenBlackhole{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenCookieReqd statistic TcpExtTCPFastOpenCookieReqd.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenCookieReqd gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenCookieReqd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenListenOverflow statistic TcpExtTCPFastOpenListenOverflow.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenListenOverflow gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenListenOverflow{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassive statistic TcpExtTCPFastOpenPassive.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassive gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveAltKey statistic TcpExtTCPFastOpenPassiveAltKey.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveAltKey gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveAltKey{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveFail statistic TcpExtTCPFastOpenPassiveFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastOpenPassiveFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFastRetrans statistic TcpExtTCPFastRetrans.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFastRetrans gauge
huatuo_bamai_netstat_container_TcpExt_TCPFastRetrans{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFromZeroWindowAdv statistic TcpExtTCPFromZeroWindowAdv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFromZeroWindowAdv gauge
huatuo_bamai_netstat_container_TcpExt_TCPFromZeroWindowAdv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPFullUndo statistic TcpExtTCPFullUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPFullUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPFullUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHPAcks statistic TcpExtTCPHPAcks.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHPAcks gauge
huatuo_bamai_netstat_container_TcpExt_TCPHPAcks{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 616667
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHPHits statistic TcpExtTCPHPHits.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHPHits gauge
huatuo_bamai_netstat_container_TcpExt_TCPHPHits{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 9913
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayCwnd statistic TcpExtTCPHystartDelayCwnd.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayCwnd gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayCwnd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayDetect statistic TcpExtTCPHystartDelayDetect.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayDetect gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartDelayDetect{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainCwnd statistic TcpExtTCPHystartTrainCwnd.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainCwnd gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainCwnd{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainDetect statistic TcpExtTCPHystartTrainDetect.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainDetect gauge
huatuo_bamai_netstat_container_TcpExt_TCPHystartTrainDetect{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPKeepAlive statistic TcpExtTCPKeepAlive.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPKeepAlive gauge
huatuo_bamai_netstat_container_TcpExt_TCPKeepAlive{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 20
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossFailures statistic TcpExtTCPLossFailures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossFailures gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossFailures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossProbeRecovery statistic TcpExtTCPLossProbeRecovery.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossProbeRecovery gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossProbeRecovery{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossProbes statistic TcpExtTCPLossProbes.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossProbes gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossProbes{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLossUndo statistic TcpExtTCPLossUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLossUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPLossUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPLostRetransmit statistic TcpExtTCPLostRetransmit.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPLostRetransmit gauge
huatuo_bamai_netstat_container_TcpExt_TCPLostRetransmit{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMD5Failure statistic TcpExtTCPMD5Failure.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMD5Failure gauge
huatuo_bamai_netstat_container_TcpExt_TCPMD5Failure{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMD5NotFound statistic TcpExtTCPMD5NotFound.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMD5NotFound gauge
huatuo_bamai_netstat_container_TcpExt_TCPMD5NotFound{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMD5Unexpected statistic TcpExtTCPMD5Unexpected.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMD5Unexpected gauge
huatuo_bamai_netstat_container_TcpExt_TCPMD5Unexpected{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMTUPFail statistic TcpExtTCPMTUPFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMTUPFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPMTUPFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMTUPSuccess statistic TcpExtTCPMTUPSuccess.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMTUPSuccess gauge
huatuo_bamai_netstat_container_TcpExt_TCPMTUPSuccess{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressures statistic TcpExtTCPMemoryPressures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressures gauge
huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressuresChrono statistic TcpExtTCPMemoryPressuresChrono.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressuresChrono gauge
huatuo_bamai_netstat_container_TcpExt_TCPMemoryPressuresChrono{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqFailure statistic TcpExtTCPMigrateReqFailure.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqFailure gauge
huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqFailure{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqSuccess statistic TcpExtTCPMigrateReqSuccess.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqSuccess gauge
huatuo_bamai_netstat_container_TcpExt_TCPMigrateReqSuccess{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPMinTTLDrop statistic TcpExtTCPMinTTLDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPMinTTLDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPMinTTLDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOFODrop statistic TcpExtTCPOFODrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOFODrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPOFODrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOFOMerge statistic TcpExtTCPOFOMerge.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOFOMerge gauge
huatuo_bamai_netstat_container_TcpExt_TCPOFOMerge{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOFOQueue statistic TcpExtTCPOFOQueue.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOFOQueue gauge
huatuo_bamai_netstat_container_TcpExt_TCPOFOQueue{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPOrigDataSent statistic TcpExtTCPOrigDataSent.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPOrigDataSent gauge
huatuo_bamai_netstat_container_TcpExt_TCPOrigDataSent{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2.675557e+06
# HELP huatuo_bamai_netstat_container_TcpExt_TCPPLBRehash statistic TcpExtTCPPLBRehash.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPPLBRehash gauge
huatuo_bamai_netstat_container_TcpExt_TCPPLBRehash{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPPartialUndo statistic TcpExtTCPPartialUndo.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPPartialUndo gauge
huatuo_bamai_netstat_container_TcpExt_TCPPartialUndo{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPPureAcks statistic TcpExtTCPPureAcks.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPPureAcks gauge
huatuo_bamai_netstat_container_TcpExt_TCPPureAcks{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2.095262e+06
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRcvCoalesce statistic TcpExtTCPRcvCoalesce.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRcvCoalesce gauge
huatuo_bamai_netstat_container_TcpExt_TCPRcvCoalesce{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 3
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRcvCollapsed statistic TcpExtTCPRcvCollapsed.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRcvCollapsed gauge
huatuo_bamai_netstat_container_TcpExt_TCPRcvCollapsed{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRcvQDrop statistic TcpExtTCPRcvQDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRcvQDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPRcvQDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoFailures statistic TcpExtTCPRenoFailures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoFailures gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoFailures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoRecovery statistic TcpExtTCPRenoRecovery.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoRecovery gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoRecovery{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoRecoveryFail statistic TcpExtTCPRenoRecoveryFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoRecoveryFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoRecoveryFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRenoReorder statistic TcpExtTCPRenoReorder.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRenoReorder gauge
huatuo_bamai_netstat_container_TcpExt_TCPRenoReorder{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDoCookies statistic TcpExtTCPReqQFullDoCookies.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDoCookies gauge
huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDoCookies{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDrop statistic TcpExtTCPReqQFullDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPReqQFullDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPRetransFail statistic TcpExtTCPRetransFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPRetransFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPRetransFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSACKDiscard statistic TcpExtTCPSACKDiscard.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSACKDiscard gauge
huatuo_bamai_netstat_container_TcpExt_TCPSACKDiscard{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSACKReneging statistic TcpExtTCPSACKReneging.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSACKReneging gauge
huatuo_bamai_netstat_container_TcpExt_TCPSACKReneging{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSACKReorder statistic TcpExtTCPSACKReorder.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSACKReorder gauge
huatuo_bamai_netstat_container_TcpExt_TCPSACKReorder{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSYNChallenge statistic TcpExtTCPSYNChallenge.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSYNChallenge gauge
huatuo_bamai_netstat_container_TcpExt_TCPSYNChallenge{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackFailures statistic TcpExtTCPSackFailures.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackFailures gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackFailures{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackMerged statistic TcpExtTCPSackMerged.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackMerged gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackMerged{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackRecovery statistic TcpExtTCPSackRecovery.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackRecovery gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackRecovery{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackRecoveryFail statistic TcpExtTCPSackRecoveryFail.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackRecoveryFail gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackRecoveryFail{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackShiftFallback statistic TcpExtTCPSackShiftFallback.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackShiftFallback gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackShiftFallback{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSackShifted statistic TcpExtTCPSackShifted.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSackShifted gauge
huatuo_bamai_netstat_container_TcpExt_TCPSackShifted{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSlowStartRetrans statistic TcpExtTCPSlowStartRetrans.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSlowStartRetrans gauge
huatuo_bamai_netstat_container_TcpExt_TCPSlowStartRetrans{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRTOs statistic TcpExtTCPSpuriousRTOs.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRTOs gauge
huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRTOs{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRtxHostQueues statistic TcpExtTCPSpuriousRtxHostQueues.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRtxHostQueues gauge
huatuo_bamai_netstat_container_TcpExt_TCPSpuriousRtxHostQueues{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPSynRetrans statistic TcpExtTCPSynRetrans.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPSynRetrans gauge
huatuo_bamai_netstat_container_TcpExt_TCPSynRetrans{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPTSReorder statistic TcpExtTCPTSReorder.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPTSReorder gauge
huatuo_bamai_netstat_container_TcpExt_TCPTSReorder{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPTimeWaitOverflow statistic TcpExtTCPTimeWaitOverflow.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPTimeWaitOverflow gauge
huatuo_bamai_netstat_container_TcpExt_TCPTimeWaitOverflow{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPTimeouts statistic TcpExtTCPTimeouts.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPTimeouts gauge
huatuo_bamai_netstat_container_TcpExt_TCPTimeouts{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPToZeroWindowAdv statistic TcpExtTCPToZeroWindowAdv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPToZeroWindowAdv gauge
huatuo_bamai_netstat_container_TcpExt_TCPToZeroWindowAdv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPWantZeroWindowAdv statistic TcpExtTCPWantZeroWindowAdv.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPWantZeroWindowAdv gauge
huatuo_bamai_netstat_container_TcpExt_TCPWantZeroWindowAdv{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPWinProbe statistic TcpExtTCPWinProbe.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPWinProbe gauge
huatuo_bamai_netstat_container_TcpExt_TCPWinProbe{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPWqueueTooBig statistic TcpExtTCPWqueueTooBig.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPWqueueTooBig gauge
huatuo_bamai_netstat_container_TcpExt_TCPWqueueTooBig{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TCPZeroWindowDrop statistic TcpExtTCPZeroWindowDrop.
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPZeroWindowDrop gauge
huatuo_bamai_netstat_container_TcpExt_TCPZeroWindowDrop{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TW statistic TcpExtTW.
# TYPE huatuo_bamai_netstat_container_TcpExt_TW gauge
huatuo_bamai_netstat_container_TcpExt_TW{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 720624
# HELP huatuo_bamai_netstat_container_TcpExt_TWKilled statistic TcpExtTWKilled.
# TYPE huatuo_bamai_netstat_container_TcpExt_TWKilled gauge
huatuo_bamai_netstat_container_TcpExt_TWKilled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TWRecycled statistic TcpExtTWRecycled.
# TYPE huatuo_bamai_netstat_container_TcpExt_TWRecycled gauge
huatuo_bamai_netstat_container_TcpExt_TWRecycled{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 2461
# HELP huatuo_bamai_netstat_container_TcpExt_TcpDuplicateDataRehash statistic TcpExtTcpDuplicateDataRehash.
# TYPE huatuo_bamai_netstat_container_TcpExt_TcpDuplicateDataRehash gauge
huatuo_bamai_netstat_container_TcpExt_TcpDuplicateDataRehash{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_netstat_container_TcpExt_TcpTimeoutRehash statistic TcpExtTcpTimeoutRehash.
# TYPE huatuo_bamai_netstat_container_TcpExt_TcpTimeoutRehash gauge
huatuo_bamai_netstat_container_TcpExt_TcpTimeoutRehash{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
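The output above follows the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines, then one sample line per metric with a label set and a value. When inspecting such a dump by hand, it can help to filter out the many zero-valued counters. The following is a minimal sketch of that idea in Python, not part of huatuo-bamai; the helper names `parse_sample` and `nonzero_samples` are hypothetical, and the parser assumes sample lines carry no trailing timestamp, as in the output shown here.

```python
import re

# One sample line: metric name, optional {label set}, then the value.
# Assumption: no trailing timestamp, matching the output shown above.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value), or None for comments, blanks, and unparsable lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    name, raw_labels, raw_value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    try:
        value = float(raw_value)
    except ValueError:
        return None
    return name, labels, value


def nonzero_samples(text):
    """Collect every sample in a scrape whose value is not zero."""
    parsed = (parse_sample(line) for line in text.splitlines())
    return [sample for sample in parsed if sample is not None and sample[2] != 0]


# Shortened label set for illustration; real lines carry all seven labels.
scrape = '''\
# TYPE huatuo_bamai_netstat_container_TcpExt_TCPDelivered gauge
huatuo_bamai_netstat_container_TcpExt_TCPDelivered{container_name="coredns",host="hostname"} 3.28098e+06
huatuo_bamai_netstat_container_TcpExt_TCPDeliveredCE{container_name="coredns",host="hostname"} 0
'''

for name, labels, value in nonzero_samples(scrape):
    print(name, labels["container_name"], value)
```

For anything beyond ad hoc inspection, a maintained parser for the exposition format is likely a better choice than this regular-expression sketch.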
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| netstat_TcpExt_ArpFilter | Packets dropped because of ARP filtering rules | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_BusyPollRxPackets | Packets received through the busy-polling mechanism | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_DelayedACKLocked | Times a delayed ACK could not be sent because a user-space process held the socket lock | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_DelayedACKLost | Times an already-acknowledged segment was received again, suggesting a delayed ACK was lost and the peer retransmitted | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_DelayedACKs | Attempts to send a delayed ACK, including unsuccessful ones | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_EmbryonicRsts | Packets carrying the RST or SYN flag received in the SYN_RECV state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_ListenDrops | Total connections dropped by listening sockets, typically because the accept queue was full (includes ListenOverflows) | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_ListenOverflows | Times the accept queue of a listening socket overflowed | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_OfoPruned | Times the out-of-order queue was pruned because of insufficient memory | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_OutOfWindowIcmps | ICMP error messages received that do not match the current TCP window | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_PruneCalled | Times buffer pruning was triggered because of insufficient memory | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_RcvPruned | Times the receive queue was pruned (packets dropped) because of insufficient memory | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_SyncookiesFailed | SYN cookies that failed validation | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_SyncookiesRecv | SYN cookies received | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_SyncookiesSent | SYN cookies sent | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPACKSkippedChallenge | ACKs skipped while handling challenge ACKs | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPACKSkippedFinWait2 | ACKs skipped in the FIN-WAIT-2 state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPACKSkippedPAWS | ACKs skipped because the PAWS check failed | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPACKSkippedSeq | ACKs skipped because of the sequence number check | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPACKSkippedTimeWait | ACKs skipped in the TIME-WAIT state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPAbortOnClose | Times a user-space program closed a connection while data was still in the buffer | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPAbortOnData | Connections closed because unexpected data was received | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPAbortOnLinger | Connections aborted after the wait in the LINGER state timed out | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPAbortOnMemory | Connections closed because of memory problems | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPAbortOnTimeout | Connections closed because a timer's retransmission count exceeded its limit | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPLossFailures | Failed attempts to recover from packet loss | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPLossProbeRecovery | Packet losses detected and repaired by a tail loss probe (TLP) | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPLossProbes | Tail loss probe (TLP) packets sent; a rising value often points to packet loss or congestion | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPLossUndo | Times a loss recovery was undone | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| netstat_TcpExt_TCPLostRetransmit | Retransmitted packets that were themselves lost | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
Note: TcpExt exposes many more extended counters than are listed here; consult the official documentation as needed.
Socket
# HELP huatuo_bamai_sockstat_container_FRAG_inuse Number of FRAG sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_FRAG_inuse gauge
huatuo_bamai_sockstat_container_FRAG_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_FRAG_memory Number of FRAG sockets in state memory.
# TYPE huatuo_bamai_sockstat_container_FRAG_memory gauge
huatuo_bamai_sockstat_container_FRAG_memory{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_RAW_inuse Number of RAW sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_RAW_inuse gauge
huatuo_bamai_sockstat_container_RAW_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_TCP_alloc Number of TCP sockets in state alloc.
# TYPE huatuo_bamai_sockstat_container_TCP_alloc gauge
huatuo_bamai_sockstat_container_TCP_alloc{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 171
# HELP huatuo_bamai_sockstat_container_TCP_inuse Number of TCP sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_TCP_inuse gauge
huatuo_bamai_sockstat_container_TCP_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 1
# HELP huatuo_bamai_sockstat_container_TCP_orphan Number of TCP sockets in state orphan.
# TYPE huatuo_bamai_sockstat_container_TCP_orphan gauge
huatuo_bamai_sockstat_container_TCP_orphan{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_TCP_tw Number of TCP sockets in state tw.
# TYPE huatuo_bamai_sockstat_container_TCP_tw gauge
huatuo_bamai_sockstat_container_TCP_tw{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 75
# HELP huatuo_bamai_sockstat_container_UDPLITE_inuse Number of UDPLITE sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_UDPLITE_inuse gauge
huatuo_bamai_sockstat_container_UDPLITE_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_UDP_inuse Number of UDP sockets in state inuse.
# TYPE huatuo_bamai_sockstat_container_UDP_inuse gauge
huatuo_bamai_sockstat_container_UDP_inuse{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 0
# HELP huatuo_bamai_sockstat_container_sockets_used Number of IPv4 sockets in use.
# TYPE huatuo_bamai_sockstat_container_sockets_used gauge
huatuo_bamai_sockstat_container_sockets_used{container_host="coredns-855c4dd65d-8v5kg",container_hostnamespace="kube-system",container_level="burstable",container_name="coredns",container_type="normal",host="hostname",region="dev"} 7
# HELP huatuo_bamai_sockstat_sockets_used Number of IPv4 sockets in use.
# TYPE huatuo_bamai_sockstat_sockets_used gauge
huatuo_bamai_sockstat_sockets_used{host="hostname",region="dev"} 409
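| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| sockstat_sockets_used | Total number of sockets currently in use system-wide | count | system | host, region |
| sockstat_TCP_inuse | TCP sockets currently in a connection state (ESTABLISHED, LISTEN, and so on, excluding TIME_WAIT) | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| sockstat_TCP_orphan | Orphaned TCP sockets, usually connections the application has closed but that TCP has not finished tearing down | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| sockstat_TCP_tw | TCP sockets currently in the TIME_WAIT state | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| sockstat_TCP_alloc | TCP socket objects currently allocated | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |
| sockstat_TCP_mem | Kernel memory pages currently used by TCP sockets | pages | system | host, region |
| sockstat_UDP_inuse | UDP sockets currently bound to a local port | count | host, container | container_host, container_hostnamespace, container_level, container_name, container_type, host, region |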
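The sockstat_* metric names mirror the protocol and field names of the kernel's /proc/net/sockstat file. The sketch below assumes (this is not confirmed by the text above) that the values are read from a file in that format; `parse_sockstat` and `SAMPLE` are hypothetical names used only for illustration.

```python
def parse_sockstat(text: str) -> dict:
    """Parse /proc/net/sockstat-style text into {protocol: {field: value}}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # Fields are alternating name/value pairs, e.g. "inuse 1 orphan 0".
        stats[proto.strip()] = {
            name: int(value) for name, value in zip(fields[0::2], fields[1::2])
        }
    return stats


# Illustrative input in the kernel's format; the numbers are made up
# to match the sample output above where possible.
SAMPLE = """sockets: used 409
TCP: inuse 1 orphan 0 tw 75 alloc 171 mem 3
UDP: inuse 0 mem 2
"""

stats = parse_sockstat(SAMPLE)
print(stats["TCP"]["tw"], stats["sockets"]["used"])
```

Each `{protocol}_{field}` pair in the result corresponds to one metric name, for example `TCP` and `tw` to sockstat_TCP_tw.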
IO
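iolatency measures the distribution of disk I/O latency. Think of it as splitting one disk request into stages and timing each stage separately.
q2c: from the request entering the queue to completion; reflects the latency of the whole I/O lifecycle
d2c: from dispatch to the driver to completion; closer to the time spent in the disk and the driver themselves
freeze: number of disk freeze events
Queue
All of these metrics automatically carry the common labels host and region. Container-level metrics additionally always carry the container_host, container_name, container_type, container_level, and container_hostnamespace labels.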
# HELP huatuo_bamai_iolatency_blkdisk_d2c the disk d2c latency
# TYPE huatuo_bamai_iolatency_blkdisk_d2c gauge
huatuo_bamai_iolatency_blkdisk_d2c{disk="253:1",host="hostname",region="dev",zone="0"} 3
# HELP huatuo_bamai_iolatency_blkdisk_q2c the disk q2c latency
# TYPE huatuo_bamai_iolatency_blkdisk_q2c gauge
huatuo_bamai_iolatency_blkdisk_q2c{disk="253:1",host="hostname",region="dev",zone="0"} 3
# HELP huatuo_bamai_iolatency_container_blkdisk_d2c container blkio d2c latency
# TYPE huatuo_bamai_iolatency_container_blkdisk_d2c gauge
huatuo_bamai_iolatency_container_blkdisk_d2c{container_host="etcd-hostname",container_hostnamespace="kube-system",container_level="burstable",container_name="etcd",container_type="normal",disk="253:1",host="hostname",region="dev",zone="5"} 2
# HELP huatuo_bamai_iolatency_container_blkdisk_q2c container blkio q2c latency
# TYPE huatuo_bamai_iolatency_container_blkdisk_q2c gauge
huatuo_bamai_iolatency_container_blkdisk_q2c{container_host="etcd-hostname",container_hostnamespace="kube-system",container_level="burstable",container_name="etcd",container_type="normal",disk="253:1",host="hostname",region="dev",zone="5"} 2
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| iolatency_blkdisk_q2c | Host disk latency over the whole I/O lifecycle, from enqueue to completion. Buckets: zone0 20-30 ms, zone1 30-50 ms, zone2 50-100 ms, zone3 100-200 ms, zone4 200-400 ms, zone5 400 ms+ | count | host | host, region, disk, zone |
| iolatency_blkdisk_d2c | Host disk latency from driver dispatch to completion, closer to the device's own processing time. Buckets: zone0 20-30 ms, zone1 30-50 ms, zone2 50-100 ms, zone3 100-200 ms, zone4 200-400 ms, zone5 400 ms+ | count | host | host, region, disk, zone |
| iolatency_container_blkdisk_q2c | Latency over the whole I/O lifecycle for I/O issued by a container, from enqueue to completion. Buckets: zone0 20-30 ms, zone1 30-50 ms, zone2 50-100 ms, zone3 100-200 ms, zone4 200-400 ms, zone5 400 ms+ | count | container | host, region, container_host, container_name, container_type, container_level, container_hostnamespace, disk, zone |
| iolatency_container_blkdisk_d2c | Latency from driver dispatch to completion for I/O issued by a container. Buckets: zone0 20-30 ms, zone1 30-50 ms, zone2 50-100 ms, zone3 100-200 ms, zone4 200-400 ms, zone5 400 ms+ | count | container | host, region, container_host, container_name, container_type, container_level, container_hostnamespace, disk, zone |
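The zone label can be read as a histogram bucket index. The following is a minimal sketch of that mapping, assuming each bucket includes its lower bound and that requests faster than 20 ms are not counted at all (neither detail is stated in the table); `latency_zone` is a hypothetical helper, not part of HUATUO.

```python
import bisect

# Lower bounds in milliseconds of zone0..zone5, taken from the table.
ZONE_LOWER_BOUNDS_MS = [20, 30, 50, 100, 200, 400]


def latency_zone(latency_ms: float):
    """Return the zone index for a latency, or None if it is below zone0."""
    # bisect_right counts how many lower bounds are <= latency_ms.
    idx = bisect.bisect_right(ZONE_LOWER_BOUNDS_MS, latency_ms) - 1
    return idx if idx >= 0 else None


for ms in (10, 25, 30, 399, 1200):
    print(ms, latency_zone(ms))
```

Under these assumptions, a sample such as zone="5" with value 2 means two requests took 400 ms or longer.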
Hardware
# HELP huatuo_bamai_iolatency_blkdisk_freeze the disk freeze event count
# TYPE huatuo_bamai_iolatency_blkdisk_freeze gauge
huatuo_bamai_iolatency_blkdisk_freeze{disk="253:1",host="hostname",region="dev"} 0
| Metric | Meaning | Unit | Object | Labels |
| --- | --- | --- | --- | --- |
| iolatency_blkdisk_freeze | Number of freeze events on a host disk | count | host | host, region, disk |

General system
Soft Lockup
# HELP huatuo_bamai_softlockup_total softlockup counter
# TYPE huatuo_bamai_softlockup_total counter
huatuo_bamai_softlockup_total{host="hostname",region="dev"} 0
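| Metric | Meaning | Unit | Object | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| softlockup_total | Count of system softlockup events | count | physical host | BPF | host, region |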
HungTask
# HELP huatuo_bamai_hungtask_total hungtask counter
# TYPE huatuo_bamai_hungtask_total counter
huatuo_bamai_hungtask_total{host="hostname",region="dev"} 0
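| Metric | Meaning | Unit | Object | Source | Labels |
| --- | --- | --- | --- | --- | --- |
| hungtask_total | Count of system hungtask events | count | physical host | BPF | host, region |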
GPU
GPU platforms supported in the current version:

| Metric | Description | Unit | Dimensions | Source |
| --- | --- | --- | --- | --- |
| metax_gpu_sdk_info | GPU SDK information | - | version | sml.GetSDKVersion |
| metax_gpu_driver_info | GPU driver information | - | version | sml.GetGPUVersion with driver unit |
| metax_gpu_info | Basic GPU information | - | gpu | |
| metax_gpu_board_power_watts | GPU board-level power draw | watts (W) | gpu | sml.ListGPUBoardWayElectricInfos |
| metax_gpu_pcie_link_speed_gt_per_second | Current GPU PCIe link speed | GT/s | gpu | sml.GetGPUPcieLinkInfo |
| metax_gpu_pcie_link_width_lanes | Current GPU PCIe link width | lanes | gpu | sml.GetGPUPcieLinkInfo |
| metax_gpu_pcie_receive_bytes_per_second | GPU PCIe receive throughput | bytes/s | gpu | sml.GetGPUPcieThroughputInfo |
| metax_gpu_pcie_transmit_bytes_per_second | GPU PCIe transmit throughput | bytes/s | gpu | sml.GetGPUPcieThroughputInfo |
| metax_gpu_metaxlink_link_speed_gt_per_second | Current GPU MetaXLink link speed | GT/s | gpu, metaxlink | sml.ListGPUMetaXLinkLinkInfos |
| metax_gpu_metaxlink_link_width_lanes | Current GPU MetaXLink link width | lanes | gpu, metaxlink | sml.ListGPUMetaXLinkLinkInfos |
| metax_gpu_metaxlink_receive_bytes_per_second | GPU MetaXLink receive throughput | bytes/s | gpu, metaxlink | sml.ListGPUMetaXLinkThroughputInfos |
| metax_gpu_metaxlink_transmit_bytes_per_second | GPU MetaXLink transmit throughput | bytes/s | gpu, metaxlink | sml.ListGPUMetaXLinkThroughputInfos |
| metax_gpu_metaxlink_receive_bytes_total | Total data received over GPU MetaXLink | bytes | gpu, metaxlink | sml.ListGPUMetaXLinkTrafficStatInfos |
| metax_gpu_metaxlink_transmit_bytes_total | Total data transmitted over GPU MetaXLink | bytes | gpu, metaxlink | sml.ListGPUMetaXLinkTrafficStatInfos |
| metax_gpu_metaxlink_aer_errors_total | GPU MetaXLink AER error count | count | gpu, metaxlink, error_type | sml.ListGPUMetaXLinkAerErrorsInfos |
| metax_gpu_status | GPU status | - | gpu, die | sml.GetDieStatus |
| metax_gpu_temperature_celsius | GPU temperature | °C | gpu, die | sml.GetDieTemperature |
| metax_gpu_utilization_percent | GPU utilization (0-100) | % | gpu, die, ip | sml.GetDieUtilization |
| metax_gpu_memory_total_bytes | Total GPU memory capacity | bytes | gpu, die | sml.GetDieMemoryInfo |
| metax_gpu_memory_used_bytes | GPU memory in use | bytes | gpu, die | sml.GetDieMemoryInfo |
| metax_gpu_clock_mhz | GPU clock frequency | MHz | gpu, die, ip | sml.ListDieClocks |
| metax_gpu_clocks_throttling | Reason for GPU clock throttling | - | gpu, die, reason | sml.GetDieClocksThrottleStatus |
| metax_gpu_dpm_performance_level | GPU DPM performance level | - | gpu, die, ip | sml.GetDieDPMPerformanceLevel |
| metax_gpu_ecc_memory_errors_total | GPU ECC memory error count | count | gpu, die, memory_type, error_type | sml.GetDieECCMemoryInfo |
| metax_gpu_ecc_memory_retired_pages_total | GPU ECC retired memory pages | count | gpu, die | sml.GetDieECCMemoryInfo |
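5.2 - Anomaly event diagnosis
🎯 About HUATUO (华佗)
HUATUO (华佗) is an operating-system deep-observability project open-sourced by DiDi and incubated under the CCF (China Computer Federation). It focuses on kernel-level deep observability for cloud-native general-purpose computing, AI computing, cloud services, and infrastructure services.
📖 Overview
HUATUO uses eBPF to observe anomaly events in real time across core Linux kernel subsystems: CPU scheduling, the memory subsystem, the network protocol stack, and hardware errors. When the kernel enters an abnormal state such as a softlockup, an OOM, or a hardware MCE, eBPF programs attached to kernel functions (kprobe) or kernel tracepoints capture on-site data (process information, kernel call stacks, network context) the moment the event occurs. The data is passed to the user-space handler through the perf event ring buffer and finally persisted to Elasticsearch or to a file on local disk.
Compared with traditional collection based on kernel logs (dmesg/syslog), eBPF event observation carries a lower risk of data loss, because key events are not lost when the kernel log buffer overflows. It can also capture transient anomalies that are never written to the kernel log (for example, softirqs staying disabled for too long), and it attaches container-level correlation to each event, which supports precise localization in cloud-native environments.
Eleven event types are currently observed continuously, covering CPU scheduling health (softirq_tracing, softlockup, hungtask), memory pressure (oom, memory_reclaim_events), the network protocol stack (dropwatch, net_rx_latency, netdev_events, netdev_bonding_lacp, netdev_txqueue_timeout), and hardware reliability (ras).
🎯 Scenarios
Locating Kubernetes container memory faults: when containers restart frequently because of OOM, the oom event records the memcg cgroup pointer and container ID of both the process killed by the OOM killer (victim) and the process that triggered the OOM (trigger). Combined with time-series data, this quickly identifies the container at the root of the memory contention and cuts the time spent combing through container logs by hand.
Detecting hardware faults in AI training clusters: on GPU training servers, the ras event continuously collects MCE (Machine Check Exception) errors, EDAC memory controller errors, and PCIe AER (Advanced Error Reporting) errors, and grades them by severity (Corrected / UncorrectedRecoverable / UncorrectedFatal). Hardware aging or a single point of failure can then be noticed before a training job is interrupted, reducing the training work lost to hardware faults.
Analyzing network performance glitches: dropwatch observes packet drops in the TCP protocol stack (including types such as syn_flood and listen_overflow). net_rx_latency measures the latency of a single packet along the full receive path from the NIC driver to user space and applies one threshold per checkpoint, each measured from the NIC driver: to __netif_receive_skb, to tcp_v4_rcv, and to the user-space copy (defaults 5 ms / 10 ms / 115 ms). This pinpoints the network layer responsible for an application timeout.
Observing host scheduling health: softirq_tracing (softirq-disabled time, default threshold 10 ms), softlockup (CPU unable to schedule, about 1 second), and hungtask (tasks hung in D state) together cover abnormal states on the CPU scheduling path. When the system stalls or responses time out, diagnostic data such as kernel call stacks is retained automatically, so the fault can be analyzed offline after it has disappeared.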
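🚀 Usage
Configuration parameters
Each event can be tuned with the parameters below. All of them have defaults, so HUATUO runs without any configuration:

| Parameter | Default | Description |
| --- | --- | --- |
| softirq.disabled_threshold | 10000000 (10 ms, in nanoseconds) | Trigger threshold for softirq-disabled time |
| memory_reclaim.blocked_threshold | 900000000 (900 ms, in nanoseconds) | Trigger threshold for direct memory reclaim time |
| net_rx_latency.driver2net_rx | 5 (ms) | Latency threshold from the NIC driver to __netif_receive_skb |
| net_rx_latency.driver2tcp | 10 (ms) | Latency threshold from the NIC driver to tcp_v4_rcv |
| net_rx_latency.driver2userspace | 115 (ms) | Latency threshold from the NIC driver to the user-space copy (skb_copy_datagram_iovec) |
| net_rx_latency.excluded_host_netnamespace | true | Whether to filter out the host network namespace (by default only containers are observed) |
| net_rx_latency.excluded_container_qos | [] | List of container QoS levels to exclude |
| dropwatch.excluded_neigh_invalidate | true | Whether to filter neighbor-table drop noise caused by neigh_invalidate |
| netdev.device_list | [] | List of NIC device names whose link state is monitored |
| ras.mce_thr_backoff | 1800 (seconds) | Reporting cooldown for MCE threshold interrupt (THR) events, to prevent interrupt storms |
| issues_list | [] | List of known-issue filter rules (used by net_rx_latency) |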
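Event list

| Event name (tracer_name) | Probe type | Trigger condition | Typical scenario |
| --- | --- | --- | --- |
| softirq_tracing | kprobe | Softirq-disabled time > threshold (default 10 ms) | System stalls, network latency, scheduling latency |
| softlockup | kprobe | CPU unable to schedule for a long time (about 1 second) | Soft lockups, abnormal responsiveness |
| hungtask | kprobe | Tasks hung in D state | Sudden bursts of D-state processes, I/O blocking |
| oom | kprobe | OOM killer triggered | Container or host memory exhaustion |
| memory_reclaim_events | kprobe | Direct reclaim time of a container process > threshold (default 900 ms) | Application stalls caused by memory pressure |
| ras | tracepoint | CPU/MEM/PCIe hardware errors | Hardware fault detection |
| dropwatch | kprobe | Packet drops in the TCP protocol stack | Application glitches caused by protocol-stack drops |
| net_rx_latency | kprobe | Protocol-stack receive latency exceeds a per-stage threshold | Application timeouts caused by receive latency |
| netdev_events | netlink | NIC link state change | NIC physical link failure |
| netdev_bonding_lacp | kprobe | LACP protocol state change (802.3ad mode only) | Deciding whether a fault lies with the physical host or the switch |
| netdev_txqueue_timeout | kprobe | NIC transmit queue timeout | NIC transmit queue hardware failure |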
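Common fields
Every event contains the following common fields:
- hostname: hostname of the physical host
- region: availability zone of the physical host
- uploaded_time: time the data was uploaded
- container_id: container ID, recorded when the event is associated with a container
- container_hostname: container hostname, recorded when the event is associated with a container
- container_host_namespace: Kubernetes namespace of the container, recorded when the event is associated with a container
- container_type: container type, for example normal for a regular container or sidecar for a sidecar container
- container_qos: container QoS level
- tracer_name: event name (such as softirq_tracing or oom)
- tracer_id: ID of this tracing run
- tracer_time: time the tracing was triggered
- tracer_type: trigger type (manual or automatic)
- tracer_data: event-specific private data (see each event below)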
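1. softirq_tracing (softirqs disabled)
Description: Triggers when the kernel keeps softirqs disabled for too long. It records the kernel call stack during the disabled period, the current process information, and other key data to help analyze interrupt-related latency. A filter automatically excludes noise events produced by the ksoftirqd and swapper processes.
Data storage: Event data is stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data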
{
"uploaded_time": "2025-06-11T16:05:16.251152703+08:00",
"hostname": "***",
"tracer_data": {
"offtime": 237328905,
"threshold": 10000000,
"comm": "***-agent",
"pid": 688073,
"cpu": 1,
"now": 5532940660025295,
"stack": "scheduler_tick/..."
},
"tracer_time": "2025-06-11 16:05:16.251 +0800",
"tracer_type": "auto",
"time": "2025-06-11 16:05:16.251 +0800",
"region": "***",
"tracer_name": "softirq_tracing"
}
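Field descriptions
- comm: name of the process that triggered the event
- stack: kernel call stack while softirqs were disabled
- now: monotonic clock timestamp when the event occurred (nanoseconds)
- offtime: how long softirqs stayed disabled (nanoseconds)
- cpu: number of the CPU on which the event occurred
- threshold: trigger threshold (nanoseconds); the event is recorded when offtime exceeds it
- pid: ID of the process that triggered the event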
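Because offtime and threshold are both in nanoseconds, a small conversion makes an event easier to read. The sketch below works on the sample event shown above; `summarize_softirq_event` is a hypothetical helper written for this illustration and is not part of HUATUO.

```python
def summarize_softirq_event(event: dict) -> str:
    """Render a softirq_tracing event as a one-line, millisecond-based summary."""
    data = event["tracer_data"]
    off_ms = data["offtime"] / 1_000_000      # nanoseconds -> milliseconds
    thr_ms = data["threshold"] / 1_000_000
    return (
        f'{data["comm"]} (pid {data["pid"]}) kept softirqs disabled for '
        f'{off_ms:.1f} ms on CPU {data["cpu"]} (threshold {thr_ms:.0f} ms)'
    )


# Fields copied from the sample event above.
sample_event = {
    "tracer_name": "softirq_tracing",
    "tracer_data": {
        "offtime": 237328905,
        "threshold": 10000000,
        "comm": "***-agent",
        "pid": 688073,
        "cpu": 1,
    },
}
print(summarize_softirq_event(sample_event))
```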
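2. dropwatch (protocol-stack packet drops)
Description: Detects packet drops in the kernel network protocol stack and outputs the kernel call stack, network 5-tuple, TCP state, and related information at the time of the drop. Four drop types are recognized: common_drop (generic drop), syn_flood (SYN flood), listen_overflow_handshake1 (half-open connection queue overflow), and listen_overflow_handshake3 (accept queue overflow). By default, filters exclude known noise such as neighbor-table expiry drops from neigh_invalidate and transmit-side drops from the bnxt driver.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data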
{
"tracer_data": {
"type": "common_drop",
"comm": "kubelet",
"pid": 1687046,
"saddr": "10.79.68.62",
"daddr": "10.134.72.4",
"sport": 8080,
"dport": 49000,
"src_hostname": "<nil>",
"dest_hostname": "<nil>",
"max_ack_backlog": 128,
"seq": 1009085774,
"ack_seq": 689410995,
"pkt_len": 1460,
"sk_state": "ESTABLISHED",
"stack": "kfree_skb/...",
"netdev_queue_mapping": 3,
"netdev_linkstatus": ["linkStatusUp"],
"netdev_name": "eth0",
"netdev_ifindex": 2,
"net_cookie": 123456789
}
}
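Field descriptions
- type: drop type (common_drop / syn_flood / listen_overflow_handshake1 / listen_overflow_handshake3)
- comm: name of the process that triggered the drop
- pid: process ID
- saddr / daddr: source / destination IP address
- sport / dport: source / destination port
- src_hostname / dest_hostname: reverse DNS lookup result for the source / destination IP
- max_ack_backlog: maximum length of the socket's accept queue
- seq / ack_seq: TCP sequence number / acknowledgment number
- pkt_len: packet length (bytes)
- sk_state: TCP connection state at the time of the drop
- stack: kernel call stack at the time of the drop
- netdev_queue_mapping: NIC queue index
- netdev_linkstatus: list of NIC link status flags
- netdev_name: NIC device name
- netdev_ifindex: NIC interface index
- net_cookie: network namespace identifier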
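3. net_rx_latency (protocol-stack latency)
Description: Detects per-stage latency events in the receive direction of the protocol stack (NIC driver → kernel protocol stack → active receive by user space). Three observation points are placed on the receive path, and an event fires when the latency at any of them exceeds its threshold. Per the parameter table, each default is measured from the NIC driver: 5 ms to __netif_receive_skb, 10 ms to tcp_v4_rcv, and 115 ms to the user-space copy. The event records the network 5-tuple, TCP sequence numbers, where the delay occurred, and how long it was. By default the host network namespace is filtered out and only container networks are observed.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data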
{
"tracer_data": {
"comm": "nginx",
"pid": 2921092,
"where": "TO_USER_COPY",
"latency_ms": 95973,
"state": "ESTABLISHED",
"saddr": "10.156.248.76",
"daddr": "10.134.72.4",
"sport": 9213,
"dport": 49000,
"seq": 1009085774,
"ack_seq": 689410995,
"pkt_len": 26064
}
}
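Field descriptions
- comm: name of the process that triggered the event
- pid: ID of the process that triggered the event
- saddr / daddr: source / destination IP address
- sport / dport: source / destination port
- seq / ack_seq: TCP sequence number / acknowledgment number
- state: TCP connection state (such as ESTABLISHED)
- pkt_len: packet length (bytes)
- where: stage at which the delay occurred (TO_NETIF_RCV NIC to kernel / TO_TCPV4_RCV kernel to TCP / TO_USER_COPY TCP to user space)
- latency_ms: measured latency (milliseconds)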
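To relate an event back to the configured thresholds, the where value can be mapped to its default. The mapping below is inferred from the parameter names (driver2net_rx, driver2tcp, driver2userspace) and is an assumption, as is the strict greater-than comparison; `exceeds_threshold` is a hypothetical helper.

```python
# Assumed mapping from the "where" field to the documented default thresholds (ms).
DEFAULT_THRESHOLDS_MS = {
    "TO_NETIF_RCV": 5,     # net_rx_latency.driver2net_rx
    "TO_TCPV4_RCV": 10,    # net_rx_latency.driver2tcp
    "TO_USER_COPY": 115,   # net_rx_latency.driver2userspace
}


def exceeds_threshold(tracer_data: dict, thresholds: dict = DEFAULT_THRESHOLDS_MS) -> bool:
    """Return True when latency_ms is above the threshold for its stage."""
    return tracer_data["latency_ms"] > thresholds[tracer_data["where"]]


# Fields copied from the sample event above.
sample = {"where": "TO_USER_COPY", "latency_ms": 95973}
print(exceeds_threshold(sample))
```

Passing a different `thresholds` dictionary models a deployment that overrides the defaults.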
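4. oom (out of memory)
Description: Detects OOM (Out of Memory) events on the host or inside containers. It records the process killed by the OOM killer (victim) and the process that triggered the OOM (trigger), together with details of the corresponding containers and memory cgroups, giving a complete snapshot of the fault. OOM counter metrics are also maintained for the host and for each container.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data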
{
"tracer_data": {
"trigger_memcg_css": "0xff4b8d8be3818000",
"trigger_container_id": "***",
"trigger_container_hostname": "***.docker",
"trigger_pid": 3218804,
"trigger_process_name": "java",
"victim_memcg_css": "0xff4b8d8be3818000",
"victim_container_id": "***",
"victim_container_hostname": "***.docker",
"victim_pid": 3218745,
"victim_process_name": "java"
}
}
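Field descriptions
- victim_process_name / victim_pid: name and PID of the process killed by the OOM killer
- victim_container_hostname / victim_container_id: hostname and ID of the container that held the killed process
- victim_memcg_css: memory cgroup pointer of the killed process (hexadecimal)
- trigger_process_name / trigger_pid: name and PID of the process that triggered the OOM
- trigger_container_hostname / trigger_container_id: hostname and ID of the container that held the triggering process
- trigger_memcg_css: memory cgroup pointer of the triggering process (hexadecimal)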
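One quick triage step is to compare the two memcg pointers: if they match, the trigger and the victim belong to the same memory cgroup. This is a minimal sketch of that comparison; the interpretation of a mismatch as cross-cgroup pressure is an assumption, and `oom_scope` is a hypothetical helper.

```python
def oom_scope(tracer_data: dict) -> str:
    """Classify an oom event by whether trigger and victim share a memory cgroup."""
    same = tracer_data["trigger_memcg_css"] == tracer_data["victim_memcg_css"]
    return "same-cgroup" if same else "cross-cgroup"


# Pointer values copied from the sample event above.
sample = {
    "trigger_memcg_css": "0xff4b8d8be3818000",
    "victim_memcg_css": "0xff4b8d8be3818000",
}
print(oom_scope(sample))
```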
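5. softlockup (soft lockup)
Description: Detects system softlockup events (a CPU that cannot schedule for a long time, about 1 second) and provides information on the process that caused the lockup, the CPU it ran on, and NMI backtraces of all CPUs. A backoff strategy is applied: within one event storm, the reporting interval grows from 10 minutes up to a maximum of 3 hours, to prevent duplicate reports. A counter metric of softlockup occurrences is also maintained.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data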
{
"tracer_data": {
"cpu": 15,
"pid": 12345,
"comm": "kworker/15:0",
"cpus_stack": "2025-06-10 14:30:22 sysrq: Show backtrace of all active CPUs\nNMI backtrace for cpu 15\n..."
}
}
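Field descriptions
- cpu: number of the CPU on which the softlockup occurred
- pid: PID of the process that triggered the softlockup
- comm: name of the process that triggered the softlockup
- cpus_stack: NMI backtraces of all CPUs (multi-line text with timestamps and call stacks)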
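The backoff described for softlockup reporting starts at 10 minutes and is capped at 3 hours. The exact growth rule is not given here, so the sketch below assumes the interval doubles each time; `backoff_intervals` and its parameters are hypothetical.

```python
from itertools import islice


def backoff_intervals(initial_min: int = 10, max_min: int = 180, factor: int = 2):
    """Yield successive reporting intervals in minutes, capped at max_min.

    The doubling factor is an assumption; only the 10-minute start and
    the 3-hour cap come from the description.
    """
    interval = initial_min
    while True:
        yield interval
        interval = min(interval * factor, max_min)


print(list(islice(backoff_intervals(), 7)))
```

With doubling, the interval reaches the 3-hour cap on the sixth report and stays there for the rest of the storm.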
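6. hungtask (hung tasks)
Description: Detects system hungtask events and captures the kernel stacks of all processes currently in D state (uninterruptible sleep) along with backtraces of all CPUs, preserving the scene of the fault. A backoff strategy is applied: within one event storm, the reporting interval grows from 10 minutes up to a maximum of 3 hours. A counter metric of hungtask occurrences is also maintained. Note: some Linux distributions (such as Fedora 42) disable hungtask detection by default, in which case this observer does not start.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data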
{
"tracer_data": {
"pid": 2567042,
"comm": "kworker/u48:2",
"cpus_stack": "2025-06-10 09:57:14 sysrq: Show backtrace of all active CPUs\nNMI backtrace for cpu 33\n...",
"blocked_processes_stack": "task:java state:D stack: 0 pid: 12345 ..."
}
}
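Field descriptions
- pid: PID of the process that triggered hungtask detection
- comm: name of the process that triggered hungtask detection
- cpus_stack: NMI backtraces of all CPUs (multi-line text with timestamps and call stacks)
- blocked_processes_stack: kernel stacks of the D-state processes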
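7. memory_reclaim_events (memory reclaim)
Description: Detects direct memory reclaim (direct reclaim) by container processes. An event fires when the same process spends more than the threshold (default 900 ms) in direct reclaim within 1 second; it records the reclaim time and the process and container information. Note: this observer records reclaim events for container processes only; events from host processes are filtered out.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data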
{
"tracer_data": {
"pid": 1896137,
"comm": "java",
"deltatime": 1412702917
}
}
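Field descriptions
- comm: name of the process that triggered direct memory reclaim
- pid: PID of the triggering process
- deltatime: time spent in direct reclaim (nanoseconds)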
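8. ras (hardware errors)
Description: Detects CPU, memory, PCIe, and other hardware errors through kernel tracepoints. Five error sources are supported: MCE (Machine Check Exception), EDAC (memory controller), ACPI/GHES (non-standard hardware errors), PCIe AER (Advanced Error Reporting), and MCE threshold interrupts (THR). Errors are graded by severity, for example Corrected, UncorrectedRecoverable, and UncorrectedFatal; the type field below lists every value. MCE threshold interrupt events use a cooldown (default 30 minutes) so that an interrupt storm does not trigger a flood of duplicate reports.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
MCE sample data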
{
"tracer_data": {
"dev": "CPU/MEM",
"event": "MCE",
"type": "UncorrectedRecoverable",
"timestamp": 1749600000000000000,
"info": "{\"mcg_cpu_cap\":4096,\"banks_msr_status\":9295429630892703744,\"cpu\":2,\"socketid\":0,\"bank\":5}"
}
}
PCIe AER sample data
{
"tracer_data": {
"dev": "PCIe 0000:3b:00.0",
"event": "AER",
"type": "UncorrectedRecoverable",
"timestamp": 1749600000000000000,
"info": "{\"dev_name\":\"0000:3b:00.0\",\"err_type\":\"UncorrectedRecoverable\",\"err_reason\":\"Completion Timeout\",\"tlp_header\":\"not available\"}"
}
}
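Field descriptions
- dev: hardware device on which the error occurred (such as CPU/MEM or PCIe 0000:3b:00.0)
- event: error source (MCE / EDAC / NON_STANDARD / AER / MCE_THRESHOLD)
- type: error severity (Corrected / UncorrectedRecoverable / UncorrectedDeferred / UncorrectedFatal / Info)
- timestamp: timestamp at which the hardware error occurred
- info: detailed error information in JSON format; the content varies with the event type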
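Since info is itself a JSON document stored as a string, it has to be decoded a second time after the event is parsed. The sketch below shows this on the MCE sample above; `decode_ras_info` is a hypothetical helper.

```python
import json


def decode_ras_info(tracer_data: dict) -> dict:
    """Return a copy of tracer_data with the nested info string parsed into a dict."""
    decoded = dict(tracer_data)
    decoded["info"] = json.loads(tracer_data["info"])
    return decoded


# Fields copied from the MCE sample above; info is a JSON string, not an object.
mce_sample = {
    "dev": "CPU/MEM",
    "event": "MCE",
    "type": "UncorrectedRecoverable",
    "timestamp": 1749600000000000000,
    "info": '{"mcg_cpu_cap":4096,"banks_msr_status":9295429630892703744,'
            '"cpu":2,"socketid":0,"bank":5}',
}

decoded = decode_ras_info(mce_sample)
print(decoded["info"]["cpu"], decoded["info"]["bank"])
```

The same helper works for AER events, where the decoded dictionary carries keys such as dev_name and err_reason instead.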
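9. netdev_events (network devices)
Description: Detects NIC link state changes (down/up, MTU change, AdminDown, CarrierDown, and so on) by subscribing to kernel netlink RTM_NEWLINK messages, and outputs the interface name, link state, MAC address, and driver information. At startup the observer scans the current state of the NICs configured in device_list as a baseline; afterwards it reports state changes only.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data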
{
"tracer_data": {
"ifname": "eth1",
"index": 3,
"linkstatus": "linkStatusAdminDown, linkStatusCarrierDown",
"mac": "5c:6f:69:34:dc:72",
"start": false,
"driver": "ixgbe",
"driver_version": "5.1.0-k",
"firmware_version": "3.25 0x80000421 1.2163.0"
}
}
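Field descriptions
- ifname: network interface name (such as eth1)
- index: interface index
- linkstatus: description of the link state change (may contain several states)
- mac: NIC MAC address
- start: whether this is a baseline event from the startup scan (true: startup scan, false: live change event)
- driver: NIC driver name
- driver_version: NIC driver version
- firmware_version: NIC firmware version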
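10. netdev_bonding_lacp (LACP protocol)
Description: Detects state changes of LACP (Link Aggregation Control Protocol, IEEE 802.3ad) in bonding mode. It reads and records the full state of every bonding interface under /proc/net/bonding/, including the mode, MII status, Actor/Partner negotiation parameters, and slave link status. It is enabled automatically only when the system has an interface in IEEE 802.3ad bonding mode.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data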
{
"tracer_data": {
"content": "/proc/net/bonding/bond0\nEthernet Channel Bonding Driver: v4.18.0...\nBonding Mode: IEEE 802.3ad Dynamic link aggregation\nMII Status: down\n..."
}
}
Field descriptions
- content: full state of the bonding interface (multi-line text with the LACP negotiation details of every slave, equivalent to the content of the /proc/net/bonding/bondX file)
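11. netdev_txqueue_timeout (transmit queue timeout)
Description: Detects NIC transmit queue timeout (TX queue timeout) events and records the index of the timed-out queue, the device name, and the driver name, to help locate hardware faults on the NIC transmit side.
Data storage: Stored automatically in Elasticsearch or in a file on the physical host's disk.
Sample data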
{
"tracer_data": {
"queue_index": 3,
"device_name": "eth0",
"driver_name": "ixgbe"
}
}
Field descriptions
- queue_index: index of the transmit queue that timed out
- device_name: NIC device name
- driver_name: NIC driver name
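⚙️ How it works
Overall architecture
HUATUO's anomaly event observation is built on eBPF. It captures on-site data for anomaly events in kernel space with very low overhead, and a user-space daemon then handles formatting, filtering, container correlation, and persistent storage.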
graph TB
subgraph "Linux Kernel"
direction TB
K1["kprobe hooks\n(softirq_tracing / softlockup / hungtask\n oom / memory_reclaim_events / dropwatch\n net_rx_latency / netdev_txqueue_timeout)"]
K2["tracepoint hooks\n(ras: MCE / EDAC / AER / ACPI)"]
K3["netlink subscription\n(netdev_events: RTM_NEWLINK)"]
K4["kprobe hooks\n(netdev_bonding_lacp: 802.3ad)"]
PEB["Perf event ring buffer\n(8192 pages)"]
end
subgraph "HUATUO user space"
direction TB
EH["Go event-handling goroutines\n(one per event type)"]
CF["Filters\n(threshold checks / noise reduction / known-issue filtering)"]
CM["Container correlation\n(CSS → ContainerID\n NetNS → ContainerID)"]
end
subgraph "Storage"
ES["Elasticsearch"]
DISK["Local disk file"]
end
K1 --> PEB
K2 --> PEB
K4 --> PEB
PEB --> EH
K3 --> EH
EH --> CF
CF --> CM
CM --> ES
CM --> DISK
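Event processing flow
sequenceDiagram
participant K as Linux Kernel
participant B as eBPF Program
participant P as Perf Event Buffer
participant H as Go event handler
participant F as Filter
participant S as Storage
K->>B: Trigger kprobe / tracepoint
B->>B: Capture on-site data<br/>(process info / kernel stack / network context)
B->>P: Write to the perf event ring buffer
H->>P: Read event data (blocking wait)
H->>F: Format and apply filters<br/>(threshold / noise reduction / known issues)
F->>H: Events that pass the filters
H->>H: Correlate container information<br/>(CSS / NetNS mapping)
H->>S: Persist<br/>(Elasticsearch / local file)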
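The user-space half of this flow (filter, then correlate, then store) can be sketched in a few lines. This is an illustration of the ordering only, not HUATUO's Go implementation; every name below (`run_pipeline`, `CSS_TO_CONTAINER`, the event fields) is hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical lookup table standing in for the CSS -> ContainerID mapping.
CSS_TO_CONTAINER = {"0xabc": "demo-container"}


def above_threshold(event: dict) -> bool:
    return event["offtime"] > event["threshold"]


def not_noise(event: dict) -> bool:
    # Mirrors the idea of excluding ksoftirqd and swapper noise.
    return not event["comm"].startswith(("ksoftirqd", "swapper"))


def add_container(event: dict) -> dict:
    return {**event, "container_id": CSS_TO_CONTAINER.get(event.get("css"), "")}


def run_pipeline(events: Iterable[dict],
                 filters: list,
                 enrich: Callable[[dict], dict],
                 sinks: list) -> int:
    """Filter events, enrich the survivors, write them to every sink."""
    stored = 0
    for event in events:
        if not all(keep(event) for keep in filters):
            continue
        enriched = enrich(event)
        for sink in sinks:
            sink(enriched)
        stored += 1
    return stored


events = [
    {"comm": "agent", "offtime": 237328905, "threshold": 10000000, "css": "0xabc"},
    {"comm": "ksoftirqd/1", "offtime": 50000000, "threshold": 10000000, "css": None},
    {"comm": "nginx", "offtime": 1000, "threshold": 10000000, "css": "0xabc"},
]
remote, local = [], []   # stand-ins for Elasticsearch and the local file
stored = run_pipeline(events, [above_threshold, not_noise], add_container,
                      [remote.append, local.append])
print(stored, remote[0]["container_id"])
```

Running the filters before the container lookup keeps the more expensive correlation step off the events that will be discarded anyway.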
6.1 - Storage service
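📖 Overview
HUATUO (华佗) can persist collected Linux kernel events and AutoTracing data to an external storage backend. Two storage systems are currently supported: Elasticsearch and OpenSearch.
After being serialized to JSON, each collected event is written both to a local directory on the node (huatuo-local/) and to the configured remote storage backend. The local directory keeps a local copy of the events; the remote backend provides durable storage and structured queries.
This document describes how to configure and verify Elasticsearch and OpenSearch. The examples use a Docker deployment; in production, replace the addresses with those of your actual services and configure everything else the same way.
🎯 Use cases
Kubernetes cloud-native fault tracing
In containerized environments, kernel events such as Pod OOMs and node hung tasks are short-lived, and logs are often cleaned up after the event. Once events are written to Elasticsearch or OpenSearch, operations teams can query the timeline of past anomalies by time range and pinpoint the root cause of intermittent faults during post-incident review.
AI compute cluster stability auditing
Over the long life of a GPU training cluster, the historical distribution of events such as ras hardware errors and iotracing I/O latency is essential for capacity planning and hardware health assessment. With the collected data persisted, aggregation queries can establish a per-node stability baseline that informs proactive maintenance.
Compliance and event retention
China's Multi-Level Protection Scheme (等保) requires system anomaly events to be traceable. Writing the kernel events collected by HUATUO to OpenSearch and configuring an index lifecycle policy meets the compliance requirements on retention period and query capability.
Observability platform integration
Both Elasticsearch and OpenSearch can be connected to Grafana as data sources. Once HUATUO events are written to storage, kernel event trend panels can be built in Grafana and overlaid with application-level metrics for historical analysis and alert review.
💎 Value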
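| Dimension | Local storage only | With an external storage backend |
| --- | --- | --- |
| Data durability | Limited by node disk capacity; data may be lost after a restart | Persisted to distributed storage; supports long-term retention |
| Query capability | No structured queries; relies on file search | Full-text search, field filtering, time-range aggregation |
| Visualization | Not supported | Connects directly to Grafana, Kibana, and similar platforms |
| Multi-node aggregation | Data is scattered across each node's local disk | Written centrally to unified storage; supports cross-node queries |
| Compliance retention | Hard to meet retention-period requirements | Index lifecycle policies can be configured to meet retention requirements |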
🚀 Usage
OpenSearch V2
1. Deploy OpenSearch
docker pull opensearchproject/opensearch:2.6.0
docker run -d --name opensearch -p 9200:9200 -p 9600:9600 \
-e "discovery.type=single-node" \
opensearchproject/opensearch:2.6.0
2. Verify the service status
curl -k -u admin:admin https://localhost:9200
Sample response:
{
"name" : "22ca72df78c0",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "yxb3foceQVKzXXO6bHpPHQ",
"version" : {
"distribution" : "opensearch",
"number" : "2.6.0",
"build_type" : "tar",
"build_hash" : "7203a5af21a8a009aece1474446b437a3c674db6",
"build_date" : "2023-02-24T18:57:04.388618985Z",
"build_snapshot" : false,
"lucene_version" : "9.5.0",
"minimum_wire_compatibility_version" : "7.10.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "The OpenSearch Project: https://opensearch.org/"
}
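If verification fails, view the container logs with the following command:
3. Configure huatuo-bamai
Add the following configuration to huatuo-bamai.conf. The default username and password of the OpenSearch container image are both admin. For a detailed description of the storage settings, see the Configuration Guide chapter.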
[Storage.ES]
Address = "https://127.0.0.1:9200"
Index = "huatuo_bamai"
Username = "admin"
Password = "admin"
4. Start huatuo-bamai
Use --config-dir to specify the directory that contains the configuration file:
./_output/bin/huatuo-bamai --region dev --config-dir .
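When files (for example net_rx_latency) appear in the local storage directory huatuo-local/, kernel events have been collected successfully. Use the following command to query the data from OpenSearch: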
curl -k -u admin:admin \
-X GET "https://localhost:9200/huatuo_bamai/_search?pretty" \
-H "Content-Type: application/json" \
-d '{"query": {"match_all": {}}}'
Sample response:
{
"_index" : "huatuo_bamai",
"_id" : "yjPG_50Bu_OF-hukxKR7",
"_score" : 1.0,
"_source" : {
"hostname" : "hostname",
"region" : "dev",
"uploaded_time" : "2026-05-07T00:11:49.753166222Z",
"time" : "2026-05-07 00:11:49.753 +0000",
"tracer_name" : "net_rx_latency",
"tracer_time" : "2026-05-07 00:11:49.753 +0000",
"tracer_type" : "auto",
"tracer_data" : {
"comm" : "<nil>",
"pid" : 0,
"where" : "TO_NETIF_RCV",
"latency_ms" : 1776078133565,
"saddr" : "127.0.0.1",
"daddr" : "127.0.0.1",
"sport" : 37736,
"dport" : 9200,
"seq" : 1080592402,
"ack_seq" : 2465063876,
"pkt_len" : 781
}
}
}
To view only the total number of documents, without listing them:
curl -k -u admin:admin -X GET "https://localhost:9200/huatuo_bamai/_count?pretty"
Sample response, where count is the total number of records written:
{
"count" : 2680,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
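The match_all query above returns every event. To narrow the results to one event type and a recent time window, a request body along the following lines can be sent to the same _search endpoint. This is a sketch that assumes the index maps uploaded_time as a date field and tracer_name as a searchable text or keyword field; the actual mapping created by HUATUO is not described here, and `build_event_query` is a hypothetical helper.

```python
import json


def build_event_query(tracer_name: str, since: str = "now-1h", size: int = 10) -> dict:
    """Build a query DSL body for recent events of one tracer, newest first."""
    return {
        "size": size,
        "sort": [{"uploaded_time": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"match": {"tracer_name": tracer_name}},
                    # Date math such as "now-1h" requires a date-typed field.
                    {"range": {"uploaded_time": {"gte": since}}},
                ]
            }
        },
    }


body = build_event_query("net_rx_latency")
print(json.dumps(body))
```

The printed JSON can be passed as the -d argument of the curl command shown earlier in place of the match_all body.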
Elasticsearch V8
1. Deploy Elasticsearch
docker pull docker.elastic.co/elasticsearch/elasticsearch:8.15.5
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "ELASTIC_PASSWORD=123456" \
docker.elastic.co/elasticsearch/elasticsearch:8.15.5
2. Verify the service status
curl -k -u elastic:123456 https://localhost:9200
Sample response:
{
"name" : "ab0b562f8dbd",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "aVfOVgJTQXuhZ3HGotK3ww",
"version" : {
"number" : "8.15.5",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "b10896bcfe167cce44a84ba2771d101fb596d40d",
"build_date" : "2024-11-21T22:06:13.985834967Z",
"build_snapshot" : false,
"lucene_version" : "9.11.1",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
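3. Configure huatuo-bamai
Add the following configuration to huatuo-bamai.conf. The default username of the Elasticsearch container image is elastic, and its password is set through the ELASTIC_PASSWORD environment variable. For a detailed description of the storage options, see the Configuration Guide chapter.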
[Storage.ES]
Address = "https://127.0.0.1:9200"
Index = "huatuo_bamai"
Username = "elastic"
Password = "123456"
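4. Start huatuo-bamai
Use --config-dir to specify the directory that contains the configuration file:
./_output/bin/huatuo-bamai --region dev --config-dir .
When files (for example, net_rx_latency) appear in the local storage directory huatuo-local/, kernel events are being captured successfully. Use the following command to query the data from Elasticsearch: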
curl -k -u elastic:123456 \
-X GET "https://localhost:9200/huatuo_bamai/_search?pretty" \
-H "Content-Type: application/json" \
-d '{"query": {"match_all": {}}}'
Example response (excerpt showing a single hit):
{
"_index" : "huatuo_bamai",
"_id" : "WtNZAJ4BQ8x-thPHEY1i",
"_score" : 1.0,
"_source" : {
"hostname" : "hostname",
"region" : "dev",
"uploaded_time" : "2026-05-07T02:51:37.696263325Z",
"time" : "2026-05-07 02:51:37.696 +0000",
"tracer_name" : "net_rx_latency",
"tracer_time" : "2026-05-07 02:51:37.696 +0000",
"tracer_type" : "auto",
"tracer_data" : {
"comm" : "<nil>",
"pid" : 0,
"where" : "TO_NETIF_RCV",
"latency_ms" : 1776078133565,
"saddr" : "127.0.0.1",
"daddr" : "127.0.0.1",
"sport" : 2379,
"dport" : 36706,
"seq" : 950542706,
"ack_seq" : 1960972383,
"pkt_len" : 91
}
}
}
To see only the total number of documents, without listing them:
curl -k -u elastic:123456 -X GET "https://localhost:9200/huatuo_bamai/_count?pretty"
Example response, where count is the total number of records written:
{
"count" : 2680,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
Elasticsearch V7
Elasticsearch V7 uses HTTP by default, so the only change is to use http:// instead of https:// when accessing the service.
1. Deploy Elasticsearch
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.1
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e "ES_JAVA_OPTS=-Xms1g -Xmx1g" \
-e "ELASTIC_PASSWORD=123456" \
docker.elastic.co/elasticsearch/elasticsearch:7.10.1
2. Verify the service status
curl -k -u elastic:123456 http://localhost:9200
Example response:
{
"name" : "d88c9e8df48b",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "_ZZefWx4SniAc255t_lIVg",
"version" : {
"number" : "7.10.1",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
"build_date" : "2020-12-05T01:00:33.671820Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
3. Configure huatuo-bamai
[Storage.ES]
Address = "http://127.0.0.1:9200"
Index = "huatuo_bamai"
Username = "elastic"
Password = "123456"
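In this guide the V7 and V8 configurations differ only in the Address scheme: the V8 container is reached over https://, the V7 container over http://. The following is a minimal Python sketch of a pre-flight check for that mismatch. The function name scheme_matches and its tls_enabled parameter are hypothetical and not part of HUATUO.

```python
from urllib.parse import urlsplit


def scheme_matches(address: str, tls_enabled: bool) -> bool:
    """Check that the Address scheme agrees with how the backend is served.

    tls_enabled is True for the V8 and OpenSearch containers in this
    guide (HTTPS) and False for the V7 container (HTTP). Hypothetical helper.
    """
    parts = urlsplit(address)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {address!r}")
    return (parts.scheme == "https") == tls_enabled


print(scheme_matches("http://127.0.0.1:9200", tls_enabled=False))
print(scheme_matches("https://127.0.0.1:9200", tls_enabled=False))
```

Running a check like this before starting huatuo-bamai catches a V8-style address that was copied into a V7 setup, or the reverse.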
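4. Start huatuo-bamai
Use --config-dir to specify the directory that contains the configuration file:
./_output/bin/huatuo-bamai --region dev --config-dir .
When files (for example, net_rx_latency) appear in the local storage directory huatuo-local/, kernel events are being captured successfully. Use the following command to query the data from Elasticsearch: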
curl -k -u elastic:123456 \
-X GET "http://localhost:9200/huatuo_bamai/_search?pretty" \
-H "Content-Type: application/json" \
-d '{"query": {"match_all": {}}}'
Or, to get only the document count:
curl -k -u elastic:123456 \
-X GET "http://localhost:9200/huatuo_bamai/_count?pretty"
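⚙️ How it works
System architecture
The HUATUO Storage module runs on each node and writes captured kernel events to both a local directory and Elasticsearch or OpenSearch. The two storage backends share the same [Storage.ES] configuration interface and are distinguished by address.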
graph TB
    subgraph kernel["Linux kernel"]
        K1[Kernel events]
        K2[AutoTracing]
    end
    subgraph huatuo["HUATUO Agent (per node)"]
        T["Collection layer"]
        L["Local directory\nhuatuo-local/"]
        S["Storage module\nsimultaneous write"]
    end
    subgraph backends["Storage backends"]
        ES[Elasticsearch]
        OS[OpenSearch]
    end
    kernel --> T
    T --> L
    T --> S
    S -->|Index API| ES
    S -->|Index API| OS
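The diagram labels the remote write as an Index API call. In both Elasticsearch and OpenSearch this is a POST of one JSON document to /<index>/_doc. The following minimal Python sketch builds such a request without sending it. HUATUO's own client code is not shown in this guide, so the function name build_index_request and the use of HTTP basic auth here are illustrative assumptions based on the curl examples.

```python
import base64
import json
import urllib.request


def build_index_request(address, index, username, password, event):
    """Build (but do not send) an Index API request for one event.

    Illustration only: mirrors the [Storage.ES] fields Address, Index,
    Username and Password. Not HUATUO's actual implementation.
    """
    url = f"{address.rstrip('/')}/{index}/_doc"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_index_request(
    "https://127.0.0.1:9200", "huatuo_bamai", "admin", "admin",
    {"tracer_name": "net_rx_latency", "region": "dev"},
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen against the HTTPS containers would also need an SSL context that accepts their self-signed certificate, the counterpart of curl -k.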
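Data write flow
After HUATUO captures a kernel event, the Storage module writes it to both the local directory and the remote storage backend. The two writes run concurrently: the local directory keeps a copy, while the remote backend provides persistence and query capability.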
sequenceDiagram
    participant T as Collection layer
    participant L as Local directory (huatuo-local/)
    participant S as Storage module
    participant B as ES / OpenSearch
    T->>S: Kernel event captured, serialized as JSON
    par Simultaneous write
        S->>L: Write local file
    and
        S->>B: Write to remote storage (Index API)
        B-->>S: Write acknowledgement (200 OK)
    end
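Storage write pipeline
From the moment a kernel event occurs until it reaches the storage backend, it passes through three stages: collection, serialization, and simultaneous write. The local directory and the remote backend are written concurrently and do not block each other.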
flowchart LR
    A([Kernel event fires]) --> B["Collect\nserialize to JSON"]
    B --> C["Storage module\nsimultaneous write"]
    C --> D["Write to local directory\nhuatuo-local/"]
    C --> E["Write to ES / OpenSearch\nIndex API"]
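The pipeline above can be sketched in Python as two writers submitted to a thread pool, so that neither waits for the other to start. This is an illustration of the pattern only. HUATUO is not implemented this way in Python, the one-JSON-object-per-line local format is an assumption, and write_remote is a stub standing in for the Index API call.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_local(directory: Path, event: dict) -> Path:
    """Append the event as one JSON line to a file named after its tracer.

    The JSON-lines layout is an assumption made for this sketch.
    """
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / event["tracer_name"]
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return path


def write_remote(event: dict) -> bool:
    """Stub for the remote Index API write; always reports success."""
    return True


def store(directory: Path, event: dict):
    """Run the local and remote writes concurrently and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local = pool.submit(write_local, directory, event)
        remote = pool.submit(write_remote, event)
        return local.result(), remote.result()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path, acknowledged = store(
            Path(tmp) / "huatuo-local",
            {"tracer_name": "net_rx_latency", "region": "dev"},
        )
        print(path.name, acknowledged)
```

Collecting both futures at the end lets a failure on one path surface as an exception without having delayed the other write.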
🌟 Conclusion