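Deploying Prometheus on Kubernetes
Overview
Prometheus is an open-source systems monitoring and alerting toolkit designed for cloud-native environments. It collects metrics by pulling them over HTTP, supports a wide range of exporters, and provides PromQL, a powerful query language. This guide describes how to deploy Prometheus on a Kubernetes platform with Helm.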
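Contents
- Prerequisites
  - 1.1 Kubernetes platform requirements
  - 1.2 Required components
- Deploying Prometheus with Helm
  - 2.1 Add the Prometheus Helm repository
  - 2.2 Configure Prometheus parameters
  - 2.3 Install Prometheus
- Network configuration
  - 3.1 Create the Ingress
- Verification and access
  - 4.1 Check service status
  - 4.2 Access the Prometheus web UI
  - 4.3 Functional verification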
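1. Prerequisites
1.1 Kubernetes platform requirements
- Kubernetes version: 1.20+
- Nodes: at least 2 nodes, each with at least 2 CPU cores and 4 GB of memory
- StorageClass: a default StorageClass must be configured (for example, one backed by NFS or Local Path)
1.2 Required components
Make sure the following components are enabled:
- An Ingress controller (for example, NGINX Ingress)
- A default StorageClass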
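The default StorageClass requirement can be checked before installing anything. The following is a minimal sketch, not part of the original guide: `has_default_storageclass` is a hypothetical helper that reads the table printed by `kubectl get storageclass` on standard input and succeeds when a row is marked `(default)`.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original guide):
# succeeds when the output of `kubectl get storageclass`, read from
# stdin, contains a class marked "(default)".
has_default_storageclass() {
  grep -q '(default)'
}

# Usage against a live cluster (assumes kubectl is already configured):
#   kubectl get storageclass | has_default_storageclass \
#     && echo "default StorageClass found" \
#     || echo "no default StorageClass"
#   kubectl get ingressclass   # lists the installed Ingress classes
```

Reading from standard input keeps the check independent of the cluster, so it can be tried on saved command output as well.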
2. Deploying Prometheus with Helm
2.1 Add the Prometheus Helm repository
# Add the Prometheus Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
2.2 Configure Prometheus parameters
Create the prometheus-values.yaml configuration file:
# Prometheus configuration
prometheus:
  enabled: true
  annotations: {}
  # Prometheus Service configuration
  service:
    type: ClusterIP
    port: 9090
    targetPort: 9090
    # nodePort only takes effect when type is NodePort
    nodePort: 30090
    annotations: {}
    labels: {}
    clusterIP: ""
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
  # Prometheus Ingress configuration (disabled here; section 3.1 creates the Ingress manually)
  ingress:
    enabled: false
    annotations: {}
    labels: {}
    hosts:
      - prometheus.example.com
    paths:
      - /
    pathType: ImplementationSpecific
    tls: []
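  # Prometheus spec (settings for the Prometheus custom resource)
  prometheusSpec:
    # Image (this chart version takes the registry separately from the repository)
    image:
      registry: quay.io
      repository: prometheus/prometheus
      tag: v2.44.0
      sha: ""
    # Resource requests and limits
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi
    # Persistent storage
    storageSpec:
      volumeClaimTemplate:
        spec:
          # storageClassName is omitted so that the default StorageClass is used;
          # an explicit "" would disable dynamic provisioning.
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    # ServiceMonitor selection
    serviceMonitorSelector: {}
    serviceMonitorSelectorNilUsesHelmValues: false
    # PodMonitor selection
    podMonitorSelector: {}
    podMonitorSelectorNilUsesHelmValues: false
    # Rule selection
    ruleSelector: {}
    ruleSelectorNilUsesHelmValues: false
    # Alertmanager endpoints to send alerts to
    alertingEndpoints: []
    # External labels
    externalLabels: {}
    # Remote write configuration
    remoteWrite: []
    # Remote read configuration
    remoteRead: []
    # Retention period
    retention: 10d
    # WAL compression
    walCompression: true
    # Admin API
    enableAdminAPI: false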
    # Pause operator reconciliation of this Prometheus instance
    paused: false
    # Image pull policy
    imagePullPolicy: IfNotPresent
    # Image pull secrets
    imagePullSecrets: []
    # Node selector
    nodeSelector: {}
    # Tolerations
    tolerations: []
    # Affinity
    affinity: {}
    # Security context
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 2000
      fsGroup: 2000
    # Priority class name
    priorityClassName: ""
    # Init containers
    initContainers: []
    # Additional containers
    additionalContainers: []
    # Additional volumes
    additionalVolumes: []
    # Additional volume mounts
    additionalVolumeMounts: []
    # Config reloader image
    configReloaderImage:
      repository: quay.io/prometheus-operator/prometheus-config-reloader
      tag: v0.66.0
    # Config reloader resources
    configReloaderResources: {}
    # Endpoints
    endpoints: []
    # Monitoring namespace
    monitoringNamespace: ""
    # Monitored service endpoints
    monitoringServiceEndpoints: []
    # Monitored Pod endpoints
    monitoringPodEndpoints: []
    # Monitoring rules
    monitoringRules: []
    # Monitoring alerts
    monitoringAlerts: []
    # Service discovery
    monitoringServiceDiscovery: []
    # Pod discovery
    monitoringPodDiscovery: []
    # Node discovery
    monitoringNodeDiscovery: []
    # Kubernetes discovery
    monitoringKubernetesDiscovery: []
    # File-based discovery
    monitoringFileDiscovery: []
    # DNS discovery
    monitoringDNSDiscovery: []
    # EC2 discovery
    monitoringEC2Discovery: []
    # Azure discovery
    monitoringAzureDiscovery: []
    # GCE discovery
    monitoringGCEDiscovery: []
    # OpenStack discovery
    monitoringOpenStackDiscovery: []
    # Triton discovery
    monitoringTritonDiscovery: []
    # Kubernetes SD configuration
    monitoringKubernetesSDConfig: []
    # HTTP SD configuration
    monitoringHTTPSDConfig: []
# Alertmanager configuration
alertmanager:
  enabled: true
  annotations: {}
  # Alertmanager Service configuration
  service:
    type: ClusterIP
    port: 9093
    targetPort: 9093
    # nodePort only takes effect when type is NodePort
    nodePort: 30093
    annotations: {}
    labels: {}
    clusterIP: ""
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
  # Alertmanager Ingress configuration
  ingress:
    enabled: false
    annotations: {}
    labels: {}
    hosts:
      - alertmanager.example.com
    paths:
      - /
    pathType: ImplementationSpecific
    tls: []
  # Alertmanager routing configuration (every alert goes to the 'null' receiver, so no notifications are sent)
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
        - match:
            alertname: Watchdog
          receiver: 'null'
    receivers:
      - name: 'null'
    templates:
      - '/etc/alertmanager/config/*.tmpl'
  # Alertmanager persistent storage
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          # storageClassName is omitted so that the default StorageClass is used;
          # an explicit "" would disable dynamic provisioning.
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
# Node Exporter configuration
nodeExporter:
  enabled: true
  # Node Exporter Service configuration
  service:
    type: ClusterIP
    port: 9100
    targetPort: 9100
    nodePort: 30100
    annotations: {}
    labels: {}
  # Node Exporter resources
  resources:
    limits:
      cpu: 200m
      memory: 50Mi
    requests:
      cpu: 100m
      memory: 30Mi
# Kube State Metrics configuration
kubeStateMetrics:
  enabled: true
  # Kube State Metrics Service configuration
  service:
    type: ClusterIP
    port: 8080
    targetPort: 8080
    nodePort: 30800
    annotations: {}
    labels: {}
  # Kube State Metrics resources
  resources:
    limits:
      cpu: 100m
      memory: 150Mi
    requests:
      cpu: 50m
      memory: 100Mi
# Prometheus Pushgateway configuration
pushgateway:
  enabled: true
  # Pushgateway Service configuration
  service:
    type: ClusterIP
    port: 9091
    targetPort: 9091
    nodePort: 30091
    annotations: {}
    labels: {}
  # Pushgateway resources
  resources:
    limits:
      cpu: 100m
      memory: 100Mi
    requests:
      cpu: 50m
      memory: 50Mi
# Grafana configuration
grafana:
  enabled: false  # Grafana is deployed separately
# Other settings
additionalPrometheusRules: []
additionalScrapeConfigs: []
additionalAlertRelabelConfigs: []
additionalAlertManagerConfigs: []
# Kubernetes metrics configuration
kubernetesServiceMonitors:
  enabled: true
  selectors:
    matchLabels: {}
kubernetesPodMonitors:
  enabled: true
  selectors:
    matchLabels: {}
kubernetesProbes:
  enabled: true
  selectors:
    matchLabels: {}
kubernetesAlertmanagers:
  enabled: true
  selectors:
    matchLabels: {}
kubernetesPrometheuses:
  enabled: true
  selectors:
    matchLabels: {}
kubernetesThanosRulers:
  enabled: true
  selectors:
    matchLabels: {}
kubernetesServiceMonitorsCRD:
  enabled: true
kubernetesPodMonitorsCRD:
  enabled: true
kubernetesProbesCRD:
  enabled: true
kubernetesAlertmanagersCRD:
  enabled: true
kubernetesPrometheusesCRD:
  enabled: true
kubernetesThanosRulersCRD:
  enabled: true
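The values file pairs a retention of 10d with a 50Gi volume for Prometheus. Whether that is enough depends on the ingestion rate. As a rough check, the Prometheus storage documentation estimates disk usage as retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The sketch below applies that formula; `estimate_storage_gib` is a hypothetical helper and the 30000 samples/s figure is an assumed example, not a measurement.

```shell
#!/bin/sh
# Rough disk estimate based on the formula in the Prometheus storage docs:
#   bytes = retention_seconds * samples_per_second * bytes_per_sample
# Prints the result in GiB, rounded up.
# Arguments: retention in days, ingested samples per second,
# bytes per sample (roughly 1-2 according to the Prometheus docs).
estimate_storage_gib() {
  retention_days=$1
  samples_per_second=$2
  bytes_per_sample=$3
  gib=1073741824
  bytes=$(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
  echo $(( (bytes + gib - 1) / gib ))
}

# Example: 10 days of retention at an assumed 30000 samples/s, 2 bytes each.
estimate_storage_gib 10 30000 2   # prints 49
```

On a running instance, the actual ingestion rate can be read with the PromQL query `rate(prometheus_tsdb_head_samples_appended_total[5m])`.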
2.3 Install Prometheus
# Create the namespace
kubectl create namespace monitoring
# Install Prometheus
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 48.1.0
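Running `helm install` a second time fails once the release exists. As a hedged alternative, `helm upgrade --install` installs the release on the first run and upgrades it on later runs, and `--create-namespace` creates the namespace when it is missing. In the sketch below, `install_prometheus` and the `PRINT_ONLY` variable are hypothetical names introduced for illustration; the release name, chart, and namespace are the ones used above.

```shell
#!/bin/sh
# Hypothetical wrapper (an assumption, not from the original guide).
# $1 = values file, $2 = chart version.
# With PRINT_ONLY=1 the command is printed instead of executed.
install_prometheus() {
  # Rebuild the positional parameters as the full helm command line.
  set -- helm upgrade --install prometheus \
    prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace \
    --values "$1" --version "$2"
  if [ "${PRINT_ONLY:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (assumes helm is installed and the repository from 2.1 was added):
#   install_prometheus prometheus-values.yaml 48.1.0
```

Printing the command first makes it easy to review the exact flags before touching the cluster.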
3. Network configuration
3.1 Create the Ingress
Create the prometheus-ingress.yaml file:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: prometheus.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-kube-prometheus-prometheus
                port:
                  number: 9090
    - host: alertmanager.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-kube-prometheus-alertmanager
                port:
                  number: 9093
Apply the Ingress configuration:
kubectl apply -f prometheus-ingress.yaml
4. Verification and access
4.1 Check service status
# Check the status of the Pods in the monitoring namespace
kubectl get pods -n monitoring
# Check the status of the Services
kubectl get svc -n monitoring
# Check the status of the Ingress
kubectl get ingress -n monitoring
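Scanning the Pod list by eye is error-prone when the stack runs many Pods. The sketch below filters the list down to Pods that still need attention. `not_ready_pods` is a hypothetical helper; it assumes the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original guide):
# reads `kubectl get pods --no-headers` output from stdin and prints the
# names of Pods that are not Running with all containers ready.
# Pods in the Completed state are ignored.
not_ready_pods() {
  awk '{
    split($2, ready, "/")            # READY column, e.g. "1/2"
    if ($3 != "Completed" && ($3 != "Running" || ready[1] != ready[2]))
      print $1
  }'
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get pods -n monitoring --no-headers | not_ready_pods
```

An empty result means every Pod in the namespace is Running and fully ready.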
4.2 Access the Prometheus web UI
- Add the host name mappings to the local /etc/hosts file:
  <node IP> prometheus.example.com
  <node IP> alertmanager.example.com
- Open the following addresses in a browser:
  - Prometheus: http://prometheus.example.com
  - Alertmanager: http://alertmanager.example.com
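The two /etc/hosts lines can be generated from a node IP instead of being typed by hand. This is a minimal sketch: `hosts_entries` is a hypothetical helper, and 192.0.2.10 is a placeholder documentation address that stands in for a real node IP.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original guide):
# prints one /etc/hosts line per host name used by the Ingress in 3.1.
# $1 = IP address of a node that serves the Ingress controller.
hosts_entries() {
  node_ip=$1
  for host in prometheus.example.com alertmanager.example.com; do
    printf '%s %s\n' "$node_ip" "$host"
  done
}

# Usage (192.0.2.10 is a placeholder; substitute a real node IP):
#   hosts_entries 192.0.2.10 | sudo tee -a /etc/hosts
# The Ingress can also be tested without editing /etc/hosts by sending
# the Host header directly to Prometheus' health endpoint:
#   curl -H 'Host: prometheus.example.com' http://192.0.2.10/-/healthy
```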
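4.3 Functional verification
Querying metrics
- Open the Prometheus web UI
- Enter up in the query box and run the query
- Verify that every target reports a value of 1, which means it is being scraped successfully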
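The same check can be run from a terminal through the Prometheus HTTP API by querying `up == 0`, which returns only the targets that are down. The sketch below is deliberately crude: `down_instances` is a hypothetical helper that pulls the `instance` labels out of the JSON response with `grep`, and it assumes the compact JSON that Prometheus returns. A JSON-aware tool would be more robust.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not from the original guide):
# reads a /api/v1/query JSON response from stdin and prints the value of
# every "instance" label it contains, one per line.
down_instances() {
  grep -o '"instance":"[^"]*"' | cut -d'"' -f4
}

# Usage (assumes the host name from 4.2 resolves to the Ingress):
#   curl -s --get 'http://prometheus.example.com/api/v1/query' \
#     --data-urlencode 'query=up == 0' | down_instances
```

No output means the query returned no series, so no scraped target is currently down.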
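Testing alerts
- Open the Alertmanager web UI
- Check whether any alerts are firing
- Verify alert notifications (the configuration in section 2.2 routes every alert to the 'null' receiver, so a real receiver must be configured before any notification is sent)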