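Deploying Prometheus on Kubernetes

Overview

Prometheus is an open-source systems monitoring and alerting toolkit designed for cloud-native environments. It collects metrics by scraping HTTP endpoints (a pull model), works with a wide range of exporters, and provides the PromQL query language. This guide describes how to deploy Prometheus on Kubernetes with Helm, using the kube-prometheus-stack chart.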

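1. Environment Preparation
- 1.1 Kubernetes platform requirements
- 1.2 Required components
2. Deploying Prometheus with Helm
- 2.1 Adding the Prometheus Helm repository
- 2.2 Configuring Prometheus parameters
- 2.3 Installing Prometheus
3. Network Configuration
- 3.1 Creating the Ingress
4. Verifying and Accessing the Deployment
- 4.1 Checking service status
- 4.2 Accessing the Prometheus web UI
- 4.3 Functional verification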

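1. Environment Preparation

1.1 Kubernetes Platform Requirements

Kubernetes version: 1.20 or later
Nodes: at least 2 nodes, each with at least 2 CPU cores and 4 GB of memory
Storage class: a default StorageClass must be configured (for example NFS or Local Path)

1.2 Required Components

Make sure the following components are available in the cluster:

An Ingress controller (for example NGINX Ingress)
A default StorageClass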
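
The requirements above can be checked from a workstation before anything is installed. The following is a minimal sketch, assuming `kubectl` is already configured for the target cluster and `jq` is installed. The helper names `k8s_version_ok` and `check_prerequisites` are made up for this example.

```shell
# k8s_version_ok VERSION: succeeds if VERSION (e.g. "v1.27.3") is 1.20 or newer.
# Hypothetical helper for this guide, not part of kubectl.
k8s_version_ok() (
  v="${1#v}"                  # strip the leading "v"
  major="${v%%.*}"            # text before the first dot
  rest="${v#*.}"
  minor="${rest%%.*}"         # text between the first and second dot
  minor="${minor%%[!0-9]*}"   # drop vendor suffixes such as "27+"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 20 ]; }
)

# check_prerequisites: print what the cluster offers for each requirement.
check_prerequisites() {
  server_version="$(kubectl version -o json | jq -r '.serverVersion.gitVersion')"
  if k8s_version_ok "$server_version"; then
    echo "Kubernetes $server_version: OK"
  else
    echo "Kubernetes $server_version: older than 1.20"
  fi
  kubectl get nodes          # expect at least 2 nodes in Ready state
  kubectl get storageclass   # one class should be marked "(default)"
  kubectl get ingressclass   # classes offered by installed Ingress controllers
}
```

Running `check_prerequisites` prints the version verdict followed by the node, StorageClass, and IngressClass listings. CPU and memory per node still have to be read from `kubectl describe node`.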

2. Deploying Prometheus with Helm

2.1 Adding the Prometheus Helm Repository

# Add the prometheus-community Helm repository and refresh the local index
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

2.2 Configuring Prometheus Parameters

Create a configuration file named prometheus-values.yaml:

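# Prometheus configuration
# Note: check every key against `helm show values` for the chart version you
# install; keys the chart does not define are not applied.
prometheus:
  enabled: true
  annotations: {}

  # Prometheus Service configuration
  service:
    type: ClusterIP
    port: 9090
    targetPort: 9090
    nodePort: 30090  # only takes effect when type is NodePort
    annotations: {}
    labels: {}
    clusterIP: ""
    loadBalancerIP: ""
    loadBalancerSourceRanges: []

  # Prometheus Ingress configuration (disabled; the Ingress is created separately in section 3.1)
  ingress:
    enabled: false
    annotations: {}
    labels: {}
    hosts:
      - prometheus.example.com
    paths:
      - /
    pathType: ImplementationSpecific
    tls: []
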
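  # Prometheus instance spec (passed to the Prometheus custom resource)
  prometheusSpec:
    # Image configuration
    image:
      # Chart 48.x keeps the registry separate from the repository path
      # (verify with `helm show values`)
      registry: quay.io
      repository: prometheus/prometheus
      tag: v2.44.0
      sha: ""

    # Resource requests and limits
    resources:
      limits:
        cpu: 1000m
        memory: 2Gi
      requests:
        cpu: 500m
        memory: 1Gi

    # Persistent storage (an empty storageClassName uses the default StorageClass)
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: ""
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

    # ServiceMonitor selection (empty selector plus "false" below selects all ServiceMonitors)
    serviceMonitorSelector: {}
    serviceMonitorSelectorNilUsesHelmValues: false

    # PodMonitor selection
    podMonitorSelector: {}
    podMonitorSelectorNilUsesHelmValues: false

    # Rule selection
    ruleSelector: {}
    ruleSelectorNilUsesHelmValues: false

    # Alertmanager endpoints
    alertingEndpoints: []

    # External labels
    externalLabels: {}

    # Remote write configuration
    remoteWrite: []

    # Remote read configuration
    remoteRead: []

    # Data retention period
    retention: 10d

    # WAL compression
    walCompression: true

    # Admin API
    enableAdminAPI: false

    # Pause operator reconciliation of this Prometheus resource
    paused: false

    # Image pull policy
    imagePullPolicy: IfNotPresent

    # Image pull secrets
    imagePullSecrets: []

    # Node selector
    nodeSelector: {}

    # Tolerations
    tolerations: []

    # Affinity
    affinity: {}

    # Security context
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 2000
      fsGroup: 2000

    # Priority class name
    priorityClassName: ""

    # Init containers
    initContainers: []

    # Additional containers
    additionalContainers: []

    # Additional volumes
    additionalVolumes: []

    # Additional volume mounts
    additionalVolumeMounts: []

    # Config reloader image
    configReloaderImage:
      repository: quay.io/prometheus-operator/prometheus-config-reloader
      tag: v0.66.0

    # Config reloader resources
    configReloaderResources: {}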
    
    # The keys from here to the end of prometheusSpec do not appear to be
    # defined by kube-prometheus-stack; verify them with `helm show values`
    # before relying on them.

    # Endpoint configuration
    endpoints: []

    # Monitoring namespace
    monitoringNamespace: ""

    # Monitored service endpoints
    monitoringServiceEndpoints: []

    # Monitored Pod endpoints
    monitoringPodEndpoints: []

    # Monitoring rules
    monitoringRules: []

    # Monitoring alerts
    monitoringAlerts: []

    # Service discovery
    monitoringServiceDiscovery: []

    # Pod discovery
    monitoringPodDiscovery: []

    # Node discovery
    monitoringNodeDiscovery: []

    # Kubernetes discovery
    monitoringKubernetesDiscovery: []

    # File-based discovery
    monitoringFileDiscovery: []

    # DNS discovery
    monitoringDNSDiscovery: []

    # EC2 discovery
    monitoringEC2Discovery: []

    # Azure discovery
    monitoringAzureDiscovery: []

    # GCE discovery
    monitoringGCEDiscovery: []

    # OpenStack discovery
    monitoringOpenStackDiscovery: []

    # Triton discovery
    monitoringTritonDiscovery: []

    # Kubernetes SD configuration
    monitoringKubernetesSDConfig: []

    # HTTP SD configuration
    monitoringHTTPSDConfig: []

# Alertmanager configuration
alertmanager:
  enabled: true
  annotations: {}

  # Alertmanager Service configuration
  service:
    type: ClusterIP
    port: 9093
    targetPort: 9093
    nodePort: 30093  # only takes effect when type is NodePort
    annotations: {}
    labels: {}
    clusterIP: ""
    loadBalancerIP: ""
    loadBalancerSourceRanges: []

  # Alertmanager Ingress configuration (disabled; see section 3.1)
  ingress:
    enabled: false
    annotations: {}
    labels: {}
    hosts:
      - alertmanager.example.com
    paths:
      - /
    pathType: ImplementationSpecific
    tls: []

  # Alertmanager routing configuration
  # All alerts go to the 'null' receiver, so no notifications are sent
  # until a real receiver is added.
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
      - match:
          alertname: Watchdog
        receiver: 'null'
    receivers:
    - name: 'null'
    templates:
    - '/etc/alertmanager/config/*.tmpl'

  # Alertmanager persistent storage
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: ""
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

# Node Exporter configuration
nodeExporter:
  enabled: true

  # Node Exporter Service configuration
  service:
    type: ClusterIP
    port: 9100
    targetPort: 9100
    nodePort: 30100
    annotations: {}
    labels: {}

  # Node Exporter resources
  resources:
    limits:
      cpu: 200m
      memory: 50Mi
    requests:
      cpu: 100m
      memory: 30Mi

# Kube State Metrics configuration
kubeStateMetrics:
  enabled: true

  # Kube State Metrics Service configuration
  service:
    type: ClusterIP
    port: 8080
    targetPort: 8080
    nodePort: 30800
    annotations: {}
    labels: {}

  # Kube State Metrics resources
  resources:
    limits:
      cpu: 100m
      memory: 150Mi
    requests:
      cpu: 50m
      memory: 100Mi

# Prometheus Pushgateway configuration
# (verify that the chart defines this key; Pushgateway is normally installed
# from its own chart)
pushgateway:
  enabled: true

  # Pushgateway Service configuration
  service:
    type: ClusterIP
    port: 9091
    targetPort: 9091
    nodePort: 30091
    annotations: {}
    labels: {}

  # Pushgateway resources
  resources:
    limits:
      cpu: 100m
      memory: 100Mi
    requests:
      cpu: 50m
      memory: 50Mi

# Grafana configuration
grafana:
  enabled: false  # Grafana is deployed separately

# Additional configuration
additionalPrometheusRules: []
additionalScrapeConfigs: []
additionalAlertRelabelConfigs: []
additionalAlertManagerConfigs: []

# Kubernetes metrics configuration
kubernetesServiceMonitors:
  enabled: true
  selectors:
    matchLabels: {}

kubernetesPodMonitors:
  enabled: true
  selectors:
    matchLabels: {}

kubernetesProbes:
  enabled: true
  selectors:
    matchLabels: {}

kubernetesAlertmanagers:
  enabled: true
  selectors:
    matchLabels: {}

kubernetesPrometheuses:
  enabled: true
  selectors:
    matchLabels: {}

kubernetesThanosRulers:
  enabled: true
  selectors:
    matchLabels: {}

kubernetesServiceMonitorsCRD:
  enabled: true

kubernetesPodMonitorsCRD:
  enabled: true

kubernetesProbesCRD:
  enabled: true

kubernetesAlertmanagersCRD:
  enabled: true

kubernetesPrometheusesCRD:
  enabled: true

kubernetesThanosRulersCRD:
  enabled: true
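
Because Helm charts read only the keys their templates reference, a misspelled key or one borrowed from a different chart usually has no visible effect. A rough way to spot such keys is to compare the top-level keys of prometheus-values.yaml with the chart's own defaults. The sketch below assumes standard `sed`, `sort`, `comm`, and `mktemp`. The function names are made up for this example, and only keys starting in column 0 are compared, so nested keys still need a manual look.

```shell
# top_level_keys FILE: print the YAML keys that start in column 0,
# e.g. "prometheus" for a line "prometheus:". Hypothetical helper.
top_level_keys() {
  sed -n 's/^\([A-Za-z][A-Za-z0-9_-]*\):.*/\1/p' "$1" | sort -u
}

# unknown_top_level_keys VALUES DEFAULTS: print keys that appear in VALUES
# but not in DEFAULTS (the output of `helm show values`).
unknown_top_level_keys() (
  mine="$(mktemp)" || exit 1
  defaults="$(mktemp)" || exit 1
  top_level_keys "$1" > "$mine"
  top_level_keys "$2" > "$defaults"
  comm -23 "$mine" "$defaults"   # lines unique to the first file
  rm -f "$mine" "$defaults"
)

# Usage (requires the repository added in section 2.1):
#   helm show values prometheus-community/kube-prometheus-stack \
#     --version 48.1.0 > chart-defaults.yaml
#   unknown_top_level_keys prometheus-values.yaml chart-defaults.yaml
```

Any key this prints deserves a second look before installation, since the chart may not act on it.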

2.3 Installing Prometheus

# Create the namespace
kubectl create namespace monitoring

# Install the kube-prometheus-stack chart as release "prometheus"
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values prometheus-values.yaml \
  --version 48.1.0
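
The commands above fail on a second run, because the namespace and the release already exist. A repeatable variant is sketched below, assuming the same release name, namespace, chart version, and values file as in this guide. The function names and the 10-minute timeout are illustrative choices, not values taken from the guide.

```shell
# install_prometheus: install the release, or upgrade it if it already exists.
# --create-namespace replaces the separate `kubectl create namespace` step;
# --wait blocks until the release's resources are ready or the timeout expires.
install_prometheus() {
  helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
    --namespace monitoring \
    --create-namespace \
    --values prometheus-values.yaml \
    --version 48.1.0 \
    --wait --timeout 10m
}

# show_release: summarize the release and list the pods in its namespace.
show_release() {
  helm status prometheus --namespace monitoring
  kubectl get pods --namespace monitoring
}

# Usage:
#   install_prometheus && show_release
```

Using `helm upgrade --install` also means later edits to prometheus-values.yaml can be rolled out with the same function.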

3. Network Configuration

3.1 Creating the Ingress

Create a file named prometheus-ingress.yaml:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: monitoring
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation.
  # No rewrite-target annotation is set: with path "/" it would rewrite every
  # request to "/" and break the web UIs.
  ingressClassName: nginx
  rules:
  - host: prometheus.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-prometheus
            port:
              number: 9090
  - host: alertmanager.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-kube-prometheus-alertmanager
            port:
              number: 9093

Apply the Ingress configuration:

kubectl apply -f prometheus-ingress.yaml

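4. Verifying and Accessing the Deployment

4.1 Checking Service Status

# Check the Pod status
kubectl get pods -n monitoring

# Check the Service status
kubectl get svc -n monitoring

# Check the Ingress status
kubectl get ingress -n monitoring

4.2 Accessing the Prometheus Web UI

Add the host names to the local /etc/hosts file, replacing <node-IP> with the address of a node where the Ingress controller is reachable:

<node-IP> prometheus.example.com
<node-IP> alertmanager.example.com

Then open the following URLs in a browser:
- Prometheus: http://prometheus.example.com
- Alertmanager: http://alertmanager.example.com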
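
The Ingress can also be tested from the command line without editing /etc/hosts. The sketch below relies on curl's `--resolve` option to map the host name to a node address for a single request, and on the `/-/healthy` endpoint that both Prometheus and Alertmanager serve. It assumes the Ingress controller listens on port 80 of the given node. The function name and the example address are illustrative.

```shell
# check_endpoint NODE_IP HOST: request HOST's health endpoint through the
# Ingress controller at NODE_IP, without touching /etc/hosts.
# Exits non-zero if the connection fails or the server returns an HTTP error.
check_endpoint() {
  curl --fail --silent --show-error \
    --resolve "$2:80:$1" \
    "http://$2/-/healthy"
}

# Usage (192.0.2.10 is a placeholder address):
#   check_endpoint 192.0.2.10 prometheus.example.com
#   check_endpoint 192.0.2.10 alertmanager.example.com
```

A short plain-text response with exit status 0 indicates that DNS override, Ingress routing, and the backing Service all work for that host.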

4.3 Functional Verification

Querying metrics
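
One possible check, offered as a sketch, is to run an instant query through the Prometheus HTTP API at `/api/v1/query`. The expression `up` returns one series per scrape target, so a successful response with results suggests that scraping works. The function names are made up for this example, and the base URL assumes the Ingress host used in this guide.

```shell
# prom_query BASE_URL EXPR: run an instant PromQL query and print the JSON response.
prom_query() {
  curl --fail --silent --show-error --get \
    --data-urlencode "query=$2" \
    "$1/api/v1/query"
}

# query_succeeded: read an API response on stdin and succeed
# if it reports status "success".
query_succeeded() {
  grep -q '"status":"success"'
}

# Usage:
#   prom_query http://prometheus.example.com up | query_succeeded && echo "query OK"
```

The same query can be typed into the expression box of the Prometheus web UI.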

Alert testing
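
One way to exercise Alertmanager, sketched here under assumptions, is to post a hand-made alert to its v2 API at `/api/v2/alerts` and then look for it in the Alertmanager web UI. With the 'null' receiver configured in section 2.2, the alert is displayed but no notification is sent. The function names, the alert name, and the label values are illustrative.

```shell
# test_alert_payload NAME: print a one-alert JSON array for the Alertmanager v2 API.
# NAME should contain only letters, digits, and underscores (no JSON escaping is done).
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"warning"},"annotations":{"summary":"Manual test alert"}}]\n' "$1"
}

# send_test_alert BASE_URL NAME: post the alert to Alertmanager.
# `--data @-` makes curl read the request body from standard input.
send_test_alert() {
  test_alert_payload "$2" | curl --fail --silent --show-error \
    --request POST \
    --header 'Content-Type: application/json' \
    --data @- \
    "$1/api/v2/alerts"
}

# Usage:
#   send_test_alert http://alertmanager.example.com DeployTest
```

An alert posted this way has no end time, so Alertmanager treats it as resolved once `resolve_timeout` (5m in this guide's configuration) has passed without an update.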