
Multi-Cluster High-Availability Architecture Deployment Manual

Core goal: build three independent k3s clusters — Platform, Middleware and Application — on Ubuntu 24.04, with a highly available control plane, cross-cluster communication over the internal network, and isolation of external traffic; integrate Calico, MetalLB, Ingress-nginx and Rancher for management, adapted to Ubuntu's package management and configuration conventions.

Chapter 1 Cluster Roles and Resource Planning

1.1 Node Roles and Configuration List

| Node type | Role | Count | Hardware (per node) | Node IP | Core function |
| --- | --- | --- | --- | --- | --- |
| Internal API LB | primary/backup | 2 | 2-core CPU, 4 GB RAM, 50 GB disk | 10.1.10.5/6 | Forwards port 6443 (apiserver) for every cluster |
| External service LB | primary/backup | 2 | 2-core CPU, 4 GB RAM, 50 GB disk | 192.168.192.75/76 (external), 10.1.10.7/8 (internal) | Forwards external 80/443 traffic to the Application cluster Ingress only |
| Platform cluster | master | 2 | 4-core CPU, 8 GB RAM, 100 GB disk | 10.1.10.60/61 | Control plane (HA via external datastore) |
| Platform cluster | worker | 2 | 8-core CPU, 16 GB RAM, 200 GB disk | 10.1.10.65/66 | Runs Rancher and other operations tooling |
| Middleware cluster | master | 2 | 4-core CPU, 8 GB RAM, 100 GB disk | 10.1.10.50/51 | Control plane (HA via external datastore) |
| Middleware cluster | worker | 2 | 8-core CPU, 16 GB RAM, 200 GB disk | 10.1.10.55/56 | Runs Redis/MQ and other middleware |
| Application cluster | master | 2 | 4-core CPU, 8 GB RAM, 100 GB disk | 10.1.10.40/41 | Control plane (HA via external datastore) |
| Application cluster | worker | 3 | 8-core CPU, 16 GB RAM, 200 GB disk | 10.1.10.45/46/47 | Runs business microservices / frontends |
| Existing database | primary | 1 | 4-core CPU, 8 GB RAM, 200 GB disk | 192.168.192.170 | Stores the clusters' control-plane state |

1.2 Network Planning (Key Segments)

| Purpose | Address range | Notes |
| --- | --- | --- |
| Management network (VLAN 10) | 10.1.10.0/24 | Node IPs and internal LB VIPs (10.1.10.10/11/12) |
| Service network (VLAN 100) | 192.168.192.0/24 | External LB VIP (192.168.192.80), exposed for the Application cluster only |
| Platform Pod/Service | 10.42.0.0/16 / 10.43.0.0/16 | Pod/Service IPs assigned via Calico |
| Middleware Pod/Service | 10.44.0.0/16 / 10.45.0.0/16 | Pod/Service IPs assigned via Calico |
| Application Pod/Service | 10.46.0.0/16 / 10.47.0.0/16 | Pod/Service IPs assigned via Calico |
| MetalLB static IP range | 10.2.10.100-250 | IPs for exposing services on the internal network (Platform 100-150, Middleware 151-200, Application 201-250) |

1.3 VIP Planning

| Purpose | VIP | Port | Notes |
| --- | --- | --- | --- |
| Platform control plane | 10.1.10.12 | 6443 | Platform apiserver entry point (forwarded by the internal LB) |
| Middleware control plane | 10.1.10.11 | 6443 | Middleware apiserver entry point (forwarded by the internal LB) |
| Application control plane | 10.1.10.10 | 6443 | Application apiserver entry point (forwarded by the internal LB) |
| External service entry | 192.168.192.80 | 80/443 | Entry point for external users (external LB VIP) |

Chapter 2 Preparation (run on every node)

2.1 Operating System Initialization

sudo su -

# Refresh the package index and upgrade
sudo apt update && sudo apt upgrade -y

# Disable the firewall (in production, prefer opening only the required ports)
sudo systemctl stop ufw && sudo systemctl disable ufw # Ubuntu ships ufw as its default firewall

# Disable swap (required by Kubernetes)
sudo swapoff -a
sudo sed -i '/swap/s/^/#/' /etc/fstab # permanently disabled

# Kernel modules and parameters for container networking
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay && sudo modprobe br_netfilter

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system # apply the kernel parameters

# Disable IPv6 (optional, avoids address conflicts; sysctl -w does not persist across reboots)
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
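
An optional sanity check that the settings above took effect (standard Ubuntu tools, nothing cluster-specific):

# Kernel modules, sysctl values and swap state
lsmod | grep -E 'overlay|br_netfilter'                         # both modules should be listed
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables  # both values should be 1
swapon --show                                                  # no output means swap is fully off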

2.2 Install Dependencies

# Basic dependencies (Ubuntu package names)
sudo apt install -y curl wget vim net-tools ipvsadm chrony apt-transport-https ca-certificates
sudo systemctl start chrony && sudo systemctl enable chrony # time synchronization
sudo chronyc sources # verify time-sync status

Chapter 3 Load Balancer Deployment (HAProxy + Keepalived)

3.1 Internal API LB (nodes 10.1.10.5/6)

Forwards apiserver traffic (port 6443) for all clusters, providing a highly available control-plane entry point.

Install HAProxy and Keepalived

sudo apt install -y haproxy keepalived  # installed from the Ubuntu repositories

Configure HAProxy

sudo tee /etc/haproxy/haproxy.cfg <<EOF
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp # the k3s apiserver (6443) speaks TCP
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

# Stats page (http://<node IP>:9000, credentials haproxy:haproxy)
listen stats
    bind *:9000
    mode http
    stats enable
    stats uri /
    stats auth haproxy:haproxy

# Single frontend handling all 6443 traffic
frontend k3s-api-front
    bind *:6443 # bind 6443 only once
    # Distinguish the clusters by destination VIP
    acl is_app_api dst 10.1.10.10  # Application cluster VIP
    acl is_mid_api dst 10.1.10.11  # Middleware cluster VIP
    acl is_plat_api dst 10.1.10.12 # Platform cluster VIP
    # Route to the matching backend
    use_backend app-api-backend if is_app_api
    use_backend mid-api-backend if is_mid_api
    use_backend plat-api-backend if is_plat_api
    # Default backend (optional; switch to reject-backend to drop unmatched traffic)
    default_backend plat-api-backend

# Application cluster backend
backend app-api-backend
    balance roundrobin
    server app-master-01 10.1.10.40:6443 check inter 2000 fall 3 rise 2
    server app-master-02 10.1.10.41:6443 check inter 2000 fall 3 rise 2

# Middleware cluster backend
backend mid-api-backend
    balance roundrobin
    server mid-master-01 10.1.10.50:6443 check inter 2000 fall 3 rise 2
    server mid-master-02 10.1.10.51:6443 check inter 2000 fall 3 rise 2

# Platform cluster backend
backend plat-api-backend
    balance roundrobin
    server plat-master-01 10.1.10.60:6443 check inter 2000 fall 3 rise 2
    server plat-master-02 10.1.10.61:6443 check inter 2000 fall 3 rise 2

# Optional: backend that rejects unmatched traffic
backend reject-backend
    mode tcp
    server reject 127.0.0.1:8080 disabled # a disabled server, used to refuse requests
EOF
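
Before restarting HAProxy it is worth validating the file; the -c flag only parses the configuration and does not touch the running service:

# Syntax check (should report that the configuration is valid)
sudo haproxy -c -f /etc/haproxy/haproxy.cfg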

Configure Keepalived (primary node, 10.1.10.5)

sudo tee /etc/keepalived/keepalived.conf <<EOF
global_defs {
    router_id K3S_INNER_LB_MASTER
}

# Application control-plane VIP (10.1.10.10)
vrrp_instance VI_APP_API {
    state MASTER
    interface ens18 # replace with the actual NIC name (e.g. enp0s3)
    virtual_router_id 10
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass appApi123
    }
    virtual_ipaddress {
        10.1.10.10/24 dev ens18
    }
}

# Middleware control-plane VIP (10.1.10.11)
vrrp_instance VI_MID_API {
    state MASTER
    interface ens18
    virtual_router_id 11
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass midApi123
    }
    virtual_ipaddress {
        10.1.10.11/24 dev ens18
    }
}

# Platform control-plane VIP (10.1.10.12)
vrrp_instance VI_PLAT_API {
    state MASTER
    interface ens18
    virtual_router_id 12
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass platApi123
    }
    virtual_ipaddress {
        10.1.10.12/24 dev ens18
    }
}
EOF

Backup node (10.1.10.6) configuration

sudo tee /etc/keepalived/keepalived.conf <<EOF
global_defs {
    router_id K3S_INNER_LB_BACKUP
}

# Application control-plane VIP (10.1.10.10)
vrrp_instance VI_APP_API {
    state BACKUP
    interface ens18
    virtual_router_id 10
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass appApi123
    }
    virtual_ipaddress {
        10.1.10.10/24 dev ens18
    }
}

# Middleware control-plane VIP (10.1.10.11)
vrrp_instance VI_MID_API {
    state BACKUP
    interface ens18
    virtual_router_id 11
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass midApi123
    }
    virtual_ipaddress {
        10.1.10.11/24 dev ens18
    }
}

# Platform control-plane VIP (10.1.10.12)
vrrp_instance VI_PLAT_API {
    state BACKUP
    interface ens18
    virtual_router_id 12
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass platApi123
    }
    virtual_ipaddress {
        10.1.10.12/24 dev ens18
    }
}
EOF

Start the services and verify

# Restart to apply the configuration
sudo systemctl restart haproxy keepalived
sudo systemctl enable haproxy keepalived

# Verify the VIP binding (the primary should show 10.1.10.10/11/12)
ip addr show ens18
# Verify HAProxy status
sudo systemctl status haproxy
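
A simple failover check, assuming both LB nodes already run the configuration above: stop Keepalived on the primary, confirm the backup picks up the VIPs, then restore the primary.

# On the primary (10.1.10.5): simulate a failure
sudo systemctl stop keepalived

# On the backup (10.1.10.6): the three VIPs should appear within a few seconds
ip addr show ens18 | grep -E '10\.1\.10\.1[012]'

# Restore the primary
sudo systemctl start keepalived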

3.2 External Service LB (nodes 192.168.192.75/76)

Only forwards external user traffic to the Application cluster's Ingress-nginx (10.2.10.201).

Install HAProxy and Keepalived

sudo apt install -y haproxy keepalived psmisc # psmisc provides killall, used by the Keepalived health-check script

Allow HAProxy to bind to a VIP that is not yet assigned to the local node:

# Takes effect immediately
echo 1 | sudo tee /proc/sys/net/ipv4/ip_nonlocal_bind

# Persist across reboots
echo "net.ipv4.ip_nonlocal_bind=1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p # reload the configuration

Configure HAProxy

cat <<EOF | sudo tee /etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

# Stats page (bound to the node's internal IP rather than the VIP, for internal access)
listen stats
    bind 10.1.10.7:9000 # internal IP of the primary LB; use 10.1.10.8 on the backup
    mode http
    stats enable
    stats uri /
    stats auth haproxy:haproxy

# HTTP (VIP port 80)
frontend http-front
    bind 192.168.192.80:80 # VIP address
    mode tcp
    default_backend ingress-http-back

# HTTPS (VIP port 443)
frontend https-ingress-front
    bind 192.168.192.80:443 # VIP address
    mode tcp
    default_backend ingress-https-back

# Backends (Application cluster Ingress-nginx)
backend ingress-http-back
    mode tcp
    balance roundrobin
    server ingress-nginx 10.2.10.201:80 check inter 2000 fall 3 rise 2

backend ingress-https-back
    mode tcp
    balance roundrobin
    server ingress-nginx 10.2.10.201:443 check inter 2000 fall 3 rise 2
EOF

Configure Keepalived (primary node, 192.168.192.75)

cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
global_defs {
    router_id K3S_OUTER_LB_MASTER
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface ens18
    virtual_router_id 52 # must differ from the internal LB's IDs
    priority 100
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass 654321
    }

    virtual_ipaddress {
        192.168.192.80/24 dev ens18 # external service VIP
    }

    track_script {
        check_haproxy
    }
}
EOF

Backup node (192.168.192.76) configuration

cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
global_defs {
    router_id K3S_OUTER_LB_BACKUP
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens18
    virtual_router_id 52 # same ID as the primary, different from the internal LB's IDs
    priority 90 # lower than the primary
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass 654321
    }

    virtual_ipaddress {
        192.168.192.80/24 dev ens18 # external service VIP
    }

    track_script {
        check_haproxy
    }
}
EOF

Start the services and verify

sudo systemctl restart haproxy keepalived
sudo systemctl enable haproxy keepalived
ip addr show ens18 # the VIP 192.168.192.80 should be bound on the primary
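
Once the Application cluster's Ingress-nginx is up (Chapter 7), forwarding through the external VIP can be checked from any host on the service network; until then the backend health checks will report DOWN, which is expected.

# TCP reachability of the VIP
nc -zv 192.168.192.80 80
nc -zv 192.168.192.80 443

# HTTP through HAProxy to Ingress-nginx (a 404 from nginx is normal before any Ingress rules exist)
curl -I http://192.168.192.80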

Chapter 4 Database Preparation (external standalone MySQL)

All three clusters store their control-plane state in an external MySQL database instead of embedded etcd (the example uses the existing MySQL at 192.168.192.170). Ubuntu nodes need no extra configuration to reach MySQL; network reachability is enough.

4.1 Create Databases and Users (run on the MySQL server)

-- Platform cluster database and users
-- Note: the passwords granted here must match the ones used in the --datastore-endpoint strings in Chapter 6
CREATE DATABASE IF NOT EXISTS k3s_platform CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE USER IF NOT EXISTS 'plat_db_user'@'10.1.10.60' IDENTIFIED BY 'K3s@Db2024';
CREATE USER IF NOT EXISTS 'plat_db_user'@'10.1.10.61' IDENTIFIED BY 'K3s@Db2024';
GRANT ALL PRIVILEGES ON k3s_platform.* TO 'plat_db_user'@'10.1.10.60';
GRANT ALL PRIVILEGES ON k3s_platform.* TO 'plat_db_user'@'10.1.10.61';

-- Middleware cluster database and users
CREATE DATABASE IF NOT EXISTS k3s_middleware CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE USER IF NOT EXISTS 'mid_db_user'@'10.1.10.50' IDENTIFIED BY 'K3s@Db2024';
CREATE USER IF NOT EXISTS 'mid_db_user'@'10.1.10.51' IDENTIFIED BY 'K3s@Db2024';
GRANT ALL PRIVILEGES ON k3s_middleware.* TO 'mid_db_user'@'10.1.10.50';
GRANT ALL PRIVILEGES ON k3s_middleware.* TO 'mid_db_user'@'10.1.10.51';

-- Application cluster database and users
CREATE DATABASE IF NOT EXISTS k3s_application CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
CREATE USER IF NOT EXISTS 'app_db_user'@'10.1.10.40' IDENTIFIED BY 'K3s@Db2024';
CREATE USER IF NOT EXISTS 'app_db_user'@'10.1.10.41' IDENTIFIED BY 'K3s@Db2024';
GRANT ALL PRIVILEGES ON k3s_application.* TO 'app_db_user'@'10.1.10.40';
GRANT ALL PRIVILEGES ON k3s_application.* TO 'app_db_user'@'10.1.10.41';

FLUSH PRIVILEGES;
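
An optional check on the MySQL server that the databases and per-host users exist as intended:

-- List the k3s databases and users, and inspect one grant
SHOW DATABASES LIKE 'k3s\_%';
SELECT user, host FROM mysql.user WHERE user LIKE '%_db_user' ORDER BY user, host;
SHOW GRANTS FOR 'plat_db_user'@'10.1.10.60';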

4.2 Verify Connectivity from the Ubuntu Nodes (run on each cluster's master nodes)

# Install the MySQL client and test the connection. Use the user created for that cluster in 4.1
# (e.g. plat_db_user from a Platform master); the grants are restricted to the master IPs.
sudo apt install -y mysql-client
mysql -u plat_db_user -p'K3s@Db2024' -h 192.168.192.170 # a successful login means connectivity is fine

Chapter 5 Installing Helm (all clusters' master nodes)

Helm is the Kubernetes package manager; it simplifies deploying Ingress-nginx, Rancher and other components.

5.1 Install Helm

# Download Helm 3 (same method on Ubuntu and CentOS)
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod +x get_helm.sh
sudo ./get_helm.sh

# Verify the installation
helm version # should print a version such as v3.14.0+g...

5.2 Basic Helm Usage and Repository Setup

# Add the commonly used repositories (behind a restricted network, consider switching to mirror sources)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm repo update # refresh the repository indexes
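
To confirm the repositories are usable, search them; the chart versions returned will vary over time:

helm search repo ingress-nginx/ingress-nginx
helm search repo jetstack/cert-manager
helm search repo rancher-stable/rancher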

Chapter 6 k3s Cluster Deployment

The k3s installation commands on Ubuntu 24.04 are the same as on CentOS; only the dependencies above need to be in place.

  • Download the offline installation artifacts
wget https://get.k3s.io -O install.sh
wget https://github.com/k3s-io/k3s/releases/download/v1.33.5%2Bk3s1/k3s -O k3s
wget https://github.com/k3s-io/k3s/releases/download/v1.33.5%2Bk3s1/k3s-airgap-images-amd64.tar -O k3s-airgap-images-amd64.tar

Alternate download locations:

https://wengtx.cn/scripts/k3s/v1.33.5+k3s1/install.sh

https://wengtx.cn/scripts/k3s/v1.33.5+k3s1/k3s

https://wengtx.cn/scripts/k3s/v1.33.5+k3s1/k3s-airgap-images-amd64.tar

  • Copy install.sh, k3s and k3s-airgap-images-amd64.tar to the home directory of every k3s node, put them in place and make them executable:
# Place the airgap image tarball in the images directory
sudo mkdir -p /var/lib/rancher/k3s/agent/images/
sudo cp ./k3s-airgap-images-amd64.tar /var/lib/rancher/k3s/agent/images/

# Place the k3s binary at /usr/local/bin/k3s
sudo cp ./k3s /usr/local/bin/
sudo chmod +x /usr/local/bin/k3s
chmod +x install.sh

6.1 Platform Cluster Deployment (10.1.10.60/61/65/66)

Initialize the first master node (10.1.10.60)

# Install the Platform cluster's first master node (k3s-platform-master-01).
# Note: because an external datastore is used (--datastore-endpoint), --cluster-init (embedded etcd)
# must not be set, and the first server does not join via --server; the token is set explicitly so
# the join commands below can reference it.
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh server \
--token "plat-k3s-token-2024" \
--cluster-cidr 10.42.0.0/16 \
--service-cidr 10.43.0.0/16 \
--flannel-backend none \
--disable-network-policy \
--disable=traefik \
--disable=servicelb \
--datastore-endpoint "mysql://plat_db_user:Plat@Db123@tcp(192.168.192.170:3306)/k3s_platform" \
--node-ip 10.1.10.60 \
--node-name k3s-platform-master-01 \
--node-taint "CriticalAddonsOnly=true:NoExecute" \
--write-kubeconfig-mode 0644 \
--tls-san "10.1.10.12,10.1.10.5,10.1.10.6,10.1.10.60,10.1.10.61,k3s-platform-master-01"

# Parameter notes:
# --token: cluster join token, shared by the additional masters and workers below
# --cluster-cidr 10.42.0.0/16: Pod CIDR (must match the Calico configuration)
# --service-cidr 10.43.0.0/16: Service CIDR
# --flannel-backend none: disable the built-in flannel in favour of Calico
# --disable=traefik: disable the built-in traefik in favour of Ingress-nginx
# --disable=servicelb: disable the built-in servicelb in favour of MetalLB
# --datastore-endpoint: external datastore (mysql://user:password@tcp(dbIP:port)/database);
#   the credentials must match the ones granted in Chapter 4
# --node-ip 10.1.10.60: node IP
# --node-name k3s-platform-master-01: node name
# --node-taint: taint that keeps ordinary Pods off the master nodes
# --write-kubeconfig-mode 0644: kubeconfig file permissions (readable by non-root users)
# --tls-san: extra SANs in the serving certificate (includes the VIP 10.1.10.12 and the LB nodes
#   so that access through the load balancer validates)
# The additional nodes below join through --server https://10.1.10.12:6443, the Platform control-plane VIP.

# Retrieve the cluster token (used when joining worker nodes; it embeds the --token value set above)
sudo cat /var/lib/rancher/k3s/server/node-token
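
Before adding further nodes, it is worth confirming the first server came up cleanly; the node will stay NotReady until the network plugin is installed in Chapter 7:

sudo systemctl status k3s --no-pager
sudo k3s kubectl get nodes -o wide # NotReady at this stage is expected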

Add the second master node (10.1.10.61)

INSTALL_K3S_SKIP_DOWNLOAD=true  ./install.sh server \
--server https://10.1.10.12:6443 \
--token "plat-k3s-token-2024" \
--cluster-cidr 10.42.0.0/16 \
--service-cidr 10.43.0.0/16 \
--flannel-backend none \
--disable-network-policy \
--disable=traefik \
--disable=servicelb \
--datastore-endpoint "mysql://plat_db_user:Plat@Db123@tcp(192.168.192.170:3306)/k3s_platform" \
--node-ip 10.1.10.61 \
--node-name k3s-platform-master-02 \
--node-taint "CriticalAddonsOnly=true:NoExecute" \
--write-kubeconfig-mode 0644 \
--tls-san "10.1.10.12,10.1.10.5,10.1.10.6,10.1.10.60,10.1.10.61,k3s-platform-master-02"

Add the worker nodes (10.1.10.65/66)

# Run on each worker node (the token can also be read from /var/lib/rancher/k3s/server/node-token on a master)
INSTALL_K3S_SKIP_DOWNLOAD=true bash ./install.sh agent \
--server https://10.1.10.12:6443 \
--token "plat-k3s-token-2024" \
--node-ip 10.1.10.65 \
--node-name k3s-platform-worker-01
# --with-node-id appends a random suffix to the node name to keep it unique (optional)

6.2 Middleware Cluster Deployment (10.1.10.50/51/55/56)

Initialize the first master node (10.1.10.50)

# As with the Platform cluster, the first server uses the external datastore directly (no --server)
# and sets the token used by the join commands below.
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh server \
--token "mid-k3s-token-2024" \
--cluster-cidr 10.44.0.0/16 \
--service-cidr 10.45.0.0/16 \
--flannel-backend none \
--disable-network-policy \
--disable=traefik \
--disable=servicelb \
--datastore-endpoint "mysql://mid_db_user:Mid@Db123@tcp(192.168.192.170:3306)/k3s_middleware" \
--node-ip 10.1.10.50 \
--node-name k3s-middleware-master-01 \
--node-taint "CriticalAddonsOnly=true:NoExecute" \
--write-kubeconfig-mode 0644 \
--tls-san "10.1.10.11,10.1.10.5,10.1.10.6,10.1.10.50,10.1.10.51,k3s-middleware-master-01"

# Parameter notes:
# --token: cluster join token, shared by the additional masters and workers below
# --cluster-cidr 10.44.0.0/16: Pod CIDR (must match the Calico configuration)
# --service-cidr 10.45.0.0/16: Service CIDR
# --flannel-backend none: disable the built-in flannel in favour of Calico
# --disable=traefik: disable the built-in traefik in favour of Ingress-nginx
# --disable=servicelb: disable the built-in servicelb in favour of MetalLB
# --datastore-endpoint: external datastore (mysql://user:password@tcp(dbIP:port)/database);
#   the credentials must match the ones granted in Chapter 4
# --node-ip 10.1.10.50: node IP
# --node-name k3s-middleware-master-01: node name
# --node-taint: taint that keeps ordinary Pods off the master nodes
# --write-kubeconfig-mode 0644: kubeconfig file permissions (readable by non-root users)
# --tls-san: extra SANs in the serving certificate (includes the VIP 10.1.10.11 and the LB nodes)
# The additional nodes below join through --server https://10.1.10.11:6443, the Middleware control-plane VIP.

# Retrieve the cluster token (used when joining worker nodes)
sudo cat /var/lib/rancher/k3s/server/node-token

Add the second master node (10.1.10.51)

INSTALL_K3S_SKIP_DOWNLOAD=true  ./install.sh server \
--server https://10.1.10.11:6443 \
--token "mid-k3s-token-2024" \
--cluster-cidr 10.44.0.0/16 \
--service-cidr 10.45.0.0/16 \
--flannel-backend none \
--disable-network-policy \
--disable=traefik \
--disable=servicelb \
--datastore-endpoint "mysql://mid_db_user:Mid@Db123@tcp(192.168.192.170:3306)/k3s_middleware" \
--node-ip 10.1.10.51 \
--node-name k3s-middleware-master-02 \
--node-taint "CriticalAddonsOnly=true:NoExecute" \
--write-kubeconfig-mode 0644 \
--tls-san "10.1.10.11,10.1.10.5,10.1.10.6,10.1.10.50,10.1.10.51,k3s-middleware-master-02"

Add the worker nodes (10.1.10.55/56)

# Replace the token and IP; otherwise identical to the Platform worker deployment
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh agent \
--server https://10.1.10.11:6443 \
--token "mid-k3s-token-2024" \
--node-ip 10.1.10.55 \
--node-name k3s-middleware-worker-01

6.3 Application Cluster Deployment (10.1.10.40/41/45/46/47)

Initialize the first master node (10.1.10.40)

# As above, the first server uses the external datastore directly (no --server)
# and sets the token used by the join commands below.
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh server \
--token "app-k3s-token-2024" \
--cluster-cidr 10.46.0.0/16 \
--service-cidr 10.47.0.0/16 \
--flannel-backend none \
--disable-network-policy \
--disable=traefik \
--disable=servicelb \
--datastore-endpoint "mysql://app_db_user:App@Db123@tcp(192.168.192.170:3306)/k3s_application" \
--node-ip 10.1.10.40 \
--node-name k3s-application-master-01 \
--node-taint "CriticalAddonsOnly=true:NoExecute" \
--write-kubeconfig-mode 0644 \
--tls-san "10.1.10.10,10.1.10.5,10.1.10.6,10.1.10.40,10.1.10.41,k3s-application-master-01"

# Parameter notes:
# --token: cluster join token, shared by the additional masters and workers below
# --cluster-cidr 10.46.0.0/16: Pod CIDR (must match the Calico configuration)
# --service-cidr 10.47.0.0/16: Service CIDR
# --flannel-backend none: disable the built-in flannel in favour of Calico
# --disable=traefik: disable the built-in traefik in favour of Ingress-nginx
# --disable=servicelb: disable the built-in servicelb in favour of MetalLB
# --datastore-endpoint: external datastore (mysql://user:password@tcp(dbIP:port)/database);
#   the credentials must match the ones granted in Chapter 4
# --node-ip 10.1.10.40: node IP
# --node-name k3s-application-master-01: node name
# --node-taint: taint that keeps ordinary Pods off the master nodes
# --write-kubeconfig-mode 0644: kubeconfig file permissions (readable by non-root users)
# --tls-san: extra SANs in the serving certificate (includes the VIP 10.1.10.10 and the LB nodes)
# The additional nodes below join through --server https://10.1.10.10:6443, the Application control-plane VIP.

# Retrieve the cluster token (used when joining worker nodes)
sudo cat /var/lib/rancher/k3s/server/node-token

Add the second master node (10.1.10.41)

INSTALL_K3S_SKIP_DOWNLOAD=true  ./install.sh server \
--server https://10.1.10.10:6443 \
--token "app-k3s-token-2024" \
--cluster-cidr 10.46.0.0/16 \
--service-cidr 10.47.0.0/16 \
--flannel-backend none \
--disable-network-policy \
--disable=traefik \
--disable=servicelb \
--datastore-endpoint "mysql://app_db_user:App@Db123@tcp(192.168.192.170:3306)/k3s_application" \
--node-ip 10.1.10.41 \
--node-name k3s-application-master-02 \
--node-taint "CriticalAddonsOnly=true:NoExecute" \
--write-kubeconfig-mode 0644 \
--tls-san "10.1.10.10,10.1.10.5,10.1.10.6,10.1.10.40,10.1.10.41,k3s-application-master-02"

Add the worker nodes (10.1.10.45/46/47)

# Replace the token and IP; otherwise identical to the previous clusters
INSTALL_K3S_SKIP_DOWNLOAD=true ./install.sh agent \
--server https://10.1.10.10:6443 \
--token "app-k3s-token-2024" \
--node-ip 10.1.10.45 \
--node-name k3s-application-worker-01

6.4 Verify Cluster State (run on each cluster's master nodes)

# kubectl ships with k3s on Ubuntu, so it can be used directly
kubectl get nodes # nodes show NotReady at this point (waiting for the network plugin)
sudo systemctl status k3s # check the k3s service

Chapter 7 Network Plugin and Service Exposure Components

7.1 Configure kubeconfig (all clusters' master nodes)

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $USER:$USER ~/.kube/config # give the current user ownership of the copy
kubectl config view # verify the configuration
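
To drive a cluster from a separate workstation, copy its k3s.yaml there and point the server address at that cluster's control-plane VIP instead of 127.0.0.1. The file name platform.yaml below is an arbitrary example; use 10.1.10.11 / 10.1.10.10 for the other clusters.

# Example: operate the Platform cluster from a workstation
sed -i 's|https://127.0.0.1:6443|https://10.1.10.12:6443|' platform.yaml
export KUBECONFIG=$PWD/platform.yaml
kubectl get nodes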

7.2 Calico Network Plugin (all clusters)

Provides Pod-to-Pod networking across nodes; once deployed, the nodes become Ready.

# Download the Calico manifest (v3.26.1 matches the images pre-pulled in Chapter 11)
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml

# Set the Pod CIDR (must match the cluster's --cluster-cidr)
# Platform cluster:    10.42.0.0/16
# Middleware cluster:  10.44.0.0/16
# Application cluster: 10.46.0.0/16
sed -i 's|192.168.0.0/16|10.42.0.0/16|' calico.yaml # adjust the CIDR per cluster
# Also uncomment the CALICO_IPV4POOL_CIDR env entry in calico.yaml so the value takes effect

# Deploy Calico
kubectl apply -f calico.yaml

# Verify (nodes should become Ready within about 10 minutes)
kubectl get nodes
kubectl get pods -n kube-system | grep calico # check the Calico Pods
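
Optionally confirm that Calico picked up the intended Pod CIDR; with the manifest-based install above, the default pool is exposed through the crd.projectcalico.org API group:

# Should print the cluster's --cluster-cidr (10.42.0.0/16 on the Platform cluster)
kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o jsonpath='{.spec.cidr}{"\n"}'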

7.3 MetalLB (all clusters)

Assigns fixed IPs to Services (from 10.2.10.100-250, split per cluster).

Platform cluster (10.2.10.100-150)

# 1. Download MetalLB
curl https://raw.githubusercontent.com/metallb/metallb/v0.13.10/config/manifests/metallb-native.yaml -O

# 2. Deploy the MetalLB components
kubectl apply -f metallb-native.yaml

# Wait for the components to become ready
kubectl wait --namespace metallb-system \
--for=condition=ready pod \
--selector=app=metallb \
--timeout=90s

# Configure the IP address pool
cat <<EOF | tee metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: platform-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.2.10.100-10.2.10.150
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: platform-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - platform-pool
EOF
kubectl apply -f metallb-config.yaml
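
A quick way to confirm the pool works is to expose a throwaway Deployment as a LoadBalancer Service and check that it receives an address from 10.2.10.100-150; the name lb-test is arbitrary and the objects are deleted afterwards.

kubectl create deployment lb-test --image=nginx:1.27
kubectl expose deployment lb-test --port=80 --type=LoadBalancer
kubectl get svc lb-test # EXTERNAL-IP should come from 10.2.10.100-150
kubectl delete svc,deployment lb-test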

Middleware cluster (10.2.10.151-200)

# 1. Download MetalLB
curl https://raw.githubusercontent.com/metallb/metallb/v0.13.10/config/manifests/metallb-native.yaml -O

# 2. Deploy the MetalLB components
kubectl apply -f metallb-native.yaml

kubectl wait --namespace metallb-system --for=condition=ready pod --selector=app=metallb --timeout=90s

cat <<EOF | tee metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: middleware-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.2.10.151-10.2.10.200
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: middleware-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - middleware-pool
EOF
kubectl apply -f metallb-config.yaml

Application cluster (10.2.10.201-250)

# 1. Download MetalLB
curl https://raw.githubusercontent.com/metallb/metallb/v0.13.10/config/manifests/metallb-native.yaml -O

# 2. Deploy the MetalLB components
kubectl apply -f metallb-native.yaml

kubectl wait --namespace metallb-system --for=condition=ready pod --selector=app=metallb --timeout=90s

cat <<EOF | tee metallb-config.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: application-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.2.10.201-10.2.10.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: application-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - application-pool
EOF
kubectl apply -f metallb-config.yaml

7.4 Ingress-nginx (all clusters, as needed)

Routes HTTP/HTTPS traffic; installed with Helm.

Platform cluster (bound to MetalLB IP 10.2.10.100)

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.loadBalancerIP=10.2.10.100 \
--set controller.service.ports.https=443

# Verify (EXTERNAL-IP should be 10.2.10.100 and the Pods Running)
kubectl get svc ingress-nginx-controller -n ingress-nginx

Middleware cluster (bound to MetalLB IP 10.2.10.151)

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.loadBalancerIP=10.2.10.151 \
--set controller.service.ports.https=443

kubectl get svc ingress-nginx-controller -n ingress-nginx

Application cluster (bound to MetalLB IP 10.2.10.201)

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.loadBalancerIP=10.2.10.201 \
--set controller.service.ports.https=443

kubectl get svc ingress-nginx-controller -n ingress-nginx
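
For reference, a minimal Ingress of the kind the Application cluster would serve through 10.2.10.201; the service name, namespace and host below are placeholders and must match an existing Deployment/Service:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-app           # placeholder
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - host: app.k3s.local    # placeholder host, resolved to 192.168.192.80 by external clients
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: demo-app # placeholder Service (port 80) that must already exist
            port:
              number: 80
EOF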

Chapter 8 Rancher Deployment

Provides unified multi-cluster management; depends on Helm and Ingress-nginx.

8.1 Install cert-manager (certificate management)

helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true

# Wait for the components to become ready
kubectl wait --for=condition=Ready pods --all -n cert-manager

8.2 Deploy Rancher

kubectl create namespace cattle-system

helm upgrade --install rancher rancher-stable/rancher \
--namespace cattle-system \
--set hostname=rancher.k3s.local \
--set bootstrapPassword=admin123 \
--set replicas=2 \
--set ingress.tls.source=rancher \
--set ingress.ingressClassName=nginx \
--set 'extraEnv[0].name=CATTLE_TLS_SAN' \
--set 'extraEnv[0].value=rancher.k3s.local\,10.2.10.100'
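
Wait for the rollout to complete before continuing; on a fresh cluster the image pulls can take several minutes:

kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods # all rancher-* Pods should reach Running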

8.3 Access and Verify

  • Add a local hosts entry (Windows: C:\Windows\System32\drivers\etc\hosts; Ubuntu: /etc/hosts):
10.2.10.100 rancher.k3s.local

# On Linux:
echo "10.2.10.100 rancher.k3s.local" | sudo tee -a /etc/hosts
  • Open https://rancher.k3s.local in a browser
  • Log in as admin with the password admin123
  • Import the Middleware and Application clusters into Rancher (using each cluster's ~/.kube/config)

Chapter 9 Complete Network Topology

9.1 Topology Overview (physical and virtual layers)

Physical host (Ubuntu 24.04, carries all cluster nodes and network components)
└── Virtual switch (core network hub)
    ├── VLAN 10 (management network): 10.1.10.0/24
    ├── VLAN 100 (service network): 192.168.192.0/24
    ├── Routing module: inter-VLAN forwarding
    ├── Port-binding table: node-to-VLAN mapping
    ├── Security-group rules: segment isolation, restricts cross-segment ports
    ├── Platform cluster: masters 10.1.10.60/61, workers 10.1.10.65/66
    ├── Middleware cluster: masters 10.1.10.50/51, workers 10.1.10.55/56
    ├── Application cluster: masters 10.1.10.40/41, workers 10.1.10.45/46/47
    ├── Internal API LB pair: primary 10.1.10.5, backup 10.1.10.6, VIPs 10.1.10.10/11/12 (control-plane entry)
    ├── External service LB pair: primary 192.168.192.75, backup 192.168.192.76, VIP 192.168.192.80 (external entry)
    ├── Database node: 192.168.192.170 (MySQL)
    ├── Network plugin (Calico): Platform 10.42/10.43, Middleware 10.44/10.45, Application 10.46/10.47
    ├── Service exposure (MetalLB): Platform 10.2.10.100-150, Middleware 10.2.10.151-200, Application 10.2.10.201-250
    ├── Ingress controllers: Platform 10.2.10.100, Middleware 10.2.10.151, Application 10.2.10.201
    ├── Virtual NICs: bound to VLAN 10/100
    └── Storage volumes: local/shared, persistent data

9.2 Architecture Relationship Diagram

9.3 Core Traffic Flows

Scenario 1: an external user reaches a business service

  1. The external client requests https://192.168.192.80 (external LB VIP);
  2. The external LB (primary 192.168.192.75 / backup 192.168.192.76) forwards to the Application cluster's Ingress-nginx (10.2.10.201);
  3. Ingress-nginx routes to the business Pod according to its rules, and the response returns along the same path.

Scenario 2: cross-cluster internal traffic (Application → Middleware)

  1. A business Pod in the Application cluster calls a Middleware MetalLB IP (e.g. 10.2.10.155); a Service pattern for this is sketched after this list;
  2. The traffic leaves the Application node (egress handled by Calico/NAT) and travels across the management network (VLAN 10);
  3. The Middleware cluster's service entry point (MetalLB IP / Ingress-nginx) receives the request, forwards it to the middleware Pod, and the response returns along the same path.
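
One common way to give the Middleware address a stable name inside the Application cluster is a selector-less Service backed by a manual Endpoints object that points at the MetalLB IP; the names and the Redis port below are illustrative only.

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: middleware-redis   # name used by Application workloads
  namespace: default
spec:
  ports:
  - port: 6379
    targetPort: 6379
---
apiVersion: v1
kind: Endpoints
metadata:
  name: middleware-redis   # must match the Service name
  namespace: default
subsets:
- addresses:
  - ip: 10.2.10.155        # Middleware MetalLB IP (example)
  ports:
  - port: 6379
EOF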

Chapter 10 Cluster Verification and High-Availability Tests

10.1 Basic Functional Checks

# Node state (all nodes Ready)
kubectl get nodes

# Network plugin (Calico Pods running)
kubectl get pods -n kube-system | grep calico

# MetalLB and Ingress (EXTERNAL-IP assigned as planned)
kubectl get svc -n ingress-nginx

# Cross-cluster connectivity (from the Application cluster towards Middleware);
# curlimages/curl is used here because the stock ubuntu image does not ship curl
kubectl run test-pod -n default --image=curlimages/curl --command -- sleep 3600
kubectl exec -it test-pod -- curl -v --connect-timeout 5 10.2.10.155:6379 # replace with the Middleware service IP and port; "Connected to" proves reachability

10.2 High-Availability Tests

  1. Control-plane failure: stop a cluster's master node (sudo systemctl stop k3s) and verify the remaining master keeps serving;
  2. LB failover: stop the external primary LB (sudo systemctl stop keepalived haproxy) and verify the backup takes over the VIP and external access continues (see the polling sketch after this list);
  3. Service availability: delete a business Pod (kubectl delete pod <pod-name>) and verify the Deployment recreates it without interrupting the service.
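
A simple way to watch the failover in item 2 is to poll the external VIP while stopping the primary LB; a gap of at most a few seconds is expected:

# Run from any host on the 192.168.192.0/24 network during the switchover
while true; do
  curl -s -o /dev/null -m 2 -w '%{http_code}\n' http://192.168.192.80/ || echo "FAIL"
  sleep 1
done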

Chapter 11 Mirror Configuration for Mainland China

If the servers cannot reach foreign registries, configure domestic mirror sources:

11.1 Replace the APT Sources

# Note: Ubuntu 24.04 keeps its default sources in deb822 format at /etc/apt/sources.list.d/ubuntu.sources;
# back up or comment out that file as well if only the mirror below should be used.
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo tee /etc/apt/sources.list <<EOF
deb http://mirrors.aliyun.com/ubuntu/ noble main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ noble main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ noble-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ noble-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ noble-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ noble-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ noble-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ noble-security main restricted universe multiverse
EOF
sudo apt update

11.2 k3s Registry Mirrors

sudo mkdir -p /etc/rancher/k3s
sudo tee /etc/rancher/k3s/registries.yaml <<EOF
mirrors:
  docker.io:
    endpoint:
      - "https://docker.1panel.live"
      - "https://docker.1panelproxy.com"
      - "https://docker.m.daocloud.io"
      - "https://huecker.io"
      - "https://dockerhub.timeweb.cloud"
      - "https://noohub.ru"
      - "https://b6ce57c867d045bob163be8658fd1438.mirror.swr.myhuaweicloud.com"
      - "https://f2kfz0k0.mirror.aliyuncs.com"
      - "https://registry.docker-cn.com"
      - "http://hub-mirror.c.163.com"
      - "https://docker.mirrors.ustc.edu.cn"
  "k8s.gcr.io":
    endpoint:
      - "https://lank8s.cn"
      - "https://k8s.lank8s.cn"
      - "https://registry.aliyuncs.com/google_containers"
  "gcr.io":
    endpoint:
      - "https://gcr.m.daocloud.io"
      - "https://gcr.lank8s.cn"
  "ghcr.io":
    endpoint:
      - "https://gcr.m.daocloud.io"
      - "https://ghcr.lank8s.cn"
  "registry.k8s.io":
    endpoint:
      - "https://registry.lank8s.cn"
      - "https://registry.aliyuncs.com/v2/google_containers"
  quay.io:
    endpoint:
      - "https://quay.tencentcloudcr.com/"
EOF
# Restart k3s to apply
sudo systemctl restart k3s # on master nodes
sudo systemctl restart k3s-agent # on worker nodes
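
On recent k3s releases the mirrors from registries.yaml are rendered into containerd hosts.toml files; a quick check that the configuration was picked up (paths may differ on older versions):

sudo ls /var/lib/rancher/k3s/agent/etc/containerd/certs.d/
sudo cat /var/lib/rancher/k3s/agent/etc/containerd/certs.d/docker.io/hosts.toml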

Images that cannot be pulled from inside China can also be fetched from the 渡渡鸟 mirror: https://docker.aityp.com/

k3s base images (install on master nodes)

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-coredns-coredns:1.12.3
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-coredns-coredns:1.12.3 docker.io/rancher/mirrored-coredns-coredns:1.12.3
sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/local-path-provisioner:v0.0.31
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/local-path-provisioner:v0.0.31 docker.io/rancher/local-path-provisioner:v0.0.31
sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-metrics-server:v0.8.0
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-metrics-server:v0.8.0 docker.io/rancher/mirrored-metrics-server:v0.8.0

Calico plugin images (install on all nodes)

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/cni:v3.26.1
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/cni:v3.26.1 docker.io/calico/cni:v3.26.1
sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/node:v3.26.1
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/node:v3.26.1 docker.io/calico/node:v3.26.1
sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.26.1
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.26.1 docker.io/calico/kube-controllers:v3.26.1

MetalLB plugin images (require access to foreign registries or a proxy)

sudo ctr -n k8s.io images pull quay.io/metallb/controller:v0.13.10
sudo ctr -n k8s.io images pull quay.io/metallb/speaker:v0.13.10

Ingress-nginx plugin images (require access to foreign registries or a proxy)

sudo ctr -n k8s.io images pull registry.k8s.io/ingress-nginx/kube-webhook-certgen:v1.6.4@sha256:bcfc926ed57831edf102d62c5c0e259572591df4796ef1420b87f9cf6092497f
sudo ctr -n k8s.io images pull registry.k8s.io/ingress-nginx/controller:v1.14.0@sha256:e4127065d0317bd11dc64c4dd38dcf7fb1c3d72e468110b4086e636dbaac943d

Rancher plugin images (install on all nodes)

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/rancher:v2.12.3
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/rancher:v2.12.3 docker.io/rancher/rancher:v2.12.3

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/shell:v0.5.0
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/shell:v0.5.0 docker.io/rancher/shell:v0.5.0

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/system-upgrade-controller:v0.16.3
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/system-upgrade-controller:v0.16.3 docker.io/rancher/system-upgrade-controller:v0.16.3

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/rancher-webhook:v0.8.3
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/rancher-webhook:v0.8.3 docker.io/rancher/rancher-webhook:v0.8.3

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/fleet:v0.13.4
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/fleet:v0.13.4 docker.io/rancher/fleet:v0.13.4

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/fleet-agent:v0.13.4
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/fleet-agent:v0.13.4 docker.io/rancher/fleet-agent:v0.13.4

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/rancher-agent:v2.12.3
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/rancher-agent:v2.12.3 docker.io/rancher/rancher-agent:v2.12.3

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-cluster-api-controller:v1.10.2
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/rancher/mirrored-cluster-api-controller:v1.10.2 docker.io/rancher/mirrored-cluster-api-controller:v1.10.2

sudo ctr -n k8s.io images pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/library/busybox:1.35
sudo ctr -n k8s.io images tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/library/busybox:1.35 docker.io/library/busybox:1.35