[Source Code Analysis] StarRocks Initialization: Follower FEs Cannot See the BEs, Causing "Cluster has no available capacity"

Original article · Last modified 2025-12-19 21:09:53

Licensed under CC 4.0 BY-SA

First published 2025-12-19 12:30:44

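Problem Scenario

Initialization sequence:

Start 3 FEs first (1 Leader + 2 Followers)
Then start 3 BEs (each BE is registered with an FE over a MySQL-protocol connection)

Symptoms:

Both Follower FEs report: Cluster has no available capacity
The Leader FE may be healthy (it can see the BEs)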
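Root Cause Analysis

1. BE registration mechanism

When a BE starts, it is registered with an FE over the MySQL protocol (for example, a startup script issuing ALTER SYSTEM ADD BACKEND "<be_host>:9050" on the BE's behalf). Key points:

The registration goes to one designated FE only (usually the Leader, or the configured helper FE)
The registration is executed only on that FE; it is not broadcast directly to every FE
The other FEs learn about the BE through EditLog synchronization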

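2. Sequence of events

Timeline:
T1: Start 3 FEs
    - FE1 becomes Leader
    - FE2 and FE3 become Followers (possibly still syncing the journal)

T2: Start 3 BEs
    - BE1 connects to Leader FE1 and registers successfully
    - BE2 connects to Leader FE1 and registers successfully
    - BE3 connects to Leader FE1 and registers successfully
    
    Leader FE1:
    - Runs addBackend() → updates idToBackendRef
    - Writes to the EditLog: logAddBackend()
    - The Leader now sees 3 BEs and its capacity is normal

T3: State of Follower FE2 and FE3
    - If a Follower has not finished starting up or syncing
    - Or its journal replay has not caught up
    - It may not have replayed the BE registration EditLog entries yet
    - So its idToBackendRef is still empty (or holds only some of the BEs)
    - getClusterAvailableCapacityB() then returns 0 (nothing replayed) or less than the real capacity (partially replayed)
    - A result <= 0 triggers the "Cluster has no available capacity" error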
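
The timeline above can be mimicked with a small in-memory model. This is only an illustrative sketch: the Fe class, the long[] journal entries, and the fixed capacity of 100 bytes per BE are all assumptions made for the example and do not correspond to StarRocks classes. It shows that the Leader's view is updated as soon as a BE is added, while a Follower's capacity stays at 0 until it replays the corresponding journal entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the T1-T3 timeline. Every name here is hypothetical, not StarRocks code.
class ReplayLagDemo {
    static class Fe {
        // backendId -> available bytes; stands in for idToBackendRef
        final Map<Long, Long> backends = new HashMap<>();
        int replayedJournalId = 0;

        long availableCapacityB() {
            long sum = 0L;
            for (long c : backends.values()) {
                sum += c;
            }
            return sum;
        }

        // Follower path: apply journal entries one by one up to targetId.
        void replayUpTo(List<long[]> journal, int targetId) {
            while (replayedJournalId < targetId) {
                long[] entry = journal.get(replayedJournalId);
                backends.put(entry[0], entry[1]);
                replayedJournalId++;
            }
        }
    }

    // Leader path: update the local view immediately, then append to the journal.
    static void addBackendOnLeader(Fe leader, List<long[]> journal, long id, long capacityB) {
        leader.backends.put(id, capacityB);
        journal.add(new long[] {id, capacityB});
        leader.replayedJournalId = journal.size();
    }

    public static void main(String[] args) {
        List<long[]> journal = new ArrayList<>();
        Fe leader = new Fe();
        Fe follower = new Fe();
        for (long id = 1; id <= 3; id++) {
            addBackendOnLeader(leader, journal, id, 100L);
        }
        // T3: the follower has replayed nothing yet, so its capacity is 0.
        System.out.println("before replay: leader=" + leader.availableCapacityB()
                + " follower=" + follower.availableCapacityB());
        follower.replayUpTo(journal, journal.size());
        System.out.println("after replay:  leader=" + leader.availableCapacityB()
                + " follower=" + follower.availableCapacityB());
    }
}
```

Replaying only part of the journal (for example replayUpTo(journal, 2)) gives the "only some of the BEs" case: the Follower's capacity is positive but smaller than the Leader's.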

3. Code walkthrough

3.1 BE registration on the Leader FE

When a BE is registered through a MySQL connection to the Leader FE:

// SystemInfoService.addBackend() (lines 203-224)
private void addBackend(String host, int heartbeatPort) {
    Backend newBackend = new Backend(GlobalStateMgr.getCurrentState().getNextId(), host, heartbeatPort);
    
    // 1. Update the Leader's idToBackendRef (takes effect immediately)
    Map<Long, Backend> copiedBackends = Maps.newHashMap(idToBackendRef);
    copiedBackends.put(newBackend.getId(), newBackend);
    idToBackendRef = ImmutableMap.copyOf(copiedBackends);
    
    // 2. Write to the EditLog (used to sync the change to Followers)
    GlobalStateMgr.getCurrentState().getEditLog().logAddBackend(newBackend);
    LOG.info("finished to add {} ", newBackend);
}

Key points:

The Leader's idToBackendRef is updated immediately (the Leader sees the BE right away)
The registration is written to the EditLog (it takes time to reach the Followers)

3.2 How a Follower FE learns about BEs

A Follower FE learns about BEs by replaying the EditLog:

// SystemInfoService.replayAddBackend() (lines 909-934)
public void replayAddBackend(Backend newBackend) {
    // Update the Follower's idToBackendRef
    Map<Long, Backend> copiedBackends = Maps.newHashMap(idToBackendRef);
    copiedBackends.put(newBackend.getId(), newBackend);
    idToBackendRef = ImmutableMap.copyOf(copiedBackends);
    
    // Add the BE to the cluster
    if (newBackend.getBackendState() == BackendState.using) {
        final Cluster cluster = GlobalStateMgr.getCurrentState().getCluster();
        if (null != cluster) {
            cluster.addBackend(newBackend.getId());
        }
    }
}

Key points:

A Follower's idToBackendRef is updated only when the EditLog is replayed
If the Follower's journal replay has not caught up, it cannot see newly registered BEs
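
Both addBackend() and replayAddBackend() follow the same copy-on-write pattern: copy the map, modify the copy, then publish an immutable map with a single reference assignment. The sketch below isolates that pattern. It is not StarRocks code: BackendView is a hypothetical class, java.util.Map.copyOf (Java 10+) stands in for Guava's ImmutableMap.copyOf, and the volatile field and synchronized writer are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the idToBackendRef update pattern; not StarRocks code.
class BackendView {
    // Readers always see a complete, immutable map (volatile is an assumption here).
    private volatile Map<Long, String> idToBackend = Map.of();

    // Writer: copy, modify the copy, publish with one reference assignment.
    synchronized void add(long id, String host) {
        Map<Long, String> copied = new HashMap<>(idToBackend);
        copied.put(id, host);
        idToBackend = Map.copyOf(copied);
    }

    // Reader: returns the current snapshot; later add() calls do not change it.
    Map<Long, String> snapshot() {
        return idToBackend;
    }

    public static void main(String[] args) {
        BackendView view = new BackendView();
        Map<Long, String> before = view.snapshot();
        view.add(10001L, "be1.example");
        System.out.println("old snapshot size: " + before.size());
        System.out.println("new snapshot size: " + view.snapshot().size());
    }
}
```

The consequence for this article is that each FE only ever reads its own local snapshot; nothing in this pattern pulls state from another FE.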

3.3 Capacity check logic

When TableKeeper or another component tries to create a table:

// SystemInfoService.checkClusterCapacity() (lines 1024-1028)
public void checkClusterCapacity() throws DdlException {
    if (getClusterAvailableCapacityB() <= 0L) {
        throw new DdlException("Cluster has no available capacity");
    }
}

// SystemInfoService.getClusterAvailableCapacityB() (lines 1007-1022)
public long getClusterAvailableCapacityB() {
    List<Backend> clusterBackends = getBackends();  // read from idToBackendRef
    long capacity = 0L;
    for (Backend backend : clusterBackends) {
        if (backend.isDecommissioned()) {
            capacity -= backend.getDataUsedCapacityB();
        } else {
            // A BE missing from idToBackendRef is never visited, so it contributes nothing
            capacity += backend.getAvailableCapacityB();
        }
    }
    return capacity;
}

Key points:

getBackends() reads the BE list from idToBackendRef
If the Follower's idToBackendRef is still empty, the computed capacity is 0; if it holds only some of the BEs, the capacity is lower than the real value
A capacity <= 0 triggers the "Cluster has no available capacity" error
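
The effect of an empty backend list on this check can be reproduced in isolation. The following is a minimal sketch under stated assumptions: Be is a hypothetical value class, and IllegalStateException replaces DdlException so that the example compiles without StarRocks on the classpath. The arithmetic mirrors the loop quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone imitation of the capacity check; Be is hypothetical, not the StarRocks Backend class.
class CapacityCheckDemo {
    static class Be {
        final long availableB;
        final long dataUsedB;
        final boolean decommissioned;

        Be(long availableB, long dataUsedB, boolean decommissioned) {
            this.availableB = availableB;
            this.dataUsedB = dataUsedB;
            this.decommissioned = decommissioned;
        }
    }

    // Same arithmetic as getClusterAvailableCapacityB(): decommissioned BEs subtract their used data.
    static long availableCapacityB(List<Be> backends) {
        long capacity = 0L;
        for (Be be : backends) {
            if (be.decommissioned) {
                capacity -= be.dataUsedB;
            } else {
                capacity += be.availableB;
            }
        }
        return capacity;
    }

    // IllegalStateException stands in for DdlException in this sketch.
    static void checkClusterCapacity(List<Be> backends) {
        if (availableCapacityB(backends) <= 0L) {
            throw new IllegalStateException("Cluster has no available capacity");
        }
    }

    public static void main(String[] args) {
        List<Be> followerView = new ArrayList<>(); // nothing replayed yet
        try {
            checkClusterCapacity(followerView);
        } catch (IllegalStateException e) {
            System.out.println("follower: " + e.getMessage());
        }
        followerView.add(new Be(1024L, 0L, false)); // one BE replayed
        checkClusterCapacity(followerView);
        System.out.println("follower: check passed");
    }
}
```

Note that a single replayed BE with free space is enough for the check to pass, so the error points specifically at a view with no usable BE at all.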

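Why does this happen?

Cause 1: Follower startup or sync delay

Scenario:

Follower FE2 and FE3 start slowly, or are still replaying the journal
When the BEs register, the Followers' journal replay has not caught up
The Followers' idToBackendRef is still empty

How to verify:

-- Run on the Follower FE
SHOW FRONTENDS;
-- Check ReplayedJournalId; if it is far below the Leader's, the Follower is still syncing

SHOW BACKENDS;
-- If no BE is listed, the BE registration EditLog entries have not been replayed yet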

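Cause 2: The Follower's journal replay is stuck

Scenario:

The Follower's journal replay hits an error or exception and stalls at some journal ID
It cannot go on to replay the later BE registration EditLog entries
The Follower therefore never sees the newly registered BEs

How to verify:

# Check the log on the Follower FE
tail -100 fe/log/fe.log | grep -i "replay\|error\|exception"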

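Cause 3: The Followers had not fully joined the cluster when the BEs registered

Scenario:

Follower FE2 and FE3 are running but may not have fully joined the cluster (abnormal state)
The Leader's EditLog may not have been synced to these Followers yet
The Followers therefore cannot see the BEs

How to verify:

-- Run on the Leader FE
SHOW FRONTENDS;
-- Check each Follower's Alive status and ReplayedJournalId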

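Solutions

Solution 1: Wait for the Followers to finish syncing (recommended)

Steps:

Wait until the Follower FEs' journal replay catches up with the Leader
Confirm that the Followers can see all BEs

Verification:

-- Check on the Leader
SHOW FRONTENDS;
-- Confirm that each Follower's ReplayedJournalId is close to the Leader's

-- Check on the Follower (if you can connect to it)
SHOW BACKENDS;
-- Confirm that all BEs are listed

Expected wait: usually 1-5 minutes (depending on the number of journal entries)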
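
If the wait needs to be automated, a bounded polling loop is one option. The sketch below is hypothetical and keeps the data source abstract: the two LongSupplier arguments are assumed to return the Leader's and the Follower's ReplayedJournalId (for example, read from SHOW FRONTENDS by code that is not shown here), and maxLag, timeoutMs, and pollMs are illustrative parameters rather than StarRocks settings.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper: poll until the follower's replayed journal id is close enough to the leader's.
class WaitForReplayDemo {
    static boolean waitUntilCaughtUp(LongSupplier leaderJournalId, LongSupplier followerJournalId,
                                     long maxLag, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            long lag = leaderJournalId.getAsLong() - followerJournalId.getAsLong();
            if (lag <= maxLag) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false; // give up; the follower may be stuck rather than slow
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated follower that advances by 250 journal entries per poll.
        AtomicLong follower = new AtomicLong(0);
        boolean ok = waitUntilCaughtUp(() -> 1000L, () -> follower.addAndGet(250), 10, 5000, 10);
        System.out.println("caught up: " + ok);
    }
}
```

A false return after the timeout is a hint to look at Cause 2 (stuck replay) instead of simply waiting longer.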

Solution 2: Restart the Follower FE (if the sync is stuck)

Steps:

Stop the Follower FE
Restart it with the Leader as the helper

# Run on the Follower FE
./bin/stop_fe.sh

# Start with the Leader as the helper
./bin/start_fe.sh --helper <leader_ip>:9010 --daemon

# Watch the startup log and wait for the sync to complete
tail -f log/fe.log | grep -i "replay\|ready\|transfer"

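Preventive Measures

1. Start components in the right order

Recommended order:

Start the Leader FE first and wait until it is fully up
Then start the Follower FEs and wait until they are fully up and in sync
Start the BEs last and let them register with the Leader

Verify each step:

-- Step 1: confirm the Leader is up
SHOW FRONTENDS;  -- expect 1 Leader

-- Step 2: confirm the Followers are up and in sync
SHOW FRONTENDS;  -- expect 1 Leader + 2 Followers and, importantly, ReplayedJournalId values close to the Leader's

-- Step 3: after starting the BEs, verify on every FE
SHOW BACKENDS;   -- every FE should list the BEs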

2. Monitor Follower sync status

Check periodically:

-- Run on the Leader
SHOW FRONTENDS;

-- Watch for:
-- - Each Follower's Alive status
-- - Whether ReplayedJournalId is close to the Leader's
-- - Whether LastHeartbeat is normal
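
The checks listed above can be turned into a small rule once the SHOW FRONTENDS rows have been fetched. The sketch below works on already-parsed rows and is purely illustrative: FeRow, its fields, and the maxLag threshold are hypothetical, and how the rows are obtained (JDBC, a script, a monitoring agent) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical health rule over parsed SHOW FRONTENDS rows; not part of StarRocks.
class FollowerLagReport {
    static class FeRow {
        final String name;
        final boolean isLeader;
        final boolean alive;
        final long replayedJournalId;

        FeRow(String name, boolean isLeader, boolean alive, long replayedJournalId) {
            this.name = name;
            this.isLeader = isLeader;
            this.alive = alive;
            this.replayedJournalId = replayedJournalId;
        }
    }

    // Returns the names of non-leader FEs that are not alive or lag the leader by more than maxLag.
    static List<String> unhealthyFollowers(List<FeRow> rows, long maxLag) {
        long leaderId = -1L;
        for (FeRow r : rows) {
            if (r.isLeader) {
                leaderId = r.replayedJournalId;
            }
        }
        if (leaderId < 0) {
            throw new IllegalArgumentException("no leader row found");
        }
        List<String> unhealthy = new ArrayList<>();
        for (FeRow r : rows) {
            if (r.isLeader) {
                continue;
            }
            if (!r.alive || leaderId - r.replayedJournalId > maxLag) {
                unhealthy.add(r.name);
            }
        }
        return unhealthy;
    }

    public static void main(String[] args) {
        List<FeRow> rows = Arrays.asList(
                new FeRow("fe1", true, true, 1000L),
                new FeRow("fe2", false, true, 995L),
                new FeRow("fe3", false, true, 200L));
        System.out.println("unhealthy followers: " + unhealthyFollowers(rows, 10L));
    }
}
```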

3. Ensure network connectivity

Check that:

Each Follower FE can reach the Leader FE (port 9010)
Each Follower FE can reach the BEs (heartbeat port 9050)
The Leader FE can reach all BEs
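
These reachability checks can also be scripted. The following is a minimal sketch using only the JDK: PortCheckDemo and canConnect are hypothetical names, and the timeout is an arbitrary choice. It only tests that a TCP connection can be opened; it says nothing about whether the service behind the port is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical TCP reachability probe, roughly what "nc -zv host port" checks.
class PortCheckDemo {
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or unresolvable host
        }
    }

    public static void main(String[] args) {
        // Usage: java PortCheckDemo <host> <port>, e.g. the Leader's 9010 or a BE's 9050
        if (args.length != 2) {
            System.out.println("usage: PortCheckDemo <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 3000));
    }
}
```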

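Summary of Key Code Points

A BE registration takes effect immediately only on the FE it was sent to (usually the Leader)
- The Leader's idToBackendRef is updated immediately
- The other FEs see the BE only after replaying the EditLog
A Follower's view of the BEs depends on journal replay
- replayAddBackend() updates the Follower's idToBackendRef
- If replay has not caught up, the Follower cannot see the BE
The capacity check uses the local idToBackendRef
- getClusterAvailableCapacityB() reads the BE list from idToBackendRef
- If idToBackendRef is empty, the capacity is 0 and the error is raised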

Quick Diagnostic Commands

-- 1. Check FE status on the Leader
SHOW FRONTENDS;

-- 2. Check BE status on the Leader
SHOW BACKENDS;

-- 3. Check BE status on the Follower (if you can connect to it)
SHOW BACKENDS;

-- 4. Compare the ReplayedJournalId of the Leader and the Follower
-- A large gap means the Follower is still syncing

# 5. Check the journal replay log on the Follower
tail -100 fe/log/fe.log | grep -i "replay\|error\|exception"

# 6. Check network connectivity from the Follower to the Leader
nc -zv <leader_ip> 9010