Hidden Dangers in Spring Boot Projects: Memory Leaks and Performance Degradation Caused by Poor Caching Strategies

Original article: https://blog.youkuaiyun.com/weixin_45616905/article/details/147142314

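Hi everyone! Today let's talk about a problem in Spring Boot projects that is often overlooked but extremely important: a poor caching strategy. Have you ever seen this happen? Everything works fine right after launch. Then, as time passes and data grows, the application gets slower and slower and may eventually fail with an OutOfMemoryError (OOM). There's a good chance your caching strategy is to blame!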

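I. Common Caching Pitfalls

1. Unbounded caches: the most common memory problem

Case: I once worked on an e-commerce system where the team used Caffeine to cache product information. The code looked simple enough: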

@Configuration
public class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("productCache");
        cacheManager.setCaffeine(Caffeine.newBuilder());
        return cacheManager;
    }
}

@Service
public class ProductService {
    @Cacheable(value = "productCache", key = "#productId")
    public ProductDTO getProduct(Long productId) {
        // Load the product from the database
        return productRepository.findById(productId)
                .map(this::convertToDTO)
                .orElseThrow(() -> new ProductNotFoundException(productId));
    }
}

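Problem analysis: Did you spot it? We never set a maximum size or an expiration time! A bare Caffeine.newBuilder() keeps every entry forever. As the number of products grows, the cache keeps growing with it until the JVM runs out of heap memory.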

┌─────────────────────────────────┐
│          JVM heap space         │
│                                 │
│  ┌─────────────────────────┐    │
│  │     Unbounded cache     │    │
│  │                         │    │
│  │ [product1] [product2]...│    │
│  │                         │    │
│  │  As products are added, │    │
│  │  the cache keeps growing│    │
│  └─────────────────────────┘    │
│                                 │
│  Free memory keeps shrinking ↓↓↓│
│                                 │
└─────────────────────────────────┘

Solution: set a maximum cache size and an expiration policy

@Configuration
public class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("productCache");
        cacheManager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(10000)  // cache at most 10,000 products
                .expireAfterWrite(1, TimeUnit.HOURS)  // entries expire 1 hour after being written
                .recordStats());  // enable statistics for monitoring
        return cacheManager;
    }
}
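To make the effect of maximumSize and expireAfterWrite concrete, here is a minimal plain-Java sketch of a cache that is both size-bounded and write-expiring. It is an illustration under stated assumptions and not how Caffeine works internally. It evicts by LRU, whereas Caffeine uses Window TinyLFU. It is not thread-safe. It takes its clock as a LongSupplier so that expiry can be checked without sleeping. The class name BoundedTtlCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch: a size-capped LRU map whose entries also expire a fixed time after being written.
// Not thread-safe; not Caffeine's eviction algorithm.
class BoundedTtlCache<K, V> {
    private static final class Timed<T> {
        final T value;
        final long writtenAtMillis;

        Timed(T value, long writtenAtMillis) {
            this.value = value;
            this.writtenAtMillis = writtenAtMillis;
        }
    }

    private final long ttlMillis;
    private final LongSupplier clock; // injectable clock in milliseconds
    private final LinkedHashMap<K, Timed<V>> map;

    BoundedTtlCache(int maxSize, long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
        // accessOrder = true: iteration runs from least to most recently used
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxSize; // over capacity: drop the least recently used entry
            }
        };
    }

    void put(K key, V value) {
        map.put(key, new Timed<>(value, clock.getAsLong()));
    }

    V get(K key) {
        Timed<V> t = map.get(key);
        if (t == null) {
            return null;
        }
        if (clock.getAsLong() - t.writtenAtMillis >= ttlMillis) {
            map.remove(key); // expired: remove it and report a miss
            return null;
        }
        return t.value;
    }

    int size() {
        return map.size();
    }
}
```

Whatever eviction policy is used, the essential property is the same: memory is capped at maxSize entries, and no entry outlives its TTL, so the cache can no longer grow along with the product catalog.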

2. Poorly designed cache keys: wasted memory and a low hit rate

Case: a user permission system that caches the results of permission checks:

@Service
public class AuthorizationService {
    @Cacheable(value = "permissionCache", key = "#request.toString()")
    public boolean hasPermission(PermissionRequest request) {
        // complex permission-checking logic
        return checkUserPermission(request);
    }
}

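Problem analysis: Using the whole request object's toString() as the key means that any change in any request field creates a new cache entry, even when the field has no effect on the permission decision. Worse, if PermissionRequest doesn't override toString(), the default Object.toString() includes the object's hash code, which by default is identity-based. In that case practically every request gets its own key. The hit rate collapses, and the cache fills up with entries that will never be read again.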

Solution: design the key carefully so that it includes only the attributes that affect the result

@Service
public class AuthorizationService {
    @Cacheable(value = "permissionCache",
              key = "#request.userId + '_' + #request.resourceType + '_' + #request.actionType")
    public boolean hasPermission(PermissionRequest request) {
        // complex permission-checking logic
        return checkUserPermission(request);
    }
}

Refinement: use a custom KeyGenerator instead of hand-written string concatenation

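@Configuration
@EnableCaching
public class CacheConfig extends CachingConfigurerSupport {
    // @Bean registers the generator under the bean name "keyGenerator", so it can also be
    // referenced explicitly via @Cacheable(keyGenerator = "keyGenerator").
    // (CachingConfigurerSupport is deprecated since Spring Framework 6.0; there, implement CachingConfigurer directly.)
    @Bean
    @Override
    public KeyGenerator keyGenerator() {
        return new CustomKeyGenerator();
    }
}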

public class CustomKeyGenerator implements KeyGenerator {
    @Override
    public Object generate(Object target, Method method, Object... params) {
        StringBuilder sb = new StringBuilder();
        // append the class name and method name
        sb.append(target.getClass().getSimpleName()).append(":")
          .append(method.getName());

        // append the parameters, with special handling for known types
        if (params.length > 0) {
            sb.append(":");
            for (Object param : params) {
                if (param == null) {
                    sb.append("null");
                } else if (param instanceof PermissionRequest) {
                    // PermissionRequest: use only the fields that affect the result
                    PermissionRequest req = (PermissionRequest) param;
                    sb.append(req.getUserId())
                      .append("_").append(req.getResourceType())
                      .append("_").append(req.getActionType());
                } else {
                    sb.append(param.toString());
                }
                sb.append("_");
            }
        }

        return sb.toString();
    }
}

// Usage
@Service
public class AuthorizationService {
    // Use the custom KeyGenerator (it is also the default, because CacheConfig returns it from keyGenerator())
    @Cacheable(value = "permissionCache", keyGenerator = "keyGenerator")
    public boolean hasPermission(PermissionRequest request) {
        // complex permission-checking logic
        return checkUserPermission(request);
    }
}
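The difference between the two key strategies is easy to measure. The sketch below uses a hypothetical PermissionRequest that also carries a per-call requestId and timestamp. These fields are assumptions, because the article doesn't show the class. The sketch compares keys built from toString() with a value-object key that holds only userId, resourceType and actionType. A value object with equals/hashCode is an alternative to string concatenation. Spring's default SimpleKeyGenerator applies the same idea, through SimpleKey, to methods with several parameters.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical request type: requestId and timestamp change on every call
// but have no influence on the permission decision.
class PermissionRequest {
    final long userId;
    final String resourceType;
    final String actionType;
    final String requestId;
    final long timestamp;

    PermissionRequest(long userId, String resourceType, String actionType, String requestId, long timestamp) {
        this.userId = userId;
        this.resourceType = resourceType;
        this.actionType = actionType;
        this.requestId = requestId;
        this.timestamp = timestamp;
    }

    @Override
    public String toString() {
        return "PermissionRequest{userId=" + userId + ", resourceType=" + resourceType
                + ", actionType=" + actionType + ", requestId=" + requestId + ", timestamp=" + timestamp + "}";
    }
}

// Value-object cache key: only the fields that determine the result, compared via equals/hashCode
final class PermissionKey {
    private final long userId;
    private final String resourceType;
    private final String actionType;

    private PermissionKey(long userId, String resourceType, String actionType) {
        this.userId = userId;
        this.resourceType = resourceType;
        this.actionType = actionType;
    }

    static PermissionKey of(PermissionRequest r) {
        return new PermissionKey(r.userId, r.resourceType, r.actionType);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PermissionKey)) return false;
        PermissionKey k = (PermissionKey) o;
        return userId == k.userId && resourceType.equals(k.resourceType) && actionType.equals(k.actionType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, resourceType, actionType);
    }
}

class KeyDemo {
    // n requests that all ask the same question; returns {distinct toString() keys, distinct PermissionKeys}
    static int[] distinctKeys(int n) {
        Set<String> byToString = new HashSet<>();
        Set<PermissionKey> byFields = new HashSet<>();
        for (int i = 0; i < n; i++) {
            PermissionRequest r = new PermissionRequest(42L, "ORDER", "READ", "req-" + i, 1_000L + i);
            byToString.add(r.toString());
            byFields.add(PermissionKey.of(r));
        }
        return new int[] {byToString.size(), byFields.size()};
    }
}
```

For 1,000 identical permission checks, the toString() strategy produces 1,000 cache entries, while the field-based key produces exactly one.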

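3. Cache granularity too coarse: wasted memory and frequent invalidation

Case: a social app caches user profiles, including status data that changes often: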

@Service
public class UserProfileService {
    @Cacheable(value = "userCache", key = "#userId")
    public UserProfileDTO getUserProfile(Long userId) {
        UserEntity user = userRepository.findById(userId).orElseThrow();
        // includes basic info, status, statistics, etc.
        return convertToDTO(user);
    }
}

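Problem analysis: Frequently changing data, such as online status, and relatively stable data, such as basic user info, are stored in the same cache entry. Every status change forces the whole entry to be invalidated and rebuilt, so the cache helps very little.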

┌────────────────────────────────────────────┐
│       Single coarse-grained cache entry    │
│                                            │
│  ┌────────────┬─────────────┬───────────┐  │
│  │ Basic info │ Online      │ Stats     │  │
│  │ (rarely    │ status      │ (changes  │  │
│  │  changes)  │ (changes    │  now and  │  │
│  │            │  often)     │  then)    │  │
│  └────────────┴─────────────┴───────────┘  │
│                                            │
│  Any change invalidates the whole entry    │
└────────────────────────────────────────────┘

Solution: split the cache by how often each kind of data changes

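@Service
public class UserProfileService {
    // Calling a @Cacheable method on "this" bypasses the Spring proxy, so caching would be skipped.
    // Injecting this bean's own proxy (lazily, to sidestep the circular reference) keeps caching active.
    @Autowired
    @Lazy
    private UserProfileService self;

    @Cacheable(value = "userBasicInfoCache", key = "#userId")
    public UserBasicInfoDTO getUserBasicInfo(Long userId) {
        // basic info (rarely changes); give this cache a long TTL
        return userRepository.findBasicInfoById(userId);
    }

    @Cacheable(value = "userStatusCache", key = "#userId", unless = "#result == null",
              condition = "#userId != null")
    public UserStatusDTO getUserStatus(Long userId) {
        // status info (changes often); @Cacheable has no TTL attribute, so configure a short
        // expiration for "userStatusCache" in the CacheManager
        return statusRepository.findLatestByUserId(userId);
    }

    public UserProfileDTO getUserProfile(Long userId) {
        // combine data from the separate caches, calling through the proxy rather than "this"
        UserBasicInfoDTO basicInfo = self.getUserBasicInfo(userId);
        UserStatusDTO status = self.getUserStatus(userId);
        UserStatsDTO stats = getUserStats(userId);
        return new UserProfileDTO(basicInfo, status, stats);
    }
}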
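A quick simulation shows why the split pays off. The sketch assumes a profile read every 10 seconds for 10 minutes, with basic info cached for 1 hour and status cached for 30 seconds. It counts how often each part is reloaded. TtlRegion and VolatilitySplitDemo are hypothetical names, and the clock is injected so that the run is deterministic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// A tiny cache "region" with its own TTL: it calls the loader on a miss or once the TTL has elapsed
class TtlRegion<K, V> {
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Long> writtenAt = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;
    int loads; // number of loader calls

    TtlRegion(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        Long t = writtenAt.get(key);
        if (t == null || clock.getAsLong() - t >= ttlMillis) {
            values.put(key, loader.apply(key));
            writtenAt.put(key, clock.getAsLong());
            loads++;
        }
        return values.get(key);
    }
}

class VolatilitySplitDemo {
    // Returns {basic-info loads, status loads} for 10 minutes of reads, one every 10 s
    static int[] simulate() {
        long[] now = {0};
        TtlRegion<Long, String> basic = new TtlRegion<>(3_600_000, () -> now[0]); // 1 hour
        TtlRegion<Long, String> status = new TtlRegion<>(30_000, () -> now[0]);   // 30 seconds
        for (long t = 0; t < 600_000; t += 10_000) {
            now[0] = t;
            basic.get(42L, id -> "basic-info-" + id);
            status.get(42L, id -> "online");
        }
        return new int[] {basic.loads, status.loads};
    }
}
```

With separate regions, basic info loads once and status reloads 20 times. With a single combined entry, the 30-second TTL that status needs would force the basic info to be reloaded 20 times as well.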

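II. Cache Disasters Under High Concurrency

1. Cache penetration: the nightmare of malicious requests

Case: a product lookup API. When requests ask for product IDs that don't exist, the cache can't help: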

@RestController
public class ProductController {
    @GetMapping("/products/{id}")
    public ProductDTO getProduct(@PathVariable Long id) {
        return productService.getProduct(id);
    }
}

@Service
public class ProductService {
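    @Cacheable(value = "productCache", key = "#productId", unless = "#result == null")
    public ProductDTO getProduct(Long productId) {
        // For a nonexistent ID this queries the database every time
        return productRepository.findById(productId)
                .map(this::convertToDTO)
                .orElse(null); // null is kept out of the cache by unless = "#result == null"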
    }
}

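Problem analysis: When large numbers of requests ask for IDs that don't exist (for example, during a malicious attack), none of them can be answered from the cache. Every one of them falls through to the database, and the database load rises sharply.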

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Malicious  │ -> │    Cache    │ -> │  Database   │
│ missing IDs │    │    miss!    │    │ many queries│
└─────────────┘    └─────────────┘    └─────────────┘
      │                  │                  │
      │                  │                  │
      v                  v                  v
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ More reqs,  │ -> │  Still a    │ -> │  DB load    │
│ missing IDs │    │    miss!    │    │ keeps rising│
└─────────────┘    └─────────────┘    └─────────────┘
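The cost of not caching misses can be counted directly. In this sketch the names are hypothetical, and a HashMap stands in for both the cache and the database. Optional.empty() plays the role of the EmptyProductDTO placeholder from the solution, and dbQueries counts how many lookups reach the "database". Unlike the real solution, entries here never expire. In practice, give the placeholder a short TTL.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical setup: only product 1 exists in the "database"
class PenetrationDemo {
    private static final Map<Long, String> DB = Collections.singletonMap(1L, "Laptop");

    private final Map<Long, Optional<String>> cache = new HashMap<>();
    private final boolean cacheMisses; // whether "not found" results are cached as a placeholder
    int dbQueries = 0;

    PenetrationDemo(boolean cacheMisses) {
        this.cacheMisses = cacheMisses;
    }

    String get(Long id) {
        Optional<String> cached = cache.get(id);
        if (cached != null) {
            return cached.orElse(null); // hit, possibly on a cached "absent" placeholder
        }
        dbQueries++;
        String value = DB.get(id);
        if (value != null || cacheMisses) {
            cache.put(id, Optional.ofNullable(value));
        }
        return value;
    }
}
```

After 1,000 requests for a nonexistent ID, the version without null caching has queried the database 1,000 times. The version with the placeholder has queried it only once.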

Solution: a Bloom filter plus null-value caching

@Service
public class ProductService {
    private BloomFilter<Long> productIdFilter;

    @PostConstruct
    public void initBloomFilter() {
        // Create the Bloom filter, then load all product IDs into it
        productIdFilter = BloomFilter.create(
            Funnels.longFunnel(),
            1000000,  // expected number of elements
            0.01      // target false-positive rate
        );

        // Load product IDs in batches so that we never hold every ID in memory at once
        long lastId = 0;
        boolean hasMore = true;
        int batchSize = 10000;
        int totalLoaded = 0;

        while (hasMore) {
            List<Long> idBatch = productRepository.findIdsBatchAfter(lastId, batchSize);

            if (idBatch.isEmpty()) {
                hasMore = false;
            } else {
                for (Long id : idBatch) {
                    productIdFilter.put(id);
                    lastId = Math.max(lastId, id);
                }

                totalLoaded += idBatch.size();
                log.info("Loaded {} product IDs into the Bloom filter", totalLoaded);
            }
        }
    }

    // Products created after startup must be added as well, otherwise the filter would reject them.
    // Hypothetical hook: call it wherever new products are persisted.
    public void onProductCreated(Long productId) {
        productIdFilter.put(productId);
    }

    public ProductDTO getProduct(Long productId) {
        // A Bloom filter has no false negatives: "not contained" means the ID was never added,
        // so the request can be rejected without touching the cache or the database.
        // False positives (about 1% here) pass through and are absorbed by the null-value cache below.
        if (!productIdFilter.mightContain(productId)) {
            log.debug("Bloom filter: product ID {} does not exist", productId);
            return null;
        }

        // Look up the cache
        String cacheKey = "product:" + productId;
        ProductDTO product = (ProductDTO) cacheService.get(cacheKey);

        if (product != null) {
            // Distinguish the null-value placeholder from real data
            if (product instanceof EmptyProductDTO) {
                return null; // return a real null to the caller
            }
            return product;
        }

        // Query the database
        product = productRepository.findById(productId)
                .map(this::convertToDTO)
                .orElse(null);

        // Null-value caching with Spring Cache
        /*
           With @Cacheable, whether null results are cached depends on the CacheManager:
           - CaffeineCacheManager caches null by default (allowNullValues = true);
             cacheManager.setAllowNullValues(false) turns this off.
           - unless = "#result == null" does the opposite of null caching: it PREVENTS null
             results from being stored, so leave it out if you want null caching.
           - @Cacheable itself has no TTL attribute, so a shorter TTL for null placeholders
             usually means a separate cache or a custom cache layer like the one below.
        */

        // Null-value handling with a custom cacheService
        if (product != null) {
            cacheService.put(cacheKey, product, 3600); // cache real data for 1 hour
        } else {
            // Store a placeholder object rather than null itself
            cacheService.put(cacheKey, new EmptyProductDTO(), 60); // cache the "not found" result for 1 minute
        }

        return product;
    }
}

// Placeholder object meaning "this product does not exist"
public class EmptyProductDTO extends ProductDTO {
    public EmptyProductDTO() {
        super(0L, "", "", 0.0);
    }
}
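The Bloom filter parameters above (1,000,000 expected IDs and a 1% false-positive rate) follow from the standard sizing formulas: m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. For these values, that gives about 9.6 million bits (roughly 1.2 MB) and 7 hash functions. The minimal filter below illustrates those formulas and is not Guava's implementation. It uses double hashing derived from one 64-bit mix. It also demonstrates the property the code above relies on: a key that was added is never reported as absent. LongBloomFilter is a hypothetical name.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter over long keys, sized with
//   m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions.
// Index i is (h1 + i * h2) mod m, with h1 and h2 taken from one mixed 64-bit hash.
class LongBloomFilter {
    private final BitSet bits;
    private final int m;
    private final int k;

    LongBloomFilter(long expectedInsertions, double fpp) {
        this.m = optimalBits(expectedInsertions, fpp);
        this.k = optimalHashes(expectedInsertions, m);
        this.bits = new BitSet(m);
    }

    static int optimalBits(long n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    static int optimalHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // SplitMix64-style mixing, so that sequential IDs spread over the whole bit array
    private static long mix(long key) {
        long z = key + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void put(long key) {
        long h = mix(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < k; i++) {
            bits.set(Math.floorMod(h1 + i * h2, m));
        }
    }

    boolean mightContain(long key) {
        long h = mix(key);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        for (int i = 0; i < k; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, m))) {
                return false; // a clear bit: the key was definitely never added
            }
        }
        return true; // all bits set: present, or a false positive
    }
}
```

Because put() sets exactly the bits that mightContain() checks, a false negative is impossible. The only error a Bloom filter can make is a false positive, which is why the code above can return early on "not contained".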

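2. Cache breakdown: a hot key expires

Case: a hot product's cache entry expires in the middle of a flash sale

Problem analysis: When a hot key's cache entry happens to expire, a burst of concurrent requests misses the cache at the same moment and goes straight to the database. Database load spikes instantly.

Solution: use a mutex (single instance) or a distributed lock (cluster) so that only one request rebuilds the cache entry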

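@Service
public class HotProductService {
    private final Map<String, ReentrantLock> keyLockMap = new ConcurrentHashMap<>();
    private static final int MAX_RETRY = 3; // maximum number of retries

    public ProductDTO getHotProduct(Long productId) {
        String cacheKey = "hotProduct:" + productId;
        ProductDTO product = null;
        int retryCount = 0;

        // Use a loop rather than recursion to avoid any risk of stack overflow
        while (product == null && retryCount < MAX_RETRY) {
            product = (ProductDTO) cacheService.get(cacheKey);

            if (product != null) {
                return product;
            }

            // Get (or create) the lock for this key
            ReentrantLock lock = keyLockMap.computeIfAbsent(cacheKey, k -> new ReentrantLock());

            // Try to acquire the lock so that only one thread rebuilds the entry
            boolean locked = lock.tryLock();
            try {
                if (locked) {
                    // Double-check: another thread may already have rebuilt the entry
                    product = (ProductDTO) cacheService.get(cacheKey);
                    if (product != null) {
                        return product;
                    }

                    // Load from the database and rebuild the cache
                    product = productRepository.findById(productId)
                            .map(this::convertToDTO)
                            .orElse(null);

                    if (product != null) {
                        cacheService.put(cacheKey, product, 3600); // cache for 1 hour
                    }

                    // Leave the loop even if product is null; retrying would not help
                    break;
                } else {
                    // Another thread is rebuilding the entry:
                    // wait briefly, then try the cache again
                    Thread.sleep(50);
                    retryCount++;

                    // Log when the retry limit is reached, to help with monitoring and troubleshooting
                    if (retryCount >= MAX_RETRY) {
                        log.warn("Retry limit reached while reading hot product {} from the cache", productId);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } finally {
                if (locked) {
                    lock.unlock();
                    // Remove the lock object, but only if it is still the one mapped to this key,
                    // so that a lock another thread has just installed is never removed
                    keyLockMap.remove(cacheKey, lock);
                }
            }
        }

        // May still be null when the retry limit is reached; callers must handle that
        // (e.g. by falling back to the database or returning a degraded response)
        return product;
    }
}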

// In a cluster, use a distributed lock implemented with Redisson
@Service
public class DistributedHotProductService {
    @Autowired
    private RedissonClient redissonClient;

    public ProductDTO getHotProduct(Long productId) {
        String cacheKey = "hotProduct:" + productId;
        ProductDTO product = (ProductDTO) cacheService.get(cacheKey);

        if (product != null) {
            return product;
        }

        // Create the distributed lock
        String lockKey = "lock:hotProduct:" + productId;
        RLock lock = redissonClient.getLock(lockKey);

        boolean locked = false;
        try {
            // Wait at most 100 ms for the lock; the lock is released automatically after 5 seconds
            locked = lock.tryLock(100, 5000, TimeUnit.MILLISECONDS);

            if (locked) {
                // Double-check: another instance may already have rebuilt the entry
                product = (ProductDTO) cacheService.get(cacheKey);
                if (product != null) {
                    return product;
                }

                // Load from the database and rebuild the cache
                product = productRepository.findById(productId)
                        .map(this::convertToDTO)
                        .orElse(null);

                if (product != null) {
                    cacheService.put(cacheKey, product, 3600);
                }
            } else {
                log.debug("Could not acquire the distributed lock for product {}; waiting for another instance to rebuild the cache", productId);

                // Wait briefly, then read the cache once more
                Thread.sleep(200);
                product = (ProductDTO) cacheService.get(cacheKey);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // With an explicit lease time, the lock may already have expired if the database call took
            // longer than 5 s; unlocking a lock we no longer hold would throw IllegalMonitorStateException
            if (locked && lock.isHeldByCurrentThread()) {
                lock.unlock();
            }
        }

        return product;
    }
}
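A variant of the mutex approach, swapped in here for comparison and not taken from the article, is "single flight". Instead of sleeping and retrying, concurrent callers that miss on the same key wait for the one load already in progress and share its result. The sketch uses only JDK classes (ConcurrentHashMap and CompletableFuture). SingleFlightCache and its demo method are hypothetical names. It works within one JVM only, so a cluster still needs a distributed lock like the Redisson one above.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Single-flight loader: the first caller that misses loads the value; concurrent callers for the
// same key wait on its future instead of hitting the database. The loader must not return null.
class SingleFlightCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SingleFlightCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join(); // another caller is loading: share its result
        }
        try {
            // Double-check: a load may have finished between the cache read and putIfAbsent
            V value = cache.get(key);
            if (value == null) {
                value = loader.apply(key);
                cache.put(key, value);
            }
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters see the failure as well
            throw e;
        } finally {
            inFlight.remove(key, mine);
        }
    }

    // Demo: `threads` callers miss on the same key simultaneously; returns how often the loader ran
    static int concurrentLoads(int threads) {
        AtomicInteger loads = new AtomicInteger();
        SingleFlightCache<Long, String> c = new SingleFlightCache<>(id -> {
            loads.incrementAndGet();
            try {
                Thread.sleep(200); // simulate a slow database query
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "product-" + id;
        });
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    return;
                }
                c.get(7L);
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return loads.get();
    }
}
```

Waiters never poll, so there's no retry limit to tune. A failed load is propagated to all waiters instead of leaving them to give up after a fixed number of sleeps.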

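3. Cache avalanche: a system-wide nightmare

Case: during a promotion on an e-commerce platform, a large number of cache entries expired at the same moment

Problem analysis: If many cache entries share the same expiration time (for example, because they were all loaded at the same moment), they all expire together. The resulting cache avalanche sends a wave of requests to the database at once, and system performance drops sharply.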

┌─────────────────────────────────────────────────────┐
│                      Timeline                       │
│                                                     │
│  ┌───────┐ ┌───────┐ ┌───────┐       ┌───────┐      │
│  │Cache 1│ │Cache 2│ │Cache 3│  ...  │Cache n│      │
│  └───────┘ └───────┘ └───────┘       └───────┘      │
│      │         │         │               │          │
│      ▼         ▼         ▼               ▼          │
│  ┌─────────────────────────────────────────────┐    │
│  │       All expire at the same moment         │    │
│  └─────────────────────────────────────────────┘    │
│                         │                           │
│                         ▼                           │
│  ┌─────────────────────────────────────────────┐    │
│  │   Flood of requests goes to the database    │    │
│  └─────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────┘

Solutions:

Randomized expiration times
A multi-level cache architecture
Circuit breaking and graceful degradation

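@Configuration
public class CacheConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager();

        // Configure each cache separately. Note: drawing a random value once while building a cache
        // (e.g. expireAfterWrite(randomExpireTime(...))) gives EVERY entry of that cache the same TTL.
        // The Expiry below draws a new random TTL for each entry instead.

        // Product cache: each entry expires after a random 3600-4000 s (about 1 hour)
        cacheManager.registerCustomCache("productCache", Caffeine.newBuilder()
                .maximumSize(10000)
                .expireAfter(new RandomTtlExpiry(3600, 4000))
                .build());

        // User cache: each entry expires after a random 1800-2200 s (about 30 minutes)
        cacheManager.registerCustomCache("userCache", Caffeine.newBuilder()
                .maximumSize(5000)
                .expireAfter(new RandomTtlExpiry(1800, 2200))
                .build());

        return cacheManager;
    }

    // Per-entry random TTL in [minSeconds, maxSeconds), so entries written together don't expire together
    static class RandomTtlExpiry implements Expiry<Object, Object> {
        private final long minSeconds;
        private final long maxSeconds;

        RandomTtlExpiry(long minSeconds, long maxSeconds) {
            this.minSeconds = minSeconds;
            this.maxSeconds = maxSeconds;
        }

        private long randomTtlNanos() {
            return TimeUnit.SECONDS.toNanos(ThreadLocalRandom.current().nextLong(minSeconds, maxSeconds));
        }

        @Override
        public long expireAfterCreate(Object key, Object value, long currentTime) {
            return randomTtlNanos();
        }

        @Override
        public long expireAfterUpdate(Object key, Object value, long currentTime, long currentDuration) {
            return randomTtlNanos(); // a write restarts the TTL, like expireAfterWrite
        }

        @Override
        public long expireAfterRead(Object key, Object value, long currentTime, long currentDuration) {
            return currentDuration; // reads leave the remaining TTL unchanged
        }
    }
}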
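Why per-entry jitter matters can be shown with a few lines of arithmetic. The sketch uses a hypothetical JitterDemo class and a seeded Random for reproducibility. It computes expiry times for 10,000 entries written in the same second, first with a fixed 3600 s TTL and then with 3600 s plus a random 0-400 s. For each case it reports the largest number of entries that expire in any single second.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class JitterDemo {
    // Expiry times (in seconds) for n entries written at writeTime.
    // jitter == 0 means a fixed TTL; otherwise each entry gets baseTtl + random[0, jitter).
    static long[] expiries(int n, long writeTime, long baseTtl, int jitter, long seed) {
        Random random = new Random(seed);
        long[] result = new long[n];
        for (int i = 0; i < n; i++) {
            long ttl = baseTtl + (jitter > 0 ? random.nextInt(jitter) : 0);
            result[i] = writeTime + ttl;
        }
        return result;
    }

    // Largest number of entries that expire in the same second
    static int peakPerSecond(long[] expiries) {
        Map<Long, Integer> counts = new HashMap<>();
        int peak = 0;
        for (long e : expiries) {
            peak = Math.max(peak, counts.merge(e, 1, Integer::sum));
        }
        return peak;
    }
}
```

With a fixed TTL, all 10,000 entries expire in the same second. With 400 seconds of jitter, the same entries spread out to roughly 25 per second, so the database sees a trickle instead of a spike.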

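// Multi-level cache architecture
@Service
public class MultiLevelCacheService {
    // Level 1: small in-JVM Guava cache (fastest, but very limited capacity)
    private final com.google.common.cache.Cache<String, Object> localCache;

    // Level 2: in-JVM Caffeine caches managed by Spring (fast, larger capacity)
    private final CaffeineCacheManager caffeineCacheManager;

    // Level 3: distributed Redis cache (slower than local memory, but shared across instances)
    private final RedisTemplate<String, Object> redisTemplate;

    public MultiLevelCacheService(RedisTemplate<String, Object> redisTemplate) {
        // Local cache: small capacity and a very short TTL, meant for extremely hot data
        this.localCache = CacheBuilder.newBuilder()
                .maximumSize(1000) // at most 1,000 objects
                .expireAfterWrite(30, TimeUnit.SECONDS) // expire after 30 seconds
                .recordStats() // enable statistics
                .build();

        // Application-level cache: moderate capacity, short TTL
        this.caffeineCacheManager = new CaffeineCacheManager();
        caffeineCacheManager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(50000) // up to 50,000 objects
                // 5-10 minutes; this value is drawn once for the whole manager, not per entry
                // (use a Caffeine Expiry for per-entry jitter)
                .expireAfterWrite(ThreadLocalRandom.current().nextLong(300, 600), TimeUnit.SECONDS)
                .recordStats());

        // Redis: the RedisTemplate bean is injected through the constructor
        this.redisTemplate = redisTemplate;
    }

    public Object get(String key, String cacheRegion) {
        String fullKey = cacheRegion + ":" + key;

        // 1. Check the local cache
        Object value = localCache.getIfPresent(fullKey);
        if (value != null) {
            log.debug("Local cache hit: {}", fullKey);
            return value;
        }

        // 2. Check the Caffeine cache (through Spring's Cache abstraction)
        org.springframework.cache.Cache caffeineCache = caffeineCacheManager.getCache(cacheRegion);
        if (caffeineCache != null) {
            org.springframework.cache.Cache.ValueWrapper wrapper = caffeineCache.get(key);
            if (wrapper != null) {
                value = wrapper.get();
                if (value != null) {
                    log.debug("Caffeine cache hit: {}", fullKey);
                    // backfill the local cache
                    localCache.put(fullKey, value);
                    return value;
                }
            }
        }

        // 3. Check Redis
        try {
            value = redisTemplate.opsForValue().get(fullKey);
            if (value != null) {
                log.debug("Redis cache hit: {}", fullKey);
                // backfill the local and Caffeine caches
                localCache.put(fullKey, value);
                if (caffeineCache != null) {
                    caffeineCache.put(key, value);
                }
                return value;
            }
        } catch (Exception e) {
            log.warn("Redis cache access failed: {}", e.getMessage());
            // a Redis failure must not break the lookup; treat it as a miss
        }

        return null; // missed at every level
    }

    // Updates should write (or invalidate) all three levels, using randomized TTLs
    // ...
}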
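Stripped of the specific libraries, the lookup above is a read-through walk from the fastest level to the slowest, with faster levels backfilled on a hit. In this sketch, plain HashMaps stand in for Guava, Caffeine and Redis so that the code runs without any infrastructure. The map hitsByLevel records which level answered each request. TieredLookup is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical three-level read-through lookup with backfill; plain maps replace the real caches
class TieredLookup {
    final Map<String, Object> l1 = new HashMap<>();     // stand-in for the Guava local cache
    final Map<String, Object> l2 = new HashMap<>();     // stand-in for the Caffeine cache
    final Map<String, Object> remote = new HashMap<>(); // stand-in for Redis
    final Map<String, Integer> hitsByLevel = new HashMap<>();

    Object get(String key) {
        Object v = l1.get(key);
        if (v != null) {
            return hit("L1", v);
        }
        v = l2.get(key);
        if (v != null) {
            l1.put(key, v); // backfill the faster level
            return hit("L2", v);
        }
        v = remote.get(key);
        if (v != null) {
            l1.put(key, v); // backfill both local levels
            l2.put(key, v);
            return hit("REMOTE", v);
        }
        hitsByLevel.merge("MISS", 1, Integer::sum);
        return null;
    }

    private Object hit(String level, Object v) {
        hitsByLevel.merge(level, 1, Integer::sum);
        return v;
    }
}
```

One design consequence worth keeping in mind is that every backfilled copy is one more place that must be invalidated on update. This is why the in-JVM levels above use short TTLs.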

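// Circuit breaking and degradation, with the configuration explained
@Service
public class ResilientCacheService {
    // Resilience4j circuit breaker
    private final CircuitBreaker circuitBreaker;

    public ResilientCacheService() {
        // Circuit breaker configuration
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50) // open the circuit once the failure rate reaches 50%
                .waitDurationInOpenState(Duration.ofSeconds(30)) // stay open for 30 s, then move to half-open
                .permittedNumberOfCallsInHalfOpenState(10) // allow 10 trial calls in the half-open state
                .minimumNumberOfCalls(20) // evaluate the failure rate only after at least 20 calls
                .slidingWindowSize(100) // compute the failure rate over the last 100 calls
                .slidingWindowType(SlidingWindowType.COUNT_BASED) // count-based sliding window
                // Only these exceptions count as failures; all others are recorded as successes.
                // Spring Data Redis translates its errors into DataAccessException subclasses,
                // so add those here if the cache is Redis.
                .recordExceptions(TimeoutException.class, IOException.class)
                .build();

        circuitBreaker = CircuitBreaker.of("cacheCircuitBreaker", config);

        // Listen for state transitions, for monitoring
        circuitBreaker.getEventPublisher()
            .onStateTransition(event -> {
                CircuitBreaker.State newState = event.getStateTransition().getToState();
                log.warn("Cache circuit breaker state changed: {} -> {}",
                         event.getStateTransition().getFromState(),
                         newState);

                if (newState == CircuitBreaker.State.OPEN) {
                    // send an alert
                    notificationService.sendAlert("Cache circuit breaker is OPEN, please check the cache service!");
                }
            });
    }

    public ProductDTO getProduct(Long productId) {
        // Wrap the cache call with the circuit breaker
        Supplier<ProductDTO> cacheSupplier = CircuitBreaker.decorateSupplier(
                circuitBreaker, () -> getCachedProduct(productId));

        try {
            return cacheSupplier.get();
        } catch (Exception e) {
            // Also reached while the circuit is open: calls then fail fast with CallNotPermittedException
            log.warn("Cache service error, falling back: {}", e.getMessage());
            // Fallback, e.g. return basic data or a default value
            return getFallbackProduct(productId);
        }
    }

    private ProductDTO getCachedProduct(Long productId) {
        // normal cache lookup logic
        // ...
    }

    private ProductDTO getFallbackProduct(Long productId) {
        // Fallback: fetch basic data from a read-only database or a local snapshot
        try {
            return emergencyDataService.getProductBasicInfo(productId);
        } catch (Exception e) {
            log.error("Fallback also failed, returning a default object: {}", e.getMessage());
            // Last resort: return a default object
            return new ProductDTO(productId, "Temporary product info", "System busy, showing default information", 0.0);
        }
    }
}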
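To see what the Resilience4j settings above actually do, here is a toy count-based circuit breaker with the same settings. These are the failure-rate threshold, sliding window size, minimum number of calls, open-state wait and number of half-open trial calls. It is a simplified, single-threaded illustration of the state machine and not Resilience4j's implementation. MiniCircuitBreaker is a hypothetical name, and the clock is injected for testing.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical toy circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED (or back to OPEN). Not thread-safe.
class MiniCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final double failureRateThreshold; // percent
    private final int windowSize;
    private final int minimumCalls;
    private final long openMillis;
    private final int halfOpenCalls;
    private final LongSupplier clock;

    private State state = State.CLOSED;
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private long openedAt;
    private int halfOpenSuccesses;

    MiniCircuitBreaker(double failureRateThreshold, int windowSize, int minimumCalls,
                       long openMillis, int halfOpenCalls, LongSupplier clock) {
        this.failureRateThreshold = failureRateThreshold;
        this.windowSize = windowSize;
        this.minimumCalls = minimumCalls;
        this.openMillis = openMillis;
        this.halfOpenCalls = halfOpenCalls;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get(); // fail fast without calling the protected service
            }
            state = State.HALF_OPEN; // wait is over: let trial calls through
            halfOpenSuccesses = 0;
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException e) {
            record(true);
            return fallback.get();
        }
    }

    private void record(boolean failure) {
        if (state == State.HALF_OPEN) {
            if (failure) {
                open(); // any failed trial reopens the circuit
            } else if (++halfOpenSuccesses == halfOpenCalls) {
                state = State.CLOSED; // all trial calls succeeded
                window.clear();
            }
            return;
        }
        window.addLast(failure);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        if (window.size() >= minimumCalls) {
            long failures = window.stream().filter(f -> f).count();
            if (failures * 100.0 / window.size() >= failureRateThreshold) {
                open();
            }
        }
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

With the article's numbers (50%, a window of 100, a minimum of 20 calls, 30 s, 10 trial calls), 19 consecutive failures are not enough to evaluate the rate. The 20th failure opens the circuit. While the circuit is open, calls go straight to the fallback. After 30 s, 10 successful trial calls close it again.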

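III. Consistency Between Cache and Database

1. Poor cache update strategy

Case: after user information is updated, the cache isn't refreshed, so the data becomes inconsistent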

@Service
public class UserService {
    @Cacheable(value = "userCache", key = "#userId")
    public UserDTO getUser(Long userId) {
        return userRepository.findById(userId)
                .map(this::convertToDTO)
                .orElse(null);
    }

    // Updates the user, but forgets about the cache
    public void updateUser(Long userId, UserUpdateRequest request) {
        UserEntity user = userRepository.findById(userId).orElseThrow();
        user.setName(request.getName());
        user.setEmail(request.getEmail());
        userRepository.save(user);
        // The cached data is never updated!
    }
}

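Problem analysis: After the database is updated, the cache isn't, so the two diverge. Users keep seeing the old data until the entry happens to expire, which causes confusion and business errors.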

Solution: the Cache-Aside pattern

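@Service
public class UserService {
    @Cacheable(value = "userCache", key = "#userId", unless = "#result == null")
    public UserDTO getUser(Long userId) {
        return userRepository.findById(userId)
                .map(this::convertToDTO)
                .orElse(null);
    }

    // Correct order: update the database first, then delete the cache entry
    @Transactional
    public void updateUser(Long userId, UserUpdateRequest request) {
        UserEntity user = userRepository.findById(userId).orElseThrow();
        user.setName(request.getName());
        user.setEmail(request.getEmail());
        userRepository.save(user);

        // Delete the cache entry rather than updating it, and only after the transaction commits.
        // Evicting inside the transaction would let a concurrent read reload the old row (the update
        // isn't visible yet) and put it straight back into the cache.
        // (The anonymous TransactionSynchronization relies on its default methods, Spring 5.3+.)
        TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
            @Override
            public void afterCommit() {
                cacheManager.getCache("userCache").evict(userId);
            }
        });
    }
}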
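Why delete rather than update? Consider two concurrent updates to the same user. A writes the database first and B second, but A's thread is delayed before it touches the cache. The sketch below replays that interleaving with HashMaps in a hypothetical setup. Updating the cache leaves A's older value there indefinitely. Deleting leaves the cache empty, so the next read loads B's latest value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical replay of one unlucky interleaving of two writers (A, then B) on the same user
class UpdateVsDeleteDemo {
    static String readAfterRace(boolean deleteInsteadOfUpdate) {
        Map<Long, String> db = new HashMap<>();
        Map<Long, String> cache = new HashMap<>();
        long userId = 1L;

        db.put(userId, "alice@a.com"); // writer A updates the database
        db.put(userId, "alice@b.com"); // writer B updates the database (latest value)
        if (deleteInsteadOfUpdate) {
            cache.remove(userId);      // B deletes the entry
            cache.remove(userId);      // A's delayed delete: harmless
        } else {
            cache.put(userId, "alice@b.com"); // B writes its value to the cache
            cache.put(userId, "alice@a.com"); // A's delayed write overwrites it with an older value
        }
        // Cache-aside read: use the cache if present, otherwise load from the database and fill the cache
        return cache.computeIfAbsent(userId, db::get);
    }
}
```

Deleting narrows the window but doesn't close it completely. A reader that loaded the old row just before the commit can still write it back afterwards. This is one reason to keep a TTL on these entries as a safety net.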

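2. Cache consistency in distributed environments

Problem analysis: In a distributed system, several service instances may operate on the cache and the database at the same time, and each instance may hold its own local cache. This leads to complex consistency problems.

Solutions:

Use transactional messaging (a transactional outbox table) to make sure cache invalidations are delivered
Adopt an eventual-consistency strategy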

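@Service
public class DistributedUserService {
    @Autowired
    private KafkaTemplate<String, CacheInvalidationEvent> kafkaTemplate;

    @Autowired
    private TransactionTemplate transactionTemplate;

    @Autowired
    private CacheInvalidationMessageRepository messageRepository;

    public void updateUser(Long userId, UserUpdateRequest request) {
        // Programmatic transaction: the business update and the outbox record commit (or roll back) together
        CacheInvalidationMessage message = transactionTemplate.execute(status -> {
            try {
                // 1. Update the database
                UserEntity user = userRepository.findById(userId).orElseThrow();
                user.setName(request.getName());
                user.setEmail(request.getEmail());
                userRepository.save(user);

                // 2. Save the cache invalidation message to the message table (same transaction)
                CacheInvalidationMessage msg = new CacheInvalidationMessage();
                msg.setCacheName("userCache");
                msg.setCacheKey(userId.toString());
                msg.setStatus("PENDING");
                msg.setCreatedTime(new Date());
                return messageRepository.save(msg);
            } catch (RuntimeException e) {
                // roll the transaction back
                status.setRollbackOnly();
                log.error("User update failed: {}", e.getMessage());
                throw e;
            }
        });

        // 3. After a successful commit, try to publish this message right away.
        // If that fails, the scheduled job below retries every message still PENDING.
        trySend(message);
    }

    // Scheduled job: every minute, retry PENDING messages that the immediate send didn't deliver
    @Scheduled(fixedRate = 60000)
    public void sendPendingCacheInvalidationMessages() {
        List<CacheInvalidationMessage> pendingMessages =
            messageRepository.findByStatusAndCreatedTimeBefore("PENDING",
                new Date(System.currentTimeMillis() - 300000)); // messages older than 5 minutes

        for (CacheInvalidationMessage message : pendingMessages) {
            trySend(message);
        }
    }

    // Publishing twice is harmless: evicting a cache entry is idempotent
    private void trySend(CacheInvalidationMessage message) {
        try {
            // publish to Kafka
            CacheInvalidationEvent event = new CacheInvalidationEvent(
                message.getCacheName(), message.getCacheKey());

            kafkaTemplate.send("cache-invalidation", event).get(); // block until the send result is known

            // mark the message as sent
            message.setStatus("SENT");
            message.setSentTime(new Date());
            messageRepository.save(message);

            log.info("Cache invalidation message sent: {}:{}", message.getCacheName(), message.getCacheKey());
        } catch (Exception e) {
            log.error("Failed to send cache invalidation message, will retry later: {}", e.getMessage());
            // On failure the status stays PENDING, so the next run retries it

            // After too many retries, mark the message as FAILED and raise an alert
            if (message.getRetryCount() > 10) {
                message.setStatus("FAILED");
                messageRepository.save(message);

                // send an alert
                notificationService.sendAlert("Cache sync message exceeded the retry limit, please investigate!");
            } else {
                // increase the retry count
                message.setRetryCount(message.getRetryCount() + 1);
                messageRepository.save(message);
            }
        }
    }

    // Every service instance listens for invalidation messages.
    // Each instance needs its own consumer group so that EVERY instance receives every message;
    // within a shared group, Kafka delivers each message to only one instance.
    // "${random.uuid}" (Spring Boot's random value property source) gives each instance a unique group.
    @KafkaListener(topics = "cache-invalidation", groupId = "cache-invalidation-${random.uuid}")
    public void handleCacheInvalidation(CacheInvalidationEvent event) {
        try {
            if ("userCache".equals(event.getCacheName())) {
                log.info("Received cache invalidation message, evicting: {}:{}",
                        event.getCacheName(), event.getKey());
                // userCache is keyed by the Long userId, but the message carries the key as a String;
                // convert it, otherwise evict() silently misses the entry
                cacheManager.getCache(event.getCacheName()).evict(Long.valueOf(event.getKey()));
            }
        } catch (Exception e) {
            log.error("Failed to handle cache invalidation message: {}", e.getMessage());
            // consider a retry or another compensating action
        }
    }
}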
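The retry rules of the outbox relay are easier to check in isolation. In this in-memory sketch, OutboxRelay is a hypothetical class, a list stands in for the message table and a Predicate stands in for the Kafka send. The sketch follows the same rules as the scheduled job above. On each failed pass, a PENDING message stays PENDING and its retry count grows. On success it becomes SENT. Once its retry count exceeds the limit, it becomes FAILED.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory outbox with a relay pass; `sender` returns true when publishing succeeded
class OutboxRelay {
    static final class Message {
        final String cacheName;
        final String cacheKey;
        String status = "PENDING";
        int retryCount = 0;

        Message(String cacheName, String cacheKey) {
            this.cacheName = cacheName;
            this.cacheKey = cacheKey;
        }
    }

    private final List<Message> outbox = new ArrayList<>();
    private final Predicate<Message> sender;
    private final int maxRetries;

    OutboxRelay(Predicate<Message> sender, int maxRetries) {
        this.sender = sender;
        this.maxRetries = maxRetries;
    }

    // In the real service this insert happens inside the business transaction
    Message enqueue(String cacheName, String cacheKey) {
        Message m = new Message(cacheName, cacheKey);
        outbox.add(m);
        return m;
    }

    // One pass of the scheduled job
    void relayPending() {
        for (Message m : outbox) {
            if (!"PENDING".equals(m.status)) {
                continue;
            }
            boolean ok;
            try {
                ok = sender.test(m);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (ok) {
                m.status = "SENT";
            } else if (m.retryCount > maxRetries) {
                m.status = "FAILED"; // give up and alert
            } else {
                m.retryCount++;
            }
        }
    }
}
```

Because the message row is written in the same transaction as the business update, a crash between commit and publish loses nothing. The next relay pass simply picks the message up again.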

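IV. Cache Monitoring and Optimization

1. Monitoring cache efficiency

Case: a team noticed poor system performance but couldn't tell whether caching had anything to do with it

Solution: collect cache statistics and publish them as metrics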

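@Configuration
@EnableScheduling
public class CacheMonitoringConfig {
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager();
        cacheManager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(10000)
                .expireAfterWrite(1, TimeUnit.HOURS)
                .recordStats()); // enable statistics (without it, all stats stay at zero)
        return cacheManager;
    }

    // CacheMetricsCollector is a @Component, so component scanning registers it;
    // declaring it again as a @Bean here would create a second instance.
}

@Component
public class CacheMetricsCollector {
    private final CacheManager cacheManager;
    private final MeterRegistry meterRegistry;
    // names of caches whose gauges are already registered
    private final Set<String> registeredCaches = ConcurrentHashMap.newKeySet();

    public CacheMetricsCollector(CacheManager cacheManager, MeterRegistry meterRegistry) {
        this.cacheManager = cacheManager;
        this.meterRegistry = meterRegistry;
    }

    @Scheduled(fixedRate = 60000) // once a minute
    public void collectMetrics() {
        for (String cacheName : cacheManager.getCacheNames()) {
            Cache cache = cacheManager.getCache(cacheName);
            if (cache instanceof CaffeineCache) {
                CaffeineCache caffeineCache = (CaffeineCache) cache;
                com.github.benmanes.caffeine.cache.Cache<Object, Object> nativeCache = caffeineCache.getNativeCache();

                // Register gauges once per cache; a gauge evaluates its function each time it is read.
                // meterRegistry.gauge(name, number) with a fresh number every minute does not work:
                // the registry keeps only a weak reference to the first number and ignores later
                // registrations under the same name. The "cache" tag keeps the caches apart.
                if (registeredCaches.add(cacheName)) {
                    Gauge.builder("cache.size", nativeCache, c -> c.estimatedSize())
                            .tag("cache", cacheName).register(meterRegistry);
                    Gauge.builder("cache.hit.ratio", nativeCache, c -> c.stats().hitRate())
                            .tag("cache", cacheName).register(meterRegistry);
                    Gauge.builder("cache.miss.ratio", nativeCache, c -> c.stats().missRate())
                            .tag("cache", cacheName).register(meterRegistry);
                    Gauge.builder("cache.eviction.count", nativeCache, c -> c.stats().evictionCount())
                            .tag("cache", cacheName).register(meterRegistry);
                }

                CacheStats stats = nativeCache.stats();
                log.info("Cache stats for {}: hit ratio={}, miss ratio={}, size={}",
                        cacheName, stats.hitRate(), stats.missRate(), nativeCache.estimatedSize());
            }
        }
    }
}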
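The hit and miss ratios exported above are ratios over request counts. This sketch, a hypothetical CountingCache, computes them the same way. When there have been no requests yet, it reports a hit rate of 1.0, which is also the convention Caffeine's CacheStats follows.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache wrapper that counts hits and misses: hitRate = hits / (hits + misses)
class CountingCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private long hits;
    private long misses;

    V get(K key, Function<K, V> loader) {
        V v = map.get(key);
        if (v != null) {
            hits++;
            return v;
        }
        misses++;
        v = loader.apply(key);
        if (v != null) {
            map.put(key, v);
        }
        return v;
    }

    double hitRate() {
        long requests = hits + misses;
        return requests == 0 ? 1.0 : (double) hits / requests;
    }
}
```

A low hit rate combined with a size close to maximumSize usually means the cache is too small or the keys are too fine-grained. A low hit rate with a small size points to entries expiring too early.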

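2. Cache warm-up strategy

Case: after a restart the cache is empty, so for a short while a flood of requests goes straight to the database

Solution: warm up the cache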

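@Configuration
public class AsyncConfig {
    @Bean(name = "cacheWarmerExecutor")
    public Executor cacheWarmerExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(20);
        executor.setQueueCapacity(500);
        executor.setThreadNamePrefix("CacheWarmer-");
        // With more IDs than the pool can hold (20 threads + 500 queued), the default AbortPolicy would
        // throw RejectedExecutionException; CallerRunsPolicy throttles the submitting thread instead
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor;
    }
}

@Component
public class CacheWarmer {
    @Autowired
    private ProductService productService;

    @Autowired
    private UserService userService;

    // Submit tasks to the pool directly. Calling @Async methods on "this" would bypass the
    // Spring proxy, so they would actually run synchronously on the startup thread.
    @Autowired
    @Qualifier("cacheWarmerExecutor")
    private Executor cacheWarmerExecutor;

    private final AtomicInteger warmedUpItems = new AtomicInteger(0);

    // Runs once the application has started
    @EventListener(ApplicationReadyEvent.class)
    public void warmUpCaches() {
        log.info("Starting cache warm-up...");

        // Warm up hot products
        List<Long> hotProductIds = productService.getHotProductIds();
        log.info("Planning to warm up {} hot products...", hotProductIds.size());

        // Use a dedicated pool rather than parallelStream(), which would occupy the common ForkJoinPool
        for (Long productId : hotProductIds) {
            cacheWarmerExecutor.execute(() -> warmupProduct(productId));
        }

        // Warm up active users
        List<Long> activeUserIds = userService.getActiveUserIds();
        log.info("Planning to warm up {} active users...", activeUserIds.size());

        for (Long userId : activeUserIds) {
            cacheWarmerExecutor.execute(() -> warmupUser(userId));
        }
    }

    private void warmupProduct(Long productId) {
        try {
            // goes through ProductService's proxy, so @Cacheable populates the cache
            productService.getProduct(productId);
            int count = warmedUpItems.incrementAndGet();

            if (count % 100 == 0) {
                log.info("Warmed up {} cache entries", count);
            }
        } catch (Exception e) {
            log.warn("Failed to warm up the cache for product {}: {}", productId, e.getMessage());
        }
    }

    private void warmupUser(Long userId) {
        try {
            userService.getUser(userId);
            int count = warmedUpItems.incrementAndGet();

            if (count % 100 == 0) {
                log.info("Warmed up {} cache entries", count);
            }
        } catch (Exception e) {
            log.warn("Failed to warm up the cache for user {}: {}", userId, e.getMessage());
        }
    }

    // Exposes warm-up progress, e.g. for an endpoint that shows startup status
    public int getWarmedUpItemCount() {
        return warmedUpItems.get();
    }
}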
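One pitfall of warming up through a bounded pool is the default AbortPolicy. Submitting more tasks than the pool's threads plus its queue can hold throws RejectedExecutionException, so a long list of hot IDs would abort the warm-up. The plain-JDK sketch below uses CallerRunsPolicy instead. When the queue is full, the submitting thread runs the task itself. The sketch uses a hypothetical WarmUpDemo class, a deliberately tiny queue, and a map put that stands in for productService.getProduct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class WarmUpDemo {
    static List<Long> range(long from, long to) {
        List<Long> ids = new ArrayList<>();
        for (long id = from; id <= to; id++) {
            ids.add(id);
        }
        return ids;
    }

    // Warms every ID into `cache` on a small bounded pool; returns the number of cached entries
    static int warmUp(List<Long> ids, Map<Long, String> cache) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(10),              // deliberately tiny queue
                new ThreadPoolExecutor.CallerRunsPolicy()); // full queue: the caller runs the task
        for (Long id : ids) {
            pool.execute(() -> cache.put(id, "product-" + id)); // stand-in for productService.getProduct(id)
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cache.size();
    }
}
```

All 2,000 IDs get warmed even though the pool holds at most 14 tasks at a time. Some of them simply run on the submitting thread, which also naturally slows the submission loop down.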

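V. Practical Strategies and Lessons Learned

Set sensible cache capacity and expiration policies
- Give every cache a maximum size
- Choose expiration times according to how often the data changes
- Use different expiration policies for different kinds of data
Design cache keys carefully
- Include only the attributes that affect the result
- Avoid using whole objects or very long strings as keys
- Consider key prefixes to separate different kinds of cached data
- Use a custom KeyGenerator for better maintainability
Choose an appropriate cache granularity
- Split caches by how often the data changes
- Cache hot data separately
- Avoid caching very large objects
Design caches defensively
- Use a Bloom filter against cache penetration, and load IDs in batches to limit memory pressure
- Use a mutex (single instance) or a Redisson distributed lock (cluster) against cache breakdown
- Randomize expiration times against cache avalanches
- Add circuit breaking and degradation for extreme situations
Use the right cache update strategy
- Prefer the Cache-Aside pattern
- Update the database first, then delete the cache entry (after the transaction commits)
- In distributed environments, use transactional messaging to keep caches in sync
Monitor caches thoroughly
- Collect key metrics such as hit rate and size
- Set alert thresholds
- Review cache efficiency regularly
Warm up caches
- Warm up hot data at startup
- Warm up relevant data before big promotions
- Warm up asynchronously on a dedicated thread pool so that startup isn't slowed down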