A new open source data set for anomaly detection

Yahoo Labs has just released an interesting new data set useful for research on detecting anomalies (or outliers) in time series data. There are many contexts in which anomaly detection is important. For Yahoo, the main use case is in detecting unusual traffic on Yahoo servers.

The data set comprises real traffic to Yahoo services, along with some synthetic data. There are 367 time series in the data set, each of which contains between 741 and 1680 observations recorded at regular intervals. Each series is accompanied by an indicator series with a 1 if the observation was an anomaly, and 0 otherwise. The anomalies in the real data were determined by human judgement, while those in the synthetic data were generated algorithmically. For the synthetic data, some information about the components used to construct the data is also provided.
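To make the structure concrete, here is a minimal R sketch of loading one series and plotting it with its labelled anomalies, once access has been granted. The file name real_1.csv and the column names timestamp, value and is_anomaly are assumptions about how the files are laid out, not something I have checked against the released data.

# Read one of the real-traffic series (file and column names assumed)
dat <- read.csv("real_1.csv")

# Plot the series, then mark the human-labelled anomalies in red
plot(dat$timestamp, dat$value, type = "l", xlab = "Time", ylab = "Traffic")
anom <- subset(dat, is_anomaly == 1)
points(anom$timestamp, anom$value, col = "red", pch = 19)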

Although the Yahoo announcement claims that the data are publicly available, in fact they are only available to people with a .edu email address. Further, you have to apply to use them, and it takes about 24 hours before approval is granted. I have suggested that they remove these restrictions and make the data available to anyone who wants to use them.

Research on anomaly detection in time series seems to be growing in popularity. Twitter has also released their own AnomalyDetection R package. Their approach has some similarities with my own tsoutliers function in the forecast package. The tso function in the tsoutliers package is another approach to the same problem; a sketch of both functions is given below.
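For anyone wanting to try these methods on the Yahoo series, the sketch below shows both calls on a single series. It assumes the data have been read into dat as above and form a regular hourly series; the frequency, and the default settings of both functions, are assumptions that would need adjusting in real use.

library(forecast)
library(tsoutliers)

# Build a ts object from one series (hourly frequency is an assumption)
y <- ts(dat$value, frequency = 24)

# forecast::tsoutliers returns the indexes of identified outliers
# together with suggested replacement values
forecast::tsoutliers(y)

# tsoutliers::tso fits an ARIMA model and tests for additive outliers,
# level shifts and temporary changes
tsoutliers::tso(y, types = c("AO", "LS", "TC"))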

Hopefully having a large public data set available will lead to improvements in time series outlier detection methods, at least for detecting outliers in internet traffic data.
