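1. Chukwa's built-in data-process step runs every 5 minutes and generates the files below (basic data merging and deduplication; this is infrastructure-level work):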
repos/[clusterName]/[dataType]/[yyyyMMdd]/[HH]/[mm]/[dataType]_[yyyyMMdd]_[mm].[N].evt
2. HourlyChukwaRecordRolling runs once an hour and merges the files generated in the previous step into one large file, producing a SequenceFile<ChukwaRecordKey, ChukwaRecord>:
repos/[clusterName]/[dataType]/[yyyyMMdd]/[dataType]_HourlyDone_[yyyyMMdd].[N].evt
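3. Either modify the source code used in step 2 so that, when it finishes, it sends a request to the etl task server to insert the file path; or run a separate program that checks the HDFS path and then sends the request to the etl task server: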
repos/[clusterName]/[dataType]/[yyyyMMdd]/rotateDone/
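Note: run only HourlyChukwaRecordRolling. If DailyChukwaRecordRolling is also run, the output of HourlyChukwaRecordRolling will be deleted.
Recommendation: keep business code out of Chukwa's own data processing as far as possible. Treat Chukwa as infrastructure, and put business-specific logic in the downstream workflow.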
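The second option in step 3 (a separate checker program) could look roughly like the following. This is a minimal sketch under several assumptions: that the rotateDone directory listed under step 3 is what the checker looks for, that the class and method names (`RotateDoneChecker`, `checkOnce`) are hypothetical, and that the HDFS lookup and the request to the etl task server are passed in as functions, because the etl task server's interface is not described in these notes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Sketch of the "separate checker" option in step 3. All names are hypothetical. */
final class RotateDoneChecker {

    /** Builds repos/[clusterName]/[dataType]/[yyyyMMdd]/rotateDone/ from the layout in step 3. */
    static String rotateDonePath(String clusterName, String dataType, String yyyyMMdd) {
        return "repos/" + clusterName + "/" + dataType + "/" + yyyyMMdd + "/rotateDone/";
    }

    /**
     * Checks every data type once and notifies the etl task server for each
     * rotateDone directory that exists. Returns the paths that were reported.
     *
     * pathExists: in production this could wrap Hadoop's FileSystem.exists(new Path(p)).
     * notifyEtl:  in production this could send the request to the etl task server;
     *             its endpoint is not specified here, so it is left to the caller.
     */
    static List<String> checkOnce(String clusterName, List<String> dataTypes, String yyyyMMdd,
                                  Predicate<String> pathExists, Consumer<String> notifyEtl) {
        List<String> reported = new ArrayList<>();
        for (String dataType : dataTypes) {
            String path = rotateDonePath(clusterName, dataType, yyyyMMdd);
            if (pathExists.test(path)) {
                notifyEtl.accept(path);
                reported.add(path);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        // Stand-ins for HDFS and the etl task server, for demonstration only.
        // The cluster name, data types, and date are made-up sample values.
        Predicate<String> fakeHdfs = p -> p.contains("/SSPLOG_VIDEO_SERVERTAG/");
        Consumer<String> fakeEtl = p -> System.out.println("would notify etl task server: " + p);
        checkOnce("demo", Arrays.asList("SSPLOG_VIDEO_SERVERTAG", "SSPLOG_COMMON_SERVERTAG"),
                  "20130101", fakeHdfs, fakeEtl);
    }
}
```

Keeping the checker outside Chukwa follows the recommendation to leave Chukwa's own processing untouched; a scheduler would call `checkOnce` at whatever interval suits the workflow.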
Adaptor (initial_adaptors)
add DirTailingAdaptor dataType /opt/ssp/logs/ filetailer.CharFileTailingAdaptorUTF8 0
1. DirTailingAdaptor
Takes a directory path and an adaptor name as mandatory parameters; repeatedly scans that directory and all subdirectories, and starts the indicated adaptor running on each file. Since the DirTailingAdaptor does not, itself, emit data, the datatype parameter is applied to the newly-spawned adaptors. Note that if you try this on a large directory with an adaptor that keeps file handles open, it is possible to exceed your system's limit on open files. A file pattern can be specified as an optional second parameter.
2. dataType can be, for example: SSPLOG_VIDEO_SERVERTAG
SSPLOG_VIDEO stands for the type of log (video or common)
SERVERTAG stands for the server the log comes from
3. /opt/ssp/logs/ : the log producer's directory
4. filetailer.CharFileTailingAdaptorUTF8
Repeatedly tails a file, again ignoring content and with unspecified Chunk boundaries. Takes one mandatory parameter: a path to the file to tail. Keeps a file handle open in order to detect log file rotation. The same as the base FileTailingAdaptor, except that chunks are guaranteed to end only at carriage returns. This is useful for most ASCII log file formats.
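To make the pieces of the adaptor line concrete, here is a small sketch that assembles one from the parts described in items 1 to 4. The class `AdaptorLine`, its method names, and the server tag `web01` are hypothetical; the trailing number is the initial offset that Chukwa's `add` command expects as its last argument.

```java
/** Sketch: builds one initial_adaptors line from its parts. Names are hypothetical. */
final class AdaptorLine {

    /** Joins the log type and the server tag, e.g. SSPLOG_VIDEO + web01. */
    static String dataType(String logType, String serverTag) {
        return logType + "_" + serverTag;
    }

    /**
     * add DirTailingAdaptor <dataType> <directory> <adaptor started per file> <initial offset>
     */
    static String dirTailing(String dataType, String directory, String innerAdaptor, long offset) {
        return String.join(" ", "add", "DirTailingAdaptor", dataType, directory,
                           innerAdaptor, Long.toString(offset));
    }

    public static void main(String[] args) {
        // "web01" is a made-up server tag used only for illustration.
        String line = dirTailing(dataType("SSPLOG_VIDEO", "web01"), "/opt/ssp/logs/",
                                 "filetailer.CharFileTailingAdaptorUTF8", 0L);
        System.out.println(line);
    }
}
```

Generating the line per server keeps the dataType naming consistent across hosts; the same string could equally be written by hand.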
Refine the HourlyChukwaRecordRolling.java file. HourlyChukwaRecordRolling uses SequenceFile<ChukwaRecordKey, ChukwaRecord> as its default output format; we need to compress the output file, with LZO as the compression algorithm.
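One way to express that change is as a set of job properties. The sketch below only collects the settings in a plain map so that it runs without Hadoop on the classpath. It assumes the older `mapred.*` property names, BLOCK compression (a choice made here, not stated in these notes), and the `com.hadoop.compression.lzo.LzoCodec` class from the separately installed hadoop-lzo library. The class name `LzoOutputSettings` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: output-compression settings for the rolling job. Class name is hypothetical. */
final class LzoOutputSettings {

    /**
     * Returns the properties to apply to the job configuration.
     * Inside HourlyChukwaRecordRolling each entry could be applied with
     * conf.set(key, value) on the job's configuration object.
     */
    static Map<String, String> properties() {
        Map<String, String> p = new LinkedHashMap<>();
        // Turn on compression of the job output.
        p.put("mapred.output.compress", "true");
        // Assumption: BLOCK compression, which compresses batches of records together.
        p.put("mapred.output.compression.type", "BLOCK");
        // Assumption: hadoop-lzo is installed on the cluster and provides this codec.
        p.put("mapred.output.compression.codec", "com.hadoop.compression.lzo.LzoCodec");
        return p;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : properties().entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

Newer Hadoop releases use different property names, so the keys should be checked against the version actually deployed before editing the job.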