T001-UT001-0026: File I/O Exercise (Log Analysis)

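The attachment log.txt is the access log of a website; a short excerpt is shown here. An access log typically records information about each visit to a web page: the user name, the IP address at the time, the URL of the page visited, the access time, and the browser used. Projects at many companies produce large volumes of access logs that need to be analyzed. In the log.txt used for this exercise, each line is one access log entry, and every entry has the same format, as follows: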

 
    - 192.168.1.208 - - [13/Apr/2015:12:13:10 +0800] 302 - "http://pm.ambimmort.com/app/signin.do" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"
    hedingwei 192.168.1.208 - - [13/Apr/2015:12:13:11 +0800] 200 7168 "http://pm.ambimmort.com/app/signin.do" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"

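In the example above, each field forms one column when an entry is split on spaces. For the column numbers used here, the bracketed timestamp and each double-quoted string count as a single column even though they contain spaces. The first column of the first entry is "-", and the first column of the second entry is "hedingwei". Column 1 is the user name. Column 2 is the IP address. Column 5, [13/Apr/2015:12:13:11 +0800], is the access time. Column 8, "http://pm.ambimmort.com/app/signin.do", is the URL of the page the user visited. Column 9 describes the browser and operating system used to access that URL. The whole second entry therefore means: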

User "hedingwei", at "13/Apr/2015:12:13:11 +0800", from IP "192.168.1.208", used the Chrome browser to visit the page "http://pm.ambimmort.com/app/signin.do". His computer runs Apple's Mac OS operating system, version 10.10.2.
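The column layout described above can be sketched as a regular expression. This is only an illustration, not a reference solution: the pattern `LOG_LINE`, the helper `parse_line`, and the group names are hypothetical, and the pattern assumes single spaces between columns, a three-digit status code in column 6, and no double quotes inside the quoted columns.

```python
import re

# Assumed layout: user, IP, two placeholder columns, [timestamp],
# status, size, "URL", "user agent". Group names are illustrative.
LOG_LINE = re.compile(
    r'^(?P<user>\S+) (?P<ip>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]]+)\] (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<url>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    'hedingwei 192.168.1.208 - - [13/Apr/2015:12:13:11 +0800] 200 7168 '
    '"http://pm.ambimmort.com/app/signin.do" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"'
)
record = parse_line(sample)
print(record["user"], record["time"], record["url"])
```

Matching the whole line with one pattern avoids the pitfall of a plain split on spaces, which would break the timestamp and the user-agent string into several pieces.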

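This exercise asks you to write a program that reads two file paths from standard input: the path of the log file to analyze and the path of the file that will receive the analysis results. The log file must follow the log format described above. If it does not, write Error Input to the results file and exit. If it does, compute the following statistics: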

1. Count the total number of records.

2. Count the number of distinct users.

3. Count the number of distinct page URLs that were visited.
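The validation and the three counts can be sketched as a single pass over the lines. The names `ENTRY` and `analyze` are hypothetical. Two choices here are assumptions that the exercise does not settle: blank lines are skipped rather than rejected, and the placeholder user "-" is not counted as a user unless `anonymous=None` is passed.

```python
import re

# Assumed entry layout: user ip - - [time] status size "url" "agent"
ENTRY = re.compile(
    r'^(?P<user>\S+) \S+ \S+ \S+ \[[^\]]+\] \d{3} \S+ '
    r'"(?P<url>[^"]*)" "[^"]*"$'
)


def analyze(lines, anonymous="-"):
    """Return (records, distinct users, distinct URLs), or None on a bad entry."""
    users, pages = set(), set()
    records = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored, not errors
        entry = ENTRY.match(line)
        if entry is None:
            return None  # the caller should write "Error Input" and exit
        records += 1
        if entry.group("user") != anonymous:
            users.add(entry.group("user"))
        pages.add(entry.group("url"))
    return records, len(users), len(pages)


# Shortened entries (user agent abbreviated) in the same shape as log.txt.
demo = [
    '- 192.168.1.208 - - [13/Apr/2015:12:13:10 +0800] 302 - '
    '"http://pm.ambimmort.com/app/signin.do" "Mozilla/5.0"',
    'hedingwei 192.168.1.208 - - [13/Apr/2015:12:13:11 +0800] 200 7168 '
    '"http://pm.ambimmort.com/app/signin.do" "Mozilla/5.0"',
]
print(analyze(demo))                 # (2, 1, 1)
print(analyze(["not a log entry"]))  # None
```

Using sets for users and URLs keeps the distinct counts correct regardless of how often a value repeats, and returning None on the first malformed entry lets the caller decide how to report the error.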

Example 1:

Input:

 
    input log file:
    d:/test.txt
    input log analysis file:
    d:/test.ana.txt

Output:

 
    total records:227
    distinct users:1
    distinct pages:5
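The input and output steps of this example might look like the following sketch. The names `format_report` and `main` are hypothetical, and `main` expects the caller to supply an `analyze` function that returns a (records, users, pages) tuple or None for malformed input. Writing the three result lines to the analysis file, and printing no space after each colon, are assumptions read off the example rather than stated requirements.

```python
def format_report(stats):
    """Build the result text; stats is (records, users, pages) or None."""
    if stats is None:
        return "Error Input\n"
    records, users, pages = stats
    return (
        f"total records:{records}\n"
        f"distinct users:{users}\n"
        f"distinct pages:{pages}\n"
    )


def main(analyze):
    """Prompt for both paths, run the supplied analysis, write the report."""
    log_path = input("input log file:\n").strip()
    result_path = input("input log analysis file:\n").strip()
    with open(log_path, encoding="utf-8") as log_file:
        stats = analyze(log_file)  # a file object yields its lines one by one
    with open(result_path, "w", encoding="utf-8") as result_file:
        result_file.write(format_report(stats))


# Only the formatting is shown here; main() would wait for console input.
print(format_report((227, 1, 5)), end="")
```

Keeping the formatting in its own function makes the exact output easy to check without touching the file system.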

Sample file: log.txt

 
      
    - 192.168.1.208 - - [13/Apr/2015:12:13:10 +0800] 302 - "http://pm.ambimmort.com/app/signin.do" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"
    hedingwei 192.168.1.208 - - [13/Apr/2015:12:13:11 +0800] 200 7168 "http://pm.ambimmort.com/app/signin.do" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"
    - 192.168.1.208 - - [13/Apr/2015:12:13:11 +0800] 200 3463 "http://pm.ambimmort.com/app/task/viewtask.do?taskId=T001-UT001-0017&taskGroup=a" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"
    hedingwei 192.168.1.208 - - [13/Apr/2015:12:13:15 +0800] 200 72826 "http://pm.ambimmort.com/app/task/viewtask.do?taskId=T001-UT001-0017&taskGroup=a" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36"