Analyzing nginx access logs with Python

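This article shows how to parse large access log files with Python and extract key figures such as requests per second, requests that exceeded the time threshold, and requests handled per hour, in order to generate a report. The report is written with the xlwt module, and more than ten GB of log data can be processed within a few minutes.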

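A recent project required using the access log to determine figures such as the number of requests per second (the concurrency level), the number of requests that took longer than 60 ms to process, and the number of requests handled per hour. So I spent a little time analyzing the access log with Python to produce a report. The code below gets straight to the point: it is simple and fast, and for access logs of more than ten GB per day it produces the report within a few minutes.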
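As a compact illustration of the three figures mentioned above, the sketch below counts requests per second, requests per hour, and requests slower than 60 ms in a single pass. It is separate from the script that follows and is not the author's code: it parses each line with a regular expression instead of the positional slicing the script uses, and the names `LINE_RE`, `SLOW_MS`, and `summarize` are made up for this example. It assumes the log format documented in the script (`$remote_addr - $remote_user [$time_local][$request_time] "$request" $status ...`).

```python
import re
from collections import Counter

# Assumed log format (taken from the comment in the script):
# $remote_addr - $remote_user [$time_local][$request_time] "$request" $status ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<time>[^\]\s]+) [^\]]+\]'   # e.g. [09/Mar/2017:00:00:16 +0800]
    r'\[(?P<rt>[^\]]*)\] '             # $request_time in seconds, e.g. [0.002]
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) '
)

SLOW_MS = 60  # threshold used in the article


def summarize(lines):
    """Return (per_second, per_hour, slow_per_hour) Counters for the given lines."""
    per_second = Counter()
    per_hour = Counter()
    slow_per_hour = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        second = m.group("time")   # "09/Mar/2017:00:00:16"
        hour = second[:-6]         # "09/Mar/2017:00" (drops ":MM:SS")
        per_second[second] += 1
        per_hour[hour] += 1
        try:
            elapsed_ms = float(m.group("rt")) * 1000
        except ValueError:
            continue  # request time missing or not numeric
        if elapsed_ms > SLOW_MS:
            slow_per_hour[hour] += 1
    return per_second, per_hour, slow_per_hour
```

For a large file, passing the open file object directly (`with open(path) as f: summarize(f)`) streams the log line by line, so memory use depends on the number of distinct seconds rather than on the file size.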

#coding=utf-8

import os
import xlwt
import time

FILE_NAME = "/alidata1/wwwlogs/rtb_2017/03/access-rtb_20170311.log"
#FILE_NAME = "access-rtb.log"
#Log format: '$remote_addr - $remote_user [$time_local][$request_time] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $http_x_forwarded_for'
#100.109.192.22 - - [09/Mar/2017:00:00:16 +0800][0.002] "POST /d_iqiyi HTTP/1.0" 204 0 "-" "-" 123.125.118.42

time_second_statistic = {}
time_min_statistic = {}
time_hour_statistic = {}
remote_ip_statistic = {}
proccess_time_statistic = {}
platform_statistic = {}

def from_this_dir(filename):
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)

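def time_out_check(proccess_time):
    # proccess_time is $request_time as a string, in seconds.
    # Returns 1 if the request took longer than 60 ms, otherwise 0.
    if proccess_time == "" or proccess_time == "0":
        return 0
    if 60 < (float(proccess_time) * 1000):
        return 1
    return 0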

file_seek_index = 0
del_index = 0
old_time_hour = ""
old_time_day = ""
file_name = ""
time_hour = ""
time_day = ""
line_index = 0
cell_index = 0
file_handle = None
wbk = None
second_sheet = None
min_sheet = None
hour_sheet = None
remoteip_sheet = None
proccess_sheet = None
platform_sheet = None
quit_flag = False
file_name_time_str = ""

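while not quit_flag:
    # (Re)open the log and resume from the last recorded offset.
    if file_handle is None:
        file_handle = open(FILE_NAME)
        file_handle.seek(file_seek_index)
    line = file_handle.readline()
    file_seek_index = file_handle.tell()
    # readline() returns an empty string at end of file.
    if line == '':
        quit_flag = True
        time_hour = ""
        time_day = ""

    if not quit_flag:
        line = line.strip('\n')
        strs = line.split(' ')
        # strs[3] is "[09/Mar/2017:00:00:16"; drop the leading "[".
        time_second = strs[3][1:]
        file_name_time_str = time_second
        time_min = time_second[0:len(time_second)-3]    # drop ":SS"
        time_hour = time_min[0:len(time_min)-3]         # drop ":MM"
        time_day = time_hour[0:len(time_hour) - 3]      # drop ":HH"
        # strs[4] is "+0800][0.002]"; characters 7..11 hold a 5-character $request_time.
        proccess_time = strs[4][7:12]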
        remote_ip = strs[
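The listing above stops before the counters are written out. As a hedged sketch only, not the author's code, the final step could look roughly like this with xlwt. The helper names `sheet_rows`, `split_for_xls`, and `write_report` are made up for this example. One constraint is worth knowing: the legacy .xls format that xlwt writes holds at most 65,536 rows per sheet, while a full day of per-second counts can reach 86,400 rows, so the sketch splits long tables across several sheets.

```python
XLS_MAX_ROWS = 65536  # row limit of the legacy .xls format written by xlwt


def sheet_rows(counts, header=("time", "count")):
    """Return a header row followed by (key, count) rows sorted by key.

    Keys such as "09/Mar/2017:00:00:16" sort chronologically only within
    a single day, which fits a one-file-per-day log.
    """
    return [tuple(header)] + sorted(counts.items())


def split_for_xls(name, rows, limit=XLS_MAX_ROWS):
    """Split rows into {sheet_name: chunk} pieces that each fit in one sheet.

    Only the first chunk carries the header row.
    """
    if len(rows) <= limit:
        return {name: rows}
    return {
        "%s_%d" % (name, i + 1): rows[start:start + limit]
        for i, start in enumerate(range(0, len(rows), limit))
    }


def write_report(path, sheets):
    """Write {sheet_name: rows} to an .xls workbook at path."""
    import xlwt  # third-party (pip install xlwt); imported here so the helpers work without it

    wbk = xlwt.Workbook()
    for name, rows in sheets.items():
        for sheet_name, chunk in split_for_xls(name, rows).items():
            sheet = wbk.add_sheet(sheet_name)
            for r, row in enumerate(chunk):
                for c, value in enumerate(row):
                    sheet.write(r, c, value)
    wbk.save(path)
```

Assuming `time_hour_statistic` ends up mapping hour strings to request counts, a call such as `write_report("report.xls", {"hour": sheet_rows(time_hour_statistic)})` would produce one sheet of hourly totals; the output file name here is a placeholder.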