Hang with log file sync

This post records yesterday's failure on a core database in detail, including the ORA-01555 errors and the exhaustion of the UNDO tablespace. By analyzing the wait events and the system log, the root cause was traced to a loss of physical paths caused by a SAN misconfiguration, a mistaken change that ultimately brought the database down. It also discusses why the UNDO tablespace sizing did not match what actually happened, and the trouble introduced by a business-side schedule change. The post stresses the importance of careful operations and that the first priority in such incidents is restoring service.

Yesterday one of our core databases ran into trouble: several hundred sessions were blocked and the instance eventually went down. Since this is the customer-service database and a very sensitive one, the office got noisy in a hurry.
Our principle for this kind of incident is to restore service first and leave everything else for later. The follow-up analysis took a few detours; here is the whole story.
First, ORA-01555 errors appeared in the alert log. They started around 07:00 and kept appearing until about 14:00:

Tue May 22 07:11:56 2012
ORA-01555 caused by SQL statement below (SQL ID: gd31z0zdh33nj, Query Duration=101 sec, SCN: 0x0b3c.37ef1c98)

Tue May 22 08:45:34 2012
Errors in file /u01/oracle/app/oracle/admin/crmkf/bdump/crmkf1_smon_8004.trc:
ORA-30036: unable to extend segment by 4 in undo tablespace 'UNDOTBS1'

ORA-30036 showed up a little after 08:00, and ORA-01555 kept coming afterwards. Undo really was close to exhausted at that point, and that is where everyone went down the wrong path:
the hang was attributed to undo-related causes and the investigation was built on that assumption, but it led nowhere.
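One quick check that would have confirmed how far gone the undo space was is to break the undo tablespace down by extent status. A minimal sketch (UNDOTBS1 is the tablespace named in the ORA-30036 message above):

-- ACTIVE extents belong to open transactions, UNEXPIRED ones are held to
-- honour undo_retention, EXPIRED ones are reusable; once ACTIVE+UNEXPIRED
-- approaches the tablespace size, ORA-30036 and ORA-01555 are to be expected
select status,
       count(*)                    extents,
       round(sum(bytes)/1024/1024) mb
  from dba_undo_extents
 where tablespace_name = 'UNDOTBS1'
 group by status
 order by status;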
Look at the following entries; these are from the few minutes right before the incident:

ORA-01555 caused by SQL statement below (SQL ID: gd31z0zdh33nj, Query Duration=101 sec, SCN: 0x0b3c.37ef1c98):
ORA-01555 caused by SQL statement below (SQL ID: 18hwx0w69f9pg, Query Duration=107 sec, SCN: 0x0b3c.381970a9):
ORA-01555 caused by SQL statement below (SQL ID: 18hwx0w69f9pg, Query Duration=152 sec, SCN: 0x0b3c.380ff215):
ORA-01555 caused by SQL statement below (SQL ID: 18hwx0w69f9pg, Query Duration=73 sec, SCN: 0x0b3c.3822c0ef):
ORA-01555 caused by SQL statement below (SQL ID: dx2gpf5jk1jnu, Query Duration=113 sec, SCN: 0x0b3c.38e2030f):
ORA-01555 caused by SQL statement below (SQL ID: f1v5vww09yzzd, Query Duration=170 sec, SCN: 0x0b3c.3a53f9b0):
ORA-01555 caused by SQL statement below (SQL ID: 948xj46pvh0j4, Query Duration=28 sec, SCN: 0x0b3c.3a84c77d):
ORA-01555 caused by SQL statement below (SQL ID: gd31z0zdh33nj, Query Duration=186 sec, SCN: 0x0b3c.3a7d6206):
ORA-01555 caused by SQL statement below (SQL ID: gd31z0zdh33nj, Query Duration=226 sec, SCN: 0x0b3c.3a76b4ab):
ORA-01555 caused by SQL statement below (SQL ID: dx2gpf5jk1jnu, Query Duration=29 sec, SCN: 0x0b3c.3b871228):
ORA-01555 caused by SQL statement below (SQL ID: 02pnhfvmwua8r, Query Duration=37 sec, SCN: 0x0b3c.3be2e941):
ORA-01555 caused by SQL statement below (SQL ID: 05m5p2na5f8yd, Query Duration=6 sec, SCN: 0x0b3c.3c17f5fa):
ORA-01555 caused by SQL statement below (SQL ID: 1xj2gpb5x1a38, Query Duration=6 sec, SCN: 0x0b3c.3c24d864):
ORA-01555 caused by SQL statement below (SQL ID: 4qcd9jnyt4j27, Query Duration=66 sec, SCN: 0x0b3c.3c4c2d04):
ORA-01555 caused by SQL statement below (SQL ID: 2xp5nc6rmd42t, Query Duration=15 sec, SCN: 0x0b3c.3c63797d):
ORA-01555 caused by SQL statement below (SQL ID: 8r4xzv0amrkyb, Query Duration=64 sec, SCN: 0x0b3c.3c5c7c15):
ORA-01555 caused by SQL statement below (SQL ID: 8v307x8qzc78y, Query Duration=12 sec, SCN: 0x0b3c.3d084c98):

Even very short transactions failed, which is hard to explain: queries with Duration=6 seconds were also reporting ORA-01555.
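When exactly those errors clustered can also be read from v$undostat, which keeps one row per 10-minute interval; a sketch, assuming 10g or later (ssolderrcnt counts ORA-01555 hits, nospaceerrcnt counts out-of-space errors such as ORA-30036):

select to_char(begin_time,'HH24:MI') begin_time,
       to_char(end_time,'HH24:MI')   end_time,
       undoblks,
       txncount,
       maxquerylen,
       ssolderrcnt,
       nospaceerrcnt
  from v$undostat
 order by begin_time;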
Now look at the database wait events at the time:

col samp_time for a25
col event for a30
select session_id sid,
       to_char(SAMPLE_TIME,'YY-MM-DD:HH:MI:SS') samp_time,  -- HH is the 12-hour field, so 02:52 in the output below is really 14:52
       session_state,
       event,
       p1,
       p2,
       p3,
       wait_time,
       time_waited
from DBA_HIST_ACTIVE_SESS_HISTORY
  where to_char(SAMPLE_TIME,'YY-MM-DD:HH24:MI:SS') between
 '12-05-22:14:45:00' and '12-05-22:14:55:00' --and event like'cursor%' and p1=2057858921
  order by sample_id desc;

SID SAMP_TIME                        SESSION_STATE  EVENT                  P1         P2         P3        TIME_WAITED
---------- ------------------------- ------------- ---------------------- ---------- ---------- ---------- -----------
      5608 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986328
      5613 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986320
      5615 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986324
      5616 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986314
      5619 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986327
      5621 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986321
      5622 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986311
      5623 12-05-22:02:52:30         WAITING       enq: FB - contention   1178730502          5   42451933      498021
      5625 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986323
      5627 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986321
      5631 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986322
      5633 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986323
      5634 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986327
      5637 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986308
      5638 12-05-22:02:52:30         WAITING       enq: FB - contention   1178730502          5   42451933      498022
      5640 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986328
      5641 12-05-22:02:52:30         WAITING       log file sync          4294967295          0          0      986328
      5642 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986329
      5656 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986315
      5658 12-05-22:02:52:30         WAITING       log file sync                2734          0          0      986278

Only part of the output is reproduced here. Now the aggregated wait statistics:

SQL>  select event,count(*) from
  2   dba_hist_active_sess_history
  3   where to_char(SAMPLE_TIME,'YY-MM-DD:HH24:MI:SS') between
  4   '12-05-22:14:45:00' and '12-05-22:14:55:00' 
  5   group by event
  6   order by 2 desc;

EVENT                              COUNT(*)
---------------------------------  --------
log file sync                         12370
enq: FB - contention                   1018
buffer busy waits                       497
db file sequential read                 290
db file parallel write                  231
direct path read                        165
control file sequential read             92
enq: HW - contention                     79
log file parallel write                  27
control file parallel write              25
read by other session                    13
gcs log flush sync                        2
wait for scn ack                          1

At that point the database was essentially stuck on log file sync and enq: FB - contention, with the log file sync waits running into seconds, which pretty much pins the problem on I/O.
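Before going to the OS log, the same suspicion can be cross-checked from the LGWR side: if the storage is at fault, log file parallel write (LGWR's own write to the online redo logs) should be slow as well, not just the foreground log file sync. A sketch against AWR, under the same Diagnostics Pack assumption as the ASH queries above; the counters are cumulative, so only the deltas between snapshots matter (a negative delta just means the instance restarted):

select sn.end_interval_time,
       e.total_waits
         - lag(e.total_waits) over (order by e.snap_id)        waits,
       round((e.time_waited_micro
         - lag(e.time_waited_micro) over (order by e.snap_id)) / 1000
         / nullif(e.total_waits
             - lag(e.total_waits) over (order by e.snap_id), 0), 1) avg_ms
  from dba_hist_system_event e,
       dba_hist_snapshot     sn
 where e.snap_id         = sn.snap_id
   and e.dbid            = sn.dbid
   and e.instance_number = sn.instance_number
   and e.instance_number = 1
   and e.event_name      = 'log file parallel write'
 order by sn.end_interval_time;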
So we went to the system log and found something substantive: redo writes were being blocked, which is what pushed the database over the edge. Without further ado, here is the log:

May 22 14:51:14 zjddkf02 vmunix: LVM: WARNING: VG 64 0x090000: LV 3: Some I/O requests to this LV are waiting
May 22 14:51:14 zjddkf02 vmunix: LVM: WARNING: VG 64 0x140000: LV 10: Some I/O requests to this LV are waiting
May 22 14:51:18 zjddkf02 vmunix: LVM: WARNING: VG 64 0x090000: LV 1: Some I/O requests to this LV are waiting
May 22 14:51:19 zjddkf02 vmunix: LVM: WARNING: VG 64 0x140000: LV 12: Some I/O requests to this LV are waiting
May 22 14:51:26 zjddkf02 vmunix: LVM: WARNING: VG 64 0x110000: LV 18: Some I/O requests to this LV are waiting
May 22 14:51:26 zjddkf02 vmunix: LVM: VG 64 0x170000: PVLink 1 0x000078 Failed! The PV is not accessible.
May 22 14:51:29 zjddkf02 vmunix: LVM: VG 64 0x120000: PVLink 1 0x000035 Failed! The PV is not accessible.
May 22 14:51:26 zjddkf02 vmunix:        indefinitely for an unavailable PV. These requests will be queued until
May 22 14:51:38 zjddkf02  above message repeats 25 times
May 22 14:51:38 zjddkf02 vmunix: LVM: WARNING: VG 64 0x190000: LV 2: Some I/O requests to this LV are waiting
May 22 14:51:26 zjddkf02 vmunix:        the PV becomes available (or a timeout is specified for the LV).
May 22 14:51:38 zjddkf02  above message repeats 25 times
May 22 14:51:38 zjddkf02 vmunix:        indefinitely for an unavailable PV. These requests will be queued until
May 22 14:51:38 zjddkf02 vmunix:        the PV becomes available (or a timeout is specified for the LV).
May 22 14:51:39 zjddkf02 vmunix: LVM: VG 64 0x090000: PVLink 1 0x000054 Failed! The PV is not accessible.
May 22 14:51:41 zjddkf02 vmunix: LVM: WARNING: VG 64 0x190000: LV 14: Some I/O requests to this LV are waiting
May 22 14:51:47 zjddkf02 vmunix: LVM: VG 64 0x130000: PVLink 1 0x000037 Failed! The PV is not accessible.
May 22 14:51:47 zjddkf02 vmunix: LVM: VG 64 0x130000: PVLink 1 0x000038 Failed! The PV is not accessible.
May 22 14:51:41 zjddkf02 vmunix:        indefinitely for an unavailable PV. These requests will be queued until
May 22 14:51:47 zjddkf02 vmunix: LVM: VG 64 0x130000: PVLink 1 0x00003c Failed! The PV is not accessible.
May 22 14:51:41 zjddkf02 vmunix:        the PV becomes available (or a timeout is specified for the LV).
May 22 14:51:47 zjddkf02 vmunix: LVM: VG 64 0x130000: PVLink 1 0x000044 Failed! The PV is not accessible.
May 22 14:51:49 zjddkf02 cmdisklockd[4459]: Still trying to inquire cluster lock disk /dev/disk/disk274
May 22 14:51:59 zjddkf02 vmunix: LVM: VG 64 0x180000: PVLink 1 0x000083 Failed! The PV is not accessible.
May 22 14:51:59 zjddkf02 vmunix: LVM: VG 64 0x180000: PVLink 1 0x000085 Failed! The PV is not accessible.
...
May 22 14:55:37 zjddkf02  above message repeats 4 times
May 22 14:55:37 zjddkf02 vmunix: LVM: WARNING: VG 64 0x110000: LV 19: Some I/O requests to this LV are waiting
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x090000: PVLink 1 0x000054 Recovered.
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x090000: LV 1: All I/O requests to this LV that were
May 22 14:55:37 zjddkf02 vmunix:        indefinitely for an unavailable PV. These requests will be queued until
May 22 14:55:38 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.
May 22 14:55:37 zjddkf02 vmunix:        the PV becomes available (or a timeout is specified for the LV).
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x090000: LV 3: All I/O requests to this LV that were
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x090000: LV 5: All I/O requests to this LV that were
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x090000: LV 4: All I/O requests to this LV that were
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000045 Recovered.
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000046 Recovered.
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000047 Recovered.
May 22 14:55:38 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.
May 22 14:55:38 zjddkf02  above message repeats 3 times
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000049 Recovered.
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000050 Recovered.
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000053 Recovered.
May 22 14:55:38 zjddkf02 vmunix: LVM: VG 64 0x140000: PVLink 1 0x000051 Failed! The PV is not accessible.
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x140000: LV 5: All I/O requests to this LV that were
May 22 14:55:38 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x140000: LV 16: All I/O requests to this LV that were
May 22 14:55:38 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x140000: LV 2: All I/O requests to this LV that were
May 22 14:55:38 zjddkf02 vmunix: LVM: WARNING: VG 64 0x140000: LV 12: Some I/O requests to this LV are waiting
May 22 14:55:38 zjddkf02 vmunix:        indefinitely for an unavailable PV. These requests will be queued until
May 22 14:55:38 zjddkf02 vmunix:        the PV becomes available (or a timeout is specified for the LV).
May 22 14:55:38 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.
May 22 14:55:38 zjddkf02 vmunix: LVM: WARNING: VG 64 0x190000: LV 14: Some I/O requests to this LV are waiting
May 22 14:55:38 zjddkf02 vmunix: LVM: WARNING: VG 64 0x130000: LV 16: Some I/O requests to this LV are waiting
May 22 14:55:38 zjddkf02 vmunix:        indefinitely for an unavailable PV. These requests will be queued until
May 22 14:55:38 zjddkf02  above message repeats 2 times
May 22 14:55:38 zjddkf02 vmunix:        the PV becomes available (or a timeout is specified for the LV).
May 22 14:55:38 zjddkf02  above message repeats 2 times
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x150000: PVLink 1 0x000056 Recovered.
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x150000: PVLink 1 0x000057 Failed! The PV is not accessible.
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x150000: PVLink 1 0x000058 Failed! The PV is not accessible.
May 22 14:55:39 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x150000: LV 29: All I/O requests to this LV that were
May 22 14:55:39 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x120000: PVLink 1 0x000027 Recovered.
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x120000: PVLink 1 0x000028 Recovered.
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x120000: PVLink 1 0x000029 Recovered.
May 22 14:55:39 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x120000: LV 27: All I/O requests to this LV that were
May 22 14:55:39 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.
May 22 14:55:39 zjddkf02 vmunix: LVM: NOTICE: VG 64 0x130000: LV 16: All I/O requests to this LV that were
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x180000: PVLink 1 0x000083 Recovered.
May 22 14:55:39 zjddkf02 vmunix: LVM: VG 64 0x180000: PVLink 1 0x000084 Recovered.
May 22 14:55:39 zjddkf02 vmunix:        waiting indefinitely for an unavailable PV have now completed.

Only part of the messages is shown. The trouble starts at 14:51:14: PVs become inaccessible, i.e. the physical paths were lost, and by 14:55:39 everything has recovered. Painfully, it turned out someone had touched the SAN configuration files.
Human error again...

Now for the UNDO tablespace. Our undo is actually more than sufficient; here is the current configuration:

SQL> show parameter undo

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
_gc_undo_affinity                    boolean                          FALSE
undo_management                      string                           AUTO
undo_retention                       integer                          36000
undo_tablespace                      string                           UNDOTBS1

SQL> SELECT d.undo_size/(1024*1024) "ACTUAL UNDO SIZE [MByte]",
  2  SUBSTR(e.value,1,25) "UNDO RETENTION [Sec]",
  3  (TO_NUMBER(e.value) * TO_NUMBER(f.value) *
  4  g.undo_block_per_sec) / (1024*1024)
  5  "NEEDED UNDO SIZE [MByte]"
  6  FROM (
  7  SELECT SUM(a.bytes) undo_size
  8  FROM v$datafile a,
  9  v$tablespace b,
 10  dba_tablespaces c
 11  WHERE c.contents = 'UNDO'
 12  AND c.status = 'ONLINE'
 13  AND b.name = c.tablespace_name
 14  AND a.ts# = b.ts#
 15  ) d,
 16  v$parameter e,
 17  v$parameter f,
 18  (
 19  SELECT MAX(undoblks/((end_time-begin_time)*3600*24))
 20  undo_block_per_sec
 21  FROM v$undostat
 22  ) g
 23  WHERE e.name = 'undo_retention'
 24  AND f.name = 'db_block_size'
 25  /

ACTUAL UNDO SIZE [MByte] UNDO RETENTION [Sec]                NEEDED UNDO SIZE [MByte]
------------------------ ----------------------------------- ------------------------
                   96000 36000                                                35242.5

The tablespace size needed to satisfy undo_retention works out to 35242.5 MB, while our UNDO tablespace is 96000 MB, so why did ORA-01555 start appearing in the morning?
After talking to the application team, it turned out that the batch jobs and reports that used to run at night had recently been moved to the daytime. It is mildly infuriating that a change of that size can be rescheduled so casually:
daytime is peak business hours with a high transaction rate, and a single report runs for 1-2 hours, so ORA-01555 was pretty much guaranteed.
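The sizing formula above only looks at the undo generation rate; what it does not capture is query length, and ORA-01555 is really about whether the longest-running query can still find the undo it needs. v$undostat shows both sides; a sketch (maxqueryid is available from 10gR2 on, and tuned_undoretention is the retention Oracle actually managed to keep):

-- intervals in which the longest query ran up against the retention that was
-- actually maintained, or in which ORA-01555 was raised; maxqueryid points
-- at the candidate report SQL
select to_char(begin_time,'DD HH24:MI') begin_time,
       maxquerylen,
       maxqueryid,
       tuned_undoretention,
       ssolderrcnt
  from v$undostat
 where ssolderrcnt > 0
    or maxquerylen > tuned_undoretention
 order by begin_time;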

That is the report. IT systems carry risk, so operate with care. Unfortunately the colleague who made the mistaken change walked straight into the minefield. Be careful, careful, and careful again.
