Haishan Database (He3DB) Source Code Deep Dive: Haishan MySQL Crash Recovery (2) - Log Parsing

Crash Recovery: Log Parsing

[Figure: overall flow of MySQL redo log crash recovery]

The overall flow of MySQL redo log crash recovery is shown in the figure above. This article walks through the source code of the main functions involved in the log parsing phase.

1. recv_group_scan_log_recs() function flow

static
bool
recv_group_scan_log_recs(
	log_group_t*	group,
	lsn_t*		contiguous_lsn,
	bool		last_phase)         // scan the log records of one log group
{
	DBUG_ENTER("recv_group_scan_log_recs");
	assert(!last_phase || recv_sys->mlog_checkpoint_lsn > 0);

	mutex_enter(&recv_sys->mutex);
	recv_sys->len = 0;
	recv_sys->recovered_offset = 0;
	recv_sys->n_addrs = 0;
	recv_sys_empty_hash();
	srv_start_lsn = *contiguous_lsn;
	recv_sys->parse_start_lsn = *contiguous_lsn;
	recv_sys->scanned_lsn = *contiguous_lsn;
	recv_sys->recovered_lsn = *contiguous_lsn;
	recv_sys->scanned_checkpoint_no = 0;
	recv_previous_parsed_rec_type = MLOG_SINGLE_REC_FLAG;
	recv_previous_parsed_rec_offset	= 0;
	recv_previous_parsed_rec_is_multi = 0;
	ut_ad(recv_max_page_lsn == 0);
	ut_ad(last_phase || !recv_writer_thread_active);
	mutex_exit(&recv_sys->mutex);

	lsn_t	checkpoint_lsn	= *contiguous_lsn;
	lsn_t	start_lsn;
	lsn_t	end_lsn;
	store_t	store_to_hash	= last_phase ? STORE_IF_EXISTS : STORE_YES;
	ulint	available_mem	= UNIV_PAGE_SIZE
		* (buf_pool_get_n_pages()
		   - (recv_n_pool_free_frames * srv_buf_pool_instances));

	end_lsn = *contiguous_lsn = ut_uint64_align_down(
		*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);

	do {
		if (last_phase && store_to_hash == STORE_NO) {
			store_to_hash = STORE_IF_EXISTS;
			/* We must not allow change buffer
			merge here, because it would generate
			redo log records before we have
			finished the redo log scan. */
			recv_apply_hashed_log_recs(FALSE);
		}

		start_lsn = end_lsn;
		end_lsn += RECV_SCAN_SIZE;

		log_group_read_log_seg(
			log_sys->buf, group, start_lsn, end_lsn);
	} while (!recv_scan_log_recs(
			 available_mem, &store_to_hash, log_sys->buf,
			 RECV_SCAN_SIZE,
			 checkpoint_lsn,
			 start_lsn, contiguous_lsn, &group->scanned_lsn));

	if (recv_sys->found_corrupt_log || recv_sys->found_corrupt_fs) {
		DBUG_RETURN(false);
	}

	DBUG_PRINT("ib_log", ("%s " LSN_PF
			      " completed for log group " ULINTPF,
			      last_phase ? "rescan" : "scan",
			      group->scanned_lsn, group->id));

	DBUG_RETURN(store_to_hash == STORE_NO);
}

1. Parameter validation and initialization:

  • An assert checks that if last_phase is true, a valid checkpoint LSN (recv_sys->mlog_checkpoint_lsn) is already known.
  • Key members of the recovery system recv_sys are initialized (buffer length, recovered offset, number of page addresses, etc.) and the hash table is emptied. The starting LSNs, such as srv_start_lsn and recv_sys->parse_start_lsn, are set to the incoming contiguous_lsn value.

2. Memory and resource preparation:

  • Computes the available memory: the total number of buffer pool pages minus the free frames reserved for recovery, multiplied by the page size.

3. Log scan loop:

  • Align end_lsn and contiguous_lsn down to the log block size OS_FILE_LOG_BLOCK_SIZE (see the alignment sketch after this list):
	end_lsn = *contiguous_lsn = ut_uint64_align_down(
		*contiguous_lsn, OS_FILE_LOG_BLOCK_SIZE);
  • In a loop, repeatedly read a log segment (log_group_read_log_seg) and process its records (recv_scan_log_recs). In the last phase, if the store policy had previously been switched to STORE_NO (the hash table ran out of memory), switch it back to STORE_IF_EXISTS and apply the already-hashed records first:
	do {
		if (last_phase && store_to_hash == STORE_NO) {
            /* If the hash table ran out of memory earlier, store_to_hash
            was switched to STORE_NO; in the last phase we first apply the
            already-hashed records by calling recv_apply_hashed_log_recs()
            before continuing the scan. */
			store_to_hash = STORE_IF_EXISTS;
			recv_apply_hashed_log_recs(FALSE);
		}

		start_lsn = end_lsn;
		end_lsn += RECV_SCAN_SIZE;

		log_group_read_log_seg(
			log_sys->buf, group, start_lsn, end_lsn);  // read the redo log from start_lsn to end_lsn into log_sys->buf

	} while (!recv_scan_log_recs(
			 available_mem, &store_to_hash, log_sys->buf,
			 RECV_SCAN_SIZE,
			 checkpoint_lsn,
			 start_lsn, contiguous_lsn, &group->scanned_lsn));  // the while condition calls the log scanning/parsing entry point
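
The alignment step above rounds an LSN down to a 512-byte log block boundary. A minimal standalone sketch of the same bit trick, assuming the usual OS_FILE_LOG_BLOCK_SIZE of 512 bytes:

#include <cassert>
#include <cstdint>

static const uint64_t OS_FILE_LOG_BLOCK_SIZE = 512;	/* assumed block size */

/* Sketch of the ut_uint64_align_down() idea: clearing the low bits
moves an LSN back to the start of its containing log block. */
static uint64_t
align_down(uint64_t lsn, uint64_t align)
{
	assert((align & (align - 1)) == 0);	/* alignment must be a power of two */
	return lsn & ~(align - 1);
}

int main()
{
	assert(align_down(8204, OS_FILE_LOG_BLOCK_SIZE) == 8192);
	assert(align_down(8192, OS_FILE_LOG_BLOCK_SIZE) == 8192);
	return 0;
}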

4. Error handling and return:

  • If corrupt log records or filesystem corruption were found during the scan, return false.
  • Print a debug message noting that the scan (or rescan) completed.
  • Return store_to_hash == STORE_NO: a true result means the hash table ran out of memory during this scan, so the caller must rescan the log.

2. recv_scan_log_recs() function flow

static
bool
recv_scan_log_recs(   // scan the log records in a buffered log segment
/*===============*/
	ulint		available_memory,/*!< in: we let the hash table of recs
					to grow to this size, at the maximum */
	store_t*	store_to_hash,	/*!< in,out: whether the records should be
					stored to the hash table; this is reset
					if just debug checking is needed, or
					when the available_memory runs out */
	const byte*	buf,		/*!< in: buffer containing a log
					segment or garbage */
	ulint		len,		/*!< in: buffer length */
	lsn_t		checkpoint_lsn,	/*!< in: latest checkpoint LSN */
	lsn_t		start_lsn,	/*!< in: buffer start lsn */
	lsn_t*		contiguous_lsn,	/*!< in/out: it is known that all log
					groups contain contiguous log data up
					to this lsn */
	lsn_t*		group_scanned_lsn)/*!< out: scanning succeeded up to
					this lsn */
{
	const byte*	log_block	= buf;
	ulint		no;
	lsn_t		scanned_lsn	= start_lsn;
	bool		finished	= false;
	ulint		data_len;
	bool		more_data	= false;
	ulint		recv_parsing_buf_size = RECV_PARSING_BUF_SIZE;

	ut_ad(start_lsn % OS_FILE_LOG_BLOCK_SIZE == 0);
	ut_ad(len % OS_FILE_LOG_BLOCK_SIZE == 0);
	ut_ad(len >= OS_FILE_LOG_BLOCK_SIZE);

	do {
		ut_ad(!finished);
		no = log_block_get_hdr_no(log_block);
		ulint expected_no = log_block_convert_lsn_to_no(scanned_lsn);
		if (no != expected_no) {
			finished = true;
			break;
		}

		if (!log_block_checksum_is_ok(log_block)) {
			ib::error() << "Log block " << no <<
				" at lsn " << scanned_lsn << " has valid"
				" header, but checksum field contains "
				<< log_block_get_checksum(log_block)
				<< ", should be "
				<< log_block_calc_checksum(log_block);
			finished = true;
			break;
		}

		if (log_block_get_flush_bit(log_block)) {

			if (scanned_lsn > *contiguous_lsn) {
				*contiguous_lsn = scanned_lsn;
			}
		}

		data_len = log_block_get_data_len(log_block);

		if (scanned_lsn + data_len > recv_sys->scanned_lsn
		    && log_block_get_checkpoint_no(log_block)
		    < recv_sys->scanned_checkpoint_no
		    && (recv_sys->scanned_checkpoint_no
			- log_block_get_checkpoint_no(log_block)
			> 0x80000000UL)) {

			/* Garbage from a log buffer flush which was made
			before the most recent database recovery */
			finished = true;
			break;
		}

		if (!recv_sys->parse_start_lsn
		    && (log_block_get_first_rec_group(log_block) > 0)) {

			/* We found a point from which to start the parsing
			of log records */

			recv_sys->parse_start_lsn = scanned_lsn
				+ log_block_get_first_rec_group(log_block);
			recv_sys->scanned_lsn = recv_sys->parse_start_lsn;
			recv_sys->recovered_lsn = recv_sys->parse_start_lsn;
		}

		scanned_lsn += data_len;

		if (scanned_lsn > recv_sys->scanned_lsn) {

			DBUG_EXECUTE_IF(
				"reduce_recv_parsing_buf",
				recv_parsing_buf_size
					= (70 * 1024);
				);

			if (recv_sys->len + 4 * OS_FILE_LOG_BLOCK_SIZE
			    >= recv_parsing_buf_size) {
				ib::error() << "Log parsing buffer overflow."
					" Recovery may have failed!";

				recv_sys->found_corrupt_log = true;

			} else if (!recv_sys->found_corrupt_log) {
				more_data = recv_sys_add_to_parsing_buf(
					log_block, scanned_lsn);
			}

			recv_sys->scanned_lsn = scanned_lsn;
			recv_sys->scanned_checkpoint_no
				= log_block_get_checkpoint_no(log_block);
		}

		if (data_len < OS_FILE_LOG_BLOCK_SIZE) {
			/* Log data for this group ends here */
			finished = true;
			break;
		} else {
			log_block += OS_FILE_LOG_BLOCK_SIZE;
		}
	} while (log_block < buf + len);

	*group_scanned_lsn = scanned_lsn;

	if (recv_needed_recovery
	    || (recv_is_from_backup && !recv_is_making_a_backup)) {
		recv_scan_print_counter++;

		if (finished || (recv_scan_print_counter % 80 == 0)) {

			ib::info() << "Doing recovery: scanned up to"
				" log sequence number " << scanned_lsn;
		}
	}

	if (more_data && !recv_sys->found_corrupt_log) {
		/* Try to parse more log records */

		if (recv_parse_log_recs(checkpoint_lsn,
					*store_to_hash)) {
			ut_ad(recv_sys->found_corrupt_log
			      || recv_sys->found_corrupt_fs
			      || recv_sys->mlog_checkpoint_lsn
			      == recv_sys->recovered_lsn);
			return(true);
		}

		if (*store_to_hash != STORE_NO
		    && mem_heap_get_size(recv_sys->heap) > available_memory) {
			*store_to_hash = STORE_NO;
		}

		if (recv_sys->recovered_offset > recv_parsing_buf_size / 4) {
			/* Move parsing buffer data to the buffer start */

			recv_sys_justify_left_parsing_buf();
		}
	}

	return(finished);
}

1. Initialization: set up the log block pointer, the scanned LSN, the finished flag, and related variables.
2. Loop over the log blocks:

  • Validate the no field in the block header and the checksum field in the block trailer (a standalone sketch follows this list):
    no = log_block_get_hdr_no(log_block);
    ulint expected_no = log_block_convert_lsn_to_no(scanned_lsn);
    if (no != expected_no) {
        /* check the no field in the log block header */
        finished = true;
        break;
    }
    if (!log_block_checksum_is_ok(log_block)) {
        ib::error() << "Log block " << no <<
            " at lsn " << scanned_lsn << " has valid"
            " header, but checksum field contains "
            << log_block_get_checksum(log_block)
            << ", should be "
            << log_block_calc_checksum(log_block);
        /* Garbage or an incompletely written log block.

        This could be the result of killing the server
        while it was writing this log block. We treat
        this as an abrupt end of the redo log. */
        finished = true;
        break;
    }
  • Check whether the block carries the flush bit. A block with the flush bit set was the first block written in some flush of the log buffer to disk. If so, update contiguous_lsn, meaning all log up to the current scanned_lsn is known to be contiguous.
    if (log_block_get_flush_bit(log_block)) {

        if (scanned_lsn > *contiguous_lsn) {
            *contiguous_lsn = scanned_lsn;
        }
    }
  • If the point at which record parsing should start is found, update the related state:
    if (!recv_sys->parse_start_lsn
        && (log_block_get_first_rec_group(log_block) > 0)) {

        /* We found a point from which to start the parsing
        of log records */

        recv_sys->parse_start_lsn = scanned_lsn
            + log_block_get_first_rec_group(log_block);
        recv_sys->scanned_lsn = recv_sys->parse_start_lsn;
        recv_sys->recovered_lsn = recv_sys->parse_start_lsn;
    }
  • Advance the scanned LSN and, when new data is present, append the block's payload to the parsing buffer:
more_data = recv_sys_add_to_parsing_buf(
					log_block, scanned_lsn);
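
The header check above compares the stored block number against one derived from the LSN. Below is a standalone sketch of that mapping, plus the legacy "innodb" block checksum; the constants and formulas mirror MySQL 5.7 (log0log.ic) but should be read as illustrative assumptions:

#include <cassert>
#include <cstdint>

static const uint64_t OS_FILE_LOG_BLOCK_SIZE = 512;	/* assumed block size */

/* Sketch of log_block_convert_lsn_to_no(): block numbers are derived
from the LSN, wrap at 2^30, and are 1-based. */
static uint32_t
block_no_from_lsn(uint64_t lsn)
{
	return ((uint32_t) (lsn / OS_FILE_LOG_BLOCK_SIZE) & 0x3FFFFFFFu) + 1;
}

/* Sketch of the legacy "innodb" checksum: a rolling sum over the block
body, excluding the 4-byte trailer that stores the checksum itself.
The stored field is 4 bytes, hence the final truncation. */
static uint32_t
block_checksum_innodb(const uint8_t* block)
{
	uint64_t	sum = 1;
	uint32_t	sh = 0;

	for (uint32_t i = 0; i < OS_FILE_LOG_BLOCK_SIZE - 4; i++) {
		uint64_t	b = block[i];

		sum &= 0x7FFFFFFFu;
		sum += b;
		sum += b << sh;
		if (++sh > 24) {
			sh = 0;
		}
	}

	return (uint32_t) sum;
}

int main()
{
	/* LSNs inside the same 512-byte block map to the same block no;
	consecutive blocks get consecutive numbers. */
	assert(block_no_from_lsn(1024) == block_no_from_lsn(1535));
	assert(block_no_from_lsn(1536) == block_no_from_lsn(1024) + 1);

	uint8_t	block[512] = {0};
	assert(block_checksum_innodb(block) == 1);	/* all-zero body */
	return 0;
}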

3. Wrap-up:

  • Update group_scanned_lsn:
*group_scanned_lsn = scanned_lsn;
  • Print recovery progress when recovery is actually being performed.
  • If more data was appended to the parsing buffer and no corrupt log has been found, try to parse more log records:
    if (more_data && !recv_sys->found_corrupt_log) {
		/* Try to parse more log records */

		if (recv_parse_log_recs(checkpoint_lsn,
					*store_to_hash)) {
			ut_ad(recv_sys->found_corrupt_log
			      || recv_sys->found_corrupt_fs
			      || recv_sys->mlog_checkpoint_lsn
			      == recv_sys->recovered_lsn);
			return(true);
		}

		if (*store_to_hash != STORE_NO
        && mem_heap_get_size(recv_sys->heap) > available_memory) {   // decide whether to keep storing to the hash table based on memory use
			*store_to_hash = STORE_NO;
		}

		if (recv_sys->recovered_offset > recv_parsing_buf_size / 4) {    // if much of the parsing buffer has been consumed, shift the remaining data left
			/* Move parsing buffer data to the buffer start */

			recv_sys_justify_left_parsing_buf();
		}
	}
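
When a quarter of the parsing buffer has been consumed, recv_sys_justify_left_parsing_buf() slides the unparsed tail back to the buffer start so new blocks can be appended. A minimal sketch of that idea, with the struct fields as simplified stand-ins for recv_sys members:

#include <cstdio>
#include <cstring>

/* Simplified stand-ins for the recv_sys parsing buffer fields. */
struct parse_buf_t {
	char	buf[16];
	size_t	len;			/* valid bytes in buf */
	size_t	recovered_offset;	/* bytes already parsed */
};

/* Sketch of recv_sys_justify_left_parsing_buf(): move the unparsed
tail to the start of the buffer and reset the offsets. */
static void
justify_left(parse_buf_t* s)
{
	size_t	tail = s->len - s->recovered_offset;

	memmove(s->buf, s->buf + s->recovered_offset, tail);
	s->len = tail;
	s->recovered_offset = 0;
}

int main()
{
	parse_buf_t	s;

	memcpy(s.buf, "abcdefgh", 8);
	s.len = 8;
	s.recovered_offset = 5;		/* "abcde" already parsed */

	justify_left(&s);
	printf("%.*s len=%zu\n", (int) s.len, s.buf, s.len);	/* prints: fgh len=3 */
	return 0;
}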

3. recv_parse_log_recs() function flow

static MY_ATTRIBUTE((warn_unused_result))
bool
recv_parse_log_recs(
	lsn_t		checkpoint_lsn,
	store_t		store)   // parse log records from the parsing buffer
{
	byte*		ptr;
	byte*		end_ptr;
	bool		single_rec;
	ulint		len;
	lsn_t		new_recovered_lsn;
	lsn_t		old_lsn;
	mlog_id_t	type;
	ulint		space;
	ulint		page_no;
	byte*		body;

	ut_ad(log_mutex_own());
	ut_ad(recv_sys->parse_start_lsn != 0);
loop:
	ptr = recv_sys->buf + recv_sys->recovered_offset;

	end_ptr = recv_sys->buf + recv_sys->len;

	if (ptr == end_ptr) {

		return(false);
	}

	switch (*ptr) {
	case MLOG_CHECKPOINT:
#ifdef UNIV_LOG_LSN_DEBUG
	case MLOG_LSN:
#endif /* UNIV_LOG_LSN_DEBUG */
	case MLOG_DUMMY_RECORD:
		single_rec = true;
		break;
	default:
		single_rec = !!(*ptr & MLOG_SINGLE_REC_FLAG);
	}

	if (single_rec) {
		/* The mtr did not modify multiple pages */

		old_lsn = recv_sys->recovered_lsn;

		len = recv_parse_log_rec(&type, ptr, end_ptr, &space,
					 &page_no, true, &body);

		if (len == 0) {
			return(false);
		}

		if (recv_sys->found_corrupt_log) {
			recv_report_corrupt_log(
				ptr, type, space, page_no);
			return(true);
		}

		if (recv_sys->found_corrupt_fs) {
			return(true);
		}

		new_recovered_lsn = recv_calc_lsn_on_data_add(old_lsn, len);

		if (new_recovered_lsn > recv_sys->scanned_lsn) {

			return(false);
		}

		recv_previous_parsed_rec_type = type;
		recv_previous_parsed_rec_offset = recv_sys->recovered_offset;
		recv_previous_parsed_rec_is_multi = 0;

		recv_sys->recovered_offset += len;
		recv_sys->recovered_lsn = new_recovered_lsn;

		switch (type) {
			lsn_t	lsn;
		case MLOG_DUMMY_RECORD:
			...
	} else {

		ulint	total_len	= 0;
		ulint	n_recs		= 0;
		bool	only_mlog_file	= true;
		ulint	mlog_rec_len	= 0;

		for (;;) {
			len = recv_parse_log_rec(
				&type, ptr, end_ptr, &space, &page_no,
				false, &body);

			if (len == 0) {
				return(false);
			}

			if (recv_sys->found_corrupt_log
			    || type == MLOG_CHECKPOINT
			    || (*ptr & MLOG_SINGLE_REC_FLAG)) {
				recv_sys->found_corrupt_log = true;
				recv_report_corrupt_log(
					ptr, type, space, page_no);
				return(true);
			}

			if (recv_sys->found_corrupt_fs) {
				return(true);
			}

			recv_previous_parsed_rec_type = type;
			recv_previous_parsed_rec_offset
				= recv_sys->recovered_offset + total_len;
			recv_previous_parsed_rec_is_multi = 1;

			if (type != MLOG_FILE_NAME && only_mlog_file == true) {
				only_mlog_file = false;
			}

			if (only_mlog_file) {
				new_recovered_lsn = recv_calc_lsn_on_data_add(
					recv_sys->recovered_lsn, len);
				mlog_rec_len += len;
				recv_sys->recovered_offset += len;
				recv_sys->recovered_lsn = new_recovered_lsn;
			}

			total_len += len;
			n_recs++;

			ptr += len;

			if (type == MLOG_MULTI_REC_END) {
				DBUG_PRINT("ib_log",
					   ("scan " LSN_PF
					    ": multi-log end"
					    " total_len " ULINTPF
					    " n=" ULINTPF,
					    recv_sys->recovered_lsn,
					    total_len, n_recs));
				total_len -= mlog_rec_len;
				break;
			}

			DBUG_PRINT("ib_log",
				   ("scan " LSN_PF ": multi-log rec %s"
				    " len " ULINTPF
				    " page " ULINTPF ":" ULINTPF,
				    recv_sys->recovered_lsn,
				    get_mlog_string(type), len, space, page_no));
		}

		new_recovered_lsn = recv_calc_lsn_on_data_add(
			recv_sys->recovered_lsn, total_len);

		if (new_recovered_lsn > recv_sys->scanned_lsn) {
			/* The log record filled a log block, and we require
			that also the next log block should have been scanned
			in */

			return(false);
		}

		/* Add all the records to the hash table */

		ptr = recv_sys->buf + recv_sys->recovered_offset;

		for (;;) {
			old_lsn = recv_sys->recovered_lsn;

			len = recv_parse_log_rec(
				&type, ptr, end_ptr, &space, &page_no,
				true, &body);

			if (recv_sys->found_corrupt_log
			    && !recv_report_corrupt_log(
				    ptr, type, space, page_no)) {
				return(true);
			}

			if (recv_sys->found_corrupt_fs) {
				return(true);
			}

			ut_a(len != 0);
			ut_a(!(*ptr & MLOG_SINGLE_REC_FLAG));

			recv_sys->recovered_offset += len;
			recv_sys->recovered_lsn
				= recv_calc_lsn_on_data_add(old_lsn, len);

			switch (type) {
			case MLOG_MULTI_REC_END:
				...
				}
			}

			ptr += len;
		}
	}

	goto loop;
}

1. Loop over the log records:

  • A loop: label starts a loop that keeps parsing records until no further complete record is available in the parsing buffer.

2. Parse a single record or a multi-record group:

  • The first byte of the record determines whether this is a single-record mtr or the start of a multi-record group (a standalone sketch of this flag test follows the code below).
single_rec = !!(*ptr & MLOG_SINGLE_REC_FLAG);
  • For a single record, parse and process it directly.
len = recv_parse_log_rec(&type, ptr, end_ptr, &space,
                &page_no, true, &body);
  • For a multi-record group (the type byte lacks MLOG_SINGLE_REC_FLAG), the records are parsed in a loop until MLOG_MULTI_REC_END is reached:
for (;;) {
    len = recv_parse_log_rec(
        &type, ptr, end_ptr, &space, &page_no,
        false, &body);

    if (len == 0) {   // no complete log record left to parse
        return(false);
    }

    if (recv_sys->found_corrupt_log      
        || type == MLOG_CHECKPOINT
        || (*ptr & MLOG_SINGLE_REC_FLAG)) {   // treat as corrupted log
        recv_sys->found_corrupt_log = true;
        recv_report_corrupt_log(
            ptr, type, space, page_no);
        return(true);
    }

    if (recv_sys->found_corrupt_fs) {
        return(true);
    }

    recv_previous_parsed_rec_type = type;
    recv_previous_parsed_rec_offset
        = recv_sys->recovered_offset + total_len;
    recv_previous_parsed_rec_is_multi = 1;

    if (type != MLOG_FILE_NAME && only_mlog_file == true) {
        only_mlog_file = false;
    }

    if (only_mlog_file) {
        new_recovered_lsn = recv_calc_lsn_on_data_add(
            recv_sys->recovered_lsn, len);
        mlog_rec_len += len;
        recv_sys->recovered_offset += len;
        recv_sys->recovered_lsn = new_recovered_lsn;
    }

    total_len += len;
    n_recs++;

    ptr += len;
    
    ...
}

new_recovered_lsn = recv_calc_lsn_on_data_add(
    recv_sys->recovered_lsn, total_len);  // compute the LSN after adding total_len bytes of data

if (new_recovered_lsn > recv_sys->scanned_lsn) {
    /* If the new LSN exceeds the scanned LSN, the record group runs
    past the end of the scanned data: the next log block has not been
    scanned in yet, so return false. */

    return(false);
}
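
A standalone sketch of the type-byte test used above; the constant values mirror MySQL 5.7's mtr0types.h (MLOG_SINGLE_REC_FLAG = 0x80, MLOG_2BYTES = 2) but are assumptions for illustration:

#include <cstdint>
#include <cstdio>

static const uint8_t MLOG_SINGLE_REC_FLAG = 0x80;	/* assumed value */
static const uint8_t MLOG_2BYTES = 2;			/* assumed value */

/* Sketch of how recv_parse_log_recs() classifies a record group from
its first type byte: flag set = single-record mtr, flag clear = start
of a multi-record group that ends with MLOG_MULTI_REC_END. */
static bool
is_single_rec(uint8_t first_byte)
{
	return (first_byte & MLOG_SINGLE_REC_FLAG) != 0;
}

int main()
{
	uint8_t	multi = MLOG_2BYTES;				/* starts a multi-record group */
	uint8_t	single = MLOG_2BYTES | MLOG_SINGLE_REC_FLAG;	/* single-record mtr */

	printf("multi-record group: %d\n", !is_single_rec(multi));	/* 1 */
	printf("single-record mtr:  %d\n", is_single_rec(single));	/* 1 */
	return 0;
}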

3. Error handling:

  • If log corruption is found while parsing (recv_sys->found_corrupt_log), the corrupt record is reported and the function may return.
  • If filesystem corruption is found (recv_sys->found_corrupt_fs), the function likewise returns.

4. Update the recovery state:

  • After each record is parsed, recovered_offset and recovered_lsn in recv_sys are advanced to reflect the current recovery progress (see the LSN-arithmetic sketch after the code below).
for (;;) {
    old_lsn = recv_sys->recovered_lsn;
    len = recv_parse_log_rec(
        &type, ptr, end_ptr, &space, &page_no,
        true, &body);

    if (recv_sys->found_corrupt_log
        && !recv_report_corrupt_log(
            ptr, type, space, page_no)) {
        return(true);
    }

    if (recv_sys->found_corrupt_fs) {
        return(true);
    }

    ut_a(len != 0);
    ut_a(!(*ptr & MLOG_SINGLE_REC_FLAG));

    recv_sys->recovered_offset += len;
    recv_sys->recovered_lsn
        = recv_calc_lsn_on_data_add(old_lsn, len);
    ...
}
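
recv_calc_lsn_on_data_add() advances an LSN by a payload length while also skipping the framing bytes (12-byte header, 4-byte trailer) of every log block boundary crossed, because LSNs count all on-disk bytes. A minimal sketch of that arithmetic, assuming InnoDB's usual block layout:

#include <cassert>
#include <cstdint>

/* Assumed block layout: 512-byte blocks with a 12-byte header and a
4-byte trailer, leaving 496 payload bytes per block. */
static const uint64_t BLOCK_SIZE = 512;
static const uint64_t HDR = 12;
static const uint64_t TRL = 4;
static const uint64_t PAYLOAD = BLOCK_SIZE - HDR - TRL;	/* 496 */

/* Sketch of recv_calc_lsn_on_data_add(): add len payload bytes to lsn,
plus the framing bytes of every block boundary crossed. */
static uint64_t
lsn_add_data(uint64_t lsn, uint64_t len)
{
	uint64_t	off = lsn % BLOCK_SIZE;	/* offset inside the current block */

	assert(off >= HDR && off <= BLOCK_SIZE - TRL);

	uint64_t	data_off = off - HDR;	/* payload already consumed there */
	uint64_t	crossed = (data_off + len) / PAYLOAD;

	return lsn + len + crossed * (HDR + TRL);
}

int main()
{
	/* Staying inside one block: the LSN advances by exactly len. */
	assert(lsn_add_data(512 + 12, 100) == 512 + 12 + 100);
	/* Crossing into the next block also skips the 16 framing bytes. */
	assert(lsn_add_data(512 + 12, 496) == 512 + 12 + 496 + 16);
	return 0;
}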

5. Handle specific record types:

  • Type-specific processing is performed for each parsed record, e.g. adding it to the hash table.

6. Return:

  • The function returns false when no further complete record can be parsed, and true when it must stop early, e.g. because corruption was detected.

4. recv_parse_log_rec() function flow

static
ulint
recv_parse_log_rec(  // parse a single log record
	mlog_id_t*	type,
	byte*		ptr,
	byte*		end_ptr,
	ulint*		space,
	ulint*		page_no,
	bool		apply,
	byte**		body)
{
	byte*	new_ptr;

	*body = NULL;

	UNIV_MEM_INVALID(type, sizeof *type);
	UNIV_MEM_INVALID(space, sizeof *space);
	UNIV_MEM_INVALID(page_no, sizeof *page_no);
	UNIV_MEM_INVALID(body, sizeof *body);

	if (ptr == end_ptr) {

		return(0);
	}

	switch (*ptr) {
#ifdef UNIV_LOG_LSN_DEBUG
	case MLOG_LSN | MLOG_SINGLE_REC_FLAG:
	case MLOG_LSN:
		new_ptr = mlog_parse_initial_log_record(
			ptr, end_ptr, type, space, page_no);
		if (new_ptr != NULL) {
			const lsn_t	lsn = static_cast<lsn_t>(
				*space) << 32 | *page_no;
			ut_a(lsn == recv_sys->recovered_lsn);
		}

		*type = MLOG_LSN;
		return(new_ptr - ptr);
#endif /* UNIV_LOG_LSN_DEBUG */
	case MLOG_MULTI_REC_END:
	case MLOG_DUMMY_RECORD:
		*type = static_cast<mlog_id_t>(*ptr);
		return(1);
	case MLOG_CHECKPOINT:
		if (end_ptr < ptr + SIZE_OF_MLOG_CHECKPOINT) {
			return(0);
		}
		*type = static_cast<mlog_id_t>(*ptr);
		return(SIZE_OF_MLOG_CHECKPOINT);
	case MLOG_MULTI_REC_END | MLOG_SINGLE_REC_FLAG:
	case MLOG_DUMMY_RECORD | MLOG_SINGLE_REC_FLAG:
	case MLOG_CHECKPOINT | MLOG_SINGLE_REC_FLAG:
		recv_sys->found_corrupt_log = true;
		return(0);
	}

	new_ptr = mlog_parse_initial_log_record(ptr, end_ptr, type, space,
						page_no);
	*body = new_ptr;

	if (UNIV_UNLIKELY(!new_ptr)) {

		return(0);
	}

	new_ptr = recv_parse_or_apply_log_rec_body(
		*type, new_ptr, end_ptr, *space, *page_no, NULL, NULL);

	if (UNIV_UNLIKELY(new_ptr == NULL)) {

		return(0);
	}

	return(new_ptr - ptr);
}

1. The switch statement handles all record types that never need to be applied to a page (MLOG_MULTI_REC_END, MLOG_DUMMY_RECORD, MLOG_CHECKPOINT, etc.); records that must be applied fall through the switch.
2. Parse the record's tablespace ID and page number fields, which are stored in InnoDB's compressed integer format (see the decoder sketch at the end of this section):

new_ptr = mlog_parse_initial_log_record(ptr, end_ptr, type, space,
						page_no);

3. Parse the record body. Because the block argument passed here is NULL, the function only parses the record and does not apply it to any page:

new_ptr = recv_parse_or_apply_log_rec_body(
		*type, new_ptr, end_ptr, *space, *page_no, NULL, NULL);
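
mlog_parse_initial_log_record() reads the type byte and then the space id and page number, which are stored in InnoDB's variable-length compressed integer format. Below is a standalone decoder sketch for that format; the prefix scheme mirrors mach0data.ic, but the details should be read as assumptions:

#include <cstdint>
#include <cstdio>

/* Sketch of InnoDB's compressed 32-bit integer decoding: values below
0x80 use one byte; larger values use a prefix byte of 0x80/0xC0/0xE0/
0xF0 for 2/3/4/5-byte encodings. Returns the number of bytes consumed,
or 0 if the buffer ends mid-value (the caller must wait for more data). */
static int
parse_compressed(const uint8_t* p, const uint8_t* end, uint32_t* val)
{
	if (p >= end) {
		return 0;
	}

	uint8_t	b = p[0];

	if (b < 0x80) {
		*val = b;		/* single-byte form */
		return 1;
	}

	int	n = (b < 0xC0) ? 2 : (b < 0xE0) ? 3 : (b < 0xF0) ? 4 : 5;

	if (end - p < n) {
		return 0;		/* incomplete: record is cut off */
	}

	static const uint8_t	mask[6] = {0, 0, 0x3F, 0x1F, 0x0F, 0};
	uint32_t		v = (n < 5) ? (uint32_t) (b & mask[n]) : 0;

	for (int i = 1; i < n; i++) {
		v = (v << 8) | p[i];	/* append payload bytes */
	}

	*val = v;
	return n;
}

int main()
{
	uint32_t	v;
	uint8_t		one[] = {0x7F};		/* 1-byte form */
	uint8_t		two[] = {0x81, 0x00};	/* 2-byte form */

	parse_compressed(one, one + 1, &v);
	printf("decoded %u\n", v);		/* 127 */
	parse_compressed(two, two + 2, &v);
	printf("decoded %u\n", v);		/* 256 */
	return 0;
}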

5. recv_add_to_hash_table() function flow

static
void
recv_add_to_hash_table(  // add a parsed log record to the recovery hash table
/*===================*/
	mlog_id_t	type,		/*!< in: log record type */
	ulint		space,		/*!< in: space id */
	ulint		page_no,	/*!< in: page number */
	byte*		body,		/*!< in: log record body */
	byte*		rec_end,	/*!< in: log record end */
	lsn_t		start_lsn,	/*!< in: start lsn of the mtr */
	lsn_t		end_lsn)	/*!< in: end lsn of the mtr */
{
	recv_t*		recv;
	ulint		len;
	recv_data_t*	recv_data;
	recv_data_t**	prev_field;
	recv_addr_t*	recv_addr;

	ut_ad(type != MLOG_FILE_DELETE);
	ut_ad(type != MLOG_FILE_CREATE2);
	ut_ad(type != MLOG_FILE_RENAME2);
	ut_ad(type != MLOG_FILE_NAME);
	ut_ad(type != MLOG_DUMMY_RECORD);
	ut_ad(type != MLOG_CHECKPOINT);
	ut_ad(type != MLOG_INDEX_LOAD);
	ut_ad(type != MLOG_TRUNCATE);

	len = rec_end - body;

	recv = static_cast<recv_t*>(
		mem_heap_alloc(recv_sys->heap, sizeof(recv_t)));

	recv->type = type;
	recv->len = rec_end - body;
	recv->start_lsn = start_lsn;
	recv->end_lsn = end_lsn;

	recv_addr = recv_get_fil_addr_struct(space, page_no);

	if (recv_addr == NULL) {
		recv_addr = static_cast<recv_addr_t*>(
			mem_heap_alloc(recv_sys->heap, sizeof(recv_addr_t)));

		recv_addr->space = space;
		recv_addr->page_no = page_no;
		recv_addr->state = RECV_NOT_PROCESSED;

		UT_LIST_INIT(recv_addr->rec_list, &recv_t::rec_list);

		HASH_INSERT(recv_addr_t, addr_hash, recv_sys->addr_hash,
			    recv_fold(space, page_no), recv_addr);
		recv_sys->n_addrs++;
#if 0
		fprintf(stderr, "Inserting log rec for space %lu, page %lu\n",
			space, page_no);
#endif
	}

	UT_LIST_ADD_LAST(recv_addr->rec_list, recv);

	prev_field = &(recv->data);

	/* Store the log record body in chunks of less than UNIV_PAGE_SIZE:
	recv_sys->heap grows into the buffer pool, and bigger chunks could not
	be allocated */

	while (rec_end > body) {

		len = rec_end - body;

		if (len > RECV_DATA_BLOCK_SIZE) {
			len = RECV_DATA_BLOCK_SIZE;
		}

		recv_data = static_cast<recv_data_t*>(
			mem_heap_alloc(recv_sys->heap,
				       sizeof(recv_data_t) + len));

		*prev_field = recv_data;

		memcpy(recv_data + 1, body, len);

		prev_field = &(recv_data->next);

		body += len;
	}

	*prev_field = NULL;
}

1. Allocate a recv_t for the current record from the recv_sys heap and initialize it:

	recv = static_cast<recv_t*>(
		mem_heap_alloc(recv_sys->heap, sizeof(recv_t)));

	recv->type = type;
	recv->len = rec_end - body;
	recv->start_lsn = start_lsn;
	recv->end_lsn = end_lsn;

2. Call recv_get_fil_addr_struct() to look up the page's record list in the hash table:

recv_addr = recv_get_fil_addr_struct(space, page_no);

3. If the page is not yet in the hash table, create its node there:

  • The HASH_INSERT macro adds the newly created node to a bucket of recv_sys->addr_hash; the bucket is chosen by hashing (space, page_no) with recv_fold(). If the slot is empty the node is inserted directly; collisions are resolved by chaining (see the sketch after the code below).
  • recv_sys->n_addrs counts the pages in the recovery hash table that still have log records to apply.
if (recv_addr == NULL) {
		recv_addr = static_cast<recv_addr_t*>(
			mem_heap_alloc(recv_sys->heap, sizeof(recv_addr_t)));

		recv_addr->space = space;
		recv_addr->page_no = page_no;
		recv_addr->state = RECV_NOT_PROCESSED;

		UT_LIST_INIT(recv_addr->rec_list, &recv_t::rec_list);

		HASH_INSERT(recv_addr_t, addr_hash, recv_sys->addr_hash,
			    recv_fold(space, page_no), recv_addr);
		recv_sys->n_addrs++;
	}
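
A minimal standalone sketch of the addr_hash idea described above: pages keyed by (space, page_no), collisions resolved by chaining. The fold function here is a simple stand-in, not InnoDB's recv_fold()/ut_fold_ulint_pair():

#include <cstdint>
#include <cstdio>
#include <vector>

struct recv_addr_sketch_t {
	uint32_t		space;
	uint32_t		page_no;
	recv_addr_sketch_t*	next;	/* hash-chain link */
};

struct addr_hash_sketch_t {
	std::vector<recv_addr_sketch_t*>	cells;

	explicit addr_hash_sketch_t(size_t n) : cells(n, nullptr) {}

	/* Stand-in fold; InnoDB derives the bucket from (space, page_no). */
	static uint64_t fold(uint32_t space, uint32_t page_no) {
		return (uint64_t) space * 2654435761u + page_no;
	}

	recv_addr_sketch_t* find(uint32_t space, uint32_t page_no) {
		recv_addr_sketch_t* a = cells[fold(space, page_no) % cells.size()];
		while (a && !(a->space == space && a->page_no == page_no)) {
			a = a->next;	/* walk the collision chain */
		}
		return a;
	}

	void insert(recv_addr_sketch_t* a) {	/* chain at the bucket head */
		recv_addr_sketch_t** cell
			= &cells[fold(a->space, a->page_no) % cells.size()];
		a->next = *cell;
		*cell = a;
	}
};

int main()
{
	addr_hash_sketch_t	h(64);
	recv_addr_sketch_t	a = {5, 10, nullptr};

	h.insert(&a);
	printf("found: %d\n", h.find(5, 10) != nullptr);	/* prints: found: 1 */
	return 0;
}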

4. Append the current record to the page's record list:

UT_LIST_ADD_LAST(recv_addr->rec_list, recv);

5. Copy the record body into recv->data; a long body is split into chunks that are linked together as a list:

prev_field = &(recv->data);

/* Store the log record body in chunks of less than UNIV_PAGE_SIZE:
recv_sys->heap grows into the buffer pool, and bigger chunks could not
be allocated */

while (rec_end > body) {

    len = rec_end - body;

    if (len > RECV_DATA_BLOCK_SIZE) {
        len = RECV_DATA_BLOCK_SIZE;
    }

    recv_data = static_cast<recv_data_t*>(
        mem_heap_alloc(recv_sys->heap,
                    sizeof(recv_data_t) + len));

    *prev_field = recv_data;

    memcpy(recv_data + 1, body, len);

    prev_field = &(recv_data->next);

    body += len;
}

The redo records are ultimately stored in the recv_sys->addr_hash hash table, whose structure is shown below:
[Figure: structure of the recv_sys->addr_hash hash table]

How the position of the redo log data changes during log parsing:
[Figure: movement of redo log data during log parsing]
