Haishan Database (He3DB) Source Code Explained: How Haishan PG Organizes Tables and Tuples (2)

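I. Page Operations

1. Page Initialization

A Page is loaded into memory before it is accessed, so a Page can be represented by a single char * pointer to its starting address in memory. Because the Page size is known, the pointer together with the size is enough to describe and access the whole Page. When a Page is built, PageInit is called to initialize it.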

void
PageInit(Page page, Size pageSize, Size specialSize)
{
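    // p points to the start of the Page header, which is also the start of the whole Page
    PageHeader  p = (PageHeader) page;

    // Align the size of the special area
    specialSize = MAXALIGN(specialSize);

    // The Page size must be the constant BLCKSZ (8192 by default)
    Assert(pageSize == BLCKSZ);
    // Besides the header and the special area, the Page must still have usable space
    Assert(pageSize > specialSize + SizeOfPageHeaderData);

    // Fill the entire Page with zeroes
    MemSet(p, 0, pageSize);

    // Initialize the fields of the Page header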
    p->pd_flags = 0;
    p->pd_lower = SizeOfPageHeaderData;
    p->pd_upper = pageSize - specialSize;
    p->pd_special = pageSize - specialSize;
    PageSetPageSizeAndVersion(page, pageSize, PG_PAGE_LAYOUT_VERSION);
    /* p->pd_prune_xid = InvalidTransactionId;      done by above MemSet */
}
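  • Page initialization: the function first validates its arguments. It asserts that pageSize equals BLCKSZ (8 KB by default) and that specialSize, after MAXALIGN, still leaves usable space beyond the page header.
  • It then zeroes the page and initializes the header fields pd_flags, pd_lower, pd_upper, pd_special, and pd_pagesize_version.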
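The effect of PageInit on the free-space boundaries can be sketched with a trimmed-down page. This is only an illustration under stated assumptions: ToyPage, ToyPageHeader, toy_page_init, and toy_page_free_space are hypothetical names, the header keeps just three offsets, and 8-byte alignment is assumed for MAXALIGN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192u                                     /* mirrors the default BLCKSZ */
#define TOY_MAXALIGN(n) (((size_t) (n) + 7) & ~(size_t) 7)   /* assumes 8-byte alignment */

/* Hypothetical, trimmed-down header: only the three offsets. */
typedef struct ToyPageHeader
{
    uint16_t pd_lower;      /* start of free space */
    uint16_t pd_upper;      /* end of free space */
    uint16_t pd_special;    /* start of the special area */
} ToyPageHeader;

/* The header overlays the first bytes of the 8 KB block. */
typedef union ToyPage
{
    ToyPageHeader hdr;
    char          bytes[TOY_BLCKSZ];
} ToyPage;

/* Zero the page, then place the boundaries the way PageInit does. */
void
toy_page_init(ToyPage *page, size_t special_size)
{
    special_size = TOY_MAXALIGN(special_size);
    assert(TOY_BLCKSZ > special_size + sizeof(ToyPageHeader));

    memset(page, 0, sizeof(*page));
    page->hdr.pd_lower = (uint16_t) sizeof(ToyPageHeader);
    page->hdr.pd_upper = (uint16_t) (TOY_BLCKSZ - special_size);
    page->hdr.pd_special = (uint16_t) (TOY_BLCKSZ - special_size);
}

/* Free space is the gap between pd_lower and pd_upper. */
size_t
toy_page_free_space(const ToyPage *page)
{
    return (size_t) page->hdr.pd_upper - (size_t) page->hdr.pd_lower;
}
```

Right after initialization pd_upper equals pd_special, so the whole region between the header and the special area is free.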

2. Verifying Page Validity

bool
PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags)
{
	PageHeader	p = (PageHeader) page;
	size_t *pagebytes = NULL;
	int i = 0;
	bool		checksum_failure = false;
	bool		header_sane = false;
	bool		all_zeroes = false;
	uint16		checksum = 0;

	if (!PageIsNew(page))
	{
		if (DataChecksumsEnabled())
		{
			checksum = pg_checksum_page((char *) page, blkno);

			if (checksum != p->pd_checksum) {
				checksum_failure = true;
			}
		}

		if ((p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 && p->pd_lower <= p->pd_upper && p->pd_upper <= p->pd_special && p->pd_special <= BLCKSZ && p->pd_special == MAXALIGN(p->pd_special)) {
			header_sane = true;
		}

		if (header_sane && !checksum_failure) {
			LOG_FUNCTION_EXIT();
			return true;
		}
	}

	all_zeroes = true;
	pagebytes = (size_t *) page;
	for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
	{
		if (pagebytes[i] != 0)
		{
			all_zeroes = false;
			break;
		}
	}

	if (all_zeroes) {
		LOG_FUNCTION_EXIT();
		return true;
	}

	if (checksum_failure)
	{
		if ((flags & PIV_LOG_WARNING) != 0) {
			ereport(WARNING,
				(errcode(ERRCODE_DATA_CORRUPTED), errmsg("page verification failed, calculated checksum %u but expected %u", checksum, p->pd_checksum)));
		}

		if ((flags & PIV_REPORT_STAT) != 0) {
			pgstat_report_checksum_failure();
		}
		

		if (header_sane && ignore_checksum_failure) {
			LOG_FUNCTION_EXIT();
			return true;
		}
	}
	return false;
}
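  • The function checks whether a page's header and checksum are valid.
  • For a page that is not new, if data checksums are enabled, it computes the checksum with pg_checksum_page() and compares it with the pd_checksum stored in the page header.
  • It then checks that the header fields are plausible: pd_flags has no unknown bits, pd_lower <= pd_upper <= pd_special <= BLCKSZ, and pd_special is MAXALIGNed. If the header is sane and the checksum did not fail, the page is valid.
  • Otherwise, a page made up entirely of zeroes is still accepted as valid. For speed, that test compares the page one size_t word at a time.
  • On a checksum mismatch the function can emit a WARNING (PIV_LOG_WARNING) and report the failure to the statistics system (PIV_REPORT_STAT). It returns true only if the header is sane and ignore_checksum_failure is set; in every other case it returns false.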
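The header sanity test and the all-zero test can be sketched on a simplified page as follows. This is a minimal sketch, not the real function: the Toy* names are hypothetical, the checksum branch and the pd_flags check are left out, and 8-byte alignment is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BLCKSZ 8192u
#define TOY_MAXALIGN(n) (((size_t) (n) + 7) & ~(size_t) 7)   /* assumes 8-byte alignment */

/* Hypothetical header with only the three offsets that get checked. */
typedef struct ToyPageHeader
{
    uint16_t pd_lower;
    uint16_t pd_upper;
    uint16_t pd_special;
} ToyPageHeader;

/* The same 8 KB viewed either as size_t words or through the header. */
typedef union ToyPage
{
    size_t        words[TOY_BLCKSZ / sizeof(size_t)];
    ToyPageHeader hdr;
} ToyPage;

/* The offsets must be ordered, inside the block, and aligned. */
bool
toy_header_is_sane(const ToyPage *page)
{
    const ToyPageHeader *p = &page->hdr;

    return p->pd_lower <= p->pd_upper &&
           p->pd_upper <= p->pd_special &&
           p->pd_special <= TOY_BLCKSZ &&
           (size_t) p->pd_special == TOY_MAXALIGN(p->pd_special);
}

/* Word-at-a-time scan, cheaper than comparing byte by byte. */
bool
toy_page_is_all_zeroes(const ToyPage *page)
{
    for (size_t i = 0; i < TOY_BLCKSZ / sizeof(size_t); i++)
    {
        if (page->words[i] != 0)
            return false;
    }
    return true;
}

/* A page passes if its header is sane or if it is entirely zero. */
bool
toy_page_is_verified(const ToyPage *page)
{
    bool is_new = (page->hdr.pd_upper == 0);    /* same test as PageIsNew */

    if (!is_new && toy_header_is_sane(page))
        return true;
    return toy_page_is_all_zeroes(page);
}
```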

II. Table Operations

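Opening a table does not physically open a file; it returns the table's RelationData structure. Two functions form the core:

1. relation_open

Given a table's OID and a lockmode, this function locks the table and returns its RelationData structure. On the first open, a new RelationData structure is created in the RelCache.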

Relation
relation_open(Oid relationId, LOCKMODE lockmode)
{
	Relation	r;

	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
	
	if (lockmode != NoLock) {
		LockRelationOid(relationId, lockmode);
	}

	r = RelationIdGetRelation(relationId);

	if (!RelationIsValid(r)) {
		elog(ERROR, "could not open relation with OID %u", relationId);
	}

	Assert(lockmode != NoLock ||
		   IsBootstrapProcessingMode() ||
		   CheckRelationLockedByMe(r, AccessShareLock, true));

	if (RelationUsesLocalBuffers(r)) {
		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
	}

	pgstat_init_relation(r);
	return r;
}
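  • 1. Assertion: Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES); ensures the supplied lock mode is within the valid range.
  • 2. Acquire the lock: if the lock mode is not NoLock, call LockRelationOid(relationId, lockmode); to take the corresponding lock. If it is NoLock, no lock is acquired.
  • 3. Open the relation: RelationIdGetRelation(relationId); fetches the relation's cache entry by OID.
  • 4. Check validity: if the returned relation is invalid (!RelationIsValid(r)), raise an error with elog(ERROR, ...).
  • 5. Assert that a lock is held: if the lock mode is NoLock, assert that the caller already holds some lock on the relation (CheckRelationLockedByMe), unless the backend is in bootstrap processing mode.
  • 6. Mark temporary-relation access: if the relation uses local buffers (that is, it is a temporary table), MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE; records that the current transaction accessed the temporary namespace.
  • 7. Initialize statistics: pgstat_init_relation(r); initializes the relation's statistics.
  • 8. Return the relation: finally, the function returns the opened relation's cache entry.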
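The "create on first open, reuse afterwards" behaviour of the relation cache can be illustrated with a tiny fixed-size cache. This is a sketch under loud assumptions: ToyRelation, toy_relcache, toy_relation_open, and toy_relation_close are hypothetical, locking is omitted entirely, and the real RelCache is a hash table rather than a linear array.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 8

typedef unsigned int ToyOid;

/* Hypothetical stand-in for RelationData. */
typedef struct ToyRelation
{
    ToyOid oid;         /* 0 marks an unused slot */
    int    refcnt;      /* number of outstanding opens */
} ToyRelation;

static ToyRelation toy_relcache[TOY_CACHE_SIZE];

/* Return the cached descriptor for oid, building it on first open. */
ToyRelation *
toy_relation_open(ToyOid oid)
{
    ToyRelation *free_slot = NULL;

    assert(oid != 0);
    for (int i = 0; i < TOY_CACHE_SIZE; i++)
    {
        if (toy_relcache[i].oid == oid)
        {
            toy_relcache[i].refcnt++;       /* cache hit: reuse the entry */
            return &toy_relcache[i];
        }
        if (toy_relcache[i].oid == 0 && free_slot == NULL)
            free_slot = &toy_relcache[i];
    }
    if (free_slot == NULL)
        return NULL;                        /* toy cache is full */

    free_slot->oid = oid;                   /* first open: build the entry */
    free_slot->refcnt = 1;
    return free_slot;
}

/* Drop one reference; the entry itself stays cached. */
void
toy_relation_close(ToyRelation *rel)
{
    assert(rel != NULL && rel->refcnt > 0);
    rel->refcnt--;
}
```

Two opens of the same OID return the same pointer, which mirrors why opening a table is cheap after the first call.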

2. relation_openrv

This function resolves the table's name to its OID and then calls relation_open.

Relation
relation_openrv(const RangeVar *relation, LOCKMODE lockmode)
{
	Oid relOid = 0;

	if (lockmode != NoLock) {
		AcceptInvalidationMessages();
	}

	relOid = RangeVarGetRelid(relation, lockmode, false);

	return relation_open(relOid, NoLock);
}
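  • 1. Variable initialization: Oid relOid = 0; initializes the variable that will hold the relation's object identifier (OID).
  • 2. Handle the lock mode:
    If the lock mode is not NoLock, call AcceptInvalidationMessages();. This function processes invalidation messages from other transactions so that the current transaction sees the latest state.
    If the lock mode is NoLock, this step is skipped.
  • 3. Get the relation OID: RangeVarGetRelid(relation, lockmode, false); looks up the relation's OID from the supplied RangeVar structure and lock mode, and acquires the corresponding lock when needed.
  • 4. Open the relation: relation_open(relOid, NoLock); opens the relation by the OID just obtained. NoLock is passed here because RangeVarGetRelid has already taken the lock if one was requested, so it need not be taken again.
  • 5. Return value: the function returns the opened relation's cache entry.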
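The name-to-OID step can be pictured as a lookup in a small catalog table. The sketch below is purely illustrative: toy_catalog, its two rows, and toy_name_get_relid are made up, and unlike RangeVarGetRelid with missing_ok set to false, it returns an invalid OID instead of raising an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int ToyOid;

#define TOY_INVALID_OID 0u

/* Hypothetical catalog row mapping a relation name to its OID. */
typedef struct ToyCatalogRow
{
    const char *relname;
    ToyOid      oid;
} ToyCatalogRow;

/* Made-up contents, for illustration only. */
static const ToyCatalogRow toy_catalog[] = {
    {"orders", 16384u},
    {"customers", 16390u},
};

/* Resolve a name to an OID; TOY_INVALID_OID means "no such relation". */
ToyOid
toy_name_get_relid(const char *relname)
{
    size_t nrows = sizeof(toy_catalog) / sizeof(toy_catalog[0]);

    for (size_t i = 0; i < nrows; i++)
    {
        if (strcmp(toy_catalog[i].relname, relname) == 0)
            return toy_catalog[i].oid;
    }
    return TOY_INVALID_OID;
}
```

Once the OID is known, the rest of the work is the OID-based open described in the previous subsection.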

3. Scanning a Table

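  1. First, the file blocks are loaded into buffers one by one, and every tuple in each buffer is scanned to find the tuples that satisfy the condition.
  2. While a table is being scanned, the HeapScanDescData structure holds the table's basic information and the current scan state.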
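The two points above amount to a nested loop over blocks and tuples, with a descriptor remembering where the scan stopped. A minimal sketch, assuming in-memory blocks and a simple key-equality condition; ToyScanDesc and the other Toy* names are hypothetical and far simpler than HeapScanDescData.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TUPLES_PER_BLOCK 4

typedef struct ToyTuple
{
    int key;
    int value;
} ToyTuple;

typedef struct ToyBlock
{
    ToyTuple tuples[TOY_TUPLES_PER_BLOCK];
    int      ntuples;       /* tuples actually stored in this block */
} ToyBlock;

/* Hypothetical scan descriptor: the table plus the current position. */
typedef struct ToyScanDesc
{
    const ToyBlock *blocks;
    size_t          nblocks;
    size_t          cur_block;
    int             cur_tuple;
} ToyScanDesc;

void
toy_scan_begin(ToyScanDesc *scan, const ToyBlock *blocks, size_t nblocks)
{
    scan->blocks = blocks;
    scan->nblocks = nblocks;
    scan->cur_block = 0;
    scan->cur_tuple = 0;
}

/* Return the next tuple whose key matches, or NULL when the scan ends. */
const ToyTuple *
toy_scan_next(ToyScanDesc *scan, int wanted_key)
{
    while (scan->cur_block < scan->nblocks)
    {
        const ToyBlock *blk = &scan->blocks[scan->cur_block];

        while (scan->cur_tuple < blk->ntuples)
        {
            const ToyTuple *tup = &blk->tuples[scan->cur_tuple++];

            if (tup->key == wanted_key)
                return tup;
        }
        scan->cur_block++;      /* block exhausted: move to the next one */
        scan->cur_tuple = 0;
    }
    return NULL;
}
```

Because the position lives in the descriptor, each call resumes exactly where the previous one returned.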

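III. Tuple Operations

Tuple operations are insertion, deletion, and update. An update is implemented by deleting the old tuple and inserting a new one.

1. Inserting a Tuple

The interface for inserting a tuple is the heap_insert() function.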

void
heap_insert(Relation relation, HeapTuple tup, CommandId cid,
			int options, BulkInsertState bistate)
{
	TransactionId xid = GetCurrentTransactionId();
	HeapTuple	heaptup;
	Buffer buffer = InvalidBuffer;
	Buffer		vmbuffer = InvalidBuffer;
	bool		all_visible_cleared = false;
	
	Assert(HeapTupleHeaderGetNatts(tup->t_data) <=
		   RelationGetNumberOfAttributes(relation));

	heaptup = heap_prepare_insert(relation, tup, xid, cid, options);

	buffer = RelationGetBufferForTuple(relation, heaptup->t_len,
									   InvalidBuffer, options, bistate,
									   &vmbuffer, NULL);

	CheckForSerializableConflictIn(relation, NULL, InvalidBlockNumber);

	START_CRIT_SECTION();

	RelationPutHeapTuple(relation, buffer, heaptup,
						 (options & HEAP_INSERT_SPECULATIVE) != 0);

	if (PageIsAllVisible(BufferGetPage(buffer)))
	{
		all_visible_cleared = true;
		PageClearAllVisible(BufferGetPage(buffer));
		visibilitymap_clear(relation,
							ItemPointerGetBlockNumber(&(heaptup->t_self)),
							vmbuffer, VISIBILITYMAP_VALID_BITS);
	}

	MarkBufferDirty(buffer);

	/* XLOG stuff */
	if (RelationNeedsWAL(relation))
	{
		xl_heap_insert xlrec;
		xl_heap_header xlhdr;
		XLogRecPtr recptr = 0;
		Page		page = BufferGetPage(buffer);
		uint8		info = XLOG_HEAP_INSERT;
		int			bufflags = 0;

		if (RelationIsAccessibleInLogicalDecoding(relation)) {
			log_heap_new_cid(relation, heaptup);
		}

		if (ItemPointerGetOffsetNumber(&(heaptup->t_self)) == FirstOffsetNumber &&
			PageGetMaxOffsetNumber(page) == FirstOffsetNumber)
		{
			info |= XLOG_HEAP_INIT_PAGE;
			bufflags |= REGBUF_WILL_INIT;
		}

		xlrec.offnum = ItemPointerGetOffsetNumber(&heaptup->t_self);
		xlrec.flags = 0;
		if (all_visible_cleared) {
			xlrec.flags |= XLH_INSERT_ALL_VISIBLE_CLEARED;
		}
		if (options & HEAP_INSERT_SPECULATIVE) {
			xlrec.flags |= XLH_INSERT_IS_SPECULATIVE;
		}
		Assert(ItemPointerGetBlockNumber(&heaptup->t_self) == BufferGetBlockNumber(buffer));
		
		if (RelationIsLogicallyLogged(relation) &&
			!(options & HEAP_INSERT_NO_LOGICAL))
		{
			xlrec.flags |= XLH_INSERT_CONTAINS_NEW_TUPLE;
			bufflags |= REGBUF_KEEP_DATA;

			if (IsToastRelation(relation)) {
				xlrec.flags |= XLH_INSERT_ON_TOAST_RELATION;
			}
		}

		XLogBeginInsert();
		XLogRegisterData((char *) &xlrec, SizeOfHeapInsert);

		xlhdr.t_infomask2 = heaptup->t_data->t_infomask2;
		xlhdr.t_infomask = heaptup->t_data->t_infomask;
		xlhdr.t_hoff = heaptup->t_data->t_hoff;

		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD | bufflags);
		XLogRegisterBufData(0, (char *) &xlhdr, SizeOfHeapHeader);
		/* PG73FORMAT: write bitmap [+ padding] [+ oid] + data */
		XLogRegisterBufData(0,
							(char *) heaptup->t_data + SizeofHeapTupleHeader,
							heaptup->t_len - SizeofHeapTupleHeader);

		/* filtering by origin on a row level is much more efficient */
		XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);

		recptr = XLogInsert(RM_HEAP_ID, info);

		PageSetLSN(page, recptr);
	}

	END_CRIT_SECTION();

	UnlockReleaseBuffer(buffer);
	if (vmbuffer != InvalidBuffer) {
		ReleaseBuffer(vmbuffer);
	}

	CacheInvalidateHeapTuple(relation, heaptup, NULL);
	
	pgstat_count_heap_insert(relation, 1);

	if (heaptup != tup)
	{
		tup->t_self = heaptup->t_self;
		heap_freetuple(heaptup);
	}
	return;
}

The function proceeds as follows:

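  1. Obtain the current transaction ID with GetCurrentTransactionId(). The version shown here does not call newoid and assigns no per-tuple OID.
  2. Initialize the tuple with heap_prepare_insert: set t_xmin and t_cmin to the current transaction ID and command ID, mark t_xmax invalid, and set t_tableOid (the OID of the table that contains the tuple).
  3. Find a block of this table whose free space can hold the new tuple (heaptup->t_len bytes) and load it into a buffer for the insertion (function RelationGetBufferForTuple).
  4. With the new tuple and its target buffer in hand, call RelationPutHeapTuple to place the tuple in the selected buffer.
  5. If the relation needs WAL, write one XLog record to the transaction log.
  6. Unlock and release the buffer. The function returns void; if heap_prepare_insert made a copy of the tuple, the new location t_self is copied back into tup before the copy is freed.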
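What happens inside the page when the tuple is placed can be sketched as follows: the line pointer array grows upward from pd_lower while tuple data grows downward from pd_upper. This is a simplified illustration, not RelationPutHeapTuple itself; ToyPage, ToyItemId, and toy_page_add_item are hypothetical, and the header carries only the two offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLCKSZ 8192u
#define TOY_MAXALIGN(n) (((size_t) (n) + 7) & ~(size_t) 7)   /* assumes 8-byte alignment */

/* Hypothetical line pointer: where the tuple lives and how long it is. */
typedef struct ToyItemId
{
    uint16_t off;
    uint16_t len;
} ToyItemId;

typedef struct ToyPageHeader
{
    uint16_t pd_lower;      /* end of the line pointer array */
    uint16_t pd_upper;      /* start of the tuple data */
} ToyPageHeader;

typedef union ToyPage
{
    ToyPageHeader hdr;
    char          bytes[TOY_BLCKSZ];
} ToyPage;

void
toy_page_init(ToyPage *page)
{
    memset(page, 0, sizeof(*page));
    page->hdr.pd_lower = (uint16_t) sizeof(ToyPageHeader);
    page->hdr.pd_upper = (uint16_t) TOY_BLCKSZ;
}

/* Add one item; returns its 1-based offset number, or 0 if it does not fit. */
unsigned
toy_page_add_item(ToyPage *page, const void *data, uint16_t len)
{
    size_t    aligned = TOY_MAXALIGN(len);
    size_t    lower = page->hdr.pd_lower;
    size_t    upper = page->hdr.pd_upper;
    ToyItemId item;

    assert(lower <= upper);
    if (aligned + sizeof(ToyItemId) > upper - lower)
        return 0;                                   /* not enough free space */

    upper -= aligned;                               /* data grows downward */
    memcpy(page->bytes + upper, data, len);

    item.off = (uint16_t) upper;
    item.len = len;
    memcpy(page->bytes + lower, &item, sizeof(item));   /* pointers grow upward */

    page->hdr.pd_lower = (uint16_t) (lower + sizeof(item));
    page->hdr.pd_upper = (uint16_t) upper;
    return (unsigned) ((lower - sizeof(ToyPageHeader)) / sizeof(ToyItemId)) + 1;
}
```

The returned offset number plays the role of the offset half of t_self, which is how the caller later finds the tuple again.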

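2. Deleting a Tuple

PostgreSQL deletes a tuple by marking it as deleted rather than removing it physically. This suits MVCC, and Undo and Redo are fast because only the mark has to be reset. The disk space of tuples marked as deleted is reclaimed by running VACUUM.
Tuple deletion is implemented mainly by heap_delete: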

TM_Result
heap_delete(Relation relation, ItemPointer tid,
			CommandId cid, Snapshot crosscheck, bool wait,
			TM_FailureData *tmfd, bool changingPart)
{
	TM_Result	result;
	TransactionId xid = GetCurrentTransactionId();
	ItemId		lp;
	HeapTupleData tp;
	Page		page;
	BlockNumber block = 0;
	Buffer buffer = InvalidBuffer;
	Buffer		vmbuffer = InvalidBuffer;
	TransactionId new_xmax;
	uint16		new_infomask,
				new_infomask2;
	bool		have_tuple_lock = false;
	bool iscombo = false;
	bool		all_visible_cleared = false;
	HeapTuple	old_key_tuple = NULL;	/* replica identity of the tuple */
	bool		old_key_copied = false;

	Assert(ItemPointerIsValid(tid));

	if (IsInParallelMode()) {
		ereport(ERROR,
			(errcode(ERRCODE_INVALID_TRANSACTION_STATE), errmsg("cannot delete tuples during a parallel operation")));
	}

	block = ItemPointerGetBlockNumber(tid);
	buffer = ReadBuffer(relation, block);
	page = BufferGetPage(buffer);

	if (PageIsAllVisible(page)) {
		visibilitymap_pin(relation, block, &vmbuffer);
	}

	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

	lp = PageGetItemId(page, ItemPointerGetOffsetNumber(tid));
	Assert(ItemIdIsNormal(lp));

	tp.t_tableOid = RelationGetRelid(relation);
	tp.t_data = (HeapTupleHeader) PageGetItem(page, lp);
	tp.t_len = ItemIdGetLength(lp);
	tp.t_self = *tid;

l1:
	if (vmbuffer == InvalidBuffer && PageIsAllVisible(page))
	{
		LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
		visibilitymap_pin(relation, block, &vmbuffer);
		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
	}

	result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);

	if (result == TM_Invisible)
	{
		UnlockReleaseBuffer(buffer);
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("attempted to delete invisible tuple")));
	}
	else if (result == TM_BeingModified && wait)
	{
		TransactionId xwait;
		uint16		infomask;

		/* must copy state data before unlocking buffer */
		xwait = HeapTupleHeaderGetRawXmax(tp.t_data);
		infomask = tp.t_data->t_infomask;

		if (infomask & HEAP_XMAX_IS_MULTI)
		{
			bool		current_is_member = false;

			if (DoesMultiXactIdConflict((MultiXactId) xwait, infomask,
										LockTupleExclusive, &current_is_member))
			{
				LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

				if (!current_is_member) {
					heap_acquire_tuplock(relation,
							     &(tp.t_self),
							     LockTupleExclusive,
							     LockWaitBlock,
							     &have_tuple_lock);
				}

				/* wait for multixact */
				MultiXactIdWait((MultiXactId) xwait, MultiXactStatusUpdate, infomask,
								relation, &(tp.t_self), XLTW_Delete,
								NULL);
				LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

				if ((vmbuffer == InvalidBuffer && PageIsAllVisible(page)) ||
					xmax_infomask_changed(tp.t_data->t_infomask, infomask) ||
					!TransactionIdEquals(HeapTupleHeaderGetRawXmax(tp.t_data),
										 xwait))
					goto l1;
			}
		}
		else if (!TransactionIdIsCurrentTransactionId(xwait))
		{
			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
			heap_acquire_tuplock(relation, &(tp.t_self), LockTupleExclusive,
								 LockWaitBlock, &have_tuple_lock);
			XactLockTableWait(xwait, relation, &(tp.t_self), XLTW_Delete);
			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

			if ((vmbuffer == InvalidBuffer && PageIsAllVisible(page)) ||
				xmax_infomask_changed(tp.t_data->t_infomask, infomask) ||
				!TransactionIdEquals(HeapTupleHeaderGetRawXmax(tp.t_data),
									 xwait))
				goto l1;

			/* Otherwise check if it committed or aborted */
			UpdateXmaxHintBits(tp.t_data, buffer, xwait);
		}

		if ((tp.t_data->t_infomask & HEAP_XMAX_INVALID) ||
			HEAP_XMAX_IS_LOCKED_ONLY(tp.t_data->t_infomask) ||
			HeapTupleHeaderIsOnlyLocked(tp.t_data))
			result = TM_Ok;
		else if (!ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid))
			result = TM_Updated;
		else
			result = TM_Deleted;
	}

	if (crosscheck != InvalidSnapshot && result == TM_Ok)
	{
		/* Perform additional check for transaction-snapshot mode RI updates */
		if (!HeapTupleSatisfiesVisibility(&tp, crosscheck, buffer)) {
			result = TM_Updated;
		}
	}

	if (result != TM_Ok)
	{
		Assert(result == TM_SelfModified ||
			   result == TM_Updated ||
			   result == TM_Deleted ||
			   result == TM_BeingModified);
		Assert(!(tp.t_data->t_infomask & HEAP_XMAX_INVALID));
		Assert(result != TM_Updated ||
			   !ItemPointerEquals(&tp.t_self, &tp.t_data->t_ctid));
		tmfd->ctid = tp.t_data->t_ctid;
		tmfd->xmax = HeapTupleHeaderGetUpdateXid(tp.t_data);
		if (result == TM_SelfModified)
			tmfd->cmax = HeapTupleHeaderGetCmax(tp.t_data);
		else
			tmfd->cmax = InvalidCommandId;
		UnlockReleaseBuffer(buffer);
		if (have_tuple_lock) {
			UnlockTupleTuplock(relation, &(tp.t_self),
					   LockTupleExclusive);
		}
		if (vmbuffer != InvalidBuffer) {
			ReleaseBuffer(vmbuffer);
		}
		return result;
	}

	CheckForSerializableConflictIn(relation, tid, BufferGetBlockNumber(buffer));

	HeapTupleHeaderAdjustCmax(tp.t_data, &cid, &iscombo);
	
	old_key_tuple = ExtractReplicaIdentity(relation, &tp, true, &old_key_copied);

	MultiXactIdSetOldestMember();

	compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(tp.t_data),
							  tp.t_data->t_infomask, tp.t_data->t_infomask2,
							  xid, LockTupleExclusive, true,
							  &new_xmax, &new_infomask, &new_infomask2);

	START_CRIT_SECTION();

	PageSetPrunable(page, xid);

	if (PageIsAllVisible(page))
	{
		all_visible_cleared = true;
		PageClearAllVisible(page);
		visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
							vmbuffer, VISIBILITYMAP_VALID_BITS);
	}

	/* store transaction information of xact deleting the tuple */
	tp.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
	tp.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
	tp.t_data->t_infomask |= new_infomask;
	tp.t_data->t_infomask2 |= new_infomask2;
	HeapTupleHeaderClearHotUpdated(tp.t_data);
	HeapTupleHeaderSetXmax(tp.t_data, new_xmax);
	HeapTupleHeaderSetCmax(tp.t_data, cid, iscombo);
	/* Make sure there is no forward chain link in t_ctid */
	tp.t_data->t_ctid = tp.t_self;

	/* Signal that this is actually a move into another partition */
	if (changingPart) {
		HeapTupleHeaderSetMovedPartitions(tp.t_data);
	}

	MarkBufferDirty(buffer);

	if (RelationNeedsWAL(relation))
	{
		xl_heap_delete xlrec;
		xl_heap_header xlhdr;
		XLogRecPtr recptr = 0;

		if (RelationIsAccessibleInLogicalDecoding(relation)) {
			log_heap_new_cid(relation, &tp);
		}

		xlrec.flags = 0;
		if (all_visible_cleared) {
			xlrec.flags |= XLH_DELETE_ALL_VISIBLE_CLEARED;
		}
		if (changingPart) {
			xlrec.flags |= XLH_DELETE_IS_PARTITION_MOVE;
		}
		xlrec.infobits_set = compute_infobits(tp.t_data->t_infomask,
											  tp.t_data->t_infomask2);
		xlrec.offnum = ItemPointerGetOffsetNumber(&tp.t_self);
		xlrec.xmax = new_xmax;

		if (old_key_tuple != NULL)
		{
			if (relation->rd_rel->relreplident == REPLICA_IDENTITY_FULL)
				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_TUPLE;
			else
				xlrec.flags |= XLH_DELETE_CONTAINS_OLD_KEY;
		}

		XLogBeginInsert();
		XLogRegisterData((char *) &xlrec, SizeOfHeapDelete);

		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);

		if (old_key_tuple != NULL)
		{
			xlhdr.t_infomask2 = old_key_tuple->t_data->t_infomask2;
			xlhdr.t_infomask = old_key_tuple->t_data->t_infomask;
			xlhdr.t_hoff = old_key_tuple->t_data->t_hoff;

			XLogRegisterData((char *) &xlhdr, SizeOfHeapHeader);
			XLogRegisterData((char *) old_key_tuple->t_data
							 + SizeofHeapTupleHeader,
							 old_key_tuple->t_len
							 - SizeofHeapTupleHeader);
		}

		/* filtering by origin on a row level is much more efficient */
		XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);

		recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_DELETE);

		PageSetLSN(page, recptr);
	}

	END_CRIT_SECTION();

	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

	if (vmbuffer != InvalidBuffer) {
		ReleaseBuffer(vmbuffer);
	}

	if (relation->rd_rel->relkind != RELKIND_RELATION &&
		relation->rd_rel->relkind != RELKIND_MATVIEW)
	{
		/* toast table entries should never be recursively toasted */
		Assert(!HeapTupleHasExternal(&tp));
	}
	else if (HeapTupleHasExternal(&tp)) {
		heap_toast_delete(relation, &tp, false);
	}
	
	CacheInvalidateHeapTuple(relation, &tp, NULL);

	/* Now we can release the buffer */
	ReleaseBuffer(buffer);

	/*
	 * Release the lmgr tuple lock, if we had it.
	 */
	if (have_tuple_lock) {
		UnlockTupleTuplock(relation, &(tp.t_self), LockTupleExclusive);
	}
	pgstat_count_heap_delete(relation);

	if (old_key_tuple != NULL && old_key_copied) {
		heap_freetuple(old_key_tuple);
	}
	
	return TM_Ok;
}

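The main flow is as follows:

  1. Use the tid of the tuple to be deleted to obtain the corresponding buffer, and take an exclusive lock on it.
  2. Call HeapTupleSatisfiesUpdate to check the tuple's visibility to the current transaction. If the tuple is not visible (HeapTupleSatisfiesUpdate returns TM_Invisible), unlock and release the buffer and raise an error.
  3. If the tuple has already been modified by the current transaction (TM_SelfModified) or has already been updated or deleted (TM_Updated or TM_Deleted), record the tuple's ctid, which points to the physical location of the newer version, together with its xmax in tmfd. Then unlock and release the buffer and return that result code.
  4. If the tuple is being modified by another transaction (HeapTupleSatisfiesUpdate returns TM_BeingModified) and wait is true, wait for that transaction to finish and check again. If the tuple can then be modified (the result becomes TM_Ok), heap_delete continues.
  5. Enter the critical section and set t_xmax and t_cmax from the current transaction ID and command ID. At this point the tuple is marked as deleted.
  6. Write the XLog record.
  7. If the tuple has out-of-line data, that is, data stored through TOAST, delete the corresponding data in the TOAST table as well.
  8. Call CacheInvalidateHeapTuple; for a system catalog tuple this sends an invalidation message.
  9. Release the buffer and any tuple lock, and update the deletion statistics. The code shown here does not update the FSM; the freed space is recorded later, when VACUUM reclaims the tuple.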
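The core idea of mark-deletion, stamping the deleter's transaction ID into the tuple header while leaving the data in place, can be shown with a toy header. This is a sketch only: ToyTupleHeader, ToyResult, and toy_mark_deleted are hypothetical, and the real function additionally handles locking, waiting, infomask bits, and WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

#define TOY_INVALID_XID 0u

/* Hypothetical tuple header: inserter, deleter, and a payload. */
typedef struct ToyTupleHeader
{
    ToyXid t_xmin;      /* transaction that inserted the tuple */
    ToyXid t_xmax;      /* transaction that deleted it, or invalid */
    int    payload;     /* stands in for the user data */
} ToyTupleHeader;

typedef enum ToyResult
{
    TOY_OK,             /* the tuple was marked by this call */
    TOY_DELETED         /* someone had already marked it */
} ToyResult;

/* Mark the tuple as deleted by xid without touching its data. */
ToyResult
toy_mark_deleted(ToyTupleHeader *tup, ToyXid xid)
{
    assert(xid != TOY_INVALID_XID);

    if (tup->t_xmax != TOY_INVALID_XID)
        return TOY_DELETED;

    tup->t_xmax = xid;
    return TOY_OK;
}

/* A tuple with no deleter recorded is still live in this toy model. */
bool
toy_is_marked_deleted(const ToyTupleHeader *tup)
{
    return tup->t_xmax != TOY_INVALID_XID;
}
```

Since the payload is never overwritten, older snapshots can still read the tuple until the space is reclaimed.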