golang的垃圾回收算法之四调度控制

最新推荐文章于 2024-08-22 23:42:12 发布

fpcc

最新推荐文章于 2024-08-22 23:42:12 发布

阅读量302

点赞数

CC 4.0 BY-SA版权

分类专栏： Golang 文章标签： golang 算法

本文链接：https://blog.youkuaiyun.com/fpcc/article/details/125991687

Golang 专栏收录该内容

10 篇文章

订阅专栏

本文详细解读了Go语言中调度控制的核心数据结构，如writeBarrier、gcControllerState、work和trace，以及它们在mgc.go、mgcwork.go、trace.go和proc.go中的源码分析。通过实例展示了调度策略和控制逻辑，涉及GC的不同模式（背景、强制、停止世界）和关键步骤。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

一、主要的调度控制类

在了解了基本流程和调度的策略方式后，就要看具体的调度控制了，首先要看的就是调度控制的几个基本类：

// The compiler knows about this variable.
// If you change it, you must change the compiler too.
var writeBarrier struct {
	enabled bool    // compiler emits a check of this before calling write barrier
	pad     [3]byte // compiler uses 32-bit load for "enabled" field
	needed  bool    // whether we need a write barrier for current GC phase
	cgo     bool    // whether we need a write barrier for a cgo check
	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
}

type gcControllerState struct {
	// scanWork is the total scan work performed this cycle. This
	// is updated atomically during the cycle. Updates occur in
	// bounded batches, since it is both written and read
	// throughout the cycle. At the end of the cycle, this is how
	// much of the retained heap is scannable.
	//
	// Currently this is the bytes of heap scanned. For most uses,
	// this is an opaque unit of work, but for estimation the
	// definition is important.
	scanWork int64

	// bgScanCredit is the scan work credit accumulated by the
	// concurrent background scan. This credit is accumulated by
	// the background scan and stolen by mutator assists. This is
	// updated atomically. Updates occur in bounded batches, since
	// it is both written and read throughout the cycle.
	bgScanCredit int64

	// assistTime is the nanoseconds spent in mutator assists
	// during this cycle. This is updated atomically. Updates
	// occur in bounded batches, since it is both written and read
	// throughout the cycle.
	assistTime int64

	// dedicatedMarkTime is the nanoseconds spent in dedicated
	// mark workers during this cycle. This is updated atomically
	// at the end of the concurrent mark phase.
	dedicatedMarkTime int64

	// fractionalMarkTime is the nanoseconds spent in the
	// fractional mark worker during this cycle. This is updated
	// atomically throughout the cycle and will be up-to-date if
	// the fractional mark worker is not currently running.
	fractionalMarkTime int64

	// idleMarkTime is the nanoseconds spent in idle marking
	// during this cycle. This is updated atomically throughout
	// the cycle.
	idleMarkTime int64

	// markStartTime is the absolute start time in nanoseconds
	// that assists and background mark workers started.
	markStartTime int64

	// dedicatedMarkWorkersNeeded is the number of dedicated mark
	// workers that need to be started. This is computed at the
	// beginning of each cycle and decremented atomically as
	// dedicated mark workers get started.
	dedicatedMarkWorkersNeeded int64

	// assistWorkPerByte is the ratio of scan work to allocated
	// bytes that should be performed by mutator assists. This is
	// computed at the beginning of each cycle and updated every
	// time heap_scan is updated.
	assistWorkPerByte float64

	// assistBytesPerWork is 1/assistWorkPerByte.
	assistBytesPerWork float64

	// fractionalUtilizationGoal is the fraction of wall clock
	// time that should be spent in the fractional mark worker.
	// For example, if the overall mark utilization goal is 25%
	// and GOMAXPROCS is 6, one P will be a dedicated mark worker
	// and this will be set to 0.5 so that 50% of the time some P
	// is in a fractional mark worker. This is computed at the
	// beginning of each cycle.
	fractionalUtilizationGoal float64

	// triggerRatio is the heap growth ratio at which the garbage
	// collection cycle should start. E.g., if this is 0.6, then
	// GC should start when the live heap has reached 1.6 times
	// the heap size marked by the previous cycle. This should be
	// ≤ GOGC/100 so the trigger heap size is less than the goal
	// heap size. This is updated at the end of of each cycle.
	triggerRatio float64

	_ [sys.CacheLineSize]byte

	// fractionalMarkWorkersNeeded is the number of fractional
	// mark workers that need to be started. This is either 0 or
	// 1. This is potentially updated atomically at every
	// scheduling point (hence it gets its own cache line).
	fractionalMarkWorkersNeeded int64

	_ [sys.CacheLineSize]byte
}

var work struct {
	full  uint64                   // lock-free list of full blocks workbuf
	empty uint64                   // lock-free list of empty blocks workbuf
	pad0  [sys.CacheLineSize]uint8 // prevents false-sharing between full/empty and nproc/nwait

	// bytesMarked is the number of bytes marked this cycle. This
	// includes bytes blackened in scanned objects, noscan objects
	// that go straight to black, and permagrey objects scanned by
	// markroot during the concurrent scan phase. This is updated
	// atomically during the cycle. Updates may be batched
	// arbitrarily, since the value is only read at the end of the
	// cycle.
	//
	// Because of benign races during marking, this number may not
	// be the exact number of marked bytes, but it should be very
	// close.
	//
	// Put this field here because it needs 64-bit atomic access
	// (and thus 8-byte alignment even on 32-bit architectures).
	bytesMarked uint64

	markrootNext uint32 // next markroot job
	markrootJobs uint32 // number of markroot jobs

	nproc   uint32
	tstart  int64
	nwait   uint32
	ndone   uint32
	alldone note

	// helperDrainBlock indicates that GC mark termination helpers
	// should pass gcDrainBlock to gcDrain to block in the
	// getfull() barrier. Otherwise, they should pass gcDrainNoBlock.
	//
	// TODO: This is a temporary fallback to support
	// debug.gcrescanstacks > 0 and to work around some known
	// races. Remove this when we remove the debug option and fix
	// the races.
	helperDrainBlock bool

	// Number of roots of various root types. Set by gcMarkRootPrepare.
	nFlushCacheRoots                                             int
	nDataRoots, nBSSRoots, nSpanRoots, nStackRoots, nRescanRoots int

	// markrootDone indicates that roots have been marked at least
	// once during the current GC cycle. This is checked by root
	// marking operations that have to happen only during the
	// first root marking pass, whether that's during the
	// concurrent mark phase in current GC or mark termination in
	// STW GC.
	markrootDone bool

	// Each type of GC state transition is protected by a lock.
	// Since multiple threads can simultaneously detect the state
	// transition condition, any thread that detects a transition
	// condition must acquire the appropriate transition lock,
	// re-check the transition condition and return if it no
	// longer holds or perform the transition if it does.
	// Likewise, any transition must invalidate the transition
	// condition before releasing the lock. This ensures that each
	// transition is performed by exactly one thread and threads
	// that need the transition to happen block until it has
	// happened.
	//
	// startSema protects the transition from "off" to mark or
	// mark termination.
	startSema uint32
	// markDoneSema protects transitions from mark 1 to mark 2 and
	// from mark 2 to mark termination.
	markDoneSema uint32

	bgMarkReady note   // signal background mark worker has started
	bgMarkDone  uint32 // cas to 1 when at a background mark completion point
	// Background mark completion signaling

	// mode is the concurrency mode of the current GC cycle.
	mode gcMode

	// totaltime is the CPU nanoseconds spent in GC since the
	// program started if debug.gctrace > 0.
	totaltime int64

	// initialHeapLive is the value of memstats.heap_live at the
	// beginning of this GC cycle.
	initialHeapLive uint64

	// assistQueue is a queue of assists that are blocked because
	// there was neither enough credit to steal or enough work to
	// do.
	assistQueue struct {
		lock       mutex
		head, tail guintptr
	}

	// rescan is a list of G's that need to be rescanned during
	// mark termination. A G adds itself to this list when it
	// first invalidates its stack scan.
	rescan struct {
		lock mutex
		list []guintptr
	}

	// Timing/utilization stats for this cycle.
	stwprocs, maxprocs                 int32
	tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start

	pauseNS    int64 // total STW time this cycle
	pauseStart int64 // nanotime() of last STW

	// debug.gctrace heap sizes for this cycle.
	heap0, heap1, heap2, heapGoal uint64
}

// trace is global tracing context.
var trace struct {
	lock          mutex       // protects the following members
	lockOwner     *g          // to avoid deadlocks during recursive lock locks
	enabled       bool        // when set runtime traces events
	shutdown      bool        // set when we are waiting for trace reader to finish after setting enabled to false
	headerWritten bool        // whether ReadTrace has emitted trace header
	footerWritten bool        // whether ReadTrace has emitted trace footer
	shutdownSema  uint32      // used to wait for ReadTrace completion
	seqStart      uint64      // sequence number when tracing was started
	ticksStart    int64       // cputicks when tracing was started
	ticksEnd      int64       // cputicks when tracing was stopped
	timeStart     int64       // nanotime when tracing was started
	timeEnd       int64       // nanotime when tracing was stopped
	seqGC         uint64      // GC start/done sequencer
	reading       traceBufPtr // buffer currently handed off to user
	empty         traceBufPtr // stack of empty buffers
	fullHead      traceBufPtr // queue of full buffers
	fullTail      traceBufPtr
	reader        guintptr        // goroutine that called ReadTrace, or nil
	stackTab      traceStackTable // maps stack traces to unique ids

	// Dictionary for traceEvString.
	//
	// Currently this is used only at trace setup and for
	// func/file:line info after tracing session, so we assume
	// single-threaded access.
	strings   map[string]uint64
	stringSeq uint64

	// markWorkerLabels maps gcMarkWorkerMode to string ID.
	markWorkerLabels [len(gcMarkWorkerModeStrings)]uint64

	bufLock mutex       // protects buf
	buf     traceBufPtr // global trace buffer, used when running without a p
}

// It's important that any use of gcWork during the mark phase prevent
// the garbage collector from transitioning to mark termination since
// gcWork may locally hold GC work buffers. This can be done by
// disabling preemption (systemstack or acquirem).
type gcWork struct {
	// wbuf1 and wbuf2 are the primary and secondary work buffers.
	//
	// This can be thought of as a stack of both work buffers'
	// pointers concatenated. When we pop the last pointer, we
	// shift the stack up by one work buffer by bringing in a new
	// full buffer and discarding an empty one. When we fill both
	// buffers, we shift the stack down by one work buffer by
	// bringing in a new empty buffer and discarding a full one.
	// This way we have one buffer's worth of hysteresis, which
	// amortizes the cost of getting or putting a work buffer over
	// at least one buffer of work and reduces contention on the
	// global work lists.
	//
	// wbuf1 is always the buffer we're currently pushing to and
	// popping from and wbuf2 is the buffer that will be discarded
	// next.
	//
	// Invariant: Both wbuf1 and wbuf2 are nil or neither are.
	wbuf1, wbuf2 wbufptr

	// Bytes marked (blackened) on this gcWork. This is aggregated
	// into work.bytesMarked by dispose.
	bytesMarked uint64

	// Scan work performed on this gcWork. This is aggregated into
	// gcController by dispose and may also be flushed by callers.
	scanWork int64
}

type sweepdata struct {
	lock    mutex
	g       *g
	parked  bool
	started bool

	nbgsweep    uint32
	npausesweep uint32

	// pacertracegen is the sweepgen at which the last pacer trace
	// "sweep finished" message was printed.
	pacertracegen uint32
}

其实基础的数据结构除这上面几个还有几个，但多是从这几个衍生出来的，这里不再赘述。第一个writeBarrier 就是在GC中常用到的写屏障，最后一个sweepdata 表示清除的数据构成，这里面的那个g，就是GPM中那个G。gcControllerState 这个就更好理解了，看里面的成员变量的定义就知道是对控制GC过程中的状态显示和控制。gcWork正如注释中所说，它是一个GC状态控制和缓冲数据的类，有点类似于Context的上下文的意思,里面最主要是开头的两个缓冲区，类似于一个乒乓交换，和主缓冲区一起工作。一般一个P绑定一个gcWork。其实这其中就是一个任务队列，用来管理灰色对象。work和trace基本都是数据和状态记录（包括锁等），用来保存在GC过程当前的具体场景。

二、源码分析

调度控制主要分布在mgc.go,mgcwork.go,trace.go，proc.go几个文件中，其主要的控制逻辑如下：

//mgc.go
func gcStart(mode gcMode, forceTrigger bool) {
	// Since this is called from malloc and malloc is called in
	// the guts of a number of libraries that might be holding
	// locks, don't attempt to start GC in non-preemptible or
	// potentially unstable situations.
	mp := acquirem()
	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
		releasem(mp)
		return
	}
	releasem(mp)
	mp = nil

	// Pick up the remaining unswept/not being swept spans concurrently
	//
	// This shouldn't happen if we're being invoked in background
	// mode since proportional sweep should have just finished
	// sweeping everything, but rounding errors, etc, may leave a
	// few spans unswept. In forced mode, this is necessary since
	// GC can be forced at any point in the sweeping cycle.
	//
	// We check the transition condition continuously here in case
	// this G gets delayed in to the next GC cycle.
	for (mode != gcBackgroundMode || gcShouldStart(forceTrigger)) && gosweepone() != ^uintptr(0) {
		sweep.nbgsweep++
	}

	// Perform GC initialization and the sweep termination
	// transition.
	//
	// If this is a forced GC, don't acquire the transition lock
	// or re-check the transition condition because we
	// specifically *don't* want to share the transition with
	// another thread.
	useStartSema := mode == gcBackgroundMode
	if useStartSema {
		semacquire(&work.startSema, 0)
		// Re-check transition condition under transition lock.
		if !gcShouldStart(forceTrigger) {
			semrelease(&work.startSema)
			return
		}
	}

	// For stats, check if this GC was forced by the user.
	forced := mode != gcBackgroundMode

	// In gcstoptheworld debug mode, upgrade the mode accordingly.
	// We do this after re-checking the transition condition so
	// that multiple goroutines that detect the heap trigger don't
	// start multiple STW GCs.
	if mode == gcBackgroundMode {
		if debug.gcstoptheworld == 1 {
			mode = gcForceMode
		} else if debug.gcstoptheworld == 2 {
			mode = gcForceBlockMode
		}
	}

	// Ok, we're doing it!  Stop everybody else
	semacquire(&worldsema, 0)

	if trace.enabled {
		traceGCStart()
	}

	if mode == gcBackgroundMode {
		gcBgMarkStartWorkers()
	}

	gcResetMarkState()

	now := nanotime()
	work.stwprocs, work.maxprocs = gcprocs(), gomaxprocs
	work.tSweepTerm = now
	work.heap0 = memstats.heap_live
	work.pauseNS = 0
	work.mode = mode

	work.pauseStart = now
	systemstack(stopTheWorldWithSema)
	// Finish sweep before we start concurrent scan.
	systemstack(func() {
		finishsweep_m()
	})
	// clearpools before we start the GC. If we wait they memory will not be
	// reclaimed until the next GC cycle.
	clearpools()

	if mode == gcBackgroundMode { // Do as much work concurrently as possible
		gcController.startCycle()
		work.heapGoal = memstats.next_gc

		// Enter concurrent mark phase and enable
		// write barriers.
		//
		// Because the world is stopped, all Ps will
		// observe that write barriers are enabled by
		// the time we start the world and begin
		// scanning.
		//
		// It's necessary to enable write barriers
		// during the scan phase for several reasons:
		//
		// They must be enabled for writes to higher
		// stack frames before we scan stacks and
		// install stack barriers because this is how
		// we track writes to inactive stack frames.
		// (Alternatively, we could not install stack
		// barriers over frame boundaries with
		// up-pointers).
		//
		// They must be enabled before assists are
		// enabled because they must be enabled before
		// any non-leaf heap objects are marked. Since
		// allocations are blocked until assists can
		// happen, we want enable assists as early as
		// possible.
		setGCPhase(_GCmark)

		gcBgMarkPrepare() // Must happen before assist enable.
		gcMarkRootPrepare()

		// Mark all active tinyalloc blocks. Since we're
		// allocating from these, they need to be black like
		// other allocations. The alternative is to blacken
		// the tiny block on every allocation from it, which
		// would slow down the tiny allocator.
		gcMarkTinyAllocs()

		// At this point all Ps have enabled the write
		// barrier, thus maintaining the no white to
		// black invariant. Enable mutator assists to
		// put back-pressure on fast allocating
		// mutators.
		atomic.Store(&gcBlackenEnabled, 1)

		// Assists and workers can start the moment we start
		// the world.
		gcController.markStartTime = now

		// Concurrent mark.
		systemstack(startTheWorldWithSema)
		now = nanotime()
		work.pauseNS += now - work.pauseStart
		work.tMark = now
	} else {
		t := nanotime()
		work.tMark, work.tMarkTerm = t, t
		work.heapGoal = work.heap0

		if forced {
			memstats.numforcedgc++
		}

		// Perform mark termination. This will restart the world.
		gcMarkTermination()
	}

	if useStartSema {
		semrelease(&work.startSema)
	}
}

其实在这个gcStart函数里，基本就把GC控制的调度都给创建完毕了。正如前面分析，这里把调度的模式分成了三种情况：

// gcMode indicates how concurrent a GC cycle should be.
type gcMode int

const (
	gcBackgroundMode gcMode = iota // concurrent GC and sweep
	gcForceMode                    // stop-the-world GC now, concurrent sweep
	gcForceBlockMode               // stop-the-world GC now and STW sweep (forced by user)
)

明白了这三种模式就可以看gcStart的调度方式了。判断是否是后台模式，然后启动一个后台gcBgMarkStartWorkers()函数，它最终会启动gcDrain来进行标记行动。在此模式的最后，会在条件符合时启动一个前面提到的startCycle记GC循环起来，并在setGCPhase函数中根据GC阶段启用写屏障。同时，跟踪也会根据条件启动，然后就开始各种GC前的准备和相关数据状态的更新，而此时更具体的代码分析，在后面的GC流程中再看。
在这些文件中和相关的类中的函数，都是用来调度的，如：

func (c *gcControllerState) revise() {
	// Compute the expected scan work remaining.
	//
	// Note that we currently count allocations during GC as both
	// scannable heap (heap_scan) and scan work completed
	// (scanWork), so this difference won't be changed by
	// allocations during GC.
	//
	// This particular estimate is a strict upper bound on the
	// possible remaining scan work for the current heap.
	// You might consider dividing this by 2 (or by
	// (100+GOGC)/100) to counter this over-estimation, but
	// benchmarks show that this has almost no effect on mean
	// mutator utilization, heap size, or assist time and it
	// introduces the danger of under-estimating and letting the
	// mutator outpace the garbage collector.
	scanWorkExpected := int64(memstats.heap_scan) - c.scanWork
	if scanWorkExpected < 1000 {
		// We set a somewhat arbitrary lower bound on
		// remaining scan work since if we aim a little high,
		// we can miss by a little.
		//
		// We *do* need to enforce that this is at least 1,
		// since marking is racy and double-scanning objects
		// may legitimately make the expected scan work
		// negative.
		scanWorkExpected = 1000
	}

	// Compute the heap distance remaining.
	heapDistance := int64(memstats.next_gc) - int64(memstats.heap_live)
	if heapDistance <= 0 {
		// This shouldn't happen, but if it does, avoid
		// dividing by zero or setting the assist negative.
		heapDistance = 1
	}

	// Compute the mutator assist ratio so by the time the mutator
	// allocates the remaining heap bytes up to next_gc, it will
	// have done (or stolen) the remaining amount of scan work.
	c.assistWorkPerByte = float64(scanWorkExpected) / float64(heapDistance)
	c.assistBytesPerWork = float64(heapDistance) / float64(scanWorkExpected)
}

其它的函数可以在相关的文件中查看，只要掌握了这些基础的动作函数，在后面的GC过程分析中，才会更有的放矢。当然，直接看GC过程再回过头来翻看也不是不可以，不过经常会出现看了前面忘记后面的尴尬的情况。根据自己的实际情况来决定如何看代码吧。

三、总结

其实所有的工作在一开始之前，只有工作的人。当工作进一步复杂精细化时，可能这些个工作的人就抽时间代为管理一下。但随着整体的工作愈发的复杂和精细化，专门负责管理工作的人就出现了。代码也是如此，有兴趣看看Linux0.1的代码，好多代码直接就开始干活，哪有什么这种那种的策略、管理？可再看看现在的Linux内核的代码到处都是各种算法、任务、策略、平台、管理、调度等等。再看看Linux基金会，慢慢就会明白，没有这些是真的不行。但掌握好一个好的度，这就是难点和重点了。
学习别人的代码，重点就是如何平衡管理的代码和工作的代码，既要知道怎么干（工作）又要知道如何干（管理）。