一、主要的调度控制类
在了解了基本流程和调度的策略方式后,就要看具体的调度控制了,首先要看的就是调度控制的几个基本类:
// The compiler knows about this variable.
// If you change it, you must change the compiler too.
var writeBarrier struct {
enabled bool // compiler emits a check of this before calling write barrier
pad [3]byte // compiler uses 32-bit load for "enabled" field
needed bool // whether we need a write barrier for current GC phase
cgo bool // whether we need a write barrier for a cgo check
alignme uint64 // guarantee alignment so that compiler can use a 32 or 64-bit load
}
type gcControllerState struct {
// scanWork is the total scan work performed this cycle. This
// is updated atomically during the cycle. Updates occur in
// bounded batches, since it is both written and read
// throughout the cycle. At the end of the cycle, this is how
// much of the retained heap is scannable.
//
// Currently this is the bytes of heap scanned. For most uses,
// this is an opaque unit of work, but for estimation the
// definition is important.
scanWork int64
// bgScanCredit is the scan work credit accumulated by the
// concurrent background scan. This credit is accumulated by
// the background scan and stolen by mutator assists. This is
// updated atomically. Updates occur in bounded batches, since
// it is both written and read throughout the cycle.
bgScanCredit int64
// assistTime is the nanoseconds spent in mutator assists
// during this cycle. This is updated atomically. Updates
// occur in bounded batches, since it is both written and read
// throughout the cycle.
assistTime int64
// dedicatedMarkTime is the nanoseconds spent in dedicated
// mark workers during this cycle. This is updated atomically
// at the end of the concurrent mark phase.
dedicatedMarkTime int64
// fractionalMarkTime is the nanoseconds spent in the
// fractional mark worker during this cycle. This is updated
// atomically throughout the cycle and will be up-to-date if
// the fractional mark worker is not currently running.
fractionalMarkTime int64
// idleMarkTime is the nanoseconds spent in idle marking
// during this cycle. This is updated atomically throughout
// the cycle.
idleMarkTime int64
// markStartTime is the absolute start time in nanoseconds
// that assists and background mark workers started.
markStartTime int64
// dedicatedMarkWorkersNeeded is the number of dedicated mark
// workers that need to be started. This is computed at the
// beginning of each cycle and decremented atomically as
// dedicated mark workers get started.
dedicatedMarkWorkersNeeded int64
// assistWorkPerByte is the ratio of scan work to allocated
// bytes that should be performed by mutator assists. This is
// computed at the beginning of each cycle and updated every
// time heap_scan is updated.
assistWorkPerByte float64
// assistBytesPerWork is 1/assistWorkPerByte.
assistBytesPerWork float64
// fractionalUtilizationGoal is the fraction of wall clock
// time that should be spent in the fractional mark worker.
// For example, if the overall mark utilization goal is 25%
// and GOMAXPROCS is 6, one P will be a dedicated mark worker
// and this will be set to 0.5 so that 50% of the time some P
// is in a fractional mark worker. This is computed at the
// beginning of each cycle.
fractionalUtilizationGoal float64
// triggerRatio is the heap growth ratio at which the garbage
// collection cycle should start. E.g., if this is 0.6, then
// GC should start when the live heap has reached 1.6 times
// the heap size marked by the previous cycle. This should be
// ≤ GOGC/100 so the trigger heap size is less than the goal
// heap size. This is updated at the end of of each cycle.
triggerRatio float64
_ [sys.CacheLineSize]byte
// fractionalMarkWorkersNeeded is the number of fractional
// mark workers that need to be started. This is either 0 or
// 1. This is potentially updated atomically at every
// scheduling point (hence it gets its own cache line).
fractionalMarkWorkersNeeded int64
_ [sys.CacheLineSize]byte
}
var work struct {
full uint64 // lock-free list of full blocks workbuf
empty uint64 // lock-free list of empty blocks workbuf
pad0 [sys.CacheLineSize]uint8 // prevents false-sharing between full/empty and nproc/nwait
// bytesMarked is the number of bytes marked this cycle. This
// includes bytes blackened in scanned objects, noscan objects
// that go straight to black, and permagrey objects scanned by
// markroot during the concurrent scan phase. This is updated
// atomically during the cycle. Updates may be batched
// arbitrarily, since the value is only read at the end of the
// cycle.
//
// Because of benign races during marking, this number may not
// be the exact number of marked bytes, but it should be very
// close.
//
// Put this field here because it needs 64-bit atomic access
// (and thus 8-byte alignment even on 32-bit architectures).
bytesMarked uint64
markrootNext uint32 // next markroot job
markrootJobs uint32 // number of markroot jobs
nproc uint32
tstart int64
nwait uint32
ndone uint32
alldone note
// helperDrainBlock indicates that GC mark termination helpers
// should pass gcDrainBlock to gcDrain to block in the
// getfull() barrier. Otherwise, they should pass gcDrainNoBlock.
//
// TODO: This is a temporary fallback to support
// debug.gcrescanstacks > 0 and to work around some known
// races. Remove this when we remove the debug option and fix
// the races.
helperDrainBlock bool
// Number of roots of various root types. Set by gcMarkRootPrepare.
nFlushCacheRoots int
nDataRoots, nBSSRoots, nSpanRoots, nStackRoots, nRescanRoots int
// markrootDone indicates that roots have been marked at least
// once during the current GC cycle. This is checked by root
// marking operations that have to happen only during the
// first root marking pass, whether that's during the
// concurrent mark phase in current GC or mark termination in
// STW GC.
markrootDone bool
// Each type of GC state transition is protected by a lock.
// Since multiple threads can simultaneously detect the state
// transition condition, any thread that detects a transition
// condition must acquire the appropriate transition lock,
// re-check the transition condition and return if it no
// longer holds or perform the transition if it does.
// Likewise, any transition must invalidate the transition
// condition before releasing the lock. This ensures that each
// transition is performed by exactly one thread and threads
// that need the transition to happen block until it has
// happened.
//
// startSema protects the transition from "off" to mark or
// mark termination.
startSema uint32
// markDoneSema protects transitions from mark 1 to mark 2 and
// from mark 2 to mark termination.
markDoneSema uint32
bgMarkReady note // signal background mark worker has started
bgMarkDone uint32 // cas to 1 when at a background mark completion point
// Background mark completion signaling
// mode is the concurrency mode of the current GC cycle.
mode gcMode
// totaltime is the CPU nanoseconds spent in GC since the
// program started if debug.gctrace > 0.
totaltime int64
// initialHeapLive is the value of memstats.heap_live at the
// beginning of this GC cycle.
initialHeapLive uint64
// assistQueue is a queue of assists that are blocked because
// there was neither enough credit to steal or enough work to
// do.
assistQueue struct {
lock mutex
head, tail guintptr
}
// rescan is a list of G's that need to be rescanned during
// mark termination. A G adds itself to this list when it
// first invalidates its stack scan.
rescan struct {
lock mutex
list []guintptr
}
// Timing/utilization stats for this cycle.
stwprocs, maxprocs int32
tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start
pauseNS int64 // total STW time this cycle
pauseStart int64 // nanotime() of last STW
// debug.gctrace heap sizes for this cycle.
heap0, heap1, heap2, heapGoal uint64
}
// trace is global tracing context.
var trace struct {
lock mutex // protects the following members
lockOwner *g // to avoid deadlocks during recursive lock locks
enabled bool // when set runtime traces events
shutdown bool // set when we are waiting for trace reader to finish after setting enabled to false
headerWritten bool // whether ReadTrace has emitted trace header
footerWritten bool // whether ReadTrace has emitted trace footer
shutdownSema uint32 // used to wait for ReadTrace completion
seqStart uint64 // sequence number when tracing was started
ticksStart int64 // cputicks when tracing was started
ticksEnd int64 // cputicks when tracing was stopped
timeStart int64 // nanotime when tracing was started
timeEnd int64 // nanotime when tracing was stopped
seqGC uint64 // GC start/done sequencer
reading traceBufPtr // buffer currently handed off to user
empty traceBufPtr // stack of empty buffers
fullHead traceBufPtr // queue of full buffers
fullTail traceBufPtr
reader guintptr // goroutine that called ReadTrace, or nil
stackTab traceStackTable // maps stack traces to unique ids
// Dictionary for traceEvString.
//
// Currently this is used only at trace setup and for
// func/file:line info after tracing session, so we assume
// single-threaded access.
strings map[string]uint64
stringSeq uint64
// markWorkerLabels maps gcMarkWorkerMode to string ID.
markWorkerLabels [len(gcMarkWorkerModeStrings)]uint64
bufLock mutex // protects buf
buf traceBufPtr // global trace buffer, used when running without a p
}
// It's important that any use of gcWork during the mark phase prevent
// the garbage collector from transitioning to mark termination since
// gcWork may locally hold GC work buffers. This can be done by
// disabling preemption (systemstack or acquirem).
type gcWork struct {
// wbuf1 and wbuf2 are the primary and secondary work buffers.
//
// This can be thought of as a stack of both work buffers'
// pointers concatenated. When we pop the last pointer, we
// shift the stack up by one work buffer by bringing in a new
// full buffer and discarding an empty one. When we fill both
// buffers, we shift the stack down by one work buffer by
// bringing in a new empty buffer and discarding a full one.
// This way we have one buffer's worth of hysteresis, which
// amortizes the cost of getting or putting a work buffer over
// at least one buffer of work and reduces contention on the
// global work lists.
//
// wbuf1 is always the buffer we're currently pushing to and
// popping from and wbuf2 is the buffer that will be discarded
// next.
//
// Invariant: Both wbuf1 and wbuf2 are nil or neither are.
wbuf1, wbuf2 wbufptr
// Bytes marked (blackened) on this gcWork. This is aggregated
// into work.bytesMarked by dispose.
bytesMarked uint64
// Scan work performed on this gcWork. This is aggregated into
// gcController by dispose and may also be flushed by callers.
scanWork int64
}
type sweepdata struct {
lock mutex
g *g
parked bool
started bool
nbgsweep uint32
npausesweep uint32
// pacertracegen is the sweepgen at which the last pacer trace
// "sweep finished" message was printed.
pacertracegen uint32
}
其实基础的数据结构除这上面几个还有几个,但多是从这几个衍生出来的,这里不再赘述。第一个writeBarrier 就是在GC中常用到的写屏障,最后一个sweepdata 表示清除的数据构成,这里面的那个g,就是GPM中那个G。gcControllerState 这个就更好理解了,看里面的成员变量的定义就知道是对控制GC过程中的状态显示和控制。gcWork正如注释中所说,它是一个GC状态控制和缓冲数据的类,有点类似于Context的上下文的意思,里面最主要是开头的两个缓冲区,类似于一个乒乓交换,和主缓冲区一起工作。一般一个P绑定一个gcWork。其实这其中就是一个任务队列,用来管理灰色对象。work和trace基本都是数据和状态记录(包括锁等),用来保存在GC过程当前的具体场景。
二、源码分析
调度控制主要分布在mgc.go,mgcwork.go,trace.go,proc.go几个文件中,其主要的控制逻辑如下:
//mgc.go
func gcStart(mode gcMode, forceTrigger bool) {
// Since this is called from malloc and malloc is called in
// the guts of a number of libraries that might be holding
// locks, don't attempt to start GC in non-preemptible or
// potentially unstable situations.
mp := acquirem()
if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
releasem(mp)
return
}
releasem(mp)
mp = nil
// Pick up the remaining unswept/not being swept spans concurrently
//
// This shouldn't happen if we're being invoked in background
// mode since proportional sweep should have just finished
// sweeping everything, but rounding errors, etc, may leave a
// few spans unswept. In forced mode, this is necessary since
// GC can be forced at any point in the sweeping cycle.
//
// We check the transition condition continuously here in case
// this G gets delayed in to the next GC cycle.
for (mode != gcBackgroundMode || gcShouldStart(forceTrigger)) && gosweepone() != ^uintptr(0) {
sweep.nbgsweep++
}
// Perform GC initialization and the sweep termination
// transition.
//
// If this is a forced GC, don't acquire the transition lock
// or re-check the transition condition because we
// specifically *don't* want to share the transition with
// another thread.
useStartSema := mode == gcBackgroundMode
if useStartSema {
semacquire(&work.startSema, 0)
// Re-check transition condition under transition lock.
if !gcShouldStart(forceTrigger) {
semrelease(&work.startSema)
return
}
}
// For stats, check if this GC was forced by the user.
forced := mode != gcBackgroundMode
// In gcstoptheworld debug mode, upgrade the mode accordingly.
// We do this after re-checking the transition condition so
// that multiple goroutines that detect the heap trigger don't
// start multiple STW GCs.
if mode == gcBackgroundMode {
if debug.gcstoptheworld == 1 {
mode = gcForceMode
} else if debug.gcstoptheworld == 2 {
mode = gcForceBlockMode
}
}
// Ok, we're doing it! Stop everybody else
semacquire(&worldsema, 0)
if trace.enabled {
traceGCStart()
}
if mode == gcBackgroundMode {
gcBgMarkStartWorkers()
}
gcResetMarkState()
now := nanotime()
work.stwprocs, work.maxprocs = gcprocs(), gomaxprocs
work.tSweepTerm = now
work.heap0 = memstats.heap_live
work.pauseNS = 0
work.mode = mode
work.pauseStart = now
systemstack(stopTheWorldWithSema)
// Finish sweep before we start concurrent scan.
systemstack(func() {
finishsweep_m()
})
// clearpools before we start the GC. If we wait they memory will not be
// reclaimed until the next GC cycle.
clearpools()
if mode == gcBackgroundMode { // Do as much work concurrently as possible
gcController.startCycle()
work.heapGoal = memstats.next_gc
// Enter concurrent mark phase and enable
// write barriers.
//
// Because the world is stopped, all Ps will
// observe that write barriers are enabled by
// the time we start the world and begin
// scanning.
//
// It's necessary to enable write barriers
// during the scan phase for several reasons:
//
// They must be enabled for writes to higher
// stack frames before we scan stacks and
// install stack barriers because this is how
// we track writes to inactive stack frames.
// (Alternatively, we could not install stack
// barriers over frame boundaries with
// up-pointers).
//
// They must be enabled before assists are
// enabled because they must be enabled before
// any non-leaf heap objects are marked. Since
// allocations are blocked until assists can
// happen, we want enable assists as early as
// possible.
setGCPhase(_GCmark)
gcBgMarkPrepare() // Must happen before assist enable.
gcMarkRootPrepare()
// Mark all active tinyalloc blocks. Since we're
// allocating from these, they need to be black like
// other allocations. The alternative is to blacken
// the tiny block on every allocation from it, which
// would slow down the tiny allocator.
gcMarkTinyAllocs()
// At this point all Ps have enabled the write
// barrier, thus maintaining the no white to
// black invariant. Enable mutator assists to
// put back-pressure on fast allocating
// mutators.
atomic.Store(&gcBlackenEnabled, 1)
// Assists and workers can start the moment we start
// the world.
gcController.markStartTime = now
// Concurrent mark.
systemstack(startTheWorldWithSema)
now = nanotime()
work.pauseNS += now - work.pauseStart
work.tMark = now
} else {
t := nanotime()
work.tMark, work.tMarkTerm = t, t
work.heapGoal = work.heap0
if forced {
memstats.numforcedgc++
}
// Perform mark termination. This will restart the world.
gcMarkTermination()
}
if useStartSema {
semrelease(&work.startSema)
}
}
其实在这个gcStart函数里,基本就把GC控制的调度都给创建完毕了。正如前面分析,这里把调度的模式分成了三种情况:
// gcMode indicates how concurrent a GC cycle should be.
type gcMode int
const (
gcBackgroundMode gcMode = iota // concurrent GC and sweep
gcForceMode // stop-the-world GC now, concurrent sweep
gcForceBlockMode // stop-the-world GC now and STW sweep (forced by user)
)
明白了这三种模式就可以看gcStart的调度方式了。判断是否是后台模式,然后启动一个后台gcBgMarkStartWorkers()函数,它最终会启动gcDrain来进行标记行动。在此模式的最后,会在条件符合时启动一个前面提到的startCycle记GC循环起来,并在setGCPhase函数中根据GC阶段启用写屏障。同时,跟踪也会根据条件启动,然后就开始各种GC前的准备和相关数据状态的更新,而此时更具体的代码分析,在后面的GC流程中再看。
在这些文件中和相关的类中的函数,都是用来调度的,如:
func (c *gcControllerState) revise() {
// Compute the expected scan work remaining.
//
// Note that we currently count allocations during GC as both
// scannable heap (heap_scan) and scan work completed
// (scanWork), so this difference won't be changed by
// allocations during GC.
//
// This particular estimate is a strict upper bound on the
// possible remaining scan work for the current heap.
// You might consider dividing this by 2 (or by
// (100+GOGC)/100) to counter this over-estimation, but
// benchmarks show that this has almost no effect on mean
// mutator utilization, heap size, or assist time and it
// introduces the danger of under-estimating and letting the
// mutator outpace the garbage collector.
scanWorkExpected := int64(memstats.heap_scan) - c.scanWork
if scanWorkExpected < 1000 {
// We set a somewhat arbitrary lower bound on
// remaining scan work since if we aim a little high,
// we can miss by a little.
//
// We *do* need to enforce that this is at least 1,
// since marking is racy and double-scanning objects
// may legitimately make the expected scan work
// negative.
scanWorkExpected = 1000
}
// Compute the heap distance remaining.
heapDistance := int64(memstats.next_gc) - int64(memstats.heap_live)
if heapDistance <= 0 {
// This shouldn't happen, but if it does, avoid
// dividing by zero or setting the assist negative.
heapDistance = 1
}
// Compute the mutator assist ratio so by the time the mutator
// allocates the remaining heap bytes up to next_gc, it will
// have done (or stolen) the remaining amount of scan work.
c.assistWorkPerByte = float64(scanWorkExpected) / float64(heapDistance)
c.assistBytesPerWork = float64(heapDistance) / float64(scanWorkExpected)
}
其它的函数可以在相关的文件中查看,只要掌握了这些基础的动作函数,在后面的GC过程分析中,才会更有的放矢。当然,直接看GC过程再回过头来翻看也不是不可以,不过经常会出现看了前面忘记后面的尴尬的情况。根据自己的实际情况来决定如何看代码吧。
三、总结
其实所有的工作在一开始之前,只有工作的人。当工作进一步复杂精细化时,可能这些个工作的人就抽时间代为管理一下。但随着整体的工作愈发的复杂和精细化,专门负责管理工作的人就出现了。代码也是如此,有兴趣看看Linux0.1的代码,好多代码直接就开始干活,哪有什么这种那种的策略、管理?可再看看现在的Linux内核的代码到处都是各种算法、任务、策略、平台、管理、调度等等。再看看Linux基金会,慢慢就会明白,没有这些是真的不行。但掌握好一个好的度,这就是难点和重点了。
学习别人的代码,重点就是如何平衡管理的代码和工作的代码,既要知道怎么干(工作)又要知道如何干(管理)。