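Purpose: functionally, the rate control module of a video encoder supplies suitable quantization parameter (QP) values to the quantization stage during actual encoding. Whether a given frame, or even a given macroblock, is better coded with a high QP or a low QP is a policy decision, and that decision is made by the rate control module.
For an explanation of the principles, see "x265代码阅读:码率控制(一)" on 编码视界's CSDN blog.
The rate control algorithm in x265 is essentially the same as the one in x264. It is largely empirical and differs from the rate control algorithms recommended by the various ITU-T/MPEG standards.
Rate control in x265 appears to be frame-level only. There are CU-related rate control parameters, but they belong to block-level rate-distortion optimization rather than block-level rate control. x265 supports three rate control modes: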
/* rate tolerance method */
typedef enum
{
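X265_RC_ABR, // average bitrate; corresponds to what the academic literature calls CBR
X265_RC_CQP, // constant QP; corresponds to the HM configuration with rate control disabled
X265_RC_CRF // constant rate factor
// X265_RC_CRF is "quality-controlled VBR"; the academic literature calls it consistent-quality rate control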
} X265_RC_METHODS;
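I. Implementation: ABR algorithm details (QP calculation)
Reference: "x265--速率控制模块理解" on 进击的研究僧's CSDN blog.
The overall flow is:
1. Compute the blurred complexity cplx_blur(i) of the current frame
The blurred complexity of the picture is computed from the SATD of the current frame. Let SATD(i) be the SATD of the current frame and cplx_sum(i) the accumulated complexity:
cplx_sum(i) = 0.5 * cplx_sum(i-1) + SATD(i)
The blurred complexity of the current frame is then
cplx_blur(i) = cplx_sum(i) / cplx_count(i)
where cplx_count(i) = 0.5 * cplx_count(i-1) + 1 is the accumulated weighted frame count. In the code, SATD(i) is additionally normalized by the frame duration.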
/* 1pass ABR */
/* Calculate the quantizer which would have produced the desired
* average bitrate if it had been applied to all frames so far.
* Then modulate that quant based on the current frame's complexity
* relative to the average complexity so far (using the 2pass RCEQ).
* Then bias the quant up or down if total size so far was far from
* the target.
* Result: Depending on the value of rate_tolerance, there is a
* tradeoff between quality and bitrate precision. But at large
* tolerances, the bit distribution approaches that of 2pass. */
// blurred complexity of the current frame
double overflow = 1;
double lqmin = MIN_QPSCALE, lqmax = MAX_MAX_QPSCALE;
m_shortTermCplxSum *= 0.5;
m_shortTermCplxCount *= 0.5;
m_shortTermCplxSum += m_currentSatd / (CLIP_DURATION(m_frameDuration) / BASE_FRAME_DURATION);
m_shortTermCplxCount++;
/* coeffBits to be used in 2-pass */
rce->coeffBits = (int)m_currentSatd;
// obtain the blurred complexity of the current frame
rce->blurredComplexity = m_shortTermCplxSum / m_shortTermCplxCount;
rce->mvBits = 0;
rce->sliceType = m_sliceType;
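The blurred-complexity update can be isolated into a few lines. The sketch below is a simplified illustration, not x265 code: the struct ShortTermComplexity and its members are hypothetical names, and it assumes every frame has the base frame duration, so the duration normalization in the snippet above reduces to a division by 1.

```cpp
#include <cassert>

// Hypothetical helper; the names only loosely mirror x265's members.
struct ShortTermComplexity
{
    double sum = 0.0;    // exponentially decayed sum of SATD costs
    double count = 0.0;  // exponentially decayed frame count

    // Fold one frame's SATD cost into the state and return the
    // blurred complexity sum / count.
    double update(double satd)
    {
        assert(satd >= 0.0);
        sum = 0.5 * sum + satd;      // older frames lose half their weight
        count = 0.5 * count + 1.0;   // same decay applied to the frame count
        return sum / count;
    }
};
```

With SATD costs of 100 and then 200, the sketch returns 100 for the first frame and 250 / 1.5 (about 166.7) for the second, which shows how each older frame is weighted down by a factor of 0.5.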
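2. The raw quantizer scale qscale_raw is
qscale_raw = cplx_blur(i)^(1 - qc)
where qc is the compression parameter (qCompress), which controls the range of qscale_raw.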
else
// raw quantizer scale qscale_raw; pow(x, n) returns x raised to the power n
q = pow(rce->blurredComplexity, 1 - m_param->rc.qCompress);
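To make the effect of the compression parameter concrete, here is a minimal sketch of the same power law. The function name rawQScale is hypothetical, and restricting qCompress to the range 0 to 1 is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: raw quantizer scale from blurred complexity.
// qCompress = 0 lets qscale follow complexity fully;
// qCompress = 1 flattens it to a constant.
double rawQScale(double blurredComplexity, double qCompress)
{
    assert(blurredComplexity > 0.0);
    assert(qCompress >= 0.0 && qCompress <= 1.0);
    return std::pow(blurredComplexity, 1.0 - qCompress);
}
```

With qCompress = 1 every frame gets the same raw qscale (1.0), and with qCompress = 0 the raw qscale equals the blurred complexity itself. Values in between compress the differences between simple and complex frames.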
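3. qscale_raw is then corrected twice
The first correction uses rate_factor:
qscale = qscale_raw / rate_factor, with rate_factor = wanted_bits_window / cplxr_sum
CRF and ABR pass different rateFactor arguments to getQScale: in CRF it is a constant, while in ABR it equals m_wantedBitsWindow / m_cplxrSum, so that the resulting QScale is proportional to the complexity of the video encoded so far.
Here wanted_bits_window is the accumulated target bits of all frames up to the current one. cplxr_sum(i) is a complexity estimate for the current frame derived from the quantizer scale actually used for the previous frames. It is updated iteratively, starting from the initial value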
m_cplxrSum = .01 * pow(7.0e5, m_qCompress) * pow(m_ncu, 0.5) * tuneCplxFactor;
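where
double tuneCplxFactor = (m_param->rc.cuTree && m_ncu > 3600) ? 2.5 : 1; // 2.5 when cuTree is enabled and m_ncu > 3600 (roughly, above 720p); otherwise 1.0
Complexity update: after the rate_factor correction, the accumulated complexity is updated once each frame has been encoded:
cplxr_sum(i) = cplxr_sum(i-1) + bits(i-1) * qscale(i-1) / rceq(i-1)
The corresponding code is in rateControlEnd():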
/* After encoding one frame, update rate control state */
int RateControl::rateControlEnd(Frame* curFrame, int64_t bits, RateControlEntry* rce)
{
int orderValue = m_startEndOrder.get();
int endOrdinal = (rce->encodeOrder + m_param->frameNumThreads) * 2 - 1;
while (orderValue < endOrdinal && !m_bTerminated)
{
/* no more frames are being encoded, so fake the start event if we would
* have blocked on it. Note that this does not enforce rateControlEnd()
* ordering during flush, but this has no impact on the outputs */
if (m_finalFrameCount && orderValue >= 2 * m_finalFrameCount)
break;
orderValue = m_startEndOrder.waitForChange(orderValue);
}
FrameData& curEncData = *curFrame->m_encData;
int64_t actualBits = bits;
Slice *slice = curEncData.m_slice;
if (m_param->rc.aqMode || m_isVbv)
{
if (m_isVbv)
{
/* determine avg QP decided by VBV rate control */
for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
curEncData.m_avgQpRc += curEncData.m_rowStat[i].sumQpRc;
curEncData.m_avgQpRc /= slice->m_sps->numCUsInFrame;
rce->qpaRc = curEncData.m_avgQpRc;
}
if (m_param->rc.aqMode)
{
/* determine actual avg encoded QP, after AQ/cutree adjustments */
for (uint32_t i = 0; i < slice->m_sps->numCuInHeight; i++)
curEncData.m_avgQpAq += curEncData.m_rowStat[i].sumQpAq;
curEncData.m_avgQpAq /= (slice->m_sps->numCUsInFrame * NUM_4x4_PARTITIONS);
}
else
curEncData.m_avgQpAq = curEncData.m_avgQpRc;
}
if (m_isAbr)
{
if (m_param->rc.rateControlMode == X265_RC_ABR && !m_param->rc.bStatRead)
checkAndResetABR(rce, true);
if (m_param->rc.rateControlMode == X265_RC_CRF)
{
if (int(curEncData.m_avgQpRc + 0.5) == slice->m_sliceQp)
curEncData.m_rateFactor = m_rateFactorConstant;
else
{
/* If vbv changed the frame QP recalculate the rate-factor */
double baseCplx = m_ncu * (m_param->bframes ? 120 : 80);
double mbtree_offset = m_param->rc.cuTree ? (1.0 - m_param->rc.qCompress) * 13.5 : 0;
curEncData.m_rateFactor = pow(baseCplx, 1 - m_qCompress) /
x265_qp2qScale(int(curEncData.m_avgQpRc + 0.5) + mbtree_offset);
}
}
}
if (m_isAbr && !m_isAbrReset)
{
/* amortize part of each I slice over the next several frames, up to
* keyint-max, to avoid over-compensating for the large I slice cost */
if (!m_param->rc.bStatWrite && !m_param->rc.bStatRead)
{
if (rce->sliceType == I_SLICE)
{
/* previous I still had a residual; roll it into the new loan */
if (m_residualFrames)
bits += m_residualCost * m_residualFrames;
m_residualFrames = X265_MIN((int)rce->amortizeFrames, m_param->keyframeMax);
m_residualCost = (int)((bits * rce->amortizeFraction) / m_residualFrames);
bits -= m_residualCost * m_residualFrames;
}
else if (m_residualFrames)
{
bits += m_residualCost;
m_residualFrames--;
}
}
if (rce->sliceType != B_SLICE)
{
/* The factor 1.5 is to tune up the actual bits, otherwise the cplxrSum is scaled too low
* to improve short term compensation for next frame. */
m_cplxrSum += (bits * x265_qp2qScale(rce->qpaRc) / rce->qRceq) - (rce->rowCplxrSum);
}
else
{
/* Depends on the fact that B-frame's QP is an offset from the following P-frame's.
* Not perfectly accurate with B-refs, but good enough. */
m_cplxrSum += (bits * x265_qp2qScale(rce->qpaRc) / (rce->qRceq * fabs(m_param->rc.pbFactor))) - (rce->rowCplxrSum);
}
m_wantedBitsWindow += m_frameDuration * m_bitrate;
m_totalBits += bits - rce->rowTotalBits;
m_encodedBits += actualBits;
int pos = m_sliderPos - m_param->frameNumThreads;
if (pos >= 0)
m_encodedBitsWindow[pos % s_slidingWindowFrames] = actualBits;
}
if (m_2pass)
{
m_expectedBitsSum += qScale2bits(rce, x265_qp2qScale(rce->newQp));
m_totalBits += bits - rce->rowTotalBits;
}
if (m_isVbv)
{
updateVbv(actualBits, rce);
if (m_param->bEmitHRDSEI)
{
const VUI *vui = &curEncData.m_slice->m_sps->vuiParameters;
const HRDInfo *hrd = &vui->hrdParameters;
const TimingInfo *time = &vui->timingInfo;
if (!curFrame->m_poc)
{
// first access unit initializes the HRD
rce->hrdTiming->cpbInitialAT = 0;
rce->hrdTiming->cpbRemovalTime = m_nominalRemovalTime = (double)m_bufPeriodSEI.m_initialCpbRemovalDelay / 90000;
}
else
{
rce->hrdTiming->cpbRemovalTime = m_nominalRemovalTime + (double)rce->picTimingSEI->m_auCpbRemovalDelay * time->numUnitsInTick / time->timeScale;
double cpbEarliestAT = rce->hrdTiming->cpbRemovalTime - (double)m_bufPeriodSEI.m_initialCpbRemovalDelay / 90000;
if (!curFrame->m_lowres.bKeyframe)
cpbEarliestAT -= (double)m_bufPeriodSEI.m_initialCpbRemovalDelayOffset / 90000;
rce->hrdTiming->cpbInitialAT = hrd->cbrFlag ? m_prevCpbFinalAT : X265_MAX(m_prevCpbFinalAT, cpbEarliestAT);
}
uint32_t cpbsizeUnscale = hrd->cpbSizeValue << (hrd->cpbSizeScale + CPB_SHIFT);
rce->hrdTiming->cpbFinalAT = m_prevCpbFinalAT = rce->hrdTiming->cpbInitialAT + actualBits / cpbsizeUnscale;
rce->hrdTiming->dpbOutputTime = (double)rce->picTimingSEI->m_picDpbOutputDelay * time->numUnitsInTick / time->timeScale + rce->hrdTiming->cpbRemovalTime;
}
}
rce->isActive = false;
// Allow rateControlStart of next frame only when rateControlEnd of previous frame is over
m_startEndOrder.incr();
return 0;
}
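The interplay between the rate factor and the per-frame update of the accumulated complexity can be sketched in isolation. This is a simplified model under stated assumptions, not the x265 implementation: AbrModel and its members are hypothetical names, and the sketch ignores B-frame scaling by pbFactor, the row-level terms (rowCplxrSum, rowTotalBits), I-frame amortization and VBV.

```cpp
#include <cassert>

// Hypothetical, simplified ABR state.
struct AbrModel
{
    double cplxrSum;          // accumulated bits * qscale / rceq
    double wantedBitsWindow;  // accumulated target bits

    // ABR rate factor: target bits per unit of accumulated complexity.
    double rateFactor() const
    {
        assert(cplxrSum > 0.0);
        return wantedBitsWindow / cplxrSum;
    }

    // First correction: divide the raw qscale by the rate factor.
    double frameQScale(double qScaleRaw) const
    {
        return qScaleRaw / rateFactor();
    }

    // After a frame is encoded, fold its cost into the model.
    void endFrame(double bits, double qScale, double rceq, double targetBitsPerFrame)
    {
        assert(rceq > 0.0);
        cplxrSum += bits * qScale / rceq;
        wantedBitsWindow += targetBitsPerFrame;
    }
};
```

If frames consistently produce more bits than planned, cplxrSum grows faster than wantedBitsWindow, the rate factor falls, and subsequent frames receive a larger qscale.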
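4. Second correction of qscale_raw
The second correction uses the overflow factor, which expresses the deviation between the total target bits and the total bits actually produced:
overflow = clip(0.5, 2, 1 + (total_bits(i-1) - wanted_bits(i-1)) / abr_buffer(i))
qscale = qscale * overflow
overflow is limited to the range 0.5 to 2.
Here total_bits(i-1) is the sum of the bits actually produced up to the previous frame, and wanted_bits(i-1) is the accumulated target bits up to the previous frame. abr_buffer(i) is the average-bitrate buffer. Its initial value is twice the product of the target bitrate and the rate tolerance (1 by default), and it grows with the number of frames encoded and the frame rate: the code multiplies it by max(1, sqrt(timeDone)).
Corresponding code (tuneAbrQScaleFromFeedback()):
// Second correction: scale by the overflow factor, which expresses the deviation between the total target bits and the total bits actually produced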
double RateControl::tuneAbrQScaleFromFeedback(double qScale)
{
double abrBuffer = 2 * m_rateTolerance * m_bitrate;
if (m_currentSatd)
{
/* use framesDone instead of POC as poc count is not serial with bframes enabled */
double overflow = 1.0;
double timeDone = (double)(m_framesDone - m_param->frameNumThreads + 1) * m_frameDuration;
double wantedBits = timeDone * m_bitrate;
int64_t encodedBits = m_totalBits;
if (m_param->totalFrames && m_param->totalFrames <= 2 * m_fps)
{
abrBuffer = m_param->totalFrames * (m_bitrate / m_fps);
encodedBits = m_encodedBits;
}
if (wantedBits > 0 && encodedBits > 0 && (!m_partialResidualFrames ||
m_param->rc.bStrictCbr))
{
abrBuffer *= X265_MAX(1, sqrt(timeDone));
overflow = x265_clip3(.5, 2.0, 1.0 + (encodedBits - wantedBits) / abrBuffer);
qScale *= overflow;
}
}
return qScale;
}
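A worked example may help here. The sketch below reproduces only the arithmetic of the overflow factor. overflowFactor is a hypothetical standalone function, and it leaves out the short-clip special case and the residual-frame check present in the function above.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: overflow factor clipped to [0.5, 2.0].
// bitrate is in bits per second, timeDone in seconds.
double overflowFactor(double encodedBits, double wantedBits,
                      double bitrate, double rateTolerance, double timeDone)
{
    assert(bitrate > 0.0 && rateTolerance > 0.0);
    // The buffer starts at 2 * tolerance * bitrate and grows with sqrt(time).
    double abrBuffer = 2.0 * rateTolerance * bitrate
                     * std::max(1.0, std::sqrt(timeDone));
    double overflow = 1.0 + (encodedBits - wantedBits) / abrBuffer;
    return std::min(2.0, std::max(0.5, overflow));
}
```

At 1,000,000 bit/s with tolerance 1 and 4 seconds encoded, the buffer is 4,000,000 bits. If 5,000,000 bits were produced against a target of 4,000,000, the factor is 1 + 1,000,000 / 4,000,000 = 1.25, so qscale is raised by 25%. Larger deviations saturate at 2.0 or 0.5.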
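5. Compute QP from the corrected qscale
R-D model (see "x264和x265编码器码率控制之基本模型" on Zhihu)
In this model, R is the target rate (in bits), X is the complexity of the current frame, and α is a rate control parameter. The model implies that the higher the complexity of the current frame, the larger qscale becomes (the referenced article identifies qscale with the Lagrange multiplier λ), and therefore the larger the frame-level QP. In other words, in the x265 model a larger SATD cost leads to a larger frame-level QP.
Expanding the formula shows that the SATD cost of every frame has to be computed.
In the expanded form, the subscript n is the current frame number, starting from 0. The complexity of the current frame is represented by its SATD cost, which in practice is not just the SATD of the current frame but also includes a weighted contribution from the SATD of previously encoded frames. comp is a nonlinear mapping of the SATD that accounts for characteristics of human vision; it defaults to 0.6 and is set to 0 for CBR.
In the denominator of the QP formula, the target bitrate and frame rate from the encoder configuration give the expected number of bits per frame, which accumulates as more frames are encoded. The right-hand term of the numerator carries the subscript i-1, meaning that it needs information from the previous frame: its actual bits, actual qscale and actual rceq.
The first encoded frame (which is also an IDR frame) has no previous frame, so the QP formula needs a boundary case for n = 0. This is the "first-frame QP problem" that every rate control algorithm has to address.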
/* The qscale - qp conversion is specified in the standards.
* Approx qscale increases by 12% with every qp increment */
double x265_qScale2qp(double qScale)
{
return 12.0 + 6.0 * (double)X265_LOG2(qScale / 0.85);
}
double x265_qp2qScale(double qp)
{
return 0.85 * pow(2.0, (qp - 12.0) / 6.0);
}
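The two conversions are inverses of each other, which is easy to check numerically. The sketch below restates them with std::log2 in place of the X265_LOG2 macro. The names qScaleToQp and qpToQScale are hypothetical and chosen so as not to be confused with the library functions.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical restatement of the qscale -> QP mapping.
double qScaleToQp(double qScale)
{
    assert(qScale > 0.0);
    return 12.0 + 6.0 * std::log2(qScale / 0.85);
}

// Hypothetical restatement of the QP -> qscale mapping.
double qpToQScale(double qp)
{
    return 0.85 * std::pow(2.0, (qp - 12.0) / 6.0);
}
```

QP 12 maps to a qscale of 0.85, each step of 6 in QP doubles qscale, and a single QP step multiplies qscale by 2^(1/6), roughly 1.12, which matches the "12%" in the comment above.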
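II. Code module details
1. Encoding thread and lookahead thread
x265 mainly consists of two threads: the encoding thread (coding thread) and the preprocessing thread (lookahead thread). In the encoding thread, rate control operates at two levels: MB level (CTU level) and frame level.
The RateControl constructor is called only once, at startup. It initializes global resources and parameters, including the RateFactor when CRF mode is selected, the rate control parameters, and the VBV.
2. RateControlStart module
(1) RateControlStart is called once per frame and consists of three functional blocks:
Check the RC reset condition
FrameStartQp: compute the start QP (unless qpfile mode is enabled)
Overflow correction
(2) RCReset: the RC is reset if a scene change is detected between the current and the previous frame. Reset condition: current frame average satdCost > 4 * moving average of satdCosts
(3) StartQP: this block computes the initial QP of the current frame, that is, the QP that would have produced the desired bit size had it been applied since the start of the stream or since the last RC reset. startQP = 12.0 + 6.0*log2(qscale/0.85)
(4) Overflow correction: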
timeDone = framesDone * m_frameDuration
wantedBits = timeDone * bitrate
if (wantedBits > 0 && m_totalBits > 0 && !m_partialResidualFrames)
{
abrBuffer *= MAX(1, sqrt(timeDone));
overflow = Clip3(.5, 2.0, 1.0 +
(m_totalBits - wantedBits) / abrBuffer);
startFrameQp *= overflow;
}
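(5) RC model parameter update: the complexity parameters of the RC model are updated once per frame, but not at the start of the frame. With frame-level parallelism, the next frame is not allowed to enter rateControlStart until the current frame has updated its mid-frame model parameters.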
m_refLagRows = 1 + (search_range + 63) / 64
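As a worked example of the formula above, the sketch below evaluates it for a few search ranges. refLagRows is a hypothetical helper, integer division is assumed, and reading the divisor 64 as a CTU row height in pixels is an interpretation rather than something the formula itself states.

```cpp
#include <cassert>

// Hypothetical helper: number of lag rows for a given motion search range,
// using integer division as in the formula above.
int refLagRows(int searchRange)
{
    assert(searchRange >= 0);
    return 1 + (searchRange + 63) / 64;
}
```

A search range of 57 gives 1 + 120 / 64 = 2 rows, and a search range of 65 gives 1 + 128 / 64 = 3 rows: the lag increases by one row each time the search range crosses a multiple of 64.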
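3. VBV module
The VBV keeps the receiver-side buffer from overflowing or underflowing. In essence, it limits the short-term bitrate of the video.
[Figure: VBV flow chart]
4. RateControlEnd module
rateControlEnd is called at the end of every frame. It performs the following operations:
Update statistics
I-frame amortization
The statistics updated or computed are:
(1) The average QP (after HVS QP adjustment), stored in m_avgQpAq
(2) m_wantedBitsWindow, increased by bitrate * frameDuration
(3) The accumulated complexity m_cplxrSum, increased by bits * qScale(avgQp) / qRceq
(4) m_totalBits
To keep I-frame peaks from disturbing the RC, the RC is not given the actual I-frame size. It is given a smaller number of bits instead, and the remainder is amortized over the frames that follow.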
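The amortization bookkeeping can be sketched on its own. This is a simplified illustration modeled on the I-slice branch of rateControlEnd shown earlier, not the x265 implementation: Amortizer and report are hypothetical names, and the clamp of the amortization length to keyint-max is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper that spreads part of an I-frame's cost
// over the frames that follow it.
struct Amortizer
{
    int residualFrames = 0;    // frames that still owe an installment
    int64_t residualCost = 0;  // installment charged to each of them

    // Returns the number of bits reported to the rate model for this frame.
    int64_t report(int64_t bits, bool isIntra, int amortizeFrames, double amortizeFraction)
    {
        if (isIntra)
        {
            assert(amortizeFrames > 0);
            // An earlier I-frame may not be fully paid off; roll it in.
            if (residualFrames)
                bits += residualCost * residualFrames;
            residualFrames = amortizeFrames;
            residualCost = (int64_t)((bits * amortizeFraction) / residualFrames);
            bits -= residualCost * residualFrames;
        }
        else if (residualFrames)
        {
            bits += residualCost;  // pay one installment
            residualFrames--;
        }
        return bits;
    }
};
```

For a 100,000-bit I-frame amortized over 4 frames with a fraction of 0.5, the model sees 50,000 bits for the I-frame and an extra 12,500 bits on each of the next four frames, so the total reported to the model equals the total actually produced.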