
Source: http://git.videolan.org/?p=x264.git;a=blob_plain;f=doc/ratecontrol.txt;hb=HEAD

A qualitative overview of x264's ratecontrol methods
By Loren Merritt

Historical note:
This document is outdated, but a significant part of it is still accurate.  Here are some important ways ratecontrol has changed since the authoring of this document:
- By default, MB-tree is used instead of qcomp for weighting frame quality based on complexity.  MB-tree is effectively a generalization of qcomp to the macroblock level.  MB-tree also replaces the constant offsets for B-frame quantizers.  The legacy algorithm is still available for low-latency applications.
- Adaptive quantization is now used to distribute quality among each frame; frames are no longer constant quantizer, even if MB-tree is off.
- VBV runs per-row rather than per-frame to improve accuracy.

x264's ratecontrol is based on libavcodec's, and is mostly empirical. But I can retroactively propose the following theoretical points which underlie most of the algorithms:

- You want the movie to be somewhere approaching constant quality. However, constant quality does not mean constant PSNR nor constant QP. Details are less noticeable in high-complexity or high-motion scenes, so you can get away with somewhat higher QP for the same perceived quality.
- On the other hand, you get more quality per bit if you spend those bits in scenes where motion compensation works well: A given artifact may stick around several seconds in a low-motion scene, and you only have to fix it in one frame to improve the quality of the whole scene.
- Both of the above are correlated with the number of bits it takes to encode a frame at a given QP.
- Given one encoding of a frame, we can predict the number of bits needed to encode it at a different QP. This prediction gets less accurate if the QPs are far apart (see the sketch after this list).
- The importance of a frame depends on the number of other frames that are predicted from it. Hence I-frames get reduced QP depending on the number and complexity of following inter-frames, disposable B-frames get higher QP than P-frames, and referenced B-frames are between P-frames and disposable B-frames.
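
The last two points can be made concrete. x264 maps QP to a linear quantizer scale ("qscale") such that +6 QP doubles the quantizer step; the 0.85 constant and the 6-QP-per-doubling rule below match x264's qp2qscale(). The bits-inversely-proportional-to-qscale model is a simplifying assumption for illustration; x264's actual size predictor uses fitted coefficients.

```
import math

def qp2qscale(qp):
    # QP is logarithmic: +6 QP doubles the quantizer step size.
    return 0.85 * 2 ** ((qp - 12.0) / 6.0)

def qscale2qp(qscale):
    return 12.0 + 6.0 * math.log2(qscale / 0.85)

def predict_bits(measured_bits, measured_qp, new_qp):
    # First-order model: bits scale inversely with qscale. The farther
    # new_qp is from measured_qp, the less accurate this gets.
    return measured_bits * qp2qscale(measured_qp) / qp2qscale(new_qp)

# A frame that took 40000 bits at QP 26 should need about half that at
# QP 32 (one doubling of the quantizer step):
print(predict_bits(40000, 26, 32))  # ~20000.0
```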


The modes:

    2pass:
Given some data about each frame of a 1st pass (e.g. generated by 1pass ABR, below), we try to choose QPs to maximize quality while matching a specified total size. This is separated into 3 parts (a toy sketch of all three follows below):
(1) Before starting the 2nd pass, select the relative number of bits to allocate between frames. This pays no attention to the total size of the encode. The default formula, empirically selected to balance between the first two theoretical points above, is "complexity ** 0.6", where complexity is defined to be the bit size of the frame at a constant QP (estimated from the 1st pass).
(2) Scale the results of (1) to fill the requested total size. Optional: Impose VBV limitations. Due to nonlinearities in the frame size predictor and in VBV, this is an iterative process.
(3) Now start encoding. After each frame, update future QPs to compensate for mispredictions in size. If the 2nd pass is consistently off from the predicted size (usually because we use slower compression options than the 1st pass), then we multiply all future frames' qscales by the reciprocal of the error. Additionally, there is a short-term compensation to prevent us from deviating too far from the desired size near the beginning (when we don't have much data for the global compensation) and near the end (when global doesn't have time to react).
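
A toy sketch of the three steps, under stated assumptions: the complexity numbers stand in for 1st-pass stats, the size "predictor" is the naive bits = complexity / qscale model from above, and VBV is ignored. All names are illustrative, not x264's.

```
import random

def two_pass_qscales(complexities, target_bits):
    # (1) Relative allocation: bits_i proportional to complexity ** 0.6.
    desired = [c ** 0.6 for c in complexities]
    # (2) Scale the allocations to fill the requested total size.
    scale = target_bits / sum(desired)
    desired = [d * scale for d in desired]
    # Under bits = complexity / qscale, the qscale hitting each target:
    return [c / d for c, d in zip(complexities, desired)]

def encode_with_compensation(complexities, target_bits):
    # (3) After each frame, scale future qscales by the accumulated
    # actual/predicted size ratio, so a consistent misprediction (here,
    # injected noise with a +10% bias) is corrected over the encode.
    random.seed(0)
    qscales = two_pass_qscales(complexities, target_bits)
    actual_sum = predicted_sum = 0.0
    for c, q in zip(complexities, qscales):
        error = actual_sum / predicted_sum if predicted_sum else 1.0
        actual_sum += (c / (q * error)) * random.uniform(0.9, 1.3)
        predicted_sum += c / q
    return actual_sum

frames = [5000, 8000, 2000, 12000, 6000] * 20   # fake complexities
print(encode_with_compensation(frames, target_bits=4_000_000))  # near 4M
```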

    1pass, average bitrate:
The goal is the same as in 2pass, but here we don't have the benefit of a previous encode, so all ratecontrol must be done during the encode.
(1) This is the same as in 2pass, except that instead of estimating complexity from a previous encode, we run a fast motion estimation algorithm over a half-resolution version of the frame, and use the SATD residuals (these are also used in the decision between P- and B-frames). Also, we don't know the size or complexity of the following GOP, so the I-frame bonus is based on the past.
(2) We don't know the complexities of future frames, so we can only scale based on the past. The scaling factor is chosen to be the one that would have resulted in the desired bitrate if it had been applied to all frames so far (see the sketch after these steps).
(3) Overflow compensation is the same as in 2pass. By tuning the strength of compensation, you can get anywhere from near the quality of 2pass (but unpredictable size, e.g. ±10%) to a reasonably strict filesize but lower quality.
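
A minimal sketch of the ABR scaling rule, again assuming the naive bits = complexity / qscale model. The class and its names are illustrative, and the compensation is applied at full strength (x264 exposes the strength as a knob, as noted in step (3)).

```
class AbrSketch:
    def __init__(self, bits_per_frame):
        self.bits_per_frame = bits_per_frame
        self.weight_sum = 0.0   # sum of complexity ** 0.6 over past frames
        self.bits_spent = 0.0   # bits actually produced so far
        self.frames = 0         # frames already encoded

    def qscale_for(self, complexity):
        self.weight_sum += complexity ** 0.6
        # (2) The scaling factor that would have hit the target bitrate
        # had it been applied to all frames so far (including this one):
        wanted = self.bits_per_frame * (self.frames + 1)
        desired_bits = complexity ** 0.6 * wanted / self.weight_sum
        q = complexity / desired_bits     # bits = complexity / qscale
        # (3) Overflow compensation at full strength: scale qscale by the
        # spent/wanted ratio of the frames already encoded.
        if self.frames > 0:
            q *= self.bits_spent / (self.bits_per_frame * self.frames)
        return q

    def frame_done(self, actual_bits):
        self.frames += 1
        self.bits_spent += actual_bits

rc = AbrSketch(bits_per_frame=40_000)
for c in [5000, 8000, 2000, 12000]:
    q = rc.qscale_for(c)
    rc.frame_done(1.1 * c / q)   # pretend the encoder overshoots by 10%
    print(round(q, 4))
```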

    1pass, constant bitrate (VBV compliant):
(1) Same as ABR.
(2) Scaling factor is based on a local average (dependent on VBV buffer size) instead of all past frames.
(3) Overflow compensation is stricter, and has an additional term to hard limit the QPs if the VBV is near empty. Note that no hard limit is done for a full VBV, so CBR may use somewhat less than the requested bitrate. Note also that if a frame violates VBV constraints despite the best efforts of prediction, it is not re-encoded.
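
A toy leaky-bucket simulation of the VBV constraint described above; the numbers and names are illustrative, and x264 actually predicts sizes in advance (per-row, per the historical note) rather than checking after the fact. Note the asymmetry: a full buffer is silently clamped (so CBR may undershoot), while an underflow is a hard violation that x264 tries to avoid by prediction and QP limiting, not by re-encoding.

```
def vbv_step(fill, frame_bits, bits_per_frame, buffer_size):
    fill += bits_per_frame           # bits arriving at the decoder
    fill = min(fill, buffer_size)    # full buffer: clamped, no hard limit
    fill -= frame_bits               # frame removed for decoding
    if fill < 0:
        raise ValueError("VBV underflow: frame too large for the buffer")
    return fill

fill, size, per_frame = 400_000, 800_000, 40_000
for frame_bits in [30_000, 90_000, 20_000, 120_000]:
    fill = vbv_step(fill, frame_bits, per_frame, size)
    print("buffer fill:", fill)
```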

    1pass, constant ratefactor:
(1) Same as ABR.
(2) The scaling factor is a constant based on the --crf argument.
(3) No overflow compensation is done.
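
A sketch of how a constant ratefactor might turn --crf into a per-frame qscale, assuming the complexity ** 0.6 weighting from the 2pass section and normalizing so that a frame of average complexity lands at qp2qscale(crf). x264 derives its constant differently, so treat this as illustrative only.

```
def crf_qscale(complexity, crf, avg_complexity):
    base = 0.85 * 2 ** ((crf - 12.0) / 6.0)   # qp2qscale(crf)
    # The 0.4 exponent is the complement of the 0.6 weighting: bits then
    # come out proportional to complexity ** 0.6, as in the 2pass case,
    # but anchored to a constant instead of a target total size.
    return base * (complexity / avg_complexity) ** 0.4

for c in (2000, 6000, 18000):
    print(c, round(crf_qscale(c, crf=23, avg_complexity=6000), 3))
```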

    constant quantizer:
QPs are simply based on frame type.
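
For example, with x264's default --ipratio 1.40 and --pbratio 1.30 (which are ratios in qscale, not QP), the per-type QPs work out as below; the helper names are illustrative.

```
import math

def qp_offset(qscale_ratio):
    # Convert a qscale ratio into a QP delta (6 QP per qscale doubling).
    return 6.0 * math.log2(qscale_ratio)

def frame_qp(frame_type, qp_p, ipratio=1.40, pbratio=1.30):
    if frame_type == "I":
        return qp_p - qp_offset(ipratio)   # I-frames get a lower QP
    if frame_type == "B":
        return qp_p + qp_offset(pbratio)   # B-frames get a higher QP
    return qp_p                            # P-frames use the base QP

for t in "IPB":
    print(t, round(frame_qp(t, qp_p=26), 2))   # I 23.09, P 26, B 28.27
```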

