Optimization


http://answers.opencv.org/question/755/object-detection-slow/#760


Haar features are inherently slow: they make extensive use of floating-point operations, which are relatively slow on mobile devices.

A quick solution would be to turn to LBP cascades - all you need is a few lines changed in your code. The performance gain is significant, and the loss in accuracy is minimal. Look for lbpcascades/lbpcascade_frontalface.xml.
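
For illustration, this is roughly what the change looks like with a typical C++ CascadeClassifier setup; the function name and parameters below are placeholders, not part of the original post:

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Rect> detectFaces(const cv::Mat& gray)
{
    // the only real change versus a Haar-based setup is the cascade file
    static cv::CascadeClassifier cascade("lbpcascades/lbpcascade_frontalface.xml");

    cv::Mat eq;
    cv::equalizeHist(gray, eq); // optional contrast normalization

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(eq, faces, 1.1, 3, 0, cv::Size(40, 40));
    return faces;
}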

If you want to dig deeper into optimization, here is a generic list of optimization tips (cross-posted from SO). Note that face detection, being one of the most requested features in OpenCV, is already quite optimized, so squeezing out further gains requires a fairly deep understanding of the code.

Advice for optimization

A. Profile your app. Do it first on your computer, since it is much easier there. Use the Visual Studio profiler (or a similar tool) and see which functions take the most time, then optimize those. Never optimize something because you think it is slow; optimize it because you have measured that it is slow. Start with the slowest function, optimize it as much as possible, then move on to the next slowest.

B. First, focus on algorithms. A faster algorithm can improve performance by orders of magnitude (100x); a C++ trick will give you maybe a 2x boost.

Classical techniques:

  • Resize your video frames to be smaller. Many times you can extract the information you need from a 200x300 px image instead of a 1024x768 one; the smaller frame has more than ten times less area.

  • Use simpler operations instead of complicated ones. Use integers instead of floats, and never use double in a matrix or in a for loop that executes thousands of times.

  • Do as little calculation as possible. Can you track the object only in a specific area of the image instead of processing every frame in full? Can you make a rough/approximate detection on a very small image and then refine it on a ROI in the full frame? (See the sketch after this list.)
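
As a sketch of the last two points (the sizes and names below are illustrative, not from the original post): detect on a downscaled copy of the frame, then map the hits back to full-resolution coordinates.

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Rect> detectOnSmallFrame(const cv::Mat& fullGray,
                                         cv::CascadeClassifier& cascade)
{
    const double scale = 0.25; // e.g. 1280x960 -> 320x240
    cv::Mat smallGray;
    cv::resize(fullGray, smallGray, cv::Size(), scale, scale, cv::INTER_AREA);

    std::vector<cv::Rect> hits;
    cascade.detectMultiScale(smallGray, hits);

    // scale the rectangles back so they refer to the original frame
    for (size_t k = 0; k < hits.size(); k++)
    {
        hits[k].x      = cvRound(hits[k].x      / scale);
        hits[k].y      = cvRound(hits[k].y      / scale);
        hits[k].width  = cvRound(hits[k].width  / scale);
        hits[k].height = cvRound(hits[k].height / scale);
    }
    return hits;
}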

C. In for loops, it may make sense to use C-style access instead of C++. A raw pointer into the matrix data or a plain float array is much faster than mat.at<float>(i, j) or std::vector<> indexing, but change it only where it is needed. Usually, most of the processing (90%) happens in one double for loop; focus on that one. It does not make sense to replace vector<> all over the place and make your code look like spaghetti.
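
For example, the same per-pixel loop written with a raw row pointer instead of mat.at<float>(i, j) - a minimal sketch, assuming a single-channel float matrix; the names are illustrative:

#include <opencv2/opencv.hpp>

void scaleInPlace(cv::Mat& img, float gain) // img assumed to be CV_32FC1
{
    for (int i = 0; i < img.rows; i++)
    {
        float* row = img.ptr<float>(i); // one checked call per row
        for (int j = 0; j < img.cols; j++)
            row[j] *= gain;             // plain pointer access inside
    }
}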

D. Some OpenCV functions convert data to double, process it, and then convert back to the input format. Beware of them: they kill performance on mobile devices. Examples: warping, scaling, type conversions. Color space conversions are also known to be slow; prefer grayscale obtained directly from the native YUV frame.
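
For example, on Android the camera typically delivers NV21 frames, and the first width*height bytes are the Y plane - already a grayscale image. A minimal sketch (the buffer handling is assumed, not from the original post):

#include <opencv2/opencv.hpp>

cv::Mat grayFromNV21(unsigned char* nv21, int width, int height)
{
    // wrap the Y plane directly; no copy and no cvtColor call needed
    return cv::Mat(height, width, CV_8UC1, nv21);
}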

E. ARM processors have NEON SIMD instructions. Learn and use them; they are powerful!

A small example:

float *a, *b, *c;
// assume a, b and c each point to 1000001 allocated floats,
// with a and b already filled in
for (int i = 0; i < 1000001; i++)
    c[i] = a[i] * b[i];

can be rewritten as follows. It's more verbose, but trust me it's faster.

#include <arm_neon.h>   // NEON intrinsics and the float32x4_t type

float *a, *b, *c;
// assume a, b and c each point to 1000001 allocated floats, as above
float32x4_t a_, b_, c_;
int i;
for (i = 0; i + 4 <= 1000001; i += 4)
{
    a_ = vld1q_f32(&a[i]);  // load 4 floats from a into a NEON register
    b_ = vld1q_f32(&b[i]);
    c_ = vmulq_f32(a_, b_); // perform 4 float multiplies in parallel
    vst1q_f32(&c[i], c_);   // store the four results into c
}
// the vector size is not always a multiple of 4 (or 8, or 16),
// so process the remaining elements one by one
for (; i < 1000001; i++)
    c[i] = a[i] * b[i];

Purists say you must write it in assembler, but for the regular programmer that is a bit daunting. I got good results writing with intrinsics, as in the example above.

Also check this blog post and the following posts about NEON.

And, last but not least, I should mention that I had very good success converting the SSE optimizations in OpenCV (SSE is the x86/x86-64 counterpart of NEON) to NEON, like here. This is the image-filtering code for uchar matrices (the regular image format). You shouldn't blindly convert the instructions one by one, because there are often better ways to do it, but take it as an example to start from.
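
To give a feel for how the instructions map, here is an illustrative (and only partial) correspondence, not from the original post; the semantics around alignment and saturation do not always match one to one:

#include <arm_neon.h> // NEON side; the SSE side needs <emmintrin.h> on x86

// SSE: __m128  _mm_mul_ps(a, b)      ->  NEON: float32x4_t vmulq_f32(a, b)
// SSE: __m128i _mm_adds_epu8(a, b)   ->  NEON: uint8x16_t  vqaddq_u8(a, b)
// SSE: __m128i _mm_loadu_si128(p)    ->  NEON: uint8x16_t  vld1q_u8(p)
// SSE: _mm_storeu_si128(p, v)        ->  NEON: vst1q_u8(p, v)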
