Embedded vision: we’re only at the beginning


03/09/2014 / Marco Jacobs

Smart image analysis has enormous potential. An image sensor produces copious amounts of data; the embedded vision algorithms and platforms that can interpret and give meaning to this data enable completely new user experiences, on the mobile phone, in the home, and in the car. videantis’ Marco Jacobs sheds light on applications, the techniques used, and the required compute platforms.

Image processing, according to Wikipedia, is “Any method of signal processing for which the input is an image, like a photo or frame of video, and the output is either an image, or a collection of characteristics”. This kind of image processing is everywhere around us. Our mobile phones do it, for example, as do our TVs…and so do we.

For us humans, the eyes perform simple image processing tasks: focusing (sometimes with the help of glasses), controlling exposure, and capturing color and light information. The interesting part starts in our brain, however, where we interpret the images and give meaning to them. Research has shown that about half of our brain is allocated to image processing. Apparently this is a compute-intensive task, as it requires lots of synapses and neurons performing their magic. But it does pay off.

Applications

A decade ago, professional applications primarily used computer vision techniques: cameras inspecting products during assembly in the factory, or surveillance cameras triggering an alarm when they detected motion. In the past decade, however, embedded vision has expanded and (for example) entered consumer electronics. Even an inexpensive digital camera these days detects the location of faces in the scene and adjusts its focus accordingly.

One of the best-known successes of embedded vision is Microsoft’s Xbox Kinect. The Kinect, originally sold as an Xbox accessory (later also as a PC peripheral), projects a pattern of infrared light onto the gamers in the room. Based on the distortions of the pattern, it then constructs a depth map. Using this depth information, the console can easily distinguish people or objects from the scene’s background, and use this information in games and other applications. Since the introduction of the Kinect to the gaming market, similar techniques have also made their way to other industries.

Today’s smartphones have at least two cameras, or three if you also count the touch sensor, since it captures an image from which the positions of the fingertips on the screen can be deduced. The iPhone includes yet another sensor that captures an image of a fingerprint. Amazon’s Fire Phone adds four more image sensors that determine the gaze direction of the person holding the phone, which is then used to present a real-time 3D user interface.

Still, we’re only at the beginning of embedded vision. Many new applications are being developed, and large innovative companies like Google and Amazon are investing heavily. Perhaps the application that most captures the imagination is the self-driving car. Google recently introduced an autonomous 25 MPH two-seat vehicle that doesn’t have a steering wheel, gas pedal, or brake pedal. Since the premise of autonomous vehicles is to drive us to our destinations flawlessly, car accidents could largely become a thing of the past.

Another interesting initiative at Google is Project Tango, which adds multiple image sensors to mobile phones and tablets. The primary goal of these depth and fisheye cameras is not to take nice pictures, but to analyze the mobile device’s surroundings, in order to accurately deduce its location and orientation. Once the exact camera pose is known, unique augmented reality games can be implemented. Imagine, for example, Mario not only being able to jump on platforms on the display, but also on the couches, tables and bookcases around you. Such non-GPS-based accurate positioning also opens the door to indoor navigation, i.e. user-generated InsideView instead of StreetView.

Key Technologies

The algorithms at the foundation of smart embedded vision are still very much in evolution. Scientists publish one paper after another; companies similarly have unique approaches. One popular software package at the moment is OpenCV. This open source library offers over a thousand different computer vision and image processing routines, of which typically only a small portion is used in any one product.

The Khronos group, perhaps best-known for its OpenGL, OpenCL and OpenMax standards, is working on OpenVX, a library that can be used to construct efficient image processing systems. This library consists of only 40 image analysis routines, but is structured in a framework that allows image data to stay local to processing units. This attribute can greatly reduce the number of data read and write operations to external memory, lowering power consumption and increasing performance significantly.

Most algorithms are variations on the same theme. Feature detection, for example, finds interesting points in an image, mostly corners. A 300 KByte VGA image of a square, for instance, is then converted into just 4 data points of the square’s corners, a significant reduction of the amount of data. There are many different algorithms to find the corners, but most of them are very similar. Another key technique is feature tracking. This algorithm follows points from one frame to another in a video stream. This way, we get information about the speed and direction of the objects in the scene, or the change in position of the camera. Using a technique called structure from motion, this information can even be used to obtain a rough 3D model of the scene that the camera captured.

A third key technique is object detection, which finds and classifies objects in an image, such as the location of a face, a person, or a car. Such algorithms need to be trained and tuned using reference images. By running a large library of images through a training algorithm, the software learns what are (and aren’t) the objects we’d like to detect. The resulting parameters of such an offline, non-real-time training algorithm are then fed into the real-time object detector.

The training phase typically requires lots of reference images, tuning, manual guidance, and verification of the algorithm. In the last few years, however, a new class of algorithms has been developed and become popular: convolutional neural networks, a deep learning technique. These algorithms can detect objects with higher accuracy or in a more generalized way, and training is also considered an easier process with deep learning techniques.
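At the heart of a convolutional network is nothing more exotic than repeated 2D convolution. A minimal NumPy sketch of one feature map; the hand-written 3×3 kernel here stands in for weights that training would normally learn:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image and
    take the weighted sum at every position (one feature-map cell)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge filter; in a trained network these nine weights
# would come out of the learning phase instead of being hand-picked.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

# Image with a dark-to-bright vertical step at column 4.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

feature_map = conv2d(image, kernel)
print(feature_map[3])  # strong response at the step, zero elsewhere
```

A deep network stacks many such filter banks with nonlinearities in between, so later layers respond to progressively larger, more abstract patterns than a single edge.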

Platforms

Image analysis requires lots of compute power, and at first glance seems to call for brute force. A 5 megapixel black-and-white camera that captures 30 frames per second generates 150 megasamples per second. Many algorithms build a multiscale pyramid from this input data, each time downscaling the image (by 10%, for instance), which increases the amount of data significantly. The object detector then runs the same algorithm on all the different resolutions, looking for a match.
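A quick back-of-the-envelope check of those numbers, as a sketch (the 10% downscale step and the minimum level size are the assumptions from the paragraph above):

```python
# Raw sample rate of a 5 megapixel, 30 fps monochrome sensor.
pixels = 5_000_000
fps = 30
rate = pixels * fps
print(rate)             # 150,000,000 samples per second

# Multiscale pyramid: each level shrinks width and height by 10%,
# so every level holds 0.9 * 0.9 = 81% of the previous level's pixels.
total, level = 0.0, float(pixels)
while level >= 1_000:   # stop at some minimum useful level size
    total += level
    level *= 0.81

print(total / pixels)   # ~5.26x the data of the original frame alone
```

The geometric series converges to 1/(1 − 0.81) ≈ 5.26, so the pyramid multiplies the already high input data rate by more than five, which is why the detector's per-pixel cost matters so much.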

Running such vision algorithms on a standard CPU is feasible only when the algorithms are simple and the resolution is low. When the algorithms get slightly more complex, or the required resolution and accuracy go up, we have to look for alternative, more powerful processing solutions. Recently, GPUs have become GPGPUs, and quite powerful in the process. In addition, the tooling and software frameworks to program and optimize for these complicated machines have become more workable. Still, GPUs are typically not efficient enough: they use a lot of energy and are expensive because of the large silicon area they consume. FPGAs are another alternative for lower volumes. Casting algorithms directly in hardware yields the most efficient implementation, but since these algorithms are still under development and changing, this usually isn’t an effective solution.

A new class of digital signal processors, specifically optimized for energy efficiency and high-performance image processing, has recently emerged. Such processors don’t have the overhead that RISC processors have; they don’t need to run complex operating systems, web browsers, and other large software stacks. These video DSPs also don’t carry the baggage that GPUs inherit from their history in 3D graphics. An efficient, parallel video DSP that’s optimized for image processing seems to be an ideal solution.
