Fei-Fei Li: How to Teach Computers to Understand Pictures

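In early 2016, during my paper-reading phase, I first came across Artificial Intelligence. At the time I only thought the term was really hard to spell and sounded rather classy, cool. By 2017, artificial intelligence had become the media's buzzword of the year and had even driven up a batch of AI concept stocks, for example iFlytek (科大讯飞), whose PE ratio had already reached 365.29 at the time of writing (2017/12/29 15:00:00 Beijing time). During the April and May holidays of 2017, I also listened to half a lecture of Andrew Ng's "Machine Learning" and gave up halfway. Having twice missed a good chance to learn about a new technology, perhaps even a new technological revolution, I decided to spend the first day of 2018 working through Fei-Fei Li's TED talk on Computer Vision, which suits me well since I have not practised English dictation for a long time.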

Let me show you something…
(Some pictures are shown, and a girl describes what is in each picture…)
the boy is …
those are the …
that’s a big airplane …

This is a three-year-old child describing what she sees in a series of photos. She may still have a lot to learn about this world, but she’s already an expert at one very important task: to make sense of what she sees. Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet our most advanced machines and computers still struggle at this task. So I’m here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.

Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don’t have enough vision technology to help us track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool. Photos and videos are becoming an integral part of global life. They’re being generated at a pace that’s far beyond what any human, or teams of humans, could hope to view. And you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we’re very much blind, because our smartest machines are still blind.

“Why is this so hard?” you may ask. Cameras can take pictures like this one by converting light into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just as to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding.
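To make the "two-dimensional array of numbers" concrete, here is a minimal Python sketch. The 4x4 grid and the helper names `shape` and `mean_brightness` are made up for illustration and are not from the talk; the point is only that a computer can do arithmetic on pixels easily, and none of that arithmetic says what the picture shows.

```python
# A hypothetical 4x4 grayscale "photo" (illustrative values, not real data).
# Each number is one pixel's brightness: 0 = black, 255 = white.
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [128, 128, 64,  64],
    [128, 128, 64,  64],
]


def shape(img):
    """Return (height, width) of a rectangular grid of pixels."""
    return len(img), len(img[0])


def mean_brightness(img):
    """Average pixel value: simple arithmetic on the numbers, no understanding."""
    height, width = shape(img)
    total = sum(sum(row) for row in img)
    return total / (height * width)


if __name__ == "__main__":
    print("shape:", shape(image))
    print("mean brightness:", mean_brightness(image))
```

A statistic like the mean tells us how bright the image is overall, but nothing about whether it contains a cat, a rock, or a paper bag, which is the gap between taking pictures and seeing.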

In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.

So for fifteen years now, starting from my Ph.D. at Caltech and then leading Stanford’s Vision Lab, I’ve been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It’s part of the general field of artificial intelligence.

So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them. The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computer some training images of a particular object, let’s say cats, and designing a model that learns from these training images. How hard can this be? The answer will be shown tomorrow, haha.
