There is no data science like applied data science

Introduction

A while back I listened to Lex Fridman's interview with Jeremy Howard of FastAI. In that interview Howard shared his passion: making neural networks extremely easy to work with, and thus accessible to a wider audience. By increasing the number of people trying to solve relevant problems, he argues, more of those problems will be solved in the long run. The example he gave was a tech-savvy doctor using deep learning to solve concrete medical issues.

In this article I want to expand on what, in my opinion, an accessible applied deep learning platform needs. I also want to explore the relation between more fundamental research and applied deep learning.

What makes a deep learning system accessible

Abstraction level

One of the most important things an accessible system needs to get right is the abstraction level of its interface. By abstraction I mean the amount and type of information the user has to deal with when working with the deep learning system. Hiding unneeded information and details from the user makes the information load more manageable. Deciding which information counts as unneeded is, of course, a key challenge in designing such an interface.

For example, when we train a CNN to perform image recognition there is no need for the applied user to be aware of how exactly gradient descent fits the coefficients of the neural network. Nor should the user be asked to tweak settings deep inside the neural network if good automatic ways of estimating them exist. Of course, in an ideal system, the underlying details can be accessed if the user so desires.
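To make the idea of an abstraction level concrete, here is a minimal sketch in plain Python. It is not FastAI code and it swaps the CNN for a one-variable linear model so that it stays short. The class name `EasyLinearModel` and its parameters are hypothetical, chosen only for illustration. The user only sees `fit` and `predict`; gradient descent runs inside `fit` with default settings, which remain adjustable for anyone who wants the details.

```python
class EasyLinearModel:
    """Hypothetical high-level interface: fit() and predict() are all a user needs."""

    def __init__(self, learning_rate=0.05, epochs=1000):
        # Defaults are meant to work out of the box on small, unscaled data,
        # but stay accessible for users who want to tune them.
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weight = 0.0
        self.bias = 0.0

    def fit(self, xs, ys):
        """Fit y = weight * x + bias with plain gradient descent on the mean squared error."""
        n = len(xs)
        for _ in range(self.epochs):
            # Prediction errors with the current coefficients.
            errors = [self.weight * x + self.bias - y for x, y in zip(xs, ys)]
            # Gradients of the mean squared error with respect to weight and bias.
            grad_weight = sum(2 * e * x for e, x in zip(errors, xs)) / n
            grad_bias = sum(2 * e for e in errors) / n
            # Step against the gradient.
            self.weight -= self.learning_rate * grad_weight
            self.bias -= self.learning_rate * grad_bias
        return self

    def predict(self, xs):
        return [self.weight * x + self.bias for x in xs]


# The applied user's view: two calls, no mention of gradients.
model = EasyLinearModel().fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(model.predict([5]))  # close to 11, since the data follows y = 2x + 1
```

A user who does want the details can still pass `learning_rate` or `epochs` explicitly, which mirrors the point that the underlying settings should be reachable but not required.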

Stability and ease of software management

Installing and updating the system should be very easy. In addition, the software should be very stable and predictable. Strange bugs and unexpected behavior are the bane of the non-IT-focused user: they simply want to get work done, not spend days delicately setting up a working system.

I recently delved into reinforcement learning and ended up using ReAgent. The system is a big step up from the code a researcher would write, but it is certainly not as accessible and robust as sklearn. This lack of accessibility makes it really hard to work with if you cannot dedicate yourself to ReAgent full-time. That limited its usability for me, as I could only dedicate one day a week. Getting deep learning tools into the hands of part-time users, especially domain experts, has great potential to get a lot of real-world problems solved.

Reasonable resource usage

Going from smaller datasets and tools like regression to larger datasets and deep learning, the amount of computing resources needed grows dramatically. Large companies and research groups can invest in large amounts of compute, but this is not feasible for the average user. One of the key design principles of FastAI is that it should run on a single GPU, which makes the investment feasible: you only need a decent gaming or workstation PC.

What about fundamental research

With all this talk of applied deep learning being important, what about hardcore research? One of Howard’s frustrations was that the deep learning research community is overly focused on small incremental updates to algorithms: making the learning process five percent faster, or slightly tweaking the architecture of a network to increase performance. He really wants to solve real-world problems, not get stuck in minor details.

Of course, without research, all the fancy algorithms and tools would not exist in the first place for FastAI to pick up. In my opinion, we need both: researchers invent new methods of solving problems, and industry people and domain experts take those tools and apply them to new settings. Tools like FastAI go a long way toward enabling industry and domain experts to do just that.

Finally, what are your thoughts on accessible deep learning systems? Should we put these tools in the hands of the masses, or should they be reserved for the experts?

If you enjoyed this article, you might also enjoy some of my more longform articles:

Translated from: https://towardsdatascience.com/there-is-no-data-science-like-applied-data-science-99b6c5308b5a
