How NASA Is Using Hadoop to Advance Climate Science

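NASA is using Hadoop to streamline the analysis of climate data. With a 34-node cluster, researchers can process and analyze atmospheric and climate data covering the past three decades without first requesting and cleaning the raw files, which makes the work considerably easier.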

Whenever climate scientists want to analyze data, they need to request it in its messy original format, clean it up and analyze it. That sort of work takes up valuable time, so it makes sense that the federal government has started funding efforts to simplify the process.

Speaking at the 2013 Hadoop Summit in San Jose on Wednesday, NASA software developer Glenn Tamkin explained how he and a colleague have been building a 34-node Hadoop cluster for NASA’s Center for Climate Simulation that can analyze slices of the data in response to end users’ queries. The new architecture could also make it easier to compare the data with other data sets used in the U.S. and in other countries.

Tamkin’s team has an 80 TB data set covering all kinds of climate and atmospheric information for the past three decades: winds, clouds, humidity, air and water temperature and so on. The data includes observations, mostly collected by satellites, as well as simulation data that fills in the gaps. It does not stream in continually; rather, it is loaded about once a year, Tamkin said. The data is already publicly available.

The developers have loaded this data into the Hadoop Distributed File System (HDFS) and rely on the scaled-out nodes to quickly compute sums, counts, averages, standard deviations and other measurements with MapReduce.
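The kind of aggregation described above can be illustrated with a toy MapReduce job simulated in plain Python. This is not NASA’s code: the record layout `(variable, year, value)` and the names `mapper`, `combine`, `reducer` and `run_job` are hypothetical. The mapper keeps only records inside the queried slice and emits partial aggregates (count, sum, sum of squares). Because these partials merge associatively, Hadoop could combine them before the reduce step, and the reducer derives the mean and standard deviation from them.

```python
import math
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record layout: (variable name, year, observed or simulated value).
Record = tuple[str, int, float]
Partial = tuple[int, float, float]  # (count, sum, sum of squares)


def mapper(records: Iterable[Record], variable: str,
           start: int, end: int) -> Iterator[tuple[str, Partial]]:
    """Emit one partial aggregate per record inside the queried slice."""
    for var, year, value in records:
        if var == variable and start <= year <= end:
            yield var, (1, value, value * value)


def combine(a: Partial, b: Partial) -> Partial:
    """Merge two partials; associative, so it could also run as a Hadoop combiner."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def reducer(key: str, partials: Iterable[Partial]) -> tuple[str, dict]:
    """Fold all partials for one key into summary statistics."""
    n, total, total_sq = 0, 0.0, 0.0
    for p in partials:
        n, total, total_sq = combine((n, total, total_sq), p)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)  # population variance
    return key, {"count": n, "sum": total, "mean": mean,
                 "stddev": math.sqrt(variance)}


def run_job(records: Iterable[Record], variable: str,
            start: int, end: int) -> dict[str, dict]:
    """Run map -> shuffle (group by key) -> reduce in a single process."""
    groups: dict[str, list[Partial]] = defaultdict(list)
    for key, partial in mapper(records, variable, start, end):
        groups[key].append(partial)
    return dict(reducer(key, parts) for key, parts in groups.items())
```

For example, eight `air_temp` values 2, 4, 4, 4, 5, 5, 7, 9 for the years 1980 to 1987 yield a mean of 5.0 and a standard deviation of 2.0. Shipping `(count, sum, sum of squares)` instead of raw values keeps shuffle traffic small. When values are large and their spread is small, the sum-of-squares formula can lose precision, and a pairwise variance update (Chan et al.’s parallel algorithm) is the more robust choice.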

The MapReduce jobs don’t run as fast as Tamkin would like; one query he ran recently took two minutes to answer. Even so, the new Hadoop setup sounds like it would be a lot less trouble for scientists looking for basic information across many years.

NASA is now using the Cloudera Distribution for Hadoop for this work, although Tamkin said he is not using every part of it. He would like to add more of its cluster-management components to further optimize the system, and he also wants to develop a way to cache queries so they run faster.
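Tamkin did not describe how his query caching would work. One simple possibility, sketched here under that caveat, is a least-recently-used (LRU) cache keyed on the query parameters and placed in front of the expensive MapReduce job. The names `QueryCache` and `compute` and the `(variable, start_year, end_year, statistic)` key are hypothetical. Because the data set is only updated about once a year, cached results stay valid for long stretches and can be dropped wholesale when a new batch is loaded.

```python
from collections import OrderedDict
from typing import Callable

# Hypothetical query signature: (variable, start_year, end_year, statistic) -> value.
QueryFn = Callable[[str, int, int, str], float]


class QueryCache:
    """Minimal LRU cache in front of an expensive query function (illustrative only)."""

    def __init__(self, compute: QueryFn, max_entries: int = 128) -> None:
        self._compute = compute
        self._max_entries = max_entries
        self._entries = OrderedDict()  # query key -> cached result, oldest first

    def get(self, variable: str, start: int, end: int, stat: str) -> float:
        key = (variable, start, end, stat)
        if key in self._entries:
            self._entries.move_to_end(key)  # cache hit: mark as most recently used
            return self._entries[key]
        result = self._compute(variable, start, end, stat)  # cache miss: run the job
        self._entries[key] = result
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result

    def invalidate(self) -> None:
        """Drop every cached result, e.g. after a new yearly batch is loaded into HDFS."""
        self._entries.clear()
```

An LRU policy fits a workload where a few popular queries, such as long-term averages of common variables, are requested repeatedly. Clearing the cache on each data load keeps results correct without tracking which queries each new batch affects.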

Later this year, Tamkin said, the project will begin serving data out of Hadoop through an API to scientists at government agencies and private organizations. Like the data itself, the API will also be opened to the general public, perhaps as soon as February 2014.

Cloudera comes out looking good here: this NASA team is using its Hadoop distribution.
