Cloudera Developer Training for Apache Hadoop

Tags: #hadoop #Training

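This outline covers Hadoop's basic concepts, the Hadoop Distributed File System, how MapReduce works, and the structure of a cluster, along with using Eclipse for rapid development. It then turns to integrating Hadoop into an existing workflow, including interaction with relational database management systems, importing real-time data, and ways of accessing HDFS. The Hadoop API is examined in more depth: using combiners to reduce intermediate data, LocalJobRunner mode, the configure and close methods, writing partitioners, accessing HDFS directly, and using the distributed cache. Hive and Pig are also covered, together with practical development and testing techniques: testing with MRUnit, debugging MapReduce code, easier debugging in LocalJobRunner mode, Eclipse development techniques, retrieving job information with counters, logging, splittable file formats, determining the optimal number of reducers, map-only jobs, and using multiple mappers.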

• The Motivation For Hadoop

· Problems with traditional large-scale systems

· Requirements for a new approach

• Hadoop Basic Concepts

· An Overview of Hadoop

· The Hadoop Distributed File System

· How MapReduce Works

· Anatomy of a Hadoop Cluster

· Other Hadoop Ecosystem Components

• Writing a MapReduce Program

· The MapReduce Flow

· Examining a Sample MapReduce Program

· Basic MapReduce API Concepts

· The Driver Code

· The Mapper

· The Reducer

· Hadoop’s Streaming API

· Using Eclipse for Rapid Development

• Integrating Hadoop Into The Workflow

· Relational Database Management Systems

· Storage Systems

· Creating workflows with Oozie

· Importing Data from RDBMSs With Sqoop

· Importing Real-Time Data with Flume

· Accessing HDFS Using FuseDFS and Hoop

• Delving Deeper Into The Hadoop API

· Using Combiners

· Using LocalJobRunner Mode for Faster Development

· Reducing Intermediate Data with Combiners

· The configure and close methods for MapReduce Setup and Teardown

· Writing Partitioners for Better Load Balancing

· Directly Accessing HDFS

· Using The Distributed Cache

• Using Hive and Pig

· Hive Basics

· Pig Basics

• Common MapReduce Algorithms

· Sorting and Searching

· Indexing

· Machine Learning with Mahout

· Term Frequency - Inverse Document Frequency

· Word Co-Occurrence

• Practical Development Tips and Techniques

· Testing with MRUnit

· Debugging MapReduce Code

· Using LocalJobRunner Mode for Easier Debugging

· Eclipse development techniques

· Retrieving Job Information with Counters

· Logging

· Splittable File Formats

· Determining the Optimal Number of Reducers

· Map-Only MapReduce Jobs

· Implementing Multiple Mappers using ChainMapper

• More Advanced MapReduce Programming

· Custom Writables and WritableComparables

· Saving Binary Data using SequenceFiles and Avro Files

· Creating InputFormats and OutputFormats

• Joining Data Sets in MapReduce Jobs

· Map-Side Joins

· The Secondary Sort

· Reduce-Side Joins

• Graph Manipulation in Hadoop

· Introduction to graph techniques

· Representing Graphs in Hadoop

· Implementing a sample algorithm: Single Source Shortest Path

• Creating Workflows with Oozie

· The Motivation for Oozie

· Oozie’s Workflow Definition Format
