pig 介绍与pig版 hello world

本文深入探讨了Pig作为ETL工具在处理原始数据方面的应用,介绍了其核心特性,如数据流执行、Pig Latin语言、与Hadoop的整合,并通过实例展示了如何使用Pig进行数据转换和分析。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

前两天使用pig做ETL,粗浅的看了一下,没有系统地学习,感觉pig还是值得学习的,故又重新看programming pig.

以下是看的第一章的笔记:

What is pig?

Pig provides an engine for executing data flows in parallel on Hadoop. It includes a

language, Pig Latin, for expressing these data flows. Pig Latin includes operators for

many of the traditional data operations (join, sort, filter, etc.), as well as the ability for

users to develop their own functions for reading, processing, and writing data.

 Pig runs on Hadoop. It makes use of both the Hadoop Distributed File System,

HDFS, and Hadoop’s processing system, MapReduce.

 pig Latin for a language, Grunt for a shell, and Piggybank for a CPAN-like shared repository。

 What is pig used for ?

ETL?

research for raw data (unstructured)

Pig Philosophy

eat everything ;

live anywhere;

pig fly;

domestic animal;(easy to write UDF)

pig版 hello world:

data:

hello world, hello pig

hello hadooop, hello hdfs

I love programming

I love this world

I love programming with pig

 

pig script:

txt = load 'data.txt' as (line);

words = foreach txt generate flatten(TOKENIZE(line)) as word;

grpd = group words by word;

describe grpd

cntd = foreach grpd generate group, COUNT(words);

dump cntd

 

转载于:https://www.cnblogs.com/huaxiaoyao/p/3465368.html

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值