I. Wrapper (bean) class:
import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// One record from either input file. The Mapper and Reducer refer to this class as JoinBean.
public class JoinBean implements Writable {
    private String orderId;
    private String userId;
    private String name;
    private String age;
    private String userName;
    // Used as a source tag: "order" or "user", depending on which file the record came from
    private String firstName;

    // Hadoop (and BeanUtils) need a public no-arg constructor to create instances
    public JoinBean() {
    }

    public JoinBean(String orderId, String userId, String name, String age, String userName, String firstName) {
        this.orderId = orderId;
        this.userId = userId;
        this.name = name;
        this.age = age;
        this.userName = userName;
        this.firstName = firstName;
    }

    public String getOrderId() {
        return orderId;
    }

    public void setOrderId(String orderId) {
        this.orderId = orderId;
    }

    public String getUserId() {
        return userId;
    }

    public void setUserId(String userId) {
        this.userId = userId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getAge() {
        return age;
    }

    public void setAge(String age) {
        this.age = age;
    }

    public String getUserName() {
        return userName;
    }

    public void setUserName(String userName) {
        this.userName = userName;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    // Used by TextOutputFormat when the bean is the reducer's output key
    @Override
    public String toString() {
        return "orderId='" + orderId + '\'' +
                ", userId='" + userId + '\'' +
                ", name='" + name + '\'' +
                ", age='" + age + '\'' +
                ", userName='" + userName + '\'' +
                ", firstName='" + firstName + '\'';
    }

    // Serialization: writeUTF() cannot write null, so every field must be non-null here
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(this.orderId);
        dataOutput.writeUTF(this.userId);
        dataOutput.writeUTF(this.name);
        dataOutput.writeUTF(this.age);
        dataOutput.writeUTF(this.userName);
        dataOutput.writeUTF(this.firstName);
    }

    // Deserialization: read the fields in exactly the order write() wrote them
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.orderId = dataInput.readUTF();
        this.userId = dataInput.readUTF();
        this.name = dataInput.readUTF();
        this.age = dataInput.readUTF();
        this.userName = dataInput.readUTF();
        this.firstName = dataInput.readUTF();
    }
}
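The write()/readFields() pair is what carries each bean from the map side to the reduce side. The sketch below checks that contract without Hadoop: it uses plain java.io streams and a hypothetical MiniBean with only three of the fields (MiniBean and WritableRoundTrip are illustrative names, not Hadoop APIs). It also shows that writeUTF(null) fails, which is presumably why the mapper fills unused fields with the string "NULL".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for JoinBean with three of its fields. It follows the
// same write/readFields contract as org.apache.hadoop.io.Writable but does
// not depend on Hadoop.
class MiniBean {
    String orderId;
    String userId;
    String firstName; // source tag: "order" or "user"

    void write(DataOutput out) throws IOException {
        out.writeUTF(orderId);
        out.writeUTF(userId);
        out.writeUTF(firstName);
    }

    void readFields(DataInput in) throws IOException {
        // Read the fields back in exactly the order write() wrote them.
        orderId = in.readUTF();
        userId = in.readUTF();
        firstName = in.readUTF();
    }
}

class WritableRoundTrip {
    // Serialize into a byte array, then deserialize into a fresh object,
    // roughly what happens to each map output value during the shuffle.
    static MiniBean roundTrip(MiniBean src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            src.write(out);
            out.flush();
            MiniBean copy = new MiniBean();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // writeUTF(null) throws NullPointerException, so a bean with a null
    // field cannot be serialized.
    static boolean nullFieldFails() {
        MiniBean b = new MiniBean();
        b.orderId = "1001";
        b.firstName = "order"; // userId is left null on purpose
        try {
            roundTrip(b);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```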
II. Mapper class:
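import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import java.io.IOException;

public class GoodsMapper extends Mapper<LongWritable, Text, Text, JoinBean> {
    // Name of the file this map task reads; decides how each line is parsed
    private String fileName;
    // Reused for every record: context.write() serializes it immediately
    private final JoinBean joinBean = new JoinBean();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        FileSplit split = (FileSplit) context.getInputSplit();
        fileName = split.getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // Unused fields get the placeholder "NULL" because writeUTF() cannot serialize null
        if (fileName.startsWith("order")) {
            // order file: orderId,userId
            joinBean.setOrderId(fields[0]);
            joinBean.setUserId(fields[1]);
            joinBean.setAge("NULL");
            joinBean.setName("NULL");
            joinBean.setUserName("NULL");
            joinBean.setFirstName("order"); // tag: record came from the order file
        } else if (fileName.startsWith("user")) {
            // user file: userId,name,age,userName
            joinBean.setOrderId("NULL");
            joinBean.setUserId(fields[0]);
            joinBean.setAge(fields[2]);
            joinBean.setName(fields[1]);
            joinBean.setUserName(fields[3]);
            joinBean.setFirstName("user"); // tag: record came from the user file
        } else {
            return; // skip files that are neither order nor user data
        }
        // Key by user ID: the shuffle partitions and sorts the map output by key,
        // so all order and user records with the same userId reach the same reduce() call.
        context.write(new Text(joinBean.getUserId()), joinBean);
    }
}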
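The mapper's tagging step can be tried without a cluster. In this sketch the hypothetical classes TaggedRecord and MapSideTagging stand in for JoinBean and GoodsMapper; the column layouts (orderId,userId for order files; userId,name,age,userName for user files) are read off the field indices the mapper uses. File names such as "order.txt" in the usage are assumptions, and unlike the mapper, the sketch treats an unknown file name as an error.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical plain-Java stand-in for the JoinBean fields the mapper sets;
// "tag" plays the role of the firstName field.
class TaggedRecord {
    final String orderId, userId, name, age, userName, tag;

    TaggedRecord(String orderId, String userId, String name,
                 String age, String userName, String tag) {
        this.orderId = orderId;
        this.userId = userId;
        this.name = name;
        this.age = age;
        this.userName = userName;
        this.tag = tag;
    }
}

class MapSideTagging {
    // Mirrors the mapper: the input file name decides how the CSV line is
    // parsed. Returns the (key, value) pair the mapper would write.
    static Map.Entry<String, TaggedRecord> map(String fileName, String line) {
        String[] fields = line.split(",");
        TaggedRecord rec;
        if (fileName.startsWith("order")) {
            // order file: orderId,userId
            rec = new TaggedRecord(fields[0], fields[1], "NULL", "NULL", "NULL", "order");
        } else if (fileName.startsWith("user")) {
            // user file: userId,name,age,userName
            rec = new TaggedRecord("NULL", fields[0], fields[1], fields[2], fields[3], "user");
        } else {
            throw new IllegalArgumentException("unexpected input file: " + fileName);
        }
        // The key is the userId, so both kinds of records meet in one reduce call.
        return new AbstractMap.SimpleImmutableEntry<>(rec.userId, rec);
    }
}
```

For example, MapSideTagging.map("order.txt", "1001,u01") and MapSideTagging.map("user.txt", "u01,Tom,20,tom01") both produce the key "u01", one tagged "order" and one tagged "user".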
III. Reducer class:
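1. Watch the braces of the for loop over the values iterator: a misplaced brace produces output with repeated keys and null fields.
2. Put the BeanUtils call inside the if branch that checks the tag. Copying all properties in one call saves writing bean.setOrderId(value.getOrderId()) and the other setters one by one.
3. When iterating over the beans buffered in the list, call context.write(K, V) inside the body of that for loop, so every buffered bean is emitted.
4. Copy each value into a new object with BeanUtils.copyProperties(target, value) (destination first, source second) before storing it in a list: Hadoop reuses the same value object across iterations, so storing value itself would leave the list full of references to the last record.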
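Note 4 matters because the values Iterable that Hadoop passes to reduce() refills one object on every iteration. The sketch below imitates that behaviour in plain Java; Holder, ReusingIterable and ReuseDemo are hypothetical names, and Holder's copy constructor stands in for BeanUtils.copyProperties. Buffering the values without copying leaves every list entry pointing at the last record.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical minimal value type; the copy constructor plays the role of
// BeanUtils.copyProperties(target, value).
class Holder {
    String orderId;

    Holder() {}

    Holder(Holder other) {
        this.orderId = other.orderId;
    }
}

// Mimics the values Iterable passed to reduce(): a single Holder instance
// is refilled on every call to next().
class ReusingIterable implements Iterable<Holder> {
    private final String[] orderIds;

    ReusingIterable(String... orderIds) {
        this.orderIds = orderIds;
    }

    @Override
    public Iterator<Holder> iterator() {
        final Holder shared = new Holder();
        return new Iterator<Holder>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < orderIds.length;
            }

            @Override
            public Holder next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.orderId = orderIds[i++]; // overwrite the same object
                return shared;
            }
        };
    }
}

class ReuseDemo {
    // Buffers every value in a list, either as-is or as a fresh copy, and
    // returns the orderIds the list holds once the loop has finished.
    static List<String> collect(Iterable<Holder> values, boolean copy) {
        List<Holder> kept = new ArrayList<>();
        for (Holder v : values) {
            kept.add(copy ? new Holder(v) : v);
        }
        List<String> ids = new ArrayList<>();
        for (Holder h : kept) {
            ids.add(h.orderId);
        }
        return ids;
    }
}
```

With the input "1001", "1002", "1003", collecting without copies yields three entries of "1003"; collecting copies keeps all three distinct IDs.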
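import org.apache.commons.beanutils.BeanUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

public class GoodsReducer extends Reducer<Text, JoinBean, JoinBean, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<JoinBean> values, Context context) throws IOException, InterruptedException {
        // All order records of this user (one user can have many orders)
        List<JoinBean> orderBeans = new ArrayList<>();
        // The single user record for this userId
        JoinBean userBean = new JoinBean();
        // Separate order records from the user record by their tag.
        // Hadoop reuses one JoinBean instance for all values, so each record is copied.
        for (JoinBean value : values) {
            try {
                if ("order".equals(value.getFirstName())) {
                    JoinBean orderBean = new JoinBean();
                    BeanUtils.copyProperties(orderBean, value);
                    orderBeans.add(orderBean);
                } else if ("user".equals(value.getFirstName())) {
                    BeanUtils.copyProperties(userBean, value);
                }
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IOException("failed to copy JoinBean", e);
            }
        }
        // Fill the user fields into every order and emit one joined line per order
        for (JoinBean orderBean : orderBeans) {
            orderBean.setName(userBean.getName());
            orderBean.setAge(userBean.getAge());
            orderBean.setUserName(userBean.getUserName());
            context.write(orderBean, NullWritable.get());
        }
    }
}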
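The whole join for one key can be simulated in memory: group the tagged records by userId (the shuffle), then for each group buffer the order records, remember the single user record, and emit one line per order. Hadoop does not guarantee the order of values within a group, which is why the user record may arrive after the orders and everything has to be buffered first. Row and InMemoryJoin are hypothetical names, and the sample data in the usage is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java record mirroring JoinBean; "tag" plays the role of firstName.
class Row {
    String orderId, userId, name, age, userName, tag;

    Row(String orderId, String userId, String name, String age, String userName, String tag) {
        this.orderId = orderId;
        this.userId = userId;
        this.name = name;
        this.age = age;
        this.userName = userName;
        this.tag = tag;
    }

    Row(Row o) {
        this(o.orderId, o.userId, o.name, o.age, o.userName, o.tag);
    }

    @Override
    public String toString() {
        return orderId + "," + userId + "," + name + "," + age + "," + userName;
    }
}

class InMemoryJoin {
    // Shuffle: group map output by key (userId); TreeMap keeps keys sorted,
    // as Hadoop's sort phase does.
    static Map<String, List<Row>> shuffle(List<Row> mapOutput) {
        Map<String, List<Row>> groups = new TreeMap<>();
        for (Row r : mapOutput) {
            groups.computeIfAbsent(r.userId, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    // Reduce for one key: buffer copies of the order rows, keep the user row,
    // then emit one joined line per order.
    static List<String> reduce(List<Row> values) {
        List<Row> orders = new ArrayList<>();
        Row user = null;
        for (Row v : values) {
            if ("order".equals(v.tag)) {
                orders.add(new Row(v));
            } else if ("user".equals(v.tag)) {
                user = new Row(v);
            }
        }
        List<String> out = new ArrayList<>();
        for (Row o : orders) {
            if (user != null) {
                o.name = user.name;
                o.age = user.age;
                o.userName = user.userName;
            }
            out.add(o.toString());
        }
        return out;
    }

    static List<String> join(List<Row> mapOutput) {
        List<String> out = new ArrayList<>();
        for (List<Row> group : shuffle(mapOutput).values()) {
            out.addAll(reduce(group));
        }
        return out;
    }
}
```

With two orders for u01 (user record arriving last) and one order for u02, join() returns "1001,u01,Tom,20,tom01", "1002,u01,Tom,20,tom01" and "1003,u02,Ann,30,ann02", one line per order rather than one per user.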
IV. Driver (test) class:
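1. Create a Configuration object.
2. Create a Job with that configuration (Job.getInstance(conf)).
3. Tell Hadoop which jar contains the job's classes (job.setJarByClass(...)).
4. Set the mapper and reducer classes the job uses (job.setMapperClass(GoodsMapper.class), job.setReducerClass(GoodsReducer.class)).
5. Set the key/value types of the mapper output (job.setMapOutputKeyClass(Text.class), job.setMapOutputValueClass(JoinBean.class)).
6. Set the key/value types of the reducer output (job.setOutputKeyClass(JoinBean.class), job.setOutputValueClass(NullWritable.class)).
7. Set the path of the input data (FileInputFormat.setInputPaths(job, ...)).
8. Set the path where the output is written (FileOutputFormat.setOutputPath(job, ...)); this directory must not exist yet, or the job fails at submission.
9. Submit the job to the cluster and wait for it to finish (job.waitForCompletion(true)).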