Hive itself provides the explode function for this purpose: explode takes an array as input and outputs each array element as a separate row.
For example:
pageid paged
page1 a,b,c
should be converted to
page1 a
page1 b
page1 c
select pageid,p from test lateral view explode(split(paged,',')) adtable as p;
Here split breaks the string into an array of elements, and explode then expands that array into multiple rows. The role of lateral view is to solve the problem that rows produced by a UDTF such as explode cannot otherwise be selected together with the other columns of the table.
http://blog.youkuaiyun.com/inte_sleeper/article/details/7196114
However, I kept running into problems when using split: the regular expression used as its delimiter never matched as I expected.
So I turned to implementing the same functionality as a custom Hive UDTF. My thanks to these two blog posts:
http://blog.sina.com.cn/s/blog_6ff05a2c0100tpi4.html
http://blog.youkuaiyun.com/cheersu/article/details/8333045
There are many ways to generate the consecutive numbers 1 to n in ORACLE; the simplest is:
select level from dual connect by level<=5;
----------
Querying with the function written as a UDTF:
hive> select serial(5) as col1 from dual;
OK
1
2
3
4
5
1
2
3
4
5
Or querying with lateral view:
hive> select T.col1 from dual lateral view serial(5) T as col1;
OK
1
2
3
4
5
1
2
3
4
5
Here is the Java code, for reference only:
package com.hadoopbook.hive;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class UDTFSerial extends GenericUDTF {

    // Reused row buffer: this UDTF emits a single int column per row.
    private final Object[] result = new Object[1];

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args)
            throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("UDTFSerial takes only one argument");
        }
        if (!args[0].getTypeName().equals("int")) {
            throw new UDFArgumentException("UDTFSerial only takes an integer as a parameter");
        }
        // Declare the output schema: one int column named "col1".
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col1");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        try {
            // Forward one row for each value 1..n.
            int n = Integer.parseInt(args[0].toString());
            for (int i = 0; i < n; i++) {
                result[0] = i + 1;
                forward(result);
            }
        } catch (Exception e) {
            throw new HiveException("UDTFSerial has an exception");
        }
    }

    @Override
    public void close() throws HiveException {
        // Nothing to clean up.
    }
}
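After compiling and packaging the class into a jar, the function can be registered in a Hive session roughly as follows (the jar path below is a placeholder; serial is the function name used in the queries above):

```sql
add jar /path/to/udtf-serial.jar;
create temporary function serial as 'com.hadoopbook.hive.UDTFSerial';
select serial(5) as col1 from dual;
```

Note that a temporary function only lives for the duration of the current session.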