20120813 record

`````````````````

When a SQL statement joins several tables, all of them stay in scope: after a JOIN, the joined tables can still be referenced further on in the query, as the sketch below shows.
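
A minimal sketch of this point (table_a, table_b, table_c and their columns are hypothetical names, not from these notes):

select a.user_name, b.stat_time, c.send_count
from table_a a
join table_b b on (a.user_name = b.user_name)
join table_c c on (a.user_name = c.user_name);  -- alias a is still usable here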

``````````````

if(a.user_name is null, b.user_name, a.user_name) is a ternary (if/else) expression.
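
The same null fallback can also be written with coalesce(), which returns its first non-null argument; a hedged equivalent over the same hypothetical tables:

select coalesce(a.user_name, b.user_name) as user_name  -- a.user_name if not null, else b.user_name
from table_a a
full outer join table_b b on (a.user_name = b.user_name);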

``````````````

11111111111111111 Hive practice

hive>  load data local inpath '/home/mjiang/program/eclipse/ccms_stat/test/data/edm' overwrite into table test.s_edm;                                 
FAILED: Error in semantic analysis: Need to specify partition columns because the destination table is partitioned
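
A hedged fix is to name the target partition in the LOAD statement. Assuming s_edm is partitioned by stat_time like the other tables in these notes (both the partition column and the value below are assumptions):

hive> load data local inpath '/home/mjiang/program/eclipse/ccms_stat/test/data/edm' overwrite into table test.s_edm partition (stat_time='20120808');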

1111111111111111111 Hive practice

hive> CREATE DATABASE database_test_ccms  COMMENT database for test ccms                                                         
    > ;
FAILED: Parse Error: line 1:44 mismatched input 'database' expecting StringLiteral near 'COMMENT' in database's comment


The COMMENT text must be a quoted string literal:

hive> CREATE DATABASE database_test_ccms  COMMENT 'database for test ccms'

111111111111111111111

code1155:

 select user_name,count(1),sum(sms_bill_count),count(distinct uid) from s_sms group by user_name  limit 10;

0112:

hive> select sms.user_name,count(1),sum(sms_bill_count),count(distinct uid) from (select * from s_sms)sms group by sms.user_name  limit 10;      
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Defaulting to jobconf value of: 10

02.07:

insert overwrite table r_fx_daily_all
select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count
from 
( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
  from s_sms sms
  group by sms.user_name,sms.stat_time )t_sms_sum 
full outer join 
( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
  from s_edm edm 
  group by edm.user_name,edm.stat_time )t_edm_sum 
on (t_sms_sum.user_name=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) 
sort by _col1 asc;

02.10:

hive> insert overwrite table r_fx_daily_all
    > select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count,null,null,null,null,null,null,null,null
    > from 
    > ( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
    >   from s_sms sms
    >   group by sms.user_name,sms.stat_time )t_sms_sum 
    > full outer join 
    > ( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
    >   from s_edm edm 
    >   group by edm.user_name,edm.stat_time )t_edm_sum 
    > on (t_sms_sum.user_name=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) ;
Total MapReduce jobs = 6
Launching Job 1 out of 6
Launching Job 2 out of 6
Number of reduce tasks not specified. Defaulting to jobconf value of: 10

insert overwrite table r_fx_daily_all
select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count,null,null,null,null,null,null,null,null
from 
( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
  from s_sms sms
  group by sms.user_name,sms.stat_time )t_sms_sum 
full outer join 
( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
  from s_edm edm 
  group by edm.user_name,edm.stat_time )t_edm_sum 
on ( t_sms_sum.user_name=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) ;

0447:

insert overwrite table r_fx_daily_all
select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count,null,null,null,null,null,null,null,null
from 
( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
  from s_sms sms
  group by sms.user_name,sms.stat_time )t_sms_sum 
full outer join 
( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
  from s_edm edm 
  group by edm.user_name,edm.stat_time )t_edm_sum 
on ( if (t_sms_sum.stat_time=null, t_sms_sum.user_name )=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) ;

Note: this ON clause is malformed as written. Hive's if() takes three arguments, and stat_time=null never evaluates to true; a null-tolerant key needs is null or coalesce().

1111111111111111111111

Getting the two-table join working is the foundation; three tables is the real step up.

With three tables, the question is how to represent the merged result of the first two tables; see the sketch below.
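
One hedged way to write it: wrap the two-table full outer join in a named subquery, coalesce its join keys (after a full outer join either side's key can be null), and join the third aggregate against that. s_wap and its columns are assumptions, mirroring s_sms and s_edm:

select t12.user_name, t12.stat_time,
       t12.sms_send_count, t12.edm_send_count, t_wap_sum.wap_send_count
from
( select coalesce(t_sms_sum.user_name, t_edm_sum.user_name) as user_name,
         coalesce(t_sms_sum.stat_time, t_edm_sum.stat_time) as stat_time,
         t_sms_sum.sms_send_count, t_edm_sum.edm_send_count
  from
  ( select user_name, stat_time, count(1) as sms_send_count
    from s_sms group by user_name, stat_time ) t_sms_sum
  full outer join
  ( select user_name, stat_time, count(1) as edm_send_count
    from s_edm group by user_name, stat_time ) t_edm_sum
  on (t_sms_sum.user_name = t_edm_sum.user_name
      and t_sms_sum.stat_time = t_edm_sum.stat_time) ) t12
full outer join
( select user_name, stat_time, count(1) as wap_send_count
  from s_wap group by user_name, stat_time ) t_wap_sum
on (t12.user_name = t_wap_sum.user_name
    and t12.stat_time = t_wap_sum.stat_time);

The inner coalesce() matters: without it, rows that exist only in s_edm carry a null user_name out of the first join and can never match the third table.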

111111111111111111111 Hive practice

hive> insert table r_fx_daily_all
    > select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count
    > from 
    > ( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
    >   from s_sms sms
    >   group by sms.user_name,sms_stat_time )t_sms_sum 
    > full outer join 
    > ( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
    >   from s_edm edm 
    >   group by edm.user_name,edm.stat_time )t_edm_sum 
    > on (t_sms_sum.user_name=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) 
    > sort by r_fx_daily_all.dd_date asc;
FAILED: Parse Error: line 1:0 cannot recognize input near 'insert' 'table' 'r_fx_daily_all' in insert clause

The syntax Hive expects here is INSERT OVERWRITE TABLE; a bare insert table is not recognized.

11111111111111111111111 Hive practice

hive> insert overwrite table r_fx_daily_all
    > select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count
    > from 
    > ( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
    >   from s_sms sms
    >   group by sms.user_name,sms.stat_time )t_sms_sum 
    > full outer join 
    > ( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
    >   from s_edm edm 
    >   group by edm.user_name,edm.stat_time )t_edm_sum 
    > on (t_sms_sum.user_name=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) 
    > sort by r_fx_daily_all.dd_date asc;
FAILED: Error in semantic analysis: Line 12:8 Invalid table alias or column reference 'r_fx_daily_all': (possible column names are: _col0, _col1, _col2, _col3, _col4, _col5, _col6)

SORT BY can only reference the query's own output columns (the internal _col0.._col6 names in the message), not the target table; that is why the 02.07 entry above sorts by _col1.

1111111111111111 Hive practice

hive> insert overwrite table r_fx_daily_all
    > select t_sms_sum.*,t_edm_sum.edm_send_count,t_edm_sum.edm_people_count
    > from 
    > ( select sms.user_name,sms.stat_time,count(1),sum(sms_bill_count),count(distinct uid) 
    >   from s_sms sms
    >   group by sms.user_name,sms.stat_time )t_sms_sum 
    > full outer join 
    > ( select edm.user_name ,edm.stat_time, count(1) as edm_send_count,count(distinct uid) as edm_people_count
    >   from s_edm edm 
    >   group by edm.user_name,edm.stat_time )t_edm_sum 
    > on (t_sms_sum.user_name=t_edm_sum.user_name and t_sms_sum.stat_time=t_edm_sum.stat_time) ;
FAILED: Error in semantic analysis: Line 1:23 Cannot insert into target table because column number/types are different 'r_fx_daily_all': Table insclause-0 has 15 columns, but query has 7 columns.

The target table has 15 columns, so the 7-column query must be padded; the 02.10 entry above appends eight nulls to fill the remaining columns.

11111111111111

Alt + ` switches between windows of the same application.

1111111111111111

Hive multi-insert: one scan of the source table can feed several INSERT targets.

    FROM pv_users
    INSERT OVERWRITE TABLE pv_gender_sum
        SELECT pv_users.gender, count(DISTINCT pv_users.userid)
        GROUP BY pv_users.gender

    INSERT OVERWRITE DIRECTORY '/user/data/tmp/pv_age_sum'
        SELECT pv_users.age, count(DISTINCT pv_users.userid)
        GROUP BY pv_users.age;

Hive and UNION ALL:

UNION stacks the rows of two tables vertically and requires both sides to have the same schema ("Schema of both sides of union should match."). So if table A has three rows and table B has two, A UNION ALL B has five. The difference between UNION and UNION ALL: UNION merges duplicate rows, while UNION ALL keeps every row, duplicates included. For example, in MySQL:

select * from tmp_libingxue_a;
name       number
libingxue  1001
yuwen      1002

select * from tmp_libingxue_b;
name       number
libingxue  1001
feiyao     1003

select * from tmp_libingxue_a union select * from tmp_libingxue_b;
libingxue  1001
yuwen      1002
feiyao     1003

select * from tmp_libingxue_a union all select * from tmp_libingxue_b;
libingxue  1001
yuwen      1002
libingxue  1001
feiyao     1003

In Hive, though, that statement cannot run as-is: select * from tmp_libingxue_a union all select * from tmp_libingxue_b; fails, because in Hive a UNION must sit inside a subquery, e.g.

select * from (select * from tmp_yuwen_a union all select * from tmp_yuwen_b) t1;

Note that it must be UNION ALL (plain UNION is rejected with a complaint about the missing ALL), and the trailing alias t1 is mandatory: name it a or b if you like, but omitting it is an error.
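
A hedged usage sketch: the subquery form also lets the union feed an aggregation (assuming the tmp_libingxue_* tables above exist in Hive with the same name/number columns):

select t1.user_name, count(1) as cnt
from ( select name as user_name from tmp_libingxue_a
       union all
       select name as user_name from tmp_libingxue_b ) t1
group by t1.user_name;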

111111111111111111111111111111111111

Learning resource:

https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-SimpleQuery


22222222222222

Source tables take the prefix s_.

Temporary tables take the prefix t_.

Result tables take the prefix r_.

Temporary tables shared across SQL statements take the prefix m_.

11111111111111111111111111

code:

 load data inpath '/user/hive/warehouse/s_sms/stat_time=20120613/sms_20120613' overwrite  into table s_sms partition(stat_time='20120808');


create table r_fx_daily_all(
user_name string,
dd_date string,
sms_send_count bigint,
sms_bill_count bigint,
sms_people_count bigint,
edm_send_count bigint,
edm_people_count bigint,
wap_send_count bigint,
wap_people_count bigint,
mms_send_count bigint,
mms_people_count bigint,
coupon_send_count bigint,
coupon_people_count bigint,
ump_send_count bigint,
ump_people_count bigint)
partitioned by(stat_time string);
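
Since r_fx_daily_all is declared partitioned by stat_time, writing into it would normally need a partition clause (the earlier insert statements in these notes omit one). A minimal sketch for loading one day's SMS figures, where the date value and the null padding for the ten non-SMS columns are assumptions:

insert overwrite table r_fx_daily_all partition (stat_time='20120808')
select user_name, stat_time as dd_date,
       count(1) as sms_send_count, sum(sms_bill_count) as sms_bill_count,
       count(distinct uid) as sms_people_count,
       null, null, null, null, null, null, null, null, null, null
from s_sms
group by user_name, stat_time;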


Hive practice:


hive> select ( select sms.user_name,sms.stat_time,count(sms.1),sum(sms.sms_bill_count),count(distinct sms.uid) group by sms.user_name) group by stat_time from s_sms sms limit 10;
FAILED: Parse Error: line 1:9 cannot recognize input near 'select' 'sms' '.' in expression specification


hive> select ( select sms.user_name,sms.stat_time,count(sms.1),sum(sms.sms_bill_count),count(distinct sms.uid) group by sms.user_name) group by sms.stat_time from s_sms sms limit 10;
FAILED: Parse Error: line 1:9 cannot recognize input near 'select' 'sms' '.' in expression specification


hive> select ( select user_name,stat_time,count(1),sum(sms_bill_count),count(distinct uid) group by user_name) group by stat_time from s_sms  limit 10;                               
FAILED: Parse Error: line 1:9 cannot recognize input near 'select' 'user_name' ',' in expression specification


hive> select user_name,stat_time,count(1),sum(sms_bill_count),count(distinct uid) group by user_name from s_sms  limit 10;                             
FAILED: Parse Error: line 1:76 mismatched input 'group' expecting FROM near ')' in from clause


hive> select user_name,stat_time,count(1),sum(sms_bill_count),count(distinct uid) from s_sms group by user_name  limit 10;           
FAILED: Error in semantic analysis: Line 1:17 Expression not in GROUP BY key 'stat_time'

Every selected column that is not inside an aggregate must appear in the GROUP BY; dropping stat_time fixes it:

hive> select user_name,count(1),sum(sms_bill_count),count(distinct uid) from s_sms group by user_name  limit 10;          
Total MapReduce jobs = 2
Launching Job 1 out of 2

1111111111111111111111111111111 long long ago

Install the JDK
[root@test11 hadoop]# chmod +x jdk-6u16-linux-i586.bin
[root@test11 hadoop]# ./jdk-6u16-linux-i586.bin


NFS mount directory



audio: allowed to use sound devices

cdrom: allowed to use the optical drive directly

floppy: allowed to use the floppy drive directly

games: allowed to run games

portage: allowed to run emerge --pretend as a normal user

usb: allowed to use USB devices

plugdev: allowed to mount and use hot-pluggable devices, e.g. cameras and USB flash drives

video: allowed to use video capture devices and hardware acceleration

wheel: allowed to use su

For example, to create a user named john belonging to the wheel, users, and audio groups, first log in as root (only root can create users), then run useradd:

Code 1.1: adding a user for everyday use

Login: root
Password: (your root password)

# useradd -m -G users,wheel,audio -s /bin/bash john
# passwd john
Password: (enter the password for john)
Re-enter password: (enter it again to verify)



A=(a b cdef)  # shell array


URLs are not only used as web page addresses; database clients also use URLs to connect to their servers. In fact, any client-server program can use a URL for its connection.

Here is a database URL:

jdbc:datadirect:oracle://myserver:1521;sid=testdb

A PID file holds that process's process id.


man rsyncd.conf


mars: bash:/home/mjiang/hadoop-1.0.0/sbin/hadoop-daemon.sh: Permission denied


shuffle (as in shuffling cards)


Solved a problem for a senior labmate: a file would not compile. The cause was insufficient permissions; under su it worked.


hash function


For fresh graduates, what matters most is adaptability, not experience.
Something like Hadoop can be picked up quickly on a real project, so what counts is your ability to learn and whether your fundamentals are solid. Proficiency only matters for experienced hires, or when a position urgently needs someone who can start producing immediately.
The bigger the company, the more pronounced this is.
That said, if you do know the technology, many job postings will list it as "preferred".
There is also the question of your own positioning. If the company you want to join does not use it at all, then mastering every detail of Hadoop is useless; conversely, if you are set on doing Hadoop research, you had better have built a few projects with it.
In short, what matters most is understanding how Hadoop works, not merely knowing how to use the tool.



