环境
虚拟机:VMware 10 Linux版本:CentOS-6.5-x86_64 客户端:Xshell4 FTP:Xftp4 jdk8 hadoop-3.1.1 apache-hive-3.1.1一、Hive Lateral View
Lateral View用于和UDTF函数(explode、split)结合来使用。首先通过UDTF函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。主要解决在select使用UDTF做查询过程中,查询只能包含单个UDTF,不能包含其他字段、以及多个UDTF的问题语法:
LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)举例:统计人员表中共有多少种爱好、多少个城市?
hive> select * from psn2;OKpsn2.id psn2.name psn2.likes psn2.address psn2.age1 小明1 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 102 小明2 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 103 小明3 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 104 小明4 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 105 小明5 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 106 小明6 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 101 小明1 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 202 小明2 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 203 小明3 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 204 小明4 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 205 小明5 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 206 小明6 ["lol","book","movie"] { "beijing":"shangxuetang","shanghai":"pudong"} 20Time taken: 0.138 seconds, Fetched: 12 row(s)hive> select explode(likes) from psn2;OKcollolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovieTime taken: 0.294 seconds, Fetched: 36 row(s)hive> select count(distinct(myCol1)), count(distinct(myCol2)) from psn2 > LATERAL VIEW explode(likes) myTable1 AS myCol1 > LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;Query ID = root_20190216171853_af297af9-dcc6-4e1e-8674-fa0969727b23Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=In order to limit the maximum number of reducers:set hive.exec.reducers.max= In order to set a constant number of reducers:set mapreduce.job.reduces= Starting Job = job_1548397153910_0012, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0012/Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job -kill job_1548397153910_0012Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12019-02-16 17:19:00,480 Stage-1 map = 0%, reduce = 0%2019-02-16 17:19:04,582 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.08 sec2019-02-16 17:19:09,693 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.24 secMapReduce Total cumulative CPU time: 7 seconds 240 msecEnded Job = job_1548397153910_0012MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.24 sec HDFS Read: 15860 HDFS Write: 103 SUCCESSTotal MapReduce CPU Time Spent: 7 seconds 240 msecOK_c0 _c13 2Time taken: 16.894 seconds, Fetched: 1 row(s)hive>
二、hive View视图
和关系型数据库中的普通视图一样,hive也支持视图特点: 不支持物化视图(oracle支持) 只能查询,不能做加载数据操作 视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询 view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT语句操作,view当中定义的优先级更高 view支持迭代视图View语法
创建视图:CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ][COMMENT view_comment][TBLPROPERTIES (property_name = property_value, ...)]AS SELECT ... ;
举例:注意 视图在HDFS下不存在文件
hive> create view v_psn2 as select id,name from psn2;OKid nameTime taken: 0.127 secondshive> show tables;OKtab_namecell_drop_monitorcell_monitordocslogtblpersonperson3psn2psn21psn22psn3psn31psn4psnbucketstudenttest01v_psn2wcTime taken: 0.02 seconds, Fetched: 17 row(s)hive> select * from v_psn2;OKv_psn2.id v_psn2.name1 小明12 小明23 小明34 小明45 小明56 小明61 小明12 小明23 小明34 小明45 小明56 小明6Time taken: 0.11 seconds, Fetched: 12 row(s)hive> drop view v_psn2;OKTime taken: 0.08 secondshive> select * from v_psn2;FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'v_psn2'hive>
三、Hive 索引
目的:优化查询以及检索性能
给表psn2创建索引:create index t1_index on table psn2(name) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild in table t1_index_table;
as:指定索引器;
in table:指定索引表,若不指定默认生成在default__psn2_t1_index__表中create index t1_index on table psn2(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;查询索引
show index on psn2;重建索引(建立索引之后必须重建索引才能生效)
ALTER INDEX t1_index ON psn2 REBUILD;删除索引
DROP INDEX IF EXISTS t1_index ON psn2;