博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
【Hive学习之六】Hive Lateral View &视图&索引
阅读量:4625 次
发布时间:2019-06-09

本文共 4933 字,大约阅读时间需要 16 分钟。

环境

  虚拟机:VMware 10
  Linux版本:CentOS-6.5-x86_64
  客户端:Xshell4
  FTP:Xftp4
  jdk8
  hadoop-3.1.1
  apache-hive-3.1.1

一、Hive Lateral View

Lateral View用于和UDTF函数(explode、split)结合来使用。
首先通过UDTF函数拆分成多行,再将多行结果组合成一个支持别名的虚拟表。
主要解决在select使用UDTF做查询过程中,查询只能包含单个UDTF,不能包含其他字段、以及多个UDTF的问题

语法:

LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)

举例:统计人员表中共有多少种爱好、多少个城市?

hive> select * from psn2;OKpsn2.id    psn2.name    psn2.likes    psn2.address    psn2.age1    小明1    ["lol","book","movie"]    {
"beijing":"shangxuetang","shanghai":"pudong"} 102 小明2 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 103 小明3 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 104 小明4 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 105 小明5 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 106 小明6 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 101 小明1 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 202 小明2 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 203 小明3 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 204 小明4 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 205 小明5 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 206 小明6 ["lol","book","movie"] {
"beijing":"shangxuetang","shanghai":"pudong"} 20Time taken: 0.138 seconds, Fetched: 12 row(s)hive> select explode(likes) from psn2;OKcollolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovielolbookmovieTime taken: 0.294 seconds, Fetched: 36 row(s)hive> select count(distinct(myCol1)), count(distinct(myCol2)) from psn2 > LATERAL VIEW explode(likes) myTable1 AS myCol1 > LATERAL VIEW explode(address) myTable2 AS myCol2, myCol3;Query ID = root_20190216171853_af297af9-dcc6-4e1e-8674-fa0969727b23Total jobs = 1Launching Job 1 out of 1Number of reduce tasks determined at compile time: 1In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:set hive.exec.reducers.max=
In order to set a constant number of reducers:set mapreduce.job.reduces=
Starting Job = job_1548397153910_0012, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0012/Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job -kill job_1548397153910_0012Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 12019-02-16 17:19:00,480 Stage-1 map = 0%, reduce = 0%2019-02-16 17:19:04,582 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.08 sec2019-02-16 17:19:09,693 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.24 secMapReduce Total cumulative CPU time: 7 seconds 240 msecEnded Job = job_1548397153910_0012MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.24 sec HDFS Read: 15860 HDFS Write: 103 SUCCESSTotal MapReduce CPU Time Spent: 7 seconds 240 msecOK_c0 _c13 2Time taken: 16.894 seconds, Fetched: 1 row(s)hive>

 

二、hive View视图

和关系型数据库中的普通视图一样,hive也支持视图
特点:
  不支持物化视图(oracle支持)
  只能查询,不能做加载数据操作
  视图的创建,只是保存一份元数据,查询视图时才执行对应的子查询
  view定义中若包含了ORDER BY/LIMIT语句,当查询视图时也进行ORDER BY/LIMIT语句操作,view当中定义的优先级更高
  view支持迭代视图

View语法

创建视图:

CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ][COMMENT view_comment][TBLPROPERTIES (property_name = property_value, ...)]AS SELECT ... ;

举例:注意 视图在HDFS下不存在文件  

hive> create view v_psn2 as select id,name from psn2;OKid    nameTime taken: 0.127 secondshive> show tables;OKtab_namecell_drop_monitorcell_monitordocslogtblpersonperson3psn2psn21psn22psn3psn31psn4psnbucketstudenttest01v_psn2wcTime taken: 0.02 seconds, Fetched: 17 row(s)hive> select * from v_psn2;OKv_psn2.id    v_psn2.name1    小明12    小明23    小明34    小明45    小明56    小明61    小明12    小明23    小明34    小明45    小明56    小明6Time taken: 0.11 seconds, Fetched: 12 row(s)hive> drop view v_psn2;OKTime taken: 0.08 secondshive> select * from v_psn2;FAILED: SemanticException [Error 10001]: Line 1:14 Table not found 'v_psn2'hive>

 

三、Hive 索引

 目的:优化查询以及检索性能

给表psn2创建索引:

create index t1_index on table psn2(name)
as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild
in table t1_index_table;

as:指定索引器;

in table:指定索引表,若不指定默认生成在default__psn2_t1_index__表中

create index t1_index on table psn2(name)

as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;

查询索引

show index on psn2;

重建索引(建立索引之后必须重建索引才能生效)

ALTER INDEX t1_index ON psn2 REBUILD;

删除索引

DROP INDEX IF EXISTS t1_index ON psn2;

 

转载于:https://www.cnblogs.com/cac2020/p/10388485.html

你可能感兴趣的文章
sublime3 sftp 忽略文件的写法
查看>>
42. Multiply Strings
查看>>
Socket 基于完整CS样式例子(转载)
查看>>
C++ 概念易错点
查看>>
【顶】在node环境下玩转less
查看>>
图片标签
查看>>
论网站结构(框架篇)
查看>>
oracle常用函数汇总
查看>>
.net下跨Server分布式事务的处理办法TransactionScope (转)
查看>>
LeetCode Fraction to Recurring Decimal
查看>>
知识点篇:8.1)坑爹的面试问题--选定轴孔配合公差等级
查看>>
数位dp
查看>>
输出 Office 报表
查看>>
galera cluster集群的分裂与仲裁机制
查看>>
day19
查看>>
java基础学习总结六(对象与类、类的属性与方法)
查看>>
PAT天梯赛L2-005 集合相似度
查看>>
Exercise 15: Reading Files
查看>>
进制转换的方法和技巧
查看>>
初级前端自学react-native,必备知识点(ES6+ReactJS+flexbox)
查看>>