

news 2025/7/7 15:35:06


1. Internal and external tables

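By default, a newly created table is an internal (managed) table: Hive owns both its structure and its files. In other words, Hive fully manages the table's lifecycle (metadata and data), much like a table in an RDBMS. When you drop an internal table, both its data and its metadata are deleted. You can run DESCRIBE FORMATTED tablename to get the table's metadata; its Table Type field shows whether the table is managed or external.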
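The data of an external table is not owned or managed by Hive; Hive manages only the lifecycle of the table's metadata. To create an external table, use the EXTERNAL keyword. Dropping an external table deletes only the metadata, not the actual data, which remains accessible outside Hive. In practice, external tables are usually combined with a LOCATION clause that points at the data's path, which keeps the data safer.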
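Key differences:

For both internal and external tables, Hive keeps the metadata (table definition, column types and so on) in the Hive Metastore.
Dropping an internal table deletes the table metadata from the Metastore and also deletes all of its data files from HDFS.
Dropping an external table deletes only the table metadata from the Metastore and leaves the actual data at its HDFS location untouched.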
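To make the drop semantics concrete, here is a toy model in Java. It is an illustration, not Hive code: a map stands in for the Metastore and a set of directory names stands in for HDFS. The class name ToyWarehouse and the /user/hive/warehouse paths are assumptions chosen for the example; the EXTERNAL=TRUE property mirrors the table property used with ALTER TABLE ... SET TBLPROPERTIES.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of Hive's drop semantics (illustration only, not Hive code).
class ToyWarehouse {
    // "Metastore": table name -> table properties such as EXTERNAL=TRUE
    final Map<String, Map<String, String>> metastore = new HashMap<>();
    // table name -> data directory
    final Map<String, String> locations = new HashMap<>();
    // "HDFS": directories that currently contain data files
    final Set<String> hdfsPaths = new HashSet<>();

    // Creates the table and pretends LOAD DATA has already written files to its location
    void createAndLoad(String name, boolean external, String location) {
        Map<String, String> props = new HashMap<>();
        if (external) {
            props.put("EXTERNAL", "TRUE");
        }
        metastore.put(name, props);
        locations.put(name, location);
        hdfsPaths.add(location);
    }

    // Assumed rule for the model: a table is external when its EXTERNAL property is TRUE
    boolean isExternal(String name) {
        return "TRUE".equalsIgnoreCase(metastore.get(name).get("EXTERNAL"));
    }

    void dropTable(String name) {
        boolean external = isExternal(name);
        metastore.remove(name);                    // metadata is always removed
        String location = locations.remove(name);
        if (!external) {
            hdfsPaths.remove(location);            // data is removed only for managed tables
        }
    }

    public static void main(String[] args) {
        ToyWarehouse w = new ToyWarehouse();
        w.createAndLoad("t_user_inner", false, "/user/hive/warehouse/t_user_inner");
        w.createAndLoad("t_user_ext", true, "/user/hive/warehouse/t_user_ext");
        w.dropTable("t_user_inner");
        w.dropTable("t_user_ext");
        System.out.println(w.metastore.isEmpty()); // true
        System.out.println(w.hdfsPaths);           // [/user/hive/warehouse/t_user_ext]
    }
}
```

After both drops the model holds no metadata at all, but the external table's directory is still there, which is why re-creating the external table makes its data visible again.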

Code demo

-- Create an internal table and load data
create table t_user_inner(id int,uname string,pwd string,sex string,age int
)row format delimited fields terminated by ',';
load data local inpath '/root/user.txt' into table t_user_inner;
-- Show table information
desc formatted t_user_inner;
-- Create an external table and load data
create external table t_user_ext(id int,uname string,pwd string,sex string,age int
)row format delimited fields terminated by ',';
load data local inpath '/root/user.txt' into table t_user_ext;
-- Show table information
desc formatted t_user_ext;
-- Drop the internal table: its data is deleted as well
drop table t_user_inner;
-- Drop the external table: its data is NOT deleted
drop table t_user_ext;
-- Re-create t_user_ext with the same definition; it points at the same default warehouse directory, so the old data can be queried directly
create external table t_user_ext(id int,uname string,pwd string,sex string,age int
)row format delimited fields terminated by ',';
select * from t_user_ext;
-- Convert t_user_ext to an internal table (case matters: keep the key EXTERNAL and its value in uppercase)
alter table t_user_ext set tblproperties('EXTERNAL'='FALSE');
-- The table information now shows Table Type: MANAGED_TABLE
desc formatted t_user_ext;
-- Convert t_user_ext back to an external table
alter table t_user_ext set tblproperties('EXTERNAL'='TRUE');
-- The table information now shows Table Type: EXTERNAL_TABLE
desc formatted t_user_ext;

2. Partitioned tables

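A partitioned table splits the table's data into directories along some dimension. When you query the data, Hive reads only the directories for the requested dimension values instead of loading the whole table and then filtering, which makes processing more efficient.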

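For example, to find the students of one grade in a student table: without partitions, the whole data file has to be scanned; with the table partitioned by grade, only the files of that grade's partition are read.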
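A minimal Java simulation of the idea, assuming each partition value maps to a directory named column=value under the table directory (the usual Hive layout). It is not Hive code, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simulation of partition pruning for a table partitioned by grade (illustration only).
class PartitionPruningDemo {
    // Lays the rows out as one directory per partition value: <table>/grade=<value>
    static Map<String, List<String>> layout(String table, Map<Integer, List<String>> rowsByGrade) {
        Map<String, List<String>> dirs = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> e : rowsByGrade.entrySet()) {
            dirs.put(table + "/grade=" + e.getKey(), e.getValue());
        }
        return dirs;
    }

    // WHERE grade = g: only the matching directory is opened; the other directories are never read
    static List<String> readPartition(Map<String, List<String>> dirs, String table, int grade) {
        return dirs.getOrDefault(table + "/grade=" + grade, List.of());
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> rows = new LinkedHashMap<>();
        rows.put(1, List.of("1,zhangsan", "2,lisi", "3,wangwu"));
        rows.put(2, List.of("4,zhaoliu", "5,lvqi", "6,maba"));
        rows.put(3, List.of("7,liuyan", "8,tangyan", "9,jinlian"));
        Map<String, List<String>> dirs = layout("t_student", rows);
        System.out.println(dirs.keySet());
        // [t_student/grade=1, t_student/grade=2, t_student/grade=3]
        System.out.println(readPartition(dirs, "t_student", 1));
        // [1,zhangsan, 2,lisi, 3,wangwu]
    }
}
```

The query for grade 1 touches one directory out of three; the larger the number of partitions, the larger the saving.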

Static partitions

With static partitioning, the user specifies the partition value manually when loading the data.

1. Create a table with a single partition column:

-- Create the student table, partitioned by grade
CREATE TABLE  t_student (sid int,sname string) partitioned by(grade int)   -- partition column
row format delimited fields terminated by ',';
-- Note: the partition column cannot be a column that already exists in the table, because the partition column also appears in the table schema as a virtual column.
select * from t_student;
+----------------+------------------+------------------+
| t_student.sid  | t_student.sname  | t_student.grade  |
+----------------+------------------+------------------+
+----------------+------------------+------------------+

Create the local files

stu01.txt
1,zhangsan,1
2,lisi,1
3,wangwu,1

stu02.txt
4,zhaoliu,2
5,lvqi,2
6,maba,2

stu03.txt
7,liuyan,3
8,tangyan,3
9,jinlian,3

-- With static partitioning, the user loads the data manually and names the target partition
load  data local  inpath '/root/stu01.txt' into table t_student partition(grade=1);
load  data local  inpath '/root/stu02.txt' into table t_student partition(grade=2);
load  data local  inpath '/root/stu03.txt' into table t_student partition(grade=3);
-- Query
select * from t_student where grade=1;
+----------------+------------------+------------------+
| t_student.sid  | t_student.sname  | t_student.grade  |
+----------------+------------------+------------------+
| 1              | zhangsan         | 1                |
| 2              | lisi             | 1                |
| 3              | wangwu           | 1                |
+----------------+------------------+------------------+

Note: whatever partition a file is loaded into, its rows belong to that partition; even if the data is wrong, it is treated as part of that partition.

stu03.txt
7,liuyan,3
8,tangyan,3
9,jinlian,3
10,aaa,4

load  data local  inpath '/root/stu03.txt' overwrite into table t_student partition(grade=3);
select * from t_student where grade=3;
-- The last record says 4, but it was loaded into the grade=3 partition, so it shows up as grade 3
+----------------+------------------+------------------+
| t_student.sid  | t_student.sname  | t_student.grade  |
+----------------+------------------+------------------+
| 7              | liuyan           | 3                |
| 8              | tangyan          | 3                |
| 9              | jinlian          | 3                |
| 10             | aaa              | 3                |
+----------------+------------------+------------------+
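The result above can be modelled in a few lines. This sketch assumes the default text SerDe behaviour of reading only as many fields as the table declares (sid, sname) and ignoring any extra ones, while grade always comes from the PARTITION clause. The class StaticPartitionLoad is hypothetical, not Hive code.

```java
import java.util.Arrays;
import java.util.List;

// Simulation of LOAD DATA ... PARTITION(grade=3) into t_student(sid int, sname string).
class StaticPartitionLoad {
    static final int DECLARED_COLUMNS = 2; // sid, sname

    // Returns the row as [sid, sname, grade]
    static List<String> toRow(String line, int partitionGrade) {
        String[] fields = line.split(",", -1);
        String[] values = new String[DECLARED_COLUMNS + 1];
        for (int i = 0; i < DECLARED_COLUMNS; i++) {
            // missing fields become null; fields beyond the declared columns are never read
            values[i] = i < fields.length ? fields[i] : null;
        }
        // the partition column is taken from the PARTITION clause, not from the file
        values[DECLARED_COLUMNS] = String.valueOf(partitionGrade);
        return Arrays.asList(values);
    }

    public static void main(String[] args) {
        System.out.println(toRow("10,aaa,4", 3)); // [10, aaa, 3]
    }
}
```

The trailing 4 in the file never reaches the table, which is exactly why the row appears under grade 3.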

2. Create a table with multiple partition columns

-- Create a student table partitioned by grade and class (clazz)
CREATE TABLE  t_student02 (sid int,sname string) partitioned by(grade int,clazz int)   -- partition columns
row format delimited fields terminated by ',';

Grade 1, class 1
stu0101.txt
1,zhangsan,1,1
2,lisi,1,1

Grade 1, class 2
stu0102.txt
3,wangwu,1,2

Grade 2, class 1
stu0201.txt
4,zhangsan,2,1
5,lisi,2,1
6,maba,2,1

Grade 3, class 1
stu0301.txt
7,liuyan,3,1
8,tangyan,3,1

Grade 3, class 2
stu0302.txt
9,dalang,3,2
10,jinlian,3,2

load  data local  inpath '/root/stu0101.txt' into table t_student02 partition(grade=1,clazz=1);
load  data local  inpath '/root/stu0102.txt' into table t_student02 partition(grade=1,clazz=2);
load  data local  inpath '/root/stu0201.txt' into table t_student02 partition(grade=2,clazz=1);
load  data local  inpath '/root/stu0301.txt' into table t_student02 partition(grade=3,clazz=1);
load  data local  inpath '/root/stu0302.txt' into table t_student02 partition(grade=3,clazz=2);
select * from t_student02 where grade=3 and clazz=1;
+------------------+--------------------+--------------------+--------------------+
| t_student02.sid  | t_student02.sname  | t_student02.grade  | t_student02.clazz  |
+------------------+--------------------+--------------------+--------------------+
| 7                | liuyan             | 3                  | 1                  |
| 8                | tangyan            | 3                  | 1                  |
+------------------+--------------------+--------------------+--------------------+
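With several partition columns, each column adds one directory level, in the order declared in PARTITIONED BY. The hypothetical helper below builds such a path; it assumes simple values and ignores the escaping Hive applies to special characters in partition values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a partition directory path for a multi-column partition spec (illustration only).
class MultiPartitionPath {
    // One directory level per partition column, in PARTITIONED BY order (LinkedHashMap keeps it)
    static String partitionDir(String table, LinkedHashMap<String, Object> spec) {
        StringJoiner path = new StringJoiner("/");
        path.add(table);
        for (Map.Entry<String, Object> e : spec.entrySet()) {
            path.add(e.getKey() + "=" + e.getValue());
        }
        return path.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Object> spec = new LinkedHashMap<>();
        spec.put("grade", 3);
        spec.put("clazz", 1);
        System.out.println(partitionDir("t_student02", spec)); // t_student02/grade=3/clazz=1
    }
}
```

A filter on both columns therefore resolves to a single leaf directory, while a filter on grade alone still prunes to one grade=... subtree.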


Dynamic partitions

The main difference between static and dynamic partitioning is that static partition values are specified by hand, while dynamic partition values are determined from the data itself.

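In detail: static partitioning requires us to run LOAD by hand and name the partition each time, which gets tedious when there is a lot of data. With dynamic partitioning, the partition column values are inferred automatically from the query result, by column position. The core syntax is INSERT + SELECT.
To use dynamic partitioning, first set the following parameters in the Hive session: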

-- Session-level settings: they must be set again after reconnecting
set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict;

Other related parameters:

-- Enable dynamic partitioning (default: true since Hive 0.9.0, false in earlier versions)
set hive.exec.dynamic.partition=true;
-- nonstrict allows every partition column to be dynamic (default: strict, which requires at least one static partition column)
set hive.exec.dynamic.partition.mode=nonstrict;
-- Maximum number of dynamic partitions each mapper or reducer may create (default: 100).
-- E.g. if the source data covers one year, the day column has 365 values, so this must be set above 365; with the default of 100 the job fails with an error
set hive.exec.max.dynamic.partitions.pernode=100;
-- Maximum number of dynamic partitions one statement may create in total (default: 1000)
set hive.exec.max.dynamic.partitions=1000;
-- Maximum number of files that may be created globally (default: 100000)
set hive.exec.max.created.files=100000;
-- Whether to throw an exception when an empty partition is produced (default: false)
set hive.error.on.empty.partition=false;
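The limits above can be read as a pre-flight check. The sketch below is a hypothetical helper, not part of Hive; it only mirrors the rules as described: strict mode needs at least one static partition column, and the per-node and total dynamic partition counts must stay within their limits.

```java
// Hypothetical pre-flight check mirroring the dynamic-partition limits listed above (not part of Hive).
class DynamicPartitionLimits {
    static String check(String mode, int staticPartitionColumns, int partitionsPerNode,
                        int totalPartitions, int maxPerNode, int maxTotal) {
        if ("strict".equalsIgnoreCase(mode) && staticPartitionColumns == 0) {
            return "error: strict mode needs at least one static partition column";
        }
        if (partitionsPerNode > maxPerNode) {
            return "error: " + partitionsPerNode + " partitions on one node exceed pernode limit " + maxPerNode;
        }
        if (totalPartitions > maxTotal) {
            return "error: " + totalPartitions + " partitions exceed total limit " + maxTotal;
        }
        return "ok";
    }

    public static void main(String[] args) {
        // One year of data partitioned by day, all written by a single reducer
        System.out.println(check("nonstrict", 0, 365, 365, 100, 1000));
        // error: 365 partitions on one node exceed pernode limit 100
        System.out.println(check("nonstrict", 0, 365, 365, 400, 1000)); // ok
    }
}
```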

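Steps for loading data with dynamic partitioning

Create the data file and upload it to HDFS
Create an external table pointing at the file (it acts as a staging table)
Create the dynamically partitioned table
Query the external table and insert the data into the partitioned table dynamically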

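Create the file and upload it
student.txt
1,zhangsan,1,1
2,lisi,1,1
3,wangwu,1,2
4,zhangsan,2,1
5,lisi,2,1
6,maba,2,1
7,liuyan,3,1
8,tangyan,3,1
9,dalang,3,2
10,jinlian,3,2

# Upload the file into the /stu directory on HDFS (create the directory first so that /stu is a directory, not a file)
hdfs dfs -mkdir /stu
hdfs dfs -put student.txt /stu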

Create an external table pointing at the file (it acts as a staging table)
create external table t_stu_e(sid int,sname string,grade int,clazz int
)row format delimited fields terminated by ","
location "/stu";

Create the dynamically partitioned table
create  table  t_stu_d(sid int,sname string
)partitioned by (grade int,clazz int)
row format delimited fields terminated by ",";

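Query the external table and insert the data into the partitioned table dynamically
-- Partition values are taken by position from the last SELECT columns: first grade, then clazz
insert overwrite table t_stu_d partition (grade,clazz) select sid,sname,grade,clazz from t_stu_e;
select * from t_stu_d;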
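Because dynamic partition values are taken by position, the last two columns of the SELECT decide each row's partition. This Java simulation (a hypothetical class, not Hive's implementation) routes the student.txt rows the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulation of INSERT ... PARTITION(grade, clazz) SELECT ... routing (illustration only).
class DynamicPartitionRouting {
    // The last partitionCols.length fields of each row are the partition values,
    // matched to the PARTITION(...) columns by position, not by name.
    static Map<String, List<String>> route(List<String> selectRows, String[] partitionCols) {
        Map<String, List<String>> byPartition = new TreeMap<>();
        int p = partitionCols.length;
        for (String row : selectRows) {
            String[] f = row.split(",");
            int dataCols = f.length - p;
            StringBuilder dir = new StringBuilder();
            for (int i = 0; i < p; i++) {
                if (i > 0) {
                    dir.append('/');
                }
                dir.append(partitionCols[i]).append('=').append(f[dataCols + i]);
            }
            String data = String.join(",", Arrays.copyOfRange(f, 0, dataCols));
            byPartition.computeIfAbsent(dir.toString(), k -> new ArrayList<>()).add(data);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("1,zhangsan,1,1", "2,lisi,1,1", "3,wangwu,1,2",
                "4,zhangsan,2,1", "5,lisi,2,1", "6,maba,2,1",
                "7,liuyan,3,1", "8,tangyan,3,1", "9,dalang,3,2", "10,jinlian,3,2");
        route(rows, new String[] {"grade", "clazz"})
                .forEach((dir, data) -> System.out.println(dir + " -> " + data));
    }
}
```

Running main prints five partitions, from grade=1/clazz=1 -> [1,zhangsan, 2,lisi] to grade=3/clazz=2 -> [9,dalang, 10,jinlian], the same directories the INSERT creates. Swapping grade and clazz in the SELECT would silently swap them in the paths too.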


Bucketed tables

A bucketed table (also called a bucket table, after the word "bucket" in its DDL) is a table type designed to optimize queries.

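Partitioning offers a convenient way to isolate data and optimize queries. However, not every data set can be partitioned sensibly: a poor partitioning scheme can leave some partitions with far too much data and others with almost none. Bucketing is another technique for breaking a data set into more manageable parts.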
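Bucketing a Hive (Inceptor) table distributes its rows into multiple files according to the hash of the bucketing key (column); these files are called buckets. Buckets manage data at the granularity of files: partitioning acts on the storage path (directories), while bucketing acts on the data files.
How it works: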

bucket_num = hash_function(bucketing_column) mod num_buckets
(bucket number = hash of the bucketing column, modulo the number of buckets)
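For an INT bucketing column the formula is easy to evaluate by hand. The sketch assumes Hive's legacy bucketing scheme (bucketing_version 1), in which an INT hashes to its own value and the sign bit is masked off before the modulo; tables using bucketing_version 2 (the Hive 3 default) hash with Murmur3 instead, so their bucket numbers differ. Bucket numbers here are 0-based and the class name is hypothetical.

```java
// Bucket number for an INT bucketing column under the legacy scheme (assumption: bucketing_version 1).
class BucketNumber {
    static int bucketFor(int intKey, int numBuckets) {
        int hash = intKey;                               // assumed: an INT hashes to its own value
        return (hash & Integer.MAX_VALUE) % numBuckets;  // clear the sign bit, then take the remainder
    }

    public static void main(String[] args) {
        for (int id : new int[] {1, 24, 25, 100}) {
            System.out.println("id " + id + " -> bucket " + bucketFor(id, 24));
        }
        // id 1 -> bucket 1, id 24 -> bucket 0, id 25 -> bucket 1, id 100 -> bucket 4
    }
}
```

Masking the sign bit keeps negative keys from producing negative bucket numbers, e.g. -1 lands in bucket 7 of 24.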

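Benefits of bucketing:

Queries that filter on the bucketing column avoid a full table scan.
Bucketing tables on their join column makes joins in MapReduce programs more efficient: only matching buckets need to be joined, which reduces the number of row pairs (the Cartesian product) to compare.
Bucketed data can be sampled efficiently; with large data volumes, a sample can be used to estimate and infer the characteristics of the whole data set.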
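The join benefit can be quantified with a simple count. The sketch below (hypothetical names, INT join key assumed to hash to itself) compares the row pairs a naive nested-loop join would examine with the pairs examined when both tables are bucketed on the join key into the same number of buckets, so that bucket i only ever meets bucket i.

```java
// Counts candidate row pairs for a join on an INT key (illustration only; INT assumed to hash to itself).
class BucketJoinDemo {
    static int bucketFor(int key, int numBuckets) {
        return (key & Integer.MAX_VALUE) % numBuckets;
    }

    // Naive join: every left row is compared with every right row
    static long naivePairs(int[] left, int[] right) {
        return (long) left.length * right.length;
    }

    // Both sides bucketed on the join key into the same number of buckets:
    // equal keys always land in the same bucket, so bucket i only meets bucket i
    static long bucketedPairs(int[] left, int[] right, int numBuckets) {
        long[] l = new long[numBuckets];
        long[] r = new long[numBuckets];
        for (int k : left) l[bucketFor(k, numBuckets)]++;
        for (int k : right) r[bucketFor(k, numBuckets)]++;
        long pairs = 0;
        for (int i = 0; i < numBuckets; i++) pairs += l[i] * r[i];
        return pairs;
    }

    public static void main(String[] args) {
        int[] left = new int[1000];
        int[] right = new int[1000];
        for (int i = 0; i < 1000; i++) {
            left[i] = i + 1;
            right[i] = i + 1;
        }
        System.out.println(naivePairs(left, right));       // 1000000
        System.out.println(bucketedPairs(left, right, 4)); // 250000
    }
}
```

With evenly filled buckets the candidate pairs shrink by roughly the bucket count, and each bucket pair can be processed by a separate task.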

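Creating a bucketed table

1. Prepare person.txt and upload it to HDFS
2. Create an external table pointing at person.txt
3. Create the bucketed table
4. Query the external table and load the data into the bucketed table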


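person.txt is generated with the following Java program (redirect its output to the file, e.g. java Test02 > person.txt):
import java.util.Random;

public class Test02 {
    public static void main(String[] args) {
        Random random = new Random();
        for (int i = 1; i <= 10000; i++) {
            System.out.println(i + "," + "liuyan" + (random.nextInt(10000) + 10000));
        }
    }
}
hdfs dfs -mkdir /person
hdfs dfs -put person.txt /person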

2. Create an external table pointing at person.txt
create external table  t_person_e(id int,pname string
) row format delimited fields terminated by "," location "/person";
select  * from t_person_e;

3. Create the bucketed table (24 buckets on id; rows in each bucket sorted by pname)
create table  t_person(id int,pname string
)clustered by(id) sorted by (pname) into 24 buckets
row format delimited fields terminated by ",";

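4. Query the external table and load the data into the bucketed table
-- On Hive 1.x, run set hive.enforce.bucketing=true; first; Hive 2.0 and later always enforce bucketing on insert
insert overwrite table t_person select * from t_person_e;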
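The insert spreads the 10,000 ids over 24 buckets and sorts each bucket by pname. The simulation below assumes the legacy INT hashing (the id is its own hash), so bucket k holds the ids with id mod 24 = k: buckets 1 to 16 get 417 rows and the other eight get 416. Class and method names are hypothetical, not Hive code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Simulation of INSERT into a table CLUSTERED BY (id) SORTED BY (pname) INTO n BUCKETS.
class BucketedInsertDemo {
    static int bucketFor(int id, int numBuckets) {
        return (id & Integer.MAX_VALUE) % numBuckets; // legacy scheme assumed: INT hashes to itself
    }

    // Returns one list per bucket; each list holds "id,pname" rows sorted by pname
    static List<List<String>> bucketize(List<String> rows, int numBuckets) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numBuckets; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String row : rows) {
            int id = Integer.parseInt(row.split(",")[0]);
            buckets.get(bucketFor(id, numBuckets)).add(row);
        }
        for (List<String> bucket : buckets) {
            bucket.sort((a, b) -> a.split(",")[1].compareTo(b.split(",")[1]));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Same shape as person.txt; a fixed seed keeps the names reproducible
        Random random = new Random(42);
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) {
            rows.add(i + ",liuyan" + (random.nextInt(10000) + 10000));
        }
        List<List<String>> buckets = bucketize(rows, 24);
        for (int i = 0; i < 3; i++) {
            System.out.println("bucket " + i + ": " + buckets.get(i).size() + " rows");
        }
        // bucket 0: 416 rows, bucket 1: 417 rows, bucket 2: 417 rows
    }
}
```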

Sampling bucketed tables

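-- TABLESAMPLE is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y)
-- x is the bucket to start sampling from. E.g. for a table with 32 buckets, tablesample(bucket 3 out of 16) gives 32 / 16 = 2:
--   the buckets form groups of 16, and the sample takes bucket 3 of the first group and bucket 3 of the second group, i.e. buckets 3 and 19.
-- y must be a multiple or a factor of the table's total bucket count; Hive uses y to decide the sampling fraction.
--   tablesample(bucket 3 out of 64) gives 32 / 64 = 1/2: the 32 buckets do not even fill one group of 64, so the sample takes half of bucket 3.
select * from t_person tablesample(bucket 4 out of 12);
-- 24 / 12 = 2 buckets are sampled: with groups of 12, bucket 4 of the first group and bucket 4 of the second group, i.e. buckets 4 and 4 + 12 = 16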
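The bucket arithmetic above can be checked with a small helper. It returns the 1-based bucket numbers that TABLESAMPLE(BUCKET x OUT OF y) reads, assuming y is a factor or multiple of the bucket count as the text requires; when y exceeds the bucket count, only a fraction (numBuckets / y) of one bucket is read. The class is hypothetical, not Hive code.

```java
import java.util.ArrayList;
import java.util.List;

// Which buckets TABLESAMPLE(BUCKET x OUT OF y) reads (illustration only, not Hive code).
class TableSampleDemo {
    // Returns 1-based bucket numbers, as used in the TABLESAMPLE clause
    static List<Integer> sampledBuckets(int x, int y, int numBuckets) {
        if (x < 1 || x > y) {
            throw new IllegalArgumentException("expected 1 <= x <= y");
        }
        if (numBuckets % y != 0 && y % numBuckets != 0) {
            throw new IllegalArgumentException("this sketch only handles y as a factor or multiple of numBuckets");
        }
        List<Integer> result = new ArrayList<>();
        if (y <= numBuckets) {
            // groups of y buckets: take bucket x of every group
            for (int b = x; b <= numBuckets; b += y) {
                result.add(b);
            }
        } else {
            // the buckets do not fill one group: only part of a single bucket is read
            result.add((x - 1) % numBuckets + 1);
        }
        return result;
    }

    // Number of buckets' worth of data read: numBuckets / y
    static double bucketsRead(int y, int numBuckets) {
        return (double) numBuckets / y;
    }

    public static void main(String[] args) {
        System.out.println(sampledBuckets(3, 16, 32));                               // [3, 19]
        System.out.println(sampledBuckets(3, 64, 32) + " x " + bucketsRead(64, 32)); // [3] x 0.5
        System.out.println(sampledBuckets(4, 12, 24));                               // [4, 16]
    }
}
```

Under the legacy INT hashing, bucket numbers 4 and 16 in the clause correspond to the rows with id mod 24 equal to 3 and 15, because TABLESAMPLE counts buckets from 1.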


