
Table of Contents

Spark Load: Importing Hive Data

I. Importing Non-Partitioned Hive Table Data with Spark Load

1. On the node3 Hive client, prepare the data to load into the Hive table

2. Start Hive, then create the Hive table in the Hive client and load the data

3. Create a Hive external table in Doris

4. Create the Doris table

5. Create the Spark Load import job

6. Check the Spark Load job

7. Check the results in Doris

II. Importing Partitioned Hive Table Data with Spark Load

1. On the node3 Hive client, prepare the data to load into the Hive table

2. Create the Hive partitioned table and load the data

3. Create the Doris partitioned table

4. Create the Spark Load import job

5. Check the Spark Load job

6. Check the results in Doris


Spark Load: Importing Hive Data

I. Importing Non-Partitioned Hive Table Data with Spark Load

1. On the node3 Hive client, prepare the data to load into the Hive table

hive_data1.txt:

1,zs,18,100
2,ls,19,101
3,ww,20,102
4,ml,21,103
5,tq,22,104

2. Start Hive, then create the Hive table in the Hive client and load the data

# Configure the Hive server-side $HIVE_HOME/conf/hive-site.xml
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
Note: this property disables metastore schema version verification, which avoids errors when Doris reads the Hive external table.
# Start the Hive metastore on node1
[root@node1 ~]# hive --service metastore &
# On node3, enter the Hive client, create the table, and load the data
create table hive_tbl (id int,name string,age int,score int) row format delimited fields terminated by ',';
load data local inpath '/root/hive_data1.txt' into table hive_tbl;
# Check the data in the Hive table
hive> select * from hive_tbl;
1	zs	18	100
2	ls	19	101
3	ww	20	102
4	ml	21	103
5	tq	22	104

3. Create a Hive external table in Doris

When using Spark Load to import data from a non-partitioned Hive table into Doris, you first create a Hive external table in Doris, and then use Spark Load to load that external table's data into a Doris table.

# Create the Hive external table in Doris
CREATE EXTERNAL TABLE example_db.hive_doris_tbl
(
id INT,
name varchar(255),
age INT,
score INT
)
ENGINE=hive
properties
(
"dfs.nameservices"="mycluster",
"dfs.ha.namenodes.mycluster"="node1,node2",
"dfs.namenode.rpc-address.mycluster.node1"="node1:8020",
"dfs.namenode.rpc-address.mycluster.node2"="node2:8020",
"dfs.client.failover.proxy.provider.mycluster" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
"database" = "default",
"table" = "hive_tbl",
"hive.metastore.uris" = "thrift://node1:9083"
);

Note:

  • Creating a Hive external table in Doris does not store any data in Doris. Queries against the external table read the data from the table's HDFS path at query time, so rows newly inserted into the Hive table are also visible when the external table is queried from Doris (a short sketch of this follows the query result below).
  • If the Hive table is partitioned, simply treat the partition column as an ordinary column when creating the Hive external table in Doris.

Querying the Hive external table above returns:

mysql> select * from hive_doris_tbl;
+------+------+------+-------+
| id   | name | age  | score |
+------+------+------+-------+
|    1 | zs   |   18 |   100 |
|    2 | ls   |   19 |   101 |
|    3 | ww   |   20 |   102 |
|    4 | ml   |   21 |   103 |
|    5 | tq   |   22 |   104 |
+------+------+------+-------+
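
To illustrate the first note above: a row appended on the Hive side is readable from Doris without any extra import step. Below is a minimal sketch; the new row's values are made up for illustration.

# In the Hive client on node3: append one more (illustrative) row to the underlying table
hive> insert into table hive_tbl values (6,'xx',23,105);

# In the Doris MySQL client: the new row is visible through the external table right away, no load job needed
mysql> select * from hive_doris_tbl where id = 6;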

4. Create the Doris table

# Create the Doris table
create table spark_load_t2(
id int,
name varchar(255),
age int,
score double
) 
ENGINE = olap
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(`id`) BUCKETS 8;

5. Create the Spark Load import job

When the Spark Load job runs, it is converted into a Spark job that connects to Hive to import the data. This means hive-site.xml must be present in $SPARK_HOME/conf/ on the Spark client nodes (node1-node3) so that Hive can be found, and connecting to Hive also requires the MySQL connector, so mysql-connector-java-5.1.47.jar must be present under $HADOOP_HOME/share/hadoop/yarn/lib on every Yarn NodeManager node.

# Distribute the Hive client's hive-site.xml to $SPARK_HOME/conf on the Spark client nodes (node1-node3)
[root@node3 ~]# scp /software/hive-3.1.3/conf/hive-site.xml  node1:/software/spark-2.3.1/conf/
[root@node3 ~]# scp /software/hive-3.1.3/conf/hive-site.xml  node2:/software/spark-2.3.1/conf/
[root@node3 ~]# cp /software/hive-3.1.3/conf/hive-site.xml  /software/spark-2.3.1/conf/
# Distribute mysql-connector-java-5.1.47.jar to $HADOOP_HOME/share/hadoop/yarn/lib on each NodeManager node
[root@node3 ~]# cp /software/hive-3.1.3/lib/mysql-connector-java-5.1.47.jar /software/hadoop-3.3.3/share/hadoop/yarn/lib/
[root@node3 ~]# scp /software/hive-3.1.3/lib/mysql-connector-java-5.1.47.jar node4:/software/hadoop-3.3.3/share/hadoop/yarn/lib/
[root@node3 ~]# scp /software/hive-3.1.3/lib/mysql-connector-java-5.1.47.jar node5:/software/hadoop-3.3.3/share/hadoop/yarn/lib/

Write the Spark Load job as follows:

LOAD LABEL example_db.label2
(
DATA FROM TABLE hive_doris_tbl
INTO TABLE spark_load_t2
)
WITH RESOURCE 'spark1'
(
"spark.executor.memory" = "1g",
"spark.shuffle.compress" = "true"
)
PROPERTIES
(
"timeout" = "3600"
);

6. Check the Spark Load job

Log in to the Yarn Web UI to check the execution of the corresponding application.

Run the following command to check the status of the Spark Load job:

mysql> show load order by createtime desc limit 1\G;
*************************** 1. row ***************************
         JobId: 37128
         Label: label2
         State: FINISHED
      Progress: ETL:100%; LOAD:100%
          Type: SPARK
       EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=0
      TaskInfo: cluster:spark1; timeout(s):3600; max_filter_ratio:0.0
      ErrorMsg: NULL
    CreateTime: 2023-03-10 18:13:19
  EtlStartTime: 2023-03-10 18:13:34
 EtlFinishTime: 2023-03-10 18:15:27
 LoadStartTime: 2023-03-10 18:15:27
LoadFinishTime: 2023-03-10 18:15:30
           URL: http://node1:8088/proxy/application_1678424784452_0007/
    JobDetails: {"Unfinished backends":{"0-0":[]},"ScannedRows":0,"TaskNumber":1,"LoadBytes":0,"All backends":{"0-0":[-1]},"FileNumber":0,"FileSize":0}
 TransactionId: 24081
  ErrorTablets: {}
1 row in set (0.00 sec)


7. Check the results in Doris

mysql> select * from spark_load_t2;
+------+------+------+-------+
| id   | name | age  | score |
+------+------+------+-------+
|    5 | tq   |   22 |   104 |
|    4 | ml   |   21 |   103 |
|    1 | zs   |   18 |   100 |
|    3 | ww   |   20 |   102 |
|    2 | ls   |   19 |   101 |
+------+------+------+-------+

II. Importing Partitioned Hive Table Data with Spark Load

When importing data from a partitioned Hive table into a corresponding Doris partitioned table, the Hive-external-table approach above cannot be used, because the Hive partition column becomes an ordinary column in the external table. Instead, Spark Load reads the partitioned Hive table's files directly from their HDFS paths and loads the data into the Doris partitioned table.

1. On the node3 Hive client, prepare the data to load into the Hive table

hive_data2.txt:

1,zs,18,100,2023-03-01
2,ls,19,200,2023-03-01
3,ww,20,300,2023-03-02
4,ml,21,400,2023-03-02
5,tq,22,500,2023-03-02

2. Create the Hive partitioned table and load the data

# On node3, enter the Hive client, create the table, and load the data
create table hive_tbl2 (id int, name string,age int,score int) partitioned by (dt string) row format delimited fields terminated by ',';
load data local inpath '/root/hive_data2.txt' into table hive_tbl2;
# Check the data in the Hive table
hive> select * from hive_tbl2;
OK
1	zs	18	100	2023-03-01
2	ls	19	200	2023-03-01
3	ww	20	300	2023-03-02
4	ml	21	400	2023-03-02
5	tq	22	500	2023-03-02
hive> show partitions hive_tbl2;
OK
dt=2023-03-01
dt=2023-03-02

Once hive_tbl2 has been created and loaded, its storage layout in HDFS looks like the following:

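A minimal sketch of that layout, assuming the default Hive warehouse location (which matches the path used in the load statement in step 4):

# List the table's warehouse directory on HDFS
[root@node3 ~]# hdfs dfs -ls /user/hive/warehouse/hive_tbl2
# Expected: one subdirectory per partition value of dt
# /user/hive/warehouse/hive_tbl2/dt=2023-03-01
# /user/hive/warehouse/hive_tbl2/dt=2023-03-02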

3. Create the Doris partitioned table

create table spark_load_t3(
dt date,
id int,
name varchar(255),
age int,
score double
) 
ENGINE = olap
DUPLICATE KEY(dt,id)
PARTITION BY RANGE(`dt`)
(
PARTITION `p1` VALUES [("2023-03-01"),("2023-03-02")),
PARTITION `p2` VALUES [("2023-03-02"),("2023-03-03"))
)
DISTRIBUTED BY HASH(`id`) BUCKETS 8;

4. Create the Spark Load import job

As in the non-partitioned case, the Spark Load job is converted into a Spark job that connects to Hive, so hive-site.xml must be present in $SPARK_HOME/conf/ on the Spark client nodes (node1-node3), and the MySQL connector mysql-connector-java-5.1.47.jar must be present under $HADOOP_HOME/share/hadoop/yarn/lib on every Yarn NodeManager node.

# Distribute the Hive client's hive-site.xml to $SPARK_HOME/conf on the Spark client nodes (node1-node3)
[root@node3 ~]# scp /software/hive-3.1.3/conf/hive-site.xml  node1:/software/spark-2.3.1/conf/
[root@node3 ~]# scp /software/hive-3.1.3/conf/hive-site.xml  node2:/software/spark-2.3.1/conf/
[root@node3 ~]# cp /software/hive-3.1.3/conf/hive-site.xml  /software/spark-2.3.1/conf/
# Distribute mysql-connector-java-5.1.47.jar to $HADOOP_HOME/share/hadoop/yarn/lib on each NodeManager node
[root@node3 ~]# cp /software/hive-3.1.3/lib/mysql-connector-java-5.1.47.jar /software/hadoop-3.3.3/share/hadoop/yarn/lib/
[root@node3 ~]# scp /software/hive-3.1.3/lib/mysql-connector-java-5.1.47.jar node4:/software/hadoop-3.3.3/share/hadoop/yarn/lib/
[root@node3 ~]# scp /software/hive-3.1.3/lib/mysql-connector-java-5.1.47.jar node5:/software/hadoop-3.3.3/share/hadoop/yarn/lib/

Write the Spark Load job as follows:

LOAD LABEL example_db.label3
(
DATA INFILE("hdfs://node1:8020/user/hive/warehouse/hive_tbl2/dt=2023-03-02/*")
INTO TABLE spark_load_t3
COLUMNS TERMINATED BY ","
FORMAT AS "csv"
(id,name,age,score)
COLUMNS FROM PATH AS (dt)
SET
(
dt=dt,
id=id,
name=name,
age=age
)
)
WITH RESOURCE 'spark1'
(
"spark.executor.memory" = "1g",
"spark.shuffle.compress" = "true"
)
PROPERTIES
(
"timeout" = "3600"
);

Note:

  • The HDFS path above does not support HA mode; the active NameNode must be specified explicitly.
  • The partition directory in the HDFS path must be written out; it cannot be replaced with *. This differs from Broker Load.
  • A known issue in the version tested here: when DATA INFILE specifies multiple paths, sometimes only the data from the first path is imported (see the sketch after this list for loading the remaining partition with a separate job).
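
Given the last caveat, a safer way to also load the dt=2023-03-01 partition is to submit a second Spark Load job with its own label rather than listing multiple paths in one DATA INFILE. A minimal sketch that reuses the job above; the label example_db.label4 is illustrative:

LOAD LABEL example_db.label4
(
-- illustrative label; only the partition path differs from the job above
DATA INFILE("hdfs://node1:8020/user/hive/warehouse/hive_tbl2/dt=2023-03-01/*")
INTO TABLE spark_load_t3
COLUMNS TERMINATED BY ","
FORMAT AS "csv"
(id,name,age,score)
COLUMNS FROM PATH AS (dt)
SET
(
dt=dt,
id=id,
name=name,
age=age
)
)
WITH RESOURCE 'spark1'
(
"spark.executor.memory" = "1g",
"spark.shuffle.compress" = "true"
)
PROPERTIES
(
"timeout" = "3600"
);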

5. Check the Spark Load job

Run the following command to check the status of the Spark Load job:

mysql> show load order by createtime desc limit 1\G;   
*************************** 1. row ***************************
         JobId: 39432
         Label: label3
         State: FINISHED
      Progress: ETL:100%; LOAD:100%
          Type: SPARK
       EtlInfo: unselected.rows=0; dpp.abnorm.ALL=0; dpp.norm.ALL=3
      TaskInfo: cluster:spark1; timeout(s):3600; max_filter_ratio:0.0
      ErrorMsg: NULL
    CreateTime: 2023-03-10 20:11:19
  EtlStartTime: 2023-03-10 20:11:36
 EtlFinishTime: 2023-03-10 20:12:21
 LoadStartTime: 2023-03-10 20:12:21
LoadFinishTime: 2023-03-10 20:12:22
           URL: http://node1:8088/proxy/application_1678443952851_0026/
    JobDetails: {"Unfinished backends":{"0-0":[]},"ScannedRows":3,"TaskNumber":1,"LoadBytes":0,"All backends":{"0-0":[-1]},"FileNumber":2,"FileSize":60}
 TransactionId: 25529
  ErrorTablets: {}
1 row in set (0.02 sec)

6. Check the results in Doris

mysql> select * from spark_load_t3;
+------------+------+------+------+-------+
| dt         | id   | name | age  | score |
+------------+------+------+------+-------+
| 2023-03-02 |    3 | ww   |   20 |   300 |
| 2023-03-02 |    4 | ml   |   21 |   400 |
| 2023-03-02 |    5 | tq   |   22 |   500 |
+------------+------+------+------+-------+
