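Limitations
- Supports Iceberg V1/V2 table formats.
- Supports Position Delete.
- Equality Delete is supported starting from version 2.1.3.
- Supports the Parquet file format.
- The ORC file format is supported starting from version 2.1.3.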
Creating a Catalog
Creating a Catalog Based on Hive Metastore
This is essentially the same as a Hive Catalog, so only a simple example is given here. For other examples, see Hive Catalog.
CREATE CATALOG iceberg PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices' = 'your-nameservice',
    'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Creating a Catalog Based on the Iceberg API
With this approach, metadata is accessed through the Iceberg API. Services such as Hadoop File System, Hive, REST, Glue, and DLF can serve as the Iceberg Catalog.
Hadoop Catalog
Note: The warehouse path must point to the parent directory of the Database path. For example, if your table path is s3://bucket/path/to/db1/table1, then warehouse should be s3://bucket/path/to/
CREATE CATALOG iceberg_hadoop PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'hdfs://your-host:8020/dir/key'
);
CREATE CATALOG iceberg_hadoop_ha PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 'hdfs://your-nameservice/dir/key',
    'dfs.nameservices' = 'your-nameservice',
    'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
CREATE CATALOG iceberg_s3 PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3://bucket/dir/key',
    's3.endpoint' = 's3.us-east-1.amazonaws.com',
    's3.access_key' = 'ak',
    's3.secret_key' = 'sk'
);
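To tie the warehouse path rule for Hadoop Catalogs to a concrete statement, here is a minimal sketch. It assumes a table stored at s3://bucket/path/to/db1/table1; the catalog name iceberg_s3_example is hypothetical, and the endpoint and credentials are placeholders reused from the S3 statement in this section.

```sql
-- Assumed layout: the table lives at s3://bucket/path/to/db1/table1,
-- so the warehouse is the directory that contains db1.
CREATE CATALOG iceberg_s3_example PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hadoop',
    'warehouse' = 's3://bucket/path/to/',
    's3.endpoint' = 's3.us-east-1.amazonaws.com',
    's3.access_key' = 'ak',
    's3.secret_key' = 'sk'
);

-- With that warehouse, the table should be addressable as catalog.database.table.
SELECT * FROM iceberg_s3_example.db1.table1 LIMIT 10;
```

If the warehouse pointed at s3://bucket/path/to/db1/ instead, the directory one level down would be treated as the database, and the table would likely not be found under the expected name.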
Hive Metastore
CREATE CATALOG iceberg PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices' = 'your-nameservice',
    'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
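AWS Glue
When connecting to Glue from a non-EC2 environment, copy the ~/.aws directory from an EC2 environment into the current environment. Alternatively, download the AWS CLI tool and configure it; this also creates a .aws directory under the current user's home directory. Upgrade to Doris 2.1.7 or 3.0.3 and later to use this feature.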
-- Using access key and secret key
CREATE CATALOG glue2 PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "glue",
    "glue.endpoint" = "https://glue.us-east-1.amazonaws.com/",
    "client.credentials-provider" = "com.amazonaws.glue.catalog.credentials.ConfigAWSProvider",
    "client.credentials-provider.glue.access_key" = "ak",
    "client.credentials-provider.glue.secret_key" = "sk"
);
- For details on Iceberg properties, see Iceberg Glue Catalog.
- If client.credentials-provider is not specified, Doris uses the default DefaultAWSCredentialsProviderChain, which reads the properties configured in system environment variables or in the InstanceProfile.
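As a sketch of the default-credentials case, the statement below simply omits the client.credentials-provider properties. The catalog name glue_default is hypothetical, and the statement assumes that valid credentials are already available through environment variables or an instance profile on the Doris nodes.

```sql
-- No client.credentials-provider is set, so Doris is expected to fall back to
-- DefaultAWSCredentialsProviderChain (environment variables or InstanceProfile).
CREATE CATALOG glue_default PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "glue",
    "glue.endpoint" = "https://glue.us-east-1.amazonaws.com/"
);
```

This variant keeps access keys out of the catalog definition, which may be preferable when Doris runs on EC2 with an attached role.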
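Alibaba Cloud DLF
See the Alibaba Cloud DLF Catalog configuration.
REST Catalog
This approach requires a REST service to be available in advance; users need to implement the REST interface for retrieving Iceberg metadata.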
CREATE CATALOG iceberg PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://172.21.0.1:8181'
);
If the data is stored on HDFS with high availability (HA) enabled, the HDFS HA configuration must also be added to the Catalog:
CREATE CATALOG iceberg PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'rest',
    'uri' = 'http://172.21.0.1:8181',
    'dfs.nameservices' = 'your-nameservice',
    'dfs.ha.namenodes.your-nameservice' = 'nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1' = '172.21.0.1:8020',
    'dfs.namenode.rpc-address.your-nameservice.nn2' = '172.21.0.2:8020',
    'dfs.client.failover.proxy.provider.your-nameservice' = 'org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
Google Dataproc Metastore
CREATE CATALOG iceberg PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hms",
    "hive.metastore.uris" = "thrift://172.21.0.1:9083",
    "gs.endpoint" = "https://storage.googleapis.com",
    "gs.region" = "us-east-1",
    "gs.access_key" = "ak",
    "gs.secret_key" = "sk",
    "use_path_style" = "true"
);
hive.metastore.uris: The endpoint exposed by the Dataproc Metastore service. It can be obtained from the Metastore management page: Dataproc Metastore Services.
Iceberg On Object Storage
If the data is stored on S3, the following parameters can be used in properties:
"s3.access_key" = "ak"
"s3.secret_key" = "sk"
"s3.endpoint" = "s3.us-east-1.amazonaws.com"
"s3.region" = "us-east-1"
If the data is stored on Alibaba Cloud OSS:
"oss.access_key" = "ak"
"oss.secret_key" = "sk"
"oss.endpoint" = "oss-cn-beijing-internal.aliyuncs.com"
"oss.region" = "oss-cn-beijing"
If the data is stored on Tencent Cloud COS:
"cos.access_key" = "ak"
"cos.secret_key" = "sk"
"cos.endpoint" = "cos.ap-beijing.myqcloud.com"
"cos.region" = "ap-beijing"
If the data is stored on Huawei Cloud OBS:
"obs.access_key" = "ak"
"obs.secret_key" = "sk"
"obs.endpoint" = "obs.cn-north-4.myhuaweicloud.com"
"obs.region" = "cn-north-4"
Example
-- MinIO & REST Catalog
CREATE CATALOG `iceberg` PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "uri" = "http://10.0.0.1:8181",
    "warehouse" = "s3://bucket",
    "token" = "token123456",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "s3.endpoint" = "http://10.0.0.1:9000",
    "s3.region" = "us-east-1"
);
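The other object storage parameter sets can be combined with a catalog definition in the same way. The following is a minimal sketch for Alibaba Cloud OSS, assuming a Hive Metastore backed Iceberg catalog; the catalog name iceberg_oss and the Metastore address are illustrative placeholders, not values from a real deployment.

```sql
-- Hypothetical catalog: Iceberg metadata in Hive Metastore, data files on OSS.
CREATE CATALOG iceberg_oss PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hms",
    "hive.metastore.uris" = "thrift://172.21.0.1:7004",
    "oss.access_key" = "ak",
    "oss.secret_key" = "sk",
    "oss.endpoint" = "oss-cn-beijing-internal.aliyuncs.com",
    "oss.region" = "oss-cn-beijing"
);
```

For COS or OBS, the same shape should apply with the cos.* or obs.* parameters substituted.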
Column Type Mapping
Iceberg Type | Doris Type |
---|---|
boolean | boolean |
int | int |
long | bigint |
float | float |
double | double |
decimal(p,s) | decimal(p,s) |
date | date |
uuid | string |
timestamp (Timestamp without timezone) | datetime(6) |
timestamptz (Timestamp with timezone) | datetime(6) |
string | string |
fixed(L) | char(L) |
binary | string |
struct | struct (supported since version 2.1.3) |
map | map (supported since version 2.1.3) |
list | array |
time | Not supported |
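Time Travel
Doris supports reading a specified Snapshot of an Iceberg table.
Every write operation on an Iceberg table produces a new snapshot.
By default, a read request only reads the latest snapshot.
You can use the FOR TIME AS OF and FOR VERSION AS OF clauses to read historical data by the time a snapshot was created or by snapshot ID, respectively. For example: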
SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";
SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;
In addition, the iceberg_meta table function can be used to query the snapshot information of a specified table.
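As a rough sketch of how the two features fit together, the query below lists snapshots through iceberg_meta, and a snapshot ID taken from its output can then be used with FOR VERSION AS OF. The fully qualified name iceberg.db1.iceberg_tbl is an assumption for illustration; check the iceberg_meta reference for the exact parameters supported by your Doris version.

```sql
-- List the snapshots of an Iceberg table (catalog.database.table is assumed here).
SELECT * FROM iceberg_meta(
    "table" = "iceberg.db1.iceberg_tbl",
    "query_type" = "snapshots"
);

-- Read the table as of one of the listed snapshots; the ID is the one
-- used in the Time Travel example and stands in for a real snapshot ID.
SELECT * FROM iceberg.db1.iceberg_tbl FOR VERSION AS OF 868895038966572;
```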