威海網(wǎng)絡(luò)公司/時空seo助手
DataX can be thought of as an optimized alternative to Sqoop:
it is generally faster,
because Sqoop launches MapReduce map tasks under the hood, while DataX transfers data in memory within a single process.
DataX is an offline synchronization tool for heterogeneous data sources. It aims to provide stable and efficient data synchronization between relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and other heterogeneous sources.
DataX has you write JSON;
Flume has you write conf files;
Azkaban has you write flow files;
Sqoop has you write commands.
Importing MySQL data into HDFS
{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "connection": [
              {
                "querySql": ["select empno,ename,job,hiredate,sal from emp;"],
                "jdbcUrl": ["jdbc:mysql://bigdata01:3306/sqoop"]
              }
            ]
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "defaultFS": "hdfs://bigdata01:9820",
            "path": "/datax/emp",
            "fileName": "emp",
            "column": [
              { "name": "empno", "type": "int" },
              { "name": "ename", "type": "string" },
              { "name": "job", "type": "string" },
              { "name": "hiredate", "type": "string" },
              { "name": "sal", "type": "double" }
            ],
            "fileType": "text",
            "writeMode": "append",
            "fieldDelimiter": "\t"
          }
        }
      }
    ]
  }
}
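A job file like the one above is submitted with DataX's entry script, e.g. `python bin/datax.py mysql2hdfs.json` (the file name is an assumption). Before submitting, a quick structural sanity check can catch typos in the plugin names; a minimal sketch:

```python
import json

# Minimal sanity check of a DataX job file before submitting it.
# The config below is a trimmed-down version of the job above.
job = json.loads("""
{"job": {"setting": {"speed": {"channel": 1}},
         "content": [{"reader": {"name": "mysqlreader"},
                      "writer": {"name": "hdfswriter"}}]}}
""")

content = job["job"]["content"][0]
# Plugin names must match DataX's installed reader/writer plugins exactly.
assert content["reader"]["name"] == "mysqlreader"
assert content["writer"]["name"] == "hdfswriter"
# "channel" controls how many parallel transfer channels DataX opens.
print(job["job"]["setting"]["speed"]["channel"])  # → 1
```
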
Importing HDFS data into MySQL
{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "path": "/datax/emp/*",
            "defaultFS": "hdfs://bigdata01:9820",
            "column": [
              { "index": 0, "type": "string" },
              { "index": 1, "type": "string" },
              { "index": 2, "type": "string" },
              { "index": 3, "type": "string" },
              { "index": 4, "type": "string" }
            ],
            "fileType": "text",
            "encoding": "UTF-8",
            "fieldDelimiter": "\t"
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "writeMode": "replace",
            "username": "root",
            "password": "123456",
            "column": ["empno", "ename", "job", "hiredate", "sal"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://bigdata01:3306/sqoop",
                "table": ["eemmpp"]
              }
            ]
          }
        }
      }
    ]
  }
}
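In this direction, hdfsreader addresses fields by position ("index"), and mysqlwriter maps them, in order, onto the listed MySQL columns. A sketch of that positional pairing:

```python
# hdfsreader's "index" entries line up positionally with mysqlwriter's
# "column" list; field 0 lands in the first column, and so on.
hdfs_indexes = [0, 1, 2, 3, 4]                                # reader side
mysql_columns = ["empno", "ename", "job", "hiredate", "sal"]  # writer side

mapping = dict(zip(hdfs_indexes, mysql_columns))
print(mapping[4])  # → sal
```

Because the mapping is purely positional, reordering either list silently shuffles data into the wrong columns, so the two lists must stay in the same order.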
Usage notes
Points to watch:
1) When specifying field types, DataX uses only its own small set of internal types (long, double, string, boolean, date, bytes), not the full Java type system.
2) If "fieldDelimiter": "xxx" is omitted or left unset, the default delimiter is ','.
Importing MySQL data into Hive (important)
Although this is described as importing MySQL data into Hive, it is essentially still importing MySQL data into HDFS:
first create a Hive table whose location points to an HDFS path, then import the MySQL data into that path.
1) First, create a Hive external table
create external table if not exists ods_01_base_area (
  id int COMMENT 'id identifier',
  area_code string COMMENT 'province code',
  province_name string COMMENT 'province name',
  iso string COMMENT 'ISO code'
)
row format delimited fields terminated by ','
stored as TextFile
location '/data/nshop/ods/ods_01_base_area/'; -- points at the HDFS path
2) Import the MySQL data into HDFS with DataX
Watch the path and delimiter! The path must match the one given when the Hive table was created, and the delimiter must match as well.
{
  "job": {
    "setting": { "speed": { "channel": 1 } },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "column": ["id", "area_code", "province_name", "iso"],
            "splitPk": "id",
            "connection": [
              {
                "table": ["base_area"],
                "jdbcUrl": ["jdbc:mysql://bigdata01:3306/datax"]
              }
            ]
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "defaultFS": "hdfs://bigdata01:9820",
            "path": "/data/nshop/ods/ods_01_base_area/",
            "fileName": "base_area",
            "column": [
              { "name": "id", "type": "int" },
              { "name": "area_code", "type": "string" },
              { "name": "province_name", "type": "string" },
              { "name": "iso", "type": "string" }
            ],
            "fileType": "text",
            "writeMode": "append",
            "fieldDelimiter": ","
          }
        }
      }
    ]
  }
}
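The warning above (writer path and fieldDelimiter must match the Hive DDL) lends itself to an automated check; a hedged sketch, with the DDL reduced to the two relevant clauses:

```python
# Check that the DataX hdfswriter settings agree with the Hive DDL.
# hive_ddl holds just the delimiter and location clauses from the DDL above.
hive_ddl = ("row format delimited fields terminated by ',' "
            "location '/data/nshop/ods/ods_01_base_area/'")
writer = {"path": "/data/nshop/ods/ods_01_base_area/",
          "fieldDelimiter": ","}

# A mismatch here means Hive would mis-parse (or not see) the loaded files.
assert "terminated by '%s'" % writer["fieldDelimiter"] in hive_ddl
assert writer["path"] in hive_ddl
print("writer config matches Hive DDL")  # → writer config matches Hive DDL
```
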