DataX can be thought of as an optimized take on Sqoop, and it is faster: Sqoop runs as MapReduce map tasks under the hood, while DataX runs in memory.

DataX is an offline synchronization tool for heterogeneous data sources. It aims to provide stable, efficient data synchronization between relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and other heterogeneous sources.
DataX has you write JSON
Flume has you write conf files
Azkaban has you write flow files
Sqoop has you write commands
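Whichever job file you write, a DataX job is launched by handing the JSON to the datax.py entry script. A minimal sketch (the install path and job file name below are placeholder assumptions):

python /opt/module/datax/bin/datax.py mysql2hdfs.json   # datax.py lives under <DATAX_HOME>/bin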
Importing MySQL data into HDFS
{"job": {"setting": {"speed": {"channel": 1}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","connection": [{"querySql": ["select empno,ename,job,hiredate,sal from emp;"],"jdbcUrl": ["jdbc:mysql://bigdata01:3306/sqoop"]}]}},"writer": {"name": "hdfswriter","parameter": {"defaultFS": "hdfs://bigdata01:9820","path": "/datax/emp","fileName": "emp","column": [{"name": "empno", "type": "int"},{"name": "ename", "type": "string"},{"name": "job", "type": "string"},{"name": "hiredate", "type": "string"},{"name": "sal", "type": "double"}],"fileType": "text","writeMode": "append","fieldDelimiter": "\t"}}}]}
}
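Assuming the job above is saved as mysql2hdfs.json (a hypothetical name), you can run it and inspect the output as sketched below; hdfswriter should produce files under /datax/emp named emp plus a random suffix:

python /opt/module/datax/bin/datax.py mysql2hdfs.json
hdfs dfs -cat /datax/emp/*   # tab-delimited rows, matching fieldDelimiter "\t"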
Importing HDFS data into MySQL
{"job": {"setting": {"speed": {"channel": 1}},"content": [{"reader": {"name": "hdfsreader","parameter": {"path": "/datax/emp/*","defaultFS": "hdfs://bigdata01:9820","column":[ {"index": 0, "type": "string"},{"index": 1, "type": "string"},{"index": 2, "type": "string"},{"index": 3, "type": "string"},{"index": 4, "type": "string"}],"fileType": "text","encoding": "UTF-8","fieldDelimiter": "\t"}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "replace","username": "root","password": "123456","column": ["empno", "ename", "job", "hiredate", "sal"],"connection": [{"jdbcUrl": "jdbc:mysql://bigdata01:3306/sqoop","table": ["eemmpp"]}]}}}]}
}
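The target table eemmpp must already exist in MySQL, and mysqlwriter's "replace" writeMode issues REPLACE INTO statements, which only deduplicate when the table has a primary or unique key. A minimal sketch of a compatible table (this DDL is an assumption; only the table and column names come from the job above):

mysql -uroot -p123456 -e "
create table if not exists sqoop.eemmpp (
  empno    int primary key,  -- a primary key is assumed so replace mode can overwrite duplicates
  ename    varchar(100),
  job      varchar(100),
  hiredate varchar(20),
  sal      double
);"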
Usage notes:
1) When specifying column types, DataX only has a small set of its own types (internally long, double, string, date, bool, bytes, which writers map to names like int, string, double as in the jobs above), not the full Java type system.
2) If "fieldDelimiter" is not specified, the default delimiter is ','.
Importing MySQL data into Hive (important)
Although this is described as importing MySQL data into Hive, in essence it is still importing MySQL data into HDFS: first create a Hive table whose location points at an HDFS path, then import the MySQL data into that path.
1) First, create the Hive table
create external table if not exists ods_01_base_area (
  id            int    COMMENT 'id',
  area_code     string COMMENT 'province code',
  province_name string COMMENT 'province name',
  iso           string COMMENT 'ISO code'
)
row format delimited fields terminated by ','
stored as textfile
location '/data/nshop/ods/ods_01_base_area/';  -- points at the HDFS path
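To confirm the table really points at the intended HDFS path, something like the following works (assuming the hive CLI is on the PATH):

hive -e "desc formatted ods_01_base_area;" | grep -i location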
2) Import the MySQL data into HDFS with DataX
Pay attention to the path and the delimiter! The path must be identical to the one given when creating the Hive table, and the delimiter must match as well.
{"job": {"setting": {"speed": {"channel": 1}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","column": ["id","area_code","province_name","iso"],"splitPk": "id","connection": [{"table": ["base_area"],"jdbcUrl": ["jdbc:mysql://bigdata01:3306/datax"]}]}},"writer": {"name": "hdfswriter","parameter": {"defaultFS": "hdfs://bigdata01:9820","path": "/data/nshop/ods/ods_01_base_area/","fileName": "base_area","column": [{"name": "id","type": "int"},{"name": "area_code","type": "string"},{"name": "province_name","type": "string"},{"name": "iso","type": "string"}],"fileType": "text","writeMode": "append","fieldDelimiter": ","}}}]}
}
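With the table and the job in place, run the job and query the table through Hive; the external table picks the files up directly. Note that splitPk only takes effect when channel is greater than 1, so with channel 1 here it is harmless but unused. The job file name below is a placeholder:

python /opt/module/datax/bin/datax.py mysql2hive.json
hive -e "select * from ods_01_base_area limit 5;"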