1. Introduction to DataX
Overview
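DataX is an offline data synchronization tool/platform widely used within Alibaba Group. It provides efficient data synchronization between heterogeneous data sources, including MySQL, Oracle, SqlServer, PostgreSQL, HDFS, Hive, ADS, HBase, TableStore (OTS), MaxCompute (ODPS) and DRDS.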
Supported data sources
2. Architecture
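To solve the problem of synchronizing heterogeneous data sources, DataX turns a complex mesh of point-to-point synchronization links into a star topology. DataX sits in the middle as the transport hub that connects every data source. To support a new data source, you only need to connect that source to DataX. It can then synchronize seamlessly with every data source DataX already supports.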
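The following sketch makes the benefit of the star topology concrete. Suppose every source type needed its own dedicated sync program for every target type. Then N sources and M targets would need N × M programs. The star model needs only N reader plugins plus M writer plugins.

The Python code below counts both and pairs a reader with a writer. The small plugin registries are hypothetical illustrations that borrow DataX plugin names such as mysqlreader and hdfswriter. They are not DataX's actual plugin-loading mechanism.

```python
"""Sketch: why a star topology needs fewer connectors than a mesh.

Assumption (illustrative, not DataX internals): in a mesh every
(source, target) pair needs its own program; in the star model each
system needs one reader plugin and/or one writer plugin.
"""
from typing import Tuple

# Hypothetical registries mapping a system name to a DataX plugin name.
READERS = {"mysql": "mysqlreader", "oracle": "oraclereader", "hdfs": "hdfsreader"}
WRITERS = {"mysql": "mysqlwriter", "oracle": "oraclewriter", "hdfs": "hdfswriter"}


def mesh_connectors(num_sources: int, num_targets: int) -> int:
    """Point-to-point: one dedicated program per (source, target) pair."""
    return num_sources * num_targets


def star_connectors(num_sources: int, num_targets: int) -> int:
    """Star: one reader per source plus one writer per target."""
    return num_sources + num_targets


def plan_job(source: str, target: str) -> Tuple[str, str]:
    """Any registered reader can be paired with any registered writer."""
    if source not in READERS:
        raise ValueError(f"no reader plugin for {source!r}")
    if target not in WRITERS:
        raise ValueError(f"no writer plugin for {target!r}")
    return READERS[source], WRITERS[target]


if __name__ == "__main__":
    n, m = len(READERS), len(WRITERS)
    print(f"mesh: {mesh_connectors(n, m)} programs, star: {star_connectors(n, m)} plugins")
    print(plan_job("mysql", "hdfs"))
```

Going from 3 to 4 systems on each side adds 7 programs in the mesh (16 - 9). In the star it adds only one reader and one writer.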
3. Installing DataX
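Installing DataX is simple: extract the archive and it is ready to use. The only setup needed is an environment variable. Typically you add DataX's bin directory, which contains datax.py, to PATH so that datax.py can be run from any directory.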
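The following is a minimal sketch of what the environment-variable step buys you: finding the launcher without typing its full path. It rests on several assumptions that do not come from the text above:

- The launcher lives at bin/datax.py inside the extracted directory.
- It is found either on PATH or through a hypothetical DATAX_HOME variable.
- It is started with an interpreter named "python". Use whichever interpreter your DataX release supports.

```python
"""Sketch: locate DataX's launcher after extracting the archive.

Assumptions: launcher at <DataX dir>/bin/datax.py, found on PATH or via
a hypothetical DATAX_HOME variable, run with an interpreter called "python".
"""
import os
import shutil
from typing import List, Mapping, Optional


def find_datax(env: Mapping[str, str]) -> Optional[str]:
    """Return the path to datax.py, preferring PATH, then DATAX_HOME."""
    # An empty or missing PATH in `env` means nothing is found this way.
    on_path = shutil.which("datax.py", path=env.get("PATH", ""))
    if on_path:
        return on_path
    # Fallback: <DATAX_HOME>/bin/datax.py (DATAX_HOME is hypothetical).
    home = env.get("DATAX_HOME")
    if home:
        candidate = os.path.join(home, "bin", "datax.py")
        if os.path.isfile(candidate):
            return candidate
    return None


def build_command(job_file: str, env: Mapping[str, str]) -> List[str]:
    """Build the argv list for running one job file with DataX."""
    launcher = find_datax(env)
    if launcher is None:
        raise FileNotFoundError("datax.py not found on PATH or under DATAX_HOME/bin")
    return ["python", launcher, job_file]


if __name__ == "__main__":
    try:
        print(build_command("MySQLToHDFS.json", os.environ))
    except FileNotFoundError as err:
        print(err)
```

The resulting list can be passed to subprocess.run. Once bin is on PATH, typing datax.py directly in the shell does the same job.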
4. Examples
- MySQL to HDFS
cd /usr/local/soft/datax/job
vim MySQLToHDFS.json
Add the following content:
{
  "job": {
    "setting": {
      "speed": {"channel": 3},
      "errorLimit": {"record": 0, "percentage": 0.02}
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "123456",
            "column": ["id", "name", "age", "sex", "clazz"],
            "splitPk": "id",
            "connection": [
              {
                "table": ["students"],
                "jdbcUrl": ["jdbc:mysql://master:3306/bigdata?characterEncoding=utf8"]
              }
            ]
          }
        },
        "writer": {
          "name": "hdfswriter",
          "parameter": {
            "defaultFS": "hdfs://master:9000",
            "fileType": "text",
            "path": "/wll/data",
            "fileName": "t1",
            "column": [
              {"name": "id", "type": "INT"},
              {"name": "name", "type": "STRING"},
              {"name": "age", "type": "INT"},
              {"name": "sex", "type": "STRING"},
              {"name": "clazz", "type": "STRING"}
            ],
            "writeMode": "truncate",
            "fieldDelimiter": ","
          }
        }
      }
    ]
  }
}
Then run the job:
datax.py MySQLToHDFS.json
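In a hand-written job file, it is easy for the reader's column list and the writer's column list to drift apart. DataX passes each record's fields from reader to writer by position.

The following is a minimal sketch for the case where you would rather generate the file from Python. It does three things:

- Rebuilds the job above as a dict.
- Derives both column lists from one shared list.
- Checks that mysqlreader and hdfswriter name the same columns in the same order.

Matching names is a convention of this sketch; DataX itself only relies on the order. The helpers build_job and check_columns are hypothetical. DataX only ever sees the resulting JSON.

```python
"""Sketch: generate the MySQL -> HDFS job file from Python.

Assumption: build_job/check_columns are illustrative helpers; values
mirror the example job. DataX itself only reads the resulting JSON.
"""
import json
from typing import Any, Dict, List, Tuple

# (column name, HDFS type) pairs: reader and writer are both derived
# from this single list, so their order cannot drift apart.
COLUMNS: List[Tuple[str, str]] = [
    ("id", "INT"),
    ("name", "STRING"),
    ("age", "INT"),
    ("sex", "STRING"),
    ("clazz", "STRING"),
]


def build_job(columns: List[Tuple[str, str]]) -> Dict[str, Any]:
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": "root",
            "password": "123456",
            "column": [name for name, _ in columns],
            "splitPk": "id",
            "connection": [{
                "table": ["students"],
                "jdbcUrl": ["jdbc:mysql://master:3306/bigdata?characterEncoding=utf8"],
            }],
        },
    }
    writer = {
        "name": "hdfswriter",
        "parameter": {
            "defaultFS": "hdfs://master:9000",
            "fileType": "text",
            "path": "/wll/data",
            "fileName": "t1",
            "column": [{"name": n, "type": t} for n, t in columns],
            "writeMode": "truncate",
            "fieldDelimiter": ",",
        },
    }
    return {
        "job": {
            "setting": {
                "speed": {"channel": 3},
                "errorLimit": {"record": 0, "percentage": 0.02},
            },
            "content": [{"reader": reader, "writer": writer}],
        }
    }


def check_columns(job: Dict[str, Any]) -> None:
    """Raise if a reader/writer pair lists different column names or order."""
    for step in job["job"]["content"]:
        src = step["reader"]["parameter"]["column"]
        dst = [c["name"] for c in step["writer"]["parameter"]["column"]]
        if src != dst:
            raise ValueError(f"reader columns {src} != writer columns {dst}")


if __name__ == "__main__":
    job = build_job(COLUMNS)
    check_columns(job)
    print(json.dumps(job, indent=2))
```

Redirect the output into the job file, then run it with datax.py as before. The script name gen_job.py below is hypothetical:

python gen_job.py > MySQLToHDFS.json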