Datax
简介
DataX 是阿里巴巴集团内被广泛使用的离线数据同步工具/平台,实现包括 MySQL、Oracle、SqlServer、Postgre、HDFS、Hive、ADS、HBase、TableStore(OTS)、MaxCompute(ODPS)、DRDS 等各种异构数据源之间高效的数据同步功能。
DataX本身作为数据同步框架,将不同数据源的同步抽象为从源头数据源读取数据的Reader插件,以及向目标端写入数据的Writer插件,理论上DataX框架可以支持任意数据源类型的数据同步工作。同时DataX插件体系作为一套生态系统, 每接入一套新数据源该新加入的数据源即可实现和现有的数据源互通。
使用说明:
DataX支持数据源链接:https://github.com/alibaba/DataX/blob/master/introduction.md
DataX安装
DataX安装
1、上传依赖、
2、解压
tar -zxvf datax.tar.gz
3、配置环境变量
vim /etc/profile
添加环境变量
source /etc/profile注意:DataX所使用的环境为Python2,Python3环境下执行会报错
DataX使用
测试案例
Mysql2Mysql
stream2stream.json
{"job": {"content": [{"reader": {"name": "streamreader","parameter": {"sliceRecordCount": 10,"column": [{"type": "long","value": "10"},{"type": "string","value": "hello,你好,世界-DataX"}]}},"writer": {"name": "streamwriter","parameter": {"encoding": "UTF-8","print": true}}}],"setting": {"speed": {"channel": 5}}}
}
channel表示 并发数
“print”: true 表示打印到控制台
“sliceRecordCount”: 10 表示每个并发生成的测试数据条数
运行
datax.py stream2stream.json
结果显示五十条数据,因为通道channel:5和"sliceRecordCount": 10
Mysql2Mysql .json
{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","column": ["id","name","age","gender","clazz","last_mod"],"where": "clazz like '文科%'","splitPk": "id","connection": [{"table": ["student"],"jdbcUrl": ["jdbc:mysql://master:3306/data?useSSL=false"]}]}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "insert","username": "root","password": "123456","column": ["id","name","age","gender","clazz","last_mod"],"connection": [{"jdbcUrl": "jdbc:mysql://master:3306/data?useSSL=false&useUnicode=true&characterEncoding=utf-8","table": ["datax_stu"]}]}}}]}
}
writeMode: insert表示数据插入 update表示根据目标表的主键进行数据的更新操作 replace 表示主键覆盖写入
Mysql2Mysql-qureySQL
{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","splitPk": "id","connection": [{"querySql": ["select t2.*,t1.total_score FROM( SELECT student_id ,sum(score) as total_score from score GROUP BY student_id ) t1 join student t2 on t1.student_id = t2.id
"],"jdbcUrl": ["jdbc:mysql://master:3306/student?useSSL=false"]}]}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "insert","username": "root","password": "123456","column": ["id","name","age","gender","clazz","last_mod","total_score"],"connection": [{"jdbcUrl": "jdbc:mysql://master:3306/data?useSSL=false&useUnicode=true&characterEncoding=utf-8","table": ["score_student"]}]}}}]}
}
Mysql2HDFS
mysql2hdfs
{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","splitPk": "id","connection": [{"querySql": ["select t2.*,t1.total_score FROM( SELECT student_id ,sum(score) as total_score from score GROUP BY student_id ) t1 join student t2 on t1.student_id = t2.id
"],"jdbcUrl": ["jdbc:mysql://master:3306/student?useSSL=false"]}]}},"writer": {"name": "hdfswriter","parameter": {"defaultFS": "hdfs://master:9000","fileType": "text","path": "/user/hive/warehouse/bigdata.db/students_scores","fileName": "students_scores","column": [{"name": "id","type": "STRING"},{"name": "name","type": "STRING"},{"name": "age","type": "INT"},{"name": "gender","type": "STRING"},{"name": "clazz","type": "STRING"},{"name": "last_mod","type": "STRING"},{"name": "total_score","type": "INT"}],"writeMode": "append","fieldDelimiter": ","}}}]}
}
-
hdfs2mysql
{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "hdfsreader","parameter": {"path": "/datax/stu_score/*","defaultFS": "hdfs://master:9000","column": [{"index": "0","type": "STRING"},{"index": "1","type": "STRING"},{"index": "2","type": "STRING"},{"index": "3","type": "STRING"},{"index": "4","type": "STRING"},{"index": "5","type": "STRING"}],"fileType": "text","encoding": "UTF-8","fieldDelimiter": ","}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "insert","username": "root","password": "123456","column": ["id","name","age","gender","clazz","total_score"],"connection": [{"jdbcUrl": "jdbc:mysql://master:3306/datax?useSSL=false&useUnicode=true&characterEncoding=utf-8","table": ["stu_score_2"]}]}}}]} }
{“message”:“No enum constant com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil.Type.INT”,“record”:[{“byteSize”
:10,“index”:0,“rawData”:“1500100001”,“type”:“STRING”},{“byteSize”:3,“index”:1,“rawData”:“施笑槐”,“type”:“STRING”}],“type”:“reader”}HDFS中读取文本数据时,所有列的类型都可以给定一个String类型,在插入数据到Mysql时,会做自动类型转换
Hbase2Mysql
-
Mysql2Hbase
{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 0,"percentage": 0.02}},"content": [{"reader": {"name": "mysqlreader","parameter": {"username": "root","password": "123456","splitPk": "id","connection": [{"querySql": ["select t2.*,t1.total_score FROM( SELECT student_id ,sum(score) as total_score from score GROUP BY student_id ) t1 join student t2 on t1.student_id = t2.id"],"jdbcUrl": ["jdbc:mysql://master:3306/student?useSSL=false"]}]}},"writer": {"name": "hbase11xwriter","parameter": {"hbaseConfig": {"hbase.zookeeper.quorum": "master:2181,node1:2181,node2:2181"},"table": "score_stu","mode": "normal","rowkeyColumn": [{"index":0,"type":"string"},{"index":-1,"type":"string","value":"_"},{"index":6,"type":"string"}],"column": [{"index":1,"name": "info:name","type": "string"},{"index":2,"name": "info:age","type": "string"},{"index":3,"name": "info:gender","type": "string"},{"index":4,"name": "info:clazz","type": "string"},{"index":6,"name": "info:total_score","type": "string"}],"encoding": "utf-8"}}}]} }
-
hbase2mysql
{"job": {"setting": {"speed": {"channel": 3},"errorLimit": {"record": 10,"percentage": 0.02}},"content": [{"reader": {"name": "hbase11xreader","parameter": {"hbaseConfig": {"hbase.zookeeper.quorum": "master:2181"},"table": "api:stu","encoding": "utf-8","mode": "normal","column": [{"name": "rowkey","type": "string"},{"name": "info: name","type": "string"},{"name": "info: age","type": "string"},{"name": "info: gender","type": "string"},{"name": "info: clazz","type": "string"}]}},"writer": {"name": "mysqlwriter","parameter": {"writeMode": "insert","username": "root","password": "123456","column": ["id","name","age","gender","clazz"],"connection": [{"jdbcUrl": "jdbc:mysql://master:3306/datax?useSSL=false&useUnicode=true&characterEncoding=utf-8","table": ["stu"]}]}}}]} }