Getting started with DataX (installation and basic usage), part 01
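- 1. Official site
- 2. Deploying the tool (by downloading the DataX package)
- 2.1 Download and extract
- 2.2 Configuration
- 2.2.1 View the configuration template
- 2.2.2 Write the JSON from the template
- 2.2.3 Start DataX
- 3. Basic DataX usage
- 3.1 mysql2stream
- 3.2 mysql2mysql
- 3.2.1 Appending a where condition
- 3.2.2 Writing the query SQL directly
- 4. Explanations
- 4.1 The setting block in the JSON
- 4.2 Parameter reference (MySQL as the example)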
1. Official site
- The address is:
https://github.com/alibaba/DataX/blob/master/userGuid.md
- Introduction: see the official repository.
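2. Deploying the tool (by downloading the DataX package)
2.1 Download and extract
- The official guide is detailed, so this is only a brief record:
Download datax.tar.gz, then extract it with: tar -zxvf datax.tar.gz
- Then check the extracted directory.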
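2.2 Configuration
2.2.1 View the configuration template
- Run the following command from the bin directory of the extracted package, where datax.py lives:
python datax.py -r streamreader -w streamwriter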
2.2.2 Write the JSON from the template
- Create the file stream2stream.json in the job directory:
cd /Users/susu/study_down/about_datax/datax/job
vim stream2stream.json
- The content of stream2stream.json is:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {"type": "long", "value": "10"},
              {"type": "string", "value": "hello,你好,世界-DataX"}
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {"encoding": "UTF-8", "print": true}
        }
      }
    ],
    "setting": {"speed": {"channel": 5}}
  }
}
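If you would rather generate the job file than type it by hand, the same structure can be built as a plain dictionary and serialized. This is only a minimal sketch: the helper names `build_stream_job` and `dump_job` are made up for illustration and are not part of DataX.

```python
import json


def build_stream_job(slice_record_count=10, channel=5):
    """Return a stream2stream job as a dict mirroring the JSON file above."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": slice_record_count,
                        "column": [
                            {"type": "long", "value": "10"},
                            {"type": "string", "value": "hello,你好,世界-DataX"},
                        ],
                    },
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {"encoding": "UTF-8", "print": True},
                },
            }],
            "setting": {"speed": {"channel": channel}},
        }
    }


def dump_job(job):
    """Serialize the job; ensure_ascii=False keeps the Chinese sample text readable."""
    return json.dumps(job, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(dump_job(build_stream_job()))
```

Redirecting the printed output into stream2stream.json should give a file equivalent to the hand-written one.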
2.2.3 Start DataX
- Run the start command from the job directory to begin the sync:
python ../bin/datax.py stream2stream.json
- When the sync finishes, check the log that DataX prints.
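The launch step above can also be scripted, for example when several job files need to run in sequence. The sketch below only wraps the same `python .../bin/datax.py <job>` command shown in this section. `build_datax_command` and `run_job` are hypothetical helper names, and the `python_bin` default assumes that `python` on your PATH is an interpreter your DataX build supports.

```python
import subprocess
from pathlib import Path


def build_datax_command(job_file, datax_home, python_bin="python"):
    """Return the argv list for: python <datax_home>/bin/datax.py <job_file>."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [python_bin, str(launcher), str(job_file)]


def run_job(job_file, datax_home, python_bin="python"):
    """Run one job; check=True raises CalledProcessError on a non-zero exit."""
    cmd = build_datax_command(job_file, datax_home, python_bin)
    return subprocess.run(cmd, check=True)
```

For example, `run_job("stream2stream.json", "/Users/susu/study_down/about_datax/datax")` would start the job written in 2.2.2, provided the working directory is the job directory.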
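3. Basic DataX usage
- My environment is limited, so the examples below use MySQL as the source. Syncing from MySQL to other databases may be covered later when there is a chance.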
3.1 mysql2stream
- First view the template with:
python datax.py -r mysqlreader -w streamwriter
- mysql2stream.json is as follows:
{
  "job": {
    "setting": {
      "speed": {"channel": 3},
      "errorLimit": {"record": 0, "percentage": 0.02}
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "susu@123",
            "column": ["dog_num", "dog_name"],
            "splitPk": "dog_num",
            "connection": [
              {
                "table": ["dog"],
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"]
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {"print": true}
        }
      }
    ]
  }
}
- Run it and check the result:
python ../bin/datax.py mysql2stream.json
3.2 mysql2mysql
- First view the template with:
python datax.py -r mysqlreader -w mysqlwriter
3.2.1 Appending a where condition
- The file mysql2mysql_where.json is as follows:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"],
                "table": ["dog"]
              }
            ],
            "username": "root",
            "password": "susu@123",
            "where": "dog_num=1000003"
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax_2",
                "table": ["dog"]
              }
            ],
            "username": "root",
            "password": "susu@123",
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {"speed": {"channel": "1"}}
  }
}
- Run it and check the result:
python ../bin/datax.py mysql2mysql_where.json
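To make "appending a where condition" concrete: the reader's column, table, and where settings end up combined into one SELECT statement. The function below is a rough illustration of that assembly and is not DataX's actual code; `build_select` is a hypothetical name.

```python
def build_select(columns, table, where=None):
    """Assemble a SELECT statement from column/table/where style settings."""
    sql = "select {} from {}".format(",".join(columns), table)
    if where:
        # The where value is appended as-is, so it must be valid SQL on its own.
        sql += " where {}".format(where)
    return sql


print(build_select(["*"], "dog", "dog_num=1000003"))
# prints: select * from dog where dog_num=1000003
```

Seen this way, the where value in the job file is simply the text that follows the WHERE keyword.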
3.2.2 Writing the query SQL directly
- Use the querySql parameter. Keep either querySql or the table/column/where settings, not both: when querySql is configured, mysqlreader ignores table, column, and where.
- The file mysql2mysql_query.json is as follows:
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"],
                "querySql": ["select t.dog_num,t.dog_name,t.db_source from dog t where dog_num=1000004"]
              }
            ],
            "username": "root",
            "password": "susu@123"
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/datax_2",
                "table": ["dog"]
              }
            ],
            "username": "root",
            "password": "susu@123",
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {"speed": {"channel": "1"}}
  }
}
- Run it and check the result:
python ../bin/datax.py mysql2mysql_query.json
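Since querySql should not be mixed with the table, column, and where settings, a small pre-flight check on the reader block can catch leftovers from an older job file. This is a sketch under that assumption only. `find_query_sql_conflicts` is a hypothetical helper, not something DataX ships.

```python
def find_query_sql_conflicts(reader_parameter):
    """List table/column/where settings that sit alongside querySql."""
    connections = reader_parameter.get("connection", [])
    if not any("querySql" in conn for conn in connections):
        return []  # no querySql, so nothing can conflict
    conflicts = [key for key in ("column", "where") if key in reader_parameter]
    if any("table" in conn for conn in connections):
        conflicts.append("table")
    return conflicts


reader_parameter = {
    "connection": [{
        "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/datax_1"],
        "querySql": ["select t.dog_num,t.dog_name,t.db_source from dog t where dog_num=1000004"],
    }],
    "username": "root",
    "password": "susu@123",
    "where": "dog_num=1000003",  # leftover from the where-based job
}
print(find_query_sql_conflicts(reader_parameter))
# prints: ['where']
```

An empty list means the reader block uses only one of the two styles.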
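4. Explanations
4.1 The setting block in the JSON
- About setting:
- setting.speed controls the concurrency.
- channel sets the number of concurrent channels.
- If print is set to true (as in stream2stream), the record is printed sliceRecordCount * channel times.
- For real transfers such as MySQL to HDFS, channel is the actual concurrency, not a count of how many times to print.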
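As a quick check of the sliceRecordCount * channel rule, the expected number of printed records can be computed from the job dictionary. The helper name `expected_stream_records` is hypothetical, and the dictionary below keeps only the fields the calculation needs.

```python
def expected_stream_records(job):
    """Return sliceRecordCount * channel for a streamreader job dict."""
    reader = job["job"]["content"][0]["reader"]["parameter"]
    # channel may be written as a number or a string in job files
    channel = int(job["job"]["setting"]["speed"]["channel"])
    return reader["sliceRecordCount"] * channel


job = {
    "job": {
        "content": [{"reader": {"name": "streamreader",
                                "parameter": {"sliceRecordCount": 10}}}],
        "setting": {"speed": {"channel": 5}},
    }
}
print(expected_stream_records(job))
# prints: 50
```

With the stream2stream.json values (sliceRecordCount 10, channel 5) the streamwriter should therefore print 50 records.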
4.2 Parameter reference (MySQL as the example)
- For the remaining parameters, see the official documentation:
https://github.com/alibaba/DataX/blob/master/mysqlreader/doc/mysqlreader.md