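The intersect component solves the private set intersection (PSI) problem in vertical federated learning.
FATE supports three intersection methods: raw, rsa, and dh. The raw method is not secure, while rsa and dh are. The dh method is a secure intersection based on symmetric encryption, and the rsa method is a secure intersection based on RSA (asymmetric encryption). The dh method is also used for secure information retrieval (SIR). FATE intersect supports a multi-host mode, in which one guest computes the intersection with multiple hosts.
The configurable hash methods are sha256, md5, and sm3. Raw intersection supports base64 encoding, and intersection with a cache is also supported.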
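To make the rsa method more concrete, the following is a minimal sketch of a blind-signature style intersection in plain Python. It is not FATE's implementation: the tiny key, the function names (first_hash, final_hash, rsa_intersection), and running both parties in one process are assumptions made purely for illustration. The two sha256 calls loosely mirror the roles of the hash_method and final_hash_method parameters.

```python
import hashlib
import math
import secrets

# Toy RSA key built from two Mersenne primes. Far too small for real use;
# chosen only so the example runs instantly.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held by the host only


def first_hash(record_id: str) -> int:
    """Map an ID to an integer modulo N (hypothetical stand-in for hash_method)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def final_hash(value: int) -> str:
    """Hash a signed value before comparison (stand-in for final_hash_method)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def random_blinding_factor() -> int:
    """Pick a random r that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def rsa_intersection(guest_ids, host_ids):
    guest_ids = list(guest_ids)
    # Guest: blind every hashed ID with r^E so the host cannot read it.
    factors = [random_blinding_factor() for _ in guest_ids]
    blinded = [first_hash(g) * pow(r, E, N) % N for g, r in zip(guest_ids, factors)]
    # Host: sign the blinded values with the private exponent.
    signed = [pow(b, D, N) for b in blinded]
    # Host: sign and final-hash its own IDs, then send the tokens to the guest.
    host_tokens = {final_hash(pow(first_hash(h), D, N)) for h in host_ids}
    # Guest: remove the blinding factor and compare tokens.
    matched = set()
    for gid, r, s in zip(guest_ids, factors, signed):
        unblinded = s * pow(r, -1, N) % N  # equals first_hash(gid)^D mod N
        if final_hash(unblinded) in host_tokens:
            matched.add(gid)
    return matched


if __name__ == "__main__":
    print(sorted(rsa_intersection(["u1", "u2", "u3"], ["u2", "u3", "u4"])))
```

In this sketch the guest learns only which of its own IDs are shared, and the host only ever sees blinded values, which is the property that separates rsa from the raw method.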
System example scripts: /data/projects/fate/examples/dsl/v2/
1. 1v1 example:
In this example the job is run on the guest and computes the intersection with the host.
Item | host | guest |
Data file name | xxl_test_host.csv | xxl_test_guest.csv |
Namespace | sp_host | sp_guest |
Table name | tb_host | tb_guest |
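Create a test directory under /data/projects/fate/examples and copy the system configuration files into it:
mkdir -p /data/projects/fate/examples/mytest
cd /data/projects/fate/examples/mytest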
cp /data/projects/fate/examples/dsl/v2/intersect/test_intersect_job_dsl.json ./
cp /data/projects/fate/examples/dsl/v2/intersect/test_intersect_job_raw_conf.json ./
1.1 Upload files
1.1.1 Create the upload script on the host:
upload_xxl_host.json
{
  "file": "/data/projects/fate/examples/mytest/xxl_test_host.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "sp_host",
  "table_name": "tb_host"
}
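xxl_test_host.csv is the data file (it must contain a header row, which is what "head": 1 declares)
namespace: the namespace of the table
table_name: the name of the table
1.1.2 Create the upload script on the guest: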
upload_xxl_guest.json
{
  "file": "/data/projects/fate/examples/mytest/xxl_test_guest.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "sp_guest",
  "table_name": "tb_guest"
}
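The host and guest upload scripts differ only in the file path, namespace, and table name, so they can be generated instead of written by hand. The sketch below is one possible way to do that. The helper names build_upload_conf and write_upload_conf are hypothetical, and the defaults simply repeat the values used in this article.

```python
import json


def build_upload_conf(file, namespace, table_name, head=1, partition=10, work_mode=0):
    """Return an upload configuration with the same keys as the scripts in this article."""
    return {
        "file": file,
        "head": head,            # 1 means the CSV has a header row
        "partition": partition,
        "work_mode": work_mode,
        "namespace": namespace,
        "table_name": table_name,
    }


def write_upload_conf(path, conf):
    """Write the configuration as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(conf, fh, indent=2)


if __name__ == "__main__":
    guest_conf = build_upload_conf(
        "/data/projects/fate/examples/mytest/xxl_test_guest.csv", "sp_guest", "tb_guest"
    )
    print(json.dumps(guest_conf, indent=2))
```

Calling write_upload_conf("upload_xxl_guest.json", guest_conf) would then produce a file equivalent to the guest script.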
1.1.3 Upload the files
# On the host:
source /data/projects/fate/bin/init_env.sh
flow data upload -c upload_xxl_host.json
# On the guest:
source /data/projects/fate/bin/init_env.sh
flow data upload -c upload_xxl_guest.json
1.2 Create the job scripts
1.2.1 Create the DSL file
test_intersect_job_dsl.json
{
  "components": {
    "reader_0": {
      "module": "Reader",
      "output": {"data": ["data"]}
    },
    "data_transform_0": {
      "module": "DataTransform",
      "input": {"data": {"data": ["reader_0.data"]}},
      "output": {"data": ["data"], "model": ["model"]}
    },
    "intersect_0": {
      "module": "Intersection",
      "input": {"data": {"data": ["data_transform_0.data"]}},
      "output": {"data": ["data"]}
    }
  }
}
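The DSL wires components together through their input references: each entry such as "reader_0.data" names the upstream component that feeds the current one. As a rough illustration, the sketch below extracts those dependencies from the DSL shown above. The function name upstream_components is hypothetical, and only "data" inputs are handled, which is all this DSL uses.

```python
# The DSL from this section, written as a Python dict.
DSL = {
    "components": {
        "reader_0": {"module": "Reader", "output": {"data": ["data"]}},
        "data_transform_0": {
            "module": "DataTransform",
            "input": {"data": {"data": ["reader_0.data"]}},
            "output": {"data": ["data"], "model": ["model"]},
        },
        "intersect_0": {
            "module": "Intersection",
            "input": {"data": {"data": ["data_transform_0.data"]}},
            "output": {"data": ["data"]},
        },
    }
}


def upstream_components(dsl):
    """Map each component name to the sorted list of components it reads data from."""
    deps = {}
    for name, spec in dsl["components"].items():
        sources = set()
        for refs in spec.get("input", {}).get("data", {}).values():
            for ref in refs:
                # "reader_0.data" -> "reader_0"
                sources.add(ref.split(".")[0])
        deps[name] = sorted(sources)
    return deps


if __name__ == "__main__":
    for component, sources in upstream_components(DSL).items():
        print(component, "<-", sources)
```

Read this way, the job is a simple chain: reader_0 feeds data_transform_0, which feeds intersect_0.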
1.2.2 Create the job configuration file
test_intersect_job_rsa_conf.json
{
  "dsl_version": 2,
  "initiator": {"role": "guest", "party_id": 9999},
  "role": {"guest": [9999], "host": [10000]},
  "component_parameters": {
    "common": {
      "intersect_0": {
        "intersect_method": "rsa",
        "sync_intersect_ids": false,
        "only_output_key": true,
        "rsa_params": {
          "hash_method": "sha256",
          "final_hash_method": "sha256",
          "split_calculation": false,
          "key_length": 2048
        }
      }
    },
    "role": {
      "guest": {
        "0": {
          "reader_0": {"table": {"name": "tb_guest", "namespace": "sp_guest"}},
          "data_transform_0": {"with_label": false, "output_format": "dense"}
        }
      },
      "host": {
        "0": {
          "reader_0": {"table": {"name": "tb_host", "namespace": "sp_host"}},
          "data_transform_0": {"with_label": false, "output_format": "dense"}
        }
      }
    }
  }
}
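Every component name used under component_parameters has to match a component declared in the DSL, and a mismatch such as intersection_0 versus intersect_0 is easy to miss by eye. The following sketch is one way to catch it before submitting. The function name unknown_components is hypothetical, and it assumes the configuration layout shown above (a "common" section plus per-role, per-party sections).

```python
def unknown_components(dsl, conf):
    """Return component names referenced in the job conf but not declared in the DSL."""
    known = set(dsl["components"])
    params = conf.get("component_parameters", {})
    referenced = set(params.get("common", {}))
    # "role" -> role name -> party index -> component name -> parameters
    for parties in params.get("role", {}).values():
        for components in parties.values():
            referenced.update(components)
    return sorted(referenced - known)


if __name__ == "__main__":
    dsl = {"components": {"reader_0": {}, "data_transform_0": {}, "intersect_0": {}}}
    conf = {
        "component_parameters": {
            "common": {"intersection_0": {"intersect_method": "rsa"}},  # deliberate mismatch
            "role": {"guest": {"0": {"reader_0": {}, "data_transform_0": {}}}},
        }
    }
    print(unknown_components(dsl, conf))
```

An empty result means every configured component exists in the DSL; it does not validate the parameter values themselves.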
1.3 Run the job (on the guest)
cd /data/projects/fate/examples/mytest
source /data/projects/fate/bin/init_env.sh
flow job submit -d test_intersect_job_dsl.json -c test_intersect_job_rsa_conf.json
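If the submission needs to be scripted, the same command can be wrapped in Python. This is only a sketch: the helper names are hypothetical, it uses nothing beyond the flow job submit -d ... -c ... form shown above, and it assumes init_env.sh has already been sourced in the shell that launches Python so that flow is on the PATH.

```python
import shutil
import subprocess


def flow_submit_command(dsl_path, conf_path):
    """Build the argument list for the submit command used in this article."""
    return ["flow", "job", "submit", "-d", dsl_path, "-c", conf_path]


def submit_job(dsl_path, conf_path):
    """Run the submit command and return whatever the CLI prints."""
    if shutil.which("flow") is None:
        raise RuntimeError("flow CLI not found on PATH; source init_env.sh first")
    completed = subprocess.run(
        flow_submit_command(dsl_path, conf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    print(" ".join(flow_submit_command("job_dsl.json", "job_conf.json")))
```

Passing the arguments as a list avoids shell quoting problems when the paths contain spaces.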
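2. 1v2 example
In this example the intersection job runs on the guest and intersects the guest data with two files on the host.
2.1 Upload files
2.1.1 Create the upload scripts on the host (two files):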
# First file: upload_xxl_host1.json
{
  "file": "/data/projects/fate/examples/mytest/1v2/xxl_test_host1.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "sp_host1",
  "table_name": "tb_host1"
}
# Second file: upload_xxl_host2.json
{
  "file": "/data/projects/fate/examples/mytest/xxl_test_host2.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "sp_host2",
  "table_name": "tb_host2"
}
2.1.2 Create the upload script on the guest:
upload_xxl_guest.json
{
  "file": "/data/projects/fate/examples/mytest/1v2/xxl_test_guest.csv",
  "head": 1,
  "partition": 10,
  "work_mode": 0,
  "namespace": "sp_guest1",
  "table_name": "tb_guest1"
}
2.1.3 Upload the files
# On the host:
source /data/projects/fate/bin/init_env.sh
cd /data/projects/fate/examples/mytest/1v2
flow data upload -c upload_xxl_host1.json
flow data upload -c upload_xxl_host2.json
# On the guest:
source /data/projects/fate/bin/init_env.sh
cd /data/projects/fate/examples/mytest/1v2
flow data upload -c upload_xxl_guest.json
2.2 Create the job scripts
This job runs on the guest, so the job scripts live on the guest side.
2.2.1 Create the DSL file
test_union_job_dsl.json
{
  "components": {
    "reader_0": {
      "module": "Reader",
      "output": {"data": ["data"]}
    },
    "reader_1": {
      "module": "Reader",
      "output": {"data": ["data"]}
    },
    "intersection_0": {
      "module": "Intersection",
      "input": {"data": {"data": ["reader_0.data"]}},
      "output": {"data": ["data"]}
    },
    "intersection_1": {
      "module": "Intersection",
      "input": {"data": {"data": ["reader_1.data"]}},
      "output": {"data": ["data"]}
    },
    "union_0": {
      "module": "Union",
      "input": {"data": {"data": ["intersection_0.data", "intersection_1.data"]}},
      "output": {"data": ["data"]}
    }
  }
}
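2.2.2 Create the job configuration file
test_union_job_conf.json
Note: the guest has only one file and the host has two, so the single guest file is intersected with each of the two host files. For this reason reader_0 and reader_1 under the guest role both read the same data from the same guest table.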
{
  "dsl_version": 2,
  "initiator": {"role": "guest", "party_id": 9999},
  "role": {"host": [10000], "guest": [9999]},
  "component_parameters": {
    "common": {
      "union_0": {"allow_missing": false, "need_run": true}
    },
    "role": {
      "guest": {
        "0": {
          "reader_0": {"table": {"name": "tb_guest1", "namespace": "sp_guest1"}},
          "reader_1": {"table": {"name": "tb_guest1", "namespace": "sp_guest1"}}
        }
      },
      "host": {
        "0": {
          "reader_0": {"table": {"name": "tb_host1", "namespace": "sp_host1"}},
          "reader_1": {"table": {"name": "tb_host2", "namespace": "sp_host2"}}
        }
      }
    }
  }
}
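The data flow of this job can be pictured with plain Python sets: the same guest table is intersected with each host table, and the two results are then merged. This is only a conceptual sketch with made-up IDs. Sets stand in for FATE tables, the function name is hypothetical, and how FATE's Union treats IDs that appear in both inputs is governed by its own parameters, which are not modeled here.

```python
def intersect_then_union(guest_ids, host1_ids, host2_ids):
    """Mimic intersection_0, intersection_1 and union_0 on ID sets."""
    guest = set(guest_ids)
    intersection_0 = guest & set(host1_ids)  # guest table vs. tb_host1
    intersection_1 = guest & set(host2_ids)  # same guest table vs. tb_host2
    union_0 = intersection_0 | intersection_1
    return intersection_0, intersection_1, union_0


if __name__ == "__main__":
    # Hypothetical IDs, used only to show the shape of the result.
    i0, i1, u = intersect_then_union(
        ["id1", "id2", "id3", "id4"],
        ["id1", "id2", "id9"],
        ["id2", "id3", "id8"],
    )
    print(sorted(i0), sorted(i1), sorted(u))
```

The final result contains every guest ID that appears in at least one of the two host files.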
2.3 Run the job (on the guest)
cd /data/projects/fate/examples/mytest/1v2
source /data/projects/fate/bin/init_env.sh
flow job submit -d test_union_job_dsl.json -c test_union_job_conf.json
When the intersection succeeds, the page displays: