SparkML
SparkML_lr_train :读取py处理后的train表用于训练,将训练模型保存好。
SparkML_lr_predict :读取训练好的模型,读取py处理后的test表用于预测。将预测结果写入normal_data中,根据id修改stream_is_normal的值。
提交spark任务
bin/spark-submit \
--class SparkML_lr_train \
--master yarn \
--deploy-mode cluster \
./SparkML_lr_train1.jar \
10bin/spark-submit \
--class SparkML_lr_train \
--master yarn \
--deploy-mode client \
./SparkML_lr_train4.jar \
10bin/spark-submit \
--class SparkML_lr_predict \
--master yarn \
--deploy-mode client \
./SparkML_lr_predict.jar \
10bin/spark-submit \
--class lr_train\
--master yarn \
--deploy-mode client \
./lr_train.jar \
10bin/spark-submit \
--class lr_predict\
--master yarn \
--deploy-mode client \
./lr_predict.jar \
10
启动hadoop(启动脚本)
hdp.sh start
启动spark(命令行启动)
sbin/start-all.sh
bin/spark-submit
–class SparkSQL_lr_train
–master yarn
–deploy-mode client
./SparkSQL_lr_train.jar
10
bin/spark-submit
–class lr_train
–master yarn
–deploy-mode client
./lr_train.jar
10