Visually tuning hyperparameters with Ray Tune and TensorFlow 2.0

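Ray Tune official documentation

Hyperparameter tuning is often the most expensive part of the machine learning workflow. Tune is designed to address this problem and offers an efficient, scalable solution to this pain point. Note that this example depends on TensorFlow 2.0.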
Code: ray/python/ray/tune at master · ray-project/ray · GitHub

Examples: https://github.com/ray-project/ray/tree/master/python/ray/tune/examples

Documentation: Tune: Scalable Hyperparameter Tuning — Ray v1.6.0

Mailing List: https://groups.google.com/forum/#!forum/ray-dev

## If you are running on Google Colab, uncomment below to install the necessary dependencies
## before beginning the exercise.

# print("Setting up colab environment")
# !pip uninstall -y -q pyarrow
# !pip install -q https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev5-cp36-cp36m-manylinux1_x86_64.whl
# !pip install -q ray[debug]

# # A hack to force the runtime to restart, needed to include the above dependencies.
# print("Done installing! Restarting via forced crash (this is not an issue).")
# import os
# os._exit(0)

## If you are running on Google Colab, please install TensorFlow 2.0 by uncommenting below.

# try:
#   # %tensorflow_version only exists in Colab.
#   %tensorflow_version 2.x
# except Exception:
#   pass

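This tutorial walks through the key steps of hyperparameter tuning with Tune:

1. Visualize the data.
2. Create a model training procedure (using Keras).
3. Tune the model by adapting that training procedure to use Tune.
4. Analyze the models created by Tune.

Note that this uses Tune's function-based API, which is mainly intended for prototyping. A later tutorial covers Tune's more powerful class-based Trainable API.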

import numpy as np
np.random.seed(0)

import tensorflow as tf
try:
    tf.get_logger().setLevel('INFO')
except Exception as exc:
    print(exc)
import warnings
warnings.simplefilter("ignore")

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD, Adam
from tensorflow.keras.callbacks import ModelCheckpoint

import ray
from ray import tune
from ray.tune.examples.utils import get_iris_data

import inspect
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
%matplotlib inline

Visualize your data

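First, let's look at the distribution of the dataset.

The Iris dataset consists of petal and sepal measurements for 3 different types of irises (Setosa, Versicolour, and Virginica), stored in a 150x4 NumPy array.

The rows are samples and the columns are: sepal length, sepal width, petal length, and petal width.

The goal of this tutorial is to produce a model that accurately predicts the true label given a 4-tuple of sepal length, sepal width, petal length, and petal width.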

from sklearn.datasets import load_iris

iris = load_iris()
true_data = iris['data']
true_label = iris['target']
names = iris['target_names']
feature_names = iris['feature_names']


def plot_data(X, y):
    # Visualize the data sets
    plt.figure(figsize=(16, 6))

    # Left panel: sepal length vs. sepal width
    plt.subplot(1, 2, 1)
    for target, target_name in enumerate(names):
        X_plot = X[y == target]
        plt.plot(X_plot[:, 0], X_plot[:, 1], linestyle='none', marker='o', label=target_name)
    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])
    plt.axis('equal')
    plt.legend();

    # Right panel: petal length vs. petal width
    plt.subplot(1, 2, 2)
    for target, target_name in enumerate(names):
        X_plot = X[y == target]
        plt.plot(X_plot[:, 2], X_plot[:, 3], linestyle='none', marker='o', label=target_name)
    plt.xlabel(feature_names[2])
    plt.ylabel(feature_names[3])
    plt.axis('equal')
    plt.legend();


plot_data(true_data, true_label)

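Create a model training procedure (using Keras)

Now, let's define a function that takes a few hyperparameters and returns a model that is ready to be trained.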

def create_model(learning_rate, dense_1, dense_2):
    assert learning_rate > 0 and dense_1 > 0 and dense_2 > 0, "Did you set the right configuration?"
    model = Sequential()
    model.add(Dense(int(dense_1), input_shape=(4,), activation='relu', name='fc1'))
    model.add(Dense(int(dense_2), activation='relu', name='fc2'))
    model.add(Dense(3, activation='softmax', name='output'))
    # `learning_rate` is the TF 2.x argument name; `lr` is only a legacy alias.
    optimizer = SGD(learning_rate=learning_rate)
    model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
    return model
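To get a feel for how `dense_1` and `dense_2` change the size of this network, the sketch below counts its trainable parameters by hand. The helper name `count_dense_params` is hypothetical (it is not part of Keras or Tune), and it assumes only that each `Dense` layer holds an inputs-by-units weight matrix plus one bias per unit.

```python
def count_dense_params(dense_1, dense_2, n_features=4, n_classes=3):
    """Count weights and biases of the fc1 -> fc2 -> output stack."""
    # create_model casts the layer widths with int(), so do the same here.
    layer_sizes = [n_features, int(dense_1), int(dense_2), n_classes]
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


print(count_dense_params(2, 2))      # a very small network: 25 parameters
print(count_dense_params(128, 128))  # a much wider network: 17539 parameters
```

The count grows roughly with the product `dense_1 * dense_2`, which is one reason the layer widths are worth searching over rather than fixing by hand.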

Below is a function that trains a model built with create_model and returns the trained model.

def train_on_iris():
    train_x, train_y, test_x, test_y = get_iris_data()
    model = create_model(learning_rate=0.1, dense_1=2, dense_2=2)

    # This saves the top model. `accuracy` is only available in TF2.0.
    checkpoint_callback = ModelCheckpoint(
        "model.h5", monitor='accuracy', save_best_only=True, save_freq=2)

    # Train the model
    model.fit(
        train_x, train_y,
        validation_data=(test_x, test_y),
        verbose=0, batch_size=10, epochs=20, callbacks=[checkpoint_callback])
    return model

Let's quickly train the model on the dataset. The accuracy should be low.

original_model = train_on_iris()  # This trains the model and returns it.
train_x, train_y, test_x, test_y = get_iris_data()
original_loss, original_accuracy = original_model.evaluate(test_x, test_y, verbose=0)
print("Loss is {:0.4f}".format(original_loss))
print("Accuracy is {:0.4f}".format(original_accuracy))

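Integrate with Tune

Now, let's use Tune to optimize a model that learns to classify irises. This happens in two parts: modifying the training function to support Tune, and then configuring Tune.

Let's first define a callback that reports intermediate training progress back to Tune.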

import tensorflow.keras as keras
from ray.tune import track


class TuneReporterCallback(keras.callbacks.Callback):
    """Tune Callback for Keras.

    The callback is invoked every epoch.
    """

    def __init__(self, logs={}):
        self.iteration = 0
        super(TuneReporterCallback, self).__init__()

    def on_epoch_end(self, batch, logs={}):
        self.iteration += 1
        track.log(keras_info=logs, mean_accuracy=logs.get("accuracy"), mean_loss=logs.get("loss"))
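The callback passes the whole Keras `logs` dictionary under the `keras_info` key, which is why the results are later looked up under names such as `keras_info/val_loss`. The following pure-Python sketch imitates that shape without importing Ray or Keras. `build_report` and `flatten` are hypothetical helpers written for illustration, and the epoch values are made up.

```python
def build_report(logs):
    """Mirror the keyword arguments the callback passes to track.log."""
    logs = logs or {}
    return {
        "keras_info": dict(logs),
        "mean_accuracy": logs.get("accuracy"),
        "mean_loss": logs.get("loss"),
    }


def flatten(report, sep="/"):
    """Flatten nested dicts into 'outer/inner' keys (illustrative only)."""
    flat = {}
    for key, value in report.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten(value, sep).items():
                flat[key + sep + sub_key] = sub_value
        else:
            flat[key] = value
    return flat


# Made-up metrics for a single epoch.
epoch_logs = {"loss": 0.9, "accuracy": 0.6, "val_loss": 1.0, "val_accuracy": 0.5}
flat_report = flatten(build_report(epoch_logs))
print(sorted(flat_report))
```

Every metric Keras records in `logs` therefore becomes available as its own `keras_info/...` column, alongside the two top-level `mean_accuracy` and `mean_loss` values.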

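Integration part 1: modify the training function

Instructions: follow the next 2 steps to modify the training function (tune_iris below) so that it supports Tune.

1. Change the signature of the function to take a dictionary of hyperparameters. This function will be invoked on Ray.
    def tune_iris(config):
2. Pass the config values into create_model:
    model = create_model(learning_rate=config["lr"], dense_1=config["dense_1"], dense_2=config["dense_2"])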
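The two steps above can be sketched without TensorFlow or Ray. In this minimal illustration, `create_model_stub` and `tune_iris_sketch` are hypothetical stand-ins that only record the arguments they receive; the point is the calling convention, not the training itself.

```python
def create_model_stub(learning_rate, dense_1, dense_2):
    """Hypothetical stand-in for create_model: records its arguments."""
    # The real create_model casts the layer widths with int() as well.
    return {
        "learning_rate": learning_rate,
        "dense_1": int(dense_1),
        "dense_2": int(dense_2),
    }


def tune_iris_sketch(config):
    # Step 1: the function takes a single `config` dictionary.
    # Step 2: hyperparameters are read from that dictionary by key.
    return create_model_stub(
        learning_rate=config["lr"],
        dense_1=config["dense_1"],
        dense_2=config["dense_2"],
    )


model_args = tune_iris_sketch({"lr": 0.1, "dense_1": 4.7, "dense_2": 4})
print(model_args)
```

Because the function reads everything from one dictionary, the caller can generate many different configurations and invoke the same function for each of them.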

def tune_iris():  # TODO: Change me.
    train_x, train_y, test_x, test_y = get_iris_data()
    model = create_model(learning_rate=0, dense_1=0, dense_2=0)  # TODO: Change me.
    checkpoint_callback = ModelCheckpoint(
        "model.h5", monitor='loss', save_best_only=True, save_freq=2)

    # Enable Tune to make intermediate decisions by using a Tune Callback hook. This is Keras specific.
    callbacks = [checkpoint_callback, TuneReporterCallback()]

    # Train the model
    model.fit(
        train_x, train_y,
        validation_data=(test_x, test_y),
        verbose=0, batch_size=10, epochs=20, callbacks=callbacks)


# inspect.getargspec was removed in Python 3.11; getfullargspec exposes the same `.args`.
assert len(inspect.getfullargspec(tune_iris).args) == 1, "The `tune_iris` function needs to take in the arg `config`."

print("Test-running to make sure this function will run correctly.")
tune.track.init()  # For testing purposes only.
tune_iris({"lr": 0.1, "dense_1": 4, "dense_2": 4})
print("Success!")

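Part 2: configure Tune to tune the hyperparameters

Instructions: follow the next 2 steps to configure Tune to identify the best hyperparameters.

1. Specify the hyperparameter space.
    hyperparameter_space = {
        "lr": tune.loguniform(0.001, 0.1),
        "dense_1": tune.uniform(2, 128),
        "dense_2": tune.uniform(2, 128),
    }
2. Increase the number of samples. The more trials we evaluate, the better the chance of selecting a good model.
    num_samples = 20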
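To make the search space more concrete, here is a rough standard-library imitation of drawing 20 configurations from it. This is not Tune's sampler: `sample_loguniform` and `sample_config` are hypothetical helpers, and the sketch assumes that a log-uniform draw means sampling uniformly in log space and exponentiating.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_config(rng):
    """Draw one configuration with the same keys and ranges as the space above."""
    return {
        "lr": sample_loguniform(rng, 0.001, 0.1),
        "dense_1": rng.uniform(2, 128),  # a float; create_model casts it with int()
        "dense_2": rng.uniform(2, 128),
    }


rng = random.Random(5)  # fixed seed so the draws are repeatable
sampled_configs = [sample_config(rng) for _ in range(20)]
for cfg in sampled_configs[:3]:
    print(cfg)
```

Sampling the learning rate in log space gives the ranges 0.001 to 0.01 and 0.01 to 0.1 equal weight, which a plain uniform draw over 0.001 to 0.1 would not.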

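FAQ: How does parallelism work in Tune?

Setting num_samples to 20 runs a total of 20 trials (hyperparameter configuration samples). However, not all of them can run at once. The maximum training concurrency is the number of CPU cores on the machine you are running on. On a 2-core machine, 2 models are trained concurrently. When one finishes, a new training process starts with a new hyperparameter configuration sample.

Each trial runs in a new Python process. When the trial finishes, that Python process is killed.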
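The scheduling behaviour described above can be approximated with a small simulation. This is only a sketch and not how Tune is implemented: `simulate_schedule` is a hypothetical helper that assumes each trial occupies exactly one CPU and that trials start in submission order as soon as a slot frees up.

```python
import heapq


def simulate_schedule(durations, num_cpus):
    """Return each trial's finish time when at most num_cpus run at once."""
    free_at = [0.0] * num_cpus  # time at which each CPU slot becomes free
    heapq.heapify(free_at)
    finish_times = []
    for duration in durations:
        start = heapq.heappop(free_at)  # the slot that frees up first
        end = start + duration
        finish_times.append(end)
        heapq.heappush(free_at, end)
    return finish_times


# 20 trials of equal (made-up) length on a 2-core machine.
finish = simulate_schedule([1.0] * 20, num_cpus=2)
print(max(finish))
```

With 20 equally long trials and 2 cores, the search takes 10 trial-lengths in total, so adding cores shortens the wall-clock time roughly in proportion.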

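FAQ: How do I debug things in Tune?

An error file column will appear in the output. Run the cell below with the path to the error file to diagnose your problem.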
! cat /home/ubuntu/tune_iris/tune_iris_c66e1100_2019-10-09_17-13-24x_swb9xs/error_2019-10-09_17-13-29.txt
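If the exact path is not at hand, a short script can list candidate error files instead. This is a sketch under assumptions: `find_error_files` is a hypothetical helper, it assumes error files are named `error*.txt` as in the path above, and the `~/ray_results/tune_iris` directory is the results location used by the cleanup command later in this tutorial.

```python
from pathlib import Path


def find_error_files(results_dir):
    """Return all error*.txt files found anywhere below results_dir."""
    root = Path(results_dir).expanduser()
    if not root.is_dir():
        return []  # nothing has been written there yet
    return sorted(root.rglob("error*.txt"))


for path in find_error_files("~/ray_results/tune_iris"):
    print(path)
```

Each printed path can then be passed to `cat` as shown above.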

Launch the Tune hyperparameter search

# This seeds the hyperparameter sampling.
import numpy as np; np.random.seed(5)
hyperparameter_space = {}  # TODO: Fill me out.
num_samples = 1  # TODO: Fill me out.

####################################################################################################
################ This is just a validation function for tutorial purposes only. ####################
HP_KEYS = ["lr", "dense_1", "dense_2"]
assert all(key in hyperparameter_space for key in HP_KEYS), (
    "The hyperparameter space is not fully designated. It must include all of {}".format(HP_KEYS))
######################################################################################################

ray.shutdown()  # Restart Ray defensively in case the ray connection is lost.
ray.init(log_to_driver=False)
# We clean out the logs before running for a clean visualization later.
! rm -rf ~/ray_results/tune_iris

analysis = tune.run(
    tune_iris,
    verbose=1,
    config=hyperparameter_space,
    num_samples=num_samples)

assert len(analysis.trials) == 20, "Did you set the correct number of samples?"

Analyze the best tuned model

Let's compare the true labels with the classified labels.

_, _, test_data, test_labels = get_iris_data()
plot_data(test_data, test_labels.argmax(1))

# Obtain the directory where the best model is saved.
print("You can use any of the following columns to get the best model: \n{}.".format([k for k in analysis.dataframe() if k.startswith("keras_info")]))
print("=" * 10)
logdir = analysis.get_best_logdir("keras_info/val_loss", mode="min")
# We saved the model as `model.h5` in the logdir of the trial.
from tensorflow.keras.models import load_model
tuned_model = load_model(logdir + "/model.h5")

tuned_loss, tuned_accuracy = tuned_model.evaluate(test_data, test_labels, verbose=0)
print("Loss is {:0.4f}".format(tuned_loss))
print("Tuned accuracy is {:0.4f}".format(tuned_accuracy))
print("The original un-tuned model had an accuracy of {:0.4f}".format(original_accuracy))
predicted_label = tuned_model.predict(test_data)
plot_data(test_data, predicted_label.argmax(1))
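Conceptually, choosing the best trial with `mode="min"` means picking the trial whose metric value is smallest. The sketch below illustrates that idea on plain dictionaries. It is not Tune's implementation: `best_trial` is a hypothetical helper, and the trial names and loss values are made up.

```python
def best_trial(trials, metric, mode="min"):
    """Pick the trial with the smallest (or largest) value of `metric`."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    # Ignore trials that never reported the metric.
    scored = [trial for trial in trials if trial.get(metric) is not None]
    chooser = min if mode == "min" else max
    return chooser(scored, key=lambda trial: trial[metric])


# Made-up results for three trials.
example_trials = [
    {"logdir": "trial_a", "keras_info/val_loss": 0.62},
    {"logdir": "trial_b", "keras_info/val_loss": 0.18},
    {"logdir": "trial_c", "keras_info/val_loss": 0.40},
]
print(best_trial(example_trials, "keras_info/val_loss", mode="min")["logdir"])
```

For a metric where higher is better, such as accuracy, the same selection would use `mode="max"` instead.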

We can assess the best model's performance by visualizing its predictions against the ground truth.

def plot_comparison(X, y):
    # Visualize the data sets
    plt.figure(figsize=(16, 6))

    # Left panel: sepal length vs. sepal width
    plt.subplot(1, 2, 1)
    for target, target_name in enumerate(["Incorrect", "Correct"]):
        X_plot = X[y == target]
        plt.plot(X_plot[:, 0], X_plot[:, 1], linestyle='none', marker='o', label=target_name)
    plt.xlabel(feature_names[0])
    plt.ylabel(feature_names[1])
    plt.axis('equal')
    plt.legend();

    # Right panel: petal length vs. petal width
    plt.subplot(1, 2, 2)
    for target, target_name in enumerate(["Incorrect", "Correct"]):
        X_plot = X[y == target]
        plt.plot(X_plot[:, 2], X_plot[:, 3], linestyle='none', marker='o', label=target_name)
    plt.xlabel(feature_names[2])
    plt.ylabel(feature_names[3])
    plt.axis('equal')
    plt.legend();


plot_comparison(test_data, test_labels.argmax(1) == predicted_label.argmax(1))
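The expression passed to `plot_comparison` compares, row by row, the index of the largest entry in the one-hot label with the index of the largest predicted probability. The pure-Python sketch below shows the same comparison on three made-up samples; `argmax` and `correctness_mask` are hypothetical helpers, not NumPy functions.

```python
def argmax(row):
    """Index of the largest entry in a sequence."""
    return max(range(len(row)), key=lambda i: row[i])


def correctness_mask(one_hot_labels, predicted_probs):
    """True where the predicted class matches the labelled class."""
    return [
        argmax(label) == argmax(probs)
        for label, probs in zip(one_hot_labels, predicted_probs)
    ]


# Made-up one-hot labels and softmax outputs for three samples.
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
probs = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]]

mask = correctness_mask(labels, probs)
accuracy = sum(mask) / len(mask)
print(mask, accuracy)
```

The mean of this boolean mask is the classification accuracy, and its False entries correspond to the points plotted as "Incorrect".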

Extra: view the results with TensorBoard

You can use TensorBoard to view trial performance. If the graphs do not load, click "Toggle All Runs".

%load_ext tensorboard

%tensorboard --logdir ~/ray_results/tune_iris