Reposted from https://blog.csdn.net/jackxu8/article/details/71308390#commentBox
Original source: https://docs.pymc.io/notebooks/bayesian_neural_network_advi.html
Bayesian Neural Networks in PyMC3

Generating Data

Generate simulated data for a simple, linearly non-separable binary classification problem.
%matplotlib inline
import pymc3 as pm
import theano.tensor as T
import theano
import sklearn
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
from sklearn import datasets
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
X, Y = make_moons(noise=0.2, random_state=0, n_samples=1000)
X = scale(X)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.5)
fig, ax = plt.subplots()
ax.scatter(X[Y==0, 0], X[Y==0, 1], color='b', edgecolors='k', alpha=0.6, label='Class 0')
ax.scatter(X[Y==1, 0], X[Y==1, 1], color='r', edgecolors='k', alpha=0.6, label='Class 1')
sns.despine(); ax.legend()
ax.set(xlabel='X', ylabel='Y', title='Toy binary classification data set');
Model Definition

The basic building block of a neural network is the perceptron, similar to the one used in logistic regression. We arrange perceptrons in parallel and stack them to obtain hidden layers. Here we use 2 hidden layers with 5 neurons each. Normal priors on the weights act as regularization.
ann_input = theano.shared(X_train)
ann_output = theano.shared(Y_train)

n_hidden = 5

# Initialize random weights between each layer
init_1 = np.random.randn(X.shape[1], n_hidden)
init_2 = np.random.randn(n_hidden, n_hidden)
init_out = np.random.randn(n_hidden)

with pm.Model() as neural_network:
    # Weights from input to hidden layer
    weights_in_1 = pm.Normal('w_in_1', 0, sd=1,
                             shape=(X.shape[1], n_hidden),
                             testval=init_1)

    # Weights from 1st to 2nd layer
    weights_1_2 = pm.Normal('w_1_2', 0, sd=1,
                            shape=(n_hidden, n_hidden),
                            testval=init_2)

    # Weights from hidden layer to output
    weights_2_out = pm.Normal('w_2_out', 0, sd=1,
                              shape=(n_hidden,),
                              testval=init_out)

    # Build neural network using tanh activation function
    act_1 = T.tanh(T.dot(ann_input, weights_in_1))
    act_2 = T.tanh(T.dot(act_1, weights_1_2))
    act_out = T.nnet.sigmoid(T.dot(act_2, weights_2_out))

    # Binary classification -> Bernoulli likelihood
    out = pm.Bernoulli('out', act_out, observed=ann_output)
Variational Inference: Scaling Model Complexity

We could now sample from this model with an MCMC sampler such as NUTS and get good results, but that would be very slow. Instead we use the new ADVI variational inference algorithm, which is much faster and scales better with model complexity. Note, however, that ADVI is a mean-field approximation, so it ignores correlations in the posterior.
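To make the mean-field assumption concrete, here is a minimal numpy sketch (not PyMC3 API; the means and standard deviations are made-up values standing in for ADVI's output): each weight gets its own independent normal, so off-diagonal correlations are zero by construction.

import numpy as np

rng = np.random.RandomState(0)
mu = np.array([0.3, -1.2])    # hypothetical posterior means for two weights
sd = np.array([0.5, 0.8])     # hypothetical posterior standard deviations

# Mean-field posterior: independent draws per weight, diagonal covariance
draws = rng.normal(mu, sd, size=(1000, 2))
print(np.round(np.corrcoef(draws.T), 2))  # off-diagonals ~ 0 by construction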
%%time
with neural_network:
    # Run ADVI, which returns posterior means, standard deviations,
    # and the evidence lower bound (ELBO)
    v_params = pm.variational.advi(n=50000)
Average ELBO = -151.63: 100%|██████████| 50000/50000 [00:13<00:00, 3790.27it/s]
Finished [100%]: Average ELBO = -136.89
CPU times: user 15.5 s, sys: 695 ms, total: 16.1 s
Wall time: 18 s
Plotting the objective function (ELBO) shows that the fit gradually improves as optimization proceeds.
plt.plot(v_params.elbo_vals);
plt.ylabel('ELBO');
plt.xlabel('iteration');
For easier inspection, we draw posterior samples with sample_vp() (this function simply samples from the fitted normal approximations, so it is not MCMC).
with neural_network:
    trace = pm.variational.sample_vp(v_params, draws=5000)
100%|██████████| 5000/5000 [00:00<00:00, 10687.44it/s]
Now that the model is trained, we use a posterior predictive check (PPC) to predict on the held-out data points. sample_ppc() generates new data from the posterior (by sampling from the variational approximation).
# Replace shared variables with testing set
ann_input.set_value(X_test)
ann_output.set_value(Y_test)

# Create posterior predictive samples
ppc = pm.sample_ppc(trace, model=neural_network, samples=500)

# Use probability > 0.5 to predict class 1
pred = ppc['out'].mean(axis=0) > 0.5
100%|██████████| 500/500 [00:03<00:00, 139.67it/s]
fig, ax = plt.subplots()
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
sns.despine()
ax.set(title='Predicted labels in testing set', xlabel='X', ylabel='Y');
print('Accuracy = {}%'.format((Y_test == pred).mean() * 100))
Accuracy = 95.6%
Analyzing the Classifier

To analyze the behavior of this classifier, we evaluate the network output over the entire input space.
grid = np.mgrid[-3:3:100j,-3:3:100j]
grid_2d = grid.reshape(2, -1).T
dummy_out = np.ones(grid.shape[1], dtype=np.int8)
ann_input.set_value(grid_2d)
ann_output.set_value(dummy_out)

# Sample from the posterior predictive
ppc = pm.sample_ppc(trace, model=neural_network, samples=500)
100%|██████████| 500/500 [00:04<00:00, 101.43it/s]
Plot the predicted surface:
cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
fig, ax = plt.subplots(figsize=(10, 6))
contour = ax.contourf(*grid, ppc['out'].mean(axis=0).reshape(100, 100), cmap=cmap)
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
cbar = plt.colorbar(contour, ax=ax)
_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y');
cbar.ax.set_ylabel('Posterior predictive mean probability of class label = 0');
Uncertainty in the Predictions

Everything so far could also be done with a non-Bayesian neural network: the posterior predictive mean for each class can be matched by maximum likelihood estimates. A Bayesian neural network, however, also gives the standard deviation of the posterior predictive, which quantifies the uncertainty in the predictions.
cmap = sns.cubehelix_palette(light=1, as_cmap=True)
fig, ax = plt.subplots(figsize=(10, 6))
contour = ax.contourf(*grid, ppc['out'].std(axis=0).reshape(100, 100), cmap=cmap)
ax.scatter(X_test[pred==0, 0], X_test[pred==0, 1], color='b', edgecolors='k', alpha=0.6)
ax.scatter(X_test[pred==1, 0], X_test[pred==1, 1], color='r', edgecolors='k', alpha=0.6)
cbar = plt.colorbar(contour, ax=ax)
_ = ax.set(xlim=(-3, 3), ylim=(-3, 3), xlabel='X', ylabel='Y');
cbar.ax.set_ylabel('Uncertainty (posterior predictive standard deviation)');
As expected, uncertainty is highest near the decision boundary. Linking predictions to their uncertainty is important in many application areas, such as medical diagnostics. To improve accuracy, the model could be directed to collect more samples in regions of high uncertainty.
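As an illustration of how this uncertainty could be acted on, here is a hypothetical sketch (the 0.3 threshold is made up for illustration, not from the original post) that abstains from predicting wherever the posterior predictive standard deviation is high:

# Hypothetical sketch: abstain where predictive uncertainty is high
prob = ppc['out'].mean(axis=0)   # posterior predictive mean probability
std = ppc['out'].std(axis=0)     # posterior predictive standard deviation

confident = std < 0.3                                  # illustrative threshold
pred_or_abstain = np.where(confident, prob > 0.5, -1)  # -1 marks "abstain"
print('Abstained on {:.1%} of points'.format(1 - confident.mean()))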
Mini-batch ADVI: Scaling to Large Data

So far we trained the model on all data at once, which does not scale with dataset size. Instead, we can train on mini-batches of data (as in stochastic gradient descent), which avoids local minima and can lead to faster convergence.
from six.moves import zip

# Set back to original data to retrain
ann_input.set_value(X_train)
ann_output.set_value(Y_train)

# Tensors and RVs that will be fed mini-batches
minibatch_tensors = [ann_input, ann_output]
minibatch_RVs = [out]

# Generator that returns mini-batches in each iteration
def create_minibatch(data):
    while True:
        # Return a random sample of 50 data points each iteration
        ixs = np.random.randint(len(data), size=50)
        yield data[ixs]

minibatches = zip(
    create_minibatch(X_train),
    create_minibatch(Y_train),
)

total_size = len(Y_train)
%%time
with neural_network:
    # Run advi_minibatch
    v_params = pm.variational.advi_minibatch(
        n=50000, minibatch_tensors=minibatch_tensors,
        minibatch_RVs=minibatch_RVs, minibatches=minibatches,
        total_size=total_size, learning_rate=1e-2, epsilon=1.0
    )
Average ELBO = -123.12: 100%|██████████| 50000/50000 [00:16<00:00, 3043.69it/s]
Finished minibatch ADVI: ELBO = -121.76
CPU times: user 20.5 s, sys: 371 ms, total: 20.9 s
Wall time: 23.9 s
with neural_network:
    trace = pm.variational.sample_vp(v_params, draws=5000)
100%|██████████| 5000/5000 [00:00<00:00, 9914.64it/s]
The mini-batch run takes longer in wall time here, but it converges faster.
pm.traceplot(trace);
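To check the convergence claim visually, one could keep the ELBO trace from the full-data run and overlay both curves. A minimal sketch, assuming the earlier result was saved under the hypothetical name v_params_full before retraining:

# Hypothetical sketch: compare ELBO traces of full-data vs. mini-batch ADVI
# (assumes the full-data result was saved as v_params_full before retraining)
plt.plot(v_params_full.elbo_vals, label='full-data ADVI')
plt.plot(v_params.elbo_vals, label='mini-batch ADVI')
plt.xlabel('iteration'); plt.ylabel('ELBO'); plt.legend();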