Today I worked through one of the programming assignments from Stanford CS294A: the sparse autoencoder.
Since I had already done the neural network assignment from Andrew Ng's ML online class,
implementing this one was easy.
Here is the code (fully vectorized):
[d m] = size(data);

%% forward pass for all examples
Z2 = W1 * data + repmat(b1, 1, m);   % [hiddenSize*visibleSize] * [visibleSize*m] = [hiddenSize*m]
A2 = sigmoid(Z2);                    % activations of the hidden layer
Z3 = W2 * A2 + repmat(b2, 1, m);     % [visibleSize*hiddenSize] * [hiddenSize*m] = [visibleSize*m]
A3 = sigmoid(Z3);                    % activations of the output layer [visibleSize*m]

% error term only
cost = cost + mean(sum((A3 - data).^2))/2;   % (half) squared error

% add weight decay
cost = cost + lambda / 2 * (sum(W1(:).^2) + sum(W2(:).^2));

% add sparsity
rho = mean(A2, 2);
rho0 = sparsityParam;
% accumulated KL divergence between the measured average activation and the target activation
kl = sum(rho0 * log(rho0./rho) + ...
         (1 - rho0) * log((1-rho0)./(1-rho)));
cost = cost + beta * kl;

%% now use backpropagation to calculate the gradients
delta3 = -(data - A3) .* sigmoidGradient(Z3);   % [visibleSize*m]
delta2 = ((W2' * delta3) + ...
          beta * repmat(-rho0./rho + (1-rho0)./(1-rho), 1, m)) .* ...   % sparsity term
         sigmoidGradient(Z2);   % [hiddenSize*m]

% error term only
W2grad = delta3 * A2' / m;    % [visibleSize*m] * [m*hiddenSize] = [visibleSize*hiddenSize]
b2grad = mean(delta3, 2);     % [visibleSize*1]
W1grad = delta2 * data' / m;  % [hiddenSize*m] * [m*visibleSize] = [hiddenSize*visibleSize]
b1grad = mean(delta2, 2);     % [hiddenSize*1]

% add weight decay term
W2grad = W2grad + lambda * W2;
W1grad = W1grad + lambda * W1;
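The snippet assumes sigmoid and sigmoidGradient helpers that are not shown above (the names come from my code, not from the official starter files). A minimal sketch of how they could be defined:

function g = sigmoid(z)
% element-wise logistic function
    g = 1 ./ (1 + exp(-z));
end

function g = sigmoidGradient(z)
% derivative of the logistic function, evaluated element-wise at z:
% f'(z) = f(z) .* (1 - f(z))
    s = sigmoid(z);
    g = s .* (1 - s);
end

With these in place, the deltas above use f'(z) computed from the pre-activations Z2 and Z3, which matches the backpropagation formulas in the exercise handout.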
The visualization result looks like this:
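(For reference, this kind of figure just reshapes each row of W1 into an image patch and tiles the patches. Below is a minimal sketch in plain MATLAB/Octave, assuming square grayscale input patches, e.g. 8x8 so visibleSize = 64; the exercise's own display_network helper does essentially the same thing with nicer normalization.)

% tile the learned features: one subplot per hidden unit
patchSize = sqrt(size(W1, 2));    % e.g. 8 for 8x8 input patches
nHidden   = size(W1, 1);
gridSize  = ceil(sqrt(nHidden));
figure; colormap(gray);
for i = 1:nHidden
    subplot(gridSize, gridSize, i);
    imagesc(reshape(W1(i, :), patchSize, patchSize));
    axis off;
end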
References:
http://www.stanford.edu/class/cs294a/handouts.html
or
http://deeplearning.stanford.edu/wiki/index.php/Exercise:Sparse_Autoencoder