In the diagram (architecture) below, how was the (fully-connected) dense layer of 4096 units derived from last max-pool layer (on the right) of dimensions 256x13x13? Instead of 4096, shouldn't it be 256*13*13=43264 ?
Solution
If I understand correctly, you're asking why the 4096x1x1 layer is much smaller.
That's because it's a fully connected layer. Every neuron from the last max-pooling layer (256*13*13 = 43264 neurons) is connected to every neuron of the fully-connected layer.
This is the same idea as an all-to-all connected neural network: one layer can be bigger than the next, but that doesn't prevent every neuron in one layer from connecting to every neuron in the other.
There is no "conversion" of the last max-pooling layer: its 43264 neurons are simply flattened into a vector, and each of them is connected to all 4096 neurons of the next layer.
The 'dense' operation just means computing a weighted sum over all of these connections (4096 * 43264 = 177,209,344 weights) and adding each neuron's bias to produce the output.
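For concreteness, here's a minimal PyTorch sketch of that flatten-plus-dense step (this is an illustration, not the original AlexNet code; the tensor and layer names are made up):

```python
import torch
import torch.nn as nn

# Pretend output of the last max-pooling layer: batch of 1, shape 256x13x13
pool_output = torch.randn(1, 256, 13, 13)

# Flatten the feature maps into a single vector of 256*13*13 = 43264 values
flattened = pool_output.flatten(start_dim=1)   # shape: (1, 43264)

# The dense layer: every input neuron connects to every output neuron
fc = nn.Linear(in_features=256 * 13 * 13, out_features=4096)

dense_output = fc(flattened)                   # shape: (1, 4096)
print(dense_output.shape)                      # torch.Size([1, 4096])

# Each of the 4096 output neurons has 43264 weights plus 1 bias:
n_params = sum(p.numel() for p in fc.parameters())
print(n_params)                                # 4096*43264 + 4096 = 177213440
```

The size mismatch is no problem: the weight matrix is simply 4096 x 43264, so a vector of 43264 inputs is mapped to 4096 outputs in one matrix multiplication.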
It's connected the same way as an MLP.
But why 4096? There is no deep reasoning behind it; it's just a design choice. It could have been 8000, it could have been 20. It just depends on what works best for the network.