Dropout & Batch Norm
Dropout neuron grid (inverted scaling, training vs inference), BatchNorm forward pass (μ, σ², x̂, γ/β), running mean/variance via EMA.
Neuron Grid (24 neurons, keep p=0.5)
Legend: ● active, ⊗ dropped
Dropout Analysis
Keep probability p: 0.5
Drop probability: 0.5
Expected active neurons: 12
Effective LR multiplier (1/p): 2
Gradient variance factor (1/p²): 4
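The figures above follow directly from the keep probability and the grid size, and the arithmetic can be sketched in a few lines. The function name `dropout_stats` and its dictionary keys are hypothetical, chosen only for this illustration. The sketch reproduces the page's numbers and makes no claim about how the page computes them. Note that 1/p is the scale applied to kept activations under inverted dropout, and 1/p² is simply the square of that scale, which the page labels a gradient variance factor.

```python
def dropout_stats(n_neurons: int, keep_prob: float) -> dict:
    """Summary numbers for inverted dropout (hypothetical helper)."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    scale = 1.0 / keep_prob  # factor applied to kept activations
    return {
        "keep_prob": keep_prob,
        "drop_prob": 1.0 - keep_prob,
        "expected_active": n_neurons * keep_prob,  # mean of Binomial(n, p)
        "scale": scale,                            # 1/p
        "scale_squared": scale ** 2,               # 1/p^2
    }


# The 24-neuron grid with keep probability 0.5 shown on the page.
stats = dropout_stats(24, 0.5)
print(stats)
```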
Recommended p by Layer
Input layer: p = 0.8 (keep 80%), minimal dropout
Hidden layers: p = 0.5 (keep 50%), standard dropout
Output layer: none, never drop outputs
CNNs: p ≥ 0.8 (keep ≥ 80%), spatial features need most neurons
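Inverted dropout: during training, keep each unit with probability p and scale the kept activations by 1/p
Inference: no dropout and no rescaling; because of the 1/p scaling during training, expected activations already match the training scale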
Effective ensemble: with N droppable units there are 2^N possible masks, so training samples from up to 2^N weight-sharing sub-networks
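The training and inference behaviour of inverted dropout can be illustrated with a minimal NumPy sketch. The function name `inverted_dropout`, its signature, and the fixed seed are assumptions made for this example, not taken from the page or any particular library.

```python
import numpy as np


def inverted_dropout(x, keep_prob, training, rng):
    """Apply inverted dropout to array `x` (illustrative sketch)."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        # Inference: identity, no mask and no rescaling.
        return x
    # Keep each unit independently with probability keep_prob.
    mask = rng.random(x.shape) < keep_prob
    # Scale kept units by 1/keep_prob so the expected value equals x.
    return x * mask / keep_prob


rng = np.random.default_rng(0)  # seed chosen arbitrarily for the example
x = np.ones(24)                 # 24 neurons, as in the grid

train_out = inverted_dropout(x, 0.5, training=True, rng=rng)
eval_out = inverted_dropout(x, 0.5, training=False, rng=rng)

# Averaged over many units, the training output stays close to the input.
large_out = inverted_dropout(np.ones(100_000), 0.5, training=True, rng=rng)
print(train_out, eval_out, large_out.mean())
```

With p = 0.5, each training output is either 0 (dropped) or 2 (kept and scaled), while the inference output is the unchanged input.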
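The BatchNorm forward pass named in the page description (μ, σ², x̂, γ/β, with running mean and variance tracked by an EMA) can be sketched as follows. The class name `BatchNorm1D`, the default `momentum=0.1` and `eps=1e-5`, and the use of the biased batch variance for both normalization and the running estimate are all assumptions of this sketch. Real libraries differ in these details.

```python
import numpy as np


class BatchNorm1D:
    """Minimal batch normalization over inputs of shape (batch, features)."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)    # learnable scale γ
        self.beta = np.zeros(num_features)    # learnable shift β
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum              # EMA update weight (assumed)
        self.eps = eps                        # numerical stability term

    def forward(self, x, training):
        if training:
            mu = x.mean(axis=0)               # batch mean μ
            var = x.var(axis=0)               # biased batch variance σ²
            m = self.momentum
            # Exponential moving averages, used later at inference.
            self.running_mean = (1 - m) * self.running_mean + m * mu
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)   # normalized x̂
        return self.gamma * x_hat + self.beta


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(64, 3))

bn = BatchNorm1D(num_features=3)
y_train = bn.forward(x, training=True)   # uses batch statistics
y_eval = bn.forward(x, training=False)   # uses running statistics
```

In training mode each feature of `y_train` has approximately zero mean and unit variance. In inference mode the output depends only on the stored running statistics, so it does not change with the composition of the batch.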