## Paper and Background

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

## Motivation

The distribution of each layer's inputs changes during training. This slows down training by requiring lower learning rates and careful parameter initialization.

— the internal covariate shift phenomenon

## Solution

### The Paper's Approach

1. Treat each feature of a layer's input as independent, normalizing each dimension separately instead of jointly whitening them
2. Normalize over a mini-batch rather than whitening over the entire training set
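The two simplifications above can be sketched as follows. This is a minimal NumPy illustration of the normalization step, assuming a 2-D input of shape `(batch, features)`; the function name `batch_norm_forward` and the learnable scale/shift parameters `gamma`/`beta` follow the paper's γ and β but are illustrative, and the running statistics used at inference time are omitted:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature independently over the mini-batch,
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # simplification 1: per-dimension, no joint whitening
    return gamma * x_hat + beta               # restore representational capacity

# Statistics come from one mini-batch (simplification 2), not the full dataset.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma = 1` and `beta = 0`, each column of `out` has approximately zero mean and unit variance regardless of the input distribution.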