How exactly does BN compute the mean and variance? - 一个渣渣

For a mini-batch of image samples in NCHW layout with shape [128,3,10,10], how exactly does batch normalization (BN) compute the mean and variance?

I dug up part of the relevant code (\tensorflow\python\layers\normalization.py):
def call(self, inputs, training=False):
    # First, compute the axes along which to reduce the mean / variance,
    # as well as the broadcast shape to be used for all parameters.
    input_shape = inputs.get_shape()
    ndim = len(input_shape)
    reduction_axes = list(range(len(input_shape)))
    del reduction_axes[self.axis]
    broadcast_shape = [1] * len(input_shape)
    broadcast_shape[self.axis] = input_shape[self.axis].value

    # Determines whether broadcasting is needed.
    needs_broadcasting = (sorted(reduction_axes) != list(range(ndim))[:-1])

    scale, offset = self.gamma, self.beta

    # Determine a boolean value for `training`: could be True, False, or None.
    training_value = utils.constant_value(training)
    if training_value is not False:
      # Some of the computations here are not necessary when training==False
      # but not a constant. However, this makes the code simpler.
      mean, variance = nn.moments(inputs, reduction_axes)

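The reduction_axes passed in the last line has had one element removed, and which one is decided by this code: axis = 1 if data_format == DATA_FORMAT_NCHW else -1
That is, self.axis in del reduction_axes[self.axis] is 1 in the channels-first case, so reduction_axes is [0,2,3].
The effect is that the mean and variance are reduced over every axis except the channel axis C, which gives one mean and one variance per channel!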
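To make the axis bookkeeping concrete, here is a minimal NumPy sketch that builds reduction_axes the same way as the layer code. NumPy stands in for TensorFlow here, and the random input is only an illustrative assumption; the point is just to show which axes disappear and which one survives.

```python
import numpy as np

# Hypothetical NCHW mini-batch: 128 images, 3 channels, 10x10 pixels.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(128, 3, 10, 10))

axis = 1  # channel axis for NCHW, as in the layer code
reduction_axes = list(range(inputs.ndim))
del reduction_axes[axis]            # leaves [0, 2, 3]

# Reduce over N, H and W; the channel axis is the only one that survives.
mean = inputs.mean(axis=tuple(reduction_axes))
variance = inputs.var(axis=tuple(reduction_axes))

print(reduction_axes)               # [0, 2, 3]
print(mean.shape, variance.shape)   # (3,) (3,)
```

With a channels-last layout the same recipe would use axis = -1 and leave [0, 1, 2] instead.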

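One more small point: I had assumed the mean and variance would be computed with something like tf.mean or tf.std, and did not know there was an nn.moments for this. So I first looked up what "moments" means. Is it the statistical "moment"? Is it related to momentum? The word originally means "to move" or "movement", which makes it easier to remember. Here it is the statistical sense: the mean is the first moment and the variance is the second central moment. If this is new to you, it is worth reading up on first.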
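As a rough illustration of what "computing the moments" amounts to, the sketch below returns a mean and a variance over the requested axes. The `moments` helper is a hypothetical NumPy stand-in written for this note, not TensorFlow's implementation, and it uses the population variance (no Bessel correction).

```python
import numpy as np

def moments(x, axes):
    """Hypothetical NumPy stand-in: return (mean, variance) of x over `axes`."""
    axes = tuple(axes)
    mean = x.mean(axis=axes, keepdims=True)
    # Variance is the mean of the squared deviations from the mean.
    variance = ((x - mean) ** 2).mean(axis=axes)
    return mean.squeeze(axis=axes), variance

x = np.array([[1.0, 2.0],
              [3.0, 6.0]])
mean, variance = moments(x, axes=[0])
print(mean)      # [2. 4.]
print(variance)  # [1. 4.]
```

The result agrees with calling x.mean(axis=0) and x.var(axis=0) separately; bundling the two just avoids computing the mean twice.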

In fact, the docstring of nn.moments has a hint:
When using these moments for batch normalization (see
`tf.nn.batch_normalization`):

   * for so-called "global normalization", used with convolutional filters with
     shape `[batch, height, width, depth]`, pass `axes=[0, 1, 2]`.
   * for simple batch normalization pass `axes=[0]` (batch only).
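The docstring's axes=[0, 1, 2] assumes a channels-last layout. The following NumPy sketch (random data and small shapes are assumptions chosen for illustration) suggests that this is the same per-channel statistic as reducing over [0, 2, 3] on an NCHW tensor, and that axes=[0] on a 2-D input yields one statistic per feature.

```python
import numpy as np

rng = np.random.default_rng(0)
nchw = rng.normal(size=(8, 3, 4, 4))       # hypothetical small batch, channels first
nhwc = nchw.transpose(0, 2, 3, 1)          # same data, channels last

mean_nchw = nchw.mean(axis=(0, 2, 3))      # reduce everything but axis 1
mean_nhwc = nhwc.mean(axis=(0, 1, 2))      # the docstring's "global normalization" axes

print(np.allclose(mean_nchw, mean_nhwc))   # True

dense = rng.normal(size=(8, 16))           # [batch, features]
print(dense.mean(axis=0).shape)            # (16,)
```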


Let's look at an example:

import tensorflow as tf  # written for TensorFlow 1.x (uses tf.Session)

# Values 1..24, each repeated 5 times, so every row along the last axis is constant.
a = []
for i in range(24):
    for j in range(5):
        a.append(float(i+1))

shape = [2,3,4,5]
b = tf.constant(a, shape=shape)

axis1 = list(range(len(shape)-1)) # [0,1,2]: reduce over everything except the last axis
axis2 = list(range(len(shape)))
del axis2[1] # [0,2,3]: mimic NCHW, reduce over everything except the channel axis

end_mean, end_var = tf.nn.moments(b, axis1)
cha_mean, cha_var = tf.nn.moments(b, axis2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for temp in [b, end_mean, cha_mean]:
        print ('\n', sess.run(temp))

Output:

[[[[  1.   1.   1.   1.   1.]
   [  2.   2.   2.   2.   2.]
   [  3.   3.   3.   3.   3.]
   [  4.   4.   4.   4.   4.]]

  [[  5.   5.   5.   5.   5.]
   [  6.   6.   6.   6.   6.]
   [  7.   7.   7.   7.   7.]
   [  8.   8.   8.   8.   8.]]

  [[  9.   9.   9.   9.   9.]
   [ 10.  10.  10.  10.  10.]
   [ 11.  11.  11.  11.  11.]
   [ 12.  12.  12.  12.  12.]]]

 [[[ 13.  13.  13.  13.  13.]
   [ 14.  14.  14.  14.  14.]
   [ 15.  15.  15.  15.  15.]
   [ 16.  16.  16.  16.  16.]]

  [[ 17.  17.  17.  17.  17.]
   [ 18.  18.  18.  18.  18.]
   [ 19.  19.  19.  19.  19.]
   [ 20.  20.  20.  20.  20.]]

  [[ 21.  21.  21.  21.  21.]
   [ 22.  22.  22.  22.  22.]
   [ 23.  23.  23.  23.  23.]
   [ 24.  24.  24.  24.  24.]]]]

[ 12.5 12.5 12.5 12.5 12.5]

[ 8.5 12.5 16.5]

Explanation:

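Reducing over every axis except the last one: the result is [ 12.5 12.5 12.5 12.5 12.5]. Each of the 5 positions on the last axis averages the 24 values 1, 2, ..., 24, i.e. (1+2+...+24) / 24 = 12.5.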

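Reducing over every axis except the channel axis: the result is [ 8.5 12.5 16.5]. The first entry is (1+2+3+4+13+14+15+16) * 5 / (4*5*2) = 8.5, and the other two channels follow the same pattern.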
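The arithmetic can be double-checked without TensorFlow. The sketch below rebuilds the same tensor in NumPy (a substitution made here for convenience, not part of the original example), compares the channel-0 mean against the hand computation, and also prints the per-channel variance, which the example computes but never displays.

```python
import numpy as np

# Same tensor as the example: values 1..24, each repeated 5 times, shape [2, 3, 4, 5].
b = np.repeat(np.arange(1.0, 25.0), 5).reshape(2, 3, 4, 5)

end_mean = b.mean(axis=(0, 1, 2))   # keeps the last axis
cha_mean = b.mean(axis=(0, 2, 3))   # keeps the channel axis
cha_var = b.var(axis=(0, 2, 3))     # population variance per channel

# Hand computation for channel 0.
by_hand = (1 + 2 + 3 + 4 + 13 + 14 + 15 + 16) * 5 / (4 * 5 * 2)

print(end_mean)  # [12.5 12.5 12.5 12.5 12.5]
print(cha_mean)  # [ 8.5 12.5 16.5]
print(by_hand)   # 8.5
print(cha_var)   # [37.25 37.25 37.25]
```

All three channels share the same variance because their values differ only by a constant shift of 4.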

Coming back to this post in mid-2019, I find it too shallow; a lot was left unexplained.

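BN aims to keep the input distribution of each layer in a deep network consistent: it forcibly pulls differing distributions back to one with mean 0 and variance 1.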
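A minimal sketch of that "pull back to mean 0, variance 1" step at training time, written in NumPy under stated assumptions: the input is random NCHW data with a deliberately shifted distribution, eps is an assumed small constant, and gamma and beta play the role of scale and offset from the layer code, initialised to 1 and 0.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical activations with non-zero mean and non-unit variance, NCHW layout.
x = rng.normal(loc=5.0, scale=3.0, size=(128, 3, 10, 10))

axes = (0, 2, 3)
mean = x.mean(axis=axes, keepdims=True)       # shape (1, 3, 1, 1), broadcasts against x
variance = x.var(axis=axes, keepdims=True)
eps = 1e-3                                    # assumed small constant for numerical stability

gamma = np.ones((1, 3, 1, 1))                 # scale, initialised to 1
beta = np.zeros((1, 3, 1, 1))                 # offset, initialised to 0

y = gamma * (x - mean) / np.sqrt(variance + eps) + beta

# Per-channel statistics after normalization: close to 0 and close to 1.
print(y.mean(axis=axes))
print(y.var(axis=axes))
```

Keeping the statistics in shape (1, 3, 1, 1) is the NumPy counterpart of broadcast_shape in the layer code: each channel is normalized with its own mean and variance.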

The per-channel mean and variance described above apply to BN in the CNN setting.

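For BN in a DNN (fully connected network), the mean is taken over the activations that the n samples in a batch produce at the same neuron, so n numbers are summed.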

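For BN in a CNN, the mean is taken over the activations that the n samples in a batch produce in the same output channel of a convolution kernel, so n activation planes are summed; for NCHW [128,3,10,10] that is 128 samples * 10 * 10 values per channel.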
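The difference between the two cases comes down to how many values feed each statistic. A small NumPy sketch, where the 64-unit dense layer is a hypothetical size picked for illustration and the conv shape is the one from the question:

```python
import numpy as np

# Dense layer: activations of shape [batch, units]; one statistic per unit.
dense = np.zeros((128, 64))                   # hypothetical 64-unit layer
dense_axes = (0,)
dense_count = int(np.prod([dense.shape[a] for a in dense_axes]))

# Conv layer: NCHW activations; one statistic per output channel.
conv = np.zeros((128, 3, 10, 10))
conv_axes = (0, 2, 3)
conv_count = int(np.prod([conv.shape[a] for a in conv_axes]))

print(dense_count)                            # 128
print(conv_count)                             # 12800
print(dense.mean(axis=dense_axes).shape)      # (64,)
print(conv.mean(axis=conv_axes).shape)        # (3,)
```

So each neuron's mean averages 128 numbers, while each channel's mean averages 128 * 10 * 10 = 12800 numbers.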

I recommend reading what 张俊林 has written about BN!
