# BatchNormTraining

BatchNormTraining normalizes its input using a mean and variance computed from the input itself.

## Description

### Inputs

| Name | Element Type | Shape |
|---|---|---|
| `input` | real | $$(\bullet, C, \ldots)$$ |
| `gamma` | same as `input` | $$(C)$$ |
| `beta` | same as `input` | $$(C)$$ |

### Attributes

| Name | Type | Notes |
|---|---|---|
| `epsilon` | `double` | Small bias added to the variance to avoid division by zero. |

### Outputs

| Name | Element Type | Shape |
|---|---|---|
| `normalized` | same as `gamma` | same as `input` |
| `batch_mean` | same as `gamma` | $$(C)$$ |
| `batch_variance` | same as `gamma` | $$(C)$$ |

The `batch_mean` and `batch_variance` outputs are computed per channel from `input`.

## Mathematical Definition

The axes of the input fall into two categories: positional and channel, with the channel being axis 1. At each position there are $$C$$ channel values, and each channel is normalized independently.

Normalization of a channel sample is controlled by two values:

• the batch_mean $$\mu$$, and

• the batch_variance $$\sigma^2$$;

and by two scaling attributes: $$\gamma$$ and $$\beta$$.

The values for $$\mu$$ and $$\sigma^2$$ come from computing the mean and variance of input.

$\begin{split}\mu_c &= \mathop{\mathbb{E}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\ \sigma^2_c &= \mathop{\mathtt{Var}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\ \mathtt{normalized}_{\bullet, c, \ldots} &= \frac{\mathtt{input}_{\bullet, c, \ldots}-\mu_c}{\sqrt{\sigma^2_c+\epsilon}}\gamma_c+\beta_c\end{split}$
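The per-channel computation above can be sketched in plain C++, without any nGraph dependency. This is an illustrative reference implementation, not the library's kernel; the `(N, C, S)` row-major layout (with `S` the product of the positional dims) and the function name are assumptions for the example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Result mirrors the op's three outputs: normalized (same shape as input),
// batch_mean (C), batch_variance (C).
struct BatchNormResult {
    std::vector<float> normalized;
    std::vector<float> batch_mean;
    std::vector<float> batch_variance;
};

// Sketch of the forward computation for a row-major (N, C, S) tensor,
// where S is the product of the positional (spatial) dimensions.
BatchNormResult batch_norm_training(const std::vector<float>& input,
                                    const std::vector<float>& gamma,
                                    const std::vector<float>& beta,
                                    std::size_t N, std::size_t C, std::size_t S,
                                    double epsilon) {
    BatchNormResult r{std::vector<float>(input.size()),
                      std::vector<float>(C, 0.0f),
                      std::vector<float>(C, 0.0f)};
    const double count = static_cast<double>(N * S);
    for (std::size_t c = 0; c < C; ++c) {
        // Mean over the batch and positional axes for channel c.
        double sum = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t s = 0; s < S; ++s)
                sum += input[(n * C + c) * S + s];
        const double mu = sum / count;
        // Biased (population) variance: E[(x - mu)^2].
        double var = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t s = 0; s < S; ++s) {
                const double d = input[(n * C + c) * S + s] - mu;
                var += d * d;
            }
        var /= count;
        r.batch_mean[c] = static_cast<float>(mu);
        r.batch_variance[c] = static_cast<float>(var);
        // normalized = (input - mu) / sqrt(var + epsilon) * gamma + beta
        const double inv_std = 1.0 / std::sqrt(var + epsilon);
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t s = 0; s < S; ++s) {
                const std::size_t i = (n * C + c) * S + s;
                r.normalized[i] = static_cast<float>(
                    (input[i] - mu) * inv_std * gamma[c] + beta[c]);
            }
    }
    return r;
}
```

Note the variance is the biased (population) variance, matching the $$\mathtt{Var}$$ in the definition above; `epsilon` keeps the divisor positive even for a constant channel.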

## Backprop

$\begin{split}[\overline{\texttt{input}}, \overline{\texttt{gamma}}, \overline{\texttt{beta}}]=\\ \mathop{\texttt{BatchNormTrainingBackprop}}(\texttt{input},\texttt{gamma},\texttt{beta},\texttt{mean},\texttt{variance},\overline{\texttt{normed_input}}).\end{split}$
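The source does not spell out the formulas inside `BatchNormTrainingBackprop`. For reference, the standard batch-norm gradients, written in the notation above with $$m$$ the number of elements per channel (batch times positional extent) and $$\hat{x} = (\texttt{input}-\mu_c)/\sqrt{\sigma^2_c+\epsilon}$$, are:

$\begin{split}\overline{\texttt{beta}}_c &= \sum_{\bullet,\ldots} \overline{\texttt{normed_input}}_{\bullet,c,\ldots}\\ \overline{\texttt{gamma}}_c &= \sum_{\bullet,\ldots} \overline{\texttt{normed_input}}_{\bullet,c,\ldots}\,\hat{x}_{\bullet,c,\ldots}\\ \overline{\texttt{input}}_{\bullet,c,\ldots} &= \frac{\gamma_c}{\sqrt{\sigma^2_c+\epsilon}}\left(\overline{\texttt{normed_input}}_{\bullet,c,\ldots} - \frac{1}{m}\overline{\texttt{beta}}_c - \frac{\hat{x}_{\bullet,c,\ldots}}{m}\,\overline{\texttt{gamma}}_c\right)\end{split}$

The sums run over the batch and positional axes; the two subtracted terms account for the dependence of $$\mu_c$$ and $$\sigma^2_c$$ on every element of the channel.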

## C++ Interface

```cpp
class BatchNormTraining : public ngraph::op::Op
```

Batch norm for training operation.

Subclassed by `ngraph::op::gpu::BatchNormTrainingWithStats`.

### Public Functions

```cpp
const std::string& description() const
```

Gets the string name for the type of the node, such as `Add` or `Multiply`. The name must not contain spaces, as it is used for codegen.

Returns: a const reference to the node's type name.

```cpp
BatchNormTraining(const Output<Node>& input,
                  const Output<Node>& gamma,
                  const Output<Node>& beta,
                  double epsilon)
```

Parameters:

- `input`: must have rank >= 2, shape [., C, …]
- `gamma`: scaling for the normalized value; shape [C]
- `beta`: bias added to the scaled normalized value; shape [C]
- `epsilon`: avoids division by zero when the input has zero variance

```cpp
BatchNormTraining(double eps,
                  const Output<Node>& gamma,
                  const Output<Node>& beta,
                  const Output<Node>& input)
```

In this version of BatchNorm:

- MEAN AND VARIANCE: computed directly from the content of `input`.
- OUTPUT VALUE: a tuple with the following structure:
  - the normalization of `input`;
  - the per-channel means of (pre-normalized) `input`;
  - the per-channel variances of (pre-normalized) `input`.
- AUTODIFF SUPPORT: yes, `generate_adjoints(...)` works as expected.
- SHAPE DETAILS:
  - `gamma`: must have rank 1, with the same span as `input`'s channel axis.
  - `beta`: must have rank 1, with the same span as `input`'s channel axis.
  - `input`: must have rank >= 2; the second dimension represents the channel axis and must have a span of at least 1.
  - output 0 (`normalized`): shall have the same shape as `input`.
  - output 1 (`batch_mean`): shall have rank 1, with the same span as `input`'s channel axis.
  - output 2 (`batch_variance`): shall have rank 1, with the same span as `input`'s channel axis.

```cpp
void validate_and_infer_types()
```

Throws if the node is invalid.