tensorpack.models package
Relevant tutorials: Symbolic Layers.

tensorpack.models.BatchNorm(scope_name, inputs, axis=None, training=None, momentum=0.9, epsilon=1e-05, center=True, scale=True, beta_initializer=<tf.python.ops.init_ops.Zeros object>, gamma_initializer=<tf.python.ops.init_ops.Ones object>, virtual_batch_size=None, data_format='channels_last', internal_update=False, sync_statistics=None)
Almost equivalent to tf.layers.batch_normalization, but different (and more powerful) in the following ways:
Accepts an alternative data_format option when axis is None. For 2D input, this argument is ignored.
The default values for momentum and epsilon are different.
The default value for training is obtained automatically from tensorpack’s TowerContext, but can be overridden.
Supports the internal_update option, which covers more use cases than the standard collection-based update.
Supports the sync_statistics option, which is very useful in small-batch models.
Parameters:
internal_update (bool) – if False, add EMA update ops to tf.GraphKeys.UPDATE_OPS. If True, update EMA inside the layer by control dependencies. The two are very similar in speed, but internal_update=True is recommended and can be helpful when:
BatchNorm is used inside dynamic control flow. The collection-based update does not support dynamic control flow.
A BatchNorm layer is sometimes unused (e.g., when you have two networks to train alternately). Putting all update ops into a single collection would waste a lot of compute.
Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/14699
sync_statistics (str or None) – one of None, “nccl”, or “horovod”.
By default (None), it uses statistics of the input tensor to normalize during training. This is the standard way BatchNorm is implemented in most frameworks.
When set to “nccl”, this layer must be used under tensorpack’s multi-GPU trainers. It uses the aggregated statistics of the whole batch (across all GPUs) to normalize.
When set to “horovod”, this layer must be used under tensorpack’s HorovodTrainer. It uses the aggregated statistics of the whole batch (across all MPI ranks) to normalize. Note that on a single machine this is significantly slower than the “nccl” implementation.
When enabled, per-GPU E[x] and E[x^2] among all GPUs are averaged to compute the global mean and variance. Therefore each GPU needs to have the same batch size.
The synchronization is based on the current variable scope plus the name of the layer (BatchNorm('name', input)). Therefore, you need to make sure that:
BatchNorm layers on different GPUs have the same name, so that statistics can be synchronized. If names do not match, this layer will hang.
Different BatchNorm layers in one tower do not share the same name.
A BatchNorm layer is executed the same number of times by all GPUs. If different GPUs execute one BatchNorm layer a different number of times (e.g., if some GPUs do not execute it), this layer may hang.
This option only has an effect when training == get_current_tower_context().training == True.
This option is also known as “Cross-GPU BatchNorm”, as mentioned in: MegDet: A Large Mini-Batch Object Detector. Corresponding TF issue: https://github.com/tensorflow/tensorflow/issues/18222.
When sync_statistics is enabled, internal_update will be set to True automatically. This is to avoid running UPDATE_OPS, which requires synchronization.
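The aggregation described for sync_statistics (averaging per-GPU E[x] and E[x^2], then deriving the global mean and variance) can be sketched in plain Python. This is only an illustration on lists of scalars; the helper names local_moments and synced_mean_var are hypothetical and are not part of tensorpack, and no real cross-GPU communication takes place.

```python
# Illustrative sketch only; not tensorpack's implementation.
def local_moments(batch):
    """Return (E[x], E[x^2]) for one worker's batch of scalars."""
    n = len(batch)
    return sum(batch) / n, sum(v * v for v in batch) / n


def synced_mean_var(per_gpu_batches):
    """Average per-worker moments, then derive the global mean and variance."""
    moments = [local_moments(b) for b in per_gpu_batches]
    k = len(moments)
    mean = sum(m[0] for m in moments) / k
    mean_sq = sum(m[1] for m in moments) / k
    # Var[x] = E[x^2] - (E[x])^2
    return mean, mean_sq - mean * mean


gpu0 = [1.0, 2.0, 3.0, 4.0]
gpu1 = [5.0, 6.0, 7.0, 8.0]
mean, var = synced_mean_var([gpu0, gpu1])

# Reference: statistics computed directly over the concatenated batch.
full = gpu0 + gpu1
full_mean = sum(full) / len(full)
full_var = sum((v - full_mean) ** 2 for v in full) / len(full)
```

With equal batch sizes the averaged moments agree with the statistics of the concatenated batch. With unequal sizes a plain average would weight the workers incorrectly, which is consistent with the requirement that each GPU has the same batch size.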
Variable Names:
beta: the bias term. Will be zero-inited by default.
gamma: the scale term. Will be one-inited by default.
mean/EMA: the moving average of mean.
variance/EMA: the moving average of variance.
Note
Combinations of training and ctx.is_training:
training == ctx.is_training: standard BN. EMA are maintained during training and used during inference. This is the default.
training and not ctx.is_training: still use batch statistics in inference.
not training and ctx.is_training: use EMA to normalize in training. This is useful when you load a pre-trained BN and don’t want to fine-tune the EMA. EMA will not be updated in this case.
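The combinations in the note above can be summarized with a small helper. This is a sketch of the decision only, assuming that the training flag alone selects the statistics and that ctx.is_training only names the phase; the function describe_bn_mode is hypothetical and does not exist in tensorpack.

```python
def describe_bn_mode(training, ctx_is_training):
    """Describe which statistics normalize the input in each combination."""
    # Assumption from the note: `training` picks batch statistics vs. EMA.
    stats = "batch" if training else "EMA"
    phase = "training" if ctx_is_training else "inference"
    return "normalize with {} statistics during {}".format(stats, phase)


modes = {
    (training, ctx_is_training): describe_bn_mode(training, ctx_is_training)
    for training in (True, False)
    for ctx_is_training in (True, False)
}
```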

tensorpack.models.BatchRenorm(scope_name, x, rmax, dmax, momentum=0.9, epsilon=1e-05, center=True, scale=True, gamma_initializer=None, data_format='channels_last')
Batch Renormalization layer, as described in the paper: Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models. This implementation is a wrapper around tf.layers.batch_normalization.
Returns: tf.Tensor – a tensor named output with the same shape as x.
Variable Names:
beta: the bias term.
gamma: the scale term. Input will be transformed by x * gamma + beta.
moving_mean, renorm_mean, renorm_mean_weight: See TF documentation.
moving_variance, renorm_stddev, renorm_stddev_weight: See TF documentation.

tensorpack.models.layer_register(log_shape=False, use_scope=True)
Parameters:
log_shape (bool) – log input/output shape of this layer
use_scope (bool or None) – Whether to call this layer with an extra first argument as scope. When set to None, it can be called either with or without the scope name argument. It will try to figure this out by checking whether the first argument is a string.
Returns: A decorator used to register a layer.
Example:
    @layer_register(use_scope=True)
    def add10(x):
        return x + tf.get_variable('W', shape=[10])
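The use_scope=None behaviour (deciding whether a scope name was passed by checking if the first argument is a string) might look roughly like the following. This is a minimal pure-Python sketch, not tensorpack's code: register_sketch is a hypothetical stand-in for layer_register, and instead of opening a variable scope it simply returns the detected scope next to the result.

```python
import functools


def register_sketch(use_scope=True):
    """Hypothetical stand-in for layer_register; handles only the scope argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if use_scope is None:
                # Guess: a leading string is treated as the scope name.
                has_scope = len(args) > 0 and isinstance(args[0], str)
            else:
                has_scope = bool(use_scope)
            if has_scope:
                if not args or not isinstance(args[0], str):
                    raise TypeError("expected a scope name as the first argument")
                scope, args = args[0], args[1:]
            else:
                scope = None
            # A real layer would open a variable scope here; the sketch returns it.
            return scope, func(*args, **kwargs)
        return wrapper
    return decorator


@register_sketch(use_scope=None)
def add_one(x):
    return x + 1


with_scope = add_one("fc0", 41)   # called with a scope name
without_scope = add_one(41)       # called without one
```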

class tensorpack.models.VariableHolder(**kwargs)
Bases: object
A proxy to access variables defined in a layer.

tensorpack.models.rename_tflayer_get_variable()
Rename all tf.get_variable() calls with rules that transform tflayer style to tensorpack style.
Returns: A context where the variables are renamed.
Example:
    with rename_tflayer_get_variable():
        x = tf.layers.conv2d(input, 3, 3, name='conv0')
        # variables will be named 'conv0/W', 'conv0/b'

tensorpack.models.Conv2D(scope_name, inputs, filters, kernel_size, strides=(1, 1), padding='same', data_format='channels_last', dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer=None, bias_initializer=<tf.python.ops.init_ops.Zeros object>, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, split=1)
A wrapper around tf.layers.Conv2D. Some differences to maintain backward-compatibility:
Default kernel initializer is variance_scaling_initializer(2.0).
Default padding is ‘same’.
Supports the ‘split’ argument to do group conv. Note that this is not efficient.
Variable Names:
W: weights
b: bias

tensorpack.models.Conv2DTranspose(scope_name, inputs, filters, kernel_size, strides=(1, 1), padding='same', data_format='channels_last', activation=None, use_bias=True, kernel_initializer=None, bias_initializer=<tf.python.ops.init_ops.Zeros object>, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None)
A wrapper around tf.layers.Conv2DTranspose. Some differences to maintain backward-compatibility:
Default kernel initializer is variance_scaling_initializer(2.0).
Default padding is ‘same’.
Variable Names:
W: weights
b: bias

tensorpack.models.FullyConnected(scope_name, inputs, units, activation=None, use_bias=True, kernel_initializer=None, bias_initializer=<tf.python.ops.init_ops.Zeros object>, kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None)
A wrapper around tf.layers.Dense. One difference to maintain backward-compatibility: Default weight initializer is variance_scaling_initializer(2.0).
Variable Names:
W: weights of shape [in_dim, out_dim]
b: bias

tensorpack.models.LayerNorm(scope_name, x, epsilon=1e-05, use_bias=True, use_scale=True, gamma_init=None, data_format='channels_last')
Layer Normalization layer, as described in the paper: Layer Normalization.
Parameters:
x (tf.Tensor) – a 4D or 2D tensor. When 4D, the layout should match data_format.
epsilon (float) – epsilon to avoid divide-by-zero.
use_scale, use_bias (bool) – whether to use the extra affine transformation or not.

tensorpack.models.InstanceNorm(scope_name, x, epsilon=1e-05, use_affine=True, gamma_init=None, data_format='channels_last')
Instance Normalization, as in the paper: Instance Normalization: The Missing Ingredient for Fast Stylization.

class tensorpack.models.LinearWrap(tensor)
Bases: object
A simple wrapper to easily create a “linear” graph, consisting of layers / symbolic functions with only one input and one output.

apply(func, *args, **kwargs)
Apply a function on the wrapped tensor.
Returns: LinearWrap – LinearWrap(func(self.tensor(), *args, **kwargs)).

apply2(func, *args, **kwargs)
Apply a function on the wrapped tensor. The tensor will be the second argument of func.
This is because many symbolic functions (such as tensorpack’s layers) take ‘scope’ as the first argument.
Returns: LinearWrap – LinearWrap(func(args[0], self.tensor(), *args[1:], **kwargs)).

print_tensor()
Print the underlying tensor and return self. Can be useful to get the name of tensors inside LinearWrap.
Returns: self
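The argument placement of apply and apply2 can be illustrated with a pure-Python stand-in that wraps any value instead of a tensor. LinearWrapSketch and the two toy functions are hypothetical and exist only for this illustration; the two method bodies mirror the return expressions documented above.

```python
class LinearWrapSketch:
    """Hypothetical pure-Python stand-in for LinearWrap, wrapping any value."""

    def __init__(self, tensor):
        self._t = tensor

    def apply(self, func, *args, **kwargs):
        # The wrapped value becomes the first argument of func.
        return LinearWrapSketch(func(self.tensor(), *args, **kwargs))

    def apply2(self, func, *args, **kwargs):
        # The wrapped value becomes the second argument; args[0] stays first.
        return LinearWrapSketch(func(args[0], self.tensor(), *args[1:], **kwargs))

    def tensor(self):
        return self._t


def scale(x, factor):
    return x * factor


def named_add(scope_name, x, amount):
    # Mimics a layer whose first argument is a scope name.
    return x + amount


result = (LinearWrapSketch(3)
          .apply(scale, 2)                 # scale(3, 2)
          .apply2(named_add, "add0", 10)   # named_add("add0", 6, 10)
          .tensor())
```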


tensorpack.models.Maxout([scope_name, ]x, num_unit)
Maxout as in the paper Maxout Networks.
Parameters:
x (tf.Tensor) – a NHWC or NC tensor. Channel has to be known.
num_unit (int) – an int. C must be divisible by num_unit.
Returns: tf.Tensor – of shape NHW(C/num_unit) named output.
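For an NC input, the shape contract above (C channels in, C/num_unit channels out) can be sketched in plain Python. The sketch assumes that each output channel is the maximum over num_unit consecutive input channels; that grouping is an assumption made for illustration, and maxout_nc is a hypothetical helper, not a tensorpack function.

```python
def maxout_nc(x, num_unit):
    """Maxout over an NC input given as a list of rows (pure-Python sketch)."""
    channels = len(x[0])
    if channels % num_unit != 0:
        raise ValueError("C must be divisible by num_unit")
    # Assumption: groups are formed from consecutive channels.
    return [
        [max(row[i:i + num_unit]) for i in range(0, channels, num_unit)]
        for row in x
    ]


batch = [[1, 5, 2, 0], [7, 3, 4, 9]]   # N=2, C=4
out = maxout_nc(batch, 2)              # N=2, C/num_unit=2
```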

tensorpack.models.PReLU(scope_name, x, init=0.001, name='output')
Parameterized ReLU as in the paper Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Variable Names:
alpha: learnable slope.

tensorpack.models.BNReLU([scope_name, ]x, name=None)
A shorthand of BatchNormalization + ReLU.

tensorpack.models.MaxPooling(scope_name, inputs, pool_size, strides=None, padding='valid', data_format='channels_last')
Same as tf.layers.MaxPooling2D. Default strides is equal to pool_size.

tensorpack.models.FixedUnPooling(scope_name, x, shape, unpool_mat=None, data_format='channels_last')
Unpool the input by taking its Kronecker product with a fixed matrix.
Parameters:
x (tf.Tensor) – a 4D image tensor
shape – int or (h, w) tuple
unpool_mat – a tf.Tensor or np.ndarray 2D matrix with size=shape. If None, a matrix with 1 at the top-left corner is used.
Returns: tf.Tensor – a 4D image tensor.
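The effect of the Kronecker product with the default matrix (1 at the top-left corner) can be seen on a single 2D image plane. This is a plain-Python sketch on nested lists rather than 4D tensors; kron2d and default_unpool_mat are hypothetical helpers introduced only for this illustration.

```python
def kron2d(image, mat):
    """Kronecker product of two 2D lists: each pixel is replaced by pixel * mat."""
    mh, mw = len(mat), len(mat[0])
    out = []
    for row in image:
        for i in range(mh):
            out.append([v * mat[i][j] for v in row for j in range(mw)])
    return out


def default_unpool_mat(h, w):
    """Matrix of the given shape with 1 in the top-left corner and 0 elsewhere."""
    mat = [[0] * w for _ in range(h)]
    mat[0][0] = 1
    return mat


image = [[1, 2], [3, 4]]
unpooled = kron2d(image, default_unpool_mat(2, 2))
```

Each input pixel ends up in the top-left corner of its own 2x2 block, so a 2x2 image becomes a 4x4 image padded with zeros.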

tensorpack.models.AvgPooling(scope_name, inputs, pool_size, strides=None, padding='valid', data_format='channels_last')
Same as tf.layers.AveragePooling2D. Default strides is equal to pool_size.

tensorpack.models.GlobalAvgPooling(scope_name, x, data_format='channels_last')
Global average pooling as in the paper Network In Network.
Parameters:
x (tf.Tensor) – a 4D tensor.
Returns: tf.Tensor – a NC tensor named output.

tensorpack.models.regularize_cost(regex, func, name='regularize_cost')
Apply a regularizer on trainable variables matching the regex, and print the matched variables (only print once in multi-tower training). In replicated mode, it will only regularize variables within the current tower.
If called under a TowerContext with is_training==False, this function returns a zero constant tensor.
Parameters:
regex (str) – a regex to match variable names, e.g. “conv.*/W”
func – the regularization function, which takes a tensor and returns a scalar tensor. E.g., tf.nn.l2_loss, tf.contrib.layers.l1_regularizer(0.001).
Returns: tf.Tensor – a scalar, the total regularization cost.
Example:
    cost = cost + regularize_cost("fc.*/W", l2_regularizer(1e-5))
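The selection-and-sum behaviour can be sketched without TensorFlow by representing variables as a dict from name to a flat list of values. This is only an illustration: regularize_cost_sketch is hypothetical, the use of re.search for matching is an assumption, and l2_loss here follows the definition of tf.nn.l2_loss (sum of squares divided by 2).

```python
import re


def l2_loss(values):
    """Same definition as tf.nn.l2_loss: sum(v ** 2) / 2."""
    return sum(v * v for v in values) / 2.0


def regularize_cost_sketch(regex, func, variables):
    """Sum func over the variables whose name matches regex."""
    # Assumption: names are matched with re.search.
    matched = [name for name in variables if re.search(regex, name)]
    return matched, sum(func(variables[name]) for name in matched)


variables = {
    "conv0/W": [1.0, 2.0],
    "fc0/W": [3.0, 4.0],
    "fc0/b": [0.5],
    "fc1/W": [1.0, 1.0],
}
matched, cost = regularize_cost_sketch("fc.*/W", l2_loss, variables)
```

Only the weights of the two fc layers are selected; the bias and the convolution weights do not contribute to the cost.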

tensorpack.models.regularize_cost_from_collection(name='regularize_cost')
Get the cost from the regularizers in tf.GraphKeys.REGULARIZATION_LOSSES. In replicated mode, it will only regularize variables created within the current tower.
Parameters:
name (str) – the name of the returned tensor
Returns: tf.Tensor – a scalar, the total regularization cost.

tensorpack.models.Dropout([scope_name, ]x, *args, **kwargs)
Same as tf.layers.dropout. However, for historical reasons, the first positional argument is interpreted as keep_prob rather than drop_prob. Pass the rate= keyword argument explicitly to avoid ambiguity.
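The difference between the positional argument and the rate= keyword can be made concrete with a small helper that computes the effective drop rate. This is a sketch under assumptions: dropout_rate is hypothetical, it only models the two call styles described above, and the fallback of 0.5 is simply the default rate of tf.layers.dropout.

```python
def dropout_rate(*args, **kwargs):
    """Return the drop rate implied by a Dropout-style call (hypothetical helper)."""
    if "rate" in kwargs:
        # rate= is a drop probability and is used unchanged.
        return kwargs["rate"]
    if args:
        # The first positional argument is a keep probability.
        return 1.0 - args[0]
    # Assumption: fall back to the default rate of tf.layers.dropout.
    return 0.5


positional = dropout_rate(0.8)     # keep 80%, so drop about 20%
keyword = dropout_rate(rate=0.8)   # drop 80%
```

The same number therefore means opposite things in the two call styles, which is why the keyword form is the safer one.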

tensorpack.models.ConcatWith([scope_name, ]x, tensor, dim)
A wrapper around tf.concat to cooperate with LinearWrap.
Returns: tf.Tensor – tf.concat([x] + tensor, dim)