tensorpack.callbacks package

Everything other than the training iterations happens in the callbacks. Most of the fancy things you want to do will probably end up here. See the relevant tutorial: Callbacks.

class tensorpack.callbacks.Callback[source]

Bases: object

Base class for all callbacks. See Write a Callback for a more detailed explanation of the callback methods.

epoch_num

int – trainer.epoch_num

global_step

int – trainer.global_step

local_step

int – trainer.local_step

trainer

Trainer – the trainer.

graph

tf.Graph – the graph.

Note

These attributes are available only from _setup_graph() onward, including inside _setup_graph() itself.

_setup_graph()[source]

Called before finalizing the graph. Override this method to set up the ops used in the callback. This is the same as tf.train.SessionRunHook.begin().

_before_train()[source]

Called right before the first iteration. The main difference from _setup_graph() is that at this point the graph is finalized and a default session is initialized. Override this method to, e.g., run some operations under the session.

This is similar to tf.train.SessionRunHook.after_create_session(), but different: it is called after the session is initialized by tfutils.SessionInit.

_after_train()[source]

Called after training.

_before_run(ctx)[source]

It is called before every hooked_sess.run() call, and it registers some extra ops/tensors to run in the next call. This method is the same as tf.train.SessionRunHook.before_run(). Refer to the TensorFlow docs for more details.

_after_run(run_context, run_values)[source]

It is called after every hooked_sess.run() call, and it processes the values requested by the corresponding before_run(). It is equivalent to tf.train.SessionRunHook.after_run(); refer to the TensorFlow docs for more details.

_before_epoch()[source]

Called right before each epoch. Usually you should use the trigger() callback to run something between epochs. Use this method only when something really needs to be run immediately before each epoch.

_after_epoch()[source]

Called right after each epoch. Usually you should use the trigger() callback to run something between epochs. Use this method only when something really needs to be run immediately after each epoch.

_trigger_step()[source]

Called after each Trainer.run_step() completes. Defaults to no-op.

You can override it to implement, e.g. a ProgressBar.

_trigger_epoch()[source]

Called after the completion of every epoch. By default it calls self.trigger().

_trigger()[source]

Override this method to define a general trigger behavior, to be used with trigger schedulers. Note that the schedulers (e.g. PeriodicTrigger) might call this method both inside an epoch and after an epoch.

When used without the scheduler, this method by default will be called by trigger_epoch().
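To make the ordering concrete, here is a toy, framework-free loop that calls these methods in the order described above. It is a minimal sketch, not tensorpack's trainer: ToyCallback and run_toy_training are hypothetical names, the leading underscores are dropped, before_run/after_run are omitted (they wrap each hooked_sess.run() call rather than being called directly), and the sketch assumes _after_epoch() runs before _trigger_epoch().

```python
# Toy sketch of the callback lifecycle; NOT tensorpack's real Trainer.
# ToyCallback and run_toy_training are hypothetical names.


class ToyCallback:
    """Records the name of every lifecycle method as it is called."""

    def __init__(self):
        self.calls = []

    def setup_graph(self):
        self.calls.append("setup_graph")    # graph can still be modified

    def before_train(self):
        self.calls.append("before_train")   # graph finalized, session ready

    def before_epoch(self):
        self.calls.append("before_epoch")

    def trigger_step(self):
        self.calls.append("trigger_step")   # after every training step

    def after_epoch(self):
        self.calls.append("after_epoch")

    def trigger_epoch(self):
        self.trigger()                      # default: delegate to trigger()

    def trigger(self):
        self.calls.append("trigger")

    def after_train(self):
        self.calls.append("after_train")


def run_toy_training(cb, max_epoch, steps_per_epoch):
    cb.setup_graph()
    cb.before_train()
    try:
        for _ in range(max_epoch):
            cb.before_epoch()
            for _ in range(steps_per_epoch):
                # A real trainer would run one training step here.
                cb.trigger_step()
            cb.after_epoch()        # assumed to run before trigger_epoch()
            cb.trigger_epoch()
    finally:
        cb.after_train()            # in this toy, runs even if the loop raises


cb = ToyCallback()
run_toy_training(cb, max_epoch=1, steps_per_epoch=2)
print(cb.calls)
# ['setup_graph', 'before_train', 'before_epoch', 'trigger_step',
#  'trigger_step', 'after_epoch', 'trigger', 'after_train']
```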

chief_only

Only run this callback on the chief training process.

Returns: bool

get_tensors_maybe_in_tower(names)[source]

Get tensors in the graph with the given names. Will automatically check for the first training tower if no existing tensor is found with the name.

Returns:[tf.Tensor]

set_chief_only(v=True)[source]

Set the chief_only property and return the callback itself.

class tensorpack.callbacks.ProxyCallback(cb)[source]

Bases: tensorpack.callbacks.base.Callback

A callback which proxies all methods to another callback. It’s useful as a base class for callbacks which decorate other callbacks.

__init__(cb)[source]
Parameters:cb (Callback) – the underlying callback

class tensorpack.callbacks.CallbackFactory(setup_graph=None, before_train=None, trigger=None, after_train=None)[source]

Bases: tensorpack.callbacks.base.Callback

Create a callback with some lambdas.

__init__(setup_graph=None, before_train=None, trigger=None, after_train=None)[source]

Each lambda takes self as the only argument.

class tensorpack.callbacks.StartProcOrThread(startable, stop_at_last=True)[source]

Bases: tensorpack.callbacks.base.Callback

Start some threads or processes before training.

__init__(startable, stop_at_last=True)[source]
Parameters:
  • startable (list) – list of processes or threads which have a start() method. Can also be a single instance of a process or thread.

  • stop_at_last (bool) – whether to stop the processes or threads after training. It will use Process.terminate() or StoppableThread.stop(), but will do nothing on normal threading.Thread or other startable objects.

class tensorpack.callbacks.RunOp(op, run_before=True, run_as_trigger=True, run_step=False, verbose=False)[source]

Bases: tensorpack.callbacks.base.Callback

Run an Op.

__init__(op, run_before=True, run_as_trigger=True, run_step=False, verbose=False)[source]
Parameters:
  • op (tf.Operation or function) – an Op, or a function that returns the Op in the graph. The function will be called after the main graph has been created (in the setup_graph callback).

  • run_before (bool) – run the Op before training

  • run_as_trigger (bool) – run the Op on every trigger() call.

  • run_step (bool) – run the Op every step (along with training)

  • verbose (bool) – print logs when the op is run.

Example

The DQN example uses this callback to update the target network.

class tensorpack.callbacks.RunUpdateOps(collection='update_ops')[source]

Bases: tensorpack.callbacks.graph.RunOp

Run ops from the collection UPDATE_OPS every step

__init__(collection='update_ops')[source]
Parameters:collection (str) – collection of ops to run. Defaults to tf.GraphKeys.UPDATE_OPS

class tensorpack.callbacks.ProcessTensors(names, fn)[source]

Bases: tensorpack.callbacks.base.Callback

Fetch extra tensors along with each training step, and call some function over the values. It uses the _{before,after}_run methods to inject tf.train.SessionRunHooks into the session. You can use it to print tensors, save tensors to file, etc.

Example:

ProcessTensors(['mycost1', 'mycost2'], lambda c1, c2: print(c1, c2, c1 + c2))

__init__(names, fn)[source]
Parameters:
  • names (list[str]) – names of tensors

  • fn – a function taking all requested tensors as input

class tensorpack.callbacks.DumpTensors(names)[source]

Bases: tensorpack.callbacks.graph.ProcessTensors

Dump some tensors to a file. In every step, this callback fetches the tensors and writes them to an npz file under logger.get_logger_dir(). The dump can be loaded by dict(np.load(filename).items()).

__init__(names)[source]
Parameters:names (list[str]) – names of tensors

class tensorpack.callbacks.DumpTensorAsImage(tensor_name, prefix=None, map_func=None, scale=255)[source]

Bases: tensorpack.callbacks.base.Callback

Dump a tensor to image(s) in logger.get_logger_dir() once triggered.

Note that it requires the tensor to be directly evaluable, i.e. either the tensor does not depend on the inputs (e.g. the weights of the model), or the inputs are feedfree (in which case this callback will take an extra datapoint from the input pipeline).

__init__(tensor_name, prefix=None, map_func=None, scale=255)[source]
Parameters:
  • tensor_name (str) – the name of the tensor.

  • prefix (str) – the filename prefix for saved images. Defaults to the Op name.

  • map_func – map the value of the tensor to an image or list of images of shape [h, w] or [h, w, c]. If None, will use identity.

  • scale (float) – a multiplier on pixel values, applied after map_func.

class tensorpack.callbacks.Callbacks(cbs)[source]

Bases: tensorpack.callbacks.base.Callback

A container to hold all callbacks, and trigger them iteratively. Note that it does nothing to before_run/after_run.

__init__(cbs)[source]
Parameters:cbs (list) – a list of Callback instances.

class tensorpack.callbacks.CallbackToHook(cb)[source]

Bases: tensorflow.python.training.session_run_hook.SessionRunHook

This is only for internal implementation of before_run/after_run callbacks. You shouldn’t need to use this.

after_run(ctx, vals)[source]

Called after each call to run().

The run_values argument contains results of requested ops/tensors by before_run().

The run_context argument is the same one sent to the before_run() call. run_context.request_stop() can be called to stop the iteration.

If session.run() raises any exceptions then after_run() is not called.

Parameters:
  • run_context – A SessionRunContext object.

  • run_values – A SessionRunValues object.

before_run(ctx)[source]

Called before each call to run().

You can return from this call a SessionRunArgs object indicating ops or tensors to add to the upcoming run() call. These ops/tensors will be run together with the ops/tensors originally passed to the original run() call. The run args you return can also contain feeds to be added to the run() call.

The run_context argument is a SessionRunContext that provides information about the upcoming run() call: the originally requested op/tensors, the TensorFlow Session.

At this point the graph is finalized and you cannot add ops.

Parameters:run_context – A SessionRunContext object.
Returns:None or a SessionRunArgs object.

class tensorpack.callbacks.HookToCallback(hook)[source]

Bases: tensorpack.callbacks.base.Callback

Make a tf.train.SessionRunHook into a callback. Note that when SessionRunHook.after_create_session is called, the coord argument will be None.

__init__(hook)[source]
Parameters:hook (tf.train.SessionRunHook) – the hook to wrap.

class tensorpack.callbacks.ScalarStats(names, prefix='validation')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Statistics of some scalar tensors. The values will be averaged over all given datapoints.

Note that the average of accuracy over all batches is not necessarily the accuracy of the whole dataset. See ClassificationError for details.

__init__(names, prefix='validation')[source]
Parameters:
  • names (list or str) – list of names or just one name. The corresponding tensors have to be scalar.

  • prefix (str) – a prefix for logging

class tensorpack.callbacks.Inferencer[source]

Bases: tensorpack.callbacks.base.Callback

Base class of Inferencer. Inferencer is a special kind of callback that should be called by InferenceRunner. It has the methods _get_fetches and _on_fetches which are like SessionRunHooks, except that they will be used only by InferenceRunner.

_before_inference()[source]

Called before a new round of inference starts.

_after_inference()[source]

Called after a round of inference ends. Returns a dict of scalar statistics which will be logged to monitors.

_get_fetches()[source]

To be implemented by subclasses

_on_fetches(results)[source]

To be implemented by subclasses

get_fetches()[source]

Return a list of tensor names (guaranteed not op name) this inferencer needs.

on_fetches(results)[source]

Called after each new datapoint has finished the forward inference.

Parameters:results (list) – list of results this inferencer fetched. Has the same length as self._get_fetches().

class tensorpack.callbacks.ClassificationError(wrong_tensor_name='incorrect_vector', summary_name='validation_error')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Compute true classification error in batch mode, from a wrong tensor.

The wrong tensor is supposed to be a binary vector indicating whether each sample in the batch is incorrectly classified. You can use tf.nn.in_top_k to produce this vector (e.g. by applying tf.logical_not to its output, since in_top_k marks correct predictions).

This Inferencer produces the “true” error, which could be different from ScalarStats(‘error_rate’). It takes into account the fact that batches might not have the same size in testing (because the size of the test set might not be a multiple of the batch size). Therefore the result can be different from averaging the error rate of each batch.

You can also use the “correct prediction” tensor, then this inferencer will give you “classification accuracy” instead of error.
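A small worked example of why the two numbers can differ, using plain Python lists in place of tensors. It assumes a test set of 5 samples evaluated with batch size 4, so the last batch holds only 1 sample; wrong_batches is hypothetical data standing in for the per-batch wrong vectors.

```python
# Hypothetical per-batch "wrong" vectors (1 = misclassified sample).
wrong_batches = [
    [1, 0, 0, 0],   # full batch: error rate 1/4
    [1],            # last, smaller batch: error rate 1/1
]

# Averaging the per-batch error rates (what a plain per-batch mean gives):
batch_rates = [sum(b) / len(b) for b in wrong_batches]
mean_of_batch_rates = sum(batch_rates) / len(batch_rates)

# The "true" error over the whole set: total wrong samples / total samples.
true_error = sum(sum(b) for b in wrong_batches) / sum(len(b) for b in wrong_batches)

print(mean_of_batch_rates)  # 0.625
print(true_error)           # 0.4
```

The small final batch gets the same weight as a full batch in the first number, which is exactly the bias ClassificationError avoids.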

__init__(wrong_tensor_name='incorrect_vector', summary_name='validation_error')[source]
Parameters:
  • wrong_tensor_name (str) – name of the wrong binary vector tensor.

  • summary_name (str) – the name to log the error with.

class tensorpack.callbacks.BinaryClassificationStats(pred_tensor_name, label_tensor_name, prefix='val')[source]

Bases: tensorpack.callbacks.inference.Inferencer

Compute precision / recall in binary classification, given the prediction vector and the label vector.

__init__(pred_tensor_name, label_tensor_name, prefix='val')[source]
Parameters:
  • pred_tensor_name (str) – name of the 0/1 prediction tensor.

  • label_tensor_name (str) – name of the 0/1 label tensor.
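For reference, precision and recall over 0/1 vectors follow the standard definitions sketched below. This is plain Python, not tensorpack's code; precision_recall is a hypothetical helper, and returning 0.0 when a denominator is zero is an assumption of the sketch.

```python
def precision_recall(pred, label):
    """Precision and recall of 0/1 predictions against 0/1 labels."""
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same length")
    tp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, label) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, label) if p == 0 and y == 1)
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)
```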

class tensorpack.callbacks.InferenceRunnerBase(input, infs)[source]

Bases: tensorpack.callbacks.base.Callback

Base class for inference runner.

Note

  1. InferenceRunner will use input.size() to determine how many iterations to run, so you are responsible for ensuring that input.size() is reasonable.

  2. Only works with instances of TowerTrainer.

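__init__(input, infs)[source]
Parameters:
  • input (InputSource) – the input to run inference on. Must have size().

  • infs (list[Inferencer]) – a list of Inferencer instances to run.

register_hook(hook)[source]
Parameters:hook (tf.train.SessionRunHook) –

class tensorpack.callbacks.InferenceRunner(input, infs, tower_name='InferenceTower', tower_func=None, device=0)[source]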

Bases: tensorpack.callbacks.inference_runner.InferenceRunnerBase

A callback that runs a list of Inferencer on some InputSource.

__init__(input, infs, tower_name='InferenceTower', tower_func=None, device=0)[source]
Parameters:
  • input (InputSource or DataFlow) – The InputSource to run inference on. If given a DataFlow, will use FeedInput.

  • infs (list) – a list of Inferencer instances.

  • tower_name (str) – the name scope of the tower to build. It must be set to a different name if multiple InferenceRunners are used.

  • tower_func (tfutils.TowerFuncWrapper or None) – the tower function to be used to build the graph. By default, trainer.tower_func is called under a training=False TowerContext, but you can change it to a different tower function if you need to run inference with several different graphs.

  • device (int) – the device to use

class tensorpack.callbacks.DataParallelInferenceRunner(input, infs, gpus, tower_name='InferenceTower', tower_func=None)[source]

Bases: tensorpack.callbacks.inference_runner.InferenceRunnerBase

Inference with data-parallel support on multiple GPUs. It will build one predict tower on each GPU, and run prediction with a large total batch in parallel on all GPUs. It will run the remainder (when the total size of the input is not a multiple of the number of GPUs) sequentially.
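The split between parallel and sequential work can be sketched as a divmod. This is a simplified model, not tensorpack's implementation: it assumes each GPU tower consumes one batch from the input per parallel round, and plan_rounds is a hypothetical helper.

```python
def plan_rounds(num_batches, num_gpus):
    """Split num_batches into parallel rounds plus a sequential remainder.

    Returns (parallel_rounds, sequential_batches): every parallel round
    feeds one batch to each of the num_gpus towers at once; the leftover
    batches are then run one at a time.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    parallel_rounds, sequential_batches = divmod(num_batches, num_gpus)
    return parallel_rounds, sequential_batches


# 10 batches on 4 GPUs: 2 rounds of 4 batches in parallel, then 2 sequential.
print(plan_rounds(10, 4))  # (2, 2)
```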

class InferencerToHookDataParallel(inf, fetches, size)[source]

Bases: tensorpack.callbacks.inference_runner.InferencerToHook

__init__(inf, fetches, size)[source]
Parameters:size (int) – number of tensors to fetch per tower

after_run(_, run_values)[source]

Called after each call to run().

The run_values argument contains results of requested ops/tensors by before_run().

The run_context argument is the same one sent to the before_run() call. run_context.request_stop() can be called to stop the iteration.

If session.run() raises any exceptions then after_run() is not called.

Parameters:
  • run_context – A SessionRunContext object.

  • run_values – A SessionRunValues object.

__init__(input, infs, gpus, tower_name='InferenceTower', tower_func=None)[source]
Parameters:
  • input (DataFlow or QueueInput) – the input to run inference on.

  • infs (list) – a list of Inferencer instances.

  • gpus (int or list[int]) – the number of GPUs, or a list of GPU ids.

  • tower_name (str) – the name scope of the tower to build. It must be set to a different name if multiple InferenceRunners are used.

  • tower_func (tfutils.TowerFuncWrapper or None) – the tower function to be used to build the graph. By default, trainer.tower_func is called under a training=False TowerContext, but you can change it to a different tower function if you need to run inference with several different graphs.

register_hook(h)[source]

Parameters:hook (tf.train.SessionRunHook) –

class tensorpack.callbacks.SendStat(command, names)[source]

Bases: tensorpack.callbacks.base.Callback

An equivalent of SendMonitorData, but as a normal callback.

class tensorpack.callbacks.InjectShell(file='INJECT_SHELL.tmp', shell='ipython')[source]

Bases: tensorpack.callbacks.base.Callback

Allow users to create a specific file as a signal to pause and interactively debug the training. Once the trigger() method is called, it checks whether the file exists and, if it does, opens an IPython/pdb shell. In the shell, self is this callback, self.trainer is the trainer, and from that you can access everything else.

Example:

callbacks=[InjectShell('/path/to/pause-training.tmp'), ...]

# the following command will pause the training when the epoch finishes:
$ touch /path/to/pause-training.tmp

__init__(file='INJECT_SHELL.tmp', shell='ipython')[source]
Parameters:
  • file (str) – if this file exists, will open a shell.

  • shell (str) – one of ‘ipython’, ‘pdb’

class tensorpack.callbacks.EstimatedTimeLeft(last_k_epochs=5, median=False)[source]

Bases: tensorpack.callbacks.base.Callback

Estimate the time left until completion of training.

__init__(last_k_epochs=5, median=False)[source]
Parameters:
  • last_k_epochs (int) – Use the time spent on the last k epochs to estimate the total time left.

  • median (bool) – If True, use the median of the time spent on the last k epochs; otherwise (the default), use the mean.
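The estimate can be pictured as "typical epoch duration times epochs remaining". This is a minimal sketch under that assumption; estimate_time_left and its arguments are hypothetical and not part of tensorpack's API.

```python
import statistics


def estimate_time_left(epoch_durations, epochs_remaining, last_k_epochs=5, median=False):
    """Estimate remaining seconds from the durations of the last k epochs."""
    recent = epoch_durations[-last_k_epochs:]
    if not recent:
        return None  # nothing measured yet
    per_epoch = statistics.median(recent) if median else statistics.mean(recent)
    return per_epoch * epochs_remaining


durations = [100.0, 100.0, 130.0]   # seconds per finished epoch
print(estimate_time_left(durations, 10))               # mean 110 -> 1100.0
print(estimate_time_left(durations, 10, median=True))  # median 100 -> 1000.0
```

The median option makes the estimate robust to a single slow epoch (here the 130-second one), at the cost of reacting more slowly to a genuine slowdown.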

class tensorpack.callbacks.TrainingMonitor[source]

Bases: tensorpack.callbacks.base.Callback

Monitor the training progress by processing different types of summaries/statistics from the trainer.

_setup_graph()[source]

Override this method to set up the monitor.

process(name, val)[source]

Process a key-value pair.

process_event(evt)[source]
Parameters:evt (tf.Event) – the most basic format acceptable by TensorBoard. It could include Summary, RunMetadata, LogMessage, and more.

process_image(name, val)[source]
Parameters:val (np.ndarray) – 4D (NHWC) numpy array of images in range [0,255]. If there are 3 channels, the images are assumed to be RGB.

process_scalar(name, val)[source]
Parameters:val – a scalar

process_summary(summary)[source]

Process a tf.Summary.

class tensorpack.callbacks.Monitors(monitors)[source]

Bases: tensorpack.callbacks.base.Callback

Merge monitors together for trainer to use.

In training, each trainer will create a Monitors instance, and you can access it through trainer.monitors. You should use trainer.monitors for logging and it will dispatch your logs to each sub-monitor.

get_history(name)[source]

Get a history of the scalar value of some data.

If you run multiprocess training, keep in mind that the data may only be available on the chief process.

Returns:a list of (global_step, value) pairs – history data for this scalar

get_latest(name)[source]

Get the latest scalar value of some data.

If you run multiprocess training, keep in mind that the data may only be available on the chief process.

Returns:scalar

put_event(evt)[source]

Put a tf.Event. The step and wall_time fields of the tf.Event will be filled automatically.

Parameters:evt (tf.Event) –

put_image(name, val)[source]

Put an image.

Parameters:
  • name (str) –

  • val (np.ndarray) – 2D, 3D (HWC) or 4D (NHWC) numpy array of images in range [0,255]. If there are 3 channels, the images are assumed to be RGB.

put_scalar(name, val)[source]

Put a scalar.

put_summary(summary)[source]

Put a tf.Summary.
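The get_history()/get_latest() contract can be pictured with a toy in-memory store. ToyScalarHistory is a hypothetical class, not tensorpack's Monitors; in particular, passing global_step explicitly to put_scalar is an assumption of the sketch, whereas put_scalar(name, val) above takes no step argument.

```python
from collections import defaultdict


class ToyScalarHistory:
    """Keeps (global_step, value) pairs per scalar name."""

    def __init__(self):
        self._history = defaultdict(list)

    def put_scalar(self, name, val, global_step):
        self._history[name].append((global_step, float(val)))

    def get_history(self, name):
        return list(self._history.get(name, []))

    def get_latest(self, name):
        history = self._history.get(name)
        if not history:
            raise KeyError("no data for {!r}".format(name))
        return history[-1][1]


m = ToyScalarHistory()
m.put_scalar("val-error", 0.30, global_step=1000)
m.put_scalar("val-error", 0.25, global_step=2000)
print(m.get_history("val-error"))  # [(1000, 0.3), (2000, 0.25)]
print(m.get_latest("val-error"))   # 0.25
```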

class tensorpack.callbacks.TFEventWriter(logdir=None, max_queue=10, flush_secs=120, split_files=False)[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Write summaries to a TensorFlow event file.

__init__(logdir=None, max_queue=10, flush_secs=120, split_files=False)[source]
Parameters:
  • logdir – logger.get_logger_dir() by default.

  • max_queue, flush_secs – Same as in tf.summary.FileWriter.

  • split_files – if True, split events to multiple files rather than append to a single file. Useful on certain filesystems where append is expensive.

class tensorpack.callbacks.JSONWriter[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Write all scalar data to a JSON file under logger.get_logger_dir(), grouped by their global step. If an earlier JSON history file is found, it will append to it.

FILENAME = 'stats.json'

The name of the json file. Do not change it.

static load_existing_epoch_number()[source]

Try to load the latest epoch number from an existing json stats file (if any). Returns None if not found.

static load_existing_json()[source]

Look for an existing json under logger.get_logger_dir() named “stats.json”, and return the loaded list of statistics if found. Returns None otherwise.
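A rough sketch of the append-and-reload round trip. It assumes (this is not stated above) that the file holds a JSON list with one dict per logging point, each containing global_step, epoch_num and the scalar values; append_stats and latest_epoch_number are hypothetical helpers, not tensorpack functions.

```python
import json
import os
import tempfile


def append_stats(path, stats):
    """Append one dict of stats to the JSON list stored at path."""
    history = []
    if os.path.isfile(path):
        with open(path) as f:
            history = json.load(f)
    history.append(stats)
    with open(path, "w") as f:
        json.dump(history, f)


def latest_epoch_number(path):
    """Return the epoch_num of the last entry, or None if unavailable."""
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        history = json.load(f)
    if not history:
        return None
    return history[-1].get("epoch_num")


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "stats.json")
    append_stats(path, {"global_step": 100, "epoch_num": 1, "val-error": 0.3})
    append_stats(path, {"global_step": 200, "epoch_num": 2, "val-error": 0.25})
    print(latest_epoch_number(path))  # 2
```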

class tensorpack.callbacks.ScalarPrinter(enable_step=False, enable_epoch=True, whitelist=None, blacklist=None)[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Print scalar data to the terminal.

__init__(enable_step=False, enable_epoch=True, whitelist=None, blacklist=None)[source]
Parameters:
  • enable_step, enable_epoch (bool) – whether to print the monitor data (if any) between steps or between epochs.

  • whitelist (list[str] or None) – A list of regex. Only names matching some regex will be allowed for printing. Defaults to match all names.

  • blacklist (list[str] or None) – A list of regex. Names matching any regex will not be printed. Defaults to match no names.
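The whitelist/blacklist rule can be sketched as a small predicate. Whether each regex is applied with re.search or re.match is an assumption here (re.search is used, so a pattern may match anywhere in the name); should_print is a hypothetical helper.

```python
import re


def should_print(name, whitelist=None, blacklist=None):
    """Apply whitelist first (None = allow all), then blacklist (None = block none)."""
    if whitelist is not None and not any(re.search(p, name) for p in whitelist):
        return False
    if blacklist is not None and any(re.search(p, name) for p in blacklist):
        return False
    return True


names = ["train-error", "val-error", "learning_rate"]
print([n for n in names if should_print(n, whitelist=["error"], blacklist=["^train"])])
# ['val-error']
```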

class tensorpack.callbacks.SendMonitorData(command, names)[source]

Bases: tensorpack.callbacks.monitor.TrainingMonitor

Execute a command with some specific scalar monitor data. This is useful for, e.g. building a custom statistics monitor.

It will try to send once it has received all the stats.

__init__(command, names)[source]
Parameters:
  • command (str) – a command to execute. Use a format string with stat names as keys.

  • names (list or str) – data name(s) to use.
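How the command template gets filled can be sketched with str.format, assuming the command is only built once every requested stat is available. build_command is hypothetical; the real callback would then execute the resulting command, which this sketch does not do.

```python
def build_command(command, names, stats):
    """Fill the command template once every requested stat is present.

    Returns the formatted command, or None while some stats are missing.
    """
    if isinstance(names, str):
        names = [names]
    if not all(n in stats for n in names):
        return None
    return command.format(**{n: stats[n] for n in names})


template = 'echo "validation error is {validation_error}"'
print(build_command(template, "validation_error", {}))  # None
print(build_command(template, "validation_error", {"validation_error": 0.12}))
# echo "validation error is 0.12"
```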

Example

Send the stats to your phone through pushbullet:

SendMonitorData('curl -u your_id: https://api.pushbullet.com/v2/pushes \
         -d type=note -d title="validation error" \
         -d body={validation_error} > /dev/null 2>&1',
         'validation_error')

class tensorpack.callbacks.HyperParam[source]

Bases: object

Base class for a hyperparam.

get_value()[source]

Get the value of the param.

readable_name

A name to display

set_value(v)[source]

Set the value of the param.

Parameters:v – the value to be set

setup_graph()[source]

Set up the graph in the setup_graph callback stage, if necessary.

class tensorpack.callbacks.GraphVarParam(name, shape=[])[source]

Bases: tensorpack.callbacks.param.HyperParam

A variable in the graph (e.g. learning_rate) can be a hyperparam.

__init__(name, shape=[])[source]
Parameters:
  • name (str) – name of the variable.

  • shape (list) – shape of the variable.

get_value()[source]

Evaluate the variable.

set_value(v)[source]

Assign the variable a new value.

setup_graph()[source]

Will set up the assign operation for that variable.

class tensorpack.callbacks.ObjAttrParam(obj, attrname, readable_name=None)[source]

Bases: tensorpack.callbacks.param.HyperParam

An attribute of an object can be a hyperparam.

__init__(obj, attrname, readable_name=None)[source]
Parameters:
  • obj – the object

  • attrname (str) – the attribute

  • readable_name (str) – The name to display and set with. Defaults to attrname.

get_value(v)[source]

Get the value of the param.

set_value(v)[source]

Set the value of the param.

Parameters:v – the value to be set

class tensorpack.callbacks.HyperParamSetter(param)[source]

Bases: tensorpack.callbacks.base.Callback

An abstract base callback to set hyperparameters.

Once the trigger() method is called, the method _get_value_to_set() will be used to get a new value for the hyperparameter.

__init__(param)[source]
Parameters:param (HyperParam or str) – if it is a str, it is assumed to be a GraphVarParam.

get_current_value()[source]
Returns:The current value of the param.

get_value_to_set()[source]
Returns:The value to assign to the variable.

Note

Subclasses will implement the abstract method _get_value_to_set(), which should return a new value to set, or return None to do nothing.

class tensorpack.callbacks.HumanHyperParamSetter(param, file_name='hyper.txt')[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set a hyperparameter by loading the value from a file each time it gets called. This is useful for manually tuning some parameters (e.g. learning_rate) without interrupting the training.

__init__(param, file_name='hyper.txt')[source]
Parameters:
  • param – same as in HyperParamSetter.

  • file_name (str) – a file containing the new value of the parameter. Each line in the file is a k:v pair, for example, learning_rate:1e-4. If the pair is not found, the param will not be changed.
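The k:v file format described above could be parsed as follows. read_param is a hypothetical helper written for illustration; converting the value with float() is an assumption that suits numeric parameters such as learning_rate.

```python
def read_param(text, name):
    """Return the value for `name` from lines of "key:value", or None if absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        if key.strip() == name:
            return float(value.strip())
    return None  # pair not found: the param is left unchanged


contents = "learning_rate:1e-4\nmomentum:0.9\n"
print(read_param(contents, "learning_rate"))  # 0.0001
print(read_param(contents, "weight_decay"))   # None
```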

class tensorpack.callbacks.ScheduledHyperParamSetter(param, schedule, interp=None, step_based=False)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set hyperparameters by a predefined epoch-based schedule.

__init__(param, schedule, interp=None, step_based=False)[source]
Parameters:
  • param – same as in HyperParamSetter.

  • schedule (list) – with the format [(epoch1, val1), (epoch2, val2), (epoch3, val3)]. Each (ep, val) pair means to set the param to “val” after the completion of epoch ep. If ep == 0, the value will be set before the first epoch (because by default the first is epoch 1). The epoch numbers have to be increasing.

  • interp (str or None) – Either None or ‘linear’. If None, the parameter will only be set when the specific epoch or step is reached exactly. If ‘linear’, perform linear interpolation (but no extrapolation) every time this callback is triggered.

  • step_based (bool) – interpret schedule as (step, value) instead of (epoch, value).
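The schedule semantics (exact match, or linear interpolation between two scheduled points with no extrapolation) can be sketched as a pure function. value_to_set is hypothetical; ref stands for the current epoch number, or the global step when step_based=True.

```python
def value_to_set(schedule, ref, interp=None):
    """Return the scheduled value at `ref`, or None if nothing should change.

    schedule: list of (point, value) pairs with increasing points.
    interp:   None or 'linear'.
    """
    prev = None
    for point, value in schedule:
        if point == ref:
            return value                 # exact boundary reached
        if point > ref:
            if prev is None or interp != "linear":
                return None              # before the first point, or no interp
            p0, v0 = prev
            return v0 + (ref - p0) / (point - p0) * (value - v0)
        prev = (point, value)
    return None                          # past the last point: no extrapolation


schedule = [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]
print(value_to_set(schedule, 60))                   # 0.001
print(value_to_set(schedule, 45))                   # None
print(value_to_set(schedule, 45, interp="linear"))  # approximately 0.0055
```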

Example

ScheduledHyperParamSetter('learning_rate',
                          [(30, 1e-2), (60, 1e-3), (85, 1e-4), (95, 1e-5)]),

class tensorpack.callbacks.StatMonitorParamSetter(param, stat_name, value_func, threshold, last_k, reverse=False)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Change the param by monitoring the change of a scalar statistic. The param will be changed when the scalar does not decrease/increase enough.

Once triggered, this callback observes the latest value of stat_name from the monitor backend.

This callback will then change a hyperparameter param by new_value = value_func(old_value), if: min(history) >= history[0] - threshold, where history = [the most recent k observations of stat_name]

Note

The statistics of interest must be created at a frequency higher than or equal to this callback. For example, using PeriodicTrigger(StatMonitorParamSetter(...), every_k_steps=100) is meaningless if the statistics to be monitored is only updated every 500 steps.

Callbacks are executed in order. Therefore, if the statistics to be monitored is created after this callback, the behavior of this callback may get delayed.

Example

If the validation error hasn’t decreased for 5 epochs, multiply the learning rate by 0.2:

StatMonitorParamSetter('learning_rate', 'val-error',
                        lambda x: x * 0.2, threshold=0, last_k=5)

__init__(param, stat_name, value_func, threshold, last_k, reverse=False)[source]
Parameters:
  • param – same as in HyperParamSetter.

  • stat_name (str) – name of the statistics.

  • value_func (float -> float) – a function that takes the old value and returns a new value.

  • threshold (float) – change threshold.

  • last_k (int) – use last k observations of statistics.

  • reverse (bool) – monitor increasing instead of decreasing. If True, param will be changed when max(history) <= history[0] + threshold.
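The change condition can be written as a small predicate over the recent history. This sketch only mirrors the formulas stated above; should_change and decay are hypothetical, and how many observations tensorpack actually keeps in the history is not modeled here.

```python
def should_change(history, threshold, reverse=False):
    """history: recent observations of the stat, oldest first."""
    if not history:
        return False
    if reverse:
        # Monitoring an increasing stat: change if it did not rise enough.
        return max(history) <= history[0] + threshold
    # Monitoring a decreasing stat: change if it did not drop enough.
    return min(history) >= history[0] - threshold


def decay(lr):
    return lr * 0.2   # the value_func from the example above


lr = 0.1
val_error = [0.30, 0.31, 0.30, 0.32, 0.30]   # not decreasing
if should_change(val_error, threshold=0):
    lr = decay(lr)
print(lr)  # roughly 0.02
```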

class tensorpack.callbacks.HyperParamSetterWithFunc(param, func)[source]

Bases: tensorpack.callbacks.param.HyperParamSetter

Set the parameter by a function of epoch num and old value.

__init__(param, func)[source]
Parameters:
  • param – same as in HyperParamSetter.

  • func – param will be set by new_value = func(epoch_num, old_value). epoch_num is the number of epochs that have finished.

Example

Decrease by a factor of 0.9 every two epochs:

HyperParamSetterWithFunc('learning_rate',
                         lambda e, x: x * 0.9 if e % 2 == 0 else x)

class tensorpack.callbacks.GPUUtilizationTracker(devices=None)[source]

Bases: tensorpack.callbacks.base.Callback

Summarize the average GPU utilization within an epoch.

It will start a process to run nvidia-smi every second within the epoch (the time spent in trigger_epoch is not included), and write the average utilization to monitors.

This callback creates a process, therefore it’s not safe to be used with MPI.

__init__(devices=None)[source]
Parameters:devices (list[int]) – physical GPU ids. If None, will use CUDA_VISIBLE_DEVICES.

class tensorpack.callbacks.GraphProfiler(dump_metadata=False, dump_tracing=True, dump_event=False)[source]

Bases: tensorpack.callbacks.base.Callback

Enable profiling by installing session hooks, and write tracing files / events / metadata to logger.get_logger_dir().

The tracing files can be loaded from chrome://tracing. The metadata files can be processed by tfprof command line utils. The event is viewable from tensorboard.

Tips:

Note that the profiling is by default enabled for every step and is expensive. You probably want to schedule it less frequently, e.g.:

EnableCallbackIf(
    GraphProfiler(dump_tracing=True, dump_event=True),
    lambda self: self.trainer.global_step > 20 and self.trainer.global_step < 30)

__init__(dump_metadata=False, dump_tracing=True, dump_event=False)[source]
Parameters:
  • dump_metadata (bool) – Dump tf.RunMetadata to be used with tfprof.

  • dump_tracing (bool) – Dump chrome tracing files.

  • dump_event (bool) – Dump to an event processed by FileWriter, which will be shown in TensorBoard.

class tensorpack.callbacks.PeakMemoryTracker(devices=[0])[source]

Bases: tensorpack.callbacks.base.Callback

Track peak memory used on each GPU device every epoch, by tf.contrib.memory_stats. The peak memory comes from the MaxBytesInUse op, which might span multiple session.run calls. See https://github.com/tensorflow/tensorflow/pull/13107.

__init__(devices=[0])[source]
Parameters:devices ([int] or [str]) – list of GPU devices to track memory on.

class tensorpack.callbacks.ModelSaver(max_to_keep=10, keep_checkpoint_every_n_hours=0.5, checkpoint_dir=None, var_collections=['variables'])[source]

Bases: tensorpack.callbacks.base.Callback

Save the model once triggered.

__init__(max_to_keep=10, keep_checkpoint_every_n_hours=0.5, checkpoint_dir=None, var_collections=['variables'])[source]
Parameters:
  • max_to_keep (int) – the same as in tf.train.Saver.

  • keep_checkpoint_every_n_hours (float) – the same as in tf.train.Saver. Note that “keep” does not mean “create”, but means “don’t delete”.

  • checkpoint_dir (str) – Defaults to logger.get_logger_dir().

  • var_collections (str or list of str) – collection of the variables (or list of collections) to save.

class tensorpack.callbacks.MinSaver(monitor_stat, reverse=False, filename=None, checkpoint_dir=None)[source]

Bases: tensorpack.callbacks.base.Callback

Separately save the model with the minimum value of some statistic.

__init__(monitor_stat, reverse=False, filename=None, checkpoint_dir=None)[source]
Parameters:
  • monitor_stat (str) – the name of the statistics.

  • reverse (bool) – if True, will save the maximum.

  • filename (str) – the name for the saved model. Defaults to min-{monitor_stat}.tfmodel.

  • checkpoint_dir (str) – the directory containing checkpoints.

Example

Save the model with minimum validation error to “min-val-error.tfmodel”:

MinSaver('val-error')

Note

  1. It assumes that ModelSaver is used with the same checkpoint_dir and appears earlier in the callback list. The default for both ModelSaver and MinSaver is checkpoint_dir=logger.get_logger_dir()

  2. Callbacks are executed in the order they are defined. Therefore you’d want to use this callback after the callback (e.g. InferenceRunner) that produces the statistics.
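The bookkeeping behind MinSaver/MaxSaver amounts to remembering the best value seen so far. This is a toy sketch: BestTracker is hypothetical, and at the point where the real callback would save a separate copy of the model, this sketch only returns True.

```python
class BestTracker:
    """Decide when a new value of a statistic is the best so far."""

    def __init__(self, reverse=False):
        self.reverse = reverse   # False: minimum is best; True: maximum
        self.best = None

    def update(self, value):
        """Return True if `value` improves on the best seen so far."""
        improved = (
            self.best is None
            or (value > self.best if self.reverse else value < self.best)
        )
        if improved:
            self.best = value    # a real saver would save the model here
        return improved


tracker = BestTracker()
print([tracker.update(v) for v in [0.30, 0.25, 0.27, 0.20]])
# [True, True, False, True]
```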

class tensorpack.callbacks.MaxSaver(monitor_stat, filename=None, checkpoint_dir=None)[source]

Bases: tensorpack.callbacks.saver.MinSaver

Separately save the model with the maximum value of some statistic.

See docs of MinSaver for details.

__init__(monitor_stat, filename=None, checkpoint_dir=None)[source]
Parameters:
  • monitor_stat (str) – the name of the statistics.

  • filename (str) – the name for the saved model. Defaults to max-{monitor_stat}.tfmodel.

class tensorpack.callbacks.TensorPrinter(names)[source]

Bases: tensorpack.callbacks.base.Callback

Prints the value of some tensors in each step. It’s an example of how before_run/after_run work.

__init__(names)[source]
Parameters:names (list) – list of strings, the names of the tensors to print.
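The before_run/after_run pattern that TensorPrinter demonstrates can be illustrated with a toy, TensorFlow-free model. Everything here is hypothetical: ToySession stands in for a session, ToyHookedSession for hooked_sess, and plain name lists replace tf.train.SessionRunArgs and run_context. The idea it shows is the one described above: each hook registers extra fetches before a run and receives exactly those values afterwards.

```python
# Toy model of the before_run/after_run protocol (not TensorFlow's API).
class ToySession:
    def __init__(self, tensors):
        self.tensors = tensors  # name -> value, standing in for graph tensors

    def run(self, fetches):
        return [self.tensors[name] for name in fetches]


class ToyHookedSession:
    def __init__(self, sess, hooks):
        self.sess = sess
        self.hooks = hooks

    def run(self, fetches):
        # 1. every hook registers the extra tensors it wants in this call
        extra = [hook.before_run() for hook in self.hooks]
        all_fetches = list(fetches) + [n for names in extra for n in names]
        values = self.sess.run(all_fetches)
        # 2. split the results and give each hook the values it asked for
        results, offset = values[:len(fetches)], len(fetches)
        for hook, names in zip(self.hooks, extra):
            hook.after_run(values[offset:offset + len(names)])
            offset += len(names)
        return results


class ToyTensorPrinter:
    """Mimics the behavior described for TensorPrinter."""
    def __init__(self, names):
        self.names = names
        self.printed = []

    def before_run(self):
        return self.names

    def after_run(self, values):
        for name, value in zip(self.names, values):
            self.printed.append((name, value))
            print("{}: {}".format(name, value))
```

The caller only asks for its own fetches (e.g. a train op); the extra tensors ride along in the same run call, so no additional session call is needed to print them.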
class tensorpack.callbacks.ProgressBar(names=[])[source]

Bases: tensorpack.callbacks.base.Callback

A progress bar based on tqdm. Enabled by default.

__init__(names=[])[source]
Parameters:names (list) – list of strings, the names of the tensors to monitor on the progress bar.
class tensorpack.callbacks.SessionRunTimeout(timeout_in_ms)[source]

Bases: tensorpack.callbacks.base.Callback

Add a timeout option to each sess.run call.

__init__(timeout_in_ms)[source]
Parameters:timeout_in_ms (int) – the timeout, in milliseconds, applied to each sess.run call.
class tensorpack.callbacks.MovingAverageSummary(collection='MOVING_SUMMARY_OPS')[source]

Bases: tensorpack.callbacks.base.Callback

This callback is enabled by default. It maintains the moving averages of summarized tensors in every step, by running the ops added to the collection. Note that it only maintains the moving averages in the graph; the actual summary should be done by other callbacks.

__init__(collection='MOVING_SUMMARY_OPS')[source]
Parameters:collection (str) – the collection of EMA-maintaining ops. The default value would work with the tensors you added by tfutils.summary.add_moving_summary(), but you can use other collections as well.
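Each EMA-maintaining op performs an exponential-moving-average update of the form average = decay * average + (1 - decay) * value. The pure-Python sketch below shows that update rule only. The function name ema_updates and the default decay of 0.95 are assumptions for illustration, and the sketch starts from the first value, whereas tf.train.ExponentialMovingAverage initializes its shadow values differently.

```python
# Illustration of the exponential-moving-average update rule.
# ema_updates and its default decay are hypothetical choices for this sketch.
def ema_updates(values, decay=0.95):
    """Return the moving average after each step."""
    averages = []
    shadow = None
    for x in values:
        # the first value initializes the average; afterwards, blend old and new
        shadow = x if shadow is None else decay * shadow + (1 - decay) * x
        averages.append(shadow)
    return averages


print(ema_updates([0.0, 1.0, 1.0], decay=0.5))
```

A decay close to 1 makes the average change slowly, which smooths noisy per-step values such as the training loss before they are summarized.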
tensorpack.callbacks.MergeAllSummaries(period=0, run_alone=False, key='summaries')[source]

This callback is enabled by default. Evaluate all summaries merged by tf.summary.merge_all, and write them to the logs.

Parameters:
  • period (int) – by default the callback summarizes once every epoch. This option (if not set to 0) makes it additionally summarize every period steps.

  • run_alone (bool) – whether to evaluate the summaries alone. If True, summaries will be evaluated after each epoch alone. If False, summaries will be evaluated together with the sess.run calls, in the last step of each epoch. For SimpleTrainer, it needs to be False because the summaries may depend on the inputs.

  • key (str) – the collection of summary tensors. Same as in tf.summary.merge_all. Default is tf.GraphKeys.SUMMARIES.
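The interaction between the per-epoch summary and the optional period can be sketched as a schedule. This is a hypothetical model (summary_steps is an invented name), assuming the periodic check is global_step % period == 0 evaluated after each step; it only lists the global steps after which summaries would be written.

```python
# Hypothetical schedule of summary writes: once per epoch (its last step),
# plus every `period` global steps when period > 0.
def summary_steps(steps_per_epoch, max_epoch, period=0):
    """Return the global steps after which summaries would be written."""
    written = []
    global_step = 0
    for _ in range(max_epoch):
        for local_step in range(steps_per_epoch):
            global_step += 1
            end_of_epoch = local_step == steps_per_epoch - 1
            periodic = period > 0 and global_step % period == 0
            if end_of_epoch or periodic:
                written.append(global_step)
    return written


print(summary_steps(5, 2))            # per-epoch summaries only
print(summary_steps(5, 2, period=3))  # plus every 3 steps
```

Setting period does not replace the per-epoch summary; it adds extra summary points between epochs.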

class tensorpack.callbacks.SimpleMovingAverage(tensors, window_size)[source]

Bases: tensorpack.callbacks.base.Callback

Monitor the simple moving average (SMA), i.e. the average within a sliding window, of some tensors.

__init__(tensors, window_size)[source]
Parameters:
  • tensors (str or [str]) – the name(s) of the tensors to monitor

  • window_size (int) – size of the moving window
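A simple moving average over a fixed-size window can be kept with a bounded deque, one per monitored tensor. The sketch below is a hypothetical stand-in (ToySMA is not part of tensorpack) that takes the per-step tensor values as a dict; it mirrors the parameters above by accepting either one name or a list of names.

```python
from collections import deque


class ToySMA:
    """Hypothetical sliding-window average for several named values."""

    def __init__(self, tensors, window_size):
        if isinstance(tensors, str):  # accept a single name, like the callback
            tensors = [tensors]
        # deque(maxlen=...) drops the oldest value automatically
        self.windows = {name: deque(maxlen=window_size) for name in tensors}

    def update(self, values):
        """values: dict of name -> value for this step; returns current SMAs."""
        sma = {}
        for name, window in self.windows.items():
            window.append(values[name])
            sma[name] = sum(window) / len(window)
        return sma


s = ToySMA("loss", window_size=3)
for v in [3.0, 6.0, 9.0, 12.0]:
    print(s.update({"loss": v}))
```

Unlike an exponential moving average, every value inside the window has equal weight, and values older than window_size steps have no influence at all.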

class tensorpack.callbacks.PeriodicTrigger(triggerable, every_k_steps=None, every_k_epochs=None, before_train=False)[source]

Bases: tensorpack.callbacks.base.ProxyCallback

Trigger a callback every k global steps or every k epochs by its trigger() method.

Most existing callbacks which do something every epoch are implemented with the trigger() method. By default the trigger() method will be called every epoch. This wrapper can make the callback run at a different frequency.

All other methods (before/after_run, trigger_step, etc) of the given callback are unaffected. They will still be called as-is.

__init__(triggerable, every_k_steps=None, every_k_epochs=None, before_train=False)[source]
Parameters:
  • triggerable (Callback) – a Callback instance with a trigger method to be called.

  • every_k_steps (int) – trigger when global_step % k == 0. Set to None to ignore.

  • every_k_epochs (int) – trigger when epoch_num % k == 0. Set to None to ignore.

  • before_train (bool) – trigger in the before_train() method.

every_k_steps and every_k_epochs can both be set, but cannot both be None unless before_train is True.
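The conditions above can be simulated to see when the wrapped trigger() would fire. This is a hypothetical sketch (trigger_schedule is an invented name): it numbers epochs from 1, checks the step condition after each step and the epoch condition at the end of each epoch, and returns the resulting events instead of calling any callback. In tensorpack itself the wrapper is used as, for instance, PeriodicTrigger(ModelSaver(), every_k_epochs=3).

```python
# Hypothetical simulation of when PeriodicTrigger would call trigger().
def trigger_schedule(steps_per_epoch, max_epoch, every_k_steps=None,
                     every_k_epochs=None, before_train=False):
    """Return (event, global_step, epoch_num) tuples for each trigger() call."""
    if every_k_steps is None and every_k_epochs is None and not before_train:
        raise ValueError("every_k_steps and every_k_epochs cannot both be None")
    calls = []
    global_step = 0
    if before_train:
        calls.append(("before_train", global_step, 0))
    for epoch_num in range(1, max_epoch + 1):  # epochs numbered from 1 here
        for _ in range(steps_per_epoch):
            global_step += 1
            if every_k_steps is not None and global_step % every_k_steps == 0:
                calls.append(("step", global_step, epoch_num))
        if every_k_epochs is not None and epoch_num % every_k_epochs == 0:
            calls.append(("epoch", global_step, epoch_num))
    return calls


print(trigger_schedule(10, 4, every_k_epochs=2))
print(trigger_schedule(10, 1, every_k_steps=4))
```

Because every_k_steps counts global steps, it can fire several times within one epoch, which is how PeriodicTrigger makes an epoch-level callback run more often.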

class tensorpack.callbacks.PeriodicCallback(callback, every_k_steps=None, every_k_epochs=None)[source]

Bases: tensorpack.callbacks.trigger.EnableCallbackIf

The {before,after}_epoch, {before,after}_run, trigger_{epoch,step} methods of the given callback will be enabled only when global_step % every_k_steps == 0 or epoch_num % every_k_epochs == 0. The other methods are unaffected.

Note that this can only make a callback less frequent than itself. If you have a callback that by default runs every epoch by its trigger() method, use PeriodicTrigger to run it more frequently.

__init__(callback, every_k_steps=None, every_k_epochs=None)[source]
Parameters:
  • callback (Callback) – a Callback instance.

  • every_k_steps (int) – enable the callback when global_step % k == 0. Set to None to ignore.

  • every_k_epochs (int) – enable the callback when epoch_num % k == 0. Also enable when the last step finishes (epoch_num == max_epoch and local_step == steps_per_epoch - 1). Set to None to ignore.

every_k_steps and every_k_epochs can both be set, but cannot both be None.
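The enabling condition described in the parameters, including the extra "last step" rule for every_k_epochs, can be written as a single predicate. The function name and its explicit arguments are hypothetical; in the real class the values come from the trainer and the predicate is applied through EnableCallbackIf.

```python
# Hypothetical predicate mirroring the documented PeriodicCallback conditions.
def periodic_callback_enabled(global_step, epoch_num, local_step,
                              max_epoch, steps_per_epoch,
                              every_k_steps=None, every_k_epochs=None):
    if every_k_steps is None and every_k_epochs is None:
        raise ValueError("every_k_steps and every_k_epochs cannot both be None")
    if every_k_steps is not None and global_step % every_k_steps == 0:
        return True
    if every_k_epochs is not None:
        if epoch_num % every_k_epochs == 0:
            return True
        # also enabled on the very last step of training
        if epoch_num == max_epoch and local_step == steps_per_epoch - 1:
            return True
    return False


# With every_k_epochs=2 and max_epoch=5, epoch 5 is odd, but its last
# step still enables the callback.
print(periodic_callback_enabled(50, 5, 9, max_epoch=5, steps_per_epoch=10,
                                every_k_epochs=2))
```

The last-step rule ensures that an epoch-based callback still gets a final chance to run when max_epoch is not a multiple of every_k_epochs.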

class tensorpack.callbacks.EnableCallbackIf(callback, pred)[source]

Bases: tensorpack.callbacks.base.ProxyCallback

Disable the {before,after}_epoch, {before,after}_run, trigger_{epoch,step} methods of a callback, unless some condition is satisfied. The other methods are unaffected.

A more accurate name for this callback would be “DisableCallbackUnless”, but that’s too ugly.

Note

If you use {before,after}_run, pred will be evaluated only in before_run.

__init__(callback, pred)[source]
Parameters:
  • callback (Callback) – the callback to wrap.

  • pred (self -> bool) – a callable predicate. Has to be a pure function. The callback is disabled unless this predicate returns True.
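The gating described above can be modeled as a small proxy. This is a hypothetical, TensorFlow-free sketch (ToyEnableIf and Recorder are invented names): gated methods are forwarded only when pred(self) returns True, other methods such as before_train pass through unchanged, and, as the note says, pred is evaluated only in before_run, with after_run following that decision.

```python
# Hypothetical proxy mimicking the documented EnableCallbackIf behavior.
class ToyEnableIf:
    GATED = {"before_epoch", "after_epoch", "trigger_epoch", "trigger_step"}

    def __init__(self, callback, pred):
        self.cb = callback
        self.pred = pred  # called with the proxy itself, like pred(self)
        self._enabled_this_run = False

    def __getattr__(self, name):
        # only reached for attributes not defined on the proxy itself
        method = getattr(self.cb, name)
        if name in self.GATED:
            def gated(*args, **kwargs):
                if self.pred(self):
                    return method(*args, **kwargs)
                return None
            return gated
        return method  # e.g. before_train: unaffected

    def before_run(self, ctx):
        # pred is evaluated only here; after_run reuses the decision
        self._enabled_this_run = bool(self.pred(self))
        return self.cb.before_run(ctx) if self._enabled_this_run else None

    def after_run(self, ctx, values):
        if self._enabled_this_run:
            self.cb.after_run(ctx, values)


class Recorder:
    """A stand-in callback that records which of its methods were called."""
    def __init__(self):
        self.calls = []

    def before_train(self):
        self.calls.append("before_train")

    def trigger_step(self):
        self.calls.append("trigger_step")

    def before_run(self, ctx):
        self.calls.append("before_run")

    def after_run(self, ctx, values):
        self.calls.append("after_run")
```

For example, with pred=lambda self: self.global_step % 2 == 0, trigger_step on the wrapped Recorder only runs on even steps, while before_train always runs.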