tensorpack.input_source package

Read the relevant tutorials first for an overview of InputSource: Input Pipeline.

class tensorpack.input_source.PlaceholderInput[source]

Bases: tensorpack.input_source.input_source_base.InputSource

Just produces placeholders as input tensors.

class tensorpack.input_source.FeedInput(ds, infinite=True)[source]

Bases: tensorpack.input_source.input_source_base.InputSource

Input by iterating over a DataFlow and feeding the datapoints.

Note

If get_input_tensors() is called more than once, it will return the same placeholders (i.e. feed points) as the first call. Therefore you cannot use it for data-parallel training.

__init__(ds, infinite=True)[source]
Parameters
  • ds (DataFlow) – the input DataFlow.

  • infinite (bool) – When set to False, will raise StopIteration when ds is exhausted.
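
Example (a minimal sketch; the DataFlow ds and the TrainConfig wiring are illustrative assumptions, not part of this class):

from tensorpack.input_source import FeedInput

# ds is assumed to be any DataFlow yielding datapoints matching the model inputs
input_source = FeedInput(ds, infinite=True)
# typically handed to the trainer, e.g. TrainConfig(data=input_source, ...)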

class tensorpack.input_source.FeedfreeInput[source]

Bases: tensorpack.input_source.input_source_base.InputSource

Abstract base class for inputs that require no feeding, e.g. from a queue or other operations.

class tensorpack.input_source.QueueInput(ds, queue=None)[source]

Bases: tensorpack.input_source.input_source.FeedfreeInput

Enqueue datapoints from a DataFlow into a TF queue; the model receives the dequeued tensors.

__init__(ds, queue=None)[source]
Parameters
  • ds (DataFlow) – the input DataFlow.

  • queue (tf.QueueBase) – A tf.QueueBase whose type should match the corresponding input signature of the model. Defaults to a FIFO queue of size 50.

refill_queue()[source]

Clear the queue, then call dataflow.__iter__() again and refill the queue.
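
Example (a sketch; ds is an assumed DataFlow, and the custom queue is optional):

import tensorflow as tf
from tensorpack.input_source import QueueInput

input_source = QueueInput(ds)        # uses the default FIFO queue of size 50
# or supply a queue whose dtypes match the model's input signature:
q = tf.FIFOQueue(200, [tf.float32, tf.int32])
input_source = QueueInput(ds, queue=q)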

class tensorpack.input_source.BatchQueueInput(ds, batch_size, queue=None)[source]

Bases: tensorpack.input_source.input_source.QueueInput

Enqueue datapoints from a DataFlow into a TF queue; the model receives batches formed by concatenating the dequeued tensors.

__init__(ds, batch_size, queue=None)[source]
Parameters
  • ds (DataFlow) – the input DataFlow.

  • batch_size (int) – the batch size.

  • queue (tf.QueueBase) – A tf.QueueBase whose type should match the corresponding input signature of the model. Defaults to a FIFO queue of size 3000.
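
Example (a sketch, assuming ds yields single unbatched datapoints with fully-specified shapes):

from tensorpack.input_source import BatchQueueInput

# dequeues and concatenates 64 datapoints into one batch per step
input_source = BatchQueueInput(ds, batch_size=64)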

class tensorpack.input_source.DummyConstantInput(shapes)[source]

Bases: tensorpack.input_source.input_source.TensorInput

Input with a constant zero tensor placed on GPU. Useful for debugging performance issues.

__init__(shapes)[source]
Parameters

shapes (list[list]) – a list of fully-specified shapes.
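
Example (a sketch for benchmarking; the shapes below, including the batch dimension, are illustrative):

from tensorpack.input_source import DummyConstantInput

# produces constant zero tensors of these fully-specified shapes, e.g. [image, label]
input_source = DummyConstantInput([[64, 224, 224, 3], [64]])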

class tensorpack.input_source.TensorInput(get_tensor_fn, size=None)[source]

Bases: tensorpack.input_source.input_source.FeedfreeInput

Use inputs from a list of tensors, e.g. a TF data reading pipeline. The PTB training example shows how to use it.

__init__(get_tensor_fn, size=None)[source]
Parameters
  • get_tensor_fn (-> [tf.Tensor]) – a function which returns a list of input tensors (for example, [image, label]) when called. It will be called under a TowerContext and should return the inputs to be used in that tower. The returned tensors will be evaluated in every iteration; it is your job to make sure that is feasible.

  • size (int) – size of this input. Use None to leave it undefined.
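
Example (a sketch using random tensors; the shapes and the epoch size are illustrative assumptions):

import tensorflow as tf
from tensorpack.input_source import TensorInput

def get_tensors():
    # called under a TowerContext; returns the input tensors for that tower
    image = tf.random_uniform([64, 224, 224, 3])
    label = tf.random_uniform([64], maxval=1000, dtype=tf.int32)
    return [image, label]

input_source = TensorInput(get_tensors, size=1000)   # size (steps per epoch) is optional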

class tensorpack.input_source.ZMQInput(end_point, hwm, bind=True)[source]

Bases: tensorpack.input_source.input_source.TensorInput

Receive tensors from a ZMQ endpoint, with ops from https://github.com/tensorpack/zmq_ops. It works with dataflow.remote.send_dataflow_zmq(format='zmq_ops').

__init__(end_point, hwm, bind=True)[source]
Parameters
  • end_point (str) – the ZMQ endpoint

  • hwm (int) – the ZMQ high-water-mark

  • bind (bool) – whether to bind the socket to the endpoint (rather than connect to it)

to_dataset(input_signature)[source]

Convert to a TF dataset.

Parameters

input_signature (list[InputSpec]) –

Returns

tf.data.Dataset
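
Example (a sketch of both ends; the addresses, port and hwm are illustrative assumptions):

# sender process: serialize an assumed DataFlow df with the zmq_ops format
from tensorpack.dataflow.remote import send_dataflow_zmq
send_dataflow_zmq(df, 'tcp://training-host:8877', format='zmq_ops')

# training process: receive the tensors
from tensorpack.input_source import ZMQInput
input_source = ZMQInput('tcp://0.0.0.0:8877', hwm=50, bind=True)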

class tensorpack.input_source.TFDatasetInput(dataset)[source]

Bases: tensorpack.input_source.input_source.FeedfreeInput

Use a tf.data.Dataset instance as input.

Note

  1. In training, the given dataset or dataflow has to be infinite (you can use repeat(), or RepeatedData).

  2. TensorFlow may keep the dataflow alive even if the dataset is no longer used.

__init__(dataset)[source]
Parameters

dataset (tf.data.Dataset or DataFlow) –

static dataflow_to_dataset(df, types)[source]

Wrap a dataflow into a tf.data.Dataset. This function will also reset the dataflow.

If the dataflow itself is finite, the returned dataset is also finite. Therefore, if used for training, you’ll need to add .repeat() on the returned dataset.

Parameters
  • df (DataFlow) – a dataflow which produces lists

  • types ([tf.DType]) – list of types

Returns

(tf.data.Dataset)

Note

TensorFlow may keep the dataflow alive even if the dataset is no longer used.
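
Example (a sketch that wraps an assumed finite DataFlow df and makes it infinite for training):

import tensorflow as tf
from tensorpack.input_source import TFDatasetInput

dataset = TFDatasetInput.dataflow_to_dataset(df, [tf.float32, tf.int32])
dataset = dataset.repeat()             # training requires an infinite dataset
input_source = TFDatasetInput(dataset)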

class tensorpack.input_source.StagingInput(input, nr_stage=1, device=None)[source]

Bases: tensorpack.input_source.input_source.FeedfreeInput

A wrapper around a feedfree input, to prefetch the input in StagingArea (on GPUs).

It works by registering hooks to put & get tensors into and out of the StagingArea. If get_input_tensors gets called multiple times, it requires that all outputs ever produced by this InputSource be fetched together.

This means that in multi-GPU training, you should ensure that each call to hooked_sess.run depends on either all input tensors on all GPUs, or no input tensors at all. As a result you cannot use this InputSource for InferenceRunner.

Multiple StagingInput instances cannot be used together.

class StagingCallback(input, nr_stage)[source]

Bases: tensorpack.callbacks.base.Callback

A callback registered by this input source, to make sure stage/unstage is run at each step.

__init__(input, nr_stage=1, device=None)[source]
Parameters
  • input (FeedfreeInput) –

  • nr_stage (int) – number of elements to prefetch into each StagingArea, at the beginning. Since enqueue and dequeue are synchronized, prefetching 1 element should be sufficient.

  • device (str or None) – if not None, place the StagingArea on a specific device, e.g. '/cpu:0'. Otherwise, they are placed where get_input_tensors gets called, which could be unspecified in case of simple trainers.
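
Example (a sketch; ds is an assumed DataFlow, and QueueInput is just one possible feedfree input to wrap):

from tensorpack.input_source import QueueInput, StagingInput

# prefetch the queued input into a StagingArea on each GPU
input_source = StagingInput(QueueInput(ds), nr_stage=1)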

class tensorpack.input_source.InputSource[source]

Bases: object

Base class for the abstract InputSource.

cached_name_scope()[source]

Yield a context under a cached name scope, whose name is the name of this InputSource class.

get_callbacks()[source]

An InputSource might need some extra maintenance during training, which is also done through the Callback interface. This method returns the callbacks; the return value will be memoized.

All callbacks will be automatically marked as chief_only=False, so they will run on all nodes.

Callbacks returned by InputSource only support a subset of a callback's functionality:

  1. It cannot access the trainer, because an InputSource can be used in pure inference.

  2. It cannot use the following methods: trigger_{step,epoch}, {before,after}_epoch.

In other words, these callbacks should only have the basic functionality of tf.train.SessionRunHook.

Returns

list[Callback] – extra callbacks needed by this InputSource.

get_input_tensors()[source]
Returns

list[Tensor]

A list of tensors corresponding to the inputs of the model.

Will be used as input for the tower function. This method should always create and return new tensors when called, unless it returns placeholders.

reset_state()[source]

Initialize/reinitialize this InputSource. Must be called under a default session.

For training, it will get called once by the trainer in before_train callbacks. For inference, the InferenceRunner will call this method each time it is triggered.

setup(input_signature)[source]
Parameters

input_signature (list[tf.TensorSpec]) – list of specs for each input tensor

Returns

list[Callback] – extra callbacks needed by this InputSource. callbacks of InputSource cannot use any trigger*() method.

setup_done()[source]
Returns

bool – whether setup() has been called.

size()[source]
Returns

int – epoch size of the InputSource
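
Example (a rough sketch of the lifecycle, which is normally driven by the trainer; ds and the signature below are illustrative assumptions):

import tensorflow as tf
from tensorpack.input_source import QueueInput

input_source = QueueInput(ds)
input_source.setup([tf.TensorSpec((None, 28, 28), tf.float32, 'image'),
                    tf.TensorSpec((None,), tf.int32, 'label')])
tensors = input_source.get_input_tensors()   # inputs for the tower function
callbacks = input_source.get_callbacks()     # register these with the training loop
# later, under a default session:
# input_source.reset_state()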

tensorpack.input_source.remap_input_source(input, names)[source]

When you have some InputSource which doesn’t match the inputs of your tower function, use RemapInputSource. It produces placeholders for all the inputs in your model, except that the corresponding ones are replaced with the tensor produced by the given InputSource.

Example:

input1 = QueueInput(ds)
# assume ds produces data that should be fed to 'image' and 'label',
# but the graph takes more inputs for some reasons, or takes inputs
# of a different order, for example like the following:

# input_signature = [tf.TensorSpec((None,10), tf.float32, 'score'),
#                    tf.TensorSpec((None,20,20,3), tf.float32, 'label'),
#                    tf.TensorSpec((None,), tf.int32, 'image') ]

input2 = remap_input_source(input1, ['image', 'label'])
# now, if input2 is used with the above input_signature, it will return a
# placeholder for 'score', plus the tensors returned by input1