For a concrete example of how to run the training script, refer to the Neural Machine Translation Tutorial.
Also see Configuration. The configuration of input data, models, and training parameters is done via YAML. You can pass YAML strings directly to the training script, or create configuration files and pass their paths to the script. The two approaches are technically equivalent, but large YAML strings quickly become difficult to manage, so we recommend using configuration files. For example, the following two are equivalent:
1. Pass FLAGS directly:

```
python -m bin.train \
  --model AttentionSeq2Seq \
  --model_params "
    embedding.dim: 256
    encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
    encoder.params:
      rnn_cell:
        cell_class: GRUCell"
```

2. Create a configuration file `config.yml`:

```yaml
model: AttentionSeq2Seq
model_params:
  embedding.dim: 256
  encoder.class: seq2seq.encoders.BidirectionalRNNEncoder
  encoder.params:
    rnn_cell:
      cell_class: GRUCell
```

... and pass FLAGS via the config file:

```
python -m bin.train --config_paths config.yml
```
Multiple configuration files are merged recursively, in the order they are passed. This means you can have separate configuration files for model hyperparameters, input data, and training options, and mix and match as needed.
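The merge behaves like a recursive dictionary update, with later files taking precedence on conflicting keys. A minimal sketch of that behavior (a hypothetical helper, not the library's actual implementation):

```python
def merge_configs(base, override):
    """Recursively merge `override` into `base`; later values win."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_configs(result[key], value)
        else:
            result[key] = value
    return result

# Parsed contents of two hypothetical configuration files:
model_yml = {"model": "AttentionSeq2Seq",
             "model_params": {"embedding.dim": 256}}
train_yml = {"model_params": {"optimizer.name": "Adam"},
             "batch_size": 32}

merged = merge_configs(model_yml, train_yml)
# merged["model_params"] now contains keys from both files,
# and "batch_size" comes from the second file.
```

With a layout like this you could keep model hyperparameters in one file and training options in another, and pass both as `--config_paths "model.yml,train.yml"`.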
In addition to the console output of the training script, TensorFlow writes summaries and training logs to the specified
output_dir. Use TensorBoard to visualize training progress.
Distributed training is supported out of the box using
tf.learn. Cluster configurations can be specified using the
TF_CONFIG environment variable, which is parsed by the
RunConfig. Refer to the Distributed TensorFlow Guide for more information.
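The TF_CONFIG value is a JSON document describing the cluster and the current task. The sketch below follows the conventional cluster/task layout; the host:port addresses are placeholders:

```python
import json
import os

# Hypothetical two-worker cluster with one parameter server;
# all addresses are placeholders.
tf_config = {
    "cluster": {
        "ps": ["ps0.example.com:2222"],
        "worker": ["worker0.example.com:2222",
                   "worker1.example.com:2222"],
    },
    # This process runs as the first worker.
    "task": {"type": "worker", "index": 0},
}

# The RunConfig reads the variable from the process environment.
os.environ["TF_CONFIG"] = json.dumps(tf_config)
```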
Training script Reference
The train.py script has many more options:

| Flag | Description |
|------|-------------|
| `config_paths` | Path to a YAML configuration file defining FLAG values. Multiple files can be separated by commas. Files are merged recursively. Setting a key in these files is equivalent to setting the FLAG value with the same name. |
| `hooks` | YAML configuration string for the training hooks to use. |
| `metrics` | YAML configuration string for the training metrics to use. |
| `model` | Name of the model class. Can be either a fully-qualified name, or the name of a class defined in `seq2seq.models`. |
| `model_params` | YAML configuration string for the model parameters. |
| `input_pipeline_train` | YAML configuration string for the training data input pipeline. |
| `input_pipeline_dev` | YAML configuration string for the development data input pipeline. |
| `buckets` | Buckets input sequences according to these lengths. A comma-separated list of sequence length buckets. |
| `batch_size` | Batch size used for training and evaluation. |
| `output_dir` | The directory to write model checkpoints and summaries to. If None, a local temporary directory is created. |
| `train_steps` | Maximum number of training steps to run. If None, train forever. |
| `eval_every_n_steps` | Run evaluation on validation data every N steps. |
| `tf_random_seed` | Random seed for TensorFlow initializers. Setting this value allows consistency between reruns. |
| `save_checkpoints_secs` | Save checkpoints every N seconds. Cannot be specified together with `save_checkpoints_steps`. |
| `save_checkpoints_steps` | Save checkpoints every N steps. Cannot be specified together with `save_checkpoints_secs`. |
| `keep_checkpoint_max` | Maximum number of recent checkpoint files to keep. As new files are created, older files are deleted. If None or 0, all checkpoint files are kept. |
| `keep_checkpoint_every_n_hours` | In addition to keeping the most recent checkpoint files, keep one checkpoint file for every N hours of training. |
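To illustrate how several of these flags combine, the snippet below assembles a hypothetical invocation; all file names and values are placeholders:

```python
import shlex

# Hypothetical training invocation; paths and values are placeholders.
cmd = [
    "python", "-m", "bin.train",
    "--config_paths", "model.yml,train.yml",
    "--batch_size", "32",
    "--train_steps", "100000",
    "--eval_every_n_steps", "1000",
    "--output_dir", "/tmp/seq2seq_model",
]

# Render the argument list as a single shell-safe command line.
print(" ".join(shlex.quote(part) for part in cmd))
```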