What to work on

We are always looking for contributors. If you are interested in contributing but are not sure to what work on, take a look at the open Github Issues that are unassigned. Those with the help wanted label are especially good candidates. If you are working on a larger task and unsure how to approach it, just leave a comment to get feedback on design decisions. We are also always looking for the following:

  • Fix issues with the documentation (typos, outdated docs, ...)
  • Improve code quality through refactoring, more tests, better docstrings, etc.
  • Implement standard benchmark model found in the literature
  • Running benchmarks on standard datasets

Development Setup

We recommend using Python 3. If you're on a Mac the easiest way to do this is probably using Homebrew. Then,

# Clone this repository.
git clone https://github.com/google/seq2seq.git
cd seq2seq

# Create a new virtual environment and activate it.
python3 -m venv ~/tf-venv
source ~/tf-venv/bin/activate

# Install package dependencies and utilities.
pip install -e .
pip install nose pylint tox yapf mkdocs

# Make sure the tests are passing.
nosetests

# Code :)

# Make sure the tests are passing
nosetests

# Before submitting a pull request,
# run the full test suite for Python 3 and Python 2.7
tox

Python Style

We use pylint to enforce coding style. Before submitting a pull request, make sure you run:

pylint seq2seq

CircleCI integration tests will fail if pylint reports any critica errors, preventing use from merging your changes. If you are unsure about code formatting, you can use yapf for automated code formatting:

yapf -ir ./seq2seq/some/file/you/changed

GraphModule

All classes that modify the Graph should inherit from seq2seq.graph_module.GraphModule, which is a wrapper around TensorFlow's tf.make_template function that enables easy variable sharing, allowing you to do something like this:

encode_fn = SomeEncoderModule(...)

# New variables are created in this call.
output1 = encode_fn(input1)

# No new variables are created here. The variables from the above call are re-used.
# Note how this is different from normal TensorFlow where you would need to use variable scopes.
output2 = encode_fn(input2)

# Because this is a new instance a second set of variables is created.
encode_fn2 = SomeEncoderModule(...)
output3 = encode_fn2(input3)

Functions vs. Classes

  • Operations that create new variables must be implemented as classes and must inherit from GraphModule.
  • Operations that do not create new variables can be implemented as standard python functions, or as classes that inherit from GraphModule if they have a lot of logic.