Machine Translation: WMT'15 English-German
Single models only, no ensembles. Results are listed in reverse chronological order.
Model Name & Reference | Settings / Notes | Training Time & Hardware | Test Set BLEU |
---|---|---|---|
tf-seq2seq | Configuration | ~4 days on 8 NVidia K80 GPUs | newstest2014: 22.19 newstest2015: 25.23 |
Gehring, et al. (2016-11) Deep Convolutional 15/5 | | --- | newstest2014: - newstest2015: 24.3 |
Wu et al. (2016-09) GNMT | 8 encoder/decoder layers; 1024 LSTM units; 32k shared wordpieces (similar to BPE); residual connections between layers (see the stacking sketch below the table); lots of other tricks; newstest2012 and newstest2013 used as validation sets. | --- | newstest2014: 24.61 newstest2015: - |
Zhou et al. (2016-06) Deep-Att | | --- | newstest2014: 20.6 newstest2015: - |
Chung, et al. (2016-03) BPE-Char | Character-level decoder with a BPE encoder; based on the Bahdanau attention model; bidirectional encoder with 512 GRU units; 2-layer GRU decoder with 1024 units; Adam; batch size 128; gradient clipping at norm 1; Moses tokenizer; source sequences limited to 50 symbols, target sequences to 100 symbols and 500 characters. | --- | newstest2014: 21.5 newstest2015: 23.9 |
Sennrich et al. (2015-08) BPE | Proposes BPE for subword segmentation as a pre/post-processing step to handle open vocabularies (the merge-learning loop is sketched below the table); base model follows Bahdanau's paper; bidirectional encoder; GRU; 1000 hidden units; 1000 attention units; 620-dimensional word embeddings; single layer; beam search width 12; Adadelta with batch size 80; implemented in Groundhog. | --- | newstest2014: - newstest2015: 20.5 |
Luong et al. (2015-08) | Novel local/global attention mechanisms (a global dot-attention sketch appears below the table); 50k vocabulary; 4 layers in encoder and decoder; unidirectional encoder; gradient clipping at norm 5; 1028 LSTM units; 1028-dimensional embeddings; (somewhat complicated) SGD decay schedule; dropout 0.2; UNK replacement. | --- | newstest2014: 20.9 newstest2015: - |
Jean et al. (2014-12) RNNsearch-LV | Proposes a sampling-based approach to support a larger target vocabulary; base model follows Bahdanau's paper; bidirectional encoder; GRU; 1000 hidden units; 1000 attention units; 620-dimensional word embeddings; single layer; beam search width 12. | --- | newstest2014: 19.4 newstest2015: - |
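Several of the deeper models above (notably the GNMT row) stack many recurrent layers and rely on residual connections between layers to keep the stack trainable. The snippet below is a minimal NumPy illustration of that stacking pattern, not the GNMT code; `stack_with_residuals` and the `layer_fns` list are hypothetical stand-ins for real LSTM layers.

```python
# Minimal sketch (not GNMT): residual connections between stacked layers.
# `layer_fns` is a hypothetical list of per-layer transforms (stand-ins for
# LSTM layers) that map a (time, units) sequence to the same shape.
import numpy as np

def stack_with_residuals(x, layer_fns):
    """Apply layers in sequence; from the 2nd layer on, add the layer's
    input back to its output: h_i = f_i(h_{i-1}) + h_{i-1}."""
    h = layer_fns[0](x)      # first layer without residual (input dim may differ)
    for fn in layer_fns[1:]:
        h = fn(h) + h        # residual connection between layers
    return h

# Toy usage with tanh "layers" standing in for LSTMs.
rng = np.random.default_rng(0)
units = 1024
weights = [rng.normal(scale=0.01, size=(units, units)) for _ in range(8)]
layers = [(lambda w: (lambda h: np.tanh(h @ w)))(w) for w in weights]
sequence = rng.normal(size=(50, units))   # (time steps, units)
out = stack_with_residuals(sequence, layers)
print(out.shape)  # (50, 1024)
```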
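The BPE-based rows (Sennrich et al., and the wordpiece/BPE-Char variants) build subword vocabularies by repeatedly merging the most frequent adjacent symbol pair. Below is a minimal sketch of that merge-learning loop, adapted from the short algorithm in the Sennrich et al. paper rather than their full subword-nmt toolkit; the toy vocabulary and merge count are placeholders.

```python
# Minimal sketch of BPE merge learning (Sennrich et al., 2015).
# The vocabulary maps space-separated symbol sequences (initially characters,
# plus an end-of-word marker) to word frequencies.
import re
import collections

def get_pair_stats(vocab):
    """Count how often each adjacent symbol pair occurs across the vocabulary."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    merged = ''.join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Toy corpus; real systems learn tens of thousands of merges from training data.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
num_merges = 10
for _ in range(num_merges):
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = stats.most_common(1)[0][0]
    vocab = merge_pair(best, vocab)
print(vocab)
```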
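Luong et al.'s global attention scores every encoder state against the current decoder state and mixes the encoder states with the resulting softmax weights to form a context vector. The sketch below shows only the simplest "dot" scoring variant in NumPy; the function name and dimensions are illustrative, and the paper's "general", "concat", and local-attention variants are not shown.

```python
# Minimal sketch of Luong-style global (dot) attention.
import numpy as np

def global_dot_attention(decoder_state, encoder_states):
    """decoder_state: (units,); encoder_states: (source_len, units)."""
    scores = encoder_states @ decoder_state              # (source_len,)
    scores -= scores.max()                               # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()      # softmax over source
    context = weights @ encoder_states                   # (units,)
    return context, weights

# Toy usage with the unit size listed in the Luong row above.
rng = np.random.default_rng(0)
h_t = rng.normal(size=(1028,))        # current decoder hidden state
h_s = rng.normal(size=(30, 1028))     # encoder states for a 30-token source
context, weights = global_dot_attention(h_t, h_s)
print(context.shape, weights.sum())   # (1028,) 1.0
```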
Machine Translation: WMT'17
Coming soon.
Text Summarization: Gigaword
Coming soon.
Image Captioning: MSCOCO
Coming soon.
Conversational Modeling
Coming soon.