Machine Translation: WMT'15 English-German
Single models only, no ensembles. Results are listed in reverse chronological order.
Model Name & Reference | Settings / Notes | Training Time & Hardware | Test Set BLEU |
---|---|---|---|
tf-seq2seq | Configuration | ~4 days on 8 NVidia K80 GPUs | newstest2014: 22.19 newstest2015: 25.23 |
Gehring, et al. (2016-11) Deep Convolutional 15/5 | | --- | newstest2014: - newstest2015: 24.3 |
Wu et al. (2016-09) GNMT | 8 encoder/decoder layers; 1024 LSTM units; 32k shared wordpieces (similar to BPE); residual connections between layers (see the stacking sketch below the table); lots of other tricks; newstest2012 and newstest2013 used as validation sets. | --- | newstest2014: 24.61 newstest2015: - |
Zhou et al. (2016-06) Deep-Att | | --- | newstest2014: 20.6 newstest2015: - |
Chung, et al. (2016-03) BPE-Char | Character-level decoder with a BPE encoder; based on the Bahdanau attention model; bidirectional encoder with 512 GRU units; 2-layer GRU decoder with 1024 units; Adam; batch size 128; gradient clipping at norm 1; Moses tokenizer; source sequences limited to 50 symbols, target sequences to 100 symbols and 500 characters. | --- | newstest2014: 21.5 newstest2015: 23.9 |
Sennrich et al. (2015-08) BPE | Proposes BPE for subword segmentation as a pre/post-processing step to handle open vocabularies (the merge-learning loop is sketched below the table); base model follows Bahdanau's paper; bidirectional encoder; GRU; 1000 hidden units; 1000 attention units; 620-dimensional word embeddings; single layer; beam search width 12; Adadelta with batch size 80; implemented in Groundhog. | --- | newstest2014: - newstest2015: 20.5 |
Luong et al. (2015-08) | Novel local/global attention mechanisms (a global dot-attention sketch appears below the table); 50k vocabulary; 4 layers in encoder and decoder; unidirectional encoder; gradient clipping at norm 5; 1028 LSTM units; 1028-dimensional embeddings; (somewhat complicated) SGD decay schedule; dropout 0.2; UNK replacement. | --- | newstest2014: 20.9 newstest2015: - |
Jean et al. (2014-12) RNNsearch-LV | Proposes a sampling-based approach to support a larger target vocabulary; base model follows Bahdanau's paper; bidirectional encoder; GRU; 1000 hidden units; 1000 attention units; 620-dimensional word embeddings; single layer; beam search width 12. | --- | newstest2014: 19.4 newstest2015: - |
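Several of the deeper models above (notably the GNMT row) stack many recurrent layers and rely on residual connections between layers to keep the stack trainable. The snippet below is a minimal NumPy illustration of that stacking pattern, not the GNMT code; `stack_with_residuals` and the `layer_fns` list are hypothetical stand-ins for real LSTM layers.

```python
# Minimal sketch (not GNMT): residual connections between stacked layers.
# `layer_fns` is a hypothetical list of per-layer transforms (stand-ins for
# LSTM layers) that map a (time, units) sequence to the same shape.
import numpy as np

def stack_with_residuals(x, layer_fns):
    """Apply layers in sequence; from the 2nd layer on, add the layer's
    input back to its output: h_i = f_i(h_{i-1}) + h_{i-1}."""
    h = layer_fns[0](x)      # first layer without residual (input dim may differ)
    for fn in layer_fns[1:]:
        h = fn(h) + h        # residual connection between layers
    return h

# Toy usage with tanh "layers" standing in for LSTMs.
rng = np.random.default_rng(0)
units = 1024
weights = [rng.normal(scale=0.01, size=(units, units)) for _ in range(8)]
layers = [(lambda w: (lambda h: np.tanh(h @ w)))(w) for w in weights]
sequence = rng.normal(size=(50, units))   # (time steps, units)
out = stack_with_residuals(sequence, layers)
print(out.shape)  # (50, 1024)
```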
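The BPE-based rows (Sennrich et al., and the wordpiece/BPE-Char variants) build subword vocabularies by repeatedly merging the most frequent adjacent symbol pair. Below is a minimal sketch of that merge-learning loop, adapted from the short algorithm in the Sennrich et al. paper rather than their full subword-nmt toolkit; the toy vocabulary and merge count are placeholders.

```python
# Minimal sketch of BPE merge learning (Sennrich et al., 2015).
# The vocabulary maps space-separated symbol sequences (initially characters,
# plus an end-of-word marker) to word frequencies.
import re
import collections

def get_pair_stats(vocab):
    """Count how often each adjacent symbol pair occurs across the vocabulary."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the symbol pair with one merged symbol."""
    pattern = re.compile(r'(?<!\S)' + re.escape(' '.join(pair)) + r'(?!\S)')
    merged = ''.join(pair)
    return {pattern.sub(merged, word): freq for word, freq in vocab.items()}

# Toy corpus; real systems learn tens of thousands of merges from training data.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
num_merges = 10
for _ in range(num_merges):
    stats = get_pair_stats(vocab)
    if not stats:
        break
    best = stats.most_common(1)[0][0]
    vocab = merge_pair(best, vocab)
print(vocab)
```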
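Luong et al.'s global attention scores every encoder state against the current decoder state and mixes the encoder states with the resulting softmax weights to form a context vector. The sketch below shows only the simplest "dot" scoring variant in NumPy; the function name and dimensions are illustrative, and the paper's "general", "concat", and local-attention variants are not shown.

```python
# Minimal sketch of Luong-style global (dot) attention.
import numpy as np

def global_dot_attention(decoder_state, encoder_states):
    """decoder_state: (units,); encoder_states: (source_len, units)."""
    scores = encoder_states @ decoder_state              # (source_len,)
    scores -= scores.max()                               # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()      # softmax over source
    context = weights @ encoder_states                   # (units,)
    return context, weights

# Toy usage with the unit size listed in the Luong row above.
rng = np.random.default_rng(0)
h_t = rng.normal(size=(1028,))        # current decoder hidden state
h_s = rng.normal(size=(30, 1028))     # encoder states for a 30-token source
context, weights = global_dot_attention(h_t, h_s)
print(context.shape, weights.sum())   # (1028,) 1.0
```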
Machine Translation: WMT'17
Coming soon.
Text Summarization: Gigaword
Coming soon.
Image Captioning: MSCOCO
Coming soon.
Conversational Modeling
Coming soon.