Reranker Framework (ReFr)
Reranking framework for structure prediction and discriminative language modeling
A Python program which will train a reranking model on a Hadoop cluster using the Iterative Parameter Mixtures perceptron training algorithm.
Namespaces | |
hadoop-run | |
Variables | |
tuple | hadoop-run.optParse = OptionParser() |
The following arguments are available to hadoop-run.py. | |
string | hadoop-run.help = "Location of hadoop installation. If not set, " |
string | hadoop-run.default = "" |
string | hadoop-run.action = "append" |
hadoop-run.hadooproot = options.hadooproot | |
hadoop-run.streamingloc = options.streamingloc | |
string | hadoop-run.tmppath = hadooproot+"/contrib/streaming" |
tuple | hadoop-run.streamingjar = glob.glob(tmppath + "/hadoop-streaming*.jar") |
list | hadoop-run.filenames = [] |
Collect input filenames. | |
tuple | hadoop-run.hdproc |
Create output directory if it does not exist. | |
string | hadoop-run.train_map_options = "" |
Configuration for training options. Options passed to the mapper binary. | |
string | hadoop-run.train_files = "" |
tuple | hadoop-run.train_map |
string | hadoop-run.extractsym_map = "'" |
Shortcuts to command-line programs. | |
string | hadoop-run.compiledata_map = "'" |
string | hadoop-run.train_reduce = options.refrbin+"/model-merge-reducer" |
string | hadoop-run.train_recomb = options.refrbin+"/model-combine-shards" |
string | hadoop-run.symbol_recomb = options.refrbin+"/model-combine-symbols" |
string | hadoop-run.pipeeval_options = "" |
string | hadoop-run.pipeeval = options.refrbin+"/piped-model-evaluator" |
string | hadoop-run.hadoop_inputfiles = "" |
hadoop-run.precompdevfile = options.develdata | |
Precompilation of string features. | |
string | hadoop-run.symbol_dir = options.hdfsinputdir+"/Symbols/" |
string | hadoop-run.precomp_dir = options.hdfsinputdir+"/Precompiled/" |
string | hadoop-run.precompdev_dir = options.hdfsinputdir+"/PrecompiledDev/" |
string | hadoop-run.addl_data = "" |
string | hadoop-run.symfile_name = options.outputdir+"/" |
hadoop-run.cur_model = options.inputmodel | |
hadoop-run.converged = False | |
tuple | hadoop-run.iteration = int(options.startiter) |
int | hadoop-run.prev_loss = -9999 |
list | hadoop-run.loss_history = [] |
int | hadoop-run.num_in_decline = 0 |
int | hadoop-run.best_loss_index = 0 |
string | hadoop-run.eval_cmd = pipeeval+" -d " |
tuple | hadoop-run.evalio = pyutil.CommandIO(eval_cmd) |
string | hadoop-run.iter_str = "'" |
string | hadoop-run.model_output = options.outputdir+"/" |
string | hadoop-run.proc_cmd = train_recomb+" -o " |
int | hadoop-run.devtest_score = 0 |
float | hadoop-run.loss = 0.0 |
list | hadoop-run.diff = loss_history[-1] |
A Python program which will train a reranking model on a Hadoop cluster using the Iterative Parameter Mixtures perceptron training algorithm.
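The training scheme named above can be illustrated with a minimal, single-process sketch. It assumes each training example is a list of (loss, feature dictionary) candidates and that the per-shard models are mixed by uniform averaging. The function names here (score, train_shard, mix, iterative_parameter_mixing) are hypothetical and are not part of ReFr; on the cluster, the per-shard step corresponds to the mapper and the mixing step to the reducer.

```python
from collections import defaultdict


def score(weights, feats):
    """Dot product of a weight dictionary and a sparse feature dictionary."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_shard(weights, shard):
    """Run one perceptron epoch over a shard, starting from the mixed weights."""
    w = defaultdict(float, weights)
    for candidates in shard:
        # Each candidate is a (loss, feature_dict) pair (assumed data layout).
        oracle = min(candidates, key=lambda c: c[0])
        predicted = max(candidates, key=lambda c: score(w, c[1]))
        if predicted is not oracle:
            # Standard perceptron update: move toward the oracle candidate
            # and away from the current top-scoring candidate.
            for name, value in oracle[1].items():
                w[name] += value
            for name, value in predicted[1].items():
                w[name] -= value
    return dict(w)


def mix(shard_weights):
    """Uniformly average the weight vectors produced by the shards."""
    mixed = defaultdict(float)
    for w in shard_weights:
        for name, value in w.items():
            mixed[name] += value / len(shard_weights)
    return dict(mixed)


def iterative_parameter_mixing(shards, epochs):
    """Alternate per-shard training and mixing for a fixed number of epochs."""
    weights = {}
    for _ in range(epochs):
        weights = mix([train_shard(weights, shard) for shard in shards])
    return weights
```

The key design point is that every shard restarts each epoch from the same mixed model, so the shards can be processed independently and only the weight vectors need to be exchanged between epochs.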
You must first have a Hadoop account configured. In order to train, you will need to have the following:
The program will attempt to locate the Hadoop binary and the Hadoop streaming library. If this fails, you can specify their locations with the command-line parameters --hadooproot and --streamingloc.
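The lookup described above can be sketched as follows. The search path and glob pattern are taken from the tmppath and streamingjar variables listed on this page. The function name find_streaming_jar, the choice of the first sorted match, and the error handling are assumptions made for illustration and may differ from what the script does.

```python
import glob
import os


def find_streaming_jar(hadooproot, streamingloc=""):
    """Return the path of the Hadoop streaming jar.

    An explicit streamingloc (hypothetical parameter mirroring the
    --streamingloc option) takes precedence over the search.
    """
    if streamingloc:
        return streamingloc
    # Same location and pattern as the documented variables:
    # hadooproot + "/contrib/streaming" and "hadoop-streaming*.jar".
    pattern = os.path.join(hadooproot, "contrib", "streaming",
                           "hadoop-streaming*.jar")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            "No streaming jar matches %s; pass --streamingloc" % pattern)
    return matches[0]
```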
Usage:
  hadoop-run.py --input InputData --hdfsinputdir HDFSInDir \
                --hdfsoutputdir HDFSOutDir --outputdir OutputDir
InputData - A comma-separated list of file globs containing the training data. These must be accessible by the script.
OutputDir - The local directory where the trained model(s) are written. The default model name is 'model'. You can change this using the --modelname command-line parameter.
HDFSInDir - A directory on HDFS to which the input data will be copied.
HDFSOutDir - A directory on HDFS to which the temporary data and output data will be written. The final models are copied to the locally accessible OutputDir.
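The options documented above could be declared with optparse, which this page shows the script using. This is a sketch under assumptions: the destination names inputlist, hdfsoutputdir and modelname, the help strings, and the helper split_patterns are hypothetical; only the option names and the default model name 'model' come from the text.

```python
from optparse import OptionParser


def build_parser():
    """Declare the documented options (destination names are assumed)."""
    parser = OptionParser()
    # --input may be repeated; each value is a comma-separated list of globs.
    parser.add_option("--input", dest="inputlist", action="append",
                      help="Comma-separated list of training data file globs.")
    parser.add_option("--hdfsinputdir", dest="hdfsinputdir", default="",
                      help="HDFS directory the input data is copied to.")
    parser.add_option("--hdfsoutputdir", dest="hdfsoutputdir", default="",
                      help="HDFS directory for temporary and output data.")
    parser.add_option("--outputdir", dest="outputdir", default="",
                      help="Local directory where trained models are written.")
    parser.add_option("--modelname", dest="modelname", default="model",
                      help="Name of the trained model.")
    return parser


def split_patterns(input_values):
    """Split every --input value on commas into individual glob patterns."""
    patterns = []
    for value in input_values or []:
        patterns.extend(p for p in value.split(",") if p)
    return patterns
```

Each resulting pattern can then be expanded with glob.glob to build the list of input filenames.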
Check input command line options.
Definition in file hadoop-run.py.