Reranker Framework (ReFr)
Reranking framework for structure prediction and discriminative language modeling
perceptron-model-test.C
// Copyright 2012, Google Inc.
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//   * Redistributions of source code must retain the above copyright
//     notice, this list of conditions and the following disclaimer.
//   * Redistributions in binary form must reproduce the above
//     copyright notice, this list of conditions and the following disclaimer
//     in the documentation and/or other materials provided with the
//     distribution.
//   * Neither the name of Google Inc. nor the names of its
//     contributors may be used to endorse or promote products derived from
//     this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// -----------------------------------------------------------------------------
//
//

#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "candidate.H"
#include "candidate-set.H"
#include "candidate-set-iterator.H"
#include "candidate-set-reader.H"
#include "model.H"
#include "perceptron-model.H"
#include "model-proto-writer.H"
#include "perceptron-model-proto-writer.H"
#include "../proto/data.pb.h"
#include "../proto/dataio.h"

#define DEBUG 1
#define REPORTING_INTERVAL 100

#define MAX_NUM_EXAMPLES 1000
#define MAX_NUM_CANDIDATES 1000000


using namespace reranker;
using namespace std;

int main(int argc, char **argv) {
  if (argc < 4) {
    cout << "usage: <training data>+ <devtest data> <model output file>"
         << endl;
    return -1;
  }
  // Every argument except the last two names a training file.
  vector<string> training_files;
  int i = 1;
  for ( ; i < argc - 2; ++i) {
    training_files.push_back(argv[i]);
  }
  const string devtest_file = argv[i++];
  const string model_file = argv[i++];

  CandidateSetReader csr(MAX_NUM_EXAMPLES, MAX_NUM_CANDIDATES,
                         REPORTING_INTERVAL);
  csr.set_verbosity(DEBUG);

  // Read the candidate sets from all training files into a single vector.
  vector<shared_ptr<CandidateSet> > training_examples;
  bool compressed = true;
  bool use_base64 = true;
  bool reset_counters = true;
  for (vector<string>::const_iterator it = training_files.begin();
       it != training_files.end();
       ++it) {
    csr.Read(*it, compressed, use_base64, reset_counters, training_examples);
  }
  if (DEBUG) {
    cout << "Read " << training_examples.size() << " training examples."
         << endl;
  }

  vector<shared_ptr<CandidateSet> > devtest_examples;
  csr.Read(devtest_file, compressed, use_base64, reset_counters,
           devtest_examples);
  if (DEBUG) {
    cout << "Read " << devtest_examples.size() << " devtest examples."
         << endl;
  }

  // Train a perceptron model on the training data, using the devtest data
  // as the held-out set.
  shared_ptr<Model> model(new PerceptronModel("My Test Model"));
  typedef CollectionCandidateSetIterator<vector<shared_ptr<CandidateSet> > >
      CandidateSetVectorIt;
  CandidateSetVectorIt training_examples_it(training_examples);
  CandidateSetVectorIt devtest_examples_it(devtest_examples);
  model->Train(training_examples_it, devtest_examples_it);

  model->CompactifyFeatureUids();

  // TODO(dbikel): Need some kind of a factory for model writers, so the
  //               proper ModelProtoWriter gets instantiated given a
  //               particular Model subclass.
  shared_ptr<ModelProtoWriter> model_writer(new PerceptronModelProtoWriter());

  confusion_learning::ModelMessage model_message;
  model_writer->Write(model.get(), &model_message);

  // Write out serialized model.
  shared_ptr<ConfusionProtoIO> proto_writer(
      new ConfusionProtoIO(model_file, ConfusionProtoIO::WRITE,
                           compressed, use_base64));
  proto_writer->Write(model_message);

  cout << "Have a nice day!" << endl;
}
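The argument convention used by main() above (one or more training files, then a devtest file, then a model output file) can be illustrated in isolation. The following is a minimal sketch, not part of ReFr: the names ParsedArgs and ParseArgs are hypothetical and exist only to show how the positional arguments are grouped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not part of ReFr): holds the three groups of
// positional arguments that main() above distinguishes.
struct ParsedArgs {
  std::vector<std::string> training_files;
  std::string devtest_file;
  std::string model_file;
};

// Splits args (where args[0] is the program name, as with argv) into
// training files, devtest file and model output file. Returns false when
// fewer than three file arguments are present, matching the argc < 4 check.
bool ParseArgs(const std::vector<std::string> &args, ParsedArgs *parsed) {
  assert(parsed != nullptr);
  if (args.size() < 4) {
    return false;
  }
  // Everything between the program name and the last two arguments is
  // treated as a training file.
  const std::size_t num_training = args.size() - 3;
  parsed->training_files.assign(args.begin() + 1,
                                args.begin() + 1 + num_training);
  parsed->devtest_file = args[args.size() - 2];
  parsed->model_file = args[args.size() - 1];
  return true;
}
```

Under these assumptions, calling ParseArgs with the arguments "prog", "t1", "t2", "dev", "out" would yield two training files, "dev" as the devtest file and "out" as the model output file.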
Provides the reranker::PerceptronModel reranker class.
Provides the reranker::Candidate class for representing a candidate hypothesis from an initial model...
Serializer for reranker::PerceptronModel instances to ModelMessage instances.
Provides an interface and some implementations for iterating over CandidateSet instances.
void set_verbosity(int verbosity)
Sets the verbosity of this reader (mostly for debugging purposes).
This class implements a perceptron model reranker.
Class for reading streams of training or test instances, where each training or test instance is a re...
void Read(const string &filename, bool compressed, bool use_base64, bool reset_counters, vector< shared_ptr< CandidateSet > > &examples)
Reads a stream of CandidateSet instances from the specified file or from standard input...
An implementation of the CandidateSetIterator interface that is backed by an arbitrary C++ collection...
A class to construct a ModelMessage from a PerceptronModel instance.
A class for reading streams of training or test instances, where each training or test instance is a ...
Interface for serializer for reranker::Model instances to ModelMessage instances. ...
Class to hold a single training instance for a reranker, which is a set of examples, typically the n-best output of some input process, possibly including a gold-standard feature vector.
Reranker model interface.
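To make the idea of a perceptron model reranker over candidate sets more concrete, here is a minimal, self-contained sketch of the textbook perceptron reranking update. It is not ReFr's implementation: ToyCandidate, FeatureVector, Score, BestScoring, Oracle and PerceptronUpdate are hypothetical names, features are plain string-keyed maps, and the oracle is assumed to be the lowest-loss candidate in the set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sparse feature vector: feature name -> value.
typedef std::unordered_map<std::string, double> FeatureVector;

// Hypothetical stand-in for one candidate hypothesis in an n-best list.
struct ToyCandidate {
  FeatureVector features;
  double loss;  // Lower is better; assumed to be known at training time.
};

// Dot product of the weight vector with a candidate's features.
double Score(const FeatureVector &weights, const FeatureVector &features) {
  double score = 0.0;
  for (const auto &entry : features) {
    const auto it = weights.find(entry.first);
    if (it != weights.end()) score += it->second * entry.second;
  }
  return score;
}

// Index of the highest-scoring candidate (first one wins on ties).
std::size_t BestScoring(const FeatureVector &weights,
                        const std::vector<ToyCandidate> &candidates) {
  assert(!candidates.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    if (Score(weights, candidates[i].features) >
        Score(weights, candidates[best].features)) {
      best = i;
    }
  }
  return best;
}

// Index of the lowest-loss candidate (first one wins on ties).
std::size_t Oracle(const std::vector<ToyCandidate> &candidates) {
  assert(!candidates.empty());
  std::size_t oracle = 0;
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    if (candidates[i].loss < candidates[oracle].loss) oracle = i;
  }
  return oracle;
}

// One perceptron step: if the model's top candidate has higher loss than
// the oracle, add the oracle's features to the weights and subtract the
// top candidate's features. Returns true if the weights were changed.
bool PerceptronUpdate(FeatureVector *weights,
                      const std::vector<ToyCandidate> &candidates) {
  const std::size_t best = BestScoring(*weights, candidates);
  const std::size_t oracle = Oracle(candidates);
  if (candidates[best].loss <= candidates[oracle].loss) return false;
  for (const auto &entry : candidates[oracle].features) {
    (*weights)[entry.first] += entry.second;
  }
  for (const auto &entry : candidates[best].features) {
    (*weights)[entry.first] -= entry.second;
  }
  return true;
}
```

Training in this sketch would amount to calling PerceptronUpdate once per candidate set, for one or more passes over the training data. Refinements such as weight averaging or stopping based on held-out performance are deliberately left out.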