Reranker Framework (ReFr)
Reranking framework for structure prediction and discriminative language modeling
|
This class implements a perceptron model reranker. More...
#include <perceptron-model.H>
Classes | |
class | DefaultUpdatePredicate |
The default update predicate for perceptron and perceptron-style models, which triggers a model update whenever the top-scoring candidate hypothesis under the current model differs from the oracle or “gold” candidate hypothesis. More... | |
class | DefaultUpdater |
The default update function for perceptron models. More... | |
Public Member Functions | |
PerceptronModel () | |
Constructs a new instance with the empty string for its name and the DotProduct kernel function. More... | |
PerceptronModel (const string &name) | |
Constructs a new perceptron model with a DotProduct kernel function. More... | |
PerceptronModel (const string &name, KernelFunction *kernel_fn) | |
Constructs a new perceptron model with the specified kernel function. More... | |
PerceptronModel (const string &name, KernelFunction *kernel_fn, Symbols *symbols) | |
Constructs a new perceptron model with the specified kernel function and symbol table. More... | |
virtual | ~PerceptronModel () |
Destroys this perceptron model and all its data members. More... | |
virtual const string & | model_spec () const |
Returns the spec string for constructing a default instance of this model so it may be properly de-serialized by its ModelProtoReader. More... | |
virtual const string & | proto_reader_spec () const |
Returns the spec string for constructing an instance of a ModelProtoReader capable of de-serializing this Model implementation. More... | |
virtual const string & | proto_writer_spec () const |
Returns the spec string for constructing an instance of a ModelProtoWriter capable of serializing this Model implementation. More... | |
virtual int | best_model_epoch () const |
Returns the epoch of the best models seen so far during training. More... | |
virtual void | RegisterInitializers (Initializers &initializers) |
Registers several variables that may be initialized when this object is constructed via Factory::CreateOrDie. More... | |
virtual void | Init (const Environment *env, const string &arg) |
Initializes this instance. More... | |
virtual bool | NeedToKeepTraining () |
Returns whether more training epochs are required for this model. More... | |
virtual void | Train (CandidateSetIterator &examples, CandidateSetIterator &development_test) |
Trains this model on a collection of training examples, where each training example is a set of candidates. More... | |
virtual void | NewEpoch () |
virtual void | EndOfEpoch () |
virtual void | TrainOneEpoch (CandidateSetIterator &examples) |
Trains this model for one epoch, i.e., a single pass through the specified set of training examples. More... | |
virtual void | TrainOnExample (CandidateSet &example) |
Trains this model on the specified training example. More... | |
virtual bool | NeedToUpdate (CandidateSet &example) |
Indicates whether the current model needs to be updated; the implementation here simply returns true if the best-scoring candidate is not equal to the gold or reference candidate. More... | |
virtual void | Update (CandidateSet &example) |
Updates the current model based on the specified set of candidates. More... | |
virtual double | Evaluate (CandidateSetIterator &development_test) |
Evaluates this model on the specified set of held-out development test data. More... | |
virtual void | ScoreCandidates (CandidateSet &candidates, bool training) |
Scores the specified set of candidates according to either the raw or averaged version of this perceptron model, keeping track of which candidate has the highest score and which candidate has the lowest loss with the best score. More... | |
virtual double | ScoreCandidate (Candidate &candidate, bool training) |
Scores a candidate according to either the raw or averaged version of this perceptron model. More... | |
virtual void | CompactifyFeatureUids () |
Renumbers the potentially sparse feature uids so that they occupy the interval [0,n-1] densely, for n non-zero features in use by this model. More... | |
void | set_max_epochs_in_decline (int max_epochs_in_decline) |
Sets the maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far). More... | |
virtual const TrainingVectorSet & | models () const |
Returns the set of models and statistics used by this PerceptronModel instance. More... | |
Public Member Functions inherited from reranker::Model | |
Model () | |
Constructs a new instance with the empty string for its name and a NULL kernel function. More... | |
Model (const string &name) | |
Constructs a new instance with a NULL kernel function. More... | |
Model (const string &name, KernelFunction *kernel_fn) | |
Constructs a new instance with the specified kernel function. More... | |
Model (const string &name, KernelFunction *kernel_fn, Symbols *symbols) | |
Constructs a new instance with the specified kernel function and symbol table. More... | |
virtual | ~Model () |
Destroys this model and its associated kernel function. More... | |
const string & | name () const |
Returns the unique name for this model instance. More... | |
Symbols * | symbols () const |
Returns the symbol table for this model. More... | |
const Time & | time () const |
Returns the current training time of this model: number of epochs, number of time steps in the current epoch and total number of time steps (which is equal to the total number of training examples seen). More... | |
int | num_updates () const |
Returns the number of updates made by this model. More... | |
const vector< int > & | num_training_errors_per_epoch () |
Returns the number of training errors made for each epoch. More... | |
int | num_training_errors () const |
Returns the number of training errors made by this model. More... | |
int | min_epochs () const |
Returns the minimum number of epochs to train. More... | |
int | max_epochs () const |
Returns the maximum number of epochs to train. More... | |
const vector< double > & | loss_per_epoch () |
Returns the loss per epoch for each epoch of training that was evaluated. More... | |
virtual shared_ptr < Candidate::Comparator > | score_comparator () |
Returns a pointer to the score comparator used by this model. More... | |
virtual shared_ptr < Candidate::Comparator > | gold_comparator () |
Returns a pointer to the gold comparator used by this model. More... | |
virtual void | set_min_epochs (int min_epochs) |
Sets the minimum number of epochs to train. More... | |
virtual void | set_max_epochs (int max_epochs) |
Sets the maximum number of epochs to train. More... | |
virtual void | set_end_of_epoch_hook (Hook *end_of_epoch_hook) |
virtual bool | use_weighted_loss () |
virtual void | set_use_weighted_loss (bool use_weighted_loss) |
virtual void | set_symbols (Symbols *symbols) |
Sets the Symbols instance for this Model to be the specified instance. More... | |
Public Member Functions inherited from reranker::FactoryConstructible | |
virtual | ~FactoryConstructible () |
Protected Member Functions | |
void | SetDefaultObjects () |
virtual void | ComputeFeaturesToUpdate (const CandidateSet &example, unordered_set< int > &gold_features_to_update, unordered_set< int > &best_scoring_features_to_update) const |
Computes the features to be updated for the gold candidate and the best-scoring candidate. More... | |
virtual double | ComputeStepSize (const unordered_set< int > &gold_features, const unordered_set< int > &best_scoring_features, const CandidateSet &example) |
Computes the step size for the next update, and, as a side effect, caches this value in step_size_. More... | |
Protected Member Functions inherited from reranker::Model | |
void | set_name (const string &name) |
Sets the name of this Model instance. More... | |
void | set_kernel_fn (KernelFunction *kernel_fn) |
Sets the kernel function for this model. More... | |
void | set_score_comparator (shared_ptr< Candidate::Comparator > score_comparator) |
void | set_gold_comparator (shared_ptr< Candidate::Comparator > gold_comparator) |
void | SetDefaultObjects () |
void | SetDefaultComparators () |
void | SetDefaultCandidateSetScorer () |
shared_ptr< Candidate::Comparator > | GetComparator (const string &spec) const |
shared_ptr< CandidateSet::Scorer > | GetCandidateSetScorer (const string &spec) const |
shared_ptr< UpdatePredicate > | GetUpdatePredicate (const string &spec) const |
shared_ptr< Updater > | GetUpdater (const string &spec) const |
virtual void | CheckNumberOfTokens (const string &arg, const vector< string > &tokens, size_t min_expected_number, size_t max_expected_number, const string &class_name) const |
A helper method for implementing the Init method: throws a std::runtime_error if the number of tokens in the argument string is not the expected number. More... | |
Protected Attributes | |
TrainingVectorSet | models_ |
The feature vectors representing this model. More... | |
TrainingVectorSet | best_models_ |
The best models seen so far during training, according to evaluation on the held-out development test data. More... | |
int | best_model_epoch_ |
The epoch of the best models seen so far during training. More... | |
int | max_epochs_in_decline_ |
The maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far). More... | |
int | num_epochs_in_decline_ |
The current number of training epochs in which the model has been degrading in development set performance (i.e., has had more errors than the best model so far). More... | |
double | step_size_ |
The last value computed by the ComputeStepSize method. More... | |
string | model_spec_ |
Protected Attributes inherited from reranker::Model | |
string | name_ |
This model’s unique name. More... | |
Time | time_ |
The tiny object that holds the "training time" for this model (epoch, index and absolute time index). More... | |
KernelFunction * | kernel_fn_ |
Yes, this is an interface, but we add the kernel function as a data member. More... | |
Symbols * | symbols_ |
The symbol table for this model (may be NULL). More... | |
shared_ptr< Candidate::Comparator > | score_comparator_ |
A comparator to provide an ordering for candidates based on score when scoring all candidates in a set. More... | |
shared_ptr< Candidate::Comparator > | gold_comparator_ |
A comparator to provide an ordering for candidates to find the gold candidate in a set. More... | |
shared_ptr< CandidateSet::Scorer > | candidate_set_scorer_ |
A scorer for CandidateSet instances. More... | |
shared_ptr< UpdatePredicate > | update_predicate_ |
The update predicate for this model. More... | |
shared_ptr< Updater > | updater_ |
The updater for this model. More... | |
vector< double > | loss_per_epoch_ |
The average loss per epoch. More... | |
vector< int > | num_testing_errors_per_epoch_ |
The number of testing errors made on held-out development test data for each epoch. More... | |
vector< int > | num_training_errors_per_epoch_ |
The number of errors made on training examples during each epoch. More... | |
int | num_training_errors_ |
The number of errors made on training examples. More... | |
int | num_updates_ |
The number of times an update was performed on this model during training. More... | |
int | min_epochs_ |
The minimum number of training epochs to execute. More... | |
int | max_epochs_ |
The maximum number of training epochs to execute. More... | |
Hook * | end_of_epoch_hook_ |
A hook to be performed at the end of every epoch. More... | |
bool | use_weighted_loss_ |
Indicates whether this model should weight each candidate’s loss by the value returned by CandidateSet::loss_weight. More... | |
Static Protected Attributes | |
static string | proto_reader_spec_ |
A spec string for constructing a PerceptronModelProtoReader, which is capable of de-serializing an instance of this class. More... | |
static string | proto_writer_spec_ |
A spec string for constructing a PerceptronModelProtoWriter, which is capable of serializing an instance of this class. More... | |
Friends | |
class | PerceptronModelProtoWriter |
class | PerceptronModelProtoReader |
This class implements a perceptron model reranker.
While this model can consist of arbitrary feature types, there is special handling for n-gram–based features, to capture the fact that, e.g., a bigram suffix exists whenever a trigram occurs.
Definition at line 63 of file perceptron-model.H.
|
inline |
Constructs a new instance with the empty string for its name and the DotProduct kernel function.
Definition at line 70 of file perceptron-model.H.
|
inline |
Constructs a new perceptron model with a DotProduct kernel function.
name | the unique name of this perceptron model instance |
Definition at line 85 of file perceptron-model.H.
|
inline |
Constructs a new perceptron model with the specified kernel function.
name | the unique name of this perceptron model instance |
kernel_fn | the kernel function for this model to use when evaluating on training or test instances |
Definition at line 101 of file perceptron-model.H.
|
inline |
Constructs a new perceptron model with the specified kernel function and symbol table.
name | the unique name of this model instance |
kernel_fn | the kernel function for this model to use when applied to training or test instances |
symbols | the symbol table for this Model to use; this Model will be responsible for deleting this Symbols object |
Definition at line 120 of file perceptron-model.H.
|
inline virtual |
Destroys this perceptron model and all its data members.
Definition at line 134 of file perceptron-model.H.
|
inline virtual |
Returns the epoch of the best models seen so far during training.
(Primarily here for the PerceptronModelProtoWriter serializer.)
Implements reranker::Model.
Definition at line 190 of file perceptron-model.H.
|
virtual |
Renumbers the potentially sparse feature uids so that they occupy the interval [0,n-1] densely, for n non-zero features in use by this model.
If the internal Symbols instance is non-NULL, then this method also adjusts it to reflect the new set of feature uids.
Implements reranker::Model.
Definition at line 374 of file perceptron-model.C.
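The renumbering described above can be illustrated with a minimal, self-contained sketch. It does not use ReFr's own data structures: the weights are held in a plain `std::map`, the function name `CompactifyToyWeights` is hypothetical, and assigning new uids in ascending order of the old uids is an assumption made for this example only.

```cpp
#include <map>
#include <unordered_map>

// Hypothetical helper (not part of ReFr): drops zero-valued weights and
// renumbers the surviving feature uids densely as 0, 1, ..., n-1.
// Returns the old-uid -> new-uid mapping so that a symbol table could be
// adjusted in the same way.
std::unordered_map<int, int> CompactifyToyWeights(
    std::map<int, double> &weights) {
  std::unordered_map<int, int> old_to_new;
  std::map<int, double> compacted;
  int next_uid = 0;
  // std::map iterates in ascending key order, so new uids follow the
  // relative order of the old ones (an assumption of this sketch).
  for (const auto &entry : weights) {
    if (entry.second == 0.0) continue;  // skip features with zero weight
    old_to_new[entry.first] = next_uid;
    compacted[next_uid] = entry.second;
    ++next_uid;
  }
  weights.swap(compacted);
  return old_to_new;
}
```

The returned mapping is what a caller would need in order to update any structure keyed by the old uids, such as a symbol table.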
|
protected virtual |
Computes the features to be updated for the gold candidate and the best-scoring candidate.
Let G be the set of features of the gold candidate and B the set of features of the best-scoring candidate. For the perceptron, we want to update the set difference G\B positively and B\G negatively. This method computes these two set differences.
[in] | example | the candidate set from which to get the gold feature vector and the best-scoring candidate feature vector |
[out] | gold_features_to_update | a set in which to insert the uids of all features in the gold candidate that are not in the best-scoring candidate |
[out] | best_scoring_features_to_update | a set in which to insert the uids of all features in the best-scoring candidate that are not in the gold candidate |
Definition at line 409 of file perceptron-model.C.
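As a rough illustration of the two set differences G\B and B\G, the following standalone sketch works directly on sets of feature uids rather than on a CandidateSet. The function names `InsertSetDifference` and `ToyFeaturesToUpdate` are hypothetical and are not part of the ReFr API.

```cpp
#include <unordered_set>

// Hypothetical helper: inserts into `out` every uid in `a` that is not in `b`,
// i.e., the set difference a \ b.
void InsertSetDifference(const std::unordered_set<int> &a,
                         const std::unordered_set<int> &b,
                         std::unordered_set<int> &out) {
  for (int uid : a) {
    if (b.find(uid) == b.end()) out.insert(uid);
  }
}

// Hypothetical stand-in for the computation described above: fills
// gold_features_to_update with G \ B and best_scoring_features_to_update
// with B \ G.
void ToyFeaturesToUpdate(
    const std::unordered_set<int> &gold_features,
    const std::unordered_set<int> &best_scoring_features,
    std::unordered_set<int> &gold_features_to_update,
    std::unordered_set<int> &best_scoring_features_to_update) {
  InsertSetDifference(gold_features, best_scoring_features,
                      gold_features_to_update);
  InsertSetDifference(best_scoring_features, gold_features,
                      best_scoring_features_to_update);
}
```

For G = {1, 2, 3} and B = {2, 3, 4}, the first output set receives feature 1 and the second receives feature 4; the shared features 2 and 3 appear in neither.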
|
inline protected virtual |
Computes the step size for the next update, and, as a side effect, caches this value in step_size_.
In the case of the standard perceptron model implemented here, the step size does not change, and so this method simply returns the step size value set at construction time.
Reimplemented in reranker::MiraStyleModel.
Definition at line 389 of file perceptron-model.H.
|
virtual |
Implements reranker::Model.
Definition at line 185 of file perceptron-model.C.
|
virtual |
Evaluates this model on the specified set of held-out development test data.
development_test | a held-out set of examples to use for evaluation of this model (during training, this method is typically invoked after each epoch to determine when to stop) |
Implements reranker::Model.
Definition at line 288 of file perceptron-model.C.
|
virtual |
Initializes this instance.
This method is guaranteed to be invoked by a Factory just after construction.
Reimplemented from reranker::FactoryConstructible.
Definition at line 82 of file perceptron-model.C.
|
inline virtual |
Returns the spec string for constructing a default instance of this model so it may be properly de-serialized by its ModelProtoReader.
Implements reranker::Model.
Definition at line 176 of file perceptron-model.H.
|
inline virtual |
Returns the set of models and statistics used by this PerceptronModel instance.
Definition at line 353 of file perceptron-model.H.
|
virtual |
Returns whether more training epochs are required for this model.
Implements reranker::Model.
Definition at line 115 of file perceptron-model.C.
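This page does not spell out the stopping rule, so the following is only a plausible sketch of how the documented quantities (minimum and maximum epochs, and the number of epochs in decline) might combine. The struct `ToyStoppingState`, the function `ToyNeedToKeepTraining`, and the convention that a negative bound means "no bound" are all assumptions made for this example, not ReFr's actual logic.

```cpp
// Hypothetical snapshot of the training state (not a ReFr type).
struct ToyStoppingState {
  int epochs_completed;       // number of epochs finished so far
  int min_epochs;             // negative: no minimum (assumption)
  int max_epochs;             // negative: no maximum (assumption)
  int num_epochs_in_decline;  // consecutive epochs worse than the best model
  int max_epochs_in_decline;  // tolerated number of such epochs
};

// Returns true if another training epoch should be run under the
// assumed rule.
bool ToyNeedToKeepTraining(const ToyStoppingState &s) {
  const bool min_reached =
      s.min_epochs < 0 || s.epochs_completed >= s.min_epochs;
  const bool max_reached =
      s.max_epochs >= 0 && s.epochs_completed >= s.max_epochs;
  if (max_reached) return false;  // hard upper bound on epochs
  if (!min_reached) return true;  // always train the minimum number
  // Otherwise continue only while the decline budget is not exhausted.
  return s.num_epochs_in_decline < s.max_epochs_in_decline;
}
```

Under this rule the minimum epoch count overrides the decline budget, and the maximum epoch count overrides everything else.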
|
virtual |
Indicates whether the current model needs to be updated; the implementation here simply returns true if the best-scoring candidate is not equal to the gold or reference candidate.
example | the current training example |
Implements reranker::Model.
Definition at line 226 of file perceptron-model.C.
|
virtual |
Implements reranker::Model.
Definition at line 163 of file perceptron-model.C.
|
inline virtual |
Returns the spec string for constructing an instance of a ModelProtoReader capable of de-serializing this Model implementation.
Implements reranker::Model.
Definition at line 179 of file perceptron-model.H.
|
inline virtual |
Returns the spec string for constructing an instance of a ModelProtoWriter capable of serializing this Model implementation.
Implements reranker::Model.
Definition at line 184 of file perceptron-model.H.
|
virtual |
Registers several variables that may be initialized when this object is constructed via Factory::CreateOrDie.
Variable name | Type | Required | Description | Default value |
---|---|---|---|---|
name | string | Yes | The name of this model instance (for human consumption). | n/a |
score_comparator | Candidate::Comparator | No | The object by which the scores of two Candidate instances are compared. | DefaultScoreComparator |
gold_comparator | Candidate::Comparator | No | The object by which two Candidate instances are compared when finding the “gold” candidate. | DefaultGoldComparator |
candidate_set_scorer | CandidateSet::Scorer | No | The object to score a CandidateSet instance. | DefaultCandidateSetScorer |
update_predicate | Model::UpdatePredicate | No | The object to let the model know if it is time to do an update. | PerceptronModelDefaultUpdatePredicate |
updater | Model::Updater | No | The object to update the model. | PerceptronModelDefaultUpdater |
step_size | double | No | The initial value of the step size for parameter updates. | 1.0 |
Reimplemented from reranker::FactoryConstructible.
Reimplemented in reranker::MiraStyleModel.
Definition at line 70 of file perceptron-model.C.
|
virtual |
Scores a candidate according to either the raw or averaged version of this perceptron model.
The specified candidate's score may be modified.
[in,out] | candidate | the candidate to be scored by this model |
[in] | training | whether this is being called during training or evaluation of a model |
Implements reranker::Model.
Definition at line 359 of file perceptron-model.C.
|
virtual |
Scores the specified set of candidates according to either the raw or averaged version of this perceptron model, keeping track of which candidate has the highest score and which candidate has the lowest loss with the best score.
The scores of the specified set of candidates may be modified. This method is currently entirely implemented via DefaultCandidateSetScorer.
[in,out] | candidates | the set of candidates to be scored |
[in] | training | whether this is being called during training or evaluation of a model |
Implements reranker::Model.
Definition at line 354 of file perceptron-model.C.
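To make the scoring step more concrete, here is a self-contained sketch that scores toy candidates with a sparse dot product and records the best-scoring and lowest-loss candidates. Everything in it is hypothetical: `ToyCandidate`, `ToyScoringResult` and `ScoreToyCandidates` are not ReFr types, the choice of raw weights during training and averaged weights otherwise is an assumption, and so is breaking ties in loss by the higher score.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; these are not ReFr types.
using SparseVector = std::unordered_map<int, double>;

struct ToyCandidate {
  SparseVector features;
  double loss = 0.0;
  double score = 0.0;
};

struct ToyScoringResult {
  std::size_t best_scoring_index = 0;  // candidate with the highest score
  std::size_t gold_index = 0;          // candidate with the lowest loss
};

// Sparse dot product: sums weight * value over the features present.
double DotProduct(const SparseVector &weights, const SparseVector &features) {
  double sum = 0.0;
  for (const auto &entry : features) {
    auto it = weights.find(entry.first);
    if (it != weights.end()) sum += it->second * entry.second;
  }
  return sum;
}

// Scores every candidate in place. For an empty vector both indices stay 0.
ToyScoringResult ScoreToyCandidates(const SparseVector &raw_weights,
                                    const SparseVector &averaged_weights,
                                    bool training,
                                    std::vector<ToyCandidate> &candidates) {
  // Assumption: raw weights while training, averaged weights otherwise.
  const SparseVector &weights = training ? raw_weights : averaged_weights;
  ToyScoringResult result;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    ToyCandidate &c = candidates[i];
    c.score = DotProduct(weights, c.features);
    if (c.score > candidates[result.best_scoring_index].score) {
      result.best_scoring_index = i;
    }
    const ToyCandidate &gold = candidates[result.gold_index];
    // Assumption: among equal-loss candidates, prefer the higher score.
    if (c.loss < gold.loss || (c.loss == gold.loss && c.score > gold.score)) {
      result.gold_index = i;
    }
  }
  return result;
}
```

Comparing the two returned indices is then enough to decide whether the model's top choice differs from the gold candidate.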
|
inline |
Sets the maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far).
Definition at line 347 of file perceptron-model.H.
|
inline protected |
Definition at line 356 of file perceptron-model.H.
|
virtual |
Trains this model on a collection of training examples, where each training example is a set of candidates.
examples | the set of training examples on which to train this model |
development_test | the set of held-out examples to use to evaluate the model after each epoch |
Implements reranker::Model.
Definition at line 88 of file perceptron-model.C.
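The interplay between per-epoch training, evaluation on held-out data, and the "epochs in decline" counter can be sketched as follows. This is not the ReFr implementation: the callbacks stand in for TrainOneEpoch and Evaluate, the names `ToyTrainingSummary` and `TrainWithEarlyStopping` are hypothetical, and the sketch assumes that a lower evaluation value is better and that the decline counter resets whenever a new best model is found.

```cpp
#include <functional>
#include <limits>

// Hypothetical summary of a training run (not a ReFr type).
struct ToyTrainingSummary {
  int best_epoch = -1;  // epoch index of the best model seen so far
  double best_loss = std::numeric_limits<double>::infinity();
  int epochs_run = 0;   // total number of epochs executed
};

// Runs epochs until max_epochs is reached or the held-out loss has failed
// to improve for max_epochs_in_decline consecutive epochs.
ToyTrainingSummary TrainWithEarlyStopping(
    const std::function<void()> &train_one_epoch,
    const std::function<double()> &evaluate,
    int max_epochs, int max_epochs_in_decline) {
  ToyTrainingSummary summary;
  int num_epochs_in_decline = 0;
  while (summary.epochs_run < max_epochs &&
         num_epochs_in_decline < max_epochs_in_decline) {
    train_one_epoch();
    const double loss = evaluate();
    if (loss < summary.best_loss) {
      // New best model: remember its epoch and reset the decline counter.
      summary.best_loss = loss;
      summary.best_epoch = summary.epochs_run;
      num_epochs_in_decline = 0;
    } else {
      ++num_epochs_in_decline;
    }
    ++summary.epochs_run;
  }
  return summary;
}
```

A real trainer would also keep a copy of the best weights at the point where the sketch records `best_epoch`, which is the role the page ascribes to best_models_.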
|
virtual |
Trains this model for one epoch, i.e., a single pass through the specified set of training examples.
Typically the Train method will be implemented in terms of this method.
examples | the set of training examples on which to train this model |
Implements reranker::Model.
Definition at line 176 of file perceptron-model.C.
|
virtual |
Trains this model on the specified training example.
example | the example to train on |
Implements reranker::Model.
Definition at line 203 of file perceptron-model.C.
|
virtual |
Updates the current model based on the specified set of candidates.
TrainOnExample will be implemented in terms of this method.
example | the current training example |
Implements reranker::Model.
Definition at line 237 of file perceptron-model.C.
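For orientation, the textbook perceptron update adds the gold candidate's features to the weight vector and subtracts those of the best-scoring candidate, each scaled by the step size. The sketch below shows that generic rule on sparse vectors. It is not the DefaultUpdater code, which, per the description of ComputeFeaturesToUpdate, operates on the set differences of the two feature sets; the names `SparseVector` and `ToyPerceptronUpdate` are hypothetical.

```cpp
#include <unordered_map>

// Hypothetical sparse vector type: feature uid -> value.
using SparseVector = std::unordered_map<int, double>;

// Generic perceptron update: w += step_size * (gold - best_scoring).
void ToyPerceptronUpdate(SparseVector &weights,
                         const SparseVector &gold_features,
                         const SparseVector &best_scoring_features,
                         double step_size) {
  // Reward features of the gold candidate.
  for (const auto &entry : gold_features) {
    weights[entry.first] += step_size * entry.second;
  }
  // Penalize features of the (incorrect) best-scoring candidate.
  for (const auto &entry : best_scoring_features) {
    weights[entry.first] -= step_size * entry.second;
  }
}
```

When a feature has the same value in both candidates, its two contributions cancel, which is why only features that differ between the two candidates end up changing the model.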
|
friend |
Definition at line 66 of file perceptron-model.H.
|
friend |
Definition at line 65 of file perceptron-model.H.
|
protected |
The epoch of the best models seen so far during training.
Definition at line 413 of file perceptron-model.H.
|
protected |
The best models seen so far during training, according to evaluation on the held-out development test data.
Definition at line 411 of file perceptron-model.H.
|
protected |
The maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far).
Definition at line 416 of file perceptron-model.H.
|
protected |
Definition at line 423 of file perceptron-model.H.
|
protected |
The feature vectors representing this model.
Definition at line 408 of file perceptron-model.H.
|
protected |
The current number of training epochs in which the model has been degrading in development set performance (i.e., has had more errors than the best model so far).
Definition at line 420 of file perceptron-model.H.
|
static protected |
A spec string for constructing a PerceptronModelProtoReader, which is capable of de-serializing an instance of this class.
Definition at line 428 of file perceptron-model.H.
|
static protected |
A spec string for constructing a PerceptronModelProtoWriter, which is capable of serializing an instance of this class.
Definition at line 432 of file perceptron-model.H.
|
protected |
The last value computed by the ComputeStepSize method.
Definition at line 422 of file perceptron-model.H.