Reranker Framework (ReFr)
Reranking framework for structure prediction and discriminative language modeling
reranker::PerceptronModel Class Reference

This class implements a perceptron model reranker. More...

#include <perceptron-model.H>

Inheritance diagram for reranker::PerceptronModel:
Base classes: reranker::Model, reranker::FactoryConstructible. Derived class: reranker::MiraStyleModel.

Classes

class  DefaultUpdatePredicate
 The default update predicate for perceptron and perceptron-style models, which indicates to do a model update whenever the top-scoring candidate hypothesis under the current model differs from the oracle or “gold” candidate hypothesis. More...
 
class  DefaultUpdater
 The default update function for perceptron models. More...
 

Public Member Functions

 PerceptronModel ()
 Constructs a new instance with the empty string for its name and the DotProduct kernel function. More...
 
 PerceptronModel (const string &name)
 Constructs a new perceptron model with a DotProduct kernel function. More...
 
 PerceptronModel (const string &name, KernelFunction *kernel_fn)
 Constructs a new perceptron model with the specified kernel function. More...
 
 PerceptronModel (const string &name, KernelFunction *kernel_fn, Symbols *symbols)
 Constructs a new perceptron model with the specified kernel function and symbol table. More...
 
virtual ~PerceptronModel ()
 Destroys this perceptron model and all its data members. More...
 
virtual const string & model_spec () const
 Returns the spec string for constructing a default instance of this model so it may be properly de-serialized by its ModelProtoReader. More...
 
virtual const string & proto_reader_spec () const
 Returns the spec string for constructing an instance of a ModelProtoReader capable of de-serializing this Model implementation. More...
 
virtual const string & proto_writer_spec () const
 Returns the spec string for constructing an instance of a ModelProtoWriter capable of serializing this Model implementation. More...
 
virtual int best_model_epoch () const
 Returns the epoch of the best models seen so far during training. More...
 
virtual void RegisterInitializers (Initializers &initializers)
 Registers several variables that may be initialized when this object is constructed via Factory::CreateOrDie. More...
 
virtual void Init (const Environment *env, const string &arg)
 Initializes this instance. More...
 
virtual bool NeedToKeepTraining ()
 Returns whether more training epochs are required for this model. More...
 
virtual void Train (CandidateSetIterator &examples, CandidateSetIterator &development_test)
 Trains this model on a collection of training examples, where each training example is a set of candidates. More...
 
virtual void NewEpoch ()
 
virtual void EndOfEpoch ()
 
virtual void TrainOneEpoch (CandidateSetIterator &examples)
 Trains this model for one epoch, i.e., a single pass through the specified set of training examples. More...
 
virtual void TrainOnExample (CandidateSet &example)
 Trains this model on the specified training example. More...
 
virtual bool NeedToUpdate (CandidateSet &example)
 Indicates whether the current model needs to be updated; the implementation here simply returns true if the best-scoring candidate is not equal to the gold or reference candidate. More...
 
virtual void Update (CandidateSet &example)
 Updates the current model based on the specified set of candidates. More...
 
virtual double Evaluate (CandidateSetIterator &development_test)
 Evaluates this model on the specified set of held-out development test data. More...
 
virtual void ScoreCandidates (CandidateSet &candidates, bool training)
 Scores the specified set of candidates according to either the raw or averaged version of this perceptron model, keeping track of which candidate has the highest score and which candidate has the lowest loss with the best score. More...
 
virtual double ScoreCandidate (Candidate &candidate, bool training)
 Scores a candidate according to either the raw or averaged version of this perceptron model. More...
 
virtual void CompactifyFeatureUids ()
 Renumbers the potentially sparse feature uid’s so that they occupy the interval [0,n-1] densely, for n non-zero features in use by this model. More...
 
void set_max_epochs_in_decline (int max_epochs_in_decline)
 Sets the maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far). More...
 
virtual const TrainingVectorSet & models () const
 Returns the set of models and statistics used by this PerceptronModel instance. More...
 
- Public Member Functions inherited from reranker::Model
 Model ()
 Constructs a new instance with the empty string for its name and a NULL kernel function. More...
 
 Model (const string &name)
 Constructs a new instance with a NULL kernel function. More...
 
 Model (const string &name, KernelFunction *kernel_fn)
 Constructs a new instance with the specified kernel function. More...
 
 Model (const string &name, KernelFunction *kernel_fn, Symbols *symbols)
 Constructs a new instance with the specified kernel function and symbol table. More...
 
virtual ~Model ()
 Destroys this model and its associated kernel function. More...
 
const string & name () const
 Returns the unique name for this model instance. More...
 
Symbols * symbols () const
 Returns the symbol table for this model. More...
 
const Time & time () const
 Returns the current training time of this model: number of epochs, number of time steps in the current epoch and total number of time steps (which is equal to the total number of training examples seen). More...
 
int num_updates () const
 Returns the number of updates made by this model. More...
 
const vector< int > & num_training_errors_per_epoch ()
 Returns the number of training errors made for each epoch. More...
 
int num_training_errors () const
 Returns the number of training errors made by this model. More...
 
int min_epochs () const
 Returns the minimum number of epochs to train. More...
 
int max_epochs () const
 Returns the maximum number of epochs to train. More...
 
const vector< double > & loss_per_epoch ()
 Returns the loss for each epoch of training that was evaluated. More...
 
virtual shared_ptr< Candidate::Comparator > score_comparator ()
 Returns a pointer to the score comparator used by this model. More...
 
virtual shared_ptr< Candidate::Comparator > gold_comparator ()
 Returns a pointer to the gold comparator used by this model. More...
 
virtual void set_min_epochs (int min_epochs)
 Sets the minimum number of epochs to train. More...
 
virtual void set_max_epochs (int max_epochs)
 Sets the maximum number of epochs to train. More...
 
virtual void set_end_of_epoch_hook (Hook *end_of_epoch_hook)
 
virtual bool use_weighted_loss ()
 
virtual void set_use_weighted_loss (bool use_weighted_loss)
 
virtual void set_symbols (Symbols *symbols)
 Sets the Symbols instance for this Model to be the specified instance. More...
 
- Public Member Functions inherited from reranker::FactoryConstructible
virtual ~FactoryConstructible ()
 

Protected Member Functions

void SetDefaultObjects ()
 
virtual void ComputeFeaturesToUpdate (const CandidateSet &example, unordered_set< int > &gold_features_to_update, unordered_set< int > &best_scoring_features_to_update) const
 Computes the features to be updated for the gold candidate and the best-scoring candidate. More...
 
virtual double ComputeStepSize (const unordered_set< int > &gold_features, const unordered_set< int > &best_scoring_features, const CandidateSet &example)
 Computes the step size for the next update, and, as a side effect, caches this value in step_size_. More...
 
- Protected Member Functions inherited from reranker::Model
void set_name (const string &name)
 Sets the name of this Model instance. More...
 
void set_kernel_fn (KernelFunction *kernel_fn)
 Sets the kernel function for this model. More...
 
void set_score_comparator (shared_ptr< Candidate::Comparator > score_comparator)
 
void set_gold_comparator (shared_ptr< Candidate::Comparator > gold_comparator)
 
void SetDefaultObjects ()
 
void SetDefaultComparators ()
 
void SetDefaultCandidateSetScorer ()
 
shared_ptr< Candidate::Comparator > GetComparator (const string &spec) const
 
shared_ptr< CandidateSet::Scorer > GetCandidateSetScorer (const string &spec) const
 
shared_ptr< UpdatePredicate > GetUpdatePredicate (const string &spec) const
 
shared_ptr< Updater > GetUpdater (const string &spec) const
 
virtual void CheckNumberOfTokens (const string &arg, const vector< string > &tokens, size_t min_expected_number, size_t max_expected_number, const string &class_name) const
 A helper method for implementing the Init method: throws a std::runtime_error if the number of tokens in the argument string is not the expected number. More...
 

Protected Attributes

TrainingVectorSet models_
 The feature vectors representing this model. More...
 
TrainingVectorSet best_models_
 The best models seen so far during training, according to evaluation on the held-out development test data. More...
 
int best_model_epoch_
 The epoch of the best models seen so far during training. More...
 
int max_epochs_in_decline_
 The maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far). More...
 
int num_epochs_in_decline_
 The current number of training epochs in which the model has been degrading in development set performance (i.e., has been having more errors than best model so far). More...
 
double step_size_
 The last value computed by the ComputeStepSize method. More...
 
string model_spec_
 
- Protected Attributes inherited from reranker::Model
string name_
 This model’s unique name. More...
 
Time time_
 The tiny object that holds the "training time" for this model (epoch, index and absolute time index). More...
 
KernelFunction * kernel_fn_
 Yes, this is an interface, but we add the kernel function as a data member. More...
 
Symbols * symbols_
 The symbol table for this model (may be NULL). More...
 
shared_ptr< Candidate::Comparator > score_comparator_
 A comparator to provide an ordering for candidates based on score when scoring all candidates in a set. More...
 
shared_ptr< Candidate::Comparator > gold_comparator_
 A comparator to provide an ordering for candidates to find the gold candidate in a set. More...
 
shared_ptr< CandidateSet::Scorer > candidate_set_scorer_
 A scorer for CandidateSet instances. More...
 
shared_ptr< UpdatePredicate > update_predicate_
 The update predicate for this model. More...
 
shared_ptr< Updater > updater_
 The updater for this model. More...
 
vector< double > loss_per_epoch_
 The average loss per epoch. More...
 
vector< int > num_testing_errors_per_epoch_
 The number of testing errors made on held-out development test data for each epoch. More...
 
vector< int > num_training_errors_per_epoch_
 The number of errors made on training examples during each epoch. More...
 
int num_training_errors_
 The number of errors made on training examples. More...
 
int num_updates_
 The number of times an update was performed on this model during training. More...
 
int min_epochs_
 The minimum number of training epochs to execute. More...
 
int max_epochs_
 The maximum number of training epochs to execute. More...
 
Hook * end_of_epoch_hook_
 A hook to be performed at the end of every epoch. More...
 
bool use_weighted_loss_
 Indicates whether this model should weight each candidate’s loss by the value returned by CandidateSet::loss_weight. More...
 

Static Protected Attributes

static string proto_reader_spec_
 A string that specifies to construct a PerceptronModelProtoReader, which is capable of de-serializing an instance of this class. More...
 
static string proto_writer_spec_
 A string that specifies to construct a PerceptronModelProtoWriter, which is capable of serializing an instance of this class. More...
 

Friends

class PerceptronModelProtoWriter
 
class PerceptronModelProtoReader
 

Detailed Description

This class implements a perceptron model reranker.

While this model can consist of arbitrary feature types, there is special handling for n-gram–based features, to capture the fact that, e.g., a bigram suffix exists whenever a trigram occurs.

Definition at line 63 of file perceptron-model.H.

Constructor & Destructor Documentation

reranker::PerceptronModel::PerceptronModel ( )
inline

Constructs a new instance with the empty string for its name and the DotProduct kernel function.

Definition at line 70 of file perceptron-model.H.

reranker::PerceptronModel::PerceptronModel ( const string &  name)
inline

Constructs a new perceptron model with a DotProduct kernel function.

Parameters
name: the unique name of this perceptron model instance

Definition at line 85 of file perceptron-model.H.

reranker::PerceptronModel::PerceptronModel ( const string &  name,
KernelFunction *  kernel_fn 
)
inline

Constructs a new perceptron model with the specified kernel function.

Parameters
name: the unique name of this perceptron model instance
kernel_fn: the kernel function for this model to use when evaluating on training or test instances

Definition at line 101 of file perceptron-model.H.

reranker::PerceptronModel::PerceptronModel ( const string &  name,
KernelFunction *  kernel_fn,
Symbols *  symbols 
)
inline

Constructs a new perceptron model with the specified kernel function and symbol table.

Parameters
name: the unique name of this model instance
kernel_fn: the kernel function for this model to use when applied to training or test instances
symbols: the symbol table for this Model to use; this Model will be responsible for deleting this Symbols object

Definition at line 120 of file perceptron-model.H.

virtual reranker::PerceptronModel::~PerceptronModel ( )
inline virtual

Destroys this perceptron model and all its data members.

Definition at line 134 of file perceptron-model.H.

Member Function Documentation

virtual int reranker::PerceptronModel::best_model_epoch ( ) const
inline virtual

Returns the epoch of the best models seen so far during training.

(Primarily here for the PerceptronModelProtoWriter serializer.)

Implements reranker::Model.

Definition at line 190 of file perceptron-model.H.

void reranker::PerceptronModel::CompactifyFeatureUids ( )
virtual

Renumbers the potentially sparse feature uid’s so that they occupy the interval [0,n-1] densely, for n non-zero features in use by this model.

If the internal Symbols instance is non-NULL, then this method also adjusts it to reflect the new set of feature uid’s.

Implements reranker::Model.

Definition at line 374 of file perceptron-model.C.
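
The renumbering described above can be illustrated with a minimal sketch. This is not the ReFr implementation: the functions DenseUidMap and Compactify are hypothetical, and the model is assumed, for illustration only, to be a single std::map from feature uid to weight.

```cpp
#include <map>
#include <unordered_map>

// Hypothetical sketch (not ReFr's implementation): builds a map from each
// old uid with a non-zero weight to a new uid in [0, n-1]. Iterating a
// std::map visits uids in increasing order, so the renumbering is
// deterministic and preserves the relative order of the old uids.
inline std::unordered_map<int, int> DenseUidMap(
    const std::map<int, double> &weights) {
  std::unordered_map<int, int> old_to_new;
  int next_uid = 0;
  for (const auto &entry : weights) {
    if (entry.second != 0.0) old_to_new[entry.first] = next_uid++;
  }
  return old_to_new;
}

// Applies the renumbering, dropping features whose weight is zero.
inline std::map<int, double> Compactify(const std::map<int, double> &weights) {
  const std::unordered_map<int, int> old_to_new = DenseUidMap(weights);
  std::map<int, double> compact;
  for (const auto &entry : weights) {
    auto it = old_to_new.find(entry.first);
    if (it != old_to_new.end()) compact[it->second] = entry.second;
  }
  return compact;
}
```

In this sketch, a symbol table would be adjusted by applying the same old-to-new map to its entries.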

void reranker::PerceptronModel::ComputeFeaturesToUpdate ( const CandidateSet &  example,
unordered_set< int > &  gold_features_to_update,
unordered_set< int > &  best_scoring_features_to_update 
) const
protected virtual

Computes the features to be updated for the gold candidate and the best-scoring candidate.

Let G be gold features and B be best-scoring features. For the perceptron, we want to update the set difference G\B positively and B\G negatively. These two set difference operations are computed by this method.

Attention
Neither of the two specified sets is cleared by this method.
Parameters
example: the candidate set from which to get the gold feature vector and the best-scoring candidate feature vector
[out] gold_features_to_update: a set in which to insert the uid's of all features in the gold candidate that are not in the best-scoring candidate
[out] best_scoring_features_to_update: a set in which to insert the uid's of all features in the best-scoring candidate that are not in the gold candidate

Definition at line 409 of file perceptron-model.C.
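
The two set differences G\B and B\G can be sketched as follows. The helper SetDifferences is hypothetical and operates directly on sets of feature uid's rather than on a CandidateSet; like the documented method, it does not clear its output sets.

```cpp
#include <unordered_set>

// Hypothetical helper (not part of ReFr): inserts into `gold_only` every uid
// in `gold` that is absent from `best`, and into `best_only` every uid in
// `best` that is absent from `gold`. Neither output set is cleared first.
inline void SetDifferences(const std::unordered_set<int> &gold,
                           const std::unordered_set<int> &best,
                           std::unordered_set<int> &gold_only,
                           std::unordered_set<int> &best_only) {
  for (int uid : gold) {
    if (best.count(uid) == 0) gold_only.insert(uid);  // G \ B
  }
  for (int uid : best) {
    if (gold.count(uid) == 0) best_only.insert(uid);  // B \ G
  }
}
```

Features present in both candidates land in neither output set, so an update driven by these sets leaves them untouched.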

virtual double reranker::PerceptronModel::ComputeStepSize ( const unordered_set< int > &  gold_features,
const unordered_set< int > &  best_scoring_features,
const CandidateSet &  example 
)
inline protected virtual

Computes the step size for the next update, and, as a side effect, caches this value in step_size_.

In the case of the standard perceptron model implemented here, the step size does not change, and so this method simply returns the step size value set at construction time.

Reimplemented in reranker::MiraStyleModel.

Definition at line 389 of file perceptron-model.H.

void reranker::PerceptronModel::EndOfEpoch ( )
virtual

Implements reranker::Model.

Definition at line 185 of file perceptron-model.C.

double reranker::PerceptronModel::Evaluate ( CandidateSetIterator &  development_test)
virtual

Evaluates this model on the specified set of held-out development test data.

Side effects:
This method is guaranteed to append a value to the internal loss_per_epoch_ vector.
Parameters
development_test: a held-out set of examples to use for evaluation of this model (during training, this method is typically invoked after each epoch to determine when to stop)
Returns
the loss of this model when evaluated on the specified development test set

Implements reranker::Model.

Definition at line 288 of file perceptron-model.C.

void reranker::PerceptronModel::Init ( const Environment *  env,
const string &  arg 
)
virtual

Initializes this instance.

This method is guaranteed to be invoked by a Factory just after construction.

Reimplemented from reranker::FactoryConstructible.

Definition at line 82 of file perceptron-model.C.

virtual const string& reranker::PerceptronModel::model_spec ( ) const
inline virtual

Returns the spec string for constructing a default instance of this model so it may be properly de-serialized by its ModelProtoReader.

Implements reranker::Model.

Definition at line 176 of file perceptron-model.H.

virtual const TrainingVectorSet& reranker::PerceptronModel::models ( ) const
inline virtual

Returns the set of models and statistics used by this PerceptronModel instance.

Definition at line 353 of file perceptron-model.H.

bool reranker::PerceptronModel::NeedToKeepTraining ( )
virtual

Returns whether more training epochs are required for this model.

Implementation advice:
Implementations of the Train method are strongly encouraged to have their main loop test be the return value of this method. Also, the return value of this method should respect the min_epochs and max_epochs values.

Implements reranker::Model.

Definition at line 115 of file perceptron-model.C.
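
One way a stopping test could respect min_epochs, max_epochs and max_epochs_in_decline is sketched below. The struct TrainingState and the function NeedToKeepTrainingSketch are hypothetical, and the precedence of the three limits (including treating a negative max_epochs as "no limit") is an assumption, not a description of the actual implementation.

```cpp
// Hypothetical stand-in for the training state consulted by the stopping test.
struct TrainingState {
  int epochs_completed;
  int min_epochs;             // always train at least this many epochs
  int max_epochs;             // never train more than this many (if >= 0)
  int num_epochs_in_decline;  // epochs spent worse than the best model so far
  int max_epochs_in_decline;  // tolerance for such epochs
};

// Returns true if another training epoch should be run (assumed precedence:
// the minimum first, then the maximum, then the decline tolerance).
inline bool NeedToKeepTrainingSketch(const TrainingState &s) {
  if (s.epochs_completed < s.min_epochs) return true;
  if (s.max_epochs >= 0 && s.epochs_completed >= s.max_epochs) return false;
  return s.num_epochs_in_decline < s.max_epochs_in_decline;
}
```

A Train loop following the implementation advice above would then take the form `while (NeedToKeepTraining()) { ... }`.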

bool reranker::PerceptronModel::NeedToUpdate ( CandidateSet &  example)
virtual

Indicates whether the current model needs to be updated; the implementation here simply returns true if the best-scoring candidate is not equal to the gold or reference candidate.

Parameters
example: the current training example

Implements reranker::Model.

Definition at line 226 of file perceptron-model.C.

void reranker::PerceptronModel::NewEpoch ( )
virtual

Implements reranker::Model.

Definition at line 163 of file perceptron-model.C.

virtual const string& reranker::PerceptronModel::proto_reader_spec ( ) const
inline virtual

Returns the spec string for constructing an instance of a ModelProtoReader capable of de-serializing this Model implementation.

Implements reranker::Model.

Definition at line 179 of file perceptron-model.H.

virtual const string& reranker::PerceptronModel::proto_writer_spec ( ) const
inline virtual

Returns the spec string for constructing an instance of a ModelProtoWriter capable of serializing this Model implementation.

Implements reranker::Model.

Definition at line 184 of file perceptron-model.H.

void reranker::PerceptronModel::RegisterInitializers ( Initializers &  initializers)
virtual

Registers several variables that may be initialized when this object is constructed via Factory::CreateOrDie.

Variable name | Type | Required | Description | Default value
name | string | Yes | The name of this model instance (for human consumption). | n/a
score_comparator | Candidate::Comparator | No | The object by which the scores of two Candidate instances are compared. | DefaultScoreComparator
gold_comparator | Candidate::Comparator | No | The object by which two Candidate instances are compared when finding the “gold” candidate. | DefaultGoldComparator
candidate_set_scorer | CandidateSet::Scorer | No | The object to score a CandidateSet instance. | DefaultCandidateSetScorer
update_predicate | Model::UpdatePredicate | No | The object to let the model know if it is time to do an update. | PerceptronModelDefaultUpdatePredicate
updater | Model::Updater | No | The object to update the model. | PerceptronModelDefaultUpdater
step_size | double | No | The initial value of the step size for parameter updates. | 1.0

Reimplemented from reranker::FactoryConstructible.

Reimplemented in reranker::MiraStyleModel.

Definition at line 70 of file perceptron-model.C.

double reranker::PerceptronModel::ScoreCandidate ( Candidate &  candidate,
bool  training 
)
virtual

Scores a candidate according to either the raw or averaged version of this perceptron model.

The specified candidate's score may be modified.

Parameters
[in,out] candidate: the candidate to be scored by this model
training: whether this is being called during training or evaluation of a model
Returns
the score of the specified candidate according to the specified model (also contained in the candidate itself)

Implements reranker::Model.

Definition at line 359 of file perceptron-model.C.
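
The choice between the raw and averaged models can be sketched as below, assuming the DotProduct kernel and assuming that the raw weights are used during training and the averaged weights during evaluation. The types ToyModels and SparseVector and the functions Dot and ScoreSketch are hypothetical; they do not reflect the layout of TrainingVectorSet.

```cpp
#include <unordered_map>

// Sparse vector keyed by feature uid (a toy stand-in, not a ReFr type).
using SparseVector = std::unordered_map<int, double>;

// Dot product of two sparse vectors; uid's missing from `w` contribute 0.
inline double Dot(const SparseVector &w, const SparseVector &x) {
  double sum = 0.0;
  for (const auto &f : x) {
    auto it = w.find(f.first);
    if (it != w.end()) sum += it->second * f.second;
  }
  return sum;
}

// Hypothetical pair of weight vectors kept by an averaged perceptron.
struct ToyModels {
  SparseVector raw;       // weights as of the most recent update
  SparseVector averaged;  // average of the raw weights over time
};

// Assumption: raw weights score candidates during training, averaged
// weights score them during evaluation.
inline double ScoreSketch(const ToyModels &models,
                          const SparseVector &features, bool training) {
  return Dot(training ? models.raw : models.averaged, features);
}
```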

void reranker::PerceptronModel::ScoreCandidates ( CandidateSet &  candidates,
bool  training 
)
virtual

Scores the specified set of candidates according to either the raw or averaged version of this perceptron model, keeping track of which candidate has the highest score and which candidate has the lowest loss with the best score.

The scores of the specified set of candidates may be modified. This method is currently entirely implemented via DefaultCandidateSetScorer.

Parameters
[in,out] candidates: the set of candidates to be scored
training: whether this is being called during training or evaluation of a model
See Also
DefaultCandidateSetScorer::Score

Implements reranker::Model.

Definition at line 354 of file perceptron-model.C.

void reranker::PerceptronModel::set_max_epochs_in_decline ( int  max_epochs_in_decline)
inline

Sets the maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far).

Definition at line 347 of file perceptron-model.H.

void reranker::PerceptronModel::SetDefaultObjects ( )
inline protected

Definition at line 356 of file perceptron-model.H.

void reranker::PerceptronModel::Train ( CandidateSetIterator &  examples,
CandidateSetIterator &  development_test 
)
virtual

Trains this model on a collection of training examples, where each training example is a set of candidates.

Attention
This method is implemented in terms of the TrainOnExample method. Thus, for mistake-driven learning methods similar to the perceptron, one need only derive a class from this one and override TrainOnExample.
Parameters
examples: the set of training examples on which to train this model
development_test: the set of held-out examples to use to evaluate the model after each epoch

Implements reranker::Model.

Definition at line 88 of file perceptron-model.C.
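
The mistake-driven structure described above (an epoch loop calling TrainOnExample, which updates only when the best-scoring candidate differs from the gold candidate) can be sketched with toy types. Everything here is hypothetical: ToyCandidate, ToyCandidateSet and ToyPerceptron are stand-ins, the gold candidate is assumed to be the one with the lowest loss, and the update is the textbook perceptron rule over all features rather than the set-difference update computed by ComputeFeaturesToUpdate. Candidate sets are assumed to be non-empty.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

using FeatureVector = std::unordered_map<int, double>;

// Toy stand-in for a candidate hypothesis.
struct ToyCandidate {
  FeatureVector features;
  double loss;  // assumed: the gold candidate has the lowest loss
};
using ToyCandidateSet = std::vector<ToyCandidate>;

struct ToyPerceptron {
  FeatureVector weights;
  double step_size = 1.0;
  int num_updates = 0;

  double Score(const ToyCandidate &c) const {
    double score = 0.0;
    for (const auto &f : c.features) {
      auto it = weights.find(f.first);
      if (it != weights.end()) score += it->second * f.second;
    }
    return score;
  }

  // Index of the highest-scoring candidate; the first one wins ties.
  std::size_t Best(const ToyCandidateSet &set) const {
    std::size_t best = 0;
    for (std::size_t i = 1; i < set.size(); ++i) {
      if (Score(set[i]) > Score(set[best])) best = i;
    }
    return best;
  }

  // Index of the lowest-loss candidate; the first one wins ties.
  std::size_t Gold(const ToyCandidateSet &set) const {
    std::size_t gold = 0;
    for (std::size_t i = 1; i < set.size(); ++i) {
      if (set[i].loss < set[gold].loss) gold = i;
    }
    return gold;
  }

  // Updates only on a mistake: promote gold features, demote best-scoring ones.
  void TrainOnExample(const ToyCandidateSet &set) {
    const std::size_t best = Best(set);
    const std::size_t gold = Gold(set);
    if (best == gold) return;  // no update needed
    for (const auto &f : set[gold].features) {
      weights[f.first] += step_size * f.second;
    }
    for (const auto &f : set[best].features) {
      weights[f.first] -= step_size * f.second;
    }
    ++num_updates;
  }

  // A single pass through the training examples.
  void TrainOneEpoch(const std::vector<ToyCandidateSet> &examples) {
    for (const auto &set : examples) TrainOnExample(set);
  }
};
```

Under this structure, a variant learner only has to replace TrainOnExample; the epoch loop stays the same.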

void reranker::PerceptronModel::TrainOneEpoch ( CandidateSetIterator &  examples)
virtual

Trains this model for one epoch, i.e., a single pass through the specified set of training examples.

Typically the Train method will be implemented in terms of this method.

Parameters
examples: the set of training examples on which to train this model

Implements reranker::Model.

Definition at line 176 of file perceptron-model.C.

void reranker::PerceptronModel::TrainOnExample ( CandidateSet &  example)
virtual

Trains this model on the specified training example.

Parameters
example: the example to train on

Implements reranker::Model.

Definition at line 203 of file perceptron-model.C.

void reranker::PerceptronModel::Update ( CandidateSet &  example)
virtual

Updates the current model based on the specified set of candidates.

TrainOnExample will be implemented in terms of this method.

Parameters
example: the current training example

Implements reranker::Model.

Definition at line 237 of file perceptron-model.C.

Friends And Related Function Documentation

friend class PerceptronModelProtoReader
friend

Definition at line 66 of file perceptron-model.H.

friend class PerceptronModelProtoWriter
friend

Definition at line 65 of file perceptron-model.H.

Member Data Documentation

int reranker::PerceptronModel::best_model_epoch_
protected

The epoch of the best models seen so far during training.

Definition at line 413 of file perceptron-model.H.

TrainingVectorSet reranker::PerceptronModel::best_models_
protected

The best models seen so far during training, according to evaluation on the held-out development test data.

Definition at line 411 of file perceptron-model.H.

int reranker::PerceptronModel::max_epochs_in_decline_
protected

The maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far).

Definition at line 416 of file perceptron-model.H.

string reranker::PerceptronModel::model_spec_
protected

Definition at line 423 of file perceptron-model.H.

TrainingVectorSet reranker::PerceptronModel::models_
protected

The feature vectors representing this model.

Definition at line 408 of file perceptron-model.H.

int reranker::PerceptronModel::num_epochs_in_decline_
protected

The current number of training epochs in which the model has been degrading in development set performance (i.e., has been having more errors than best model so far).

Definition at line 420 of file perceptron-model.H.

string reranker::PerceptronModel::proto_reader_spec_
static protected

A string that specifies to construct a PerceptronModelProtoReader, which is capable of de-serializing an instance of this class.

Definition at line 428 of file perceptron-model.H.

string reranker::PerceptronModel::proto_writer_spec_
static protected

A string that specifies to construct a PerceptronModelProtoWriter, which is capable of serializing an instance of this class.

Definition at line 432 of file perceptron-model.H.

double reranker::PerceptronModel::step_size_
protected

The last value computed by the ComputeStepSize method.

Definition at line 422 of file perceptron-model.H.


The documentation for this class was generated from the following files: