This class implements a perceptron model reranker. More...

#include <perceptron-model.H>

Classes
class	DefaultUpdatePredicate
	The default update predicate for perceptron and perceptron-style models, which triggers a model update whenever the top-scoring candidate hypothesis under the current model differs from the oracle or “gold” candidate hypothesis. More...

class	DefaultUpdater
	The default update function for perceptron models. More...

Public Member Functions
	PerceptronModel ()
	Constructs a new instance with the empty string for its name and the DotProduct kernel function. More...

	PerceptronModel (const string &name)
	Constructs a new perceptron model with a DotProduct kernel function. More...

	PerceptronModel (const string &name, KernelFunction *kernel_fn)
	Constructs a new perceptron model with the specified kernel function. More...

	PerceptronModel (const string &name, KernelFunction *kernel_fn, Symbols *symbols)
	Constructs a new perceptron model with the specified kernel function and symbol table. More...

virtual	~PerceptronModel ()
	Destroys this perceptron model and all its data members. More...

virtual const string &	model_spec () const
	Returns the spec string for constructing a default instance of this model so it may be properly de-serialized by its ModelProtoReader. More...

virtual const string &	proto_reader_spec () const
	Returns the spec string for constructing an instance of a ModelProtoReader capable of de-serializing this Model implementation. More...

virtual const string &	proto_writer_spec () const
	Returns the spec string for constructing an instance of a ModelProtoWriter capable of serializing this Model implementation. More...

virtual int	best_model_epoch () const
	Returns the epoch of the best models seen so far during training. More...

virtual void	RegisterInitializers (Initializers &initializers)
	Registers several variables that may be initialized when this object is constructed via Factory::CreateOrDie. More...

virtual void	Init (const Environment *env, const string &arg)
	Initializes this instance. More...

virtual bool	NeedToKeepTraining ()
	Returns whether more training epochs are required for this model. More...

virtual void	Train (CandidateSetIterator &examples, CandidateSetIterator &development_test)
	Trains this model on a collection of training examples, where each training example is a set of candidates. More...

virtual void	NewEpoch ()

virtual void	EndOfEpoch ()

virtual void	TrainOneEpoch (CandidateSetIterator &examples)
	Trains this model for one epoch, i.e., a single pass through the specified set of training examples. More...

virtual void	TrainOnExample (CandidateSet &example)
	Trains this model on the specified training example. More...

virtual bool	NeedToUpdate (CandidateSet &example)
	Indicates whether the current model needs to be updated; the implementation here simply returns true if the best-scoring candidate is not equal to the gold or reference candidate. More...

virtual void	Update (CandidateSet &example)
	Updates the current model based on the specified set of candidates. More...

virtual double	Evaluate (CandidateSetIterator &development_test)
	Evaluates this model on the specified set of held-out development test data. More...

virtual void	ScoreCandidates (CandidateSet &candidates, bool training)
	Scores the specified set of candidates according to either the raw or averaged version of this perceptron model, keeping track of which candidate has the highest score and which candidate has the lowest loss with the best score. More...

virtual double	ScoreCandidate (Candidate &candidate, bool training)
	Scores a candidate according to either the raw or averaged version of this perceptron model. More...

virtual void	CompactifyFeatureUids ()
	Renumbers the potentially sparse feature uid’s so that they occupy the interval `[0,n-1]` densely, for `n` non-zero features in use by this model. More...

void	set_max_epochs_in_decline (int max_epochs_in_decline)
	Sets the maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far). More...

virtual const TrainingVectorSet &	models () const
	Returns the set of models and statistics used by this PerceptronModel instance. More...

Public Member Functions inherited from reranker::Model
	Model ()
	Constructs a new instance with the empty string for its name and a NULL kernel function. More...

	Model (const string &name)
	Constructs a new instance with a NULL kernel function. More...

	Model (const string &name, KernelFunction *kernel_fn)
	Constructs a new instance with the specified kernel function. More...

	Model (const string &name, KernelFunction *kernel_fn, Symbols *symbols)
	Constructs a new instance with the specified kernel function and symbol table. More...

virtual	~Model ()
	Destroys this model and its associated kernel function. More...

const string &	name () const
	Returns the unique name for this model instance. More...

Symbols *	symbols () const
	Returns the symbol table for this model. More...

const Time &	time () const
	Returns the current training time of this model: number of epochs, number of time steps in the current epoch and total number of time steps (which is equal to the total number of training examples seen). More...

int	num_updates () const
	Returns the number of updates made by this model. More...

const vector< int > &	num_training_errors_per_epoch ()
	Returns the number of training errors made for each epoch. More...

int	num_training_errors () const
	Returns the number of training errors made by this model. More...

int	min_epochs () const
	Returns the minimum number of epochs to train. More...

int	max_epochs () const
	Returns the maximum number of epochs to train. More...

const vector< double > &	loss_per_epoch ()
	Returns the loss for each epoch of training that was evaluated. More...

virtual shared_ptr < Candidate::Comparator >	score_comparator ()
	Returns a pointer to the score comparator used by this model. More...

virtual shared_ptr < Candidate::Comparator >	gold_comparator ()
	Returns a pointer to the gold comparator used by this model. More...

virtual void	set_min_epochs (int min_epochs)
	Sets the minimum number of epochs to train. More...

virtual void	set_max_epochs (int max_epochs)
	Sets the maximum number of epochs to train. More...

virtual void	set_end_of_epoch_hook (Hook *end_of_epoch_hook)

virtual bool	use_weighted_loss ()

virtual void	set_use_weighted_loss (bool use_weighted_loss)

virtual void	set_symbols (Symbols *symbols)
	Sets the Symbols instance for this Model to be the specified instance. More...

Public Member Functions inherited from reranker::FactoryConstructible
virtual	~FactoryConstructible ()

Protected Member Functions
void	SetDefaultObjects ()

virtual void	ComputeFeaturesToUpdate (const CandidateSet &example, unordered_set< int > &gold_features_to_update, unordered_set< int > &best_scoring_features_to_update) const
	Computes the features to be updated for the gold candidate and the best-scoring candidate. More...

virtual double	ComputeStepSize (const unordered_set< int > &gold_features, const unordered_set< int > &best_scoring_features, const CandidateSet &example)
	Computes the step size for the next update, and, as a side effect, caches this value in step_size_. More...

Protected Member Functions inherited from reranker::Model
void	set_name (const string &name)
	Sets the name of this Model instance. More...

void	set_kernel_fn (KernelFunction *kernel_fn)
	Sets the kernel function for this model. More...

void	set_score_comparator (shared_ptr< Candidate::Comparator > score_comparator)

void	set_gold_comparator (shared_ptr< Candidate::Comparator > gold_comparator)

void	SetDefaultObjects ()

void	SetDefaultComparators ()

void	SetDefaultCandidateSetScorer ()

shared_ptr< Candidate::Comparator >	GetComparator (const string &spec) const

shared_ptr< CandidateSet::Scorer >	GetCandidateSetScorer (const string &spec) const

shared_ptr< UpdatePredicate >	GetUpdatePredicate (const string &spec) const

shared_ptr< Updater >	GetUpdater (const string &spec) const

virtual void	CheckNumberOfTokens (const string &arg, const vector< string > &tokens, size_t min_expected_number, size_t max_expected_number, const string &class_name) const
	A helper method for implementing the Init method: throws a std::runtime_error if the number of tokens in the argument string is not the expected number. More...

Protected Attributes
TrainingVectorSet	models_
	The feature vectors representing this model. More...

TrainingVectorSet	best_models_
	The best models seen so far during training, according to evaluation on the held-out development test data. More...

int	best_model_epoch_
	The epoch of the best models seen so far during training. More...

int	max_epochs_in_decline_
	The maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far). More...

int	num_epochs_in_decline_
	The current number of training epochs in which the model has been degrading in development set performance (i.e., has had more errors than the best model so far). More...

double	step_size_
	The last value computed by the ComputeStepSize method. More...

string	model_spec_

Protected Attributes inherited from reranker::Model
string	name_
	This model’s unique name. More...

Time	time_
	The tiny object that holds the "training time" for this model (epoch, index and absolute time index). More...

KernelFunction *	kernel_fn_
	Yes, this is an interface, but we add the kernel function as a data member. More...

Symbols *	symbols_
	The symbol table for this model (may be NULL). More...

shared_ptr< Candidate::Comparator >	score_comparator_
	A comparator to provide an ordering for candidates based on score when scoring all candidates in a set. More...

shared_ptr< Candidate::Comparator >	gold_comparator_
	A comparator to provide an ordering for candidates to find the gold candidate in a set. More...

shared_ptr< CandidateSet::Scorer >	candidate_set_scorer_
	A scorer for CandidateSet instances. More...

shared_ptr< UpdatePredicate >	update_predicate_
	The update predicate for this model. More...

shared_ptr< Updater >	updater_
	The updater for this model. More...

vector< double >	loss_per_epoch_
	The average loss per epoch. More...

vector< int >	num_testing_errors_per_epoch_
	The number of testing errors made on held-out development test data for each epoch. More...

vector< int >	num_training_errors_per_epoch_
	The number of errors made on training examples during each epoch. More...

int	num_training_errors_
	The number of errors made on training examples. More...

int	num_updates_
	The number of times an update was performed on this model during training. More...

int	min_epochs_
	The minimum number of training epochs to execute. More...

int	max_epochs_
	The maximum number of training epochs to execute. More...

Hook *	end_of_epoch_hook_
	A hook to be performed at the end of every epoch. More...

bool	use_weighted_loss_
	Indicates whether this model should weight each candidate’s loss by the value returned by CandidateSet::loss_weight. More...

Static Protected Attributes
static string	proto_reader_spec_
	The spec string for constructing a PerceptronModelProtoReader, which is capable of de-serializing an instance of this class. More...

static string	proto_writer_spec_
	The spec string for constructing a PerceptronModelProtoWriter, which is capable of serializing an instance of this class. More...

Friends
class	PerceptronModelProtoWriter

class	PerceptronModelProtoReader

Detailed Description

This class implements a perceptron model reranker.

While this model can consist of arbitrary feature types, there is special handling for n-gram–based features, to capture the fact that, e.g., a bigram suffix exists whenever a trigram occurs.

Definition at line 63 of file perceptron-model.H.

Constructor & Destructor Documentation

reranker::PerceptronModel::PerceptronModel ( )

inline

Constructs a new instance with the empty string for its name and the DotProduct kernel function.

Definition at line 70 of file perceptron-model.H.

reranker::PerceptronModel::PerceptronModel ( const string & name )

inline

Constructs a new perceptron model with a DotProduct kernel function.

Parameters

name	the unique name of this perceptron model instance

Definition at line 85 of file perceptron-model.H.

reranker::PerceptronModel::PerceptronModel	(	const string &	name,
		KernelFunction *	kernel_fn
	)

inline

Constructs a new perceptron model with the specified kernel function.

Parameters

name	the unique name of this perceptron model instance
kernel_fn	the kernel function for this model to use when evaluating on training or test instances

Definition at line 101 of file perceptron-model.H.

reranker::PerceptronModel::PerceptronModel	(	const string &	name,
		KernelFunction *	kernel_fn,
		Symbols *	symbols
	)

inline

Constructs a new perceptron model with the specified kernel function and symbol table.

Parameters

name	the unique name of this model instance
kernel_fn	the kernel function for this model to use when applied to training or test instances
symbols	the symbol table for this Model to use; this Model will be responsible for deleting this Symbols object

Definition at line 120 of file perceptron-model.H.

virtual reranker::PerceptronModel::~PerceptronModel ( )

inline, virtual

Destroys this perceptron model and all its data members.

Definition at line 134 of file perceptron-model.H.

Member Function Documentation

virtual int reranker::PerceptronModel::best_model_epoch ( ) const

inline, virtual

Returns the epoch of the best models seen so far during training.

(Primarily here for the PerceptronModelProtoWriter serializer.)

Implements reranker::Model.

Definition at line 190 of file perceptron-model.H.

void reranker::PerceptronModel::CompactifyFeatureUids ( )

virtual

Renumbers the potentially sparse feature uid’s so that they occupy the interval [0,n-1] densely, for n non-zero features in use by this model.

If the internal Symbols instance is non-NULL, then this method also adjusts it to reflect the new set of feature uid’s.

Implements reranker::Model.

Definition at line 374 of file perceptron-model.C.
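
The renumbering described above can be illustrated with a minimal sketch. It is not the library's implementation: `SparseWeights` and `CompactUids` are hypothetical names, and the sketch assumes the model's weights can be viewed as a map from feature uid to value.

```cpp
#include <map>
#include <unordered_map>

// Hypothetical stand-in for a sparse weight vector keyed by feature uid.
using SparseWeights = std::map<int, double>;

// Renumbers the uids of non-zero features to 0..n-1, preserving their
// relative order. Returns the old-uid -> new-uid mapping so that a symbol
// table could be adjusted in the same way.
std::unordered_map<int, int> CompactUids(SparseWeights &weights) {
  std::unordered_map<int, int> old_to_new;
  SparseWeights compacted;
  int next_uid = 0;
  for (const auto &entry : weights) {
    if (entry.second == 0.0) continue;  // zero-weight features are dropped
    old_to_new[entry.first] = next_uid;
    compacted[next_uid] = entry.second;
    ++next_uid;
  }
  weights.swap(compacted);
  return old_to_new;
}
```

Returning the mapping is a design choice of this sketch; it is what would let a caller rewrite a symbol table consistently with the new uids.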

void reranker::PerceptronModel::ComputeFeaturesToUpdate	(	const CandidateSet &	example,
		unordered_set< int > &	gold_features_to_update,
		unordered_set< int > &	best_scoring_features_to_update
	)		const

protected, virtual

Computes the features to be updated for the gold candidate and the best-scoring candidate.

Let G be gold features and B be best-scoring features. For the perceptron, we want to update the set difference G\B positively and B\G negatively. These two set difference operations are computed by this method.

Attention: Neither of the two specified sets is cleared by this method.

Parameters

	example	the candidate set from which to get the gold feature vector and the best-scoring candidate feature vector
[out]	gold_features_to_update	a set in which to insert the uid's of all features in the gold that are not in the best scoring candidate
[out]	best_scoring_features_to_update	a set in which to insert the uid's of all features in the best-scoring candidate that are not in the gold candidate

Definition at line 409 of file perceptron-model.C.
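
As an illustration of the two set differences G\B and B\G, here is a minimal sketch over plain sets of feature uids. The function name `SetDifferences` and its flat signature are assumptions made for the example; the real method reads the feature vectors from a CandidateSet.

```cpp
#include <unordered_set>

// Inserts into gold_only every uid in gold that is not in best (G\B), and
// into best_only every uid in best that is not in gold (B\G). As with the
// documented method, neither output set is cleared first.
void SetDifferences(const std::unordered_set<int> &gold,
                    const std::unordered_set<int> &best,
                    std::unordered_set<int> &gold_only,
                    std::unordered_set<int> &best_only) {
  for (int uid : gold) {
    if (best.count(uid) == 0) gold_only.insert(uid);
  }
  for (int uid : best) {
    if (gold.count(uid) == 0) best_only.insert(uid);
  }
}
```

Because the output sets are not cleared, a caller that reuses them across examples would need to clear them itself.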

virtual double reranker::PerceptronModel::ComputeStepSize	(	const unordered_set< int > &	gold_features,
		const unordered_set< int > &	best_scoring_features,
		const CandidateSet &	example
	)

inline, protected, virtual

Computes the step size for the next update, and, as a side effect, caches this value in step_size_.

In the case of the standard perceptron model implemented here, the step size does not change, and so this method simply returns the step size value set at construction time.

Reimplemented in reranker::MiraStyleModel.

Definition at line 389 of file perceptron-model.H.
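
The pattern of a fixed step size behind a virtual hook can be sketched as follows. Both classes are hypothetical miniatures written for this example; in particular, `ScaledStepModel` uses an invented scaling rule and does not describe what MiraStyleModel computes.

```cpp
// Hypothetical miniature of the pattern: a base class whose step size is
// fixed at construction, exposed through a virtual hook.
class FixedStepModel {
 public:
  explicit FixedStepModel(double step_size) : step_size_(step_size) {}
  virtual ~FixedStepModel() {}

  // Returns the step size for the next update; the cached value in
  // step_size_ is left unchanged by the fixed-step variant.
  virtual double ComputeStepSize(int num_differing_features) {
    (void)num_differing_features;  // unused here
    return step_size_;
  }

  double step_size() const { return step_size_; }

 protected:
  double step_size_;
};

// An invented variant that shrinks the step as more features differ, and
// caches the result in step_size_ as a side effect.
class ScaledStepModel : public FixedStepModel {
 public:
  explicit ScaledStepModel(double base) : FixedStepModel(base), base_(base) {}

  double ComputeStepSize(int num_differing_features) override {
    step_size_ = base_ / (1.0 + num_differing_features);
    return step_size_;
  }

 private:
  double base_;
};
```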

void reranker::PerceptronModel::EndOfEpoch ( )

virtual

Implements reranker::Model.

Definition at line 185 of file perceptron-model.C.

double reranker::PerceptronModel::Evaluate ( CandidateSetIterator & development_test )

virtual

Evaluates this model on the specified set of held-out development test data.

Side effects: This method is guaranteed to append a value to the internal loss_per_epoch_ vector.

Parameters

development_test	a held-out set of examples to use for evaluation of this model (during training, this method is typically invoked after each epoch to determine when to stop)

Returns: the loss of this model when evaluated on the specified development test set

Implements reranker::Model.

Definition at line 288 of file perceptron-model.C.

void reranker::PerceptronModel::Init	(	const Environment *	env,
		const string &	arg
	)

virtual

Initializes this instance.

This method is guaranteed to be invoked by a Factory just after construction.

Reimplemented from reranker::FactoryConstructible.

Definition at line 82 of file perceptron-model.C.

virtual const string& reranker::PerceptronModel::model_spec ( ) const

inline, virtual

Returns the spec string for constructing a default instance of this model so it may be properly de-serialized by its ModelProtoReader.

Implements reranker::Model.

Definition at line 176 of file perceptron-model.H.

virtual const TrainingVectorSet& reranker::PerceptronModel::models ( ) const

inline, virtual

Returns the set of models and statistics used by this PerceptronModel instance.

Definition at line 353 of file perceptron-model.H.

bool reranker::PerceptronModel::NeedToKeepTraining ( )

virtual

Returns whether more training epochs are required for this model.

Implementation advice: Implementations of the Train method are strongly encouraged to use the return value of this method as their main loop test. Also, the return value of this method should respect the min_epochs and max_epochs values.

Implements reranker::Model.

Definition at line 115 of file perceptron-model.C.
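
One way such a stopping test might combine the epoch bounds with the epochs-in-decline counter is sketched below. The struct, the convention that a negative bound means "unset", and the order of the checks are all assumptions of this example rather than documented behavior.

```cpp
// Hypothetical training-state record. The field names mirror documented
// members, but the stopping rule itself is an assumption.
struct TrainingState {
  int epochs_completed;
  int min_epochs;             // negative means "no minimum" (assumed)
  int max_epochs;             // negative means "no maximum" (assumed)
  int num_epochs_in_decline;
  int max_epochs_in_decline;
};

bool NeedToKeepTraining(const TrainingState &s) {
  // Never exceed the maximum number of epochs.
  if (s.max_epochs >= 0 && s.epochs_completed >= s.max_epochs) return false;
  // Always train for at least the minimum number of epochs.
  if (s.min_epochs >= 0 && s.epochs_completed < s.min_epochs) return true;
  // Otherwise, continue until the model has degraded for too many epochs.
  return s.num_epochs_in_decline < s.max_epochs_in_decline;
}
```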

bool reranker::PerceptronModel::NeedToUpdate ( CandidateSet & example )

virtual

Indicates whether the current model needs to be updated; the implementation here simply returns true if the best-scoring candidate is not equal to the gold or reference candidate.

Parameters

example	the current training example

Implements reranker::Model.

Definition at line 226 of file perceptron-model.C.

void reranker::PerceptronModel::NewEpoch ( )

virtual

Implements reranker::Model.

Definition at line 163 of file perceptron-model.C.

virtual const string& reranker::PerceptronModel::proto_reader_spec ( ) const

inline, virtual

Returns the spec string for constructing an instance of a ModelProtoReader capable of de-serializing this Model implementation.

Implements reranker::Model.

Definition at line 179 of file perceptron-model.H.

virtual const string& reranker::PerceptronModel::proto_writer_spec ( ) const

inline, virtual

Returns the spec string for constructing an instance of a ModelProtoWriter capable of serializing this Model implementation.

Implements reranker::Model.

Definition at line 184 of file perceptron-model.H.

void reranker::PerceptronModel::RegisterInitializers ( Initializers & initializers )

virtual

Registers several variables that may be initialized when this object is constructed via Factory::CreateOrDie.

Variable name	Type	Required	Description	Default value
`name`	`string`	Yes	The name of this model instance (for human consumption).	n/a
`score_comparator`	Candidate::Comparator	No	The object by which the scores of two Candidate instances are compared.	DefaultScoreComparator
`gold_comparator`	Candidate::Comparator	No	The object by which two Candidate instances are compared when finding the “gold” candidate.	DefaultGoldComparator
`candidate_set_scorer`	CandidateSet::Scorer	No	The object to score a CandidateSet instance.	DefaultCandidateSetScorer
`update_predicate`	Model::UpdatePredicate	No	The object to let the model know if it is time to do an update.	PerceptronModelDefaultUpdatePredicate
`updater`	Model::Updater	No	The object to update the model.	PerceptronModelDefaultUpdater
`step_size`	double	No	The initial value of the step size for parameter updates.	`1.0`

Reimplemented from reranker::FactoryConstructible.

Reimplemented in reranker::MiraStyleModel.

Definition at line 70 of file perceptron-model.C.

double reranker::PerceptronModel::ScoreCandidate	(	Candidate &	candidate,
		bool	training
	)

virtual

Scores a candidate according to either the raw or averaged version of this perceptron model.

The specified candidate's score may be modified.

Parameters

[in,out]	candidate	the candidate to be scored by this model
	training	whether this is being called during training or evaluation of a model

Returns: the score of the specified candidate according to the specified model (also contained in the candidate itself)

Implements reranker::Model.

Definition at line 359 of file perceptron-model.C.
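
A minimal sketch of scoring with either the raw or the averaged weights is given below, using a dot product over sparse feature vectors. The types and the choice of raw weights when `training` is true (and averaged weights otherwise) are assumptions of this example; the actual selection logic is not stated on this page.

```cpp
#include <unordered_map>

// Hypothetical sparse vector type: feature uid -> value.
using FeatureVector = std::unordered_map<int, double>;

// Sparse dot product; features absent from weights contribute zero.
double Dot(const FeatureVector &features, const FeatureVector &weights) {
  double sum = 0.0;
  for (const auto &f : features) {
    auto it = weights.find(f.first);
    if (it != weights.end()) sum += f.second * it->second;
  }
  return sum;
}

// Hypothetical stand-in for a candidate that stores its own score.
struct ToyCandidate {
  FeatureVector features;
  double score = 0.0;
};

// Scores the candidate, stores the score in the candidate, and returns it.
// Assumption: raw weights while training, averaged weights otherwise.
double ScoreCandidate(ToyCandidate &candidate, const FeatureVector &raw,
                      const FeatureVector &averaged, bool training) {
  candidate.score = Dot(candidate.features, training ? raw : averaged);
  return candidate.score;
}
```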

void reranker::PerceptronModel::ScoreCandidates	(	CandidateSet &	candidates,
		bool	training
	)

virtual

Scores the specified set of candidates according to either the raw or averaged version of this perceptron model, keeping track of which candidate has the highest score and which candidate has the lowest loss with the best score.

The scores of the specified set of candidates may be modified. This method is currently entirely implemented via DefaultCandidateSetScorer.

Parameters

[in,out]	candidates	the set of candidates to be scored
	training	whether this is being called during training or evaluation of a model

See Also: DefaultCandidateSetScorer::Score

Implements reranker::Model.

Definition at line 354 of file perceptron-model.C.
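
The bookkeeping described above, finding the highest-scoring candidate and the lowest-loss candidate in one pass, can be sketched as follows. The types are hypothetical, and the tie-breaking rule (among candidates with equal loss, prefer the higher score) is one reading of "the lowest loss with the best score", not a confirmed detail of DefaultCandidateSetScorer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical candidate that has already been scored.
struct ScoredCandidate {
  double score;
  double loss;
};

struct BestAndGold {
  std::size_t best_index;  // highest score
  std::size_t gold_index;  // lowest loss, ties broken by higher score
};

// Precondition: candidates is non-empty.
BestAndGold FindBestAndGold(const std::vector<ScoredCandidate> &candidates) {
  BestAndGold result{0, 0};
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    const ScoredCandidate &c = candidates[i];
    if (c.score > candidates[result.best_index].score) result.best_index = i;
    const ScoredCandidate &g = candidates[result.gold_index];
    if (c.loss < g.loss || (c.loss == g.loss && c.score > g.score)) {
      result.gold_index = i;
    }
  }
  return result;
}
```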

void reranker::PerceptronModel::set_max_epochs_in_decline ( int max_epochs_in_decline )

inline

Sets the maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far).

Definition at line 347 of file perceptron-model.H.

void reranker::PerceptronModel::SetDefaultObjects ( )

inline, protected

Definition at line 356 of file perceptron-model.H.

void reranker::PerceptronModel::Train	(	CandidateSetIterator &	examples,
		CandidateSetIterator &	development_test
	)

virtual

Trains this model on a collection of training examples, where each training example is a set of candidates.

Attention: This method is implemented in terms of the TrainOnExample method. Thus, for mistake-driven learning methods similar to the perceptron, one need only derive a class from this one and override TrainOnExample.

Parameters

examples	the set of training examples on which to train this model
development_test	the set of held-out examples to use to evaluate the model after each epoch

Implements reranker::Model.

Definition at line 88 of file perceptron-model.C.
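
The layering of Train over TrainOneEpoch, TrainOnExample, NeedToUpdate and Update can be sketched with a toy class. Everything here is hypothetical: the example type is reduced to a single flag, the loop runs for a fixed number of epochs, and evaluation on development data is omitted.

```cpp
#include <vector>

// Hypothetical toy example: true when the model's top-scoring candidate
// already matches the gold candidate.
struct ToyExample {
  bool best_is_gold;
};

class ToyTrainer {
 public:
  virtual ~ToyTrainer() {}

  // Runs a fixed number of epochs; a real model would instead loop on a
  // stopping test and evaluate on held-out data after each epoch.
  void Train(const std::vector<ToyExample> &examples, int num_epochs) {
    for (int epoch = 0; epoch < num_epochs; ++epoch) TrainOneEpoch(examples);
  }

  // A single pass through the training examples.
  void TrainOneEpoch(const std::vector<ToyExample> &examples) {
    for (const ToyExample &example : examples) TrainOnExample(example);
  }

  // The mistake-driven step: update only when the model is wrong.
  virtual void TrainOnExample(const ToyExample &example) {
    if (NeedToUpdate(example)) Update(example);
  }

  virtual bool NeedToUpdate(const ToyExample &example) {
    return !example.best_is_gold;
  }

  virtual void Update(const ToyExample &) { ++num_updates_; }

  int num_updates() const { return num_updates_; }

 private:
  int num_updates_ = 0;
};
```

A derived class in this sketch would override only TrainOnExample (or NeedToUpdate and Update) and inherit the epoch loop unchanged.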

void reranker::PerceptronModel::TrainOneEpoch ( CandidateSetIterator & examples )

virtual

Trains this model for one epoch, i.e., a single pass through the specified set of training examples.

Typically the Train method will be implemented in terms of this method.

Parameters

examples	the set of training examples on which to train this model

Implements reranker::Model.

Definition at line 176 of file perceptron-model.C.

void reranker::PerceptronModel::TrainOnExample ( CandidateSet & example )

virtual

Trains this model on the specified training example.

Parameters

example	the example to train on

Implements reranker::Model.

Definition at line 203 of file perceptron-model.C.

void reranker::PerceptronModel::Update ( CandidateSet & example )

virtual

Updates the current model based on the specified set of candidates.

TrainOnExample will be implemented in terms of this method.

Parameters

example	the current training example

Implements reranker::Model.

Definition at line 237 of file perceptron-model.C.
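
A simplified sketch of a perceptron-style update over two precomputed feature sets follows. It assumes binary indicator features, so that each weight moves by exactly the step size; `ApplyUpdate` and its signature are hypothetical, and the actual updater may scale by feature values.

```cpp
#include <unordered_map>
#include <unordered_set>

// Raises the weight of every feature found only in the gold candidate and
// lowers the weight of every feature found only in the best-scoring
// candidate, each by step_size. Missing weights start at zero.
void ApplyUpdate(const std::unordered_set<int> &gold_only,
                 const std::unordered_set<int> &best_only,
                 double step_size,
                 std::unordered_map<int, double> &weights) {
  for (int uid : gold_only) weights[uid] += step_size;
  for (int uid : best_only) weights[uid] -= step_size;
}
```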

Friends And Related Function Documentation

friend class PerceptronModelProtoReader

friend

Definition at line 66 of file perceptron-model.H.

friend class PerceptronModelProtoWriter

friend

Definition at line 65 of file perceptron-model.H.

Member Data Documentation

int reranker::PerceptronModel::best_model_epoch_

protected

The epoch of the best models seen so far during training.

Definition at line 413 of file perceptron-model.H.

TrainingVectorSet reranker::PerceptronModel::best_models_

protected

The best models seen so far during training, according to evaluation on the held-out development test data.

Definition at line 411 of file perceptron-model.H.

int reranker::PerceptronModel::max_epochs_in_decline_

protected

The maximum number of training epochs to keep training after the model starts to degrade (i.e., has more errors than the best model so far).

Definition at line 416 of file perceptron-model.H.

string reranker::PerceptronModel::model_spec_

protected

Definition at line 423 of file perceptron-model.H.

TrainingVectorSet reranker::PerceptronModel::models_

protected

The feature vectors representing this model.

Definition at line 408 of file perceptron-model.H.

int reranker::PerceptronModel::num_epochs_in_decline_

protected

The current number of training epochs in which the model has been degrading in development set performance (i.e., has had more errors than the best model so far).

Definition at line 420 of file perceptron-model.H.

string reranker::PerceptronModel::proto_reader_spec_

static, protected

The spec string for constructing a PerceptronModelProtoReader, which is capable of de-serializing an instance of this class.

Definition at line 428 of file perceptron-model.H.

string reranker::PerceptronModel::proto_writer_spec_

static, protected

The spec string for constructing a PerceptronModelProtoWriter, which is capable of serializing an instance of this class.

Definition at line 432 of file perceptron-model.H.

double reranker::PerceptronModel::step_size_

protected

The last value computed by the ComputeStepSize method.

Definition at line 422 of file perceptron-model.H.

The documentation for this class was generated from the following files:

perceptron-model.H
perceptron-model.C