4.1.7. Optimizers

4.1.7.1. Base Class

class Optimizer : primitiv::mixins::Nonmovable<Optimizer>

Abstract class for parameter optimizers.

Subclassed by primitiv::optimizers::AdaDelta, primitiv::optimizers::AdaGrad, primitiv::optimizers::Adam, primitiv::optimizers::MomentumSGD, primitiv::optimizers::RMSProp, primitiv::optimizers::SGD

Public Functions

void load(const std::string &path)

Loads configurations from a file.

Parameters
  • path: Path of the optimizer parameter file.

void save(const std::string &path) const

Saves current configurations to a file.

Parameters
  • path: Path of the file that will store optimizer parameters.
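
For example, optimizer settings can be persisted between training runs. A minimal sketch, assuming an SGD optimizer and a hypothetical file name:

    #include <primitiv/primitiv.h>

    int main() {
      primitiv::optimizers::SGD optimizer(0.1);
      optimizer.set_epoch(5);            // some state to persist
      optimizer.save("optimizer.conf");  // hypothetical file name

      primitiv::optimizers::SGD restored;
      restored.load("optimizer.conf");   // restores the saved configurations
      return 0;
    }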

std::uint32_t get_epoch() const

Retrieves current epoch.

Return
Current epoch.

void set_epoch(std::uint32_t epoch)

Sets current epoch.

Parameters
  • epoch: New epoch.

float get_learning_rate_scaling() const

Retrieves current learning rate scaling factor.

Return
The scaling factor.

void set_learning_rate_scaling(float scale)

Sets learning rate scaling factor.

Remark
Negative values cannot be set.
Parameters
  • scale: New scaling factor.
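
As an illustration, a simple step-decay schedule can be layered on top of this scaling factor. A sketch of a hypothetical helper that halves the effective learning rate every 10 epochs:

    #include <cstdint>
    #include <primitiv/primitiv.h>

    // Halves the effective learning rate every 10 epochs (illustrative schedule).
    void maybe_decay(primitiv::Optimizer &opt) {
      const std::uint32_t epoch = opt.get_epoch();
      if (epoch > 0 && epoch % 10 == 0) {
        opt.set_learning_rate_scaling(opt.get_learning_rate_scaling() * 0.5f);
      }
    }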

float get_weight_decay() const

Retrieves current L2 decay strength.

Return
Current L2 decay strength.

void set_weight_decay(float strength)

Sets L2 decay strength.

Remark
Negative values cannot be set.
Parameters
  • strength: New L2 decay strength, or 0 to disable L2 decay.

float get_gradient_clipping() const

Retrieves current gradient clipping threshold.

Return
Current gradient clipping threshold.

void set_gradient_clipping(float threshold)

Sets gradient clipping threshold.

Remark
Negative values cannot be set.
Parameters
  • threshold: New clipping threshold, or 0 to disable gradient clipping.
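
The weight decay and gradient clipping setters above are typically called once, right after construction. A sketch with illustrative values:

    #include <primitiv/primitiv.h>

    int main() {
      primitiv::optimizers::Adam opt;   // default hyperparameters
      opt.set_weight_decay(1e-6f);      // mild L2 regularization
      opt.set_gradient_clipping(5.0f);  // clip gradients above this threshold
      opt.set_weight_decay(0.0f);       // passing 0 disables L2 decay again
      return 0;
    }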

void add()

Does nothing. This overload serves as the sentinel for the other specialized add() overloads.

template <typename T, typename... Args>
void add(T &model_or_param, Args&... args)

Registers multiple parameters and models.

This function behaves like multiple add() calls with the same arguments in the same order. E.g., the following lines behave in the same way (except in the case of exceptions):

    add(a, b, c, d);
    add(a, b); add(c, d);
    add(a); add(b); add(c); add(d);

Parameters
  • model_or_param: First Model or Parameter object to register.
  • args: Remaining Model or Parameter objects to register.
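
A sketch of registering two parameters in a single call, assuming the Naive CPU device and the XavierUniform/Constant initializers; exact Parameter constructor signatures may vary between primitiv versions:

    #include <primitiv/primitiv.h>
    using namespace primitiv;

    int main() {
      devices::Naive dev;
      Device::set_default(dev);

      Parameter w({4, 2}, initializers::XavierUniform());
      Parameter b({2}, initializers::Constant(0));

      optimizers::SGD opt(0.1);
      opt.add(w, b);  // same effect as opt.add(w); opt.add(b);
      return 0;
    }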

void reset_gradients()

Resets all gradients of registered parameters.

void update()

Updates parameter values.
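
In a typical training loop, reset_gradients() and update() bracket each minibatch's gradient computation. A schematic sketch (the forward and backward passes are elided):

    #include <cstddef>
    #include <cstdint>
    #include <primitiv/primitiv.h>

    // Schematic training loop over an optimizer whose parameters were already
    // registered with add(); the forward/backward pass is elided.
    void train(primitiv::Optimizer &opt,
               std::size_t num_batches, std::uint32_t max_epochs) {
      for (std::uint32_t epoch = 0; epoch < max_epochs; ++epoch) {
        for (std::size_t batch = 0; batch < num_batches; ++batch) {
          opt.reset_gradients();
          // ... build the graph, compute the loss, run the backward pass ...
          opt.update();  // apply accumulated gradients to all registered parameters
        }
      }
    }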

virtual void get_configs(std::unordered_map<std::string, std::uint32_t> &uint_configs, std::unordered_map<std::string, float> &float_configs) const

Gathers configuration values.

Parameters
  • uint_configs: Configurations with std::uint32_t type.
  • float_configs: Configurations with float type.

virtual void set_configs(const std::unordered_map<std::string, std::uint32_t> &uint_configs, const std::unordered_map<std::string, float> &float_configs)

Sets configuration values.

Parameters
  • uint_configs: Configurations with std::uint32_t type.
  • float_configs: Configurations with float type.
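
These two calls expose the optimizer's state as plain maps, which is convenient for logging or for copying settings between optimizers. A short sketch:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <primitiv/primitiv.h>

    int main() {
      primitiv::optimizers::Adam opt;
      std::unordered_map<std::string, std::uint32_t> uint_configs;
      std::unordered_map<std::string, float> float_configs;
      opt.get_configs(uint_configs, float_configs);
      for (const auto &kv : float_configs) {
        std::cout << kv.first << " = " << kv.second << std::endl;
      }
      return 0;
    }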

4.1.7.2. Inherited Classes

class SGD : public primitiv::Optimizer

Simple stochastic gradient descent.

Public Functions

SGD(float eta = 0.1)

Creates a new SGD object.

Parameters
  • eta: Learning rate.
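
For reference, eta corresponds to the step size in the plain SGD update (a standard formulation; the base-class learning rate scaling, weight decay, and gradient clipping are applied separately as configured):

    \theta_{t+1} = \theta_t - \eta \, g_t

where g_t is the gradient of the loss with respect to \theta_t.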

float eta() const

Returns the learning rate.

Return
Learning rate.

class MomentumSGD : public primitiv::Optimizer

Stochastic gradient descent with momentum.

Public Functions

MomentumSGD(float eta = 0.01, float momentum = 0.9)

Creates a new MomentumSGD object.

Parameters
  • eta: Learning rate.
  • momentum: Decay factor of the momentum.
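
These hyperparameters correspond to the standard momentum formulation (a sketch; primitiv's exact implementation may differ in minor details, and the base-class settings are applied separately):

    v_{t+1} = \mathrm{momentum} \cdot v_t - \eta \, g_t
    \theta_{t+1} = \theta_t + v_{t+1}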

float eta() const

Returns the hyperparameter eta.

Return
The value of eta.

float momentum() const

Returns the hyperparameter momentum.

Return
The value of momentum.

class AdaGrad : public primitiv::Optimizer

AdaGrad optimizer.

Public Functions

AdaGrad(float eta = 0.001, float eps = 1e-8)

Creates a new AdaGrad object.

Parameters
  • eta: Learning rate.
  • eps: Bias of power (small constant added to the denominator for numerical stability).
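
In the usual AdaGrad formulation these hyperparameters enter as follows (a sketch with the squared-gradient accumulator G starting at zero; base-class settings omitted):

    G_t = G_{t-1} + g_t^2
    \theta_{t+1} = \theta_t - \eta \, g_t / (\sqrt{G_t} + \epsilon)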

float eta() const

Returns the hyperparameter eta.

Return
The value of eta.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.

class RMSProp : public primitiv::Optimizer

RMSProp optimizer.

Public Functions

RMSProp(float eta = 0.01, float alpha = 0.9, float eps = 1e-8)

Creates a new RMSProp object.

Parameters
  • eta: Learning rate.
  • alpha: Decay factor of the running average of squared gradients.
  • eps: Bias of power (small constant added to the denominator for numerical stability).
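
In the usual RMSProp formulation, alpha decays a running average of squared gradients (a sketch; base-class settings omitted):

    G_t = \alpha \, G_{t-1} + (1 - \alpha) \, g_t^2
    \theta_{t+1} = \theta_t - \eta \, g_t / (\sqrt{G_t} + \epsilon)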

float eta() const

Returns the hyperparameter eta.

Return
The value of eta.

float alpha() const

Returns the hyperparameter alpha.

Return
The value of alpha.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.

class AdaDelta : public primitiv::Optimizer

AdaDelta optimizer. https://arxiv.org/abs/1212.5701

Public Functions

AdaDelta(float rho = 0.95, float eps = 1e-6)

Creates a new AdaDelta object.

Parameters
  • rho: Decay factor of RMS operation.
  • eps: Bias of RMS values (small constant added inside the RMS terms for numerical stability).
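
As defined in the cited paper, rho and eps enter the update as:

    E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
    \Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
    E[\Delta\theta^2]_t = \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2
    \theta_{t+1} = \theta_t + \Delta\theta_t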

float rho() const

Returns the hyperparameter rho.

Return
The value of rho.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.

class Adam : public primitiv::Optimizer

Adam optimizer. https://arxiv.org/abs/1412.6980

Public Functions

Adam(float alpha = 0.001, float beta1 = 0.9, float beta2 = 0.999, float eps = 1e-8)

Creates a new Adam object.

Parameters
  • alpha: Learning rate.
  • beta1: Decay factor of momentum history.
  • beta2: Decay factor of power history.
  • eps: Bias of power (small constant added to the denominator for numerical stability).
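
As defined in the cited paper, these hyperparameters enter the update as (with t the step count):

    m_t = \beta_1 \, m_{t-1} + (1 - \beta_1) \, g_t
    v_t = \beta_2 \, v_{t-1} + (1 - \beta_2) \, g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_{t+1} = \theta_t - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)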

float alpha() const

Returns the hyperparameter alpha.

Return
The value of alpha.

float beta1() const

Returns the hyperparameter beta1.

Return
The value of beta1.

float beta2() const

Returns the hyperparameter beta2.

Return
The value of beta2.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.