4.1.7. Optimizers

4.1.7.1. Base Class

class Optimizer : primitiv::mixins::Nonmovable<Optimizer>

Abstract class for parameter optimizers.

Subclassed by primitiv::optimizers::AdaDelta, primitiv::optimizers::AdaGrad, primitiv::optimizers::Adam, primitiv::optimizers::MomentumSGD, primitiv::optimizers::RMSProp, primitiv::optimizers::SGD

Public Functions

void load(const std::string &path)

Loads configurations from a file.

Parameters
  • path: Path of the optimizer parameter file.

void save(const std::string &path) const

Saves current configurations to a file.

Parameters
  • path: Path of the file that will store optimizer parameters.
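
For example, optimizer settings can be persisted between training runs. A minimal sketch, assuming an SGD optimizer and a hypothetical file name:

    #include <primitiv/primitiv.h>

    int main() {
      primitiv::optimizers::SGD optimizer(0.1);
      optimizer.set_epoch(5);            // some state to persist
      optimizer.save("optimizer.conf");  // hypothetical file name

      primitiv::optimizers::SGD restored;
      restored.load("optimizer.conf");   // restores the saved configurations
      return 0;
    }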

std::uint32_t get_epoch() const

Retrieves current epoch.

Return
Current epoch.

void set_epoch(std::uint32_t epoch)

Sets current epoch.

Parameters
  • epoch: New epoch.

float get_learning_rate_scaling() const

Retrieves current learning rate scaling factor.

Return
The scaling factor.

void set_learning_rate_scaling(float scale)

Sets learning rate scaling factor.

Remark
Negative values cannot be set.
Parameters
  • scale: New scaling factor.
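
As an illustration, a simple step-decay schedule can be layered on top of this scaling factor. A sketch of a hypothetical helper that halves the effective learning rate every 10 epochs:

    #include <cstdint>
    #include <primitiv/primitiv.h>

    // Halves the effective learning rate every 10 epochs (illustrative schedule).
    void maybe_decay(primitiv::Optimizer &opt) {
      const std::uint32_t epoch = opt.get_epoch();
      if (epoch > 0 && epoch % 10 == 0) {
        opt.set_learning_rate_scaling(opt.get_learning_rate_scaling() * 0.5f);
      }
    }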

float get_weight_decay() const

Retrieves current L2 decay strength.

Return
Current L2 decay strength.

void set_weight_decay(float strength)

Sets L2 decay strength.

Remark
Negative values cannot be set.
Parameters
  • strength: New L2 decay strength, or 0 to disable L2 decay.

float get_gradient_clipping() const

Retrieves current gradient clipping threshold.

Return
Current gradient clipping threshold.

void set_gradient_clipping(float threshold)

Sets gradient clipping threshold.

Remark
Negative values cannot be set.
Parameters
  • threshold: New clipping threshold, or 0 to disable gradient clipping.
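
The weight decay and gradient clipping setters above are typically called once, right after construction. A sketch with illustrative values:

    #include <primitiv/primitiv.h>

    int main() {
      primitiv::optimizers::Adam opt;   // default hyperparameters
      opt.set_weight_decay(1e-6f);      // mild L2 regularization
      opt.set_gradient_clipping(5.0f);  // clip gradients above this threshold
      opt.set_weight_decay(0.0f);       // passing 0 disables L2 decay again
      return 0;
    }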

void add()

Does nothing. This overload serves as the sentinel for the other specialized add() overloads.

template <typename T, typename... Args>
void add(T &model_or_param, Args&... args)

Registers multiple parameters and models.

This function behaves like multiple add() calls with the same arguments in the same order. E.g., the following lines behave in the same way (except in the case of exceptions):

    add(a, b, c, d);
    add(a, b); add(c, d);
    add(a); add(b); add(c); add(d);

Parameters
  • model_or_param: First Model or Parameter object to register.
  • args: Remaining Model or Parameter objects to register.
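
A sketch of registering two parameters in a single call, assuming the Naive CPU device and the XavierUniform/Constant initializers; exact Parameter constructor signatures may vary between primitiv versions:

    #include <primitiv/primitiv.h>
    using namespace primitiv;

    int main() {
      devices::Naive dev;
      Device::set_default(dev);

      Parameter w({4, 2}, initializers::XavierUniform());
      Parameter b({2}, initializers::Constant(0));

      optimizers::SGD opt(0.1);
      opt.add(w, b);  // same effect as opt.add(w); opt.add(b);
      return 0;
    }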

void reset_gradients()

Resets all gradients of registered parameters.

void update()

Updates parameter values.
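
In a typical training loop, reset_gradients() and update() bracket each minibatch's gradient computation. A schematic sketch (the forward and backward passes are elided):

    #include <cstddef>
    #include <cstdint>
    #include <primitiv/primitiv.h>

    // Schematic training loop over an optimizer whose parameters were already
    // registered with add(); the forward/backward pass is elided.
    void train(primitiv::Optimizer &opt,
               std::size_t num_batches, std::uint32_t max_epochs) {
      for (std::uint32_t epoch = 0; epoch < max_epochs; ++epoch) {
        for (std::size_t batch = 0; batch < num_batches; ++batch) {
          opt.reset_gradients();
          // ... build the graph, compute the loss, run the backward pass ...
          opt.update();  // apply accumulated gradients to all registered parameters
        }
      }
    }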

virtual void get_configs(std::unordered_map<std::string, std::uint32_t> &uint_configs, std::unordered_map<std::string, float> &float_configs) const

Gathers configuration values.

Parameters
  • uint_configs: Configurations with std::uint32_t type.
  • float_configs: Configurations with float type.

virtual void set_configs(const std::unordered_map<std::string, std::uint32_t> &uint_configs, const std::unordered_map<std::string, float> &float_configs)

Sets configuration values.

Parameters
  • uint_configs: Configurations with std::uint32_t type.
  • float_configs: Configurations with float type.
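
These two calls expose the optimizer's state as plain maps, which is convenient for logging or for copying settings between optimizers. A short sketch:

    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <primitiv/primitiv.h>

    int main() {
      primitiv::optimizers::Adam opt;
      std::unordered_map<std::string, std::uint32_t> uint_configs;
      std::unordered_map<std::string, float> float_configs;
      opt.get_configs(uint_configs, float_configs);
      for (const auto &kv : float_configs) {
        std::cout << kv.first << " = " << kv.second << std::endl;
      }
      return 0;
    }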

4.1.7.2. Inherited Classes

class SGD : public primitiv::Optimizer

Simple stochastic gradient descent.

Public Functions

SGD(float eta = 0.1)

Creates a new SGD object.

Parameters
  • eta: Learning rate.
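
For reference, eta corresponds to the step size in the plain SGD update (a standard formulation; the base-class learning rate scaling, weight decay, and gradient clipping are applied separately as configured):

    \theta_{t+1} = \theta_t - \eta \, g_t

where g_t is the gradient of the loss with respect to \theta_t.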

float eta() const

Returns the learning rate.

Return
Learning rate.

class MomentumSGD : public primitiv::Optimizer

Stochastic gradient descent with momentum.

Public Functions

MomentumSGD(float eta = 0.01, float momentum = 0.9)

Creates a new MomentumSGD object.

Parameters
  • eta: Learning rate.
  • momentum: Decay factor of the momentum.
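
These hyperparameters correspond to the standard momentum formulation (a sketch; primitiv's exact implementation may differ in minor details, and the base-class settings are applied separately):

    v_{t+1} = \mathrm{momentum} \cdot v_t - \eta \, g_t
    \theta_{t+1} = \theta_t + v_{t+1}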

float eta() const

Returns the hyperparameter eta.

Return
The value of eta.

float momentum() const

Returns the hyperparameter momentum.

Return
The value of momentum.

class AdaGrad : public primitiv::Optimizer

AdaGrad optimizer.

Public Functions

AdaGrad(float eta = 0.001, float eps = 1e-8)

Creates a new AdaGrad object.

Parameters
  • eta: Learning rate.
  • eps: Bias of power (small constant added to the denominator for numerical stability).
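
In the usual AdaGrad formulation these hyperparameters enter as follows (a sketch with the squared-gradient accumulator G starting at zero; base-class settings omitted):

    G_t = G_{t-1} + g_t^2
    \theta_{t+1} = \theta_t - \eta \, g_t / (\sqrt{G_t} + \epsilon)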

float eta() const

Returns the hyperparameter eta.

Return
The value of eta.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.

class RMSProp : public primitiv::Optimizer

RMSProp optimizer.

Public Functions

RMSProp(float eta = 0.01, float alpha = 0.9, float eps = 1e-8)

Creates a new RMSProp object.

Parameters
  • eta: Learning rate.
  • alpha: Decay factor of the running average of squared gradients.
  • eps: Bias of power (small constant added to the denominator for numerical stability).
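
In the usual RMSProp formulation, alpha decays a running average of squared gradients (a sketch; base-class settings omitted):

    G_t = \alpha \, G_{t-1} + (1 - \alpha) \, g_t^2
    \theta_{t+1} = \theta_t - \eta \, g_t / (\sqrt{G_t} + \epsilon)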

float eta() const

Returns the hyperparameter eta.

Return
The value of eta.

float alpha() const

Returns the hyperparameter alpha.

Return
The value of alpha.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.

class AdaDelta : public primitiv::Optimizer

AdaDelta optimizer. https://arxiv.org/abs/1212.5701

Public Functions

AdaDelta(float rho = 0.95, float eps = 1e-6)

Creates a new AdaDelta object.

Parameters
  • rho: Decay factor of RMS operation.
  • eps: Bias of RMS values (small constant added inside the RMS terms for numerical stability).
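
As defined in the cited paper, rho and eps enter the update as:

    E[g^2]_t = \rho \, E[g^2]_{t-1} + (1 - \rho) \, g_t^2
    \Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \, g_t
    E[\Delta\theta^2]_t = \rho \, E[\Delta\theta^2]_{t-1} + (1 - \rho) \, \Delta\theta_t^2
    \theta_{t+1} = \theta_t + \Delta\theta_t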

float rho() const

Returns the hyperparameter rho.

Return
The value of rho.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.

class Adam : public primitiv::Optimizer

Adam optimizer. https://arxiv.org/abs/1412.6980

Public Functions

Adam(float alpha = 0.001, float beta1 = 0.9, float beta2 = 0.999, float eps = 1e-8)

Creates a new Adam object.

Parameters
  • alpha: Learning rate.
  • beta1: Decay factor of momentum history.
  • beta2: Decay factor of power history.
  • eps: Bias of power (small constant added to the denominator for numerical stability).
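
As defined in the cited paper, these hyperparameters enter the update as (with t the step count):

    m_t = \beta_1 \, m_{t-1} + (1 - \beta_1) \, g_t
    v_t = \beta_2 \, v_{t-1} + (1 - \beta_2) \, g_t^2
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_{t+1} = \theta_t - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)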

float alpha() const

Returns the hyperparameter alpha.

Return
The value of alpha.

float beta1() const

Returns the hyperparameter beta1.

Return
The value of beta1.

float beta2() const

Returns the hyperparameter beta2.

Return
The value of beta2.

float eps() const

Returns the hyperparameter eps.

Return
The value of eps.