lr_schedule
Classes
- LinearLRSchedule: Linear learning rate decay from initial_value to final_value.
- CosineAnnealingLRSchedule: Cosine annealing learning rate schedule.
- ExponentialLRSchedule: Exponential learning rate decay.
- CosineWarmupLRSchedule: Cosine annealing with linear warmup.
- PolynomialLRSchedule: Polynomial learning rate decay.
Module Contents
- class lr_schedule.BaseLRSchedule(cfg)
- Parameters:
cfg (dict | float)
- classmethod create(cfg)
- Parameters:
cfg (dict | float)
- Return type:
- func(progress_remaining)
Get the learning rate.
- Parameters:
progress_remaining (float)
- Return type:
float
- __call__(progress_remaining)
- Parameters:
progress_remaining (float)
- Return type:
float
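The schedules on this page take progress_remaining, which the subclass docstrings describe as 1.0 at the start and 0.0 at the end. A minimal sketch of how a training loop might derive that value from a step counter is shown here; the helper name progress_remaining_at and its arguments are hypothetical and not part of lr_schedule.

```python
def progress_remaining_at(step: int, total_steps: int) -> float:
    """Fraction of training left: 1.0 at step 0, 0.0 at step total_steps.

    Hypothetical helper for illustration; not part of lr_schedule.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp at 0.0 so steps past the end do not yield negative progress.
    return max(0.0, 1.0 - step / total_steps)


if __name__ == "__main__":
    for step in (0, 250, 500, 1000):
        print(step, progress_remaining_at(step, 1000))
```

The resulting value would then be passed to a schedule instance, for example schedule(progress_remaining), assuming the instance was built from a valid cfg.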
- class lr_schedule.LinearLRSchedule(cfg)
Bases:
BaseLRSchedule
Linear learning rate decay from initial_value to final_value. Standard linear annealing schedule.
- Parameters:
cfg (dict | float)
- func(progress_remaining)
Get the current learning rate depending on remaining progress. progress_remaining is 1.0 at the start and 0.0 at the end; the return value is the learning rate.
- Parameters:
progress_remaining (float)
- Return type:
float
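As an illustration of the linear decay described above, the following standalone sketch interpolates between the two endpoints. It assumes the usual linear form; the function name linear_lr and its explicit initial_value and final_value arguments are hypothetical, since the real class reads its settings from cfg.

```python
def linear_lr(progress_remaining: float, initial_value: float, final_value: float) -> float:
    """Linear interpolation: initial_value at 1.0, final_value at 0.0.

    Hypothetical standalone sketch; not the library's implementation.
    """
    return final_value + progress_remaining * (initial_value - final_value)


if __name__ == "__main__":
    for p in (1.0, 0.5, 0.0):
        print(p, linear_lr(p, initial_value=3e-4, final_value=1e-5))
```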
- class lr_schedule.CosineAnnealingLRSchedule(cfg)
Bases:
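BaseLRSchedule
Cosine annealing learning rate schedule. Smooth decay following a cosine curve; often works better than linear. Popular in modern deep learning (e.g., ResNet, Transformers).
Formula: lr = final_lr + 0.5 * (initial_lr - final_lr) * (1 + cos(π * progress)), where progress runs from 0 at the start to 1 at the end (the complement of progress_remaining), so lr starts at initial_lr and ends at final_lr.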
- Parameters:
cfg (dict | float)
- func(progress_remaining)
Cosine annealing from initial_value to final_value. progress_remaining is 1.0 at the start and 0.0 at the end; the return value is the learning rate.
- Parameters:
progress_remaining (float)
- Return type:
float
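The cosine formula above can be sketched as a standalone function. This assumes progress = 1 - progress_remaining, which is the reading under which the formula starts at the initial value and ends at the final value; the name cosine_lr and its explicit arguments are hypothetical.

```python
import math


def cosine_lr(progress_remaining: float, initial_value: float, final_value: float) -> float:
    """Cosine annealing from initial_value (at 1.0) to final_value (at 0.0).

    Hypothetical standalone sketch; not the library's implementation.
    """
    # Assumption: the formula's "progress" is the elapsed fraction of training.
    progress = 1.0 - progress_remaining
    return final_value + 0.5 * (initial_value - final_value) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for p in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(p, cosine_lr(p, initial_value=3e-4, final_value=1e-5))
```

Compared with linear decay, the curve is flat near both ends and steepest in the middle of training.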
- class lr_schedule.ExponentialLRSchedule(cfg)
Bases:
BaseLRSchedule
Exponential learning rate decay. Decays faster early, slower later.
Formula: lr = initial_lr * decay_rate^progress, where progress runs from 0 at the start to 1 at the end (the complement of progress_remaining).
- Parameters:
cfg (dict | float)
- decay_rate
- func(progress_remaining)
Exponential decay from initial_value to initial_value * decay_rate. progress_remaining is 1.0 at the start and 0.0 at the end; the return value is the learning rate.
- Parameters:
progress_remaining (float)
- Return type:
float
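A small sketch of the exponential formula makes the "faster early, slower later" behavior concrete. It assumes progress = 1 - progress_remaining and a decay_rate between 0 and 1; the function name exponential_lr and its explicit arguments are hypothetical.

```python
def exponential_lr(progress_remaining: float, initial_value: float, decay_rate: float) -> float:
    """Exponential decay from initial_value to initial_value * decay_rate.

    Hypothetical standalone sketch; not the library's implementation.
    """
    # Assumption: the formula's "progress" is the elapsed fraction of training.
    progress = 1.0 - progress_remaining
    return initial_value * decay_rate ** progress


if __name__ == "__main__":
    start = exponential_lr(1.0, 1.0, 0.01)
    middle = exponential_lr(0.5, 1.0, 0.01)
    end = exponential_lr(0.0, 1.0, 0.01)
    # The absolute drop in the first half is larger than in the second half.
    print("first half drop:", start - middle)
    print("second half drop:", middle - end)
```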
- class lr_schedule.CosineWarmupLRSchedule(cfg)
Bases:
BaseLRSchedule
Cosine annealing with linear warmup. Starts from a small LR, increases linearly to initial_value during warmup, then follows cosine annealing down to final_value.
Very popular in Transformer training (BERT, GPT, etc.).
- Parameters:
cfg (dict | float)
- func(progress_remaining)
Linear warmup followed by cosine annealing. progress_remaining is 1.0 at the start and 0.0 at the end; the return value is the learning rate.
- Parameters:
progress_remaining (float)
- Return type:
float
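The warmup-then-cosine shape described above might look like the following sketch. The page does not document how the warmup length or starting LR are configured, so warmup_fraction and warmup_start are hypothetical parameters chosen for illustration, as is the function name cosine_warmup_lr.

```python
import math


def cosine_warmup_lr(
    progress_remaining: float,
    initial_value: float,
    final_value: float,
    warmup_fraction: float = 0.1,  # hypothetical: share of training spent warming up
    warmup_start: float = 0.0,     # hypothetical: LR at the very first step
) -> float:
    """Linear warmup to initial_value, then cosine annealing to final_value.

    Hypothetical standalone sketch; not the library's implementation.
    """
    progress = 1.0 - progress_remaining  # elapsed fraction of training
    if warmup_fraction > 0.0 and progress < warmup_fraction:
        # Warmup phase: ramp linearly from warmup_start up to initial_value.
        return warmup_start + (initial_value - warmup_start) * (progress / warmup_fraction)
    # Annealing phase: rescale the remaining training to [0, 1].
    remaining = 1.0 - warmup_fraction
    cosine_progress = (progress - warmup_fraction) / remaining if remaining > 0.0 else 1.0
    return final_value + 0.5 * (initial_value - final_value) * (1.0 + math.cos(math.pi * cosine_progress))


if __name__ == "__main__":
    for p in (1.0, 0.95, 0.9, 0.5, 0.0):
        print(p, cosine_warmup_lr(p, initial_value=3e-4, final_value=1e-5))
```

Rescaling the annealing phase keeps the curve continuous at the end of warmup, where both branches evaluate to initial_value.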
- class lr_schedule.PolynomialLRSchedule(cfg)
Bases:
BaseLRSchedule
Polynomial learning rate decay. More gradual than exponential, more controlled than linear.
Formula: lr = (initial_lr - final_lr) * (progress_remaining**power) + final_lr
- Parameters:
cfg (dict | float)
- func(progress_remaining)
Polynomial decay from initial_value to final_value. progress_remaining is 1.0 at the start and 0.0 at the end; the return value is the learning rate.
- Parameters:
progress_remaining (float)
- Return type:
float
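The polynomial formula above translates directly into a few lines. The sketch below takes power as an explicit argument; how the real class reads power from cfg is not documented here, and the function name polynomial_lr is hypothetical.

```python
def polynomial_lr(
    progress_remaining: float,
    initial_value: float,
    final_value: float,
    power: float = 2.0,  # hypothetical default, chosen for illustration
) -> float:
    """lr = (initial_value - final_value) * progress_remaining**power + final_value.

    Hypothetical standalone sketch; not the library's implementation.
    """
    return (initial_value - final_value) * progress_remaining ** power + final_value


if __name__ == "__main__":
    for p in (1.0, 0.5, 0.0):
        print(p, polynomial_lr(p, initial_value=3e-4, final_value=1e-5, power=2.0))
```

With power equal to 1 this reduces to linear decay; powers above 1 drop the learning rate more quickly at the start of training.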