lr_schedule
===========

.. py:module:: lr_schedule


Classes
-------

.. autoapisummary::

   lr_schedule.BaseLRSchedule
   lr_schedule.LinearLRSchedule
   lr_schedule.CosineAnnealingLRSchedule
   lr_schedule.ExponentialLRSchedule
   lr_schedule.CosineWarmupLRSchedule
   lr_schedule.PolynomialLRSchedule


Module Contents
---------------

.. py:class:: BaseLRSchedule(cfg)

   .. py:method:: create(cfg)
      :classmethod:


   .. py:method:: func(progress_remaining)

      Get the learning rate.

      :param progress_remaining: (float)
      :return: (float)


   .. py:method:: __call__(progress_remaining)


.. py:class:: LinearLRSchedule(cfg)

   Bases: :py:obj:`BaseLRSchedule`

   Linear learning rate decay from initial_value to final_value.

   Standard linear annealing schedule.

   .. py:method:: func(progress_remaining)

      Get the current learning rate depending on remaining progress.

      :param progress_remaining: (float) 1.0 at start, 0.0 at end
      :return: (float) learning rate


.. py:class:: CosineAnnealingLRSchedule(cfg)

   Bases: :py:obj:`BaseLRSchedule`

   Cosine annealing learning rate schedule.

   Smooth decay following a cosine curve; often works better than linear.
   Popular in modern deep learning (e.g., ResNet, Transformers).

   Formula: ``lr = final_lr + 0.5 * (initial_lr - final_lr) * (1 + cos(π * progress))``

   .. py:method:: func(progress_remaining)

      Cosine annealing from initial_value to final_value.

      :param progress_remaining: (float) 1.0 at start, 0.0 at end
      :return: (float) learning rate


.. py:class:: ExponentialLRSchedule(cfg)

   Bases: :py:obj:`BaseLRSchedule`

   Exponential learning rate decay.

   Decays faster early, slower later.

   Formula: ``lr = initial_lr * decay_rate^progress``

   .. py:attribute:: decay_rate


   .. py:method:: func(progress_remaining)

      Exponential decay from initial_value to initial_value * decay_rate.

      :param progress_remaining: (float) 1.0 at start, 0.0 at end
      :return: (float) learning rate


.. py:class:: CosineWarmupLRSchedule(cfg)

   Bases: :py:obj:`BaseLRSchedule`

   Cosine annealing with linear warmup.

   Starts from a small LR, increases linearly to initial_value during warmup,
   then follows cosine annealing down to final_value.
   Very popular in Transformer training (BERT, GPT, etc.).

   .. py:method:: func(progress_remaining)

      Linear warmup followed by cosine annealing.

      :param progress_remaining: (float) 1.0 at start, 0.0 at end
      :return: (float) learning rate


.. py:class:: PolynomialLRSchedule(cfg)

   Bases: :py:obj:`BaseLRSchedule`

   Polynomial learning rate decay.

   More gradual than exponential, more controlled than linear.

   Formula: ``lr = (initial_lr - final_lr) * (progress_remaining**power) + final_lr``

   .. py:method:: func(progress_remaining)

      Polynomial decay from initial_value to final_value.

      :param progress_remaining: (float) 1.0 at start, 0.0 at end
      :return: (float) learning rate
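

Worked example
--------------

The formulas documented above can be sketched as plain functions. This is a minimal illustration and not the module's actual implementation. It assumes that ``progress`` in the cosine and exponential formulas means ``1 - progress_remaining``, which matches the documented endpoints (``initial_value`` at 1.0, the final value at 0.0). The function names and the ``warmup_fraction`` and ``warmup_start_lr`` parameters are hypothetical, since this page does not document how the warmup phase is configured.

```python
import math


def linear_lr(progress_remaining, initial_lr, final_lr):
    """Straight line from initial_lr (at 1.0) to final_lr (at 0.0)."""
    return final_lr + (initial_lr - final_lr) * progress_remaining


def cosine_lr(progress_remaining, initial_lr, final_lr):
    """Cosine annealing from initial_lr to final_lr."""
    progress = 1.0 - progress_remaining  # assumption: elapsed fraction
    return final_lr + 0.5 * (initial_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


def exponential_lr(progress_remaining, initial_lr, decay_rate):
    """Exponential decay from initial_lr to initial_lr * decay_rate."""
    progress = 1.0 - progress_remaining  # assumption: elapsed fraction
    return initial_lr * decay_rate ** progress


def polynomial_lr(progress_remaining, initial_lr, final_lr, power):
    """Polynomial decay from initial_lr to final_lr."""
    return (initial_lr - final_lr) * progress_remaining ** power + final_lr


def cosine_warmup_lr(progress_remaining, initial_lr, final_lr,
                     warmup_fraction=0.1, warmup_start_lr=0.0):
    """Linear warmup to initial_lr, then cosine annealing to final_lr.

    warmup_fraction (0 <= value < 1) and warmup_start_lr are hypothetical
    parameters chosen for this sketch only.
    """
    progress = 1.0 - progress_remaining
    if warmup_fraction > 0 and progress < warmup_fraction:
        # Warmup phase: ramp linearly from warmup_start_lr up to initial_lr.
        ramp = progress / warmup_fraction
        return warmup_start_lr + (initial_lr - warmup_start_lr) * ramp
    # Annealing phase: rescale the remaining span to [0, 1] for the cosine.
    cosine_progress = (progress - warmup_fraction) / (1.0 - warmup_fraction)
    return final_lr + 0.5 * (initial_lr - final_lr) * (1.0 + math.cos(math.pi * cosine_progress))


if __name__ == "__main__":
    # Print each schedule at the start, midpoint, and end of training.
    for remaining in (1.0, 0.5, 0.0):
        print(
            remaining,
            linear_lr(remaining, 3e-4, 1e-5),
            cosine_lr(remaining, 3e-4, 1e-5),
            exponential_lr(remaining, 3e-4, 0.1),
            polynomial_lr(remaining, 3e-4, 1e-5, 2.0),
            cosine_warmup_lr(remaining, 3e-4, 1e-5),
        )
```

Every function takes ``progress_remaining`` in the same convention as ``func`` above (1.0 at start, 0.0 at end), so the curves can be compared directly at the same point in training.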