Learning Rate Finder

LR range test after Smith (2017): plots loss against learning rate on a log scale, suggests the steepest-descent LR, derives a cyclical LR range [LR_min, LR_max], and previews warmup schedules as an SVG chart.

Simulating GD on f(x)=x² from x₀=5 over 60 log-spaced LRs (30 steps each)
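
The sweep described above could be reproduced roughly as follows. This is a minimal sketch, not the page's own code: it assumes the recorded loss is the final loss after the 30 steps, and the sweep bounds `LR_LO` and `LR_HI` are hypothetical because the page does not state them.

```python
import math

def f(x):
    return x * x

def grad_f(x):
    return 2.0 * x

def log_spaced(lo, hi, n):
    """Return n values spaced evenly in log10 between lo and hi (inclusive)."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10 ** (math.log10(lo) + i * step) for i in range(n)]

def final_loss(lr, x0=5.0, steps=30):
    """Run plain gradient descent on f and return the loss after the last step."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return f(x)

# Assumed sweep bounds (not given in the source).
LR_LO, LR_HI = 1e-4, 1.0
lrs = log_spaced(LR_LO, LR_HI, 60)
losses = [final_loss(lr) for lr in lrs]
```

Plotting `losses` against `lrs` with a logarithmic x-axis gives the kind of curve the range test inspects.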
Loss vs Learning Rate
Legend: loss (solid line), steepest descent LR (dashed line)
Warmup / Annealing Preview
Legend: constant (dashed line), linear warmup (solid line), cosine annealing (solid line)
Smith (2017) LR Suggestion
Data points: 60
Steepest descent LR: 0.007851
Suggested LR (1/10 steep): 0.0007851
Cyclical LR_min: 0.0007851
Cyclical LR_max: 0.007851
Min observed loss: 9.084e-82
LR at min loss: 0.5211
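
Figures like those above might be derived along the following lines. On f(x) = x², each gradient descent step multiplies x by (1 − 2·LR), so the loss after 30 steps from x₀ = 5 is 25·(1 − 2·LR)^60. The sketch uses that closed form, a central finite difference for the slope, and a hypothetical sweep from 1e-4 to 1. The page's actual grid and differencing scheme are not stated, so the result need not match its numbers exactly.

```python
import math

def steepest_descent_lr(lrs, losses):
    """Return the LR where d(loss)/d(log10 LR) is most negative.

    Slopes are estimated with central differences on interior points.
    """
    best_lr, best_slope = None, float("inf")
    for i in range(1, len(lrs) - 1):
        slope = (losses[i + 1] - losses[i - 1]) / (
            math.log10(lrs[i + 1]) - math.log10(lrs[i - 1])
        )
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr

# Hypothetical sweep: 60 log-spaced LRs from 1e-4 to 1 (bounds are assumed).
lrs = [10 ** (-4 + 4 * i / 59) for i in range(60)]
# Closed-form loss after 30 GD steps on f(x) = x^2 starting from x0 = 5.
losses = [25.0 * (1.0 - 2.0 * lr) ** 60 for lr in lrs]

lr_steep = steepest_descent_lr(lrs, losses)
lr_max = lr_steep        # upper bound of the cyclical range
lr_min = lr_steep / 10   # suggested LR, one tenth of the steepest point
```

For this closed form the slope is steepest at LR = 1/120 ≈ 0.0083, so a grid-based estimate such as the reported 0.007851 lands near that value.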
Warmup Schedules
Constant: LR = max_lr throughout
Linear warmup: LR rises from 0 to max_lr over warmup_steps, then stays constant
Cosine annealing: LR = max_lr × ½(1 + cos(πt/T)), decaying to 0 at t = T
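
The three schedules above can be written as small functions of the step index t. This is an illustrative sketch: the function names and the `total_steps` parameter (standing in for T) are assumptions, not taken from the page.

```python
import math

def constant_lr(t, max_lr):
    """LR = max_lr at every step."""
    return max_lr

def linear_warmup_lr(t, max_lr, warmup_steps):
    """Ramp linearly from 0 to max_lr over warmup_steps, then hold max_lr."""
    if t >= warmup_steps:
        return max_lr
    return max_lr * t / warmup_steps

def cosine_annealing_lr(t, max_lr, total_steps):
    """max_lr * 0.5 * (1 + cos(pi * t / T)); reaches 0 at t = T."""
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * t / total_steps))

# Example: sample each schedule over 100 steps with a 10-step warmup.
max_lr = 0.007851
warmup = [linear_warmup_lr(t, max_lr, 10) for t in range(100)]
cosine = [cosine_annealing_lr(t, max_lr, 100) for t in range(101)]
```

The early return in `linear_warmup_lr` also covers `warmup_steps = 0`, so no division by zero occurs.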
Steepest descent LR: the LR at which d(loss)/d(log LR) is most negative, i.e. argmin of d(loss)/d(log LR)
Cyclical LR (CLR, Smith 2017): oscillate the LR between LR_min and LR_max
Suggested 1-cycle: LR_max = steepest descent LR, LR_min = LR_max/10
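
As an illustration of the oscillation between LR_min and LR_max, here is a sketch of the triangular policy from Smith (2017). The half-cycle length `step_size` is a hypothetical value chosen for the example, and the page itself may use a different cycle shape.

```python
import math

def triangular_clr(t, lr_min, lr_max, step_size):
    """Triangular cyclical LR: rise from lr_min to lr_max over step_size
    steps, then fall back to lr_min over the next step_size steps."""
    cycle = math.floor(1 + t / (2 * step_size))
    x = abs(t / step_size - 2 * cycle + 1)
    return lr_min + (lr_max - lr_min) * max(0.0, 1.0 - x)

# Bounds taken from the suggestion panel; step_size is assumed.
lr_max = 0.007851
lr_min = lr_max / 10
step_size = 100
schedule = [triangular_clr(t, lr_min, lr_max, step_size) for t in range(400)]
```

Over 400 steps this yields two full cycles, peaking at steps 100 and 300.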