Learning Rate Finder

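Implements the learning rate (LR) range test from Smith (2017): a log-scale plot of loss against LR, an LR suggestion taken from the steepest descent of that curve, a cyclical LR range [LR_min, LR_max], and an SVG preview of warmup schedules.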

Input Mode

Simulating GD on f(x)=x² from x₀=5 over 60 log-spaced LRs (30 steps each)
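The simulation described above can be sketched in a few lines of Python. This is a minimal sketch, not the tool's actual code: the function name `lr_range_test` and the grid bounds `lr_lo` and `lr_hi` are assumptions, because the page does not state the LR range it sweeps, so the numbers it produces will not match the readouts exactly.

```python
import math

def lr_range_test(x0=5.0, n_lrs=60, n_steps=30, lr_lo=1e-4, lr_hi=1.0):
    """Run n_steps of gradient descent on f(x) = x^2 for each of n_lrs log-spaced LRs.

    lr_lo and lr_hi are assumed bounds; the page does not state its grid.
    """
    log_lo, log_hi = math.log10(lr_lo), math.log10(lr_hi)
    lrs = [10 ** (log_lo + (log_hi - log_lo) * i / (n_lrs - 1)) for i in range(n_lrs)]
    losses = []
    for lr in lrs:
        x = x0
        for _ in range(n_steps):
            x -= lr * 2.0 * x  # gradient of x^2 is 2x
        losses.append(x * x)   # record the final loss after n_steps
    return lrs, losses

lrs, losses = lr_range_test()
```

Each update multiplies x by (1 - 2·lr), so the final loss is 25·(1 - 2·lr)^60. It collapses toward zero as lr approaches 0.5, which fits the reported minimum near LR = 0.5211.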

Loss vs Learning Rate

Legend: solid line = loss; dashed line = steepest descent LR

Warmup / Annealing Preview

Controls: Max LR, Warmup steps, Total steps

Legend: dashed line = constant; solid lines = linear warmup, cosine annealing

Smith (2017) LR Suggestion

Data points: 60
Steepest descent LR: 0.007851
Suggested LR (1/10 of steepest descent LR): 0.0007851
Cyclical LR_min: 0.0007851
Cyclical LR_max: 0.007851
Min observed loss: 9.084e-82
LR at min loss: 0.5211
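One way the suggestion values could be derived from a loss curve is sketched below. It assumes the slope is a central difference of the raw loss against log10(LR); the tool may smooth the curve or differentiate differently. The names `steepest_descent_lr` and `suggest` are hypothetical, and the LR grid is an assumed one, not the tool's.

```python
import math

def steepest_descent_lr(lrs, losses):
    """Return the LR where d(loss)/d(log LR) is most negative (central differences)."""
    log_lrs = [math.log10(lr) for lr in lrs]
    best_i, best_slope = None, 0.0
    for i in range(1, len(lrs) - 1):
        slope = (losses[i + 1] - losses[i - 1]) / (log_lrs[i + 1] - log_lrs[i - 1])
        if best_i is None or slope < best_slope:
            best_i, best_slope = i, slope
    return lrs[best_i]

def suggest(lrs, losses):
    """Derive the suggested LR and the cyclical range from the steepest descent LR."""
    steep = steepest_descent_lr(lrs, losses)
    return {"steepest": steep, "suggested": steep / 10,
            "lr_min": steep / 10, "lr_max": steep}

# Example curve: closed-form loss after 30 GD steps on f(x) = x^2 from x0 = 5,
# since each step multiplies x by (1 - 2*lr).
lrs = [10 ** (-4 + 4 * i / 59) for i in range(60)]  # assumed grid, not the tool's
losses = [25.0 * (1 - 2 * lr) ** 60 for lr in lrs]
result = suggest(lrs, losses)
```

Under these assumptions the slope is proportional to -lr·(1 - 2·lr)^59, which is most negative at lr = 1/120 ≈ 0.00833. A grid point near 0.008 is therefore expected, consistent with the reported 0.007851.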

Warmup Schedules

Constant: LR = max_lr throughout
Linear warmup: LR rises linearly from 0 to max_lr over warmup_steps, then stays constant
Cosine annealing: LR = max_lr × ½(1 + cos(πt/T)), decaying to 0 at t = T (t = current step, T = total steps)
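The three schedules can be written as small functions of the step index. This is an illustrative sketch that follows the formulas as listed; the function names are hypothetical, and the cosine schedule here starts at max_lr with no warmup phase, which may differ from how the preview combines them.

```python
import math

def constant_lr(step, max_lr):
    """LR stays at max_lr for every step."""
    return max_lr

def linear_warmup_lr(step, max_lr, warmup_steps):
    """LR rises linearly from 0 to max_lr over warmup_steps, then stays constant."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return max_lr
    return max_lr * step / warmup_steps

def cosine_annealing_lr(step, max_lr, total_steps):
    """LR follows max_lr * 0.5 * (1 + cos(pi * t / T)), reaching 0 at t = T."""
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
```

For example, with max_lr = 0.01 and total_steps = 100, the cosine schedule gives 0.01 at step 0, 0.005 at step 50, and 0 at step 100.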

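Steepest descent LR: the LR that minimizes d(loss)/d(log LR), i.e. where the loss falls fastest along the log-LR axis
Cyclical LR (CLR, Smith 2017): oscillate the LR between LR_min and LR_max
Suggested 1-cycle range: LR_max = steepest descent LR, LR_min = LR_max / 10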
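The oscillation between LR_min and LR_max can be illustrated with the triangular policy from Smith (2017), one of several cyclical shapes. The page does not say which shape or cycle length it assumes, so `step_size` (the number of steps in half a cycle) is a hypothetical parameter, and the bounds are the values reported above.

```python
import math

def triangular_clr(step, lr_min, lr_max, step_size):
    """Triangular cyclical LR: linear ramp lr_min -> lr_max -> lr_min every 2 * step_size steps."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return lr_min + (lr_max - lr_min) * max(0.0, 1.0 - x)

# Bounds taken from the reported suggestion; step_size = 100 is an assumed value.
LR_MIN, LR_MAX, STEP_SIZE = 0.0007851, 0.007851, 100
schedule = [triangular_clr(s, LR_MIN, LR_MAX, STEP_SIZE) for s in range(2 * STEP_SIZE + 1)]
```

The schedule starts at LR_min, peaks at LR_max after `step_size` steps, and returns to LR_min after `2 * step_size` steps.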