MIT 6.5940 TinyML and Efficient Deep Learning Computing

Class 1

Why do we need to compress ML models?
- Bridge the gap between the supply and demand of AI computing
- Improve hardware utilization
- Model parameter counts have grown exponentially (0.05B to trillions) while GPU memory growth has plateaued
- Ship more models
- Ship bigger models
- Run models locally on-device, e.g. on edge devices, phones, and microcontrollers
- On-device training for customization, privacy, and lifelong learning
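To make the parameter-versus-memory gap concrete, here is a rough back-of-the-envelope sketch. It only counts weight storage (parameters times bytes per parameter) and ignores activations, optimizer state, and KV caches. The 80 GB device memory figure and the helper name `weight_memory_gb` are assumptions for illustration, not numbers from the lecture.

```python
# Bytes needed to store one weight at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

# Assumed accelerator memory, for illustration only.
ASSUMED_GPU_MEMORY_GB = 80


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Weight storage in GB (10^9 bytes); ignores activations and optimizer state."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# The two ends of the parameter range mentioned in the notes.
for name, num_params in [("0.05B", 0.05e9), ("1T", 1e12)]:
    for dtype in ("fp32", "fp16", "int8", "int4"):
        gb = weight_memory_gb(num_params, dtype)
        fits = "fits" if gb <= ASSUMED_GPU_MEMORY_GB else "does not fit"
        print(f"{name} params @ {dtype}: {gb:,.2f} GB "
              f"({fits} in an assumed {ASSUMED_GPU_MEMORY_GB} GB device)")
```

Under these assumptions, lowering the precision shrinks the footprint proportionally, which is one reason compression helps close the gap.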
Reduce model size while maintaining accuracy, using neural architecture search techniques
Accelerate model processing throughput, e.g. processing 10x images instead of x
We now have a new stage after model development: model compression (quantization, pruning, distillation, sparsity)
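As a rough illustration of two of the techniques named above, the sketch below applies symmetric int8 quantization and magnitude-based pruning to a toy weight list. It is a minimal pure-Python example under simple assumptions (one scale for the whole tensor, unstructured pruning), not the course's implementation. The function names are hypothetical, and distillation is not shown.

```python
def quantize_int8(weights):
    """Symmetric linear quantization: q = round(w / scale), scale = max|w| / 127."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clamp to the symmetric int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate floating-point weights."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.9, -0.05, 0.3, -1.27, 0.02, 0.6]  # toy values for illustration

q, scale = quantize_int8(weights)
print("int8 codes:", q, "scale:", scale)
print("dequantized:", dequantize(q, scale))
print("50% pruned:", magnitude_prune(weights, 0.5))
```

Quantization stores each weight in 1 byte instead of 4 at the cost of a rounding error of at most half the scale, while pruning introduces sparsity that storage formats and hardware may exploit.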
What examples and types of compression are used? (See slides)
- https://www.dropbox.com/scl/fi/h3ggav4eopxsitqxzf6t2/Lec01-Introduction.pdf?rlkey=hzbpsha72p5e3ed4mdvcgcda5&e=1&st=pz5u977e&dl=0
On-device Benefits
- Better Privacy
- Lower Cost
- Customization
- Lifelong learning