-
Why do we need to compress ML models?

- Bridge the gap between the supply of and demand for AI computing
- Improve hardware utilization
- Model parameter counts have grown exponentially (0.05B to trillions) while GPU memory growth has plateaued
- Ship more models
- Ship bigger models
- Run models locally on-device: edge devices, phones, microcontrollers
- On-device training for customization, privacy and lifelong learning
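
To make the gap between parameter growth and GPU memory concrete, here is a back-of-envelope sketch. It assumes the standard storage sizes per weight (4 bytes for fp32, 2 for fp16, 1 for int8, 0.5 for int4) and counts only the weights, ignoring activations, optimizer state and caches. The function name `weight_memory_gb` and the two model sizes are illustrative, taken from the 0.05B-to-trillions range above.

```python
# Bytes needed to store a single weight at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Illustrative endpoints of the range mentioned in the notes.
MODEL_SIZES = {"0.05B": 0.05e9, "1T": 1e12}

if __name__ == "__main__":
    for name, n in MODEL_SIZES.items():
        for dtype in BYTES_PER_PARAM:
            gb = weight_memory_gb(n, dtype)
            print(f"{name:>6} @ {dtype}: {gb:,.1f} GB")
```

Under these assumptions a trillion-parameter model needs thousands of GB for its fp32 weights alone, which is why lower precision and smaller models matter for fitting on a single device.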
-
Reduce model size while maintaining accuracy, using neural architecture search techniques
-
Accelerate model processing, e.g. processing 10x images instead of x
-
We now have a new module after model development: Model Compression (quantization, pruning, distillation, sparsity)
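
As a rough illustration of two of these techniques, the sketch below applies magnitude pruning (which produces sparsity) and symmetric int8 quantization to a plain list of weights. It is a minimal toy under stated assumptions, not how any particular framework does it: the helper names `magnitude_prune`, `quantize_int8` and `dequantize` are hypothetical, the weights are made up, and distillation is not covered.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    if k == 0:
        return list(weights)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric linear quantization to the range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]  # made-up example values
    pruned = magnitude_prune(weights, sparsity=0.5)
    codes, scale = quantize_int8(pruned)
    print("pruned:   ", pruned)
    print("int8:     ", codes)
    print("restored: ", dequantize(codes, scale))
```

Pruning removes weights outright, while quantization keeps every weight but stores it in fewer bits; the two can be combined, as in the usage lines above.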
-
What are the various types and examples of compression in use? (Slide)
-
On-device Benefits
- Better Privacy
- Lower Cost
- Customization
- Lifelong learning