Dataset Distillation Survey
An Introduction to Dataset Distillation and its Application
Training large AI models typically requires large-scale datasets, making training and parameter tuning both time-consuming and costly. Dataset Distillation (DD) addresses this problem by synthesizing a small set of highly representative and informative samples from a real-world dataset, such that models trained on the synthetic set perform comparably to models trained on the full data, offering a promising path toward data-efficient learning.
This survey provides a comprehensive introduction to the field of dataset distillation, covering:
- Foundational concepts and problem formulation of dataset distillation
- Core methods: performance matching, parameter matching, distribution matching, and trajectory matching
- Applications: continual learning, neural architecture search, federated learning, and privacy preservation
- Challenges and future directions in scaling DD to larger datasets and more complex tasks
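To make the core idea concrete, here is a minimal, hypothetical sketch of distribution matching, one of the method families listed above: a handful of synthetic samples are optimized so their feature statistics match those of the real dataset. The toy data, feature dimension, and plain mean-matching objective are illustrative assumptions, not the survey's actual formulation (real methods match richer statistics, e.g. embeddings from randomly initialized networks).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" dataset: 500 samples with 8 features (a stand-in for
# feature embeddings of real images). All values here are illustrative.
real = rng.normal(loc=2.0, scale=1.0, size=(500, 8))

# Synthetic set to learn: only 5 samples, randomly initialized.
syn = rng.normal(size=(5, 8))

lr = 0.5
for step in range(200):
    # Distribution matching objective: L = || mean(syn) - mean(real) ||^2
    diff = syn.mean(axis=0) - real.mean(axis=0)
    # Analytic gradient of L w.r.t. each synthetic sample is 2*diff/n,
    # so a gradient step shifts every row toward closing the gap.
    syn -= lr * (2.0 / syn.shape[0]) * diff

gap = np.linalg.norm(syn.mean(axis=0) - real.mean(axis=0))
print(f"remaining mean gap: {gap:.2e}")
```

After optimization the 5 synthetic samples reproduce the real data's mean almost exactly; the same loop structure carries over when the matched statistics come from a feature extractor instead of raw inputs.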
Resources
Full survey slides (pdf): An Introduction to Dataset Distillation and its Application
Presentation slides (pptx): Dataset Distillation Presentation
Related publication: AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories
Related code: AST Implementation
