Dataset Distillation Survey

An Introduction to Dataset Distillation and its Application

Training large AI models typically requires large-scale datasets, making training and hyperparameter tuning both time-consuming and costly. Dataset Distillation (DD) addresses this problem by synthesizing a small set of highly representative and informative samples from a real-world dataset, such that models trained on the synthetic set approach the performance of models trained on the full data, offering a promising perspective for data-efficient learning.

This survey provides a comprehensive introduction to the field of dataset distillation, covering:

  • Foundational concepts and problem formulation of dataset distillation
  • Core methods: performance matching, parameter matching, distribution matching, and trajectory matching
  • Applications: continual learning, neural architecture search, federated learning, and privacy preservation
  • Challenges and future directions in scaling DD to larger datasets and more complex tasks
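Of the core methods listed above, distribution matching is the simplest to illustrate: it optimizes the synthetic samples so that their embeddings, under a randomly initialized feature extractor, match the feature statistics of the real data. The following is a minimal NumPy sketch of that idea, not any specific published implementation; the random ReLU projection stands in for a randomly initialized network, and all dimensions and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" dataset: 500 samples with 32 features (a stand-in for images).
real = rng.normal(loc=2.0, scale=1.0, size=(500, 32))

# Distilled set: only 10 learnable synthetic samples, randomly initialized.
syn = rng.normal(size=(10, 32))

# Random linear+ReLU embedding, standing in for a randomly initialized network.
W = rng.normal(size=(32, 64)) / np.sqrt(32)

def embed(x):
    # ReLU feature map used to compare real and synthetic statistics.
    return np.maximum(x @ W, 0.0)

real_mean = embed(real).mean(axis=0)          # target feature statistics
initial_gap = np.linalg.norm(embed(syn).mean(axis=0) - real_mean)

lr = 0.5
for step in range(300):
    syn_mean = embed(syn).mean(axis=0)
    diff = syn_mean - real_mean               # gradient of 0.5 * ||diff||^2 w.r.t. syn_mean
    # Backpropagate through the mean and the ReLU to the synthetic samples.
    mask = (syn @ W > 0).astype(float)        # ReLU derivative
    grad = (mask * diff) @ W.T / syn.shape[0]
    syn -= lr * grad                          # update the synthetic samples directly

final_gap = np.linalg.norm(embed(syn).mean(axis=0) - real_mean)
```

After a few hundred steps, `final_gap` is a small fraction of `initial_gap`: the ten synthetic samples reproduce the real data's feature statistics under this embedding. Practical methods apply the same idea with deep networks, many random initializations, and per-class statistics.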

Resources

Full survey slides (pdf): An Introduction to Dataset Distillation and its Application

Presentation slides (pptx): Dataset Distillation Presentation

Related publication: AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories

Related code: AST Implementation