Dataset Distillation Survey

An Introduction to Dataset Distillation and its Application

Training large AI models typically requires large-scale datasets, making training and hyperparameter tuning both time-consuming and costly. Dataset Distillation (DD) addresses this problem by synthesizing a small set of highly representative and informative samples from a real-world dataset, such that models trained on the synthetic set approach the performance of models trained on the full data, offering a promising perspective for data-efficient learning.

This survey provides a comprehensive introduction to the field of dataset distillation, covering:

  • Foundational concepts and problem formulation of dataset distillation
  • Core methods: performance matching, parameter matching, distribution matching, and trajectory matching
  • Applications: continual learning, neural architecture search, federated learning, and privacy preservation
  • Challenges and future directions in scaling DD to larger datasets and more complex tasks
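Of the core methods listed above, distribution matching is the simplest to illustrate: it optimizes the synthetic samples so that their embeddings, under a randomly initialized feature extractor, match the feature statistics of the real data. The following is a minimal NumPy sketch of that idea, not any specific published implementation; the random ReLU projection stands in for a randomly initialized network, and all dimensions and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" dataset: 500 samples with 32 features (a stand-in for images).
real = rng.normal(loc=2.0, scale=1.0, size=(500, 32))

# Distilled set: only 10 learnable synthetic samples, randomly initialized.
syn = rng.normal(size=(10, 32))

# Random linear+ReLU embedding, standing in for a randomly initialized network.
W = rng.normal(size=(32, 64)) / np.sqrt(32)

def embed(x):
    # ReLU feature map used to compare real and synthetic statistics.
    return np.maximum(x @ W, 0.0)

real_mean = embed(real).mean(axis=0)          # target feature statistics
initial_gap = np.linalg.norm(embed(syn).mean(axis=0) - real_mean)

lr = 0.5
for step in range(300):
    syn_mean = embed(syn).mean(axis=0)
    diff = syn_mean - real_mean               # gradient of 0.5 * ||diff||^2 w.r.t. syn_mean
    # Backpropagate through the mean and the ReLU to the synthetic samples.
    mask = (syn @ W > 0).astype(float)        # ReLU derivative
    grad = (mask * diff) @ W.T / syn.shape[0]
    syn -= lr * grad                          # update the synthetic samples directly

final_gap = np.linalg.norm(embed(syn).mean(axis=0) - real_mean)
```

After a few hundred steps, `final_gap` is a small fraction of `initial_gap`: the ten synthetic samples reproduce the real data's feature statistics under this embedding. Practical methods apply the same idea with deep networks, many random initializations, and per-class statistics.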

Resources

Full survey slides (pdf): An Introduction to Dataset Distillation and its Application

Presentation slides (pptx): Dataset Distillation Presentation

Related publication: AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories

Related code: AST Implementation