Evolving Universal Datasets: Cross-Architecture Generalization via Evolutionary Distillation

Type: MA thesis

Status: running

Date: September 5, 2025 - March 5, 2026

Supervisors: Yipeng Sun, Andreas Maier

The proliferation of large-scale datasets has been central to the success of modern deep learning, yet it presents significant challenges in terms of computational cost, training time, and data privacy. These issues are particularly acute in applications such as Neural Architecture Search (NAS), where candidate models must be trained repeatedly. Dataset distillation offers a compelling solution by synthesizing small, information-rich datasets that act as efficient, privacy-preserving proxies for the originals. However, the practical utility of current distillation methods is severely hampered by a critical flaw: poor cross-architecture generalization. Datasets distilled for one network architecture often fail when used to train a different one, limiting their use as universal training assets.

This thesis aims to directly confront this generalization challenge by proposing a novel distillation framework based on an Evolutionary Algorithm (EA). We posit that conventional gradient-based optimization methods are prone to finding solutions overfitted to a single model's inductive biases. In contrast, an EA can perform a more global search for a truly architecture-agnostic dataset. The core contribution of this work is a new fitness function that explicitly rewards generalization. By evaluating a candidate dataset's performance across a diverse portfolio of architectures, the evolutionary search is driven to discover a compact dataset that captures universal features. This objective is further refined by incorporating gradient matching principles and full training epoch evaluations, ensuring the resulting dataset is not only generalizable but also effective for training robust models.
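
To make the generalization-rewarding fitness function concrete, the sketch below shows one possible way such an evaluation could look: each candidate synthetic dataset is used to train every architecture in a small portfolio from scratch for a few full epochs, and the fitness is the accuracy averaged across architectures on held-out real data. This is an illustrative assumption, not the thesis implementation; all names (`make_portfolio`, `evaluate_fitness`), the toy architectures, and the hyperparameters are hypothetical placeholders, and the gradient matching term is omitted for brevity.

```python
# Minimal sketch of a cross-architecture fitness evaluation for an
# evolutionary dataset-distillation search (illustrative, not the thesis code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_portfolio(num_classes: int = 10):
    """Return a small, architecturally diverse model set.
    Assumption: a conv net and an MLP stand in for the real portfolio."""
    conv = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        nn.Linear(32 * 4 * 4, num_classes),
    )
    mlp = nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
        nn.Linear(256, num_classes),
    )
    return [conv, mlp]


def evaluate_fitness(syn_x, syn_y, val_x, val_y, epochs: int = 5) -> float:
    """Fitness of one candidate synthetic dataset: train every portfolio
    architecture from scratch on the candidate, then average their accuracy
    on held-out real data. Higher is better."""
    accuracies = []
    for model in make_portfolio():
        opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
        model.train()
        for _ in range(epochs):  # full training epochs on the tiny synthetic set
            opt.zero_grad()
            loss = F.cross_entropy(model(syn_x), syn_y)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            preds = model(val_x).argmax(dim=1)
            accuracies.append((preds == val_y).float().mean().item())
    # Averaging across architectures rewards datasets that transfer,
    # rather than ones overfitted to a single model's inductive biases.
    return sum(accuracies) / len(accuracies)


if __name__ == "__main__":
    # Toy candidate: 10 synthetic images per class for a 10-class, 32x32 RGB task.
    syn_x = torch.randn(100, 3, 32, 32)
    syn_y = torch.arange(10).repeat_interleave(10)
    val_x, val_y = torch.randn(200, 3, 32, 32), torch.randint(0, 10, (200,))
    print("fitness:", evaluate_fitness(syn_x, syn_y, val_x, val_y))
```

In an EA loop, a score like this would be computed for every candidate in the population to drive selection; a gradient matching penalty between synthetic-data and real-data gradients could be added to the returned value to refine the objective further.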