Strategies to Improve Real-World Generalization of Models Trained on Synthetic Data
- Mixing real-world samples into the training data
Incorporate a portion of real-world data into the training set
Gradually increase the proportion of real data as training progresses
- Domain randomization
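Randomize generation parameters such as lighting, textures, object poses, and camera position when producing synthetic data
Encourages the model to learn features that stay invariant across the randomized factors, so real-world data looks like just another variation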
- Transfer learning
Pre-train on synthetic data, then fine-tune on a smaller real-world dataset
Uses the scale of synthetic data to learn general features, then adapts them to the real-world distribution using only limited real data
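The real-data strategy (incorporate real samples, then gradually raise their proportion) can be sketched as a batch sampler driven by a schedule. This is a minimal sketch, assuming a linear ramp of the real-sample share from 10% to 50% and sampling with replacement so that a small real set can fill its quota; `real_fraction`, `mixed_batch`, and the default ramp endpoints are hypothetical names and values chosen for illustration, not a prescribed recipe.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def real_fraction(epoch: int, num_epochs: int,
                  start: float = 0.1, end: float = 0.5) -> float:
    """Hypothetical linear schedule: share of real samples ramps from `start` to `end`."""
    if num_epochs <= 1:
        return end
    # Progress through training in [0, 1], clamped for epochs past the end.
    t = min(max(epoch / (num_epochs - 1), 0.0), 1.0)
    return start + t * (end - start)


def mixed_batch(synthetic: Sequence[T], real: Sequence[T], batch_size: int,
                frac_real: float, rng: random.Random) -> List[T]:
    """Draw a shuffled batch containing round(frac_real * batch_size) real samples.

    Sampling is with replacement (an assumption), since the real set is
    usually much smaller than the synthetic one.
    """
    n_real = min(round(frac_real * batch_size), batch_size)
    n_syn = batch_size - n_real
    batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_syn)
    rng.shuffle(batch)  # avoid a fixed real/synthetic ordering inside the batch
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = [("syn", i) for i in range(1000)]
    real = [("real", i) for i in range(50)]
    for epoch in range(5):
        frac = real_fraction(epoch, 5)
        batch = mixed_batch(synthetic, real, 32, frac, rng)
        n_real = sum(1 for source, _ in batch if source == "real")
        print(f"epoch {epoch}: target real share {frac:.2f}, real samples {n_real}/32")
```

A linear ramp is only one choice; step schedules or a fixed ratio plug into the same `mixed_batch` helper unchanged.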
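Domain randomization can be illustrated with a toy procedural generator. Real pipelines randomize simulator or renderer settings (lighting, textures, camera pose); here, as a self-contained stand-in, each sample is a square drawn on a flat grayscale background, with background and object intensity, object size and position, and pixel noise all drawn at random per image. The scene, the parameter ranges, and the names `SceneParams`, `sample_params`, `render`, and `generate_dataset` are hypothetical. The label (the bounding box) is unaffected by the nuisance factors, so a model trained on this data cannot rely on any particular intensity or noise level.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

IMG_SIZE = 32  # illustrative resolution (assumption)


@dataclass
class SceneParams:
    background: float  # background intensity in [0, 1]
    foreground: float  # object intensity in [0, 1]
    size: int          # object side length in pixels
    x: int             # left column of the object
    y: int             # top row of the object
    noise_std: float   # std of additive Gaussian pixel noise


def sample_params(rng: random.Random, img_size: int = IMG_SIZE) -> SceneParams:
    """Randomize every nuisance factor; ranges are illustrative assumptions."""
    background = rng.uniform(0.0, 1.0)
    # Offset the object intensity by 0.2-0.8, wrapping around 1.0, so it always
    # differs from the background by at least 0.2 and stays visible.
    foreground = (background + rng.uniform(0.2, 0.8)) % 1.0
    size = rng.randint(4, img_size // 2)
    return SceneParams(
        background=background,
        foreground=foreground,
        size=size,
        x=rng.randint(0, img_size - size),
        y=rng.randint(0, img_size - size),
        noise_std=rng.uniform(0.0, 0.1),
    )


def render(p: SceneParams, rng: random.Random,
           img_size: int = IMG_SIZE) -> Tuple[List[List[float]], Tuple[int, int, int, int]]:
    """Draw the square, add noise, and return (image, bounding box as x, y, w, h)."""
    img = []
    for r in range(img_size):
        row = []
        for c in range(img_size):
            inside = p.y <= r < p.y + p.size and p.x <= c < p.x + p.size
            value = (p.foreground if inside else p.background) + rng.gauss(0.0, p.noise_std)
            row.append(min(max(value, 0.0), 1.0))  # clip to the valid intensity range
        img.append(row)
    return img, (p.x, p.y, p.size, p.size)


def generate_dataset(n: int, seed: int = 0):
    """Generate n randomized samples reproducibly from a single seed."""
    rng = random.Random(seed)
    return [render(sample_params(rng), rng) for _ in range(n)]


if __name__ == "__main__":
    dataset = generate_dataset(100, seed=42)
    _, bbox = dataset[0]
    print(len(dataset), "samples; first bounding box:", bbox)
```

Widening the randomization ranges generally trades realism of any single sample for coverage of more conditions; how wide to go is a tuning decision for each task.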
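The transfer-learning recipe (pre-train on synthetic data, then fine-tune on a smaller real set) can be shown with a toy linear model standing in for a neural network. This is a sketch under stated assumptions: the synthetic and real data come from hypothetical linear relationships that differ slightly (a simulated domain gap), training is plain full-batch gradient descent, and fine-tuning simply continues from the pre-trained weights with a smaller learning rate. The helper names and all numeric settings are illustrative choices, not values from the source.

```python
import random
from typing import List, Tuple

Params = Tuple[List[float], float]   # (weights, bias)
Sample = Tuple[List[float], float]   # (features, target)


def make_data(n: int, weights: List[float], bias: float, noise: float,
              rng: random.Random) -> List[Sample]:
    """Sample y = w.x + b + Gaussian noise, with x uniform in [-1, 1]^d."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in weights]
        y = sum(w * xi for w, xi in zip(weights, x)) + bias + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def mse(params: Params, data: List[Sample]) -> float:
    """Mean squared error of a linear model on a dataset."""
    w, b = params
    return sum((sum(wi * xi for wi, xi in zip(w, x)) + b - y) ** 2
               for x, y in data) / len(data)


def train(params: Params, data: List[Sample], lr: float, epochs: int) -> Params:
    """Full-batch gradient descent on MSE, starting from `params` (not modified)."""
    w, b = list(params[0]), params[1]
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i, xi in enumerate(x):
                grad_w[i] += 2.0 * err * xi / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)
    # Large synthetic set and small real set with a shifted relationship (assumed domain gap).
    synthetic = make_data(1000, [2.0, -1.0], 0.0, 0.1, rng)
    real_train = make_data(20, [2.2, -0.9], 0.5, 0.1, rng)
    real_test = make_data(200, [2.2, -0.9], 0.5, 0.1, rng)

    pretrained = train(([0.0, 0.0], 0.0), synthetic, lr=0.5, epochs=100)
    finetuned = train(pretrained, real_train, lr=0.1, epochs=50)  # smaller lr for fine-tuning

    print(f"real-test MSE after pre-training only: {mse(pretrained, real_test):.4f}")
    print(f"real-test MSE after fine-tuning:       {mse(finetuned, real_test):.4f}")
```

With deep networks the same pattern usually adds one more knob: early layers can be frozen or given an even smaller learning rate, so the few real samples mainly adjust the later, more task-specific layers.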