1. Propagation of Existing Biases
- Garbage in, garbage out: Synthetic data generators are themselves AI models, trained on existing data. If this base data contains biases or limitations, these will be reflected in the synthetic outputs.
- Representation issues: Underrepresented groups in the original data will likely remain underrepresented in synthetic data.
- Example: A dataset with limited diversity (e.g., only 30 Black individuals, all middle-class) will produce synthetic data that reflects and potentially amplifies these limitations.
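
The points above can be illustrated with a minimal sketch. It assumes a deliberately simple "generator" that only learns the empirical frequencies of its source records and samples from them; real synthetic data generators are far more complex, but the same constraint applies. The group labels, income labels, counts, and function names (`fit_frequencies`, `sample_synthetic`) are hypothetical and chosen only to mirror the example. The sketch shows propagation of the source distribution, not amplification.

```python
import random
from collections import Counter


def fit_frequencies(records):
    """Estimate the frequency of each distinct record in the source data."""
    counts = Counter(records)
    total = len(records)
    return {record: n / total for record, n in counts.items()}


def sample_synthetic(freqs, n, seed=0):
    """Draw n synthetic records according to the fitted frequencies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    records = list(freqs)
    weights = [freqs[r] for r in records]
    return rng.choices(records, weights=weights, k=n)


# Hypothetical source data: (group, income) pairs.
# Group "B" has only 30 records out of 1000, all of them middle-income.
source = (
    [("A", "low")] * 300
    + [("A", "middle")] * 400
    + [("A", "high")] * 270
    + [("B", "middle")] * 30
)

freqs = fit_frequencies(source)
synthetic = sample_synthetic(freqs, n=10_000)

source_share_b = sum(1 for g, _ in source if g == "B") / len(source)
synthetic_share_b = sum(1 for g, _ in synthetic if g == "B") / len(synthetic)
synthetic_b_incomes = {income for g, income in synthetic if g == "B"}

print(f"share of group B in source:    {source_share_b:.3f}")
print(f"share of group B in synthetic: {synthetic_share_b:.3f}")
print(f"income levels seen for group B in synthetic: {synthetic_b_incomes}")
```

Generating ten times as many records does not change the picture: group B stays at roughly the same small share, and no low- or high-income member of group B ever appears, because the generator has nothing in its source data from which to produce one.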