Beyond text and images: The rise of interleaved, multimodal AI
The xGen-MM models were trained on massive datasets curated by the Salesforce team, including “MINT-1T,” a trillion-token-scale dataset of interleaved image and text data. The researchers also created new datasets focused on optical character recognition and visual grounding, two capabilities that are crucial for AI systems to interact more naturally with the visual world.
As AI systems become more advanced and ubiquitous, Salesforce’s open-source release gives researchers valuable tools to better understand and improve these powerful technologies. It also sets a precedent for transparency in a field often criticized for its lack of openness. The move could pressure other tech giants to be more forthcoming with their own AI research and development.