RE: LeoThread 2024-08-21 03:32

in LeoFinance, 6 months ago

Salesforce releases ‘xGen-MM’ open-source multimodal AI models to advance visual language understanding

Salesforce, the enterprise software giant, has released a new suite of open-source large multimodal AI models that could accelerate research and development of more capable artificial intelligence systems.

The models, dubbed xGen-MM (also known as BLIP-3), represent a significant advance in AI’s ability to understand and generate content combining text, images and other data types.

In a paper published on arXiv, researchers from Salesforce AI Research detailed the xGen-MM framework, which includes pre-trained models, datasets, and code for fine-tuning. The largest model, with 4 billion parameters, performs competitively with similarly sized open-source models on various benchmarks.

#ai #technology #salesforce

Unleashing AI’s potential: Salesforce’s game-changing open-source models

A key innovation of xGen-MM is its ability to handle “interleaved data” combining multiple images and text, which the researchers describe as “the most natural form of multimodal data.” This capability allows the models to perform complex tasks like answering questions about multiple images simultaneously, a skill that could prove invaluable in real-world applications ranging from medical diagnosis to autonomous vehicles.
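
To make "interleaved data" concrete, here is a minimal Python sketch of how a sequence mixing several images with text might be represented and flattened into a single prompt. It is an illustration only, not the xGen-MM data format or API: the `Text` and `Image` classes, the `flatten` helper, the `<image>` placeholder, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Text:
    content: str


@dataclass
class Image:
    path: str  # hypothetical file reference, not a real dataset path


Segment = Union[Text, Image]

# Hypothetical marker; the real models may use a different placeholder.
IMAGE_PLACEHOLDER = "<image>"


def flatten(segments: List[Segment]) -> Tuple[str, List[str]]:
    """Turn an interleaved sequence into one prompt string plus the
    ordered list of image paths that the placeholders refer to."""
    parts: List[str] = []
    images: List[str] = []
    for seg in segments:
        if isinstance(seg, Image):
            parts.append(IMAGE_PLACEHOLDER)
            images.append(seg.path)
        else:
            parts.append(seg.content)
    return " ".join(parts), images


# A question that spans two images, kept in reading order.
sample = [
    Text("Compare the chart in"),
    Image("q1.png"),
    Text("with the one in"),
    Image("q2.png"),
    Text("Which quarter grew faster?"),
]
prompt, image_paths = flatten(sample)
```

The point of the sketch is that text and images stay in their original order, so a question can refer to "the first" or "the second" image without ambiguity.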

The release includes variants of the model optimized for different purposes, including a base pretrained model, an “instruction-tuned” model for following directions, and a “safety-tuned” model designed to reduce harmful outputs. This range of models reflects a growing awareness in the AI community of the need to balance capability with safety and ethical considerations.

Salesforce’s decision to open-source these models could significantly accelerate innovation in the field. By giving researchers and developers access to high-quality models and datasets, Salesforce is enabling a wider range of participants to contribute to the advancement of multimodal AI. This move stands in contrast to the more closed approaches of some tech giants that have kept their most advanced models under wraps.

However, the release of such powerful models also raises important questions about the potential risks and societal impacts of increasingly capable AI systems. While Salesforce has included safety tuning to mitigate risks, the broader implications of widespread access to advanced AI models remain a topic of debate in the tech community and beyond.

Beyond text and images: The rise of interleaved, multimodal AI

The xGen-MM models were trained on massive datasets curated by the Salesforce team, including “MINT-1T,” a trillion-token-scale dataset of interleaved image and text data. The researchers also created new datasets focused on optical character recognition and visual grounding, areas that are crucial for AI systems to interact more naturally with the visual world.

As AI systems become more advanced and ubiquitous, Salesforce’s open-source release provides valuable tools for researchers to better understand and improve these powerful technologies. It also sets a precedent for transparency in a field often criticized for its lack of openness. The move could pressure other tech giants to be more forthcoming with their own AI research and development.