Stable Diffusion 3.5 debuts as Stability AI aims to improve open models for generating images
Stability AI Raises the Bar with Stable Diffusion 3.5: A Major Update for Text-to-Image Generative AI
Stability AI, a pioneer in the text-to-image generative AI space, has released a major update to its technology with the debut of Stable Diffusion 3.5. The update aims to reclaim the company's leadership position in a market that has grown increasingly competitive with the emergence of rivals such as Black Forest Labs' Flux Pro, OpenAI's DALL-E, Ideogram, and Midjourney.
Customizable Models and Improved Quality
Stable Diffusion 3.5 introduces three model variants, each aimed at different user needs: Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, and Stable Diffusion 3.5 Medium. The new models are highly customizable and can generate a wide range of styles.
Stable Diffusion 3.5 Large is an 8 billion parameter model that offers the highest quality and prompt adherence in the series, aimed at users who need maximum image quality and precision. Stable Diffusion 3.5 Large Turbo is a distilled version of the Large model that generates images faster while maintaining a high level of quality. Stable Diffusion 3.5 Medium has 2.5 billion parameters and is optimized for edge computing deployments, balancing quality against computational resources.
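To make the trade-off between model size and computational resources concrete, the following is a back-of-the-envelope sketch of how parameter count translates into weight storage. It uses the 8 billion figure quoted for the Large model; the function name and the precision table are illustrative assumptions, and the estimate covers only the diffusion model's weights, not text encoders, the image decoder, or runtime memory.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: only the diffusion model's own weights are counted; text
# encoders, the image decoder, activations, and framework overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


large_params = 8e9  # "8 billion parameter model", as stated in the article

for dtype in ("fp32", "fp16", "int8"):
    print(f"Large, {dtype}: ~{weight_memory_gb(large_params, dtype):.0f} GB")
```

Passing a smaller parameter count to the same function gives a proportionally smaller figure, which is the practical reason a Medium variant is a better fit for constrained hardware.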
Lessons Learned from Previous Release
The original release of Stable Diffusion 3 Medium in June was less than ideal, but the lessons learned from it have informed the new Stable Diffusion 3.5 models. According to Hanno Basse, CTO of Stability AI, the company found that several model and dataset choices made for the Stable Diffusion Large 8B model were not optimal for the smaller Medium model.
"We did thorough analysis of these bottlenecks and innovated further on our architecture and training protocols on the Medium model to provide a better balance between the model size and the output quality," Basse explained. "This experience has allowed us to refine our approach and create a more robust and high-quality model."
Novel Techniques and Improvements
Stability AI has applied several novel techniques to improve quality and performance in Stable Diffusion 3.5. One notable addition is the integration of Query-Key Normalization into the transformer blocks, which normalizes the query and key vectors before attention scores are computed. This makes the models easier for end users to fine-tune and develop further for their own needs and applications.
Stability AI has also enhanced its Multimodal Diffusion Transformer architecture (MMDiT-X), specifically for the Medium model. MMDiT-X improves image quality and multi-resolution generation, allowing the model to produce images across a range of resolutions, from low to high, while maintaining a high level of quality.
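The idea behind Query-Key Normalization can be illustrated with a small, dependency-free sketch. The article does not say which normalization Stable Diffusion 3.5 uses, so this example assumes a plain RMS normalization with no learnable scale, applied to a single attention head over Python lists; all function names are hypothetical, and real implementations operate on tensors across many heads.

```python
import math


def rms_normalize(vec, eps=1e-6):
    """Scale a vector so its root-mean-square is about 1 (no learnable gain)."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, qk_norm=True):
    """Single-head scaled dot-product attention over lists of vectors."""
    if qk_norm:
        # The QK-norm step: normalize queries and keys before taking dot products.
        queries = [rms_normalize(q) for q in queries]
        keys = [rms_normalize(k) for k in keys]
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


q = [[1.0, 2.0, 3.0, 4.0]]
k = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
big_q = [[100.0 * x for x in q[0]]]  # same direction, 100x the magnitude

print(attention(q, k, v, qk_norm=True))
print(attention(big_q, k, v, qk_norm=True))   # nearly identical to the line above
print(attention(big_q, k, v, qk_norm=False))  # attention collapses onto one value
```

With normalization, the attention weights depend on the direction of the query and key vectors rather than their magnitude, so large activations cannot push the softmax into saturation. That bounded behavior is one plausible reason the technique is associated with more stable training and fine-tuning.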
Prompt Adherence and ControlNets
Stability AI says Stable Diffusion 3.5 Large demonstrates superior prompt adherence compared to other models in the market. Prompt adherence refers to the model's ability to generate images that accurately match the input prompt, which matters to users who need precise control over the output. The company attributes the improvement to a combination of better dataset curation, captioning, and additional innovation in training protocols.
Looking forward, Stability AI plans to release ControlNets for Stable Diffusion 3.5. ControlNets will provide more control for professional use cases, such as upscaling an image while maintaining its overall colors or creating an image that follows a specific depth pattern.
Availability and Licensing
All three new Stable Diffusion 3.5 models are available under the Stability AI Community License, which is an open license that enables free non-commercial usage and free commercial usage for entities with annual revenue under $1 million. Stability AI has an enterprise license for larger deployments. The models are available via Stability AI's API as well as Hugging Face.
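For readers who want to try the models from Hugging Face, the following is a minimal sketch using the `diffusers` library's `StableDiffusion3Pipeline`. The repository IDs and the step and guidance values are assumptions based on the public model cards rather than anything stated in this article, and the `VARIANTS` table and helper functions are hypothetical names introduced for illustration. It also assumes `torch` and `diffusers` are installed, a CUDA GPU is available, and the license has been accepted on the model page.

```python
# Sketch only. Repository IDs and sampler settings below are assumptions
# taken from the public model cards; verify them before relying on this.
VARIANTS = {
    "large": {
        "repo": "stabilityai/stable-diffusion-3.5-large",
        "num_inference_steps": 28,
        "guidance_scale": 3.5,
    },
    "large-turbo": {
        "repo": "stabilityai/stable-diffusion-3.5-large-turbo",
        "num_inference_steps": 4,   # distilled model: few steps
        "guidance_scale": 0.0,      # distilled model: no classifier-free guidance
    },
    "medium": {
        "repo": "stabilityai/stable-diffusion-3.5-medium",
        "num_inference_steps": 40,
        "guidance_scale": 4.5,
    },
}


def generation_settings(variant):
    """Return the sampler settings for a variant, without the repository ID."""
    cfg = VARIANTS[variant]
    return {key: value for key, value in cfg.items() if key != "repo"}


def generate(variant, prompt):
    """Load the chosen variant and return one image for the prompt."""
    # Imported here so the settings table can be inspected without a GPU stack.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(
        VARIANTS[variant]["repo"], torch_dtype=torch.bfloat16
    )
    pipe = pipe.to("cuda")
    return pipe(prompt, **generation_settings(variant)).images[0]


# Example (downloads several gigabytes of weights on first run):
# image = generate("large-turbo", "a watercolor painting of a lighthouse at dawn")
# image.save("lighthouse.png")
```

Keeping the per-variant settings in one table makes it easy to switch between the quality-oriented Large model and the faster Turbo model without changing the calling code.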
In conclusion, Stable Diffusion 3.5 represents a significant update to Stability AI's text-to-image generative AI technology. With customizable model variants, improved quality, and techniques such as Query-Key Normalization and MMDiT-X, the company is positioning itself to reclaim its leadership position in an increasingly competitive market.