RE: LeoThread 2024-09-25 05:16

in LeoFinance, 6 months ago

Technical Advancements

  • Adapter Weights: Meta integrated an image encoder into the language model using adapter weights, enabling image reasoning capabilities.
  • Cross-Attention Layers: The adapter consists of cross-attention layers that feed image encoder representations into the language model.
  • Alignment Training: Meta employed alignment training, supervised fine-tuning, rejection sampling, and direct preference optimization to refine the model.
  • Synthetic Data Generation: Llama 3.1 was used to generate synthetic question-answer pairs on top of in-domain images.
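
To make the cross-attention point a bit more concrete, here is a rough sketch of how text tokens can attend to image encoder outputs through an adapter layer. This is not Meta's actual code. The function names, the tiny dimensions, the random weights, and the tanh gate on the residual are all assumptions chosen for illustration, and it uses plain Python lists so it runs without any extra libraries.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def cross_attention_adapter(text_hidden, image_feats, w_q, w_k, w_v, gate):
    """Single-head cross-attention: text tokens (queries) attend to image patches.

    text_hidden: (num_tokens, d_model) hidden states from the language model
    image_feats: (num_patches, d_img) representations from the image encoder
    gate: scalar passed through tanh (an assumed design); gate=0 leaves the
          language model's hidden states unchanged.
    """
    q = matmul(text_hidden, w_q)   # (num_tokens, d_k)
    k = matmul(image_feats, w_k)   # (num_patches, d_k)
    v = matmul(image_feats, w_v)   # (num_patches, d_model)

    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))  # (num_tokens, num_patches)
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, v)     # (num_tokens, d_model)

    g = math.tanh(gate)
    # Residual connection: image information is added on top of the text states.
    fused = [
        [h + g * a for h, a in zip(h_row, a_row)]
        for h_row, a_row in zip(text_hidden, attended)
    ]
    return fused, weights


rng = random.Random(0)
text_hidden = random_matrix(3, 4, rng)   # 3 text tokens, d_model = 4 (toy sizes)
image_feats = random_matrix(5, 6, rng)   # 5 image patches, d_img = 6 (toy sizes)
w_q = random_matrix(4, 4, rng)
w_k = random_matrix(6, 4, rng)
w_v = random_matrix(6, 4, rng)

fused, weights = cross_attention_adapter(text_hidden, image_feats, w_q, w_k, w_v, gate=0.5)
unchanged, _ = cross_attention_adapter(text_hidden, image_feats, w_q, w_k, w_v, gate=0.0)
```

In this sketch only `w_q`, `w_k`, `w_v`, and `gate` belong to the adapter, which is one way to read "adapter weights": the image pathway can be trained as an add-on to the existing language model. Setting `gate=0.0` returns the text hidden states untouched.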