RE: LeoThread 2024-12-27 09:16

in LeoFinance, 20 days ago

Local ChatGPT-Style Models on Various Hardware

Dave runs local ChatGPT-style large language models on a range of hardware, from a $50 Raspberry Pi to a $50,000 AI workstation, to compare performance and usability.

Summarized by Llama 3.3 70B Instruct Model

Testing on a Raspberry Pi

  • 📊 Dave installs LLaMA on a Raspberry Pi 4 with 8 GB of RAM, running Raspbian.
  • ⚠️ The model runs slowly, at about one word per second, because the Pi has no GPU it can use for inference and only limited CPU power.
  • 📝 The test shows that while the Pi can run the model, it's not practical for real-time use.
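
To put "one word per second" in perspective, here is a minimal sketch of the arithmetic. The 150-word reply length is an illustrative assumption, not a figure from the video, and `seconds_to_generate` is a hypothetical helper name.

```python
def seconds_to_generate(word_count: int, words_per_second: float) -> float:
    """Return the wait time in seconds for a reply of word_count words."""
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    return word_count / words_per_second


# Assumption: a typical chat answer is about 150 words.
# The rate of 1.0 word per second is the rough figure reported for the Pi.
pi_wait = seconds_to_generate(150, 1.0)
print(f"Pi: {pi_wait / 60:.1f} minutes")  # prints "Pi: 2.5 minutes"
```

Waiting a couple of minutes for a single short answer is why the Pi result counts as "works, but not practical".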

Testing on a Mini PC

  • 🖥️ Dave tests LLaMA on a Herk Mini PC, which starts at $388 and features a Ryzen 9 7940HS chip and Radeon 780M GPU.
  • 💻 He installs LLaMA directly on Windows and runs the Llama 3.1 model, which performs well but doesn't use the GPU because of the GPU's limited memory.
  • 📊 He then tests the smaller Llama 3.2 model, which is faster but still doesn't use the GPU, likely because of compatibility issues.

Testing on Higher-End Hardware

  • 🤖 Dave moves on to a Threadripper 3970X with an Nvidia RTX 4080 GPU, which runs the Llama 3.1 model quickly and uses the GPU.
  • 📊 He also tests the model on an M2 Mac Pro, which performs well and can allocate system RAM as video RAM.
  • 🚀 Finally, he tests a 96-core Threadripper with an Nvidia RTX 6000 Ada card, which struggles to run a massive 405-billion-parameter model.

Conclusion

  • 📊 The size of the model and its complexity have a significant impact on performance, regardless of the hardware used.
  • 📈 Dave concludes that choosing the right model is crucial, and that even high-end hardware can be brought to its knees by large models.
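
A rough way to see why model size dominates is to estimate how much memory the weights alone need and compare that with the GPU's memory. The sketch below is a back-of-the-envelope estimate, not Dave's method. It assumes 4-bit quantized weights and an 8-billion-parameter variant for the smaller model, neither of which the summary states, and it ignores the context cache and runtime overhead. The 16 GB and 48 GB figures are the published memory sizes of the RTX 4080 and RTX 6000 Ada. `weights_gib` and `fits` are hypothetical helper names.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion: float, bits_per_weight: float, memory_gib: float) -> bool:
    """True if the weights alone fit in the given memory (overhead ignored)."""
    return weights_gib(params_billion, bits_per_weight) <= memory_gib


# Assumption: 4-bit quantization for both models.
print(f"8B model:   {weights_gib(8, 4):.1f} GiB")    # about 3.7 GiB
print(f"405B model: {weights_gib(405, 4):.1f} GiB")  # about 188.6 GiB

print(fits(8, 4, 16))    # True: fits in a 16 GB RTX 4080
print(fits(405, 4, 48))  # False: far exceeds a 48 GB RTX 6000 Ada
```

When the weights do not fit in GPU memory, most of the model has to be served from system RAM and the CPU, which is consistent with even the workstation struggling on the 405-billion-parameter model.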