
Part 5/6:

Beyond basic quantization, the GGUF format developed by Georgi Gerganov takes things a step further. It packages the model weights, quantization parameters, and metadata such as the tokenizer and architecture details into a single, efficient file that tools like llama.cpp can load directly. The smallest Llama 3.2 model, for example, can be converted to a GGUF file of roughly 800 MB and run entirely on the CPU, without requiring any GPU resources.
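
As a rough illustration of how such a file can be used, here is a minimal sketch that loads a GGUF model through the community llama-cpp-python bindings and runs it on the CPU. The file name, thread count, and prompt are placeholders for illustration, not part of the original post.

```python
# Minimal sketch: CPU-only inference on a quantized GGUF model
# using the llama-cpp-python bindings (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.2-1b-instruct-q4_k_m.gguf",  # placeholder: any downloaded GGUF file
    n_ctx=2048,       # context window size
    n_threads=4,      # number of CPU threads to use
    n_gpu_layers=0,   # 0 = keep every layer on the CPU, no GPU needed
)

output = llm(
    "Explain quantization in one sentence.",  # placeholder prompt
    max_tokens=64,
)
print(output["choices"][0]["text"])
```

With n_gpu_layers set to 0, inference stays entirely on the CPU, which is exactly the scenario described above.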

Conclusion