RE: LeoThread 2025-02-17 08:49

Part 6/10:

Initial tests ran first on a single Mac Studio and then on the full cluster. Individual devices handled smaller models efficiently, but the cluster ran into bandwidth limitations: token throughput dropped from 117 to 29 tokens per second once the entire group was engaged.
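To put those two figures in perspective, here is a minimal Python sketch of the arithmetic. Only the 117 and 29 tokens-per-second values come from the tests described above; the function names are illustrative assumptions and not something from the video.

```python
# Throughput figures quoted in the summary (tokens per second).
BASELINE_TPS = 117.0   # rate before the entire cluster was engaged
CLUSTER_TPS = 29.0     # rate with the entire cluster engaged


def slowdown_factor(before: float, after: float) -> float:
    """How many times slower the second rate is than the first."""
    return before / after


def percent_drop(before: float, after: float) -> float:
    """Relative throughput loss, as a percentage of the first rate."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    print(f"slowdown: {slowdown_factor(BASELINE_TPS, CLUSTER_TPS):.2f}x")
    print(f"drop: {percent_drop(BASELINE_TPS, CLUSTER_TPS):.1f}%")
```

In other words, the full cluster generated tokens roughly four times more slowly, which is about a three-quarters loss in throughput.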

The creator was frustrated by the reduced performance in larger setups but also saw improvements when using Thunderbolt connections, suggesting that even small changes could unlock better performance.

Exploring Quantization and Memory Management