RE: LeoThread 2024-11-11 05:49

In tandem, Berkeley’s Sky Computing Lab also birthed vLLM in 2022, spearheaded by researchers Zhuohan Li, Woosuk Kwon, and Simon Mo, who started the project after building a system to distribute complex inference workloads across multiple GPUs more efficiently. vLLM leans on an attention algorithm dubbed PagedAttention, which borrows the idea of virtual-memory paging from operating systems: it stores the attention key-value cache in fixed-size blocks rather than one contiguous allocation, cutting memory fragmentation and waste. The engine is already being used by developers at companies such as AWS, Cloudflare, and Nvidia.
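
For a sense of what using vLLM looks like in practice, here is a minimal sketch based on its offline-inference API; the prompt and model name are just illustrative examples, and PagedAttention's block-based KV-cache management happens under the hood with no extra configuration.

```python
# Minimal vLLM offline-inference sketch (model name is only an example).
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

# LLM() loads the model and sets up the paged KV cache on the available GPUs.
llm = LLM(model="facebook/opt-125m")

# generate() batches the prompts and returns completed outputs.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, output.outputs[0].text)
```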