RE: LeoThread 2025-02-01 10:54

Peng wanted to test this hunch. His team started by studying the properties of a simple transformer, one with only a single layer, which learns to “pay attention” to the ordering and position of a sentence’s words when trying to predict the next word. (Modern LLMs have scores of such layers.) The team established a link between the complexity of the transformer layer and the “domain size,” or the number of bits required to represent the questions. By focusing on this simple model, they proved a mathematical bound. “If the total number of parameters in this one-layer transformer is less than the size of a domain, then transformers provably cannot solve the compositional task,” Peng said. In other words, an LLM with only one transformer layer was clearly and mathematically limited.
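
To make the comparison concrete, here is a minimal sketch (not Peng's actual construction) of a one-layer self-attention block, just to show where its parameter count comes from and how that count stacks up against a domain whose questions need, say, b bits to write down. The dimensions `d_model`, `d_head`, and `b` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def one_layer_attention(X, Wq, Wk, Wv):
    """Single self-attention layer: each token position attends over the
    whole sequence using learned query/key/value projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # position-to-position scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over positions
    return weights @ V                                # weighted mix of value vectors

d_model, d_head = 64, 64                              # illustrative layer sizes
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))

X = rng.normal(size=(5, d_model))                     # embeddings for a 5-token input
out = one_layer_attention(X, Wq, Wk, Wv)

# Rough parameter count of this single layer versus the "domain size" of a
# task whose questions take b bits to describe (so 2**b distinct questions).
n_params = 3 * d_model * d_head
b = 20                                                # hypothetical bits per question
domain_size = 2 ** b
print(f"parameters: {n_params:,}  domain size: {domain_size:,}  "
      f"layer too small? {n_params < domain_size}")
```

The point of the toy comparison is only that the layer's capacity grows with its weight matrices, while the domain grows exponentially in the number of bits per question, which is the gap the bound exploits.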