New AI Algorithm Can Reduce LLM Energy Usage by 80-95%
The new Linear-complexity Multiplication (L-Mul) algorithm claims it can reduce energy costs by 95% for element-wise tensor multiplications and by 80% for dot products in large language models, while maintaining or even improving precision compared to 8-bit floating-point operations.
Solution in this Paper
– Approximates floating-point multiplication using integer addition (see the sketch after this list)
– Linear O(n) complexity in the operand bit width, vs O(m^2) in the mantissa width m for standard floating-point multiplication
– Replaces tensor multiplications in attention mechanisms and linear transformations
– Implements L-Mul-based attention mechanism in transformer models
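To make the core trick concrete, here is a minimal Python sketch (not from the paper) that applies the L-Mul idea to fp32 bit patterns: adding the two operands' integer encodings sums the exponents and mantissas in one step, and the paper's constant correction term 2^-l(m) stands in for the dropped mantissa-product term. The function name l_mul and the fp32 framing are illustrative assumptions; the paper targets low-bit formats such as fp8 implemented in hardware.

```python
import struct

# fp32 layout: 1 sign bit, 8 exponent bits (bias 127), 23 mantissa bits.
BIAS = 127 << 23           # exponent bias, pre-shifted to the exponent's bit position
L_OFFSET = 1 << (23 - 4)   # the paper's 2^-l(m) correction; l(m) = 4 since 23 > 4

def f2u(x: float) -> int:
    """Reinterpret a float32 bit pattern as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def u2f(u: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float32."""
    return struct.unpack("<f", struct.pack("<I", u & 0xFFFFFFFF))[0]

def l_mul(x: float, y: float) -> float:
    """Approximate x * y with a single integer addition on the bit patterns.

    Adding the raw encodings adds the exponents and the mantissas at once;
    L_OFFSET replaces the exact mantissa-product term that L-Mul drops.
    Zeros are special-cased; infinities, NaNs, denormals, and exponent
    overflow are ignored here for brevity.
    """
    if x == 0.0 or y == 0.0:
        return 0.0
    xb, yb = f2u(x), f2u(y)
    sign = (xb ^ yb) & 0x80000000                      # sign of the product
    zb = (xb & 0x7FFFFFFF) + (yb & 0x7FFFFFFF) - BIAS + L_OFFSET
    return u2f(sign | (zb & 0x7FFFFFFF))

if __name__ == "__main__":
    for a, b in [(3.14, 2.7), (0.5, 0.5), (-1.5, 4.0)]:
        print(f"{a} * {b}: exact = {a * b:.4f}, l_mul = {l_mul(a, b):.4f}")
```

On this toy fp32 scale the relative error per multiplication stays within several percent; the precision claims above come from the paper's low-bit formats and task-level evaluations summarized in the Results section.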
Key Insights from this Paper
– L-Mul achieves higher precision than 8-bit float operations with less computation
– Potential 95% energy reduction for element-wise tensor multiplications
– 80% energy reduction for dot products compared to 8-bit float operations
– Can be integrated into existing models without additional training
Results
– L-Mul with 4-bit mantissa: comparable precision to float8 e4m3
– L-Mul with 3-bit mantissa: outperforms float8 e5m2
– Attention mechanism replacement: 0.07% average performance loss across NLP tasks
– Vision tasks: 0.12% accuracy improvement
– Full model fine-tuning: equivalent results to float8 e4m3 accumulation precision