Results
– L-Mul with a 4-bit mantissa: precision comparable to float8 e4m3 multiplication (see the sketch after this list)
– L-Mul with a 3-bit mantissa: outperforms float8 e5m2
– Replacing attention-mechanism multiplications with L-Mul: 0.07% average performance loss across NLP tasks
– Vision tasks: 0.12% accuracy improvement
– Full-model fine-tuning with L-Mul: results equivalent to float8 e4m3 accumulation precision
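For context on what the mantissa-bit setting controls, here is a minimal, float-level sketch of the L-Mul idea: the mantissa product is replaced by a small constant offset, so a multiplication reduces to additions of exponents and (quantized) mantissas. This is an illustrative assumption of how the approximation behaves, not the paper's reference or hardware implementation; the function name `l_mul`, the `frexp`-based decomposition, and the quantization details are my own choices for readability.

```python
import math

def l_mul(x: float, y: float, mantissa_bits: int = 4) -> float:
    """Approximate x * y in the spirit of L-Mul: drop the mantissa product
    and add a constant offset 2**(-l) instead, so only additions remain.
    Float-level sketch for illustration, not a bit-exact implementation."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)

    # Decompose |x| = (1 + xm) * 2**xe with xm in [0, 1); same for y.
    xm, xe = math.frexp(abs(x))        # frexp returns mantissa in [0.5, 1)
    xm, xe = xm * 2.0 - 1.0, xe - 1
    ym, ye = math.frexp(abs(y))
    ym, ye = ym * 2.0 - 1.0, ye - 1

    # Quantize both mantissas to the requested number of bits.
    scale = 1 << mantissa_bits
    xm = math.floor(xm * scale) / scale
    ym = math.floor(ym * scale) / scale

    # Offset exponent l(m): assumed rule, roughly m for small m, capped for larger m.
    l = mantissa_bits if mantissa_bits <= 3 else (3 if mantissa_bits == 4 else 4)

    # L-Mul: (1 + xm + ym + 2**-l) * 2**(xe + ye) -- no mantissa multiplication.
    return sign * (1.0 + xm + ym + 2.0 ** (-l)) * 2.0 ** (xe + ye)


if __name__ == "__main__":
    for a, b in [(1.5, 2.25), (0.3, -0.7), (3.14159, 2.71828)]:
        print(f"{a} * {b} = {a * b:.5f}, L-Mul approx = {l_mul(a, b, 4):.5f}")
```

Running the toy comparison shows the approximation tracking the exact product to within a few percent with a 4-bit mantissa, which is the qualitative behavior the precision results above refer to; the exact error characteristics depend on the bit-level details omitted here.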