RE: LeoThread 2024-10-25 09:33

Results

– L-Mul with a 4-bit mantissa: precision comparable to float8_e4m3 multiplication
– L-Mul with a 3-bit mantissa: outperforms float8_e5m2
– Replacing the multiplications in the attention mechanism with L-Mul: 0.07% average performance loss across NLP tasks
– Vision tasks: 0.12% accuracy improvement
– Full-model fine-tuning: results equivalent to float8_e4m3 accumulation precision
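
For context, L-Mul approximates a floating-point multiplication by adding the operands' exponents and adding their truncated mantissas together with a small fixed correction term, so the expensive mantissa multiplication disappears. Here is a minimal Python sketch of that idea; the function name `l_mul`, the parameter `k`, and the arithmetic-level decomposition via `math.frexp` are illustrative choices, not the paper's hardware-level bit manipulation, and the piecewise offset follows the rule reported in the paper:

```python
import math

def l_mul(x: float, y: float, k: int = 4) -> float:
    """Approximate x * y with the L-Mul trick: add exponents and
    add k-bit mantissas plus a fixed correction, with no mantissa
    multiplication. Illustrative sketch, not a bit-level kernel."""
    if x == 0.0 or y == 0.0:
        return 0.0
    sign = math.copysign(1.0, x) * math.copysign(1.0, y)

    # Decompose |x| = (1 + mx) * 2**ex with mx in [0, 1).
    mx, ex = math.frexp(abs(x))   # frexp gives mx in [0.5, 1)
    mx, ex = 2 * mx - 1, ex - 1   # shift to the (1 + m) * 2**e form
    my, ey = math.frexp(abs(y))
    my, ey = 2 * my - 1, ey - 1

    # Truncate both mantissas to k bits, mimicking a k-bit format.
    mx = math.floor(mx * 2**k) / 2**k
    my = math.floor(my * 2**k) / 2**k

    # Offset l(k) as in the paper: k for k <= 3, 3 for k = 4, 4 for k > 4.
    l = k if k <= 3 else (3 if k == 4 else 4)

    # Key step: (1 + mx)(1 + my) ~= 1 + mx + my + 2**-l,
    # i.e. the cross term mx * my is replaced by a constant.
    return sign * (1.0 + mx + my + 2.0 ** -l) * 2.0 ** (ex + ey)
```

As a rough feel for the approximation: `l_mul(3.14, 2.71, k=4)` returns 8.0 against an exact product of about 8.51, and the gap shrinks as `k` grows. In the paper's intended hardware setting, the same step reduces to a single integer addition on the operands' bit patterns, which is where the claimed energy savings come from.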