But extremely low quantization precision might not be desirable. According to Kumar, unless the original model is incredibly large in terms of its parameter count, models quantized to precisions lower than 7 or 8 bits may see a noticeable step down in quality.
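To see intuitively why fewer bits means lower quality, here is a minimal sketch (not from Kumar's paper) of plain uniform quantization: it snaps each value to one of 2^bits evenly spaced levels, and the round-trip error grows sharply as the bit count drops.

```python
# Minimal illustrative sketch: uniform quantization of made-up "weights"
# to n bits, showing how round-trip error grows as precision drops.
# (Real model quantization schemes are more sophisticated; this only
# demonstrates the basic precision trade-off.)
import random

def quantize_roundtrip(values, bits):
    """Snap each value to one of 2**bits levels over the range, then map back."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    return [lo + round((v - lo) / scale) * scale for v in values]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(10_000)]

for bits in (16, 8, 4, 2):
    recovered = quantize_roundtrip(weights, bits)
    err = max(abs(w - r) for w, r in zip(weights, recovered))
    print(f"{bits:>2}-bit  max round-trip error ~ {err:.5f}")
```

Each step down in bit count halves the number of representable levels, roughly doubling the worst-case error, which is one rough intuition for why quality can fall off noticeably below 8-bit precision.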
If this all seems a little technical, don’t worry — it is. But the takeaway is simply that AI models are not fully understood, and known shortcuts that work in many kinds of computation don’t work here. You wouldn’t say “noon” if someone asked what time a 100-meter dash started, right? A sprint is measured in hundredths of a second, so “noon” is simply the wrong level of precision. It’s not quite so obvious as that, of course, but the idea is the same: