The ever-shrinking model
According to a study from researchers at Harvard, Stanford, MIT, Databricks, and Carnegie Mellon, quantization degrades a model more severely when the original, unquantized version was trained for a long time on lots of data. In other words, at a certain point it may actually be better to just train a smaller model rather than cook down a big one.
That could spell bad news for AI companies training extremely large models (known to improve answer quality) and then quantizing them in an effort to make them less expensive to serve.
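To make the idea concrete, here is a minimal sketch of what post-training quantization does in practice: weights stored in full precision are mapped to low-precision integers so the model takes less memory and compute to serve, at the cost of some rounding error. This toy example uses symmetric 8-bit quantization on a made-up weight matrix; it is an illustration of the general technique, not the specific setup or method used in the study.

```python
import numpy as np

# Hypothetical fp32 "weights" standing in for one layer of a model
# (values here are random and purely illustrative).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(4096, 4096)).astype(np.float32)

# Symmetric 8-bit quantization: map each value to an integer in [-127, 127]
# using a single per-tensor scale -- one of the simplest common schemes.
scale = np.abs(weights).max() / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Serving keeps the int8 weights (4x smaller than fp32); dequantizing
# recovers only an approximation of the original values.
dequantized = quantized.astype(np.float32) * scale
error = np.abs(weights - dequantized).mean()

print(f"fp32 size: {weights.nbytes / 1e6:.1f} MB")
print(f"int8 size: {quantized.nbytes / 1e6:.1f} MB")
print(f"mean round-trip error: {error:.6f}")
```

The memory savings are real and immediate, which is exactly why serving large models in lower precision is so attractive; the study's point is that the rounding error becomes harder for the model to absorb the more heavily the original was trained.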