It indicates code quality in some cases and not others. Used properly, yes, it can measure code quality, but this report used it improperly in three major ways:
- They compared result counts numerically across different languages.
- They combined results in security-critical code with results in non-critical code.
- They didn't survey a statistical sample of their results to determine how many reflected actual bugs.
The third point is the most important: if they had done this for the Ripple results, they would have discovered that all of those results were false positives. Ripple already uses this exact same technique to find and fix actual issues, so what remains for an outside report to flag is noise.
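To make the third point concrete, here is a minimal sketch (all names are hypothetical, nothing is drawn from the report itself) of how a reviewer might triage a random sample of findings and estimate what fraction are real bugs:

```python
import math
import random

def sample_findings(findings, sample_size, seed=0):
    """Draw a reproducible random sample of findings for manual review."""
    rng = random.Random(seed)
    return rng.sample(findings, min(sample_size, len(findings)))

def estimate_true_positive_rate(reviewed):
    """reviewed: list of (finding, is_real_bug) pairs from manual triage.
    Returns the point estimate and a ~95% normal-approximation interval."""
    n = len(reviewed)
    hits = sum(1 for _, is_real_bug in reviewed if is_real_bug)
    p = hits / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)  # Wald interval; fine for a sketch
    return p, max(0.0, p - margin), min(1.0, p + margin)

if __name__ == "__main__":
    # Hypothetical report: 1,200 warnings; a human reviews 50 of them.
    findings = [f"warning-{i}" for i in range(1200)]
    sample = sample_findings(findings, 50)
    # Suppose triage confirms 2 of the 50 as real bugs.
    reviewed = [(f, i >= 48) for i, f in enumerate(sample)]
    print(estimate_true_positive_rate(reviewed))  # ~4%, +/- ~5 points
```

Even a sample of a few dozen findings is enough to tell "mostly real bugs" apart from "mostly noise", which is exactly the check this report skipped.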
Formal verification is not yet ready for real-world use on systems as complex as the ones we're dealing with. We follow the research, and there's certainly hope that it will become possible someday. In particular, we've looked into formally verifying aspects of ILP's design (with some actual successes, and some issues found that way) and the mechanics of consensus and validation (with very limited success to date). Much more research is needed in this area before it will be practical on a larger scale.
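To give a sense of scale, here is the kind of small, isolated property that formal verification handles well today, written as a toy Lean 4 proof. The `Ledger` and `transfer` model is invented for this sketch and is not taken from ILP's design or any Ripple codebase; proving a local invariant like this is tractable now, while whole-system proofs about consensus are still out of reach.

```lean
-- Toy model: two balances and a transfer between them (hypothetical).
structure Ledger where
  a : Nat  -- balance of account A
  b : Nat  -- balance of account B

-- Move `amt` from A to B, given a proof that A can cover it.
def transfer (l : Ledger) (amt : Nat) (_ : amt ≤ l.a) : Ledger :=
  { a := l.a - amt, b := l.b + amt }

-- The invariant: a transfer never creates or destroys value.
theorem transfer_conserves (l : Ledger) (amt : Nat) (h : amt ≤ l.a) :
    (transfer l amt h).a + (transfer l amt h).b = l.a + l.b := by
  simp only [transfer]
  omega
```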