Formalizing Validation, Disabling and Enabling Stakers, Staker Termination
In this post, I will first continue last week’s discussion of validation and then describe this week’s development focus.
Formalizing the Assertion that Validation Defines a System
Upon reflection, I realized that although I said last week that validation was important, I did not get at the core reason why validation defines a system. Here, a system is defined as a network of connected nodes that seek to achieve consensus. An example of such a system is the Bitcoin network, which seeks to achieve consensus on the state of the Bitcoin ledger.
To show that validation defines a system, I need to formalize two very different systems and show how they differ in their ability to achieve consensus.
Before presenting these two systems, we have to define a “system” as a state machine, which is a very common formalization in cryptocurrencies. For example, Bitcoin is a state machine whose state is the UTXO set (the set of unspent transaction outputs, i.e., the unspent BTC). A more concrete example is a ledger with two accounts, Bob and Alice. One state of this ledger would be that Bob’s account has $10 and Alice’s account has $5. Suppose the state machine holding this ledger receives a message saying that Bob’s account must transfer $5 to Alice’s account. Since Bob has at least $5, this message is valid, and the ledger undergoes a state change: Bob’s account now has $5 and Alice’s account has $10.
To further this formalization, it is convenient to decompose the system into three parts:
- A messaging component
- A validating component
- An operational component
In the ledger example, the three components work together as follows: the messaging component creates the message “Transfer $5 from Bob to Alice”, the validating component ensures that Bob has at least $5, and the operational component adjusts the balances. For the purposes of this discussion, we can ignore the operational component because it only performs operations that pass the validating component. That leaves only the messaging component and the validating component.
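To make the decomposition concrete, here is a minimal sketch of the two-account ledger as a state machine, split into the validating and operational components (the `Transfer` message plays the role of the messaging component's output). All names are hypothetical, and requiring a positive amount is an added assumption.

```python
from dataclasses import dataclass

Ledger = dict[str, int]  # state: account name -> balance in dollars

@dataclass(frozen=True)
class Transfer:
    """A message produced by the messaging component (hypothetical type)."""
    sender: str
    recipient: str
    amount: int

def validate(state: Ledger, msg: Transfer) -> bool:
    """Validating component: the sender must hold at least `amount`.
    Rejecting non-positive amounts is an assumption, not from the post."""
    return msg.amount > 0 and state.get(msg.sender, 0) >= msg.amount

def operate(state: Ledger, msg: Transfer) -> Ledger:
    """Operational component: adjust balances. Only called on valid messages."""
    new_state = dict(state)
    new_state[msg.sender] -= msg.amount
    new_state[msg.recipient] = new_state.get(msg.recipient, 0) + msg.amount
    return new_state

def step(state: Ledger, msg: Transfer) -> Ledger:
    """One state-machine transition: invalid messages leave the state unchanged."""
    return operate(state, msg) if validate(state, msg) else state

state: Ledger = {"Bob": 10, "Alice": 5}
state = step(state, Transfer("Bob", "Alice", 5))
print(state)  # {'Bob': 5, 'Alice': 10}
```

Because `operate` is only ever reached through `validate`, the operational component cannot change the state on its own, which is why it can be set aside in the argument.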
What are the sources of these messages? This question is actually very important, but the answer is quite simple: it doesn’t matter. What matters, however, is the validating component.
To understand why the nature of the validating component is critical but the nature of the messaging component is irrelevant, imagine two systems:
In the first system, the messaging component is a room full of monkeys typing on computer terminals. These monkeys essentially create random messages, overwhelmingly producing nonsense. Every once in a while, the monkeys will write a Shakespearean sonnet. Sometimes the monkeys will even write a message that indicates a state transition for a ledger. For example, one of the monkeys might randomly type “transfer $5 from Bob to Alice”. Let’s imagine these monkey messages are fed into a perfectly functioning validator for a ledger.
In the second system, the messaging component is an omniscient observer that can ascertain the state of the network and create messages specifying state transitions that all nodes would accept IF all nodes had a perfect validating component. In this system, however, the validating component is one where each node accepts or rejects messages randomly. For example, let’s say all nodes begin in a state where Bob has a balance of $10. The messaging component creates a message specifying “transfer $5 from Bob to Alice”. Each node receives this message and rejects it at random, say 50% of the time.
Which of these two systems will be able to achieve consensus? Obviously, the first system will, because every state transition must pass perfect validation. In the second system, even though the messaging component is perfect, the lack of real validation causes the nodes to diverge quickly. Even if the network consists of only a handful of nodes, one would expect the second system never to achieve consensus beyond the initial state.
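The thought experiment can be checked with a small simulation. This is a sketch under several assumptions not stated in the post: every node starts in the same state and receives the same messages in the same order, the monkeys’ output is modeled as random strings over a tiny alphabet with an occasional well-formed transfer, and the omniscient observer is modeled as a generator that tracks the state a perfect validator would reach. All function names are hypothetical.

```python
import random
import re
from typing import Callable, Optional

Ledger = dict[str, int]
Tx = tuple[str, str, int]  # (sender, recipient, amount)
Validator = Callable[[Ledger, Optional[Tx], random.Random], bool]

PATTERN = re.compile(r"transfer \$(\d+) from (\w+) to (\w+)")

def parse(msg: str) -> Optional[Tx]:
    """Turn a text message into a transfer, or None if it is not well-formed."""
    m = PATTERN.fullmatch(msg)
    return (m.group(2), m.group(3), int(m.group(1))) if m else None

def operate(state: Ledger, tx: Tx) -> Ledger:
    """Operational component: move the funds, with no checks of its own."""
    sender, recipient, amount = tx
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def perfect_validator(state: Ledger, tx: Optional[Tx], rng: random.Random) -> bool:
    """System 1: accept only well-formed transfers the sender can afford."""
    return tx is not None and state.get(tx[0], 0) >= tx[2]

def coin_flip_validator(state: Ledger, tx: Optional[Tx], rng: random.Random) -> bool:
    """System 2: ignore the content and accept with probability 1/2."""
    return tx is not None and rng.random() < 0.5

def monkey_message(rng: random.Random) -> str:
    """System 1 messenger: mostly gibberish, occasionally a transfer (valid or not)."""
    if rng.random() < 0.9:
        return "".join(rng.choice("abcdefghij $") for _ in range(12))
    sender, recipient = rng.sample(["Bob", "Alice"], 2)
    return f"transfer ${rng.randint(1, 20)} from {sender} to {recipient}"

def omniscient_messages(start: Ledger, count: int, rng: random.Random) -> list[str]:
    """System 2 messenger: only transfers a perfect validator would accept."""
    state, out = dict(start), []
    for _ in range(count):
        sender = rng.choice([a for a, bal in state.items() if bal > 0])
        recipient = next(a for a in state if a != sender)
        amount = rng.randint(1, state[sender])
        out.append(f"transfer ${amount} from {sender} to {recipient}")
        state = operate(state, (sender, recipient, amount))
    return out

def run(messages: list[str], validator: Validator, n_nodes: int, start: Ledger, seed: int) -> list[Ledger]:
    """Broadcast the same messages, in order, to every node; return final states."""
    rng = random.Random(seed)
    nodes = [dict(start) for _ in range(n_nodes)]
    for msg in messages:
        tx = parse(msg)
        # Each node runs its own validator; operate() is reached only on acceptance.
        nodes = [operate(n, tx) if validator(n, tx, rng) else n for n in nodes]
    return nodes

def in_consensus(nodes: list[Ledger]) -> bool:
    return all(node == nodes[0] for node in nodes)

start: Ledger = {"Bob": 10, "Alice": 5}
msg_rng = random.Random(42)
monkey_stream = [monkey_message(msg_rng) for _ in range(1000)]
oracle_stream = omniscient_messages(start, 100, msg_rng)
system1 = run(monkey_stream, perfect_validator, n_nodes=5, start=start, seed=1)
system2 = run(oracle_stream, coin_flip_validator, n_nodes=5, start=start, seed=1)
print(in_consensus(system1))  # True: every node applies the same deterministic rule
print(in_consensus(system2))  # almost surely False: nodes diverge after a few messages
```

The only thing the two runs share is the message-independent part of the setup; swapping in a deterministic validator is what restores agreement, regardless of how poor the message source is.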
-------
Development Progress
Code Reorganization
This week began with code reorganization and wrapping up the basics of block and transaction validation, which I discussed last week. The reorganization simply means that each class now has its own header and source file. This is already improving my development speed because the header files document the classes, making it easy for me to add new functionality without duplicating work or creating inconsistent functionality.
Accounting for Block Production and Missed Blocks
After this reorganization, I started working on what happens when stakers produce or miss blocks. Additionally, I wrote the logic that accounts for a staker’s immediate predecessor in the round missing its block. Below, I discuss how these events factor into staker termination.
Staker Disqualification and Termination
I have also changed when stakers are terminated and adjusted the terminology; the new terms are in quotation marks. Before stakers are “terminated”, they are “disqualified”. A staker can be disqualified immediately after it misses or produces a block to which it is assigned. A disqualified staker is marked for termination. Stakers are terminated at the end of a round, when they are removed from the registry. Disqualified stakers must be kept until the end of the round because the tests for disqualification depend on the statistics of all stakers assigned to that round.
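The two-phase lifecycle can be sketched as follows: statistics (including predecessor misses) are updated slot by slot, a staker is marked as disqualified right after its own slot, and removal from the registry waits until the round ends. This is a minimal sketch, not the project’s code: the class and field names are hypothetical, the disqualification test is passed in as a parameter (the real criteria are described later in this post), and treating the first slot of a round as having no predecessor is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Staker:
    """Hypothetical per-staker statistics; field names are illustrative."""
    staker_id: str
    produced: int = 0
    missed: int = 0
    predecessor_missed: int = 0  # times the staker just before it in a round missed
    disqualified: bool = False

# A disqualification test sees the staker and every staker assigned to the round.
DisqualifyTest = Callable[[Staker, list[Staker]], bool]

class Registry:
    def __init__(self, stakers: list[Staker]) -> None:
        self.stakers = {s.staker_id: s for s in stakers}

    def run_round(self, order: list[str], produced: list[bool], test: DisqualifyTest) -> list[str]:
        """Process one round slot by slot, then terminate disqualified stakers.

        order[i] is the staker assigned to slot i; produced[i] says whether it
        produced its block. Returns the ids terminated at the end of the round.
        """
        round_stakers = [self.stakers[sid] for sid in order]
        for i, staker in enumerate(round_stakers):
            if produced[i]:
                staker.produced += 1
            else:
                staker.missed += 1
            # Assumption: slot 0 has no predecessor within the round.
            if i > 0 and not produced[i - 1]:
                staker.predecessor_missed += 1
            # Disqualify immediately, but keep the staker in the registry:
            # later tests in this round still need its statistics.
            if not staker.disqualified and test(staker, round_stakers):
                staker.disqualified = True
        # Termination (removal from the registry) happens only at the end of the round.
        terminated = [sid for sid, s in self.stakers.items() if s.disqualified]
        for sid in terminated:
            del self.stakers[sid]
        return terminated

registry = Registry([Staker("a"), Staker("b"), Staker("c")])
toy_test: DisqualifyTest = lambda s, _: s.missed > s.produced  # toy rule, NOT the post's criteria
print(registry.run_round(["a", "b", "c"], [True, False, True], toy_test))  # ['b']
print(sorted(registry.stakers))  # ['a', 'c']
```

Note that when "c" is evaluated, "b" is already disqualified but still present in `round_stakers`, which is exactly why removal is deferred.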
Protocol Improvement: Enabling/Disabling Stakers
One protocol improvement I realized was necessary is the ability to disable stakers. Importantly, a disabled staker is not assigned blocks. Disabled stakers do not earn block rewards, but they do not accrue negative statistics, such as missed blocks, either. Disabling and enabling stakers will be an inexpensive and seamless process. The ability to disable a staker has many applications. For example, a staker owner could run a failover service that disables the staker if the staker’s delegate begins missing blocks, preventing the staker from being terminated or losing earning capacity.
Stakers begin life disabled so that, when purchasing stakers, owners do not need to account for the time it takes to make them operational. I believe this change will facilitate greater aftermarket liquidity.
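The effect of disabling on block assignment and statistics can be sketched like this: only enabled stakers receive slots, so a disabled staker can neither earn nor miss blocks. A minimal sketch with hypothetical names; assigning slots in registry order is an assumption, since the post does not say how slots are ordered.

```python
from dataclasses import dataclass

@dataclass
class Staker:
    """Hypothetical staker record; only the fields needed for this sketch."""
    staker_id: str
    enabled: bool = False  # stakers begin life disabled
    produced: int = 0
    missed: int = 0

def assign_slots(stakers: list[Staker]) -> list[str]:
    """Assign a round's block slots to enabled stakers only.

    Registry order is an assumption; the real assignment order may differ.
    """
    return [s.staker_id for s in stakers if s.enabled]

def run_round(stakers: list[Staker], online: set[str]) -> list[str]:
    """Credit produced or missed blocks; disabled stakers are never touched."""
    slots = assign_slots(stakers)
    by_id = {s.staker_id: s for s in stakers}
    for sid in slots:
        if sid in online:
            by_id[sid].produced += 1
        else:
            by_id[sid].missed += 1
    return slots

a, b = Staker("a"), Staker("b")
print(assign_slots([a, b]))  # []: newly created stakers are disabled
a.enabled = True             # a is enabled (via its controller key)
# b stays disabled, e.g. because a failover service disabled it when its delegate went down
print(run_round([a, b], online={"a"}))  # ['a']
print(a.produced, b.missed)  # 1 0
```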
Protocol Improvement: Controller Pubkey
The ability to use two different services, one for delegation and one for enabling/disabling stakers, comes from another protocol improvement: stakers have a third public key, called the controller pubkey. A staker owner can assign a delegate pubkey, which is used to produce blocks. The delegate pubkey is distinct from the controller pubkey, which is used to enable or disable the staker. Both of these pubkeys can also be different from the owner pubkey. This separation of privileges is much more secure than assigning all privileges to a single key. For example, imagine that a delegate is hacked and the hacker decides to sabotage the staker by forcing it to miss blocks. If the failover service (tied to the controller pubkey) is hosted separately, the hacker would also have to hack the failover service to get the staker terminated. Because the owner key can be different from the delegate and controller keys, the owner is protected from theft even if either or both of those keys are compromised.
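The privilege separation amounts to a table mapping each action to the one key allowed to authorize it. A minimal sketch: the action names are hypothetical, keys are plain strings standing in for already signature-verified pubkeys, and letting only the owner set the controller key (and only the controller enable or disable) are assumptions beyond what the post states.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    PRODUCE_BLOCK = "produce_block"
    ENABLE = "enable"
    DISABLE = "disable"
    SET_DELEGATE = "set_delegate"
    SET_CONTROLLER = "set_controller"

@dataclass
class StakerKeys:
    owner: str
    delegate: str
    controller: str

# Which key may authorize each action (the setter permissions are assumptions).
REQUIRED_KEY = {
    Action.PRODUCE_BLOCK: "delegate",
    Action.ENABLE: "controller",
    Action.DISABLE: "controller",
    Action.SET_DELEGATE: "owner",
    Action.SET_CONTROLLER: "owner",
}

def authorized(keys: StakerKeys, action: Action, signer: str) -> bool:
    """True if `signer` (a pubkey whose signature was already checked) may perform `action`."""
    return getattr(keys, REQUIRED_KEY[action]) == signer

keys = StakerKeys(owner="pk_owner", delegate="pk_delegate", controller="pk_failover")
stolen = "pk_delegate"  # the attacker has compromised only the delegate key
print([a.value for a in Action if authorized(keys, a, stolen)])  # ['produce_block']
```

With only the delegate key, the attacker can make the staker miss blocks but cannot stop the failover service from disabling it, nor reassign any keys.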
Criteria for Staker Termination
Finally, I believe I have settled on the criteria for staker termination. In all cases, a staker will be terminated if the number of blocks missed by its immediate predecessors over the last 4096 rounds exceeds the mean for all stakers by more than seven standard deviations. The likelihood of this happening by chance for a single staker in a single round is about 1 in 781 billion. In other words, if there are 30 stakers, a given staker would have about a 50% chance of being terminated after 1,857,000 years.
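The 1-in-781-billion figure matches the upper tail of a normal distribution beyond seven standard deviations. A minimal sketch, assuming predecessor-miss counts are approximately normally distributed; it stops at a number of rounds, because converting rounds to years requires a round duration that the post does not state.

```python
import math

def upper_tail(k: float) -> float:
    """P(Z > k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

p = upper_tail(7)  # chance per staker per round under the normal assumption
print(f"1 in {1 / p:.3e}")  # about 7.81e+11, i.e. roughly 781 billion

# Rounds until a single staker has a 50% chance of at least one trigger:
# solve (1 - p)^n = 0.5, using log1p for accuracy with such a tiny p.
n_half = math.log(0.5) / math.log1p(-p)
print(f"{n_half:.3e} rounds")  # roughly 5.4e+11 rounds
```

For such a small `p`, the number of rounds to a 50% chance is about ln 2 / p, i.e. roughly 0.69 times the mean waiting time of 1 / p rounds.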
Suppose that, for a set of 30 stakers, the immediate predecessors missed an average of 7 blocks over the last 4096 rounds, with a standard deviation of 4. To be disqualified, a staker’s immediate predecessors would have to miss more than 35 blocks (7 + 7 × 4), which is expected to happen randomly only once every 3,714,000 years, on average.
Independently of this criterion, a staker is also terminated if it has been assigned at least 8192 blocks and either
- has 0 or fewer net blocks (net blocks = produced blocks − missed blocks), or
- has missed more than 2048 of its last 4096 assigned blocks.
These latter rules mean that, even though the criterion for predecessor misses is statistically very generous, a malicious staker has no practical way to disqualify other stakers by forcing them to miss blocks.
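All three termination rules can be combined into a single check, which also reproduces the worked threshold of 35 from the example above. A minimal sketch under assumptions: the mean and standard deviation of predecessor misses across all stakers are passed in precomputed, “the last 4096 blocks” is read as the staker’s last 4096 assigned blocks, a staker’s full assignment history is kept as a list for simplicity, and all names are hypothetical.

```python
from collections.abc import Sequence

SIGMAS = 7               # predecessor-miss threshold, in standard deviations
MIN_ASSIGNED = 8192      # assigned blocks required before the other two rules apply
WINDOW = 4096            # recent assigned blocks examined by the miss-rate rule
MAX_RECENT_MISSES = 2048

def predecessor_threshold(mean: float, sd: float) -> float:
    """Predecessor misses above this (over the last 4096 rounds) trigger termination."""
    return mean + SIGMAS * sd

def should_terminate(pred_misses: int, mean: float, sd: float, history: Sequence[bool]) -> bool:
    """history[i] is True if the staker produced its i-th assigned block, False if it missed it."""
    if pred_misses > predecessor_threshold(mean, sd):
        return True
    if len(history) < MIN_ASSIGNED:
        return False
    produced = sum(history)
    net_blocks = produced - (len(history) - produced)
    recent_misses = list(history[-WINDOW:]).count(False)
    return net_blocks <= 0 or recent_misses > MAX_RECENT_MISSES

print(predecessor_threshold(7, 4))                                # 35
print(should_terminate(36, 7, 4, [True] * 10))                    # True: 36 > 35
print(should_terminate(30, 7, 4, [False] * 100))                  # False: fewer than 8192 assigned
print(should_terminate(0, 7, 4, [True] * 6000 + [False] * 2200))  # True: 2200 of last 4096 missed
```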
–––––
Hondo