Developing the Block Production and Registry Sync Loops
This week I worked on the two most critical processing threads in quantum Proof-of-Stake (qPoS), which are (1) the block production thread and (2) the registry synchronization thread.
In today’s post, first I discuss the structure and functions of these two threads and how they interact. Second, I briefly review this week’s development progress.
Two Key Threads for qPoS
The special requirements of qPoS mean that it will have two threads that differ from those of other consensus systems: a block production thread and a unique type of thread that synchronizes the registry.
Relative to Proof-of-Stake (PoS) and Proof-of-Work (PoW), qPoS will be vastly more efficient. With this efficiency comes critical synchronization requirements. Full nodes must undergo registry state changes accurate to within a few milliseconds, even when taking network latency into consideration. For this reason, each full node must have a synchronization thread that ensures it responds in a consensus-compatible way to blockchain events, the most critical of which is the creation and addition of new blocks to the blockchain. The creation of new blocks is called “block production”.
Block Production Thread
For Stealth, block production is currently by PoS, and in the future will be by qPoS. It is important to note that a block-producing client will always have a live thread dedicated to block production. For PoS and PoW, this dedication is needed so that the client can create blocks as fast as it can, maximizing profitability. For qPoS, block production must be continuous so that the client does not miss its infrequent requirement to create blocks. In qPoS, missed blocks not only result in lost block rewards, but also negatively impact reputation, reducing future earnings.
Figure 1: Main Loop of the Block Production Thread
Figure 1 illustrates the logic of the main loop of the block production thread. The direction of this loop is clockwise in the diagram, and the conceptual starting point is at 12:00, indicated by the grey triangle. Figure 1 illustrates key parts of qPoS logic, although the actual block production loop is much more complicated. First, the loop queries the registry to determine which staker should be signing a block at the time of the query. Then the loop queries the wallet to see if it owns this staker’s key. If so, the loop attempts to create a block and submit it to the network. Then the loop sleeps for a fraction of a second (here, 1 millisecond), thereupon completing a cycle. It is important that the only interaction this loop has with the registry (which manages the queue and stakers) is a read-only query. Other parts of the Stealth codebase query the registry too, but only the registry synchronization loop, described below, can change the registry.
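The cycle in Figure 1 can be sketched in C++ roughly as follows. This is a minimal illustration, not the Stealth implementation: Registry, Wallet, StakerID, ShouldProduce, BlockProductionLoop, the fixed slotMillis slot length, and the use of the raw system clock are all assumptions made for the example.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <set>
#include <thread>
#include <vector>

// All names below are hypothetical stand-ins, not Stealth identifiers.
typedef uint32_t StakerID;

struct Registry {
    std::vector<StakerID> queue;  // signing order for the current round
    int64_t roundStartMillis;     // when the first slot of the round begins
    int64_t slotMillis;           // assumed fixed slot length

    // Read-only query: which staker should sign at time nowMillis?
    bool CurrentStaker(int64_t nowMillis, StakerID& out) const {
        if (slotMillis <= 0 || nowMillis < roundStartMillis) return false;
        std::size_t slot = static_cast<std::size_t>(
            (nowMillis - roundStartMillis) / slotMillis);
        if (slot >= queue.size()) return false;
        out = queue[slot];
        return true;
    }
};

struct Wallet {
    std::set<StakerID> ownedStakers;  // stakers whose keys this wallet holds
    bool OwnsKeyFor(StakerID id) const { return ownedStakers.count(id) > 0; }
};

// One pass of the loop's decision: query the registry, then the wallet.
bool ShouldProduce(const Registry& registry, const Wallet& wallet,
                   int64_t nowMillis, StakerID& staker) {
    return registry.CurrentStaker(nowMillis, staker) &&
           wallet.OwnsKeyFor(staker);
}

int64_t NowMillis() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count();
}

// The registry is taken by const reference: this loop never modifies it.
// A real client would also need a lock or snapshot for cross-thread reads.
void BlockProductionLoop(const Registry& registry, const Wallet& wallet,
                         const std::atomic<bool>& stop) {
    while (!stop.load()) {
        StakerID staker = 0;
        if (ShouldProduce(registry, wallet, NowMillis(), staker)) {
            // A real client would build, sign, and broadcast a block here.
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```

Passing the registry as a const reference makes the read-only rule something the compiler can enforce inside this loop.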
Registry Synchronization Thread
As its name implies, the registry synchronization thread ensures that the state of the registry (and its queue and active stakers) is consistent with (1) network time (called “adjusted time”) and (2) blockchain events. The registry synchronization thread will be the only thread wherein changes to the registry happen. Additionally, there will be only one instance of this thread running in a given client. This design, in which a single registry undergoes state transitions only within a single thread, will avoid problems related to multithreading.
Figure 2: Main Loop of the Registry Synchronization Thread
Figure 2 illustrates the main loop of the registry synchronization thread. In this figure, the loop starts at 12:00 with the grey triangle, and goes clockwise. In the first step, the blockchain is queried for any new blocks that may have been added since the last iteration of the loop. Because they must be validated to go into the blockchain, these blocks will have timestamps that are in the past, meaning the block information must be processed before the current time is considered. The loop then updates the internal state of the registry to be consistent with this block information. Part of this update includes storing the hash of the most recent block known to the registry.
The loop then queries the network time (called “adjusted time”), which is essentially the time provided by the hardware system clock, adjusted for network clock drift. The registry state is changed according to this adjusted time, meaning that the loop itself does not “keep time”. A new queue is created, if necessary, based on all information available to the registry at this point in the loop. The loop completes when it sleeps for a fraction of a second. Both the block production and registry synchronization loops sleep briefly to reduce CPU load, even though these sleeping intervals are very small.
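This post does not spell out how the adjustment is computed, so the following is only a sketch under an assumption: that the client keeps a list of clock offsets observed from peers and applies their median to the system clock. MedianOffsetMillis, SystemTimeMillis, and AdjustedTimeMillis are hypothetical names.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median of the observed peer clock offsets (assumed adjustment rule).
// Takes the vector by value so the caller's copy is not reordered.
int64_t MedianOffsetMillis(std::vector<int64_t> offsets) {
    if (offsets.empty()) return 0;  // no peers yet: no adjustment
    std::sort(offsets.begin(), offsets.end());
    std::size_t mid = offsets.size() / 2;
    if (offsets.size() % 2 == 1) return offsets[mid];
    return (offsets[mid - 1] + offsets[mid]) / 2;
}

// Hardware-backed system clock, in milliseconds since the epoch.
int64_t SystemTimeMillis() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count();
}

// Adjusted time = system clock reading plus the network-derived offset.
int64_t AdjustedTimeMillis(int64_t systemMillis,
                           const std::vector<int64_t>& peerOffsets) {
    return systemMillis + MedianOffsetMillis(peerOffsets);
}
```

The point of the sketch is that the loop reads a time value on every cycle rather than accumulating one itself.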
The ordering of the registry synchronization loop reflects the importance of different types of information. After the sleep, the most critical information comes from blockchain events. The registry must respond to new blocks without any other considerations, as these blocks have already been validated. Only after blockchain events are considered will the registry consider time information.
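The ordering described above (blocks first, then time) can be illustrated with a single-pass sketch. Block, Registry, SyncStep, and the fixed round length are hypothetical simplifications for this example; the real registry tracks far more state.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Block {
    std::string hash;
    int64_t timeMillis;
};

struct Registry {
    std::string lastBlockHash;     // most recent block known to the registry
    int64_t roundEndMillis;        // when the current queue expires
    int64_t roundMillis;           // assumed fixed round length
    int round;                     // number of queues built so far
    std::vector<std::string> log;  // records the order of state changes

    Registry(int64_t startMillis, int64_t lengthMillis)
        : roundEndMillis(startMillis + lengthMillis),
          roundMillis(lengthMillis), round(0) {}

    void ApplyBlock(const Block& block) {
        lastBlockHash = block.hash;
        log.push_back("block " + block.hash);
    }

    // Advance state from the queried time; build a new queue if needed.
    void UpdateOnTime(int64_t adjustedMillis) {
        while (adjustedMillis >= roundEndMillis) {
            ++round;
            roundEndMillis += roundMillis;
            log.push_back("new queue");
        }
    }
};

// One cycle of the synchronization loop, minus the sleep at the end.
// Only this function mutates the registry.
void SyncStep(Registry& registry, const std::vector<Block>& newBlocks,
              int64_t adjustedMillis) {
    // 1. Already-validated blocks come first; their timestamps are in the past.
    for (std::size_t i = 0; i < newBlocks.size(); ++i) {
        registry.ApplyBlock(newBlocks[i]);
    }
    // 2. Only then is the current adjusted time considered.
    registry.UpdateOnTime(adjustedMillis);
}
```

In a running client, SyncStep would be called in a loop from the one synchronization thread, followed by a short sleep.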
Importance of the Hardware Clock for Synchronization
It may not be obvious why querying the hardware clock is better than using the loop as a timekeeper. To see why, let’s first say that the hardware clock keeps time that drifts by no more than a few milliseconds per day, and that the node resynchronizes it daily using the Network Time Protocol (NTP). During every cycle of the registry synchronization loop, this hardware clock is queried and the state of the queue (the key data structure in qPoS that requires synchronization) is updated. Updates include advancing which staker should sign the current block and eliminating and adding stakers at the end of a round. We can see that if all nodes in the network keep their system clocks updated, the states of their registries will be out of sync only by approximately the network latency.
In contrast, imagine that the loop itself is keeping time: it sleeps for a certain amount of time, then updates the registry, estimating how long each process within the loop took. In this case, operations in the loop may take different times to complete on different nodes because of hardware differences and input/output (I/O) operations. Sometimes, the loop will even freeze while waiting on hardware operations. Without querying an external source of time information, the thread has no way of knowing that it froze. These freezes and differences in operation times will rapidly cause the loop on one node to diverge from the same loop on other nodes. This example shows why the main loop used to update the registry must query an external time source (the hardware-based system clock), which itself should be kept synchronized using NTP.
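The difference between the two approaches can be made concrete with a small deterministic simulation. The function names, the 1 millisecond nominal iteration time, and the 5000 millisecond slot length are invented for this illustration and are not qPoS parameters.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Loop as timekeeper: it believes every iteration took nominalMillis.
int64_t SelfTimedElapsed(std::size_t iterations, int64_t nominalMillis) {
    return static_cast<int64_t>(iterations) * nominalMillis;
}

// Clock query: elapsed time is whatever really passed, stalls included.
int64_t ClockElapsed(const std::vector<int64_t>& actualMillis) {
    int64_t total = 0;
    for (std::size_t i = 0; i < actualMillis.size(); ++i) {
        total += actualMillis[i];
    }
    return total;
}

// Index of the queue slot that a node believes is current.
int64_t SlotAt(int64_t elapsedMillis, int64_t slotMillis) {
    return elapsedMillis / slotMillis;
}
```

Suppose four iterations really took 1, 1, 5000, and 1 milliseconds because the third one stalled on I/O. The self-timed loop believes 4 milliseconds have passed and stays in slot 0, while the clock-based calculation sees 5003 milliseconds and moves to slot 1, which is where the rest of the network would be.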
-------
Coding Progress This Week
Staker Management
I started this week by completing key areas of staker management that I began just before the holidays. These areas are staker enabling/disabling, disqualification, and termination.
Conditional Compilation of Mining Logic
I then made the compilation of the PoW mining code optional, using preprocessor conditionals that can be easily adjusted before building the client.
The PoW mining code in the Stealth codebase is part of the Bitcoin legacy, and has persisted within Stealth since launch, even though PoW ended almost 4.5 years ago. The existence of this code has historically not been an issue, so we never bothered to remove it. However, in the last year or so, this mining code has increasingly triggered false positives in malware detection, and I decided it must be removed from our release builds to instill greater confidence in end users.
Although the mining logic was never needed on mainnet, it is very useful for testnet, so I want to keep the code in place, with the ability to easily incorporate the mining functionality. Preprocessor conditionals are the best way to achieve this flexibility.
One consideration in removing the PoW miner logic is that it is intermingled with the PoS minter logic in a processor thread that I will call the “block production thread”, or sometimes the “minter thread”. The main loop of this thread is implemented in the function presently called “StealthMinter()” in main.cpp. This intermingling means that splicing out the PoW miner with preprocessor conditionals makes the code a little less readable. I have decided that it is going to stay this way until we tackle code reorganization between testnet and mainnet.
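As a rough illustration of this kind of splicing, here is a minimal sketch. The macro name WITH_POW_MINER and the function ProductionPaths are hypothetical; they do not come from the Stealth codebase.

```cpp
#include <string>
#include <vector>

// WITH_POW_MINER is a hypothetical macro name. With GCC or Clang it could be
// defined at build time by passing -DWITH_POW_MINER to the compiler.
std::vector<std::string> ProductionPaths() {
    std::vector<std::string> paths;
#ifdef WITH_POW_MINER
    // Compiled only when the macro is defined (for example, testnet builds).
    paths.push_back("PoW");
#endif
    paths.push_back("PoS");
    paths.push_back("qPoS");
    return paths;
}
```

When the macro is undefined, the PoW branch is never compiled, so none of it ends up in the release binary. The cost is the readability issue mentioned above: the conditional blocks interrupt the surrounding logic wherever the two code paths are intermingled.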
QPoS Is Added to the Block Production Loop
Removing the PoW miner (i.e. making it preprocessor conditional) was part of modifying the main block production loop to handle qPoS block production as well. I completed this coding, which was the main accomplishment this week. For expediency, I added the qPoS logic to the existing loop, with the intention of refactoring it later. Once it is refactored, the logic unique to PoW, PoS, and qPoS will be separated. I anticipate this refactoring after testnet and before mainnet release.
–––––
Hondo