3rd update of 2022 on BlockTrades work on Hive software

Below are highlights of some of the Hive-related programming issues worked on by the BlockTrades team during the past month.

Hived (blockchain node software) work

We finally got around to a long-needed overhaul of the communication between the blockchain thread (which processes blocks, transactions, and operations) and the peer-to-peer thread (which receives all that data from peer nodes in the Hive p2p network and puts it on a “write_queue” for the blockchain thread to process).

The p2p code was originally written for another blockchain (BitShares) and when it was incorporated into Steem’s code (and later Hive code), a few things were somewhat broken.

The most problematic issue was that the p2p thread employs cooperative multi-tasking (non-OS level fibers/tasks implemented in a library called “fc”) to service requests from peers in the network in a timely fashion.

Whenever a peer provides a new block or transaction, the p2p task assigned to this peer adds this new item to the blockchain’s write_queue and blocks until the blockchain signals that it has processed that new item (this task is waiting to find out if the blockchain says the new item is valid or not, so that it knows whether or not to share it with other peers). When all the code was using fc-task-aware blocking primitives (fc::promise), a new p2p task could be started as soon as the one placing the item on the queue blocked.

But when this code was transplanted to Steem, the blocking code was switched to use boost::promise objects, which are not fc-task-aware, so no new p2p task would get scheduled whenever a p2p task placed a new item on the write_queue, effectively stopping the p2p thread from doing any more work until the blockchain finished processing the new item. If the blockchain took a while to process items on the queue, this could even result in a loss of peers because they decided that this hived node wasn’t responding in a timely manner and disconnected from it. But generally it just slowed down the speed of the p2p network.

To resolve this problem, we changed the code so that the p2p tasks now block using an fc::promise when placing items on the write_queue. We also create a new task that waits on this response, so that the primary task for the peer can continue to manage communication with the peer while the new task awaits a response from the blockchain thread.

Now, there is one other way this write queue can be written to: via an API call to broadcast_transaction (or its bad cousin, broadcast_transaction_synchronous), and these still use a boost promise (and should do so). So the write queue processor in the blockchain thread has to respond to each item that gets processed by waking up the associated blocked task using either an fc promise or a boost promise, depending on the source of the new item (either the p2p thread or the thread used to process the API call).
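
The sketch below models this pattern with standard C++ primitives (std::promise, std::thread, std::variant) standing in for fc tasks, fc::promise, and boost::promise; all type and function names here are illustrative, not hived’s actual code:

```cpp
// Minimal sketch of the write_queue between the p2p thread and the blockchain thread.
// In hived the p2p path uses a fiber-aware fc::promise and the API path a boost::promise;
// here both are modeled with std::promise, and std::thread stands in for fc tasks.
#include <condition_variable>
#include <deque>
#include <future>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <variant>

struct p2p_promise { std::promise<bool> p; };  // stand-in for fc::promise (p2p path)
struct api_promise { std::promise<bool> p; };  // stand-in for boost::promise (API path)

struct write_queue_item
{
  std::string payload;                                  // a block or transaction
  std::variant<p2p_promise*, api_promise*> notify_via;  // who to wake, and how
};

std::mutex                   queue_mutex;
std::condition_variable      queue_cv;
std::deque<write_queue_item> write_queue;

void push_item(write_queue_item item)
{
  { std::lock_guard<std::mutex> lock(queue_mutex); write_queue.push_back(std::move(item)); }
  queue_cv.notify_one();
}

// Blockchain thread: pop items, "validate" them, then wake whichever kind of
// promise the producer attached (p2p-style or API-style).
void blockchain_thread_loop(int items_expected)
{
  for (int i = 0; i < items_expected; ++i)
  {
    std::unique_lock<std::mutex> lock(queue_mutex);
    queue_cv.wait(lock, []{ return !write_queue.empty(); });
    write_queue_item item = std::move(write_queue.front());
    write_queue.pop_front();
    lock.unlock();

    bool valid = !item.payload.empty();  // placeholder for real block/transaction validation
    std::visit([valid](auto* promise){ promise->p.set_value(valid); }, item.notify_via);
  }
}

int main()
{
  std::thread blockchain(blockchain_thread_loop, 2);

  // p2p side: push the item, then hand the wait off to a helper task so the
  // peer's primary task can keep servicing the connection (the post-fix behavior).
  p2p_promise from_peer;
  push_item({ "block from peer", &from_peer });
  std::thread waiter([&]{
    bool valid = from_peer.p.get_future().get();
    std::cout << "p2p item valid: " << valid
              << (valid ? " -> share with other peers\n" : " -> discard\n");
  });

  // API side (broadcast_transaction): the API handler blocks directly on its promise.
  api_promise from_api;
  push_item({ "transaction from API call", &from_api });
  std::cout << "API item valid: " << from_api.p.get_future().get() << "\n";

  waiter.join();
  blockchain.join();
}
```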

Another problem we fixed was that the “potential peer” database was not getting saved to the peers.json file when hived exited. This meant that when a hived node was restarted, it always had to first connect to at least one of Hive’s public “seed nodes” that are hardcoded into hived itself before it could find other peers in the network.

This created an undesirable centralization point for hived nodes that needed to re-connect to the network after being restarted. Now that this file is properly saved at shutdown, a restarting node can try to connect to any of the peers in its peer database, not just the seed nodes. The new code also periodically saves the peer database while the node is running, allowing a user to inspect the “quality” of the peers connected to their node (e.g. how long it has been since a given peer failed to respond to a request).

While we were fixing this problem, we also took the opportunity to improve the algorithm used by the node to select from potential peers stored in the database. Now the node will first try to connect to new peers it saw most recently that it didn’t have any communication problems with. Next it will retry connecting to peers that it did experience an error with, trying first to connect to peers where the error happened longest ago.
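
A rough sketch of that ordering (illustrative types and field names only, not the actual p2p code):

```cpp
// Illustrative sketch of the peer-selection order described above: error-free peers
// first (most recently seen first), then peers that previously failed, starting with
// the peer whose failure happened longest ago.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct potential_peer
{
  std::string endpoint;
  int64_t     last_seen_time    = 0;  // when we last heard from this peer (unix time)
  int64_t     last_failure_time = 0;  // 0 means no recorded communication problem
};

void sort_connection_candidates(std::vector<potential_peer>& peers)
{
  std::sort(peers.begin(), peers.end(),
            [](const potential_peer& a, const potential_peer& b)
  {
    bool a_failed = a.last_failure_time != 0;
    bool b_failed = b.last_failure_time != 0;
    if (a_failed != b_failed)
      return !a_failed;                               // error-free peers come first
    if (!a_failed)
      return a.last_seen_time > b.last_seen_time;     // most recently seen first
    return a.last_failure_time < b.last_failure_time; // oldest failure retried first
  });
}

int main()
{
  std::vector<potential_peer> peers = {
    { "peer-a:2001", 1000, 900 },   // failed recently
    { "peer-b:2001", 1200, 0   },   // clean, seen most recently
    { "peer-c:2001", 1100, 0   },   // clean, seen a bit earlier
    { "peer-d:2001", 1000, 500 },   // failed long ago
  };
  sort_connection_candidates(peers);
  for (const auto& p : peers)
    std::cout << p.endpoint << "\n";  // prints: peer-b, peer-c, peer-d, peer-a
}
```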

We had reports that the p2p layer was also under-performing when a new node was synced to other peers from scratch. For this reason, most experienced hived node operators don’t use this method, but instead download a block_log file with all the blockchain data from a trusted source.

But since we were working in the p2p code anyways, we decided to spend some time optimizing the sync performance of the p2p layer. We made many improvements to the algorithms used during sync, and our tests have shown that sync time is now solely bottlenecked by the time required by the blockchain to process blocks (i.e. any further improvements to the p2p syncing process would not further speed up the overall syncing process; they would only lower CPU usage by the p2p thread).

This is true even when we set a block_number/block_hash checkpoint when launching a sync of hived. Setting a checkpoint allows the blockchain thread to do less work. Observing the speed with which blocks were processed during our testing, I would guess the blockchain was almost 3x faster at processing blocks before the checkpoint block number. So even when the blockchain thread is configured to do the least amount of work, it is still the performance bottleneck now, and we would need to substantially speed up blockchain processing before it would make sense to look at making further improvements to p2p sync performance.

Command-line-interface (CLI) wallet improvements

The CLI wallet is a command-line wallet that is mostly used by a few expert Hive users and cryptocurrency exchanges. It is also useful for automated testing purposes.

We’ve been refactoring the CLI wallet code to ease future maintenance of this application, and we’re also improving the wallet API (this is an API provided by the CLI wallet process that can be used by external processes such as Hive-based applications or automated scripts).

As part of the improvement process, we’ve also added an option to API calls to control how the output from the API call is formatted (for example, as web client-friendly json or as human-friendly tabular data).

Hive Application Framework (HAF)

We’re currently adding filtering options for operating “lightweight” HAF servers that store less blockchain data. The first filtering option we’re adding uses the same syntax used by the account history plugin to limit the operations and transactions stored to those that impact a specific set of accounts.

This form of filtering will allow hafah servers to duplicate the lightweight behavior of a regular account history node that is configured to only record data for a few accounts (for example, exchanges often operate their own account history node in this mode to save storage space).
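
Conceptually, the account-based filter boils down to something like the sketch below (illustrative C++ only, not the actual sql_serializer implementation):

```cpp
// Conceptual sketch of account-based filtering: store an operation only if one of
// the accounts it impacts is in the configured set of tracked accounts.
// Types and names here are made up for illustration.
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct operation_record
{
  std::string              op_type;
  std::vector<std::string> impacted_accounts;  // accounts affected by this operation
};

bool should_store(const operation_record& op, const std::set<std::string>& tracked_accounts)
{
  for (const auto& account : op.impacted_accounts)
    if (tracked_accounts.count(account))
      return true;   // at least one tracked account is impacted -> keep it
  return false;      // otherwise skip it to keep the HAF database small
}

int main()
{
  std::set<std::string> tracked = { "myexchange" };  // e.g. an exchange tracking only itself
  operation_record transfer  = { "transfer_operation", { "alice", "myexchange" } };
  operation_record unrelated = { "vote_operation",     { "bob",   "carol" } };

  std::cout << should_store(transfer,  tracked) << "\n";  // 1: stored
  std::cout << should_store(unrelated, tracked) << "\n";  // 0: filtered out
}
```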

We’ve spent a lot of time creating more tests for account history functionality, and further verifying the results of hafah against the latest development version of hived’s account history plugin (and we’ve also verified the performance of that versus the master branch of hived’s account history plugin deployed in production now).

HAF account history app (aka hafah)

We’re periodically testing hafah on our production system, then making improvements whenever this exposes a performance problem not discovered by automated testing.

We’re also finishing up work on creating dockerized HAF servers and on modifying the continuous integration process (i.e. automated testing) for all HAF-based apps to re-use existing dockerized HAF servers when possible. We’re using hafah’s CI as the guinea pig for this process.

This will allow for much faster testing time on average, especially when we want to run benchmarks on a HAF app with a fully populated HAF database. Currently it takes over 10 hours to fully populate a HAF database with 60M+ blocks from scratch.

Conceptually, the idea is simple: if a HAF app (e.g. hafah) needs a specific version of a HAF server populated with a certain number of blocks, the automated testing system will first see if it can download a pre-populated docker image from gitlab’s docker registry (or a local cache on the builder) with the proper version of HAF and the required amount of block data. Only if this fails will it be required to create one itself (which can then be stored in the docker registry and re-used in subsequent test runs).

Hivemind (social media middleware server used by web sites)

We’ve added a new person to the Hive development team who is working on conversion of Hivemind to a HAF-based app (under Bartek’s supervision).

What’s next?

  • Modify the one-step script for installing HAF to optionally download a trusted block_log and block_log.index file (or maybe just allow an option for fast-syncing using a checkpoint to reduce block processing time, now that the peer syncing process is faster and may actually perform better than downloading a block_log and replaying it).
  • Continue work on filtering of operations by sql_serializer to allow for smaller HAF server databases.
  • Collect benchmarks for hafah operating in “irreversible block mode” and compare them to hafah operating in “normal” mode.
  • Further testing of hafah on production servers (api.hive.blog).
  • Finish conversion of hivemind to a HAF-based app.
  • Complete testing of new P2P code under forking conditions and various live mode scenarios and in a mirrornet testnet using only hived servers with the new P2P code (tests so far have only been performed on the mainnet in a mixed-hived environment).
  • Experiment with methods of improving block finality time.
  • Complete work on resource credit rationalization.

The current expected date for the next hardfork is the latter part of April, assuming no serious problems are uncovered during testing over the next month.

I don't understand a single word; just the last thing struck me: "resource credit rationalization". Can you please expand on this? Does this mean, e.g., that sooner or later the resource credits or the account claim tokens will be kind of tradeable?

No, it's not about trading of account claim tokens.

Rationalization here means that RC costs will be charged more realistically (more rationally) based on their real costs to the blockchain. For example, one of the dominant factors in terms of CPU consumption for processing an operation is checking the cryptographic signatures that were used to sign the transaction that contains the operation, and this cost is independent of what the actual operation is (it instead varies based on how many signatures were included). But previously the CPU costs for checking these signatures were ignored.
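
As a rough illustration of the idea (the constants and names below are made up for the example, not hived's actual RC parameters):

```cpp
// Illustrative arithmetic only: the CPU charge for a transaction should include a
// per-signature term, since signature verification cost does not depend on which
// operation is being performed. Constants here are hypothetical.
#include <cstdint>
#include <iostream>

int64_t transaction_cpu_cost(int64_t operations_cpu_cost, int64_t num_signatures)
{
  const int64_t cpu_cost_per_signature_check = 100;  // hypothetical RC units per signature
  return operations_cpu_cost + num_signatures * cpu_cost_per_signature_check;
}

int main()
{
  // Same operation cost, different number of signatures -> different total CPU charge.
  std::cout << transaction_cpu_cost(50, 1) << "\n";  // 150
  std::cout << transaction_cpu_cost(50, 3) << "\n";  // 350
}
```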

It's great to start taking the CPU time cost of transactions into account. As you say, it's finally getting more rational (realistic).

I think it may have been ignored because the cpu limits were set based on keeping replay time from getting too long, and replay doesn't check signatures. There might be merit in having two different limits.

Alternately, keeping p2p sync from getting too long may be a better target than replay, in which case the CPU limit should just include signature checking, as you suggest.

Experiment with methods of improving block finality time.

Does this (the block finality time) have something to do with the recent transaction errors in the Splinterlands game? Or was one or more of the servers overloaded?

Yesterday it showed the following error two times, when I tried to claim the season-end rewards:

There was an error completing this transaction: Block processing is behind. Please try again later.

This error sounds like a Splinterlands problem (following the head block), not the hived software itself.

I thought the same at first. Probably one (or more) of its server(s) was/were overloaded.

Well done @blocktrades and your entire team. The Hive Application Framework is expanding and getting better day by day with your great work 👍. And thanks for providing more solutions to Hive, making it a better place for its users.

We’ve added a new person to the Hive development team who is working on conversion of Hivemind to a HAF-based app (under Bartek’s supervision).

I also warmly welcome the new team member, and I am optimistic that with a great new hand joining the team, more achievements will be accomplished.

My best regards to the entire team and thanks @blocktrades for the great work and always giving us updates 🤝.

Hive has really improved over the past month, and adding another new person to the development team will bring even more great development to Hive.

Great work, as always. Thanks!

One small thing...

Now the node will first try to connect to new peers it saw most recently that it didn’t not have any communication problems with.

Typo or intended double negative?

Just an accidental double negative, should read "that it didn't have any communications problems with". I'll fix it.

Hey blocktrades, I tried to use your site but I realized that you don't have a Spanish version. That would help a lot, not only me but the whole Spanish-speaking community, to be able to use your services!

We have support for multiple languages on the site nowadays, so I guess we just need to find someone to do a translation file for us. We've just been too busy with Hive to work on the site in a while.

ooo very nice to see syncing from 0 has gotten some improvements! It's nice to have the layer of trustless-ness available for those who don't trust anyone xD.

Yes, I recalled the delays you experienced when setting up your witness node by syncing from scratch when we started the speedup effort :-)

Holy smokes! It sure does take a lot of work to keep the wonderful world of Hive moving forward. Thank you for doing that thing you do so that the rest of us can do that thing we do. Cheers!

I appreciate these developments, and they are represented well in this blog. I wish to see more developments in the upcoming days.

Keep up the excellent work!

I don't pretend to understand how the p2p thread works, but what happens to the data on the "write_queue" after a certain item, if the blockchain thread signals that item is invalid?

The write_queue is a place where the p2p thread puts stuff temporarily, so that the blockchain code can process it. When the blockchain code runs, it pops the oldest item off the queue, decides whether it is a valid transaction or not, then signals back to the p2p task that put it on the queue about its validity.

So these items are always being removed from the queue by the blockchain code, regardless of whether they are valid or invalid. In other words, the write_queue is just a producer/consumer buffer between the two threads.
