Below is a list of some of the Hive-related programming issues worked on by BlockTrades team during the past work period. I haven’t had time to check in with all our Hive developers this week, so this report will just focus on the devs I’ve been working with directly, and I’ll report on work done by other devs in my next report.
Hive Application Framework: framework for building robust and scalable Hive apps
While running benchmarks of the HAF sql_serializer plugin over the past few weeks, I noticed that while the sql_serializer indexed most of the blocks very quickly, its performance was considerably slower once it dropped out of “massive sync” mode (where many blocks are processed simultaneously) into “single block” live sync mode. It also appeared that the sql_serializer was dropping out of massive sync mode earlier than it should, which could result in hundreds of thousands or even a million or more blocks being processed in this slower single block mode.
Benchmarks this week showed that the serializer can only process about 6.7 blocks/second on average in live sync mode, whereas in massive sync mode, it can process about 2300 blocks per second.
Live sync will always be slower than massive sync, but we can probably do better
Slower performance was expected in live sync mode, because this mode requires the SQL tables to maintain several indexes, and these indexes slow down the rate at which data can be added to the tables, but the magnitude of the slowdown was surprising (to me at least).
So one of the first things we’re looking at to improve this situation is optimizing live sync performance. I believe there’s a good chance we can still improve the speed of live sync mode considerably, because the live sync code so far has focused only on correct functional behavior.
But even if we get a 4x speedup in live sync performance, it would still be much slower than massive sync mode, so we also need to reduce the number of old blocks that get processed in live sync mode.
Performance impacts of swapping to live sync mode too early
As a real-world measurement of the impact of this, we replayed a hived with a block_log that was 6.5 days old, so it was missing about 189K blocks, and it only took 7.15 hours to process the 59M blocks in the block log. But it took another 10 hours to process the remaining 189K new blocks that it received from the peer-to-peer network that were missing from the block_log (plus it had to process new blocks that got created while processing the old blocks).
We can envision an impractical “worst case” scenario where we sync a hived with an empty block_log with the sql_serializer enabled. This would currently take 59M blocks / 6.7 blocks/s / 3600 seconds/hr / 24 hrs/day = 101 days! I say this is an impractical scenario, because this can be worked around by first performing a normal sync of the hived node without the sql_serializer (which takes about 14 hours), then replaying with the sql_serializer with a fully filled block_log (so add another 7.5 hrs). But it does demonstrate one motivation for improving performance of the serializer with regard to live sync mode.
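For reference, the back-of-the-envelope numbers above are easy to reproduce. Here is the same arithmetic as a small Python snippet, using the 6.7 and 2300 blocks/second averages from this week’s benchmarks:

```python
# Rough full-sync time estimates using this week's benchmark averages.
TOTAL_BLOCKS = 59_000_000       # approximate current chain height
LIVE_SYNC_BPS = 6.7             # blocks/second measured in live sync mode
MASSIVE_SYNC_BPS = 2300         # blocks/second measured in massive sync mode

SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24

live_sync_days = TOTAL_BLOCKS / LIVE_SYNC_BPS / SECONDS_PER_HOUR / HOURS_PER_DAY
massive_sync_hours = TOTAL_BLOCKS / MASSIVE_SYNC_BPS / SECONDS_PER_HOUR

print(f"full sync in live sync mode only: ~{live_sync_days:.0f} days")      # ~102 days
print(f"full sync in massive sync mode:   ~{massive_sync_hours:.1f} hours") # ~7.1 hours
```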
Why does sql_serializer switch from massive sync to live sync so early?
Originally I believed that this transition from massive sync to live sync was accidentally happening too early because of an incorrect implementation of the psql_index_threshold
setting. This setting helps hived decide whether it should start in massive sync mode or live sync mode, but I thought it might also be telling hived when to change from massive sync mode to live sync mode, so I expected we could fix the problem very easily and this issue was created to address the problem: https://gitlab.syncad.com/hive/haf/-/issues/9
But after further discussions this week with the devs working on the sql_serializer, it turns out that this flag wasn’t the problem.
The real reason the sql_serializer was switching from massive sync to live sync mode was that massive sync mode can only be used for processing irreversible blocks (blocks that can’t be removed from the blockchain due to a fork), so as soon as the serializer could no longer be sure that a block was irreversible, it switched to live sync mode.
The only way that the serializer knows a block is irreversible is if the block is in the block_log. So it first processes all the blocks in the block_log in massive sync mode, then switches to live sync mode to process all blocks it receives from the P2P network (this includes old blocks that were generated since hived was last running, plus new blocks that get generated while hived is processing old blocks).
So, ultimately the problem is that the serializer is processing new blocks as soon as it receives them from the P2P network, but these blocks only get marked as irreversible and added to the block_log after they are confirmed by later blocks received via the P2P network.
How to stay in massive sync mode longer?
I’ve proposed a tentative solution to this problem that we’ll be trying to implement in the coming week: the serializer will continue to process blocks as they are received from the P2P network (this is important because the serializer makes use of information that must be computed at this time), but the resulting data will not be immediately sent to the SQL database; instead, the data for those blocks will be stored in a queue inside hived.
During massive sync mode, a task will step through the blocks in the block_log and dispatch the associated SQL statements from the serializer queue. The serializer will stay in massive sync mode until it determines that the hived node has synced to within one minute of the head block of the chain AND the serializer data for all blocks in the block_log has been sent to the SQL database; then it will switch to live sync mode. Switching to live sync mode currently takes about 30 minutes to build all the associated table indexes. Next, the serializer will need to flush the queue of all the blocks that built up while the table indexes were being built. Finally, once the queue is flushed, new blocks from the P2P network can be sent immediately to the database, with no need to first store them in the local hived queue.
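To make that flow a bit more concrete, here’s a rough sketch of the idea. It’s written in Python just for readability: the real implementation will be C++ code inside hived’s sql_serializer plugin, and all the helper names below are invented for this illustration.

```python
# Illustrative sketch only; the actual code will live in hived's C++ sql_serializer.
import queue
import threading

block_data_queue: "queue.Queue[str]" = queue.Queue()  # per-block SQL data waiting to be written

def massive_sync_write(sql_data: str) -> None: ...    # placeholder: fast bulk write (indexes not built yet)
def live_sync_write(sql_data: str) -> None: ...       # placeholder: normal write to indexed tables
def build_table_indexes() -> None: ...                # placeholder: currently takes ~30 minutes
def enter_live_sync_mode() -> None: ...                # placeholder: start writing blocks as they arrive

def on_block_from_p2p(serialized_block_data: str) -> None:
    """Called as each block arrives from the P2P network: the per-block processing
    still happens here, but nothing is sent to SQL yet."""
    block_data_queue.put(serialized_block_data)

def massive_sync_dispatcher(blocks_in_block_log: int) -> None:
    """Background task: push the block_log's data to SQL with fast massive-sync writes,
    then build the indexes, flush the backlog, and switch to live sync.
    (The real version would also wait until hived is within a minute of the head block.)"""
    for _ in range(blocks_in_block_log):
        massive_sync_write(block_data_queue.get())
    build_table_indexes()
    # Flush whatever queued up while the indexes were being built...
    while not block_data_queue.empty():
        live_sync_write(block_data_queue.get())
    # ...then new blocks can go straight to the database without queuing locally.
    enter_live_sync_mode()

threading.Thread(target=massive_sync_dispatcher, args=(59_000_000,), daemon=True).start()
```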
Another possible improvement: make sql_serializer use non-blocking writes
Based on the benchmarks I ran, I believe that currently the sql_serializer writes data to the database as a blocking call (in other words, it waits for the SQL server to reply back that the block data has been written to the database before it allows hived to process another block).
Using a blocking call ensures that hived’s state stays in sync with the state of the HAF database, but this comes at the cost of less parallelism, which means processing each block takes longer, and it also places additional strain on hived in a critical execution path (during the time that hived is processing a block, hived can’t safely do much else, so the block writing time should be kept as short as possible to lower the hardware requirements needed to operate a hived node).
To avoid this problem, we will take a look at converting the database write to a non-blocking call and only block if and when the queue of unprocessed non-blocking calls gets too large. This will make replays with the sql_serializer faster and will also reduce the amount of time that hived is blocked and unable to process new blocks from the P2P network.
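The general pattern we have in mind looks something like the sketch below (again in Python for brevity, with a placeholder for the actual database call; the real change would be made in the C++ sql_serializer). The key point is that hived’s block-processing path only waits when the backlog of pending writes grows too large:

```python
# Bounded-queue, mostly non-blocking write pattern (illustrative sketch only).
import queue
import threading

MAX_PENDING_WRITES = 1000  # assumed limit; the real threshold would need tuning
pending_writes: "queue.Queue[str]" = queue.Queue(maxsize=MAX_PENDING_WRITES)

def execute_sql(sql_data: str) -> None: ...  # placeholder for the real database write

def send_block_data(sql_data: str) -> None:
    """Called from hived's critical block-processing path. Normally returns immediately;
    it only blocks (applying backpressure) if the writer has fallen too far behind."""
    pending_writes.put(sql_data)  # blocks only when the queue is full

def database_writer() -> None:
    """Background thread that performs the slow SQL writes off the critical path."""
    while True:
        execute_sql(pending_writes.get())
        pending_writes.task_done()

threading.Thread(target=database_writer, daemon=True).start()
```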
The only potential scenario I can think of at the moment where using a non-blocking call could cause a problem would be the case where the postgres server failed to write a block to the database (for example, if the database storage device filled up). With a non-blocking call, hived would continue to process blocks for a while instead of immediately shutting down, and hived’s state would become out of sync with the state of the HAF database.
Postgres servers tend to be very reliable, so this is an unlikely scenario, but even if it happens, in the worst case, it would only require the state of hived and the HAF database to be cleared and a replay from block 0. And more likely, a system admin would just re-initialize hived and HAF database using a relatively recent snapshot and a HAF database restore file, and then they could quickly be synced up to the head block again.
Two new programmers working on balance_tracker app
We have two new Hive devs (a python programmer and a web developer) working on creating an API and web interface for balance_tracker (an example HAF app) as an introduction to the Hive programming environment.
Investigating defining HAF APIs with PostgREST web server
In hivemind and in our previous HAF apps, we’ve been using a Python-based jsonrpc server that translates the RPC calls to manually written Python functions that then make SQL queries to the HAF database. This works pretty well, and we recently optimized it to perform better when heavily loaded, but a certain amount of boilerplate Python code must be written with this methodology, which requires someone familiar with Python (not a difficult requirement) and also increases the chance for errors during the intermediate translation of data back and forth between Python and SQL.
To make this task a little more interesting, Bartek suggested we investigate an alternative way to define the API interface to a HAF app using a web server called PostgREST to replace the python-based server.
This approach could be very interesting for experienced SQL developers, I think. Instead of writing Python code, APIs are defined directly in SQL as table views and/or stored procedures. API clients can then make REST calls to the PostgREST server and it converts the REST calls to equivalent SQL queries. So far this approach looks like a promising alternative for SQL programmers, and it will also be interesting to compare the performance between the PostgREST web server and the Python-based one.
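To give a feel for the difference from a client’s point of view, here’s roughly what the two styles of call look like. The endpoint URLs, method names, and parameters below are made up for illustration; they are not the actual balance_tracker API.

```python
import requests

# Today: a JSON-RPC call handled by the Python-based server, which runs
# hand-written Python that in turn queries the HAF database.
jsonrpc_payload = {
    "jsonrpc": "2.0",
    "method": "balance_tracker.get_balance_history",  # hypothetical method name
    "params": {"account": "blocktrades"},
    "id": 1,
}
print(requests.post("http://localhost:8080/", json=jsonrpc_payload).json())

# With PostgREST: the same query exposed as a REST call that PostgREST maps
# directly onto an SQL view or stored procedure defined in the HAF database.
print(requests.get(
    "http://localhost:3000/rpc/get_balance_history",   # hypothetical SQL function
    params={"_account": "blocktrades"},
).json())
```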
wow, it's so cool.
Is there a way I can join BlockTrades' programming teams?
That is, if there is a place for a JavaScript programmer 😃
We do hire Javascript programmers, but we generally only hire programmers that are local to one of our development offices (we have one in Poland and one in Virginia).
It's ok
I'm in Nigeria.
But if you ever want to hire a JavaScript programmer, I'm right here.
This is why Hive is improving day by day, as it has hardworking experts like you guys behind it.
This will pave a great road into the future for Hive.
Don't mind if I tweet this.
The rewards earned on this comment will go directly to the person sharing the post on Twitter as long as they are registered with @poshtoken. Sign up at https://hiveposh.com.
Sooo good man, we need this type of person in the world. Thanks @blocktrades
What does balance_tracker mean?
The balance_tracker is an example HAF app that can be used to plot the historical Hive and HBD balances of any Hive account as they change over time.
Hmmm, that's interesting. Can it also be used for other transactions? For example, Hive Engine or future DeFi stuff?
It could be modified to support it, but right now it only supports 1st layer coins.
Hmmm I like it, especially for DeFi stuff :)
Thanks to you @blocktrades, among many curators, Hive has been the platform that has motivated me to do yoga for a year, and I share its techniques in my publications. Yoga has also allowed me to recover physically after being treated with gamma rays for a tumor in the cerebellum, and thanks to the encouragement of experienced Hive people like you, everything has gone very well for me, so I wish you every success in your publications and curation. Always forward, dear @blocktrades; with your experience and encouragement we will always do very well.
@blocktrades Very heartening to see so much effort being put in by the BlockTrades team. I can't even claim to fully understand all the developments happening, but since I was a C++ programmer for CFD research in my day, I know enough to understand the complexities and the amount of work being put in.
Cheers.
As it happens, we have a couple of CFD programmers in our contracting companies (one works in C++ on contract to Ansys, the other is developing a new general-purpose solver using Python).
@blocktrades Oh wow - that's great. It is great to see CFD programmers involved in the effort too. I can't claim to be up to speed on my old skillset (since it was way back), so I don't think I can contribute there. However, if any of my other skillsets could be of help to the team in non-critical areas, I will be glad to assist as a free volunteer. My posts used to get a good response (in the old days anyway, before I took a long hiatus due to personal tragedy), so I think I am a fair writer. I am also good at graphics and photography, including macro, and I even designed logos for a couple of guys back then. So please feel free to let me know if the team needs any support like that as a volunteer. It would be an honor to be able to contribute to such a great effort (even if it is an insignificant and minuscule ant's share of contribution).
Sounds like there's a lot to sort out, but very exciting information! I am eager to see where this goes.
Do you have like a Hive dev group?
There are several such groups: there's a community on Hive where devs can post (this post was posted there) and there are also Discord and Mattermost groups where Hive devs interact.
Thanks guys for putting in your efforts.
And welcome to the new developers
It's really good news for us ☺️
@blocktrades
Great to see you improving, amazing work you guys.
It is always great to see such strong support for the development of the Hive blockchain. A smooth interface and friendly access are important for enjoying the stay on this platform.
Thanks for sharing. Nice share.
Good news for BlockTrades users. @blocktrades helped me a lot to convert my HBD into other cryptocurrencies. Thanks to @HiveDevs for putting so much effort into BlockTrades.
Thanks for updating us, and for the hard work you put into it!
😃
When will blocktrades be my one-stop shop/fiat offramp?
I think we'll support Euros late next quarter.
Nice!!
Hello @blocktrades, I voted you as witness!
My community is called CHECKMATE COIN.
We created some videos about chess and crypto, the last is this: CHECKMATE COIN ARENA OF MAY 6th 2021 PART 2.
Suggestions and aids are welcomed, thank you!
The problem can be overcome by switching to a more powerful server, PostgREST, but not by giving up programming through Python.
Hello @blocktrades
I would like to invite you to watch my movie:
https://hive.blog/hive-184437/@kamilkowalski/jaskinia-wpierd-lu-extreme-cave-pl-en
Greetings
@kamilkowalski
Looking forward to your next update!
I haven't seen this issue. Can you share an example post where you are experiencing this? Also please let me know details about your browser and OS.
Apparently the issue with that post wasn't just that it had a lot of comments, the issue is that those comments all required rendering of tweets. @quochuy is working on adding pagination of comments to handle such posts better.