STEEMSQL - Update 9 - Performances improvement

in #steemsql7 years ago

SteemSQL had to be redesigned to manage the awesome growth of Steemit.

Previous posts:

What’s new?

In my last witness report, I expressed my concern about SteemSQL's ability to ingest the growing mass of data that is generated every day.

Indeed, if you look at the following chart taken from my last daily stats, you will find that the number of transactions to be processed has tripled (+199%) on the last 30 days.

For posts and comments, this represents respectively an increase of 315% and 297%
And the number of upvotes has more than doubled with an increase of 112%

Gradually, the performances of data injection in the database have deteriorated. SteemSQL ended up struggling with the numerous updates all transactions generated. At some times, I found it with being late for more than 24 hours compared to the data available in the block chain (that's why I was unable to publish my daily stats for 2 days).

So, I decided to take the bull by the horns and performed a thorough analysis that led to several changes to the database and a complete redesign of the Database Injector.

1. Changes to the database.

These mainly concern the indexes of the database. I found that many indexes had been created to improve performances of some queries, but that finally these indexes were rarely used.

The indexes of the database have therefore been analyzed one by one, some have been deleted, others have been modified.

2. Changes to the Database Injector

The Database Injector is the process responsible for reading each block of the blockchain, and distributing the transactions in the different tables of the database.

I worked on the optimization of these processes by separating the injection of the transactions from the update of the impacted non-transactional tables (Accounts and Comments tables)

These processes now run in parallel thanks to an optimized multithreaded implementation of the different tasks.

Result

Wow, thanks to this work, I have been able to reduce the overall processing time and got it divided by 3 !!

And I am persuaded that it is still possible to further improve it because I worked a bit in emergency mode in order to restore the database to a functional state as quickly as possible. I had to take into account that more and more people rely on SteemSQL for their process Or analysis.

Things are done and SteemSQL is now ready to face without any problem a doubling of the number of transactions. This is something I really wish for Steemit!

Oh ... and for those who missed this announcement ... in the middle of my optimization work, I found a bit of time to launch the "Happy Birthday" action on SteemitBoard ... that was just to think to something different while I was doing small pauses ... :P

Thanks for reading !


All payout from this post will be dedicated keep SteemSQL running. Thanks for your support.


footer created with steemitboard - click any award to see my board of honor

Support my work as a witness by voting for me here!

Sort:  

Awesome job, thanks again for making this service freely available for all of us to use!

Thanks for your comment. Enjoy it ;)

Love this! Upvoted, followed, and you have my witness vote.

Great to see how much posts and comments have increased
I've voted for you as witness.
Thank you for your great work!

Thank you. You vote has been really appreciated!

Cheers Arcange, very useful!

A suggestion: you could change the login password to 'withnolock' to remind people :P

Thanks.
Good suggestion. I guess a password like "use_(nolock)_hint_in_your_queries_or_get_kicked!!!" will be even more secure XD
Unfortunately, can't do it because their are already too many processes running out there and relying on the existing config :/

So is this SQL database decentralized? Or is in centralized and controlled?

This database is not decentralized.

It is a service I offer to the community.

It does not contains anything else than the data from the blockchain. You can create your own database if you want. So, centralization/decentralization is not a concern.

sketchy

Why so? You can just check it against the Steemit blockchain or other databases (mysql/Mongo DB) if you want.

Good morning 🌞

And once again, great info!

Thank you!

SteemSQL is a new proprietary database? Is it better than the well proven MySQL or MariaDB?

SteemSQL is not proprietary. It is a MS-SQL database made public.
Please read the Introduction post and updates for more information.

The transaction chart looks like an exponential growth is happening.

To the mooooon !!!

Good work. Keep it up

Thanks you!

Nice 👍🏼! I was wondering what the load time was on transactional data. Glad to hear SteemSQL is continuously improving.

Question: Are edits to posts triggering an update to the existing records?

Also, do you know the approximate load time for transaction made? i.e how frequent are loads?

Question: Are edits to posts triggering an update to the existing records?

Each edit generate an insert of a new record in TxComments and an update in Comments

how frequent are loads?

SteemSQL process new blocks every 10 seconds.
Accounts and Comments are updated every 10 minutes

Wonderful, but I am a little concerned for my test Investment of 100 bucks SP that eventually dropped to 85 bucks total account value. So, where is the advantage to existing member account values from the folks you are showing were added? That's a 15% haircut for me. In two months :((

You have very interesting blog posts and upvoted a few i have read. Followed you, hoping you follow back check out my posts 👍☺

Thks for your work @arcange. I've been your chart for my Steem presentation during meetup. Upvoted! Let's Steem On.

Nice work my friend!

Thank you!

Read the topic aleni I like you I hope to improve the situation of the site to the urethra+follow

Great news!