STEEMSQL - Update 12 - About locks and performances

in #steemsql7 years ago

SteemSQL is a public MS-SQL database with all the blockchain data in it.

Previous posts:

SteemSQL success and wide adoption

SteemSQL has been widely adopted by a lot of users for analytical processing and even for running bots. Indeed, it is often easier to perform SQL queries to obtain information on the content of the blockchain than to parse every block and perform yourself the aggregation of all these data .

SteemSQL use a “Database Injector” process that read the blockchain in real time and split the transactions made on the blockchain into differents tables. Today, an average of 9 transactions per second are generated with some peak at 184 transactions per second. Each transaction may also have an effect on the non-transactional Accounts andComments tables, which must also be updated.

To ensure the integrity of the database, the Database Injector needs to perform a lock on the updated tables each time it has to insert or update data.

Gimme a lock!

While the database injector is working, users of SteemSQL issue queries to the database to perform their analysis.

Therefore, the SQL server will also try to performs a lock on the table involved in the user’s query in order to ensure that the returned data is consistent.

As you can see, as both process runs simultaneously, we can enter in a race for acquiring a lock on the tables, either by the Database Injector to add data to the database or by the query issuer to read data from it.

The database injector is a cool process, because it often ask for locks, but only for a very short time as its update are made very quickly.

On the other side, it often happens that the execution time for queries issued by users requires several seconds, or even minutes to complete. I saw some requests that took more than half an hour to finish, because they were poorly written or not restrictive enough.

While these long queries are running, the Database Injector can not insert new data into the database. This is not only annoying in terms of performance because the database start to be out of sync with the blockchain, but it is also a source of spam to me because I receive a notification whenever there is a problem of data injection in the database.

Please, use the NOLOCK hint!

Members of the #steemSQL channel on steemit.chat started to see my repeated requests to them to use the SQL (NOLOCK) optimization hint.

The (NOLOCK) hint tells the SQL server that dirty reads are allowed. When a user issue a query with the NOLOCK hint, no shared locks are issued to prevent the Database Injector to add data to the database. Even if the last one issue an exclusive locks, it will not block the user’s query from reading the locked data.
As I explained to SteemSQL users, the usage of the NOLOCK hint not only stop them interrupting the work of the Database Injector, but it also improves their queries performance.

How users responded to my request

The Good

Some quickly understood their advantage in terms of performance (and mine in terms of tranquility) and started to do systematic use of the NOLOCK. They carefully added the hint to their queries and everybody went happy.

The Bad

Unfortunately, some did not hear my call or did not care enough. They continued to issue queries without the hint, or forgot to add it to all table in their queries (or subqueries).

Others, having no knowledge of how to write SQL queries, have used analysis tools like PowerBI to perform the task, and even if aware of the NOLOCK issue, weren’t able to tell their tool to use the hint.

The Ugly

Finally, some who I knew they saw my request for the NOLOCK use, simply ignored it. Lazy ones! I have to admit that I had to ban a few of them to preserve SteemSQL smooth work.

The need for strike back

More and more, I had to struggle with people issuing queries that were crashing SteemSQL. Those who are members of the #steemSQL channel on steemit.chat know what I am talking about.

To give you an idea of the problem, here is a graph that shows the number of lock requests per minutes over the last 30 days:

As you can see, the server has to deal with an average of 767K lock requests/min, sometimes rising up to 4075K lock requests/min.

I also received hundreds notification of deadlock situation per day. I started to get tired of having to continuously clean my mailbox. It was time to put an end to this anarchy.

But how to solve this issue without disturbing existing process (and it looks they are quite a few)?

James to the rescue

(Un)fortunately, I am not 007 and don’t own a licence to kill. Therefore, the only option to me was to use “A VIEW”!

In SQL, a view is a virtual table based on the result-set of an SQL statement. A view contains rows and columns, just like a real table. The fields in a view are fields from one or more real tables in the database.

Therefore, as from now, all tables have been moved behind SQL views that mimics the tables and use the NOLOCK hint. The underlying tables are no more accessible to users, but only to the Database Injector. As it is the only process to update the database, we do not have to fear data corruption.

For example, the table Comments has been renamed TComments and a view has been created with the following definition:

CREATE VIEW Comments AS SELECT * FROM TComments (NOLOCK)



This ensure that each query issued against to the table through the view will be using the NOLOCK hint.

Users have nothing to change in their existing process as this is fully transparent to them or to any application they use to query the database.

Full speed ahead

Despite the newly implemented structure will now hide how the tables are designed and linked together, this will allow SteemSQL to continue working at full speed and manage the future growth of the Steemit platform.

I will put some effort in creating and publishing documentation and tutorials for SteemSQL.

Moreover, I will now be able to release some very awaited new features that I was holding on to prevent putting the server on its knees.

Thank you to all those who regularly send me comments of encouragement and share their appreciation of my work with SteemSQL.

Thanks for reading.


Help SteemSQL stay public and running 24/7/365.

All payout from this post will be allocated to SteemSQL infrastructure.

Thanks for your support!

footer created with steemitboard - click any award to see my board of honor

Support me and my work as a witness by voting for me here!

Sort:  

Great work @arcange. I often wondered how it was possible to open a SQL server up to the world without it being killed by bad queries. SteemSQL is an invaluable resource and your VIEW solution seems sound so I hope it does the job. Thanks again.

Yeah, making a server public is kind of crazy thing.
I should have thought a bit before doing it.

Anyway, as I did, I now have to deal with ...
Funny (but exhausting) thing to do and learning a lot. So ... added value for everyone.

This kind of thing gives an opportunity to learn.

Hey @arcange thanks buddy for the update and nice post as well.

Thanks again.

full vote for this!
and thanks alot for hosting this, adding a witness vote as well!
Kind regards
Jan

Thank you so much for your support!

@rawdawg just voted for you as a witness. Good luck bro!

Thank you. Really appreciated!

just an FYI, we just got MVP michaelvenompaige on steemit. . .
He is a well known MMA fighter. . .like really well known. . . We should get him some Steemit gear. Is that something we would ask SteemInc for?

@ michaelvenompage

Great solution! Nicely done.

I got in the habit some time ago of always using nolock, but I only write very basic queries and now mostly use Power BI. So this will make it much better.

Phenomenal work. Although with the down times and crashes, some tables were missing data. Will that require a new full resynch or is it fixed?
This VIEW trick is awesome, I'll try it on my databases.

P.S. I was one of those lazies (well not lazy, but busy) who forgot to implement NOLOCK...

Are you Bad or Ugly? ;)

I am the Good.

Good boy! XD

I know i am one of the ones that has caused problems, i am now using sql no lock queries, so there should be no problems, I hope from my side. I also see the benefit in speed using sql queries and hope to help the bisteemit team get the hang of doing this too.

Keep up the good work and thank you for the update

Thanks @paulag for yor support. I know you were one putting the the server down, but you also did your best to try to correct the situation. And this is highly appreciated.
Let's have you labeled "The Good" ;)

thats a relief :-)

@arcanage I voted for you. Thank you for your hard work on this awesome db for us. I know it is not easy and often thankless. I greatly appreciate your work!

I hate it when people suck at queries...
Gotta do what ya gotta do with the database to make sure it continues to function!

Thanks for keeping the DB up and running!!

Just making sure I understand correctly... So now we will be able to use Power BI and query the Views and all will be good? Up voted for finding this solution!

Yes, you can now use PowerBI without worrying about the NOLOCK hint.

Thank you for this update. And I understand the need to use views. I just wasn't aware that there was the ugly part. Full steem ahead!

I'm one of the "The Bad" here. I haven't had time to learn SQL queries and have been using Excel Power Query.

Thank you so much for providing this service! I use it daily to hunt down spam & scam.

hay @patrice, I haven't forgotten you about that data, I was just fixing all my queries with sql before I could move on and do data for anyone. Give me a few days to catch up and I will be in touch

I've been busy myself. Thanks!

Thanks to my "view trick", everybody is now on "The Good" side ;)

I like the "View to A Kill" fitting in this post :-)