RE: MySBDS - Steem Blockchain Data Service in MySQL

in #utopian-io, 7 years ago

No worries, it's good to document these issues in the blockchain so others can learn too!

Interestingly, this is one issue I have not yet run into. For me, steem_mysql has been running flawlessly; it's the steem_sbds container that has stopped and had issues so far.

My general thought is that this may be a system-level issue. How much memory do you have? I've tested this on a 4GB instance and can say it definitely does NOT work with less, at least not reliably.

Also, you mentioned installing other tools so these may also have an impact.

Something to check is to dump all the logs to local files so you can parse and review more easily. Docker makes it a little weird, so use this technique:

docker logs steem_mysql > mysql.log 2>&1
docker logs steem_sbds > sbds.log 2>&1

I didn't install any other tools in this particular setup (I destroyed the server with the other tools), and I think I adhered very closely to the instructions you provided. I'm using the 16GB digitalocean server and 500GB volume that were recommended in the instructions, and I used the high memory script.

Alright. With that deploy, the database restore does hammer mysql pretty hard, so that's probably the main issue here.

Again, check the docker logs to see if anything obvious pops out there. Beyond that, watch mysql to see what it's doing by running this command:

mysqladmin -h 172.17.0.2 -u root -p processlist

While the restore is running, you should see 'INSERT INTO...' in the process list for many hours.
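
If you'd rather not eyeball the process list every time, a small helper can count the INSERT rows for you. This is only a sketch: the function name count_restore_inserts is made up for illustration, and it assumes the process list text arrives on stdin.

```shell
# Hypothetical helper (not part of the MySBDS scripts): reads
# `mysqladmin processlist` output on stdin and prints how many
# lines mention an INSERT INTO statement.
count_restore_inserts() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so `|| true` keeps the helper's exit status at 0.
  grep -c 'INSERT INTO' || true
}

# Example (prompts for the password as usual):
#   mysqladmin -h 172.17.0.2 -u root -p processlist | count_restore_inserts
```

A result of 0 while the restore is supposed to be running suggests the import has stalled or mysql has stopped.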

If mysql dies, you shouldn't need to start all the way over. Assuming that you still have all the .gz files in the dump/ folder you can run this:

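# Set these to match your deployment.
mysql_password="mystrongcomplexpassword"
volume_name="volume-tor1-01"
# Look up the IP address of the running steem_mysql container.
mysql_ip=$(docker inspect --format "{{ .NetworkSettings.IPAddress }}" steem_mysql)
# Re-import every compressed dump file into the steem database.
for i in /mnt/"$volume_name"/dump/*.gz ; do gunzip < "$i" | mysql -h "$mysql_ip" -p"$mysql_password" steem ; done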

Obviously, update the password and volume. This is just part of the script that calls the import.
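
If mysql dies partway through, one possible variation on the loop above is to remember which files already imported so a rerun skips them. This is a sketch and not part of the original script: the restore_dumps name and the .done marker files are assumptions made for illustration, and whether it is safe to re-run a file that was only partially imported depends on how the dump was made.

```shell
# Hypothetical helper: restore_dumps DUMP_DIR IMPORT_CMD...
# Pipes each DUMP_DIR/*.gz through IMPORT_CMD and creates FILE.done
# when the import command exits 0, so a rerun skips finished files.
# Note: the pipeline status is that of IMPORT_CMD; a gunzip failure
# on its own is not detected here.
restore_dumps() {
  dump_dir=$1
  shift
  for f in "$dump_dir"/*.gz; do
    [ -e "$f" ] || continue        # the glob matched no files
    [ -e "$f.done" ] && continue   # imported on an earlier run
    if gunzip -c "$f" | "$@"; then
      : > "$f.done"                # mark this file as finished
    else
      echo "import failed for $f" >&2
      return 1
    fi
  done
}

# Example, using the variables from the loop above:
#   restore_dumps "/mnt/$volume_name/dump" mysql -h "$mysql_ip" -p"$mysql_password" steem
```

Deleting the .done files forces a full re-import on the next run.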

I haven't been too suspicious about the database restores because I actually did the restore twice (once on the server I destroyed and once on the current server). Both times took about the same amount of time (maybe around 8 hours), and both resulted in about the same amount of space being used on /mnt. So before investigating the database restore, I wanted to try something based on what you said about having no issues with steem_mysql stopping and having tested things on 4GB: I'm in the process of trying things with your low memory script's mysql settings. More specifically, I removed both the steem_mysql and steem_sbds docker containers and then ran lines 38 and 59 in your low memory script to get them both going again. I found I also had to rename a directory with a really long name in the "volumes" directory so the system would continue using all the existing mysql data instead of populating mysql from scratch. So far things seem to be working, as steem_mysql hasn't stopped in over an hour.

I'm glad it's coming together! It really is a lot of learning little pieces of the puzzle like this to understand everything. I'm still getting that tutorial together to walk through all of the steps like this in a little more detail.

One tip: renaming the volume could cause issues with docker (though it sounds like it worked here), so here's a trick in case it comes up again.

Inside the volume folder you'll see another folder named _data containing all of the databases. Make sure you docker stop steem_mysql first so nothing is running, and then just move your _data folder with all the existing data to overwrite the _data folder the container is using. You can't just replace the steem/ folder, as this is InnoDB and needs the ib... files as well.

Then, just docker start steem_mysql and it'll see the database.
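
The move itself can be sketched as a small function. The swap_data_dir name and both directory arguments are hypothetical, and as a cautious variation on the overwrite described above it sets the existing _data aside as _data.bak instead of deleting it.

```shell
# Hypothetical helper: swap_data_dir OLD_VOLUME_DIR NEW_VOLUME_DIR
# Moves OLD_VOLUME_DIR/_data into NEW_VOLUME_DIR, keeping whatever
# _data was already there as _data.bak.
# Run `docker stop steem_mysql` before calling it.
swap_data_dir() {
  old=$1   # volume folder holding the existing _data you want to keep
  new=$2   # volume folder the current container is using
  [ -d "$old/_data" ] || { echo "no _data folder in $old" >&2; return 1; }
  if [ -e "$new/_data.bak" ]; then
    echo "$new/_data.bak already exists, refusing to continue" >&2
    return 1
  fi
  if [ -e "$new/_data" ]; then
    mv "$new/_data" "$new/_data.bak" || return 1
  fi
  # Move the whole folder so the ib... files travel with steem/.
  mv "$old/_data" "$new/_data"
}

# Example (placeholder paths):
#   docker stop steem_mysql
#   swap_data_dir /path/to/old-volume /path/to/new-volume
#   docker start steem_mysql
```

Once mysql is confirmed healthy with the restored data, the _data.bak folder can be removed to free the space.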

I was wondering about the possibility of tarring and compressing the _data directory to populate sbds instead of using mysqldumps. For example, instead of just offering latest.tar for download on your website, would it also work to offer latest_data.tar as a download? There is probably a reason why mysqldumps are what people typically use, but I'm wondering if this is a special case where the _data directory might be a faster solution: the database restore took around 8 hours on a machine with 6 virtual cores and 16GB, and downloading and overwriting the _data directory could be much faster.

Haha, the .tar download itself was a whole other battle for me.

The short answer is that the ibdata... and ib_logfile... files are required as well as the steem/ folder, but those files contain data about all the databases on that server. Even so, this approach should work if you back up the entire _data folder, but nothing short of that. That would almost assuredly go faster than dealing with the dump.

Creating my .tar is no walk in the park either. I'm looping through the tables and doing a mysqldump one at a time and appending each of them to the .tar file. This database is so huge a standard mysqldump of the entire thing never finished once in my testing.

It is difficult and expensive to work with such large databases, which got me thinking about a way to potentially reduce the size of the database by more than 50% without too much of a sacrifice. I noticed that the sbds_core_blocks table accounts for around two-thirds of the entire database size, and I'm guessing that the mediumtext column named "raw" (probably for raw blockchain data) accounts for the vast majority of that space. It got me wondering if it might be possible to simply replace the "raw" entries with a single character or something to save space. For example, I think doing this might enable the entire database to fit comfortably on a single $40/month droplet with room to spare.

It still seems to be working, and the lastblock seems to stay very close to the current head_block_number.

Thanks for the tip about the _data folder. I was definitely not confident about what I did, but it is good to know the proper way to do things going forward.

Note: What I did fortunately had the same effect as overwriting the _data folder in the running container. There was /old64characterdirectory/_data/ and /new64characterdirectory/_data/, and I saw that the old one was about 360G but only the new one was being written to. So I stopped both steem_mysql and steem_sbds, removed the new directory, and renamed the old one to the new one's name. That was a roundabout way of overwriting the _data folder in the running container, but my approach could have been problematic if there had been anything other than the _data folder in the new directory.

Awesome! To test, I run this at least once a day:

SELECT MAX(block_num) AS lastblock FROM sbds_core_blocks

And check against the head_block_number here: https://api.steemjs.com/getDynamicGlobalProperties

I have fallen a bit behind a few times and as far as I can tell it's the public node we're connecting to that's the issue. Even so, stopping and restarting steem_sbds should get you back on track eventually if you see you're falling behind.
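
The comparison can also be scripted. This is a rough sketch under a few assumptions: the helper names head_block_from_json and block_lag are made up for illustration, the JSON is parsed with sed rather than a proper JSON tool, and the mysql connection settings in the example are the ones used earlier in this thread.

```shell
# Hypothetical helper: pulls the numeric head_block_number value out
# of a JSON document read on stdin.
head_block_from_json() {
  sed -n 's/.*"head_block_number"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: block_lag HEAD LAST prints how many blocks
# the database is behind the chain.
block_lag() {
  echo $(( $1 - $2 ))
}

# Example:
#   head=$(curl -s https://api.steemjs.com/getDynamicGlobalProperties | head_block_from_json)
#   last=$(mysql -h 172.17.0.2 -u root -p -N -s -e 'SELECT MAX(block_num) FROM sbds_core_blocks' steem)
#   block_lag "$head" "$last"
```

If the lag keeps growing between runs, that is the cue to stop and restart steem_sbds.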

Yes, I've been checking the max block_num against the head_block_number throughout the day, and for the past 24 hours it seems to be keeping up well with about a 20 block lag which seems reasonable.

As a test, I reverted to the high memory settings (line 38 of your high memory script) to see if steem_mysql would keep stopping like it was before. So far it has been working well for the past hour, and I plan to continue testing tomorrow to see if I need to do any restarts. I'm thinking that the high memory settings may only run into issues when the database is very far behind on block_num, because maybe the high memory mysql settings cause the system to try to catch up too fast. It will be interesting to see if steem_mysql has any stops tomorrow.