I broke the SPS Validators and why this is a good thing

in #spsdao, 19 days ago (edited)

Hey everybody,

For those of you running an SPS Validator, you probably see something like this now:

validation errors.jpg

This is an error we shouldn't be getting. The ValidationError means that the validators are no longer creating the same hash from the same data. If this was live and there was no bug, it would be a sign that the data was altered, incomplete, or corrupted.
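To make the "same hash from the same data" idea concrete, here is a small sketch. It uses sha256sum as a stand-in, because the post doesn't say which hash the validators actually use, and the block contents are made up for the illustration.

```shell
#!/bin/sh
# Illustration only: sha256sum stands in for whatever hash the validators really use.

# Print the SHA-256 hash of the string given as first argument.
hash_of() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Hypothetical block contents, not real SPS data.
block='alice stakes 100 SPS'

node_a=$(hash_of "$block")                  # validator A processes the data
node_b=$(hash_of "$block")                  # validator B processes the same data
node_c=$(hash_of 'alice stakes 101 SPS')    # validator C ends up with different data

[ "$node_a" = "$node_b" ] && echo "A and B agree"
[ "$node_a" = "$node_c" ] || echo "C disagrees: validation error"
```

Two nodes that process identical data get identical hashes. As soon as one node computes something different, the hashes no longer match, and that mismatch is what a validation error reports.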


What did I do to create this?
Since I started testing the validators, I have been voting on all new validators that came online. As of yesterday that was 9 or 10. Voting for a validator is something we all should do with our SPS, not just the people that are testing the validators, but all SPS stakeholders.

If I have been doing this from the start, then what triggered this error only now? I also went into the qa environment and claimed and staked my SPS. This was the first time that somebody staked a good amount of SPS while having voted on this many validators. I did the same when there were just 4 validators and it wasn't a problem back then. From what I know, it seems that doing this, and with it changing the stake weight of my votes on that many validators, caused this error. We'll know more when the devs update us on it.


Everybody can help test, even those without a license

For this test it doesn't matter that much who you vote for, but when we go live this is how the top validators are chosen: by stake weight vote. For now you can do this from here: vote for validators. You will then see the page below. Fill in your username where it says Hive Account and click Authorize (pressing enter doesn't seem to work). Keychain will pop up and you just sign in to the page.

image.png

This will take you to the following page. On the left you can see the validators I voted for and on the right the validators you can vote for. In my case the buttons are grayed out because I already voted on them. Click vote for the validator you want and sign with keychain. Everybody has 10 votes to give out.

image.png


So what's next for the people running the validator software?
For now we wait for the devs to fix the bug and create a new snapshot. When they have done this we should get a heads up in discord and then we can go to work. To continue you need access to a terminal. If you still have the terminal open from running the software, you can either close it and reopen it, or open a new one. If you close it you can ignore the warning about the running process, the docker will continue to run even if you close the terminal. What I did was just open a new terminal and navigate to the right folder with cd SPS-Validator if it doesn't open in that folder.

Now we need to get the updated software. The first command is something you may not need if you haven't done anything special with git, but git checkout master will make sure you are on the master branch of the repository. Then git pull will download the updated software. After that command it should look like this. I used a different command, but that was because I needed to get an older version first to show what it would look like for you.
git pull.jpg
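Put together, the update step could be sketched like this. The run helper and the DRY_RUN switch are additions for this example and not part of the validator software. By default the commands are only printed, so you can check them first.

```shell
#!/bin/sh
# Sketch of the update step. With DRY_RUN=1 (the default here) the commands
# are only printed; run with DRY_RUN=0 to really execute them.
DRY_RUN=${DRY_RUN:-1}

# Helper for this example only: print the command, or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

update_validator() {
    # $1: folder of the cloned repository (SPS-Validator in this post)
    run cd "$1" &&
    run git checkout master &&   # make sure we are on the master branch
    run git pull                 # download the updated software
}

update_validator SPS-Validator
```

In dry-run mode this prints the three commands in order. The && between them means a later command only runs if the one before it succeeded, so git pull is never run from the wrong folder.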

After this we need to update the url for the new snapshot that the devs will provide. Depending on how you edited the .env file the first time, use nano .env or gedit .env to open the .env file. You should see something similar to the screenshot below. I have added a red line under the url we need to change. To be sure you can copy paste the entire url, but usually the only thing that changes is the date at the end of the url. Save the changes with control + S and close the editor.
snapshot.jpg
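If you prefer not to open an editor, the same change can be made from the command line. This is only a sketch on a throwaway file: the key names SNAPSHOT_URL and VALIDATOR_ACCOUNT and both example.com urls are placeholders made up for the demo, so check your own .env for the real variable name and use the url the devs provide.

```shell
#!/bin/sh
# set_env_value FILE KEY VALUE
# Replace the value of KEY= in FILE and keep the old version as FILE.bak.
# Assumes VALUE contains no '|' or '&' characters (sed would treat them specially).
set_env_value() {
    cp "$1" "$1.bak" &&
    sed "s|^$2=.*|$2=$3|" "$1.bak" > "$1"
}

# Demo on a temporary file. All names and urls below are made-up placeholders.
demo=$(mktemp)
printf 'VALIDATOR_ACCOUNT=example\nSNAPSHOT_URL=https://example.com/snapshot-old.zip\n' > "$demo"

set_env_value "$demo" SNAPSHOT_URL 'https://example.com/snapshot-new.zip'
cat "$demo"
```

Only the line that starts with the given key is rewritten, the rest of the file stays as it was. The .bak copy makes it easy to go back if the wrong url was pasted.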

Now we are going to stop our validator to rebuild it. To make it a good habit for when we are live, we first set it to inactive. Go here from another pc if you don't have a gui, didn't do ./run.sh start all when you started the validator, or don't have keychain on the device that runs the software. The rest can use that first link too, but also have the option to go from here. Both options should look the same: you see an input box for Hive Account. Enter the account that is used as Validator Account, not the Reward Account, click Authorize, and keychain will pop up so you can sign in to the page, just like we did to activate our validator. On the right of the screen you now see this.

image.png

Uncheck the box, press update and sign with keychain to set your validator to inactive. This will prevent missed blocks in the future. You may still get them now, but we are still in the testing phase so it's not that big of a deal.


This was all the preparation we need to do. Now we can stop and rebuild our validator. Back in the terminal, enter the ./run.sh replay command. You will be asked if you are sure you want to replay, press Y to continue. This will destroy all validator data you have on your machine and start all over from the snapshot, which is needed so that all validators have the same data again. Next you will see what is in the screenshot below.
image.png

Here also press Y to download the new snapshot and not use the older data. After that you will see the validator being built and restarted. It will need to catch up again, and this can take longer if you only see the update later. When you see "Blocks to head: 0" again, you can reactivate your validator to get rewards again.
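Instead of watching the screen, you could let the shell look for that line. A minimal sketch: only the "Blocks to head: N" text comes from the validator output quoted above, the sample lines are made up, and how you feed your real logs into it is up to your setup.

```shell
#!/bin/sh
# is_caught_up: reads log lines on stdin and succeeds if one of them reports
# exactly 0 blocks to head (so "Blocks to head: 05" or ": 105" do not count).
is_caught_up() {
    grep -q -E 'Blocks to head: 0([^0-9]|$)'
}

# Made-up sample lines to show both outcomes.
printf 'Blocks to head: 120\nBlocks to head: 10\n' | is_caught_up || echo "still catching up"
printf 'Blocks to head: 10\nBlocks to head: 0\n' | is_caught_up && echo "caught up"
```

With real logs you would pipe them in, for example docker logs CONTAINER 2>&1 | is_caught_up, where CONTAINER is a placeholder for the name that docker ps shows on your machine.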


Key Takeaways
We need all the testers we can get, not just for running the software, but also people that vote for validators and interact with the qa environment. From here, do everything you can think of that you normally do in the game. 3 things don't work: burning SPS for DEC, buying CREDITS with SPS and buying the spellbook with SPS. For the rest, the more that gets tested now, the better we are prepared for what can happen. Better to find those pesky bugs now.


Glad you found that one, but it also shows we need a lot more scenarios to be tested. Honestly, I think at this point it would be good if the company developing the nodes for Splinterlands would lay out their tests. They might need to be extended.

I'm not sure they can do that all by themselves. Like I said in this post, we did stake SPS before when we were doing closed testing, but never got this. Now it seems that the combination of me having voted on all I could, and then staking, meant the validators needed to be ranked again, and that caused the problem. We can only find those things with more people. That was part of why I made this post, in the hope that whoever has a few minutes goes to vote for the validators and then does some actions in the qa environment. That will be the best testing we can do.

Don't get me wrong, I also think they cannot do all of that by themselves. But it would be good to see their test coverage and that way extend it and scale it. Unfortunately, real world behavior often differs from what is intended by developers :-P

Let's hope they read this and can answer that. Thanks for the validator vote.
