Monitor Witness - Know your backup witness has failed before you depend on it

in LeoFinance, last year

If you are a witness running more than one witness node (or even if you aren't), you might find this handy.

I currently run four witness nodes, and there is one failure scenario I don't think many people consider. Most witnesses get notified if their witness node starts missing blocks, and some even have automated fail-over. But what if your backup node fails and you don't find out until it's time to fail over from your primary node?

Most witnesses have one or two nodes, so in this scenario you are at the mercy of a full resync, which can take days or even weeks depending on your hardware, and which only gets slower as time goes on.

Would you like to know that your server is behind before you actually need to use it?

Well then,

Introducing Monitor Witness

I know, not a very exciting name. I was lazy at the time, and it wasn't something I planned on posting anywhere. I recently put it up on GitHub so other witnesses can benefit from it.

Monitor Witness is built using completely free tools with no ongoing fees. Unlike typical network monitoring, Monitor Witness warns you about a failing backup node before you need it, potentially saving you days of downtime.

How does it work?

Monitor Witness is a relatively simple bash script that reads your witness log file to find the last block processed, then compares that block number against full nodes to determine whether your node is within 10 blocks. If it is, the script sends a heartbeat; if it isn't, no heartbeat is sent and you get notified that the node is behind or has failed.
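The comparison step can be sketched in Python. This is a minimal illustration, not the bash code in the repository: it assumes the full nodes are queried with the standard Hive JSON-RPC call condenser_api.get_dynamic_global_properties (whose result includes head_block_number), and it treats the local node as healthy when it is no more than 10 blocks behind, allowing it to be slightly ahead of a lagging full node. The function names are hypothetical.

```python
import json
import urllib.request

# JSON-RPC 2.0 request asking a Hive API node for its current chain state.
RPC_BODY = {
    "jsonrpc": "2.0",
    "method": "condenser_api.get_dynamic_global_properties",
    "params": [],
    "id": 1,
}


def parse_head_block(response_text: str) -> int:
    """Extract head_block_number from a get_dynamic_global_properties reply."""
    reply = json.loads(response_text)
    if "error" in reply:
        raise RuntimeError(f"node returned an error: {reply['error']}")
    return int(reply["result"]["head_block_number"])


def fetch_head_block(node_url: str, timeout: float = 10.0) -> int:
    """POST the RPC call to one full node and return its head block number."""
    request = urllib.request.Request(
        node_url,
        data=json.dumps(RPC_BODY).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_head_block(response.read().decode("utf-8"))


def is_within_tolerance(local_block: int, node_block: int, tolerance: int = 10) -> bool:
    """True if the local node is at most `tolerance` blocks behind the full node."""
    return node_block - local_block <= tolerance
```

For example, is_within_tolerance(local_block, fetch_head_block("https://api.hive.blog")) checks the local block against one public API node; a complete check would repeat this for every configured node.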

First, you need an account with Healthchecks.io, UptimeRobot, or your own Uptime Kuma instance. Any service that provides ping/heartbeat-style monitors via URL will work.

A health check is basically a reverse notification. Typically, when you use a notification service, you give it a destination to ping or monitor, and it alerts you when that destination can't be reached. With a health check, you set up a URL that must be visited within a set amount of time; if it isn't, the service starts to complain. It's a dead man's switch, something you have probably only heard of in movies. This is an extremely handy tool for monitoring backup jobs and other scheduled jobs, where you want to make sure they completed properly but there is no external IP to ping or monitor.
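From the monitored side, a heartbeat is nothing more than an HTTP request to the check's unique ping URL (on Healthchecks.io these look like https://hc-ping.com/<uuid>). Here is a minimal sketch using only the Python standard library; the function name is hypothetical. Errors are deliberately swallowed rather than retried, because a missed ping is exactly the condition the dead man's switch exists to notice.

```python
import http.client
import urllib.request


def send_heartbeat(ping_url: str, timeout: float = 10.0) -> bool:
    """Visit the check's ping URL once; return True if the service answered 2xx."""
    try:
        with urllib.request.urlopen(ping_url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts all land here.
        # Nothing else to do: the monitoring service will notice the silence.
        return False
```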

Installation

Enable Logging

To use this, you need logging enabled so the script has a log file to monitor. The easiest way to do this is to run your node in screen (e.g. screen -S witness -L -Logfile witness.log). You can use tmux as well, but it is much more difficult to set up logging with it out of the box.
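Once screen is writing witness.log, finding the last processed block is a matter of scanning the end of the file. The Python sketch below assumes hived log lines of the form "Got 25 transactions on block 75000000 by somewitness"; that pattern is an assumption, so check it against your own log before relying on it. last_block_in_log is a hypothetical name, not part of Monitor Witness.

```python
import re
from collections import deque
from typing import Optional

# ASSUMPTION: hived reports incoming blocks with lines such as
#   "... handle_block ] Got 25 transactions on block 75000000 by someone -- ..."
# Adjust BLOCK_RE if your node's log format differs.
BLOCK_RE = re.compile(r"on block (\d+)")


def last_block_in_log(path: str, tail_lines: int = 500) -> Optional[int]:
    """Return the most recent block number in the last `tail_lines` lines, or None."""
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        tail = deque(log, maxlen=tail_lines)  # keep only the end of the file in memory
    for line in reversed(tail):  # newest lines first
        match = BLOCK_RE.search(line)
        if match:
            return int(match.group(1))
    return None
```

A node that has stopped will still report its last block here; the staleness is caught by comparing that number against the full nodes.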

Set Up Heartbeat Monitors

Set up an account at https://healthchecks.io and create one health check for each of your witness nodes. Throughout these instructions, I am going to assume you are using Healthchecks.io.

I recommend using the name of the witness node as the check's name and slug, with a 1-minute period and a 5-minute grace period. This means that if the URL isn't visited every minute, the check is marked as late, but there is an additional 5-minute grace period before it starts sending alerts. Customize this as you wish, but this is what I would recommend.
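To make the period and grace arithmetic concrete, here is a simplified Python model of how a heartbeat check classifies itself with the recommended settings. It is an illustration, not Healthchecks.io's code, and the state names are chosen for readability. With a 1-minute period and a 5-minute grace period, an alert fires once more than 6 minutes have passed since the last successful ping.

```python
from datetime import datetime, timedelta


def check_status(last_ping: datetime, now: datetime,
                 period: timedelta = timedelta(minutes=1),
                 grace: timedelta = timedelta(minutes=5)) -> str:
    """Simplified dead man's switch model.

    'up'    - a ping arrived within the expected period
    'grace' - the ping is late, but the grace period has not run out yet
    'down'  - period and grace have both elapsed; this is when alerts go out
    """
    silence = now - last_ping
    if silence <= period:
        return "up"
    if silence <= period + grace:
        return "grace"
    return "down"
```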

If you want to see the health check in action, visit the ping URL of one of your health checks in a browser and you will see it turn green immediately. By default, your health checks notify you by email, but you can add almost any integration you can think of, from Discord, ntfy, Pushover, and Slack to even SMS.

Once you have created these, just leave the tab open; we will come back to it.

Download Monitor Witness

You can download Monitor Witness from GitHub using git clone https://github.com/officiallymarky/monitorwitness.

As always, review the code and make sure it doesn't do anything you don't approve of. The code is very easy to read and extremely short. It should only take a minute.

Make sure monitorwitness.sh is executable using chmod +x monitorwitness.sh.

Modify monitorwitness.env with your preferred settings and the healthcheck URL from above. The defaults are three popular full nodes, with a single allowed failure and a 10-block tolerance. This means that if you are within 10 blocks on at least 2 of the 3 nodes, a heartbeat is sent and no notification goes out. If you are more than 10 blocks behind on 2 or more nodes, no heartbeat is sent, which triggers an alert.
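The tolerance and allowed-failure rule can be expressed as a single decision function. This Python sketch is an interpretation of the description above, not the script's actual code. In particular, counting an unreachable full node as a failure is an assumption (a conservative one, since a node you can't query can't vouch for you). The function name and parameters are hypothetical.

```python
from typing import Optional, Sequence


def should_send_heartbeat(local_block: Optional[int],
                          node_blocks: Sequence[Optional[int]],
                          tolerance: int = 10,
                          allowed_failures: int = 1) -> bool:
    """Decide whether the local witness node is healthy enough to ping.

    node_blocks holds each full node's head block, or None if it could not be
    queried. A node counts as a failure when it is unreachable (assumption) or
    when the local node is more than `tolerance` blocks behind it.
    """
    if local_block is None:
        return False  # nothing usable in the log: treat the node as failed
    failures = sum(
        1 for head in node_blocks
        if head is None or head - local_block > tolerance
    )
    return failures <= allowed_failures
```

With the defaults described above (three nodes, one allowed failure, 10-block tolerance), being more than 10 blocks behind two of the three nodes returns False, so no heartbeat is sent and the check eventually alerts.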

**The code will not run unless you add a healthcheck URL to the configuration file.**

I recommend installing this on all witness nodes and setting a unique healthcheck URL for each node, ideally with a matching name on Healthchecks.io. UptimeRobot supports this sort of monitoring as well, but you need a paid account. Healthchecks.io allows you 20 free healthchecks. You can also use Uptime Kuma (my favorite), an open-source alternative to UptimeRobot that you host yourself, but it is a lot more involved to set up. Healthchecks.io works really well, is free, and is super fast to set up (under 15 seconds to be up and running!).

Schedule monitorwitness.sh

Once you have modified monitorwitness.env with your preferred settings, all that is left is to schedule it.

I recommend using cron; editing your cron schedule is as easy as typing crontab -e.

Add the following entry, using the correct path to monitorwitness.sh. Use the full absolute path: cron runs jobs with a minimal environment, so don't rely on shortcuts like $HOME or ~.

* * * * * /home/marky/monitorwitness.sh

This entry runs the script every minute, which is what I recommend. Remember that your health check will only send an alert after the 5-minute grace period has passed. If the cron syntax is unfamiliar, search for crontab examples or a crontab generator.

Verify installation

After about 60 seconds, you should see all your health checks turn green on Healthchecks.io, meaning everything is running and reporting in. To verify that alerting actually works, you can do one of two things once the checks are green: pause your witness node, or simply remove the crontab entry, and make sure you get notified.


Posted Using InLeo Alpha


Friend, what you say is interesting. Could Monitor Witness generate random IPs with a virtual machine? Those topics are interesting.

Thanks, looks handy. :)
As I'm still learning the details of Hive, I don't understand how the network picks the witness (I guess it has to „find consensus“).

So let's say I broadcast a transaction: how does the network determine which witness can add the next block?

Do you know where I can find this kind of information? It can also be at the code level, if you happen to know.

Since you've asked about the witness schedule, I'll shamelessly link my old article, where you can find some information. The actual code that compiles the future witness schedule starts here (follow it to the definition of update_witness_schedule4), but of course the data it uses is updated all over the place.

Thank you for sharing that valuable information and providing a link. It's exactly what I was searching for. Finding this kind of information, especially with the correct code examples and implementations, can be quite challenging.

I believe today is my lucky day as @edicted shared this post: https://peakd.com/hive-167922/@edicted/privex-hive-node-in-a-box-experience. It contains some really good information as well.

May I ask you something? I inquired with ChatGPT about consensus on the Hive blockchain and received this response. It appears to be quite accurate to me. Do you agree, or do you see any flaws in it?

Consensus:

In the Hive blockchain, the process of reaching consensus is a crucial step that ensures the validity and finality of a newly produced block. This consensus is achieved through the collaboration of various network participants, including the active witnesses and other nodes in the network. Here's how it works:

  • Broadcasted Block: After a witness produces a block, it is broadcasted to the Hive network. This block contains a collection of transactions that have been validated by the producing witness.

  • Relayed to Network: The block is propagated across the network to all other witnesses, full nodes, and participants in the Hive ecosystem.

  • Validation by Other Witnesses: Upon receiving the block, other witnesses, including those who are currently active and those in the backup positions, validate the transactions within the block. They check that the transactions comply with the Hive blockchain's consensus rules, that the sender has the required balance, and that there are no double-spending attempts.

  • Consensus Process: Witnesses and network participants then engage in a consensus process to determine the validity of the block. This process involves their nodes communicating with each other. Each participant verifies that the transactions within the block are valid and follow the established rules.

  • Acceptance or Rejection: If the majority of witnesses and nodes reach a consensus that the block is valid, it is accepted, and the transactions within it are considered confirmed and added to the blockchain. If there is a discrepancy or a significant portion of participants deems the block as invalid, it is rejected.

  • Finalization: Once consensus is reached, and the block is accepted, it is considered finalized. The transactions within the block are now an integral part of the Hive blockchain and are irreversible.

That description is far too broad. It shows that ChatGPT knows very little specific to Hive other than that it is a cryptocurrency, since the description fits pretty much any blockchain. The sentence about "required balance" and "no double-spending attempts" in particular exemplifies the problem: transfers, where that sentence fits, are only a tiny portion of what happens on Hive.

A true, Hive-specific description should include topics such as:

  • DPoS - Delegated Proof of Stake mechanism for selecting producing witnesses
  • TaPoS - Transactions as Proof of Stake mechanism for users to prevent rewriting of blocks even in the event of total collusion from all top witnesses
  • forking - how the network determines which path is valid if there are competing blocks of the same number
  • OBI - One Block Irreversibility mechanism for speeding up block finality
  • hardforks - what are the rules that govern major changes in code
  • hard-vs-soft consensus (law vs gentlemen's agreement) - what are the differences and which parts of Hive belong where
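To make the TaPoS item concrete: every Hive transaction carries ref_block_num and ref_block_prefix, which tie it to a recent block, so the transaction is only valid on a chain that contains that block. The Python sketch below derives both fields from a block ID the way Hive client libraries commonly do (the low 16 bits of the block number encoded in the ID's first 4 bytes, plus the next 4 bytes read as a little-endian integer). It is an illustration; tapos_fields is a hypothetical helper, not a library function.

```python
import struct
from typing import Tuple


def tapos_fields(block_id_hex: str) -> Tuple[int, int]:
    """Derive (ref_block_num, ref_block_prefix) from a 20-byte block ID."""
    raw = bytes.fromhex(block_id_hex)
    if len(raw) != 20:
        raise ValueError("a block ID is 20 bytes (40 hex characters)")
    block_num = struct.unpack_from(">I", raw, 0)[0]         # first 4 bytes: block number, big-endian
    ref_block_num = block_num & 0xFFFF                      # only the low 16 bits are referenced
    ref_block_prefix = struct.unpack_from("<I", raw, 4)[0]  # next 4 bytes, little-endian
    return ref_block_num, ref_block_prefix
```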

As I'm still learning the details of Hive, I don't understand how the network picks the witness (I guess it has to „find consensus“).

The top 20 witnesses take turns every block, then a shuffle is made for a backup witness in the 21st spot based on their rank.

So let's say I broadcast a transaction: how does the network determine which witness can add the next block?

It looks on the witness schedule.
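The idea of "looking at the witness schedule" can be illustrated with a toy model: a round is built from the top 20 witnesses plus one backup, put in a deterministic order that every node can reproduce, and each block slot belongs to exactly one entry. This is a deliberately simplified sketch. hived's real logic (update_witness_schedule4, mentioned above) selects the backup witness and orders the round with its own procedure, and Python's seeded random.Random merely stands in for that here. build_round and producer_for_slot are hypothetical names.

```python
import random
from typing import List, Sequence

ROUND_SIZE = 21  # 20 top witnesses + 1 backup


def build_round(ranked_witnesses: Sequence[str], backup: str, seed: int) -> List[str]:
    """Toy round: top 20 by rank plus one backup, shuffled reproducibly from `seed`."""
    top = list(ranked_witnesses[:ROUND_SIZE - 1])
    if backup in top:
        raise ValueError("the backup witness must come from outside the top 20")
    round_order = top + [backup]
    # Every node that uses the same seed gets the same order.
    random.Random(seed).shuffle(round_order)
    return round_order


def producer_for_slot(round_order: Sequence[str], slot: int) -> str:
    """Each block slot in the round belongs to exactly one scheduled witness."""
    return round_order[slot % len(round_order)]
```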

Do you know where I can find this kind of information? It can also be at the code level, if you happen to know.

https://hive.io

Healthchecks.io allows you 20 free healthchecks

It's worth mentioning that with the free plan you get a maximum of 5 notifications per month:

[Screenshot: Healthchecks.io free plan notification limits]

That limit only applies to WhatsApp and SMS; email notifications are unlimited. I personally use my own monitoring, but you can also self-host Healthchecks or use Uptime Kuma.

interesting

PS: I assumed you needed a paid plan to receive emails. Looks like that's not the case.


No, everything I suggested works on the free plan, and you can run 20 checks. If you need more, you can self-host it or buy a paid plan. Personally, though, I'd recommend Uptime Kuma, as it does this and a whole lot more; Healthchecks was just the easiest to use for this situation, and I wanted this to be easy.

Yay! 🤗
Your content has been boosted with Ecency Points, by @drakernoise.
Use Ecency daily to boost your growth on platform!


This is very informative and interesting. It should help anyone who follows your instructions.