Witness-Mon: A Witness Monitoring System

in #utopian-io, 6 years ago (edited)

Witness-Mon is a simple error detection system for a STEEM witness. It parses the witness log, verifies normal operation and automatically disables the witness if errors are detected.

Its purpose is to prevent missed blocks with the help of proactive monitoring.

High-Level Overview

The monitoring setup consists of one script running on the witness node and one script running on a remote monitoring node.

The witness node script:

  • Checks the witness logs
  • Restarts the witness process in case of issues
  • Sets the witness status (fetched by the monitoring node)
  • Sends email notifications

The monitoring node script:

  • Checks the witness node status
  • Disables the witness in case of issues
  • Sends email notifications

Scope

At the moment the monitoring node is only configured to work with one witness node. This could fairly easily be extended to provide automatic failover: instead of disabling the witness, the monitoring node would keep track of every witness node and fail over to a functioning one.

As soon as I get higher up the witness ranking I will explore this option in more detail. 😀

It is also not designed for Top 20 witnesses, because of their short 63-second block production interval.

It could, however, perfectly well be used to monitor backup nodes, just without the witness disable function.

Requirements

Below is a list of requirements for the scripts:

  • Tool to disable the witness (e.g. Conductor/Beempy)
  • SSH key login enabled
  • nc (netcat) - To perform connectivity checks
  • SMTP client - To send email notifications

Witness Node Setup:

GitHub link for the witness node script: nodecheck.sh

Default settings for steem-in-a-box

The script's default settings are made for @someguy123's steem-in-a-box. Place the nodecheck.sh file in the steem-docker folder, in the same folder where you have the run.sh.

In the script's #### CONFIGURATION SECTION #### you can adjust the commands you use to start and stop your witness process, as well as the command used to read the logs.

Make your script executable

Don't forget to run: chmod +x nodecheck.sh to make your script executable.

Script Overview

The witness node runs a shell script that parses the witness log and checks for recent 'handle_block' entries. It also checks for stale records to be sure the system is logging. It takes a snapshot of the log and compares it with the active log 12 seconds later.
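
To make the idea concrete, here is a minimal sketch of such a log check. It is not the actual nodecheck.sh code: the function name check_log, the LOG_CMD and WAIT_SECS variables and the docker logs call are assumptions made for illustration, so adjust them to whatever command you use to read your logs.

```shell
# Hypothetical sketch, not the real nodecheck.sh.
# Command that prints the most recent log lines (assumed; adjust to your setup).
LOG_CMD=${LOG_CMD:-"docker logs --tail 50 witness"}
# Seconds between the two snapshots (12 in the description above).
WAIT_SECS=${WAIT_SECS:-12}

# Returns 0 if the log looks healthy, 1 otherwise.
check_log() {
    local before after
    before=$($LOG_CMD 2>&1)
    sleep "$WAIT_SECS"
    after=$($LOG_CMD 2>&1)

    # Stale log: nothing changed during the wait, so the node is not logging.
    if [ "$before" = "$after" ]; then
        return 1
    fi
    # No block activity in the latest snapshot.
    if ! printf '%s\n' "$after" | grep -q 'handle_block'; then
        return 1
    fi
    return 0
}
```

Calling check_log returns success only when the log has moved on and still contains 'handle_block' entries, which is the same two-part test described above.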

The script outputs the node status to the file /tmp/status. It writes OK or BAD to reflect the status. This file is fetched by the monitoring node, described below.

If an error is detected, the witness process is restarted, the script waits a moment for the system to recover and then checks the logs again. These steps are repeated a maximum of three times. If the script still detects an error, an email notification is sent to the administrator and BAD is written to /tmp/status.
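
The restart logic can be sketched roughly as follows. This is an illustration under assumptions, not the code from nodecheck.sh: supervise_once, the three hook functions and the 60-second RECOVERY_WAIT are hypothetical names and values, and the hooks are placeholders for your own health check, restart command and notification.

```shell
# Hypothetical sketch of the restart loop; names and timings are assumptions.
STATUS_FILE=${STATUS_FILE:-/tmp/status}
MAX_RESTARTS=3                        # restart at most three times
RECOVERY_WAIT=${RECOVERY_WAIT:-60}    # assumed pause after a restart, in seconds

# Placeholder hooks: replace the bodies with your own commands.
node_healthy()    { true; }                       # e.g. the log check
restart_witness() { echo "restart requested"; }   # your stop/start commands
notify_admin()    { echo "ALERT: $1"; }           # e.g. send an email

supervise_once() {
    local attempt
    attempt=0
    while ! node_healthy; do
        if [ "$attempt" -ge "$MAX_RESTARTS" ]; then
            # Still failing after the last restart: give up and flag it.
            notify_admin "witness still failing after $MAX_RESTARTS restarts"
            echo BAD > "$STATUS_FILE"
            return 1
        fi
        attempt=$((attempt + 1))
        restart_witness
        sleep "$RECOVERY_WAIT"
    done
    echo OK > "$STATUS_FILE"
}
```

The status file only ever contains OK or BAD, which keeps the check on the monitoring side trivial.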

The script needs to be persistent: it should start at boot and restart automatically in case of a failure. Please see below how to set that up using Systemd.

Default test mode

TESTING_MODE_ON=YES is active by default in the #### CONFIGURATION SECTION ####. It prevents witness restarts. When you are done testing, comment out that line to go live.
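
A test-mode switch like this can be implemented with a small wrapper around every disruptive command. The sketch below shows one possible way to do it; the run_action helper is a hypothetical name and not necessarily how the scripts implement it.

```shell
# Hypothetical sketch of a test-mode guard.
TESTING_MODE_ON=YES   # comment out this line to go live

# Runs the given command, or only prints it while test mode is active.
run_action() {
    if [ "${TESTING_MODE_ON:-}" = "YES" ]; then
        echo "TEST MODE: would run: $*"
    else
        "$@"
    fi
}
```

With the line active, run_action followed by your restart command only prints what would have been executed.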

Monitoring Node Setup:

GitHub link for the monitoring node script: remotecheck.sh

Default settings

The location of this script is not important; place it in a folder in the user's home directory. A folder witness-check/ is used in the examples below.

The script's default settings can be adjusted in the script's #### CONFIGURATION SECTION ####. Adjust the SSH connection string for the witness node and the Conductor command line.

Make your script executable

Don't forget to run: chmod +x remotecheck.sh to make your script executable.

Disabling the witness

As I understand it, many witnesses use @furion's Conductor toolset, so I have based the example script on that. The Beempy tool, part of Beem from @holger80, could equally well be used.

To automatically unlock the wallet, for Conductor or Beempy, we need to set the UNLOCK environment variable when executing the script. See the Systemd section below to see how that can be done.

Check out Conductor on Github (Conductor) for installation instructions. Make sure the tool works manually via the command line before proceeding.

Script Overview

The script connects with SSH to the witness node and checks the status by reading the /tmp/status file.
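
Fetching the status boils down to a single SSH command. A minimal sketch, assuming key-based login is already working; SSH_TARGET is a placeholder host and fetch_status is a hypothetical function name:

```shell
# Hypothetical sketch; SSH_TARGET is a placeholder, not a real host.
SSH_TARGET=${SSH_TARGET:-ubuntu@witness.example.com}

# Prints the content of /tmp/status on the witness node, or nothing on failure.
fetch_status() {
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout keeps an unreachable node from hanging the check.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$SSH_TARGET" 'cat /tmp/status' 2>/dev/null
}
```

An empty result then means "could not connect", which is handled separately from OK and BAD.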

Status = OK:

If the status is OK, the script does nothing; it sleeps for 2 minutes and then runs the check again.

Status = BAD:

If the status is BAD the witness will be disabled immediately.

Status = something else... (connection failure):

If the SSH connection fails, the problem lies either with the witness node or with the monitoring node itself. The script retries the SSH connection up to 5 times, waiting 15 seconds between attempts.
If there is still an issue after 5 attempts, we try to verify the local Internet connection by connecting to https://google.com (because if Google is down, the Internet is down... 😀).

Is google.com responding?

  • Yes = Witness node issue = Disable witness.
  • No = Monitoring node issue = Do nothing
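
Put together, the three cases can be sketched like this. It is an illustration of the decision flow under assumptions, not the real remotecheck.sh: check_once and the three hooks are hypothetical names, the hooks are placeholders to be replaced with your own commands, and the connectivity test assumes an nc that supports the -z and -w options.

```shell
# Hypothetical sketch of the decision logic; not the real remotecheck.sh.
SSH_RETRIES=5                    # attempts before giving up on SSH
RETRY_WAIT=${RETRY_WAIT:-15}     # seconds between attempts

# Placeholder hooks: replace the bodies with your own commands.
fetch_status()    { cat /tmp/status 2>/dev/null; }     # really: read the file over SSH
internet_up()     { nc -z -w 5 google.com 443; }       # can we reach the outside world?
disable_witness() { echo "disable requested" >&2; }    # your Conductor or Beempy command

check_once() {
    local attempt status
    attempt=1
    while [ "$attempt" -le "$SSH_RETRIES" ]; do
        status=$(fetch_status)
        case "$status" in
            OK)  return 0 ;;                     # all good, do nothing
            BAD) disable_witness; return 0 ;;    # witness node reported a failure
        esac
        # Anything else counts as a connection failure: wait and retry.
        if [ "$attempt" -lt "$SSH_RETRIES" ]; then
            sleep "$RETRY_WAIT"
        fi
        attempt=$((attempt + 1))
    done

    if internet_up; then
        disable_witness    # we are online, so the witness node is the problem
    else
        echo "local connectivity issue, doing nothing" >&2
    fi
}
```

The important design choice is the last branch: if the monitoring node itself is offline, it must not touch the witness, because it cannot tell whether anything is actually wrong.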

Default test mode

TESTING_MODE_ON=YES is active by default in the #### CONFIGURATION SECTION ####. It prevents the witness disable action. When you are done testing, comment out that line to go live.

Systemd Services

We will use Systemd services to make sure our monitoring scripts are persistent and restart after failures or a system reboot.

Witness Node

Create nodecheck.service in /etc/systemd/system/ with the content below. Modify the first five lines following the [Service] tag to match the environment.

This needs to be done as root:
sudo nano /etc/systemd/system/nodecheck.service

[Unit]
Description=Witness Node Monitoring

[Service]
User=ubuntu
Group=ubuntu
EnvironmentFile=/home/ubuntu/steem-docker/.env
WorkingDirectory=/home/ubuntu/steem-docker/
ExecStart=/home/ubuntu/steem-docker/nodecheck.sh

Type=simple
TimeoutStopSec=20
KillMode=process
Restart=always

[Install]
WantedBy=multi-user.target

To avoid silly messages like tput: No value for $TERM and no -T specified when calling run.sh (part of steem-in-a-box), we add TERM=dumb to the .env environment file referenced above in nodecheck.service.

PORTS=
DOCKER_NAME=witness
TERM=dumb

tput is used to change the appearance of the text, to make it bold and change colors etc. As we are starting the script from Systemd and not from a terminal, we don't have the TERM variable set. So that's why we assign it a dummy value to get rid of the message.

Monitoring Node

Create remote-mon.service in /etc/systemd/system/ with the content below. Modify the first five lines following the [Service] tag to match the environment. Take note of Environment="UNLOCK=XXXXXX" and replace XXXXXX with the wallet passphrase used by Conductor.

This needs to be done as root:
sudo nano /etc/systemd/system/remote-mon.service

[Unit]
Description=Remote Witness Monitoring

[Service]
User=ubuntu
Group=ubuntu
Environment="UNLOCK=XXXXXX"
WorkingDirectory=/home/ubuntu/witness-check/
ExecStart=/home/ubuntu/witness-check/remotecheck.sh

Type=simple
TimeoutStopSec=20
KillMode=process
Restart=always

[Install]
WantedBy=multi-user.target

Manage Systemd services

To install a service so that it starts at boot, run the following command:
sudo systemctl enable SERVICE-NAME

Replace enable with start, stop or status to perform the respective action.
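
For a freshly created unit file, the usual sequence is reload, enable, start. A small sketch of that sequence; the install_service helper is a made-up name for this example:

```shell
# Hypothetical helper; the function name is made up for this example.
install_service() {
    local name
    name=$1
    # Pick up the new or edited unit file.
    sudo systemctl daemon-reload || return 1
    # Start at boot.
    sudo systemctl enable "$name" || return 1
    # Start right now.
    sudo systemctl start "$name"
}
```

For example, run install_service nodecheck.service on the witness node and install_service remote-mon.service on the monitoring node. To follow the script output afterwards, journalctl -u nodecheck.service -f does the job.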

Status check

Check the status by running: sudo systemctl status remote-mon.service, for example. It provides statistics and the latest output from the script.

ubuntu@merlin:~$ sudo systemctl status remote-mon.service 
 remote-mon.service - Remote Witness Monitoring
   Loaded: loaded (/etc/systemd/system/remote-mon.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2018-07-22 14:40:49 UTC; 3min 58s ago
 Main PID: 1157 (remotecheck.sh)
    Tasks: 2
   Memory: 1.4M
      CPU: 29ms
   CGroup: /system.slice/remote-mon.service
           ├─1157 /bin/bash /home/ubuntu/witness-check/remotecheck.sh
           └─1433 sleep 180

Jul 22 14:40:49 host systemd[1]: Started Remote Witness Monitoring.
Jul 22 14:40:50 host remotecheck.sh[1157]: Sun Jul 22 14:40:50 UTC 2018 Status OK.
Jul 22 14:43:52 host remotecheck.sh[1157]: Sun Jul 22 14:43:52 UTC 2018 Status OK.

Email Setup

Discussing the email setup in detail is outside the scope of this post, as there are several ways to accomplish it. But if you want to use the functionality in the script, you need to be able to run something like this from the command line:
echo "This is a message" | mail you@example.com

In the script, I have an example of using the ssmtp mail client. Please see the following guide on how to set it up using a Google account: http://www.havetheknowhow.com/Configure-the-server/Install-ssmtp.html

The email function is disabled by default. Enable it in both scripts in the #### CONFIGURATION SECTION ####.
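
An opt-in notification helper could look roughly like this. It is a sketch under assumptions, not the code from the scripts: EMAIL_ENABLED, EMAIL_TO and send_alert are hypothetical names, the address is a placeholder, and it assumes a mail command that accepts -s for the subject.

```shell
# Hypothetical sketch; variable and function names are assumptions.
EMAIL_ENABLED=${EMAIL_ENABLED:-NO}         # off by default, like in the scripts
EMAIL_TO=${EMAIL_TO:-admin@example.com}    # placeholder address

send_alert() {
    # Do nothing unless email has been switched on.
    [ "$EMAIL_ENABLED" = "YES" ] || return 0
    echo "$1" | mail -s "Witness-Mon alert" "$EMAIL_TO"
}
```

Keeping the switch inside the helper means the rest of the script can call send_alert unconditionally.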

SSH login

For the monitoring node to be able to log in autonomously, it is best to configure SSH key login. Use the ssh-copy-id command to copy your public key from the monitoring node to the witness node. See this guide for a detailed explanation: https://www.digitalocean.com/community/tutorials/how-to-configure-ssh-key-based-authentication-on-a-linux-server

Roadmap

  • Improve error detection, e.g. for the case where the witness node script halts while the status is still OK.
  • Backup witness support with failover.
  • Speed up error detection by having the witness node push status changes, instead of the monitoring node always polling for them.


Do you like this system?
Please be awesome and vote me as a Witness.

SteemConnect Link

Steemit Link
Vote for danielsaori

Thank you for sharing this information @danielsaori! I'd be lying if I said I completely understand it all, but what I do understand after reading this, is that you are a witness who truly cares about this platform ... but I already knew that :) I do appreciate you always trying to improve on things here though!

As soon as I get higher up the witness ranking I will explore this option in more detail.

I have no doubt about that; I just wish you could do it now ... I'm sure you do too!

Thx Lynn! After reading only your first sentence I thought you were in the process of starting your own Witness. That would be cool btw.

I hope everything is fine with you and Brian!

You're welcome @danielsaori! Ha! Me as witness! I appreciate your faith in my abilities, but I'd need someone like you doing all the actual work for me haha

Thank you; Brian seems to be on an upswing of sorts these days! 😅

Hi @danielsaori! We are @steem-ua, a new Steem dApp, computing UserAuthority for all accounts on Steem. We are currently in test modus upvoting quality Utopian-io contributions! Nice work!

Thanks a lot!
Interested to see how this will develop.

hard work 👍🏻

anyone that used to play with Lego knows it's hard work dealing with blocks ;)

Pretty amazing one, very helpful for people like me who want to know more about this platform

your idea is good i need to know more about your profile

Hello brother @danielsaori... what does "witness" mean? It's written in your bio.