For quite some time--about a year, in fact--I had nearly forgotten my crypto-world. I barely glanced at the markets, and I let my coins just sit without moving anything around or chasing any trades. I shut off my miners because the return was so far underground I got letters from the center of the Earth telling me to forget it and sell the machines. It seemed like a cold, barren winter had settled in.
Now, I'm no expert--as you've probably already guessed from my description of dormancy--but I took the actions I felt were best at the time. Then, life simply happened; I shut down the miners and they collected dust (their dust covers collected dust) as family, work, and a desperate clinging to what little free time remained took my attention. I had every intention of kicking them back on once the price of Litecoin rose enough to at least pay for the electricity they consumed.
In early June, a friend casually asked me, "How are your miners doing?" and I replied that they hadn't been running for some time. He pointed out that the price was rising again. I took a look at the market and realized the time had come to fire up the miners again. The clouds parted and the Age of Ragnarok had come.
My first fear was that everything had somehow become outdated, had sat so long that nothing would work. Of course, the opposite happened--everything was as I had left it, so all hardware, software, and firmware were intact and configured. I flipped the breakers on and voila! The miners immediately began their air-raid-siren startup procedures.
I connected a monitor to the computer that runs the MCP (Miner Control Program) I wrote and saw that all the sensors and relays were functional and that I still had remote desktop access. Then came the first failure: miner number six's power supply had failed, so I am down to five miners until I have the time to repair the damaged power supply or purchase and install a new one.
Then, failures two through thirty happened. At this point, it is pertinent to understand the setup...
There are five Antminer L3+ LTC miners connected to three twenty-amp circuits. The first two miners are on the first circuit, the second two on the second circuit, and the third pair on the third circuit--which is also shared by the MCP computer, the LAN switch, and the exhaust booster fan (which I had to turn on manually once the temps were no longer manageable by the monstrously powerful 120mm computer fans already installed on the miners themselves).
Failures two through thirty were all intermittent temperature-regulation issues caused by two oversights on my part while investigating the functionality of the system as a whole. Remember when I said that I had popped a monitor on the MCP computer and checked the sensors? Well, they were working, but I hadn't noticed that one of the temperature sensors--the exhaust manifold sensor--was lying across the room (still attached to the exhaust manifold I had removed when I shut the miners down in order to make more space in the main area of the basement). I didn't feel I needed the manifold right away because I wasn't planning on overclocking. That bears repeating: I wasn't planning on overclocking.
After the MCP started up, it did exactly what I programmed it to do. It checked the state of the miners--all miners responded with their timestamps. It checked the miner pool--API responsive and authentication successful. It checked the sensors--ambient room sensor and exhaust manifold sensor responding on expected COM ports. The MCP then went into its standby time wherein it monitors and records all data coming from miners and sensors, but doesn't take any action for a few minutes in order to get more accurate temperature readings. If the MCP checked the temps too early, it might accidentally think a miner is cooling down and kick the fans down immediately after turning them up (the miners have to reboot after every configuration change). After standby, the MCP began its routine loop.
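The startup sequence above boils down to a set of pre-flight checks followed by a hands-off standby window. Here is a minimal sketch of that logic, assuming a Python MCP--every name, value, and input shape below is my own illustration, not the actual code:

```python
import time

# Illustrative sketch of the MCP startup sequence. The real program's
# internals aren't shown in the post; these names are assumptions.

def startup_checks(miner_timestamps, pool_ok, sensor_ports, expected_ports):
    """Pre-flight: miners answered, pool API is up, sensors on their COM ports."""
    if any(ts is None for ts in miner_timestamps):   # a miner didn't respond
        return False
    if not pool_ok:                                  # pool API or auth failed
        return False
    if set(sensor_ports) != set(expected_ports):     # a sensor missing or moved
        return False
    return True

def standby(record, seconds=300):
    """Record data but take no action until readings settle.

    Acting on still-cold temps could bounce the fans back down right after
    turning them up, and every config change forces a miner reboot.
    """
    end = time.time() + seconds
    while time.time() < end:
        record()
        time.sleep(1)
```

A failed check would presumably abort startup rather than enter the routine loop; the post doesn't say how the real MCP reports the failure.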
A separate thread monitors each miner at one-second intervals. Those threads feed data into shared collections that the MCP main thread reads--along with data from the sensor monitoring thread. This allows the MCP to respond a bit faster. The threads are built on the OODA loop (Observe, Orient, Decide, Act).
The main thread and each miner thread have their own OODA loops. The main thread's loop handles sensor data and communication with the API server for remote management. The miner loops handle the miners--obviously. In the main thread, the Observe step reads the sensors and then reads from a shared collection of data produced by the miner threads. During Orient, the main thread sends aggregated statistical data to the API. Then, in the Decide step, the sensor data is checked against a set of rules that builds a queue of actions, which are carried out in the Act step. At that time, the main thread could only send alerts or send down emergency poweroff commands to the miners.
The miner threads do more or less the same thing, but the sensors are in the miners themselves, the data goes to the main thread instead of the API, and in the Act step they can send configuration changes and other commands to the miners.
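The thread layout described above might look roughly like this in Python (assuming that's the MCP's language--the post doesn't say, and all names here are mine):

```python
import threading

# Illustrative sketch of the MCP's thread layout: one OODA loop per miner
# publishing into a shared collection, plus a main loop that reads it.

shared = {}                      # miner id -> latest reading
shared_lock = threading.Lock()

def miner_loop(miner_id, read_miner, stop):
    """Per-miner OODA loop on a one-second interval."""
    while not stop.is_set():
        reading = read_miner(miner_id)     # Observe: poll this miner
        with shared_lock:                  # Orient: publish for the main thread
            shared[miner_id] = reading
        # Decide/Act would go here: fan changes, restarts, other commands.
        stop.wait(1)                       # one-second interval, wakes on stop

def main_loop_once(read_sensors, send_to_api):
    """One pass of the main thread's OODA loop."""
    sensors = read_sensors()               # Observe: sensors first...
    with shared_lock:
        snapshot = dict(shared)            # ...then the miners' shared data
    send_to_api(sensors, snapshot)         # Orient: aggregated stats to the API
    # Decide: rules build a queue of actions; Act: alerts / emergency poweroff.
    return sensors, snapshot
```

Snapshotting the shared dict under the lock keeps the main thread's Decide step working on a consistent view while the miner threads keep writing.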
During the first few hours of running, the temperatures were all super low--in the high thirties (C). Then came my second oversight: I had left automatic overclocking enabled in the MCP control panel. At this point in time, the MCP can only do two things when the chips overheat or start failing--turn the fans up or restart the miner. If the fans are already at maximum speed, the MCP falls back on restarting, which can lead to a loop of restarts--which it did.
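That two-option response reduces to a single decision function. This is a sketch only--the overheat threshold and the fan-speed scale are assumed placeholders, since the post gives no numbers:

```python
# Sketch of the MCP's two overheat responses. Threshold values are
# illustrative assumptions, not the author's actual settings.

MAX_FAN_PCT = 100
OVERHEAT_C = 80  # assumed chip-overheat threshold

def overheat_action(chip_temp_c, fan_pct):
    """Turn the fans up first; if they're already maxed, fall back to restart."""
    if chip_temp_c < OVERHEAT_C:
        return "none"
    if fan_pct < MAX_FAN_PCT:
        return "fans_up"   # note: any config change makes the miner reboot
    return "restart"       # fans maxed -- this is where restart loops begin
```

The restart loop falls out of the last branch: once the fans are maxed and the miner stays hot, every pass through the loop returns "restart".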
"Why don't you power the miners down if the fans don't work?" you might ask. Well, I sort of did and couldn't at the same time. After months of experimenting and gathering data, I saw that during these restart loops, the exhaust would steadily rise in temperature since the miners soft-reboot faster than they cool down. So, if the manifold temperature rises too quickly--or gets too high--the MCP would do the only thing it could do without a physical connection to the power, it would send the poweroff command to all of the miners.
The poweroff command is wonderful and terrible. It stops the processors in the machine while leaving the fans running at full speed to cool the miner before you disconnect the power. This protects the chips from cooking themselves, which is what can happen if you just pull the plug and stop the airflow inside the case. The drawback is that there is no timer for shutting off completely, so the fans will run at full speed indefinitely. The fans actually account for a considerable percentage of the miners' power consumption, so while your miners stay safe, you're potentially drawing enough current to cause heat issues elsewhere.
Not that any of that matters. The MCP didn't see the exhaust temp rising at all as the miners continuously rebooted until the room sensor sent me an alert. When I looked at the control panel, all the temps were red and there were failing chips left and right.
I noticed the overclock was turned on, so I shut it off, set all the miners to default clock speed with automatic fan speed, and shut down the MCP. I decided to make some upgrades. I built a solid-state relay board that gives me control over the three miner circuits (a pair of miners each) and a fourth circuit for the big fan and a small turbine cooler for the relay board itself. I moved the room sensor to the relay board and mounted the exhaust sensor behind the miners so it reads with or without the manifold (adjusted for the proximity after some testing). The MCP is also now capable of controlling the power during market lows, so I won't miss the restart if things get dim for a bit.
In the process of getting all of this software and sundry up and running again, I rediscovered several projects I had for Steem--and Steemit itself. I realized how much I miss writing and how easy it is to write whatever I want here. So, I'm back. I'm mining and developing and writing again.
I recently purchased components to build a better relay box that will allow me to control each miner individually as well as two extra circuits for two additional cooling fans. More pictures, articles, and status updates to come.