Steem was temporarily down --- So what happened? --- You have questions? Let's provide some answers.


All eyes are on steem right now. There was a temporary problem with the chain.

[Image: steem-not-broken-long.png]

You probably are going to hear about this a LOT over the next couple of days...

Yes, it is true, the steem blockchain was paused. It did that because witness nodes couldn't reach consensus on which blocks were valid.

This happened right in the middle of the transition, just as witnesses were moving from version 0.19.6 / 0.19.12 to the new 0.20 version of the blockchain.

There was a small glitch that made the 0.20 software not 100% compatible with older 0.19 witness nodes.

Before I continue, I must explain: these are my observations and opinions, and in no way should they be taken as official word or fact. Please investigate yourself, or wait for official word from Steemit.

It was a tiny mathematical inconsistency that was present in 0.20 and merged July 18, 2018. (here)

  • It made it through testnet "just fine".

  • It even made it through "mainnet" just fine.


Actually both older 0.19 witness nodes, and 0.20 witness nodes were working as intended, just fine.... on the main live chain for many days, without issue.


Until the moment hit when someone did a 0.20 vote that certain witnesses accepted but older 0.19 witnesses didn't agree with... and at that point, things got fragmented really quickly. It seemed that the chain lost consensus all at once.
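To picture how a tiny math difference can split a network, here is a toy Python sketch. It is NOT the actual Steem code or the actual bug: the two rounding rules, the witness names, and the block fields are all made up for illustration. The only point is that two versions can agree on almost every block and still disagree on one edge case.

```python
from fractions import Fraction

# Hypothetical rules, invented for this sketch (not taken from steemd).
def payout_v19(amount, num, den):
    """Pretend 0.19-style rule: integer division, which truncates."""
    return (amount * num) // den

def payout_v20(amount, num, den):
    """Pretend 0.20-style rule: round to the nearest whole unit."""
    return round(Fraction(amount * num, den))

def accepts(rule, block):
    """A node accepts a block only if it recomputes the same value."""
    return block["claimed"] == rule(block["amount"], block["num"], block["den"])

# Four made-up witnesses, half on each version.
WITNESSES = {
    "w1": payout_v19,
    "w2": payout_v19,
    "w3": payout_v20,
    "w4": payout_v20,
}

def tally(block):
    """Return each witness's verdict on the block."""
    return {name: accepts(rule, block) for name, rule in WITNESSES.items()}

# 9 * 1/3 is exactly 3, so both rules agree.
easy_block = {"amount": 9, "num": 1, "den": 3, "claimed": 3}
# 10 * 2/3 is 6.67: truncation gives 6, rounding gives 7.
edge_block = {"amount": 10, "num": 2, "den": 3, "claimed": 7}

print(tally(easy_block))
print(tally(edge_block))
```

In this toy, every witness accepts the first block, but only the "0.20" witnesses accept the second one. That is the general shape of a consensus split: everything looks fine for days until one particular input lands on the difference.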

The response (and believe me, a same-day fix is fairly quick)... tracking down the problem, implementing fixes, testing those fixes, and getting witnesses back online again... was pretty good, especially if you understand "decentralized software development".

During the outage, I heard people complaining about "software needs to be perfect."

  • I agree...

  • But ask yourself, when is the last time your computer needed an emergency update? Or your antivirus software? How about your phone?

  • Ever hear about a bank glitch? Ever see a TV station not broadcast properly?

In a software world, unexpected things will happen. Especially when programming code gets complicated, and there are more and more lines of code to examine.

The best thing you can do, when a problem like this arises... (and there may be others in the future)... is to keep calm. Be supportive, and communicate to others if you know something they might not.

....and that's the reason for this post. :)


Since I am not on the development team.... nor am I privy to the steem slack... I had to obtain my information through my own research and by participating in steem.chat.


I also run a witness node, named @intelliwitness that could use your vote. Should I not make it to a top 50 witness soon, I may have to eventually shut it down for financial reasons. So if you have a spare vote in your list of "30 free votes"... please consider voting for intelliwitness. See screenshots and howto here

I should also mention that other networks, like whaleshares, were not affected by this particular outage, because they are running similar code with significant differences on a different fork.

  • ...and that's the benefit of different blockchains, and open source code development. Dan Larimer was brilliant when he insisted that steem should be Open Source, just like Satoshi Nakamoto did with Bitcoin. :)

We’ve had periodic glitches since I joined a year and a half ago, but I’ve always been impressed by how witnesses from all over the world have worked fast to get things running again. I used to freak out when that Something Went Wrong Robot pic would pop up, but now it’s like not a big deal, smart people are working on it.

Same feeling here.
I have full trust in the steem crew.

I can stop holding my breath now.

Thanks for the explanation. I was wondering exactly what happened.

If anyone thinks code should be perfect, then they have never tried to write it themselves.
I'm a very inexperienced coder but I can cobble a bit of Python together. I hit a problem the other day with adding two floats together and then comparing the result to a third float. Apparently in binary, adding 0.00264 and 0.0001 does not equal 0.00274... until you work with this stuff you don't realise just how weird the results can be. Yes, testing is the key, but with weird math like that it's easy to miss what should be a logical certainty.
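For anyone who wants to try the float surprise described above, here is a minimal Python sketch. The helper names `floats_match` and `decimal_sum` are made up for this example. It shows the well-known 0.1 + 0.2 case, then two common ways around the problem: comparing with a tolerance, or using `Decimal` built from strings.

```python
import math
from decimal import Decimal

# The classic case: 0.1, 0.2 and 0.3 have no exact binary representation,
# so the sum is not bit-for-bit equal to 0.3.
classic_mismatch = (0.1 + 0.2 != 0.3)
print(classic_mismatch)

# The numbers from the comment. Whether == holds depends on how each
# value happens to round in binary, so no result is promised here.
a, b, expected = 0.00264, 0.0001, 0.00274
print(a + b == expected)
print(repr(a + b))

def floats_match(x, y, rel_tol=1e-9):
    """Compare two floats with a relative tolerance instead of ==."""
    return math.isclose(x, y, rel_tol=rel_tol)

def decimal_sum(x, y):
    """Add two decimal strings exactly, with no binary rounding."""
    return Decimal(x) + Decimal(y)

print(floats_match(a + b, expected))
print(decimal_sum("0.00264", "0.0001") == Decimal("0.00274"))
```

The tolerance check is usually enough for measurements. For money-like amounts, exact decimal or integer arithmetic is the safer habit, since every node or program then gets the same answer.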

Often it is the simple things that can break code... just like this... that is a great example.

So HF 20 was in effect or partially in effect?

Probably not yet, because the @steemitblog would have announced that.
Hardfork 20 is scheduled to go live on September 25, 15:00 UTC.

Good to see it's back. We have had similar issues before and the team has always pulled through. I know well how software can go wrong.

Interesting, thanks for this detailed info. Considering you did all the research yourself, that’s impressive.

Posted using Partiko iOS

Everyone was researching... I was taking little comments made by various people in chat and then checked out what I could on github... coming back, etc, etc... examining my own witness node, and my own source code I had available. (I don't run docker images).

I by no means should take credit for finding the problem. We have everyone in the private slack who did all of it... and they deserve the true credit.

I am simply reporting what I found out, and verified to be true.

But whatever you have reported is valuable, as there was a little bit of fear about the sudden breakdown of the blockchain.

I saw lots of Steemians panic yesterday... most were scared, but thanks to the powerful devs holding onto the project with a firm grip, all is well. Just like I've always said, got nothing to worry about. Even humans get sick sometimes... just some check-up needed to keep things right. Steemit is here already and it's going nowhere any time soon. If you ever hear that it has gone, then it means it has moved to a better level. Nice post there.

Yep @intelliguy I'm gonna upvote all the good witnesses right away. BUT, only after each one of them who wants my vote reads and follows my final message on this article, and after doing that gets back here and replies to this comment. ;)

So do good and don't look for someone to blame?

Thanks for the update. It is the only one I have seen. You earned your witness vote.
I advise you to state your current position. I went to the witness page 📄 and still have not found you. Make yourself easy to find.

God bless our witnesses for the quick response

It happened in HF19 before. It's normal, but newbies will be scared to death, lol!

I'm not worried because it's not the first time this has happened, although today it took longer than usual.

I upvoted your post.

Keep steeming for a better tomorrow.
@Acknowledgement - God Bless

Posted using https://Steeming.com condenser site.

Hf20? Deos? Nothing stops steem!


Good analysis and report.

if you shut down your witness...

:)

Is that a tail hanging off there to the left side of that silhouette?

I really hope so 😂

Personally I think it's been quite refreshing having some down time - and it forces people to look at some new stuff in their feeds and consider who's on their auto vote feed!


Glad Steem is open source. What a long day.

I wonder if this will impact the transition process towards HF20 next week.

Good review, I agree with you on a lot of points. And I am not expecting any software to be perfect.

But I would say that an outage lasting multiple hours is excessive in this day and age. And perhaps better testing and rollout M&P's are called for here in the Steemit ecosystem.

Throwing tainted data through the testnet would be advisable in my opinion.

One day, someone will knock on my door and invite me to the roundtable, so I can help discuss such things.

It's normal for this kind of thing to happen, knowing that we live in a world where new technologies now dominate the system... but I'm glad it was resolved quickly.