Steem Pressure #6 - MIRA: YMMV, RTFM, TLDR: LGTM

in #steem-pressure, 6 years ago

MIRA has recently become the most famous ETLA on STEEM.
But what exactly is MIRA and what does it do? Let’s take a look at MIRA performance-wise.


Video created for Steem Pressure series.

MIRA (Multi Index RocksDB Adapter)

The purpose of MIRA is to allow Steem blockchain nodes to store almost all necessary data on disk in a modern database, as opposed to a shared memory file.

But why is that better?

Non-MIRA steemd works fine… as long as its state file, shared_memory.bin, fits on a RAM disk or your system has enough RAM for buffers and cache to handle it. The state file keeps growing, and once you run out of RAM, performance degrades SIGNIFICANTLY because the workload is extremely I/O intensive.

Pre-MIRA main issues:

  • Full API: To provide full API support, you need servers with a ridiculously high amount of RAM (512GB+)
  • Consensus node: A lot of memory is needed even for a basic node that doesn’t need very low latency (like a private steem node for broadcasting and managing a wallet)
  • Exchanges: They usually run steemd on potato-grade VPS servers, where NAS or IOPS limiting is a common practice.

Full API

A long, long time ago, you had no choice but to run a big monolithic node if you wanted to satisfy all the API needs.
Nowadays you can build a customized infrastructure that suits you best.
Hivemind can replace slow and resource hungry tags and follows plugins.
The account history plugin, which was blessed with RocksDB optimization earlier, may significantly reduce the burden on a “fat node” (i.e. LOW_MEMORY_NODE=OFF).
Unfortunately, a “fat node”, even after removing the burden of tags, follows, and account_history, is still memory hungry.

30M blocks need 175GB for shared_memory.bin file.
That means that nowadays a server with 256GB of RAM is barely enough.

(Don’t forget that somewhere there’s also a 64GB RAM machine for a consensus node running a RocksDB powered account history plugin and a 16GB RAM machine with jussi and hivemind.)
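To make the "barely enough" remark concrete, here is a minimal sketch that compares the quoted state size with a few RAM sizes. The 175 GB and 30M figures come from the text above; the function names and the list of RAM sizes are only illustrative, and the per-million-blocks average is a crude lifetime figure, not a growth model.

```python
# Figures quoted in the post; everything else here is an illustrative assumption.
STATE_GB = 175        # shared_memory.bin size of a fat node at 30M blocks
BLOCKS_MILLIONS = 30  # chain height that figure refers to


def headroom_gb(ram_gb, state_gb=STATE_GB):
    """RAM left for the OS, buffers and cache once the state file is resident."""
    return ram_gb - state_gb


def avg_state_gb_per_million_blocks(state_gb=STATE_GB, blocks_millions=BLOCKS_MILLIONS):
    """Crude lifetime average; real growth is not linear."""
    return state_gb / blocks_millions


if __name__ == "__main__":
    for ram in (128, 256, 512):  # hypothetical server sizes
        print(f"{ram} GB RAM: headroom {headroom_gb(ram):+.0f} GB")
    print(f"average: {avg_state_gb_per_million_blocks():.2f} GB per million blocks")
```

With these inputs a 128 GB machine comes out 47 GB short, while 256 GB leaves 81 GB for everything else, and that margin shrinks with every block.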

Consensus node

A consensus node doesn’t need that much RAM, but it still needs a significant amount. It should be a lightweight node, but it isn’t exactly lightweight if you need a 32GB RAM machine with a decent storage backend (in my benchmark, I used three SSD drives in a RAID0 configuration, not really a typical setup for your workstation at home).
A consensus node can be found in many use cases: a seed node, a broadcasting node, a witness node, or a private node for cli_wallet interaction.
If you have it running, this means that you have your own copy of the Steem blockchain which is essential for decentralization.

Exchanges

Unfortunately, we can’t expect exchanges to run a fancy server dedicated to Steem in order to process a simple transfer operation. Sure, we can tell them that Steem is awesome, and that’s very true, but they care about money, and even though many other coins can also be a pain in the back(end), some of them tend to have daily volumes 500x higher than Steem.
In the past, it took some of the exchanges weeks to replay.

“Hey, exchange, we don’t want to insult you but…”

Is not a good way to start a conversation, so it’s essential that we make Steem run on a potato. Faster.

So what is MIRA and how can it help?

There is already a lot of material on that topic, most importantly:

@vandeberg’s What is MIRA?
@steemitblog’s MIRA: Soft Roll-Out Begins!

YMMV (Your Mileage May Vary)

Depending on your needs, MIRA can make your time of replay 2-5x longer or infinitely faster (“within a few days” is a much better ETA than “never”).

  • If you are a witness and have a good 64GB RAM machine with a local SSD storage backend, you won’t be happy, because MIRA doesn’t help much in your case. On the contrary, it will slow you down.
  • If you run a seed node or a consensus node on hardware similar to my benchmark setup (32GB RAM with a low latency storage backend), you won’t be happy at first either, because replay times are expected to be much longer. Once it’s running, though, you should be happy, especially if you were about to run out of your fancy storage space. Yes, RocksDB data takes less space than the shared_memory.bin file.
  • If you run the most common node, a consensus node on an average machine or a VPS, where the storage backend is usually either limited in terms of IOPS or has higher latency because it’s network attached, you will be happy, because in that environment MIRA behaves much better. That applies to many low ranked backup witnesses, exchanges, and even individuals who want to run their own node locally.
  • If you would like to run fat nodes or full nodes, or you have a decent infrastructure with state providers, or much more RAM than a consensus node needs (to reindex in-memory using the hybrid mode), then you will be more than happy to run MIRA. A fat node on a 64GB RAM workstation? Sure.

RTFM (Read The Friendly Manual)

You need to get to know MIRA better to be able to make it work for you in the most efficient way.

MIRA has many options that allow the user to improve performance. For most use cases, the default options will be sufficient. However, one consideration when configuring MIRA is resource limiting.

- MIRA basic configuration guide english corrects and improvements

Hybrid mode

You can actually reindex with the most expensive indices in memory, using the current solution, and then migrate to RocksDB after the reindex is complete. If you have enough RAM to keep everything in memory and still enough left for what MIRA needs, you will be happy with that speed.
(If not… then you won’t: take a look at Case 1 below)

LGTM (Looks Good To Me)

Here are some of the examples of how MIRA behaves in different use case scenarios and hardware setups:

Case 1: Consensus node on 32GB server optimized for non-MIRA

Intel Xeon E3-1245 V2 @3.40GHz, 32GB RAM, 3x SSD (RAID0)
This is our usual machine, the one that was used for benchmarks in previous episodes.

Replay speed: Consensus node on 32GB server optimized for non-MIRA

That’s one of the scenarios that gives the worst results: a node that is optimized to run non-MIRA steemd and has done so just fine for the last 30M blocks. It uses tmpfs to reduce latency for the shared_memory.bin file.
With MIRA, replay is significantly slower.

MIRA in hybrid mode replays almost as fast as optimized tmpfs, but since the machine doesn’t have enough RAM, switching from BMIC to RocksDB takes a significant amount of time.

Replay      | v0.20.10 | MIRA     | hybrid
30 M blocks | 8 hours  | 48 hours | 8+25 hours
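As a rough way to read the Case 1 numbers, the sketch below expresses each replay time relative to the tmpfs-backed v0.20.10 baseline. The hours are taken from the table above; treating the hybrid figure as 8 hours of in-memory replay plus 25 hours of switching to RocksDB is my reading of it, and the names are only illustrative.

```python
BASELINE_HOURS = 8  # v0.20.10 with shared_memory.bin on tmpfs

# Replay time to 30M blocks, in hours, from the Case 1 table.
replay_hours = {
    "v0.20.10": 8,
    "MIRA": 48,
    "hybrid": 8 + 25,  # assumed split: in-memory replay, then the switch to RocksDB
}


def slowdown(hours, baseline=BASELINE_HOURS):
    """How many times longer than the baseline a replay takes."""
    return hours / baseline


if __name__ == "__main__":
    for name, hours in replay_hours.items():
        print(f"{name:9s} {hours:3d} h  {slowdown(hours):.2f}x baseline")
```

On this machine plain MIRA comes out at 6x the baseline and hybrid at a bit over 4x, with most of the hybrid time spent after the replay itself has finished.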

Case 2: Fat node on 64GB workstation

Intel Core i7-6700K @4.00GHz, 64GB RAM, SSD
“Fat node”
That is built with:
LOW_MEMORY_NODE=OFF and CLEAR_VOTES=OFF
and running plugins aimed to satisfy hivemind needs:

plugin = webserver p2p json_rpc witness account_by_key reputation market_history
plugin = database_api account_by_key_api network_broadcast_api reputation_api market_history_api condenser_api block_api rc_api

Replay speed: Fat node on 64GB workstation

Replay      | v0.20.10 | MIRA
30 M blocks | 71 hours | 41 hours
State size  | 175 GB   | 81 GB
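For the fat node the picture flips, and the same kind of arithmetic shows by how much. This is a small sketch using only the Case 2 figures above; the helper names are made up for illustration.

```python
# Figures from the Case 2 table (fat node on a 64GB workstation).
replay_hours = {"v0.20.10": 71, "MIRA": 41}
state_gb = {"v0.20.10": 175, "MIRA": 81}


def speedup(old_hours, new_hours):
    """How many times faster the new replay is."""
    return old_hours / new_hours


def size_reduction_pct(old_gb, new_gb):
    """Share of the state size that is saved, in percent."""
    return 100 * (old_gb - new_gb) / old_gb


if __name__ == "__main__":
    s = speedup(replay_hours["v0.20.10"], replay_hours["MIRA"])
    r = size_reduction_pct(state_gb["v0.20.10"], state_gb["MIRA"])
    print(f"replay is {s:.2f}x faster, state is {r:.1f}% smaller")
```

That works out to a replay roughly 1.7x faster and a state a little over half the size.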

Case 3: Consensus node on a 16GB server

Intel Xeon W3530 @2.80GHz, 16GB RAM, 2x SSD (RAID0)

Replay speed: Consensus node on a 16GB server

Yes, you guessed right: for v0.20.10, the significant drop in performance comes at around 18M-19M blocks, which is exactly when the shared_memory.bin file grows beyond 16GB (the amount of RAM).
I stopped the replay process at 20M blocks.
Why?
Because the ETA at 18.5M was 3 days, at 19M almost a month, and at 19.5M almost three months.
MIRA starts slow, really, really slow, even 10-20x slower than non-MIRA, but after a certain point it doesn’t slow down anymore.

Replay      | v0.20.10        | MIRA
20 M blocks | 215 hours+      | 27 hours
30 M blocks | at least months | 61 hours
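The ETA readings above can be turned into an implied replay rate, which makes the collapse easier to see. This is only a back-of-the-envelope sketch: it assumes the ETA targets 30M blocks, rounds "almost a month" and "almost three months" to 30 and 90 days, and uses illustrative names throughout.

```python
TARGET_BLOCKS = 30_000_000  # assumed replay target (chain height used in this post)

# (height at which the ETA was read, ETA in days)
# "almost a month" and "almost three months" are rounded to 30 and 90 days.
eta_readings = [
    (18_500_000, 3),
    (19_000_000, 30),
    (19_500_000, 90),
]


def implied_blocks_per_second(height, eta_days, target=TARGET_BLOCKS):
    """Replay rate the node must have been sustaining to report this ETA."""
    remaining = target - height
    return remaining / (eta_days * 24 * 3600)


def average_blocks_per_second(blocks, hours):
    """Average rate over a whole replay."""
    return blocks / (hours * 3600)


if __name__ == "__main__":
    for height, eta_days in eta_readings:
        rate = implied_blocks_per_second(height, eta_days)
        print(f"at {height / 1e6:.1f}M blocks: about {rate:.1f} blocks/s")
    mira = average_blocks_per_second(TARGET_BLOCKS, 61)
    print(f"MIRA average over 30M blocks: about {mira:.1f} blocks/s")
```

Under those assumptions the non-MIRA rate falls from roughly 44 blocks per second to about 4 and then to a bit over 1, while MIRA averages around 137 over the whole replay. For scale, Steem produces a new block every 3 seconds, so a replay crawling at about 1 block per second is only slowly gaining on the head of the chain.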

Previous episodes of Steem Pressure series

Introducing: Steem Pressure #1
Steem Pressure #2 - Toys for Boys and Girls
Steem Pressure #3 - Steem Node 101
Steem Pressure: The Movie ;-)
Steem Pressure #4 - Need for Speed
Steem Pressure #5 - Run, Block, Run!

Stay tuned for next episodes of Steem Pressure :-)

Bonus: with dedication to future performance improvements


“Run Boy Run” - Woodkid, The Golden Age



If you believe I can be of value to Steem, please vote for me (gtg) as a witness on Steemit's Witnesses List or set (gtg) as a proxy that will vote for witnesses for you.
Your vote does matter!
You can contact me directly on steem.chat, as Gandalf



Steem On


good to see you posting despite not answering my PMs :P

Doing my best to keep "once a month" schedule ;-)

Any rewards and incentives for Delegators?


What do you mean?
MIRA is on the backend.

I didn't understand this well.

Could you maybe write it in layman's terms? Will MIRA affect user experience? Transaction speed? Cost of running a witness node? Cost of running a full node? Things like that. :)

Will Mira affect user experience?

No.

Transaction speed?

No.

Cost of running a witness node?

Yes, for backup witnesses.
Maybe for top20 witnesses (not main node, but some helper nodes they are also running)

Cost of running a full node?

Yes, should reduce significantly.

Awesome! Thx for the explanation. :)

Excellent post with very great pieces of information, as usual.
Thanks @gtg. I can't wait to (find time to) rebuild some of my nodes with MIRA.

MAN!!! I wish you wrote the posts for steemit developers... This is amazing and so so much more readable!! Thank you!!

This is pretty awesome news. Great work. I know this has been in the works for a while and it should help cuts costs for all of those running full nodes because those prices are crazy. Keep being awesome with those wizard coding skills.

Looks to me you are doing a fine job as a witness, I checked and have you on my witnesses, I commented to thank you for the upvote in my last post about Peter Rabbit being mysteriously killed.

Thank you. In fact that upvote was thanks to @elsiekjay.

OK, so, when this drops we can move to 16GB servers, but if anything goes wrong or we have to change something, then it'll be 2+ days to reindex?

After Mira is in there are there new ways to speed things up?

Well, MIRA is awesome for many cases, but not for our primary block production nodes.
Top20 witness case is the worst case scenario for MIRA I'm afraid.
That doesn't mean it doesn't help us. It enables us to run seed nodes, API nodes and some other helper nodes on less powerful hardware.
I think that further tuning of the hybrid mode settings can bring some great results; the problem is that the optimal configuration varies from one piece of hardware to another.
Not to mention differences in infrastructure. For example a state provider with 128GB RAM can replay with awesome speed and deliver state to block producer through 10Gbps network in a decent amount of time.

Yes, there are plenty of ways to speed things up, but it's a matter of the time and effort needed, and of priorities. Native RocksDB would be much faster, but also much more time consuming to implement.

Great explanation.
I'm interested in your comment about a 128Gb machine, both for personal reasons and because this is the RAM limit for HEDT PCs (and now also the latest Intel 9th Gen Desktop CPUs).
This means that this is the largest RAM one can get on a PC at reasonable prices.
I noticed that the state size in MIRA is only 81Gb. I assume this means that it will fit in RAM on a 128Gb PC (whereas previously it wouldn't).

I've got the following equipment and wanted to know the best config for a fat node, witness node and seed node.

  1. i7 6800k with 128Gb RAM & 256 Gb NVMe & 500Gb SATA,
  2. i5 8400 with 32 or 64Gb RAM & 500Gb SATA
  3. Multiple Pentium 4560s with 16 or 32Gb RAM and 500Gb SATA

Are those SATA HDD drives or SSD?

It’s a 500 Gb SSD. I also have 4 Pentium 4560s and an i5 8400 and RAM sitting idle. What can I build for Steem?


Top20 witness case is the worst case scenario for MIRA I'm afraid.

So lets just not break the chain :-)

This is so detailed for noobs like me. It really takes some burden off witness runners. It definitely powers the Steem blockchain. I will resteem this for my friends.

Interesting results and a bit surprising too. Mira is supposed to be able to run on a potato. Isn't it?

That's what I understood from van's earlier post.

MIRA in hybrid mode replays almost as fast as optimized tmpfs, but since machine doesn’t have enough RAM switching from BMIC to RocksDB takes a significant amount of time.

This looks like a case for spinning up a fresh node from scratch rather than trying to migrate from an existing setup with ram limits as mentioned.

Just curious as I may do just that if it makes a difference to greater decentralisation? If it works?

Yes, it will run and shouldn't slow down just because state can no longer fit in RAM.
That switching I was writing about is on a new node, after replaying in memory. It doesn't fit, so it has to use swap, a lot, thus it takes that much time.
Decentralization itself isn't a value, it needs to have a purpose. If you run a service that makes use of a Steem node, now it will be much cheaper for you to have your own node.

Cool, I've been wanting to do a new build to self host my websites for a while now. This could be very useful.

Using swap is fine as long as it has decent speed which this setup seems to show.

The value of decentralisation in my expression above was referring to increased security for the blockchain not the setup above.

Thanks for the update.

What do you think of the curation/author reward change proposed by STEEMIT? Do you think it should be a witness parameter, or defined on chain?


IMHO it's a good attempt to improve platform economics and definitely worth trying. I prefer it to be defined at HF (even though that has its own drawbacks): it's way less complex, and it also brings less uncertainty and a clearer before/after perspective.

Mira saves a lot of resources, right?

Yes, unfortunately at a cost of reindex speed.

This should be really useful if we get the SMT protocol on the blockchain in the future.

you're the best, thanks for this!


Re: your previous comment, well... replay speed is one thing, and we can try to run it on cheap systems and check how far we can go on the low-end side before replay times are no longer viable, but there's also latency for block production, which is essential for a witness. I haven't measured that (yet).

Thanks @gtg, it helps me a lot (read: not very much) and I am glad it is going to reduce the infra costs for at least some of the witnesses. :)

Most of that goes over my head, but I’m glad you’re on it. Thanks for all you do! 💞

I've included some cool charts:
green being above orange at the right side of chart means good
and a fancy intro at the beginning and a nice music video at the end
;-)

Yes, you have. I like the videos. Fancy indeed. ;) I'm also glad I don't need to know all this stuff, I can simply vote for you and other competent witnesses. I'll just post poetry and philosophical ramblings, and have fun meeting fabulous people, knowing you've got the tech side covered.

What is the planned release date of MIRA adapter?

The code is already merged so it can be used already, but I don't know about "official release" date. There might be still some rough edges, I'll try to look into it today.

Any chance we can get some instructions on how to set up a 16Gb Consensus (witness) node w/ MIRA for someone interested in becoming a backup witness? I've got some hardware at home just kinda sittin' around and not doing too much -- would be keen to put a witness together for the Steem project I'm working on.

Sure. Actually it should be really straightforward based on episode 3 of the Steem Pressure series. Not much has changed, but to keep things consistent I think I will post an updated version sometime soon.

Very cool! I'll also check out your Episode 3 of Steem Pressure.

Looking forward to seeing what's up, and hopefully setting up a witness.

I don't know why I have read the post. I didn't understand most of it. But what I did understand is that you need to know a lot of things to become a witness. I think the main advantage of using MIRA is that it lets you store a huge amount of data with less memory.

Yes, not only store it on disk, but also provide efficient access to selected pieces of that data in a short amount of time without relying that much on very fast but very expensive RAM.

With these graphs you made it easy to understand a really complicated subject!
Thanks for your work Gandalf! :)

Very Good ! All the best very Good post !

Congratulations @gtg! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :

You published more than 60 posts. Your next target is to reach 70 posts.
You received more than 20000 upvotes. Your next target is to reach 25000 upvotes.

You can view your badges on your Steem Board and compare to others on the Steem Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

To support your work, I also upvoted your post!

Vote for @Steemitboard as a witness to get one more award and increased upvotes!

60 Posts. How awesome is that?
There are people who can do that in less than a week.
Well... ;-)

Congratulations @gtg!
Your post was mentioned in the Steem Hit Parade in the following category:

  • Pending payout - Ranked 8 with $ 109,74

Magic Dice has rewarded your post with a 73% upvote. Thanks for playing Magic Dice.

Thanks for playing

I've never played.
I'm not a fan of gambling, but this version of Russian roulette has won my heart:
kill -9 $RANDOM
There's a rumor that witnesses have it in their crontabs and are playing all the time ;-)

You have received an upvote. Thanks for playing moonSTEEM
