iguana status - debugging by rewriting

in #iguana · 8 years ago

Sometimes code just accumulates too many special cases, strange behaviors and outright bugs, and even after all the identified issues are fixed, you end up with something that barely works properly and breaks whenever anything goes wrong.

The realtime block handling in iguana ended up in this state, as it evolved from a version predating full bitcoin RPC support into one with all the things needed to support the bitcoin RPC added on (yes, from scratch).

After spending all weekend trying to get it stabilized, my "i really need to rewrite this" detector started going off, louder and louder.

Background:
iguana does blockchain sync in parallel, which at first sounds impossible, or at least crazy, or like it can't possibly work. Well, I did get things to sync in parallel quite well by using bundles of blocks. Each bundle is 500 or 2000 blocks, and each bundle has its own implicit state machine. What that means is that when a bundle gets some runtime, it looks at its current state and figures out what needs to be done to advance to the next level, each step progressing so that in the end all the blocks in the bundle are downloaded, verified, validated, indexed, cross referenced, etc.

Those who know a bit about blockchains will probably say that you can't know if a block is valid until all the previous blocks are known. And this is true. However, the ultimate validation can go quite quickly if all the prep work is done ahead of time: it takes just a few minutes to totally rescan the blockchain once all the data is in place.

Now, all other blockchain implementations I know of use a DB (database) for some or all of their blockchain processing, especially at the higher levels. iguana doesn't. What it creates is a set of bundle files, written append-only to avoid any extra disk seeks, and it can basically process the blockchain data at whatever network bandwidth you have. That means the entire BTC blockchain can be synced in about an hour if you have a nice VPS with a 1gbps connection. Yes, one hour for a full BTC sync: not a day or a week, just one hour.

So we have all these bundles getting completed as the blockchain data comes in, and when all the prior bundles are present, the entire data of a bundle can be finalized. By front loading the syncing toward the earlier blocks, you end up having the data processed in parallel as it arrives, and it all ends up in raw data files. These raw data files are directly memory mappable, so the "DB" in iguana is done by direct CPU access of the memory mapped pages, which once they are in memory is essentially a direct memory operation.

Sounds great! So what is the problem?

The problem is the current realtime bundle, the one that keeps growing by one block, and especially the moment it completes and a new bundle needs to be spawned. All memory operations in iguana are precisely managed; even mutex usage is limited to queue updating, which means all the threads are operating directly on all the memory in parallel.

In the first iteration, I reused the parallel sync code for the realtime bundle. I made a truncated bundle and just incrementally updated it as new blocks came in. This allowed all the parallel search functions to find txids, vins and vouts without much extra work, since all of that is in place for the normal bundles. As I needed a few more search items, I had to add a special case here, an exception handler there, and in general I ended up spending a fair amount of time making the realtime bundle look like the normal bundles.

It did work, until the bundle boundary. At that point there is the previous realtime bundle, which can now become a normal bundle, and the new blocks, which need to become the new realtime bundle. That sounds not so bad, but if you visualize all the things going on in parallel, it turns out to have caused more pain and suffering during my debugging than anything else.

So, I decided to finally rewrite it into a clean solution that handles the bundle boundary issue, reduces the amount of memory used, and in general is faster and reliable enough for people to use with real money.

I was able to get half the new realtime handling done today! It is now locked to the final pass that extends the chaintip and handles reorgs, and it goes in lockstep updating the realtime state. Reorgs are handled with a negative polarity in reverse order, and with all the realtime handling limited to a single thread, it avoids the problem of multiple threads contending over the same memory.

It is now properly updating the blocks and recreating the raw data that needs to be updated, so I feel very good now that I can make a robust tracking of the realtime state. From what I am seeing, it will update within seconds (probably milliseconds for all but bitcoin).

James


@jl777 any opinion on this? If so, care to make a post on it?

https://github.com/steemit/steem/issues/279#issuecomment-240294611

very complicated, it is not obvious what will happen and there is no way to know without deploying it. as long as it is iterated until we get a good solution, it seems as good as any other change at this point.

clearly things aren't perfect now, so something needs to change. by observing the effect of this change (good or bad), the next iteration should be that much better.

i did a whale-for-whale voting analysis and I didn't see any systemic pattern, so the conspiracy theories are not based on the data, at least as of a month ago

So if, say, you were to use this technique to sync the bitcoin blockchain from scratch, how long would it take? Apologies if it is a stupid question, but I'm not a coder and I find this kind of stuff about how blockchains work quite fascinating.

blockchain size divided by bandwidth, plus a half hour or so depending on how fast your computer is.
At 70 gigabytes, if you have a 100mbps connection, that's ~10MB/sec, so each GB takes ~100 seconds -> 70 * 100 = 7000 seconds of downloading, about 2.5 hours in total

Thanks for this write up James! :)
It's gonna be helpful explaining this tech in videos I'll make.

From the way and speed that you write code, it must be true what they say: that you are half human and half machine. While reading this post my eyes started to glaze over, not because of the way you write, but rather because of the way my brain is wired. A coder I certainly am not.
Off for a cup of tea. ^_^