Analysis of the Late Gridcoin Superblock and Suggested Solutions

in #gridcoin, 7 years ago (edited)



UPDATE: If you are running a GRC wallet on Windows OS and want to help, please follow the instructions here.


As most of you will be aware by now, we have been waiting on the current Gridcoin superblock for almost 112 hours. Given that we expect superblocks every 24-48h, this is severely out of the ordinary. The problems with a late superblock were detailed by @Erkan in a previous post on this sub, where he placed a bounty on solving the problem:

$GRC bounty -> help us create a superblock (no superblock for 99h)

To summarise, the lack of a superblock:

  • Stops newcomers having their CPID properly included in the neural network
  • Means magnitude will not be updated, so users cannot be rewarded for their research fairly

The Problem

The late superblock appears to be attributable to the massive influx of newcomers into the Gridcoin network, which the current superblock system is not equipped to deal with. Before we delve into that, let's have a look at how a superblock is created:

  1. Every 24-hour period, a quorum (group of nodes) is selected at random. Only Windows users can be a part of the quorum. The job of the quorum nodes is to download project data from the BOINC project servers. They then hash this data, and compare hashes.

  2. If the hashes of most of the quorum (I am uncertain of the cutoff) agree, then one of the quorum members needs to stake a block on the Gridcoin blockchain to add the next superblock.
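The two steps above can be sketched roughly as follows. This is only an illustration and not the wallet's actual code: the use of MD5, the `snapshot_hash` and `find_consensus` helpers, the example CPIDs, and the 50% cutoff are all my assumptions (as noted, I do not know the real cutoff).

```python
import hashlib
from collections import Counter


def snapshot_hash(magnitudes):
    """Hash a {cpid: magnitude} snapshot in a canonical (sorted) order.

    MD5 is an assumption made for this sketch.
    """
    canonical = ";".join(
        f"{cpid},{mag}" for cpid, mag in sorted(magnitudes.items())
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def find_consensus(node_hashes, cutoff=0.5):
    """Return the hash reported by more than `cutoff` of the nodes, else None.

    The 0.5 default is a placeholder; the real cutoff is unknown to me.
    """
    if not node_hashes:
        return None
    top_hash, votes = Counter(node_hashes).most_common(1)[0]
    return top_hash if votes / len(node_hashes) > cutoff else None


# Hypothetical data: a node that downloads slightly later sees one extra CPID.
early = {"cpid_a": 10.0, "cpid_b": 25.5}
late = dict(early, cpid_c=1.0)

agreeing = [snapshot_hash(early)] * 3
split = [snapshot_hash(early), snapshot_hash(late)]

print(find_consensus(agreeing))  # the shared hash
print(find_consensus(split))     # None: no hash exceeds the cutoff
```

Even a single extra CPID changes the hash completely, which is why nodes that download at slightly different times can end up disagreeing.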

The influx of newcomers to the network has had unforeseen consequences that have kept both of the steps above from functioning as intended:

Due to the huge influx of new users, the number of CPIDs in the network changes rapidly. It is likely that as a result, when the quorum members download the data from the project servers, the first and the last connection may not get the same data set. This results in mismatched hashes, which yields a hung quorum as observed yesterday.

Hash : number of nodes from quorum that agree on hash; percentage
21bf4d42dc4f5dacef7daba2ab54c196 : 36; 39.46%
567d6a9e3c7451f7673c4d05bc4bdc8b : 31; 33.39%
b4b529630837104782408a0bb5cfdbad : 25; 27.13%

As a result of this, there is no consensus between the quorum nodes and no superblock can be generated at all.
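To make the hung quorum concrete, here is a small sketch that checks the shares listed above against a cutoff. The `leading_hash` helper and both cutoff values are assumptions for illustration, since I am uncertain of the cutoff the network really uses.

```python
# Shares (in percent) as reported for the hung quorum above.
reported_shares = {
    "21bf4d42dc4f5dacef7daba2ab54c196": 39.46,
    "567d6a9e3c7451f7673c4d05bc4bdc8b": 33.39,
    "b4b529630837104782408a0bb5cfdbad": 27.13,
}


def leading_hash(shares, cutoff_percent):
    """Return the hash with the largest share if it exceeds the cutoff, else None."""
    best = max(shares, key=shares.get)
    return best if shares[best] > cutoff_percent else None


# With a simple-majority cutoff (an assumption), no hash wins.
print(leading_hash(reported_shares, 50.0))
# With a much lower, purely hypothetical cutoff, the first hash would win.
print(leading_hash(reported_shares, 35.0))
```

Under any cutoff of a simple majority or more, a three-way split like this one cannot produce a winner.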

If the hashes do match, and thus quorum consensus is reached, the quorum is unlikely to be able to stake. Most PC users have Windows as their OS, which means the huge influx of members into the Gridcoin community has created many new wallets with a very low chance to stake a block. A wallet with a low chance to stake a block is one with both a low balance (for POS staking), and a low magnitude (for POR staking). The most recent quorum appears to have fallen into this category, as they reached consensus but failed to stake a superblock:

213472e4b11ea3170e1a01323f309576 : 29; 99.97%
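A rough way to see why a quorum full of low-weight wallets struggles to stake: if each member independently has a small chance of staking any given block, the chance that at least one of them stakes within a window is 1 - (1 - p)^(members × blocks). The independence assumption, the per-block chances and the window size below are all hypothetical; only the member count of 29 comes from the line above.

```python
def chance_any_member_stakes(per_block_chance, members, blocks):
    """Probability that at least one of `members` wallets stakes within `blocks` blocks.

    Assumes every wallet has the same, independent chance per block,
    which is a simplification of real staking.
    """
    none_stake = (1.0 - per_block_chance) ** (members * blocks)
    return 1.0 - none_stake


# Hypothetical per-block chances for a "new" and an "established" wallet.
for label, chance in [("low-weight", 0.00001), ("higher-weight", 0.001)]:
    p = chance_any_member_stakes(chance, members=29, blocks=100)
    print(f"{label}: {p:.1%}")
```

The point is only qualitative: when every member's individual chance is tiny, even a quorum that agrees on a hash can wait a long time before any one of them stakes.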

Because both of these problems are highly likely in any given round, superblocks will take a long time to generate from now on. This is a scaling issue that we will need to address as a community.

The Proposed Solution

We can address the above problem in a myriad of different ways, but to allow for better scalability I would suggest:

  • Excluding any CPIDs less than an hour old from the BOINC project servers' data before carrying out the hash. This will reduce the likelihood of a hung quorum.

  • Adding a DPOR weight requirement to be selected for the quorum. This ensures that once the quorum does reach consensus, there is a relatively high likelihood of a member to stake and generate the new superblock.
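As a minimal sketch of what these two suggestions could look like, assuming we know when each CPID was first seen and have some weight figure per node: the function names, the data layout and the minimum weight value are all hypothetical, and nothing here reflects existing wallet code.

```python
from datetime import datetime, timedelta

MIN_CPID_AGE = timedelta(hours=1)  # the one-hour threshold suggested above


def filter_mature_cpids(first_seen, now, min_age=MIN_CPID_AGE):
    """Keep only CPIDs first seen at least `min_age` before `now`."""
    return {cpid for cpid, seen in first_seen.items() if now - seen >= min_age}


def eligible_for_quorum(node_weights, min_weight):
    """Keep only nodes whose (hypothetical) DPOR weight meets the minimum."""
    return [node for node, weight in node_weights.items() if weight >= min_weight]


# Hypothetical example data.
now = datetime(2017, 1, 1, 12, 0)
first_seen = {
    "cpid_old": now - timedelta(days=30),
    "cpid_new": now - timedelta(minutes=10),
}
node_weights = {"node_a": 5000.0, "node_b": 3.0}

print(filter_mature_cpids(first_seen, now))                # {'cpid_old'}
print(eligible_for_quorum(node_weights, min_weight=100.0))  # ['node_a']
```

Filtering before hashing means a CPID that appears mid-download no longer changes the hash, and the weight check keeps wallets with almost no chance of staking out of the quorum.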

The Bigger Picture

Every cryptocurrency runs into issues; that comes with being at the bleeding edge of technology such as blockchain. @Vortac elegantly summed this up on Reddit:

Crypto is still far away from being even close to 100% reliable and Gridcoin is not an exception. In fact, I would say it's comparable to the biggest networks out there, on the reliability metric.

Overcoming scaling issues like this will take Gridcoin to new highs. These problems were born from massively increased enthusiasm and a huge influx of members. They are good problems to be facing and to overcome.

I would like to thank IFoggz, Mercosity, Bullshark, NateOnTheNet, Barton26 and Ravon for taking part in our discussions to work out the cause of the superblock issues, and will leave you in the hands of our GRC IRC bot:



Thank you and thanks to @jringo for sending this my way. I'm new to this whole thing, but I'm hoping it will work out well so I can share my experiences with my followers and make this scaling problem even worse. Hehehe. Scaling issues are a good sign that things weren't prematurely optimized and that people are actually interested in participating in the project. I hope it gets resolved soon!

Thanks for this info
Keep it coming !

You're very welcome. Fingers crossed for a SB.

@Dutch, thanks for explaining. I really appreciate the details as a user of the GRC community.

Regarding the proposed solutions

  • Excluding any CPIDs less than an hour old from the BOINC project servers' data before carrying out the hash. This will reduce the likelihood of a hung quorum.

  • Adding a DPOR weight requirement to be selected for the quorum. This ensures that once the quorum does reach consensus, there is a relatively high likelihood of a member to stake and generate the new superblock.

I support these, although @nuda1 had a good suggestion as well regarding beacon age rather than DPOR weight, and that might be how we have to implement the first bullet. It would be great to see if Rob or one of the other active developers would be willing to implement this in the code so we can get a new update pushed out. I think this is going to be critical as well if we're going to eventually remove the team requirement (which I also support doing).

I would like to ask though: what does a lot of new users mean? What is the scale we're talking here? 10s, 100s, 1000s?

Would it not seem plausible that the issue also lies in the fact that we just merged Rob's code with the #gridcoin community devs' code, combining the two code trees? Since then we have had one if not two black swans, and instead of forking we are lucky enough to merely not get a superblock. Since we were fine for months, and we had the same influx of users over the past few weeks/months, it seems a little weird to blame it on the number of new user beacons in the NN, the newbie block, etc. IMO, and no offense to the dev team: roll back to the pre-merge code and go from there, back to 1 coder vs. 5-6 additional. Yes, this happens, etc., but Rob has also in the past put in things like code time bombs, forgotten about them himself and caused us to fork, so this could be due to the code merge, with the "consensus or quorum" issue being a result of something deeper. I am no dev, but when something breaks right after you change things completely and merge things, the best fix is to remove the modifications and changes. (BTW, I love that the community is getting a chance to get involved in dev, and my hypothesis is nothing negative at them; they are much appreciated.) It's just how you would deal with it if it were physical and tangible in your hands.

This is a bit negative. Before casting judgment I do believe these gentlemen have done some fine analysis - but, I would actually like to see the information that backs it up. How many new members? What's the magnitude of the issue - if we're going to blame it on scaling, then show us the scale. I would tend to agree with you, @jamezz, when we went up from $0.004 to $0.01, we added a lot of new users too - is that number smaller than the scaling issue that supposedly exists now?

So why haven't we had hung quorums these past few weeks leading to such a long wait for a superblock? The influx of new users hasn't been over the past few days; it's been over the past month. Thanks.

I have moved my staking from my Raspberry Pi to my Windows machine

How about including Linux users?

Thanks for the explanation, was wondering what was up, even though I'm part of the problem being new to GRC, so hopefully this gets fixed soon and will just make GRC better!

Adding a DPOR weight requirement might not be the best idea because DPOR weight is always dropping after you get a POR reward.
How about adding the requirement that the beacon must be older than one month or so instead?

I love the concept of Gridcoin, but since the conversion to GRC, I have had problems trying to get my wallet to work properly. I hope that you are able to fix the scaling problems. I would love to get back into this coin.

Join us on IRC and we'll help you through any wallet problems you may have.

We are a friendly and transparent community.

Hope to see you soon.

http://webchat.freenode.net/?channels=gridcoin-help&uio=d4

Thanks. I will join in tonight or tomorrow. Have to see if I can even find my old wallet now. Was not much in it, though - but I can change that once I get started with your coin again.

Gridcoin seems such a cool one. Upvoted!

Is this another ICO?

No. This is an established coin (2013) rewarding BOINC computations rather than aimless hashing. We have seen unprecedented growth of the network in the last months and are dealing with some scaling issues.

good to know the origins of the problem :)

And, we have a new superblock: https://www.gridcoinstats.eu/block/942327

Only took 200+ hours and was created by an investor account if I understand it correctly.