So much negativity :o)
Do not show the size of the actual blocks we produce. Because we're fucking noobs.
I wish I had access to hiveblocks.com, because it has many annoying shortcomings that should be easy to correct, making it a much more useful tool. It would certainly be easier than writing an entirely new block explorer.
Obviously we do know the sizes of blocks, and if you run your own node, you have that in the "Block stats" lines in the log (I've discussed the need to be able to extract similar info from the block log, but that's something that understandably has very low priority).
Assuming we only fill up 10% of our blocks on average
It is more like 50%. We also at times have blocks that are full, mostly during Splinterlands-related events, but the increased traffic is not sustained. For the sake of the estimations below, let's assume it is 1/3.
And to that I say: no, we can not. If anyone is even thinking about increasing the blocksize without first stress-testing the current blocksize they are not to be taken seriously.
Yes we can (to an extent). Obviously we'd need to do it gradually and observe the effects. I wish witnesses had doubled the block size already, so we could see how it behaves on actual traffic in the rare events when blocks are now full (there would be the same amount of data, just packed into fewer, bigger blocks followed by regular small ones - there is no actual demand for bigger blocks all the time). The consensus code was stress tested and we know we can handle even 2MB blocks. The special "executionally dense" transactions that were a threat before HF26, mitigated only by small blocks, were optimized away, so they are no longer a problem.

It is actually hard to set up an environment for a stress test that can saturate 2MB blocks, but you can make estimations yourself based on the information the node puts in the log. On my mid-range desktop computer during sync (where every check is made, there is just no waiting between blocks), blocks are processed in 1.2-1.7 ms each. Let's assume 2ms. With the above assumption that blocks are only 1/3 full, a full block would take 6ms on average. When a witness node produces a block, it has to process every transaction separately. With most of the other code optimized, dealing with undo sessions can now take up to 80% of the processing time (something that is on the radar), so we multiply by 5. Therefore production of an average full 64kB block takes 30ms. To make it a full 2MB block, we multiply again by 32. We are still under one second. I'm confident we could still cut that in half, but there is no need for it at the moment (we could e.g. process signatures on a GPU for extra parallelism).
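The same back-of-envelope estimate as a tiny script, so the arithmetic is easy to check (the inputs are the assumptions stated above, not new measurements):

```python
# Back-of-envelope estimate of worst-case block production time,
# using the assumptions from the paragraph above.

sync_ms_per_block = 2.0   # assumed upper bound; measured 1.2-1.7 ms during sync
avg_fill = 1 / 3          # assumed average block fill
undo_overhead = 5         # production vs. sync cost (undo sessions ~80% of time)
size_factor = 2048 / 64   # scaling a full 64kB block up to a full 2MB block

full_block_sync_ms = sync_ms_per_block / avg_fill               # 6 ms
full_64kb_production_ms = full_block_sync_ms * undo_overhead    # 30 ms
full_2mb_production_ms = full_64kb_production_ms * size_factor  # 960 ms

print(f"full 64kB block production: {full_64kb_production_ms:.0f} ms")
print(f"full 2MB block production:  {full_2mb_production_ms:.0f} ms (< 1000 ms)")
```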
It is similar with memory consumption. Out of the 24GB recommended state size, 8GB is still free, 4GB is fragmentation waste, and more than 11GB does not really need to reside in memory all the time, which makes it a target for relatively easy optimizations (by that I mean we know what needs to be done and that it can be done, not that it is a short task :o) ). Once those optimizations are in place we could grow to 50 million users and still be able to run a consensus node on a regular computer.
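The same split as simple arithmetic (the numbers are from the paragraph above; treating the remainder as the truly hot working set is my reading of it):

```python
# State memory breakdown from the paragraph above (all in GB).
recommended_state = 24
free = 8
fragmentation = 4
cold = 11  # data that does not need to stay in RAM all the time

# Rough size of what actually has to stay hot in memory
# once the planned optimizations land:
hot = recommended_state - free - fragmentation - cold
print(f"hot working set: ~{hot} GB of the {recommended_state} GB state file")
```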
I agree with one aspect. It is near impossible to stress test the full stack of important second-layer services, because you'd need a whole bunch of spare servers. And I'm sure they would not be able to handle the load of full 2MB blocks - we'd need much better ones. They are not in the realm of "not yet available on the market" though, they just cost a lot. While the problem of the block log can be somewhat mitigated by pruning and by sharing the same storage between nodes, there is no helping the stuff put into databases. We are talking about up to 150 million transactions per day and 56GB of new blockchain data (20TB per year). That is not sustainable, at least not in the context of a decentralized network (I think it is important for semi-normal people to be able to run Hive-related services even if they don't receive producer rewards). And I think it is unsustainable in a fundamental way, to the point where we should split transactions between those that have to be kept forever and the others (the majority) that are only kept for a limited time (1-3 months).
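For reference, those figures follow from straightforward arithmetic on full 2MB blocks every 3 seconds, combined with the 150-million-transactions figure (a rough estimate, nothing more):

```python
# Where the 56GB/day and 20TB/year figures come from, assuming
# full 2MiB blocks produced every 3 seconds around the clock.

BLOCK_INTERVAL_S = 3
BLOCK_SIZE_MIB = 2

blocks_per_day = 24 * 60 * 60 // BLOCK_INTERVAL_S     # 28,800 blocks
gib_per_day = blocks_per_day * BLOCK_SIZE_MIB / 1024  # ~56.25 GiB
tib_per_year = gib_per_day * 365 / 1024               # ~20 TiB

tx_per_day = 150_000_000  # figure from the text
avg_tx_bytes = BLOCK_SIZE_MIB * 1024 * 1024 / (tx_per_day / blocks_per_day)

print(f"{blocks_per_day} blocks/day, {gib_per_day:.1f} GiB/day, "
      f"{tib_per_year:.1f} TiB/year")
print(f"implies ~{avg_tx_bytes:.0f} bytes per transaction on average")
```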
Think about how many times critical infrastructure on Hive has just failed and websites were down
I'd need to invite @gtg to talk about what can cause some servers to fail at times, however what happens afterwards is partially related to decentralization. There is no central balancer to distribute traffic in a reasonable way. When a popular API node fails, its traffic is often redirected to another popular API node, which is bound to end in trouble. No one is going to keep servers so oversized that they sit mostly idle all the time just to be able to handle an occasional heavy load.
Here's what's going to happen:
When adoption increases, so does the price of Hive and therefore producer rewards. Block producers will have both the need (increased traffic) and the means (increased rewards) to scale up their hardware. I also hope increased RC costs due to traffic would put some sanity into the minds of app devs. The same data could be stored far more efficiently in binary form instead of bloated custom_jsons. I've also recently learned about a node-health service that just sends "meaningless" transactions to nodes in order to see whether they make it to the blockchain. I can understand the reasons why someone would do this, but it is only possible because RC is dirt cheap (one of the reasons I was somewhat opposed to RC delegations, and why I have not yet started experiments with an "RC pool" solution similar to the "contract pays" you've mentioned, even though it would be very useful). So, we have usage bloat that can be reduced once incentives arrive, we can scale up through better hardware in the short term, and there are more optimizations in the longer timeframe. Once better hardware becomes the norm, we can increase the size of RC pools to reflect that, pushing RC prices back to affordable levels.
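As a toy illustration of how much custom_json overhead costs compared to a packed binary encoding (the payload, field names, and layout below are made up for the example):

```python
import json
import struct

# A made-up game action as it might be posted in a custom_json payload.
action_json = json.dumps({
    "app": "somegame/1.0",
    "action": "move",
    "player_id": 184467,
    "x": 12, "y": 7,
    "timestamp": 1700000000,
})

# The same information packed as fixed-width binary fields:
# action id (1 byte), player id (4), x (2), y (2), timestamp (4).
action_bin = struct.pack("<BIHHI", 1, 184467, 12, 7, 1700000000)

print(len(action_json.encode()), "bytes as JSON")  # ~100 bytes
print(len(action_bin), "bytes as binary")          # 13 bytes
```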
Websites going down isn't a critical Hive infrastructure, usually it's related to API nodes having troubles
which isn't best way to do, because it should be solved at client side (i.e. by changing API node that client is using)
Well, from a user's perspective it might actually look like it is, since they are cut off from interaction with the blockchain when that happens.
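A minimal sketch of what that client-side failover could look like, assuming a plain JSON-RPC client; the endpoint list is just an example and `hive_call` is a hypothetical helper, not an existing library function:

```python
import requests

# Illustrative list of public Hive API endpoints; a real client would
# let the user configure these or fetch a maintained list.
API_NODES = [
    "https://api.hive.blog",
    "https://api.deathwing.me",
    "https://anyx.io",
]

def hive_call(method, params, timeout=5):
    """Try each API node in turn and return the first successful result."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    last_error = None
    for node in API_NODES:
        try:
            r = requests.post(node, json=payload, timeout=timeout)
            r.raise_for_status()
            body = r.json()
            if "result" in body:
                return body["result"]
            last_error = body.get("error")
        except requests.RequestException as e:
            last_error = e  # node down or overloaded; fall through to the next
    raise RuntimeError(f"all API nodes failed, last error: {last_error}")

# Example: fetch the head block number, failing over automatically.
props = hive_call("condenser_api.get_dynamic_global_properties", [])
print(props["head_block_number"])
```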