How Breakable Is Bitcoin? A Scenario

in LeoFinance4 years ago (edited)

bitcoin5773850_1920.jpg


Let’s start by saying that I’m an admirer of the blockchain technology and a very early observer of it. Please note I said “early observer”, and not necessarily “early adopter”. When it comes to blockchain, I’m more of a “just in time adopter”, to be honest, and not an early one, because, sometimes, early success might be bad. But that’s another topic.

Beyond the speculative bubble that is engulfing the entire crypto space every 3 years or so (around the “halving” time, for those Bitcoin-savvy), there are a lot of interesting topics to ponder about in this area. So, if you’re here for price predictions, technical analysis (or drawing lines on charts and pretending the reality follows them), well, it’s safe to leave now. But if you’re here for some thought provoking hypothesis related to the overall security of the blockchain, specifically of Bitcoin’s blockchain, grab a coffee. It will take at least 10 minutes.

What Breaking Bitcoin Means?

In my opinion, breaking Bitcoin means proving that access to tokens located at a certain address is unsafe. It doesn’t necessarily mean to break the SHA-256 / RIPEMD-160 algorithms per se (don’t worry if you don’t know what these algorithms are, for now).

In mathematical terms it means to find a repeatable and predictable collision in the address space.

In layman terms, it means finding “the password” or the private key of a Bitcoin address.

Even if this particular event is very difficult to achieve and even if it requires a lot of computing power (the kind that just a nation-state actor can harness), the result will be that Bitcoin is breakable. The value stored in the blockchain is built on top of the trust invested in, and proved by the blockchain. If this trust decreases, or has provable chances to decline, then the trust will decline as well.

It’s important, for me, to start with this definition, because it implies many other components. First of all, it’s not about proving theoretically that cryptography behind it is weak, because it isn’t. I am very well familiar with terms like “untractable problems”, “polinomial times”, “double hashing” and so on and so forth. It’s not about that, because that is solid, it has been audited many times. And second, Bitcoin is not a theoretical product, it’s became a practical asset, one with a market cap around one trillion dollars. Any shakeup to this asset can have massive ripples in many directions. As you will see below, if we can prove that access is unsafe, by using a variety of tools, even if the cryptography is solid, then it is breakable.

So, what would breaking Bitcoin would involve, in a scenario which doesn’t rely solely on breaking the theoretical crypto behind it?

Quantum Computing? Not Yet

First thing that comes to my mind is quantum computing. Because of the way computing is done in these computers, some cryptography problems might be a bit easier to solve than others. Each type of computing machine is better suited for specific tasks. Using traditional computing machines to solve crypto problems isn’t practical, but putting quantum computers at work for that might be relevant.

I don’t think this is an avenue worth checking, at least not in the next 3-5 years. The cost to that is still prohibitive, not only to use it, but to scale it to the amount of work required to tackle such a problem.

So, brute force with the most appropriate tool, quantum computing, is out of reach, at least for now.

What next?

Introducing Generative Adversarial Networks (GANs) And A Bit Of Adjustment

If you’re not up to date with AI, specifically with Generative Adversarial Networks, you should change that. This is a very recent area of AI, but it evolves really fast.

Very shortly put, GANs are AI structures comprised of two actors: one that tries to fool, and other that checks the first one for falsified stuff – the latter has access to the “true” objects as well, while the former doesn’t, the forger doesn’t even know what it falsifies. It’s common to use, in the learning process, the terms “art forger” and “art critic”, to distinguish between the two.

The process usually starts with some noise from the part of the “forger” and with answers from the “critic”. The forger tries to guess what exactly should paint, in order to fool the critic. It’s a fascinating process, especially because it may start with just some noise, and get to unbelievable levels of accuracy. One of the most famous results of such an AI model is: thispersondoesnotexist.com. Every time you load that page, the model generates a person that doesn’t exist, but who is extremely plausible. On a side note, GANs applicability area is immense and I might write something about this in the future, but for now let’s see how this could be used for testing Bitcoin.

Suppose we want to train a forger to generate not human faces (which are, basically, still collections of bits), but private keys. It will start with just some noise, and then it will continue to talk to the critic, (the other part of the model), which already has a huge repository of real private keys – you can generate millions of that.

Let’s add some restrictions to this, because, like I said, we’re not intending to break the SHA256 protocol itself – I doubt this will work for such a task – but just a subset of it. So, not only the model will be trained on a data set of about 400 millions public / private keys (which is the dimension of the Bitcoin network at this moment), but the results will also be checked against an even smaller subset, namely all the public keys from satoshi era. I believe we’re under 10 millions of keys in the first 2-3 years. This iteration will make things go significantly faster, saving on computing power.

So, the process will go like this:

  • generate test data of the size of the current Bitcoin network (400 millions public / private pairs)
  • start training the forger until it gets “close enough”, by making it talk to the critic, which will evaluate the forger against its huge set of key pairs
  • as the forger gets closer, run the results against the 10 million public keys in the satoshi era, see if any of them “fits”
  • optional: run the result on a separate thread also, against all the remaining 390 millions addresses of the network

As you can see, it’s a combination of GANs, social engineering and brute force. The brute force part is still huge, if we want to do this properly, but not outside the reach for some powerful entities, from corporations (Google trained CPT on trillions of parameters) to nation-state actors.

I lack the mathematical skills to properly model this, and that’s one of the reasons I’m putting this out, in the hope that someone better equipped in this area can run some analysis on it, but my intuition (which may be wrong) tells me we have a much higher probability to find a private key in this setup, rather than trying to break the entire algorithm.

Of course, we may also have a much lower probability, that’s also an option.

But I think it’s worth thinking about it, especially about the data set that we use to train the forger.

What If BItcoin Is Unbreakable?

Well, if we find that this is not working, all good, back to our lives.

What If Bitcoin, As It Is Now, Is Breakable?

If this scenario works, then we have a problem. Here’s how I think it can be solved.

If the size of the trainable model is a parameter in the probability of discovering the private keys, then we should adjust the size of this attack surface. In other words, we know that an AI model performs better when there is more data to train on. So, having a fixed repository of 400 millions public addresses that can, theoretically, be used to check a generated private key against them, will make the model perform better. Let’s just lower this attack surface, then. Less data should make the model perform worse.

How can we do this? Apart from increasing the complexity of the generated keys, which is obvious, I think we can also do this by introducing something that I call “permanent limited supply”.

How does this work?

Well, Bitcoin will still have a limited supply, only it will be limited in time. It will still be 21,000,000 tokens, but, and here’s where the part “permanent” comes in, not since the genesis, but in the last, let’s say 10 years. The supply will be kept at 21,000,000 by “moving it” along a time window. As we advance in time, coins outside this window will become obsolete, untransferable. This can be done relatively easy, by allowing coins to be liquid, or “transferable”, only from a certain block height onwards.

That means people who have coins that will “expire” will have to move them to a new address, which will be added to a higher block height, thus escaping obsoletion. This, again, can be done either by users themselves (kinda difficult, but it’s nice to have it as an option) or by a separate, programmable entity. It can be a layer 2 app on top of blockchain, or even a routine added in a hardfork, on-chain.

This will always generate fresh addresses, while obsoleting the expired ones, which means it will put more pressure on the GANs to train for these new addresses (it lowers the quality of the data set by increasing its entropy), and, from a certain point, this will become ineffective / too expensive for the GAN.

Another interesting economical consequence of a permanent limited supply is that fresh Bitcoin will keep being generated, as the unused, obsoleted one is “left behind”. In our current setup, this will be true for a good number of years, since the satoshi era BTC is, allegedly, unusable – Satoshi Nakamoto disappeared (or we will finally find out he didn’t disappear, once he starts moving his coins to higher block heights?). But even as more people will be wary about keeping their Bitcoin “valid”, and will move it “forward”, new BTC should still be generated, to account for the one used in transaction fees.

Thoughts?

Initially pusblished on my blog.


I'm a geek, blogger and ultrarunner. You can find me mainly on my blog at Dragos Roua where I write about productivity, business, relationships and running. Here on Hive you may stay updated by following me @dragosroua.


Dragos Roua


Wanna know when you're getting paid?

I know the feeling. That's why I created hive.supply, an easy to use and accurate tool for calculating your HIVE rewards

It's free to use, but if you think this is a useful addition, I'd appreciate your witness vote.

Thank you!

Posted Using LeoFinance Beta

Sort:  

If your scenario happened it could cause the biggest crypto bear market ever. Also when quantum computers do become more cheaper and available, do you think bitcoin will adapt ( if it maintains it's top dog position of course)

Yeah, that's what I mean by "what if Bitcoin is breakable". It's not necessarily happening because of SHA256 in itself is breakable by brute force, but if a GAN can shorten the path to finding a specific pair, in a specific space (like the total Bitcoin address space) then we should take this into account, somehow.

If this happens, of course there will be adjustments.

Posted Using LeoFinance Beta

This sounds really intriguing. I believe we will experience the power of AI through years to come. And both of your scenarios could happen. Btc is always full of surprises. Luckily my name is not Musk, so this will not influence the price.

Luckily, my name isn't Musk either :)

Posted Using LeoFinance Beta

This may be a stupid question but why would there be anything learnable about existing public key/private key pairs? Isn't the whole point of encryption that the fastest way to generate a private key from a public key is to use brute force. Otherwise the encryption algorithm would be badly designed. Has SHA256 been proven to have such flaws? Granted, your system might find them if they exist.

GANs are different. They may "learn" something plausibly enough, even if they don't know what they are learning. A GAN model could, in theory, generate "plausible" private keys, by learning what private keys are, without knowing anything about their corresponding public keys. Now, from this space of plausible keys that the model generates after we train it, we would just use the results on a subset of the entire Bitcoin network address space, until we find a fit. That's the "brute force" part of it. GANs are coming somehow before that, and they may, because of how they work, shorten the path to a working private key.

I'm not sure I understand this part: "Isn't the whole point of encryption that the fastest way to generate a private key from a public key is to use brute force."

Please educate me but isn't generating plausible private keys rather trivial? But if you want to generate a private key corresponding to any particular public key, there is no choice but to try brute force?

I may have misunderstood some basic things completely.

Generating plausible private keys isn't that trivial. They are derived in a specific process from a part of the public key, and this process involves double hashing.

If by "plausible" you understand correct from a formal point of view (length, allowed chars), this is of course trivial. It's like you would say a person has one nose and two eyes (to continue our GAN example).

But if by "plausible" you understand "with a higher probability to find a fit in a specific set of public keys", then it's different and it's my understanding that GANs can help here. They may create something very close to that, in the same sense they are creating very plausible faces, not just sketchy noses and eyes (have a look at that site).

A very simple explanation of a GAN is that the forger learns specific patterns (nose, eyes, hairlines) by approximating the distance in pixels from them (it's way more complicated than that, obviously). In this sense, if there are patterns in the cryptography generating process, or recognizable groups of letters, etc, a GAN can be closer to a public key than just brute-brute force.

Posted Using LeoFinance Beta

Generating plausible private keys isn't that trivial. They are derived in a specific process from a part of the public key, and this process involves double hashing.

This I did not know. It cleared up the issue!

Yes, I've read about Generative Adversarial Networks. There are two parts: a generative network that generates possible solutions and an adversarial network that judges them.

But let's get back to plausible private key.

How far from having a plausible keys are you to finding the actual key that corresponds to a public key in general terms?

Posted Using LeoFinance Beta

How far from having a plausible keys are you to finding the actual key that corresponds to a public key in general terms?

That I do not know for sure, but my intuition tells me that we can get closer if we first use GAN, because it may shrink the distribution space. It's just an intuition now, like I said I lack the mathematical skills (or the time) to model properly.

It may even be a question of trial and error until we find which parameters in the learning process affect the validity or the probability of finding a working key.

Sounds cool.