Private blockchains vs other databases part 1: The tech perspective

in #blockchain6 years ago (edited)

tl;dr: If your project doesn't specifically benefit from a feature which is only offered by blockchains, save yourself the effort and the pain and use a regular database.

As an introduction, I'll just say that I did stuff with distributed databases before getting involved with blockchains, which was around 2014. I hold a PhD in computer engineering and you can take a look at my LinkedIn profile to see some of the interesting things I did. 

This is a two-part article on the pros and cons of using a blockchain in a project, where I'm summarising my experience so far, and where I'm offering guideline to decide whether a blockchain is actually needed to implement an idea for a project. This will be mostly done within the context of having a private blockchain. Since I'm currently bridging between the tech side and the business side, I'll write both, with the tech side being first.

The second part can be found here.

Blockchain is a data structure

Saying that a project is "powered by a blockchain" is pretty similar to saying the project is "powered by a linked list" or "powered by a binary tree." It really comes down to how this data structure is used. One may also say that a project is "powered by a relational database." When did you last see that a successful product is being advertised as "powered by MySQL" as one of its selling points? Exactly.

This data structure, in essence, is about grouping pieces of data, usually called transactions, in blocks, and having each block digitally sign the preceding one (which can be done in several ways, including simply hashing the preceding block, under certain restrictions). This directly leads to an important property: all things being equal, data put in the blockchain is immutable.

As good as immutability often is, as it provides assurance against data tampering, corruption, and leaves an automatic audit trace for all changes to the blockchain, it can also be a downside, especially if GDPR is mentioned in the same sentence.

Property #1: Blockchains are immutable

So, if the project directly benefits from immutably storing data in an append-only fashion, blockchains will provide this feature. Of course, on the other hand, the core ideas of the blockchain can also be used simply to establish a transport protocol or a serialisation protocol, where old blocks are designed to be discarded, but so far this idea isn't popular, even if one of the most popular projects, Ethereum, already has the concept of blocks carrying instructions on how to modify a data set which basically can exist as its own database.

If data needs to be modified, or deleted, all types of regular databases will be much, much better suited for the task. For starts, this is exactly what most of them are designed to do (especially the ones optimised for OLTP performance).

Property #2: Blockchains use compact, slow storage

Blockchains are usually designed to hold a digitally signed version of compact data records. In other words, usually, data is first written out in bytes in the shortest way possible (e.g. RLP), then it's digitally signed, then it's included in blocks which are themselves compact (and digitally signed). This means that it's not optimised for either read or write performance.

Regular databases store their data in a deliberately loose format, not only uncompressed but also written out on disks in a generous way with much slack space and redundancy, all of which significantly helps both read and write performance. Indexes, for example, are a way of significantly increasing database access performance, but are in essence, duplicates of the data arranged in a way which particularly helps very specific types of operations. This is why a relational database which imports a blockchain can take up to 5x the disk space the blockchain itself takes - all in the interest of increasing performance. 

I've seen operations involving reading blockchain data directly be up to 100x slower than the corresponding operations on the same data set stored in a regular database.

What makes public blockchains useful is consensus

The idea of the consensus is similar to the idea of data integrity rules in relational (and other types of) databases. These are rules which govern which data is acceptable and in which format. In cryptocurrencies, the consensus rules decide, for example, if the transaction is valid by ensuring that the sender has enough coins to send to the receiver. With smart contracts, the consensus rules also ensure that every single node running the appropriate blockchain executes exactly the same code, with exactly the same inputs and produces exactly the same outputs.

As every single full node on the planet verifies (or at least is able to verify) all the data in the blockchain against the consensus rules, that means that every full node on the planet can unambiguously verify transactions, and so the people who use them can trust the system to accurately and irrevocably carry them out. One of the biggest innovations Bitcoin made when it was introduced is the idea of combining distributed consensus with securely mined blocks and with incentives for mining blocks. Every single miner effectively verifies all the transactions in a block before mining it, so other nodes don't have to (but can still do it if they want to). By spending a huge effort in mining blocks, the possibility of someone forging historical blocks (e.g. by removing certain transactions from them so it looks like a payment never happened) becomes ever lesser.

Property #3: Distributed consensus is what makes public blockchains useful

In cryptocurrencies, distributed consensus is what makes blockchain-based payment networks work without a central point of authority. There isn't a "clearing house", or a trusted party which verifies that the sender has enough coins to make a transaction, or which ensures the transaction is correctly written down and safely executed. By having code all over the world constantly processing exactly the same data in exactly the same way, and reaching a consensus about exactly the same outcome of transactions, there is no need for any kind of centralisation, at least on this level.

Since consensus in cryptocurrencies is mostly tied to the concept of mining, the choice of mining algorithm has an impact on how the consensus works. The most common algorithm is called Proof of Work and it involves spending an incredible amount of CPU power to solve a special type of mathematical puzzle which is hard to solve but its solution is easy to verify. Spending so much computational power, and as a consequence, a huge amount of energy, to create blocks, is seen as a good thing because it makes forging blocks very hard to do. Specifically, hard enough so no one would bother doing it. 

The next popular algorithm is Proof of Stake which doesn't waste electricity and in most variants operates somewhat as a savings account. A specific amount of coins is usually stashed away, and there they "earn interest." Each address participating in such a scheme gradually earns more coins just by letting some coins sit for a while untouched, and this is done in a distributed fashion, governed by algorithms in a global consensus. However, there's a distinct lack of working, publicly available and mainstream-ready implementations of a PoS-based cryptocurrency with all the features of a PoW-based one, because it's much harder to implement securely than PoW. 

Finally, there's Proof of Authority, where blocks are digitally signed by miners, which usually don't do any particular "mining" operations like in PoW and PoS, aside for validating blocks' data. This is an algorithm most suitable for private blockchains, because it skips the energy wasting problem of the PoW (as well as the possibility of a security breach by someone just having enough computational power to re-calculate old blocks, i.e. the 51% attack), and it skips the huge complications in algorithms the PoS introduces. PoA is easy to implement, cheap, secure, resilient, but - not distributed. To maintain security against just everyone creating blocks which suit them, PoA is usually implemented by specifying a limited set of cryptographic keys which are allowed to sign (and this create) new blocks. But, since this set is well known and limited, those miners form a central authority, and thus the consensus is not as distributed as in other schemas. If the blockchain data is publicly available (i.e. everyone can read it), then everyone can also verify the data stored in the blocks, but this only works if consensus rules apply to data itself, e.g. if its formatted as transactions with specific rules which govern their validity within the context of the blockchain as a whole. If the blockchain is not cryptocurrency-oriented with strict transaction rules, but is used for storing arbitrary data, PoA blocks might pass digital signature validation without anyone being sure about if the data contained in the blocks is valid or impartial.

If a blockchain is to be private, i.e. only used within a single organisation or with a well known set of organisations, with a PoA algorithm to sign blocks, then it's not making use of (or not even implementing) one of the most important features of blockchains: the distributed consensus. All currently popular "regular" databases have data replication features which are extraordinarily better suited for the task of distributing data between nodes than any blockchain. While the blockchain software needs to form transactions, push them into a peer-to-peer network, wait for a miner to pick up the transaction, pack them into blocks and finally mine those blocks in any way, databases can simply store digitally signed data with the same efficiency with which they store (and retrieve!) all other data. There's simply no comparison in performance and ease of use between a relational database and a blockchain, if the distributed consensus is taken out of the list of requirements.

Smart contracts are just code

The elephant in the room is that there's not much that's smart about smart contracts. It's just program code attached to transactions, and executed as a part of the consensus rules. This code can, e.g. decide if the transactions are valid, examine data previously stored in the blockchain and evaluate if new data can be accepted (and how) in the blockchain, etc. 

Smart contracts themselves have data attached to them, usually called "state". A smart contract may manipulate its state data, and this is how arbitrary data is commonly stored in blockchains which support smart contracts.

For practical and security reasons, smart contracts cannot deal with data outside the blockchain. External systems, usually called "oracles" exist to insert data into the blockchain (by issuing smart contract calls which store data) which can be examined by smart contracts.

Smart contract calls (invocations, executions...) are themselves transactions with a small bit of attached data which identifies the exact piece of code and the exact smart contract to execute (i.e. the method name and the smart contract address), and the parameters to pass to the code. The outcome of this code execution is included in the transaction's outcome and can include e.g. a change in smart contract's state data.

Property #4: Smart contracts are most usable with distributed consensus

In a way, smart contracts are "business rules" for the blockchain. They can decide what happens with new data in the context of old data, and they can output / signal certain values to be used by an external system. The key thing to notice here is that smart contracts are a part of the consensus rules. Every single node executes this code and must arrive exactly at the same outcome. This outcome usually gets attached to blocks before they are mined, which is how blocks in the blockchain actually contain a history of all the effects of all the transactions happening progressively through time.

If there is no distributed consensus, i.e. if there is a single authority (even if it uses multiple nodes under its influence) which executes smart contracts and decides what are the smart contract's outputs and which transactions are valid, the concept of smart contracts can be reduced to digitally signed pieces of code stored either in a database, or more commonly, just as files on a server. This code can be verified (i.e. its digital signature can be validated), executed, and its results digitally signed and written to the database without changing the essential nature of what a smart contract is. Since the security of this scenario completely depends on the security of the consensus, and the consensus is centralised, its much simpler to add a field to an existing database which would contain code than engineer a blockchain from scratch to do it.

If the data is not going to be verified by third parties, or the public at large, there is no need for a distributed consensus (and it wouldn't be useful or used even if implemented), and there is no need to format data into transactions with smart contract code attached to them. Both data and code can more efficiently, and possibly more securely, be stored in a conventional way.

Summary

Blockchains are most useful where authenticated data needs to be distributed among a large-ish and unpredictable set of nodes, where the nodes do not trust each other so they must rely on distributed processing algorithms to distinguish which data is valid and which isn't. This introduces significant overhead which makes them incredibly slow to read and write, hard to implement securely, and plain inconvenient to use in many real-world cases.

However, if a project is trying to solve a problem which requires the use of a distributed consensus (which may or may not include smart contracts), immutable and append-only data, blockchains are basically the only solution we know of how to do it, and in those cases are unavoidable.

Stay tuned for an article examining most of these properties from a business side of things.