Self-written blockchain. From theory to practice.

in #self-written4 years ago

Hello!

Blockchain is closely included in our life, but what is blockchain,
The majority have an extremely vague understanding. However, it is not surprising, since the state of affairs with information on the network has an extremely superficial information content. In this blog, I will lay out a phased advancement from the first lines to the generation of the first block.
But let's start in order. I consider it right to start with the theoretical part, since in any business this is the basis. In addition, this information can be useful to many.

open-uri20190314-5165-1s3xur9.jpg

                      And so, let's get down to the theory.

Blockchain is a continuous chain of blocks (singly linked list),
having only two operations: read and add, excluding
functions of editing and deleting due to elements of cryptography and
computer networks. The need for this way of storing information
caused in situations where the parties to the protocol do not trust the intermediaries
and each other (exchange without the participation of an arbiter, transaction of funds without participation
bank).
Blockchain can be implemented in different ways, even if the sphere
application is known in advance. So for example, if the developed program
blockchain-based is a payment system, then getting the balance
the user can already be implemented in two ways: deterministic and
non-deterministic *, not to mention what the block will consist of,
what are the block limits, what are the mining rewards, etc. This state of affairs
is more negative, since the safety of the final product is not
will be determined by generally accepted standards that have passed open and
long-term analysis. To combat this factor it is accepted
adhere to de facto standards, similar to bitcoin "Bitcoin" (which
acts as a classic payment system) and Ethereum
Ethereum (as a contract platform).

  • Determinism can be viewed as mathematical
    a function whose result depends only on the arguments received and not on,
    what more. It is also called a stateless concept, excellent
    functional programming languages ​​(Haskell,
    LISP, etc.). So for example, to get the user's balance,
    it will be necessary to iterate over the entire set of blocks that belong
    acceptance / sending of coins. This will create the entire sequence
    operations of a person and the complete pattern of his actions from beginning to end.
    Non-determinism can be viewed as the concept of
    states ", and an analogy for this can be imperative languages
    3 programming (C, Go, etc.). If it is necessary to transfer this property to
    blockchain, then you can make the balance state of the user
    was stored in a separate block and not in the blockchain. So reading
    balance of the user will occur from the end of the chain and will stop at that
    the moment when the desired block is found.

                         Learn more about blocks.
    

Each individual block can be associated with an atom of the physical world,
which is
relatively indivisible, but at the same time consists of protons, neutrons,
electrons, and if you look deeper then you can reach quarks. The same
the situation is typical for a block, it consists of a list of transactions, current and
previous hash, computation complexity, timestamp, miner signature, and
may contain even more information (such as balance
for a non-deterministic concept). And this section is intended for the first
a queue for analyzing the insides of the block.
It is worth starting consideration of the block with its minimal version, where
it is able to stay and function as an object on the blockchain. So much for
block, just one field is enough - the hash of the previous block. This stripped down
the version is easy to understand, but it also gives rise to a theoretical basis.
So for example, let's imagine that we have a certain block B 0, which
represents the beginning of a chain, in terminology it is a genesis block.
Its essence is that it does not have a previous block and is the only one
the progenitor of all subsequent ones. The block B 0 can be represented as follows
form:
B 0 = ^random string^
A random string in the genesis block replaces the hash of the block, and in theory it is
it can be anything, not even random, since this does not affect security.
All subsequent blocks will be generated based on the following formula:
B i = hash (B i-1),
where hash is a cryptographic hash function *
Thus, a hash chain is obtained, which only with reservations
can be called a block. To give the block "meaning", you need to put in
data block (D) for storage, but in such a way that this data
committed as a result of the hash. Now the block has become more complicated and instead of one
values ​​(hash), he began to store two (data and hash), thereby becoming
an array:
B 0 = [^ NULL ^, ^ random string ^]
B i = [D, hash (D || B i-1 {2})]
In this example, the entry "||" was entered, which means concatenation
lines and the record "B i-1 {2}", which means to take the second element from the array B i-1
(indexing for the array will start at one).
This record is already more like a blockchain, because it stores at least some
then the values ​​are supposed to be useful to the user. But there may be
a logical question: why was it all done? And the answer is for safety
(although at this stage there are still a lot of problems). The bottom line is now
it is impossible to change the value of B i {1} (or B i-1 {2}) without affecting B i {2}, otherwise
the change will be detected because the block value will not
match its hash.
The situation for an attacker will be dire the moment he
it will be necessary to change the value of the block in the middle or the beginning of the chain, so
how he will have to change all subsequent hashes that come after
variable block. But problems in this scheme still remain.
What if funds transfers are stored on the blockchain? Will be very
it's a shame if an attacker manages to change your value stored in
block, for a value of much larger size, or it will generate a new block, in
which will indicate you as the sender.
To solve this drawback, you need to familiarize yourself with this
term like digital signature.
The digital signature helps to identify the creator of the block, the
the most excluding moments when someone can impersonate another
person. The very concept of a digital signature carries two more terms - address
(public key) and wallet (private key).
The signature should be applied to the hash of the block, not to its value by two
grounds. First, the hash is always of a fixed length,
accordingly, the signing speed will not vary. Secondly,
the signature will also affect the hash information of the previous block, which
will fully link the signature to the previous blocks.
The block has become more complex to three elements (value, hash, signature):
B 0 = [^ NULL '^ ^ random string ^, ^ NULL ^]
B i = [D, hash (D || B i-1 {2}), sign (B i {2})]
Now, if an attacker wants to change the value of B i {1}, he
it will be necessary to change the value of B i {2}, but if it changes the value
B i {2} he cannot change the value of B i {3} in any way so that this block
continued to identify as before the change. Or in other words,
the intermediary will be able to change the block, but only on the condition that he
will put his signature on the block and change all subsequent blocks in the same way.
Thus, if an attacker conducts such fraud in
payment system, then it will become bankrupt quickly.
But what if the attacker is so desperate that he is ready
spend all your money just to make others feel bad? Even if your block
will be at the beginning of the chain, and the chain of a million blocks, modern
the computer will be able to change the value in the chain and then generate
all subsequent hashes with the attacker's signatures, thereby erasing all
real blocks.
In this case, proof-of-work * comes to the rescue. it
further complication of the block, which adds such a term as complexity
calculations. This difficulty is based on a mathematically difficult problem,
similarity of finding the desired hash. An ordinary computer can spend on
such tasks for several hours, days, or even months and years, not to mention
a person's ability to find a solution with his own hand. To regulate
complexity, the problem of calculating not the whole hash was invented, but only its
parts. Suppose a number is being looked over and it is hashed, you need to find such
the hashed value so that it starts with four zero bytes. If
such a hash is found, then the problem is solved. But there definitely is
minus if at least one person finds a hashable number that is
will start with a large number of zeros, you will have to change the way
calculating a difficult problem, for example, go to ones, then to two,
threes, fours, etc. This is clearly inconvenient, accordingly it was invented
an improved way of doing this, what if hashing isn't easy
the incremented number, and hash the concatenation of the incremented
numbers with the current block hash? You will always get a random result and
it will not be possible to reuse the resulting incremented number by
another block. Finding the desired hash is also called mining.
The block will now look like this:
B 0 = [^ NULL> ^ ^ random string ^, ^ NULL ^, ^ NULL ^]
B i = [D, hash (D || B i-1 {2}), sign (B i {2}), pow (B i {2})], where pow returns
number x = B i {4}, for which hash (x || B i {2}) = "0000 ...".
It should be noted that the pow function focuses specifically on the hash. That is, the block does not contain any information about who mined. Let's imagine such a situation that you yourself do not mine the block, but shift this task to
a group of other people whose computers are many times more powerful than yours
Computer and for this you pay a certain percentage of the funds transferred. Further one
a person from this group finally finds the desired number and enters it into the block.
But there is an intruder in this group, he copies the number x and forwards it
you. Now you have received two answers about the found hash, who to trust?
You can say who brought the first, he will receive the reward, but you can enter
in another, more correct way. Let the miner himself create the block, and
you will only send him a transaction (namely, the first three fields of the block),
then the miner will enter his hash and signature into the block, and his hash will be
perform the operation pow.
The block gets even more complicated:
B 0 = [^ NULL ^, ^ NULL ^, ^ random string ^, ^ NULL ^, ^ NULL ^,
^ NULL ^]
B i = [[D, hash (D), sign (B i {1} {2})], M, hash (M || B i {1} {2} || B i-1 {3}) ,
sign M (B i {3}), pow (B i {3})]
When the miner finds the required number x = pow (B i {3}), no one else
will be able to copy this number and substitute it for its result, since
the resulting value will not match the hash of another block, where it will be
another miner is specified. But even at the same time, the required transaction falls into
blockchain, with value, value hash and hash signature. That is, everything is like
need to.
But even at this stage, there is a loophole. Can be replaced without
the consequences of sign (B i {1} {2}) and no one will notice. The question of course can
stand differently, for what purposes the existing signature of a third party
want to change to your signature, while adjusting to its hash? But
alas, anything can happen and therefore it should also be considered a vulnerability. IN
Revealing the guts can help fix this vulnerability
magic value D.
And again the block starts to get more complicated:
B 0 = [^ NULL ^, ^ NULL ^, ^ random string ^, ^ NULL ^, ^ NULL ^,
^ NULL ^]
B i = [[[S, R, V], hash (B i {1} {1}), sign S (B i {1} {2})], M, hash (M || B i {1 } {2} || B i-
13}),
sign M (B i {3}), pow (B i {3})]
In this case, S - sender (sender), R - receiver (receiver), V -
value (the value itself or the amount of money). The view has also changed
sign (B i {1} {2}) to sign S (B i {1} {2}), indicating whose signature it is. Now
the signature cannot be replaced without affecting the hash. Change Signature Creator
also impossible, because it will be necessary to change the hash value, which
inevitably leads again to the replacement of the signature. Miner signature also
problematic to change because its name is in the hash. Evidence
work to steal also will not work, because it is tied to a hash. It seems
as if here it is, the perfect protection! But alas, no, there are still vulnerabilities.
Remember that you can't trust anyone on the blockchain? Actually it
also true for miners. But how is the miner able to harm?
After all, a signature was made on the hash, and the hash points to the sender,
recipient and value, that is, none of this can be changed, and if
change everything completely, which is analogous to simply ignoring
adding a transaction to a block. This is all true, but there is one detail that
very annoying - the transaction is not attached to anything. Respectively,
the miner or anyone else will be able to copy the entire transaction and
duplicate it again, while it will also be valid, because all
the hashes and signatures are correct (after all, they are really real). To get rid of
from this vulnerability, you can simply indicate in the transaction the hash of the previous
block.
The block has become a little more complicated:
B 0 = [^ NULL ^, ^ NULL ^, ^ random string ^, ^ NULL ^, ^ NULL ^,
^ NULL ^]
B i = [[[S, R, V], hash (B i {1} {1} || B i-1 {3}), sign S (B i {1} {2})], M, hash (M ||
B i {1} {2} || B i-1 {3}), sign M (B i {3}), pow (B i {3})]
Now even if the miner wants to duplicate the transaction, it will
be under the new block, and the hash of the previous block will no longer be
match the hash that is in the transaction. And if the miner
will try to keep the changed values ​​even though they do not match,
then other miners will simply reject its block as invalid.
And now our blocks are lined up in a chain, they are all protected from
substitutions, changes, duplication, and at what from all users. But what
if there are several miners and each of the miners has a different chain
blocks? How will they agree on a common blockchain? And is it possible?
The answer is of course yes, otherwise bitcoins would not exist, yes
ethereums. Rather, the question is the complexity of such a change. Main
the way to resolve such conflicts comes down to the total complexity of the chain itself,
which was composed of many pow. If one chain
exceeds another in complexity, then it becomes the leading one. If in
in chains, the complexity is static, then chains are measured
in the number of blocks, and so a chain with a larger size will be
leading and the node with the smaller chain must yield and accept the larger one.
Such conflicting situations are called soft fork when the blockchain
is split into several branches, but there is still a possibility
choose only one "correct" chain. If the attacker wants
replace the entire blockchain, then he will have to generate himself
more blocks than there are in the attacked chain, respectively, if
the chain is already huge, then the potential of such an attack tends to
zero. ****
And in addition to soft forks, there are also so-called hard forks. Hard
Forks, unlike soft forks, do not appear by themselves. They are used
when it is necessary to change or improve the application, or
the creation of a new blockchain based on the existing one. This leads to
the fact that the new branch will not "merge" with the old one, thereby
becoming incompatible.
When talking about money-based transactions,
automatically and logically, the question arises, where do the roots of these
money in the digital blockchain world? I hope you haven't forgotten what exists
such a genesis block. Well, whoever creates the genesis block is
a person with money, he is able to manipulate them, or in other words
make initial transactions. But among other things, in the blockchain
there is also a storage, which is a kind of blockchain
a user who does not have a private key. This can be considered
as an exception to the rule, which forms a new rule only for him in
the context of miners. The essence of the vault is that it pays out
from their "treasury" money for the work done to miners. Wherein,
the storage can be either self-healing or
non-regenerating.
If we are talking about non-recoverable storage:
1- Miner mined the block
2- The vault pays money to the miner
3- Miner gets a percentage of transactions
If we're talking about self-healing storages:
1- Miner mined the block
2- The storage lists the percentage of transactions
3- The vault pays money to the miner
Thus, knowing all this, it is necessary to change the genesis block:
B 0 = [[[S, M, V], ^ NULL ^, ^ NULL ^], ^ NULL ^, hash (B 0 {1} {1}), ^ NULL ^,
^ NULL ^, ^ NULL ^],
where S is storage, M is genesis block miner, V is genesis block reward.
As soon as the genesis block is generated, the storage begins to operate,
which is a certain and limited supply of resources.
It seems as if everything has already been said, but there are still a number of subtleties.
in the blockchain.
First, if a transaction is sent to different miners at the same time,
then miners have a competition or a mining race. Chances
win more against the one who has more power, but the question is
friend, how are miners coordinated? Indeed, in theory, miners can mine
the same values, thus wasting time and energy. For
in such a case, pool servers are created to which the group connects
miners interested in solving one problem and allocating resources
between themselves. Pool servers coordinate the actions of each individual
miner, so that they do not overlap in iterating over the same
values.
Secondly, there can be several transactions in a block (and this is most often
it happens). In this case, another vulnerability arises, namely
duplication of a transaction in one block. To avoid this, you need
insert a random number (r) into each transaction and hash it together
with all other information.
Thirdly, you can apply time stamps (Time), which will
indicate the time of block creation, for its subsequent
tracking in history and determining the current capacity of the blockchain with
sides of miners *****. All this is possible only if competent
synchronization with the server representing the time.
As a result, the latest version of the block will look like this:
B 0 = [[[[^ NULL ^, S, M, V], ^ NULL ^, ^ NULL ^]], ^ NULL ^, ^ NULL ^,
hash (B 0 {1} {1} {1}), ^ NULL ^, ^ NULL ^, ^ NULL ^]
B i = [[[[r, S, R, V]], hash (B i {1} {1} {1} || B i-1 {4}), sign S (B i {1} {1 } {2})], [...], ...
], M, Time, hash (M || Time || (B i {1} {1} {2} || B i {1} {...} {2}) || B i-1 {4 }), sign M (B i {4}),
pow (B i {4})]

  • A cryptographic hash function is a one-way function that
    takes a string of arbitrary length and returns a fixed string
    length. The one-sidedness property says that it is easy to get y from x [y = f (x)],
    but it is difficult to get x from y [x = f -1 (y)]. As cryptographic hash-
    functions, it is recommended to use hash functions of the SHA-2 family (sha224,
    sha256, sha384, sha512) or SHA-3 (keccak). Raw hash functions have
    disadvantages like lengthening the message and
    collisions. It is problematic to get rid of the second, due to the existence
    birthday attacks, thus it is customary to choose hash functions of a larger
    order. You can get rid of the first drawback, as an example, by applying
    HMAC algorithm per function. But in the blockchain, such an attack would be useless,
    since it requires editing already existing data and a data hash,
    which entails the need to edit the signature, for the reason
    further mismatch between hash and signature.
    ** Digital signature is an element of asymmetric cryptography. Such
    encryption algorithms like RSA is the result of the function
    13 decryption, and signature verification is carried out using the function
    encryption. The idea is that the decryption function is known only
    the creator of the private key, respectively, only he is capable of using
    this property to confirm their actions.
    *** PoW (Proof-of-work) - proof of work carried out when
    help solving a complex mathematical problem (for example, finding
    desired hash). Besides PoW, there are other ways of proving
    such as PoS (Proof-of-stake), PoA (Proof-of-authority), PoB (Proof-of-burn) and
    etc.
    **** It is worth saying that if the miners gather in one coalition and their
    the total power will be more than 50% of the power of all miners, then it is possible
    an attack in which a sufficiently long soft fork will occur. To-
    is, the accumulation of blocks in different chains will occur
    at the same time, thereby dividing the common blockchain into two camps. From this
    phenomena can be speculative opportunities for canceling transactions for
    the account of the choice of the desired chain by the miners.
    ***** You can calculate the current power of miners using tags
    time. If earlier it took an average of 10 minutes for mining, and now 5 minutes, then
    it can be argued that the mining power has doubled. Wherein,
    the tendency towards a decrease in the required time has an adverse effect
    thus, since there will be more frequent occurrences of soft forks.
    Because of this, blockchains pre-set a function that regulates
    the current complexity of the blockchain, adjusting to the constant time
    (for example 10 minutes).

                       Let's move on to the network.
    

The more you study the blockchain network, the less it seems to you.

completely decentralized. There are servers on this network that indicate
exact time, pool servers are responsible for coordinating the actions of nodes, and
among other things, the nodes themselves are distributed
server, since the users who carry out transactions are
clients.
The essence of this section is to identify the features of the blockchain with
sides of the network and building connections between its participants. So for a start it is worth
to say that there are two main types of networks - multi-rank and
peer-to-peer. Multi-rank (client-server) networks involve
existence of two objects - client and server, where the client is capable
make a request, and the server issues a response. Peer-to-peer
networks imply equality of participants in actions, or in other words,
a user on a given network is both a client and a server
(based on the terminology of multi-rank networks). Such users
referred to as nodes.
But in addition to the above two types of networks, there is a certain
a hybrid, which is called a hybrid network. Its main essence lies
in that there are, as in a multi-rank network, two main objects -
client and server. But the peculiarity of such a network is that as
server acts as a peer-to-peer network, or in other words, a network of nodes. And this
the definition of a hybrid network can still raise questions. Like what
if a person brings up a site whose data storage will
distributed between its servers, then such a network will be called
hybrid? And can the network be called a hybrid, if only
one node?
Without answering these questions, it is hardly possible to continue the discussion about
this topic. Accordingly, it is necessary to find in multi-rank and hybrid
networks are a fundamental difference, and it does exist. Nodes in hybrid networks, in
unlike servers in multi-rank, do not trust each other, but at the same time
do common work. It follows from this that the nodes do not obey any
then to a specific person, or to a narrow circle of people and thus carry
decentralized nature. But this also implies the rule that if
there is only one node or nodes are in the hands of one group of persons,
then such a network loses its hybrid properties and goes into a multi-rank phase.
As a result, there are the following connections among the blockchain participants:
1- Client -> [multi-rank] -> Node
2- Node -> [peer] -> Node
3- Node -> [multi-rank] -> Pool Server
4- Node -> [multi-rank] -> Time server
1- Clients send requests to nodes to receive balance, blocks or
entering a transaction into a block.
2- A node communicates with other nodes to store a common chain. Knot
can send requests to another node to add a new block to
blockchain.
3- The node requests the required mining range from the pool server.
4- The node requests the current time state from the time server.
Although the number of multi-rank connections prevails, nevertheless,
the importance of action is peer-to-peer. The main question is here
rather lies in whether the resiliency of the blockchain network is deteriorating, in
moments of emergence of multi-rank communication?
Let's take the example of a pool server first and imagine an attacker
trying to disconnect it from the network using DDoS attacks. The server is shutting down,
access to it is terminated for some nodes. As a result, this group of nodes can
go through two scenarios, either connect to another pool server, or
generate blocks individually based on a random number. In this
case, the network continues to function under any circumstance, albeit with
possible caveats about performance drops.
Now let's take a time server. The difference between this server and the pool
server is that it can only be used once before
launching the node. In other words, ask the server for the exact time, set
this time on the local machine and start the node. Thus, attacks on
the server will only harm the emergence of new nodes, while already
functioning nodes will continue to work. It should be borne in mind that
that there are always several time servers and make a successful DDoS attack
appears to be a daunting task. And even if all the time servers were
disconnected from the network, at this time the blockchain itself continues to function, in
nodes of which the current time is stored. Accordingly, the forming node
can request time not directly, from time servers, but indirectly, through already
working node in the blockchain.
To summarize all of the above, then the fault tolerance of the blockchain
will depend entirely on the peer-to-peer architecture, while disabling
from servers can only lead to a drop in performance of some
groups of miners, but not in any way to lower the level of fault tolerance.
It's good if the blockchain network is inherently decentralized,
then a logical question arises: how the clients (and the nodes themselves) will
connect to other nodes? Where will they get the initial list from?
required addresses? This is actually a very common question in context.
decentralized networks, since each such network solves similar
problems in many ways. More often than not, the solution is to use
third-party communication channels. Let's say a server is being created on which there will be
there is a list of active nodes. Disabling such a server
will not destroy the network itself, but will block access to information about it and its
increase (which is a problem only during the initial formation
blockchain). The list will be expanded by the nodes themselves (i.e.
they will enter their address on the server), since they are interested in profit from
mining. And the more often, on various servers, the miner's address will come across,
the more often clients will turn to him for help in mining (and
miners to add new blocks).
If the addition of addresses is taken into account in the blockchain, then the number of connections in
the network will increase by two lines:
1-Client -> [multi-rank] -> Address Server
2-Node -> [Multi-Rank] -> Address Server
It should also be borne in mind that the miner himself is capable of issuing a list of all
other miners to which it is connected, thus in most cases
you don't even need to visit the servers. But one question may arise: because of
competition of miners in finding the desired hash, will there be a miner
try to issue a truncated list of addresses to increase your chances of
mining? Maybe he will, but the point here is different, if he does not have time to conceal
the block and the blockchain will be updated at the expense of another miner, then the block that mined
The fraudulent miner will be invalid. As a result, he will either have to create
its blockchain, while ignoring all other blockchain (which is enough
risky, since it will be similar to a hard fork, followed by
attracting miners to your blockchain), or agree with others
miners and accept the fact that transactions from clients who
sent to him turned out to be overdue. From this case, customers will understand
that if the transactions did not get into the new block, then most likely the miner did not issue
the entire list of active nodes. Thus, customers are better off
connect to several nodes at once and receive addresses of others from them
miners, eliminating duplicate ones.
And there is also a third way to get the list
nodes. The authors of the blockchain client program (or blockchain node) can
by default add to the program (or to the configuration file) the list
trusted nodes that have shown themselves to be acting
a long time. And as the program is updated, it will also be updated
this list.
On the part of maintaining the capacity and availability of the blockchain network,
apply all three methods of finding addresses at once. So for example if
the popularity of this blockchain has increased by several orders of magnitude, then in
theory, the first way to find addresses (using a server) may not
used, but then a vulnerability will arise when trusted nodes
(of the third way of finding miners) will cooperate and will not issue
addresses of all other nodes, thereby lowering the power of the blockchain network. it
will affect the more successful replenishment of the balance of "fraudulent miners",
at the same time, other miners will not even get a chance to form a new block,
based on transactions. Moreover, if we remove the third method, then there will always be
there is a need to contact the server at the initiating stage
launch the application. If we remove the first and second methods, then the chances
entering a transaction into a new block will be downgraded (provided that
there are several clients with different address lists). If we remove the second
way, then the ability to receive a list of new addresses will be assigned
only to the server. Well, if we remove the first and third methods, then this
the blockchain network will lose its ease of use, due to the need
manual connection settings.

Well, in general, I stated the theory. In the next post, we will go directly to its implementation.