This article contains basics about hash functions.
The article is part of a series starting with this article: Blockchain Foundations Part 1: Distributed, Decentralized and Centralized Computer Architecture
The articles are drawn from my book "Blockchain and Crypto Currencies Easy to Understand for Everyone, Thomas Bauer". Please refer to the part 1 article for a introduction to the blockchain foundations series.
Hash Function
From the point of view of its user a hash function is something very simple. We take any algorithm being able to calculate the so-called hash value from an arbitrary row of bits. This algorithm is free of any random. A certain row of bits always leads to the same hash value, as long we use the same hash algorithm. If the input is not an already existing file but readable text we have to watch for the encoding (UTF-8, ASCII, …) of the input.
If we change only one bit of the bit row the algorithm will lead to a completely different hash value.
For example, the Bitcoin blockchain uses the SHA-256 (SHA = secure hash algorithm).
input value (UTF-8 encoded) | SHA 256 hash value (hexadecimal) |
---|---|
1 | 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b |
2 | d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35 |
I am an example | cf7e66b7c62562fb697576c1592a0787dffa4e542b770e28d330ca623f26c381 |
I am an exampl | 151471b8dcd8d9e7f93282172c3fdab0d356cc6d32944253d062a8df31a5c59e |
3000000000000000000000000 | cdb72b05832ff97ffe86f6cf716d45dcd50fbb3b5c4999b6dbd182a8c3ffd0e9 |
3000000000000000000000001 | 3b54c614f29d96ef8a3250fb56d581d96f19568659f9ee20ff0f660e21b8816e |
The hash values shown in the table (for the case they are not shown in full length in the table column)
6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35
cf7e66b7c62562fb697576c1592a0787dffa4e542b770e28d330ca623f26c381
151471b8dcd8d9e7f93282172c3fdab0d356cc6d32944253d062a8df31a5c59e
cdb72b05832ff97ffe86f6cf716d45dcd50fbb3b5c4999b6dbd182a8c3ffd0e9
3b54c614f29d96ef8a3250fb56d581d96f19568659f9ee20ff0f660e21b8816e
We see that nearly similar input values lead to totally different hash values. The resulting hash value always has the same length independent of the input value length. SHA-256 always delivers a 256 bits hash value. A SHA-256 hash value of the complete Oxford English Dictionary has the same length as the SHA-256 hash value of the number 1.
It is important to know that the same input always leads to the same hash value. The hash algorithm is very sensible to every difference of the input value. But it will be deterministic if the input value is the unchanged.
A hash value we can derive from everything which can be represented by bit row of any length consisting of 0-values and 1-values: pure text, encrypted text, pictures, file with tax information, pdf file, word file and much more.
If we derive a hash value from a content and store the hash value we later can use the hash value to check the content to unchanged. To do the check we again derive the hash value from the value and with an unchanged content we get the same hash value.
A hash function is a one-way function. It is not possible to calculate the input value from the hash value. We may find the input value by trial and error. If we knew the input value being a number between 1 and 9, then we could derive the hash value for each of the numbers until we get one identically to the hash value we are searching the input value for.
Practically impossible is it to derive the input value from the hash value for example if the input value is a picture. Assumptions about the content are not possible. We even do not know the length of the picture file.
There is a chance to find the input value to a hash value if the input is a short row of bits or if we know some details about the input value reducing the possible range of input values considerably.
If we do know little about the input value, it is practically impossible to find the input value to a hash value by try and error. A hash value may be derived from a single digit or from the complete Oxford English Dictionary.
Of course, small input values may be hacked faster. If the input value for the hash function is three bytes long and a hacker tries out every possible bit combination starting with 0, 1, 00, 01, 11, 000 … then at the latest with reaching 111111111111111111111111 (24 times 1) the hacker will have found the input value. These 24 times 1's equal to decimal 16.777.215. Hence the hacker needs at most about 16,8 million tries for 3 bytes input value. But with a 6 bytes input value we have already about 281 trillion (281.474.976.710.655) possible input values.
Another feature of hash functions is important for the blockchain. We cannot predict how a change to an existing bit row will affect the resulting hash value. As described in a later article of this series the Bitcoin mining "puzzle" is nothing else than finding a hash value derived from a block starting with a given number of zeros. The hash value must be the hash of the block content (the transactions enclosed and additional stuff) and an additional field. To this additional field we add a binary 1 until the resulting hash value meets the requirements. The rest of the block remains unchanged. Because there is no way to calculate which value of the additional number leads to a hash value with the required number of leading zeros, we need to add a binary 1 to this additional number until we find a matching hash value.
Less important for understanding blockchain is how the hash value is calculated. Hence this is not described in this article series. We only need to know which algorithm is used in each environment. For example, Bitcoin uses among others the SHA-256 hash algorithm.
We need to know with which hash algorithm a hash value was calculated in the past if we want to use the hash value to check a content being unchanged.