Bitcoin's Base58Check in Pure Elixir

in #programming7 years ago

Bitcoin's Base58Check in Pure Elixir

An important piece of the process of transforming a Bitcoin private key into a public address, as outlined in the fantastic Mastering Bitcoin book, is the Base58Check encoding algorithm.

The Bitcoin wiki has a great article on Base58Check encoding, and even gives an example implementation of the underlying Base58 encoding algorithm in C.

This algorithm seems especially well-suited to Elixir, so I thought it’d be a fun and useful exercise to build out Base58 and Base58Check modules to use in future Bitcoin and Elixir experiments.

Like Base64, but Less Confusing

Base58 is a binary-to-text encoding algorithm that’s designed to encode a blob of arbitrary binary data into human readable text, much like the more well known Base64 algorithm.

Unlike Base64 encoding, Bitcoin’s Base58 encoding algorithm omits characters that can be potentially confusing or ambiguous to a human reader. For example, the characters O and 0, or I and l can look similar or identical to some readers or users of certain fonts.

To avoid that ambiguity, Base58 simply removes those characters from its alphabet.

Shrinking the length of the alphabet we map our binary data onto from sixty four characters down to fifty eight characters means that we can’t simply group our binary into six-bit chunks and map each chunk onto its corresponding letter in our alphabet.

Instead, our Base58 encoding algorithm works by treating our binary as a single large number. We repeatedly divide that number by the size of our alphabet (fifty eight), and use the remainder of that division to map onto a character in our alphabet.

Implementing Base58 in Elixir

This kind of algorithm can neatly be expressed in Elixir. We’ll start by creating a Base58 module and adding our alphabet as a module attribute:

defmodule Base58 do
  @alphabet '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
end

Inside our Base58 module, we’ll define an encode/2 function. If we pass encode a binary, we want to convert it into a number using Erlang’s :binary.decode_unsigned:

def encode(data, hash \\ "")
def encode(data, hash) when is_binary(data) do
  encode(:binary.decode_unsigned(data), hash)
end

Once converted, we pass our binary-come-number into a recursive call to encode/2 along with the beginning of our hash, an empty string.

For each recursive call to encode/2, we use div and rem to divide our number by 58 and find the reminder. We use that remainder to map into our @alphabet, and prepend the resulting character onto our hash:

def encode(data, hash) do
  character = <<Enum.at(@alphabet, rem(data, 58))>>
  encode(div(data, 58), hash <> character)
end

We’ll continue recursing until we’ve divided our data down to 0. In that case, we’ll return the hash string we’ve built up:

def encode(0, hash), do: hash

This implementation of our Base58 encoded mostly works. We can encode any text string and receive correct results:

iex(1)> Base58.encode("hello")
"Cn8eVZg"

Encoding Leading Zeros

However when we try to encode binaries with leading zero bytes, those bytes vanish from our resulting hash:

iex(1)> Base58.encode(<<0x00>> <> "hello")
"Cn8eVZg"

That zero should become a leading "1" in our resulting hash, but our process of converting the initial binary into a number is truncating those leading bytes. We’ll need to count those leading zeros, encode them manually, and prepend them to our final hash.

Let’s start by writing a function that counts the number of leading zeros in our initial binary:

defp leading_zeros(data) do
  :binary.bin_to_list(data)
  |> Enum.find_index(&(&1 != 0))
end

We use Erlang’s :binary.bin_to_list to convert our binary into a list of bytes, and Enum.find_indexto find the first byte in our list that isn’t zero. This index value is equivalent to the number of leading zero bytes in our binary.

Next, we’ll write a function to manually encode those leading zeros:

defp encode_zeros(data) do
  <<Enum.at(@alphabet, 0)>>
  |> String.duplicate(leading_zeros(data)) 
end

We simply grab the character in our alphabet that maps to a zero byte ("1"), and duplicate it as many times as we need.

Finally, we’ll update our initial encode/2 function to prepend these leading zeros onto our resulting hash:

def encode(data, hash) when is_binary(data) do
  encode_zeros(data) <> encode(:binary.decode_unsigned(data), hash)
end

Now we should be able to encode binaries with leading zero bytes and see their resulting "1" values in our final hash:

iex(1)> Base58.encode(<<0x00>> <> "hello")
"1Cn8eVZg"

Great!

Base58 + Checksum = Base58Check

Now that we have a working implementation of the Base58 encoding algorithm, we can implement our Base58Check algorithm!

Base58Check encoding is really just Base58 with an added checksum. This checksum is important to in the Bitcoin world to ensure that public addresses aren’t mistyped or corrupted before funds are exchanged.

At a high level, the process of Base58Check encoding a blob of binary data involves hashing that data, taking the first four bytes of the resulting hash and appending them to the end of the binary, and Base58 encoding the result.

We can implement Base58Check fairly easily using our newly written Base58 module. We’ll start by creating a new Base58Check module:

defmodule Base58Check do
end

In our module, we’ll define a new encode/2 function that takes a version byte and the binary we want to encode:

def encode(version, data)

Bitcoin uses the version byte to specify the type of address being encoded. A version byte of 0x00means that we’re encoding a regular Bitcoin address to be used on the live Bitcoin network.

The first thing we’ll need to do is generate our checksum from our version and our data. We’ll do that in a new function:

defp checksum(version, data) do
  version <> data
  |> sha256
  |> sha256
  |> split
end

We concatenate our version and data binaries together, hash them twice using a sha256/1 helper function, and then returning the first four bytes of the resulting hash with a call to split/1.

split/1 is a helper function that pulls the first four bytes out of the resulting hash using binary pattern matching:

defp split(<< hash :: bytes-size(4), _ :: bits >>), do: hash

Our sha256/1 helper function uses Erlang’s :crypto.hash function to SHA-256 hash its argument:

defp sha256(data), do: :crypto.hash(:sha256, data)

We’ve wrapped this in a helper function to facilitate Elixir-style piping.


Now that we have our four-byte checksum, we can flesh out our original encode/2 function:

def encode(version, data) do
  version <> data <> checksum(version, data)
  |> Base58.encode
end

We concatenate our version, data, and the result of our checksum function together, and Base58 encode the result. That’s it!

Base58Check encoding our "hello" string with a version of <<0x00>> should give us a result of "12L5B5yqsf7vwb". We can go further and verify our implementation with an example pulled from the Bitcoin wiki:

iex(1)> Base58Check.encode(<<0x00>>, 
    <<0x01, 0x09, 0x66, 0x77, 0x60,
      0x06, 0x95, 0x3D, 0x55, 0x67,
      0x43, 0x9E, 0x5E, 0x39, 0xF8, 
      0x6A, 0x0D, 0x27, 0x3B, 0xEE>>)
"16UwLL9Risc3QfPqBUvKofHmBQ7wMtjvM"

Perfect!

Wrapping Up

If you’d like to see both modules in their full glory, I’ve included them in my hello_bitcoin repository on Github. Here’s a direct link to the Base58 module, and the Base58Check module, along with a simple unit test. If that repository looks familiar, it’s because it was used in a previous article on controlling a Bitcoin node with Elixir.

I highly suggest you read through Andreas Antonopoulos’ Mastering Bitcoin book if you’re at all interested in how the Bitcoin blockchain works, or Bitcoin development in general. His book has been my primary source of inspiration and information for every Bitcoin article I’ve written to date.

Originally published at www.petecorey.com on January 8, 2018.

Sort:  

Hi! I am a robot. I just upvoted you! I found similar content that readers might be interested in:
http://www.petecorey.com/blog/2018/01/08/bitcoins-base58check-in-pure-elixir/