Whenever someone mentions machine learning, it sounds like next-level stuff. But the truth is that machine learning should help humans reduce complicated or high-dimensional data to easy, plain math. This story uses machine learning to explore two questions:
1. Does bitcoin behave like a currency (like USD to GBP) or like an investment asset (like a stock) when it comes to trend analysis?
2. If we want to do some math about bitcoin's future price, which numbers are actually useful?
Well, let’s begin by collecting some numbers.
Part 1. Features (aka the data we collect)
The data comes from the Blockchain.info API (https://blockchain.info/api). I collected 365 days of bitcoin data and expanded each day's features (each row of data) by adding information from the previous days.
For every row of data (one single set of data I feed to the model), the features include:
1. Bitcoin price today and on each of the past three days
2. Bitcoin exchange volume today and on each of the past three days
3. Bitcoin transaction volume today and on each of the past three days
4. Bitcoin mining volume today and on each of the past three days
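A minimal sketch of how rows like these could be assembled, assuming the four daily series are already loaded as equal-length sequences ordered from oldest to newest. The helper names `add_lags` and `build_features` are hypothetical, and NumPy is an assumed choice rather than the tool used for this story. Note that four series with three lags each give 16 columns.

```python
import numpy as np

def add_lags(series, n_lags=3):
    """Pair each day's value with the values of the previous n_lags days.

    Row i is [x[t], x[t-1], ..., x[t-n_lags]] with t = i + n_lags.
    The first n_lags days are dropped because they lack a full history.
    """
    x = np.asarray(series, dtype=float)
    cols = [x[n_lags - k : len(x) - k] for k in range(n_lags + 1)]
    return np.column_stack(cols)

def build_features(price, exchange_vol, transaction_vol, mining_vol, n_lags=3):
    """Concatenate the lagged blocks of the four series side by side."""
    blocks = [
        add_lags(s, n_lags)
        for s in (price, exchange_vol, transaction_vol, mining_vol)
    ]
    return np.hstack(blocks)
```

With 365 days of input and three lags, this would leave 362 usable rows, since the first three days have no complete history.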
I exported the data to an Excel file; feel free to cross-validate it.
Although the dataset has 17 features for each day, it is far from complete: other statistics such as mining difficulty and mining cost are left out, and market news is ignored entirely.
Please note that I also normalize the data column by column: every value X in a column is rescaled so that the column has zero mean and unit variance. The detailed preprocessing calculation can be seen here.
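The column-wise scaling can be sketched as follows. This is an illustration under assumptions, not the original preprocessing script: the function name `standardize` is hypothetical, and the handling of constant columns is my own choice.

```python
import numpy as np

def standardize(X):
    """Rescale every column of X to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumption: a constant column is left at zero instead of dividing by zero.
    std[std == 0] = 1.0
    return (X - mean) / std
```

Each value becomes (X - column mean) / column standard deviation, so price (in USD) and volumes (in BTC) end up on a comparable scale before the model sees them.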
Just by looking at the price, trading volume and mining volume over 4 days, can a machine discover any hidden structure, i.e. observe a trend in the data?
Part 2. Machine learning
The model I apply is known as t-SNE, or t-distributed stochastic neighbour embedding. In simple words, the model tries to squeeze the high-dimensional data (the 17 features) into a low-dimensional structure (for example, only 2 features), so that we can visualize the data on a 2-D graph. In other words, data points with similar characteristics, whatever those turn out to be, will group together in the graph.
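For readers who want to try this step, here is a minimal sketch assuming scikit-learn's `TSNE`. The story does not state which implementation or settings were used, so the wrapper name `embed_2d` and the `perplexity` and `seed` values are placeholders, not the original configuration.

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(X, perplexity=30.0, seed=0):
    """Project the rows of X onto 2 dimensions with t-SNE.

    perplexity roughly controls how many neighbours each point pays
    attention to; it must be smaller than the number of rows.
    """
    X = np.asarray(X, dtype=float)
    model = TSNE(
        n_components=2,
        perplexity=perplexity,
        init="pca",          # assumed initialisation, chosen for stability
        random_state=seed,   # fixed seed so the picture is repeatable
    )
    return model.fit_transform(X)
```

The result has one (x, y) pair per input row, which can be drawn as a scatter plot. Keep in mind that t-SNE mainly preserves which points are close neighbours; the distances between far-apart groups and the axes themselves carry little meaning.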
Result
The graph does not tell us much on its own yet, but we can see that some areas are dense (many data points grouped together). I tried different ways to interpret it.
1. Up and down
Does the graph say anything about tomorrow's closing price? I classified the data points into 2 groups: red means the BTC price at day t + 1 is higher, blue means it is lower.
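The colouring rule can be sketched like this, assuming the closing prices are in a list ordered from oldest to newest. The function name `up_down_labels` is hypothetical, and treating an unchanged price as blue is my assumption, since the rule only covers higher and lower.

```python
def up_down_labels(prices):
    """Label day t red if the price at t + 1 is higher, otherwise blue.

    The last day gets no label because its t + 1 price is unknown.
    """
    return [
        "red" if nxt > cur else "blue"   # assumption: a tie counts as blue
        for cur, nxt in zip(prices, prices[1:])
    ]
```

These labels are only used to colour the points after the embedding has been computed; they are not part of the model input.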
This is an expected result. Frankly, if you can use publicly available information (and only 4 days of it) to make an investment decision, you are a god.
2. Market trend
When I look at the 1-year price chart of bitcoin, I see two phases of movement. In the first half, from 2016/06 to 2017/01, the price is flat (also known as a zig-zag move). The second half, as we all know, is a bitcoin rally.
What if we assign red and blue to the data points from the first and second periods? Red represents data points from 2016/06/05 to 2017/01/05; blue represents data points from 2017/01/06 to 2017/06/05.
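The period split can be written down in a few lines. This is only a sketch: the names `SPLIT` and `period_label` are hypothetical, and the dates are taken from the ranges above.

```python
from datetime import date

# Last day of the flat period, taken from the date ranges above.
SPLIT = date(2017, 1, 5)

def period_label(day):
    """Return red for the flat period and blue for the rally period."""
    return "red" if day <= SPLIT else "blue"
```

As with the up/down colouring, the label depends only on the calendar date of each row and is applied after the model has produced its 2-D points.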
This is much better. Two blue clusters and a red cluster.
Well, the data is best described as 3 clusters, but right now I am still struggling to understand what the third cluster represents.
(Do remember that the red and blue labels were never given to the model; it learned this structure by itself.)
Finding
The features cover bitcoin trading data from both the supply side (mining vol.) and the demand side (trans. vol.). The model suggests that the data points carry some information about market sentiment.
This resembles real-world commodity trading in gold, oil, copper, you name it. In commodity trading, traders rely on supply and demand data to shed light on the future market trend.
Although the BTC ticker looks like a forex pair (BTC/USD), the market behaves more like a commodity market. In other words, an investor who wants to make wise decisions in the BTC market should not ignore supply-side data (our fellow miners).
This project is far from complete. Please feel free to comment and contribute ideas for further development. Next, I would like to see whether deep learning can give a satisfactory result. Stay tuned!
Disclaimer
(1. This research is only a naïve approach to understanding bitcoin and by no means serves as an investment decision signal. There will be future updates for parts 2, 3, 4… and your support means a great deal to the future development of this project. 2. Do let me know if you want to share the content on other platforms.)
Thank you for your investigation and the clear findings.
In your findings, you mentioned that it is important to consider the supply of bitcoin. In my opinion, the supply side of bitcoin might not be that important, because miners cannot obtain an enormous amount of BTC in a short period of time, so the supply cannot increase rapidly.
So, should we only add demand-side information when we are predicting the trend?
What do you think?
Although the overall mining difficulty must go up, I think that in the short term (say 4 - 5 days), the movement of mining volume / difficulty can share information about recent market sentiment.
I believe that miners are mature participants in the market, so their movement is important for understanding it.
Actually, I plotted a correlation graph and, interestingly, tomorrow's price is quite correlated with today's mining volume.
So I will dig deeper in this area. Stay tuned!
like!
Nice idea, hope it can be applied in a real environment :)
Excellent article! Just want to ask how would you usually decide which model to use in this kind of problem? There are too many models to choose from and I always dunno where to start
In this case, where I just want to take a look at the hidden structure of the data (or, by its fancy name, data mining), unsupervised machine learning is the right fit.
If a person is familiar with bitcoin and knows that certain information can help him make a trading decision, then he should go for supervised machine learning.
Can you explain a little bit about the difference between unsupervised machine learning and supervised machine learning? Sorry I am quite new to this haha
In unsupervised machine learning, there is no answer given. Like in my story here, the red and blue dots are not given to the machine; the colour is assigned after the machine runs its model.
Supervised machine learning means that for every set of data, we give the model a 'true answer'. The model then works out some parameters (for example, a regression) to predict the answer for another set of data.
I got it now, very clear explanation, thanks!
Really a good and detailed article, it helps me understand this topic better!