In the #steemstem community we place a high value on originality and quality in science articles. Yet we often come across plagiarized posts, which are neither original nor of any quality. I am going to give those authors a shortcut, a magic solution that solves one of these issues: originality. :P Sarcasm aside, let us get serious.
Yes, we can use a Markov chain to generate sequences, whether numbers, music or text. In this article, I will introduce the concept of Markov chains in a very simple and non-mathematical form. Then we will delve into it using some examples.
Markov chains in short
[Image Source: wikimedia, License: Public Domain]
The concept of Markov chains was conceived by the Russian mathematician Andrey Markov. At the core of this concept are STATES and TRANSITIONS. TRANSITIONS occur probabilistically. Suppose I have a system with 2 states, which I will name A and B. Now I need to specify the rules by which the system switches between states. Let me specify those below:
- State A stays in A with a probability of, say, 50%.
- State A goes to B with a probability of 50%.
- State B goes to A with a probability of, say, 10%.
- State B stays in B with a probability of 90%.
Let us make this a table:
transition | A | B |
---|---|---|
A | 50% | 50% |
B | 10% | 90% |
Visually we can represent this as below:
You will notice that the entries in each row add up to 100%, which simply means that state A (or B) will move either to A or to B in the next step. This table is called a transition probability matrix. And this is an example of a first-order Markov chain, in which the next state depends only on the current state. The above transition matrix can generate many different sequences (an ensemble). One of the sequences corresponding to the above transition matrix looks like this:
AABBBBBBBBBBABBBBBAAAB...
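To make the sampling step concrete, here is a minimal Python sketch that draws a sequence from the two-state table above. The names `TRANSITIONS` and `generate_sequence` are made up for this illustration, and the seed is only there to make runs repeatable. Each run with a different seed gives another member of the ensemble.

```python
import random

# Transition probabilities copied from the table above.
# Outer key: current state. Inner dict: probability of each next state.
TRANSITIONS = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.1, "B": 0.9},
}

# Every row must add up to 1 (100%), as noted in the text.
for state, row in TRANSITIONS.items():
    assert abs(sum(row.values()) - 1.0) < 1e-9, f"row {state} must sum to 1"


def generate_sequence(start, length, seed=None):
    """Sample a string of `length` states, beginning in state `start`."""
    rng = random.Random(seed)
    state = start
    sequence = [state]
    for _ in range(length - 1):
        row = TRANSITIONS[state]
        next_states = list(row)
        weights = [row[s] for s in next_states]
        # The next state depends only on the current one (first order).
        state = rng.choices(next_states, weights=weights)[0]
        sequence.append(state)
    return "".join(sequence)


print(generate_sequence("A", 24, seed=1))
```

Because B stays in B 90% of the time, long runs of B like the ones in the sample sequence are expected.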
A second-order Markov chain, in which the next state depends on the previous two states, has a transition probability matrix like the one below (for the 2-state case):
transition | A | B |
---|---|---|
AA | 10% | 90% |
AB | 40% | 60% |
BA | 70% | 30% |
BB | 50% | 50% |
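The second-order table can be sampled in the same way. The sketch below assumes that a row label such as AB means "the previous state was A and the current state is B"; the table does not say so explicitly. The names `SECOND_ORDER` and `generate_second_order` are invented for this illustration.

```python
import random

# Rows of the second-order table above.
# Key: the last two states (assumed order: older state first).
# Value: probability of each possible next state.
SECOND_ORDER = {
    ("A", "A"): {"A": 0.1, "B": 0.9},
    ("A", "B"): {"A": 0.4, "B": 0.6},
    ("B", "A"): {"A": 0.7, "B": 0.3},
    ("B", "B"): {"A": 0.5, "B": 0.5},
}


def generate_second_order(first, second, length, seed=None):
    """Sample `length` states, starting from the pair (first, second)."""
    rng = random.Random(seed)
    sequence = [first, second]
    while len(sequence) < length:
        # Look up the row using the two most recent states.
        row = SECOND_ORDER[(sequence[-2], sequence[-1])]
        states = list(row)
        weights = [row[s] for s in states]
        sequence.append(rng.choices(states, weights=weights)[0])
    return "".join(sequence[:length])


print(generate_second_order("A", "B", 24, seed=1))
```

The only change from the first-order case is the lookup key: two states of history instead of one.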
Let us try out some examples
For the examples, I will use the Python code from the GitHub repository markov-text, written by Rob Dawson. The code is released under the MIT license. As input, let me take @suesa's latest story, The Soul Camera (Short Story). I got her permission to use it, by the way. I deleted the image links from the text and saved it to a text file called `suesa`. Once I had downloaded the code to a folder and placed the input file `suesa` there, I ran the following commands from the terminal (I am on a Linux machine with Anaconda Python installed):
- Parsing and generating a model (`parse`)
python markov.py parse suesa_out 2 ./suesa
Here `markov.py` is the Python code in the working folder, `./suesa` is the input text file, `2` stands for a second-order Markov chain, and `parse` is the command argument. This will generate a `suesa_out.db` file.
- Generating text output (`gen`)
Now we need to generate the output text from the generated model `suesa_out.db`.
python markov.py gen suesa_out 30
will generate 30 sentences of @suesa's version of the story, but using Markov-modelling:
Time passed
It's not
”Technically yes
They saw the burden of a physicist and then we try to find a physicist and close my calculations are slowly going down each other and descriptive names
Our souls came from death? Why wouldn't want to die
Built by a solution turned yellow, that means something that _souls_ seem to pick either incredibly nerdy or super boring and explodes
The souls to have finally grasp the world hunger is this? An alien creature raises its … that life didn't expect you got rid of your mental energy
God
Luckily, their god
It's not a second picture taken with this being
I'm one of the alien creature rolls its … it did, there for eternity
The souls came from the evolutionary process
It's not a second picture taken with the real truth behind life
In the end, I open my mind races to them
” I didn't expect you know how much power one of the debate
Between quantum physics and more efficient
When you out
It was one side begged to humanity since forever: What is going down on them with this being exists or super boring and then, the voice said
Nothing at my eyes and selfish and why only humans differently
It's … some traces of years, these words had occupied scientists from death? Things change tends to finally grasp the soul camera
It's … Except for 2 years to in a weapon
They saw the soul captured, but we try to green, then to die in animals, taking a new kind of a ticket to get published and selfish and more efficient
After a way to protest but we try to imply something that is Stefanie and … Except for destroying your soul camera didn't expect you dumb enough damage
People had thought of them”, echoes a solution turned yellow, that were special
”You're dead”, the solution in which explains the solution in its … what?” Finally, a white light that it out
Nothing at once, with those who still hope There's still have survived the name is not easy, especially with this being exists or super boring and genetic engineering, nothing
In this being
The souls came from all over the world hunger is sure if this heaven? Is the first
Built by a picture with those who stand with the alien creature raises its … different
There's still hope, that it happened all over the soul wouldn't have any effect
Looks pretty gibberish, right?
By the way, here the words in @suesa's writing are the states, like A and B in our examples. That was a two-state machine; this is a multiple-state Markov machine.
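To show what "words as states" can look like in code, here is a small sketch of a word-level Markov text generator. It is not the markov-text implementation by Rob Dawson, whose internals are not described here; `build_model`, `generate` and the sample sentence are all hypothetical names and data chosen for this illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each run of `order` consecutive words to the words that followed it.

    A follower that occurs several times is stored several times, so picking
    uniformly from the list reproduces the observed transition frequencies.
    """
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        model[context].append(words[i + order])
    return model


def generate(model, length=20, seed=None):
    """Generate up to `length` words by walking the model from a random context."""
    rng = random.Random(seed)
    context = rng.choice(sorted(model))  # sorted() keeps seeded runs repeatable
    order = len(context)
    output = list(context)
    while len(output) < length:
        followers = model.get(tuple(output[-order:]))
        if not followers:  # this context only appeared at the very end of the text
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# Hypothetical sample text, not taken from the story.
sample = "the cat sat on the mat and the cat ran to the mat and the dog sat on the cat"
model = build_model(sample, order=2)
print(generate(model, length=15, seed=0))
```

With a real story as input, the number of states is the number of distinct word combinations in the text, which is why this is a multiple-state machine rather than a two-state one.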
Let us improve the output. How? You may have guessed by now that a second-order Markov chain does better than a first-order one, because it captures more structure in the sequences. Let us parse and generate with a 3rd-order Markov chain. The commands are below:
python markov.py parse suesa_3rd_order 3 ./suesa
python markov.py gen suesa_3rd_order 30
Some theorize that souls are a tool to control us, to make space travel more efficient
Who wouldn't want to be so easy to answer
”You're the first
It's not very humane how humans tend to behave when they still have their souls
” I look around
There is nothing I could reasonably do to avoid that
”Nothing I can do about it now … Except for destroying your body for good
”And under normal circumstances, I would now be able to in the past few decades
But time is not linear
” I look around
Turns out that _souls_ seem to stunt our growth
Nothing at all
”Nothing I can do about it now … Except for destroying your body for good
There's still There's Nothing
Scientists tend to pick either incredibly nerdy or super boring and descriptive names
” ”Your … what?” Finally, a figure steps into my field of vision
None of us is sure if this being exists or not
Free to finally grasp the real truth behind life
Two years, in which I've solved more mysteries than scientists have been able to harvest what you call your soul
You died, and then we pulled you out
I'm one of the questions that had occupied scientists from all over the world and other humans differently
Time passed
Time passed
In the end, I give up and close my eyes
”Didn't I die in an explosion?” I ask
”Didn't I die in an explosion?” I ask
It's a new kind of fuel I devised to make sure humans never reach their full potential
Scientists tend to behave when they still have their souls
People had tried to live lives that were rejected by every larger journal
The other side begged to have finally solved one of them with the invention of the soul camera
You can see some repetitions in the text. Can someone guess why? Take a look at the original article: do you see repetition in her story? The output captures this statistically, although in a garbled way.
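One further contributing factor, offered here as a guess rather than a measured result: with a short input text, a longer context often has only one word that ever followed it, so the generator has no choice and copies the source. The sketch below measures that effect on any text. The function name `forced_fraction` and the sample sentence are hypothetical, and this is not how markov-text itself works internally.

```python
from collections import defaultdict


def forced_fraction(text, order):
    """Fraction of `order`-word contexts that have exactly one distinct follower."""
    words = text.split()
    followers = defaultdict(set)
    for i in range(len(words) - order):
        followers[tuple(words[i:i + order])].add(words[i + order])
    if not followers:
        return 0.0
    forced = sum(1 for nxt in followers.values() if len(nxt) == 1)
    return forced / len(followers)


# Hypothetical sample text, not taken from the story.
sample = "time passed and the soul waited and time passed and the soul left"
for order in (1, 2, 3):
    print(order, round(forced_fraction(sample, order), 2))
```

Running this on the real input file for orders 1 to 3 would show how much freedom the chain actually has at each order.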
So that was some fun and a little bit of math. The text generation in this article relied entirely on existing open-source code rather than a program of my own. Luckily, there are cool open-source programs available, which make the learning process enjoyable. Bye for now. Happy Markoving!
Reference (for further reading)
- Probabilistic Graphical Models: Principles and Techniques by Daphne Koller, Nir Friedman
- Mastering Probabilistic Graphical Models Using Python by Ankur Ankan, Abinash Panda
Follow me @dexterdev
I remember using Hidden Markov Models while I was studying for my Master's in Bioinformatics! What good memories this post has brought back! By the way, this application of Markov chains is very funny!
Cheers!
HMM biogeeks unite. I've never had to directly touch them in my field, but they're part of a lot of the underlying toolset.
Aeons and ages ago, I remember playing with the Dissociated Press Perl module to do text generation like this.
I never used Perl. But I have heard that Perl was great for string processing etc.
Yeah, it was the one really sane option for strings back in the day when your other choices were pre-standards C/C++ or, well, pretty much that.
Happy to know that you liked it. Yes, HMMs are key in sequence analysis etc. in bioinformatics. Did you follow Jaynes' book?
Are you referring to "Probability Theory: The Logic of Science" by E. T. Jaynes? I have just googled it because I didn't know this author, and the book seems interesting. During the lectures we mainly used slideshows made by our teacher and some papers on HMMs related to bioinformatics.
Yes I meant that book. 😃