Generating a word cloud from all your written words

in #beem, 6 years ago (edited)

A word cloud is a visual representation of text data in which the font size of each word depends on its frequency in the text sample.
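To make the frequency-to-size idea concrete, here is a minimal sketch that counts words and scales the font size linearly with the count. The function name `font_sizes` and the linear scaling are my own simplification for illustration; the word_cloud library uses a more elaborate layout and scaling.

```python
from collections import Counter


def font_sizes(text, min_size=6, max_size=120):
    """Map each word to a font size that grows linearly with its count."""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * n / top
        for word, n in counts.items()
    }


sizes = font_sizes("steem beem steem python steem beem")
# Print the words from largest to smallest font size.
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {size:.0f}")
```

The most frequent word gets `max_size`, and every other word gets a size proportional to its share of that top count.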

Wordcloud of an account

I'm using the Python library word_cloud to generate a word cloud image from all posts and comments an account has written.
Here is the result for my account:

wordcloud.png

All links are removed from the included text. I also use the predefined stopword list to remove common words such as "is" and "and".
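The two cleaning steps can be tried on a short string. This is only a sketch: the regular expression is the one used in the script, while `EXAMPLE_STOPWORDS` is a tiny stand-in for the library's `STOPWORDS` set and the helper names are made up for this example. In the real script the stopwords are applied inside `WordCloud.generate()`, not by hand.

```python
import re

# Same pattern as in the script: a URL runs from "http(s)://" to the end of its line.
URL_PATTERN = re.compile(r"https?://.*[\r\n]*", flags=re.MULTILINE)

# Tiny stand-in for wordcloud.STOPWORDS so the example has no dependencies.
EXAMPLE_STOPWORDS = {"is", "and", "the", "a"}


def strip_links(text):
    # Put every token on its own line so that only the URL token is removed,
    # not the rest of the sentence that follows it.
    one_token_per_line = text.replace(" ", "\n")
    return URL_PATTERN.sub("", one_token_per_line).replace("\n", " ")


def remove_stopwords(text, stopwords=EXAMPLE_STOPWORDS):
    return [word for word in text.split() if word.lower() not in stopwords]


sample = "beem is documented at https://example.com/page and the code is open"
cleaned = strip_links(sample)
words = remove_stopwords(cleaned)
print(words)
```

Splitting on spaces first is what keeps the words after a link: without it, `.*` would swallow everything up to the end of the line.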

To improve speed, only the first version of each post or comment is taken into account, and all later edits are skipped.
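Skipping edits comes down to remembering which permlinks have already been seen. The sketch below shows that logic on made-up history entries; the dictionaries only mimic the `author`, `permlink` and `body` fields the script reads, and `first_versions` is a hypothetical helper, not part of beem.

```python
def first_versions(ops, author):
    """Concatenate the body of the first version of each post or comment."""
    seen = set()
    parts = []
    for op in ops:
        if op["author"] != author:
            continue  # a post or reply written by someone else
        if op["permlink"] in seen:
            continue  # a later edit of something already collected
        seen.add(op["permlink"])
        parts.append(op["body"])
    return " ".join(parts)


# Hypothetical history entries, oldest first.
history = [
    {"author": "alice", "permlink": "hello", "body": "first draft"},
    {"author": "bob", "permlink": "re-hello", "body": "nice post"},
    {"author": "alice", "permlink": "hello", "body": "edited draft"},
    {"author": "alice", "permlink": "second", "body": "another post"},
]
result = first_versions(history, "alice")
print(result)
```

A set is used for the seen permlinks because membership tests stay fast even for accounts with thousands of posts and comments.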

#!/usr/bin/python
from beem import Steem
from beem.account import Account
from beem.comment import Comment
from beem.nodelist import NodeList
import six
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
import re

if __name__ == "__main__":
    nodelist = NodeList()
    nodelist.update_nodes()
    stm = Steem(node=nodelist.get_nodes())
    if six.PY2:
        authorperm = raw_input("account / authorperm:")
    else:
        authorperm = input("account / authorperm:")
    text = ""
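    try:
        # Treat the input as an authorperm ("@author/permlink") first.
        comment = Comment(authorperm, steem_instance=stm)
        text = comment.body
    except Exception:
        # Not a post or comment: treat the input as an account name.
        account = Account(authorperm, steem_instance=stm)
        seen_permlinks = set()
        for h in account.history(only_ops=["comment"]):
            # Skip later edits: only the first version of each permlink counts.
            if h["permlink"] in seen_permlinks:
                continue
            # Skip posts and replies written by other accounts.
            if h["author"] != account["name"]:
                continue
            seen_permlinks.add(h["permlink"])
            # Separate bodies so adjacent posts do not merge into one word.
            text += h["body"] + "\n"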

    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text.replace(" ", "\n"), flags=re.MULTILINE).replace("\n", " ")
    stopwords = set(STOPWORDS)
    
    wordcloud = WordCloud(max_font_size=120, width=1280, height=800, min_font_size=6, stopwords=stopwords).generate(text)
    plt.figure()
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    # plt.show()
    plt.savefig("wordcloud.png", dpi=300)
    

Save the script as plot_wordcloud.py and run it with:

python plot_wordcloud.py

The script asks for a Steem account name (or the authorperm of a post) and stores the image wordcloud.png in the current working directory.
The wordcloud package needs to be installed:

pip install wordcloud

The WordCloud class has the following options:

class WordCloud(self, font_path=None, width=400, height=200, margin=2, ranks_only=None, prefer_horizontal=.9, mask=None, scale=1, color_func=None, max_words=200, min_font_size=4, stopwords=None, random_state=None, background_color='black', max_font_size=None, font_step=1, mode="RGB", relative_scaling='auto', regexp=None, collocations=True, colormap=None, normalize_plurals=True, contour_width=0, contour_color='black', repeat=False)

These options can be changed to modify the word cloud image.
You can find more information in the API reference: https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html
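As a rough sketch of how the options might be varied, the snippet below keeps the values from the script as defaults and overrides a few keyword arguments taken from the signature above. `DEFAULT_OPTIONS`, `build_options` and `render` are names invented for this example. `render` imports wordcloud only when it is called, so the option handling can be tried without the package installed.

```python
# The values used in the script above, kept as defaults.
DEFAULT_OPTIONS = {
    "width": 1280,
    "height": 800,
    "max_font_size": 120,
    "min_font_size": 6,
}


def build_options(**overrides):
    """Return the default options with any keyword overrides applied."""
    options = dict(DEFAULT_OPTIONS)
    options.update(overrides)
    return options


def render(text, filename, **overrides):
    """Generate a word cloud from text and write it to an image file."""
    from wordcloud import WordCloud  # requires: pip install wordcloud

    WordCloud(**build_options(**overrides)).generate(text).to_file(filename)


# A light variant: white background, fewer words, no two-word collocations.
light = build_options(background_color="white", max_words=100, collocations=False)
print(light)
```

Calling `render(text, "wordcloud_light.png", **light)` with the cleaned text from the script would then write the modified image.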

Wordcloud of a post

The script can also be used to create a wordcloud for a post.
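The script decides between a post and an account by trying to load a Comment first and falling back to an Account. A possible alternative is to look at the input itself, assuming an authorperm has the form "@author/permlink". `parse_target` is a hypothetical helper and not part of beem:

```python
def parse_target(user_input):
    """Return ("post", author, permlink) or ("account", name, None)."""
    value = user_input.strip().lstrip("@")
    if "/" in value:
        # Assumed authorperm form: "@author/permlink".
        author, permlink = value.split("/", 1)
        return ("post", author, permlink)
    return ("account", value, None)


print(parse_target("@steemitblog/engineering-update"))
print(parse_target("holger80"))
```

This avoids a failed network lookup when an account name is entered, but it only checks the shape of the input, not whether the post or account exists.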

Here is the result for https://hive.blog/steem/@steemitblog/engineering-update-cost-reductions-rocksdb-mira-condenser-split

wordcloud.png

Version with image

#!/usr/bin/python
from beem import Steem
from beem.comment import Comment
from beem.account import Account
from beem.nodelist import NodeList
import numpy as np
import six
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
from PIL import Image
import os
from os import path
import re

if __name__ == "__main__":
    nodelist = NodeList()
    nodelist.update_nodes()
    stm = Steem(node=nodelist.get_nodes())
    if six.PY2:
        authorperm = raw_input("account / authorperm:")
    else:
        authorperm = input("account / authorperm:")
    text = ""
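    try:
        # Treat the input as an authorperm ("@author/permlink") first.
        comment = Comment(authorperm, steem_instance=stm)
        text = comment.body
    except Exception:
        # Not a post or comment: treat the input as an account name.
        account = Account(authorperm, steem_instance=stm)
        seen_permlinks = set()
        for h in account.history(only_ops=["comment"]):
            # Skip later edits: only the first version of each permlink counts.
            if h["permlink"] in seen_permlinks:
                continue
            # Skip posts and replies written by other accounts.
            if h["author"] != account["name"]:
                continue
            seen_permlinks.add(h["permlink"])
            # Separate bodies so adjacent posts do not merge into one word.
            text += h["body"] + "\n"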
    
    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text.replace(" ", "\n"), flags=re.MULTILINE).replace("\n", " ")
    stopwords = set(STOPWORDS)
    
    d = path.dirname(__file__) if "__file__" in locals() else os.getcwd()
    coloring = np.array(Image.open(path.join(d, "Steem_Symbol_Gradient.png")))
                
    wc = WordCloud(background_color="white", mask=coloring, max_font_size=80, max_words=1000,
                          stopwords=stopwords, random_state=42)
    wc.generate(text)
    
    image_colors = ImageColorGenerator(coloring)    
    
    plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
    plt.axis("off")
    plt.savefig("wordcloud_steem.png", dpi=500)
    

To run this script, Steem_Symbol_Gradient.png from Steem-Logos-and-Usage-Guide.zip needs to be copied into the same directory as the script.
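If the image is missing, Image.open fails with a fairly terse error. A small check like the following could give a clearer message. It is only a sketch: `mask_path` is a hypothetical helper, and the demo uses a temporary directory in place of the real script folder.

```python
import tempfile
from pathlib import Path


def mask_path(script_file, name="Steem_Symbol_Gradient.png"):
    """Return the path of the mask image next to the script, or raise."""
    candidate = Path(script_file).resolve().parent / name
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Copy {name} into {candidate.parent} before running the script"
        )
    return candidate


# Demo in a temporary directory standing in for the script's folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_script = Path(tmp) / "plot_wordcloud.py"
    try:
        mask_path(fake_script)
    except FileNotFoundError as err:
        print("missing:", err)
    (Path(tmp) / "Steem_Symbol_Gradient.png").touch()
    found = mask_path(fake_script).name
    print("found:", found)
```

In the script itself, the call would be `mask_path(__file__)` before opening the image.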

wordcloud_steem.png


I will generate a personal word cloud for your account or for a post. Just ask for it in a comment and specify whether you want the big Steemit-logo-shaped version or the small version :).

Could be useful information...are we coming across as we intend? What kind of impression are we making on our audience? This 'cloud' may make us more effective bloggers. People rarely pay attention to specifics. They are moved by overall effect--a word cloud might give an objective clue as to what that effect is.
As usual, I'm impressed by what you can do, and what you accomplish. Resteeming :)

Here is your personal wordcloud:
wordcloud.png

Thank you! I never would have asked, but love it! It's kind of an inkblot impression of my blogging personality. And maybe a tiny bit revealing of my worldview. Only words that surprise me in there are "always" and "never". Didn't realize I was so categorical :))
Great fun! Thanks again.

Now this is a VERY interesting tool for a profiler, isn't it?

Laughter! Also, a tricky one.

I was dealing with this idea last year and was instantly fascinated with the word cloud result. The one I was using was not as good as @holger80's and showed only the results of recent blogging.

This gives me food for an article. I may write something about it. Thank you for resteeming this post as I would have otherwise forgotten about it.

I just posted on Twitter about it. Fascinating, isn't it? Hope you do write a blog. Offer a perspective on what a word cloud reveals to us, and to others.

It's indeed fascinating. Today I made a longer trip with my man and we talked during the car drive. I told him about the word cloud and pointed out that I was not sure how to evaluate my result. From the first sight it seemed somehow indifferent and even boring. We talked about the algorithms and how one could alter the code in order to get a more intelligent result. But this led us to the old problem of a qualitative evaluation which can never really lead to a unified perspective. Don't know how to explain right now.

Objects interest me to the extent as to have a contrast to the subject. So this word cloud thing could be one object serving it.

I just saw yours. Interesting challenge with the two languages. In a way, that does reveal a lot, that you are a kind of bridge between cultures. You don't recognize barriers or allow them to stop you. So there you have 'aber' and 'und' floating around. It's lovely, to me. The world I wish we lived in.

Can I use this, to promote the Steem blockchain on my blog, Twitter and anywhere else I can think of?

Of course, you can use it as you like.

Thank you! My first promo, based on the word cloud.

please do one for me :)

Here is your personal word cloud :)
wordcloud.png

@reggaemuffin: So, I give you an unasked for psychological profile :) LOL:

You seem to be a person very devoted to this STEEMIT platform
with a strong WILL to bring it forward
You REALLY THINK that CHANGE is NEEDed and
what ONE WANTs is going to be supported through the PEOPLE
You are THANKful for the GOOD THINGs
and hard WORK is the way to go
CODE is a key to make us SEE
as this is what WITNESSES also do
HELPing each other means also coming together on DISCORD (By the way, what a paradox ;-)
and one day the PAY CHECK will be there for MANY of us.

I couldn't resist. As a hobby psychologist and a social worker and consultant in real life this is much too tempting. You can get satisfaction through my word cloud which holger80 hopefully will provide me with:)

Have a great day!

Wow will try it soon.

I would like to have my result, too.
VERY interesting. I stumbled over it last year here: http://www.steemreports.com/steem-word-clouds/?account=erh.germany

but it doesn't count all posts like yours does, only the recent ones. I would really like to see mine, as I have been on the platform for more than 1.5 years :)

Thank you in advance!

This link doesn't work, it says "page not found": https://steemit.com/steem/@steemitblog/engineering-update-cost-reductions-rocksdb-mira-condenser-split:

You have to remove the ":"

Great!
I'm curious: could you produce my personal word-cloud in both formats?

Many thanks!

Awesome, could you generate a small one for me, please? :D
Oh, I guess the "common words" will dominate a wordcloud based on a German blog, right?