[STEEMSQL] A deep analysis of Steemians gratefulness

in #stats8 years ago (edited)

Using SteemSQL, I discovered someone created a post just to say “thanks” to me.

Remember in SteemSQL introduction, I wrote about full text search and said :

Let say I want to know if anyone mentioned me in a post or comment, the following simple query will do the trick
SELECT author, title, body, url FROM TxComments WHERE CONTAINS(body, '@arcange')

I decided to use my favorite analytic toy and put my own piece of advice into practice.

I executed the query and surprisingly discovered that a few weeks ago @lesliestarrohara created a post post fully dedicated to “thanks” me!

WOW … made me smile too, made me proud.

This is exactly why I came to Steemit after reading Steem whitepaper.
I loved the idea of upvote incentive vs flagging. Building something with a positive attitude.

As usual, my little thoughts mill started to produce new questions:

  • Are people on Steemit grateful towards each other?
  • Are we really building a friendly community?
  • Who is grateful?
    and many more…

Couldn’t sleep with all those ‘stupid ’ questions.

Anyway, maybe it’s useless, but I decided to transform it into a good tutorial on how to build a more complex analytic report.

Let’s go …

We first need to build the SQL query

I will start with the question “how many users post or reply with the word “thanks” in it?”
This translate into T-SQL as

SELECT 
    COUNT(*) AS [Thanks]
FROM
    TxComments
WHERE
    CONTAINS(body,'thanks')

This return an awesome number of 89936 “thanks” posted.

Magic predicate

But wait, people do not always say “thanks”. Some say “Thank” or “Thank you” or something similar.
The same query with the word “thank” (singular) return only 66517 posts.

As it won’t be fun to give a try to each word, I will use one of the coolest feature of SQL full text search: FREETEXT() predicate. This predicate searches for values that match the meaning and not just the exact wording of the words in the search condition.

The where clause in the query becomes

...
WHERE
    FREETEXT(body,'thanks')
...

We now get a greater number of 151869 posts where gratitude has been expressed!

Linking and combining

As I want to create a graph to see the evolution over time, I will link the Transactions and Blocks tables to get some date parts from the block’s timestamp. I will also ask the server to compile data using GROUP BY clause to avoid returning all those individuals records:

SELECT 
    MONTH(blocks.timestamp) as [Month],
    DAY(blocks.timestamp) as [Day],
    COUNT(*) AS [Thanks]
FROM
    TxComments
    inner join Transactions on TxComments.tx_id = Transactions.tx_id
    inner join Blocks on Transactions.block_num = Blocks.block_num
WHERE
    FREETEXT(body,'thanks')
GROUP BY
    MONTH(blocks.timestamp),
    DAY(blocks.timestamp)

Cool. Let’s have a look at our result:

Want to know how I created the chart, look here

Hmm, that’s cool but not really instructive.
It looks like if people became suddenly very thankful from middle July (1) up to end July (2) then got more and more bored posting kind words to their fellow Steemians (3) and finally colluded to stop it as of today (4).

What’s wrong and how can we improve our analysis

  • We have to get rid of today’s data because the day is not yet over and that’s why the graph is plunging down at the end.
  • We should compare our “thanks” data with the total number of post per day. Remember there was a big hype around 4th of July and lot of people joined and started to post. Then some moved away.

To get all of this, I will use a more complex query:

SELECT 
    MONTH(blocks.timestamp) as [Month],
    DAY(blocks.timestamp) as [Day],
    COUNT(*) AS [Post],
    0 AS [Thanks]
FROM
    TxComments
    inner join Transactions on TxComments.tx_id = Transactions.tx_id
    inner join Blocks on Transactions.block_num = Blocks.block_num
WHERE
    DATEPART(dy,timestamp) <> DATEPART(dy,GETDATE())
GROUP BY
    MONTH(blocks.timestamp),
    DAY(blocks.timestamp)

UNION SELECT 
    MONTH(blocks.timestamp) as [Month],
    DAY(blocks.timestamp) as [Day],
    0 as [Post],
    COUNT(*) AS [Thanks]
FROM
    TxComments
    inner join Transactions on TxComments.tx_id = Transactions.tx_id
    inner join Blocks on Transactions.block_num = Blocks.block_num
WHERE
    DATEPART(dy,timestamp) <> DATEPART(dy,GETDATE())
    AND FREETEXT(body,'thanks')
GROUP BY
    MONTH(blocks.timestamp),
    DAY(blocks.timestamp),
    txcomments.author

I won't explain all details of the query here and know it can be improved. Feel free to ask me on steemit.chat SteemSQL channel

Let’s look at our new result:

Well, that’s much better!

We now see that despite the moving average trend is going down for posts, the gratefulness trend is now quite steady for weeks, with some nice hiccups.

Here I am reassured. There are plenty of nice people on Steemit who recognize the good work of others and thank them. I'm definitely glad to be part of this community!

The Bonus

Want to know if you’re a pro of "pat in the back" and rank in top 20 chart? Check this:

Ooops, I 'm not in
I told you it was a stupid idea!

Last but not least

Thanks to @lesliestarrohara for inspiring me this post.
Thanks to all of you who upvoted my SteemSQL introduction
Thanks to all the nice people I met here

“Bring light and delight”, Arcange


UPDATE - TOP 20 correction

I check @msjennifer account to say congrats for being top 1 saying "Thanks" to others, but noticed this account is a bot always posting similar comments. Shame on me, I should have verify it before posting.

Congrats to @kaylinart for being the most grateful people on Steemit!

Image: Shutterstock

You like this post, do not forget to upvote or follow me

Sort:  

That's a great analysis. I really like that you showed how you did the programming, too, not just presenting the results. That's they way some of us will learn how to do more with the blockchain, too. Very nice - and thanks!

Thanks for your feedback.
Doing my best to share knowledge.

Haha, this is so cool! I'm pleased that my thank you post in response to your comment served as the inspiration for this analysis of gratitude! It's amazing, the small ways we impact one another in this ecosystem.

@arcange, please do not underestimate "Thank you" while searching for "Thanks" ;-)

Thank you,
Gandalf ("gtg")

That's the purpose of using FREETEXT() instead of CONTAINS()

I noticed that just after writing a comment (when went back to continue reading ... ) Thank you for pointing out. :-)

Lol. You're welcome!
But I suspect you're working hard to hit the top 20 chart with your two comments :p

True, as well as encouraging others to try to do the same :-)

Nice analysis:) I didn't realise I was saying thanks so much!

He he, good boy!

Hi @arcange, really cool analysis and specially amazing job on STEEMSQL ! Thanks for it!

Thanks for your positive feedback!

You are doing a great job! THANK YOU! I read all your articles, but I find it difficult to comment on, I do not know much English. But I'm with you!

Unfortunately, I don't speak Russian (yet).
Russian's "thanks" were not taken into account in the top 20 chart. Maybe you had a spot in.
This give me the idea on creating a tutorial on how to manage with other languages than English.
Спасибо!

If I can be useful, let me know in steemit.chat I will be glad!


He he he, was worth checking: you got pole position in the russian chart !
Congrats! Поздравление!

Потрясающе! Я и не думала!
How thrilling! I didn't know!
How you choose Russians?

I used the following query:

SELECT TOP 20
    Author,
    COUNT(*) AS [Post]
FROM
    TxComments
WHERE
    body LIKE '%Спасибо%'
GROUP BY
    Author
ORDER BY
    COUNT(*) DESC

It is less efficient than using CONTAINS() or FREETEXT() but these can't be used as full text search catalog is based on English locale.
If you need more help, please use steemsql channel on steemit.chat

I did not find any word "спасибо" at linkback-bot-v0

This post has been linked to from another place on Steem.

Learn more about linkback bot v0.4. Upvote if you want the bot to continue posting linkbacks for your posts. Flag if otherwise.

Built by @ontofractal

EXCELLENT RESEARCH.
@papa-pepper is #4!
Thanks!