DIY Steemit Statistics with Python: Part 5 - Reputation

in #python7 years ago

Our next order of business: the distribution of reputation scores among active Steemit users.

Before we start, we prepare the workspace as usual (see the previous posts in the series for additional context: 1, 2, 3, 4):

%matplotlib inline
import sqlalchemy as sa, pandas as pd, seaborn as sns, matplotlib.pyplot as plt

sns.set_style()
e = sa.create_engine('mssql+pymssql://steemit:[email protected]/DBSteem')

def sql(query, index_col=None):
    return pd.read_sql(query, e, index_col=index_col)

 
The reputation of each user is updated dynamically with each vote they receive. In principle, we could recover it by processing the TxVotes blockchain table and accounting for all the votes and flags one by one. This would probably be computationally quite heavy, however. Instead, we can rely on the fact that SteemSQL is helpfully tracking various current user-related metrics in a dedicated table for us already.

Indeed, let us take a look at all the tables currently available to us at SteemSQL:

sql("select * from information_schema.tables")

TABLE_CATALOG TABLE_SCHEMA TABLE_NAME TABLE_TYPE
0 DBSteem dbo Reblogs VIEW
1 DBSteem dbo TxWithdraws BASE TABLE
2 DBSteem dbo TxWithdrawVestingRoutes BASE TABLE
3 DBSteem dbo TxWitnessUpdates BASE TABLE
4 DBSteem dbo VOCurationRewards BASE TABLE
5 DBSteem dbo Tokens BASE TABLE
6 DBSteem dbo Blocks BASE TABLE
7 DBSteem dbo Transactions BASE TABLE
8 DBSteem dbo TxCustomsReblogs VIEW
9 DBSteem dbo TxCustomsFollows VIEW
10 DBSteem dbo Comments BASE TABLE
11 DBSteem dbo VOInterests BASE TABLE
12 DBSteem dbo TxAccountCreates BASE TABLE
13 DBSteem dbo VOFillOrders BASE TABLE
14 DBSteem dbo Accounts BASE TABLE
15 DBSteem dbo TxAccountRecovers BASE TABLE
16 DBSteem dbo TxVotes BASE TABLE
17 DBSteem dbo TxAccountUpdates BASE TABLE
18 DBSteem dbo TxAccountWitnessProxies BASE TABLE
19 DBSteem dbo TxAccountWitnessVotes BASE TABLE
20 DBSteem dbo VOFillConvertRequest BASE TABLE
21 DBSteem dbo TxClaimRewardBalances BASE TABLE
22 DBSteem dbo TxComments BASE TABLE
23 DBSteem dbo TxCommentsOptions BASE TABLE
24 DBSteem dbo TxConverts BASE TABLE
25 DBSteem dbo TxCustoms BASE TABLE
26 DBSteem dbo VOAuthorRewards BASE TABLE
27 DBSteem dbo TxDelegateVestingShares BASE TABLE
28 DBSteem dbo TxDeleteComments BASE TABLE
29 DBSteem dbo TxEscrowApproves BASE TABLE
30 DBSteem dbo TxEscrowDisputes BASE TABLE
31 DBSteem dbo TxEscrowReleases BASE TABLE
32 DBSteem dbo VOShutdownWitnesses BASE TABLE
33 DBSteem dbo TxEscrowTransfers BASE TABLE
34 DBSteem dbo VOFillVestingWithdraws BASE TABLE
35 DBSteem dbo TxFeeds BASE TABLE
36 DBSteem dbo TxLimitOrders BASE TABLE
37 DBSteem dbo Followers VIEW
38 DBSteem dbo TxPows BASE TABLE
39 DBSteem dbo TxTransfers BASE TABLE

Scanning through the list, we can guess that Accounts is most probably the table we are interested in:

sql("select top 3 * from Accounts")

 
From the results of this query (omitted) can see that there's a reputation field indeed. It is, however, given as a long, unfamiliar number - not the short value we are used to seeing in the profile page. It turns out we can convert this "raw" value to a "usual" reputation score using the following formula:

reputation.png

Let us test it:

sql("""
select 
    name, 
    reputation as raw_reputation,
    cast(log10(reputation)*9 - 56 as int) as reputation
from Accounts 
where name = 'konstantint'""")

name raw_reputation reputation
0 konstantint 263717261138 46

This worked as expected. Now let us finally count how many accounts are there per each reputation score (dropping the scores below 26, just like @arcange does it):

reputations = sql("""
with Data as 
    (select 
       cast(log10(isnull(reputation, 0))*9 - 56 as int) as Reputation
     from Accounts
     where reputation > 0)

select 
    Reputation, count(*) as Count
from Data 
group by Reputation
having Reputation > 25
order by Reputation desc""", "Reputation")

 
For the sake of variety, we made use of a Common Table Expression in our query this time.

Let us conclude by plotting the results:

reputations.plot.bar(figsize=(8, 4));

output_12_0.png

The source code of this post is also available as a Jupyter notebook.

Next time we will plot the voting power distribution and learn to insert fishy images into our barplots.