Analysis of the Public Profile Settings fields

in #utopian-io · 7 years ago (edited)

ANALYSIS.jpg

Link to the GitHub repository
https://github.com/steemit/steem

In my second contribution in the analysis category, I focus on the Public Profile Settings on Steem/Steemit. When an account is created, the user name is established, but there are additional fields that can be filled in to customize the account, and they can be updated as many times as desired.

These variables or fields are:

  • profile_image URL (referred to below as Image)
  • cover_image URL (referred to below as Cover)
  • Display Name (referred to below as Name)
  • About
  • Location
  • Website

As we will see, these fields are used in very different ways. Some accounts have never updated them, others have updated them very frequently, and the rest fall somewhere between these two extremes.

In this analysis, I try to quantify and visualize how these fields are used: the number of accounts that use them and the way they do it.

1. ACQUIRING THE RAW DATA

I have used the Steem database of the Steem Blockchain Database Service (SBDS) of PRIVEX.IO (sbds.privex.io). On the date that I obtained the data for this analysis, the most recent records were dated April 27th, 2018.

The raw data used to perform this analysis is contained in the table TxAccountUpdates.

Captura de pantalla 2018-05-17 a las 9.55.19.jpg

In particular, the data about the Public Profile Settings (PPS) can be found in the json_metadata field (type varchar), which looks as follows:

EXAMPLE:

{"profile":{"name":"ATX Trading","about":"Technical Swing Trader","location":"Austin, TX","website":"https://www.atxtradingco.com","profile_image":"https://ibb.co/eCQtSk"}}

If an account has never updated any of its six PPS fields, it will not appear in this table. Each time an account performs an update of one or more of its fields in the PPS, a record will appear in this table with the updated values.

The Steemit frontend did not begin to include data from the PPS fields until August 2016, so the first records in the table do not include values of the PPS fields, and therefore the json_metadata field appears empty or is used for another purpose. These records have been filtered out in this analysis.

Therefore, the query is:

SELECT
sbds_tx_account_updates.timestamp,
sbds_tx_account_updates.account,
sbds_tx_account_updates.json_metadata
FROM sbds_tx_account_updates
WHERE
sbds_tx_account_updates.json_metadata LIKE "%profile%"
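
The rows returned by this query still carry the profile as a JSON string, so each one has to be parsed before the six fields can be counted. The following is a minimal sketch of that step, not the procedure actually used for this post. The key names profile_image and cover_image follow the example and the field list above; the helper name parse_pps and the decision to treat missing keys as empty strings are assumptions.

```python
import json

# The six PPS keys as they appear inside json_metadata["profile"].
PPS_FIELDS = ("profile_image", "cover_image", "name", "about", "location", "website")


def parse_pps(json_metadata):
    """Return a dict with the six PPS fields, or None if there is no usable profile.

    Missing or non-string values are returned as empty strings (an assumption).
    """
    try:
        data = json.loads(json_metadata)
    except (TypeError, ValueError):
        # Empty, NULL or malformed json_metadata: nothing to extract.
        return None
    if not isinstance(data, dict):
        return None
    profile = data.get("profile")
    if not isinstance(profile, dict):
        # json_metadata used for another purpose.
        return None
    parsed = {}
    for field in PPS_FIELDS:
        value = profile.get(field)
        parsed[field] = value.strip() if isinstance(value, str) else ""
    return parsed


# The example record shown earlier in this section.
row = (
    '{"profile":{"name":"ATX Trading","about":"Technical Swing Trader",'
    '"location":"Austin, TX","website":"https://www.atxtradingco.com",'
    '"profile_image":"https://ibb.co/eCQtSk"}}'
)
print(parse_pps(row))
```

Rows for which parse_pps returns None correspond to the records that are filtered out of the analysis.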

2. ANALYSIS

2.1 How many accounts have made at least one update in their PPS?

At the time of obtaining the data for this analysis, the number of accounts in Steem/Steemit was 927,444.

The result of the query shows that there have been 1,567,926 updates (of any of the six PPS fields) made by 293,007 accounts. This implies that 634,437 accounts have never updated the values of their PPS.

Therefore, measured in percentages, 31.59% of the accounts have made at least one update and 68.41% of the accounts have never made an update.

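| Accounts | Num. accounts | Num. updates | AVRG | % |
|---|---|---|---|---|
| With at least one update | 293,007 | 1,567,926 | 5.35 | 31.59% |
| Never updated | 634,437 | 0 | 0 | 68.41% |
| TOTAL | 927,444 | 1,567,926 | 1.69 | 100% |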
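
The percentages and averages can be recomputed from the three counts given in the text (total accounts, accounts with at least one update, and total updates). The snippet below is only an arithmetic check; the variable names are mine.

```python
# Counts taken from the text above.
total_accounts = 927_444
updating_accounts = 293_007
total_updates = 1_567_926

never_updated = total_accounts - updating_accounts

# Share of accounts with and without at least one PPS update.
pct_updated = 100 * updating_accounts / total_accounts
pct_never = 100 * never_updated / total_accounts

# Average number of updates per updating account and per account overall.
avg_per_updating = total_updates / updating_accounts
avg_overall = total_updates / total_accounts

print(f"never updated:        {never_updated:,}")
print(f"updated at least once: {pct_updated:.2f}%")
print(f"never updated:         {pct_never:.2f}%")
print(f"avg per updating acct: {avg_per_updating:.2f}")
print(f"avg per account:       {avg_overall:.2f}")
```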

output_U09p5v.gif

This percentage, which to me personally seems small, can give an approximate idea of the true size of Steem/Steemit. Although an account can be active without ever updating its profile fields, one would expect an account that creates posts, comments or votes to have updated at least the profile image.

That seems to suggest that many accounts are inactive, are used only as a wallet, or are dormant, possibly because many people own more than one account.

Now that the number of accounts is close to reaching the symbolic value of 1 million, this makes me think that the actual number of unique people does not even reach a third of that value.

This is a conclusion that I draw from this data, although there are possibly better ways to approximate this value, for example using IP access information.

2.2 Evolution in time of the updates of the PPS fields

Grouping the data by month over the period from April 2016 to April 2018, I have obtained the following table of values, where I have added a column for the number of new accounts created in each month and another for the accumulated number of accounts in each month. Additionally, I have created two percentage ratios.

  • INDX1 = Num. Updates / Num. New Accounts created
  • INDX2 = Num. Updates / Num. Accumulated Accounts
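
As an illustration of how the two ratios are built, here is a small sketch. The monthly counts are hypothetical and are not the values of the table; expressing both ratios as percentages (multiplying by 100) is my reading of "percentage ratios", and the function name is made up for this example.

```python
from itertools import accumulate

# Hypothetical monthly counts, for illustration only.
months = ["2018-01", "2018-02", "2018-03"]
updates = [1200, 900, 1500]
new_accounts = [1000, 600, 2000]
accounts_before = 10_000  # hypothetical accounts existing before the first month

# Accumulated number of accounts at the end of each month.
accumulated = [accounts_before + s for s in accumulate(new_accounts)]


def monthly_ratios(updates, new_accounts, accumulated):
    """Return a list of (INDX1, INDX2) pairs, both expressed as percentages."""
    rows = []
    for u, n, acc in zip(updates, new_accounts, accumulated):
        indx1 = 100 * u / n if n else None      # updates vs. new accounts
        indx2 = 100 * u / acc if acc else None  # updates vs. accumulated accounts
        rows.append((indx1, indx2))
    return rows


for month, (indx1, indx2) in zip(months, monthly_ratios(updates, new_accounts, accumulated)):
    print(f"{month}  INDX1={indx1:.1f}%  INDX2={indx2:.2f}%")
```

Because the denominator of INDX2 only grows, it reacts much less to a single busy month than INDX1 does.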

TABLE1.jpg

To visually analyze the data of that table I have created the following graphs where

  • UPDATES = Number of Updates (at least one field)
  • ACC. CRTN = Number of new accounts created
  • SUM ACC = Accumulated Number of Accounts

output_Nzv6VQ.gif

It can be observed that, in general, the number of PPS updates over time follows the same shape as the number of new accounts created. However, there are two very different stages.

In the first stage, until 2017-04, the number of updates was lower than the number of accounts created. This was because it was not yet possible to fill in any of these fields or, for a period of time, only the IMAGE field could be filled in.

In the second stage, from April-May 2017, the use of these fields grew in popularity as more users became aware of their existence. Between JAN-2018 and MAR-2018 the proportion of updates was much greater, only to fall again in the last weeks.

The correlation between the number of PPS updates and the creation of new accounts is easily explained: it is logical that these fields are completed shortly after the creation of an account, although they can be updated again later, as we will see.

DOS.jpg

In this graph, where the number of PPS updates is compared with the accumulated number of accounts, it can be observed that, on the contrary, no such correlation exists.

In other words, a high percentage of new accounts update their PPS quickly, while those that do it later do so in a pattern that is more spread out over time.

INDX12.jpg

In this third chart, where the Ratios INDX1 and INDX2 are represented, the same effect can be seen more clearly.

  • INDX1 presents very steep slopes and follows the rhythm of new account creation.
  • INDX2 shows gentle slopes and does not follow that pattern.

2.3 What is the percentage of use for each field of the PPS?

For those accounts that have filled in at least one field of the PPS (31.59%), we want to know how much each particular field has been used.

I have filtered the data by unique account, in such a way that only the most recent state of the PPS fields appears.

I have calculated the number of empty records for each field relative to the total, obtaining the following results, sorted by percentage of use in decreasing order.
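
The two steps just described (keep the most recent record per account, then count non-empty values per field) can be sketched as follows. This is an illustration under assumptions, not the exact procedure I followed in the database tool: it assumes each update record carries the whole profile, that timestamps sort chronologically as strings, and the function names and sample records are invented.

```python
def latest_state(records):
    """records: iterable of (timestamp, account, profile_dict).

    Returns {account: profile_dict} with the most recent profile of each account.
    """
    latest = {}
    # Process in chronological order so later records overwrite earlier ones.
    for timestamp, account, profile in sorted(records, key=lambda r: r[0]):
        latest[account] = profile
    return latest


def filled_percentages(latest, fields):
    """Percentage of accounts whose latest profile has a non-empty value per field."""
    total = len(latest)
    result = {}
    for field in fields:
        filled = sum(1 for p in latest.values() if (p.get(field) or "").strip())
        result[field] = 100 * filled / total if total else 0.0
    return result


# Hypothetical records, for illustration only.
records = [
    ("2018-01-01T00:00:00", "alice", {"name": "Alice", "about": ""}),
    ("2018-02-01T00:00:00", "alice", {"name": "Alice", "about": "Photographer"}),
    ("2018-01-15T00:00:00", "bob", {"name": "", "about": ""}),
]
latest = latest_state(records)
print(filled_percentages(latest, ["name", "about"]))
```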

| FIELD | FILLED IN | EMPTY |
|---|---|---|
| Image | 81.71% | 18.29% |
| Name | 71.69% | 28.31% |
| Location | 59.76% | 40.24% |
| About | 54.12% | 45.88% |
| Cover | 42.50% | 57.50% |
| Website | 26.78% | 73.22% |

porcentag.jpg

The vast majority of the 293,007 accounts have updated their profile IMAGE (81.71%) and their NAME field (71.69%). A little more than half of the accounts have updated the LOCATION (59.76%) and ABOUT (54.12%) fields. The COVER field (42.50%) and finally the WEBSITE field (26.78%) are used less often, although still to a relevant degree.

In general, we see that users make great use of these features. They give most importance to the IMAGE field, which is the most visible one when a post or a comment is made, although the possibility of customizing their profiles in more detail is also widely used.

2.4 What are the most popular values for each field in the PPS?

At first I had no intention of analyzing the content of the fields, but I realized that some interesting results can be extracted from these values. I present below the most used values for each of the fields.

LOCATION

Captura de pantalla 2018-05-16 a las 15.37.09.png

It should be noted that, since these fields accept any text, some users enter their country, others their region, others their city (some even choose "Earth" as their location). A more accurate table would require merging related values, for example USA + United States, Aceh + Indonesia, UK + London.

But I am not trying to do an exhaustive study of the location of Steem/Steemit users, only an overview.

In this list it is worth highlighting the first positions of Venezuela and Germany, which are representative of that "strange mixture" of countries with such great contrasts from the economic and social point of view.

It could be said that the Steem/Steemit ecosystem is made up of people united by links that have nothing to do with the typical (perhaps outdated) political or economic borders between countries. From my point of view, this is a great strength.
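
For anyone who wants to go further, merging equivalent location values could look roughly like the sketch below. I did not do this in the analysis. The alias table is hypothetical and tiny (it only covers the pairs mentioned above), and a real one would need many more entries.

```python
from collections import Counter

# Hypothetical alias table; keys are lower-cased raw values.
ALIASES = {
    "usa": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "london": "United Kingdom",
    "aceh": "Indonesia",
    "indonesia": "Indonesia",
}


def normalize_location(raw):
    """Map a free-text location to a canonical name, or keep it unchanged if unknown."""
    cleaned = raw.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def count_locations(values):
    """Count normalized locations, skipping empty values."""
    return Counter(normalize_location(v) for v in values if v.strip())


# Hypothetical raw values, for illustration only.
sample = ["USA", "United States", " usa ", "Aceh", "Indonesia", "London", "UK", "Earth", ""]
print(count_locations(sample).most_common())
```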

ABOUT + NAME + WEBSITE fields

I have summarized in a table the most popular values for these three fields of the PPS.

Captura de pantalla 2018-05-16 a las 15.37.00.png

To give an overview of each of these fields I have used a semi-joking ;) summary phrase for each one.

  • ABOUT field

Well, it seems that there are many software engineering students with an artistic interest in photography who like music and say HELLO to their friends.

  • NAME field

He is a boy with a short biblical name who owns bitcoins.

  • WEBSITE field

Everybody has a past on Facebook.

IMAGE

As for the profile image and the cover, it is a story that tells itself.

IMAGEs.jpg

Summary phrase:

Is this place an antisystem social network that uses cryptocurrency in which bots proliferate?

COVER

cover33.jpg

Summary phrase:

We know that there are beautiful landscapes in nature, either during the day at the mountains and beaches, or at night under the stars, but the circuits of my brain do not allow me to quit Steem.

This description of Steem users based on the most popular values of the PPS fields offers results that we already know, because we know our origins and who we are... but someone who knows nothing about Steem would, by doing this simple analysis, get a very rough idea of the average reality.

2.5 Analysis of the re-updates

Now I want to know the average behavior of the re-updates. I consider it a re-update when the value of a field is updated again after the first update.

  • 0 re-updates = Just 1 update
  • 1 re-update = 2 updates
  • 2 re-updates = 3 updates ...

In the previous section I filtered the data, keeping the last update of the PPS fields for each account. In this case I have grouped the records by account but kept all the unique values for each field and each account (creating a list of all the unique values for each field). Subsequently I counted the number of unique values in those lists for each account and each field.

This gives the number of times an account has changed each field to a new unique value. Counting them for all the accounts and converting to percentages, the following table is obtained:
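
The counting described above can be sketched like this. It is an illustration, not the exact procedure I used: ignoring empty values, the function names, and the sample records are assumptions. Note that, because unique values are counted, a field that goes A, B, A counts as two unique values, that is, one re-update.

```python
from collections import Counter, defaultdict


def reupdate_counts(records, field):
    """records: iterable of (account, profile_dict).

    Returns {account: number of re-updates}, where re-updates = unique values - 1.
    Empty values are ignored (an assumption).
    """
    seen = defaultdict(set)
    for account, profile in records:
        value = (profile.get(field) or "").strip()
        if value:
            seen[account].add(value)
    return {account: len(values) - 1 for account, values in seen.items()}


def distribution(counts, cap=7):
    """Percentage of accounts per number of re-updates, grouping everything above cap."""
    buckets = Counter(min(n, cap + 1) for n in counts.values())
    total = sum(buckets.values())
    return {
        ("> %d" % cap if n > cap else str(n)): 100 * c / total
        for n, c in sorted(buckets.items())
    }


# Hypothetical records, for illustration only.
records = [
    ("alice", {"about": "a"}),
    ("alice", {"about": "b"}),
    ("alice", {"about": "a"}),
    ("bob", {"about": "x"}),
    ("carol", {"name": "C"}),
]
counts = reupdate_counts(records, "about")
print(counts)
print(distribution(counts))
```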

Captura de pantalla 2018-05-18 a las 11.41.38.png

This table indicates that the ABOUT field is never modified again (after the first update) in 39.34% of the accounts, while 60.66% of the accounts modify it a second time or more.

Therefore, the ABOUT field is the one that is most often updated, although the rest of the fields are also modified later in more than half of the accounts.

These are the percentages of re-updates for the ABOUT field, broken down by number of re-updates (0 to 7, and more than 7).

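| NUM CHANGES AFTER FIRST UPDATE | count | % |
|---|---|---|
| 0 re-updates | 10,330 | 39.34% |
| 1 re-update | 5,005 | 19.06% |
| 2 re-updates | 2,955 | 11.25% |
| 3 re-updates | 2,096 | 7.98% |
| 4 re-updates | 1,115 | 4.24% |
| 5 re-updates | 1,107 | 4.21% |
| 6 re-updates | 517 | 1.96% |
| 7 re-updates | 593 | 2.25% |
| > 7 re-updates | 2,538 | 9.67% |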

In this graph all the percentages for all fields are displayed.

Re-updates percentages per number of updates for the PPS fields

Captura de pantalla 2018-05-18 a las 11.43.29.png

The graph shows that the percentage of accounts that modify their PPS fields decreases in a similar way for all the fields, with the ABOUT field (in green) showing the slowest decay, which indicates that more updates are made to this field.

This seems to indicate that users redefine the information about who they are or what their state of life is more often than their IMAGE, COVER, NAME, LOCATION or WEBSITE, although these values are also renewed over time, which can be seen psychologically as a renewal of their identities or their activities.

2.6 Suspicious activity in the re-updates

Finally, to finish this analysis I wanted to investigate some results that caught my attention.

A certain number of accounts have been making a very high number of updates to their PPS (hundreds or thousands of times). I suspected that this had to do with bots, and indeed I was able to verify it.

In particular, there are four accounts, @minibot, @nanobot, @microbot and @millibot, that started their activity in DEC-2017 and are still active. They show the same behavior and clearly belong to the same person, who updates the ABOUT field very frequently.

Re-updating frequently is not in itself malicious, but looking at the values of the ABOUT field we find two states that alternate consecutively.
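
Accounts like these stand out simply by counting update records per account. A minimal sketch of that check follows; the threshold of 100 is a hypothetical cut-off chosen to match "hundreds or thousands", and the sample data is invented.

```python
from collections import Counter


def heavy_updaters(accounts, threshold=100):
    """accounts: iterable with one account name per update record.

    Returns (account, update_count) pairs at or above the threshold,
    sorted from most to fewest updates.
    """
    counts = Counter(accounts)
    return [(acc, n) for acc, n in counts.most_common() if n >= threshold]


# Hypothetical update log, for illustration only.
log = ["nanobot"] * 150 + ["alice"] * 3 + ["minibot"] * 120
print(heavy_updaters(log))
```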

BID STATE

[@nanobot Bid Bot. Beta. 2.4h bid window, no refund, 0.001 SBD min. Next Vote (apx.): 2018-2-3 0:43 UTC My Upvote (apx.): 0.056 SBD. Max SP of post author: 100 SP]

DONATION STATE

['VOTING TIME. Bids are not accepted now. Sent SBD will be considered a donation. ']
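
To see how often an account switches between these two states, the ABOUT values can be labelled and the changes counted. This is a rough sketch: the marker strings are taken from the two examples above and may not cover every message these bots publish, and the function names are mine.

```python
def about_state(about):
    """Label an ABOUT value as 'donation', 'bid' or 'other' using marker strings."""
    text = about.lower()
    if "bids are not accepted" in text:
        return "donation"
    if "bid bot" in text:
        return "bid"
    return "other"


def count_switches(about_values):
    """Number of times consecutive ABOUT values (in time order) change state."""
    states = [about_state(v) for v in about_values]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


bid = ("@nanobot Bid Bot. Beta. 2.4h bid window, no refund, 0.001 SBD min. "
       "Next Vote (apx.): 2018-2-3 0:43 UTC My Upvote (apx.): 0.056 SBD. "
       "Max SP of post author: 100 SP")
donation = ("VOTING TIME. Bids are not accepted now. "
            "Sent SBD will be considered a donation. ")

history = [bid, donation, bid, donation]
print([about_state(v) for v in history], count_switches(history))
```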

output_0lKCgW.gif

I do not know the behavior of bots in detail and therefore I cannot make very firm statements, but I wonder:

  • Is it normal for a bot to have these periods when transfers are considered donations?
  • Could this technique not be used as a scam strategy, constantly switching to the donation state so that users do not even get the opportunity to receive the vote they are bidding for?

I have been looking at the transfers and comments on these accounts and I have not found comments about scams. I still find this activity a bit suspicious but, as I said earlier, this may be a common practice among bots that I am not aware of.

3. SUMMARY

This is a basic analysis of the Public Profile Settings fields, which examines how Steem/Steemit accounts use this feature to customize their profiles with six fields (IMAGE, COVER, NAME, ABOUT, LOCATION, WEBSITE).

Regarding the general use of these fields, it has been found that only 31.59% of the accounts have used them, which can be used to estimate the true size of Steem/Steemit. These accounts have made an average of 5.35 updates of their PPS fields.

The use of this feature began to become popular in April-May 2017, and it closely follows the rhythm of creation of new accounts, which seems reasonable behavior.

Regarding the use of each of the fields separately, it can be seen that the most used fields are IMAGE and NAME and the least is the WEBSITE field, although in general it can be said that they are widely used.

The most popular values within these fields have been examined, which has allowed us to obtain an average profile of Steem users that, in a certain way, confirms what we already knew:

This would indicate that Steem/Steemit is an ecosystem that includes people from very different countries, with a large number of students related to software engineering, with a past on Facebook, mainly male, with an anti-system character, who use cryptocurrencies.

The behavior of the accounts when making new updates of their PPS fields has also been examined. As a result, it is common for these fields to be updated at least once more, although the percentage decreases rapidly for a greater number of updates, with the ABOUT field being the one that is updated most often on average.

Finally, the data obtained in this analysis have made it possible to detect anomalous behavior in some accounts that perform a large number of re-updates of their PPS fields. This could be related to a scam use of some bots, although this should be confirmed by those who understand the use and abuse of bots more deeply.

4. SCOPE, TOOLS & CODE

SCOPE

  • Submit Date: 20.05.2018.
  • Data extracted was from 2016-3-30 to 2018-4-27

TOOLS
Captura de pantalla 2018-05-20 a las 11.42.07.png

  • To get and process data from the Steem database I used DBeaver, a free and open source multi-platform database tool for developers, SQL programmers, database administrators and analysts.

  • Infogram to create charts.

CODE

SELECT
sbds_tx_account_updates.timestamp,
sbds_tx_account_updates.account,
sbds_tx_account_updates.json_metadata
FROM sbds_tx_account_updates
WHERE
sbds_tx_account_updates.json_metadata LIKE "%profile%"

Excellent analysis. This is another example of how analysis contributions should be done. What I was able to get from this was:

  • Around 290,000 accounts (30%+) have completed their profile; this can be used to estimate the true size of steem(it) in combination with the number of active accounts by @arcange
  • I particularly loved your summary phrases :)
  • I also loved the placing of your notes before and after the images. They made it easy to understand the charts.

Just a minor inconsistency I observed:

  1. There were places where PSS was used instead of PPS for Public Profile Settings

Your contribution has been evaluated according to Utopian rules and guidelines, as well as a predefined set of questions pertaining to the category.

To view those questions and the relevant answers related to your post, click here.


Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]

Hey @eastmael
Here's a tip for your valuable feedback! @Utopian-io loves and incentivises informative comments.

Contributing on Utopian
Learn how to contribute on our website.

Want to chat? Join us on Discord https://discord.gg/h52nFrV.

Vote for Utopian Witness!

Really sir you so great man.
steemit job my most like man @utopian-io

@eastmael, thanks for your comments. I have already corrected the inconsistency with PPS and PSS that is due to my dyslexia with the keyboard ;)

Great! Looking forward to your next analysis. :)

Nice job.
If you have any questions regarding the update behaviour of micro, mini, milli and nanobot, just ask.
I am their owner.
The timeframe between updates of a profile should be 2.4 hours, as is their voting cycle.
Why that behaviour?
To discourage auto bidding in the last second.

@isnochys, I know almost nothing about bots. The behavior of your bots appeared in my data, and I mentioned it in my analysis because it seemed striking to me. If you make good use of them, I wish you good luck. I will just add that, for me, it is a good sign that you have made your comment.

Hey @lokomotion
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!

That's great analysis.
Really my steemit job my most like man @utopian-io ,really sir you so My most choice man.... best of luck sir

this steemit job my most you like @utopian-io and you very a great man.

I like your post, because it divides the knowledge about steemit, good post and useful @lokomotion

hello, i like this post, i voted for you, i have followed you, hihi you, i have a private blog about coffee in steemit, maybe you like, i will share your opinion You will know there are many good things about my coffee blog, follow me, and vote for me, thank you, make a funny steemit

Very interesting information excellent post greetings friend
