Improving Steem’s rankings to cater to diverse content preferences

in #steemit9 years ago (edited)

Improving Steem’s rankings to cater to diverse content preferences

Would Steem fail if every blog post of a lady putting on her makeup is rewarded $26,000?

Destroy Internet?

This non-meritorious aberration is because Steem’s incentive system maximally rewards those who vote in a groupthink. This is becoming a significant concern that threatens to destroy Steem’s utility and value as evident by the parodies and rants against ‘circlejerks’.

The One-Size-Fits-All Problem

Community Disagreement

I explained in a comment that the fundamental technical flaw appears to be that Steem’s ranking model computes a one-size-fits-all ranking for all users.

It's getting harder and harder to find all the hidden gems on Steemit these days.

That is unavoidable in the current system design, due to the one-size-fits-all reputation system, i.e. each user will have different priorities but the design of the system doesn't accommodate such degrees-of-freedom.

If some of the voting power shares your preferences, that content will rank higher than content that interests none of the voting power, but assuming that interests are reasonably diverse then ranking will be more or less uniform and uncorrelated to individual preferences. So thus more or less in order for any content to rise up, it must be a groupthink effect.

To maximize each voter’s reward, the optimum mathematical strategy is for all voters to vote on the same blog post, because currently Steem only computes a one-size-fits-all global ranking metric. I explained in another comment that invested whales who were expected by Steem’s whitepaper to vote to curate and protect the long-term value of Steem ecosystem, are mathematically incapable of fulfilling that expectation, because they are tied in a knot of a one-size-fits-all global ranking metric.

I suppose that one possible response is that whales have an incentive to preserve the value of their investments, and the best way to do that is to promote a system of fair voting and promote the integrity of the system

The one-size-fits-all ranking system (c.f. my other reply to @smooth below) makes it impossible for whales to act rationally, because they can't compute a set of votes which would reflect their individual preferences for quality which might be shared with other like-minded users.

Thus as far as I can see, the system disincentivizes the whales from participating in voting, for they will come to see that either they become one dysfunctional groupthink monolith or they more or less effectively nullify each others votes in terms of anything other than a uniform ranking which is functionally equivalent to no ranking.

Proposed Solution of Clustering Multiple Rankings

In my prior comment I had alluded to a possible improvement and solution to the dilemma:

An improvement would be some algorithm which allows each grouping of like-minded interests to have their own separate ranking computation. The monetary reward algorithm would also need to change, so as to reward content that ranks highly in any grouping.

The inspiration is that votes should be automatically clustered (grouped) into coteries by an algorithm which can automagically detect the users’ shared preferences, so that a plurality of rankings are allowed: one for each coterie cluster detected. Each user will then see rankings customized to that user’s content preferences.

Benefits

Hypothetically, not only would this eliminate the mindless strategy of voting only for the most globally popular posts, thus reducing rewards for blogging to the meritorious content quality perceived by the voters, it would also stop spamming users with content they aren’t interested to see first. Each user would only see highest ranked the content that they prefer and the content rankings would vary for each user given each user’s individual content preferences. This should reduce animosity between people who have different content preferences and are in the current system competing against each other to spam each other.

No Troll Battles

And all of this would happen automatically, with no changes to the user interface. Users would continue voting, and only their optimum voting strategy would change. Users browsing blogs would continue to do so with the only change being they would see content highest ranked relevant to their preferences.

This would be in theory be a major innovation as compared to Reddit's global one-size-fits-all ranking algorithm.

Technical Description

The technical description of the algorithm I propose may be a bit difficult for some readers to grasp, but I’ll try to not use more technobabble words than absolutely necessary. Notice above I used ‘voting strategy’ instead of ‘game theory’, and I avoided mentioning ‘Nash equilibrium’.

I propose we compute the like-minded distance Dₛₜ between each pair of voters s and t by finding the mean of the following value (Dₛₜ)ⱼ for each pair of votes sⱼ and tⱼ for instance j of content on which either (or both) of these paired users voted.

Alternatively we may wish to underweight in the computation of the mean, the cases where the value is 1, given these add less information because one of the paired voters has not voted on the instance of the content. The term ‘content’ means for example a blog post.

The distances Dₛₜ are grouped into k clusters with the Jenks natural breaks optimization, a.k.a. one dimensional (univariate) k-means clustering.

Demonstration of k-means clustering algorithm

This clustering algorithm groups the distances Dₛₜ into the k clusters which minimize the sum-of-the-squares of the deviations of each member distance Dₛₜ of the cluster from the mean of the cluster. As depicted in the image above, this maximizes the separation between clusters by maximizing the sum-of-the-squares of the deviations of the cluster means from the mean of all the distances Dₛₜ.

We can either choose a value k; or find the value of k which provides the a minimum value for GVF (goodness of variance fit) we’ve chosen.

This algorithm has then grouped the voters into k clusters which maximize the like-mindedness of content preferences for the members of each cluster.

An extra restriction is required on the algorithm, in that for every instance of voter (for s or t) that is a member of a cluster then all instances of that voter (for s or t) must be grouped in the same cluster.

Application

The ranking within each cluster can be computed weighted by Steem Power (SP) according the current computation in the Steem white paper. These totals from the clusters are fed into the algorithm from the white paper to determine the relative rewards for each instance of the content.

Each voter will see the rankings for the cluster which he/she is automagically a member of.

Also the like-minded distances and clustering algorithm could be applied orthogonally to each hashtag (a.k.a. Tag or category), so that voters can be clustered differently for different hashtags. Individuals may have different automatically computed groupings depending on which genre of content they are voting on.

Sort:  

Would Steem fail if every blog post of a lady putting on her makeup is rewarded $26,000?

I don't know that it would (FWIW, I do not believe that every such post would be so rewarded, only the first that emerged as a hit or others that offered something new). Look around you. The most commercially successful music is cookie cutter. The most successful TV shows are dumb. Nobody even buys books any more at all (other than maybe romance novels?). One of the top posts at reddit right now (5000 votes) is: "Excluding my mom, what's the worst sex you've ever had?"

What is successful is what addresses the market as it exists, and that includes a great deal of demand for shallow content and being popular for being popular.

Also, implemention factors are very important. Voting is processed by the consensus code which needs to not only be kept relatively simple but also very high performance. This doesn't mean that your ideas can't work but to be a credible proposal you need to completely specify how and when the various processing steps occur. Of course that can be the subject of later posts.

Would Steem fail if every blog post of a lady putting on her makeup is rewarded $26,000?

I don't know that it would (FWIW, I do not believe that every such post would be so rewarded, only the first that emerged as a hit or others that offered something new).

I realize that opening statement could be interpreted the way you did. Rather I intended it in the context of the article, meaning that given the current optimum voting strategy of each voter maximizing his/her reward by choosing to vote on the posts with the highest $rewards, then there will always be some post that is awarded far too much while so many others are rewarded far too little. Btw, my proposal isn't intended to entirely interfere with the intended psychology effect of quadratic weighting of voting power which the white paper says is intended to motivate blog posters by causing them to incorrectly assess the odds of their likely average reward. Rather my proposal is merely so that the voting power isn’t incorrectly incentivized to vote as a groupthink monolith. I realize some people may be voting their conscience in spite of it not maximizing their curation rewards, but still my proposal benefits them because their conscience will then actually be more highly reflected in the ranking for the cluster that shares the same like-mindedness.

Look around you. The most commercially successful music is cookie cutter. The most successful TV shows are dumb. Nobody even buys books any more at all (other than maybe romance novels?). One of the top posts at reddit right now (5000 votes) is: "Excluding my mom, what's the worst sex you've ever had?"

My rebuttal to the claim that one-size-fits-all content is what the masses want, is to observe the decline in viewership of non-interactive media (e.g. newspapers and TV) since the Internet. Apparently the masses want to customize their experience with media, while sharing that experience/content with like-minded community (and 5000 people in a like-minded cluster is sufficient socializing and good feeling being a member of a group that shares interests). Meaning I agree and disagree with you, in that yes masses want to share things that many others also like, but they want to prioritize their sharing around their mutual likes, not everything under the sun. And they probably do also want to venture outside their priority interests sometimes to see what is going on outside. Even my 26 year old filipina gf tells me she doesn't like seeing violent sharings on her Facebook timeline. She wants cutesy and humorous content, e.g. dogs dancing, etc.. She would have absolutely no interest in reading this proposal. But she will partake of an occasional guy who got crushed under a bus (or apparently a pornographic scandal she did not tell me about lol).

Given the millions of users who visit Reddit every day, 5000 up votes seems quite low. This seems to confirm that millions of viewers are being spammed with content they probably aren’t interested in. We might have an opportunity to improve upon Reddit’s ranking algorithm.

What is successful is what addresses the market as it exists, and that includes a great deal of demand for shallow content and being popular for being popular.

A “Trending for Others” ranking choice could still provide the original unclustered rankings when users want to venture outside their voting preferences. These could even be sparsely interleaved by the UI in the clustered rankings to provide some statistical opportunities for users to morph their voting preferences and not get stuck in a localized groupthink. Yeah we probably need both! Good point.

Note none of the reduces the other main benefit of my proposal which is that curator rewards would be confined to clusters, to remove the incentive for voting in one global groupthink.

Also, implemention factors are very important. Voting is processed by the consensus code which needs to not only be kept relatively simple but also very high performance.

The voting is recorded on the blockchain in I presume real-time, but the rewards are only recorded periodically. Note the rankings and rewards can be computed in real-time for the UI independently of the consensus validators. Thus afaics, it is only when the rewards need to be periodically recorded that those computations need to be performed by the consensus validators. Thus it appears the cost can perhaps be amortized over significant periods.

Agree overall on most most points. Re. implementation, payouts are transactions too. Currently those perform a relatively simple calculations based on shares, and those transactions would need to remain limited in computational cost. I don't think this proposal changes that much though, it just has more complicated (but still computationally simple) accounting of shares by clutser. My concern is primarily the k-means clustering; Incremental k-means variants exist, so it is probably solvable.

Also, dividing users into clusters requires a minimum number of users in each cluster, thus a minimum number of users on the site as a while. At present I doubt that is feasible as the number of users on the site is just barely reaching the point where it works at all. Ideally of course the number of users will be much larger in the future, so something like this could be phased in.

Pleas check out this post I made about a steemit Bug the dev's should see this post so they can fix it. https://steemit.com/bug/@stijn/steemit-bug-needs-to-be-fixed

You are right about that but I think if the interface had clearly defined groups like Reddit, which were moderated like Reddit or Facebook, where only people in that group selected as moderators can determine for example to allow a post into that topic, then maybe you can focus the curation per topic.

Right now if I check the basic income topic I can't find anything related to basic income without digging through all sorts of noise.

It would help but not resolve the main issue. Currently users are rewarded more to rewarding content creators with the highest rewards rather than best content.

The best content is subjective. The only way to determine best is to go by rank and the only rank we have is votes. So the most votes is considered the best by the community. No different from Reddit or Slashdot or any other collaborative filtering.

where only people in that group selected as moderators can determine for example to allow a post into that topic ... if I check the basic income topic I can't find anything related to basic income without digging through all sorts of noise

Afaics, my proposal when orthogonally clustered per topic (aka tag or hashtag) would automatically (i.e. algorithmically) accomplish the same by ranking the irrelevant posts at the bottom of the list of posts for the topic, without needing to manually pick and trust moderators.

The best content is subjective. The only way to determine best is to go by rank and the only rank we have is votes. So the most votes is considered the best by the community. No different from Reddit or Slashdot or any other collaborative filtering.

True, but hypothetically my clustering proposal should offer the advantage that rankings are customized for each cluster, i.e. for each group of people who tend to vote with the same preferences, thus automatically customizing rankings to different people’s likes (up votes) and dislikes (down votes).

Not all votes are equal. Rich/Powerful voters have more important vites.

I would not even try to map the data to the Grassmannian and use the naturally defined metric to determine clustering of the data on various manifolds.

Bad ideas don't work in practice.

Seriously. Avoid this notion at all costs. Grassmannians are tricky beasts.

Is it the intermediate mean for Dₛₜ? You think I should cluster directly on the votes? That worry crossed my mind but I hadn't yet had time to delve into the ramifications. I am not familiar with why they are tricky.

I dunno man. Anytime I start thinking about the space of k-dimensional linear subspaces on an n-dimensional space (i.e. Gr(n,k)), and about points on that space, my mind begins to get warped.

The big question is how can other metrics be used as a way of identifying clustered points on a manifold.

I think we'll see how persistent homology will be coming into play.

I think the only thing Steemit needs to do is improve the user interface so we can find posts on topics we look for. Now it's hard to do because the interface doesn't make it simple.

In Reddit you can subscribe to groups. Groups have moderators. You can expect to see only posts of a certain topic within certain groups. Same with Facebook.

Steemit doesn't yet have clear groups. This is why it's hard to find the topics which we look for. I don't think the math or technical aspects need to change but I do think the interface needs to be at least Reddit level.

Afaics, the problem can not be solved only with filtering the display by tag or grouping without changing the underlying computation of the rewards. Without my proposal, the voters have a mathematical incentive to vote on only the highest payout posts, in order to maximize their own curator rewards. Maybe it is not clear to everyone why I think that is the case. The reason is because I presume the voter has no way to predict which voters will vote on each blog post and thus which blog posts which be optimum for them to vote on to maximize their curator rewards constrained to their cluster. So I presume absent an a priori strategy to maximize their curator rewards, they will choose to vote honestly according their content quality preferences. Note this aspect of the game theory needs to be pondered carefully.

Additionally, I quote what I wrote in reply to @smooth:

I realize some people may be voting their conscience in spite of it not maximizing their curation rewards, but still my proposal benefits them because their conscience will then actually be more highly reflected in the ranking for the cluster that shares the same like-mindedness.

As the system is right now, perhaps he doesn't even need to "predict". He can vote, see whether others voted, and then if they did vote and raised money preserve the vote. If they didn't raise money, he can proceed to unvote and vote something else: https://steemit.com/steemit/@alexgr/curation-gamed-through-unvoting

This doesn't work. I'll explain why in a response to your post.

I am not concerned about the rewards. I think the rewards are working great. Popular content is getting the most money. Sure you can have content which gets a lot of votes but which doesn't get much money because not a lot of voting weigh is behind it but that will change over time as more people have Steem Power.

I don’t understand why you are not concerned that people are apparently not always voting for the content which they like the best, but instead for the content they think can give them the highest curator rewards? I am doubting that you have understood deeply enough the various concepts and impacts of this proposal.

Without my proposal, the voters have a mathematical incentive to vote on only the highest payout posts, in order to maximize their own curator rewards.

The point is voters are being incentivized by the curator rewards to not vote on the most relevant content, but rather on the content that other whales have voted on. This apparently causes a swarm vortex effect where all the votes get sucked towards what ever got the early momentum, regardless if that blog post was not so much better in quality than all the others.

Granted I am not sure that everyone is voting to maximize their curator rewards. But if say two whales who have 3 million Steem Power voted for a blog post, then many others will vote for it, because not only their curator rewards will be higher for doing so than wasting their vote on another blog post, but also because that blog post being at the top of the listing will also cause it to be seen by EVERYBODY more often.

Hypothetically my proposal not only divvies up the curator rewards by like-mindedness, it also divvies up what people see at the top of the listings so not everybody sees the same top ranked blog posts.

In the abstract, my proposal is about adding more degrees-of-freedom to the ranking system in all aspects of it.

I posted my proposal to solve this issue the other day which actually fits in with your algo as well.

https://steemit.com/steemit/@jacobt/the-circlejerk-needs-to-end-now

This need to be on first page. Upvoted.

EVERYONE UPVOTE THIS MAN!! We need him to have a vested interest in the platform so he'll keep on contributing his brilliance to the community!

This was long due! As a non-coder I can really hope that some of what you've suggested gets implemented. This is a great proposal and I hope it will bring many positive changes to this platform. I hope it makes it to the front page and stays there for a while! :)

Loading...

This is what will make this platform great. People from all walks of life putting their efforts in. Great post!

I agree about the imperfections of the issues with the ranking and reward systems. However, I do not see the solution as fully considered in its present state.

Primarily, what are the implications for the allocation of rewards across tags? How much should #steemit get versus #beauty?

Might multiple cluster membership or topic modeling be a better fit?

And by automagically, I assume machine learning of some variety?

Multiple cluster membership is what I alluded to in the last paragraph of the last section:

Also the like-minded distances and clustering algorithm could be applied orthogonally to each hashtag (a.k.a. Tag or category), so that voters can be clustered differently for different hashtags.

You are correct to point out I was vague in the last section, but afaics the rewards should still be according to voting power; thus the relative allocation between clusters and hashtags could remain an orthogonal attribute of the voting power. In other words, the total reward for the blog post’s author would still be based on the sum of the voting power for it, but the rankings would be clustered. However, the voter’s reward should be constrained to the voting power of the clusters he/she is a member of, so that the Nash equilibrium of voting for the posts with the highest rewards is I think removed. The proposed algorithm incentivizes the voting power to divvied up more granularly by providing orthogonal rankings for like-minded clusters of voters and this could be computed orthogonally for each hashtag. I will be pondering more the game theory ramifications of this proposal.

I did actually think of this issue, but I just forgot to write it into the last section. My energy level and focus was tailing off apparently as I composed the end of the document. There were many details I thought of and I may not have written them all down yet. I will as I remember.

The automagic is the algorithm as described.

This is a great solution. The reward system should align with a personal preference recommendation system, recommending new exciting (personal preference matching) posts and further rewarding people to vote for what they actually like, to further recommend those posts to others in the same focus group. This is where steemit lacks the most right now and which could set it apart from many other solutions out there.
In this case, Reddit sets kind of a bad example for Steemit. Yeah you can choose subreddits you are interested in, but then again... Reddit does not have a reward system and if users feel like they miss out on rewards if the don't follow "mainstream propaganda", Substeems will diminish.
Other platforms like Facebook and in specific Quora set far better examples for Steemit going forward.

Very interesting and well thought out proposition.

Do you think we could roll something like this out in waves of varying detail?

For instance: coming up with an algorithm that somewhat normalizes the rewards pool into different topics/tags, which then get split up based on the relative success of posts within those topics. I feel like this would be a step in the right direction without also trying to take into account each I dividual user's preferences which are subject to change over time, and other factors like that.

Good point. I forgot to address changing user preferences. To adapt to changing user preferences, the set of content instances considered by my proposed algorithm could be a sliding window excluding old content, or probably better would be to weight the older instances less in the computation of the mean for Dₛₜ.

So far, I can’t think of any computation of rankings by tags without the clustering by voter like-mindedness, which is any different from the current algorithm; and thus won't stop the optimum voting strategy from being for everyone to vote in a groupthink for the most highly voted content.

Thanks to everyone who appreciated the proposal. We need to spend more time thinking about any vulnerabilities it might introduce. I wanted to get the proposal published as soon as possible. I will be pondering it.

yes good proposal

The quality of curation would be higher if people voted only on the content they were interested in, in preference to what happens now which is voting for content that they think will earn them STEEM. The difficulty to find posts that curators are interested in is becoming an issue with the very high volumes of posts. Functionality to "follow" certain headings - e.g. photography or travel, would enable curators to find what interests them as they log in and, as they are relative experts in those areas, would allow their knowledge to drive curation quality up. Also the possibility exists for people to become curation "experts" in certain fields due to the number and quality of their curation. Could this work?

Great reading, hope all read it. Would gladly recomend it to my friends. Uppvote this :)

-Regards

Great idea! I think this is the exact direction the site should go in.

I forgot to quantify that afaics the likely effect of introducing k clusters is to lower the maximum rewards by a factor of k by spreading the voting power around to k times greater granularity (divvying up the groupthink by k clusters of like-mindedness). I would need to develop the last section more and write the explicit details for the computation of the monetary rewards.

Good point, I'm going to try to diversify my votes more.

A lot of that was outside my wheelhouse but definitely a fantastic read. Thank you.

Would steem fail? Perhaps though it depends on the required outcome. I'm fairly sure that the developers could walk away now with significant return on investment and count steemit as a massive success.

Steemit was advertised to me as a content platfrm that rewarded quality content by offering users a way to directly reward content providers with the platform rewarding users for doing so.

That isn't what its doing or what it has ever done. So as it stands steemit is already a failure in that regard.

We as a community need to know what we want steemit to be and judge it accordingly. The steemit's user base will decide whether we judge well or poorly. The value of the currency is somewhat irrelevant to that.

Excellent post, the "shallow content" of certain posts will continue to be rewarded; although articles like these seem to be increasing in number. The idea of a social platform has to appeal to one and all!! Nevertheless very compelling and interesting read.

Thank you!!

By the way, I wrote about another problem with the way curating works, after being inspired by your post. https://steemit.com/steemit/@lightnovelist/tweaking-curator-rewards-algorithm-for-better-content-discovery

What do you think?

We are animals, violence and sex excite us. boobs for 26,000 $, 13,000 every one no?

My proposal is not designed to diminish the rewards for content that voters are truly interested in up voting. My proposal is merely to untie votes from a one-size-fits-all preference, so that different voters can be grouped (clustered) automatically (algorithmically) by their relative like-mindedness, as measured by the similarity of their voting patterns over time.

In short, I am proposing an improvement which will be computed automatically behind the the scenes and which will give us more relevant rankings so that we all hopefully see the sort of content we are most interested in.

Soon there will be so much content that we otherwise will not be able to find the content we are interested in.

I like this... I need to think more about it, but I've been wondering if something like this could be implemented autonomously.

BTW, Part 4 of the game theory series is up!

I replied to your new blog post.

Your idea reminded me Backfeed.cc on Ethereum.