SteemData 1.2 is here ∙ Raised $5,120 of $5,000 ∙ Now on GitHub

in #steemdata ∙ 8 years ago (edited)

SteemData 1.2

I've decided to ship early, and not wait until SteemData 2.0. The main reason is that I'd like to push out all the breaking changes now, to reduce the amount of pain in the future.

Features in 1.2

Fast updates and eventual consistency

Before 1.2, I would run a handful of workers in a loop, and scrape account-related updates one by one. Steem now has over 120,000 accounts, and this approach doesn't scale. It also means that an account can only be updated once every few hours, so some of the data is stale.

I have solved this problem by switching to an asynchronous, event-based model (powered by Celery and RabbitMQ, the distributed queue), where posts, accounts and their virtual operations are updated shortly after new blocks become available.
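As a rough illustration of the event-based idea, here is a minimal sketch in Python. It uses the standard library's `queue.Queue` as a stand-in for Celery and RabbitMQ, and the block layout, the `ACCOUNT_FIELDS` tuple and the function names are assumptions made up for this example, not SteemData's actual code.

```python
import queue

# Hypothetical operation fields that may hold an account name.
ACCOUNT_FIELDS = ("author", "voter", "from", "to", "account")


def accounts_in_block(block):
    """Collect the distinct account names referenced by a block's operations."""
    accounts = set()
    for _op_type, op_body in block["operations"]:
        for field in ACCOUNT_FIELDS:
            name = op_body.get(field)
            if name:
                accounts.add(name)
    return accounts


def enqueue_updates(block, tasks):
    """Queue one update task per touched account instead of polling every account."""
    for name in sorted(accounts_in_block(block)):
        tasks.put(("update_account", name))


# A made-up block with two operations.
sample_block = {
    "block_num": 1,
    "operations": [
        ("vote", {"voter": "alice", "author": "bob"}),
        ("transfer", {"from": "alice", "to": "carol"}),
    ],
}

tasks = queue.Queue()
enqueue_updates(sample_block, tasks)

# Drain the queue the way a worker would.
pending = [tasks.get() for _ in range(tasks.qsize())]
```

Only accounts that appear in a new block get refreshed, so the work per block depends on the block's activity rather than on the total number of accounts.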

I have repurposed the old worker model as a fail-safe: if for whatever reason the event-based approach fails in a way that would cause loss of data, the background worker will back-fill the missing data afterwards.
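One way a back-fill worker could decide what to re-fetch is to look for gaps in the block numbers that were already processed. The sketch below assumes processed block numbers are available as a plain list; the function name `missing_block_ranges` is hypothetical.

```python
def missing_block_ranges(processed, first, last):
    """Return inclusive (start, end) ranges in [first, last] absent from `processed`."""
    seen = set(processed)
    gaps = []
    gap_start = None
    for num in range(first, last + 1):
        if num not in seen:
            # Open a new gap if we are not already inside one.
            if gap_start is None:
                gap_start = num
        elif gap_start is not None:
            # A processed block closes the current gap.
            gaps.append((gap_start, num - 1))
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, last))
    return gaps


# Made-up block numbers with three gaps.
gaps = missing_block_ranges([1, 2, 3, 6, 7, 10], first=1, last=12)
```

Each returned range can then be handed to the slower worker loop for re-processing.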

Structural Changes and Types

This release contains a handful of design improvements and changes, which are not backwards compatible. I do not expect any major breaking changes for 2.0.
Also, the typing support has been improved greatly.

Historic Prices

I've added hourly snapshots for STEEM, implied SBD and Bitcoin prices.
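Hourly snapshots boil down to keeping one record per hour-aligned timestamp. A minimal sketch, assuming prices arrive as a small dict; the key names and the numbers are placeholders, not real market data or SteemData's schema.

```python
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def record_snapshot(snapshots, ts, prices):
    """Keep one snapshot per hour; a later reading in the same hour overwrites."""
    snapshots[hour_bucket(ts)] = dict(prices)


snapshots = {}
utc = timezone.utc
# Placeholder readings: two in the 14:00 hour, one in the 15:00 hour.
record_snapshot(snapshots, datetime(2017, 2, 1, 14, 5, tzinfo=utc), {"steem": 1.0})
record_snapshot(snapshots, datetime(2017, 2, 1, 14, 40, tzinfo=utc), {"steem": 2.0})
record_snapshot(snapshots, datetime(2017, 2, 1, 15, 1, tzinfo=utc), {"steem": 3.0})
```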

Performance Improvements

The new Mongo deployment is WiredTiger-enabled.

I have reworked indexes on all collections, which yields a 2- to 10-fold improvement in query performance for most historic queries.
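For readers unfamiliar with compound indexes, here is a hedged sketch of how index definitions might be declared and applied. The collection and field names are invented for illustration. The only driver call assumed is pymongo's `create_index`, which takes a list of `(field, direction)` pairs. A small recording stand-in is used here so the example runs without a database.

```python
# Hypothetical collection and field names; 1 is ascending, -1 is descending,
# the same direction values pymongo accepts.
INDEX_SPECS = {
    "Operations": [
        [("account", 1), ("timestamp", -1)],
        [("type", 1), ("timestamp", -1)],
    ],
    "Posts": [
        [("author", 1), ("created", -1)],
    ],
}


def ensure_indexes(db, specs):
    """Call create_index for every spec; `db` maps collection names to collections."""
    created = []
    for collection_name, index_list in specs.items():
        for keys in index_list:
            db[collection_name].create_index(keys)
            created.append((collection_name, keys))
    return created


class RecordingCollection:
    """Stand-in for a real collection that just records the requested indexes."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys):
        self.calls.append(keys)


db = {name: RecordingCollection() for name in INDEX_SPECS}
created = ensure_indexes(db, INDEX_SPECS)
```

An index that pairs a filter field with a descending timestamp suits "latest activity for X" queries, which is the typical shape of a historic lookup.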

SteemData is now also hosted on a more expensive, Intel i7-6700K powered server with 64GB RAM. The hardware upgrade should yield a performance gain of over 2x.

Open Source

All of the code powering SteemData is now available on GitHub, and is licensed under the highly permissive MIT license.

steemdata-node

If you're looking for a Docker-based, easy-to-use steemd RPC deployment, this is it.
It comes with all blockchain plugins enabled, the latest seed node list, and automatic blockchain snapshot download on first run for quick syncing (thanks to @gtg).

steemdata-mongo

This repo contains all the code responsible for syncing the STEEM blockchain with MongoDB.

steemdata

This is a core library for working with STEEM blockchain data. It is database agnostic (could be used for SQL or any other database in the future).

steemdata.com

Right now, the website only hosts basic instructions and stats.

Eventually, I would like to build:

  • an API for 3rd party apps
  • a blockchain explorer
  • steemle-inspired charts and analytics

TODO (until next release)

  • Integrate Comments
  • Add Relationships via HRefs
  • Create Sample Notebooks
  • Documentation!

Now that the stable base is in place, I'd like to work on making this project more useful and friendly to people who can benefit from it. If you're a developer, please talk to me (I am @furion on steemit.chat).

Upgrade Now

The old version of SteemData will be shutting down on Feb 10th. Please upgrade to SteemData 1.2; see steemdata.com for connection info.

Crowdfunding

We have raised $5,120 of the $5,000 goal so far. Big thanks to @cass for making this project possible.

Supporters
  • @cass: $4,900
  • @fabien: $100
  • @abit: $100
  • @tuck-fheman: $20

Donations should be sent to @steemdata. The list of donors will be published and updated here, as well as in future announcements.


If you'd like to support my work, feel free to vote @furion for witness.


Great work!

For those who don't trust Mongo, we'll also be releasing the sbds (Steem Blockchain Data Service) pretty soon which does mostly the same thing, but with MySQL. :)

Having all the block/tx/op/json transactional data in tables for querying is super powerful, and we hope this spawns a new generation of apps that use the blockchain data.

😎 Upvoted! Steem on!

I am very much looking forward to SBDS. MongoDB might be the most popular 'nosql' database out there, but for most people a SQL based solution is the real deal. I hope that the two services complement each other in ensuring wide developer coverage.

https://steemd.com/tx/bb40f6bd0e0caee4bca8989a759f141e981c5758
Hope this helps, you should be ready now for getting your hands more dirty on coding :)

When I get time, I will try to provide a "design" etc. Maybe we should get in touch with fabien and christopher at busy.org! I could imagine getting steemdata integrated into busy somehow as well.

So much is happening, thank you for keeping us updated as well as working on these great projects both individually and in cooperation. AWESOME work! Namaste :)

This is awesome. :)

Thank you for the tip :)

You're welcome! It's not much but it's what I had on hand. XD
I voted for you as a witness too. I'm still learning about Steemit hehe.

great work furion. thx

wow, this is really cool. I have a very technical question that I'd love if a dev could answer, I'll just ask it here:
Do server-based websites like Medium have technical constraints that blockchain-based ones do not? More specifically, I read about a guy who basically broke Medium's algorithm for detecting bot activity. He was so prolific on the site, commenting and recommending others' articles, that the server basically became backlogged and flagged his account for automation. Is this a problem with them using a server instead of blockchain? Or does our 20 second commenting limit get rid of this issue? Is this problem even a possibility on a blockchain-based platform?

Actually the problem you described can also happen on a blockchain-based platform. Here are even more challenges.

Wow, really impressive, congrats.

I'll have to dig into your database :)

Mentioned on Steem Data Resource - Collection Of Posts About Steem Bots, Data And Mining, Issue No. 4, of course!

Upvoted, Resteemed! Thank you again for your work please keep it up!