SteemData 1.2
I've decided to ship early, and not wait until SteemData 2.0. The main reason is that I'd like to push out all the breaking changes now, to reduce the amount of pain in the future.
Features in 1.2
Fast updates and eventual consistency
Before 1.2, I would run a handful of workers in a loop, and scrape account related updates one by one. Steem now has over 120,000 accounts, and this approach certainly doesn't scale. It also means that an account can only be updated once every few hours, and thus some of the data is stale.
I have solved this problem by switching to an asynchronous, event-based model (powered by Celery and RabbitMQ, which together act as a distributed task queue), where posts, accounts and their virtual operations are updated shortly after new blocks become available.
I have repurposed the old worker model as a fail-safe: if the event-based approach ever fails in a way that would cause data loss, the background worker will back-fill the missing data afterwards.
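As a rough illustration of this pattern (not the actual SteemData code, which runs on Celery and RabbitMQ), here is a minimal standard-library sketch of event-driven updates with a back-fill pass. The names `AccountStore`, `handle_block`, `drain` and `backfill`, as well as the block layout, are hypothetical and exist only for this example.

```python
from collections import deque


class AccountStore:
    """In-memory stand-in for the database: account name -> last block applied."""

    def __init__(self):
        self.last_updated_block = {}

    def update(self, account, block_num):
        # Idempotent: re-applying the same or an older block changes nothing.
        current = self.last_updated_block.get(account, 0)
        self.last_updated_block[account] = max(current, block_num)


def handle_block(block, queue):
    """Producer: enqueue one update event per account touched by the block."""
    for account in block["accounts"]:
        queue.append((account, block["block_num"]))


def drain(queue, store, drop=frozenset()):
    """Consumer: apply queued events. `drop` simulates events lost to a failure."""
    while queue:
        account, block_num = queue.popleft()
        if account in drop:
            continue
        store.update(account, block_num)


def backfill(blocks, store):
    """Fail-safe: rescan blocks and apply any update the event path missed."""
    repaired = []
    for block in blocks:
        for account in block["accounts"]:
            if store.last_updated_block.get(account, 0) < block["block_num"]:
                store.update(account, block["block_num"])
                repaired.append((account, block["block_num"]))
    return repaired


# Hypothetical blocks; real blocks carry far more data.
blocks = [
    {"block_num": 1, "accounts": ["alice", "bob"]},
    {"block_num": 2, "accounts": ["bob", "carol"]},
]
queue, store = deque(), AccountStore()
for block in blocks:
    handle_block(block, queue)
drain(queue, store, drop={"carol"})  # carol's event is lost on the fast path
repaired = backfill(blocks, store)   # the slow path catches up afterwards
```

Because `update` is idempotent, the back-fill pass can safely overlap with work the event path has already done, which is what makes the combination eventually consistent.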
Structural Changes and Types
This release contains a handful of design improvements and changes, which are not backwards compatible. I do not expect any major breaking changes for 2.0.
Also, the typing support has been improved greatly.
Historic Prices
I've added hourly snapshots for STEEM, implied SBD and Bitcoin prices.
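One way to picture an hourly snapshot is to bucket price observations by the hour and keep the last one seen in each bucket. The sketch below assumes that approach. The source does not say how the snapshots are actually taken, and the field name `steem_usd` and all numbers are made up for illustration.

```python
from datetime import datetime, timezone


def hour_bucket(ts):
    """Truncate a datetime to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def hourly_snapshots(ticks):
    """Keep the last observed tick per hour; ticks are (datetime, prices dict) pairs."""
    latest = {}
    for ts, prices in sorted(ticks, key=lambda tick: tick[0]):
        # Later ticks in the same hour overwrite earlier ones.
        latest[hour_bucket(ts)] = dict(prices, timestamp=hour_bucket(ts))
    return [latest[hour] for hour in sorted(latest)]


# Made-up observations, not real market data.
ticks = [
    (datetime(2017, 2, 1, 10, 5, tzinfo=timezone.utc), {"steem_usd": 1.0}),
    (datetime(2017, 2, 1, 10, 50, tzinfo=timezone.utc), {"steem_usd": 2.0}),
    (datetime(2017, 2, 1, 11, 10, tzinfo=timezone.utc), {"steem_usd": 3.0}),
]
snapshots = hourly_snapshots(ticks)
```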
Performance Improvements
The new MongoDB deployment uses the WiredTiger storage engine.
I have reworked the indexes on all collections, which yields a 2-10 fold query performance improvement for most historic queries.
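To give a feel for what index work like this can look like, here is a small sketch of compound indexes aimed at "history of X, newest first" queries. The collection and field names are assumptions for illustration, not the real SteemData schema. `ensure_indexes` only relies on `create_index(keys)`, which PyMongo collections provide, and a recording stand-in is used so the sketch runs without a MongoDB server.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

# Hypothetical collections and fields, for illustration only.
INDEX_PLAN = {
    "Operations": [
        [("account", ASCENDING), ("timestamp", DESCENDING)],
        [("type", ASCENDING), ("timestamp", DESCENDING)],
    ],
    "Posts": [
        [("author", ASCENDING), ("created", DESCENDING)],
    ],
}


def ensure_indexes(db, plan=INDEX_PLAN):
    """Create every planned index; db[name] must expose create_index(keys)."""
    created = []
    for collection_name, indexes in plan.items():
        for keys in indexes:
            created.append(db[collection_name].create_index(keys))
    return created


class RecordingCollection:
    """Stand-in that records calls instead of talking to MongoDB."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys):
        self.calls.append(keys)
        # Mimics the default index naming scheme, e.g. "account_1_timestamp_-1".
        return "_".join(f"{field}_{direction}" for field, direction in keys)


fake_db = {name: RecordingCollection() for name in INDEX_PLAN}
index_names = ensure_indexes(fake_db)
```

Putting the equality field first and the time field second lets a single index serve both the filter and the sort of a typical historic query.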
SteemData is now also hosted on a more expensive server, powered by an Intel i7-6700K with 64GB of RAM. The hardware upgrade should yield a performance gain of over 2x.
Open Source
All of the code powering SteemData is now available on GitHub, and is licensed under the highly permissive MIT license.
steemdata-node
If you're looking for a Docker based, easy to use steemd RPC deployment, this is it.
It comes with all blockchain plugins enabled, the latest seed node list, and an automatic blockchain snapshot download on first run for quick syncing (thanks to @gtg).
steemdata-mongo
This repo contains all the code responsible for syncing the STEEM blockchain with MongoDB.
steemdata
This is a core library for working with STEEM blockchain data. It is database agnostic (could be used for SQL or any other database in the future).
steemdata.com
Right now, the website only hosts basic instructions and stats.
Eventually, I would like to build:
- an API for 3rd party apps
- blockchain explorer
- steemle inspired charts and analytics
TODO (until next release)
- Integrate Comments
- Add Relationships via HRefs
- Create Sample Notebooks
- Documentation!
Now that a stable base is in place, I'd like to work on making this project more useful and friendly to people who can benefit from it. If you're a developer, please talk to me (I am @furion on steemit.chat).
Upgrade Now
The old version of SteemData will be shutting down on Feb 10th. Please upgrade to SteemData 1.2; see steemdata.com for connection info.
Crowdfunding
We have raised $5,120 of the $5,000 goal so far. Big thanks to @cass for making this project possible.
Supporter | Amount |
---|---|
@cass | $4,900 |
@fabien | $100 |
@abit | $100 |
@tuck-fheman | $20 |
The donations should be sent to @steemdata, and the list of friendly donors will be published and updated here, as well as in future announcements.
If you'd like to support my work, feel free to vote @furion for witness.
Furion is a Steem beast!
Great work!
For those who don't trust Mongo, we'll also be releasing the sbds (Steem Blockchain Data Service) pretty soon which does mostly the same thing, but with MySQL. :) Having all the block/tx/op/json transactional data in tables for querying is super powerful, and we hope this spawns a new generation of apps that use the blockchain data.
😎 Upvoted! Steem on!
I am very much looking forward to SBDS. MongoDB might be the most popular 'nosql' database out there, but for most people a SQL based solution is the real deal. I hope that the two services complement each other in ensuring wide developer coverage.
https://steemd.com/tx/bb40f6bd0e0caee4bca8989a759f141e981c5758
Hope this helps, you should be ready now for getting your hands more dirty on coding :)
When I get time, I will try to provide a "design" etc. Maybe we should get in touch with fabien and christopher at busy.org! I could imagine getting steemdata integrated into busy somehow as well.
You are awesome man!! Very kind
So much is happening, thank you for keeping us updated as well as working on these great projects both individually and in cooperation. AWESOME work! Namaste :)
This is awesome. :)
Thank you for the tip :)
You're welcome! It's not much but it's what I had on hand. XD
I voted for you as a witness too. I'm still learning about Steemit hehe.
great work furion. thx
wow, this is really cool. I have a very technical question that I'd love if a dev could answer, I'll just ask it here:
Do server-based websites like Medium have technical constraints that blockchain-based ones do not? More specifically, I read about a guy who basically broke Medium's algorithm for detecting bot activity. He was so prolific on the site, commenting and recommending others' articles, that the server basically became backlogged and flagged his account for automation. Is this a problem with them using a server instead of a blockchain? Does our 20 second commenting limit get rid of this issue? Or is this problem not even a possibility on a blockchain-based platform?
Actually the problem you described can also happen on a blockchain-based platform. Here are even more challenges.
Wow, really impressive, congrats.
I'll have to dig into your database :)
Mentioned on Steem Data Resource - Collection Of Posts About Steem Bots, Data And Mining, Issue No. 4, of course!
Upvoted, Resteemed! Thank you again for your work please keep it up!