Hadoop MapReduce brainstorming: Bitshares

in #bitshares · 7 years ago (edited)


Why? I'm currently studying for an upcoming Big Data exam which focuses largely on MapReduce.

Market maker incentives

Given the transaction history for all holders of a select asset (large amount of data dumped to disk in a nosql manner), summarize the market maker participation (sum of buys and sells in a time period) for use during a manual sharedrop (market maker incentives).

The final output will be a text file containing rows of 'username final_trading_value percent_total_market_activity'.

We could then perform a sharedrop that takes both percent_total_market_activity and the user's current_asset_holdings into account during the final distribution (so as to incentivize holding). Alternatively, we could weight on activity alone, so that users who no longer hold the asset but performed market-maker activities are still rewarded.
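To make the two distribution strategies concrete, here is a minimal sketch of the blended weighting. The linear blend and the `alpha` parameter are illustrative assumptions, not part of the plan above: `alpha=1.0` rewards activity only (former holders still qualify), `alpha=0.0` rewards holdings only.

```python
def sharedrop_share(drop_total, pct_activity, pct_holdings, alpha=0.5):
    """Blend a user's share of market activity with their share of
    current holdings into one sharedrop payout.

    alpha is an assumed tuning knob: 1.0 = activity only,
    0.0 = holdings only, values between blend the two.
    """
    weight = alpha * pct_activity + (1.0 - alpha) * pct_holdings
    return drop_total * weight
```

For example, a user with 10% of market activity but only 2% of current holdings receives 10% of the drop under the activity-only strategy, and 2% under the holdings-only strategy.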

Acknowledged proposed MR design limitations

  • Chained MR jobs instead of a single MR job. A partitioner/sorter could potentially merge the two jobs into one.
  • The current plan treats buyers and sellers identically; however, we could add a user input for selecting a strategy (incentivize only buyers, only sellers, or both).
  • Not production MR code, simply a pseudocode plan.
  • If a 'market-maker' no longer holds the asset at distribution time, a holdings-weighted strategy would exclude them.

User input variables

  • Timestamp range for market history
  • Trading pair - bitUSD:BTS
  • Reference asset - bitUSD

Pre-MR Data steps

Note: This will require interacting with the full_node client (with the History_API plugin enabled) over websockets, as the cli_wallet has insufficient commands available to it and I don't believe you can authenticate over HTTP remote procedure calls.

Example websocket commands:

Documentation: Github wiki docs, Bitshares.eu wiki docs

Note: The API docs do not have example output, so you'll need to run them before understanding their full output.

Login
> {"id":2,"method":"call","params":[1,"login",["",""]]}
< {"id":2,"jsonrpc":"2.0","result":true}
Get asset holder count
> {"id":1, "method":"call", "params":[2,"get_asset_holders_count",["1.3.0"]]}
< {"id":1,"jsonrpc":"2.0","result":24085}
Get asset holder accounts & asset holdings (10 instead of 24085 for a simple example)

Acquire list of asset holders -> Output to text file 'asset_holders.json'

> {"id":1, "method":"call", "params":[2,"get_asset_holders",["1.3.0", 0, 10]]}
< {"id":1,"jsonrpc":"2.0","result":[
    {"name":"poloniexcoldstorage","account_id":"1.2.24484","amount":"29000120286608"},
    {"name":"chbts","account_id":"1.2.224015","amount":"21323905140061"},
    {"name":"yunbi-cold-wallet","account_id":"1.2.29211","amount":"14000000042195"},
    {"name":"bitcc-bts-cold","account_id":"1.2.152313","amount":"10943523959939"},
    {"name":"yunbi-granary","account_id":"1.2.170580","amount":"10000000048617"},
    {"name":"jubi-bts","account_id":"1.2.253310","amount":"6992157568429"},
    {"name":"bittrex-deposit","account_id":"1.2.22583","amount":"6843227690310"},
    {"name":"btschbtc","account_id":"1.2.224081","amount":"5000098977059"},
    {"name":"bterdeposit","account_id":"1.2.9173","amount":"2195728656599"},
    {"name":"aloha","account_id":"1.2.10946","amount":"2061578333527"}]}
Dump each asset holder's transaction history to a JSON file on disk

Note: This stage doesn't require websockets and can be performed using the web RPC endpoint.

curl --data '{"jsonrpc": "2.0", "method": "get_account_history", "params": ["customminer", "1000"], "id": 1}' http://127.0.0.1:8092/rpc > customminer_account_history.json
Finally merge the files

Merge the many per-account JSON files into one large JSON file containing all asset holders' transaction histories (potentially using the Unix jq program).
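As an alternative to jq, the merge can also be scripted. This is a minimal sketch that assumes the `<account>_account_history.json` naming from the curl example above; the keyed-by-account output layout is my own choice:

```python
import glob
import json
import os

def merge_history_files(pattern, out_path):
    """Combine per-account '*_account_history.json' dumps into a
    single JSON file keyed by account name.

    The file-naming convention is assumed from the curl example
    ('customminer_account_history.json').
    """
    merged = {}
    for path in sorted(glob.glob(pattern)):
        account = os.path.basename(path).replace("_account_history.json", "")
        with open(path) as fh:
            merged[account] = json.load(fh)
    with open(out_path, "w") as fh:
        json.dump(merged, fh)
    return merged
```

Usage: `merge_history_files("dumps/*_account_history.json", "all_histories.json")`.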

Websocket clients

I've been looking into this, and I don't believe you can automate wscat or dump its command output to disk, so a simple bash script is out of the question. I've narrowed my preference down to either Haskell or Node.js.
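Whichever client language is chosen, the frames it must send are just the JSON-RPC "call" envelopes shown in the examples above. A small payload builder can be sketched independently of the transport (the auto-incrementing request id is an illustrative choice):

```python
import json
from itertools import count

_ids = count(1)  # each request gets a fresh JSON-RPC id

def rpc_call(api_id, method, params):
    """Build the JSON-RPC 'call' envelope used in the websocket
    examples above (api_id 1 = login, api_id 2 = the asset/history
    API in those examples)."""
    return json.dumps({"id": next(_ids),
                       "method": "call",
                       "params": [api_id, method, params]})
```

For example, `rpc_call(1, "login", ["", ""])` reproduces the login frame shown earlier, apart from the id counter.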

Map Phase 1

  • Import user-variable: Timestamp range
    • Disregard any transaction outside the timestamp range!
  • Filter each entry in the transaction history json file for: Filled Order (FO)
  • For each parsed FO:
    • Extract trade participants (buyer & seller)
    • Extract trade amount
    • Counter: Overall_traded_currency
  • Produce key: User1_User2 //buyer_seller
  • Produce value: User1TradeAmount_User2TradeAmount
  • End Map phase, outputting the key and value pair towards the reducer.
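The steps above can be sketched as a plain function (a real Hadoop job would emit to stdout and use a counter for Overall_traded_currency). The record field names (`op`, `timestamp`, `buyer`, `seller`, `buy_amount`, `sell_amount`) are assumptions about the dumped history format, not the real Bitshares schema:

```python
def map_phase1(history_records, ts_from, ts_to):
    """Map phase 1: emit (buyer_seller, buyAmount_sellAmount) pairs
    for every filled order inside the timestamp range, and tally
    Overall_traded_currency (a Hadoop counter in a real job)."""
    overall_traded = 0
    pairs = []
    for rec in history_records:
        if rec.get("op") != "fill_order":
            continue  # filter: Filled Orders only (assumed op name)
        if not (ts_from <= rec["timestamp"] <= ts_to):
            continue  # disregard anything outside the range
        overall_traded += rec["buy_amount"]
        key = "{}_{}".format(rec["buyer"], rec["seller"])
        value = "{}_{}".format(rec["buy_amount"], rec["sell_amount"])
        pairs.append((key, value))
    return pairs, overall_traded
```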

Reduce Phase 1

Note: Within the 1st reduce phase we have every occurrence of trades between a given user1_user2 pair, but not the reverse pairing user2_user1. Splitting these out per user will require a second reduce job or changing the logic of the first MR task.

  • Input key & value pair from the mapper.
  • Output to text file 'participants.txt'
    • User1:buy_amount
    • User2:sell_amount

Map phase 2

  • For each row in 'participants.txt' file:
    • Split row on ':'
    • Key = Username
    • Value = buy/sell amount

Partitioner

  • Sort alphabetically on key so as to send identical username <k,v> pairs to the same reducer.

Reduce phase 2

  • Sum the buy & sell values (amount, not trading value) for each user.
    • Divide the total by the 'Overall_traded_currency' counter from MR phase 1. This gives each user's percentage of total market activity.
    • Output to text file 'end_data.txt'
      • username summed_trading_value percent_total_market_activity
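The second map and reduce phases are simple enough to sketch together; the two-decimal percentage formatting in the output row is an assumption:

```python
def map_phase2(line):
    """Map phase 2: split a participants.txt row on ':' into
    (username, amount)."""
    user, amount = line.strip().split(":")
    return user, int(amount)

def reduce_phase2(user, amounts, overall_traded):
    """Reduce phase 2: sum one user's buy and sell amounts and express
    them as a percentage of Overall_traded_currency, producing one
    'username summed_trading_value percent_total_market_activity'
    row for end_data.txt."""
    total = sum(amounts)
    pct = 100.0 * total / overall_traded
    return "{} {} {:.2f}".format(user, total, pct)
```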

Any input? Please do comment below!

Do you have an idea for a Map Reduce program for Bitshares or Steemit? I'd love to hear about it!

An alternative to fetching the list of asset holders and each one's full account history would be to dump the transactions from every block within a user-supplied time range, if possible.

Best regards,
@cm-steem


Please, write more stuff like this!

Good post, very informative and useful to readers. This is really of great benefit to the Steem community. I will continue to follow your posts.

I've seen this post where someone wrote a map reduce based algorithm to check steem statistics.
Check here:
https://steemit.com/steem/@void/steemreduce-you-personal-mapreduce-for-steem

Map reduce can be a powerful mechanism.
Good luck with your studies.

Maybe you also want to have a quick look at Apache

Very good information, very useful for me. Thanks I am waiting for the next information from you @cm-steem. I am new in steemit, please help me and guide me to be able to quickly understand steemit. Please follow me @siren7 and give good suggestions for each of my posts

@cm-steem I speak for those who don't understand any of these programming terms. I'm not a programmer but I definitely enjoyed the short article. I am a crypto economics A.I. enthusiast and hope I can live comfortably off internet revenue streams. Maybe programming is the way out? Thanks for sharing!

Thank you for sharing this interesting article. I'm a Business Intelligence student and I like Hadoop!

This is nice article . I really love this article

The baby elephant in the logo is so cute :-)

This brainstorming really is awesome.

Thanks for sharing the knowledge.

Thanks @cm-steem for sharing. it is informative

Regards
@nandikafuture

Good information!

wonderful blog, I upvoted you and followed you

This is very good post!

Thanks for this great information.

Thank you for posting. All information is usable .

You have my upvote.

Would love to think about exploring some use for such data as another way for users to get incentives / rewards for some part(s) of market participation.
