Write a Steemit Web App: Part 9 - Retrieving Content with getDiscussionsBy*()

in #steemdev, 7 years ago (edited)

(Previous Post: Part 8)

Previously, I introduced the getState() API function that is used by the Condenser app (a.k.a., Steemit.com) to retrieve content for a given URL path, such as 'trending/steemdev'. That function is great for getting a page's worth of posts and other information, such as Account objects and the current feed price, all in one call. But to retrieve more than just the first 20 posts, you must use a different API function.

Today, we'll look into the various getDiscussionsBy* functions.

The get_discussions_by API Functions

A lot of the documentation for the Steem API only exists in the form of source code - that's part of the reason why I started this series of posts (to document what I find as I explore the source code while trying to figure out how the API works). The database_api header file is a good place to start, since header files tend to be better documented than the rest of the source.

In that file, you'll find a group of functions with the same signature:

vector<discussion> get_discussions_by_payout(const discussion_query& query )const;
vector<discussion> get_discussions_by_trending( const discussion_query& query )const;
vector<discussion> get_discussions_by_created( const discussion_query& query )const;
vector<discussion> get_discussions_by_active( const discussion_query& query )const;
vector<discussion> get_discussions_by_cashout( const discussion_query& query )const;
vector<discussion> get_discussions_by_votes( const discussion_query& query )const;
vector<discussion> get_discussions_by_children( const discussion_query& query )const;
vector<discussion> get_discussions_by_hot( const discussion_query& query )const;
vector<discussion> get_discussions_by_feed( const discussion_query& query )const;
vector<discussion> get_discussions_by_blog( const discussion_query& query )const;
vector<discussion> get_discussions_by_comments( const discussion_query& query )const;
vector<discussion> get_discussions_by_promoted( const discussion_query& query )const;


Note: In the C++ API, snake_case is used, while Steem.js uses camelCase. Simply remove the underscores and capitalize the first letter of each subsequent word to figure out what the function name needs to be for JavaScript.
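As a small illustration of that naming rule, here is a sketch of a helper that does the conversion. The name snakeToCamel is my own for this example and is not part of Steem.js:

```javascript
// Convert a snake_case C++ API name into the camelCase form that Steem.js uses.
// Each underscore followed by a lowercase letter becomes that letter in uppercase.
function snakeToCamel(name) {
  return name.replace(/_([a-z])/g, (match, letter) => letter.toUpperCase())
}

console.log(snakeToCamel('get_discussions_by_trending'))
// prints getDiscussionsByTrending
```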

So, let's say that you used getState('trending/steemdev') to retrieve the first 20 posts, and you want to fetch the next 20. How would you do that with one of these API functions?

First, you need to know which database index you were really using for the getState() call. From a pure data point of view, you can check the data returned by getState() to see which array under discussion_idx is populated.
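A rough sketch of that check is shown here. The populatedIndexes helper and the trimmed-down sampleState object are made up for illustration; a real getState() result contains many more properties:

```javascript
// Return the names of the indexes under discussion_idx[tag] that contain posts.
function populatedIndexes(state, tag) {
  const byTag = (state.discussion_idx || {})[tag] || {}
  return Object.keys(byTag).filter(
    key => Array.isArray(byTag[key]) && byTag[key].length > 0
  )
}

// Hypothetical, heavily trimmed stand-in for a getState('trending/steemdev') result
const sampleState = {
  discussion_idx: {
    steemdev: {
      category: '',
      trending: ['almost-digital/dsteem-playground'],
      created: [],
      hot: []
    }
  }
}

console.log(populatedIndexes(sampleState, 'steemdev'))
```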

In this case, it was 'trending', so it's pretty easy to figure out that you probably want the get_discussions_by_trending(query) function (or, rather, getDiscussionsByTrending(query) in JavaScript). But, what's that query argument all about?

discussion_query Struct

Again, referring to the header file, we see that discussion_query is defined as:

/**
 *  Defines the arguments to a query as a struct so it can be easily extended
 */
struct discussion_query {
   void validate()const{
      FC_ASSERT( filter_tags.find(tag) == filter_tags.end() );
      FC_ASSERT( limit <= 100 );
   }

   string           tag;
   uint32_t         limit = 0;
   set<string>      filter_tags;
   set<string>      select_authors;     ///< list of authors to include, posts not by this author are filtered
   set<string>      select_tags;        ///< list of tags to include, posts without these tags are filtered
   uint32_t         truncate_body = 0;  ///< the number of bytes of the post body to return, 0 for all
   optional<string> start_author;
   optional<string> start_permlink;
   optional<string> parent_author;
   optional<string> parent_permlink;
};


Essentially, this is a structure that is used by a lot of the API functions, but functions only take what they need, so some properties will be ignored even if you provided data.
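Since validate() asserts on the server, it can be handy to catch the same two problems before sending a query. The following is only a sketch that mirrors the two FC_ASSERT lines above; checkQuery is a hypothetical helper, and it treats filter_tags as a plain JavaScript array:

```javascript
// Mirror the two assertions in discussion_query::validate() on the client side.
// Returns a list of problems; an empty list means both checks passed.
function checkQuery(query) {
  const problems = []
  const filterTags = query.filter_tags || []

  // FC_ASSERT( filter_tags.find(tag) == filter_tags.end() )
  if (filterTags.includes(query.tag)) {
    problems.push('tag must not also appear in filter_tags')
  }

  // FC_ASSERT( limit <= 100 )
  if ((query.limit || 0) > 100) {
    problems.push('limit must be 100 or less')
  }

  return problems
}

console.log(checkQuery({ tag: 'steemdev', limit: 20 }))
console.log(checkQuery({ tag: 'steemdev', limit: 500, filter_tags: ['steemdev'] }))
```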

A JavaScript Example

Here's an example of how a call into getState() can be followed by a call into getDiscussionsByTrending():

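// Assumes the Steem.js `steem` object and lodash `_` are already loaded
let index = 'trending'
let tag = 'steemdev'

steem.api.getStateAsync(`${index}/${tag}`)
  .then(function (o) {
    // Sorted list of "author/permlink" keys for this tag and index
    let posts = o.discussion_idx[tag][index]
    // Look up the full post object for the last key on the page
    let last = o.content[_.last(posts)]

    let query = {
      tag: tag,
      limit: 20,
      start_author: last.author,
      start_permlink: last.permlink
    }

    // Returning the promise lets the outer .catch() handle errors from this call too
    return steem.api.getDiscussionsByTrendingAsync(query)
      .then(r => console.log(JSON.stringify(r, null, 2)))
  })
  .catch(console.log)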


Note: As @pilcrow points out in his Angular tutorial that also introduces the getDiscussionsBy* functions, the function itself returns a promise. I'm a creature of habit, so that's why my code shows *Async() in the function name because it explicitly tells me that the function was promisified by Bluebird.js under the covers (and that's a Bluebird convention). You'll get the same result with or without Async in the function name if you follow the function call with a .then().

So, what's happening here? Well, first we get the top 20 posts of trending/steemdev using getState(). The sorted list of posts will be in the .discussion_idx.steemdev.trending property (an array of "author/permlink" strings):

"discussion_idx": {
  "steemdev": {
    "category": "",
    "trending": [
      "good-karma/esteem-filters-community-input-om0vmqj9sw",
      "ausbitbank/steemvids-alpha-update",
      "steemreports/steemreports-outgoing-votes-analysis-tool",
      "ontofractal/glasnost-v0-12-released-now-with-postgresql-realtime-and-7-day-lookback-comments-data-sync-open-source-app-server-for-steem",
      "morning/just-another-wordpress-steem-plugin-also-introducing-steemeasy-com",
      "good-karma/good-karma-witness-update-22nd-july-2017-ulf0cx9y6o",
      "almost-digital/dsteem-playground",
      "rycharde/proposal-for-new-rules-regarding-self-votes-and-voting-rings",
      "adept/steemh-com-a-hacker-news-styled-interface-for-steemit",
      "steepshot/the-practice-of-programming-using-graphene-based-blockchains",
      "davidk/steemphp-v0-2-released",
      "djvidov/osteem-chrome-extension-it-s-alive-and-need-20-40-alpha-testers",
      "good-karma/esteem-calling-for-volunteer-translators-16-get-reward-lqpst47n77",
      "calamus056/extensive-curation-stats-overview-since-hf19-june-20th-2017",
      "dez1337/steemj-v0-3-1-has-been-released-update-13",
      "recrypto/wordpress-steem-1-0-2",
      "klye/klye-witness-update-07-22-2017",
      "freyman/esteem-steem-no-mobil-preguntas-frecuentes-faq",
      "djvidov/osteem-first-alpha-version-its-almost-ready-its-time-to-create-an-chrome-developer-account",
      "jfollas/write-a-steemit-web-app-part-8-retrieving-content-with-getstate"
    ],
    "payout": [],
    "payout_comments": [],
    "trending30": [],
    "updated": [],
    "created": [],
    "responses": [],
    "active": [],
    "votes": [],
    "maturing": [],
    "best": [],
    "hot": [],
    "promoted": [],
    "cashout": []
  }
}


From this array, the code picks the last one to use as the start_author and start_permlink for the continuation list of the next 20.

Note that the maximum limit is 100, so there's no real reason to only retrieve 20 posts at a time if you need more. Also note that the results of the getDiscussionsByTrending() call will start with the last post that you already have - so be prepared to handle duplicate data if you are going to merge the results of the two calls.
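One way to handle that overlap is to key each post by author and permlink and skip anything already seen. This is a minimal sketch; postKey and mergePosts are names I made up, and the second sample post is invented for the example:

```javascript
// Key a post the same way getState() keys its content: "author/permlink"
function postKey(post) {
  return `${post.author}/${post.permlink}`
}

// Append nextPage to existing, skipping any post that is already present.
function mergePosts(existing, nextPage) {
  const seen = new Set(existing.map(postKey))
  const merged = existing.slice()
  for (const post of nextPage) {
    const key = postKey(post)
    if (!seen.has(key)) {
      seen.add(key)
      merged.push(post)
    }
  }
  return merged
}

const firstPage = [
  { author: 'klye', permlink: 'klye-witness-update-07-22-2017' },
  { author: 'jfollas', permlink: 'write-a-steemit-web-app-part-8-retrieving-content-with-getstate' }
]
const secondPage = [
  // The continuation starts with the post we already have...
  { author: 'jfollas', permlink: 'write-a-steemit-web-app-part-8-retrieving-content-with-getstate' },
  // ...followed by new ones (hypothetical post for illustration)
  { author: 'example-author', permlink: 'example-post' }
]

console.log(mergePosts(firstPage, secondPage).map(postKey))
```

Keying on author plus permlink rather than on array position means the merge still works if the overlap turns out to be more than one post.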

Unlike getState(), the getDiscussionsByTrending() function will return just an array of posts - if you need the Author's Account metadata, etc, then you will need to make subsequent API calls to fetch that data.
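As a first step toward those follow-up calls, you could collect the distinct author names from the returned posts. The sketch below only builds that list; uniqueAuthors is a hypothetical helper, and the sample posts reuse authors from the list above with shortened, made-up permlinks:

```javascript
// Collect each author name once, in order of first appearance.
function uniqueAuthors(posts) {
  return Array.from(new Set(posts.map(post => post.author)))
}

// Hypothetical, trimmed-down posts as returned by a getDiscussionsBy* call
const samplePosts = [
  { author: 'good-karma', permlink: 'example-post-one' },
  { author: 'ausbitbank', permlink: 'example-post-two' },
  { author: 'good-karma', permlink: 'example-post-three' }
]

console.log(uniqueAuthors(samplePosts))
```

The resulting array should then be usable as the argument to an account lookup such as steem.api.getAccountsAsync(names), assuming your version of Steem.js exposes the get_accounts API function under that name.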

Also, while getState() truncates the post body at 1024 characters, getDiscussionsBy* will not truncate the body unless you provide a truncate_body value in the query struct.
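If you want previews similar to what getState() returns, you could set truncate_body on the query yourself. A minimal sketch, with withTruncatedBody being a name I made up (note that the struct comment describes the value as a number of bytes):

```javascript
// Return a copy of `query` that asks the API to cut each post body to `bytes` bytes.
// The original query object is left untouched.
function withTruncatedBody(query, bytes) {
  return Object.assign({}, query, { truncate_body: bytes })
}

const fullQuery = { tag: 'steemdev', limit: 20 }
const previewQuery = withTruncatedBody(fullQuery, 1024)

console.log(previewQuery)
```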

What about Comments?

Stay tuned - I'll cover how to retrieve (and post) Comments next time.

(Next Post: Part 10)

Your posts are very helpful, since the documentation is not very comprehensive yet. Thank you! I'm trying to build a simple jQuery plugin to display posts and profile information on websites. A new version is already on its way. Maybe you want to have a look at it. :)

Your link is broken.

I wish I had the time to experiment in interacting with steemit via API calls and (attempting) scripted routines, I'll happily settle for reading your articles with keen interest.

Some things I'd really like to see would be, for one, a tag explorer that was aware of votes/views for various tags and able to suggest the best tags to apply to an article based on a selection of keywords.

I'm really looking forward to your next post about retrieving and posting comments, it's giving me hope that there will someday be a means of checking comments at submission time to see if it's:

  • very short/generic XOR
  • clone of another comment AND
  • Nth clone of X comment in Y period

Then warning the submitter of potential reputation consequences should they choose to proceed, the second 2 conditions being by far more important. That's the other one I'd really like to see.

Not sure that out of the box API would support your type of query. You'd probably have to use one of the database implementations (blockchain/transactions stored in a searchable database) because you're looking to find matches across the comments from a multitude of posts at a time.

An API call would go to steemit from an external source, but what I'm describing probably cannot be achieved by anything other than steemit server side, essentially something along these lines (and please excuse the faux verbs/functions):

FOR user submitting a comment, return an array of comments,datetimestamp WHERE datediff(now,12H) AND string(exact(comment)), count all perfect matches of current comment, return count, time, string

user, this is the count time you have posted the exact phrase "string" in time, please consider a less generic response, steemians thrive on original content & meaningful contributions, proceeding with this comment may negatively impact your reputation.
POST     EDIT

EG:
copypastamasta, this is the 11th time you have posted the exact phrase "good post, I upvote please upvote me" in 37 minutes, please consider a less generic response, steemians thrive on original content & meaningful contributions, proceeding with this comment may negatively impact your reputation.
POST     EDIT
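Purely as a sketch of the counting step in that faux query, and assuming the comment history were already available as an array, it might look like this in JavaScript. The record shape ({ author, body, created }), the function name, and the sample data are all hypothetical:

```javascript
// Count how often `author` posted exactly `body` within the last `windowMs` milliseconds.
// `comments` is a hypothetical array of { author, body, created } records,
// where `created` is a millisecond timestamp.
function countRecentDuplicates(comments, author, body, now, windowMs) {
  return comments.filter(c =>
    c.author === author &&
    c.body === body &&
    now - c.created >= 0 &&
    now - c.created <= windowMs
  ).length
}

const HOUR = 60 * 60 * 1000
const now = Date.UTC(2017, 6, 23, 12, 0, 0)
const phrase = 'good post, I upvote please upvote me'
const history = [
  { author: 'copypastamasta', body: phrase, created: now - 10 * 60 * 1000 },
  { author: 'copypastamasta', body: phrase, created: now - 30 * 60 * 1000 },
  { author: 'copypastamasta', body: phrase, created: now - 13 * HOUR }, // outside the 12H window
  { author: 'someone-else', body: phrase, created: now - 5 * 60 * 1000 } // different user
]

console.log(countRecentDuplicates(history, 'copypastamasta', phrase, now, 12 * HOUR))
// prints 2
```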

Not something within your purview, I know, more just a hypothetical item for discussion.

Ah, I never realized *Async is just a convention for Bluebird promises. I might as well start using that too then. It makes sense that it's nice to see at a glance whether a method returns a promise or not, just like the $ suffix for Observables that many people use.

It's at least a convention for Bluebird's Promise.promisifyAll() feature to turn functions with Node-style callbacks into promises:

http://bluebirdjs.com/docs/api/promise.promisifyall.html

Great documentation! I can't wait to read all the posts in this series. I'm currently working on some NodeJS Steemit bots & articles.

Thanks for your work digging into this. One thing that I'm hitting my head against is how to retrieve all posts for a given day... right now I'm iterating through the results of get_discussions_by_created until I'm in the correct time range, but this takes an extremely long time for any dates longer than a few days ago since it has to retrieve a large amount of unnecessary discussions until we hit the correct timeframe. Do you have any ideas on how to make this more efficient?

Nice post. Upvoted and following. Please visit my blog @bikash-tutor and upvote and follow. Thank you.