Details
If you have tried MiXion, you will have noticed that some discussions display fine while others still contain raw links and Markdown fragments.
The problem is that not all posts are the same: some are pure HTML, some are pure Markdown, some are plain text, and some are a combination of HTML and Markdown.
I already have a way of parsing each type of content (Bypass for Markdown and the stock Html.fromHtml in Android), but I need a proper way to determine what exactly is in each discussion and how to handle those that are a combination.
Task 1
- Write a method or class in Kotlin or Java that determines what exactly is in the discussion and parses it accordingly.
You can play with the API and see all the types of results that we get. I found that the format tag in the JSON metadata is unreliable.
Some discussions are easy to identify based on where they come from. For example, all posts from dmania begin with <center>\n <a href=\"https://dmania.lol
and are all in HTML, all posts from dtube begin with <center><a href='https://d.tube
and all posts from utopian.io are in Markdown.
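As a starting point for Task 1, the detection step could look roughly like the sketch below. It is only a heuristic, not a finished solution: `ContentType`, `detectContentType`, and both regular expressions are made-up names and guesses, and only the two prefixes come from the observations above (written here in their unescaped form, assuming the body has already been decoded from JSON).

```kotlin
// Hypothetical classification of a discussion body; none of these names exist in the project.
enum class ContentType { HTML, MARKDOWN, MIXED, PLAIN }

// Prefixes observed for dmania and dtube posts, unescaped from their JSON form.
private val knownHtmlPrefixes = listOf(
    "<center>\n <a href=\"https://dmania.lol",
    "<center><a href='https://d.tube"
)

// Rough guess at "looks like an HTML tag": <name>, <name attr=...>, </name>, <name/>.
private val htmlTagHint = Regex("</?[a-zA-Z][a-zA-Z0-9]*(\\s[^<>]*)?/?>")

// Rough guess at "looks like Markdown": a few block markers, links/images, and bold.
private val markdownHint = Regex(
    "(^|\\n)[ \\t]{0,3}(#{1,6}\\s|[-*+]\\s|\\d+\\.\\s|>\\s|```)" +
        "|!?\\[[^\\]\\n]*\\]\\([^)\\n]*\\)" +
        "|\\*\\*[^*\\n]+\\*\\*|__[^_\\n]+__"
)

fun detectContentType(body: String): ContentType {
    // Posts from known apps are trusted to be HTML without further checks.
    if (knownHtmlPrefixes.any { body.startsWith(it) }) return ContentType.HTML

    val hasHtml = htmlTagHint.containsMatchIn(body)
    val hasMarkdown = markdownHint.containsMatchIn(body)
    return when {
        hasHtml && hasMarkdown -> ContentType.MIXED
        hasHtml -> ContentType.HTML
        hasMarkdown -> ContentType.MARKDOWN
        else -> ContentType.PLAIN
    }
}
```

The result could then decide which parser gets the body: Html.fromHtml for HTML, Bypass for MARKDOWN, and whatever strategy ends up being chosen for MIXED. The hints will misclassify some posts, so they should be tuned against real API results.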
Task 2
- Write a method or class that extracts only the human-readable text; this will be used for the feed summary.
I've already written this function to strip out all HTML tags; we need one to strip out Markdown and plain links. Maybe you could combine them both.
fun stripHtmlTags(html: String): String {
    val sbText = StringBuilder()
    val sbHtml = StringBuilder()
    var isText = true
    for (ch in html.toCharArray()) {
        if (isText) { // outside html
            if (ch != '<') {
                sbText.append(ch)
                continue
            } else { // switch mode
                isText = false
                sbHtml.append(ch)
                continue
            }
        } else { // inside html
            if (ch != '>') {
                sbHtml.append(ch)
                continue
            } else { // switch mode
                isText = true
                sbHtml.append(ch)
                continue
            }
        }
    }
    return sbText.toString()
}
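For Task 2, one possible direction is to combine tag stripping with a handful of regular expressions for Markdown and bare links. The sketch below is an assumption about what "human-readable text" should mean here: `extractReadableText` and all the patterns are hypothetical, images and fenced code are dropped entirely, and link labels are kept while their URLs are removed.

```kotlin
// Hypothetical helper for the feed summary; these names do not exist in the project.
private val fencedCode = Regex("```[\\s\\S]*?```")
private val mdImage = Regex("!\\[[^\\]]*\\]\\([^)]*\\)")
private val mdLink = Regex("\\[([^\\]]*)\\]\\([^)]*\\)")
private val anyTag = Regex("<[^>]*>")
private val bareUrl = Regex("https?://\\S+")
// Heading, blockquote, bullet, and numbered-list markers at the start of a line.
private val blockPrefix = Regex("(?m)^[ \\t]{0,3}(#{1,6}\\s+|>\\s?|[-*+]\\s+|\\d+\\.\\s+)")
// Emphasis markers; single underscores are left alone so snake_case words survive.
private val emphasis = Regex("\\*+|__|~~|`")
private val whitespaceRun = Regex("\\s+")

fun extractReadableText(body: String): String =
    body
        .replace(fencedCode, " ")   // drop fenced code blocks
        .replace(mdImage, " ")      // drop images (must run before links)
        .replace(mdLink, "\$1")     // keep the link label, drop the URL
        .replace(anyTag, " ")       // drop HTML tags
        .replace(bareUrl, " ")      // drop plain links
        .replace(blockPrefix, "")   // drop heading/list/quote markers
        .replace(emphasis, "")      // drop bold/italic/strike/code markers
        .replace(whitespaceRun, " ")
        .trim()
```

The order of the replacements matters: images go before links because an image is a link with a leading `!`, and Markdown links go before bare URLs so that the label survives. A URL followed directly by punctuation will take the punctuation with it, which is probably acceptable for a summary.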
Current work
You can find some of the work I've already done on this problem in StringUtils.java, StringExt.kt, and the entire steemitutils directory.
Communication
You can reach me on Discord at edTheGuy00.
All proceeds of this post will go to whoever completes a task, I guess.
Posted on Utopian.io - Rewarding Open Source Contributors
Technically, Markdown is supposed to support embedded HTML.
So in theory, depending on your Markdown parser, you should be able to feed everything through it and get valid HTML out, even if the content itself contains HTML.