My open source anti-plagiarism steemit bot

in #steemit, 8 years ago (edited)

In response to @cryptoctopus's request, I spent the whole of last week making an anti-plagiarism bot for steemit. Unfortunately for me, I was too late to claim the 500-odd steem $ bounty.

Never mind though; I had a feeling I'd be beaten, as I'm not really a programmer.

Here is the bot!


Source: publicdomainpictures.net

And what it looks like:


The famous steem API tool Piston:

Piston Bong by XrNiX (changes made) (CC BY 3.0)

It's also a quick and easy way to read through new steemit posts without the distractions of pictures, reaction GIFs, headers and other styling, and without waiting for pages to load or navigating the website.

It was fun learning to code!

I realise my code is probably unwieldy (it's all on one page) and needs to be refactored for others' sanity. Maybe I could have made better use of Python's built-in functions rather than relying on so many regular expressions.


I promise in my next project I will make use of functions and classes.

If you want to point out any glaring errors I'd really appreciate the feedback!

Comments

I try to explain what's going on in most of the code so check it out!

What does it do?

It scans newly created posts (and displays them in the console), picks 4 random chunks of words from each, and searches @bunix's YaCy index for them as 'exact phrase' queries. If a search hit comes up and the author is different, the full content of the two posts is compared (minus some formatting). If the match is at least 50%, the bot makes a comment linking to the matching article and showing the percentage of the match.
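The steps above can be sketched roughly as follows. This is not the bot's actual code: the function names, the chunk length of 8 words, and the use of `difflib.SequenceMatcher` for the comparison are all assumptions made for illustration, and the YaCy search call and the Piston comment call are left out entirely.

```python
import difflib
import random
import re


def strip_formatting(text):
    """Roughly remove markdown/HTML styling and normalise whitespace and case."""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)     # drop markdown images
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # keep only link text
    text = re.sub(r"<[^>]+>", " ", text)                  # drop HTML tags
    text = re.sub(r"[#*_>`]", " ", text)                  # drop styling characters
    return " ".join(text.split()).lower()


def random_chunks(text, count=4, words_per_chunk=8, rng=random):
    """Pick `count` random runs of words, quoted for an exact-phrase search.

    words_per_chunk=8 is a hypothetical choice; the post does not say
    how long each chunk is.
    """
    words = strip_formatting(text).split()
    if not words:
        return []
    if len(words) <= words_per_chunk:
        return ['"' + " ".join(words) + '"']
    last_start = len(words) - words_per_chunk
    starts = [rng.randrange(last_start + 1) for _ in range(count)]
    return ['"' + " ".join(words[s:s + words_per_chunk]) + '"' for s in starts]


def match_percentage(post_a, post_b):
    """Word-level similarity of two posts, from 0 to 100."""
    a = strip_formatting(post_a).split()
    b = strip_formatting(post_b).split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return 100 * matcher.ratio()


def should_comment(new_post, new_author, hit_post, hit_author, threshold=50):
    """Comment only when the authors differ and the match reaches the threshold."""
    if new_author == hit_author:
        return False
    return match_percentage(new_post, hit_post) >= threshold
```

Comparing word lists rather than raw characters keeps the percentage from being inflated by common letters, and `autojunk=False` stops `difflib` from ignoring frequent words in long posts.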


What doesn't it do?

Doesn't search the whole web

I realise that a smarter anti-plagiarism bot might search the entire web using Google, Bing, or Faroo thus finding the original source of the content.


Most frustratingly, though, I just couldn't navigate Google's search API or get any example Python code to work (most of it was written for Python 2.7). Faroo requires manual verification to get access to its API (I'm still waiting for access).


In the end I decided it would be an interesting enough experiment to catch out spammers who copy content from other steemit peers.

Motivations

Although the bounty was certainly one motivation, learning to code has always been on my to-do list. I was also getting annoyed at the amount of flagrant plagiarism in the new category, so @cryptoctopus's post resonated with me. This is my first potentially useful Python program!

I learnt how to use regular expressions (a bit) and debug Python errors


Image: Copyright © 2013 by TWiki.org

Resources I used

My reliance on pythex.org was extensive. They make practising regular expressions fun!

Piston, obviously :) thanks to @xeroc for all your help!

The Python documentation

TutorialsPoint.com

StackExchange

IRC freenode #python channel

Requirements

You'll need Python 3, some modules installed with pip and, most importantly, Piston.

Note: Starting with Python 3.4, pip is included by default with the Python binary installers.

Bugs

Some random chunks should come up in the search results but don't. It could be down to the way YaCy indexes posts, but I'm just guessing.

Another bug (albeit a minor one) is that when a match is found, the bot continuously prints '.Percentage difference:', which is pretty annoying. Does anyone know how to fix it?

Also, it doesn't shut down cleanly when you exit (Ctrl+C), so you need to mash Ctrl+C a few times.
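One common cause of this symptom in Python scripts is a bare `except:` inside the main loop, which also swallows the `KeyboardInterrupt` raised by Ctrl+C. Whether that is what happens in this bot is an assumption, not something confirmed from its source. A minimal sketch of a loop that survives ordinary errors but still stops on the first Ctrl+C might look like this (`run_bot` and `process_next_post` are hypothetical names):

```python
import time


def run_bot(process_next_post, poll_seconds=1.0, sleep=time.sleep):
    """Call process_next_post() until Ctrl+C; return how many posts were handled."""
    handled = 0
    try:
        while True:
            try:
                process_next_post()
                handled += 1
            except Exception as error:
                # `except Exception` does not catch KeyboardInterrupt,
                # so Ctrl+C still reaches the outer handler below.
                print("Skipping post after error:", error)
            sleep(poll_seconds)
    except KeyboardInterrupt:
        print("Stopping after", handled, "posts")
    return handled
```

The key point is that `KeyboardInterrupt` derives from `BaseException` rather than `Exception`, so narrowing the inner handler is enough to let a single Ctrl+C through.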

I'm sure there are many more that I haven't discovered yet, so please open an issue on the GitHub page if you find one!

What's the next project?... an intelligent upvote bot!

I'd like to automate my curation duties using an upvote bot. I really find this AI social media stuff fascinating! If you know of any projects that do this already please shar


Here are some other Bots
https://steemit.com/steemit/@marsresident/bots-and-the-steemit-ecosystem

If you keep making bots you will probably earn a good amount of STEEM. People sometimes upvote bot comments without even realizing it. Wang, for example, has over $1,000,000.

Can confirm... Wang fooled me until I noticed him EVERYWHERE.

Thanks - this is really great!

I really would like to see how projects like this and cheetah bot play out a few months down the road.

Good job trying! Coding is way too daunting for me to try.

Sorry! You were a little late. I hope you've learned something useful in the process. :-)

I certainly did! Thanks for your original post that got me started.

It's good to have multiple. You and cheetah bot together could work nicely.

Maybe just add something to check for a post by cheetahbot before posting, essentially splitting the workload if cheetahbot did the same? Either way kudos on the effort and maybe the experience will pay off when the next bot bounty gets thrown down.

Thank you for your efforts; I cannot say anything more since I have no clue how you approached or achieved your program's goals.
Thanks again, and good luck on your next learning project...

Thanks for sharing and going into detail on your programming journey :)

Best wishes on its continued progress!

Good job!!!

Well done, good luck with next project

Thanks!

Thank you, great research. I'll have to reread it.

Good work. Thanks for sharing!

Well done!

Interested to hear more, thanks for sharing! :)

Hell yeah! All my content is original. I went to Area 51 to explain my ideas.

https://steemit.com/area/@steve-mcclair/area-51-steemit-has-arrived