A few months ago, I decided to begin work on my first machine learning project using TensorFlow, a powerful machine learning framework created by Google.
What resulted was the Diabetic Retinopathy screening project, which takes a retinal image, runs it through an algorithm, and gives you a pretty good idea of whether the eye is showing signs of diabetic retinopathy. Diabetic retinopathy is an ocular disease that develops as a result of diabetes, and it is one of the leading causes of blindness.
In this post, I will walk you through what each file in the project does and how to use it. All of the code in this project is written in Python, so make sure you know the basics of Python and TensorFlow before attempting this tutorial.
Disclaimer: This project and software should not be used in a real-world scenario. I am not a physician, and this is not going to definitively tell you whether you have an ocular disease. Don’t be stupid, and don’t trust your wellbeing or someone else’s wellbeing to a random computer program you found online.
Also, if you want to try out the algorithm behind this project without all the Python and stuff behind it, I would recommend you check out the retinopathy-server or retinopathy-desktop repositories, as they are much easier to use and require very minimal knowledge of Python.
Anyway, cue the tutorial.
This is a gross oversimplification of how TensorFlow and machine learning work. If you want to know how this works in detail, I suggest the TensorFlow for Poets tutorial by Google, which is available here: https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html. The algorithm and files in this repository are quite similar to the ones in that tutorial, which will also guide you through some things that I’ve decided to leave out here. A further part of this tutorial is contained in the repository, under the “documentation” directory in a file called “retrain.md”.
This algorithm uses a few different files, so let’s walk through them.
Definitions
arch.py — this is one of the main files. It takes a list of retinal images (contained in a CSV file) and runs the algorithm (contained in labelimage.py) to classify the images as either showing signs of diabetic retinopathy, or clear of it.
labelimage.py (not to be confused with label_image.py) — subroutine that takes an image and runs it through the graph file (retrained_graph.pb) in order to classify it.
retrained_graph.pb — the heart of the algorithm. This contains the trained model that TensorFlow uses to classify an image. Open it, and you’ll find a bunch of binary stuff. (If you want to retrain the algorithm, see retrain.md in the repository.)
label_image.py — this file allows you to classify an image without preparing a CSV list. It’s helpful if you want to just test one file rather than something like twenty.
retrain.py — see retrain.md in the repository.
dataset.csv — this file is the “answer key”, as it contains the clinician-graded classifications of each image, on a scale from 1–4.
dataindex.csv — this is the file that arch.py uses to see what files to run through the algorithm.
retrained_labels.txt — this is the list of classifications that the algorithm could assign to a single image. (Don’t modify this file without retraining retrained_graph.pb. If you want to know how to do that, see retrain.md in the repository.)
bottlenecks/ — this is a TensorFlow system area. For the purposes of this tutorial, it should be left alone. If you want to learn more about what’s in here, see the TensorFlow for Poets tutorial by Google.
logs/ — see the README.md contained in that directory.
inception/ — this contains Inception, the pretrained image-recognition model that TensorFlow uses to make sense of the image data it receives.
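To make the roles of retrained_graph.pb and retrained_labels.txt a bit more concrete, here is a minimal sketch of the last step of classification: pairing each label with the score the graph produced for it and ranking the results. This is not the code from labelimage.py. The label names and scores are made up for illustration, and rank_labels is a hypothetical helper.

```python
def rank_labels(labels, scores):
    """Pair each label with its score and sort from most to least likely."""
    if len(labels) != len(scores):
        raise ValueError("expected exactly one score per label")
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical contents of retrained_labels.txt, one label per line.
labels = ["retinopathy", "clear"]
# Made-up scores standing in for what the graph returns for one image.
scores = [0.82, 0.18]

ranked = rank_labels(labels, scores)
for label, score in ranked:
    print("%s (score = %.5f)" % (label, score))
```

The first entry of the ranked list is the classification you would report for the image. This is also why retrained_labels.txt has to stay in sync with the graph: the scores only make sense if they line up with the labels in the same order.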
Quick start
Install Python and TensorFlow. (You can find a tutorial on the TensorFlow website on how to do this.) In the course of developing this project, we used a Python 2.7 install with the corresponding TensorFlow install for Python 2.7.
Clone/download this repository and the dataset. (Dataset here: https://www.kaggle.com/c/diabetic-retinopathy-detection/data. You will need all of the “train.00x.zip” files, the training_labels.zip file, and some method to separate the images into different directories based on that master list. You can use our sorting script, which is available here: https://gist.github.com/javathunderman/6a9f9c8a0ee29cb7a817777421685e49. A smaller, curated version of this dataset is available at https://github.com/Nomikxyz/retinopathy-dataset.)
Fill out dataindex.csv with the file paths of the images you want to test. (Unless you have a system with a lot of resources on tap, don’t go over 10 images in a single batch. It tends to overload the graph file and cause all sorts of issues.)
Run “python arch.py”, and it will start printing out the results of each image. It will also automatically generate a log file in the root directory of the repository to store the output of a batch. If you plan on saving the contents of that log file, ensure that you rename it something different so that it is not overwritten by the next batch of output.
Profit. If you want to go further, check out retrain.md in the repository for information on how to retrain the algorithm on your own training set, or check out the TensorFlow for Poets tutorial for more information.
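The batch-size and log-file advice above can be sketched roughly as follows. This is only an illustration, written for Python 3 rather than the Python 2.7 used for the project. It assumes dataindex.csv holds one image path in the first column of each row, which may not match the real layout. read_image_paths, batches, and archive_log are hypothetical helpers that are not part of the repository.

```python
import csv
import os
import shutil
import time


def read_image_paths(index_path):
    """Return the non-empty first-column entries of a CSV index file."""
    with open(index_path, newline="") as handle:
        return [row[0].strip() for row in csv.reader(handle)
                if row and row[0].strip()]


def batches(items, size=10):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def archive_log(log_path):
    """Rename a log to a timestamped name so the next run cannot overwrite it."""
    root, ext = os.path.splitext(log_path)
    target = "%s-%s%s" % (root, time.strftime("%Y%m%d-%H%M%S"), ext)
    shutil.move(log_path, target)
    return target


# Hypothetical file names, just to show how 23 images would be split up.
paths = ["img_%02d.jpeg" % i for i in range(23)]
print([len(batch) for batch in batches(paths)])  # prints [10, 10, 3]
```

In practice you would write one batch at a time into dataindex.csv, run arch.py, and then call archive_log on whatever log file that run produced before starting the next batch.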
Thanks for reading. If you have any questions, tweet me @javathunderman.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc. Code included in this tutorial is subject to several licenses. See LICENSE.md in the repository for more information.
(This is a repost from last year)