RE: Some limitations that I probably should have mentioned.

in #photomag, 7 years ago

Currently I'm working on a TinEye equivalent for Steemit that searches for duplicate images and, in the future, similar images too. Would you be interested? I'm pretty new to programming (2-3 years as a hobby).

Will it present better results than Google image search?
What exactly are you trying to achieve?

Well, the plan is to use it to check whether a picture has already been uploaded or used on Steemit.

It could be more exact than Google in certain cases because the approach is different. I calculate an image hash (p-hash in my case), like TinEye. Google's approach is not public, but it probably uses machine learning and pattern matching. From my own experience, the algorithm I'm using is very fast: approx. 1-2 s on a 1 GHz single-core CPU to compute one hash, plus approx. 0.2 s to list similar hashes from a database of approx. 0.5 million hashes (the database is subject to change and is still missing many pictures from Steemit). The downside is that I can only find identical and slightly edited pictures, whereas Google is very good at finding similar pictures thanks to machine learning. Note that I'm doing this project just for fun, to learn database handling, image processing and multiprocessing.

It sounds like a great project. I didn't mean to criticize you, just wondered about the details. Even if it is not growing into something big, it will still be a great project to work on and learn from.

Finding similar images would be key though, as people tend to adjust 'stolen' images a little to make them look like their own.

Well, it works to a certain extent. At the moment the problem is the database structure: the hashes are saved in SQL, where I can only check whether they are exactly the same. To look for similar images I would need to compute the Hamming distance against every other hash in the database, which is very slow. Therefore I need to experiment with b-trees.
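To make the Hamming-distance problem concrete, here is a minimal sketch. It is not the code from this project: it assumes hashes are stored as plain integers, and it uses a BK-tree, which is one commonly used structure for "everything within N bits" lookups and is a different thing from the B-tree index a SQL database uses for exact matches. The class name `BKTree` and the sample hash values are made up for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Metric tree keyed on Hamming distance (illustrative helper, not from the thread)."""

    def __init__(self):
        self.root = None  # each node is [hash_value, {edge_distance: child_node}]

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = [h, {}]
            return
        node = self.root
        while True:
            d = hamming(h, node[0])
            if d == 0:
                return  # identical hash already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = [h, {}]
                return
            node = child

    def search(self, h: int, max_dist: int):
        """Return sorted (distance, hash) pairs within max_dist bits of h."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: a subtree can only contain matches if its
            # edge distance lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for stored in (0b10110000, 0b10110001, 0b01001111, 0b11111111):
    tree.add(stored)

# Finds the two stored hashes that differ from the query by at most 2 bits.
matches = tree.search(0b10110011, max_dist=2)
```

The search skips whole subtrees instead of comparing against every stored hash, which is where the speedup over a full scan would come from. How much it helps depends on the distance threshold and on how the hashes are distributed.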

Not sure if it is possible as I have not tried anything related before. But if you could just save the middle of the image somehow, you might be able to make a good comparison.


Well, to produce a hash, images are scaled down to a picture whose pixel count is a power of 2 (64 pixels being the smallest with good results). Before and after resizing, certain operations are applied to get better results; depending on the operations used, the accuracy and the time to compute change. Sample operations are conversion to grayscale, discrete wavelet transform, discrete cosine transform, etc. There is a Python library that I am using: https://github.com/JohannesBuchner/imagehash
On that GitHub repo there are also links to web pages on how these algorithms work and how effective they are.
Due to these algorithms, slight changes like JPEG compression artifacts, rescaling and slight cropping do not affect the hash that much. Cropping still affects the hash the most; I'll try out whether your idea or similar techniques work.
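The "scale down, then turn pixels into bits" idea can be sketched in a few lines of plain Python. Note that this is an average hash, a simpler relative of the p-hash mentioned in this thread (no cosine transform), and it is not the imagehash library's implementation. The image is assumed to be a 2D list of grayscale values at least 8 pixels wide and tall; the function names and the synthetic test image are made up for illustration.

```python
def downscale(gray, size=8):
    """Shrink a 2D grayscale grid to size x size by averaging blocks of pixels."""
    h, w = len(gray), len(gray[0])
    small = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(gray, size=8):
    """Return a size*size-bit integer: one bit per downscaled pixel, set if above the mean."""
    pixels = [p for row in downscale(gray, size) for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


# Synthetic 32x32 gradient standing in for a real photo.
original = [[x * 4 + y * 3 for x in range(32)] for y in range(32)]
# The same picture, uniformly brightened by 10 grey levels.
brighter = [[p + 10 for p in row] for row in original]
# The same picture with inverted grey levels.
inverted = [[255 - p for p in row] for row in original]

same_hash = average_hash(original) == average_hash(brighter)
different_hash = average_hash(original) != average_hash(inverted)
```

Because every bit only records whether a region is brighter than the image's own average, a uniform brightness change leaves the hash untouched, while a genuinely different picture produces different bits.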

Keep in mind I am just thinking out loud here. My idea is that changes made to an image mostly happen at the top and bottom. If it is possible to check just some area in the middle, you could find both identical and adjusted versions.
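A rough sketch of that "only look at the middle" idea, assuming the image is a 2D list of grayscale values and that the edit (say, a caption bar) does not change the image size. The function name `center_crop` and the `keep` parameter are made up for illustration; the cropped region would then be fed into whatever hash is used.

```python
def center_crop(gray, keep=0.5):
    """Return the middle `keep` fraction of rows and columns of a 2D pixel grid."""
    h, w = len(gray), len(gray[0])
    ch, cw = max(1, int(h * keep)), max(1, int(w * keep))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in gray[top:top + ch]]


# Synthetic 32x32 gradient standing in for a real photo.
original = [[x * 4 + y * 3 for x in range(32)] for y in range(32)]

# Edited copy: the top 4 and bottom 4 rows are blacked out, e.g. by a caption bar.
edited = [row[:] for row in original]
for y in list(range(4)) + list(range(28, 32)):
    edited[y] = [0] * 32

whole_differs = original != edited
same_middle = center_crop(original) == center_crop(edited)
```

This only helps while the middle stays in the same place. If the edit crops the picture, the centre region shifts, so cropping would still need separate handling.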