No matter how anonymous you think your tweets are, the huge amount of metadata that Twitter stores means that you, like almost any other user, can be identified with great precision, as researchers at the Alan Turing Institute in London have shown in a new study.
Applying a supervised machine learning algorithm, the researchers were able to identify any user in a group of 10,000 Twitter users with approximately 96.7% accuracy.
The problem: metadata is not considered sensitive information
Metadata is attached to most of the information we produce every day in our interactions and communications in the digital world. Surprisingly, that information is not considered "sensitive".
While most efforts focus on identifying a user from the content of their messages, metadata is much more effective at determining which information belongs to a particular user. Although this research used Twitter for its tests, the problem applies to other social networks.
The metadata that Twitter stores is publicly accessible
Most Twitter users do not know that the social network stores 144 pieces of metadata about them, and that these are publicly accessible through the site's API. Compared to the content of a tweet, the metadata is much larger.
One of the authors of the study illustrates the situation with a very interesting example:
"No one in their right mind would give their address to a stranger who asks for it in the street. But they might tell that stranger how often they turn the light in their room on and off. That's the mentality with metadata: people think it's not a big deal. But if I combine it with another piece of information, I can know whether you are at home or not."
The metadata includes things like the date on which an account was created, the time at which a tweet is published, and the number of favorites, followers and accounts followed. Each is basic information on its own, but combined they can identify the user extremely efficiently.
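The idea that individually harmless fields become identifying once they are combined can be illustrated with a small sketch. This is not the researchers' method or data: the five fields, their value ranges, the noise levels and the 1-nearest-neighbour classifier are all assumptions chosen for illustration, the users are synthetic, and no Twitter API is called.

```python
import random

# Illustrative feature ranges, used only to put the fields on a comparable scale.
# Order: account age (days), followers, following, favourites, posting hour.
SCALES = (4000, 5000, 2000, 10000, 24)


def make_profile(rng):
    """Draw a synthetic user's 'true' metadata (all values are made up)."""
    return (
        rng.randint(30, 4000),    # account age in days
        rng.randint(20, 5000),    # followers
        rng.randint(10, 2000),    # accounts followed
        rng.randint(20, 10000),   # favourites
        rng.randint(2, 21),       # usual posting hour
    )


def observe(profile, rng):
    """Simulate the metadata seen on one tweet: the profile plus small drift."""
    drift = (rng.randint(0, 5), rng.randint(-10, 10), rng.randint(-5, 5),
             rng.randint(-20, 20), rng.randint(-2, 2))
    return tuple(p + d for p, d in zip(profile, drift))


def distance(a, b, features):
    """Squared Euclidean distance over the selected feature indices, after scaling."""
    return sum(((a[i] - b[i]) / SCALES[i]) ** 2 for i in features)


def predict(sample, training, features):
    """1-nearest-neighbour: return the user id of the closest training sample."""
    return min(training, key=lambda item: distance(sample, item[1], features))[0]


def accuracy(features, n_users=100, tweets_per_user=5, seed=0):
    """Train on a few tweets per user, then try to re-identify one new tweet each."""
    rng = random.Random(seed)
    profiles = [make_profile(rng) for _ in range(n_users)]
    training = [(uid, observe(p, rng))
                for uid, p in enumerate(profiles)
                for _ in range(tweets_per_user)]
    hits = sum(predict(observe(p, rng), training, features) == uid
               for uid, p in enumerate(profiles))
    return hits / n_users


if __name__ == "__main__":
    print("posting hour only:", accuracy(features=(4,)))
    print("all five fields:  ", accuracy(features=(0, 1, 2, 3, 4)))
```

On this toy data the posting hour alone identifies few users, because many users share the same hour, while the five fields together single out almost everyone. The numbers it prints say nothing about real Twitter accounts; the sketch only shows the mechanism.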
The researchers hope that the introduction of the GDPR will increase scrutiny of metadata, since the regulation requires companies to process only the specific data needed to perform a task. But the other problem, as always, is not how bad it is that technology stores so much information about us, but getting people to care in the first place.
Please put the source on image :)
Thanks for reminding me @yandot.
None of this should really shock people, it's basically impossible to be anonymous online at this point, but how effective do you think VPNs are at fighting this? That's the main strategy I have at this point when it comes to internet anonymity.