Legislators are being urged to tighten up the laws on anonymised data – the bedrock of nearly every data-driven marketing programme on the market – after a new study found that relatively simple modelling techniques can easily identify individuals to 99.98% accuracy.
Any such move would not only have a devastating impact on tech giants and data companies, which have spent hundreds of millions of pounds building martech and adtech systems using unidentifiable consumer data, but would ultimately also scupper their clients’ marketing efforts.
The research, carried out by Imperial College London and the Catholic University of Louvain in Belgium, shows that even datasets with names, addresses and other unique identifiers removed can be traced back to individuals using machine learning.
The model developed by the team allowed them to correctly re-identify 99.98% of Americans in any anonymised dataset using just 15 characteristics, such as age, gender and marital status.
The research has been led by Imperial College London assistant professor Dr Yves-Alexandre de Montjoye, who is also a special advisor to the EU’s Competition Commissioner Margrethe Vestager.
He says that the findings, which are published in the journal Nature Communications, should be a wake-up call for policymakers on the need to tighten the rules for what constitutes truly anonymous data.
While the vast majority – if not all – of adtech, martech and consumer profiling systems are based on anonymised data, such data is also widely used by governments for scientific, social and societal research.
But the study shows just how easily the data can be reverse engineered to re-identify individuals, using modelling based on what is called “a generative copula-based method”.
The research cites the example of a 61-year-old man living in Chelsea, New York. It is claimed he can be correctly identified 81% of the time just from gender, birth date and postcode data. Adding five more points of data, including marital status, vehicle ownership, home ownership status and employment status, pushes the likelihood of this person being correctly identified to 100%.
The more data points are gathered, the easier it becomes to identify the individual behind the data, and some anonymous datasets contain as many as 248 data points.
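The effect described above, where each extra attribute makes a record easier to single out, can be illustrated with a minimal sketch. This is not the study's generative copula-based model: it simply counts how many records in a randomly generated population are unique on a growing set of attributes. The attribute names, the number of values each can take and the population size are all assumptions made for illustration, and no real data is involved.

```python
import random
from collections import Counter

# Hypothetical attributes and the number of distinct values each can take.
# These figures are illustrative assumptions, not taken from the study.
ATTRIBUTES = {
    "gender": 2,
    "birth_year": 80,
    "postcode": 500,
    "marital_status": 4,
    "vehicles": 4,
    "home_owner": 2,
    "employment": 5,
}


def make_population(size, seed=0):
    """Generate synthetic records with uniformly random attribute values."""
    rng = random.Random(seed)
    names = list(ATTRIBUTES)
    return [
        tuple(rng.randrange(ATTRIBUTES[name]) for name in names)
        for _ in range(size)
    ]


def unique_fraction(records, k):
    """Share of records whose first k attribute values match no other record."""
    counts = Counter(record[:k] for record in records)
    return sum(1 for record in records if counts[record[:k]] == 1) / len(records)


def uniqueness_by_attribute_count(records):
    """Unique share when 1, 2, ... up to all attributes are known."""
    return [unique_fraction(records, k) for k in range(1, len(ATTRIBUTES) + 1)]


if __name__ == "__main__":
    population = make_population(100_000)
    fractions = uniqueness_by_attribute_count(population)
    for name, fraction in zip(ATTRIBUTES, fractions):
        print(f"attributes up to {name}: {fraction:.1%} of records are unique")
```

Because adding an attribute can only split existing groups of matching records, never merge them, the unique share in this sketch cannot fall as more attributes are included. Real populations are not uniformly distributed, so the actual figures would differ from anything this toy example prints.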
The study insists this ease of discovery could expose sensitive information about personally identified individuals, and allow data buyers to build increasingly comprehensive personal profiles on them.
De Montjoye believes the re-identification of anonymised data has been downplayed because it is always claimed the datasets are incomplete.
He added: “Our findings contradict this and demonstrate that an attacker could easily and accurately estimate the likelihood that the record they found belongs to the person they are looking for.
“The goal of anonymisation is so we can use data to benefit society. This is extremely important but it should not and does not have to happen at the expense of people’s privacy.”
No doubt the EU Competition Commissioner will be studying the findings, too.