Alexandros Bampoulidis, RSA FG, WP5

Personal data, that is, data containing information about individuals, is widely used for research and innovation in both industry and academia. However, such data often includes private, sensitive information that individuals may not be willing to share publicly and that could be used maliciously against them. Their privacy must therefore be protected. Data controllers are responsible for protecting the privacy of individuals and need to be cautious when collecting, processing, and sharing such datasets.

Simply removing direct identifiers, such as full name and address, and releasing only a subset of a dataset is not enough to protect the privacy of individuals, because of quasi-identifiers (QIs). QIs are attributes that do not directly identify individuals on their own but, when combined, can serve as a unique identifier. For example, in "Simple Demographics Often Identify People Uniquely" (2000), Sweeney showed that 87% of the U.S. population is uniquely identifiable by the combination of gender, date of birth and ZIP code.
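
To make the quasi-identifier risk concrete, the following minimal sketch counts how many records in a toy dataset are unique on a chosen QI combination. The records, attribute names and the `unique_fraction` helper are hypothetical and serve only as an illustration; this is not the analysis method used in Safe-DEED.

```python
from collections import Counter

# Hypothetical toy records: direct identifiers (name, address) already removed.
records = [
    {"gender": "F", "dob": "1985-03-14", "zip": "02139", "diagnosis": "flu"},
    {"gender": "M", "dob": "1985-03-14", "zip": "02139", "diagnosis": "asthma"},
    {"gender": "F", "dob": "1990-07-01", "zip": "02139", "diagnosis": "diabetes"},
    {"gender": "F", "dob": "1990-07-01", "zip": "02139", "diagnosis": "flu"},
    {"gender": "M", "dob": "1972-11-30", "zip": "02142", "diagnosis": "migraine"},
]

# The QI combination from Sweeney's study: gender, date of birth, ZIP code.
QUASI_IDENTIFIERS = ("gender", "dob", "zip")


def qi_key(record, qis=QUASI_IDENTIFIERS):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[attr] for attr in qis)


def unique_fraction(records, qis=QUASI_IDENTIFIERS):
    """Fraction of records whose QI combination occurs exactly once."""
    counts = Counter(qi_key(r, qis) for r in records)
    unique = sum(1 for r in records if counts[qi_key(r, qis)] == 1)
    return unique / len(records)


print(unique_fraction(records))            # all three QIs combined
print(unique_fraction(records, ("zip",)))  # ZIP code alone
```

On this toy data, the three QIs together single out 3 of the 5 records (0.6), whereas ZIP code alone singles out only 1 (0.2): no single attribute is identifying, but their combination is.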

To counter the risk of de-anonymisation, privacy models have been introduced. These models work by distorting the dataset, for example by generalising, suppressing or perturbing values, which creates a trade-off: the more private a dataset is, the less useful it becomes. A key challenge in protecting the privacy of individuals is therefore to find the right level of privacy while keeping the dataset useful.
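
The privacy/utility trade-off can be illustrated with k-anonymity, one well-known privacy model, which requires every combination of QI values to be shared by at least k records. The post does not say which privacy models Safe-DEED applied; the generalisation hierarchy, the toy data and the helper names (`generalise`, `k_anonymity`, `smallest_level`) below are hypothetical, and the number of distinct QI groups is used only as a crude proxy for utility.

```python
from collections import Counter

QIS = ("gender", "dob", "zip")

# Hypothetical toy records; every (gender, dob, zip) combination is unique.
records = [
    {"gender": "F", "dob": "1985-03-14", "zip": "02139"},
    {"gender": "F", "dob": "1985-09-02", "zip": "02138"},
    {"gender": "M", "dob": "1985-01-20", "zip": "02139"},
    {"gender": "M", "dob": "1985-06-11", "zip": "02141"},
    {"gender": "F", "dob": "1990-07-01", "zip": "02142"},
    {"gender": "F", "dob": "1990-12-24", "zip": "02139"},
    {"gender": "M", "dob": "1990-02-05", "zip": "02138"},
    {"gender": "M", "dob": "1990-10-17", "zip": "02141"},
]


def generalise(record, level):
    """Return a copy of `record` with its QIs coarsened (hypothetical hierarchy).

    0: original values
    1: date of birth -> year, ZIP -> first three digits
    2: as level 1, and gender suppressed
    """
    r = dict(record)
    if level >= 1:
        r["dob"] = r["dob"][:4]
        r["zip"] = r["zip"][:3] + "**"
    if level >= 2:
        r["gender"] = "*"
    return r


def k_anonymity(records, qis=QIS):
    """Size of the smallest group of records sharing the same QI values."""
    counts = Counter(tuple(r[a] for a in qis) for r in records)
    return min(counts.values())


def smallest_level(records, k, max_level=2):
    """Least-distorting level whose output is k-anonymous, or None."""
    for level in range(max_level + 1):
        if k_anonymity([generalise(r, level) for r in records]) >= k:
            return level
    return None


for level in range(3):
    gen = [generalise(r, level) for r in records]
    groups = len({tuple(r[a] for a in QIS) for r in gen})
    print(level, k_anonymity(gen), groups)  # level, k, distinct QI groups
```

The loop prints `0 1 8`, `1 2 4` and `2 4 2`: each level of generalisation raises k (more privacy) while halving the number of distinguishable groups (less detail). `smallest_level` picks the least-distorting level that meets a target k, which is one simple way to look for "the right amount of privacy".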

In the context of Safe-DEED, we investigated the de-anonymisation and anonymisation of the project's use-case dataset in a data-sharing setting, in order to raise privacy "red flags". To do so, we defined a procedure that takes the GDPR into account, raises data controllers' awareness of the de-anonymisation risks in their dataset, and helps them decide on anonymisation measures. The procedure consists of three steps:

  1. Data Landscape Analysis
  2. Threat Analysis
  3. Anonymisation Measures

To find out more about the procedure, read Safe-DEED's latest publication, "Practice and Challenges of (De-)Anonymisation for Data Sharing", or visit its poster at the RCIS 2020 Posters and Demos session on Thursday, 24 September, 11:00–12:30 CEST.

The publication also recounts the current challenges in the field of (de-)anonymisation that we have faced and are addressing in Safe-DEED, as well as in the H2020 TRUSTS project.