Sebastian Ramacher, TU Graz, Leader WP 5

In Safe-DEED, we focus on a different architectural paradigm: a decentralized data marketplace. We assume that data is not stored in a centralized repository or database; instead, companies keep their datasets themselves. Therefore, the platform does not contain a centralized database (https://safe-deed.eu/finding-a-business-model-for-decentralized-data-marketplaces/). Additionally, as one of its core components, Safe-DEED is concerned with improving privacy-preserving methods that allow computations on sensitive data. For this setting, i.e. decentralized datasets used as the basis for privacy-preserving computations, cryptographic protocols are well known in the literature: secure multiparty computation (MPC) enables two or more parties to provide inputs to a function and jointly compute its result. Most importantly, the parties learn nothing beyond the result. Therefore, the inputs provided by the respective parties remain confidential to the extent that they cannot be recovered from the result itself.

For example, companies may want to join their customer data in order to obtain more accurate predictions of spending habits or other statistics. Since this is highly sensitive personal data, e.g. data protected under the GDPR, they are not allowed to share their customers' data. Yet, combining this data would enable the businesses to improve their overall business strategy in ways developed in WP2, and also improve the exploitation and usefulness of their data.

Computation performed by trusted third party

In scenarios of this kind, the computation could be performed by providing the data to a trusted third party, which would then have access to all of the data and could compute the desired statistics. Consider, for example, several hospitals that want to compute the combined mortality of their lung cancer patients: a trusted party would see the data of all hospitals. MPC, in contrast, would enable the hospitals to obtain the combined mortality without learning the other hospitals' data or any intermediate state of the computation. In general, any computation that relies on data from multiple owners and can be performed by a trusted third party can also be performed by an MPC protocol involving all parties. Thereby, we remove the trusted third party from the picture and no longer need to unilaterally trust that party. Such methods are a natural choice whenever sensitive data of multiple data owners is involved in a computation. Consequently, MPC is a perfect fit for a decentralized data marketplace, as the data no longer needs to be uploaded to a central store and remains under the control of the data owner.
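One simple MPC building block that fits the hospital example is additive secret sharing. The Python sketch below is illustrative only and assumes honest-but-curious parties; the modulus `P`, the helper names `share` and `secure_sum`, and the per-hospital counts are hypothetical. Each hospital splits its private count into random shares that sum to the count modulo `P` and hands one share to every hospital; each hospital publishes only the sum of the shares it received, and adding those partial sums yields the total. Note that this reveals both totals (deaths and patients), not only their ratio; computing the ratio itself privately would require a more involved protocol.

```python
import secrets

# Hypothetical modulus for the toy example: a prime much larger than any count.
P = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate n parties jointly computing the sum of their private inputs.

    all_shares[i][j] is the share of party i's input that is sent to party j.
    Each party j only ever sees one random-looking share of every input and
    publishes the sum of the shares it holds; adding the published partial
    sums reconstructs the total, but no individual input.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]
    return sum(partial_sums) % P


# Hypothetical per-hospital counts (illustrative numbers, not real data).
deaths = [12, 30, 7]
patients = [100, 250, 50]

total_deaths = secure_sum(deaths)
total_patients = secure_sum(patients)
print(total_deaths, total_patients, total_deaths / total_patients)  # 49 400 0.1225
```

In a real deployment the shares would be exchanged over authenticated, encrypted channels between the hospitals rather than inside a single process as in this simulation.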

Computation performed without trusted third party using MPC

Private Set Intersection

Secure multiparty computation is a general concept that can be applied to a vast range of computations. For concrete use cases, however, it is often more efficient to select more specialized protocols. One prominent example of such a specialized protocol is private set intersection (PSI). Here, two (or more) parties each hold a set of elements they want to keep private, but want to compute the intersection of their sets. Alternatively, they could be interested in related metrics, such as the size of the intersection or whether the size of the intersection is above some threshold. To highlight the power and usefulness of PSI protocols, we will discuss a particularly interesting application: advertisement conversion rates. The goal here is to determine how effective advertisements are, e.g. whether they lead to new customers signing up for a service or to more products being sold.

For instance, imagine a golf club that advertises on a premium live sports TV channel. To measure the effectiveness of the advertisement, the marketing department of the TV operator and the golf club want to compare the list of subscribers to the pay TV channel with the list of people who have joined the golf club since the ad first ran. On both sides, users can be identified by, e.g., name or email address, so effectiveness could easily be measured by comparing the list of TV subscribers with the member list of the golf club. However, neither the TV operator nor the golf club is willing to simply send its customer list to the other party. The intersection of those lists could, of course, be computed by sending both lists to a trusted third party. Using a PSI protocol, however, the intersection can also be computed in a privacy-preserving manner involving only the TV operator and the golf club. Neither party learns anything other than which new golf club members have the live sports package and therefore likely saw the ad.
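To make the idea more concrete, here is a minimal Python sketch of one classic PSI construction, based on Diffie-Hellman-style commutative blinding, applied to the golf club scenario. It is not necessarily the protocol used in Safe-DEED. It assumes honest-but-curious parties, and the function names (`hash_to_group`, `keygen`, `blind`, `dh_psi`), the lower-casing of email addresses, and the toy modulus `2**127 - 1` are all hypothetical choices for illustration; the parameters are not secure, and a real deployment would use a prime-order elliptic-curve group and a vetted library. The golf club (receiver) blinds its hashed member emails with a secret exponent `a`; the TV operator (sender) blinds those values again with its own secret `b` and also sends its own subscribers blinded with `b`; the golf club raises the latter to `a` and compares, since `H(x)^(ab) = H(x)^(ba)`.

```python
import hashlib
import math
import secrets

# Toy modulus (assumption): the Mersenne prime 2**127 - 1. NOT secure as used here.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hash an identifier (here: a normalized email address) into Z_P^*."""
    digest = hashlib.sha256(item.strip().lower().encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so that x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values: list[int], key: int) -> list[int]:
    """Raise every group element to the secret exponent key."""
    return [pow(v, key, P) for v in values]


def dh_psi(receiver_items: list[str], sender_items: list[str]) -> set[str]:
    """Receiver learns receiver_items ∩ sender_items; each side also learns
    the size of the other's list. Both parties are simulated in one process."""
    a = keygen()  # receiver's (golf club's) secret
    b = keygen()  # sender's (TV operator's) secret
    # 1. Receiver -> sender: H(x)^a for each of its items.
    msg1 = blind([hash_to_group(x) for x in receiver_items], a)
    # 2. Sender -> receiver: (H(x)^a)^b in the same order, plus H(y)^b for its
    #    own items (in practice the sender would shuffle the latter list).
    double_blinded = blind(msg1, b)
    sender_blinded = blind([hash_to_group(y) for y in sender_items], b)
    # 3. Receiver computes H(y)^(ba) and keeps the items whose H(x)^(ab) matches.
    sender_ab = set(blind(sender_blinded, a))
    return {x for x, v in zip(receiver_items, double_blinded) if v in sender_ab}


# Hypothetical lists for the golf club / TV operator example.
new_members = ["alice@example.com", "bob@example.com", "carol@example.com"]
subscribers = ["bob@example.com", "dave@example.com", "carol@example.com"]
print(sorted(dh_psi(new_members, subscribers)))  # ['bob@example.com', 'carol@example.com']
```

In this sketch only the golf club learns the intersection; if the TV operator should learn it as well, the golf club could forward the result, or the roles could be run symmetrically. Variants that reveal only the size of the intersection follow the same idea, with the sender also shuffling the double-blinded values so that they can no longer be linked to individual members.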