Synthetic data privacy evaluation
Privacy evaluation framework for synthetic data publishing
The framework implemented in this library lets data holders evaluate how much publishing a synthetic dataset in place of a sensitive raw dataset reduces the privacy risk for the individuals whose records appear in the raw data. The results inform decisions about whether to publish the data, and about which generative model offers the best trade-off between utility and privacy gain.
The framework measures the privacy gain of publishing a synthetic dataset in place of the raw data with respect to a specific privacy concern. Each concern is modelled as a privacy adversary that targets an individual record and aims to infer a secret about that record. The library includes implementations of two new privacy attacks on the output of a generative model. To evaluate privacy gain, the framework is instantiated under the chosen threat model and outputs an estimate of how much publishing the synthetic data instead of the raw data reduces the privacy loss of a chosen target record under that threat model.
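The evaluation idea described above can be sketched as follows: run the adversary's attack once against the raw data and once against the synthetic data, and report the difference in its success as the privacy gain. This is a minimal illustrative sketch, not the library's actual API; the adversary (a majority-vote guess of the target's secret attribute) and all names (`majority_guess`, `privacy_gain`, the `"secret"` field) are hypothetical.

```python
# Hedged sketch of a privacy-gain evaluation. All names and the toy
# adversary below are illustrative assumptions, not the library's API.

def majority_guess(dataset):
    """Toy adversary: guess the target's secret attribute as the
    majority value of that attribute in the published dataset."""
    values = [row["secret"] for row in dataset]
    return max(set(values), key=values.count)

def privacy_gain(raw, synthetic, target_secret):
    """Privacy gain = adversary's success on the raw data minus its
    success on the synthetic data (1.0 = attack fully mitigated)."""
    loss_raw = 1.0 if majority_guess(raw) == target_secret else 0.0
    loss_syn = 1.0 if majority_guess(synthetic) == target_secret else 0.0
    return loss_raw - loss_syn

# Toy example: the raw data leaks the target's secret (value 1 dominates),
# while the synthetic data no longer does.
raw = [{"secret": 1}, {"secret": 1}, {"secret": 0}]
synthetic = [{"secret": 0}, {"secret": 0}, {"secret": 1}]
print(privacy_gain(raw, synthetic, target_secret=1))  # 1.0
```

A gain of 1.0 means the attack succeeds on the raw data but fails on the synthetic data; a gain of 0.0 means publishing the synthetic data does not reduce the adversary's success for this target.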
inactive
entered showcase: 2021-02-08
entry updated: 2022-07-07
This project has not yet been evaluated by the C4DT Factory team.
We will be happy to evaluate it upon request.
Library
Python
BSD-3-Clause