Differential privacy (DP) provides a powerful, mathematically rigorous assurance that sensitive individual information in a dataset remains protected, even when a dataset is used for analysis. Since DP’s inception nearly two decades ago, researchers have developed differentially private versions of myriad data analysis and machine learning methods, ranging from calculating simple statistics to fine-tuning complex AI models. However, the requirement for organizations to privatize every analytical technique can be complex, burdensome, and error-prone.
Generative AI models like Gemini offer a simpler, more efficient solution. Instead of separately modifying every analysis method, they create a single private synthetic version of the original dataset. This synthetic data is an amalgamation of common data patterns, containing no unique details from any individual user. By using a differentially private training algorithm, such as DP-SGD, to fine-tune the generative model on the original dataset, we ensure the synthetic dataset is both private and highly representative of the real data. Any standard, non-private analytical technique or modeling can then be performed on this safe (and highly representative) substitute dataset, simplifying workflows. DP fine-tuning is a versatile tool that is particularly valuable for generating high-volume, controlled datasets in situations where access to high-quality, representative data is unavailable.
Most published work on private synthetic data generation has focused on simple outputs like short text passages or individual images, but modern applications using multi-modal data (images, video, etc.) rely on modeling complex, real-world systems and behaviors, which simple, unstructured text data cannot adequately capture.
We introduce a new method for privately generating synthetic photo albums as a way to address this need for synthetic versions of rich, structured image-based datasets. This task presents unique challenges beyond generating individual images, specifically the need to maintain thematic coherence and character consistency across multiple photos within a sequential album. Our method is based on translating complex image data to text and back. Our results show that this process, with rigorous DP guarantees enabled, successfully preserves the high-level semantic information and thematic coherence in datasets necessary for effective analysis and modeling applications.