Synthetic Data: Replace sensitive data entirely
Synthetic data works as a privacy-enhancing technology (PET) by generating a new dataset that closely reflects the statistical traits of the original one but contains no sensitive, real-world information. This lets analysts, researchers, and organizations analyse data, build models, and generate insights without revealing individual data points.
For instance, if a healthcare provider wishes to share patient information for new drug research, using real patient data would compromise privacy. A synthetic dataset can be created instead, maintaining the general trends and characteristics—such as age distribution and medication response—of the original data, but without any personally identifiable details.
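To make the idea concrete, here is a minimal sketch in Python of the simplest possible generator: it fits a multivariate Gaussian to two numeric columns and samples fresh rows from it. Everything in it is an assumption for illustration, including the made-up "patient" data, the column meanings (age and a medication response score), and the function name `fit_and_sample`. Production tools use far richer generative models, and a plain Gaussian fit like this offers no formal privacy guarantee on its own.

```python
import numpy as np


def fit_and_sample(real: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Fit a multivariate Gaussian to `real` (rows are records) and draw new rows."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)            # per-column averages
    cov = np.cov(real, rowvar=False)    # how the columns vary together
    return rng.multivariate_normal(mean, cov, size=n_samples)


# Hypothetical stand-in for real patient data: age and a response score
# that rises slightly with age. These numbers are invented for the example.
data_rng = np.random.default_rng(42)
age = data_rng.normal(55, 12, size=2000)
response = 0.05 * age + data_rng.normal(0, 1, size=2000)
real = np.column_stack([age, response])

synthetic = fit_and_sample(real, n_samples=2000)

# Compare the headline statistics of the two datasets.
print("real mean:     ", real.mean(axis=0))
print("synthetic mean:", synthetic.mean(axis=0))
print("real corr:     ", np.corrcoef(real, rowvar=False)[0, 1])
print("synthetic corr:", np.corrcoef(synthetic, rowvar=False)[0, 1])
```

The synthetic rows are drawn from the fitted distribution rather than copied from the original table, yet the averages and the age-response relationship should come out close to those of the original.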
Among Molten's portfolio companies, Mostly AI specialises in synthetic data as a privacy-enhancing technology. It produces software that lets businesses create synthetic datasets for applications such as training AI models, running data analytics, and testing products.
Using generative AI, Mostly AI converts original datasets containing sensitive and private information into synthetic versions that comply with privacy regulations. These synthetic copies usually preserve between 80% and more than 99% of the original data's underlying patterns, offering a secure way to train models, make business and product decisions, and share data externally without risking data breaches or privacy infringements.
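A figure such as "preserves 80-99% of the patterns" implies some similarity score between the real and synthetic datasets. The sketch below shows one simple way such a score could be computed, by comparing the pairwise correlations of the two datasets. This is an assumption for illustration only: the function `correlation_fidelity`, the scoring formula, and the invented data are hypothetical and are not Mostly AI's published metric.

```python
import numpy as np


def correlation_fidelity(real: np.ndarray, synthetic: np.ndarray) -> float:
    """Score in [0, 1]: 1.0 means identical pairwise correlations."""
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synthetic, rowvar=False)
    off_diag = ~np.eye(real_corr.shape[0], dtype=bool)   # skip the trivial diagonal
    gap = np.abs(real_corr - synth_corr)[off_diag].mean()
    return 1.0 - gap / 2.0   # a correlation gap can be at most 2


def make_correlated(rng: np.random.Generator, n: int) -> np.ndarray:
    """Invented data: three columns that share a common underlying factor."""
    shared = rng.normal(size=(n, 1))
    return shared + 0.5 * rng.normal(size=(n, 3))


rng = np.random.default_rng(0)
real = make_correlated(rng, 1000)

# A "good" synthetic set follows the same process as the real data.
good_synthetic = make_correlated(rng, 1000)

# A "poor" synthetic set keeps each column's values but shuffles them
# independently, which destroys the relationships between columns.
poor_synthetic = np.column_stack(
    [rng.permutation(real[:, j]) for j in range(real.shape[1])]
)

print("good:", correlation_fidelity(real, good_synthetic))
print("poor:", correlation_fidelity(real, poor_synthetic))
```

A score like this captures only linear relationships between columns. Real evaluations typically also compare individual column distributions and check that no synthetic record sits too close to a real one.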