Can I remove my personal data from GenAI training datasets?

Many tech companies that develop GenAI products are not transparent about the sources of their training data. However, tools such as “Have I Been Trained” and the Exposing.ai project now let users check whether their personal data appears in these datasets. Many GenAI products have been found to be trained on large datasets that include personal information scraped from popular websites, such as social media platforms and online encyclopedias. The practice is not exclusive to GenAI: companies like Clearview AI have previously scraped billions of photos for their facial recognition technology. Some companies are now taking legal action against GenAI companies for using their users’ data without permission.

Removing personal data from training datasets is challenging, especially when users don’t know whether their data is involved. One approach is public pressure: Mozilla has petitioned Microsoft to disclose whether personal data will be used to train its AI models. Another is legal action, as in J.L. v. Alphabet, where plaintiffs allege that Google scraped their personal data to train its AI products, and Doe v. GitHub, where plaintiffs allege that GitHub and OpenAI used their code to train Copilot. California residents can also use the CCPA to request the removal of their personal data from GenAI training datasets, but companies often respond inconsistently.

https://knowingmachines.org/knowing-legal-machines/legal-explainer/questions/can-i-remove-my-personal-data-from-genai-training-datasets
