38TB of data accidentally exposed by Microsoft AI researchers

Microsoft’s AI research team accidentally exposed 38 terabytes of private data, including backups of employees’ workstations and over 30,000 internal Microsoft Teams messages, while publishing open-source training data on GitHub. The researchers shared their files through an Azure Storage Shared Access Signature (SAS) token, but the link was misconfigured to grant access to the entire storage account rather than only the intended files. The incident highlights the new risks organizations face when handling the large volumes of training data that AI work requires. Because the token allowed writing as well as reading, an attacker could have injected malicious code into the AI models stored in the account, opening the door to a supply chain attack. More broadly, SAS tokens are a security risk because they are easily created with excessive permissions and long expiry times, and they are hard to track and revoke.
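The three risk factors named above (scope, permissions, expiry) are all visible in the query string of a SAS URL, so a link can be checked before it is published. The following is a minimal sketch, not an official tool: the function name audit_sas_url, the 7-day lifetime threshold, the chosen set of "write-like" permission letters, and the example URL are all assumptions made for illustration. It relies only on the documented SAS query fields (ss and srt for account-level tokens, sr for the signed resource, sp for permissions, se for expiry).

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

# Assumption: a subset of `sp` letters treated as "write-like" for this sketch
# (w=write, d=delete, a=add, c=create, u=update).
WRITE_LIKE = set("wdacu")


def audit_sas_url(url, now=None, max_lifetime=timedelta(days=7)):
    """Return a list of warnings for a SAS URL (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
    findings = []

    # Account SAS tokens carry `ss`/`srt`; service SAS tokens carry `sr`.
    if "ss" in params or "srt" in params:
        findings.append("account-level SAS: not limited to specific files")
    elif params.get("sr") == "c":
        findings.append("container-level SAS: every blob in the container is reachable")

    risky = sorted(WRITE_LIKE & set(params.get("sp", "")))
    if risky:
        findings.append("grants write-like permissions: " + "".join(risky))

    expiry = params.get("se")
    if expiry:
        # `se` is an ISO 8601 UTC timestamp; normalize the trailing "Z".
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at - now > max_lifetime:
            findings.append(
                f"expires {expires_at.date()}, more than {max_lifetime.days} days away"
            )
    return findings


if __name__ == "__main__":
    # Hypothetical URL with illustrative values, not a real token.
    example = (
        "https://example.blob.core.windows.net/models"
        "?sv=2021-08-06&ss=b&srt=sco&sp=rwdlacup"
        "&se=2051-10-06T07:09:59Z&sig=REDACTED"
    )
    for finding in audit_sas_url(example):
        print("WARNING:", finding)
```

A check like this only inspects what the URL itself declares. It cannot tell whether a token has been revoked, and a token tied to a stored access policy may carry its expiry and permissions in the policy rather than in the URL.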

https://www.wiz.io/blog/38-terabytes-of-private-data-accidentally-exposed-by-microsoft-ai-researchers