It’s infuriatingly hard to understand how closed models train on their input

The lack of transparency from the builders of large closed language models, such as GPT-3 and GPT-4, about their training data is a growing concern for users. The core problem is that, without knowing what goes into these models, there is no way to confidently state that private data passed to them won't be used to train future models. OpenAI's policy here is unambiguous, which is reassuring, but questions remain about how conversations with ChatGPT are used to further improve the model. Concerns about security leaks also persist: companies like GitHub are being urged to communicate more clearly about how private repository data is used.