Published in PNAS Nexus in 2024, the paper by Johannes Wachs and co-authors demonstrates for the first time that large language models can act as a substitute for human-generated data and knowledge resources. The authors argue that this substitution poses a significant problem for the training data needed to develop future models if it leads to a decline in human-generated content.
Access the paper at this LINK.