Congrats to MOL posters for helping train ChatGPT, Bard, etc.

PVW

Apr 19, 2023 at 2:07pm

Per the Washington Post:

Inside the secret list of websites that make AI like ChatGPT sound smart

To look inside this black box, we analyzed Google’s C4 data set, a massive snapshot of the contents of 15 million websites that have been used to instruct some high-profile English-language AIs, called large language models, including Google’s T5 and Facebook’s LLaMA. (OpenAI does not disclose what datasets it uses to train the models backing its popular chatbot, ChatGPT)

There's a search tool to see what sites are in this data set. Relevant screenshot attached...