Congrats to MOL posters for helping train ChatGPT, Bard, etc.

Per the Washington Post:

Inside the secret list of websites that make AI like ChatGPT sound smart

To look inside this black box, we analyzed Google’s C4 data set, a massive snapshot of the contents of 15 million websites that have been used to instruct some high-profile English-language AIs, called large language models, including Google’s T5 and Facebook’s LLaMA. (OpenAI does not disclose what datasets it uses to train the models backing its popular chatbot, ChatGPT)

There's a search tool to see what sites are in this data set. Relevant screenshot attached...

That explains a lot.

But we can’t be sure until it calls someone a “poopyhead”.

Where can we spend the tokens?
